Calibrated breast density methods for full field digital mammography: a system for serial quality control and inter-system generalization.

PURPOSE
The authors are developing a system for calibrated breast density measurements using full field digital mammography (FFDM). Breast tissue equivalent (BTE) phantom images are used to establish baseline (BL) calibration curves at time zero. For a given FFDM unit, the full BL dataset comprises approximately 160 phantom images, acquired prior to calibrating prospective patient mammograms. BL curves are monitored serially to ensure they produce accurate calibration and are updated when calibration accuracy degrades beyond an acceptable tolerance, rather than reacquiring the full BL dataset repeatedly. BL updating is a special case of generalizing calibration datasets across FFDM units, referred to as cross-calibration. Serial monitoring, BL updating, and cross-calibration techniques were developed and evaluated.


METHODS
BL curves were established for three Hologic Selenia FFDM units at time zero. In addition, one set of serial phantom images, composed of equal proportions of adipose and fibroglandular BTE materials (50/50 compositions) of a fixed height, was acquired biweekly and monitored with the cumulative sum (Cusum) technique. These 50/50 composition images were used to update the BL curves when the calibration accuracy degraded beyond a preset tolerance of ±4 standardized units. A second set of serial images, composed of a wide range of BTE compositions, was acquired biweekly to evaluate serial monitoring, BL updating, and cross-calibration techniques.


RESULTS
Calibration accuracy can degrade serially and is a function of acquisition technique and phantom height. The authors demonstrated that all heights could be monitored simultaneously while acquiring images of a 50/50 phantom with a fixed height for each acquisition technique biweekly, translating into approximately 16 image acquisitions biweekly per FFDM unit. The same serial images are sufficient for serial monitoring, BL updating, and cross-calibration. Serial calibration accuracy was maintained within ±4 standardized unit variation from the ideal when applying BL updating. BL updating is a special case of cross-calibration; the BL dataset of unit 1 can be converted to the BL dataset for another similar unit (i.e., unit 2) at any given time point using the 16 serial monitoring 50/50 phantom images of unit 2 (or vice versa) acquired near this time point while maintaining the ±4 standardized unit tolerance.


CONCLUSIONS
A methodology for monitoring and maintaining serial calibration accuracy for breast density measurements was evaluated. Calibration datasets for a given unit can be translated forward in time with minimal phantom imaging effort. Similarly, cross-calibration is a method for generalizing calibration datasets across similar units without additional phantom imaging. This methodology will require further evaluation with mammograms for complete validation.


INTRODUCTION
Breast density is an important breast cancer risk factor most often estimated from mammograms. There are various methods under investigation for measuring breast density 1,2 including those that operate on the image data directly or incorporate calibration. Calibration, or standardization, methods are designed to account for image acquisition technique differences and are more recent developments in mammography. [3][4][5][6][7][8][9][10] Commercial products using standardization are also available to estimate breast density. 11,12 Similarities in the various calibration approaches and related breast density measures were discussed in our previous work. 3,13 When developing a standardized inter-mammogram data (i.e., pixel) representation, calibration data are normally acquired or developed from the respective mammography unit(s). In general, there are different ways to collect these data. In one approach, for example, data can be collected simultaneously at the time of the mammogram acquisition by installing a calibration device in the mammography system. 8,9 We refer to this as an internal method. Alternatively, calibration data can be acquired without a hard-system interface, which we refer to as an external method. 7 There are both benefits and drawbacks with either approach. The internal approach collects data in real-time, mitigating serial drift influences. External approaches that use phantom imaging may experience serial calibration accuracy degradation if the mammography unit experiences drift relative to a calibration dataset collected prior to patient imaging. The external approach may have more latitude in the calibration datatype because the data collection is not visible to the clinical operation. The most appropriate method for standardization is still under investigation. Because we are using an external approach with phantom imaging, 3,4,14,15 it is necessary to establish a serial monitoring component in the calibration system.
We are operationalizing the cumulative sum (Cusum) technique 16 for monitoring serial calibration accuracy. 17 Cumulative sum has its origins in industrial process quality control [18][19][20] and is suited for detecting sustained drift relative to a reference. Since its original development, Cusum has been used in various arenas including medical outcomes evaluation and public health surveillance. Such applications include monitoring performance of clinicians in various specialties, [21][22][23][24][25][26][27][28][29][30] patient survival following transplant procedures, 31 screening process quality, 32 infectious disease outbreaks, [33][34][35][36] cancer incidence, 37 air pollution levels, 38 and occupational safety events. 39 Our calibration methodology was initially developed using indirect x-ray conversion full field digital mammography (FFDM) technology. 3,4,15,17 We are currently modifying these methods for use with direct x-ray conversion FFDM. 14 In this paper, we use phantom imaging to evaluate the serial stability of three direct x-ray conversion FFDM units relative to calibration accuracy, assess methods for maintaining accuracy using the Cusum technique, and evaluate a basis for applying calibration data collected from one unit to another similar unit for generalization purposes.

2.A. Background
Our calibration approach relies on acquiring a baseline (BL) calibration dataset at time zero (BL 0 ) for each FFDM unit. Each calibration dataset requires a considerable amount of breast tissue equivalent (BTE) phantom imaging (about 160 acquisitions), making it impractical to collect full calibration datasets repeatedly in time to maintain prospective accuracy. Instead, we monitor each unit serially to assess the applicability of its BL 0 with a minimal amount of serial phantom imaging, requiring approximately 20 minutes biweekly per mammography unit. When sustained variation beyond a preset level is detected at some time point after establishing BL 0 in a given unit, its calibration dataset can be updated (by hypothesis) using the serial phantom images, essentially bringing its BL 0 dataset forward in time. The basis for this updating mechanism is a special case of more general theory that shows how a calibration dataset acquired with one FFDM unit can be converted to another similarly manufactured unit using a minimal amount of phantom imaging, referred to as cross-calibration. Both the special case of serial calibration dataset updating for a given unit and the more general theory of cross-calibration are presented and evaluated in this paper.

2.B. Mammography units
Phantom images were acquired from three Hologic Selenia direct x-ray conversion FFDM units. These units are used for breast screening and diagnostic purposes at Moffitt Cancer Center. The Selenia detector has a 70 micron pitch (pixel resolution). These units produce raw images, stored with 14 bit per pixel dynamic range, and clinical display images, stored with 12 bit per pixel dynamic range. Raw images were used for our work. For screening mammography, these systems primarily use two detector fields of view (FOVs) determined by the choice of compression paddle. We examined data acquired with the large FOV (24 × 29 cm or 3328 × 4096 pixels) in this paper, because the choice of FOV does not affect the calibration accuracy. 14 Two units (H 1 and H 2 ) have tungsten/rhodium (W/Rh) and tungsten/silver (W/Ag) target/filter combinations. The third unit (H 3 ) has molybdenum/molybdenum (Mo/Mo) and molybdenum/rhodium (Mo/Rh) combinations.

2.C. Materials
Images of BTE phantoms (CIRS, Norfolk, VA) were used for this work. Our phantom sets consist of 100% adipose and 100% fibroglandular (glandular) BTE materials with 1 mm, 2 mm, 1 cm, or 2 cm thickness (i.e., precise heights). The area dimensions are 18 cm × 24 cm for the large set and 12.5 cm × 10 cm for the small set. For a given set, composite compositions were constructed by stacking combinations of these homogeneous BTE materials. We let h = h g + h a , where h is the total height in centimeters of a given stacked phantom arrangement (i.e., height equates with the compressed breast thickness), h g is the height of the glandular component, and h a is the height of the adipose component. The theoretical (ideal) percent glandular (PG) standardized value of a given composite is then given by 100 × h g /h. Composite phantoms are referenced by the glandular percentage/adipose percentage. For example, 40/60 references a composite phantom composed of 40% glandular and 60% adipose BTE materials for a given total height and is referred to as 40 PG. More generally, a composite composition is referenced as w/z below, where w + z = 100, w gives the PG designation, and z is the percentage of adipose tissue by height.
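The composition bookkeeping above can be sketched in a few lines. This is only an illustration of the 100 × h g /h definition; the function names are hypothetical and not part of the authors' software.

```python
def percent_glandular(h_g_cm, h_a_cm):
    """Ideal PG of a stacked phantom: 100 * h_g / (h_g + h_a)."""
    h = h_g_cm + h_a_cm  # total height equates with compressed breast thickness
    if h <= 0:
        raise ValueError("total phantom height must be positive")
    return 100.0 * h_g_cm / h


def composition_label(h_g_cm, h_a_cm):
    """Return the w/z label (glandular %/adipose %), with w + z = 100."""
    w = round(percent_glandular(h_g_cm, h_a_cm))
    return f"{w}/{100 - w}"


# A 4 cm stack with 1.6 cm glandular and 2.4 cm adipose material is a 40/60 phantom.
label = composition_label(1.6, 2.4)
```

The 1.6 cm and 2.4 cm heights are illustrative; they can be built from the 1 mm, 2 mm, 1 cm, and 2 cm slabs described above.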

2.D. Calibration
The calibration methods were described previously 3,4,14 and are outlined here. Initial BL 0 datasets are acquired prior to calibrating patient mammograms and are used to establish calibration curves. These curves are functions of the acquisition techniques shown in Table I for each unit. There are 16 acquisition techniques sampled for the H 1 and H 2 units (similar units) and 15 acquisition techniques for the H 3 unit. To limit the amount of calibration data collection, we only sampled the range of compressed breast thicknesses routinely observed in practice. For thicknesses beyond this range, we used extrapolation based on the linear regression model described previously 4 considering sampled heights ≥4 cm in the model. For each acquisition technique, there are two calibration curves corresponding to the 100% adipose and 100% glandular BTE phantom images. These curves and calibration are developed in the logarithm of the relative exposure (LRE) representation as a function of compressed breast thickness (i.e., phantom height). When setting x = mAs (i.e., the acquisition mAs) and letting the value of a given pixel or the average value of a group of pixels = pv, the LRE representation is given by ln(pv/x).
Shorthand notation is used for the developments below. For time points other than time zero, we replace the zero subscript with the index n, unless noted, where n is a running index defining the number of serial samples taken since time zero that can be converted to the total number of days since time zero. Adipose and glandular calibration curves for a given acquisition technique corresponding to BL n are referred to as A n and G n, respectively. When height considerations are required, we reference these as A n,h and G n,h , where h is the total phantom height in centimeters. Given a LRE corresponding to an arbitrary w/z composite phantom [i.e., LRE(w)], we express the calibration application symbolically as CAL[LRE(w)] = w PG units paralleling the description in Sec. 2.C.
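As a minimal sketch of the LRE representation and the calibration mapping CAL[LRE(w)] = w, the snippet below assumes that PG varies linearly between the adipose (0 PG) and glandular (100 PG) curve points at a fixed height, which is consistent with the linear combination used in Sec. 2.G. The function names and the numerical values are hypothetical, chosen only for illustration.

```python
import math


def lre(pixel_value, mas):
    """Logarithm of the relative exposure: ln(pv / mAs)."""
    return math.log(pixel_value / mas)


def calibrate(lre_value, a_h, g_h):
    """Map an LRE to PG units, assuming linear mixing between the
    adipose curve point a_h (0 PG) and glandular curve point g_h (100 PG)
    for the same acquisition technique and height."""
    if a_h == g_h:
        raise ValueError("adipose and glandular curve points must differ")
    return 100.0 * (a_h - lre_value) / (a_h - g_h)


# Illustrative curve points at one height (not measured values).
a_h, g_h = 4.5, 3.7
pg_mid = calibrate(0.5 * (a_h + g_h), a_h, g_h)  # LRE halfway between the curves
```

Under this assumption, an LRE halfway between the two curves calibrates to 50 PG, matching the 50/50 reference composition.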

2.E. Serial imaging
In addition to the BL datasets, two series of phantom images were acquired from each FFDM unit defined as Timeline 1 and Timeline 2. Timeline 1 was composed of 50/50 composition phantoms (large set) acquired biweekly using the acquisition techniques shown in Table I with h = 4 cm. Due to study timing, the starting date and the duration of the Timeline 1 imaging vary across the units. Imaging was performed from July 2012 to present for H 1 and H 2 and from March 2013 to present for H 3 . The phantom position and region of interest (ROI) used for the Timeline 1 acquisitions, the BL 0 curve generation, and analyses are shown in Fig. 1.
Timeline 2 images were used to evaluate the serial calibration accuracy derived from either BL 0 or the updated BL n for each unit and to evaluate the cross-calibration. Timeline 2 imaging was initiated at a later date relative to Timeline 1 (March 2013-present for all three units). Timeline 2 images offer a means for independent evaluation because they were not used in the BL updating analysis.

2.F. Cumulative sum monitoring
The sequential Cusum technique was used to construct the decision interval (DI) Cusum monitoring. 16 We applied the DI method described in our previous work 17 with modifications to monitor in control (IC) behavior and detect sustained variation using Timeline 1 images. The DI detects upward and downward drift (separately) from a defined reference relative to a preset tolerance using two respective time series. The LRE of a 50/50 composition taken at or near BL 0 (time zero) defines the initial reference for each acquisition technique, m 0 = LRE 0 (50). The LRE at n, expressed as m n = LRE n (50), is used to define the Cusum variable
$$U_n = \frac{m_n - m_0}{m_0} \qquad (1)$$
for n > 0 with U 0 = 0. The above relationship sets the monitoring to a relative shift from m 0 . 17 The standard sequential Cusum is given by
$$S_n = \sum_{i=1}^{n} U_i. \qquad (2)$$
Briefly, Eq. (2) is the running total of the deviation from the reference for the first n samples 16 in normalized form. For random variation, the plot of S n versus n is random (i.e., S n is a zero mean random variable). When the drift is sustained in a specific direction, S n will veer off in the same direction. Using Eq. (1), the respective DI forms for detecting upward and downward shifts are given by
$$S_n^{+} = \max\left[0,\; S_{n-1}^{+} + U_n - k\right] \qquad (3)$$
and
$$S_n^{-} = \min\left[0,\; S_{n-1}^{-} + U_n + k\right], \qquad (4)$$
where S − 0 = S + 0 = 0 and k is the chart constant, which is the normalized LRE shift tolerance defined below. When the deviation is less than this tolerance, the DI terms return zero indicating IC behavior. Otherwise, the DI triggers and returns a value other than zero from one of its terms depending on the drift direction. We define out of control (OC) behavior as two consecutive upward or downward LRE shifts exceeding the tolerance (i.e., two consecutive DI triggers in the same direction). When OC behavior is detected, calibration curves and the Cusum are adjusted accordingly to bring the system into tolerance. For example, when this event occurs at n, BL 0 is updated at n − 1 giving BL n−1 (i.e., effectively creating a new BL 0 ) and the Cusum is reset: m 0 = m n−1 and S n−1 = S 0 = 0.
For each BL update, k is recalculated with the new m 0 to maintain the ±4 PG tolerance after the reset. Figure 2 shows the schema for the monitoring and updating. Typically, Cusum applications use a detection threshold in conjunction with the chart constant 16 to define OC behavior. In our application, we use the rule of two consecutive DI triggers, developed specifically for this application based on our choice of k, rather than the threshold approach.
FIG. 2. Serial monitoring schema for Timeline 1. Out of control (OC) behavior is defined as two consecutive DI Cusum triggers, occurring at n − 1 and n. The reset and updating take place retrospectively at n − 1 relative to the OC event detection at n.
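The monitoring and reset logic described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the relative-shift variable U n = (m n − m 0 )/m 0 and the standard decision-interval recursions, it assumes a positive reference m 0 , and it holds k fixed across resets for simplicity, whereas the text recalculates k after each BL update. The function name `monitor` is hypothetical.

```python
def monitor(m_series, m0, k):
    """Run a decision-interval Cusum over serial 50/50 LREs.

    Returns the indices (n - 1) at which the reference was reset after
    out-of-control (OC) behavior, defined as two consecutive DI triggers
    in the same direction (detected at n, reset applied at n - 1).
    """
    resets = []
    s_plus = s_minus = 0.0
    prev_trigger = 0  # +1 upward trigger, -1 downward trigger, 0 in control
    n = 0
    while n < len(m_series):
        u = (m_series[n] - m0) / m0             # relative shift from reference
        s_plus = max(0.0, s_plus + u - k)       # upward DI term
        s_minus = min(0.0, s_minus + u + k)     # downward DI term
        trigger = 1 if s_plus > 0 else (-1 if s_minus < 0 else 0)
        if trigger != 0 and trigger == prev_trigger:
            # OC detected at n: move the reference to the sample at n - 1,
            # zero the Cusum, and re-evaluate sample n against the new reference.
            m0 = m_series[n - 1]
            resets.append(n - 1)
            s_plus = s_minus = 0.0
            prev_trigger = 0
            continue
        prev_trigger = trigger
        n += 1
    return resets


# Illustrative series: a sustained upward jump starting at index 3.
resets_up = monitor([4.0, 4.01, 4.0, 4.1, 4.1, 4.1, 4.1], m0=4.0, k=0.01)
```

In this illustrative series the second consecutive trigger occurs at index 4, so the reset is applied retrospectively at index 3, mirroring the n and n − 1 convention in Fig. 2.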
The chart constant, k, plays a critical role in capturing OC behavior and is equivalent to the fractional shift from the initial LRE reference for a given acquisition technique (i.e., kV and target/filter) in our application. We used methods developed previously 17 to determine k with further modifications. For reference, there is a linear and symmetric relationship between the degree of deviation from the reference LRE and the resulting variation in PG (i.e., calibration error) for a fixed height. Additionally, the variation in the calibration accuracy is a function of the acquisition technique and height, but not of composition. For all acquisition techniques and heights, we required the serial calibration accuracy to be within a ±4 PG tolerance from the ideal value, 14 forming the basis for this user-imposed IC bound. The ±4 PG width defines the 95% confidence interval (CI) for the calibration accuracy. To achieve this uniform accuracy objective, lookup tables (LUTs) were developed numerically and modeled with polynomials to estimate the appropriate k because we only monitor h = 4 cm with actual serial imaging.
We outline the methods to estimate k below and provide the mathematical details in the Appendix. The 50/50 reference LREs are established from the BL 0 dataset using a linear combination. We numerically find the values of k for each acquisition technique that give a ±4 PG shift in the calibration of the Timeline 1 50/50 composition images (h = 4 cm) giving k 4 . A similar approach is then used to estimate k for the full range of heights (i.e., for h ≠ 4 cm) using a normalization that references arbitrary k − k 4 , resulting in height dependent LUTs. The k for arbitrary h is found by taking the appropriate point from the LUT and adding it to k 4 giving k h . Thus 50/50 compositions for arbitrary h can be monitored using Timeline 1, essentially creating virtual timelines. The justification for the simultaneous monitoring for all h based on one serial timeline (i.e., h = 4 cm) follows from the developments provided in Sec. 2.G. below and is illustrated in Sec. 3.
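One way to picture the height-dependent LUT is sketched below. The Appendix derivation is not reproduced here, so this is only a plausible reading under two stated assumptions: calibration is linear between the adipose and glandular curves at each height, and a drift shifts the LRE by the same offset at every height. Under those assumptions, the LRE shift producing a 4 PG error at height h is 0.04 × (A h − G h ), and dividing by the 4 cm reference m 0 gives k h . The separations and m 0 below are illustrative numbers, and the function names are hypothetical.

```python
def chart_constant(separation, m0, tol_pg=4.0):
    """Fractional LRE shift that induces a tol_pg calibration shift,
    given the adipose-glandular curve separation at one height."""
    if separation <= 0 or m0 == 0:
        raise ValueError("separation must be positive and m0 nonzero")
    return (tol_pg / 100.0) * separation / abs(m0)


def build_lut(separations, m0, h_ref=4):
    """Relative adjustments k_h - k_ref for each sampled height."""
    k_ref = chart_constant(separations[h_ref], m0)
    return {h: chart_constant(s, m0) - k_ref for h, s in separations.items()}


def k_for_height(separations, m0, h, h_ref=4):
    """k_h = k_ref + LUT adjustment at height h."""
    k_ref = chart_constant(separations[h_ref], m0)
    return k_ref + build_lut(separations, m0, h_ref)[h]


# Illustrative curve separations (LRE units) by height in cm, and reference LRE.
separations = {3: 0.6, 4: 0.8, 5: 1.0}
lut = build_lut(separations, m0=4.0)
```

With these inputs the adjustment is zero at 4 cm, negative at 3 cm, and positive at 5 cm, so smaller heights are monitored with a tighter chart constant.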

2.G. Baseline updating and cross-calibration theory
Timeline 1 images are used to update BL 0 (or more generally BL n ) when the DI detects OC behavior. The goal of updating is to reconstruct the BL 0 dataset at any time after its initial collection using one phantom image per acquisition technique taken near the time of reconstruction. We make these assumptions used in the updating development outlined below: (i) various difference equations (described below) derived from calibration curves at arbitrary time points are statistically time-invariant; and (ii) BL n datasets within or across similar Hologic units vary by an offset (i.e., a constant shift). With these assumptions, the following hypotheses were evaluated: (i) the BL 0 dataset for a given acquisition technique and unit can be updated (i.e., reconstructed) using one 50/50 composite phantom image per acquisition technique taken at n giving BL n . This essentially translates BL 0 in time to BL n by making the appropriate adjustments and is a special application of a more general theory. (ii) For arbitrary n and m, the BL n from H 1 can be converted to BL m for H 2 (or vice versa) by using the other similar unit's 50/50 compositions taken at n or m, defining the more general theory of cross-calibration.
The BL updating (translation) and cross-calibration operations follow from the same development. The theoretical calibration curves for a given unit at n are related to a composite composition acquired at n (or near n) by a linear combination
$$\mathrm{LRE}_{n,h}(c) = c\,G_{n,h} + (1 - c)\,A_{n,h}, \qquad (5)$$
where c is the [0,1] valued mixing coefficient with c = 1/2 for a 50/50 composition. We solve Eq. (5) for h = 4 cm with c = 1/2 giving one calibration curve point at n,
$$A_{n,4} = 2m_n - G_{n,4}. \qquad (6)$$
We use a difference equation relating BL 0 with BL n (n > 0) to define an increment
$$\Delta_h = A_{0,h} - G_{0,h} = A_{n,h} - G_{n,h}. \qquad (7)$$
Equation (7) is the difference between a given unit's calibration curves as a function of h and is valid under our assumptions because BL 0 can be acquired at any time. We note that this difference is the contrast between the adipose and glandular pixel values for a given h in the LRE representation. Also by hypothesis, this difference is a characteristic quantity of similarly manufactured units (i.e., H 1 and H 2 ). Rearranging the above equations with substitution gives
$$A_{n,4} = m_n + \tfrac{1}{2}\Delta_4 \qquad (8)$$
and
$$G_{n,4} = m_n - \tfrac{1}{2}\Delta_4. \qquad (9)$$
Equations (8) and (9) specify one point from each theoretical calibration curve at n in terms of a difference from the BL 0 dataset and a measured 4 cm LRE for a 50/50 composition acquired at n. The next differences resemble discrete derivatives with respect to h expressed as
$$\Delta A_h = A_{0,h+1} - A_{0,h} = A_{n,h+1} - A_{n,h} \qquad (10)$$
and
$$\Delta G_h = G_{0,h+1} - G_{0,h} = G_{n,h+1} - G_{n,h}. \qquad (11)$$
Equations (10) and (11) provide the means to derive the calibration curves at n using A n,4 and G n,4 for a given unit or develop the other unit's BL n (i.e., cross-calibration) by the choice of 50/50 composition used in this updating scheme (i.e., either acquired from H 1 or H 2 ). The general equation for the adipose calibration curve reconstruction at n for h > 4 cm is given by
$$A_{n,h} = A_{n,4} + \sum_{j=4}^{h-1} \Delta A_j. \qquad (12)$$
For example, the point for h = 5 cm is given by
$$A_{n,5} = A_{n,4} + \Delta A_4. \qquad (13)$$
The general expression for h ≤ 4 cm is expressed as
$$A_{n,h} = A_{n,4} - \sum_{j=h}^{3} \Delta A_j. \qquad (14)$$
For example, the point for h = 3 cm is given by
$$A_{n,3} = A_{n,4} - \Delta A_3. \qquad (15)$$
The other points are derived similarly, and the solution for G n,h follows the same form.
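The reconstruction described above can be sketched compactly. Under the stated assumptions (time-invariant differences and a constant offset between BL datasets), summing the discrete height differences from 4 cm to h telescopes to A 0,h − A 0,4 , so each updated curve is the BL 0 curve shifted so that its 4 cm point matches the value implied by the new 50/50 measurement. The function name, dictionary layout, and numerical values are hypothetical, for illustration only.

```python
def update_baseline(a0, g0, m_n, h_ref=4):
    """Translate a BL0 dataset to time n from one 50/50 LRE measurement.

    a0, g0 : dicts mapping height (cm) -> adipose / glandular LRE at time zero
    m_n    : measured LRE of the 50/50 phantom at height h_ref at time n
    For cross-calibration, pass unit 1's a0 and g0 with unit 2's m_n.
    """
    delta_ref = a0[h_ref] - g0[h_ref]     # curve separation at h_ref
    a_n_ref = m_n + 0.5 * delta_ref       # updated adipose point at h_ref
    g_n_ref = m_n - 0.5 * delta_ref       # updated glandular point at h_ref
    # Height differences carried over unchanged from BL0 (telescoped sums).
    a_n = {h: a_n_ref + (a0[h] - a0[h_ref]) for h in a0}
    g_n = {h: g_n_ref + (g0[h] - g0[h_ref]) for h in g0}
    return a_n, g_n


# Illustrative BL0 curve points; the 50/50 reference at 4 cm is (4.5 + 3.7) / 2 = 4.1.
a0 = {3: 5.0, 4: 4.5, 5: 4.0}
g0 = {3: 4.4, 4: 3.7, 5: 3.0}
a_n, g_n = update_baseline(a0, g0, m_n=4.2)  # the measured LRE drifted by +0.1
```

In this example every curve point moves by the same +0.1 offset, and the updated 4 cm curve points average back to the measured 50/50 LRE.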
The development outlined above shows how to reconstruct the entire BL calibration dataset with one image per acquisition technique, and its merits were evaluated with additional experiments described below. The assumptions used above for the difference relationships are supported by the linear form of the calibration curves. 4,14,15 We used a 50/50 composition with h = 4 cm for this development, but these choices are not unique. The 50/50 composition gives even weight to the A and G curves (i.e., reducing bias because the mixing coefficients are equivalent c = c glandular = 1 − c adipose = 1/2), and 4 cm represents a central range for the expected compressed breast thickness. We note in practice, the BL curves are established (sampled) at integer height values. A cubic-spline interpolation is used to estimate intermediate points as required. 14 Three additional experiments were designed to evaluate the BL update/translation and cross-calibration hypotheses: (i) to evaluate the BL translation within a given unit, we updated BL n at each n for all n ≥ 1 and calibrated the respective unit's Timeline 2 images acquired at each n and evaluated accuracies with respect to the tolerance (i.e., within-unit evaluation), referred to as continual BL translation; (ii) we performed cross-calibration at each n ≥ 0 for H 1 and H 2 using their Timeline 2 images at n and made comparisons with the tolerance and the continual within-unit BL translation; and (iii) as an additional means for the cross-calibration comparison, calibration data acquired from H 1 was applied to the Timeline 2 images acquired with H 2 and vice versa at each n for all n ≥ 0 without modification (i.e., switching BL 0 calibration datasets referenced as Cal-switch), in contrast with the cross-calibration evaluated above.

3.A. Decision interval cusum monitoring
To establish the serial monitoring, chart constants were estimated numerically as a function of acquisition technique and height for each unit. Figure 3 shows examples of the adjustment LUTs with the k 4 influence removed for the H 1 and H 3 units. The modified k was determined by adding the respective adjustment from the LUT to k 4 . The adjusted k detects ±4 PG shift in the calibration at a given height using the Timeline 1 series (i.e., 4 cm phantoms) as the reference (all acquisition techniques). Data for H 2 are not shown due to its similarity with H 1 data. The chart constant adjustments were modeled as a function of h with a fifth degree polynomial. The plots show the k adjustments and h increase in tandem indicating that the calibration accuracy has greater uncertainty as h decreases. This greater uncertainty is due to the separation between the adipose and glandular calibration curves, which decreases as h decreases. This effect is illustrated in Table II by showing: (i) various LRE shifts (absolute value) for a 50/50 composition (W/Ag at 27 kV) required at specific heights to cause a 4 PG calibration shift at each of the specified heights; and (ii) the calibration errors that this shift induces at the nonspecified heights. As a specific example, the italicized row shows that a 0.0383 LRE shift at 5 cm translates to these errors: (i) 4 PG shift at 5 cm as expected, (ii) 5.71 PG shift at 3 cm, and (iii) 3.27 PG shift at 7 cm. Similarly, these plots show that smaller variations in the LRE are required to induce a 4 PG shift as kV increases for a given height and target/filter combination, as the calibration curves also become less separated.
FIG. 3. Sample points (dots) were generated numerically using measured values at time zero and represent relative k adjustments with the influence of k 4 removed. These were derived from the baseline dataset for each target/filter and kV combination. The fitted fifth degree polynomials (solid curves) represent the relative adjustments for the chart constant as a function of height (h). The adjustment is determined from the appropriate curve at the respective time (i.e., for a given target/filter and kV) given h ≠ 4 cm, which is then added to k 4 for the DI monitoring and is required to detect a ±4 PG shift.
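The Table II example quoted above (a 0.0383 LRE shift producing 4, 5.71, and 3.27 PG errors at 5, 3, and 7 cm) can be checked with a few lines of arithmetic. The sketch assumes the linear relationship between LRE deviation and PG error stated in Sec. 2.F, so that the PG error equals 100 × shift / (curve separation). The implied separations are derived quantities under that assumption, not values reported in the paper, and the function names are hypothetical.

```python
def induced_pg_error(lre_shift, separation):
    """PG error caused by an LRE shift, assuming linear calibration
    between adipose and glandular curves a distance `separation` apart."""
    return 100.0 * lre_shift / separation


def implied_separation(lre_shift, pg_error):
    """Curve separation implied by an LRE shift and its PG error."""
    return 100.0 * lre_shift / pg_error


shift = 0.0383                           # LRE shift quoted for W/Ag at 27 kV
errors = {3: 5.71, 5: 4.0, 7: 3.27}      # PG errors quoted at each height (cm)
sep = {h: implied_separation(shift, e) for h, e in errors.items()}
# sep[5] is 0.9575; sep[3] is about 0.671 and sep[7] about 1.171 LRE units
```

The implied separations grow with height, which is the stated reason why a fixed LRE shift produces larger calibration errors for thinner phantoms.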
To further illustrate the importance of using the appropriate k and BL updating when required, the DI was first applied over the entire Timeline 1 interval without resetting or updating BL 0 using k 4 (i.e., the k that applies specifically to Timeline 1). OC behavior was detected in H 1 for most acquisition techniques initiating at 110 days (e.g., following the schema in Fig. 2, OC behavior was detected at 119 days corresponding to the second consecutive Cusum trigger at n. The reset takes place at n − 1 corresponding to 110 days, which was the first Cusum trigger for this OC event) with these exceptions: W/Ag, 31 kV at 18 days; W/Ag, 32 kV at 10 days; and W/Rh, 27 kV at 91 days. In contrast, when using k 5 (i.e., the k that applies to Timeline 2), OC behavior was detected in H 1 only for the W/Ag, 27 and 28 kV techniques initiating at 110 days. Figure 4  In comparison when using k 5 , OC behavior was detected once for four consecutive time points between 110 and 147 days (meeting the two-consecutive event criterion) and deviated past the tolerance randomly at four time points not meeting the two-consecutive event criterion at 216, 259, 357, and 427 days. The contrast between the 4 and 5 cm samples shows that at least part of the acquisition space can be OC while other parts are IC over extended periods. The corresponding DI plots with BL updating-resetting (i.e., adjusting m 0 and k 4 , bringing BL 0 forward and resetting the Cusum argument to zero) are also shown in Fig. 4 (top-right). To illustrate the influence of serial BL updating on k, we compare the adjusted LUTs at two time points in Fig. 5. This shows the LUTs for H 1 (W/Rh at 27 kV) at time zero and at 110 days (i.e., time of the update for 4 cm to monitor the 5 cm series). The curves diverge as h increases. In summary, adjusting k for h ≠ 4 cm to monitor Timeline 1 images (h = 4 cm) establishes virtual timelines for a given height without performing additional imaging.

3.B. Serial calibration accuracy
The calibration accuracy evaluation with BL updating (within a given unit) due to OC behavior is not possible using Timeline 2 images at present. The required updating for H 1 with h = 5 cm was detected prior to the initiation of its Timeline 2 acquisitions, and the other units were IC over the interval. We used two alternative examples to show the validity of the updating approach. First, to illustrate the merits of updating, we used Timeline 1 images from H 1 and constructed an artificial problem by not resetting the Cusum for the first two OC events. As indicated in Fig. 4 (top-right), OC behavior was detected at 119 days (first reset event detected) and at 385 days (second reset event detected). We show the Cusum trajectory from the first event through 357 days (the time point before the second event) without resetting and adjusting in Fig. 6 (top). A Cusum driven by a constant shift in m n will exhibit a linear form afterward. The linear agreement of the fitted line (linear correlation = 0.95) with the observed sustained drift indicates the OC behavior was induced by an approximately constant jump in m n detected at about 119 days (i.e., two consecutive points beyond the tolerance detected at n corresponding with 119 days). We updated BL 0 at 110 days and calibrated m n for all n up to 357 days. The bottom plot in Fig. 6 shows the corresponding updated and nonupdated calibration accuracies for the same time period. The updated calibration values are within tolerance (dashed lines), whereas the majority (74%) of the calibration values derived from BL 0 are below the lower tolerance. Second, Fig. 7 shows the virtual monitoring of 3 cm 50/50 composition phantom images for H 1 (W/Ag at 27 kV), which shows OC behavior over most of the interval. The triangles on the x-axis mark the time points (days) of the 3 cm 50/50 composition phantom image acquisitions, which were taken at irregular time points for other purposes. The system was IC for the first time point and OC for the two later time points.
The calibration accuracy is shown in Table III, which indicates improved accuracy due to updating. The within-unit calibration accuracies for Timeline 2 are summarized in Table IV for all three units. Because the accuracies were similar, samples corresponding to every other kV are provided in Table IV, except where noted. The calibration results are presented as serial averages for the respective compositions and are all within tolerance.
The continual BL translation and cross-calibration accuracy summaries for the H 1 and H 2 units are provided in Table V. For either unit, the accuracies from cross-calibration derived from the other unit's Timeline 1 images are similar to those obtained from the within-unit calibration using continual BL translation. Because the findings across these two units for all compositions were similar, we show without loss of generality, (i) the 20/80 and 40/60 compositions (i.e., less glandular content) for H 1 in Fig. 8 and (ii) the 60/40 and 80/20 compositions (greater glandular content) for H 2 in Fig. 9 (W/Ag at 27 kV). These plots illustrate (i) within-unit calibration for continual BL translation (diamonds), (ii) cross-calibration (plus signs), and (iii) calibration obtained by switching the respective BL 0 datasets (asterisks). In contrast, the accuracy obtained from switching the BL datasets (asterisks) is for the most part beyond tolerance. The continual BL translation findings for H 3 are summarized in Table VI (example plots not shown), similarly demonstrating the translation principle. Because the accuracies were similar, samples corresponding to every other kV are provided in Table VI. These findings indicate that the special case BL translation and the more general technique of cross-calibration as developed in Sec. 2.G. are valid approximations.
FIG. 7. DI cumulative sum monitoring of a 3 cm 50/50 composition. This shows a virtual 3 cm DI timeline for H 1 (W/Ag at 27 kV) derived from modifying the chart constant. The triangles on the x-axis mark the time points of the irregularly sampled LREs for a 3 cm 50/50 composition. The downward arrows indicate DI triggers. Thus, the two samples to the right were acquired when the system was out of control. Calibration results for these three samples are shown in Table III.

DISCUSSION
This paper introduced several novel concepts advancing both Cusum and calibration theory. The relative shift monitoring and LUT generalization represent key advancements in both areas. We modified the Cusum to detect a relative shift from the standard. Other researchers have modified the Cusum argument as well by normalizing it to the estimated deviation. 40 Our modification was for a secondary application of maintaining prospective calibration accuracy. The relative shift monitoring provided a foundation for developing a method of determining a suitable chart constant with numerical analyses tied to the calibration accuracy. We showed that calibration accuracy is a function of the acquisition technique as well as the compressed breast thickness (i.e., phantom height in this paper) with respect to a given LRE standard. The calibration accuracy is generally more sensitive to LRE shifts for smaller compressed breast thicknesses and for higher energy beams. This is expected because the separation between the adipose and glandular calibration curves decreases as thickness decreases and the energy of the beam increases. The chart constant was adjusted to achieve uniform serial calibration accuracy across the acquisition techniques and heights. Effectively in the prospective application, each patient will have their own monitoring timeline based on the mammography unit's compressed breast thickness reading. A system to monitor and maintain serial calibration accuracy was presented and evaluated with three FFDM units. Baseline translation requires one serial image for each acquisition technique per FFDM unit. These images are from the Timeline 1 serial monitoring dataset. Therefore, additional imaging is not required for BL translation. Generally, the serial calibration accuracy was within the ±4 PG tolerance as demonstrated when calibrating Timeline 2 images for all units.
This BL translation principle was validated with the H 1 unit using a segment of its Timeline 1 images, spanning approximately 210 days. In this example, BL translation corrected the calibrated timeline segment by moving the respective accuracy into the IC region. To further demonstrate the BL translation, we set up an artificial application. This application included updating the BL continually at each time point, n, and calibrating the Timeline 2 images at each respective n. The resulting calibration accuracies were similar to those derived from the standard application for a given unit, indicating equivalency between the many BL datasets. The BL translation is a special case of the more general cross-calibration principle, which shows how to convert the BL dataset of one unit (unit 1) to the BL dataset of another unit (unit 2) at arbitrary n by using the Timeline 1 images acquired at n with unit 2. The cross-calibration application was validated by showing that its accuracy was similar to that of the continual BL updating (within-unit) and was within the ±4 PG tolerance. As a counterexample, we switched the BL calibration datasets (Cal-switch) for H 1 and H 2 and applied the calibration without adjustments. As expected, the accuracy was outside of the tolerance region for the majority of samples because, in general, the BL n from unit 1 is not interchangeable with the BL n from unit 2 or vice versa.
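The translation and cross-calibration ideas can be sketched as follows for a single acquisition technique. The sketch assumes that a BL dataset reduces to an adipose and a glandular LRE value per height, that the 50/50 reference is their equal-weight combination, and that one additive offset applies to every height. The data layout and function names are hypothetical simplifications and do not reproduce the derivation in Sec. 2.G.

```python
def reference_5050(baseline, height_cm):
    """Equal-weight (1/2, 1/2) combination of adipose and glandular LREs."""
    adipose_lre, glandular_lre = baseline[height_cm]
    return 0.5 * (adipose_lre + glandular_lre)


def translate_baseline(baseline, monitor_height, monitored_lre):
    """Shift a baseline dataset by the offset seen in one 50/50 serial image.

    baseline       -- {height_cm: (adipose_lre, glandular_lre)} (hypothetical layout)
    monitor_height -- height of the serially imaged 50/50 phantom
    monitored_lre  -- LRE measured from that phantom at time point n

    For within-unit BL translation, baseline and monitored_lre come from the
    same unit. For cross-calibration, baseline comes from unit 1 and
    monitored_lre from unit 2.
    """
    offset = monitored_lre - reference_5050(baseline, monitor_height)
    # Assumption: the same constant offset applies to every height.
    return {
        height: (adipose + offset, glandular + offset)
        for height, (adipose, glandular) in baseline.items()
    }


if __name__ == "__main__":
    bl_0 = {4: (10.0, 14.0), 5: (9.0, 12.0)}  # made-up LRE values
    bl_n = translate_baseline(bl_0, monitor_height=4, monitored_lre=12.5)
    print(bl_n)
```

The Cal-switch counterexample corresponds to using the unit 1 baseline on unit 2 images without calling `translate_baseline` at all.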
There are several aspects of this work that warrant further comment. An automated breast density measure suitable for [...] units. We also demonstrated that the calibration approach developed with indirect x-ray conversion FFDM applies to direct x-ray conversion units as well with refinements, also addressing intertechnology generalization. There are also several qualifications with our findings. The reported accuracies are best considered ideal because phantoms with precise heights were used in the evaluation. The BL updating illustrations for the most part pertained to phantom heights of 4 and 5 cm. However, the limited number of 3 cm samples also indicates that the approach is valid over the height range. Uncertainty caused by variation in compressed breast thickness did not affect our analysis, although it will be present when calibrating mammograms.
TABLE V. Baseline calibration dataset translation and cross-calibration evaluation. For each unit (H 1 and H 2), the baseline dataset BL 0 (i.e., time zero) was translated forward in time and updated to BL n with Timeline 1 images on a continual biweekly basis from day 0 to approximately 482 days. BL n was then used to calibrate the corresponding Timeline 2 images at each n (within-unit). For the cross-calibration experiment: (1) the BL 0 dataset for H 1 was adjusted with Timeline 1 images acquired with H 2 and then used to calibrate the Timeline 2 images acquired with H 2 (H 2 cross-cal); (2) the BL 0 dataset for H 2 was adjusted with Timeline 1 images acquired with H 1 and used to calibrate the Timeline 2 images acquired with H 1 (H 1 cross-cal); and (3) for comparison, the BL 0 from H 1 was used to calibrate Timeline 2 images acquired with H 2 and vice versa (Cal-switch). Serial averages are provided in this table. The respective standard deviation of the mean distribution is provided parenthetically. The number of serial samples for each composition for each unit is provided: H 1: n 1 = 16; and H 2: [...]
We are developing a compressed breast thickness correction to compensate for uncertainty introduced by the compression paddle for each unit based on our related work. 3 Generally, the compressed breast thickness correction is a function of the pixel coordinates and therefore defines a surface. The method for incorporating a correction surface into our serial monitoring is under development. As noted previously, 13 the x-ray attenuation of the adipose BTE material does not match that of the adipose breast tissue precisely. This attenuation artifact was not in effect in this study, but it would be encountered when calibrating mammograms using BTE phantoms. Although we have addressed this attenuation artifact previously, 13 we believe more analysis is required to find an approximate solution. Both the translation and cross-calibration techniques were derived by assuming that BL data acquired at different times differ by a constant offset, observable in the serial LRE measurements from Timeline 1, rather than by a drift that is a function of time. As observed previously, DI monitoring exhibited nonlinear behavior (i.e., drift was a function of time) prior to x-ray tube failure when analyzing data from another FFDM technology. 17 This artifact is mitigated by our two-consecutive-event criterion for defining OC behavior that requires intervention. As noted, 17 a constant LRE shift gives rise to a linear DI trajectory. To distinguish time-dependent drift from a constant shift, the OC behavior can be evaluated for linearity as part of the monitoring process.
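One way to evaluate a DI trajectory for linearity is an ordinary least-squares line fit with its coefficient of determination, as in the minimal sketch below. The choice of R² as the linearity measure, the function name, and the sample values are assumptions made for illustration; any decision threshold would need to be set from data.

```python
def linear_fit_r2(times, values):
    """Least-squares slope and R^2 of values against times.

    times  -- acquisition days of the out-of-control DI samples
    values -- corresponding DI (cumulative sum) values
    """
    n = len(times)
    if n < 3 or n != len(values):
        raise ValueError("need at least three paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_vv = sum((v - mean_v) ** 2 for v in values)
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    if s_tt == 0 or s_vv == 0:
        raise ValueError("times and values must each vary")
    slope = s_tv / s_tt
    r_squared = (s_tv * s_tv) / (s_tt * s_vv)
    return slope, r_squared


if __name__ == "__main__":
    days = [0, 14, 28, 42]              # biweekly sampling
    steady = [1.0, 3.0, 5.0, 7.0]       # made-up linear DI growth
    curved = [1.0, 2.0, 5.0, 10.0]      # made-up accelerating DI growth
    print(linear_fit_r2(days, steady))
    print(linear_fit_r2(days, curved))
```

An R² near 1 is consistent with a constant LRE shift, for which the constant-offset assumption behind BL translation holds. A clearly lower value suggests drift that depends on time.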

CONCLUSION
We presented a methodology to maintain serial calibration accuracy using phantom images acquired with direct x-ray conversion FFDM. Monitoring techniques developed earlier were extended by generalizing the DI chart constant with LUTs, permitting the surveillance of many processes simultaneously without requiring additional imaging. The approach relies on acquiring one phantom image per acquisition technique for each FFDM unit biweekly, translating into approximately 20 minutes of effort per unit. We evaluated a novel method of updating calibration datasets using the images acquired for the serial monitoring (i.e., with no additional effort). This updating translates the current BL dataset forward in time with a minimal amount of imaging effort. Baseline translation is a special case of a more general cross-calibration application, which may be useful for applying calibration at facilities with similar units, eliminating the necessity of acquiring full BL datasets. Further evaluation of the serial monitoring and BL update/translation theory across other sites and imaging technologies is required to substantiate its generality. Tomosynthesis (TS) is an emerging imaging technology 42 that may supplant, or at least exist in parallel with, conventional two-dimensional (2D) FFDM. Because raw TS data consist of multiple 2D projections, 43 we hypothesize that our calibration methodology will translate. Most importantly, our techniques will require validation using patient images with breast cancer status as the endpoint comparison, which is the next planned investigation.

ACKNOWLEDGMENT
This work was supported by National Institutes of Health Grant No. R01CA166269. The authors have no conflicts of interest to disclose.

APPENDIX: CHART CONSTANT ESTIMATES
The 50/50 reference LREs for 4 cm are established from the BL 0 dataset using a linear combination with equal weights (1/2) because these are not included in the BL datasets. We include a height dependency in $m_n$ [i.e., $m_n(h)$]. To estimate $k$ numerically for $h = 4$ cm, we use Eq. (1) and shift $m_n(4)$ incrementally, giving $m_n(4) + \alpha_j$, where $j$ is the increment index and $\alpha_j$ is the increment at $j$. We estimate numerically the $\mathrm{CAL}[m_n(4) + \alpha_j]$ that gives $(50 \pm 4)$ PG. Due to symmetry, either 46 PG or 54 PG can be used equivalently for the $k$ solution. The numerical solution for the specific $\alpha_j = d_4$ is then used to express $k_4 = d_4/|m_n(4)|$ for Timeline 1. The same approach was used to determine $k$ for any height (i.e., $h \neq 4$ cm), which is then used with the Timeline 1 monitoring, creating virtual timelines corresponding to any $h$ as required. Using the same approach for $h \neq 4$ cm gives $d_h$ such that $\mathrm{CAL}[m_n(h) + d_h] = \mathrm{PG}_{\mathrm{ideal}} \pm 4$ PG for all $h$. The modified $k$ used with Timeline 1 (i.e., the 4 cm timeline) is then given by $k_h = d_h/|m_n(4)|$, producing a virtual timeline for monitoring samples (images) for a given height referenced to the preset tolerance. We present LUTs for the $k_h$ determination as well. When applying this technique to virtual timelines (i.e., $h \neq 4$ cm), the required $m_n$ is also estimated from the appropriate calibration curves as a linear combination, if not imaged or measured explicitly.
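The numerical search described in this appendix can be sketched as follows. The calibration function here is a hypothetical two-point linear interpolation between adipose and glandular LREs, standing in for the actual calibration of Eq. (1). The step size, function names, and curve values are likewise assumptions for illustration.

```python
def cal_percent_glandular(lre, adipose_lre, glandular_lre):
    """Hypothetical stand-in for CAL: linear interpolation to PG (0 to 100)."""
    return 100.0 * (lre - adipose_lre) / (glandular_lre - adipose_lre)


def tolerance_shift(adipose_lre, glandular_lre, tolerance_pg=4.0,
                    step=1e-4, max_steps=1_000_000):
    """Smallest incremental shift d_h that moves CAL by the tolerance.

    The 50/50 reference m_n(h) is the equal-weight combination of the two
    curves. It is shifted by alpha_j = j * step until the calibrated value
    departs from the ideal by at least tolerance_pg.
    """
    reference = 0.5 * (adipose_lre + glandular_lre)
    ideal_pg = cal_percent_glandular(reference, adipose_lre, glandular_lre)
    for j in range(1, max_steps + 1):
        alpha = j * step
        shifted_pg = cal_percent_glandular(reference + alpha,
                                           adipose_lre, glandular_lre)
        if abs(shifted_pg - ideal_pg) >= tolerance_pg:
            return alpha
    raise ValueError("tolerance not reached within max_steps")


def chart_constants(curves, monitor_height=4):
    """Build a LUT {height: k_h} with k_h = d_h / |m_n(monitor_height)|.

    curves -- {height_cm: (adipose_lre, glandular_lre)} (hypothetical layout)
    """
    adipose_4, glandular_4 = curves[monitor_height]
    m_4 = 0.5 * (adipose_4 + glandular_4)
    return {
        height: tolerance_shift(adipose, glandular) / abs(m_4)
        for height, (adipose, glandular) in curves.items()
    }


if __name__ == "__main__":
    # Made-up curves whose separation shrinks with decreasing height.
    curves = {3: (10.0, 11.0), 4: (10.0, 12.0), 5: (10.0, 13.0)}
    print(chart_constants(curves))
```

With these made-up curves the smaller height yields the smaller chart constant, in line with the observation that calibration accuracy is more sensitive to LRE shifts at smaller thicknesses.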