Medical physics 3.0 versus 1.0: A case study in digital radiography quality control

Abstract

Purpose: This study illustrates how a renewed approach to medical physics, Medical Physics 3.0 (MP3.0), can identify performance decrements of digital radiography (DR) systems when conventional Medical Physics 1.0 (MP1.0) methods fail.

Methods: MP1.0 tests, comprising traditional annual tests plus the manufacturer's automated Quality Assurance Procedures (QAP), were performed on a DR system before and after a radiologist's image quality (IQ) complaint and were repeated after service intervention. Further analysis was conducted using nontraditional MP3.0 tests, including longitudinal review of QAP results from a 15-yr database, exposure-dependent signal-to-noise ratio squared (SNR²), clinical IQ, and correlation with the institutional service database. Clinical images were analyzed in terms of IQ metrics by the Duke University Clinical Imaging Physics Group using previously validated software.

Results: Traditional metrics did not indicate discrepant system performance at any time. QAP reported a decrease in contrast-to-noise ratio (CNR) after detector replacement, but the values remained above the manufacturer's action limit. Clinical images showed increased lung noise (Ln), mediastinum noise (Mn), and subdiaphragm-lung contrast (SLc), and decreased lung gray level (Lgl) following detector replacement. After detector recalibration, QAP CNR improved but did not return to previous levels. Lgl and SLc no longer differed significantly from values before detector recalibration; however, Ln and Mn remained significantly different. Exposure-dependent SNR² documented the detector operating within acceptable limits 9 yr previously but subsequently becoming miscalibrated sometime before the four most recent annual tests. Service records revealed a catastrophic failure, 11 yr prior, of the computer containing the original detector calibration. It is likely that an incorrect calibration backup file was uploaded at that time.

Conclusions: MP1.0 tests failed to detect substandard system performance, but MP3.0 methods determined the root cause of the problem. MP3.0 exploits the wealth of available data through more sensitive performance indicators. Data analytics are powerful tools whose proper application could facilitate early intervention in degraded system performance.

1 | INTRODUCTION

Responsibility for quality control (QC) of imaging equipment has traditionally rested with the medical physicist.1 QC tests are typically performed on an incidental basis during acceptance, commissioning, annual inspections, troubleshooting, or performance verifications after service, and are part of an overall quality assurance program.2-4 The test procedures and pass/fail criteria may come from federal, state, or local regulations, accrediting or professional organizations, adaptations from the scientific literature, or the equipment manufacturers themselves.5-7 QC tests are snapshots of system performance in time, and with rare exceptions, there are no firm requirements to compare performance with historical results or with other systems, or to establish trends. A measurement within acceptance criteria is considered to "pass." Once the system performance level is established, monitoring of its performance is not required until the next inspection or service event.5-7 This pattern of QC support is what could be called "Medical Physics 1.0" (MP1.0), the current standard of practice.8-11 Clinical medical physicists exceed this basic level of service as their time, resources, and individual preferences allow, but this description provides a reasonable minimum expectation for physicist testing. This level of QC support is also consistent with the description of "Level 1 services" defined by the American Association of Physicists in Medicine (AAPM) Diagnostic Work and Workforce Study Subcommittee's Levels of Service model in Report 301.12

The Information Age has afforded medical physicists advanced analytical capabilities using both the imaging systems themselves and the computers that they employ to acquire and analyze test data.
These capabilities allow for a new level of sophistication, efficiency, and sensitivity in QC of imaging systems that could be associated with MP3.0. MP3.0 is a vision for transitioning to value- and evidence-based medicine and aims to expand clinical physics beyond the traditional insular models of testing that could be regarded as MP1.0. MP3.0 is scientifically informed by current findings and methods, clinically relevant to the operational practice, and pragmatic in its meaningful and efficient use of resources. Furthermore, MP3.0 strives for consistency of quality in addition to compliance, team-based clinical operation models, and retrospective evaluation of clinical performance. This type of QC support is consistent with "Level 3 services" as defined in AAPM Report 301.12 Whereas MP1.0 analysis considers system performance in temporal isolation, MP3.0 may use sophisticated informatics resources to analyze temporal system performance characteristics. As the medical physicist collects and analyzes historical QC results and establishes trends, the MP3.0 framework exploits the wealth of data through the use of more sensitive performance indicators. As a result, the interval to detection of substandard performance can be decreased.
Herein, a clinical case is described to illustrate these two different QC paradigms. An image quality complaint from a radiologist (Fig. 1) called for medical physics attention to this case, and root-cause analysis was subsequently incorporated into an ongoing institutionally approved retrospective quality improvement project. The specific complaint was that grid lines were very prominent on the posteroanterior (PA) view of the radiograph (Fig. 2). Upon inspection, the exaggerated grid lines were verified; however, white artifacts along the skin lines and cortical bones were also visible on the PA and lateral (LAT) views. These image processing artifacts along regions of rapidly changing density are also known as rebound or "Uberschwinger" artifacts.16-18 Prior clinical experience with these artifacts suggested that improper detector gain and offset calibration was a likely cause of both the rebound artifacts and the prominent grid lines.
A proper detector gain and offset calibration can reduce the appearance of both rebound artifacts and grid lines in clinical images. In the DR system, the grid is located in a fixed position relative to the detector. At a given source-to-image distance (SID), the grid lines are projected onto the detector in exactly the same location, apart from any slight deviation from perfect alignment of the x-ray tube with the grid/image receptor. The projection of the grid lines imposes a periodic nonuniformity in exposure across the detector. If the gain and offset calibration is performed by means of a flat-field acquisition with the grid in place, this nonuniformity tends to be corrected.19,20 If the calibration is not performed properly, the digital image processing algorithm can aggravate the periodic nonuniformity, which then manifests itself to the radiologist as "prominent grid lines."19,20 A service call was made for recalibration of the detector, and the system was removed from clinical use. The detector was recalibrated for gain and offset, and the manufacturer's Quality Assurance Procedures (QAP) test was performed. The DR unit passed the QAP test and was returned to clinical use. Afterward, clinical images obtained using the system no longer exhibited excessive grid lines or rebound artifacts (Fig. 3).
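To make the mechanism concrete, the following is a minimal Python sketch of two-point (offset and gain) flat-field calibration and correction, assuming stacks of dark and flood frames as NumPy arrays; the function names and averaging scheme are illustrative assumptions, not the manufacturer's calibration procedure.

```python
import numpy as np

def calibrate_gain_offset(dark_frames, flood_frames):
    """Derive per-pixel offset and gain maps from dark (no exposure) and
    flood (uniform exposure, grid in place) acquisitions."""
    offset = np.mean(dark_frames, axis=0)           # fixed-pattern offset
    flood = np.mean(flood_frames, axis=0) - offset  # offset-corrected flood
    gain = flood / np.mean(flood)                   # normalized per-pixel gain
    return offset, gain

def correct(raw, offset, gain):
    """Apply the two-point correction to a raw frame. Pixels shadowed by
    the grid septa have lower gain, so dividing by the gain map flattens
    the periodic grid pattern, provided the grid occupies the same
    position as it did during the flood calibration."""
    return (raw - offset) / gain
```

If the stored gain map does not match the installed detector and grid, for example because an incorrect calibration backup file was restored, this division imprints rather than removes the periodic pattern, consistent with the prominent grid lines seen in this case.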
Overall, the corrective action was successful, but why did routine QC measurements not warn of the problem sooner? Apparently, routine QC measurements were either not designed or not optimized to detect the cause of these artifacts. Additionally, how long had the system been producing substandard images, and could other measurements have been more prognosticative?

2 | MATERIALS AND METHODS

To address these questions, root-cause analysis was initiated using four advanced methods: inspection of a database containing QAP test results, exploration of clinical image quality metrics, analysis of exposure-dependent signal-to-noise ratio squared (SNR²) data, and queries of the institutional service events record database.

The weekly QAP consists of the acquisition of two uniform images and an image of the Image Quality Signature Test (IQST) phantom (Fig. 4).21 The two uniform images are analyzed to determine artifacts, local and global brightness nonuniformity, and SNR nonuniformity. The IQST phantom contains inserts for measuring spatial modulation transfer function (MTF), dynamic range linearity and accuracy, resolution nonuniformity, electronic and correlated noise, and contrast-to-noise ratio (CNR), which proved to be of particular value in this case.

2.A | QAP database
The QAP analysis software automatically calculates CNR22 for three different contrast levels in the "for-processing" (aka "raw, ranged" or "unprocessed") image of the IQST phantom (Fig. 4). The calculations are made using three pairs of rectangular and square regions of interest (ROIs) located on the left side of the central portion of the IQST image (see Fig. 5). The difference in the mean gray level between each rectangle and its corresponding square is defined as the contrast. Each square ROI is also used to calculate the value for noise. CNR is reported as CNR1, CNR2, and CNR3 from low to high contrast level, respectively.a MD Anderson uses a custom software program, developed in-house, to retrieve the vendor-generated QAP test results from each machine, parse the files, and store the results in a database. A website displays the long-term test results for review.

FIG. 1. Image quality report documenting the radiologist's complaint. An application integrated into the PACS viewer sends a formatted message via email to a predefined distribution list that includes the medical physicist. All fields are populated automatically except the "issue," which can be selected from a pull-down menu or entered as free text, and four available lines of free text for further description of the problem. The contact name and phone number can be overridden. The reports are archived and used to track action on complaints.
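Based on the description above, the following is a minimal sketch of the CNR calculation; the ROI coordinates, helper names, and the use of the standard deviation as the noise estimate are illustrative assumptions rather than the vendor's exact implementation.

```python
import numpy as np

def roi_stats(img, roi):
    """Mean and standard deviation within a rectangular ROI given as
    (row, col, height, width)."""
    r0, c0, h, w = roi
    patch = img[r0:r0 + h, c0:c0 + w]
    return float(np.mean(patch)), float(np.std(patch))

def qap_cnr(img, roi_pairs):
    """For each (rectangle, square) ROI pair: contrast is the difference
    of the mean gray levels, and noise is taken from the square ROI.
    Returns [CNR1, CNR2, CNR3] from low to high contrast level."""
    cnrs = []
    for rect, square in roi_pairs:
        rect_mean, _ = roi_stats(img, rect)
        sq_mean, sq_noise = roi_stats(img, square)
        cnrs.append(abs(rect_mean - sq_mean) / sq_noise)
    return cnrs
```

Because the calculation is applied to the "for-processing" image, a gain and offset miscalibration that alters raw signal or noise levels propagates directly into CNR1 through CNR3, which is why these metrics proved informative here.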

2.B | Exposure-dependent SNR²
Exposure-dependent SNR² measurements, which are analogous to the noise-equivalent quanta (NEQ) of an image, provide criteria for analyzing the performance of digital flat-panel imaging systems.23 Gain and offset calibration of the detector has been shown to reduce the variation in exposure-dependent SNR² performance among DR systems. Because these measurements are valuable for identifying abnormal detector performance, the next step in the root-cause analysis was to compare the exposure-dependent SNR² measurements from the Revolution XQi system with established confidence limits.23 MD Anderson routinely acquires SNR² as a function of exposure as part of DR annual testing, so these data were available from annual reports.
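As an illustration of this analysis, the sketch below computes SNR² from uniform images acquired at several exposure levels and flags points that fall outside confidence limits; the helper names and limit arrays are assumptions for illustration, with the actual procedure defined in reference 23.

```python
import numpy as np

def exposure_dependent_snr2(means, stds, exposures):
    """SNR^2 = (mean / std)^2 for a uniform ROI at each exposure level.
    For a well-calibrated, quantum-limited detector, SNR^2 increases
    approximately linearly with exposure."""
    snr2 = (np.asarray(means) / np.asarray(stds)) ** 2
    slope, intercept = np.polyfit(exposures, snr2, 1)  # linear trend
    return snr2, slope, intercept

def outside_limits(snr2, lower, upper):
    """Indices of measurements falling outside the per-exposure
    confidence limits (limit values assumed for illustration)."""
    return [i for i, (s, lo, hi) in enumerate(zip(snr2, lower, upper))
            if not lo <= s <= hi]
```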

2.C | Clinical image quality metrics
The next step in the root-cause analysis was to evaluate clinical image quality metrics for individual PA chest radiographs acquired using the Revolution XQi DR unit. Ninety-three images were chosen for analysis as part of a retrospective review. This evaluation was approved by the MD Anderson institutional Quality Improvement Assurance Board. Based on inspection of the weekly QAP measurements, images were chosen to represent different periods of recorded CNR: "normal" CNR (17 images), "lower" CNR (20 images), "higher" CNR (20 images), and a transitional period from normal to lower CNR (36 images). Upon further examination of the service history, it was apparent that these CNR groups corresponded to service events, that is, prior to detector replacement, after detector replacement, after detector recalibration, and the 2-week period immediately after detector replacement, respectively.

FIG. 2. PA chest radiograph that prompted the radiologist's image quality complaint on June 1, 2015, for prominent grid lines. Rebound artifacts, also known as "Uberschwinger" artifacts,16-18 were noted as indicated by the arrow; these are also seen as exaggerated contrast of some cortical bone. Upper right inset: line profile showing exaggerated grid lines in the image; the prominent beat frequency corresponds to the aliased frequency of the grid. Lower right inset: surface plot of one of the rebound artifacts.
The images were anonymized, securely transferred, and analyzed by the Duke University Clinical Imaging Physics Group using previously described24 software that computes ten clinical image quality metrics for each image. An example is shown in Fig. 6.
Descriptive statistics were calculated for image quality metrics for each group of images using the SPSS software program (version 23; IBM Corporation, Armonk, NY). Image quality metrics were compared across groups using a one-way analysis of variance (ANOVA).
The same software program was used to generate control charts.
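The study used SPSS for the statistics and control charts; the sketch below shows an equivalent open-source workflow in Python, pairing a one-way ANOVA across the service-event groups with Shewhart-style control limits derived from a baseline period. All numeric values are placeholders, not the study data.

```python
import numpy as np
from scipy import stats

# Placeholder values for one image quality metric (e.g., lung noise, Ln)
# grouped by service event.
groups = {
    "pre_replacement":    [0.21, 0.19, 0.22, 0.20, 0.21],
    "post_replacement":   [0.31, 0.29, 0.33, 0.30, 0.32],
    "post_recalibration": [0.25, 0.24, 0.26, 0.27, 0.25],
}

# One-way ANOVA across groups.
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Control limits from the baseline ("normal") period: center +/- 3 sigma.
baseline = np.array(groups["pre_replacement"])
center = baseline.mean()
sigma = baseline.std(ddof=1)
ucl, lcl = center + 3 * sigma, center - 3 * sigma
for name, values in groups.items():
    flagged = [x for x in values if not lcl <= x <= ucl]
    if flagged:
        print(f"{name}: out-of-control points {flagged}")
```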

2.D | Institutional service database
MD Anderson's service database was queried to reveal events that may have affected detector performance. This database contains records of service events for all diagnostic imaging modalities and auxiliary equipment at MD Anderson. The database was implemented using a customized commercial software program (EAM, version 10; Infor, New York, NY) and is populated semiautomatically from service calls and service reports. Electronic records in the database date back to 2004. Paper records of events before 2004 are available.
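As an illustration of the kind of query involved, the sketch below uses a hypothetical SQLite schema and asset identifier; the institution's actual database is a commercial Infor EAM product with its own schema.

```python
import sqlite3

# Hypothetical table: service_events(asset_id, event_date, event_type,
# description). The real EAM schema differs.
conn = sqlite3.connect("service_events.db")
rows = conn.execute(
    """
    SELECT event_date, event_type, description
    FROM service_events
    WHERE asset_id = ?
      AND (description LIKE '%detector%'
           OR description LIKE '%calibration%')
    ORDER BY event_date
    """,
    ("DR_REVOLUTION_XQI_01",),  # hypothetical asset identifier
).fetchall()

for event_date, event_type, description in rows:
    print(event_date, event_type, description)
```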

2.E | Integration of performance metrics
The QAP database provided a means to visually assess each of the seventeen QAP metrics before and after the period when the artifact was observed and reported. Clinical images from the same period were analyzed, and the resulting values of the ten clinical image quality metrics for the groups of images (pre-detector replacement, post-detector replacement, post-detector recalibration, and transition) were compared statistically to identify which metrics showed substantial changes concurrent with the event. The exposure-dependent SNR² data were used to broaden the search for a root cause of the detector miscalibration. The service database was the ultimate source of an explanation for the unexpected performance changes.

3 | RESULTS

3.A | QAP data
The QAP results immediately before the image quality complaint and immediately after the detector recalibration are shown in Table 1.

TABLE 1. QAP results before and after detector gain and offset calibration. The contrast-to-noise ratio values are emphasized to indicate that these were the only values to display large differences before and after detector replacement. The abbreviations "LSL" and "USL" stand for "lower system limit" and "upper system limit," respectively.

3.B | Clinical image quality metrics
Descriptive statistics for the image quality metrics are reported in Table 2, which compares the image quality metrics of five of the groups, excluding the pooled weeks of transition from high to low CNR. The results of ANOVA for these paired comparisons are shown in Table 3. No statistically significant differences were observed between groups in lung detail (Ld), rib-lung contrast (RLc), or rib sharpness (Rs).

3.C | Exposure-dependent SNR²

The initial set of exposure-dependent SNR² measurements was made in October 2006, soon after acceptance testing of the unit and calibration of the new detector. Unfortunately, similar measurements with the appropriate phantom were not part of annual testing until January 2010 (Fig. 10). The exposure-dependent SNR² measurements also confirmed improper detector calibration after detector replacement, which was reflected in the 2015 annual performance evaluation.
The 2015 annual test was performed in January, in the middle of the period between the detector replacement and the detector recalibration (see Fig. 8). This test was unique in that it indicated performance well below the lower acceptable limit. Clinical images acquired during the same period showed rebound artifacts and prominent grid lines similar to those shown in Fig. 2 that prompted the radiologist's complaint. In fact, the rebound artifacts and prominent grid lines first appeared in clinical images on August 11, 2014, shortly after the detector replacement and acceptance testing, and persisted throughout the 10-month period until the radiologist's complaint.

3.D | Institutional service database
To discover an event that caused the improper detector calibration, the institutional service events records database for this DR system was queried and cross-referenced with the annual testing data (Fig. 10). The timeline in Fig. 12 summarizes the QC tests and service events. The records revealed a catastrophic failure, 11 years earlier, of the computer containing the original detector calibration; it is likely that an incorrect calibration backup file was uploaded when the system was restored.

FIG. 12. Timeline of QC testing and service events.

4 | DISCUSSION

The first annual test that included measurement of SNR² vs exposure with the appropriate phantom for comparison was in January 2010. If these data had been compared with the limits for exposure-dependent SNR² (Fig. 9), the problem with the calibration file would have been indicated earlier.

The clinical image quality metrics for the chest radiographs in this case provided a new level of sophistication for root-cause analysis. Ten perceptual attributes of patient image quality were calculated on an image-by-image basis, and the lung noise (Ln) metric was found to be closely correlated with changes in detector performance. A control chart was created for Ln demonstrating that, had Ln been monitored in the clinical images, it would have warned of abnormal performance far ahead of the radiologist's complaint. Because the radiologists continued to interpret suboptimal images with exaggerated grid lines and skin-line artifacts for 10 months before reporting them, the data suggest that the automated software is more sensitive to changes in system performance than are human observers, even highly trained radiologists. This is consistent with other findings using these clinical image quality metrics.26

To investigate a possible relationship between the clinical and phantom-based metrics, previously analyzed images were selected if they had been acquired within 2 days of a weekly QAP test, and the Ln data from these images were paired with the CNR1 results from the corresponding QAP tests. A plot of Ln vs CNR1 is shown in Fig. 13. A simple linear regression revealed a Pearson correlation coefficient of −0.85, indicating a strong negative correlation between the two metrics; a sketch of this analysis follows below. It is important to recognize that Ln values depend on anatomic features and inherently vary considerably from patient to patient, as evidenced by the error bars in Fig. 13. The variation in CNR1 under stable conditions has not been established; however, the data in Fig. 8 suggest that ±10% may be a reasonable estimate.
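For illustration, the sketch below reproduces this type of correlation analysis with placeholder values (the study's reported coefficient was −0.85); the variable names and numbers are not the study data.

```python
import numpy as np
from scipy import stats

# Each clinical Ln value is paired with the CNR1 from the QAP test
# acquired within 2 days of the image; placeholder values only.
ln   = np.array([0.30, 0.28, 0.22, 0.21, 0.19, 0.24])
cnr1 = np.array([1.8,  1.9,  2.4,  2.5,  2.7,  2.3])

r, p = stats.pearsonr(ln, cnr1)             # correlation and its p-value
slope, intercept = np.polyfit(cnr1, ln, 1)  # simple linear regression
print(f"Pearson r = {r:.2f} (p = {p:.3f}); Ln ~ {slope:.2f}*CNR1 + {intercept:.2f}")
```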
It is interesting to note that while each of the MP3.0 metrics was applied retrospectively in this case, there were no compelling technical reasons to preclude these methods from being used contemporaneously. Each of these methods has the potential to detect problems before they impact the clinical imaging operation and in advance of a radiologist's image quality complaint. However, development and fielding of these methods requires an investment of time and resources that must be justified by confidence in future benefits of the kind that these results demonstrate.

5 | CONCLUSIONS
In this case, MP1.0 tests failed to detect substandard DR system performance. All of the traditional tests passed, indicating that the system was behaving normally. However, when MP3.0 methods were employed, a problem with the system was not only identified but its root cause was also determined. This investigation also suggests that the clinical image quality metrics are more sensitive to changes in detector performance than are human observers, as the radiologist's image quality complaint was received nearly a year after the problem originated. A total of 421 patient chest examinations were performed on the unit while the problem went undetected. This case demonstrates the necessity of MP3.0. Had these methods been used from the very beginning, awareness of the problem would have come much sooner, leading to intervention before the problem was even noticed in the clinic. Although this case involved a DR system, the principles should extend to other imaging modalities.

CONFLICTS OF INTEREST
The authors have no relevant conflicts of interest to disclose.

NOTE
a Contrast is usually considered to be the difference between the signal behind a feature and its surrounding background, and the noise can also be calculated from the background.22 In the QAP, each square ROI is located on a hole and each rectangular ROI is located on the plate, which achieves the desired signal difference by attenuation, so the "background" is actually the hole in this case.