Performance evaluation of iterative reconstruction algorithms for achieving CT radiation dose reduction — a phantom study

The purpose of this study was to characterize image quality and dose performance with GE CT iterative reconstruction techniques, adaptive statistical iterative reconstruction (ASiR), and model‐based iterative reconstruction (MBIR), over a range of typical to low‐dose intervals using the Catphan 600 and the anthropomorphic Kyoto Kagaku abdomen phantoms. The scope of the project was to quantitatively describe the advantages and limitations of these approaches. The Catphan 600 phantom, supplemented with a fat‐equivalent oval ring, was scanned using a GE Discovery HD750 scanner at 120 kVp, 0.8 s rotation time, and pitch factors of 0.516, 0.984, and 1.375. The mA was selected for each pitch factor to achieve CTDIvol values of 24, 18, 12, 6, 3, 2, and 1 mGy. Images were reconstructed at 2.5 mm thickness with filtered back‐projection (FBP); 20%, 40%, and 70% ASiR; and MBIR. The potential for dose reduction and low‐contrast detectability were evaluated from noise and contrast‐to‐noise ratio (CNR) measurements in the CTP 404 module of the Catphan. Hounsfield units (HUs) of several materials were evaluated from the cylinder inserts in the CTP 404 module, and the modulation transfer function (MTF) was calculated from the air insert. The results were confirmed in the anthropomorphic Kyoto Kagaku abdomen phantom at 6, 3, 2, and 1 mGy. MBIR reduced noise levels five‐fold and increased CNR by a factor of five compared to FBP below 6 mGy CTDIvol, resulting in a substantial improvement in image quality. Compared to ASiR and FBP, HU in images reconstructed with MBIR were consistently lower, and this discrepancy was reversed by higher pitch factors in some materials. MBIR improved the conspicuity of the high‐contrast spatial resolution bar pattern, and MTF quantification confirmed the superior spatial resolution performance of MBIR versus FBP and ASiR at higher dose levels. 
While ASiR and FBP were relatively insensitive to changes in dose and pitch, the spatial resolution for MBIR improved with increasing dose and pitch. Unlike FBP, MBIR and ASiR may have the potential for patient imaging at around 1 mGy CTDIvol. The improved low‐contrast detectability observed with MBIR, especially at low‐dose levels, indicates the potential for considerable dose reduction. PACS number(s): 87.57.Q‐, 87.57.nf, 87.57.C‐, 87.57.cj, 87.57.cf, 87.57.cm, 87.57.uq

blends of the ASiR with the FBP; and MBIR. Figure 2 illustrates images of the Kyoto Kagaku abdomen phantom acquired at 3 mGy CTDIvol and reconstructed using different algorithms. The display field-of-view (DFOV) was 36 cm for the reconstructed images of the Catphan phantom and 30 cm for the reconstructed images of the Kyoto Kagaku abdomen phantom. These default DFOV sizes were selected based on the lateral dimensions of the phantoms from skin to skin. The image matrix was 512 × 512.

B. Image quality metrics
To ensure that the data were sampled from images approximating clinical conditions, all quantitative image analysis was performed directly on the images with default DFOVs, without reconstructing to a small DFOV. To minimize statistical variations in noise for each scan condition, we analyzed images from 10 independent acquisitions at the same location in the phantom. ImageJ software (U.S. National Institutes of Health, Bethesda, Maryland) (9) was used to analyze the phantom images downloaded from PACS (iSite Enterprise, Philips Healthcare, Andover, MA).

B.1 Low-contrast detectability: noise
Noise was calculated as the mean of the standard deviations of three 0.4 cm² regions of interest (ROIs) located in the background material of the Catphan. Using three ROIs increased the total area of data sampling (from 0.4 to 1.2 cm²). For the Kyoto Kagaku abdomen phantom image analysis, one 3.1 cm² ROI was used (Fig. 1).
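The noise definition above can be sketched in a few lines of Python. This is only an illustration: the pixel values are hypothetical, the function name `roi_noise` is ours, and the use of the sample (n − 1) standard deviation is an assumption about how the ROI statistics were computed.

```python
from statistics import mean, stdev


def roi_noise(rois):
    """Return noise (HU) as the mean of the per-ROI standard deviations.

    `rois` is a list of ROIs, each a list of pixel values in HU.
    The sample (n - 1) standard deviation is assumed here.
    """
    return mean(stdev(pixels) for pixels in rois)


# Hypothetical pixel values (HU) from three background ROIs.
background_rois = [
    [88.0, 92.0, 90.0, 94.0, 86.0],
    [91.0, 89.0, 95.0, 85.0, 90.0],
    [87.0, 93.0, 90.0, 92.0, 88.0],
]
print(f"noise = {roi_noise(background_rois):.2f} HU")
```

In the study, this measurement would be repeated for each of the 10 independent acquisitions per scan condition.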

B.2 Low-contrast detectability: contrast-to-noise ratio
To calculate the contrast-to-noise ratio (CNR), an ROI was placed in the center of the 1% contrast, 15 mm diameter supraslice target of the Catphan CTP515 module and three identical ROIs were placed in the immediate background. The CNR was defined as the difference between the mean target signal and the mean background signal, divided by the background standard deviation. For the Kyoto Kagaku abdomen phantom, the target organs used were the spleen, liver, pancreas, and kidneys. Figures 1(c) and (d) show that, in both cases, the sizes of the ROIs in the target and background were identical.
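A minimal sketch of the CNR definition follows. The pixel values are hypothetical, the function name `cnr` is ours, and pooling the background pixels into one list before taking the standard deviation is an assumption; the text does not state how the three background ROIs were combined.

```python
from statistics import mean, stdev


def cnr(target, background):
    """Contrast-to-noise ratio: (mean target - mean background) / SD background.

    `target` and `background` are lists of pixel values in HU.
    """
    return (mean(target) - mean(background)) / stdev(background)


# Hypothetical pixel values (HU) for a low-contrast target and its background.
target_pixels = [100.0, 102.0, 98.0, 100.0]
background_pixels = [90.0, 92.0, 88.0, 90.0]
print(f"CNR = {cnr(target_pixels, background_pixels):.2f}")
```

Because the denominator is the background standard deviation, a reconstruction that lowers noise without changing contrast raises the CNR by the same factor.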

B.3 Changes in image quality with dose and reconstruction techniques
The differences in noise and CNR among the three techniques (namely FBP, ASiR, and MBIR) were calculated. To evaluate changes in image quality with dose, a baseline reference value was defined as the noise or CNR achieved with MBIR at 1 mGy. The 1 mGy CTDIvol value was chosen to represent a very low-dose target level and is consistent with several recent clinical CT imaging studies where CTDIvol levels near or at 1 mGy were evaluated. (10)(11)(12)

B.4 High-contrast spatial resolution
Visual inspection of the CTP528 high resolution module of the Catphan 600 was used to evaluate gross changes in spatial resolution, as typically performed during annual compliance testing. The air target in the CTP404 module was used to measure the modulation transfer function (MTF) of reconstructed images using the edge method as depicted in Fig. 3. (13)(14)(15) Line profiles 14 mm long that started at the center of the target and traversed an equal amount of surrounding material were used to obtain the edge-spread function (ESF). Two of the profiles sampled the pixels at the edge of the target horizontally and two diagonally, to avoid bias in the sampling. The line-spread function (LSF) was calculated as the derivative of the ESF. The magnitude of the fast Fourier transform (FFT) of the LSF was normalized to its zero-frequency value to calculate the MTF. FFT magnitude results from 10 replicate images obtained at the same location in the phantom and four line profiles were added together to smooth the MTF.
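The ESF-to-MTF chain described above can be illustrated with the following sketch. It is not the analysis code used in the study: the edge profile is synthetic, the function name `mtf_from_esf` is ours, the derivative is approximated by a first difference, and a plain discrete Fourier transform stands in for the FFT so the example needs only the standard library. The pixel size follows from the stated 36 cm DFOV and 512 × 512 matrix.

```python
import cmath
import math


def mtf_from_esf(esf, pixel_mm):
    """Estimate the MTF from an edge-spread function (ESF).

    Steps: LSF = first difference of the ESF; MTF = |DFT(LSF)| normalized
    to its zero-frequency value. Returns (frequencies in lp/cm, MTF values).
    """
    lsf = [b - a for a, b in zip(esf, esf[1:])]
    n = len(lsf)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(lsf))
        magnitudes.append(abs(coeff))
    mtf = [m / magnitudes[0] for m in magnitudes]
    # Cycles per mm times 10 gives line pairs per cm.
    freqs_lp_cm = [10.0 * k / (n * pixel_mm) for k in range(n // 2 + 1)]
    return freqs_lp_cm, mtf


# Synthetic blurred edge (arbitrary units), 21 samples across the boundary.
pixel_mm = 360.0 / 512  # 36 cm DFOV over a 512-pixel matrix
esf = [0.5 * (1.0 + math.erf((i - 10) / 2.0)) for i in range(21)]
freqs, mtf = mtf_from_esf(esf, pixel_mm)
for f, m in zip(freqs[:4], mtf[:4]):
    print(f"{f:5.2f} lp/cm  MTF = {m:.3f}")
```

In practice, the magnitudes from the replicate images and line profiles would be combined before normalization to reduce noise in the MTF estimate.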

B.5 Hounsfield unit change
Eight cylindrical inserts in the CTP404 module of the Catphan were used to quantify changes in HU: air, PMP, LDPE, water, polystyrene, acrylic, Delrin, and Teflon. Slight variations from the estimated CT numbers in the phantom manual were expected to result from the use of the fat ring during scan acquisition.

C. Statistical analysis
Noise, CNR, and MTF were summarized using mean, standard deviation (SD), and range. Noise and MTF were transformed to the logarithmic scale prior to statistical modeling; the logarithmic transformation reduces skewness in the data so that they better suit the normality assumption underlying the linear mixed model and ANOVA. For noise, a positive difference means higher noise (worse) and a negative difference means lower noise (better). For CNR, a positive difference means higher CNR (better), while a negative difference means lower CNR (worse). A linear mixed model was used to estimate and compare noise and CNR between algorithms, because it can account for correlations between measurements from the same experimental unit. Just as ANOVA generalizes the two-sample t-test, the linear mixed model generalizes the paired t-test, and its results are interpreted in the same way as those of ANOVA. Dunnett's procedure was used to adjust for multiple pairwise comparisons against the reference level (MBIR at CTDIvol of 1 mGy). ANOVA was used to compare MTF between algorithms by pitch, CTDIvol, and spatial frequency. Pairwise comparisons against FBP, based on ANOVA estimates, were also adjusted using Dunnett's procedure. All tests were two-sided, and adjusted p-values of 0.05 or less were considered statistically significant. Statistical analysis was carried out using SAS version 9 (SAS Institute, Cary, NC). The results of the statistical analysis (e.g., adjusted p-values) are reported alongside the corresponding outcomes in Appendix A.
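To illustrate why a difference on the logarithmic scale is read as a ratio on the raw scale, the sketch below computes the difference of log-scale means and back-transforms it with the exponential. It is a simplification under stated assumptions: it uses plain group means rather than the linear mixed model or ANOVA estimates, it omits Dunnett's adjustment, the function name `ratio_vs_reference` is ours, and the noise values are hypothetical.

```python
import math
from statistics import mean


def ratio_vs_reference(values, reference):
    """Back-transform a log-scale mean difference into a raw-scale ratio.

    exp(mean(log values) - mean(log reference)) equals the ratio of the
    geometric means of the two groups.
    """
    log_diff = (mean(math.log(v) for v in values)
                - mean(math.log(r) for r in reference))
    return math.exp(log_diff)


# Hypothetical noise values (HU) for one algorithm and for the reference level.
algorithm_noise = [20.0, 22.0, 21.0, 19.0]
reference_noise = [10.0, 11.0, 10.5, 9.5]
ratio = ratio_vs_reference(algorithm_noise, reference_noise)
print(f"estimated ratio vs. reference = {ratio:.2f}")
```

A ratio above 1 corresponds to a positive log-scale difference (for noise, worse than the reference), and a ratio below 1 to a negative difference.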

A. Low-contrast detectability: noise and CNR
Noise and CNR analysis with the Catphan 600 clearly demonstrated the advantage of using MBIR, especially at low-dose levels (Fig. 4). Compared to the moderate improvement in CNR observed with ASiR (1.2 times for 20% ASiR, 1.3 times for 40% ASiR, and 1.8 times for 70% ASiR), MBIR resulted in a fivefold increase in CNR at low-dose levels, relative to FBP. Above 5 mGy CTDIvol, the improvements observed with MBIR were within experimental error of those observed with 70% ASiR. Noise reduction with MBIR was constant among the three pitch factors and similar in magnitude to the improvements observed with CNR (Fig. 5(a)). However, the CNR improvement did vary with pitch when MBIR was used: eightfold for 0.984, fivefold for 1.375, and threefold for 0.516 (Fig. 5(b)). Likewise, with the Kyoto Kagaku abdomen phantom, a mean threefold reduction in noise and threefold improvement in CNR were achieved for the four organs investigated (Fig. 6). A similar trend of slightly increased CNR at a 0.984 pitch was observed with the abdomen phantom.

B. Changes in noise and CNR with dose and reconstruction techniques
Estimated noise and CNR differences from a baseline reference level were calculated to quantify changes in image quality with increasing CTDIvol across the reconstruction techniques (Fig. 7). From the estimated noise differences (Fig. 7(a)), it can be observed that 40% ASiR and 70% ASiR at 12 mGy CTDIvol and 20% ASiR and FBP at 18 mGy CTDIvol yielded noise similar to that of MBIR at 1 mGy CTDIvol. On the other hand, from the estimated CNR differences (Fig. 7(b)), the following results are evident: a) MBIR yielded similar CNR at 1, 2, and 3 mGy CTDIvol; b) 70% ASiR reached the baseline CNR at CTDIvol of 6 mGy and it yielded significantly better CNR at higher doses (adjusted p-value = 0.70); c) 20% ASiR and 40% ASiR CNR values at 12 mGy were comparable to MBIR CNR at 1 mGy and did not yield significantly better CNR until CTDIvol reached 24 mGy (adjusted p-value = 0.48 and 1.00, respectively); d) FBP needed to reach 18 mGy CTDIvol to yield CNR similar to that of MBIR at 1 mGy CTDIvol, and it did not yield significantly better CNR even at CTDIvol of 24 mGy.

C. High-contrast spatial resolution
To evaluate spatial resolution, an initial inspection of bar patterns demonstrated an improvement with iterative reconstruction techniques; this observation was supported by line profiles drawn across the 7 lp/cm bar pattern (Fig. 8). The MTF was also computed for the various reconstruction algorithms and dose levels. Figure 9 shows the improved MTF of MBIR at pitch 0.984, with increasing dose and frequency compared to ASiR and FBP. To compare the MTF for MBIR and ASiR to FBP at a spatial frequency of 5 lp/cm, the estimated ratio relative to FBP was calculated and the results are plotted in Fig. 10. Estimated mean ratio higher than 1 means larger MTF than FBP, and lower than 1 means smaller MTF than FBP.
Fig. 10. Summary of estimated ratios between ASiR/MBIR and FBP with respect to MTF by pitch and CTDIvol at 5 lp/cm, plotted as mean ± 95% CI. MTF was transformed to the logarithmic scale prior to ANOVA analysis. Estimated differences on the logarithmic scale were back-transformed as ratio to the raw scale as shown in the table. Dunnett's adjustment was used to control overall type 1 error rate at 5% for each model. Detailed results of the statistical analysis can be found in Appendix A.
HU values in ASiR and FBP images. Table A6 in Appendix A summarizes changes in material HUs with reconstruction algorithm and dose levels. As deviations from the HUs estimated in the Catphan 600 user manual are expected from the use of the fat-equivalent ring, the data were also plotted as the difference in HU from images reconstructed with FBP at 24 mGy (Fig. 11).
The material HU values in MBIR images are on average 10 HU below those of FBP, whereas the material HU values in 40% ASiR images are almost identical to those in the FBP images. Figure 11 also shows that below 5 mGy CTDIvol, the measured material HU values differed greatly from the HU values of the FBP images at the 24 mGy CTDIvol level.

IV. DISCUSSION
Since adaptive statistical iterative reconstruction has become available for clinical patient imaging, the potential for dose reduction with ASiR has been extensively studied in the literature. The reported dose reduction ranges from 30%-60%. In pediatric imaging (ages 1 year to adolescence), 100% ASiR was estimated to reduce dose by 82% compared to FBP in a phantom study. (6) Later on, 40% ASiR was implemented clinically, with 42%-48% dose reductions observed. (6) In CT ACR phantom studies, (6) a dose reduction potential of 25%-29% has been reported, when the vendor-recommended 30% ASiR was applied. In chest exams of an elderly patient population (60 ± 15 years), Leipsic et al. (16) reported a 26% dose reduction when 30% ASiR was applied. The modest reduction in dose previously reported confirms that ASiR was not intended to result in marked dose reduction, but was rather a balanced approach to maintain image quality with reasonable dose savings. (17) The improvement in image quality observed when correctly modeling the noise properties of the image has led to the development of model-based iterative reconstruction techniques that model the X-ray and image production chain. Several publications have reported that MBIR resulted in dose reductions of 50%-75% in abdominal CT, (18) 70%-80% in chest CT, (19,20) and 67%-86% in phantom studies. (21) For the same delivered dose, MBIR increases the SNR and CNR compared to FBP, with a 30% decrease in noise and a 46% increase in CNR reported in abdominal imaging (4) and 70% and 60% decreases in noise reported in paranasal CT (22) and phantom studies, respectively. (23) In agreement with our findings, Husarik et al. (18) reported that, in abdominal liver examinations with CTDIvol of 4.38-23.35 mGy, an increase in CNR (1.5-2.7 CNR for MBIR and 0.16-0.63 CNR for FBP, medium patient, 120 kVp) and an 80% decrease in noise with MBIR compared with FBP were observed. Shuman et al. (24) also reported in a liver study that image background noise with MBIR was significantly lower and CNR was significantly higher compared to FBP and ASiR of the same raw dataset, and hence at the same dose level of clinical liver imaging. Furthermore, in the cervicothoracic region, which suffers from noise and streak artifacts as a result of beam hardening through the shoulders, Katsura et al. (25) reported that MBIR improved both noise and spatial resolution, whereas the high- and low-pass filters used in analytical reconstruction techniques only recovered one or the other.
A phantom study provides the opportunity for performing in-depth evaluations, by adjusting one scan/reconstruction parameter at a time, while other conditions remain unchanged, so that appropriate comparisons are conducted. Such work is especially valuable when a phantom is scanned repeatedly at many different dose levels for assessing the potential for dose reduction. In contrast with other phantom studies, the phantoms used in our work mimicked patient size/shape, the base scan protocol was a routine clinical protocol used for patient abdominal imaging, and the phantom images were reconstructed with clinical parameters. We therefore performed an analysis of image quality under various conditions of image reconstruction and at a number of radiation dose levels. The quantitative image analysis included noise, CNR, material HU, and spatial resolution. Our results show that the largest improvement in noise reduction and contrast with MBIR (threefold to fivefold) occurred at the lowest dose levels, demonstrating the potential feasibility of low-dose patient imaging, where the appropriate dose levels depend on the clinical application as well as on patient size. Conversely, noise, CNR, and material HU for the three ASiR blends are less dependent on dose level and provide for a more modest reduction in dose, 15%-20%.
However, image analysis based on noise and CNR does not completely capture the differences in image texture, which may affect the outcomes of patient diagnosis. This limitation of the study could account for the differences in results presented here compared to published clinical studies. Additionally, the use of clinical scan parameters, the size, shape, and attenuation of the phantoms (a fat ring was added to the Catphan 600 to approximate a medium-size patient, and an anthropomorphic abdomen phantom was also used), the sampling of data directly from the large DFOV, and the additional validation with an anthropomorphic phantom might also account for the discrepancy in results. Overall, this study is an initial step in the systematic evaluation of iterative reconstructions and is limited by the use of phantom data and objective ROIs. Depending on clinical applications, future task-based image quality assessment will be conducted to evaluate overall image quality including, but not limited to, texture characteristics and spatial resolution at various contrast levels.
Only a small number of publications address whether and how iterative reconstruction affects material HU. Using the Catphan 600 phantom at 1 mGy CTDIvol, 120 kVp, and 1.375 pitch, Mieville et al. (21) reported no HU differences for the air and PMP inserts, differences of 3-4 HU for polystyrene and Delrin, and differences of 10 HU for Teflon for MBIR compared to 100% ASiR and FBP. Larger differences have been reported between MBIR and FBP (19-20 HU for Delrin and 31-33 HU for Teflon) using the Catphan 600 and a bone-mimicking ring. (26) In our study, all three ASiR blends behaved similarly to FBP, whereas MBIR reconstructed images had HU that were on average 10 HU lower than the FBP HU at 24 mGy. All these results indicate that MBIR affects HU, especially for materials above 200 HU (Teflon and Delrin).
The results presented here must be considered alongside the limitations of the methodology used to evaluate the reconstruction approaches. Although contrast and noise are basic properties of image quality, there are other metrics, such as the Fourier-based noise power spectrum (NPS), that may provide characterization of additional dimensions of image quality by taking into account the multifaceted properties of noise and image texture. However, it can also be argued that the application of FFT to FBP and iterative reconstructed images, such as for the purposes of MTF or NPS analysis, is not appropriate because the basic assumptions of linearity and shift-invariance are violated, especially for MBIR images. In the case of FBP, the assumptions come reasonably close to being satisfied and the scientific community has adopted the use of FFT methods. (27) Conversely, given the spatial dependence of noise and the contrast dependence of spatial resolution introduced by iterative reconstruction, it is difficult to argue for the shift-invariance of images reconstructed with iterative methods. (28) Therefore, a perceptual reader study is perhaps a more appropriate option for making a direct comparison of images reconstructed using FBP and iterative reconstruction methods. Even though the apparent difference in the texture of the MBIR images makes it difficult to conduct a blinded reader study, (19) the appearance of MBIR images is reported to have a minimal impact on clinical diagnosis. (19,29,30) In the near future, we plan to conduct a reader study based on the phantom images that we have acquired to evaluate the effect of texture on perception, but it is not within the scope of this study. Finally, the results presented here apply to the very specific phantom/image acquisition conditions and objects analyzed, and they may not be readily extrapolated to other situations.

V. CONCLUSIONS
We performed an objective comparison of FBP, ASiR, and MBIR using both modular and anthropomorphic phantoms over a wide range of image acquisition conditions: 3 levels of ASiR, 3 pitch factors, and 6-7 dose levels. Our results show that iterative reconstruction produced low-dose images that are equivalent in image quality to conventional FBP images acquired at higher dose levels. In addition, HUs in MBIR images are highly sensitive to low-dose levels, requiring careful attention for quantitative assessment of anatomy. A combined effort by clinical staff, radiologists, and medical physicists will be needed to integrate these findings into the clinical workflow and establish new CT protocols and standard operating procedures.