Performance of deep learning synthetic CTs for MR‐only brain radiation therapy

Abstract
Purpose To evaluate the dosimetric and image‐guided radiation therapy (IGRT) performance of a novel generative adversarial network (GAN) generated synthetic CT (synCT) in the brain and compare its performance for clinical use including conventional brain radiotherapy, cranial stereotactic radiosurgery (SRS), planar IGRT, and volumetric IGRT.
Methods and Materials SynCT images for 12 brain cancer patients (6 SRS, 6 conventional) were generated from T1‐weighted postgadolinium magnetic resonance (MR) images by applying a GAN model with a residual network (ResNet) generator and, as the discriminator, a convolutional neural network (CNN) with 5 convolutional layers that classified input images as real or synthetic. Following rigid registration, clinical structures and treatment plans derived from simulation CT (simCT) images were transferred to synCTs. Dose was recalculated for 15 simCT/synCT plan pairs using fixed monitor units. Two‐dimensional (2D) gamma analysis (2%/2 mm, 1%/1 mm) was performed to compare dose distributions at isocenter. Dose–volume histogram (DVH) metrics (D95%, D99%, D0.2cc, and D0.035cc) were assessed for the targets and organs at risk (OARs). IGRT performance was evaluated via volumetric registration between cone beam CT (CBCT) and synCT/simCT and planar registration between kV images and synCT/simCT digitally reconstructed radiographs (DRRs).
Results Average gamma passing rates at 1%/1 mm and 2%/2 mm were 99.0 ± 1.5% and 99.9 ± 0.2%, respectively. Excellent agreement in DVH metrics was observed (mean difference ≤0.10 ± 0.04 Gy for targets, 0.13 ± 0.04 Gy for OARs). Averaged over the population, CBCT‐synCT registrations differed from simCT‐based registrations by less than 0.2 mm and 0.1°. The mean difference between kV‐synCT DRR and kV‐simCT DRR registrations was <0.5 mm, with no statistically significant differences observed (P > 0.05). An outlier case with a large resection cavity represented the worst‐case scenario.
Conclusion Brain GAN synCTs demonstrated excellent performance for dosimetric and IGRT endpoints, offering potential use in high precision brain cancer therapy.


1 | INTRODUCTION
Magnetic resonance imaging (MRI) provides better soft tissue contrast than computed tomography (CT), the gold standard imaging modality for treatment planning in radiotherapy. The incorporation of MRI as an adjunct to CT significantly reduces inter/intraobserver variations in structure delineation. 1 As a complementary modality, the MRI is registered to the simulation CT (simCT) to transfer MRI-delineated structures for treatment planning. However, this multimodality registration may introduce up to ~2 mm of systematic error in the head region. [2][3][4] In an effort to eliminate multimodality image registration uncertainty and improve clinical efficiency, MR-only treatment planning has emerged as a viable option for many disease sites. [5][6][7] Yet implementing MR-only treatment planning presents several challenges, including the fact that MRI does not provide the electron density information required for accurate dose calculation. In the brain, several methods for generating synthetic CT (synCT) from MRI data have been developed, including bulk density assignment, atlas-based, voxel-based, and machine learning-based methods. [8][9][10][11][12][13][14]
Recently, deep learning has achieved higher accuracy in synCT generation than other approaches. [8][9][10][11][12] Wang et al. 15 reported that synCT generated from a path-based random forest method achieved less than 0.6% dose difference in target DVH metrics and a 99% average gamma passing rate (3%/3 mm) in brain stereotactic radiosurgery (SRS) treatments. Kazemifar et al. 16 assessed the dosimetric accuracy of generative adversarial network (GAN) generated synCT in brain radiotherapy and found <1% dose difference in dose-volume histogram (DVH) endpoints. Despite the existing studies on dosimetric performance, very few studies have assessed the performance of synCT for image-guided radiation therapy (IGRT). Price et al. 17 and Morris et al. 18 found that synCTs generated using voxel-based weighted summation achieved similar performance for whole and partial brain IGRT, respectively. However, that synCT method required multiple MR datasets to generate each synCT image, which may have introduced additional coregistration errors, and it did not use deep learning. We recently developed and validated a GAN model that generates brain synCTs from a single MRI input in ~6 s, yielding excellent agreement with the corresponding CT. 12 This work aims to further evaluate the dosimetric and IGRT performance of GAN generated synCT (GAN-synCT) in the brain and compare its performance for clinical use including conventional brain radiotherapy, cranial SRS, planar IGRT, and volumetric IGRT.

2.B | Synthetic CT Generation and Preprocessing
The synCT images were generated using a previously developed GAN deep learning model. 12 The GAN model trains two competing networks simultaneously: (a) an encoder-decoder architecture called the generator, which tries to generate synCTs from the input MR images, and (b) a discriminator, which classifies the generated synCTs as real or synthetic. 19 The generator's architecture includes nine residual blocks, whereas the discriminator is a CNN with five convolutional layers. As outlined in detail in the original developmental work, the GAN model was validated using fivefold cross-validation. A detailed comparison of the GAN synCT with a simplified CNN showed that our GAN reduced the mean absolute error and preserved details better than the CNN. 12 To ensure equivalent dosimetric and IGRT comparisons between the simCT reference and synCT, all synCTs were resampled to the CT simulation grid resolution for each patient case.
The evaluated DVH metrics were defined in Quantitative Analyses of Normal Tissue Effects in the Clinic (QUANTEC) 22 and the AAPM Task Group No. 101 Report, 23 including D99% and D95% for the target, D0.02cc for optic pathways, D0.05cc for brainstem, and D0.035cc for both target and organs at risk (OARs). D99% and D95% represent the doses delivered to 99% and 95% of the planning target volume (PTV), and D0.035cc, as a representation of maximum dose, is the dose to 0.035 cc of a structure's volume. The gradient index (GI), defined as the ratio of the volume of half the prescription isodose to the volume of the prescription isodose, describes how fast the dose falls off outside the target. 24 The conformity index (CI) is the ratio of the 100% isodose volume to the volume of the PTV, indicating how well the prescription dose conforms to the target. 25 Gamma analysis at 1%/1 mm and 2%/2 mm (dose difference/distance to agreement) was conducted with in-house software using a low-dose threshold of 10% of the maximum dose in the simCT plan.
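As an illustration of the DVH and plan-quality metrics defined above, the following minimal Python sketch computes them from flat lists of voxel doses. It is not the software used in this study: the function names, the uniform voxel volume, and the toy dose values are assumptions made only for this example.

```python
import math

def dose_at_volume_percent(doses, percent):
    """D_x%: lowest dose among the hottest x% of a structure's voxels."""
    ordered = sorted(doses, reverse=True)
    # Number of voxels making up x% of the structure (at least one voxel).
    k = max(1, math.ceil(len(ordered) * percent / 100.0))
    return ordered[k - 1]

def dose_at_volume_cc(doses, volume_cc, voxel_cc):
    """D_Vcc: lowest dose among the hottest `volume_cc` of a structure."""
    ordered = sorted(doses, reverse=True)
    k = max(1, min(len(ordered), round(volume_cc / voxel_cc)))
    return ordered[k - 1]

def isodose_volume_cc(dose_grid, level, voxel_cc):
    """Volume receiving at least `level`, counted voxel by voxel."""
    return sum(1 for d in dose_grid if d >= level) * voxel_cc

def gradient_index(dose_grid, prescription, voxel_cc):
    """GI: half-prescription isodose volume / prescription isodose volume."""
    v_half = isodose_volume_cc(dose_grid, 0.5 * prescription, voxel_cc)
    v_full = isodose_volume_cc(dose_grid, prescription, voxel_cc)
    return v_half / v_full

def conformity_index(dose_grid, prescription, ptv_volume_cc, voxel_cc):
    """CI: prescription (100%) isodose volume / PTV volume."""
    return isodose_volume_cc(dose_grid, prescription, voxel_cc) / ptv_volume_cc

# Toy data (hypothetical): 100 PTV voxels of 0.001 cc with doses 1..100 Gy.
ptv_doses = [float(d) for d in range(1, 101)]
d95 = dose_at_volume_percent(ptv_doses, 95)
d99 = dose_at_volume_percent(ptv_doses, 99)
d0035 = dose_at_volume_cc(ptv_doses, 0.035, voxel_cc=0.001)

# Toy dose grid (hypothetical): 10 voxels at 25 Gy, 30 at 12 Gy, 60 at 5 Gy.
grid = [25.0] * 10 + [12.0] * 30 + [5.0] * 60
gi = gradient_index(grid, prescription=20.0, voxel_cc=0.001)
ci = conformity_index(grid, prescription=20.0, ptv_volume_cc=0.008, voxel_cc=0.001)
```

Applying the same functions to the simCT and synCT dose grids of one plan pair and subtracting the results gives the per-metric differences reported for each plan pair.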
For cases with tumor volumes >100 cc, a 30 × 30 cm² dose plane was used to evaluate the global dose distribution. For the remaining cases with small tumor volumes, a 15 × 15 cm² dose plane was exported to yield higher resolution.
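The 2D gamma comparison described above can be sketched as follows. This is a simplified stand-in for the in-house software, not a reproduction of it: it assumes global normalization of the dose criterion to the maximum reference dose, searches only on the pixel grid without interpolation, and uses hypothetical names and toy dose planes.

```python
import math

def gamma_pass_rate(ref, evl, spacing_mm, dose_pct, dta_mm, threshold_pct=10.0):
    """Percentage of reference pixels with gamma <= 1 (brute-force search).

    ref, evl: 2D lists of dose on the same grid (reference and evaluated).
    dose_pct: dose criterion as a percentage of the maximum reference dose.
    dta_mm: distance-to-agreement criterion in mm.
    threshold_pct: pixels below this percentage of the maximum are skipped.
    """
    rows, cols = len(ref), len(ref[0])
    d_max = max(max(row) for row in ref)
    dose_tol = dose_pct / 100.0 * d_max
    cutoff = threshold_pct / 100.0 * d_max
    # Pixels farther away than the DTA can never give gamma <= 1.
    reach = int(math.ceil(dta_mm / spacing_mm))
    passed = total = 0
    for i in range(rows):
        for j in range(cols):
            if ref[i][j] < cutoff:
                continue
            total += 1
            best = float("inf")
            for k in range(max(0, i - reach), min(rows, i + reach + 1)):
                for m in range(max(0, j - reach), min(cols, j + reach + 1)):
                    dist2 = ((k - i) ** 2 + (m - j) ** 2) * spacing_mm ** 2
                    diff2 = (evl[k][m] - ref[i][j]) ** 2
                    best = min(best, dist2 / dta_mm ** 2 + diff2 / dose_tol ** 2)
            if best <= 1.0:
                passed += 1
    return 100.0 * passed / total if total else float("nan")

# Toy planes (hypothetical): a 10 Gy strip, and the same strip shifted by 1 pixel.
ref_plane = [[0, 0, 0, 10, 10, 10, 10, 0, 0, 0] for _ in range(4)]
shifted = [[0, 0, 0, 0, 10, 10, 10, 10, 0, 0] for _ in range(4)]
rate_loose = gamma_pass_rate(ref_plane, shifted, 1.0, dose_pct=2.0, dta_mm=2.0)
rate_tight = gamma_pass_rate(ref_plane, shifted, 1.0, dose_pct=1.0, dta_mm=0.5)
```

In the toy case, a 1 mm shift is absorbed by a 2 mm distance criterion but not by a 0.5 mm one, which is why tighter criteria such as 1%/1 mm are more sensitive to small differences between synCT and simCT dose planes.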

2.E | IGRT performance evaluation
To assess synCT performance for IGRT, offline rigid registrations were performed between daily on-board images and both simCT and synCT reference images. CBCTs were rigidly registered to synCT images in the Image Registration Workspace in Eclipse using six degrees of freedom (three translational and three rotational). Each registration was then compared with the corresponding CBCT-simCT registration to quantify the registration discrepancy. For patients with more than five CBCTs acquired during the treatment course, the first five CBCTs were chosen for evaluation. A total of 43 independent CBCT-synCT and CBCT-simCT registration pairs were compared. Two-dimensional (2D) rigid registrations between on-board kV images and DRRs (n = 7) were completed using Elastix (University Medical Center Utrecht, Utrecht, Netherlands) via a previously described in-house MATLAB tool. 17,18 Normalized mutual information (NMI) was used as the voxel-based similarity metric 26 to determine the translations in the registrations of both the anterior-posterior and lateral DRR images to their corresponding kV images.
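To make the similarity metric concrete, the sketch below computes NMI for two images given as flat intensity lists. It assumes one common definition, NMI = (H(A) + H(B)) / H(A, B), with simple equal-width histogram binning; the exact formulation and binning used inside Elastix may differ, and the function names and toy inputs are hypothetical.

```python
import math
from collections import Counter

def _quantize(values, bins):
    """Map intensities to integer bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    scale = bins / (hi - lo)
    return [min(bins - 1, int((v - lo) * scale)) for v in values]

def _entropy(counts, n):
    """Shannon entropy (natural log) of a histogram with n samples."""
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def normalized_mutual_information(a, b, bins=32):
    """NMI = (H(A) + H(B)) / H(A, B); ranges from 1 (independent) to 2."""
    if len(a) != len(b):
        raise ValueError("images must contain the same number of pixels")
    qa, qb = _quantize(a, bins), _quantize(b, bins)
    n = len(a)
    h_a = _entropy(Counter(qa).values(), n)
    h_b = _entropy(Counter(qb).values(), n)
    h_ab = _entropy(Counter(zip(qa, qb)).values(), n)
    # Two constant images carry no information; the ratio is undefined.
    return (h_a + h_b) / h_ab if h_ab > 0 else float("nan")

# Toy inputs (hypothetical): an image compared with itself and with an
# unrelated pattern.
image = [0, 0, 1, 1] * 8
unrelated = [0, 1, 0, 1] * 8
nmi_same = normalized_mutual_information(image, image)
nmi_unrelated = normalized_mutual_information(image, unrelated)
```

A registration optimizer adjusts the translations so that this value is maximized between the kV image and the DRR; the resulting translations for the synCT DRR and the simCT DRR can then be subtracted to obtain the registration difference.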

2.F | Analysis for statistical comparisons
To assess the agreement between the simCT and synCT measurements, intraclass correlation coefficients (ICC) 27 were computed for DVH metrics, CBCT-CT, and kV-DRR registration. To account for the correlations among multiple measurements from the same patient
Table 1 summarizes key DVH metric results. Among 15 tested plan pairs, the mean difference (MD) was ≤0.10 ± 0.04 Gy for the target D95% and ≤0.13 ± 0.04 Gy for OARs. While some statistically significant deviations were observed, the overall differences were deemed to not be clinically significant (i.e., low dose difference (<0.05 Gy)).
The ICCs for evaluated DVH metrics were above 0.99, indicating excellent agreement between simCT and synCT plans. Across the entire cohort, close concordance in the GI for the synCT plans (3.88 ± 1.77, Range: 2.35 to 9.74) as compared to that of the simCT plans (3.76 ± 1.69, Range: 2.26 to 9.40) was observed. The maximum GIs of 9.74 and 9.40 for the synCT and simCT plans, respectively, occurred for a patient who had a simultaneous integrated boost with two separate target volumes treated with fractionated SRS to 32 Gy (Case SRS3 in Fig. 1). Similarly, the average CI was 1.12 ± 0.40 (Range: 0.49 to 2.26) and 1.14 ± 0.40 (Range: 0.59 to 2.33) for synCT and simCT plans for the entire cohort, respectively. No significant differences were observed in GI and CI between synCT plans and simCT plans (P > 0.05).
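For readers unfamiliar with the ICC, the following minimal sketch computes one common form, the two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1). Which ICC form the study used is not stated in this section, so the choice here is an assumption, and the sketch ignores the correlation among repeated measurements from the same patient. The function name and toy values are hypothetical.

```python
def icc_absolute_agreement(table):
    """ICC(2,1) from a table with one row per subject and one column per method."""
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Toy example (hypothetical values): one DVH metric in Gy, (simCT, synCT) per plan.
pairs = [(10.00, 10.05), (20.00, 19.98), (30.00, 30.02), (40.00, 40.01)]
icc = icc_absolute_agreement(pairs)
```

Because the absolute-agreement form penalizes a systematic offset between the two columns, an ICC near 1 indicates that synCT values both track and match the simCT values.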

3.B | IGRT performance
The MDs between the CBCT-synCT/simCT registrations and kV-synCT DRR/simCT DRR registrations are summarized in Table 2.
FIG. 1. Gamma passing rates at 1%/1 mm and 2%/2 mm comparing dose distributions of synthetic CT (synCT) plan and of simulation CT (simCT) plan. Abbreviations: SRS = stereotactic radiosurgery; Con = primary plan of conventional brain radiotherapy; Con Boost = boost plan of conventional brain radiotherapy.

3.C | Case studies
1 mm and 2%/2 mm, respectively. IGRT evaluation yielded results similar to the population mean for both SRS1 and SRS2 cases.
brain synCTs generated using a hybrid magnitude and phase MRI processing pipeline that required several input images for synCT generation. 14 Another study evaluating brain synCT generated using dilated CNNs reported less than 1.5% difference in target dose as compared to the corresponding CT and yielded a mean gamma passing rate of 98.8% at 1%/1 mm. 29 Wang et al. 15 showed that brain synCTs generated from a path-based random forest method in 14 brain SRS cases achieved less than 0.6% dose difference in target DVH metrics and a 99% average gamma passing rate at 3%/3 mm. Our work outperformed a more recent investigation on GAN-based brain synCT that reported less than 1% dose difference for targets and OARs and gamma passing rates of 98.7% and 93.6% at 2%/2 mm and 1%/1 mm, respectively. 29 Our work adds to the literature by also considering IGRT performance for our cohort, as well as quantifying the dosimetric impact.

4 | DISCUSSION
IGRT evaluation showed that differences between synCT-based and simCT-based registrations were minimal. The MDs between volumetric registration pairs were <0.2 mm and <0.1°. The largest
TABLE 2 Differences of volumetric cone beam computed tomography (CBCT-synCT/simCT, 43 observations in 12 subjects) and planar (kilovoltage (kV)-synCT DRR/simCT DRR, 7 observations in 7 subjects) image registrations for image-guided radiation therapy evaluation.