Dosimetric evaluation of synthetic CT generated with GANs for MRI‐only proton therapy treatment planning of brain tumors

Abstract
Purpose: The purpose of this study was to assess the dosimetric accuracy of synthetic computed tomography (sCT) images of patients with brain tumor generated using a modified generative adversarial network (GAN) method, for their use in magnetic resonance imaging (MRI)‐only treatment planning for proton therapy.
Methods: Dose volume histogram (DVH) analysis was performed on CT and sCT images of patients with brain tumor for plans generated for intensity‐modulated proton therapy (IMPT). All plans were robustly optimized using a commercially available treatment planning system (RayStation, from RaySearch Laboratories) and standard robust parameters reported in the literature. The IMPT plan was then used to compute the dose on CT and sCT images for dosimetric comparison, using the RayStation analytical (pencil beam) dose algorithm. We used a second, independent Monte Carlo dose calculation engine to recompute the dose on both CT and sCT images to ensure a proper analysis of the dosimetric accuracy of the sCT images.
Results: The results extracted from RayStation showed excellent agreement for most DVH metrics computed on the CT and sCT for the nominal case, with a mean absolute difference below 0.5% (0.3 Gy) of the prescription dose for the clinical target volume (CTV) and below 2% (1.2 Gy) for the organs at risk (OARs) considered. This demonstrates a high dosimetric accuracy for the generated sCT images, especially in the target volume. The metrics obtained from the Monte Carlo doses mostly agreed with the values extracted from RayStation for the nominal and worst‐case scenarios (mean difference below 3%).
Conclusions: This work demonstrated the feasibility of using sCT generated with a GAN‐based deep learning method for MRI‐only treatment planning of patients with brain tumor in intensity‐modulated proton therapy.


| INTRODUCTION
Magnetic resonance imaging (MRI) is often used in radiation therapy to accurately contour the clinical target volume (CTV) and organs at risk (OARs) because of its superior soft tissue contrast compared with computed tomography (CT) images. The use of MRI images is especially crucial in treatment sites in the abdomen and brain, where the tumor volume is mainly surrounded by soft tissue. However, CT images are still required to retrieve information about the physical quantities needed for dose calculation, that is, electron density for radiation therapy with photons and stopping powers for ion therapy. 1 Therefore, the current treatment planning workflow for these sites relies on contouring the target and OARs on MRI, then transferring the contours to CT via image registration. MRI-CT co-registration introduces geometrical uncertainties of approximately 2 mm for the brain 2,3 and 2-3 mm for prostate and gynecological patients. 4 Importantly, these errors are systematic, persist throughout treatment, shift high-dose regions away from the target, 5 and may lead to a geometric miss that compromises tumor control. This problem has recently led to the concept of MRI-only treatment planning, where pseudo or synthetic CT (sCT) images for dose calculation are generated directly from the MRI scan. MRI-only treatment planning would also reduce radiation dose, imaging time, and hospital resources. 6 It is therefore an attractive concept that is gaining popularity. 7 However, accurately generating Hounsfield unit (HU) maps from MRI images is not straightforward.
The conventional methods proposed in the literature for automatically generating sCT images can be divided into four categories: bulk density methods, voxel-based or tissue segmentation-based methods, single- or multi-atlas registration with fusion algorithms, and hybrid approaches that combine atlas-based and machine learning-based methods. 8,9 The accuracy of these methods has improved with time, but they still suffer from several limitations.
Voxel- or tissue segmentation-based methods either require the acquisition of multiple MRI sequences, which results in a longer scanning time, or they use nonstandard sequences seldom available in clinical routines, such as ultrashort echo time (UTE), to segment bone and air regions. 10 Atlas-based methods often fail to handle atypical patient anatomy and may cause intersubject registration errors. 11,12 It has been demonstrated that using multiple atlases improves the results, but the optimal number of atlases remains an open question. 8,13 The combination of atlas-based registration and machine learning-based methods has demonstrated superior accuracy, 14,15 but these methods largely depend on handcrafted features, which present a twofold weakness: first, defining these features requires human intervention, and second, it is still uncertain which features have the greatest impact on the model's accuracy. To overcome these problems, deep learning methods have recently been proposed, because they completely eliminate dependence on handcrafted features by allowing the deep network to learn its own optimal features to accurately generate sCT images. Several groups have reported a lower HU error between synthetic and real CT images with deep learning-based methods than with conventional methods, such as atlas-based methods. 14 In addition, deep learning-based methods showed excellent dosimetric accuracy for treatment plans based on sCT images generated for brain 16 and prostate patients 17 treated with conventional radiation therapy with photons.
However, these small errors in the generated HU maps may still lead to large dosimetric differences for proton therapy treatments because of the proton range's high sensitivity to the tissue traversed along the beam path. 18,19 The literature is sparse regarding the dosimetric evaluation of sCT generation methods for proton therapy, 20-23 but a couple of groups that analyzed the performance of conventional methods based on tissue segmentation reported, indeed, the need to manually pre- or post-process the pseudo HU values to minimize proton range differences and ensure reasonable dosimetric accuracy. For instance, Koivula et al. 20 segmented bone regions before assigning the corresponding HU, while Maspero et al. 21 manually inserted air cavities within the body contour as found in the CT images to minimize interscan differences (at different time points). Using the newly developed deep learning methods mentioned above could help to achieve higher accuracy while removing any manual operations. In the last year, several groups have started to investigate the application of deep learning for sCT generation, achieving very promising results. 24-27 In addition, they analyzed the dosimetric accuracy of the generated sCT for single field uniform dose (SFUD) and conventional PTV optimization. But to our knowledge, a proper dosimetric evaluation of these methods for fully intensity-modulated proton therapy (IMPT) with robust optimization has not been performed yet. This article aims to address this issue by analyzing the performance of a deep learning sCT generation method based on generative adversarial networks (GANs) for IMPT treatment planning.
Specifically, we focus on treatment plans for brain patients that have been robustly optimized using a commercially available treatment planning system (RayStation, from RaySearch Laboratories) and standard robust parameters 28,29 reported in the literature (3 mm for the systematic setup error and 3% for the range uncertainty). Robust optimization is the state of the art for treatment planning in proton therapy, and it might help to mitigate the small HU errors in the generated sCT images. However, robustness must be properly evaluated to analyze dosimetric accuracy in all possible scenarios, accounting for both conventional delivery errors and the uncertainties inherent in the sCT generation algorithm, which is crucial to ensure correct treatment outcomes in proton therapy. For this purpose, although the plans were optimized using the analytical dose algorithm embedded in RayStation, we used an independent Monte Carlo dose engine for the final dose recalculation and robustness evaluation.
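To make the robustness parameters above concrete, the following minimal sketch enumerates one possible set of uncertainty scenarios from a 3 mm setup error and a 3% range uncertainty, and picks the worst value of a metric across them. The scenario layout (one shift per axis direction combined with under-, nominal, and over-range) and the names `build_scenarios` and `worst_case` are illustrative assumptions; the text does not specify how the treatment planning system constructs its scenarios.

```python
from itertools import product

SETUP_ERROR_MM = 3.0    # systematic setup error quoted in the text
RANGE_ERROR_PCT = 3.0   # range uncertainty quoted in the text


def build_scenarios(setup_mm=SETUP_ERROR_MM, range_pct=RANGE_ERROR_PCT):
    """Return a list of (shift_xyz_mm, range_scale) uncertainty scenarios.

    Assumption: shifts are applied along one axis at a time (plus the
    unshifted position) and combined with under-, nominal, and over-range.
    """
    shifts = [(0.0, 0.0, 0.0)]
    for axis in range(3):
        for sign in (-1.0, 1.0):
            shift = [0.0, 0.0, 0.0]
            shift[axis] = sign * setup_mm
            shifts.append(tuple(shift))
    range_scales = [1.0 - range_pct / 100.0, 1.0, 1.0 + range_pct / 100.0]
    return list(product(shifts, range_scales))


def worst_case(metric_per_scenario, higher_is_worse=True):
    """Pick the worst value of a DVH metric across all scenarios."""
    if higher_is_worse:
        return max(metric_per_scenario)
    return min(metric_per_scenario)


scenarios = build_scenarios()
nominal = ((0.0, 0.0, 0.0), 1.0)
```

Under these assumptions, 7 shifts times 3 range scalings give 21 scenarios, one of which is the nominal case. For an OAR metric such as a near-maximum dose, the worst case is the highest value across scenarios; for target coverage metrics it would be the lowest.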

2.A | Image acquisition
We analyzed CT and MRI images from patients who had undergone conventional radiotherapy for brain tumors. Tumor sizes varied KAZEMIFAR ET AL.  (discriminator). This process is repeated until the discriminator can no longer distinguish between the real CT and the sCT, which indicates that the generator has learned to accurately transform MRI to CT images. This work applied the concept of conditional GAN 31 but modified the original model to improve its performance for our particular application. First, we used U-Net 32 with mutual information (MI) as the loss function to overcome difficulties in MRI-to-CT registration, and second, we used several convolutional layers and several fully connected layers with rectified linear unit (ReLU) 33 and binary cross entropy as the activation/loss functions in the discriminator network. In the following paragraphs, we describe both the generator and the discriminator components of the conditional GAN model in more detail.

2.B.1 | Generator
Our model uses a 2D U-Net as the generator network, which directly learns a mapping function to convert a 2D grayscale image to its corresponding 2D sCT image. Our generator network contains blocks of 2D convolutional layers with variable filter sizes but the same kernel sizes and activation functions, except for the last layer. The structure of our U-Net generator model is illustrated in Fig. 1. On the left side of the U-Net structure, the low-level feature maps are downsampled to high-level feature maps using a max pooling layer. To this end, we used three 3 × 3 convolutional layers, 34,35 each followed by a ReLU activation function, and one max pooling operation. On the right side of the U-Net structure, the high-level feature maps and low-level feature maps are fed to the upsampling step, which uses a transposed convolutional layer to construct the predicted image. To this end, we used a 2 × 2 transposed convolutional layer followed by a concatenate layer and added two 3 × 3 convolutional layers with a ReLU activation function. In addition, a batch normalization layer was added to each 3 × 3 convolutional layer, and a dropout layer was added to one 3 × 3 convolutional layer. In the final layer, we used a 1 × 1 convolutional layer with a filter size of 1 and a sigmoid activation function. The generator's loss function was MI, and the network was trained with an Adam optimizer (learning rate = 0.0002, beta_1 = 0.5, where beta_1 is the exponential decay rate for the first-moment estimates 36 ).
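As an illustration of what the optimizer settings above control, the following minimal sketch writes out one Adam update for a single scalar parameter with learning rate 0.0002 and beta_1 = 0.5. The values of `BETA_2` and `EPSILON` are not given in the text and are assumed here (common defaults), and the quadratic toy objective is purely hypothetical; this is not the training code used in this work.

```python
import math

LEARNING_RATE = 0.0002  # from the text
BETA_1 = 0.5            # from the text (decay rate of the first-moment estimate)
BETA_2 = 0.999          # assumption: not stated in the text
EPSILON = 1e-7          # assumption: small constant for numerical stability


def adam_step(param, grad, m, v, t,
              lr=LEARNING_RATE, beta_1=BETA_1, beta_2=BETA_2, eps=EPSILON):
    """One Adam update for a scalar parameter; t is the 1-based step index."""
    m = beta_1 * m + (1.0 - beta_1) * grad         # first-moment estimate
    v = beta_2 * v + (1.0 - beta_2) * grad * grad  # second-moment estimate
    m_hat = m / (1.0 - beta_1 ** t)                # bias correction
    v_hat = v / (1.0 - beta_2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Hypothetical toy objective f(w) = (w - 1)^2 with gradient 2 * (w - 1).
w, m, v = 0.0, 0.0, 0.0
for step in range(1, 4):
    grad = 2.0 * (w - 1.0)
    w, m, v = adam_step(w, grad, m, v, step)
```

A lower beta_1 than the usual 0.9 shortens the memory of the gradient average, so the update direction follows recent gradients more closely. With consistent gradients, each step moves the parameter by roughly the learning rate.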

2.B.2 | Mutual information cost function
We implemented MI between the CT and the sCT as a custom loss function for the generator using the Keras package. MI measures the amount of information obtained about one variable when another variable is known. For fixed marginal entropies, maximizing MI is equivalent to minimizing the joint entropy (computed from the joint histogram). The MI between our two variables, the real CT ($x_i$) and the generated sCT ($G(y_i)$, with $y_i$ as the MRI), is expressed as:
$$I\left(x_i; G(y_i)\right) = \sum p\left(x_i, G(y_i)\right) \log \frac{p\left(x_i, G(y_i)\right)}{p(x_i)\, p\left(G(y_i)\right)}$$
where the sum runs over the intensity values of both images, $p\left(x_i, G(y_i)\right)$ is the joint distribution, and $p(x_i)$ and $p\left(G(y_i)\right)$ indicate the distributions of images $x_i$ and $G(y_i)$, respectively. Here, the loss functions of both the generator and the discriminator need to be updated.
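The MI definition above can be illustrated with a simple joint-histogram estimate. The sketch below is not the Keras loss used in this work, which must be differentiable to train the generator; it only shows how the quantity is computed from binned intensities. The function name, the number of bins, and the assumption that intensities are scaled to [0, 1] are all illustrative.

```python
import math
from collections import Counter


def mutual_information(image_a, image_b, bins=32, value_range=(0.0, 1.0)):
    """Estimate MI (in nats) between two equally sized flat intensity lists."""
    if len(image_a) != len(image_b) or not image_a:
        raise ValueError("images must be non-empty and of equal length")
    low, high = value_range
    width = (high - low) / bins

    def to_bin(value):
        index = int((value - low) / width)
        return min(max(index, 0), bins - 1)  # clamp values at or beyond the edges

    pairs = [(to_bin(a), to_bin(b)) for a, b in zip(image_a, image_b)]
    n = len(pairs)
    joint = Counter(pairs)                    # joint histogram
    marg_a = Counter(a for a, _ in pairs)     # marginal histogram of image_a
    marg_b = Counter(b for _, b in pairs)     # marginal histogram of image_b

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written with raw counts to limit rounding error
        mi += p_ab * math.log(count * n / (marg_a[a] * marg_b[b]))
    return mi


ct = [0.1, 0.1, 0.9, 0.9]
mi_identical = mutual_information(ct, ct)
mi_independent = mutual_information(ct, [0.1, 0.9, 0.1, 0.9])
```

For two identical images the estimate equals the entropy of the image (log 2 in this two-level toy example), and it drops to zero when the intensities of one image carry no information about the other.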
The discriminator "D" gets updated by the loss function: and the generator "G" gets updated by the cost function,

2.C | Model training, validation, and testing
From a database of 77 patients, we randomly selected 66 patients (85%) for training and cross-validation and used the remaining 11 patients (15%) for testing. The cross-validation procedure, which is often used to evaluate the stability of the model, 39
were evaluated as a measure of our plan quality and robustness for the treatment plans generated on both the CT and the sCT images.
(difference in optic chiasm D2 = 5.2% and Dmean = 6.0%), but these differences are not clinically relevant since the metric itself (D2) is far from the maximum dose that the organ can tolerate.
(Fig. 4). In this case, the affected organ (RON) has a very small volume and is close to the nasal cavity (Fig. 4), which increases the chance of the pencil beam algorithm providing a lower accuracy. On the other hand, the differences for the worst-case scenario were below 3% on average, as previously reported, but again exceeded 5% in some exceptional cases, such as patient #1 (difference in worst-case brainstem D2 = 9.8%, brainstem Dmean = 5.4%, LON D2 = 6.0%, and RON D2 = 14.8%), patient #2

3.B.2 | Monte Carlo dose (nominal case and robustness test)
(difference in worst-case optic chiasm D2 = 12.1%), and patient #7 (difference in worst-case brainstem D2 = 7.6%).
The data used for training were paired, that is, each MR/CT pair corresponded to the same patient. However, one of the advantages of GANs is the ability to learn from unpaired data. Learning image-to-image translation from unpaired data has achieved excellent results in fields like computer vision, but this task appears to be rather more complex when medical images are involved, since it requires the exact reproduction of the same patient anatomy, and not just any random or average patient anatomy. Nevertheless, it is an interesting topic to investigate in the future.
TABLE 2 Absolute differences between relevant dose volume histogram (DVH) metrics from the Monte Carlo doses (nominal and robustness test) computed on the computed tomography (CT) and synthetic CT for the 11 test patients, expressed as percentage (%) of the prescription dose (60 Gy). The values in regular font correspond to the nominal case, while those in italics correspond to the worst-case scenario. The last two columns contain the mean over all patients and its standard deviation (SD
For the worst-case scenario, the differences between the doses computed on CT and sCT were slightly higher than for the nominal case in some patients, but they generally remained below 3%, except for a few metrics in certain patients (
Increasing these values could help to reduce the sensitivity of the IMPT plans to the small differences in HU between the CT and sCT.

| DISCUSSION
But finding the most suitable values may require a detailed analysis of how best to translate the HU error associated with our sCT generation model into an equivalent robustness recipe, given the existing parameters available in commercial software (i.e., systematic setup error and constant range uncertainty). This type of study has already been performed to account for random setup errors, 52 and a similar workflow could be applied to our particular problem. An alternative strategy to reduce the dosimetric differences between the CT and sCT would be to simulate HU errors directly in the robustness scenarios used during the optimization process. This would require generating an HU error distribution that could later be sampled to generate multiple scenarios to cover the entire error space.
As previously mentioned, the literature on the use of sCT images for MRI-only proton therapy planning is rather scarce. However, given the increasing success of MR-guided photon radiotherapy, 53 we believe that the medical community will soon turn its attention to MRI-guided proton therapy. 7 20 reported an MAE of 34 HU and a relative dose difference from sCT to CT within 0.5% in ten brain patients for their dual HU conversion model enabling heterogeneous tissue representation. However, their method excluded air cavity volumes, which are among the most challenging regions, and required that the bone regions from the MRI images be segmented before the HU conversion. In both studies, the tumor was located in rather homogeneous regions, which might explain their good results, but they acknowledge the limitations of their method for tumors close to the nasal cavity, as is the case for some of our patients (Fig. 3). In addition, the need for multiple non-standard MRI sequences or dedicated software for bone segmentation complicates the implementation of these methods in clinical practice. Another group analyzed the use of a commercial solution for creating bulk-assigned sCTs for prostate patients 21 and reported the need to manually adapt the assigned synthetic HU values by, for example, inserting the air cavities found on the CT. Again, the need for human intervention impedes the full automation of MRI-only proton therapy planning and the implementation of MRI-guided online treatment adaptation strategies. 7 Such automation is even more desirable for IMPT treatments than for conventional radiotherapy, given the potential to reduce inter- and intra-fraction motion errors. 19,58 In contrast, the method proposed in this work enables a fast (1 s for sCT generation) and entirely automatic MRI-only treatment planning process that removes all manual components from the workflow and achieves excellent dosimetric accuracy.
A more recent study from Spadea et al. 26 investigated the use of deep convolutional neural networks for sCT generation and also analyzed their dosimetric accuracy for single-field uniform dose (SFUD) plans for brain tumor patients. In contrast, the present work investigated the dosimetric accuracy of the generated sCT for full IMPT treatment planning, which is much more challenging than the case of SFUD due to the extra sensitivity of this technique to HU uncertainties. Therefore, worst-case robust optimization on the CTV was used to generate the plans. Moreover, we performed a complete evaluation of the robustness of the generated plans, recomputing the dose on both CT and sCT for all considered uncertainty scenarios with an independent Monte Carlo dose engine. No previous study has performed such a complete dosimetric and robustness evaluation, which we believe is crucial for IMPT treatment plans, given their sensitivity to dose calculation and delivery uncertainties.
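The DVH comparison underlying this evaluation can be sketched as follows. The example assumes equal voxel volumes within a structure and takes D2 as the minimum dose received by the hottest 2% of the structure volume; the function names are hypothetical, and the only value taken from the text is the 60 Gy prescription dose used to normalize the differences.

```python
import math

PRESCRIPTION_GY = 60.0  # prescription dose quoted in the text


def d_mean(doses_gy):
    """Mean dose over the voxels of a structure (equal voxel volumes assumed)."""
    return sum(doses_gy) / len(doses_gy)


def d_percent(doses_gy, percent):
    """Minimum dose received by the hottest `percent` % of the structure volume."""
    ordered = sorted(doses_gy, reverse=True)
    count = max(1, math.ceil(len(ordered) * percent / 100.0))
    return ordered[count - 1]


def abs_diff_pct(metric_ct_gy, metric_sct_gy, prescription_gy=PRESCRIPTION_GY):
    """Absolute CT vs sCT difference as a percentage of the prescription dose."""
    return abs(metric_ct_gy - metric_sct_gy) / prescription_gy * 100.0


# Hypothetical voxel doses (Gy) for one structure, computed on CT and on sCT.
dose_ct = [float(i) * 0.5 for i in range(100)]
dose_sct = [d + 0.3 for d in dose_ct]

diff_d2 = abs_diff_pct(d_percent(dose_ct, 2.0), d_percent(dose_sct, 2.0))
diff_dmean = abs_diff_pct(d_mean(dose_ct), d_mean(dose_sct))
```

In this toy case a uniform 0.3 Gy offset between the two dose distributions yields a 0.5% difference for both metrics, which matches the scale used when differences are reported as a percentage of the prescription dose.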

| CONCLUSIONS
This work demonstrated the feasibility of using sCT images generated with a deep learning method based on generative adversarial networks (GANs) for intensity-modulated proton therapy. We tested the method in brain tumors, some of them located close to complex bone, air, and soft-tissue interfaces, and obtained excellent dosimetric accuracy even in those challenging cases. The proposed method can generate sCT images in around 1 s without any manual pre- or post-processing operations. This opens the door for online MRI-guided adaptation strategies for IMPT, which would eliminate the dose burden issue of current adaptive CT-based workflows, while providing the superior soft-tissue contrast characteristic of MRI images.