Dosimetric evaluation of synthetic CT image generated using a neural network for MR‐only brain radiotherapy

Abstract

Purpose and background: The magnetic resonance (MR)-only radiotherapy workflow is motivated by the increasing use of MR images for the identification and delineation of tumors, while the fast generation of synthetic computed tomography (sCT) images from MR images for dose calculation remains one of the key challenges to the workflow. This study aimed to develop a neural network to generate sCT for the brain site and to evaluate its dosimetric accuracy.

Materials and methods: A generative adversarial network (GAN) was developed to translate T1-weighted MRI to sCT. First, a "U-net" shaped encoder-decoder network with some image translation-specific modifications was trained to generate sCT; then the discriminator network was adversarially trained to distinguish between synthetic and real CT images. We enrolled 37 brain cancer patients who underwent both CT and MRI for treatment position simulation. Twenty-seven pairs of 2D T1-weighted MR images and rigidly registered CT images were used to train the GAN model, and the remaining 10 pairs were used to evaluate the model performance through the metric of mean absolute error. Furthermore, the clinical volumetric modulated arc therapy (VMAT) plan was calculated on both sCT and real CT, followed by gamma analysis and comparison of the dose-volume histogram (DVH).

Results: On average, only 15 s were needed to generate one sCT from one T1-weighted MRI. The mean absolute error between synthetic and real CT was 60.52 ± 13.32 Hounsfield units (HU) over 5-fold cross validation. For the dose distributions on sCT and CT, the average pass rates of gamma analysis using the 3%/3 mm and 2%/2 mm criteria were 99.76% and 97.25% over the testing patients, respectively. For the DVH parameters of both the target and the organs at risk (OARs), no significant differences were found between the two plans.

Conclusion: The GAN model can generate synthetic CT from a single MRI sequence within seconds, and state-of-the-art accuracy of CT number and dosimetry was achieved.


1 | INTRODUCTION
The traditional radiotherapy workflow relies on computed tomography (CT) images for anatomy acquisition, tumor and organ delineation, patient positioning, and dose calculation. In the past two decades, magnetic resonance imaging (MRI), as a complementary modality to CT, has been increasingly used in clinical routine because it provides superior soft-tissue contrast, especially for the brain and pelvis sites.
In addition, the workflow in which CT images are replaced with MRI in each step of the entire radiotherapy chain, the so-called MR-only workflow, is of growing interest. The reported advantages of the MR-only workflow include avoiding the registration error between CT and MRI, reducing inter- and intra-observer contouring variation, lowering the cost of radiotherapy, improving radiotherapy accuracy, and reducing patient exposure to ionizing radiation. 1-10
The key challenge to the MR-only workflow is to extract electron density information from MRI for radiation dose calculation.
Unlike the CT number, which can be directly converted to electron density, the pixel value in MRI only represents the magnetic relaxation time of tissue, which has no direct correlation with electron density.
However, the tissue relaxation time can be converted first into a CT number and then into electron density, and the conversion methods can be categorized into three approaches. 11 The first approach, in general, is to assign bulk densities to different tissues in MRI, which can be inaccurate and labor-intensive because the tissues must be contoured manually. The second approach is to establish the CT number for each MRI voxel by aligning the image to an atlas with a known correlation between MRI voxel location and the corresponding CT number. The third approach is pixel-wise conversion, which establishes a correlation between the pixel values of MRI and CT through machine learning. Among those approaches, neural networks, as a specific machine learning method, stand out for their high accuracy and automation, and they are considered a leading candidate for the clinical MR-only radiotherapy workflow.
Deep convolutional neural networks (DCNNs) have been reported to be successful in a wide range of medical applications. Several studies have used convolutional neural networks to synthesize CT from a variety of MRI sequences. Han 12 and Liu 13 applied U-net 14 based networks to convert MRI to sCT pixel by pixel.
The encoder-decoder architecture in their networks enables the learning of a hierarchy of features from MRI through a downsampling process; those features at various resolutions are then combined to generate a high-resolution CT image through an upsampling process. In addition, generative adversarial networks (GANs) tailored for image-to-image translation have been applied to the translation of MRI to CT. 15-19 The U-net based networks contain only the generator of the CT image, while a GAN contains an additional adversarial network, the discriminator, which competes with the generator by trying to distinguish generated CT images from real CT.
Although the deep learning-based methods mentioned above have achieved state-of-the-art performance, many factors, such as the MRI sequence, registration method, and loss function, still merit further investigation, since they can influence the results. In this study, we aimed to develop a GAN model to translate clinical standard MRI to synthetic CT and to evaluate its accuracy in terms of image pixel value and clinical radiotherapy dosimetry.

2.A | Patient data collection
Thirty-seven brain cancer patients who had undergone external radiotherapy from July 2019 to April 2020 in our department were

2.B | Generative adversarial network
A conditional generative adversarial network similar to "pix2pix" was adopted here. The network comprised two sub-networks, a generator and a discriminator. The paired MRI and CT images of each patient were fed into the generator to learn the mapping from MRI to CT, so that the generator could generate sCT from an input MRI. The discriminator was then trained to compete with the generator and to distinguish sCT from the corresponding real CT as well as possible. Through the training of the generator and the adversarial training of the discriminator, the network would converge to its best performance. The detailed architectures of the generator and discriminator are illustrated in Figs. 1(a) and 1(b), respectively.
We adopted a "U-net" shaped encoder-decoder network as the generator. The encoder had five convolutional layers with a filter size of 4 × 4 and a stride of 2 to downsample the input 2D MRI slices from a size of 512 × 512 to 16 × 16. Each convolutional layer was followed by batch normalization and a leaky rectified linear unit (Leaky ReLU). In the decoder, a mirrored upsampling process with skip connections to the corresponding encoder layers decoded the low-resolution feature maps into a 2D synthetic CT. The features from each encoder layer were copied and concatenated with the corresponding features before each deconvolution layer except the first and last ones. Dropout layers were applied after the first three batch normalizations in the decoder network to improve network generalization. 20 Compared to the original U-Net, the total number of convolutional layers was reduced from 19 to 11. Another modification to U-Net was that all pooling and unpooling layers were replaced by convolutional and deconvolutional layers, because fractionally strided convolutional layers can be trained to produce dense high-resolution feature maps, while unpooling layers use memorized pooling indices from max-pooling layers to produce sparse high-resolution feature maps. 21
The discriminator network consisted of five convolutional layers with a filter size of 4 × 4 and a stride of 2. The concatenation of the input MRI and the synthetic or real CT was fed to the first convolutional layer. A Leaky ReLU followed each convolutional layer except the last one, which was followed by a sigmoid function that output a score map of shape 1 × 32 × 512 to distinguish between synthetic CT and real CT.
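The downsampling path of the generator can be checked with the usual output-size formula for a strided convolution. The following is a minimal sketch, not the authors' implementation: the padding of 1 pixel is an assumption (the text gives only the 4 × 4 filter size and the stride of 2), and the function names are illustrative.

```python
from typing import List


def conv_output_size(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of a strided convolution (floor convention).

    padding=1 is an assumption; the paper states only kernel=4 and stride=2.
    """
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size: int = 512, num_layers: int = 5) -> List[int]:
    """Feature-map edge length at the input and after each encoder layer."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


print(encoder_sizes())  # [512, 256, 128, 64, 32, 16]
```

Under this assumption each layer halves the edge length, so five layers take a 512 × 512 slice to 16 × 16, in agreement with the encoder description.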
The loss function used in the generator network was the mean absolute error (MAE), as defined in Section 2.D, which represents the pixel-wise difference between synthetic CT and real CT. For the discriminator, we adopted the least-squares loss function, since it strongly penalizes fake samples that lie far from the decision boundary and improves the stability of the learning process. 22 The loss term can be expressed as follows: where D and G represent the discriminator and generator, respectively, x and y represent the pair of real CT and MRI, and G(y) is the output of the generator, namely the synthetic CT. The network weights were initialized using the Xavier method 23 and updated using the ADAM algorithm 24 with a fixed learning rate of 0.0002.
The batch size was set to 20 to make the best use of video memory, and around 32,000 steps (720 epochs) were needed for each training run to converge. The training was performed on a 64-bit Windows workstation with an Intel Core i7 CPU and an NVIDIA GeForce GTX Titan X graphics card with 12 GB of memory.
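As a point of reference, the least-squares discriminator objective of Ref. 22, written in conditional form with the notation used above (x the real CT, y the MRI, G(y) the synthetic CT), is commonly stated as below. This is a sketch of the generic formulation, not necessarily the exact expression used in this study: the target labels (1 for real, 0 for synthetic) and the factors of 1/2 follow the usual convention and are assumptions here.

```latex
\mathcal{L}_{D} =
  \frac{1}{2}\,\mathbb{E}_{x,y}\!\left[\bigl(D(x, y) - 1\bigr)^{2}\right]
  + \frac{1}{2}\,\mathbb{E}_{y}\!\left[\bigl(D(G(y), y)\bigr)^{2}\right]
```

The quadratic penalty grows with the distance of the discriminator score from its target label, which is why samples far from the decision boundary are penalized strongly.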

2.D | Evaluation of synthetic CT
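For each testing patient, the mean absolute error (MAE) of the pixel values within the patient body contour between sCT and real CT was calculated as follows:

\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{sCT}_i - \mathrm{CT}_i \right| \]

where N is the number of pixels within the body contour. The peak signal-to-noise ratio (PSNR) was also evaluated as follows:

\[ \mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}^{2}}{\mathrm{MSE}} \right) \]

where MAX stands for the maximum signal value of the real CT, and MSE stands for the mean square error calculated by

\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{sCT}_i - \mathrm{CT}_i \right)^{2} \]

Figure 2 shows the comparison of the MRI, synthetic CT, planning CT, and difference map for one example patient. The GAN produced a visually good CT synthesis, except for some blurry areas in the vicinity of the interface between the skull and brain tissue.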
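The MAE and PSNR metrics of Section 2.D can be sketched in a few lines. This is a minimal illustration on flattened pixel lists, not the authors' evaluation code: the function names are illustrative, the body contour is represented by a hypothetical boolean mask, and taking MAX over the masked real-CT pixels (rather than the whole image) is an assumption.

```python
import math
from typing import Sequence, Tuple


def masked_mae_psnr(
    sct: Sequence[float], ct: Sequence[float], mask: Sequence[bool]
) -> Tuple[float, float]:
    """Return (MAE, PSNR) over the pixels where mask is true.

    MAE  = mean |sCT_i - CT_i|
    PSNR = 10 * log10(MAX^2 / MSE), with MAX the largest real-CT value
           inside the mask (assumption) and MSE the mean squared error.
    """
    pairs = [(s, c) for s, c, m in zip(sct, ct, mask) if m]
    if not pairs:
        raise ValueError("mask selects no pixels")
    n = len(pairs)
    mae = sum(abs(s - c) for s, c in pairs) / n
    mse = sum((s - c) ** 2 for s, c in pairs) / n
    max_val = max(c for _, c in pairs)
    psnr = math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)
    return mae, psnr


# Toy example in Hounsfield units; the last pixel lies outside the body contour.
ct = [0, 40, 1000, -1000]
sct = [10, 30, 900, -1000]
mask = [True, True, True, False]
mae, psnr = masked_mae_psnr(sct, ct, mask)
print(mae)  # 40.0
```

Restricting both metrics to the body contour keeps the large, trivially matched air region around the patient from lowering the reported error.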

3.B | Dosimetric comparison between synthetic and real CT
For each testing patient, a VMAT radiotherapy plan was optimized on the planning CT and then recalculated on the corresponding sCT.
For the comparison of dose distribution, the result is given in Table 1. No significant differences were found for either the target or the OARs. The comparison of the DVH for one example patient is illustrated in Fig. 5.
Conversely, a VMAT plan was optimized on the synthetic CT following clinical protocols, then transferred to and calculated on the planning CT. The gamma analysis showed pass rates of 99.96% and 97.99% with the 3%/3 mm and 2%/2 mm criteria, respectively, which are close to the results obtained when the VMAT plan was optimized on the planning CT, as stated above.
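The gamma pass rates quoted above combine a dose-difference criterion with a distance-to-agreement criterion. The sketch below illustrates the idea on a 1D dose profile and is not the clinical tool used in this study: it assumes global normalization to the maximum reference dose, a discrete search without interpolation, and a 10% low-dose cutoff, none of which are specified in the text.

```python
from typing import Sequence


def gamma_pass_rate(
    ref_dose: Sequence[float],
    eval_dose: Sequence[float],
    spacing_mm: float,
    dose_crit: float = 0.03,       # 3% of the maximum reference dose (global)
    dta_mm: float = 3.0,           # 3 mm distance to agreement
    low_dose_cutoff: float = 0.1,  # assumed: skip points below 10% of max dose
) -> float:
    """Fraction of reference points with gamma <= 1 (1D, no interpolation)."""
    max_ref = max(ref_dose)
    dose_tol = dose_crit * max_ref
    passed = 0
    evaluated = 0
    for i, d_ref in enumerate(ref_dose):
        if d_ref < low_dose_cutoff * max_ref:
            continue
        evaluated += 1
        # Squared gamma: minimum over all evaluated points of the combined
        # normalized distance and normalized dose difference.
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2 + ((d_eval - d_ref) / dose_tol) ** 2
            for j, d_eval in enumerate(eval_dose)
        )
        if gamma_sq <= 1.0:
            passed += 1
    return passed / evaluated


ref = [0, 10, 50, 100, 100, 50, 10, 0]
shifted = [0, 0, 10, 50, 100, 100, 50, 10]  # same profile moved by one 1 mm step
scaled = [1.1 * d for d in ref]             # uniform 10% overdose
print(gamma_pass_rate(ref, ref, spacing_mm=1.0))      # 1.0
print(gamma_pass_rate(ref, shifted, spacing_mm=1.0))  # 1.0
print(gamma_pass_rate(ref, scaled, spacing_mm=1.0))
```

In this toy case a 1 mm shift passes everywhere because it lies within the 3 mm distance tolerance, whereas a uniform 10% dose error fails at the high-dose points.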

CONFLICT OF INTEREST
The authors have no relevant conflict of interest to disclose.