Deep learning based synthetic-CT generation in radiotherapy and PET: A review
Abstract
Recently, deep learning (DL)-based methods for the generation of synthetic computed tomography (sCT) have received significant research attention as an alternative to classical ones. We present here a systematic review of these methods by grouping them into three categories, according to their clinical applications: (i) to replace computed tomography in magnetic resonance (MR)-based treatment planning, (ii) to facilitate cone-beam computed tomography based image-guided adaptive radiotherapy, and (iii) to derive attenuation maps for the correction of positron emission tomography. Appropriate database searching was performed on journal articles published between January 2014 and December 2020. The DL methods' key characteristics were extracted from each eligible study, and a comprehensive comparison among network architectures and metrics was reported. A detailed review of each category was given, highlighting essential contributions, identifying specific challenges, and summarizing the achievements. Lastly, the statistics of all the cited works from various aspects were analyzed, revealing the popularity and future trends and the potential of DL-based sCT generation. The current status of DL-based sCT generation was evaluated, assessing the clinical readiness of the presented methods.
ACRONYMS & ABBREVIATIONS
- µ-map: attenuation map
- AC: attenuation correction
- aff: affine
- AG: attention gate
- CBCT: cone-beam computed tomography
- CC: cross-correlation
- CNNs: convolutional neural networks
- cor: coronal
- CT: computed tomography
- cycle-GAN: cycle-consistent generative adversarial network
- DD: dose difference
- def: deformable
- DL: deep learning
- DPR: dose pass rate
- DSC: Dice similarity coefficient
- DVH: dose-volume histogram
- FLAIR: fluid-attenuated inversion recovery
- FOV: field-of-view
- GANs: generative adversarial networks
- Gd: gadolinium
- GPR: gamma pass rate
- GRE: gradient recalled-echo
- H&N: head-and-neck
- IGART: image-guided adaptive radiation therapy
- MAE: mean absolute error
- MR: magnetic resonance
- MRAC: magnetic resonance attenuation correction
- MRI: magnetic resonance imaging
- MSE: mean squared error
- mUTE: multiple echo ultra-short echo time
- NCC: normalised cross-correlation
- OAR: organ-at-risk
- p: proton
- paed: paediatric
- PET: positron emission tomography
- absolute error positron emission tomography
- relative error positron emission tomography
- PSNR: peak signal-to-noise ratio
- rig: rigid
- RMSE: root mean squared error
- ROI: region of interest
- RS: range shift
- RT: radiotherapy
- sag: sagittal
- sCT: synthetic computed tomography
- SSIM: structural similarity index measure
- SUV: standard uptake values
- tra: transverse
- TSE: turbo spin-echo
- UTE: ultra-short echo time
- VOI: volume of interest
- x: photon
1 INTRODUCTION
Medical imaging's impact on oncological patients' diagnosis and therapy has grown significantly over the last decades.1 Especially in radiotherapy (RT),2 imaging plays a crucial role in the entire workflow, from treatment simulation to patient positioning and monitoring.3-6
Traditionally, computed tomography (CT) is considered the primary imaging modality in RT. It provides an accurate, high-resolution representation of the patient's geometry and enables the direct electron density conversion needed for dose calculations.7 X-ray based imaging, including planar imaging and cone-beam computed tomography (CBCT), is widely adopted for patient positioning and monitoring before, during, or after dose delivery.4 Along with CT, functional and metabolic information, mainly derived from positron emission tomography (PET), is commonly acquired, allowing tumor staging and improving tumor contouring.8 Magnetic resonance imaging (MRI) has also proved its added value for the delineation of tumors and organs-at-risk (OARs), owing to its superb soft tissue contrast.9, 10
To benefit from the complementary advantages offered by different imaging modalities, MRI is generally registered to CT.11 However, residual misregistration and differences in patient setup may introduce systematic errors that would affect the accuracy of the whole treatment.12, 13
Recently, MR-only based RT has been proposed14-16 to eliminate residual registration errors. Furthermore, it can simplify and speed up the workflow, decreasing the patient's exposure to ionizing radiation, which is particularly relevant for repeated simulations17 or fragile populations, for example, children. MR-only RT may also reduce overall treatment costs18 and workload.19 Additionally, the development of MR-only techniques can be beneficial for MR-guided RT.20
The main obstacle to the introduction of MR-only RT is the lack of the tissue attenuation information required for accurate dose calculations.12, 21 Many methods have been proposed to convert MR to CT-equivalent representations, often known as synthetic CT (sCT), for treatment planning and dose calculation. These approaches are summarised in dedicated reviews on this topic,22-24 in site-specific reviews,18, 25, 26 or in broader reviews on MR-guided RT27 or proton therapy.28
Additionally, similar techniques to derive sCT from a different imaging modality have been envisioned to improve the quality of CBCT.29 CBCT plays a vital role in image-guided adaptive radiation therapy (IGART) for photon and proton therapy. However, due to the severe scatter noise and truncated projections, image reconstruction is affected by several artifacts, such as shading, streaking, and cupping.30, 31 For these reasons, daily CBCT has not commonly been used for online plan adaptation. The conversion of CBCT-to-CT would allow accurate dose computation and improve the quality of IGART provided to the patients.
Finally, sCT estimation is also crucial for PET attenuation correction (AC). Accurate PET quantification requires a reliable photon AC map, usually derived from CT. In the new PET/MRI hybrid scanners, this step is not immediate, and MRI-to-sCT translation has been proposed to solve the MR attenuation correction (MRAC) issue. Besides, stand-alone PET scanners can benefit from the derivation of sCT from uncorrected PET.32-34
In recent years, the derivation of sCT from MRI, PET, or CBCT with artificial intelligence algorithms, such as machine learning or DL, has raised increasing interest.35 This paper aims to perform a systematic review and summarise the latest developments, challenges, and trends in DL-based sCT generation methods. DL is a branch of machine learning, a field of artificial intelligence, that uses neural networks to generate hierarchical representations of the input data and learn a specific task without hand-engineered features.36 Recent reviews have discussed the application of DL in RT37-43 and in PET AC.34 Convolutional neural networks (CNNs), the most successful type of model for image processing,44, 45 have been proposed for sCT generation since 2016,46 with a rapidly increasing number of papers published on the topic. However, DL-based sCT generation has not been reviewed in detail, except for applications in PET.47 With this survey, we aim to summarize the latest developments in DL-based sCT generation, highlighting the contributions based on the applications and providing detailed statistics that discuss trends in terms of imaging protocols, DL architectures, and performance achieved. Finally, the clinical readiness of the reviewed methods will be discussed.
2 MATERIALS AND METHODS
A systematic review of techniques was carried out following the PRISMA guidelines. The PubMed, Scopus, and Web of Science databases were searched for articles published between January 2014 and December 2020 using defined criteria (for more details, see Appendix). Studies related to radiation therapy, either with photons or protons, and to AC for PET were included when dealing with sCT generation from MRI, CBCT, or PET. This review considered external beam radiation therapy only, therefore excluding investigations focusing on brachytherapy. Conversion methods based on classical machine learning techniques were not considered, preferring only DL-based approaches. The generation of dual-energy CT was also not considered, nor was the direct estimation of corrected attenuation maps from PET. Finally, conference proceedings were excluded: although proceedings can contain valid methodologies, their large number and incomplete reporting of information made them unsuitable for this review. After the database search, duplicate articles were removed and the records screened for eligibility. A citation search of the identified articles was performed. The selected articles were subdivided into three categories according to their clinical application:
- I. MR-only RT;
- II. CBCT-to-CT for image-guided (adaptive) radiotherapy;
- III. PET attenuation correction.

Independent of the input image, that is, MRI, CBCT, or PET, the chosen architecture (CNN) can be trained with paired or unpaired input data and in different configurations. In this review, we define the following configurations: 2D (single slice or patch) when training was performed considering transverse (tra), sagittal (sag), or coronal (cor) images; 2D+ when independently trained 2D networks for different views were combined during or after inference; multi-2D (m2D, also known as multiplane) when slices from different views, for example, transverse, sagittal, and coronal, were provided to the same network; 2.5D when neighboring slices were provided to multiple input channels of one network; 3D when volumes were considered as input (either the whole volume or 3D patches). The architectures generally considered are introduced in the next section (Section 2.1). The sCTs are generated by running inference with the trained network on an independent test set, or by combining an ensemble of trained networks. Finally, the quality of the sCT can be evaluated with image-based or task-specific metrics (Section 2.2).
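To make the sampling configurations concrete, the snippet below sketches one way a 2.5D input could be assembled, stacking each transverse slice with its neighbors along the channel axis before feeding the network. The function name, array shapes, and edge-padding choice are illustrative assumptions, not taken from any reviewed implementation.

```python
import numpy as np

def make_25d_inputs(volume: np.ndarray, n_neighbors: int = 1) -> np.ndarray:
    """Stack each transverse slice with its +/- n_neighbors adjacent
    slices along a channel axis, yielding one 2.5D sample per slice.

    volume: (n_slices, H, W) array, e.g. an MR or CBCT volume.
    returns: (n_slices, 2 * n_neighbors + 1, H, W) array.
    """
    # Pad with edge slices so border samples keep the same channel count.
    padded = np.pad(volume, ((n_neighbors, n_neighbors), (0, 0), (0, 0)),
                    mode="edge")
    samples = [
        padded[i:i + 2 * n_neighbors + 1]   # the slice and its neighbors
        for i in range(volume.shape[0])     # one sample per slice
    ]
    return np.stack(samples)

# Example: a toy 8-slice "volume" of 16x16 images.
vol = np.random.rand(8, 16, 16)
x = make_25d_inputs(vol, n_neighbors=1)
print(x.shape)  # (8, 3, 16, 16): 3 input channels per sample
```

A 2D configuration corresponds to `n_neighbors=0` here, while a 3D configuration would instead crop whole sub-volumes rather than treating neighbors as channels.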
For each sCT generation category, we compiled tables summarizing the published techniques, including the key findings of each study and other pertinent factors: the anatomic site investigated; the number of patients included; relevant information about the imaging protocol; the DL architecture; the configuration chosen to sample the patient volume (2D, 2D+, m2D, 2.5D, or 3D); the use of paired or unpaired data during network training; the radiation treatment adopted, where appropriate; and the most popular metrics used to evaluate the quality of the sCT (see Section 2.2).
The year of publication of each article was noted according to the date of first online appearance. The popularity of the aforementioned fields was summarized with pie charts for each category. Specifically, we subdivided the papers according to the anatomical region they dealt with: abdomen, brain, head and neck (H&N), thorax, pelvis, and whole body; where available, the tumor site was also reported. A discussion of the clinical feasibility of each methodology and of the observed trends follows.
The most common network architectures and metrics will be introduced in the following sections to facilitate the tables' interpretation.
2.1 Deep learning for image synthesis
Medical image synthesis can be formulated as an image-to-image translation problem, where a model that maps an input image (A) to a target image (B) has to be found.48 Among all the possible strategies, DL methods have dramatically improved the state of the art.49 The DL approaches mainly used to synthesise sCT belong to the class of CNNs, where convolutional filters are combined through weights (also called parameters) learned during training. Depth is obtained by stacking multiple layers of filters.50 Training amounts to finding the "optimal" model parameters according to the search criterion defined by a loss function (L). Many CNN-based architectures have been proposed for image synthesis, the most popular being U-nets51 and generative adversarial networks (GANs)52 (see Figure 2). A U-net presents an encoding and a decoding path with additional skip connections to extract and reconstruct image features, thus learning to map domain A to domain B. In the simplest GAN architecture, two networks compete: a generator (G) trained to produce synthetic images G(A) that resemble the target set B, and a discriminator (D) trained to classify whether an image is a real B or a synthetic G(A), thereby improving G's performance.

GANs learn a loss that combines both tasks, resulting in realistic images.53 Given these premises, many variants of GANs can be arranged, with U-nets being employed as a possible generator in the GAN framework. We will not detail all possible configurations, since that is not the scope of this review, and we address the interested reader to References 54-56. A particular derivative of the GAN, called cycle-consistent GAN (cycle-GAN), is worth mentioning. Cycle-GANs opened the era of unpaired image-to-image translation.57 Here, two GANs are trained: one going from A to B, called the forward pass (forw), and a second going from B to A, called the backward pass (back), each with its related loss terms (Figure 2, bottom right). Two cycle-consistency losses are introduced, aiming at minimizing the differences between A and its cycle reconstruction (A passed through both generators) and, likewise, between B and its cycle reconstruction, enabling unpaired training.
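As a toy illustration of the cycle-consistency idea, the sketch below computes the two cycle terms for a pair of stand-in "generators" (simple invertible functions in place of trained networks). The L1 form of the consistency loss and the weighting factor are common choices in the cycle-GAN literature; all names here are our own, and the adversarial loss terms are omitted.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error, the usual choice for cycle-consistency terms."""
    return np.mean(np.abs(a - b))

def cycle_consistency_loss(G_AB, G_BA, a, b, lam=10.0):
    """Toy computation of the two cycle-consistency terms of a cycle-GAN.

    G_AB, G_BA: the forward (A->B) and backward (B->A) generators.
    a, b: unpaired samples from domains A and B.
    lam: weight of the cycle terms relative to the adversarial losses
         (omitted here for brevity).
    """
    a_cycled = G_BA(G_AB(a))   # A -> B~ -> A~ (forward cycle)
    b_cycled = G_AB(G_BA(b))   # B -> A~ -> B~ (backward cycle)
    return lam * (l1(a, a_cycled) + l1(b, b_cycled))

# Stand-in "generators": a perfectly invertible pair gives ~zero cycle loss.
G_AB = lambda x: 2.0 * x + 1.0
G_BA = lambda y: (y - 1.0) / 2.0
a = np.random.rand(4, 4)
b = np.random.rand(4, 4)
print(cycle_consistency_loss(G_AB, G_BA, a, b))  # ~0.0
```

In a real cycle-GAN, G_AB and G_BA are CNNs and this loss is added to the two adversarial losses, which is what allows training on unregistered, unpaired A/B data.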
2.2 Metrics
An overview of the metrics used to assess and compare the reviewed publications' performances is summarised in Table 1.
Category | Metric | Definition
---|---|---
Image similarity | MAE | MAE = (1/N) Σᵢ |CT(i) − sCT(i)|, with N = number of voxels in the ROI
Image similarity | (R)MSE | MSE = (1/N) Σᵢ (CT(i) − sCT(i))²; RMSE = √MSE
Image similarity | PSNR | PSNR = 10 log₁₀(MAX²/MSE), with MAX = dynamic range of the image
Image similarity | SSIM | combines mean, variance/covariance, and dynamic range of the two images
Geometry accuracy | DSC | DSC = 2|A ∩ B|/(|A| + |B|), with A, B = binary masks of the same structure on CT and sCT
Task-specific: MR-only and CBCT-to-CT | DD | DD = D(sCT) − D(CT), with D = dose
Task-specific: MR-only and CBCT-to-CT | DPR | DPR = % of voxels with |DD| below a set threshold in an ROI
Task-specific: MR-only and CBCT-to-CT | GPR | GPR = % of voxels with γ ≤ 1 in an ROI
Task-specific: MR-only and CBCT-to-CT | DVH | difference of specific points in the dose-volume histogram plot
Task-specific: PET reconstruction | AE/RE | absolute and relative error of the reconstructed PET
- Abbreviations: CBCT, cone-beam computed tomography; CT, computed tomography; DD, dose difference; DPR, dose pass rate; DSC, Dice similarity coefficient; DVH, dose-volume histogram; GPR, gamma pass rate; M(A)E, mean (absolute) error; MR, magnetic resonance; PET, positron emission tomography; PSNR, peak signal-to-noise ratio; (R)MSE, (root) mean squared error; ROI, region of interest; SSIM, structural similarity index measure.
Image similarity: The most straightforward way to evaluate the quality of the sCT is to calculate its similarity to the ground truth/target CT on a voxel-wise basis. The calculation of voxel-based image similarity metrics implies that sCT and CT are aligned by translation, rigid (rig), affine (aff), or deformable (def) registration. Widespread similarity metrics for this task are reported in Table 1 and include the mean (absolute) error (M(A)E), sometimes referred to as mean absolute prediction error, the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM). Other, less common metrics are the cross-correlation and normalised cross-correlation, along with the (root) mean squared error ((R)MSE).
M(A)E and (R)MSE are relatively easy to compute and, together with PSNR, are the most widely used fidelity measures. SSIM is a more sophisticated metric developed to take advantage of the known characteristics of the human visual system,58 which perceives the loss of image structure caused by variations in lighting.
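For reference, these fidelity metrics can be sketched in a few lines. The definitions below follow the standard formulas; note that the choice of the PSNR dynamic-range constant (here defaulting to the reference image's range) varies between studies, which is one reason reported values are hard to compare.

```python
import numpy as np

def mae(ct, sct):
    """Mean absolute error in HU over the evaluation ROI."""
    return np.mean(np.abs(ct - sct))

def rmse(ct, sct):
    """Root mean squared error in HU."""
    return np.sqrt(np.mean((ct - sct) ** 2))

def psnr(ct, sct, data_range=None):
    """Peak signal-to-noise ratio in dB.

    data_range: dynamic range MAX of the image; studies differ on
    whether this is the reference range or a fixed HU window.
    """
    if data_range is None:
        data_range = ct.max() - ct.min()
    return 10.0 * np.log10(data_range ** 2 / np.mean((ct - sct) ** 2))

# Toy example on a flat "phantom" with a constant 10 HU offset.
ct = np.full((32, 32), 100.0)
sct = ct + 10.0
print(mae(ct, sct), rmse(ct, sct))  # 10.0 10.0
```

MAE and RMSE coincide here only because the error is constant; RMSE penalizes large local errors (e.g., at bone/air interfaces) more strongly than MAE.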
Geometric accuracy: Along with voxel-based metrics, the geometric accuracy of the generated sCT can also be assessed by comparing the volume and morphology of delineated structures or of their corresponding binary masks. For example, the Dice similarity coefficient (DSC) is a common metric that assesses the accuracy of depicting specific tissue classes/structures, for example, bones, fat, muscle, air, and body. In this context, DSC is calculated after applying a threshold to CT and sCT and, if necessary, morphological operations on the binary masks. Additionally, metrics generally used to estimate segmentation accuracy can also be adopted, such as the Hausdorff distance or the mean absolute surface distance, which measure the maximum and the average distance between two sets of contours, respectively.
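A minimal sketch of the DSC computation on threshold-derived masks could look as follows; the 250 HU bone threshold is an arbitrary illustrative choice, as reviewed studies differ on thresholds and morphological post-processing.

```python
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks:
    DSC = 2|A intersect B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: vacuous perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Illustrative "bone" masks via a HU threshold (250 HU, arbitrary choice).
ct = np.array([[300, 300, 0], [0, 300, 0], [0, 0, 0]], float)
sct = np.array([[300, 0, 0], [0, 300, 0], [0, 0, 0]], float)
print(dice(ct > 250, sct > 250))  # 0.8
```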
Other image-based metrics are specific to each application and are presented in the appropriate subcategory of the following sections.
Task-specific metrics: In MR-only RT and CBCT-to-CT for adaptive RT, the accuracy of dose calculation on sCT is generally compared to the CT-based calculation in specific regions of interest (ROIs), for dose calculations performed with either photon (x) or proton (p) RT.
The most common voxel-wise metric is the dose difference (DD), calculated from the dose (D) in ROIs as well as in the whole body, target, or other structures of interest. The DD can be expressed as an absolute value (Gy) or as a relative one (%), either to the prescribed dose, the maximum dose, or the voxel-wise reference dose. The dose pass rate (DPR) is directly derived from DD, and is calculated as the percentage of voxels with a DD below a set threshold.
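As an illustration, DD and DPR could be computed as below; the normalisation to the prescribed dose and the 1% threshold are example choices among the options listed above.

```python
import numpy as np

def dose_difference(d_sct, d_ct, d_prescribed=None):
    """Voxel-wise dose difference; relative to the prescribed dose (%)
    when d_prescribed is given, otherwise absolute (Gy)."""
    dd = d_sct - d_ct
    return 100.0 * dd / d_prescribed if d_prescribed else dd

def dose_pass_rate(d_sct, d_ct, d_prescribed, threshold_pct=1.0, roi=None):
    """Percentage of (ROI) voxels whose |DD| is below threshold_pct
    of the prescribed dose."""
    dd = np.abs(dose_difference(d_sct, d_ct, d_prescribed))
    if roi is not None:
        dd = dd[roi]
    return 100.0 * np.mean(dd < threshold_pct)

# Toy 1D dose profiles, 60 Gy prescription: one voxel fails the 1% test.
d_ct = np.array([60.0, 30.0, 10.0, 0.0])
d_sct = np.array([60.3, 30.0, 11.0, 0.0])
print(dose_pass_rate(d_sct, d_ct, d_prescribed=60.0))  # 75.0
```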
Gamma (γ) analysis allows combining dose and spatial criteria,59 and it can be performed either in 2D or in 3D. Several parameters need to be set to perform γ-analyses, including the dose criterion, the distance-to-agreement criterion, local or global analysis, and the dose threshold. Interpretation and comparison of gamma index results between studies is challenging, since they depend on the chosen parameters, dose grid size, and voxel resolution.60, 61 Results of γ-analysis are generally expressed as the gamma pass rate (GPR), counting the percentage of voxels with γ ≤ 1, or as the mean γ in an ROI, generally defined based on a threshold of the reference dose distribution.
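A brute-force global γ evaluation on 1D dose profiles can be sketched as follows; the 2%/2 mm criteria and the 10% low-dose cut-off are example parameter choices, and clinical implementations operate on 2D/3D dose grids with interpolation rather than this voxel-wise search.

```python
import numpy as np

def gamma_pass_rate(d_ref, d_eval, spacing_mm, dose_crit_pct=2.0,
                    dist_crit_mm=2.0, low_dose_cut_pct=10.0):
    """Global gamma pass rate on 1D dose profiles (brute force).

    A reference voxel passes if some evaluated voxel satisfies
    sqrt((dDose/dose_crit)^2 + (dist/dist_crit)^2) <= 1.
    Voxels below low_dose_cut_pct of the reference maximum are ignored,
    as commonly done when defining the evaluation ROI.
    """
    dose_crit = dose_crit_pct / 100.0 * d_ref.max()   # global criterion
    x = np.arange(len(d_ref)) * spacing_mm            # voxel positions
    passed, total = 0, 0
    for i, dr in enumerate(d_ref):
        if dr < low_dose_cut_pct / 100.0 * d_ref.max():
            continue
        total += 1
        gamma_sq = ((d_eval - dr) / dose_crit) ** 2 \
                 + ((x - x[i]) / dist_crit_mm) ** 2
        if gamma_sq.min() <= 1.0:
            passed += 1
    return 100.0 * passed / total

# Identical profiles pass everywhere.
d = np.array([0.0, 10.0, 50.0, 60.0, 50.0, 10.0, 0.0])
print(gamma_pass_rate(d, d.copy(), spacing_mm=2.0))  # 100.0
```

The example illustrates why reported GPRs are hard to compare: changing `dose_crit_pct`, `dist_crit_mm`, the low-dose cut-off, or the grid spacing changes the result for the same pair of dose distributions.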
Dose-volume histograms (DVHs) are among the most widespread tools in the clinical routine.62 For the evaluation of sCT, the differences at clinically relevant DVH points are generally reported.
For sCT for PET AC, the absolute and relative errors of the PET reconstruction are usually reported, along with the difference in standard uptake values (SUV).
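A minimal sketch of these PET reconstruction errors, computed within an optional mask, could look as follows; the function name and the small floor guarding against division by zero are our own illustrative choices, and the symbols used for these errors vary between the reviewed studies.

```python
import numpy as np

def pet_errors(pet_sct, pet_ct, mask=None, eps=1e-6):
    """Mean absolute and mean relative (%) error of a PET reconstruction
    corrected with the sCT-derived mu-map versus the CT-based one."""
    if mask is None:
        mask = np.ones_like(pet_ct, dtype=bool)
    diff = pet_sct[mask] - pet_ct[mask]
    ref = np.maximum(pet_ct[mask], eps)   # floor to avoid division by zero
    abs_err = np.mean(np.abs(diff))
    rel_err = 100.0 * np.mean(diff / ref)
    return abs_err, rel_err

# Toy activity maps (e.g., SUV): a uniform 5% overestimation.
pet_ct = np.array([2.0, 4.0, 1.0])
pet_sct = 1.05 * pet_ct
ae, re = pet_errors(pet_sct, pet_ct)
print(round(ae, 4), round(re, 2))  # 0.1167 5.0
```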
Please note that differences may occur in the ROI where the metrics are calculated. For example, the mean absolute error (MAE) can be computed on the whole predicted volume, in a volume of interest (VOI), or on a cropped volume. In addition, the implementation of the metric computation can differ: DD, DPR, and GPR, for example, can be calculated on ROIs obtained from different dose thresholds and with 2D or 3D algorithms. In the following sections, we will highlight such differences and speculate on their impact.
3 RESULTS
Database searching led to 91 records on PubMed, 98 on Scopus, and 218 on Web of Science. After duplicate removal and content screening, 83 eligible papers were found.
Figure 3 summarises the number of articles published per year: 51, 15, and 17 articles for MR-only RT (category I), CBCT-to-CT for adaptive RT (category II), and sCT for PET AC (category III), respectively. The first conference paper appeared in 2016.46 Given that conference papers were excluded from our search, the first eligible work was published in 2017. In general, the number of articles increased over the years, except for CBCT-to-CT and sCT for PET AC, whose numbers remained stable in recent years. Figure 3 shows that the brain, pelvis, and H&N were the most popular anatomical regions investigated in DL-based sCT for MR-only RT, covering 80% of the studies. For CBCT-to-CT, the H&N and pelvic regions were the most explored sites, being present in 75% of the works. Finally, for PET AC, the H&N was investigated in the majority of the studies, followed by the pelvic region; together, they covered 75% of the publications.

The total number of patients included in the analyses was variable, but most studies dealt with fewer than 50 patients in all three categories. The largest patient cohorts included 402 (I),65 328 (II),66 and 193 patients (I),67 while the smallest studies included 10 patients68 and 10 volunteers69 (I).
Most papers enrolled adult patients. Pediatric patients represent a more heterogeneous dataset for network training, and their use was first investigated for AC in PET70 (79 patients) and, more recently, for photon and proton RT.71, 72
All the models were trained to perform a regression task from the input to sCT, except for two studies where networks were trained to segment the input image into a predefined number of classes, thus performing a segmentation task.73, 74
In most of the works, training was implemented in a paired manner, with unpaired training investigated in 13 of the 83 articles. Four studies compared paired against unpaired training.67, 75-77 The 2D networks were the most common across the three categories, being adopted in about 61% of cases, followed by the 3D (24%), 2.5D (10%), and 2D+ (6%) configurations. In some studies, multiple configurations were investigated.75, 78, 79 GANs were the most popular architectures (45 times), followed by U-nets (36) and other CNNs. Note that a U-net may be employed as the generator of a GAN, but such cases were counted as GANs.
All the investigations employed registration between sCT and CT to evaluate the quality of the sCT, except for Xu et al.77 and Fetty et al.,80 where metrics were defined to assess the quality of the sCT in an unpaired manner, for example, Frechet inception distance.
Main findings are reported in Table 2 for studies on sCT for MR-only RT without dosimetric evaluations, in Tables 3a and 3b for studies on sCT for MR-only RT with dosimetric evaluations, in Table 4 for studies on CBCT-to-CT for IGART, and in Table 5 for studies on PET AC. Tables are organised by anatomical site and tumor location where available. Studies investigating the independent training and testing of several anatomical regions are reported for each specific site.66, 77, 81-83 Works using the same network to train or test data from different scanners and anatomy are reported at the bottom of the table.84, 85 Detailed results based on these tables are presented in the following sections subdivided for each category.
Patients | MRI | DL method | Image similarity | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tumor site | Train | val | Test | x-Fold | Field (T) | Sequence | conf | arch | Reg | MAE (HU) | PSNR (dB) | SSIM | Others | Reference | |
Abdomen | Abdomen | 10a | 10 | LoO | n.a. | mDixon | 2D pair | GAN | def | 61 ± 3 | CC | Xu et al.69 | |||
Abdomen | 160 | LoO | n.a. | n.a. | 2D pair | GAN | rig | 5.1 ± 0.5 | 0.90 ± 0.43 | (F/M)SIM IS … | Xu et al.77 | ||||
Brain | Brain | 18 | 6x | 1.5 | 3D T1 GRE | 2D pair | U-net | rig | 85 ± 17 | MSE, ME | Han87 | ||||
Brain | 16 | LoO | n.a. | T1 | 2.5Dp pair | CNN+ | rig | 85 ± 9 | 27.3 ± 1.1 | Xiang et al.81 | |||||
Brain | 15 | 5x | 1.0 | T1 Gd | 2D pair | CNN | def | 102 ± 11 | 25.4 ± 1.1 | 0.79 ± 0.03 | Tissues | Emami et al.88 | |||
Brain | GAN | 89 ± 10 | 26.6 ± 1.2 | 0.83 ± 0.03 | Tissues | Emami et al.88 | |||||||||
Brain | 98CT | 10 | 3 | 3D T2 | 2D | GAN | aff | 19 ± 3 | 65.4 ± 0.9 | 0.25 ± 0.01 | Jin et al.89 | ||||
Brain | 84MR | pair/unp | |||||||||||||
Brain | 24 | LoO | n.a. | T1 | 3Dp pair | GAN | rig | 56 ± 9 | 26.6 ± 2.3 | NCC, HD body | Lei et al.90 | ||||
Brain | 33 | LoO | n.a. | T1d | 2D unp | GAN | No | 9.0 ± 0.8 | 0.75 ± 0.77 | (F/M)SIM IS … | Xu et al.77 | ||||
Brain | 28e | 2 | 15 | 1.5 | n.a. | 2D pair | GAN | aff | 134 ± 12 | 24.0 ± 0.9 | 0.76 ± 0.02 | Yang et al.91 | |||
Brain | 81 | 11 | 8x | 1.5 | 3D T1 GRE | 2D pair | U-net | aff | 45.4 ± 8.5 | 43.0 ± 2.0 | 0.65 ± 0.05 | Metrics for air | Massa et al.92 | ||
Brain | 3D T1 GRE Gd | 44.6 ± 7.4 | 43.4 ± 1.2 | 0.63 ± 0.03 | Air, bones | ||||||||||
Brain | 2D T2 SE | 45.7 ± 8.8 | 43.4 ± 1.2 | 0.64 ± 0.03 | Soft tissues | ||||||||||
Brain | 2D T2 FLAIR | 51.2 ± 4.5 | 44.9 ± 1.2 | 0.61 ± 0.04 | DSC bones | ||||||||||
Brain | 28 | 6 | 1.5 | T2 | 2D pair | U-net | rig | 65 ± 4 | 28.8 ± 0.6 | 0.972 ± 0.004 | Same metrics for | Li et al.76 | |||
2D unp | GAN | 94 ± 6 | 26.3 ± 0.6 | 0.955 ± 0.007 | Synthetic MRI | ||||||||||
Head and neck | Nasopharinx | 23 | 10 | 1.5 | T2 | 2D pair | U-net | def | 131 ± 24 | MAE ME tissue/bone | Wang et al.93 | ||||
H&N | 28 | 4 | 8x | 1.5 | 2D T1 ± Gd, T2 | 2D pair | GAN | aff | 76 ± 15 | 29.1 ± 1.6 | 0.92 ± 0.02 | DSC MAE bone | Tie et al.94 | ||
H&N | 60 | 30 | 3 | T1 | 2D unp | GAN | n.a. | 19.6 ± 0.7 | 62.4 ± 0.5 | 0.78 ± 0.2 | Kearney et al.95 | ||||
H&N | 7 | 8 | LoO | 1.5 | 3D T1, T2 | 2D pair | GAN | def | 83 ± 49 | ME | Largent et al.96 | ||||
H&N | 10 | LoO | 1.5 | 3D T1, T2 | 2D pair | GAN | def | 42–62 | RMSE, CC | Qian et al.68 | |||||
H&N | 32 | 8 | 5x | 3 | 3D UTE | 2D pair | U-net | def | 104 ± 21 | DSC, spatial corr | Su et al.97 | ||||
Pelvis | Prostate | 22 | LoO | n.a. | T1 | 2.5Dp pair | CNN+ | rig | 43 ± 3 | 33.5 ± 0.8 | Xiang et al.81 | ||||
Pelvis | 20 | LoO | n.a. | 3D T2 | 3Dp pair | GAN | rig | 51 ± 16 | 24.5 ± 2.6 | NCC, HD body | Lei et al.90 | ||||
Prostate | 20 | 5x | 1.5 | 2D T1 TSE | 2D pair | U-net | def | 41 ± 5 | DSC bone | Fu et al.78 | |||||
Pelvis human | 3Dp pair | 38 ± 5 | |||||||||||||
Pelvis canine | 27 | 3x | 3 | 3D T1 GRE | 3Dp | U-net | def | 32 ± 8 | 36.5 ± 1.6 | MAE/DSC bone | Florkow et al.98 | ||||
Pelvis | 18 | 1.5 | mDixonc | pair | 36 ± 4 | 36.1 ± 1.7 | Surf dist 0.5 mm | ||||||||
Pelvis | 15 | 4 | 5x | 3 | 3D T2 | 2D pair | CNN | def | 38 ± 6 | 29.5 ± 1.2 | 0.96 ± 0.01 | ME, PCC | Bahrami et al.99 | ||
Pelvis | U-net | 43 ± 9 | 28.2 ± 1.6 | 0.95 ± 0.01 | |||||||||||
Pelvis | 100 | 3 | 2D T2 FSE | 2D unp | GAN | No | FID | Fetty et al.80 | |||||||
Thorax | Breast | 14 | 2 | LoO | n.a. | n.a. | 2D pair | U-netb | def | DSC 0.74–0.76 | Jeon et al.73 |
- Abbreviations: arch, architecture; CNN, convolutional neural network; conf, configuration; CT, computed tomography; DL, deep learning; DSC, Dice similarity coefficient; FID, Frechet inception distance; FLAIR, fluid-attenuated inversion recovery; (F/M)SIM, (feature/multi-scale structural) similarity; GAN, generative adversarial network; Gd, gadolinium; GRE, gradient recalled-echo; HD, Hausdorff distance; H&N, head and neck; IS, inception score; LoO, leave-one-out; mDixon, multicontrast Dixon reconstruction; MAE, mean absolute error; ME, mean error; MR, magnetic resonance; MRI, magnetic resonance imaging; (N)CC, (normalised) cross-correlation; PCC, Pearson correlation coefficient; PSNR, peak signal-to-noise ratio; (R)MSE, (root) mean squared error; SSIM, structural similarity index; (T)SE, (turbo) spin-echo; val, validation; x-Fold, cross-fold.
- a Volunteers, not patients.
- b To segment CT into five classes.
- c Multiple combinations of Dixon images was investigated but omitted here.
- d Dataset from http://www.med.harvard.edu/AANLIB/.
- e Robustness to training size was investigated.
- * Multiple networks or architectures have been compared.
Patients | MRI | DL method | Image similarity | Dose | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tumor site | Train | val | Test | x-Fold | Field (T) | Sequence | conf | arch | Reg | MAE (HU) | PSNR (dB) | Others | Plan | DD (%) | GPR (%) | DVH | Others | Reference | |
Abdomen | Liver | 21 | LoO | 3 | 3D T1 | 3Dp pair | GAN | def | 73 ± 18 | 22.7 ± 3.6 | NCC | 99.4 ± 1.0b | 1% | Range | Liu et al.100 | ||||
GRE | |||||||||||||||||||
Abdomen | 12 | 4x | 0.3 | GRE | 2D pair | GANa | def | 90 ± 19 | 27.4 ± 1.6 | x | <±0.6 | 98.7 ± 1.5c | ±0.15 | Fu et al.75 | |||||
1.5 | 2D unp | 94 ± 30 | 27.2 ± 2.2 | +B | ±0.6 | 98.5 ± 1.6c | |||||||||||||
Abdomen | 46 | 31 | 3x | 3 | 3D T1 | 2.5D pair | U-net | syn | 79 ± 18 | MAE ME | x | 2 Gy | Liu et al.101 | ||||||
GRE | rig | organs | |||||||||||||||||
Abdomen | 39 | 19 | 0.35 | GRE | 2D pair | U-net | def | 79 ± 18 | ME | 0.1 | 98.7 ± 1.1c | 2.5% | Cusumano et al.82 | ||||||
Abdomen | 54 | 18 | 12 | 3x | 1.5 | 3D T1 | 3Dp pair | U-net | def | 62 ± 13 | 30.0 ± 1.8 | ME, DSC | x | 0.1 | 99.7 ± 0.3c | 2% | Beam | Florkow et al.72 | |
paed | 3 | GRE, T2 TSE | tissues | 0.5 | 96.2 ± 4.0c | 3% | Depth | ||||||||||||
Brain | Brain | 26 | 2x | 1.5 | 3D T1 | m2De pair | CNN | rig | 67 ± 11 | ME tissues | −0.1 ± 0.3 | 99.8 ± 0.7c | Beam | Dinkla et al.102 | |||||
GRE | DSC dist body | Depth | |||||||||||||||||
Brain | 40 | 10 | 1.5 | 3D T1 | 2D pair | CNN | def | 75 ± 23 | DSC | x | <0.2 ± 0.5 | 99.2b | LiuF et al.103 | ||||||
GRE Gd | |||||||||||||||||||
Brain | 54 | 9 | 14 | 5x | 1.5 | 2D T1 | 2D pair | GAN | rig | 47 ± 11 | Each fold | x | −0.7 ± 0.5 | 99.2 ± 0.8c | 1% | 2D/3D | Kazemifar et al.104 | ||
SE Gd | |||||||||||||||||||
Brain | 55 | 28 | 4 | 1.5 | 3D T1 | 2D pair | U-net | rig | 116 ± 26 | ME | x | c, 98 ± 2c | Range | Neppl et al.79 | |||||
GRE | 3Dp pair | 137 ± 32 | >98c, 97 ± 3c | ||||||||||||||||
Brain | 25 | 2 | 25 | 1.5 | 3D T1 | 3Dp | GAN | rig | 55 ± 7 | ME | x | 2 | 98.4 ± 3.5c | 1.65% | Range | Shafai et al.105 | |||
GRE | pair | DSC | |||||||||||||||||
Brain | 47 | 13 | 5x | 3 | T1 | 2D pair | U-net | rig | 81 ± 15 | ME air, | x | 2.3 ± 0.1 | Align | Gupta et al.106 | |||||
tissues | CBCT | ||||||||||||||||||
Brain | 12 | 2 | 1 | LoO | 3 | 3D T1 | 2D+ pair | U-net | rig | 54 ± 7 | ME, DSC tissues | 0.00 ± 0.01 | Range | Spadea et al.107 | |||||
GRE | |||||||||||||||||||
Brain | 15 | 5x | n.a. | T1, T2 | 2Dp pair | GAN | def | 108 ± 24 | Tissues | x | 0.7 | 99.2 ± 1.0c | 1% | Beam | Koike et al.108 | ||||
FLAIRg | Depth | ||||||||||||||||||
Brain | 30fh | 10 | 20 | 3x | 1.5 | 3D T1 | 2D+a pair | GANa | rig | 61 ± 14 | 26.7 ± 1.9 | ME DSC | x | −0.1 ± 0.3 | 99.5 ± 0.8c | 1% | Beam | Maspero et al.71 | |
paed | 3 | GRE ± Gd | SSIM | 0.1 ± 0.4 | 99.6 ± 1.1c | 3% | Depth | ||||||||||||
Brain | 66 | 11 | 5x | 1.5 | 2D T1 | 2D unp | GAN | rig | 78 ± 11 | 0.3 ± 0.3 | 99.2 ± 1.0c | 3% | Beam | Kazemifar et al.109 | |||||
SE Gd | Depth | ||||||||||||||||||
Brain | 242fh | 81 | 79 | 3 | 3D T1 | 3Dp pair | CNN | def | 81 ± 22 | Tissues | x | 0.13 ± 0.13 | 99.6 ± 0.3c | ±0.15 | Andres et al.65 | ||||
1.5 | GRE ± Gd | U-net | 90 ± 21 | 0.31 ± 0.18 | 99.4 ± 0.5c |
- Abbreviations: arch, architecture; CNN, convolutional neural network; conf, configuration; DL, deep learning; DRR, digitally reconstructed radiograph; DSC, Dice similarity coefficient; FLAIR, fluid-attenuated inversion recovery; GAN, generative adversarial network; Gd, gadolinium; GRE, gradient recalled-echo; LoO, leave-one-out; LtO, leave-two-out; MAE, mean absolute error; ME, mean error; MR, magnetic resonance; MRI, magnetic resonance imaging; NCC, normalised cross-correlation; p, proton plan; paed, pediatric; PSNR, peak signal-to-noise ratio; sCT, synthetic computed tomography; SE, spin-echo; SSIM, structural similarity index; TSE, turbo spin-echo; val, validation; x, photon plan; x-Fold, cross-fold.
- a Comparison with other architecture has been provided.
- b = .
- c = .
- e Trained in 2D on multiple view and aggregated after inference.
- f Robustness to training size was investigated.
- g Multiple combinations (also ± Dixon reconstruction, where present) of the sequences were investigated but omitted.
- h Data from multiple centers.
Patients | MRI | DL method | Image similarity | Dose | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tumor site | Train | val | Test | x-fold | Field (T) | Sequence | conf | arch | Reg | MAE (HU) | PSNR (dB) | Others | Plan | DD (%) | GPR (%) | DVH | Others | Reference | |
Pelvis | Prostate | 36 | 15 | 3 | T2 TSE | 2D pair | U-net | def | 30 ± 5 | ME tissues | x | 0.16 ± 0.09 | 99.4c | 0.2 Gy | Chen et al. 110 | ||||
Prostate | 39 | 4x | 3 | 3D T2 | 2D pair | U-net | def | 33 ± 8 | ME DSC dist body | x | −0.01 ± 0.64 | 98.5 ± 0.7c | Arabi et al.111 | ||||||
Prostate | 17 | LoO | 1.5 | T2 | 3Dp | GANa | rig | 51 ± 17 | 24.2 ± 2.5 | NCC, bone: | −0.07 ± 0.07 | 98 ± 6c | 1% | Range, | Liu et al.112 | ||||
unp | dist, uniform | Peak, | |||||||||||||||||
Prostate | 25 | 14 | 3x | 3 | 3D T2 | 2D pair | U-neta | def | 34 ± 8 | Tissues | x | 1% | 99.2 ± 1d | 1% | Largent et al.113 | ||||
TSE | GANa | 34 ± 8 | ME | 1% | 99.1 ± 1d | ||||||||||||||
Pelvis | 11h | 8 | 3 | T2 | 2D pair | GANa | def | 49 ± 6 | ME | x | 0.7 ± 0.4 | 99.2 ± 1.0c | 1.5% | Boni et al.114 | |||||
1.5 | TSE | organs | |||||||||||||||||
Pelvis | 26 | 15 | 10+19h | 0.35 | 3D T2 | 2.5D pair | GANa | def | 41 ± 4 | 31.4 ± 1 | ME MSE | x | <±1 | 1.5% | Fetty et al.115 | ||||
1.5/3 | bone | ||||||||||||||||||
Pelvis | 39 | 14 | 0.35 | GRE | 2D pair | U-net | def | 54 ± 12 | Tissues | 0.5 | 99.0 ± 0.7c | 1% | Cusumano et al.82 | ||||||
Rectum | 46h | 44 | 1.5 | 3D T2 | 2D pair | GAN | def | 35 ± 7 | ME | x | <±0.8 | 99.8 ± 0.1c | 1% | Bird et al.116 | |||||
bone | |||||||||||||||||||
Head & neck | H&N | 34 | 3x | 1.5 | 3D T2 | 3Dp | U-net | def | 75 ± 9 | ME | x | −0.07 ± 0.22 | 95.6 ± 2.9c | Dinkla et al.117 | |||||
TSE | pair | DSC bone | |||||||||||||||||
H&N | 15 | 12 | 3 | T1 | 2Dpa | GANa | def | 68 ± 2 | SSIM | 0.5 | <98c | 0.5 | Klages et al.118 | ||||||
GRE | pair | RMSE | |||||||||||||||||
H&N | 30 | 15 | 3 | T1 ± Gd | 2D pair | GANa | rig | 70 ± 12 | 29.4 ± 1.3 | SSIM | −0.3 ± 0.2 | 97.8 ± 0.9c | Qi et al.119 | ||||||
T2 TSEg | U-net | 71 ± 12 | 29.2 ± 1.3 | DSC, DRR | −0.2 ± 0.2 | 97.6 ± 1.3c | |||||||||||||
H&N | 135f | 10 | 28 | 3 | 3D T1 | 2D pair | GANa | def | 70 ± 9 | ME, DSC | x | −0.1 ± 0.3 | 98.7 ± 1.0c | 1.5% | Beam | Peng et al.67 | |||
GRE | 2D unp | 101 ± 8 | tissues | 0.1 ± 0.4 | 98.5 ± 1.1c | 1.5% | Depth | ||||||||||||
H&N | 27 | 3x | 3 | 3D T1 | 2D+ | GAN | def | 65 ± 4 | ME | p | <±0.2 | 93.5 ± 3.4c | 1.5% | NTCP | Thummerer et al.120 | ||||
GRE | pair | DSC | RS | ||||||||||||||||
Thorax | Breast | 12f | 18 | LtO | 1.5 | 3D GRE | 2Dpe | GANa | def | 94 ± 11 | NCC | 0.5 | 98.4 ± 3.5c | DRR | Olberg et al.121 | ||||
mDixon | pair | 103 ± 15 | dist bone | ||||||||||||||||
Multiple sites with one network | |||||||||||||||||||
Prostate | 32 | 27 | 3 | 3D T1 | 2D pair | GAN | rig | 60 ± 6 | ME | x | −0.3 ± 0.4 | 99.4 ± 0.6b | 1% | Maspero et al.86 | |||||
Rectum | 18 | 1.5 | GRE | 56 ± 5 | −0.3 ± 0.5 | 98.5 ± 1.1b | |||||||||||||
Cervix | 14 | 1.5/3 | mDixon | 59 ± 6 | −0.1 ± 0.3 | 99.6 ± 1.9b |
- Abbreviations: arch, architecture; CC, cross-correlation; CNN, convolutional neural network; conf, configuration; CT, computed tomography; DD, dose difference; DL, deep learning; DRR, digitally reconstructed radiograph; DSC, Dice score coefficient; DVH, dose-volume histogram; FID, Frechet inception distance; FLAIR, fluid-attenuated inversion recovery; GAN, generative adversarial network; Gd, gadolinium; GRE, gradient recalled-echo; HD, Hausdorff distance; H&N, head and neck; LoO, leave-one-out; LtO, leave-two-out; mDixon, multicontrast Dixon reconstruction; ME, mean error; MR, magnetic resonance; MRI, magnetic resonance imaging; NCC, normalised cross-correlation; NTCP, normal tissue complication probability; PSNR, peak signal-to-noise ratio; RMSE, root mean squared error; SSIM, structural similarity index; (T)SE, (turbo) spin-echo; val, validation; x-fold, cross-fold; x, photon plan; p, proton plan.
- a A comparison with other architectures has been provided.
- b = .
- c = .
- d = .
- e Trained in 2D on multiple views and aggregated after inference.
- f Robustness to training size was investigated.
- g Multiple combinations (also ±Dixon reconstruction, where present) of the sequences were investigated but omitted.
- h Data from multiple centers.
Patients | DL method | Image similarity | Dose | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tumor site | Train | val | Test | x-Fold | conf | arch | Reg | MAE (HU) | PSNR (dB) | SSIM | Others | Plan | DD (%) | DPR (%) | GPR (%) | DVH | Others | Reference | |
Abdomen | Pancreas | 30 | LoO | 3Dp | GANa | def | 56.9 ± 13.8 | 28.8 ± 2.5 | 0.71 ± 0.03 | NCC | x | <1Gy | Liu et al.131 | ||||||
pair | SNU | ||||||||||||||||||
Thorax | Thorax | 53 | 15 | 2D pair | GAN | def | 94 ± 32 | ME DSC HD tis | x | 76.7 ± 17.3c | 93.8 ± 5.9c | 2.6 | Eckl et al.66 | ||||||
Brain | 24 | LoO | 3Dp | GAN | rig | 13 ± 2 | 37.5 ± 2.3 | NCC | No | Harms et al.83 | |||||||||
Pelvis | Pelvis | 20 | pair | 16 ± 5 | 30.7 ± 3.7 | SNU | |||||||||||||
Prostate | 16 | 4 | 5x | 2D pair | U-net | def | 50.9 | 0.967 | SNU | No | Kida et al.124 | ||||||||
RMSE | |||||||||||||||||||
Prostate | 27 | 7 | 8 | 2D pair | U-neta | def | 58 | ME | x | 99.5c | DPR | Landry et al.132 | |||||||
88.5b | 96.5c | DPR RS | |||||||||||||||||
Prostate | 18 | 8 | 4x | 2D ens | GANe | rig | 87 ± 5 | ME | x | 99.9 ± 0.3c | <±1.5% | DPR | Kurz et al.133 | ||||||
unp | 80.5 ± 5c | 95.9 ± 2.0c | 1% | DPRb RS | |||||||||||||||
Prostate | 16 | 4 | 2D pair | GANa | rig | SSIM | No | Kida et al.126 | |||||||||||
diffROI | |||||||||||||||||||
Pelvis | 205 | 15 | 2D pair | GAN | def | 42 ± 5 | ME DSC HD tis | x | 88.9 ± 9.3c | 98.5 ± 1.7c | 1 | Eckl et al.66 | |||||||
H&N | 81 | 9 | 20 | 2D unp | GANa | def | 29.9 ± 4.9 | 30.7 ± 1.4 | 0.85 ± 0.03 | RMSE | x | 98.4 ± 1.7c | Liang et al.128 | ||||||
phantom | 96.3 ± 3.6 | ||||||||||||||||||
Nasophar | 50 | 10 | 10 | 2D pair | U-net | rig | 6-27 | ME | x | 0.2 ± 0.1 | 95.5 ± 1.6 | 1% | Li et al.129 | ||||||
organs | |||||||||||||||||||
H&N | 30 | 7 | 7 | 2D pair | U-neta | rig | 18.98 | 33.26 | 0.8911 | RMSE | No | Chen et al.125 | |||||||
H&N | 50f | 10 | 2.5D pair | U-net | rig | 49.28 | 14.25 | 0.85 | SNR | No | Yuan et al.127 | ||||||||
H&N | 22 | 11 | 3x | 2Dd pair | U-net | def | 36 ± 6 | ME DSC | −0.1 ± 0.3 | 98.1 ± 1.2c | RS | Thummerer et al.134 | |||||||
SNU | |||||||||||||||||||
H&N | 30 | 14 | 2D pair | GAN | def | 82.4 ± 10.6 | ME | x | 91.0 ± 5.3c | 1 Gy | Barateau et al.130 | ||||||||
tissues | 1% | ||||||||||||||||||
H&N | 25 | 15 | 2D pair | GAN | def | 77.2 ± 16.6 | ME DSC HD tis | x | 91.5 ± 4.3c | 95.0 ± 2.4c | 2.4 | Eckl et al.66 | |||||||
Multiple sites with one network | |||||||||||||||||||
H&N | 15 | 8 | 10 | 2D unpa | GANa | rig | 53 ± 12 | 30.5 ± 2.2 | 0.81 ± 0.04 | ME | x | 0.1 ± 0.5 | 97.8 ± 1c | <2% | Maspero et al.84 | ||||
Lung | 15 | 8 | 10 | 83 ± 10 | 28.5 ± 1.6 | 0.78 ± 0.04 | 0.2 ± 0.9 | 94.9 ± 3c | |||||||||||
Breast | 15 | 8 | 10 | 66 ± 18 | 29.0 ± 2.1 | 0.76 ± 0.02 | 0.1 ± 0.4 | 92 ± 8c | |||||||||||
Pelvis | 135 | 15 | 15 | 10x | 2.5D pair | GANa | def | 24 ± 5 | 20.1 ± 3.4 | x | <1% | Zhang et al.85 | |||||||
H&N | 10 | 24 ± 4 | 22.8 ± 3.4 | RS | Zhang et al.85 |
- Abbreviations: arch, architecture; conf, configuration; CT, computed tomography; DD, dose difference; DL, deep learning; DSC, Dice score coefficient; DVH, dose-volume histogram; ens, ensemble; GAN, generative adversarial network; HD, Hausdorff distance; H&N, head and neck; LoO, leave-one-out; ME, mean error; NCC, normalised cross-correlation; p, proton plan; PSNR, peak signal-to-noise ratio; RMSE, root mean squared error; RS, range shift; SSIM, structural similarity index; val, validation; x-fold, cross-fold; x, photon plan.
- a A comparison with other architectures has been provided.
- b Dose pass rate (DPR) 1% or = .
- c DPR 2% or = ; DPR 3% or = .
- d Trained in 2D on multiple views and aggregated after inference.
- e Different nets were trained, and their outputs were weighted to obtain the final sCT.
- f Robustness to training size was investigated.
Patients | MRI | DL method | Image similarity | PET-related | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Region | Train | val | Test | x-Fold | Field | Contrast | conf | arch | Reg | MAE (HU) | DSC | Tracer | (%) | Others | Reference |
Pelvis | 10 | 16 | 3d | Dixon ± ZTE | 3Dp pair | U-net | def | F-FDG | RMSE | Leynes et al.138 | |||||
Ga-PSMA | SUV diff | ||||||||||||||
Pelvis | 15 | 4 | 4 | 3d | T1 GREb | 2D pair | U-net | def | F-FDG | 1.8 ± 2.4 | -map diff | Torrado et al.142 | |||
Dixon | 1.7 ± 2.0j | ||||||||||||||
1.8 ± 2.4i | |||||||||||||||
3.8 ± 3.9h | |||||||||||||||
Pelvis | 12 | 6 | 3d | T1 GREn | 3Dp pair | CNNl | def | 0.99 ± 0.00i | F-FDG | RMSE | Bradshaw et al.74 | ||||
T2 TSE | 0.48 ± 0.21g | ||||||||||||||
0.94 ± 0.01j | |||||||||||||||
0.88 ± 0.03k | |||||||||||||||
0.98 ± 0.01i | |||||||||||||||
Prostate | 18 | 10 | 3d | Dixon | 2D pair | GANa | def | Ga-PSMA | SSIM | Pozaruk et al.145 | |||||
0.75 ± 0.64e | -map diff | ||||||||||||||
0.52 ± 0.62f | |||||||||||||||
Head | 30 | 10 | 1.5 | T1 GRE | 2D pair | CNNl | def | 0.971 ± 0.005g | n.a. | −0.7 ± 1.1 | Liu et al.147 | ||||
5 | Gd | 0.936 ± 0.011i | |||||||||||||
0.803 ± 0.021h | |||||||||||||||
Head | 30b + 6 | 8 | 1.5b+3d | UTE | 2D pair | U-netl | def | 0.76 ± 0.03g | F-FDG | 1 | Jang et al.141 | ||||
0.96 ± 0.01i | |||||||||||||||
0.88 ± 0.01h | |||||||||||||||
H&N | 32 | 8 | 5 | 3d | Dixon | 2D pair | U-net | rig | 13.8 ± 1.4 | 0.76 ± 0.04h | F-FDG | 3 | Gong et al.140 | ||
12 | 2 | 7 | ± ZTE | 12.6 ± 1.5 | 0.80 ± 0.04h | ||||||||||
Head | 60 | 19 | 4 | 3d | mDixon | 3Dp | U-net | rig | 0.90 ± 0.07m | F-FET | biol tumor | Ladefoged et al.70 | |||
paed | +UTE | pair | vol, SUV | ||||||||||||
Head | 40 | 2 | 3 | T1 GRE | 3Dp | GAN | def | 101 ± 40 | 0.80 ± 0.07h | F-FDG | 3.2 ± 3.4 | rel vol dif | Arabi et al.148 | ||
pair | 302 ± 79h | 1.2 ± 13.8h | surf dist ME | ||||||||||||
407 ± 228g | 3.2 ± 13.6i | RMSE PSNR | |||||||||||||
10 ± 5i | 3.2 ± 13.6g | SSIM SUV | |||||||||||||
Head | 44 | 11 | 11 | 1.5 | T1 GRE | 2.5D pair | U-net | rig | C-WAY | −0.49 ± 1.7 | synt -map, | Spuhler et al.149 | |||
C-DASB | −1.52 ± 0.73 | kin anal | |||||||||||||
Head | 23 | 47 | 3d | ZTE | 3Dp pair | U-net | def | 0.81 ± 0.03h | F-FDG | −0.2 ± 5.6 | Jac | Blanc-Durand et al. 143 | |||
Head | 32 | 4 | 3d | Dixonn | 3Dp pair | GANa | def | 15.8 ± 2.4% | 0.74 ± 0.05h | F-FDG | −1.0 ± 13 | SUV | Gong et al. 146 | ||
Head | 35 | 5 | 3 | mDixon | 2.5D pair | U-net | rig | 10.94 ± 0.01% | 0.87 ± 0.03h | C-PiB | 2 | Gong et al.144 | |||
UTEn | F-MK | ||||||||||||||
Thorax | 14 | LoO | 3d | Dixonn | 2D pair | GANa | def | 67.45 ± 9.89 | F-NaF | PSNR SSIM | Baydoun et al.139 | ||||
RMSE | |||||||||||||||
Other than MR-based sCT | |||||||||||||||
Body | 100 | 28 | PET, no att corrected | 2D pair | U-net | Yo | 111 ± 16 | 0.94 ± 0.01h | F-FDG | −0.6 ± 2.0 | abs err | Liu et al.150 | |||
Body | 80 | 39 | PET, no att corrected | 3Dp pair | GAN | Yo | 109 ± 19 | 0.87 ± 0.03h | F-FDG | 1.0 | NCC PSNR ME | Dong et al.151 | |||
Body | 100 | 25 | PET, no att corrected | 2.5D pair | GAN | Yo | F-FDG | −0.8 ± 8.6 | SUV ME | Armanious et al.152 |
- a A comparison with other architectures has been provided.
- b Data from another MRI sequence were used for pretraining.
- d MRI data from hybrid PET/MRI scanner.
- e In SUV max.
- f In SUV mean.
- g In air or bowel gas.
- h In the bony structures.
- i In the soft tissue.
- j In the fatty tissue.
- k In water.
- l Trained to segment the CT/sCT into classes.
- m Expressed in terms of Jaccard index and not DSC.
- n Multiple combinations (also ±Dixon reconstruction, where present) of the sequences were investigated but omitted.
- o Intrinsically registered: PET-CT data.
3.1 MR-only radiotherapy
The first work published in this category, and among all the categories, was by Han in 2017, who proposed a paired U-net for brain sCT generation. One year later, the first work with a dosimetric evaluation was presented by Maspero et al.,86 investigating a 2D-paired GAN trained on prostate patients and evaluated on prostate, rectal, and cervical cancer patients.
Considering the imaging protocol, we can observe that most of the MRIs were acquired at 1.5 T (51.9%), followed by 3 T (42.6%), with the remaining 6.5% at 1 T or 0.35/0.3 T. The most popular MRI sequence depends on the anatomical site: T1 gradient recalled-echo (T1 GRE) for abdomen and brain; T2 turbo spin-echo (TSE) for pelvis and H&N. Unfortunately, for more than 10 studies, either the sequence or the magnetic field strength was not adequately reported.
Generally, a single MRI sequence is used as input. However, eight studies investigated the use of multiple input sequences or Dixon reconstructions,69, 72, 86, 94, 95, 98, 108, 121 based on the assumption that more input contrasts may facilitate sCT generation. A relevant aspect related to MRI is the kind of preprocessing applied to the data before it is fed to the network. Generally, intensity normalization techniques such as z-score,122 percentile-based71, 86 or range-based normalization, histogram matching,75, 78, 81, 94 or linear rescaling were applied.107, 123 In addition, bias field correction65, 75, 78, 80, 81, 87, 90, 91, 94, 96, 100, 101, 105, 108, 111, 118 and intensity inhomogeneity correction65, 75, 78, 80, 81, 87, 90, 91, 94, 96, 100, 101, 105, 108, 111 were applied to minimise interpatient intensity variations.
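To make the preprocessing step concrete, a minimal numpy sketch of z-score and percentile normalization might look as follows (function names and masking details are ours; the cited studies differ in specifics such as clipping and mask definition):

```python
import numpy as np

def zscore_normalize(mr, mask=None):
    """Z-score normalization of an MR volume (illustrative helper;
    real pipelines may restrict statistics to a body mask)."""
    voxels = mr[mask] if mask is not None else mr.ravel()
    return (mr - voxels.mean()) / (voxels.std() + 1e-8)

def percentile_normalize(mr, p=95.0):
    """Rescale intensities so that the p-th percentile maps to 1."""
    return mr / (np.percentile(mr, p) + 1e-8)
```

Such normalization reduces interpatient and interscanner intensity variability, which is the stated goal of the preprocessing steps above.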
Some studies compared the performance of sCT generation depending on the acquired sequence. For example, Massa et al.92 compared sCTs from the most adopted MRI sequences in the brain, that is, T1 GRE with (+Gd) and without gadolinium (−Gd), T2 SE, and T2 fluid-attenuated inversion recovery, obtaining the lowest MAE and highest PSNR for T1 GRE sequences with Gd administration. Florkow et al.98 investigated how the performance of a 3D patch-based paired U-net was affected by different combinations of T1 GRE images and their Dixon reconstructions, finding that using multiple Dixon images is beneficial in the human and canine pelvis. Qi et al.119 studied the impact of combining T1 (±Gd) and T2 TSE, finding that their 2D-paired GAN trained on multiple sequences outperformed any model trained on a single sequence.
Focusing on the DL model configuration, we found that 2D models were the most popular, followed by 3D patch-based and 2.5D models. Only one study adopted a multi-2D (m2D) configuration.102 Three studies also investigated the impact of combining sCTs from multiple 2D models after inference (2D+), showing that 2D+ is beneficial compared to a single 2D view.71, 107, 118 When comparing the performance of 2D against 3D models, Fu et al.78 found that a modified 3D U-net outperformed a 2D U-net, while Neppl et al.79 published one month later that their 3D U-net underperformed a 2D U-net, not only on image similarity metrics but also on photon and proton DDs. These contradictory results will be discussed later. Paired models were the most adopted, with only ten studies investigating unpaired training.67, 75-77, 80, 89, 91, 95, 109, 112 Interestingly, Li et al.76 compared a 2D U-net trained in a paired manner against a cycle-GAN trained in an unpaired manner, finding that image similarity was higher with the U-net. Similarly, two other studies compared 2D-paired against unpaired GANs, achieving slightly better similarity and lower DD with paired training in the abdomen75 and H&N.67 Mixed paired/unpaired training was proposed by Jin et al.,89 who found the technique beneficial compared to either paired or unpaired training alone. Yang et al.91 found that structure-constrained loss functions and spectral normalization improved unpaired training performance in the pelvic and abdominal regions.
An interesting study on the impact of the direction of patch-based 2D slices, patch size, and GAN architecture was conducted by Klages et al.,118 who reported that 2D+ is beneficial compared to single-view (2D) training, that the choice between overlapping and nonoverlapping patches is not crucial, and that, given good registration, training of paired GANs outperforms unpaired training (cycle-GANs).
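A 2D+ aggregation scheme of this kind can be sketched as follows, assuming a slice-wise 2D model applied along the three orthogonal axes; here an identity placeholder stands in for the trained network so the example stays self-contained:

```python
import numpy as np

def predict_view(volume, axis):
    """Placeholder for slice-wise 2D inference along one axis.
    A real model would map each MR slice to an sCT slice; here the
    identity is used to keep the sketch self-contained."""
    slices = [s for s in np.moveaxis(volume, axis, 0)]
    return np.moveaxis(np.stack(slices), 0, axis)

def aggregate_2dplus(volume):
    """Run 2D inference along axial, sagittal, and coronal axes and
    average the three resulting volumes voxel-wise (2D+)."""
    preds = [predict_view(volume, axis) for axis in range(3)]
    return np.mean(preds, axis=0)
```

Voxel-wise averaging is one common aggregation choice; median aggregation is a plausible alternative when outlier slices are a concern.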
Turning to the architectures employed, we can observe that GANs cover the majority of the studies (55%), followed by U-net (35%) and other CNNs (10%). A detailed comparison of different 2D-paired GANs against U-nets with different loss functions by Largent et al.113 showed that U-net and GANs can achieve similar image- and dose-based performances. Fetty et al.115 focused on comparing different generators of a 2D-paired GAN against an ensemble of models, finding that the ensemble was overall better than single models, being more robust when generalizing to data from different scanners/centers. Among CNN architectures, it is worth mentioning the 2.5D dilated CNN of Dinkla et al.,102 where m2D training was claimed to increase the robustness of inference in a 2D+ manner while maintaining a large receptive field and a low number of weights.
An interesting aspect, investigated by several studies, is the impact of the training size,65, 67, 71, 91, 121 which will be further reviewed in the discussion section.
Finally, when considering the metric performances, we found that 21 studies reported only image similarity metrics, while 30 also investigated the accuracy of sCT-based dose calculation for photon RT (19), proton RT (8), or both (3). Two studies performed treatment planning considering the contribution of the magnetic field,75, 82 which is crucial for MR-guided RT. Also, only four publications studied the robustness of sCT generation in a multicentric setting.65, 71, 114, 116
Overall, DL-based sCT resulted in a DD within 1% on average and a GPR above 95%, except for one study.120 For each anatomical site, the metrics on image similarity and dose were not always calculated consistently. This aspect will be detailed in the next section.
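For reference, the two image similarity metrics most often reported in the tables above, MAE and PSNR within a body mask, can be computed as in this illustrative sketch (the dynamic range used as the PSNR peak is an assumption here and varies across studies):

```python
import numpy as np

def mae_hu(sct, ct, mask):
    """Mean absolute error in HU inside a body mask."""
    return np.abs(sct[mask] - ct[mask]).mean()

def psnr_db(sct, ct, mask, dynamic_range=2000.0):
    """Peak signal-to-noise ratio in dB; the 'peak' value (here an
    assumed 2000 HU range) differs between publications, which is one
    reason PSNR values are hard to compare across studies."""
    mse = np.mean((sct[mask] - ct[mask]) ** 2)
    return 10.0 * np.log10(dynamic_range ** 2 / mse)
```

The mask dependence illustrates why metrics computed on the whole FOV, the body contour, or specific tissue classes are not directly comparable.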
3.2 CBCT-to-CT generation
CBCT-to-CT conversion via DL is the most recent CT synthesis application, with the first paper published in 2018.124 Some of the works (5 out of 15) focused only on improving CBCT image quality for better IGRT.83, 124-127 The remaining 10 proved the validity of the conversion with dosimetric studies for photons,66, 71, 101, 128-131 protons,120 or both photons and protons.85, 132, 133
Only three studies investigated unpaired training;84, 128, 133 in 11 cases, paired training was implemented by matching the CBCT and the ground truth CT with rigid or deformable registration. In Eckl et al.,66 however, CBCT and CT were not registered for the training phase, as the authors claimed the first-fraction CBCT was geometrically close enough to the planning CT for the network. Deformable registration was then performed for the image similarity analysis. In this work, the quality of contours propagated from CT to sCT was compared to manual contours drawn on the CT to assess each step of the IGART workflow: image similarity, anatomical segmentation, and dosimetric accuracy. The network, a 2D cycle-GAN implemented in vendor-provided research software, was independently trained and tested on different sites (H&N, thorax, and pelvis), with the best results obtained for the pelvic region.
Other authors studied training a single network for different anatomical regions. Maspero et al.84 compared the performance of three cycle-GANs trained independently on three anatomical sites (H&N, breast, and lung) versus a single network trained on all the anatomical sites together, finding similar results in terms of image similarity.
Zhang et al.85 trained a 2.5D conditional GAN57 with feature matching on a large cohort of 135 pelvic patients. They then tested the network on an additional 15 pelvic patients acquired with a different CT scanner and on 10 H&N patients. The network predicted sCTs with similar MAE for both test groups, demonstrating the potential of transferring pretrained models to different anatomical regions. They also compared different GAN flavors and a U-net, finding the latter statistically worse than any GAN configuration.
Three works tested unpaired training with cycle-GANs.84, 128, 133 In particular, Liang et al.128 compared unsupervised training among cycle-GAN, DCGAN,135 and PGGAN136 on the same dataset, finding the first to perform best in terms of both image similarity and dose agreement.
Regarding the body region, most of the studies focused on the H&N and pelvic regions. Liu et al.131 investigated CBCT-to-CT conversion in the framework of breath-hold stereotactic pancreatic RT, training a 3D patch cycle-GAN with an attention gate (AG)137 to deal with moving organs. They found that the cycle-GAN with AG performed better than both a U-net and a cycle-GAN without AG. Moreover, the DL approach led to a statistically significant improvement of sCT over CBCT, although some residual discrepancies remained for this particularly challenging anatomical site.
3.3 PET attenuation correction
DL methods for deriving sCT for PET AC have been published since 2017.138 Two image translations fall into this category: (i) MR-to-CT for MRAC, for which 14 papers were found; and (ii) uncorrected PET-to-CT, with three published articles.
In the first case, most methods have been tested with paired data in the H&N (nine papers) and pelvic (four papers) regions, except for Baydoun et al.,139 who investigated the thorax. The number of patients used for training ranged between 10 and 60. Most of the MR images employed in these studies were acquired directly on 3 T PET/MRI hybrid scanners, where specific MR sequences, such as ultra-short echo time (UTE) and zero echo time (ZTE), are used to enhance tissues with short relaxation times, such as cortical bone, and Dixon reconstruction is employed to derive fat and water images.
Leynes et al.138 compared Dixon-based sCT against sCT predicted by a U-net receiving both Dixon and ZTE images. Results showed that the DL prediction reduced the root mean squared error (RMSE) of the corrected PET SUV by a factor of 4 for bone lesions and 1.5 for soft-tissue lesions. Following this first work, other authors showed the improvement of DL-based AC over the traditional atlas-based MRAC proposed by the vendors,70, 139-144 also comparing several network configurations.145, 146
Torrado et al.142 pretrained their U-net on 19 healthy brains acquired with GRE MRI and subsequently trained the network using Dixon images of colorectal and prostate cancer patients. They showed that pretraining led to faster training and a slightly smaller residual error than random initialization of the U-net weights.
Pozaruk et al.145 proposed data augmentation on 18 prostate cancer patients by perturbing the deformation field used to match the MR/CT pairs fed to the network. They compared the performance of a GAN with augmentation versus (1) Dixon-based and (2) Dixon + bone segmentation MRAC from the vendor, and a U-net (3) with and (4) without augmentation. They found significant differences between the three DL methods and the classic MRAC routines. The GAN with augmentation performed slightly better than the U-net with/without augmentation, although the differences were not statistically significant.
Gong et al.146 used unregistered MR/CT pairs for a 3D patch cycle-GAN, comparing the results against atlas-based MRAC and a CNN trained with registered pairs. Both DL methods performed better than atlas MRAC in terms of DSC and MAE, among the reported metrics. No significant difference was found between the CNN and the cycle-GAN. They concluded that the cycle-GAN can potentially remove the requirement of a perfectly aligned training dataset, although it needs more input data to improve the output.
Baydoun et al.139 benchmarked different network configurations (VGG16,153 VGG19,153 and ResNet154) against a 2D conditional GAN receiving either two Dixon inputs (water and fat) or four (water, fat, in-phase, and opposed-phase). The GAN always performed better than VGG19 and ResNet, with more accurate results obtained with four inputs.
In the effort to reduce image acquisition time and patient discomfort, some authors proposed obtaining the sCT directly from diagnostic T1- or T2-weighted images, using images from stand-alone MRI scanners111, 147, 149 or hybrid machines.74 In particular, Bradshaw et al.74 trained a combination of three CNNs with GRE and TSE MRI (a single sequence or both) to derive an sCT stratified into classes (air, water, fat, and bone), which was compared with the scanner's default MRAC output. The RMSE on the SUV of the reconstructed PET was significantly lower with the DL method and the combined T1/T2 input. More recently, however, Gong et al.144 tested, on a brain patient cohort, a CNN with several input combinations, including Dixon and multi-echo UTE (mUTE), finding that the mUTE input outperformed the others. Liu et al.147 trained a CNN to predict CT tissue classes from diagnostic 1.5 T GRE images of 30 patients. They tested it on ten independent patients of the same cohort, whose results are reported in Table 5 in terms of DSC. They then predicted sCTs for five patients acquired prospectively with a 3 T PET/MRI scanner (GRE), computing a PET reconstruction error within 1%. They concluded that DL approaches are flexible and promising for application to heterogeneous datasets acquired with different scanners and settings.
DL methods have also been proposed to estimate sCT from uncorrected PET. Owing to the larger number of PET-only exams, these methods have been tested on whole-body acquisitions and larger patient populations (up to 100 for training and 39 for testing). Although the global MAE is higher than in site-specific MR-to-CT studies (about 110 HU vs. 10–15 HU), the error on the reconstructed PET is below 1% on average, demonstrating the validity of the approach for PET AC.
4 DISCUSSION
- I.
MR-only RT. The generation of sCT for MR-only RT with DL is the most populated category. Its 51 papers demonstrate the potential of using DL for sCT generation from MRI. Several training techniques and configurations have been proposed. For anatomical regions such as the pelvis and brain/H&N, high image similarity and dosimetric accuracy (in terms of DD) can be achieved for photon RT and proton therapy. In regions strongly affected by motion,156, 157 for example, the abdomen and thorax, the first feasibility studies seem promising.72, 75, 82, 112, 121 However, no study has yet proposed the generation of DL-based 4D sCT, as achieved with classical methods.158 An exciting application is DL-based sCT generation for the pediatric population,71, 72 which is considered more radiation-sensitive than the adult population159 and could benefit enormously from MR-only workflows, especially when patients' simulations are repeated.19
The geometric accuracy of sCT needs to be thoroughly tested to enable its clinical adoption for treatment planning purposes, primarily when MRI or sCT is used to substitute CT for position verification. So far, the number of studies investigating this aspect for DL-based sCT is still scarce. Only Gupta et al.,106 for brain, and Olberg et al.,121 for breast cancer, have investigated it, assessing the accuracy of alignment based on CBCT and digitally reconstructed radiographs, respectively. Future studies are required to strengthen the clinical use of sCT, especially considering that geometric accuracy has already been extensively investigated for sCT generated with classical methods at 3 T and below.160-162
DL-based sCT generation in the context of MR-guided RT20, 163-167 may reduce the treatment time, facilitating daily image guidance and plan adaptation based on MRI alone.168, 169 For this application, the accuracy of dose calculation in the presence of the magnetic field must be assessed before clinical implementation. So far, few studies have investigated this aspect, for example, for abdominal75 and pelvic82 tumors, and only at low magnetic fields. Recently, Groot Koerkamp et al.170 published the first dosimetric evaluation of DL-based sCT for high-field MR-guided RT, achieving small DDs for breast cases. The results are promising, but we advocate further studies on additional anatomical sites and magnetic field strengths.
- II.
CBCT-to-CT for image-guided (adaptive) radiotherapy. In-room CBCT imaging is widespread in photon and proton RT for daily patient setup.171 However, CBCT is not commonly exploited for daily plan adaptation and dose recalculation because of the artifacts associated with scatter and reconstruction algorithms, which affect the quality of the electron density derived from CBCT.172 Traditional methods to cope with this issue are based on image registration,173, 174 scatter correction,175 look-up tables to rescale HU intensities,176 and histogram matching.177 The introduction of DL for converting CBCT to sCT has substantially improved image quality, yielding results faster than image registration and analytical corrections.134 Speed is crucial for translating the method into clinical routine. However, one of the problems arising in CBCT-to-CT conversion for clinical application is the different field of view (FOV) of CBCT and CT. Usually, training is performed by registering, cropping, and resampling the volumes to the CBCT size, which is smaller than that of the planning CT.
Nonetheless, for replanning purposes, the limited FOV may hinder calculating the plan on the sCT. Some authors proposed assigning water-equivalent density within the CT body contour for the missing information.130 In other cases, the sCT patch has been stitched to the planning CT to cover the entire dose volume.84 Ideally, appropriate FOV coverage should be ensured when recalculating the plan for online adaptive RT. Besides the dosimetric aspect, improved image quality may increase accuracy during image guidance for patient setup and OAR segmentation. These are necessary steps for online adaptive RT, especially for anatomical sites prone to large movements, as speculated by Liu et al.131 in the framework of pancreatic treatments.
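A minimal sketch of the water-assignment strategy for voxels outside the CBCT FOV (illustrative HU values and mask names; the cited studies differ in implementation details) could be:

```python
import numpy as np

def pad_sct_with_water(sct, fov_mask, body_mask,
                       air_hu=-1000.0, water_hu=0.0):
    """Fill voxels outside the CBCT field of view: water-equivalent HU
    inside the planning-CT body contour, air elsewhere.
    fov_mask:  True where the CBCT/sCT contains valid data.
    body_mask: True inside the body contour from the planning CT."""
    out = np.full(sct.shape, air_hu)
    out[body_mask] = water_hu       # body outside the FOV -> water
    out[fov_mask] = sct[fov_mask]   # keep the sCT where CBCT saw data
    return out
```

The alternative mentioned above, stitching the sCT patch into the planning CT, would replace the water fill with the corresponding planning-CT voxels.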
CBCT-to-CT conversion has been proved for both photon and proton RT. For proton RT, setup accuracy and dose calculation are even more critical to avoid RS errors that could jeopardise the benefit of the treatment.63 Because there is an intrinsic error in converting HU to relative proton stopping power,178 it has been shown that DL methods can translate CBCT directly to stopping power.179 This approach was not covered in this review, but it is an exciting direction that will probably lead to further investigations.
Interestingly, increasing the quality of CBCT can be tackled not only as an image-to-image translation problem but also as an inverse problem, that is, from a reconstruction perspective. Specifically, given the raw measurements (projections), DL could be used to improve the tomographic reconstruction. Many investigations in this direction have been proposed, but they are considered out of the scope of this review; for the interested reader, we suggest the following resources.180-184 Currently, it is unclear whether formulating (CB)CT quality enhancement as a synthesis or a reconstruction problem is more beneficial. First attempts showed that training convolutional networks for reconstruction enhanced their generalization capability to other anatomies185; however, research on these aspects is still ongoing.
- III. PET attenuation correction. The sCT in this category is obtained either from MRI or from uncorrected PET. In the first case, the motivation is to overcome the current limitations in generating attenuation maps (μ-maps) from MR images in hybrid PET/MRI acquisitions, which miscalculate the bone contribution.186 In the second case, the limits to overcome are different: (i) avoiding extra radiation dose when only the PET exam is required, (ii) avoiding misregistration errors when stand-alone CT and PET machines are used, and (iii) being independent of the MR contrast in PET/MRI acquisitions. Regardless of the network configuration, the MRI used as input, or the number of patients included in the studies, DL-based sCTs have consistently outperformed the current MRAC methods available in commercial software. The results of this review support the idea that DL-based sCT will substitute current AC methods, being also able to overcome most of the limitations mentioned above. These aspects seem to contradict the stable number of papers in this category over the last three years. Nonetheless, we have to consider that the recent trend has been to derive the μ-map directly from uncorrected PET via DL. Because this review considered only image-to-CT translation, these works were not included, but they can be found in a recent review by Lee.47 It is worth mentioning, however, a recent study by Shiri et al.,187 where the largest patient cohort so far (1150 patients, split into 900 for training, 100 for validation, and 150 for testing) was used for this scope. Direct μ-map prediction via DL is a promising opportunity that may direct future research efforts in this context.
4.1 Deep-learning considerations and trends
The number of patients used for training the networks is quite variable, ranging from a minimum of 7 (in I)68 to maxima of 205 (in II)66 and 242 (in I).65 In most cases, the patient number is limited by the availability of training pairs. Data augmentation, in the form of linear and nonlinear transformations,188 is performed to increase the training accuracy, as demonstrated in Pozaruk et al.145 However, few publications investigated the impact of increasing the training size,65, 67, 71, 121, 127 finding that image similarity increases when training with up to 50 patients. Such investigations can indicate the minimum number of patients to include in the training to achieve state-of-the-art performance. The optimal patient number may also depend on the anatomical site and its inter- and intrafraction variability. In addition, attention should be dedicated to balancing the training set, as performed in References 65, 71; otherwise, the network may overfit, as previously demonstrated for segmentation tasks.189
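As an illustration of the paired augmentation mentioned above, the sketch below is a minimal, numpy-only toy (the names `augment_pair` and `elastic_indices` are our own, and the nearest-neighbour warp is a crude stand-in for the smooth elastic deformations and interpolation used in practice). The key point it demonstrates is that the same random spatial transform must be applied to both images of an MR/CT training pair so they stay aligned:

```python
import numpy as np

def elastic_indices(shape, rng, alpha=2.0, grid=4):
    """Coarse random displacement field, nearest-neighbour upsampled --
    a crude stand-in for the smooth elastic (nonlinear) deformations
    used in real augmentation pipelines. Assumes shape divisible by grid."""
    h, w = shape
    dy = np.repeat(np.repeat(rng.normal(0, alpha, (grid, grid)), h // grid, 0), w // grid, 1)
    dx = np.repeat(np.repeat(rng.normal(0, alpha, (grid, grid)), h // grid, 0), w // grid, 1)
    ys = np.clip(np.arange(h)[:, None] + np.rint(dy).astype(int), 0, h - 1)
    xs = np.clip(np.arange(w)[None, :] + np.rint(dx).astype(int), 0, w - 1)
    return ys, xs

def augment_pair(mr, ct, rng):
    """Apply an identical random flip (linear) and a shared elastic warp
    (nonlinear) to an MR/CT training pair so the pair stays aligned."""
    if rng.random() < 0.5:                   # random left-right flip
        mr, ct = mr[:, ::-1], ct[:, ::-1]
    ys, xs = elastic_indices(mr.shape, rng)  # one warp reused for both images
    return mr[ys, xs], ct[ys, xs]
```

Because the flip decision and the displacement field are drawn once and reused, any voxel-wise correspondence between the two inputs is preserved by construction.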
GANs were the most popular architecture, but we cannot conclude that they are the best network scheme for sCT generation. Indeed, some studies comparing U-nets or other CNNs versus GANs found GANs performing statistically better,85, 139 while others found similar145, 146 or even worse performances.76, 144 We can speculate that, as demonstrated by Largent et al.,113 a vital role is played by the loss function, which, despite being the effective driver of network learning, has been investigated less than the network architecture, as also highlighted for image restoration.190 Another important aspect is the growing trend, except in category III, toward unpaired training (five and seven papers in 2019 and 2020, respectively). The quality of the registration when training in a paired manner influences the quality of DL-based sCT generation.122 In this sense, unpaired training offers an option to alleviate the need for well-matched training pairs. When comparing paired versus unpaired training, we observed that paired training leads to slightly better performances, although the differences were not always statistically significant.67, 76, 91 As proposed by Yang et al.,91 unsupervised training can lose semantic information when mapping from one domain to another. Such an issue may be solved by introducing a structure-consistency loss, which extracts structural features from the images and defines the loss in the feature space. Yang et al.'s results showed improvements in this sense relative to other unsupervised methods. They also showed that pre-registering unpaired MR-CT pairs further improves unsupervised training results, which can be an option when input and target images are available but perfect alignment is not achievable. In some cases, unpaired training was even demonstrated to be superior to paired training.191 A trend that lately emerged is the use of architectures initially conceived for unpaired training, for example cycle-GANs, for paired training.83, 90
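The structure-consistency idea can be made concrete: extract a structural feature that is robust to the intensity differences between modalities from both the source and the translated image, and penalize their mismatch in feature space. The minimal numpy sketch below uses gradient magnitude as a simple stand-in structural feature (Yang et al. employ a more sophisticated descriptor; `structure_consistency_loss` is a hypothetical name):

```python
import numpy as np

def grad_magnitude(img):
    """Simple structural feature: finite-difference gradient magnitude."""
    gy = np.diff(img, axis=0, append=img[-1:, :])
    gx = np.diff(img, axis=1, append=img[:, -1:])
    return np.sqrt(gx ** 2 + gy ** 2)

def structure_consistency_loss(source, translated):
    """Penalize structural (edge) mismatch between a source image and its
    translation, independently of their absolute intensity scales."""
    fs = grad_magnitude(source)
    ft = grad_magnitude(translated)
    # Normalize each feature map so modality-specific intensity scales cancel
    fs = fs / (fs.max() + 1e-8)
    ft = ft / (ft.max() + 1e-8)
    return float(np.mean(np.abs(fs - ft)))
```

Because the loss compares edge maps rather than intensities, a translation that rescales intensities but preserves anatomy incurs no penalty, whereas one that displaces anatomical boundaries does.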
Focusing on the body sites, we observed that most investigations were conducted in the brain, H&N, and pelvic regions. Meanwhile, fewer studies are available for the thorax and the abdomen, which represent a more challenging patient population due to organ motion.192
For MR-only RT, we found contradictory results regarding the best-performing spatial configuration among the papers that directly compared 2D versus 3D training.78, 79 It is clear that 2D+ increases the sCT quality compared to a single 2D view, as demonstrated in Spadea et al.107 and Maspero et al.71; however, when comparing 2D against 3D training, patch size is a vital aspect.118 3D deep networks require a significantly larger number of training parameters than 2D networks.193 For sCT generation, the 3D approaches adopted so far have chosen patch sizes much smaller than the whole volume, probably limiting the contextual information considered. Generally, downsampling approaches have been proposed to increase the network's receptive field, for example for segmentation tasks,194 but they have not yet been applied to sCT generation. We believe this will be an exciting area of research.
Concerning the latest developments from the DL perspective, in 2018, Oktay et al.137 proposed a new mechanism, called AG, to focus on target structures that can vary in shape and size. Liu et al.131 incorporated the AG into the generator of a cycle-GAN to learn organ variation from CBCT-CT pairs in the context of pancreas adaptive RT, showing that its contribution significantly improved the predictions compared to a network without AG. Other papers also adopted attention.91, 95 Embedding has also been proposed to increase the network's expressivity and was applied by Xiang et al.81 (I). As the AG mechanism is a way to focus the network on specific portions of the image, it can potentially open the path for new research topics. In 2019, Schlemper and colleagues195 evaluated the AG for different tasks in medical image processing: classification, object detection, and segmentation. We can thus envision that, in online IGART, such a mechanism could lead to multitask applications, such as deriving the sCT while delineating the structures of interest.
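Conceptually, an additive attention gate computes an attention map in (0, 1) from the skip-connection features x and a coarser gating signal g, and uses it to reweight x. The minimal numpy sketch below writes the 1x1 convolutions as channel-mixing matrix products; the weight arrays stand in for trained parameters and the function name is our own:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate in the spirit of Oktay et al.: a gating
    signal g from a coarser layer reweights skip-connection features x.

    x, g: (C, H, W) feature maps (assumed already resampled to match)
    w_x, w_g: (F, C) 1x1-convolution weights; psi: (F,) scalar projection
    """
    # 1x1 convolutions are channel-mixing matrix products per pixel
    q = np.einsum('fc,chw->fhw', w_x, x) + np.einsum('fc,chw->fhw', w_g, g)
    q = np.maximum(q, 0.0)                            # ReLU
    alpha = sigmoid(np.einsum('f,fhw->hw', psi, q))   # attention map in (0, 1)
    return x * alpha                                  # suppress irrelevant regions
```

Since alpha lies strictly between 0 and 1, the gate can only attenuate features, steering the decoder toward the image regions the gating signal deems relevant.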
4.2 Benefits and challenges for clinical implementations
DL-based sCT generation may reduce the need for additional or nonstandard MRI sequences, for example UTE or ZTE, which could, in turn, shorten the total acquisition time, speed up the workflow, or increase patient throughput. As already mentioned, speed is particularly interesting for MR-guided RT as well as for adaptive RT in II, where it is considered crucial for online correction. Concerning categories II and III, the generation of DL-based sCT may enable a dose reduction during imaging by reducing the need for CT in case of anatomical changes (in II) or by possibly diminishing the amount of radioactive material injected (in III).
Finally, it is worth commenting on the current status of the clinical adoption of DL-based sCT. We could not find evidence that any of the methods considered is currently implemented and used clinically. We speculate that this is probably related to the fact that the field is still relatively young, with the first publications dating only from 2017, and that clinical implementation generally takes years, if not decades.196, 197 Additionally, as already mentioned, for categories I/II, the impact of sCT on position verification still needs to be thoroughly investigated. Implementation may be easier for category III if the methods were directly integrated into the scanners. In general, the involvement of vendors may streamline the clinical adoption of DL-based sCT. In this sense, we can report that vendors are currently active in evaluating their methods in research settings, for example for brain65 and pelvis116 in I, and for H&N, thorax, and pelvis in II.66 Very recently, Palmer et al.198 also reported using a prereleased version of a DL-based sCT generation approach for H&N in MR-only RT. Another essential aspect that needs to be satisfied is compliance with the currently adopted regulations,199 where vendors can offer vital support.200, 201
A key aspect of clinical implementation is the precise definition of the requirements a DL-based solution must meet before being accepted. If we consider the reported metrics, we cannot find uniform criteria for reporting. Multiple metrics have been defined, and it is not clear on which ROIs they should be computed. For example, image-based similarity was reported on the body contour or in tissues generally defined by different thresholds; for task-specific metrics, the methods employed are even more heterogeneous. For example, in I and II, GPRs can be computed in 2D or 3D, and different dose threshold levels have been employed, for example 10%, 30%, 50%, or 90% of the prescribed or maximum dose. In III, metrics can be computed either on SUV, on the maximum SUV, or over larger VOIs, making it difficult to compare the performances of different network configurations. We think that this lack of standardization in reporting results is also detrimental to clinical adoption. A first attempt at revising the metrics currently adopted has been performed by Liesbeth et al.202 However, this is still insufficient, considering the differences in how such metrics can be calculated and reported. In this sense, we advocate for consensus-based requirements that may facilitate reporting in future clinical trials.203 Also, no public datasets arranged in the form of grand challenges (https://grand-challenge.org/) are available to enable a fair and open evaluation of different approaches.204
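The reporting ambiguity can be made concrete with a toy example: the same sCT/CT pair yields different MAE values depending on the ROI definition (body contour versus a HU-thresholded bone mask). A minimal numpy sketch with made-up HU values and thresholds:

```python
import numpy as np

def mae(ct, sct, mask):
    """Mean absolute error in HU, restricted to a boolean region of interest."""
    return float(np.mean(np.abs(ct[mask] - sct[mask])))

# Toy 2D "CT": air background, a soft-tissue square, a bone core (made-up HU)
ct = np.full((8, 8), -1000.0)
ct[2:6, 2:6] = 40.0       # soft tissue
ct[3:5, 3:5] = 700.0      # bone
sct = ct + 10.0           # small global offset ...
sct[3:5, 3:5] += 90.0     # ... plus a larger error in bone

body = ct > -400          # body contour by thresholding (illustrative)
bone = ct > 200           # bone mask by thresholding (illustrative)
mae_body, mae_bone = mae(ct, sct, body), mae(ct, sct, bone)
```

Here `mae_bone` is substantially larger than `mae_body`, so two studies reporting "MAE" on different ROIs are not directly comparable even for an identical sCT.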
To date, four scientific studies have investigated the performance of DL-based sCT in a multicenter setting,71, 114-116 all in the context of MR-only RT. Future work should focus on assessing the performance of DL-based sCT generation for II and III. By contrast, investigations on sCT generation with classical methods using multicenter data are more widespread across all three categories.26, 205-209 Of particular relevance when considering the generalization of a DL model for sCT generation may be the application of transfer learning.210, 211 In particular, transfer learning may be exploited to fine-tune a model pretrained on a specific MRI contrast or CBCT imaging protocol, or to generalize among multiple anatomies. No paper investigating this aspect was found up to December 2020, but it could be an exciting research area. More recently, Li et al.212 showed that transfer learning facilitated training a DL model on different MRI contrasts for sCT generation.
The quality of an sCT cannot easily be judged by a user, except when it is clearly inferior. Therefore, software-based quality assurance (QA) procedures should be put in place. It would be valuable to have a phantom allowing regular QA procedures, as exists, for example, for CT.213 This would be relatively straightforward for II; for MR-based sCT, however, manufacturing phantoms is quite challenging due to the need for contrast in both MRI and CT. Recently, the first phantoms have been proposed for this task,214-217 showing the potential of additive manufacturing.
Alternatively, it would be relevant if a CNN could automatically generate a metric assessing the quality of sCTs, as already presented for automatic segmentation.218 In this sense, Bragman et al.219 introduced the use of uncertainty for such a task, adopting a multitask network and a Bayesian probabilistic framework. More recently, two other works proposed to use uncertainty either from the combination of independently trained networks71 or via dropout-based variational inference.220 So far, the field of uncertainty estimation with DL221 has only been superficially touched for sCT generation. It would be interesting to see future work focusing on developing criteria for automatically identifying failure cases using uncertainty prediction. Patients with inaccurate sCTs could then be flagged for a CT rescan or for manual adjustment of the sCT, if deemed feasible.
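Dropout-based variational inference keeps dropout active at test time and treats the spread of repeated stochastic predictions as a per-voxel uncertainty map. A minimal numpy sketch, where `toy_predict` is a made-up stand-in for a trained sCT network and all names are hypothetical:

```python
import numpy as np

def toy_predict(mr, rng, p_drop):
    """Stand-in for a trained network: a linear MR-to-HU map followed by
    inverted dropout, kept active at inference time (made-up model)."""
    keep = rng.random(mr.shape) > p_drop
    return (1000.0 * mr) * keep / (1.0 - p_drop)

def mc_dropout_sct(predict, mr, n_samples=20, p_drop=0.1, rng=None):
    """Monte Carlo dropout: average repeated stochastic forward passes to
    obtain the sCT, and use their per-voxel spread as an uncertainty map."""
    if rng is None:
        rng = np.random.default_rng()
    samples = np.stack([predict(mr, rng, p_drop) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)  # (sCT, uncertainty)
```

Voxels where the stochastic passes disagree strongly would receive high uncertainty, which is exactly the signal a QA procedure could threshold to flag suspect sCTs.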
4.3 Beyond sCT for radiotherapy
During the database search, we found other possible applications of DL-based image generation that go beyond the categories mentioned so far or the RT application. For example, Kawahara et al.222 proposed generating synthetic dual-energy CT from CT to assess body material composition using 2D paired GANs. Also, commercial solutions are starting to be evaluated for the generation of DL-based sCT from MRI for lesion detection in suspected sacroiliitis223 or to facilitate surgical planning of the spine.224 An exciting application is also the generation of sCT to facilitate multimodal image registration, as proposed by Mckenzie et al.225
All the techniques of category I could be directly applied to MR-guided high-intensity focused ultrasound, where otherwise an additional CT would be required to properly plan the treatment.226
Additionally, the methods reviewed here to generate sCT can be applied to translating other image modalities. Interesting examples in the RT realm are provided by Jiang et al.,227 who investigated MRI-to-CT translation to increase segmentation robustness, and by Kieselmann et al.,228 who generated synthetic MRI from CT to train segmentation networks that exploit the wealth of delineations available in another modality. A detailed review of other image-to-image translation applications in RT has recently been compiled by Wang et al.49
5 CONCLUSION
DL-based generation of sCT has a bright future, with an extensive amount of research being done on the topic. DL methods for sCT generation have been reviewed in the context of (I) MRI replacing CT in RT treatment planning, (II) CBCT-based adaptive RT, and (III) the generation of attenuation maps for PET.
A detailed review of each category was presented, providing a comprehensive comparison among DL-based methods in terms of the most popular reported metrics. We found that DL-based sCT generation is an active and growing area of research. For several anatomical sites, for example H&N/brain and pelvis, sCT seems feasible, with DL achieving DD relative to CT-based planning within 1% in the RT context and better performance for PET AC compared to standard MRAC methods.
While DL-based sCT generation techniques are very promising, comprehensive commissioning and QA of these techniques are critical prior to, and essential during, clinical deployment in order to ensure patient safety. The key to the further diffusion of DL-based sCT techniques is evaluating their generalization capability in a multicenter setting.
ACKNOWLEDGMENTS
Matteo Maspero is grateful to prof.dr.ir. Cornelis (Nico) AT van den Berg, head of the Computational Imaging Group for MR diagnostics and therapy, Center for Image Sciences, UMC Utrecht, the Netherlands, for the general support provided during this manuscript's compilation.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.
APPENDIX
The query used in selected databases—PubMed, Scopus and Web of Science—in the fields (Title/Abstract/Keywords) was the following (Figure A1):

((“radiotherapy”) OR (“radiation therapy”) OR (“proton therapy”) OR (“oncology”) OR (“imaging”) OR (“radiology”) OR (“healthcare”) OR (“CBCT”) OR (“cone-beam CT”) OR (“PET”) OR (“attenuation correction”) OR (“attenuation map”)) AND ((“synthetic CT”) OR (“syntheticCT”) OR (“synthetic-CT”) OR (“pseudo CT”) OR (“pseudoCT”) OR (“pseudo-CT”) OR (“virtual CT”) OR (“virtualCT”) OR (“virtual-CT”) OR (“derived CT”) OR (“derivedCT”) OR (“derived-CT”) OR (sCT)) AND ((“deep learning”) OR (“convolutional network”) OR (“CNN”) OR (“GAN”) OR (“GANN”) OR (artificial intelligence)).