Clinical evaluation of 4D MRI in the delineation of gross and internal tumor volumes in comparison with 4DCT

Abstract
Purpose: To evaluate the clinical utility of respiratory-correlated (RC) four-dimensional magnetic resonance imaging (4DMRI) for lung tumor delineation and motion assessment, in comparison with the current clinical standard of 4D computed tomography (4DCT).
Methods and Materials: A prospective T2-weighted (T2w) RC-4DMRI technique was applied to acquire coronal 4DMRI images for 14 lung cancer patients (16 lesions) during free breathing (FB) under an IRB-approved protocol, together with a breath-hold (BH) T1w 3DMRI and axial 4DMRI. Clinical simulation CT and 4DCT were acquired within 2 h. An internal navigator was applied to trigger amplitude-binned 4DMRI acquisition, whereas a bellows or real-time position management (RPM) device was used in the 4DCT reconstruction. Six radiation oncologists manually delineated the gross and internal tumor volumes (GTV and ITV) in 399 3D images using programmed clinical workflows under a tumor delineation guideline. The ITV was the union of GTVs within the breathing cycle without margin. Average GTV and motion range were assessed, and ITV variation between 4DMRI and 4DCT was evaluated using the Dice similarity index, mean distance agreement (MDA), and volume difference.
Results: The mean tumor volume is similar between 4DCT (GTV_4DCT = 1.0, as the reference) and T2w-4DMRI (GTV_T2wMR = 0.97), but smaller in T1w MRI (GTV_T1wMR = 0.76), suggesting possible peripheral edema around the tumor. Average GTV variation within the breathing cycle in 4DMRI (22%) is slightly greater than in 4DCT (17%). GTV motion variation (−4 to 12 mm) and ITV variation (ΔV_ITV = −25% to 95%) between 4DCT and 4DMRI are large, confirmed by relatively low ITV similarity (Dice = 0.72 ± 0.11) and large MDA (2.9 ± 1.5 mm).
Conclusion: Average GTVs are similar between T2w-4DMRI and 4DCT, but smaller by 25% in T1w BH MRI. Physician training and breathing coaching may be necessary to reduce ITV variability between 4DMRI and 4DCT. Four-dimensional magnetic resonance imaging is a promising and viable technique for clinical lung tumor delineation and motion assessment.

[...] computed tomography (4DCT), the current clinical standard in lung tumor motion assessment. 1 In addition, 4DMRI allows an internal navigator to be used as the respiratory surrogate, eliminating the uncertainty from the assumed external-internal motion correlation of an external surrogate used in 4DCT acquisition. Thus, a navigator-triggered/binned 4DMRI has higher image quality with fewer and less severe binning artifacts. 2-6 Furthermore, MRI provides the option of nonaxial scanning directions, such as sagittal or coronal scans, which are more desirable for characterizing tumor/organ respiratory motion. 7,8 Therefore, 4DMRI promises to be clinically beneficial in assessing respiratory-induced tumor motion. 9-11
Although normal lung has low MR signal from the "air-diluted" soft tissue, a lung tumor usually has higher density and produces sufficient MR signal for imaging, including lung tumor perfusion with dynamic contrast-enhanced imaging 12,13 and the lung tumor microenvironment with diffusion-weighted imaging. 14,15 In lung tumor motion assessment and monitoring, dynamic two-dimensional (2D) cine imaging has been widely applied, including in MR-guided radiotherapy, 7,16-20 automatic tumor contouring for motion tracking, 21,22 and the assessment of tumor motion variation during radiotherapy. 23-25 A fast field echo with either balanced steady-state free precession or T1-weighted (T1w) 2D cine has been used to achieve a 4 Hz frame rate. 7,16-20
For treatment planning purposes, volumetric 4DMRI is required so that both the lung tumor and surrounding normal organs can be delineated for accurate targeting and motion assessment, using the gross and internal tumor volumes (GTV and ITV). Recently, 4DMRI has been assessed for delineating five organs and propagating the contours between different respiratory states. 26 Among the various MR contrasts, T2-weighted (T2w) 4DMRI provides higher tissue contrast for GTV delineation, 4,27 and its clinical utility needs to be further assessed in comparison with 4DCT.
In this study, we present a comparison of lung tumor delineation based on T2w 4DMRI, T1w BH MRI, and 4DCT by six radiation oncologists in 14 lung cancer patients with 16 lesions, which were grouped by location (central vs peripheral) and size (small, medium, and large). The comparison includes the GTV variation within a breathing cycle and the average GTV difference among these imaging modalities. Furthermore, GTV motion variation is assessed and the ITV difference between 4DMRI and 4DCT is characterized in terms of size and shape. The clinical implications of lung tumor delineation using 4DMRI and 4DCT are discussed.

2 | METHODS AND MATERIALS
An IRB-approved protocol was established and 14 lung cancer patients were scanned using a 3 T MRI scanner (Φ = 70 cm, Ingenia, Philips Healthcare, the Netherlands) after clinical CT and 4DCT scans for treatment planning using a helical CT scanner (Φ = 85 cm, Brilliance Big Bore, Philips Healthcare, the Netherlands) or a cine PET/CT scanner (Φ = 70 cm, Discovery STE, GE Healthcare, Milwaukee, WI).

2.A | Acquisition of clinical 4DCT and planning CT
Clinical 4DCT and planning CT images were acquired first before MR scans, within 1-2 h on the same day. The patient body immobilization mold was prepared in the CT room and its width (<70 cm) was made to fit in the MR scanner. The patient was asked to have both arms up above the head and wear an MR headphone during molding for later MR scans.
Standard clinical thoracic CT/4DCT scan protocols were applied with a voxel size of 1 × 1 × 3 mm³ covering the entire lung. The planning CT was first acquired in free breathing, followed by the 4DCT scan. A bellows or real-time position management device was placed around 5-10 cm inferior to the xiphoid process of the sternum as the respiratory surrogate for retrospective amplitude-binned 4DCT reconstruction.

2.B | Image acquisition of T2w 4DMRI and T1w BH MRI
The MR scans were performed after the CT scans using the same body mold. A prospective navigator-triggered, amplitude-binned T2w 4DMRI scanning protocol was applied to acquire the 4D images with a 2 × 2 × 5 mm³ voxel size in the coronal direction. The navigator is a dynamic 1D image (20 Hz) within a small field of view (3 × 3 × 6 cm³) set at the right diaphragm dome to detect the internal motion signal (waveform) based on the image intensity gradient for respiratory binning. The first 10 seconds of the navigator waveform were acquired and used to train the system for amplitude triggering to fill the bin-slice table (10 bins vs anterior-posterior slices). The pulse sequence was a single-shot turbo spin echo with TE/TR = 80/5000-7000 ms, flip angle = 90°, SENSE (SENSitivity Encoding) factor = 2, and a half-scan factor = 0.7. Three to four segments were used to avoid signal saturation due to two consecutive acquisitions from the same segment. As control experiments, an axial 4DMRI scan (10 bins at 2 × 2 × 5 mm³) and a high-resolution coronal 4DMRI scan (3 bins at 2 × 2 × 2 mm³) were applied for the first seven and last seven patients, respectively. All 4DMRI scans were estimated to take a similar time (5-15 min). 8
A T1w turbo field echo (TFE) sequence was employed with TE/TR of 1.9 ms/4.2 ms and a flip angle of 15°. Parallel imaging (SENSE factor of 3), a half-scan factor of 0.8, and central-to-peripheral k-space acquisition order (CENTRA) were employed. The coronal direction, which has the smallest body separation (and hence the fewest slices), was used for acquisition, while the lateral direction, which has minimal motion, was set for phase encoding. The same field of view as for T2w 4DMRI was applied for T1w BH MRI, with a voxel size of 2 × 2 × 2 mm³.
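As a rough illustration of amplitude binning of a navigator waveform, the sketch below trains an amplitude range on the first 10 s of a 20 Hz signal and maps each sample to one of 10 equal-width bins. This is not the scanner's triggering logic: the equal-width bin edges, the clamping of out-of-range amplitudes, the sinusoidal test waveform, and the function names `train_range` and `amplitude_bin` are all assumptions made for illustration, and any distinction between inhalation and exhalation is ignored.

```python
import math


def train_range(waveform):
    """Return the (min, max) amplitude observed in a training waveform."""
    return min(waveform), max(waveform)


def amplitude_bin(sample, lo, hi, n_bins=10):
    """Map one navigator amplitude to an equal-width bin index in [0, n_bins - 1]."""
    if hi <= lo:
        raise ValueError("training range must satisfy hi > lo")
    # Clamp so amplitudes outside the trained range fall into the end bins.
    clamped = min(max(sample, lo), hi)
    width = (hi - lo) / n_bins
    # The upper edge (clamped == hi) is assigned to the last bin.
    return min(int((clamped - lo) / width), n_bins - 1)


# Hypothetical training data: 10 s at 20 Hz (200 samples) of a 4 s breathing
# cycle with 10 mm peak-to-peak diaphragm motion.
training = [5.0 * (1.0 - math.cos(2 * math.pi * (i / 20.0) / 4.0)) for i in range(200)]
lo, hi = train_range(training)
bins = [amplitude_bin(a, lo, hi) for a in training]
print(lo, hi, bins[:10])
```

In a prospective acquisition, a bin index of this kind would decide which cell of the bin-slice table the next acquisition fills; here it is only computed retrospectively on the toy waveform.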
More detailed scan parameters in 4DMRI and BH MRI were [...]
For simplicity, only the primary GTV was delineated, without considering nodal involvement. Only the GTV was delineated in T1w BH MRI. The ITV was automatically calculated without a margin.
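The ITV definition used here (the union of the GTVs over the breathing cycle, with no added margin) can be sketched on voxel masks as follows. The representation of a mask as a set of integer voxel indices, the helper `box`, and the toy geometry are assumptions for illustration only; the default voxel size is the 2 × 2 × 5 mm³ of the coronal T2w 4DMRI.

```python
def itv_from_gtvs(gtv_masks):
    """Union of per-bin GTV voxel sets; no margin is added."""
    itv = set()
    for mask in gtv_masks:
        itv |= mask
    return itv


def volume_cc(mask, voxel_mm=(2.0, 2.0, 5.0)):
    """Volume in cc of a voxel set for a given voxel size in mm (1 cc = 1000 mm^3)."""
    d0, d1, d2 = voxel_mm
    return len(mask) * d0 * d1 * d2 / 1000.0


def box(i0, i1, j0, j1, k0, k1):
    """Hypothetical helper: voxel indices of an axis-aligned box (end-exclusive)."""
    return {(i, j, k)
            for i in range(i0, i1)
            for j in range(j0, j1)
            for k in range(k0, k1)}


# Toy example: a 10 x 10 x 4-voxel GTV that shifts by 3 voxels along the
# third index between two respiratory bins.
gtv_bins = [box(0, 10, 0, 10, 0, 4), box(0, 10, 0, 10, 3, 7)]
itv = itv_from_gtvs(gtv_bins)
print(volume_cc(gtv_bins[0]), volume_cc(itv))
```

Because the ITV is a pure union, its volume grows with both the motion range and any bin-to-bin inconsistency in the GTV contours.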

2.D | Analysis of multiple datasets of lung tumor contours
The 16 lung lesions were first categorized based on their location (central vs peripheral), as the delineation precision of peripheral lesions should be higher than that of central lesions because of their well-defined boundaries. The lesions were then sorted by size, which also affects the contour uncertainty and tumor mobility. A small tumor has a volume of <10 cc, a medium tumor 10-30 cc, and a large tumor >30 cc.
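The size grouping above amounts to a simple threshold rule, sketched below. The function name `size_category` is hypothetical, and treating the 10 cc and 30 cc boundaries as belonging to the medium group is an assumption read from the stated ranges.

```python
def size_category(gtv_cc):
    """Classify a tumor by GTV in cc: S (<10), M (10-30), L (>30)."""
    if gtv_cc < 0:
        raise ValueError("volume must be non-negative")
    if gtv_cc < 10.0:
        return "S"
    if gtv_cc <= 30.0:
        return "M"
    return "L"


for volume in (4.2, 10.0, 30.0, 55.0):
    print(volume, size_category(volume))
```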
Four aspects of the GTV/ITV delineation were analyzed. First, the average GTV was compared among 4DCT, T2w 4DMRI, and T1w MRI. Second, GTV variation was compared within the breathing cycle and between 4DMRI and 4DCT. Third, GTV displacement (center of mass, COM) was compared between 4DMRI and 4DCT. Fourth, the volume and shape of the ITV were compared between 4DMRI and 4DCT, after the alignment of ITVs based on their COM, using the Dice similarity index and mean distance agreement (MDA) for quantification.
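A minimal sketch of the fourth comparison follows: two ITV masks are aligned by their COM and then compared by Dice index, MDA, and relative volume difference. Several choices here are assumptions rather than the study's implementation: masks are sets of voxel indices, the COM alignment is rounded to whole voxels, the surface is taken as voxels with at least one face neighbor outside the mask, and MDA is computed as the mean of nearest-surface distances pooled over both directions. The brute-force distance search is only suitable for small toy masks.

```python
import math

FACE_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def box(i0, i1, j0, j1, k0, k1):
    """Hypothetical helper: voxel indices of an axis-aligned box (end-exclusive)."""
    return {(i, j, k) for i in range(i0, i1) for j in range(j0, j1) for k in range(k0, k1)}


def center_of_mass(mask):
    """Mean voxel index along each axis."""
    n = len(mask)
    return tuple(sum(v[a] for v in mask) / n for a in range(3))


def align_by_com(moving, fixed):
    """Shift `moving` by the whole-voxel offset that best matches the two COMs."""
    cm, cf = center_of_mass(moving), center_of_mass(fixed)
    shift = tuple(round(cf[a] - cm[a]) for a in range(3))
    return {tuple(v[a] + shift[a] for a in range(3)) for v in moving}


def dice(a, b):
    """Dice similarity index: 2|A and B| / (|A| + |B|)."""
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels with at least one face neighbor outside the mask."""
    return {v for v in mask
            if any(tuple(v[a] + d[a] for a in range(3)) not in mask for d in FACE_NEIGHBORS)}


def mean_distance_to_agreement(a, b, voxel_mm):
    """Mean nearest-surface distance (mm), pooled over both directions."""
    sa, sb = surface(a), surface(b)

    def nearest(p, points):
        return min(math.sqrt(sum(((p[k] - q[k]) * voxel_mm[k]) ** 2 for k in range(3)))
                   for q in points)

    distances = [nearest(p, sb) for p in sa] + [nearest(q, sa) for q in sb]
    return sum(distances) / len(distances)


def relative_volume_difference(test, ref):
    """(V_test - V_ref) / V_ref in percent, assuming equal voxel sizes."""
    return 100.0 * (len(test) - len(ref)) / len(ref)


# Toy masks standing in for an ITV from 4DCT (ref) and from 4DMRI (test).
VOXEL_MM = (2.0, 2.0, 5.0)  # assumed voxel size along the three index axes
ref = box(0, 6, 0, 6, 0, 6)
test = box(3, 11, 3, 9, 3, 9)
aligned = align_by_com(test, ref)
mda = mean_distance_to_agreement(ref, aligned, VOXEL_MM)
print(dice(ref, aligned), mda, relative_volume_difference(test, ref))
```

Aligning by COM before computing Dice and MDA separates differences in ITV size and shape from differences in ITV position, which is why the displacement is reported as its own quantity.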
Because of differences in viewing direction (axial CT vs coronal MRI) and image resolution (1 × 1 × 3 mm³ for CT/4DCT and 2 × 2 × 5 mm³ for 4DMRI), two sets of control experiments were performed. The first seven patients were also scanned with 4DMRI in the axial view, and the last seven patients were also scanned at a higher resolution of 2 × 2 × 2 mm³.

3.A | Average GTV and its variation from CT to T2w and T1w MRI
For small and medium-sized peripheral tumors, the tumor boundary is well defined, except where it may contact the chest wall. The GTV is similar between 4DCT and T2w 4DMRI, while the GTV from T1w MRI is on average 24% smaller. A similar trend was found for all lesions: the average tumor volume ratios are GTV_T2w/GTV_CT = 0.97 ± 0.16 and GTV_T1w/GTV_CT = 0.76 ± 0.30, as shown in Table 1.

3.B | Variation of GTV within the breathing cycle of 4DMRI and 4DCT
The GTV variation within the breathing cycle may result from 4D image quality (artifacts) and intra-observer variation. Figure 2 illustrates the image quality difference between 4DCT and 4DMRI for two patients; this difference would affect tumor delineation, especially for smaller tumors with large motion. The mean variation of GTV within the breathing cycle among the six radiation oncologists is slightly greater in 4DMRI (22%) than in 4DCT (16%). The GTV ratios of axial T2w 4DMRI to 4DCT (0.96 ± 0.10) and of high-resolution coronal T2w 4DMRI to axial 4DCT (1.04 ± 0.13) are close to unity, similar to the 0.97 ± 0.16 for low-resolution T2w 4DMRI, suggesting that the contouring direction and slice thickness difference are not critical in lung tumor delineation.

3.C | Variation of GTV motion between 4DMRI and 4DCT
Tumor motion displacement varies between 4DCT and 4DMRI owing to patient breathing irregularities, as shown in Fig. 3.

3.D | The difference of ITV size and shape between 4DMRI and 4DCT
Although ITV delineation is affected by both GTV motion and GTV delineation, the GTV motion difference plays a more significant role, especially for small mobile tumors. In this study, the union of all GTVs is regarded as the ITV without an extra margin. Fig. 3(b) shows the average ITV differences between 4DCT and 4DMRI, together with the variation among the six physicians. The relative ITV difference [(ITV_4DMRI − ITV_4DCT)/ITV_4DCT × 100%] is 16 ± 31%, as shown in Fig. 3(c). This is a substantial difference in target volume for treatment planning. Table 2 tabulates the MDA and Dice similarity index of the ITVs delineated on 4DCT and 4DMRI by the six physicians. The mean MDA is 2.9 ± 1.5 mm (range: 0.9-9.0 mm), while the Dice index is 0.72 ± 0.11 (range: 0.41-0.86). When the Dice index is greater than 0.7-0.8, the MDA is usually 1.0-3.0 mm. Figure 4 illustrates the variation of the Dice indices among the six radiation oncologists, suggesting that the large ITV variation is also largely associated with inter-observer variation, in addition to tumor motion variation between the 4DCT and 4DMRI scans.
To the best of our knowledge, this is the first study that compares GTV and ITV delineation between 4DMRI and 4DCT.

4.A | Lung tumor visualization and variation between imaging modalities
Although the signal-to-noise ratio is relatively low in lung MR imaging, T2w 4DMRI and T1w BH images provide sufficient [...] unique artifact that appears similar to binning artifacts but only in the heart, aorta/vena, and major arteries/veins, because blood flow may move the excited protons away from the acquisition slice, depending on the flow rate, flow direction, and slice thickness. Additionally, the coronal scan preserves more of the anatomic integrity of superior-inferior motion. Although the image resolution of 4DMRI differs from that of 4DCT, the larger slice thickness (5 mm) in 4DMRI, which may affect tumor visualization, does not produce much difference in tumor delineation, as the in-slice resolution (pixel size of 2 × 2 mm²) seems quite acceptable. In fact, for tumor motion assessment, the coronal 4DMRI slices provide a 2.0 mm resolution, superior to the 3 mm resolution of 4DCT in the superior-inferior direction. It is worth mentioning that when the tumor is small and the motion is large, such as tumor #8 in Table 1, the voxel size and the severity of the binning artifacts could have a substantial impact on the delineation.
This study has demonstrated that the GTV and ITV delineation from T2w 4DMRI is comparable with that from 4DCT. Based on over 2000 GTV contours (16 lesions, three modalities, 10 bins, and six physicians) in various locations, sizes, and shapes, the average GTV difference is small (3%) between 4DCT and T2w 4DMRI (T2w-to-CT volume ratio is 97 ± 16%). However, the GTV decreases by 24% from CT to T1w MRI (T1w-to-CT volume ratio is 0.76 ± 0.30). A hypothetical explanation is that lung lesions may have a thin layer of edema, which can be well visualized by both CT and T2w MRI but may not be by T1w MRI. Interestingly, the GTV from CT was reported to be 18.3% greater than that of the pathological specimen based on 47 stage I or II lung cancer patients, 29 supporting this hypothesis. Another study based on 52 lung cancer patients illustrated that the CT-based GTV is larger than the integrated PET/CT-based GTV, which was closer to that obtained from the pathological specimen. 30 Although this edema hypothesis may be plausible, further investigation is necessary to provide direct supporting evidence.
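The figure of "over 2000 GTV contours" can be checked with a short count. The breakdown below is an assumption rather than something stated in the text: it supposes 10 respiratory bins for both 4DCT and T2w 4DMRI, a single 3D volume for T1w BH MRI, and that every physician contoured every lesion in every image. The control scans (axial and high-resolution 4DMRI) are not counted.

```python
lesions = 16
physicians = 6
bins_per_4d_scan = 10   # assumed to apply to both 4DCT and T2w 4DMRI
four_d_modalities = 2   # 4DCT and T2w 4DMRI
bh_volumes = 1          # T1w BH MRI is a single 3D image per lesion

contours_4d = lesions * bins_per_4d_scan * four_d_modalities * physicians
contours_bh = lesions * bh_volumes * physicians
total_contours = contours_4d + contours_bh
print(contours_4d, contours_bh, total_contours)
```

Under these assumptions the count comes to 2016, in line with the stated figure.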
Between central and peripheral lesions, the major uncertainty in GTV delineation is from the border visualization of the gross tumor. Peripheral lesions often have a clearly defined edge, and therefore the delineated GTV is more accurate than for central lesions, which are likely attached to a local normal structure in the hilum, making the delineation of the GTV more subjective. Therefore, it is more challenging to delineate a centrally located lung lesion than a peripheral one. Although T2w 4DMRI, unlike 4DCT, provides better soft-tissue contrast to differentiate the tumor from the surrounding central lung tissues, further study and training are necessary to take advantage of 4DMRI. In this study, the inconsistency in determining the boundary of the GTV results in a large variation in GTV delineation.
TABLE 1 Gross tumor volume (GTV, in cc) variation between 4DCT, T2w 4DMRI, and T1w breath-hold (BH) MRI.
FIG. 3 Mean GTV displacement difference (a) and ITV variation between 4DCT and 4DMRI (b and c). The GTVs are sorted based on their location (central vs peripheral) and size (S: small, M: medium, L: large). The error bars (1σ) are from GTV and ITV delineation by the six physicians. Five out of 16 lesions (~31%) have motion variation >5 mm, and the mean ITV varies from −25% to +95% between 4DCT and 4DMRI. 4DCT, 4D computed tomography; 4DMRI, four-dimensional magnetic resonance imaging; GTV and ITV, gross and internal tumor volumes.
In fact, the advantages of the coronal scans in 4DMRI are (1) the integrity of the moving anatomy, (2) the higher spatial resolution of the in-slice motion, and (3) the faster acquisition due to fewer slices in the AP direction. The GTV difference depends upon image quality as well as the experience of the users with both 4DCT and 4DMRI.
In this study, the inter-observer variation is high, as indicated by the high standard deviation (σ) in Table 1. [...] It is worth noting that this large ITV difference is caused by patient breathing irregularities rather than by the imaging modalities. Both scans correctly reflect the ITV at the moment of scanning, but both may deviate from the ITV that would be obtained if multi-breath respiratory motion were scanned and used for delineation, which would be closer to the mean ITV value for treatment planning.
An alternative to the single-breath 4DCT and 4DMRI is the multi-breath volumetric time-resolved 4DMRI, which has recently been reported to provide multiple breathing cycles over a time scale of minutes, rather than seconds. 28,31 Using the time-resolved 4DMRI, patient-specific multi-breath tumor or organ motion can be better characterized and potentially incorporated into treatment planning and delivery for motion-compensated radiotherapy. 32-34
In summary, this study is the first attempt to compare lung tumor delineation between T2w 4DMRI and 4DCT. The additional image contrast provided by T2w 4DMRI may help to reduce the uncertainty in delineating centrally located tumors. However, given the limited clinical utility of MRI in current thoracic radiotherapy planning, additional studies, as well as training, will be needed for physicians to appropriately interpret the soft-tissue contrast in MRI- [...]
TABLE 2 ITV difference between 4DMRI and 4DCT quantified by the mean distance to agreement (MDA, mm) and Dice similarity index among six radiation oncologists. The site refers to central (C) or peripheral (P) and size refers to small (S: <10 cc), medium (M: 10-30 cc), and large (L: >30 cc).
[...] (simulation) and 3 cm motion in fluoroscopy (treatment). 37 In this study, the GTV variation is as large as 109% and the ITV variation ranges from −25% to 95% between 4DCT and 4DMRI, consistent with the previous finding.

5 | CONCLUSION
The feasibility of using 4DMRI for GTV and ITV delineation of lung cancer in radiotherapy has been demonstrated by comparison with 4DCT. The mean T2w-based GTV (97%) is similar to the CT-based GTV (100%), while the T1w-based GTV is 24% smaller (76%). This trend is more consistent for small/medium peripheral (detached) lung tumors. The average relative inter-observer variation increases from T2w 4DMRI (5%) to 4DCT (14%) and T1w BH MRI (25%), suggesting higher agreement among physicians when using T2w 4DMRI. Due to breathing irregularities, a large ITV variation (−25% to 95%) between 4DMRI and 4DCT is observed, implying a variation between simulation and treatment. It is necessary to reduce the intra- and inter-observer variation through further MRI (T2w and T1w) training for tumor delineation.