Patient‐specific calibration of cone‐beam computed tomography data sets for radiotherapy dose calculations and treatment plan assessment

Abstract Purpose In this work, we propose a new method of calibrating cone beam computed tomography (CBCT) data sets for radiotherapy dose calculation and plan assessment. The motivation for this patient‐specific calibration (PSC) method is to develop an efficient, robust, and accurate CBCT calibration process that is less susceptible to deformable image registration (DIR) errors. Methods Instead of mapping the CT numbers voxel‐by‐voxel as in traditional DIR calibration methods, the PSC method generates correlation plots between deformably registered planning CT and CBCT voxel values for each image slice. A linear calibration curve specific to each slice is then obtained by least‐squares fitting and applied to the CBCT slice's voxel values. This allows each CBCT slice to be corrected using DIR without altering the patient geometry through regional DIR errors. A retrospective study was performed on 15 head‐and‐neck cancer patients, each having routine CBCTs and a middle‐of‐treatment re‐planning CT (reCT). The original treatment plan was re‐calculated on the patient's reCT image set (serving as the gold standard) as well as on the image sets produced by the voxel‐to‐voxel DIR, density‐override, and new PSC calibration methods. Dose accuracy of each calibration method was compared to the reference reCT data set using common dose‐volume metrics and 3D gamma analysis. A phantom study was also performed to assess the accuracy of the DIR and PSC CBCT calibration methods compared with planning CT. Results Compared with the gold standard using reCT, the average dose metric differences were ≤ 1.1% for all three methods (PSC: −0.3%; DIR: −0.7%; density‐override: −1.1%). The average gamma pass rates with thresholds 3%, 3 mm were also similar among the three techniques (PSC: 95.0%; DIR: 96.1%; density‐override: 94.4%).
Conclusions An automated patient‐specific calibration method was developed which yielded strong dosimetric agreement with the results obtained using a re‐planning CT for head‐and‐neck patients.


1 | INTRODUCTION
Radiation treatments generally span several weeks and during this time, changes in patient weight, tumor volume and organ positioning can occur. 1 These changes may substantially alter the radiation dose distribution within the patient, potentially resulting in degraded plan quality and suboptimal clinical outcomes. 2 To ensure that a patient receives adequate treatment, a new replanning CT (reCT) data set may be acquired to dosimetrically assess plan quality and to evaluate whether treatment re-planning has become necessary. Unfortunately, this workflow is often inefficient as it is difficult to distinguish a priori which patients require a reCT from those who do not. A promising solution is to use cone-beam computed tomography (CBCT) image sets to dosimetrically assess plan quality, since these image sets are already routinely acquired prior to treatment for patient setup and monitoring. However, to perform dose calculations, accurate tissue density information must be extracted from the CBCT voxel values.
Normally, tissue density information is obtained through CT calibration curves, which are generated by scanning a plastic phantom containing various inserts of known electron density. 3 For CBCT scans, the Hounsfield Units (HU) of an image set are highly dependent on many factors, including the size and material of the phantom, the materials placed in the phantom, and the imaging protocol used. [4][5][6][7] Furthermore, scattering conditions often differ between phantoms and patients when using a cone-beam geometry. Due to this variability, HU-to-density calibration curves obtained with phantoms for CBCT lack sufficient robustness to be applicable to all patients and across all anatomical sites. 6 Consequently, alternative methods of inferring tissue density have been proposed, such as: (a) population-based calibration curves, 6,7 (b) multi-level thresholding or bulk assignment of the HU or density values, [7][8][9][10] and (c) voxel-to-voxel mapping using deformable image registration (DIR). [10][11][12] Although these CBCT calibration techniques have demonstrated some promising results, each method may have limitations in certain situations. For instance, population-based calibration methods require unique calibration curves for each treatment site, and for each imaging protocol used. Bulk assignment techniques are dependent on the accuracy of automatic segmentation or thresholding of tissue regions, correct density assignments, or the time allotted to manually correct improperly delineated volumes. Similarly, DIR methods depend on the accuracy of the DIR algorithms, as regional DIR errors may significantly distort local anatomy and hence affect the density and dose evaluation within the region. 13 This may be particularly problematic in sites such as the pelvis and thorax where large deformation errors frequently occur.
Moreover, regional DIR errors could also alter the delineation of critical structures, thereby further affecting organ dose assessment and dose-volume metrics.
To potentially resolve these limitations, we began development on an alternative patient-specific CBCT calibration (PSC) technique that, while using DIR algorithms, is less sensitive to DIR uncertainties. Briefly, rather than mapping CT numbers voxel-by-voxel with DIR, we generate a systematic but patient-specific calibration curve for each CBCT slice after registering CBCT to planning CT with DIR (see next section for details). This slice-specific calibration curve is then applied to the CBCT slice to convert the voxel values to their "planning CT equivalent" values, without altering the patient geometry through regional DIR errors. Calibration curves are generated on a per-slice basis since scattering conditions may vary axially and thereby affect the relationship between CBCT and planning CT HU values.
To evaluate whether this new PSC method improves dosimetric accuracy, we performed a retrospective patient study of 15 head-and-neck clinical cases, and a phantom study. The dosimetric accuracy of this PSC method was compared to a re-planning CT (serving as the gold standard) and to other CBCT calibration methods proposed in the literature (DIR mapping and bulk density assignment).

2.A | Patient selection
Fifteen head-and-neck cancer patients were selected at random from our institution's database, all of whom had completed their treatment course and had been referred for a reCT study at some point during that course. This tumor site was selected due to the high frequency of treatment re-planning. To minimize the dosimetric error resulting from anatomical differences, CBCTs acquired around the acquisition date of the reCT were reviewed, and the CBCT with the closest anatomical agreement with the reCT image set was selected. Patient- and treatment-related information is summarized in Table S1.

2.B | Imaging
All CT and CBCT images were acquired as part of the patient's routine treatment course.
Original planning CT and re-planning CT images were acquired on a Philips Brilliance Big Bore 16-slice CT scanner (Philips Healthcare, Cleveland, OH). CT images were acquired with a full-fan 120 kVp beam. The scanning parameters used to acquire each planning and re-planning CT can be found in Tables S2 and S3, respectively. The CT images were reconstructed using the device's default filtered back-projection algorithm, with a default slice thickness of 3 mm and slice size of 512 × 512. The voxel size varied between image sets as the CT operator would select the smallest field of view (FoV) required to cover the patient. 14 CBCT images were acquired with either a Varian Truebeam or Clinac iX On-Board Imaging (OBI) system (Varian Medical Systems, Palo Alto, CA). CBCT scans were acquired with either a standard (20 mA) or low-dose (10 mA) protocol using a full-fan 100 kVp beam with a full bow-tie filter. The scanning parameters used to acquire each CBCT can be found in Table S4. CBCT scans were reconstructed by the treatment unit's OBI software (v 2.0-2.1) which uses a Feldkamp-Davis-Kress (FDK) reconstruction algorithm with a Ram-Lak filter. 15,16 Image slices were 384 × 384 in size when acquired with the Clinac iX's system, and 512 × 512 when acquired with the Truebeam's system. Figure 1 outlines the general steps performed for each calibration method in this study. Details specific to each method will be described below.

2.D | Patient-specific calibration (PSC) method
The CBCT image sets were imported into a research version of the Pinnacle treatment planning system (v9.7, Philips Healthcare, Fitchburg, WI) along with the patient's original treatment planning CT data set (containing the CT scan, treatment plan, contours, and points of interest). The CBCT image set was first rigidly registered with the planning CT image set. The planning CT image set was then deformably registered to the CBCT image set using a fast-symmetric Demon's algorithm implemented in Pinnacle, resulting in a deformed planning CT image set that was registered with the CBCT image set. 17 The resulting deformed planning CT and the CBCT image sets were exported to Matlab (v2015a, MathWorks Inc, Natick, MA) for the patient-specific calibration.
A correlation plot of the voxel values was then generated for each slice between the deformed planning CT and CBCT image sets. Therefore, a strong relationship between the planning CT and CBCT HU values could be regressed from these correlation plots. With this relationship, we could scale the CBCT HU values to their CT-equivalent values without introducing the regional DIR errors.
Linear calibration curves specific to each slice were obtained by least-squares fitting of the correlation plots, and applied to each slice of the CBCT data set. These calibration curves were slice-specific because scatter conditions vary between slices of the CBCT, and therefore the relationship between CBCT and planning CT HU values (the regressed model parameters) may change from slice to slice.
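The per-slice fitting step described above can be illustrated with a minimal sketch. The study itself used Matlab; the Python below is only an illustration under stated assumptions: volumes are nested lists indexed as [slice][row][column], the CBCT and deformed planning CT share the same grid, and the function names (`fit_slice_calibration`, `calibrate_volume`) are hypothetical rather than taken from the original implementation.

```python
def fit_slice_calibration(cbct_slice, deformed_ct_slice):
    """Least-squares line CT ~ slope * CBCT + intercept for one slice.

    Both inputs are 2D nested lists on the same grid (an assumption of
    this sketch, not a detail stated in the paper).
    """
    x = [v for row in cbct_slice for v in row]
    y = [v for row in deformed_ct_slice for v in row]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        # Uniform CBCT slice: no slope can be fitted, so fall back to
        # the identity mapping (a choice made for this sketch only).
        return 1.0, 0.0
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate_volume(cbct, deformed_ct):
    """Apply a separately fitted linear curve to every CBCT slice."""
    calibrated = []
    for cbct_slice, ct_slice in zip(cbct, deformed_ct):
        slope, intercept = fit_slice_calibration(cbct_slice, ct_slice)
        calibrated.append(
            [[slope * v + intercept for v in row] for row in cbct_slice]
        )
    return calibrated
```

Because only two parameters per slice are taken from the deformed planning CT, a local registration error changes the fit slightly but does not move any CBCT voxel, which is the property the PSC method relies on.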
As a final image processing step, the calibrated CBCT images were merged with the original planning CT images to extend the FoV, as shown in Fig. 2(b). Before merging, the calibrated CBCT images were rigidly registered with the original planning CT (using R1 in Fig. 1) and resampled with a linear interpolation algorithm so that the resolution of the CBCT matched that of the planning CT.
Regions that were outside of the calibrated CBCT FoV or truncated during reconstruction were substituted with voxel values from the original planning CT images. Slices on the superior/inferior border with poor correlation between the voxel values of the CBCT and the deformed planning CT image set (R² < 0.8) were also replaced by the planning CT images. This usually occurred in the shoulder region where the CBCT FoV was insufficient to cover the whole patient, resulting in large deformation errors. By removing these slices, we could improve the anatomical matching at the junction of the CBCT and the original planning CT image set.
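The merging rule above can be sketched as follows. This is a simplified illustration, not the study's code: it assumes the calibrated CBCT and the planning CT have already been registered and resampled onto a common grid, that a boolean FoV mask is available, and that any slice below the threshold is replaced (the study restricted this to slices on the superior/inferior border). The names `r_squared` and `merge_with_planning_ct` are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear fit of y on x.

    For a straight-line fit this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so treat the slice as uncorrelated
    return (sxy * sxy) / (sxx * syy)


def merge_with_planning_ct(calibrated, planning_ct, in_fov, r2_per_slice,
                           r2_min=0.8):
    """Fill out-of-FoV voxels and poorly correlated slices from the CT.

    All volumes are nested lists indexed [slice][row][column];
    in_fov holds True where the CBCT voxel is usable.
    """
    merged = []
    for cal_s, ct_s, fov_s, r2 in zip(calibrated, planning_ct, in_fov,
                                      r2_per_slice):
        if r2 < r2_min:
            # Poor CBCT/CT correlation: use the planning CT slice as-is.
            merged.append([row[:] for row in ct_s])
            continue
        merged.append([
            [c if inside else p for c, p, inside in zip(cal_r, ct_r, fov_r)]
            for cal_r, ct_r, fov_r in zip(cal_s, ct_s, fov_s)
        ])
    return merged
```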
FIG. 1. Schematic of the process used to generate a calibrated CBCT data set for dose calculation (left). The gold standard reCT data set is rigidly registered with the final calibrated CBCT data set for comparison (right).
For the DIR method, the deformable image registration proceeded exactly as it did for the PSC method. Provided there are no significant DIR errors, a deformed planning CT will match the target CBCT while containing HU values from the source planning CT. Therefore, the deformed planning CT data set can be directly used to calculate the dose received at the time of treatment. After DIR, the deformed planning CT image set has the same dimensions and coordinates as the CBCT image set. The deformed images were therefore also rigidly registered, resampled, and merged with the original (undeformed) planning CT images to extend the field of view. As in the PSC method, the same slices on the superior/inferior border with poor correlation between the voxel values of the CBCT and the deformed planning CT (R² < 0.8) were replaced by the planning CT slices.

2.F | Density-override method
In the density-override method, the CBCT image set was first rigidly registered with the original planning CT image set. Regions where soft tissue had become air (e.g., weight loss) or where air had been replaced by soft tissue (e.g., closed air cavity) were manually delineated on the original planning CT image set and assigned either water or air equivalent densities, accordingly. With these modifications, the major anatomical changes can be accounted for on the planning CT dataset, while continuing to use the original planning CT's HU values for dose calculations. This technique is similar to the algorithm proposed by van Zijtveld et al. 8 and is illustrated in Fig. 3.
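As a rough illustration of the override step, the sketch below replaces voxel values inside two manually drawn masks. It assumes the overrides are expressed as CT numbers (0 HU for water, −1000 HU for air) on a single 2D slice; a treatment planning system may instead apply the override directly as a density, and the function and mask names here are hypothetical.

```python
WATER_HU = 0.0     # water-equivalent CT number
AIR_HU = -1000.0   # air-equivalent CT number


def apply_density_override(planning_ct_slice, now_air_mask, now_tissue_mask):
    """Return a copy of one planning CT slice with overrides applied.

    now_air_mask:    True where soft tissue has become air (e.g., weight loss)
    now_tissue_mask: True where air has been replaced by soft tissue
    Voxels outside both masks keep their original planning CT value.
    """
    overridden = []
    for ct_row, air_row, tissue_row in zip(planning_ct_slice, now_air_mask,
                                           now_tissue_mask):
        new_row = []
        for hu, is_air, is_tissue in zip(ct_row, air_row, tissue_row):
            if is_air:
                new_row.append(AIR_HU)
            elif is_tissue:
                new_row.append(WATER_HU)
            else:
                new_row.append(hu)
        overridden.append(new_row)
    return overridden
```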

2.G | Gold standard (reCT) for dose evaluation
The CBCT image sets calibrated by each method were imported back into the research version of Pinnacle, along with the patient's re-planning CT data set (including contours). Each of the image sets was rigidly registered with the reCT image set (R2 in Fig. 1

2.H | Contouring and dose metrics
The contours from the reCT data set were rigidly copied onto each calibrated CBCT image set based on the rigid registration R2. If necessary, these contours were manually adjusted to match the patient anatomy as seen on the calibrated image set. Dose metrics evaluating tumor volume coverage and organ-at-risk (OAR) exposure were again tabulated and compared to the gold standard results.

2.I | Gamma analysis
Dose distributions computed on each of the calibrated image sets were compared to the gold standard dose distribution using the SlicerRT extension (v 0.18.0) of 3D Slicer (v 4.6.2). 19,20 A 3D gamma analysis was restricted to a region inside the original CBCT volume and excluded voxels within 3 mm of the surface so that uncertainties in surface dose were omitted. The analysis was performed with a low-dose threshold of 10% (relative to the maximum point dose on the reCT data set), and acceptance criteria of 3% dose-difference and 3 mm distance-to-agreement. The gamma pass rate (percentage of voxels with γ < 1) was tabulated. pair-wise Student's t-tests when appropriate. A one-way repeated measure ANOVA was also performed to find statistical differences between the gamma pass rates. A 5% threshold for statistical significance (P = 0.05) was used.
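To make the gamma criterion concrete, the following is a brute-force one-dimensional sketch. It is not the SlicerRT implementation used in the study: it works on a 1D dose profile rather than a 3D grid, performs no sub-voxel interpolation, and assumes the dose-difference criterion is normalized to the maximum reference dose (global normalization), which the text does not specify. The function name and arguments are hypothetical.

```python
def gamma_pass_rate_1d(ref_dose, eval_dose, spacing_mm, dose_tol=0.03,
                       dta_mm=3.0, low_dose_cutoff=0.10):
    """Percentage of reference points with gamma < 1 (1D, brute force).

    ref_dose, eval_dose: dose profiles sampled on the same grid
    spacing_mm:          distance between neighbouring samples
    dose_tol:            dose criterion as a fraction of the maximum
                         reference dose (global normalization, assumed)
    """
    d_max = max(ref_dose)
    dose_crit = dose_tol * d_max
    evaluated = 0
    passed = 0
    for i, d_ref in enumerate(ref_dose):
        if d_ref < low_dose_cutoff * d_max:
            continue  # skip points below the low-dose threshold
        # Squared gamma: minimum combined distance/dose term over all
        # points of the evaluated profile.
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2
            + ((d_eval - d_ref) / dose_crit) ** 2
            for j, d_eval in enumerate(eval_dose)
        )
        evaluated += 1
        if gamma_sq < 1.0:
            passed += 1
    if evaluated == 0:
        raise ValueError("no reference points above the low-dose threshold")
    return 100.0 * passed / evaluated
```

A 3D analysis follows the same logic, with the distance term computed from the three-dimensional offset between voxels.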

2.K | Phantom study
A phantom study was also performed to assess the accuracy of the DIR and PSC CBCT calibration methods. A planning CT and CBCT (Clinac iX) scan were acquired of the CIRS 062 inner "head" phantom with various material inserts (Computerized Imaging Reference Systems Inc, Norfolk, VA).
To simulate weight loss with the phantom, a simulated reCT image set was created by reducing the planning CT's in-plane dimensions by 5% (yielding an equivalent depth reduction of 4.5 mm) as shown in Fig. 4. Similarly, the in-plane dimensions of the CBCT images were reduced by 5% to match the simulated reCT.
The original planning CT, simulated reCT and CBCT image sets were imported into the research version of Pinnacle. The CBCT was then calibrated using both the DIR and the PSC CBCT calibration methods described above. Merging of the calibrated CBCT image sets with the original planning CT was not required as the CBCT FoV was sufficient to capture the entire phantom. The calibrated CBCT images were then rigidly registered with the reCT image set.
The inserts in the phantom were manually delineated on each image set (as shown in Fig. 4) and the average density and Sørensen-Dice similarity coefficient (compared to the reCT contour) were calculated for each insert and each image set.
FIG. 4. The image sets and contours produced for the phantom study. A simulated reCT was produced by reducing the planning CT's in-plane voxel size by 5%. The CBCT voxel size was also adjusted by 5%. The CBCT was then calibrated using both the PSC and DIR methods. The inserts were delineated on each image and the average density and Dice coefficient (relative to the reCT) were computed for comparison.
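For reference, the Sørensen-Dice coefficient of two contours A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch on binary masks is given below; it assumes each contour has been rasterized to a 2D mask on a common grid, and the function name is hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Sørensen-Dice similarity of two binary masks on the same grid.

    Returns 1.0 for identical masks and 0.0 for disjoint masks.
    Two empty masks are treated as identical (a convention chosen
    for this sketch).
    """
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    overlap = sum(1 for p, q in zip(a, b) if p and q)
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0
    return 2.0 * overlap / total
```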

3 | RESULTS
Multivariate testing revealed that the image set used for dose calculation had a statistically significant effect on the dose metric values (P < 0.001). Further univariate analysis and pair-wise t-testing found that the spinal cord maximum dose metric (D0.1cc) was systematically underestimated by all three methods (P < 0.05). Furthermore, both the DIR and density-override methods also systematically underestimated the PTV mean dose metric (P < 0.01) and the D02 metric (P < 0.01), whereas the PSC method did not. The level of statistical significance of the Student's t-test is indicated by asterisks (P < 0.05) and daggers (P < 0.01) in Table 1.
The results of the 3D gamma analysis were found to be similar across all three techniques (P = 0.41), with the average (standard deviation) gamma pass rates of 95.0% (3.0%), 96.1% (3.3%), and 94.4% (4.4%) for the PSC, DIR, and density-override methods, respectively.
Results from the CIRS phantom study are provided in Table 2.
Relative to the reCT scan, the DIR calibrated CBCT provided very similar densities for every insert in the phantom. However, significant distortions were introduced into the image set as a result of the DIR errors, as evident by the Dice coefficient values and by visual inspection of the deformed CT in Fig. 4. Conversely, the PSC calibrated CBCT provides better anatomy matching than DIR, with higher Dice coefficients. Despite the presence of crescent artifacts, the PSC method also improves the average density accuracy for most materials, relative to the uncalibrated CBCT, as shown in Table 2.
On average, it took about 30 min to perform the full DIR and PSC calibration workflow, with the bulk of the time spent on dose calculations and transferring the image sets between systems for merging and/or calibration. Of those 30 min, under a minute was spent running the DIR, and only a few seconds were spent calibrating and merging the CBCT with planning CT image set using the PSC method. The density-override techniques took longer (~50 min) because the contours delineating anatomical changes were generated manually.

4 | DISCUSSION
We have developed a patient-specific method of calibrating CBCTs for dose tracking and plan assessment, and compared it with other methods for the head-and-neck site. 8,11 The results show that slightly better dosimetric agreement with the gold standard reCT can be obtained when using this patient-specific calibration (PSC) method, although each method demonstrated sufficient accuracy for plan re-assessment during radiotherapy.
It is worth noting that the spinal cord dose was poorly estimated by all three methods (D0.1cc in Table 1, Fig. 5). This was caused by a few select patients who had slight variations of the spinal cord positioning in regions of steep dose gradients (due to differences in setup between the reCT and the CBCT studies). It should also be noted that the gamma pass rates presented in this study are lower than other published results. For example, both van Zijtveld et al. 8 and Veiga et al. 11 reported similar gamma pass rates for the head-and-neck site when using stricter 2%, 2 mm acceptance criteria.
TABLE 1 Mean (standard deviation) dose metric differences compared to the gold standard reCT, normalized by the prescribed dose. Dose metrics that were significantly different from the reCT are indicated with * (P < 0.05) and † (P < 0.01).
On the other hand, the PSC method is less sensitive to regional DIR errors as it maintains the patient anatomy from the CBCT, resulting in higher Dice similarity coefficients as shown in Table 2. While the PSC method preserves the patient anatomy, it also preserves the noise and any artifacts present in the CBCT images (Fig. 4). It also produces slightly less accurate densities than the DIR methods (  23 ), and reducing the noise and artifacts present in the CBCT, will be investigated in the future. The performance of this method will also need to be verified on other CBCT imaging systems, and in other treatment sites.
Finally, in addition to calibrating CBCT for dose calculations, the calibration curves used in the PSC method can also be used to quickly identify regions of potential DIR error on a deformed CT.
For example, if one highlights the voxels outside of the 95% confidence interval of the calibration curve, regions where the CBCT and the deformed CT differed substantially can be easily visualized. An example of this application is provided in Fig. 7.
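One way to sketch this flagging step is shown below. The text does not state how the 95% interval was computed, so this example makes a simplifying assumption: the band is approximated as ±1.96 residual standard deviations around the fitted line (roughly a 95% prediction band for normally distributed residuals, ignoring the leverage term). The function name is hypothetical, and the inputs are the flattened voxel values of one slice.

```python
import math


def flag_dir_outliers(cbct_values, deformed_ct_values, z=1.96):
    """Flag voxels whose deformed-CT value is far from the calibration line.

    Returns a list of booleans, True where the residual exceeds
    z * (residual standard deviation). Requires more than two voxels
    and a non-uniform CBCT slice.
    """
    x, y = list(cbct_values), list(deformed_ct_values)
    n = len(x)
    if n <= 2:
        raise ValueError("need more than two voxels to estimate the band")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("CBCT values are uniform; no line can be fitted")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [abs(r) > z * sigma for r in residuals]
```

Mapping the flagged indices back to their voxel positions gives an overlay of regions where the CBCT and the deformed CT disagree.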

5 | CONCLUSION
A patient-specific CBCT calibration method has been proposed and tested for the head-and-neck site. Compared to a gold standard reCT dose distribution, average differences in dose metric values were ≤ 1.1% for all calibration methods tested, although the PSC method showed slightly better performance. Some advantages of the PSC method are that it preserves the patient anatomy, accounts for variable scattering per CT slice, can be uniquely applied to each patient, is computationally efficient, and may also be used to display errors introduced by DIR algorithms.

CONFLICTS OF INTEREST
This work received partial funding from Philips Healthcare.

SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article. Table S1. Summary of Patient Treatment Information. Table S2. Summary of the planning CT acquisition. Table S3. Summary of the re-planning CT acquisition. Table S4. Summary of the CBCT acquisition.