Toward adaptive radiotherapy for head and neck patients: Uncertainties in dose warping due to the choice of deformable registration algorithm

Purpose: The aims of this work were to evaluate the performance of several deformable image registration (DIR) algorithms implemented in our in-house software (NiftyReg) and the uncertainties inherent to using di ff erent algorithms for dose warping. Methods: The authors describe a DIR based adaptive radiotherapy workflow, using CT and cone-beam CT (CBCT) imaging. The transformations that mapped the anatomy between the two time points were obtained using four di ff erent DIR approaches available in NiftyReg. These included a standard unidirectional algorithm and more sophisticated bidirectional ones that encourage or ensure inverse consistency. The forward (CT-to-CBCT) deformation vector fields (DVFs) were used to propagate the CT Hounsfield units and structures to the daily geometry for “dose of the day” calculations, while the backward (CBCT-to-CT) DVFs were used to remap the dose of the day onto the planning CT (pCT). Data from five head and neck patients were used to evaluate the performance of each implementation based on geometrical matching, physical properties of the DVFs, and similarity between warped dose distributions. Geometrical matching was verified in terms of dice similarity coe ffi cient (DSC), distance transform, false positives, and false negatives. The physical properties of the DVFs were assessed calculating the harmonic energy, determinant of the Jacobian, and inverse consistency error of the transformations. Dose distributions were displayed on the pCT dose space and compared using dose di ff erence (DD), distance to dose di ff erence, and dose volume histograms


INTRODUCTION
Deformable image registration (DIR) is an active and important area of research with multiple applications in adaptive radiotherapy (ART), particularly for head and neck (HN) patients. 1,2 The planning CT (pCT) can be deformed to match the daily anatomy to calculate the dose delivered per fraction, 3,4 the deformations can be applied to propagate contours, 5,6 and the fraction by fraction dose maps can be warped back to a common reference frame for summation. 7,8 In image registration at least two input images are necessary, the source and the target images, and the result is a transformation (T s→ t ) that can be used to deform the source image (s) onto the target image (t). The majority of the research and commercial registration algorithms are unidirectional, which means they only optimize the forward transformation (T s→ t ) and do not consider the transformation in the opposite direction (T t→ s ), which can be used to deform the target image onto the source image. For applications where T t→ s is also required, one can simply use a unidirectional algorithm twice, by switching the roles of the source and the target images, numerically estimate the inverse of T s→ t , or use bidirectional algorithms that optimize T s→ t and T t→ s simultaneously. In clinical applications, DIR is used to model the spatial anatomical mapping between time points; therefore, physically plausible transformations are desirable. Two concepts are associated with physically plausible deformations: inverse consistency and symmetry. Inverse-consistent registrations try to ensure that T t→ s is the mathematical inverse of T s→ t i.e., T t→ s = T −1 s→ t . Symmetric registration means that identical transformations are obtained when the roles of source and target images are switched: if the source image becomes the target (t ′ , such that t ′ = s) and the target is now the source (s ′ , such that s ′ = t), a symmetric algorithm will ensure that T t ′ → s ′ = T s→ t and T s ′ → t ′ = T t→ s . The two concepts are usually intertwined in the literature, but are not equal as a symmetric algorithm is not necessarily inverse-consistent (i.e., T s ′ → t ′ is not guaranteed to be T −1 t ′ → s ′ ), and vice-versa. The differences between symmetry and inverse consistency are more clear when considering unidirectional algorithms, since most bidirectional algorithms that aim to guarantee inverse consistency are also symmetric. For example, when performing two unidirectional registrations, one in each direction, the resulting transformations are symmetric but not inverse-consistent (switching the source and target results in the same transformations on opposite directions). If T t→ s is obtained by numerically estimating the inverse of the unidirectional registration result (T s→ t ), the transformations are inverse consistent but are not symmetric (estimating the inverse of the opposite transformation does not produce the same result as running the unidirectional registration in that direction).
In recent years, advanced and complex registration algorithms have been developed to be symmetric and inverseconsistent. One approach consists in using the inverse consistency error (ICE) directly as a penalty term. 9 While this encourages inverse consistency, it can reduce the ability to recover large and complex deformations, and the forward and backward transformations are only approximate inverses of each other. 10 A better, but less ad-hoc, approach is to use a stationary velocity field to parametrize the transformation. 11 This can be used to generate the deformation field in either the forward or backward direction, and these are guaranteed to be exact inverses of each other (subject to methodological approximation and numerical precision).
Dose warping and summation are important topics in ART research, since knowledge of the total dose delivered at each time point is fundamental in the ART decisionmaking and replanning process. Validation of DIR is a key aspect, therefore some authors have been evaluating the accuracy of DIR and dose warping using manually annotated points and structures, 5,6,12,13 physical plausibility of the transformations, 14,15 and by developing deformable and virtual phantoms with known deformations. [16][17][18][19] Others focused on estimating the accuracy requirements for dose warping 20,21 or estimating its precision. [22][23][24][25] Monte Carlo methods have also been proposed to recalculate doses using a deformed grid (with deformed and irregular voxels), which still uses DIR but minimizes the errors associated with dose interpolation. [26][27][28] Finally, some groups have been using dose warping to evaluate the benefits of replanning. 7,8,29 Most of the work done on these topics uses CT-to-CT registration on different anatomical sites, and/or registration algorithms that do not ensure symmetry and inverse consistency.
We investigated an ART framework for HN patients using the CT and cone-beam CT (CBCT) imaging. The Hounsfield unit (HU) information is mapped to the daily geometry for "dose of the day" calculations, 4 and the dose is remapped to the planning geometry for dose summation. Therefore, estimates of both forward and inverse transformations are required. Other groups suggest using CT and CBCT for dose remapping without requiring both transformations by calculating the dose of the day directly in the CBCT. 7,13 However, in our opinion, CBCT imaging is still unreliable for direct dose calculations, and until the image quality of CBCT is improved, a deformed CT is a good interim solution. We used four approaches to obtain those transformations, using three different DIR algorithms implemented in our in-house software (NiftyReg), all of which use a Bspline parametrization of the transformation with different theoretically desirable properties. The aims of this work were to compare different DIR algorithms available in NiftyReg in terms of their geometrical accuracy, physical properties, and computation time and to evaluate the uncertainties inherent to using different algorithms for dose warping. Since we compare algorithms from the same software package (i.e, similar implementation), the differences found are due to the physical properties of the underlying algorithm and not due to other differences in implementation.

2.A. Patient data
Retrospective data from five HN patients treated in our clinic were used in this study. The patients followed an intensity modulated radiotherapy (IMRT) treatment with a planned dose of 65 Gy delivered in 30 daily fractions. Positioning reproducibility was ensured using a personalized HN and shoulder mask with an appropriate head-rest support. The patients included in this study were selected from the few patients that were historically replanned at our institution, and therefore a challenging test cohort for image registration. They were referred for replan midtreatment when an offline review of their CBCT scans found the spinal canal or brainstem had moved out of their planning volumes, the external contour had decreased by more than 5 mm, and/or the immobilization mask no longer fitted satisfactorily.
A planning CT (GE Widebore 16 slice system) with contrast injection was acquired approximately 2 weeks before treatment and used for treatment planning. Weekly CBCTs (On-board imaging v1.4, Varian Medical Systems, Palo Alto, CA) were acquired before treatment delivery with the patient in treatment position. The CBCTs were acquired in half-fan mode, full rotation, 110 kVp, 20 mA, 20 ms, with a maximum FoV of 45 cm in diameter and 16 cm in length. Imaging resolution was 0.977 × 0.977 × 2.5 mm and 0.879 × 0.879 × 2 mm for the CT and CBCT scans, respectively.

2.B. NiftyReg
NiftyReg is an open-source registration package implemented by our institution's Centre of Medical Image Computing (http://cmic.cs.ucl.ac.uk/home/software/). It includes a block matching-based affine registration 30 and several B-spline free form deformation based algorithms. 31 NiftyReg's default algorithm is a standard unidirectional implementation on the GPU 32 and was previously used by our group for CT-to-CBCT registrations. 4 Recently more sophisticated (bidirectional, symmetric, and inverse-consistent) implementations have been incorporated in the software. An inverse-consistent symmetric 33 and a stationary velocity field transformation model implementations 34 are now freely available. NiftyReg also features a numerical estimation of the inverse of a deformation field, which uses an iterative method to estimate each vector of the inverse deformation vector field (DVF) independently using the simplex algorithm. It is similar to other published implementations 24,35 but independently developed. This algorithm is also implemented in the GPU.
NiftyReg includes several similarity measures and regularization terms. Normalized mutual information was used for all registrations as it could successfully deal with the nonlinear relationship between CT and CBCT intensities and the spatially varying CBCT intensities. Bending energy was used as a regularization term for all of the registrations. 31 In addition, a penalty term based on the logarithm of the Jacobian determinant 36 was used for the unidirectional registrations and the inverse consistency error penalty term 33 was used for the inverse-consistent registrations. The penalty term weights were optimized for each algorithm independently. Detailed explanation of the parameters used and how they were optimized can be found in our earlier publications. 4,37,38

2.C. DIR in an ART workflow
The method we propose to compute, map, and accumulate dose distributions while accounting for anatomic variations requires the pCT and CBCTs images acquired throughout the treatment. The process consists of each of the following steps ( Fig. 1).
(i) Mapping the HU information from the CT to the CBCT geometry using DIR. The process is repeated for each CBCT available for the patient. on the pCT space. When a CBCT is not available for every fraction of the treatment, the weighting used when summing the doses will depend on the fractionation scheme and scans available.
The accumulated dose can potentially be used clinically to feed the ART decision-making process. At each imaging time F. 1. Use of deformable image registration for an adaptive radiotherapy workflow. The registration maps the Hounsfield units from the CT to the daily CBCT scans, and the deformed CT is used for "dose of the day" calculations. The dose delivered is mapped back to the planning stage, where it is accumulated and the need to replan assessed. point, the dose delivered is mapped to the same reference space, chosen to be the pCT, where it is summed to the dose from previous time points. Choosing the pCT as reference allows to iteratively compare the planned objectives with the delivered values as the treatment progresses, such that the need to replan can be regularly assessed. 7,8 The total dose delivered can also be warped to the daily geometry to assess the need to replan (using the CBCT as reference) and to feed the replanning process (using a rescan CT). In the first case, additional registrations would be required between the fractions since the reference space would change at each fraction. The latter process should be performed with utmost care as uncertainties in registration will not only affect the decision to replan but also the newly planned treatment.
In this study, we compared four different approaches to dose warping available with the NiftyReg software (we use "forward direction" to stand for the CT-to-CBCT transformations and "backward direction" to stand for the CBCT-to-CT transformations): 1. Standard asymmetric registration in both forward and backward directions (DIR sas+sas ); 2. Standard asymmetric registration in the forward direction followed by the numerical estimation of the inverse of this transformation (DIR sas+inv ); 3. Inverse-consistent symmetric registration which provides both the forward and backward transformations (DIR ics ); 4. Symmetric registration parameterized by a stationary velocity field which inherently provides both the forward and backward transformations (DIR svf ).
All the algorithms use a B-spline parametrization of the transformation with different theoretically desirable properties; while DIR sas+sas is symmetric but not inverse-consistent, DIR sas+inv is inverse-consistent but not symmetric. DIR ics and DIR svf are both symmetric, but while the first encourages inverse consistency using a penalty term, the second guarantees it by using a velocity field parametrization. 34

2.D. Evaluation scheme
The different DIR algorithms were evaluated and compared in terms of geometrical matching, properties of the underlying deformations, and computation time. All the statistical tests performed on this paper were done using the  statistics toolbox and the Wilcoxon signed ranked test (95% confidence).

2.D.1. Geometric matching
The purpose of this part of the study was to assess the ability of the different approaches to align the same anatomical features in CT and CBCT images.
A clinical expert contoured a set of identifiable features on both the pCT and CBCT scans. The CBCT used for each patient was the last acquired before replan referral (i.e., with a relatively large deformation). The DVFs were used to propagate the features in both directions (from the pCT to the CBCT and from the CBCT to the pCT). The propagated contours were then compared with the original manually drawn contours (i.e., the features propagated from the pCT were compared with the ones manually drawn on the CBCT, and vice-versa). The features included vertebrae C1, C4, and C7 (which are subject to rigid motion only and cover the length of the cervical spinal canal), external body contour (which reflects the extent of the weight loss), and the right and left sternocleidomastoid muscles (whose deformation correlates with changes in nodal dose 39 ). Common HN organs-at-risk (OARs), such as parotids and brainstem, were not considered in this evaluation because they cannot be unequivocally identified in CBCT images.
We calculated three complementary metrics that provide information about the similarity between volumes: dice similarity coefficient (DSC), false positives (FP), and false negatives (FN). DSC measures the fraction of the overlap between volumes, FP stands for the fraction of deformed volume that is not part of the manual volume, and FN stands for the fraction of manual volume that is not part of the deformed volume. Using FP and FN as well as DSC provides additional insight into the cause of geometric errors and will indicate if one structure is consistently larger/smaller than the other. To infer about the closeness between contours, the distribution of the signed Euclidean distances between the manual and deformed surfaces, also known as the distance transform (DT), 5 was calculated bidirectionally. We computed the fraction of the DT distribution larger than 2 mm (DT 2mm ), mean and standard deviation of the signed DT distribution (DT mean and DT std ), and the 95% percentile of the DT distribution (DT 95% ). In the forward direction, DT is negative if the deformed contour is inside the manual contour, otherwise it is positive (and vice-versa for in the backward direction).

2.D.2. Characteristics and similarity of the deformation fields
By optimizing properly the parameters used in the different algorithms, it should be possible to obtain similar geometric matching. However, different algorithms can result in very different DVFs, particularly inside anatomy that lacks internal features or regions of increased noise and reduced contrast in the CBCT scans. Therefore, characteristics of the underlying deformations were evaluated in this section.
The smoothness of the transformations was analyzed using the harmonic energy (HE) and the properties of the determinant of the Jacobian of the transformation [det(Jac)]. The HE refers to the mean Frobenius norm of the displacement field and is inversely proportional to the smoothness of the deformation. 15  between the composed transformations and the identity transform.

2.D.3. Computation times
The time taken to complete the registrations is important for clinical translation; therefore, the computation times for each approach were measured three times per dataset. Non-GPU implementations were done on an Intel ® Core™ i7-2600S CPU (2.80 GHz, 8 GB RAM), and the GPU registrations used a NVIDIA Tesla C2070 GPU card (14 multiprocessors, 6 GB dedicated memory).

2.D.4. Dose warping comparison
The purpose of this part of the study was to investigate the uncertainties in dose warping when using different DIR algorithms.
Dose calculations were performed on a deformed pCT and mapped back to the original pCTs using the results from the four different methods (in all cases cubic spline interpolation was used). Varian Eclipse external beam planning system analytical anisotropic algorithm was used to calculate the dose distributions using the highest available resolution (1 mm). For each patient, the same IMRT plan was applied, including beam arrangement, monitor units, and fluence maps. The dose distributions were compared within different volumes of interest using dose differences (DDs), by calculating the percentage of pixels whose DD was inferior to 2% of the prescribed dose (pD) (DD 2%-pp ), the root mean square value (DD RMS ), and the 99th percentile of the DD distribution (DD 99% ), the differences predicting the mean (∆D mean ) and maximum doses (∆D max ) to OARs, and dose volume histograms (DVHs). 2%pD was defined as tolerance criteria for DD in accordance with our internal clinical standards for the comparison of dose distributions.
The distance to dose difference (DTD) was also calculated for all the planned dose distributions (using an accuracy of 2%pD). 20 DTD is a method to estimate the required spatial accuracy of a DVF for dose warping based on the distance that one has to travel within the dose map to find a DD above a tolerance value. The DTD in different regions of interest was compared to the variability in mapping between different algorithms, by measuring voxel-by-voxel the root mean square of the DTD (DTD RMS ) and the Euclidean distance between the backward deformation vector fields (computed as a L 2 -norm).

3.A. Geometric matching
In Table I The two exceptions were an underperformance of DIR ics in the external contours (p = 0.02) and slightly better results for DIR svf in the muscles (p < 0.01). This is in agreement with our findings by visually inspecting the registrations. We found that DIR ics had difficulty in recovering larger and complex deformations, while DIR svf performed particularly well in the alignment of soft tissues. The effect of inverse consistency became evident when analyzing the FN and FP results: FP ≈ FN for DIR sas+inv and DIR svf , as the FP in one direction coincided with the FN in the other direction, and vice-versa, and so they averaged to similar values. FP FN for DIR ics because even though this algorithm encourages inverse consistency, the resulting forward and backward transformations are only approximate inverses to each other. Table II shows the values of HE, det(Jac), and ICE found for each of the approaches. The voxels outside the patient were ignored when calculating the results. All the DVFs obtained had no negative det(Jac), thus were effectively invertible. In terms of physical plausibility of the DVFs, DIR svf provided deformations with more desirable physical properties. Lower values of HE indicate smoother transformations; therefore, the level of smoothness of the DVFs was higher in symmetric approaches (DIR ics and DIR svf ), which resulted in tighter intervals of the det(Jac) values. Transformations constrained with an inverse-consistent penalty (DIR ics ) considerably improved the ICE in comparison with DIR sas+sas , but the two resulting DVFs fields were clearly not real inverses as with DIR sas+inv and DIR svf , for which the mean and standard deviation of the ICE were submillimeter. Figure 2 shows the ICE map on a slice of one of the patient's anatomy. DIR sas+sas and DIR ics largest ICE values were found close to airways and in the shoulders region, where the CBCTs showed reduced contrast and higher noise. DIR sas+inv and DIR svf maximum T II. Mean values ± standard deviation for properties of the deformation vector fields: average values of HE, 1% and 99% percentiles of det(Jac), and mean, standard deviation, and 99% percentile of the ICE.  Table III shows the registration computation times measured for the different algorithms. When comparing the computation time taken to complete the registration in both directions and in the same processor, we found that DIR sas+sas and DIR sas+inv resulted in similar times, while DIR ics and DIR svf took on average 2-3 times longer. In comparison, the standard forward asymmetric registrations ran in 0.9 ± 0.2 min (range: 0.7-1.2 min) in the GPU. There are plans to implement all of the DIR algorithms available in NiftyReg so they can run on a GPU, but until then the use of DIR ics and DIR svf is limited to offline studies as current non-GPU computation times are too slow for online applications.

3.D. Dose warping comparison
Differences between the DVFs generated by different DIR algorithms will affect the final warped dose distribution. Each of the two DVF generated per patient (forward and backward) contributed to the differences in the final warped dose. First, differences in forward DVFs resulted in different deformed CTs and therefore different "doses of the day." Second, the backward DVFs remapped differently the doses of the day to the pCT space. The contribution of the first was small as we found the doses of the day to differ by less than 2% of the prescribed dose on over 95% of the body voxels.
Dosimetric differences between the results obtained with DIR svf and every other approach were assessed. DIR svf was arbitrarily chosen as the basis of this comparison since it generated the DVFs with more desirable physical properties.
Table IV presents the DD found between different methods and DIR svf in different regions of interest. We found no statistical evidence of any method being more similar to DIR svf (p ∈ [0.6-1.0]). The differences were smaller in the treated volume (TV) and larger in the 50%-95% of the prescribed dose volume (i.e., the irradiated volume that was not within the treated volume, IV-TV), where higher gradients were more likely to occur. Regions of poorer CBCT quality (low contrast and high noise within larger imaging volumes, 3 i.e., near the shoulders) within the IV (P IV ) showed higher variability between warped doses. The differences between DD 2%-pp between all the different identified regions were statistically significant (p < 0.01). Therefore, regions of higher dose gradient and poorer CBCT image quality were more prone to having larger variability in warped doses, but for different reasons. DTD RMS was 1.6 ± 0.4 mm and 2.7 ± 0.8 mm within IV-TV and TV, respectively. The root mean square L 2 -norm value within IV-TV, TV, and P IV was 2.7 ± 1.1, 3.4 ± 1.5, and 4.2 ± 2.1 mm. L 2 -norm values found for P IV were statistically different from other regions (p < 0.01), while between TV and IV-TV were not (p = 0.1). Voxels within IV-TV had larger DD than voxels within TV due to the local characteristics of the dose distribution (shown by DTD), while inside P IV , the larger spatial mapping variability between DIR algorithms explained the larger DD (shown by L 2 -norm).
The effect of the DVF in the dose mapping is complex and is theoretically expected to depend on the location of the dose gradients. 20,25 We computed the dose gradient of the planned dose distribution and related it with the values of DD within IV-TV. The correlation between gradient and DD was weak (Pearson correlation coefficient, ρ = 0.281), but it was clear that as the dose gradient increased the distribution of DD values became more spread and the average DD increased (Fig. 3) We computed the DD, ∆D mean , and ∆D max to the spinal canal, brainstem, and parotids (Table V). The wide range of values found for ∆D mean and ∆D max was indicative that larger dose differences occur depending on the particular dose distribution and relative positioning of the OARs. Analyzing the DVHs' curves obtained for each patient, we could see that, in general, all algorithms lead to similar clinical outcomes, i.e., that a replan was needed since OARs were receiving more dose than tolerated and targets less dose than planned. However, for one of the patients included in this study, while DIR sas+sas and DIR ics estimated a dose above the spinal canal tolerance, DIR sas+inv and DIR svf did not (Fig. 4). This is an example where the decision to replan could be affected by the choice of algorithm.

DISCUSSION
Regarding the geometric evaluation, all the methods resulted in good alignment between anatomical contours. In some cases, DIR ics had worse results than the other approaches, indicating a reduced ability to capture more complex deformations due to the introduction of additional constraint terms. The differences in geometrical alignment from DIR sas+sas and DIR svf were statistically and clinically insignificant, but the underlying properties of the deformations T IV. Mean values ± standard deviation for the dose difference test pass-percentage (DD 2%-pp ), root mean square value (DD RMS ), and the 99th percentile of the DD distribution (DD 99% ) between different approaches and DIR svf within different regions of interest (as a percentage of the pD). The TV corresponds to the volume encompassed by the planning 95% isodose surface, while the IV corresponds to the volume encompassed by the planning 50% isodose surface. Therefore, IV-TV is the volume where 50%-95% of the dose was planned to be delivered. P IV was defined as the body slices close to the shoulders (a larger imaging volume) intersected with the IV, where the CBCT Hounsfield units were not reliable. were different. DIR svf resulted in deformations with more desirable physical properties, where both symmetry and inverse consistency were satisfied. Different DIR algorithms generated different DVFs which resulted in differences when warping the dose to the planning geometry. The mean and maximum values found for DD between different DIR implementations were comparable to the values found by Salguero et al. when estimating the dose uncertainties of a DIR algorithm due to lack of inverse consistency. 23 In this small feasibility study, we identified situations where the choice of algorithm leads to higher uncertainties in dose warping. The first important point was the effect of the dose gradient. Where there was a high dose gradient, there was more likely a larger variability in dose between the different DIR algorithms. OARs within the high gradient region can be of concern, as we found that different methods could predict maximum doses to the spinal canal and brainstem with a difference of up to 2.8%pD. The correlation between gradient and DD was weak and similar to the values reported in the same study by Salguero et al. The weak correlation can be explained due to the fact that if a registration error occurs in uniform dose region the resulting dose error will be small, but when a registration error occurs in high dose gradient the resulting dose difference may be large but does not mean it will be, since there are other factors to consider besides the gradient (i.e., how large the uncertainty in spatial mapping is, whether the registration error is in the same direction as the gradient). Our findings are also in agreement with the findings of Saleh-Sayah et al., as the required spatial accuracy depended on the local dose distribution. 20 A second important point is the effect of the lower quality of the CBCT in the registration uncertainty. Regions of reduced contrast and increased noise (particularly evident in larger imaging volumes like the shoulders) were more susceptible to variability in mapping between registration algorithms and therefore in larger differences between warped T V. Mean values ± standard deviation (range) for the dose difference test pass-percentage (DD 2%-pp ), root mean square value of the DD (DD RMS ), and differences predicting the mean (∆D mean ) and maximum doses (∆D max ) to the OARs.

OAR
Method DD 2%-pp (%) DD RMS (%pD) ∆D mean (%pD) ∆D max (%pD) doses. The impact of the choice of DVF will depend on the dose distribution and relative positioning of OAR, and generic validation frameworks (based exclusively on geometric analysis of the deformations) are not sufficient for dose warping applications. The patients included in this study were a challenging cohort to test DIR, and it is likely that the issues reported may have less impact for less demanding patients (but then these patients will benefit less from ART). A larger study could potentially identify patterns of when different algorithms provide significant differences between doses, and flag the regions where the dose warping algorithm may be unreliable.
For an application that is sensitive to the underlying deformations such as dose warping, we believe a more complex algorithm like DIR svf is preferable over other approaches. This opinion is based on the similar ability to recover deformations while generating deformations with more desirable physical properties. Other studies support the theoretical advantages of ensuring symmetry and inverse consistency to improve the precision of dose warping using DIR. Bender et al. studied the effect of inverse consistency and transitivity in DIR for a single HN patient. 24 Lack of transitivity means that different dose distributions will be obtained depending on the order in which the registrations are used and the time point chosen for summation. They found dose differences at OAR when different image time points were used as a reference for summation-however, when increasing the inverse consistency and transitivity those differences were considerably reduced. Inverse-consistent algorithms do not enforce transitivity making this an interesting area of research for registration developers. Yan et al. showed that the dose mapping inverse consistency error observed when mapping doses back-and-forth was reduced 1.5-3 times when the spatial inverse consistency was improved. 35 The reliability of using dose warping in clinical settings is a current and open debate. 40 Our work contributes to this discussion by evaluating theoretically better DIR algorithms and investigating the uncertainties in dose warping due to the choice of algorithm in ART frameworks that use CT-to-CBCT registration. One of the main difficulties with validating dose warping is that the true point-to-point mapping is difficult or impossible to establish, especially in regions of homogeneous image intensities as is often found inside individual structures. 41,42 When new plans are based on accumulated dose, registration inaccuracies will also affect the newly planned treatment. We do not think that the difficulties inherent to validate DIR for dose warping necessarily discourage its use for clinical research, but we advise users to carefully consider F. 4. Dose volume histogram obtained for a patient included in this study, using doses warped by different DIR algorithms: (a) all organs-at-risk and target volumes and (b) zoom of the maximum dose to the spinal canal. DIR sas+sas and DIR ics estimated a dose above the tolerance, while DIR sas+inv and DIR svf did not. their choice of DIR algorithm and the conclusions that should be drawn from the results.
It should be noted that the real changes occurring in the tissue are complex and variate: sometimes tissue appears or disappears (e.g., weight loss and tumor shrinkage) and sometimes it expands/contracts or deforms in other ways. The vast majority of current DIR algorithms use a transformation model that represents expansion/contraction, but map constant image intensities, which in CT represents a constant density and so is more representative of appearing/disappearing tissue. To accurately model and recover the real physical changes that occur during a course of radiotherapy is extremely challenging but it is what an ideal DIR algorithm should be able to do and what should be an aim of the next generation of DIR algorithms. Several groups are actively working on making DIR algorithms more realistic. Examples include incorporating missing tissue in image registration by modifying existent DIR algorithms 43 and further regularizing the transformation to avoid deformation of bony anatomy. 44 However, there is still a very long way to go to achieve truly realistic DIR, and indeed, this will not just involve developing new algorithms and computational techniques but will also require a better understanding of the actual physical and biological processes that occur during a course of radiotherapy.

CONCLUSIONS
We have evaluated several DIR algorithms for CT-to-CBCT registrations and investigated the uncertainties inherent to using different DIR algorithms to warp doses to a reference geometry. Standard asymmetric and stationary vector field implementations resulted in similar geometric matching, but the properties of the DVFs were very different, with the second providing deformations with more desirable physical properties. The choice of DIR implementation had a larger impact on the dose warped in regions where the dose gradient is high and/or the CBCT image quality is poorer.