Using patient‐specific phantoms to evaluate deformable image registration algorithms for adaptive radiation therapy

The quality of adaptive treatment planning depends on the accuracy of its underlying deformable image registration (DIR). The purpose of this study is to evaluate the performance of two DIR algorithms, B‐spline‐based deformable multipass (DMP) and deformable demons (Demons), implemented in a commercial software package. Evaluations were conducted using both computational and physical deformable phantoms. Based on a finite element method (FEM), a total of 11 computational models were developed from a set of CT images acquired from four lung and one prostate cancer patients. FEM generated displacement vector fields (DVF) were used to construct the lung and prostate image phantoms. Based on a fast‐Fourier transform technique, image noise power spectrum was incorporated into the prostate image phantoms to create simulated CBCT images. The FEM‐DVF served as a gold standard for verification of the two registration algorithms performed on these phantoms. The registration algorithms were also evaluated at the homologous points quantified in the CT images of a physical lung phantom. The results indicated that the mean errors of the DMP algorithm were in the range of 1.0~3.1mm for the computational phantoms and 1.9 mm for the physical lung phantom. For the computational prostate phantoms, the corresponding mean error was 1.0–1.9 mm in the prostate, 1.9–2.4 mm in the rectum, and 1.8–2.1 mm over the entire patient body. Sinusoidal errors induced by B‐spline interpolations were observed in all the displacement profiles of the DMP registrations. Regions of large displacements were observed to have more registration errors. Patient‐specific FEM models have been developed to evaluate the DIR algorithms implemented in the commercial software package. It has been found that the accuracy of these algorithms is patient‐dependent and related to various factors including tissue deformation magnitudes and image intensity gradients across the regions of interest. This may suggest that DIR algorithms need to be verified for each registration instance when implementing adaptive radiation therapy. PACS numbers: 87.10.Kn, 87.55.km, 87.55.Qr, 87.57.nj

changes may compromise the accuracy of dose calculation for each organ. As a result, the treatment plan developed may not provide necessary target coverage and organ-at-risk sparing. Adaptive radiotherapy (ART) aims to minimize the dosimetric impact of anatomical changes by reoptimizing the original treatment plan if its quality degrades. (3) A key step in the implementation of ART is to match each point on daily CT images to their correspondent points in the planning image. This process can be accomplished with deformable image registration (DIR) techniques. (4) DIR is to derive a transformation map by maximizing the intensity similarity between the two images being registered. Depending on the registration techniques used, the transformation map can be represented by different mathematical models such as affine transform, (5) thin-plate spline, (6) or B-spline basis, (7) or adapted directly through optical flow-based equations. (8) Similarity metrics also can be represented in different forms including the sum of squared difference, cross-correlation, or normalized mutual information. (9,10) Like dose calculation algorithms, these DIR algorithms must be thoroughly evaluated before they are used in clinic for adaptive radiation therapy.
Visual evaluation is common practice after performing an image registration. As a preliminary test, this evaluation is convenient and especially useful for software development. (11,12) Along this direction, many efforts have been made using landmarks or contours to estimate errors in the displacement vector field (DVF) (13,14) of DIR. For example, Hardcastle et al. (15) evaluated two registration algorithms with dice scores calculated on the contours drawn by physicians, and Brock et al. (16) evaluated 4D CT registrations reported from 21 institutions by comparing the computer-predicted displacement at each bifurcation point with the displacement computed from the oncologists' annotations. Castillo et al. (17) developed an automatic method to identify and track landmark points in lung patient datasets. These studies provided quantitative evaluation results on the performances of different DIR algorithms at these distinctive landmarks or their nearby regions. However, as reported by Kashani et al. (18) and Liu et al., (19) large registration errors can be observed in regions of uniform image intensity, and the above evaluations are limited by the number of the objects being tracked; so errors estimated by the feature-guided evaluation methods may not be representative of the registration accuracy in voxels at a distance from those landmarks. (20) Since landmarks or contours are not always available in high-dose gradient regions, mathematical properties of the generated deformation maps may serve as an alternative metric to evaluate the quality of DIR in these regions. For example, Schreibmann et al. (21) evaluated the quality of image registrations by calculating the curl of their deformation maps. Zhong et al. (22) proposed the concept of unbalanced energy, calculated directly from DVF, to detect DIR errors. Bender and Tome (23) employed consistency metrics to evaluate the accuracy of the composed deformation maps. Klein et al. (24) used permutation and ANOVA tests to compare the relative performance of 14 nonlinear registration algorithms. These studies help evaluate the overall quality of different registrations and the derived information is valuable to clinic. However, as radiation dose was mapped by the derived DVF, subvoxel displacement errors may cause the accumulated dose over-or underestimated, especially when the dose was counted on individual particles during the dose mapping process. (25,26) Salguero et al. (27) demonstrated that registration errors greater than 1 mm can induce large dose errors in high-dose gradient regions, so the accuracy of the registration algorithm needs to be quantified at each image voxel in these regions. In addition, the performance of a registration algorithm should be evaluated for patient-specific registration scenarios that may involve different image qualities, anatomy patterns (e.g., size of homogeneous organs), and deformation magnitudes.
Computational modeling can help evaluate registration performance under various simulated scenarios. (28,29) For example, Wang et al. (30) used a set of B-spline-based mathematical phantoms to evaluate their demons algorithm, and Liu et al. (19) evaluated their DIR algorithms using the computational phantom NCAT. While the NCAT phantom has organs assigned with uniform intensity, the deformation of each organ was realistically modeled with B-spline functions. Furthermore, DIR algorithms could be verified with more realistic images acquired from deformable physical phantoms. (19,(31)(32)(33) Recently Nie et al. (34) used a set of computer-simulated deformable phantoms and a physical pelvic phantom to evaluate different DIR algorithms, and demonstrated the impact of different deformations on the performance of these algorithms. Their computational phantoms were generated by the commercial software ImSimQA (Oncology Systems Limited, Shrewsbury, UK). The global deformation of these phantoms was interpolated from the thin-plate splines which were guided by a set of manually selected control points. (35) Consequently, mass volume may not be preserved during the spline interpolation, and the resultant organ deformation may not be physically realistic.
In this study, we will first develop more realistic computational phantoms from CT images of lung and prostate patients. The images of the prostate phantoms were enhanced by the simulated CBCT noise. The reality of the phantom deformations is achieved through an in-house developed finite element modeling framework where tissue elasticity and volume change in each element were characterized by Young's modulus and Poisson ratio, and the global deformation was controlled by the conservation of tissue elasticity energy and external work. Taking these phantoms as a ground truth, we will investigate parameter settings for the DIR algorithms implemented in Velocity Advanced Imaging (VelocityAI), a commercial DIR software package, and compare the performance of these DIR algorithms for five cancer patients under different deformation scenarios. In addition, we will also develop a motor-controlled deformable physical phantom to verify these algorithms.

A. Image registration software package
The VelocityAI software package (v2.6.2; Velocity Medical Solutions, Atlanta, GA) offers a rigid registration and three choices for 3D deformable registration algorithms: "Deformable Demons" (Demons), "Deformable Single-Pass" (DSP), and "Deformable Multi-Pass" (DMP). DMP uses DSP with preprogrammed choices for grid settings for each pass. Specifics of the grid settings are unknown to the authors as VelocityAI is a commercial, proprietary software. However, it is known that DMP uses mutual information as its similarity metric and B-splines for interpolation. Operators also have the option of limiting the registration space to a region of interest (ROI). Registration can be performed intra-and intermodality for CT, CBCT, and MRI, and its derived DVF can be exported as a binary file with three-dimensional values for each voxel.
The purpose of this study is to evaluate the DMP and Demons algorithms implemented in VelocityAI. Different from DMP, DSP allows users to choose different grid settings ranging from "coarse" to "fine". However, due to the lack of a ground truth, these settings are hardly to be used in clinic. In this study, we will first use the developed computational phantoms to evaluate different grid settings for DSP and compare their registration results with those derived from the DMP algorithm, and then we will focus on the comparison between DMP and Demons for different scenarios of organ deformation.

B.1 Development of computational lung phantoms
4D CT datasets used in this study were selected from four lung cancer patients under a retrospective protocol approved by the Institutional Review Board of our institution (IRB#: 6203). For each case studied, the 3D image of the 4D CT at the end inhalation (EI) phase was considered the primary image. Computational phantoms were developed based on the primary images using a finite element modeling (FEM) system. The mathematical implementation of this system was described in our previous study. (29) With this system, tetrahedral meshes were generated and scaled to match each of the image domains. Diaphragm, spinal cord, and ribs on each side were manually segmented from the CT images. Tetrahedral nodes located in diaphragm regions were selected as driving nodes, and those in spinal cords and side ribs were fixed as boundary constraints. Young's moduli were set to 1 MPa for ribs, 1 kPa for lung, and 10 kPa for other soft tissue. The Poisson ratio was 0.38 for lung and 0.49 for other elements. (36) With different forces assigned to the driving nodes, the displacement vectors of other anatomical structures were computed using the FEM modeling system.
For lung patient 1 (denoted Lung1), the selected nodes were assigned with the forces of 0.8, 1.6, 2.4, and 3.2 kPa, and these forces got the diaphragm moved superiorly by about 1, 2, 3, and 4 cm. The force-induced organ deformation can be visualized through the overlay of the patient's original image set and its deformed image set, as shown in Fig. 1. For lung patient 2 (denoted Lung2), the forces of 0.8, 1.2, and 2.0 kPa caused diaphragm deformation by 1.8, 2.7, and 4.2 cm, respectively. These phantoms will be used to evaluate the impact of motion magnitudes on registration accuracy.
To evaluate the impact of patient-specific anatomy on DMP registrations, CT images of two additional lung patients (Lung3 and Lung4) were included to develop more computational phantoms. 1.8 and 2.1 kPa forces were assigned to the tetrahedral nodes located in their diaphragm regions, and their lateral ribs and spinal cords were kept stationary as boundary conditions. As a result, their simulated diaphragm deformation is limited to 3 cm. Consequently, there are total four phantoms, including Lung1 and Lung2, having their diaphragms deformed about 3 cm.
The displacement vectors of the tetrahedral nodes generated by the FEM modeling system were interpolated to generate a DVF for each primary image. Secondary images were constructed from the primary image sets using the FEM-generated DVFs. Slice spacing and image size were kept consistent with the primary image sets with voxel sizes of ~ 1 mm in the X and Y directions, and 3 mm in the Z direction. The FEM-generated DVFs were considered the gold standards to evaluate registrations performed from the primary images to the FEM-simulated images.

B.2 Computational phantom-based evaluations of DSP with different
grid resolutions DSP and DMP share the same registration algorithm with their difference mainly in that DSP allows the user to choose the B-spline grid resolution from 1 to 9, with 1 being the coarsest and 9 being the finest. The exact grid spacing of each level is unknown due to the proprietary nature of the software. One can compose a set of DSP registrations to form a multiresolution registration chain. For example, images would be registered first using the coarsest setting, and then that result would be fed to the next stage of finer grid spacing.
In order to evaluate the DSP registration algorithm, chains of registrations with different grid resolutions were performed on the two lung patients (Lung1 and Lung2) with four diaphragm deformations (1.8 cm, 2.0 cm, 2.7, and 3.0 cm). For each dataset, six representative compositions of typical grid resolutions were investigated, including three single grid resolutions of 1, 5, and 9, as well as three resolution compositions of 1→ 5, 1→ 9, and 1→ 5→ 9. The resulting DVFs were compared against the corresponding FEM-generated gold standard DVFs to estimate the registration errors at each image voxel. The DMP algorithm was also applied and evaluated using the same dataset for comparison.

B.3 Computational phantom-based evaluation of DMP and Demons
To evaluate the DMP algorithm which was suggested as the default by the manufacturer, the performances of DMP and Demons registrations were compared based on the developed FEM models. Both DMP and Demons in VelocityAI had no customizable grid settings and were run with the default parameters. Their registrations were performed by deforming the primary image sets to match the secondary image sets. The registration error, e, at a point x was quantified by: where the first and second terms are the DVFs generated by the FEM modeling and DIR algorithms, respectively.
Both DMP and Demons registrations were performed on seven deformable models, developed from Lung1 and Lung2 to identify the potential trend of registration errors that are associated with different displacement magnitudes; then the two algorithms were evaluated with models developed from four different patients to detect any patient-specific errors. Registration errors are generally large in the superior and inferior ends of image borders because there is often no corresponding anatomy between two input images. However, the effect is clinically insignificant since the treatment lesions and relevant organs of interest are typically located near the center of the images. Thus, such border regions (2 cm on each end) were excluded from analysis in this study. Boolean masks of the patient body and lung region were created to confine the error analysis to these relevant voxels.

B.4 Physical phantom-based evaluation of DMP and Demons
A motor-controlled lung phantom was developed to simulate respiration-induced deformation. Phantom characterization is beyond the scope of this work; but briefly, the insert was made of heterogeneous sponge with average density equivalent to lung density in CT images. The inferior end of the phantom was deformed 2.5 cm to simulate a diaphragm movement. A 4D CT dataset including the phases of the end inhalation (EI) and end exhalation (EE) was acquired for this phantom. Registrations were performed from EI to EE using both DMP and Demons algorithms. Registration errors were quantified with the ROIs set to the lung region, as well as to the entire image volume. Further, the influence of different window/level settings was investigated. The selected window settings included 250, 500, 1000, and full range with the levels set to the half of the corresponding window values.
The registrations of the phantom images were evaluated with 37 landmarks. These landmarks were sponge features automatically identified in the phantom's EE image. Specifically, the image was scanned voxel by voxel until one was found with intensity greater than a given threshold. Once this voxel was found, a subprocess utilized a spanning tree algorithm (37) to find all the directly neighboring voxels with a new threshold that had lower intensity than the first. The first threshold ensured that only objects whose intensity rose above it would be chosen. The second threshold was set lower to include more periphery voxels and form a complete feature. Once the feature was delineated, its center of mass was determined to define the feature's spatial location. All voxels within an already defined object were then ignored in the remaining scans.
A deformable image registration was performed from the EI image to the EE image, and its resultant DVF was then used to construct a warped image from the EI image. The process used to identify the features on the EE image was repeated on this warped image. At this point, two sets of features were determined: one on the EE and another on the warped EI images. The correspondence of these features was established based on their size and center-of-mass coordinates, and was then visually reviewed with their overlaid images. The distance between the corresponding center-of-mass coordinates in the EE image and in the warped EI image was determined, indicating the displacement error of the registration at the center.

C.1 Development of computational prostate phantoms
The primary CT image sets were acquired from prostate cancer patients using a retrospective protocol approved by the IRB of our institution. One image set with an empty bladder was selected to create a deformable prostate phantom, where bladder expansion was simulated through equally distributed forces applied outward on a set of tetrahedral nodes located in the inside of the bladder. Two different forces were added to these boundary nodes to create two deformed image sets, termed P 1 CT and P 2 CT , resulting in a 6 mm and an 8 mm isotropic expansion of the bladder radius in the lateral direction. The resultant displacements of all the tetrahedral nodes were interpolated to get displacements in the entire image domain. The interpolated displacement fields were used to deform the primary image to generate deformable prostate phantoms with known deformation fields.

C.2 Prostate phantoms for evaluation of image registrations between CT and CBCT
To evaluate DMP performance between CT and CBCT image registrations, simulated CBCT images were created by incorporating simulated CBCT noise into the FEM-generated benchmark models. Specifically, with the two FEM-simulated deformation fields, the primary prostate CT image was warped to create two deformed images. To create simulated CBCT images, P i Noise ( i = 1,2), Poisson noise was incorporated into the deformed images in a manner similar to Murphy et al. (38) Specifically, the deformed CT image was transformed to its spectrum space using a Fast Fourier Transform (FFT) in MATLAB (The MathWorks, Inc., Natick, MA). Based on Jaffray and Siewerdsen's study of CBCT performance characteristics, (39) the noise power spectrum (NPS) was approximated by the function (2) with λ = 0.15 mm, where k is the wave number in the Fourier domain. (38) The FFT image was then multiplied by the NPS and converted back to the image domain using an inverse FFT. The resultant image P i Noise contains the simulated CBCT noise. If a registration performed from the primary prostate CT image to P i CT or P i Noise is accurate, its DVF should be equal to the FEMgenerated deformation field.

C.3 Evaluation of registrations between prostate CT and CBCT images
The evaluation of prostate image registrations was performed in the same way as for the lung cases (i.e., a voxel-by-voxel comparison to the FEM simulated DVF for constructed image phantoms). This was done for registrations from the primary CT to P i Noise and P i Noise (i = 1,2, respectively), where P i Noise was used to assess the impact of CBCT noise on image registrations. For this reason, average errors and displacement profiles of these P i Noise registrations were compared to the P i CT registrations. The calculated errors were averaged in the entire patient volume, as well as in the prostate, bladder, and rectum regions.

A.2 Evaluation of DSP with different grid settings
DSP registrations with different B-spline grid settings were performed on the computational phantoms Lung1 2cm , Lung1 3cm , Lung2 1.8cm , and Lung2 2.7cm . All the grid settings defined in the Materials & Methods section B.2 (above) were tested, and the resultant displacement errors were averaged in the patient volume with the results shown in Fig. 2.
As shown in Fig. 2, DMP outperformed DSP with these selected settings for Lung1 2cm , while a slight improvement of DSP over the DMP algorithm was observed in Lung2 1.8cm where the average displacement error was improved by 0.2 mm. For the cases of large diaphragm motions, Lung1 3cm and Lung2 2.7cm , the DMP performed better for all six settings, and in some cases the differences were larger than 3.0 mm. In general, the performance of the DMP was comparable or superior to that of the DSP with the selected B-spline grid settings. It is noticeable that, for DSP, the R 1 resolution registrations have less error than R 9 registrations, which is particularly true for large deformation cases Lung1 3cm and Lung2 2.7cm . It is primarily due to the nature of the local optimization algorithms, where the optimization is trapped in one of the local minima. Therefore, it is suggested that low resolution be used, especially in the presence of large deformation, followed by gradually finer resolutions for detailed anatomy alignments. This may suggest that grid resolutions should be adapted for different patient images, as well as different deformation scenarios.

A.3 Evaluation of DMP and Demons registrations
DMP registrations were performed on the computational phantoms developed from Lung1 and Lung2 at different deformation magnitudes. Figure 3(a) shows the profiles for the superiorinferior (SI) components of the displacement vectors derived from the registrations of Lung1. These profiles correspond to the SI-line in Fig. 3(b) that passes through the tumor volume of Lung1. The position z = 0 mm indicates the most superior slice of the image. It is evident that the DMP displacements are dipping above and below the gold standard sinusoidally due to the B-spline interpolation of the DMP algorithm.
The registration errors in the lung as well as in the patient body for DMP and Demons algorithms were summarized in Table 1. It was observed that the mean displacement error was larger for larger diaphragm motion. In all seven cases, the Demons registrations are slightly  better than DMP inside the lung, but are much worse outside the lung region for Lung2. The patient-to-patient differences are illustrated further in the next section.

A.4 Evaluation of DMP and Demons registrations with different patients and regions of interest
Four patients were simulated with their diaphragm deformed about 3 cm in magnitude to create computational phantoms, and both the DMP and Demons algorithms were applied to these phantoms. The mean errors for the DMP registrations are between 1.3 and 2.6 mm, but the performance of the Demons registrations varies largely among these patients ( Table 2).
Compared to DMP, Demons has larger errors on the chest wall and breast regions. Inside the lung, Demons has a slightly better performance than DMP in all the cases listed in Tables 1 and 2 except Lung4 (Fig. 4), where a large amount of homogeneous tissues present a major challenge to the Demons registrations (Fig. 4(c)).
Even for one patient, different regions could be registered with different qualities. As shown in Figs Overall, the average displacement errors of the DMP and Demons registrations in the lung region ranged from 1 mm to 3.3 mm. In all cases except one, the DMP algorithm produced slightly worse registrations than the Demons algorithm in the lung, but it still provided DVFs with comparable accuracy. There were notable differences between the DMP and Demons registrations outside of the lung region. The Lung2 case demonstrated significantly reduced displacement errors for both the breast tissue and chest wall region when the DMP algorithm was used. Here, displacement errors exceeded 4 cm in the breast tissue for the Demons algorithm, but were only 1-2 mm for DMP in the same area.

B. Evaluation of DMP and Demons algorithms with a physical phantom
DMP and Demons registrations were performed from EI to EE for the physical phantom. The centers of the 37 previously defined landmark objects were automatically tracked. The spatial difference between the centers in the EE image and those in the warped EI image was counted as the registration error. When the registrations were performed over the entire image volume, their average displacement errors were 1.9 and 1.6 mm for the DMP and Demons algorithms, respectively. However, when restricted to the lung regions, the Demons algorithm failed to produce a visually acceptable registration, resulting in more than 10 mm mean error, while the DMP had its average displacement error reduced to 1.8 mm.
Note that the default parameters in DMP were set for patient images which were largely different from the physical phantom. To address this issue, the image intensity window was limited during the DMP registration, based on the manufacturer's suggestion. Table 3 summarizes average displacement errors for registrations at different window intensity settings. The average of the displacement errors for all the registrations that used customized contrast settings was 0.97 ± 0.51 for the lung region, and 1.25 ± 0.58 for the image domain. The best registrations were in the ROI-constrained registrations with lower window settings, and . Figure 6 shows the EE image overlaid with the warped EI image using the DMP registrations with the intensity windows of 250 and 500, respectively. The mismatched part of the tumor for the registration can be observed in the overlaid images. The results show that the intensity window could impact the DMP registrations, and is worth investigating further in future studies.   Figures 7(a) and 7(b) show the original and computationally deformed prostate images. The expansion of the bladder and the deformation of its nearby structures can be observed in the elliptical region marked in the axial and coronal images, respectively. With the method described in the Material and Methods section C.2 (above), simulated CBCT noise was incorporated into the deformed image ( Fig. 7(b)) to derive a simulated CBCT image (Fig. 7(c)).  Table 4 lists the errors for the DMP and Demons registrations. For DMP, the average registration errors were 1.9 and 2.1 mm across the whole patient volume, and 1.2-2 mm for prostate and rectum. The errors in bladder, 4.6 mm on average, were relatively large primarily due to the large initial deformation with low image gradients in the bladder. For the Demons registrations, without noise added, the displacement errors across the whole patient body were large, 10.3 mm and 7.8 mm for P 1 CT and P 2 CT , respectively (the first and second rows in Table 4). The large errors were mostly observed in the outer tissue regions near the skin surface, while regions of clinical interest, including the prostate and rectum, yielded registration errors between 2.2-3.2 mm.

C.2 Evaluation of DMP and Demons registrations using computational prostate phantoms
For the simulated CBCT images, the Demons algorithm was not able to produce visually satisfactory results and its registration errors were greatly increased. The impact of the simulated noise on DMP was small, with slight variations observed from region to region. Table 4 lists the average displacement errors for all the DMP and Demons registrations.

IV. dIScuSSIon
When adaptive radiation therapy is implemented in clinic, it becomes necessary to quantify the accuracy of its underlying deformable image registrations. In this study, we developed computational phantoms from lung cancer patient CT images to evaluate the performance of the DSP algorithm configured with a variety of B-spline grid setting options. The results revealed that the best performance of the DSP registrations was comparable to DMP. The DMP algorithm had average displacement errors ranging from 1.0 ~ 3.3 mm for the diaphragm compressions of 1 ~ 4 cm. This is comparable to the results of a multi-institution deformable registration accuracy study (MIDRAS). (16) In the MIDRAS, the institutions that used the B-spline-based registrations reported 1.6 to 3.0 mm errors when using a set of physician-identified landmarks as gold standards. In contrast to the landmark method, the computational phantom-based evaluation method uses simulated tissue deformation maps as benchmark models, which allow DIRs to be evaluated in a voxel-by-voxel manner, and this method can overcome the limitations of the landmark-based evaluation such as observation uncertainty, lack of image features, or slice thickness-induced round-off errors. The accuracy of the DMP algorithm was found to be dependent on the magnitude of tissue deformation, as well as individual patient images. As shown in Table 1, the registration errors increased for larger diaphragm deformations. This observation is similar to that reported by Liu et al. (19) in which their in-house developed DIR algorithm was evaluated with a computational phantom (NCAT) and a physical liver phantom. However, the capability of their DIR algorithm could be underestimated because the NCAT phantom has uniform intensities assigned to each organ and, therefore, the DIRs performed on this phantom would not be as accurate as those performed on real patients. In contrast, the computational phantoms in this study, derived from actual patient CT images, do not have this limitation. It also further allows registration algorithms to be evaluated under patient-specific clinical scenarios.
With the patient-specific phantoms, the evaluation results showed that DMP registration outperformed Demons outside the lung, but inside the lung the DMP registrations were slightly worse. Both the DMP and Demons registrations demonstrated large errors in the lower lobe of the right lung and at the boundaries between the lung, diaphragm, and chest wall, as marked in Figs. 5(c) and 5(d). At these boundaries, the simulated deformation fields were large, and such large deformation combined with the homogeneity of tissue could be a major source of registration errors. (40) For the prostate cases, DMP had smaller registration errors than Demons in the prostate. For the Demons algorithm, significant errors were observed in regions of low contrast tissue. With the simulated CBCT noise added, the accuracy of the Demons algorithm degraded significantly in all of the structures evaluated in this study, while the DMP algorithm remained robust. The average errors across the prostate, the rectum, and the patient volume indicate that the simulated CBCT noise did not have a major impact on the DMP registration. This confirms the results from Murphy et al. (38) where the simulated CBCT noise had no statistically significant impact on the B-spline-based image registrations.
It should be noted that, while different deformation scenarios can be simulated, limitations still exist for the computational phantom-based method. For example, on the 4D lung images it is common that motion artifacts differ among phase images. Furthermore, the differences of the organ filling, such as the rectum feces and gas, between the treatment planning CT and daily CBCT images creates one of the major challenges for deformable registrations. The impact of such artifacts and differences has not yet been investigated. As a supplementary verification, the physical phantom was employed in this study to evaluate these DIR algorithms. However, the deformation of real patients is much more complex than the physical phantom deformation. Patient-specific verifications are still needed for clinical use of these DIR algorithms.

V. concLuSIonS
In this study, patient-specific FEM models have been developed and used as the gold standard to evaluate DIR algorithms implemented in the VelocityAI software package. It has been found that the accuracy of these algorithms is patient-dependent, and related to various factors including the tissue deformation magnitudes and image intensity gradients across the regions of interest. This may suggest that DIR algorithms need to be verified for individual deformation instances when implementing adaptive radiation therapy.