A measure to evaluate deformable registration fields in clinical settings

Deformable registration has migrated from a research topic to a widely used clinical tool that can improve radiotherapeutic treatment accuracy by tracking anatomical changes. Although various mathematical formulations have been reported in the literature and implemented in commercial software, we lack a straightforward method to verify a given solution in routine clinical use. We propose a metric using concepts derived from vector analysis that complements the standard evaluation tools to identify unrealistic wrappings in a displacement field. At the heart of the proposed procedure is identification of vortexes in the displacement field that do not correspond to underlying anatomical changes. Vortexes are detected and their intensity quantified using the CURL operator and presented as a vortex map overlaid on the original anatomy for rapid identification of problematic regions. We show application of the proposed metric on clinical scenarios of adaptive radiotherapy and treatment response assessment, where the CURL operator quantitatively detected errors in the displacement field and identified problematic regions that were invisible to classical voxel‐based evaluation methods. Unrealistic warping not visible to standard voxel‐based solution assessment can produce erroneous results when the deformable solution is applied on a secondary dataset, such as dose matrix in adaptive therapy or PET data for treatment response assessment. The proposed metric for evaluating deformable registration provides increased usability and accuracy of detecting unrealistic deformable registration solutions when compared to standard intensity‐based approaches. It is computationally efficient and provides a valuable platform for the clinical acceptance of image‐guided radiotherapy. PACS numbers: 87.57.nj; 87.55.Qr; 87.57.cp

A deformable registration is in essence an optimization procedure trying to mimic anatomical changes by various mathematical models. In reality, organs in a patient's body will deform under forces exercised between them and from the surrounding medium. Although these forces cannot be measured in vivo and are thus unknown, their effect, such as organ compression, inflation, and displacement, can be visualized using different imaging techniques. A deformable registration algorithm estimates the magnitude of these forces by comparing images acquired before and after the deformation. Proposed deformable registration models make various assumptions to solve the problem at hand, being, in essence, synthetic models not necessarily based on underlying anatomy. For example, voxel-based algorithms (14,15) assign a vector to each voxel in the input images, and iteratively vary magnitude until the predeformed image -warped with the deformation -matches the postdeformation image. Custom-made deformable registration models that incorporate anatomical information are possible through the finite element model (FEM) approach. (16) In practice, general settings and assumptions are preferred over highly customized models, but a quality assurance procedure must be in place to ensure that the solution estimated by an algorithm conforms to the expected anatomical forces.
Accuracy assessment depends on the clinical application of the deformable registration algorithm. If the deformation algorithm is used for atlas-based contouring, voxel motion inside an organ is irrelevant to the algorithm application, with the aim being to correctly match structures borders. A valid approach for this application is to mathematically quantify registration errors by distances between the automated and user-delineated contours through Haunsdorf or Dice measures. (3,12,13,(17)(18)(19)(20)(21)(22)(23)(24)(25) However, for other applications when the resulting displacement field is used to warp supplementary information such as a dose matrix or a PET dataset, it is crucial that the displacement field inside a structure of constant intensity follows real anatomical changes. Voxels in such regions of constant intensity are indiscernible to most deformable registration algorithms, (25) and thus there is little information for the algorithm to build a displacement field that accurately mimics anatomical changes. This is especially important as deformable registration is an ill-posed inverse problem that does not fulfill Hadamard's postulate on wellposedness, and since a solution may not exist or may not be unique, it is important to verify if the displacement field solution obtained by a deformable registration algorithm correctly models voxel movement.
Indeed, as opposed to rigid registration where only one valid solution is attainable, deformable registration problems might not have a solution in the strict sense, and a solution might not be unique, making their verification problematic. One approach to evaluate a deformable registration technology is to measure accuracy by using markers or clearly visible anatomical locations similar to concepts developed for rigid registration. (8,12,18,(26)(27)(28)(29)(30) Such methods are tedious in clinical practice and incomplete when applied to deformable registration because the methods do not validate how the deformation field models voxels movement inside structures of uniform contrast. (31) For either rigid or deformable registration, visual inspection is still the norm, with tools such as the checkerboard or lens used to identify regions of anatomical mismatch. These standard evaluation methods give important visual feedback, but fail to differentiate voxels, and thus do not directly quantify displacement field properties. One academic technique is to warp images with known displacement fields and then inspect the algorithm's ability to recover the deformations. (32,33) Although this is a rigorous evaluation method, it is a time-consuming approach, as daily clinical practice lacks the tools and human resources to induce and compare deformations.
We propose, in the following, a metric to complement existing evaluation tools to better understand displacement field properties when correct modeling of anatomical changes is important. At the heart of the proposed procedure is identification of vortexes in the displacement field. Vortexes are a spinning, often turbulent, flow that may develop in fluid mediums. However, patient organs are rather solid and deform smoothly, and as a result, irregular motion descriptors that produce vortexes probably won't truthfully portray the underlying anatomical motion. Vortexes in a displacement field can be identified and characterized using concepts derived from vector analysis and will be used in the following to scan a solution produced by a deformable registration procedure for regions where the displacement is unsmooth. This information is presented to the user as a color-coded map overlaid on the input images for quick characterization of simulated anatomical behavior.
An overview of the theoretical framework is presented and illustrated with clinical examples where the technique was instrumental in extracting complimentary information from the deformation field. The protocol is easy to implement and use in clinical practice as it does not require phantom measurements, and is applicable to any type of deformable registration algorithm.

A. deformable registration algorithm output
Technically, deformable registration is an optimization problem that searches for a point-wise transformation minimizing discrepancies between the two datasets to be matched using global voxel-based similarity metrics. The optimization strategy varies based on the representation of the transformation and the number of variables, being either parametric or nonparametric. In the case of a parametric representation, a finite number of parameters allow representation of nonrigid transformations with realistic complexity. The most common approach in this category is the B-Spline model where the deformation is defined only on a sparse lattice of nodes overlaid on the image, and the displacement at any voxel is obtained by interpolation from closest lattice nodes. In nonparametric approaches, displacement vectors at any voxel are considered separately to define a dense mapping, with the disadvantage that the optimization space tends to be very large, such that a variational method must replace the optimization.
All deformable registration algorithms use a form of regularization that ensures the resulting displacement field is smooth to correctly describe intra-or interfractional anatomical changes. Simply stated, the purpose of the regularization is to ensure that neighboring voxels are mapped to similar location by the displacement field. Without regularization, each voxel is allowed to map freely to any location, a situation that may lead to the case when neighboring voxels are mapped to a completely different location. Such displacement field applied on tissue would create unrealistic folding and tearing. In the case of the B-Spline approach, this is accomplished by the interpolation between the control nodes. In the case of demons algorithms, an optional smoothing of the current displacement field is performed after each iteration.
Independent of a particular implementation of a deformable registration algorithm, its result is, in essence, a vector field that maps voxels between two images. This displacement field can be visualized as a set of vectors positioned at regular grid points as shown in Fig. 1, with arrow direction and length proportional with vector direction and magnitude. Each point in the displacement field has associated a vector

B. Vortexes
A vector field's rate of rotation in a particular point, sometimes called vorticity, can be mathematically detected by the CURL operator. For a vector field, the operator is associated with the microscopic circulation at a point and is defined as: Technically speaking, the operator measures changes in directions between two neighboring vectors in the displacement field. If the displacement field is smooth and consistent with expected anatomical motion inside an organ, changes between neighboring vectors are minimal and CURL values are low. If the displacement field is fragmented with neighboring vectors mapping in different directions, values of the CURL operator are high.
This concept as applied in radiotherapy is illustrated visually in Fig. 1, where the operator identifies discontinuities in the displacement field on a registration between inhale and exhale of 4D datasets. The displacement field in the liver regions is highlighted in Fig. 1(b) as a vector map. The regions marked with white arrows mark displacement field inconsistencies. At arrow #1, vectors engage in a circular motion. In reality, applying such a field on the organ would produce a vortex of its voxels. At arrow #2, for example, some vectors are directed toward the liver center, while neighboring arrows point outward. In reality, applying such a field on a solid organ such as the liver would produce organ disruption.
Vortexes in the displacement field detected by the CURL operator and characterizing such unnatural behavior are shown in Fig. 1(c) as a color-coded map placed in the same coordinate system with the vector field visualization. Vortex intensity is color-coded from blue, corresponding to small intensities, to the highest intensities shown in red. Inaccuracies observed in the visual inspection of this particular solution are automatically marked by the CURL operator.

A. Phantom case
To exemplify and verify the proposed QA metrics, we constructed a synthetic phantom with known characteristics modeled after a Quasar Body phantom (Modus Medical Devices, London, ON), (34) a Lucite phantom designed for nondosimetric quality assurance. CT images of the phantom were obtained using a GE LightSpeed scanner (GE Medical Systems, Milwaukee, WI) with a 2.5 mm slice thickness. Various phantom inserts consisting of standard cylinders and cubes were segmented, and synthetic CT datasets were created by warping these segmentations with known deformations and defects, followed by conversion of segmentations to voxels of constant HU units. As depicted in Figs. 2(a) and 2(d), structures marked with 1 and 2 were expanded to mimic tumor shrinkage. The figure shows the fixed dataset superimposed on the moving dataset, where synthetic differences are marked with arrows and can be observed as gray regions. Figure 2(d) illustrates in a similar display mimicking artifacts occurring in typical clinical CBCT datasets. The defects are a crosshair-like streak artifact marked with arrow #4 and an inaccurate HU artifact marked with arrow #5. Registration of the original and modified datasets was performed using a diffeomorphic demons-type (35) algorithm and the displacement field was analyzed using the proposed metrics.
As shown in Figs. 2(c)-2(f), the CURL operator gives a better understanding of deformation field characteristics that complement the simple image-based assessment by quantifying the behavior of the displacement field. In the standard evaluation method showing target and result (Figs. 2(b) and 2(d), 2(c) and 2(f)) images are almost identical, with few visible differences, as the algorithm was able to mimic structure shrinkage. Contractions are observed in structures 1 and 2, as the deformation field tries to inflate the voxels in the small initial volume to match its larger size in the target image. At the same time, the field must preserve shape number 3, creating a compression as the deformation is stopped from expansion in this structure's vicinity. The vortex map ranges from 0 to 5.1 and is shown in Fig. 2(e), with strong hot spots in structures 4 and 5, where the induced artifacts are present. For a better understanding of these measures, the displacement field on these structures is shown as arrows in Fig. 3, with arrow direction and color according to displacement field intensity and direction. On structure 1 (Fig. 3(a)), arrows point to the center, indicating a constant field compression. For comparison, compression on structure 2 is nonuniform. For structure 3 ( Fig. 3(b)) large changes in the displacement field are observed due to the small round artifacts in its middle that must be created by the deformation field from the neighboring voxels. These discrepancies are detected and marked by the CURL operator. As is obvious in this figure, visually inspecting the deformation field using arrows to mimic displacements would be tedious, because too many vectors are present, but the proposed operators automatically detect characteristics and provide a useful tool to quantify the displacement field behavior.

B. deformable registration for adaptive radiotherapy
Integration of CBCT imaging data and dose tracking for radiation therapy has drawn the interest of many clinicians because of its compelling advantages in defining tumor volume and designing treatment plans that better spare critical structures. In this approach, information obtained from online cone-beam CT (CBCT) scans probe patient anatomy that can be used to assess the dose delivered to date and make plan adaptations, if needed. Deformable image registration is used to map daily dose distributions to a common coordinate system, typically the original treatment plan, to create a cumulative dose distribution. The approach can be used to estimate the difference between the planned and delivered doses (36) or to study the need for re-planning. (37,38) We mapped the dose to the CBCT datasets using two algorithms, a diffeomorphic demons algorithm (36) and a B-Spline (15) algorithm, both implemented in the ITK library, (39) for a case of a large tumor in the pelvis expected to change in size and shape during treatment, treated to an initial dose of 45 Gy. The diffeomorphic demons algorithm is a monomodality algorithm that implicitly assumes a structure is represented by voxels of the same intensity in both datasets. This is not necessarily valid, as various artifacts and an increased level of noise are present in the CBCT dataset due to acquisition geometry. As compared with the standard multidetector CT, images acquired on a cone-beam CT (CBCT) dataset have increased scatter incoming on the detector that implicitly produces artifacts, decreased contrast-to-noise, and inaccuracies in CT number calculations in the reconstructed images. In addition, high density materials in the field of view that highly attenuate the X-ray beam making the attenuation, values of objects behind the object are incorrectly high causing streak artifacts.
A simple visual comparison of the original CT and CBCT datasets presented in Fig. 4, upper row, illustrates a visible hot spot in the bladder and a darker region in the soft muscle. For the demons diffeomorphic algorithm, these artifacts, combined with the low parameter of the regularization term in the algorithm's mathematics, produced the result shown in Fig. 4(c), which is correct from the viewpoint of metrics because previously mentioned hot and cold spots are mimicked in the resulting transform by warping voxels from bone or air. However, this is an unrealistic solution from a clinical point of view. For comparison, a multimodality B-Spline algorithm with a few control nodes to provide strong regularization is shown in Fig. 4(d). In this solution, although the displacement field is uniform, small changes in bladder shape (marked with arrow) are not necessarily modeled. The vortex map, presented in the third row of Fig. 4 for both algorithms, identifies many vortexes with high intensity in the demons solution, ranging from 0 to 67.34, while the B-Spline solution has only a few small vortexes with values ranging from 0 to 6.86 in soft tissue and away from the target. When applying these solutions on the dose matrix, the dose mapped through the B-Spline (Fig. 4(e)) appears smooth, while the dose mapped with the demons algorithm appears jagged and unrealistic (Fig. 4(d)). The bad solution presented in Fig. 4(c) is an extreme case, easily identifiable by visual inspection; however, most solutions we encountered in clinical practice have some unrealistic warping on local voxels that are not easily identifiable by visual inspection. For most clinical solutions, the vortex map was helpful in automatically identifying regions of nonsmooth displacements. Figure 5 illustrates such a case when warping of similar magnitude is not identifiable by visual inspection only.
Similar to IMRT QA, where the gamma index is used to measure discrepancies (medical physicist decides whether high values may influence treatment based on their locations and intensities), in the proposed evaluation method, the physician decides if vortexes are clinically relevant or justified. For example in Fig. 1, vortexes describing irregular motion are justifiable on the lung-diaphragm interface, as these organs move independently, but are not clinically justifiable inside the liver. Vortexes far away from regions of clinical interest, such as tumor or critical organs, may be acceptable, as long as the dose or other measures are not computed in the affected area. Fig. 4. Evaluation of deformable registration as applied in adaptive radiotherapy. The CT dataset (a) was warped to the CBCT dataset (b) using a monomodality diffeomorphic demons algorithm (c), (left column in results section) and a multimodality B-Spline algorithm (d), (right column in results section). In (c), HU calibrations and various artifacts present in the CBCT dataset violated the monomodality assumption of the first algorithm, leading to unrealistic warping. In (d), the solution provided by algorithm 2 interpolates artifacts but is not able to model small local organ changes. The vortex map easily catches these characteristics of the displacement fields (e) and (f). When warping the dose distribution from the planning to the CBCT dataset, the distribution warped by algorithm 1 looks unnatural and unrealistic (g), while the dose distribution warped by algorithm 2 looks natural (h).

c. deformable registration for treatment assessment
In Figure 5, we show usage of the vortex map to identify errors when deformable registration is used in treatment response assessment to compare pre-and post-treatment positron emission (PET) datasets. (40,41) In this approach, the baseline PET/CT scan is acquired up to a week before the start of the chemotherapeutic or radiotherapeutic treatment. The follow-up PET/CT scan is acquired in the first half of therapy, typically after 1 to 2 weeks of chemotherapy. Treatment efficiency is then assessed by comparing changes between the two datasets. (42,43) As changes in SUV intensities are used to monitor treatment process, it is essential to discern changes induced by the treatment itself from any other nontreatment-related factors. Deformable registration is needed to account for organ displacements and posture changes.
Application of the QA procedure when deformable registration was used for treatment assessment is presented in Fig. 5. Deformable registration input included CT attenuation components of the PET scans containing clearly visible anatomical information, but registration result was further applied on the PET component for SUV comparison. Input datasets are in Figs. 5(a) and (b), with an arrow marking the original tumor location, visible in the pretreatment dataset, but significantly reduced on the aftertreatment scan. Three solutions spanning the trade-off between smoothness of the deformation field and ability to model small changes are compared for this task: a diffeomorphic demons (44) algorithm with the smoothing displacement field option turned off (left column), the same algorithm with the same option turned on (middle panel), and a B-Spline algorithm (15) (right panel) using just seven grid nodes per dimension.
Solutions obtained by the three algorithms are presented in Figs. 5(c), (d), and (e). In a standard visual evaluation comparing the result with the target (post-treatment scan), the first algorithm is highly successful in modeling changes as the two compared images are almost alike (Fig. 5(c)). The smooth version of the same algorithm obtains a good result, with a small discrepancy marked with an arrow (Fig. 5(d)), while the B-Spline algorithm provides the worst result as it is not able to match tumor changes (Fig. 5(e)).
When turning the vortex map on, the first two solutions present high-intensity vortexes near the tumor location (26. (j)). Clinically, the solution provided by the third algorithm is the worst in terms of registration metric, but in a visual inspection of the vortex field it appears as the most accurate for treatment response assessment. The first two solutions warp the tumor to the healthy esophagus wall. However, cells in the tumor are destroyed by radiation and eliminated from the patient's body. Thus the high-intensity SUV values should not be mapped to the healthy esophagus, but rather kept at their original coordinates.

IV. dIScuSSIon
We show that for adaptive radiotherapy mapping additional datasets such as dose or PET datasets, careful case-by-case inspection of algorithm accuracy in describing anatomical changes is essential to assure safe and accurate practice of radiotherapy, by identifying in a specific solution, regions where the deformation field is unrealistic. Deformation field can be scanned to find vortexes in the displacement field. These regions should match clinical expectations. Measures proposed in this work can be used in conjunction with standard visualization tools to qualitatively characterize the deformation field for a specific solution obtained on real patient images. While not sufficient by themselves in detecting deformable registration errors, the tool presented in this report helps identify suspicious locations where the displacement field may not properly describe the expected anatomical change.
In clinical practice, there isn't any circular soft-tissue rotation expected and therefore the CURL values should be ideally zero. In our examples, we noticed that clinically valid solutions have CURL values ranging from 0 to 5 mm, while nonphysical solutions had CURL values from 5 to 10, with lower values recorded in regions of smooth transitions and voided of artifacts, and values above 3 observed in regions where the algorithm struggled to match the image datasets. However, a larger study on datasets from different anatomical sites is needed to validate clinically these initial findings. In the cases presented, the operator was useful in detecting regions where the displacement field had high values as compared with other regions in the solution. These regions of high CURL values should be far away from clinically important structures, with both their number and intensity being associated with nonphysical solutions.
Previous reports show that registration errors are prone to emerge in regions with uniform image intensity and low-intensity gradient regions and are not visible with standard voxel-based methods. (45) Although it is hard to identify differences by inspecting the images directly, analysis of the deformation field through the CURL operator provides complementary information that better identifies and characterizes tissue warping. As shown in Figs. 4 and 5, a vortex map leads to a better understanding of solution properties when compared to standard voxel-based methods because the displacement field is directly evaluated.
Phantom measurements that mimic patient anatomy can be used to assess deformable registration accuracy in specific conditions, (31,(45)(46)(47)(48) but are time-consuming and not well-suited for routine evaluation, as algorithm settings are usually tailored to image quality, (49,50) levels of noise, (51) or partial volume effects, with estimations obtained on phantom studies overestimating accuracy if images of lesser quality are used as input. Furthermore, accuracy depends on parameter selection, which is closely associated with intensity gradients of the underlying image, (45) or are site-specific. For example, the number of iterations or the number of nodes in the B-Spline model have been found to be site-specific, (45) with a different number of nodes needed to accurately register the rectum or lung. For a casual user, the evaluation metric presented here provides a tool that quickly analyzes and quantifies a solution obtained under specific settings, and visually guides the user in selecting parameters that work best with the input datasets, as the vortexes map calculation takes less than one second and is easily displayed as a 3D volume or a color-wash superimposed on the CT dataset (as illustrated in Fig. 6). As a software procedure that extracts relevant clinical information from the displacement field, it is ideally suited for a daily clinical operations where the plausibility of a solution should be verified, reducing the time spent by physics staff on verifications.
This study does not endorse a particular deformable registration algorithm, but rather highlights the need for evaluating, solution-by-solution, the results of such an algorithm to gauge if the displacement map provided by a deformable algorithm is plausible from a clinical perspective. In a comprehensive publication, Kashani et al. (31) reported interalgorithm comparison at different institutions and noted that "no algorithm was uniformly accurate across all regions in a phantom", with larger errors seen in regions of significant shape changes and in areas of uniform contrast. As noted by Zhong et al., (45) performance in regions with low-intensity gradients is difficult to assess. This study highlights the need for a direct measure derived from the displacement field when assessing accuracy of deformable warping for clinical applications.
Similar to established IMRT QA procedures that verify leaf position accuracy with devices or back-up software that is independent of the treatment planning optimization algorithm, the proposed deformable registration QA approach constitutes an independent check that quantifies errors by measures obtained on a specific result, independent of its generating algorithm. The proposed measure is also useful in discarding solutions that are mathematically valid by the metric function, but produce clinically unrealistic warping.

V. concLuSIonS
There is a lack of efficient evaluation tools to aid the casual user to identify unrealistic warping within a deformable registration solution. Such a tool should be simple to use in clinical practice, preferably without the use of markers and phantoms, and should be algorithm-independent. Our proposed metric fulfills these requirements. The procedure relies on extracting displacement field characteristics that are overlaid on the original anatomy for quick identification of problematic regions. Analysis of a displacement field through the CURL operator is ideally suited to evaluate the deformable registration when accuracy of its resulting displacement field inside a structure is vital, as the operator identifies nonrealistic displacement that is invisible in standard voxel-based approaches, providing an important tool to successful implementation of adaptive radiotherapy in clinical practice. rEFErEncES Fig. 6. Hot spots in the displacement field can be easily identified using the vortex measure. In this volumetric 3D view, the vortex map is displayed color-coded in relation to patient anatomy for fast localization of doubtful regions in the displacement field.