A multiparametric method to assess the MIM deformable image registration algorithm

Abstract A quantitative evaluation of the performance of the deformable image registration (DIR) algorithm implemented in MIM‐Maestro was performed using multiple similarity indices. Two phantoms, capable of mimicking different anatomical bending and tumor shrinking, were built, and computed tomography (CT) studies were acquired after applying different deformations. Three different contrast levels between internal structures were artificially created by modifying the original CT values of one dataset. The DIR algorithm was applied between datasets with increasing deformations and different contrast levels, and the results were manually refined with the Reg Refine tool. The ability of the DIR algorithm to reproduce positions, volumes, and shapes of deformed structures was evaluated using the following similarity indices: landmark distances, Dice coefficients, Hausdorff distances, and maximum diameter differences between segmented structures. Similarity index values worsen with increasing bending and volume difference between reference and target image sets. Registrations between images with low contrast (40 HU) obtain scores lower than those between images with high contrast (970 HU). The Reg Refine tool generally improves the similarity index values, but the advantage is less evident for images with low contrast or when structures with large volume differences are involved. The dependence of the DIR algorithm on image deformation extent and contrast level is well characterized through the combined use of multiple similarity indices.

Physical phantoms, mimicking realistic anatomy and containing internal deformable heterogeneities, have been manufactured and proposed for the validation of several algorithms in the thorax, pelvis, and head and neck regions. [2][3][4][5][6][7] This approach is convenient but has some limitations: phantoms are not available for all the tasks to be tested, and a ground truth transformation is lacking. In contrast, synthetic or digital phantoms, created by applying displacement vector fields to deform patient images, make it possible to compare quantitatively the ground truth and the algorithm-created displacement vector fields. [8][9][10][11][12][13] This approach is attractive, even though realistic deformations are quite difficult to implement in large regions. Finally, when patient data are used, DIR performance is usually assessed by comparing anatomical marker positions [14][15][16][17][18] in deformed and original images. Even if these tests offer useful clinical information, they are inadequate to fully describe the algorithm behavior and to point out its limits.
Regardless of the validation approach followed, several similarity indices can be used to assess registration quality. The most commonly employed method consists of comparing the position of corresponding landmarks (points that can be easily recognized in all images) or the volume of corresponding structures in registered images. Each indicator provides helpful information for quantifying DIR accuracy, but only the combined use of several indicators can fully characterize the registration quality, pointing out errors in position or shape of registered structures.
Many deformable algorithms have been developed in recent years, and several studies have been proposed to highlight their strengths and weaknesses or to compare the performance of different algorithms. Several papers have been published on the performance of the MIM DIR algorithm, mostly comparing MIM results with those obtained using other algorithms. [2][7][8][11][12][13][17][18] The impact of image characteristics (such as contrast level, noise, and deformation) on registration results has been studied but, as far as we know, no study has examined these aspects separately.
In this work, we propose a multiparametric validation of the MIM-Maestro DIR algorithm and Reg Refine tool (MIM Software, Cleveland, OH) considering some typical deformations that might appear in computed tomography (CT) studies during the course of head and neck radiotherapy treatments. For this purpose, two physical phantoms simulating realistic deformations, such as neck bending and tumor shrinking, were built, and CT studies were acquired after applying different deformations. Moreover, different contrast levels were artificially created by modifying the CT values of one of the two phantom studies. Images of deformed phantoms were registered on the original data with the DIR algorithm and the Reg Refine tool. Landmark distances, Dice coefficients, Hausdorff distances, and maximum diameter differences between reference and registered images were used to quantify the ability of the algorithm to recover the reference positions and shapes of points and structures.

2.A | Registration software
The DIR module of the MIM software uses a free-form, intensity-based algorithm to carry out CT to CT DIRs (http://downloads.mimsoftware.com.s3.amazonaws.com/brochures/MIM%20Maestro%20Unlimited%20Brochure.pdf, accessed November 2018). The deformable transformation is created starting from a rigid fusion of the initial image sets, through the minimization of a cost function that takes into account the image similarity and the physical likelihood of the transformation. During the optimization process, the image similarity has a higher weight than the physical likelihood, with the risk of producing unrealistic transformations. [7] If the DIR results are not satisfactory, the user can refine the alignment using the Reg Refine tool. [11][18] In this case, boxes of adjustable dimensions are manually positioned in those regions where the alignment is not adequate, and local rigid registrations are performed inside these boxes. A new DIR is then created by combining the local registrations.
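The trade-off between image similarity and physical likelihood can be illustrated with a toy cost function. The sketch below is not the MIM cost function, which is not described in this paper. It assumes a mean squared intensity difference as the similarity term and the mean squared gradient of the displacement field as the regularity term, and the name `toy_dir_cost` and the parameter `weight` are hypothetical.

```python
import numpy as np

def toy_dir_cost(reference, warped, displacement, weight=0.1):
    """Toy DIR cost: image dissimilarity plus a weighted smoothness penalty.

    reference, warped : arrays of the same shape (warped is the moving image
                        after applying the displacement field).
    displacement      : array of shape (ndim, *reference.shape), one component
                        per image axis, in voxels.
    weight            : hypothetical trade-off factor; a small value lets the
                        image similarity dominate the optimization.
    """
    reference = np.asarray(reference, dtype=float)
    warped = np.asarray(warped, dtype=float)
    displacement = np.asarray(displacement, dtype=float)

    # Similarity term (assumption): mean squared intensity difference.
    dissimilarity = np.mean((reference - warped) ** 2)

    # Regularity term (assumption): mean squared spatial gradient of the field.
    roughness = 0.0
    for axis in range(1, displacement.ndim):
        roughness += np.mean(np.gradient(displacement, axis=axis) ** 2)

    return dissimilarity + weight * roughness
```

With a small `weight`, a rough displacement field that matches intensities well is penalized only slightly, which is the situation in which unrealistic transformations can be produced.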

2.B | Phantoms
Two phantoms were prepared to check DIR algorithm performances when bending and volume shrinking occur.
Phantom 1, developed for the bending test, is a stick made of modeling clay. Seven small glass grains (diameter, d = 2 mm) and three glass spheres (d = 1.6 cm), acting as markers and reference structures, were embedded in it. Flexing the phantom deforms the clay and displaces the rigid structures (see Fig. 1).
As the HU values of glass and modeling clay are different from those of real tissues, we remapped them with an in-house Matlab script, following a second-degree polynomial curve fitted to the HU of the background and the two new HU values.
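One possible reading of this remapping is a quadratic curve passing through three (original HU, new HU) anchor pairs. The sketch below follows that reading in Python rather than Matlab. The function name `fit_hu_remap` and all HU values are invented for illustration and are not the values used in this work.

```python
import numpy as np

def fit_hu_remap(original_hu, target_hu):
    """Fit a second-degree polynomial mapping original HU to new HU values."""
    coeffs = np.polyfit(original_hu, target_hu, deg=2)
    return np.poly1d(coeffs)

# Hypothetical anchors: the background is left unchanged, and two glass
# intensity levels are mapped to tissue-like values.
original_anchors = [100.0, 2000.0, 3000.0]
target_anchors = [100.0, 140.0, 1070.0]
remap = fit_hu_remap(original_anchors, target_anchors)

# Apply the curve voxel by voxel to a (toy) CT array.
ct = np.array([[100.0, 2000.0], [3000.0, 100.0]])
ct_remapped = remap(ct)
```

With exactly three anchors the quadratic interpolates them, so each anchor intensity is mapped onto its target value.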
Phantom 2, developed for the shrinking test, consists of a rubber membrane filled with water and connected to a syringe through a small pipe, fixed between the head and the neck of an Alderson Rando phantom. Ultrasound gel was used to fill the cavities and create a realistic mass protruding from the phantom. Volume variations were obtained by filling the rubber membrane with different quantities of water. Eight phantom defects, easily recognizable in all images, were used as markers (see Fig. 2).

2.C | Accuracy tests
A good DIR should be able to create registered images where points and structures have positions, shapes, and dimensions as similar as possible to the corresponding ones in reference images. In this work, these aspects were evaluated by measuring the distances between corresponding marker positions and by comparing the shape and dimension of corresponding, automatically segmented structures using the following similarity indices:
• the distance between centroids of corresponding markers (RM);
• the Dice similarity coefficient (DSC) between corresponding contours (sensitive to translations and volume changes);
• the Hausdorff distance (HD) between corresponding contours (sensitive to modifications of structure shape).
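The three indices listed above can be sketched for binary masks as follows. This is a minimal illustration rather than the implementation used in this work. It assumes structures are given as boolean voxel arrays with a known voxel spacing, and it computes HD on all mask voxels rather than on contour points, which can give a different value from a contour-based HD. All function names are hypothetical.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention: two empty masks are identical
    return 2.0 * np.logical_and(a, b).sum() / total

def _points_mm(mask, spacing):
    """Voxel-center coordinates of a mask, scaled to millimeters."""
    return np.argwhere(np.asarray(mask, dtype=bool)) * np.asarray(spacing, dtype=float)

def hausdorff_mm(mask_a, mask_b, spacing):
    """Symmetric Hausdorff distance between the voxel sets of two masks."""
    pa = _points_mm(mask_a, spacing)
    pb = _points_mm(mask_b, spacing)
    # Brute-force pairwise distances: suitable for small masks only.
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=-1))
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def centroid_distance_mm(mask_a, mask_b, spacing):
    """Distance between the centroids of two masks (RM or R in the text)."""
    ca = _points_mm(mask_a, spacing).mean(axis=0)
    cb = _points_mm(mask_b, spacing).mean(axis=0)
    return float(np.linalg.norm(ca - cb))

# Toy example: a 4 x 4 x 4 voxel cube shifted by two voxels along the first axis.
reference = np.zeros((10, 10, 10), dtype=bool)
registered = np.zeros((10, 10, 10), dtype=bool)
reference[2:6, 2:6, 2:6] = True
registered[4:8, 2:6, 2:6] = True
spacing = (2.0, 0.5, 0.5)  # hypothetical voxel size in mm

dsc = dice(reference, registered)
hd = hausdorff_mm(reference, registered, spacing)
rm = centroid_distance_mm(reference, registered, spacing)
```

In this toy case the shape is preserved, so the pure translation shows up in all three indices. A shape change with an unchanged centroid would affect HD and DSC but not the centroid distance, which is why the indices are used together.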
When Phantom 1 was used, the distance between the centroid of corresponding spheres (R) and the absolute value of maximum diameter difference of corresponding spheres (DD) were also evaluated for each sphere.
Sensitivities of the methods used to evaluate the similarity indices were estimated by registering an image set with itself and evaluating RM, HD, DD, R, and DSC indices for each internal structure. For each index and phantom, the associated sensitivity was estimated as the maximum value obtained. For the DSC, the maximum difference from one was considered.
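The maximum diameter difference and the sensitivity estimate described above can be sketched as follows. The sketch assumes that the maximum diameter is the largest distance between voxel centers of a binary mask, which is an assumption since the measurement procedure is not detailed here. The function names are hypothetical.

```python
import numpy as np

def max_diameter_mm(mask, spacing):
    """Largest distance between any two voxel centers of a binary mask."""
    points = np.argwhere(np.asarray(mask, dtype=bool)) * np.asarray(spacing, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())

def diameter_difference_mm(mask_ref, mask_reg, spacing):
    """Absolute difference of maximum diameters (DD in the text)."""
    return abs(max_diameter_mm(mask_ref, spacing) - max_diameter_mm(mask_reg, spacing))

def sensitivity(values, is_dice=False):
    """Sensitivity of one index from a self-registration test.

    values  : index values obtained for each internal structure.
    is_dice : for DSC, the maximum difference from one is returned.
    """
    values = np.asarray(values, dtype=float)
    if is_dice:
        return float(np.max(1.0 - values))
    return float(np.max(values))
```

The pairwise computation in `max_diameter_mm` grows quadratically with the number of voxels, so for large structures it would be applied to surface voxels only.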

2.C.1 | Variable bending test
The DIR performances with respect to different degrees of bending were studied using Phantom 1 and acquiring CT studies with a Bril- datasets. In all cases the initial rigid registration was performed aligning spheres 1 and 3 as shown in Fig. 1(b). The three glass spheres were automatically segmented in CT(0°), R1, R2, and R3 using the same threshold (50% of max) and each marker centroid was localized in reference and deformed studies. HD, DSC, DD, R, and RM were finally measured.
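A threshold segmentation at 50% of the maximum, followed by centroid localization, can be sketched as below. The sketch assumes that the threshold is applied to a region of interest containing a single sphere and that the maximum is the maximum intensity in that region. Both points are assumptions, and the function names are hypothetical.

```python
import numpy as np

def segment_fraction_of_max(roi, fraction=0.5):
    """Binary mask of voxels at or above a fraction of the ROI maximum."""
    roi = np.asarray(roi, dtype=float)
    return roi >= fraction * roi.max()

def centroid_mm(mask, spacing):
    """Centroid of a binary mask in millimeters (voxel-center coordinates)."""
    return np.argwhere(mask).mean(axis=0) * np.asarray(spacing, dtype=float)

# Toy ROI: a bright 3 x 3 x 3 block plus one dimmer voxel below the threshold.
roi = np.zeros((5, 5, 5))
roi[1:4, 1:4, 1:4] = 1000.0
roi[0, 0, 0] = 400.0
sphere_mask = segment_fraction_of_max(roi)
sphere_centroid = centroid_mm(sphere_mask, spacing=(2.0, 1.0, 1.0))
```

Using the same `fraction` for all datasets keeps the segmentation criterion identical between reference and registered studies, so differences in the resulting masks reflect the registration rather than the segmentation.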

2.C.3 | Shrinking volume test
The CT images of Phantom 2 were acquired with the same scanner (120 kV, 2 mm slice thickness, and 0.52 × 0.52 mm² pixel size). Two CT studies, CT(fill1) and CT(fill2), were performed after filling the membrane fixed to the Alderson Rando phantom with 25 and 50 ml of water, respectively. A CT study, CT(fill0), of the same portion of the Alderson Rando phantom without the external structure was also acquired (Fig. 2). CT(fill2) was registered on CT(fill1) and on CT(fill0), resulting in the reg1 and reg2 registrations and the corresponding registered studies.
Volume differences (VD) between CT(fill2) and CT(fill1) and CT(fill2) and CT(fill0) were 25 and 50 ml, respectively. Each pair of studies was initially fused by optimizing the matching of bony structures.
The phantom external contour and the mandible were segmented for 31 slices (6.2 cm) around the changing volume in references and registered studies and used to evaluate HD and DSC. RM between eight internal markers was also measured.  Table 1. For DSCs, differences from 1 are reported. These values were compared with the differences between indices coming from different registrations to assess whether they were negligible or not case by case.

3.B | Variable bending and variable contrast
The DSC values for the three spheres of Phantom 1 calculated between CT(0°), R1, R2, and R3 are reported in Table 2

3.C | Shrinking volume
In

3.D | Reg refine test
In

4 | DISCUSSION
In this work, several aspects of the MIM DIR process were separately investigated considering some typical deformations that might appear in CT studies during head and neck radiotherapy treatments.
We investigated how the accuracy of the registrations depends on the shape of the structures and on image contrast, and whether the use of the Reg Refine tool improves the registration results. For these purposes, we developed two simplified physical phantoms to simulate neck bending and tumor volume shrinking, and we measured the quality of registration results using multiple indices. Only the combined use of multiple indices makes it possible to quantify all kinds of errors in a registration process, pointing out unrecovered translations and variations in shape or volume of structures depicted in reference and registered datasets.
From our tests, we noticed that registration results get worse when objects are considerably deformed. The variable bending tests show that results worsen with a large difference in bending between reference and deformed images, as indicated by the increasing spider graph areas from R1 to R3. Corresponding structures that do not match in the initial rigid registration [such as sphere 2 in our case, Fig. 1(b)] do not maintain their shape during the deformable registration process, as demonstrated by the DSC values in Table 2. Moreover, HD and DD present a variability higher than that observed for corresponding markers (RM) and corresponding spheres (R), demonstrating that, in general, object position is better reproduced than object shape in registered images. It is worth noting that R is almost constant in all tests (variations lower than 0.5 mm) and lower than RM. In the first case, in fact, the intensity-based registration algorithm is facilitated by the high contrast between spheres and clay. More generally, registration algorithm performance worsens with decreasing image contrast, as  Table 2 and by the large spider graph areas for low-contrast images shown in Fig. 3(c). Large bending and small contrast give the worst results. This is true not only for HD and DD but also for R, which is much higher than in high-contrast images. The low contrast between spheres and clay reduces the ability of the registration algorithm to properly correct deformed objects.
By analyzing the results of the shrinking volume tests, it is possible to see that the registration quality also depends on the differences in phantom volume in the two image sets. In particular, the shape of the external contour is less accurately reproduced when larger volume differences are considered, as demonstrated by higher HD value for reg2 than reg1, visible in Table 3. On the contrary, DSC is quite insensitive to volume difference, probably due to the small relative volume changes of the external contour (maximum 50 over 700 ml).
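The low sensitivity of DSC to these volume differences can be checked with a short calculation. The sketch below assumes the worst case in which the registration recovers none of the volume change and the smaller contour lies entirely inside the larger one, so that the overlap equals the smaller volume. The 700 ml value is taken as the larger contour volume, which is an assumption based on the ratio quoted above.

```python
def nested_dice(volume_small_ml, volume_large_ml):
    """DSC of two contours when the smaller lies entirely inside the larger."""
    return 2.0 * volume_small_ml / (volume_small_ml + volume_large_ml)

# Unrecovered volume differences of 25 and 50 ml on a 700 ml contour.
dsc_25 = nested_dice(675.0, 700.0)  # 2 * 675 / 1375
dsc_50 = nested_dice(650.0, 700.0)  # 2 * 650 / 1350
```

Under these assumptions DSC stays above 0.96 even when a 50 ml change is completely missed, which leaves little room to separate reg1 from reg2. HD, which reflects the largest local contour mismatch, is not compressed in the same way.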
Finally, the Reg Refine tool generally improves registration quality. Analyzing the results case by case, we noticed that the improvement occurred in most cases, including situations where the algorithm originally performed poorly. This is demonstrated by the increased DSC values for sphere 2 when the Reg Refine tool was used (see Table 2). The advantage of Reg Refine is instead less relevant for images with low contrast, especially in correctly reproducing the shape of structures [see DD and HD of Fig. 3(d)], and for images with large volume differences (see Table 3). In this latter case, the use of blocked boxes that force the overlap of the external contour guarantees a better registration in this area but induces bony structure deformation (see Fig. 4). Moreover, the high value of mean markers' distance (

ACKNOWLEDGMENTS
The research of author S. Calusi is partially supported by "Master di primo livello per TRMIR, Specialista nell'ottimizzazione e sviluppo di apparecchiature, sequenza e tecniche di studio di Risonanza Magnetica" of Florence University.

CONFLICT OF INTEREST
The authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or nonfinancial interest in the subject matter or materials discussed in this manuscript.