Accuracy of 3D volumetric image registration based on CT, MR and PET/CT phantom experiments

Registration is critical for image‐based treatment planning and image‐guided treatment delivery. Although automatic registration is available, manual, visual‐based image fusion using three orthogonal planar views (3P) is always employed clinically to verify and adjust an automatic registration result. However, the 3P fusion can be time consuming, observer dependent, as well as prone to errors, owing to the incomplete 3‐dimensional (3D) volumetric image representations. It is also limited to single‐pixel precision (the screen resolution). The 3D volumetric image registration (3DVIR) technique was developed to overcome these shortcomings. This technique introduces a 4th dimension in the registration criteria beyond the image volume, offering both visual and quantitative correlation of corresponding anatomic landmarks within the two registration images, facilitating a volumetric image alignment, and minimizing potential registration errors. The 3DVIR combines image classification in real‐time to select and visualize a reliable anatomic landmark, rather than using all voxels for alignment. To determine the detection limit of the visual and quantitative 3DVIR criteria, slightly misaligned images were simulated and presented to eight clinical personnel for interpretation. Both of the criteria produce a detection limit of 0.1 mm and 0.1°. To determine the accuracy of the 3DVIR method, three imaging modalities (CT, MR and PET/CT) were used to acquire multiple phantom images with known spatial shifts. Lateral shifts were applied to these phantoms with displacement intervals of 5.0±0.1mm. The accuracy of the 3DVIR technique was determined by comparing the image shifts determined through registration to the physical shifts made experimentally. The registration accuracy, together with precision, was found to be: 0.02±0.09mm for CT/CT images, 0.03±0.07mm for MR/MR images, and 0.03±0.35mm for PET/CT images. 
This accuracy is consistent with the detection limit, suggesting an absence of detectable systematic error. This 3DVIR technique provides a superior alternative to the 3P fusion method for clinical applications.

PACS numbers: 87.57.nj, 87.57.nm, 87.57.‐N, 87.57.‐s


I. INTRODUCTION
Radiation therapy has been improved in recent years owing to technical advances, including image-based treatment planning as well as image-guided treatment delivery. (1)(2)(3)(4)(5) Multi-modality imaging techniques that have been routinely applied in radiation treatment planning (RTP) include computed tomography (CT), magnetic resonance imaging (MR), and positron emission tomography (PET). Through image registration, (6)(7)(8)(9)(10) patient anatomy and physiology can be combined and visualized, providing a comprehensive view of the therapeutic target, together with surrounding normal tissues. The addition of coaxial imaging equipment to megavoltage X-ray accelerators, including on-site cone beam CT (11,12) and Tomotherapy Imaging, (13,14) has set a new foundation for image-guided radiation therapy (IGRT) development, by providing immediate pre-treatment verification and adjustment of a patient's position, resulting in improved accuracy of conformal radiation treatment delivery. This high-precision radiation treatment delivery (RTD) minimizes normal tissue toxicity and opens a path to more aggressive fractionation schemes. It also permits a transition to frameless intra-/extra-cranial stereotactic radiation therapy, with improved patient comfort and clinical outcome. (15)(16)(17)(18) Image registration plays a key role in providing optimum alignment between the pre-treatment setup image and the planning image, (19)(20)(21)(22) minimizing deviation (patient setup uncertainty) of the RTD from the RTP.
Principal image registration techniques include intensity-based automatic registration, as well as visual-based manual registration. (6)(7)(8)(9)(10)(23)(24)(25)(26)(27)(28)(29) Automated registration techniques have been used increasingly in RTP and IGRT, (13,14,25,27,29) based on maximization of mutual information (MMI) of multi-modal images (30)(31)(32) or grayscale similarity (MGS) for single modality images. (33,34) However, an automatic registration may carry and propagate systematic errors, (34) reach a sub-optimal solution, (16) or even fail to achieve a reasonable alignment. (33) Realistically, these phenomena exist because most clinical images contain a certain degree of deformation, including motion-induced deformation and artifacts, especially in the case of PET imaging. Many deformable registration algorithms have been reported, (7,30) but they all suffer from a lengthy optimization process and are not yet applicable clinically. Simplified techniques have been reported and applied clinically, such as region-of-interest registration, (6) intensity-weighted registration, (14) and discrete rigid body approximation, (32) but manual adjustment is clinically required based on visual verification, combined with anatomical and physiological knowledge. (24)(25)(26)(27)(28) Prior to the recent development of the 3D volumetric image registration (3DVIR) method, (27) the only viable manual fusion method was based on three orthogonal planar views (3P). (6,13)(24)(25)(26)(27)(28) In addition to "fine-tuning" automatic registration results, this manual method was also used in establishing initial conditions for an automatic registration and for an independent registration. Because this 3P fusion is based on 2D visualization, it only provides partial 3D information at any given time. 
A 3D alignment is actually achieved through massive, non-visual correlation among a series of planar views in three orthogonal directions, resulting in the following major shortcomings: (1) large inter-/intra-observer variations, (26) (2) time consuming, tedious methodology, (25)(26)(27)(28)(29) (3) single-pixel precision, (24,26,27) (4) fewer reliable anatomical landmarks for functional images, (25) and (5) global registration errors. (27) Using the manual 3P fusion technique to verify and adjust the results of an automatic image registration will adversely affect the final registration accuracy, making it both observer-dependent and error prone.
The 3DVIR technique overcomes most of the shortcomings of the 3P fusion method. (27) This registration technique aligns the image using anatomic structure volumes and surfaces by employing a new dimension, namely the homogeneity of color distribution on an anatomical landmark. Additionally, the criteria associated with the 3DVIR provide instant feedback on the quality of the alignment and offer guidance for further iterations. The method presents a visual volumetric correlation of landmarks, eliminating the repetitive, tedious and observer-dependent evaluation process inherent in the 3P fusion method. Previously, the 3DVIR technique was compared with MMI-based automatic registration for cross-verification. (27) In this study, an improved 3DVIR technique and its validation against experimental data in three imaging modalities will be reported. The technical improvements have included (1) incorporation of image classification to visualize internal registration landmarks in real-time, (2) introduction of a quantitative registration criterion to further reduce observer dependency, and (3) use of decimal precision in transformation and interpolation for sub-voxel registration capability. Validation and accuracy assessment of the 3DVIR have been performed based on three phantom imaging experiments (CT, MR and PET/CT), by comparing calculated registration shifts with measured spatial shifts in the phantom position. The detection limit of the 3DVIR has been assessed visually and quantitatively using eight clinical personnel and a plot of quantitative criterion versus spatial shifts, respectively. Both accuracy and detection limit were found to be 1/10 voxel (1 voxel ~ 1 mm) for all three imaging modalities. The advantage of using the 3DVIR technique over the manual 3P fusion method and automatic MMI/MGS registration methods will be discussed in terms of accuracy and reliability, based on the ability to selectively use the most reliable, volumetric registration landmarks.

II. METHODS
The key to the 3DVIR technique, which was described previously, (27) is the registration criterion, namely the homogeneity of color distributed on a given anatomic landmark, with the registering images represented by pseudo-mono-colors, such as red (R), green (G) or blue (B). The volumetric registration flow chart is shown in Fig. 1. The volumetric image visualization is supported by a volume rendering video card for real-time performance based on the ray-casting visualization algorithm, as shown in Fig. 2. All registration operations are performed in real-time, including landmark classification and visualization, registration transformation and interpolation, calculation of the volumetric registration criteria, and viewpoint manipulation.
FIG. 1. The flow chart of 3D volumetric image registration process. The 4 volumetric image data are stored in a 32-bit voxel buffer array, which can be retrieved and manipulated in the two different processes: registration transformation and volumetric visualization. The registration process iterates until the registration criterion is satisfied visually and/or quantitatively.

A.1 RGBA lookup tables (LUTs)
A lookup table (LUT) is a map (transfer function) that associates a set of scalar values (grayscale) to color (R, G, and B) and visibility (opacity: A or Alpha) of a set of data points (voxels). A LUT is often overlaid with image histogram, facilitating the color mapping (a one-dimensional visualization technique). The opacity LUT (A) overrides the color LUTs (RGB); that is, when a voxel is transparent (A=0) the assigned color does not matter as it becomes invisible. Under the RGBA visualization format, the LUT is a sophisticated version of the "Window/Level" (W/L) control, defining which voxels are visible and what color they are. In other words, the W/L is the simplest case of a LUT with a linear Level function (increasing from zero to unity) within a grayscale Window range.
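The relationship between a Window/Level control and a LUT can be illustrated with a minimal sketch. It assumes 8-bit grayscale input and a simple linear ramp; the function names `window_level_lut` and `apply_rgba` are hypothetical and are not taken from the 3DVIR software.

```python
def window_level_lut(window_min, window_max, size=256):
    """Build a linear Window/Level table mapping grayscale values to [0, 1].

    Values below the window map to 0, values above it map to 1, and values
    inside the window ramp up linearly (the "Level" function).
    """
    if window_max <= window_min:
        raise ValueError("window_max must exceed window_min")
    span = window_max - window_min
    lut = []
    for gray in range(size):
        value = (gray - window_min) / span
        lut.append(min(1.0, max(0.0, value)))
    return lut


def apply_rgba(gray, lut_r, lut_g, lut_b, lut_a):
    """Look up the (R, G, B, A) tuple for one 8-bit voxel intensity."""
    return (lut_r[gray], lut_g[gray], lut_b[gray], lut_a[gray])


ramp = window_level_lut(100, 200)   # hypothetical window limits
zeros = [0.0] * 256
# Mono-red image: only the R table is non-zero; opacity follows the same ramp.
print(apply_rgba(150, ramp, zeros, zeros, ramp))  # (0.5, 0.0, 0.0, 0.5)
```

A voxel below the window receives A = 0 and is invisible regardless of its color entry, which is the sense in which the opacity table overrides the color tables.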
A mono-colored image can be realized using either a single LUT (such as R, with G=B=0) or identically weighted LUTs (such as white, with R=G=B). A linear color LUT(s) can be used to show the grayscale image in mono-color for stereoscopic visualization of anatomical "landscape", as shown in Fig. 3(C). It is worthwhile to note that a mild texture ("iso-elevation-contour" pattern) appears due to the unevenness of the volumetric surface caused by limited imaging resolution, as shown in Fig. 3. For the two superimposed images, slightly different colored LUTs can make the existing texture colorful (similar to a diffraction pattern) that should not be misinterpreted as an image misalignment.
Four default LUTs, which are provided by the software for each of the image volumes, can be modified by the user, based on the image histogram, as well as the visualized volume. The criterion for establishing a suitable LUT is whether the desired anatomical volume is visualized when the Alpha-LUT is adjusted. Linear RGB-LUTs can be used within the A-Window range that affects the mono-colored rendition of the grayscale image for optimal stereoscopic visualization. This anatomy-based visualization does not require a precise LUT function, as long as the registration images have a similar volume. Slight LUT differences may result in a color weighted appearance (producing base line color difference and local colored texture), but will not affect the global homogeneity of the color distribution.

A.2 Ray-casting algorithm
Ray-casting is an algorithm for image-ordered volume rendering. (36) The basic idea is to determine the pixel values in an image plane by sending an array of rays through these pixels into the scene based on the current camera settings, such as viewing angle, as shown in Fig. 2(A). For RGBA visualization format, the rays accumulate RGBA values along the way and blend them into the pixel for display, until the accumulated opacity (A) becomes unity (at which point all voxels are opaque), as shown in Fig. 2(B). The mathematical equations for the RGBA accumulation used in this study are discussed in a later section. In fact, a variety of blending functions can be applied to the ray-casting visualization, generating very different views of the same images, such as the maximum intensity projection (MIP). One major drawback for the ray-casting visualization is that it is fairly slow and it is necessary to use hardware-based volume rendering to achieve real-time performance. (37,38)
FIG. 3. Volumetric views of two identical CT phantom images with simulated spatial shifts. Top row (A to C): translational shifts (Xt) of 0.5, 0.2 and 0.0 voxels (1 voxel = 0.78 mm) were applied to the aligned image (C); Bottom row (D to F): rotational shifts (Xr) of 0.5°, 0.2° and 0.0° were applied to the aligned image (F). The color homogeneity on the "skin" landmark improves as the alignment is improved. The translational (lateral) shifts appear mostly on the left and right sides of the image volume: the larger the surface grayscale gradient and the larger the surface oblique angle (between the ray and surface normal), the larger the visual color inhomogeneity would be. The rotational (around the superior-inferior axis through the center of the image volume) shift causes non-uniform displacements in the directions perpendicular to the rotational axis: the larger the distance of the viewing voxel to the rotational axis, the bigger the rotational displacement and so the more dramatic color inhomogeneity.

A.3 Volumetric image registration operations
Volumetric image registration relies on global views of the homogeneity of color distribution within the visualized volume. It emphasizes the use of multiple viewing angles because rigid transformations affect the volumetric alignment in a systematic fashion. Any systematic change in the color distribution of the image volumes reflects their relative alignment and indicates the adjustment required to improve the homogeneity of color distribution. Distinguishing a translational misalignment from a rotational one is straightforward. For instance, a lateral translational shift causes a lateral displacement in the color inhomogeneity. In contrast, an axial rotation (passing through the center of the volume) causes color inhomogeneity that increases radially from the rotational axis (the further the voxel is from the rotational axis, the more dramatic the inhomogeneity, as shown in Fig. 3). In addition, the opposing color biases due to the rotation show not only laterally, but also in any direction perpendicular to the axis of rotation. So, viewing the volume from multiple directions should facilitate distinguishing a rotational misalignment from a translational one. The facial "landscape" plays a significant role as well in distinguishing between the two different shifts. In the extreme case, a spherical phantom would not provide any information on rotational shifts about its central axis, but a translational displacement will clearly show.
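The radial growth of a rotational misalignment can be checked with a short worked calculation. The sketch below is only a geometric illustration: it computes the chord length travelled by a point at a given distance from the rotation axis, and the radii used (0, 50 and 100 mm) are hypothetical values chosen to resemble a head-sized volume.

```python
import math


def rotational_displacement_mm(radius_mm, angle_deg):
    """Chord length moved by a point at radius_mm when rotated by angle_deg."""
    return 2.0 * radius_mm * math.sin(math.radians(angle_deg) / 2.0)


# A rigid translation moves every point equally; a rotation does not.
for radius in (0.0, 50.0, 100.0):
    print(radius, round(rotational_displacement_mm(radius, 0.1), 3))
# 0.0 0.0
# 50.0 0.087
# 100.0 0.175
```

Under these assumptions a 0.1° rotation displaces a surface point 100 mm from the axis by roughly 0.17 mm, while a point on the axis does not move at all, which is the pattern that separates a rotation from a uniform translation.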
The ability to have multiple views in real-time is the key to identifying and eliminating any systematic misalignment. For multi-modality images, it is expected that there will be some local color bias due to contrast, content and resolution differences among imaging modalities. The objective is to look for an overall color distribution inhomogeneity displaying a systematic pattern. We recommend rotational adjustments, followed by translation to superimpose the images. This process iterates until a satisfactory result is obtained.

A.4 Four concurrent image registration
The image voxel buffers are designed to permit the registration of up to four concurrent image volumes. Any of the four image volumes can be selectively turned on or off, if there is a need to focus on fewer images. Only one image can be moved (translated or rotated) at any given time. It is recommended to register any two images sequentially, followed by the cross-verification and cross-adjustment among all four images. Fundamentally, the three primary colors (RGB) provide the limit in the number of images that can be simultaneously registered with visual tracking. Practically, a tertiary color (white) can be used to represent the fourth image. An ambiguity may be introduced since white voxels can result from either perfect alignment of RGB voxels or the white image, but this can be resolved by turning on and off the white image. In our clinical research, registration of four concurrent images has been performed whenever more than two imaging modalities are involved, including this study. Previously, it was reported that registration of a set of CT, MR (T1), MR (T2) and PET images was performed in a single process for pre-treatment planning and post-treatment evaluation. (35) Not only can this single process perform registration of up to four images, but also combine registration with visual verification. The registration of four concurrent images eliminates potential error propagation if only two images are allowed for registration and multiple images have to be registered sequentially. (27)

B.1 Image classification using opacity lookup table (A-LUT)
Image classification in the 3DVIR technique is achieved through a built-in opacity (A, or Alpha) value, which is assigned to each voxel through a lookup table, together with three pseudo-color (R, G, or B) LUTs, forming the RGBA visualization format. (36) The A-LUT operation over the image histogram determines the visibility of the voxel content displayed, while R, G, and B-LUTs determine the color of the voxel. For any image point with intensity (I), the visible voxel intensity (VVI) can be obtained using the RGBA LUTs (f's) through a vector transformation (Equation 1).
In a CT image, for instance, there are two distinct interfaces with large voxel intensity differences: skin/air and bone/soft-tissue boundaries. Based on the definition of CT number, these differences are as large as half the grayscale range. Therefore, both interfaces are readily extracted using the opacity LUT, controlled in real-time by the graphical user interface (GUI). In MR and PET, the skin/air (and brain/bone) interfaces also possess significant intensity differences, in both phantom and patient images. As a matter of fact, skin is one of the few complete anatomies shown in patient PET images.

B.2 Visual amplification of the alignment of classified landmarks
The VVI is a new dimension beyond the 3D volumetric space, in which image alignment is examined. Because of the large intensity differences at the interface of a selected landmark over a voxel displacement in space, it amplifies the signal in 3D space. As discussed above, skin/air and bone/soft-tissue interfaces possess very large intensity gradients. Mathematically, this amplification is expressed by Equation (2), where dVVI is the intensity differential resulting from dD, which is a spatial displacement within a voxel (1 voxel ~ 1 mm), and f() is an amplification function.
With the introduction of the decimal precision in the transformation and interpolation of the registration, the spatial displacement of images can be a fraction of a voxel. When the image alignment is evaluated volumetrically, any systematic bias in intensity (color) at the landmark will indicate a misalignment, which serves as guidance for further alignment.
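How a sub-voxel displacement turns into a visible intensity change at a sharp interface can be sketched in one dimension. The example assumes linear interpolation and a hypothetical air/skin step edge of 200 gray levels; `shift_profile` is an illustrative helper, not the interpolation routine used in the study.

```python
def shift_profile(profile, shift):
    """Resample a 1D intensity profile shifted by a fractional number of voxels.

    Linear interpolation is used; samples that fall outside the profile are
    clamped to the nearest edge value.
    """
    n = len(profile)
    shifted = []
    for x in range(n):
        src = min(max(x - shift, 0.0), n - 1.0)
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        shifted.append(profile[lo] + frac * (profile[hi] - profile[lo]))
    return shifted


# Hypothetical air/skin step edge in 8-bit grayscale.
edge = [0, 0, 0, 200, 200, 200]
moved = shift_profile(edge, 0.1)          # displacement of 0.1 voxel
diff = [round(a - b, 1) for a, b in zip(edge, moved)]
print(diff)  # [0.0, 0.0, 0.0, 20.0, 0.0, 0.0]
```

In this toy case a 0.1 voxel shift produces a 20 gray-level difference at the interface and none elsewhere, so the steeper the edge, the larger the intensity signal produced by the same spatial displacement.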

C.1 Retrieving the visible voxel intensity on an anatomic landmark via ray-casting
To quantify the visual 3D volumetric registration criterion, the visible voxel must be retrieved in real-time via a ray-casting algorithm. (37,38) A ray in a given viewing direction was cast through the center of a pixel on an image plane, representing one or a series of voxels along the ray in a volume, as shown in Fig. 2. The RGBA values of the voxel points along each ray were accumulated, producing a visible pixel intensity. The penetration depth of the ray (or the thickness of the visible voxel layer on the landmark) was set to shallow, by using a narrow opacity window. The matrix of pixels from an array of parallel rays formed a visual image of the volume.
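Quantitatively, the following recursive functions were used for rendering the visible image using front-to-back blending (accumulation) of RGBA (colors and opacity):
$$ \mathrm{RGB}^{\mathrm{Accum}}_{i+1} = \mathrm{RGB}^{\mathrm{Accum}}_{i} + \left(1 - A^{\mathrm{Accum}}_{i}\right) A_{i}\, \mathrm{RGB}_{i}, \qquad A^{\mathrm{Accum}}_{i+1} = A^{\mathrm{Accum}}_{i} + \left(1 - A^{\mathrm{Accum}}_{i}\right) A_{i} \tag{3} $$
where i and i+1 represent current and next ray depth, respectively. Note: both accumulated opacity ($A^{\mathrm{Accum}}_{i}$) and voxel opacity ($A_{i}$) affect the volumetric visualization. When $A^{\mathrm{Accum}}_{i} < 1.0$, a voxel is invisible if its opacity ($A_{i}$) equals zero. When $A^{\mathrm{Accum}}_{i} = 1.0$, all voxels (> i) are invisible, since they do not contribute to the pixel RGB values (Equation 3).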
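The front-to-back accumulation along a single ray can be sketched as follows. This uses the standard compositing recurrence, in which each sample is weighted by its own opacity and by the transparency remaining in front of it; it is an illustration of the general technique rather than the hardware implementation used in the study, and `composite_ray` is a hypothetical name.

```python
def composite_ray(samples, opacity_cutoff=1.0):
    """Blend (r, g, b, a) samples front to back along one ray.

    Returns the accumulated (r, g, b, a) for the pixel. Traversal stops once
    the accumulated opacity reaches opacity_cutoff, because deeper voxels can
    no longer contribute to the pixel.
    """
    acc_r = acc_g = acc_b = acc_a = 0.0
    for r, g, b, a in samples:
        weight = (1.0 - acc_a) * a  # remaining transparency times voxel opacity
        acc_r += weight * r
        acc_g += weight * g
        acc_b += weight * b
        acc_a += weight
        if acc_a >= opacity_cutoff:
            break
    return (acc_r, acc_g, acc_b, acc_a)


ray = [
    (1.0, 0.0, 0.0, 0.0),  # transparent voxel (air): contributes nothing
    (1.0, 1.0, 0.0, 1.0),  # opaque yellow voxel on the landmark surface
    (0.0, 0.0, 1.0, 1.0),  # hidden behind the opaque voxel
]
print(composite_ray(ray))  # (1.0, 1.0, 0.0, 1.0)
```

With a narrow opacity window most surface voxels are close to fully opaque, so the ray terminates within the first visible layer, which is the shallow penetration depth described above.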
When registering multiple image volumes, one ray may reach a visible voxel that has contributions from more than one image volume, if they coincide at that particular voxel. Each voxel buffer contains four fields, as shown in Fig. 1, and up to four image volumes can be registered simultaneously. The RGBA-LUTs affect only the image visualization (which is useful in identifying the first layer of visible voxels) but not the voxel data stored in the voxel buffer. The uniformity of the VVI contributions at the surface of the landmark is used in the quantified registration criterion, as discussed below.

C.2 Quantitative analysis of the homogeneity of color distribution in real-time
By definition, the visual homogeneity of the color distribution on a given anatomical landmark should have minimal variance in the visible voxel intensity difference (VVID) between any two mono-colored imaging modalities. Therefore, for registered images a random color distribution (snow pattern) should be seen on the landmark; whereas a misalignment should appear to have a systematic color-biased distribution (global alignment aberration), indicative of a systematic spatial displacement.
Uniform sampling across the image plane is used for calculating the criterion, and about 4% of the pixels are sufficient to correctly identify a registration point, while retaining real-time performance. For any visible voxel (i), the VVID is:
$$ \Delta I_{i} = I_{i}^{A} - I_{i}^{B} \tag{4} $$
where $I_{i}^{A}$ and $I_{i}^{B}$ (< 256, i.e., 8 bits) are the VVI from images A and B, respectively. For all sampled voxels, the variance of the VVID can be expressed as:
$$ \mathrm{VAR} = \frac{1}{N} \sum_{i=1}^{N} \left( \Delta I_{i} - \overline{\Delta I} \right)^{2} \tag{5} $$
where $\overline{\Delta I} = \sum_{i} \Delta I_{i} / N$ represents the average of the VVID and N is the total number of pixels sampled, excluding completely transparent rays. In the case of two identical images, the variance of VVID decreases as the image alignment improves, approaching zero with a perfect alignment.
In multi-modality image registration, the average voxel intensity of an anatomical landmark can differ dramatically. Owing to high baseline differences between two given modalities, the VAR value can become insensitive to the VVID. The sensitivity is substantially improved by incorporating a weighting factor (R) (Equation 6), and Equation (5) is modified, producing an intensity-weighted variance:
$$ \mathrm{mVAR} = \frac{1}{N} \sum_{i=1}^{N} \left( \Delta I_{i}^{*} - \overline{\Delta I^{*}} \right)^{2} \tag{7} $$
where $\overline{\Delta I^{*}} = \sum_{i} \Delta I_{i}^{*} / N$ is the average of the modified VVID ($\Delta I_{i}^{*} = I_{i}^{A} / R - I_{i}^{B}$). This quantitative measure, when minimized, indicates an optimal registration, which can be independently verified by visual examination, avoiding local minima.
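The two variance criteria can be sketched numerically. The code below computes the variance of the intensity difference between two sets of sampled landmark intensities, and a weighted version in which image A is first divided by a factor R. The form of R is an assumption made for this sketch only (the ratio of the two mean intensities); the sample values and the function names are hypothetical.

```python
def vvid_variance(vvi_a, vvi_b):
    """Population variance of the visible voxel intensity difference (VVID)."""
    diffs = [a - b for a, b in zip(vvi_a, vvi_b)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)


def weighted_vvid_variance(vvi_a, vvi_b):
    """Variance of the modified VVID, I_A / R - I_B.

    ASSUMPTION: R is taken here as mean(I_A) / mean(I_B); the source does not
    spell out its definition in this section.
    """
    ratio = (sum(vvi_a) / len(vvi_a)) / (sum(vvi_b) / len(vvi_b))
    scaled_a = [a / ratio for a in vvi_a]
    return vvid_variance(scaled_a, vvi_b)


# Hypothetical sampled intensities on one landmark; image B is image A / 4.
image_a = [200, 180, 220, 190]
image_b = [50, 45, 55, 47.5]

print(vvid_variance(image_a, image_a))           # 0.0
print(vvid_variance(image_a, image_b))           # 123.046875
print(weighted_vvid_variance(image_a, image_b))  # 0.0
```

In this toy case the two images show the same pattern at different baselines: the unweighted variance is dominated by the baseline mismatch, while the weighted version returns zero, so any remaining variance would reflect misalignment rather than modality contrast.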

D. Tests for quantitative and visual detection limit of the registration criteria
Two identical CT images (red and green) with simulated rotational or translational shifts were used to evaluate the quantitative and visual detection limit of the 3DVIR criteria, using the variance analysis and eight independent observers. Superimposition of the images produced a yellow image, due to the color blending of equally weighted red and green contributions, as shown in Figs. 3(C) and 3(F). When one of the two images was slightly rotated or translated, the misalignment produced an inhomogeneous color distribution, as shown in Figs. 3(A), 3(B), 3(D) and 3(E).
For the quantitative test, incremental image shifts of 0.1° and 0.1 voxel, relative to the registration point, were simulated and used. The VAR value was calculated in different views for each of the image shifts. These values were plotted resulting in a well-shaped curve.
For the visual test, eight observers were asked to identify color biases in 12 images, which contained various shifts of 0.0, 0.1 and 0.2 units (degrees or voxels; 1 voxel = 0.78 mm) in any of the six degrees of freedom. These images were randomly presented to the observers as a slide show. A correct determination scored one point and an incorrect one scored zero. Statistical analysis of the results yielded a visual detection limit for the 3DVIR criterion.

E.1 Head phantom positioning accuracy
In all three imaging studies, the phantoms were immobilized with a head holder. Graph paper with 0.13 mm lines (1 mm grid), verified using a reference line, was taped to the scanner couches and to four sides of the phantom holder. A magnifying glass was used for the line alignment at each side. The phantom was displaced at a regular interval of 5.0±0.1 mm between scans. The positioning uncertainty was limited by the width of the gridline (< 0.13 mm). The room lasers were not employed for phantom alignment and displacement, because the width of their projected line exceeded the desired alignment accuracy. Typically, four positions with three lateral shifts were used for image acquisition. Lateral uncertainty of the couch position was negligible (within ±0.1 mm) since the couch is not movable laterally. Longitudinally, however, the couch positioning uncertainty of < ±0.5 mm (based on the manufacturer's specification) was larger than the gridline width (0.13 mm), dominating the longitudinal uncertainty. Therefore, the lateral comparisons were used for the accuracy evaluation of translational alignment. Experiments with rotational shifts were not conducted due to difficulties in accurate phantom setup using the graph paper, as well as the unavailability of an accurate shifting device for rotation. However, the evaluation of rotational registration accuracy using experimental data remains of interest and will be examined in the future.
All images were pre-processed automatically by tri-linear interpolation to have an image size of 320×320 and an isotropic voxel size (~ 1 mm, varying with modalities, see below) with an 8 bit grayscale, prior to registration. For multi-modality image registration, the image field of view was kept the same for both modalities.

E.3 MR head phantom image acquisition
A water-based head phantom with internal structures, as shown in Fig. 4(B), was scanned in a 1.5T MRI scanner (Intera, Philips Medical Systems). Four axially-scanned images were acquired with lateral shift intervals of 5.0±0.1 mm. The images were processed to exclude the geometrically-shaped external voxels outside the skull. The "brain" image, as defined by the inner surface of the skull, was extracted for registration. The original voxel size was 0.90 × 0.90 × 2.0 mm (with an original image size of 256 × 256), and the reformatted voxel size was 0.72 × 0.72 × 0.72 mm (in the final image size of 320 × 320).
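The reformatted in-plane voxel size follows from simple arithmetic if the field of view is assumed to be preserved when the matrix is resampled from 256 to 320 pixels. The helper names below are hypothetical; the numbers are those quoted for the MR phantom.

```python
def reformatted_voxel_mm(original_voxel_mm, original_size, new_size):
    """In-plane voxel size after resampling a fixed field of view to new_size pixels."""
    field_of_view_mm = original_voxel_mm * original_size
    return field_of_view_mm / new_size


def shift_in_mm(shift_voxels, voxel_mm):
    """Convert a registration shift from voxel units to millimetres."""
    return shift_voxels * voxel_mm


mr_voxel = reformatted_voxel_mm(0.90, 256, 320)
print(round(mr_voxel, 2))                    # 0.72
print(round(shift_in_mm(0.1, mr_voxel), 3))  # 0.072
```

The same conversion turns a 0.1 voxel detection limit into about 0.07 mm for these MR images.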

E.4 PET/CT head phantom image acquisition
An anthropomorphic head phantom filled with 18

E.5 Alignment of the PET/CT scanner
The PET/CT scanner alignment was determined using a solid rod phantom (⁶⁸Ga/⁶⁸Ge), 9.5 mm in diameter and 312 mm in length. The rod was placed with its axis parallel to the direction of couch motion and normal to the image plane. The rod image was arbitrarily divided into three segments, which were used as independent measures. In both PET and CT images, the identical region of interest was applied and the centers-of-mass (or centers-of-activity) were calculated for comparison. Because the rod was uniform in activity and density, the center of mass should be coincident with the center of geometry. Therefore, the difference between the two centers was indicative of the alignment quality of the combined-modality scanner.
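The center-of-mass comparison can be sketched for a single axial slice. The example assumes the region of interest has already been cropped to a small 2D grid of intensities; the two toy slices and the function name `center_of_mass` are hypothetical and stand in for the rod cross-section in CT and PET.

```python
def center_of_mass(image):
    """Intensity-weighted centroid (row, col) of a 2D image given as nested lists."""
    total = 0.0
    row_moment = 0.0
    col_moment = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            total += value
            row_moment += r * value
            col_moment += c * value
    if total == 0:
        raise ValueError("image has no signal inside the region of interest")
    return (row_moment / total, col_moment / total)


# Toy rod cross-sections: same geometry, different intensity scales.
ct_slice = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pet_slice = [[0, 0, 0, 0], [0, 2, 2, 0], [0, 2, 2, 0], [0, 0, 0, 0]]

ct_center = center_of_mass(ct_slice)
pet_center = center_of_mass(pet_slice)
offset = (pet_center[0] - ct_center[0], pet_center[1] - ct_center[1])
print(ct_center, pet_center, offset)  # (1.5, 1.5) (1.5, 1.5) (0.0, 0.0)
```

Because the centroid is normalized by total intensity, the different intensity scales of the two modalities do not affect the comparison; multiplying the offset in pixels by the voxel size gives the scanner misalignment in mm.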

E.6 Expression of the Registration Accuracy
The accuracy of image registration is stated with its precision using standard deviation. The unit of accuracy can be expressed in degrees for rotational shifts and in voxels and/or in millimeters (mm) for translational shifts. In physical space, by definition, a voxel is the smallest unit of an image volume, so its size should carry a volumetric unit, such as mm³. In image space, however, a voxel is a dot (often isotropic) with an assigned grayscale, and is used as a unit length for any image operation. For the sake of simplicity, the length of a cubic (voxel) edge in mm is frequently used to describe the voxel size. A linear scaling relationship exists between voxel and mm, such as 1.0 voxel = 0.78 mm in CT, 1.0 voxel = 0.72 mm in MR, and 1.0 voxel = 1.0 mm in PET/CT. Here, we have used either voxels and mm, or mm alone, as the unit for image transformation and registration accuracy, as well as in comparison with previously reported results. Clinically, the units of mm and degree are preferable.

III. RESULTS
Fig. 5(A) shows the quantitative measure (VAR) of color homogeneity versus rotational or translational shifts in the lateral direction with increments of 0.1° or 0.1 voxel (0.08 mm). Two identical images superimposed perfectly at the registration point, resulting in a null variance of VVID (VAR in Equation 5) and uniform color homogeneity (Fig. 3(C)). The variance increases exponentially with relative image displacement, forming a well-shaped curve. This is consistent with the color inhomogeneity increase, as shown in Fig. 3. The curve slope is steeper in the anterior view than the superior view, suggesting that the detection sensitivity is higher due to greater anatomic "landscape" details. Fig. 5(B) shows normalized curves from four, co-registered PET/CT images using the modified VVID variance (mVAR in Equation 7). 
These curves demonstrate an excellent agreement (0.05±0.09 mm or voxel) between the hardware and software registration, indicating that the mVAR provides an accurate measure of the quality of PET/CT registration. The lateral alignment of the combined PET/CT scanner was determined to be < ±0.1 mm. Again, the slope of the well-shaped curves is steeper in the anterior than in the superior view. Compared with the CT/CT registration curves in Fig. 5(A), the curves in Fig. 5(B) are much shallower, indicating a relatively low sensitivity of the "skin" voxel alignment in PET/CT images. Fig. 6 shows the anterior views of three PET/CT images with shifts of -0.5 mm, 0.0 mm and +0.5 mm, relative to the co-registration point. The accuracy is independent of imaging modalities because the registration criteria are built in the 4th dimension beyond the 3D image space.
Table 1 shows the results for determination of the visual detection limit. Under the experimental conditions, two coincident color images were correctly identified by observers as homogeneous (Fig. 3(C)) with a success rate of 94%. For images misaligned by 0.2 voxels (0.16 mm) of translation or 0.2° of rotation (as illustrated by Figs. 3(B) and 3(E)), the color inhomogeneity was identified with a 100% success rate by all eight observers. When the misalignment was reduced to 0.1° and 0.1 voxels (0.08 mm), inconsistency started to occur. However, the average success rate was still 80%. Interestingly, the two lateral shifts (δXr and δXt in Table 1) had a 100% success rate, suggesting that the results were dependent upon image orientation and could be improved by providing additional volumetric views. Therefore, the visual detection limit for identifying color inhomogeneity using skin as the volumetric landmark was determined to be 0.1° and 0.1 voxel.
Table 2 shows the result of the CT/CT phantom image registration accuracy as: 0.03±0.12 voxels (0.02±0.09 mm) in lateral (x) direction and 0.33±0.27 voxels (0.27±0.21 mm) in longitudinal (z) direction. This variation was attributed to the longitudinal uncertainty (< ±0.5 mm) in the couch positioning (movable), while the lateral uncertainty of the couch (unmovable) was negligible. Therefore, the lateral comparison provides the best measure of registration accuracy, yielding a value of ~ 0.1 mm, which is similar to the experimental accuracy of phantom positioning. As a by-product, this phantom experiment using 3DVIR can provide a quality assurance assessment of couch mechanical accuracy. Fig. 7 shows the CT/CT registration using bony landmarks, which were visualized by changing the A-LUT interactively. Using the independent bony landmark, the 3DVIR registration accuracy remains unchanged, suggesting that the internal and external landmarks are equally reliable in the phantom image registration. This finding provides flexibility in the selection of landmarks as well as the ability to cross-verify registration of rigid images. Bony anatomy can be used as a more reliable landmark when motion or deformation of soft tissue is present.

C. Accuracy of CT/CT phantom image registration
Comparison of the CT phantom translational shifts and CT/CT registration shifts. The phantom "skin" was used as the 3D volumetric image registration landmark, and the visual color homogeneity was used as the registration criterion. The registration shifts were calculated by physical distance (mm) = (voxel shift) × (voxel size), where 1.0 voxel = 0.78 mm. The registration was cross-confirmed using the "skull" as the landmark. The uncertainty of the CT couch positioning was specified by the manufacturer to be within ±0.5 mm in the longitudinal direction and within ±0.1 mm in the lateral direction. The phantom positioning had an uncertainty of ±0.1 mm, achieved by aligning reference lines of 0.13 mm in width under optical amplification.
Table 3 shows the results of the MR/MR image registration accuracy using four images shifted laterally at intervals of 5.0±0.1 mm. The accuracy was found to be 0.04±0.10 voxels (0.03±0.07 mm), using the phantom "brain" (inner skull) as the registration landmark. In MR images, the brain interface is intact since there is a natural grayscale change at the boundary with the skull, which appears as void voxels. This can serve as another internal landmark for 3DVIR, with an accuracy consistent with that of the CT/CT registration.
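The voxel-to-millimeter conversion stated above for the CT/CT results can be checked with a short script. The following is a minimal sketch, assuming the stated CT voxel size of 0.78 mm; the function name `voxel_shift_to_mm` is introduced here for illustration only and is not part of the 3DVIR software.

```python
# Voxel size stated in the text for the CT images: 1.0 voxel = 0.78 mm.
CT_VOXEL_SIZE_MM = 0.78


def voxel_shift_to_mm(voxel_shift, voxel_size_mm=CT_VOXEL_SIZE_MM):
    """Physical distance (mm) = (voxel shift) x (voxel size)."""
    return voxel_shift * voxel_size_mm


# Lateral CT/CT accuracy and precision reported in voxels (0.03 and 0.12).
lateral_accuracy_mm = voxel_shift_to_mm(0.03)
lateral_precision_mm = voxel_shift_to_mm(0.12)

print(f"{lateral_accuracy_mm:.2f} ± {lateral_precision_mm:.2f} mm")
```

Rounded to two decimals, the lateral values reproduce the 0.02±0.09 mm quoted for the CT/CT registration.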

E. Alignment of the PET/CT scanner and co-registered PET/CT images
The agreement between the center of mass for CT and PET images of the rod phantom was 0.05±0.13 mm laterally and 0.00±0.18 mm vertically. Therefore, the alignment of the combined PET/CT scanner was determined to be within ~ 0.1 mm in the image plane. Four co-registered PET/CT images were used to validate the quantitative mVAR criterion using the "skin" landmark, as shown in Figs. 5(B) and 6. The agreement between the experimental co-registration and the mVAR curve prediction was found to be within 0.1 voxel (mm). The PET "skin" volume was defined using the CT volume as reference, and the presence of the thin phantom wall did not affect the registration. Table 4 shows the registration accuracy for PET/CT images to be 0.03±0.35 voxels (mm) using visual criterion, and 0.05±0.09 voxels (mm) using the quantified (mVAR) criterion. Both results show similar accuracy, but higher precision using the quantitative criterion. Due to the low quality of the PET image, the quantitative measure produces a more reliable indication than the visual judgment.
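The center-of-mass comparison used for the scanner alignment can be illustrated as follows. This is only a sketch: how the centers of mass were actually computed is not described in this section, so a plain intensity-weighted centroid of a 2D slice is assumed, and the two small arrays are synthetic stand-ins rather than phantom data.

```python
def center_of_mass(image, pixel_size_mm=1.0):
    """Return the intensity-weighted centroid (row_mm, col_mm) of a 2D slice."""
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            total += value
            row_sum += r * value
            col_sum += c * value
    if total == 0:
        raise ValueError("image contains no signal")
    return (row_sum / total * pixel_size_mm, col_sum / total * pixel_size_mm)


# Synthetic cross-sections of a rod (hypothetical, not measured data):
# a sharp "CT" spot and a blurred "PET" spot centered on the same pixel.
ct_slice = [[0.0] * 5 for _ in range(5)]
ct_slice[2][2] = 1.0
pet_slice = [[1.0 if 1 <= r <= 3 and 1 <= c <= 3 else 0.0 for c in range(5)]
             for r in range(5)]

ct_com = center_of_mass(ct_slice)
pet_com = center_of_mass(pet_slice)
offset_mm = (pet_com[0] - ct_com[0], pet_com[1] - ct_com[1])
print(offset_mm)
```

Because the centroid is insensitive to symmetric blurring, the lower resolution of the PET image does not by itself shift the measured center.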

F. Accuracy of PET/CT, PET/PET and CT/CT phantom image registration
Using the same image sets, the accuracies of CT/CT and PET/PET registration were found to be 0.04±0.10 voxels (mm) and 0.09±0.31 voxels (mm) respectively, using the visual criterion, similar to previous results.

A. Accuracy and reliability of single and dual modality image registration
The registration results of CT/CT, MR/MR, PET/PET and PET/CT images show an excellent agreement between physical shifts in phantom position and corresponding image registration shifts, resulting in an overall accuracy of 0.1 voxel (~ 0.1 mm), independent of imaging modality. In most cases, multiple volume surfaces serve as landmarks for the 3D registration, providing a mechanism for rapid cross-verification. Using both quantitative and visual criteria, together with both superficial and internal landmarks, the volumetric registration offers a more reliable and versatile tool for clinical use.
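The comparison between physical and registration shifts can be summarized numerically as sketched below. This assumes that the reported "accuracy ± precision" corresponds to the mean and sample standard deviation of the differences between registration shifts and physical shifts; the shift values in the example are hypothetical and are not taken from Tables 2 to 4.

```python
from statistics import mean, stdev


def accuracy_and_precision(physical_mm, registered_mm):
    """Mean (accuracy) and sample standard deviation (precision) of the
    differences between registration shifts and physical shifts."""
    diffs = [reg - phys for phys, reg in zip(physical_mm, registered_mm)]
    return mean(diffs), stdev(diffs)


# Hypothetical lateral shifts at 5 mm intervals (illustrative values only).
physical = [0.0, 5.0, 10.0, 15.0]
registered = [0.0, 5.1, 9.9, 15.1]

accuracy, precision = accuracy_and_precision(physical, registered)
print(f"{accuracy:.2f} ± {precision:.2f} mm")
```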
A separate experimental assessment of rotational accuracy was not performed, owing to the difficulty of determining rotational shifts accurately and the dependence of the result upon the location of the rotational axis. Using simulated image sets with rotational shifts, a visual detection limit of 0.1° was shown. Clinically, a 0.1° rotation may cause > 0.1 mm (voxel) displacement at the surface, since the center of volume is used as the center of rotation. For instance, assuming a head with ~200 mm separation, a 0.1° rotation will result in a voxel displacement of R × sinθ = 100 × sin 0.1° = 0.17 mm at the surface. This is above the detectable limit (0.1 voxel, which is ~ 0.1 mm in this study). For larger anatomies, such as the torso, the larger radial distance from the center of volume should produce a larger spatial shift at the surface. Therefore, an accuracy of 0.1° for rotational alignment was estimated.
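The surface displacement estimate above follows directly from R × sinθ and can be reproduced in a few lines. In this sketch, the head radius of 100 mm and the 0.1 mm detection limit come from the text, while the torso radius is a hypothetical value chosen only to illustrate the trend.

```python
import math


def surface_displacement_mm(radius_mm, rotation_deg):
    """Displacement R * sin(theta) of a surface point located at distance R
    from the rotation center (here, the center of volume)."""
    return radius_mm * math.sin(math.radians(rotation_deg))


DETECTION_LIMIT_MM = 0.1  # ~0.1 voxel in this study

# Head with ~200 mm separation, i.e. R = 100 mm, rotated by 0.1 degree.
head_shift = surface_displacement_mm(100.0, 0.1)
print(f"head: {head_shift:.2f} mm, detectable: {head_shift > DETECTION_LIMIT_MM}")

# Hypothetical torso radius (not from the source), same 0.1 degree rotation.
torso_shift = surface_displacement_mm(150.0, 0.1)
print(f"torso shift exceeds head shift: {torso_shift > head_shift}")
```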
The 3DVIR criteria are built into the VVID dimension beyond 3D space, resulting in a 0.1 voxel (mm) detection limit and accuracy. This results from the amplified projection from 3D space to the VVID space (Equation 2), so that a small spatial shift can produce a large color difference. In the cases of skin and bony landmarks, the interface has high contrast (spanning half of the image grayscale), enabling detection of fractional voxel misalignment, as illustrated in Figs. 3 and 7. Note that the subtle texture ("iso-elevation contours") on the image results from the limited imaging resolution (particularly the 2 mm slice thickness); (36) such monocolored or colored local visual effects (or artifacts) can be eliminated using uniform RGB-LUT settings. This volumetric visualization knowledge is useful in distinguishing local visual artifacts (the volume surface is not sufficiently smooth, but is composed of multiple facets) from a systematic bias in color distribution (due to misalignment).

B. Applicability of the 3DVIR accuracy to clinical patient image registration
As indicated above, the accuracy of the 3DVIR technique relies upon three key factors: (1) the rigid image assumption, (2) volumetric alignment criteria and (3) visualized external and/or internal anatomical landmarks. Two major factors prevent direct translation of the accuracy from this phantom study to clinical patient image registration: (1) patient motion and (2) organ deformation. In the presence of small, random, and rigid motion, which is restricted by using an immobilization device during image acquisition, the 3DVIR can tolerate small volume increases (due to blurring) by readjusting the A-LUT to achieve a volume match between images. For patient PET/CT images, the PET skin, although of low quality, is one of the few complete anatomic landmarks that can be identified, and it can be employed as a volumetric landmark for PET/CT image registration. In the presence of organ deformation, the use of soft-tissue landmarks will likely introduce a systematic uncertainty. However, the 3DVIR technique allows registration using motion-free bony landmarks, as shown in Fig. 7. Based on this study, the registration accuracy therefore remains unchanged when stable bony landmarks are used. In general, due to the uncertainties in patient setup and patient motion, the image registration accuracy may be reduced, but it should remain roughly at the sub-voxel (sub-mm) scale. Therefore, this volumetric registration is potentially useful in IGRT patient setup with minimal motion interference, as well as for frameless intra-/extracranial stereotactic radiosurgery/radiotherapy.

C. Comparison of the 3DVIR technique with the 3P fusion method
The 3P fusion is based on three orthogonal 2D views of two image volumes at any given time. In order to derive 3D information, all slices in the three orthogonal directions must be viewed sequentially, and reviewed every time the image alignment is adjusted. Additionally, the synthesis of 3D information is dependent upon the cognitive ability of any given observer. Therefore, it is both time consuming and error prone. (24)(25)(26)(27) It is also limited to single pixel precision.
In contrast, the 3DVIR technique reconstructs and visualizes entire image volumes for the observer, who can evaluate the alignment "on-the-fly". The quantitative criterion can be employed to further minimize user dependency, especially in the fine tuning stages. More importantly, the 0.1 mm accuracy holds not only for the registration of anatomical images, but for functional images as well. The 3DVIR technique presents volumetric images in such a way that the alignment process is reduced to simply merging two objects in "virtual reality" (without perspective visualization). As reported previously, the 3DVIR takes only about one-third of the time required by the 3P fusion to achieve a registration. (27) By using an automatic registration for a pre-alignment, the performance of the 3DVIR can be further enhanced by at least a factor of three.

D. Comparison of the 3DVIR technique and MI-based automatic registration method
Most automatic (rigid) image registration methods based purely on voxel intensity use all voxels in the fused images, including those of moving organs. Therefore, this form of registration has self-imposed limitations, due to motion artifacts and deformation of soft tissues caused by respiratory, cardiac, digestive and muscular motion. Additionally, an automatic registration based on this methodology, although reproducible, may contain systematic errors. (34) Therefore, a visual-based manual fusion is always required to verify, and often required to adjust, the automatic registration results utilizing specific clinical knowledge. (13) It was reported that the 3DVIR possesses registration accuracy similar to that of the MI-based registration for cranial images, (27) so the 3DVIR can be used to evaluate and adjust the automatic result without reducing the overall registration accuracy.
Recently, efforts have been made to incorporate segmentation information into automatic image registrations, by filtering the images so as to alter their voxel weights. (14) The success rate of prostate registration was improved from 65% to 83% by eliminating high contrast voxels (air and bone) using grayscale filters. (33) Semi-automatic registration with the assistance of manually generated anatomical contours was reported to be useful. (19) The "hybrid" image registration combined with segmentation and visualization has become a trend in pursuing better image registration, especially deformable image registration. (6,39) The 3DVIR technique registers anatomical landmarks, which are extracted by image classification prior to visualization. These selectively classified landmarks can be more reliable than other voxels, especially when bony landmarks are employed, since they are rigid, well-defined, and motion-free (such as the spine). When soft tissues are rendered transparent, the organ motion and deformation can be ignored. Based on the quantified registration criterion, implementation of an automatic registration is the next logical step.

E. Future direction: a potential semi-automatic 3DVIR technique
The current 3DVIR technique provides both visual and quantitative registration criteria, which uniquely combine to minimize user dependency. In contrast, the other visual-based manual image registration methods lack such a feature and depend solely upon a user's visual judgment. (25)(26)(27)(28)(29) The real-time variance analysis provides a tool that is helpful in determining subtle differences in color homogeneity in the "fine tuning" of the registration, while the visual criterion provides both verification and visual guidance throughout the registration process. For multiple modalities, various VVI levels may affect the color baseline, but not the color distribution. This provides a statistical basis for the use of mVAR as a highly sensitive registration indicator (Equation 7). Multiple volumetric views from a variety of angles are helpful to view the global registration volumes. This quantitative criterion provides a foundation for future semi-automatic registration algorithms. The registration landmark selection and classification must be done manually, while "fine tuning" of a coarse manual 3DVIR alignment can be performed automatically.

V. CONCLUSION
The accuracy of 3D volumetric image registration of CT, MR and PET/CT images was found to be 0.1 voxel (~ 0.1 mm) and estimated to be 0.1°, based on the three phantom studies. Both superficial (skin) and internal (bone and brain) voxels were found to be suitable as volumetric landmarks. The quantitative registration criterion was found to be as effective as the visual criterion for registration, but provided a higher degree of precision. The capability of using both visual and quantitative measures makes the 3DVIR technique an effective, reliable, and accurate tool for clinical use. The intrinsic classification and visualization provided by this technique allow registration of bony landmarks, while eliminating interference from organ motion and deformation. In the future, this quantitative criterion can be employed to produce a semi-automatic 3D volumetric image registration.