Dosimetric validation and clinical implementation of two 3D dose verification systems for quality assurance in volumetric‐modulated arc therapy techniques

A pretreatment quality assurance program for volumetric techniques should include redundant calculations and measurement-based verifications. The patient-specific quality assurance process must be based on clinically relevant metrics. The aim of this study was to present the commissioning, clinical implementation, and comparison of two systems that perform a 3D redundant dose calculation. In addition, one of them is capable of reconstructing the dose on patient anatomy from measurements taken with a 2D ion chamber array. Both systems were compared in terms of reference calibration data (absolute dose, output factors, percentage depth-dose curves, and profiles). Results were in good agreement for absolute dose values (discrepancies were below 0.5%) and output factors (mean differences were below 1%). Maximum mean discrepancies were located between 10 and 20 cm of depth for PDDs (-2.7%) and in the penumbra region for profiles (mean DTA of 1.5 mm). Validation of the systems was performed by comparing point-dose measurements with values obtained by the two systems for static fields, dynamic fields from the AAPM TG-119 report, and 12 real VMAT plans for different anatomical sites (mean differences within 1.2%). Comparisons between measurements taken with a 2D ion chamber array and results obtained by both systems for real VMAT plans were also performed (mean global gamma passing rates better than 87.0% and 97.9% for the 2%/2 mm and 3%/3 mm criteria, respectively). Clinical implementation of the systems was evaluated by comparing dose-volume parameters for all TG-119 tests and real VMAT plans with TPS values (mean differences were below 1%). In addition, comparisons between dose distributions calculated by TPS and those extracted by the two systems for real VMAT plans were also performed (mean global gamma passing rates better than 86.0% and 93.0% for the 2%/2 mm and 3%/3 mm criteria, respectively). The clinical use of both systems was successfully evaluated.
PACS numbers: 87.56.Fc, 87.56.‐v, 87.55.dk, 87.55.Qr, 87.55.‐x, 07.57.Kp, 85.25.Pb

is traditionally based on manual monitor unit (MU) calculation methods for 3D conformal radiotherapy (3D CRT) treatments. (3-7) The complexity of modulated treatments requires a comprehensive quality assurance (QA) program for their implementation. (8) Such QA routines must take into account two approaches. On the one hand, an independent verification of the TPS dose calculations should be carried out. One way to fulfill this requirement is to apply Monte Carlo calculations for the independent verification of the treatment plan. (9) The main limitation of these techniques is the calculation time. Other solutions are based on simpler algorithms, (10) like modified Clarkson methods (11) and extensions with the inclusion of head scatter. (12) On the other hand, a measurement-based pretreatment QA process must be considered to ensure the correct information flow from TPS plan calculation to treatment delivery in the linac by means of the record and verify system (R&V). The usual method to perform this QA consists of comparing dose distribution measurements acquired with phantoms/detectors of regular geometries with TPS calculations made under the same conditions. (13-15) Solutions developed specifically for volumetric treatments have also been introduced. (16,17) Dose distribution comparisons tend to involve gamma index-based analyses. (18) Several studies have reported tolerances and action levels for IMRT treatment verifications (14,19-22) by means of the previous methods.
Current clinical research on verification and QA in IMRT treatment delivery, however, has raised a fundamental issue. Commercial solutions for redundant verifications in modulated treatments have usually assumed simple situations, like homogeneous geometries or single-point calculations, (14,23,24) which yield results with no clinical relevance. Likewise, the results derived from the usual individualized pretreatment QA tools have not been related to clinically relevant dosimetric errors in patient dose delivery. (25,26) The results of the measurements and analyses performed in pretreatment IMRT QA must be suitably correlated with the implications of possible mistakes during TPS calculations and real treatment delivery on the basis of new clinically relevant metrics. The basis for these metrics must be the patient dose estimated from typical QA measurements. If the dose on the patient CT could be reconstructed from measurements, then clinically relevant parameters, such as dose-volume histograms (DVH), could be extracted. In addition, redundant calculations must be considered in the same scope. (25) Recently, new systems that allow the acceptance criteria for modulated treatments to be set from DVH-based metrics have been introduced. (27) These solutions are even more necessary in VMAT QA, where the synchronization of all variable parameters makes treatment delivery more complex than in traditional IMRT techniques. Two-dimensional (2D) ion chamber arrays, together with suitable accessories, are adequate tools to extract as much information as possible from dynamic treatments. (28) This paper presents the commissioning, comparison, and clinical implementation of two systems that perform 3D redundant dose calculations for VMAT secondary verifications. In addition, the second one is capable of reconstructing the dose on patient anatomy from measurements taken with 2D ion chamber arrays.

A. Treatment unit and TPS
VMAT treatments were delivered in our institution with a 6 MV Synergy (Elekta, Stockholm, Sweden) machine. Plans were generated with Monaco 3.1 (Elekta).

B. Mobius3D system description
Mobius3D software (Mobius Medical Systems, Houston, TX) provides an independent dose calculation engine aimed at the verification of treatments generated by TPS. DICOM treatment plan data (CT images, RT Plan, RT Struct, and RT Dose) are needed as initial information. Mobius3D utilizes stock reference values for common linear accelerators to model beams. Users can choose these average models or fit the usual parameters, such as percentage depth dose curves (PDDs), output factors (OFs), and off-axis ratios (OARs), to scale the model correctly. In order to model the fluence, the system starts from a uniform map, adding layers of specific features for each linac (for instance, MLC characteristics and transmission or flattening filter properties). The software uses a collapsed cone convolution/superposition algorithm (29-31) independently developed and updated from its original conception. (32-35) The algorithm is accelerated through graphics processing units (GPUs). A set of 144 isotropically spaced cones is used for each calculation point. Point dose kernels have been obtained with some refinements, (36,37) compared with the original study by Mackie et al. (30) GPU-based calculations increase the calculation speed significantly compared with CPUs.

C. COMPASS system description
COMPASS (v. 3.0) (IBA Dosimetry, Schwarzenbruck, Germany) consists of two different elements: a detector device and calculation, reconstruction, and analysis software. The underlying idea is to reconstruct the dose on patient CT from measurements taken with the associated detector. In addition, it provides an independent dose calculation engine that ensures a redundant verification of the TPS treatment, as does the Mobius3D system. Each element is described in detail below.

C.1 Detector device
The detector system is a 2D ion chamber array (MatriXX Evolution, IBA Dosimetry). It consists of 1020 ion chambers of 0.08 cm³ each, covering an active area of 24.4 × 24.4 cm² (the distance between them is 7.619 mm). The versatility of the device is well known both for QA of treatment units and for IMRT and VMAT verification. (38,39) As the detector element in the COMPASS system, it must be attached to the treatment unit head with a holder in order to ensure a rigid rotation of the device with the gantry. A buildup layer of 2.5 cm can be placed on the device, inside the holder. With this arrangement (Fig. 1), the source-to-detector distance is 100 cm.

The measurements taken under reference conditions in water with those performed in plastic. The isocenter of the test plans was matched with the phantom center. The measurement point was selected inside the PTV in all cases.

D.3 Validation with real VMAT patient plans
In order to evaluate different types of PTVs and locations, VMAT plans for four anatomical sites were generated with the TPS: head and neck (two treatments), thoracic (two lung treatments), abdominal (two gastric treatments) and pelvic (six prostate treatments, taking two from each usual staging: high-, intermediate-, and low-risk). Representative point-dose values obtained by the two systems were compared with ion chamber measurements (CC04) performed on the EasyCube phantom with the same arrangement. Likewise, planar dose distributions measured with MatriXX were compared with those generated by Mobius3D and COMPASS, with the same experimental setup, by means of gamma analysis (2%/2 mm and 3%/3 mm, global normalization to maximum with a low-dose threshold at 10% of global maximum). It consisted of the detector array inserted in a homogeneous cubic phantom (MultiCube) (Fig. 1). The phantom was stationary on the linac couch while treatment was dynamically delivered on it. The thickness of both the anterior and backscatter buildup layers was 11 cm. Detector setup robustness allows different arrangements in order to perform coronal and sagittal measurements. MatriXX dose measurements were dependent on the angle of the beam. Angular correction factors must be incorporated to solve this dependency. (39) Angular information could be collected with the COMPASS angle sensor previously described. OmniPro I'mRT (IBA Dosimetry America, Inc.) was used as analysis software.
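The global gamma analysis described above can be illustrated with a minimal one-dimensional sketch. It is not the OmniPro I'mRT implementation: the function name, the discrete search without interpolation, and the sample profile are all assumptions made for illustration. The dose tolerance is taken as a percentage of the global maximum of the measured profile, and points below the low-dose threshold are excluded from scoring.

```python
import math

def gamma_pass_rate(positions_mm, measured, calculated,
                    dose_tol_pct=3.0, dist_tol_mm=3.0, threshold_pct=10.0):
    """Global 1D gamma passing rate (%) of `calculated` against `measured`.

    Hypothetical helper: discrete search over the calculated points only,
    with no interpolation between them.
    """
    global_max = max(measured)
    dose_tol = dose_tol_pct / 100.0 * global_max   # global normalization to maximum
    cutoff = threshold_pct / 100.0 * global_max    # low-dose threshold
    evaluated = 0
    passed = 0
    for x_m, d_m in zip(positions_mm, measured):
        if d_m < cutoff:
            continue  # points below the low-dose threshold are not scored
        # Gamma is the minimum combined distance/dose metric over all calculated points.
        gamma = min(
            math.sqrt(((x_c - x_m) / dist_tol_mm) ** 2
                      + ((d_c - d_m) / dose_tol) ** 2)
            for x_c, d_c in zip(positions_mm, calculated)
        )
        evaluated += 1
        if gamma <= 1.0:
            passed += 1
    return 100.0 * passed / evaluated if evaluated else float("nan")

# Illustrative profile on a 1 mm grid (values are made up, not study data).
positions = [float(i) for i in range(11)]
measured = [0, 10, 50, 100, 100, 100, 100, 100, 50, 10, 0]
calculated = [d * 1.01 for d in measured]
rate_3_3 = gamma_pass_rate(positions, measured, calculated)            # 3%/3 mm
rate_2_2 = gamma_pass_rate(positions, measured, calculated, 2.0, 2.0)  # 2%/2 mm
```

A clinical implementation would work on 2D planes or 3D volumes and interpolate the calculated distribution; the sketch only shows how the two tolerances and the threshold enter the metric.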

E.1 Clinical implementation tests with static square fields and dynamic TG-119 test plans
Previous static regular fields were calculated with TPS over TG-119 test cases in order to test DVH comparison modules. The dose received by TG-119 test plan structures was determined by the two systems for the previously described regular and VMAT plans. Relevant dosimetric parameters, according to the TG-119 report, were extracted and compared with TPS values for each test and structure.

E.2 Clinical implementation with real VMAT patient plans
Previous VMAT plans for each anatomical site were compared with TPS values using the clinical metrics previously defined. The process was carried out by evaluating representative dosimetric parameters from DVHs. ICRU recommendations for recording and reporting IMRT treatments (ICRU Report 83) (41) were used to extract evaluation parameters for PTVs (D98, D2, D50, and Dmean). Maximum and mean doses were obtained for ORs. In addition, classical (42) and recently reviewed dose constraints (QUANTEC) (43) were reported for normal tissue. Comparisons with TPS by means of global gamma passing rates for all structures were reported with two criteria (2%/2 mm and 3%/3 mm, global normalization to maximum, with a low-dose threshold at 10% of global maximum). The COMPASS system is capable of reporting local gamma 3D analysis, in contrast to Mobius3D. Local 2%/2 mm gamma passing rates were also reported for COMPASS dose calculation and reconstruction.
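As a rough illustration of how the PTV parameters above relate to a DVH, the following sketch extracts D98, D50, D2, and the mean dose from a list of voxel doses. It assumes equal voxel volumes and takes D_x as the minimum dose received by the hottest x% of the structure; the function names and the sample data are hypothetical and do not correspond to any vendor's DVH module.

```python
import math

def dose_at_volume(voxel_doses, volume_pct):
    """D_x: minimum dose received by the hottest x% of the structure volume."""
    ordered = sorted(voxel_doses, reverse=True)  # hottest voxel first
    # Number of voxels that make up x% of the volume (equal voxel volumes assumed).
    n = max(1, math.ceil(volume_pct * len(ordered) / 100.0))
    return ordered[n - 1]

def ptv_metrics(voxel_doses):
    """Near-minimum (D98), median (D50), near-maximum (D2), and mean dose."""
    return {
        "D98": dose_at_volume(voxel_doses, 98),
        "D50": dose_at_volume(voxel_doses, 50),
        "D2": dose_at_volume(voxel_doses, 2),
        "Dmean": sum(voxel_doses) / len(voxel_doses),
    }

# Made-up voxel doses (arbitrary units), for illustration only.
metrics = ptv_metrics(list(range(1, 101)))
```

Comparing such values between the TPS and an independent system, structure by structure, is the kind of DVH-based check the section describes.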

F. Remarks about TPS dose calculation for plan verification
The collapsed cone algorithms of Mobius3D and COMPASS are based on dose engines that perform and report calculations in terms of the absorbed dose to water (Dw). In order to take patient heterogeneities into account properly, media are considered as water with different electron densities. All TPS calculations presented in this study were performed using Monaco 3.1 (Elekta), with a Monte Carlo calculation algorithm, working in terms of the absorbed dose to medium (Dm). However, clinical implementation of Monte Carlo algorithms can lead to significant discrepancies between Dw and Dm. (44-46) The AAPM Task Group 105 report (45) gives recommendations about the conversion of Dm to Dw and its reporting. For this reason, the DVH-based comparisons were performed with the same criterion (Dw). All plans described in the present study were initially planned in terms of Dm and recalculated in terms of Dw. (Monaco has implemented the two features.)
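For orientation, the conversion between the two quantities is commonly written with Bragg-Gray cavity theory, as sketched below. The notation is an assumption for illustration; the text does not state how Monaco implements the conversion internally.

```latex
D_w = D_m \left( \frac{\bar{S}}{\rho} \right)^{w}_{m}
```

Here $(\bar{S}/\rho)^{w}_{m}$ denotes the water-to-medium ratio of unrestricted mass collision stopping powers, averaged over the electron energy spectrum at the point of interest.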

G. Statistical analysis
Results were described as mean ± standard deviation (SD). Data were compared using a paired, two-tailed Student's t-test. Differences were considered statistically significant for p-values < 0.05.

Table 1. Discrepancies were below 0.5%.
Mean differences and gamma passing rates obtained with two different criteria for PDDs are shown in Table 2. Differences increased with depth, and they were statistically significant for comparisons between M3D and COMPASS in the region between the maximum and a depth of 20 cm (maximum mean difference of -2.7% ± 0.2% for M3D). Passing rates were better for COMPASS than those from M3D for both gamma criteria (p-values of 0.03 and 0.02 for both CC and CR and for the 2%/2 mm and 3%/3 mm criteria, respectively).
Profile comparisons are shown in Table 3, computing mean differences (for outside and inside the field regions), mean DTA values (for penumbra regions), and gamma passing rates with two different criteria. Maximum discrepancies for in-plane and cross-plane sections were found outside (mean value of 1.4% ± 1.8%) and inside the treatment field (mean value of -0.9% ± 2.2%) for the M3D results. Maximum DTA values were found for M3D for both in-plane (mean value of 0.9 mm ± 0.6 mm for 3 × 3 cm² field) and cross-plane (1.5 mm ± 0.2 mm for 20 × 20 cm² field) sections. For in-plane profiles, COMPASS results were better than M3D results outside the beam region (p < 0.01 for both CC and CR comparisons). In addition, CC results were better than CR results outside and inside the field (p = 0.01 and p = 0.02). In the penumbra region of the in-plane sections, COMPASS results were better than M3D results (p < 0.01 for CC and p = 0.01 for CR comparisons). For cross-plane profiles, CC results were better than M3D results outside the beam (p = 0.03). Furthermore, CC results were better than CR results in the beam region (p = 0.01). In the penumbra region of the cross-plane sections, COMPASS results were also better than M3D results (p = 0.01 for both CC and CR comparisons). The remaining differences were not statistically significant, including gamma passing rates for profile comparisons.

Table 3. Comparisons for profiles with reference measurements taken with water tank, for Mobius3D, COMPASS dose calculation, and reconstruction. Profiles were divided into three regions (outside the field, penumbra, and inside the field) for both in-plane and cross-plane sections. Mean differences were extracted for the regions outside and inside the field. Mean distance-to-agreement (DTA) values were extracted for penumbra regions. In addition, gamma passing rates (local normalization with no low-dose threshold) were extracted with two different criteria (2%/2 mm and 3%/3 mm).


A.2 Validation tests with static square fields and dynamic TG-119 test plans
Differences between point-dose values obtained by both systems and measured with ion chamber are shown in Fig. 2. Mean discrepancies were 0.9% ± 1.3%, 0.5% ± 0.8%, and 0.9% ± 0.7% for M3D, CC, and CR, respectively. Comparisons did not show statistical relevance.

A.3 Validation with real VMAT patient plans
Point-dose measurements taken for each treatment and comparisons with M3D and COMPASS results are shown in Fig. 2. Mean discrepancies were 0.1% ± 1.0%, 0.5% ± 1.2%, and 1.2% ± 0.9% for M3D, CC, and CR, respectively. The best results were found for M3D (p < 0.01 for comparisons with CR). There was no statistically significant difference for other comparisons. Mean gamma passing rates for coronal and sagittal dose planes measured with the MatriXX+MultiCube set compared with those extracted from both systems are shown in Table 4. Mean values were better than 87.0% and 97.9% for the 2%/2 mm and 3%/3 mm criteria, respectively. Differences in mean gamma passing rates between both systems were not statistically significant (lower p-value was 0.06 for comparison between M3D and CR in the coronal plane with the 3%/3 mm criterion).

B.1 Clinical implementation tests with static square fields and dynamic TG-119 test plans
Differences between TPS, dose calculation, and reconstruction for dosimetric parameters analyzed for each structure set are shown in Table 5. Larger differences were found for high-dose regions (D99 in H&N, D99 in superior and inferior volumes for MultiTarget) and for parotid glands in H&N. M3D results were better than COMPASS results for D99 and D10 in MultiTarget inferior volume (p = 0.04 in both cases) and for H&N cord volume (p = 0.02). CC results were better than M3D results for D10 in MultiTarget superior volume (p = 0.02) and right parotid gland (p = 0.05) and better than CR results for D20 in H&N (p < 0.01). The remaining differences were not statistically significant. The best and the worst mean values for all parameters were observed for CC and CR, respectively. Comparisons of mean values for all TG-119 parameters did not show statistical relevance.

B.2 Clinical implementation with real VMAT patient plans
Values for the dosimetric parameters previously described are shown in Table 6. For all the parameters, mean values were 0.0% ± 2.3%, 0.6% ± 1.1%, and -0.0% ± 1.6% for M3D, CC, and CR, respectively. Difference was statistically significant for comparisons between both COMPASS results (p = 0.01). Differences between M3D and COMPASS were not statistically significant. Mean gamma passing rates for all structures are shown in Table 7 for three different criteria. For the local gamma tolerance, lower mean values were found for CR applied to H&N and lung treatments. For the 2%/2 mm global gamma tolerance, lower mean values were found for M3D and CR applied to gastric and high-risk prostate treatments, respectively. CC results were better than M3D results for the 2%/2 mm global gamma criterion and better than CR results for both the 2%/2 mm and 3%/3 mm global gamma criteria (p < 0.01 in all cases). The remaining differences were not statistically significant.
Gamma passing rates for the entire anatomical volume were extracted for the 12 real plans, with the previous local and global criteria. Results are shown in Table 8. Low passing rates were observed when analyzing the total volume. In order to clarify these values, 3D gamma distributions for a CC lung case and a CR high-risk prostate case are shown in Figs. 3 and 4, respectively, with gamma criteria ranging from 2%/2 mm local to 3%/3 mm global tolerance. The 2%/2 mm local gamma test highlighted gamma failing points, mainly located in low-dose regions, where a local test could be more sensitive. The gamma passing rates observed for ORs were lower than those observed for PTVs for local gamma tests due to this effect. Relaxing gamma criteria from local to global 2%/2 mm tolerance resulted in a reduction of failing points in low-dose regions, showing problematic areas in PTVs and surrounding regions (total passing rates with the 2%/2 mm global gamma criterion were better than 92%, excluding the second gastric treatment for M3D [81.8%]). The last step to 3%/3 mm led to passing rates better than 97.6% in all cases.
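The greater sensitivity of the local test in low-dose regions follows directly from how the dose tolerance is normalized. The short sketch below makes this explicit; the function name and the numerical values are illustrative assumptions, not data from this study.

```python
def dose_tolerance(reference_dose, global_max, tol_pct, local):
    """Absolute dose tolerance for one point under local or global normalization."""
    norm = reference_dose if local else global_max
    return tol_pct / 100.0 * norm

# Hypothetical point receiving 6 Gy in a plan whose maximum dose is 60 Gy.
global_tol = dose_tolerance(6.0, 60.0, 2.0, local=False)  # 2% of the global maximum
local_tol = dose_tolerance(6.0, 60.0, 2.0, local=True)    # 2% of the point dose
```

With these numbers the global tolerance is ten times larger than the local one, so a dose deviation that is accepted in the global test at a low-dose point can fail the local test (ignoring the distance criterion).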

A.1 MatriXX spatial resolution
The geometrical resolution of 2D detector arrays is limited by the size of each single detector. Strictly, Mobius3D- and COMPASS-calculated values in planar analysis should be convolved with the detector response function and then compared to MatriXX measurements. (47) This problem, described as a limitation of the present study, is not observed in the measurement-based dose reconstruction performed by COMPASS because the system inherently applies this correction.
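The convolution step mentioned above can be sketched in one dimension as follows. The rectangular (boxcar) kernel is a placeholder assumption, since the actual response function of the MatriXX chambers is not given here; the function name and the edge handling (clamping to the end values) are also illustrative choices, and the kernel is assumed to be symmetric with an odd number of samples.

```python
def convolve_with_detector(profile, kernel):
    """Blur a calculated 1D dose profile with a normalized detector response kernel."""
    total = sum(kernel)
    weights = [k / total for k in kernel]  # normalize so flat regions are preserved
    half = len(weights) // 2
    blurred = []
    for i in range(len(profile)):
        acc = 0.0
        for j, w in enumerate(weights):
            # Clamp indices at the profile edges instead of padding with zeros.
            idx = min(max(i + j - half, 0), len(profile) - 1)
            acc += w * profile[idx]
        blurred.append(acc)
    return blurred

# Ideal step edge blurred by a three-sample boxcar (illustrative values).
calculated = [0.0] * 5 + [100.0] * 5
detector_like = convolve_with_detector(calculated, [1.0, 1.0, 1.0])
```

The blurred profile, rather than the raw calculation, is what would strictly be compared with the array measurement in steep-gradient regions such as the penumbra.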

A.2 Gamma passing rate metric in this study: application of tighter tolerances in gamma analysis
A 3%/3 mm gamma passing rate metric is commonly used in QA tasks. (21) However, common metrics may reduce the sensitivity of systems involved in patient-specific QA processes. (15,25,26,48-50) A recent study by Nelms et al. (50) suggested performing a more stringent gamma analysis, restricting traditional tolerances. Validation of both systems, comparing planar dose distributions by means of gamma analysis, involved 2%/2 mm and 3%/3 mm global gamma criteria. Clinical implementation of the two systems made use of 2%/2 mm and 3%/3 mm global tolerances in volumetric dose comparisons and introduced a 2%/2 mm local gamma test for COMPASS results. The implementation of previous tolerances (local/global) in the analysis of each case (planar/volumetric) was a limitation of this study. OmniPro I'mRT was used to perform 2D gamma analysis (validation of the systems). Mobius3D was used to perform 3D gamma analysis (clinical implementation of the systems). Global gamma normalization is the only tolerance available in the previous solutions. COMPASS, however, is able to perform local and global gamma analysis in volumetric comparisons.

A.3 Remarks about evaluation and comparison of Mobius3D and COMPASS
Several authors have presented commissioning studies for 3D pretreatment verification systems. (27,51-53) A comparison of Mobius3D with other solutions has not previously been evaluated in the literature. TG-119-based comparisons are powerful tools to evaluate the performance of IMRT and VMAT TPSs (54) and can also be implemented to evaluate DVH-based QA systems.
Validation of the systems was performed by means of comparisons with measurements taken with external devices and usual metrics (point-dose comparisons and planar gamma analysis). Clinical implementation was also performed in terms of TG-119 and real plan comparisons with usual (planar and volumetric gamma analysis) and DVH-based metrics.
An additional limitation of this study should be noted. Mobius3D has an independent tool to predict dose on patient anatomy, called MobiusFX (Mobius Medical Systems). This software reconstructs the dose from delivery (log-file) information. At the time of the present study, this tool was not available in our institution. Future work should focus on validation and clinical implementation of the MobiusFX tool, as a counterpart of the COMPASS dose reconstruction scheme.

B.1 Comparison with reference calibration
Absolute dose values obtained with M3D and COMPASS were comparable with those from previous studies. (27,51,55) Mean differences for M3D at 10-20 cm of depth were -2.7% ± 0.2%, compared with -1.3% ± 0.2% for CC and -1.4% ± 0.2% for CR. The mean gamma passing rate for M3D (2%/2 mm, local normalization) was 68% ± 28%. These discrepancies can probably be reduced by adjusting the reference data in M3D. The largest differences observed in profile comparisons were located in the penumbra region. The steep dose gradient present in this area contributes to increasing the discrepancies, as can be observed from gamma passing rates (mean values were better than 98% for the 2%/2 mm criterion with local normalization). However, profile comparisons resulted in good agreement between both systems and reference data. In conclusion, CC results were better than M3D and CR results in most situations. Passing rates were above the TG-119 action level of 90% for individual field dose gamma analysis (21) for PDDs and profiles, excluding some fields evaluated with the 2%/2 mm tolerance (M3D PDDs for 2 × 2, 3 × 3, 4 × 4, and 10 × 10 cm² fields).

B.2 Validation tests with static square fields and dynamic TG-119 test plans
Mean discrepancies were better than 1.0%. Results for static fields were comparable to those obtained in the previous section. For dynamic TG-119 plans, values were comparable to those extracted with real VMAT plans.

B.3 Validation with real VMAT patient plans
Results for comparisons with ion chamber measurements were comparable to or better than those found in the literature. (27,51,55) Values for M3D were slightly better than those from CC and CR in point-dose comparisons. Planar dose comparisons led to similar results for the two systems, and they were better for sagittal dose planes. Passing rates were above the TG-119 action level of 88% for composite dose gamma analysis, (21) excluding mean values of coronal planes for CR evaluated with the 2%/2 mm tolerance.

C.1 Clinical implementation tests with static square fields and dynamic TG-119 test plans
Results for TG-119 test suite comparisons were consistent with previous studies. Dose-volume discrepancies were also comparable with actual clinical plans validated by other authors. (27,50,55) The observed differences could be reduced by improving the beam model for the Mobius3D and COMPASS systems. In the dose reconstruction process, the spatial resolution of the MatriXX detector could be a problem for detecting hot/cold spots in highly modulated fields and might contribute to obtaining worse results.

C.2 Clinical implementation with real VMAT patient plans
Dose-volume comparisons were comparable with other studies. (27,50,55) Differences between both COMPASS results were significant, but M3D results compared with those from the COMPASS system led to p-values higher than 0.05. These discrepancies could be improved with the same solutions reported in the previous sections. These results and those described in the previous section were in contrast with statistical analysis of relative dose distributions performed in the first section. Results for both systems were comparable in terms of dose-volume parameters. Gamma passing rates with global normalization were above the TG-119 action level of 88% for composite dose gamma analysis. (21)

V. CONCLUSIONS
The Mobius3D and COMPASS systems have been tested with quality assurance, static, and dynamic plans, resulting in good agreement with validation measurements. In addition, tests performed to evaluate the clinical implementation of both systems by means of comparisons with TPS calculations are in good agreement, according to dosimetric benchmarks. The two systems can be clinically implemented with no significant differences between them.