Investigation of error detection capabilities of phantom, EPID and MLC log file based IMRT QA methods

Abstract A patient-specific quality assurance (QA) program should detect errors that originate anywhere in the treatment planning process. However, the increasing complexity of treatment plans has increased the need for improvements in the accuracy of the patient-specific pretreatment verification process. This has led to the utilization of higher resolution QA methods such as the electronic portal imaging device (EPID) as well as MLC log files, and it is important to know the types of errors that can be detected with these methods. In this study, we compare the ability of three QA methods (Delta4®, MU-EPID, Dynalog QA) to detect specific errors. Multileaf collimator (MLC), gantry angle, and dose errors were introduced into five volumetric modulated arc therapy (VMAT) plans for a total of 30 error-containing plans. The original plans (without errors) were measured five times with each method to set a threshold for detectability using two standard deviations from the mean and receiver operating characteristic (ROC) derived limits. Gamma passing percentages as well as percentage dose error to the planning target volume (PTV) were used for passing determination. When applying the standard 95% pass rate at 3%/3 mm gamma analysis, errors were detected at rates of 47%, 70%, and 27% for the Delta4, MU-EPID, and Dynalog QA, respectively. When using thresholds set at two standard deviations from our baseline measurements, errors were detected at rates of 60%, 30%, and 47%, respectively. When using ROC-derived thresholds, errors were detected at rates of 60%, 27%, and 47%, respectively. When using dose to the PTV and the Dynalog method, 11 of the 15 small MLC errors were detected, while none were caught using gamma analysis. A combination of the EPID and Dynalog QA methods (scaling Dynalog doses using EPID images) matches the detection capabilities of the Delta4 while adding comparison metrics. These additional metrics are vital in relating the QA measurement to the dose received by the patient, which is ultimately what is being confirmed.


| INTRODUCTION
Pretreatment quality assurance (QA) of an intensity modulated radiotherapy (IMRT) plan is essential in preventing errors from propagating throughout the course of treatment. The rate and frequency of potential errors have increased as treatments shifted from 3D conformal to IMRT and volumetric modulated arc therapy (VMAT).
Numerous studies have shown the need for QA methods to be appropriate for the treatment technology used.1 Therefore, awareness of the error detection capabilities of an IMRT QA system used in a clinical environment is important for optimum delivery of complex treatments.
Two-dimensional (2D) diode arrays are commonly used for patient-specific IMRT QA. Dose distributions are first calculated on the phantom geometry in the treatment planning system (TPS) and then delivered and measured at the treatment machine. A very common method to quantitatively compare measured and calculated dose distributions is the gamma index introduced by Low et al.2 This approach represents the minimum multidimensional distance between the measurement and calculation points, compared against acceptance criteria for distance-to-agreement (DTA) and percentage dose difference (%DD). The effectiveness of the gamma index tool for IMRT QA has been investigated3-9 with special emphasis on the sensitivity of different gamma criteria to positioning errors.5 The results of these investigations have shown that the gamma index can fail to detect errors that may have a significant biological impact. Fundamentally, patient-specific QA should be linked with treatment outcomes.10 Dose volume histogram (DVH) based analyses of IMRT QA have proven the effectiveness of this approach while also showing that gamma index values may have low correlation with the clinical impact of errors.4,11 For this reason, in this study both the gamma index and DVH-based metrics were used to evaluate the error detection capabilities of three IMRT patient-specific QA methods. Table 1 shows the monitor units (MUs) and beam energies for each of the patient plans.
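The gamma comparison described above can be sketched in code. The following is a minimal brute-force implementation of a global 2D gamma analysis, not the vendors' algorithm; the function name, the 10% low-dose cut-off, and the search-radius heuristic are illustrative assumptions.

```python
import numpy as np

def gamma_pass_rate(ref, meas, spacing_mm, dd_pct=3.0, dta_mm=3.0, cutoff_pct=10.0):
    """Simplified global 2D gamma analysis (brute-force sketch).

    ref, meas: reference and measured dose arrays on the same grid.
    spacing_mm: pixel spacing. Returns the percentage of evaluated
    points (above the low-dose cut-off) with gamma <= 1.
    """
    ny, nx = ref.shape
    y, x = np.mgrid[0:ny, 0:nx]
    dd = dd_pct / 100.0 * ref.max()                  # global dose-difference criterion
    search = int(np.ceil(2 * dta_mm / spacing_mm))   # assumed DTA search radius
    mask = ref >= cutoff_pct / 100.0 * ref.max()     # ignore the low-dose region
    gamma = np.full(ref.shape, np.inf)
    for j in range(ny):
        for i in range(nx):
            if not mask[j, i]:
                continue
            j0, j1 = max(0, j - search), min(ny, j + search + 1)
            i0, i1 = max(0, i - search), min(nx, i + search + 1)
            # squared spatial distance and squared dose difference to each neighbor
            dist2 = ((y[j0:j1, i0:i1] - j) ** 2 +
                     (x[j0:j1, i0:i1] - i) ** 2) * spacing_mm ** 2
            dose2 = (meas[j0:j1, i0:i1] - ref[j, i]) ** 2
            gamma[j, i] = np.sqrt(np.min(dist2 / dta_mm ** 2 + dose2 / dd ** 2))
    return 100.0 * np.mean(gamma[mask] <= 1.0)
```

An error-free delivery (measurement identical to the calculation) yields a 100% pass rate; larger dose or positional deviations push individual gamma values above 1 and lower the passing percentage.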

2.B | Baseline measurements
All plans were optimized and delivered on a Novalis Tx (Varian Medical Systems) linear accelerator equipped with an HD-120 MLC and an aS1000 EPID. To determine the threshold of detectability of our methods, we first established a baseline by measuring all treatment plans five (n = 5) consecutive times with each QA method and performed gamma index analysis of each. Each treatment plan was delivered on the Novalis Tx consecutively on the same day to minimize linac output and EPID response variation. The mean gamma index passing percentage rate for each plan using 3%/3 mm, 2%/2 mm, and 1%/1 mm tolerances was used as a reference standard.
Using the baseline statistics, a detection threshold was set at two standard deviations (SD) from the mean passing percentage.
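The baseline thresholding step can be sketched as follows, a minimal illustration assuming a hypothetical function name and the sample standard deviation for the n = 5 repeated deliveries.

```python
import numpy as np

def detection_threshold(baseline_pass_rates):
    """Lower action threshold from repeated error-free deliveries:
    mean minus two standard deviations of the baseline gamma pass rates."""
    rates = np.asarray(baseline_pass_rates, dtype=float)
    return rates.mean() - 2.0 * rates.std(ddof=1)  # sample SD over n = 5 runs
```

A measured pass rate falling below this plan-specific threshold would flag the delivery as containing a potential error.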

2.C | Errors
A total of six different versions of the original treatment were produced for each patient plan, each version containing a different type of error listed in Table 4. These "deliberate" errors represent typical errors that could occur during treatment.

2.E | ScandiDos Delta4
The treatment plans containing "known" errors, for each patient, were measured using the Delta4 detector array, following our institutional protocol.

2.F | EPID
The IMRT QA plans were delivered without a phantom in the beam and the fluence maps were collected by the EPID. The EPID images were processed through the MU-EPID software for all patients to convert them into an optical density matrix (ODM) and imported into the Pinnacle TPS. First, a pixel-intensity-to-MU conversion factor was determined by delivering 100 MU using a 10 × 10 cm² field to the EPID prior to each IMRT QA session. An average pixel intensity was taken over a region of the calibration image corresponding to the central portion of the field, and the resulting dose distributions were compared for each plan.
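The calibration step can be sketched as below. This is a minimal illustration, not the MU-EPID software itself; the function names and the ROI size are assumptions, since the original text does not specify the exact averaging region.

```python
import numpy as np

def pixel_to_mu_factor(cal_image, delivered_mu=100.0, roi_half_px=10):
    """Pixel-intensity-to-MU conversion factor from a 10 x 10 cm^2
    calibration image: delivered MU divided by the mean pixel
    intensity over a central ROI (ROI size is a hypothetical choice)."""
    cy, cx = np.array(cal_image.shape) // 2
    roi = cal_image[cy - roi_half_px:cy + roi_half_px,
                    cx - roi_half_px:cx + roi_half_px]
    return delivered_mu / roi.mean()

def image_to_mu(image, factor, roi_half_px=10):
    """Estimate the delivered MU of a field image using the factor."""
    cy, cx = np.array(image.shape) // 2
    roi = image[cy - roi_half_px:cy + roi_half_px,
                cx - roi_half_px:cx + roi_half_px]
    return factor * roi.mean()
```

Because the factor is re-derived before each QA session, drifts in EPID response between sessions are absorbed into the calibration.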

2.G | Dynalog
The Dynalog files were recorded during the EPID image acquisitions.
The MLC positions, gantry angles, and collimator angles were collected and used to replace the corresponding parameters of the original plans, which were then reimported into the Pinnacle TPS. The treatment plans were recalculated using the Dynalog-recorded parameters for each patient.
The resulting dose distributions were exported and processed using MATLAB to obtain the same dosimetric data as for the plans calculated using ODMs generated from EPID images. Dynalog files do not contain information on the actual number of monitor units delivered; therefore, EPID images were used in conjunction with MU values derived from the same pixel-intensity-to-MU conversion algorithm used in the EPID method.

2.H | Gamma index analysis
Error detection was evaluated at gamma passing percentage thresholds of 90% at 3%/3 mm, 80% at 2%/2 mm, and 50% at 1%/1 mm. The passing criteria for 2%/2 mm and 1%/1 mm were derived by matching the number of detected errors at the standard 90% passing threshold. Error detection was also evaluated at passing thresholds set at two standard deviations (SD) from the mean of the baseline and at the percentage determined from ROC analysis.
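The ROC-derived threshold selection can be sketched as follows. This is a minimal illustration assuming a hypothetical function name and Youden's J statistic as the optimality criterion; the paper does not state which operating point on the ROC curve was chosen.

```python
import numpy as np

def roc_threshold(pass_rates, has_error):
    """Choose the gamma pass-rate cut-off maximizing Youden's J
    (sensitivity + specificity - 1) over the observed rates.

    pass_rates: measured gamma passing percentages.
    has_error:  True for plans containing an introduced error.
    A plan is flagged as failing QA when pass_rate < threshold."""
    rates = np.asarray(pass_rates, dtype=float)
    err = np.asarray(has_error, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(rates):
        flagged = rates < t
        sens = (flagged & err).sum() / max(err.sum(), 1)       # true positive rate
        spec = (~flagged & ~err).sum() / max((~err).sum(), 1)  # true negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t
```

With well-separated populations, the chosen cut-off lies between the error-free and error-containing pass-rate clusters, flagging all erroneous plans without false positives.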

3.A | Gamma index analysis
At 3%/3 mm and the ROC-derived passing gamma cut-offs, the detected errors are summarized in Fig. 7.

4.A.1 | MU-EPID
The EPID showed limited error detection capabilities, mainly due to its geometry (loss of gantry angle information) and its over-response to scattered radiation. Corrections for the variation in response across the EPID were applied; however, the correction cannot account for scatter radiation, which largely depends on the plan being delivered. These deficiencies limit the effectiveness of the EPID as a comprehensive IMRT QA tool. The EPID does, however, provide a physical "measurement" that can be useful. The pixel-intensity-to-MU conversion factor derived from the 100 MU calibration image showed a variation of less than 1%.
The resulting MU value can be used in conjunction with log file based IMRT QA methods to incorporate a "measurement" into the process. This method was used with the Dynalog QA method investigated here.
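The combination of the two methods amounts to a simple rescaling, sketched below with hypothetical names: the Dynalog-reconstructed dose (which assumes the planned MU were delivered) is scaled by the ratio of EPID-measured MU to planned MU.

```python
def scale_dynalog_dose(dynalog_dose, epid_mu, planned_mu):
    """Rescale a Dynalog-reconstructed dose by the EPID-derived MU.

    Dynalog files carry no record of the MU actually delivered, so the
    EPID measurement supplies the output factor the log file lacks."""
    return dynalog_dose * (epid_mu / planned_mu)
```

For example, if the EPID-derived MU were 1% below the planned value, the reconstructed dose distribution would be scaled down by 1% before comparison against the TPS calculation.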

4.B | Dynalog QA
The Dynalog method performed similarly to the Delta4 when using the 2 SD and ROC-derived pass thresholds. The Dynalog method readily detected the gantry angle errors but had difficulty detecting small MLC deviations.

4.E | Analysis metrics
Gamma index analysis exhibited poor sensitivity to small errors for all three IMRT QA methods investigated here. The failure of the gamma index to detect small errors has been documented and is in line with our findings.4,6,8,11 Even with optimization using ROC methods, gamma analysis fails to detect MLC errors of less than ~2 mm.14 However, the use of D2 as an additional error detection metric enhances the capability of the systems to detect small errors while also reducing the number of false positives that can be an issue with gamma index analysis.
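The DVH metrics used here can be sketched as below, a minimal illustration with hypothetical function names: D2 (near-maximum dose) and D98 (near-minimum dose) are the doses received by at least 2% and 98% of the structure volume, respectively.

```python
import numpy as np

def dvh_metric(doses, pct):
    """D_pct: minimum dose received by the hottest pct% of the structure.

    doses: per-voxel dose samples within the structure (e.g., the PTV).
    D2 approximates the near-maximum dose, D98 the near-minimum."""
    return float(np.percentile(np.asarray(doses, dtype=float), 100.0 - pct))

def percent_error(measured, planned):
    """Percentage deviation of a measured DVH metric from the plan."""
    return 100.0 * (measured - planned) / planned
```

Comparing D2 and D98 of the reconstructed dose against the planned values converts the QA result into a quantity with direct clinical meaning, which is what makes small MLC errors visible even when the gamma pass rate stays high.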

| CONCLUSION
Measurements to establish baseline data showed significant differences in the average gamma values between methods, especially for stricter gamma index calculation criteria. The baseline data were used to perform ROC analysis to determine the error detection capabilities of each patient-specific QA method. Our results from all patient measurements highlighted the strengths and weaknesses of each method used for patient-specific QA in detecting small clinically relevant errors. It was also evident from the results that the gamma index as an analysis tool has significant limitations in detecting small errors, as no method investigated here could detect more than 60% of the intentional errors.

Fig. 7. Bar graphs of errors detected by using D2 and D98 of the PTV.
Individually, neither the EPID nor the Dynalog method performed as well as the Delta4 when using the gamma index only. However, a combination of the two methods (scaling Dynalog doses using EPID images) matches the detection capabilities of the Delta4. Moreover, with the scaled Dynalogs, the plan can be recalculated using the MLC position information, and using the PTV D2 and D98 we were able to increase the detectability of the errors. The increased sensitivity gained from analysis of dose to the PTV, combined with phantom-less data collection, makes this method an attractive alternative to phantom-based patient-specific IMRT QA.

CONFLICT OF INTEREST
We have no conflict of interest to declare.