Technical note: A modified gamma evaluation method for dose distribution comparisons

Abstract

Purpose: In this work we have developed a novel method of dose distribution comparison, the inverse gamma (IG) evaluation, by modifying the commonly used gamma evaluation method.

Methods: The IG evaluation calculates the gamma criteria (dose difference criterion, ΔD, or distance-to-agreement criterion, Δd) that are needed to achieve a predefined pass rate or gamma agreement index (GAI). In-house code for evaluating IG with a fixed ΔD of 3% was developed using Python (v3.5.2) and investigated using treatment plans and measurement data from 25 retrospective patient specific quality assurance tests (53 individual arcs).

Results: It was found that when the desired GAI was set to 95%, approximately three quarters of the arcs tested were able to achieve Δd within 1 mm (mean Δd: 0.7 ± 0.5 mm). The mean Δd required in order for all points to pass the gamma evaluation (i.e., GAI = 100%) was 4.5 ± 3.1 mm. The possibility of evaluating IG by fixing the Δd or ΔD/Δd, instead of fixing the ΔD at 3%, was also investigated.

Conclusion: The IG method and its indices have the potential to be implemented clinically to quantify the minimum dose and distance criteria based on a specified GAI. This method provides additional information to augment standard gamma evaluation results during patient specific quality assurance testing of individual treatment plans. The IG method also has the potential to be used in retrospective audits to determine an appropriate set of local gamma criteria and action levels based on a cohort of patient specific quality assurance plans.

comparison method being used clinically, although several alternative methods have been proposed 7-12. Derived from the dose difference (ΔD) test and the distance-to-agreement (Δd) test 10, the gamma index method calculates the difference between two dose grids in a combined spatial-dose domain 6.
The result of the gamma test can be summarized by a single percentage value, usually referred to as the "pass rate" or "gamma agreement index" (GAI), which describes the percentage of points in the two dose distributions that agree within specified ΔD and Δd (producing a gamma value less than or equal to 1.0). Gamma evaluation results can also be plotted as a two-dimensional gamma distribution with desired spatial resolution, as well as histograms, so that the locations of regions of disagreement can be identified and investigated 6 . The gamma evaluation method has the advantage of producing a quantitative measure based on both dose and spatial criteria, so that large dose differences occurring in high-dose-gradient regions do not disproportionately affect the results of the comparison. The gamma evaluation method has, however, been criticized for being less clinically intuitive than more conventional dose-comparison methods 8 , being sensitive to dose grid resolution 13 and having poor sensitivity and specificity to clinical dosimetric inaccuracies (when evaluated in terms of global dose difference) [14][15][16][17][18][19][20] .
Several alternative dose comparison methods have been proposed to avoid some of the perceived weaknesses of the gamma index method [7][8][9][10][11][12]. Specifically, several algorithms have been proposed which attempt to account for the differing levels of biological relevance associated with comparison results in different regions of the dose distribution. For example, the normalized agreement test (NAT) 7, the maximum allowed dose difference (MADD) method 8 and the divide and conquer (D&C) gamma method 11 all vary the dose-difference tolerances in high-dose or high-dose-gradient regions that may correspond to clinically important regions. The concept of a radiobiological gamma index (Sumida method) 12 has been proposed to integrate radiobiological parameters, such as tumor control and normal tissue complication probabilities, into the gamma index calculation in order to produce more clinically relevant results. The proposed DVH-based analysis 15,[21][22][23][24] allows different criteria to be used in each volume depending on clinical significance or required precision, which could be more relevant than simply judging the overall agreement 25. These and other alternative dose comparison methods share the disadvantage of providing results that are difficult to compare and benchmark against historical data or other sources (other radiation oncology centres 26, auditing bodies 27 or established quality assurance guidelines 28,29), given the widespread adoption and acceptance of the gamma evaluation method.
This study investigates a modified gamma evaluation method, the "inverse gamma" (IG) evaluation, which calculates the gamma evaluation criteria (ΔD or Δd) that would be needed to achieve a predefined GAI. Li et al. 30 proposed a similar approach, in which the passing percentage is fixed and the combination of ΔD and Δd is calculated; however, that approach has not been implemented in clinical QA. It is expected that the modified IG method proposed in this study will provide additional information for clinical patient specific quality assurance (PSQA), augmenting standard gamma evaluation results by giving users an indication of the minimum Δd for which a specified GAI can be achieved when (for example) the ΔD is set to 3%.

2.A | Standard gamma index method
The gamma index at a point r_r is defined as 6 :

\[ \Gamma(r_e, r_r) = \sqrt{\frac{r^2(r_e, r_r)}{\Delta d^2} + \frac{\delta^2(r_e, r_r)}{\Delta D^2}} \tag{1} \]

\[ \gamma(r_r) = \min\{\Gamma(r_e, r_r)\} \;\; \forall \{r_e\} \tag{2} \]

where δ(r_e, r_r) is the dose difference between evaluated and reference doses at point r, ΔD is the dose difference criterion, r(r_e, r_r) is the spatial distance between evaluated and reference dose points, and Δd is the distance-to-agreement criterion. The GAI is calculated as the percentage of points for which eq. (2) results in a gamma value less than or equal to 1.0, indicating agreement within the specified ΔD and Δd. The gamma index method implemented in this study used global gamma normalization, where the ΔD is normalized to the global maximum dose.
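As an illustration of the definition above, the following is a minimal sketch of a brute-force global gamma calculation for one-dimensional profiles. It is not the in-house code described in this note; the function names (`gamma_index`, `gamma_agreement_index`) and the representation of each profile as a list of `(position_mm, dose)` pairs are assumptions made for the example.

```python
import math

def gamma_index(ref, evl, dd_percent, dta_mm):
    """Return the gamma value at every reference point (1D, brute force).

    ref, evl   : lists of (position_mm, dose) pairs (hypothetical layout).
    dd_percent : dose difference criterion as a percentage of the global
                 maximum reference dose (global normalization).
    dta_mm     : distance-to-agreement criterion in mm (must be > 0 here).
    """
    dose_tol = dd_percent / 100.0 * max(d for _, d in ref)
    gammas = []
    for xr, dr in ref:
        # Combined dose-distance metric against every evaluated point;
        # the gamma value is the minimum over all evaluated points.
        gammas.append(min(
            math.sqrt(((xe - xr) / dta_mm) ** 2 + ((de - dr) / dose_tol) ** 2)
            for xe, de in evl
        ))
    return gammas

def gamma_agreement_index(gammas):
    """Percentage of points with a gamma value less than or equal to 1.0."""
    return 100.0 * sum(1 for g in gammas if g <= 1.0) / len(gammas)
```

A clinical implementation would typically work on 2D or 3D dose grids, interpolate the evaluated distribution and restrict the search radius; those refinements are omitted here for brevity.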
2.B | Inverse gamma with fixed ΔD (IG ΔD )
IG ΔD calculates the minimum distance-to-agreement criterion (Δd) that is needed to achieve a specified GAI, given a fixed value of the dose difference criterion (ΔD). The fixed ΔD used in this study was selected to be 3%, denoted as IG ΔD = 3%. The IG algorithm performs iterative global gamma calculations with Δd increasing from 0 mm in 0.1 mm increments, until the specified GAI is reached and the required minimum Δd is reported. The resulting Δd can be denoted as Δd GAI = 100% or 95%, ΔD = 3%. Clinically this value would then be used to compare against a tolerance. The time required to perform the IG calculations is dependent on the number of iterations required. Lower Δd values require less time to calculate than higher Δd values. On average it takes a few minutes to run on a desktop PC, which is practical in clinical settings.
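The iterative search described above can be sketched as follows for one-dimensional profiles. This is a hedged illustration rather than the in-house implementation: the function names, the `(position_mm, dose)` data layout, the `max_mm` search limit and the treatment of a zero criterion (only exactly coincident positions or identical doses can match) are all assumptions.

```python
import math

def _scaled_sq(value, tol):
    # Assumption: with a zero criterion only an exact match contributes 0;
    # any non-zero difference can never pass.
    if tol == 0.0:
        return 0.0 if value == 0.0 else math.inf
    return (value / tol) ** 2

def gai(ref, evl, dd_percent, dta_mm):
    """Global gamma agreement index (%) for lists of (position_mm, dose)."""
    dose_tol = dd_percent / 100.0 * max(d for _, d in ref)
    passed = 0
    for xr, dr in ref:
        if any(
            math.sqrt(_scaled_sq(xe - xr, dta_mm) + _scaled_sq(de - dr, dose_tol)) <= 1.0
            for xe, de in evl
        ):
            passed += 1
    return 100.0 * passed / len(ref)

def inverse_gamma_fixed_dd(ref, evl, target_gai, dd_percent=3.0,
                           step_mm=0.1, max_mm=30.0):
    """Smallest distance criterion (mm) on the search grid reaching target_gai.

    Returns None if the target is not reached by max_mm (hypothetical limit).
    """
    n_steps = int(round(max_mm / step_mm))
    for k in range(n_steps + 1):
        dta_mm = round(k * step_mm, 6)  # integer counter avoids float drift
        if gai(ref, evl, dd_percent, dta_mm) >= target_gai:
            return dta_mm
    return None
```

For a hypothetical flat profile of 20 points in which a single evaluated point differs by 10%, this sketch returns 0.0 mm for a target GAI of 95% but needs the distance to the nearest agreeing neighbour (rounded up to the 0.1 mm grid) for a target of 100%, showing how a single outlying point can dominate the GAI = 100% result.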
As an example, the ΔD was fixed at 3% in this work and the values of Δd required to achieve GAI values of 95% and 100% were investigated, for a pre-existing set of VMAT PSQA results. The ΔD of 3% was chosen for this work because it is very widely recommended and used. Recent surveys 31,32 suggested that 3%/3 mm are currently the most commonly used gamma evaluation criteria, and the AAPM's task group report on IMRT commissioning (TG-119) 28 used 3%/3 mm and the AAPM's more recent task group report on modulated radiotherapy quality assurance (TG-218) 29

2.C | Application
In-house gamma evaluation code was developed using Python v 3.5.2, following the method proposed by Low et al. 6. The code was validated by establishing agreement with the commercial SNC Patient software package within ± 1.5% (mean difference 0.3%), which falls within the range of variations between commercial gamma evaluation software packages reported by TG-218 29, attributed to minor differences in algorithm implementation. The in-house code was also validated against the square-field evaluation as per Low & Dempsey 6.
The code was then modified to include the IG algorithm. A set of pre-existing PSQA results, consisting of 53 arcs from 25 VMAT treatment plans measured using the ArcCheck (Sun Nuclear Corporation, Melbourne, USA) helical diode array, was arbitrarily selected and used to validate the performance of the code, by attempting to duplicate the gamma evaluation results produced during conventional PSQA tests using the SNC Patient software (version 6.2.2) with Van Dyk global gamma analysis 41 and 2D distance-to-agreement. The lower dose threshold (LDT) was set to 5%, which is consistent with the LDT used in the gamma evaluation. The performance of the standard gamma evaluation calculations by the in-house code was verified using the Low and Dempsey method 6 and by establishing agreement with the output produced by the SNC Patient software. The in-house code was then used to calculate IG ΔD = 3% for all 53 measurements. When calculating IG ΔD = 3%, the target GAI was first defined as 100%, which represents the extreme situation where all points must pass the gamma test. The IG ΔD = 3% analysis was then repeated with the target GAI set to 95%, which corresponds to two standard deviations and is the action level most commonly used 31.
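The lower dose threshold mentioned above can be illustrated with a short sketch. The function name, the `(position_mm, dose)` layout, the choice to threshold relative to the maximum dose of the distribution passed in, and the use of an inclusive comparison are all assumptions; the note does not specify these details.

```python
def apply_lower_dose_threshold(points, ldt_percent=5.0):
    """Keep only points whose dose is at least ldt_percent of the maximum dose.

    points: list of (position_mm, dose) pairs (hypothetical layout).
    """
    cutoff = ldt_percent / 100.0 * max(d for _, d in points)
    # Assumption: points exactly at the cutoff are retained.
    return [(x, d) for x, d in points if d >= cutoff]
```

Only the points that survive this filter would then be counted when the GAI is computed.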

| RESULTS
The mean and standard deviation (SD) of IG ΔD = 3% (when GAI was set to 100% and 95%), compared with their original gamma values, were calculated for the 53 VMAT arcs and are shown in Table 1. The detailed values for each individual arc are given in Table A1 in the Appendix. and 95%. Regions of high and low geometric uncertainties can be easily identified, which is not easily achievable by performing multiple gamma evaluations with varying Δd criteria. Figure 2 illustrates the results of the IG analysis of the VMAT PSQA results, showing the Δd required to achieve the specified GAI values for each of the 53 arcs, when the ΔD is fixed at 3%. The mean Δd required to achieve a GAI of 100% was 4.5 ± 3.1 mm. The number of arcs that achieved a GAI of 100% only when Δd was 10 mm or more provides a graphic indication of the clinical unsuitability of requiring that all points pass the gamma evaluation. The mean Δd required to achieve the more conventional GAI of 95% was 0.7 ± 0.5 mm, with the majority of arcs (75.5%) requiring Δd less than 1 mm.
Examination of the results in Fig. 2 indicates that if initial PSQA testing of these arcs had used gamma evaluation criteria of 3%/2 mm, all arcs would have achieved a GAI greater than 95%. Data in Fig. 2 also indicate that if initial PSQA testing of these arcs had used gamma evaluation criteria of 3%/1 mm, then 75.5% of the arcs would have achieved a GAI greater than 95%.
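The inference in the preceding paragraph relies on the GAI being non-decreasing in Δd when ΔD is fixed, so an arc reaches the target GAI at 3%/x mm whenever its IG Δd value does not exceed x. A minimal sketch of that bookkeeping is given below; the function name is an assumption and the per-arc values in the usage note are purely hypothetical, not data from this study.

```python
def percent_of_arcs_meeting(ig_dta_values_mm, dta_criterion_mm):
    """Percentage of arcs whose required distance criterion (from an IG
    evaluation at fixed dose criterion and target GAI) is within
    dta_criterion_mm."""
    n_ok = sum(1 for v in ig_dta_values_mm if v <= dta_criterion_mm)
    return 100.0 * n_ok / len(ig_dta_values_mm)
```

For hypothetical per-arc values of 0.3, 0.5, 0.9 and 1.6 mm, three of the four arcs would meet a 1 mm criterion and all four would meet a 2 mm criterion.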
The highest Δd calculated was for arc 2 with a GAI of 100%, which indicated that a Δd of nearly 17 mm is needed for all points to pass the gamma evaluation when the ΔD value is set to 3%. However, when a GAI of 95% is used, the required Δd drops substantially, to only 1.6 mm. This behavior will be discussed in the next section.

[Table column headings: VMAT arcs; gamma pass rate (%) at 2%, 2 mm, 5% LDT]

The IG method also offers more variability and flexibility than simply fixing the ΔD criterion. The algorithm is amenable to using a fixed Δd to identify the ΔD that would produce a specified GAI.
Alternatively, the ratio of ΔD to Δd can be fixed, enabling the IG

For centres that are committed to using the standard gamma evaluation method for PSQA, the IG indices developed in this study could be used to investigate or justify the choice of gamma criteria for ongoing PSQA use, based on the combination of the local treatment technique and measuring device.

| CONCLUSIONS
A novel dose comparison method called the inverse gamma (IG) method has been developed. The IG ΔD = 3% index has been tested on 25 retrospective VMAT PSQA plans (53 arcs). This index proved useful for quantifying the minimum Δd needed to achieve a given GAI when ΔD is fixed at 3%. This method has the potential to be implemented clinically to perform additional analysis of failed plans and to provide those who prescribe, plan and test modulated radiotherapy treatments with more detailed dosimetric information about the reliability with which planned doses can be delivered. The IG method also has the potential to be used in retrospective internal and inter-departmental audits, to evaluate the suitability of the local gamma evaluation criteria.

CONFLICT OF INTEREST
The authors have no relevant conflict of interest to disclose.

APPENDIX
The body of this manuscript describes the IG method, using the example of keeping the ΔD fixed and identifying the minimum Δd required to achieve a specified GAI for each PSQA comparison (abbreviated to IG ΔD ). Two additional forms of the IG algorithm have been investigated: inverse gamma with fixed Δd (IG Δd ) and inverse gamma with fixed dose-distance-ratio (IG ΔD/Δd ).
As an obvious analogue to IG ΔD , IG Δd keeps the Δd fixed and calculates the minimum ΔD required to achieve a specified GAI for each PSQA comparison. The algorithm performs iterative global gamma calculations, increasing ΔD from 0% in 0.1% increments until the minimum ΔD is found that achieves a GAI no less than the predefined threshold GAI value. To provide an indication of the results achievable using IG Δd , the 53 VMAT PSQA results used in this work were re-evaluated using a fixed Δd of 1 mm, denoted as IG Δd = 1mm (see Fig. A1). A comparable example of the resulting ΔD distribution when applying the IG Δd = 1mm method is illustrated in

from 1%/1 mm with 0.1%/mm increment until the resulting GAI is equal to or above the predefined threshold GAI value (see Fig. A1).

Figure A1 shows how the IG results can vary, depending on the selection of the GAI and on which specific IG method (fixed ΔD, fixed Δd or fixed ΔD/Δd) is used for the comparison. Clearly, the ΔD and Δd values required to achieve a "passing" result are higher when the GAI action threshold is 100% than when the GAI action threshold is 95%. Similarly, the ΔD values required to exceed the GAI action threshold are higher when the Δd is set to 1 mm than when the Δd is allowed to vary at a constant ratio with the ΔD. Figure A1 also provides an indication of how the indices developed in this study can be used to evaluate locally used or proposed gamma criteria and GAI, as part of a statistical process control (SPC) process. In Fig. A1, the mean ΔD and Δd criteria that result from an IG ΔD/Δd = 1.0 evaluation of the VMAT PSQA results, with a GAI of 95%, are 1.6 ± 0.2%/1.6 ± 0.2 mm, which is just within 2%/2 mm.
This suggests that if 2%/2 mm were used in a standard gamma calculation, the resulting mean GAI would be slightly higher than 95% (in fact, the results of this study show that it is 97.9 ± 1.5%). This result suggests that using gamma criteria of 2%/2 mm and a GAI of 95% is suitable for this cohort of plans. This finding is consistent with the results estimated using the SPC method 29 for the same cohort of patients. Similarly, Fig. A1 suggests that if a centre decided to use a GAI of 100%, then suitable gamma criteria would be 4%/4 mm, assuming a unity dose-to-distance ratio is preferred.
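The two additional IG variants described in this Appendix can be sketched with a single search over a sequence of candidate (ΔD, Δd) pairs. This is an illustrative sketch for one-dimensional profiles, not the in-house code: all function names, the `(position_mm, dose)` layout, the upper search limits and the handling of a zero criterion are assumptions, and the fixed-ratio search assumes that both criteria start at 1%/1 mm and increase together by 0.1% and 0.1 mm per step.

```python
import math

def _scaled_sq(value, tol):
    # Assumption: a zero criterion can only be met by an exact match.
    if tol == 0.0:
        return 0.0 if value == 0.0 else math.inf
    return (value / tol) ** 2

def gai(ref, evl, dd_percent, dta_mm):
    """Global gamma agreement index (%) for lists of (position_mm, dose)."""
    dose_tol = dd_percent / 100.0 * max(d for _, d in ref)
    passed = sum(
        1 for xr, dr in ref
        if any(math.sqrt(_scaled_sq(xe - xr, dta_mm)
                         + _scaled_sq(de - dr, dose_tol)) <= 1.0
               for xe, de in evl)
    )
    return 100.0 * passed / len(ref)

def fixed_dta_criteria(dta_mm=1.0, step=0.1, max_dd=20.0):
    """Candidate (dd_percent, dta_mm) pairs with the distance criterion fixed."""
    n = int(round(max_dd / step))
    return [(round(k * step, 6), dta_mm) for k in range(n + 1)]

def fixed_ratio_criteria(start=1.0, step=0.1, max_value=20.0):
    """Candidate pairs with a dose-to-distance ratio of 1.0 (%/mm)."""
    n = int(round((max_value - start) / step))
    values = [round(start + k * step, 6) for k in range(n + 1)]
    return [(v, v) for v in values]

def inverse_gamma(ref, evl, target_gai, criteria):
    """First (dd_percent, dta_mm) pair in criteria that reaches target_gai."""
    for dd_percent, dta_mm in criteria:
        if gai(ref, evl, dd_percent, dta_mm) >= target_gai:
            return dd_percent, dta_mm
    return None  # target not reached within the (hypothetical) search range
```

Passing `fixed_dta_criteria()` gives a sketch of IG Δd = 1mm, and passing `fixed_ratio_criteria()` gives a sketch of IG ΔD/Δd = 1.0. Keeping the search separate from the generation of candidate criteria means that other increments or ratios only require a different list of pairs.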