Assessment of radiobiological metrics applied to patient‐specific QA process of VMAT prostate treatments

VMAT is a powerful technique to deliver hypofractionated prostate treatments. The lack of correlation between usual 2D pretreatment QA results and the clinical impact of possible mistakes has motivated the development of 3D verification systems. Dose determination on patient anatomy has provided clinical predictive capability to the patient-specific QA process. Dose-volume metrics, as evaluation criteria, should be replaced or complemented by radiobiological indices. These metrics can be incorporated into individualized QA by extracting the information for response parameters (gEUD, TCP, NTCP) from DVHs. The aim of this study is to assess the role of two 3D verification systems dealing with radiobiological metrics applied to a prostate VMAT QA program. Radiobiological calculations were performed for AAPM TG-166 test cases. Maximum differences were 9.3% for gEUD, −1.3% for TCP, and 5.3% for NTCP calculations. Gamma tests and DVH-based comparisons were carried out for both systems in order to assess their performance in 3D dose determination for prostate treatments (high-, intermediate-, and low-risk, as well as prostate bed patients). Mean gamma passing rates for all structures were better than 92.0% and 99.1% for the 2%/2 mm and 3%/3 mm criteria, respectively. Maximum discrepancies were (2.4% ± 0.8%) and (6.2% ± 1.3%) for targets and normal tissues, respectively. Values for gEUD, TCP, and NTCP were extracted from TPS and compared to the results obtained with the two systems. Three models were used for TCP calculations (Poisson, sigmoidal, and Niemierko) and two models for NTCP determinations (LKB and Niemierko). The maximum mean difference for gEUD calculations was (4.7% ± 1.3%); for TCP, the maximum discrepancy was (−2.4% ± 1.1%); and NTCP comparisons led to a maximum deviation of (1.5% ± 0.5%). The potential usefulness of biological metrics in patient-specific QA has been explored. Both systems have been successfully assessed as potential tools for evaluating the clinical outcome of a radiotherapy treatment in the scope of pretreatment QA.

PACS number(s): 87.56.Fc, 87.55.Qr, 87.55.dk, 87.55.dh, 87.10.Vg, 87.55.km, 87.53.Bn, 87.55.-x, 87.56.-v


I. INTRODUCTION
Radiation therapy (RT) for prostate cancer has substantially evolved during the last years. Dose escalation improves disease control, at the expense of an increment in toxicity. (1) Modulated techniques can reduce toxicity by optimizing treatment conformation. (2) Hypofractionated schemes are suitable in prostate treatment because of the low α/β ratio for the prostate gland. (3) Hypofractionated plans delivered with intensity-modulated radiation therapy (IMRT) techniques lead to extended treatment times compared to traditional techniques. Volumetric-modulated arc therapy (VMAT) has been developed due to rotational capabilities recently implemented in conventional linacs. (4) Treatment times are noticeably reduced within this new paradigm, making VMAT a powerful tool for hypofractionated prostate treatments. (5)(6)(7)(8)(9) VMAT, as one kind of IMRT technique, requires a detailed patient-specific quality assurance (QA) program. (9) This independent pretreatment QA process usually consists of comparing dose measurements acquired with phantoms/detectors of regular geometries with treatment planning system (TPS) calculations made under the same conditions. (10,11) Ion chambers are used to perform point measurements. Two-dimensional (2D) (plane) dose distributions are measured with several systems: electronic portal imaging devices, films or 2D detector arrays. Tests involving gamma index passing rates are common in these comparisons. (12,13) Three-dimensional (3D) verifications start with specifically developed solutions for volumetric techniques. (14,15) The lack of correlations between usual 2D pretreatment QA results and the clinical impact of possible mistakes has been established. (16,17) In a second step, based on previous conclusions, 3D verification systems are developed under the scope of determining dose on patient anatomy, providing clinical predictive capability to these systems. 
Solutions for redundant calculations on patient CT information or 3D dose reconstruction from measurements have already been developed. (18) Three-dimensional dose calculation and reconstruction have introduced DVH-based metrics into the QA process, allowing for dose-volume information comparisons.
The quality of an RT plan has been traditionally judged by dose-volume parameters rather than biological ones. However, dose-volume criteria should be complemented by biological indices. (19) Eventually, biological models should be routinely introduced and validated because these models have demonstrated their predictive ability in the evaluation of the treatment outcome. Although a complete replacement of standard DVH-based metrics is not recommended, efforts should be directed toward validating outcome prediction models that go beyond the traditional evaluation metrics. (20)(21)(22)(23) Radiobiological data and response parameters, such as generalized equivalent uniform dose (gEUD), (24,25) tumor control probability (TCP), (26)(27)(28) or normal tissue complication probability (NTCP), (29)(30)(31)(32)(33)(34)(35) can be obtained from DVH information. Hence, radiobiological metrics can be incorporated into the individualized pretreatment QA process. This paper assesses the role of two 3D dose verification systems dealing with radiobiological metrics applied to a VMAT prostate treatment QA program.

B. 3D dose verification systems
Two 3D verification systems were assessed. Mobius3D software (Mobius Medical Systems, Houston, TX) provides an independent dose calculation engine for treatments generated by TPS. COMPASS (v. 3.1) (IBA Dosimetry, Schwarzenbruck, Germany) is capable of reconstructing dose on patient CT from measurements taken with an associated detector. In addition, it provides an independent and redundant dose verification of TPS calculations, as does Mobius3D.

B.1 Mobius3D system description
The software uses stock reference values for common linear accelerators to model the beams. Mobius3D works with a collapsed cone convolution/superposition algorithm independently developed and updated from its original conception. (36)(37)(38)(39)(40)(41) The algorithm is accelerated on graphics processing units (GPUs), which increases the calculation speed significantly compared to CPUs.

B.2 COMPASS system description
COMPASS consists of two components: the detector and the associated software. The detector is a 2D ion chamber array (MatriXX Evolution, IBA Dosimetry). It has 1020 ion chambers (0.08 cm³) covering an active area of 24.4 × 24.4 cm². This detector has already been evaluated for VMAT pretreatment QA. (42) The MatriXX device must be placed in a holder attached to the head of the treatment unit in order to ensure a rigid rotation of the detector with the gantry. Source-to-detector distance is 100 cm. A buildup thickness of 2.5 cm was used with this arrangement. An angle sensor is attached to the gantry in order to associate each measured fluence with its detection angle; the sensor has an angular tolerance of ±0.6°. COMPASS software requires a beam modeling process that fits basic parameters, as is done for a TPS. The model connects with a collapsed cone convolution/superposition algorithm that allows both calculating and reconstructing (from measurements) dose on patient CT. A commissioning process is required for the MatriXX device in the software. It consists of background (20 s) and pre-irradiation (5 Gy or higher) measurements together with a square field (10 × 10 cm²) acquisition that automatically corrects detector shifting and rotation. An absolute dose calibration with a known-dose reference field is also required. Sampling time for each measurement was 250 ms.

C.1 gEUD
For a nonuniform tumor dose distribution, equivalent uniform dose (EUD) is defined as the uniform dose that yields the same biological effect, if treatment is delivered over the same number of fractions as the nonuniform original dose distribution. (24) Niemierko (25) proposed a phenomenological expression to extend the previous concept to normal tissues, referred to as the generalized EUD (gEUD):

$$\mathrm{gEUD} = \Big( \sum_j v_j \, d_j^{\,a} \Big)^{1/a} \qquad (1)$$

where $v_j$ is the fractional volume receiving a dose $d_j$, and $a$ is a tissue-specific parameter describing the volume effect. For tumors, $a$ takes negative values; for serial-like structures, $a$ takes large positive values; and for parallel-like structures, $a$ takes values close to 1.
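As a minimal sketch of the gEUD definition above, assuming the standard Niemierko form gEUD = (Σ v_j d_j^a)^(1/a) and a differential DVH supplied as two lists, the calculation could look as follows. The function name `geud` and its arguments are illustrative assumptions and do not describe the in-house software used in this work.

```python
def geud(doses, volumes, a):
    """Generalized EUD from a differential DVH (illustrative sketch).

    doses   -- bin doses d_j in Gy
    volumes -- volume in each bin (any unit; normalized internally)
    a       -- tissue-specific volume-effect parameter (nonzero)
    """
    if a == 0:
        raise ValueError("a must be nonzero")
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    # v_j are fractional volumes, so each bin is divided by the total volume.
    # Note: a zero-dose bin combined with negative a raises ZeroDivisionError.
    weighted = sum((v / total) * d ** a for d, v in zip(doses, volumes))
    return weighted ** (1.0 / a)
```

For a two-bin DVH with half of the volume at 60 Gy and half at 70 Gy, a = 1 returns the mean dose (65 Gy); negative values of a pull the result toward the minimum dose and large positive values toward the maximum dose.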

C.2 TCP
Tumor control probability (TCP) can be modeled as a Poisson distribution. (26) If the number of initial clonogen cells is $N_c$ and the clonogen surviving fraction after irradiation with a single fraction is denoted by $S$, TCP can be written for a course of $n$ fractions as:

$$\mathrm{TCP} = \exp\left(-N_c \, S^{\,n}\right) \qquad (2)$$

Assuming that the average number of surviving clonogenic cells is an exponential function of the dose, the characteristic sigmoid dose-response curve is obtained. This simple assumption has been replaced by introducing the linear-quadratic (LQ) model to obtain the surviving fraction (27) as:

$$S = \exp\left(-\alpha d - \beta d^{2}\right) \qquad (3)$$

where $d$ is the dose per fraction and $\alpha$ and $\beta$ (with ratio α/β) are the usual parameters in the LQ model. For an inhomogeneous irradiation of dose $d_j$ in a fractional volume $v_j$, TCP can be calculated as (28)

(4)

The number of clonogenic cells can be determined using some of the following relations: (43)(44)(45)

(5)

(6)

where D50 is the tumor dose required to obtain a TCP of 50%. In addition, an expression for TCP may be obtained if data of clonogenic cells are not available, in terms of sigmoidal dose response parameters as (43,44,46) (sigmoidal model for TCP):

(7)

The γ50 parameter is the slope of the dose response at a TCP of 50%. Another possibility consists of using the gEUD concept previously introduced, obtaining TCP as (28) (Niemierko model for TCP):

$$\mathrm{TCP} = \frac{1}{1 + \left( D_{50} / \mathrm{gEUD} \right)^{4\gamma_{50}}} \qquad (8)$$
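The Poisson and Niemierko descriptions above can be illustrated with a short sketch. It assumes the standard forms TCP = exp(−N_c·S^n) with the LQ surviving fraction S = exp(−αd − βd²) for a uniform dose per fraction, and the logistic gEUD-based form TCP = 1/(1 + (D50/gEUD)^(4γ50)). The function names and the clonogen number used in the usage note are hypothetical; the inhomogeneous-dose and sigmoidal expressions of the text are not reproduced here.

```python
import math


def surviving_fraction(d, alpha, alpha_beta):
    """LQ surviving fraction after a single fraction of dose d (Gy)."""
    beta = alpha / alpha_beta
    return math.exp(-alpha * d - beta * d * d)


def tcp_poisson_uniform(n_clonogens, d, n_fractions, alpha, alpha_beta):
    """Poisson TCP for n_fractions of a uniform dose d per fraction."""
    s = surviving_fraction(d, alpha, alpha_beta)
    return math.exp(-n_clonogens * s ** n_fractions)


def tcp_niemierko(geud, d50, gamma50):
    """Logistic gEUD-based TCP; equals 0.5 when gEUD == D50."""
    return 1.0 / (1.0 + (d50 / geud) ** (4.0 * gamma50))
```

A call such as `tcp_poisson_uniform(1e6, 2.5, 28, 0.1, 1.5)` uses the α and α/β values quoted later in the text together with an assumed (hypothetical) clonogen number of 10^6.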

C.3 NTCP
The Lyman model (29) describes complication probabilities for a uniformly irradiated organ volume. The characteristic sigmoid dose-response curve is described by three parameters. The position of the sigmoid curve along the dose axis is described by TD50 and the curve steepness by m. The magnitude of the volume effect is described by the n parameter in a power-law relationship between the tolerance dose and the irradiated volume:

$$TD(v) = \frac{TD(1)}{v^{\,n}} \qquad (9)$$

where TD(v) is the tolerance dose for a given partial volume fraction v, and TD(1) is the tolerance dose for the full volume. For an inhomogeneous irradiation, the Lyman model can be completed with an algorithm to convert a heterogeneous dose distribution into a uniform organ irradiation resulting in the same NTCP. The effective volume method (31) is most commonly used to complement the Lyman model, resulting in a combined formalism named the Lyman-Kutcher-Burman (LKB) model. NTCP can be calculated, for an inhomogeneous irradiation of dose $d_j$ in a fractional volume $v_j$, as: (31,32)

$$\mathrm{NTCP} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^{2}/2} \, dx \qquad (10)$$

where t is

$$t = \frac{D_{\mathrm{eff}} - TD_{50}}{m \, TD_{50}} \qquad (11)$$

with $D_{\mathrm{eff}}$

$$D_{\mathrm{eff}} = \Big( \sum_j v_j \, d_j^{\,1/n} \Big)^{n} \qquad (12)$$

Another possibility entails using the gEUD concept previously introduced and obtaining NTCP in the Niemierko model as: (32,34)

$$\mathrm{NTCP} = \frac{1}{1 + \left( TD_{50} / \mathrm{gEUD} \right)^{4\gamma_{50}}} \qquad (13)$$
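The two NTCP formalisms described above can be sketched as follows, assuming the usual LKB form (probit function of t = (D_eff − TD50)/(m·TD50), with D_eff = (Σ v_j d_j^(1/n))^n) and the logistic gEUD-based Niemierko form. Function names are illustrative assumptions; the standard normal integral is evaluated with `math.erf`.

```python
import math


def effective_dose(doses, volumes, n):
    """D_eff from a differential DVH; volumes are normalized internally."""
    total = sum(volumes)
    s = sum((v / total) * d ** (1.0 / n) for d, v in zip(doses, volumes))
    return s ** n


def ntcp_lkb(doses, volumes, td50, m, n):
    """LKB NTCP: standard normal CDF evaluated at t."""
    t = (effective_dose(doses, volumes, n) - td50) / (m * td50)
    # Phi(t) = 0.5 * (1 + erf(t / sqrt(2)))
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def ntcp_niemierko(geud, td50, gamma50):
    """Logistic gEUD-based NTCP; equals 0.5 when gEUD == TD50."""
    return 1.0 / (1.0 + (td50 / geud) ** (4.0 * gamma50))
```

Both functions return 0.5 when the organ receives a uniform dose equal to TD50, which is a convenient sanity check for any implementation.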

D. Prostate treatments
Prostate treatments were planned and delivered with a single arc. The VMAT technique was applied to all prostate cases treated with external beam therapy: treatment for the usual risk groups (high-, intermediate-, and low-risk) and radiotherapy after prostatectomy. Prostate gland, seminal vesicles, and pelvic lymph nodes were treated in high-risk patients; prostate gland and seminal vesicles were the targets for intermediate-risk cases; and prostate volume was the single target for low-risk staging. A simultaneous integrated boost (SIB) technique was used with two or three target volumes. A moderate hypofractionation scheme was applied for these three risk groups. The prescription doses were 70 Gy to prostate gland, 56 Gy to seminal vesicles, and 50.4 Gy to pelvic lymph nodes, delivered in 28 fractions. For prostate bed treatments after prostatectomy, the prescription dose was 74 Gy delivered in 37 fractions. Contoured organs at risk (OARs) were usually rectum, bladder, and femoral heads. For each risk group, 25 prostate cases were analyzed, together with 25 cases of prostate bed treatments, for a total of 100 analyzed treatments.

E. Verifications of biological metrics for both systems with TG-166 benchmark phantom and test cases
In order to test the capabilities of both systems for radiobiological calculations, tests taken from the AAPM TG-166 report (21) were performed.

E.1 Benchmark phantom test
The benchmark phantom consists of a large cubical phantom with four simple structures (three rectangular, one triangular) created inside the phantom (21) (Fig. 1). A single 6 MV, 100 cm source-to-surface distance, 20 × 20 cm² photon beam was calculated in the TPS, with a prescribed dose of 72 Gy in 40 fractions to a point at 6 cm depth along the central axis. The dose delivered to the benchmark phantom structures was determined by Mobius3D (M3D) and by the COMPASS calculation (CC) and reconstruction (CR) modules; then, gEUD, TCP, and NTCP were calculated from the previous models and compared among TPS, M3D, CC, and CR. In-house software was used to perform these calculations. The software reads the DVH information from the different systems and applies Eqs. (1), (4), (5), (6), (7), (8), (10), (11), (12), and (13) in order to obtain gEUD, TCP, and NTCP values.
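The radiobiological expressions operate on differential dose bins (d_j, v_j), whereas DVHs are frequently exported in cumulative form. As a hedged illustration of this preprocessing step (the export format and internals of the in-house software are not described in this work, so the function below is hypothetical), a cumulative DVH can be converted to fractional differential bins as follows:

```python
def cumulative_to_differential(doses, cum_volumes):
    """Convert a cumulative DVH to differential bins (illustrative sketch).

    doses       -- increasing dose points in Gy
    cum_volumes -- volume receiving at least each dose (same length)
    Returns (bin_doses, bin_volumes): bin midpoint doses and the
    fractional volume in each bin, normalized to the first point.
    """
    if len(doses) != len(cum_volumes) or len(doses) < 2:
        raise ValueError("need two equally long lists with >= 2 points")
    total = cum_volumes[0]
    if total <= 0:
        raise ValueError("first cumulative volume must be positive")
    bin_doses, bin_volumes = [], []
    for i in range(len(doses) - 1):
        # Volume receiving a dose between doses[i] and doses[i + 1]
        bin_volumes.append((cum_volumes[i] - cum_volumes[i + 1]) / total)
        bin_doses.append(0.5 * (doses[i] + doses[i + 1]))
    # Volume still above the last dose point is assigned to that dose
    tail = cum_volumes[-1] / total
    if tail > 0:
        bin_doses.append(doses[-1])
        bin_volumes.append(tail)
    return bin_doses, bin_volumes
```

Using bin midpoints is a simplifying assumption; with fine dose bins the choice has little effect on gEUD, TCP, or NTCP.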

E.2 Representative test cases
Treatment plans for three representative test cases (head and neck [H&N], prostate, and brain) were also calculated in the TPS according to the volumes and dose prescriptions defined in the report. gEUD, TCP, and NTCP values were determined for TPS plans and compared to those extracted by M3D, CC, and CR for the same treatments.

Fig. 1. Benchmark phantom test case from the AAPM TG-166 report, (21) with four simple structures (three rectangular, one triangular) and the edge of a single 6 MV, 100 cm source-to-surface distance, 20 × 20 cm² photon beam.

F. Evaluation of prostate treatments with classical (gamma) and DVH-based dose-volume metrics
In order to assess correct performance in 3D dose calculation and reconstruction processes, traditional gamma tests and DVH-based comparisons were carried out for both systems. For the classical metrics, comparisons with the TPS by means of global gamma passing rates for all structures were reported with two criteria (2%/2 mm and 3%/3 mm, global normalization to maximum, with a low-dose threshold at 10% of global maximum). In addition, 3D dose evaluation was performed by comparing all prostate plans generated by the TPS and those determined by M3D, CC, and CR. Representative dosimetric parameters were obtained from DVHs. ICRU recommendations for recording and reporting IMRT treatments (20) were used to extract evaluation parameters for PTVs (D98, D2, D50, Dmean). Maximum and mean doses were extracted for OARs. For normal tissue, depending on the case, classical (47) and recently reviewed dose constraints (QUANTEC) (48) were also reported.
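To make the gamma criteria above concrete, the following is a simplified one-dimensional sketch of a global gamma passing rate with a low-dose threshold. It is a brute-force search over sampled points without interpolation, written only for illustration; the 3D implementations in Mobius3D and COMPASS are not described in this work and are certainly more elaborate. All names and default values are assumptions.

```python
def gamma_passing_rate(positions, reference, evaluated,
                       dose_tol=0.03, dist_tol=3.0, threshold=0.10):
    """Global 1D gamma passing rate in percent (illustrative sketch).

    positions -- sample positions in mm (shared by both profiles)
    reference -- reference doses (e.g., TPS)
    evaluated -- doses from the system under test
    dose_tol  -- dose criterion as a fraction of the reference maximum
    dist_tol  -- distance-to-agreement criterion in mm
    threshold -- reference points below this fraction of the maximum
                 are excluded from the analysis
    """
    d_max = max(reference)
    dose_crit = dose_tol * d_max  # global normalization to maximum
    passed = 0
    total = 0
    for x_r, d_r in zip(positions, reference):
        if d_r < threshold * d_max:
            continue  # low-dose point, not evaluated
        total += 1
        # Minimum squared gamma over all evaluated points
        gamma_sq = min(
            ((x_e - x_r) / dist_tol) ** 2 + ((d_e - d_r) / dose_crit) ** 2
            for x_e, d_e in zip(positions, evaluated)
        )
        if gamma_sq <= 1.0:
            passed += 1
    return 100.0 * passed / total if total else float("nan")
```

Passing `dose_tol=0.02, dist_tol=2.0` corresponds to the 2%/2 mm criterion and the defaults to 3%/3 mm.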
G. Introducing biological metrics in patient-specific QA for prostate treatments
gEUD, TCP, and NTCP were extracted from TPS and compared to those values obtained with M3D, CC, and CR for all the models discussed above. These calculations were performed with the same in-house software described previously. In order to support the introduction of radiobiological metrics, the correlation between biological indices and differences in DVH parameters was studied.

H. Radiobiological parameters
Parameters used for tumor calculations (gEUD and TCP) for TG-166 test cases and prostate treatments are summarized in Table 1. Values for the "a" parameter were extracted from the TG-166 report (21) (the value for the benchmark phantom case was also taken as −10). The selected α value was 0.1 Gy⁻¹. (49) Selected α/β values were 10 Gy for TG-166 cases, with the exception of the prostate case, where 3 Gy was selected, (45) therefore taking values recommended in the report. For analyzed prostate treatments, α/β was 1.5 Gy, which is customary in our institution. (50)(51)(52)(53) D50 and γ50 values for the benchmark phantom case were taken from the TG-166 report. Values for the remaining TG-166 cases were also extracted from the study by Okunieff et al. (45) For [...] (54,55) For prostate bed treatments, parameters were taken from the study by King et al., (56) using the relationship between absolute and relative slope at D50. (45) In addition, TCP values for the prostate cases under analysis were calculated with the values reported by Okunieff et al. (45) and Levegrun et al. (57)

Table 1. Selected radiobiological parameters for tumor calculations (gEUD and TCP). Values were obtained from AAPM TG-166 report, Cheung, (54,55) King, (56) Okunieff, (45) and Levegrun (57)

The parameters used for normal tissue calculations (gEUD and NTCP) for TG-166 test cases and prostate treatments are summarized in Table 2. Values for the "a" parameter were extracted from the TG-166 report. (21) The selected "a" values for the benchmark phantom case were taken as 12, 1, 4, and 2 for the PTV Rectangle, Rectangle 1, Rectangle 2, and Triangle 1 structures, respectively. These values were selected in order to fit the results from both LKB and Niemierko models (the "a" parameter was not reported for the Niemierko model for NTCP calculations in the TG-166 report). Selected α/β values were 3 Gy for all the cases. Values for the TD50, γ50, m, and n parameters were given in the TG-166 report, taken from the study by Burman et al. (58) Additional γ50 parameters were taken from studies by Stavrev et al. (59) (cord and mandible), Huang et al. (60) (inner ear), and Lee et al. (61) (parotid gland). The γ50 parameter for the pubic bone was taken as 4, similar to the value for the femoral head. As in the Niemierko model case, m and n parameters from LKB were not reported in the AAPM report; standard values for bone (m = 0.12 and n = 0.25) were taken in order to perform the calculations.

I. Statistical analysis
Results were described as mean ± standard deviation (SD). Data were compared using a paired, two-tailed Student's t-test. Differences were considered statistically significant for p-values < 0.05. Possible correlation between variables was studied by means of Pearson's correlation coefficient (r).

Table 2. Selected radiobiological parameters for normal tissue calculations (gEUD and NTCP). Values were obtained from AAPM TG-166 report and the studies by Burman et al., (58) Stavrev et al., (59) Huang et al., (60) and Lee et al. (61)
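The statistical quantities named in this section (paired t statistic and Pearson's r) can be sketched with the Python standard library alone. This is an illustrative sketch, not the analysis software used in this work; it stops at the t statistic and degrees of freedom, since the two-tailed p-value additionally requires the Student's t distribution, which would normally be taken from a statistical package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with >= 2 values")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def pearson_r(x, y):
    """Pearson's correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In the context of this study, x and y would be, for example, the per-patient deviations from the TPS obtained with two of the systems (paired test), or the per-patient changes in a DVH parameter and in a radiobiological index (correlation).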

A. Verifications of biological metrics for both systems with TG-166 benchmark phantom and test cases
Differences and comparisons between gEUD, TCP, and NTCP calculations from TPS and those obtained from Mobius3D and COMPASS for TG-166 test cases are shown in Tables 3, 4, and 5, respectively. DVHs from previous cases are plotted in Figs. 2 and 3. Maximum deviations for gEUD were found for left and right inner ear in M3D calculations (8.8% and 9.3%, respectively). CR results were better than M3D and CC results for gEUD discrepancies. The maximum difference for TCP evaluations was found for M3D calculations using the Poisson model (-1.3%). M3D and CR results were better than those from CC. For NTCP, the maximum discrepancy was found in the COMPASS reconstructed dose of Triangle 1 structure from benchmark phantom test (5.3%). There were no statistically significant differences for NTCP comparisons.

B. Evaluation of prostate treatments with classical (gamma) and DVH-based dose-volume metrics
Mean global gamma passing rates for all structures are shown in Table 6. Considering mean values for all structures, M3D passing rates were worse than those from COMPASS in all cases, with the exception of values for 3%/3 mm criterion in low-risk treatments (mean passing rates for all structures were 99.8% for both M3D and CR results).
TPS mean values of the dosimetric parameters analyzed for each prostate case are shown in Table 7. Mean differences and comparisons between these values and results for the same parameters determined by M3D, CC, and CR are also presented in this table. Maximum discrepancies for PTVs were found in COMPASS reconstructed low-risk and prostate bed patients (2.4% ± 0.8%). The largest differences observed for rectum were found for COMPASS reconstructed mean dose in bed patients (6.2% ± 1.3%). For bladder, the maximum differences were found for the Mobius3D calculated mean dose in bed treatments (−3.3% ± 1.6%). The worst mean differences for femoral heads were found for CR results for the right head in low-risk patients (3.1% ± 1.3%). CR results were worse (p < 0.05) than M3D and CC results in a large number of cases. Statistically significant differences for comparisons between M3D and CC were also observed. Deviations were better for M3D than CC in some cases, and vice versa. For all the parameters, mean values were 0.5% ± 1.9%, 0.4% ± 1.0%, and 1.1% ± 2.6% for M3D, CC, and CR, respectively. Considering all parameters, M3D results were better than CR results, and CC values were better than those obtained by M3D and CR.

[...] all femoral head values. In addition, M3D results were better than CR ones for rectum values in intermediate-risk treatments. Finally, CR results were better than M3D results for prostate gland values in low-risk treatments and also better than CC results for prostate gland values in prostate bed treatments. The remaining differences were not statistically significant.

TCP comparisons between the TPS, M3D, and COMPASS for all the previously described models are shown in Table 9. The maximum mean difference was found for high-risk COMPASS reconstructed values using the Poisson model and taking the values for radiobiological parameters from the study by Cheung et al. (54) (−2.4% ± 1.1%).
CC results were better than CR results for all high-risk treatments (except those obtained using the sigmoidal model and taking the values from the study by Okunieff et al. (45)) and prostate bed values. CC and CR results were better than M3D ones for all low-risk values. In addition, CC results were better than M3D results using the Niemierko model with any set of parameters and using the sigmoidal model with the parameters by Levegrun et al. (57) The remaining differences for TCP comparisons were not statistically significant.
NTCP comparisons between the TPS and the two systems are shown in Table 10. The worst mean difference was found for COMPASS reconstructed rectum values in low-risk treatments (1.5% ± 0.5%). Discrepancies for bladder and femoral heads were smaller than 0.1% and 0.01%, respectively, in all cases. CC results were better than both CR and M3D results for rectum values in all cases, for bladder values in high-risk and prostate bed treatments and also for intermediate-risk treatments using the Niemierko model. In addition, CC results were better [...]

The correlation between the changes in DVH parameters and the corresponding changes in radiobiological outcomes is summarized in Tables 11 and 12 for targets and normal tissues, respectively. Selected DVH parameters for prostate volumes were D98, D2, and D50. Correlation ranged from moderate (r > 0.500) to strong (r > 0.700) for gEUD comparisons, although CC results seemed to have a weaker correlation (r < 0.500). For OARs, selected DVH parameters were V70 and mean dose for rectum and bladder and maximum dose for femoral heads. For gEUD, moderate to strong correlation was observed, although some results exhibited a weaker correlation (CC and CR results for rectum and bladder). NTCP results seemed to have stronger correlation for the LKB model than the Niemierko model for rectum. Correlation was weaker for mean dose compared to V70 values for bladder. Low-risk cases exhibited a stronger correlation for bladder NTCP results compared with the other cases. Femoral heads showed no correlation for either LKB or Niemierko calculations.

Table 11. Pearson correlation coefficient comparing gEUD and TCP for prostate target volumes. The selected DVH metrics were D98, D2, and D50.

A.1 Variability between radiobiological response parameters and models
Radiobiological models are powerful evaluation tools because values such as TCP and NTCP are related to the clinical outcome of a treatment. Therefore, biological-based evaluation becomes an interesting metric in order to evaluate a treatment plan. However, all biological models have uncertainties in the values of the parameters chosen, and these metrics should be used with caution because of these uncertainties. The use of biological models in plan evaluation requires accurate TCP/NTCP models and parameter estimation. The users of biological metrics could derive model parameters based on their own experience by calibrating selected models against observed clinical outcomes. Another option is to cautiously use published parameter values, as these data are available for many tumor and normal tissue sites. These known ideas have been expressed in reference reports, such as the AAPM TG-166 or ICRU 83 reports. (20,21) The first option was not feasible for our institution, as it requires expertise in outcome modeling and sufficient patient throughput. In order to limit these possible uncertainties, published parameters taken from reference reports and studies were used, as in the TG-166 report, (21) the study by Okunieff et al., (45) or the study by Burman et al. (58) The possible variation due to the model selection was addressed by using three models for TCP calculations (Poisson, sigmoidal, and Niemierko) and two models for NTCP calculations (LKB and Niemierko). In both cases, results between the models were comparable. A limitation of the present study is related to the TCP calculation using Eq. (3), where the dose protraction factor has not been considered in order to model the TCP. The dose protraction factor (G) modifies the quadratic term of the linear-quadratic expression in order to take into account the sublethal damage repair that occurs when the dose delivery is protracted. If the delivery takes a short time (instantaneous), G = 1. For any other dose delivery pattern, G < 1. This study was performed considering G ~ 1 since treatment times are about a few minutes, as in prostate treatments using VMAT techniques.
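The assumption G ~ 1 can be checked with a short numerical sketch. It assumes the Lea-Catcheside expression for a constant dose rate delivered over a time T with mono-exponential repair, G = 2(μT − 1 + e^(−μT))/(μT)², where μ = ln 2 / T_half. Neither this expression nor a repair half-time is given in the present work; both are assumptions introduced here for illustration, and a VMAT arc is only approximately a constant dose-rate delivery.

```python
import math


def protraction_factor(delivery_time_min, repair_half_time_min):
    """Lea-Catcheside G for constant dose rate (illustrative sketch)."""
    if delivery_time_min < 0 or repair_half_time_min <= 0:
        raise ValueError("times must be non-negative / positive")
    mu = math.log(2.0) / repair_half_time_min  # repair rate constant
    x = mu * delivery_time_min
    if x < 1e-6:
        # First-order series expansion avoids cancellation for tiny x
        return 1.0 - x / 3.0
    return 2.0 * (x - 1.0 + math.exp(-x)) / (x * x)
```

For delivery times of a few minutes and repair half-times of the order of an hour (a hypothetical value), the factor stays within a few percent of 1, which is consistent with the approximation adopted in the text.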

A.2 Use of AAPM TG-166 test cases
The aim of the present study was to evaluate the implementation of radiobiological metrics in the patient-specific QA workflow. TG-166 test cases have been designed to assess the biological modeling implemented in several commercial TPSs. In particular, benchmark phantom structures have been specifically developed to address this problem. In addition, the dose calculation and reconstruction capabilities of the systems should be tested with clinical cases. The other TG-166 cases (H&N, prostate, and brain) have been intentionally used for this purpose.

A.3 Relevance of radiobiological metrics
The main aim of patient-specific QA for modulated treatments is to ensure the quality of each individual patient treatment. The pretreatment QA measurement-based process must be considered to ensure the correct information flow from TPS plan calculation to treatment delivery in the linac by means of the record and verify (R&V) system. Such patient-specific QA is conventionally performed by delivering the patient plans to a phantom with detectors, and comparing the calculated and measured dose in the phantom. Recent studies have investigated estimating the delivered patient dose from QA measurement, resulting in several new QA tools. These new approaches to patient-specific QA open the possibility of adopting patient dose-based metrics that are more relevant to the expected treatment outcome. Therefore, new QA metrics should be introduced in order to effectively take into account the clinical impact of possible calculation/delivery mistakes on the treatment outcome for modulated treatments. Considering radiobiological parameters as potential indicators of this clinical outcome for radiation therapy treatments, such parameters could be included in these metrics. This approach goes beyond the usual patient-specific QA flow, which is based on dose distribution (2D or 3D) comparisons. In this way, patient-specific QA metrics could directly evaluate the variation of expected clinical outcome between the planned dose and delivered dose.
In addition to the previous discussion, radiobiological metrics could be used to evaluate the robustness of a radiotherapy treatment from the point of view of its sensitivity to possible perturbations. From the sigmoid shape of TCP and NTCP curves, the high region for TCP and the low region for NTCP are the ideal regions for these indices, because results are less sensitive to possible changes. Determining the robustness of a plan could be used to prevent the influence of different possible sources of error and could lead to improvements in the quality of the plan before treatment.

A.4 Limitation in the definition of action levels
Any QA metric should be sensitive enough to show the impact of possible errors on the patient treatment. Action levels for classical metrics, such as gamma analysis, have been widely studied. However, gamma passing rates may fail to detect clinically relevant patient dose errors. (16,17) TCP and NTCP models have been introduced in order to take into account the treatment outcome. In this way, these metrics make it possible to concentrate on the errors that are of clinical importance. However, the impact of an error source on the clinical outcome depends on the location/characteristics of the PTV and the possible OARs. As an example, the traditional action levels for gamma analysis (3% of dose difference) could be applied to evaluate the observed difference in a serial normal tissue, such as the spinal cord. If the maximum dose to the spinal cord is, for example, 5 Gy, an error of 3%, 5%, or even 10% will probably have no impact on the NTCP for this tissue. When the NTCP slope starts to be steep, the possible impact on NTCP becomes increasingly important. In order to implement new patient-specific QA protocols, action levels for new metrics should be defined. Nevertheless, the definition of action levels is beyond the scope of the present study. Future application of these metrics to a large number of modulated treatments/disease sites/PTVs/OARs could provide enough information and statistics to define a correct level for the acceptability/rejection of gEUD/TCP/NTCP variations, depending on the case.
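The spinal cord example above can be illustrated numerically. The sketch assumes the logistic gEUD-based NTCP form and, for simplicity, treats the maximum dose of a serial organ as its gEUD. The values TD50 = 66.5 Gy and γ50 = 4 in the usage note are illustrative assumptions only and are not claimed to be the Table 2 entries.

```python
def ntcp_logistic(dose, td50, gamma50):
    """Logistic NTCP for an equivalent uniform dose (Gy)."""
    return 1.0 / (1.0 + (td50 / dose) ** (4.0 * gamma50))


def ntcp_change(dose, relative_error, td50, gamma50):
    """Absolute NTCP change caused by a relative dose error."""
    perturbed = dose * (1.0 + relative_error)
    return (ntcp_logistic(perturbed, td50, gamma50)
            - ntcp_logistic(dose, td50, gamma50))
```

With these assumed parameters, `ntcp_change(5.0, 0.10, 66.5, 4.0)` is negligible, whereas `ntcp_change(60.0, 0.03, 66.5, 4.0)` amounts to several percentage points, because the latter dose lies on the steep part of the sigmoid. The same 3% dose error therefore has a very different clinical meaning depending on where the organ sits on its dose-response curve.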

B. Verifications of biological metrics for both systems with TG-166
Results from the TPS and the two systems were in good agreement for gEUD calculations. Maximum discrepancies were found for small-volume structures, like the inner ear, where small dose differences can lead to large relative deviations. For TCP and NTCP comparisons, discrepancies were smaller than those for gEUD. The worst results were found in the benchmark phantom test case.

C. Evaluation of prostate treatments with classical (gamma) and DVH-based dose-volume metrics
COMPASS-calculated and reconstructed passing rates were slightly better than those from Mobius3D. Results were above the TG-119 action level of 88% for composite dose gamma analysis, (62) excluding some results with the 2%/2 mm gamma criterion: COMPASS calculated mean values for high-risk pelvic lymph node PTV, COMPASS reconstructed mean values for high-risk prostate PTV, and some Mobius3D results (high-risk prostate and pelvic lymph node PTVs, prostate bed PTV, rectum for the bed case, and bladder for the high- and low-risk and bed cases).
DVH comparisons were comparable to results from other studies. (63)(64)(65) Both systems led to similar results for dose-volume parameters. CC results were better than M3D values, and M3D results were better than CR values.

D. Biological metrics applied to prostate treatments
Radiobiological calculations led to comparable discrepancies for the three systems. The statistically significant differences were shown in the previous section. For gEUD values, differences were larger for OARs than for target volumes. Considering the TCP calculations, absolute TCP values calculated for the TPS with the Poisson model were, in general, slightly larger than those from the sigmoidal and Niemierko models. These absolute values were comparable between the models, with the exception of the results for the high-risk prostate case taking the values from Cheung et al. (54) (sigmoidal and Niemierko values were very close, but they were about 20% lower than Poisson results), and the prostate bed case taking the values from King et al. (56) (sigmoidal and Niemierko values were also very close but about 15% lower than Poisson results) and Levegrun et al. (57) (Poisson results were about 10%-15% higher than those from sigmoidal and Niemierko values.) The observed discrepancies in absolute TCP values come from the model/parameter selection and they were also derived for M3D, CC, and CR results, preserving the low differences obtained in the comparisons. For NTCP calculations, absolute values were comparable between both the LKB and Niemierko models and also preserved the low discrepancies in the comparisons.
Correlation results between DVH and gEUD/TCP/NTCP differences were analyzed in the previous section. Correlation between D50 and TCP was stronger than that obtained for D98 and D2. TCP increase/decrease was highly correlated with D50, which is related to the mean dose. Correlation between NTCP and dose-volume parameters for OARs was larger for rectum than for bladder or femoral head values. These results could be explained using ideas from the study by Zhen et al., (66) based on the discussion in Discussion section A.3 above. If TCP/NTCP values are located in low-gradient regions of the sigmoid curve (high TCP or low NTCP), the clinical outcome is less sensitive to changes in the plan, like dose-volume discrepancies. NTCP values for bladder and femoral heads are low and extremely low, respectively (Fig. 4). This could explain the lack of correlation between changes in DVH parameters and NTCP results in these cases.

V. CONCLUSIONS
This study has explored the potential usefulness of biological metrics in the patient-specific QA process. Initial evaluation of radiobiological data extracted from dose calculation and reconstruction performed by the Mobius3D and COMPASS systems was carried out by means of TG-166 test cases. The capabilities of the systems for 3D dose calculations and reconstructions were assessed with classical metrics, obtaining comparable results between both systems. Radiobiological metrics expressed in terms of comparisons between different indices (gEUD, TCP, and NTCP) were applied to a paradigmatic case of VMAT delivery (prostate treatment). The possibility of using radiobiological calculations as alternative metrics was introduced in order to evaluate the expected clinical outcome of radiotherapy treatments in the scope of pretreatment patient-specific QA.

Fig. 4. NTCP values for rectum, bladder, and femoral heads calculated by the TPS for the high-, intermediate-, and low-risk and prostate bed cases. NTCP values are located in low-gradient regions of the sigmoid curve; therefore, the clinical outcome is less sensitive to changes in DVH parameters.