On correlations in IMRT planning aims

The purpose was to study correlations amongst IMRT DVH evaluation points and how their relaxation impacts the overall plan. 100 head‐and‐neck cancer cases, using the Eclipse treatment planning system with the same protocol, are statistically analyzed for PTV, brainstem, and spinal cord. To measure variations amongst the plans, we use (i) interquartile range (IQR) of volume as a function of dose, (ii) interquartile range of dose as a function of volume, and (iii) dose falloff. To determine correlations for institutional and ICRU goals, conditional probabilities and medians are computed. We observe that most plans exceed the median PTV dose (average D50 = 104% prescribed dose). Furthermore, satisfying D50 reduced the probability of also satisfying D98, constituting a negative correlation of these goals. On the other hand, satisfying D50 increased the probability of satisfying D2, suggesting a positive correlation. A positive correlation is also observed between the PTV V105 and V110. Similarly, a positive correlation between the brainstem V45 and V50 is measured by an increase in the conditional median of V45, when V50 is violated. Despite the imposed institutional and international recommendations, significant variations amongst DVH points can occur. Even though DVH aims are evaluated independently, sizable correlations amongst them are possible, indicating that some goals cannot be satisfied concurrently, calling for unbiased plan criteria. PACS number(s): 87.55.dk, 87.53.Bn, 87.55.Qr, 87.55.de.

are typically met, the significant personal preferences in plans raise concerns for effective and reproducible methods for comparing treatment plans among patients and institutions.
In general, such preferences may be introduced from a variety of sources, including delineation of anatomical structures, treatment planning system (TPS) variability, and institutional guidelines, but most importantly the treatment planner's preferences. Accurately delineating tumors and organs is heavily dependent on both the quality of CT images and the experience of clinicians. Cazzaniga et al. (2) compared structure delineation for three patients by six radiologists and observed standard deviations ranging from 0.35 to 2.64 cm² of overlapping area. This reveals the need for more detailed constraints in addition to clinical experience in order to restrict such preferences. Fogliata et al. (3) observed tendencies introduced through the choice of TPS by comparing eight TPSs on four patients using scoring indices. They reported varying scores for organs at risk (OAR) and target depending on the employed TPS. Eriguchi et al. (4) observed variations due to institutional protocols by comparing treatment plans executed at multiple institutions using the same TPS on the same patient scans. They reported a D 50 to PTV that ranged from 43.6 to 51.2 Gy, and the volume that received 20 Gy or higher, V 20 , ranged from 12.2% to 18.9% over the four institutions. D x is the minimum dose to the hottest x% of the volume and V x is the fractional volume that receives at least x% of the prescription dose.
In principle, clinical dose and volume constraints and guidelines are used to overcome the aforementioned sources of variations, as well as to assess the quality of plans. (5,6) However, despite rigorously imposing these constraints, Das et al. (7) observed substantial differences in the DVH shapes when varying patients, planners, and TPS. An additional factor is the allocation of time to generate a plan, as typically very little time is devoted for optimization. (8) Dosimetric uncertainties can also alter the quality of plans. (9) The DVH constraints are often imposed through specific dose-volume points as a comparison metric, such as D 2 , D 50 , D 95 , and D 98 . They may also directly serve as DVH constraints in plan optimization, as suggested by Cambria et al. (5) In order to adhere to these constraints, treatment plans are often iteratively adjusted by relaxing one or more constraints to achieve the desired result. This iterative process is heavily dependent on the planner's skill and experience, as demonstrated by Nelms et al., (10) who suggested the need for best-practices criteria as part of a continuous improvement strategy to reduce significant plan variations. Subjective preferences in trading off DVH constraints during the planning are reflected in the final plan's DVH control points and their correlations.
In this work, we study the dependence amongst these DVH control points. Since we cannot assume the DVH variations amongst cases to follow a Gaussian distribution, nonparametric statistical tools are employed in two main steps:
1. Variations amongst plans, measured by median and interquartile range.
2. Correlations between planning aims, computed via conditional probability and median.
We observe significant variations surrounding the DVH constraints even though most of the clinical goals were satisfied. It is shown that the current clinical DVH constraints require prioritizing amongst them, which limits the reproducibility and standardized plan quality. Furthermore, we demonstrate trends in increased dose delivery, suggesting dissension in evaluating treatment plans.

II. MATERIALS AND METHODS
One hundred head-and-neck cancer cases from Indiana University, Melvin and Bren Simon Cancer Center are selected based on their anatomical comparability. The corresponding treatments for all cases are planned using the same TPS (Eclipse, Varian Medical Systems, Palo Alto, CA), following identical institutional dosimetric criteria. Besides anatomical differences, possible variations stem from plans that were produced by different planners. Based on clinical advice, brainstem and spinal cord are identified as the highest priority OAR to be spared. For brevity, we shall refer to spinal cord as spine. In many cases when the lesion was at or adjacent to the skin, the patient was scanned with bolus. Consequently, the PTV is delineated to be within 2-3 mm of the skin surface. (11) For the cases with multiple PTVs, the one with the largest prescribed dose is identified as the primary target to be irradiated. We analyze all targets and report the primaries as the main PTV for this study to maximize the contrast between the competing objectives of tumor irradiation and OAR sparing.
These plans were optimized based on dosimetric constraints for PTV of D 95 ≥ 95% and D 2 ≤ 107% of the prescribed dose, for spine D max ≤ 45 Gy, and for brainstem D max ≤ 54 Gy, following the institutional guidelines. All DVHs are collected from the TPS with a resolution of 0.1 cGy dose. Furthermore, the DVHs for PTV are collected using relative volume and dose points, allowing effective comparisons. Note that the median absolute prescribed dose over all patients is 70 Gy, with an interquartile range of 10.3 Gy. Moreover, in order to compare specific dose-volume points, DVH values are linearly extrapolated by binning remaining doses with 0% volume.
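To compare specific dose-volume points across plans, each DVH has to be evaluated on a common dose grid. The following is a minimal sketch of one way to do this, assuming NumPy and linear interpolation; the function name `resample_dvh` and the sample values are hypothetical and are not taken from the TPS export. Doses beyond the last sampled point of a plan are assigned 0% volume, in the spirit of the binning described above.

```python
import numpy as np

def resample_dvh(dose, volume, grid):
    """Linearly interpolate a cumulative DVH onto a common dose grid.

    dose   : increasing dose points of one plan
    volume : cumulative volume (%) at those dose points (nonincreasing)
    grid   : common dose grid shared by all plans
    Doses above the last sampled dose get 0% volume; doses below the
    first sampled dose get the first volume value.
    """
    return np.interp(grid, dose, volume, left=volume[0], right=0.0)

# Hypothetical plan sampled at four dose points (% of prescribed dose)
dose = np.array([0.0, 90.0, 100.0, 110.0])
volume = np.array([100.0, 100.0, 60.0, 0.0])
grid = np.arange(0.0, 120.1, 5.0)  # 0, 5, ..., 120
resampled = resample_dvh(dose, volume, grid)
```

With all plans resampled onto the same grid, the DVHs can be stacked into a single array of shape (plans, dose points) for the estimators that follow.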
The quantitative analysis is divided into two parts: variations amongst plans and correlations between DVH control points. For the first part, we use three statistical metrics that are well suited for such nonsymmetrically distributed data, namely the interquartile range of dose for changing volume, interquartile range of volume for changing dose, and the approximate DVH gradient to study variation in the falloff. For the second part, we use conditional measures for institutional and for internationally recommended DVH goals. Note that the methods and the results section will follow this two-part structure. The details of our analytical method are as follows.

A. Variations in plans
The interquartile range is an estimator of spread, computed as the difference of the upper and the lower quartile (Q 3 − Q 1 ) of a distribution. It is inherently robust to changes in scale and, hence, uninfluenced by outliers. (12) Robustness is important, because an otherwise optimal solution (statistical estimator) may become suboptimal and unreliable when uncertainties occur. (13,14) To study the spectrum of relevant variations amongst plans, we employ: (i) the IQR of volume as a function of dose, (ii) the IQR of dose as a function of volume, and (iii) the median DVH falloff and its IQR. The details are as follows.

A.1 Interquartile range of volume
IQR vol (D) is a statistical estimator of the V x distribution at a dose value D and directly quantifies the spread. In this study, IQR vol (D) is recorded over the entire dose range in increments of 0.1% relative dose (tumor) or 0.1 Gy (OAR).
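A minimal sketch of IQR vol (D), assuming NumPy and DVHs that are already sampled on a common dose grid. The function name `iqr_vol` and the sample data are hypothetical. Note that `np.percentile` uses linear interpolation between order statistics by default, which may differ from the quartile convention used in this study.

```python
import numpy as np

def iqr_vol(volumes):
    """IQR of volume across plans at each dose grid point.

    volumes : array of shape (n_plans, n_dose_points), cumulative volume in %
    returns : array of shape (n_dose_points,), Q3 - Q1 per dose point
    """
    q1, q3 = np.percentile(volumes, [25, 75], axis=0)
    return q3 - q1

# Hypothetical volumes (%) of five plans at three dose points
volumes = np.array([
    [100.0, 80.0, 0.0],
    [100.0, 60.0, 0.0],
    [100.0, 40.0, 0.0],
    [100.0, 20.0, 0.0],
    [100.0, 50.0, 0.0],
])
spread = iqr_vol(volumes)
```

In this toy example the plans agree at the first and last dose points (zero spread) and disagree only in the falloff region, which is the qualitative pattern the estimator is meant to expose.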

A.2 Interquartile range of dose
IQR dose (V) measures the spread of D x distribution and is recorded over the entire volume range at V = 0.1% increments. Note that IQR dose (V) and IQR vol (D) are estimators of two different and orthogonal distributions and, hence, independent of each other.
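A minimal sketch of IQR dose (V), assuming NumPy and a common dose grid. Here D x is approximated by the highest grid dose that is still received by at least x% of the volume, so its accuracy is limited by the grid spacing. The function names and sample data are hypothetical.

```python
import numpy as np

def dose_at_volume(dose, volume, x):
    """D_x: highest grid dose still received by at least x% of the volume.

    Assumes the DVH starts at 100% volume, so the selection is never empty.
    """
    return dose[volume >= x].max()

def iqr_dose(dose, volumes, x):
    """IQR across plans of D_x; volumes has shape (n_plans, n_dose_points)."""
    d_x = np.array([dose_at_volume(dose, v, x) for v in volumes])
    q1, q3 = np.percentile(d_x, [25, 75])
    return q3 - q1

# Hypothetical PTV DVHs (dose in % of prescription, volume in %)
dose = np.array([0.0, 95.0, 100.0, 105.0, 110.0])
volumes = np.array([
    [100.0, 100.0, 90.0, 40.0, 0.0],  # D_50 = 100
    [100.0, 100.0, 95.0, 60.0, 0.0],  # D_50 = 105
    [100.0, 98.0, 45.0, 10.0, 0.0],   # D_50 = 95
    [100.0, 100.0, 99.0, 70.0, 0.0],  # D_50 = 105
    [100.0, 100.0, 80.0, 30.0, 0.0],  # D_50 = 100
])
spread_d50 = iqr_dose(dose, volumes, 50.0)
```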

A.3 Dose falloff
The gradient ∇ serves as a measure for the DVH falloff. It is determined at two neighboring DVH points (D1,V1) and (D2,V2) via the finite difference ∇ = (V2 − V1)/(D2 − D1), where D2 − D1 = 0.1% and V2 and V1 are the corresponding volumes. The schematic setting is illustrated in Fig. 1. The objective of this analysis is to identify whether a rapid falloff or a slow falloff (blue or red DVH in Fig. 1) was preferred for a specific plan. In order to observe such variations and identify trends amongst past decisions, we compute the median DVH falloff, ∇(D), for all patients as a function of dose. Additionally, the interquartile range of the falloff, IQR ∇ , is determined in order to observe potential dissension in decision-making favoring a slow or rather a rapid DVH falloff.
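The finite-difference falloff and its median and IQR across plans can be sketched as follows, assuming NumPy and a common dose grid. The function names and sample data are hypothetical, and a coarse 5% grid is used instead of the 0.1% spacing of the study to keep the example readable.

```python
import numpy as np

def falloff(dose, volumes):
    """Finite-difference DVH gradient (V2 - V1) / (D2 - D1) per plan.

    volumes : array of shape (n_plans, n_dose_points)
    returns : array of shape (n_plans, n_dose_points - 1)
    """
    return np.diff(volumes, axis=1) / np.diff(dose)

def median_and_iqr(grad):
    """Median falloff and its IQR across plans for each dose interval."""
    q1, med, q3 = np.percentile(grad, [25, 50, 75], axis=0)
    return med, q3 - q1

# Hypothetical PTV DVHs of three plans
dose = np.array([95.0, 100.0, 105.0, 110.0])
volumes = np.array([
    [100.0, 90.0, 40.0, 0.0],
    [100.0, 95.0, 60.0, 0.0],
    [100.0, 80.0, 30.0, 0.0],
])
grad = falloff(dose, volumes)
med, spread = median_and_iqr(grad)
```

Because cumulative DVHs are nonincreasing, all gradient values are zero or negative; a more negative median indicates a steeper falloff.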

B. Correlations in plans
The iterative treatment planning process aims to satisfy DVH constraints determined by both institutional and internationally recommended aims. In practice, however, it is often not feasible to satisfy all constraints, which in part justifies the existence of institutional guidelines which, in some cases, deviates from the recommended aims. As a result, treatment planners choose to trade off some constraints. The underlying assumption of this pursuit is that it is admissible to violate one or more of the aims, while still providing an overall satisfactory and clinically acceptable plan. To analyze the dependence amongst the constraints, we determine statistical correlations for PTV, brainstem, and spine constraints.
For each DVH control point i, we divide the set of the plans into those that satisfy the constraint i (subset Y i ) and those that violate it (subset Ȳ i ) with Y i ∩ Ȳ i = ∅. To establish correlations, we use conditioning for two differing control points i and j as
$$P(Y_i \mid Y_j) > P(Y_i \mid \bar{Y}_j) \quad \text{(positive correlation)}, \tag{2}$$
$$P(Y_i \mid Y_j) < P(Y_i \mid \bar{Y}_j) \quad \text{(negative correlation)}, \tag{3}$$
where P(•|•) denotes the conditional probability. In other words, if violating the constraint j reduces the probability of constraint i being met, these two constraints are positively correlated, and negatively correlated otherwise. Here, probability reflects the relative number of plans in the subset. Additionally, we quantify the rate of change by computing conditional medians and derive conclusions analogous to Eqs. (2) and (3). In addition to these imposed goals, we also consider empirical control points to measure the correlations of the imposed goals to their surrounding DVH sections. To this end, when the plans exhibit unanimous agreement at a particular high dose by displaying a marginal spread, defined as IQR vol (D) ≤ 5%, we denote the corresponding (constrained) dose as D con . Conversely, neighboring unconstrained DVH points (within ± 5%) with significant deviation (i.e., IQR vol (D) >> 5%) are denoted as D unc (unconstrained dose). Such constrained and unconstrained points subsequently serve to qualify additional subsets. The analysis is divided for DVH constraints that stem from institutional protocols and those recommended by the ICRU-83. (15) The control points are as follows.

B.1 Institutional aims
The institutional dosimetric constraints for PTV are D 95 ≥ 95% and D 2 ≤ 107% of the prescribed dose, for spine D max ≤ 45 Gy, and for brainstem D max ≤ 54 Gy. These control points are used to establish subsets of satisfying or violating plans, along with their neighboring points. The corresponding D con and D unc are data-specific and will be discussed in the Results section below.
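The conditioning described in section B can be sketched as follows, assuming NumPy and that each plan has already been labeled as satisfying or violating each constraint. The function names and the sample labels are hypothetical. If a conditioning subset is empty, the probability is undefined and the mean evaluates to NaN.

```python
import numpy as np

def conditional_probability(y_i, y_j):
    """P(Y_i | Y_j): fraction of the plans in subset j that are also in subset i."""
    y_i = np.asarray(y_i, dtype=bool)
    y_j = np.asarray(y_j, dtype=bool)
    return y_i[y_j].mean()

def correlation_sign(y_i, y_j):
    """+1 if violating j lowers the probability of meeting i, -1 if it raises it, 0 if equal."""
    y_j = np.asarray(y_j, dtype=bool)
    p_met = conditional_probability(y_i, y_j)
    p_violated = conditional_probability(y_i, ~y_j)
    return int(np.sign(p_met - p_violated))

# Hypothetical labels for eight plans (True = constraint satisfied)
y_j = [True, True, True, True, False, False, False, False]
y_i = [True, True, True, False, True, False, False, False]
p_met = conditional_probability(y_i, y_j)
p_violated = conditional_probability(y_i, np.logical_not(y_j))
sign = correlation_sign(y_i, y_j)
```

In this toy example, three of the four plans that meet constraint j also meet constraint i, but only one of the four plans that violate j does, which corresponds to a positive correlation.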

III. RESULTS
Despite consistent planning protocols and the normalizing nature of DVH, sizable variations amongst the 100 head-and-neck cases are observed for PTV, brainstem, and spine in Figs. 2(a), (b), and (c), respectively. The maximum PTV dose ranges between 104% and 120% of the prescribed dose. While the DVHs for most cases overlap for PTV, we observe a wider spread for brainstem and spine (see Figs. 2(a) to (c)). For brainstem, a rapid decrease is apparent in low-dose areas and a plateau in the mid-volume range for most cases. For spine, we observe an extended plateau in the high-volume range with a decrease at higher dose for most cases. To quantify these variations, we employ the estimators introduced in the Materials and Methods section.

A. Variation estimators
As observed in Figs. 2(a) to (c), the distribution amongst plans is not symmetric. This is expected, given the nonincreasing nature of cumulative DVH. Therefore, IQR is well suited to characterize the spread of the asymmetrical DVH distributions. We consider an IQR > 5% at a DVH goal to indicate a significant variation amongst plans.

A.1 Interquartile range of volume
In clinical practice, volume constraints are often imposed at specific points, to achieve desired dose distributions (e.g., V 95 on tumor). (16) In order to meet these DVH constraints, the weights on other clinical objectives are adjusted and at times completely relaxed. These decisions are case-specific and depend on the discretion of the oncologist and/or planner, leading to variations of V x over the patients. The spread of this distribution can be measured by IQR vol (D) and is illustrated in Fig. 3. Figure 3(a) shows unanimous agreement of PTV's IQR vol (D) until 90% prescribed dose (IQR vol = 0%). The maximum deviation is reached at IQR vol (104) = 38%, and vanishes again at IQR vol (111) = 0%, depicting the DVH falloff region. The observed peak at D = 104% indicates an increased spread at D = 104%, which can stem from relaxed PTV constraints or how the objectives are specified, amongst others. Note that the recommendations of the Radiation Therapy Oncology Group allow for up to 20% of the tumor volume to receive more than 110% prescribed dose, which supports the observed tendency to tumor excess dose. (17) The overall narrower and lower IQR vol (D) of PTV reflects the higher importance assigned to PTV in comparison to brainstem and spine. Figure 3(b) illustrates that the volume variation for brainstem peaks at IQR vol (4) = 51%, followed by a second peak at IQR vol (17) = 43% with a subsequent steady decrease and vanishes at IQR vol (50) = 0%. One possible explanation would be that treatment planners prefer to relax constraints on brainstem at lower dose values, but tend to much tighter control at higher dose values, supported by the observed narrower spread. For spine, Fig. 3(c) reveals an increase of IQR vol (10) = 30% to IQR vol (32) = 65%, followed by a sharp decrease to IQR vol (45) = 0%. This result indicates tight constraints on higher doses for spine, whereas the large spread for D = [20,40] Gy suggests relaxation. 
However, a tighter control is not necessary because the probability of myelopathy vanishes for D < 40 Gy to the cord. (18) These observations may be considered as an indirect effect of the predominant D max constraints on brainstems and spines, as common in head-and-neck cases. (19)(20)(21)(22) Note that D max thresholds lead to V = 0% at D ≥ D max , resulting in narrower IQR vol .

A.2 Interquartile range of dose
In addition to the volume constraints, a set of constraints is imposed at specific dose values. Similarly, they may lead to variations in the surrounding DVH regions, which we measure with IQR dose (V). For PTV, Fig. 4(a) illustrates negligible spread (IQR dose = 3%) for V ≤ 99%, with a steep increase to IQR dose (100) = 11%. This suggests that tight D x constraints are imposed on PTVs until V ≤ 95% followed by relaxed control for higher volumes, which is in agreement with the recommendations. (23) Figure 4(b) shows reduced IQR dose at higher brainstem volumes. This is in agreement with the recommendations that strict D max constraints are essential to maintain functionality of such critical organs. (19)(20)(21)(22) However, Fig. 4(c) for spine depicts significant spread around D 60 -D 80 , indicating a lack of D 50 constraint and a need for tighter control at higher doses.

A.3 Dose falloff
To achieve the goal of conformal target dose, intensity modulation is leveraged to deliver heterogeneous distributions, allowing for a sharp dose falloff on DVHs. (24) A rapid dose falloff at the prescribed dose reduces the risk of complications. (25) A vertical falloff at the prescribed dose constitutes an ideal target DVH. To achieve this goal, near-minimum dose (D 98 ) and near-maximum dose (D 2 ) constraints are imposed. (26) However, due to competing objectives, a sharp and perpendicular falloff is usually infeasible. Therefore, some constraints are relaxed, leading to ∇ variations amongst plans. The median ∇ and IQR ∇ are shown in Figs. 5(a) to (c) for PTV, brainstem, and spine. Figures 5(a) to (c) show a marginal planning variation for PTV, in comparison to brainstem and spine. This is seen by the narrow IQR ∇ surrounding the prescribed dose, which is typically a result of employing multiple constraints. For brainstem and spine, however, the significant IQR ∇ is spread over multiple regions, demonstrating variability in decision-making as a result of employing single maximum dose constraints. More specifically, Fig. 5(a) shows a vanishing PTV median ∇(D) for D ≤ 95%, followed by a drop to median ∇(104) = -13. This implies that a dose change of 1% results in a -13% volume change. In other words, the falloff region V = [100,0]% extends over a median dose interval of 7.7% (= 100/13), exhibiting a narrow falloff region through tighter control. This is followed by a rise to ∇(110) = 0. This change in median ∇(D) for D = [95,110]% coincides with the region surrounding the prescribed dose. Noticeably, the sharpest falloff occurs at D = 104% instead of D = 100%, further suggesting planners' preference towards higher dose, as also reported by Eisbruch et al. (17) Figure 5(b) illustrates that the sharpest falloff for brainstem occurs at very low-dose values (D = 2 Gy), demonstrating tight dose constraints. Unlike brainstem, the median ∇(D) for spine in Fig. 5(c) shows two distinct turning points at D = 2 Gy and at D = 40 Gy. The first DVH falloff is attributed to an artifact of the patient geometry (for some cases, lower doses to significant portions of organs are common, but not for all). On the other hand, the turning point at D = 40 Gy is a direct result of maximum dose constraints for brainstem and spine, which are typically imposed in head-and-neck cancer cases. (19)(20)(21)(22)
The spread around the median is shown via IQR ∇ in Figs. 5(a) to (c). For PTV, negligible spread is observed for D ≤ 90%, followed by narrow spread for the D = [90,95]% region, which is typically a result of strict constraints. (23) However, significant spread occurs around the DVH falloff region, revealing variability in decision-making over this range, followed by overall agreement for D ≥ 110%. For brainstem, the largest IQR ∇ is recorded at lower dose values, followed by a narrow deviation at higher dose points. This further suggests multiple constraints imposed at higher dose for this critical organ. Similarly for spine, beyond the low-dose spread, a significant spread is observable only between D = 30 Gy and D = 45 Gy. This demonstrates both the impact of maximum dose constraints and a lack of median dose constraints. Since the maximum dose constraints are typically rigorously enforced on both PTV and critical organs, competing dose-volume constraints are relaxed to generate feasible plans. Note that the use of statistical estimators for symmetric distributions would misrepresent the actual distribution here. For example, the use of mean and standard deviation for ∇ reveals a substantial ripple at D 85 (shown in Appendix A), which is not observable with median ∇ and IQR ∇ in Fig. 5(a). However, this ripple is caused by four outliers, while the median and interquartile ranges are insensitive to outliers and, hence, appropriately describe the observed asymmetric distribution.
While such outliers may reveal additional insights, their analysis is beyond the scope of this work.
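The sensitivity of the mean and standard deviation to a few outliers, in contrast to the median and IQR, can be illustrated with a small synthetic example, assuming NumPy. The values below are hypothetical and do not reproduce the data of Appendix A; they only mimic the situation of four outlying plans among 100.

```python
import numpy as np

# Hypothetical falloff values of 100 plans at one dose point:
# 96 plans with no falloff and four outliers with a steep falloff.
grad = np.concatenate([np.zeros(96), np.full(4, -10.0)])

# Estimators suited to symmetric distributions react to the outliers ...
mean, std = grad.mean(), grad.std()

# ... whereas the median and the IQR remain at zero.
q1, median, q3 = np.percentile(grad, [25, 50, 75])
iqr = q3 - q1
```

Here the four outliers shift the mean away from zero and inflate the standard deviation, while the median and IQR are unchanged, consistent with the ripple being visible only in the mean-based representation.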

B. Correlations in plans
Since all the studied plans were clinically approved, it is justified to expect broad agreement at dose-volume constraints enforced in the study, while variations may exist at other constraints which were relaxed. To establish possible consequences of such goal relaxation, we included empirical constraints in the vicinity of the DVH goal points. This allows for a guide to gauge potential risks for the overall plan that are associated with violating some constraints. We first focused the analysis on the institutional DVH goals that were enforced for the studied cases. We then analyze the cases based on additional constraints recommended by the ICRU-83, (15) since a subset of them overlaps with the institutional constraints.

B.1 Institutional aims
When the plans exhibit unanimous agreement at a particular high dose by displaying a marginal spread (i.e., IQR vol (D) ≤ 5%), we denote the corresponding (constrained) dose as D con . We observed a PTV constrained dose point with IQR vol (D) ≤ 5% at D con = 110%. For brainstem, D con = 50 Gy and for spine, D con = 45 Gy. This corresponds to maximum dose constraints that are usually imposed on all structures, often being the only constraint for brainstem or spine. (19)(20)(21)(22) When V 110 , V 50 , or V 45 are negligible (≤ 2%), we hypothesize them to be a consequence of maximum dose constraints, and otherwise (> 2%) independent. On the other hand, we observe a neighboring (within ± 5%) unconstrained point with IQR vol (D) >> 5% at D unc = 105% for PTV, D unc = 45 Gy for brainstem, and D unc = 40 Gy for spine. Note that, due to the nonincreasing nature of cumulative DVHs, V 105 ≥ V 110 , V 45 ≥ V 50 , and V 40 ≥ V 45 for PTV, brainstem, and spine, respectively. This implies that an equal or larger spread (from V ≤ 2% objective) is expected at D unc when compared to D con , since the constraints are found at higher volumes. To establish deviations, we observed two clusters of plans, namely those satisfying V 105 , V 45 , or V 40 by being less or equal to 2% and those which were significantly different at a neighboring (± 5%) point D unc . The two groups maximally separate, when the volumes at D unc exceeded 45%. Therefore, the hypothesis is that such violations (> 45%) are related to meeting other DVH constraints, and otherwise (≤ 45%) independent. To summarize these relationships, Table 1 serves as an overview on institutional planning aims. Note that each table element can serve as a subset, as defined in Materials and Methods section B.
Significant overdose tendencies are observed for PTV. For brainstem, minimal deviations are apparent in comparison to spine, which has a higher percentage of constraint violations. This demonstrates the increased planning importance for brainstem. Specifically, Fig. 6(a) for PTV shows substantial violations at V 105 (> 65% for some patients) in comparison to V 110 , where violations are negligible (on average ≤ 2%). This implies that substantial volume portions received D > 100%, supporting the aforementioned observation of tendencies to deliver higher doses to PTV. More importantly, this result shows larger violations at D unc for cases that also have slight violations at D con . This result is more evident when evaluating the conditional median. In particular, the conditional median of V 105 given a violation at V 110 is Median(V 105 | V 110 > 2%) = 50% compared to Median(V 105 | V 110 ≤ 2%) = 19% for PTV.
Similarly, D 95 = 95% constraints (denoted as V con ) were imposed on PTV, which is a common institutional choice. (27) The neighboring volumes are considered V unc (e.g., at D 100 ). Patients with significant deviations (D < 70%) at D 100 also show deviations from the established D 95 constraints (not shown). The conditional Median(D 110 | D 95 < 95%) = 62% versus Median(D 110 | D 95 ≥ 95%) = 90% shows improvement in neighboring areas when constraints are satisfied. Figure 6(b) illustrates negligible disagreement at D unc for brainstem. Even though violations are observed, they are not due to decision tendencies but rather due to the nonincreasing nature of cumulative DVHs and the already existing violations at D con . When comparing the conditional medians for brainstem, we observe a difference between Median(V 45 | V 50 > 2%) = 33% and Median(V 45 | V 50 ≤ 2%) = 0%. For spine, Fig. 6(c) shows significant disagreement at D unc propagated with violations at D con . There is also a sizable difference between Median(V 45 | V 50 > 2%) = 45% and Median(V 45 | V 50 ≤ 2%) = 9% for spine. This result indicates that marginal deviations from DVH constraints can (in median) lead to significant degradation in neighboring unconstrained points and, hence, limit the plan quality. Overall, Fig. 6 shows the case-by-case impact of constraint violation for all cases. Beyond the conditional changes, we next provide a quantitative guide for the associated risk for a plan when constraints are violated.
Fig. 6. Impact of DVH volume constraints. Large volume deviations are observed at D unc (red) compared to D con (blue).
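Conditional medians of the kind reported above, such as Median(V 105 | V 110 > 2%), can be computed as in the following sketch, assuming NumPy. The function name and the sample volumes are hypothetical and do not reproduce the study data.

```python
import numpy as np

def conditional_medians(v_unc, v_con, threshold=2.0):
    """Median of V at D_unc for plans violating / meeting V <= threshold at D_con.

    returns : (median given violation, median given satisfaction)
    """
    v_unc = np.asarray(v_unc, dtype=float)
    v_con = np.asarray(v_con, dtype=float)
    violated = v_con > threshold
    return np.median(v_unc[violated]), np.median(v_unc[~violated])

# Hypothetical PTV volumes (%) for six plans
v110 = [0.0, 1.0, 0.5, 3.0, 6.0, 4.0]
v105 = [10.0, 25.0, 20.0, 40.0, 65.0, 55.0]
med_violated, med_met = conditional_medians(v105, v110)
```

A clearly larger median in the violating group than in the satisfying group is what the text interprets as a degradation at the neighboring unconstrained point.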
In practice, not all clinical goals can be satisfied concurrently, resulting in constraint relaxation. To measure the risk (probability) that violating some constraints may have on other goals, we evaluate the plans at D con = 110% for PTV and at D con = 50 Gy and 45 Gy for brainstem and spine, respectively. We hypothesize that D con satisfaction (V ≤ 2%) is correlated to D max constraints, or otherwise (> 2%) independent. Additionally for PTV, we evaluate D con = 95%, for which we hypothesize that V 95 ≥ 95% if satisfied, or V 95 < 95% if violated. For correlation analysis, we divide all plans into subsets defined in Table 1 and apply the conditional measures of Materials and Methods section B:
Subset A: plans satisfying D max goal, leading to V ≤ 2% at D con ;
Subset B: plans satisfying V 95 goal, leading to V ≥ 95% at D con ;
Subset C -: plans violating low volume at high-dose goal, leading to V ≥ 45% at D unc .
For brainstem, a substantial improvement is observed for plans that satisfy V 50 (subset A). In fact, the IQRs of A and A - do not overlap, suggesting that significant improvements are possible if the constraints are satisfied. For spine, a moderate reduction in dose is observed for plans that satisfy the V 45 constraint (subset A), suggesting a similar potential for improvements. Figure 7 shows that violating constraints goes beyond the common notion of "trade-off" and actually affects the entire DVH. It allows one to determine the impact of violating one of these constraints on the overall plan quality. Therefore, this analysis can inform clinical planning decisions to quantitatively gauge potential risks of constraint relaxations. This impact can also be assessed by comparing the probabilities of delivering high doses to larger volume portions in neighboring and unconstrained DVH points (D unc ) when conditioned on violating (or meeting) the goals at D con . In other words, we compare the conditional probability P(C -|A -) (at D con ) to P(C -|A) (at D con ).
Therefore, the impact of not meeting PTV's constraints is observed through the comparison of Eqs. (4) and (5). This demonstrates a positive correlation between V 105 and V 110 for the PTV, namely violating one increases the probability of violating the other as well.
Similarly for brainstem, the probability of degrading the plans is P brainstem (C -|A -) = 0.33 and for spine P spine (C -|A -) = 0.60. In comparison, when D con is met, P brainstem (C -|A) = 0 and P spine (C -|A) = 0.08. Two-proportion z-tests reveal that P(C -|A -) is significantly larger than P(C -|A) for all structures, with a p-value less than 0.0001. Additionally for PTV, P(D 100 < 70% | D 95 < 95%) = 0.69 is significantly larger than P(D 100 < 70% | D 95 ≥ 95%) = 0.01, and a two-proportion z-test confirms this significant difference (p-value < 0.0001). Therefore, all these observations support the conclusion that significantly higher risks for degrading outcomes are expected when DVH constraints are violated. These results can serve to quantitatively inform the decision-making process in treatment planning.
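For orientation, a two-proportion z-test can be sketched with the Python standard library alone. The text does not state whether the test was one- or two-sided or whether a pooled standard error was used; the version below assumes a one-sided test with a pooled estimate. The subset sizes are not reported here, so the counts in the example are hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided pooled two-proportion z-test of H1: p1 > p2.

    x1, x2 : number of plans showing the event (e.g., membership in C-)
    n1, n2 : number of plans in each conditioning subset (e.g., A- and A)
    returns: (z statistic, one-sided p-value)
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value

# Hypothetical counts: 12 of 20 plans versus 4 of 80 plans
z, p = two_proportion_z_test(12, 20, 4, 80)
```

The normal approximation underlying this test becomes unreliable for very small subsets; in such cases an exact test would be a more cautious choice.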

B.2 ICRU aims
Similarly, we analyze the correlations amongst the noninstitutional aims, as defined in section B of Material and Methods. Table 2 summarizes the relevant internationally recommended planning aims by ICRU-83. (15) These aims may also become competing (i.e., they cannot be met concurrently), hence, compelling planners to relax or even ignore some of them. To measure correlations among the ICRU constraints, we similarly group patients into subsets as discussed in Materials and Methods section B: Subset E: plans that satisfy D 2 ≤ 107%; Subset F: plans that satisfy 98% ≤ D 50 ≤ 102%; Subset G: plans that satisfy D 98 ≥ 95%.
The impact of violating the ICRU-83 recommendations is shown in Fig. 8. The excess dose violation of subset E and the underdose violation of subset G in Fig. 8(a) can be considered as a result of the definition of these constraints. For the subset F in Fig. 8(b), the D 50 violation is attributed to an increased dose, confirming the overall overdose tendency observed through this analysis. This is also reflected in an elevated dose to larger volumes, hence satisfying D 98 , as demonstrated in Fig. 8(c) for the subset G. Conversely for F, D 50 is satisfied by reduced dose to larger volumes, which leads to violating D 98 and constituting G -. In other words, plans in subset F have an increased likelihood to also reside in G -. Similarly, those in F - are more probable to also be in G. Therefore, the set of analyzed plans clearly suggests a negative correlation between the recommended aims of D 50 and D 98 , indicating competing goals that are unlikely to be met simultaneously. The pairwise conditional probabilities for all goals are summarized in Table 3. Given that 86% of the cases violate D 50 (subset F -), the probabilities of F conditioned on E or G are marginal.
However, when conditioning on F, a sizable positive correlation is observed for the subset E, since the corresponding probability decreases upon complementing F. In other words, these plans suggest that satisfying D 50 increases the probability of satisfying D 2 as well. On the other hand, a negative correlation is observed when G is conditioned on F, since the probability increases  when conditioning on F -. This means that satisfying D 50 reduces the probability of satisfying D 98 . This observed competing nature of these two recommended aims demonstrates that the likelihood of meeting them simultaneously is low. Since these plans were designed and delivered to the best ability and knowledge of experts, the result that not all ICRU-83 recommended aims could be satisfied supports the clinical need for institutional guidelines, such as D 95 , as prevalent amongst institutions. At the same time, we observe that these institutional goals lead to trends, such as excess dose tendencies, that potentially jeopardize both plan quality and their comparability across institutions.
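A table of pairwise conditional probabilities in the layout of Table 3 (row i conditioned on column j) can be assembled as in the following sketch, assuming NumPy. The subset definitions follow the ICRU-based thresholds stated above; the function name and the dose values are hypothetical and do not reproduce the study data.

```python
import numpy as np

def conditional_table(subsets):
    """Matrix with entry [i, j] = P(Y_i | Y_j) for boolean membership arrays."""
    names = list(subsets)
    table = np.empty((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            cond = np.asarray(subsets[b], dtype=bool)
            table[i, j] = np.asarray(subsets[a], dtype=bool)[cond].mean()
    return names, table

# Hypothetical PTV doses (% of prescribed dose) for six plans
d2 = np.array([105.0, 106.0, 108.0, 109.0, 106.5, 110.0])
d50 = np.array([101.0, 101.5, 104.0, 105.0, 103.0, 104.5])
d98 = np.array([94.0, 94.5, 96.0, 97.0, 95.5, 96.5])

subsets = {
    "E": d2 <= 107.0,                     # D2 <= 107%
    "F": (d50 >= 98.0) & (d50 <= 102.0),  # 98% <= D50 <= 102%
    "G": d98 >= 95.0,                     # D98 >= 95%
}
names, table = conditional_table(subsets)
```

In this toy data set, plans that satisfy D 50 all satisfy D 2 but none satisfy D 98 , mimicking the qualitative pattern of a positive E-F and a negative G-F correlation. Conditioning on the complementary subsets works the same way by passing the negated arrays.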

IV. DISCUSSION
In treatment planning, point-wise dose-volume goals are not only used to enforce clinical objectives, or to directly optimize treatment plans, but also to choose between alternative plans. The analysis presented in this work seeks to identify areas of plan variation and constraint correlation. Therefore, a possible extension of the presented study includes developing and integrating unbiased constraints and estimators directly into treatment planning to mitigate variability.
To limit dissension in decision-making, this study reveals the need for constraints or comparison estimators that can be enforced for extended DVH regions. The incorporation of an unbiased estimator for treatment plan evaluation was suggested by Loveless et al. (28) A deviation metric was designed to capture the weighted difference between the realized and ideal DVH. The weights were extracted from past treated cases. Alternatively, benchmark DVHs were used for treatment comparisons for prostate cancer cases. (29) It needs to be noted that this analysis was confined to head-and-neck cases only, warranting comparable anatomies. Unique geometries for some patients still may contribute to deviations in received dose on certain structures. These variations are patient-specific and can occur in any treatment. Therefore, we chose robust estimators to immunize the conclusions against potential uncertainties. Furthermore, since the main goal of this study was to establish correlations, the resulting probabilities and risk measures are intended to inform future treatment planning, but cannot yield causations. The understanding of the nature of these effects, potentially confounded by the geometry, treatment planning system, dosimetric models, and other factors, is beyond the scope of this study. Nevertheless, these results have the potential to quantitatively augment the treatment planning, confining the trial-and-error nature of the decision-making process and, hence, reducing time and ambiguity.
Table 3. Probabilities of subset Y i in row i conditioned on Y j in column j. For sizable differences, a positive correlation is marked in blue and a negative correlation in red.

V. CONCLUSIONS
The impact of trading off clinically employed DVH goals for treatment planning is demonstrated using three robust statistical estimators on 100 past head-and-neck cases. Significant plan deviations are observed on the interquartile ranges of dose and volume, as well as DVH falloff, especially on areas neighboring the constraints. This analysis shows that, when these constraints are even marginally violated, larger deviations are expected across the entire DVH. These results extend conventional notions of a "trade-off" between clinical goals to a quantitative risk measure that relates each constraint violation to degradation of the overall plan. In fact, some of the internationally recommended aims were identified to be negatively correlated, hence, competing. This study also identifies DVH goals of unanimous agreement amongst planners, as well as areas of dissension and deviations, calling for novel clinical constraints that go beyond the established dose-volume constraints and address extended areas of the DVH. Since these results relied on standard dosimetric criteria, they are of general nature and, hence, applicable to a wide range of IMRT planning settings.