Framework for the quantitative assessment of adaptive radiation therapy protocols

Abstract
Background: Adaptive radiation therapy (ART) “flags,” such as change in external body contour or relative weight loss, are widely used to identify which head and neck cancer (HNC) patients may benefit from replanned treatment. Despite the popularity of ART, few published quantitative approaches verify the accuracy of replan candidate identification, especially with regard to the simple flagging approaches that are considered current standard of practice. We propose a quantitative evaluation framework, demonstrated through the assessment of a single institution's clinical ART flag: change in body contour exceeding 1.5 cm.
Methods: Ground truth replan criteria were established by surveying HNC radiation oncologists. Patient‐specific dose deviations were approximated by using weekly acquired CBCT images to deform copies of the CT simulation, yielding during‐treatment “synthetic CTs.” Reapplying the original plan to the synthetic CTs estimated interfractional dose deposition, and truth table analysis compared ground truth flagging with the clinical ART metric. This process was demonstrated by assessing flagged fractions for 15 HNC patients whose body contour changed by >1.5 cm at some point in their treatment.
Results: Survey results indicated that geometric shifts of high‐dose volumes relative to image‐guided radiation therapy alignment of bony anatomy were of most interest to HNC physicians. This evaluation framework successfully identified a fundamental discrepancy between the “truth” criteria and the body contour flagging protocol selected to identify changes in central axis dose. The body contour flag had poor sensitivity to survey‐derived major violation criteria (0%–28%). The sensitivity of a random sample for comparable violation/flagging frequencies was 27%.
Conclusions: These results indicate that centers should establish ground truth replan criteria to assess current standard of practice ART protocols. In addition, more effective replan flags may be tested and identified according to the proposed framework. Such improvements in ART flagging may contribute to better clinical resource allocation and patient outcomes.

Adaptive radiation therapy (ART) protocols replan treatment in response to anatomical changes to ensure that planned target coverage and organ-at-risk (OAR) sparing are achieved. While successful ART approaches may improve clinical outcomes, 1-3 they are resource intensive. 4,5 Therefore, the clinical viability of ART depends on correctly identifying patients most likely to benefit from a replanned treatment. Selection criteria in the literature generally fall into three categories.
Image-based methods compare periodic cone beam CT (CBCT) or CT images with the CT simulation (CTsim) to identify any systematic physical changes. 6-10 Temporally based methods preselect the time at which a new plan should be calculated. 2,11 Patient characteristic-based methods examine pretreatment parameters such as weight and tumor stage to predict if and when a replan may be necessary. 8,12 In most protocols, these parameters indicate when a physician should make a judgment call regarding possible adjustment to immobilization, re-CT, dose recalculation, or replanning. Few dosimetric thresholds warranting a replan have been stated, and efficient, easily implementable replan flags remain elusive. 2,3,8,13,14
Despite the variety of ART protocols used clinically, the accuracy of simple standard of practice ART replan candidate identification is rarely quantified in the literature. This work proposes a two-step, quantitative evaluation framework and demonstrates its utility through the assessment of a single institution's clinical ART flag: a change in body contour exceeding 1.5 cm. Anecdotally, this type of flag is used in many institutions. First, "ground truth" for dosimetric deviations requiring replanning was established by surveying radiation oncologists (ROs) treating HNCs. Second, flag performance was quantitatively assessed by comparing this "truth" to interfractional dose deviations. In this study, we assessed 15 HNC patients treated with VMAT whose body contour changed by >1.5 cm at some point in their treatment. This method of quantifying ART performance may allow clinics to identify more effective flags to improve clinical resource allocation and ultimately patient outcomes.

2.A | Protocol
For HNC VMAT patients treated in this study, kV-CBCT images were acquired approximately every five fractions. For select cases, CBCTs were also acquired for the first three fractions to assess setup reproducibility. Because of the prolonged on-unit time, CBCT acquisition was delayed until a later fraction if the patient was feeling unwell, and additional CBCT images may have been taken on the day after a body contour flag for further monitoring. Patients were imaged on the treatment couch using CBCT after kV-orthogonal x-ray acquisition and subsequent couch position adjustment, and prior to treatment delivery. Radiation therapists performed a rigid registration of each CBCT with the CTsim according to institutional image-guided radiation therapy practices. The axial view of the rigid registration was then assessed to identify, for any axial slice, the largest pointwise distance between the CBCT and CTsim external contours.
This largest pointwise distance was used to quantify change in body contour and formally may be regarded as a maximum axial slice-based Hausdorff distance. In practice, this flagged weight loss and tumor shrinkage effects as well as changes in shoulder position. Patients exhibiting a change in body contour exceeding 1.5 cm were "flagged" for consult with a medical physicist. The RO, in collaboration with the physicist, would then elect to refit the immobilization, re-CT, and/or replan treatment; clinicians may have elected to monitor patients if only a few (e.g., fewer than 5) fractions remained.
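The body contour metric described above can be illustrated with a minimal sketch. It assumes each external contour is available as a list of (x, y) points in cm on already registered, matched axial slices; the function names and the circular toy contours are hypothetical and are not part of the clinical workflow, in which the distance was assessed on the registration view.

```python
import math

FLAG_THRESHOLD_CM = 1.5  # body contour threshold stated in the protocol


def directed_hausdorff(a, b):
    """Largest distance from any point of contour a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def slice_hausdorff(a, b):
    """Symmetric Hausdorff distance between two contours on one axial slice."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def max_axial_hausdorff(ctsim_slices, cbct_slices):
    """Maximum slice-wise Hausdorff distance over matched axial slices."""
    return max(slice_hausdorff(a, b) for a, b in zip(ctsim_slices, cbct_slices))


def is_flagged(ctsim_slices, cbct_slices, threshold_cm=FLAG_THRESHOLD_CM):
    """True when the body contour change exceeds the flagging threshold."""
    return max_axial_hausdorff(ctsim_slices, cbct_slices) > threshold_cm


def circle(radius_cm, n=180):
    """Hypothetical circular contour sampled at n angles (toy data only)."""
    return [(radius_cm * math.cos(2 * math.pi * k / n),
             radius_cm * math.sin(2 * math.pi * k / n)) for k in range(n)]


# Toy example: two axial slices; the second shrinks from 8 cm to 6 cm radius.
ctsim = [circle(8.0), circle(8.0)]
cbct = [circle(7.5), circle(6.0)]
change_cm = max_axial_hausdorff(ctsim, cbct)
print(round(change_cm, 3), is_flagged(ctsim, cbct))
```

Taking the maximum over slices means a single slice with a large local change (for example, at the shoulders) is enough to trigger the flag, which is consistent with the behavior described in the text.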

2.B | Patients
Fifteen consecutively flagged patients exhibiting a greater than

3.A | Survey results
Survey results are shown in Table 2. Values are stated with respect to institutional planning objectives or, in the absence of formal planning criteria, initial plan parameter values. Structure-specific unacceptable violations provided by the ROs were subsequently stratified into "major violations" and "minor violations" based on the magnitude of median responses and the relevance to treatment outcome, for example, …
As expected, enlargement effects were more detrimental to tumor coverage than was subsequent shrinkage. Figure 2 shows changes in body contour as a result of weight loss. Here, flagging largely coincided with clinically significant changes in dose deposition, with …
[Figure caption fragment] … (Table 2). Bold entries indicate fractions flagged by the protocol. Contours: red, high-dose GTV; orange, high-dose PTV; yellow, low-dose PTV; cyan, spinal cord; blue, spinal cord with margin. *Clinically significant deviation according to the major/minor violation criteria (only those parameters violating Table 2 criteria are shown). †Low-dose PTV volume excludes the high-dose PTV volume. … (Table 3).

3.B | Protocol assessment
… major and minor violations would require the replanning of 13/15 of these "high-risk" patients. However, the extent to which dose discrepancies can be improved through replanning depends on factors such as patient anatomy and number of fractions remaining. While inferring the timing and frequency of replans required to avoid violations falls outside the scope of our retrospective study, the literature suggests that 2-3 replans in the first half of treatment are most effective. 11,16,17 This study is limited by the necessary use of major/minor violation replanning criteria founded on physician experience and judgment rather than on a formal quantitative analysis, as ART QUANTEC-type guidelines do not yet exist. Stoll et al. 18 … to RO decision-making is not explicit in the study, which examines subsequent model development on the high-risk cohort. In contrast, logistic regression and nomography have been used to assess the predictive capability of complex, multiparameter ART protocols that use data acquired prior to and during treatment. 12,15 Our proposed framework may be used to assess the predictive capabilities of simple standard of care or multimetric flags characterized by a normal/abnormal threshold.
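For any flag with a normal/abnormal threshold, the truth table comparison reduces to counting per-fraction agreement between the flag and the ground truth criteria. The following is a minimal sketch under that assumption; the function names and the six-fraction example data are hypothetical and do not reproduce the study cohort.

```python
def truth_table(flagged, violated):
    """Count TP, FP, TN, FN over paired per-fraction booleans.

    flagged[i]  -- True if the ART flag fired for fraction i
    violated[i] -- True if fraction i met the ground truth replan criteria
    """
    if len(flagged) != len(violated):
        raise ValueError("flagged and violated must have equal length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for f, v in zip(flagged, violated):
        if f and v:
            counts["TP"] += 1
        elif f and not v:
            counts["FP"] += 1
        elif not f and v:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def sensitivity(c):
    """TP / (TP + FN); None when no fraction violated the criteria."""
    denom = c["TP"] + c["FN"]
    return c["TP"] / denom if denom else None


def specificity(c):
    """TN / (TN + FP); None when every fraction violated the criteria."""
    denom = c["TN"] + c["FP"]
    return c["TN"] / denom if denom else None


# Hypothetical data for six imaged fractions (illustration only).
flagged = [True, False, True, False, False, True]
violated = [True, True, False, False, False, False]
table = truth_table(flagged, violated)
print(table, sensitivity(table), specificity(table))
```

The same counting applies to a multimetric flag once its individual thresholds have been combined into a single per-fraction yes/no decision.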

CONCLUSION
A framework to quantify ART protocol performance in comparison to RO-specified unacceptable dose violation criteria was demonstrated for a common ART flagging metric: >1.5 cm change in external body contour. This framework successfully identified a mismatch between the flag's intended purpose of identifying changes in central-axis dose and the physicians' priority of correcting for geometric shifts. This work suggests that centers may similarly benefit from quantifying ART performance according to center-specific requirements.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

APPENDIX A
(The following uses the notation TP = true positive, FP = false positive, TN = true negative, FN = false negative in keeping with Table 3 and Fig. 3.) To elaborate on how the 27% sensitivity of a random flag on a comparable patient sample is derived, we first assumed that approximately 26/121 fractions are flagged (TP + FP = 20%) while only 15% of fractions exhibit a clinically significant effect (TP + FN = 15%).
These proportions are in keeping with those of the study cohort. In addition, TP + FN + FP + TN = 100%. Furthermore, from analysis of receiver operating characteristic curves, we assume that a random flag is such that sensitivity is approximately equal to (1 − specificity): i.e., sensitivity = 4%/(4% + 11%) = 27%.
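The arithmetic above can be retraced with a short sketch. It takes TP = 4% and FN = 11% from the sensitivity expression and the 20% flagging and 15% event rates from the stated assumptions, then fills in the remaining truth table cells; the function name is illustrative only. Under these values, 1 − specificity evaluates to roughly 19%, so the equality assumed for a random flag holds only approximately.

```python
def complete_truth_table(tp, flagged_rate, event_rate, total=100.0):
    """Fill in FP, FN, TN (in percent) from TP and the two marginal rates.

    flagged_rate = TP + FP, event_rate = TP + FN, total = TP + FP + FN + TN.
    """
    fp = flagged_rate - tp
    fn = event_rate - tp
    tn = total - tp - fp - fn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Values taken from the appendix text: TP = 4%, TP + FP = 20%, TP + FN = 15%.
cells = complete_truth_table(tp=4.0, flagged_rate=20.0, event_rate=15.0)

random_sensitivity = cells["TP"] / (cells["TP"] + cells["FN"])
one_minus_specificity = cells["FP"] / (cells["FP"] + cells["TN"])

print(cells)
print(f"sensitivity = {random_sensitivity:.0%}")
print(f"1 - specificity = {one_minus_specificity:.0%}")
```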