Evaluating the sensitivity of Halcyon’s automatic transit image acquisition for treatment error detection: A phantom study using static IMRT

Abstract

Purpose: The Varian Halcyon™ electronic portal imaging detector is always in-line with the beam and automatically acquires transit images for every patient with full-field coverage. These images could be used for “every patient, every monitor unit” quality assurance (QA) and eventually adaptive radiotherapy. This study evaluated the imager’s sensitivity to potential clinical errors and day-to-day variations from clinical exit images.

Methods: Open and modulated fields were delivered for each potential error. To evaluate output changes, monitor units were scaled by 2%–10% and delivered to solid water slabs and a homogeneous CIRS phantom. To mimic weight changes, 0.5–5.0 cm of buildup was added to the solid water. To evaluate positioning changes, homogeneous and heterogeneous CIRS phantoms were shifted 2–10 cm and 0.2–1.5 cm, respectively. For each test, mean relative differences (MRDs) and standard deviations in the pixel-difference histograms (σRD) between test and baseline images were calculated. Lateral shift magnitudes were calculated using cross-correlation and edge-detection filtration. To assess patient variations, MRD and σRD were calculated from six prostate patients’ daily exit images and compared between fractions with and without gas present.

Results: MRDs responded linearly to output and buildup changes with a standard deviation of 0.3%, implying that a 1% output change and a 0.2 cm change in buildup could be detected with 2.5σ confidence. Shifting the homogeneous phantom laterally resulted in detectable MRD and σRD changes, and the cross-correlation function calculated the shift to within 0.5 mm for the heterogeneous phantom. MRD and σRD values were significantly associated with the presence of gas for five of the six patients.

Conclusions: Rapid analyses of automatically acquired Halcyon™ exit images could detect mid-treatment changes with high sensitivity, though appropriate thresholds will need to be set. This study presents the first steps toward developing effortless image evaluation for all aspects of every patient’s treatment.


1 | INTRODUCTION
As radiotherapy treatment plans have increased in complexity, the need for careful patient-specific quality assurance (QA) has increased. Current recommendations for patient-specific QA are focused on pretreatment verification, [1][2][3][4] where the patient's plan is delivered to either a phantom or air and the absolute dose is measured using ion chambers, film, or the electronic portal imaging device (EPID). 5 These techniques can be time-consuming and will not detect errors due to improper patient setup, changes in the patient's weight, or corruption of the treatment plan file that occur after the creation of the verification plan. As a result, various groups have investigated the potential of using the EPID to measure patient exit dosimetry in hopes of catching these failure modes. [6][7][8][9][10][11] In exit dosimetry, the EPID is positioned behind the patient during treatment and detects the full exit fluence from each beam.
The panel is calibrated 7,9,[12][13][14][15][16] to convert the results to dose. The EPID-measured dose can then be compared to a precalculated expected dose 10 or back-projected 17 to determine the 3D dose that was deposited in the patient. With both techniques, the EPID monitors the entire treatment process and thus has the potential to detect errors in the beam delivery (MLC motion, beam output) and changes in the patient (motion during treatment, weight changes). While back-projection gives more information about the delivered dose distribution in the patient, comparing the EPID measured dose to a forward calculated prediction can be faster and would allow for detecting differences mid-treatment in time to correct them.
While several research groups have developed highly accurate techniques to acquire and analyze EPID transit images, 10,11,17-19 few clinics are using exit dosimetry as part of their standard quality assurance. 20 This slow adoption can be partially attributed to resistance to changes in the clinical workflow, an increased risk of collision when the EPID is extended, and a lack of implementation guidelines. Consistent use of exit dosimetry for every fraction of every patient's treatment would increase the frequency at which errors such as mid-treatment motion or changes in the beam could be detected 21 and thus increase patient safety and confidence in treatment delivery. This was demonstrated by the Netherlands Cancer Institute, which used in vivo EPID dosimetry to analyze 4337 patient plans over four years. 20 They detected 17 serious errors requiring intervention, nine of which would not have been detected with routine pretreatment quality assurance. 20 They now use off-line in vivo EPID dosimetry as part of their routine quality assurance and are evaluating modifications to perform online EPID dosimetry. 11

Halcyon™ (Varian, Palo Alto, CA) is a new therapeutic linear accelerator. By design, the EPID for this linear accelerator is always in the path of the beam and automatically acquires portal images for all fields in clinical mode. Thus, exit dosimetry images are immediately available for analysis from every field of every patient's treatment without any changes to the workflow. Additionally, as Halcyon™ is an enclosed gantry, there is no risk of patient or couch collision with the EPID.
The purpose of this study was to evaluate the sensitivity of the Halcyon™ imaging panel to changes in patient size, patient position, and beam output. Having an automated quantifiable way to identify changes in patient size is important as source to surface distance (SSD) values to the patient body cannot be measured on the Halcyon because it does not have an optical distance indicator (ODI).
The kV-CBCT can be used to verify that the body contour has not changed substantially, but it may not show the full extent of the contour for large patients. Additionally, kV imaging was not initially available on the Halcyon, and the Halcyon can still be installed with only MV imaging. The MV imager has a smaller field of view, so for all pelvic patients the body contour could not be checked for patient weight changes. The poorer soft tissue contrast with MV imaging could also lead to improper patient setup, so an independent patient positioning check after image guidance would be useful.
Changes in patient position that occur during treatment after the initial image-guided patient alignment are also important as they can have an adverse effect on the planned dosimetry particularly when tight margins are used or sensitive organs at risk are directly next to the target. On the Halcyon, we cannot use surface guided imaging to monitor intrafraction patient motion during treatment because of the bore, so using exit images would decrease the uncertainty in patient positioning. Additionally, errors in couch translations from image guidance could be detected with this method. Changes in beam output larger than those identified during daily output checks are exceedingly rare but would have a disastrous effect on patients. Failures in MLC trajectories will manifest from the EPID's perspective as changes in beam output. Thus, adding an automated check for beam output will add an extra level of safety to our current clinical practice without requiring any changes to current workflow because these images are already being acquired.
This analysis provides a first look at whether this new device can easily and reproducibly detect changes in a patient's setup or treatment to prevent an ineffective and potentially dangerous treatment from occurring. The eventual goal is to provide a quantitative metric that can be automatically measured from these images and compared to action-level thresholds to impact clinical workflow. This would allow for immediate online verification that supplements pretreatment QA by effortlessly identifying errors that occur during treatment delivery including errors that would have previously remained undetectable.

2 | MATERIALS AND METHODS
The Halcyon comes equipped with a Varian aS1200 22,23 digital megavoltage imaging panel (Varian Medical Systems, Palo Alto, CA) that is mounted directly opposite the single energy 6X-FFF MV source, Fig. 1. The panel is located at a source to imager distance of 154 cm, has a physical size of 43 cm × 43 cm with a 28 cm × 28 cm isocentric projection, an image matrix of 1280 × 1280 pixels, and a projected pixel size of 0.22 mm in the isocentric plane. Images are acquired with 16 bit depth and a frame rate of 25 frames/sec. To prevent saturation during image acquisition, pixel "scaling is applied automatically if the intensity is close to the limit of the 16-bit resolution." 24 This scaling factor is recorded in the DICOM header. To allow for equivalent image comparison, all processed image intensity values in this work were multiplied by this scaling factor prior to measuring the relative differences between images.
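As an illustration of the rescaling step described above, the following minimal Python sketch multiplies each image by its recorded scaling factor before any comparison is made. The function name `rescale_image`, the pixel values, and the scaling factors are hypothetical; the sketch does not read a DICOM file, and the name of the header attribute that stores the factor is not given in the text, so the factor is passed in as a plain number.

```python
def rescale_image(pixels, scaling_factor):
    """Return a copy of the image with every pixel multiplied by the
    scaling factor recorded for that image (assumed already read from
    the header by other code)."""
    return [[value * scaling_factor for value in row] for row in pixels]


# Projected pixel size at isocenter: a 28 cm field spread over 1280 pixels.
pixel_size_mm = 280.0 / 1280  # 0.21875 mm, quoted as 0.22 mm in the text

# Two hypothetical stored images of the same exposure, saved with
# different scaling factors. After rescaling they are directly comparable.
baseline = rescale_image([[1000, 2000], [3000, 4000]], 2.0)
test = rescale_image([[500, 1000], [1500, 2000]], 4.0)
```

Without this step, two images stored with different scaling factors would show a large apparent intensity difference that has nothing to do with the delivered beam.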
When operated in portal dosimetry mode, the panel integrates the readout obtained from the entire treatment field. This mode is most commonly used for patient-specific intensity-modulated radiation therapy (IMRT) QA to compare the predicted dose map to the acquired portal dosimetry image via gamma analysis. 25 During treatment, this mode is automatically initiated for every field in a patient's plan. The resulting exit dosimetry images are automatically exported to the record-and-verify system (ARIA, Varian Medical Systems, Palo Alto, CA) with no manual intervention required. Currently no vendor-provided workflow is available for evaluating these images to detect changes in the patient setup, and we do not currently have access to third-party workflows. 10,[26][27][28] Additionally, the treatment planning system (Eclipse, Varian Medical Systems, Palo Alto, CA) cannot currently generate a predicted exit image for comparison with these daily exit images as it does for portal dosimetry, although some independent research groups have developed tools to accomplish this. 29,30

In this study, three types of clinical situations were simulated using phantoms: (i) changes in beam output, (ii) changes in patient weight, and (iii) changes in patient positioning. An open beam and a modulated beam were delivered for each test, both at a gantry and collimator angle of 0°. The open beam was a 10 cm × 10 cm square field with 100 monitor units (MUs). The modulated beam was from a prostate patient's nine-field IMRT plan and delivered 141 MU with an approximate field size of 8 cm × 8 cm. Three phantoms were used for this study: a 12 cm high stack of solid water with width and length of 30 cm, a CIRS IMRT homogeneous phantom (Model 002H5, CIRS Inc., Norfolk, VA), and a CIRS IMRT heterogeneous phantom (Model 002LFC) (see Fig. 2). All phantoms were set up with their midpoint at 100 cm SAD.
After each test, the exit dosimetry images were exported from ARIA to MATLAB (The MathWorks, Inc. Natick, MA) for analysis.

2.A | Quantitative image analysis
The relative difference between the baseline image and the subsequent test case was calculated for each detector pixel:

$$RD_{ij} = \frac{I_{X,ij} - I_{0,ij}}{I_{0,ij}} \times 100\%$$

where $i$ and $j$ are the discretized positions in the X and Y directions; $I_{0,ij}$ is the default image for a particular test (e.g., the exit dosimetry image from the solid water slab with no additional buildup); and $I_{X,ij}$ is the image being compared to the default image (e.g., the exit dosimetry image from the solid water slab with 1 cm of additional buildup).

Then the mean of these relative differences (MRD) was calculated with the following equation,

$$MRD = \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} RD_{ij}$$

where $N_x$ and $N_y$ are the total number of pixels in the image in the X and Y directions, respectively.

The standard deviation of the relative differences between each pair of pixels was also calculated to establish the uncertainty in each measurement:

$$\sigma_{RD} = \sqrt{\frac{1}{N_x N_y - 1} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left( RD_{ij} - MRD \right)^2}$$

To determine if the measured MRD would be detectable, 10 the standard deviation of the MRD across repeat images was calculated:

$$SD_{MRD} = \sqrt{\frac{1}{N_{meas} - 1} \sum_{k=1}^{N_{meas}} \left( MRD_k - \overline{MRD} \right)^2}$$

where $N_{meas}$ is the number of repeat images, $MRD_k$ is the MRD of the $k$th repeat image, and $\overline{MRD}$ is their mean. $SD_{MRD}$ estimates the shot-to-shot noise in the system. Thus, measured differences between images under different setup conditions must result in values greater than 2.5 times the $SD_{MRD}$ in order for the change to be detected with 2.5-sigma confidence.
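The 2.5-sigma detectability criterion can be sketched in a few lines of Python. This is an illustration only: the function name `is_detectable` and the repeat MRD values are hypothetical, chosen so that the shot-to-shot standard deviation comes out near the 0.3% quoted in the abstract, and the sample (N − 1) standard deviation is an assumption.

```python
import statistics


def is_detectable(mrd_percent, repeat_mrds_percent, n_sigma=2.5):
    """Compare a measured MRD against n_sigma times the shot-to-shot
    standard deviation of the MRD from repeat images.

    Returns (detectable, sd_mrd). Uses the sample (N - 1) standard
    deviation, which is an assumption of this sketch."""
    sd_mrd = statistics.stdev(repeat_mrds_percent)
    return abs(mrd_percent) > n_sigma * sd_mrd, sd_mrd


# Hypothetical MRDs (in percent) from repeat images of an unchanged setup.
repeats = [0.0, 0.3, -0.3, 0.3, -0.3]

flag_1pct, sd = is_detectable(1.0, repeats)    # 1.0% versus 2.5 * sd
flag_half_pct, _ = is_detectable(0.5, repeats)  # 0.5% versus 2.5 * sd
```

With a standard deviation of about 0.3%, the threshold is about 0.75%, so a 1% change is flagged while a 0.5% change is not.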
To define the usable "in field" component of a modulated beam, we set a minimum pixel intensity threshold of 50% of the mean image value in the baseline image. The same pixel locations that were excluded in the baseline image were also excluded from the compared images.
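The metrics and the in-field threshold can be sketched as follows. This is a minimal pure-Python illustration, not the MATLAB code used in the study: the function name `relative_difference_metrics` and the pixel values are hypothetical, and the sign convention (test minus baseline, divided by baseline) and the sample standard deviation are assumptions.

```python
import statistics


def relative_difference_metrics(baseline, test, threshold_fraction=0.5):
    """Return (MRD, sigma_RD) in percent for two equally sized images.

    Pixels whose baseline value is below threshold_fraction times the
    baseline mean are treated as out of field and excluded from both
    images. Sign convention (test - baseline) / baseline is assumed."""
    base = [value for row in baseline for value in row]
    comp = [value for row in test for value in row]
    cutoff = threshold_fraction * statistics.fmean(base)
    rd = [100.0 * (x - b) / b for b, x in zip(base, comp) if b >= cutoff]
    return statistics.fmean(rd), statistics.stdev(rd)


# Hypothetical 2 x 3 images: the last column is outside the field.
baseline_img = [[100, 100, 10], [100, 100, 10]]
test_img = [[102, 98, 50], [101, 99, 50]]

mrd, sigma_rd = relative_difference_metrics(baseline_img, test_img)
```

In this example the large change in the out-of-field column is ignored because the mask is defined from the baseline image alone, which matches the statement that the same pixel locations are excluded from the compared images.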

2.B | Output change detection
Linac output is checked every morning prior to patient treatment and is unlikely to change significantly during the day; however, fail-

From the results of the cross-correlation function, the number of pixels required to shift the row from the second image so that it best aligns with the row from the unshifted image can be determined. Each pixel is 0.22 mm wide, so this allows us to compare the calculated shift with the magnitude of the physical shift that was performed. As we knew 1D shifts had been applied, we used a 1D cross-correlation function. In a clinical setting, where the shift is unknown, a 2D cross-correlation function could be used. The edge-detecting filter is applied to the image first to highlight areas that can be used for alignment. Without an edge-detecting filter, the cross-correlation function can erroneously attempt to minimize differences in CT noise rather than the large differences at interfaces.
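The shift estimate described above can be sketched as a brute-force 1D cross-correlation of edge-filtered rows. This is a minimal pure-Python illustration under stated assumptions: the function names are hypothetical, the edge filter is a simple central difference (the text does not say which filter was used), and the profiles are synthetic.

```python
def edge_filter(row):
    """Central-difference gradient, used here as a simple stand-in for
    an edge-detecting filter; it highlights interfaces in the profile."""
    return [row[i + 1] - row[i - 1] for i in range(1, len(row) - 1)]


def estimate_shift_pixels(reference_row, shifted_row, max_lag):
    """Return the lag (in pixels) that maximizes the cross-correlation
    between the edge-filtered reference and shifted rows."""
    ref = edge_filter(reference_row)
    mov = edge_filter(shifted_row)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            ref[i] * mov[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(mov)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


PIXEL_SIZE_MM = 0.22  # projected pixel size quoted in the text

# Synthetic profile with two interfaces, then the same profile moved
# three pixels to the right.
reference = [0] * 10 + [10] * 10 + [0] * 10
shifted = [0] * 13 + [10] * 10 + [0] * 7

shift_px = estimate_shift_pixels(reference, shifted, max_lag=5)
shift_mm = shift_px * PIXEL_SIZE_MM
```

Because the correlation is evaluated on the gradient rather than on the raw intensities, flat regions contribute nothing to the score and the result is driven by the interfaces, which is the motivation given above for filtering first.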

2.E | Patient study
To better understand the potential of this technology for a patient case, we also analyzed the daily exit dosimetry images from six prostate patients who were observed to have gas during some of their treatments. We selected for this variable as it can be

We observed large gas bubbles within the treatment field on the first fraction for several of the six patients we selected. Rather than comparing all the images to an image with a large bubble, we selected the first fraction without a gas bubble for each patient as the baseline. In future prospective applications for on-treatment monitoring, the first fraction's image could be utilized to ensure consistency. For the purposes of this work, we sought to understand the day-to-day variations present in real cases. All of the images were categorized as having or not having gas. T-tests were conducted to compare the MRD and σ RD values for images with and without gas for each patient. Because of the number of tests we conducted, multiplicity correction using the Bonferroni 36 technique was applied to these P-values. Corrected P-values < 0.05 were considered significant. In addition to measuring the MRD and σ RD , these images were also visually examined for anatomical features that could be used to detect mid-treatment changes in the patient positioning or setup. Shifts in the pubic symphysis were measured manually using ImageJ (ImageJ 1.5k, NIH, Bethesda, MD).
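The Bonferroni step can be illustrated with a short Python sketch. The function name `bonferroni` and the P-values below are hypothetical and are not results from this study; the t-tests themselves are assumed to have been run elsewhere.

```python
def bonferroni(p_values):
    """Bonferroni correction: multiply each P-value by the number of
    tests and cap the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical uncorrected P-values from four t-tests.
raw_p = [0.001, 0.004, 0.03, 0.2]

corrected_p = bonferroni(raw_p)
significant = [p < 0.05 for p in corrected_p]
```

In this example the third test would be significant before correction (0.03) but not after it (0.12), which shows why the correction matters when many patient and metric combinations are tested.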

3.A | Phantom studies
The repeatability test using the CIRS homogeneous phantom

In the third test, the homogeneous CIRS phantom was shifted to simulate patient positioning changes. The resulting MRDs followed a clear trend [Fig. 4(d)]. For the open field, σ RD ranged from 1.43% to 23.14% for the 2-

shift magnitude was highly accurate and thus is the best metric for evaluating shifts. Future analyses will include shifts in more than one direction and with rotations to better represent the variety of positioning changes that are experienced in patient data.

4 | DISCUSSION
Quantitative thresholds for the different metrics will allow treatments to be flagged for review by physicists, similar to how couch position overrides or measured differences in SSDs over the course of treatment currently trigger physicists to conduct a more in-depth review during their weekly check. It is worth noting that Halcyon does not have an optical distance indicator, so SSDs cannot be recorded, and on the daily imaging CBCT the surface is often truncated for pelvis treatments due to the small field of view; thus changes in weight could go completely unnoticed without such metrics. This truncation is even more prevalent on Halcyon machines that are only equipped with the MV CBCT. When this technique is further developed to analyze the images as they are acquired instead of posttreatment, it could also prevent mistreatments such as a spine treatment aligned to the wrong vertebral body or errors in the executed couch translations after image guidance.

FIG. 8. The effect of gas on the two metrics (mean relative difference and σ RD ) was evaluated individually for each patient to determine if there was a significant difference. Five of the six patients had significant differences (corrected P-value < 0.05) for both metrics. *P < 0.05, **P < 0.01, ***P < 0.001.

FIG. 9. Examples of the patient exit dosimetry images from three fractions as well as the calculated relative difference between these fractions and the average image. A large gas bubble is immediately visible on the relative difference images from patient 2 fraction 2, which had the largest value for mean relative difference and σ RD . The pubic symphysis is apparent on all the calculated relative difference images. Note that the MV cone-beam computed tomography images were acquired prior to setup correction.
In clinical practice, more than one of these effects could be present at any given time (e.g., patient weight loss and intrafraction patient motion). Additionally, in certain cases, one effect could even mask another, such as a slight decrease in the daily output of the beam cancelling out the effect from a decrease in patient thickness.
We limited the present analysis to investigating one effect at a time because the purpose of this study was to demonstrate that large clinically significant changes in either factor result in a measurable response from the panel. Increases in a metric would then raise a flag for the covering physicist to investigate further. Future studies will look at defining reasonable thresholds when multiple changes are present.
We also measured both the MRD and σ RD from the exit images for six patients treated clinically. The main purpose of the patient analysis was to demonstrate the feasibility of our workflow and metrics on real patient data. From the results, we observed that both MRD and σ RD were sensitive to changes arising from gas bubbles and thus could be valuable metrics to include for clinical evaluations.
In this study, we used a binary classification for the presence of gas and found that these metrics changed significantly with gas. This study had a few weaknesses, the primary of which was using the first fraction without noticeable amounts of gas as the baseline for the quantitative analysis of the patients' images. Using an image predicted by the treatment planning system would be preferable since the baseline would be the actual patient position used for the plan. However, Eclipse v15.1 cannot currently generate this predicted image. We planned to use the first fraction as the baseline, but for four of the six patients there were appreciable amounts of gas on day one and we did not want to penalize subsequent images for not having gas. We are working to develop a methodology to calculate the predicted image from the treatment plan within Eclipse or to use a third-party workflow to serve as the reference image for future studies and clinical implementation. Similar workflows have been developed by research groups and appear in the literature. 29,38

Additionally, in this current implementation, errors would not be detected until after a treatment fraction is delivered. For our initial implementation, we think this is adequate as it would serve to flag images for physics review and a deeper analysis rather than interrupting treatment for every anomaly. For certain errors such as weight loss, identifying trends over the course of several fractions would be enough to prevent treatment errors. In the long term, we plan to further develop this technique so that images can be analyzed as they are acquired, such as at every specified number of control points, allowing us to prevent potential mistreatments.
Another potential weakness of this study was that our methodology assumed IMRT treatments with fixed fields. Previously at our institution the Halcyon™ was only being used to treat static-field modulated plans; however, we recently expanded to using it for volumetric modulated arc therapy (VMAT) plans. In this situation, the anatomical landmarks we observed in the present analysis would not be visible in the integrated image because the gantry moves throughout treatment. To analyze transit images from VMAT plans, images would have to be generated and compared for the summation of the whole arc or extracted at a specified frequency, 10,11 such as at each control point of a VMAT plan. 39 This might not impact the evaluation of the quantitative metrics proposed in this study; however, the sensitivity of the panel to changes in output, patient weight, and patient alignment may differ if it is averaged over an entire arc.
Instead of the three metrics proposed in this study, gamma analysis could be used to evaluate the transit image accuracy. However, we selected not to include gamma analysis because of studies demonstrating its limited sensitivity to IMRT errors. [40][41][42] We wanted a quick quantitative indicator that something has gone wrong which scales linearly with the size of the error. We believe the MRD and calculated shift accomplish this more efficiently than gamma analysis for output or buildup changes and lateral shifts, respectively.
In this study, we focused on finding the magnitude of change that was detectable. However, these thresholds must be considered alongside the magnitude of change that is clinically relevant. 43 The clinically relevant magnitude of change will vary with treatment site and technique and potentially even with each patient, and as a result is more challenging to measure. Work by other groups has typically used gamma analysis with standard pass criteria (3%, 3 mm) 20,25,44 to determine if measured transit images pass. One study using a 3%, 3 mm criterion had false positive alerts for 10 clinically irrelevant discrepancies, 19 while in a separate clinical implementation, a gamma analysis with looser criteria of 4%, 4 mm for pixels above a 10% threshold was used but no patients in the trial ever had such large differences. 45 Determining what limits should be used in our clinic to identify relevant errors will be an area of future study. One potential methodology for evaluating the clinically relevant magnitude of change for prostate patients would be to use a phantom with dynamic bladder and rectum compartments. A treatment plan could be generated using full bladder and empty rectum conditions, and then recalculated with varying levels of empty bladder and full rectum to determine at which point there is a significant dosimetric response. Then transit dosimetry images could be collected for the same set of bladder and rectum conditions to evaluate whether the panel detects significant differences.
In this analysis, we demonstrated the feasibility of extracting and analyzing exit dosimetry images from the Halcyon™ built-in portal imager. To do this, we developed three quantitative metrics for easy comparison of daily images to baseline images, and then showed that these metrics were sensitive to small changes in beam output, patient position, and patient weight changes. In the next iteration of this study, we intend to develop a workflow for generating the exit dosimetry image prediction from Eclipse. This will produce a more reliable baseline for analyzing patient images daily and allow us to analyze the large volume of patient images we have already acquired. The long-term goal of this study was to automate the analysis of these images in order to alert clinicians that the patient may require repositioning or replanning. In the final clinical implementation, our goal is to acquire these images for every patient on treat-