Volume 49, Issue 2 p. 1161-1180
RESEARCH ARTICLE

The markerless lung target tracking AAPM Grand Challenge (MATCH) results

Marco Mueller

Corresponding Author

ACRF Image X Institute, The University of Sydney, Sydney, New South Wales, Australia

Correspondence

Marco Mueller, The University of Sydney, ACRF Image X Institute, 1 Central Ave Room 221, Sydney, NSW 2015, Australia.

Email: [email protected]

Present address

Dianne Ferguson, Department of Radiation Oncology, University of California San Francisco, San Francisco, CA 94143, USA.

Per Poulsen

Danish Center for Particle Therapy and Department of Oncology, Aarhus University Hospital, Aarhus, Denmark

Rune Hansen

Department of Medical Physics, Aarhus University Hospital, Aarhus, Denmark

Wilko Verbakel

Amsterdam University Medical Centers, Location VUmc, Amsterdam, Netherlands

Ross Berbeco

Department of Radiation Oncology, Brigham and Women's Hospital, Dana Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA

Dianne Ferguson

Department of Radiation Oncology, Brigham and Women's Hospital, Dana Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA

Shinichiro Mori

Research Center for Charged Particle Therapy, National Institute of Radiological Sciences, Chiba, Japan

Lei Ren

Department of Radiation Oncology, Duke University Medical Center, Durham, North Carolina, USA

John C. Roeske

Department of Radiation Oncology, Loyola University Medical Center, Maywood, Illinois, USA

Lei Wang

Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California, USA

Pengpeng Zhang

Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, New York, USA

Paul Keall

ACRF Image X Institute, The University of Sydney, Sydney, New South Wales, Australia

First published: 16 December 2021

Abstract

Purpose

Lung stereotactic ablative body radiotherapy (SABR) is a radiation therapy success story with level 1 evidence demonstrating its efficacy. To provide real-time respiratory motion management for lung SABR, several commercial and preclinical markerless lung target tracking (MLTT) approaches have been developed. However, these approaches have yet to be benchmarked using a common measurement methodology. This knowledge gap motivated the MArkerless lung target Tracking CHallenge (MATCH). The aim was to localize lung targets accurately and precisely in a retrospective in silico study and a prospective experimental study.

Methods

MATCH was an American Association of Physicists in Medicine sponsored Grand Challenge. Common materials for the in silico and experimental studies were the experiment setup including an anthropomorphic thorax phantom with two targets within the lungs, and a lung SABR planning protocol. The phantom was moved rigidly with patient-measured lung target motion traces, which also acted as ground truth motion. In the retrospective in silico study a volumetric modulated arc therapy treatment was simulated and a dataset consisting of treatment planning data and intra-treatment kilovoltage (kV) and megavoltage (MV) images for four blinded lung motion traces was provided to the participants. The participants used their MLTT approach to localize the moving target based on the dataset. In the experimental study, the participants received the phantom experiment setup and five patient-measured lung motion traces. The participants used their MLTT approach to localize the moving target during an experimental SABR phantom treatment. The challenge was open to any participant, and participants could complete either one or both parts of the challenge. For both the in silico and experimental studies the MLTT results were analyzed and ranked using the prospectively defined metric of the percentage of the tracked target position being within 2 mm of the ground truth.

Results

A total of 30 institutions registered and 15 result submissions were received, four for the in silico study and 11 for the experimental study. The participating MLTT approaches were: Accuray CyberKnife (2), Accuray Radixact (2), BrainLab Vero, C-RAD, and preclinical MLTT (5) on a conventional linear accelerator (Varian TrueBeam). For the in silico study the percentage of the 3D tracking error within 2 mm ranged from 50% to 92%. For the experimental study, the percentage of the 3D tracking error within 2 mm ranged from 39% to 96%.

Conclusions

A common methodology for measuring the accuracy of MLTT approaches has been developed and used to benchmark preclinical and commercial approaches retrospectively and prospectively. Several MLTT approaches were able to track the target with sub-millimeter accuracy and precision. The study outcome paves the way for broader clinical implementation of MLTT. MATCH is live, with datasets and analysis software being available online at https://www.aapm.org/GrandChallenge/MATCH/ to support future research.

1 INTRODUCTION

Stereotactic ablative body radiation therapy (SABR) is a successful and widely implemented treatment technique for unresectable primary or metastatic thoracic malignancies. Radiation doses with a biologically effective dose of 100 Gy or higher correlate with increased tumor control1, 2 and have been found to improve clinical outcomes for early-stage lung cancers.3-6 The positive results of the TROG 09.02 CHISEL trial7 provide level 1 evidence for the choice of SABR regimens in appropriately selected patients. However, current state-of-the-art techniques in radiation therapy are approaching the physical limits of shaping high doses to the target volume. Much is yet to be gained by improving motion control and imaging for radiation therapy.8 SABR treatment outcomes and efficacy may be limited by the fact that patients move, breathe, and their hearts beat, causing both inter- and intra-fraction tumor and organ at risk (OAR) motion during imaging and treatment. Therapy for lung targets has previously resulted in excess toxicity because of the proximity of critical OARs.9 The accumulated dose to OARs is limiting the potential of delivering multiple SABR treatments for patients with re-occurring cancers or multiple metastases.10 Adaptive radiotherapy with image guidance is a potential solution to further improve SABR treatment in the presence of motion. The American Association of Physicists in Medicine (AAPM) Task Groups 76: The management of respiratory motion in radiation oncology11 and 101: Stereotactic body radiation therapy12 have highlighted the need for accessible and cost-effective real-time image guidance strategies during radiation therapy.

The current clinical strategies to overcome target and OAR motion are a tradeoff between patient tolerance and treatment accuracy. Motion is most commonly accounted for by enlarging the target volume,13 a tolerable but less accurate approach, which potentially increases dose to the healthy tissue surrounding the target. Alternative solutions are surrogate-based target tracking techniques, which track surgically inserted radio-opaque markers14-16 or electro-magnetic transponders17-19 to estimate the target position. Although current surrogate-based target tracking techniques can achieve a high surrogate tracking accuracy, they do not track the lung target directly20-22 and are not suitable for or tolerated by all patients because of the associated costs. These costs include the monetary expense of the markers, the cost of the implantation procedure, the toxicity of the procedure,23-26 the time taken and treatment delays due to the procedure, and the potential for treatment errors if the markers migrate between planning and treatment or if marker motion differs from target motion.27

We define markerless lung target tracking (MLTT) as a real-time target position monitoring system28 for lung cancer radiation therapy applications where no exogenous materials (such as fiducial markers) are implanted into the patient to aid with the image guidance process. MLTT can localize the treatment target during treatment delivery and can be combined with several motion management methods for SABR treatments, such as exception gating, correcting for baseline drifts, respiratory gating, couch tracking or multi-leaf collimator tracking. MLTT aims to improve patient safety, reduce treatment margins and account for the breath-to-breath and day-to-day variation in lung target motion. MLTT does not require surgical interventions and is therefore better tolerated by lung cancer patients. It does not depend on, but is complementary to, techniques that restrict breathing, such as deep inspiration breath-hold,29 active breath-hold,30 or abdominal compression.31

Several commercial and preclinical MLTT approaches32-44 have been developed. Current commercial MLTT systems such as the Accuray CyberKnife have been used clinically since 2009;45-49 however, they are not widely available because of their higher costs compared with conventional radiotherapy systems. MLTT approaches on a conventional linear accelerator (linac) have been proposed using either one or both of the on-board kilovoltage (kV) imaging system50 or the electronic portal imaging device (EPID),51, 52 although only approaches using the kV imaging system have been implemented prospectively in a clinical environment.40, 53 As illustrated in Figure 1, we hypothesize that MLTT is at the tipping point to evolve from a state-of-the-art approach on a few specialized clinical systems to an implementation on accessible conventional linacs.

FIGURE 1. Clinical status and growth of X-ray-guided markerless lung target tracking (MLTT) with approximate first clinical dates based on publication dates. The first MLTT clinical implementations were on less available treatment systems;45-49 more recently, MLTT was clinically implemented on a conventional linac.53 At the time of the MArkerless lung target Tracking CHallenge (MATCH), we are at the tipping point for MLTT to evolve from a state-of-the-art approach on specialized systems to an implementation on widely accessible conventional linacs

The MLTT approaches developed to date have yet to be benchmarked using a common framework. The purpose of the MArkerless lung target Tracking CHallenge (MATCH) was to bridge this knowledge gap and provide methods, data, and tools to benchmark current and future MLTT approaches. MATCH was designed for X-ray-guided radiation therapy systems. The in silico and experimental challenge design excluded magnetic resonance imaging and ultrasound guidance approaches as the phantom is not suitable for these modalities. For the purposes of MATCH, the tumor motion data used were from free, non-restricted, breathing patients.

2 MATERIALS AND METHODS

2.1 Overview

MATCH consisted of two parts: a retrospective in silico study and a prospective experimental study. An overview is shown in Figure 2.

FIGURE 2. The workflow for the MArkerless lung target Tracking CHallenge (MATCH) in silico study and experimental study

2.2 Common materials

2.2.1 The experiment setup and treatment planning protocol

Common to both parts of MATCH were an anthropomorphic thorax phantom54 and a lung SABR planning protocol. The phantom is shown in Figure 3. The phantom consisted of a 3D printed anatomy taken from a patient, including blood vessels, mediastinum with trachea, and bony structures. In contrast to a patient, the phantom could be moved with known displacements, offering a benchmark for the target position.55 The computed tomography (CT) scan and X-ray images of the phantom resembled those of a patient. The phantom included three targets with a diameter of 2 cm, two of water-equivalent density and one of half water-equivalent density. Only the two targets of water-equivalent density were used for MATCH. The lung SABR planning protocol was taken from RTOG 0915,56 using the 12 Gy/fraction arm for expediency of measurements (12 Gy/fraction, internal target volume (ITV): D100% = 100% (12 Gy), planning target volume (PTV): D98% ≥ 100% (12 Gy)).

FIGURE 3. The experimental setup used in both studies consisted of the anthropomorphic thorax phantom attached to a modified HexaMotion platform. This setup enabled patient-measured 3D motion of the phantom during treatment delivery

2.3 In silico study

2.3.1 Goal

The goal of the in silico study was the benchmarking of different MLTT approaches on identifying blinded motion. Therefore, MLTT approaches were challenged to accurately and precisely identify the position of the target with time (t,x,y,z) during a volumetric modulated arc therapy (VMAT) treatment on a conventional linac. Four treatments were simulated in a phantom experiment and all treatment planning and delivery data were provided to the participants.

2.3.2 Workflow

The simulated lung SABR VMAT workflow with intra-fractional imaging consisted of three stages as shown in Figure 4. The anthropomorphic thorax phantom was fixed on a modified HexaMotion platform where a custom-made wooden board replaced the normal Delta4 device as shown in Figure 3. To simulate 3D motion, the HexaMotion platform moved the phantom with patient-measured motion traces. The imaging and planning data were generated at Aarhus University Hospital.

FIGURE 4. Workflow of the data acquisition as presented to the participants in the in silico study. The three phases (treatment planning, treatment setup, and volumetric modulated arc therapy (VMAT) treatment delivery with intra-fractional imaging) were performed with the anthropomorphic thorax phantom

To simulate treatment planning, a static 3D- and a 4D-CT with motion were acquired from a SOMATOM go.Open Pro scanner (Siemens Healthineers, Forchheim, Germany) for each of the two selected lung targets in the anthropomorphic thorax phantom, and lung SABR treatment plans were created. On the experiment day, treatment setup 3D- and 4D-cone beam CTs (CBCTs) were acquired under motion and reconstructed on a TrueBeam linac (Varian Medical Systems, Palo Alto, CA, USA). On the linac, the kV imaging source and the kV detector were mounted on the gantry perpendicular to the megavoltage (MV) imaging system, which consisted of the MV treatment source and an EPID. The motion traces “high motion range” and “high complexity” were assigned to target 1, “mean motion range” and “mean complexity” were assigned to target 2. For each combination, a single arc VMAT treatment was delivered. Intra-fractional kV and MV images were acquired with an imaging rate of 11 and 12.7 Hz, respectively. Afterwards, all treatment planning and intra-fractional imaging data were converted to DICOM format and provided for download to the participants. A detailed description of the dataset can be found on the MATCH website. Participants were given 6 months to apply their MLTT approaches to the imaging data and to identify the blinded intra-fractional lung target motion.

2.3.3 Lung target motion traces

The lung target motion traces were taken from the clinical study Evaluating an Anchored Transponder in Lung Cancer Patients Receiving Radiation Therapy (NCT01396551)22 using measurements of the centroid position of Calypso beacons implanted near lung tumors. The original dataset contained 45 patients with 605 fractions of motion measurements recorded at 10 Hz. Following AAPM TG76 guidelines,57 fractions with a range of respiratory motion ≤5 mm were excluded. Because the range of motion and the motion complexity are two factors known to affect lung tracking, four fractions from three patients were selected from the remaining data for the in silico study. The selected patient traces represented motion with mean motion range, high motion range, mean complexity, and high complexity. The datasets were shared through a data usage agreement with Washington University in St. Louis.

The details of the selected motion traces are given in Figure 5 and Table 1. To ensure independence, only one institution (Aarhus University Hospital) had access to the data prior to the challenge submission deadline.

FIGURE 5. Plots of the ground truth lung target motion in the left–right (LR, blue), superior–inferior (SI, red), and anterior–posterior (AP, yellow) direction for each of the experiments in the in silico study
TABLE 1. Characteristics of the lung target motion traces used for the in silico study

| Motion trace name | SD of displacement, LR (mm) | SD of displacement, SI (mm) | SD of displacement, AP (mm) | Average period (s) |
|---|---|---|---|---|
| Mean motion range | 0.4 | 4.8 | 0.9 | 3.3 |
| High motion range | 0.2 | 5.6 | 1.1 | 2.6 |
| Mean complexity | 0.8 | 3.2 | 1.1 | 2.4 |
| High complexity | 0.6 | 4.8 | 1.9 | 3.1 |

  • Abbreviations: AP, anterior–posterior; LR, left–right; SD, standard deviation; SI, superior–inferior.

2.4 Experimental study

2.4.1 Goal

The goal of the experimental study was the evaluation of clinical feasibility and benchmarking of different MLTT approaches to identify target motion prospectively in a clinical environment. Participants were asked to submit the measured intra-fraction target position with time (t,x,y,z), all acquired clinical data, and an estimate of the end-to-end system latency of their MLTT approach. The experiment hardware and a description of the prospective phantom experiments were provided to the participants.

2.4.2 Workflow

The workflow of the experimental study is shown in Figure 2. Each institution used the anthropomorphic thorax phantom, the RTOG 0915 treatment planning guidelines and five provided motion traces. The institution-specific materials included the MLTT approach, a HexaMotion platform or programmable treatment couch, a clinical CT scanner, a radiation therapy planning system, and a radiation therapy delivery system. The anthropomorphic thorax phantom was scanned and planned in accordance with the department protocol (e.g., 3D/4D). Treatment plans for each of the two targets were developed. Each treatment plan was delivered to the phantom moving with the motion trace using the institution's MLTT approach and the end-to-end system latency was estimated. After the experiments, all required data were submitted. If an MLTT approach was not implemented prospectively, the experiments could still be conducted, and the target position could be identified retrospectively in the acquired data. However, the results were then excluded from the MATCH ranking. Three copies of the phantom experiment setup were available and had to be shipped internationally between institutions. Participants were given 3 weeks to complete the experiments.

2.4.3 Lung target motion traces

The patient-measured lung target motion traces are presented in Figure 6 and Table 2. The five traces consisted of one SI-sinusoidal trace and four 3D patient-measured respiratory motion traces previously used in a multi-institutional real-time adaptive radiotherapy comparative study.58 Apart from the SI-sinusoidal trace, all the traces were originally acquired on a CyberKnife Synchrony machine from lung cancer patients at a sampling rate of 25 Hz and smoothed for practical use.59 The traces were selected to represent a range of typical and atypical lung target motions: SI-sinusoidal, typical lung, high frequency, left–right (LR) dominant, and baseline shift.

FIGURE 6. Excerpts of the lung target motion traces in the left–right (LR, blue), superior–inferior (SI, red), and anterior–posterior (AP, yellow) direction used in the experimental study
TABLE 2. Characteristics of the lung target motion traces used for the experimental study

| Motion trace name | SD of displacement, LR (mm) | SD of displacement, SI (mm) | SD of displacement, AP (mm) | Average period (s) |
|---|---|---|---|---|
| SI-sinusoidal | 0 | 3.5 | 0 | 4.0 |
| Typical lung | 0.2 | 2.0 | 0.8 | 7.1 |
| High frequency | 0.3 | 3.2 | 0.2 | 2.4 |
| LR dominant | 4.1 | 0.4 | 1.2 | 3.3 |
| Baseline shift | 2.9 | 0.9 | 2.7 | 2.9 |

  • Abbreviations: AP, anterior–posterior; LR, left–right; SD, standard deviation; SI, superior–inferior.

2.5 Requested result submission data

The following files were submitted by the participants:
  1. Target position with time (t,x,y,z): the target position (x: left–right, y: superior–inferior, z: anterior–posterior) was given as the position of the target center of mass with respect to the treatment isocenter (0,0,0) in millimeters. The time sampling rate could be aligned with the projection frequencies, for example, one target position (t,x,y,z) for each kV or MV projection t. The result was to be submitted as a four-column table (t|x|y|z) as a Text, Matlab, or Excel file.

  2. Details of the tracking algorithm: the participants were asked to provide a detailed description of their approach, supported by screenshots or related publications. This was to be submitted as a PDF document with supplementary files of common file types.

  3. End-to-end system latency: as part of the prospective experimental study, an estimate of the end-to-end system latency had to be supplied if no motion prediction algorithm was used to overcome latency.

  4. Clinical data: to support the integrity of the results, participants of the experimental study were requested to submit the CT image set(s), treatment plans, tracking algorithm images, treatment logs, treatment time, and an estimate of the intra-treatment imaging dose.
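As an illustration of the requested four-column result format, the following minimal Python sketch reads a plain-text table of (t, x, y, z) rows. The function name `parse_submission`, the `TargetSample` type, the accepted separators, and the choice of seconds as the time unit are assumptions made for this example; they are not part of the MATCH submission specification.

```python
from typing import List, NamedTuple


class TargetSample(NamedTuple):
    t: float  # time stamp (seconds assumed for this sketch)
    x: float  # left-right position relative to isocenter, mm
    y: float  # superior-inferior position relative to isocenter, mm
    z: float  # anterior-posterior position relative to isocenter, mm


def parse_submission(text: str) -> List[TargetSample]:
    """Parse rows of t, x, y, z separated by whitespace, commas, or pipes."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.replace("|", " ").replace(",", " ").split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        samples.append(TargetSample(*(float(f) for f in fields)))
    return samples
```

Rejecting rows that do not have exactly four columns surfaces formatting problems early, before any comparison with the ground truth is attempted.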

2.6 Analysis of markerless lung target tracking submissions

The challenge for the MLTT approaches was to identify the intra-fractional position of the target accurately and precisely with time. The result submissions were analyzed in Matlab. Although several metrics could be selected, a single value was needed for ranking. For MATCH, the prospectively defined primary metric was the percentage of target localizations with a 3D tracking error <2 mm. Additionally, the mean tracking error and standard deviation in each dimension were calculated for each motion trace. For the in silico study, the mean 2D error and standard deviation for locating the target in the kV-projection were determined by forward-projecting both the tracked and the ground truth 3D target position onto the intra-fractional image. The tracking margin components were calculated in each of the directions LR, superior–inferior (SI), and anterior–posterior (AP) according to Van Herk's formula60:
$$M = 2.5\,\Sigma + 0.7\,\sigma$$

Here, the systematic error Σ was calculated as the standard deviation of the mean tracking error across all experiments and the random error σ was calculated as the root mean square of the standard deviation of the tracking error across all experiments.
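The primary metric and the margin component can be sketched in a few lines of Python. This is a minimal illustration, not the MATCH analysis software (which was written in Matlab): the function names are hypothetical, the population standard deviation is an assumed choice, and the margin uses the common form of Van Herk's recipe, 2.5Σ + 0.7σ.

```python
import math
from statistics import mean, pstdev


def percent_within(tracked, truth, tol_mm=2.0):
    """Percentage of samples whose 3D tracking error is below tol_mm."""
    errors = [math.dist(p, q) for p, q in zip(tracked, truth)]
    return 100.0 * sum(e < tol_mm for e in errors) / len(errors)


def margin_component(errors_per_experiment):
    """Margin (mm) in one direction from per-experiment 1D tracking errors.

    errors_per_experiment: list of lists, one list of errors (mm) per experiment.
    """
    means = [mean(e) for e in errors_per_experiment]
    sds = [pstdev(e) for e in errors_per_experiment]
    # Systematic error: SD of the mean tracking error across experiments.
    big_sigma = pstdev(means)
    # Random error: root mean square of the per-experiment SDs.
    small_sigma = math.sqrt(mean(s * s for s in sds))
    return 2.5 * big_sigma + 0.7 * small_sigma
```

For example, two experiments with constant errors of 1 mm and 3 mm have no random component and a systematic error of 1 mm, giving a margin of 2.5 mm in that direction.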

In both the in silico study and the experimental study the lung target motion traces were used as ground truth. For the in silico study dataset, the ground truth target position for each intra-fractional image was determined by temporally aligning the ground truth phantom motion trace with the signal from the Real-Time Position Management (RPM) system (Varian Medical Systems), which was mounted on the HexaMotion platform. Besides being exported as DICOM files from the accelerator after the treatments, the cine MV and perpendicular kV images were also streamed as Varian XIM files by the frame grabber software iTools Capture (Varian Medical Systems) during treatment delivery. The XIM file image headers included the phantom position as recorded at the time of image exposure by optical monitoring using the RPM. By synchronizing this position with the known patient-measured motion trace the phantom position at each image acquisition was found and used as the ground truth position. The MLTT results were matched accordingly.

In the experimental study, the markerless tracking results were matched with the ground truth in a twofold optimization process. Simultaneously, the temporal alignment between ground truth and tracked target motion was optimized, and the sampling rate of the ground truth motion was optimized to best match the tracked target motion. This optimization approach was a brute force method to maximize correlation. The ground truth sampling rate was optimized within 50.0 ± 2.5 Hz (expected sampling rate ± uncertainty). The uncertainty of ±2.5 Hz for the ground truth sampling rate was a generous margin to include possible variations in the experimental sampling speed and reproducibility of the HexaMotion platform. After the optimization process, the sampling rate of the tracked target motion was interpolated to match the ground truth sampling rate. All results were generated without taking the end-to-end system latency into account. When an MLTT approach used prediction algorithms to compensate for latency, the predicted target position was used for the results analysis. If an MLTT approach was not implemented prospectively, the results were analyzed similarly, but were excluded from the final MATCH ranking.
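The twofold brute-force optimization can be illustrated as a grid search over candidate sampling rates and time shifts that keeps the pair with the highest correlation. The sketch below is a simplified, assumption-laden version: it works on a single motion axis, uses linear interpolation and the Pearson correlation, and all function names and grids are hypothetical rather than taken from the MATCH analysis code.

```python
import math


def interpolate(times, values, rate_hz):
    """Linearly interpolate a uniformly sampled trace at the given times (s).

    Returns None if any requested time falls outside the trace.
    """
    out = []
    for t in times:
        pos = t * rate_hz
        i = math.floor(pos)
        if i < 0 or i + 1 >= len(values):
            return None
        frac = pos - i
        out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return out


def pearson(a, b):
    """Pearson correlation coefficient; -1.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return -1.0
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(va * vb)


def align(truth, tracked_t, tracked_v, rates_hz, shifts_s):
    """Try every (rate, shift) pair and return (correlation, rate, shift)."""
    best = (-2.0, None, None)
    for rate in rates_hz:
        for shift in shifts_s:
            ref = interpolate([t + shift for t in tracked_t], truth, rate)
            if ref is None:
                continue  # shifted window leaves the ground truth trace
            r = pearson(ref, tracked_v)
            if r > best[0]:
                best = (r, rate, shift)
    return best
```

In this sketch, a rate grid spanning 47.5 to 52.5 Hz would mirror the 50.0 ± 2.5 Hz search range described above; the grid resolution is a free choice that trades run time against alignment accuracy.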

To remove non-approach-related errors between the result submissions, feedback was provided to the participants or submissions were corrected by the organizers, if the following errors were seen by the MATCH analyst:
  1. Incorrect units, for example, cm/mm, s/ms.

  2. Switched dimension, for example, SI in place of AP.

  3. Wrong sign of dimension, for example, IS instead of SI.

  4. Obvious erroneous offset, for example, a 5 cm shift of the target position.

  5. Wrong motion trace.

  6. Wrong tumor being tracked.

  7. Systematic offset larger than 3 mm in any direction (LR, SI, or AP).

All data submissions in the experimental study underwent a correction for study design-related errors in an individual procedure for each MLTT approach. Where large discrepancies (>3 mm) for the mean tracking error were found, the participants were questioned and given the opportunity to explain the reason for the discrepancy. If a reasonable explanation was found for an error, such as usage of an incorrect coordinate system, the results were corrected. The coordinate systems of the ground truth and the tracking were reasonably aligned based on information taken solely from the submitted clinical data and a systematic offset was added to the tracked target motion. The individual approaches to align systematic shifts between the initially submitted results and the ground truth are described in the following description of the participating MLTT approaches.
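A check for large systematic offsets of this kind could look like the following Python sketch. It flags every direction in which the mean tracking error exceeds 3 mm; the function name and the return format are assumptions for illustration, and the organizers' actual procedure was individual to each MLTT approach.

```python
from statistics import mean

AXES = ("LR", "SI", "AP")


def systematic_offsets(tracked, truth, limit_mm=3.0):
    """Return {axis: mean error in mm} for axes whose |mean error| > limit_mm.

    tracked and truth are time-matched sequences of (LR, SI, AP) positions in mm.
    """
    flagged = {}
    for k, axis in enumerate(AXES):
        offset = mean(p[k] - q[k] for p, q in zip(tracked, truth))
        if abs(offset) > limit_mm:
            flagged[axis] = offset
    return flagged
```

An empty result means that no direction exceeded the limit; a flagged axis would prompt a query to the participant rather than an automatic correction.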

2.7 Participating markerless lung target tracking approaches

MATCH was open to any participant, and participants could complete either one or both parts of the challenge. MATCH was advertised on the AAPM Grand Challenge website (https://www.aapm.org/GrandChallenge/MATCH/). Additionally, potential participants were proactively identified and contacted by searching the 2017–2019 AAPM annual meeting abstracts for the keyword “markerless” and reviewing the top 50 hits of a Google Scholar search for “markerless lung tracking radiotherapy”.

2.7.1 Radixact 1 and Radixact 2

Detailed technical descriptions of the Synchrony on Radixact have been reported elsewhere.61 In summary, a motion correlation model for the target was created based on LED markers on the surface and images from a gantry-mounted kV-imager. During treatment delivery, the target was located based on the LED marker surrogacy every 10 ms and the correlation model was updated based on kV images. The time between kV images was between 4.5 and 7.3 s for both plans. Radixact uses a prediction algorithm to overcome system latency. All Radixact delivery records were automatically saved in the system. Accuray provided Matlab code to extract the necessary tracking data from each delivery record and save them as an Excel file.

Two individual Radixact systems from two separate institutions participated in MATCH. The MATCH organizers found a systematic shift of >3 mm in the initial submission of Radixact 2. Therefore, for Radixact 2, possible phantom setup errors in each delivery were further corrected by evaluating the difference between the detected and the projected target positions, with phantom motion from the motion trace considered, on all acquired radiograph images.

2.7.2 CyberKnife 1 and CyberKnife 2

Detailed technical descriptions of the XSight Lung Tracking System on CyberKnife have been reported elsewhere.62 In summary, a motion correlation model for the target was created based on attached surface markers on the phantom and kV-projections acquired with two perpendicular kV-imagers in a static orientation. During treatment delivery, the target was located based on the surface marker surrogacy every 10 ms and the correlation model was updated based on kV images.
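The general idea of a surrogate-to-target correlation model that is refreshed whenever a new kV image localizes the target can be sketched as follows. This is a hypothetical one-dimensional linear least-squares model for illustration only; the vendor's actual model form, window length, and update logic are not described in this article, and `LinearCorrelationModel` and `max_pairs` are invented names.

```python
from collections import deque


class LinearCorrelationModel:
    """Linear map from an external surrogate signal to a target position (mm)."""

    def __init__(self, max_pairs=15):
        # Keep only the most recent image-based pairs (assumed window length).
        self.pairs = deque(maxlen=max_pairs)
        self.slope = 0.0
        self.intercept = 0.0

    def add_image_pair(self, surrogate, target_mm):
        """Add a (surrogate, image-detected target) pair and refit the model."""
        self.pairs.append((surrogate, target_mm))
        self._fit()

    def _fit(self):
        n = len(self.pairs)
        mean_s = sum(s for s, _ in self.pairs) / n
        mean_t = sum(t for _, t in self.pairs) / n
        var_s = sum((s - mean_s) ** 2 for s, _ in self.pairs)
        if var_s == 0:
            # Not enough surrogate variation yet: fall back to the mean position.
            self.slope, self.intercept = 0.0, mean_t
            return
        cov = sum((s - mean_s) * (t - mean_t) for s, t in self.pairs)
        self.slope = cov / var_s
        self.intercept = mean_t - self.slope * mean_s

    def predict(self, surrogate):
        """Estimate the target position from the surrogate between images."""
        return self.slope * surrogate + self.intercept
```

Between images, `predict` can be evaluated at the surrogate sampling rate, while `add_image_pair` is called only at the much lower imaging rate.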

Two individual CyberKnife systems from two separate institutions participated. During the delivery, the imaging rate was set to acquire kV-projections every 45 s (CyberKnife 1) and 60 s (CyberKnife 2), which were used to update the correlation model and ensure the CyberKnife was tracking the target accurately. Following the deliveries, the log files of the treatment were exported from the treatment machine and the target motion trace was extracted from the Predictor.log file. The Predictor.log file stored the predicted target position and associated error between the previous prediction and current correlation location. The results were converted into the result submission format by adding the respiratory center (a CyberKnife-specific value63 describing the middle point of the respiration peak and valley positions as observed in the planning 4D-CT) of each treatment plan to the predicted target position. The XSight Lung Tracking System on CyberKnife used a prediction algorithm.

2.7.3 C-RAD catalyst

For each motion trace, a correspondence model was constructed to track the tumor motion during radiation delivery. The phantom and the HexaMotion platform were set up on the CT scanner couch. The displacement of one point on the phantom's surface was recorded by Sentinel (C-RAD AB, Uppsala, Sweden) as the surrogate signal. The 4D-CT phantom images were sorted into 10 respiratory phases retrospectively using the vendor-supplied amplitude-binning algorithm. To extract the internal motion data, deformable image registration was performed between the end-of-exhalation phase and the other respiratory phases using the DIRART suite (v1.0a) in Matlab. After exporting the surrogate signal from the C-RAD software (v5.4.2), the average displacement of the phantom surface at each respiratory phase was calculated from the surrogate signal. Principal component analysis was used to fit the correspondence model for each trace. For treatment, the phantom and the HexaMotion platform were set up on the treatment delivery couch, and the surrogate signal was recorded for each trace by CatalystHD (C-RAD AB) and exported from the C-RAD software. The model takes the surrogate signal as input and delivers the motion trace as output. This approach was implemented retrospectively and therefore the end-to-end system latency has not been reported.

2.7.4 Vero

Detailed technical descriptions of the Vero4DRT can be found elsewhere.45, 64 In summary, the external motion of an infrared (IR) marker pad is correlated with the internal target motion extracted from kV-projections. The model is updated if deviations >3 mm occur between the prediction of the model and the measured target position in the kV-projections.

At treatment, an IR respiratory marker pad was placed on the phantom as an external motion surrogate. ExacTrac imaging and a six-degree-of-freedom robotic couch were used to position the phantom at isocenter. A dual-fluoroscopy sequence was acquired over 12–20 s (∼3 respiratory cycles) at an optimized gantry/ring angle selected to maximize the visibility of the gross tumor volume by avoiding overlap with the spine or other internal features. A correlation model between the external IR surrogate and the internal target was built by the system. The tracking feature on the Vero4DRT was achieved by moving the linac assembly (waveguide and collimation) on a 2D gimbal. Dual-kV images were acquired periodically (∼1 s) and predicted-versus-detected target positions were reported live. A tracking uncertainty threshold was set such that if the predicted-versus-detected target position deviated by >3 mm three times in a row (user-defined), the MV beam was terminated and the correlation model was updated. The treatment plans for the “SI-sinusoidal” and “high frequency” motion scenarios were not delivered, because the AP motion component was considered too small to be detected as an external IR surrogate.
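The tracking uncertainty threshold described above amounts to a consecutive-violation counter. The Python sketch below illustrates that rule under two assumptions that are not stated in the source: the deviation is taken as the 3D Euclidean distance, and the counter restarts after each beam interruption. The function name is hypothetical.

```python
import math


def beam_hold_indices(predicted, detected, threshold_mm=3.0, consecutive=3):
    """Indices of kV checks at which the beam would be interrupted.

    predicted, detected: time-matched sequences of 3D positions in mm.
    """
    holds = []
    run = 0  # number of consecutive checks above the threshold
    for i, (p, d) in enumerate(zip(predicted, detected)):
        if math.dist(p, d) > threshold_mm:
            run += 1
        else:
            run = 0
        if run == consecutive:
            holds.append(i)
            run = 0  # the model is rebuilt, so counting starts again
    return holds
```

Requiring several violations in a row makes the rule tolerant of isolated detection outliers at the cost of a delayed reaction to a genuine model drift.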

An absolute shift in the tracking position log data was applied to align with the HexaMotion phantom motion. This shift was obtained from the correlation model data file by calculating the mean values of the X, Y, and Z “predicted” target positions. The Vero4DRT used a prediction algorithm and therefore the end-to-end system latency was considered to be insignificant.

2.7.5 Conventional Linac 1

The MLTT implementation Monoscopic Sequential Stereo was based on the non-clinical software suite RapidTrack,65 which combined 2D template-based tracking on projection images66 with sequential stereo imaging67, 68 to estimate the 3D position of the target. The hardware consisted of a single kV source and imager. From the contoured planning CT, a 2D template of the 3D structure, including an additional margin, was generated as a band-pass-filtered digitally reconstructed radiograph (DRR). This process was repeated for several user-specified imaging angles along the treatment arc to form a 2D template library. During treatment, continuous kV-projections were acquired, pre-processed, and analyzed. Based on the imaging angle, the corresponding template was selected from the library and its location was matched to the projection using normalized cross-correlation. Monoscopic Sequential Stereo then calculated the 3D position through triangulation with previously acquired projections. Their minimal angular separation and arc length were defined by the user retrospectively. To triangulate the 3D position at the start of the treatment, at least a partial arc of projection images, for example, from a setup CBCT acquisition or a dedicated 20° subarc acquisition, had to be acquired before switching on the treatment beam. This approach was implemented retrospectively, and the potential end-to-end system latency was estimated to be <300 ms.
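As an illustration of the matching step, the sketch below locates a template in a projection by exhaustive normalized cross-correlation. It is not the RapidTrack code: the function name `match_template_ncc` is hypothetical, no band-pass filtering or search-window restriction is applied, and a brute-force loop is used for clarity rather than speed.

```python
import numpy as np

def match_template_ncc(image, template):
    """Find the template position that maximizes normalized cross-correlation.

    Returns ((row, col), score), where (row, col) is the top-left corner of
    the best-matching patch and score lies in [-1, 1].
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = t_norm * np.sqrt((p ** 2).sum())
            if denom == 0.0:
                continue  # flat patch or flat template: correlation undefined
            score = (t * p).sum() / denom
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

The peak score can also serve as a confidence value: a low maximum suggests that the target was not reliably detected in that projection.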

2.7.6 Conventional Linac 2

Simultaneous Sequential Stereo followed a similar approach to Monoscopic Sequential Stereo but relied on a stereoscopic imager setup consisting of two perpendicular kV sources and imagers, 90° apart. This method was implemented retrospectively on a conventional linac with only one kV-imager through a combination of two treatment delivery arcs with a constant 90° imaging offset. Apart from having two immediate tracking rays available for triangulation at each time instance, this method also allowed a fallback to the Monoscopic Sequential Stereo approach if the target was detected on only one of the two images. If the target was detected on neither of the two images, no location information was provided. This approach was implemented retrospectively and therefore the end-to-end system latency has not been reported.
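Triangulation from two tracking rays can be sketched as finding the midpoint of the shortest segment connecting them. This is a generic geometric illustration under the assumption that each ray is given by a source position and a direction toward the detected 2D target location; the function name `triangulate_two_rays` is hypothetical and does not reflect the participants' code.

```python
import numpy as np

def triangulate_two_rays(origin1, direction1, origin2, direction2):
    """Estimate a 3D point from two rays as the midpoint of closest approach.

    Returns (point, gap), where gap is the shortest distance between the rays
    and can be used as a consistency check on the two detections.
    """
    o1 = np.asarray(origin1, dtype=float)
    o2 = np.asarray(origin2, dtype=float)
    d1 = np.asarray(direction1, dtype=float)
    d2 = np.asarray(direction2, dtype=float)
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = o1 - o2
    b = d1 @ d2
    denom = 1.0 - b * b
    if denom < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    d = d1 @ w
    e = d2 @ w
    # Ray parameters that minimize the distance between the two lines
    t1 = (b * e - d) / denom
    t2 = (e - b * d) / denom
    p1 = o1 + t1 * d1
    p2 = o2 + t2 * d2
    return 0.5 * (p1 + p2), float(np.linalg.norm(p1 - p2))
```

With two imagers 90° apart the rays are close to perpendicular, which keeps `denom` near 1 and makes the estimate well conditioned; for sequential stereo with small angular separation the same formula becomes increasingly sensitive to detection noise.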

2.7.7 Conventional Linac 3

This method had the same workflow as Monoscopic Sequential Stereo; however, a deep learning approach replaced the 2D template matching. In particular, a deep regression model $p_c = f_\Theta(I_c \mid I_t, p_t)$ was used to approximate the probability $p_c$ of the projected 3D target center-of-mass (COM) on the current projection image $I_c$, given the template image $I_t$ and its COM probability $p_t$, where $\Theta$ denotes the network parameters. A 2D Gaussian distribution centered at the projected COM was used for $p_t$. A similarity transformation was applied to $I_c$ and $p_c$ during the training process to impose constraints on possible motions between a template and the current projection. The template images were DRRs of the planning CT and the current projections were pre-treatment kV-projections. The Monoscopic Sequential Stereo approach as described in Conventional Linac 1 was used to estimate the 3D target position from the 2D COM location. This approach was implemented retrospectively and therefore the end-to-end system latency has not been reported.

2.7.8 Conventional Linac 4

This method combined the output from two approaches to achieve a better overall performance using a set of selection and weighting criteria. For the in silico study, it combined the output from Conventional Linac 1 and Conventional Linac 3. In the experimental study, it combined the output from Conventional Linac 2 and Conventional Linac 3. The combination improved on the individual approaches in two ways. First, for frames taken at difficult detection angles, at least one of the two approaches would still detect the target. Second, when both approaches detected the target, the output results were weighted with a static factor such that the outcome of Conventional Linac 3 was preferred if both approaches predicted a similar 3D position. The weights were optimized retrospectively for the specific use case. The optimization parameters were the fraction of tracked projections and the confidence and accuracy of the measured target position. This approach was implemented retrospectively and therefore the end-to-end system latency has not been reported.
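The selection and weighting idea can be sketched as follows. This is a loose illustration, not the participants' algorithm: the function name `combine_estimates`, the weight of 0.7, and the 2 mm similarity distance are all hypothetical values. The source also does not describe how disagreeing estimates were handled, so this sketch simply reports no position in that case as a placeholder choice.

```python
import numpy as np

def combine_estimates(pos_a, pos_b, weight_b=0.7, similarity_mm=2.0):
    """Combine two 3D position estimates (mm); None marks a failed detection.

    pos_b plays the role of the preferred approach. weight_b and
    similarity_mm are illustrative values, not taken from the study.
    """
    if pos_a is None and pos_b is None:
        return None  # neither approach detected the target
    if pos_a is None:
        return np.asarray(pos_b, dtype=float)  # fall back to the only detection
    if pos_b is None:
        return np.asarray(pos_a, dtype=float)
    a = np.asarray(pos_a, dtype=float)
    b = np.asarray(pos_b, dtype=float)
    if np.linalg.norm(a - b) <= similarity_mm:
        # Both agree: static weighting that favors the preferred approach
        return (1.0 - weight_b) * a + weight_b * b
    return None  # placeholder: behavior for disagreeing estimates is not stated
```

The first three branches are what raise the fraction of tracked projections; the weighted branch is where retrospective tuning of the static factor would take effect.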

2.7.9 Conventional Linac 5

A detailed technical description of this approach is available elsewhere.40, 69 Prior to treatment delivery, the treatment planning 4D-CT with target contours was used to build a case-specific template library of the moving anatomy and the target. In addition, a motion correlation model was built between the target and the diaphragm. On the treatment day, the motion correlation model was updated with the daily target motion pattern on the kV-projections of a 200° pre-treatment imaging arc prior to the MV dose delivery. A surface signal was also acquired using an Intel RealSense depth sensor and correlated with the diaphragm and target motion. During treatment delivery, kV-projections were streamed prospectively at 7 Hz. The surface signal acted as a surrogate to define the search window for the diaphragm in each kV-projection. Once the diaphragm was detected,14 the motion correlation model was used to define the search region for the target. Within this target search region, the anatomy model was used to enhance the soft tissue contrast by forward-projecting and subtracting anatomical structures that did not correspond to the target. Next, the 2D target position was tracked using a template matching approach based on the maximum correlation. The 3D target position was ultimately inferred from the measured position in the kV-projection using Bayesian statistics.70 For the in silico study, only the target tracking with template matching in the kV-projections was utilized, without a motion correlation model and without diaphragm tracking. The end-to-end system latency of this approach was measured to be 195 ± 17 ms.

3 RESULTS

3.1 In silico study: Results and ranking

Four MLTT approaches were benchmarked in the in silico study. The primary metric, the percentage of the 3D tracking error being <2 mm, ranged from 50% to 92%. The tracking accuracy did not depend on the motion direction or motion amplitude. The tracking margin component was found to be <2.0 mm for the three best performing MLTT approaches in every motion direction. The percentage of images tracked indicates that for the three best performing approaches not every kV image was suitable for localizing the target, with up to 2.3% of images being skipped. Table 3 summarizes the results of the in silico study.

TABLE 3. Ranked performance of the in silico study
In silico study. Errors are mean tracking error ± standard deviation; margins are tracking margin components.

| MLTT | Tracking error <2 mm (%) | Error LR (mm) | Error SI (mm) | Error AP (mm) | Margin LR (mm) | Margin SI (mm) | Margin AP (mm) | Images tracked (%) |
|---|---|---|---|---|---|---|---|---|
| Conv. Linac 1 | 92.3 | −0.3 ± 0.6 | −0.1 ± 0.4 | 0.3 ± 0.7 | 0.6 | 0.8 | 1.8 | 97.7 |
| Conv. Linac 4 | 89.3 | −0.3 ± 0.7 | −0.5 ± 0.6 | 0.3 ± 0.8 | 0.9 | 1.2 | 1.4 | 99.6 |
| Conv. Linac 3 | 88.8 | −0.3 ± 0.7 | −0.5 ± 0.6 | 0.2 ± 0.7 | 1.0 | 1.2 | 1.2 | 99.5 |
| Conv. Linac 5 | 49.7 | −0.2 ± 1.1 | 0.0 ± 1.4 | 0.3 ± 1.4 | 3.4 | 1.5 | 4.7 | 100 |
  • Note: The percentage of the 3D tracking being within 2 mm of the ground truth averaged over all images of the four simulated treatment fractions was used for ranking. The fraction of all kV-projections that were used for target localization is also shown.
  • Abbreviations: AP, anterior–posterior; Conv., conventional; LR, left–right; MLTT, markerless lung target tracking; SI, superior–inferior.
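The ranking metric and the per-direction statistics reported in the table can be computed along the following lines. This is a hedged sketch rather than the published MATCH analysis tool: the function name `tracking_metrics` is hypothetical, the column order LR, SI, AP is an assumption, untracked images are assumed to be marked with NaN and excluded from the error statistics, and the population standard deviation is used.

```python
import numpy as np

def tracking_metrics(tracked, truth, tolerance_mm=2.0):
    """Summarize tracking accuracy for one dataset.

    tracked: (n, 3) tracked positions in mm, NaN rows where no position was reported
    truth:   (n, 3) ground truth positions in mm (assumed column order LR, SI, AP)
    """
    tracked = np.asarray(tracked, dtype=float)
    truth = np.asarray(truth, dtype=float)
    valid = ~np.isnan(tracked).any(axis=1)
    error = tracked[valid] - truth[valid]
    error_3d = np.linalg.norm(error, axis=1)
    return {
        # Primary metric: share of tracked images with 3D error below tolerance
        "pct_within_tolerance": 100.0 * np.mean(error_3d < tolerance_mm),
        "mean_error": error.mean(axis=0),
        "std_error": error.std(axis=0),
        # Share of images for which a position was reported at all
        "pct_tracked": 100.0 * valid.mean(),
    }
```

Reporting `pct_tracked` next to the primary metric matters because an approach that skips difficult images can otherwise appear more accurate than one that reports a position for every image.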

In Table 4, the secondary results are presented. The mean 2D error and standard deviation for locating the target in the kV-projection indicate the accuracy of the target localization. The target localization was better in the u-direction of the image, with a mean error close to zero, which split into finite mean errors of similar magnitude and opposite signs when the projected u-position was divided into LR and AP positions in patient coordinates (Table 3). The less accurate target localization in the v-direction of the image, which aligns with the SI motion direction, propagated to similar tracking errors in the SI direction.

TABLE 4. Secondary results of the in silico study
| MLTT | Mean 2D error ± standard deviation, u-direction (mm) | Mean 2D error ± standard deviation, v-direction (mm) |
|---|---|---|
| Conv. Linac 1 | 0.0 ± 0.4 | −0.1 ± 0.4 |
| Conv. Linac 4 | 0.0 ± 0.4 | −0.5 ± 0.6 |
| Conv. Linac 3 | 0.0 ± 0.4 | −0.5 ± 0.6 |
| Conv. Linac 5 | 0.1 ± 1.5 | 0.1 ± 1.4 |
  • Note: The mean 2D error and standard deviation for locating the target in the kV-projections are presented.
  • Abbreviations: Conv., conventional; MLTT, markerless lung target tracking.

Figure 7 shows violin plots of the tracking results for each MLTT approach in the three directions LR, SI, and AP, as well as the 3D tracking error. The uniform distribution of the values around the mean error suggests that the tracking error is of a random nature. The largest outliers were found in the AP direction across all MLTT approaches. Figure 8 shows the tracked target motion of each MLTT approach compared to the ground truth for an excerpt of each of the motion traces. Relatively large tracking errors occurred at the motion peaks. Gaps in the tracked target motion indicate periods where the target position was not identified.

FIGURE 7. Violin plots of the in silico study results for each markerless lung target tracking (MLTT) approach over all four simulated treatment fractions in the directions left–right (LR), superior–inferior (SI), and anterior–posterior (AP), as well as the 3D tracking error. The white dot and line indicate the mean and first standard deviation of the tracking error, respectively. The width of the violin relates to the number of data samples for a given value.
FIGURE 8. Tracked target motion compared to ground truth motion for all markerless lung target tracking (MLTT) approaches on the example of a 20 s excerpt from the high motion (a), high complexity (b), mean motion (c), and mean complexity (d) motion traces. The plots visualize the target tracking behavior of each approach qualitatively and are not representative of the general tracking performance.

3.2 Experimental study: Results and ranking

In total, 11 MLTT approaches were benchmarked in the experimental study, of which six were benchmarked prospectively. Table 5 shows the ranking of the benchmarked MLTT approaches. The primary metric, the percentage of the 3D tracking error being <2 mm, ranged from 39% to 96% in the prospective measurements. Figure 9 shows violin plots of the tracking results for each MLTT approach in the three directions, as well as the 3D tracking error. The MLTT approaches implemented on conventional linacs showed slightly worse performance in the LR and AP directions than in the SI direction. Several violin plots are multimodal, suggesting that the results are subject to systematic errors, which can potentially be related to the geometrical uncertainty of the experimental setup. Systematic offsets can also be seen in the 20 s excerpts of each motion scenario for all MLTT approaches in Figure 10. Note that the plots in Figure 10 show small excerpts from the whole experiment for visualization purposes, and large deviations between the tracked target position and the ground truth are not representative of the general tracking performance.

TABLE 5. Ranked performance of the experimental study
Experimental study. Errors are mean tracking error ± standard deviation; margins are tracking margin components.

| MLTT | Tracking error <2 mm (%) | Error LR (mm) | Error SI (mm) | Error AP (mm) | Margin LR (mm) | Margin SI (mm) | Margin AP (mm) |
|---|---|---|---|---|---|---|---|
| Radixact 1 | 96.2 | 0.5 ± 0.3 | 0.6 ± 0.3 | −0.1 ± 0.3 | 1.2 | 0.7 | 0.8 |
| Radixact 2 | 94.4 | −0.1 ± 0.5 | −0.1 ± 0.4 | 0.0 ± 0.5 | 1.0 | 1.0 | 1.0 |
| Vero | 91.3 | 0.1 ± 0.6 | 0.1 ± 0.3 | −0.2 ± 0.2 | 1.0 | 2.2 | 1.0 |
| CyberKnife 1 | 78.2 | 0.7 ± 0.3 | 0.2 ± 0.4 | −0.6 ± 0.3 | 3.6 | 1.5 | 2.4 |
| Conv. Linac 5 | 62.4 | −0.3 ± 1.4 | 0.0 ± 1.1 | −0.3 ± 0.9 | 2.5 | 2.2 | 1.8 |
| CyberKnife 2 | 38.9 | 1.4 ± 0.4 | 0.2 ± 0.4 | −1.1 ± 0.3 | 5.0 | 2.8 | 2.8 |
| Conv. Linac 1ᵃ | 98.0 | 0.0 ± 0.5 | −0.2 ± 0.4 | 0.0 ± 0.5 | 0.9 | 0.8 | 1.0 |
| Conv. Linac 2ᵃ | 97.3 | 0.2 ± 0.6 | −0.2 ± 0.4 | −0.1 ± 0.6 | 0.8 | 0.9 | 1.1 |
| Conv. Linac 4ᵃ | 95.9 | −0.2 ± 0.7 | −0.3 ± 0.4 | −0.3 ± 0.5 | 1.2 | 1.5 | 0.8 |
| Conv. Linac 3ᵃ | 88.7 | −0.4 ± 1.2 | −0.3 ± 0.3 | −0.2 ± 1.1 | 1.9 | 1.6 | 1.4 |
| C-RADᵃ | 54.9 | −0.7 ± 1.0 | −0.8 ± 1.6 | 0.3 ± 0.5 | 4.9 | 3.8 | 1.5 |

  • Note: The percentage of the 3D tracking being within 2 mm of the ground truth averaged over all 10 simulated treatment fractions was used for ranking.
  • Abbreviations: AP, anterior–posterior; Conv., conventional; LR, left–right; MLTT, markerless lung target tracking; SI, superior–inferior.
  • ᵃ No prospective implementation.
FIGURE 9. Violin plots of the experimental study results for each markerless lung target tracking (MLTT) approach in the directions left–right (LR), superior–inferior (SI), and anterior–posterior (AP), as well as the 3D tracking error. The white dot and line indicate the mean and first standard deviation of the tracking error, respectively. The width of the violin relates to the number of data samples for a given value. ᵃ MLTT approaches with no prospective implementation.
FIGURE 10. Tracked target motion compared to ground truth motion for all markerless lung target tracking (MLTT) approaches on the example of a 20 s excerpt from the baseline shift (a), high frequency (b), left–right (LR) dominant (c), and typical lung (d) motion traces. Directions shown: LR, superior–inferior (SI), and anterior–posterior (AP). The plots visualize the target tracking behavior of each approach qualitatively and are not representative of the general tracking performance.

The surrogacy-based MLTT approaches of the Radixact, CyberKnife, and Vero showed limitations when the motion pattern deviated from the modeled motion, which was mostly visible for the motion scenario “baseline shift” as reflected in the results. The deviation was detected and corrected within seconds when the next kV-projection was acquired, and the internal–external motion model was updated. The Vero was unable to create a correlation model if there was not enough motion detected by the IR camera in the AP direction. For this reason, treatment plans for the “SI-sinusoidal” and “high frequency” motion scenarios could not be delivered with the Vero. The operators of the CyberKnife 2 changed the experiment setup for the “SI-sinusoidal” and “typical lung” motion scenarios, which were therefore excluded from the results.

4 DISCUSSION

MATCH successfully benchmarked the geometric tracking accuracy of six preclinical and five clinically available commercial MLTT approaches in an in silico and an experimental study. An experiment consisting of an anthropomorphic thorax phantom moving with lung target motion traces was developed to act as a suitable measurement methodology for current and future MLTT approaches. The MATCH was the first Grand Challenge sponsored by the AAPM with an experimental component, and it provides a template for future grand challenges. Additionally, tools were developed to automatically analyze the target tracking accuracy of each system and made publicly available.

The result submission for the Vero and the best result submission for the Radixact and CyberKnife systems achieved sub-millimeter target tracking accuracy and precision in the prospective experiments. These commercially available systems potentially allow for high treatment accuracy. The results of the CyberKnife 2 and the C-RAD system were subject to unknown errors and therefore not suitable as a benchmark for the potential tracking accuracy of the respective MLTT approach. The dataset of CyberKnife 1 records an offset >3 mm in at least one motion direction for two out of the 10 motion traces. This is a potential indication for a user-dependent factor in MLTT performance and that improper operation might increase errors significantly. According to the feedback provided by both CyberKnife teams, the experiment setup was not well-suited for implementation on the CyberKnife system, due to the geometric constraints for imaging and treatment. Conventional Linacs 1–4 also achieved sub-millimeter tracking accuracy in retrospective implementations.

MLTT avoids the cost and risk associated with surgically inserted markers. MLTT approaches on standard linacs come with relatively low associated costs and have the potential to become a more widely accessible MLTT technique when implemented clinically. MLTT using anatomical imaging, such as kV- or MV-projections, is in principle superior to marker-based lung target tracking because MLTT locates the target directly, without the marker-to-tumor surrogacy uncertainties associated with inferring the target position from marker positions. A recent study estimated the surrogacy uncertainty of marker-based lung target tracking approaches over a population of 17 lung SABR patients and found a mean surrogacy uncertainty of 5%–33% of the motion magnitude (95th percentile range of up to 2.3 mm) depending on the motion direction.27 Therefore, MLTT approaches have the potential to achieve more accurate target localization than a marker-based tracking approach.

MATCH had several limitations that affect the transferability of the benchmarked tracking accuracy to the tracking performance in patient cases. The benchmark is more transferable for MLTT approaches that track the target directly, compared to those using an indirect surrogacy model, because of the rigidity of the anthropomorphic thorax phantom. Although the phantom represented the shape and density of a patient's thoracic anatomy, it did not deform to change the relative position of the targets with respect to normal structures with respiration. This resulted in the internal–external motion being “perfectly correlated”, which may have benefitted MLTT approaches that rely on a model between internal and external anatomy (e.g., Radixact, Vero, and CyberKnife). However, these systems monitored a 1D external motion, for example, motion in the AP direction only, which was then used to infer the internal motion in all three directions. Due to the inhomogeneous motion pattern of the motion traces used, the internal motion in, for example, the SI direction would not be “perfectly correlated” with the monitored external motion in the AP direction. Although accurate target tracking still relied on motion correlation between axes, accurate tracking of this phantom motion was necessary but not sufficient to demonstrate clinical utility. More complex phantoms could be used for future challenges, particularly for algorithms that use the external signal as an input to the target tracking model. Also, only 3D target translation was considered in this challenge; however, intra-fraction target rotation has been observed71 and could be the subject of a future challenge. The tracked target motion was interpolated to match the ground truth sampling rate. Therefore, a sampling rate lower than the ground truth sampling rate of 50 Hz could have had negative effects on the ranking criterion in the experimental study. A way forward could be to compare MLTT performed offline on clinical patient data. To date, no publicly available patient dataset with intra-fractional imaging exists that also provides a known ground truth target position.
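The resampling step mentioned above can be sketched as follows. The interpolation method is not stated in the text, so linear interpolation is an assumption here, and the function name `resample_to_ground_truth` is hypothetical.

```python
import numpy as np

def resample_to_ground_truth(t_tracked, pos_tracked, t_truth):
    """Linearly interpolate tracked positions onto the ground truth time base.

    t_tracked:   (n,) increasing time stamps of the tracked positions in s
    pos_tracked: (n, 3) tracked positions in mm
    t_truth:     (m,) ground truth time stamps in s (e.g., sampled at 50 Hz)
    """
    t_tracked = np.asarray(t_tracked, dtype=float)
    pos_tracked = np.asarray(pos_tracked, dtype=float)
    t_truth = np.asarray(t_truth, dtype=float)
    # np.interp works on 1D data, so each direction is interpolated separately
    return np.column_stack([
        np.interp(t_truth, t_tracked, pos_tracked[:, k])
        for k in range(pos_tracked.shape[1])
    ])
```

With linear interpolation, a system that reports positions only about once per second cuts straight across the motion between samples, so peaks of the breathing cycle that fall between samples are underestimated. This is one way a low sampling rate can lower the fraction of time points within 2 mm.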

A limitation of the in silico study was that the images were acquired on a gantry-mounted X-ray-guided system and were therefore not applicable to all cancer therapy devices and imaging geometries. Another limitation was that all the images were available for analysis. For a clinical implementation, these images would be streamed in real-time with no knowledge of the future images. Therefore, the participants were relied upon to simulate the clinical scenario. The rigid motion also enabled potential gaming of the system, for example, tracking other objects of high contrast (diaphragm, vertebrae) that may not work as good surrogates in a real patient. This issue was ameliorated by requesting details of the tracking algorithm from the participants. The experimental study used an anthropomorphic thorax phantom where the motion ground truth was known to the participants. Future studies could use more complex phantoms, or clinical images, ideally where markers close to the target are present, but masked out for the purposes of the markerless tracking algorithm evaluation.

The presented results suggest a very promising future for MLTT; however, the transition from phantom experiments to patient treatment or to further tumor sites will introduce more challenges that will need to be overcome. Therefore, today's technology is a lower bound to MLTT performance. Algorithms, imaging systems, and computational speed and power will improve with time. Future developments in MLTT focus on improving imaging systems, detectors, and algorithms. In cases where the tumor is clearly visible, X-ray-based MLTT can track tumors with a high degree of accuracy. However, a major difficulty with MLTT arises in cases where the tumor is small or overlapped by soft tissue or bone and may not be detectable on X-ray projections. Such cases may lead to the outliers shown in this study. Lewis et al.72 showed that the tracking error could be up to 5 mm due to the overlap of the tumor with a high contrast object, such as bone. This effect makes MLTT especially challenging for rotational acquisitions, as many projections have tumor/bone overlap. In these cases, improved localization methods may be required. For example, Yang et al.73 obtained CBCT projections and subtracted DRRs of bone to highlight the tumor. Separately, van Sörnsen de Koste et al.41 applied a band-pass spatial filter to enhance tumor visibility on CBCT projections. Other groups have explored dual energy (DE) fluoroscopic imaging through kV-switching to address the limitations of X-ray-based MLTT.37, 74-78 DE imaging involves obtaining kV X-ray images at high (e.g., 120 kVp) and low (e.g., 60 kVp) energies. By performing a weighted-logarithmic subtraction, a third image is produced that suppresses bone and enhances soft tissue/tumor visibility. Alternative methods of producing DE images (using kV or MV imaging) involve using a multilayer detector where an X-ray filter (such as copper) is placed between two sensors. The top layer produces an image at the nominal X-ray energy, while the second layer produces an image after being filtered by the first layer and possibly by a copper plate. Thus, images are obtained simultaneously with no motion artifacts. However, DE image contrast may be inferior using this approach since the mean energy separation between the two images is small.79-81 More recently, there has been interest in the use of photon counting detectors to provide spectral separation, which may be used to remove overlying bone on kV-projections.82-86 Furthermore, machine learning approaches, especially deep learning approaches, have been applied to segment kV-projections.87-92
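The weighted-logarithmic subtraction used in DE imaging can be illustrated with a short sketch. It assumes intensity images normalized to the unattenuated beam and a simple two-material, monoenergetic attenuation model; the function name `dual_energy_subtraction` and the attenuation coefficients in the example are illustrative placeholders, not physical values from the cited studies.

```python
import numpy as np

def dual_energy_subtraction(high, low, weight, eps=1e-6):
    """Weighted-logarithmic subtraction: ln(I_high) - weight * ln(I_low).

    high, low: normalized transmission images at the high and low energy
    weight:    bone-cancellation weight; in the simple model used below it
               equals the ratio of bone attenuation at high to low energy
    """
    high = np.clip(np.asarray(high, dtype=float), eps, None)
    low = np.clip(np.asarray(low, dtype=float), eps, None)
    return np.log(high) - weight * np.log(low)

# Illustrative two-material example (placeholder coefficients per cm)
mu_bone_high, mu_bone_low = 0.4, 0.8
mu_soft_high, mu_soft_low = 0.2, 0.25
bone_cm = np.array([0.0, 2.0])    # two pixels: without and with overlying bone
soft_cm = np.array([10.0, 10.0])  # same soft tissue thickness in both pixels
img_high = np.exp(-(mu_bone_high * bone_cm + mu_soft_high * soft_cm))
img_low = np.exp(-(mu_bone_low * bone_cm + mu_soft_low * soft_cm))
de_image = dual_energy_subtraction(img_high, img_low, mu_bone_high / mu_bone_low)
```

In this idealized model the bone term cancels exactly, so both pixels of `de_image` carry the same value and only the soft tissue contribution remains. With real polyenergetic beams the cancellation is only approximate and the weight is typically tuned empirically.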

5 CONCLUSIONS

A common methodology for measuring the accuracy of MLTT approaches has been developed and used to benchmark preclinical and commercial approaches retrospectively and prospectively. Several MLTT approaches were able to track the target with sub-millimeter accuracy and precision. The study outcome paves the way for broader clinical implementation of MLTT. MATCH is live, with datasets and analysis software being publicly available online at https://www.aapm.org/GrandChallenge/MATCH/ to support future research.

ACKNOWLEDGMENTS

We would like to thank the AAPM for their support of the MATCH. We acknowledge Scandidos for supporting the experimental study with HexaMotion platforms. We thank Nichole Maughan, Parag Parikh, and Lakshmi Santanam from Washington University for providing the patient-measured lung target motion traces, and Calypso/Varian for funding the lung target motion traces data collection. We thank Esben Worm from Aarhus University Hospital for treatment planning for the in silico study, and Toon Roggen from Varian Medical Systems for the file conversion of the in silico dataset. We further acknowledge the 3D InnovationLab of Amsterdam UMC and Oceanz for their support of the phantom manufacturing. The MATCH organizers want to thank the participants Lee Goddard and Kyoungkeun Jeong from the Montefiore Medical Centre, Alanah Bergman and Marie-Laure Camborde from the BC Cancer Centre, Irene Redaelli and Anna Martinotti from the Centro Diagnostico Italiano, Guangpei Chen from the Medical College of Wisconsin, Cynthia Chuang and Dante Capaldi from Stanford University, Sharareh Fakhraei and Parham Alaei from the University of Minnesota, Adam Briggs and Jeremy Booth from the Royal North Shore Hospital Sydney, Vanessa Panettieri from the Alfred Hospital Melbourne, and Toon Roggen and Stefan Scheib from Varian iLab. Marco Mueller acknowledges funding support from the Cancer Institute NSW Translational Program Grant scheme. Paul Keall acknowledges funding support from the Australian Government NHMRC Senior Principal Research Fellowship and Investigator Grant schemes. John Roeske was supported by the National Cancer Institute of the National Institutes of Health under award number R01-CA207483. Ross Berbeco acknowledges funding support from the National Cancer Institute of the National Institutes of Health under award number R01-CA188446.

    CONFLICTS OF INTEREST

    The MATCH organizing committee adhered to the AAPM Working Group on Grand Challenges (WGGC) policy (https://www.aapm.org/GrandChallenge/documents/ChallengeOrganizerGuidance.pdf). To adhere to this policy and maintain the integrity of the challenge and confidence in the results, a governance statement was developed (see the Appendix). Specific author conflicts of interest are: Paul Keall is an inventor on patent application PCT/AU2016/000086 that is related to markerless tumor tracking. This patent and associated intellectual property were assigned by the University of Sydney to ASTO CT. Ross Berbeco has been supported by a research grant from Varian Medical Systems. John Roeske has received speaker's honoraria from Varian Medical Systems outside the scope of this work. Wilko Verbakel received research funding and speaker's honoraria/travel expenses from Varian Medical Systems outside the current research. All other authors report no conflict.

    APPENDIX: GOVERNANCE STATEMENT FOR THE MATCH CHALLENGE

    To comply with the American Association of Physicists in Medicine (AAPM) Working Group on Grand Challenges (WGGC) policy (https://www.aapm.org/GrandChallenge/documents/ChallengeOrganizerGuidance.pdf), and through discussions with the WGGC, several steps were taken. The MArkerless lung target Tracking CHallenge (MATCH) organizers did not participate in the challenge. No one who reported directly to a MATCH organizer was involved in the challenge. In line with the WGGC goal to manage conflicts in a manner that preserves the overall integrity of the challenge, the following additional governance steps were taken:
    1. Challenge design with multiple stakeholder consensus. In the study design there was a large group of organizers with eight institutions represented. The final challenge design was reached by consensus of all organizers and was therefore less likely to be biased by single institutions or institution factions.

    2. Early proactive challenge notification to potential participants. To maximize the time participants had to participate in the challenge, the challenge was published on the AAPM website, with potential participants proactively contacted when the challenge was approved by the WGGC. Potential participants were proactively identified and contacted by searching the 2017–2019 AAPM annual meeting abstracts for the keyword “markerless” and reviewing the top 50 hits on a Google Scholar search of “markerless lung tracking radiotherapy”.

    3. Independent acquisition of blinded in silico motion data by an expert institution (in silico study). The motion trace selection and image acquisition for the in silico part of MATCH were performed by an independent institution (Aarhus University Hospital, Denmark), and thus the organizers and participants did not have access to ground truth data until the closure of the challenge. Also, standard 3D/4D computed tomography (CT)/cone beam CT (CBCT)/kV/MV/optical clinical imaging protocols were used for data acquisition, which are widely clinically available and common practice. Therefore, organizer-specific requests were not accommodated. Participants in the in silico study of MATCH could gain an advantage by (1) using the whole image set for the analysis at once rather than simulating the clinical case of a real-time setting and (2) using other parts of the phantom for tracking. These issues were ameliorated by ensuring that details of the method used accompanied the submission.

    4. Experimental acquisition data upload requirement (experimental study). The ground truth is necessarily known for the experimental acquisition, as these data are needed to drive the motion phantom at each institution. To ameliorate any use of the ground truth motion in modifying results, participants were requested to provide all clinical data and references to the original clinical data files that were used to determine the tracked target position.

    There may be conflicts of funding sources and intellectual property from some of the organizers and participants that will be acknowledged and disclosed per the International Committee of Medical Journal Editors (ICMJE) guidelines.