IMRT QA using machine learning: A multi‐institutional validation

Abstract
Purpose: To validate a machine learning approach to Virtual intensity‐modulated radiation therapy (IMRT) quality assurance (QA) for accurately predicting gamma passing rates using different measurement approaches at different institutions.
Methods: A Virtual IMRT QA framework was previously developed using a machine learning algorithm based on 498 IMRT plans, for which QA measurements were performed at Institution 1 using diode‐array detectors and gamma criteria of 3% (local)/3 mm with a 10% threshold. An independent set of 139 IMRT measurements from a different institution, Institution 2, with QA data based on portal dosimetry using the same gamma criteria, was used to test the mathematical framework. Only pixels with ≥10% of the maximum calibrated units (CU) or dose were included in the comparison. Plans were characterized by 90 different complexity metrics. A weighted Poisson regression with Lasso regularization was trained to predict passing rates using the complexity metrics as input.
Results: The methodology predicted passing rates within 3% accuracy for all composite plans measured using diode‐array detectors at Institution 1, and within 3.5% for 120 of 139 plans using portal dosimetry measurements performed on a per‐beam basis at Institution 2. The remaining 19 measurements had large areas of low CU, where portal dosimetry shows larger disagreement with the calculated dose; as such, the failure was expected. These beams need further modeling in the treatment planning system to correct the under‐response in low‐dose regions. Important features selected by Lasso to predict gamma passing rates were: complete irradiated area outline (CIAO), jaw position, fraction of MLC leaves with gaps smaller than 20 or 5 mm, fraction of area receiving less than 50% of the total CU, fraction of the area receiving dose from penumbra, weighted average irregularity factor, and duty cycle.
Conclusions: We have demonstrated that Virtual IMRT QA can predict passing rates using different measurement techniques and across multiple institutions. Prediction of QA passing rates can have profound implications for the current IMRT process.


1 | INTRODUCTION
Over 50% of cancer patients receive radiotherapy as part or all of their cancer treatment, and radiotherapy is an increasingly complex process. Machine learning is a subfield of data science that focuses on designing algorithms that can learn from, and make predictions on, data. Applications of machine learning in radiotherapy have grown in recent years, including predictive modeling of treatment outcome in radiation oncology, 1-7 treatment optimization, 8-11 error detection and prevention, 12-15 and treatment machine quality assurance (QA). 16-19 These machine learning techniques have provided physicians and physicists with information for more effective and accurate treatment delivery, as well as the ability to personalize treatment.
To the best of our knowledge, however, little machine learning work has been done in the field of dosimetry and QA in clinical radiotherapy. Patient-specific pretreatment verification is commonly performed prior to intensity-modulated radiation therapy (IMRT) delivery. This process is time consuming and not particularly informative, because a myriad of sources can affect a passing result. In earlier work, a machine learning algorithm, Virtual IMRT QA, was developed that can predict IMRT QA passing rates and identify underlying sources of error not otherwise apparent. 20 The algorithm identified the correlation between IMRT plan complexity metrics and gamma passing rates and was validated on a single planning/delivery platform.
The objective of this study is to further validate the approach with a large, heterogeneous dataset acquired with different QA measurement devices (diode-array detectors and portal dosimetry), on different models of treatment machines, and at different institutions.
Identifying plans prone to QA failure allows physicists to concentrate resources on developing proactive approaches to QA, and provides the information on sources of error needed to strategically improve the patient-care workflow, as described in AAPM TG-100. 21 The goals of this study are to provide a framework to establish universal standards and thresholds, to intercompare results, to implement adaptive radiotherapy safely and efficiently, and, in the long term, to eliminate failing QA altogether. This represents a fundamental paradigm change in the way QA is performed. Further details can be found in the original publication on the development of the Virtual IMRT QA method. 20 The workflow of this validation is shown in Figure 1.

2.A | Methodologies and data collection
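$$
p(y_i \mid x_i) = \frac{\big(D_i\,\mathit{fr}(x_i)\big)^{y_i}\, e^{-D_i\,\mathit{fr}(x_i)}}{y_i!} \tag{1}
$$

where $D_i$ is the total number of detectors in the analysis, $y_i$ is the number of failing detectors, and $\mathit{fr}(x_i)$ is the mean value of the failing rate of plan $i$, which depends on its complexity vector $x_i$.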
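To make $D_i$ and the failing rate concrete, the sketch below runs a one-dimensional local gamma analysis with the criteria given in the Abstract (3%/3 mm, 10% threshold) and returns the number of evaluated points and the fraction that fail. It is a minimal illustration under stated assumptions, not the vendor software used at either institution: the function name, the choice of the measured distribution as reference, and the brute-force search over calculated grid points without interpolation are all hypothetical simplifications.

```python
import math


def gamma_failing_rate(positions_mm, measured, calculated,
                       dose_tol=0.03, dta_mm=3.0, threshold=0.10):
    """1D local gamma analysis (illustrative sketch).

    Returns (n_evaluated, failing_fraction). Each measured point is treated
    as a reference point, and gamma is minimized over the calculated grid
    points (no interpolation). Points below threshold * max(measured) are
    excluded, mirroring the 10% low-dose threshold.
    """
    cutoff = threshold * max(measured)
    evaluated = failing = 0
    for x_m, d_m in zip(positions_mm, measured):
        if d_m <= 0.0 or d_m < cutoff:
            continue  # low-dose region excluded from the analysis
        evaluated += 1
        local_tol = dose_tol * d_m  # "local": 3% of the dose at this point
        gamma = min(
            math.hypot((x_c - x_m) / dta_mm, (d_c - d_m) / local_tol)
            for x_c, d_c in zip(positions_mm, calculated)
        )
        if gamma > 1.0:
            failing += 1
    return evaluated, (failing / evaluated if evaluated else 0.0)
```

The number of failing detectors $y_i$ is then simply `n_evaluated * failing_fraction`, and the passing rate is 100% minus the failing fraction expressed in percent.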
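We can model $\mathit{fr}$ according to a Poisson regression as:

$$
\mathit{fr}(x_i) = e^{\beta^{\top} x_i} \tag{2}
$$

where $\beta$ is a constant vector the same size as $x_i$. Now, given the realization of the data $S = \{(x_i, y_i)\}$, let us find the most likely vector $\beta$. To obtain $\beta$, we use Bayes' theorem:

$$
p(\beta \mid S) = \frac{p(S \mid \beta)\, p(\beta)}{p(S)} \tag{3}
$$

where $p(\beta \mid S)$ is the posterior probability of $\beta$ given $S$, $p(S \mid \beta)$ is the probability of obtaining $S$ given $\beta$, $p(\beta)$ is the prior probability of $\beta$, and $p(S)$ is the probability of obtaining $S$ regardless of $\beta$. We are interested in finding the $\beta$ that maximizes $p(\beta \mid S)$, which is the same as:

$$
\arg\max_{\beta}\, p(\beta \mid S) = \arg\max_{\beta}\, p(S \mid \beta)\, p(\beta) \tag{4}
$$

In eq. 4 we have taken into account that $p(S)$ does not depend on $\beta$ and, as such, it can be dropped from the optimization problem.

Assuming all measurements $(x_i, y_i)$ are conditionally independent given the model, the probability $p(S \mid \beta)$ can be written as:

$$
p(S \mid \beta) = \prod_{i} p(y_i \mid x_i, \beta) = \prod_{i} \frac{\big(D_i\, e^{\beta^{\top} x_i}\big)^{y_i}\, e^{-D_i e^{\beta^{\top} x_i}}}{y_i!} \tag{5}
$$

Assuming, as is customary in Lasso regularization, a Laplace distribution with a mean of 0 and variance equal to $2k^2$ for each component of $\beta$:

$$
p(\beta) = \prod_{j} \frac{1}{2k}\, e^{-|\beta_j|/k} \tag{6}
$$

which results in:

$$
\beta_T = \arg\max_{\beta} \prod_{i} \frac{\big(D_i\, e^{\beta^{\top} x_i}\big)^{y_i}\, e^{-D_i e^{\beta^{\top} x_i}}}{y_i!} \prod_{j} \frac{1}{2k}\, e^{-|\beta_j|/k} \tag{7}
$$

As maximizing $p(\beta \mid S)$ is equivalent to maximizing $\log p(\beta \mid S)$, eq. 7 can be rewritten as:

$$
\beta_T = \arg\max_{\beta} \left[ \sum_{i} \Big( y_i \log D_i + y_i\, \beta^{\top} x_i - D_i\, e^{\beta^{\top} x_i} - \log y_i! \Big) - \sum_{j} \Big( \log 2k + \frac{|\beta_j|}{k} \Big) \right]
$$

Applying the rules of logarithms, dropping the terms that do not depend on $\beta$, writing the observed failing count as $y_i = D_i\,\mathit{fr}_i$, and dividing by the positive constant $D_{\max}$ (which does not change the maximizer) results in:

$$
\beta_T = \arg\max_{\beta} \left[ \sum_{i} w_i \Big( \mathit{fr}_i\, \beta^{\top} x_i - e^{\beta^{\top} x_i} \Big) - \frac{1}{k\, D_{\max}} \sum_{j} |\beta_j| \right] \tag{8}
$$

where $\mathit{fr}_i$ is the observed failing rate for plan $i$, $w_i = D_i / D_{\max}$ is a weight factor for each observation proportional to the number of detectors in the measurement, and $D_{\max}$ is a normalization constant.

Figure 1: The workflow of the validation of the Virtual IMRT QA model.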
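The step from the log-posterior to eq. 8 only drops terms that do not depend on $\beta$ and rescales by the constant $D_{\max}$, so the difference of the two objectives between any two coefficient vectors should agree up to that factor. The sketch below checks this numerically for the data terms, assuming that the observed failing count is $y_i = D_i\,\mathit{fr}_i$ and that $D_{\max}$ is the largest $D_i$; the helper names are hypothetical.

```python
import math


def linear_predictor(beta, x):
    # beta^T x for plain Python lists
    return sum(b * xj for b, xj in zip(beta, x))


def poisson_log_likelihood(beta, X, y, D):
    """Sum over plans of log Poisson(y_i; D_i * exp(beta^T x_i))."""
    total = 0.0
    for x_i, y_i, d_i in zip(X, y, D):
        mu = d_i * math.exp(linear_predictor(beta, x_i))
        total += y_i * math.log(mu) - mu - math.lgamma(y_i + 1)
    return total


def weighted_data_term(beta, X, fr, D):
    """Data term of eq. 8: sum_i w_i * (fr_i * beta^T x_i - exp(beta^T x_i))."""
    d_max = max(D)  # assumption: D_max taken as the largest detector count
    total = 0.0
    for x_i, fr_i, d_i in zip(X, fr, D):
        eta = linear_predictor(beta, x_i)
        total += (d_i / d_max) * (fr_i * eta - math.exp(eta))
    return total
```

The constant terms $y_i \log D_i$ and $\log y_i!$ cancel in any difference, which is why comparing differences (rather than raw values) is the right check. The Laplace penalty scales the same way, from $\sum_j |\beta_j|/k$ to $\sum_j |\beta_j|/(k D_{\max})$.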

Equation 8 is a weighted Poisson regression problem with Lasso regularization, where $\beta_T$ can be obtained using the glmnet software package available at https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html.
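glmnet solves this type of problem with coordinate descent. As a dependency-free illustration of the same objective, the sketch below instead uses proximal gradient ascent (ISTA): a gradient step on the weighted Poisson term followed by soft-thresholding for the Lasso penalty, with the penalty weight written as `lam` (playing the role of $1/(k D_{\max})$). It then converts a linear predictor into a passing rate as 100% times one minus the failing rate of eq. 2. The fixed step size, the iteration count, and the separate unpenalized intercept `b0` (glmnet also leaves its intercept unpenalized by default) are assumptions of this sketch, not details reported in the text.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t * |z| (the Lasso shrinkage step)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_weighted_poisson_lasso(X, fr, w, lam, step=0.1, n_iter=5000):
    """Maximize sum_i w_i*(fr_i*eta_i - exp(eta_i)) - lam*||beta||_1 by ISTA.

    eta_i = b0 + beta . x_i; the intercept b0 is not penalized.
    """
    p = len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Weighted residuals: w_i * (observed - modeled failing rate).
        resid = [
            w_i * (fr_i - math.exp(b0 + sum(b * xj for b, xj in zip(beta, x_i))))
            for x_i, fr_i, w_i in zip(X, fr, w)
        ]
        # Gradient of the smooth (Poisson) term for each coefficient.
        grad = [sum(r * x_i[j] for r, x_i in zip(resid, X)) for j in range(p)]
        b0 += step * sum(resid)
        # Ascent step followed by the L1 proximal (soft-thresholding) step.
        beta = [soft_threshold(b + step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


def predict_passing_rate(b0, beta, x):
    """Predicted passing rate in percent: 100 * (1 - predicted failing rate)."""
    failing_rate = math.exp(b0 + sum(b * xj for b, xj in zip(beta, x)))
    return 100.0 * (1.0 - failing_rate)
```

In practice `lam` would be tuned, for example by cross-validation; the text does not state how the regularization strength was chosen. Larger values drive more coefficients exactly to zero, which is how Lasso selects the small set of important metrics listed in the Abstract.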
Once $\beta_T$ is obtained, this constant vector is used together with eq. 2 and the complexity metrics $x_i$ of each plan to predict that plan's passing rate as:

$$
\mathit{PR}_i = 100\% \times \left(1 - e^{\beta_T^{\top} x_i}\right) \tag{9}
$$

3 | RESULTS

The required training QA data should be readily available, assuming a measurement-based clinical patient-specific QA program is in place, and should not impose an additional measurement burden on the practicing physicist.
The potential benefit of this approach is significant. For instance, Virtual IMRT QA could be run by the dosimetrist during planning. If an arbitrary threshold of 93.5% is set for the Virtual IMRT QA prediction, all plans that satisfy this threshold should pass IMRT QA with a passing rate higher than 90%, given the prediction accuracy of 3.5% reported here (93.5% - 3.5% = 90%). These plans could then be measured as usual. However, plans with a predicted passing rate below 93.5% could be modified without the need to perform the QA measurement.

At present, the model is only capable of assessing fixed-beam IMRT planning/delivery; features critical to volumetric-modulated arc therapy (VMAT) will be incorporated into future machine learning models. This should be acknowledged as a limitation of the current method. Analysis with 3D detectors will require the collection of key delivery information, such as gantry speed, MLC speed, and aperture size, and obtaining these parameters poses potential challenges. Given the popularity of this treatment modality, further study in this direction will be a great asset to the community.
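The threshold rule described above reduces to simple arithmetic: if predictions are within a known error bound, a predicted rate at or above the threshold implies, as long as that bound holds, a measured rate of at least the threshold minus the bound. The sketch below encodes this with the 93.5% threshold and 3.5% bound from the text; the function names and plan labels are hypothetical.

```python
def worst_case_passing_rate(predicted, max_error=3.5):
    """Lowest measured passing rate (%) consistent with the prediction error bound."""
    return predicted - max_error


def triage_plans(predictions, threshold=93.5):
    """Split plans into those that proceed to measurement and those to revise first."""
    proceed, revise = [], []
    for plan, predicted in predictions.items():
        if predicted >= threshold:
            proceed.append(plan)  # expected to pass at the 90% level
        else:
            revise.append(plan)  # modify before spending measurement time
    return proceed, revise
```

For example, `triage_plans({"plan_A": 95.2, "plan_B": 91.0, "plan_C": 93.5})` returns `(["plan_A", "plan_C"], ["plan_B"])`, and `worst_case_passing_rate(93.5)` returns `90.0`.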
In addition, the predictive model was trained on QA results obtained after automatic registration of the calculated and measured QA doses, which has its pros (uncertainty due to phantom misalignment is removed) and cons (some mechanical errors that produce a dose shift are not detected in QA).
In this study, we have used the formalism described by Valdes et al. 20 From eq. 9, however, it is clear that the contributions from the different metrics were assumed to be linear in the exponent (a constant vector multiplied by the vector describing the plan characteristics); that is, no interaction terms between the different characteristics were considered.
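Because eq. 9 combines the metrics only through a single product of a constant vector with the plan vector, any interaction between two metrics would have to be supplied as an extra feature. One possible way to do this, sketched below as an illustration rather than something done in this study, is to append pairwise products of the complexity metrics to each plan vector before fitting, so the same Lasso solver could keep or discard each interaction. The function name and the metric labels and values are hypothetical.

```python
from itertools import combinations


def add_pairwise_interactions(x, names):
    """Return x extended with x_j * x_k for every pair j < k, plus their labels."""
    new_x, new_names = list(x), list(names)
    for (j, name_j), (k, name_k) in combinations(enumerate(names), 2):
        new_x.append(x[j] * x[k])  # product feature for the pair (j, k)
        new_names.append(f"{name_j}*{name_k}")
    return new_x, new_names
```

With 90 complexity metrics this would add 4,005 products (90 × 89 / 2), so a sparsity-inducing penalty such as Lasso would matter even more in that setting.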

| CONCLUSIONS
In this work, using more extensive QA data, we have shown that Virtual IMRT QA can accurately predict gamma passing rates (within 3% at Institution 1 and within 3.5% for 120 of 139 measurements at Institution 2) for different models of linacs at different institutions, providing strong validation of our IMRT QA predictive model. Compared with conventional measurement-based QA, the framework also provides significant insight into both machine and plan characteristics. Software-based Virtual IMRT QA using machine learning has a unique position in the radiotherapy QA program and provides a framework for a future integrated, risk-based QA program such as that envisioned in AAPM TG-100.

ACKNOWLEDGMENTS
This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA008748.

CONFLICT OF INTEREST
The authors declare no conflict of interest.