Validation of the clinical applicability of knowledge‐based planning models in single‐isocenter volumetric‐modulated arc therapy for multiple brain metastases

Abstract Purpose To validate the clinical applicability of knowledge‐based (KB) planning in single‐isocenter volumetric‐modulated arc therapy (VMAT) for multiple brain metastases using the k‐fold cross‐validation (CV) method. Methods This study comprised 60 consecutive patients with multiple brain metastases treated with single‐isocenter VMAT (28 Gy in five fractions). The patients were divided randomly into five groups (Groups 1–5). The data of Groups 1–4 were used as the training and validation dataset and those of Group 5 were used as the testing dataset. Four KB models were created from three of the training and validation datasets and then applied to the remaining Groups as the fourfold CV phase. As the testing phase, the final KB model was applied to Group 5 and the dose distributions were calculated with a single optimization process. The dose‐volume indices (DVIs), modified Ian Paddick Conformity Index (mIPCI), modulation complexity scores for VMAT plans (MCSv), and the total number of monitor units (MUs) of the final KB plan were compared to those of the clinical plan (CL) using a paired Wilcoxon signed‐rank test. Results In the fourfold CV phase, no significant differences were observed in the DVIs among the four KB plans (KBPs). In the testing phase, the final KB plan was statistically equivalent to the CL, except for planning target volumes (PTVs) D2% and D50%. The differences between the CL and KBP in terms of the PTV D99.5%, normal brain, and Dmax to all organs at risk (OARs) were not significant. The KBP achieved a lower total number of MUs and higher MCSv than the CL with no significant difference. Conclusions We demonstrated that a KB model in a single‐isocenter VMAT for multiple brain metastases was equivalent in dose distribution, MCSv, and total number of MUs to a CL with a single optimization.


1 | INTRODUCTION
Brain metastases are the most common intracranial tumors. They are present in approximately 2% of cancer patients at the time of the primary diagnosis, with a prevalence ranging from 15% (small cell lung cancer) to < 0.1% (prostate cancer) and a median overall survival of less than 1 year. 1 The reported incidence of brain metastases is increasing because the widespread use of magnetic resonance imaging (MRI) has improved detection rates.
Moreover, the development of systemic therapy has improved survival after the primary diagnosis of brain metastases. The standard treatment strategy for multiple brain metastases is whole-brain radiotherapy (WBRT); however, WBRT leads to the deterioration of neurocognitive function and patient quality of life. 2 Stereotactic radiosurgery (SRS) is an irradiation technique that requires the precise fixation of the patient, localization of the target, and the application of highly biologically effective radiation doses. 3 Using SRS, the irradiated dose to the normal brain can be reduced, and re-irradiation can be considered even in post-SRS patients who experience intracranial recurrence. In patients with 1-4 brain metastases, SRS causes less neurocognitive deterioration than SRS plus WBRT; additionally, there is no difference in overall survival between SRS and SRS plus WBRT. [4][5][6] Although the clinical advantage of SRS over WBRT is controversial in patients with more than four brain metastases, SRS is considered an effective and safe treatment option, especially in patients with a favorable prognosis. 7 Linac-based single-isocenter volumetric-modulated arc therapy (VMAT) can achieve dose distributions clinically equivalent to those of gamma knife radiosurgery, but with a reduced delivery time. 8 However, the treatment planning of single-isocenter VMAT for multiple brain metastases is generally time- and resource-intensive. When performing the optimization, planners determine the optimization parameters manually, which is cumbersome because the multiple target volumes differ in size and characteristics. In addition, for one target located adjacent to another in the same plane, several optimizations are required to minimize the dose spillage between the target volumes; as there is no clearly defined goal, the process can seem endless.
Recently, knowledge-based (KB) planning has become available for clinical use as a tool assisting inverse planning. 9 When verifying machine-learned models, the predictive performance of the models should be evaluated on unseen data. 10 The hold-out validation method is a simple approach for model evaluation in which the data are divided into two subsamples: training and testing. However, this method is prone to subsample bias and is inadequate for small sample sizes. The k-fold cross-validation (CV) method is another technique for the evaluation and comparison of machine-learned models. The k-fold CV method splits the data into k equal-sized subgroups; one subgroup is used as a validation group and the remaining subgroups are used as a training dataset. With k-fold CV, the whole dataset can be used for both training and validation, and the method is less affected by pessimistic bias than the hold-out method. k-fold CV is generally preferred because it trains and evaluates the model over multiple train-test splits, which gives a better indication of how well the model will perform on unseen data. Most studies on KB planning employed the hold-out method for prostate/uterine cancer, head and neck cancer, and lung, liver, and primary brain tumors. [11][12][13][14][15] To date, however, few reports have compared the performance of KB models through k-fold CV or applied KB planning to multiple brain metastases.
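The k-fold splitting described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the procedure used in this study; the function name `k_fold_splits`, the fixed random seed, and the 20 dummy items are assumptions made for the example.

```python
import random


def k_fold_splits(items, k, seed=0):
    """Yield (training, validation) lists for each of k folds."""
    items = list(items)
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)  # random assignment, reproducible via seed
    # Fold i takes every k-th item starting at i, so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation


# Hypothetical usage: 20 dummy cases split into 4 folds.
splits = list(k_fold_splits(range(20), k=4))
for training, validation in splits:
    print(len(training), len(validation))
```

Every case appears in exactly one validation fold and in the training set of all other folds, which is why the whole dataset contributes to both training and validation.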
The purpose of this study was to validate the clinical applicability of KB planning in single-isocenter VMAT for multiple brain metastases using the k-fold CV method.

2.A | Patients
This study included 60 consecutive patients with multiple brain metastases treated using single-isocenter VMAT (28 Gy in five fractions). The organs at risk (OARs) included the normal brain (whole brain minus the PTVs), brainstem, eyes, lens, optic nerves, chiasm, and skin (defined as a structure cropped 5 mm from the body).

2.B | Treatment planning
The isocenter was located at the center of all the PTVs. The VMAT plans were created using 3-5 arcs, including 1 full coplanar arc and 2-4 non-coplanar partial arcs with a couch angle of ± 60°.
The collimator angle was manually selected depending on the size and location of the target. Flattening filter-free (FFF) photon beams of 6 MV and 10 MV were used. All the treatment plans were generated using the Eclipse planning system (version with heterogeneity correction and a 1-mm grid resolution. Plan normalization was performed such that the prescribed dose of 28 Gy (five fractions) was generally delivered to at least 99.5% of the volume of each PTV (D99.5%). Plan optimization was performed so that the near-maximum dose (D2%) to all PTVs was approximately 40 Gy (per protocol, 135-150% of the prescribed dose) and the irradiated dose to the OARs was as low as possible.
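As a worked illustration of the normalization and near-maximum dose criteria above, the following sketch computes a global rescaling factor and expresses a dose as a percentage of the prescription. It assumes that normalization is a single scale factor chosen so that the least-covered PTV reaches D99.5% = 28 Gy; the text says this was only "generally" the case, so this is a simplification. The function names and the D99.5% values are hypothetical.

```python
PRESCRIPTION_GY = 28.0  # prescribed dose in five fractions, from the text


def normalization_factor(d995_per_ptv_gy, prescription_gy=PRESCRIPTION_GY):
    """Scale factor that lifts the least-covered PTV's D99.5% to the prescription."""
    lowest = min(d995_per_ptv_gy)
    if lowest <= 0:
        raise ValueError("D99.5% values must be positive")
    return prescription_gy / lowest


def percent_of_prescription(dose_gy, prescription_gy=PRESCRIPTION_GY):
    """Express a dose as a percentage of the prescribed dose."""
    return 100.0 * dose_gy / prescription_gy


# Hypothetical D99.5% values (Gy) for three PTVs before normalization.
d995 = [27.2, 28.4, 27.6]
factor = normalization_factor(d995)
print(round(factor, 4))
print(round(percent_of_prescription(40.0), 1))  # D2% of 40 Gy as % of 28 Gy
```

Under this reading, the protocol band of 135-150% corresponds to a D2% between 37.8 Gy and 42 Gy.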

2.C | Model creation, evaluation, and selection
The fourfold CV method was used for model creation, evaluation, and selection (Fig. 1). First, the dataset was split into two: Groups 1-4 were used as the training and validation dataset and Group 5 as the testing dataset (Table 1). Next, the model configuration and hyperparameter setting were conducted using RapidPlan™ (version 13.7; Varian Medical Systems). 16 RapidPlan™ is a machine learning system based on the geometric relation between the structures and the dose-volume histogram (DVH).
As part of the model-creating process, the dosimetric and geometric data were extracted from the database to establish DVH estimation models using regression techniques. RapidPlan™ offers two main types of optimization objectives: a fixed objective and a line objective. In a fixed objective, an upper or lower dose or volume is added manually, with either a fixed priority or a priority generated by the DVH estimation algorithm. Fixed objectives can be used for both OARs and target structures. A line objective, in contrast, is available only for the OARs and follows an estimated DVH range that is generated automatically by the DVH estimation algorithm. Using a line objective, the optimization is performed so that the OAR will receive a dose as low as the estimated DVH range. Therefore, the volumes, doses, and priorities are automatically generated in a line objective.
Subsequently, four DVH estimation models were created from the training and validation dataset with the determined hyperparameter setting shown in Table 2 Group 4). The absence of any statistical difference among the DVIs of the four KB plans (KBPs), together with a D2% for all PTVs ranging from 130% to 155% of the prescribed dose (clinically acceptable variation), was taken to indicate that the models were not overfitted and had a good generalization performance.
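The overall design (Groups 1-4 rotated through fourfold CV, Group 5 kept aside for testing) can be written out as a short sketch. The function name `cross_validation_rounds` is hypothetical, and the order of the rounds printed here is not meant to match the model labels used in the study.

```python
def cross_validation_rounds(cv_groups):
    """Pair each validation group with the remaining training groups."""
    rounds = []
    for held_out in cv_groups:
        training = [g for g in cv_groups if g != held_out]
        rounds.append((training, held_out))
    return rounds


CV_GROUPS = [1, 2, 3, 4]  # training and validation dataset
TEST_GROUP = 5            # independent testing dataset, never used in the CV rounds

rounds = cross_validation_rounds(CV_GROUPS)
for training, validation in rounds:
    print(f"train on Groups {training}, apply to Group {validation}")
print(f"final model: train on Groups {CV_GROUPS}, apply to Group {TEST_GROUP}")
```

Keeping the test group out of every round means the final comparison with the clinical plans is made on patients that played no part in model creation or selection.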
Finally, the testing phase focused on the accuracy of the created models. Model E (ME; Groups 1-4) was constructed using all the training and validation datasets and the same hyperparameter settings as shown in Table 2. ME was applied to Group 5 (independent testing dataset), and a KBP (KBP-E5) was then generated. The absence of a statistical difference between the DVIs extracted from the KBP-E5 and the CL for each corresponding PTV and each OAR in the same patient was regarded as indicating that the KB models yielded clinically acceptable plans.
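The paired comparison between KBP-E5 and the CL relies on the Wilcoxon signed-rank test. The sketch below is a plain-Python illustration of that test with an exact two-sided p-value obtained by enumerating sign patterns, which is only practical for small groups. It is not the statistical software used in the study; the function names and the paired values are hypothetical, and zero differences are simply dropped.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, exact two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2  # expected W_plus under the null hypothesis
    observed = abs(w_plus - centre)
    # Under the null, each difference is equally likely to be positive or negative,
    # so every one of the 2**n sign patterns has the same probability.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical paired index values for five patients (clinical plan vs KB plan).
clinical = [10, 12, 15, 19, 24]
kb_plan = [9, 10, 12, 15, 19]
w_plus, p_value = wilcoxon_signed_rank(clinical, kb_plan)
print(w_plus, p_value)
```

With five pairs that all differ in the same direction, the smallest attainable two-sided p-value is 2/32, which illustrates why very small groups cannot reach conventional significance thresholds.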

2.D | KB-generated plan optimization and dose calculation
One optimization cycle was performed for each model without the modification of the KB-generated optimization objectives. Thereafter, the dose distribution was calculated with the same beam arrangement used in the CL and plan normalization was performed.

2.E | Evaluation indices
In the validation phase, the DVIs for the PTV (D2%, D50%, and D99.5%), normal brain (V5 Gy, V10 Gy, V14 Gy, V20 Gy, and V28 Gy), and the maximum dose (Dmax) to all OARs were compared among plans.
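For readers less familiar with these indices, the sketch below shows how Dx% and Vx Gy can be read from a list of voxel doses. It assumes equal-volume voxels and a simple counting rule without the DVH interpolation that a treatment planning system would apply; the function names, the voxel doses, and the voxel volume are hypothetical.

```python
import math


def dose_at_volume(doses_gy, volume_percent):
    """D_x%: minimum dose received by the hottest x% of equal-volume voxels."""
    if not doses_gy or not 0 < volume_percent <= 100:
        raise ValueError("need voxel doses and 0 < volume_percent <= 100")
    hottest_first = sorted(doses_gy, reverse=True)
    # Number of voxels needed to make up at least x% of the structure.
    count = math.ceil(len(hottest_first) * volume_percent / 100 - 1e-9)
    return hottest_first[max(count, 1) - 1]


def volume_at_dose(doses_gy, threshold_gy, voxel_volume_cc):
    """V_x Gy: absolute volume (cc) receiving at least threshold_gy."""
    return voxel_volume_cc * sum(1 for d in doses_gy if d >= threshold_gy)


# Hypothetical voxel doses (Gy) for a tiny ten-voxel structure.
doses = [30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 29.0, 28.0, 31.0, 33.0]
print(dose_at_volume(doses, 2), dose_at_volume(doses, 50), dose_at_volume(doses, 99.5))
print(volume_at_dose(doses, 33.0, voxel_volume_cc=0.001))
```

D2% therefore tracks the hot end of the distribution, D99.5% the cold end that determines coverage, and Vx Gy the volume of tissue above a given dose level.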
In the testing phase, in addition to the DVIs above, the Ian Paddick Conformity Index (IPCI) was also employed to evaluate the dose distributions. 17
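The modified index (mIPCI) used for multiple targets is defined in the Fig. 6 caption as (TV_PIV_sum)^2 / (TV_sum × PIV_sum). A minimal sketch of that formula follows; the function name, the input checks, and the example volumes are assumptions for illustration.

```python
def mipci(tv_piv_sum, tv_sum, piv_sum):
    """Modified Ian Paddick Conformity Index from summed volumes (same unit for all)."""
    if tv_sum <= 0 or piv_sum <= 0:
        raise ValueError("TV_sum and PIV_sum must be positive")
    if tv_piv_sum < 0 or tv_piv_sum > min(tv_sum, piv_sum):
        raise ValueError("TV_PIV_sum must lie between 0 and min(TV_sum, PIV_sum)")
    return tv_piv_sum ** 2 / (tv_sum * piv_sum)


# Hypothetical summed volumes in cc: 9 of 10 cc of PTV covered, 12 cc inside the isodose.
example = mipci(tv_piv_sum=9.0, tv_sum=10.0, piv_sum=12.0)
print(round(example, 3))
```

The index is the product of a coverage term (TV_PIV_sum / TV_sum) and a selectivity term (TV_PIV_sum / PIV_sum), so it reaches 1 only when the prescription isodose volume and the summed PTVs coincide.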

3.A | Validation phase
In the fourfold CV phase, no significant difference was observed in any DVI among the four KBPs. The details of the DVIs for all the PTVs, normal brain, and other OARs for the four KB models are shown in Fig. 2.

4 | DISCUSSION
To the best of our knowledge, this is the first study to assess the performance of KBPs in single-isocenter VMAT for multiple brain metastases using the k-fold CV method. The number of test samples was comparable to that in previous reports on KB planning. [11][12][13][14][15]19 As shown in Table 3, most previous studies employed the hold-out method; however, as described above, one disadvantage of this method is that the performance evaluation is subject to a higher variance given the smaller dataset, whereas the k-fold CV method allows multiple validations, so that model overfitting can be assessed more reliably. Using the k-fold CV method, all of our KB models yielded clinically acceptable plans with a single optimization.
KB planning, a machine-learning tool for determining the best practice based on past successful treatment plans, creates KB models for improving the treatment plans for future patients. It is important to compare the performance of KB models in terms of generalization. In this study, k-fold CV was applied for model evaluation and comparison. MA to MD were created with fixed parameters and applied to different patient groups for performance evaluation.
The KB models were statistically equivalent after the adjustment of learned and fixed optimization parameters, although the interquartile ranges of the DVIs varied among them (Fig. 2). In the testing phase, the DVIs for ME were compared to the CL. We found that the ME generated statistically equivalent plans to the CL with a single opti- treatment planning algorithm because the definition of "best" varies according to clinical factors, such as the patient's condition and treatment preferences. Therefore, we used the DVIs of the CL as the "best" outcome measures; the DVIs were compared between the KBPs and the CL for model evaluation.
According to a review paper on KB planning, several researchers achieved comparable, and often improved, VMAT plans using KBPs, while also reducing the planning time and variation in the plan quality. 9 We demonstrated that, with a single optimization, the PTV D2% for the KBP was significantly higher than that for the CL while achieving the same dose conformality, and that the radiation dose to the normal brain for the KBP was as low as that of the CL in patients with 2-18 PTVs. In brain SRS, a higher Dmax of the PTVs is associated with improved local control, and a lower irradiated dose to the OARs with higher mIPCI values could reduce radiation necrosis. According to previous research, the normal brain V14 Gy is a good indicator of radiation necrosis in patients with large metastases after five-fraction CyberKnife radiotherapy (Accuray, Sunnyvale, CA, USA) 20; however, such dose-volume constraints would not always be applicable, depending on the fractionation, target sizes, and the number of target volumes. During the manual inverse planning of single-isocenter VMAT SRS for multiple brain metastases, it is difficult to set definitive clinical goals because of such variations. With KB planning, the realistically achievable dose distribution can be predicted and patients can receive high-quality treatment even with limited time and human resources.
Numerous parameters reveal the plan complexity, such as MCSv, a modulation index, and the plan-averaged modulation. 18,21,22 The MCSv was employed in this study as it allows for an effortless comparison to other studies. In the study by Masi et al., 19 the MCSv values were in the range 0.25-0.50 for the conventional VMAT plans.

isocenter VMAT in terms of the conformity and dose falloff. 23,24 In this study, the same beam parameters (beam gantry, couch, and collimator angles) used in an approved CL were employed during the testing phase; however, in a clinical situation, the beam arrangement is determined through a process of trial and error, which is time consuming and affects the plan quality. As shown in the present and previous studies, 23

Abbreviations: CV, cross-validation; LOOCV, leave-one-out cross-validation; GBM, glioblastoma. * Note: two models were constructed.
FIG. 6. The scatter plots and box-and-whisker plots of (a) the mIPCI, (b) modulation complexity scores for VMAT plans (MCSv), and (c) monitor units (MUs) in the testing group. The scatter plots of mIPCI and MCSv were above the diagonal line (y = x). mIPCI = modified Ian Paddick Conformity Index. Note: The mIPCI was defined as (TV_PIV_sum)^2 / (TV_sum × PIV_sum), where TV_PIV_sum = the sum of target volumes enclosed by an isodose line of the prescription dose, TV_sum = the sum of volume of all PTVs, and PIV_sum = the sum of prescription isodose volume. An mIPCI approaching 1 means that the PTVs were conformally covered with the prescribed dose.

model) were generated after the adjustment of learned and fixed optimization parameters. After confirming the generalization performance of the models, the final KB model was applied to the test group. We demonstrated that the KB model in the single-isocenter VMAT for multiple brain metastases was equivalent in dose distribution, MCSv, and the total number of MUs to the CL with a single optimization.

CONSENT FOR PUBLICATION
The consent for publication was obtained via our institution's form.

AUTHORS' CONTRIBUTIONS
NK performed the planning study and statistical analysis, and drafted the manuscript. NK, MN, HH, NM, KT, and MU conceived the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.