A longitudinal evaluation of improvements in treatment plan quality for lung cancer with volumetric modulated arc therapy

Abstract
Purpose: To investigate planning time and the number of optimizations in routine clinical lung cancer plans, based on the plan quality improvements following each optimization.
Materials and methods: We selected 40 patients with lung cancer who were treated with conventional fractionated radiotherapy (CFRT). The 40 plans (divided into two groups with one or two target volumes) were completed by 9 planners using volumetric modulated arc therapy (VMAT). A planning strategy, including a technique script for each group and a planning process for data collection, was introduced. The total planning time, number of optimizations, and dose–volume parameters of each plan were recorded and analyzed. A plan quality metric (PQM) was defined according to the clinical constraints. Statistical analysis of the parameters of each plan following each optimization was performed to evaluate improvements in plan quality.
Results: For the clinical plans generated by different planners, the median number of optimizations in each group was 4, and the median planning time was approximately 1 h (68.6 min and 62.0 min for plans with one or two target volumes, respectively). The dose deposited in organs at risk (OARs) gradually decreased, and the PQM values gradually improved following each optimization. The improvements were significant only between adjacent optimizations from the first optimization (Opt1) to the third optimization (Opt3).
Conclusion: Increasing the number of optimizations was associated with significantly improved sparing of OARs, with only slight effects on the dose coverage and homogeneity of the target volume. Generally, based on the designed planning strategy, there was no significant improvement in plan quality beyond three optimizations.

angles, field size, or the gantry angle spacing between subsequent control points. These parameters are often selected manually via trial-and-error according to the planner's experience.
Due to the complexity of treatment plans, the planning process is usually iterative and time-consuming. The planning time of the standard planning process is dominated by optimization iterations with the system (i.e., setting parameters, performing the optimization, evaluating the results, and repeating these steps until the planner is satisfied). 4 The planning time is an important factor: it serves as a reference for both planners and physicians, and it gives the department administrator data for improving the management of the clinical workflow.
The investigation of average planning time is mainly based on statistical data from a large number of clinical plans. Therefore, we employed 40 lung cancer plans and designed a planning strategy to record the time consumption of each plan.
The quality of treatment plans also varies considerably among different planners and institutions, 5 which means that sub-optimal treatment plans may be produced. During the plan optimization process, two situations may occur. One is insufficient optimization, in which the clinical requirements are achieved but the dose distribution could still be improved through further optimization. The other is over-optimization, in which the plan is optimized beyond a certain number of optimizations without significantly improving plan quality. To investigate the gains in plan quality during optimization, we analyzed longitudinal dosimetric trends by comparing adjacent optimizations in the planning process.
To conduct a comprehensive assessment of a treatment plan, it is necessary to compare each metric of the target volumes and the OARs, as well as the overall plan quality, according to a quantitative evaluation criterion. Existing quantitative evaluation methods are mainly used to compare the quality of the same plan produced by different planners, different institutions, different TPSs, or different modalities. These methods cannot be used directly to evaluate statistical control experiments. Thus, building on the existing plan quality metric (PQM) 5 and quality score S_D, 6 here we introduce a new plan quality scoring procedure for lung cancer.
This study comprised two parts. The first involved the statistics of planning time and number of optimizations of the resultant lung cancer plans. According to the designed planning strategy, 40 plans were completed by nine planners with different years of experience, and the corresponding data were recorded. The second involved the improvement of the treatment plan following each optimization. An analysis of longitudinal dosimetric changes was performed, and the new PQM scoring procedure was used to quantify treatment plan quality.

2.A | Patient selection and planning objectives
To include a homogeneous patient population, treatment plans of 40 patients with lung cancer who underwent conventional fractionated radiotherapy (CFRT) were selected from recent clinical treatment plans. The patients were scanned during normal breathing in the supine position using computed tomography (CT) with a slice thickness of 5 mm and an in-plane voxel size of 1 mm × 1 mm. Gross tumor volume (GTV), planning gross tumor volume (PGTV), clinical target volume (CTV), and planning target volume (PTV) were contoured by qualified radiation oncologists. Other relevant OARs were delineated, mainly the whole lung, spinal cord, and heart. A 5 mm margin was added to the spinal cord as the planning organ at risk volume (PRV).
whole lung receiving more than 5 Gy (V5) is not specified (the lower the better), and more than 20 Gy (V20) < 28%; mean dose, whole lung D_mean < 17 Gy; volume of heart receiving more than 30 Gy (V30) < 40% and more than 40 Gy (V40) < 30%.
The 40 plans were finished by nine planners with 3-10 yr of working experience. In the planning process, information including time points and evaluation parameters of the plans after each optimization was obtained (Fig. 1). First, when we received a plan, the system auto-
set to more stringent conditions, meaning that each objective was tuned properly, and the optimization process continued with objective values >0. The 40 plans were optimized using the same hardware configuration. The collection of the required information did not significantly interfere with routine clinical planning.
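The OAR limits quoted in this section can be written down as a small lookup and checked against the metrics recorded after an optimization. The following is a minimal sketch under stated assumptions: the dictionary keys, the function name `check_constraints`, and the example values are hypothetical and are not taken from the planning script used in the study. Only the four limits stated in the text are included; lung V5 has no fixed limit and is therefore left out.

```python
# Limits quoted in the text; key names are hypothetical labels for this sketch.
CONSTRAINTS = {
    "lung_V20_percent": 28.0,   # whole lung V20 < 28%
    "lung_Dmean_Gy": 17.0,      # whole lung mean dose < 17 Gy
    "heart_V30_percent": 40.0,  # heart V30 < 40%
    "heart_V40_percent": 30.0,  # heart V40 < 30%
}


def check_constraints(metrics, limits=CONSTRAINTS):
    """Return {metric name: True/False}, True when the value is below its limit."""
    missing = [name for name in limits if name not in metrics]
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return {name: metrics[name] < limit for name, limit in limits.items()}


# Hypothetical values for one plan after one optimization (not study data).
example = {
    "lung_V20_percent": 24.3,
    "lung_Dmean_Gy": 17.4,
    "heart_V30_percent": 12.0,
    "heart_V40_percent": 6.5,
}
result = check_constraints(example)
failed = [name for name, ok in result.items() if not ok]
print(failed)
```

In this hypothetical case only the lung mean dose misses its limit, which is the kind of result that would prompt another optimization.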

2.C | Study endpoints
The 40 plans were divided into a group of plans with one target volume (PTV) and a group with two target volumes (PTV and PGTV). Different optimizations were compared using metrics averaged over the 40 plans of each optimization as follows:
i Homogeneity index (HI) is defined as:
$$\mathrm{HI} = \frac{D_{2\%} - D_{98\%}}{D_{50\%}}$$
Here $D_{2\%}$, $D_{50\%}$, and $D_{98\%}$ are the minimum doses delivered to 2%, 50%, and 98% of the PTV, respectively. The closer the value of HI is to 0, the better the homogeneity of the PTV. 11
ii Conformity index (CI) is defined as:
$$\mathrm{CI} = \frac{TV_{PV}^{2}}{V_{PTV} \times V_{TV}}$$
Here $V_{PTV}$ is the volume of the PTV, $TV_{PV}$ is the portion of $V_{PTV}$ within the prescribed isodose line, and $V_{TV}$ is the treated volume of the prescribed isodose line. The closer the value of CI is to 1, the better the conformity of the PTV. 12
iii Dose deposition in lungs was analyzed using V5 Gy (%), V20 Gy (%), and mean dose (D_mean).
v Maximum dose (D_max) to spinal cord and spinal cord PRV.
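The two target-volume indices can be illustrated with a short sketch. It assumes the commonly used forms HI = (D2% − D98%) / D50% and CI = TV_PV² / (V_PTV × V_TV), which match the symbols defined in this section; the function names and the example numbers are hypothetical and are not taken from the study.

```python
def homogeneity_index(d2, d50, d98):
    """HI = (D2% - D98%) / D50%; values closer to 0 mean a more homogeneous PTV dose."""
    if d50 <= 0:
        raise ValueError("D50% must be positive")
    return (d2 - d98) / d50


def conformity_index(v_ptv, tv_pv, v_tv):
    """CI = TV_PV^2 / (V_PTV * V_TV); values closer to 1 mean better conformity.

    v_ptv: PTV volume, tv_pv: PTV volume inside the prescription isodose,
    v_tv: total volume enclosed by the prescription isodose (same units).
    """
    if v_ptv <= 0 or v_tv <= 0:
        raise ValueError("volumes must be positive")
    if tv_pv > min(v_ptv, v_tv):
        raise ValueError("TV_PV cannot exceed V_PTV or V_TV")
    return tv_pv ** 2 / (v_ptv * v_tv)


# Hypothetical example: doses in Gy, volumes in cm^3 (not study data).
hi = homogeneity_index(d2=63.0, d50=60.0, d98=57.0)
ci = conformity_index(v_ptv=100.0, tv_pv=95.0, v_tv=110.0)
print(round(hi, 3), round(ci, 3))
```

Writing CI as a product of two ratios, TV_PV / V_PTV (coverage) and TV_PV / V_TV (selectivity), makes clear why it drops both when the PTV is under-covered and when the prescription isodose spills into normal tissue.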
These metrics were determined according to the patients' clinical requirements. were set from 0 to 5. The quality score S of each plan is the sum of PQM values of the subcomponents, 5,6 defined as follows:

2.D | Plan quality metrics
Here, k is the number of subcomponents, S_i is the PQM value of the corresponding metric (M_i), and M_il and M_iu are the lower limit and upper limit of M_i, respectively. PQMvalue_imax is the maximum value (highest score) of each metric. The interval of each metric was determined from all recorded data of the 40 plans. Thus, all lung cancer plans in this control experiment could be evaluated using this PQM scoring procedure.
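One way to picture such a scoring procedure is sketched below. The text states only that S is the sum of the subcomponent PQM values and that each metric has a lower limit, an upper limit, and a maximum score; the linear interpolation between the limits, the clamping outside them, and the `lower_is_better` flag are assumptions made for this illustration and should not be read as the exact formula used in the study. All names and numbers are hypothetical.

```python
def pqm_value(m, m_low, m_up, max_score=5.0, lower_is_better=True):
    """Assumed linear score for one metric M_i on [M_il, M_iu], clamped to [0, max_score]."""
    if m_up <= m_low:
        raise ValueError("upper limit must exceed lower limit")
    frac = (m - m_low) / (m_up - m_low)
    frac = min(1.0, max(0.0, frac))  # clamp values outside the interval
    if lower_is_better:
        frac = 1.0 - frac  # e.g. OAR doses: the lower limit earns the full score
    return max_score * frac


def quality_score(components):
    """S = sum of the PQM values S_i over the k subcomponents."""
    return sum(pqm_value(**c) for c in components)


# Hypothetical subcomponents (not study data).
components = [
    {"m": 20.0, "m_low": 10.0, "m_up": 30.0},  # e.g. lung V20 (%)
    {"m": 12.0, "m_low": 8.0, "m_up": 18.0},   # e.g. lung mean dose (Gy)
    {"m": 0.85, "m_low": 0.6, "m_up": 1.0, "lower_is_better": False},  # e.g. CI
]
s_total = quality_score(components)
print(round(s_total, 3))
```

Because each subcomponent is bounded by the same maximum score, no single metric can dominate S, which is what allows plans from different optimizations to be compared on one scale.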

2.E | Statistical analysis
The dosimetric data are summarized per optimization as mean ± SD with confidence intervals. Statistical analysis was performed using SPSS v17 (IBM Corp). The paired t test was used to compare the intergroup differences in the data, and P < 0.05 was considered to indicate a significant difference.
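The comparison between two adjacent optimizations can be sketched as a two-sided paired t test on the per-plan values of one metric. The analysis in the study was run in SPSS; the standard-library version below is only an illustration, with the p value obtained by numerically integrating the Student t density (a routine such as `scipy.stats.ttest_rel` would be the usual choice in practice). The function names and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_test(before, after, steps=2000):
    """Two-sided paired t test; returns (t statistic, p value). steps must be even."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    # Simpson's rule for the integral of the density from 0 to |t|.
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    p = max(0.0, 1.0 - 2.0 * central)  # two-sided tail probability
    return t, p


# Hypothetical lung V20 (%) for five plans after Opt1 and Opt2 (not study data).
opt1 = [30.1, 28.4, 31.0, 27.5, 29.8]
opt2 = [28.9, 27.6, 29.5, 27.0, 28.1]
t_stat, p_value = paired_t_test(opt1, opt2)
print(p_value < 0.05)
```

Pairing matters here because each plan is compared with itself at the next optimization, so the between-patient spread in a metric does not mask a consistent within-plan change.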

3 | RESULTS
According to the recorded information of each plan, the planning time and number of optimizations of the 40 plans were shown in
For the first group of plans with a single target volume, the CI and HI were improved after the second optimization (Opt2), and the HI was significantly worse after the fourth optimization (Opt4) (P < 0.05). The metrics of both lungs were significantly different, except for those of V20 from Opt2 to Opt4. The other metrics including V5, V20, and D_mean gradually and significantly (P < 0.05) decreased from Opt1 to Opt4. For the heart, V30 and V40 decreased slightly from Opt1 to Opt4, and the differences in V30 from Opt1 to Opt3 and in V40 between Opt1 and Opt2 were significant (P < 0.05). The maximum dose delivered to the spinal cord was slightly reduced after Opt2, and the maximum dose delivered to the spinal cord PRV gradually and significantly (P < 0.05) decreased from Opt1 to Opt3.
The results for the second group of plans with two target volumes were similar to those of the first group of plans with a single target volume. The CI and HI improved after Opt2 for PTV and PGTV, and the HI of PGTV was significantly (P < 0.05) worse after Opt4. The V5, V20, and D_mean of the lungs gradually and significantly (P < 0.05) decreased from Opt1 to Opt3. For the heart, the differences in V30 between Opt2 and Opt3, as well as in V40 between Opt1 and Opt2, were significant (P < 0.05). The maximum dose to the spinal cord was significantly (P < 0.05) reduced from Opt1 to Opt3.
For both groups, the improved OAR sparing from Opt1 to Opt4 did not have a significant effect on the CI and HI of the target volumes.

4 | DISCUSSION
Here we conducted a statistical analysis on planning time and number of optimizations through improvements along successive
According to the results of our statistical analysis of the plan quality improvements from Opt1 to Opt4 in the two groups of plans, advances were made throughout the optimizations that were significantly associated with a reduction of the dose delivered to OARs. The gradual sparing of the OARs had only a slight effect on the CI and HI of the target volumes. The average differences from Opt1 to Opt4 were significantly associated with improved dose deposition in lung tissue. In contrast, there were no obvious improvements in the dose distributions to the heart and spinal cord once the metrics met the clinical goals. In this study, paired t tests were performed between adjacent optimizations from Opt1 to Opt4, and significant improvements in plan quality were found from Opt1 to Opt3. Since no significant improvement was found between Opt3 and Opt4, subsequent optimizations (from Opt5 to Opt8) were not analyzed. There was no significant improvement of lung cancer plans beyond three optimizations.
This result could be especially valuable to the planners, so that they
The continued copying after each optimization might disturb the planners, which may influence the planning time and the number of optimizations. Here we developed a script (Data S1) to incorporate the data collection and PQM scoring procedure into Pinnacle, so that the planning strategy can be further improved with less interference to routine clinical planning.

5 | CONCLUSIONS
In this study, we evaluated the average performance of routine clinical lung cancer plans and obtained similar statistical results for plans with one or two target volumes. The longitudinal evaluation of plan quality indicated that increasing the number of optimizations was associated with significantly improved OAR sparing while only slightly affecting PTV dose coverage and homogeneity.
We believe that the average performance and the planning strategy for lung cancer plans can help planners (especially junior planners) to design an optimal treatment plan more efficiently, and that the average planning time can help the administrator to improve the management of the clinical workflow. Furthermore, the PQM scoring procedure can help both planners and physicians to quantitatively evaluate plan quality across various dose parameters.

ACKNOWLEDGMENTS
This work was supported by the National Key Projects of Research and Development of China (2016YFC0904600) and the National Natural Science Foundation of China (81801799).

CONFLICT OF INTEREST
The authors report no conflict of interest with this study. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with this work.

REFERENCES
14. Xhaferrllari I, Wong E, Bzdusek K, Lock M, Chen J. Automated IMRT planning with regional optimization using planning scripts.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Data S1. Pinnacle script for data collection and PQM scoring procedure XIA ET AL.