Automated data mining of a plan‐check database and example application

Abstract
Purpose The aim of this work was to present the development and example application of an automated data mining software platform that performs bulk analysis of results and patient data passing through the 3D plan and delivery QA system, Mobius3D.
Methods Python, MATLAB, and Java were used to create an interface that reads the JavaScript Object Notation (JSON) files created for every approved Mobius3D pre‐treatment plan‐check. These JSON files contain all the information for every pre‐treatment QA check performed by Mobius3D, including all 3D dose, CT, and structure set information, as well as all plan information and patient demographics. Two Graphical User Interfaces (GUIs) were created. The first, called Mobius3D‐Database (M3D‐DB), presents the check results in both filterable tabular and graphical form. These data are presented for all patients and include mean dose differences, 90% coverage, 3D gamma pass rate percentages, treatment sites, machine, beam energy, Multi‐Leaf Collimator (MLC) mode, treatment planning system (TPS), plan names, approvers, dates, and times. Group statistics and statistical process control levels are then calculated based on filter settings. The second GUI, called Mobius3D organ at risk (M3DOAR), analyzes dose‐volume histogram (DVH) data for all patients and all Organs‐at‐Risk (OAR). The software is designed so that all treatment parameters and treatment site information can be filtered and sorted, with the results, plots, and statistics updated accordingly.
Results The M3D‐DB software can summarize and filter large numbers of plan‐checks from Mobius3D. The M3DOAR software can also analyze large amounts of dose‐volume data for patient groups, which may prove useful in clinical trials, where OAR doses for large numbers of patients can be compared and correlated. Target DVHs can also be analyzed en masse.
Conclusions This work demonstrates a method to extract the large amount of treatment data for every patient that is stored by Mobius3D but not easily accessible. With scripting, it is possible to mine this data for research and clinical trials as well as patient and TPS QA.


| INTRODUCTION
Recent reports have estimated that the U.S. healthcare data accumulation reached 150 exabytes (10^18 bytes) in 2011, with growth models predicting this to exceed a zettabyte (10^21) and eventually a yottabyte (10^24) of data every year. [1] To put this in context, the capacity of the human brain has been estimated at 1 petabyte (10^15) [2] and thus, to store a yottabyte of data, 1 billion individuals would be required. Historically, the utilization of this data has been low but the future potential is well established. [1] In the radiation oncology health space, the use of database systems has been reported for long (25 years) [3] and short (5 years) [4] periods, as well as for specific treatment sites. [5] On a global level, the reported advantages ranged from informing treatment policy, resource allocation, and clinical trial assessment to enhancing research and the publication of peer-reviewed papers. [3][4][5] For optimal outcomes, these publications also allude to the need for these database tools to be automated and streamlined in the collection, analysis, and presentation of large amounts of data. The majority of this data is currently stored within  [7][8][9][10][11][12][13] By including the patient anatomy and having independent beam data and an independent dose calculation algorithm, Mobius3D not only provides a robust second check of the treatment plan prior to the patient being treated, but can also inform clinical decision making by revealing more subtle TPS limitations and uncertainties. [14] One aspect of Mobius3D, which to the authors' knowledge has not yet been explored, is the fact that all of the patient's plan data used to calculate the result and present the report is also stored by Mobius3D.
CT data, structure sets, plan parameters (field sizes, gantry angles, couch angles, beam energy, MUs, MLC data), patient demographics, dose-volume, and fractionation information are all recorded.
While almost all of these data are also stored in the OIS, they can be difficult to access at a database level, and the OIS does not offer the additional benefit of also holding independent verification data. As a result, a vast amount of data originates in the TPS and is then stored within the Mobius3D database, but is not necessarily scrutinized on a global level.
In a precursor to this work, results obtained during commissioning of the system were presented. [14] In that work, results from the first 1000 patients were used to modify the tolerance values relating to pass and action limits for plan-check results depending on the calculation algorithm (superposition-convolution or pencil beam) and treatment site. This initial work was primarily based on manual spreadsheet input. The work presented herein builds on this through scripting and GUIs that are used to automate the retrieval, analysis, and presentation of any data of interest. As an example application for this automated data retrieval, statistical process control (SPC) [15][16][17] methods can be employed on the comparison between Mobius3D and the TPS with respect to the mean dose to the PTV.
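The SPC idea can be illustrated with a minimal sketch. The text does not state which control chart is used, so the example below assumes an individuals (I-MR) chart, in which control limits are set at the mean ± 2.66 times the average moving range. The function names and the percentage differences are made up for illustration and are not taken from the software described here.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (lower, centre, upper) control limits for an individuals chart.

    Assumption: limits are centre +/- 2.66 * mean moving range, where
    2.66 = 3 / d2 and d2 = 1.128 for moving ranges of two observations.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    half_width = 2.66 * mean(moving_ranges)
    return centre - half_width, centre, centre + half_width


def out_of_control(values, limits):
    """Return the indices of values that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up PTV mean dose differences (Mobius3D vs TPS, in percent).
baseline = [0.1, -0.3, 0.2, -0.1, 0.0, 0.3, -0.2, 0.1]
limits = individuals_chart_limits(baseline)

# New plan-checks are compared against limits derived from the baseline.
new_checks = [0.2, -0.4, 3.5]
flagged = out_of_control(new_checks, limits)
print(limits, flagged)
```

In this sketch the limits are derived from a baseline group of plan-checks and then applied to later checks, so a single unusual result does not widen its own tolerance.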
At our institution, every patient's approved treatment plan is sent to Mobius3D prior to treatment regardless of the complexity of the technique. The Mobius3D report is then checked either by a radiation therapist for 3D conformal radiation therapy or by a physicist for dynamic or stereotactic treatments. The Mobius3D check is approved and a report is generated and stored with the patient's file in the OIS.
As every patient's final treatment plan transits through the Mobius server and all of the plan information is retained, there is automatically a large amount of untapped data which can be accessed through scripting. The logical next step is to utilize this data to obtain global plan information for patient populations, dose distribution and OAR statistics, or even patient throughput analyses for facilities management.
The aim of this work was to present this software and its current

2.A. | Mobius3D and data storage
Mobius3D runs on a stand-alone server architecture with graphics-processing unit (GPU) capabilities and provides a secondary dose calculation using the same CT data, structure set, and RT plan used by the TPS. The value of this secondary dose calculation is that it is fully 3D and uses a proprietary dose calculation algorithm whose input is consensus beam data, rather than the institution-specific beam data collected by the user and used by the TPS. Mobius3D compares its own dose calculation result to that of the TPS and presents a comprehensive report encompassing dose-volume metrics, gamma, and coverage statistics. The end-user can specify warning and out-of-tolerance levels for mean dose and 90% coverage as well as the percentage 3D gamma pass-rate. DVH constraints are taken from the literature (RTOG publications, AAPM TG 101) and are fractionation dependent (conventional vs stereotactic).
Mobius3D stores all plan data needed for calculation and presentation of results in the form of a JavaScript Object Notation (JSON) file. This data format is an open-standard, human-readable text format consisting of attribute-value pairs and array data types, and serves a similar purpose to XML. Table 1 shows the basic data structure and major fields that are stored for each patient. Note that two .JSON files are stored on the server for each patient: one containing all of the plan data, and one containing all of the dose-volume data for every structure listed in the plan that has previously been assigned in Mobius3D.
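Reading and filtering such files can be sketched as follows. This is a minimal illustration only: the field names (`plan_name`, `site`, `energy`, `mean_dose_diff_pct`) and the file layout are hypothetical stand-ins and do not reproduce the actual Mobius3D JSON schema summarized in Table 1. The sample records are written to a temporary folder so that the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_plan_checks(folder):
    """Read every *.json file in `folder` into a list of dictionaries."""
    records = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records.append(json.load(handle))
    return records


def filter_checks(records, **criteria):
    """Keep records whose top-level fields equal every given criterion."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Made-up records with hypothetical field names (not the real schema).
samples = [
    {"plan_name": "PlanA", "site": "Prostate", "energy": "6X", "mean_dose_diff_pct": -0.2},
    {"plan_name": "PlanB", "site": "Lung", "energy": "6X", "mean_dose_diff_pct": 1.1},
    {"plan_name": "PlanC", "site": "Prostate", "energy": "10X", "mean_dose_diff_pct": 0.4},
]

with tempfile.TemporaryDirectory() as folder:
    for index, sample in enumerate(samples):
        target = Path(folder) / f"check_{index}.json"
        target.write_text(json.dumps(sample), encoding="utf-8")
    checks = load_plan_checks(folder)

prostate = filter_checks(checks, site="Prostate")
print([record["plan_name"] for record in prostate])
```

Because the records are plain dictionaries, any combination of fields can be passed as filter criteria, for example `filter_checks(checks, site="Prostate", energy="6X")`.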

2.B. | Python and MATLAB
Python is used to query the server and copy plan data.

As an example, typing "Rectum" into the search field and clicking

2.E. | Clinical workflow
The basic workflow for the use of Mobius3D at our facility is shown in

The average difference between the mean doses to the PTV as calculated by the TPS and M3D for all plan-checks was found to be −0.13% ± 1.2% (1σ). The maximum difference for all plan-checks to date was 5.0% and the mean gamma pass-rate (3%/3 mm) was 98%.

The software could also be used to provide a method for automated and continuous QA of TPSs. The hypothesis for this use is that if variable parameters that drive the TPS dose calculation are incorrect or inadvertently modified, an analysis of sequential plan-checks using M3D-DB should yield an out-of-control process that can then be investigated further.
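One way such an out-of-control process might be detected in sequential plan-checks is a run rule, sketched below. The text does not specify which rule would be applied, so this example assumes a rule that signals when a fixed number of consecutive results fall on the same side of the centre line; the run length of 8 is an assumed, commonly used choice. The centre line uses the average difference of −0.13% reported above, while the sequences of results are made up.

```python
def first_sustained_shift(values, centre, run_length=8):
    """Return the index at which a run of `run_length` consecutive points
    on the same side of `centre` is completed, or None if no run occurs.

    Points exactly equal to `centre` break any current run.
    """
    run = 0
    side = 0
    for index, value in enumerate(values):
        current = (value > centre) - (value < centre)  # +1, -1 or 0
        if current != 0 and current == side:
            run += 1
        else:
            side = current
            run = 1 if current != 0 else 0
        if run >= run_length:
            return index
    return None


CENTRE = -0.13  # average PTV mean dose difference (%) reported in the text

# Made-up sequential differences (%): a stable period, then a sustained shift.
stable = [0.3, -0.5, 0.1, -0.4, 0.2, -0.6, 0.0, -0.3]
shifted = stable + [0.9, 1.1, 0.8, 1.2, 1.0, 0.9, 1.1, 1.0]

print(first_sustained_shift(stable, CENTRE))
print(first_sustained_shift(shifted, CENTRE))
```

A run rule of this kind responds to a sustained bias even when each individual result remains within its tolerance, which is the situation described for a modified TPS parameter.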
Automation is important for maximizing the benefit of a database while minimizing the labor required to maintain and interrogate it.
At the time of publication, this software is set up independently from the Mobius3D server. However, a version is being created that will run on the network and automatically query the Mobius3D database once a new plan-check is completed. The role of the software will then be that of a quality assurance tool, comparing the current plan-check against all previous plan-checks with identical parameters using SPC and alerting the physics personnel to investigate outliers, all the while collating and collecting dose-volume and patient plan information. This automation could also extend to DVH data, comparing the DVH for an OAR to all existing DVH data for that organ.
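Comparing one DVH against the existing population for the same organ can be sketched as follows. The example assumes that every cumulative DVH has already been resampled onto a shared dose grid, and it flags dose bins where a new DVH exceeds the population mean by more than one standard deviation. Both the 1σ threshold and the numbers are illustrative assumptions, not values or rules taken from the software.

```python
from statistics import mean, median, stdev


def population_dvh_stats(dvhs):
    """Per-dose-bin mean, median and sample SD across a list of DVHs.

    Each DVH is a list of cumulative volumes (%) on the same dose grid.
    """
    columns = list(zip(*dvhs))
    return {
        "mean": [mean(column) for column in columns],
        "median": [median(column) for column in columns],
        "sd": [stdev(column) for column in columns],
    }


def doses_exceeding(dose_grid, new_dvh, stats, n_sd=1.0):
    """Return the doses at which `new_dvh` lies above mean + n_sd * SD."""
    return [
        dose
        for dose, volume, mu, sd in zip(dose_grid, new_dvh, stats["mean"], stats["sd"])
        if volume > mu + n_sd * sd
    ]


# Made-up cumulative DVHs (volume in %) on a shared dose grid (Gy).
dose_grid = [0, 20, 40, 60, 80]
previous_dvhs = [
    [100, 80, 55, 30, 0],
    [100, 70, 45, 20, 0],
    [100, 90, 65, 40, 0],
]
stats = population_dvh_stats(previous_dvhs)

new_dvh = [100, 85, 70, 45, 0]
exceed = doses_exceeding(dose_grid, new_dvh, stats)
print(stats["mean"], exceed)
```

The same per-bin statistics provide the mean, median, and μ ± 1σ curves of the kind plotted for a filtered patient group.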

| CONCLUSION
This technical note presents the development and use of software that allows for meta-analyses of plan-check and DVH data obtained and processed by Mobius3D. The scope of this work in the era of "big data" in healthcare is niche and addresses only a portion of the data available for analysis per patient throughout a course of radiotherapy. However, the potential of such a database and analysis framework to be automated provides an additional layer of quality control at no additional cost in labor. The database generated expands in size with the natural progression of each patient to their treatment and could potentially feed into larger clinical databases for imaging, demographic, diagnoses, and outcome cross-correlation.

FIG. 8. Another example of filtering out TPS DVH data, this time for the rectum volume for prostate patients. Shown here is the TPS DVH data for every patient found, along with the mean, median, and μ ± 1σ, shown as thick blue dashed, thick red dashed, and thick green dashed lines, respectively. Also shown here are histograms based on the selected dose-volume constraints (dashed crosshair) of approximately 60 Gy and 50%, respectively.

CONFLICT OF INTEREST
No funding has been received from Mobius Medical Systems relating to the development of this software and this publication. No financial arrangement is in place for potential future financial benefits relating to this software.