Data integrity systems for organ contours in radiation therapy planning

Abstract The purpose of this research is to develop effective data integrity models for contoured anatomy in a radiotherapy workflow for both real‐time and retrospective analysis. Within this study, two classes of contour integrity models were developed: data driven models and contiguousness models. The data driven models aim to highlight contours which deviate from a gross set of contours from similar disease sites and encompass the following regions of interest (ROIs): bladder, femoral heads, spinal cord, and rectum. The contiguousness models, which individually analyze the geometry of contours to detect possible errors, are applied across many different ROIs and are divided into two metrics: Extent and Region Growing over volume. After analysis, we found that 70% of detected bladder contours were verified as suspicious. For the spinal cord and rectum models, 73% and 80% of detected contours were verified as suspicious, respectively. The contiguousness models were the most accurate models, and the Region Growing model was the most accurate submodel. All (100%) of the detected noncontiguous contours were verified as suspicious, but in the cases of spinal cord, femoral heads, bladder, and rectum, the Region Growing model detected an additional two to five suspicious contours that the Extent model failed to detect. When conducting a blind review to detect false negatives, it was found that none of the data driven models detected all suspicious contours. The Region Growing contiguousness model produced zero false negatives in all regions of interest other than prostate. With regard to runtime, the contiguousness via Extent model took an average of 0.2 s per contour. The Region Growing method, on the other hand, had a longer runtime which was dependent on the number of voxels in the contour. Both contiguousness models have potential for real‐time use in clinical radiotherapy, while the data driven models are better suited for retrospective use.


1 | INTRODUCTION
The concept of data integrity systems for mapped organ contours in radiation therapy aims to improve both the accuracy and consistency of data. New advances in automated segmentation technology paired with radiotherapy dose calculations have improved the ability of clinicians to accurately contour boundaries of organs at risk in radiotherapy. [1][2][3][4][5][6] These technologies have been applied to numerous regions of interest (ROI) in the head, neck, thorax, and prostatic regions. [7][8][9] Other studies have segmented CT images using patient and population based statistics. 10 However, as with any qualitative, manually completed activity, there are margins of error which, if not detected, can have implications for the treatment of patients and for how physicians treat future patients. Poorly or spuriously mapped contours by physicians and residents have the potential to result in erroneous radiation dosing of critical, noncancerous anatomy and to skew predictive models developed by data scientists to extrapolate post-treatment parameters, such as weight loss and dysphagia.
Many studies have been completed regarding integrity checking in radiation therapy across a given set of patients. 11,12 After treating patients with a consistent diagnosis, physicians tend to assess the variation in treatment planning and delivery across the set of patients with the goal of standardizing treatment for that diagnosis. Similar studies analyze the integrity of radiation treatment through a standardized set of parameters. Other studies aim to improve the safety and integrity of treatment through the verification of prescriptions and a variety of "in house parameters." 13 With regard to automated treatment control, studies have analyzed the efficacy of tools to improve the safety and integrity of intensity modulated radiation therapy (IMRT) while finding the optimum treatment plans for patients. 14 In contrast, the technology presented here employs an active approach to improving integrity, through the lens of a clinical database environment. In this study, we developed two classes of contour integrity checks. The first, a data driven integrity check, aims to develop and test models which identify contours that deviate from the norm of a set of data. The second is an internal ROI check which is developed independently and applied to a set of contours, with potential for real-time applications in radiotherapy. By employing metrics to detect poorly contoured anatomy within the radiation oncology clinical workflow, this technology distinguishes itself from prior studies and technologies within the realm of radiation therapy planning integrity. It aims to improve the quality of clinical data for data scientists and physicians, minimizing the risk of radiation overdose to critical anatomy for patients.

2 | MATERIALS AND METHODS
The contour data used in this study come from the Oncospace 14 database, a learning health system composed of clinical radiotherapy patient data. Specifically, given the regions of interest tested in this study, we used contour data from the Oncospace Head and Neck and Oncospace Prostate databases. The Oncospace data used within this study were collected across several clinics to ensure accuracy and consistency throughout development and analysis. The number of contours analyzed depended on the region of interest and was not consistent across regions of interest. The development and testing of algorithms, as well as the analysis, were completed using Python and MATLAB R2017a. In this study, we divide our models into two classes: data driven and contiguousness. With regard to the data driven models, we used several metrics as thresholds and classifiers to create the models.
The first metric used in the data driven models was total ROI volume. Using Microsoft Visual Studio as a platform, we used SQL direct queries to query patients from the Oncospace 15 Prostate Database based on specific Region of Interest Volume. We then consolidated the patient lists, organizing by ROI, and exported the data into MATLAB and Python, our analysis software. Total ROI volume aims to detect abnormally large or abnormally small organ contours.
To the analyst, an abnormally large organ contour could indicate incorrectly contoured surface anatomy while an unusually small volume could indicate missing geometric contour slices.
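As a rough illustration of the volume metric, the sketch below computes an ROI volume from a binary mask and flags contours whose volume falls outside a percentile band. The function names, the voxel-size argument, and the 5th/95th percentile bounds are assumptions made for illustration; the text does not state the volume thresholds used in the study.

```python
import numpy as np

def roi_volume_cc(mask, voxel_size_mm):
    """Volume of a binary mask in cubic centimetres."""
    voxel_volume_mm3 = float(np.prod(voxel_size_mm))
    return int(np.count_nonzero(mask)) * voxel_volume_mm3 / 1000.0

def flag_volume_outliers(volumes, low_pct=5.0, high_pct=95.0):
    """Indices of volumes outside the [low_pct, high_pct] percentile band.

    The percentile bounds are hypothetical defaults, not values from the study.
    """
    volumes = np.asarray(volumes, dtype=float)
    lo, hi = np.percentile(volumes, [low_pct, high_pct])
    return np.flatnonzero((volumes < lo) | (volumes > hi))
```

Contours returned by `flag_volume_outliers` would still need visual verification, since a percentile band flags a fixed share of the cohort regardless of how many contours are actually erroneous.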
Similarly, the next metric, total ROI Extent, indicates anatomy which extends abnormally in the left-right, anterior-posterior, or inferior-superior directions. We define Extent as the range of voxels in a three-dimensional grid in each direction. Total ROI Extent is calculated by converting the transition points of the binary mask of a contour into sets of indices which map the surface voxels of this contour. The binary mask is encoded using a data compression technique called run length encoding, 16 shown in Figure 1. Here, extensive runs of data are stored as single data counts rather than as the original runs.
FIG. 1. Shows a diagram explaining the concept of run length encoding. Run length encoding is a lossless data compression method which stores long runs of data into single data counts. A common application of this methodology is in JPEG files.
Then, we applied the following equations over each contour to compute indices for each contour. Below are the defined variables within the equations.
Index: A certain transition point within the "Mask" array
We then apply an algorithm to ensure that voxels fill each slice of the contour. Lastly, we subtract the minimum index value from the maximum in each direction within a 3D space to find the Extent.
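The steps above can be sketched as follows. This is an illustrative reconstruction, not the study's equations: it assumes the mask is flattened in row-major order and that the transition points are the flat positions at which the mask switches between background and contour, so that consecutive pairs delimit one run. All function names are hypothetical.

```python
import numpy as np

def encode_transitions(mask):
    """Flat positions where the flattened mask switches value (assumed encoding)."""
    flat = np.concatenate(([0], mask.ravel().astype(np.int8), [0]))
    return np.flatnonzero(np.diff(flat))

def decode_to_indices(transitions, shape):
    """Expand (start, stop) transition pairs into an (N, 3) array of voxel indices."""
    starts, stops = transitions[0::2], transitions[1::2]
    flat_idx = np.concatenate([np.arange(a, b) for a, b in zip(starts, stops)])
    return np.column_stack(np.unravel_index(flat_idx, shape))

def total_extent(indices):
    """Extent per axis: maximum index minus minimum index."""
    return indices.max(axis=0) - indices.min(axis=0)
```

Multiplying the Extent by the voxel dimension along each axis would convert it from voxel counts to physical length.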
An abnormally large left-right or anterior-superior Extent could indicate suspicious or poorly contoured anatomy or missing slices in the inferior-superior direction. While total Extent will detect more generalized abnormalities throughout the contour, slice based
The data driven models developed and tested in this study are over the following ROIs: bladder, femoral heads, rectum, and spinal cord. In these models, we combined the aforementioned metrics to develop thresholds which could be applied over sets of treated radiotherapy contours derived from CT imaging. 1 These models distinguish themselves from the contiguousness models in that they are data driven, meaning they are tested and developed on a data set to optimize their ability to detect poorly or suspiciously contoured anatomy. Each model is unique to a certain ROI. On the other hand, the contiguousness models were developed to be tested across many ROIs to detect suspicious contours across the set.
and above the 90th percentile for any of the three Extent ratios, as shown in Figure 2. The contours which met these conditions were then verified using an ROI shape verification tool which projects a surface plot of a contour, created using its indices derived from its mask and multiplied by a voxel dimension size. Figure 3 shows a suspicious bladder contour identified by the model.
Extent ratio and inferior-superior Extent was applied. A subarray of contours which met this condition was then verified using the same ROI shape verification tool used in the bladder contour integrity model to verify their suspicious nature. Figure 5 shows an example of a contoured femoral head, but not a femoral shaft.
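A percentile cut on Extent ratios, such as the 90th percentile condition mentioned for the bladder model, might look like the sketch below. The definition of the three ratios (left-right to anterior-posterior, left-right to inferior-superior, anterior-posterior to inferior-superior) is an assumption, as are the function names; any additional condition the model combines with this cut is not recoverable from the text and is omitted here.

```python
import numpy as np

def extent_ratios(extent):
    """Three pairwise ratios of an assumed (LR, AP, IS) Extent triple."""
    lr, ap, si = (float(v) for v in extent)
    return np.array([lr / ap, lr / si, ap / si])

def flag_by_ratio_percentile(extents, pct=90.0):
    """Indices of contours with any Extent ratio above the cohort's pct-th percentile."""
    ratios = np.array([extent_ratios(e) for e in extents])
    cutoffs = np.percentile(ratios, pct, axis=0)  # one cutoff per ratio
    return np.flatnonzero((ratios > cutoffs).any(axis=1))
```

The sketch assumes every Extent is nonzero; a contour confined to a single slice or column would need to be handled before the ratios are taken.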

2.C | Rectum and spinal cord contour integrity model
Among ROI contours which vary from slice to slice in the inferior-superior direction, slice based Extent calculations were used. The unique ability of slice based Extent is that it will detect poorly contoured anatomy which is not missing slices in any given direction but possesses abnormally oblong or undersized regions on any slice in the inferior-superior plane. This is the case in rectum contours. Lastly, the model creates another array by applying a function which sorts radiotherapy contours whose slice based Extents are greater than the 90th percentile in the left-right and anterior-superior directions.
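A minimal sketch of a slice based Extent calculation is given below. It assumes the mask is indexed as [slice, row, column], with columns running left-right and rows running along the other in-plane direction, and it flags a slice when either in-plane Extent exceeds a cutoff. The axis convention, the function names, and the choice of "either direction" are assumptions; in practice the cutoffs would be taken from the cohort, for example with `np.percentile(values, 90)`.

```python
import numpy as np

def slice_extents(mask):
    """Rows of (slice number, left-right Extent, in-plane Extent) for non-empty slices."""
    out = []
    for k in range(mask.shape[0]):
        rows, cols = np.nonzero(mask[k])
        if rows.size == 0:
            continue  # empty slices carry no Extent
        out.append((k, cols.max() - cols.min(), rows.max() - rows.min()))
    return np.array(out)

def flag_slices(per_slice, cutoff_lr, cutoff_ap):
    """Slice numbers whose Extent exceeds the cutoff in either in-plane direction."""
    over = (per_slice[:, 1] > cutoff_lr) | (per_slice[:, 2] > cutoff_ap)
    return per_slice[over, 0]
```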
The contours within this array were then verified as abnormal using the ROI shape verification algorithm used in the bladder model.
FIG. 2. Shows a histogram of the volume of bladder contours. Once contour volumes were queried from the Oncospace database, the data were stored in an array and then projected into a histogram. This aims to show the shape and spread of bladder contour volume data.
FIG. 3. Shows a suspicious bladder contour detected by the bladder integrity model. Within this contour, one can see that it projects abnormally far in the anterior-superior direction and seems to be missing contoured anatomy in the inferior-superior plane.

2.D | Contiguousness integrity model
The software supporting the contiguousness by Extent metric was derived from the concept that slice based Extent or total ROI volume would not be able to detect subtler suspicious contours. This model iteratively loops by slice across a region of anatomy and evaluates whether slices are missing in a given direction. Here, a contiguous contour is one in which the number of unique voxels in each geometric direction plus one equals the Extent in that given direction. Noncontiguous contours would therefore be missing contoured slices in a given direction or possess voxels of contoured anatomy that are projected away from the main body of the contour.
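The contiguousness by Extent idea can be sketched as a count of distinct index values along each axis. The sketch assumes Extent is computed as maximum index minus minimum index, so a gap-free axis holds Extent + 1 distinct values; if Extent is instead counted inclusively, the offset moves to the other side of the comparison. The function name is hypothetical.

```python
import numpy as np

def contiguous_by_extent(indices):
    """True if, along every axis, the occupied index values form an unbroken run."""
    indices = np.asarray(indices)
    for axis in range(indices.shape[1]):
        values = np.unique(indices[:, axis])
        extent = values[-1] - values[0]   # max minus min along this axis
        if values.size != extent + 1:     # a gap: missing slice or stray voxels
            return False
    return True
```

Because the test only inspects one-dimensional projections, it runs in a single pass over the indices, which is consistent with the short runtime reported for this metric.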
The next contiguity metric, Region Growing over Volume, validates a binary mask structure over a voxel grid. Within this metric, contiguity is defined as having a path from one voxel to every other voxel throughout the mask of the contour. The name "region growing" is derived from the algorithm supporting contiguity via region growing over volume, which analyzes specific voxels within a binary mask. This algorithm was implemented in Python, using the SciPy library. Starting at a single voxel within a contour, this method repeatedly searches for all adjacent voxels until none exist. Figure 6 shows an example of a contiguous region, where the green voxel grows a neighborhood that encompasses all voxels in the structure.
By contrast, in Figure 7, a noncontiguous region, the number of visited voxels does not equal the number of voxels in the image, indicating that there is not a contiguous path between all voxels of the structure. K-d trees 17 were used as a method for fast indexing and lookup of neighbors. The KDTree.query_ball_point() 18,19 method returns all points within a specified radius from a point, in this case, √3, since points are treated as a unit-voxel grid corresponding to the voxel indices. Figure 8 shows pseudocode of the implementation of the region growing algorithm.
FIG. 4. Shows a histogram of the volume of left femoral head contours. This histogram aims to show the shape and distribution of the volume of contoured femoral heads. Upon review, it seems to show a skew toward the lower end of volumes.
FIG. 5. Shows a suspicious left femoral head contour using MATLAB software. One can see that only a small portion of the femoral neck is contoured and that the projected voxels in the inferior-superior direction seem incomplete. This contour was clearly detected by the volume threshold set in this model.
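The region growing procedure described in the text can be sketched with SciPy's k-d tree as follows. This is a minimal illustration under stated assumptions rather than the study's implementation: the function name is hypothetical, the search is breadth-first from the first voxel, and a small tolerance is added to the √3 radius so that diagonal neighbors are not lost to floating-point rounding.

```python
from collections import deque

import numpy as np
from scipy.spatial import KDTree

def is_contiguous_region_growing(indices, radius=np.sqrt(3) + 1e-6):
    """Grow a region from the first voxel; contiguous if every voxel is reached."""
    indices = np.asarray(indices, dtype=float)
    if len(indices) == 0:
        return True
    tree = KDTree(indices)
    visited = np.zeros(len(indices), dtype=bool)
    visited[0] = True
    queue = deque([0])
    while queue:
        current = queue.popleft()
        # All voxels within the radius share a face, edge, or corner with `current`.
        for neighbor in tree.query_ball_point(indices[current], r=radius):
            if not visited[neighbor]:
                visited[neighbor] = True
                queue.append(neighbor)
    return bool(visited.all())
```

Each voxel is queued at most once, so the number of neighbor queries grows with the number of voxels in the contour.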
After creating these metrics, we applied their respective algorithms to patient lists across the aforementioned regions of interest. For the region growing algorithms, we also computed runtime data for the contours tested to assess their feasibility for real-time applications. Lastly, to check for false negatives, that is, suspicious contours undetected by the integrity metrics, we conducted a blind review of all of the contours tested across the regions of interest. During this review, we marked all suspicious contours and then compared the results to those of the preliminary analysis.
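The comparison between model output and manual review amounts to set arithmetic over contour identifiers. The sketch below is one way to tabulate it; the function name, the argument names, and the returned fields are hypothetical, and the identifiers in any call would be placeholders rather than study data.

```python
def review_summary(flagged, verified_suspicious, blind_review_suspicious):
    """Compare model output with manual review using sets of contour identifiers."""
    flagged = set(flagged)
    true_pos = flagged & set(verified_suspicious)
    false_neg = set(blind_review_suspicious) - flagged
    verified_fraction = len(true_pos) / len(flagged) if flagged else float("nan")
    return {
        "verified_fraction": verified_fraction,           # flagged contours confirmed suspicious
        "false_positives": sorted(flagged - true_pos),    # flagged but not confirmed
        "false_negatives": sorted(false_neg),             # found only by blind review
    }
```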

3 | RESULTS AND DISCUSSION
After completing the analysis, all data integrity models were successful at detecting suspicious or abnormal contours within the clinical workflow, though to differing degrees. Based on the results summarized in Tables 1-6, it is clear that the contiguousness models are the most exact method of detecting these contours within a clinical radiotherapy workflow, as each suspicious contour was verified as abnormal via the MATLAB ROI shape verification tool. On the other hand, the data driven models, while accurate, did have false positives and false negatives. The results shown in Tables 1-6 indicate that certain ROIs are contoured abnormally at a higher frequency than others. Furthermore, Table 4 summarizes the accuracy of the data driven models, while Tables 5 and 6 summarize the accuracy of the contiguousness models. A common example is the spinal cord, which was contoured suspiciously 82 times in a sample of 1148 patients. In the case of the spinal cord, a treated noncontiguous contour could result in radiation dose passing through the cord to critical, nontarget anatomy. This issue also presents in ROIs, such as the bladder, where the contoured anatomy is near the target volume.
The Bladder Contour Data integrity model indicates a high level of variation in contoured bladder anatomy, giving way to nine
FIG. 6. Shows a diagram which explains the characteristics of a contiguous contour. The diagram begins with one voxel filled and continues to grow until no more voxels can be filled without breaks, implying contiguity.
The accuracy of the femoral head prediction model can be attributed to the nature of femoral head radiotherapy contours. Due to the binary nature of these contours, a model which categorizes contours based on volume, inferior-superior Extent, and Extent ratios will clearly be successful. However, the contours poorly classified by the model can be attributed to the fact that a smaller portion of the femoral head could be contoured with the shaft, therefore failing to meet the volume threshold as shown in Table 4. False negative rates were not calculated, as the point of this model was to distinguish between two types of femoral head contours. Both the femoral head and bladder models could be implemented as preliminary integrity checks within radiotherapy planning to warn clinicians before they treat a suspiciously contoured plan.

With regard to the Slice Based Extent Rectum and Spinal Cord Models, both models were successful in detecting abnormal slice Extents throughout a contour, having 73% and 83% accuracy respectively. Within the Spinal Cord model, we found less than 10% over-
Table 4, the 73.1% of false negatives present with the spinal cord
Therefore, the slice based Extent model will only see more significant discontiguities, such as those with multiple missing slices. Both of these models are important, as suspiciously contoured slices with abnormal left-right or anterior-superior Extent possess the same clinical risks as those highlighted in the bladder model.
TABLE 1 Results of data driven integrity models. Column headers: Region of interest; Integrity model or metric.
Lastly, with regard to the non-data-driven contiguousness metrics, while both methods were quite accurate, each method poses its own benefits and drawbacks. As shown in Figure 9, both models detected suspicious contours. While the contiguousness by Extent model was the fastest, there were instances where it failed to detect suspicious contours which were detected using the region growing method. This is shown in Table 5.
Additionally, while the region growing model is highly accurate, the implementation often results in a high runtime due to the exponential computational complexity of the algorithm, taking between 30 s and a minute to check a single ROI contour. However, an average runtime cannot be cited for this method, as the runtime is wholly dependent on the number of voxels present in the contour, shown in Figure 10. As shown in Table 6, the Region growing model suc-
Another important strength of the region growing method is associated with the smoothness assumption. In general, anatomical structures are smooth, meaning that between slices in a 3D representation, voxels in neighboring slices should be close together. In the occurrence of an irregular structure or contour, where the rings defining the edge of each axial slice are too far apart, the region growing over volume will not deliver a false positive, as it is able to travel between slices along the overlapping interior voxels.
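One way to see why an Extent-style check can miss contours that region growing catches is a synthetic mask in which a detached island shares its slice, row, and column ranges with the main body. The sketch below builds such a mask. Note that it uses `scipy.ndimage.label` with 26-connectivity as a stand-in for region growing, which is a substitution made for brevity and not the k-d tree method described in the text; the function names and the mask itself are invented for illustration.

```python
import numpy as np
from scipy import ndimage

def passes_extent_check(mask):
    """Extent-style check: no gaps in the occupied indices along any axis."""
    for axis_values in np.nonzero(mask):
        values = np.unique(axis_values)
        if values.size != values[-1] - values[0] + 1:
            return False
    return True

def count_components(mask):
    """Number of 26-connected components (stand-in for region growing)."""
    _, n = ndimage.label(mask, structure=np.ones((3, 3, 3), dtype=int))
    return n

mask = np.zeros((2, 6, 6), dtype=bool)  # axes: [slice, row, column]
mask[:, 0:2, 0:2] = True                # main body
mask[:, 0, 2] = True                    # spur that occupies column 2
mask[:, 2, 0] = True                    # spur that occupies row 2
mask[:, 3:5, 3:5] = True                # detached island

print(passes_extent_check(mask))  # the axis projections contain no gap
print(count_components(mask))     # connectivity analysis separates the island
```

In this example the projection onto every axis is gap-free, so the Extent-style check passes, while the connectivity analysis reports two separate pieces.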
Overall, the suspicious contours detected by the aforementioned methods are present for several reasons. The main reason is due to human error in the contour process. The contours used in analysis were collected across several physicians and clinics to ensure that the suspicious contours could not be attributed to a single physician.
There are no clear biases in contour data that would allow certain contours to better fit certain integrity models.

4 | CONCLUSION
The models developed and tested in this study each have benefits. The data driven models are effective in finding specific cases of contours but, due to their lesser accuracy and more significant false neg-
Moreover, this study shows the need for a contour integrity system in clinical radiotherapy during the planning process. Potentially, such a tool could be used in conjunction with CT and Atlas based autosegmentation methodologies. This will not only minimize the risk of radiation overdose to critical anatomy in a clinical workflow but aid physicists, clinicians, and data scientists in the creation of
FIG. 10. Shows a plot describing the relationship between the runtime of tested contours and number of voxels in the contour for the Region Growing algorithm. This plot shows a direct relationship between the two variables.
TABLE 6 Accuracy of contiguousness by region growing over volume.

CONFLICT OF INTEREST
There are no conflicts of interest.