A study of positioning orientation effect on segmentation accuracy using convolutional neural networks for rectal cancer

Abstract

Purpose: Convolutional neural networks (CNN) have greatly improved medical image segmentation. A robust model requires training data that represent the entire dataset. One source of variation is patient positioning (prone or supine) for radiotherapy. In this study, we investigated the effect of positioning orientation on segmentation using CNN.

Methods: Data from 100 patients with rectal cancer (50 supine and 50 prone) were collected for this study. We designed three sets of experiments for comparison: (a) segmentation using the model trained with data from the same orientation; (b) segmentation using the model trained with data from the opposite orientation; (c) segmentation using the model trained with data from both orientations. We performed fivefold cross-validation. Performance was evaluated on segmentation of the clinical target volume (CTV), bladder, and femurs with the Dice similarity coefficient (DSC) and Hausdorff distance (HD).

Results: Compared with models trained on cases positioned in the same orientation, the models trained with cases positioned in the opposite orientation performed significantly worse (P < 0.05) on CTV and bladder segmentation, but had comparable accuracy for femurs (P > 0.05). The average DSC values (opposite vs same orientation) were 0.74 vs 0.84, 0.85 vs 0.88, and 0.91 vs 0.91 for CTV, bladder, and femurs, respectively. The corresponding HD values (mm) were 16.6 vs 14.6, 8.4 vs 8.1, and 6.3 vs 6.3, respectively. The models trained with data from both orientations had accuracy comparable to that of the same-orientation models (P > 0.05), with average DSC of 0.84, 0.88, and 0.91 and HD of 14.4, 8.1, and 6.3 mm, respectively.

Conclusions: Orientation affects the accuracy for CTV and bladder, but has a negligible effect on the femurs. The model trained with data combining both orientations performs as well as a model trained with data from the same orientation for all the organs. These observations can offer guidance on the choice of training data for accurate segmentation.


| INTRODUCTION
Segmentation of the organs-at-risk (OARs) and the tumor target is one of the key problems in the field of radiotherapy. Computer-assisted automated methods have the potential to reduce inter- and intraobserver variability and relieve physicians of the labor-intensive contouring workload. Such problems have been addressed in clinical applications using "atlas-based" automated segmentation software. [1][2][3] Despite the popularity of such software, the recent deep learning revolution, especially fully convolutional neural networks (CNN), [4][5][6][7][8] has turned the tables owing to significant improvements in segmentation accuracy, consistency, and efficiency. Lustberg et al. 9 and Lavdas et al. 10 showed that CNN contouring gave promising results in CT and MR image segmentation compared with atlas-based methods. Ibragimov et al. 11 successfully applied CNN for OAR segmentation in head and neck CT images. The authors 12 previously reported a dilated CNN with high accuracy for segmentation of rectal cancer. With these promising learning tools and improvements in computer hardware, deep learning will dramatically change the landscape of radiotherapy contouring. 13 Data are well known to be one of the most important components of any machine learning system, 14 especially for deep networks. 15,16 Although these approaches substantially improve performance, training a CNN requires a large number of high-quality contour annotations to achieve a satisfactory segmentation outcome.
The training data for modeling must be representative of the characteristics of the image sets in the study. Special attention should be paid to collecting and constructing an appropriate dataset for any CNN-based segmentation system. Patients undergoing radiotherapy for rectal cancer are generally treated either in a prone position, to reduce the volume of small bowel in the high-dose region, 17 or in a supine position, which is much more stable. 18 A different positioning orientation (prone or supine) results in variability 19 in the location, shape, and volume of the structures of interest. These differences may affect segmentation performance when training and testing across different positioning orientations.
In this study, we investigated the effect of cross-orientation on segmentation for rectal cancer radiotherapy using CNN. This issue is highly relevant for the following reasons. First, whether a CNN model trained with patients positioned in one orientation performs poorly for cases in the opposite orientation has not been studied before. Although this may seem intuitively true, there have been no experiments to support this assumption and no quantitative evaluation of such deterioration. Second, there has been no prior report on whether and how much training with data from both orientations would affect the segmentation accuracy. More data can increase diversity, but mixing two very different types of data is likely to lead to confusion in model training. This is an open question whose answer may influence the training strategies of deep learning. Third, segmentation is often the prerequisite of medical image analysis. If the positioning orientation affects segmentation, it will also affect further quantitative analysis that is based on the segmentation, e.g., radiomics. This study will therefore provide evidence and guidance for considerations of patient positioning orientation.
The image data were pre-processed in MATLAB R2017b (MathWorks, Inc., Natick, MA, USA). A custom-built script was used to extract and label all the voxels that belonged to the specific contours from the DICOM structure files. We used a contrast-limited adaptive histogram equalization (CLAHE) 12,20 algorithm to pre-process the CT images for image enhancement. For the patients in the "supine" position, the images were rotated 180° clockwise to create the corresponding "virtual prone" images. This removes the effects that are caused entirely by the physical orientation of the image. The final data used for CNN were the 2D CT slices and the corresponding 2D labels. The process and the additional image pre-processing were fully automated.
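The "virtual prone" rotation described above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB script: the function names `rotate_180` and `to_virtual_prone` are hypothetical, slices are represented as plain nested lists, and rotating the label map together with the CT slice is an assumption made here so that the image and its contours stay aligned.

```python
def rotate_180(slice_2d):
    """Rotate a 2D slice (a list of rows) by 180 degrees in-plane.

    Reversing the row order and then each row is equivalent to a
    180-degree rotation (clockwise and counterclockwise coincide).
    """
    return [list(reversed(row)) for row in reversed(slice_2d)]


def to_virtual_prone(ct_slice, label_slice, orientation):
    """Return (image, label) in a common 'prone-like' orientation.

    Assumption: supine slices and their labels are rotated together;
    prone slices are passed through unchanged.
    """
    if orientation == "supine":
        return rotate_180(ct_slice), rotate_180(label_slice)
    return ct_slice, label_slice


# Toy 2 x 3 "slice" and label map standing in for real CT data.
ct = [[1, 2, 3],
      [4, 5, 6]]
mask = [[0, 1, 0],
        [0, 0, 1]]

rot_ct, rot_mask = to_virtual_prone(ct, mask, "supine")
print(rot_ct)    # [[6, 5, 4], [3, 2, 1]]
print(rot_mask)  # [[1, 0, 0], [0, 1, 0]]
```

Applying the rotation twice returns the original slice, which is a quick sanity check that no voxels are lost or shifted.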

2.B | Convolutional neural networks implementation
We used the ResNet-101 7 as the deep learning network for segmentation. As illustrated in Fig. 1, the inputs of the network were the original 2D CT slices and the outputs were the corresponding maps with the segmentation labels. Table 1

| EXPERIMENTS
In order to evaluate the effect of positioning orientation on segmentation, we designed the following three sets of experiments for comparison.
1. Segmentation using the model trained with data from the same orientation;
2. Segmentation using the model trained with data from the opposite orientation;
3. Segmentation using the model trained with data from both orientations.
Subsequently, we chose subset j as the testing set and the remaining subsets (i ≠ j) as the training set to train the jth set of models. We repeated this step until five sets of models had been trained, covering all the data.
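As a rough illustration of how the fivefold split and the three training regimes fit together, the following Python sketch builds the training sets for one held-out fold. It rests on several assumptions that the text does not confirm: cases are split separately within each orientation, the split is a seeded random shuffle, and the case identifiers and the function names `make_folds` and `training_sets` are invented for the example.

```python
import random


def make_folds(case_ids, n_folds=5, seed=0):
    """Shuffle the case IDs and deal them into n_folds disjoint subsets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::n_folds] for k in range(n_folds)]


def training_sets(prone_folds, supine_folds, j, test_orientation):
    """Training cases for the three experiments when fold j is held out.

    Every fold with index i != j contributes to training; fold j of the
    tested orientation is reserved for testing.
    """
    prone_train = [c for i, fold in enumerate(prone_folds) if i != j for c in fold]
    supine_train = [c for i, fold in enumerate(supine_folds) if i != j for c in fold]
    if test_orientation == "prone":
        same, opposite = prone_train, supine_train
    else:
        same, opposite = supine_train, prone_train
    return {"same": same, "opposite": opposite, "both": prone_train + supine_train}


# Hypothetical case IDs: 50 prone and 50 supine patients.
prone_folds = make_folds([f"P{n:02d}" for n in range(50)])
supine_folds = make_folds([f"S{n:02d}" for n in range(50)])

sets = training_sets(prone_folds, supine_folds, j=0, test_orientation="prone")
print({name: len(cases) for name, cases in sets.items()})
# {'same': 40, 'opposite': 40, 'both': 80}
```

Under these assumptions the combined model sees twice as many training cases as either single-orientation model, which is worth keeping in mind when comparing the three regimes.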
To avoid overfitting during the training phase, we adopted both offline and online data augmentation schemes. The offline augmentation randomly transformed the training cases by adding noise and rotating them (between −5° and 5°), which enlarged the training dataset tenfold. The online augmentation randomly scaled the input images (by a factor from 0.5 to 1.5), randomly cropped them, and randomly flipped them left-right. With this augmentation, the network rarely saw the same augmented image twice, as the modifications were performed at random each time. This greatly increased the diversity of samples and made the network more robust.
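The online augmentation steps (random scaling, random cropping, random left-right flipping) can be sketched in plain Python as follows. This is an illustrative approximation rather than the Caffe data layer used in the study: the helper names are hypothetical, nearest-neighbour resampling is used for both image and label for simplicity, the crop size is arbitrary, and a scaled image smaller than the crop is simply cropped to its own size instead of being padded.

```python
import random


def scale_nearest(img, factor):
    """Resize a 2D list by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
             for c in range(new_w)]
            for r in range(new_h)]


def random_crop(img, lab, crop_h, crop_w, rng):
    """Cut the same random window out of the image and its label map."""
    h, w = len(img), len(img[0])
    crop_h, crop_w = min(crop_h, h), min(crop_w, w)
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)

    def cut(a):
        return [row[left:left + crop_w] for row in a[top:top + crop_h]]

    return cut(img), cut(lab)


def augment_online(img, lab, crop_size, rng):
    """Apply random scale (0.5 to 1.5), random crop, and random left-right flip."""
    factor = rng.uniform(0.5, 1.5)
    img, lab = scale_nearest(img, factor), scale_nearest(lab, factor)
    img, lab = random_crop(img, lab, crop_size, crop_size, rng)
    if rng.random() < 0.5:  # flip left-right half of the time
        img = [row[::-1] for row in img]
        lab = [row[::-1] for row in lab]
    return img, lab


# Toy 8 x 8 image with a label covering its right half.
rng = random.Random(1)
image = [[r * 8 + c for c in range(8)] for r in range(8)]
label = [[1 if c >= 4 else 0 for c in range(8)] for r in range(8)]
aug_img, aug_lab = augment_online(image, label, crop_size=4, rng=rng)
print(len(aug_img), len(aug_img[0]))
```

The image and its label always receive the same scale factor, crop window, and flip decision, so the contours remain aligned with the anatomy after augmentation.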
We implemented the training and testing of our model using Caffe, 21 which is a publicly available deep learning platform. The model was trained in a 2D pattern. During the testing phase, all the 2D CT slices were tested one by one: the 2D CT slices were the inputs and the corresponding segmentation probability maps were the outputs. The model parameters for each network were initialized using the weights from the corresponding model trained on ImageNet. 22 In this case, the "Conv1" layer expects three input channels. However, our input was the grayscale CT image, which has only one channel. We solved this problem by taking only the first channel of each filter in the "Conv1" layer pre-trained on ImageNet when loading the model. This was achieved by modifying the original Caffe code to compare the channel number c1 of the current network with the channel number c2 of the pre-trained model. If c1 is less than c2, only the first c1 channels of the filters are used. The training set was used to "tune" the parameters of the networks. The loss function and the training accuracy were computed with the "SoftmaxWithLoss" and "SegAccuracy" layers in Caffe, respectively. 21 Training used backpropagation with stochastic gradient descent (SGD). We used the "poly" learning rate policy, in which the current learning rate equals the base learning rate multiplied by $\left(1 - \frac{\mathrm{iter}}{\mathrm{max\_iter}}\right)^{\mathrm{power}}$. In this study, we set the base learning rate to 0.001 and the power to 0.9. The batch size was set to 1 because of the limited physical memory on the GPU card. The number of training iterations was set to 90K. The momentum and weight decay were set to 0.9 and 0.0005, respectively. The training and testing phases were fully automated with no manual interaction. All computations were performed on an Amazon Elastic Compute Cloud instance with an NVIDIA K80 GPU.
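Two details of this setup lend themselves to a short sketch: the "poly" learning rate schedule and the channel trimming applied when loading the pre-trained "Conv1" filters. The Python below is a minimal illustration under stated assumptions, not the modified Caffe code: `poly_lr` and `trim_input_channels` are hypothetical names, and the filter bank is represented as a nested list indexed as [output channel][input channel][row][column].

```python
def poly_lr(base_lr, iteration, max_iter, power):
    """'poly' policy: base_lr * (1 - iteration / max_iter) ** power."""
    return base_lr * (1.0 - iteration / max_iter) ** power


def trim_input_channels(pretrained_filters, c1):
    """Keep only the first c1 input channels of each pre-trained filter.

    c2 is the number of input channels in the pre-trained model; the
    filters are returned unchanged unless c1 is less than c2.
    """
    c2 = len(pretrained_filters[0])
    if c1 < c2:
        return [filt[:c1] for filt in pretrained_filters]
    return pretrained_filters


# Learning rate with the settings reported in the text.
BASE_LR, POWER, MAX_ITER = 0.001, 0.9, 90000
for it in (0, 45000, 89999):
    print(it, poly_lr(BASE_LR, it, MAX_ITER, POWER))

# Toy filter bank: 2 output channels, 3 input channels (RGB), 1 x 1 kernels.
rgb_filters = [[[[0.1]], [[0.2]], [[0.3]]],
               [[[0.4]], [[0.5]], [[0.6]]]]
gray_filters = trim_input_channels(rgb_filters, c1=1)
print(gray_filters)  # [[[[0.1]]], [[[0.4]]]]
```

With these settings the rate starts at 0.001, falls to roughly 0.00054 at the halfway point, and approaches zero at the final iteration.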

3.B | Performance evaluation
Physician-approved manual segmentation was used as the gold standard reference. The spatial consistency between the automated segmentation and the manual reference segmentation was quantified using two metrics: the Dice similarity coefficient (DSC) 23
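For readers unfamiliar with the two metrics named in the abstract, the sketch below computes the DSC and the Hausdorff distance (HD) for small binary masks given as sets of voxel coordinates. It uses the standard textbook definitions, which may differ in detail from the study's implementation: the HD here is the symmetric maximum distance taken over all labeled voxels (not a percentile variant and not restricted to surface voxels), and the function names and the `spacing` parameter are assumptions.

```python
import math


def dice(a, b):
    """Dice similarity coefficient: 2 * |A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b, spacing=(1.0, 1.0)):
    """Symmetric Hausdorff distance between two non-empty point sets.

    `spacing` converts voxel indices to physical units (e.g., mm).
    """
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty set")

    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2
                             for pi, qi, s in zip(p, q, spacing)))

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Two 2 x 2 squares that overlap in one column.
auto = {(0, 0), (0, 1), (1, 0), (1, 1)}
manual = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(dice(auto, manual))       # 0.5
print(hausdorff(auto, manual))  # 1.0
```

The brute-force nearest-point search is quadratic in the number of voxels, so it is suitable only for illustration; clinical volumes would normally use a distance transform or a spatial index.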

| RESULTS
The results of the segmentation accuracy are summarized in Tables 2 and 3. The CNN segmentation models for CTV and bladder trained with cases positioned in the opposite orientation performed significantly worse (P < 0.05) than those trained with cases positioned in
tasks to solve problems faster and more effectively. Segmentation using CNN with transfer learning will be explored in the future.

| CONCLUSIONS
The experiments demonstrated that the orientation of the training dataset affects the accuracy of CNN-based segmentation for CTV and bladder but has a negligible effect on the femurs. The model trained with data combining both orientations works as well as a model trained on data from the same orientation for all the organs.
These observations provide guidance on how to choose training data for accurate segmentation.

ACKNOWLEDGMENTS
This project was supported by U24CA180803 (IROC) and CA223358 from the National Cancer Institute (NCI).

CONFLICT OF INTEREST
No conflicts of interest.

FIG. 3. Segmentation results on cases positioned in prone using CNN models trained with different types of datasets.