Automatic segmentation and applicator reconstruction for CT‐based brachytherapy of cervical cancer using 3D convolutional neural networks

Abstract In this study, we present deep learning-based approaches to automatic segmentation and applicator reconstruction with high accuracy and efficiency in the planning computed tomography (CT) for cervical cancer brachytherapy (BT). A novel three-dimensional (3D) convolutional neural network (CNN) architecture, referred to as DSD-UNET, was proposed. A dataset of 91 patients who received CT-based BT for cervical cancer was used to train and test the DSD-UNET model for auto-segmentation of the high-risk clinical target volume (HR-CTV) and organs at risk (OARs). Automatic applicator reconstruction was achieved with DSD-UNET-based segmentation of the applicator components followed by 3D skeletonization and polynomial curve fitting. Digitization of the channel paths for the tandem-and-ovoid applicator in the planning CT was evaluated using data from 32 patients. Dice similarity coefficient (DSC), Jaccard Index (JI), and Hausdorff distance (HD) were used to quantitatively evaluate accuracy. The segmentation performance of DSD-UNET was compared with that of 3D U-Net. Results showed that the DSD-UNET method outperformed 3D U-Net on segmentation of all the structures. The mean DSC values of the DSD-UNET method were 86.9%, 82.9%, and 82.1% for the bladder, HR-CTV, and rectum, respectively. Regarding automatic applicator reconstruction, outstanding segmentation accuracy was first achieved for the intrauterine and ovoid tubes (average DSC of 92.1%, average HD of 2.3 mm). Finally, HDs between the channel paths determined automatically and manually were 0.88 ± 0.12 mm, 0.95 ± 0.16 mm, and 0.96 ± 0.15 mm for the intrauterine, left ovoid, and right ovoid tubes, respectively. The proposed DSD-UNET method outperformed 3D U-Net and could segment the HR-CTV, bladder, and rectum with relatively good accuracy. Accurate digitization of the channel paths could be achieved with the DSD-UNET-based method.
The proposed approaches could be useful to improve the efficiency and consistency of treatment planning for cervical cancer BT.


| INTRODUCTION
Brachytherapy (BT) following external beam radiation therapy (EBRT) and concurrent chemotherapy is the standard of care for patients with locally advanced cervical cancer. 1,2 Brachytherapy is an essential part of curative-intent therapy and is closely associated with improvements in clinical outcomes. [3][4][5] Three-dimensional (3D) image-based BT allows individual treatment planning based on the volumetric image of the patient; it is considered a significant technical advancement and is widely adopted for the treatment of cervical cancer. 6 The application of 3D image-based BT enables the practitioner to prescribe dose to the target volume as well as determine and potentially limit dose to the organs at risk (OARs), which is more advantageous than the conventional two-dimensional (2D) image-based approach. Numerous studies demonstrate improved treatment plan quality and clinical outcomes of 3D image-based BT for cervical cancer. [7][8][9][10] Magnetic resonance imaging (MRI) is the preferred imaging modality for treatment planning of cervical cancer BT due to its superior soft tissue visualization relative to computed tomography (CT). However, there are many obstacles to routinely performing MRI-based BT in many radiation oncology departments, including limited availability, high cost, and long scanning time. Therefore, CT-based BT of cervical cancer is widely used in treatment centers worldwide, especially in developing countries.
Segmentation of the target volumes and OARs is an essential step in the treatment planning of 3D image-based BT. The delineation task is usually carried out by the radiation oncologist according to the recommended guidelines. However, manual delineation of the target volumes and OARs is time-consuming and prone to large inter- and intraobserver variation. Thus, accurate segmentation with high efficiency and consistency is highly desired for treatment planning of 3D image-based BT. The most commonly used automatic segmentation method in clinical practice is the atlas-based technique, which is employed by many commercial software tools. [11][12][13][14] To achieve automatic segmentation, atlas-based methods rely on deformable image registration between the image to be segmented and a reference image, referred to as an atlas, in which the regions of interest are already segmented. The segmentation of corresponding structures in a new test image is obtained by finding the optimal transformation between the atlas and the test image. 15,16 Therefore, the segmentation result strongly depends on the applied registration algorithm. The choice of a suitable and robust algorithm has a substantial effect on the result, especially in the presence of image noise and interference arising from contrast changes. 17,18 Apart from the registration algorithm, the atlas itself plays a crucial role in atlas-based segmentation. When performing atlas-based automatic segmentation with a single atlas, an accurate segmentation result cannot be guaranteed if the selected reference image is not representative or the morphology of the anatomical structures is not sufficiently similar between the atlas and the test image. 19 To reduce the uncertainties of single-atlas segmentation, multiple atlases are selected from a database and registered with the test image, and the final segmentation result is obtained from the multiple registrations with voting schemes. 20,21 Although multiatlas automatic segmentation improves the robustness of the results compared with single-atlas approaches, it is prone to topological errors and requires more computational time. 16,22

Applicator reconstruction is the process of localizing the radiation source paths defined by the applicator channels in the planning images. It is another critical step in the BT treatment planning procedure. The potential dwell positions are placed on the digitized applicator channels, and the corresponding dwell times are determined to meet the dosimetric objectives. Applicator reconstruction accuracy has a significant impact on the dosimetric result of the treatment plan due to the steep dose gradients of BT. A small uncertainty in the digitization of applicator channels translates into a relatively large dosimetric uncertainty. [23][24][25] In general, applicator reconstruction is performed manually by the medical physicist. The digitization process is subjective and time-consuming.
Thus, there is a strong need for fully automatic applicator reconstruction in 3D image-based BT to ensure treatment planning accuracy and efficiency. The applicator library integrated in the treatment planning system is the clinically available tool for automatic applicator digitization; it can significantly reduce the reconstruction uncertainty and improve efficiency. It allows channel digitization based on the manual registration of a virtual applicator model with predefined source paths to its appearance in the planning images.
However, the applicator library-based reconstruction method is not fully automatic due to the manual alignment of the applicator model. Moreover, applications of this method are limited to only those applicators included in the library. The electromagnetic tracking technique has recently been utilized for catheter digitization in BT. 26,27 Although this method yields highly accurate digitization results, the additional hardware and complex procedure may hamper its widespread application.
In recent years, convolutional neural networks (CNNs), a class of deep learning algorithms, have been successfully applied to automatic segmentation of medical images. [28][29][30][31] Outstanding segmentation performance has been achieved with various CNN architectures.
Automatic segmentation methods based on CNNs have also been introduced into radiotherapy treatment planning. 32 Automatic delineation of target volumes and OARs in EBRT treatment planning for head and neck, 33,34 breast, 35 and rectum 36 cancers has been reported. To the best of our knowledge, there are no reports on automatic segmentation for cervical cancer BT with any CNNs. In addition, most of the CNNs utilized in previous studies take 2D CT/MRI slices as input; thus, the 3D spatial and contextual information of the whole volume cannot be utilized effectively by the networks.
In this work, we propose a novel 3D CNN architecture that is based on the popular 3D U-Net architecture 30

2.B | Preprocessing of images
For the segmentation of HR-CTV and OARs, the original planning CT volumes were first cropped in each dimension to discard the regions depicting empty space or without labeled structures. Then the cropped CT volumes were resized with linear interpolation to an identical size of 128 × 128 × 64 voxels. After the cropping and interpolation, the average in-plane resolution of the obtained CT volumes was 2.89 × 2.78 mm², and the average interslice resolution was 2.06 mm. To enhance the image contrast of the planning CT volumes, the contrast limited adaptive histogram equalization (CLAHE) algorithm was employed to preprocess the images fed to the CNN. With the CLAHE method, the shape, texture, and boundary of anatomical structures in the CT images became more distinguishable, and the image quality was substantially improved. Figure 1 shows the image enhancement using the CLAHE algorithm. In order to preserve the original information, we combined the original CT images with the CLAHE-processed images to compose a dual-channel input to the proposed DSD-UNET.
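The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of scikit-image's `equalize_adapthist` (applied slice by slice), the `clip_limit` value, and the intensity normalization are our assumptions.

```python
import numpy as np
from scipy.ndimage import zoom
from skimage import exposure

def preprocess_ct(volume, target_shape=(128, 128, 64)):
    """Resize a cropped CT volume and build a dual-channel (original, CLAHE) input."""
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    resized = zoom(volume, factors, order=1)  # linear interpolation

    # Normalize intensities to [0, 1] before CLAHE (assumed; not stated in the paper).
    lo, hi = resized.min(), resized.max()
    norm = (resized - lo) / (hi - lo + 1e-8)

    # Apply CLAHE slice by slice on the axial planes.
    clahe = np.stack(
        [exposure.equalize_adapthist(norm[..., z], clip_limit=0.02)
         for z in range(norm.shape[-1])], axis=-1)

    # Dual-channel input: channel 0 = original, channel 1 = CLAHE-enhanced.
    return np.stack([norm, clahe], axis=-1)

ct = np.random.rand(200, 190, 80).astype(np.float32)  # dummy cropped CT volume
x = preprocess_ct(ct)
print(x.shape)  # (128, 128, 64, 2)
```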
As the first step of the proposed method for automatic applicator reconstruction, all parts of the applicator were segmented in the planning CT images utilizing the DSD-UNET. Due to the relatively small size of the applicator components, a fixed-size volume of interest (128 × 128 × 80 voxels) centered on the applicator was first cropped from the original planning CT volume. Then linear interpolation along the superior-inferior direction was applied to obtain a 128 × 128 × 64 CT volume with an interslice resolution of 2.5 mm. The in-plane resolution of the resulting CT volume is identical to that of the original CT volume.
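An illustrative sketch of this volume-of-interest (VOI) extraction is given below. How the VOI is centered is not specified in the text; the `center_of_mass`-based placement on a rough applicator mask is purely our assumption.

```python
import numpy as np
from scipy.ndimage import zoom, center_of_mass

def extract_voi(volume, rough_mask, voi_shape=(128, 128, 80), out_slices=64):
    """Crop a fixed-size VOI around the applicator, then resample along z only."""
    center = np.round(center_of_mass(rough_mask)).astype(int)
    # Clamp the crop so the VOI stays inside the volume bounds.
    starts = [int(np.clip(c - s // 2, 0, dim - s))
              for c, s, dim in zip(center, voi_shape, volume.shape)]
    voi = volume[starts[0]:starts[0] + voi_shape[0],
                 starts[1]:starts[1] + voi_shape[1],
                 starts[2]:starts[2] + voi_shape[2]]
    # Linear interpolation along the superior-inferior axis only;
    # the in-plane resolution is left unchanged.
    return zoom(voi, (1, 1, out_slices / voi_shape[2]), order=1)

vol = np.random.rand(200, 200, 100).astype(np.float32)
mask = np.zeros_like(vol)
mask[80:120, 90:130, 30:70] = 1  # dummy rough applicator region
voi = extract_voi(vol, mask)
print(voi.shape)  # (128, 128, 64)
```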

2.C | Architecture of DSD-UNET
The proposed novel DSD-UNET architecture was inspired by the popular 3D U-Net architecture. Figure 2 illustrates the detailed architecture of DSD-UNET. Like the U-Net, our network consists of a contracting path and an expanding path, with stages that operate at different spatial resolutions. In order to mitigate the problem of vanishing gradients and accelerate learning convergence, a residual block, which consisted of two 3 × 3 × 3 convolutional layers with a spatial dropout layer in between, was applied at each stage in the contracting path. The residual block was followed by a 3 × 3 × 3 convolution with a stride of 2 for downsampling. Each stage in the expanding path consisted of a 3 × 3 × 3 deconvolution with a stride of 2 for upsampling, followed by a concatenation with the feature maps from the corresponding stage in the contracting path, and then two convolutional layers with kernel sizes of 3 × 3 × 3 and 1 × 1 × 1, respectively. A segmentation layer, a 1 × 1 × 1 convolution layer with the number of filters equal to the number of segmentation classes, was employed at the end of each stage in the expanding path. The number of feature channels was doubled at each stage in the contracting path and halved at each stage in the expanding path.
Moreover, we deployed a dilated convolution module between the contracting and expanding paths, which employed four parallel dilated convolution layers with dilation factors of 1, 2, 3, and 4, respectively.
With the implementation of the dilated convolution module, multiscale high-level features could be learned and aggregated to achieve more accurate and robust segmentation. Deep supervision was also employed in our network by integrating the segmentation layers at different stages of the expanding path and combining them via element-wise summation to form the final output. Finally, a sigmoid activation was applied to this final output layer to obtain the voxel-wise probabilities for each segmentation class. Instance normalization 38 followed by LeakyReLU nonlinearities was applied to all convolutional layers throughout the network, except for the segmentation layers.
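A minimal Keras sketch of two of the building blocks described above (the residual block of the contracting path and the parallel dilated convolution module) is shown below. Filter counts are illustrative; instance normalization, which the network uses, is omitted here to keep the sketch dependency-light, and concatenation is assumed as the aggregation of the dilated branches.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, drop_rate=0.3):
    """Two 3x3x3 convolutions with spatial dropout in between, plus a skip connection."""
    shortcut = layers.Conv3D(filters, 1, padding="same")(x)  # match channel count
    y = layers.Conv3D(filters, 3, padding="same")(x)
    y = layers.LeakyReLU()(y)
    y = layers.SpatialDropout3D(drop_rate)(y)
    y = layers.Conv3D(filters, 3, padding="same")(y)
    y = layers.LeakyReLU()(y)
    return layers.Add()([shortcut, y])

def dilated_module(x, filters):
    """Four parallel dilated 3x3x3 convolutions (rates 1-4), aggregated by concatenation."""
    branches = [layers.Conv3D(filters, 3, padding="same", dilation_rate=r)(x)
                for r in (1, 2, 3, 4)]
    return layers.Concatenate()(branches)

# One contracting stage followed by the dilated module, on a tiny dummy input.
inp = layers.Input(shape=(16, 16, 16, 2))
x = residual_block(inp, 8)
x = layers.Conv3D(16, 3, strides=2, padding="same")(x)  # strided downsampling
x = dilated_module(x, 16)
model = tf.keras.Model(inp, x)
print(model.output_shape)  # (None, 8, 8, 8, 64)
```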

2.D | Model training
The proposed DSD-UNET architecture was implemented in the Keras framework with TensorFlow as the backend. Training was performed using the Adam 39 optimizer with an initial learning rate of 0.0005 and a decay rate of 0.5 once the learning stagnated for more than 20 epochs. The network was trained with a batch size of 2 for a maximum of 200 epochs, and the model with the highest performance was selected. Dice similarity coefficient (DSC) was employed as an accuracy measure of the segmentation, so we formulated a multiclass Dice loss function for the model training:

$$L = -\frac{1}{N_K}\sum_{k \in K}\frac{2\sum_{i \in I} u_i^k\, v_i^k}{\sum_{i \in I} u_i^k + \sum_{i \in I} v_i^k}$$

where $u_i^k$ represents the sigmoid activation value of voxel $i$ from the output filter for segmentation class $k$, and $v_i^k$ represents the corresponding one-hot encoding of the GT segmentation map. $I$ denotes the set of voxels in the output filter, $K$ denotes the set of segmentation classes, and $N_K$ is the total number of segmentation classes.
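The multiclass Dice loss can be sketched in numpy as follows (a minimal illustration, not the authors' Keras implementation; `u` and `v` are assumed to be flattened to (voxels, classes) arrays):

```python
import numpy as np

def multiclass_dice_loss(u, v, eps=1e-8):
    """u: sigmoid outputs, v: one-hot GT, both shaped (voxels, classes)."""
    intersect = 2.0 * np.sum(u * v, axis=0)        # 2 * sum_i u_i^k v_i^k, per class
    denom = np.sum(u, axis=0) + np.sum(v, axis=0)  # sum_i u_i^k + sum_i v_i^k
    return -np.mean(intersect / (denom + eps))     # -(1/N_K) * sum over classes k

# A perfect binary prediction yields a loss of -1.
v = np.eye(3)[np.array([0, 1, 2] * 10)]  # one-hot GT for 30 voxels, 3 classes
print(round(multiclass_dice_loss(v, v), 4))  # -1.0
```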
In order to prevent overfitting when training the networks with limited data, data augmentation techniques including random rotations (≤10°), random scaling (≤15%), and mirroring (along left-right direction only) were applied on the fly during the training process.
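The on-the-fly augmentation strategy above can be sketched as follows. The rotation plane, the interpolation order, which axis is left-right, and the crop/pad handling after scaling are implementation assumptions on our part.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def augment(volume, rng):
    """Randomly rotate (<=10 deg), scale (<=15%), and mirror a CT volume."""
    # Random in-plane rotation, keeping the array shape fixed.
    angle = rng.uniform(-10, 10)
    out = rotate(volume, angle, axes=(0, 1), reshape=False, order=1)
    # Random scaling, then crop / zero-pad back to the original shape.
    scale = rng.uniform(0.85, 1.15)
    scaled = zoom(out, scale, order=1)
    out = np.zeros_like(volume)
    src = [min(s, t) for s, t in zip(scaled.shape, volume.shape)]
    out[:src[0], :src[1], :src[2]] = scaled[:src[0], :src[1], :src[2]]
    # Mirroring along the (assumed) left-right axis with probability 0.5.
    if rng.random() < 0.5:
        out = out[::-1, :, :]
    return out

rng = np.random.default_rng(0)
vol = np.random.rand(32, 32, 16)
print(augment(vol, rng).shape)  # (32, 32, 16)
```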
Due to the unavailability of sufficient GPU resources, all computations in this study were performed on a workstation with an Intel Xeon E5-2620 v3 CPU (2.4 GHz, 6 cores) and 32 GB RAM.

2.E | Evaluation metrics
When the training process was finished, performance of the model was assessed with the testing data only. The DSC, Jaccard Index (JI), and Hausdorff distance 40 (HD) were used as the evaluation metrics of segmentation accuracy in this study. The DSC is defined as

$$DSC = \frac{2\,|P \cap T|}{|P| + |T|}$$

and the JI is defined as

$$JI = \frac{|P \cap T|}{|P \cup T|}$$

where $P$ and $T$ correspond to the predicted segmentation of the model and the GT segmentation, respectively, and $P \cap T$ and $P \cup T$ represent the intersection and union of the predicted and GT segmentations, respectively. The HD is computed as

$$HD(A, B) = \max\left\{\max_{a \in A}\min_{b \in B} d(a, b),\; \max_{b \in B}\min_{a \in A} d(a, b)\right\}$$

where $d(a, b)$ is the Euclidean distance between two points, and $A$ and $B$ denote the measured point sets, defined as the voxel sets for evaluation of the segmentation result. The HD measures the largest degree of mismatch between two voxel sets in the volumetric image.
Thus, the spatial discrepancy between the automatic segmentation and the GT segmentation can be quantified with this metric. In addition, the HD was used to evaluate the proposed approach of automatic applicator reconstruction. It was calculated between the points on the channel paths determined by the developed method and those determined manually.
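The three metrics can be computed as sketched below, with `P` and `T` as boolean voxel masks and the HD taken as the symmetric maximum of the two directed distances over the foreground voxel coordinates (using scipy's `directed_hausdorff`). Voxel spacing is ignored here for simplicity.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dsc(p, t):
    return 2.0 * np.logical_and(p, t).sum() / (p.sum() + t.sum())

def jaccard(p, t):
    return np.logical_and(p, t).sum() / np.logical_or(p, t).sum()

def hausdorff(p, t):
    a = np.argwhere(p)  # foreground voxel coordinates of each mask
    b = np.argwhere(t)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

# Two 4x4x4 cubes shifted by one voxel along the first axis.
p = np.zeros((10, 10, 10), bool); p[2:6, 2:6, 2:6] = True
t = np.zeros((10, 10, 10), bool); t[3:7, 2:6, 2:6] = True
print(round(dsc(p, t), 3), round(jaccard(p, t), 3), hausdorff(p, t))  # 0.75 0.6 1.0
```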

2.F | Experiments
The proposed DSD-UNET architecture was exploited to complete two segmentation tasks. The first was automatic segmentation of the HR-CTV and OARs; the second was automatic segmentation of all parts of the applicator to finally achieve the applicator reconstruction.

Figure 7(a) illustrates the skeletons and segmentations of the applicator tubes for an example case. The manually determined points along the applicator channels for the same case are shown in Fig. 7(b). The channel paths generated by the proposed method showed good agreement with the GT paths, as shown in Fig. 7(c).

In recent years, a variety of studies concerning automatic segmentation with CNNs in radiotherapy have been reported. [32][33][34][35][36]42 In order to minimize the reconstruction uncertainties and avoid accidental errors, implementation of automatic applicator reconstruction methods with high accuracy and consistency is necessary. In our study, the automatic applicator reconstruction method utilizing the DSD-UNET model achieved relatively high accuracy. HD values between the channel paths determined automatically and manually were <1 mm for all the applicator channels.
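The path-digitization step (skeletonization of the segmented tube followed by polynomial curve fitting, as described in the abstract) can be sketched as below. The polynomial degree, the parameterization of x and y as functions of z, and the number of sampled path points are our assumptions.

```python
import numpy as np
try:  # skeletonize_3d was folded into skeletonize in newer scikit-image releases
    from skimage.morphology import skeletonize_3d as skeletonize
except ImportError:
    from skimage.morphology import skeletonize

def channel_path(tube_mask, degree=3, n_points=50):
    """Skeletonize a segmented tube and fit a smooth polynomial source path."""
    skel = skeletonize(tube_mask.astype(np.uint8))
    x, y, z = np.nonzero(skel)
    # Fit x and y as polynomial functions of z (the superior-inferior axis).
    px = np.polynomial.Polynomial.fit(z, x, degree)
    py = np.polynomial.Polynomial.fit(z, y, degree)
    zs = np.linspace(z.min(), z.max(), n_points)
    return np.stack([px(zs), py(zs), zs], axis=1)

# Synthetic straight tube of radius 3 along z, centered at (16, 16).
xx, yy = np.mgrid[0:32, 0:32]
disk = (xx - 16) ** 2 + (yy - 16) ** 2 <= 9
tube = np.repeat(disk[:, :, None], 40, axis=2)
path = channel_path(tube)
print(path.shape)  # (50, 3)
```

For a straight synthetic tube the fitted path runs close to the tube axis; for curved ovoid tubes the polynomial fit smooths the voxelized skeleton into a continuous channel path on which dwell positions can be placed.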
In order to explore the feasibility and accuracy of the proposed method, we started the study with the relatively simple case of tandem and ovoid applicators. So the presented work serves as a preliminary exploration and simple test. Subsequent developments and more comprehensive evaluations are needed to extend the proposed method to more difficult scenarios.
One of the limitations of this work is the relatively small dataset size. This is due to the limited number of patients with cervical cancer that received CT-based BT in our clinic and the lack of common dataset that is suitable for this segmentation task. To ease this problem, data augmentation strategy was applied in the model training.
Dropout was deployed in the network to reduce the risk of overfitting introduced by data augmentation. However, due to the intrinsic characteristics of deep learning method, larger dataset usually leads to the improvements of performance and generalization. Therefore, we plan to collect more suitable image data in the future study.
Then more accurate and reliable segmentation results could be expected, and the dataset can be enlarged to some extent. Moreover, we expect to train a single DSD-UNET model that could segment both the anatomical structures and the applicator in the planning CT images, instead of the two separate models in the current study. Therefore, automatic segmentation and applicator reconstruction could be achieved simultaneously to streamline the treatment planning workflow for cervical cancer BT.

| CONCLUSION
In this study, we presented a deep learning-based method using DSD-UNET architecture to automatically segment the HR-CTV and OARs in the planning CT images for cervical cancer BT. Quantitative

CONFLICT OF INTEREST
The authors have no conflict of interest to disclose.