Voxel‐based supervised machine learning of peripheral zone prostate cancer using noncontrast multiparametric MRI

Abstract Purpose The aim of this study was to develop and assess the performance of supervised machine learning technique to classify magnetic resonance imaging (MRI) voxels as cancerous or noncancerous using noncontrast multiparametric MRI (mp‐MRI), comprised of T2‐weighted imaging (T2WI), diffusion‐weighted imaging (DWI), and advanced diffusion tensor imaging (DTI) parameters. Materials and methods In this work, 191 radiomic features were extracted from mp‐MRI from prostate cancer patients. A comprehensive set of support vector machine (SVM) models for T2WI and mp‐MRI (T2WI + DWI, T2WI + DTI, and T2WI + DWI + DTI) were developed based on novel Bayesian parameters optimization method and validated using leave‐one‐patient‐out approach to eliminate any possible overfitting. The diagnostic performance of each model was evaluated using the area under the receiver operating characteristic curve (AUROC). The average sensitivity, specificity, and accuracy of the models were evaluated using the test data set and the corresponding binary maps generated. Finally, the SVM plus sigmoid function of the models with the highest performance were used to produce cancer probability maps. Results The T2WI + DWI + DTI models using the optimal feature subset achieved the best performance in prostate cancer detection, with the average AUROC , sensitivity, specificity, and accuracy of 0.93 ± 0.03, 0.85 ± 0.05, 0.82 ± 0.07, and 0.83 ± 0.04, respectively. The average diagnostic performance of T2WI + DTI models was slightly higher than T2WI + DWI models (+3.52%) using the optimal radiomic features. Conclusions Combination of noncontrast mp‐MRI (T2WI, DWI, and DTI) features with the framework of a supervised classification technique and Bayesian optimization method are able to differentiate cancer from noncancer voxels with high accuracy and without administration of contrast agent. The addition of cancer probability maps provides additional functionality for image interpretation, lesion heterogeneity evaluation, and treatment management.


| INTRODUCTION
Prostate cancer (PCa) is the most diagnosed cancer among men, and the second leading cause of cancer death in Australia. 1 Multiparametric magnetic resonance imaging (mp-MRI) is used as a complementary PCa diagnosis method to prostate-specific antigen (PSA) test, digital rectal examination, and transrectal ultrasound (TRUS)-guided biopsy for the detection and assessment of PCa. 2,3 Accurate spatial delineations of PCa lesions using mp-MRI have many benefits. It can assist with treatment decisions and margins, enable focal treatments specifically targeting the lesion, or facilitate increases in radiotherapy doses (boosts) to the lesions. It may also play a role in evaluation of response to treatment. Mp-MRIbased delineation is made difficult by low contrast, potential discordances between the results of the different pulse sequences, and the similarities between the appearances of benign and malignant lesions. Recent advances of machine learning (ML) techniques have the potential to improve the efficiency, accuracy, and consistency of prostate mp-MR delineations. 4,5 A combination of ML technique and human observer may provide the optimal accuracy for PCa delineation. 5 According to the Prostate Imaging Reporting and Data System (PI-RADS), the clinical guidelines for mp-MRI include T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and dynamic contrast enhancement (DCE). 6 Several studies have investigated the potential of ML using T2WI, DWI, DCE, and diffusion tensor imaging (DTI). 7,8 DCE has a minor role in the PI-RADS V2 guidelines, limited to confirming peripheral zone (PZ) DWI-based PI-RADS3 findings.
While some studies have shown additional benefit for PZ, 9 other studies have not shown a benefit including a recent study using a DTI sequence. 10 DCE also requires a contrast agent adding to cost, time, and risk. 8 DTI is an extension to DWI, which evaluates anisotropic water diffusion in tissue structures using different gradient directions. Previous studies confirmed the utility of DTI to evaluate normal and cancerous prostate anatomy employing the widely used parameters of fractional anisotropy (FA) and mean diffusivity (MD).
DTI can also yield several other quantitative parameters that to our knowledge have not been evaluated in the prostate using ML, including axial diffusivity (AD) and volume ratio (VR). 11 Therefore incorporation of DTI including these new parameters could improve PCa delineation with ML techniques.
The majority of previous prostate tumor delineation techniques have performed a binary cancer or noncancer voxel-based classification. However producing a statistical probability of tumor presence may be useful for the noninvasive assessment of cancer heterogeneity and response to treatment. In addition, it may be useful to guide dose prescription for dose painting-based treatment planning for PCa radiation therapy. Groenendaal

2.A | Patients
Seventeen men with high PSA and positive PCa biopsy were included in this study. All patients underwent 16-core (S16C) 8-zone transrectal and transperineal ultrasound-guided biopsy at least 6 weeks before the MRI scan. All patients signed written consent prior to participating in this study, which was approved by the local health district human research ethics committee. One patient with Parkinson disease was excluded due to image distortion artifacts, leaving 16 patients. Clinical information of the patients is summarized in Table 1.
The inclusion criteria were as follows: (a) patients with biopsyproven PCa with complete clinical data, (b) patients without any contraindication of using MRI and (c) at least 6 weeks after the biopsy.
The exclusion criteria were as follows: (a) data with insufficient quality, (b) prior PCa treatment, (c) presence of hip prosthesis, (d) pelvic nodal involvement, and (e) men with intellectual impairment who would have difficulty giving informed consent to the study.
All patients underwent contrast-free T2WI, DWI, and DTI. One main limitation of prostate diffusion MRI is the sensitivity to motion and prostate movement, such as air-tissue interfaces. 13 In an attempt to overcome this limitation, all patients were restricted to a low-fiber diet 24 hours before scan 14 and were asked to empty bladder and received a rectal laxative (Microlax ® , Sanofi-Winthrop, Colombiers, France) before MRI sessions. These protocols showed a reduction in prostate gland shift in PCa patients treated with radiotherapy. 15

2.B | Image acquisition
Mp-MRI examinations were carried out on a 3 tesla MRI scanner (Skyra, Siemens Healthineers, Germany) equipped with a phase array coil. The full mp-MRI parameters are exhibited in Table 2. Mp-MRI scans were obtained 6-8 weeks after prostate biopsy to avoid hemorrhage artifacts as suggested by PI-RADS V2. 16 Furthermore, T1weighted MRI scans were acquired for patients and bleeding artifacts were not observed.

2.C | Image preprocessing
T2WI images were bias corrected using the N4ITK bias correction method with 3D Slicer software (www.Slicer.org) to eliminate intensity bias due to differing coil sensitivities. The N4ITK algorithm is based on nonparametric nonuniform intensity normalization (N3), with improvement in B-spline smoothing strategy. The N4ITK optimization parameters included: B-spline order of 3, B-spline grid resolution of (1,1,1), a shrinkage factor of 4, a maximum number of 100 iterations at each of the three resolution levels, and a convergence threshold of 0.001. Noise reduction was then applied using a median filter with a 3 × 3 square window. 17 To allow comparison between different patient T2WI intensities, four biological targets were considered for potential use as a reference for image standardization: femoral bone marrow, ischioanal fossa, obturator-internus muscle, and urine. The biological reference reproducibility was assessed by the interpatient coefficient of variations (%interCV). The tissue with the highest reproducibility (lowest %interCV) was used for intensity standardization using median + interquartile intensity range method. 18 For high b-value DWI, noise reduction was applied using a diffusion anisotropic filter and bias corrected using the N4ITK bias correction method using 3D Slicer. Then, high b-value DWI and apparent diffusion coefficient (ADC) maps were co-registered with T2WI using a rigid and affine co-registration method to correct motion misalignment and eddy current distortion in 3D Slicer.
The DTI volume was registered to b-value = 0 s/mm 2 with an affine and rigid body registration to correct for eddy current distortion artifacts and motion artifacts using ExploreDTI software (http:// www.ExploreDTI.com). 19 For noise reduction, an adaptive anisotropic diffusion filter was applied using the log-Euclidean anisotropic filter available from the software package, MedINRIA. 20 The filtered tensor images were imported into DTIStudio and used to generate eigenvalues (primary [λ 1 ], secondary [λ 2 ], and tertiary [λ 3 ]). 20 Subsequently, these three eigenvalues served to calculate 20 DTI features using Matlab 2015b software (Mathworks, USA). Then, these maps were co-registered with T2WI using a rigid and affine co-registration method in 3D Slicer. All DWI and DTI upsampled to T2WI original matrix size.

2.D | Lesion segmentation
All images were individually assessed by two experienced radiologists; one radiologist with more than 20 years and one with 12 years of experience in prostate radiology. In both cases, image assessment was performed in conjunction with the biopsy-reported cancer location. Radiologists visually matched ADC map, baseline image (bvalue = 0 s/mm 2 ), and corresponding T2WI slice locations and gland anatomy (apex, mid-gland area, and base). The cancer regions of interest (ROI)s on separate multiple slices for each patient were manually outlined independently by each radiologist in PZ on T2WI using OsiriX software (OsiriX V.0.9.0, Pixmeo, Geneva, Switzerland).
The cancer ROIs were in most cases on adjacent slices but not always and so a three-dimensional cancer ROI was not constructed. PZ boundaries were delineated by either one of radiologists. The agreement for each cancer ROI between the two radiologists was assessed (Radiologist #1; n = 50 and Radiologist #2; n = 49) using T A B L E 2 Multiparametric magnetic resonance imaging (mp-MRI) sequence parameters used for this study.

2.E | Supervised machine learning
The voxel-by-voxel approach offers for a precise tumor analysis. 22 In this study, voxel-based analysis was used to develop a ML system for cancer prediction. Figure 2 shows a schematic diagram of the implemented in the current study. Features were derived for each voxel of T2WI, high b-value DWI, and ADC map according to the following five main categories:

2.
Fifteen first-order texture features: mean, 25th percentile, 75th percentile, variance, standard deviation, median, interquartile range, mode, minimum, maximum, range (maximum -minimum), skewness, kurtosis, entropy, and energy; extracted over a sliding window. 23 Texture feature evaluation requires rather large windows in order to obtain meaningful descriptions of their content.
However, small windows are required in order to accurately locate the boundaries between different textured regions. In this study, a 9 × 9 voxel sliding window was used for texture features extraction. 24 3. Nineteen second-order texture features: variance, standard deviation, contrast, maximum probability, energy, entropy, correlation, maximum correlation coefficient, inertia, inverse, homogeneity, dissimilarity, cluster shade, sum of average, sum of variance, sum of square, sum of entropy, difference variance, and difference entropy; extracted over a 9 × 9 voxel sliding window using a gray-level co-occurrence matrix (GLCM). 25 4. Six edge-based features extracted to evaluate the local intensity variations; Roberts, Prewitt, Sobel, Canny, Roberts, and log. 26

5.
Sixteen gradient-based features derived in three directions (horizontal-x, vertical-y, diagonal-xy) and the magnitude was measured using Sobel, Prewitt, and Robert gradient operators and Central and Intermediate difference gradient methods.
Twenty quantitative diffusivity and anisotropy features were extracted from DTI volumes using tensor eigenvalues (λ 1 , λ 2 , and λ 3 ). 11 The most common prostate DTI measures are MD and FA. In this study, eight anisotropy parameters (FA, relative anisotropy (RA), volume ratio (VR), linear (C l ), planar (C p ) and spherical (C s ) anisotropies, attenuation coefficient (AC), and mode) and 12 diffusivity parameters (MD, axial diffusivity (λ 1 ), two orthogonal diffusivities to λ 1 λ 2 and λ 3 ð Þand four radial diffusivities (RD, λ 1 À λ 2 , λ 1 À λ 3 , and λ 2 À λ 3 ), 27 in addition to novel DTI parameters, volume diffusivity (VD ¼ λ 1 λ 2 λ 3 ), and surface diffusivities F I G . 1. All patients underwent 16-core (S16C) transrectal and transperineal ultrasound guided biopsy. Axial, coronal, and sagittal T2WI and axial DWI were acquired at least 6 weeks after biopsy. The two radiologists (R1 and R2) visually matched biopsy report and MRI. Then, cancer regions of interest (ROI)s were delineated by radiologists (blue and green colors). The agreement for each ROI between the two radiologists was calculated and used as cancer ROI (red color). Peripheral zone (PZ) boundaries (white color) were delineated by either radiologist. formula adapted from test theory. 28 The CFS method was applied to the training data set to find the optimal feature set. Before performing feature selection, each of the extracted features was normalized to be more sensitive to the classification model using the minimum-maximum scale method so that the minimum value was 0 and maximum value was 1.

2.E.3 | Classification and validation
The cancer voxels and noncancer voxels were utilized from all patients in the training data sets. Consider the training data set , along with their corresponding labels ðy i Þ i¼1,...,n , where y i ∈ À1, þ 1 f g for i ¼ 1,:...,n. For a linearly separable training data set, hard-margin SVM is based on margin maximization between two classes and is defined as: where w is the normal vector to the hyperplane and b is the intercept of the decision boundary.
However, linear SVM is usually not successful in practice and nonlinear SVM is required. 29 The training data set are represented as points and SVM transforms them into feature space by a nonlinear function ψ and the linear SVM is trained for the transformed In this study, a nonlinear SVM using a radial basis function (RBF) Þwas used for classification. Bayesian optimization of SVM regularization parameter C and the Gaussian kernel shape parameter γ was used to find the optimal values within an interval of [10 −4 , 10 3 ] using tenfold cross validation. 30 The performance of the model for discrimination of cancer and noncancer voxels was measured using the area under receiver operating characteristic curve (AUROC). RBF-SVMs were developed on the training data sets using the reduced features.
Overfitting is a common and serious problem. To reduce overfitting, a leave-one-patient-out approach was used for model validation for assessing how the classification model will generalize an independent data set. 31  Additionally, Dice similarity index between radiologists' agreement and output of models with the highest diagnostic performance were also calculated.

2.E.4 | Probability map
The output of the binary SVM model for input GHOLIZADEH ET AL.
| 183 described as a probability ðy i jx i Þ instead of a binary À1=1 label. A sigmoid function was fitted to the SVM decision function to derive the probability distribution: 32 A and B are two scalar parameters that are learned by the algorithm.
The parameters A and B are optimized by minimizing the local negative log-likelihood: where, p k denotes the output of the sigmoid and t k the probability target. To solve this problem model-trust minimization algorithm based on Newton's method was used.

3.A | Feature extraction and selection
The total number of radiomic features calculated using T2WI, DWI, and DTI were 57, 114, and 20, respectively. Figure 3 shows some examples of extracted radiomic feature maps from T2WI, high bvalue DWI, and DTI.
Subsets of features were selected for each of the four different classification sets using a correlation-based approach. The feature selection method also reports the relative importance of the selected features in terms of their predictive capability. Due to leave-one-pa- For the T2WI + DWI + DTI models, the volume diffusivity map from DTI had the most common effective predictive performance.
The additional radiomic features were T2WI intensity, second-order

3.B | Classification and validation
The kernel parameters of C and γ for each model with the minimum objective function value for a total of 30 for function evaluation iterations are summarized in Table 3. T2WI + DWI + DTI models achieved the largest value of γ, which indicates the more peaked the corresponding transformations of the feature vectors with the higher capacity of the models. The kernel parameters control the trade-off between error due to bias and variance in the model. Using the optimal parameters, the RBF-SVM models were developed and the receiver operating characteristic curves of the set of T2WI, T2WI + DWI, T2WI + DTI, and T2WI + DWI + DTI models using leaveone-patient-out approach plotted for the discrimination of cancer and noncancer voxels (Fig. 5). For each model and corresponding curve plotted in Fig. 5  The specificity of T2WI + DTI model was significantly higher than T2WI + DWI (P < 0.05). However, there was no significant difference in sensitivity and accuracy between T2WI + DWI and T2WI + DTI models (P > 0.05). Figure 6 shows boxplots of sensitivity,

3.C | Binary and probability maps
The T2WI + DWI + DTI models were used to generate binary and probability outputs of different models in the entire PZ of test data sets. Figure 7 shows the T2WI, binary output, and posterior probability of the T2WI + DWI + DTI RBF-SVM models for PZ for three patients with different Gleason scores. In addition, the overlap between two radiologist segmentations of the tumor on T2WI figures allows us to visually assess the efficacy of the models.  T A B L E 3 The most commonly selected and average of optimal kernel parameters for T2WI, T2WI + DWI, T2WI + DTI, and T2WI + DWI + DTI models and estimated AUROC, sensitivity, specificity, and accuracy results AE standard deviation of a comprehensive set of support vector machine using a radial basis function (RBF-SVM) models. This study showed that Haralick-based texture features over a 9 × 9 sliding window extracted from T2WI, DWI, and ADC map of DWI can improve differentiation between cancer and noncancer tissue. 43 There is no single optimum window size at which the best result can be achieved. However, a small window size such as 3 × 3 does not provide sufficient information about limited intensities and gray-level variation and using a large window size (e.g., 15 × 15) increase the chance of mixing up benign and malignant voxels. Experimental results suggested median window F I G . 6. Boxplot comparing the sensitivity, specificity, and accuracy statistics of support vector machine using a radial basis function (RBF-SVM) classifiers using leave-one-patient-out validation approach. Mann-Whitney U-test, **P < 0.01, *P < 0.05 and n.s. P > 0.05.

| DISCUSSION AN D CONCLUSION
size of 9 × 9 as the best window size for prostate texture feature on MRI. 24,44,45 Overfitting with small sample size is a common problem in ML. 31 To eliminate biased estimation, the training and test data sets were separated for model validation using a leave-one-patient-out approach, all features normalized using the min-max scale method, features selected from training pool using a leave-one-patient-out approach, and model parameters optimized using Bayesian optimization technique. These models successfully predict unseen test data set with high performance, thus highlighting the non-overfitting properties of classifiers.
Quantitative evaluation of the diagnostic performance of different classification algorithms for prostate mp-MRI demonstrated that SVM yields the maximum AUROC and the highest accuracy. 24 Although mostly used for binary classification, SVM can be adapted to calculate probabilities. The probability map produced in this study can be used as a useful tool to evaluate tumor heterogeneity and response to treatment.
The low number of patients is a limitation of this study, and the results can only be considered preliminary at this stage. Studying a cohort with more widely spread Gleason scores, supported by standard anatomic specimens, is warranted. Another limitation of this study is that classification and validation were based on radiologists' determination rather than section histology. Radiologist reporting performance cannot be perfect, and some errors are inevitable.
However, this has been offset by only considering intermediate and high-grade cancer in this study and by using the concordance of two expert radiologists to minimize this uncertainty. In addition, cancer ROIs were selected as the overlap between two radiologists' delineation, and noncancer ROIs were selected as the rest of PZ. However, noncancer ROIs might contain cancerous voxels, which is another limitation of the current study.
In conclusion, we have developed two classes of discrimination: a binary (cancer vs noncancer) and a probability RBF-SVM in this study. Binary output of SVM makes it impossible to distinguish lesion heterogeneity. A combined supervised ML technique and radiomic approach appeared as a feasible tool to predict cancer lesion on noncontrast mp-MRI. The probability output of the models provide useful practical cancer recognition information for malignancy risk assessment. The binary classification is a subclass of the F I G . 7. Examples of T2WI, binary map, and probability map for three prostate cancer patients with Gleason score of 4 + 5, 4 + 3, and 3 + 4, respectively. Delineations of peripheral zone (PZ) (white) and cancer (red) regions of interest (ROI) overlay on T2WI were performed by concordance of two experience radiologists.