Multiparametric deep learning tissue signatures for a radiological biomarker of breast cancer: Preliminary results

Purpose Deep learning is emerging in radiology due to the increased computational capabilities available to reading rooms. These computational developments have the ability to mimic the radiologist and may allow for more accurate tissue characterization of normal and pathological lesion tissue to assist radiologists in defining different diseases. We introduce a novel tissue signature model based on tissue characteristics in breast tissue from multiparametric magnetic resonance imaging (mpMRI). The breast tissue signatures are used as inputs in a stacked sparse autoencoder (SSAE) multiparametric deep learning (MPDL) network for segmentation of breast mpMRI. Methods We constructed the MPDL network from SSAE with 5 layers with 10 nodes at each layer. A total cohort of 195 breast cancer subjects were used for training and testing of the MPDL network. The cohort consisted of a training dataset of 145 subjects and an independent validation set of 50 subjects. After segmentation, we used a combined SAE‐support vector machine (SAE‐SVM) learning method for classification. Dice similarity (DS) metrics were calculated between the segmented MPDL and dynamic contrast enhancement (DCE) MRI‐defined lesions. Sensitivity, specificity, and area under the curve (AUC) metrics were used to classify benign from malignant lesions. Results The MPDL segmentation resulted in a high DS of 0.87 ± 0.05 for malignant lesions and 0.84 ± 0.07 for benign lesions. The MPDL had excellent sensitivity and specificity of 86% and 86% with positive predictive and negative predictive values of 92% and 73%, respectively, and an AUC of 0.90. Conclusions Using a new tissue signature model as inputs into the MPDL algorithm, we have successfully validated MPDL in a large cohort of subjects and achieved results similar to radiologists.


Introduction
A new paradigm is beginning to emerge in Radiology with the advent of increased computational capabilities and algorithms.This has led to the ability of "real time" learning by computer systems of different lesion types to help the radiologist in defining disease.In particular using deep learning algorithms to segment and classify different radiological images.We chose to use multiparametric magnetic resonance imaging (mpMRI) parameters which capitalize on the different contrasts of tissue.For example, using conventional and advanced MRI parameters of T1-and T2-weighted, diffusionweighted imaging (DWI) and dynamic contrast enhanced imaging (DCE) provide qualitative and quantitative information of different tissue types which can be used to construct "tissue" signatures information of tissue [1][2][3][4] .Therefore, to integrate mpMR and characterize breast tissue, we have developed a machine learning method coupled with deep learning for segmentation and characterization of breast tissue using mpMRI.Deep learning networks (DLN) allow for the "learning" of radiological relationships between different tissue types and provides new methods to "segment" and/or classify high-dimensional data sets [5][6][7][8][9][10][11] .These DLN algorithms allow for accurate and reliable prediction of tissue types from "raw" input images with the aim to improve the radiologist's clinical decision support in different diseases 12- 15 .Therefore, we implemented an unsupervised deep learning system based on stacked sparse autoencoders (SSAE).Autoencoders are unsupervised neural networks that are trained to create a compact or a low dimensional representation of its input via the hidden layer 7,9,16,17 .The stacked sparse autoencoder network (SSAE) is a stack of sparse autoencoders with each autoencoder forming a layer of the SSAE.This allows us to use deep learning to develop multiparametric breast tissue signatures across subjects, without prior knowledge of the lesion type for application to patients for tissue segmentation and classification of breast lesions.
In this work, we establish the use of a multiparametric deep learning SSAE for radiological biomarkers of breast tissue by demonstrating that the MPDL SSAE can segment different breast tissue types, i.e., fatty, glandular, and lesion tissue.Second, we can classify the breast lesions into benign or malignant and show that the results are similar to radiologists.Finally, we developed and validated the MPDL tissue signature model with an independent data set in breast cancer patients.

Materials and Methods
Clinical subjects: All studies were performed in accordance with the institutional guidelines for clinical research under a protocol approved by the Johns Hopkins University School of Medicine Institutional Review Board (IRB) and all HIPAA agreements were followed for this retrospective study.One hundred and eighty-nine women (96 malignant, 39 benign and four normal (no-lesion) and were scanned.Malignancy was determined by pathology in all cases.
Fifty (n=50) cases were obtained from University California San Francisco (UCSF) for an independent deidentified test data set.These fifty cases are from a Phase 3 clinical trial for women receiving neoadjuvant chemotherapy for locally advanced breast cancer defined by histology 18,19 .We used the baseline study before initiation of the therapeutic regimen.
Multiparametric MRI imaging protocol: MRI scans were performed on a 3T magnet (Philips North America Corporation), using a dedicated phased array breast coil with the patient lying prone with the breast in a holder to reduce motion.MRI sequences consisted of fat suppressed (FS) T2WI spin echo (TR/TE=5700/102) and fast spoiled gradient echo (FSPGR) T1WI (TR/TE =200/4.4,Field of View (FOV)=256x256, slice thickness(ST), 4mm, 1mm gap); diffusion-weighted (TR/TE=5000/90ms,b=0-800, FOV= 192x192,ST=6mm); and finally, pre-and post-contrast enhanced images FSPGR T1WI (TR/TE=20/4, FOV=512x512, ST=3 mm) were obtained after intravenous administration of a GdDTPA contrast agent (0.2mL/kg(0.1 mmol/kg)).The contrast agent was injected over 10 seconds, with MRI imaging beginning immediately after completion of the injection and the acquisition of 14 phases.The contrast bolus was followed by a 20cc saline flush.The DCE protocol included two minutes of high temporal resolution (15 sec per acquisition) imaging to capture the wash-in phase of contrast enhancement.A high spatial resolution scan for two minutes then followed, with additional high temporal resolution images (15 sec per acquisition) for an additional two minutes to characterize the wash-out slope of the kinetic curve for pharmacokinetics(PK) 2 .
Total scan time for the entire protocol was less than 45 minutes.The independent validation breast MRI scans were acquired on a different 1.5 T magnets using a dedicated breast RF coils and obtained from the ACRIN I-SPY clinical trial 18,19

Multiparametric Image Registration:
The mpMRI were coregistered using a hybrid registration algorithm that combines 3D wavelet transformation for 3D reslicing and rescaling of the MRI volumes with nonlinear affine transformation to minimize the loss of information in image transformations 20 .The pre-contrast image of the DCE dataset was used as the reference image for all the other MRI images.Multiparametric MRI tissue signature generation:The Eigenimage filter (EI) segmentation algorithm was used to segment the breast lesions from the post contrast DCE image.The EI is a linear filter that maximizes the projection of a desired tissue (lesion tissue) while it minimizes the projection of undesired tissues (glandular tissue) onto a composite image called an Eigenimage 21,22 .Tissue signatures of glandular, fatty, and lesion tissue were defined using the EI filter are described below.Moreover, the EI corrects for partial volume effects, this allows for better demarcation of the underlying structures 23 .Multiparametric Deep Learning Tissue Signatures: The multiparametric deep learning network was trained on the breast tissue signatures defined using EI identified on all the original breast MRI images as demonstrated in Figure 1.The MPDL network builds a composite feature representation of the breast tissue signatures of the underlying breast tissue.A voxel tissue signature vector is defined as the vector of gray level intensity values corresponding to that voxel position in each image in the entire data sequence (n=23 images).Mathematically, the MPDL tissue signature is defined as follows: , , , and a fourth one for background noise.Each set of MPDL tissue signature vectors created automatically using a multiparametric region growing algorithm.The initialization to the region growing algorithm is provided by the operator identifying pixels within the tissue of interest.The tolerance for region growing was set at 5%.The final ROI is created by computing a logical AND operation between the ROIs generated from region growing on each of the MR images (Figure 1).By using several images concurrently, the probability of a pixel from another tissue being included in the final ROI (due to noise, partial volume, and nonuniformities) is reduced.The computer time required for producing the final ROI was less than a second for each tissue type.

Multiparametric Deep Learning Network Stacked Sparse Autoencoders Network Architecture:
We developed the mpMRI segmentation deep network by stacking sparse autoencoders (SSAE) for segmenting a multiparametric breast MRI dataset into regions corresponding to different tissue types.Figure 2 demonstrates the network architecture of the mpMRI segmentation deep network.Each sparse autoencoder of the SSAE was pre-trained in an unsupervised fashion to create a low dimensional representation of its input via the hidden layer by the tissue signature vectors.The input to each autoencoder except the first autoencoder was the hidden layer representation discovered by the previous layer.The first layer autoencoder learns a low dimensional representation,  = { (1) ,  (2) , … ,  () } ∈   from the training input tissue signatures,  = { (1) ,  (2) , … ,  () } ∈   , where D is the dimensionality of the input tissue signatures, N is the total number of tissue signatures and d is the number of nodes in the hidden layer of the autoencoder.The output of the final sparse autoencoder was used as input to train a softmax classifier to identify the input tissue signature as background, fat, glandular or lesion.

Multiparametric MRI segmentation deep network training and evaluation:
For the MpMRI segmentation, the training parameters obtained from optimization were set as follows: the number of layers=5, the number of nodes in each hidden layer=10, L2 regularization penalty = 0.001, Sparsity proportion = 0.5, Sparsity regularization = 4.The transfer function for the autoencoder nodes was selected as the saturating linear function given as We tested the MPDL for segmentation of breast tissue into different tissue classes using a two-fold cross validation.The balance between number of tissue signatures used to train different tissue types were maintained by sampling uniformly at random equal number of tissue signatures corresponding to each tissue type from each patient.Moreover, the MPDL tissue signature can adapt to the input sequences into the SSAE.
To perform a quantitative comparison between the MPDL segmented regions, we defined the radiological ground truth for breast imaging by the EI segmented regions.The dice similarity index (DS) was used as the overlap evaluation metric 24 .The dice similarity index is designed to find the similarity between overlapping regions from two objects.Mathematically, DS is given by the following equation: Here, A and B are the lesion areas obtained by ground truth (EI segmented post-contrast image) and the multiparametric deep learned image, respectively.The EI segmentation was obtained by thresholding the EI contrast image.The threshold was obtained by evaluating the post-contrast MR image histogram, and using the mean and a 95% confidence interval.

Multiparametric MRI classification deep network training and evaluation:
For the MpMRI classification, we developed a hybrid feature extraction and classification method termed SAE-SVM by combining sparse autoencoder (SAE) with support vector machine (SVM) algorithm.The unsupervised SAE component of the SAE-SVM algorithm automatically extracted the intrinsic tissue signatures for each MRI parameter which were then trained by the linear SVM classifier to predict a tumor as benign or malignant.The training parameters of the SAE were set as follows: Number of nodes in the hidden layer=10, L2 regularization penalty=0.001,Sparsity proportion=0.5,Sparsity regularization=4, Encoder transfer function: sigmoid, Decoder transfer function: linear.The imbalance in the number of benign and malignant patients was resolved by setting a higher misclassification cost for benign than malignant patients.The SAE-SVM feature extraction and classification method was tested using leave-one-out and ten-fold cross validation with sensitivity, specificity and area under the receiver operating characteristic curve (AUC) as the evaluation metrics.The imbalance in the number of benign and malignant patients was resolved by setting a higher misclassification cost for benign than malignant patients.The optimal value of the misclassification penalty was obtained using grid search on misclassification penalty ratios from the set Benign: Malignant = {1: 1, 1.5: 1, 2: 1, 2.5: 1, 3: 1, 3.5: 1, 4: 1}.

Validation of the Multiparametric Deep Learning Network. Magnetic Resonance Imaging Data:
We used 50 patients with calculated volumes from the University California at San Francisco I-Spy ACRIN study to test our MPDL network 18,19 .To compare with our dataset, we used the baseline DCE contrast imaging session from the study 18,25,26 .The UCSF data was registered to the pre-contrast DCE image.After application of the MPDL, the MPDL and UCSF volumes were compared and analyzed.

Statistical Methods:
We computed summary statistics (mean and standard deviations) from the quantitative imaging parameters from the mpMRI.The percentage difference and overlap segmented from the lesion areas were computed and compared.Statistical analysis was performed using linear regression to correlate the total lesion areas.A Student's t-test was used to determine statistical significance between the lesion areas and the percent difference between the lesion boundaries.Sample size was calculated based on the ROC curve by combining information from multiple MRI tissue contrasts [27][28][29] .A sample size of 112 can give 85% power to detect an increase in specificity (under significance level alpha=0.05).The same sample size also gives us greater than 85% power to differentiate sensitivities between 80% and 95% at alpha=0.05 significance level.We increased our total sample size from 112 to 189.Bland-Altman tests were run to insure no bias in the validation data set.Sensitivity and specificity with Area under Curve (AUC) was computed.Statistical significance was assigned for p < 0.05.

Results
Patient Demographics:There were total of 139 women with suspicious breast lesions in this study (n=139 training).For the training set (n=139) the mean age was 52 years (range: 24-80 years).Ninetysix patients had malignant breast lesions (69%) while thirty-nine patients had benign breast lesions (28%) and four had no lesions identified (3%).
Quantitative mpMRI:In the training data set, the DWI and DCE sequences provided quantitative radiological metrics.there were significant differences (p<0.001) between the ADC map values for malignant and benign breast lesions.ADC values for malignant cases were (mean and standard deviation) 1.26±0.13(mm 2 x10 -3 /s) and benign lesions were 1.74±0.17(mm 2 x10 -3 /s).Glandular tissue ADC values for malignant and benign lesions were not significantly different, 2.16±0.46 and 2.34±0.33(mm 2 x10 -3 /s), respectively.The DCE PK values were significantly different (p<0.05) between malignant and benign lesions.The K trans values were 0.55±0.32(1/min) and EVF were 0.30±0.16for malignant cases and 0.25±0.19 (1/min) and 0.22±0.13for benign cases, respectively.Training Data Set:The MPDL tissue signatures were defined for different breast tissue types (Figure 1) and applied to the 139 mpMRI breast cases.Figure 2 demonstrates the mpMRI deep network segmentation for the tissue signatures.Figure 3 illustrates the results on five representative malignant patients.Similarly, figure 4 illustrates the mpMRI deep network segmentation results on five benign patients.Figures 3 and 4, demonstrate the mpMRI deep network successfully segmented different tissue types from benign and malignant patients using the tissue signature model.The dice similarity index between the lesion segmentations demonstrated excellent overlap with mean and standard deviation (SD) 0.87±0.05for malignant patients and 0.85±0.07for benign patients.Representative cases are shown in Figure 5.The optimal value of misclassification ratio was obtained at 2:1 i.e. benign patients had misclassification penalty set twice that of malignant patients.Validation Testing: The validation of the MPDL was done on an independent clinical data set (University of California-San Francisco-UCSF) was excellent 18,19 .Figure 6 illustrates representative cases comparing the segmented tissue regions of the UCSF validation data set using the MPDL tissue signatures.The lesion volumes defined by MPDL and USCF resulted in a small percent difference of 4.4%±3.9%.Bland-Altman plots are shown in Figure 7, demonstrating no bias and excellent agreement between the data set.Finally, the sensitivity and specificity for differentiation of malignant from benign lesions were 90% and 85% respectively with an AUC of 0.93 and shown in Figure 8.

Discussion
Utilizing multiparametric deep learning network, we have developed, tested, and validated a cognitive computing platform that organizes, integrates, and interprets imaging information using a MPDL tissue signature model.The application of the MPDL tissue signature model resulted in excellent segmentation and classification of different tissue classes.This study employed an integrated multiparametric breast MRI deep learning model in the clinical setting and demonstrates that MPDL tissue signatures defines benign and malignant tissue and performs accurate classification.Moreover, this report demonstrates that deep learning-assisted unsupervised segmentation using mpMRI signatures can detect heterogeneous zones within breast lesions.These heterogeneous regions can be used for further classification of breast tissue by quantitative ADC maps and/or PK-DCE parameters.Finally, the MPDL model with machine learning classification distinguished between benign and malignant tissue with high sensitivity, specificity, and accuracy.
The results from application of MPDL tissue signature model to an independent breast MRI dataset demonstrated the robustness of the MPDL model.Importantly, the MPDL model was able to accurately segment breast tissue irrespective of the magnetic field strength (3T for our data and 1.5T for the validation set).Furthermore, the MPDL model was invariant to the imaging orientation as our dataset was in the axial plane while the validation set was obtained in sagittal plane.This invariance is due to the underlying depiction of the tissue using tissue signature vectors, which captures the tissue underlying characteristics and allows for the "adjustment" to different MRI input.Moreover, the MRI parameters, as well as, the time resolution of the DCE image phases used to train the MPDL model were different for our dataset and the validation dataset reasserting the robust nature of the MPDL model and eliminates the need to "retrain" the MPDL model.The mpMRI parameters used in this study were based on our and others previous results in patients 1,3,4,30 .These studies demonstrated that the combined MRI sequences consisting of DWI, ADC, and PK-DCE were highly correlated with the histological phenotype of the tissue.The sensitivity and specificity of classification between malignant and benign tumors by MPDL were similar to radiologists 31,32 .This is very encouraging when the future reading rooms will have advanced computing power to assist in reading of cases.Currently, it is very unlikely that machine learning and deep learning will replace radiologists as has been suggested by some, yet there may be a role for improved efficiency in the workflow and accuracy of interpretation.Using advanced computational methods with allow for this coming change to be better managed within radiology.Our results demonstrate that the MPDL method can be used on an independent data set acquired from different institutions.Indeed, the I-Spy trial is one of the largest MRI trials and incorporates many different MRI field strengths.However, the ability for the MPDL to learn different tissue signatures allows it to adapt to different data sets with highly accurate results.This was shown with the high dice similarity of the validation data using different input MRI data.There are, however, some technical limitations to the use the MPDL network in practice.First, increased computational power on the graphical processor units ((GPU) > 2500 cores, 12GB)) used here may not be widely available.However, the use of advanced GPU computing is rapidly finding applications in many different radiological datasets 10,12,15,[33][34][35][36] .More specific to the present study, any assessment of the clinical value of MPDL network will require additional studies in a larger patient population.
Moreover, a prospective trial with subsequent follow-up and pathological correlation using MPDL will provide us with new data to explore the exact application and methods to apply to larger studies.
In conclusion, we have demonstrated that integrated MPDL method accurately segmented and classified different breast tissue from multiparametric breast MRI.The MPDL images allow for improved visualization of different tissue characteristics based on multiple radiological parameters.with a sensitivity of 90% and specificity of 85%.

Figure 1 .
Figure 1.Demonstration of the multiparametric deep learning tissue signatures on axial mpMRI of the breast.Representative tissue signatures from the normal and abnormal tissue are obtained on each of the input MRI to create the vector signature MPDLij.

Figure 2 .
Figure 2. Framework for mpMRI segmentation deep network trained to segment multiparametric breast MRI into regions of different tissue types and background.The stacked sparse encoder deep network is constructed of ten hidden layers with five nodes each and a softmax classification layer that outputs the probability of different tissue types for the input tissue signature.

Figure 3 .
Figure 3. Illustrates the use of MPDL network on axial breast mpMRI in five representative malignant patients.The color coding for different tissue types are shown to the right of the images.

Figure 4 .
Figure 4. Illustrates the use of MPDL network on axial breast mpMRI in five representative benign patients.The color coding for different tissue types are shown to the right of the images.

Figure 5 .
Figure 5. Demonstration of dice similarity overlap between the Eigenimage and MPDL segmentation masks of two benign (A and B) and two malignant (C and D) patients overlaid on the subtracted dynamic contrast enhanced image.The Eigenfilter segmentation boundary is shown in yellow and the MPDL segmentation boundary is displayed in red.On the overlap masks, the blue region corresponds to the overlap between the two methods, red represents the area segmented by the MPDL alone while yellow represents the area segmented by Eigenfilter alone.

Figure 6 .
Figure 6.Demonstration of three representative sagittal breast cases from the validation cohort and the resulting MPDL segmentations.In all cases the segmented regions of breast lesions were highly correlated between each other.The color coding for different tissue types are shown to the right of the images.

Figure 7 .FiguresFigure 1 .
Figure 7. Bland-Altman plots demonstrating the limits of agreement of the percent differences from the validation data set and MPDL segmentations.The mean is shown by the center line and the confidential intervals (±2SD) are shown at 10.6% and -12.9%. the resulting.The plot shows excellent agreement between the two measurements.

Figure 2 .
Figure 2. Framework for mpMRI segmentation deep network trained to segment multiparametric breast

Figure 3 .
Figure 3. Illustrates the use of MPDL network on axial breast mpMRI in five representative malignant

Figure 4 .
Figure 4. Illustrates the use of MPDL network on axial breast mpMRI in five representative benign

Figure 5 .
Figure 5. Demonstration of dice similarity overlap between Eigenimage and MPDL segmentation

Figure 6 .
Figure 6.Demonstration of three representative sagittal breast cases from the validation cohort and

Figure 7 .
Figure 7. Bland-Altman plots demonstrating the limits of agreement of the percent differences on the

Figure 8 .
Figure 8.The receiver operating characteristic curve from the MPDL classification shows an AUC=0.93