Breast Cancer Mass Detection in DCE–MRI Using Deep-Learning Features Followed by Discrimination of Infiltrative vs. In Situ Carcinoma through a Machine-Learning Approach

Breast cancer is the leading cause of cancer deaths among women worldwide. This aggressive tumor can be categorized into two main groups, in situ and infiltrative, with the latter comprising the most common malignant lesions. Magnetic resonance imaging (MRI) currently provides the highest sensitivity in the detection of breast lesions and in the discrimination between benign and malignant ones, when interpreted by expert radiologists. In this article, we present the prototype of a computer-aided detection/diagnosis (CAD) system that could provide valuable assistance to radiologists in discriminating between in situ and infiltrating tumors. The system consists of two main processing levels: (1) localization of possibly tumoral regions of interest (ROIs) through an iterative procedure based on intensity values (ROI Hunter), followed by deep-feature extraction and classification for false-positive rejection; and (2) characterization of the selected ROIs and discrimination between in situ and invasive tumors, consisting of Radiomics feature extraction and classification through a machine-learning algorithm. The CAD system was developed and evaluated using a DCE-MRI image database containing at least one confirmed mass per image, as diagnosed by an expert radiologist. When the accuracy of the ROI Hunter procedure was evaluated against the radiologist-drawn boundaries, sensitivity to mass detection was found to be 75%. The AUC of the ROC curve for discrimination between in situ and infiltrative tumors was 0.70.


Introduction
Breast cancer (BC) is one of the most common malignant tumours and the leading cause of cancer mortality among women worldwide. In Italy, about 53,000 new cases of BC were diagnosed in 2019, out of a total of 175,000 new cases of all female cancers [1]. BC can be classified into two main types: in situ and invasive. Based on cytological characteristics and growth patterns, the in situ type is further subdivided into ductal and lobular, located within the ductal or lobular epithelium, respectively. Ductal carcinoma in situ (DCIS) is more common than lobular carcinoma in situ (LCIS), accounting for 30-50% of all detected BCs [2,3], and normally does not infiltrate through the basal membrane. On the other hand, invasive ductal carcinoma (IDC) is the most common malignant lesion, accounting for approximately 70% of all malignant cases [4,5]. Treatment differs between in situ carcinoma and IDC, and clinical outcomes are worse for invasive disease; women may therefore need to undergo additional surgery if an invasive disease is missed.
In current clinical imaging practice, Magnetic Resonance Imaging (MRI) offers high sensitivity and strongly improves tumour mass detection and the discrimination between benign and malignant lesions [6][7][8][9][10]. Breast MRI scans must, however, be interpreted by experienced radiologists, as these examinations are often used to improve surgical outcomes: they help reduce the number of re-excisions, allow patient selection for neoadjuvant chemotherapy or therapy modification, and represent the technique of choice for the pre-surgical assessment of residual tumour size when determining candidacy for breast-conserving surgery [6].
In this scenario, a new field of research called Radiomics is becoming increasingly popular. Its general aim is to convert the information contained in digital medical images into quantifiable features, typically related to tumour size, shape, pixel intensity, and texture, and to associate them with clinical outcomes and prognosis, thereby defining a proper tumour Radiomics signature [11] that in many cases can lead to a remarkable improvement of the detection rate [12,13].
Starting from the above considerations, the aim of this study was to develop a software system able to differentiate in situ from infiltrating BC in dynamic contrast-enhanced MRI (DCE-MRI) images, based on the lesion Radiomics signature. Preliminary results of this work, obtained on a smaller dataset, with a partially different approach, and without the segmentation step, are reported in [14]. The problem of distinguishing invasive from in situ BC is addressed in a few papers in the specific literature. In [15], Radiomics features were extracted from DCE-MRI scans (190 IDC and 58 DCIS) and used to train a random forest classifier in a leave-one-out cross-validation scheme; the AUC of the ROC curve was 0.90. A Radiomics signature of 569 features was tested by Li et al. [16] on mammographic images; the dataset comprised 161 DCIS and 89 IDC cases, and their best result was AUC = 0.72. In [17], the apparent diffusion coefficient (ADC) computed from diffusion-weighted MRI (DWI) was used to distinguish invasive from in situ carcinoma. DWI characterizes tissue diffusivity, thus providing a description of the tissue micro-structure [18,19]. The rationale was that invasive breast cancer spreads by degrading tissue structure through proteolytic activity: the chronic inflammatory reaction to proteolysis reduces the extracellular water content, with a consequent reduction of the ADC compared to in situ tumours. To test this hypothesis, a dataset of 21 DCIS and 155 IDC cases was employed in [17], and a significant difference in ADC values between the two groups was found (p < 0.001, AUC = 0.89). A Radiomics approach in DCE-MR images, combining computer-extracted kinetic and morphologic MR imaging features, was tested by Bhooshan et al. [20] on a dataset containing 32 benign, 71 DCIS, and 150 IDC cases, obtaining AUC = 0.83. Finally, deep learning was tried in [21], with the purpose of predicting invasive cancer after a DCIS diagnosis.
They used a transfer-learning approach, in which a pre-trained GoogleNet computed features from 131 MRI images; these features were then used to train a support vector machine (SVM). The result was AUC = 0.70.
The Radiomics calculations to classify tumors as in situ or infiltrative must be performed in a Region of Interest (ROI) containing the tumor tissue. For this reason, a necessary pre-processing step is the manual or (semi)automatic segmentation (contouring) of the lesions, separating the tumor from the normal tissue in the image. Breast tumour segmentation, especially in DCE-MRI images, is still a challenging task in the clinical setting, although it is necessary in some circumstances, e.g. when tumour response to chemotherapy is predicted [22][23][24]. Automating this procedure would help radiologists reduce their manual workload in image analysis, as they normally perform tumour diagnosis by locating lesions slice by slice, an arduous and time-consuming task [25].
Different image segmentation methods for MRI were proposed in past decades, but no optimal method exists yet. The simplest, pixel-based approaches generally rely on thresholding the image intensity and grouping individual pixels by appropriate classifiers. For example, Tzacheva et al. [26] determined the boundary of the suspected tumour on the assumption that the lesion intensity lies in the range 110-140 on the 0-255 scale, so that a simple threshold yields a binary image. Thresholding for breast tumour segmentation was also used by Fusco et al. [27], who exploited the intensity differences between pixels before and after contrast administration, followed by morphological post-processing steps. Fuzzy C-Means (FCM) clustering [25] and its variants [28,29] are also among the prevailing methods for isolating suspicious lesions, owing to their simplicity. Another popular method is the classic k-means algorithm, used for segmenting the lesion [30,31].
Other typical techniques used for lesion segmentation are region-based methods. Adams and Bischof [32] proposed the Seeded Region Growing (SRG) algorithm, later improved in [33], which begins by selecting a seed (or set of seeds) from which growth starts. SRG then grows these seeds into regions by successively adding surrounding pixels until all pixels are assigned to a region. Other region-based methods exploit the watershed algorithm followed by post-processing steps [34,35].
Contour-based methods are also widely used for breast lesion segmentation, especially active contours of the lesion boundary. A recent work [36] describes an interactive segmentation method for BC lesions in DCE-MRI images based on the active contour without edges (ACWE) algorithm, implemented with general-purpose computing on graphics processing units (GPGPU). ACWE is able to segment objects with little gradient information at their boundaries. The performance of this algorithm was evaluated on a set of 32 breast DCE-MRI cases in terms of speed-up, compared to a non-GPU approach: a high speed-up (40x or more) was obtained on high-resolution images, providing real-time outputs.
Sun et al. [37] proposed a semi-supervised method for breast tumour segmentation. After image segmentation with advanced clustering techniques, they performed a supervised learning step based on texture features and mean intensity levels to classify tumour and non-tumour patches, in order to automatically locate the tumour regions in an MRI image.
These manual or semi-automatic tumour annotation techniques are generally the most used [25,38], although they are often time-consuming and can introduce considerable inter-user variability. In addition, they often require the manual delineation of ROIs as a first step, demanding expert knowledge in advance. By contrast, breast tumour segmentation using deep learning has recently been applied in several medical imaging studies [39][40][41] and shows promise for automatic lesion segmentation. El Adoui et al. [39] used two deep-learning architectures, SegNet and U-Net [42,43], for the detection and segmentation of 86 breast DCE-MRI images. These two CNN architectures have been successfully applied to biomedical image segmentation and can be used even with relatively small datasets [44]. A 2D U-Net [42] CNN architecture was also used by Dalmis et al. on 66 breast T1-MRI post-contrast images [40], with promising results. Similarly, Moeskops et al. [41] used a deep-learning approach to segment the pectoral muscle in 34 T1-MRI breast images.
The next sections describe the software system developed in this work, composed of a segmentation step followed by classification. Technical details on the database employed and on the code structure are given in the Materials and Methods section, while the preliminary results are summarized and discussed in the Results and Discussion section.

Materials and Methods
The dataset consists of 55 anonymized DCE-MRI scans of BC patients (11 DCIS + LCIS and 44 IDC). The MRI sequence was a dynamic eTHRIVE with fat suppression, acquired on a Philips Achieva 1.5 T scanner. We considered images containing at least one tumour mass, as diagnosed by an expert radiologist and confirmed by biopsy. A ROI of the tumour mass was manually delimited slice by slice by an expert radiologist on the post-contrast images. The MRI volumes were resampled to an isometric 1-mm voxel size before processing.
The CAD system consists of two main processing steps. The first step concerns lesion detection and is subdivided into: a) localization of candidate ROIs (suspicious regions likely to contain a tumour mass) based on a dynamically changing threshold on the intensity values (ROI hunting); b) feature extraction from the candidate ROIs through a pre-trained deep-learning Convolutional Neural Network (CNN); c) false-positive ROI rejection through the training of a feed-forward multi-layer perceptron Artificial Neural Network (ANN), with the aim of preserving only the tumours (positive class) for subsequent processing. The second step concerns the discrimination between in situ and infiltrating tumours and is subdivided into: d) Radiomics signature extraction from the detected ROIs; e) binary classification. The code was written partly in Python 3.7 with the PyRadiomics library and partly in the Matlab environment. In the following sections, each of the above-mentioned processing steps is described in detail.

ROI Hunter procedure
In our particular application, accurate tumour borders were not fundamental, so we used a simple detection/segmentation method based on the application of thresholds followed by region classification.
Before processing, and in order to minimize false-positive (FP) ROIs, the mammary area containing the breast was semi-automatically selected in all slices by a bounding box (working volume), and the tissue outside the box was removed. Figure 1 shows an example of breast area selection. The candidate tumours inside the working volume were then detected. Since a tumour mass normally appears as a bright area, an iterative 2D ROI Hunter procedure, based on a dynamically changing threshold, was implemented. The number of ROIs detected in each slice was not set a priori; rather, it depended on the intensity properties of the image.
Firstly, the images were normalized to pixel values in the range [0, 1], using the 99.9th percentile (P1) of the whole image as the normalization factor, in order to exclude outliers. The following iterative procedure was then performed on a per-slice basis, giving a small number of 2D ROIs per image section. An initial threshold (T) was set to 0.9 and only pixels with value ≥ T were retained, considering the connected objects so found as tumour candidates. If no objects were detected, the threshold was iteratively lowered by 5% of its current value until at least one object was identified in the current slice. Tumour lesions are normally fairly round, so elongated and thread-like objects were excluded by thresholding on their geometrical features. The gray-value median of each object was calculated, and the ROIs were labelled from 1 onwards in descending order of gray-value median (the highest median value indicating the most plausible tumour candidate). The borders of the ROIs were also extracted.
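The iterative procedure above can be sketched as follows. This is an illustrative re-implementation, not the paper's actual code; in particular, the roundness cut-off `max_elongation` is a hypothetical stand-in for the unspecified geometrical thresholds.

```python
import numpy as np
from scipy import ndimage

def roi_hunter(slice_img, t0=0.9, drop=0.05, max_elongation=3.0):
    """Sketch of the iterative ROI Hunter on a single slice."""
    # Normalize to [0, 1] by the 99.9th percentile to exclude outliers.
    p = np.percentile(slice_img, 99.9)
    img = np.clip(slice_img / p, 0.0, 1.0)

    # Lower the threshold by 5% of its current value until at least
    # one connected bright object is found.
    t = t0
    labels, n = ndimage.label(img >= t)
    while n == 0 and t > 1e-3:
        t *= 1.0 - drop
        labels, n = ndimage.label(img >= t)

    rois = []
    for k in range(1, n + 1):
        mask = labels == k
        rows, cols = np.where(mask)
        h = rows.max() - rows.min() + 1
        w = cols.max() - cols.min() + 1
        # Reject elongated, thread-like objects: tumours are fairly round.
        if max(h, w) / min(h, w) > max_elongation:
            continue
        rois.append((mask, np.median(img[mask])))

    # Rank ROIs by descending gray-value median (most plausible first).
    rois.sort(key=lambda r: r[1], reverse=True)
    return rois, t
```

On a slice with a sufficiently bright mass, the first threshold already yields a candidate and no lowering occurs.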

Deep-learning Feature Extraction
Many different approaches were tested with the purpose of obtaining a set of features able to distinguish tumour regions from FPs, the most successful being the one described hereafter. In the calculation of features and in the subsequent classification (training and validation), we adopted a sliding-window approach. Initially, in order to set some procedure parameters, the variability of tumour size was investigated, as it differs among patients and, of course, among slices. According to the statistics of our dataset, the longest edge of the lesion bounding box was at most 120 mm, in accordance with e.g. [45]. After some tests, we chose 30x30 pixels as the size of the sliding window for ROI scanning. During operation, the bounding box containing each lesion section is enlarged if necessary (when smaller than 30x30), and the sliding window moves with a step of two pixels (on each axis) to explore the ROIs. The features are calculated for each position of the sliding window. Features were extracted using a GoogleNet model pre-trained on the ImageNet dataset [46], one of the most representative networks in image classification. GoogleNet consists of 2 convolution layers, 9 inception layers, and 1 fully connected layer, which was used for feature calculation. The output size of the last fully connected layer is 1000, so the same number of features was extracted for each sliding-window position. To fit the input size of GoogleNet, all extracted patches were resized to 224x224 pixels using bilinear interpolation and converted to RGB images by replicating the single gray-level plane over the three channels.
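The patch-preparation side of this pipeline (window scanning, bilinear resize, gray-to-RGB replication) can be sketched as below. This is an illustrative, dependency-free version: the bilinear resize is hand-rolled for self-containedness (a real pipeline would use an image library), and the GoogleNet forward pass that would consume these patches is not shown.

```python
import numpy as np

def sliding_patches(roi, win=30, step=2):
    """Yield win x win patches over an ROI with the given stride.
    The ROI is padded up to the window size when smaller, mirroring
    the bounding-box enlargement described in the text."""
    h, w = roi.shape
    if h < win or w < win:
        roi = np.pad(roi, ((0, max(0, win - h)), (0, max(0, win - w))))
        h, w = roi.shape
    for r in range(0, h - win + 1, step):
        for c in range(0, w - win + 1, step):
            yield roi[r:r + win, c:c + win]

def to_googlenet_input(patch, size=224):
    """Resize a gray patch to size x size by bilinear interpolation and
    replicate it over three channels, as required by a network trained
    on RGB ImageNet images."""
    h, w = patch.shape
    ys = np.linspace(0, h - 1, size)
    xs = np.linspace(0, w - 1, size)
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    fy = (ys - y0)[:, None]; fx = (xs - x0)[None, :]
    top = patch[np.ix_(y0, x0)] * (1 - fx) + patch[np.ix_(y0, x1)] * fx
    bot = patch[np.ix_(y1, x0)] * (1 - fx) + patch[np.ix_(y1, x1)] * fx
    gray = top * (1 - fy) + bot * fy
    return np.stack([gray] * 3, axis=-1)   # H x W x 3 "RGB" image
```

Each resulting 224x224x3 array would then be fed to the pre-trained network, whose last fully connected layer yields the 1000 features per window position.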

FP ROI rejection through binary classification
In order to preserve only positive ROIs (true tumors) for further processing, thus excluding FPs, the obtained features were used to train a binary classifier.
Patches whose area was occupied by the lesion by at least 10% were considered positive, while the remaining ones, together with supplementary patches randomly extracted from outside the lesion bounding box, formed the negative samples.
To increase the size of the dataset and favour generalization, data augmentation was performed through random image rotations, taking care to finally obtain a roughly balanced dataset. Several classifiers were tested (e.g., XGBoost, SVM), and the best results were obtained with a feed-forward, backpropagation multi-layer perceptron ANN with one hidden layer of five neurons.
Training/validation was performed in a leave-one-patient-out (LOPO) cross-validation scheme [47]. Data was split by patient, ensuring that the ROIs of each patient were entirely contained either in the training or in the validation set, and never in both, to avoid bias and consequent overfitting. For the same reason, at each iteration, feature values were normalized to [0, 1] using min-max normalization computed on the training set, and the validation-set features were then normalized with the same parameters. Fifty-four out of 55 patients were used for training the network, while the remaining one was used for validation, and a cyclical permutation of the patients was carried out. Statistics were calculated after a full LOPO cycle: a ROC curve was used to judge the classification quality and to deduce an optimal threshold on the ANN output, thus obtaining the binary classifier. Figure 2 shows an example of the whole processing chain, from ROI Hunting to classification.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 7 August 2020 doi:10.20944/preprints202008.0179.v1
Figure 2. A typical result obtained by using the ROI Hunter: the objects found after the iterative intensity-based procedure, and the results of the classification (red for tumour, blue for FPs).
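The LOPO scheme with per-fold normalization can be sketched as follows, a minimal illustration using scikit-learn (the `LeaveOneGroupOut` splitter and `MLPClassifier` are assumed substitutes for the paper's actual tooling; the one-hidden-layer, five-neuron topology follows the text):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier

def lopo_predict(X, y, patient_ids, seed=0):
    """Leave-one-patient-out cross-validation with min-max
    normalization fitted on the training patients only, so the
    held-out patient never influences the scaling parameters."""
    scores = np.zeros(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, patient_ids):
        scaler = MinMaxScaler().fit(X[train_idx])       # training fold only
        clf = MLPClassifier(hidden_layer_sizes=(5,),    # one hidden layer, 5 neurons
                            max_iter=2000, random_state=seed)
        clf.fit(scaler.transform(X[train_idx]), y[train_idx])
        # Held-out patient normalized with the training-fold parameters.
        scores[test_idx] = clf.predict_proba(scaler.transform(X[test_idx]))[:, 1]
    return scores
```

After a full cycle, the collected `scores` can be used to build the ROC curve and pick the operating threshold.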

Tumor Characterization by Radiomics Signature
The ROI Hunter locates lesions without giving further information. The second and last part of the process concerns the characterization of the detected ROIs, so that a decision-making system can correctly discriminate in situ from infiltrative lesions. This step consists of Radiomics feature extraction from the selected ROIs, followed by classification. As the calculation was performed in 3D, the 2D ROIs were first grouped on the basis of slice-to-slice continuity, so as to form 3D ROIs.
In order to discriminate the tumour volumes so obtained, we investigated a large set of radiomic features. Overall, 1820 features comprising shape, first-order, and higher-order features were generated for each detected ROI, on both the original and the filtered intensities. We computed 18 first-order statistical features describing the distribution of voxel intensities within the defined image region, and 68 textural features quantifying intra-tumour heterogeneity (22 from gray-level co-occurrence matrices (GLCM), 16 from gray-level run length matrices (GLRLM), 14 from gray-level dependence matrices (GLDM), and 16 from gray-level size zone matrices (GLSZM)) [48]. Besides calculating the features on the original ROI volumes, we applied several preprocessing filters to each ROI before computing the Radiomics signatures: the Laplacian of Gaussian filter for edge enhancement; Wavelet filters yielding 8 sub-filters (all possible combinations of applying either a high- or a low-pass filter along each of the three dimensions); Square and SquareRoot filters, which take the square and the square root of the image intensities and linearly scale them back to the original range; Logarithm, Exponential, and Gradient filters; and the Local Binary Pattern filter (both in a by-slice operation, i.e. 2D, and using spherical harmonics, i.e. 3D). After this step, we applied recursive feature elimination to remove redundant and irrelevant features.
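As an illustration of the first-order part of the signature, a few of the 18 first-order features can be computed directly from the ROI voxel intensities. The numpy sketch below uses textbook statistical definitions; the actual pipeline relied on the PyRadiomics library, whose exact conventions (e.g. intensity binning for entropy) may differ in detail.

```python
import numpy as np

def first_order_features(roi_voxels, n_bins=25):
    """A handful of first-order Radiomics features from the voxel
    intensities of one ROI (flattened to a 1D array)."""
    x = np.asarray(roi_voxels, dtype=float)
    mean = x.mean()
    var = x.var()
    std = np.sqrt(var)
    # Shannon entropy over a fixed-bin-count intensity histogram.
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / x.size
    return {
        "Mean": mean,
        "Variance": var,
        "Skewness": ((x - mean) ** 3).mean() / std ** 3 if std else 0.0,
        "Kurtosis": ((x - mean) ** 4).mean() / var ** 2 if var else 0.0,
        "Energy": (x ** 2).sum(),
        "Entropy": -(p * np.log2(p)).sum(),
        "Range": x.max() - x.min(),
    }
```

The textural features (GLCM, GLRLM, GLDM, GLSZM) and the filter bank follow the same pattern but operate on discretized and filtered 3D volumes rather than raw intensities.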

Classification to discriminate in situ vs invasive BC
Three different classifiers (Naive Bayes, random forests, and XGBoost) were tested, and the best results were obtained with the Extreme Gradient Boosting (XGBoost) classifier (an implementation of gradient-boosted decision trees) [49] in a LOPO cross-validation scheme. At each iteration, the features were normalized to [0, 1] using min-max normalization on the training subjects, and the calculated normalization parameters were subsequently applied to the feature set of the test patient. To overcome the severe class imbalance, we oversampled the minority class (in situ BC) using the Synthetic Minority Oversampling Technique (SMOTE) [50]. Performance on our imbalanced classification task was assessed using metrics suited to imbalance: balanced accuracy instead of plain accuracy, average precision-recall, the confusion matrix, the Matthews correlation coefficient, and the AUC of the ROC curve. All the hyperparameters of the XGBoost classifier were optimized for our imbalanced dataset.
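The core of SMOTE is to synthesize minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. A minimal numpy sketch of that idea follows; in practice one would typically use a library implementation such as imbalanced-learn's `SMOTE` rather than this illustration.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Synthesize `n_new` minority-class samples by interpolating each
    randomly chosen seed sample toward one of its k nearest minority
    neighbours (a minimal SMOTE sketch)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                 # random seed sample
        b = neighbours[a, rng.integers(k)]  # one of its k neighbours
        lam = rng.random()                  # interpolation factor in [0, 1)
        synthetic[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return synthetic
```

In the 11-vs-44 setting of our dataset, 33 synthetic in situ samples per training fold would balance the two classes.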

Results and Discussion
The sensitivity of the detection/segmentation procedure of our prototype, computed as the percentage of tumour masses correctly detected, was 75% (41 out of 55). Since the sensitivity was sub-optimal, an interactive part allowing the manual inclusion of regions missed by the automatic procedure was added for completeness. Four FPs were proposed by the ROI Hunter but rejected by the trained ANN, which thus showed excellent specificity. As concerns the differentiation between in situ and infiltrative lesions, we tried the system in two configurations: feeding it with all the masses as visually detected and manually segmented by the radiologist, and feeding it with only the masses found by the detection/segmentation step. In the former case, the trained XGBoost classifier obtained an average precision-recall score of 0.36, a balanced accuracy of 0.68, a Matthews correlation coefficient of 0.33, and a ROC curve with an AUC of 0.70. After choosing the optimal threshold as the one associated with the ROC-curve point closest to the (0,1) corner of the ROC space, the model correctly classified 46 subjects out of 55. When only the masses found by the detection/segmentation step were considered (which, as said, misses a non-negligible number of lesions), these values were slightly better, as most of the classification errors actually came from masses not detected by the first CAD step. This suggests that the lesions missed by the detection step were also more difficult to characterize and assign to either class.
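The threshold rule used above, picking the ROC point closest to (0,1), can be sketched in a few lines (an illustration using scikit-learn's `roc_curve`, not the paper's actual code):

```python
import numpy as np
from sklearn.metrics import roc_curve

def closest_to_corner_threshold(y_true, scores):
    """Pick the decision threshold whose ROC point (FPR, TPR) lies
    closest, in Euclidean distance, to the perfect-classifier corner (0, 1)."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    dist = np.hypot(fpr - 0.0, tpr - 1.0)
    return thresholds[np.argmin(dist)]
```

Samples scoring at or above the returned threshold are then assigned to the positive class.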
The limits of our results might be explained, at least in part, by the small size of our monocentric dataset and by its imbalanced nature. This is also suggested by comparing the dataset sizes of the three reviewed articles working on (conventional) MRI, i.e. [15,20,21], against the corresponding AUC values their authors report. While our small database consists of only 55 patients, the numbers of images employed in those papers were respectively 248, 221 (considering only the malignant cases), and 131; the AUC values were 0.90, 0.83, and 0.70, which evidently correlate with the sample cardinality. Accordingly, a deeper test of our approach would require a larger sample size for each class, so as to guarantee generalization and result quality while avoiding overfitting. In perspective, we plan to increase the size of the dataset by involving different hospitals, thus creating a multicentre study. In this way, after solving the well-known problem of image normalization across different scanners, we might build a CAD system of better quality and broader applicability.

Conclusions
The automatic pre-operative non-invasive distinction between infiltrative and in situ breast cancer represents an important challenge in the biomedical field.
In this work, a two-step CAD system was developed and tested on DCE-MRI scans, with the aim of discriminating infiltrating from in situ breast tumours. The first step performs a ROI Hunting procedure to automatically extract 2D ROIs by exploiting intensity values: a dynamic-threshold algorithm selects suspicious regions that are likely to contain a tumour mass. From the candidate ROIs, deep features are extracted with a pre-trained GoogleNet and then fed to a classical machine-learning classifier (ANN) to exclude FP regions. The second step classifies the previously detected ROIs (merged into 3D regions) as in situ vs invasive breast cancer through a Radiomics-based analysis of 1820 features. The results show that the ROI Hunter procedure correctly identifies 75% of the tumour volumes; the software also contains an interactive part that allows the manual inclusion of regions missed by the automatic detection/segmentation procedure. The infiltrative vs in situ classification task achieves a final balanced accuracy of 0.68 on all the masses, and a slightly better one on the masses automatically identified by the detection/segmentation step.
Our preliminary results on tumour-type classification are still worse than those reported in the few specific papers in the literature, which can be partly explained by the small and rather imbalanced dataset we used.