Radiomic and Artificial Intelligence Analysis with Textural Metrics Extracted by Contrast-Enhanced Mammography and Dynamic Contrast Magnetic Resonance Imaging to Detect Breast Malignant Lesions

Purpose: The purpose of this study was to discriminate between benign and malignant breast lesions through several classifiers using, as predictors, radiomic metrics extracted from contrast-enhanced mammography (CEM) and dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) images. In order to optimize the analysis, balancing and feature selection procedures were performed. Methods: Fifty-four patients with 79 histopathologically proven breast lesions (48 malignant lesions and 31 benign lesions) underwent both CEM and DCE-MRI. The lesions were retrospectively analyzed with radiomic and artificial intelligence approaches. Forty-eight textural metrics were extracted, and univariate and multivariate analyses were performed: non-parametric statistical test, receiver operating characteristic (ROC) analysis and machine learning classifiers. Results: Considering the single metrics extracted from CEM, the best predictors were KURTOSIS (area under ROC curve (AUC) = 0.71) and SKEWNESS (AUC = 0.71) calculated on the late MLO view. Considering the features calculated from DCE-MRI, the best predictors were RANGE (AUC = 0.72), ENERGY (AUC = 0.72), ENTROPY (AUC = 0.70) and GLN (gray-level nonuniformity) of the gray-level run-length matrix (AUC = 0.72). Considering the analysis with classifiers and an unbalanced dataset, no significant results were obtained. After the balancing and feature selection procedures, higher values of accuracy, specificity and AUC were reached. The best performance was obtained considering 18 robust features among all metrics derived from CEM and DCE-MRI, using linear discriminant analysis (accuracy of 0.84 and AUC = 0.88). Conclusions: Classifiers, adjusted with adaptive synthetic sampling and feature selection, allowed for increased diagnostic performance of CEM and DCE-MRI in the differentiation between benign and malignant lesions.


Introduction
In the screening, detection and follow-up of breast cancer, mammography (MX) is considered the first-line imaging examination [1,2]. Technological improvements have made it possible to combine digital mammography with the acquisition of low- and high-energy images after the administration of an iodinated contrast agent; the resulting images emphasize the contrast enhancement linked to the vascularity of malignant lesions. This imaging technique is known as contrast-enhanced mammography (CEM) and exploits the same physiological mechanisms as dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI).
DCE-MRI is an important complementary diagnostic imaging technique that has been validated for screening high-risk women and women with dense breasts and for monitoring oncological therapies, thanks to its ability to combine morphological and functional information [2,3].
Radiomics and artificial intelligence approaches have been extensively applied to both CEM and DCE-MRI to increase diagnostic performance in the detection of malignant breast lesions [13,14]. The radiomics approach extracts a large amount of quantitative data from medical images which, combined with pattern recognition procedures, can address many clinical questions with high accuracy. Examples of features used in oncology are tumor size and shape, as well as intensity, statistical and textural metrics.
In this study, we designed several classifiers with the aim of discriminating between benign and malignant breast lesions using, as predictors, radiomic metrics extracted from CEM and DCE-MRI images. In order to optimize the analysis, balancing and feature selection procedures were performed.

Patient Selection
Patients were enrolled in this study, which was approved by the local ethics committee of the National Cancer Institute of Naples, Pascale Foundation. Fifty-four patients (mean age 54.3 years, range 31-78 years) with 79 histopathologically proven breast lesions (48 malignant and 31 benign; Table 1) underwent both CEM and DCE-MRI. The lesions were retrospectively analyzed with radiomic and artificial intelligence approaches. Breast lesions were categorized according to the American Joint Committee on Cancer staging. All women gave written informed consent in accordance with local ethics committee regulations.
Inclusion criteria: patients with known, histologically proven breast lesions who underwent both dual-energy CEM, in craniocaudal (CC) and mediolateral oblique (MLO) views, and DCE-MRI.

Imaging Protocol
CEM was acquired with a dual-energy mammography system (Hologic's Selenia® Dimensions® Unit, Bedford, MA, USA), as reported in our previous study [43]. Two minutes after the administration of 1.5 mL/kg body weight of iodinated contrast medium (Visipaque 320; GE Healthcare, Inc., Princeton, NJ, USA) at a rate of 2-3 mL/s, each woman was positioned for the CC view. Four and eight minutes after the administration of the contrast agent, each breast was compressed in the MLO view, yielding the early MLO and late MLO views, respectively. DCE-MRI was acquired with a 1.5 T MR scanner (Magnetom Symphony; Siemens Medical System, Erlangen, Germany) equipped with a dedicated 16-channel breast coil. Scan settings are reported in our previous study [44]: one series before and nine series after the automatic intravenous injection of 0.1 mmol/kg body weight of a positive paramagnetic contrast material (Gd-DOTA; Dotarem, Guerbet, Roissy CdG CEDEX, France) were acquired.
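The weight-based doses in this protocol translate into injected volumes and injection times by simple arithmetic. The Python sketch below is an illustration only: the 70 kg body weight is hypothetical, and the Gd-DOTA concentration of 0.5 mmol/mL is an assumption, since the protocol states only the dose per kilogram.

```python
"""Contrast volumes implied by the weight-based doses of the imaging protocol (sketch)."""

IODINATED_ML_PER_KG = 1.5   # Visipaque 320 dose used for CEM
GD_MMOL_PER_KG = 0.1        # Gd-DOTA dose used for DCE-MRI
GD_MMOL_PER_ML = 0.5        # ASSUMED Dotarem concentration, not stated in the protocol


def cem_contrast(weight_kg, rate_ml_s):
    """Return (volume in mL, injection time in s) of the CEM iodinated bolus."""
    volume = IODINATED_ML_PER_KG * weight_kg
    return volume, volume / rate_ml_s


def mri_contrast(weight_kg):
    """Return (dose in mmol, volume in mL) of the DCE-MRI gadolinium bolus."""
    dose = GD_MMOL_PER_KG * weight_kg
    return dose, dose / GD_MMOL_PER_ML


if __name__ == "__main__":
    weight = 70.0                      # hypothetical patient weight in kg
    for rate in (2.0, 3.0):            # injection-rate range of the protocol, mL/s
        vol, secs = cem_contrast(weight, rate)
        print(f"CEM at {rate} mL/s: {vol:.1f} mL over {secs:.1f} s")
    dose, vol = mri_contrast(weight)
    print(f"DCE-MRI: {dose:.1f} mmol = {vol:.1f} mL")
```

For a 70 kg patient this gives 105 mL of iodinated contrast, injected over roughly 35-53 s across the 2-3 mL/s range.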

Image Processing
Regions of interest were manually segmented, slice by slice, by two expert radiologists, with 25 and 20 years of experience in breast imaging, respectively.
Breast lesions were segmented on the dual-energy subtracted CEM images (in both CC and MLO views) and on the third subtracted T1-weighted DCE-MRI series, i.e., on the images where contrast uptake was emphasized.
Radiomic features were extracted using the MATLAB® Texture Toolbox developed by Vallières et al. [45], which computes 48 parameters according to the Image Biomarker Standardization Initiative [46], as previously described [43,44]. The textural features include both first-order and second-order features; a detailed description of each feature is provided in Appendix A.

Statistical Analysis
The statistical analysis was performed with RStudio software [47]. To assess the variability of radiomic feature values, the intra-class correlation coefficient (ICC) was calculated. The non-parametric Wilcoxon-Mann-Whitney test and receiver operating characteristic (ROC) analysis were performed, and the Youden index was calculated to obtain the optimal cut-off value for each feature; the area under the ROC curve (AUC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV) and accuracy (ACC) were then computed to assess performance.
Linear discriminant analysis (LDA), decision tree (TREE), k-nearest neighbors (KNN), artificial neural network (NNET) and support vector machine (SVM) classifiers were trained using all extracted textural metrics as predictors [14]. Configuration settings for each classifier are provided in our previous studies [41,43]. Performance was estimated with 10-fold cross-validation (10-fold CV) and leave-one-out cross-validation (LOOCV), and median values of AUC, accuracy, sensitivity, specificity, PPV and NPV were obtained.
Feature selection was performed with the least absolute shrinkage and selection operator (LASSO) method [48], considering both the λ value with the minimum mean squared error (minMSE) and the largest λ value whose error lies within one standard error of that minimum (1SE) [49].
The best model was selected as the one with the highest AUC and the highest accuracy.
A p-value < 0.05 was considered significant.
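The univariate steps above (ICC, Mann-Whitney-based AUC and Youden cut-off) can be illustrated with a minimal, dependency-free Python sketch; it is not the R code used in the study. It assumes the ICC(2,1) form (two-way random effects, absolute agreement, single rater), treats malignant lesions as the positive class and takes higher feature values to indicate malignancy (for features where the opposite holds, the values can be negated). It relies on the identity AUC = U / (n_1 n_0) between the Mann-Whitney U statistic and the ROC area; the test p-value itself would come from a library routine such as `scipy.stats.mannwhitneyu`.

```python
"""Univariate analysis sketch: ICC, Mann-Whitney AUC and Youden-index cut-off.

Assumptions (not stated in the study): the ICC(2,1) form, malignant lesions as the
positive class, and higher feature values indicating malignancy.
"""
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[s][r] is the value of one feature for lesion s segmented by reader r.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in ratings)
    ss_cols = n * sum((mean(row[r] for row in ratings) - grand) ** 2 for r in range(k))
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def auc_mann_whitney(pos, neg):
    """AUC = U / (n_pos * n_neg): share of (malignant, benign) pairs ranked correctly, ties = 1/2."""
    u = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return u / (len(pos) * len(neg))


def youden_cutoff(pos, neg):
    """Cut-off maximizing J = SENS + SPEC - 1 (value >= cut-off is called 'malignant')."""
    best = None
    for t in sorted(set(pos) | set(neg)):
        tp = sum(p >= t for p in pos)
        tn = sum(q < t for q in neg)
        fp, fn = len(neg) - tn, len(pos) - tp
        sens, spec = tp / len(pos), tn / len(neg)
        if best is None or sens + spec - 1 > best["J"]:
            best = {
                "cutoff": t, "J": sens + spec - 1, "SENS": sens, "SPEC": spec,
                "PPV": tp / (tp + fp) if tp + fp else float("nan"),
                "NPV": tn / (tn + fn) if tn + fn else float("nan"),
                "ACC": (tp + tn) / (len(pos) + len(neg)),
            }
    return best


if __name__ == "__main__":
    malignant = [3.1, 4.0, 5.2, 2.9]   # illustrative feature values, not study data
    benign = [1.0, 2.2, 3.0]
    print(auc_mann_whitney(malignant, benign))
    print(youden_cutoff(malignant, benign))
    print(icc_2_1([[1.0, 1.1], [2.0, 2.1], [3.0, 2.9]]))
```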
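A minimal sketch of how 10-fold CV and LOOCV can partition the 79 lesions is shown below. The helper names and the toy nearest-mean scorer are hypothetical, and the text does not specify how the median metrics were aggregated (for example, across folds or across repeated runs), so the sketch stops at out-of-fold scores, which can then be passed to an AUC routine. With LOOCV each test fold holds a single lesion, so AUC is normally computed on the pooled out-of-fold scores rather than per fold.

```python
"""Sketch of k-fold and leave-one-out cross-validation for scoring a classifier.

Helper names are hypothetical; labels: 1 = malignant, 0 = benign (assumed encoding).
"""
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists; k == n gives leave-one-out CV."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[f::k]) for f in range(k)]   # fold sizes differ by at most one
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, folds[f]


def out_of_fold_scores(n, k, fit_predict, seed=0):
    """Score every sample with a model that never saw it during training.

    fit_predict(train, test) must return one score per test index.
    """
    scores = [None] * n
    for train, test in kfold_indices(n, k, seed):
        for i, s in zip(test, fit_predict(train, test)):
            scores[i] = s
    return scores


if __name__ == "__main__":
    # Toy one-feature data set, for illustration only
    x = [0.2, 0.4, 0.5, 0.9, 1.1, 1.3]
    y = [0, 0, 0, 1, 1, 1]

    def nearest_mean(train, test):
        """Hypothetical classifier: a positive score means closer to the malignant mean."""
        m1 = mean(x[i] for i in train if y[i] == 1)
        m0 = mean(x[i] for i in train if y[i] == 0)
        return [abs(x[i] - m0) - abs(x[i] - m1) for i in test]

    print(out_of_fold_scores(len(x), len(x), nearest_mean))   # LOOCV
    print(out_of_fold_scores(len(x), 3, nearest_mean))        # 3-fold CV
```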
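The two λ choices follow the usual cross-validated LASSO rule, which R's `cv.glmnet` reports as `lambda.min` and `lambda.1se`; whether the authors used that package is not stated. The Python sketch below assumes a precomputed cross-validation error curve (mean error and its standard error at each λ) and picks both values; the numbers in the usage example are invented.

```python
"""Choosing lambda_min and lambda_1se from a cross-validated LASSO error curve (sketch)."""


def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se).

    lambda_min minimizes the mean CV error; lambda_1se is the largest lambda whose
    mean CV error is within one standard error of that minimum (the sparser model).
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


if __name__ == "__main__":
    # Invented error curve, for illustration only
    lambdas = [1.0, 0.5, 0.1, 0.05, 0.01]
    cv_mean = [0.30, 0.22, 0.20, 0.21, 0.25]
    cv_se = [0.02, 0.02, 0.03, 0.03, 0.03]
    print(select_lambdas(lambdas, cv_mean, cv_se))   # (0.1, 0.5)
```

A larger λ shrinks more coefficients to zero, so the 1SE choice accepts a small, statistically indistinguishable increase in error in exchange for a sparser feature set.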
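The abstract reports that the classifiers were adjusted with adaptive synthetic sampling (ADASYN) to compensate for the imbalance between 48 malignant and 31 benign lesions. A minimal Python sketch of the generic ADASYN algorithm follows. It is not the authors' implementation; the neighbourhood size `k`, the balance factor `beta`, Euclidean distance and the label encoding (0 = benign) are assumptions. In practice a tested implementation such as `imblearn.over_sampling.ADASYN` from imbalanced-learn would normally be preferred.

```python
"""Minimal ADASYN sketch (He et al., 2008): adaptive oversampling of the minority class.

Hypothetical helper, not the study's implementation; Euclidean distance, k and beta assumed.
"""
import math
import random


def _knn(X, i, k, pool):
    """Indices from pool of the k samples nearest to X[i], excluding i itself."""
    cands = sorted((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))
    return cands[:k]


def adasyn(X, y, minority=0, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples; beta = 1 targets fully balanced classes."""
    rng = random.Random(seed)
    everyone = range(len(X))
    mino = [i for i in everyone if y[i] == minority]
    if len(mino) < 2:
        raise ValueError("need at least two minority samples")
    G = (len(X) - 2 * len(mino)) * beta          # majority count minus minority count
    # r_i: fraction of majority samples among the k nearest neighbours of minority sample i
    r = [sum(y[j] != minority for j in _knn(X, i, k, everyone)) / k for i in mino]
    total = sum(r)
    if total == 0:                                # no class overlap: uniform weights (sketch choice)
        r, total = [1.0] * len(mino), float(len(mino))
    synthetic = []
    for i, ri in zip(mino, r):
        neigh = _knn(X, i, k, mino)               # interpolate only towards minority neighbours
        for _ in range(round(ri / total * G)):    # harder samples receive more synthetic points
            j = rng.choice(neigh)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic
```

With beta = 1 and the study's class counts, the target is about 48 - 31 = 17 synthetic benign samples (rounding of the per-sample counts can shift this slightly). Oversampling is generally applied only to the training folds, so that synthetic samples never leak into the data used for validation.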

Results
The median time interval between CEM and DCE-MRI was 2.5 days (range 1-16 days). Table 2 reports the diagnostic performance of the significant textural parameters for DCE-MRI and for dual-energy CEM in all views (i.e., CC, early and late MLO views), expressed in terms of AUC and p-value. In the univariate analysis, the best single-feature results were obtained by energy, range and GLN_GLRLM extracted from the DCE-MRI volume, each with an AUC of 0.72. Figure 1 shows the ROC curves of the significant textural features: variance, correlation and IQR for the mammographic CC projection; kurtosis and skewness for the mammographic early-MLO projection; and range, energy, entropy, GLN_GLRLM and GLN_GLSZM for the DCE-MRI images. Figure 2 shows the boxplots of the same parameters for benign and malignant lesions. Table 3 reports the performance achieved by the best classifiers designed to discriminate between benign and malignant lesions using CEM and DCE-MRI images.

Discussion
Using textural features from dual-energy CEM and DCE-MRI, considered both individually and in combination, we evaluated the ability of radiomic analysis to discriminate between malignant and benign breast lesions.
Marino et al. [61] investigated the potential of radiomic analysis of both CEM and DCE-MRI of the breast for the non-invasive assessment of tumor invasiveness, hormone receptor status and tumor grade in patients with primary breast cancer. This retrospective study included 48 female patients with 49 biopsy-proven breast cancers who underwent pretreatment breast CEM and MRI. Radiomic analysis was performed using MaZda software. Radiomic parameters were correlated with tumor histology (invasive vs. non-invasive), hormonal status (HR+ vs. HR−) and grading (low grade G1 + G2 vs. high grade G3). CEM radiomic analysis yielded classification accuracies of up to 92% for invasive vs. non-invasive breast cancers, 95.6% for HR+ vs. HR− breast cancers and 77.8% for G1 + G2 vs. G3 invasive cancers. MRI radiomic analysis yielded classification accuracies of up to 90% for invasive vs. non-invasive breast cancers, 82.6% for HR+ vs. HR− breast cancers and 77.8% for G1 + G2 vs. G3 cancers. Their study, however, did not report the combination of radiomic features extracted from CEM and DCE-MRI.
Jiang et al. [62] noninvasively evaluated the use of intratumoral and peritumoral regions from full-field digital mammography (DM), digital breast tomosynthesis (DBT), and dynamic contrast-enhanced and diffusion-weighted (DW) magnetic resonance images, separately and combined, to predict the Ki-67 level based on radiomics. Their results showed that the combined intra- and peritumoral radiomic signatures improved the AUC compared with the intra- or peritumoral radiomic signature alone in each modality. The nomogram incorporating the multi-model radiomic signature, age and lymph node metastasis status achieved the best prediction performance in the training (AUC = 0.922) and validation (AUC = 0.866) cohorts.
Zhao et al. [63] constructed radiomic models from DCE-MRI and mammography to assess their value in the diagnosis of breast cancer, reporting individual-model accuracies of 83.2% for DCE-MRI, 75.7% for the mammographic lesion, 64.4% for the mammographic margin and 77.2% for lesion + margin. When all features were combined, the accuracy increased to 89.6%.
Niu et al. [64] evaluated digital mammography, DBT, and DCE- and DW-MRI, individually and combined, to assess their value in the diagnosis of breast cancer. They reported that the radiomic signature derived from DBT plus DM yielded a lower AUC and sensitivity, but a higher specificity, compared with that from DCE plus DWI. The nomogram integrating the combined radiomic signature, age and menstrual status achieved the best diagnostic performance in the training (AUC = 0.975) and validation (AUC = 0.983) cohorts.
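Our results demonstrated that, considering the single metrics extracted from CEM, the best predictors were KURTOSIS (AUC = 0.71) and SKEWNESS (AUC = 0.71) calculated on the late MLO view. Considering the features calculated from DCE-MRI, the best predictors were RANGE (AUC = 0.72), ENERGY (AUC = 0.72), ENTROPY (AUC = 0.70) and GLN of the gray-level run-length matrix (AUC = 0.72). Considering the analysis with classifiers and the unbalanced dataset, no significant results were obtained. After the balancing and feature selection procedures, higher values of accuracy, specificity and AUC were reached. The best performance was obtained considering 18 robust features among all metrics derived from CEM and DCE-MRI, using linear discriminant analysis (accuracy of 0.84 and AUC = 0.88).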
This study had some limitations. First, the cohort was small, so these results are preliminary and should be validated in a larger patient cohort. Second, manual segmentation is time-consuming and operator-dependent, which may reduce reproducibility; however, automatic segmentation could be difficult to perform in the presence of multicentric lesions or background parenchymal enhancement. Finally, the histological differences among tumors were not considered; accounting for them could improve classification performance and allow breast lesions to be classified according to grade and histotype.
Both DCE-MRI and CEM provide functional information on tumor neoangiogenesis. CEM is an attractive alternative when MRI is not available, is contraindicated or is poorly tolerated. At our institution, a study protocol comparing DCE-MRI and CEM in breast cancer staging and follow-up is still ongoing. Therefore, a future goal could be to design separate classifiers for CEM and DCE-MRI images and then merge their results in specific clinical settings, such as patient follow-up in cases of suspected local recurrence.

Conclusions
In conclusion, classifiers adjusted with adaptive synthetic sampling and feature selection increased the diagnostic performance of CEM and DCE-MRI in differentiating between benign and malignant lesions.

Acknowledgments: The authors are grateful to Alessandra Trocino, librarian at the National Cancer Institute of Naples, Italy. The authors are also grateful to Paolo Pariate, Martina Totaro and Andrea Esposito of the Radiology Division at Istituto Nazionale Tumori IRCCS Fondazione Pascale-IRCCS di Napoli, I-80131 Naples, Italy, for their collaboration and research support.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A. Definition of Textural Features
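Appendix A.1. First-Order Gray-Level Statistics
First-order gray-level statistics describe the distribution of gray values within the volume. Let X denote the 3-D image matrix with N voxels, P the first-order histogram, P(i) the fraction of voxels with intensity level i and N_l the number of discrete intensity levels.
• Mean, the mean gray level of X: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X(i)$.
• Mode, the most frequent element(s) of array X.
• Median, the sample median of X, i.e., the 50th percentile of X.
• Standard deviation (STD): $\mathrm{STD} = \left(\frac{1}{N-1}\sum_{i=1}^{N}\left(X(i)-\bar{X}\right)^2\right)^{1/2}$.
• Mean absolute deviation (MAD), the mean of the absolute deviation of all voxel intensities around the mean intensity value: $\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N}\left|X(i)-\bar{X}\right|$.
• Range, the range of intensity values of X: $\mathrm{Range} = \max(X) - \min(X)$, where max(X) is the maximum intensity value of X and min(X) is the minimum intensity value of X.
• Interquartile range (IQR), defined as the 75th minus the 25th percentile of X.
• Kurtosis: $\mathrm{Kurtosis} = \dfrac{\frac{1}{N}\sum_{i=1}^{N}\left(X(i)-\bar{X}\right)^4}{\left(\frac{1}{N}\sum_{i=1}^{N}\left(X(i)-\bar{X}\right)^2\right)^{2}}$, where $\bar{X}$ is the mean of X.
• Variance, the square of the standard deviation: $\mathrm{Variance} = \frac{1}{N-1}\sum_{i=1}^{N}\left(X(i)-\bar{X}\right)^2$, where $\bar{X}$ is the mean of X.
• Skewness: $\mathrm{Skewness} = \dfrac{\frac{1}{N}\sum_{i=1}^{N}\left(X(i)-\bar{X}\right)^3}{\left(\frac{1}{N}\sum_{i=1}^{N}\left(X(i)-\bar{X}\right)^2\right)^{3/2}}$, where $\bar{X}$ is the mean of X.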
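The first-order statistics above can be sketched in Python as follows. The input is assumed to be a flat list of voxel intensities inside the segmented ROI; variance and STD use the N - 1 denominator, skewness and kurtosis use moment ratios (non-excess kurtosis), and the IQR percentile convention of `statistics.quantiles` may differ slightly from MATLAB's. This is an illustration, not the Vallières toolbox code.

```python
"""First-order gray-level statistics of the ROI intensities (sketch).

x is assumed to be a flat list of voxel intensities inside the segmented ROI; the IQR
percentile convention here may differ from MATLAB's.
"""
import math
from statistics import mean, median, multimode, quantiles


def first_order_features(x):
    n = len(x)
    m = mean(x)
    dev = [v - m for v in x]
    m2 = sum(d * d for d in dev) / n              # biased 2nd central moment (skewness, kurtosis)
    var = sum(d * d for d in dev) / (n - 1)       # variance and STD use the N - 1 denominator
    q1, _, q3 = quantiles(x, n=4, method="inclusive")
    return {
        "mean": m,
        "mode": multimode(x),                     # list: several values may tie
        "median": median(x),
        "std": math.sqrt(var),
        "mad": sum(abs(d) for d in dev) / n,
        "range": max(x) - min(x),
        "iqr": q3 - q1,
        "kurtosis": (sum(d ** 4 for d in dev) / n) / m2 ** 2,
        "variance": var,
        "skewness": (sum(d ** 3 for d in dev) / n) / m2 ** 1.5,
    }


if __name__ == "__main__":
    print(first_order_features([1, 2, 3, 4]))     # toy intensities, not study data
```

The input must contain at least two distinct values; a constant ROI makes skewness and kurtosis undefined.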
Appendix A.2. Gray Level Co-Occurrence Matrix (GLCM)
A normalized GLCM is defined as P(i, j; δ, α), a matrix of size N_g × N_g describing the second-order joint probability function of an image, where the (i, j)th element represents the number of times the combination of intensity levels i and j occurs in two pixels of the image that are separated by a distance of δ pixels in direction α, and N_g is the maximum discrete intensity level in the image. Let:
- P(i, j) be the normalized (i.e., $\sum_{i}\sum_{j} P(i,j) = 1$) co-occurrence matrix, generalized for any δ and α.
• Contrast: $\mathrm{Contrast} = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} |i-j|^2\,P(i,j)$
• Entropy: $\mathrm{Entropy} = -\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} P(i,j)\log_2 P(i,j)$

Appendix A.3. Gray Level Run Length Matrix (GLRLM)
Run-length metrics quantify gray-level runs in an image. A gray-level run is defined as the length, in number of pixels, of consecutive pixels that have the same gray-level value. In a gray-level run-length matrix p(i, j | θ), the (i, j)th element describes the number of times a gray level i appears consecutively j times in the direction specified by θ. Let:
- p(i, j) be the (i, j)th entry in the given run-length matrix p, generalized for any direction θ,
- N_g be the number of discrete intensity values in the image,
- N_r be the maximum run length,
- N_s be the total number of runs, where $N_s = \sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j)$.
• High Gray Level Run Emphasis (HGRE): $\mathrm{HGRE} = \frac{1}{N_s}\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} i^2\,p(i,j)$
• Run-Length Variance (RLV): $\mathrm{RLV} = \sum_{i=1}^{N_g}\sum_{j=1}^{N_r} (j-\mu)^2\,\frac{p(i,j)}{N_s}$, where $\mu = \sum_{i=1}^{N_g}\sum_{j=1}^{N_r} j\,\frac{p(i,j)}{N_s}$
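To make the co-occurrence and run-length definitions concrete, the following Python sketch builds a 2-D GLCM for a single pixel offset and a horizontal (θ = 0) run-length matrix on an already quantized image with gray levels 1..N_g. It then computes contrast (the sum of |i - j|² P(i, j)), entropy (-Σ P log₂ P), HGRE ((1/N_s) Σ i² p(i, j)) and RLV (the variance of run lengths j weighted by p(i, j)/N_s). The toolbox used in the study operates on 3-D volumes and combines several directions; the single offset, the symmetric GLCM and the 2-D setting here are simplifying assumptions.

```python
"""2-D GLCM (contrast, entropy) and GLRLM (HGRE, RLV) on a quantized image (sketch)."""
import math
from collections import Counter


def glcm(img, levels, dx=1, dy=0, symmetric=True):
    """Normalized co-occurrence matrix P(i, j) for one offset; gray levels are 1..levels."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c] - 1, img[r2][c2] - 1
                counts[i][j] += 1
                if symmetric:                 # also count the pair in the reverse direction
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_contrast(P):
    return sum((i - j) ** 2 * p for i, row in enumerate(P) for j, p in enumerate(row))


def glcm_entropy(P):
    return -sum(p * math.log2(p) for row in P for p in row if p > 0)


def glrlm_horizontal(img):
    """Counter mapping (gray level i, run length j) to the number of runs along rows."""
    runs = Counter()
    for row in img:
        start = 0
        for c in range(1, len(row) + 1):
            if c == len(row) or row[c] != row[start]:
                runs[(row[start], c - start)] += 1
                start = c
    return runs


def hgre(runs):
    ns = sum(runs.values())                   # N_s, total number of runs
    return sum(i * i * n for (i, _), n in runs.items()) / ns


def rlv(runs):
    ns = sum(runs.values())
    mu = sum(j * n for (_, j), n in runs.items()) / ns
    return sum((j - mu) ** 2 * n for (_, j), n in runs.items()) / ns


if __name__ == "__main__":
    img = [[1, 1, 2], [2, 2, 2]]              # toy 2 x 3 image with two gray levels
    P = glcm(img, levels=2)
    print(glcm_contrast(P), glcm_entropy(P))
    runs = glrlm_horizontal(img)
    print(hgre(runs), rlv(runs))
```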

Appendix A.4. Gray Level Size Zone Matrix (GLSZM)
A gray level size-zone matrix describes the number of homogeneous connected areas of a given size and intensity within the volume. The (i, j) entry of the GLSZM, p(i, j), is the number of connected areas of gray level (i.e., intensity value) i and size j. GLSZM features therefore describe homogeneous areas within the tumor volume, characterizing tumor heterogeneity at a regional scale [5].
• Zone Size Variance (ZSV): $\mathrm{ZSV} = \sum_{i}\sum_{j} (j-\mu)^2\,\frac{p(i,j)}{N_z}$, where $N_z = \sum_{i}\sum_{j} p(i,j)$ is the total number of zones and $\mu = \sum_{i}\sum_{j} j\,\frac{p(i,j)}{N_z}$.

Appendix A.5. Neighborhood Gray Tone Difference Matrix (NGTDM)
The ith entry of the NGTDM, s(i | d), is the sum of the gray-level differences between voxels with intensity i and the average intensity A_i of their neighboring voxels within a distance d. Let:
- n_i be the number of voxels with gray level i,
- $N = \sum_i n_i$ be the total number of voxels,
- $s(i) = \sum_{n_i} |i - A_i|$ if $n_i > 0$, and $s(i) = 0$ otherwise, generalized for any distance d,
- N_g be the maximum discrete intensity level in the image,
- $p(i) = n_i / N$ be the probability of gray level i,
- N_p be the total number of gray levels present in the image.
• Coarseness: $\mathrm{Coarseness} = \left[\varepsilon + \sum_{i=1}^{N_g} p(i)\,s(i)\right]^{-1}$, where ε is a small number that prevents coarseness from becoming infinite.
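Similarly, the size-zone and neighborhood gray-tone difference matrices can be sketched for a 2-D quantized image. In this illustrative Python code, zones are 8-connected by default (an assumption; 4-connectivity is also supported), zone size variance is the variance of zone sizes weighted by the fraction of zones, and border pixels use only the neighbours that fall inside the image, which is a simplification of this sketch. The default value of `eps` is arbitrary.

```python
"""2-D GLSZM (zone size variance) and NGTDM (coarseness) on a quantized image (sketch)."""
from collections import Counter, deque


def glszm(img, connectivity=8):
    """Counter mapping (gray level i, zone size j) to the number of connected zones."""
    rows, cols = len(img), len(img[0])
    if connectivity == 8:
        steps = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1) if (a, b) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * cols for _ in range(rows)]
    zones = Counter()
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            level, size, queue = img[r][c], 0, deque([(r, c)])
            seen[r][c] = True
            while queue:                      # breadth-first flood fill of one zone
                y, x = queue.popleft()
                size += 1
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and img[ny][nx] == level):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            zones[(level, size)] += 1
    return zones


def zone_size_variance(zones):
    nz = sum(zones.values())                  # total number of zones
    mu = sum(j * n for (_, j), n in zones.items()) / nz
    return sum((j - mu) ** 2 * n for (_, j), n in zones.items()) / nz


def ngtdm_coarseness(img, d=1, eps=1e-6):
    """Coarseness = 1 / (eps + sum_i p(i) s(i)) over a (2d+1) x (2d+1) neighbourhood."""
    rows, cols = len(img), len(img[0])
    n, s = Counter(), Counter()
    for r in range(rows):
        for c in range(cols):
            neigh = [img[y][x]
                     for y in range(max(0, r - d), min(rows, r + d + 1))
                     for x in range(max(0, c - d), min(cols, c + d + 1))
                     if (y, x) != (r, c)]     # border pixels keep only in-image neighbours
            i = img[r][c]
            n[i] += 1
            s[i] += abs(i - sum(neigh) / len(neigh))   # |i - A_i| for this pixel
    N = rows * cols
    return 1.0 / (eps + sum(n[i] / N * s[i] for i in n))


if __name__ == "__main__":
    img = [[1, 1, 2], [2, 2, 2]]              # toy 2 x 3 image with two gray levels
    print(zone_size_variance(glszm(img)), ngtdm_coarseness(img))
```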