Radiomics and Artificial Intelligence Analysis with Textural Metrics Extracted by Contrast-Enhanced Mammography in the Breast Lesions Classification

The aim of the study was to estimate the diagnostic accuracy of textural features extracted by dual-energy contrast-enhanced mammography (CEM) images, by carrying out univariate and multivariate statistical analyses including artificial intelligence approaches. In total, 80 patients with known breast lesion were enrolled in this prospective study according to regulations issued by the local Institutional Review Board. All patients underwent dual-energy CEM examination in both craniocaudally (CC) and double acquisition of mediolateral oblique (MLO) projections (early and late). The reference standard was pathology from a surgical specimen for malignant lesions and pathology from a surgical specimen or fine needle aspiration cytology, core or Tru-Cut needle biopsy, and vacuum assisted breast biopsy for benign lesions. In total, 104 samples of 80 patients were analyzed. Furthermore, 48 textural parameters were extracted by manually segmenting regions of interest. Univariate and multivariate approaches were performed: non-parametric Wilcoxon–Mann–Whitney test; receiver operating characteristic (ROC), linear classifier (LDA), decision tree (DT), k-nearest neighbors (KNN), artificial neural network (NNET), and support vector machine (SVM) were utilized. A balancing approach and feature selection methods were used. The univariate analysis showed low accuracy and area under the curve (AUC) for all considered features. Instead, in the multivariate textural analysis, the best performance considering the CC view (accuracy (ACC) = 0.75; AUC = 0.82) was reached with a DT trained with leave-one-out cross-variation (LOOCV) and balanced data (with adaptive synthetic (ADASYN) function) and a subset of three robust textural features (MAD, VARIANCE, and LRLGE). The best performance (ACC = 0.77; AUC = 0.83) considering the early-MLO view was reached with a NNET trained with LOOCV and balanced data (with ADASYN function) and a subset of ten robust features (MEAN, MAD, RANGE, IQR, VARIANCE, CORRELATION, RLV, COARSNESS, BUSYNESS, and STRENGTH). The best performance (ACC = 0.73; AUC = 0.82) considering the late-MLO view was reached with a NNET trained with LOOCV and balanced data (with ADASYN function) and a subset of eleven robust features (MODE, MEDIAN, RANGE, RLN, LRLGE, RLV, LZLGE, GLV_GLSZM, ZSV, COARSNESS, and BUSYNESS). Multivariate analyses using pattern recognition approaches, considering 144 textural features extracted from all three mammographic projections (CC, early MLO, and late MLO), optimized by adaptive synthetic sampling and feature selection operations obtained the best results (ACC = 0.87; AUC = 0.90) and showed the best performance in the discrimination of benign and malignant lesions.


Introduction
Breast cancer is the most common female disease in the word, diagnosed primarily in over-fifties women [1][2][3].
Mammography (MX), introduced in the 1960s, plays a pivotal role in cancer screening, detection, and follow-up, despite the availability of various other breast imaging modalities, such as ultrasound (US) and breast Magnetic Resonance Imaging (MRI), due to its properties and qualities [4,5].
Contrast-enhanced mammography (CEM) is a diagnostic technique that combines the advantages of standard full-field digital mammography (FFDM) with the intravenous administration of an iodinated contrast medium, highlighting neovascularity associated with actively growing malignancy. The use of contrast agents in cancer detection is based on the phenomenon that neoplasms induce angiogenesis for further tumor growth. Contrast medium can pass through the walls of new rapidly formed vessels into the (tumor) interstitium, causing enhancement.
In addition, dual-energy CEM is a quick and well-tolerated examination that eliminates breast density as a limiting factor when interpreting two-dimensional (2D) mammograms by utilizing a dual-energy acquisition system, which then generates a subtracted image to outline areas of enhancement.
Previous studies have evaluated the sensitivity of CEM compared to conventional digital MX, US, and MRI [6][7][8]. CEM sensitivity has been reported in the range 90-100% [9], much higher in comparison to MX and US alone [8]. CEM allows to identify additional occult cancers at mammography, to assess more accurately the disease extent, and to guide surgical and treatment planning [8][9][10][11].
However, several studies have reported that CEM sensitivity increases compared to conventional MX at the expense of specificity that ranges in the interval of 58-70% [12]. This issue could be resolved using quantitative feature extracted by CEM images.
Recent significant advancements within the field of medical image analysis relies on the application of Artificial Intelligence (AI) methods for the processing of large quantities of data from different imaging modalities [13,14].
Radiomics is the process of extracting quantitative properties, named features, from images. This feature extraction activity is typically realized by means of pattern recognition algorithms and provides, as a result, a set of numbers, each one representing a quantitative description of a specific either geometrical or physical property of the image portion under consideration. In oncological applications, examples of features are tumor size, shape, intensity, and texture, collectively providing a comprehensive tumor characterization, called the radiomics signature of the tumor [15][16][17].
The texture features analysis characterize intensity histogram and the relationships between different intensity levels within lesion; moreover, it describe spatial and spectral frequency patterns [18][19][20].
This work has the objective to estimate the diagnostic accuracy of textural features extracted by dual-energy CEM images by carrying out univariate and multivariate statistical analyses, using artificial intelligence approaches in the classification of benign and malignant breast lesions.

Patient Selection
From October 2017 to April 2018, 80 consecutive patients with known unifocal and or multifocal/multicentric breast lesions (mean age ± standard deviation of 50.7 ± 11.6 years (range 26-78)) were enrolled in this prospective study according to regulations issued by the local Institutional Review Board. All women gave their written informed consent for research purposes. Inclusion criteria: patient with known breast lesions, histologically proven, and patient underwent dual-energy CEM examination in both craniocaudal (CC) and mediolateral oblique (MLO) projections.
Exclusion criteria were breast implants, pregnancy or possible pregnancy, a contraindication to the intravenous use of iodinated or gadolinium-chelated contrast agent (e.g., allergy to either agent or severely impaired renal function), inability to keep upright immobility during the examination, neoadjuvant chemotherapy treatment, hormone treatment (the most common forms of hormone therapy for breast cancer work by blocking estrogen hormones from attaching to receptors on cancer cells or by decreasing the body's production of hormones) or radiation therapy at the time of imaging.
Overall, 104 suspected breast lesions of 80 enrolled patients were analyzed. A total of 19 patients were excluded: 6 for contraindication to the intravenous use of iodinated or gadolinium-chelated contrast agent; 4 for neoadjuvant chemotherapy treatment; 4 for breast implants; 2 for inability to keep upright immobility during the examination; and 3 for hormone treatment or radiation therapy.

Imaging Protocol
All CEM examinations were performed with a dual-energy mammography system (Hologic's Selenia ® Dimensions ® Unit, Bedford, MA, USA). Approximately two minutes after an intravenous injection of 1.5 mL/kg body weight of iodinated contrast medium (Visipaque 320; GE Healthcare, Inc, Princeton, NJ, USA) at a rate of 2-3 mL/s, each woman was properly positioned in a CC view.
Two images were obtained per each breast through a low-energy exposure (26-31 kVp, according to breast thickness and density) and a high-energy exposure (45)(46)(47)(48)(49), straddling the k-edge of iodine. The low-energy image served as the equivalent of a twodimensional FFDM examination, which was previously demonstrated [33][34][35]. The highenergy image is not diagnostic and is used for post-processing purposes. The total X-ray dose delivered to the patient for a pair of low-and high-energy images was estimated to be between 0.7 and 3.6 mGy, depending on breast thickness (30 to 80 mm) and tissue composition (0 to 100% glandular tissue); it is approximately 1.2-times the dose delivered in a standard single-view digital mammography [33][34][35].
The two previously acquired images are then digitally subtracted of each other to produce a resultant recombined image that highlights contrast enhancement uptake area and gives functional information [10], as it can be seen in Figure 1. Four and eight minutes after contrast agent administration, each breast were compressed in the MLO view: early-MLO and late-MLO view images were respectively acquired ( Figure 1).

Histopathological Analysis
The reference standard was histopathologic examination. Overall, 104 samples of 80 patients were analyzed histopathologically: 65 surgical specimens for malignant lesions and 39 samples (surgical specimen or fine needle aspiration cytology (FNAC), core or Tru-Cut needle biopsy, and vacuum assisted breast biopsy (VABB)) for benign lesions. Histopathological examination was performed before CEM examination. This procedure did not change the shape of the lesion and therefore of the segmentation of the lesions.
Tumor stages were classified according to the system implemented by the American Joint Committee on Cancer staging. Ductal carcinoma in situ and invasive cancer tumors were counted as malignant lesions including, also, the papillary breast cancer that is a

Histopathological Analysis
The reference standard was histopathologic examination. Overall, 104 samples of 80 patients were analyzed histopathologically: 65 surgical specimens for malignant lesions and 39 samples (surgical specimen or fine needle aspiration cytology (FNAC), core or Tru-Cut needle biopsy, and vacuum assisted breast biopsy (VABB)) for benign lesions. Histopathological examination was performed before CEM examination. This procedure did not change the shape of the lesion and therefore of the segmentation of the lesions.
Tumor stages were classified according to the system implemented by the American Joint Committee on Cancer staging. Ductal carcinoma in situ and invasive cancer tumors were counted as malignant lesions including, also, the papillary breast cancer that is a very rare type of invasive ductal breast cancer that accounts for fewer than 1% of all breast cancers.

Image Processing
Two expert radiologists, with 22 and 15 years of experience respectively, performed segmentation by drawing manually the regions of interest (ROIs) first separately and then

Image Processing
Two expert radiologists, with 22 and 15 years of experience respectively, performed segmentation by drawing manually the regions of interest (ROIs) first separately and then together and in accordance with each other.
The margins of the breast lesions were defined on dual-energy subtracted (DES) images, where contrast uptake was emphasized, both in CC and in early and late MLO view. An example is shown in Figure 2. The margins of the breast lesions were defined on dual-energy subtracted (DES) images, where contrast uptake was emphasized, both in CC and in early and late MLO view. An example is shown in Figure 2. DES images of each woman were processed in MATLAB (The MathWorks, Inc., Natik, MA, USA [36]) and for each ROI radiomics textural features were extracted. They provide information in the spatial distribution of intensity levels in a neighborhood.
The Texture Toolbox of MATLAB ® realized by Vallières et al. [37], which includes 48 parameters, calculated according to the Image Biomarker Standardization Initiative [38], was considered. The package implements wavelet band-pass filtering, isotropic resampling, discretization length corrections and different quantization tools. The toolbox can be downloaded at https://it.mathworks.com/matlabcentral/fileexchange/51948-radiomics (accessed on 15 April 2019).
The textural features ( Table 2) include both first-order features and second-order ones; an extra detailed description for each one has been provided in the Appendix A.  DES images of each woman were processed in MATLAB (The MathWorks, Inc., Natik, MA, USA [36]) and for each ROI radiomics textural features were extracted. They provide information in the spatial distribution of intensity levels in a neighborhood.
The Texture Toolbox of MATLAB ® realized by Vallières et al. [37], which includes 48 parameters, calculated according to the Image Biomarker Standardization Initiative [38], was considered. The package implements wavelet band-pass filtering, isotropic resampling, discretization length corrections and different quantization tools. The toolbox can be downloaded at https://it.mathworks.com/matlabcentral/fileexchange/51948-radiomics (accessed on 15 April 2019).
The textural features (Table 2) include both first-order features and second-order ones; an extra detailed description for each one has been provided in the Appendix A.

Statistical Analysis
The statistical analysis included univariate and multivariate approaches and were performed with the RStudio software [39].

Univariate Analysis
The intra-class correlation coefficient (ICC) was calculated to assess the robustness of manual segmentation for the radiomics features obtained considering the two volumes of interest segmented separately by two expert radiologists.
For two-groups comparisons, we used the non-parametric Wilcoxon-Mann-Whitney test for continuous variables. Receiver operating characteristic (ROC) analysis was performed.
To individuate the optimal cut-off value for each feature the Youden index was calculated. Area under ROC curve (AUC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC) were obtained considering the optimal cut-off values identified maximizing the Youden index [40].
A p-value < 0.05 was considered as significant. However, false discovery rate (FDR) adjustment according to Benjamini and Hochberg for multiple testing was considered.

Multivariate Analysis
Multivariate analysis was carried out following the pattern recognition model and using linear classifier (linear discriminant analysis-LDA), decision tree (TREE), k-nearest neighbours (KNN), artificial neural network (NNET), and support vector machine (SVM) to assess the diagnostic accuracy using all extracted metrics of textural parameters. In recent years, several authors have deepened the subject in their work [18,[41][42][43][44]. Moreover, a brief description of each classifier has already been discussed in a previous article [14]. Configuration settings for each classifier are provided in Table 3. The classification analysis has been cross-validated using 10-fold cross validation (10-fold CV) and the Leave-one-out cross validation (LOOCV) approaches and median values of AUC, accuracy, sensitivity, specificity, PPV e NPV were obtained.
Each classifier received the same set or subset of robust features, identified by a feature selection with the least absolute shrinkage and selection operator (LASSO) method [45]. In the LASSO method, 10-fold cross-validation was used to select the optimal regularization parameter alpha, as the average of mean square error of each patient was the smallest. With the optimal alpha, features with a non-zero coefficient in LASSO were reserved. Note that the shrinkage requires the selection of a tuning parameter (lambda) that determines the amount of penalization. Feature selection was carried out considering both the λ value with the minimum Mean Squared Error (minMSE) and the largest λ value within one standard error of it (1SE) [46]. In addition, the presence of less-represented classes has led the use of two techniques to synthesize data and, therefore, to help balance the classes (malignant and benign) giving the possibility to better train the considered classifiers: the self-adaptive synthetic over-sampling (SASYNO) approach and the adaptive synthetic sampling (ADASYN) approach. This allowed to boost the performance both overall and in terms of confusion matrix [47][48][49][50][51].
A brief informal description of LASSO method parameters and of SASYNO and ADASYN functions has already been discussed previously [52].
The best model was chosen considering the highest area under the ROC curve and highest accuracy.

Results
The ICC for radiomics textural features was excellent (median value 0.92, range 0.87-0.98), for all extracted features by DCE-MRI. Table 4 reports the diagnostic accuracy of textural parameters for CC and early and late MLO view in terms of AUC and p-value. Considering these results, the univariate analysis showed low accuracy and area under the curve for all considered features in the differentiation of benign and malignant lesions.     Considering these results, the univariate analysis showed low accuracy and area under the curve for all considered features in the differentiation of benign and malignant lesions. Figures 6-8 show the boxplots, related to above-mentioned parameters, to separate benign from malignant lesions.         As regard multivariate analysis, only the most useful results for the purposes of this work will be reported. Tables 5-8 report the performance achieved by best classifiers to discriminate benign from malignant lesions.  As regard multivariate analysis, only the most useful results for the purposes of this work will be reported. Tables 5-8 report the performance achieved by best classifiers to discriminate benign from malignant lesions.    Figure 9 shows the ROC curve of the best classifiers.
Studies of CEM focused on imaging analysis with an auxiliary system for reporting, and on radiomics features for benignant and malignant differentiation [60,61].
The radiomics analysis of tumor features extracted from CEM images combined with qualitative and quantitative information on morphology and functionality represents an important tool for breast tumor characterization [60][61][62][63]. A summary data was provided in the Table 9 in order to report the main findings using radiomic and artificial intelligence analysis on CEM. Table 9. Summary table of main findings using radiomic and artificial intelligence analysis on CEM.
Studies of CEM focused on imaging analysis with an auxiliary system for reporting, and on radiomics features for benignant and malignant differentiation [60,61].
The radiomics analysis of tumor features extracted from CEM images combined with qualitative and quantitative information on morphology and functionality represents an important tool for breast tumor characterization [60][61][62][63]. A summary data was provided in the Table 9 in order to report the main findings using radiomic and artificial intelligence analysis on CEM. Table 9. Summary table of main findings using radiomic and artificial intelligence analysis on CEM.

Reference
Extracted Features Used Classifier Results [60] Statistical measures on original ROIs, gradiented images, Haar decompositions of the same original ROIs, and on gray-level co-occurrence matrices of each sub-ROI obtained by Haar transform.

SVM classifier
The features extracted from each sub-ROI decomposed by two levels of Haar transform were predictive only when they were all used without any selection, reaching the best mean accuracy of about 80%.
[61] Statistical Features Random Forest classifier The present method resulted highly performing in the prediction of benign/malignant ROIs with median values of sensitivity and specificity of 87.5% and 91.7%, respectively. The classification model outperformed the human reader, by increasing the specificity over 8%. [62] Radiomic features were derived from first-order HIS, COM, RLM, absolute gradient, autoregressive model, the discrete Haar WAV, and lesion geometry. Fisher, probability of error and average correlation (POE + ACC), and mutual information (MI) coefficients informed feature selection. Linear classifier, decision tree, k-nearest neighbours, artificial neural network and support vector machine were utilized.
Multivariate analyses considering 144 textural features extracted from all three mammographic projections (CC, early MLO and late MLO) optimized by adaptive synthetic sampling and feature selection operations obtained the best results (ACC = 0.87; AUC = 0.90) and showed the best performance in the discrimination of benign and malignant lesions.
Marino et al. [62] investigated the potential of CEM and radiomics analysis for the noninvasive differentiation of breast cancer invasiveness, hormone receptor status, and tumor grade. Radiomics features were derived from first-order histogram (HIS), co-occurrence matrix (COM), run length matrix (RLM), absolute gradient, autoregressive model, the discrete Haar wavelet transform (WAV), and lesion geometry. They reported that radiomics analysis with CEM has potential for noninvasive differentiation of tumors with different degrees of invasiveness, hormone receptor status, and tumor grade.
La Forgia et al. [63] evaluated radiomics features for predicting histological outcome and two cancer molecular subtypes, namely, Human Epidermal growth factor Receptor 2 (HER2)-positive and triple-negative suggesting an interesting role for radiomics in CEM to predict histological outcomes and particular tumors' molecular subtype.
In this study, we aimed to evaluate radiomics analysis with texture features extracted by dual-energy CEM images in the classification of malignant and benign breast. We performed the evaluation considering both a univariate analysis and a multivariate analysis using pattern recognition approaches.
The univariate analysis on features extracted from CC view shows statistically positive results for MAD (AUC = 0.72), IQR (AUC = 0.71) and VARIANCE (AUC = 0.72) among first order gray-level statistics, in addition to COARSNESS, BUSYNESS and STRENGTH as regards Neighborhood Gray-Tone Difference Matrix. Considering these results, the univariate analysis showed low accuracy and area under the curve for all considered features in the differentiation of benign and malignant lesions.
As regards multivariate analysis, using the unbalanced dataset showed no significant results; in particular, specificity assumed low values. After a balancing operation (with ADASYN function) and a features selection operation (with LASSO approach) higher values of accuracy, specificity and AUC were measured. A TREE trained with balanced data and a subset of three robust textural predictors (MAD, VARIANCE and LRLGE) achieved the best performance with an accuracy of 0.75 and an AUC of 0.82.
As regards the early MLO view univariate analysis, higher AUC values occurred in correspondence with VARIANCE, BUSYNESS and STRENGTH (AUC = 0.70 in all three cases). With the multivariate analysis, a balancing operation (both with the ADASYN function and the SASYNO function) and a features selection operation allowed to reach good improvements. The best performance (accuracy of 0.77 and AUC of 0.83) was obtained with a NNET trained with balanced data (with ADASYN function) and a subset of robust As for the multivariate analysis, the best performance (accuracy of 0.73 and AUC of 0.82) using late MLO features was reached with an NNET trained with balanced data (with ADASYN function) and a subset of robust textural features including MODE, MEDIAN, RANGE, RLN, LRLGE, RLV, LZLGE, GLV_GLSZM, ZSV, COARSNESS and BUSYNESS.
Presently, the determination of the breast malignancies is performed with biopsy. Unfortunately, this examination is invasive and then, there is great interest in alternative, non-invasive and cheaper methods to derive the same information directly from the radiologic images. Another important observation of automated systems for radiomics analysis is that these are not affected by factors such as breast density or background parenchymal enhancement (BPE), which may be limiting breast lesions differentiation. Consequently, these systems also preserve their reliability in more complex breast investigations. Therefore, also that textural analyses is again little used in clinical routine practice; in this context, the radiomics analysis of tumor textural features extracted from CEM images represent an important robust tool for breast tumor characterization.
The small cohort of studied patients represents an initial finding to validate by increasing the sample size of the study in the future. The segmentation of the ROIs slice by slice was manual, and this can be time-consuming, can be operator variable, can be difficult in the segmentation of multicentric lesions or in cases of background parenchymal enhancement. In this study is not reported a comparison of accuracy in the breast lesions detection of CEM respect to FFDM. The future endpoint is to include in the analysis an automatic segmentation of the lesions. The radiomics analysis did not consider tumor histological differences, while the integration of texture metrics combined with histopathology results may provide other important prognostic information for the classification of malignant breast lesions both in early and late phase. This could improve the performance in the classification problem and allow classifying breast lesions according grading and histotype. We did not evaluate the possible impact on texture values considering the different impact of the "void sign" after VAAB vs. post FNAC. This could be evaluated in a future study.
In conclusion, multivariate analyses using pattern recognition approaches optimized by adaptive synthetic sampling and feature selection operations obtained the best results to separate benign and malignant lesions. Overall, textural features extracted from CC view images, showed the best performance. However, the best results (ACC = 0.87; AUC = 0.90) were reached by considering all textural features extracted from dual-energy CEM images, using an SVM trained with balanced data. These findings represented a preliminary experience that should be validate with a larger sample size including patients by different centers.

Conflicts of Interest:
The authors declare no conflict of interest.

First-Order Gray-Level Statistics
First order gray-level statistics describe the distribution of gray-values within the volume. Let X denote the 3-D image matrix with N voxels, P the first order histogram, P(i) the fraction of voxels with intensity level i and Nl the number of discrete intensity levels.

•
Mean, the mean gray-level of X.
• Mode, the most frequent element(s) of array X. • Median, the sample median of X, or the 50th percentile of X. • Standard deviation (STD) Mean Absolute Deviation (MAD), the mean of the absolute deviation of all voxel intensities around the mean intensity value.
where max(X) is the maximum intensity value of X and min(X) is the minimum intensity value of X. • Interquartile range (IQR), the interquartile range is defined as the 75th minus the 25th percentile of X.
where X is the mean of X. • Variance, Variance is the square of the standard deviation.
where X is the mean of X.

Gray-Level Co-Occurrence Matrix (GLCM)
A normalized GLCM is defined as P(i, j; δ, α), a metric with size N g × N g describing the second-order joint probability function of an image, where the (i, j)th element represents the number of times the combination of intensity levels i and j occur in two pixels in the image, that are separated by a distance of δ pixels in direction α, and N g is the maximum discrete intensity level in the image. Let: -P(i, j) be the normalized (i.e., ∑ P(i, j) = 1) co-occurrence matrix, generalized for any δ and α, - σ y be the standard deviation of p y , where σ y = • Energy |i − j| 2 P(i, j) • Entropy

Gray-Level Run-Length Matrix (GLRLM)
Run length metrics quantify gray level runs in an image. A gray level run is defined as the length in number of pixels, of consecutive pixels that have the same gray level value. In a gray level run length matrix p(i, j|θ) , the (i, j)th element describes the number of times j a gray level i appears consecutively in the direction specified by θ. Let: -p(i, j) be the (i, j)th entry in the given run-length matrix p, generalized for any direction θ, -N g the number of discrete intensity values in the image, -N r the maximum run length, -N s the total numbers of runs, where N s = p r the sum distribution of the number of runs with run length j, Run-Length Variance (RLV)

Gray-Level Size Zone Matrix (GLSZM)
A gray level size-zone matrix describes the amount of homogeneous connected areas within the volume, of a certain size and intensity. The (i, j)th entry of the GLSZM p(i, j) is the number of connected areas of gray-level (i.e., intensity value) i and size j. GLSZM features therefore describe homogeneous areas within the tumor volume, describing tumor heterogeneity at a regional scale [5].

Neighborhood Gray Tone Difference Matrix (NGTDM)
The ith entry of the NGTDM s(i|d) is the sum of gray level differences of voxels with intensity i and the average intensity A i of their neighboring voxels within a distance d. Let: -n i be the number of voxels with gray level i, -N = ∑ n i , the total number of voxels, -s(i) = ∑ n i |i − A i | f or n i > 0 0 otherwise , generalized for any distance d, -N g be the maximum discrete intensity level in the image, -p(i) = n i N , the probability of gray level i, -N p , the total number of gray levels present in the image.
• Coarseness where ε is a small number to prevent coarseness becoming infinite.