Ensemble Discrete Wavelet Transform and Gray-Level Co-Occurrence Matrix for Microcalciﬁcation Cluster Classiﬁcation in Digital Mammography

: The presence of clusters of microcalcifications is a primary sign of breast cancer. Their identification is still difficult today for radiologists, and the wrong evaluations involve unnecessary biopsies. In this paper, an automatic tool for characterizing and discriminating clusters of microcalcifications into benign/malignant in digital mammograms is proposed. A set of 104 digital mammograms including microcalcification clusters was randomly extracted from a public available database and manually labeled by our radiologists, obtaining 96 abnormal ROIs. For each so-identified ROI, a multi-scale image decomposition based on the Haar wavelet transform was performed. On the decomposition, a textural features extraction step was carried out both on each sub-image and on the corresponding gray-level co-occurrence matrix. Then, a random forest classiﬁer was employed for classifying microcalciﬁcation clusters into benign and malignant. The study found that the most discriminant features extracted from the ROIs decomposition by Haar transform were variance and relative smoothness, whereas as regards the textural features calculated on the GLCMs corresponding to the Haar-decomposed ROI, it emerged that the relationship between the pixels of the sub-image in the diagonal direction had high discriminating power for the classiﬁcation of microcalciﬁcation clusters into benign and malignant. The proposed method was evaluated in cross-validation and performed highly in the prediction of the benign/malignant ROIs, with a mean AUC value of 97.39 ± 0.01%.


Introduction
Breast cancer is currently the second most common cause of cancer death among women worldwide, and early detection of breast lesions is the key to increase the chances of survival and reduce the death rate [1,2]. Although digital mammography is a clinically accepted imaging method for detecting breast cancer in its early stage, there are some difficulties when radiologists analyze mammograms to search for signs of cancer. In particular, Microcalcifications (MCs) are often a primary sign of breast cancer: they are quite tiny calcium deposits and can be distributed in various ways throughout the breast tissue. When MCs are extremely minute (sometimes, they do not exceed 0.1 mm) and occur in clusters, they often indicate the presence of tumor lesions. The identification and characterization of clustered microcalcifications is sometimes a difficult and imprecise process. As a result, missing lesion detection during the routine check or performing many unnecessary breast biopsies on benign microcalcification clusters may occur. Thus, in the screening phase, independent double reading of mammograms by two radiologists is one of the solutions to reduce the missed tumor rate [3]. However, this solution involves a high workload and cost and is time consuming. For these reasons, an automatic tool supporting the radiologist during the mammographic interpretation of such lesions might produce more benefits for the health system with more accurate diagnosis and for the clinics with a lesser radiologist time requirement [4].
Computer technologies and mathematical models can allow increasing the detection rate, and consequently, they can effectively reduce the mortality rate caused by breast cancer [5,6].
The decision support system known as Computer-Aided Detection/Diagnosis (CAD) may offer radiologists reliable support to detect abnormalities in mammographic images [7][8][9]. Indeed, the detection of breast cancer with digital mammography improves when the radiologists can use an artificial intelligence system as the supporting diagnostic tool, without requiring additional reading time [10][11][12].
Although various models of CAD systems for breast diseases have been developed in the past two decades, reaching high performances for specific abnormalities, the automatic and accurate classification of microcalcification clusters in benign/malignant lesion is still difficult due to their nature. In fact, while the properties of normal tissue are quite different from abnormal ones, often benign and malignant ROIs may show overlapping characteristics. Some works showed this difficulty in their analysis: they achieved excellent results in terms of the classification of normal tissue and abnormal ROIs, while they were not able to provide an equally good accuracy for the benign and malignant discrimination problem [13,14]. Furthermore, our previous work on the diagnosis of MC clusters underlined this difference [15]: while the classification of normal and abnormal clusters reached an accuracy greater than 98%, the distinction between benign and malignant MCs required further study, focusing on the extraction of features useful for this purpose.
During the last two decades, texture analysis by using a set of local statistical properties of pixel intensity proved to be a useful technique to detect and classify lesions in digital mammograms. The benefit of such an approach results in being more effective when the image is analyzed at different scales: this technique may be performed by decomposing each image into different frequency sub-bands by means of a wavelet transform and calculating first and second order texture features. This is confirmed by the several multi-scale texture analysis approaches recently proposed. The work presented in [16] analyzed a set of features extracted by using wavelet functions based on dual tree complex transforms in order to capture the correlation between different wavelet coefficients and their statistical dependencies.
Furthermore, the texture analysis can also take place by considering the spatial relationship between pixels with different gray levels using the co-occurrence matrices [3,17,18] on original images. It is a solid method based on the regular changes of gray levels between two neighboring pixels and how these changes are statistically and spatially correlated.
Most of the state-of-the-art works use these features together with others of different kinds, such as shape features [3,[17][18][19], which are often operator dependent because they require the intervention of the radiologist to segment the lesion manually or define appropriate descriptors.
In previous works, we developed detection and classification models of the microcalcification clusters on Full-Field Digital Mammograms (FFDMs) [15,20] as support tools for experts radiologists; indeed, they provided the automated ROIs' segmentation and characterization into normal/abnormal with high performing results. In this work, we propose a texture analysis approach based on a multi-scale decomposition of the ROIs with the aim of characterizing them in benign/malignant tissue. In particular, to make the lesion classification more objective and operator independent, once the radiologist has identified the Regions Of Interest (ROIs), the characteristic features are extracted in an automated manner.
The process is a multi-phase approach composed of a feature extraction step, performed by texture analysis methods, a features selection step, and a breast region classification in order to categorize clusters of MCs in digital mammograms. Specifically, as a first step, for each ROI, a multi-scale decomposition based on the Haar wavelet transform [21,22] is obtained. On such a decomposition, a set of well known statistical features and a number of attributes able to describe the image texture using the Gray-Level Co-occurrence Matrix (GLCM) [23] are extracted. Successively, a feature selection technique is exploited, carried out with embedded methods allowing an optimization between the interaction of the selected features and the classification algorithm used. Finally, a random forest classifier is trained on a subset of features selected by means of an embedded method with the aim of labeling the clustered microcalcifications in benign or malignant ones. The model performance is tested in cross-validation and evaluated in terms of accuracy and sensitivity; then it is compared with respect to the state-of-the-art.
Differently from the approaches shown in literature and developed for film scanned/digitized mammograms, the proposed approach was developed and evaluated on FFDMs, since digital images have a higher quality than the film scanned ones. In this way, a breast CAD could be implemented on images obtained from digital devices of various cancer institutes preserving a functional compatibility for future research activities. In particular, digital mammograms randomly selected from the public database Breast Cancer Digital Repository (http://bcdr.ceta-ciemat.es) (BCDR) were exploited.

Experimental Data
The data used in our analysis consisted of a set of full field digital mammograms extracted from the open access BCDR [24].
Specifically, 104 digital mammograms containing microcalcifications were randomly downloaded from BCDR-DM, the Full Field Digital Mammography based Repository of BCDR. Although the BCDR database provides all the images along with information about the diseases affected by each patient, only the main lesions, often different from the microcalcifications, were highlighted. For this reason, a double blind reading by two radiologists of our institute dedicated to senologic diagnostics was manually performed in order to identify and then classify ROIs containing microcalcification clusters only. Then, as a result of the comparison between the readings of the two radiologists, the ROIs for which both radiologists agreed were considered. In so doing, 56 benign and 40 malignant ROIs with different sizes depending on the extension of each cluster were obtained (see Figure 1 for some examples of ROIs containing microcalcification clusters).

Texture Analysis
Texture is a feature that can be used to segment an image into regions of interest and classify them [25]. Theoretically, it can be seen as composed of sub-patterns with particular descriptors, and assuming that such descriptors are uniform within the regions with the same texture, it can be helpful to identify a number of features able to discriminate different textures. In the literature, textural analysis is generally based on local statistical measures computed on the pixels' gray levels of the image. In accordance with this interpretation, in this work, we extracted and analyzed a set of statistical features (1) on a wavelet decomposition of the original ROI and (2) on the relatively calculated GLCMs, on which a classifier was trained with the aim of performing a texture analysis process able to differentiate benign microcalcifications lesions form malignant ones. Specifically, we exploited a two level Haar wavelet decomposition of the ROI containing the lesion; successively, on such resulting images and on the relative calculated GLCM, a set of statistical features was extrapolated as in the following and is depicted in the schema reported in Figure 2. Finally, a classification model was embedded to learn a model able to discriminate benign from malignant tissue and to provide useful information on features' importance in such a process. Figure 2. Scheme of feature extraction. For each ROI, a set of some textural features was computed on a multi-scale decomposition by using the Haar wavelet transform, called the HAAR SFset. Moreover, GLCMs was calculated on each sub-image decomposed by Haar transform in order to derive a second set of textural features, called the GLCM SF set. This scheme is shown starting from a low energy image, but it is also performed for recombined images.

Multi-Scale Wavelet Decomposition
The first step of the proposed work was mainly based on a multi-scale wavelet analysis, an appropriate technique for detection and classification of microcalcifications. Indeed, in digital mammograms, the microcalcifications appear brighter than the neighboring tissue, and represent one of the main high frequency components of the image spectrum. Consequently, in order to catch this signal, the wavelet transform can be used to decompose the original image into different frequency sub-bands. In particular, the Haar wavelet decomposition rescales the high frequency sub-band and reconstructs the image using such a scaled high frequency sub-band [21,22].
In the 2D Haar wavelet decomposition of the image, the original image is first high-pass filtered, yielding the three detailed coefficients' sub-images (Figure 3a-a1; top right: horizontal, bottom left: vertical, and bottom right: diagonal), then low-pass filtered and down-scaled, yielding an approximation coefficient sub-image (Figure 3a-a1, top left). To compute the successive level of decomposition, the process was iterated on the approximation coefficient sub-image (Figure 3b-b1, top left). All of these coefficients describe the local geometry of the image in terms of scale and orientation [22,26], allowing suppressing the sub-band of the Haar decomposed image that carries the lowest frequencies and contains smooth (background) information, before the reconstruction of the image [26]. In particular, for each image, the 2D Haar transform at two levels of decomposition was performed, obtaining eight sub-images (LL1, HL1, LH1, HH1, LL2, HL2, LH2, HH2), as shown in Figure 3b-b1. As for the textural features derived from such a decomposition, for each of the sub-images were obtained the computed mean, variance, skewness, kurtosis, entropy, and relative smoothness. The second order moment of the mean, also known as variance, describes the visual roughness of the image, while the third order and fourth order moment, i.e., skewness and kurtosis, respectively, reflect the asymmetry and the uniformity of the histogram of the image. If K denotes the gray scale, µ the gray average, and h(k), k = 0, 1, 2, ..., K − 1, the histogram of each sub-image, they are defined as follows: This way results in a set of 48 statistical features (in the following HAAR SFset) for each ROI.

Gray-Level Co-Occurrence Matrix
A further step towards the texture analysis was carried out by considering the spatial relationship of pixels in the ROIs by using the GLCM method [23], a widely used technique to grasp the textural patterns from the gray-level spatial dependences present in the image [25,27].
After the GLCMs' generation, several statistical parameters, providing useful information about the texture of an image, could be derived from them by using different formulas. Specifically, in our study, for each image obtained from the first level of the Haar decomposition (HL1, LH1, and HH1), we considered an offset D = 2 along the θ 0 • direction, obtaining two GLCMs, taking into account the relationship between the pixels from right to left (in the following GLCM dir1) and then from left to right (in the following GLCM dir2). Supposing that each GLCM is defined as a matrix G ij ; statistical measures are computed as follows: Di f f erenceEntropy = − Thus as result, a set of 22 (11 for each of the two directions D) statistical features were obtained for HL1, LH1, and HH1 Haar decomposition, for a totally of 66 features (in the following GLCM SF set) for each ROI.

Classification Model
Once the set of statistical features was extracted from different frequency sub-bands of the Haar wavelet transform calculated on each original image and its relative GLCMs, as described above, a classification model was trained to discriminate microcalcification clusters into benign and malignant on a subset of significant features. We used a well known machine learning method, like random forest (RF) [28].
Moreover, in order to evaluate the minimum number of significant features needed to solve the classification problem, we developed an embedded feature selection [29,30]. Embedded methods allow optimization between the interaction of the selected features and the machine learning algorithm used, because the selection criterion is grafted into it. In particular, the RF classifier processes an analysis of feature importance with respect to its expected result; therefore, these methods are essentially fulfilling the goal, i.e., optimizing the classifier performance. At each step of cross-validation, a feature ranking was calculated with respect to the predictive power assessed with the embedded method on the training set. Then, a binary classification model was trained by selecting iteratively an increasing number of features sorted by their discriminating power. In particular, the tree based strategy used by RF naturally ranks by how well they improve the purity of the node: nodes with the greatest decrease in impurity are at the start of the trees, while nodes with the least decrease in impurity occur at the end of trees measured by Gini's diversity index. Thus, the RF algorithm allows estimating predictor importance values by permuting out-of-bag observations among the trees. In this work, we adopted a standard configuration for RF with 100 trees and 20 features (as described in [28]) randomly selected at each split. More complicated architectures did not give any significant classification improvement.
The performance of the prediction model was evaluated on 100 ten-fold cross-validation rounds in terms of Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve and also accuracy, sensitivity, and specificity calculated, identifying the optimal threshold by means of Youden's index on ROC curves [31]. Performing 100 rounds of cross-validation allowed having an estimate of the error on the average value of the performance.
All the analysis algorithms used in this work (images' pre-processing, feature extraction and, learning models) were implemented in MATLAB2017a (MathWorks, Inc., Natick, MA, USA) software.

Performance Evaluation
The CAD system for classifying ROIs we proposed in [15] was trained on statistical features, derived from the Haar wavelet based multi-scale decomposition of the image and on the number of interest points/corners of the ROIs obtained by using two well known algorithms, SURF and Minimum Eigenvalue Algorithm (MinEigenAlg). The model was evaluated on 96 ROIs (56 benign and 40 malignant) reaching an average AUC value of 94.19% and an accuracy of 88.19% for the benign vs. malignant classification problem. Although the experimental outcomes showed high performances of the model in benign/malignant lesions' discrimination, it was below the results reported in the literature. However, the interesting result shown by the study was that by adding the number of interest points and corners detected by using Speeded Up Robust Feature (SURF) and Minimum Eigenvalue Algorithm (MinEigenAlg) improved the forecast performance of abnormal ROIs with respect to the use of statistical features calculated on the Haar wavelet transform only.
Starting from these encouraging results, here we examined the contribution of other features other than the statistical features calculated on the Haar wavelet transform; moreover, a different approach for the feature selection task was evaluated. In particular, as reported in Section 2.3, at each step of cross-validation, a features ranking was calculated by an embedded method on the learning set, and then, a binary RF classifier was trained by iteratively selecting an increasing number of features.
The experimental results for the discrimination of benign and malignant microcalcification clusters are summarized in Figure 4. The mean accuracy of the classifier model (%) was calculated on 100 rounds of 10-fold cross-validation. The experimental outcomes showed that the model reached a higher performance value exploiting only 16 out of the 114 (48 HAAR SF + 66 GLCM SF) features with a mean AUC value of 97.39 ± 0.01% ( Figure 5), an accuracy of 92.58 ± 0.02%, a sensitivity of 93.15 ± 0.04%, and a specificity of 89.18 ± 0.03%. For this classification result, the F1-score was equal to 90.05 ± 0.05%; therefore, the classifier showed also good precision. During the study and development of the present work, we used other classifiers, such as SVM, Bayes, and k-nearest neighbors, different from the random forest used for the feature selection step. Nevertheless, no significant improvement in the overall performance for the same task by other evaluated classifiers was observed; this was probably because embedded methods are techniques that perform the feature selection process as part of the model construction process. Then, we preferred to report only this best result for the ease of reading the paper.  We identified the features whose occurrence in the first 16 positions of the rankings defined by the feature selection process was significantly different from the case (p-value null model test ≤ 0.05) ( Table 1). Among these, there were only eight features calculated on the Haar decomposition images, i.e., mean, variance, relative smoothness, and kurtosis. In particular, variance and relative smoothness calculated on the original high-pass filtered image of the first (LL1) and second (LL2) level of the Haar decomposition were always selected; therefore, these seemed more informative for the recognition of the type of microcalcification cluster. The other features had an occurrence of selection below 15%, which seemed to be attributable exclusively to the case and not to a real discriminating power. Moreover, taking into account the features calculated on the GLCMs of the Haar decomposed ROIs, an interesting result emerged. Since GLCM is a technique to obtain information on the relationship between the pixels of an image in each direction, the features with the most discriminating power seemed to be the statistic measures calculated on the GLCM corresponding to the sub-images in the diagonal direction.

Discussion
In this paper, we proposed a CAD system for characterizing and discriminating suspicious areas into benign/malignant as a development of the previous work for the classification of microcalcification clusters in full field digital mammograms [15]. The analysis was performed on 96 ROIs collected from 104 digital mammogram images of the BCDR database. For each ROI, firstly, a set of some textural features was computed on a multi-scale decomposition based on the Haar wavelet transform, obtaining a first set of statistical features (HAAR SF set). Secondly, the GLCM on the sub-images decomposed by Haar transform was calculated to derive a different set of textural features, obtaining a second set of statistical features (GLCM SF set). Therefore, all the features obtained were used to train an RF classifier with an embedded method for the feature selection able to discriminate ROIs. The classification performance was evaluated in terms of accuracy, sensitivity, specificity, and AUC values.
The model developed on the HAAR and GLCM statistical feature set, previously described, was high performing in the prediction of benign and malignant clustered microcalcifications with the RF classifier. It was observed that using both textural feature sets led to a median AUC value of 97.39%, an accuracy of 92.58%, a sensitivity of 93.15%, and a specificity of 89.18%. Table 2 summarizes the results of the proposed method and other state-of-the-art models for the classification into benign and malignant ROIs. For this comparison, we considered works on classification of MC clusters that mainly used textural features [16] or a combination of these feature with other typologies [3,[17][18][19]32]. We considered both works that used the same public database and others that collected digital mammography images; since all the databases collected heterogeneous images, often not coming from the same research centers, according to us, this comparison was justified.
Moreover, in [33,34], the authors developed a Convolutional Neural Network (CNN) decision support system, which was an artificial feed-forward neural network with tens or hundreds of layers. Each of these layers learned to detect independently the different features of an image: the algorithm divided each image into topologically compact portions, each of which would be processed by the filters in order to search for specific patterns, and the output of each image was used as the input for the next layer. However, this method tended to require more images for training, as the new network needed numerous examples of the object to understand the variation of the features. Training times were often longer, and the number of combinations of existing network layers could make configuring the network from scratch a very complex operation.
In our work, we used only textural features. The extraction of tissue texture features performed in an automated manner from ROIs did not require any intervention by radiologists and provided characteristic information about microcalcification clusters that may be helpful to human readers to improve the discrimination of malignant and benign lesions.
Since these works used different digital databases and classifiers, the comparison was purely qualitative. However, the performance of our model was high performing in terms of accuracy and AUC both with respect to works using the same database and to those using the DDSM database. As highlighted in the table, only [18] showed a higher accuracy, but in this work, additional features were used, such as shape features, which required a segmentation process that did not seem to justify the computational effort in terms of performance.

Conclusions
The diagnosis of microcalcifications is usually based on radiologists' expertise, sometimes resulting in an inaccurate lesion detection with unnecessary biopsies and subsequent surgery. Several methods have been developed for the task of microcalcification diagnosis, in some cases performing well. Indeed, in the literature, this classification problem was not yet very accurate compared to the normal and abnormal tissues' discrimination, despite a high rate of false positives implying unnecessary biopsies. Nevertheless, the automatic and accurate classification of microcalcification clusters, especially in differentiating the benign from the malignant ones, remains still complicated due to their nature. In this paper, we proposed the use of texture analysis methods combined with machine learning techniques in order to select the optimal subset for characterizing breast regions. In particular, we trained a binary RF classifier on an increasing number of features sorted by their diagnostic power calculated using the embedded feature importance method.
Specifically, we developed a completely automated texture analysis approach to characterize benign and malignant tissues on digital mammograms. Using only textural features, the experimental results evaluated by cross-validation techniques were quite promising with respect to the state-of-the-art works.
Since the sample of images with microcalcification clusters of BCDR was poor, we considered it appropriate to evaluate the model performance through the cross-validation technique. Nevertheless, our future work will include a validation study on a larger population, also considering other public database, and on an independent test sample. Moreover, we will evaluate the contribution of other types of features to our model with respect to those used in this work, in order to improve the diagnostic accuracy.  Acknowledgments: This work was supported by funding from the Italian Ministry of Health "Ricerca Corrente 2018-2020".

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.