Novel Texture Feature Descriptors Based on Multi-Fractal Analysis and LBP for Classifying Breast Density in Mammograms

This paper investigates the usefulness of multi-fractal analysis and local binary patterns (LBP) as texture descriptors for classifying mammogram images into different breast density categories. Multi-fractal analysis is also used in the pre-processing step to segment the region of interest (ROI). We use four multi-fractal measures and the LBP method to extract texture features and compare their classification performance in experiments. In addition, a feature descriptor combining multi-fractal features and multi-resolution LBP (MLBP) features is proposed and evaluated in this study to improve classification accuracy. An autoencoder network and principal component analysis (PCA) are used for reducing feature redundancy in the classification model. A full field digital mammogram (FFDM) dataset, INbreast, which contains 409 mammogram images, is used in our experiment. BI-RADS density labels given by radiologists are used as the ground truth to evaluate the classification results using the proposed methods. Experimental results show that the proposed feature descriptor based on multi-fractal features and LBP results in higher classification accuracy than using individual texture feature sets.


Introduction
Breast density is a critical bio-marker which indicates the possibility of a woman developing breast cancer in the future. High breast density is caused by a high percentage of fibro-glandular tissue and reduces the effectiveness of mammography screening [1]. Related research shows that women with extremely dense breasts could face a four- to six-fold higher risk of developing breast cancer than women with low breast density [2]. Dense tissue areas in mammograms also cause a 'masking' effect, leading to reduced sensitivity when radiologists visually assess breast lesions or early signs of cancer, such as lumps and calcification clusters [3]. Breast density is gaining significant attention because it is closely associated with higher cancer risk, increased incidence of interval cancer and reduced mammographic sensitivity.
Breast density measurements help identify and target women who could benefit from tailored screening options such as an increased (or decreased) screening interval or supplemental screening via alternative modalities [4]. Breast density categorisation schemes currently in use are the six-class categories (SCC) [2], Wolfe's four categories [5], and the Breast Imaging-Reporting and Data System (BI-RADS) [6]. The BI-RADS criterion proposed by the American College of Radiology (ACR) has been widely used in clinical applications and includes four density categories: fatty, scattered density, heterogeneously dense, and extremely dense. A simplified two-category model ("fatty and sparsely dense" vs. "heterogeneously and extremely dense") is also used in some applications.
Texture analysis methods, particularly those using descriptors such as LBP and its variants, have been shown to be useful in extracting relevant image structure information for classification tasks. Ojala et al. [24] proposed a method using local binary patterns (LBP) to describe image texture patterns. Due to its simplicity and efficiency, LBP has been studied widely and several variants have been proposed for extracting texture features and classifying medical images. In [25], LBP was extended to elliptical LBP (ELBP) and mean-elliptical LBP (M-ELBP) by considering various neighbourhood topologies and different local region scales. It has been reported that M-ELBP gave a satisfactory classification result (77.38 ± 1.06) on the MIAS dataset. Tan et al. [26] proposed local ternary patterns (LTP), which use a three-value encoding approach compared to the two-value encoding of LBP. Rampun et al. [27] extracted LTP-based texture features to classify MIAS mammograms into four BI-RADS categories. Nanni et al. [28] proposed local quinary patterns (LQP) by extending LBP from a binary encoding system to a five-value encoding system, and used three different medical image classification tasks to test this new texture descriptor.
Subsequently, LQP was investigated and extended with multi-resolution and multi-orientation schemes in [29] to classify mammographic density. Their experimental results showed that the use of LQP-based texture features improved the classification accuracy. In a recent study, Rampun et al. [30] proposed the local septenary patterns (LSP) method, which uses a seven-value encoding approach, to further improve the classification performance. Their experimental results demonstrated that classification accuracy was slightly improved by LSP compared to LQP (80.5 ± 9.2 vs. 80.1 ± 10.5 on the INbreast dataset).
Inspired by the success of texture analysis methods based on LBP and its variants, this paper investigates the usefulness of a richer texture descriptor combining LBP with texture features obtained using multi-fractal analysis. Multi-fractal methods, due to their capability of enhancing image texture characteristics, have been used to process and interpret medical images [31][32][33]. Their applications in other image processing tasks also demonstrate that multi-fractal analysis can be an effective and promising tool for extracting texture features. To the best of our knowledge, multi-fractal analysis and its feature vectors have not been previously used to estimate breast density in mammograms. In this paper, we assume that a joint texture feature space consisting of LBP and multi-fractal features could capture more effective and useful texture features related to breast density characteristics, thus improving the classification accuracy over methods using only one individual feature set. An FFDM dataset, INbreast, containing 409 mammogram images with BI-RADS density labels annotated by radiologists as the ground truth, is used in experiments to test the proposed methods. More details of our experimental approach are given in the following sections. This paper is organized as follows. The next section describes the digital mammogram dataset used in experiments and the processing pipeline. Section 3 introduces two feature extraction methods: multi-fractal analysis and LBP. Section 4 discusses the PCA and autoencoder models for feature selection, and Section 5 presents our experimental results and comparative analysis. Conclusions and some future work directions are given in Section 6.

Dataset
INbreast [34] is an FFDM database which contains 115 cases and 409 images, including bilateral mediolateral oblique (MLO) and craniocaudal (CC) views. This database was acquired at the Breast Centre in CHSJ, Porto, under permission of both the Hospital's Ethics Committee and the National Committee of Data Protection. The images were acquired between April 2008 and July 2010; the acquisition equipment was the MammoNovation Siemens FFDM, with a solid-state detector of amorphous selenium, a pixel size of 70 μm, and 14-bit contrast resolution. The image matrix was 3328 × 4084 or 2560 × 3328 pixels, depending on the compression plate used in the acquisition (based on the breast size of the patient). This dataset offers carefully associated ground truth (GT) annotations by radiologists. For breast density, BI-RADS labels are available, including 136, 146, 99 and 28 images respectively in BI-RADS I to IV classes. Figure 1 illustrates four mammograms in different density categories.

Processing Stages
Various stages of the processing pipeline are shown in Figure 2. After reading an input mammogram image, we first use a pre-processing step to segment a region of interest (ROI) which is the breast region in this work. The ROI segmentation results in a mask image which can be used in the feature extraction step, where we use texture features extracted from only the breast region areas (excluding the background area and pectoral muscle region). Multi-fractal analysis and local binary patterns are used to extract texture features from breast regions, constituting the texture descriptors. PCA and autoencoder network are used in the feature selection step to reduce the feature space. SVM is chosen as the classifier to predict the breast density label for each test mammogram image. These steps are detailed in the following sections.

Mammogram Pre-Processing
The main task performed in the pre-processing stage is the breast area segmentation. In MLO view mammograms, the pectoral muscle region is captured along with the breast region. However, the pectoral muscle represents a predominantly dense region which may easily affect breast density evaluation. Therefore, ROI segmentation is first applied to remove non-breast areas such as the background region and pectoral muscle. A pipeline for the pre-processing stage is shown in Figure 3. A multi-fractal method (introduced in Sections 3.1 and 3.2) is used to enhance the contrast between the image background and the breast tissue region. An intensity thresholding method and morphological operations [35] are used to separate the breast region from the background. A K-means algorithm and polynomial fitting approach [36] are employed to distinguish the pectoral muscle region from the breast region in MLO view mammograms. A median filter of size 3 × 3 is used to reduce noise. Finally, mask images are obtained, which are used to extract image features from only the region of interest (breast area) in the following steps. Figure 4 shows examples of segmenting the breast region from INbreast mammograms. A more detailed introduction of the methods used in this pre-processing stage can be found in the Supplementary Material (Figures S1 and S2).
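The background-separation part of this pipeline can be sketched with standard tools. The following is a minimal illustration, not the paper's implementation: the function name `breast_mask` and the threshold value are our own, and the K-means/polynomial-fitting pectoral muscle removal step is omitted.

```python
import numpy as np
from scipy import ndimage

def breast_mask(image, threshold=0.05):
    """Rough breast-region mask: intensity threshold, morphological
    cleaning, then keep the largest connected component."""
    img = image.astype(float)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)  # normalize to [0, 1]
    mask = img > threshold                                     # breast vs. background
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))  # remove specks
    labels, n = ndimage.label(mask)
    if n > 1:
        # keep only the largest component, assumed to be the breast
        sizes = ndimage.sum(mask, labels, range(1, n + 1))
        mask = labels == (np.argmax(sizes) + 1)
    return ndimage.binary_fill_holes(mask)
```

The resulting boolean mask is what later stages use to restrict feature extraction to the breast area.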

Multi-Fractal Analysis
Multi-fractal analysis of image intensity variations, computed using a set of local measures, can be used to describe image texture features for different classification tasks [31][32][33]. Let μ_w(p) denote a multi-fractal measure function, where p is the central pixel in a square window of size w × w. Then, a local singularity coefficient, known as the Hölder exponent or α-value [37], can be calculated to reveal the variation of the selected μ_w(p) function within the neighbourhoods of the pixel p:

μ_w(p) = C · w^(α_p), w = 1, 3, ..., 2d − 1,    (1)

where C is an arbitrary constant and d is the total number of windows used in the computation of α_p. The value of α_p can be estimated from the slope of a linear regression line in a log-log plot where log(μ_w(p)) is plotted against log(w):

log(μ_w(p)) = α_p · log(w) + log(C).    (2)

Commonly used multi-fractal measures for calculating α are outlined below:

Maximum: μ_w(p) = max_{(k,l)∈Ω} g(k, l)
Inverse-minimum: μ_w(p) = 1 − min_{(k,l)∈Ω} g(k, l)
Summation: μ_w(p) = ∑_{(k,l)∈Ω} g(k, l)
Iso: μ_w(p) = #{(k, l) ∈ Ω : g(k, l) ≈ g(p)}

where g(k, l) represents the intensity value of a pixel at position (k, l); Ω denotes the set of all neighbourhood pixels of p in the window; and # is the number of pixels in a set. Pixel intensity values are normalized into the range of [0, 1] when considering the maximum and inverse-minimum measures. Such normalization enhances subtle image features in the domain of Hölder exponents due to the non-linear amplifying effect of the logarithmic function. A patch (a small image region) of one mammogram is shown in Figure 5 and one pixel p (marked in red) is chosen for illustrating the calculation of the α value. Figure 6 shows the measured values of μ_w(p) using the maximum measure when the square window sizes w are 1, 3, and 5 respectively. The α value is then estimated using the slope of the linear regression line in a log-log plot (Equation (2)).
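The per-pixel estimation of α can be sketched as follows. This is a minimal illustration under our own assumptions (the function name `holder_exponent`, the maximum measure, window sizes 1, 3, 5 as in the Figure 6 example, and windows clipped at image borders); it is not the paper's code.

```python
import numpy as np

def holder_exponent(image, p_row, p_col, window_sizes=(1, 3, 5)):
    """Estimate the Hoelder exponent alpha at pixel p as the slope of
    log(mu_w(p)) against log(w), using the maximum measure."""
    log_w, log_mu = [], []
    for w in window_sizes:
        r = w // 2
        # w x w window centred at p, clipped at the image borders
        win = image[max(p_row - r, 0):p_row + r + 1,
                    max(p_col - r, 0):p_col + r + 1]
        log_w.append(np.log(w))
        log_mu.append(np.log(win.max() + 1e-12))   # maximum measure mu_w(p)
    alpha, _ = np.polyfit(log_w, log_mu, 1)        # slope of the regression line
    return alpha
```

For a locally constant patch the measure does not change with w, so the fitted slope (and hence α) is zero, matching the interpretation of α as a local singularity strength.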

Alpha Image and Texture Features
Alpha images (denoted as α-images) are obtained by replacing the intensity value at each position p by the α_p value. In α-images, certain texture patterns have higher contrast compared to the original images. The range of α values in an α-image is denoted by [α_min, α_max]. This range is subdivided into a set of bins, and pixels whose α values fall in the same bin are counted to obtain an α-histogram, which can be used as one of the texture features [31,32]. In this study, the α value range in each mammogram image is divided into 100 bins, which are used to obtain the corresponding α-histograms. As this work focuses on breast density classification, we extract texture features from only the breast regions (pectoral muscle and background area excluded). The breast region mask images obtained in the pre-processing stage are used to ensure that the α-images and α-histograms reflect only the image information in breast regions. In addition, since the area of the breast region differs between mammograms, we use the proportion of pixels in the region of interest, instead of the pixel count, to represent the height of the bins in the histograms, which makes the histogram-based features uniform and comparable. Figures 7-10 show α-images of mammograms in the different breast density categories given earlier in Figure 1, and their α-histograms using the four different multi-fractal measures.
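The normalized α-histogram described above can be sketched as follows (the function name `alpha_histogram` is ours; the 100-bin setting and the proportion-based bin heights follow the text):

```python
import numpy as np

def alpha_histogram(alpha_image, mask, n_bins=100):
    """Normalized alpha-histogram computed over the breast region only.
    Bin heights are pixel proportions, so histograms from breasts of
    different sizes remain comparable."""
    vals = alpha_image[mask]                       # restrict to the ROI
    hist, _ = np.histogram(vals, bins=n_bins,
                           range=(vals.min(), vals.max()))
    return hist / vals.size                        # proportions, summing to 1
```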

Local Binary Patterns
Local binary patterns (LBP), proposed by [24], is a powerful feature descriptor used for texture analysis and classification. Due to its simplicity and robustness, several research works and applications use it to extract image features. For mammographic density classification, LBP and its variants have been applied and tested to improve the classification accuracy in [25][26][27][28][29][30]. A binary pattern is derived by comparing the intensity at each pixel with its neighbours and encoding the information in a P-bit integer value. Concretely, for each central pixel c with a grey level value g_c in a specific window size, its LBP code is calculated by comparing the g_c value with its P neighbouring pixels located at a distance R from c. If g_c is higher than the neighbouring pixel g_i, the neighbour pixel is assigned a value 0, otherwise a value 1. Subsequently, a P-bit binary code is generated for the current pixel c. LBP_{P,R} is used to denote this binary code and its calculation can be described as follows:

LBP_{P,R} = ∑_{i=0}^{P−1} s(g_i − g_c) · 2^i,  where s(x) = 1 if x ≥ 0, and 0 otherwise.

When P is set to 8, the total number of different binary patterns that can be generated using LBP is 256 (2^8). Therefore, an LBP histogram containing 256 bins is obtained and used as a local texture descriptor. Figure 11 shows LBP images and their LBP histograms for mammograms in different density categories.
Larger neighbourhood areas, represented by higher values of R, provide more local image information and generate a longer texture feature vector. For example, when P = 8, setting R = 1 and R = 2 separately, two LBP histograms are obtained and concatenated, producing a 512-length feature vector. This descriptor is called multi-resolution LBP (MLBP), and it contains more texture information and consequently has higher feature dimensionality compared to LBP.
In basic LBP, a circular neighbourhood with a radius of R is used for locating neighbouring pixels. As a variant of LBP, elliptical LBP (ELBP) was developed in [25], aiming to extract more local information from different directions. In ELBP, R_1 and R_2 denote the lengths of the semi-minor and semi-major axes of the elliptical image region, and the ELBP code is calculated as follows:

ELBP_{P,R_1,R_2} = ∑_{i=0}^{P−1} s(g_i − g_c) · 2^i,

where the P neighbours g_i are sampled on an ellipse with semi-axes R_1 and R_2 centred at c. Figure 12 illustrates the configuration of image pixels used in the computation of the basic LBP and its variants discussed above, and they will be used and compared in this classification work.
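The basic LBP_{8,1} encoding and its 256-bin histogram can be sketched as follows. This is an illustration under our own assumptions (function names `lbp_8_1` and `lbp_histogram`, axis-aligned neighbours rather than interpolated circular sampling, image border skipped):

```python
import numpy as np

def lbp_8_1(image):
    """Basic LBP with P = 8, R = 1: each neighbour contributes bit 1 when
    its grey level is >= the centre pixel, following the encoding above."""
    g = image.astype(float)
    c = g[1:-1, 1:-1]                              # centre pixels (border skipped)
    # the 8 radius-1 neighbours, visited in a fixed order
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.int32)
    for i, (dr, dc) in enumerate(offsets):
        nb = g[1 + dr:g.shape[0] - 1 + dr, 1 + dc:g.shape[1] - 1 + dc]
        code += (nb >= c).astype(np.int32) << i    # s(g_i - g_c) * 2^i
    return code

def lbp_histogram(code, n_bins=256):
    """Normalized 256-bin LBP histogram used as the texture descriptor."""
    hist = np.bincount(code.ravel(), minlength=n_bins)
    return hist / code.size
```

An MLBP descriptor would simply concatenate the histograms obtained at R = 1 and R = 2 into one 512-length vector.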

Feature Descriptor with Concatenated Texture Features
As discussed above, multi-fractal features and LBP features are used as texture descriptors in this work to classify mammogram images. In addition to applying the two texture feature descriptors separately, we also create a novel feature descriptor by concatenating the multi-fractal measure-based feature set with the LBP-based feature set. By combining two different texture feature groups, we assume that more useful image information can be captured to reflect the different tissue texture patterns related to breast density characteristics, thus improving the classification performance. In our experiments (Section 5), the feature sets from the four different multi-fractal measures (i.e., Max, Inv-Min, Iso, Sum) discussed in Section 3.2 are each concatenated with the LBP-based texture features with the aim of obtaining a higher classification accuracy.
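The combined descriptor is a plain concatenation of the two normalized histograms; a minimal sketch (the function name `combined_descriptor` and the 100/512-bin sizes used in the example are drawn from the text, not from released code):

```python
import numpy as np

def combined_descriptor(alpha_hist, lbp_hist):
    """Joint texture descriptor: multi-fractal alpha-histogram concatenated
    with the (M)LBP histogram, both already normalized per region size."""
    return np.concatenate([alpha_hist, lbp_hist])
```

For a 100-bin α-histogram and a 512-bin MLBP histogram this yields a 612-dimensional feature vector, which is then passed to the feature selection stage.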

Feature Selection
The concatenation of several texture feature vectors could result in high feature dimensionality and a high level of feature redundancy. In this section, we consider using PCA and an autoencoder network to select an optimal subset of texture features by removing redundant features that may not relate well to breast density characteristics.

Principal Components Analysis
Principal components analysis (PCA) is used for efficient coding of various biological signals [38]. It is a well-known optimal linear scheme for dimension reduction in data analysis, which retains maximal variance in the data set, while improving algorithm performance and saving processing time.
In the proposed method, X denotes the input feature set. X is an M × N matrix with N feature dimensions and M elements in each dimension. We use X^i to refer to the entire set of elements in the ith dimension and X^i_j to refer to the jth element in this set. The covariance between the first two dimensions X^1 and X^2 is computed as below:

cov(X^1, X^2) = (1 / (M − 1)) ∑_{j=1}^{M} (X^1_j − X̄^1)(X^2_j − X̄^2),

where X̄^1 and X̄^2 denote the means of the sets X^1 and X^2 respectively. After computing all the possible covariance values between different dimensions, a covariance matrix CM can be obtained as follows:

CM = [cov(X^i, X^k)], i, k = 1, ..., N.

Eigenvalues and eigenvectors of the covariance matrix are calculated subsequently, and the eigenvectors are sorted in descending order according to their eigenvalues. A matrix V can be constructed with these eigenvectors as columns. The final feature set X' can be derived from X and the matrix V as follows:

X' = X V.

In PCA, an assumption made for feature selection is that most information of the input feature set is contained in the subspace spanned by the first n principal axes, where n < N in an N-dimensional feature space. Therefore, each original feature vector can be represented by its principal component vector with dimensionality n.
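The PCA steps above (centre, covariance, eigendecomposition, projection) can be sketched directly in a few lines; the function name `pca_reduce` is ours:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the M x N feature matrix X (rows = samples, columns =
    features) onto its first n principal axes."""
    Xc = X - X.mean(axis=0)                 # centre each feature dimension
    CM = np.cov(Xc, rowvar=False)           # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(CM)   # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]       # sort descending by eigenvalue
    V = eigvecs[:, order[:n_components]]    # first n principal axes as columns
    return Xc @ V                           # M x n reduced feature set X'
```

The projected components come out ordered by explained variance, so truncating to n columns keeps the directions that retain the most information.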

Autoencoder Network
Autoencoder network [39] is a feed-forward neural network with more than one hidden layer, which attempts to reconstruct input data at the output layer. The output layer is usually of the same size as the input layer and the network architecture represents an hour-glass shape. As the size of the hidden layer in an autoencoder neural network is smaller than the input layer, the high-dimensional input data can be reduced to narrower code space when using more hidden layers. Therefore, in addition to image reconstruction and compression [40], an autoencoder is also used to reduce feature dimensionality. Generally, an autoencoder network consists of two components, namely "encoder" and "decoder". By reducing the hidden layer size, the encoder part is forced to learn important features of the input data, and the decoder part reconstructs the original data from the generated feature code. Once the training phase is over, the decoder part is discarded and the encoder is used to transform a data sample to a feature subspace.
We use an autoencoder to reduce the length of the feature vector. Therefore, a feature vector of size (1 × n) is input into the autoencoder network. Figure 13 shows an autoencoder model containing 11 hidden layers, in which the input layer receives and processes the initial texture features. Fully connected (FC) layers are used as hidden layers, and the rectified linear unit (ReLU) is used as the activation function in each hidden layer except the last layer, which uses a sigmoid function. Binary cross entropy is employed as the loss function.
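The encoder/decoder idea can be illustrated with a deliberately tiny numpy sketch: a single linear encoder, a sigmoid decoder, and the binary cross-entropy loss mentioned in the text. This is our own simplification for illustration only; the paper's model stacks 11 FC hidden layers with ReLU activations and would normally be built in a deep learning framework.

```python
import numpy as np

def train_autoencoder(X, code_dim, epochs=200, lr=0.1):
    """One-hidden-layer autoencoder trained by plain gradient descent.
    Returns only the encoder, as the decoder is discarded after training."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.1, (d, code_dim))   # encoder weights
    W2 = rng.normal(0.0, 0.1, (code_dim, d))   # decoder weights
    for _ in range(epochs):
        H = X @ W1                              # code (bottleneck features)
        Y = 1.0 / (1.0 + np.exp(-(H @ W2)))     # sigmoid reconstruction
        dZ = (Y - X) / n                        # BCE gradient w.r.t. pre-sigmoid
        gW2 = H.T @ dZ
        gW1 = X.T @ (dZ @ W2.T)
        W2 -= lr * gW2
        W1 -= lr * gW1
    def encode(Z):
        return Z @ W1                           # keep only the encoder part
    return encode
```

After training, `encode` maps each (1 × n) texture feature vector to its lower-dimensional code, which replaces the original vector in the classification stage.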

Classification
SVM is used in this study for training the classification model and producing classification results on test images. Since this work aims to classify mammograms into multiple density categories, a multiclass SVM which is implemented by a one against all (OAA) method is used in this model. To obtain the optimal classification results, three commonly used kernels [41], RBF, Poly, and Sigmoid, are tested in this work. A grid-searching method is used to find the best combinations of parameters (gamma, C, and degree).
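A hedged scikit-learn sketch of this setup (assuming that library): `OneVsRestClassifier` gives the one-against-all scheme, and `GridSearchCV` searches the kernel together with gamma, C and degree over five stratified folds; the synthetic data merely stands in for the selected texture features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy stand-in for the selected texture features: four classes, as in BI-RADS I-IV.
X, y = make_classification(n_samples=200, n_features=45, n_informative=10,
                           n_classes=4, random_state=0)

# One-against-all multiclass SVM; grid-search the kernel and its parameters.
param_grid = {
    "estimator__kernel": ["rbf", "poly", "sigmoid"],
    "estimator__C": [1, 10],
    "estimator__gamma": ["scale", 0.1],
    "estimator__degree": [2, 3],       # only affects the poly kernel
}
grid = GridSearchCV(OneVsRestClassifier(SVC()), param_grid,
                    cv=StratifiedKFold(n_splits=5))
grid.fit(X, y)
print(grid.best_params_["estimator__kernel"], round(grid.best_score_, 3))
```

The `estimator__` prefix routes each grid parameter to the wrapped `SVC` inside the one-vs-rest wrapper.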

Experiments and Result Analysis
In our experiments, a five-fold cross validation is performed on the INbreast dataset using the proposed texture descriptors and the SVM classifier. Table 1 lists the important parameters of the methods used and their selected values or value ranges.

Effect of Feature Selection
This section investigates the effect of feature selection using PCA and the autoencoder network on classification accuracy. As discussed in previous sections, the cascaded texture feature set (multi-fractal features + MLBP) occupies a large feature space which may contain redundant image features, and needs to be optimized before it is used in the classification algorithm.
PCA re-computes the initial feature set and outputs new features in decreasing order of importance, which can be used to select the optimum n features for the classification work. Figure 14 shows the classification results for the top 70 features after applying PCA; within this range, the best classification results shown in Table 4 are obtained when using the cascaded feature descriptors. In Figure 14, we can see that the highest classification accuracy (84.6%) based on the Iso+MLBP descriptor is obtained using the top 45 (i.e., n = 45) features. The Max+MLBP descriptor also uses n = 45 features to reach its best classification performance. For the In-Min and Sum based descriptors, 55 and 65 features, respectively, are used to achieve their best classification results. As the initial feature set contains 612 features, over 90% of the feature space is removed by PCA, which contributes to a significant reduction in feature dimensionality and an improvement in classification accuracy.
We also use the autoencoder network to select texture features for the cascaded feature descriptors. Different numbers of hidden layers in the autoencoder structure are tested, using values from the set {5, 7, 9, 11, 13, 15}, and classification results based on the Iso+MLBP descriptor are shown in Table 5. An 11-layer architecture with a code layer size of 64 produces the highest classification result (80.7%). The results indicate that the autoencoder network can optimize the feature space for the cascaded feature descriptors and obtain a desirable classification accuracy. However, the classification performance does not surpass the accuracy obtained using PCA (80.7% vs. 84.6%).

Results Comparison and Discussion
In this section, we further compare and discuss the experimental results presented in Sections 5.1-5.3 using different classification metrics such as accuracy (CA), AUCROC, Kappa coefficient and F1 score. In addition, a statistical hypothesis test is conducted. A t-test with a significance level of 0.05 is conducted between the Iso+MLBP descriptor and every other method to calculate the p-value, which shows the statistical difference between them. Table 6 shows that the Iso+MLBP joint feature descriptor outperforms other approaches, not only producing the highest AUC (95.3 ± 3.1) but also obtaining the highest Kappa (0.79) and F1 score (0.85). In the t-test, all other methods present low p-values (<0.05), which means the difference in classification performance is statistically significant. From Tables 2-4 and 6, we have the following findings: (1) For the LBP and its variants, we can see that the use of different neighbourhood topologies (i.e., elliptical vs. circular) can improve the classification performance, which is consistent with the conclusion in [25]: the extracted anisotropic texture information has the potential to distinguish objects. Moreover, the MLBP, which collects texture information from two different neighbourhood areas, makes the improvement more evident. This indicates that for breast density classification, which is a very challenging task due to the heterogeneous texture patterns of breast tissue, capturing more (richer) texture features from different local regions can improve the classification performance. (2) The results in Table 2 lead to the following observations regarding multi-fractal features. The Iso measure produces a better classification result than the other measures.
The reason for this can be ascribed to the fact that the Max and In-Min measures consider only one pixel's information (the maximum or the minimum intensity value) when computing the singularity coefficient (i.e., α-value), and fail to collect image information from the other pixels within the local region. (3) From Tables 4 and 6, we can see that the use of the combined feature descriptor improves the classification accuracy significantly, which also indicates that the texture features extracted by the MLBP and multi-fractal methods are different. The feature sets collected by the two methods can be considered complementary to each other. (4) As shown in Section 5.4, the combination of different texture features produces a bigger feature space which contains redundant features that do not help distinguish the breast-density-related characteristics between different categories. Results in Figure 14 show that the classification accuracy can even be lower than that of an individual feature set if the concatenated features are not selected properly, which demonstrates the importance of removing redundant features and the necessity of the feature selection scheme. (5) Even though BI-RADS uses four density categories, breast density is sometimes discussed with binary labels of low density (fatty and sparsely dense, or BI-RADS I and II) and high density (heterogeneously and extremely dense, or BI-RADS III and IV) [7]. We conduct binary classification using the Iso+MLBP descriptor, which produces the best four-category classification results in our experiments. Table 7 shows the binary classification results. Although the texture features extracted by the multi-fractal Iso method and the MLBP provide desirable binary classification accuracies (89.2% and 91.9%), the joint feature descriptor Iso+MLBP shows a more powerful representation capability for image features, achieving a higher classification accuracy of 92.9%.
These observations are consistent with the results obtained in the four-category classification work and demonstrate the robustness of the proposed feature descriptor. A recent study [29] applied a local quinary pattern (LQP) method to extract texture features using different neighbourhood topologies for the same classification task. Their results indicate that the ellipse-topology-based LQP gives the best accuracy of 82.02% when using over 200 image features to test 206 images in the INbreast dataset (only MLO view images used). By contrast, our proposed method is tested on the whole INbreast dataset (409 images) and attains an accuracy of 84.6% using only 45 features. Table 8 shows the comparison of classification performance using the LQP, LBP and multi-fractal feature descriptors.
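The significance test described above can be sketched as a paired t-test over per-fold accuracies, e.g. with `scipy.stats.ttest_rel`; the fold accuracies below are illustrative placeholders, not the paper's measurements:

```python
from scipy.stats import ttest_rel

# Hypothetical per-fold accuracies for two descriptors over the same five folds
# (illustrative values only, not the paper's results).
acc_iso_mlbp = [0.86, 0.84, 0.85, 0.83, 0.85]
acc_baseline = [0.80, 0.79, 0.81, 0.78, 0.80]

t_stat, p_value = ttest_rel(acc_iso_mlbp, acc_baseline)  # paired t-test
print(p_value < 0.05)  # is the difference significant at the 0.05 level?
```

A paired (rather than independent-samples) test is appropriate when both methods are evaluated on the same cross-validation folds.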

Conclusions and Future Work
In this paper, a comprehensive study of multi-fractal analysis is conducted to extract texture features from mammogram images for classifying BI-RADS density. Four different multi-fractal measures are used to capture breast density related image features, and LBP and its variants are employed to extract further features. Novel texture feature descriptors concatenating multi-fractal features and MLBP features are proposed to integrate rich image features based on different feature extraction methods. PCA and an autoencoder network are used in this work to optimize the feature space by removing redundant features. The proposed method is tested using an FFDM dataset, INBreast, with 409 mammogram images. Experimental results show that the cascaded feature descriptor based on multi-fractal analysis and LBP obtains higher classification accuracy than individual feature sets. In future work, other image feature descriptors such as LQP will be considered together with multi-fractal features. In addition, different mammogram datasets will be used in experiments to demonstrate the robustness of the proposed method.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/jimaging7100205/s1, Figure S1: Segmentation of breast region. (a) Input mammogram; (b) clustering operation based on the alpha image; (c) rough contour of the breast region; (d) smoothing breast contour; (e) finding pectoral muscle area; (f) breast mask image without pectoral muscle region; Figure S2: Challenging cases with inaccurate mask images generated and their adjusted mask images based on manual operations. References [42-44] are cited in the Supplementary Materials.