Automatic Breast Tumor Screening of Mammographic Images with Optimal Convolutional Neural Network

Abstract: Mammography is a first-line imaging examination for early breast tumor screening. Computational techniques based on deep learning, such as the convolutional neural network (CNN), are routinely used as classifiers for rapid automatic breast tumor screening in mammography examination. Classifying multiple feature maps on two-dimensional (2D) digital images, a multilayer CNN has multiple convolutional-pooling layers and fully connected networks, which can increase screening accuracy and reduce the error rate. However, this multilayer architecture presents some limitations, such as high computational complexity, large-scale training dataset requirements, and poor suitability for real-time clinical applications. Hence, this study designs an optimal multilayer architecture for a CNN-based classifier for automatic breast tumor screening, consisting of three convolutional layers, two pooling layers, a flattening layer, and a classification layer. In the first convolutional layer, the proposed classifier performs a fractional-order convolutional process to enhance the image and remove unwanted noise, thereby obtaining the desired object's edges; in the second and third convolutional-pooling layers, two kernel convolutional and pooling operations continuously enhance and sharpen the feature patterns, extracting the desired features at different scales and levels while reducing the dimensions of the feature patterns. In the classification layer, a multilayer network with an adaptive moment estimation algorithm refines the classifier's network parameters for mammography classification by separating tumor-free feature patterns from tumor feature patterns. Images were selected from a curated breast imaging subset of the digital database for screening mammography (CBIS-DDSM), and K-fold cross-validations were performed.
The experimental results indicate promising performance for automatic breast tumor screening in terms of recall (%), precision (%), accuracy (%), F1 score, and Youden’s index.


Introduction
As per statistics provided in 2020 by Taiwan's Ministry of Health and Welfare, cancer (malignant tumors) is the primary cause of death among Taiwanese people. In recent years, breast cancer (BC) in females has ranked among the top four cancers (first place) and is one of the diseases that most definitely cannot be ignored. Women most commonly develop BC between 45 and 69 years of age. As per the latest cause-of-death figures from the Ministry of Health and Welfare and cancer registration data from the National Health Agency [1], the standardized incidence and mortality rates of female BC are 69.1 and 12.0 (per 100,000 people), respectively.
2D CNNs may comprise several convolutional-pooling layers and a fully connected network in the classification layer, such as back-propagation neural networks and Bayesian networks, which combine the image enhancement, feature extraction, and classification tasks into an individual scheme [16,17] that achieves promising accuracy for image classification in breast tumor screening. These CNNs usually have more than 10 convolutional-pooling layers, which perform the above-mentioned image preprocessing and postprocessing tasks and thereby increase identification accuracy. Hence, this multilayer design may gradually replace machine-learning (ML) methods [18,19], which perform image segmentation and feature extraction as an image preprocess for mammograms and breast MRIs and then use the fixed features obtained to train a classifier. Both CNN- and ML-based image segmentation [20] can learn specific features or knowledge representations to automatically identify the boundaries of the ROI and then detect breast lesions. Traditional ML methods have fewer parameters, which can easily be optimized by gradient descent optimization or back-propagation algorithms through training with small-to-medium-sized datasets [21,22].
Through a series of convolutional and pooling processes, the multilayer CNN can enhance and extract the desired object at different scales and levels, from low-level features (the object's edges) to high-level information (the object's shape), thereby detecting nonlinear features, increasing nonlinearity, and obtaining feature representations. Then, the pooling process with maximum pooling (MP) is used to reduce the sizes of the feature maps and obtain abstract features. Thus, in contrast to traditional machine-learning methods, CNN-based methods can learn to extract feature patterns from raw data and significantly improve classification accuracy. However, small- or medium-sized datasets are insufficient to train a deep-learning-based CNN. For example, from the existing literature [23-26], such as AlexNet (an eight-layer CNN) [25] and ZFNet [26], it can be observed that a deep-learning-based CNN requires several convolutional-pooling layers and fully connected layers for large-scale image classification (the ImageNet image database [27,28]). Such a CNN can learn to optimize features during the training stage, process large inputs with sparsely connected weights, adapt to different sizes of 2D images, and reduce error rates. Furthermore, this approach demonstrates greater computational efficiency than traditional fully connected multilayer perceptron (MLP) networks. Despite its many advantages, however, a deep-learning-based CNN presents several drawbacks and limitations, such as the difficulty of determining the number of convolutional-pooling layers, the numbers of convolutional and pooling windows, and the sizes of the convolutional windows (3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11); the high computational complexity and large-scale dataset required for training the CNN-based classifier; and the poor suitability for real-time applications.
Additionally, multi-convolutional-pooling processes with different sizes of convolution masks result in a large information loss during feature extraction and in increased complexity. The multilayer CNN must be run on a graphics processing unit (GPU) to speed up the training and classification tasks when using large amounts of training and testing data.
Therefore, to simplify the image processing and classification tasks, this study aimed to design a suitable number of convolutional-pooling layers and a classification layer capable of increasing the identification accuracy of image classification, to facilitate automatic breast tumor screening. As observed from Figure 1, we utilized a multilayer classifier consisting of a fractional-order convolutional layer, two convolutional-pooling layers, a flattening layer, and a multilayer classifier in the classification layer. In the first convolutional layer, a 2D spatial convolutional process with two 3 × 3 fractional-order convolutional masks was used to perform the enhancement task and to remove unwanted noise from the original mammography image, to distinguish the edges and shapes of the object. In the second and third convolutional layers, sixteen 3 × 3 kernel convolutional windows were used to subsequently enhance and sharpen the feature patterns twice; hence, the tumor contour could easily be highlighted and distinguished for feature pattern extraction. Then, two MP processes were used to reduce the dimensions of the feature patterns, which helps network training avoid overfitting problems [29,30]. In the classification layer, a multilayer classifier with an input layer, two hidden layers, and an output layer is implemented to perform the pattern recognition task, which separates tumor-free feature patterns from tumor feature patterns. To reduce the error rates, the adaptive moment estimation method (ADAM) computes adaptive learning rates for updating network parameters by storing an exponentially decaying average of past squared gradients [31,32]; it combines two stochastic gradient descent approaches, namely adaptive gradients and root mean square propagation. Its optimization algorithm uses randomly selected training data subsets to compute the gradient, instead of using the entire dataset.
The momentum term can speed up the gradient descent by converging faster. The ADAM algorithm has a simple implementation, computational efficiency, and low memory requirements, and is appropriate for training multilayer CNN models with large datasets and many parameters. A total of 78 subjects were selected from the MIAS (Mammographic Image Analysis Society) Digital Mammogram Database (United Kingdom National Breast Screening Program) for experimental analysis. The clinical information, including biomarkers such as image size, image category, background tissue, class of abnormality, and severity of abnormality, was confirmed and agreed upon by expert radiologists [33,34]; the image database included a total of 156 mammography images (including right and left images), comprising 94 normal cases and 62 abnormalities involving benign and malignant cases. The ROIs were extracted with a 100 × 100 bounding box, and 932 feature patterns, including 564 abnormal and 368 tumor-free patterns, were then extracted using the proposed convolutional-pooling processes. For cross-validation, the dataset was randomly divided into two halves: 50% of the dataset was used for training the classifier, and 50% was used for evaluating the classifier's performance. Tenfold cross-validation was used to verify the performance of the proposed multilayer deep-learning-based CNN with the proposed convolutional-pooling layers in terms of recall (%), precision (%), accuracy (%), F1 score, and Youden's index [35,36]. In this way, the optimal architecture of the multilayer CNN can be determined and potentially applied to establish a classifier for automatic breast tumor screening in clinical applications.
The remainder of this study is organized as follows: Section 2 describes the methodology, including the design of the multilayer deep-learning-based CNN, the adaptive moment estimation method, the classifier's performance evaluations, and the computer assistive system. Sections 3 and 4 present the experimental setup, testing of different multilayer CNN models and determination of the suitable CNN architecture, testing of the first convolutional layer for image enhancement, determination of the mask types, feasibility tests, and experiment results for clinical applications, and the conclusions, respectively.

Design of the Multilayer Deep-Learning-Based CNN
A multilayer deep-learning-based CNN includes multiple convolutional layers, pooling layers, and pattern recognition layers (classification layers). It is a multilayer model for image classification that combines multiple functions, such as image feature enhancement, feature extraction, parameter simplification, and pattern recognition. In recent years, deep-learning and broad-learning technologies have been applied for medical image processing, segmentation, and classification [3-9,13-15]. Compared with traditional MLP networks, a CNN can perform feature enhancement and extraction at the front end of the network; therein lies its advantage in processing 2D medical images. However, image processing generates a considerable number of parameters, increasing the amount of computation and the time it requires. With pooling processes, the number of features is reduced in the middle of the network to reduce the computational time. Finally, the image classification layer assists with screening tasks by using patterns to separate normality from abnormalities. This study uses the multilayer CNN model to develop automatic breast tumor screening based on 2D mammogram images, as shown in Figure 1. This automatic screening process includes: (1) region of interest (ROI) extraction; (2) feature enhancement and extraction; and (3) rapid screening of breast tumors. Each function is described as follows:
• ROI extraction: In this study, we use digital data in the mammogram X-ray image database (161 female patients, 322 images) [33,34] provided by the MIAS. The biomarkers of the MIAS database clearly mark the tumor positions and sizes [33]. The statistical results of the probability distribution of tumor locations, in accordance with the MIAS database's biomarkers, are shown in Figure 2a. Looking at the distribution probability of tumor locations, the most frequent location of tumors is the right side and the upper outer quadrant of the breast.
Given a 2D image with 4320 pixels × 2600 pixels, we define the priority for automatically extracting ROIs with a specific bounding box based on the distribution probability. Areas with greater probability are first in line for ROI extraction, and the priority order is stored in the work queue, as shown in Figure 2b. ROI image extraction and tumor detection are performed as per the priority order.
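The priority-queue ROI extraction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `extract_rois` and the example center coordinates are hypothetical, and only the 100 × 100 bounding box and the work-queue ordering come from the text.

```python
from collections import deque

import numpy as np

def extract_rois(image, priority_centers, box=100):
    """Crop fixed-size ROI patches in priority order.

    `priority_centers` is a list of (row, col) centers ordered by the
    tumor-location probability map; the coordinates used here are
    illustrative, not the paper's actual priority map.
    """
    queue = deque(priority_centers)   # work queue in priority order
    half = box // 2
    rois = []
    while queue:
        r, c = queue.popleft()
        # Clamp so the box x box bounding window stays inside the image.
        r0 = min(max(r - half, 0), image.shape[0] - box)
        c0 = min(max(c - half, 0), image.shape[1] - box)
        rois.append(image[r0:r0 + box, c0:c0 + box])
    return rois
```

Clamping keeps every patch exactly 100 × 100 even when a high-priority center lies near the image border.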

• Feature enhancement and extraction: A multilayer 2D convolution operation is used to magnify the texture of possible tumor tissue and edge information (usually two or more layers are used), as shown in Figure 2a. Each layer uses a 3 × 3 sliding window to perform the convolutional weighting operation. First, a 2D fractional convolution operation is performed to magnify the tumor characteristics. Then, by combining multilayer convolutional weight calculations, the contour of the tumor is gradually strengthened, noise is removed, and the image is sharpened. These effects help strengthen the target area and retain non-characteristic information. This study applies 2D spatial fractional-order convolutional processes in the fractional convolutional layer, selects appropriate fractional-order parameters, and performs convolution in the x and y directions, yielding a combination of 2D weight values in space, the general formulas being Equations (1) and (2) [35-38], where h = 3 is the dimension of the convolution window, v is a fractional parameter with v ∈ (0, 1), and I(x, y) ∈ [0, 255] is the pixel value at point (x, y) in a 2D image. Each fractional-order convolutional mask multiplies each element, M(i, j) or M(j, i), by the corresponding input pixel values, I(x, y), and then obtains an enhanced feature pattern containing spatial features in the x-axis and y-axis directions. These 2D spatial convolutional processes act as two low-pass frequency filters [39] and thus remove the high-spatial-frequency components from a breast mammogram. In this study, the image dimension is n × n, x = 1, 2, 3, . . . , n, and y = 1, 2, 3, . . . , n. M_x and M_y are 3 × 3 convolutional windows [35-38], where v ∈ (0, 1) is the fractional-order parameter. A sliding stride = 1 is selected for spatial-domain convolution operations in the horizontal and vertical directions.
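The display forms of the fractional-order masks (Equations (1) and (2)) are missing from this excerpt. As a hedged reconstruction, assuming the first three Grünwald-Letnikov fractional-difference coefficients (1, −v, (v² − v)/2) commonly used to build fractional differential masks [35-38], the windows could take a form such as:

```latex
M_x =
\begin{bmatrix}
0 & \dfrac{v^{2}-v}{2} & 0\\[4pt]
0 & -v & 0\\[4pt]
0 & 1 & 0
\end{bmatrix},
\qquad
M_y = M_x^{\mathsf{T}},
\qquad v \in (0,1).
```

The exact placement of the coefficients within the original 3 × 3 windows may differ; only the coefficient values follow from the cited fractional-order construction.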
The results of the convolution operations (1) and (2) are combined and normalized with the approximate formulas (3)-(5). These multilayer convolutional layers are also called the perception layers of the CNN for feature enhancement and extraction. After feature extraction, a 2 × 2 sliding window is used to perform maximum pooling (MP), as shown in general formula (6). After MP, the dimension of each feature pattern is reduced to 25% of its original number of pixels. This reduction in the dimensions of the feature patterns helps overcome the overfitting problem when training a multilayer classifier.
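The combine-normalize step and the 2 × 2 MP of general formula (6) can be sketched as below. Since formulas (3)-(5) are not reproduced in this excerpt, the magnitude-style combination in `combine_and_normalize` is an assumption; only the rescaling to [0, 255], the 2 × 2 window, and the stride of 2 come from the text.

```python
import numpy as np

def combine_and_normalize(gx, gy):
    """Combine the x- and y-direction convolution responses and rescale
    to [0, 255]. The magnitude combination is an assumption; the paper
    only states that the results of (1) and (2) are combined and
    normalized (approximate formulas (3)-(5))."""
    g = np.sqrt(gx.astype(float) ** 2 + gy.astype(float) ** 2)
    return 255.0 * (g - g.min()) / (g.max() - g.min() + 1e-12)

def max_pool_2x2(fmap):
    """2 x 2 maximum pooling with stride 2 (general formula (6)): keep
    the largest value in each non-overlapping 2 x 2 block, so the pooled
    map retains 25% of the original pixels."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))
```

On a 4 × 4 map, `max_pool_2x2` returns a 2 × 2 map holding the maximum of each quadrant's 2 × 2 block.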

• Rapid screening of breast tumors: Breast tumors can be identified at the image classification layer, which includes the flattening process (FP) and a multilayer classifier, as seen in Figure 1. The FP converts a 2D feature matrix into a 1D feature vector, which is then fed as the input vector of the classifier for further pattern recognition.
After two MP treatments, the FP treatment may be written as shown in general formula (7), where X is the 1D feature vector used as the input of the multilayer classifier. In this study, the multilayer classifier includes an input layer, two hidden layers (i.e., the first and second hidden layers), and an output layer.
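The length of the flattened vector X can be traced from the layer settings given in the experimental setup (3 × 3 'same' convolutions with stride 1 and padding 1, 16 kernels, and two 2 × 2 MP stages). This is a sketch under those assumptions:

```python
def feature_vector_length(size=100, n_kernels=16, n_pool=2):
    """Trace the feature-map size through the proposed layers; general
    formula (7) flattens the final maps into one 1D vector.
    Assumes 'same' convolutions (3x3 windows, stride 1, padding 1), as
    described in the experimental setup, so only pooling shrinks maps."""
    for _ in range(n_pool):          # each 2x2 MP halves both dimensions
        size //= 2
    return n_kernels * size * size   # flattened length fed to the classifier

# 100x100 ROI -> 50x50 -> 25x25; 16 maps flatten to a 10,000-element vector
```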
In the two hidden layers, the Gaussian error linear unit (GeLU) function [40-42] is used as the activation function in each hidden node. This activation function performs a nonlinear conversion, as shown in Figure 3, where x_i is the i-th element of the 1D input feature vector, i = 1, 2, 3, . . . , n. The training of the multilayer classifier uses the back-propagation algorithm to adjust the connecting weighted parameters of the classifier and sets the loss function as the convergence condition for terminating the training stage. For multi-class classification, the multi-class binary cross-entropy function [7,43-45] is shown in Equation (9), where t_j,k is the target value (desired class), T = [t_1,k, t_2,k, t_3,k, . . . , t_m,k] for multiple classes; y_j,k is the outputted prediction value, Y = [y_1,k, y_2,k, y_3,k, . . . , y_m,k]; and m is the number of classes. This study sets m = 2, either normal or abnormal, coded as Y = [1, 0] and Y = [0, 1], respectively; k = 1, 2, 3, . . . , K is the index of the training data; and W is the weighted parameter matrix of the fully connected classifier.
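Since the display forms of the GeLU activation and the cross-entropy loss (Equation (9)) are not reproduced in this excerpt, the following sketch uses the standard tanh approximation of GeLU and the standard multi-class binary cross-entropy; treat both as hedged stand-ins for the paper's exact formulas.

```python
import math

def gelu(x):
    """Gaussian error linear unit, via the common tanh approximation:
    GeLU(x) = 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def cross_entropy(targets, outputs, eps=1e-12):
    """Standard multi-class binary cross-entropy averaged over the K
    training samples; assumed form of Equation (9). `targets` and
    `outputs` are lists of m-element vectors (here m = 2)."""
    loss = 0.0
    for t, y in zip(targets, outputs):
        loss -= sum(tj * math.log(yj + eps) + (1 - tj) * math.log(1 - yj + eps)
                    for tj, yj in zip(t, y))
    return loss / len(targets)
```

For a perfect two-class prediction (e.g., target [1, 0], output [1, 0]) the loss is essentially zero, which is the convergence condition the training stage drives toward.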

Adaptive Moment Estimation Method
In the classification layer, the network's connecting weighted parameters are adjusted by using the BPA to minimize the loss function. The smaller the cross-entropy value, the smaller the classification error rate and the higher the accuracy that can be obtained. The weighted parameters of the classifier are adjusted with the adaptive moment estimation (ADAM) optimization method, as follows [31,46]: the bias-corrected moment estimates m̂_p = m_p/(1 − β_1^p) and v̂_p = v_p/(1 − β_2^p) are adjustment parameters; η is the learning rate of the classifier; δ is the smoothing value; β_1 = 0.900 and β_2 = 0.999 are the attenuation rates of each iteration; p = 1, 2, 3, . . . , p_max; and p_max is the maximum number of iterations. Each iteration adjusts the weighted parameters of the classifier within a limited range with the parameters of Equation (11), as shown in Equations (12) and (13) [31,46]. With the above-mentioned formulas, the best parameters can be quickly obtained using matrix operations, and the loss function (9) can be minimized.
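Since Equations (11)-(13) are not reproduced in this excerpt, the update they describe can be sketched with the standard ADAM step, using the β_1, β_2, and δ roles defined above; the function name and default η are illustrative.

```python
import numpy as np

def adam_step(W, grad, m, v, p, eta=0.001, beta1=0.900, beta2=0.999, delta=1e-8):
    """One standard ADAM update (assumed form of Equations (11)-(13)):
    exponentially decaying moment estimates with bias correction,
    where p = 1, 2, ... is the iteration index."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** p)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** p)
    W = W - eta * m_hat / (np.sqrt(v_hat) + delta)
    return W, m, v
```

On the first iteration the bias correction makes the effective step size exactly η times the sign of the gradient, which is what lets ADAM converge quickly from zero-initialized moments.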
The proposed multilayer classifier used in this study is a fully connected network. The number of nodes in the hidden layers in the middle of the network is set according to the data type and complexity of the images. Optimization algorithms are used to adjust the connecting weighted parameters of the classifier to minimize the loss function. The input ROI image size used in this study is 100 pixels × 100 pixels. There is one fractional-order convolutional layer, two convolutional layers, two maximum pooling layers, a flattening layer, and a fully connected multilayer classifier. The relevant information about the proposed multilayer CNN is shown in Table 1.

Classifier's Performance Evaluations
This study uses tenfold cross-validation to evaluate the performance of the proposed multilayer CNN-based classifier. Each time, the dataset is divided into the two groups of normal and abnormal feature patterns and then randomly split into two halves: a 50% training dataset and a 50% testing dataset. The procedure is repeated over ten folds to confirm the performance of the proposed classifier with the evaluation indicators defined by the formulas for precision (%), recall (%), F1 score, and accuracy (%) [35,36,47] in Table 2. For each fold of cross-validation, the multilayer CNN-based classifier produces a confusion matrix comprising four parameters: TP (True Positive), FP (False Positive), TN (True Negative), and FN (False Negative). These parameters determine the indexes for evaluating the performance of the proposed classifier. The precision (%) indicator, TP/(TP + FP), represents the proportion of samples predicted as positive that are truly positive. Generally, a model's precision must be larger than 80% for it to be recognized as a good classifier. The recall (%) indicator is defined as TP/(TP + FN). The F1 score (∈ [0, 1]) is the harmonic mean of the precision (%) and recall (%) indexes, combining the two in a single evaluation index. The higher the F1 score and the closer it is to 1, the better the classifier's predictions.
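The indicators above (plus Youden's index, used later in the results) follow directly from one fold's confusion matrix; the sketch below checks them against the confusion-matrix values reported in the results section (TP = 178, FP = 6, TN = 269, FN = 13).

```python
def evaluate(tp, fp, tn, fn):
    """Evaluation indicators from Table 2 plus Youden's index, computed
    from a single fold's confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    youden = recall + specificity - 1            # Youden's index
    return precision, recall, accuracy, f1, youden

# With TP=178, FP=6, TN=269, FN=13 this reproduces the paper's reported
# precision 96.74%, recall 93.19%, accuracy 95.92%, F1 0.9493, YI 91.01%.
```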

Computer Assistive System for Automatic Breast Tumor Screening
This study uses LabVIEW 2019 (NI™) software to develop a computer assistive system (CAS) for automatic breast tumor screening, integrating: (1) ROI image extraction, (2) feature enhancement and extraction, and (3) a breast tumor screening classifier and other functions. Algorithms for functions (1) and (2) are developed using the MATLAB Script tool. The multilayer CNN algorithm and the interface shown in Figure 4 are written in Python. The interface works as follows:
• Zone 1: sets the source path of breast mammography images;
• Zone 2: loads and displays the selected mammography images;
• Zone 3: extracts ROI images as per the priority order and performs automatic tumor screening.
In this study, six areas at which tumors are most likely to be identified are designated. The CAS automatically prioritizes the ROI cutting of feature patterns (100 pixels × 100 pixels), as seen in Figure 2, and then screens those areas. Zone 3 shows the output of the classifier, the identification result, and the classification information; the red and green circles show normality and abnormality. The output value of the classifier must be >0.5 to give a high degree of confidence that there is a suspected breast tumor. The human-machine interface designed in this study can be used in clinical applications when switched to manual mode. The user can manually select six ROI blocks and screenshots and then save these images in a temporary database. When the number of screenshots reaches the set number (i.e., a default of six ROIs), the multilayer CNN classifier performs the classification task as per the priority order of the queue and returns the identification results. The clinician then receives messages confirming the existence of a possible tumor.

Experimental Setup
In the MIAS database, the most common image size was 4320 pixels × 2600 pixels. Thus, in this study, the dimensions 4320 pixels × 2600 pixels were selected for breast tumor screening [33,34]. The vertical and horizontal resolutions of each image were identical at 600 dpi, with a bit depth of 24 bits. A total of 156 mammography images (78 subjects), including 62 images with malignant (M) or benign (B) tumors and 94 images without tumors, were obtained. Given a specific bounding box measuring 100 pixels × 100 pixels, feature patterns were cropped from the 156 images. In total, 932 feature patterns, including 564 tumor and 368 tumor-free screenshots, were obtained. In each classifier's training stage, 282 tumor and 184 tumor-free screenshots (50% of the feature patterns) were randomly selected to train the multilayer CNN classifier. The remaining 50% of the feature patterns were used to evaluate the classifier's performance for each cross-validation. This study used the relevant data, as shown in Table 1, to establish a multilayer CNN-based classifier. We designed a fractional-order convolutional layer, two general convolutional layers, and two MP processing layers for feature enhancement and extraction. Each convolutional layer had 16 kernel convolutional windows. In the kernel window, the sliding window moved across columns and rows in steps of 1 (stride = 1) at each point of the convolution operation. The padding parameter was set to 1 to maintain the feature pattern size after the convolutional operation. During the pooling process, the MP window moved with a stride of 2 (stride = 2) each time. During each feature enhancement and extraction process, the possible tumor features and contours were gradually enhanced by the convolutional-pooling processes; hence, the multilayer CNN-based classifier can improve the accuracy of pattern recognition based on these enhanced features.
Tenfold cross-validation was performed using precision (%), recall (%), F1 score, and accuracy (%) as indicators [35,36,48] to evaluate the prediction performance of the proposed classifier. Figure 5 shows the visualization of the confusion matrix; for example, the classifier used 466 images for rapid screening, and the results show 178 TPs, 6 FPs, 269 TNs, and 13 FNs. The precision (%), recall (%), F1 score, and accuracy (%) can be calculated from this confusion matrix.

Experimental Results
This study compares the training time, accuracy, training curve, and prediction performance of multilayer CNN-based classifiers using different numbers of convolutional and pooling layers, different types of convolutional windows, and different convolutional window sizes, as seen in Table 3. The comparison items are briefly described as follows:
• The number of convolutional layers and pooling layers: this study increases the number of convolutional and pooling layers from 1 to 5 and the sizes of the convolution windows from 3 × 3 to 11 × 11. The processing windows for the pooling layers are set to 2 × 2, and the second to fifth convolutional layers have 16 kernel convolution windows to perform feature enhancement and extraction.
• The first convolutional windows: this study selects three types of convolutional windows, including fractional-order (v ∈ (0, 1)), Sobel (first order, v = 1), and Histeq [35,36,38,47,49,50] windows, to pre-enhance the feature pattern.
In general, a multilayer CNN may have dozens of convolutional layers. As shown in Table 3, this study designs five multilayer models with different numbers of convolutional layers and convolutional window sizes and compares their training time and accuracy to confirm the feasibility of the CNN model constructed in this study. Moreover, we establish three models for feature enhancement and extraction, as seen in Table 4, by combining different kernel convolutional windows and dimensions (3 × 3 and 5 × 5) and comparing the performance of these different models. These tests will help determine the best model for clinical applications in automatic breast tumor screening.
In addition, we use a multi-core personal computer (Intel® Q370, Intel® Core™ i7-8700, 3 × 8 GB DDR4-2400) as a development platform to implement the multilayer CNN-based classifier suggested in this study, and a graphics processing unit (GPU) (NVIDIA® GeForce® RTX™ 2080 Ti, 1755 MHz, 11 GB GDDR6) to speed up digital image processing. The feasibility study is described in detail in the subsequent sections.

Testing of Different Multilayer CNN Models and Determination of the Most Suitable Architecture
As shown in Table 3, this study designs five models comprising one to five convolutional layers. The first layer is a fractional-order convolutional layer with two 3 × 3 convolutional windows; it performs the 2D spatial convolution operations. The second to fifth convolutional layers use 16 kernel convolutional windows of different sizes (3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11) [51] and 16 MP windows for feature enhancement and extraction. Finally, a fully connected network with two hidden layers, trained with the adaptive moment estimation optimization method [46,48], adjusts the connecting weighted parameters with Equations (11)-(13) such that the predetermined classification is obtained. This study uses the same training dataset, specifically 466 feature patterns (282 tumor and 184 tumor-free screenshots), to train and test the five different CNN models. We randomly generate initial parameters and train each model at least five times, recording the required training time and classification accuracy to compare the average training CPU time (min) and average accuracy (%) of the models, as seen in Table 4. The testing results indicate that the three-convolutional-layer model (Model #3 in Table 3) has an average training CPU time of less than 7 min and an average accuracy of greater than 95% on 466 untrained feature patterns. While the average accuracies of the four- and five-layer models can also exceed 95%, these two models require more training CPU time. Therefore, Model #3 is the most suitable model for developing an automatic breast tumor screening classifier for clinical applications.

Testing of the First Convolutional Layer and Determination of the Window Type
As seen in Table 5, three types of convolution windows in the first convolutional layer, including fractional-order windows, Sobel windows, and Histeq windows, are used to perform the 2D spatial convolutions [35,36,47,49-52]. Figure 6a shows the original image and the image enhancement results of these three types of convolution windows. Figure 6b shows the pixel grayscale value distribution map after the image is magnified. Compared with the original image's grayscale value (0-255) distribution map, the convolution result of the first-derivative-based Sobel convolution window [49] has a smoothing effect and is anti-noise, but it requires a considerable amount of calculation while performing convolutions; moreover, this window type produces a thicker edge contour, which results in lower accuracy in identifying the position of the target object. A second-order-based convolutional window could be used for feature enhancement, but it is fairly susceptible to noise and thus unsuitable for obtaining the edge contour of the target; this type of window is generally used for binarization applications. The Histeq convolution (histogram equalization) [50] yields a histogram of the number of times each grayscale value appears. This histogram describes the statistical information of the grayscale values of the image and allows direct observation of characteristics such as brightness and contrast. It is primarily used for image segmentation and adjustment of grayscale values in the image. As shown in Figure 6b, the non-zero values of the histogram have a wide and uniform distribution, which indicates that the contrast of the image is high. The pixel values of the image may be readjusted to values between 0 and 255 by using linear, piecewise linear, and nonlinear transformation functions [53]. These transformation functions are primarily used to magnify the contrast of the original image.
The overall grayscale value distribution map shifts to the right, and the contrast of the image increases, thereby reducing the effort required to highlight the outline of the malignant tumor, as shown in Figure 6a. However, this method is susceptible to factors such as illumination, viewing angle, and noise. The Histeq (histogram equalization) function [50] can automatically determine the grayscale transformation function and yield an output image with a uniform histogram. It is primarily used for contrast adjustments over a small range but could amplify background noise. When fractional-order spatial convolution is conducted in 2D space, the overall grayscale value distribution also moves to the right, which increases the contrast of the image while filtering out noise; thus, it performs better than the Sobel convolutional operation. Therefore, this study selects a 3 × 3 fractional-order convolution window for the first convolutional layer. In addition, the literature [35,36] reports that setting the fractional-order parameter to v = 0.30-0.40, as is also done for feature enhancement in X-ray images, yields promising results; thus, our study selects the parameter v = 0.35. After the 2D spatial convolution and normalization operations of Equations (1)-(5), the contour of the sharpened target can be obtained. Strengthening the target's features, retaining non-characteristic information, and removing noise are helpful for the subsequent second- and third-layer feature extraction operations and further pattern recognition.

Multilayer CNN-Based Classifier Testing and Validation
This study uses Model #1, as shown in Table 5, which adopts three convolutional layers and the same fully connected classification layer, to develop four models, as shown in Table 6. The convolutional window sizes of the second and third layers are the combinations (3 × 3, 3 × 3), (3 × 3, 5 × 5), (5 × 5, 3 × 3), and (5 × 5, 5 × 5) [51]. The image dataset is divided into two groups of equal size, and the four models use 466 trained and 466 untrained feature patterns to test and confirm the performance of the classifier. A total of 1000 epochs are set for training the classifier with the trained and untrained feature patterns. Figure 7 shows (a) the training performance of the classifier and (b) the training history curve of the classification performance validation; in (b), the solid blue line represents the results of the training performance test and the solid orange line the results of the classification performance validation. As the number of training epochs increases, the classifier's output accuracy (%) gradually increases. The four classifier models require an average of less than 240 s (under 4 min) of CPU time to complete the training and testing tasks, as seen in Table 6. Then, the trained and untrained feature patterns are randomly selected, and the accuracy (%) of the four classifier models is tested by performing 10-fold cross-validation (K_f = 10). Table 7 shows the overall cross-validation results. Figure 7a indicates that the accuracy (%) of Models #2 and #3 improves over 600 epochs of training. By comparison, the accuracy of Models #1 and #4 improves over 200-400 epochs, after which it converges and the classification accuracy (%) approaches its maximum. The training convergence curve of the classifier is shown in Figure 7b. The accuracy (%) of all four models exceeds 95%.
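The 10-fold protocol used above can be sketched without any ML framework; `train_and_score` below is a hypothetical stand-in for fitting and evaluating one of the four models on a given split.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once, then yield (train, test) splits
    so that each sample appears in exactly one test fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

def cross_validate(n_samples, train_and_score, k=10):
    """Average the per-fold score (e.g., accuracy) over all k folds."""
    scores = [train_and_score(tr, te) for tr, te in kfold_indices(n_samples, k)]
    return float(np.mean(scores))
```

Each of the 10 folds serves once as the held-out test set, so the reported average accuracy reflects every feature pattern exactly once as unseen data.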
To shorten the classifier's design cycle and reduce the memory requirements for storing classifier parameters, we recommend using the architectures of Models #1 and #4 to establish and implement the multilayer CNN-based classifiers. Considering the experimental results listed in Table 6, the architecture of Model #1 is selected to establish the screening classifier. After training is completed, 466 untrained feature patterns, including 184 abnormal and 282 normal patterns, are randomly selected from the dataset to validate the performance of the classifier. The experimental results of the classifier produce a visual confusion matrix. The testing result of the abnormal patterns yields TP = 178 and FP = 6, while that of the normal patterns yields TN = 269 and FN = 13; these values can be used as the variables in Table 2 to compute the four evaluation indices of the classifier. In this study, precision (%) = 96.74%, recall (%) = 93.19%, F1 score = 0.9493, and accuracy (%) = 95.92%. Precision (%) measures the correctness of the TP predictions, and recall (%) is the true positive rate; both indicators exceed 80%. Precision (%) is also called the positive predictive value (PPV), and a PPV larger than 80% generally means the classifier has promising predictive performance. The F1 score fuses precision (%) and recall (%), and an F1 score larger than 0.9000 generally indicates a good classification model. Youden's index (YI) is a fused evaluation index of sensitivity (Sens) and specificity (Spec) [54], which reflects the classifier's ability to detect abnormalities: the larger the YI, the better the detection and validation performance of the classifier and the greater its authenticity. The testing results show YI = 91.01% (Sens = 93.19%, Spec = 97.82%).
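The indices reported above follow directly from the confusion-matrix counts; the short sketch below reproduces them from TP = 178, FP = 6, TN = 269, FN = 13.

```python
def screening_indices(tp, fp, tn, fn):
    """Standard confusion-matrix indices; precision is the positive
    predictive value (PPV), recall is the sensitivity."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    youden = recall + specificity - 1       # Youden's index (YI)
    return precision, recall, specificity, accuracy, f1, youden

p, r, s, a, f1, yi = screening_indices(tp=178, fp=6, tn=269, fn=13)
# p ≈ 0.9674, r ≈ 0.9319, s ≈ 0.9782, a ≈ 0.9592, f1 ≈ 0.9493, yi ≈ 0.9101
```

These computed values match the reported precision (96.74%), recall (93.19%), accuracy (95.92%), F1 score (0.9493), and YI (91.01%, with Sens = 93.19% and Spec = 97.82%).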
Given that all evaluation indicators considered in this work exceed 90%, Model #1 indeed has an architecture that supports good classification accuracy and performance, as seen in the tenfold cross-validation (K_f = 10) averages of precision (%), recall (%), accuracy (%), and F1 score in Table 8. Hence, we suggest Model #1 for implementing a multilayer CNN-based classifier for automatic breast tumor screening. In addition, as seen in Table 9, we also set 4, 8, 16, and 32 kernel convolutional windows and 4, 8, 16, and 32 maximum pooling windows in the second and third convolutional-pooling layers, respectively, to establish four models (Models #1-1 to #1-4). With tenfold cross-validation on randomly selected trained feature patterns, the average training CPU times of Models #1-1 and #1-2 are lower than that of Model #1-3, which uses 16 kernel convolutional windows and 16 maximum pooling windows; Model #1-4, which comprises 32 kernel convolutional windows and 32 maximum pooling windows, further increases the average training CPU time and the computational complexity of each cross-validation. With tenfold cross-validation on randomly selected untrained feature patterns, as seen in Tables 10-13, the proposed multilayer classifier architecture (Model #1-3) shows promising classification accuracy and performance in terms of average precision (%), average recall (%), average accuracy (%), and average F1 score. Additionally, the proposed CNN architecture with different convolutional windows in the first convolutional layer, including fractional-order, Sobel (first-order), and Histeq convolutional windows, is used to test the performance of the breast tumor screening model. Through tenfold cross-validation, the CNN classifier with a fractional-order convolutional window in the first convolutional layer (Model #1 in Table 14) has better classification accuracy (larger than 95%) than Model #2 (larger than 85%) and Model #3 (larger than 90%).
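The dimension reduction through the convolutional-pooling stack can be traced with simple shape arithmetic. The input size, 'valid' padding, and 2 × 2 pooling below are illustrative assumptions for a hypothetical 64 × 64 ROI, not the exact settings of Table 9.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output side length of a 2D convolution (square input and kernel)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output side length of a max-pooling window."""
    return (size - window) // stride + 1

# Trace a hypothetical 64x64 ROI through conv1 (3x3),
# then two conv(3x3) + max-pool(2x2) stages:
s = 64
s = conv_out(s, 3)   # conv1: 62
s = conv_out(s, 3)   # conv2: 60
s = pool_out(s)      # pool2: 30
s = conv_out(s, 3)   # conv3: 28
s = pool_out(s)      # pool3: 14
```

Each pooling stage roughly halves the feature-map side length, which is why increasing the kernel count (Models #1-1 to #1-4) raises CPU time mainly through the number of feature maps rather than their size.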
Figure 8 shows the left breast mammogram image of a patient (File Name #mdb315ll [33,34]). In this case study, the right breast mammogram image (File Name #mdb316rl) is normal (background tissue: dense-glandular (D)); the left breast has a benign tumor (B), the center coordinates of the tumor are (1900, 317), the background tissue is D, and the circumscribed masses are labeled (CIRC) [33,34]. In this study, using the automatic screening system developed in [55], with the pre-selected priority order of screening (blocks 1 to 6), the sequence of ROI blocks is shown in Figure 8; the automatic screening results show four TPs (blocks 1, 2, 3, and 6) and two TNs (blocks 4 and 5). In this case, the large tumor spans four ROI blocks (1-3 and 6). Therefore, the screening results show TP for identifying a possible breast tumor, the reliability of the classifier output judged to be abnormal is larger than 0.50, and the abnormality is flagged by a red message. The screening system can also be switched to manual mode. As in the automatic screening results, the four ROI blocks (1-3 and 6) can be manually circled, screenshotted, and stored in the queue in the order of the manual screenshots. The classifier automatically performs image recognition in sequence and returns the corresponding recognition results and messages so that the clinician can confirm the possible tumor conditions.

Discussion
This study designs a mammography classification method incorporating a multilayer CNN-based classifier for automatic breast tumor screening in clinical applications. The proposed classifier algorithm is implemented in LabVIEW 2019 (NI™) software, MATLAB Script tools, and the open-source TensorFlow platform (Version 1.9.0) [28] and is integrated into a computer-assistive system with automatic and manual feature extraction and breast tumor screening modes. The fractional-order convolutional layer and two convolutional-pooling layers allow image enhancement and sharpening of the possible tumor edges, contours, and shapes via one fractional-order and two kernel convolutional processes applied to the feature patterns. Through a series of convolution and pooling processes at different scales and dimensions, the classifier obtains nonlinear feature representations ranging from low-level features to high-level information [29]. Then, with specific bounding boxes (automatic or manual mode) for ROI extraction, the enhanced feature patterns can be distinguished for further breast tumor screening by the multilayer classifier in the classification layer. A gradient-descent optimization method, namely the ADAM algorithm, is used in the back-propagation process to adjust the network weight parameters in the classification layer. With K-fold (K_f = 10) cross-validation and 466 randomly selected untrained feature patterns for each test fold, the proposed multilayer CNN-based classifier achieves high recall (%), precision (%), accuracy (%), and F1 scores for screening abnormalities in both right and left breasts. Experimental results show that the proposed multilayer CNN model offers image enhancement, feature extraction, automatic screening capability, and higher average accuracy (larger than 95%) for separating the normal condition from the possible tumor classes.
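The ADAM update used in the classification layer can be sketched in NumPy; the learning rate and the toy quadratic loss below are illustrative choices, not the training configuration used in this study.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update: exponentially decayed first and second moment
    estimates, bias correction, then the parameter step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)      # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)      # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize a toy quadratic loss 0.5 * ||w||^2 (so the gradient is w):
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    w, m, v = adam_step(w, w.copy(), m, v, t, lr=0.05)
```

The per-coordinate step is bounded by roughly the learning rate regardless of gradient magnitude, which is what makes ADAM easy to operate on large datasets with minimal tuning, as noted in the advantages below.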
Previous literature [3-7,10,56] shows that multilayer CNNs comprising several convolutional-pooling layers and a fully connected network can establish classifiers for automatic breast tumor screening and can also be applied to CT, MRI, chest X-ray, and ultrasound image processing tasks, such as image classification and segmentation, in clinical applications [19,23,28,35,36,51,55]. The combination of a cascade of deep learning and a fully connected network has also been realized as a multilayer CNN-based classifier with a decision scheme [56]. For screening suspicious regions on mammograms, the cascaded deep-learning method achieved 98% sensitivity and 90% specificity on the SuReMaPP (Suspicious Region Detection on Mammogram from PP) dataset and 94% sensitivity and 91% specificity on the mini-MIAS dataset [56]. Such CNN-based multilayer classifiers can extract multi-scale feature patterns and increase the depth and width of the feature patterns by using multiple convolutional-pooling processes, which increases the overall accuracy. However, excessive convolutional processing leads to a loss of internal data about the position and orientation of the desired object, and excessive pooling loses valuable information about the spatial relationships between features; many of these processes therefore require GPU hardware for their complex computations. Hence, the proposed optimal multilayer CNN architecture retains 2D spatial information in the fractional-order convolutional layer (with two fractional-order convolutional windows) and continuously enhances the features with two rounds of convolutional-pooling processes (with 16 kernel convolutional windows and 16 maximum pooling windows), which can extract the desired features at different scales and levels.
Thus, in comparison with other deep-learning methods, the proposed multilayer classifier exhibits promising results for the desired medical diagnostic purpose. The proposed CNN-based classifier has the following advantages:

•	The ROI extraction, image enhancement, and feature classification tasks are integrated into one learning model;
•	The fractional-order convolutional process with fractional-order parameter v = 0.30-0.40 is used to extract the tumor edges in the first convolutional layer; subsequently, two kernel convolution processes are used to extract the tumor shapes;
•	The ADAM algorithm is easy to implement and operate with large datasets and parameter adjustment;
•	The proposed CNN-based classifier has better classification accuracy than the CNN architecture with Sobel or Histeq convolutional windows in the first convolutional layer.

Conclusions
The proposed CNN architecture has better learning ability for complex feature patterns in massive training datasets and more promising classifier performance than traditional CNN-based classifiers and cascaded deep-learning-based classifiers. Through experimental testing and validation, we suggest an optimal architecture for a simplified multilayer CNN-based classifier, consisting of a fractional-order convolutional layer, two kernel convolutional-pooling layers, and a classification layer. This optimal CNN-based classifier could replace manual screening for tasks requiring specific expertise and experience in medical image examination, and its clinical applicability is supported by training on the CBIS-DDSM and SuReMaPP datasets. Additionally, in real-world applications, as clinical mammograms with biomarkers are continuously obtained, new feature patterns can be extracted and added to the current database to further train the CNN-based classifier, which preserves its intended medical purpose and allows it to serve as a computer-aided decision-making tool and as software in a medical device.

Conflicts of Interest:
The authors declare no conflict of interest.