Histopathological Breast-Image Classification Using Local and Frequency Domains by Convolutional Neural Network

Identification of the malignancy of tissues from Histopathological images has always been an issue of concern to doctors and radiologists. This task is time-consuming, tedious and, moreover, very challenging. Success in finding malignancy from Histopathological images primarily depends on long-term experience, and even experts sometimes disagree in their decisions. However, Computer Aided Diagnosis (CAD) techniques give the radiologist a second opinion that can increase the reliability of the radiologist's decision. Among the different image-analysis techniques, classification of images has always been a challenging task; due to the intense complexity of biomedical images, it is very difficult to provide a reliable decision about an image. The state-of-the-art Convolutional Neural Network (CNN) technique has had great success in natural-image classification. Utilizing advanced engineering techniques along with the CNN, in this paper we classify a set of Histopathological Breast-Cancer (BC) images utilizing a state-of-the-art CNN model containing a residual block. A conventional CNN takes raw images as input and extracts global features; however, object-oriented local features also contain significant information. For example, the Local Binary Pattern (LBP) represents effective textural information, the Histogram represents the pixel-intensity distribution, the Contourlet Transform (CT) gives detailed information about the smoothness of edges, and the Discrete Fourier Transform (DFT) derives frequency-domain information from the image. Utilizing these advantages, along with our proposed novel CNN model, we have examined the performance of the novel CNN model as a Histopathological image classifier.
To do so, we have introduced five cases: (a) Convolutional Neural Network Raw Image (CNN-I); (b) Convolutional Neural Network CT Histogram (CNN-CH); (c) Convolutional Neural Network CT LBP (CNN-CL); (d) Convolutional Neural Network Discrete Fourier Transform (CNN-DF); (e) Convolutional Neural Network Discrete Cosine Transform (CNN-DC). We have performed our experiments on the BreakHis image dataset. The best performance is achieved when we utilize the CNN-CH model on the 200× dataset, which provides Accuracy, Sensitivity, False Positive Rate, False Negative Rate, Recall, Precision and F-measure of 92.19%, 94.94%, 5.07%, 1.70%, 98.20%, 98.00% and 98.00%, respectively.


Introduction
Cancer, being a serious threat to human life, is actually a combination of diseases; more specifically, unwanted and abnormal growth of the cells of the human body is known as cancer. Cancer can attack any part of the body and can then spread to any other part. Different types of cancer exist but, among all the cancers, women are more vulnerable to Breast Cancer (BC) than men because of the anatomical structure of women. Statistics show that each year more people are newly affected by BC, at an alarming rate. Figure 1 shows the number of females newly facing BC, as well as the number of females dying of it, each year since 2007 in Australia. This figure shows that more and more females are newly facing BC, and that the number of females dying of it has also increased each year. This is the situation of Australia (population 20-25 million), but it can be taken as representative of the BC situation of the whole world. Proper investigation is the first step in the proper treatment of any disease. Investigation of BC largely depends on the investigation of biomedical images such as Mammograms, Magnetic Resonance Imaging (MRI) scans, Histopathological images, etc. Manual investigation of these kinds of images largely depends on the expertise of the doctors and physicians. As humans are error-prone, even an expert can give wrong information about the diagnostic images. Besides this, biomedical image investigation always requires a large amount of time. However, Computer Aided Diagnosis (CAD) techniques are widely utilized for biomedical image analysis, such as cancer identification and classification. The use of CAD allows the patient and doctor to obtain a second opinion.
Different biomedical image analysis techniques are available, and different research groups have investigated the identification and classification of BC. The conventional image-classification techniques, such as the Support Vector Machine (SVM), Random Forest (RF) and Bayesian classifier algorithms, have been widely utilized for image classification. Utilizing an SVM, a set of cancer images was first classified by Bazzani et al., and their findings were compared with the Multi-Layer Perceptron (MLP) technique [1]. Naqa et al. [2] utilized the kernel method along with SVM techniques for better classification performance, obtaining around 93.20% accuracy. A set of Histopathological images was classified using Scale Invariant Feature Transform (SIFT) and Discrete Cosine Transform (DCT) features with an SVM by Mhala et al. [3]. Laws' texture features were utilized for Mammogram image classification (322 images), and 86.10% accuracy was obtained by Dheeba et al. [4]. Taheri et al. [5] utilized intensity information, the Auto-Correlation Matrix and energy values for breast-image classification and obtained 96.80% precision and 92.50% recall with 600 Mammogram images. A set of ultrasound images was classified by Shirazi et al. [6], where Regions of Interest (ROI) were extracted to reduce the computational complexity. Levman et al. [7] classified a set of MRI images (76 images) into benign and malignant classes, utilizing relative signal intensities and the derivative of signal strength as features.
The RF method has also been used for image classification. A set of Mammogram images was classified by Angayarkanni et al. [8], and they achieved 99.50% accuracy using the Gray-Level Co-occurrence Matrix (GLCM) as the feature. Gatuha et al. [9] utilized Mammogram images for image classification using a total of 11 features and achieved 97.30% accuracy. Breast Histopathological images were classified by Zhang et al. [10], who achieved 95.22% accuracy using the Curvelet Transform, GLCM and Completed Local Binary Pattern (CLBP) methods for feature extraction. The GLCM and Gray-Level Run-Length Matrix (GLRLM) were utilized along with the RF algorithm by Diz et al. [11] for Mammogram image classification, with 76.60% accuracy. The Bayes method has also been used for image classification. Kendall et al. [12] utilized the Bayes method for Mammogram image classification with the DCT method for feature selection; their obtained sensitivity was 100.00% and specificity was 64.00%. Statistical and Local Binary Pattern (LBP) features along with the Bayesian method were utilized by Claridge et al. [13] on two Mammogram image sets. When they used the Mammographic Image Analysis Society (MIAS) dataset, their best achieved accuracy was 62.86%.
Other than the RF, SVM and Bayes methods, the Neural Network (NN) method has also been widely utilized for image classification. Rajakeerthana et al. [14] classified a set of Mammogram images and obtained 99.20% accuracy. Thermographic images were classified by Lessa et al. [15], who utilized the NN method along with a few statistical values such as mean, median, skewness and kurtosis as features, and obtained 85.00% accuracy with a specificity value of 83.00%. Haralick and Tamura features were utilized by Peng et al. [16] along with an NN; they used Rough-Set theory for feature reduction. Silva et al. [17] utilized 22 different morphological features, such as convexity, lobulation index and elliptic-normalized skeleton, along with an NN for ultrasound image classification and obtained 96.98% accuracy. Melendez et al. [18] utilized Area, Perimeter, Circularity, Solidity, etc. along with an NN and achieved sensitivity and specificity of 96.29% and 99.00%.
As the literature shows, different methods and techniques have been applied to different breast-image datasets for image classification. However, the state-of-the-art Convolutional Neural Network (CNN) has put a strong footprint in the image-analysis field, especially in image classification. Though the "AlexNet" model proposed by Krizhevsky gave new momentum to the CNN research field, a CNN model was first utilized by Fukushima et al. [19], who proposed the "Neocognitron" model that recognises stimulus patterns. For Mammogram image classification, Wu et al. first utilized the CNN model [20]. Though little work on CNN models had been done by the end of the 20th century, this approach only gained momentum with the AlexNet model. Advanced engineering techniques have been used by research groups such as the Visual Geometry Group and Google, who modeled the VGG-16, VGG-19 and GoogleNet architectures. Arevalo et al. [21] classified benign and malignant lesions using a CNN model; this experiment was performed on 766 mammogram images, of which 426 images contain benign and 310 malignant lesions. Before classifying the data, they utilized preprocessing techniques for image enhancement and obtained a 0.82 ± 0.03 Receiver Operating Characteristic (ROC) value. The GoogleNet and AlexNet methods were utilized by Zejmo et al. [22] for the classification of cytological specimens into benign and malignant classes; the best accuracy, obtained with the GoogleNet model, was 83.00%. Qiu et al. [23] used the CNN method to extract global features for Mammogram image classification and obtained an average accuracy of 71.40%. Fotin et al. also utilized the CNN method, for tomosynthesis image classification, and obtained an Area Under the Curve (AUC) value of 0.93. Transfer learning is another important concept of the CNN method that allows a model not to extract features from scratch, but rather to apply a weight-sharing concept to train the model. This method is helpful when the database contains fewer images. Jiang et al. [24] utilized a transfer learning method for Mammogram image classification and obtained an AUC of 0.88; before utilizing it in a CNN model, they performed a preprocessing operation to enhance the images. Suzuki et al. [25] also used the benefit of transfer learning techniques to train their model to classify mammogram images and obtained a sensitivity of 89.9%. They performed their experiment with only 198 images.
Most image classification based on the CNN method has been performed with global feature-extraction techniques. Recently, researchers have also shown an interest in how local features can be utilized with the CNN model for data classification. Both global and local features were utilized by Rezaeilouyeh et al. [26] for Histopathological image classification; for local feature extraction, the authors utilized the Shearlet transform and obtained an accuracy of 86 ± 3.00%. For local feature extraction, Sharma et al. [27] used the GLCM and GLDM methods and then fed the local features to a CNN model for Mammogram image classification, obtaining 75.33% accuracy for fatty and dense tissue classification. Both global and local features were used by Jiao et al. [28] for mammogram image classification, and they obtained 96.70% accuracy. Kooi et al. [29] utilized both global features and hand-crafted features for Mammogram image classification; in their experiment, they also utilized the transfer learning method.
The Contourlet Transform (CT) has been used for image analysis. Using CT, the distribution of Mammograms (MIAS dataset) was calculated by Anand et al. [30]. Moayedi et al. [31] utilized CT features, along with GLCM and morphological features, for Mammogram image classification with the SVM method and obtained a mean Accuracy of around 100.00%. The non-subsampled CT was utilized for breast-mass classification by Leena Jasmine along with SVM techniques [32]. Pak et al. also utilized the non-subsampled CT for breast-image (MIAS dataset) classification and obtained 91.43% mean Accuracy and 6.42% mean False Positive Rate (FPR) [33].
Inspired by the usefulness of local-feature utilization techniques with the CNN, this paper also classifies a set of Histopathological images (BreakHis dataset) using local features along with the CNN model. For the local-feature selection, we have utilized the CT, LBP and Histogram information. We have also extracted frequency-domain information and tried to find how the CNN model behaves when we provide frequency-domain information. To do so, we have organized our paper as follows: Section 1 describes related research, Section 2 describes the overall architecture for the image classification, Section 3 describes the feature-extraction and data-preparation techniques, Section 4 describes the novel Convolutional-Neural-Network (CNN) model, Section 5 describes the performance-measuring parameters, Section 6 describes the performance of our model on the BreakHis dataset as well as comparing it with existing findings, and we conclude our paper in Section 7.

Overall Architecture
Benign and Malignant image classification has always been a challenging task. The level of complication of the data classification increases when we consider Histopathological images, such as those shown in Figure 2. Conventionally, handcrafted features or local features are extracted and utilized as the input of a classifier model. However, in most work using CNN-based image classification, raw images are fed directly to the CNN model. From the raw images, the CNN model tries to extract features globally. In this work, we have utilized raw images as well as descriptive handcrafted local features and frequency-domain information for the image classification along with the CNN model. Figure 3 shows the overall classifier model that has been used for the data classification.

Image Dataset
1. Case2a: Selected statistical information has been collected from the CT coefficient data, and this statistical data has been further concatenated with the Histogram information. This case has been named CNN-CH. The feature matrix for each of the images is the matrix H_CH.
2. Case2b: Selected statistical information has been collected from the CT coefficient data, and this statistical data has been further concatenated with the LBP information. This case has been named CNN-CL. The feature matrix for each of the images is the matrix H_CL.

• Case3: Case3 utilizes frequency-domain information for the image classification, collected using the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT). This case has been further subdivided into two sub-cases:

1. Case3a: DFT coefficients have been utilized as an input for the classifier model, named CNN-DF. The feature matrix for each of the images is the matrix H_DF.
2. Case3b: DCT coefficients have been utilized as an input for the classifier model, named CNN-DC. The feature matrix for each of the images is the matrix H_DC.

Feature Extraction and Data Preparation
We have utilized three cases to analyse our data. Case1, or CNN-I, directly feeds the raw data to the CNN model for further analysis. However, Case2 and Case3 utilize handcrafted features based on the CT, Histogram, LBP, Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT).

Data Preparation for Case2
For Case2, we have extracted a set of statistical information utilizing the values of the CT coefficients, collected after applying the CT to each of the images. The CT is an extension of the Wavelet Transform (WT). The WT ignores the smoothness along the contour and provides less directional information about the images, whereas the CT overcomes this problem of the WT and gives better information about the contours and directional edges of an image [34]. The CT method utilizes a multi-scale Laplacian Pyramid (LP) and a Directional Filter Bank (DFB).

• Laplacian Pyramid (LP): The image pyramid is an image-representation technique where the represented image contains only the relatively important information. This technique also produces a series of replications of the original image, but those replicated images have lower resolution. A few pyramid methods are available, such as the Gaussian, Laplacian and WT pyramids. Burt and Adelson introduced the Laplacian Pyramid (LP) method. In the case of the CT, the LP filter decomposes the input signal into a coarse image and a detailed (bandpass) image [35]. Each bandpass image is further processed and the bandpass directional sub-band signals are calculated.

• Directional Filter Bank (DFB): A DFB sub-divides the input image into 2^(n+1) sub-bands. Each of the sub-bands has a wedge-shaped frequency response. Figure 4a shows the wedge-shaped frequency response for a 4-band response.
Let the input image I(x, y) be fed to the LP filters LP_n, where n = 1, 2, ..., N, which decompose I(x, y) into the low-pass signal L_n(x, y) and the detailed signal T_n(x, y). The detailed image T_n(x, y) is passed through a DFB to obtain the directional images. In general, the detailed image at level j, T_j(x, y), is further decomposed by the DFB into 2^(l_j) directional images C_j,k(l, j). Figure 4b shows the overall CT procedure. As the CT is an iterative operation, it continuously produces low-pass signals and directional signals down to some predefined level. Among the available low-pass and directional signals, we have deliberately selected a set of sixteen sub-band signals and computed statistical features from them: the maximum (MA), minimum (MI), kurtosis (KU) and standard-deviation (ST) values. The CT operation has been performed on each of the image channels individually. For a single channel of each input image, we calculated sixteen MA, sixteen MI, sixteen KU and sixteen ST values, which have been used as features. The features extracted from a single channel utilizing the CT and statistical methods can be written as F_CS = {16 × MA + 16 × MI + 16 × KU + 16 × ST}, so a single channel produces sixty-four feature values using CT and statistical information. As our images are RGB, we have utilized the Red, Green and Blue channels, so the total number of features due to the CT utilization is F_CT = 3 × F_CS.
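The per-channel feature assembly described above can be sketched as follows. A real implementation would take the sixteen sub-bands from an actual Contourlet decomposition, which is not shown here, so the random arrays below are only placeholders:

```python
import numpy as np
from scipy.stats import kurtosis

def subband_stats(subbands):
    """Collect the max (MA), min (MI), kurtosis (KU) and standard deviation
    (ST) of each sub-band produced by the decomposition of one channel."""
    feats = []
    for sb in subbands:
        sb = np.asarray(sb, dtype=float).ravel()
        feats.extend([sb.max(), sb.min(), kurtosis(sb), sb.std()])
    return np.array(feats)

# With 16 sub-bands per channel this yields 16 * 4 = 64 features (F_CS);
# 3 channels then give F_CT = 3 * 64 = 192 features per RGB image.
rng = np.random.default_rng(0)
subbands = [rng.normal(size=(8, 8)) for _ in range(16)]  # placeholder sub-bands
per_channel = subband_stats(subbands)
print(per_channel.size)  # 64
```

This matches the counts stated in the text: F_CS has 64 entries per channel and F_CT = 3 × 64 = 192 entries per RGB image.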

Histogram Information
A graphical display that represents the frequency of each particular intensity in an image is known as a histogram. Let the feature set collected from the histogram information of a single channel be represented as F_HIS. A single RGB image provides a total of F_HIT = 3 × F_HIS features, where the cardinality of F_HIT is 768. As Case2a, that is CNN-CH, utilizes the statistical information collected from the CT as well as the histogram information, the total concatenated feature set is F_C2a = {F_CT, F_HIT}, and the cardinality of F_C2a is 960. We have added zero padding at the end of the feature set F_C2a to reshape the F_C2a vector into a 31 × 31 matrix, producing the matrix H_CH.
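A rough sketch of this feature preparation is given below; the 192 CT statistics are represented by a zero placeholder vector here, since only the histogram part and the padding/reshape step are being illustrated:

```python
import numpy as np

def histogram_features(rgb):
    """256-bin intensity histogram per channel -> 3 * 256 = 768 features."""
    return np.concatenate([np.histogram(rgb[..., c], bins=256, range=(0, 256))[0]
                           for c in range(3)])

def to_31x31(features):
    """Zero-pad the concatenated feature vector (960 values for CNN-CH)
    up to 961 = 31 * 31 and reshape it into the matrix H_CH."""
    padded = np.zeros(31 * 31)
    padded[:features.size] = features
    return padded.reshape(31, 31)

img = np.random.default_rng(1).integers(0, 256, size=(64, 64, 3))
f_hist = histogram_features(img)                          # 768 values
h_ch = to_31x31(np.concatenate([np.zeros(192), f_hist]))  # 192 + 768 = 960
print(f_hist.size, h_ch.shape)  # 768 (31, 31)
```

Note that 960 features need only a single padded zero to fill the 961-entry 31 × 31 matrix.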

Local Binary Pattern
The Local Binary Pattern (LBP) was proposed by Ojala et al. [36]; it represents an image I(x, y) by a two-dimensional matrix, where each entry of this newly created matrix is labeled by an integer. Basically, this matrix represents the local pattern and structural distribution of the image information. A single channel provides 256 LBP features. Let the feature set collected from the LBP information of a single channel be represented as F_LBS. A single RGB image provides a total of F_LBT = 3 × F_LBS features, so the cardinality of F_LBT is 768. As Case2b, that is CNN-CL, utilizes the statistical features from the CT and the LBP, the total concatenated feature set is F_C2b = {F_CT, F_LBT}, with a cardinality of 960. We have added zero padding at the end of the feature set F_C2b to reshape the F_C2b vector into a 31 × 31 matrix, producing the matrix H_CL.
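A minimal, pure-NumPy version of the basic 8-neighbour LBP histogram could look like the following; it is a sketch of the standard operator rather than a reproduction of the exact LBP variant used in the paper:

```python
import numpy as np

def lbp_histogram(channel):
    """Basic 8-neighbour LBP: each interior pixel is re-labelled with the byte
    formed by thresholding its neighbours against it; the 256-bin histogram
    of those labels is the per-channel feature vector."""
    c = channel[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = channel[1 + dy: channel.shape[0] - 1 + dy,
                     1 + dx: channel.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return np.histogram(code, bins=256, range=(0, 256))[0]

img = np.random.default_rng(2).integers(0, 256, size=(32, 32, 3))
f_lbp = np.concatenate([lbp_histogram(img[..., ch]) for ch in range(3)])
print(f_lbp.size)  # 768
```

Concatenating the three per-channel histograms reproduces the 768-entry F_LBT vector described above.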

Data Preparation for Case3
For Case3, we have utilized frequency-domain information as the features. To find the frequency-domain information, we have utilized the DFT and DCT transforms.

DFT for Feature Selection
Frequency-domain information reveals valuable information about a signal, which can be extracted using the Fourier Transform. This frequency-domain information can be extracted from both continuous- and discrete-time signals. For discrete-time signals, DFT methods are utilized for the frequency-domain information extraction. To avoid the computational complexity and timing issues of the DFT, we have utilized the Fast Fourier Transform (FFT) to extract the frequency-domain information. As a Histopathological image contains three channels, the FFT coefficients have been extracted from each of the three channels: h_rf = FFT coefficients from the red channel, h_gf = FFT coefficients from the green channel, h_bf = FFT coefficients from the blue channel. The top "t" FFT coefficients have been selected from each channel, where t = h_1 × h_2. Here, H_DF represents the feature matrix for Case3a, that is, for CNN-DF.
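A sketch of this coefficient selection is given below. The paper does not specify how the "top t" coefficients are ordered, so this example simply keeps the first t = h_1 × h_2 magnitude entries in row-major order; the 31 × 31 shape is assumed to match the feature matrices of Case2:

```python
import numpy as np

def fft_feature_matrix(rgb, h1=31, h2=31):
    """Per channel, take the 2-D FFT magnitude and keep the first
    t = h1 * h2 coefficients as an h1 x h2 feature matrix."""
    t = h1 * h2
    per_channel = []
    for c in range(3):
        coeffs = np.abs(np.fft.fft2(rgb[..., c]))
        per_channel.append(coeffs.ravel()[:t].reshape(h1, h2))
    return np.stack(per_channel, axis=-1)   # H_DF, shape (h1, h2, 3)

img = np.random.default_rng(3).random((64, 64, 3))
h_df = fft_feature_matrix(img)
print(h_df.shape)  # (31, 31, 3)
```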

DCT for Feature Selection
The DCT method was first introduced in 1974 [37]. A few DCT variants are available, and among them the DCT-II method has been most widely utilized for image analysis. As a Histopathological image contains three channels, the DCT coefficients have been extracted from each of the three channels, and the top "t" DCT coefficients have been selected from each channel, where t = h_1 × h_2.
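A corresponding sketch using the 2-D DCT-II (here via SciPy's `dctn`, keeping the low-frequency top-left corner of the coefficient matrix; the 31 × 31 shape is again an assumption carried over from the earlier cases):

```python
import numpy as np
from scipy.fft import dctn

def dct_feature_matrix(rgb, h1=31, h2=31):
    """Per channel, apply the 2-D DCT-II and keep the first t = h1 * h2
    coefficients; the top-left corner holds the low frequencies."""
    per_channel = [dctn(rgb[..., c], type=2)[:h1, :h2] for c in range(3)]
    return np.stack(per_channel, axis=-1)   # H_DC, shape (h1, h2, 3)

img = np.random.default_rng(4).random((64, 64, 3))
h_dc = dct_feature_matrix(img)
print(h_dc.shape)  # (31, 31, 3)
```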

Convolutional Neural Network
A CNN model is a state-of-the-art method that has been widely utilized for image processing. A CNN model has the ability to extract global features in a hierarchical manner that ensures local connectivity as well as the weight-sharing property.

• Convolutional Layer: The Convolutional layer is considered the main working ingredient of a CNN model and plays a vital part in it. A kernel (filter), which is basically an n × n matrix, successively goes through all the pixels and extracts information from them.

• Stride and Padding: The number of pixels a kernel moves in a step is determined by the stride size; conventionally, the stride size is kept at 1. Figure 6a shows an input data matrix of size 5 × 5, which is scanned with a 3 × 3 kernel. The light-green image shows the output with stride size 1, and the green image represents the output with stride size 2. When we use a 3 × 3 kernel and stride size 1, the convolved output is a 3 × 3 matrix; however, when we use stride size 2, the convolved output is 2 × 2. Interestingly, if we use a 5 × 5 kernel on the above input matrix with stride 1, the output will be a 1 × 1 matrix. Thus, the size of the output image changes with both the size of the stride and the size of the kernel. To overcome this issue, we can add extra rows and columns at the borders of the matrices that contain only zeros. This addition of rows and columns containing only zero values is known as zero padding.
For example, Figure 6b shows how extra rows have been added at the top as well as the bottom of the original 5 × 5 matrix. Similarly, extra columns have been added at the beginning as well as the end of the original 5 × 5 matrix. Now, the olive-green image of Figure 6b shows a convolved image where we have utilized a kernel of size 3 × 3, stride size 1 and padding size one. The convolved image is also a 5 × 5 matrix, which is the same as the original data size. Thus, by adding the proper amount of zero padding, we can reduce the loss of information that lies at the border.
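The output-size arithmetic discussed above follows the standard convolution formula, against which the cases from Figure 6 can be checked:

```python
def conv_output_size(n, k, s=1, p=0):
    """Spatial size of a convolution output: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# The cases discussed above, for a 5x5 input:
print(conv_output_size(5, k=3, s=1))        # 3  (3x3 kernel, stride 1)
print(conv_output_size(5, k=3, s=2))        # 2  (3x3 kernel, stride 2)
print(conv_output_size(5, k=5, s=1))        # 1  (5x5 kernel, stride 1)
print(conv_output_size(5, k=3, s=1, p=1))   # 5  (padding 1 preserves the size)
```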

• Nonlinear Performance: Each layer of an NN produces a linear output and, by definition, adding two linear functions produces another linear function. Due to this linearity, adding more NN layers would show the same behavior as a single NN layer. To overcome this issue, a rectifier function, such as the Rectified Linear Unit (ReLU), Leaky ReLU, TanH, Sigmoid, etc., is introduced to make the output nonlinear.

• Pooling Operation: A CNN model produces a large amount of feature information. To reduce the feature dimensionality, a down-sampling method named a pooling operation is performed. A few pooling methods are well known, such as Max Pooling and Average Pooling. For our analysis, we have utilized the Max Pooling operation, which selects the maximum value within a particular patch.
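A naive implementation of the Max Pooling operation makes the patch-wise selection explicit:

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling: keep the maximum value of each size x size patch."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 5, 6]])
print(max_pool2d(x).tolist())  # [[6.0, 4.0], [7.0, 9.0]]
```

Each 2 × 2 patch of the 4 × 4 input collapses to its maximum, halving both spatial dimensions.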

• Drop-Out: Over-training a model makes it perform very poorly on the test dataset, which is known as over-fitting. This over-fitting issue is controlled by randomly removing some of the neurons from the network, a technique known as Drop-Out.

• Decision Layer: For the classification decision, a decision layer is introduced at the end of a CNN model. Normally, a Softmax layer or an SVM layer is used for this purpose. This layer contains a normalized exponential function and calculates the loss function for the data classification. Figure 7 shows the workflow of a generalized CNN model that can be used for image classification. Before the decision layer, there must be at least one dense layer in the CNN model. Utilizing the Softmax layer, the output of the end layer is a probability value L_i for each class. As we are working on a two-class classification problem, only the L_1 and L_2 values are possible, and the output will be Benign when L_1 ≤ L_2; otherwise, the output will be Malignant.

The R-n layer also produces 16 feature maps. The output X_Rn of the R-n layer is merged with the output X_Cn of the C-n layer and produces a residual output. The output of Block-n can be represented in terms of the weight matrix W_n and the bias vector B_n. The input matrix passes through Block-1 and Block-2 as shown in Figure 8 (left image). The output of Block-1 is fed to Block-3, the output of Block-3 is fed as an input of Block-5, the output of Block-5 is fed as an input of Block-7, and the output of Block-7 is fed as an input of Block-9.
Similarly, the output of Block-2 is fed to Block-4, the output of Block-4 is fed as an input of Block-6, the output of Block-6 is fed as an input of Block-8, and the output of Block-8 is fed as an input of Block-10. Now, the outputs of Block-9 and Block-10 are concatenated in the Concat layer.
After the Concat layer, a Flat Layer, a Drop-Out Layer and a Softmax layer have been placed one after another.The output of the Softmax layer has been used to classify the images into Benign and Malignant classes.

• Model-2: Model-2 utilizes almost the same architecture as Model-1. The only difference is that, in each Block-n, the output X_Cn of layer C-n is multiplied element-wise (rather than added) with the output X_Rn of layer R-n.

Performance Measuring Parameter and Utilized Platform
The performance of a classifier is measured by some benchmark criteria, which can be obtained from a two-dimensional matrix known as the Confusion Matrix [38]. The content of the matrix at position i = j represents how many times the target is correctly classified. Thus, it is expected that the off-diagonal entries of the Confusion Matrix should be as small as possible. Figure 9 shows a graphical representation of a Confusion Matrix, and Table 2 summarizes the performance measures derived from it.
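The measures reported later in the paper can all be computed from the four Confusion-Matrix entries; the counts below are illustrative only:

```python
def metrics(tp, tn, fp, fn):
    """Performance measures derived from the Confusion-Matrix entries."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "recall":      tp / (tp + fn),          # a.k.a. sensitivity
        "specificity": tn / (tn + fp),
        "fpr":         fp / (fp + tn),          # False Positive Rate
        "fnr":         fn / (fn + tp),          # False Negative Rate
        "precision":   tp / (tp + fp),
    }

m = metrics(tp=90, tn=80, fp=10, fn=20)  # illustrative counts
f_measure = 2 * m["precision"] * m["recall"] / (m["precision"] + m["recall"])
print(round(m["accuracy"], 3), round(f_measure, 3))  # 0.85 0.857
```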

Platforms Used
Image pre-processing related tasks were performed in MATLAB R2016b. Out of the available platforms for CNN model development, we selected Keras. Lastly, most of the matrix operations were performed on an NVIDIA GeForce GTX 1080 GPU, as the classification of images involves billions of matrix operations, which is not feasible on a low-grade CPU.

Results and Discussion
For the classification, we utilized the BreakHis dataset [39]. The images of this dataset are RGB in nature, having 8-bit depth and a Portable Network Graphics (PNG) extension. The images are 700 × 460 pixels in size. All the images are divided into four groups depending on the visual magnification factor, namely 40×, 100×, 200× and 400×, where × represents the magnification factor. We performed our experiments on the individual groups of the dataset.

Performance of 40× Dataset
Table 3 shows the performance of Model-1 and Model-2 on the 40× dataset. The overall best performance is achieved when CNN-CH along with Model-1 is utilized. In this situation, the achieved Accuracy is 94.40%, where the Recall and Precision values are 96.00% and 86.00%, respectively. For Model-1, CNN-CL provides a similar performance. When we use Model-1, the worst Accuracy of 86.47% is achieved when we utilize CNN-DC. For Model-2, the best Accuracy of 88.31% is achieved when we utilize the CNN-I algorithm. However, the achieved Recall value is 96.00% and the Specificity value is 69.45%, which indicates that almost 31.00% of the Benign images have been misclassified as Malignant images. When we utilize the CNN-CH algorithm along with Model-2, the Recall value is 100.00% and the FPR is 100.00%. This indicates that all the data, irrespective of Benign or Malignant, are classified as Malignant. In terms of Accuracy, CNN-DF and CNN-DC provide a similar performance; however, CNN-DF provides better Specificity performance than CNN-DC. More specifically, CNN-DC misclassifies almost 50.00% of the Benign data as Malignant data.

Performance of 100× Dataset
For the 100× dataset, when we utilized Model-1 and the CNN-CH algorithm, it provides almost 95.93% Accuracy, along with 94.85% Specificity and 96.36% Recall values (as in Table 4). This indicates that only 5.15% of the Benign data has been misclassified as Malignant, and 3.64% of the Malignant images have been misclassified as Benign. When we use CNN-I, that is, when we utilize raw images as input with Model-1, the Accuracy is 87.15%. In this particular situation, the Recall value is 93.30%, but the Specificity value is 67.42%. This indicates that almost one third of the Benign images have been misclassified as Malignant images, and this low Specificity value reduces the overall performance. CNN-DC and CNN-DF show similar performance when we utilize Model-1 on the 100× dataset. For Model-2, when we utilized CNN-I, that is, raw images as input, it produces the best Accuracy among all the cases. In this particular case, the Specificity value is 81.87% and the Recall value is 88.78%. CNN-DC also provides similar Accuracy to CNN-I; however, it shows a very poor Specificity performance of 65.71%. For Model-2, CNN-CH provides the worst performance among all the available cases, with 67.96% Accuracy, 43.00% Specificity and 78.00% Recall values.

Performance of 200× Dataset
For the 200× dataset, when Model-1 and CNN-CH are used together, 97.90% Accuracy is achieved, along with 94.94% Specificity and 98.20% Recall values (as in Table 5). This indicates that almost all the Malignant data have been classified as Malignant, whereas 5.06% of the Benign data have been misclassified as Malignant. When we use Model-2 along with CNN-CH, CNN-CL or CNN-DC on the 200× dataset, we get very poor performance: in all three cases, all the data is classified as Malignant irrespective of reality. In this scenario, the best performance is achieved when we utilize the raw images as input, that is, the CNN-I case. In this case, we achieved 86.00% Accuracy, along with 81.87% Specificity and 88.78% Recall values.

Performance of 400× Dataset
When we use Model-1 and the CNN-CH algorithm on the 400× dataset, the best performance is achieved (as in Table 6). In this case, the Accuracy is 96.00%, with 90.16% Specificity and 97.79% Recall values. CNN-DF and CNN-DC provide similar performance. When we utilized raw images as input, the achieved Accuracy is 84.43%. When we use Model-2 and the CNN-CH algorithm on the 400× dataset, the system gives the worst performance, of 67.80% Accuracy with 0.005% Specificity and 100.00% Recall values. Interestingly, CNN-I, CNN-DF and CNN-DC provide similar performance.

Required Time and Parameters
Table 7 shows the number of parameters and the time required per epoch for Model-1 and Model-2. Model-1 requires 119,666 parameters in total, whereas Model-2 requires 120,466 parameters. The authors of [40] utilized a total of 361 images for their experiment. From each of the images, they collected 1060 features, utilized the SVM method as the classifier tool, and obtained 96.40% Accuracy. Zhang et al. [41] also performed the classification operation on the same dataset utilizing ensemble methods and obtained 97.00% Accuracy. Ciresan et al. [42] and Wang et al. [43] both performed their experiments on the ICPR12 dataset, where they utilized global features. The findings of our paper cannot be compared directly with the above-mentioned findings because those experiments were performed on different datasets as well as using different classification techniques.
Spanhol et al. [44], Han et al. [45] and Dimitropoulos et al. [46] perform their experiments on the BreakHis dataset. Spanhol et al. [44] obtain their best performance, 90.40% Accuracy, on the 40× dataset. Han et al. [45] achieve 95.80 ± 3.10% Accuracy on the 40× dataset. Dimitropoulos et al. [46] obtain their best Accuracy on the 100× dataset using the Vector of Locally Aggregated Descriptors (VLAD) method. Our experiment has been performed on the BreakHis dataset and obtains a best Accuracy of 97.19%, which is comparable with the state-of-the-art findings of Han et al. [45]. The works by Spanhol et al. [44], Han et al. [45] and Dimitropoulos et al. [46] report Accuracy values only; in this paper, however, we also report Specificity, Precision and Recall, along with the required number of parameters and the time required to perform the experiment. Besides this, we have compared our results with the Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) methods and found that our algorithms provide better performance than both.

Figure 1. New cases of breast cancer for women and the number of women dying in the last twelve years.

Figure 2 shows example images: the left side represents a Benign and the right side a Malignant image. Every supervised classification technique follows a predefined working mechanism: selection of the dataset, the features and the model used to perform the classification. A set of performance-measuring parameters is then evaluated based on the model's output. The selected dataset is normally split into train and test datasets: a hypothetical model is established based on the training dataset, and this model's performance is later evaluated on the test dataset.
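The split-train-evaluate mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the 70/30 split ratio, the fixed seed, and the generic fit/predict interface are all assumptions made for the example.

```python
import numpy as np

def train_test_split(x, y, test_frac=0.3, seed=0):
    # Shuffle the sample indices, then carve off a held-out test set.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    cut = int(len(x) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return x[tr], y[tr], x[te], y[te]

def evaluate(model_fit, model_predict, x, y):
    # Fit a hypothetical model on the train split,
    # then score it on the unseen test split.
    xtr, ytr, xte, yte = train_test_split(x, y)
    state = model_fit(xtr, ytr)
    return np.mean(model_predict(state, xte) == yte)  # test accuracy
```

With a trivially separable two-class dataset and a nearest-centroid model, `evaluate` returns the test accuracy of the hypothetical model on the held-out portion.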

Figure 2. The left side represents a Benign and the right side a Malignant histopathological image (collected from the BreakHis dataset).

Figure 5. Feature-selection procedure from images when we use DFT and DCT.

Figure 6. The effects of kernel size, stride and zero padding in a convolutional operation.

Here, X_k^(end−1) represents the kth neuron at the (end − 1)th layer, and σ represents the nonlinear function. For binary classification, the number of classes is 2. Let d = 1 represent the Benign class; otherwise, d represents the Malignant class. The cross-entropy loss of Ȳ_d can be calculated as L_d = − ln(Ȳ_d).
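The loss L_d = − ln(Ȳ_d) can be computed numerically as below: a softmax over the two class scores produces the class probabilities Ȳ, and the loss is the negative log of the probability assigned to the true class d. This is a standard sketch of the formula, not the paper's implementation.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over the class scores.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def cross_entropy_loss(scores, d):
    # L_d = -ln(Y_d): negative log of the predicted probability of the
    # true class d (the paper indexes the Benign class as d = 1).
    y = softmax(scores)
    return -np.log(y[d])
```

With equal scores the predicted probabilities are 0.5 each, so the loss is ln 2; as the score of the true class grows, the loss approaches 0.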

Figure 7. Workflow of a Convolutional Neural Network.

4.1. CNN Model for Image Classification
For breast-image classification, we have utilized the CNN model with the following architectures:
• Model-1: Model-1 utilizes a residual block, represented as Block-n. Each Block-n contains two convolutional blocks, named C-n and R-n. The C-n layer convolves the input data with a 5 × 5 kernel along with a ReLU rectifier and produces 16 feature maps. The output X_C_n of the C-n layer passes through the R-n convolutional layer, which also utilizes a 5 × 5 kernel along with a ReLU rectifier.
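As a rough illustration of where parameter counts such as those in Table 7 come from, a convolutional layer's parameters can be counted as (kernel height × kernel width × input channels + 1 bias) × output feature maps. The input channel counts below are assumptions for the sketch, not values taken from the paper:

```python
def conv_params(kh, kw, in_ch, out_ch):
    # Weights (kh * kw * in_ch per filter) plus one bias per filter.
    return (kh * kw * in_ch + 1) * out_ch

# Hypothetical Block-n: C-n is assumed to take a 1-channel input,
# R-n takes C-n's 16 feature maps; both use 5x5 kernels, 16 maps out.
c_n = conv_params(5, 5, 1, 16)    # 416 parameters
r_n = conv_params(5, 5, 16, 16)   # 6416 parameters
```

Summing such per-layer counts over all convolutional and dense layers yields a model's total parameter budget.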

Figure 8. Architecture of Model-1 on the left and of Model-2 on the right.

Figure 10. (a–c) represent the Train and Test Accuracy, Loss and M.C.C. values when we utilized Model-1, CNN-CH on the 40× dataset.

Figure 11 shows the Accuracy, Loss and M.C.C. values for Model-2 on the 40× dataset. Among all the available Models and Cases, CNN-CH provides the worst performance on the 40× dataset when we utilize Model-2. In this particular situation, the Train Accuracy (71.00%) and the Test Accuracy (64.00%) are constant throughout the epochs. For the loss performance, the loss values for Train and Test are

Figures 12 and 13 show the Accuracy, Loss and M.C.C. values for the CNN-CH case on the 100× dataset when Model-1 and Model-2 have been utilized. Initially, up to around epoch 25, the Test Accuracy values show better performance than the Train Accuracy; after that, Train Accuracy shows better performance than Test Accuracy. After around epoch 50, Train Accuracy is about 96.00% and Test Accuracy is about 95.00%. For the loss performance, up to around epoch 21, the Test loss shows better values than the Train loss; after epoch 21, however, the Train loss continuously decreases, whereas the Test loss shows poor performance. For the M.C.C. values, after around epoch 80, the Train M.C.C. value is 0.98 and the Test M.C.C. value is around 0.95.

Figure 13. (a–c) represent the Train and Test Accuracy, Loss and M.C.C. values when we have utilized Model-2, CNN-CH on the 100× dataset.

Figure 14. (a–c) represent the Train and Test Accuracy, Loss and M.C.C. values when we utilized Model-1, CNN-CH on the 200× dataset.

We saw earlier that using CNN-CH, CNN-CL and CNN-DC with Model-2 on the 200× dataset gave very poor performance. Figure 15 shows the Accuracy, Loss and M.C.C. values when we utilized the CNN-CH algorithm on the 200× dataset. For the Accuracy, Train shows around 69.00% for all epochs up to 450; Test Accuracy, on the other hand, remains constant at around 67.5%.

Figure 16 shows the Accuracy, Loss and M.C.C. values for different epochs when we utilized Model-1, CNN-CH and the 400× dataset. Up to around epoch 15, the Train and Test Accuracy, Loss and M.C.C. values remain almost the same, with some exceptions. After around epoch 15, Train Accuracy shows better performance than Test Accuracy. After epoch 50, the Train Accuracy becomes constant at around 96.00%, whereas the Test Accuracy continues to improve. For the loss, as the epochs proceed, the difference between the Train loss and the Test loss increases. For M.C.C., the Train M.C.C. value reaches around 0.92.

Figure 16. Train and Test Accuracy, Loss and M.C.C. values for different epochs when we utilized Model-1, CNN-CH and the 400× dataset.

Figure 17 shows the Accuracy, Loss and M.C.C. values for different epochs when we utilized Model-2, CNN-CH and the 400× dataset. In this particular scenario, the Train Accuracy stays around 68.25%, whereas the Test Accuracy is around 66.50%. For the loss, the Train loss remains around 0.625 and the Test loss around 0.63; with some exceptions, these values remain constant for all epochs.

h_rt^d = Top-t DCT coefficients from the red channel, h_gt^d = Top-t DCT coefficients from the green channel, h_bt^d = Top-t DCT coefficients from the blue channel. Here, H_DC represents the feature matrix for Case 3b, that is, CNN-DC. Table 1 summarises the extracted local features for the different cases.

Table 1. Number of handcrafted features extracted for the different cases.
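The per-channel "Top t DCT coefficient" features can be sketched as below. This is an illustrative reading, not the paper's code: the DCT here is the orthonormal DCT-II, and "top t" is interpreted as the t largest-magnitude coefficients (the paper's exact selection rule, e.g. zig-zag order, may differ).

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1.0 / np.sqrt(n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def top_t_dct_features(channel, t):
    # 2-D DCT of one colour channel, then keep the t
    # largest-magnitude coefficients as the feature vector.
    rows, cols = channel.shape
    coeffs = dct_matrix(rows) @ channel @ dct_matrix(cols).T
    flat = coeffs.ravel()
    idx = np.argsort(np.abs(flat))[::-1][:t]
    return flat[idx]

def h_dc(image, t):
    # Concatenate the per-channel top-t DCT coefficients into one
    # feature vector, mirroring H_DC = [h_rt^d, h_gt^d, h_bt^d].
    return np.concatenate(
        [top_t_dct_features(image[..., c], t) for c in range(3)])
```

For a constant image, only the DC coefficient of each channel is nonzero, so `h_dc` reduces to three DC terms.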

Table 2. A summary of classification performance measurement parameters.
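The performance measures summarized in Table 2 can be computed from the confusion-matrix counts. The sketch below uses the standard definitions; the paper's exact formulations may differ slightly.

```python
def classification_metrics(tp, tn, fp, fn):
    # Standard confusion-matrix metrics: Accuracy, Specificity,
    # Recall (Sensitivity), Precision, F-measure, and the
    # False Positive / False Negative Rates.
    recall = tp / (tp + fn)          # a.k.a. Sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "Specificity": specificity,
        "Recall": recall,
        "Precision": precision,
        "F-measure": 2 * precision * recall / (precision + recall),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }
```

For example, 90 true positives, 80 true negatives, 20 false positives and 10 false negatives give 85% Accuracy, 90% Recall and 80% Specificity.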

Table 3. Performance of various cases on the 40× dataset.

Table 4. Performance of various cases on the 100× dataset.

Table 5. Performance of various cases on the 200× dataset.

Table 6. Performance of various cases on the 400× dataset.

Table 7. Required time and number of parameters.

Table 8 summarizes a few recent findings of Histopathological breast-image classification.

Table 8. Summary of a few recent findings of Histopathological breast-image classification.