Thermogram-Based Breast Cancer Detection: A Comparative Study of Two Machine Learning Techniques

Abstract: Breast cancer is considered one of the major threats to women's health all over the world. The World Health Organization (WHO) has reported that 1 in every 12 women could be subject to a breast abnormality during her lifetime. Early detection of breast cancer is very effective in increasing survival rates, and mammography-based screening is the leading technology for achieving this aim. However, mammography still cannot deal with patients with dense breasts or with tumors smaller than 2 mm. Thermography-based breast cancer detection can address these problems. In this paper, a thermogram-based breast cancer detection approach is proposed. This approach consists of four phases: (1) image pre-processing using homomorphic filtering, top-hat transform, and adaptive histogram equalization; (2) ROI segmentation using binary masking and K-means clustering; (3) feature extraction using the boundary signature; and (4) classification, in which two classifiers, Extreme Learning Machine (ELM) and Multilayer Perceptron (MLP), were used and compared. The proposed approach is evaluated using the public dataset DMR-IR. Various experimental scenarios (e.g., the integration of geometrical and textural feature extraction) were designed and evaluated using different measurements (i.e., accuracy, sensitivity, and specificity). The results showed that ELM-based results were better than MLP-based ones by more than 19%.


Introduction
Breast Cancer (BC) is a major threat to women around the world, causing high mortality. Early detection of breast cancer is very effective in increasing survival rates, and various approaches have been produced to achieve this. Traditional techniques are time-consuming and error-prone, as they require physicians and oncologists to rely on their experience and naked eyes in the diagnosis process. Therefore, highly efficient and automated techniques for detecting breast lesion cells are needed [1].
There are four early signs of breast cancer: architectural distortion, masses, microcalcification, and breast asymmetries [2]. Many medical imaging techniques aim to detect these signs, including mammography, magnetic resonance imaging (MRI), tomography, and ultrasound [3]. These techniques usually produce images which are then analysed to recognize benign or malignant patterns.
Mammography is the most well-known screening technique for detecting breast abnormalities. However, in the case of women with dense breasts, mammography does not work well because dense tissue may hide tumor cells [4,5]. Consequently, mammography screening could lead to high false negative/positive rates for such patients. Another disadvantage of mammography is that image formation needs X-ray radiation. As reported in [6], this radiation is found to increase the patient's chance of developing cancer in the future. Thus, there is a need for another screening technique to address these limitations.
Infrared-based (thermography) screening was found to be effective in addressing these limitations. Thermography depends on monitoring the physiological changes in a woman's breast. From such physiological changes, and before structural changes occur in the breast, breast cancer lesions can be detected at an early stage [7]. One of the most important advantages of thermography is that it can be used for women at early ages, women with breast implants, and women with high breast densities [8]. Other advantages are that thermography is a safe (non-ionizing and non-invasive) and painless medical imaging technique [6].
The main idea of breast thermography is that each object (the breast in our case) produces infrared radiation, which measures the vascular heat radiated [9,10]. It was found that a growing tumor has a higher metabolic rate than the surrounding area and an associated rise in local vascularization [11]. Based on this discovery, it was proved, through asymmetric heat patterns, that there is a difference between abnormal and normal thermogram images [11]. Visual asymmetric heat patterns can be noticed in Figure 1. As can be seen in this figure, the left image shows a normal breast in which the heat distributions in the left and right breasts are symmetric, while they are asymmetric in the right image. Usually, physicians/oncologists look for such abnormalities and decide subjectively. However, it is not always possible to diagnose all types of abnormalities found in thermograms with just the naked eye. Automatic or semi-automatic techniques such as Computer-Aided Detection or Diagnosis (CAD/CADx) are beneficial, as they can help physicians/oncologists discover more information about such abnormalities in medical images [7,12]. A typical CAD system usually consists of six major phases, as shown in Figure 2. This paper aims to conduct a comparative study between the ELM and MLP classifiers to establish which would give the best results in terms of performance and accuracy. To achieve this aim, a model is proposed as follows. Firstly, thermal images are pre-processed for noise reduction and enhancement. Secondly, they are segmented to get the tumor area. Thirdly, features of the tumor region are extracted to be used in the cancer detection phase (fourth phase), which is accomplished using the ELM and MLP algorithms, by which the tumor region is classified as either benign or malignant.
The contributions of this piece of research are summarized as follows. (1) Building a 2D feature vector model representing the tumor's region; this model includes two types of features: three textural features and seven geometrical ones. (2) Providing a deep analysis of two machine learning algorithms, ELM and MLP, for detecting breast cancer from digital thermography images. (3) Evaluating the proposed method using a large dataset consisting of 1345 digital thermography images from the public dataset in [13].
The remainder of this paper is organized as follows. An overview of related works is given in Section 2, while Section 3 presents a background of the used classifiers, ELM and MLP. Section 4 introduces the proposed method, while Section 5 presents the experiments and the discussion of the obtained results. Section 6 gives an analysis of the performance of using ELM and MLP in breast tumor detection systems. Finally, the conclusion and future work are highlighted in Section 7.

Related Work
Deborah A. Kennedy et al. [14] proposed a breast cancer detection method combining thermography and mammography. The results of this method showed that thermogram-based detection alone achieved 83% sensitivity and mammogram-based detection 90%, while combining thermograms with mammograms achieved 95% sensitivity.
Sourav Pramanik et al. [15] proposed a feature extraction method known as Block Variance (BV), which depends on local texture, and used it for thermogram-based breast cancer detection. In the classification phase, they used a feed-forward neural network trained with a gradient descent rule. They evaluated their method on the public DMR database, but used only 100 images (40 malignant and 60 benign) with an asymmetry analysis approach. The results showed that this system achieves good classification accuracy (less than 0.1 false-positive rate) compared to related work. However, this good result was achieved using a very small dataset.
Rafal Okuniewski et al. [16] proposed thermogram-based breast cancer detection based on classifying contours visible on thermogram images taken directly from the Braster device. The images are first classified based on their contours, and then their attributes are computed. These attributes are then classified using four classifiers, namely Decision Tree, Naive Bayes, Random Forest, and SVM, to evaluate which of them gives the best cancer detection results. This evaluation showed that the Random Forest classifier achieved the best results measured by sensitivity and specificity. However, this good result was achieved using a private dataset collected from the Braster device.
Acharya et al. [17] suggested a thermogram-based method to detect breast cancer early. They extracted several statistical features, including energy, mean, homogeneity, and entropy, from each image. They then used an SVM to detect the images with a cancer tumor. In the experiments, they used 50 thermograms (25 normal and 25 abnormal). The results showed that the SVM achieved a specificity of 90.48% and a sensitivity of 85.71%. A comparison with the sensitivity results achieved by an expert radiologist showed that the proposed method is better by 8%. Although these are good results, they were achieved using a small dataset, so they cannot be generalized.
Gaber et al. [1] suggested automatic segmentation followed by classification of normal and abnormal breasts. The segmentation is based on an optimized Fast Fuzzy C-means algorithm and Neutrosophic sets. An SVM classifier is then used to differentiate between normal and abnormal images (i.e., patients). The proposed method was evaluated using sensitivity and accuracy, with the highest accuracy at 88.41%. However, the dataset used in this evaluation was small (29 healthy and 34 malignant), so the results cannot be generalized.
Gogoi et al. [18] proposed the use of SVD to distinguish abnormal thermograms from normal ones. Measured by accuracy, specificity, and sensitivity, the experimental results showed that the proposed method achieved 98.00%. Although the dataset used (45 abnormal and 100 normal) was bigger than those used in [15,17], these results still cannot be generalized from such a small dataset.
Sathish et al. [19] conducted a comparative study between Ensemble Bagged Trees and AdaBoost for the detection of breast cancer from thermograms. They utilized two types of features, spectral and spatial, in the classification phase. The evaluation results showed that the Ensemble Bagged Trees classifier is better than AdaBoost, with an accuracy of 87%, a sensitivity of 83%, and a specificity of 90.6%. However, the size of the used dataset was not given.

Multilayer Perceptron (MLP) Classifier
MLP is a type of Artificial Neural Network (ANN) comprising three kinds of layers: input, output, and hidden. The number of hidden layers depends on the application as well as on the designer of the ANN. Each node in an MLP classifier executes two functions. The first function calculates the weighted sum of the inputs (P) along with the bias (θ) [20]:

net_k = Σ_{i=1}^{m} W_ik P_i + θ_k,

where P_i is the ith input, W_ik denotes the relation weight from input neuron i to the kth hidden neuron, θ_k is the bias of the kth hidden neuron, and m is the number of input neurons.

The second function, called the activation function, produces the output of each neuron, i.e., h_k = f(net_k). The output layer is then calculated using the following equation:

y_l = f(Σ_k W_kl h_k + B_l),

where W_kl denotes the relation weight from the kth hidden neuron to the lth output neuron and B_l is the bias of the lth output neuron. In the output layer, an activation function is again used to produce the output of each neuron.
Training an MLP consists of finding the weight and bias values that produce the desired output. Figure 3 gives a schematic illustration of the MLP model.
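The two per-neuron functions described above can be sketched as follows (a minimal NumPy illustration, not the paper's MATLAB implementation; the layer sizes and the log-sigmoid activation are assumptions of this example):

```python
import numpy as np

def logsig(x):
    # Log-sigmoid activation applied at each neuron
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(P, W_ih, theta_h, W_ho, theta_o):
    """Forward pass of a one-hidden-layer MLP.

    P: inputs (m,); W_ih: (m, k) input-to-hidden weights; theta_h: (k,)
    hidden biases; W_ho: (k, l) hidden-to-output weights; theta_o: (l,) biases.
    """
    net_h = P @ W_ih + theta_h          # first function: weighted sum plus bias
    h = logsig(net_h)                   # second function: activation per neuron
    return logsig(h @ W_ho + theta_o)   # output layer repeats both steps

rng = np.random.default_rng(0)
out = mlp_forward(rng.random(10), rng.random((10, 5)) - 0.5,
                  rng.random(5), rng.random((5, 1)) - 0.5, rng.random(1))
```

Training would then adjust `W_ih`, `theta_h`, `W_ho`, and `theta_o` by backpropagation until the network output matches the desired labels.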

Extreme Learning Machine (ELM) Classifier
ELM is a learning algorithm for Single Hidden Layer Feed Forward Neural Networks (SHLFNNs). It works as follows: it randomly assigns the input weights and hidden-layer biases and then analytically determines the output weights of the SHLFNN. ELM can achieve better performance than other traditional learning algorithms in terms of learning speed. Also, ELM is less susceptible to user-specified parameters and can be deployed faster and more easily [21]. ELM can be expressed using the following equations. For S distinct samples (y_k, u_k), where y_k = [y_k1, y_k2, ..., y_km]^T ∈ R^m and u_k = [u_k1, u_k2, ..., u_kn]^T ∈ R^n, a standard SHLFNN with L hidden nodes and activation function f(y) is mathematically modeled as

Σ_{j=1}^{L} β_j f(w_j · y_k + b_j) = o_k,   k = 1, ..., S,

where w_j = [w_j1, w_j2, ..., w_jm]^T is the weight vector linking the jth hidden node and the input nodes, β_j = [β_j1, β_j2, ..., β_jn]^T is the weight vector linking the jth hidden node and the output nodes, b_j is the threshold of the jth hidden node, and o_k is the kth output vector of the SHLFNN [21].
Given L hidden nodes and activation function f(y), the SHLFNN can approximate these S samples with zero error [21], i.e.,

Σ_{k=1}^{S} ||o_k − u_k|| = 0,

and there exist β_j, w_j, and b_j such that

Σ_{j=1}^{L} β_j f(w_j · y_k + b_j) = u_k,   k = 1, ..., S.

These S equations can be written compactly as

H β = U,

where H is the hidden-layer output matrix of the neural network, with H_kj = f(w_j · y_k + b_j); the jth column of H is the jth hidden node's output for the inputs y_1, y_2, ..., y_S. From the solution of this linear system, the smallest-norm least-squares solution β̂ can be obtained as

β̂ = H† U,

where H† is the Moore-Penrose generalized inverse of the matrix H. Thus, the output function of ELM can be expressed as f_L(y) = Σ_{j=1}^{L} β̂_j f(w_j · y + b_j).
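The ELM training procedure above — random hidden parameters, then a pseudoinverse solve for β — can be sketched as follows (a minimal NumPy sketch; the tanh activation and the uniform weight range are assumptions of this example):

```python
import numpy as np

def elm_train(Y, U, L, seed=0):
    """Train an ELM on inputs Y (S, m) and targets U (S, n) with L hidden nodes."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (Y.shape[1], L))  # random input weights w_j
    b = rng.uniform(-1, 1, L)                # random hidden biases b_j
    H = np.tanh(Y @ W + b)                   # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ U             # smallest-norm least-squares beta
    return W, b, beta

def elm_predict(Y, W, b, beta):
    # Output function of the trained ELM
    return np.tanh(Y @ W + b) @ beta
```

With L at least as large as the number of training samples, the network can generically fit the training set with numerically zero error, as the zero-error condition above states.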

The Proposed Method
As shown in Figure 2 earlier, there are four main stages in a typical breast cancer detection system using digital thermography: (a) image preprocessing, including artifact removal; (b) segmentation (extracting the Region of Interest (ROI)); (c) feature extraction (i.e., extracting the different features representing each sample); and (d) breast tumor detection (i.e., classifying each sample as either normal or abnormal). In the proposed method, these phases are summarized as follows (more details are given in Sections 4.1-4.4):

• In the image pre-processing stage, noise reduction was achieved using a combination of homomorphic filtering and spatial-domain morphology, and image enhancement was accomplished using the top-hat transform and adaptive histogram equalization.
• In the segmentation stage, the tumor was segmented using binary masking and K-means clustering to produce the boundary image.
• In the feature extraction stage, two types of features were extracted: geometrical and textural features.
• In the tumor detection stage, the MLP and ELM classifiers were used and compared.

Image Preprocessing Stage
In this stage, two main processes are performed: noise reduction and image enhancement.

Noise Reduction
The noise reduction process is achieved using a homomorphic filtering technique. This technique is applied to (1) remove multiplicative noise and (2) improve the appearance of the intensity-range illumination. As demonstrated in Algorithm 1, homomorphic filtering first applies linear techniques to map the input image to a different domain and then maps the output back to the original image domain. It also involves morphological operations to delete any noise and smooth the image's edges [22]. The results of applying this technique and the morphological operations are shown in Figure 4.

Algorithm 1 A homomorphic filtering technique
1: Read the original thermal image, img.
2: Convert the image img(v, w) into the logarithm domain: z(v, w) = ln(img(v, w)) = ln(f(v, w)) + ln(l(v, w)), where f is the reflectance component and l is the illumination component.
3: Filter the output image with a high-pass filter using the Fast Fourier Transform (FFT): Z(u, v) = F(u, v) + L(u, v).
4: Filter the output again using the transfer function of a frequency-domain filter: H(u, v) = (r_H − r_L)[1 − exp(−D(u, v)² / D_0²)] + r_L, where r_H > 1 and r_L < 1 are the regulation parameters that change the high and low frequencies respectively, D is the balance parameter (the distance from the centre of the frequency plane), and D_0 is the harmonic coefficient.
5: Apply the inverse FFT to the output of the frequency-domain filter.
6: Apply the exponential to the output of the inverse FFT to return to the intensity domain.
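Algorithm 1 can be sketched in a few lines of NumPy (a simplified illustration, not the authors' MATLAB code; the default parameter values r_h, r_l, c, and d0 are assumptions of this sketch):

```python
import numpy as np

def homomorphic_filter(img, r_h=2.0, r_l=0.5, c=1.0, d0=10.0):
    """Homomorphic filtering: log -> FFT -> high-emphasis filter -> IFFT -> exp.

    r_h > 1 boosts high frequencies (reflectance), r_l < 1 attenuates
    low frequencies (illumination); d0 is the cutoff distance.
    """
    z = np.log1p(img.astype(float))            # Step 2: logarithm domain (log1p avoids log(0))
    Z = np.fft.fftshift(np.fft.fft2(z))        # Step 3: FFT, centred
    rows, cols = img.shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2     # squared distance from the centre
    H = (r_h - r_l) * (1 - np.exp(-c * D2 / d0 ** 2)) + r_l   # Step 4: transfer function
    g = np.real(np.fft.ifft2(np.fft.ifftshift(H * Z)))        # Step 5: inverse FFT
    return np.expm1(g)                         # Step 6: exponential back to intensities
```

The morphological clean-up mentioned in the text would then be applied to the filtered result.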

Image Enhancement
The enhancement process was achieved using Algorithm 2, which is designed based on the ideas in [22].

Algorithm 2 The enhancement process
1: Read the thermal image, G(v, w).
2: Apply the top-hat transform to G(v, w) to separate the objects; let toph1 be the output. The size and shape of the structuring element are selected based on the size and shape of the masses.
3: Apply the dilation operator to gradually enlarge the boundaries of foreground-pixel regions and smooth the borders of the top-hat-transformed image; let toph2 be the output.
4: Apply the bot-hat transform to smooth the objects in the image [22]; let toph3 be the output.
5: Combine these images using arithmetic addition and subtraction, as shown in Figure 5.
6: Apply the adaptive histogram equalization (AHE) technique to improve the contrast of the image produced in Step 5.

The AHE step computes several histograms of an image, each corresponding to a distinct section of the image, and then uses these histograms to redistribute the lightness values of the image. Thus, it enhances the edge definition in each region of the image. The results of this algorithm are summarized in Figure 6.
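A stripped-down version of AHE can be sketched as below (each tile is equalized independently; production implementations such as CLAHE also interpolate between tiles and clip the histogram — those refinements are omitted here, and the tile count is an assumption of this sketch):

```python
import numpy as np

def equalize(tile, levels=256):
    # Classic histogram equalization of one tile via its cumulative histogram
    h, _ = np.histogram(tile, bins=levels, range=(0, levels))
    cdf = h.cumsum() / tile.size
    return np.floor(cdf[tile.astype(int)] * (levels - 1))

def adaptive_hist_eq(img, tiles=4, levels=256):
    """Simplified AHE: equalize each tile of the image independently.

    Assumes image dimensions are divisible by `tiles` and grey levels
    lie in [0, levels).
    """
    out = np.empty_like(img, dtype=float)
    h_step = img.shape[0] // tiles
    w_step = img.shape[1] // tiles
    for i in range(tiles):
        for j in range(tiles):
            sl = (slice(i * h_step, (i + 1) * h_step),
                  slice(j * w_step, (j + 1) * w_step))
            out[sl] = equalize(img[sl], levels)   # per-tile redistribution
    return out
```

Because each tile gets its own histogram, contrast is stretched locally, which is what sharpens the edge definition in each region.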

Segmentation Stage
The segmentation stage consists of three main steps, which are based on the ideas in [22]: applying K-means clustering to extract the ROI, followed by binary morphology to remove small bright details, and finally applying the morphological gradient to remove the constant-intensity areas and enhance edges. We summarize these steps as follows.

K-Means Clustering

K-means clustering is used in our proposed method to divide thermal images into 3 clusters based on features in order to extract the ROI. The grey-level values of the image pixels are used as features. We selected the Euclidean distance as the distance measure, minimizing the total distance from each pixel point to its cluster centroid. The results of applying this K-means clustering technique and the segmented mass are shown in Figure 7a-c.
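The grey-level K-means step can be sketched as follows (a minimal Lloyd's-algorithm sketch in NumPy; initializing centroids from evenly spaced unique grey levels is an assumption of this example, made for determinism):

```python
import numpy as np

def kmeans_grey(img, k=3, iters=20):
    """Cluster pixel grey levels into k groups; returns a label image and centroids."""
    px = img.reshape(-1, 1).astype(float)
    uniq = np.unique(px)
    # Deterministic init: k evenly spaced unique grey levels (assumes >= k levels)
    centroids = uniq[np.linspace(0, len(uniq) - 1, k).astype(int)].reshape(k, 1)
    for _ in range(iters):
        d = np.abs(px - centroids.T)      # Euclidean distance in 1-D grey space
        labels = d.argmin(axis=1)         # assign each pixel to nearest centroid
        for c in range(k):                # move each centroid to its cluster mean
            if np.any(labels == c):
                centroids[c] = px[labels == c].mean()
    return labels.reshape(img.shape), centroids.ravel()
```

The ROI can then be taken as the cluster with the highest (hottest) centroid.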

Binary Morphology
To enhance the thermal images, opening operations are utilized in our proposed method; they remove small bright details in the image while leaving the overall pixel intensity values and large bright objects undisturbed. Mathematically, the opening of image g with structuring element b is given by the following equation: g ∘ b = (g ⊖ b) ⊕ b, i.e., an erosion followed by a dilation.

Morphological Gradient
Erosion and dilation act as local minimum and local maximum operators, respectively. Subtracting the results of these two operations gives the morphological gradient, which is used to enhance image edges by removing the constant-intensity areas. The results, as illustrated in Figure 7c, show the boundary of the lesion. The morphological gradient is given by the following equation: grad(g) = (g ⊕ b) − (g ⊖ b).
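The opening and gradient operations described above can be sketched directly in NumPy (a flat 3×3 structuring element and a plain-Python sliding window, chosen for clarity rather than speed; both are assumptions of this sketch):

```python
import numpy as np

def erode(img, k=3):
    # Grey-scale erosion with a flat k x k structuring element (local minimum)
    p = k // 2
    padded = np.pad(img, p, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

def dilate(img, k=3):
    # Grey-scale dilation (local maximum)
    p = k // 2
    padded = np.pad(img, p, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def opening(img, k=3):
    # Erosion followed by dilation: removes small bright details
    return dilate(erode(img, k), k)

def morphological_gradient(img, k=3):
    # Dilation minus erosion: highlights edges, flattens constant areas
    return dilate(img, k) - erode(img, k)
```

On a constant region the gradient is zero, and only pixels near an intensity step survive, which is exactly how the lesion boundary in Figure 7c is obtained.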

Feature Extraction Stage
In this paper, we used boundary signature-based feature techniques [23] to extract features from the thermogram images. A boundary signature is a 1-D representation of a boundary; it can be seen as a plot of the distance from the boundary's centroid to each point of the boundary. Examples of applying this technique to thermograms are shown in Figure 8a-c, which show the detected boundaries of a normal breast, a benign mass, and a malignant mass. Figure 9a-c show the signatures of the detected boundaries.
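The centroid-distance signature is straightforward to compute (a minimal NumPy sketch; the circular test contour is only an illustration):

```python
import numpy as np

def boundary_signature(boundary):
    """1-D signature: distance from the boundary centroid to each boundary point.

    boundary: (N, 2) array of (x, y) points ordered along the contour.
    """
    centroid = boundary.mean(axis=0)
    return np.linalg.norm(boundary - centroid, axis=1)

# A circle yields a flat signature; irregular lesion boundaries yield
# fluctuating ones, which is what makes the signature discriminative.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
circle = np.column_stack([5 * np.cos(theta), 5 * np.sin(theta)])
sig = boundary_signature(circle)
```

Plotting `sig` against the point index gives exactly the kind of curve shown in Figure 9a-c.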

Geometrical Feature Extraction
Based on the correlation between breast tumors and their locations and shapes, it is believed that geometrical features such as area, perimeter, P/A ratio, major axis, minor axis, LS ratio, and ENC are very important for breast cancer detection. Below, we explain each geometrical feature used in the proposed method.

• Area: the total number of pixels (p) of the segmented nucleus (n).
• Perimeter: the nuclear envelope length, computed as a polygonal approximation of the length of the boundary (B).
• P/A ratio: the ratio of the perimeter of the boundary (B) to its enclosed area.
• Major Axis Length (in pixels): the length of the major axis of the ellipse having the same second moments as the region.
• Minor Axis Length (in pixels): the length of the minor axis of the ellipse having the same second moments as the region.
• LS Ratio: the ratio of the major axis length to the minor axis length of the equivalent ellipse of the lesion.
• Elliptical Normalized Circumference (ENC): computed in terms of the ratio of the lesion's circumference to that of its equivalent ellipse.
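Several of the geometrical features above can be computed from a binary tumor mask as sketched below (a NumPy illustration; the 4-connected perimeter approximation and the second-moment ellipse via the pixel covariance are assumptions of this sketch, and ENC is omitted):

```python
import numpy as np

def geometric_features(mask):
    """Area, perimeter, axis lengths, and derived ratios of a binary mask."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    # Perimeter approximated as the count of foreground pixels that have
    # at least one 4-connected background neighbour
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    # Ellipse with the same second moments as the region: axis lengths are
    # proportional to the square roots of the covariance eigenvalues
    x, y = xs - xs.mean(), ys - ys.mean()
    evals = np.linalg.eigvalsh(np.cov(np.stack([x, y])))
    major = 4 * np.sqrt(evals[-1])
    minor = 4 * np.sqrt(evals[0])
    return {'area': area, 'perimeter': perimeter,
            'p_a_ratio': perimeter / area,
            'major_axis': major, 'minor_axis': minor,
            'ls_ratio': major / max(minor, 1e-9)}
```

For a symmetric region the LS ratio is 1; elongated or irregular lesions push it above 1, which is why it is a useful shape cue.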

Textural Feature Extraction
It is medically proven that tumors distort breast tissue; thus, textural features are important in tumor detection. In recent years, many techniques have been developed for the extraction of textural features, e.g., the Gray Level Co-occurrence Matrix (GLCM), a well-known texture feature technique [24]. In the proposed method, three textural features are used in the classification: entropy, EBCM, and standard deviation.

• Entropy: a statistical measure of the randomness of an intensity image, used to represent the texture of a given image. Mathematically, the entropy of an image is computed as −Σ h · log₂(h), where h denotes the normalized histogram counts of the intensity image [13].

• Edge-Based Contrast Measure (EBCM): a feature which depends on human perception and is very sensitive to edges. Mathematically, the EBCM of an image I is the mean of the contrast values C(h, q) over all pixels, where the contrast value for the image pixel located at (h, q) is

C(h, q) = |I(h, q) − e(h, q)| / |I(h, q) + e(h, q)|,

and e(h, q) is the edge-weighted mean gray level

e(h, q) = Σ_{(k,l)∈N(h,q)} g(k, l) I(k, l) / Σ_{(k,l)∈N(h,q)} g(k, l),

where N(h, q) is the set of all pixels neighboring the center pixel at (h, q), and g(k, l) is the edge value, taken as the image gradient magnitude. If the EBCM value of an output image is greater than that of the original image, the output image has better contrast.

• Standard Deviation: the standard deviation of the gray-scale values, σ, estimates the mean square deviation of the grey pixel values v(x, y) from their mean value. It describes the dispersion within a local region and is calculated using the formula

σ = sqrt( (1/N) Σ_{x,y} (v(x, y) − μ)² ),

where μ is the mean grey value and N is the number of pixels in the region.
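Two of the three textural features can be sketched as follows (a NumPy illustration; the 256-bin histogram range is an assumption, matching 8-bit grey levels):

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy of the grey-level histogram: -sum(h * log2(h))."""
    h, _ = np.histogram(img, bins=bins, range=(0, bins))
    h = h / h.sum()                 # normalize counts to probabilities
    h = h[h > 0]                    # log2(0) is undefined; drop empty bins
    return -np.sum(h * np.log2(h))

def local_std(img):
    """Standard deviation of grey values: dispersion about the mean."""
    return float(np.sqrt(np.mean((img - img.mean()) ** 2)))
```

A perfectly uniform region has zero entropy and zero standard deviation, while distorted, textured tissue drives both features up.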

Classification Stage
Classification is the last stage in the breast cancer detection pipeline and includes two operations: training and testing. The extracted features are given as inputs to the classification techniques to evaluate whether a given thermogram is normal, a benign tumor, or a malignant tumor. In this paper, two classification techniques (i.e., MLP and ELM) are applied to evaluate the performance and accuracy of the proposed method.
In the MLP and ELM networks, the selection of the activation functions plays a substantial role in the performance of the network. Many studies have investigated special activation functions to solve different problems. So, in this paper, we investigated different activation functions in both the MLP and ELM networks to evaluate which would give high accuracy and performance. The tested activation functions are: neuronal, sigmoid, logarithmic, hyperbolic tangent, exponential, and sinusoidal. The performance of the MLP and ELM networks is measured by the percentage of correct classifications. Figure 10 summarizes the proposed method.

Dataset Description and Experiments Setup
The dataset used in the evaluation of the proposed method is the DMR-IR database [25]. It consists of thermogram images and is made available for researchers to evaluate their thermogram-based breast cancer detection proposals. It contains 1345 images: 705 normal, 200 benign tumors, and 440 malignant tumors. The patients are women between the ages of 32 and 74, and the thermal IR breast images have a size of 640 × 480 pixels. All database images and their annotations are confirmed by radiologists. In the evaluation of the proposed method, 1000 of these images are utilized to train the proposed method, and the remaining 345 are utilized to test and validate it. All experiments were conducted on a laptop/PC with a Core i5-2400 CPU at 3.10 GHz and 4.00 GB of RAM. The implementation was compiled using MATLAB R2015a under Windows 10.

Experiments Scenarios
We designed several experimental scenarios to determine the MLP and ELM activation functions and parameters giving the best results, as well as to compare the performance of MLP and ELM.

Scenario 1: MLP Activation Functions
The aim of this scenario is to investigate the best MLP activation function, determined by the highest classification accuracy obtained. The following MLP activation functions will be examined: saturating linear (satlin), linear (purelin), symmetric saturating linear (satlins), log-sigmoid (logsig), positive linear (poslin), and hyperbolic tangent sigmoid (tansig). In this scenario, the following steps were followed.

1. The MLP had three layers: one input, one hidden, and one output layer.
2. These layers were kept constant for each MLP activation function.
3. The set of features was fixed for each MLP activation function.
4. The experiment was repeated for each MLP activation function above.

Scenario 2: Training Time of MLP Activation Functions
This scenario is designed to measure the training time required by each MLP activation function during the classification process. In this scenario, the following steps were followed.

1. The MLP had three layers: one input, one hidden, and one output layer.
2. These layers were kept constant for each MLP activation function.
3. The set of features was fixed for each MLP activation function.
4. The experiment was repeated for each MLP activation function above.
5. The CPU time consumed to reach the final classification accuracy was computed for each MLP activation function.

Scenario 3: Numbers of Layers and Neurons per Each Hidden Layer of the Best MLP Activation Function
The aim of this scenario is to further investigate the best MLP activation function (obtained from Scenario 1). For the best MLP activation function, we aim to test which parameters would give the highest classification rate. Specifically, we investigate the numbers of layers and of neurons for each hidden layer. In this scenario, the following steps were followed.

1. We tested the following configurations of the neural network: (a) 3 hidden layers with 4, 5, and 10 neurons respectively; (b) 5 hidden layers with 4, 5, and 10 neurons respectively; and (c) 7 hidden layers with 4, 5, and 10 neurons respectively.
2. For each of the above setups, the set of features was fixed for each experiment, with the same chosen MLP activation function.
3. The CPU time consumed to reach the final classification accuracy was computed for each configuration (1.a, 1.b, 1.c).

Scenario 4: Learning Rate of the Best MLP Activation Function
The aim of this scenario is to understand the impact of the learning rate on the best MLP activation function (obtained from Scenario 1). We aim to test which value of the learning rate gives the highest classification rate, together with the effect of the number of hidden layers and their chosen neurons on the classification results. In this scenario, the following steps were followed.

1. We tested the following configurations of the neural network: (a) 3 hidden layers with 4, 5, and 10 neurons respectively; (b) 5 hidden layers with 4, 5, and 10 neurons respectively; and (c) 7 hidden layers with 4, 5, and 10 neurons respectively.
2. For each of the above setups, the learning rates η = 1, 3, 5, 7, and 9 were tested and the obtained results were recorded.
3. For each of the above setups, the set of features was fixed for each experiment, with the same chosen MLP activation function.
4. The CPU time consumed to reach the final classification accuracy was computed for each value of η set in Step 2.

Scenario 5: ELM Activation Functions
The aim of this scenario is to investigate the best ELM activation function, determined by the highest classification accuracy obtained. The following ELM activation functions will be examined: Sigmoid (sigmoid), Sine (sin), Triangular basis function (tribas), Hard Limit (hardlim), and Radial basis function (radbas). In this scenario, the following configurations were used.

1. Three layers of ELM were used: one input, one hidden, and one output layer.
2. These layers were kept constant for each ELM activation function.
3. The set of features was fixed for each ELM activation function.
4. The experiment was repeated for each ELM activation function above.

Scenario 6: Training Time of ELM Activation Functions
This scenario is designed to measure the training time required by each ELM activation function during the classification process. In this scenario, the following steps were followed.

1. Three layers of ELM were used: one input, one hidden, and one output layer.
2. These layers were kept constant for each ELM activation function.
3. The set of features was fixed for each ELM activation function.
4. The experiment was repeated for each ELM activation function above.
5. The CPU time consumed to reach the final classification accuracy was computed for each ELM activation function.

Results and Discussions
In this section, the results of the experiments run under all the scenarios above are reported and discussed. We first present the MLP results and then the ELM ones. The reported results for accuracy, specificity, and sensitivity are the averages of ten runs of the proposed method.

The Results of Scenarios 1-3
The accuracies of training and testing for the different activation functions were compared; all results are presented in Table 1. The best results (accuracy on the trained and tested data) were obtained using the "tansig" activation function. This best function (tansig) was then further evaluated in terms of specificity and sensitivity, giving 84% specificity and 61.6% sensitivity.

The training times of the different activation functions were also compared; all results are presented in Table 2. The shortest training time, 2.890241 s, was obtained using the "purelin" activation function. Although this time is better than tansig's, the difference is quite small (0.195924 s), and current advances in computing speed (supercomputers and GPUs) make this processing-time difference largely irrelevant.

The accuracies of training and testing for different numbers of layers and of neurons per hidden layer of the "tansig" activation function were then compared; all accuracy results are presented in Table 3. From this table, it can be noticed that the best result (accuracy of 82.20% with a training time of 15.811615 s) was obtained when the numbers of hidden layers and neurons were 7 and 10, respectively. The results of this scenario showed a positive relationship between the training time and the number of neurons: the training time increases as the number of neurons increases. From Table 3, we found that when the number of neurons was 10, the training times increased. Similarly, the results showed a positive relationship between the training time and the training accuracy: the training accuracy increases as the training time increases. From Table 3, the best training accuracy was obtained when the training times increased.

The Results of Scenarios 4-6
Based on the results of Scenario 1, the activation function "tansig" was further tested with different learning rates, with the number of hidden layers set to 7 and the number of neurons to 10. All accuracy results are presented in Table 4. The best accuracy, 80.04%, was obtained with a learning rate of 1.00 and a training time of 17.687415 s.

The accuracies of training and testing for the different ELM activation functions were then evaluated and compared. The results of this scenario are summarized in Table 5. The best result (accuracy on the trained and tested data) was obtained using the "tribas" activation function, at 99.10%. In general, from Tables 1 and 5, it can be noticed that the ELM classifier's accuracy results are much better than the MLP classifier's.

All training-time results for the different ELM activation functions are presented in Table 6. The best training time, 0.0015 s, was obtained using the "tribas" activation function. In general, from Tables 2 and 6, it can be noticed that the ELM classifier needed less time than the MLP classifier. This is logical, as ELM has only one hidden layer while MLP needs 7 hidden layers to produce its best results.

To compare our results with the most related work, we conducted a comparison whose results are summarized in Table 7. From this table, it can be noticed that the results obtained by the ELM algorithm are the best in terms of accuracy, specificity, and sensitivity. In addition, our results were obtained from the largest dataset of all the compared works, which means they should be more reliable in terms of scalability. In addition to the above comparison, we compared our method with the CNN-based method of [26], which reported classification results of TPR = 100% and PPV = 100% using a dataset containing 73 breast images. The CNN-based method achieved slightly better results; however, our results were obtained using a large dataset (1345 images), while the CNN-based ones were obtained from only 73 breast images.

Conclusions
This paper introduced a comparative study between two machine learning techniques (MLP and ELM) for the early detection of breast cancer from thermograms. Before applying either of these techniques, the input images were first pre-processed, the ROI was then segmented, and finally the features were extracted. Experiments were conducted under different scenarios using the public dataset DMR-IR, and different activation functions of both MLP and ELM were tested and investigated. The experimental results showed that ELM-based breast cancer detection gave the best accuracy (100%), while the MLP classifier gave only 82.20%. The ELM results were obtained using its tribas activation function and only one hidden layer. Also, it was found that ELM is much faster than MLP. These promising detection results are an important step toward automatic detection of breast cancer using thermal images. A limitation of the proposed method is its dependence on the quality of the protocol used to capture the thermograms: since thermal images depend mainly on temperature, the room and patient temperatures could affect the prediction techniques. For future work, we see two directions that could further improve or confirm the results of this study. Firstly, a thorough comparison and analysis between ELM and CNN could be conducted. Secondly, to advance the performance of the proposed segmentation technique, it is suggested to use more extracted features of different types to evaluate the classifier performance.

Figure 1 .
Figure 1. Examples of breast thermogram images: on the left a normal breast; on the right an abnormal one.


Figure 5 .
Figure 5. Top-hat transform technique: (a) homomorphic filtering on a normal image; (b) white top-hat on a normal image; (c) black top-hat on a normal image; (d) homomorphic filtering on a benign image; (e) white top-hat on a benign image; (f) black top-hat on a benign image; (g) homomorphic filtering on a malignant image; (h) white top-hat on a malignant image; (i) black top-hat on a malignant image.

Figure 10 .
Figure 10. Flowchart of the proposed method.

Table 1 .
The results of Scenario 1: MLP Activation Functions.

Table 2 .
The results of Scenario 2: MLP Training Time.

Table 3 .
The results of Scenario 3: The accuracy and training time for different configurations of "tansig" activation function.

Table 4 .
The results of Scenario 4: The accuracy and training time of different learning rate for the "tansig" activation function.

Table 5 .
The results of Scenario 5: ELM Activation Functions.

Table 6 .
The results of Scenario 6: ELM Training Time.

Table 7 .
The comparison between the proposed work and the other related work.