Breast Cancer Detection in Thermal Infrared Images Using Representation Learning and Texture Analysis Methods

Nowadays, breast cancer is one of the most common cancers diagnosed in women. Mammography is the standard screening imaging technique for the early detection of breast cancer. However, thermal infrared images (thermographies) can be used to reveal lesions in dense breasts. In these images, the temperature of the regions that contain tumors is warmer than the normal tissue. To detect that difference in temperature between normal and cancerous regions, a dynamic thermography procedure uses thermal infrared cameras to generate infrared images at fixed time steps, obtaining a sequence of infrared images. In this paper, we propose a novel method to model the changes on temperatures in normal and abnormal breasts using a representation learning technique called learning-to-rank and texture analysis methods. The proposed method generates a compact representation for the infrared images of each sequence, which is then exploited to differentiate between normal and cancerous cases. Our method produced competitive (AUC = 0.989) results when compared to other studies in the literature.


Introduction
Thousands of women suffer from breast cancer worldwide [1].To detect breast cancer early, mammographic images are commonly used [2]; however, some studies have shown that thermal infrared images (known as thermographies) can yield better cancer detection results in the case of dense breasts (breasts of young females) [3].The dynamic thermography technique is less expensive than the mammography and magnetic resonance imaging (MRI) techniques.In addition, it is a non-invasive, non-ionizing and safe diagnostic procedure, in which patients feel no pain.The principle of work of thermography is based on two facts: (1) the temperatures of breast cancer regions are warmer than the surrounding tissues, and (2) metabolic heat and blood perfusion rates generation in tumors are much higher than the rates of normal regions.This variation of temperature between healthy and abnormal breasts can be captured by thermal infrared cameras [4].
Dynamic infrared thermographies may enhance the detection results of static infrared images by employing a superficial stimulus to improve the thermal contrast [5].Cooling or heating procedures can be exploited as a stimulus to thermally excite tissues of patients' breasts.It is worth noting that the cooling procedure is much safer than the heating procedure as the temperature range of women body is 36.5 • C-37.5 • C, and thus higher heating may harm the living tissues in the breasts.After applying the cooling procedure on the breasts, the temperature of healthy tissues decreases with an attenuation of vascular diameter; in turn, the temperature of abnormal tissues remains unaltered (or it increases with a vascular dilation), as reported in [6].In this manner, the findings of the analysis of the similarity (or the dissimilarity) between infrared images acquired before and after the cooling procedure can be exploited to reveal breast cancer.Figure 1 shows samples of sequences of infrared images produced from a dynamic thermography procedure for a healthy case and a cancer patient.The dynamic thermography procedure generates a sequence of arrays for each patient.Each array is generated at a time step t and comprises the temperatures of the middle region of the patient's body (specifically, the region between the neck and the waist).Examples of Figure 1 only show the temperature arrays of one breast for the two cases.Please note thatthe arrays that include the skin temperature (in Celsius) of the breasts are transformed into images.In this paper, we propose a novel method for modeling the changes of temperature in the breasts during the dynamic thermography procedure using a representation learning technique called learning-to-rank (LTR) and texture analysis methods.We assess the performance of six texture analysis methods to describe the changes of breast temperatures in thermal infrared images: histogram of oriented gradients (HOG), features calculated from the gray level co-occurrence matrix (GLCM), lacunarity analysis of vascular networks (LVN), local binary pattern (LBP), local directional number pattern (LDN) and Gabor filters (GF).These texture analysis methods have been widely used in the literature to analyze breast cancer images.We use the LTR method to learn a representation for the whole sequence of infrared images.The LTR method produces a compact yet descriptive representation for each sequence of thermograms, which is then fed into a classifier to differentiate between healthy cases and cancer patients.
The rest of this article is structured as follows.Section 2 presents the related work.In Section 3, we explain the proposed method.In Section 4, we present and discuss the experimental results.In Section 5, we summarize the findings of this article and suggest some lines of future work.

Related Work
Over the past two decades, several techniques have been presented for the early detection of breast cancer, such as mammography and MRI [7].Sebastien et al. [8] presented a comparative study of several breast cancer detection techniques and discussed their main limitations.Hamidinekoo et al. [9] presented an overview of the recent state-of-the-art deep learning-based computer-aided diagnosis (CAD) systems developed for mammography and breast histopathology images.The authors proposed an approach to map features between mammographic abnormalities and the corresponding histopathological representations.Rampun et al. [10] proposed the use of local quinary patterns (LQP) for breast density classification in mammograms on various neighborhood topologies.Despite the fact that mammography is the standard method for early detection of breast cancer, its main drawback is that it may produce a large number of false positives [8].In turn, the MRI technique is recommended as an adjunct to mammography for women with genetic mutations [11].However, the main limitation of MRI is that it has a limited spatial resolution, yielding a low sensitivity for sub-centimeter lesions [12].
Infrared thermal imaging can overcome the limitations of the mammography technique because it can selectively optimize the contrast in areas of dense tissues (young women), as reported in [13].Several methods have been proposed in the literature to analyze breast cancer using dynamic thermograms [14][15][16][17][18][19][20][21][22][23].The authors of [24] reviewed different medical imaging modalities for computer-aided cancer detection methods with a focus on thermography.The authors of [25] presented a new method for extracting the region of interest (ROI) from breast thermograms by considering the lateral views and the frontal views of the breasts.The obtained ROIs help physicians to discriminate between the biomarkers of normal and abnormal cases.Mookiah et al. [26] evaluated the use of a discrete wavelet transform, texture descriptors, fuzzy and decision tree classifiers, to distinguish the normal cases from the abnormal ones.With 50 thermograms, their system achieved an average sensitivity of 86.70%, a specificity of 100% and an accuracy of 93.30%.Saniei et al. [27] proposed a five-step approach for analyzing the thermal images for cancer detection.The five steps are: (1) the breast region is extracted from the infrared images using the connected component labeling method, (2) the infrared images were aligned using a registration method, (3) the blood vessels were segmented using morphological operators, (4) for each vascular network, the branching points were exploited as thermal minutia points, and (5) the branching points were fed into a matching algorithm to classify breast regions into normal or abnormal, achieving a sensitivity of 86% and a specificity of 61%.Acharya et al. [28] extracted a co-occurrence matrix and a run length matrix texture features from each infrared image and then fed them into a support vector machine algorithm to discriminate the normal cases from the malignant ones, achieving a mean sensitivity of 85.71%, a specificity of 90.48% and an accuracy of 88.10%.Furthermore, Etehadtavakol et al. [29] demonstrated the importance of extracting the hottest/coldest regions from thermographic images and used the Lazy snapping method (an interactive image cutout algorithm) to do so quickly with an easy detailed adjustment.
Gerasimova et al. [30] used the wavelet transform modulus maxima method in a multi-fractal analysis of time-series of breast temperatures obtained from dynamic thermograms.Homogeneous mono-fractal fluctuations statistics have been noticed from the temperature time-series obtained from breasts with malignant tumors.The authors of [31] used asymmetry features to classify breast thermograms while [32] used several feature extraction methods and hybrid classifiers to classify breast thermograms into normal or abnormal cases.In [33], statistical descriptors were extracted from thermograms and inputted into a fuzzy rule-based system to classify them into malignant or benign.The descriptors were collected from the two breasts of each case to quantify their bilateral changes.This method gave an accuracy of 80%.Furthermore, the study of [34] proposed a three-step approach for identifying patients at risk of breast cancer.First, the infrared images of each case were registered.Second, the breast region was divided into kernels of 3 × 3 pixels and the average temperature from each region was computed in all images of the same case.Then, two complexity features were computed from the thermal signals.Third, the k-means clustering method was used to construct two clusters from the feature vectors.Silhouette, Davies-Bouldin and Calinski-Harabasz indexes were computed and then used as descriptors.This method obtained an accuracy of 90.90% using a decision tree classifier.In [35], a ground-truth-annotated breast thermogram database (DBT-TU-JU database) was proposed for the early prediction of breast lesions.The DBT-TU-JU database includes 1100 static thermograms of 100 cases.
Most of the methods proposed in the literature focus on the extraction of features from the images and do not consider the temporal evolution of temperature during the dynamic procedure (i.e., they ignore the temporal information in the infrared image sequences).Unlike the above-mentioned methods, we propose the use of the LTR method to generate a compact and robust representation of the whole sequence of infrared images of each case.The resulting representation models the evolution of temperature in the skin of breasts during the thermography procedure.Then, we use the generated representations to train a classifier to discriminate between healthy cases and cancer patients.

Proposed Method
Figure 2 presents the proposed method for cancer detection in dynamic thermograms.It consists of two phases: training and testing.The training phase has two stages: representation learning and model construction.In the representation learning stage, we extract texture features from each thermogram, and then an LTR method is used to learn a compact and descriptive representation for the whole sequence.In the model construction stage, the learned representations are fed into a classifier to construct a model.This model is trained to discriminate between normal and cancerous cases.Please Please note thatthe representation learning step (feature extraction and LTR) does not use the ground truth of each sequence (i.e., the labels of each sequence: normal or cancer) to generate a compact representation of the infrared images of each sequence.The labels are only used to train a multilayer perceptron (MLP) classifier (the model construction stage of the training phase).In the testing phase, we perform the first two steps of the training phase (feature extraction and LTR), and then we input the generated representations of test sequences into the model obtained in the training phase to predict their labels (normal/cancer).Below, we provide more details for each step of the proposed method.

Representation Learning
In this stage, we use six texture analysis methods to extract texture descriptors from each infrared image, and then the LTR method is exploited to generate a representation for the infrared images of each case.

Texture Analysis
To describe the texture of breast tissues in infrared images accurately, we assess the efficacy of six texture analysis methods: histogram of oriented gradients, lacunarity analysis of vascular networks, local binary pattern, local directional number pattern, Gabor filters and descriptors computed from the gray level co-occurrence matrix.These texture analysis methods have been widely used in medical image analysis and achieved good results with breast cancer image analysis [16,27,28,[36][37][38].

Histogram of oriented gradients (HOG):
The HOG method is considered as one of the most powerful texture analysis methods because it produces distinctive descriptors in the case of illumination changes and cluttered background [39].The HOG method has been used in several studies to analyze medical images.For instance, in [16], a nipple detection algorithm was used to determine the ROIs and then the HOG method was used to detect breast cancer in infrared images.To construct the HOG descriptor, the occurrences of edge orientations in a local image neighborhood are accumulated.The input image is split into a set of blocks (each block includes small groups of cells).For each block, a weighted histogram is constructed, and then the frequencies in the histograms are normalized to compensate for the changes in illumination.Finally, the histograms of all blocks are concatenated to build the final HOG descriptor.

Lacunarity analysis of vascular networks (LVN):
Lacunarity refers to a measure of how patterns clog the space [40] and describes spatial features, multi-fractal and even non-fractal patterns.We use the lacunarity of vascular networks (LVN) in thermograms to detect breast cancer.To calculate LVN features, we first extract the vascular network (V N) from each infrared image and then calculate the lacunarity-related features [D(VN), L(V N)].Please Please note thatLVN extracts 2 features from each infrared image.Below, we briefly explain the two stages: -Vascular network extraction: this method has two steps [27]: Step 1: Preprocess each infrared image using an anisotropic diffusion filter.

•
Step 2: Use the black top-hat method to extract the VN from the preprocessed infrared images.
This process produces a binary image (i.e., VN is a binary image).In Figure 3, we show an example of a vascular network extracted from an infrared image.-Lacunarity-related features: We use the sliding box method to determine the lacunarity on the binary image V N. In this method, an l × l patch is moved through the image VN, and then we count the number of mass points within the patch at each position and compute a histogram χ l (n); n denotes the number of mass points in each patch.The histogram χ l (n) refers to the number of patches containing n mass points.The lacunarity at scale l for the pixel at the position (i, j) can be determined as follows [40]: where E(X) calculates the expectation of the random variable X.The study of [41] demonstrated that the lacunarity shows power law behaviors with its scale l as follows: We take the logarithm on left and right sides of Equation ( 2) to get the following equation: To estimate D(V N i,j ) and L(V N i,j ), we use the linear least squares fitting technique, and set the scale range to compute the lacunarity in the interval [42,43].Gabor filters (GF): GF have been used in several works on medical image analysis.For instance, the authors of [44] extracted Gabor features from breast thermograms, such as energy and amplitude in different scales and orientations to quantify the asymmetry between normal and abnormal breast tissues.In [45], self-similar GF, the Karhunen-Loève (KL) transform and Otsu's thresholding method were used to detect global signs of asymmetry in the fibro-glandular discs of the left and right mammograms of breast cancer patients.The KL transform was used to preserve the most relevant directional elements appearing at all scales.In [37], GF were used to extract directional features from mammographic images and the principal component analysis method was used for dimensionality reduction.In [46], GF were used to extract texture features from mammographic images to detect breast cancer.
The 2D Gabor filter f g(x, y) can be formulated as a sinusoidal with a certain frequency and orientation multiplied by a Gaussian envelope as follows: where σ x , σ y and (u 0 , v 0 ) are the standard deviations along the x-and y-axes, and the centre of the sinusoidal function, respectively.In our experiments, 4 scales and 6 orientations are used to compute GF responses (the number of scales and orientation were empirically tuned).Then, we convolve each input image I(x, y) with each response f g(x, y) to determine the filtered image f (x, y).To extract texture features from GF, the energy is computed from each filtered image, and then all energies are concatenated into one feature vector [47].

Local binary pattern (LBP):
The LBP operator assigns a label (0 or 1) for each pixel in the input image.LBP extracts a 3 × 3 box around each pixel p x and then compares all pixels with it.Pixels in this box with a value greater than p x are labelled by '1' and the other pixels are labelled by '0'.In this way, LBP represents each pixel with 8-bits [36,48].We only consider patterns that comprise at most two transitions from 1 to 0 or from 0 to 1 (called uniform LBPs).Please Please note that 10101010 is a non-uniform LBP and 00111110 is a uniform LBP.To construct the final LBP descriptor, we compute the histogram of uniform patterns (its length is 59).
Local directional number (LDN): Rámirez et al. proposed the LDN descriptor in [49], in which 8 Kirsch compass masks are convolved with the input image, yielding 8 edge responses for each pixel (each response is a result for a mask).The authors used a code for each mask (called location code): 000 for mask0, 001 for mask1, 010 for mask2, 011 for mask3, 100 for mask4, 101 for mask5, 110 for mask6 and 111 for mask7.For each pixel in the image, LDN determines the maximum positive and smallest negative edge responses, and then the location codes corresponding to the two responses are concatenated.
Assume that we convolve the eight masks (mask0-mask7) with an infrared image and we get eight edge responses (10, −20, 0, 3, 7, −10, 6, and 30) for the pixel p1.The maximum positive and smallest negative edge responses are 30 and −20 that correspond to the masks mask1 and mask7.Thus, the LDN code for the pixel p1 is 111001.A 64-bin histogram is computed from the codes extracted from the input infrared images.

Features computed from the gray level co-occurrence matrix (GLCM):
To compute the GLCM, we determine the joint frequencies ∆(i, j) of possible combinations of gray levels i and j in the input image.Please Please note thateach pair is separated by a distance l τ and an angle τ [50].In this research, we set the distance to 5 and use 4 different orientations (0 • , 45 • , 90 • and 135 • ) with a quantification level of 32 to compute 22 descriptors from each GLCM.This configuration was recommended in several related studies, such as [38,51].We then concatenate all descriptors into one feature vector (its dimension is 88, 4 × 22).In Table 1, we present the descriptors extracted from the GLCM in addition to the mathematical expression of each descriptor [50,52,53].The interpretation of terms used to compute the descriptors are provided in Table 2.

N g
The number of notable gray levels in the quantized image.N 2 g GLCM size. µ The average of the entire normalized GLCM.p x (i) i th entry in the marginal probability matrix (MPM) calculated by summing the rows of p(i, j).p y (i) i th entry in the MPM calculated by summing the columns of p(i, j).µ x and µ y Mean of p x and p y , respectively σ 2 x and σ 2 y Variance of p x and p y , respectively.p x+y (k) It accumulates the elements of probability matrix (PM) entries p(i, j) that correspond to the sum of a set of pairs of gray levels p x−y (k) It accumulates the PM entries p(i, j) that correspond to the difference of a set of pairs of gray levels HX, HY and HXY Entropy of p x , p y and p(i, j), respectively.

HXY1
Term similar to the entropy equation used to calculate the IM of D19.

HXY2
Term similar to the entropy equation used to calculate the IM of D20.

Representation Learning Using LTR
Based on the observation that the changes of temperatures on normal and abnormal tissues have a different temporal ordering, we use the LTR method to model the evolution of temperatures of women breasts during dynamic thermography procedures.In other words, we use the LTR method to generate a compact description for each image sequence.Assume that we have a sequence of thermograms of a breast (temperature matrices), and each thermogram at time t is represented by a vector x t ∈ R D , thus the whole sequence is represented by X = [x 1 , x 2 , . . ., x Nsq ]; Nsq is the number of images in the sequence (in our case Nsq = 20).The feature vector of each thermogram x t is obtained in the feature extraction stage (Section 3.1.1).
We use the LTR method to model the relative rank of these thermograms (e.g., x 2 comes after x 1 , which can be written as x 2 x 1 ).Assume m3 is the feature vector of an infrared image acquired at time step t3, m2 is the feature vector of an infrared image acquired at time step t2, and m1 is the feature vector of an infrared image acquired at time step t1, this means that m3 comes after m2, and m2 comes after m1.We use the LTR method to model this ranking, which represents the change of temperature in breasts from a time step to another during the dynamic thermography procedure.
To avoid the effect of abrupt changes of temperatures in thermograms, the LTR method learns the order of smoothed versions of the feature vectors.Let us define a sequence M = [m 1 , m 2 , . . ., m Nsq ], where m t is the result of processing the feature vectors from time 1 to t (x 1 to x t ) using the time varying mean method as follows: Each vector v t is then normalized as follows: where m t is the normalized version of v t .Given the smoothed vectors, LTR learns their rank (i.e., m n m n−1 m n−2 • • • m 1 ), a linear LTR learns a pairwise linear function ψ(m t ; ), where ∈ R D is a vector containing its parameters.The ranking score of m t is computed by ψ(m t ; ) = T .m t .The LTR method optimizes the parameters of ψ(m t ; ) using the objective function of Equation (7) with the constraint ∀t i , t j m t i m t j ⇐⇒ T .m t i T .m t j as follows [54]: where κ is the regularization parameter and is the margin of tolerance.The constraint of the objective function encourages the ranking score (prediction score) of the difference between the feature vectors of two infrared images m t i and m t j acquired at time t i and t j , respectively, to be greater than 1 − ij , and the margin of tolerance should be greater than or equal zero.In other words, if m t i comes after m t j , the ranking score of m t i should be greater than the one of m t j .For further details the reader is referred to [55,56].In our case, the temperature evolution information is encoded in the parameters , which can be viewed as a principled, data-driven, temporal pooling of the evolution of temperature in thermography procedures.Thus, the parameters are used in this study to describe the evolution of temperatures in the whole sequence of thermograms.In other words, we use to represent the whole sequence.In this study, we model the evolution of temperatures in thermography procedures in the forward and backward directions, as follows: • Forward direction ( f orw ): we model the changes in the temperature from the cooling instant to the instant of restoring the normal temperature of the body (the algorithm moves from the infrared image acquired at time step t to the image acquired at t + 1).

•
Backward direction ( back ): we model the changes in temperature in the reverse direction where we start with the last infrared image of the input sequence and proceed to the first one (the algorithm moves from the infrared image acquired at time step t to the image acquired at t − 1).
In our experiments, we calculate the parameters of LTR two times for each input sequence: one for the forward direction f orw and another for the backward direction back .Prior to the classification stage, we normalize each feature vector using the 1−norm.In the classification stage, we input the feature vectors of normal and cancerous cases into the MLP classifier to construct a classification model.

Model Construction
We used the MLP classifer to classify the cases into normal or cancerous.MLP is a network of simple neurons called perceptrons.Each perceptron computes a single output from multiple numerical inputs by setting a weight for each input and then combining them.The combined value is then input into a non-linear activation function.Equation (8) formulates this operation as: where y out is the output of the perceptron, x is the vector of inputs, θ is the vector of weights, δ is the bias and Φ is the activation function.The common activation functions are the logistic sigmoid Φ(z) = 1/(1 + e −z ) and the hyperbolic tangent Φ(z) = tanh(z).In this study, we use the latter one.

Evaluation Metrics and Statistical Analysis Methods
To evaluate the proposed method, we use five metrics: accuracy, recall, precision, F-score (the harmonic mean of the precision and recall) and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.True positive (TP), true negative (TN), false positive (FP) and false negative (FN) are determined to calculate the four metrics: • TP refers to positive instances correctly classified as positive cases.
• TN refers to negative instances correctly classified as negative cases.

•
FP refers to negative instances wrongly classified as positive cases.

•
FN refers to positive instances wrongly classified as negative cases.
The mathematical expressions of accuracy, recall, precision and F-score are shown below: Recall = TP TP + FN (10) The ROC analysis is used to avoid choosing a certain threshold for classifying the data [57].We use each possible threshold to calculate the true positive rate TP/(TP + FN) and the false positive rate FP/(FP + TN).In our experiments, we randomly split the dataset into training, validation, and testing sets.The training, validation and testing ratios were 0.7, 0.15 and 0.15.Please Please note thatall images/sequences pertaining to a given patient either all belong to the training set or the testing set (or validation set).The number of instances in each set is rounded to the nearest integer.
To check if the model is overfitting or not, we randomly re-partition the DMR-IR dataset into training, validation and testing datasets.We repeat this process 100 times and report the mean value of AUC, accuracy, recall, precision and F-score.
To show the efficacy of the proposed method, we also calculate the statistical significance of the differences of results between our proposal and the other methods in terms of the AUC values.To do so, we use the Wilcoxon signed-rank test to find the difference in AUC values (significance level < 0.05).
To check the normality of the distributions of the AUC values, we use the Shapiro-Wilk test.If we have Cn comparisons and the significance level α of a single experiment is 0.05, according to the Bonferroni correction the actual significance level should be 0.05/Cn [58].Please note thata p-value lower than 0.05/Cn refers to statistically significant results.

Dataset
In our experiments, we use the dynamic thermogram dataset presented in [59] to validate our method.This database is called DMR-IR and it is available at http://visual.ic.uff.br/dmi/.The acquisition protocol of DMR-IR has the following steps:

•
An electric fan was used for several minutes for cooling the breasts and armpits of patients (an environment with the temperature range 20 A FLIR thermal camera was used to capture 20 infrared images during 5 min after the cooling (the dimension of the images is 640 × 480 pixels).
The FLIR camera (model SC620) has a sensitivity smaller than 0.04 • C with a detectable temperature range between −40 • C and 500 • C. DMR-IR has 37 sequences of cancer patients (histopathologically proven breast cancer) and 19 of healthy cases (without benign findings).It is worth mentioning that the DMR-IR dataset includes segmented images that contain only the temperatures of breasts (excluding the temperatures of other parts of the body).The medical records of each patient (age, race, personal history, family history, medical history and protocol recommendations) are given in the following website: http://visual.ic.uff.br/dmi/.It is worth mentioning that the DMR-IR dataset does not include information about breast density, breast size and lesion size.

Results
In our experiments, we only model the changes on temperatures in the breast region of thermograms.The dataset used includes masks for the breast regions (images containing the temperatures of breasts of each patient).To evaluate the proposed method, we carried out the following four experiments:

•
Concatenation of the texture features extracted from each thermogram (without using the LTR method).In this way, we produce one feature vector for each sequence, which is then used to classify the cases.

•
Use of the forward representation generated by the LTR method to classify the cases.

•
Use of the backward representation generated by the LTR method to classify the cases.

•
We concatenate the forward and backward representations of each sequence of thermograms, and then use it to classify the cases.
Table 3 presents the results of the texture analysis methods with the MLP classifier (50 neurons) when concatenating the feature vectors extracted from each thermogram (without using the LTR method).As shown, the HOG method with 4 blocks and 4 cells (HOG4*4) achieves an AUC value and an accuracy better than the ones of HOG2*2 (2 blocks, 2 cells), LVN, GF, LBP, LDN and GLCM.Please note that HOG4*4 has the highest dimension compared to the other texture analysis methods.Even though the LVN feature vector has the smallest dimension, it has better classification results than the GLCM.The main reason for the low performance of the GLCM is that it depends on the intensity of the pixels, making it more sensitive to the noise.4 presents the performance of the six texture analysis methods when using the representation learned by the forward LTR (i.e., f orw ), finding that HOG2*2+ f orw achieves the best AUC value, while LDN+ f orw gives the best accuracy and F−score.Table 5 presents the results of each texture analysis method when using the representation learned by the backward LTR (i.e., back ).In this case, HOG4*4+ back achieves the best AUC value, accuracy, recall, precision and F−score.The results of the forward and the backward representations are better than the ones of the feature concatenation method.This is because of the temporal information encoded in the forward and the backward representations (i.e., the change from one infrared image to another over time).In turn, the feature concatenation method ignores such information.
In Table 6, we present the performance of the six texture analysis methods when concatenating the representations learned by the forward and backward LTR.HOG4*4+LTR (4 blocks, 4 cells, and with the concatenation of f orw and back ) gives classification results better than the ones of the other methods in terms of the five evaluation metrics.HOG2*2+LTR (4 blocks, 4 cells and with LTR) produces the second best classification results, but its dimensions are much lower than the ones of HOG4*4+LTR (72 with HOG2*2+LTR vs. 288 with HOG4*4+LTR).In general, the results of the HOG method with LTR are better than the ones of concatenating the features of all thermograms of each sequence, or the ones learned by f orw or back .
As summarized in Table 7, HOG, LVN, GF, LBP and LDN achieve the best AUC values when concatenating the representations learned by the forward and backward LTR.In turn, GLCM gives the best AUC value when using the representation learned by back .The concatenation of the forward and backward representations provides a complete description of the evolution of the temperature in the breasts during the dynamic thermography (from the cooling instant to the warm instant and vice-versa).Thus, this concatenation yields accurate classification results.Unlike the feature concatenation approach, the time-varying mean used with the LTR method smooths the feature vector of each sequence, and thus it reduces the effect of abrupt changes of temperatures and suppresses the noise.There are two major reasons for the high performance of the HOG method with the LTR method (HOG4*4+LTR):

•
Unlike LBP, LDN and GLCM, HOG is less sensitive to the noise.

•
In the HOG method, we accumulate the occurrence of the gradients of the temperature matrices along different orientations, and thus it produces an accurate description of the gradients of temperatures in the infrared images.
Indeed, the problem of having too many features describing an inductive learning task is the curse of dimensionality.In the case of GLCM, we calculate 22 descriptors, where each uses measures to extract a specific feature from the images (e.g., contrast and energy).When concatenating the forward and backward representations of GLCM, we get 176 features (88 × 2) that may make the model more expressive and maybe not all of these features are even relevant to our task.This explains why the results of the forward or the backward representation of the GLCM features (AUC values of 0.807 and 0.821) are better than the ones of concatenation (AUC = 0.677).Malignancy score.The proposed method also assigns a degree of membership (η Mal ) for each case to be a malignant tumor (the malignancy score).The range of the malignancy score values is [0, 1].If the malignancy score is close to 0, the probability of a malignant tumor decreases.In this paper, we exploit the likelihood score of the MLP classifier to compute the degree of membership for each case (the likelihood that a label comes from the malignant class).It is worth noting that the degree of membership of a given case to be healthy is 1 − η Mal .In Figure 4, we show the malignancy scores of our method (HOG4*4+LTR) for two normal cases and two cancer patients.As we can see, the proposed method assigns small scores for normal cases (η Mal < 0.5) and high scores for cancerous cases (η Mal > 0.6).Statistical analysis.The results of HOG4*4+LTR are better than those of individual texture analysis methods (Table 3).To show the efficacy of the proposed method, we also calculate the statistical significance of the differences of results between HOG4*4+LTR and the other methods in terms of the AUC values.The findings of the Shapiro-Wilk test showed that the AUC values do not follow a normal distribution.Given that we have 5 comparisons and the significance level α of a single experiment is 0.05, according to the Bonferroni correction the actual significance level should be 0.01 (0.05/5).In Table 7, we present the statistical analysis of the AUC values.Table 7 shows that the AUC values of the HOG4*4+LTR are significantly better than the ones of the other combinations (Wilcoxon signed-rank test).The experimental results demonstrate that the proposed method can model the changes on temperature across dynamic thermograms and produce a descriptive and compact representation of the whole sequence of infrared images, yielding good classification results between normal and cancer patients.

Comparison with the Related Work
In Table 8, we compare the proposed method with related studies.The authors of [27,34] used dynamic thermography databases, while [33] also used a static thermography database.As we can see in Table 8, the classification results with dynamic thermograms are much better than the ones with static thermograms.The authors of [34] (described in Section 2) also used the DMR-IR dataset in their study.In our experiments, we used all the samples of the DMR-IR dataset while [34] only used 22 samples, which are not enough to train, test and judge the efficacy of their method.In addition, they did not use any cross-validation method or statistical analysis method.In turn, to check if our model is overfitting or not, we randomly re-partitioned the DMR-IR dataset into training, validation and testing datasets and reported the average results.Our proposal (HOG4*4+LTR) gives an accuracy and precision better than the ones of [34].In turn, we achieved an AUC value lower than the one of [34] because we used a larger number of samples to train and test our method.Unlike the principal component thermography (PCT) [60] and the candid covariance-free incremental principal component thermography (CCIPCT) [61] that use matrix decomposition/ factorization to find the defects in thermal images and dimension reduction, in the proposed method we input the 20 infrared images of the breast for each subject (the whole sequence) into the LTR method to generate a compact representation for them.As shown in the results section, the LTR method yields classification results better than the ones of the concatenation of each texture analysis method.

Conclusions and Future Work
In this article, we have presented a new method for detecting breast cancer in dynamic thermograms.We model the changes in breast temperatures during the dynamic thermography procedures using the LTR and texture analysis methods.Our method generates a compact yet descriptive representation for the whole sequence of thermograms of each case.We then input the representation extracted from normal and cancerous cases into the MLP classifier to build a classification model.The proposed method obtains outstanding classification results in terms of AUC, accuracy, recall, precision and F-score.It also outperforms the results of related methods.To further

Figure 1 .
Figure 1.Samples of two dynamic thermogram sequences for two cases, where each sequence comprises 20 infrared images.The infrared images of the left breast acquired at different time steps (t 1 , t 2 , t 19 , t 20 ) for a healthy case (top) and a cancer patient (bottom).

2 Figure 2 .
Figure 2. The training and testing phases of the proposed method.

Figure 3 .
Figure 3. Lacunarity analysis of vascular networks extracted from infrared images, (a) the input infrared image, and (b) the extracted vascular network.

Figure 4 .
Figure 4.The malignancy score η Mal of two normal cases (left column) and two cancer patients (right column).

Table 1 .
Descriptors computed from the GLCM.

Table 2 .
Interpretations for terms used to compute descriptors from the GLCM.

Table 3 .
The performance of the proposed method when concatenating the feature vectors of the 20 infrared images of each sequence.

Table 4 .
The performance of the proposed method when using the representation learned by the forward LTR.

Table 5 .
The performance of the proposed method when using the representation learned by the backward LTR.

Table 6 .
The performance of the proposed method when concatenating the representations learned by the forward and backward LTR.

Table 8 .
Comparison with related studies.