Two Ensemble-CNN Approaches for Colorectal Cancer Tissue Type Classification

In recent years, automatic tissue phenotyping has attracted increasing interest in the Digital Pathology (DP) field. For Colorectal Cancer (CRC), tissue phenotyping can help diagnose the cancer and differentiate between cancer grades. The development of Whole Slide Images (WSIs) has provided the data required for creating automatic tissue phenotyping systems. In this paper, we study different hand-crafted feature-based and deep learning methods using two popular multi-class CRC tissue type databases: Kather-CRC-2016 and CRC-TP. For the hand-crafted features, we use two texture descriptors (LPQ and BSIF) and their combination, with two classifiers (SVM and NN) to classify the texture features into distinct CRC tissue types. For the deep learning methods, we evaluate four Convolutional Neural Network (CNN) architectures (ResNet-101, ResNeXt-50, Inception-v3, and DenseNet-161). Moreover, we propose two Ensemble-CNN approaches: Mean-Ensemble-CNN and NN-Ensemble-CNN. The experimental results show that the proposed approaches outperform the hand-crafted feature-based methods, the individual CNN architectures and the state-of-the-art methods on both databases.


Introduction
Traditionally, pathologists have used the microscope to analyze the micro-anatomy of cells and tissues. In recent years, advances in Digital Pathology (DP) imaging have provided an alternative that enables pathologists to perform the same analysis on a computer screen [1]. The DP imaging modality digitizes glass slides into Whole Slide Images (WSIs), which can be viewed, managed, shared and analyzed on a computer monitor [2].
In Colorectal Cancer (CRC), tumor architecture changes during tumor progression [3] and is related to patient prognosis [4]. Therefore, quantifying the tissue composition in CRC is a relevant task in histopathology. Tumor heterogeneity occurs both between tumors (intertumor heterogeneity) and within tumors (intra-tumor heterogeneity). In fact, Tumor Micro-Environment (TME) plays a crucial role in the development of Intra-Tumor Heterogeneity (ITH) by the various signals that cells receive from their micro-environment [5].
Colorectal Cancer (CRC) is the fourth most common cancer and the third leading cause of cancer death [6]. Indeed, early-stage CRC diagnosis is decisive for patient therapy and survival [7]. The evaluation of tumor heterogeneity is very important for cancer grading and prognostication [8]. In more detail, intra-tumor heterogeneity can aid the understanding of the TME's effect on patient prognosis, as well as identify novel aggressive phenotypes that can be further investigated as potential targets for new treatments [9].
In recent years, automatic tissue phenotyping in Whole Slide Images (WSIs) has become a fast-growing research area in the computer vision and machine learning communities. State-of-the-art approaches have investigated the classification of two tissue types [10,11], namely the tumor and stroma tissue categories, or multi-class tissue type analysis [8,12,13]. However, the classification of just two tissue categories is not suitable for the more heterogeneous parts of the tumor [12]. To overcome this limitation, the authors of [12] proposed the first multi-class tissue type database, in which they considered eight tissue types.
In this work, we deal with the classification of multi-class tissue types. In order to classify different CRC tissue types, we propose two ensemble approaches: Mean-Ensemble-CNNs and NN-Ensemble-CNNs. Our proposed approaches are based on combining four trained CNN architectures: ResNet-101, ResNeXt-50, Inception-v3 and DenseNet-161. Our Mean-Ensemble-CNN approach uses the predicted probabilities of the different trained CNN models. On the other hand, the NN-Ensemble-CNN approach combines the deep features extracted from the different trained CNN models, then classifies them using an NN architecture. Since automatic multi-class CRC tissue classification is a relatively new task, we also evaluated two hand-crafted descriptors, LPQ and BSIF, with two classifiers, SVM and NN. In summary, the main contributions of this paper are:
• We propose two ensemble CNN-based approaches: Mean-Ensemble-CNNs and NN-Ensemble-CNNs. Both approaches combine four trained CNN architectures: ResNet-101, ResNeXt-50, Inception-v3 and DenseNet-161. The first approach (Mean-Ensemble-CNNs) uses the predicted probabilities of the four trained CNN models to classify the CRC tissue types. The second approach (NN-Ensemble-CNNs) combines the deep features extracted by the trained CNN models, then uses an NN architecture to recognize the CRC phenotype.
• We conducted extensive experiments to study the effectiveness of our proposed approaches. To this end, we evaluated two texture descriptors (BSIF and LPQ) and their combination using two classifiers (SVM and NN) on two CRC tissue type databases.
• Implicitly, our work contains a comparison between CNN architectures and hand-crafted feature-based methods for the classification of CRC tissue types using two publicly available databases.
This paper is organized as follows: Section 2 describes the state-of-the-art methods. Section 3 describes the used databases, methods and evaluation metrics, and illustrates our proposed approaches and experimental setup. Section 4 presents the experimental results. In Section 5, we compare our results with the state-of-the-art methods. Finally, we conclude our work in Section 6.

Related Works
In recent years, CRC tissue phenotyping has attracted increasing interest in both the computer vision and machine learning fields due to the availability of CRC tissue type databases such as [8,12,14,15]. Supervised methods are widely used to classify the tissue types in histological images [12]. The supervised state-of-the-art methods for phenotyping CRC tissues can be categorized as texture-based [10–12,16] or learned methods [8,15,17,18]. In addition, some works have combined deep and shallow features, such as [19]. Texture methods are hand-crafted algorithms designed on a mathematical model to extract specific structures within the image regions [20]. Deep learning methods, however, have the ability to learn more relevant and complex features directly from the images across their layers, in particular when there is no prior knowledge about the relationship between the input data and the outcomes to be predicted. Since pathology imaging tasks are very complex and little is known about which quantitative image features predict the outcomes, deep learning methods are well suited for these tasks [21,22]. In this section, we describe the state-of-the-art works that have addressed multi-class CRC tissue types using supervised methods.
In [12], J. N. Kather et al. were the first to address CRC multi-class tissue types; they created their database from 5000 histological images of human colorectal cancer covering eight CRC tissue types. They tested several state-of-the-art texture descriptors and classifiers; their proposed approach, based on the combination of GLCM and LBP local descriptors with global lower-order texture measures, achieved promising performance. In [19], Nanni et al. proposed the General Purpose (GenP) approach, which is based on an ensemble of multiple hand-crafted, dense sampling and learned features. In their combined approach, they trained an SVM on each feature and then combined all of them using the sum rule. Cascianelli et al. [23] compared deep and shallow features for recognizing CRC tissue types, studying the impact of dimensionality reduction strategies on both accuracy and computational cost. Their results showed that CNN-based features can achieve the best trade-off between accuracy and dimensionality.
In [15], J. N. Kather et al. used 86 H&E slides of CRC tissues from the NCT biobank and the UMM pathology archive to create a training image set of 100,000 images that were labeled into eight tissue types. They tested five pretrained CNN models: VGG19 [24], AlexNet [25], SqueezeNet version 1.1 [26], GoogLeNet [27], and ResNet-50 [28], and concluded that VGG19 was the best of the five. Javed et al. [8] proposed the CRC-TP database, which consists of 280K patches extracted from 20 WSIs of CRC; these patches are classified into seven distinct tissue phenotypes. To classify these tissue types, they used 27 state-of-the-art methods, including texture, CNN and Graph CNN-based (GCN) methods. In their experiments, the GCN methods outperformed the texture and CNN methods. Although both hand-crafted feature-based and deep learning methods have been used for multi-class CRC tissue type classification, their performance still leaves room for improvement. To this end, we propose two Ensemble-CNN approaches that achieve considerable improvement on two popular databases.

Kather-CRC-2016 Database

The Kather-CRC-2016 database [12] consists of 5000 CRC tissue type images, where each tissue type has 625 samples. J. N. Kather et al. [12] used 10 anonymized H&E stained CRC tissue slides from the pathology archive at the University Medical Center Mannheim, Germany. Firstly, they digitized the slides. Then, contiguous tissue areas were manually annotated and tessellated. From each slide, they created 625 non-overlapping tissue tiles of dimension 150 × 150 pixels. The following eight types of tissues were selected in their database:

CRC-TP Database
The CRC-TP database [8] consists of 280K CRC tissue type images. These CRC tissue type images are patches that were extracted from 20 WSIs of CRC stained with H&E taken from University Hospitals Coventry and Warwickshire (UHCW). Each slide was taken from a different patient. With the aid of expert pathologists, the WSI slides were manually divided into non-overlapping patches and these patches were annotated into seven distinct tissue phenotypes, where each patch was assigned to a unique label based on the majority of its content. Table 1 contains the CRC tissue types and their corresponding number of samples.
The CRC tissue image size is fixed to 150 × 150 pixels. Javed et al. [8] divided the 280K CRC tissue images into training and testing splits to evaluate the performance of their methods: 70% of the patches of each tissue phenotype were randomly selected as the training split and the remaining 30% were used as the testing split. In our experiments, we used the patch-level data splits (70–30%) provided by [8]. Figure 2 contains five samples of each CRC tissue type from the CRC-TP database.
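A split of this kind (70–30%, stratified per tissue phenotype) can be sketched as below. The `stratified_70_30_split` helper and the toy label array are our own illustration, not code from [8]; in practice the splits distributed with the database should be used directly.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_70_30_split(patch_ids, labels, seed=0):
    """Split patch indices 70/30 while preserving the per-class proportions."""
    train_ids, test_ids = train_test_split(
        patch_ids, test_size=0.30, stratify=labels, random_state=seed)
    return train_ids, test_ids

# Toy example: 700 patches, 100 per phenotype, 7 phenotypes (as in CRC-TP).
ids = np.arange(700)
labels = np.repeat(np.arange(7), 100)
train_ids, test_ids = stratified_70_30_split(ids, labels)
```

With 100 patches per class, stratification yields exactly 70 training and 30 testing patches for each phenotype.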

Local Phase Quantization (LPQ)
In the last three decades, texture descriptors have proved their efficiency in many computer vision tasks. In our experiments, we used two of the most powerful descriptors: Local Phase Quantization (LPQ) [29] and Binarized Statistical Image Features (BSIF) [30]. In addition, we tested the combination of these two descriptors by concatenating their feature vectors.
LPQ [29] is a local texture descriptor based on the quantized phase of the Discrete Fourier Transform (DFT) in a local pixel neighborhood. For each M × M local neighborhood, a short-term Fourier transform is computed and its phase is quantized at four frequencies. In our experiments, we chose the LPQ parameters as follows: a local neighborhood size of 13 × 13 pixels, the Gaussian derivative quadrature filter pair as the frequency estimation method, and a 3 × 3 multi-block representation. Each block produces a histogram that counts the occurrences of the quantized phase codes for all pixels within the block. Consequently, each block produces a 256-dimensional feature vector and the final feature vector is the concatenation of all block feature vectors.
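The multi-block pooling step can be sketched as follows. The `multiblock_histogram` helper is our own illustration: it takes an already-computed 8-bit code map as input (the LPQ filtering and phase quantization themselves are omitted), splits it into a 3 × 3 grid and concatenates the per-block 256-bin histograms.

```python
import numpy as np

def multiblock_histogram(code_map, grid=(3, 3), n_bins=256):
    """Concatenate per-block histograms of an 8-bit code map.

    `code_map` stands in for the per-pixel LPQ (or BSIF) codes;
    each of the grid[0] x grid[1] blocks contributes n_bins values.
    """
    h, w = code_map.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = code_map[i * h // grid[0]:(i + 1) * h // grid[0],
                             j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
            feats.append(hist)
    return np.concatenate(feats)

codes = np.random.randint(0, 256, size=(150, 150))  # 150 x 150 tile, as in the databases
feat = multiblock_histogram(codes)                  # 9 x 256 = 2304-dimensional vector
```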

Binarized Statistical Image Features (BSIF)
BSIF [31] is a local texture descriptor that uses a set of 2-D filters to obtain a binarized response for each pixel. These filters were learned from natural images using independent component analysis. In our experiments, we used the 17 × 17 × 8 bank of filters. Similar to the LPQ feature extraction, we used the 3 × 3 multi-block representation. Each block produces a 256-dimensional feature vector and the final feature vector is the concatenation of all block feature vectors. Figure 4 shows an example of extracting multi-block BSIF features from a CRC tissue image using the 3 × 3 multi-block representation.

Support Vector Machine (SVM)
In machine learning, SVM [32] is one of the most powerful supervised learning methods. For D features, the SVM algorithm seeks a hyperplane in D-dimensional space that distinctly classifies the data points. To separate two classes of data points, many possible hyperplanes could be chosen. The SVM objective is to find the hyperplane with the maximum margin, i.e., the maximum distance between the hyperplane and the data points of both classes. Maximizing the margin provides some reinforcement so that future data points can be classified with more confidence. Figure 5 shows an example of possible hyperplanes between two classes and the linear-SVM hyperplane, which separates the two classes of data points with the maximum margin. In our experiments, we used linear-SVM as a benchmark classifier for the hand-crafted features.
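As an illustration, a linear SVM fitted on toy 2-D data standing in for the texture feature vectors (scikit-learn's `LinearSVC` is our choice for the sketch, not necessarily the implementation used in the experiments):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy, linearly separable 2-D data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)),   # class 0
               rng.normal(+2.0, 0.5, (50, 2))])  # class 1
y = np.array([0] * 50 + [1] * 50)

# Linear kernel, as used for the LPQ/BSIF features in the paper.
clf = LinearSVC(C=1.0)
clf.fit(X, y)
acc = clf.score(X, y)
```

On separable data like this the maximum-margin hyperplane classifies the training points essentially perfectly.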

Neural Network (NN) Classifier
In addition to the SVM classifier, we built a seven-layer NN classifier to classify the shallow features obtained from the LPQ and BSIF descriptors and their combination. Figure 6 illustrates the used architecture. We kept the classifier simple since the extracted features are already mid-level features. To this end, we tested different numbers of layers (3, 5, 7 and 9) on the first fold of the Kather-CRC-2016 database, then picked the number of layers with the best performance, which was seven. Consequently, the seven-layer NN architecture was used on the other folds of the Kather-CRC-2016 database and in the CRC-TP database experiments. The seven-layer NN classifier was trained for 20 epochs with an initial learning rate of 10⁻⁶, decayed by 0.1 every 10 epochs, and a batch size of 128.

CNN Architectures
In our experiments, we evaluated four of the most powerful CNN architectures: ResNet-101, ResNeXt-50, Inception-v3, and DenseNet-161. We used models pre-trained on the ImageNet challenge database [25].

ResNet-101
Traditional CNN architectures suffer from vanishing/exploding gradients as they go deeper. In [28], K. He et al. addressed this problem by adding shortcut (residual) connections to earlier layers, as shown in Figure 7. Residual networks are easier to optimize and can gain accuracy from considerably increased depth with lower complexity than traditional CNNs. In our experiments, we used the ResNet-101 pretrained model.

ResNeXt-50
The ResNeXt block [33] uses residual connections to earlier layers, similar to the ResNet block, as shown in Figure 8. In addition, the ResNeXt block adopts the split-transform-merge strategy (branched paths within a single module). In the ResNeXt block, as shown in Figure 8, the input is split into many lower-dimensional embeddings (via 1 × 1 convolutions): 32 paths of 4 channels each; then all paths are transformed by filters of the same topology with size 3 × 3. Finally, the paths are merged by summation.

Inception-v3
Inception-v3 [34] is the third version of the Inception network family, first introduced in [27]. The Inception block provides efficient computation and deeper networks through dimensionality reduction with stacked 1 × 1 convolutions. The main idea of the Inception architectures is to apply multiple kernel sizes at the same level instead of stacking them sequentially as in traditional CNNs; this is known as making the network wider instead of deeper. Figure 9 illustrates the architecture of Inception-v3, which introduces several improvements over the initial Inception versions, including label smoothing, factorized 7 × 7 convolutions, and an auxiliary classifier to propagate label information lower down the network.

DenseNet-161
DenseNet networks [35] seek to solve the problem that arises in CNNs as they grow deeper: the path from the input layer to the output layer (and for the gradient in the opposite direction) becomes so long that the signal can vanish before reaching the other side. G. Huang et al. [35] proposed connecting each layer to every other layer in a feed-forward fashion (as shown in Figure 10) to ensure maximum information flow between layers in the network. In our experiments, we used the DenseNet-161 pre-trained model.

Focal Loss
Originally, the Focal Loss function was proposed for one-stage object detectors [36], where it proved its efficiency in the imbalanced-classes case. The Focal Loss function is defined by:

FL(p_t) = −(1 − p_t)^γ · log(p_t)

where p_t is the predicted probability corresponding to the ground truth class and γ is the focusing parameter. Figure 11 shows a comparison between the Cross-Entropy loss function and the Focal Loss function with different values of the focusing parameter γ. As shown in Figure 11, γ controls the shape of the curve: the higher the value of γ, the lower the loss assigned to well-classified examples. At γ = 0, Focal Loss becomes equivalent to Cross-Entropy Loss. In addition to one-stage object detection, the Focal Loss function has proved its efficiency in many classification tasks [37,38].
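A minimal NumPy sketch of the Focal Loss and its relation to Cross-Entropy, operating on precomputed ground-truth-class probabilities p_t (an illustration of the formula, not the PyTorch implementation used in training):

```python
import numpy as np

def focal_loss(p_t, gamma=2.0):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t)."""
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

def cross_entropy(p_t):
    """CE(p_t) = -log(p_t); the gamma = 0 special case of Focal Loss."""
    return -np.log(p_t)

# From a hard example (p_t = 0.2) to a well-classified one (p_t = 0.99).
p = np.array([0.2, 0.6, 0.9, 0.99])
fl = focal_loss(p, gamma=2.0)
```

With γ = 2, the loss of the well-classified example (p_t = 0.99) is down-weighted far more strongly than that of the hard example (p_t = 0.2), which is exactly the behavior shown in Figure 11.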

Evaluation Metrics
To evaluate the performance of the tested methods, we used three metrics: accuracy, F1-score and weighted F1-score. Accuracy is the ratio of correctly classified samples to the total number of samples. Accuracy is mainly used to evaluate the methods on the Kather-CRC-2016 database because it is a balanced database. Since the CRC-TP database is not balanced, we also used the F1-score and the weighted F1-score. The F1-score is defined by the formula:

F1 = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)

where Precision and Sensitivity (also called Recall) are defined by the following formulas:

Precision = TP / (TP + FP), Sensitivity = TP / (TP + FN)

where TP is the number of True Positive instances, FP is the number of False Positive instances and FN is the number of False Negative instances. The weighted F1-score is defined by the formula:

weighted F1 = (1/N) × Σ_{i=1}^{C} n_i × F1_i

where C is the number of classes, n_i is the number of test samples of the i-th class, F1_i is the F1-score of the i-th class, and N is the total number of test samples.
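These metrics can be computed in a few lines of NumPy. The helper names below are ours (in practice a library implementation such as scikit-learn's `f1_score` with `average='weighted'` would give the same result):

```python
import numpy as np

def per_class_f1(y_true, y_pred, n_classes):
    """One-vs-rest F1 for each class, from TP/FP/FN counts."""
    f1 = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return f1

def weighted_f1(y_true, y_pred, n_classes):
    """Weighted F1: sum_i (n_i / N) * F1_i, with n_i test samples in class i."""
    f1 = per_class_f1(y_true, y_pred, n_classes)
    counts = np.bincount(y_true, minlength=n_classes)
    return np.sum(counts * f1) / counts.sum()

# Toy example with 3 classes and one misclassified sample.
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
```

Here class 0 and class 1 each have F1 = 0.8, class 2 has F1 = 1.0, and the class-size weighting gives (3 × 0.8 + 2 × 0.8 + 1 × 1.0) / 6 = 5/6.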

Proposed Approaches
To classify different CRC tissue types, we propose two Ensemble-CNN approaches: Mean-Ensemble-CNNs and NN-Ensemble-CNNs. Both proposed approaches build on the four CNN models (ResNet-101, ResNeXt-50, Inception-v3 and DenseNet-161) already trained for CRC tissue type classification on the training data.
In the Mean-Ensemble-CNN approach, the predicted class of each image is assigned using the average of the predicted probabilities of the four trained models. In more detail, the probabilities of the four models for each class are averaged to give a mean probability per class, then the class with the maximum mean probability is the ensemble prediction. Figure 12 illustrates our Mean-Ensemble-CNN approach. In the NN-Ensemble-CNN approach, the deep features corresponding to the last FC layer are extracted from the four trained models. Then, these deep feature vectors are concatenated to obtain an ensemble deep feature vector. The features extracted from the training data are used to train a new NN architecture, which consists of four layers, while the features extracted from the testing data are used to test this four-layer NN. Figure 13 illustrates our NN-Ensemble-CNN approach. We selected four layers for the NN-Ensemble-CNN approach after testing different small numbers of layers (3, 4 and 5) on the first fold of the Kather-CRC-2016 database, similarly to what we did for the NN classifier of the hand-crafted features in Section 3.2.4.
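The Mean-Ensemble-CNN decision rule can be sketched as follows (the toy probability arrays are illustrative; in practice each array would be the softmax output of one trained CNN on a batch of test images):

```python
import numpy as np

def mean_ensemble_predict(prob_list):
    """Average the per-model class probabilities, then take the argmax.

    `prob_list` holds one (n_samples, n_classes) probability array per
    trained CNN (four models in our case).
    """
    mean_prob = np.mean(np.stack(prob_list, axis=0), axis=0)
    return np.argmax(mean_prob, axis=1)

# Toy example: two samples, three classes, four "models".
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]])
p3 = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
p4 = np.array([[0.4, 0.5, 0.1], [0.2, 0.6, 0.2]])
pred = mean_ensemble_predict([p1, p2, p3, p4])
```

Note that the first sample is assigned class 0 even though one model (p4) votes for class 1: averaging the probabilities, rather than taking a majority vote over hard labels, lets confident models outweigh uncertain ones.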

Experimental Setup
For hand-crafted feature extraction and SVM classification, we used MATLAB 2019. For deep learning and NN training, we used the PyTorch [39] library with an NVIDIA GeForce TITAN RTX 24 GB GPU. For training the deep learning architectures, we pre-processed the data by normalizing and resizing the input images to the correct input size for each network: Inception-v3 takes 299 × 299 pixels, while DenseNet-161, ResNeXt-50 and ResNet-101 take 224 × 224 pixels. Moreover, we used the following data augmentation techniques:

Experiments and Results
In this section, we will describe our experimental setup and the experimental results.

Hand-Crafted Feature Experiments
In this section, we used two hand-crafted descriptors (LPQ and BSIF) to extract features from the CRC tissue images. After the features were extracted, we used two classification methods (SVM and NN) to distinguish between the different CRC phenotypes.

Kather-CRC-2016 Database
In the Kather-CRC-2016 database experiments, we used a 5-fold cross-validation evaluation scheme. Table 2 summarizes the obtained accuracy for each fold and the mean of the five fold accuracies. From this table, we notice that the combination of LPQ and BSIF gave better performance with both classifiers (SVM and NN). We also observe that the combined features achieved similar results with the two classifiers, with slightly better accuracy for the seven-layer NN classifier.

CRC-TP Database
In the CRC-TP database, we used the training and testing splits provided with the database [8]. Table 3 summarizes the obtained results of the LPQ and BSIF descriptors and their combination using the SVM and NN classifiers. Similar to the Kather-CRC-2016 experiments, the combination of LPQ and BSIF gave better performance with both classifiers (SVM and NN). In addition, the NN classifier with the combined features achieved the best performance.

Deep Learning Experiments
In this section, we evaluated four CNN architectures (ResNet-101, ResNeXt-50, Inception-v3, and DenseNet-161) and the two proposed ensemble schemes (Mean-Ensemble-CNNs and NN-Ensemble-CNNs). All networks were trained for 20 epochs with the Adam optimizer [40] and the Focal Loss function [36] with γ = 2. The initial learning rate is 10⁻⁵ for 10 epochs, then decreases to 10⁻⁶ for the next 10 epochs. We also added a dropout layer with a probability of 0.3 before the fully connected layer in DenseNet-161, ResNeXt-50 and ResNet-101; Inception-v3 already has a dropout layer with a probability of 0.5. For NN-Ensemble-CNNs, we used the four trained models to extract the deep features, then trained the four-layer NN network as described in Figure 13. The four-layer NN network was trained for 30 epochs with an initial learning rate of 10⁻⁶, decayed after 15 epochs. Similar to the CNN architectures, the four-layer NN network was trained using the Adam optimizer [40] and the Focal Loss function [36] with γ = 2.
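The two-stage step schedule used for the CNNs can be written as a small helper (an illustrative sketch; in PyTorch the equivalent would be a step learning-rate scheduler):

```python
def learning_rate(epoch, initial_lr=1e-5, decay=0.1, step=10):
    """Step schedule for the CNNs: the learning rate starts at 1e-5
    and drops by a factor of 10 (to 1e-6) after `step` epochs."""
    return initial_lr * (decay ** (epoch // step))
```

For epochs 0–9 this returns 10⁻⁵ and for epochs 10–19 it returns 10⁻⁶, matching the 20-epoch training described above.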

Kather-CRC-2016 Database
Similar to the hand-crafted feature experiments on the Kather-CRC-2016 database, we used a 5-fold cross-validation evaluation scheme. For training the CNN architectures, we used a batch size of 64; to train the NN network of our NN-Ensemble-CNN approach, we used a batch size of 32. Table 4 summarizes the obtained results using the CNN architectures and the proposed ensemble approaches. Comparing these results with the ones in Table 2, we notice that the CNN architectures exceed the hand-crafted feature-based methods in the classification of CRC tissue types. Moreover, the two proposed ensemble approaches outperformed the four CNN architectures. Figures 14 and 15 show the confusion matrices of our proposed Mean-Ensemble-CNN and NN-Ensemble-CNN approaches, respectively. From these confusion matrices, we notice that both ensemble approaches achieved close results in the recognition of each CRC tissue type on the Kather-CRC-2016 database.

CRC-TP Database
To train the CNN architectures on the training data of the CRC-TP database, we used a batch size of 128. Similarly, we used a batch size of 128 to train the four-layer NN of our NN-Ensemble-CNN approach. In the CRC-TP database experiments, we selected larger batch sizes than in the Kather-CRC-2016 experiments because the CRC-TP database contains a larger number of samples for each class. Table 5 summarizes the obtained results using the CNN architectures and the proposed ensemble approaches. Comparing these results with the ones in Table 3, we notice that the CNN architectures exceed the hand-crafted feature-based methods in the classification of CRC tissue types. On the other hand, the two proposed ensemble approaches outperformed the four CNN architectures. Figures 16 and 17 show the confusion matrices of our proposed Mean-Ensemble-CNN and NN-Ensemble-CNN approaches. The comparison between the two proposed ensemble approaches (from Table 5 and Figures 16 and 17) shows that the NN-Ensemble-CNN approach performs slightly better in the recognition of CRC tissue types. Since our approaches are ensembles of trained CNN architectures, it is interesting to compare their computational cost with that of the individual CNN architectures. Table 6 contains the time required to test a single CRC tissue image using the trained CNN architectures and our approaches on the Kather-CRC-2016 and CRC-TP databases. From Table 6, we notice that our approaches' testing time equals the sum of the single models' testing times. Moreover, the required time is negligible for all the evaluated methods on both databases; therefore, our approaches are suitable for real-world digital pathology applications.

Discussion
In this section, we compare our results with the state-of-the-art methods. Table 7 contains the comparison between our proposed approaches and the state-of-the-art methods. In [12], J. N. Kather et al. tested different texture descriptors with an SVM classifier. In [41], Ł. Rączkowski et al. proposed a Bayesian Convolutional Neural Network approach. In [19], L. Nanni et al. proposed an ensemble (FUS_ND+DeepOutput) approach based on combining deep and texture features. The comparison in Table 7 shows that our proposed ensemble approaches outperform the state-of-the-art methods. Table 7. Comparison between our approaches and the state-of-the-art methods on the Kather-CRC-2016 database.

Method                              N of Folds    Accuracy (%)
Gabor + rbf-SVM [12]                10            62.60
Perceptual + rbf-SVM [12]           10            63.00
GLCM + rbf-SVM [12]                 10            71.90
Histogram higher + rbf-SVM [12]     10            72.40
LBP + rbf-SVM [12]                  10            76.20
Histogram lower + rbf-SVM [12]      10            80.80
ARA-CNN [

Table 8 contains the comparison between our proposed approaches and the state-of-the-art methods on the CRC-TP database. In [8], S. Javed et al. used supervised and semi-supervised learning methods. In this comparison, we consider the results of the supervised approaches, which are comparable to ours. In Table 8, we compare our approaches with the texture and deep learning methods that were tested in [8]. The comparison shows that our approaches (Mean-Ensemble-CNNs and NN-Ensemble-CNNs) outperform the state-of-the-art methods. Together with the comparisons against the hand-crafted feature-based methods and the deep learning architectures, this demonstrates the efficiency of our proposed ensemble approaches. From the results on the Kather-CRC-2016 database, we notice that our two proposed approaches achieved similar results (Table 7). Meanwhile, on the CRC-TP database, the NN-Ensemble-CNNs performance is better than that of Mean-Ensemble-CNNs (Table 8). On the other hand, we noticed that the performance of the different methods on Kather-CRC-2016 is better than on the CRC-TP database. This is probably because the CRC-TP database contains more challenging classes than Kather-CRC-2016. In addition, CRC-TP is not a balanced database, which can influence the overall performance. Another possible reason is the splitting and labeling of the tissue types, which were performed by different expert pathologists for each database. Although our approaches outperform the state-of-the-art methods on both databases, the results on the CRC-TP database still need improvement for real-world applications.
One possible solution is to use more data augmentation techniques to increase the training data.

Conclusions
In this paper, we proposed two ensemble deep learning approaches to recognize CRC tissue types. Our proposed approaches, denoted Mean-Ensemble-CNNs and NN-Ensemble-CNNs, are based on combining four trained CNN architectures: ResNet-101, ResNeXt-50, Inception-v3 and DenseNet-161. In our Mean-Ensemble-CNN approach, we combined the CNN architectures by averaging their predicted probabilities. In our NN-Ensemble-CNN approach, we combined the deep features from the last fully connected layer of each trained CNN architecture, then fed them into a four-layer NN. In addition to evaluating the four CNN architectures and our proposed approaches, we evaluated two texture descriptors and two classifiers: LPQ features, BSIF features and their combination were classified using SVM and NN.
The experimental results showed that the deep learning methods (single architectures) surpass the hand-crafted feature-based methods, while our proposed approaches outperform both the hand-crafted feature-based methods and the individual CNN architectures. In addition, our ensemble approaches outperform the state-of-the-art methods on both databases. As future work, we plan to use more data augmentation techniques to enlarge the training data. Moreover, including other powerful CNN architectures in our ensemble approaches may further improve the performance.