Deep Learning Techniques for the Classification of Colorectal Cancer Tissue

Abstract: It is very important to make an objective evaluation of colorectal cancer histological images. Current approaches are generally based on the use of different combinations of textural features and classifiers to assess the classification performance, or on transfer learning to classify different tissue types. However, since histological images contain multiple tissue types and characteristics, classification is still challenging. In this study, we propose a classification methodology based on selecting the optimizer and modifying the parameters of CNN methods, and then use deep learning technology to distinguish between healthy and diseased large intestine tissues. Firstly, we trained a neural network and compared the network architecture optimizers. Secondly, we modified the parameters of the network layers to optimize the superior architecture. Finally, we compared our well-trained deep learning methods on two different open histological image datasets: one comprised 5000 H&E images of colorectal cancer, and the other comprised 100,000 images in nine tissue categories with an external validation set of 7180 images. The results showed that the recognition accuracy on histopathological images was significantly better than that of existing methods. Therefore, this method is expected to have great potential to assist physicians in making clinical diagnoses and to reduce the number of disparate assessments, based on the use of artificial intelligence to classify colorectal cancer tissue.


Introduction
Colorectal cancer (CRC) is the third most common form of cancer, accounting for about 10% of all cases in the world [1]. The results of many studies have shown that a more accurate classification of medical images can effectively determine the development of colorectal cancer [2,3]. Prognosticators can be extracted directly from hematoxylin and eosin (H&E) stains, the principal tissue stains used in histology, for many common tissue types, such as normal colon mucosa (NORM), adipose tissue (ADI), polyps, cancer-associated stroma (STR), and lymphocytes (LYM) [2]. Optical colonoscopy is the medical procedure usually used to examine abnormalities on the surface of the colon, including their location, morphology and pathological changes, to make a clinical diagnosis. This improves the accuracy of the diagnosis and the ability to predict the severity of the disease in order to apply the most appropriate clinical treatment. Nevertheless, although the correct classification of pathological images is an important factor in helping doctors precisely identify the best possible treatment, a great deal of time and effort is required to analyze histopathological images, and the evaluation of tissue classification is easily affected by many subjective factors. Subjective evaluation is generally performed by pathologists who manually review the histological slide images of CRC tissue, which remains the standard for cancer diagnosis and staging. However, differences in training, experience, evaluation conditions and time pressure can lead pathologists to different diagnostic judgements. Hence, the universal automatic classification of CRC pathological tissue slide images for fair evaluation has important clinical significance.
Pathology slides provide an enormous amount of information, which has been quantified through digital pathology and classic machine learning techniques over the years [4]. Previous research has been based on machine learning approaches for judging the cell classification in the histological slides of tumor tissue. The classification of histopathological images using artificial intelligence not only improves the accuracy and efficiency of the classification, but also enables doctors to make timely decisions in terms of clinical treatment [5,6]. However, most of the proposed experimental methods rely on manual feature labels, which is the main limitation of traditional texture analysis approaches. Therefore, deep learning has been introduced in the last few years to solve this and other limitations. Deep learning is a new technology that is considered to be an evolution of machine learning, since it uses multiple layers of neural networks to learn and progressively extract higher-level features in order to reduce human intervention in the recognition of different classes in the images. It is also effective on non-image data, in tasks such as speech recognition and social network filtering, as well as in medical image analysis, and its advanced approach not only reduces the need for human intervention, but can also automatically achieve results that are comparable to or surpass those of humans.
Convolutional neural networks (CNN) [7,8] have recently shown effective results in classifying images in the field of deep learning, where a network may have dozens or hundreds of layers, each learning to detect different image features. A convolutional layer, composed of small-sized kernels, applies weights to its inputs and directs them through an activation function as the output, generating higher-level features. The main advantage of using a CNN compared to a traditional neural network is that it reduces the number of model parameters while achieving more accurate results.
With this in mind, we aimed to use deep learning technology to identify medical images and increase identification accuracy through the automatic classification of tumor types. This involved the achievement of the following objectives:
a. To compare the classification accuracy rates of different CNN models.
b. To find the best-performing deep learning technique.
c. To compare the results of this method with those of existing techniques.
This paper consists of a systematic study of deep learning and its application to the classification of pathological images. Past studies of deep learning are reviewed in Section 2, while the approach of the deep learning models is described in Section 3. Details of the experiment are provided in Section 4, and the paper is concluded in Section 5 with proposals for possible future investigations in this field.

Related Works and Deep Learning Methodology
Some of the prior studies in relation to the automatic classification of histopathological images will be described and discussed in this section with a further explanation of how deep learning works. This will be followed by a presentation of the proposed method to conduct the current research.

Related Works
Digital technology is currently used extensively to classify medical images, as evidenced by the results of several methods of histopathological image classification shown in Table 1. Kather [2] used a range of textural descriptors to analyze a multi-class problem of tumor epithelium and simple stroma in 5000 histological images. He proposed four classification methods: (1) the k-nearest neighbors algorithm (k-NN); (2) an SVM decision function to classify all categories; (3) ensembles of decision tree models built using the RUSBoost method; and (4) 10-fold cross-validation to train the classifiers, without an explicit stratification approach. The results indicated that SVM was the best classification method, achieving 87.4% accuracy over eight classes. Lately, the classification of tumor types has been found to be more accurate using CNN classification methods. Tsai [9] applied the CNN architecture of a deep learning technique to detect pneumonia in chest X-rays and achieved an accuracy rate of 82.1% by using feature selection and a CNN.

Table 1. Summary of prior studies on histopathological image classification.

Literature | Research Objective | Approach | Classification Technique | Accuracy Rate (%)
[2] | Multi-class texture analysis in colorectal cancer histology | Texture-based methods | One-nearest neighbor, linear SVM, radial-basis-function SVM and decision trees | 87.4
[9] | Machine learning-based common radiologist-level pneumonia detection on chest X-rays | Feature selection | CNN | 82.1

In another study, the classified images were epithelial (EP) and stromal (ST); automated segmentation and the classification of color features, which included pixel intensities in different color spaces, were used to analyze the tumor microenvironment. In his study, Du [11] proposed that learned CNN features outperformed handcrafted features and automatically distinguished the epithelial and stromal regions in the breast. In addition, he found that colorectal tumors could be distinguished from tumor tissue using a network architecture layer approach with results that were 84% accurate. Transfer learning is a methodology that leverages features learned by deep networks on one task to distinguish image features in another. Du [10] discussed the use of transfer learning methods to accurately distinguish breast or ovarian cancer in histological images, with a CNN fine-tuned as the feature extractor. Additionally, he discussed how to distinguish high-level and low-level features inside the neural network: a deep neural network may have multiple layers, the first of which learn low-level features, and the closer the layers are to the output layer, the more they learn high-level features. Du [11] also used a transfer learning approach with GoogLeNet and achieved 90.2% accuracy, suggesting the feasibility of using it to classify the tumor stroma ratio (TSR). Xu et al. [12] improved the activation features of the AlexNet model and proposed visualizing the neurons in the last hidden layer to classify and segment them. Pre-trained on ImageNet, the framework successfully transferred the features extracted from the network to small histopathology image datasets for training and visualization, and a test accuracy of 97.5% was reported. Bejnordi et al. [13] proposed deep convolutional neural networks with some new geometric features, and trained the networks to classify stroma images, including stroma, fat tissue and in situ lesions, and to predict the stroma regions. Bejnordi analyzed the stroma between surrounding invasive cancer and in situ lesions and achieved 96.2% accuracy. Additionally, Kather [3] replaced the classification layer and achieved a best accuracy of 98.7% with VGG19.

Advantages and Limitations of Using Machine Learning Approaches
Machine learning teaches computers to simulate and implement human learning behavior, using computational methods to learn knowledge from sample data. It is widely used in applications such as image recognition, content recommendation and computer vision, in which it is difficult to develop conventional algorithms to achieve the required tasks [14]. There are two main techniques: supervised learning (used to learn a mapping between input and output) and unsupervised learning (which uses a model to extract relationships from data). The goals of machine learning are feature extraction, selection, prediction and recognition. The detailed processes are shown in Figure 1. This technology can automatically learn knowledge from the data in order to react accurately, which generally saves a great deal of time.

The deep learning approach generally requires a massive amount of data for training, which means that the more data there is to train a model, the better it will perform. However, experts are needed for the manual identification and labelling of histological images, which is potentially time-consuming and expensive. Even though the underlying method automatically attends to discriminative information for better classification, prospective validation studies are still required to firmly establish routine biomarkers for clinical use.
In short, highly trained pathologists remain the decision-makers in the subjective evaluation for cancer diagnosis. The techniques developed through deep learning can assist doctors in making more accurate assessments, but are not meant to replace the duties of physicians.

How Deep Learning Works
Deep learning is another major subfield of machine learning. It is inspired by biological nervous systems: Hubel [15] found corresponding relationships between stimuli and neurons in the visual cortex, and deep learning similarly combines multiple nonlinear processing layers and hidden layers to learn features directly from data. Hinton [16,17] proposed that using multiple hidden layers to learn features is conducive to classification, as shown in Figure 2.

The use of deep learning to learn features from the multiple hidden layers of a large volume of data enhances the accuracy of predictions, and a set of labels can be produced by using a GPU to train the model. Backpropagation facilitates the discovery of statistical regularities. Deep learning is based on the concept of learning from the first layer onward, automatically learning the features of many images through the combined layers: each layer uses the output of the previous layer as its input, passes what it learns to the next layer, and finally makes a prediction.
Many different deep learning models have been developed for image recognition [18] over the past few years, with applications such as histopathological images, facial recognition, and advanced driver assistance technologies. CNN, proposed by Lecun [18,19], is a multi-layer neural network with shared weights. The image is used directly as the input, which reduces the complexity and the number of parameters of the network, and the structure of the network is invariant for image recognition. It is classically composed of two alternating types of layers: a C layer, the feature extraction layer, in which neurons connected to the same feature map share equal weights; and an S (subsampling) layer, whose feature maps gain displacement invariance and tolerance to deformation, being activated by a sigmoid function. When part of a feature is extracted, its positional relationship with the input neurons of the previous layer is preserved.
A basic CNN consists of an input layer, an output layer, and hidden layers, including convolutional, ReLU, pooling, and fully connected layers:
(1) Input layer: the input layer is the beginning of the artificial neural network. It brings the initial data, which comprises a number of images with a given height, width, number of input channels, etc., into the system for further processing by subsequent layers of artificial neurons.
(2) Convolutional layers: this layer extracts various features, such as corners and edges, from the input images. Convolution is performed between the input image and a filter of a fixed size, which scans the full image with a set stride in pixels. The resulting feature map is fed into subsequent layers to learn further features of the input image.
(3) Rectified linear unit (ReLU): ReLU is the most common activation function in artificial neural networks, favored for its better gradient propagation and efficient computation. It is defined as the positive part of its argument, f(x) = max(0, x), where x is the input to a neuron: positive inputs pass through unchanged and negative inputs are mapped to zero.
(4) Pooling layers: the pooling layers address sensitivity to feature location by down-sampling feature maps, summarizing the presence of features in patches of the map. This local translation invariance makes the resulting features robust to small changes in the position of features in the image.
(5) Fully connected layers: this layer is at the end of the network, can be stacked, and is usually followed by the classifier that makes the classification decision.
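The layer types listed above can be sketched in a few lines of NumPy. This is a toy single-channel illustration of convolution, ReLU, and max pooling, not the implementation used in this study:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2-D convolution (strictly, cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0, x)

def max_pool(x, size=2, stride=2):
    """Summarize each patch of the feature map by its maximum value."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge = np.array([[-1.0, 1.0]])                     # simple horizontal-gradient filter
fmap = max_pool(relu(conv2d(image, edge)))         # conv -> ReLU -> pool pipeline
```

On this smoothly increasing toy image the gradient filter responds uniformly, so the pooled feature map is constant; on real histological images, each learned filter would highlight a different texture or edge pattern.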

CNN Architecture
The structure of a CNN can be designed for different purposes [20]. As in common neural networks, the neurons in a fully connected layer can be stacked and related to the activations in the previous layer. This takes the filtered image to a higher level and puts it to a vote: with each additional layer, the network can learn more complex combinations of features, which helps it make better decisions. These votes are expressed as the weights of the connections between each value or category. Their activations can therefore be calculated by matrix multiplication with a bias offset, and the main operations are performed by the backpropagation algorithm and stochastic gradient descent with momentum, which feed the weight-update optimization method [20].
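The weight update by stochastic gradient descent with momentum mentioned above can be written in a few lines. This is a minimal one-parameter sketch (learning rate and momentum values are illustrative choices, not the settings used in this study):

```python
def sgdm_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: the velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize f(w) = w^2, whose gradient is 2w; w should decay toward 0.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = sgdm_step(w, 2 * w, v)
```

The momentum term damps oscillations across ravines of the loss surface while accelerating progress along consistent gradient directions.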
A CNN is a cascade of filters: the first block is dedicated to detecting lower-level features (such as sharp points and surface folds), while subsequent blocks aggregate the previous activations. From the perspective of deep learning, the main advantage of this architecture compared to traditional networks is that it reduces the number of parameters needed to process an image. A CNN is composed of multiple connected kernels; the features learned by successive layers are gradually refined at the abstract level, and the input information can be represented hierarchically by combining low-level and high-level features. The objective of the fully connected layer is to take the results of the convolutions and classify the images.
Currently, the CNN [21] is the deep learning method most widely used for image recognition. Convolutional neural networks replace separate feature extraction, feature selection and classification steps. Combinations of convolutional layers contain a series of fixed-size filters, which operate on the input data via convolution to generate so-called feature maps. During training, these filters become useful modules for image recognition, such as detectors of lines, regular edges and changes in image color. The ReLU layer usually follows the convolutional layer and provides a non-saturating activation function f(x) = max(0, x) for the output. According to Krizhevsky's research [22], this activation function lets convolutional neural networks converge faster during training and also mitigates the gradient problem, thereby accelerating training.

Five Different CNN Models Networks
In this paper, we used five common deep neural networks based on CNN models and proposed an improved classification model for the systematic classification of colorectal cancer (CRC) tissue.

AlexNet
AlexNet [22] is a widely applied deep convolutional neural network, which can still achieve competitive classification performance compared to other kinds of networks. In the training stage of the AlexNet model, the input image is resized to 224 × 224 pixels and fed into the network. The architecture firstly adopts a convolutional layer that performs convolution and max pooling with local response normalization (LRN), using 96 receptive filters of size 11 × 11. The max pooling operations are performed with 3 × 3 filters with a stride of 2. The same operations are performed in the second layer with 5 × 5 filters. 3 × 3 filters are used in the third, fourth and fifth convolutional layers, with 384, 384, and 256 feature maps, respectively. The output of the two fully connected (FC) layers is used as the extracted feature vector, with dropout, followed by a softmax layer at the end.
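The spatial sizes of the feature maps in such a stack follow a standard formula. The sketch below applies it to AlexNet's first stage, assuming the commonly quoted 227-pixel effective input (descriptions such as the one above often cite 224); the numbers are for illustration only:

```python
def conv_out_size(in_size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer:
    floor((in + 2*pad - kernel) / stride) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1

# First AlexNet stage: 11x11 filters with stride 4, then 3x3 max pooling with stride 2.
c1 = conv_out_size(227, kernel=11, stride=4)   # 55x55 feature maps
p1 = conv_out_size(c1, kernel=3, stride=2)     # 27x27 after pooling
```

The same formula predicts the feature-map sizes at every later layer, which is how the 3 × 3 filters and stride settings quoted above determine the shape of the fully connected input.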

SqueezeNet
SqueezeNet [23] is a small CNN architecture, which achieves AlexNet-level accuracy on ImageNet with 50× fewer parameters. Additionally, model compression techniques can compress SqueezeNet to less than 0.5 MB (510× smaller than AlexNet). SqueezeNet begins with a standalone convolution layer (conv1), followed by 8 Fire modules (fire2–9), ending with a final convolution layer (conv10). The number of filters per Fire module increases gradually from the beginning to the end of the network, and max pooling with a stride of 2 is performed after layers conv1, fire4, fire8, and conv10.
VGGNet
VGGNet [24] was the runner-up of the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Its main contribution is showing that the depth of a network is a critical component in achieving better recognition or classification accuracy in CNNs. The VGG architecture consists of stacks of convolutional layers using the ReLU activation function, each stack followed by a max pooling layer, and ends with several fully connected layers that also use the ReLU activation function. The final layer of the VGGNet model is a softmax layer for classification. In addition, VGG uses small 3 × 3 convolution filters throughout, including in the deep VGG-E configuration.
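The appeal of VGG's small 3 × 3 filters can be shown with a parameter count: two stacked 3 × 3 layers cover the same 5 × 5 receptive field as one 5 × 5 layer, but with fewer weights. A quick check (channel count chosen arbitrarily for illustration):

```python
def conv_params(kernel, channels_in, channels_out):
    """Number of weights in a convolutional layer (biases ignored)."""
    return kernel * kernel * channels_in * channels_out

C = 64  # illustrative channel width
stacked_3x3 = 2 * conv_params(3, C, C)   # two 3x3 layers, same 5x5 receptive field
single_5x5 = conv_params(5, C, C)        # one 5x5 layer
```

The stacked version uses 18C² weights versus 25C² for the single layer, and inserts an extra ReLU nonlinearity between the two convolutions.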

GoogLeNet
The main architectural contribution of GoogLeNet [25] is improving the use of computing resources inside the network by incorporating Inception modules, with the objective of reducing complexity. It increases not only the depth of the architecture (adding 1 × 1 convolutional layers to the network) with different kernel sizes, but also the width of the network. This reduces the computation needed to capture sparse correlation patterns.

ResNet
ResNet [26] introduces a residual learning framework for ultra-deep networks, in which residual functions ease the training of networks that would otherwise suffer from the vanishing gradient problem. Without residual connections, as the depth of a network increases, the accuracy saturates and adding more layers increases the training error; ResNet's shortcut connections address this degradation.
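The residual idea can be sketched in NumPy: a block computes y = ReLU(F(x) + x), so the identity shortcut lets the input (and, during training, the gradient) pass through even when the learned residual F contributes little. A minimal dense-layer illustration, not ResNet's actual convolutional block:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x): the shortcut adds the input back to the residual branch."""
    f = relu(x @ w1) @ w2   # the residual function F(x), here two dense layers
    return relu(f + x)      # identity shortcut connection

# With zero weights F(x) = 0, so the block reduces to the identity for x >= 0:
x = np.array([1.0, 2.0, 3.0])
w = np.zeros((3, 3))
y = residual_block(x, w, w)
```

This is why stacking many such blocks cannot make training worse than a shallower network: each block can always fall back to passing its input through unchanged.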

Research Method
Deep learning has gained enormous popularity in scientific computing due to CNNs, and its algorithms are widely used in industry to solve complex problems. In this study, we compared different network architectures [22][23][24][25][26].

Experimental Steps
The diagram in Figure 3 illustrates the recognition process, which can be divided into three stages. The first stage is model training, the second is finding the superior architecture and parameters, and the third is model testing:


Model Training
In this step, we used two different datasets. The first dataset was NCT-CRC-HE-100K [3], which contains histological images in a 150 × 150-pixel format covering nine different tissue classes. We divided it into a 70% training set, a 15% validation set, and a 15% test set. Since the classes in the original dataset differ in size, we sampled the training, validation and test sets in proportion to each class's size in order to preserve the class distribution. The other dataset, Kather-texture-2016-image [2], is a collection of 5000 histological images of 150 × 150 pixels covering eight tissue categories of human colorectal cancer. Each image belongs to exactly one of the eight tissue categories, and the group sizes are balanced (625 images per category). We divided it into a 70% training set, a 15% validation set, and a 15% test set, as shown in Table 2. Secondly, we used five different CNN models for training: AlexNet [22], SqueezeNet [23], VGG19 [24], GoogLeNet [25], and ResNet50 [26].

Finding the Superior Architecture and Parameters
To gauge the performance of the network architectures of these CNN models, in the first experiment we compared three training optimizers on the NCT-CRC-HE-100K [3] dataset: stochastic gradient descent with momentum (SGDM), root mean square propagation (RMSProp) and adaptive moment estimation (Adam). RMSProp, supplied through the training options used to train, validate and test the CNN models, achieved the highest accuracy, as shown in Table 3. In the second experiment, we trained the five different CNN models with the RMSProp method, and further varied the mini-batch size and number of epochs to test the models.
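The proportional 70/15/15 split described above can be sketched with a stratified sampler. This is a stdlib-only illustration (the function name and seed are ours, not part of the study's pipeline):

```python
import random
from collections import defaultdict

def stratified_split(labels, train=0.70, val=0.15, seed=0):
    """Split sample indices 70/15/15 while preserving each class's proportion."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for lab, idxs in by_class.items():
        rng.shuffle(idxs)
        n = len(idxs)
        n_tr, n_va = int(n * train), int(n * val)
        splits["train"] += idxs[:n_tr]
        splits["val"] += idxs[n_tr:n_tr + n_va]
        splits["test"] += idxs[n_tr + n_va:]   # remainder goes to the test set
    return splits

# Eight balanced classes of 625 images each, as in Kather-texture-2016-image:
labels = [c for c in range(8) for _ in range(625)]
s = stratified_split(labels)
```

Splitting per class, rather than over the pooled dataset, is what guarantees that each tissue type appears in the training, validation and test sets in the same proportion as in the original data.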

Model Testing
The last stage involved classifying the histological images through each CNN model's network architecture, training the model to identify the different tissue classes. After training the neural networks on the 100,000 image patches (derived from 86 whole-slide images) of the first dataset (NCT-CRC-HE-100K), we used the held-out portion of the dataset for testing. In addition, we assessed the accuracy of the tissue classification and the convolutional neural network using an independent external dataset (CRC-VAL-HE-7K), which contained 7180 image patches derived from 25 hematoxylin and eosin (H&E) slides of human CRC tissue. We also used 70% of the Kather-texture-2016-image dataset, consisting of 5000 images in eight classes of colorectal cancer tissue, for training, 15% for validation and the remaining 15% for testing. We created a confusion matrix chart of the experimental results and showed the precision of each class using column and row summaries. The normalized rows show the percentage of correctly and incorrectly classified observations for each true class, while the normalized columns show the percentage of correctly and incorrectly classified observations for each predicted class.
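The row and column summaries of a confusion matrix correspond to per-class recall and precision, which can be computed directly. A toy two-class sketch (the matrix values are invented for illustration, not results from this study):

```python
import numpy as np

def row_col_summaries(cm):
    """Row-normalized diagonal = per-true-class recall;
    column-normalized diagonal = per-predicted-class precision."""
    cm = np.asarray(cm, dtype=float)
    recall = cm.diagonal() / cm.sum(axis=1)      # correct / all with that true class
    precision = cm.diagonal() / cm.sum(axis=0)   # correct / all predicted as that class
    return recall, precision

# Toy confusion matrix: rows = true class, columns = predicted class.
cm = [[90, 10],
      [20, 80]]
recall, precision = row_col_summaries(cm)
```

For a nine-class problem such as NCT-CRC-HE-100K, the same two vectors summarize how each tissue class is confused with the others.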

Images of Nine Tissue Classes
In this experiment, we used the open histological dataset of nine tissue classes from NCT-CRC-HE-100K for model training. These images were generated by Kather et al. [3] from 86 hematoxylin and eosin (H&E)-stained tissue slides. The labels of the histological images were taken from the NCT-UMM website. Example images of the nine tissue classes are shown in Figure 4. All images have dimensions of 224 × 224 pixels (112 × 112 µm), and they were presented to the network sequentially for training, validation and testing. After training and testing our network framework with NCT-CRC-HE-100K, we also assessed the accuracy of the tissue classification with an external validation set, CRC-VAL-HE-7K, which contained 7180 image patches for testing purposes only. The nine classes are adipose tissue (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), and colorectal adenocarcinoma epithelium (TUM).


Images of Eight Tissue Classes
We used the open dataset Kather-texture-2016-image to verify the accuracy of our optimized deep neural network architecture in distinguishing other tissue classes. This dataset was collected by Kather et al. [2] from the pathology archive of the Institute of Pathology. It consisted of 5000 non-duplicated histological images of human colorectal cancer (CRC) stained with hematoxylin and eosin (H&E), together with images of healthy normal tissue. Each image is an RGB image of 150 × 150 pixels (74 × 74 µm), and the dataset contains eight different tissue texture classes as well as larger original tissue images with a size of 5000 pixels (e.g., Figure 5).


Software and Tools Platform
In this study, we used MATLAB R2020a to train and test the deep neural network architectures on two Intel workstation computers, each equipped with a high-end NVIDIA GeForce GTX 1070 GPU and running Windows 10 64-bit with an Intel Core i7-7700 3.60 GHz processor (4 cores).


Experiments and Discussion
A series of experiments on different convolutional neural network (CNN) models was conducted in this study, including AlexNet, SqueezeNet, VGGNet, GoogLeNet and ResNet50. In Experiment I, we compared the accuracy rates of three training optimizers: stochastic gradient descent with momentum (SGDM); root mean square propagation (RMSProp), which uses the magnitude of recent gradients to normalize the current gradient; and adaptive moment estimation (Adam), an extension of classical stochastic gradient descent. In addition, some of the parameters in the network layers were modified, such as the mini-batch size and the number of epochs. Next, we used our approach to measure the accuracy rate of identifying the colorectal cancer tissue types from the histological images in the different open datasets; the results are presented in the next section.

Randomly Split the Dataset
Next, we split the image dataset into three data stores: 70% for training and 15% each for validation and testing, so that none of them overlapped with the others.
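A non-overlapping 70/15/15 split of this kind can be sketched as follows (in Python; the study itself used MATLAB data stores, so this is only an illustrative sketch):

```python
import random

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Randomly partition n sample indices into disjoint
    train/validation/test index lists (70/15/15 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # deterministic shuffle
    n_train = round(n * train)
    n_val = round(n * val)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])            # remainder goes to test

# For the 5000-image Kather-texture-2016-image dataset:
train_idx, val_idx, test_idx = split_indices(5000)
print(len(train_idx), len(val_idx), len(test_idx))  # 3500 750 750
```

Because the indices are shuffled once and then sliced into consecutive ranges, the three subsets are disjoint by construction.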

Specify a Set of Options for Training
After defining the network structure, the network was trained with each of three optimizers: stochastic gradient descent with momentum (SGDM), root mean square propagation (RMSProp) and adaptive moment estimation (Adam), using an initial learning rate of 0.01 and four training epochs over the entire dataset.
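The update rules of the three optimizers can be sketched in plain Python (a minimal single-parameter illustration of SGDM, RMSProp and Adam minimizing f(w) = w²; the hyperparameters are illustrative defaults, not the exact MATLAB trainingOptions settings):

```python
import numpy as np

def sgdm_step(w, g, v, lr=0.01, momentum=0.9):
    """SGDM: a velocity term accumulates past gradients."""
    v = momentum * v - lr * g
    return w + v, v

def rmsprop_step(w, g, s, lr=0.01, rho=0.99, eps=1e-8):
    """RMSProp: normalize the gradient by a running RMS of recent gradients."""
    s = rho * s + (1 - rho) * g**2
    return w - lr * g / (np.sqrt(s) + eps), s

def adam_step(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first- and second-moment estimates."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(w) = w^2 (gradient 2w), starting from w = 1, for 50 steps.
w_s, v = 1.0, 0.0
w_r, s = 1.0, 0.0
w_a, m, va = 1.0, 0.0, 0.0
for t in range(1, 51):
    w_s, v = sgdm_step(w_s, 2 * w_s, v)
    w_r, s = rmsprop_step(w_r, 2 * w_r, s)
    w_a, m, va = adam_step(w_a, 2 * w_a, m, va, t)
print(w_s, w_r, w_a)  # all three move toward the minimum at 0
```

The contrast is visible in the update sizes: RMSProp and Adam adapt their effective step to the recent gradient magnitude, while SGDM relies on accumulated velocity.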

Train the Network
Train the network on the histological images and monitor the accuracy rate.

Predict Classification Accuracy
Run prediction on the test data of the three open datasets to calculate the final accuracy rate and execution time.

Experimental Results
Since deep learning techniques are adopted in this study, it should be noted that the performance of a CNN model generally depends on many factors, such as the weight initialization, batch size, number of epochs, learning rate, activation function, optimizer, loss function and network topology. The optimizer selection study in [27] for brain tumor segmentation in magnetic resonance images (MRI) suggests that a good optimizer can be a critical issue for the proposed approach. The authors of [27] evaluated ten different state-of-the-art optimizers for CNNs, including adaptive gradient (Adagrad), adaptive delta (AdaDelta), stochastic gradient descent (SGD), adaptive momentum (Adam), cyclic learning rate (CLR), Adamax, root mean square propagation (RMSProp), Nesterov adaptive momentum (Nadam) and Nesterov accelerated gradient (NAG). The Adam optimizer achieved the best accuracy for MRI in [27]. Comprehensive analyses of these optimizers were performed in this study; based on the final results, only SGDM, RMSProp and Adam are listed, since their overall performance was better than that of the other optimizers across the different network models.
Firstly, the open dataset (Kather-texture-2016-image), which includes 5000 images in eight tissue classes, was used for training; the results of the experiment are shown in Table 4, and the confusion matrix is plotted in Figure 6. Secondly, we used the (NCT-CRC-HE-100K) image documents with a 224 × 224-pixel format to classify the histological images, which included 100,000 images in nine different tissue classes, and displayed the precision for each class by using column and row summaries to plot the confusion matrix, as shown in Figure 7. In addition, we tested the classification performance on another independent set of 7180 images from different patients (CRC-VAL-HE-7K) and plotted the confusion matrix, as shown in Figure 8. The detailed results are shown in Table 5.
Among the selected optimizers, unlike Adam, which achieved the highest accuracy for brain tumor segmentation in magnetic resonance images, the root mean square propagation (RMSProp) optimizer consistently achieved the highest accuracy rates for colorectal cancer tissues, as shown in Tables 4 and 5. Therefore, the RMSProp optimizer was adopted in the following experiments.

Approach
Using the same dataset split as in Experiment I, we trained the five different CNN models with the most accurate optimizer, root mean square propagation (RMSProp), and compared the results for different mini-batch sizes and numbers of epochs. In the model revision process, parts of the network layers were extracted into a new model, which was used to extract the image features and modify the parameters. We improved this stage based on five training cycles for each mini-batch size. We considered different convolutional neural network (CNN) models, namely AlexNet, SqueezeNet, VGGNet, GoogLeNet and ResNet, for the classification of the pathological images.
The architectural design of the convolutional neural network (CNN) ResNet50 can be seen in Table 6. (CONV + POOL)_max represents a convolutional layer followed by a max pooling layer, and (CONV + POOL)_avg represents a convolutional layer followed by an average pooling layer.
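The defining element of the ResNet family referenced in Table 6 is the residual (skip) connection. A minimal sketch (in NumPy, using a fully connected block with hypothetical shapes; real ResNet50 uses convolutional bottleneck blocks with batch normalization) illustrates the idea:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """A minimal residual block: the input skips over two weight layers
    and is added back before the final activation,
    y = relu(W2 @ relu(W1 @ x) + x).
    Stacking blocks of this form is what lets very deep networks such as
    the 177-layer ResNet50 train without vanishing gradients."""
    return relu(W2 @ relu(W1 @ x) + x)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1 = rng.standard_normal((8, 8)) * 0.01   # near-zero weights
W2 = rng.standard_normal((8, 8)) * 0.01
y = residual_block(x, W1, W2)
# With near-zero weights the block approximates relu(identity), i.e. the
# skip path dominates and information passes through unchanged.
print(np.allclose(y, relu(x), atol=0.02))
```

This is the design reason deeper ResNet variants (ResNet18, ResNet50, ResNet101) remain trainable: each block only has to learn a residual correction on top of the identity mapping.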

Experimental Results
In the second part, we compared the five different CNN networks with varied mini-batch sizes and numbers of epochs to form new models. We began by using the "NCT-CRC-HE-100K" histological image documents for model training, with a 224 × 224-pixel format for classification, including 100,000 images of nine different tissue classes, and displayed the precision of each class by using column and row summaries to plot the confusion matrix, as shown in Figure 9. The detailed results are shown in Table 7. Secondly, we tested the classification performance on an independent dataset of 7180 images from "CRC-VAL-HE-7K" and plotted the confusion matrix, as shown in Figure 10. The detailed results are shown in Table 8. In addition, we used the open dataset "Kather-texture-2016-image", which included 5000 images in eight tissue classes. The experimental results are shown in Table 9, and the confusion matrix is plotted in Figure 11. Based on the experimental results, it can be seen that, when revising the parameters, ResNet50 achieved the highest accuracy rate at 15 epochs, as shown in Figure 12a, and with a mini-batch size of 32 for the nine classes of CRC images, as shown in Figure 12b. Furthermore, the same parameters of the ResNet50 neural network used for the eight classes of CRC images achieved an accuracy of 94.86%, as shown in Figure 13a,b. Further extensive experiments were conducted to verify the efficacy of different variants of the ResNet architecture, namely ResNet18, ResNet50 and ResNet101 [26]. It is also worth noting that an accuracy rate of 99.69% can be achieved using the 177 layers of ResNet50, which is better than the 98.61% using the 71 layers of ResNet18 and the 99.31% using the 347 layers of ResNet101.
Furthermore, an accuracy rate of 94.86% can be achieved using the 177 layers and the same parameters of a ResNet50 neural network for the eight classes of CRC images, which is better than the 92.86% using the 71 layers of ResNet18 and the 94.16% using the 347 layers of ResNet101. The differences between ResNet18, ResNet50 and ResNet101 are highlighted in Figure 14. It can be seen from the previous experiments that the best classification accuracy rate can be achieved by revising the parameters and using ResNet50.


Discussion
After the detailed explanation of the approach and experiments, it is necessary to compare the performance of the proposed techniques with published data. In Reference [13], Kather et al. applied the same NCT-CRC-HE-100K dataset of 100,000 histological images to train a VGG19 CNN model and tested the classification performance on an independent set of 7180 images from different patients (CRC-VAL-HE-7K). The overall nine-class accuracy was close to 99% on an internal testing set and 94.3% on an external testing set. Unlike the approach in [13], the experimental results of ResNet50 outperform those of VGG19 in Table 7: we achieved a 99.69% accuracy rate on the same internal testing set and 99.32% on the same external testing set, as shown in Figure 14. Through comprehensive and thorough analyses, this study suggests that ResNet50 could be a better deep learning architecture than VGG19 for colorectal cancer tissue.
To further validate our claim, the independent eight-class dataset of [2] was also utilized for comparison purposes. In our study, ResNet50 achieved 94.86% accuracy (Figure 14), while [2] reported a best accuracy rate of 87.4%. Through comprehensive studies and comparisons, it is highly suggested that ResNet50 with the settings proposed in this study could be the most efficient and accurate deep learning technique for classifying colorectal cancer tissue.
Since deep neural networks were adopted as the classifier in this study, their modular design conveniently allows the architecture to be adapted to specific needs. Many factors can easily be modified, such as the weight initialization, batch size, number of epochs, learning rate, activation function, optimizer, loss function and network topology, to improve the classification accuracy. Among the various settings that affect classification performance, several studies [28][29][30] have suggested that the loss function can be critical to the deep learning model and its learning efficiency, as well as to the classifier's robustness in various situations.
In this study, the authors adopted transfer learning of deep learning architectures for the classification of colorectal cancer tissue; these network models were optimized based on pre-training on ImageNet [22]. Since ImageNet is a large labeled dataset of real-world images, it is one of the most widely used datasets in recent computer vision research, and several well-known models are ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners. All network models adopted in this study use the cross-entropy loss function. In future research, the authors will pay extra attention to optimizing the selection of the loss function in order to further improve the overall accuracy, class-imbalance awareness and convergence speed for the classification of colorectal cancer tissue. In this way, our research could effectively classify medical images to aid clinical care and treatment.

Conclusions
This study explored different deep learning models for the recognition of colorectal cancer tissue using CNNs. An improved set of deep learning parameters was proposed in this article to improve the classification accuracy. In order to verify our optimized parameters, we used CRC histological images as the experimental dataset and compared the ability of the five most commonly used deep learning network models to accurately distinguish colorectal cancer tissues. Based on the experimental results, our method was superior to the techniques described in the literature and achieved a high recognition rate. In summary, the nine-class accuracy on the NCT-CRC-HE-100K dataset of 100,000 histological images was close to 99% on an internal testing set and 94.3% on an external testing set in [3]. However, the experimental results of ResNet50 in this study achieved a 99.69% accuracy rate on the same internal testing set and 99.32% on the same external testing set, which outperforms the VGG19 results of [3]. In addition, the independent eight-class dataset of [2] was also utilized for comparison purposes; ResNet50 achieved 94.86% accuracy, while [2] reported a best accuracy rate of 87.4%. Through comprehensive studies and comparisons, it is highly suggested that ResNet50 with the settings proposed in this study could be the most efficient and accurate deep learning technique for classifying colorectal cancer tissue.