Deep Neural Network Models for Colon Cancer Screening

Simple Summary

Deep learning models have been shown to achieve high performance in diagnosing colon cancer compared with conventional image processing and hand-crafted machine learning methods. Hence, several studies have focused on developing hybrid learning, end-to-end, and transfer learning techniques to reduce manual interaction and the effort of labelling regions of interest. However, these techniques do not always provide a clear basis for a diagnosis. It is therefore necessary to develop explainable learning methods that can highlight the factors underpinning clinical decisions; to date, however, little research has employed such transparent approaches. This study discusses the aforementioned models for colon cancer diagnosis.

Abstract

Early detection of colorectal cancer can significantly facilitate clinicians' decision-making and reduce their workload. This can be achieved using automatic systems with endoscopic and histological images. Recently, the success of deep learning has motivated the development of image- and video-based polyp identification and segmentation. Currently, most diagnostic colonoscopy rooms utilize artificial intelligence methods that are considered to perform well in predicting invasive cancer. Convolutional neural network-based architectures, together with image patches and preprocessing, are widely used. Furthermore, transfer learning and end-to-end learning techniques have been adopted for detection and localization tasks, which improve accuracy and reduce user dependence with limited datasets. However, explainable deep networks that provide transparency, interpretability, reliability, and fairness in clinical diagnostics are preferred. In this review, we summarize the latest advances in such models, with or without transparency, for the prediction of colorectal cancer and also address the knowledge gap in the upcoming technology.


Introduction
Colorectal cancer is the third most common cancer worldwide and was the second most common cause of cancer-related deaths in 2018 [1,2]. Endoscopic removal of precancerous lesions is considered the best way to prevent colorectal cancer. The prognosis of patients with colorectal cancer can be improved by early detection of cancerous lesions; thus, there is a need for reliable, early, and accurate endoscopic diagnosis [3][4][5][6]. Colonoscopy is the gold standard for screening colorectal lesions [7][8][9]. However, the rate of missed polyps during colonoscopy varies according to the endoscopist's knowledge and experience [10][11][12].
Hence, artificial intelligence (AI) technologies could help in reducing the skill gaps among clinicians and thereby decrease the rate of missed lesions during colonoscopy [13][14][15][16].
Given their shared features, colon cancer and rectal cancer are often considered together. In this study, deep learning approaches to rectal, colorectal, and other cancers related to colon cancer were analyzed [17][18][19][20]. Convolutional neural network (CNN)-based standard deep structures have been extensively used to segment and classify colon lesions as distinct from other unwanted regions [21][22][23][24][25]. However, to date, most of the AI-based computer-aided diagnostic systems discussed in the literature have relied on extensive manual parameter setting for feature pattern extraction, which affects the outcomes [26][27][28][29]. Hand-crafted features combined with a feature selection module are required before implementing a neural network, which can then interpret them automatically and markedly improve the accuracy of colorectal cancer diagnosis [30][31][32].
Two separate colorectal cancer neural networks developed for the segmentation and classification of colon glands achieved good accuracy in detecting benign and malignant cancer [33]. Although they performed well overall, these frameworks typically struggled to detect variations in lumen and gland size. This may be due to the manual parameter settings used to compensate for varying illumination conditions, which affect the regions of interest available for feature classification. This introduces bias and is undesirable for lesion detection.
Most previous systems relied on preprocessing to extract features for deep learning structures [34][35][36][37]. Only a few of these systems used end-to-end learning, which allows automatic extraction of features from images without requiring expert feature engineering [38][39][40][41]. However, the information essential for clinical decision-making in these architectures is often hidden in high-dimensional spaces and is not comprehensible to humans. It is, therefore, essential to address the interpretability and explainability of decisions in healthcare. If these aspects are not addressed, they may limit the adoption of automatic systems in real-world clinical practice. Thus, it is important to develop AI approaches that can generate additional attentive information in order to gain insights into the behavior of networks. This is not yet widely available or exploited in current diagnostics, although a few methods that approach the interpretability of these networks have been developed [42,43].
Network training with unbalanced data distributions produces high-precision but low-recall predictions and is severely biased toward the majority classes [44,45]. This is unacceptable because of the resulting false negatives, which are more harmful than false positives in cancer diagnosis. This further emphasizes the importance of developing more reliable AI techniques and interpretations.
Although an increasing number of AI systems for the detection of colorectal cancer have been developed, they have not focused on interpretability, reliability, or the potential to design a cost-effective AI-based diagnostic framework. Our systematic review describes recent AI learning approaches based on hybrid, end-to-end, knowledge-transferring, explainable AI, and sampling methods and elucidates their advantages and disadvantages for more reliable detection. We also investigate the gaps in subsequent decision-making, identify future challenges, and present further recommendations. We have summarized literature retrieved from PubMed on the latest developments in deep learning (DL) models focusing on colorectal cancer.

Imaging Modalities
To remain consistent and avoid a selection bias towards particular datasets, the reviewed studies used images of varied types and sizes. One study used 224 × 224 RGB images, at a resolution of 256 × 256 pixels, of 200 normal tissue samples and 200 tumor samples [39]; a sliding-window technique was used to break these down into smaller images. Other studies used various image types, such as endoscopic and whole slide images (WSI), for the detection of colon cancer [46,47]. Another study used a larger image size (768 × 768 pixels) to preserve tissue architecture information and reduce computational cost, as opposed to a smaller patch size (384 × 384 pixels), which produced the same result at a higher computational cost [48]. In a study using WSIs of cytokeratin immunohistochemistry obtained with a digital slide scanner, images were standardized such that 1 µm = 1 pixel and saved as non-layered Joint Photographic Experts Group (JPEG) images, which were then converted into binary images after deletion of non-cancerous areas [49]. Another study utilized an automatic cropping approach, which removed black margins and resulted in a square image with a 1:1 aspect ratio [50].
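For illustration only, the following minimal Python sketch shows the kind of sliding-window tiling described above; it is not code from any cited study, and the 224 × 224 patch size, stride, and synthetic slide region are assumed values.

```python
# Illustrative sketch: extracting fixed-size patches from a large histology
# image with a sliding window. Patch size and stride are assumed values.
import numpy as np

def sliding_window_patches(image: np.ndarray, patch: int = 224, stride: int = 224):
    """Yield (row, col, patch) tuples from an H x W x C image array."""
    h, w = image.shape[:2]
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            yield r, c, image[r:r + patch, c:c + patch]

# Example: break a synthetic 1024 x 1024 RGB slide region into 224 x 224 tiles.
slide = np.random.randint(0, 255, (1024, 1024, 3), dtype=np.uint8)
patches = [p for _, _, p in sliding_window_patches(slide)]
print(len(patches), patches[0].shape)  # 16 patches of (224, 224, 3)
```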

Methodological Approaches
Recent studies using DL models for recognizing colorectal polyps were able to achieve good performance with large amounts of data. However, the predictions of nonpolypoid lesions remained unclear [34,37]. This is clinically critical, because the target task of the developed AI systems is the accurate identification of nonpolypoid lesions, given that this is a difficult task for an endoscopist. Furthermore, an AI system that can achieve superior sensitivity and specificity by preventing missed lesions, without being user-dependent, would be highly useful in clinical trials. Such a system could be particularly valuable for improving reliability and reducing interobserver variability. The DL methods explained in the following sections were originally implemented for specific tasks and can be applied to colon screening and diagnostic tasks using various types of images (Figure 1).

Hybrid Learning Methods
Hybrid learning methods combine various algorithms, processes, or procedures from different applications. In situations where datasets are lacking, extracting the most relevant information from the available data is important for analysis. This technique can be particularly helpful for the extraction and classification of colon cancer. Ghosh et al. developed a hybrid learning model that combined supervised (SL) and unsupervised learning techniques for the detection of colon cancer. This yielded better accuracy than existing approaches and could potentially be used for real-time cancer detection [51]. Their study evaluated data clustering by K-means, the Girvan-Newman algorithm, and Mahalanobis distance-based clustering, followed by feature selection and dimensionality reduction based on principal component analysis. The data were then fed into an artificial neural network (ANN) for colon cancer classification. Another study on colorectal cancer involving small datasets utilized the CNN system ConvNet from the Visual Geometry Group (VGG) and modified it in five different ways. The configuration that could best identify tumor images was then evaluated [39]; the best model was the one with the most weight layers and depth, displaying the most stable accuracy and loss curves. However, that study excluded some variables, such as large images, to ease computation. To close a gap in colonoscopy, i.e., the detection of non-polypoid colorectal lesions, Yamada et al. first trained a Faster R-CNN model on the ImageNet dataset and then further trained it with images of polypoid lesions, consecutive lesions, and noncancerous tissues taken from videos, so that it learned features such as edges and curves [46]. In another study, the issue of small datasets was addressed by a method that augmented the dataset with additional images of polyps. This produced more samples for training while preserving the realistic features of the images [52]. This model improved colonic polyp detection rates and also reduced the false-negative rate. Furthermore, Ho et al. developed a hybrid AI model using training data with annotations from pathologists. They applied a classical machine learning classifier and a Faster R-CNN model with ResNet-101 for glandular segmentation and achieved high detection and sensitivity rates for colorectal features [47].
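As a hedged illustration of this style of hybrid pipeline, the sketch below chains an unsupervised K-means step, PCA-based dimensionality reduction, and a small ANN classifier, broadly in the spirit of Ghosh et al. [51]; the feature dimensions, cluster count, and synthetic data are assumptions, not the published configuration.

```python
# Minimal hybrid-pipeline sketch: K-means clustering supplies an extra
# feature, PCA reduces dimensionality, and a small ANN (MLP) classifies.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 64))          # stand-in for extracted image features
y = rng.integers(0, 2, size=400)        # 0 = normal, 1 = tumor (synthetic labels)

# Unsupervised step: append the K-means cluster assignment as a feature.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])

# Dimensionality reduction, then supervised ANN classification.
X_red = PCA(n_components=16).fit_transform(X_aug)
X_tr, X_te, y_tr, y_te = train_test_split(X_red, y, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {ann.score(X_te, y_te):.2f}")
```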
The segmentation model provides detailed results for individual samples and enables pathologists to derive further quantitative data from WSIs. For instance, the application of segmentation not only allowed colonic tissues to be divided into categories, but also segmented other structures, such as blood vessels and inflammatory lesions. Yu et al. compared SL and semi-supervised learning (SSL) and showed that the latter performed better and had better generalization abilities than SL when given a small amount of labeled data and large amounts of unlabeled data [53]. They also demonstrated that the SL model had reduced generalization performance when training and testing data were not obtained from the same source. In a study by Urban et al., a different CNN model was trained with images from ImageNet, resulting in a highly accurate model with potential for real-world use [34]. Moreover, an accurate, reliable, and active (ARA) strategy was implemented in a new Bayesian DL CNN model (ARA-CNN), which was tasked with classifying colorectal tissue and provided an estimated uncertainty using variational dropout to accelerate the learning process [54]. The model, inspired by Microsoft's ResNet and DarkNet-19, displayed high accuracy and surpassed other methods trained on the same dataset. Furthermore, the detection of colorectal cancer using a DL-based Inception V3 model pretrained on the ImageNet dataset and combined with segmentation from digitized hematoxylin-eosin (H&E)-stained histology slides yielded good performance [48].
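The uncertainty idea behind ARA-CNN can be illustrated with dropout kept active at test time (Monte Carlo dropout), a common approximation related to the variational dropout mentioned above; the tiny CNN below is a placeholder, not the published architecture.

```python
# Sketch of dropout-based uncertainty estimation at test time.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Dropout(p=0.5),                     # kept active at inference below
    nn.Linear(8, 2),
)

def mc_dropout_predict(model, x, n_samples=20):
    """Average softmax over stochastic forward passes; return mean and std."""
    model.train()                          # keeps the dropout layer stochastic
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    return probs.mean(0), probs.std(0)     # std acts as a per-class uncertainty

x = torch.randn(1, 3, 64, 64)              # one synthetic image patch
mean, std = mc_dropout_predict(model, x)
print(mean, std)                           # high std flags cases for human review
```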
A computer-aided diagnostic system for endocytoscopic imaging can support non-experts in diagnosing lesions without prior training. Such a system showed an accuracy rate comparable to those of experts and is hence especially beneficial to trainees, as it only requires the push of a button to obtain a real-time diagnostic output [55]. Takamatsu et al. used image processing combined with machine learning in the ImageJ software to generate a prediction model for colorectal cells in lymph node metastasis (LNM) and used cytokeratin immunohistochemistry obtained from a digital slide scanner for accurate detection of cancer foci; the model successfully predicted LNM [50]. A further study sought to develop a mass screening method for determining colorectal cancer risk. To this end, the authors evaluated seven SL models, i.e., linear discriminant analysis, support vector machine, naive Bayes, decision tree, random forest, logistic regression, and ANN, combined with six imputation methods to deal with missing data (mean, Gaussian, Lorentzian, one-hot encoding, Gaussian expectation-maximization, and listwise deletion) [56]. The model combining ANN and Gaussian expectation-maximization fared best and had the potential to be used as a screening tool for early detection of colorectal cancer. Wan et al. developed an early cancer prediction algorithm based on an existing model (the nonnegative matrix factorization method), but with reduced matrix dimensionality and removal of repetitive data, resulting in more interpretable data [57]. Utilizing CNNs such as AlexNet (implemented in the Caffe framework) on colon endoscopy cases is useful for detecting protruding, flat, and recessed lesions; this was shown to yield accurate diagnoses and good areas under the receiver operating characteristic curve (AUC), as demonstrated by Ito et al. [58]. Tamai et al. investigated magnifying narrow-band imaging (M-NBI), a detailed observation approach usable in the endoscopic diagnosis of colorectal lesions, albeit one requiring knowledge and experience, and demonstrated its potential to be combined with computer-aided diagnosis [59]. Their software was used to divide images of colorectal lesions into three groups: hyperplastic polyps, adenoma/adenocarcinoma lesions, and submucosal-deep lesions. Some diagnoses differed from those of expert endoscopists, which led the authors to believe that this model may have limitations in diagnosing villous lesions.
In a study focusing on the diagnosis of colorectal adenoma, training slides were accurately labeled using a custom-developed annotation system for the iPad. The authors investigated a DL model based on DeepLab v2 with ResNet-34 and found that it yielded performance on par with that of pathologists [60]. The study also demonstrated that the deeper the network, the more information was captured, such as gland shape, nucleus, and cell form. The report also stated that the model identified abnormal glands as adenomatous glands. Rathore et al. developed a novel strategy that combined textural and geometric features of colon tissues with traditional features for the detection of colon cancer cells and their classification into normal and malignant cases [61]. The study used a hybrid colon-classification feature space comprising morphological, texture, and scale-invariant feature transform features together with elliptic Fourier descriptors, with support vector machines as classifiers to extract and classify the datasets. In addition, Nadimi et al. developed a CNN by improving ZF-Net, combining transfer learning, preprocessing, and data augmentation, before deploying it in a Faster R-CNN to restrict images to regions containing colorectal polyps. This approach yielded high accuracy in the autonomous detection of polyps and demonstrated high interpretability in sensitive regions by providing saliency maps [62].
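A hedged sketch of region-based polyp detection with an off-the-shelf torchvision Faster R-CNN is shown below, analogous in spirit to the pipelines of Yamada et al. [46] and Nadimi et al. [62]; the COCO-pretrained weights and the 0.5 confidence threshold are illustrative stand-ins for their trained models.

```python
# Region-based detection sketch with a generic pretrained Faster R-CNN.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = torch.rand(3, 480, 640)            # synthetic colonoscopy frame in [0, 1]
with torch.no_grad():
    output = model([frame])[0]             # dict of boxes, labels, scores

keep = output["scores"] > 0.5              # assumed confidence threshold
print(output["boxes"][keep], output["scores"][keep])
```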

End-to-End Learning Methods
In order to improve reliability, end-to-end DL models have been considered for colon cancer identification. End-to-end (e2e) learning is the process of training a complex learning system by applying descending gradient-based learning to the system as a whole [63]. Simply put, the model learns all the steps between the initial input phase and the final output phase, and these parts are trained simultaneously. Studies that apply e2e learning methods include the neural Turing machine, the differentiable neural computer, vision-based navigation in 3D environments, and value iteration networks [64][65][66][67]. While these studies have showcased successful models and techniques, noteworthy limitations remain, such as poor local optima, vanishing gradients, ill-conditioned problems, and slow convergence in different circumstances; in particular, the development of the network architectures becomes more complex [63]. However, some of these limitations were efficiently overcome in the study by Buendgens et al., who applied e2e learning methods to a non-annotated routine database without manual labels. Their model accomplished good predictive performance in the identification of several diagnoses from gastrointestinal endoscopy images. This displays the potential of weakly supervised AI in clinical imaging modalities, in contrast to claims that manual annotations are a bottleneck for the future clinical application of AI [49]. The image dataset was preprocessed in MATLAB R2021a, and a ResNet-18 model was trained with the datasets. The model was capable of diagnosing inflammatory, degenerative, infectious, and neoplastic diseases from raw gastroscopy and colonoscopy images. It was able to detect the presence of diverticulosis, candidiasis, and colon and rectal cancer by learning the visual patterns of gastrointestinal (GI) pathology directly from the examination labels. In another study, on the histopathological classification of gastric and colonic epithelial tumors and lesions, the authors trained CNNs and recurrent neural networks (RNNs), including the Inception v3 network, to classify WSIs of biopsy specimens from the stomach and colon into adenocarcinoma, adenoma, and non-neoplastic tissue [68]. When tested on datasets obtained from The Cancer Genome Atlas (TCGA) with a mix of formalin-fixed paraffin-embedded (FFPE) and flash-frozen tissues, the model was capable of generalized adenocarcinoma prediction, despite being largely trained on biopsies. The max-pooling aggregation method (MP-AGG) for WSI classification demonstrated a higher log-loss than the RNN aggregation method; the probabilities of MP-AGG require a high cut-off threshold, and the method is more prone to classification errors. Pinckaers and Litjens utilized gland segmentation datasets to train three models, i.e., a baseline U-Net model, a U-Net variant with fewer filters that used ordinary differential equation blocks (U-Node), and a residual counterpart (U-ResNet), to predict and separate individual colon glands [69]. The U-Node network used fewer parameters than the other two models and was able to improve segmentation. The study also showed that the neural ordinary differential equation improved the segmentation results in terms of memory load and parameter count.
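A minimal end-to-end training loop is sketched below: a ResNet-18 learns directly from raw image tensors and diagnosis labels, with no hand-crafted feature step, mirroring the e2e setup described above; the four-class label space and synthetic mini-batch are assumptions.

```python
# End-to-end training sketch: all stages of the network train jointly.
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=4)            # e.g., 4 assumed diagnosis classes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)       # a synthetic mini-batch
labels = torch.randint(0, 4, (8,))

model.train()
for epoch in range(2):                     # input-to-output training, no feature step
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()                        # gradients flow through every layer
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```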

Transfer Learning Methods
Transfer learning is a technique that transfers knowledge gained from a machine learning model used to address one problem to another model solving a different but related problem. It carries information over to new tasks while building on previously learned tasks (Figure 2). It has several main advantages over training models from scratch, such as a better starting point, higher accuracy, and faster training [70]. There are two related approaches: one uses a pretrained model to transfer knowledge to a target model, adapting the features of the source model, while the other develops a new model from scratch to transfer knowledge from its main task and then explicitly trains it with the available information [71]. Transfer learning based on the AlexNet model was adopted to learn effective classification features [58]. This method compensated for the limited number of colon lesion images and improved the screening performance for colorectal cancers. A study of colorectal cancer using confocal laser microscopy images, applying various transfer learning methods from the ImageNet dataset, found that the former approach yielded improved performance compared to the latter, despite functioning differently for different models and delegated tasks [72].
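The pretrained-model route can be sketched as follows: ImageNet weights are loaded, the backbone is frozen, and only a new classification head is retrained for the colon-lesion task; the two-class head and the choice of ResNet-50 are assumptions for illustration. Unfreezing deeper layers afterwards would correspond to the fine-tuning strategies discussed below.

```python
# Transfer learning sketch: frozen ImageNet backbone, new trainable head.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():
    param.requires_grad = False            # freeze the transferred features

model.fc = nn.Linear(model.fc.in_features, 2)  # new head: normal vs. lesion

# Only the head's parameters are optimized; unfreezing more layers later
# would turn this into the fine-tuning strategy.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```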
In another study, a fully automated lymph node detection and segmentation method was developed through transfer learning, namely by feeding fused T2- and diffusion-weighted images from multiparametric magnetic resonance imaging into a Mask R-CNN model to improve the performance of magnetic resonance imaging-based lymph node detection and segmentation. The model performance was evaluated based on sensitivity, positive predictive value, false positive rate per case, and Dice similarity coefficient [32]. The model performed significantly better and faster than junior radiologists using manual detection. In addition, transfer learning methods are able to overcome issues such as a lack of rich WSI datasets. Hamida et al. used several CNNs, such as AlexNet, VGG, ResNet, DenseNet, and Inception, for the classification of patch-level colon cancer WSIs, of which ResNet presented the highest accuracy [73]. A pixel-wise segmentation approach for colon cancer was also applied using U-Net and SegNet models in the same study, in order to highlight regions of colon cancer in the slides. This revealed that SegNet had a higher accuracy than U-Net, albeit at a high computational cost. Additionally, Hamida et al. compared different transfer learning strategies deployed in various CNN models and found that the models demonstrated low accuracy when learning from scratch, whereas pretraining the models resulted in better performance, although the accuracy of classification was unknown; fine-tuning the models produced the best performance among all the strategies and enabled rapid scanning of the datasets. Furthermore, Malik et al. investigated the usability of DL-based methods, namely a deep CNN with a limited amount of labeled data and low-resolution histology images, for colorectal cancer identification and detection [74]. In contrast to the findings of Hamida et al., they found that training a CNN from scratch resulted in higher accuracy and a more consistent detection rate than the fine-tuned models, whereas existing deep CNN models trained with transfer learning approaches produced the most superior identification of cancer. Kather et al. trained several CNNs, namely VGG19, AlexNet, SqueezeNet 1.1, GoogLeNet, and ResNet50, to identify tissue types that are abundant in histological images of colorectal cancer, including non-tumorous types. These models were also able to decompose complex tissues into their constituents and aggregate scores for the abundance of the tissue parts [75]. The VGG19 model performed best and was also able to recreate the morphological features learned from the datasets and visualize tissue structures via the DeepDream approach. This was subsequently applied to larger images and WSIs, showing a high classification accuracy on par with human vision. Gessert et al. utilized transfer learning approaches, including learning from scratch, partial freezing variants, and fine-tuning, in the VGG16 and Inception v3 models, with a small number of datasets [72]. The training-from-scratch method performed extremely poorly, whereas there were no significant differences between the partial freezing variants and fine-tuning strategies. The study also demonstrated that, while transfer learning improved performance, the optimal strategy differed for various models and classification tasks.
In another transfer learning study, various CNN models pretrained on ImageNet were applied to histology-stained slides of colorectal cancer; the final classification layer was replaced, and the whole network was trained with stochastic gradient descent with momentum. VGG19 showed the best performance of all the networks, within an acceptable training time [75]. Another study implemented a modified VGG-based CNN model on colorectal histology images to classify normal and tumor tissue samples. This system accurately classified 294 out of 309 normal tissue images and 667 out of 719 tumor tissue images [39]. Each of the above studies had its limitations, such as the use of relatively small datasets for generalization, a learning procedure too weak to provide an appropriate level of support for the diagnostic decision, and nontransparent, albeit highly accurate, predictions from complex model architectures. It is imperative that the connection between features and predictions be comprehensible from the algorithm. Therefore, if a generated AI algorithm contributes to a clinical decision, it should be easy for clinicians to understand why a specific output was produced and how it was characterized.

Explainable Learning Methods
When dealing with medical data, besides demonstrating accurate prediction, it is important for models to convey the uncertainty in their results, as more complex cases should be further inspected by humans. These "failed" prediction data can then be annotated by experts and turned into a new training set [76]. This is where explainable models come into play. Explainable AI (XAI) involves processes or methods that allow users to understand the results produced by machine learning models, their impacts, and potential biases. While some models are able to compute accurate predictions, not all are able to supply justification for their decisions. Most of the aforementioned models for colorectal cancer have used standard DL structures, and these approaches mostly focus on increasing the accuracy of the final results [68,[77][78][79]. Hence, very few studies have presented significant evidence contributing to the decision outcomes [41][42][43][44][45]. Korbar et al. developed a deep ResNet visualization network for the detection of colorectal polyps [42]. They established a pretrained ResNet-101 classification model with labeled patches of stained slide images. Furthermore, the classification could be projected back into the input pixel space to indicate the parts of the input image that were key to the classification. This approach used a visualization model that identified regions and features of interest. A fully convolutional ResNet would be useful for visualizing the output in the last layer and for finding the regions of interest for pathologists to analyze and confirm the classification of the model. Similarly, Raczkowski et al. addressed misclassified labels using an active learning-based Bayesian CNN model for classifying colorectal cancer [54]. This model was initially trained on a small dataset and then on a dataset extended with new samples, to reduce the entropy in the data analysis. An explainable AI system using a cumulative fuzzy class membership criterion for the classification of colorectal cancer tissues complements its decision with three types of information: visualization of the most important regions for the decision, visualization of unwanted regions, and a semantic explanation [43]. The membership criterion proved to be a reliable and accountable explainable classifier for decision-making in clinical trials, with highly satisfactory performance. In multiple instances, a fully convolutional network with an attention-based pooling architecture has been used to aggregate interpretable features of colorectal cancer patterns [80]. A VGG pretrained on the ImageNet dataset has been used to extract features for each image patch, with K-means clustering adopted to cluster patches based on their extracted features. This was found to be more effective and suitable for huge datasets and showed better interpretability in locating important patterns and features, contributing to accurate prediction of survival in patients with cancer.
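A simple gradient-based saliency map, shown below, conveys the idea of projecting a classification back into input pixel space as in Korbar et al. [42]; plain input gradients are used here as a simpler stand-in for their visualization network, and the model, class index, and synthetic patch are assumptions.

```python
# Input-gradient saliency sketch: which pixels drive the class score?
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=2).eval()
patch = torch.randn(1, 3, 224, 224, requires_grad=True)  # synthetic slide patch

score = model(patch)[0, 1]                 # score of the "polyp" class (assumed index)
score.backward()                           # d(score)/d(pixel) for every pixel

saliency = patch.grad.abs().max(dim=1)[0]  # per-pixel importance map (1 x 224 x 224)
print(saliency.shape, float(saliency.max()))
```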
Sabol et al. used a plain CNN model to generate an explainable Cumulative Fuzzy Class Membership Criterion (X-CFCMC) model that could be used for image classification and WSI segmentation on histopathological cancer tissues. It rationalized its decisions in three main ways: semantically explaining the possibilities of misclassification, displaying the training samples responsible for a certain prediction, and showing the training samples from other conflicting classes [43]. The pathologists involved in the study preferred the X-CFCMC model over the plain CNN model, as it was more useful and reliable. Another model utilizing XAI methods, namely layer-wise relevance propagation, was tested on various types of tumor entities, and it produced results consistent with experts' insights and provided visual explanations in the form of heat maps [76]. These explainable heat maps assisted in detecting biases that could potentially affect the generalization abilities of the models, such as biases affecting the entire dataset, biases correlated with a specific class label by chance, and sampling biases. Moreover, Korbar et al. demonstrated the capabilities of XAI-based techniques, using gradient-based visualization approaches, for explaining the reasons for classifications in WSI analysis of different types of colorectal polyps, with minimal costs and easy interpretability [42]. In this study, a ResNet model was used to classify colorectal polyps on labeled patches of H&E-stained WSIs and to justify the outcomes using a gradient-based approach. The DL architectures and imaging modalities of explainable AI and sampling studies are shown in Table 1.
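As a model-agnostic alternative to layer-wise relevance propagation, the sketch below builds an occlusion heat map: one region is masked at a time, and the drop in the class score marks the regions the prediction depends on. This is a stand-in for, not an implementation of, the LRP method cited above; the occlusion window size and placeholder model are assumptions.

```python
# Occlusion heat-map sketch: mask regions and record the score drop.
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=2).eval()
img = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    base = torch.softmax(model(img), 1)[0, 1].item()   # assumed tumor-class prob

block = 56                                             # occlusion window (assumed)
heat = torch.zeros(224 // block, 224 // block)
with torch.no_grad():
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = img.clone()
            occluded[:, :, i*block:(i+1)*block, j*block:(j+1)*block] = 0
            heat[i, j] = base - torch.softmax(model(occluded), 1)[0, 1].item()

print(heat)  # larger values mark regions the prediction depends on
```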

Sampling Methods
The basic properties of AI systems should include transparency, interpretability, and reliability to provide trust and fairness in clinical diagnostics. These qualities improve discrimination ability and diminish potential mistakes. The majority of AI techniques are designed for balanced class distributions; when trained on imbalanced data, they become biased toward the majority class at the expense of the minority class, degrading their overall performance [44,45,81]. Data imbalance poses a significant challenge for traditional learning algorithms. Hence, a few studies using DL methods in colon cancer have approached this data distribution problem with data-level and algorithm-level methods. Koziarski et al. utilized an oversampling technique in the image space to enrich a large amount of training data for a CNN and complemented it with sampling in the feature space to fine-tune the last layers of the network [81]. They revealed that higher levels of imbalance between classes strongly degraded classification performance, concluding that the performance decline was caused not only by the decrease in the number of observations used for training, but also, importantly, by the data imbalance itself. Hong et al. developed a novel algorithm-level loss function combining cross-entropy with an asymmetric loss in EfficientNet and U-Net models [45]. This classified each pixel individually by comparing the class predictions, producing a better balance between precision and recall for colon polyp segmentation. Shapcott et al. used systematic random sampling and adaptive sampling in a CNN architecture to overcome the imbalance problem and achieved significant improvements in colorectal cancer diagnostic performance [44].
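Both remedies can be sketched in a few lines: data-level oversampling of the minority class via a weighted sampler, and an algorithm-level class-weighted loss as a simple proxy for the asymmetric loss of Hong et al. [45]; the class counts and weights below are illustrative assumptions.

```python
# Imbalance-handling sketch: weighted sampling plus a class-weighted loss.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

X = torch.randn(100, 16)
y = torch.cat([torch.zeros(90), torch.ones(10)]).long()   # 90:10 imbalance

# Data level: sample minority items more often so batches are balanced.
class_counts = torch.bincount(y).float()
sample_weights = (1.0 / class_counts)[y]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True)
loader = DataLoader(TensorDataset(X, y), batch_size=20, sampler=sampler)

# Algorithm level: weight the loss so minority-class errors cost more.
criterion = nn.CrossEntropyLoss(weight=class_counts.sum() / (2 * class_counts))

for xb, yb in loader:
    print(yb.float().mean().item())        # roughly 0.5 under balanced sampling
    break
```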

Results and Discussion
It is important to focus on what the network is learning and to interpret the pixel-space visualizations of an attention-based network, as these represent the regions of interest that should be located and used to confirm the classification outcomes of the model. Visualizing particular features in such deep network architectures offers the highest probability of success in diagnostic decision-making. Modifications to the VGG-inspired CNN model ConvNet were evaluated for identifying colorectal cells, yielding values of 93.48%, 0.4385, 95.10%, and 92.76% for accuracy, loss, sensitivity, and specificity, respectively [39]. Ghosh et al. reported that their proposed classifier had the highest classification accuracy (98.60%) among the compared classifiers, whose accuracies ranged from 88.71% to 98.40% [51]. Furthermore, the diagnostic performance of AI versus endoscopists yielded sensitivities of 97.30% and 87.40%, specificities of 99.00% and 96.40%, and processing times of 0.022 s/image and 2.4 s/image, respectively [46]. In another study, researchers achieved an accuracy exceeding 92% using hybrid models that automatically detect colon cancer [61]. Li et al. discussed the accuracy of hybrid models that combined available features and variables to yield improved accuracy in both training and validation datasets (>90% and >85%, respectively), aside from significantly improving the prediction performance of the hybrid model [82]. In one study, the AI model achieved an AUC of 0.917, with high sensitivity (97.4%), in detecting high-risk features of dysplasia and malignancy [47]. In another study, labeled patches from WSIs achieved accurate patch-level recognition [83]; however, when SSL was used, only about one-tenth of the labeled patches used in SL testing, together with 37,800 unlabeled patches, were needed to achieve a similar AUC [53]. In another study, an AI system operating on real-time detection of polyps (1 frame per 10 ms) was able to detect the presence of polyps with an accuracy of 96.4% and an AUC of 0.991, using a CNN model first trained on ImageNet data and then on an available polyp dataset [34]. Additionally, ARA-CNN was found to perform better, by 18.78%, than other models using the same dataset [54]. Another hybrid learning approach produced a median accuracy of 99.9% for healthy tissue slides and 94.8% for cancer slides, as compared with pathologist-based diagnosis of clinical samples [48]. A hybrid learning model modified from ZF-Net had an accuracy of 98.0%, a sensitivity of 98.1%, and a specificity of 96.3% [62]. A study by Iizuka et al. demonstrated high AUC values (0.96-0.99) for adenomas of the gastric and colonic epithelium by applying the same techniques, with few limitations [68]. A study on weakly supervised e2e AI in gastrointestinal endoscopy found AUCs of 0.7-0.8 for the diagnosis of 13 diseases and was able to predict the presence of colorectal cancer with an AUC > 0.76 [49]. The accuracy of several CNN models in recognizing features in colorectal histopathological WSIs was also compared. After fine-tuning, the AlexNet, ResNet, DenseNet, VGG, and Inception models presented better performance than the training-from-scratch and pretrained approaches, displaying accuracy rates of 89.42%, 95.25%, 96.98%, 95.86%, and 92.43%, respectively; fine-tuning also enabled rapid scanning and updating of the parameters to cope with the dataset [68].
In another study, the training-from-scratch approach performed better and had the most consistent detection rate across all evaluation criteria, achieving a specificity rate 16.81% higher than that of the best-performing CNN model. The model also produced a detection accuracy of 94.5%, which was 3.85% higher than the highest accuracy achieved by the other CNN models [74]. Another study evaluated the reliability of the X-CFCMC XAI model by running acceptability tests against a plain CNN model, using feedback from pathologists, and found that the former was more acceptable to pathologists owing to its explanatory capacity [43].
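For reference, the headline metrics quoted in this section (sensitivity, specificity, and AUC) can be computed from raw model outputs as sketched below; the predictions and labels here are synthetic.

```python
# Metric-computation sketch for binary cancer/normal screening outputs.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)                        # 1 = cancer, 0 = normal
scores = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 200), 0, 1)
y_pred = (scores > 0.5).astype(int)                     # assumed decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                            # recall on cancer cases
specificity = tn / (tn + fp)
auc = roc_auc_score(y_true, scores)
print(f"sensitivity {sensitivity:.3f}, specificity {specificity:.3f}, AUC {auc:.3f}")
```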

Conclusions
Overall, this review concludes that AI shows promising results in terms of accuracy in the diagnosis of colorectal cancer. However, user-dependent and complex, non-transparent deep network models do not provide an appropriate level of evidence for the key points used in classification, which is the reason for the slow adoption of this technique in clinical practice. Most AI models for predicting invasive cancer are prone to over-detection. This suggests that supporting evidence for the results of AI-based diagnosis of colorectal cancer is strongly required in order to continue optimizing model performance for practice-level validation. Therefore, we propose that AI using a visualization method for classification outcomes could significantly reduce the burden on clinicians and improve the diagnostic accuracy for colorectal cancer.