A CNN CADx System for Multimodal Classification of Colorectal Polyps Combining WL, BLI, and LCI Modalities

Abstract: Colorectal polyps are critical indicators of colorectal cancer (CRC). Blue Laser Imaging and Linked Color Imaging are two modalities that allow improved visualization of the colon. In conjunction with the Blue Laser Imaging (BLI) Adenoma Serrated International Classification (BASIC), endoscopists are capable of distinguishing benign and pre-malignant polyps. Despite these advancements, a high misclassification rate for pre-malignant colorectal polyps still prevails with this classification. This work proposes a computer-aided diagnosis (CADx) system that exploits the additional information contained in two novel imaging modalities, enabling more informative decision-making during colonoscopy. We train and benchmark six commonly used CNN architectures and compare the results with those of 19 endoscopists who employed the standard clinical classification model (BASIC). The proposed CADx system for classifying colorectal polyps achieves an area under the curve (AUC) of 0.97. Furthermore, we incorporate visual explanatory information together with a probability score, jointly computed from White Light, Blue Laser Imaging, and Linked Color Imaging. Our CADx system for automatic polyp malignancy classification facilitates future advances towards patient safety and may reduce time-consuming and costly histology assessment.


Introduction
Colorectal cancer (CRC) is the fourth leading cause of cancer-related death worldwide, with the highest incidence rates in developed countries [1,2]. An early diagnosis of CRC can prevent spreading throughout the colon and avoid further complications. Colorectal polyps (CRPs) are precursor lesions and indicators of colorectal cancer. There are roughly two classes of CRPs: (1) non-neoplastic CRPs, [...] on polyp detection when the experts were assisted by a CADe system. Overall, in recent years, systems for polyp detection have achieved impressive results and have been shown to be an effective tool for assisting medical experts. Assuming a high detection rate, the next issue facing clinicians is to visually determine whether a polyp is benign or pre-malignant, which poses an extra challenge that several studies have tried to overcome. In early studies, CRPs were classified based on local features from blood vessels using NBI images [22][23][24], or by exploiting a combination of chromoendoscopy, WL, and NBI [25]. In Scheeve et al. [26], handcrafted features were used to predict the histology of polyps using Support Vector Machines (SVMs) and clinical classification models. The development of CNNs also had a great impact on colorectal polyp classification. Initial work used features extracted from widely used CNNs [27][28][29][30], such as AlexNet, ResNet, or InceptionNet, which were then passed to a more traditional classifier, such as an SVM, to classify CRPs as healthy or malignant. This early success drove the development of several CADx systems that classify polyps using different classification schemes. In Konami et al. [31], high-accuracy results were obtained (sensitivity, 93.0%; specificity, 93.3%) using SVMs on a dataset of 118 lesions obtained with NBI magnifying colonoscopy. The study developed a CADx system for the Hiroshima classification.
In a similar fashion, Mori et al. [32] developed a CADx system for CRP classification from images obtained with NBI and stained endocytoscopy. The study employed an SVM, which required a three-step process to perform the polyp prediction. As the availability of medical data increased, alternative approaches using more recent deep learning frameworks allowed the design of end-to-end predictions. In Chen et al. [33], a framework was developed to classify diminutive polyps using magnified NBI. Magnified modalities allow for detailed imaging of lesions but require highly precise movement of the endoscope, which makes them a less desirable technique in clinical usage. More recently, in Byrne et al. [34], a deep learning framework was developed to classify polyps in unaltered video frames of nonmagnified NBI using the NICE classification. One main issue is that the NICE classification does not incorporate SSA polyps, which causes doctors to consider them as dangerous as ADs. Despite this limitation, the study presents a strong method to accurately classify polyps during real-time colonoscopy. Current CADx studies have focused on NBI, whereas BLI and LCI have not yet seen significant developments that exploit the capabilities of artificial intelligence.
In our previous work [35], we collected a dataset of 203 patients with WL, BLI, and LCI. The limited size of the dataset constrained the classification approach. Therefore, we extracted features from a pretrained ResNet50 with the aim of classifying the polyps in the dataset as benign or pre-malignant. We trained a single SVM on the features of each modality and evaluated the results via Leave-One-Patient-Out Cross-Validation (LOPOCV). The study was finalized by incorporating the information of WL, BLI, and LCI and combining the posterior probabilities of the trained models.
In this work, we present a new study which builds further on our previous research in the following points.

1. Our dataset is extended with 458 new patients, resulting in a total of 2919 images obtained from three different hospitals.
2. We benchmark the most commonly used state-of-the-art deep learning architectures for colorectal polyps, in order to train an end-to-end CNN, evaluated on a test set of 60 patients acquired with WL, BLI, and LCI.
3. We build a CADx system to classify CRPs as benign or pre-malignant, and we compare our results with the knowledge and expertise of 19 endoscopists (13 novices and 6 experts).
4. We present a probability score to the endoscopists, which is computed from the average prediction of WL, BLI, and LCI.
5. Our CADx system provides explainable visual data from the CNN to contribute to smooth decision-making.

Our study concludes by showcasing how our CADx system could perform in clinical routine and how the outcomes can benefit endoscopists during real-time colonoscopy.

Patient Inclusion and Data Acquisition
The data collection was carried out in a prospective fashion, according to a predefined image acquisition protocol, in the Maastricht University Medical Center+ (MUMC+) and the Catharina Hospital Eindhoven (CZE), both in the Netherlands, and the Queen Alexandra Hospital in Portsmouth, United Kingdom. The training dataset consisted of a total of 468 patients, with 2319 pre-malignant polyps and 420 benign polyps, covering CRPs of all sizes, from diminutive to large. The dataset includes polyps acquired in WL, BLI, LCI, and I-Scan (HDWL; Mode 1, 2, and 3) modalities. Using I-Scan data adds robustness to the algorithm, as its modes have visual properties similar to those of the three other modalities. The test set was restricted to 60 patients to match exactly the patients assessed by the endoscopists (further explained in Section 2.6). The entire test set was acquired at CZE, with 45 patients having pre-malignant polyps and 15 having benign polyps. For each test patient, a single image of the polyp was acquired at different time steps in each of three modalities (WL, BLI, and LCI), adding up to a total of 180 polyp images. Figure 1 summarizes the data collection described above. All collected data were fully anonymized prior to the study.

Data Preprocessing
In order to obtain optimal classification, the central region of the image was automatically selected as the region of interest (ROI). The cropped region ensures coverage of the polyp area, as well as its surrounding texture. Subsequently, the dataset was normalized by subtracting the mean and dividing by the standard deviation of the ImageNet data used for pretraining. As the last step, each input image was resized to 229 × 229 pixels in the RGB color space. To improve the generalization of the network for our classification task, data augmentation was used. In this study, the training images are augmented by a combination of flipping, shifting, ±90° rotation, contrast enhancement, blurring, and zooming.
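The preprocessing steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the crop fraction (0.8) and the nearest-neighbour interpolation are not given in the text, the function names are hypothetical, and the mean and standard deviation are the commonly used ImageNet channel statistics for pixel values scaled to [0, 1].

```python
import numpy as np

# Commonly used ImageNet channel statistics (RGB, pixel values in [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def center_crop(image, fraction=0.8):
    """Keep the central part of the image; the fraction is an assumption."""
    h, w = image.shape[:2]
    ch, cw = int(round(h * fraction)), int(round(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return image[top:top + ch, left:left + cw]


def resize_nearest(image, size):
    """Nearest-neighbour resize to (size, size); a stand-in for the
    interpolation method, which the text does not specify."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols[None, :]]


def preprocess(image_uint8, size=229, fraction=0.8):
    """Central ROI crop, ImageNet normalization, then resize."""
    roi = center_crop(image_uint8, fraction).astype(np.float32) / 255.0
    normalized = (roi - IMAGENET_MEAN) / IMAGENET_STD
    return resize_nearest(normalized, size)
```

For an RGB frame stored as an `H × W × 3` array of type `uint8`, `preprocess(frame)` returns a `229 × 229 × 3` floating-point array ready to be fed to the network.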

Network Architectures
We performed a benchmark with different architectures and, based on the results, selected EfficientNet [36] as the main architecture for our system. This family of models achieves state-of-the-art accuracy on the ImageNet dataset by employing a simple, yet powerful concept in which the network models are scaled not only in depth, but also in width and resolution. To achieve such a scheme, the authors propose a compound coefficient (Φ) that uniformly scales the network along the three dimensions. The coefficient controls the distribution of resources available for scaling the model under the constraint of a maximum growth in operations of 2^Φ FLOPS. For our CADx system, we employed the B4 variant, which has a total of 19 million parameters. The B4 variant was the preferred option for two main reasons: first, it achieved a higher performance than state-of-the-art polyp classification architectures while reducing the number of parameters; and second, it allowed the best memory performance on our CADx setup. Additionally, several other commonly used architectures were considered for the development of our CADx system; we therefore trained all alternatives and compared their performance with EfficientNet. The architectures were selected based on the networks most commonly employed in state-of-the-art polyp detection and classification studies. For this reason, we selected the following networks: VGG16, ResNet50, ResNet101, Xception, and InceptionResNet.
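To make the compound scaling rule concrete, the following sketch computes the depth, width, and resolution multipliers for a given Φ. The constants α = 1.2, β = 1.1, and γ = 1.15 are the values reported in the EfficientNet paper [36] and are chosen there so that α · β² · γ² ≈ 2. The function name is hypothetical, and the sketch does not claim to reproduce the exact configuration of the released B4 variant.

```python
# Base multipliers reported for EfficientNet, chosen so that
# ALPHA * BETA**2 * GAMMA**2 is approximately 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scaling(phi):
    """Return (depth, width, resolution, flops_growth) multipliers for phi."""
    depth = ALPHA ** phi        # more layers
    width = BETA ** phi         # more channels per layer
    resolution = GAMMA ** phi   # larger input images
    # FLOPS grow linearly with depth and quadratically with width and
    # resolution, so the total growth is roughly 2**phi.
    flops_growth = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops_growth


if __name__ == "__main__":
    for phi in range(5):
        d, w, r, f = compound_scaling(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution x{r:.2f}, FLOPS x{f:.2f}")
```

A single coefficient therefore controls all three dimensions at once, which is what distinguishes this scheme from scaling depth alone.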

Training
All networks were initialized with ImageNet weights and then trained end-to-end with Stochastic Gradient Descent (SGD) using a momentum of 0.9. For all networks except EfficientNet, we used a batch size of 32; for EfficientNet we used a batch size of 8, due to the memory restrictions of our single GPU. We chose an exponentially decaying learning rate with hard restarts every two epochs, ranging from 1 × 10^-2 to 4 × 10^-3, except for VGG16, where the learning rate ranged from 1 × 10^-3 to 1 × 10^-4. The results of each architecture can be found in Table 1. Finally, the model was trained for 100 epochs or until convergence on the validation set, using a single TitanXp GPU. As input, the network received a single image of any of the modalities present in the training set (WL, BLI, and LCI), which allowed for shared features between all modalities. To ensure that each class is sufficiently represented during training, an independent image generator was created for the benign and the pre-malignant class. At inference time, we divided the results into WL, LCI, and BLI and obtained the posterior probabilities to observe the contribution of each modality to the final prediction.
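The learning-rate schedule and the per-class generators can be illustrated with a minimal sketch. It assumes that the rate decays exponentially from the upper to the lower bound within each two-epoch cycle and that the two class generators contribute equally to every batch; neither detail is stated explicitly in the text, and both function names are hypothetical.

```python
import random


def lr_with_restarts(epoch, lr_max=1e-2, lr_min=4e-3, cycle=2.0):
    """Exponential decay from lr_max to lr_min inside each cycle,
    followed by a hard restart to lr_max. `epoch` may be fractional."""
    position = (epoch % cycle) / cycle  # progress within the cycle, in [0, 1)
    return lr_max * (lr_min / lr_max) ** position


def balanced_batch(benign, premalignant, batch_size, rng):
    """Draw half of the batch from each class (assumed 50/50 split),
    sampling with replacement so the minority class is not exhausted."""
    half = batch_size // 2
    batch = rng.choices(benign, k=half)
    batch += rng.choices(premalignant, k=batch_size - half)
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    for e in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"epoch {e:.1f}: lr = {lr_with_restarts(e):.5f}")
    rng = random.Random(0)
    print(balanced_batch(["b1", "b2"], ["p1", "p2", "p3"], 8, rng))
```

For VGG16, the same function would be called with `lr_max=1e-3` and `lr_min=1e-4`.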

Explainable CADx System
During the assessment of CRPs, the CADx system provides the endoscopist with a quantitative measure of how likely the observed CRP is to be benign or pre-malignant. Although a probability measure might be sufficient for agreement between the system and the endoscopist, a visual inspection of the polyp may be required for further confirmation. Gradient-weighted Class Activation Mapping (Grad-CAM) [37] is an effective method that allows the visualization of the decision region of a CNN. By weighting the feature maps of a convolutional layer with the spatially averaged gradients of the class score and summing the result, we are able to produce a visual map and add explainability to the endoscopist's observations.
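The core Grad-CAM computation can be written compactly once the feature maps of a convolutional layer and the gradients of the class score with respect to those maps are available. The sketch below assumes both are given as arrays of shape (H, W, K); obtaining them from a trained network depends on the deep learning framework and is not shown. The function name and the final normalization to [0, 1] are illustrative choices.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heat map.

    feature_maps: array (H, W, K), activations of a convolutional layer.
    gradients:    array (H, W, K), d(class score) / d(feature_maps).
    """
    # One weight per channel: the spatial average of its gradients.
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of the channels, followed by ReLU to keep only the
    # regions that contribute positively to the class score.
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    # Scale to [0, 1] for display; leave an all-zero map untouched.
    return cam / peak if peak > 0 else cam
```

The resulting low-resolution map is typically upsampled to the input size and overlaid on the endoscopic image.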

Clinical Benchmark
To benchmark the performance of the proposed algorithm, a prospective, endoscopist-blinded, noninterventional study was conducted at the Maastricht University Medical Center+ (MUMC+) and Catharina Hospital Eindhoven (CZE). The study was conducted in accordance with the Declaration of Helsinki as well as the General Data Protection Regulation. A total of 19 endoscopists optically diagnosed 60 colonoscopy images containing a single polyp acquired in WL, BLI, and LCI modalities (later referred to as test data). The endoscopists were divided into two groups. The first group consisted of six expert endoscopists from the international BLI-expert group, who were knowledgeable in using BLI and the BLI Adenoma Serrated International Classification (BASIC) [13,38] (Table 2) and had performed more than 2000 colonoscopies. The second group consisted of thirteen Dutch novices with limited colonoscopy experience (<400 colonoscopies) and without prior experience in using BLI or BASIC.
The benchmark study was divided into two stages. For both stages, only WL and BLI were present as modalities. In the first stage, the experts and novices were instructed to classify the set of 60 polyps as HP, SSA, or AD based on their intuition and expertise, with a time limit of 30 seconds. After a washout period of four weeks, the endoscopists were trained in using BLI and BASIC. Following the instruction period, the same endoscopists were asked to classify the exact same set of polyps based on the BASIC classification [13]. Moreover, each endoscopist had to report a level of confidence for each CRP.

Evaluation and Results
The test set was evaluated using a single network that was simultaneously trained with all acquisition modalities, which allowed the CNN to learn shared features across all domains. For each single modality and for the combined modalities, we computed the accuracy and the area under the curve (AUC). Additionally, we calculated sensitivity, defined as the rate of pre-malignant polyps correctly classified as such, and specificity, defined as the rate of benign polyps correctly classified as benign. The performance of the CADx system and all associated architectures was evaluated and compared to the outcomes of the experts and novices. The best CADx system achieves an accuracy of 95.0% with a specificity of 93.3%, a sensitivity of 95.6%, and an AUC of 0.97. For each input image, the CADx system computes the region for decision-making via Grad-CAM and the malignancy prediction. The average prediction of the three modalities offers the endoscopists the best possible diagnosis result. The end-to-end framework described above is depicted in Figure 2.
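As a worked check of these definitions, the reported figures can be reproduced from the composition of the test set (45 pre-malignant and 15 benign polyps) and the errors described in the Discussion (two adenomas and one hyperplastic polyp misclassified). The helper name below is hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy, with pre-malignant
    as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),   # pre-malignant classified as such
        "specificity": tn / (tn + fp),   # benign classified as benign
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# 45 pre-malignant polyps with 2 missed; 15 benign polyps with 1 missed.
metrics = binary_metrics(tp=43, fn=2, tn=14, fp=1)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {100 * value:.1f}%")
```

The three values round to 95.6%, 93.3%, and 95.0%, matching the sensitivity, specificity, and accuracy stated above.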

Discussion
In clinical practice, endoscopists must perform visual inspection of all detected colorectal polyps. Experience and expertise are factors that currently dictate decision-making during real-time endoscopy. In this study, we have evaluated a CADx system against expert and novice endoscopists. Firstly, in Table 1, a noticeable difference between novices and experts is observed for both intuition and the BASIC classification. Experts showed a higher diagnostic accuracy than novices for both intuition (79.5% vs. 66.7%, p = 0.005) and BASIC (81.7% vs. 66.5%, p = 0.002). Both groups improved the diagnosis of pre-malignant polyps during the second round of assessments, which reflects positively on the training with the BASIC classification received after the first stage of the study. Secondly, our CADx system showed an overall better performance than both novices and experts with all of the trained architectures. All our trained models were built on data from three different hospitals with WL, BLI, and LCI modalities, with a total of 2739 polyp images. Compared to the endoscopists, our training stage utilized the benefits of deep learning, which allowed us to build a robust diagnostic tool. Our final CADx system (EfficientNetB4) correctly assessed a total of 57 out of 60 polyps. The CADx system erroneously classified only three polyps: two adenomas and one hyperplastic polyp. Table 3 presents the individual contribution of each modality to the overall test set. In this study, the performance of the endoscopists was evaluated on the knowledge of WL and BLI. Moreover, the BASIC classification only takes into account the latter modality; hence, a one-to-one comparison with all combined modalities may be unreliable. When observing BLI alone, our CADx system outperforms experts and novices for both intuition and BASIC. In contrast, WL and LCI alone do not outperform both groups, as too few HP polyps are classified correctly.
On the one hand, WL offers limited visualization and enhancement, which makes it difficult to identify small benign polyps; on the other hand, LCI is not the most common modality in current clinical practice. This is reflected in our training dataset, which contained far more WL and BLI images than LCI data. Lastly, when comparing each single modality with the combined results, we did not find a statistically significant improvement from combining modalities. This might indicate that, for a further study, the sample size of the test set should be increased. In Figure 3, we showcase the strength of our CADx system, where three predictions are performed, one for each modality, to obtain a final prediction based on the three observations. In the work of Murata et al. [30], the authors proposed a voting system from the predictions of each combination. Although the voting system would be suitable, we prefer to adopt the same methodology as our previous study [35], where the posterior probabilities acquired for each modality are subsequently averaged to compute one unique probability per polyp. In addition, Grad-CAM provides the endoscopists with visual information on the decision region. This has two benefits: the first is to provide the endoscopists with visual feedback on the CADx judgment, and the second is to offer an alternative method for tracking polyps (either benign or pre-malignant) during real-time video endoscopy. Our CADx system excels in cases where not all modalities yield a correct prediction, such as in the example of Figure 4, where a pre-malignant polyp is incorrectly predicted in the WL modality. Furthermore, visual inspection is supported by the Grad-CAM region, which in this case focuses on the intestinal tract instead of the CRP. Therefore, the combined information of BLI and LCI allows for a correct decision of malignancy, supporting endoscopists with a better diagnosis during colonoscopy.

Figure 3. Output generated from the CADx system where an image of a colorectal polyp (CRP) is received from the endoscope. Two outputs are presented: (1) a prediction of whether the polyp is benign (green) or pre-malignant (red), and (2) a Grad-CAM map that allows the endoscopists to judge the decisions of the CADx system. From left to right, the three modalities employed in this study are shown: White Light, Blue Laser Imaging, and Linked Color Imaging. In the first row, a benign polyp is predicted from the three modalities, with the CADx system decision shown in the following row. In the third and fourth rows, a pre-malignant polyp is predicted from the combined modalities. In this case, BLI and LCI correctly identified the pre-malignant polyp and the CADx system is capable of identifying it as such.
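The difference between averaging posterior probabilities and majority voting can be seen in a small sketch. The probabilities below are invented for illustration and are not taken from the study; the 0.5 decision threshold and the function names are likewise assumptions.

```python
def average_fusion(p_wl, p_bli, p_lci, threshold=0.5):
    """Average the per-modality probabilities of the pre-malignant class
    and threshold the mean to obtain one decision per polyp."""
    score = (p_wl + p_bli + p_lci) / 3.0
    label = "pre-malignant" if score >= threshold else "benign"
    return score, label


def majority_vote(p_wl, p_bli, p_lci, threshold=0.5):
    """Threshold each modality separately and take the majority label."""
    votes = sum(p >= threshold for p in (p_wl, p_bli, p_lci))
    return "pre-malignant" if votes >= 2 else "benign"


if __name__ == "__main__":
    # Hypothetical case: WL is wrong, BLI and LCI are confident.
    print(average_fusion(0.30, 0.90, 0.85), majority_vote(0.30, 0.90, 0.85))
    # Hypothetical case where the two rules disagree: two weak votes
    # are outweighed by one confident benign prediction.
    print(average_fusion(0.05, 0.55, 0.60), majority_vote(0.05, 0.55, 0.60))
```

Averaging retains the confidence of each modality, so a single strongly confident prediction can outweigh two borderline ones, whereas voting treats every modality equally; it also yields a single probability score that can be shown to the endoscopist.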

Conclusions
In this study, we have developed an end-to-end CADx system with a state-of-the-art deep learning architecture to classify colorectal polyps as benign or pre-malignant, using images obtained with three different imaging modalities (WL, BLI, and LCI). We evaluated our framework on an independent test set of 60 patients and compared the results with the diagnoses of 19 endoscopists (6 experts and 13 novices). Our CADx system was trained with a dataset of 2739 images collected from three different hospitals. Anticipating clinical application, we employed EfficientNetB4, which is a state-of-the-art architecture for classification. The network demonstrates excellent performance for our CADx system compared to the most common networks found in the polyp classification literature. Despite its strong performance, one of the most noticeable downsides we found during training is that the complex computations for scaling and depth limited the batch size for training. We have opened the path for endoscopists towards combining WL, BLI, and LCI to predict polyp histology. The results have shown that there is potential to enhance the prediction of individual polyps from several modalities, but further studies should confirm the findings with a larger test set. Moreover, we have experimented with Grad-CAM to offer endoscopists an interpretable explanation of the CADx decisions. In further studies, we will investigate the performance of our system in real-time colonoscopy. In conclusion, we present a CADx system that could be used in routine colonoscopy to classify benign and pre-malignant colorectal polyps. If the CADx system successfully predicts CRP histology, then diminutive hyperplastic polyps could potentially be left in the colon and the suggested 'diagnose-and-leave' strategy could be applied. The same principle can be applied to diminutive adenomatous polyps as well, which could then be resected without performing histological evaluation, following the 'resect-and-discard' strategy.
Overall, the presented study may allow for improved diagnosis of CRC and decrease the current cost burden of histological examinations.