Polyp Detection and Segmentation from Video Capsule Endoscopy: A Review

Video capsule endoscopy (VCE) is now widely used for visualizing the gastrointestinal (GI) tract. Capsule endoscopy exams are usually prescribed as an additional monitoring mechanism and can help in identifying polyps, bleeding, etc. To analyze the large scale video data produced by VCE exams, automatic image processing, computer vision, and learning algorithms are required. Recently, automatic polyp detection algorithms have been proposed with various degrees of success. Though polyp detection in colonoscopy and other traditional endoscopy procedure based images is becoming a mature field, detecting polyps automatically in VCE remains a hard problem due to its unique imaging characteristics. We review different polyp detection approaches for VCE imagery and provide a systematic analysis, highlighting the challenges faced by standard image processing and computer vision methods.


Introduction
Video capsule endoscopy (VCE) is an innovative diagnostic imaging modality in gastroenterology, which acquires digital photographs of the gastrointestinal (GI) tract using a swallowable miniature camera device with LED flash lights [1,2]. The capsule transmits images of the gastrointestinal tract to a portable recording device. The captured images are then analyzed by gastroenterologists, who locate and detect abnormal features such as polyps, lesions, bleeding, etc., and carry out diagnostic assessments.
A typical capsule exam produces more than 50,000 images over an operation time spanning 8 to 10 hours. Hence, examining each image sequence produced by VCE is an extremely time consuming process. Clearly, an efficient and accurate automatic detection procedure would relieve the diagnosticians of the burden of analyzing a large number of images for each patient.
A number of schemes have been proposed to find polyps in virtual colonoscopy or computed tomography colonography; see e.g. [13,14,15,16,17,18]. Most of these methods operate on an already reconstructed surface representing the colon's interior or rely on specific imaging techniques; see [19,20] for reviews. In contrast, VCE comes with an unaided, uncontrolled photographic device, which moves automatically and is highly susceptible to illumination saturation due to near-field lighting [21]. Moreover, images from VCE differ significantly from images obtained with traditional colonoscopy.
For example, there is less liquid material in the lumen during colonoscopy, so the images look more specular, whereas in VCE images the mucosal tissue looks diffusive in the presence of liquid, and trash and turbidity can additionally hinder the view of the mucosal surface [21]. Due to the unaided movement of the capsule camera, blurring effects make the images less sharp. Moreover, the color of mucosal tissue under VCE has some peculiar characteristics [22]. For these reasons particular to VCE, its sensitivity for detecting colonic lesions is low compared with optical colonoscopy, as noted in [23]. Nevertheless, a recent meta-analysis showed that capsule endoscopy is effective in detecting colorectal polyps [24] (at least for colon capsules; the jury is still out for the small bowel and esophagus). Advances in sensors and camera systems have resulted in second generation capsule endoscopes with improved sensitivity and specificity for detecting colorectal polyps [25,26]. However, the increased imaging complexity and higher frame rates, while providing more information, inevitably put more burden on the gastroenterologists. Thus, efficient and robust automatic computer aided detection and segmentation of colorectal polyps is of great importance and is the need of the hour.
In this comprehensive survey paper, we provide an overview of the different automatic image/video data based polyp detection (localization) and segmentation methods proposed in the literature so far (up to September 2016 1 ) and discuss the challenges that remain. The rest of the paper is organized as follows.
Section 2 provides a review of polyp detection, segmentation, and holistic techniques from the literature. In Section 3 we discuss the outlook in this field along with challenges that need to be tackled by future research.

Review of polyp detection and segmentation in VCE
Variable lighting and the rare occurrence of polyps in a given (full) VCE video create immense difficulties in devising robust, data-driven methods for reliable detection and segmentation. We can classify polyp detection/segmentation methods into two categories: (a) polyp detection - finding where a polyp frame 2 occurs, not necessarily the location of the polyp within that frame; (b) polyp segmentation - given a frame which contains polyp(s), segment the mucosal area in which the polyp appears.
1 We refer the reader to the project website, which is continuously updated with links to all the papers presented here and with more details about this research area: http://goo.gl/eAUWKJ
2 Polyp frames may contain more than one polyp. We make no distinction between detecting one or multiple polyps in a given image.
[Figure: example polyps, (a) pedunculated/stalked or subpedunculated, (b) sessile; a 3D representation of VCE frames obtained using the shape from shading technique [27].]
Note that the first task is a much harder problem than the second, due to the large number of frames the automatic algorithm must sift through to find the polyp frames, which are usually a rare occurrence. Naturally, machine learning based approaches are essential for both categories, especially for polyp detection, since polyp frames, i.e. frames where at least one polyp is visible, are very few in contrast to the typical full length of a VCE video (typically greater than 50,000 frames). The task is further complicated because colorectal polyps do not share common shape, texture, and color features even within a single patient's video. Nevertheless, there have been efforts to identify polyp frames using automatic data-driven algorithms. Polyp segmentation is a relatively easier problem, since the automatic algorithm need only analyze a given polyp frame to find and localize the polyps present. We next review polyp detection and segmentation methods studied so far in the literature and discuss the key techniques used, with relevant results.

Polyp detection in capsule endoscopy videos
There are two main classes of polyps with respect to their shape: pedunculated and sessile.
In Figure 2 we show some example polyps (selected from the images shown in Figure 1) in 3D using the shape from shading technique [27], indicating the amount of protrusion out of the mucosal surface 3 .
3 Visualizations as 3D figures are available in the Supplementary and also at the project website.
One of the earliest works in VCE image processing, and especially in detecting polyps automatically, is by Kodogiannis et al [28], who studied an adaptive neuro-fuzzy approach. By utilizing the texture spectrum from six channels (red-green-blue: RGB and hue-saturation-value: HSV color spaces) with an adaptive fuzzy logic system based classifier, they obtained 97% sensitivity on 140 images with 70 polyp frames.
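Six-channel color features of this kind can be illustrated with a small sketch: per-channel histograms over RGB and HSV are concatenated into a single descriptor. The bin count and normalization below are illustrative assumptions, not the original texture-spectrum formulation of [28].

```python
import numpy as np
from skimage.color import rgb2hsv

def six_channel_histogram(rgb_image, bins=16):
    """Concatenate per-channel histograms from the RGB and HSV color spaces."""
    rgb = rgb_image.astype(np.float64) / 255.0
    hsv = rgb2hsv(rgb)
    features = []
    for space in (rgb, hsv):
        for c in range(3):
            hist, _ = np.histogram(space[..., c], bins=bins, range=(0.0, 1.0))
            features.append(hist / hist.sum())  # normalize each channel histogram
    return np.concatenate(features)  # length = 6 * bins

# Synthetic frame stands in for a VCE image.
frame = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
print(six_channel_histogram(frame).shape)  # (96,)
```

The resulting vector would then be fed to a classifier (a fuzzy logic system in [28]).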
Li et al [29] compare two different shape features to discriminate polyp from normal regions. They utilize the MPEG-7 shape descriptor (angular radial transform - ART) and Zernike moments as features, along with a multi-layer perceptron (MLP) neural network as the classifier. Due to their invariance to rotation, translation, and scale change, Zernike moments are well suited for polyp detection in VCE, which involves unconstrained camera movement. They test their approach on 300 representative images, out of which 150 contain polyps, achieving an accuracy of 86.1%. However, their approach only compares two specific shape features and discards color and texture information entirely. In a related work, Li et al [30] ... The curvature index derived in [37] fails to pick up VCE frames which contain sessile or flat polyps. Note also that this method does not rely on any classification and is a purely geometry based approach.
Zhao and Meng [38] proposed to use opponent color moments with local binary pattern (LBP) texture features computed over the contourlet-transformed image, with an SVM classifier, reporting 97% accuracy.
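A minimal sketch of this family of texture-plus-SVM pipelines is given below. It uses a plain uniform-LBP histogram per frame (no contourlet transform or opponent color moments as in [38]); the frames, labels, and parameters are synthetic placeholders.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

P, R = 8, 1  # LBP neighbors and radius

def lbp_histogram(gray_frame):
    """Uniform LBP histogram as a texture descriptor for one frame."""
    codes = local_binary_pattern(gray_frame, P, R, method="uniform")
    n_bins = P + 2  # uniform patterns plus one non-uniform bin
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()

rng = np.random.default_rng(0)
# Synthetic stand-ins for polyp/normal frames, purely illustrative.
frames = [rng.random((32, 32)) for _ in range(20)]
labels = [i % 2 for i in range(20)]
X = np.array([lbp_histogram(f) for f in frames])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:2]).shape)  # (2,)
```

In a real pipeline the descriptor would be computed per frame of the VCE video and the classifier trained on expert-labeled polyp/normal frames.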
Their work unfortunately does not mention how many total frames and polyp frames were used, or how the color and texture features are fused. Condessa and Bioucas-Dias [44] studied an extension of the protrusion measure in [37]. Using a two-stage approach, where the first stage involves multichannel segmentation and local polynomial approximation and the second stage extracts contour and curvature features fed to an SVM classifier, they obtain 92.31% sensitivity.
The authors advocate using the temporal information via recursive stochastic filtering, though this has not yet been considered by researchers.
David et al [45] utilized the protrusion measure from [37]; however, they observe that not all polyps ... Zhou et al [52] utilized RGB averaging with variance for polyp localization and radius measurement of suspected protruding polyps using a statistical region merging approach. On a total of 359 frames, this approach with an SVM classifier obtained 75% sensitivity. Gueye et al [53] used SIFT features and the bag-of-features (BoF) method. On 800 frames with 400 polyp frames their SVM classifier obtained a classification rate of 61.83%, and on 400 frames with 200 polyp frames the rate increased to 98.25%.
In summary, there have been a number of automatic polyp detection methods, though unfortunately the full dataset description is lacking in many of these published works, which makes it hard to benchmark them. Table 1 summarizes all the polyp detection methods covered so far with the main techniques and classifiers utilized, along with tested dataset details (whenever available). It can be seen that the most popular classifier is the SVM with radial basis function kernel, due to its simplicity and ease of use.
However, it is our belief that the majority of these methods either overfit or underfit, as they are tuned to obtain the best possible detection accuracy on their corresponding datasets. All the aforementioned methods report polyp detection accuracy based on how many polyp frames are detected out of all the input frames. However, as mentioned by Mamonov et al [50], per-polyp accuracy is more important than per-frame accuracy, since detecting at least one polyp frame in a (typically consecutive) sequence of frames suffices to alert the gastroenterologist/clinician. Figure 3 shows different frames of a single patient VCE exam wherein a sequence of length 55 contains a pedunculated polyp 4 . It is clear that detecting any one of these frames as a polyp frame is enough, as the gastroenterologist can inspect the neighboring frames manually. Such a prospective automatic polyp alert system can reduce the burden on gastroenterologists, as the number of frames to be inspected can be dramatically reduced from tens of thousands of frames to a few hundred candidate polyp sequences.
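The per-polyp criterion can be made concrete with a small sketch: a polyp sequence counts as detected as soon as any one of its frames is flagged. The sequence boundaries and flagged frame indices below are hypothetical.

```python
def per_polyp_sensitivity(sequences, flagged_frames):
    """Fraction of polyp sequences with at least one flagged frame.

    sequences: list of sets of frame indices, one set per polyp sequence.
    flagged_frames: set of frame indices the detector marked as polyp frames.
    """
    detected = sum(1 for seq in sequences if seq & flagged_frames)
    return detected / len(sequences)

# Hypothetical exam: two polyp sequences; the detector catches a single
# frame of the first sequence and misses the second one entirely.
sequences = [set(range(100, 155)), set(range(400, 420))]
flagged = {17, 103, 999}
print(per_polyp_sensitivity(sequences, flagged))  # 0.5
```

Under this metric a detector is rewarded for covering every polyp at least once, rather than for the raw count of flagged frames.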

Polyp localization or segmentation within a VCE frame
Polyp localization or segmentation of polyps in a single frame of VCE is a (relatively easier) object identification problem. The general task falls under the category of image segmentation, a well studied problem in various biomedical imaging domains. As we have seen before, color, texture, and shape features individually are not discriminative enough to obtain polyp segmentation from VCE frames. There have been a number of efforts in polyp segmentation from VCE, which we classify based on the segmentation techniques utilized. Note that the majority of the previously discussed polyp detection algorithms employ a polyp segmentation step to identify candidate polyp frames, though accurately segmenting the polyp region is not required for subsequent polyp detection in VCE.
Prasath and Kawanaka [54] proposed a novel vascularization feature based on the Frangi vesselness filter.
4 The original video is available in the Supplementary.
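A vascularization-style map of this kind can be sketched with the standard Frangi vesselness filter, which enhances tubular (vessel-like) structures; the default scales used below are illustrative assumptions, not the configuration of [54].

```python
import numpy as np
from skimage.filters import frangi

def vesselness_map(gray_frame):
    """Enhance tubular (vessel-like) structures in a grayscale frame."""
    return frangi(gray_frame)

# Synthetic grayscale frame stands in for a VCE image.
frame = np.random.rand(64, 64)
v = vesselness_map(frame)
print(v.shape)  # (64, 64)
```

Statistics of the vesselness response over a candidate region could then serve as one feature among others for polyp discrimination.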
• Geometry: Hwang and Celebi [34] derive a localization approach using the curvature center ratio on K-means clustering based segmented polyp frames. Figueiredo et al [37] used a protrusion measure based on Gaussian and mean curvature to highlight possible polyp locations in a given VCE image.
• Hybrid: Jia [58] used K-means clustering and localizing region-based active contour segmentation. The results shown in this paper seem to be from colonoscopy imagery and not VCE.
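The Gaussian/mean curvature cue behind such protrusion measures can be sketched on a height map: a protruding bump is convex elliptic, i.e. Gaussian curvature K > 0 and mean curvature H < 0 at its apex. This is an illustrative computation on a synthetic surface, not the exact measure of [37].

```python
import numpy as np

def curvatures(z):
    """Gaussian (K) and mean (H) curvature of a height map z(x, y)."""
    zy, zx = np.gradient(z)        # first derivatives (axis 0 = y, axis 1 = x)
    zxy, zxx = np.gradient(zx)     # second derivatives of zx
    zyy, _ = np.gradient(zy)       # second derivative of zy
    w = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / w**2
    H = ((1 + zx**2) * zyy - 2 * zx * zy * zxy + (1 + zy**2) * zxx) / (2 * w**1.5)
    return K, H

# Synthetic protruding bump: convex elliptic (K > 0, H < 0) at the apex.
x, y = np.meshgrid(np.linspace(-1, 1, 65), np.linspace(-1, 1, 65))
z = np.exp(-(x**2 + y**2) / 0.1)
K, H = curvatures(z)
print(K[32, 32] > 0, H[32, 32] < 0)  # True True
```

Thresholding such curvature maps on a depth/shape-from-shading reconstruction yields candidate protruding regions, which is why flat or sessile polyps are easily missed by this cue alone.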

Accurate boundaries, segmentation
• Active contours: Meziou et al [59] used alpha-divergence based active contour segmentation, though their approach is a general segmentation method and not specific to polyps; see also [57].
Prasath et al [27] used the active contours without edges method for identifying the mucosal surface in conjunction with the shape from shading technique; see also [60]. Eskandari et al [55] used a region based active contour model to segment the polyp region from a given image which contains a polyp; see also [56].
Work | Main techniques | Classifier | Frames (polyp frames)
[64] | circular Hough transform + co-occurrence matrix | Boosting | 1500 (300)
[65] | Hough transform + co-occurrence matrix | Boosting | 1500 (300)
Table 2: Holistic polyp detection and segmentation approaches for endoscopy systems. Note that all these proposed methodologies are so far only tested with traditional colonoscopy images [8].
... trash, and illumination artifacts. A preliminary mucosa segmentation [60,27,61,62] may be required before applying the polyp segmentation step to avoid these non-polyp pixels. ... [64,65]. Table 2 summarizes the holistic system approaches proposed so far. Note that, for now, polyp detection testing is done using traditional colonoscopy images. Apart from these technological constraints, such approaches require robust embeddable computer vision systems which need to operate under a strict energy budget (battery constraints). However, a computer vision embedded VCE system could revolutionize diagnosis procedures which have so far been tedious manual processes.

Discussion and outlook
Automatic polyp detection and segmentation is a nascent area which requires various computer vision methodologies, such as geometrical primitives, color spaces, texture descriptors, and feature matching, along with strong machine learning components. A standard assumption made in classical colonoscopy imagery based polyp detection methods is that polyps have salient geometric features such as a well-defined shape [18] or protrusion out of the mucosal surface [13]. Thus, curvature measures are widely utilized in detecting polyps, and their adaptation to VCE imagery has been undertaken with limited success [37,44,45,46].
Texture or color feature based schemes such as [9,10] applied to colonoscopy images do not work well in VCE when used on their own, as we have seen in Section 2.1. This is because, though polyps appear to have different textural characteristics than the surrounding mucosa, bubbles, trash, and capsule motion induced by uncontrollable peristalsis have a strong effect on texture features. In such a scenario, classifying with texture features alone is highly error prone. For example, the polyp shown in Figure 3 (top row) has a unique texture pattern, whereas for the polyp in Figure 3 (middle row) the texture appears uniformly spread across the surrounding mucosal structure. Moreover, in a video, neighboring frames show a partial view of the polyp with mucosal folds of similar texture. Similarly, the color of a polyp is not homogeneous across different polyp frames within a patient and is highly variable across exams from different patients. A comprehensive learning framework could incorporate local and global texture features for discerning polyps, along with vascularization and color information.
Existing approaches reviewed in Section 2 are plagued by various factors. One of the main problems is the presence of trash and bubbles, since no colon cleaning is required in VCE exams. Robust polyp detection approaches must incorporate efficient trash and bubble detectors to avoid false positives. To conclude, we believe a holistic view combining motion, geometry, color, and texture with a strong machine learning paradigm may prove successful for robust, efficient automatic polyp detection in VCE imagery.
Based on the detailed description of the current state of the art polyp detection and segmentation methods, we observe the following salient points for the future outlook: • The recent excitement generated by deep learning points to a very promising direction, where massively trained neural network based classifiers can be used to better differentiate polyp frames from normal frames.
However, deep learning networks in general require huge amounts of training data, in particular labeled positive (polyp frame) and negative (normal frame) samples. One possible remedy for the imbalanced data problem is data augmentation, wherein one can increase the number of polyp frames by artificial perturbations (rotation, reflection, ...); see e.g. [51] for an attempt to create a higher number of polyp frames for training. There have been some works in the last two years on endoscopy image analysis with deep learning [66,67,68].
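The rotation/reflection augmentation mentioned above can be sketched by generating the eight dihedral variants of each scarce polyp frame; the tiny integer array below stands in for a real image.

```python
import numpy as np

def dihedral_augment(frame):
    """Return the 8 rotation/reflection variants of a square frame."""
    variants = []
    for k in range(4):
        rotated = np.rot90(frame, k)       # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # add the mirrored copy
    return variants

# Placeholder for a polyp frame.
polyp_frame = np.arange(16).reshape(4, 4)
print(len(dihedral_augment(polyp_frame)))  # 8
```

This eight-fold expansion of the positive class is a cheap way to rebalance training data before more elaborate perturbations (elastic deformation, color jitter) are considered.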
• Similar to the ASU-Mayo Clinic polyp database for colonoscopy polyp detection benchmarking, VCE polyp detection requires a well-defined database with polyp regions marked by multiple expert gastroenterologists. This would standardize the benchmarking and testing of different methodologies for automatic polyp detection and segmentation.
• Sensor improvements in novel capsule systems [69,70], such as higher image resolution, standardized illumination/contrast, controlled capsule speed, and variable image capturing mechanisms, can help automatic image analysis.
• Finally, embedding the image analysis within capsule endoscopy imaging systems [65,71] is an exciting research area which will enable gastroenterologists to make real-time decisions.