Article

Inter- and Intraobserver Variability in Bowel Preparation Scoring for Colon Capsule Endoscopy: Impact of AI-Assisted Assessment Feasibility Study

by
Ian Io Lei
1,2,*,
Daniel R. Gaya
3,
Alexander Robertson
4,
Benedicte Schelde-Olesen
5,
Alice Mapiye
4,
Anirudh Bhandare
6,
Bei Bei Lui
1,
Chander Shekhar
1,
Ursula Valentiner
7,
Pere Gilabert
8,
Pablo Laiz
9,
Santi Segui
8,
Nicholas Parsons
10,
Cristiana Huhulea
1,
Hagen Wenzek
9,11,
Elizabeth White
11,
Anastasios Koulaouzidis
5,12,13,14 and
Ramesh P. Arasaradnam
1,2,15
1
Institute of Precision Diagnostics & Translational Medicine, University Hospital of Coventry and Warwickshire, Clifford Bridge Rd, Coventry CV2 2DX, UK
2
School of Medicine, University of Warwick, Coventry CV4 7AL, UK
3
Department of Gastroenterology, Glasgow Royal Infirmary, Glasgow G4 0SF, UK
4
Department of Digestive Diseases, University Hospitals of Leicester NHS Trust, Leicester LE1 7RH, UK
5
Surgical Research Unit, Odense University Hospital, 5700 Svendborg, Denmark
6
Department of Gastroenterology, Royal Oldham Hospital, Northern Care Alliance, Rochdale Road, Oldham OL1 2JH, UK
7
Institute of Anatomy and Experimental Morphology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
8
Mathematics and Computer Science Department, University of Barcelona, 08007 Barcelona, Spain
9
GI Digital, Inc., New York, NY 10017, USA
10
Warwick Clinical Trials Unit, University of Warwick, Coventry CV4 7AL, UK
11
Corporate Health International, Inverness IV2 5NA, UK
12
Department of Gastroenterology, Pomeranian Medical University, 70-204 Szczecin, Poland
13
Department of Surgery, OUH Svendborg Sygehus, 5700 Svendborg, Denmark
14
Department of Clinical Research, University of Southern Denmark, 5230 Odense, Denmark
15
Leicester Cancer Centre, University of Leicester, Leicester LE1 7RH, UK
*
Author to whom correspondence should be addressed.
Cancers 2025, 17(17), 2840; https://doi.org/10.3390/cancers17172840
Submission received: 1 July 2025 / Revised: 24 August 2025 / Accepted: 26 August 2025 / Published: 29 August 2025
(This article belongs to the Section Methods and Technologies Development)

Simple Summary

This study assessed the reliability of AI-assisted bowel cleansing scoring in colon capsule endoscopy using the CC-CLEAR scale. While interobserver agreement was excellent with manual scoring among experienced readers, AI-assisted reads did not improve agreement and showed reduced consistency, particularly among less experienced users. The mean AI-assisted scores were significantly lower than manual scores, highlighting potential interpretive challenges. These findings suggest that AI's effectiveness currently depends on user expertise, reinforcing the need for further development and refinement before robust AI implementation in CCE.

Abstract

Background: Colon capsule endoscopy (CCE) has seen increased adoption since the COVID-19 pandemic, offering a non-invasive alternative for lower gastrointestinal investigations. However, inadequate bowel preparation remains a key limitation, often leading to higher conversion rates to colonoscopy. Manual assessment of bowel cleanliness is inherently subjective and marked by high interobserver variability. Recent advances in artificial intelligence (AI) have enabled automated cleansing scores that not only standardise assessment and reduce variability but also align with the emerging semi-automated AI reading workflow, which highlights only clinically significant frames. As full video review becomes less routine, reliable and consistent cleansing evaluation is essential, positioning bowel preparation AI as a critical enabler of diagnostic accuracy and scalable CCE deployment. Objective: This CESCAIL sub-study aimed to (1) evaluate interobserver agreement in CCE bowel cleansing assessment using two established scoring systems, and (2) determine the impact of AI-assisted scoring, specifically a TransUNet-based segmentation model with a custom Patch Loss function, on both interobserver and intraobserver agreement compared to manual assessment. Methods: As part of the CESCAIL study, twenty-five CCE videos were randomly selected from 673 participants. Nine readers with varying CCE experience scored bowel cleanliness using the Leighton–Rex and CC-CLEAR scales. After a minimum 8-week washout, the same readers reassessed the videos using AI-assisted CC-CLEAR scores. Interobserver variability was evaluated using bootstrapped intraclass correlation coefficients (ICC) and Fleiss' Kappa; intraobserver variability was assessed with weighted Cohen's Kappa, paired t-tests, and Two One-Sided Tests (TOSTs). Results: Leighton–Rex showed poor to moderate agreement (Fleiss = 0.15; ICC = 0.55), while CC-CLEAR demonstrated fair to excellent agreement (Fleiss = 0.27; ICC = 0.90). AI-assisted CC-CLEAR achieved only moderate agreement overall (Fleiss = 0.27; ICC = 0.69), with weaker performance among less experienced readers (Fleiss = 0.15; ICC = 0.56). Intraobserver agreement was excellent (ICC > 0.75) for experienced readers but variable in others (ICC 0.03–0.80). AI-assisted scores were significantly lower than manual reads by 1.46 points (p < 0.001), potentially increasing conversion to colonoscopy. Conclusions: AI-assisted scoring did not improve interobserver agreement and may even reduce consistency amongst less experienced readers. The maintained agreement observed in experienced readers highlights its current value in experienced hands only. Further refinement, including spatial analysis integration, is needed for robust overall AI implementation in CCE.

1. Introduction

Colon capsule endoscopy (CCE) is a non-invasive method for assessing the mucosa of the colon with pan-enteric visualisation capabilities. Despite its potential, maintaining reproducibility and consistency of key measures such as bowel cleansing assessment remains difficult. Adequate bowel preparation is vital not only for improving mucosal visibility and polyp detection but also for correctly deciding if follow-up optical colonoscopy is necessary. According to current European Society of Gastrointestinal Endoscopy (ESGE) guidelines, insufficient cleansing requires further evaluation to confidently rule out pathology, especially polyps measuring ≥5 mm [1,2]. Although the Colon Capsule CLEansing Assessment and Report (CC-CLEAR) scale was developed as a more objective, quantitative tool [3], bowel cleansing assessment is naturally subjective [4,5]. Interobserver agreement varies, with several studies showing only moderate to good consensus, even among experienced CCE reviewers [6,7]. Some results further complicate interpretation, with conflicting evidence suggesting that the Leighton–Rex score may provide better interobserver agreement than CC-CLEAR [4].
The rise of artificial intelligence (AI) in capsule endoscopy has brought promising advances, especially in improving time efficiency [8]. For example, Spada et al. reported a nine-fold reduction in reading time for small bowel CE using AI-assisted systems [9], a finding confirmed by interim results from the Capsule Endoscopy at Scale through Enhanced AI Analysis (CESCAIL) study [10]. Most AI frameworks concentrate on extracting clinically relevant frames, allowing readers to skip large parts of unremarkable footage. However, this efficiency introduces a new limitation: by skipping through the video, readers are unable to thoroughly assess bowel cleansing quality, particularly for segmental scoring systems. Without a reliable AI model to evaluate bowel cleanliness across the entire CCE video, this change in workflow may undermine the rigour and reproducibility of cleansing assessments, potentially eroding confidence among clinicians and patients.
To support the semi-automated reading pathway, AI algorithms must evolve beyond polyp detection to provide contextual interpretation, including cleansing quality evaluation, pathology classification, and polyp matching, as highlighted by Nadimi et al. [1]. While AI-assisted bowel cleansing scores have been proposed to reduce interobserver variability, prior studies primarily focused on frame-level analysis rather than video-level analysis [2,3]. These approaches fail to account for spatial and temporal continuity within colon segments, whereby cleansing should be judged across mucosal areas and over time rather than from isolated frames. In practice, areas initially poorly visualised might later be adequately assessed from a different angle or with capsule rocking, a factor not captured in frame-level scoring. A recent video-based study by Schelde-Olesen et al. further underscored the limitations of current AI models, demonstrating poor agreement between AI algorithms and human readers, likely due to variability in training data and subjective reference standards [4]. These findings suggest AI should be positioned as a supportive adjunct rather than a replacement for human assessment, an approach not previously explored in the literature. In addition, no study has examined intraobserver variability before and after AI-assisted bowel cleansing assessment, and the impact of such approaches on readers' evaluations remains unknown. Table 1 summarises all AI-based bowel preparation studies in CCE identified in our literature review.
Considering these challenges, our sub-study aimed to address these gaps by integrating an AI-assisted tool for objective bowel preparation scoring in CCE. The primary objective was to evaluate interobserver variability in bowel cleansing assessment within the standard reading arm, using both the Leighton–Rex and CC-CLEAR scoring systems, among readers with differing levels of experience in CCE. The secondary objective was to evaluate both the interobserver and intraobserver variability by comparing standard and AI-assisted readings of the same CCE videos among the same readers, following a washout period. Figure 1 summarises the current limitations in bowel cleansing assessment for CCE and outlines the objectives of this study in addressing those gaps. This prospective, multi-reader, washout-paired evaluation tests a hybrid human-in-the-loop AI-assisted bowel cleansing assessment tailored to CCE, rather than small bowel extrapolations, directly addressing standardisation and reproducibility at scale.

2. Methods

2.1. Study Design and Video Selection

In this study, 25 completed CCE videos, defined as those with capsule excretion before battery exhaustion, were pseudonymised and randomly selected from 673 videos in the CESCAIL multicentre prospective diagnostic accuracy study using the RAND function in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA). Each video ID was assigned a random number, and the dataset was then sorted in ascending order based on these values. The first 25 entries were selected for inclusion [6]. The CESCAIL study investigated a Computer-Aided Detection (CADe) system for polyp detection in CCE using the PillCam™ COLON 2 system (Medtronic, Dublin, Ireland) [7]. The patient inclusion criteria were based on the NHS England pilot study, which included adults referred to secondary care under the urgent referral pathway for lower gastrointestinal (GI) symptoms [8] and those scheduled for post-polypectomy surveillance as part of their routine clinical care [9] (see Supplementary Table S1 for details of the inclusion criteria). The sole exclusion criterion for CESCAIL was the inability to provide informed consent.
A power analysis was conducted for both the paired comparative analysis and interobserver agreement. For the primary comparison between AI-assisted and clinician bowel cleanliness scores, a paired-sample design was assumed with an expected moderate effect size (Cohen's d = 0.6), α = 0.05, and power = 80%. This yielded a required sample size of 24 paired observations; the current study includes 25, thus meeting power requirements. For the interobserver reliability analysis using the Intraclass Correlation Coefficient (ICC), we assumed a population ICC of 0.60, α = 0.05, 8 raters, and 25 subjects [10]. Using an F-distribution-based approximation [11], the calculated statistical power to detect an ICC of at least 0.60 was effectively 1, sufficient for reliable ICC estimation.
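The paired-sample calculation above can be reproduced with standard power-analysis tooling. A minimal sketch in R follows; the pwr package is an assumption, as the paper does not name the software used for this step.

```r
# Hedged reproduction of the paired-sample power calculation described above:
# Cohen's d = 0.6, alpha = 0.05, power = 80%. The pwr package is an assumption.
library(pwr)

pwr.t.test(d = 0.6, sig.level = 0.05, power = 0.80, type = "paired")
# Returns n = 23.97..., i.e. 24 pairs required, matching the 24 paired
# observations reported; the study's 25 videos therefore meet the requirement.
```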

2.2. CCE Readers: Grading of Bowel Cleansing

This study employed two distinct video assessment arms: the standard and the AI-assisted reading arms (Figure 2). In the standard arm, accredited CCE readers with varying experience, ranging from 150 to 2000 cases, independently reviewed full-length videos at the maximum frame rate for bowel cleansing assessment only. In this study, an experienced reader was defined as one with more than 500 lifetime CCE reads [12]. Readers' experience is detailed in Supplementary Tables S2 and S3. Key anatomical landmarks, including the first caecal image, hepatic flexure, splenic flexure, and final rectal image, were pre-marked by an expert reader to standardise assessments.
During the review, readers evaluated bowel cleansing quality using both the Leighton–Rex [13] and CC-CLEAR [14] scoring systems. The Leighton–Rex scale was applied using a 4-point score (poor, fair, good, excellent), in which only "fair," "good," and "excellent" were considered adequate (Figure 3). For an examination to be considered overall adequate on this scale, all five colonic segments had to meet the threshold for acceptable cleansing. In contrast, the CC-CLEAR scale employs a more quantitative approach across three colonic segments: the right colon, the transverse colon, and the left colon. Within each segment, cleansing is scored from 0 to 3 points based on the percentage of mucosa visualised (<50% = 0 points, 50–75% = 1 point, 75–90% = 2 points, and >90% = 3 points). The total score, obtained by summing the segment scores, categorises overall bowel cleanliness as excellent (8–9), good (6–7), or inadequate (0–5).
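To make the CC-CLEAR arithmetic concrete, the sketch below maps per-segment mucosal visibility percentages to points and an overall grade. Function names and exact boundary handling are illustrative, not taken from the paper.

```r
# Minimal sketch of CC-CLEAR scoring: per-segment % of visualised mucosa is
# mapped to 0-3 points, and the summed total to an overall category.
cc_clear_points <- function(pct) {
  # <50% = 0, 50-75% = 1, 75-90% = 2, >90% = 3 (boundary handling assumed)
  if (pct > 90) 3 else if (pct > 75) 2 else if (pct >= 50) 1 else 0
}

cc_clear_total <- function(right, transverse, left) {
  total <- sum(vapply(c(right, transverse, left), cc_clear_points, numeric(1)))
  grade <- if (total >= 8) "excellent" else if (total >= 6) "good" else "inadequate"
  list(total = total, grade = grade)
}

cc_clear_total(right = 92, transverse = 80, left = 60)
# $total = 6 (3 + 2 + 1), $grade = "good"
```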

2.3. AI-Assisted Cleansing Grading

The AI algorithm used in this sub-study was developed by our collaborators, Gilabert et al., to support clinicians in evaluating bowel cleanliness in CCE using the CC-CLEAR scale. The system combines image segmentation and classification to estimate mucosal visibility across the entire video while significantly reducing CCE experts' annotation burden during its training phase. It employs a TransUNet architecture trained to detect intraluminal content in capsule frames, guided by a custom "Patch Loss" function that relies on binary patch-level labels ("clean" or "dirty") rather than full-frame manual segmentation [15,16]. During model development, the following hyperparameters were tuned: (i) patch size for segmentation; (ii) Gaussian smoothing parameters; (iii) TransUNet architecture settings (depth/heads); and (iv) learning rate. Cleanliness is calculated on a frame-by-frame basis by quantifying the proportion of visible mucosa. This information is then summarised in a timeline plot, illustrating fluctuations in bowel cleanliness throughout the capsule examination journey. From this continuous analysis, the algorithm extracts features aligned with CC-CLEAR thresholds and classifies video segments into corresponding cleanliness categories (scores 0–3). Per-frame visible-mucosa proportion was mapped deterministically to CC-CLEAR thresholds: <50% = 0 points, 50–75% = 1 point, 75–90% = 2 points, and >90% = 3 points; these cut-offs were not learned by the model but applied to its per-frame predictions. This system is designed to enhance reader efficiency while preserving clinical control, supporting a reader-led, AI-assisted workflow (see Figure 4 for an example of the AI output). A detailed description of the algorithm's training, validation, and optimisation can be found in the work by Gilabert et al. [15,16,17]. The model was trained, validated, and tested on 113 videos (69/22/22), with splitting performed at the patient level to prevent data leakage.
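As an illustration of the patch-level supervision idea, the sketch below scores patch-mean predictions against binary patch labels with cross-entropy. It is a simplified stand-in for the published Patch Loss of Gilabert et al.; all names, the patch size, and the demo data are chosen for illustration only.

```r
# Simplified sketch of a patch-level loss: the predicted per-pixel probability
# map of intraluminal content is divided into square patches, and each patch's
# mean prediction is scored against its binary "clean"(0)/"dirty"(1) label with
# binary cross-entropy. This approximates, not reproduces, the published loss.
patch_loss <- function(pred, labels, patch = 32, eps = 1e-7) {
  stopifnot(nrow(pred) %% patch == 0, ncol(pred) %% patch == 0)
  losses <- c()
  for (i in seq_len(nrow(labels))) {
    for (j in seq_len(ncol(labels))) {
      rows <- ((i - 1) * patch + 1):(i * patch)
      cols <- ((j - 1) * patch + 1):(j * patch)
      p <- mean(pred[rows, cols])               # patch-level mean prediction
      y <- labels[i, j]
      losses <- c(losses, -(y * log(p + eps) + (1 - y) * log(1 - p + eps)))
    }
  }
  mean(losses)                                  # average BCE over all patches
}

set.seed(3)
pred   <- matrix(runif(64 * 64), 64, 64)        # toy 64 x 64 probability map
labels <- matrix(rbinom(4, 1, 0.5), 2, 2)       # 2 x 2 grid of patch labels
patch_loss(pred, labels)
```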
In addition to generating a timeline plot, the algorithm identifies and flags the six lowest bowel cleansing quality frames within each colonic segment, providing corresponding timestamps. These frames are selected according to the lowest predicted mucosal visibility, without independent validation of this approach. This fixed number was selected to optimise clinical usability by fitting clearly into a single-page report format, allowing high-resolution image display without cognitive overload. The approach follows a “worst-first” principle, whereby if the most poorly visualised frames in a colonic segment are deemed adequate, the remainder of the segment can reasonably be assumed adequate. Conversely, if the worst frames or sections are inadequate, the whole segment would be considered inadequate overall, prompting a follow-up colonoscopy regardless of the remainder. To maintain reader autonomy, flagged frames were accompanied by timestamps, allowing further review of adjacent video segments when needed. This strategy supports a semi-automated, human-in-the-loop workflow and represents a practical first step in validating AI-assisted cleansing evaluation in clinical settings.
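The "worst-first" selection reduces to a per-segment ranking of predicted mucosal visibility. A minimal sketch follows, with simulated per-frame outputs; the column names (segment, timestamp, visible_mucosa_pct) are hypothetical.

```r
# Sketch of the "six worst frames" selection: for each colonic segment, take the
# six frames with the lowest predicted visible-mucosa percentage. Data simulated.
library(dplyr)

set.seed(7)
per_frame <- data.frame(
  segment            = rep(c("right", "transverse", "left"), each = 200),
  timestamp          = rep(seq_len(200), times = 3),
  visible_mucosa_pct = runif(600, 30, 100)
)

worst6 <- per_frame %>%
  group_by(segment) %>%
  slice_min(visible_mucosa_pct, n = 6, with_ties = FALSE) %>%
  arrange(segment, timestamp)   # timestamps let readers review adjacent footage
worst6
```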
Building on the work of Gilabert et al., our study required all original readers from the initial standard read to undergo an 8–24-week washout period to minimise recall and reporting bias before reassessing the same 25 videos in the AI-assisted arm. Readers were briefed on the AI-assisted reading approach using a detailed instruction document, and optional supplementary training was provided either in person or via virtual meetings to ensure consistency in interpretation. During this phase, readers were limited to the AI-generated visual guide, which included six flagged frames per segment along with the option to review a small number of adjacent frames via RAPID software v9 [18] as needed. As the AI output was based on the CC-CLEAR scale, assessments in the AI-assisted arm were limited to CC-CLEAR scoring only (Figure 3). To minimise bias, all readers were blinded to each other's scores during both rounds.
All datasets were assessed for missing values before statistical analysis. If missing data were present, the pattern and extent of missingness were examined. Given the observational design, we planned to exclude data points with missing values if they were minimal, non-systematic, and unlikely to bias the results. No imputation was planned unless missingness exceeded 5% or showed a systematic pattern [19]. For interobserver agreement analyses, any missing reader scores were omitted on a per-segment basis.

2.4. Statistical Analysis

Interobserver agreement among CCE readers, with and without AI assistance, was assessed using Fleiss' Kappa, with bootstrapping (1000 iterations) applied to estimate 95% confidence intervals. Fleiss' guidelines characterise Kappa values over 0.75 as excellent, 0.40 to 0.75 as fair to good, and below 0.40 as poor [20]. Agreement was evaluated both overall and by colonic segment, using both the Leighton–Rex and CC-CLEAR scoring systems. Although intraclass correlation coefficients (ICC) have limitations when applied to categorical data, the ICC was included in this study to maintain consistency with previous literature, where it has been commonly used to evaluate overall reliability across raters in bowel cleansing assessment [10,21,22]. Given that the scoring systems used in CCE represent quasi-continuous ordinal scales, the ICC was used alongside Fleiss' Kappa to enhance comparability with previous work and to offer a comprehensive picture of interobserver variability. Agreement levels were interpreted using the criteria established by Landis and Koch, which classify values < 0 as no agreement, 0–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement [23].
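For concreteness, a minimal sketch of the interobserver computation using the irr and boot packages named in this section is shown below; the simulated ratings matrix stands in for the real per-segment scores.

```r
# Sketch of the interobserver analysis: Fleiss' kappa and ICC across readers,
# with a 1000-replicate bootstrap (BCa) CI for kappa. Ratings are simulated
# (rows = video segments, columns = readers, values = CC-CLEAR points 0-3).
library(irr)   # kappam.fleiss(), icc()
library(boot)  # boot(), boot.ci()

set.seed(42)
scores <- matrix(sample(0:3, 75 * 9, replace = TRUE), ncol = 9)

kappam.fleiss(scores)                               # multi-rater agreement
icc(scores, model = "twoway", type = "agreement")   # reliability across raters

b <- boot(scores,
          statistic = function(d, idx) kappam.fleiss(d[idx, ])$value,
          R = 1000)
boot.ci(b, type = "bca")                            # BCa 95% CI for kappa
```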
Intraobserver variability comparing the agreement of standard and AI-assisted reads by the same reader was assessed using weighted Cohen’s Kappa (κ) to account for the ordinal nature of the CC-CLEAR scores. To evaluate whether AI-assisted readings were clinically equivalent to standard clinician readings, both the paired t-test and Two One-Sided Tests (TOST) methodologies were applied [24]. Equivalence bounds were defined as ±1 CC-CLEAR point, representing the maximum difference considered clinically acceptable. For the paired TOST, equivalence was concluded if both one-sided tests yielded statistically significant results (p < 0.05). All statistical analyses were conducted in R version 2025.05.1 [25] using the following packages: “psych” for reliability analysis [26], “dplyr” [26] for data manipulation, “effsize” [27] for effect size analysis, “irr” [28] for agreement measures, “boot” for bootstrapped confidence intervals [29], and “TOSTER” for equivalence testing. All visualisations were created using the “ggplot2” package [26].
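A corresponding sketch of the intraobserver comparison follows, using the irr and TOSTER packages cited above. The vectors `std` and `ai` are hypothetical segment scores for one reader from the two arms, and the t_TOST()/eqb interface assumes a recent TOSTER release.

```r
# Sketch of the intraobserver analysis for one reader: weighted Cohen's kappa
# on the ordinal scores, a paired t-test, and TOST equivalence testing with
# bounds of +/-1 CC-CLEAR point. Scores are simulated.
library(irr)     # kappa2()
library(TOSTER)  # t_TOST()

set.seed(1)
std <- sample(0:3, 75, replace = TRUE)           # standard-read scores
ai  <- pmax(std - rbinom(75, 1, 0.5), 0)         # slightly lower AI-assisted scores

kappa2(cbind(std, ai), weight = "squared")       # ordinal-weighted agreement
t.test(ai, std, paired = TRUE)                   # mean paired difference
t_TOST(x = ai, y = std, paired = TRUE, eqb = 1)  # equivalence within +/-1 point
```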
To assess the stability of interobserver agreement in both the manual and AI-assisted reads, a sensitivity analysis was performed using a leave-one-observer-out approach. This method systematically excludes one observer at a time to determine whether any individual rater disproportionately influences the overall agreement. This was particularly important because one observer dropped out after the washout period, potentially impacting the reliability of consensus. Agreement was quantified using both Fleiss' Kappa and ICC. For each reduced set of observers, we computed the agreement statistics and used a non-parametric bootstrap procedure with 1000 replicates to estimate 95% confidence intervals. The Bias-Corrected and Accelerated (BCa) method was employed via the "boot.ci" function (type = "bca") from the R "boot" package. To evaluate whether the agreement values obtained after excluding an observer differed significantly from the overall mean, empirical two-tailed p-values were calculated from the bootstrap distribution.
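The leave-one-observer-out procedure can be expressed compactly. The sketch below drops each reader in turn, recomputes Fleiss' kappa, and attaches a bootstrapped BCa interval, again on simulated scores.

```r
# Sketch of the leave-one-observer-out sensitivity analysis: exclude one reader
# at a time, recompute Fleiss' kappa on the remaining columns, and attach a
# bootstrapped BCa 95% CI (1000 replicates). Ratings simulated for illustration.
library(irr)
library(boot)

set.seed(42)
scores <- matrix(sample(0:3, 75 * 8, replace = TRUE), ncol = 8)

loo <- lapply(seq_len(ncol(scores)), function(k) {
  reduced <- scores[, -k]
  b  <- boot(reduced, function(d, idx) kappam.fleiss(d[idx, ])$value, R = 1000)
  ci <- boot.ci(b, type = "bca")$bca[4:5]
  data.frame(dropped_reader = k, kappa = b$t0, ci_low = ci[1], ci_high = ci[2])
})
do.call(rbind, loo)
```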

2.5. Ethical Approval and Funding

The CESCAIL study received ethical approval from the Southwest–Central Bristol Research Ethics Committee (REC reference: 21/SW/0169) and was registered on ClinicalTrials.gov (NCT06008847). The main study was funded by the National Institute for Health and Care Research (NIHR) through the AI Award programme (Award number: NIHR AI_AWARD02440). The design, conduct, data collection, analysis, and reporting of this study were carried out independently of the funders. All participants provided written informed consent after receiving verbal and written information about this study.

3. Results

The evaluations from both the standard and AI-assisted bowel cleansing assessments of 25 videos, including interobserver and intraobserver agreements, are summarised in Table 2 and Table 3. One reader dropped out following an extended 6-month intermission. For the Leighton–Rex scores, interobserver agreement was poor by Fleiss' Kappa (0.15) and moderate by ICC (0.55). In contrast, the CC-CLEAR score showed fair agreement by Fleiss' Kappa (0.27) and excellent agreement by ICC (0.90). Subgroup analyses revealed that experienced readers demonstrated marginally higher agreement than less experienced readers in both scoring systems (Table 2).
In the AI-assisted arm, the agreement did not consistently improve. When accounting for sampling variability via bootstrap resampling (appropriate given the smaller sample size), Fleiss’ Kappa decreased to 0.14 vs. 0.27 for CC-CLEAR, and bootstrapped ICCs were also reduced to 0.59 vs. 0.69 for CC-CLEAR. Subgroup analysis indicated that experienced readers maintained higher interobserver agreement (Fleiss’ Kappa: 0.41, ICC: 0.87) compared to less experienced readers (Fleiss’ Kappa: 0.15, ICC: 0.56). When comparing interobserver agreement between the standard and AI-assisted arms, bootstrapped ICC values were consistently lower in the AI-assisted read compared to the standard read (Table 2). A paired t-test of raw CC-CLEAR scores showed a mean difference of −1.46 points (95% CI: −1.58 to −1.33; p < 0.001) in the AI-assisted read when compared to the standard read, supported by a Cohen’s d of –0.74 (indicating a moderate-to-large effect size) (see Table 3). TOST analyses further confirmed statistically significant differences in scoring between AI-assisted and standard reads across all readers, consistent with the decline in CC-CLEAR scores observed in the paired t-test in the AI-assisted read (Supplementary Table S4). These findings suggest that AI-assisted scoring did not enhance interobserver agreement and may have reduced scoring consistency, particularly among less experienced readers.
Intraobserver agreement, assessed by comparing each reader’s standard and AI-assisted scores, was excellent among all experienced readers. In contrast, half of the less experienced readers demonstrated poor or no agreement. These patterns were consistent across both ICC and weighted Cohen’s Kappa (κ) metrics (Table 3).
The sensitivity analysis revealed no statistically significant outliers in the manual read for either ICC or Fleiss’ Kappa, nor in the AI-assisted ICC (see Supplementary Tables S5–S7 and Figure S1). However, the AI-assisted Fleiss’ Kappa analysis identified four observers whose exclusion led to statistically significant changes in agreement (p < 0.05) in Table S8 and Figure S2. Notably, the removal of two less experienced observers from the same centre resulted in a notable increase in agreement. In contrast, the exclusion of two other observers, one experienced and one inexperienced, both from a different centre, led to a decrease in agreement. These findings suggested the influence of individual raters and institutional contexts on interobserver reliability within AI-assisted evaluation frameworks only.

4. Discussion

In CCE, bowel preparation is traditionally assessed through full manual video review, conducted alongside the evaluation for colonic pathologies. However, as AI becomes increasingly embedded in clinical workflows, enhancing diagnostic efficiency, the necessity for traditional full CCE video review is anticipated to diminish. AI algorithms increasingly filter and prioritise the most relevant (typically pathology-positive) frames, thereby reducing the time burden on readers. As a result, there is a growing need for efficient and reliable methods to assess bowel cleanliness without requiring full video examination. While several studies have evaluated AI-based bowel preparation assessment using manual readings as the reference standard, most are image-based rather than video-based, potentially limiting their clinical applicability [2,3,30]. Notably, Schelde-Olesen et al. recently reported minimal agreement between AI output and CCE readers' assessments when AI was used entirely autonomously on video-based analysis [4]. While high-quality reference standards may improve agreement between AI and human readers, excluding human oversight could undermine this agreement as well as the trustworthiness of AI-generated scores.
To our knowledge, this is the first study to address this issue by implementing an AI-assisted, rather than fully autonomous, bowel cleansing assessment, aligning with the principle of "keeping the human in the loop" [4]. Our evaluation centres on a human-in-the-loop workflow purpose-built for colon capsule, rather than a fully autonomous one, to test whether targeted AI guidance can standardise cleanliness scoring across readers with varying levels of experience. This hybrid approach was intended to preserve clinical control and judgement while improving workflow efficiency. Despite this, our findings revealed that interobserver agreement remained low, even with AI assistance. The interpretation of both the cleansing timeline and the selection of the six worst frames remained highly subjective, particularly among less experienced readers. In subgroup analyses, experienced readers consistently demonstrated significantly higher agreement (Fleiss' Kappa = 0.41, ICC = 0.87) compared to less experienced readers (Fleiss' Kappa = 0.15, ICC = 0.56). This may be because experienced readers placed greater emphasis on visual assessment of the worst images, while less experienced readers tended to rely more heavily on the AI-computed cleansing scores displayed in the timeline (over-reliance from automation bias). Additionally, the AI algorithm was trained using annotations from expert CCE readers, rather than a mix of experience levels (miscalibration) [31]. This may partly explain the higher concordance observed among experienced readers and could also amplify reliance or create mismatches for novices. Future iterations should therefore include calibration across experience levels and integrate explicit user-feedback loops.
However, interpreting these timelines is complex and subject to several limitations. Firstly, unlike colonoscopy, the capsule’s bidirectional movement and dual-camera views allow for mucosal surfaces obscured in one frame to be visualised in another. The timeline’s per-frame cleansing estimates do not account for this spatiotemporal integration, which human readers often perform intuitively. A promising direction for future research is the integration of spatial mapping into AI algorithms, enabling them to recognise regions of the bowel that have been adequately visualised from multiple angles [1]. Such spatiotemporal modelling would not only enhance the accuracy of bowel cleansing assessment but would also be critical for reliable polyp localisation and for distinguishing multiple lesions within the same colonic segment [4,12]. This spatial localisation capability has already been used in the gastric magnetic capsule technologies [4].
Secondly, AI assessments are fully quantitative, based solely on the percentage of visible mucosa. In contrast, clinician assessments, even when using structured tools like the CC-CLEAR scale, retain a degree of subjectivity and qualitative interpretation. This discrepancy was evident in our intraobserver analysis, where most readers showed a statistically significant reduction in segment CC-CLEAR scores during AI-assisted assessments, as demonstrated by paired t-tests and TOST. On average, scores declined by 1.46 points on the CC-CLEAR scale in the AI-assisted read (Table 3 and Figure S3 in Supplementary Materials). Consequently, the lower AI-assisted cleansing scores may result in more patients being referred for unnecessary colonoscopy due to poor bowel cleansing, thereby affecting both cost-effectiveness and patient burden [4]. Importantly, although AI-assisted and manual reads were not statistically equivalent, this does not imply that the AI approach is inaccurate. Rather, the AI-assisted method failed to reproduce the outcomes of full manual assessment, particularly concerning clinical judgement and interobserver agreement. This limitation may also stem from the "six worst frames" method, which, while conceptually sound, may not yet be the optimal way to capture the true cleansing quality of a segment. Future studies should refine this approach, for instance, by selecting a larger or variable number of frames depending on segment quality, with more frames presented when cleansing is poor to provide a more accurate assessment. Currently, manual reader-based evaluation remains the reference standard, and AI tools will require further refinement to meet or surpass this benchmark before they can be adopted widely in clinical practice.
The sensitivity analysis revealed that readers who trained and worked closely together exhibited similar interpretive patterns. Notably, the removal of two readers from the same institution, one of whom was a nurse routinely pre-reading for a consultant, led to a decrease in overall agreement, while the removal of another reader pair from a different centre with a similar nurse–consultant dynamic increased agreement. These findings suggest that institutional training environments and shared interpretive frameworks can significantly shape scoring behaviour and influence interobserver reliability. Despite the subjective nature of bowel preparation assessment, the results demonstrated the potential for harmonised training to enhance consistency, particularly in AI-assisted workflows. This further reinforces the necessity of external validation through multicentre studies to ensure the generalisability of AI-assisted approaches.
Moreover, our results reaffirm prior studies indicating that the CC-CLEAR score yields higher interobserver consistency compared to Leighton–Rex. In our study, CC-CLEAR showed better agreement (Fleiss’ Kappa = 0.27, ICC = 0.90) than Leighton–Rex (Fleiss’ Kappa = 0.15, ICC = 0.55), consistent with prior literature [14].
Finally, a major limitation of this study is the small number of readers, with the dropout of one experienced reader potentially introducing bias. Another limitation is the lack of direct evaluation and comparison of reading efficiency between the two arms. The potential efficiency gain remains theoretical and requires validation in prospective time-and-motion studies. In this assisted paradigm, a potential clinical risk is that conservatively low AI-generated cleansing scores could trigger unnecessary conversion to colonoscopy when a segment might otherwise be judged adequate on full review. While AI-assisted reading of CCE is feasible, further refinement is essential to improve intraobserver and interobserver agreement and foster greater trust among clinicians. Future studies should involve larger and more diverse reader cohorts, ideally incorporating a qualitative component to explore the dynamics of reader–AI interaction. Understanding the human factors that shape trust, reliance, and interpretation of AI-generated outputs will be critical to the effective and sustainable integration of AI into clinical CCE workflows.

5. Conclusions

In summary, AI assistance did not improve interobserver agreement overall and, in fact, reduced consistency among less experienced readers, whereas experienced readers maintained excellent intraobserver reliability. These findings highlight that the effectiveness of AI-assisted interpretation remains highly dependent on reader experience. Future studies should prioritise spatial and segmental mapping, as well as user-level calibration, to improve the accuracy of bowel cleansing assessment, prevent unnecessary colonoscopy conversions, and support standardised adoption across centres. Importantly, cleansing evaluation represents only one element of the broader algorithmic framework needed to deliver a fully integrated AI-assisted CCE diagnostic service.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers17172840/s1, Table S1: Inclusion and exclusion criteria for NHS England Criteria used in the CESCAIL study [4]. Table S2: The CCE Readers List. Table S3: CCE readers and experience. Table S4: Summary of Paired TOST and NHST. Table S5: Sensitivity Analysis of Manual Reads Using Bootstrapped Fleiss' Kappa. Table S6: Sensitivity Analysis of Manual Reads Using Bootstrapped ICC. Table S7: Sensitivity Analysis of AI-assisted Reads Using Bootstrapped ICC. Table S8: Sensitivity Analysis of AI-assisted Reads Using Bootstrapped Fleiss' Kappa. Figure S1: Sensitivity analysis using a leave-one-observer-out approach: (a) Fleiss' Kappa for manual reads; (b) Intraclass Correlation Coefficient (ICC) for manual reads. Figure S2: Sensitivity analysis using a leave-one-observer-out approach: (a) Fleiss' Kappa values after sequential removal of each observer. Removal of AM and AR resulted in lower overall agreement, while removal of BBL and CS improved agreement; (b) Intraclass Correlation Coefficient (ICC) for AI-assisted reads. Figure S3: Box plot comparing mean scores in the AI-assisted arm against the standard arm using the CC-CLEAR score. The AI-assisted group shows consistently lower total CC-CLEAR scores compared to clinician-only ratings, indicating stricter or more conservative evaluations by the AI-assisted method. The boxes represent interquartile ranges (IQR), with horizontal lines indicating medians, and whiskers denoting 1.5× IQR.

Author Contributions

I.I.L. conceptualised, designed, and conducted the project; managed administration and data collection; CCE AI-assisted and panel reading; accessed all raw datasets; conducted the statistical analysis; and prepared, reviewed and edited the draft manuscript. N.P. and R.P.A. supervised the project, accessed all raw datasets, curated and verified data, and contributed to reviewing and editing the draft manuscript. H.W., C.H. and E.W. supervised and administered the project. D.R.G., A.R., B.S.-O., A.M., A.B., B.B.L., C.S. and U.V. performed CCE readings in standard and AI-assisted arms, reviewed and verified data, and participated in reviewing and editing the manuscript. A.K., P.G., P.L. and S.S. reviewed and verified data and contributed to the reviewing and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the National Institute for Health and Care Research (NIHR) under the AI Award (NIHR AI_AWARD02440).

Institutional Review Board Statement

The CESCAIL study was conducted in accordance with the principles of the Declaration of Helsinki. Ethical approval was granted by the Southwest–Central Bristol Research Ethics Committee (REC reference: 21/SW/0169) on 12 November 2021. This study also received Health Research Authority (HRA) and Health and Care Research Wales (HCRW) approval on 23 November 2021. University Hospitals Coventry and Warwickshire (UHCW) NHS Trust acted as the study sponsor.

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study, and written informed consent has also been obtained from the patients to publish this paper.

Data Availability Statement

Owing to GDPR restrictions, raw capsule videos cannot be shared. De-identified scoring matrices and complete R analysis scripts are available from the corresponding author upon reasonable request. Controlled on-site or secure-environment video access may be arranged subject to appropriate approvals.

Conflicts of Interest

Hagen Wenzek, Elizabeth White, and Pablo Laiz are affiliated with CHI and GI Digital, Inc., the organisation that holds the intellectual property rights to the AI used in this study. This project was additionally supported by funding from the National Institute for Health and Care Research.

Abbreviations

AI: Artificial intelligence
AI-SPEED™: AI-assisted System for Polyp and Endoscopy Evaluation and Detection
BCa: Bias-Corrected and Accelerated (bootstrap method)
CADe: Computer-Aided Detection
CCE: Colon capsule endoscopy
CC-CLEAR: Colon Capsule CLEansing Assessment and Report
CE: Capsule endoscopy
CESCAIL: Capsule Endoscopy at Scale through Enhanced AI Analysis Study
CI (bootstrap): Confidence interval from bootstrapping
Fleiss' Kappa: Statistical measure for multi-rater agreement
k/κ: Cohen's Kappa (inter-rater reliability)
MRI: Magnetic Resonance Imaging
MRE: Magnetic Resonance Enterography
NIHR: National Institute for Health and Care Research
NHS: National Health Service
NPV: Negative Predictive Value
OE: Optical Endoscopy
PCE: Panenteric capsule endoscopy
PEG: Polyethylene Glycol
PPV: Positive Predictive Value
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses (referenced in style, not acronym use)
R: R Programming Language (used for analysis)
RAPID: Rapid Access Real-Time Device (Medtronic software)
REC: Research Ethics Committee
S1/S2/S3: Supplementary Tables/Figures (and their annotations, respectively)
SD: Standard Deviation
TOSTs: Two One-Sided Tests (equivalence testing)
TransUNet: Neural network architecture for image segmentation

References

  1. Nadimi, E.S.; Braun, J.M.; Schelde-Olesen, B.; Khare, S.; Gogineni, V.C.; Blanes-Vidal, V.; Baatrup, G. Towards full integration of explainable artificial intelligence in colon capsule endoscopy’s pathway. Sci. Rep. 2025, 15, 5960. [Google Scholar] [CrossRef]
  2. Buijs, M.M.; Ramezani, M.H.; Herp, J.; Kroijer, R.; Kobaek-Larsen, M.; Baatrup, G.; Nadimi, E.S. Assessment of bowel cleansing quality in colon capsule endoscopy using machine learning: A pilot study. Endosc. Int. Open 2018, 6, E1044–E1050. [Google Scholar] [CrossRef]
  3. Becq, A.; Histace, A.; Camus, M.; Nion-Larmurier, I.; Abou Ali, E.; Pietri, O.; Romain, O.; Chaput, U.; Li, C.; Marteau, P.; et al. Development of a computed cleansing score to assess quality of bowel preparation in colon capsule endoscopy. Endosc. Int. Open 2018, 6, E844–E850. [Google Scholar] [CrossRef] [PubMed]
  4. Schelde-Olesen, B.; Herp, J.; Braun, J.-M.; Koulaouzidis, A.; Bjørsum-Meyer, T.; Kaalby, L.; Baatrup, G.; Nadimi, E.S.; Deding, U. Interobserver agreement between an artificial intelligence algorithm and colon capsule endoscopy readers on bowel-cleansing quality. iGIE 2023, 2, 148–153.E3. [Google Scholar] [CrossRef]
  5. Lei, I.I.; Nia, G.J.; White, E.; Wenzek, H.; Segui, S.; Watson, A.J.M.; Koulaouzidis, A.; Arasaradnam, R.P. Clinicians’ Guide to Artificial Intelligence in Colon Capsule Endoscopy-Technology Made Simple. Diagnostics 2023, 13, 1038. [Google Scholar] [CrossRef]
  6. Microsoft Corporation. Microsoft Excel; Version 2022; Microsoft Corporation: Redmond, WA, USA, 2022. [Google Scholar]
  7. Lei, I.I.; Tompkins, K.; White, E.; Watson, A.; Parsons, N.; Noufaily, A.; Segui, S.; Wenzek, H.; Badreldin, R.; Conlin, A.; et al. Study of capsule endoscopy delivery at scale through enhanced artificial intelligence-enabled analysis (the CESCAIL study). Color. Dis. 2023, 25, 1498–1505. [Google Scholar] [CrossRef]
  8. Primary Care Diagnostic Pathway for Lower Gastrointestinal (GI) Symptoms in Adults (Not for Acutely Unwell Patients). Available online: https://www.whatsupwithmygut.org.uk/healthcare#adult-pathway (accessed on 28 December 2024).
  9. Turvill, J.; Haritakis, M.; Pygall, S.; Bryant, E.; Cox, H.; Forshaw, G.; Musicha, C.; Allgar, V.; Logan, R.; McAlindon, M. Multicentre Study of 10,369 Symptomatic Patients Comparing the Diagnostic Accuracy of Colon Capsule Endoscopy, Colonoscopy and CT Colonography. Aliment. Pharmacol. Ther. 2025, 61, 1532–1544. [Google Scholar] [CrossRef] [PubMed]
  10. Schelde-Olesen, B.; Koulaouzidis, A.; Deding, U.; Toth, E.; Dabos, K.J.; Eliakim, A.; Carretero, C.; Gonzalez-Suarez, B.; Dray, X.; de Lange, T.; et al. Bowel cleansing quality evaluation in colon capsule endoscopy: What is the reference standard? Therap. Adv. Gastroenterol. 2024, 17, 17562848241290256. [Google Scholar] [CrossRef] [PubMed]
  11. Walter, S.D.; Eliasziw, M.; Donner, A. Sample size and optimal designs for reliability studies. Stat. Med. 1998, 17, 101–110. [Google Scholar] [CrossRef]
  12. Lei, I.I.; Koulaouzidis, A.; Baatrup, G.; Samaan, M.; Parisi, I.; McAlindon, M.; Toth, E.; Shaukat, A.; Valentiner, U.; Dabos, K.J.; et al. Rationalizing polyp matching criteria in colon capsule endoscopy: An international expert consensus through RAND (modified DELPHI) process. Therap. Adv. Gastroenterol. 2024, 17, 17562848241242681. [Google Scholar] [CrossRef]
  13. Leighton, J.A.; Rex, D.K. A grading scale to evaluate colon cleansing for the PillCam COLON capsule: A reliability study. Endoscopy 2011, 43, 123–127. [Google Scholar] [CrossRef] [PubMed]
  14. De Sousa Magalhaes, R.; Sousa-Pinto, B.; Boal Carvalho, P.; Rosa, B.; Moreira, M.J.; Cotter, J. Cc-clear (colon capsule cleansing assessment and report): The novel scale to evaluate the quality of bowel preparation in capsule colonoscopy-a prospective validation study. Endoscopy 2021, 53, S193–S194. [Google Scholar] [CrossRef]
  15. Gilabert, P.; Malagelada, C.; Wenzek, H.; Vitrià, J.; Seguí, S. Automated Cleanliness Scoring and Digestive Content Segmentation for Capsule Endoscopy. Artif. Intell. Res. Dev. 2023, 375, 134–135. [Google Scholar] [CrossRef]
  16. Gilabert Roca, P. End-to-End AI Solutions for Capsule Endoscopy: Enhancing Efficiency and Accuracy in Gastrointestinal Diagnostics. Doctoral Dissertation, University of Barcelona, Barcelona, Spain, 2025. Available online: https://hdl.handle.net/10803/694089 (accessed on 8 July 2025).
  17. Gilabert, P.; Malagelada, C.; Wenzek, H.; Watson, A.; Robertson, A.F.; Vitrià, J.; Seguí, S. AI-Assisted Evaluation of Colon Cleanliness in Capsule Endoscopy Videos. Comput. Biol. Med. 2024, preprint. [Google Scholar]
  18. Spada, C.; Riccioni, M.E.; Costamagna, G. Rapid Access Real-Time device and Rapid Access software: New tools in the armamentarium of capsule endoscopy. Expert Rev. Med. Devices 2007, 4, 431–435. [Google Scholar] [CrossRef]
  19. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; Wiley: Hoboken, NJ, USA, 2019. [Google Scholar]
  20. Fleiss, J.L. Statistical Methods for Rates and Proportions, 2nd ed.; Wiley: Hoboken, NJ, USA, 1981. [Google Scholar]
  21. Buijs, M.M.; Kroijer, R.; Kobaek-Larsen, M.; Spada, C.; Fernandez-Urien, I.; Steele, R.J.; Baatrup, G. Intra and inter-observer agreement on polyp detection in colon capsule endoscopy evaluations. United Eur. Gastroenterol. J. 2018, 6, 1563–1568. [Google Scholar] [CrossRef] [PubMed]
  22. Kastenberg, D.; Bertiger, G.; Brogadir, S. Bowel preparation quality scales for colonoscopy. World J. Gastroenterol. 2018, 24, 2833–2843. [Google Scholar] [CrossRef]
  23. Hallgren, K.A. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutor. Quant. Methods Psychol. 2012, 8, 23–34. [Google Scholar] [CrossRef] [PubMed]
  24. Lakens, D. Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses. Soc. Psychol. Personal. Sci. 2017, 8, 355–362. [Google Scholar] [CrossRef]
  25. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024. [Google Scholar]
  26. Revelle, W. psych: Procedures for Psychological, Psychometric, and Personality Research, version 2.3.9; R Package; Northwestern University: Evanston, IL, USA, 2023. [Google Scholar]
  27. Torchiano, M. Effsize: Efficient Effect Size Computation; Version 0.8.1; R Package; R Foundation for Statistical Computing: Vienna, Austria, 2020. [Google Scholar] [CrossRef]
  28. Gamer, M.; Lemon, J.; Fellows, I.; Singh, P. irr: Various Coefficients of Interrater Reliability and Agreement; Version 0.84.1; R Package; R Foundation for Statistical Computing: Vienna, Austria, 2019. [Google Scholar]
  29. Canty, A.; Ripley, B.D. Boot: Bootstrap Functions (Originally by Angelo Canty for S); Version 1.3-28; R Package; R Foundation for Statistical Computing: Vienna, Austria, 2021. [Google Scholar]
  30. Moen, S.; Vuik, F.E.R.; Kuipers, E.J.; Spaander, M.C.W. Artificial Intelligence in Colon Capsule Endoscopy—A Systematic Review. Diagnostics 2022, 12, 1994. [Google Scholar] [CrossRef]
  31. Lyell, D.; Coiera, E. Automation bias and verification complexity: A systematic review. J. Am. Med. Inform. Assoc. 2017, 24, 423–431. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Summary of current gaps and study objectives for AI-assisted bowel cleansing in CCE.
Figure 2. Study flowchart. CCE, colon capsule endoscopy.
Figure 3. Examples of colon capsule endoscopy frames graded according to the Leighton–Rex scale: (A) excellent, (B) good, (C) fair, and (D) poor.
Figure 4. Example of an AI-generated output displaying the cleansing timeline, colour-coded according to CC-CLEAR scores. The timeline shows the percentage of mucosal cleanliness over time: green (>90%), yellow (75–90%), orange (50–75%), and red (<50%). Red timestamps indicate the worst image within each colonic segment, while dotted black lines mark the colonic flexures. The grey zones indicate time intervals containing the worst images, corresponding to segments where the cleanliness graph falls below 90% and subsequently rises above 90%, with the start and end points marked by purple timestamps. The images at the bottom represent the seven worst frames selected by the AI for each segment.
Table 1. Summary of key studies evaluating AI-based bowel cleanliness assessment in CCE [5].
Study | Type of AI | Number of Videos/Frames Analysed | Level of Agreement of AI with Readers, % | Sensitivity | Specificity
Buijs [2] | Non-linear index model | 41 videos | 32% | - | -
Buijs [2] | SVM model | 41 videos | 47% | - | -
Becq [3] | R/G ratio | 216 frames | - | 86.5% | 77.7%
Becq [3] | R/(R + G) ratio | 192 frames | - | 95.5% | 62.9%
Schelde-Olesen [4] | Pixel-level classification using models as originally described by Buijs et al. [2] | 842 videos | Cohen's k = 0.02–0.17 on the 2-point scale; Cohen's k = 0.02–0.16 on the 4-point scale | - | -
Table 2. Summary of the interobserver agreement of both standard read and AI-assisted arms.
Interobserver Agreement – Standard Read
Readers (n = 9) | Fleiss' Kappa | Bootstrapped Fleiss' Kappa (95% CI) | ICC | Bootstrapped ICC (95% CI)
Leighton–Rex (all) | 0.15 | 0.15 (0.11–0.18) | 0.55 | 0.55 (0.48–0.62)
Experienced readers | 0.18 | 0.18 (0.13–0.24) | 0.60 | 0.60 (0.54–0.67)
Less experienced readers | 0.12 | 0.12 (0.06–0.18) | 0.54 | 0.53 (0.46–0.63)
CC-CLEAR (all) | 0.27 | 0.27 (0.23–0.30) | 0.90 | 0.90 (0.86–0.92)
Experienced readers | 0.29 | 0.29 (0.24–0.25) | 0.90 | 0.90 (0.87–0.92)
Less experienced readers | 0.24 | 0.24 (0.18–0.29) | 0.88 | 0.88 (0.83–0.91)

Interobserver Agreement – AI-assisted Read
Readers (n = 8) | Fleiss' Kappa | Bootstrapped Fleiss' Kappa (95% CI) | ICC | Bootstrapped ICC (95% CI)
CC-CLEAR (all) | 0.27 | 0.14 (0.10–0.11) | 0.69 | 0.59 (0.49–0.67)
Experienced readers | 0.41 | 0.27 (0.21–0.33) | 0.87 | 0.68 (0.60–0.75)
Less experienced readers | 0.15 | −0.034 (−0.079–0.004) | 0.56 | 0.51 (0.35–0.63)
Table 3. Intraobserver agreement within the same reader comparing standard vs. AI-assisted read using ICC and weighted Cohen’s Kappa.
Readers (n = 8) | CCE Readers | ICC (95% CI) | Weighted Cohen's Kappa (κ) | p Value
Experienced | Reader 1 | 0.77 (0.68–0.84) | 0.316 | <0.001
Experienced | Reader 2 | 0.90 (0.85–0.93) | 0.321 | <0.001
Experienced | Reader 3 | 0.81 (0.73–0.87) | 0.338 | <0.001
Experienced | Reader 4 | 0.78 (0.69–0.85) | 0.352 | <0.001
Less experienced | Reader 5 | 0.69 (0.57–0.78) | 0.109 | 0.004
Less experienced | Reader 6 | 0.21 (0.02–0.39) | −0.007 | 0.771
Less experienced | Reader 7 | 0.03 (−0.16–0.23) | −0.031 | 0.178
Less experienced | Reader 8 | 0.80 (0.72–0.86) | −0.023 | 0.796

Comparing CC-CLEAR scores between standard and AI-assisted arms
Paired t-test on raw score | Mean difference = −1.46 (−1.58 to −1.33) | Cohen's d (effect size) = −0.74 | p < 0.001
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
