Fecal Immunochemical Tests for Colorectal Cancer Screening: Is Fecal Sampling from Multiple Sites Necessary?

Fecal immunochemical tests (FITs) for hemoglobin (Hb) are increasingly used for colorectal cancer (CRC) screening. Most FIT manufacturers instruct users to obtain fecal samples from multiple parts of one bowel movement. Our aim was to compare the diagnostic performance of FIT based on fecal samples from one versus two different sites of one bowel movement. A total of 1141 screening colonoscopy participants provided two fecal samples from two different sites of a single bowel movement for FIT analyses. There was no statistically significant difference in the diagnostic performance of the FIT when either one or both fecal samples were used for analysis, with the area under the curve (AUC) for detecting CRC ranging from 0.94 (95% confidence interval (CI) 0.84–0.99) for one FIT to 0.95 (95% CI 0.86–0.99) for the geometric mean of two FITs. The manufacturers’ recommendation of sampling multiple sites of the stool aims to reduce intra-individual Hb variability and improve diagnostic performance. If no such improvement can be achieved, the recommendation for multiple-site sampling might adversely affect population adherence to FIT-based CRC screening. Our results point to a potential for increasing adherence to FIT screening by simplifying instructions for fecal sampling without loss of diagnostic performance.


Introduction
Fecal immunochemical tests (FITs) for hemoglobin (Hb) are increasingly recommended and used for colorectal cancer (CRC) screening [1][2][3]. Typically, fecal sampling from only one bowel movement is required for FITs, as previous studies have shown little, if any, gain in diagnostic performance when combining FIT results from two or three bowel movements [4]. Nevertheless, most FIT manufacturers provide detailed and rather unappealing instructions on how to obtain fecal samples from multiple parts of the same bowel movement in order to account for different fecal Hb concentrations within a single bowel movement. Table 1 provides examples of stool sampling instructions for a number of quantitative FITs.
However, it is unclear whether multiple-site sampling is superior to single-site fecal sampling. To the best of our knowledge, no previous study has assessed the potential gain in diagnostic performance of multiple-site versus single-site fecal sampling from the same bowel movement among average-risk screening colonoscopy participants. In this study, we aimed to provide empirical evidence on inter-site variation of fecal Hb concentrations within a single bowel movement and its potential relevance for the diagnostic performance of FIT, comparing single-site versus multiple-site fecal sampling in a large screening study from Germany.

Results
A total of 1141 CRC screening participants from the BliTz study were included in this analysis. Of those, 50.3% were women, and the median age was 60 years (Table 2). One hundred and twenty-five participants were diagnosed with advanced neoplasms, comprising CRC (n = 14) or advanced adenoma (n = 111), while 1016 participants had no advanced neoplasm detected at screening colonoscopy.
Indicators of the diagnostic performance of both FITs are shown in Table 3. There was no statistically significant difference in test performance between the first single-site sample (FIT1) and the second single-site sample (FIT2). The sensitivity for detecting CRC was 92.9% (95% CI 66.1-99.8) for each single-site FIT and for all combinations of the two FITs (multiple-site). For detecting advanced adenomas, the sensitivity was 38.7% (95% CI 29.6-48.5) for FIT1 and 37.8% (95% CI 28.8-47.5) for FIT2. When combining CRC and advanced adenomas into a group of advanced neoplasms, the sensitivity was 44.8% (95% CI 35.9-54.0) for FIT1 and 44.0% (95% CI 35.1-53.2) for FIT2. The specificity for no advanced neoplasms was 90.0% and 89.5% for FIT1 and FIT2, respectively.
Combining FIT results according to algorithm I (PP: two positive FIT results) resulted in a lower sensitivity of 38.4% (95% CI 29.8-47.5) for the detection of advanced neoplasms and a higher specificity of 93.3% (95% CI 91.6-94.8) for no advanced neoplasms. When combining both FIT results according to algorithm II (PN: at least one positive FIT result), the sensitivity for detecting advanced neoplasms increased to 50.4% (95% CI 41.3-59.5), with the specificity decreasing to 86.1% (95% CI 83.8-88.2). Combinations based on the arithmetic or geometric mean, which simulate multi-site sampling of the stool, resulted in sensitivities and specificities that were similar to those of the single FITs for detecting CRC, advanced adenomas, or their combination.
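As an illustration of how an exact confidence interval of the kind reported above can be obtained, the following minimal Python sketch computes a Clopper-Pearson interval by bisection. It assumes that "exact" refers to the Clopper-Pearson method and that the CRC sensitivity of 92.9% corresponds to 13 of 14 detected cases; the function names are hypothetical, and the original analysis was carried out in R rather than with this code.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials (hypothetical helper)."""

    def bisect(f):
        # f must be increasing in p with f(0) < 0 < f(1)
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= x | p) = alpha / 2
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper bound: P(X <= x | p) = alpha / 2
    upper = 1.0 if x == n else bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


# Assumed counts: 13 of 14 CRC cases test-positive (13/14 = 92.9%)
low, high = exact_ci(13, 14)
print(f"sensitivity = {13 / 14:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With only 14 CRC cases, the interval is necessarily wide, which is why identical point estimates for single-site and combined FITs cannot rule out small differences in CRC sensitivity.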
Spearman's rank correlation between the two FITs (single-site) was 0.731. For participants with no detectable blood in FIT1 (n = 787), 98% had Hb concentrations below the manufacturer's cutoff (17 µg Hb/g stool) also in FIT2. For participants with detectable Hb concentrations below the manufacturer's cutoff in FIT1 (n = 196), 15% had Hb concentrations above the manufacturer's cutoff in FIT2.
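For readers unfamiliar with the statistic, Spearman's rank correlation can be sketched as a Pearson correlation computed on ranks, with tied values (such as the many undetectable Hb results) receiving the average of their ranks. The Python code below is a minimal illustration on made-up values, not the study data, and the helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up paired Hb concentrations (µg Hb/g stool) from two sites
fit1 = [0.0, 0.0, 3.2, 12.0, 45.0, 210.0]
fit2 = [0.0, 1.1, 0.0, 19.5, 30.0, 180.0]
print(round(spearman(fit1, fit2), 3))
```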

Discussion
With FIT increasingly being used for detecting advanced adenomas and CRC in screening programs worldwide [2], we looked at providing empirical evidence on inter-site fecal Hb variation within the same bowel movement. To our knowledge, this is the first study evaluating and directly comparing the diagnostic performance of single-site versus multiple-site fecal sampling of the same bowel movement in screening colonoscopy participants. We observed a similar diagnostic performance between single-site and multiple-site fecal sampling for detecting both CRC and advanced adenoma.
In view of the heterogeneity of Hb concentrations within a single bowel movement, FIT manufacturers, in the patient leaflets accompanying the stool collection tubes, call for sampling the stool at multiple sites of the bowel movement (Table 1). The rationale behind this recommendation is that multiple-site sampling may reduce intra-individual variability of FIT results and improve the diagnostic performance. On the other hand, if no such improvement can be achieved, the recommendation for multiple-site sampling might be irrelevant or even harmful, as unnecessarily complex or unpleasant fecal sampling schemes might adversely affect population adherence to FIT-based CRC screening [6].
Although our results indicate that fecal Hb concentrations can differ slightly between different sites of the same bowel movement, combining both FIT results by calculating either an arithmetic or geometric mean, which may simulate stool sampling from multiple sites of the stool sample as is recommended by FIT manufacturers, did not improve the test performance for detecting CRC or advanced adenomas. Similarly, the ROC curves and AUCs were almost identical for single-site versus multiple-site sampling, indicating that multiple-site sampling did not improve the overall diagnostic performance across a wide range of cutoffs compared to single-site fecal sampling.
A few previous colonoscopy-controlled studies [7,8] compared the diagnostic performance of FITs in average-risk screening populations with regard to the number of FIT samples. The authors found that sensitivity increases with an increasing number of FIT samples, while specificity decreases to a similar extent. However, looking at the overall test performance, similar AUCs for the detection of advanced neoplasms were observed. Since the samples for these studies were taken on consecutive days from different bowel movements, the question of whether multiple-site fecal sampling improves the overall test performance of FIT compared to single-site sampling from the same bowel movement was not directly addressed.
The meta-analysis by Lee et al. [4], published in 2014, also looked at the diagnostic accuracy of one, two, or three FIT samples taken from consecutive bowel movements for the detection of CRC in average-risk screening populations. The authors concluded that the characteristics of FIT, such as sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio, were very similar irrespective of the number of stool samples tested, although they found significant heterogeneity in the sensitivity and specificity rates between studies.
The strengths of our study lie in its setting within a true screening population. All the participants in our study, not only those with a positive FIT result, underwent screening colonoscopy, independent of the FIT result, thus enabling us to have a comprehensive look at the diagnostic characteristics of the FIT. Moreover, to our knowledge, this is the first study to compare two stool samples taken on the same day, and from different areas of the same bowel movement. A limitation of our study is the fact that stool samples were not collected directly in original FIT sampling tubes, which are filled with a preservative buffer to slow down hemoglobin decay, but in small containers, and stored frozen until analysis. However, we have previously shown that collection in small containers or samples collected by the participants in FIT sampling tubes provided very comparable data. A comparison of frozen and fresh fecal samples also provided similar results [9]. Furthermore, the analysis was based on fecal sampling from only two different sites of the same bowel movement, while some manufacturers recommend sampling of up to six places in the same stool.

Study Design and Population
Our analysis is based on data from the ongoing BliTz study, whose design has been reported in detail elsewhere [10][11][12][13]. Briefly, the BliTz study was initiated in 2005 with the aim of evaluating and improving non-invasive tools for CRC screening and includes participants of the German screening colonoscopy program, recruited in 20 gastroenterological practices in southwestern Germany. Written informed consent was obtained from each participant in the study. Participants were given stool collection containers for collecting fecal samples prior to preparation for colonoscopy and asked to fill out a self-administered questionnaire including questions regarding health history and lifestyle. The current analyses include BliTz study participants recruited between 2010 and 2012. The following were excluded from the current analyses (Figure 2): participants who conducted stool sampling after preparation for colonoscopy or after colonoscopy (n = 14), those under 50 years or over 79 years of age at the time of colonoscopy (n = 33), participants who reported that they had been diagnosed with CRC in the past or who were suffering from inflammatory bowel disease (n = 8), those who had undergone another colonoscopy in the five years prior to the current colonoscopy (n = 60), and participants who had inadequate bowel preparation prior to colonoscopy (n = 129) or an incomplete colonoscopy (caecum not reached) (n = 23). In total, 1141 participants met all inclusion criteria and were included in this study.

Data and Sample Collection
After signing informed consent forms, BliTz study participants were given two small stool collection containers. They were instructed to collect one stool sample per container, with each sample from a different area of the same bowel movement. Stool collection was done at home before bowel preparation for colonoscopy. No dietary or medicinal recommendations or restrictions were given. The participants were asked to keep the samples frozen, or refrigerated if freezing was not possible, and to bring them to the gastroenterological practice on the day of their colonoscopy. The samples were then directly stored at −20 °C and shipped on dry ice to a central laboratory (see below). Demographic information was obtained from the self-administered questionnaires filled out by all participants.

FIT Analyses
FIT analyses were performed with FOB Gold (Sentinel Diagnostics, Milan, Italy) in a blinded fashion at a central DIN EN ISO 15189 accredited laboratory (MVZ Labor Limbach, Heidelberg, Germany). Reporting and evaluation of the FITs followed FITTER standards [14]. Each collection container held 1 g of stool, and the median time from collection to analysis was five days (IQR = 4-7 days). In the lab, the frozen stool samples were thawed and an automatic stool extraction system was used to extract 10 mg stool, which was then diluted in 1.7 mL extraction buffer (i.e., dilution: 1:170) according to routine clinical practice. The samples were assigned as FIT1 or FIT2 by simple randomization. Both samples were analyzed on the same date, which was recorded, using an Abbott Architect c8000 (Abbott Park, IL, USA) with an analytical working range of 0.034-140 µg Hb/g stool. Classification of FIT results as positive or negative was done at the threshold recommended by the manufacturer (17 µg Hb/g stool).
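The stated sample mass and buffer volume determine how a concentration measured in the extraction buffer translates into µg Hb per g stool. The short Python sketch below shows this mass-balance conversion; the function name is hypothetical, and the buffer concentration of 100 ng Hb/mL is derived here from the stated figures rather than reported in the text.

```python
STOOL_MG = 10.0   # mg stool extracted (from the text)
BUFFER_ML = 1.7   # mL extraction buffer (from the text)


def ng_per_ml_to_ug_per_g(ng_per_ml, stool_mg=STOOL_MG, buffer_ml=BUFFER_ML):
    """Convert ng Hb/mL buffer to µg Hb/g stool.

    Total Hb in the buffer is ng_per_ml * buffer_ml (in ng); dividing by the
    stool mass in mg gives ng/mg, which is numerically equal to µg/g.
    """
    return ng_per_ml * buffer_ml / stool_mg


# The 17 µg Hb/g stool cutoff corresponds to about 100 ng Hb/mL buffer
print(ng_per_ml_to_ug_per_g(100.0))
```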

Statistical Analysis
The current analysis is a post hoc analysis of a sub-group in a large diagnostic study designed to estimate the diagnostic performance of various non-invasive tests compared to screening colonoscopy. This study was therefore not specifically powered or designed to test a specific pre-defined hypothesis. All statistical analyses were conducted using R version 3.4.4 (2018-03-15) [15]. Advanced adenomas were defined as adenomas with at least one of the following: ≥1 cm in size, tubulovillous or villous components, or high-grade dysplasia. The positivity rate, the sensitivity for the detection of CRC, advanced adenomas, or their combination (advanced neoplasms), and the specificity for the absence of advanced neoplasms were calculated with exact 95% confidence intervals (CIs) for each FIT separately and in combination. For the combination of both FIT results, four different algorithms were applied: (1) positive if at least one of the FIT results was above the manufacturer's cutoff; (2) positive if both FIT results were above the manufacturer's cutoff; (3) positive if the arithmetic mean of the results of the two FITs was above the manufacturer's cutoff; and (4) positive if the geometric mean of the results of the two FITs was above the manufacturer's cutoff. Indicators of diagnostic performance were compared using McNemar's exact test.
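The four combination rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the rule labels and the function name are hypothetical, "above the cutoff" is read as strictly greater, and an undetectable result is represented as 0, which forces the geometric mean to 0 (the text does not describe how such values were handled).

```python
from math import sqrt

CUTOFF = 17.0  # µg Hb/g stool, manufacturer's threshold given in the text


def combined_positive(fit1, fit2, rule, cutoff=CUTOFF):
    """Classify a pair of quantitative FIT results under one combination rule."""
    if rule == "at_least_one":      # either result above the cutoff
        return fit1 > cutoff or fit2 > cutoff
    if rule == "both":              # both results above the cutoff
        return fit1 > cutoff and fit2 > cutoff
    if rule == "arithmetic_mean":
        return (fit1 + fit2) / 2 > cutoff
    if rule == "geometric_mean":
        return sqrt(fit1 * fit2) > cutoff
    raise ValueError(f"unknown rule: {rule}")


# Made-up pair: one site well above the cutoff, the other well below it
for rule in ("at_least_one", "both", "arithmetic_mean", "geometric_mean"):
    print(rule, combined_positive(40.0, 5.0, rule))
```

For this discordant pair, the arithmetic mean (22.5) is above the cutoff while the geometric mean (about 14.1) is not, which shows why the two mean-based rules can classify the same participant differently.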
Spearman's rank correlation was used to assess the correlation between the two quantitative FIT results. In order to evaluate the diagnostic performance across different cutoffs, receiver operating characteristic (ROC) curves were plotted and the areas under the curve (AUCs) for the detection of CRC, advanced adenomas, and both of these outcomes combined (advanced neoplasms) were determined using the "pROC" package [16] in R. Confidence intervals (95% CIs) of the AUCs were calculated via nonparametric bootstrapping, replicating random sampling with replacement. Statistical significance of two-sided tests was defined by p-values < 0.05.
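The AUC and its bootstrap confidence interval can be illustrated without any packages. The Python sketch below is not the pROC implementation used in the study: it computes the AUC as the probability that a randomly chosen case has a higher Hb concentration than a randomly chosen non-case (ties counted as one half) and derives a percentile interval by resampling with replacement. Resampling cases and non-cases separately, the number of replicates, the seed, and the toy data are all assumptions made for illustration.

```python
import random


def auc(cases, controls):
    """Probability that a case scores above a control; ties count as 0.5."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(cases, controls, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each group with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(cases, k=len(cases)),
            rng.choices(controls, k=len(controls)))
        for _ in range(n_boot)
    )
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


# Made-up Hb concentrations (µg Hb/g stool), not study data
cases = [120.0, 45.0, 8.0, 300.0, 17.5]
controls = [0.0, 0.0, 2.0, 5.0, 20.0, 0.0, 1.0, 9.0]
print(auc(cases, controls), bootstrap_auc_ci(cases, controls))
```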

Conclusions
In conclusion, despite its limitations, our study suggests that the diagnostic performance of FIT utilizing multiple-site fecal sampling from the same bowel movement may not be superior to a single-site sample in the average-risk screening population for the detection of CRC and advanced adenomas. These results do not support the necessity for sampling the stool in different locations for a FIT, as currently recommended by most manufacturers. Our findings suggest that the simplification of patient instructions for FITs might be considered, as the advantages of the expected increase in patient adherence to simplified instructions may outweigh the negligible, if any, loss in diagnostic performance.