Fecal Immunochemical Tests Detect Screening Participants with Multiple Advanced Adenomas Better than T1 Colorectal Cancers

Simple Summary
Fecal occult blood tests (FOBTs) detect colorectal cancer (CRC) at high levels of sensitivity and specificity. However, the detection of early-stage cancers is highly important to reduce CRC mortality. We aimed to assess the sensitivity of a large number of different FOBTs according to various tumor characteristics. We observed among all FOBTs consistently lower sensitivities for UICC stage I cancers in comparison to more advanced cancer stages. An even stronger gradient was found according to T status, with substantially lower sensitivities for T1 than for T2–T4 cancers. Furthermore, sensitivities for T1 cancers were even lower than sensitivities for detection of multiple advanced adenomas. Further research should focus on improving the sensitivity of non-invasive tests for detection of UICC stage I and T1 cancers.

Abstract
Background: Fecal immunochemical tests (FITs) are widely used for colorectal cancer (CRC) screening. The detection of early-stage cancer and advanced adenoma (AA), the most important premalignant lesion, is highly relevant to reducing CRC-related deaths. We aimed to assess sensitivity for the detection of CRC and AA stratified by tumor stage; number; size; histology of AA; and by location, age, sex, and body mass index (BMI).
Methods: Participants of screening colonoscopy (n = 2043) and newly diagnosed CRC patients (n = 184) provided a stool sample before bowel preparation or CRC surgery. Fecal hemoglobin concentration was determined in parallel by nine different quantitative FITs among 94 CRC patients, 200 AA cases, and 300 participants free of advanced neoplasm. Sensitivities were calculated at original cutoffs and at adjusted cutoffs, yielding 93% specificity among all FITs.
Results: At adjusted cutoffs, UICC stage I cancers yielded consistently lower sensitivities (range: 62–68%) compared to stage II–IV cancers (range: 73–89%). An even stronger gradient was observed according to T status, with substantially lower sensitivities for T1 (range: 39–57%) than for T2–T4 cancers (range: 71–100%). Sensitivities for the detection of participants with multiple AAs ranged from 55% to 64% and were by up to 25% points higher than sensitivities for T1 cancers.
Conclusions: FITs detect stage I cancers and especially T1 cancers at substantially lower sensitivities than more advanced cancer stages. Participants with multiple AAs were detected with slightly lower sensitivities than stage I cancers and with even higher sensitivities than T1 cancers. Further research should focus on improving the detection of early-stage cancers.


Introduction
Fecal immunochemical tests (FITs) for hemoglobin are widely used for colorectal cancer (CRC) screening [1][2][3]. FITs achieve overall high sensitivities for the detection of CRC, in the range of 70-80% at very high specificities of 90-95% [4][5][6]. Detection of advanced adenoma (AA), the most important precursor, and early-stage cancers is highly relevant for the reduction of CRC mortality [7,8]. However, little is known with respect to the detection of CRC and AA stratified by various characteristics.
Two recent publications [9,10] reported that sensitivities for CRC detection tended to be higher with more advanced UICC stage, and differences were suggested to be particularly strong according to T status. However, previous comparisons of stage-specific FIT performance were all performed using one FIT brand each, and specificities as well as sensitivities for the detection of subgroup-specific CRCs and AAs varied widely between the different studies [9]. It is not clear to what extent these variations reflect differences between the various FITs or relate to differences in populations and designs of the studies.
In this study, we simultaneously assessed the sensitivity of nine different FIT brands according to tumor stage (UICC stage or T status) and location; size, histology, and number of AAs; and sex, age, and body mass index (BMI) among participants of screening colonoscopy and newly diagnosed CRC cases.

Methods
We followed the Standards for the Reporting of Diagnostic Accuracy Studies (STARD) [11] and the guideline for Faecal Immunochemical Tests for Haemoglobin Evaluation Reporting (FITTER) [12].

Study Design and Population
This study is based on the BLITZ and DACHSplus studies. Detailed information about both studies has been provided elsewhere [13][14][15][16]. Briefly, the BLITZ study is an ongoing prospective study among participants of screening colonoscopy, who are recruited before their scheduled colonoscopy appointment. The recruitment of study participants was conducted by 20 cooperating study sites and operated under the strict quality assurance criteria of the German screening colonoscopy program. Because of the low CRC prevalence in a screening setting, additional CRC cases from the DACHSplus study were included. In the DACHSplus study, newly diagnosed CRC patients were recruited in 4 cooperating hospitals before their treatment.
Written informed consent was obtained from each study participant. Both studies were approved by the ethics committee of the Medical Faculty Heidelberg of the University of Heidelberg (BLITZ study (178/2005)

Sample and Data Collection
Study participants recruited from 2005 through 2010 were asked to fill a sample cup (60 mL) with feces from a single bowel movement without any dietary or medicinal restrictions before starting bowel preparation for colonoscopy (BLITZ) or surgery (DACHSplus). Furthermore, they were instructed to store the stool-filled cups in a freezer and bring them to their colonoscopy appointment (BLITZ) or hospital admission (DACHSplus). After arrival, the samples were kept frozen at −20 °C and, after an initial fecal hemoglobin measurement, transported on dry ice to the German Cancer Research Center (DKFZ) for final storage at −80 °C.
Colonoscopy and histology reports, as well as medical and histological surgery reports were collected, and relevant data were independently extracted by two medical data managers in a blinded manner (unaware of the FIT results).

Selection of Study Participants
Study participants, who were recruited from 2005 through 2010 and who provided a stool sample were eligible for this study (Figure 1). After excluding participants due to the criteria shown in Figure 1, 1667 BLITZ study participants (screening setting) and 94 clinical CRC cases (clinical setting) were eligible. From both studies, all advanced neoplasm cases (CRC or AA) with enough stool material for the evaluation of 9 FITs were included. One screen-detected CRC case with UICC stage 0 was excluded. Therefore, 15 CRC cases and 200 AA cases from BLITZ and 79 CRC cases from DACHSplus were finally included. For analysis of specificity, 300 randomly selected participants free of CRC and AA, who provided enough stool material, were included. The random selection was performed using the SURVEYSELECT procedure in SAS. More than 95% of the FIT measurements were conducted in the context of a previously reported study [17]. For the 29 CRC cases included in the final analysis, parallel FIT measurements were conducted in the context of another previously reported study [18], in which two (Eurolyser FOB test, OC-Sensor) of the previous nine FITs were not evaluated. Therefore, for these two FITs, the final analysis is based on 65 CRC cases (15 CRC cases from BLITZ + 50 CRC cases from DACHSplus).
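As a rough illustration of the random selection step, the following Python sketch draws a reproducible simple random sample without replacement. It is not the SAS code used in the study: the sampling method is assumed (the text does not state which SURVEYSELECT options were used), and the participant IDs, pool size, and seed are hypothetical.

```python
import random


def select_controls(eligible_ids, n_select, seed):
    """Simple random sample without replacement, reproducible for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(eligible_ids), n_select))


# Hypothetical pool of neoplasm-free participants with enough stool material.
# The IDs, the pool size (1200), and the seed are assumptions for illustration.
eligible = [f"BLITZ-{i:04d}" for i in range(1, 1201)]
controls = select_controls(eligible, n_select=300, seed=2005)
print(len(controls), controls[:3])
```

Fixing the seed makes the selection repeatable, which helps when the same control group has to be reconstructed later.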

Laboratory Analysis
Detailed information about the FITs is shown in Table 1. FIT measurements were performed at the DKFZ in Heidelberg or in nearby laboratories of the manufacturers, as previously reported in detail [17,18]. Briefly, each FIT has a brand-specific fecal sampling tube, which is filled with a hemoglobin stabilizing buffer and contains an extricable serrated stick for the collection of a defined amount of feces (range: 9.5-20 mg). The stick was inserted multiple times into different areas of the stool sample until the serrations (which transfer the defined amount of feces) were completely filled with stool and then pushed back into the tube once. To ensure equal pre-analytic conditions, all FIT tubes were filled in parallel after thawing of stool samples and stored at room temperature (range: 17-25 °C) until parallel FIT measurements on the next day. The externally evaluated FIT tubes were packed in a temperature-isolated manner and immediately sent to the cooperating companies via a special courier service. Test calibrators and test controls were run according to the manufacturers' instructions. All test measurements were conducted in a blinded manner.

Statistical Analysis
All quantitative FIT measurements were converted to the same and directly comparable unit of µg Hb/g feces [19].
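The unit conversion can be sketched as follows, assuming the commonly used relation µg Hb/g feces = (ng Hb/mL buffer × mL buffer) / mg feces collected. The function name and the device parameters in the example are hypothetical and are not taken from Table 1.

```python
def to_ug_hb_per_g_feces(hb_ng_per_ml, buffer_ml, feces_mg):
    """Convert a buffer concentration (ng Hb/mL) to µg Hb/g feces."""
    if buffer_ml <= 0 or feces_mg <= 0:
        raise ValueError("buffer volume and feces mass must be positive")
    # ng/mL * mL gives ng Hb in the tube; ng Hb per mg feces equals µg Hb per g feces.
    return hb_ng_per_ml * buffer_ml / feces_mg


# Hypothetical device: 10 mg of feces collected into 2.0 mL of buffer.
example = to_ug_hb_per_g_feces(hb_ng_per_ml=100.0, buffer_ml=2.0, feces_mg=10.0)
print(example)  # 20.0
```

Because the collected stool mass and buffer volume differ between brands, the same ng/mL reading can correspond to different fecal concentrations, which is why a common unit is needed before cutoffs are compared.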
Sensitivities were calculated for CRC according to UICC stage, T status, tumor location (proximal colon (caecum, ascending colon, hepatic flexure, transverse colon, splenic flexure), distal colon (descending colon, sigmoid colon, rectosigmoid junction), and rectum), sex, age group (50-59, 60-69, 70-79 years), and BMI group (normal weight: 18.5-24.9 kg/m²; overweight: 25.0-29.9 kg/m²; obesity: ≥30 kg/m²). UICC stage definitions followed the AJCC Cancer Staging Manual (7th edition) and are provided in Table S1. Sensitivities for the detection of AA were calculated according to size, villous/tubulovillous architecture, high-grade dysplasia, by location (same definitions as above, but participants with multiple AAs were excluded from the analysis by location only, because the AAs were distributed across different colon sections), number of AAs, sex, age group (as above), and BMI group (as above). Specificities were computed among participants without CRC and AA at screening colonoscopy.
Sensitivity and specificity estimates were computed at the cutoff values recommended by the manufacturers (Table 1). Differences in overall specificities between FITs at their original cutoffs may obscure potential FIT-specific differences in associations between sensitivity and the assessed variables. In order to enhance the comparability of results among the different FIT brands, we additionally calculated sensitivities at cutoffs adjusted to yield an equal overall specificity of 93% [17]. One of the tests, QuikRead go iFOBT, could not be included in this comparison, as the cutoff value could not be lowered below 15 µg/g due to the restricted analytical working range.
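The cutoff adjustment can be sketched as below. This is an assumption-laden illustration rather than the procedure used in the study: a test is taken to be positive when the fecal hemoglobin value is at or above the cutoff, the adjusted cutoff is taken as the smallest observed control value that reaches the target specificity, and all data are made up.

```python
import math


def specificity(control_values, cutoff):
    """Share of neoplasm-free participants testing negative (value < cutoff)."""
    return sum(1 for v in control_values if v < cutoff) / len(control_values)


def sensitivity(case_values, cutoff):
    """Share of cases testing positive (value >= cutoff)."""
    return sum(1 for v in case_values if v >= cutoff) / len(case_values)


def cutoff_for_specificity(control_values, target):
    """Smallest observed control value that reaches the target specificity."""
    for candidate in sorted(set(control_values)):
        if specificity(control_values, candidate) >= target:
            return candidate
    # No observed value suffices; an infinite cutoff makes every test negative.
    return math.inf


# Hypothetical fecal hemoglobin values in µg Hb/g feces (not study data).
controls = [0.5 * i for i in range(100)]  # 0.0, 0.5, ..., 49.5
cases = [2.0, 10.0, 46.5, 80.0, 150.0]

cutoff = cutoff_for_specificity(controls, target=0.93)
print(cutoff, specificity(controls, cutoff), sensitivity(cases, cutoff))
```

Fixing specificity first and reading off sensitivity afterwards puts all brands at the same operating point, so remaining differences in sensitivity are not an artifact of differently chosen manufacturer cutoffs.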
The 95% confidence intervals (CIs) of sensitivities and specificities were calculated using the "exact" (Clopper-Pearson) method. The Cochran-Armitage Test for trend in proportions was used to evaluate the statistical significance of trends in sensitivities across T status, UICC stage, location, age, and BMI. Fisher's exact test was used to test for differences in sensitivities between both sexes, AA size, architecture, dysplasia, and number of AAs.
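For readers who want to reproduce these calculations outside SAS, the three methods can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the Cochran-Armitage statistic uses equally spaced scores and the large-sample normal approximation, the two-sided Fisher p-value sums all tables no more likely than the observed one, and every count in the example is hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(increasing, target):
    """Solve increasing(p) == target for p in [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if increasing(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k positives among n."""
    # Lower limit solves P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2.
    upper = 1.0 if k == n else _bisect(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


def cochran_armitage(positives, totals, scores=None):
    """Return (z, two-sided p) for a linear trend in proportions across ordered groups."""
    if scores is None:
        scores = list(range(len(totals)))  # assumed equally spaced scores
    n_all = sum(totals)
    p_bar = sum(positives) / n_all
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, positives, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_all)
    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    return sum(prob(x) for x in support if prob(x) <= observed * (1 + 1e-9))


# Hypothetical counts, for illustration only (not taken from the study tables).
low, high = clopper_pearson(8, 19)  # e.g. 8 positive tests among 19 cancers
z, p_trend = cochran_armitage([8, 20, 30, 9], [19, 26, 35, 10])  # four ordered groups
p_fisher = fisher_exact_two_sided(18, 10, 50, 117)  # two groups, positive vs. negative
print(f"CI: {low:.3f}-{high:.3f}; trend z={z:.2f}, p={p_trend:.4f}; Fisher p={p_fisher:.4f}")
```

With subgroups as small as those analyzed here, the exact interval is a natural choice because normal-approximation intervals can be too narrow or extend beyond 0% and 100%.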
Statistical analyses were performed using SAS Enterprise Guide, version 7.1 (SAS Institute, Cary, NC, USA). Statistical significance was indicated by two-sided p-values below 0.05.

Study Population
Characteristics of the study population are shown in Table 2. Screening (n = 15) and clinical (n = 79) CRC cases were combined. For two tests (Eurolyser FOB test, OC-Sensor), the evaluation of CRC sensitivity is based on 65 CRC cases overall.
Looking at the tumor location (Table 3), the highest sensitivity was consistently observed for rectum cancers, followed by proximal and distal colon cancers. Furthermore, sensitivities for CRC were consistently higher among men than among women. Younger individuals yielded generally higher sensitivities than older ones, and overweight or obese patients yielded generally higher sensitivities than normal-weight individuals. However, none of these observed differences in sensitivities according to location, sex, age, and BMI reached statistical significance.
Furthermore, sensitivities for AA were consistently higher for males than for females, for older participants than for younger participants, and for obese participants than for overweight or normal-weight participants, even though most of these differences did not reach statistical significance (Table 3).

Sensitivities at Cutoffs Yielding 93% Overall Specificity
In general, overall and subgroup-specific sensitivities became very similar between the different FIT brands when cutoff values were adjusted to yield the same overall specificity (Figure 2B, Table 4).
Very similar to the results obtained at original cutoffs, strong variations of sensitivity of CRC detection by tumor stage were seen (Figure 2B, Table 4). Sensitivities for stage I were by up to 27% points lower in comparison to more advanced stages II-IV; however, the results were not statistically significant for 7 of the 8 FITs. Again, an even stronger gradient in sensitivities was observed according to T status. Sensitivities were by 14-61% points lower for T1 cancers in comparison to T2-T4, and for 7 of the 8 FITs included in this analysis, trends towards higher sensitivities by T status were statistically significant (p < 0.05).
Overall sensitivity for AA ranged from 27% to 32% (median: 30%) (Figure 2B, Table 4). Sensitivities were consistently higher for large AAs (≥1 cm) than for small AAs, and for 7 of the 8 FITs, these differences were statistically significant (p < 0.05). Furthermore, we found in an ancillary analysis that sensitivities for AA increased statistically significantly (p < 0.05) from <0.5 cm to ≥0.5-1 cm, ≥1-3 cm, and ≥3 cm (Table S2). AA with high-grade dysplasia showed by up to 17% points higher sensitivities in comparison to AA without high-grade dysplasia; however, results were not statistically significant. Among participants with a single AA (n = 167), AAs located in the distal colon showed higher sensitivities than AAs located in the rectum or proximal colon, but a statistically significant difference was not found. Again, sensitivities were much higher (by 28-41% points) among participants with multiple AAs than among participants with a single AA, and this difference was statistically significant for all eight FITs (p < 0.005). Additionally, sensitivities for multiple AAs were again higher (by up to 25% points) for 7 of the 8 FITs than for T1 cancers.
In line with results at original cutoffs, highest sensitivities were consistently observed for rectum cancers, whereas sensitivities were even slightly lower for cancers in the distal than for cancers in the proximal colon (Table 4). Furthermore, consistently higher sensitivities for CRC were found for men vs. women; generally slightly higher sensitivities for younger than for older individuals; and for overweight or obese patients, the sensitivities were slightly higher than for normal-weight individuals. Again, however, none of the differences according to location, sex, age, and BMI reached statistical significance.
Furthermore, sensitivities for AA tended to be higher for men than for women even though differences did not reach statistical significance (Table 4). Sensitivities increased consistently from younger to older participants, and for half of the FITs, this trend towards higher sensitivities by age was statistically significant (p < 0.05).

Discussion
In this study, we evaluated for the first time the sensitivity for CRC and AA detection of a large number of different quantitative FITs according to tumor stage (overall stage and T status), histological characteristics of AA, location, sex, age, and BMI, using fecal samples of participants of screening colonoscopy enriched with newly diagnosed clinical CRC cases. Strong associations between more advanced tumor stage and higher sensitivity were observed. These associations were particularly strong and statistically significant by T status (differences between T1 and T2-T4 by up to ~60% points) but also notable among overall UICC stages (differences between stage I and II-IV by up to ~25% points). Participants with multiple AAs yielded only slightly lower sensitivities than those with UICC stage I cancer and by up to 25% points higher sensitivities than those with T1 cancers.
The observed gradient in sensitivities with increasing UICC stage was even stronger than the gradient estimated in a previous meta-analysis [9] where pooled sensitivity for more advanced cancer stage was by up to ~15% points higher than for stage I cancers. However, in a recent study [10] including 435 newly diagnosed CRC cases, a similarly strong difference in sensitivity by up to ~35% points between stage I and more advanced stages (II-IV) was observed. Furthermore, two previous studies [20,21] found an even stronger difference (by ~50% points) in sensitivity between stage I and more advanced cancer stages. With respect to T status, in the aforementioned meta-analysis [9], the observed difference between pooled sensitivity for T1 and T2-T4 cancers (by up to ~40% points) was also less pronounced than in our analysis (by up to ~60% points). However, FIT sensitivities varied widely across the included studies [9], which possibly affected the observed differences. Yet, Hirata et al. [22] and Kim et al. [23] also found a similarly strong difference in sensitivity between T1 and more advanced T statuses (by ~60-70% points), and in a recent publication [10], the difference between T1 and T2-T4 cancers ranged up to ~50% points. Furthermore, we found that sensitivity was the highest among T4 cancers. Even though larger tumors tend to bleed more, a previous study [10] found lower sensitivities in T4 compared to T3 cancers and hypothesized that clinically manifest anemia lowered FIT sensitivity in T4 compared to T3 tumors. Future studies might consider stool and blood hemoglobin levels to investigate this topic in detail. Previous studies suggested consistently higher sensitivities of FIT for distal CRC compared to proximal colon cancers [24]. However, those studies did not differentiate between tumors in the distal colon and tumors in the rectum. To our knowledge, only one previous study [10] reported sensitivities for distal colon cancer and rectal cancer separately.
In our study, we evaluated for the first time the sensitivities for distal colon and rectal cancer separately for a large number of different FITs and found a consistently (albeit not statistically significantly) higher sensitivity for rectal cancer compared to distal colon cancers for all FITs, although there was no participant with T4 cancer among the rectal cancer cases. Interestingly, among participants with a single AA, sensitivities were highest for distal AA cases and lowest for proximal AA cases, although statistical significance was not reached. Future studies reporting on sensitivity according to location should thus consider additional stratification of distal advanced neoplasms into those located in the distal colon and rectum.
In line with findings from previous studies [5,25-28], we observed consistently (albeit in the vast majority not statistically significantly) higher sensitivities among men than among women across all nine FITs. A potential explanation for this observation might be the higher proportion of T1 and lower proportion of T4 cancers among women than among men (32% vs. 11% and 5% vs. 13%, respectively). For AA detection, men showed a higher proportion of multiple AAs (20% vs. 9%) and a higher proportion of large AAs (75% vs. 66%) compared to women. Future studies might consider investigating FIT accuracy in multivariate analysis if case numbers allow it. Moreover, we found lower sensitivities among elderly CRC participants, which is in line with the results of Selby et al. [5]. For the detection of AA, we found a non-statistically significant trend towards higher sensitivities with higher age, which has also been shown in previous studies [26,28] and might be explained by the rising proportion of multiple AAs with age (from 9% to 17% and 27% in age groups 50-59, 60-69, and 70-79 years, respectively).
Our study has a number of strengths. To our knowledge, this is the first study to directly compare sensitivities of different FITs among the same study participants with additional consideration of tumor stage, AA characteristics, location, sex, age, and BMI. This increases comparability relative to studies using only one FIT and rules out differences in study design as an additional reason for varying sensitivities according to patient characteristics. Furthermore, we investigated the FITs at different cutoffs: the original cutoffs and cutoffs adjusted to yield the same overall specificity of 93%. Sensitivities for all FITs were reported stratified by a range of potentially influential factors, such as tumor stage (overall UICC stage and T status) and location; number, size, histology, and location of AA; and age, sex, and BMI. Ours is also the first study to our knowledge that compared the sensitivity of T1 and UICC stage I cancers with sensitivity of AA with various characteristics and found that sensitivities for multiple AAs were particularly high and exceeded sensitivity for T1 cancers by up to 25% points. Furthermore, several exclusion criteria were applied to the recruited study participants: Of the screening participants, we excluded those not in the relevant age for screening (<50 or >79 years), those at elevated risk (IBD [29] or history of colorectal neoplasms) or decreased risk of CRC (colonoscopy in the past 5 years [30]), and participants with a potentially inadequate gold-standard exam (colonoscopy) to verify FIT findings (incomplete colonoscopy or inadequate bowel cleansing). Of the participants recruited after confirmed CRC diagnosis, we also excluded those not in the relevant age for screening or not representative for average-risk participants who developed CRC (prior diagnosis of IBD or history of CRC), and those who had neoadjuvant therapy before stool collection, because chemotherapy might influence FIT results.
Our study also has limitations. Despite the large overall number of FIT measurements (n > 5000), the limited number of CRC and AA cases resulted in wide confidence intervals of subgroup-specific sensitivities and limited power for detecting differences across subgroups. Furthermore, stool samples were frozen and thawed before analysis, which differs from the recommended procedure of sample processing (transferring the fresh stool material directly into the fecal sampling tubes, which are filled with a hemoglobin stabilizing buffer and analyzing the FITs within a short period without any freezing and thawing). However, we found very similar diagnostic performance between both abovementioned fecal sampling procedures [31] and repeated freezing and thawing as well as long-term frozen storage at −80 °C had only very little effect on measurable hemoglobin concentrations and resulting FIT performance [32].

Conclusions
In conclusion, this study found strong differences in sensitivity according to tumor stage, AA size, and number of AAs. Across different FITs, particularly low sensitivities were observed for T1 cancers, and sensitivities were even higher for multiple AAs than for T1 cancers. By contrast, differences in sensitivity were small across the FIT brands when comparing them at equal specificity. These findings warrant further research to investigate