RETRACTED: Using Comorbidity Pattern Analysis to Detect Reliable Methylated Genes in Colorectal Cancer Verified by Stool DNA Test

Abstract: Colorectal cancer (CRC) was the third most commonly diagnosed cancer worldwide in 2020. Colonoscopy and the fecal immunochemical test (FIT) are commonly used as CRC screening tests, but each has its own limitations. Recently, liquid biopsy-based DNA methylation testing has become a powerful tool for cancer screening, and the detection of abnormal DNA methylation in stool specimens is considered an effective approach for CRC screening. The aim of this study was to develop a novel approach to biomarker selection based on integrating primary biomarkers from genome-wide methylation profiles and secondary biomarkers from CRC comorbidity analytics. A total of 125 differentially methylated probes (DMPs) were identified as primary biomarkers from 352 genome-wide methylation profiles. Among them, 51 biomarkers, including 48 hypermethylated DMPs and 3 hypomethylated DMPs, were considered suitable DMP candidates for CRC screening tests. After comparison with commercial kits, three genes (ADHFE1, SDC2, and PPP2R5C) were selected as candidate epigenetic biomarkers for CRC screening tests. Methylation levels of these three biomarkers were significantly higher in patients with CRC than in normal subjects. The sensitivity and specificity of integrating methylated ADHFE1, SDC2, and PPP2R5C for CRC detection reached 84.6% and 92.3%, respectively. Through an integrated approach using genome-wide DNA methylation profiles and electronic medical records, we could design a biomarker panel that allows for early and accurate noninvasive detection of CRC using stool samples.


Introduction
Colorectal cancer (CRC) was the third most commonly diagnosed cancer and the second leading cause of cancer-related death worldwide in 2020 [1]. According to the Taiwan Cancer Registry Annual Report 2018, more than 16,000 new cases of CRC and approximately 6000 CRC-related deaths occurred [2]. The 5-year relative survival rate is greater than 90% for stage I CRC but only 10% for stage IV CRC [3]. Therefore, early detection of CRC is essential for reducing death from CRC.
Several screening tools are available for CRC. Although colonoscopy provides the highest sensitivity and accuracy for the detection of colorectal lesions, some patients refuse this examination because of its high cost, the unpleasant bowel preparation, the time-consuming and uncomfortable procedure, and the risk of bowel perforation and bleeding [4]. As a substitute, the fecal immunochemical test (FIT) is the most commonly used screening method worldwide [5] because it is noninvasive and more economical, can be performed at home, and does not require bowel preparation. Despite these advantages, the sensitivity of the FIT is relatively low. In studies using colonoscopy as a reference standard, the sensitivity of FITs ranged only from 65.8% to 75.0% for invasive cancer and from 27.0% to 29.0% for precancerous lesions [6][7][8][9][10]. Moreover, the FIT performs worse for the detection of right-sided advanced adenomas than for left-sided lesions [11].
To overcome these limitations, the technique of detecting abnormal DNA methylation levels in exfoliated tumor cells from stool specimens has been developed [12]. Unlike the FIT, stool DNA testing not only detects left- and right-sided colorectal neoplasms equally well but also has better adherence [13]. However, despite the approval of multitarget stool DNA testing as an alternative approach for people at average risk for CRC by the U.S. Food and Drug Administration [9], the selection and validation of appropriate DNA methylation biomarkers are time-consuming and require large financial investments. Furthermore, race differences might affect the sensitivity and specificity of the selected biomarkers [14,15].
The Cancer Genome Atlas (TCGA) database provides the most comprehensive and integrated DNA methylation profiles at the genomic level [16]. A large number of genome-wide methylation profiles were produced through Illumina Human Methylation 450K Bead Chip Array experiments. Quality assessment and standard analysis pipelines help bioinformaticians identify effective biomarkers through in silico analysis. However, the paradigm shift in early CRC diagnosis calls for a transition from tissue biopsy to noninvasive stool testing. All biomarkers initially identified from differential methylation analysis of TCGA tissue specimens could be considered good candidates for designing a test panel on tissue specimens. In this study, however, only Asian stool specimens were used for candidate biomarker verification, and only a few candidate biomarkers can be carried into verification experiments, so reducing the candidate set becomes a practical challenge. Under such circumstances, we proposed combining the comorbidities of Asian CRC subjects with the biomarkers identified from TCGA methylation profiles alone; the disease-associated genes identified in this way could serve as constraints to effectively narrow down the number of suitable biomarker candidates.
Taiwan has an internationally well-known National Health Insurance Research Database (NHIRD) that contains all the medical records of Taiwanese citizens. As a population-level data source, Taiwan's NHIRD can serve as a foundation for big data analysis for real-world evidence, especially for Asian populations [17]. By utilizing this database, we previously constructed machine learning approaches and prediction models for amyotrophic lateral sclerosis and demonstrated that a prediction model based on electronic medical records (EMRs) could provide a novel approach to estimate the risk for a specific disease [18]. To enhance the effective selection and utilization of specific DNA methylation biomarkers from stool DNA screening approaches for the diagnosis of CRC, we propose extending a traditional method for DNA methylation biomarker selection with the associated comorbidity patterns and disease-gene associations. This study aimed to analyze the comorbidity patterns of patients with CRC using a comprehensive EMR database annotated in Taiwan's NHIRD and to determine disease-specific associated genes from the identified significant comorbidities. By integrating the primary DNA methylation biomarkers identified from a genome-wide differential DNA methylation analysis with the secondary significant disease-gene associations from historic EMR analysis, we believe that the cross-validated and selected DNA methylation biomarkers could simultaneously increase biomarker sensitivity and specificity and address the issues regarding race difference.

Differential DNA Methylation Analysis for Primary Biomarkers
In this study, DNA methylation profiling datasets (Illumina Human Methylation 450K Bead Chip Array) were downloaded from TCGA database, and the data of a total of 352 subjects, including 314 specimens with CRC and 38 normal specimens, were used for primary biomarker analysis (Figure 1). First, both CRC and normal control groups were integrated and analyzed using the Chip Analysis Methylation Pipeline (ChAMP) [19], a standard pipeline package including quality control and dataset normalization (BMIQ) for detecting differentially methylated probes (DMPs). Using parameter settings on β value differences (Abs(∆Beta) ≥ 0.5) and multiple testing correction by the Benjamini and Hochberg method to control the false discovery rate, a set of DMPs could be identified as general primary biomarker candidates. To identify possible specific biomarker candidates for different racial groups, different racial populations were individually analyzed and compared across races. Data of the patients with CRC from several different racial groups were collected from TCGA. We selected three major racial groups (white, Asian, and black) for cross comparison, comprising 229 white, 12 Asian, and 62 black patients. DMP analysis for each racial group was performed individually using the standard pipeline provided by the ChAMP package with identical parameter settings. Intersection analysis was applied to discover a set of universal biomarker candidates that could serve as a broad-spectrum detection toolkit for CRC. In addition, an exclusive analysis could be performed to identify unique DMP biomarkers for specific racial groups.
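The filtering logic described in this subsection can be illustrated with a minimal Python sketch. It is not the ChAMP implementation (ChAMP is an R package); the function names, the input layout, and the significance level `alpha=0.05` are assumptions made for illustration, since the text only states the |∆β| ≥ 0.5 cutoff and the use of Benjamini and Hochberg adjustment.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def select_dmps(mean_beta_crc, mean_beta_normal, pvals,
                delta_cutoff=0.5, alpha=0.05):
    """Flag probes with |delta beta| >= delta_cutoff and adjusted p < alpha.

    alpha is an assumed threshold; the source does not state it.
    """
    delta = np.asarray(mean_beta_crc) - np.asarray(mean_beta_normal)
    adjusted = benjamini_hochberg(pvals)
    keep = (np.abs(delta) >= delta_cutoff) & (adjusted < alpha)
    return keep, delta, adjusted

def universal_and_exclusive(dmps_by_group):
    """Split per-group DMP IDs into shared (universal) and group-only sets."""
    groups = {g: set(ids) for g, ids in dmps_by_group.items()}
    universal = set.intersection(*groups.values())
    exclusive = {
        g: ids - set().union(*(o for h, o in groups.items() if h != g))
        for g, ids in groups.items()
    }
    return universal, exclusive
```

Here `universal_and_exclusive` mirrors the intersection and exclusive analyses across racial groups: a probe is "universal" if it is a DMP in every group and "exclusive" if it is a DMP in exactly one group.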

Comorbidity Analysis for Secondary Biomarkers
The EMRs used in this study were anonymous and partially selected from Taiwan's NHIRD. The partial dataset comprised the data of one million insured people, and their comprehensive longitudinal medical records were collected between 2000 and 2013 (IRB:105-0504C). The disease codes followed the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). The codes comprise 17 major chapters and are further classified into 143 disease groups. For example, the disease group 560-569 is labelled "Other diseases of intestines and peritoneum", where each three-digit code represents a single disease type. To explore the comorbidities associated with CRC, the data of positive (with CRC) and negative (without CRC) subjects were retrieved from Taiwan's NHIRD. From the one million population NHIRD medical database, a set of 6293 confirmed positive subjects aged 40-80 years was identified, and a nearly 5-fold set of negative subjects (30,653 subjects) without CRC was randomly matched with them using case-control matching with regard to gender and age. Data of all selected subjects and their EMRs for the 3 years before their first diagnosis of CRC were obtained from Taiwan's NHIRD. All retrieved medical records were subjected to data cleaning and integration to yield simplified personal historic medical records for each subject. Statistical analysis was then performed on each disease code to verify significance with regard to CRC under the threshold settings of odds ratio > 2, p-value < 0.05, and supporting rate (minimum number of diagnosed subjects) > 10%. Once significant comorbidities were identified, the disease-associated genes could be directly retrieved from the DisGeNET [20] annotations. These identified disease-associated genes were considered secondary biomarkers for the following screening processes. The identified DNA methylation positions (primary biomarkers) and comorbidity-associated genes (secondary biomarkers) were combined to yield a set of biomarker candidates for clinical verification.
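The three screening thresholds above can be sketched for a single disease code as follows. This is a hedged illustration only: the source does not name the statistical test, so a Pearson chi-square test on the 2 × 2 table (1 degree of freedom, no continuity correction) is assumed here, and the supporting rate is assumed to be the fraction of CRC cases carrying the code. All function and parameter names are hypothetical.

```python
import math

def comorbidity_association(case_exposed, case_total, ctrl_exposed, ctrl_total):
    """Odds ratio, chi-square p-value, and support for one disease code."""
    a = case_exposed                 # CRC subjects with the code
    b = case_total - case_exposed    # CRC subjects without the code
    c = ctrl_exposed                 # controls with the code
    d = ctrl_total - ctrl_exposed    # controls without the code
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    support = a / case_total         # assumed definition of supporting rate
    return odds_ratio, p_value, support

def is_significant(odds_ratio, p_value, support,
                   min_or=2.0, alpha=0.05, min_support=0.10):
    """Apply the thresholds stated in the text: OR > 2, p < 0.05, support > 10%."""
    return odds_ratio > min_or and p_value < alpha and support > min_support
```

A disease code passes the screen only when all three conditions hold, which is why the support threshold removes rare codes even when their odds ratios are large.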


Specimen Collection
In total, 30 stool specimens were collected from 13 healthy participants, 4 patients with adenoma, and 13 patients with CRC in the outpatient and inpatient departments of the Tri-Service General Hospital. The colorectal status of all participants (age range, 40-80 years) enrolled in this study was confirmed by colonoscopy. The demographic characteristics of healthy participants and patients are listed in Supplementary Table S1. This study was approved by the Institutional Review Board of the Tri-Service General Hospital (Protocol No. A202105054), and the study protocol was registered on ClinicalTrials.gov (ID No. NCT04823793). Approximately 5 g of fresh stool specimen was collected from each participant and deposited into a storage tube prefilled with 25 mL preservation buffer (BGI Genomics Co., Shenzhen, China). After collection, the samples were stored at 4 °C for up to 7 days prior to DNA extraction.

Stool DNA Extraction and Bisulfite Conversion
Specimen DNA was extracted using the Stool DNA Isolation kit (BGI Genomics Co., Shenzhen, China). Briefly, 200 µL of stool specimen was used for DNA extraction. Stool lysis buffer (mainly Triton X-100) was added, followed by vortexing for 10 min and centrifugation at 12,000 rpm for 3 min. The supernatant was incubated with proteinase K at 70 °C for 20 min. After incubation, stool DNA was precipitated using ethanol and separated using magnetic beads. Washing was performed three times to remove residual salts and proteins. Half of the isolated DNA was modified with bisulfite using the DNA Bisulfite Conversion kit (BGI Genomics Co., Shenzhen, China) according to the manufacturer's instructions.

Quantitative Methylation Specific PCR (qMSP)
Due to the high cost of probe design for qMSP, probes for only a few potential biomarkers could be designed and verified through clinical experiments. In this study, we selected three biomarkers, alcohol dehydrogenase iron containing 1 (ADHFE1), syndecan-2 (SDC2), and protein phosphatase 2 regulatory subunit B, γ (PPP2R5C), from the integrated primary and secondary biomarker candidate set, based on a combination of considerations including literature survey, genetic locations of DMPs, primer design, functional analysis, and β value differences (hypermethylated biomarkers). The methylation levels of ADHFE1, SDC2, and PPP2R5C were quantitated using the DNA Methylation Detection Kit (BGI Genomics Co., Shenzhen, China), and GAPDH was used as an internal reference gene. Each PCR reaction contained 9 µL bisulfite-converted DNA in a 20 µL reaction mixture, including four primer and probe pairs. PCR was performed using a LightCycler 480 II with the following thermal program: 95 °C for 10 min; 10 cycles of 95 °C for 15 s, 65 °C for 30 s, and 72 °C for 30 s; and 35 cycles of 95 °C for 15 s, 55 °C for 30 s, and 72 °C for 30 s. Notably, the annealing temperature in the first 10 cycles started at 65 °C and was then reduced by 1 °C in each cycle down to 56 °C. DNA methylation levels of the target genes were determined using delta crossing point (dCp) values obtained by the following formula: dCp = Cp value of target gene − Cp value of GAPDH.
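As a small worked illustration of the dCp formula, the following Python sketch normalizes the crossing point of each target gene against GAPDH. The dictionary layout and the function name are assumptions for illustration and are not part of the detection kit's software.

```python
TARGETS = ("ADHFE1", "SDC2", "PPP2R5C")

def delta_cp(cp_values, reference="GAPDH"):
    """Return dCp = Cp(target) - Cp(reference) for each measured target gene.

    cp_values maps gene name to its crossing point for one specimen
    (a hypothetical layout chosen for this sketch).
    """
    ref = cp_values[reference]
    return {gene: cp_values[gene] - ref for gene in TARGETS if gene in cp_values}
```

Because a methylated template amplifies earlier, a smaller dCp corresponds to a higher methylation level of the target relative to the reference.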

Fecal Immunochemical Test (FIT)
The FIT was conducted using the OC-SENSOR io analyzer (Eiken Chemical Co., Tokyo, Japan), with a 100 ng/mL cutoff hemoglobin level.

Statistical Analysis
When processing DNA methylation profiles from TCGA database, multiple testing correction by the Benjamini and Hochberg method was performed to control the false discovery rate. The identified significant DMPs with their corresponding adjusted p-values are shown in Supplementary Table S2. Statistical analysis and dot plots for methylation level visualization in the Results section were performed using GraphPad Prism 8 software (GraphPad Software Inc., San Diego, CA, USA). Receiver operating characteristic (ROC) curve analysis, summarized by the area under the curve (AUC), was employed to define the cutoff values used to determine the sensitivity and specificity of ADHFE1, SDC2, and PPP2R5C. The unpaired parametric t-test was used to determine differences in methylation level between the two participant groups. Results with two-tailed p-values < 0.05 were considered significant.
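One common way to derive a cutoff from ROC analysis is to choose the threshold that maximizes sensitivity + specificity − 1 (the Youden index). The sketch below shows that selection for dCp values in plain Python. It assumes that a specimen is called positive when its dCp is at or below the cutoff (lower dCp meaning higher methylation); the function name and the tie-breaking rule are illustrative choices, not the GraphPad Prism procedure.

```python
def youden_cutoff(case_dcp, control_dcp):
    """Return (cutoff, youden, sensitivity, specificity) maximizing Youden's J.

    A specimen is called positive when dCp <= cutoff (assumed direction).
    Ties are resolved in favor of the smallest cutoff.
    """
    best = None
    for cutoff in sorted(set(case_dcp) | set(control_dcp)):
        sensitivity = sum(v <= cutoff for v in case_dcp) / len(case_dcp)
        specificity = sum(v > cutoff for v in control_dcp) / len(control_dcp)
        youden = sensitivity + specificity - 1
        if best is None or youden > best[1]:
            best = (cutoff, youden, sensitivity, specificity)
    return best
```

With small cohorts, several cutoffs can give the same Youden index, so the chosen tie-breaking rule affects the reported threshold.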

Primary Biomarkers from Differentially Methylated Positions (DMPs)
DNA methylation profiles of 314 patients pathologically diagnosed with CRC were compared with those of 38 normal subjects to identify primary DMP biomarkers. Using parameter settings on β value difference (Abs(∆Beta) ≥ 0.5), 125 DMPs with significant adjusted p-values could be identified. In addition, three major racial groups, white, Asian, and black, were further individually analyzed and cross compared. As a result, 79 universal DMPs could be identified in these three racial groups, and three exclusive DMPs in the white group (n = 229), five in the black group (n = 62), and none in the Asian group (n = 12) were identified. The identified common and exclusive DMPs and their corresponding methylation probe IDs, gene names, loci, β value differences, p-values, and adjusted p-values are shown in Supplementary Table S2.

Secondary Biomarkers from Comorbidity Analysis
From the one million NHIRD dataset, the association between each comorbidity code and CRC was statistically evaluated. When the statistical parameters were defined as odds ratio > 2, p-value < 0.05, and supporting rate > 10%, only four disease groups associated with CRC could be identified: "Other Diseases of Intestines and Peritoneum (560-569)", "Diseases of Veins and Lymphatics, And Other Diseases of Circulatory System (451-459)", "Other Diseases of Digestive System (570-579)", and "Diseases of Esophagus, Stomach, And Duodenum (530-539)". For each identified disease group, the corresponding disease-associated genes could be retrieved from the DisGeNET database. In total, 1142, 430, 2693, and 1469 annotated genes were found to be associated with the four identified disease groups, respectively. In addition, 10,437 annotated disease-gene associations are known for CRC (ICD9: 153 and 154). All CRC-associated genes were considered secondary biomarker candidates and served as a screening filter for further selection. The secondary biomarkers identified from the comorbidities are listed in Supplementary Table S3. After performing intersection analysis with the DMP primary biomarkers, a total of 51 candidate genes, including 48 genes that were found to be hypermethylated in CRC tissues, were identified. In addition, we compared the 125 primary DMPs obtained from TCGA methylation profiles alone with the 51 DMPs obtained by integrating comorbidity constraints, based on average sensitivity, specificity, and accuracy. For the initially selected 125 primary biomarkers, the average sensitivity, specificity, and accuracy were 68.15%, 99.95%, and 71.58%, respectively (with a β-difference cutoff value larger than 0.5), on the collected TCGA specimens, while the performance of the constrained 51 biomarkers increased slightly to 70.36%, 99.89%, and 73.55%, respectively, under the same cutoff settings. The detailed sensitivity, specificity, and accuracy for each primary biomarker from TCGA dataset are also provided in Supplementary Table S3. Incorporating comorbidity analytics and associated disease-gene constraints provided an effective approach to decrease the number of suitable biomarkers from 125 to 51 DMP candidates. Nevertheless, the development of a usable probe and primer for DNA methylation assessment is time-consuming and laborious. To verify a multi-gene panel that can cover people of all ethnicities for the purpose of CRC screening, we summarized several commercial methylation kits on the market, shown in Table 1. In comparison with these commercial kits, Colotect, a stool-based DNA methylation test with three genes (ADHFE1, SDC2, and PPP2R5C) selected from the 48 candidate genes, was designed for further clinical verification.
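The intersection step described in this subsection can be sketched as a simple gene-level filter. This is a minimal illustration under stated assumptions: the record layout (probe ID, gene symbol, ∆β) and the function name are hypothetical, and the secondary gene set stands in for whatever was retrieved from the DisGeNET annotations.

```python
def filter_by_secondary(dmps, secondary_genes):
    """Keep DMPs whose annotated gene is in the secondary biomarker set.

    dmps: iterable of (probe_id, gene_symbol, delta_beta) tuples
          (hypothetical layout for this sketch).
    Returns the retained DMPs split by direction of methylation change.
    """
    secondary = set(secondary_genes)
    kept = [d for d in dmps if d[1] in secondary]
    hyper = [d for d in kept if d[2] > 0]   # higher methylation in CRC
    hypo = [d for d in kept if d[2] < 0]    # lower methylation in CRC
    return hyper, hypo
```

Splitting the retained probes by the sign of ∆β corresponds to the hypermethylated and hypomethylated candidate counts reported in the text.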

Verification of Stool DNA Methylation in Individual Specimen
In this study, we estimated the methylation levels of ADHFE1, SDC2, and PPP2R5C in stool DNA from normal participants (n = 13), patients with adenoma (n = 4), and patients with CRC (n = 13). The methylation levels (dCp) of these three genes are presented as dot plots in Figure 2. Lower dCp values represent higher methylation. These three genes were significantly hypermethylated in stool DNA (p < 0.001) collected from patients with CRC (Figure 2A). However, only SDC2 showed high methylation in stool DNA from patients with adenoma (p < 0.001) (Figure 2A). ROC curves and the associated AUCs of the diagnostic prediction model using dCp values in the CRC and normal cohorts were generated using GraphPad Prism Version 8.0 (Figure 2B). The AUCs of ADHFE1, SDC2, and PPP2R5C were 0.8935, 0.8402, and 0.8817, respectively. The Youden index, defined as sensitivity + specificity − 1 and reported here in percent, summarizes the diagnostic performance of a test, with 100 indicating perfect discrimination. The optimal cutoff for each gene was set at the value that gave the highest Youden index. The sensitivity and specificity of ADHFE1 were 84.6% and 100%, respectively, when the cutoff value of dCp was defined as 5.02 (Youden index = 84.62). The sensitivity and specificity of SDC2 were 69.2% and 92.3%, respectively, when the cutoff value of dCp was defined as 7.50 (Youden index = 61.54). The sensitivity and specificity of PPP2R5C were 69.2% and 100%, respectively, when the cutoff value of dCp was defined as 9.33 (Youden index = 69.23) (Table 2). The details of the cutoff value settings and corresponding sensitivity and specificity for the three genes are shown in Supplementary Table S4. Furthermore, we compared the "OR" combination and a linear combination of the three gene dCp values using R (version 3.3.2). The results showed no difference between the OR operation and the linear combination of the three genes. The linear combination, with weighted coefficients obtained from a logistic regression model of the three genes, was formulated as 4.5545 − 0.1397 × (dCp of ADHFE1) − 0.1200 × (dCp of SDC2) − 0.1909 × (dCp of PPP2R5C). The details of the comparison are shown in Supplementary Table S5. Notably, 75% (3/4) of adenoma specimens revealed hypermethylation of SDC2, but only one specimen revealed hypermethylation of ADHFE1 (Table 3). These observations indicate that the methylation status of ADHFE1 showed a good performance for the detection of CRC, whereas that of SDC2 showed a good performance for the detection of both adenoma and CRC. Furthermore, FITs yielded accurate results for late-stage CRC detection but less accurate results for early-stage cancer and precancer detection. Moreover, the combination of methylation markers and the FIT revealed the best sensitivity for both adenoma and CRC screening at all stages (Table 4).
1 Cutoff values of methylation level are determined by AUC-ROC calculation.
2 The numbers in parentheses represent the CRC cases identified by hypermethylated gene/total CRC cases.
3 The numbers in parentheses represent the control subjects identified as normal/total control subjects.
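The two combination rules compared in this subsection can be sketched in Python using the cutoffs and coefficients reported in the text. Two points are assumptions rather than statements from the source: a gene is called hypermethylated when its dCp is at or below the cutoff, and the linear score is interpreted as the logit of a logistic model for CRC. The function names are illustrative.

```python
import math

# Cutoffs and coefficients as reported in the text.
CUTOFFS = {"ADHFE1": 5.02, "SDC2": 7.50, "PPP2R5C": 9.33}
COEFFICIENTS = {"ADHFE1": -0.1397, "SDC2": -0.1200, "PPP2R5C": -0.1909}
INTERCEPT = 4.5545

def or_combination(dcp):
    """Positive if any gene is at or below its dCp cutoff (assumed direction)."""
    return any(dcp[gene] <= cutoff for gene, cutoff in CUTOFFS.items())

def linear_score(dcp):
    """Weighted linear combination of the three dCp values."""
    return INTERCEPT + sum(w * dcp[gene] for gene, w in COEFFICIENTS.items())

def logistic_probability(dcp):
    """Map the linear score to (0, 1), assuming it is a logistic-model logit."""
    return 1.0 / (1.0 + math.exp(-linear_score(dcp)))
```

All three coefficients are negative, so lower dCp values (higher methylation) raise the score, which is consistent with the direction of the OR rule. The decision threshold applied to the linear score is not given in the text and is therefore left out of the sketch.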

Discussion
DNA methylation is an important molecular mechanism associated with human tumorigenesis. In particular, abnormal DNA methylation patterns are related to the diagnosis and prognosis of many types of cancers. This study aimed to identify potential DNA methylation-based biomarkers for CRC. We identified a total of 51 methylation-driven genes through a comprehensive analysis of TCGA and EMRs annotated in Taiwan's NHIRD. We present a system to analyze the comorbidities associated with CRC using multidimensional categorical EMR data of 6293 patients with CRC and 30,653 normal subjects. Our primary aim was to identify primary biomarkers from TCGA and secondary biomarkers from EMRs annotated in Taiwan's NHIRD, which might share a common mechanism, for further biomarker exploration and selection. "Other Diseases of Intestines and Peritoneum (560-569)", "Diseases of Veins and Lymphatics, And Other Diseases of Circulatory System (451-459)", "Other Diseases of Digestive System (570-579)", and "Diseases of Esophagus, Stomach, And Duodenum (530-539)" were found to be commonly associated with CRC (ICD9: 153 and 154). These identified comorbidity disease-gene associations were considered secondary biomarkers for the subsequent clinical verification process.
The present study showed that a stool DNA methylation test could facilitate early detection of CRC and adenoma. Compared to the detection rates of the FIT, the stool DNA methylation test could detect 75.0% (3/4) of adenoma (≥0.2 cm) cases and 84.6% (11/13) of CRC cases. These observations suggested that the DNA methylation test showed better performance for predicting advanced adenoma than the FIT. This difference may be attributed to the DNA methylation markers for the early detection of CRC.
Notably, most information obtained from TCGA was based on the Caucasian population (>75.0%). Robust evidence is available regarding the relationship between CRC and SEPT9, a well-known hypermethylated gene in CRC, in the Caucasian population. Nevertheless, the diagnostic sensitivity of blood-based mSEPT9 in CRC detection varied among published reports [21]. Lee et al. [22] found a very low sensitivity of mSEPT9 (36.6%) among the Korean population. Therefore, we present an integrative approach that combines TCGA with NHIRD-based information to identify biomarker panels for CRC.
The CRC incidence rate in many Asian countries has rapidly increased in recent decades [23,24]. Because DNA methylation alterations differ based on patients' race [25], the race difference is considered an indispensable factor for DNA methylation biomarker selection. To our knowledge, this is the first study to integrate genome-wide differential DNA methylation analysis and disease-gene associations retrieved from Taiwan's NHIRD for the identification of DNA methylation biomarkers for CRC in Asian populations. Three genes, ADHFE1, SDC2, and PPP2R5C, were identified and found to be hypermethylated in Taiwanese patients with CRC using qMSP analysis (Figures 1 and 2). Consistent with our results, both ADHFE1 and SDC2 have been reported to be highly methylated in Asian populations with CRC, including Chinese and South Korean populations [26][27][28]. Therefore, our proposed DNA methylation biomarker identification approach may help improve biomarker feasibility for CRC detection in Asian populations.
Thus far, the FIT and colonoscopy are widely used to screen for CRC. However, the sensitivity of FITs for early CRC detection needs to be improved [29]. While colonoscopy is regarded as a valuable tool for CRC detection, its use is limited due to the high cost, invasiveness, and possible complications [30]. Stool DNA methylation level detection has the potential to overcome the abovementioned limitations, and, thus, it has become an alternative approach for CRC screening [30]. While the efficacy of a single marker is often limited, a multi-marker signature can have greater diagnostic value. Our study demonstrated that the assessment of methylation levels of ADHFE1, SDC2, and PPP2R5C in stool has a better detection rate for CRC screening. The sensitivities of methylated ADHFE1, SDC2, and PPP2R5C, and of the combination of these three genes, for CRC detection were 84.6%, 69.2%, 69.2%, and 84.6%, respectively, and their specificities were 100%, 92.3%, 100%, and 92.3%, respectively (Table 2). Methylated SDC2 is known as a valuable biomarker for CRC detection. The sensitivity and specificity of methylated SDC2 for CRC detection in three commercial methylation kits, IColocomf (Ammunition Life Technology Co., Ltd., Wuhan, China), Colosafe (Creative Biosciences Co., Ltd., Guangdong, China), and COLOWELL (RealBio Technology Co., Ltd., Shanghai, China), were 77% and 98%, 84% and 98%, and 71% and 94%, respectively [28,31,32]. Although methylated SDC2 had a lower sensitivity for CRC screening in the present study than that reported previously, the sensitivity and specificity of the three-gene combination for CRC detection were comparable to those reported previously (Table 2). These observations support the view that the assessment of methylation levels of multiple genes might improve the detection rate of CRC [33]. Compared with SDC2 and PPP2R5C, ADHFE1 showed a better ranking as well as the highest sensitivity for CRC detection, indicating that the ranking computed via our biomarker selection approach may provide a good recommendation for biomarker utilization in CRC screening. However, the performance of methylated ADHFE1 for adenoma detection was poorer than that of methylated SDC2 (Tables 3 and 4), suggesting that further refinement of our biomarker selection approach for precancer detection is necessary.
The putative biological relevance of ADHFE1, SDC2, and PPP2R5C in carcinogenesis may provide additional support to our methodology. ADHFE1 is involved in the metabolism of 4-hydroxybutyrate in mammalian tissues [34]. Previous studies have reported the hypermethylation of ADHFE1 in CRC and adenoma tissues. Hypermethylated ADHFE1 might induce CRC occurrence via the stimulation of tumor cell proliferation [35,36]. SDC2 is a transmembrane heparan sulfate proteoglycan, serving as a regulator in the mitogen-activated protein kinase (MAPK) pathway [37]. Detection of SDC2 methylation status in blood and stool specimens has been widely adopted in the assessment of CRC and adenoma [38,39]. Protein phosphatase 2A (PP2A), a serine/threonine phosphatase, is composed of a structural subunit (A), a regulatory subunit (B), and a catalytic subunit. As a tumor suppressor, PP2A is involved in the regulation of Wnt signaling and MAPK pathways [40]. PPP2R5C belongs to the B subunit family of PP2A, which acts as a negative regulator of cell cycle transition [41]. The methylation of PPP2R5C is considered a biomarker for CRC and breast cancer detection [42,43]. Notably, these three genes are involved in the MAPK pathway and affect cell proliferation, which may impact the progression of CRC.
DNA methylation tests can be performed using blood and stools. Both approaches are expected to improve the screening compliance for CRC in the general population. Although the blood-based DNA methylation test, which has recently been approved by the U.S. Food and Drug Administration (FDA), is believed to have better adherence, the examination is only suggested for people who refuse any other screening modalities due to its relatively low sensitivity and specificity and uncertain testing interval [44]. Unlike the blood-based DNA methylation test, the stool-based DNA methylation test is one of the regularly recommended CRC screening tests in the latest National Comprehensive Cancer Network (NCCN) guidelines [44]. Current stool-based tests include the FIT and the stool DNA test. The stool DNA test combines multiple molecular biomarkers with the FIT. A methylation panel comprising ADHFE1, SDC2, and PPP2R5C correctly identified CRC with a sensitivity of 84.6% and specificity of 92.3% in stool samples, and it had a higher sensitivity than FIT alone in detecting advanced precancerous lesions. However, we still found two (2/13) false-negative CRC cases, which yielded positive findings in FIT. To enhance the sensitivity of the stool-based test, a combined FIT-fecal multitarget DNA test is suggested.
The present study has several limitations. First, the small sample size decreases the statistical power of our findings. Second, although we have collected specimens from one hospital, publication bias may still exist. Third, we have searched only for articles written in English, while many articles written in other languages were ignored. The candidate biomarkers identified from TCGA and Taiwan's NHIRD selection model need to be validated using larger sample sizes to confirm their accuracy and clinical utility. Fourth, because the sample size in this study was limited, we were unable to incorporate multiple factors, such as sex, age, and tumor localization, into our biomarker selection approach. Therefore, future studies need to analyze the relationship between such factors and biomarker selection. Fifth, according to our biomarker selection approach, ADHFE1, SDC2, and PPP2R5C are supposed to display hypermethylation in the CRC tissues from white, Asian, and black populations; however, 5% of ADHFE1, 10% of SDC2, and 10% of PPP2R5C methylation levels in CRC tissues from the white population did not coincide with these findings. To ensure the feasibility of the methylation level biomarkers in CRC tissues from Asian populations, the methylation levels of the three genes in CRC tissue were analyzed. ADHFE1, SDC2, and PPP2R5C were hypermethylated in all (18/18) CRC tissue samples compared with adjacent normal tissues (p < 0.05) in an Asian population (2021, unpublished data).

Conclusions
In summary, through analysis of 450K methylation data from TCGA and EMR data collected from Taiwan's NHIRD, we identified a set of biomarker genes for CRC detection in this study. The potential disease relevance of the three selected genes was verified using the stool DNA-based methylation test. Notably, in our proof-of-concept study, we found that the multitarget stool DNA methylation test with a FIT could precisely detect CRC and adenoma in the early stages. Furthermore, we revealed a panel of noninvasive methylomic biomarkers for CRC. Among 48 candidate genes, there were 27 candidate genes (AEBP1, AGRN, AMPH, CHST2, COL25A1, CPLX1, FAM110B, GFRA1, GLRB, GRASP, GSG1L, IRF4, IRX5, LMO1, MPPED2, PPP2R5C, PREX2, PTPRN2, RALYL, SND1, SPOCK1, THBD, TLL1, TLX1, USP44, VIPR2, and VSX1) with less known roles in the management of colorectal cancer. These novel findings also enable us to search for similar predictive methylation markers in the future. These epigenetic-based biomarkers to detect early-stage colon cancer require further investigation.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/genes12101539/s1, Table S1: Demographic characteristics of testing subjects, Table S2: 152 Primary DMPs, Table S3: 51 Candidate DMPs, Table S4: Cutoff values of selected biomarkers, Table S5: OR combination vs linear combination of selected biomarkers.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patients to publish this paper.
Data Availability Statement: Not applicable.

Figure 1. Flow chart showing the pipeline for identification of primary biomarkers from DNA methylation analysis using TCGA and secondary biomarkers from Taiwan's NHIRD.

Figure 2. Stool DNA methylation levels of the three candidate genes. (A) The dot plots represent delta crossing point (dCp) values for methylation status of ADHFE1, SDC2, and PPP2R5C. The average methylation levels are displayed as horizontal bars in the middle of the scattered dots (ns: not significant; ***: p < 0.001). (B) Area under the receiver operating characteristic curve (AUC-ROC) for the DNA methylation status of ADHFE1, SDC2, and PPP2R5C in CRC stool DNA. The cutoff values of dCp of ADHFE1, SDC2, and PPP2R5C are 5.02, 7.50, and 9.33, respectively.

Table 1. Commercial methylation kits for colorectal cancer detection.
1 Sensitivity and specificity values were provided by manufacturer instructions or published articles.

Table 2. The performance of ADHFE1, SDC2, and PPP2R5C for detection of CRC.

Table 3. Quantitative methylation specific PCR results and FIT detection status for individual participants.

Table 4. The combined performance of methylation markers and FIT for detection of CRC and adenoma.