DNA Methylation-Based Estimates of Circulating Leukocyte Composition for Predicting Colorectal Cancer Survival: A Prospective Cohort Study

Simple Summary
Inflammation is involved in the evolution of cancer. Leukocytes, the proportions of which can be estimated using epigenome-wide methylation data, may serve as prognostic markers in colorectal cancer (CRC). Our aim was to investigate whether DNA methylation-based estimates of circulating leukocytes are associated with all-cause and disease-specific mortality in a prospective cohort of CRC patients. Significant associations with CRC prognosis were observed for CD4+ T cells, CD8+ T cells, B cells, NK cells, and lymphocytes, independent of age, sex, tumor stage, tumor subsite, and therapy. CD4+ T cells outperformed other leukocytes and provided added predictive value in comparison to age, sex, and tumor stage. Although cell counting is commonly used in clinical practice, DNA methylation-estimated cell proportions could be a promising tool for understanding the role of leukocytes as CRC prognostic biomarkers when using stored blood samples.

Abstract
Leukocytes are involved in the progression of colorectal cancer (CRC). The proportions of six major leukocyte subtypes can be estimated using epigenome-wide DNA methylation (DNAm) data from stored blood samples. Whether the composition of circulating leukocytes can be used as a prognostic factor is unclear. DNAm-based leukocyte proportions were obtained from a prospective cohort of 2206 CRC patients. Multivariate Cox regression models and survival curves were applied to assess associations between leukocyte composition and survival outcomes. A higher proportion of lymphocytes, including CD4+ T cells, CD8+ T cells, B cells, and NK cells, was associated with better survival, while a higher proportion of neutrophils was associated with poorer survival. CD4+ T cells outperformed other leukocytes in estimating the patients' prognosis. Comparing the highest quartile to the lowest quartile of CD4+ T cells, hazard ratios (95% confidence intervals) of all-cause and CRC-specific mortality were 0.59 (0.48, 0.72) and 0.59 (0.45, 0.77), respectively. Furthermore, the association of CD4+ T cells with prognosis was stronger among patients with early or intermediate CRC and among patients with colon cancer. In conclusion, the composition of circulating leukocytes estimated from DNAm, particularly the proportion of CD4+ T cells, could serve as a promising independent predictor of CRC survival.


Introduction
Colorectal cancer (CRC) represents the second leading cause of cancer-related deaths worldwide [1]. Although recent improvements have been made in screening strategies and treatments for CRC, the prognosis of advanced CRC is still poor [2,3]. Moreover, prognostic information provided by the anatomy-based tumor-node-metastasis staging system is incomplete. Survival rates can be significantly different among patients within the same stage [4]. Therefore, novel prognostic markers are needed to improve patient management for tertiary prevention and to elicit a better understanding of the disease processes. We previously observed that higher levels of DNA methylation (DNAm)-based mortality risk score, AgeAccelPheno and AgeAccelGrim were statistically significantly associated with poorer CRC prognosis. However, the associations were weaker for CRC-specific survival than for overall survival. Studies are needed to identify prognostic biomarkers that are more specific to CRC [5].
Immune inflammatory cells are involved in CRC progression in conflicting ways: both tumor-suppressing and tumor-promoting leukocytes play a role [6,7]. Intratumoral lymphocytes can induce tumor cell death by secreting tumor necrosis factor-α and interferon-γ [8]. On the other hand, a higher infiltration of neutrophils facilitates the pathological angiogenesis that is related to tumor growth [9]. Epidemiological studies have shown that a higher neutrophil to lymphocyte ratio in peripheral blood is associated with worse CRC survival [10][11][12][13][14]. These findings suggest that leukocyte composition may reflect the immunobiology underlying CRC progression and could serve as a promising prognostic marker for CRC [15]. However, quantifying peripheral blood leukocyte composition with traditional techniques usually requires freshly drawn venous blood and strict cell processing. Therefore, leukocyte counts are often missing in large-scale clinical and epidemiological studies.
Cell lineages can be distinguished by differentially methylated DNA regions. Methods have been developed to infer leukocyte composition from DNA methylation (DNAm) signatures, which are chemically stable [16][17][18]. Salas and colleagues identified a library of 450 CpGs for deconvoluting the fractions of CD4+ T cells, CD8+ T cells, B cells, natural killer (NK) cells, monocytes and neutrophils; the library is specific to DNAm signatures assayed using the Illumina Infinium HumanMethylationEPIC BeadChip Kit (EPIC microarray, Illumina, Inc., San Diego, CA, USA) [18]. The method estimated cell composition accurately: the average coefficient of determination between the predefined cell proportions of artificial blood mixtures and the estimates was 99.2% [18]. It was further validated in actual blood samples and publicly available datasets [18].
In this work, the leukocyte composition was estimated using Salas' method based on EPIC microarray data in a large cohort of CRC patients. We then investigated the association of leukocyte composition with all-cause and CRC-specific mortality of CRC patients, as well as its capability for predicting survival.

Study Design and Population
The current study is based on the prospective follow-up of CRC patients of an ongoing population-based case-control study on CRC, the German Darmkrebs: Chancen der Verhütung durch Screening (DACHS) Study. Details of the DACHS study design have been described elsewhere [19][20][21][22]. Briefly, we recruited patients with a first diagnosis of CRC (ICD 10 codes: C18-C20) in 22 clinics that provide CRC surgery in the Rhine-Neckar region of Germany. In the current analysis, we included 2206 CRC patients for whom both follow-up information on survival outcomes and DNAm array data from blood samples taken at baseline were available.

Data Collection
CRC patients were diagnosed between 2003 and 2010. They were invited to participate by their treating physicians during the first hospital stay due to CRC and, after giving informed consent, were reported to the study center at the German Cancer Research Center. Trained interviewers collected patients' sociodemographic, medical and lifestyle information using a standardized questionnaire at the earliest convenient time, either during the hospital stay or shortly thereafter at the patients' homes. Moreover, data on tumor characteristics and treatment were extracted from medical records. Information on newly diagnosed diseases and recurrences was provided by physicians at the 3- and 5-year follow-ups. Data on date and cause of death were obtained from the local population registers and public health authorities. The study was approved by the Ethics Committees of the Medical Faculty of the University of Heidelberg (ID: 310/2001) and the Medical Chambers of Baden-Württemberg (ID: M-198-02) and Rhineland-Palatinate (ID: 837.419.02 (3637)).

DNA Methylation and Leukocyte Composition Estimation
Blood samples were collected after the interview, shipped to the study center and stored at −80 °C until whole blood DNA extraction. Genome-wide DNAm signatures were assayed using the EPIC microarray (Illumina, Inc., San Diego, CA, USA) according to the manufacturer's protocol at the Genomics and Proteomics Core Facility of the German Cancer Research Center. Preprocessing and normalization of DNAm data were conducted following the pipeline proposed by Lehne and colleagues [23]. Probes with a detection p value > 0.01 or with > 10% missing values, probes targeting the sex chromosomes, cross-reactive probes and polymorphic CpGs were excluded, leaving 787,231 CpGs for analyses. The proportions of CD4+ T cells, CD8+ T cells, B cells, NK cells, monocytes and neutrophils were estimated using a deconvolution method implemented in the R package FlowSorted.Blood.EPIC [24].
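As a rough illustration of what reference-based deconvolution does, the sketch below estimates cell-type proportions for a single sample by non-negative least squares against a small reference matrix. It is a generic toy example and not the FlowSorted.Blood.EPIC implementation: the function name `estimate_proportions`, the three hypothetical cell types, and all reference methylation values are assumptions invented for illustration.

```python
# Toy reference: rows are CpGs, columns are three hypothetical cell types.
# Each value is an assumed mean methylation (beta) level; none are real data.
REFERENCE = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.1],
    [0.2, 0.8, 0.2],
    [0.1, 0.2, 0.8],
]


def estimate_proportions(reference, sample, n_iter=5000):
    """Solve min ||reference @ w - sample||^2 subject to w >= 0
    by projected gradient descent (a simple stand-in for a QP solver)."""
    n_types = len(reference[0])
    # Gram matrix R^T R and right-hand side R^T b.
    gram = [
        [sum(row[i] * row[j] for row in reference) for j in range(n_types)]
        for i in range(n_types)
    ]
    rhs = [
        sum(row[i] * beta for row, beta in zip(reference, sample))
        for i in range(n_types)
    ]
    # The trace of R^T R bounds its largest eigenvalue, so 1/trace is a safe step.
    step = 1.0 / sum(gram[i][i] for i in range(n_types))
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        grad = [
            sum(gram[i][j] * weights[j] for j in range(n_types)) - rhs[i]
            for i in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, grad)]
    return weights


# Simulate a noise-free blood mixture with known (assumed) proportions.
TRUE_PROPORTIONS = [0.5, 0.3, 0.2]
mixture = [
    sum(value * p for value, p in zip(row, TRUE_PROPORTIONS)) for row in REFERENCE
]
estimated = estimate_proportions(REFERENCE, mixture)
print([round(p, 4) for p in estimated])
```

With real data the reference library covers hundreds of CpGs and six cell types, the sample is noisy, and dedicated packages handle normalization and the exact constraints, so this sketch only conveys the underlying idea.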

Statistical Methods
The proportions of categorical covariates were calculated to describe the characteristics of the study population. The distributions of baseline characteristics of patients at different stages were compared using the chi-square test. The Wilcoxon-Mann-Whitney test was used to compare medians of leukocyte composition across categorical variables. The Jonckheere-Terpstra test was used to test for trends across ordinal variables. Correlations between leukocyte components were assessed using Pearson correlation coefficients and scatter plots.
Associations between leukocyte composition and survival were assessed using Cox regression, from which hazard ratios (HRs) and 95% confidence intervals (95% CIs) were derived. We ran a "clinical model" adjusted for covariates that can be easily obtained in clinical settings, including age, sex, stage, CRC subsite, batch, and timing of blood sampling. A "comprehensive model" additionally adjusted for body mass index (BMI, kg/m²), smoking status (never, former and current smokers), and Charlson comorbidity index (CCI) score was applied to evaluate potential confounding by these covariates. The characteristics of all covariates except for the timing of blood sampling are shown in Table 1. Complete case analysis was applied to address missing data (<0.5%) when adjusting for these variables. The timing of blood sampling was grouped into four categories: (1) prior to surgery or within 1 month after surgery (n = 1006); (2) more than 1 month after surgery, but not receiving chemotherapy or radiotherapy (n = 569); (3) during chemo- or radiotherapy, or within 5 months after the first treatment (n = 278); (4) more than 5 months after chemo- or radiotherapy (n = 353) (Figure S1). The proportional hazards assumption was tested using Schoenfeld residuals. Furthermore, we accounted for delayed entry by incorporating the time elapsed between diagnosis and the start of patients' enrolment into the standard Cox models. Adjusted survival curves were plotted to assess whether the association differed depending on tumor stage, tumor subsite and the timing of blood sampling. The difference between survival curves across quartiles of leukocyte composition was evaluated using the G-rho family of tests. Furthermore, the predictive accuracy and discriminating ability of leukocyte composition were evaluated using Harrell's concordance statistics (C-statistics) and were compared with those of age, sex and stage. Cubic spline models were also fitted to assess the dose-response relationship between CD4+ T cell proportion and CRC mortality in various therapy groups.
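To make the discrimination measure mentioned above more concrete, the following is a minimal sketch of Harrell's C-statistic for right-censored data. It is not the SAS PROC PHREG computation used in the study: the function name `harrell_c` and the toy data are hypothetical, and pairs with tied survival times are simply skipped, which is a simplifying assumption.

```python
def harrell_c(times, events, risk_scores):
    """Fraction of comparable patient pairs in which the patient who died
    earlier also has the higher predicted risk (ties in risk count 0.5).

    times: follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (assumed): four patients, the third one is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.1, 0.4, 0.7]
print(harrell_c(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Because a higher CD4+ T cell proportion was associated with lower mortality, a risk score built from that proportion alone would need its sign reversed (for example, the negative of the proportion) before being passed to such a function.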
The correlation matrix, survival curves and cubic spline curves were produced in R 3.6.0 with the packages corrplot, survminer and survival, respectively [25]. Hazard ratios and Harrell's C-statistics were derived using PROC PHREG in SAS version 9.4 (SAS Institute, Cary, NC, USA). Statistical significance was defined as p < 0.05 in two-sided testing.

Results
Table 1 summarizes the demographic and clinical characteristics of the study population at baseline. Among the 2206 CRC patients, there were more men (58.8%) than women (41.2%). More than two-thirds of the study population (67.5%) were older than 65 years, 62.0% of patients were overweight or obese, 15.9% of patients were current smokers at baseline, and more than 40% had a relevant comorbidity (CCI > 0). Of the patients, 18.2%, 34.6%, 33.1% and 14.1% were diagnosed with stage I, II, III and IV cancer, respectively. Colon cancer and rectal cancer accounted for 69.6% and 30.4% of CRC patients, respectively. The distributions of age, BMI, smoking status, CCI and tumor subsite differed statistically significantly across tumor stages (Supplementary Materials Table S1).
A lower lymphocyte proportion and a higher neutrophil proportion were observed in males, older patients, patients diagnosed at advanced stages, patients with rectal cancer and patients with higher CCI (Table S2). Moreover, the proportions of CD4+ T cells, CD8+ T cells, B cells, NK cells, and monocytes were statistically significantly positively correlated with each other, whereas the proportion of neutrophils was statistically significantly inversely correlated with those of the other leukocyte subtypes (Figure S2).
Furthermore, sensitivity analyses showed that some of the associations between leukocyte proportions and CRC prognosis were attenuated after additional adjustment for BMI, smoking status and CCI. However, statistically significant associations with CRC prognosis remained for CD4+ T cells, B cells and neutrophils (Table S2).
Figure 3 presents the overall and CRC-specific survival curves across quartiles of the proportion of CD4+ T cells by tumor stage. We observed statistically significant associations between a higher proportion of CD4+ T cells and improved overall survival in the early and intermediate stages (I-III) but not in the advanced stage (IV). Moreover, the proportion of CD4+ T cells was statistically significantly positively associated with CRC-specific survival in patients with stage III cancers.
Figure 4 shows the tumor subsite-specific association of the proportion of CD4+ T cells with overall and CRC-specific survival. Among patients with colon cancer, a higher proportion of CD4+ T cells was statistically significantly associated with better survival, and the association was weaker for CRC-specific survival than for overall survival. However, the association between the proportion of CD4+ T cells and both outcomes was not statistically significant among patients with rectal cancer.
Note: Hazard ratio (HR) and 95% confidence interval (CI) were derived from Cox regression model adjusting for age, sex, tumor stage, tumor subsite, timing of blood sampling and batch (n patients = 2194, n deaths = 1071). An HR of one means that there is no difference in survival between the two groups. Abbreviation: Q1-Q4, quartile 1 (lowest)-quartile 4 (highest); SD, standard deviation.

Subgroup Analyses
Additionally, among the subgroup whose blood was sampled during or within 5 months after initiation of chemo- or radiotherapy, patients in the second-highest quartile of CD4+ T cell proportion had the poorest survival, which differed from the pattern in the other subgroups (Figure S3). Moreover, as shown in the dose-response curves, a "U-shaped" dose-response relationship was observed among patients who received surgery, chemotherapy and radiotherapy (or only received surgery and radiotherapy) (Figure S4).

Figure 2.
Forest plot of the associations between leukocyte composition and colorectal cancer (CRC)-specific mortality. Note: Hazard ratio (HR) and 95% confidence interval (CI) were derived from Cox regression model adjusting for age, sex, tumor stage, tumor subsite, timing of blood sampling and batch (n patients = 2179, n deaths = 593). An HR of one means that there is no difference in survival between the two groups. Abbreviation: Q1-Q4, quartile 1 (lowest)-quartile 4 (highest); SD, standard deviation.
Table 2 presents the discriminative ability of various combinations of predictors, including age, sex, tumor stage, and leukocyte compositions. The prediction of CRC prognosis was modestly improved after adding the proportion of CD4+ T cells into the model, overall and by tumor stage. In all stages combined, as well as in stages I-II and stage III, simultaneous consideration of all leukocyte subtypes slightly outperformed exclusive consideration of CD4+ T cell proportions for prediction of CRC prognosis. However, in stage IV, the models including only CD4+ T cell proportions performed even better than the models including all leukocyte subtypes.
Harrell's C-statistics (95% confidence interval) for all-cause mortality and colorectal cancer (CRC)-specific mortality prediction.

Discussion
This analysis is the first prospective study to investigate the association of DNAm-based estimates of leukocyte composition with CRC prognosis. Patients with higher estimated proportions of CD4+ T cells, CD8+ T cells and B cells had lower all-cause and CRC-specific mortality, and patients with a higher proportion of neutrophils had higher all-cause and CRC-specific mortality. Among all leukocyte subtypes, CD4+ T cells presented the strongest association with CRC prognosis. A one-SD increment of the proportion of CD4+ T cells (7%) was associated with 17% and 18% lower all-cause and CRC-specific mortality, respectively. Subgroup analyses showed that the association of CD4+ T cells with CRC prognosis was statistically significant in stage I-III and colon cancer patients.
CD4+ T cells, including T helper (Th) cells and regulatory T (Treg) cells, function to instigate and shape adaptive immune responses. Th cells can target tumor cells in various ways, either directly by eliminating tumor cells through cytolytic mechanisms or indirectly by activating and directing innate immune cells, B cells and CD8+ T cells [26]. Studies have shown that intratumoral CD4+ T cell infiltration can suppress tumor progression [27,28]. On the other hand, Treg cells, which are essential to maintain self-tolerance and immune cell homeostasis, are involved in tumor progression by establishing an immunosuppressive tumor microenvironment [29]. Suppression of tumor-specific CD4+ T cells by Treg cells was associated with a worse CRC prognosis in a cohort of 62 patients [30]. CD8+ T cells, also known as cytotoxic T lymphocytes, can infiltrate tumor tissue and kill malignant cells by releasing cytotoxic granules or by engaging the Fas receptor on the surface of target cells through Fas ligand. A higher density of intratumoral CD8+ T cells was associated with improved CRC prognosis, which is in line with our findings [31,32].
B lymphocytes are involved in humoral adaptive immunity and can inhibit tumor progression by secreting tumor-reactive antibodies and priming T cells [33]. Various studies reported that the tumor-infiltrating B cell density was positively associated with CRC prognosis [34,35]. However, regulatory B cells can suppress Th1 and CD8+ T cell responses and subsequently promote tumor progression, similar to Treg cells [33]. Nevertheless, evidence on the association of circulating CD4+ and CD8+ T cells and B cells with CRC prognosis is scarce. Further studies are needed to confirm our findings.
NK cells can kill tumor cells without any priming or prior activation. Tang and colleagues observed in 447 CRC patients that those with an increased percentage of peripheral blood NK cells had a higher 3-year survival rate [36]. This is consistent with our finding of a marginal inverse association between NK cell proportion and all-cause mortality among CRC patients whose circulating leukocyte composition was not affected by therapy. Monocytes are a major innate immune component of the mononuclear phagocyte system. A meta-analysis of 16 prospective studies comprising 3826 CRC patients showed that increased absolute blood monocyte counts were significantly associated with worse overall survival [37]. However, no significant association between the proportion of blood monocytes and CRC prognosis was observed in our study. The difference between the results of the two studies could be explained by the different measures of peripheral blood monocytes (cell count vs. cell proportion).
Neutrophils are the most abundant leukocytes in blood and an essential part of the innate immune system. In line with our findings, a higher neutrophil to lymphocyte ratio in peripheral blood is associated with higher mortality among CRC patients, overall or by tumor stage, which suggests that lymphocytes and neutrophils may play opposite roles in CRC progression [10][11][12][13][14]. Evidence also shows that neutrophils may play both pro- and antitumor roles, depending on the heterogeneous subsets [38,39]. The association between neutrophil levels and CRC prognosis is controversial. Rao et al. observed that elevated intratumoral neutrophils in CRC were significantly associated with malignant phenotype and adverse prognosis in 229 patients [40]. However, Wikberg et al. found in a study of 448 patients that low infiltration of neutrophils in the tumor front was indicative of a worse prognosis [41]. Thus, research with larger CRC patient cohorts is needed to investigate the relationship of neutrophils with survival.
DNAm-based markers have shown prognostic value for CRC patients in previous studies [5,42]. Our study further revealed that blood DNAm-based leukocyte composition, especially the proportion of CD4+ T cells, could be used as an independent marker to enhance clinical judgment of prognosis in patients with early or intermediate stage CRC. Leukocyte composition can also be used to explore potential mechanisms underlying immune responses to tumor growth. The positive association of each subtype of circulating lymphocytes with CRC overall survival suggests that lymphocytes, at the interface of innate and adaptive immunity, suppress CRC progression. Our data suggest that specific lymphocytes and neutrophils may act as pathogenic effectors and could potentially serve as therapeutic targets. CD4+ T cells were the strongest predictor of CRC prognosis among leukocyte subtypes in our study. However, the associations between CD4+ T cell proportion and CRC survival outcomes were weak and statistically nonsignificant among patients with advanced CRC. A possible explanation is that survival in advanced CRC is generally extremely poor and the case numbers in this group were the smallest, which limited the statistical power to detect possible associations. Among early- and intermediate-stage patients, the predictive performance of circulating CD4+ T cell proportion was better for all-cause mortality than for CRC-specific mortality, even after controlling for comorbidity score. This suggests that circulating CD4+ T cells may not serve as a specific predictor of CRC prognosis. In subsite-specific analyses, the association of a higher proportion of CD4+ T cells with better survival was statistically significant among colon cancer patients but not among rectal cancer patients, for whom the sample size was smaller.
This study has several strengths including the prospective study design, large sample size, long-term follow-up, the well-recorded causes of death, and detailed information on the study population. Moreover, the composition of the leukocyte population was estimated using DNAm array data that is stable and can be obtained from stored whole blood samples.
There are also limitations. First, blood samples were obtained at various points of time shortly before or after surgery and potentially other therapy, and leukocyte composition can be affected by surgery, chemotherapy and radiotherapy [43]. The percentage of neutrophils was higher, and the percentage of lymphocytes was lower, shortly after surgery and chemotherapy or radiotherapy (Figure S1). We therefore adjusted for the timing of blood sampling relative to treatment in all Cox models and plotted survival curves stratified by this timing. Significant and consistent associations of CD4+ T cell proportions with CRC prognosis were observed among all groups of patients except those whose blood sample was taken during, or within 5 months of the initiation of, chemo- or radiotherapy. These patterns suggest that one should avoid collecting blood samples during the course of chemo- or radiotherapy when using leukocyte estimates to predict CRC prognosis in research and clinical practice. Second, information on absolute leukocyte counts is missing in our study. Even though the DNAm-based estimation of leukocyte composition has been validated in various studies, comparison of the DNAm-based method with the traditional cell counting method is still needed in further studies. Nevertheless, DNA methylation-estimated cell proportions could help scientists to further understand the role of cell counts as prognostic biomarkers in CRC by using archived blood samples. Third, CRC is recognized as a heterogeneous disease, and its behavior differs between molecular subtypes [44]. Information on molecular subtypes was only available for a proportion of our patients and did not allow sufficiently powered analyses in our study. Further studies with comprehensive information on CRC molecular subtypes are needed to investigate the association across molecular subtypes. Lastly, the current high cost of DNAm microarrays makes this technique difficult to adopt in clinical settings. However, this study provides a potential tool to improve the prediction of CRC prognosis and to assist the physician's decision-making. It also provides a foundation and inspiration for further study. The DNAm-based markers may gain wider use if a low-price DNAm microarray technique is developed and commercialized.

Conclusions
Our findings suggest that blood DNAm-based estimates of leukocyte distribution, in particular the proportion of CD4+ T cells, have the potential to improve the accuracy of prognostic judgment for patients with early and intermediate CRC. However, validation by other studies, in particular studies with information on molecular subtypes of CRC and leukocyte count, is warranted to corroborate and extend the clinical importance of this observation.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/cancers13122948/s1, Table S1: Baseline characteristics of the patient cohort according to tumor stages. Table S2: Median and interquartile range of leukocyte proportions by characteristics of the study population at baseline. Table S3: Associations of leukocyte composition with all-cause and colorectal cancer (CRC)-specific mortality in total study population (sensitivity analyses). Table S4: Associations of leukocyte composition with all-cause and colorectal cancer (CRC)-specific mortality among patients whose blood sample was taken prior to surgery, more than 1 month after surgery but not receiving chemo- or radiotherapy or more than 5 months after the first chemo- or radiotherapy. Figure S1: Leukocytes proportion according to the time of blood collection relative to treatments. Figure S2: Correlations of leukocyte subtype proportions. Figure S3: Survival curves for CD4+ T cell proportion by the timing of blood sampling. Figure S4: Adjusted restricted cubic splines for CRC patients across quartiles of CD4+ T cells proportion by therapies.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to ethical and data security requirements.