Epigenome-Wide Association Study of Prostate Cancer in African Americans Identifies DNA Methylation Biomarkers for Aggressive Disease

DNA methylation plays important roles in prostate cancer (PCa) development and progression. African American men have higher incidence and mortality rates of PCa than other racial groups in the U.S. The goal of this study was to identify differentially methylated CpG sites and genes between clinically defined aggressive and nonaggressive PCa in African Americans. We performed genome-wide DNA methylation profiling in leukocyte DNA from 280 African American PCa patients using the Illumina MethylationEPIC array, which contains about 860K CpG sites. There was a slight increase in overall methylation level (mean β value) with increasing Gleason score (GS = 6, GS = 7, GS ≥ 8; P for trend = 0.002). There were 78 differentially methylated CpG sites with P < 10⁻⁴ and 9 sites with P < 10⁻⁵ in the trend test. We also found 77 differentially methylated regions/genes (DMRs), including 10 homeobox genes and six zinc finger protein genes. A gene ontology (GO) molecular pathway enrichment analysis of these 77 DMRs found that the main enriched pathway was DNA-binding transcription factor activity. A few representative DMRs include HOXD8, SOX11, ZNF471, and ZNF577. Our study suggests that leukocyte DNA methylation may be a valuable biomarker for aggressive PCa and that the identified differentially methylated genes provide biological insights into the modulation of immune response by aggressive PCa.


Introduction
Prostate cancer (PCa) is the most common cancer and second leading cause of cancer death in American men, with an estimated 248,530 new cases and 34,130 deaths from PCa in the U.S. in 2021 [1]. African American men have the highest incidence and mortality rates among all racial/ethnic groups in the U.S. [2,3]. Prostate-specific antigen (PSA) testing has enabled the detection of PCa at early stages and greatly improved the survival of PCa patients. The majority of PSA screening-detected PCa cases are localized, indolent, and not life-threatening. However, most PCa patients opt to receive upfront aggressive therapies (radical prostatectomy and radiotherapy) that are often associated with significant morbidity; thus, overdiagnosis and overtreatment for localized PCa patients have become

Study Population
This study included 280 self-reported African American men with histologically confirmed prostate adenocarcinoma. All patients were treated and followed up at the University of Texas MD Anderson Cancer Center. Blood specimens were collected at the time of diagnosis, before treatments. Clinical and follow-up data, which included date of diagnosis, performance status, clinical stage, Gleason scores, PSA levels at diagnosis and follow-up, and treatment (e.g., prostatectomy, radiotherapy, and hormone therapy), were extracted from electronic medical records. The study was approved by the Institutional Review Board of MD Anderson Cancer Center. All patients signed an informed consent form.

Whole Genome Methylation Profiling Using Illumina Human MethylationEPIC Beadchip
Whole-genome CpG site methylation profiling was performed using the Illumina Human MethylationEPIC BeadChip, as previously described [22,23]. The MethylationEPIC BeadChip contains over 850,000 CpG sites across the human genome, among which 54% are located within gene promoters, 30% in gene bodies, and 16% in intergenic regions [24]. CpG sites in CpG islands are enriched on the MethylationEPIC chip, accounting for 19% of all the CpG sites on the chip [24]. Briefly, 500 ng of genomic DNA from peripheral blood was treated with sodium bisulfite using the EZ DNA Methylation-Gold Kit (Zymo Research, Irvine, CA) following the manufacturer's protocol. Bisulfite-treated DNA was hybridized to the MethylationEPIC chip according to the manufacturer's protocol.
BeadChips were scanned on an Illumina HiScan SQ. The fluorescence intensities of the images were extracted using the Genome Studio Methylation Module.

Bioinformatics and Data Analyses
Raw fluorescence intensity data (idat files) were processed and normalized using the minfi R package [25]. The methylation status of each CpG site was expressed as a β value, calculated as the ratio of the methylated (M) fluorescence intensity signal to the sum of the methylated and unmethylated (U) signals [25]. β values range between 0 (non-methylated) and 1 (completely methylated). Batch effects were removed using the ComBat function from the sva R package [26]. Variation in peripheral blood white cell proportions was controlled for using cell proportion estimates generated by the estimateCellCounts function in the minfi R package [27]. Differentially methylated CpG sites (DMCs) were identified by comparing the β values of each CpG site across the GS = 6, GS = 7, and GS ≥ 8 groups using a trend test. To determine the prognostic value of each DMC, we dichotomized patients into high- and low-methylation groups based on the median β value and used the low-methylation group as the reference group. A multivariable Cox proportional hazards model was used to estimate the hazard ratio (HR) and 95% confidence interval (CI) for the association of each DMC with biochemical recurrence (BCR), adjusting for age, PSA level, Gleason score, clinical stage, and treatment. We also identified differentially methylated regions/genes (DMRs). We defined a DMR as a region containing at least 7 consecutive CpG sites, with a maximum distance of 500 base pairs between adjacent CpG sites.
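As a minimal illustration of the β value and the median split described above, the following Python sketch computes β from methylated and unmethylated intensities and assigns patients to high- and low-methylation groups. The function names, the optional stabilizing `offset`, the tie rule at the median, and the intensity values are all assumptions made for illustration; the analysis in this study was carried out with minfi in R.

```python
from statistics import median


def beta_value(meth, unmeth, offset=0.0):
    """Return beta = M / (M + U + offset) for one CpG site.

    `offset` is an optional stabilizing constant (assumption); with the
    default of 0 this is the plain ratio of methylated to total signal.
    """
    total = meth + unmeth + offset
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return meth / total


def split_by_median(betas):
    """Label each patient 'high' or 'low' relative to the median beta.

    Values equal to the median go to 'low', the reference group
    (this tie rule is an assumption).
    """
    cutoff = median(betas)
    return ["high" if b > cutoff else "low" for b in betas]


# Hypothetical (methylated, unmethylated) intensities for five patients
# at a single CpG site.
intensities = [(3000, 1000), (1000, 3000), (2000, 2000), (500, 4500), (4500, 500)]
betas = [beta_value(m, u) for m, u in intensities]
groups = split_by_median(betas)

for b, g in zip(betas, groups):
    print(f"beta = {b:.2f} -> {g}")
```

The group labels produced this way would then serve as the exposure variable in a survival model, with 'low' as the reference level.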

Leukocyte CpG Methylation Pattern in African American PCa Patients
We first compared the genome-wide CpG methylation levels between patients with different Gleason scores. Overall, the mean methylation levels of all the assayed 860K CpG sites were slightly higher in GS ≥ 8 (mean β = 0.5091) and GS = 7 (mean β = 0.5088) than in GS = 6 (mean β = 0.5076) patients (P for trend = 0.002). As expected, the mean methylation levels differed markedly depending on the locations of CpG sites, with CpG sites within or near transcriptionally active regions showing the lowest methylation levels (Figure 1). CpG islands play critical roles in regulating gene expression and mostly have very low methylation (mean β = 0.189), allowing active transcription, while CpG sites outside of CpG islands have much higher methylation (mean β = 0.420, 0.645, and 0.625 for CpG sites in shores, shelves, and open seas, respectively). When CpG sites were grouped by location relative to genes, the mean β values showed a progressive increase with increasing distance from the core promoter. CpG sites within 200 base pairs of the transcription start site (TSS200) had the lowest methylation (mean β = 0.187), followed by those in Exon 1 (mean β = 0.218), TSS1500 (within 1500 base pairs of the TSS) (mean β = 0.378), the 5′ untranslated region (5′-UTR) (mean β = 0.417), the intergenic region (IGR) (mean β = 0.566), the gene body (mean β = 0.608), and the 3′-UTR (mean β = 0.653) (Figure 1). We then performed a trend test to identify individual CpG sites that showed significant trends of increasing or decreasing methylation associated with increasing Gleason scores. There were 52,456 differentially methylated CpG sites with nominal significance (P < 0.05), 10,734 CpG sites with P < 0.01, 993 sites with P < 0.001, 78 sites with P < 10⁻⁴, and 9 sites with P < 10⁻⁵. About 80% of these DMCs showed a progressive increase of methylation with increasing GS (Table 2).
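The text does not specify which trend test was used, so the following Python sketch shows only one common choice, under stated assumptions: β at a single CpG site is regressed on an ordinal Gleason group score (0 for GS = 6, 1 for GS = 7, 2 for GS ≥ 8), and the slope is tested with a large-sample normal approximation. The function name, the scoring, and the β values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def linear_trend_test(betas, scores):
    """Regress beta on an ordinal group score.

    Returns (slope, z, two-sided p). The p-value uses a normal
    approximation to the slope's t statistic (assumption; reasonable
    only for large samples).
    """
    n = len(betas)
    if n != len(scores) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = mean(scores), mean(betas)
    sxx = sum((x - x_bar) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("scores must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(scores, betas))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance of the fitted line, then the slope's standard error.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, betas))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit; standard error is zero")
    z = slope / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, z, p


# Hypothetical beta values at one CpG site for 12 patients, 4 per group.
scores = [0] * 4 + [1] * 4 + [2] * 4
betas = [0.50, 0.52, 0.51, 0.49,
         0.53, 0.54, 0.52, 0.55,
         0.56, 0.57, 0.55, 0.58]

slope, z, p = linear_trend_test(betas, scores)
print(f"slope per group step = {slope:.3f}, z = {z:.2f}, p = {p:.2e}")
```

A positive slope corresponds to methylation increasing with Gleason score; applying such a test to every CpG site on the array yields the per-site P values that are then binned by threshold.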
To test the prognostic value of these DMCs, we used a multivariable Cox proportional hazards model to determine the associations of these DMCs with biochemical recurrence (BCR). Among these top DMCs, only two were independently associated with BCR at a significance level of 0.05 (Table 2). High methylation at cg16432885 and cg00915676 was associated with significantly increased risks of BCR (HR = 3.66, 95% CI 1.33-10.11, P = 0.012, and HR = 3.24, 95% CI 1.12-9.4, P = 0.030, respectively). The association of high methylation with worse prognosis is consistent with the increased methylation levels at these two CpG sites in patients with higher Gleason scores.
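The reported hazard ratios, confidence intervals, and P values can be cross-checked with a short calculation. The Python sketch below assumes the intervals are Wald intervals (symmetric around log HR), which the text does not state; under that assumption it recovers the standard error of log HR, the z statistic, and the two-sided P value from the published numbers. The function name is hypothetical.

```python
from math import log
from statistics import NormalDist


def wald_from_hr(hr, ci_low, ci_high, level=0.95):
    """Recover SE(log HR), z, and two-sided P from an HR and its CI.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(hr) / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return se, z, p


# HR, lower bound, and upper bound as reported in the text.
reported = {
    "cg16432885": (3.66, 1.33, 10.11),
    "cg00915676": (3.24, 1.12, 9.4),
}

results = {cpg: wald_from_hr(*values) for cpg, values in reported.items()}
for cpg, (se, z, p) in results.items():
    print(f"{cpg}: SE(log HR) = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

Under the Wald assumption, the recovered P values come out at approximately 0.012 and 0.030, in line with the values reported for the two CpG sites.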

Differentially Methylated Regions/Genes (DMRs) in High-Grade African American PCa Patients
Because DMRs are more likely to be functionally important than scattered individual DMCs, we next searched for DMRs in which at least seven CpGs showed consistently increased or decreased methylation associated with increasing Gleason scores. A total of 77 DMRs were found (Table 3 and Supplemental Table S1). There were 10 homeobox genes (ALX1, HOXC11, HOXD1, HOXD8, HOXD11, LHX8, MSX2, NKX6-2, PAX7, POU4F2) and six zinc finger protein genes (ZBTB16, ZNF83, ZNF471, ZNF577, ZNF714, ZSCAN1). Figure 2 shows representative DMRs. The majority of these DMRs were found in gene promoter regions, with higher methylation levels in patients with higher Gleason scores. We performed a gene ontology (GO) molecular pathway enrichment analysis of these 77 DMRs. The main enriched pathway was DNA-binding transcription factor activity, consistent with the major functional roles of homeobox proteins and zinc finger proteins as transcription factors.
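The DMR criteria (at least seven consecutive CpG sites changing in the same direction, with no more than 500 base pairs between neighbouring sites) can be expressed as a simple scan along a chromosome. The Python sketch below is one possible reading of those criteria, not the procedure actually used in the study: each CpG site is assumed to have already been labelled +1 (methylation increases with Gleason score), -1 (decreases), or 0 (no association), and the function name, input layout, and coordinates are hypothetical.

```python
def find_dmrs(sites, min_sites=7, max_gap=500):
    """Scan position-sorted CpG sites on one chromosome for candidate DMRs.

    sites: list of (position_bp, direction), direction in {+1, -1, 0}.
    Returns a list of (start_bp, end_bp, n_sites, direction).
    """
    dmrs = []
    run = []  # current stretch of same-direction, closely spaced sites
    # The trailing sentinel (direction 0) forces the last run to be closed.
    for pos, direction in list(sites) + [(None, 0)]:
        extends = (
            bool(run)
            and direction != 0
            and direction == run[-1][1]
            and pos - run[-1][0] <= max_gap
        )
        if extends:
            run.append((pos, direction))
            continue
        if len(run) >= min_sites:
            dmrs.append((run[0][0], run[-1][0], len(run), run[0][1]))
        run = [(pos, direction)] if direction != 0 else []
    return dmrs


# Hypothetical coordinates: seven hypermethylated sites 100 bp apart,
# then a 900 bp gap followed by a short, mixed-direction stretch.
example = [(1000 + 100 * i, 1) for i in range(7)]
example += [(2500, 1), (2600, 1), (2700, -1)]

for start, end, n, direction in find_dmrs(example):
    label = "hyper" if direction > 0 else "hypo"
    print(f"DMR {start}-{end}: {n} CpGs, {label}methylated with higher GS")
```

In this example only the first seven sites qualify; the later sites are excluded because the 900 bp gap exceeds the spacing limit and the remaining run is too short.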

Discussion
In this study, we performed genome-wide CpG methylation profiling of leukocyte DNA from 280 African American PCa patients with different Gleason scores to identify intrinsic methylation differences that may serve as predictors of aggressive PCa in African Americans. To our knowledge, this is the first epigenome-wide association study (EWAS) of PCa in African Americans.
As expected, the mean methylation level was lowest in the core promoter region (TSS200) (mean β value < 0.2) and also remained low in Exon 1, TSS1500, and the 5′-UTR (mean β values between 0.2 and 0.4), but much higher levels were found in the gene body, 3′-UTR, and IGR (mean β values between 0.5 and 0.7), consistent with reports that lower promoter methylation allows a more open chromatin structure and active transcription [28]. When we compared the overall leukocyte methylation levels of GS = 6, GS = 7, and GS ≥ 8 patients, there was a slight increase in methylation level with increasing Gleason score. We identified a panel of promising DMCs that are differentially methylated between different Gleason scores. Methylation at these CpG sites may serve as a biomarker for aggressive cancer. More importantly, we identified at least 77 DMRs associated with high-grade PCa. The majority of these DMRs exhibited increased methylation with increased Gleason scores and are located in the promoter island regions of functionally important genes, indicating an overall downregulation of gene expression in leukocytes of high-grade PCa patients, likely affecting processes such as immune response and DNA repair, as well as contributing to the aggressive phenotypes.
Among the DMRs between PCa patients with high and low Gleason scores, there were 10 homeobox genes and six zinc finger genes. At least 235 homeobox genes have been identified in the human genome [29]. Homeobox genes contain a highly conserved DNA sequence of about 180 bp encoding the homeodomain of 60 amino acids. Homeodomain proteins act as transcription factors that specifically bind to DNA motifs and regulate the expression of numerous target genes involved in cell proliferation, apoptosis, adhesion, angiogenesis, and DNA repair [30,31]. Homeobox genes are frequently dysregulated in hematological malignancies and solid tumors [31,32]. In addition, homeodomain proteins play important roles in regulating inflammation and immune response [33-35]. Aberrant DNA methylation is one of the major causes of homeobox gene dysregulation during cancer development and progression [36]. Many homeobox genes are hypermethylated in various cancers, including prostate cancer [36]. Two of the homeobox genes identified in our current study, HOXD1 and HOXD8, have been shown to have tumor-suppressor functions in various cancers [37-39]. Increased methylation of HOXD8 was observed in lymphoma patients compared to normal B cells [40]. Hypermethylation of HOXD8 in urine was associated with disease progression in PCa patients on active surveillance [41]. HOXD8 and several other homeobox genes were hypermethylated in aged muscle tissue compared with young tissue [42]. These previous reports are consistent with our observations of increased HOXD8 methylation in high Gleason score PCa patients. Increased methylation of homeobox genes in leukocyte DNA of PCa patients with high Gleason scores may indicate weakened immune response and suboptimal DNA repair capacity.
The zinc finger proteins are the largest family of transcription factors, which function through the binding of the zinc finger domain to specific DNA sequences [43]. In addition to DNA binding, zinc finger proteins can also bind to RNAs and proteins [44]. Zinc finger proteins play crucial roles in transcriptional and post-transcriptional regulation of immune response [45] and are involved in cancer development and progression [46]. Among the zinc finger proteins identified in our study, ZNF471 functions as a tumor suppressor in several cancers and is frequently hypermethylated in tumor tissues [47-51]. Higher levels of ZNF577 methylation in leukocytes have been associated with obesity [52]. As obesity is associated with aggressive PCa [53], ZNF577 methylation may provide a biological link between obesity and PCa progression. Another study showed higher ZNF577 methylation in T-cells from kidney transplant patients who developed de novo skin cancer than in those who did not develop skin cancer [54], suggesting that increased methylation of ZNF577 may lead to weakened immune response. The exact molecular mechanisms by which these homeodomain and zinc finger proteins contribute to aggressive PCa warrant further investigation.
Another interesting DMR in our study is SOX11. SOX proteins are a family of about 20 transcription factors that have a conserved high-mobility group (HMG) domain that mediates DNA binding [55]. SOX11 has been shown to act as a tumor suppressor in several cancers, including prostate cancer [56,57]. SOX11 is hypermethylated in prostate tumor tissues, and the hypermethylation is associated with aggressive clinical features, including higher PSA and Gleason scores [58]. A previous study showed higher SOX11 methylation in leukocyte DNA from gastric cancer patients than in that from controls [59]. Our study is the first to show increased methylation of SOX11 in leukocyte DNA from aggressive PCa patients.
Previously, we performed a similar EWAS in European American (EA) prostate cancer patients using the Illumina 450k methylation arrays [23], which covered about 60% of the CpG sites on the MethylationEPIC arrays used in the current study. The overall methylation level (mean β value) is higher in African American (AA) than in EA patients with the same Gleason scores. There were more hypermethylated than hypomethylated CpG sites in patients with higher Gleason scores compared to those with lower Gleason scores. There was very little overlap in the top DMCs and DMRs between the two studies, but transcription factors were significantly enriched among the DMRs.
CpG methylation in leukocyte DNA sits at the interface between genetics and environment, with longstanding effects on gene expression, inflammation, and immune response [60]. Leukocyte DNA methylation signatures are excellent biomarkers of age and smoking status in a normal population [61-64]. We compared methylation patterns in PCa patients with different Gleason scores. The age and smoking status distributions in patients with GS = 6, GS = 7, and GS ≥ 8 in our study were very similar. In addition, most of our patients were never smokers. The differences in CpG site methylation were consistent when we performed analyses stratified by age group and smoking status (data not shown). The top DMCs in our study (Table 2) were not among the DMCs found in the aging and smoking methylation signatures [61-64]. Therefore, the DMCs identified in the current study were unlikely to be due to age, smoking, or other potential confounders. Leukocyte DNA methylation profiles have been used to derive systemic inflammation indices [65-68] and immune cell lineages [60]. We did not observe significant differences in the immune cell subpopulations between patients with different Gleason scores (data not shown), indicating that the differential methylation was not due to differences in immune cell subtypes, but reflected intrinsic biological changes associated with disease severity across various immune cell types. Leukocyte DNA methylation levels, therefore, may be valuable biomarkers for aggressive PCa. However, it is worth noting that the absolute methylation difference (β value difference) at each CpG site between high-grade and low-grade patients was small (generally <0.05). This small difference in leukocyte DNA methylation level between high-grade and low-grade PCa patients is not surprising because of the high background of normal leukocyte methylation.
This study had some limitations. First, this was a single-center study. External validation in independent populations is warranted to confirm the DMCs and DMRs identified in this study. Second, we only reported nominal significance values (Table 2). None of the DMCs reached the genome-wide significance level (P < 5.9 × 10⁻⁸) after correcting for multiple testing of 850K markers. This is not surprising, as very large sample sizes are typically required for genome-wide association studies (GWASs) and EWASs to reach genome-wide significance. Future large collaborative efforts are needed to substantially increase the sample size and bring the top DMCs to a genome-wide significance level.
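The genome-wide significance level quoted above is consistent with a Bonferroni correction of a 0.05 family-wise error rate over roughly 850,000 tests. The short Python sketch below reproduces that arithmetic; the choice of Bonferroni correction and the 0.05 level are inferred from the quoted threshold rather than stated explicitly in the text, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed family-wise error rate and approximate number of CpG markers tested.
threshold = bonferroni_threshold(alpha=0.05, n_tests=850_000)
print(f"Bonferroni threshold: {threshold:.2e}")
```

The result, about 5.88 × 10⁻⁸, rounds to the 5.9 × 10⁻⁸ threshold given above, and lies well below the smallest P value tier reported for the top DMCs (P < 10⁻⁵).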
In summary, we performed genome-wide DNA methylation profiling of leukocyte DNA in African American PCa patients. We observed slightly increased overall DNA methylation in high-grade PCa patients compared to low-grade PCa patients. We identified a large panel of differentially methylated CpG sites between patients with different Gleason scores. We found 77 differentially methylated genes between high-grade and low-grade patients, and homeodomain protein and zinc finger protein genes were enriched in DMRs.
Our study suggests that leukocyte DNA methylation may be a valuable biomarker for aggressive PCa and the identified DMRs provide biological insights into the modulation of immune response by aggressive PCa.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and ethical considerations.