Systemic Investigation of Promoter-wide Methylome and Genome Variations in Gout

Current knowledge of gout centers on hyperuricemia. Relatively little is known regarding the pathogenesis of gouty inflammation. To investigate the epigenetic background of gouty inflammation independent of hyperuricemia and its relationship to genetics, 69 gout patients and 1455 non-gout controls were included. Promoter-wide methylation was profiled with EPIC array. Whole-genome sequencing data were included for genetic and methylation quantitative trait loci (meQTL) analyses and causal inference tests. Identified loci were subjected to co-methylation analysis and functional localization with DNase hypersensitivity and histone marks analysis. An expression database was queried to clarify biologic functions of identified loci. A transcription factor dataset was integrated to identify transcription factors coordinating respective expression. In total, seven CpG loci involved in interleukin-1β production survived genetic/meQTL analyses, or causal inference tests. None had a significant relationship with various metabolic traits. Additional analysis suggested gouty inflammation, instead of hyperuricemia, provides the link between these CpG sites and gout. Six (PGGT1B, INSIG1, ANGPTL2, JNK1, UBAP1, and RAPTOR) were novel genes in the field of gout. One (CNTN5) was previously associated with gouty inflammation. Transcription factor mapping identified several potential transcription factors implicated in the link between differential methylation, interleukin-1β production, and gouty inflammation. In conclusion, this study revealed several novel genes specific to gouty inflammation and provided enhanced insight into the biological basis of gouty inflammation.


Marker Selection
In the MethylationEPIC platforms, CpG markers were classified based on their chromosome location and the feature category gene region as per University of California (UCSC) annotation (TSS200, TSS1500, 5'UTR, first Exon, Body, 3'UTR, and intergenic). In this classification system, the TSS200 category included the region between 0 and 200 bases upstream from the transcriptional start site (TSS); the TSS1500 category contained 201 to 1500 bases upstream TSS [10]; 5'UTR included the region between the TSS and the start site (ATG); CpGs within the first exon of a gene were considered as the first Exon category; CpGs downstream from the first exon including intronic regions until the stop codon were classified as gene body; CpGs located downstream from the stop codon until the poly A signal were considered as 3'UTR; and CpGs that were not classified in any of the previous categories were annotated as intergenic. Since contributions of environmental influences to DNA methylation peaked in the vicinity of transcription start sites [11] and binding sites of transcription factors, the readers and effectors of DNA methylation, occurred near transcription start sites [12,13], regions close to transcription start sites were more likely to be functional compared with those that lay far from transcription start sites. Past studies also confirmed a key role for the region proximal to transcription start sites in transcriptional regulation [14], and transcriptional silencing occurred when DNA region near transcription start site, including transcription start site upstream area and 5'UTR, became heavily methylated [15][16][17]. Hence, we focused our analysis on CpG sites located in TSS1500, TSS200, and 5'UTR that was broadly defined as promoters in past studies [18]. CpG sites of X and Y chromosome were excluded since X chromosomes underwent inactivation through methylation in females [19].

Methylation Data Processing and Analysis
Raw Idat files containing fluorescence intensity data were loaded. The intensity of methylated and unmethylated probe values was used to generate methylation β-values that were used for all of the downstream analyses. Human hg19 genomes were downloaded from the UCSC Genome Browser website (https://genome.ucsc.edu/). All of the downstream analyses were conducted using the hg19/GRCh37 human genome assembly. Minfi version 1.18.2 [20] was employed to load, annotate probes, and analyze the relationship between CpG sites and gout.
Methylation results from EPIC array were analyzed according to previously reported approaches ( Figure S1, Step 1) [21]. Quality control at the probe level was conducted by computing a detection P value relative to control probes. Probes with non-significant detection (P > 0.05) for 5% or more of the samples were excluded ( Figure S1, Step 1b). Furthermore, we removed probes annotated to sex chromosomes ( Figure S1, Step 1c), non-CpG probes ( Figure S1, Step 1d), probes containing single nucleotide polymorphisms (SNPs) (minor allele frequency ≥ 5%), probes with SNPs at the single base extension (minor allele frequency ≥ 5%), and probes with an SNP at the CpG site (minor allele frequency ≥ 5%) ( Figure S1, Step 1e) [21]. Finally, we excluded 40,377 cross-reactive probes previously identified in the MethylationEPIC BeadChip ( Figure S1, Step 1f) [21]. Data were further preprocessed using functional normalization with principal components from control probes to adjust for technical variation ( Figure S1, Step 1g) [21]. Qualified probes were included in the following analyses.
To evaluate associations between methylation and gout, a linear regression model was used to identify the differentially methylated probes by testing the association of every CpG site with gout, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets ( Figure S1, Step 1h), similar to past approaches [22,23]. Non-smokers included those who never smoked or did not continuously smoke for six months or more. Former smokers were those who continuously smoked for a minimum of six months but were not smoking at the time that data were collected, while current smokers included those who ceaselessly smoked for six months or more and were still smoking. Alcohol consumption categories comprised non-drinkers (those who did not drink alcohol or drank <150 cc per week for six months), former drinkers (those who quitted alcohol for more than six months) and current drinkers (those whose weekly alcohol consumption for six consecutive months was at least 150 cc), according to the definition from Taiwan Biobank questionnaires [2].
The proportions of various cell types were inferred with minfi [24,25]. Minfi used informative CpG probes of past studies to estimate the proportions of T-(CD8, CD4), NK-, and B-lymphocytes, monocytes, and total granulocytes [24]. Differential methylation associations between gout and nongout were corrected for multiple testing using a Benjamini-Hochberg method [26]. The threshold of significance levels was set at 5 % as described previously [27].

Identification of CpG Sites Specifically Associated with Gouty Inflammation
To gain insight about these differentially methylated CpG sites, we built protein-protein interaction network on the target genes mapped by differentially methylated CpG sites with NetworkAnalyst ( Figure S1, Step 2a) [28] and visualized with Cytoscape [29]. NetworkAnalyst utilized machine learning and Walktrap algorithms and integrated protein-protein interaction data from IMEx Interactome database to identify important genes (hubs) that played critical roles in the biological networks [29]. NetworkAnalyst showed that many hub genes were interleukin-1β (IL-1β)regulating genes ( Figure S2). Thus we conducted a literature search to clarify biologic functions of differentially methylated genes. CpG sites mapped to genes regulating IL-1β, the key player in gouty inflammation [30], or participating in gouty inflammation in past studies were retained for the following analysis ( Figure S1, Step 2b).
Additionally, gout was associated with numerous metabolic comorbidities, including increased body mass index, elevated glycated hemoglobin (HbA1c), and hypercholesterolemia [31,32]. To test the specificity of identified CpG methylation in gout, we explored associations between DNA methylation and body mass index, HbA1c, and total cholesterol, adjusting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and cell proportions inferred with minfi ( Figure S1, Step 2c) [24,25].
To further exclude the contribution of these CpG sites to progression from normouricemia to hyperuricemia, we investigated whether methylation was related to hyperuricemic status in nongout participants, using sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets as covariates ( Figure S1, Step 2d).

Comparison with Past Genome-Wide Association Studies (GWAS) Results
To compare study results with past uric acid-associated risk loci, we conducted a systematic literature search. A systematic literature search of PubMed (on October 1, 2019) identified studies addressing the relationship between genetic loci and uric acid levels. The following search terms were used: 'GWAS' and 'uric acid'. Studies were included if they were written in English and contained data on loci associated with uric acid levels. No limitation was placed on ethnicity. Review articles and case reports were excluded. Selected articles were screened for potential uric acid-associated loci. Only genome-wide significant loci (P < 5 × 10 -8 ) related to uric acid levels were retained (Figure S1, Step 2e).

Genetic and Methylation Quantitative Trait loci (meQTL) Analysis and Causal Inference Test
To eliminate epigenetic associations between gout and CpG site methylation confounded by genetic factors, we conducted methyl-quantitative trait locus (meQTL) analyses and causal inference tests of variants within 20000 base pairs of CpG, similar to previous studies ( Figure S1, Step 3) [33,34]. Briefly, we first retrieved variant information from UCSC dbSNP version 151 track from UCSC. Variants that were located within 20000 base pairs of CpG of interest were reserved. Genotyping results of respective variants were obtained from whole genome sequencing. Next, we evaluated the associations between genetic variants and methylation level of a CpG site, adjusting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and cell compositions. Additionally, we also tested associations between variants and gout, adjusting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and cell proportions, and plotted regional variant results from analysis.
For variants of concomitant associations with CpG methylation and gout, differential methylation potentially mediated genetic association between these variants and gout ( Figure S3), as demonstrated in past study of rheumatoid arthritis [34]. Thus, we conducted causal inference test ( Figure S1, Step 3b) [34,35]. The causal inference test involved a series of conditional probability tests to determine if the association between variant and gout was mediated by CpG methylation, in this case, by testing for the following conditions: (a) variant was associated with gout; (b) variant was associated with CpG methylation after adjusting for gout; (c) CpG methylation was associated with gout after adjusting for variant; and (4) variant was independent of gout after adjusting for CpG methylation. These four conditions had to be met to produce a significant causal inference test that supported the causal structure of an variant-CpG methylation-gout relationship ( Figure S3). If not, variants possibly confounded associations of CpG methylation with gout.

Methylation Correlation of Methylated CpG Positions
After identifying CpG sites whose methylation was associated with gouty inflammation, we applied coMET to estimate DNA methylation correlation between CpG sites ( Figure S1, Step 4a) [36]. We searched for nearby CpG sites located in TSS1500, TSS200, and 5'UTR. A regional plot of the epigenetic-phenotype association results, the estimated DNA methylation correlation between CpG sites (co-methylation), and the genomic context was generated to visualize the results.

Functional Localization of Differentially Methylated CpG Sites in Monocytes
To characterize the biological relevance of differentially methylated CpG sites, the WashU epigenome browser (https://epigenomegateway.wustl.edu/) was utilized as the annotation database to inspect chromatin modifications and histone acetylation patterns of monocytes in the vicinity of CpG sites of interest [37]. We retrieved DNase footprinting patterns in the region of our association results since DNase hypersensitivity implied open chromatin structure, the usual characteristics of active regulatory elements [38]. CpG regions were also aligned against ChIP-seq data for acetylated and methylated lysine variants of histone H3 (H3K4me1, H3K4me3, H3K9ac, and H3K27ac), the histone code of active regulatory regions [39]. Locations of differentially methylated sites with respect to DNase and histone code of monocytes was visualized in WashU's epigenome browser ( Figure S1, Step 4b).

Gene Expression Omnibus (GEO) Data Analysis
To elucidate the potential functions of differentially methylated genes, we retrieved past expression microarray data from the GEO repository for further analysis. Briefly, GEOquery was utilized to download a corresponding SeriesMatrix file and platform annotation files via FTP from corresponding GEO datasets [50]. Expression levels of various genes were extracted from downloaded SeriesMatrix files that were inputs of differential expression analysis. Limma Bioconductor libraries were used to calculate the statistical significance of expression level changes of interested genes between different conditions [51]. 33   The results were expressed as mean ± standard deviation. HbA1C: glycosylated hemoglobin. NA: not applicable. regulates cells proliferation and apoptosis [1], migration [2], invasion [3], and drug resistance [4] cg11168446 0.87% 1.25×10 -5 1 94312877 BCAR3 TSS200 is associated with proliferation, migration, invasion, drug resistance of cancer [5], and contributes to cancer prognosis, diabetes, and body mass index [6][7][8]  participates in iron metabolism, cell proliferation, migration, invasion [31,32], cytotoxic drug resistance, DNA damage response [33,34], protein ubiquitination and degradation [35,36], and is associated with Parkinson's disease [37] cg01120514 modulates transforming growth factor (TGF)-β signaling [193], proliferation, migration, invasion [194,195], angiogenesis [196], bone remodeling, chondrogenesis [197,198], fibrosis [199], metabolism [200][201][202], and contributes to neurodevelopment disorder [203,204], chronic otitis media [192], colistin nephrotoxicity [205], and myopia [206] cg21486148 -0.74% 6.03×10 -6 19 1070688 HMHA1 TSS1500 regulates proliferation, migration, invasion, and endothelial barrier function [207,208] Figure S1. Flowchart of the analysis pipeline. Arrows indicate the direction of the workflow of the study design. Five major steps are represented by different colors. The boxes indicate the analyses taken within each step. The study cohort first undergoes promoter-wide methylation profiling with EPIC array (Step 1a). After removing probes with non-significant detection (Step 1b), sex chromosome probes (Step 1c), non-CpG probes (Step 1d), probes containing single nucleotide polymorphism (SNP) or with SNP at single base extension or SNP at CpG site (Step 1e), cross-reactive probes (Step 1f), remaining probes are subjected to functional normalization (Step 1g) and linear regression, adjusting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and cell subsets (Step 1h). Probes that are significantly associated with gout enter following analyses (Step 2-5). In Step 2a, we create protein-protein interaction network with NetworkAnalyst to gain insight about the biologic network of gout (Step 2a). We then identify CpG sites that are mapped to genes regulating interleukin-1β (IL-1β) or gouty inflammation (Step 2b). We further exclude CpG sites that are associated with metabolic traits (Step 2c), epigenetic associations of which come from hyperuricemia (Step 2d), or which are located in uric acid-associated genes (Step 2e). For the remaining CpG sites, genetic and methylation quantitative trait loci (meQTL) analyses (Step 3a) with additional causal inference tests (Step 3b) are conducted to identify CpG sites whose epigenetic relationships with gout are not confounded by genetic factors. Co-methylation analysis (Step 4a) and functional localization (Step 4b) are performed to reveal regulatory potential of CpG sites. We finally utilize MoLoTool (Step 5a) and ReMap (Step 5b) to identify potential transcription factors mediating the relationship between CpG sites and gout (Step 5). Figure S2. Protein-protein interaction network created by NetworkAnalyst. The protein-protein interaction network reveals several hub genes regulate interleukin-1β (IL-1β) ( Table 1). Hub genes regulating IL-1β are highlighted with distinct hues.

Figure S3. Identification of methylation mediating associations between genetic variant and gout.
The schematic representation shows a causality model where association between genetic variant and gout is caused by methylation-mediated relationship, in which genmetic variant influences CpG methylation level that in turn affects risk of gout. The causal inference test attempts to identify such causal network. Circles represent effect size estimates, with error bars representing respective 95% confidence interval. The effect size estimates and corresponding 95% confidence intervals are also listed. The associations between CpG methylation and body mass index, HbA1c levels, and total cholesterol levels are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (A) Regional association plots of nearby variants with methylation levels of cg20419410. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and cg20419410 methylation. The variants with P values less than threshold are labeled with corresponding rs number. The associations between variants and CpG methylation are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (B) Regional association plots of nearby variants with gout. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and gout. The variants with P values less than threshold are labeled with corresponding rs number. Every point is one variant colored with a respective hue, with different colors implying different variants. The dashed purple lines indicate the significance threshold (P = 0.05), and the red box highlights the location of cg20419410. The associations between variants and gout are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. Figure S6. Genetic and methyl-quantitative trait locus (meQTL) analysis of cg17618153 (ANGPTL2). (A) Regional association plots of nearby variants with methylation levels of cg17618153. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and cg17618153 methylation. The associations between variants and CpG methylation are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (B) Regional association plots of nearby variants with gout. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and gout. The variants with P values less than threshold are labeled with corresponding rs number. Every point is one variant colored with a respective hue, with different colors implying different variants. The dashed purple lines indicate the significance threshold (P = 0.05), and the red box highlights the location of cg17618153. The associations between variants and gout are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. Figure S7. Genetic and methyl-quantitative trait locus (meQTL) analysis of cg15686135 (JNK1). (A) Regional association plots of nearby variants with methylation levels of cg15686135. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and cg15686135 methylation. The variants with P values less than threshold are labeled with corresponding rs number. The associations between variants and CpG methylation are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (B) Regional association plots of nearby variants with gout. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and gout. The variants with P values less than threshold are labeled with corresponding rs number. Every point is one variant colored with a respective hue, with different colors implying different variants. The dashed purple lines indicate the significance threshold (P = 0.05), and the red box highlights the location of cg15686135. The associations between variants and gout are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets.

Figure S8. Genetic and methyl-quantitative trait locus (meQTL) analysis and causal inference test of cg14167017 (UBAP1).
(A) Regional association plots of nearby variants with methylation levels of cg14167017. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and cg14167017 methylation. The variants with P values less than threshold are labeled with corresponding rs number. The associations between variants and CpG methylation are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (B) Regional association plots of nearby variants with gout. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and gout. The variants with P values less than threshold are labeled with corresponding rs number. Every point is one variant colored with a respective hue, with different colors implying different variants. The dashed purple lines indicate the significance threshold (P = 0.05), and the red box highlights the location of cg14167017. Variants of concomitant associations with cg14167017 methylation and gout (rs1189081296, rs962251804) are marked with red color. The associations between variants and gout are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (C) For variants of concomitant associations with cg14167017 methylation and gout (rs1189081296, rs962251804), causal inference test indicates cg14167017 methylation mediates relationship between variants (rs1189081296, rs962251804) and gout. Figure S9. Genetic and methyl-quantitative trait locus (meQTL) analysis and causal inference test of cg11988568 (RAPTOR). (A) Regional association plots of nearby variants with methylation levels of cg11988568. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and cg11988568 methylation. The variants with P values less than threshold are labeled with corresponding rs number. The associations between variants and CpG methylation are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (B) Regional association plots of nearby variants with gout. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and gout. The variants with P values less than threshold are labeled with corresponding rs number. Every point is one variant colored with a respective hue, with different colors implying different variants. The dashed purple lines indicate the significance threshold (P = 0.05), and the red box highlights the location of cg11988568. The variant of concomitant associations with cg11988568 methylation and gout (rs754012543) is marked with red color. The associations between variants and gout are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (C) For variant of concomitant associations with cg11988568 methylation and gout (rs754012543), causal inference test indicates cg11988568 methylation mediates relationship between variant (rs754012543) and gout.

Figure S10. Genetic and methyl-quantitative trait locus (meQTL) analysis of cg16745952 (CNTN5).
(A) Regional association plots of nearby variants with methylation levels of cg16745952. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and cg16745952 methylation. The variants with P values less than threshold are labeled with corresponding rs number. The associations between variants and CpG methylation are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets. (B) Regional association plots of nearby variants with gout. X-axis represents positions on the respective chromosome. Y axis represents minus log10P of associations between variants and gout. The variants with P values less than threshold are labeled with corresponding rs number. Every point is one variant colored with a respective hue, with different colors implying different variants. The dashed purple lines indicate the significance threshold (P = 0.05), and the red box highlights the location of cg16745952. The associations between variants and gout are calculated with multiple regression, correcting for sex, age, smoking history (total pack-years), smoking status, alcohol consumption, and blood cell subsets.