DNA Methylation Patterns According to Fatty Liver Index and Longitudinal Changes from the Korean Genome and Epidemiology Study (KoGES)

The role of differentially methylated regions (DMRs) in nonalcoholic fatty liver disease (NAFLD) is unclear. This study aimed to identify the role of DMR in NAFLD development and progression using the Korean Genome and Epidemiology Study (KoGES) cohort. We used laboratory evaluations and Illumina Methylation 450 k DNA methylation microarray data from KoGES. The correlation between fatty liver index (FLI) and genomic CpG sites was analyzed in 322 subjects. Longitudinal changes over 8 years were confirmed in 33 subjects. To identify CpG sites and genes related to FLI, we obtained enrichment terms for 6765 genes. DMRs were identified for both high (n = 128) and low (n = 194) groups on the basis of FLI 30 in 142 men and 180 women. To confirm longitudinal changes in 33 subjects, the ratio of follow-up and baseline investigation values was obtained. Correlations and group comparisons were performed for the 8 year change values. PITPNM3, RXFP3, and THRB were hypermethylated in the increased FLI groups, whereas SLC9A2 and FOXI3 were hypermethylated in the decreased FLI groups. DMRs describing NAFLD were determined, and functions related to inflammation were identified. Factors related to longitudinal changes are suggested, and blood circulation-related functions appear to be important in the management of NAFLD.


Introduction
The global prevalence of nonalcoholic fatty liver disease (NAFLD) has increased to 25% of the general population and is the main cause of chronic liver disease [1][2][3]. Although most patients show a benign clinical course [4], 20-33% of patients progress to nonalcoholic steatohepatitis (NASH) with fibrosis, which can develop to liver cirrhosis and hepatocellular carcinoma [5,6]. In addition, NAFLD is associated with elevated risk for cancer development and cardiovascular diseases [7][8][9]. Several factors are associated with the development and progression of NAFLD, including age, sex, genetics, epigenetics, and other metabolic factors [10][11][12]. The relationship between NAFLD development and progression and genetic and epigenetic factors has not been fully elucidated. Therefore, screening of NAFLD patients in the general population is important to identify patients with a high risk of disease progression. NAFLD can be diagnosed when patients have hepatic steatosis confirmed by liver biopsy and imaging studies such as ultrasonography, computed tomography (CT), and magnetic resonance imaging (MRI)

Subjects and Data Source
The epidemiological and epigenomic datasets of this study were collected from a survey during 2001-2002 and were obtained from the AS-AS cohort of the KoGES released by the Korea Center for Disease Control and Prevention [21]. Participants were 40-69 years of age and belonged to the Ansan and Ansung communities located in Gyeonggi province, South Korea. All the participants provided written informed consent. For DNA methylation data, the matrix provided by KoGES was used, and the detailed bioinformatics process followed has been described in a previous study [33]. This study was approved by the Institutional Review Board (IRB) of Korea University (approval number: KUIRB-2020-0191-01) and was performed in accordance with the Declaration of Helsinki.

Study Design
In this study, we investigated the epigenetic patterns of NAFLD. We divided the participants into two groups on the basis of one-time (n = 322) and longitudinal (n = 33) analyses. Flowcharts describe the conditions under which the subjects were excluded from the analysis and classified (Figure 1). The Pearson correlation between the CpG beta value and FLI was measured for the 322 patients included in the primary analysis. The comparison pattern between the beta value and FLI was analyzed over time. As a case-control study, we selected high and low intergroup differentially methylated regions (DMRs) for 322 subjects on the basis of the FLI 30. For the 33 patients included in the longitudinal study, CpG sites were extracted for the group with increased and decreased beta value ratios.
The longitudinal change in FLI was calculated by dividing the baseline by the followup. To categorize longitudinal change in FLI, it was divided into an up group when it was more than 1.1 and a down group when it was less than 0.9. Out of a total of 33 subjects, except for 10 subjects, 13 subjects were classified as up and 10 subjects as down. The characteristics of each group are presented in Table 1.
Curr. Issues Mol. Biol. 2022, 2, FOR PEER REVIEW 3 by the Institutional Review Board (IRB) of Korea University (approval number: KUIRB-2020-0191-01) and was performed in accordance with the Declaration of Helsinki.

Study Design
In this study, we investigated the epigenetic patterns of NAFLD. We divided the participants into two groups on the basis of one-time (n = 322) and longitudinal (n = 33) analyses. Flowcharts describe the conditions under which the subjects were excluded from the analysis and classified (Figure 1). The Pearson correlation between the CpG beta value and FLI was measured for the 322 patients included in the primary analysis. The comparison pattern between the beta value and FLI was analyzed over time. As a case-control study, we selected high and low intergroup differentially methylated regions (DMRs) for 322 subjects on the basis of the FLI 30. For the 33 patients included in the longitudinal study, CpG sites were extracted for the group with increased and decreased beta value ratios.
The longitudinal change in FLI was calculated by dividing the baseline by the followup. To categorize longitudinal change in FLI, it was divided into an up group when it was more than 1.1 and a down group when it was less than 0.9. Out of a total of 33 subjects, except for 10 subjects, 13 subjects were classified as up and 10 subjects as down. The characteristics of each group are presented in Table 1. Of the 446 samples, 50 samples were reanalyzed at follow-up after 8 years. A total of 322 subjects with the laboratory parameters necessary to calculate the fatty liver index (FLI) were obtained. Among them, samples were available for 33 subjects and were followed up. The correlation between the FLI value and the beta value was obtained and presented as correlation plots for each CpG. On the basis of the FLI value of 30, we selected differentially methylated regions (DMRs) between high and low groups. A total of 322 subjects with the laboratory parameters necessary to calculate the fatty liver index (FLI) were obtained. Among them, samples were available for 33 subjects and were followed up. The correlation between the FLI value and the beta value was obtained and presented as correlation plots for each CpG. On the basis of the FLI value of 30, we selected differentially methylated regions (DMRs) between high and low groups.

DNA Methylation and Bioinformatics Analysis
In this study, genomic DNA extracted from the peripheral blood of subjects was used to evaluate DNA methylation levels. Genomic DNA (500 ng) from each sample was modified by treatment with sodium bisulfite provided in the EZ DNA methylation kit (Zymo Research, Irvine, CA, USA), according to the manufacturer's instructions. Genome-wide DNA methylation was profiled using the Illumina Infinium Human Methylation 450 k BeadChip (Illumina, San Diego, CA, USA), which contains over 485,000 CpG probes covering 99% of the RefSeq genes. Each CpG probe has a beta value ranging from 0 to 1; higher methylation of CpG gives rise to a value closer to 1. We analyzed 364,050 CpG probes in 423 subjects, excluding the missing values. Furthermore, the gene symbol and genomic location corresponding to each CpG probe of the Illumina 450 k methylation chip were obtained using the "getAnnotation" function of the IlluminaHumanMethylationEPI-Canno.ilm10b2.hg19 library provided by Bioconductor (https://www.bioconductor.org, accessed on 25 February 2022).

Bioinformatics Analysis
DMRs were identified using a t-test between the high and low FLI in each of the four groups. The filtered DMRs were visualized as volcano plots and heatmaps. Enrichment analyses were performed using DMRs and visualized as KEGG and GO dot plots, gene concept networks, and heatmaps. Enrichment analysis was performed using the "cluster-Profiler" and "DOSE" packages. We used the "pheatmap" package to provide heatmaps. For the correlation analysis, we used Python v3.6.9, based on the Google Colaboratory platform and the "corr" function of the "pingouin" package.

Study Processes
The process followed in our study for the selection, classification, and variant processing of the study subjects is presented as a flowchart in Figure 1. We recruited 9351 participants, and 13 variables were used to process the FLI. When missing values were detected in at least one of the 13 variables, the subjects were dropped. Subjects who consumed excessive alcohol (men > 210 g/week and women > 140 g/week) were excluded from the study. Finally, 7067 subjects were followed up.
DNA methylation analysis was performed on 322 subjects, which is the intersection of 446 subjects whose methylation patterns were analyzed and 7067 subjects who met the filtering criteria. From 446 subjects in the base survey, the methylation of 50 subjects in the fifth survey (fourth follow-up) was re-analyzed. To confirm longitudinal changes, we retrieved 33 subjects, which is the intersection of 50 subjects whose methylation patterns were re-analyzed and 7067 subjects for the filtering criteria. Nineteen demographic and laboratory evaluations, including age, sex, TG, GTP, AST, and ALT, associated with NAFLDs were analyzed. Male participants were classified into the FLI-high group (Table 1).

Correlation Analysis between CpG Site Methylation and FLI
Pearson's correlation coefficient was calculated using the beta value of each probe and the FLI. From 403,129 correlation results, 15,628 probes with power >0.99 were selected. Nine genes with the top nine r-squared values were selected and are presented (Figure 2a). Nine CpG probes were retrieved (CD8A, CTU1, TMEM87B, BCL2, AP1M2, PRTFDC1, KIAA1429, cg13188812, and cg08257212). The last two probes were non-annotated (NA) probes (cg13188812) located in the intergenic region (cg08257212).
Among the duplicate symbols, probes with a low r-squared value and those located intergenically were excluded. Subsequently 5879 gene symbols were identified. GO terms were extracted for 5258 genes with positive correlations and 621 genes with negative correlations, and the top 20 terms are presented as dot plots (Figure 2b,c). The top 20 KEGG terms are shown (Figure 2d). A network analysis was performed on 892 genes with an absolute value of correlation coefficient >0.28, and five terms were connected (Figure 2e).

Identification of DMRs
Because of differences in DNA methylation patterns between female and male subjects, the groups were divided according to sex to avoid bias. We examined the demographic variables to classify the subjects on the basis of the FLI 30 and two sexes for methylation analysis, as shown in Table 1. Volcano plots depict DNA methylation matrices of the two sexes. Points that met the conditions are marked in pink (highly methylated in FLI > 30) and blue (highly methylated in FLI ≤ 30). In each plot, the x-axis represents the fold change (FC), and the y-axis represents the p-value on a −log10 scale. The area that meets the threshold of FC and p-value is represented by two vertical lines and one horizontal line. On the basis of these criteria, differences in methylation levels were confirmed between the high-and low-FLI groups in both sexes (Figure 3a,b). Two pairs of DMRs are presented as heatmaps (Figure 3c,d). In each sex analysis, |FC| subject to >0.08 and p-value < 0.01, 17 probes for males and 34 probes for females were selected. Heatmaps were generated for the two sexes and showed variation in DNA methylation between the two FLI groups. Column annotation bars indicate three parameters in the samples: FLI group, age, and sex (male and female).
On the basis of the FCs, GO and KEGG terms were retrieved (GO; Figure 3e-h, KEGG: Figure 3i,j). A network analysis was then performed (Figure 3k,l). Five terms were identified for each sex, two of which were common (inflammation and anoxia).

Correlation Analysis of Longitudinal Data
To confirm the longitudinal change in DNA methylation, we divided the follow-up by the beta value at baseline. Thus, high values at CpG sites indicate hypermethylation over time. The relationships between FLI and methylation levels of CpG sites were determined by calculating the correlation coefficients and p-values for all CpG sites. Correlation plots were drawn between FLI and CpG methylation levels in the top nine squares of the correlation coefficients (Figure 4a). Correlation coefficients and p-values were calculated for 9256 genes, and 4133 and 5123 genes had positive and negative correlations, respectively, with FLI.
The correlation coefficients of genes were then used to obtain enriched GO (Figure 4b,c) and KEGG terms (Figure 4d). Network analysis was performed, and five nodes were found to be connected to these genes. The closer to red, the higher the beta value, and the FLI and the closer to blue, the lower the FLI (Figure 4e).

Identification of Differently Changed Regions
Two groups, 13 up and 10 down longitudinal changes in FLI, were compared by longitudinal changes in CpG sites. A t-test was performed between the FLI up and down groups for all the 431,651 CpG sites. If the FC of the analysis result is a positive value, it indicates hypermethylation in the FLI up group over time, and a negative value indicates hypermethylation in the FLI down group. Sixteen CpG sites were selected on the basis of the condition of p-value < 0.001 and FC > 0.2, and seven CpG sites were selected on the basis of the condition of FC < −0.2. Selected CpG sites were visualized as volcano plots and heat maps (Figure 5a,b).
A total of 8975 specimens located at the CpG site were selected, and GO analysis was performed on 4370 specimens that had a positive correlation with the amount of change in FLI and 4605 specimens that had a negative correlation (Figure 5c,d). The KEGG enrichment test was also performed and is presented as a dot plot (Figure 5e). As a result of network analysis of genes with a correlation between FLI and methylation change, enrichment terms are presented in a total of five nodes (Figure 5f).

Integration of Four Analyses as Circos Plot
The four analyses were integrated as heatmaps and peaks in one Circos plot ( Figure 6). The Circos plot has eight tracks, heatmaps indicating fold changes or correlation coefficients, and peaks indicating −log10(p-value). The four tracks located outside the Circos plot are the results of the correlation and t-test analysis for 322 subjects at baseline. The four tracks located inside are the results of the correlation and t-test analysis for the 33 people included in the longitudinal study.
In the four analyses, the most statistically significant region was identified in the short arm of chromosome 6. The most significantly differentially expressed genes in each of the four analyses were FUT9, GSTA4, L3MBTL3, and CD83. A significant pattern was observed for chromosomes 17 and 19. Meanwhile, in the baseline analysis, a pattern different from the fold-change or correlation coefficient of the autosome was observed on the X chromosome. As a result of the t-test analysis at baseline, a high overall fold-change was observed in the X chromosome, and the correlation coefficient with FLI was negative.

Discussion
The relationship between epigenetic patterns and FLI was revealed in 322 samples simultaneously, and longitudinal epigenetic changes in 33 samples were confirmed. Using two approaches on 322 samples, we retrieved enrichment terms for highly correlated genes and genes located on the DMRs and visualized them as dot plots of KEGG and GO analysis and networks.
Epigenetics, including DNA methylation, plays an important role in the pathogenesis and progression of metabolic disorders, lifestyles, and cancer [34]. However, the mechanisms by which DNA methylation is involved in the development of NAFLD are not fully understood. Previous studies suggest that oxidative stress can change methylation patterns through 8-hydroxydexoyguanosine  in the liver [35]. Other environmental insults, such as diet-induced hepatic steatosis, can also alter DNA methylation profiles in mouse models [36,37]. The strength of our study is that it elucidates the patterns of DNA methylation associated with FLI under four conditions. Gene and enrichment terms related to different DNA methylation patterns associated with high and low FLI in one-time and longitudinal analyses are presented. A DNA methylation and liver disease model can

Discussion
The relationship between epigenetic patterns and FLI was revealed in 322 samples simultaneously, and longitudinal epigenetic changes in 33 samples were confirmed. Using two approaches on 322 samples, we retrieved enrichment terms for highly correlated genes and genes located on the DMRs and visualized them as dot plots of KEGG and GO analysis and networks.
Epigenetics, including DNA methylation, plays an important role in the pathogenesis and progression of metabolic disorders, lifestyles, and cancer [34]. However, the mechanisms by which DNA methylation is involved in the development of NAFLD are not fully understood. Previous studies suggest that oxidative stress can change methylation patterns through 8-hydroxydexoyguanosine  in the liver [35]. Other environmental insults, such as diet-induced hepatic steatosis, can also alter DNA methylation profiles in mouse models [36,37]. The strength of our study is that it elucidates the patterns of DNA methylation associated with FLI under four conditions. Gene and enrichment terms related to different DNA methylation patterns associated with high and low FLI in one-time and longitudinal analyses are presented. A DNA methylation and liver disease model can be developed through a comprehensive analysis of our results and those of previous studies. It has been suggested that overexpression of MMP-9 in cell lines is related to intergenic hypermethylation [38]. Similarly, it is necessary to develop a validation model related to the genes and terms we proposed in future studies.
In the correlation analysis of the change in FLI with time and gene methylation level, performed in the longitudinal follow-up data, the FLI and the methylation level of the THRB gene showed a positive correlation. Hepatic thyroid hormone receptor beta, encoded by the THRB gene, is involved in systemic lipid regulation and is an important factor in the development and progression of NAFLD. In a gene expression study of human liver biopsy samples, THRB expression was negatively correlated with NASH score. This result matches the positive correlation between FLI and THRB methylation shown in our study [39]. Hepatic THRB is important because THRB agonists are effective in the treatment of NASH [40]. For example, it was found that the NAFLD activity score improved when Resmetirom (a THR-β agonist) was administered in a mouse model of NASH [41]. Our study demonstrated the significance of thyroid hormone receptors in NASH/NAFLD at the methylation level in a Korean cohort.
We analyzed and selected GO and KEGG enrichment terms from DMRs discovered in 322 subjects, and as a result, various pathways related to FLI were detected. Interestingly, the discovered pathways were hypercholesterolemia and hyperlipidemia. Disruption of cholesterol homeostasis is known to cause hepatocyte damage and steatotic changes and thus contributes to the pathogenesis of NAFLD/NASH [42]. Moreover, thyroid hormones are known to have a strong impact on cholesterol level regulation [43]. Therefore, the relationship between hypercholesterolemia and FLI revealed in this study is consistent not only with the results of previous studies but also with the THRB-related data mentioned above.
Left ventricular hypertrophy was discovered among the enrichment terms from the 33 samples from a longitudinal study. Interestingly, left ventricular hypertrophy associated with NAFLD has been studied in relation to type 2 diabetes mellitus. In particular, MYO18B is related to PI3K/AKT/mTOR, a liver cancer-related pathway, and showed a positive correlation with the increase in methylation as the FLI increased over time.
Among the enrichment terms from the two analyses (baseline and longitudinal), hypercholesterolemia and essential hypertension were comorbidities of metabolic syndrome with diabetes, and NAFLD was considered the main risk factor for metabolic syndrome. However, in this study, insulin sensitivity was not selected. Insulin sensitivity plays an important role in the development of NAFLD in metabolic syndrome. This is because the FLI was calculated using limited variables when evaluating NAFLD. In addition, the role of hypercholesterolemia may be overestimated because triglycerides are included in the calculation of the FLI. For the induction and maintenance of insulin resistance, studies related to the mechanism of DNA methylation patterns and related pathways should be conducted.
Factors related to myocardial reperfusion injury have been identified and are related to necroptosis in the steatotic liver. Terms related to anoxia and hypoxia have been discovered and reported to be related to liver disease related to sleep disturbance [44]. As such, we discovered tissue oxygen perfusion-and reperfusion-related factors associated with NAFLD. These findings suggest that NAFLD/NASH is not simply a disease limited to the liver but is also closely related to systemic conditions such as the cardiovascular system.

Conclusions
In this study, using baseline (n = 322) and longitudinal (n = 33) cohorts, we extracted FLI-related changes in DNA methylation using t-tests and correlation analyses. The association between THRB and hypercholesterolemia and essential hypertension has been suggested to be associated with NAFLD.

Patents
A Korea patent application is pending with similar contents to this study. Informed Consent Statement: Informed consent was obtained from all the subjects involved in the study.
Data Availability Statement: KoGES dataset information and data shearing processes can be obtained from the site provided by the National Research Institute of Health, Korea Disease Control and Prevention Agency, Ministry for Health and Welfare, Korea (https://nih.go.kr/contents.es?mid= a50401010100, accessed on 25 February 2022). R source code will be provided upon request.