Meta-Prediction of the Effect of Methylenetetrahydrofolate Reductase Polymorphisms and Air Pollution on Alzheimer’s Disease Risk

Background: Alzheimer’s disease (AD) is a significant public health issue. AD has been linked with the methylenetetrahydrofolate reductase (MTHFR) C677T polymorphism, but the findings have been inconsistent. The purpose of this meta-predictive analysis is to examine the associations of MTHFR polymorphisms and epigenetic factors, including air pollution, with AD risk using big data analytics approaches. Methods and Results: Forty-three studies (44 groups) were identified by searching various databases. MTHFR C677T TT and CT genotypes had significant associations with AD risk in all racial populations (RR = 1.13, p = 0.0047; and RR = 1.12, p < 0.0001, respectively). Meta-predictive analysis showed significant increases in the percentages of the MTHFR C677T polymorphism with increasing air pollution levels in both the AD case group and the control group (p = 0.0021–0.0457), with higher percentages of TT and CT genotypes in the AD case group than in the control group as air pollution levels increased. Conclusions: The impact of the MTHFR C677T polymorphism on susceptibility to AD was modified by the level of air pollution. Future studies are needed to further examine the effects of gene-environment interactions, including air pollution, on AD risk in populations worldwide.


Introduction
Alzheimer's disease (AD) is a degenerative brain disease and the leading cause of dementia [1], making it a major public health concern. In 2016, an estimated 5.3 million people in the U.S. were affected by AD, and approximately 10 million U.S. residents are projected to live with AD by 2050 [2]. AD has a devastating impact not only on every aspect of the lives of those affected and their families, but also on society as a whole. In addition, AD is one of the most costly chronic diseases [3]. Between 2015 and 2016, the estimated direct U.S. costs of AD were $236 billion [1], and the indirect costs (e.g., unpaid caregiving, loss or reduction of income, and benefits for caregivers) amounted to another $221.3 billion [4]. To date, the etiology of AD remains unclear; it is likely a multifactorial disorder involving interactions among genetic, environmental, and lifestyle factors [5,6].
We included articles that: (1) examined the association between the MTHFR C677T and/or A1298C polymorphisms and AD risk using a case-control design; (2) defined AD cases using the criteria developed by the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) and the Alzheimer's Disease and Related Disorders Association (ADRDA) [35]; (3) reported genotype frequencies for both case and control groups; and (4) were written in English or, if written in another language, provided tables with genotype allele frequencies for both case and control groups. Articles were excluded if they: (1) did not use a case-control design; or (2) did not report complete genotype counts for both case and control groups. We searched the previously mentioned databases three times, at least 3 months apart, until no additional articles were identified.
Of the 94 articles identified, we applied the inclusion/exclusion criteria and excluded 26 that were not case-control studies. We further eliminated 23 articles that lacked genotype allele counts, and then removed two articles for duplicate use of data [38,39]. One publication [40] included two study cohorts (American and Italian). As a result, we included 43 articles with 44 case-control groups in the final analysis (Figure 1, online supplementary Table S1). Study populations were drawn from across the globe (Europe, North America, South America, Asia, the Middle East, and Africa). The most frequently investigated racial or ethnic populations in these studies were Asian (24 studies, including four South Asian), followed by Caucasian (13 studies), Middle Eastern (three studies), mixed race (two studies), and African (two studies) (see Table S1). One of these studies included participants with vascular Alzheimer's disease [41]. The distributions of the MTHFR C677T polymorphism by country for the control and AD case groups are presented in Figure S1a, and those of the MTHFR A1298C polymorphism in Figure S1b.

Data Extraction
Two raters independently extracted the data from the included articles. When there were discrepancies during data extraction, we cross-checked discrepancies and reached consensus among the members.

Quality Assessment
We scored the studies for quality using criteria appropriate for assessing the quality of meta-analyses [42] and case-control studies [29,43] (Supplementary Table S1). The quality assessment scale included three categories: (1) external validity, with 10 items on demographic data (scores range from 0-11); (2) internal validity, with 12 items on research methods and procedures (scores range from 0-12); and (3) quality of reporting (scores range from 0-6). Total scores ranged from 0 to 29, with higher scores indicating higher quality [18].

Data Synthesis and Analysis
We used Microsoft Excel (Microsoft Corp, Redmond, WA, USA) to enter data and StatsDirect Version 3 (2005, StatsDirect, Cheshire, UK) to perform pooled analyses. We pooled risk ratios (RR) for the associations of MTHFR polymorphisms with AD risk. We used both Cochran's Q statistic and the I-square (I²) statistic to assess between-study heterogeneity [44]; the I² statistic is better at assessing inconsistency across study results regardless of the number of included studies [45]. A Q-test p value < 0.05 was taken to indicate significant heterogeneity. If there was significant heterogeneity among the included studies, we used a random-effects model [46]; if there was little heterogeneity, we used a fixed-effects model.
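As an illustration of the heterogeneity rule described above, the following minimal Python sketch computes Cochran's Q, I², and a pooled RR from study-level log risk ratios and their variances. It is not StatsDirect's implementation: the DerSimonian-Laird estimator used for the random-effects model is an assumption (the text does not name the estimator), and the function names chi2_sf and pool_log_rr are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail p value of a chi-square statistic, i.e. Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    log_prefactor = -z + s * math.log(z) - math.lgamma(s)
    if z < s + 1:
        # Series for the lower regularized gamma P(s, z); return 1 - P.
        term = total = 1.0 / s
        a = s
        while abs(term) > abs(total) * 1e-15:
            a += 1.0
            term *= z / a
            total += term
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz) for the upper regularized gamma Q(s, z).
    tiny = 1e-300
    b = z + 1.0 - s
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def pool_log_rr(log_rr, var, alpha=0.05):
    """Pool log RRs; use random effects only if the Q test is significant."""
    k = len(log_rr)
    w = [1.0 / v for v in var]                        # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))  # Cochran's Q
    df = k - 1
    p_q = chi2_sf(q, df)
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0       # I^2 in percent
    if p_q < alpha:
        # Assumed DerSimonian-Laird between-study variance (tau^2).
        tau2 = max(0.0, (q - df) / (sw - sum(wi * wi for wi in w) / sw))
        w = [1.0 / (v + tau2) for v in var]
        model = "random"
    else:
        model = "fixed"
    sw = sum(w)
    pooled = sum(wi * y for wi, y in zip(w, log_rr)) / sw
    se = math.sqrt(1.0 / sw)
    return {"model": model, "Q": q, "p_Q": p_q, "I2": i2,
            "RR": math.exp(pooled),
            "CI": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))}
```

For example, pooling four hypothetical studies with identical variances (0.01) but widely spread log RRs (0, 1, -1, 2) gives Q = 500 and I² = 99.4%, so the sketch switches to the random-effects model.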
When computing RRs, we used the total counts of all three MTHFR C677T genotypes (homozygous TT, heterozygous CT, and wild-type CC) or of all three MTHFR A1298C genotypes (homozygous CC, heterozygous AC, and wild-type AA) as the denominators. Compared with using only one of the genotypes as the denominator, this approach helped to identify the sources of heterogeneity in the findings [47,48]. We also examined the sources of heterogeneity with subgroup analyses by geographic region and by other potential contributing factors, such as air pollution level, AD type, source of controls, and quality score. Further, we used SAS's JMP 12 program (2013, SAS Institute, Cary, NC, USA) to generate GIS maps representing the geographic patterns and global distributions of MTHFR polymorphisms and AD risk [49]. In addition, we used big data analytics, including partition trees, nonlinear association curve fitting, and heat maps, to explore the sources of heterogeneity [18]. Annual death rates from air pollution (AP death) in various geographical areas were obtained from the World Health Organization (WHO), reported as deaths per million population (Level 2 = 50-100 deaths/million; Level 3 = 100-250/million; and Level 4 = 250-400 or more/million) [50].
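The denominator choice described above can be illustrated with a minimal Python sketch. It assumes that, for a given genotype (or a combination such as TT + CT), the RR is the ratio of the genotype's proportion among all genotyped cases to its proportion among all genotyped controls, with the usual large-sample variance of log RR. The function name genotype_rr and the counts in the example are hypothetical, not data from the included studies.

```python
def genotype_rr(case_counts, control_counts, genotypes):
    """Return (RR, variance of log RR) for carrying any of `genotypes`.

    case_counts / control_counts map every genotype at the locus to its count,
    e.g. {"CC": 40, "CT": 40, "TT": 20}; all genotypes form the denominator.
    """
    a = sum(case_counts[g] for g in genotypes)      # carriers among cases
    n1 = sum(case_counts.values())                  # all genotyped cases
    c = sum(control_counts[g] for g in genotypes)   # carriers among controls
    n2 = sum(control_counts.values())               # all genotyped controls
    if a == 0 or c == 0:
        raise ValueError("zero carrier count; a continuity correction would be needed")
    rr = (a / n1) / (c / n2)
    var_log_rr = 1 / a - 1 / n1 + 1 / c - 1 / n2    # large-sample variance of log RR
    return rr, var_log_rr
```

With hypothetical cases {"CC": 40, "CT": 40, "TT": 20} and controls {"CC": 60, "CT": 30, "TT": 10}, genotype_rr(cases, controls, ["TT"]) gives an RR of 2.0, and passing ["TT", "CT"] gives an RR of about 1.5 for the combined genotypes. The same function applies to A1298C counts keyed by "AA", "AC", and "CC".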
For the individual studies that reported Hardy-Weinberg equilibrium (HWE) results, we verified the reported HWE status and noted any discrepancies (Table S1). For studies with HWE discrepancies, we performed additional subgroup analyses, which showed no significant differences between the analyses; therefore, all studies were included in the final meta-analysis [51]. To detect publication bias, we used Egger's test and funnel plots [52,53]. An asymmetric funnel plot suggested possible publication bias, and an Egger's test p value < 0.05 was considered to indicate significant publication bias [54]. To assess the stability of the pooled results, we performed a sensitivity analysis on studies with potential differences, such as the study that included vascular AD cases, to examine the influence of individual studies on the results.
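One common way to re-check the HWE status of control-group genotype counts is a chi-square goodness-of-fit test with one degree of freedom. The text does not state which HWE test was used, so the minimal Python sketch below is an assumption; the function name hwe_chi2 is hypothetical.

```python
import math


def hwe_chi2(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium for one biallelic locus.

    For MTHFR C677T controls, call hwe_chi2(n_CC, n_CT, n_TT).
    Both alleles must be observed; otherwise an expected count is zero.
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)      # reference-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For instance, hwe_chi2(25, 50, 25) returns a statistic of 0 (p = 1), whereas hwe_chi2(30, 40, 30) returns 4.0 (p ≈ 0.0455), which would flag a departure from HWE at the 0.05 level.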

Meta-Analysis
For pooled analyses of the MTHFR C677T polymorphism, we included a total of 4732 AD cases and 5979 controls from 44 study groups (see Table 1). The frequency of the MTHFR homozygous TT genotype was highest in East Asian samples (21.39%), followed by Caucasian (15.76%), Middle Eastern (13.57%), African (11.11%), mixed (7.20%), and South Asian (5.08%) samples (Table 1). For pooled analyses of MTHFR A1298C genotypes, we included a total of 564 AD cases and 741 controls from six studies (see Supplementary Table S2). Because of the small sample size, we could not compare the distributions of MTHFR A1298C genotypes across ethnic groups. Pooled analyses showed that the MTHFR C677T TT and CT genotypes were significantly associated with AD risk (RR = 1.13, p = 0.0047; and RR = 1.12, p < 0.0001, respectively; Table 1), but did not show a statistically significant link between the MTHFR A1298C polymorphism and AD risk (see Table S2).
In the subgroup analysis, the MTHFR C677T TT genotype was associated with increased AD risk in East Asian samples. Because of significant heterogeneity across regions, we analyzed subgroups based on: (1) countries in which the MTHFR C677T TT genotype was a risk genotype (RR > 1) (see Figure 2); (2) countries in which it was a protective genotype (RR < 1) (Supplemental Figure S2a); and (3) others (RR varying around 1) (see Figure S2b). Figure 2 shows that the MTHFR C677T TT homozygous genotype was a risk genotype in some European countries (Poland, Italy, and Ireland), the U.S., Asian countries (Japan, South Korea, China, and India), Iran, and Egypt. Although there was heterogeneity within some of these countries, including Italy, China, and India, the overall RR for the MTHFR C677T TT genotype in each of these countries was greater than 1 when studies within the country were pooled; therefore, these countries were listed with those having RR > 1. Notably, a study from South Korea that included vascular AD cases reported a higher-than-average risk (RR greater than 2), suggesting that the MTHFR C677T TT genotype may be a potential causal factor in vascular AD. In contrast, the MTHFR C677T TT genotype was protective in other European countries (Sweden and Germany), Israel, and Tunisia (see Figure S2a). Brazil showed mixed results: one study suggested that homozygous TT was a protective genotype for AD, while another suggested a harmful effect (see Figure S2b).

Meta-Prediction
For meta-prediction, we performed both partition tree analysis and Tukey's test to examine the potential interaction between AP death and MTHFR polymorphisms (see Table 2 and Figure 3). Table 2 presents the partition tree (split groups) and Tukey's test results side by side for MTHFR C677T genotypes and AP death risks. The partition tree split the data into two groups by level of annual AP death rate (Level 2 = 50-100, Level 3 = 100-250, Level 4 = 250-400 deaths/million population). For the percentage of the MTHFR C677T TT genotype in the case group, there were significant differences between Levels 2 and 3 (p = 0.0021) and between Levels 2 and 4 (p = 0.0029). Similarly, for the percentage of the MTHFR C677T CT genotype in the control group, there were significant differences between Levels 2 and 3 (p = 0.019) and between Levels 2 and 4 (p = 0.0457). However, there were no significant differences for the RRs of the various genotypes (RRCC, RRCT, and RRTT), despite small AICc values (smaller is better) in the partition tree analyses. The partition tree and Tukey's tests were not performed for MTHFR A1298C genotypes because of the small number of studies.

To ensure consistency with other meta-analyses and allow easy comparison, we performed both conventional analyses (Tukey's test) and big data analytics (partition trees) when examining the interaction between gene mutation and air pollution (AP) and its prediction of AD risk. Advanced techniques such as recursive partition trees, nonlinear fit, and heat maps can integrate data from diverse sources, which may allow more precise and accurate prediction. The goodness-of-fit of each partition was judged using the corrected Akaike information criterion (AICc); a smaller AICc indicates a better model [55]. For comparison with the AICc-based partition tree results, we used Tukey's test [56]. All p values were two-tailed, with a significance level of 0.05.
We also used nonlinear fit to examine the associations between AP death, the percentage of each genotype, and AD risk, and we further presented the data distributions in a heat map with a color spectrum. The nonlinear associations between AP death rates and percentages of MTHFR polymorphisms are plotted in Figure 3. As AP death rates rose from low (Level 2) to high (Levels 3 and 4), the percentage of the MTHFR C677T TT homozygous genotype increased substantially in both case and control groups (Figure 3, left graph); however, the percentages of the MTHFR C677T TT genotype were higher in the AD group than in the control group as air pollution levels increased. On the heat map (Figure S3), a high concentration of the MTHFR C677T TT genotype (red blocks) appeared in areas with highly polluted air (Level 4 zone).
As shown in Figure 3 (right graph), the percentage of the MTHFR C677T CT heterozygous genotype increased substantially in the control group from low AP death rates (Level 2) to high AP death rates (Levels 3 and 4); however, the percentages of the MTHFR C677T CT genotype were noticeably higher in the AD group than in the control group as air pollution levels increased.
To detect regional patterns, we used GIS maps generated by the SAS JMP program to visualize the geographic distribution of the MTHFR C677T polymorphism and AD risk across countries/regions (see Figure S4 for combined TT and CT genotypes, Figure S5 for the TT homozygous genotype, and Figure S6 for the CT heterozygous genotype). On the third map of each figure, RRs are presented on a chromatic color spectrum, with red representing AD risk and green representing protective effects. In Figure S4, the combined MTHFR C677T TT and CT genotypes showed the highest risks in Asia, Africa, and South America, followed by North America, then Europe. A similar pattern was observed in Figures S5 and S6. Ranked from highest AD risk to most protective for the MTHFR C677T homozygous TT genotype, the countries were: Asian countries, including Japan, China, and India; then Iran; then African countries, including Egypt and Tunisia; then the Americas, including Brazil and the U.S.; and finally European countries, including Germany and Sweden. Unlike studies from Germany and the Netherlands, studies from Italy, Poland, and Ireland showed significant AD risk in populations with the MTHFR C677T homozygous TT genotype (Figure S5).
The pooled analyses did not show a statistically significant link between the MTHFR A1298C polymorphism and AD risk (see Table S2). We further generated GIS maps to illustrate the geographic pattern of the MTHFR A1298C polymorphism and its potential impact on AD risk among studies conducted in India, Japan, Tunisia, Poland, Germany, and Brazil (see Figure S7 for combined CC and AC genotypes, Figure S8 for the CC homozygous genotype, and Figure S9 for the AC heterozygous genotype). The countries with the highest frequency of the MTHFR A1298C polymorphism (in dark red) in AD cases were India, followed by Tunisia, Poland, Germany, and Brazil (see Figure S7, second map). The countries with the highest frequency of the MTHFR A1298C CC homozygous genotype in AD cases were India, followed by Japan, Poland, Germany, then Brazil (Figure S8, second map). The countries with the highest frequency of the MTHFR A1298C heterozygous AC genotype in AD cases were Tunisia, followed by India, Brazil, Germany, and Poland (Figure S9, second map).

Discussion
Similar to the results reported by previous meta-analyses [13,23-26], our findings showed a significant association between the MTHFR C677T polymorphism and AD risk in all samples pooled from 44 study groups, with great heterogeneity across geographic areas. To expand on previous meta-analyses, we further conducted meta-prediction to examine the potential impact of epigenetic factors, such as air pollution, on the link between MTHFR polymorphisms and AD risk. The nonlinear plots and heat maps demonstrated that the percentage of the MTHFR C677T TT genotype in both case and control groups increased substantially from regions with low to high levels of air pollution. Compared with the control group, the AD group had noticeably higher percentages of MTHFR C677T TT and CT genotypes as the level of air pollution increased. The underlying physiological mechanism of this association pattern could be that global pollution associated with the greenhouse effect may diminish MTHFR enzyme function and compromise methylation pathways, impairing health in populations with chronic diseases such as AD [18,57]. In addition, studies have shown direct effects of air pollution on the nervous system. Increasing evidence has implicated air pollution as a chronic source of neuroinflammation that contributes to neurodegenerative changes in stroke, Alzheimer's disease, and Parkinson's disease [58-60].
Using big data analytics, including GIS maps, we further demonstrated the association of the MTHFR C677T polymorphism with AD risk and found elevated risks in countries in Asia (Japan, South Korea, China, and India), as well as Iran and North America (U.S.). Specifically, a South Korean study that included vascular AD cases reported a higher-than-average RR (2.1) (Table S1), suggesting that the MTHFR C677T TT genotype may be a significant causal factor for vascular AD based on the criteria commonly used by international consensus panels. However, mixed results were observed in Europe, Africa, and South America. While some studies conducted in Europe (Poland, Italy, and Ireland), Africa (Egypt), and a mixed population residing in one region of Brazil showed the MTHFR C677T polymorphism to be a significant risk factor for AD, other studies conducted in Germany, Sweden, Tunisia, and Brazil showed the opposite, protective effect. The meta-predictive analyses suggested that this inconsistent evidence could be explained by variations in the percentages of the MTHFR C677T polymorphism and by potential gene-environment interactions involving air pollution across geographic areas. The GIS maps further visualized the sources of heterogeneity in the associations between the MTHFR C677T polymorphism and AD risk, providing intuitive insight into how the strength of the link between MTHFR C677T polymorphism rates and AD risk differs across regions.

Conclusions
The MTHFR C677T polymorphism was associated with increased risk of AD. Epigenetic mechanisms, including environmental toxins from air pollution, may affect the development of AD by modifying the expression of genes in the methylation pathways. Additional studies are needed to examine the roles of epigenetic factors in the methylation and metabolism pathways. In the meantime, proactive strategies could be implemented in cities with significant air pollution to prevent AD and promote the health of susceptible populations.
Supplementary Materials: The following are available online at www.mdpi.com/1660-4601/14/1/63/s1. Figure S1. (a) MTHFR C677T percentage of mutations per control and Alzheimer's (AD) case groups; (b) MTHFR A1298C percentage of mutations per control and Alzheimer's (AD) case groups. Figure S2. (a) Forest plot for meta-analysis of MTHFR C677T polymorphism by TT genotype, countries with risks <1; (b) Forest plot for meta-analysis of MTHFR C677T polymorphism by TT genotype, countries with risks varied~1. Figure S3. Heat maps of MTHFR C677T homozygous TT polymorphisms for control and case groups in association with annual deaths from air pollution (TT%7ct: percentage of MTHFR 677 TT in control group; TT%7ca: percentage of MTHFR 677 TT in case group; AP Death: Death rates per million population: Levels 2 = 50-100 deaths, 3 = 100-250 deaths, 4 = 250-400+ deaths). Figure S4. Geographic information maps for percentages of MTHFR C677T TT plus CT genotypes per control and Alzheimer's disease (AD) case groups, and their associations with AD risks (TTCT%7ct: percentage of MTHFR 677 TT + CT genotypes in control group; TTCT%7ca: percentage of MTHFR 677 TT + CT genotypes in case group; RR7TTCT: the relative risk between percentage of MTHFR 677 TT + CT genotypes and development of AD). Figure S5. Geographic information maps for percentages of MTHFR C677T TT genotype per control and Alzheimer's disease (AD) case groups, and its association with AD risks (TT%7ct: percentage of MTHFR 677 TT genotype in control group; TT%7ca: percentage of MTHFR 677 TT genotype in case group; RR7TT: the relative risk between percentage of MTHFR 677 TT genotype and development of AD). Figure S6. 
Geographic information maps for percentages of MTHFR C677T CT genotype per control and Alzheimer's disease (AD) case groups, and its association with AD risks (CT%7ct: percentage of MTHFR 677 CT genotype in control group; CT%7ca: percentage of MTHFR 677 CT genotype in case group; RR7CT: the relative risk between percentage of MTHFR 677 CT genotype and development of AD). Figure S7. Geographic information maps for percentages of MTHFR A1298C CC + AC genotypes per control and Alzheimer's disease (AD) case groups, and their associations with AD risks (CCAC%8ct: percentage of MTHFR 1298 CC + AC genotypes in control group; CCAC%8ca: percentage of MTHFR 1298 CC + AC genotypes in case group; RR8CCAC: the relative risk between percentage of MTHFR 1298 CC + AC genotypes and development of AD). Figure S8. Geographic information maps for percentages of MTHFR A1298C CC genotype per control and Alzheimer's disease (AD) case groups, and their associations with AD risks (CC%8ct: percentage of MTHFR 1298 CC genotype in control group; CC%8ca: percentage of MTHFR 1298 CC genotype in case group; RR8CC: the relative risk between percentage of MTHFR 1298 CC genotype and development of AD). Figure S9. Geographic information maps for percentages of MTHFR A1298C AC genotype per control and Alzheimer's disease (AD) case groups, and their associations with AD risks (AC%8ct: percentage of MTHFR 1298 AC genotype in control group; AC%8ca: percentage of MTHFR 1298 AC genotype in case group; RR8AC: the relative risk between percentage of MTHFR 1298 AC genotype and development of AD). Table S1. Summary of MTHFR 677 and 1298 loci distributions for included studies on Alzheimer's disease (AD) by geographic location (43 papers, 44 study groups with genotype counts for control groups). Table S2. Pooled Meta-Analysis: MTHFR A1298C Genotypes and Risks of Alzheimer's disease (AD).