Smoking-Related DNA Methylation is Associated with DNA Methylation Phenotypic Age Acceleration: The Veterans Affairs Normative Aging Study

DNA methylation may play a critical role in aging and age-related diseases. DNA methylation phenotypic age (DNAmPhenoAge) is a new aging biomarker and predictor of chronic disease risk. While smoking is a strong risk factor for chronic diseases and influences methylation, its influence on DNAmPhenoAge is unknown. We investigated associations of self-reported and epigenetic smoking indicators with DNAmPhenoAge acceleration in a longitudinal aging study in eastern Massachusetts. DNA methylation was measured in whole blood samples from multiple visits for 692 male participants in the Veterans Affairs Normative Aging Study during 1999–2013. Acceleration was defined using residuals from linear regression of the DNAmPhenoAge on the chronological age. Cumulative smoking (pack-years) was significantly associated with DNAmPhenoAge acceleration, whereas self-reported smoking status was not. We observed significant validated associations between smoking-related loci and DNAmPhenoAge acceleration for 52 CpG sites, where 18 were hypomethylated and 34 were hypermethylated, mapped to 16 genes. The AHRR gene had the most loci (N = 8) among the 16 genes. We generated a smoking aging index based on these 52 loci, which showed positive significant associations with DNAmPhenoAge acceleration. These epigenetic biomarkers may help to predict age-related risks driven by smoking.


Introduction
DNA methylation is an aging biomarker that has attracted growing attention in recent years. Tobacco smoking is a leading risk factor for many age-related diseases, including cardiopulmonary diseases and various types of cancers [1]. Several studies have revealed, by means of regulating gene expression and genome stability, that DNA methylation is associated with smoking and smoking-related diseases [2]. Smoking-related CpG sites, located in genes such as AHRR, GPR15, and F2RL3, have been reported in an increasing number of epigenome-wide association studies (EWAS) using whole blood samples [3,4]. A recent study generated a smoking index based on 1501 smoking-related loci, suggesting that such methylation indices could be helpful to predict smoking-related health risks [5].
Smoking is a risk factor of disproportional aging that accelerates epigenetic age and decreases life expectancy [6,7]. Aging is characterized by a series of changes at both the cellular and molecular levels, including telomere length shortening, senescence, and variations in gene expression levels.
Hannum et al. [8,9] and Horvath (2013) developed the first generation of algorithms to estimate DNA methylation age, which correlates well with chronological age. However, those methods only capture CpG sites that correlate with chronological age and fail to account for an individual's physiological status [10].
A newly developed epigenetic biomarker, DNA methylation phenotypic age (DNAmPhenoAge), was derived from a series of clinical measurements [11][12][13][14][15] to offer a better biomarker of biological aging as a substitute measurement of "phenotypic aging". DNAmPhenoAge acceleration is defined as a discrepancy between DNAmPhenoAge and chronological age and has been proposed as an indicator of disproportionate aging.
A recent study associated DNAmPhenoAge with self-reported smoking status [10]. However, no study has investigated the association of smoking-related DNA methylation with DNAmPhenoAge or age acceleration. In the current study, we used a well-established cohort to strengthen and extend prior findings with respect to the association between self-reported and epigenetic smoking indicators and age acceleration in whole blood samples.

Study Subjects
This study included 692 participants from the Normative Aging Study (NAS), a closed longitudinal study of aging in men from eastern Massachusetts [16]. A total of 1596 participants free of chronic diseases have been enrolled in NAS since 1984, and they undergo physical examinations every 3-5 years. At each visit, information on demographic factors, lifestyle, and disease history were collected, and physical examinations and laboratory tests were performed. Specifically, we analyzed DNA samples up to 4 visits from 1999-2013. The current study was restricted to the first three visits to observe long-term DNAmPhenoAge acceleration and control for race heterogeneity. Participants provided written informed consent during each visit, and the Veterans Affairs Boston Health Care System Institutional Review Board approved the study (code: AAAQ8739).

DNA Methylation Data
DNA was extracted from stored blood buffy coats using the QIAamp DNA Blood Kit (Qiagen, Valencia, CA, USA). We treated 500 ng DNA for bisulfite conversion using the EZ-96 DNA Methylation Kit (Zymo Research, Orange, CA, USA). After bisulfite conversion, we hybridized DNA samples to 12-sample Illumina HumanMethylation450 BeadChips per Infinium HD Methylation protocol (Illumina, San Diego, CA, USA). Technical effects resulting from plates/chips were minimized with a two-stage, age-stratified algorithm to randomize the samples. The β-values, ranging from 0-1, were used to represent the methylation status of a specific CpG site, with 0 indicating no methylation and 1 indicating full methylation.
Calculation of DNAmPhenoAge: DNAmPhenoAge was estimated using the method described by Levine et al. [10]. In short, phenotypic age was developed using clinical measurements from the third National Health and Nutrition Examination Survey. A penalized Cox regression model was used to select 10 out of 42 biomarkers. Phenotypic age was then regressed on DNA methylation data from whole blood samples based on an elastic net regression analysis, which led to 513 CpG sites. We calculated DNAmPhenoAge based on these loci for each participant as: Values of coefficients and intercepts were extracted from Levine et al. [10]. DNAmPhenoAge acceleration was defined as residuals from a linear regression of DNAmPhenoAge on chronological age.
Selection of smoking-related DNA methylation loci: We selected 151 CpG sites reported from previous smoking EWAS at least twice from a previous systematic review [3]. There was no overlap between these 151 loci and the 513 loci used to predict DNAmPhenoAge.

Statistical Analysis
Descriptive statistics were used to present distributions of demographic and clinical variables among all participants. Distributions of these variables across each visit were described as well. Generalized linear mixed models were used to assess the relationship between smoking indicators and DNAmPhenoAge acceleration ( Figure 1). We conducted two parts of statistical analysis: first, to explore the association between self-reported smoking indicators and DNAmPhenoAge acceleration; and second, to explore the association between epigenetic smoking indicators and DNAmPhenoAge acceleration. third National Health and Nutrition Examination Survey. A penalized Cox regression model was used to select 10 out of 42 biomarkers. Phenotypic age was then regressed on DNA methylation data from whole blood samples based on an elastic net regression analysis, which led to 513 CpG sites. We calculated DNAmPhenoAge based on these loci for each participant as: Values of coefficients and intercepts were extracted from Levine et al. [10]. DNAmPhenoAge acceleration was defined as residuals from a linear regression of DNAmPhenoAge on chronological age.
Selection of smoking-related DNA methylation loci: We selected 151 CpG sites reported from previous smoking EWAS at least twice from a previous systematic review [3]. There was no overlap between these 151 loci and the 513 loci used to predict DNAmPhenoAge.

Statistical Analysis
Descriptive statistics were used to present distributions of demographic and clinical variables among all participants. Distributions of these variables across each visit were described as well. Generalized linear mixed models were used to assess the relationship between smoking indicators and DNAmPhenoAge acceleration ( Figure 1). We conducted two parts of statistical analysis: first, to explore the association between self-reported smoking indicators and DNAmPhenoAge acceleration; and second, to explore the association between epigenetic smoking indicators and DNAmPhenoAge acceleration. First, we examined the associations between DNAmPhenoAge acceleration and self-reported smoking indicators smoking status (never, current, or former) and cumulative smoking status (packyears). Model 1 was adjusted for age (years) and leukocyte distribution was estimated using the Houseman et al.'s algorithm [17]. In addition, both visits and random batch effects of methylation measurement were included in the model as random effects. Additionally, Model 2 was adjusted for smoking status, alcohol consumption (abstainer, low, intermediate, or high), body mass index (BMI) First, we examined the associations between DNAmPhenoAge acceleration and self-reported smoking indicators smoking status (never, current, or former) and cumulative smoking status (pack-years). Model 1 was adjusted for age (years) and leukocyte distribution was estimated using the Houseman et al.'s algorithm [17]. In addition, both visits and random batch effects of methylation measurement were included in the model as random effects. Additionally, Model 2 was adjusted for smoking status, alcohol consumption (abstainer, low, intermediate, or high), body mass index (BMI) (underweight or normal weight BMI <25, overweight BMI ≥25 to <30, or obese BMI ≥30), physical activity (low, metabolic equivalent of task (MET) ≤12 kcal/kg hours/week; median, MET 12-30 kcal/kg hours/week; or high, MET ≥30 kcal/kg hours/week), years of education (≤12 years, 13-16 years, or >16 years), hypertension, stroke, coronary heart disease, and diabetes (yes/no).
Second, we tested if epigenetic indicators were associated with DNAmPhenoAge acceleration. The aforementioned fully adjusted model (Model 2) was used to explore associations between DNAmPhenoAge acceleration and each of the 151 smoking-related loci. False discovery rate (FDR)-adjusted [18] P-values were used to account for multiple testing. The alpha threshold for FDR-adjusted P-value was set at ≤0.05. We performed bootstrap analysis to validate the selected CpGs. We resampled the original dataset to create 1000 simulated datasets for each locus. Then, we ran the model in each dataset and returned the effect estimates for the tested CpG sites. We counted the effect estimates that were greater than or equal to the observed value and divided that count by the number of estimates. We again used FDR-adjusted P-values to select the most robust loci.
Finally, we generated a smoking aging index based on the verified 52 CpG sites [5]. The mean β-value (µ c ) and standard deviation (SD) (σ c ) were calculated across the never-smokers for each selected CpG. The smoking aging index was defined as where, W c is +1 if the smoking-associated CpG, c, is hypermethylated or −1 if it is hypomethylated in smokers, and β c is the β-value of this CpG in sample s [5]. The smoking aging index for each participant was calculated based on these validated DNAmPhenoAge acceleration-related loci, which were validated by the bootstrap method. Statistical analysis was performed using R 3.5.1 (R core team, 2018).

Characteristics of Participants
Demographic and clinical characteristics of the participants at all three visits are shown in Table 1. Participants were 55-94 years old with a mean (SD) age of 72.7 (6.7) years at first visit. The mean (SD) of the DNAmPhenoAge was 68.4 (11.9) years at first visit, younger than the chronological age. The chronological age and the DNAmPhenoAge were correlated with each other (r = 0.42, P < 0.001) ( Figure 2). The majority of the participants in the cohort were former smokers and never smokers, accounting for 66% and 30% at the first visit, respectively. The percentage for current smokers was only 4% at the first visit. The total years of smoking was 16.9 years on average at the first visit.   Table 2 illustrates the association of DNAmPhenoAge acceleration with self-reported (smoking status and pack-years) indicators in both the basic model (Model 1) and fully adjusted model (Model 2). Only current smokers had significantly higher DNAmPhenoAge acceleration in Model 1, although not in Model 2. However, cumulative smoking (pack-years) was significantly associated with   Table 2 illustrates the association of DNAmPhenoAge acceleration with self-reported (smoking status and pack-years) indicators in both the basic model (Model 1) and fully adjusted model (Model 2). Only current smokers had significantly higher DNAmPhenoAge acceleration in Model 1, although not in Model 2. However, cumulative smoking (pack-years) was significantly associated with DNAmPhenoAge acceleration in both models. Figure 3 shows associations between DNAmPhenoAge and pack-years for all participants (r = 0.09, N = 1214, P = 0.002), current smokers (r = 0.37, N = 46, P = 0.01), and former smokers (r = 0.16, N = 794, P = 0.0001).    Moreover, we assessed associations between the 151 smoking-related CpG candidates and DNAmPhenoAge acceleration using Model 2, with visits and random batch effects of methylation measurement as random effects. A total of 85 of 151 CpG candidates passed the threshold of FDR < 0.05, and thus demonstrated significant associations with DNAmPhenoAge acceleration. We attempted to validate theses CpG candidates using bootstrap methods. Validation confirmed that 52 of 85 CpG sites were associated with DNAmPhenoAge acceleration (FDR < 0.05, Table 3). Of these 52 loci, 18 were hypomethylated and 34 were hypermethylated. AHRR had the most related loci (N = 8).  Moreover, we assessed associations between the 151 smoking-related CpG candidates and DNAmPhenoAge acceleration using Model 2, with visits and random batch effects of methylation measurement as random effects. A total of 85 of 151 CpG candidates passed the threshold of FDR <0.05, and thus demonstrated significant associations with DNAmPhenoAge acceleration. We attempted to validate theses CpG candidates using bootstrap methods. Validation confirmed that 52 of 85 CpG sites were associated with DNAmPhenoAge acceleration (FDR < 0.05, Table 3). Of these 52 loci, 18 were hypomethylated and 34 were hypermethylated. AHRR had the most related loci (N = 8).  We generated a smoking aging index based on the 52 validated smoking-related loci and compared this new index with AHRR cg05575921, a strong smoking-associated hypomethylated locus. The smoking aging index was highly associated with self-reported smoking indicators in the generalized linear mixed effect model (P < 0.05, data not shown). For cg05575921, negative associations were found in both Model 1 (β = −1.34, P = 0.0001) and Model 2 (β = −0.97, P = 0.007) ( Table 3). For the smoking aging index, positive associations were found in both Model 1 (β = 1.92, P < 0.0002) and Model 2 (β = 1.91, P < 0.0001).

Discussion
We systematically explored the association between self-reported and epigenetic smoking indicators and DNAmPhenoAge in whole blood samples based on an elderly cohort in eastern Massachusetts. One of the self-reported variables of smoking, cumulative smoking expressed in pack-years, was significantly and strongly associated with DNAmPhenoAge acceleration. For epigenetic indicators, 52 previously confirmed CpG sites (including a robust mono-biomarker of smoking, ch05575921) and smoking aging index derived from these loci were associated with DNAmPhenoAge acceleration. Tobacco smoking has long been regarded as a leading risk factor for several age-related health outcomes [1,19,20]. A recent study investigated the relationship between DNAmPhenoAge acceleration based on Horvath's and Hannum et al.'s algorithms and smoking-related DNA methylation [21]. Although our current study assessed DNAmPhenoAge acceleration based on DNAmPhenoAge, our findings are consistent with those of Gao et al., since we both found loci mapping to AHRR, a well-known tumor suppressor gene. One specific AHRR locus, cg05575921, has been regarded as a strong epigenetic smoking indicator in previous smoking EWAS [22][23][24] and was confirmed by our current study [25]. Other consistent loci involve CNTNAP2 (contactin associated protein-like 2) and have been reported associated with several mental diseases [26][27][28], as well as KCNQ1 (voltage-gated KQT-like subfamily Q, member 1) that has been associated with type 2 diabetes [29]. Moreover, our current study also unveiled other genes, such as GNG12 (a negative regulator of inflammation in microglial cells that could impact the neural system) [30], LRRN3 (a member of neuron leucine-rich repeat superfamily that may be associated with neural system development and differentiation processes) [31], and GFI1 (involved in the regulation of hematopoietic stem cells) [32].
There are some inconsistencies between self-reported smoking indicators and epigenetic smoking indicators with respect to their associations with aging. One inconsistency is that the smoking status was not related to either DNAmPhenoAge acceleration in the fully-adjusted model ( Table 2) or the smoking aging index (Figure 3). One explanation is that the number of current smokers is far less than former and never smokers, which may diminish statistical power. Another reason might be the susceptibility of different individuals. For instance, genetic polymorphisms may play a role in modifying the health impacts of exposure to environmental health hazards, indicating that some individuals may be more or less likely to be affected by environmental hazards and develop diseases [33,34].
This point leads to our consideration that genetic variation may affect the relationship between epigenetics and aging [35]. A recent study reported findings regarding the heritability of discrepancies between chronological and epigenetic age by examining twins [9]. Moreover, we should also bear in mind that exposure to cigarette smoking is not the only factor that could lead to differences in DNA methylation. Other environmental hazards, such as drinking alcohol, lifestyle or nutritional factors, or interaction effects between these hazards and smoking, might induce similar methylation changes as well [36,37]. This evidence supports the idea that aging rates are influenced by interactions among genetic variation, environmental exposures, and the epigenome.
Our study has the following strengths. We present a longitudinal study to investigate the relationship between smoking-related DNA methylation and aging. Longitudinal studies that investigate aging have advantages over cross-sectional studies, particularly with regard to genetic variations [38]. Furthermore, the cohort longitudinal design allowed us to understand relationships between genetics and other factors that specify inter-subject temporal changes in human DNA methylation profiles.
However, there are also limitations to our study. First, we validated our results using a bootstrap method in the training cohort instead of an external validation cohort. Although it is generally ideal to use another cohort to validate and replicate findings, we could not access another similar phenotypic cohort. Therefore, further studies are needed to validate and replicate our findings. Second, our signature to predict smoking-related aging was derived from elderly males only. Thus, it remains unclear whether the signature would differ in a female population. However, previous EWAS suggest that smoking-associated changes in blood samples are not dependent on sex [39,40]. Nonetheless, future validation in a larger sex-balanced cohort is still needed. Third, measurement inaccuracies may affect the comparability of longitudinal DNAmPhenoAge acceleration. However, we adjusted for batch effect by including it as a random effect in the model and normalized methylation profiles using Horvath's internal normalization methods. Fourth, loss to follow-up may have biased our findings. However, we used inverse probability weighting to impute missing data as a sensitivity analysis.

Conclusions
In summary, our results show that self-reported cumulative smoking and epigenetic smoking indicators were associated with DNAmPhenoAge acceleration. Epigenetic changes at specific smoking-related CpG sites and our combined smoking aging index might be useful biomarkers to predict smoking-associated aging. Further research is warranted to replicate our findings in follow-up studies and to identify the mechanism underlying the effects of smoking on DNAmPhenoAge.
Funding: This research received no external funding.