Genetically Determined Physical Activity and Its Association with Circulating Blood Cells

Lower levels of physical activity (PA) have been associated with increased risk of cardiovascular disease. Worldwide, there is a shift towards a lifestyle with less PA, posing a serious threat to public health. One of the suggested mechanisms behind the association between PA and disease development is systemic inflammation, in which circulating blood cells play a pivotal role. In this study we investigated the relationship between genetically determined PA and circulating blood cells. We used 68 single nucleotide polymorphisms associated with objectively measured PA levels to perform a Mendelian randomization analysis on circulating blood cells in 222,645 participants of the UK Biobank. For inverse-variance weighted fixed-effects Mendelian randomization analyses, p < 1.85 × 10⁻³ (Bonferroni-adjusted p-value of 0.05/27 tests) was considered statistically significant. Genetically determined increased PA was associated with decreased lymphocytes (β = −0.03, SE = 0.008, p = 1.35 × 10⁻³) and decreased eosinophils (β = −0.008, SE = 0.002, p = 1.36 × 10⁻³). Although further mechanistic studies are warranted, these findings suggest increased physical activity is associated with an improved inflammatory state with fewer lymphocytes and eosinophils.


Introduction
Reduced physical activity (PA) poses a serious threat to public health. Accumulating evidence shows that lower levels of PA are associated with an increased risk of cardiovascular disease (CVD) and all-cause mortality [1-3]. Although the World Health Organization (WHO) recommends at least 150 minutes of moderate-intensity aerobic PA throughout the week, the proportion of Europeans reported not to meet these recommendations has increased in recent years to 46% [4,5]. This trend is not limited to Europeans but is occurring worldwide [6]. Although health care cost estimates related to physical inactivity vary across studies (e.g., 2.4-11.1% of the healthcare expenditure in the United States of America), it is generally believed that physical inactivity is a costly pandemic and associated with a substantial disease burden [7].
However, the exact mechanisms underlying the associations of PA with the development of disease are incompletely understood. It has been suggested that systemic inflammation plays a pivotal role in the association between PA and CVD, possibly through changes in circulating (inflammatory) blood cells [8,9]. It is therefore important to investigate whether the effects of PA on CVD risk could be linked through changes in circulating blood cells. However, the association between PA and circulating blood cells has only been investigated using traditional observational analyses, which are prone to confounding [10,11]. PA is determined by both genetic and environmental factors [6,12]. A recent genome-wide association study (GWAS) using objectively measured PA data from wrist-worn accelerometers in a large sub-cohort of the UK Biobank identified newly associated single nucleotide polymorphisms (SNPs) and studied whether activity might contribute causally to disease outcomes [12]. In this study, we aimed to investigate the relationship between genetically determined PA levels, based on the previously reported SNPs, and circulating blood cells using a Mendelian randomization (MR) strategy to minimize confounding effects. We hypothesized that a genetically determined higher level of PA is associated with a lower inflammatory state with fewer circulating inflammatory blood cells.

UK Biobank Participants
The UK Biobank study design and population have been described in detail elsewhere [13]. In brief, the UK Biobank is a large community-based prospective study in the United Kingdom that recruited over 500,000 participants aged 40-69 years, aiming to improve the prevention, diagnosis, and treatment of a plethora of diseases. All participants gave informed consent for the study [13]. At the baseline visit, vital signs and biological samples were collected, together with data from self-completed questionnaires, interviews, and physical measurements. The present study was conducted under application number 12006 of the UK Biobank resource.

Genotyping and Imputation
The genotyping process and arrays used in the UK Biobank study have been described elsewhere in more detail [14]. Briefly, participants were genotyped using either the custom UK Biobank Lung Exome Variant Evaluation (UK BiLEVE) Axiom array (Affymetrix, Santa Clara, CA, United States; n = 49,949), which includes 807,411 SNPs, or the custom UK Biobank Axiom array (Affymetrix; n = 452,713), which includes 820,967 SNPs [13]. Both arrays include insertion and deletion markers and share more than 95% of their content [14,15]. Imputed genotype data were provided by UK Biobank, based on merged UK10K and 1000 Genomes phase 3 panels [16]. Figure 1 shows a flowchart of the study sample selection, which is described below. Participants were excluded if no genetic data were available or if there was a mismatch between genetic and reported sex (n = 378). Furthermore, participants with high missingness or excess heterozygosity were excluded (n = 963). Participants with familial relatedness or who were not of white British descent were excluded as well (n = 64,535). In addition, participants included in the GWAS on physical activity (n = 90,277) [12] and participants without lab measurements (n = 18,498) were excluded. Lastly, participants with diseases or medication affecting the immune response were excluded (n = 105,263). This yielded a set of 222,645 individuals for the present analyses.
For the definitions of diseases, we used hospital episode statistics data in combination with self-reported diagnoses and medication, as described previously [17]. Further information on the definitions of diseases is presented in Supplementary Table S1.
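The exclusion steps listed in this section can be tallied with a few lines of Python. This is only a bookkeeping sketch: the step labels are shorthand for the criteria described above, and the starting count is not stated in the text (which says only "over 500,000"), so it is derived here as the final sample plus all exclusions and should be read as an implied value rather than a reported one.

```python
# Exclusion counts as reported in the text, in the order they are described.
exclusions = [
    ("no genetic data or genetic/reported sex mismatch", 378),
    ("high missingness or excess heterozygosity", 963),
    ("familial relatedness or not of white British descent", 64_535),
    ("included in the GWAS on physical activity", 90_277),
    ("no lab measurements", 18_498),
    ("disease or medication affecting the immune response", 105_263),
]
final_n = 222_645  # analysis sample reported in the text

total_excluded = sum(n for _, n in exclusions)

# Assumption: the starting count is implied by the reported numbers,
# it is not given explicitly in the text.
implied_start = final_n + total_excluded

remaining = implied_start
for reason, n in exclusions:
    remaining -= n
    print(f"after excluding {n:>7,} ({reason}): {remaining:,} remain")

# By construction, the running total ends at the reported analysis sample.
assert remaining == final_n
```
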

Single Nucleotide Polymorphisms
For our analyses between genetically determined PA and circulating blood cells, we used a set of 68 SNPs identified in the GWAS on physical activity by Doherty et al. [12]. Similar to Doherty et al., we used 68 SNPs that were associated with physical activity with p < 5 × 10⁻⁶ (Doherty et al., Supplementary Figure S8) to explain more phenotypic variance than the three SNPs at p < 5 × 10⁻⁸. Supplementary Table S2 contains a detailed list of the extracted SNPs. SNPs associated with sleep duration in the study by Doherty et al. were not present in this list of SNPs.

Statistical Analyses
Normally distributed continuous variables were summarized as mean ± standard deviation (SD) and skewed variables as median and interquartile range (IQR). Linear regression analyses were performed to assess the association between PA and blood cell counts. Regression analyses between SNPs and circulating cells were adjusted for age at the baseline visit, sex, genotyping chip, and the first 30 principal components provided by UK Biobank (to adjust for population structure). Because the PA SNPs were identified in a GWAS on objectively measured PA, the associations of the SNPs with self-reported PA were not tested, as these were considered separate entities. Linear regression analyses were performed using Stata 15 (StataCorp, College Station, TX, United States).
The association between genetically determined increased physical activity and the outcomes was first assessed using a fixed-effects inverse-variance weighted (IVW) meta-analysis method, combining the Mendelian randomization (MR) estimates of each SNP for the outcome. To adjust for multiple testing, we applied a Bonferroni correction (significance level divided by the number of independent tests) and considered a two-sided p-value of less than 0.05/27 = 1.85 × 10⁻³ statistically significant for the main analyses using the MR-IVW fixed-effects model. For the MR-IVW model to be valid, the assumption of absence of pleiotropy needs to be fulfilled. Pleiotropy occurs when genetic variants associated with the exposure of interest exert their effect on the outcome through multiple pathways. Heterogeneity tests are an easy way to evaluate possible pleiotropy: low heterogeneity indicates that the estimates of the individual genetic variants vary by chance only, which is expected only in the absence of pleiotropic effects. The Rücker framework was adopted to select the preferred model. If Cochran's Q in the MR-IVW fixed-effects analyses was significant (p < 0.05), suggesting heterogeneity and thus non-random error, the MR-IVW random-effects model was adopted. Rücker's Q' was calculated to evaluate the heterogeneity within the MR-Egger analyses [18]. We then tested whether Rücker's Q' statistic differed (p < 0.05) from Cochran's Q statistic (Q-Q') [18]. A large, positive, and statistically significant Q-Q' indicates MR-Egger to be the best approach [18]. MR-Egger assumes that the pleiotropic effects of the SNPs on the outcome are independent of their association with PA, and therefore allows for a non-zero intercept [11].
This independence condition is known as the InSIDE (Instrument Strength Independent of Direct Effect) assumption, under which the associations of the SNPs with the exposure are independent of their direct pleiotropic effects on the outcome; MR-Egger must not violate it. MR-Egger provides additional information on pleiotropy, as an intercept that does not differ significantly from 0 (p > 0.05) suggests the absence of pleiotropic bias. MR-Steiger filtering was performed to remove variants with stronger associations (R²) with the outcome than with the exposure [18]. Beta values (β) and standard errors (SE) are provided for the MR outcomes. Lastly, we performed a weighted median analysis, which allows up to 50% of the information from variants to violate the MR assumptions [19]. For sensitivity analyses, we adopted a p-value of <0.05.
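The fixed-effects IVW estimate, Cochran's Q, and the switch to a random-effects model described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the analysis code of the study: the function names and the toy summary statistics are hypothetical, the per-SNP standard errors use the common first-order approximation, and the random-effects model is assumed to be the multiplicative variant (standard error inflated by the square root of Q divided by its degrees of freedom), which the text does not specify.

```python
import math

def wald_ratios(beta_exp, beta_out, se_out):
    """Per-SNP ratio estimates (outcome effect / exposure effect) with first-order SEs."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    ses = [so / abs(be) for be, so in zip(beta_exp, se_out)]
    return ratios, ses

def ivw_fixed(ratios, ses):
    """Fixed-effects inverse-variance weighted mean of the ratio estimates."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    return beta, math.sqrt(1.0 / total)

def cochran_q(ratios, ses, beta_ivw):
    """Cochran's Q: weighted squared deviations of the SNP estimates from the IVW estimate."""
    return sum(((r - beta_ivw) / s) ** 2 for r, s in zip(ratios, ses))

def ivw_random_se(se_fixed, q, n_snps):
    """Assumed multiplicative random-effects SE: inflate by sqrt(Q / df), never deflate."""
    return se_fixed * math.sqrt(max(1.0, q / (n_snps - 1)))

# Hypothetical summary statistics for four SNPs (illustration only).
beta_exp = [0.10, 0.20, 0.15, 0.25]      # SNP effects on the exposure
beta_out = [-0.02, -0.05, -0.01, -0.08]  # SNP effects on the outcome
se_out = [0.01, 0.01, 0.01, 0.01]        # standard errors of the outcome effects

ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
beta_ivw, se_ivw = ivw_fixed(ratios, ses)
q = cochran_q(ratios, ses, beta_ivw)
se_random = ivw_random_se(se_ivw, q, len(ratios))
print(f"IVW beta = {beta_ivw:.4f}, fixed SE = {se_ivw:.4f}, "
      f"Q = {q:.2f} (df = {len(ratios) - 1}), random SE = {se_random:.4f}")
```

Under the multiplicative model the point estimate is unchanged and only the standard error grows with heterogeneity, which is one way a fixed-effects association can lose significance without a change in effect size.
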

Population Characteristics
Baseline characteristics are provided in Table 1. Of the 222,645 UK Biobank participants included, 105,970 (47.6%) were male, and the mean age was 56 ± 8 years. The population was slightly overweight (body mass index ≥25 kg/m²) with a mean body mass index of 27.0 kg/m². More than half of the population never smoked or smoked <100 cigarettes. Based on self-reported questionnaire data, participants spent a median of 4.88 (interquartile range (IQR): 1.5-11.3) hours per week doing moderate PA and 0.75 (IQR: 0.0-2.7) hours per week doing vigorous PA.

Genetically Determined Physical Activity and Circulating Blood Cells
Detailed information on the SNPs and their estimates on circulating blood cells is provided in Supplementary Table S3. The association of all 68 PA-associated SNPs with circulating blood cells was assessed using MR analyses. In MR-IVW fixed-effects analyses, genetically determined increased duration of PA was associated with decreased lymphocytes (β = −0.026, SE = 0.008, p = 1.35 × 10⁻³), decreased eosinophils (β = −0.008, SE = 0.002, p = 1.36 × 10⁻³), and increased platelet distribution width (β = 0.04, SE = 0.009, p = 9.07 × 10⁻⁵) (Table 2). Figures 2-4 display the individual SNP forest plots for these three outcomes. Supplementary Figures S1-S3 display the corresponding scatter plots. PA was not associated with any of the other outcomes in the MR-IVW fixed-effects analyses (Table 2), which will therefore not be discussed any further.
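The Bonferroni threshold applied to these results can be reproduced with the short sketch below, which compares the three reported p-values against 0.05/27. The helper `two_sided_p` is a hypothetical name for a standard normal (Wald) test and is included only to show how a p-value follows from β and SE; because the published β and SE values are rounded, recomputing p from them would only approximate the reported figures, so the comparison uses the p-values as printed.

```python
import math

def two_sided_p(beta, se):
    """Two-sided p-value for z = beta / se under a standard normal distribution."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))

# Bonferroni correction: significance level divided by the number of tests.
alpha = 0.05
n_tests = 27
threshold = alpha / n_tests

# p-values as reported for the MR-IVW fixed-effects analyses.
reported_p = {
    "lymphocytes": 1.35e-3,
    "eosinophils": 1.36e-3,
    "platelet distribution width": 9.07e-5,
}

significant = {name: p < threshold for name, p in reported_p.items()}
print(f"Bonferroni threshold: {threshold:.2e}")
for name, p in reported_p.items():
    print(f"{name}: p = {p:.2e}, below threshold: {significant[name]}")
```
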
Cochran's Q of the association between PA and lymphocytes (Q = 160, DF = 67, p = 6.50 × 10⁻¹⁰) and eosinophils (Q = 180, DF = 67, p = 6.10 × 10⁻¹²) indicated the MR-IVW random-effects model to be the preferred approach. Using this approach, the association with eosinophils remained significant (β = −0.0078, SE = 0.009, p = 0.049), but the association with lymphocytes was attenuated (β = −0.026, SE = 0.014, p = 0.072). However, the loss of statistical significance was probably attributable to the larger standard error, and thus lower power, in the random-effects model. In the weighted median MR analyses, the associations between PA and leukocytes (β = −0.030, SE = 0.0134, p = 0.022) and eosinophil count (β = −0.0036, SE = 0.0036, p = 0.3229) were lost, of which the latter could be attributed to wider standard errors. MR-Steiger filtering indicated all SNPs were more strongly associated with PA behavior than with lymphocytes and eosinophils. The MR-Egger intercept indicated little evidence for pleiotropy for the analyses on lymphocytes and eosinophils.
As an additional analysis, we investigated whether the relationship of genetically determined PA with circulating lymphocytes or eosinophils differed between levels of self-reported PA. However, genetically determined PA was not significantly associated with either lymphocytes or eosinophils amongst individuals with no (n = 16,307), only moderate (n = 52,273), or only vigorous (n = 4536) self-reported PA.
In the heterogeneity analyses, Q-Q' was statistically significant for platelet distribution width (Q = 160, DF = −67, p = 1.80 × 10⁻³), indicating the MR-Egger analysis to be the best approach. Using this approach, the association between genetically determined PA and platelet distribution width was reversed in direction and no longer significant (β = −0.05, SE = 0.04, p = 0.26). Furthermore, we performed a look-up in MR-Base to explore whether the 68 genetic variants were associated with traits other than PA. This information can be found in Supplementary Table S5.
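The MR-Egger estimate and Rücker's Q' used for platelet distribution width can be illustrated as a weighted linear regression of the SNP-outcome effects on the SNP-exposure effects with a free intercept. The sketch below is an assumption-laden illustration rather than the study's code: `mr_egger` and the toy inputs are hypothetical, the weights are taken as the inverse variance of the SNP-outcome effects, and standard errors of the slope and intercept are omitted for brevity.

```python
def mr_egger(beta_exp, beta_out, se_out):
    """Weighted regression of outcome effects on exposure effects with an intercept.

    Returns (intercept, slope, q_prime), where q_prime is Ruecker's Q',
    the weighted sum of squared residuals around the Egger regression line.
    """
    # Orient every SNP so that its exposure effect is positive.
    x, y = [], []
    for be, bo in zip(beta_exp, beta_out):
        sign = -1.0 if be < 0 else 1.0
        x.append(sign * be)
        y.append(sign * bo)

    w = [1.0 / s ** 2 for s in se_out]  # inverse-variance weights
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    # Closed-form weighted least squares for a line with intercept.
    denom = sw * sxx - sx ** 2
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw

    q_prime = sum(wi * (yi - intercept - slope * xi) ** 2
                  for wi, xi, yi in zip(w, x, y))
    return intercept, slope, q_prime

# Hypothetical summary statistics (illustration only).
beta_exp = [0.10, -0.20, 0.15, 0.25, 0.30]
beta_out = [0.03, -0.02, 0.01, 0.00, -0.01]
se_out = [0.01, 0.01, 0.02, 0.01, 0.02]

intercept, slope, q_prime = mr_egger(beta_exp, beta_out, se_out)
print(f"Egger intercept = {intercept:.4f}, slope = {slope:.4f}, Q' = {q_prime:.2f}")
```

A non-zero intercept absorbs the average pleiotropic effect, so the slope can differ in sign from the IVW estimate, as it does for platelet distribution width in the results above.
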

Discussion
In the present study, we provide evidence for the association between genetically determined PA and circulating blood cells. Genetically determined increased duration of PA was associated with decreased lymphocytes and decreased eosinophils, cautiously suggesting that increased levels of PA may improve the inflammatory state. This is in line with our hypothesis.
The present study is the first to report associations between genetically determined PA and circulating blood cells. The association between genetically determined increased PA and decreased lymphocyte levels is partly in line with previous research that studied the association between PA and total leukocyte count [8]. In 4,857 individuals with a mean age of 43 ± 1 year and 43% females, participating in the National Health and Nutrition Examination Survey, an association between increased PA and a decreased leukocyte count was observed, suggesting that active individuals might maintain a lower inflammatory state and might be less prone to future chronic disease development [8]. We did not assess the correlations between self-reported PA and circulating blood cells, and this study can therefore not be directly compared with these previous studies. However, our study is of added value, since we were able to study the association between PA and a broader range of cell types (i.e., monocytes, lymphocytes, neutrophils, eosinophils, and basophils) instead of total leukocyte count alone. We did not observe an association between genetically determined PA and total leukocyte count, but we did observe an association between increased genetically determined PA and decreased lymphocyte levels in the MR-IVW analyses. For a long time, a decrease in lymphocytes was considered a suppression of the human immune system and therefore detrimental [20]. However, recent evidence indicates that this reduction in peripheral blood does not reflect immune suppression, but represents a heightened state of immune surveillance and immune regulation, which is driven by a transfer of cells to peripheral tissues, such as the gut or the lungs [21]. Within the cardiovascular field, T-lymphocytes are known to stimulate macrophages expressing collagen-degrading enzymes, thereby increasing the risk of plaque rupture.
Lower lymphocyte levels in the bloodstream might therefore also be beneficial with respect to the risk of CVD [22,23]. The association between lymphocyte count and PA in our study was lost in the sensitivity analyses, although the strength of the effect remained similar. This loss of association may be due to a larger standard error reducing the statistical power. Similarly, no association was observed between genetically determined PA and lymphocytes or eosinophils across levels of self-reported PA, which is likely due to the small sample sizes in these groups. Further MR studies using variants that explain a larger proportion of the variance in PA, and including larger groups of self-reported PA levels, are warranted to further investigate these associations.
There are limited data on the association between PA and eosinophils. Earlier research in 11 athletes running an ultramarathon (90 kilometers) showed a non-allergic activation of eosinophils, reflected by an increase of eosinophil cationic protein [24]. Our study population of community-dwelling middle-aged men and women differed substantially from those athletes (e.g., in age, ethnicity, and athletic condition), and it is therefore not possible to formulate any hypotheses or to draw any comparisons [24]. Recently, in 5287 patients who underwent coronary angiography, a negative association between peripheral eosinophil count and the severity of coronary artery disease (CAD) has been observed [25]. Furthermore, previous studies have suggested that eosinophils play a key role in the initiation, progression, and rupture of thrombotic plaques, which was confirmed by tissue samples obtained through thrombus aspiration in patients with myocardial infarction [26,27]. In these samples, a large number of eosinophils was present [26]. However, these findings (low peripheral eosinophil count associated with increased CAD severity and high eosinophil counts observed in atherosclerotic plaques) were observed in CAD patients, whereas our study was performed in a general population of which only 3.7% had a medical history of CAD at baseline. The present findings indicate increased PA is associated with decreased eosinophil levels. Possibly, less PA leading to higher eosinophil levels could play a role in the development of plaques eventually leading to CAD. In this case, increased PA leading to lower eosinophil levels might be protective against plaque development and CAD. Further research is needed to investigate this hypothesis. Contrary to previous cross-sectional observational studies, we did not observe associations between genetically determined PA and red blood cell indices [28-30].
This might imply that changes in red blood cells during or after PA are bystanders instead of a consequence of PA, although mechanistic studies are necessary to unravel these associations.
Of interest is the specific function of the three leading genetic polymorphisms rs564819152, rs2696625, and rs59499656, which may provide more insight into the mechanisms underlying genetically determined PA and circulating immune cells. rs564819152 is located near the SKIDA1 gene. This gene is located on chromosome 10 and encodes the Ski-Dach domain-containing protein 1, which is associated with different types of cancer [31,32]. Furthermore, this gene was found to be associated with lung function [33]. These functions might affect the relation between PA and blood cells, although our heterogeneity tests did not indicate pleiotropy of the genetic variants. rs2696625 and rs59499656 are located near the genes KANSL1-AS1 and SYT4, respectively. KANSL1-AS1 has been described in the context of Alzheimer's and Parkinson's diseases [34]. SYT4 (synaptotagmin 4) is a protein-coding gene that has been associated with various traits, i.e., body mass index, lung function, and body fat percentage [33,35,36]. As obesity is associated with an increased inflammatory state, this route could be involved in the association between genetically determined PA and circulating immune cells [37]. Aside from the leading variants, the association with obesity-related traits was also observed for the other genetic variants in the MR-Base look-up. However, further mechanistic studies are warranted to provide more insight into these possible additional pathways.
A first major strength of our study is the large cohort size of 222,645 participants. Second, this is the first study to assess genetic variants of PA using SNPs that were found in a GWAS performed with objectively measured PA data from wrist-worn accelerometers. Third, we used strict exclusion criteria to limit factors affecting immune cells, such as infections. Furthermore, we used a stringent p-value threshold to reduce false positive rates and increase reproducibility.
As a future perspective, it would be of additional value to investigate the association with the functionality and activity of cells, for example by examining circulating cytokine levels. Circulating cytokines were not measured in the UK Biobank and therefore not tested in the present study. This study could, however, serve as a starting point, providing new insights into where to focus future studies on the association between PA and circulating blood cells.

Conclusions
In conclusion, this study shows that genetically determined PA is associated with changes in circulating blood cells. Increased genetically determined PA is associated with decreased lymphocyte and eosinophil levels. Although further mechanistic studies are warranted, these findings cautiously suggest lifestyle changes that include more PA should be encouraged to improve the inflammatory state.
Funding: N.V. is supported by NWO VENI grant (016.186.125). The funding agencies had no role in the study design, analysis, or interpretation of data; the writing of the manuscript; or in the decision to submit the article for publication.