Genetic Determinants of Leisure-Time Physical Activity in the Hungarian General and Roma Populations

Leisure-time physical activity (LTPA) is one of the modifiable lifestyle factors that play an important role in the prevention of non-communicable (especially cardiovascular) diseases. Certain genetic factors predisposing to LTPA have been previously described, but their effects and applicability to different ethnicities are unknown. Our present study aims to investigate the genetic background of LTPA using seven single nucleotide polymorphisms (SNPs) in a sample of 330 individuals from the Hungarian general (HG) and 314 from the Roma population. LTPA in general and its three intensity categories (vigorous, moderate, and walking) were examined as binary outcome variables. Allele frequencies were determined, the association of each individual SNP with LTPA in general was assessed, and an optimized polygenic score (oPGS) was created. Our results showed that the allele frequencies of four SNPs differed significantly between the two study groups. The C allele of rs10887741 showed a significant positive association with LTPA in general (OR = 1.48, 95% CI: 1.12–1.97; p = 0.006). Three SNPs (rs10887741, rs6022999, and rs7023003) were identified by the process of PGS optimization, whose cumulative effect shows a strong significant positive association with LTPA in general (OR = 1.40, 95% CI: 1.16–1.70; p < 0.001). The oPGS showed a significantly lower value in the Roma population compared with the HG population (oPGS_Roma: 2.19 ± SD: 0.99 vs. oPGS_HG: 2.70 ± SD: 1.06; p < 0.001). In conclusion, the coexistence of genetic factors that encourage leisure-time physical activity shows a more unfavorable picture among Roma, which may indirectly contribute to their poor health status.


Introduction
Urbanization and the spread of technological innovations [1], as well as the restrictions during the COVID-19 pandemic [2], have contributed greatly to the sudden decline in physical activity in recent years. Today, physical inactivity is a severe public health problem, as the prevalence of a sedentary lifestyle among adults is increasing worldwide [3]. Physical inactivity is an important preventable risk factor for non-communicable diseases [3,4]. Leisure-time physical activity (LTPA) is a well-known modifiable lifestyle factor associated with a wide range of cardiometabolic outcomes, including obesity, hypertension, type 2 diabetes, metabolic syndrome, and cardiovascular diseases in general [5].
Various psychological, biological, social, and environmental factors affecting leisure-time physical activity have been investigated and identified [6][7][8]. Demographic and health variables associated with levels of physical activity include sex, age, education, and body mass index (BMI) [9]. Recognizing the demographic, environmental, and social determinants of physical activity among adults is important for designing effective intervention strategies to promote it [10]. Several studies have shown that some determinants, such as age, higher educational attainment, and higher income, are associated with increased participation in LTPA for some groups [11,12]. Despite this knowledge and continued efforts to encourage physical activity, in most developed countries prevalence remains low and participation rates for women are consistently lower than for men [13].
Leisure-time physical activity is influenced by a combination of several factors, of which genetic heritability is estimated by studies to be between 30% and 52% [7]. A study published in 2009 involving 1644 unrelated Dutch and 978 Americans of European ancestry found that the heritability of leisure-time physical activity behavior is explained by a large number of genetic variants with small individual effect sizes [14]. A 2014 study by Kim et al. [15] in a sample of 8842 Koreans found similar results, with no significant association of single nucleotide polymorphisms (SNPs) with LTPA at the individual level, but 59 SNPs (in 76 genes) were identified using multiple SNP bootstrap analysis. LTPA varies between different ethnic groups [16,17], which can be partly explained by the environmental and lifestyle characteristics mentioned above, but differences in the genetic background cannot be excluded [18].
Although the association between the very unfavorable socioeconomic circumstances and unfavorable health status [31,41,42] of Roma is evident, it seems that the differences observed in comparison with the general populations cannot be explained solely by their poorer socioeconomic characteristics [30,43]. Recent studies on the genetic background of increased risk of various non-communicable diseases among them [44][45][46][47][48] further support the hypothesis that their health status is determined by the complex interactions of health-related genetic and non-genetic factors.
The physical activity of Roma has not been well characterized; data on physical activity from the 2011 cross-sectional, population-based HepaMeta survey in Slovakia showed that LTPA was significantly lower among Roma women than among non-Roma [49]. Regarding the health risk behavior of Roma adolescents in segregated settlements in Slovakia compared to non-Roma, the differences were not statistically significant, except for the significantly higher rate of physical inactivity among Roma women [50]. The results of a complex health survey carried out by our research team in 2018 [51] are similar to those reported in Slovakia, in that while there is no significant difference in LTPA between Roma and non-Roma men, Roma women were found to have significantly lower levels than non-Roma.
An article published in 2019 [52], comparing the physical activity levels of two Roma subgroups (Gabor and Băieși) and non-Roma groups in Romania, found that both Roma subgroups had significantly lower levels of daily physical activity (with gender differences). In addition, both Roma subgroups were less active than non-Roma in sports and gardening.
Given that physical activity is to a large extent genetically determined and that there are differences in leisure-time physical activity between the Hungarian general and the Roma population, the question arises whether these differences are not due, at least partly, to different genetic backgrounds resulting from their different origins.
Our study aims to investigate whether LTPA is also genetically determined in the Hungarian general and Roma populations and how this contributes to the lower LTPA among Roma using previously identified polymorphisms that promote LTPA.

Characteristics of the Study Populations by Sex
No significant differences were found in mean age, abdominal circumference, and BMI between the two study populations by sex. In the Roma population, the proportion with lower levels of education was significantly higher, and the proportion of people traveling by vehicle was significantly lower (HG men = 81.4% vs. Roma men = 26.6%, p < 0.001; HG women = 68.1% vs. Roma women = 24.7%, p < 0.001) in both sexes. See Table 1 for more details. In general, the proportion of people who did LTPA was not significantly different between the two study populations for either sex.
For men in the Hungarian general population, the proportion of participants with LTPA of vigorous (HG: 39.3% vs. Roma: 12.7%, p < 0.001) and moderate (HG: 32.4% vs. Roma: 10.2%, p = 0.016) intensity was significantly higher than among Roma, while the proportions of people walking in leisure time did not differ significantly between the two study populations (HG: 53.8% vs. Roma: 54.4%, p = 0.927).

Results of Linkage Disequilibrium (LD), Hardy-Weinberg Equilibrium (HWE), and Power Analyses and Comparison of Genotype Distribution between Sample Populations
In LD analysis of ten SNPs, there was no linkage between SNPs. For three SNPs (rs12405556, rs429358, and rs6092090), significant deviations from HWE were measured and these SNPs were excluded from further analysis.
For four (rs10252228, rs12612420, rs459465, and rs10887741) of the seven SNPs included in the study, a significant allele frequency difference was found between the Hungarian general and Roma populations and the statistical power varied between 0.147 and 0.985. See Supplementary Table S1 for more details.

The Result of the Association of SNPs with LTPA of Different Intensities
Only the C allele of rs10887741 showed a significant positive association with LTPA in general (odds ratio (OR) = 1.48, 95% CI: 1.12-1.97, p = 0.006), but none of the seven SNPs included in the study showed a significant association with any intensity category. For more details see Table 4.
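As a rough consistency check on the figures above, the p-value can be recovered from the reported odds ratio and its confidence interval. The Python sketch below assumes the interval is a symmetric Wald interval on the log-odds scale, which the text does not state; the function name `wald_from_or_ci` is purely illustrative.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-odds, standard error, Wald z and two-sided p from an OR and its 95% CI.

    Assumes a symmetric Wald interval on the log scale (an assumption, not stated in the text).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return beta, se, z, p

# Values reported for the C allele of rs10887741
beta, se, z, p = wald_from_or_ci(1.48, 1.12, 1.97)
print(f"log-OR={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Under that assumption the recovered p-value falls between 0.006 and 0.007, in line with the reported p = 0.006 given the rounding of the published interval.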

Calculation and Comparison of Optimized Polygenic Score (PGS) for LTPA in the Hungarian General and Roma Populations
The PGS optimization process tests the cumulative effect of SNPs by starting with the SNP showing the strongest association with LTPA (rs10887741: OR = 1.48, p = 0.006) and in decreasing order to the weakest one (rs459465: OR = 1.01, p = 0.967). During the process, rs6022999 and rs7023003 increased the strength of association of optimized polygenic score (oPGS) with LTPA in general. The remaining four SNPs (rs12612420, rs10252228, rs8097348, and rs459465) did not increase the strength of association and were therefore excluded from further analysis. See more details in Supplementary Table S2. Based on univariate analysis, oPGS showed a significant positive correlation with the LTPA in general (OR = 1.40, 95% CI: 1.17-1.68; p < 0.001) and in intensity categories of vigorous (OR = 1.32, 95% CI: 1.09-1.59; p = 0.004) and moderate (OR = 1.23, 95% CI: 1.04-1.46; p = 0.013). After adjusting for confounders (ethnicity, age, waist circumference, BMI, education, and driving), the association remained significant only for the LTPA in general (OR = 1.40, 95% CI: 1.16-1.70, p < 0.001). For more details see Table 5.
In the groups defined based on oPGS values, we examined how the MET-min/week values changed for LTPA in general and its intensity categories and conducted a trend analysis. With the increase in oPGS, there was a significant upward trend in LTPA expressed in MET-min/week in general (p for trend = 0.002) as well as in the vigorous intensity category (p for trend = 0.015). The moderate (p for trend = 0.028) and walking (p for trend = 0.019) intensity categories showed no significant correlation with the oPGS categories after correction for multiple testing. For more details see Table 6.
We also examined how the oPGS values are related to the weekly frequency of LTPA, i.e., the average number of days with at least 10 min that a person engages in leisure-time physical activity in general and its intensity categories. In this case, as in the MET-min/week results, there is a significant trend between the increase in oPGS values and the number of days per week of leisure-time physical activity in general (p for trend = 0.001), as well as for the vigorous (p for trend = 0.003), moderate (p for trend = 0.014), and walking (p for trend = 0.009) intensity categories. For more details see Supplementary Table S3. The distribution of oPGS differed significantly between the two study populations (oPGS_Roma: 2.19 ± SD: 0.99 vs. oPGS_HG: 2.70 ± SD: 1.06; p < 0.001). A strong rightward shift (to the higher values) is observed for the HG population compared with the Roma. See Figure 1 for more details.

Discussion
LTPA is low in both the Hungarian general and Roma populations [51], which may be due to the influence of genetic background [7,65] in addition to known environmental and lifestyle factors [66]. The aim of the present study is to test this hypothesized genetic effect and, if it exists, to compare its magnitude between the Hungarian general and Roma populations.
Based on a systematic literature search, ten SNPs were selected to investigate the genetic background of LTPA. Of the ten SNPs selected, three were excluded based on HWE, while four of the remaining seven had significant allele frequency differences between the two populations. When examining the individual effects of SNPs, only the C allele of rs10887741 showed a significant association with LTPA. PGS optimization identified three SNPs for which the combined effect showed a strong positive significant association with LTPA in general, and oPGS categories are significantly correlated with an increasing trend in MET-min/week values as well as with the frequency of LTPA in general and vigorous-intensity categories.
The distribution of the populations by oPGS shows a significant shift to the right in the Hungarian general population compared with the Roma population. This finding suggests that the Hungarian general population has a higher genetic predisposition to doing leisure-time physical activity compared to the Roma.
The rs10887741 polymorphism in the 3′-phosphoadenosine 5′-phosphosulfate synthase 2 (PAPSS2) gene showed the strongest individual association with LTPA in general. The enzyme encoded by the PAPSS2 gene is involved in the sulfation of many molecules in addition to glycosaminoglycans. At present, the mechanisms by which the PAPSS2 gene affects participation in leisure-time physical activity are not known, but mutations in it cause spondyloepimetaphyseal dysplasia, a disease characterized by short stature and limbs in both mice and humans [67]. A study on siblings found a correlation between the 10q23 region harboring the PAPSS2 gene and maximum physical performance [68]. This further supports the hypothesis that physical fitness may be an important determinant of leisure-time physical activity behavior [69].
The rs6022999 SNP is located in the CYP24A1 (Cytochrome P450 family 24 subfamily A member 1) gene, whose protein product is responsible for the conversion of vitamin D into its physiologically inactive form. Vitamin D is essential for proper muscle function [70,71], and polymorphisms of the vitamin D receptor in humans are associated with altered muscle strength regardless of sex [72]; these changes are likely to affect levels of physical activity.
The rs7023003 is located in an intergenic region between the RN7SK and SLC44A1 genes. This SNP showed the strongest association with LTPA in a Korean study but still did not reach genome-wide association study significance [15]. Its significant association with LTPA was not confirmed in the Japanese population [73]. Currently, no research has investigated its role in LTPA through direct or indirect processes.
The importance of understanding the genetic reasons behind differences in individual (leisure-time) physical activity is supported by recently published articles. Doherty and colleagues investigated the genetic background of physical activity and sleep duration (both based on measured data) in 91,105 individuals registered in the UK Biobank [74]. They identified 14 significant loci (seven of them novel: five for LTPA and two for sleep), accounting for 0.06% of the variation in physical activity and 0.39% of the variation in sleep duration. They found that the heritability was higher in women than in men for general activity (23% vs. 20%, p = 1.5 × 10⁻⁴) and sedentary behavior (18% vs. 15%, p = 9.7 × 10⁻⁴). Klimentidis et al. [75] also investigated UK Biobank samples and identified ten loci with a significant (p < 5 × 10⁻⁹) effect on all physical activity measures. Of these, the variant rs429358 in the APOE gene (which was excluded from our study due to its deviation from HWE) was most strongly associated with moderate to vigorous physical activity. A GWAS by Wang et al. [76] identified a combination of 99 genetic variants associated with self-reported moderate to vigorous leisure-time physical activity, leisure-time screen time and/or sedentary behavior at work. Results summarized in a review article by De Geus in 2023 [77] support the general opinion that genetic factors strongly contribute to physical activity, whether self-reported or measured by accelerometer. The heritability of physical activity was found to be approximately 43% across the lifespan. It has also been shown that a polygenic score based on genetic variants influencing PA (which we also use) could help to improve the success of targeted interventions.
This study has its strengths and limitations. First, the correct identification of ethnicity is a common challenge in studies like ours [78]. Roma ethnicity was determined solely through self-identification, and consequently, there may be Roma individuals in the Hungarian general population, so the effect of ethnic differences in the study may be underestimated. Another limitation is that individuals who are above 65 years of age were not included in the study. Owing to a lack of information on gene-gene and gene-environment interactions, epigenetic factors, and structural variants, we did not consider them in our analysis. In the current study, ten SNPs related to LTPA were included for the calculation of oPGS. Theoretically, incorporating a larger number of SNPs may further improve the predictive ability of the PGS model. Nonetheless, adding many SNPs into the PGS model does not necessarily lead to a better predictive ability, as could be seen in the optimization process. Despite the limitations of the study, it should be emphasized that this is the first study to examine the possible genetic causes of the unfavorable level of leisure-time physical activity in the Roma population in comparison with that in the Hungarian general population.
In conclusion, the present study demonstrates that the differences in the prevalence of different intensity categories of LTPA between the Hungarian general and Roma populations can be partly explained by genetic causes.

Sample Populations and Questionnaire-Based Interviews
Data used in our present study were obtained in a cross-sectional three-pillar (i.e., questionnaire-based, physical examination, and laboratory examination) complex (i.e., health behavior and examination) survey carried out in 2018. Sampling and data collection are described in detail elsewhere [79].
Briefly, the Hungarian general (HG) and Roma sample populations were recruited from two counties (Hajdú-Bihar and Szabolcs-Szatmár-Bereg) in Northeast Hungary, the area where the representation of Roma is the highest and where most segregated Roma colonies are located. First, twenty-five colonies, and then from each colony 20 households, were randomly selected and one person (aged 20-64) from each household was invited to participate in the survey. Participants' ethnicity was determined by self-declaration. The Hungarian general population included randomly selected individuals aged 20 to 64 years, living in private households in the same counties, and registered with general practitioners. From each of the 20 randomly selected GP practices, 25 randomly selected individuals were invited to participate in the study. The planned sample size of the survey was 500 persons per population, but the final study sample, for the present study, was reduced to 797 (410 HG and 387 Roma) after excluding individuals with incomplete records.
The main part of the questionnaire used in the complex health survey was the European Health Interview Survey wave 2 (EHIS 2) questionnaire [80], which consists of four modules: (a) health status, (b) health care utilization, (c) determinants of health, and (d) socioeconomic variables. The EHIS 2 questionnaire has been extended with some additional sets of questions, including the long version of the International Physical Activity Questionnaire (IPAQ) to measure physical activity by domains and dimensions. Only activities performed for at least ten minutes during the last seven days were recorded in the questionnaire.

Characterization of LTPA by Sub-Domains and Intensity Categories
The IPAQ measures time spent in different areas (sub-domains): (1) work, (2) transport, (3) home and gardening, and (4) leisure in three intensity categories (walking, moderate-intensity activity, and vigorous-intensity activity). For details on calculating physical activity levels, see elsewhere [51].
Briefly, individuals who (regardless of intensity category) performed any form of physical activity in their leisure time (strictly outside working hours) were included in the group of people who performed LTPA.
The three intensity categories of LTPA (walking, moderate-intensity activity, and vigorous-intensity activity) are based on the form of exercise performed. In addition, LTPA intensity was quantified as weekly metabolic equivalent task minutes (MET-min/week) based on participants' responses according to the IPAQ scoring protocol [81]. Total minutes over the last seven days spent on different types of LTPA were defined for each individual to create MET-min/week scores for activity sub-domains, and average values were calculated for both sample populations by sex.
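The MET-min/week calculation can be illustrated with a short Python sketch. The MET weights used here (3.3 for walking, 4.0 for moderate, and 8.0 for vigorous leisure-time activity) are taken from the publicly available IPAQ scoring protocol and are not stated in this text; the function name and the input layout are illustrative assumptions, and the protocol's data-cleaning and truncation rules are omitted.

```python
# MET weights for the leisure domain as given in the public IPAQ scoring protocol
# (assumed here; the article itself does not list them).
MET_WEIGHTS = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}

def ltpa_met_minutes(minutes_per_day, days_per_week):
    """Return MET-min/week per intensity category plus the total.

    Both arguments are dicts keyed by intensity category (hypothetical layout).
    """
    result = {}
    for intensity, met in MET_WEIGHTS.items():
        minutes = minutes_per_day.get(intensity, 0)
        days = days_per_week.get(intensity, 0)
        if minutes < 10:
            # Only activities lasting at least ten minutes were recorded
            minutes = 0
        result[intensity] = met * minutes * days
    result["total"] = sum(result.values())
    return result

# Hypothetical respondent: 30 min of walking on 5 days, 20 min of moderate activity on 2 days
example = ltpa_met_minutes(
    {"walking": 30, "moderate": 20, "vigorous": 0},
    {"walking": 5, "moderate": 2, "vigorous": 0},
)
print(example)
```

For this hypothetical respondent, walking contributes 3.3 × 30 × 5 = 495 and moderate activity 4.0 × 20 × 2 = 160 MET-min/week, giving a total of 655.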

DNA Extraction, SNP Selection, Genotyping, Testing Hardy-Weinberg Equilibrium, and Linkage Disequilibrium
DNA was extracted from EDTA-anticoagulated blood samples using the MagNA Pure LC system (Roche Diagnostics, Basel, Switzerland) following the manufacturer's instructions.
Using online search engines such as PubMed, Ensembl, and HuGE Navigator, a systematic literature search was conducted to identify SNPs statistically significantly associated with LTPA. The search time frame related to the present study was until 5 August 2019. Keywords and their combinations used in the search: leisure time physical activity, recreational physical activity, genetics, genome-wide association study (GWAS), candidate gene, genotype. In the selection of SNPs, particular attention was focused on the results of the three GWAS [14,15,73] and a candidate gene study [82], which were the most relevant in this field.
The literature search identified a total of ten SNPs, and these were genotyped using the MassARRAY platform (Sequenom Inc., San Diego, CA, USA) with iPLEX Gold chemistry in the Mutation Analysis Core Facility (MAF) of the Karolinska University Hospital, Sweden. The MAF conducted validation, concordance analysis, and quality control according to their protocols. The Hardy-Weinberg Equilibrium (HWE) and linkage disequilibrium (LD) structure of the genotyped SNPs were calculated by Haploview software (version 4.2; Broad Institute; Cambridge, MA, USA).
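For readers unfamiliar with the HWE check, the following Python sketch shows a one-degree-of-freedom χ² goodness-of-fit test on genotype counts for a biallelic SNP. It is a generic illustration and not a description of Haploview's internal procedure, which may differ; the function name and the genotype counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 degree of freedom).

    Arguments are the observed counts of the two homozygotes and the heterozygote.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts, not taken from the study
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)    # exactly at HWE proportions
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)   # no heterozygotes at all
print(chi2_ok, p_ok, chi2_bad, p_bad)
```

A SNP with a small p-value under this test would be flagged as deviating from HWE, which is the criterion on which three of the ten SNPs were excluded.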

Calculation and Optimization of the Polygenic Score
Individuals with any missing SNP genotypes were excluded from further analyses; thus, 330 participants from the HG sample and 314 Roma individuals were included in genotype analysis. In the PGS calculation, each person was assigned a score based on the number of effect alleles carried. The effect allele was considered to be the allele that promotes LTPA. Homozygous effect alleles were considered as "2", heterozygotes as "1", and genotypes with no effect allele were considered as "0". By using these codes, a simple count score was calculated as described by Equation (1), in which Gi is the number of the effect alleles for the ith SNP. This model sums up all alleles over all loci as a summary score, assuming that all alleles have the same effect in direction and size.
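Equation (1) is not reproduced in this text. From the description it presumably has the form PGS = G_1 + G_2 + ... + G_n, the sum of the effect-allele counts G_i over the n SNPs. A minimal Python sketch of this unweighted count score follows. Only the C effect allele of rs10887741 is taken from the article; the effect alleles shown for the other two SNPs are placeholders chosen for illustration, as are the function names and the genotype encoding.

```python
# Effect alleles: "C" for rs10887741 is reported in the article;
# the other two are PLACEHOLDERS for illustration only.
EFFECT_ALLELES = {"rs10887741": "C", "rs6022999": "A", "rs7023003": "G"}

def count_effect_alleles(genotype, effect_allele):
    """Return 0, 1 or 2: the number of effect alleles in a two-letter genotype string."""
    return sum(1 for allele in genotype if allele == effect_allele)

def unweighted_pgs(genotypes, effect_alleles=EFFECT_ALLELES):
    """Sum effect-allele counts over all SNPs; return None if any genotype is missing."""
    score = 0
    for snp, effect in effect_alleles.items():
        genotype = genotypes.get(snp)
        if genotype is None:
            # Mirrors the exclusion of individuals with any missing SNP genotype
            return None
        score += count_effect_alleles(genotype, effect)
    return score

# Hypothetical individuals
complete = unweighted_pgs({"rs10887741": "CC", "rs6022999": "AG", "rs7023003": "TT"})
missing = unweighted_pgs({"rs10887741": "CT", "rs6022999": "AA"})
print(complete, missing)
```

With three SNPs the score ranges from 0 to 6, and every allele carries the same weight, matching the stated assumption of equal effect size and direction.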
The polygenic model optimization procedure aimed to select SNPs (identified in the systematic literature search) that had a strong association with LTPA in both study populations. For PGS optimization, adjusted logistic regression analyses (for age, ethnicity, sex, education, traveling by vehicle, BMI, and waist circumference) were used, and these analyses were also performed on a combined sample of the two populations.
The SNPs were tested in ascending order of p-value, in which process each SNP was inserted into the statistical model one by one, starting from the SNP with the strongest association (with the lowest p-value), and the association between oPGS and LTPA was examined after each insertion.
SNPs were selected and used for the final oPGS only if they increased the strength of association of oPGS (decreased p-value and increased Cox-Snell R-squared value) with LTPA. SNPs that left the model's association unchanged or weakened it, i.e., increased the p-value and decreased the R-squared, were excluded from further analyses.
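The stepwise selection described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it fits an unadjusted logistic regression by Newton-Raphson in pure Python, whereas the study used covariate-adjusted models in SPSS; all function names and the toy data are hypothetical.

```python
import math

def _sigmoid(t):
    t = max(-30.0, min(30.0, t))  # clamp so exp() and log() stay finite
    return 1.0 / (1.0 + math.exp(-t))

def _derivatives(b0, b1, x, y):
    """Score vector and information-matrix entries of the logistic log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        mu = _sigmoid(b0 + b1 * xi)
        w = mu * (1.0 - mu)
        g0 += yi - mu
        g1 += (yi - mu) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def _log_likelihood(probs, y):
    return sum(math.log(p if yi else 1.0 - p) for p, yi in zip(probs, y))

def fit_logistic(x, y, iterations=25):
    """Unadjusted logistic regression; returns (slope, Wald p-value, Cox-Snell R-squared)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _derivatives(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _derivatives(b0, b1, x, y)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2.0))
    n = len(y)
    ll_model = _log_likelihood([_sigmoid(b0 + b1 * xi) for xi in x], y)
    ll_null = _log_likelihood([sum(y) / n] * n, y)
    r2 = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    return b1, p_value, r2

def optimize_pgs(dosages, outcome, ranked_snps):
    """Add SNPs in ascending order of single-SNP p-value; keep a SNP only if the
    score's p-value decreases and its Cox-Snell R-squared increases."""
    selected = [ranked_snps[0]]
    score = list(dosages[ranked_snps[0]])
    _, best_p, best_r2 = fit_logistic(score, outcome)
    for snp in ranked_snps[1:]:
        trial = [s + d for s, d in zip(score, dosages[snp])]
        _, p, r2 = fit_logistic(trial, outcome)
        if p < best_p and r2 > best_r2:
            selected.append(snp)
            score, best_p, best_r2 = trial, p, r2
    return selected, score

# Toy data (hypothetical): effect-allele dosages for 12 individuals and a binary LTPA outcome
ltpa = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
dosages = {
    "snp_a": [2, 2, 1, 2, 1, 2, 1, 0, 1, 0, 0, 1],
    "snp_b": [0, 1, 2, 1, 0, 2, 1, 1, 2, 0, 1, 0],
    "snp_c": [1, 2, 2, 1, 0, 1, 1, 1, 0, 2, 0, 0],
}
selected, score = optimize_pgs(dosages, ltpa, ["snp_a", "snp_b", "snp_c"])
print(selected, score)
```

The acceptance rule requires both criteria to improve, following the description in the text; how ties or partial improvements were handled in the actual analysis is not stated.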
Genetic predisposition categories were formed based on the population distribution of oPGS, four groups were created, and trend analysis was used to examine the association of these groups with LTPA in general and its intensity categories.
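The trend analysis across ordered oPGS groups relies on the Jonckheere-Terpstra test named in the statistical analysis section. The Python sketch below computes the test statistic and a normal-approximation p-value. It counts ties as one half and omits the tie correction of the variance, so it is an approximation rather than a reproduction of the SPSS procedure; the function name and the example values are hypothetical.

```python
import math

def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra trend test for groups listed in their hypothesized order.

    Returns (JT statistic, z, two-sided p) using the normal approximation
    without tie correction of the variance (a simplification).
    """
    jt = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for a in groups[i]:
                for b in groups[j]:
                    if b > a:
                        jt += 1.0
                    elif b == a:
                        jt += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    z = (jt - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return jt, z, p

# Hypothetical MET-min/week values in three ordered score groups
jt, z, p = jonckheere_terpstra([[1, 2], [3, 4], [5, 6]])
print(jt, round(z, 3), round(p, 4))
```

In this toy example every value in a higher group exceeds every value in a lower group, so the statistic reaches its maximum of 12 and z is positive, indicating an upward trend.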

Statistical Analysis
The χ² test was used to compare the differences between nonquantitative variables and to examine the HWE of genotyped SNPs. Statistical power for each SNP was calculated by using the Online Sample Size Estimator (OSSE) online tool (http://osse.bii.a-star.edu.sg/calculation1.php) (accessed on 10 January 2023). The Shapiro-Wilk test was used to examine whether the quantitative variables were normally distributed or not, and, if necessary, Templeton's two-step method was considered to transform the non-normal variables into normal ones [83]. The Mann-Whitney U test was used to assess the distribution of age, waist circumference, BMI, oPGS, and MET-min/week between the study populations.
Multiple logistic regression analyses were used to determine the association between genetic factors (individual SNPs and oPGS) and LTPA. All regression analyses were conducted using a model adjusted for relevant factors (e.g., age, ethnicity, sex, education, vehicle travel, BMI, and waist circumference). The Jonckheere-Terpstra trend test [84] was used to analyze the trend of association between oPGS categories and MET-min/week values. Ethnicity was used as a covariate when the two populations were combined and examined together. Statistical analyses were performed using IBM Statistical Package for the Social Sciences (SPSS) version 26 (Armonk, NY, USA). For multiple statistical analyses (all calculations involving the oPGS), the Bonferroni correction method was used (the conventional p-value of 0.05 was divided by the number of independent polymorphisms).
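The Bonferroni step can be made concrete with a small Python sketch. The divisor of three (the number of SNPs in the oPGS) is an inference from the reported results, not a value stated in the text: a threshold of 0.05/3 ≈ 0.0167 reproduces which of the MET-min/week trend p-values are described as significant. The function name is illustrative.

```python
def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_independent_tests

# Assumption: three independent polymorphisms (the SNPs retained in the oPGS)
threshold = bonferroni_threshold(0.05, 3)

# p for trend values reported for MET-min/week by oPGS category
trend_p = {"general": 0.002, "vigorous": 0.015, "moderate": 0.028, "walking": 0.019}
significant = {name: p < threshold for name, p in trend_p.items()}
print(round(threshold, 4), significant)
```

Under this assumed divisor, the general and vigorous categories pass the corrected threshold while the moderate and walking categories do not, matching the pattern reported for the MET-min/week trend analysis.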

Ethics Declarations
Informed consent was recorded for all subjects who were included in the study. The survey was conducted under the conditions set out in the Declaration of Helsinki and the protocol was approved by the Ethical Committee of the Hungarian Scientific Council on Health (61327-2017/EKU).

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of the Hungarian Scientific Council on Health (Reference No.: 8907-O/2011-EKU).