Genetic Variations between Youth and Professional Development Phase English Academy Football Players

The purpose of this study was to examine differences in the genotype frequency distribution of thirty-three single nucleotide variants (SNVs) between youth development phase (YDP) and professional development phase (PDP) academy football players. One hundred and sixty-six male football players from two Category 1 and Category 3 English academies were examined within their specific age phase: YDP (n = 92; aged 13.84 ± 1.63 years) and PDP (n = 74; aged 18.09 ± 1.51 years). Fisher’s exact tests were used to compare individual genotype frequencies, whereas unweighted and weighted total genotype scores (TGS; TWGS) were computed to assess differences in polygenic profiles. In isolation, the IL6 (rs1800795) G allele was overrepresented in PDP players (90.5%) compared to YDP players (77.2%; p = 0.023), whereby PDP players had nearly three times the odds of possessing a G allele (OR = 2.83, 95% CI: 1.13–7.09). The TGS (p = 0.001) and TWGS (p < 0.001) were significant, but poor, in distinguishing YDP and PDP players (AUC = 0.643–0.694), with PDP players exhibiting an overall more power-orientated polygenic profile. If validated in larger independent youth football cohorts, these findings may have important implications for future studies examining genetic associations in youth football.


Introduction
The process of athlete development and ultimately reaching senior professional status in a sport such as football (soccer) is both dynamic and multifactorial [1]. Indeed, task constraints (e.g., the value of deliberate practice and deliberate play, or the importance of early engagement), performer constraints (e.g., differences between skill levels on anthropometric/physiological factors, psychological characteristics, and technical or tactical skill), and environmental constraints (e.g., the influence of birth-place, relative age, and/or socio-cultural influences) have all been associated with the performance of youth football players and their potential to achieve adult success [2]. Despite being heavily researched, the extent to which each of these elements impacts performance and affects the likelihood of achieving senior professional status in football remains unclear [3].
The failure to clearly identify a set of variables that uniformly predicts performance levels is, in part, due to methodological issues identified throughout talent identification and development research in football [4]. Prospective and longitudinal analyses in youth football have also revealed that specific performer characteristics may be more important at different time-points throughout development (see [5] for a review). For instance, when comparing English academy football players of different age groups (i.e., under-9 to

Participants
One hundred and sixty-six male football players from two Category 1 and Category 3 English academies participated within their specific age phase: YDP (n = 92; aged 13.84 ± 1.63 years) and PDP (n = 74; aged 18.09 ± 1.51 years). Informed assent from all players, consent from parents/guardians, and gatekeeper consent from each academy were collected prior to the commencement of the study. All experimental procedures were conducted in accordance with the guidelines in the Declaration of Helsinki and ethical approval was granted by the corresponding author's institutional Ethics Committee. This study was conducted in accordance with the recommendations for reporting the results of genetic association studies defined by the Strengthening the Reporting of Genetic Association studies (STREGA) statement.

Genetic Procedures
Genotyping
Saliva was collected from players via sterile, self-administered buccal swabs, following a minimum of 30 min since food or drink ingestion. Within 36 h, saliva samples were sent to AKESOgen, Inc. (Peachtree Corners, GA, USA) for DNA extraction. Using Qiagen chemistry, DNA was extracted on an automated Kingfisher FLEX instrument (Thermo Fisher Scientific, Waltham, MA, USA). To measure the quality and quantity of extracted DNA, PicoGreen and Nanodrop measurements were taken. Input to the custom testing array was 200 ng in 20 µL. Amplification, fragmentation, and resuspension were performed using Biomek FXP. GeneTitan instrumentation (Thermo Fisher Scientific, Waltham, MA, USA) was used to stain and scan the arrays, with hybridization performed in a Binder oven at 48 °C for 24 h, following the Affymetrix Axiom high throughput 2.0 protocol. Data analysis was then performed using raw CEL file data input into the Affymetrix Axiom Analysis Suite (Affymetrix, Santa Clara, CA, USA). Procedures were in accordance with previous studies [19,20,24].

Total Genotype Score
Unweighted and weighted total genotype scores (TGS; TWGS) were calculated to assess the differences in polygenic profiles between YDP and PDP players (as described previously [19,20]). Both TGSs and TWGSs have demonstrated sufficient discriminatory power in previous sport genomic research [27,28]. To generate both the TGS and TWGS, each genotype of a respective SNV initially received a score between 0 and 2 using a data-driven approach based on the observed genotype associations with PDP status. Genotypes of dominant (AA vs. Aa-aa) and recessive (AA-Aa vs. aa) models were assigned a score of two (i.e., associated genotype[s]) or zero (i.e., alternate genotype[s]), whereas genotypes of co-dominant models (AA vs. Aa vs. aa) were assigned three scores (i.e., homozygous-associated genotypes received a score of two, the heterozygote received a score of one, and the alternate homozygous genotype received a score of zero).
For the TGS, the original procedure of Williams and Folland [28] was followed. Genotype scores (GS) were summed and transformed into a 0-100 scale by dividing the total score by the maximum possible score and multiplying by 100.
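The scoring and scaling steps described above can be sketched in a few lines of Python. This is a minimal illustration only: the SNV labels, genotype-to-score mappings, and player genotypes below are hypothetical placeholders, not values from this study, and the function name is our own.

```python
def total_genotype_score(player_genotypes, scoring):
    """Return the unweighted TGS on a 0-100 scale.

    player_genotypes: dict mapping SNV label -> observed genotype string.
    scoring: dict mapping SNV label -> {genotype: score in 0, 1, 2}.
    """
    if not player_genotypes:
        raise ValueError("at least one SNV is required")
    # Look up the genotype score (GS) for each SNV and sum them.
    total = sum(scoring[snv][gt] for snv, gt in player_genotypes.items())
    # The maximum possible score is 2 per SNV.
    maximum = 2 * len(player_genotypes)
    return 100 * total / maximum


# Hypothetical scoring tables (illustrative only).
scoring = {
    "SNV_1": {"GG": 2, "GC": 2, "CC": 0},  # two-group (dominant-style) model: 2 or 0
    "SNV_2": {"TT": 2, "TC": 1, "CC": 0},  # co-dominant model: 2, 1, or 0
}
player = {"SNV_1": "GC", "SNV_2": "TC"}

print(total_genotype_score(player, scoring))  # (2 + 1) / 4 * 100 = 75.0
```

A score of 100 would correspond to a player carrying the associated genotype at every SNV, and a score of 0 to a player carrying none.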
For the TWGS, a similar procedure to Varillas Delgado et al. [27] was used. Each GS was multiplied by the β coefficients of each SNV following multiple regression to create weighted genotype scores (WGS). The WGSs were then summed and transformed into a 0-100 scale by dividing the total score by the maximum possible score and multiplying by 100:
$$\mathrm{TWGS} = \frac{\text{combined WGS}}{\text{maximum WGS}} \times 100$$
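As a rough sketch of the weighting step, the Python snippet below multiplies each genotype score by a per-SNV weight and rescales the sum to 0-100. The weights and scores are hypothetical, and the snippet assumes all β coefficients are positive so that the maximum WGS is simply twice the sum of the weights; the handling of negative coefficients is not specified in the text and is not addressed here.

```python
def total_weighted_genotype_score(scores, betas):
    """Return the weighted TWGS on a 0-100 scale.

    scores: per-SNV genotype scores (each 0, 1, or 2).
    betas: per-SNV regression weights (assumed positive in this sketch).
    """
    if not scores or len(scores) != len(betas):
        raise ValueError("scores and betas must be non-empty and equal in length")
    if any(b <= 0 for b in betas):
        raise ValueError("this sketch assumes positive weights")
    # Weighted genotype score (WGS) for each SNV, then summed.
    combined_wgs = sum(s * b for s, b in zip(scores, betas))
    # Maximum WGS: every SNV at the top score of 2.
    maximum_wgs = sum(2 * b for b in betas)
    return 100 * combined_wgs / maximum_wgs


# Hypothetical genotype scores and weights for three SNVs.
example_scores = [2, 1, 0]
example_betas = [0.5, 0.25, 0.25]

print(total_weighted_genotype_score(example_scores, example_betas))  # 1.25 / 2.0 * 100 = 62.5
```

With equal weights this reduces to the unweighted TGS, which is one way to sanity-check an implementation.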

Data Analysis
Data were analyzed using Jamovi version 1.8.1 and IBM SPSS version 25. Fisher's exact tests were used to test SNVs for adherence with Hardy-Weinberg equilibrium (HWE) and to compare genotype frequencies between YDP and PDP players. Akaike information criterion (AIC) was used to select which genetic model (i.e., co-dominant, dominant, recessive) best fit the data and would be subjected to hypothesis testing. However, if MAF ≤ 0.25, a dominant model was utilized to retain statistical power [21]. An independent t-test was used to assess differences in the TGS and TWGS between YDP and PDP players. Additionally, receiver operating characteristic (ROC) curves and area under the curve (AUC) were used to evaluate the discriminatory power of the TGS and TWGS to distinguish YDP and PDP players with threshold values of: >0.5-0.7 = poor, >0.7-0.8 = acceptable, >0.8-0.9 = excellent, and >0.9 = outstanding [29]. Odds ratios (OR) and 95% confidence intervals (CI) were also calculated to estimate the effect size of individual genotypes and polygenic models (split into equal thirds using tertiles). Statistical significance was set at p < 0.05.
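To make the AUC interpretation concrete, the following Python sketch computes AUC as the proportion of PDP-YDP pairs in which the PDP player has the higher score (ties counted as half), then applies the threshold labels listed above. The analyses in this study were run in Jamovi and SPSS, so this is an illustrative equivalent rather than the procedure used; the example scores are hypothetical, and the "no discrimination" label for AUC ≤ 0.5 is our assumption.

```python
def auc(positive_scores, negative_scores):
    """Rank-based AUC: P(positive > negative), with ties counted as 0.5."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def interpret_auc(value):
    """Map an AUC value onto the threshold labels used in the text."""
    if value > 0.9:
        return "outstanding"
    if value > 0.8:
        return "excellent"
    if value > 0.7:
        return "acceptable"
    if value > 0.5:
        return "poor"
    return "no discrimination"  # assumed label; not defined in the text


# Hypothetical polygenic scores for a handful of players.
pdp_scores = [60, 70, 80]
ydp_scores = [50, 65, 75]

value = auc(pdp_scores, ydp_scores)
print(round(value, 3), interpret_auc(value))
```

Under these thresholds, both ends of the reported range (AUC = 0.643 and 0.694) fall in the "poor" band.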

Results
The genotype and allele distributions of all SNVs were in HWE, except for GALNT13 (p < 0.001) and UCP2 (p = 0.010) in the PDP group (see Table 2). The genotype frequency distribution of IL6 was significantly different between YDP and PDP players (p = 0.023) (see Figure 1). More specifically, the G allele was overrepresented by 13.3 percentage points in PDP players (90.5%) compared to YDP players (77.2%). Furthermore, PDP players had 2.83 times the odds of possessing a G allele (OR = 2.83, 95% CI: 1.13-7.09) compared to YDP players. No significant differences in genotype frequency distribution between the age-specific phases for any other SNVs existed (see Table 3).
Table 2. Descriptive statistics of youth and professional development phase English academy football players.
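The IL6 odds ratio can be checked with a short worked example. The carrier counts below are not reported in the text; they are inferred from the stated percentages and group sizes (90.5% of 74 ≈ 67 PDP players; 77.2% of 92 ≈ 71 YDP players) and should be treated as an assumption. The confidence interval uses the standard Wald (log odds ratio) method, which is also an assumption, since the interval method is not stated here.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: carriers and non-carriers in group 1.
    c, d: carriers and non-carriers in group 2.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Counts inferred from the reported percentages (assumption, see lead-in).
pdp_carriers, pdp_non_carriers = 67, 7    # 67 / 74 = 90.5%
ydp_carriers, ydp_non_carriers = 71, 21   # 71 / 92 = 77.2%

or_value, ci_lower, ci_upper = odds_ratio_wald(
    pdp_carriers, pdp_non_carriers, ydp_carriers, ydp_non_carriers
)
print(f"OR = {or_value:.2f}, 95% CI: {ci_lower:.2f}-{ci_upper:.2f}")
# prints "OR = 2.83, 95% CI: 1.13-7.09"
```

Under these inferred counts the calculation reproduces the reported estimate, and the width of the interval reflects the small number of non-carriers in the PDP group.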

Discussion
This study examined differences in the genotype frequency distribution of thirty-three SNVs, both individually and collectively, between YDP and PDP English academy football players. The key findings showed an overrepresentation of the IL6 (rs1800795) G allele in PDP players compared to YDP players. In addition, the TGS and TWGS models demonstrated that the combination of these thirty-three SNVs significantly differentiated YDP and PDP players, albeit with poor discriminatory accuracy (AUC = 0.643–0.694). As such, these results suggest there is significant genetic variation between youth football players of distinct age groups. To our knowledge, this is the first assessment of genotype frequency distribution in isolation, and as part of a polygenic profile, between two age-specific phases of academy football players in England. Therefore, these findings may have important implications for future studies examining genetic associations in youth football.
The IL6 gene encodes the pleiotropic cytokine interleukin-6 (IL-6), which has previously been associated with multiple biological processes relevant to sport performance (i.e., glucose homeostasis, muscle hypertrophy, and repairing damaged muscle) [30]. The circulating levels of IL-6 can vary depending on specific variants within the gene. For instance, the G and C alleles of the IL6 (rs1800795) SNV alter promoter activity and consequently result in higher and lower IL-6 levels, respectively [31]. Higher IL-6 levels have been associated with greater muscle hypertrophy, improved glucose uptake, and increased protection against exercise-induced muscle damage, possibly due to reduced muscle inflammation by positively regulating the pro- and anti-inflammatory cytokine production balance [32,33]. In contrast, lower IL-6 levels may increase the possibility of sustaining a muscular injury, inhibit recovery, and hinder athletic performance, with higher creatine kinase activity reported in response to eccentric exercise in C allele carriers [30].
More recent sport-specific research has shown that IL-6 may be an important biomarker in power-orientated sports and performance phenotypes. Studies assessing Polish and Spanish high performing athletes have reported an overrepresentation of the IL6 (rs1800795) G allele in those who take part in power-based sports (i.e., jumpers, sprinters, and weightlifters) compared to controls [34,35]. Cross-sectional quantitative data supporting these findings also exist, as youth footballers in Britain possessing the G allele performed significantly better than C allele carriers in acceleration and speed assessments (i.e., 5 m and 20 m sprint) [24]. Therefore, due to the mechanistic properties associated with IL6 (rs1800795), the G allele may better protect skeletal muscle and aid in repair during powerful muscle contractions, which subsequently allows for a higher volume of training that stimulates favorable adaptations and ultimately results in superior performance in high-intensity activities.
Although power-orientated phenotypes such as acceleration, speed, and vertical jumps are important across all youth football age groups [36], they appear to become more important as players age and mature [37]. For instance, in many male English football academies, youth players do not progress to compete on a full-sized pitch, with eleven players on each team, until the under-13 age group. With this increase in pitch size, players spend more of their competitive match-play time at low speeds and perform a greater number of sprint actions, placing a greater physiological demand on anaerobic capacity [38]. Furthermore, in a longitudinal investigation of English academy football players, it was reported that whilst future professionals began to outperform their nonprofessional counterparts in vertical countermovement jump (CMJ) from the age of 12 years (>0.6 cm), differences became more pronounced in older age groups (e.g., aged 18 years > 1.7 cm) [9].
As competitive match-play demands shift more towards anaerobic capacities, academy recruitment teams may choose to retain players displaying superior power rather than endurance capabilities [37]. This may explain the overrepresentation of the IL6 (rs1800795) G allele in the PDP group compared to the YDP group due to its association with several power-orientated phenotypes. However, recent research in academy football has also shown that the G allele may protect PDP players from injury. More specifically, Hall et al. [25] reported that only post-peak height velocity players (aged 17.5 ± 2.1 years) possessing the IL6 (rs1800795) C/C genotype suffered significantly more injuries than G allele carriers. The authors noted that the association was possibly due to the combination of greater muscle damage and inflammation experienced by C allele carriers, alongside the higher intensity of match actions and increased frequency of training and/or competitive match-play in older age groups. As such, the overrepresentation of the G allele in the PDP group may be explained by a pleiotropic effect of IL6 (rs1800795) on power and injury.
The TGS and TWGS models showed that YDP and PDP football players have distinct polygenic profiles, with the TWGS demonstrating greater discriminatory accuracy. This suggests that whilst each SNV has a small additive effect, favorable alleles of individual SNVs have different degrees of influence. This corresponds with previous research in academy football players on physiological, psychological, and technical phenotypes that underpin differences in these age-specific phases [19–21]. The general frequency distribution of the genotypes across all SNVs also aligns with the IL6 (rs1800795) findings. Specifically, PDP players had a greater proportion of alleles previously associated with power-orientated phenotypes (e.g., ADRB2 rs1042714 G allele, CKM rs8111989 T allele, FTO rs9939609 A allele, GALNT13 rs10196189 G allele, IGF1 rs35767 A allele, PPARG rs1801282 G allele, TRHR rs7832552 T allele). This indicates PDP players may have an overall more power-orientated polygenic profile, which corresponds with similar findings reported in post-peak height velocity (aged 16.8 ± 2.3 years) academy football players using only four of these SNVs: ACTN3 (rs1815739), AGT (rs699), PPARA (rs4253778), and NOS3 (rs2070744) [21].
The polygenic models also showed that in general YDP players had a greater proportion of favorable alleles in SNVs previously associated with psychological and technical phenotypes (e.g., HTR2A rs6311 T allele, ADRB2 rs1042714 C allele, BDNF rs6265 T allele, DBH rs1611115 C allele, DRD1 rs4532 C allele, DRD4 rs1800955 C allele, GABRA6 rs3219151 C allele) in academy footballers [19,20]. The importance of these psychological and technical phenotypes in youth football has been demonstrated in previous research by effectively differentiating higher and lower performers in adolescence and predicting success at adulthood [2,3]. However, these findings suggest having an increased frequency of these preferred psychological/technical alleles may be more advantageous in younger age groups. This corresponds with previous research that reported coaches and recruiters consider technical, tactical, and psychological factors as the most important during this stage of development [39,40]. As such, the polygenic models collectively showcase that English academy football players of different age-specific phases may have distinct genetic profiles, with PDP players more power-orientated and YDP players more psychological- and technical-orientated, though further replication studies are required to build on the limited evidence available in youth football players.
Although the polygenic models distinguished YDP and PDP players, they still had relatively poor accuracy, which indicates they should not be considered for practical implementation. Moreover, given the data-driven cross-sectional nature of the analyses, these findings may not generalize well to other youth football cohorts and may reflect cohort effects. Therefore, the external validity of these results should be assessed in larger independent samples alongside the addition of many more relevant genetic variants. It is also important to note that the previous associations of the SNVs included in this study with specific physiological, psychological, technical, and injury phenotypes may not be reliable due to the relatively small sample sizes in football genomic research [10]. Therefore, the inferences made with regards to genetic profile orientation in YDP and PDP players should be interpreted with caution.
Studies with this type of unique sample are typically underpowered so it is important to be relatively conservative with any conclusions, as meaningful implications cannot be made from one study in isolation. However, in the early stages of development in a field, informed speculation based on prior knowledge may be important for informing future work. As a result, we made informed speculation about our findings as a way of guiding subsequent work in this area. Moreover, building this research base with studies using transparent methodologies is important so they can contribute to research synthesis approaches in the future and draw more valid and reliable conclusions before these findings are implemented into applied settings [41].
Nevertheless, this study does have important limitations that should be considered. For instance, we did not make adjustments for multiple comparisons, which may have increased type 1 errors. However, due to the exploratory nature of this study, in regard to the novel experimentation methods employed and the unique cohort, reducing type 2 errors was considered a priority. This is recommended in exploratory research, as a main aim is to ensure an important discovery is not missed in the first instance, which can be validated in subsequent dedicated replication studies [42]. In addition, the sample size (N = 166) used in this study was relatively small. However, this was still larger than the median sample size (N = 60) reported in a recent review of eighty genetic association studies in football [10]. There were also some deviations from HWE (i.e., GALNT13 and UCP2), which can indicate genotyping error and may have influenced the findings.

Conclusions
This study has presented novel evidence with regard to the genetic profiles of YDP and PDP male academy football players in England. To be specific, the IL6 (rs1800795) G allele was overrepresented in PDP players compared to YDP players, possibly due to its theorised pleiotropic effect on power and injury phenotypes. Moreover, the TGS and TWGS models derived from all thirty-three SNVs effectively distinguished YDP and PDP players, with PDP players exhibiting an overall more power-orientated polygenic profile. As such, this study has shown for the first time that there is significant inter-individual genetic variation between youth football players of specific age phases in English academies. If validated in larger independent youth football cohorts, these findings may have important implications for future studies examining genetic associations in youth football.