TTN Variants Are Associated with Physical Performance and Provide Potential Markers for Sport-Related Phenotypes

TTN encodes the third myofilament, titin, which plays structural, mechanical, regulatory, and developmental roles in sarcomeres. The aim of this research was to determine the association between novel and previously described TTN variants and athletic performance, as well as competition level, in Caucasians. First, whole-genome sequencing was performed in 100 athletes and 47 controls. Second, 348 athletes (108 endurance, 100 sprint/power, and 140 mixed-sport athletes) and 403 volunteers were genotyped using real-time PCR. We found a significant overrepresentation of the rs10497520 CT and CT+TT genotypes in the sprint/power athlete group (CT genotype, over-dominant model: OR 2.27, 95% CI 1.41–3.66, p = 0.0013). Carriers of the rs10497520 T allele were 2.17 times more likely to be sprint/power athletes (95% CI 1.35–3.49, p = 0.0021). This association was also observed in the highly elite and sub-elite sprint/power subgroups. Possessing at least one TAA haplotype (rs10497520, rs55837610, rs72648256) increased the log-odds by 0.80 (p = 0.0015), 1.42 (p = 0.003), and 0.77 (p = 0.044) for all, highly elite, and sub-elite sprint/power athletes, respectively. We demonstrated that harbouring the rs10497520 T allele, individually and in a haplotype combination, increased the chance of being an elite sprint/power athlete, indicating that this allele may be favourable for sprint/power performance.


Introduction
In the 1950s, the filamentous nature of the sarcomere was established, and muscle contraction was explained in terms of the interaction between two contractile filaments, actin and myosin (the sliding filament theory) [1,2]. The discovery of a third filament, titin, also known as connectin, challenged these concepts and prompted scientists to explain its role in the structural and functional properties of sarcomeres and of muscle as a whole [3][4][5].
Titin is the largest known protein and has a very complex structure, extending from the Z-disc to the M-line. Four structurally and functionally distinct regions with numerous repeated sequence domains can be distinguished: (i) an amino-terminal Z-disc region, which anchors titin in the Z-disc through binding to proteins such as α-actinin and functions as part of the mechanical stretch-sensor machinery; (ii) a middle I-band region, which acts as a molecular spring, providing myocyte elasticity and maintaining the Z- and M-line connections during muscle elongation and contraction; (iii) an A-band region, which is important for thick filament termination, interaction with myosin-binding protein C, and modification of thick filament length; and (iv) a carboxy-terminal region spanning the M-line, which is involved in various signalling pathways [6][7][8]. Detailed visualisations of the protein's domain layout and its position in the sarcomere are available in several excellent reviews [6,7,9]. Thus, titin performs a variety of functions in muscle contraction and force production, including structural and developmental roles in sarcomere organisation, mechanical roles in passive and active force regulation in cardiac and skeletal muscles, and roles as a sensory and signalling mediator [9][10][11][12][13][14]. Undoubtedly, we are only in the early stages of understanding the role of titin in the formation of muscle and its properties [15].
The giant TTN gene (2q31.2) encoding titin consists of 365 exons and is transcribed into an mRNA over 100 kb long. Owing to extensive alternative splicing, human TTN generates seven isoforms with different physiological properties: the longer isoforms are more elastic, while the shorter isoforms are stiffer [6,16,17,18]. The main skeletal muscle isoform is N2A, which is the longest known isoform; it comprises 312 exons but lacks exon 49, which encodes the N2B domain in the I-band region. In contrast, the cardiac-specific isoforms are shorter, and two major isoforms, N2B (191 exons) and N2BA (313 exons), have been identified. Depending on muscle type and developmental stage, titin isoforms of different lengths are produced [6,8,19]. TTN polymorphisms may affect the contractile characteristics of titin filaments, which may be favourable or unfavourable for muscle performance [8,20]. TTN has been described as a major human disease gene [21], and TTN variants have recently been linked to numerous inherited diseases of skeletal and cardiac muscle [21][22][23][24]; however, their relationship with physical performance, athletic status, and injury risk is almost unknown. Nevertheless, research has confirmed the key role of titin in crucial skeletal and cardiac muscle functions, such as strength and stiffness, and highlighted its role in sports performance [6,20,25,26] and, as argued by Perrin et al. (2017), probably also in injury risk [27]. The best-known variant within the TTN gene is the C>T polymorphism rs10497520, which results in the substitution of lysine (Lys) with glutamic acid (Glu) and may affect the variability of isoform expression in muscle tissue [20]. To date, this polymorphism has been associated with differences in the training response of maximal oxygen consumption (VO2max) [25] and with muscle fascicle length [20].
These studies suggest that TTN may be a very promising candidate gene for physical performance and injury risk. However, analysis of the whole TTN gene is very complicated due to its large size and complex structure. Now that next-generation sequencing (NGS) is widely available, it has become possible to examine the TTN gene structure and identify specific variants that influence physical performance and human health. To date, more than 120 TTN polymorphisms related to human phenotypes have been identified [6]. However, differences between the sequenced populations, isoform diversity, and the choice of reference sequence complicate the assessment of the relationship between sequence changes and the observed phenotype. Thus, the role of TTN variants in modifying parameters associated with sports skills remains unexplained. Therefore, the aim of this study was to determine the association of novel (rs72648256 and rs55837610) and previously described (rs10497520) allelic variants, genotypes, and haplotypes of the TTN gene with athletic performance and competition level in the Caucasian population. This allowed us to assess the usefulness of the TTN gene as a genetic marker for sports skills, which may underlie differences in the potential to become an elite athlete.

Ethics Statement
The experimental protocols were conducted according to the World Medical Association Declaration of Helsinki, the Strengthening the Reporting of Genetic Association studies (STREGA) statement, and ethical standards in sports science research. All procedures were approved by the Ethics Committee at The District Medical Chamber in Gdansk (KB-8/19). Written informed consent was obtained from all individual participants.

Participants
In the first part of the experiment, involving whole-genome sequencing (WGS), the study group consisted of 100 male elite sprint/power (n = 52) and endurance (n = 48) athletes (age: 23.5 ± 5.9 years) competing at the highest national standard. Classification was based on scoring tables (e.g., IAAF, FINA), a medal at the national championships, or participation in international competition at the European or World Championships. Athletes whose personal best results ranked them in the top 100 in their sports discipline in the world or in Europe were included in the study group. As the aim of this part of the study was to identify genetic variants associated with overall physical performance, all athletes were analysed as a single group.
The control group consisted of 47 healthy individuals (age: 22.4 ± 6.3 years), with no pair of individuals having a kinship coefficient above 0.125 (kinship was assessed with Hail's hl.pc_relate method; kinship coefficient scale 0–0.5). The inclusion criteria for volunteers were no medical history of cardiorespiratory disease and no participation in professional sports training.
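As a hedged illustration of the kinship criterion, the sketch below greedily removes samples until no remaining pair exceeds a kinship coefficient of 0.125. It assumes pairwise kinship estimates are already available (for example, the kinship column produced by a relatedness method such as `hl.pc_relate`). The function name `prune_related` and the "drop the most-connected sample" rule are illustrative choices, not the authors' procedure, which the text does not describe.

```python
"""Greedy pruning of related samples (hypothetical helper, not the authors' code).

Kinship coefficients lie on a 0-0.5 scale; a threshold of 0.125 roughly
corresponds to excluding second-degree or closer relatives.
"""
from collections import Counter


def prune_related(samples, pairs, max_kinship=0.125):
    """Return a set of samples in which no remaining pair exceeds max_kinship.

    samples: iterable of sample IDs
    pairs: iterable of (sample_i, sample_j, kinship) tuples
    """
    kept = set(samples)
    # Keep only the pairs that violate the threshold.
    related = [(i, j) for i, j, kin in pairs if kin > max_kinship]
    while True:
        # Consider only pairs whose members are both still kept.
        active = [(i, j) for i, j in related if i in kept and j in kept]
        if not active:
            return kept
        # Drop the sample involved in the most related pairs;
        # ties are broken by sorted sample ID so the result is deterministic.
        degree = Counter(s for pair in active for s in pair)
        worst = max(sorted(degree), key=lambda s: degree[s])
        kept.discard(worst)


print(sorted(prune_related(["A", "B", "C", "D"],
                           [("A", "B", 0.25), ("B", "C", 0.2), ("C", "D", 0.01)])))
```

Hail also provides `hl.maximal_independent_set` for this task; the simple greedy rule above need not retain the largest possible set of unrelated individuals.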
The second part of the study aimed to assess the association of selected TTN variants with sport-related phenotypes and involved 348 Polish athletes (age: 27.8 ± 7.1 years) who competed in national and international events. The athletes were stratified into three groups according to the relative contribution of the anaerobic and aerobic energy systems, the duration of competitive exercise, and the intensity of exertion in each sport: endurance (n = 108), sprint/power (n = 100), and mixed-sport (n = 140) athletes. Within each group, athletes were further divided by competition level: highly elite (gold medallists in the World and European Championships, World Cups, or Olympic Games), elite (silver or bronze medallists in these events), and sub-elite (participants in international competitions). The subgroups were as follows: highly elite (n = 19), elite (n = 45), and sub-elite (n = 36) sprint/power athletes; highly elite (n = 19), elite (n = 47), and sub-elite (n = 42) endurance athletes; and highly elite (n = 17), elite (n = 63), and sub-elite (n = 61) mixed-sport athletes.
Unrelated students from the Gdansk University of Physical Education and Sport who did not undertake sports training (n = 403; age: 22 ± 3.4 years) formed the control group.
Athletes in the first and second part of the study, as well as controls, were of Caucasian origin.
A sample size calculation for the genotyping study, assuming a power of 90%, an effect size of 25%, and an alpha of 0.05, indicated a minimum of 329 participants per group.
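The text does not specify the test or the effect-size metric behind this calculation. As a sketch only, assuming that "effect size 25%" means a standardised difference d = 0.25 between two groups, the usual normal-approximation formula n = 2(z(1 − α/2) + z(1 − β))² / d² can be evaluated as below. Under this assumption it gives 337 per group, so the reported figure of 329 presumably reflects a different effect-size definition or software.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.90):
    """Per-group sample size for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is a standardised (Cohen's d) effect size -- an assumption here.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)


print(n_per_group(0.25))  # 337 under these assumptions
```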

WGS and Data Processing
WGS was performed externally by BGI with the following parameters: 150 bp paired-end reads, at least 90 Gb of data per sample, 300 M reads, and 30× coverage. FASTQ files were processed with the Intelliseq Germline Pipeline 1.8.3 (https://gitlab.com/intelliseq/workflows, accessed on 9 December 2021), built with Cromwell (https://cromwell.readthedocs.io/en/stable/, accessed on 9 December 2021). Within the pipeline, FASTQ file quality was assessed with FastQC. Reads were then aligned to the Broad Institute hg38 human reference genome with GATK. Duplicate reads were removed with Picard, and base quality (Phred) scores were recalibrated using GATK base quality score recalibration. Variants were called with GATK HaplotypeCaller, producing genomic VCF (gVCF) files, which were then imported into Hail (0.2.62, https://hail.is/, accessed on 9 December 2021).
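The sequencing figures above are related by simple arithmetic: mean coverage is approximately the total number of sequenced bases divided by the genome size. The sketch below assumes that "300 M reads" refers to read pairs (the reading consistent with 90 Gb at 150 bp) and uses an approximate human genome size of 3.1 Gb; both are our assumptions, not values stated by the sequencing provider.

```python
def mean_coverage(read_pairs, read_length_bp, genome_size_bp):
    """Approximate mean depth: total sequenced bases / genome size."""
    total_bases = read_pairs * 2 * read_length_bp  # two reads per pair
    return total_bases / genome_size_bp


# Assumptions: 300 M read pairs, 150 bp reads, ~3.1 Gb human genome.
total_gb = 300e6 * 2 * 150 / 1e9
cov = mean_coverage(300e6, 150, 3.1e9)
print(f"{total_gb:.0f} Gb, ~{cov:.0f}x")  # 90 Gb, ~29x
```

This gives roughly 29×, in line with the stated 30× coverage.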

WGS Data Filtering and Annotation
All analyses described below were performed with Hail (0.2.62, https://hail.is/, accessed on 14 December 2021). Detailed information about file preparation, annotation, and filtering is provided in the project GitHub repository (https://github.com/ippas/imdik-zekanowski-sportwgs, accessed on 9 December 2021). Briefly, all gVCF files were combined into a sparse matrix table and genotyped. Multiallelic variants were split into biallelic records. The VCF file with all alternate allele calls in the analysed cohort was filtered to exclude repeats and low-complexity sequences (UCSC RepeatMasker track). Only loci at which more than 90% of gnomAD v3 samples had a read depth (DP) > 1 were kept for further analysis. Variants were annotated with gnomAD (v3; https://gnomad.broadinstitute.org/, accessed on 14 December 2021). One sample was excluded from the analysis based on principal component analysis (PCA) of 0.1% of genotypes.
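The three filtering steps described above (splitting multiallelic sites, excluding repeat-masked intervals, and keeping well-covered loci) can be sketched in plain Python. This is not the project's Hail code: in Hail these steps would typically use functions such as `hl.split_multi_hts` and `hl.filter_intervals`, and the dictionary layout and the field name `frac_dp_over_1` below are hypothetical. The split uses downcoding, in which every other alternate allele is recoded as reference, similar in spirit to Hail's splitting.

```python
"""Sketch of the variant-filtering steps (hypothetical data model)."""
import bisect


def split_multiallelic(variant):
    """Split one multiallelic record into biallelic records.

    Genotypes are tuples of allele indices (0 = REF). In the record for
    alt allele k, allele k becomes 1 and every other allele becomes 0.
    """
    ref, alts = variant["alleles"][0], variant["alleles"][1:]
    records = []
    for k, alt in enumerate(alts, start=1):
        gts = [tuple(1 if allele == k else 0 for allele in gt) for gt in variant["gts"]]
        records.append({**variant, "alleles": [ref, alt], "gts": gts})
    return records


def in_intervals(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping [start, end) intervals.

    Assumes positions and intervals use the same coordinate convention.
    """
    idx = bisect.bisect_right(starts, pos) - 1
    return idx >= 0 and pos < ends[idx]


def keep_variant(variant, repeat_mask, min_frac=0.9):
    """Apply the repeat-mask and coverage filters to one biallelic record.

    repeat_mask: {chrom: (sorted_starts, matching_ends)}
    """
    starts, ends = repeat_mask.get(variant["chrom"], ([], []))
    if in_intervals(variant["pos"], starts, ends):
        return False  # inside a repeat / low-complexity interval
    # Keep loci where more than min_frac of reference samples had DP > 1.
    return variant["frac_dp_over_1"] > min_frac


record = {"chrom": "chr2", "pos": 300, "alleles": ["A", "C", "G"],
          "gts": [(0, 1), (1, 2)], "frac_dp_over_1": 0.95}
mask = {"chr2": ([100, 500], [200, 600])}
print([keep_variant(r, mask) for r in split_multiallelic(record)])
```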

Variant Filtering
To narrow the list of analysed variants and exclude common ones, a minor allele frequency (MAF) filter of 0.5%, based on gnomAD v3 non-Finnish Europeans, was applied. Genotypes at each remaining variant were tested with Fisher's exact test to identify variants over- or underrepresented in elite athletes.
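A minimal sketch of this step, assuming one plausible reading in which each rare variant is summarised as a 2×2 table of carriers versus non-carriers in athletes and controls; the dictionary keys are hypothetical. The two-sided p value follows the usual convention (also used by R's `fisher.test`) of summing the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: athletes, controls; columns: carriers, non-carriers.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):  # hypergeometric P(top-left cell = x) given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def rare_variant_tests(variants, max_maf=0.005):
    """Keep variants with reference-panel MAF < max_maf and test carrier counts.

    Each variant is a dict with hypothetical keys:
    'id', 'ref_maf', 'case_carriers', 'case_n', 'ctrl_carriers', 'ctrl_n'.
    """
    results = {}
    for v in variants:
        if v["ref_maf"] >= max_maf:
            continue  # common in the reference population: excluded
        a, c = v["case_carriers"], v["ctrl_carriers"]
        b, d = v["case_n"] - a, v["ctrl_n"] - c
        results[v["id"]] = fisher_exact_two_sided(a, b, c, d)
    return results
```

For the classic table [[3, 1], [1, 3]], `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70 ≈ 0.486.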

Statistical Analyses
Hardy-Weinberg equilibrium (HWE) was evaluated using the exact test. Associations between each SNP and the categorical outcome variables were analysed under four genetic models (inheritance patterns) whenever possible: co-dominant, dominant, recessive, and over-dominant. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated, taking the most frequent homozygous genotype as the reference. Haplotypes were inferred using an expectation-maximisation algorithm. The haplotype was ambiguous in only one case; unambiguous haplotypes were used in further analyses, and a pair of haplotypes was assigned to each individual. Only haplotypes with frequencies >1% were considered. Haplotypes were tested for association under a dominant model using a generalised linear model. The SNPassoc and haplo.stats packages for R (R Core Team (2020), R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, URL https://www.R-project.org/, accessed on 27 January 2022) were used for single-SNP and haplotype analyses, respectively. p values < 0.05 were considered significant.
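The four genetic models differ only in how the three genotypes are collapsed before an odds ratio is computed. The sketch below illustrates this for genotype counts ordered as (reference homozygote, heterozygote, variant homozygote), assuming the reference homozygote is the most frequent one, as stated above. Without covariates, the logistic-regression ORs reported by packages such as SNPassoc should coincide with these cross-product ORs, and their Wald intervals with Woolf's interval used here; the model names, the helper, and the example counts are illustrative, and zero cells are not handled.

```python
import math

# Genotype indices: 0 = reference homozygote, 1 = heterozygote, 2 = variant homozygote,
# e.g. (CC, CT, TT) for rs10497520.
MODELS = {
    # model name: (genotypes counted as "exposed", genotypes counted as reference)
    "codominant_het": ((1,), (0,)),  # CT vs CC
    "codominant_hom": ((2,), (0,)),  # TT vs CC
    "dominant": ((1, 2), (0,)),      # CT+TT vs CC
    "recessive": ((2,), (0, 1)),     # TT vs CC+CT
    "overdominant": ((1,), (0, 2)),  # CT vs CC+TT
}


def odds_ratio_ci(cases, controls, model, z=1.959964):
    """Odds ratio with a Woolf (Wald) 95% CI for one genetic model.

    cases, controls: 3-tuples of genotype counts (hom_ref, het, hom_alt).
    """
    exposed, reference = MODELS[model]
    a = sum(cases[g] for g in exposed)        # exposed cases
    b = sum(cases[g] for g in reference)      # reference cases
    c = sum(controls[g] for g in exposed)     # exposed controls
    d = sum(controls[g] for g in reference)   # reference controls
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, for illustration only.
for name in MODELS:
    print(name, [round(x, 2) for x in odds_ratio_ci((10, 20, 5), (40, 30, 10), name)])
```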

Results
Three polymorphisms were selected for the study: rs10497520 (Lys1201Glu), based on a literature review, and rs55837610 (Ile23902Thr) and rs72648256 (Asp31617Glu), based on the WGS results. In control individuals, all polymorphisms were in Hardy-Weinberg equilibrium (p = 0.600, p = 1.0, and p = 1.0 for TTN rs10497520, rs55837610, and rs72648256, respectively). In the sprint/power group, however, rs10497520 deviated from Hardy-Weinberg expectations (p = 0.037).
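The exact HWE test reported here can be computed with the standard recurrence of Wigginton, Cutler and Abecasis (2005), which derives the probability of every possible heterozygote count given the allele counts. The sketch below is a plain-Python rendering of that algorithm for illustration; it is not necessarily the implementation the authors used, and the genotype counts in the example are hypothetical.

```python
def hwe_exact(n_het, n_hom1, n_hom2):
    """Exact two-sided HWE test p value (Wigginton et al. 2005 recurrence).

    n_het: heterozygote count; n_hom1, n_hom2: the two homozygote counts.
    """
    n_homr, n_homc = min(n_hom1, n_hom2), max(n_hom1, n_hom2)  # rare / common homozygotes
    n = n_het + n_homr + n_homc           # number of individuals
    if n == 0:
        raise ValueError("no genotypes")
    rare = 2 * n_homr + n_het             # copies of the rarer allele
    probs = [0.0] * (rare + 1)            # relative probability of each het count

    # Start near the expected het count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if (rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Walk downwards: hets -> hets - 2 (one more homozygote of each kind).
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Walk upwards: hets -> hets + 2.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    # Sum the probabilities of all outcomes no more likely than the observed one.
    p_obs = probs[n_het]
    p = sum(pr for pr in probs if pr <= p_obs * (1 + 1e-7)) / total
    return min(1.0, p)


print(hwe_exact(n_het=50, n_hom1=25, n_hom2=25))  # 1.0: observed count is the mode
```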
Comparisons of all athletes (sprint/power, endurance, and mixed-sport athletes) with controls revealed no significant associations (Table 1), regardless of the genetic model assumed. When athletes were stratified by sport discipline into sprint/power, endurance, and mixed-sport groups (Tables 2–4), we found a significant overrepresentation of the rs10497520 CT and CT+TT genotypes in the sprint/power group (Table 2). The OR for genotype CT (relative to CC) was 2.25 (95% CI 1.40–3.62, p = 0.0034) under the co-dominant model and 2.27 (95% CI 1.41–3.66, p = 0.0013) under the over-dominant model. Under the dominant model, individuals carrying the T allele (CT+TT) were 2.17 times more likely to be sprint/power athletes than CC homozygotes (95% CI 1.35–3.49, p = 0.0021). No associations were found in endurance athletes (Table 3) or in athletes competing in mixed-sport disciplines (Table 4). When both sport discipline and competition level were considered (Table 5), significant associations in the same direction were found only in the sprint/power group, for highly elite athletes (co-dominant, dominant, and over-dominant models) and for sub-elite athletes (over-dominant model). In highly elite athletes, the OR was approximately twice as large as in all sprint/power athletes (Table 6). A detailed summary of associations stratified by competition level is presented in Supplementary Tables S1–S8.
Four haplotypes were reconstructed (Table 7). Haplotypes could be assigned unambiguously in all individuals except one. Two haplotypes, CAA and TAA (alleles at rs10497520, rs55837610, and rs72648256, respectively), had frequencies greater than 1% and were considered in the association analysis. These two haplotypes formed three diplotypes (matched pairs of haplotypes): CAA/CAA, CAA/TAA, and TAA/TAA.
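Haplotype frequencies for the three SNPs can be estimated from unphased genotypes with an expectation-maximisation (EM) algorithm, as described in the statistical methods. The sketch below is a deliberately simplified EM, assuming Hardy-Weinberg proportions and biallelic loci coded as the number of copies (0, 1, or 2) of an arbitrarily chosen allele. It is not the haplo.stats implementation, and mapping the 0/1 codes to the alleles of the CAA and TAA haplotypes is left to the user.

```python
from itertools import product


def em_haplotypes(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased biallelic genotypes via EM.

    genotypes: list of tuples giving the allele count (0, 1 or 2) at each locus.
    Returns {haplotype tuple of 0/1 alleles: frequency}.
    """
    n_loci = len(genotypes[0])
    haps = list(product((0, 1), repeat=n_loci))
    freq = {h: 1 / len(haps) for h in haps}  # uniform starting frequencies

    def phasings(g):
        """All ordered haplotype pairs compatible with genotype g."""
        het = [i for i, x in enumerate(g) if x == 1]
        for bits in product((0, 1), repeat=len(het)):
            h1 = [1 if x == 2 else 0 for x in g]
            h2 = list(h1)
            for i, bit in zip(het, bits):
                h1[i], h2[i] = bit, 1 - bit
            yield tuple(h1), tuple(h2)

    pairs = [list(phasings(g)) for g in genotypes]
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for compatible in pairs:
            # E-step: posterior weight of each phasing for this individual.
            w = [freq[h1] * freq[h2] for h1, h2 in compatible]
            total = sum(w)
            for (h1, h2), wi in zip(compatible, w):
                counts[h1] += wi / total
                counts[h2] += wi / total
        # M-step: new frequencies are expected haplotype counts / 2N.
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


freqs = em_haplotypes([(0, 0, 0), (0, 0, 0), (0, 0, 0), (1, 0, 0)])
print({h: round(f, 3) for h, f in freqs.items() if f > 0.01})
```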
The distribution of diplotypes in control individuals and athletes stratified by sport discipline is shown in Supplementary Table S9. The association of diplotypes with athletic status, assessed using a generalised linear model, is presented in Table 8. As in the single-SNP analysis, we found associations in sprint/power athletes (all, highly elite, and sub-elite athletes). Possessing at least one TAA haplotype (dominant model) increased the log-odds by 0.80 (p = 0.00146), 1.42 (p = 0.003), and 0.77 (p = 0.044) for all sprint/power athletes, highly elite sprint/power athletes, and sub-elite sprint/power athletes, respectively. This corresponds to a 123%, 314%, and 116% increase in the odds of being a sprint/power athlete, a highly elite sprint/power athlete, and a sub-elite sprint/power athlete, respectively.
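The percentages follow directly from the log-odds coefficients: a coefficient β corresponds to an odds ratio of e^β, that is, a (e^β − 1) × 100% change in the odds (not in probability). The short check below reproduces the reported values from the coefficients given in the text.

```python
import math


def pct_increase_in_odds(beta):
    """Percent change in odds implied by a log-odds coefficient beta."""
    return (math.exp(beta) - 1) * 100  # OR = e^beta


# Log-odds increases for carrying at least one TAA haplotype (dominant model).
betas = {
    "all sprint/power": 0.80,
    "highly elite sprint/power": 1.42,
    "sub-elite sprint/power": 0.77,
}

for group, beta in betas.items():
    print(f"{group}: OR = {math.exp(beta):.2f}, odds +{pct_increase_in_odds(beta):.0f}%")
```

This prints ORs of 2.23, 4.14, and 2.16, corresponding to increases in odds of 123%, 314%, and 116%.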

Discussion
Rapid advances in genomic technologies, such as high-throughput DNA sequencing combined with improved and new methods of analysis, are helping the sports and medical sciences move towards precision sports medicine and exercise prescription. Such data are expected to advance precision sport sciences considerably by enabling training programmes tailored to an individual's molecular profile, including their genomic information [28,29]. Knowledge of the biochemical, physiological, and genetic factors that affect muscle performance is key to success in sports [30].
Titin plays a pivotal role in muscle passive stiffness, flexibility, tension, and strength [6,8]. Therefore, the TTN gene may be considered a promising genetic marker of athletic status, which may underlie differences in the potential to achieve outstanding results in sports competition. However, analysis of the whole gene is demanding due to its complex structure and large size, and only a few studies have attempted to clarify its role in physical performance [20,25,31] and injury risk [26]. With DNA sequencing now widely available, it has become possible to identify individual polymorphisms in the TTN gene that may influence elite athletic status, which was the main aim of our study. We selected three polymorphisms: the previously described missense rs10497520 (Lys1201Glu) polymorphism and, based on our WGS results, two novel missense polymorphisms, rs55837610 (Ile23902Thr) and rs72648256 (Asp31617Glu). The first is located in the part of the TTN gene encoding the titin Z-disc region, and the other two in the part encoding the A-band region (https://fraternalilab.kcl.ac.uk/TITINdb/search_page/, accessed on 12 April 2022).
The experiment revealed that specific TTN genotypes are significantly associated with athletic performance and competition level in Caucasian athletes. Specifically, the rs10497520 CT and CT+TT genotypes were significantly more frequent in the sprint/power athlete group. T allele carriers were 2.17 times more likely to be sprint/power athletes than CC homozygotes, indicating that the T allele is favourable for sprint/power performance. When competition level was considered, the association was present in highly elite sprint/power athletes (co-dominant, dominant, and over-dominant models) and in sub-elite sprint/power athletes (over-dominant model). These results were confirmed by the haplotype analysis, which demonstrated that harbouring at least one TAA haplotype (alleles at rs10497520, rs55837610, and rs72648256, respectively) may be favourable for success in sprint/power sports. In particular, the odds of being a sprint/power athlete, a highly elite sprint/power athlete, or a sub-elite sprint/power athlete were 123%, 314%, and 116% higher, respectively. These findings support a potential role of the TTN rs10497520 polymorphism in determining elite athletic status in sprint/power athletes and point to the T allele as a beneficial factor for sprint/power performance, which constitutes the most important finding of this study. This knowledge could potentially be used in screening to identify young athletes' predisposition for a certain type of sport, a component of many sports programmes, and to guide children towards the most suitable athletic discipline. For coaches, it may serve as an additional source of detailed information, helpful in personalising training methods and managing the sports careers of trained athletes more efficiently.
However, it must be emphasised that physical performance is a very complex trait, shaped both by inherited factors and by environmental factors (training, diet, motivation, development opportunities, and health conditions). Stebbings et al. suggested that TTN variants, specifically rs10497520, may contribute to the variability of titin isoform expression in muscle tissue [20]. The length of skeletal muscle titin is directly related to sarcomere elongation, elasticity, and stiffness. This is associated with variation in the I-band region, especially the different lengths of the PEVK region (rich in proline, glutamic acid, valine, and lysine residues) and different numbers of immunoglobulin (Ig) domains. Longer isoforms are more elastic, whereas shorter isoforms are stiffer, which increases passive stiffness and improves motor control [6,16,17,18]. Therefore, TTN gene variants may result in the production of shorter or longer isoforms that alter contractile characteristics [8,20]. Additionally, given its location, the missense rs10497520 polymorphism may lead to the production of normal-length isoforms that are poorly anchored to the Z-disc, which may impair the mechanical stability of sarcomeres. In contrast, the missense rs55837610 and rs72648256 polymorphisms may modify the stabilisation of myosin filaments in the A-band region and, thus, influence sarcomere structure and contraction. These molecular changes may affect muscle performance and overall sports skills, positively or negatively.
To the best of our knowledge, this is the first study to analyse the association of the TTN rs10497520, rs55837610, and rs72648256 polymorphisms, individually or in haplotype combination, with athletic performance and competition level in elite Caucasian athletes. Only rs10497520 has previously been analysed in the context of sports predisposition; therefore, our results cannot be directly compared with those of other studies. However, a genome-wide linkage scan for training-induced changes in submaximal exercise stroke volume (∆SV50) in 483 individuals from the HERITAGE Family Study suggested that TTN is a promising candidate gene for human variation in ∆SV50 in the sedentary state and for the adaptation of cardiac function to endurance training [31]. Further evidence came from combining RNA profiling with single-gene DNA marker association analysis, which indicated that the rs10497520 polymorphism may be linked to post-training changes in VO2max in 473 selected participants [25]. In a study of 278 Caucasian males, Stebbings et al. examined the relationship between this SNP and muscle fascicle length in recreationally active men, and marathon personal best times in marathon runners. The T allele was associated with shorter skeletal muscle fascicles, which require less energy to produce a given force and confer an advantage for marathon-running performance in trained individuals. The authors suggested that CC homozygotes possess longer vastus lateralis fascicles, which may be associated with successful sprint running performance [20]. Surprisingly, our results revealed that the T allele was associated with sprint/power, not endurance, performance. These inconsistent results confirm that we are only in the early stages of understanding the relationship between TTN polymorphisms and physical performance, and further research is needed.
During physical activity or sport, the complex structural organisation of skeletal muscle is easily stressed and damaged, leading to frequent injuries. Muscle injuries account for 10–55% of all sports injuries [32]. These injuries reduce or even prevent participation in physical or occupational activities for at least several weeks, affecting athletes' performance and careers, team results, and finances, and they negatively affect athletes' quality of life. These consequences support the need for improved pre- and post-injury health care. Although muscle injuries have been relatively well described at the clinical level, the molecular mechanisms causing them remain mostly unknown, and we are only in the early stages of understanding the associations between genetic variants and the risk of muscle injury. In the future, the results of genetic studies may be used by clinicians, coaches, and athletes to develop personalised programmes for the pre-habilitation of susceptible individuals, or to aid recovery after overload or injury (personalised medicine). Perrin et al. [27] described potential biological mechanisms connecting titin with injuries, and it has therefore been hypothesised that variants within the TTN gene may also be associated with the risk of muscle injury. It was suggested that rs10497520 CC genotype carriers, with longer skeletal muscle fascicles, may have a rightward shift in the length-tension relationship and greater optimal joint angles for maximal torque production. Such a shift has been linked to a decreased risk of injury, as a longer optimal muscle length would mean that less of the muscle's functional range lies on the more unstable descending limb of the length-tension curve. Conversely, T allele carriers may be more susceptible to injury [20,33].
The importance of the TTN gene for muscle properties and injury risk was confirmed by Vera et al., who showed an association between connective tissue disorders (CTDs) and genetic variants within this gene [26]. In the future, knowledge of the TTN genotype in populations at higher injury risk, such as athletes and physically active people, may help to predict the consequences of physical activity and to tailor training interventions to the TTN genotype [20]. However, further research is required before information from genetic studies can be used in practice.
The strengths of our study include its design, which combines DNA sequencing and subsequent advanced bioinformatic assessment with association testing of selected TTN variants against sport-related phenotypes. Notably, these data represent the first effort to apply WGS to determine the significance of novel and previously described allelic variants, genotypes, and haplotypes in the TTN gene in relation to physical performance and the achievement of outstanding results in sports competition. The chosen methodology allows the identification of sport-related variants in both coding and non-coding regions of the gene. A potential limitation of the present study is its population-specific characteristics, such as the highly homogeneous European background of the cohort; thus, we were unable to perform an analysis by ethnicity. Additionally, because some homozygous genotypes were rare or absent in this Caucasian population, homozygotes had to be analysed together with heterozygotes, which may have confounded the analyses.

Conclusions
This study provides evidence for an association between TTN variants and sports skills, which may underlie differences in the potential to become an elite athlete. We demonstrated that harbouring at least one rs10497520 T allele, either individually or in haplotype combination, increases the chance of being an elite sprint/power athlete, indicating that the T allele is favourable for sprint/power performance. In conclusion, we believe that the TTN gene may be a promising genetic marker for athletic performance and competition level. In the future, this knowledge may be used in screening to identify new sporting talent and as an additional source of detailed information helpful in personalising training methods and managing the sports careers of trained athletes more efficiently. However, far more studies are required to establish the role of these variants and their implications for human health, physical activity, adaptive muscle changes in response to training, and injury risk.