Gallop Racing Shifts Mature mRNA towards Introns: Does Exercise-Induced Stress Enhance Genome Plasticity?

Physical exercise is universally recognized as stressful. Among the “sport species”, the horse is probably the most appropriate model for investigating the genomic response to stress due to the homogeneity of its genetic background. The aim of this work is to dissect the whole transcription modulation in Peripheral Blood Mononuclear Cells (PBMCs) after exercise with a time course framework focusing on unexplored regions related to introns and intergenic portions. PBMCs NGS from five 3 year old Sardinian Anglo-Arab racehorses collected at rest and after a 2000 m race was performed. Apart from differential gene expression ascertainment between the two time points the complexity of transcription for alternative transcripts was identified. Interestingly, we noted a transcription shift from the coding to the non-coding regions. We further investigated the possible causes of this phenomenon focusing on genomic repeats, using a differential expression approach and finding a strong general up-regulation of repetitive elements such as LINE. Since their modulation is also associated with the “exonization”, the recruitment of repeats that act with regulatory functions, suggesting that there might be an active regulation of this transcriptional shift. Thanks to an innovative bioinformatic approach, our study could represent a model for the transcriptomic investigation of stress.


Introduction
During the last few years, transcriptomic analysis has experienced a tremendous boost and it's the gold standard for fully characterize the RNA molecules expressed in a certain tissue in a specific physiological time.
Great efforts and resources have been invested in defining the functional part of the genome (which turned out to be remarkable) through the extensive use of RNA-Sequencing (RNA-Seq) technology

Training
All the horses recruited in this study followed the same training schedule. Before the competitions, horses were subjected to a daily training (6 days out of 7) on racetrack for about an hour. Each athlete was warmed up with 15 min of walk and trot, then the aerobic protocol was applied: progressive speeds form walk-to-canter with series lasting 1-3 min. Starting three days before the race, a strength workout that consisted of a canter-to-gallop at progressive speeds for 100-200 m (anaerobic workout) was added to previously described protocol. After competition, subjects were allowed to rest for one or two days, depending on the effort.

Sampling
Sardinian Anglo-Arab horses (five, two males and three females) participating in gallop races (2000 m) were recruited for the study. Ages ranged from three to four years. Peripheral blood samples were taken via jugular venipuncture at two time points: T0, at rest before the competition and at T1, immediately after the competition. Samples were taken in the same day, by one authorized veterinary, at the Pinna di Sassari racecourse (Sardinia, Italy) during the routine anti-doping procedures following the guidelines contained in the DM 797 of 16th October 2002 and subsequent amendments of Ministry of Agricultural, Food and Forestry Policies (MiPAAF). In addition, informed consent was given to horses' owners. PBMCs were isolated through a gradient centrifugation on Ficoll-Hypaque (GE Healthcare, Pollards Wood, UK) immediately after sample collection, to avoid changes that may occur on one side and possible bias in the gene expression due to cellular heterogeneity on the other. Once obtained, cells were immediately cryopreserved (−80 • C) in TriZol until RNA extraction.

RNA Extraction
After cell separation, samples were subjected to RNA extraction with a Fatty and Fibrous RNA Kit (cat. 732-6870, BIO-RAD, Hercules, CA, USA) following manufacturer specifications. The extracted RNA was quantified by VersaFluor-Bio-Rad fluorimeter using the Quant-It RNA kit (Invitrogen, Dorset, UK) and quality was verified through microfluidic electrophoresis (Bioanalyzer2100 Agilent Technologies, Santa Clara, CA, USA).

Sequencing
Samples were sequenced using Illumina technology. Illumina TruSeq 2 libraries for a total of 10 samples, 5 T0 and 5 T1 were prepared. Massive parallel sequencing was carried out on an HiSeq 1500 machine generating 101 bases paired-end reads.

Annotations Retrieval and Count Matrices
The complete Ensembl genes annotation (Release 98) was obtained through the Ensembl web page (https://www.ensembl.org/Equus_caballus/Info/Index). This file was used as starting point to construct the count matrices, one for the exons one for the introns. To assess for the exact proportion of reads aligning to intron regions we created a custom "negative" GTF file. Briefly, we constructed two preparatory GTF files through bash commands starting from the downloaded annotation: one file contained start/end coordinates for genes while the other one for each exon. Then we used bedtools subtract [25] to obtain the difference between overlapping regions. Therefore, the resulting GTF obtained by subtracting exons to genes contains only the introns, whatever the gene structure is. This procedure accounts also for overlapping genes, avoiding the user to choose which transcript is representative in presence of multiple isoforms. Some minor parsing adjustments were performed to produce a formally correct GTF. Counts were obtained with featureCounts [26] program using the computed GTF files.

Differential Expression Analyses: Genes
The differential gene expression analysis on the two produced datasets (gene counts from exons and from introns) was carried out through the R DESeq2 package [27] that offers a method for gene-level analysis of RNA-seq data. We set as selection parameters an absolute Fold Change (log2FC) greater than 1 and an adjusted p smaller than 0.05. (q < 0.05) [28]. Before the differential analysis, data were filtered for scarcely expressed entries, excluding genes whose mean expression was less than 5 FPKM in at least 80% of samples.
Through the execution of a custom Python script, we compared the differentially expressed genes (DEGs) for both exon and intron regions with a list of genes belonging to gene ontology (GO) referred to splicing events and mRNA maturation such as: regulation of alternative mRNA splicing via spliceosome (GO:0000381); mRNA splicing via spliceosome (GO:0000398); RNA processing (GO:0006396); RNA splicing (GO:0008380); RNA splicing regulation (GO:0043484).

Differential Expression Analyses: Isoforms
We assessed the differentially expressed gene isoforms between the two biological conditions (T0 and T1), using DEXseq [29] a Bioconductor package that allows differential exon usage estimation. This analysis was carried out both for both exon and intron compartment.

Differential Expression Analyses: Repetitive Elements
Differential repeats analysis was carried out in three phases: (1) Genome-wide differential expression analysis (2) Differential expression analysis of repeats classes (3) Differential expression analysis of long interspersed nuclear elements subclass 1 (LINE1) only.
Full repeats representation on equcab3 were retrieved from the University of California Santa Cruz (UCSC) Table Browser [30]. Minor bash parsing procedures were applied to produce a formally correct GTF for counting purposes with featureCounts, preserving name, class and family of the repeated element. The "genome-wide" differential expression of repeats was performed as previously described for exon and intron compartments respectively. Repeats class level aggregate statistics were also produced summing the results from DESeq2 output.
The relation between differentially expressed repeats and differentially expressed genes (exon and introns) was assessed through coordinate intersection using BEDTools [25]. Functional enrichment analysis was performed on the resulting genes subsets.
For full-length LINE-1 elements characterization, the LiftOver tool by UCSC Genome Browser [31] was used to update a custom annotation (GTF file provided by Professor David Adelson, University of Adelaide, personal communication) referred to the previous genome assembly equcab2 to the new one (complete sequences are available in Supplementary Material File S1). Successively, we carried out a differential expression analysis as described above for the genome-wide repeats analysis.

Sequencing Statistics
The results of sequencing and the related quality controls are summarized in Table 1. The average sequencing depth is about~22,000,000 reads per sample. Of these, more than 97% passed the trimming phase and more than 86.4% have a single match in the genome. Only these reads were used for downstream analyses.
Interestingly, analyzing the reads alignment rate and comparing the number of reads assigned to exon and intron portions, we noticed a transcriptional shift from the exons to the introns in T1 with respect to T0. On average, the percentage of exons-mapping reads in T0 was 56.3% and lowered to 53.4% in T1 (paired T-Test, p = 0.03) while the same on the intron-mapping reads was 23.2% in T0 and 27.6% in T1 with an increment of 4.4% (paired T-Test, p = 0.008). Only significant results are reported. The reads mapping into the repeated elements also increased from 14.3% in T0 to 15.8% in T1 (paired T-Test, p = 0.03). The remaining portion of the aligned reads was located in intergenic regions Table 2.

Differential Expression Analyses: Genes
Two differential expression analyses, one for the exon and one for the intron regions were carried out, setting T0 as baseline. After statistical analysis with DESeq2 software, starting from a cleaned dataset of 15,038 genes, 648 were found as differentially expressed in exonic compartment (with q < 0.05), 396 of which were up-regulated (log2FC ≥ +1) and 252 were down-regulated (log2FC ≤ −1), full results are provided in Figure 2, Table 3 and Table S1. For the intronic compartment, 1306 were differentially expressed, 637 up-regulated and 669 down-regulated in T1 vs. T0 (Figure 3, Table 3 and  Table S2). Comparing the common list of DEGs from exons and introns (1697 entries) with the list of GO terms referring to splicing events (673 entries), 44 modulated genes involved in messenger RNA maturation processes emerged. Of these, 10 are within the exonic compartment, 30 within the intronic ones and 4 presented a "full gene" expression modulation (Table S3).

Differential Expression Analyses: Isoforms
From the isoform evaluation using DEXSeq, 150 genes emerged that had at least one differentially expressed exon in the sample after the race compared with those at rest (Table S4). It is intriguing that 139 of these genes were not differentially expressed in the previous analysis (exon and intron compartment).
We investigated the biological functions of this gene subset with a GO enrichment analysis using String DB (https://string-db.org) that showed 160 interactions between 129 proteins retrieved from the 139 host genes (Table S5). The analysis pointed out 20 statistically significant enriched biological processes (q < 0.05). Among them, some were in line with our biological framework: "cellular response to stimulus", "response to stress". Other processes were implicated in the gene expression regulation, immune system activation and cardiac and striatal muscles regulation.
Using DEXSeq software to examine the intron differential expression we found 2997 introns differentially expressed contained in a total of 2020 genes (Table S6). Although most of them were poorly expressed, this approach pointed out that a substantial part of the differentially transcribed regions is usually overlooked.

Differential Expression Analyses: Repetitive Elements
We also evaluated the differential expression of repetitive elements, starting from a cleaned dataset comprising 66,151 repeats. In total, 7439 elements were significantly differentially expressed (q < 0.05) with an absolute log2FC greater or equal than 1 [4939 UP-regulated and 2500 DOWN-regulated] (Figure 4, Table S7).   From the evaluation of the repeat class we noticed that, almost half of the modulated repeats belong to the LINE (46.7%), followed by short interspersed nuclear elements (SINEs) (23.8%), DNA repetitive elements (11.7%) and long terminal repeats (LTR) (9.1%). Only a small amount of these repetitive elements is located within intergenic portion, while the remaining part (over 80%) is localized within gene sequences. The 94.1% of these (that correspond to the 77.3% of the total number of differentially expressed repeats) are within intron portion of the genes; the repeat class ranking remained the same of the genome wide distribution with a small increment of LINE percentage (Table S8). Interestingly, the analysis highlighted that over half (60%) of these genes is not modulated in T1.
The intersection between differentially expressed repeats and differentially expressed introns showed 153 matches that are retained intron contained at least a modulated repeat within its sequence.
These introns belong to 143 genes corresponding and GO enrichment analysis of this group of genes exhibited several biological enriched processes such as "response to stress" and "response to stimulus", involving 45 and 74 genes, respectively (Table S9).
The full-length LINEs analysis allowed us to identify 19 entire LINE1 elements modulated in T1 samples (Table S10).

Discussion
The genomic response induced by exercise is carried out through different mechanisms where transcription modulation is the king. With our experimental design we tried to dissect the transcription modulation in horse PBMCs after exercise focusing also on unexplored regions related to introns and intergenic portions.
In humans, as in other species, transcriptional changes induced by stress events have been extensively studied and, although they mainly concern protein-coding genes transcription modulation, many other effects have been noticed. For example the induction of transcriptional read-through that generates very long downstream sequences of gene-containing transcripts (DoGs) [32] or the production of non-coding RNAs [33] and changes in chromatin structure [34]. Under stress conditions, post transcriptional mRNA processing can also be altered [35].
Another phenomenon called exonization or intron retention, that is the transcription of introns through alternative splicing also induced by the transcription of local transposed elements, can be activated [36,37]. Repetitive elements are indeed known to influence the surrounding sequences once activated [38]. All these mechanisms represent a sort of genomic response to external stimuli, and we tried to search for signs of them in exercise induced stress.
Our first result highlights the difference in transcripts mapping to exonic o intronic gene portions: we noticed a statistically significant transcriptional shift after a race (T1), with a reduction in the exonic reads and an increase in the intronic sequences in comparison with T0 ( Figure 1).
Considering the number and the modulation of genes focusing exonic part ( Figure 2) 648 were differentially expressed in T1 compared to T0.
If we focus on intron sequences, usually spliced and less functional, and we found over 1306 genes differentially expressed as part of the intronic component ( Figure 3). According to this result, we tried to evaluate if this was the consequence of alternative splicing activation (AS). AS indeed is the major source of protein diversity and appear to be the general rule rather than an exception; it is now well known that alternative splicing is widely used by the cell [1] and, in case of homeostasis perturbation, the cell can apply a cost-effective strategy preferring to enhance an advantageous isoform, or a not functional one, instead of modulating or shutting down the entire transcription machinery [19]. AS is a highly regulated process with a complex interactions network between sequence features within the pre-mRNA [39] and trans-active splicing factor proteins [40]. This process can occur physiologically for steady state cell protein production, but also in case of response to developmental cues and external stimuli, including stress [41,42].
To further investigate if the transcriptional shift in our data was due to AS towards not annotated exons or to co-transcriptional splicing [43], we matched our DEGs with those having GO terms referred to splicing: 44 modulated genes involved in mRNA maturation processes emerged, encouraging us to continue the investigation aimed at alternative splicing and its potential causes (Table S3).
Indeed, the isoform analysis allowed us to identify the transcription of each exon. From this analysis, 150 genes emerged, reduced then to 139 considering only those that did not appear in the 648 DEGs from the exon compartment (Table S4).
With these genes, that are preferentially expressing a particular isoform in the post-race, we performed a GO enrichment analysis to assess which biological processes were mostly involved. Several categories related to response of exercise-induced stress emerged: response to stimulus, cellular response to stress, positive regulation of gene expression, regulation of muscle adaptation and many others listed in Table S5. This result can be interpreted as a confirmation that exercise-induced stress activates targeted alternative splicing in genes belonging to key biological processes.
The same analysis was carried out for the detection of isoforms on the intronic compartment to evaluate the stress-induced intron transcription that may be correlated to the intron retention (IR) phenomenon, considered one of the multiple declinations of AS [44]. With IR a normally excluded intron is maintained in the final mRNA transcript, probably due to weak splice signals [45,46]. The result of this mechanism is a post-transcriptionally gene-expression regulation through the NMD resulting in the open reading frame interruption by the introduction of premature termination codons [19]. In other cases, IR has the classical alternative splicing function that is to generate protein diversity allowing the production of pairs of protein isoforms [18].
We found 2020 genes with differentially expressed introns therefore with putative retained intron (Table S6). This result confirms that our data are largely interested by this phenomenon.
In literature a general increase of intron retention events has been observed in response to some types of stress like thermal stress exposure [47,48], chemotherapy-induced stress in cancer cells [49] and hypoxia [50]. The intron retention phenomenon could be triggered by the transcriptional activation of repeated elements contained within the intron sequence [36,37]. These features represent over 40% of the human genome sequence [51], 46% of equine ones [52], and are widespread both in genic and intergenic portions. LINEs, long interspersed nucleotide elements, are among the most represented Transposable Elements (TEs) in the genome, making up about 20% of human DNA sequence [51] (17% in the horse [52]). TEs are so called because of the ability to be integrated into other parts of the genome via an RNA intermediate.
For their capabilities to transcribe and induce intron retention, we investigated the repeated sequences with a differential analysis approach that allowed us to identify over 7000 modulated repeats, half of which were LINE 1 ( Figure 4). Interestingly, four fifths of these modulated repeats were within gene sequences, and within differentially expressed genes, confirming the not random nature of their expression (Figure 1). Moreover, 153 differentially expressed introns had at least one differentially expressed repeat within them. These results may support the hypothesis that intron retention is facilitated by the containing of repeated elements, whose transcriptional increase probably activates the exonization phenomenon.
Only a small part of LINE sequences is able, as well as transcribing, to actively retrotranspose, indeed most of them are in a truncated form in the genome.
Moreover, retrotransposition does not normally occur in somatic cells because the repeats are strongly silenced by a high degree of methylation [53,54], but in high-stress conditions the methylation level can decrease and the LINE1 transcription can influence the intron retention [20,[55][56][57][58].
LINEs subclass L1 transcription is driven by an internal RNA pol II promoter [59] which allows it to copy itself in an RNA intermediate pasting it into another genome location. It is mandatory, however, that the L1 sequence is intact and functional to replicate itself [60].
While contributing to genomic variability and species evolution this behavior may induce several adverse events [20,[55][56][57]. Since these genomic modifications may be harmful, the host cell is provided with several mechanisms to suppress dangerous retrotransposon activity [53,54], among which one of the main known is the methylation of CpG sequences in the LINE-1 5 UTR. In fact, a general LINE-1 methylation decrease has been observed in case of intense stress conditions such as many cancer types [58] and neurological disorders and, indeed, the retrotransposition is demonstrated only in few tissues (germinal and nervous tissues) [38,61].
For these reasons, it may be interesting to evaluate the transcription of the entire LINE1 that have the ability, once completely transcribed, to copy and paste itself in other parts of the genome. In our data there were 19 LINE1 (Table S10) elements entirely transcribed and therefore potentially retrontranscribed open the scenery of retrotransposition caused by exercise-induced stress. Retrotransposition indeed can lead to gene breaks and this could explain what referred to as derangement of the immune system during the overtraining syndrome [62].
These observations may be at the base of the genomic plasticity in response to stress or in particular stress stimulus such as that induced by strenuous physical effort.

Conclusions
Our results strongly suggest that strenuous exercise leads to stress adaptation through a variety of genomic mechanisms from the activation of typical stress response pathways and genes (inflammatory and immune response) to the activation of additional resources that stimulate genome plasticity. These include targeted alternative splicing activation, increased intron transcription through intron retention, repeats transcription, especially within genes, and LINE1 full-length modulation that can lead to retrotransposition as ultimately resource of genomic plasticity under stress conditions.  Table S8: Classes distribution of Differentially expressed repeats among genomic features. (XLSX 9 Kb); Table S9: Biological processes enrichment analysis of genes with intron retention events. (XLSX 27 Kb); Table S10: Full length LINE1 expression analysis. Naming of features indicates type of repeats and chromosome location. (XLSX 10 Kb).
Author Contributions: K.C. Contributed to the study conception and design, samples collection, collection and assembly of data, data analysis and interpretation, manuscript writing, and final approval of the manuscript. S.M. and G.C. contributed to collection and assembly of data, data analysis and interpretation, manuscript writing, and final approval of manuscript. S.G., A.G., M.S., and A.V. contributed to the data analysis and interpretation, manuscript drafting, and final approval of the manuscript. R.C. contributed to manuscript drafting, final approval of the manuscript, and provision of study material. S.C. contributed to the study conception and design, general supervision of the research group, collection and assembly of data, data analysis and interpretation, manuscript writing, final approval of the manuscript. All authors have read and agreed to the published version of the manuscript.

Acknowledgments:
The authors gratefully acknowledge Mr. Gianluca Alunni for his valuable technical support. A special thanks to Sardinian Regional Department for Agriculture and Agro-Pastoral Reform, C.R.P. Sardinian Regional Programming Centre and ANACAAD-National association of Anglo-Arab and part bred Anglo-Arab Horse.

Conflicts of Interest:
The authors declare no conflict of interest.