Gene Expression Profile in Similar Tissues Using Transcriptome Sequencing Data of Whole-Body Horse Skeletal Muscle

Horses have been studied for exercise function rather than food production, unlike most livestock. Therefore, the role and characteristics of tissue landscapes are critically understudied, except for certain muscles used in exercise-related studies. In the present study, we compared RNA-Seq data from 18 Jeju horse skeletal muscles to identify differentially expressed genes (DEGs) between tissues that have similar functions and to characterize these differences. We identified DEGs between different muscles using pairwise differential expression (DE) analyses of tissue transcriptome expression data and classified the samples using the expression values of those genes. Each tissue was largely classified into two groups and their subgroups by k-means clustering, and the DEGs identified in comparison between each group were analyzed by functional/pathway level using gene set enrichment analysis and gene level, confirming the expression of significant genes. As a result of the analysis, the differences in metabolic properties like glycolysis, oxidative phosphorylation, and exercise adaptation of the groups were detected. The results demonstrated that the biochemical and anatomical features of a wide range of muscle tissues in horses could be determined through transcriptome expression analysis, and provided proof-of-concept data demonstrating that RNA-Seq analysis can be used to classify and study in-depth differences between tissues with similar properties.


Introduction
Genetic studies of livestock have primarily been aimed at increasing production. Most livestock animals raised today are for meat, and improvements have been made to control fat content and muscle properties to induce rapid muscle growth or maximize the favored types of tissues [1][2][3][4][5][6].
Horses have been identified as a suitable model for studying gene expression in skeletal muscle related to exercise ability [7][8][9][10][11], which is rare for livestock. The present study is similar to a prior report of muscle activity and associated genes in humans [12][13][14][15]. Skeletal muscles have different physiological characteristics depending on their role, which is related to the composition of muscle fibers according to the main purpose of the muscle in question.
The primary component that determines the physiological function of muscles is myosin heavy chain (MyHC), which is myofibril's thick filament. MyHC is classified into three types, type I, type IIa, and type IIb/IIx, depending on their contraction rate and metabolic phenotype. Type I is a slow oxidative (SO) fiber that has a slow contraction speed and obtains ATP from oxidative phosphorylation

Ethics Statements
All animal experiments were performed in accordance with the guidelines of the Institutional Animal Care and Use Committee and were approved by the Animal Genomics and Bioinformatics Division, National Institute of Animal Science (No.2014-080). All efforts were made to minimize animal suffering.

Sample Collection
The specimens were obtained from a male horse, born on 12 June 2012, managed by the National Institute of Animal Science, R.D.A, Jeju, South Korea. It was slaughtered through exsanguination after electric stunning. Each tissue was collected from 18 regions of skeletal muscles in a hot-carcass state and immediately frozen to liquid nitrogen and preserved at −80 • C.

RNA Sequencing
Samples were obtained from a Jeju horse, a region-specific horse breed in Jeju, South Korea. Sample libraries from 18 different skeletal muscles were obtained from one Jeju horse. Ribosomal RNA was removed from total RNA using a RiboMinus Eukaryote kit for RNA-Seq (Thermo Fisher Scientific, Sunnyvale, CA, USA). The RNA-Seq library was prepared using a TruSeq RNA kit (Illumina; San Diego, CA, USA). Sampling and RNA sequencing were conducted by the National Institute of Animal Science, Rural Development Administration, and sequenced using an Illumina HiSeq 2000 (2 × 101 bp). The GEO accession number for this data set is GSE113147.

Data Processing
Quality assessment was conducted using FastQC version 0.11.5 software (https://www. bioinformatics.babraham.ac.uk/projects/fastqc) to analyze sequence reads in the fastq file format. Using the NGSQCToolkit [22] with a Phred quality score < 20 and less than 50 bp in total length were removed with paired-read and adapter sequences were trimmed.
The reference RNA sequence FASTA files were mapped to the horse genome (EquCab2 79, Equus_caballus.EquCab2.DNA.chromosome.1~31, X, MT, nonchromosomal.fa) downloaded from the Ensembl FTP database. The alignment was performed using the STAR (v 2.5.2b) RNA-Seq aligner with a two-pass method [23]. Using this approach, the first alignment detected splice junctions based on transcript information, and the final alignment was performed using splice junctions as a guide. As a guide for transcript alignment, we used the relevant gtf file (Equus_caballus.EquCab2.87.gtf) as a reference. The alignment process provided read counts for a total of 26,841 genes in 18 samples.

Differential Expression Analysis
First, the differentially expressed genes (DEGs) between each Jeju horse skeletal muscle were identified by pairwise analysis using the negative binomial test of the DESeq R package [24]. Based on the reads per gene counts identified through sequence mapping, the pairwise DEGs (p-value < 0.05) of all 18 tissues were identified. All DEGs (n = 1292) identified by pairwise analysis and their read counts were used to implement clustering of samples by their differential gene expression.
Clustering was performed using the hclust and the k-means function of the R package, and the optimal k value (k = 2) was calculated by the NbClust package in the R. The k-means clustering divides 18 tissues into large two groups, and we subdivided two groups into three subgroups each. Based on the classified cluster information, we conducted a DE analysis between groups using entire genes (n = 26,841). DESeq2 was used for DE analysis [25].

IPA Analysis
Further analysis to determine the functions of the identified DEGs was conducted using Ingenuity Pathway Analysis (IPA, Ingenuity Systems, Redwood, CA, USA) [26]. IPA was used to conduct enrichment analysis of the canonical pathway of the gene set and the inter-molecule network. To import the gene list and avoid omissions in the advanced analysis results as much as possible, the Ensembl Equus Caballus gene ID was converted into a well-annotated Ensembl human ID to identify DEGs.

Gene Ontology
Gene ontology (GO) enrichment analysis of DEGs was performed using the biological processes of the GeneOntology site (http://geneontology.org/, Powered by PANTHER) [27]. GO terms selected only the term of the main category, except subcategories.

Read Alignments and Results
By mapping the raw data to the ensemble horse genome (EquCab2 79) using STAR-2PASS alignment, the raw reads were aligned to a total of 26,841 genes. The number of raw sequence reads per sample ranged from 55.7 to 83.1 M, with an average of 64.1 M and an average input read length of 202 bp (2 × 101 bp). Uniquely mapped reads averaged 53.1 M, representing 83.2% of the total average reads.

Classification
A total of 1292 DEGs were identified through pairwise analysis. This number means that most of the DEGs shown in Figure 2 are overlapping. These are key features that can explain the difference between skeletal muscles among 26,841 genes prepared for clustering 18 samples. We classified samples into hierarchical clustering (using the value of log2 (count + 1)) and k-means clustering through the raw count data of these 1292 DEGs. The optimal k value calculated using the NbClust was set as 2 (Supplementary Materials Figure S1). Samples were divided primarily into two groups, and the classification results were the same in both hierarchical clustering and k-means clustering ( Figure 3). Each group classified by clustering was labeled as "A" or "B" ( Figure 3A) and normalized using DESeq2 [25], and subsequently plotted on a PCA plot ( Figure 3B). Groups A and B were further divided into subgroups based on the results of k-means clustering (k = 3). The k = 3 is a maximum value that allows two or more tissues to be included in one subgroup for DESeq2 analysis. The clustering results through k-means are presented on the PCA plot of Figure 3B-D.

Comparison of A vs. B
As a result of performing DE analysis on the total gene counts of groups classified into A and B using DESeq2, a total of 1264 DEGs (up = 597, down = 667, adjusted-p < 0.05, |log2Foldchange| > 0.58 (FC = 1.5)) were identified. The expression fold-change for A vs. B in DESeq2 results represents the value of A/B. Subsequent "upregulated/downregulated" means higher/lower expression in the former's condition of A vs. B. Genes in the Ensemble gene id format were converted into HGNC gene symbol format for further analysis, and a total of 857 (up = 487, down = 370) were successfully converted.
The IPA and gene ontology were used for gene set enrichment analysis of the identified DEGs. Table 1 shows the pathways within the top 10 scores (-log (p-value); > 1.3) with z-score values (not 0) in the IPA canonical pathway results. GO analysis was performed to elucidate the functional enrichment of upregulated DEGs of biological processes using PANTHER. Table S1 contains the top 10 GO terms based on the fold enrichment value. Directly related to the function of skeletal muscles are "regulation of muscle adaptation (GO:0043502)" and "regulation of striated muscle contraction (GO:0006942)".

Group A
To identify the characteristics of the muscles in each group, we conducted DE for subgroups in groups A and B, respectively. In A and B, a total of three sets of 1 vs. 2, 1 vs. 3, and 2 vs. 3 were performed. In comparison of group A1 and A2, a total of 236 DEGs (up = 128, down = 108, adjust-P < 0.05, |log2FC| > 1) were identified, of which 200 genes (up = 101, down = 99) were converted to the human gene symbol. As a result of analyzing the DEGs by IPA (Table 2), an increase in the glycolysis and gluconeogenesis canonical pathway was confirmed. This is similar in GO (Table S2), as we could find fast-twitch glycolytic muscle-related GO terms like "positive regulation of fast-twitch skeletal muscle fiber contraction (GO:0031448)", "canonical glycolysis (GO:0061621)", and "gluconeogenesis (GO:0006094)" in GO analysis of upregulated DEGs. Considering these results, the muscles of group A1 are more glycolytic than the muscles of A2 and are thought to be closer to the fast twitch.    A total of 383 DEGs (up = 3, down = 380, adjust-p < 0.05, |log2FC| > 1) were identified through DE analysis of groups A1 and A3 (301 genes (up = 1, down = 300) were converted to a gene symbol). In A1 vs. A3, all genes were down regulated except three genes, including the only annotated TLE1. The results of the IPA analysis could not identify canonical pathways directly related to metabolic processes, such as glycolysis, and most pathways were associated with immune responses, including T cell signaling pathways (Table 2). Additionally, in GO analysis, which was conducted on downregulated DEGs (Table S3), most of the GO terms related to T cell and immune response were confirmed, and there was no direct result on metabolism.

Group B
The same as group A, organizations classified as group B were also divided into three subgroups by k-means clustering, and DE analysis was conducted for each group. In comparison with group B1 and group B2, a total of 416 DEGs (up = 184, down = 232, adjust-p <0.05, |log2Foldchanage| > 1) were identified, of which 355 (up = 148, down = 208) has been converted to human gene symbol. The upregulation of glycolysis, oxidative phosphorylation, and gluconeogenesis was found in the IPA canonical pathway for these genes (Table 3). This suggests that in tissues belonging to group B1, overall energy metabolism occurs more actively than muscles in group B2. This is the same in GO analyzed using upregulated gene (Table S4), "canonical glycolysis (GO:0061621)", "gluconeogenesis (GO:0006094)", "regulation of oxidative phosphorylation (GO:0002082)", "mitochondrial electron transport, NADH to ubiquinone (GO:0006120)", "aerobic respiration (GO:0009060)", and other GO terms were also identified.  In group B1 vs. group B3, a total of 219 DEGs (up = 151, down = 68, |log2Foldchanage| > 1) were identified, of which 164 (up = 117, down = 46) were converted to gene symbols. In the IPA analysis, a pathway showing a certain tendency to metabolism was not found (Table 3), and pathways presumed to be upregulated are AMPK signaling, senescence pathway, synaptogenesis signaling pathway, factors promoting cardiogenesis in vertebrates, colorectal cancer metastasis signaling, adrenomedullin signaling pathway, cardiac hypertrophy signaling (enhanced), systemic lupus erythematosus in B cell signaling pathway, neuroinflammation signaling pathway, and hepatic fibrosis signaling pathway were identified. In GO of upregulated DEGs (Table S4), "response to nutrient levels (GO:0031667)", "regulation of lipid metabolic process (GO:0019216)", "positive regulation of developmental process (GO:0051094)", and "positive regulation of multicellular organismal process (GO:0051240)" enrichment was confirmed for, and the GO term indicating a certain metabolism type was not identified.

Discussion
In general, it is known that the difference in muscle fiber type and their metabolic properties is the trait that classifies the skeletal muscles. Many studies have demonstrated differences in biochemical-metabolic phenotypes depending on muscle use and fiber composition. In addition, the type and nature of the skeletal muscle fiber that makes up the muscle are transformable and can vary depending on the role the muscle plays. As a result of previous studies of equine exercise capacity, the classification, properties, and related genes of skeletal muscle tissue are well characterized. These prior studies demonstrated a large number of exercise-induced changes, with increased expression of genes associated with oxidative phosphorylation and mitochondrial function in skeletal muscle of individuals endurance-exercised continuously for long periods of time [8,11,15,27]. These results have been verified by molecular biology studies [8,9,15].
In the present study, biochemical analyses using RNA-Seq data from various regions of horse skeletal muscle tissue were conducted, with the objective of identifying whether a classification based on tissue characteristics was possible within very similar tissues. In our 18 skeletal muscle tissue data obtained from Jeju horse, there were no biological or technical replicates for each of the muscle tissues. To compensate for this limitation, the DE between groups was identified by considering the groups of k-means clustering as a replicate for a similar trait.
The DEGs obtained in the pairwise comparison classified 18 muscle tissues into two groups (hierarchical and k-means both), and these were subdivided into three subgroups each. When the expression of genes that determine the fiber type, and the accompanying genes [28], and the expression of genes involved in the functional enrichment of the related traits [29][30][31][32][33] were compared by listing on the basis of clustered samples as a form of the heatmap (Figure 4), it was found that the genes are known to be expressed together with the type of muscle fiber [28], usually increased and decreased here as well. However, the expression of the gene that represents the muscle fiber type and the type of gene that is expected to accompany were not completely consistent. Among DEGs, the expression of the gene known to be muscle fiber specific also represented various values within the same group A and group B. The difference between the two main groups was more distinguishable in the gene expression of the functional enrichment group of GO. Among the identified DEGs on the heatmap (Figure 4), genes included in "regulation of muscle adaptation (GO:0043502)" were upregulated in group A and downregulated in group B, and genes in the "positive regulation of FA β-oxidation (GO:0032000)" represent similar patterns as well. The functional enrichment for "regulation of muscle adaptation (GO:0043502)" was also identified in the GO results of upregulated DEGs of group A (Table S1). These results suggest that the classification results were preferentially distinguished by differences in muscle adaptation-related functions and gene expressions involved therein.
The canonical pathways identified from group A vs. group B do not directly refer to the effect on skeletal muscle, but many studies have verified their metabolic functions in skeletal muscle and muscle adaptation. As shown in Figure S2, estrogen receptor (ER) signaling exists at the center of functions that DEGs are involved (Table 1). Estrogen is a type of sex hormone that circulates in the body and directly or indirectly affects many molecular pathways through ER, and exercise-induced skeletal muscle adaptation is also affected by ER. ER signaling increases muscle mass regulation and regeneration and is known to enhance ATP production and lipid metabolism by promoting fission fusion and β-oxidation of mitochondria [34][35][36][37][38]. In addition, it was confirmed that a number of canonical pathways included in the table were also affected by the control mechanisms of ER signaling or involved in changes in the types associated with exercise-indicated adaptation [39][40][41][42][43][44][45][46][47][48][49].
In the distribution of muscles shown in Figure 1, the muscles in group A are usually located in the anterior and the muscles in group B are mostly located in the posterior. Taking together these differences in group A vs. group B, this result suggests that the muscles located in the front and rear perform different main functions and thus exhibit different gene expression. Anatomical research of the muscle architecture revealed that the muscle of the forelimbs and hindlimbs have different main roles [21,50]. The proximal horse muscles of the hindlimbs provide energy for locomotion and the muscles of forelimbs act as stiff spring-like struts to support a greater proportion of the body mass [21]. Therefore, the gene expressions for adapting to continuous activity are stronger in the muscles of the forelimbs and anterior that supports the weight of the body. On the other hand, the gene expression for energy generation appears to be more dominant in the muscles of the hindlimbs and the posterior that provide power.
In skeletal muscle adaptation research conducted on porcine, it has been found that adaptation to endurance exercise occurs primarily in forelimb musculature [51]. As shown in Figures 1 and 4, gene expressions for exercise adaptation are dominant in muscles of the anterior and it is thought that the same mechanism will work in equine also. According to Harrison et al. [52], muscle #9 (extensor carpi radialis) located in the forelimbs is activated in the swing state, unlike most of the other muscles in the forelimbs, which are activated in the state of stance. This report could explain why muscle #9 shows a similar expression to the muscles of group B, even though it is located in the anterior.
Unlike general knowledge, the expression of genes directly associated with the skeletal muscle fiber type and with a metabolic tendency (glycolysis, OXPHOS) caused by the fiber type is not unilaterally proportional to the associations of "slow-OXPHOS" and" fast-glycolytic". It has complex patterns that cannot be explained by a single element. The sub-clusters of groups A and B are sets of tissues that have similarities in these complex expression patterns. The analysis of DEGs between sub-clusters could identify the direct differences of detailed traits, such as fiber-specific gene expression, glycolysis, and oxidative phosphorylation levels (Tables 1-3) (Figure 4). Here, the tissues included in the cluster are also mostly located in close or homologous positions (group B3) and are expected to play similar roles. As more tissues are added or replication is secured, more clusters are separated. Thus, differences in simple traits overlap and multiple expression types appear, making it possible to classify a narrower range of locations and functions with only in silico data.

Conclusions
In this study, we analyzed the function of muscles with gene expression values obtained from transcriptome sequencing data based on in silico data and literature. We identified that RNA-Seq expression data can be used to classify tissues according to their specific characteristics, even among highly similar tissues, such as different skeletal muscles. Skeletal muscles were categorized by their role and difference in gene expression caused by hormonal and cell signaling pathways. The muscles we analyzed were largely classified into two groups by muscle adaptation-related pathways, which reflected their main roles and location. The two groups of tissues classified can once again form clusters with those with similar properties. This sub-cluster is clustered by detailed types of gene expressions, such as fiber types of muscles or energy metabolic pathways.
If there is an opportunity to confirm the tissue composition and protein expression through analyses using classical methods, this would provide an opportunity to verify the results of the present study and to examine the utility of studying animal tissues with NGS analysis.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/11/1359/s1, Figure S1: Optimal number of clusters for k-means clustering, Figure S2: IPA Summary network of Group A vs. Group B, Table S1: GO biological process of up-regulated DEGs in Group A vs. Group B, Table S2: GO biological process of up-regulated DEGs in Group A sub-clusters, Table S3: GO biological process of down-regulated DEGs, Table S4: GO biological process of up-regulated DEGs in Group B sub-clusters.

Conflicts of Interest:
The authors declare no conflict of interest.