Identification of Alternative Splicing Events Associated with Paratuberculosis in Dairy Cattle Using Multi-Tissue RNA Sequencing Data

Paratuberculosis is a major endemic disease caused by Mycobacterium avium subspecies paratuberculosis (MAP) infection and leads to huge economic loss in the dairy sector worldwide. Alternative splicing (AS) events, playing indispensable regulatory roles in many protein functions and biological pathways, are shown to be associated with complex traits and diseases. In this study, by integrating the RNA sequencing (RNA-seq) data of 24 samples from three tissues (peripheral blood, jejunum and salivary gland) of Holstein cows, we obtained 2,706,541,696 uniquely mapped reads in total that represented 12,870 expressed genes, and detected 4285 differentially expressed genes (DEGs) between MAP-infected and healthy cows (p < 0.05). Of them, 92 differentially expressed splicing factors (DESFs) were included. Further, 119, 150 and 68 differential alternative splicing (DAS) events between MAP-infected and healthy cows were identified in peripheral blood, jejunum and salivary glands, respectively. Of note, six DAS events were highly and significantly correlated with the DESFs (R2 > 0.9; p < 0.01), and their corresponding genes (COPI coat complex subunit gamma 2 gene (COPG2), kinesin family member 2C gene (KIF2C), exocyst complex component 7 (EXOC7), Rab9 effector protein with kelch motifs gene (RABEPK), deoxyribonuclease 1 gene (DNASE1) and early endosome antigen 1 gene (EEA1)) were significantly enriched in immune response such as vesicle-mediated transport, regulation of acute inflammatory response and tuberculosis through gene ontology (GO) and KEGG analysis. KS test showed that the DAS events in the EXOC7 and KIF2C genes indeed displayed differences between MAP-infected cows and healthy cows. The DAS in EXOC7 might produce a new protein sequence with lack of 23 amino acids, and the DAS in KIF2C induced a stop codon of premature occurrence and resulted in a lack of functional domain. In summary, this study identified the DAS events and corresponding genes related to MAP-infection base on the RNA-seq data from multiple tissues of Holstein cows, providing novel insights into the regulatory mechanisms underpinning paratuberculosis in dairy cattle.


Background
Paratuberculosis is a chronic and gastrointestinal infectious disease in livestock such as cattle and sheep, which is caused by Mycobacterium avium subspecies paratuberculosis (MAP) [1,2]. Infected animals usually have a long incubation period of 6-12 months and develop clinical signs such as diarrhea and dehydration. MAP infections have already caused serious economic losses in dairy farms due to decreased milk production and increased management costs [3,4]. Through traditional farm management, paratuberculosis is difficult to prevent and control because of its long incubation periods and convenient transmissible ways of pathogen such as feces and infected milk. In 2007, Gonda et al. first detected a quantitative trait locus (QTL) associated with susceptibility to MAP infection on BTA20 in US Holsteins [5]. Several QTLs for MAP resistance or susceptibility were identified on chromosomes 7, 16 and 22 with genome-wide association studies (GWASs) in Holstein cattle [6,7]. Gao et al. identified 30 immune-related genes associated with paratuberculosis by performing GWAS [8] and RNA sequencing (RNA-seq) in jejunum tissue in Chinese Holsteins, such as NOD2, SLC11A1, TLR, SPI10, IL10RA, BolFNG and PGLYRP1 [9].
Alternative splicing (AS) means different combinations between exons and introns in mRNA processing. In the process of pre-mRNA maturation, AS events of genes may generate numerous mature mRNA isoforms in higher eukaryotes [10], which means that a limited number of genes are able to generate vast numbers of proteins via AS events of eukaryotic transcripts in cells, which has notable physiological functions in the different developmental processes [11]. AS events might be enriched for specific pathways and regulate the physiological function in different tissues, such as AS events in brain were enriched for endocytosis and vesicle mediated transport, in heart were enriched for development of cardiomyocytes and ion channels, in liver were enriched for Actin-based processes and protein transport [12]. Some AS events can result in serious changes in mRNA or protein structures such as the domain change and intronic polyadenylation, which are related to diseases. In humans, it has been shown that 370 kinds of diseases are associated with the disruption of AS events and 2337 splicing mutations are linked to common diseases such as skipped exon 10 of microtubule associated protein tau gene (MAPT) and Alzheimer's Disease, skipped exon 7 of fused in sarcoma/translocated in liposarcoma RNA binding protein gene (FUS) and Amyotrophic Lateral Sclerosis, and skipped exon 2 of cytotoxic T-lymphocyte associated protein 4 gene (CTLA4) and some autoimmune diseases [13][14][15][16][17]. In dairy cattle, 3.66% and 5.4% novel transcripts produced by specific AS events were identified in the mammary tissues from healthy and mastitic cows, respectively, which were related to immune defense and inflammation responses and harbored the known QTL for clinical mastitis [18]. Yang et al. revealed a novel transcript with a deletion of 112 bp of exon 2 in C-C motif chemokine ligand 5 gene (CCL5) had significant lower frequency in mastitic cows compared with healthy ones [19]. By using isolated ileum segments, Liang et al. identified the expression of exon1 of monocyte to macrophage differentiation associated gene (MMD) decreased and the expression of intron4 of adenosine deaminase associated gene (ADA) increased significantly in MAP-infected cows [20].
So far, studies on regulatory mechanism of AS events on susceptibility or resistant to MAP, e.g., which AS event and its corresponding gene play critical roles in MAP infection are still insufficient. In the present study, by integrating our 6 RNA-seq data sets of jejunum from the healthy and clinical Holstein cow of paratuberculosis [8,9], and 18 RNA-seq data sets from Holstein cows before and after MAP-infection from the European Bioinformatics Institute (EMBL-EBI) database (https://www.ebi.ac.uk/ accessed on 20 April 2021) [21], we detected the AS events and corresponding genes and analyzed their potential regulatory roles in paratuberculosis.

RNA Sequencing Data
The jejunum samples for RNA-seq used in this study have been detailed as described in our previous study [9]. Briefly, 6 individuals were selected from 8214 Chinese Holstein cows in Beijing Sanyuan Dairy Farm Center based on serum ELISA detection and stool real-time PCR results [22], i.e., 3 healthy (Non-Positive, NP) and 3 clinical (Double Positive, DP) cows of paratuberculosis. Total RNA of the 6 jejunum samples were sequenced with Illumina Hiseq 2500 technology platform in Novogene (Beijing, China). Consequently, 12 compressed FASTQ files of 112 Gb with around 292,907,510 paired-end reads of 150 bp were obtained. The dataset is available in the NCBI BioProject database under the accession number PRJNA756737.
In addition, 18 paired-end RNA-seq data sets of Holstein cows before and after MAPinfection in the EMBL-EBI database (https://www.ebi.ac.uk/ accessed on 20 April 2021) were downloaded, including 36 compressed FASTQ files of 260 Gb. Of these, there are 14 sequencing samples from salivary gland with around 811,607,061 reads and 4 from peripheral blood with 90,366,248 reads [23,24]. The details of the data sets are summarized in Table 1.

Quality Control of Reads and Reads Alignment to the Bovine Reference Genome
The raw data (reads) were filtered with Trimmatic-0.38 software [25], resulting in high quality clean reads as follows: deletion of reads with linkers; deletion of reads with N (N means that the base information could not be determined) content greater than 10%; deletion of low-quality reads (reads with base quality less than 3 or average base quality less than 15). The clean reads were used for the following analysis.

Identification of Differentially Expressed Genes
Read counts for each of the 24,616 annotated Ensembl genes (specified as from transcription start site (TSS) to transcription end site (TES)) were calculated using Feature-Countsv1.5.2 [29] based on sorted bam files ordered from the alignment. The raw read counts were processed using the EstimateSizeFactors function in the DEseq2 R package (1.26.0), corrected for expression outliers, and DEseq2 normalized read counts were obtained by calculating the median of the ratios of read counts for each gene to the geometric mean of all cows as a scaling factor for a given sample [30].
Gene expression was first quantified using normalized counts. Next, differential expression analysis was performed by the DEseq2 R package to identify differentially expressed genes (DEGs) between MAP-infected and healthy samples in each tissue [30]. Finally, the DEGs in each comparison were obtained (p value < 0.05). Gene lists were functionally annotated using the metascape website (https://metascape.org/ accessed on 18 June 2021) and clusterprofiler (R package) for functional enrichment analysis based on Gene Ontology and KEGG databases [31,32].

Identification and Screening of Differential Alternative Splicing (DAS) Events
rMATs v4.1.0 [33] and leafcutter v0.2.9 [34] were used to identify AS events and quantify their frequency as the percent spliced in (PSI) based on the sorted bam files. rMATs utilizes proportions of exon-exon junction reads to calculate PSI values, whereas a different clustering of introns caused by AS events is the main focus of leafcutter. In addition, as to rMATs, five possible different AS events can be considered, including exon skipping (ES), intron retention (IR), alternative 3 splicing site (A3SS), alternative 5 splicing site (A5SS), and mutually exclusive exons (MXE).
We compared AS events between MAP-infected cows and healthy cows in different tissues, and significantly differential AS (DAS) event was assessed with a ∆PSI > 10% and FDR < 0.05 which means this AS event has a significant difference of over 10% between the two groups [35]. Furthermore, based on two groups of DAS events identified by rMATs and leafcutter, respectively, the intersection of them and corresponding genes in each tissue were screened for the following analysis.

Correlation Analysis between Splicing Factors and AS Events
We downloaded 317 splicing factors (SFs) from the SpliceAid-F database (http://srv0 0.recas.ba.infn.it/SpliceAidF/ accessed on 8 September 2021) [36,37]. Based on the results of differential expression analysis, differential expressed SFs (DESFs) were selected in the 3 tissues, and expression matrix of them was created to perform correlation analysis with PSI values of screened DAS events in each tissue, respectively. As the results, DESFs and DAS events with high and significant correlation coefficients (R 2 > 0.9 and p < 0.01) were selected. Cytoscape v3.5.1 software [38,39] was used to visualize the SF-AS interaction network.

Statistical Analysis and Protein Structure Prediction
KS test was used to verify the significances of differences in each selected DAS event between MAP-infected and healthy cows [40]. To avoid false positives, FDR correction was performed and a corrected FDR < 0.05 was considered to be statistically significant. All statistical boxplots were generated by using R v3.6.0. Based on the NCBI database (https://www.ncbi.nlm.nih.gov/ accessed on 12 October 2021) and SMART website (http: //smart.embl-heidelberg.de/ accessed on 12 October 2021), the effects of DAS events on protein sequences and domains were identified [41,42].

Data Summary
A total of 24 RNA-seq samples from three tissues (peripheral blood, jejunum and salivary gland) were processed, including 2,706,541,696 clean reads after the quality control. Alignment of the sequencing reads against the bovine genome ARS-UCD1.2 obtained 81.35-93.13% of uniquely aligned reads across all samples. Consequently, 12,870 genes that were expressed in at least one samples were detected (DEseq2 normalized count > 1).

Differentially Expressed Genes across Tissues
A total of 4285 DEGs across 3 tissues between MAP-infected and healthy cows (p < 0.05) was detected, including 2928, 668 and 1574 DEGs in peripheral blood, jejunum and salivary gland, respectively. Among the case groups of three tissues, the percentage of upregulated genes was significantly higher than that of downregulated genes in jejunum, suggesting there was a global promotion in physiological functions, whereas the percentages of upregulated and downregulated genes were nearly equal in other two tissues, comparing with control groups ( Figure 1A).
Through GO and KEGG analysis, we found the DEGs in jejunum were significantly enriched in biological processes and pathways related to adaptive immune response, including response to cytokine, cytokine production and Th1 and Th2 cell differentiation, whereas the DEGs in salivary gland were enriched in innate immune response and defense response. Additionally, the DEGs in peripheral blood were enriched in both innate immune responses and adaptive immune response (FDR < 0.05). Detailed information about the enriched GO terms and KEGG pathways was shown in Figure 1B.

Differential AS Events
By using leafcutter, we identified 1242 (713 genes), 1401 (789 genes) and 1168 (652 genes) significantly differential AS (DAS) events between MAP-infected and healthy cows (FDR < 0.05 and ΔPSI > 0.1) in peripheral blood, jejunum and salivary gland, respectively. In addition, 511 (424 genes), 450 (375 genes) and 99 (86 genes) DAS events were identified by rMATs in peripheral blood, jejunum and salivary gland, respectively. To avoid false positives efficiently and access higher reliability, 119 DAS events (90 genes) in peripheral blood, 150 (89 genes) in jejunum and 68 (45 genes) in salivary gland identified by two methods simultaneously were used for further analysis (Figure 2A). The DAS events identified in both software were listed in Supplementary Table S1.

Changes of Protein Structure
To know about whether the DAS events changed the corresponding protein structure, we analyzed the amino acid sequences and protein domains based on GenBank (https://www.ncbi.nlm.nih.gov/genbank/ accessed on 12 October 2021) and SMART databases (http://smart.embl-heidelberg.de/ accessed on 12 October 2021), and found two exon-skipping DASs that significantly changed protein structures. In the exocyst complex component 7 (EXOC7) gene, the skipped exon8 deleted 23 amino acids and shortened the protein sequence, which frequency was significantly less in the MAP-infected group compared with the healthy cows (FDR KS-test = 0.015) ( Figure 4A). As for kinesin family member 2C (KIF2C) gene, there was a higher frequency (FDR KS-test = 0.079) of exon 10 skipping event in the MAP-infected cows ( Figure 4B), and it led to a stop codon of premature occurrence and shortened the protein sequence of 397 amino acids inducing the lack of KISc domain that plays important roles in intracellular transport of organelles and in cell division ( Figure 4B).

Changes of Protein Structure
To know about whether the DAS events changed the corresponding protein structure, we analyzed the amino acid sequences and protein domains based on GenBank (https://www.ncbi.nlm.nih.gov/genbank/ accessed on 12 October 2021) and SMART databases (http://smart.embl-heidelberg.de/ accessed on 12 October 2021), and found two exon-skipping DASs that significantly changed protein structures. In the exocyst complex component 7 (EXOC7) gene, the skipped exon8 deleted 23 amino acids and shortened the protein sequence, which frequency was significantly less in the MAP-infected group compared with the healthy cows (FDRKS-test = 0.015) ( Figure 4A). As for kinesin family member 2C (KIF2C) gene, there was a higher frequency (FDRKS-test = 0.079) of exon 10 skipping event in the MAP-infected cows ( Figure 4B), and it led to a stop codon of premature occurrence and shortened the protein sequence of 397 amino acids inducing the lack of KISc domain that plays important roles in intracellular transport of organelles and in cell division ( Figure 4B).

Discussion
Paratuberculosis is an infectious disease that seriously endangers the health of dairy cows and causes huge economic losses to dairy farms. It is caused by MAP that invades

Discussion
Paratuberculosis is an infectious disease that seriously endangers the health of dairy cows and causes huge economic losses to dairy farms. It is caused by MAP that invades macrophages and inhibits the immune response pathway. MAP invades the body and ushers along the gastrointestinal tract until it reaches the small intestinal mucosa where it enters microfold cells (M cells) and enterocytes to circumventing defense barriers of intestine such as the glycocalyx, tight junctions, and antimicrobial peptides. After that, MAP invades subcutaneous macrophages specifically to escape the host's immune response and establish a long, chronic infection [43,44]. In the infection process, jejunum is the main affected tissue and manifests as inflammation and edema, inducing significant immune response such as activating interferon γ signaling [45]. In addition, some studies reported that MAP also invades the host from oral mucosa and be transported to small intestines through peripheral blood. Therefore, the salivary gland and peripheral blood are also important tissues of immune response in MAP infection, containing innate immune and adaptive immune [46,47]. It has been reported that salivary secretions such as defensins and cathelicidins have enormous significances for the regulation of the intestinal function and immune [48,49]. In this study, we integrated 6 jejunum RNA-seq datasets from our previous study [9] and 18 RNA-seq datasets from EMBL-EBI database, and identified 119 (90 genes), 150 (89 genes) and 68 (45 genes) DAS events between MAP-infected and healthy Holstein cows in peripheral blood, jejunum and salivary gland, respectively. Out of these, two DAS events in the EXOC7 and KIF2C gene were associated with susceptibility and resistance to MAP.
EXOC7 is a component of the exocyst complex that plays a critical role in vesicular trafficking and the secretory pathway by targeting post-Golgi vesicles to the plasma membrane. The EXOC7 gene consists of 21 exons that generates 9 types of known transcripts through different splicing events in cattle [50]. In a previous study, knockdown of the EXOC7 gene inhibited the accumulation of vesicles at the phagosomes, leading to an inhibitory effect on bacterial elimination such as staphylococci and Salmonella [51]. Another research also showed that different transcripts of EXOC7 influenced the efficiency of vesicle production and cytokines transport related to diabetes and inflammation [52]. Thus, the inhibited AS event of exon 8 of EXOC7 in MAP-infected cows may affect immune response by reducing the efficiency of cytokines transport that is related to the susceptibility and resistance to diseases.
The KIF2C gene encodes a kinesin-like protein that functions as a microtubuledependent molecular motor that depolymerizes microtubules at the plus end thereby promots mitotic chromosome segregation. It plays essential roles in the immune cells migration and infiltration in human [53]. Knockdown of KIF2C in mouse inhibited the apoptosis of CD8 + T cell and affected the immune cells infiltration that was associated with endometrial cancer [54]. In this study, the exon10-skipping event with higher frequency in MAP-infected cows might affect immune response to MAP-infection through inhibiting the function of immune cells.
In this study, we performed two methods to identify the DAS events to increase accuracy of DASs [55]. Through the distribution of DAS events in different regions of ∆PSI value, we found that rMATS is more stringent in judging qualified DAS events, whereas there were less DAS events with high ∆PSI value (∆PSI ≥ 70%) with leafcutter ( Figure 5). On the other hand, merely 10% DAS events identified with both methods were used for analysis, and this may increase false negatives and discard some valuable information. Low overlap (10-30%) of results between different software was also reported in previous study which was caused by the false positives in AS identification [55]. In addition, more investigations are needed to verify the impacts of the DAS events in EXOC7 and KIF2C on protein structures and functions and their regulatory mechanism underlying paratuberculosi in dairy cattle. Figure 5. The distribution of DAS events in different regions of ΔPSI value. The figure above i distribution calculated by leafcutter, and below is that by rmats. Of them, blue parts mean DAS ev in jejunum, red parts mean that in peripheral blood and green parts mean that in salivary gland.

Conclusions
In the present study, based on 24 RNA-seq datasets from 3 tissues, we detected 224 g that contained differential alternative splicing (DAS) events between MAP-infection heathy Holstein cows, out of which 14 genes were significantly enriched for immune-rel pathways. Of note, we found that the DAS events in the EXOC7 and KIF2C changed pro structures and were associated with the susceptibility to MAP. Our findings provided n insights into the regulatory mechanisms underlying paratuberculosis in dairy cattle.    The distribution of DAS events in different regions of ∆PSI value. The figure above is the distribution calculated by leafcutter, and below is that by rmats. Of them, blue parts mean DAS events in jejunum, red parts mean that in peripheral blood and green parts mean that in salivary gland.

Conclusions
In the present study, based on 24 RNA-seq datasets from 3 tissues, we detected 224 genes that contained differential alternative splicing (DAS) events between MAPinfection and heathy Holstein cows, out of which 14 genes were significantly enriched for immune-related pathways. Of note, we found that the DAS events in the EXOC7 and KIF2C changed protein structures and were associated with the susceptibility to MAP. Our findings provided novel insights into the regulatory mechanisms underlying paratuberculosis in dairy cattle. Data Availability Statement: The datasets are available in the NCBI BioProject database under the accession number PRJNA756737, PRJNA628877 and PRJNA513864.