De Novo Transcriptome Analysis of Differential Functional Gene Expression in Largemouth Bass (Micropterus salmoides) after Challenge with Nocardia seriolae

Largemouth bass (Micropterus salmoides) are common hosts of an epizootic bacterial infection by Nocardia seriolae. We conducted transcriptome profiling of M. salmoides to understand the host immune response to N. seriolae infection, using the Illumina sequencing platform. De novo assembly of paired-end reads yielded 47,881 unigenes, the total length, average length, N50, and GC content of which were 49,734,288, 1038, 1983 bp, and 45.94%, respectively. Annotation was performed by comparison against non-redundant protein sequence (NR), non-redundant nucleotide (NT), Swiss-Prot, Clusters of Orthologous Groups (COG), Kyoto Encyclopaedia of Genes and Genomes (KEGG), Gene Ontology (GO), and Interpro databases, yielding 28,964 (NR: 60.49%), 36,686 (NT: 76.62%), 24,830 (Swissprot: 51.86%), 8913 (COG: 18.61%), 20,329 (KEGG: 42.46%), 835 (GO: 1.74%), and 22,194 (Interpro: 46.35%) unigenes. Additionally, 8913 unigenes were classified into 25 Clusters of Orthologous Groups (KOGs) categories, and 20,329 unigenes were assigned to 244 specific signalling pathways. RNA-Seq by Expectation Maximization (RSEM) and PossionDis were used to determine significantly differentially expressed genes (False Discovery Rate (FDR) < 0.05) and we found that 1384 were upregulated genes and 1542 were downregulated genes, and further confirmed their regulations using reverse transcription quantitative PCR (RT-qPCR). Altogether, these results provide information on immune mechanisms induced during bacterial infection in largemouth bass, which may facilitate the prevention of nocardiosis.


Introduction
During intensive aquaculture, fish are always exposed to stressors which may facilitate host infection by opportunistic pathogens existing in the water [1].Indeed, detecting the invading pathogens depends on the host's ability to recognize the pathogens [2,3].Therefore, for rapid elimination of pathogens, fish rely on innate or nonspecific immune responses [1].Against this background, transcriptome profiling analysis during infection in the host can facilitate genome studies and functional gene identification.However, in fish the broad identification of immune-related genes at the genome or transcriptome levels are limited to a few species [4,5].Since the genome sequence for many non-model fish species is unknown, the study on immune genes is difficult.Moreover, the introduction of RNA deep sequencing technologies (i.e., Solexa/Illumina RNA-Seq and digital gene expression) have contributed much to the identification of important immune-related genes in fish [6][7][8][9].

Functional Classification
Overall, 8913 (23.63%) annotated putative proteins from COG were grouped into 25 different categories (Figure 1).After filtering the poorly characterised proteins ("general function prediction only" and "function unknown") based on the number of unigenes, the top three functional clusters were determined to be "replication recombination and modification" (1491, 16.72%), which is followed by "transcription" (1354, 15.19%) and "translation, ribosomal structure, and biogenesis" (1263, 14.17%) (Figure 1).Furthermore, 37,712 (78.76%) unigenes were assigned to 835 GO terms based on sequence homology and a total of 52 functional groups were clustered into biological process, cellular component, and molecular function (Figure 2).The unigene sequences from molecular function were clustered into 13 different classifications.Further, the largest subcategory within molecular function was "binding", followed by "catalytic activity" In the biological process, sequences were distributed into 24 classifications.The most represented subcategories were "cellular processes" and "metabolic processes"."Cell part" and "cell" were the most represented among 13 subcategories within the cellular component category.

Differentially Expressed Genes after Nocardia seriolae Challenge
A total of 1384 transcript-derived unigenes were upregulated, whereas 1542 genes were downregulated in phosphate buffered saline (PBS) control and bacterial infection groups, respectively (Figure S1).The top 20 enriched pathways are shown in Figure 4, with genes involved in immune-related "Cell adhesion molecule", "Cytokine receptor interaction", "Hematopoietic cell lineage", and "Phagosome" categories being the most significantly enriched.Natural killer cellmediated cytotoxicity, hematopoietic cell lineage, toll-like receptor signalling, Fc γ R-mediated phagocytosis, antigen processing and presentation, NOD-like receptor signalling, and chemokine signalling (Table S3) were differentially expressed among immune-related categories.These results suggest an important role for these unigenes during N. seriolae infection in largemouth bass.

Differentially Expressed Gene Validation Using Real-Time PCR
We identified immune-related gene sequences that were upregulated from DEG in largemouth bass (Table S4), and evaluated their homology with those from other fish species using the NCBI database.These sequences will be used for our future studies in immune response of largemouth bass to Nocardia seriolae.The expression levels of seven differentially expressed genes related to pathways including TLR, RIG I-like receptors, cytokine-cytokine receptor interaction, natural killer cell mediated cytotoxicity, and antigen processing and presentation (T-cell receptor (TCR)) were evaluated from spleen tissue.The expression levels were largely consistent with the transcriptome profile analyses suggesting that the transcriptome data were reliable (Figure 5).

Discussion
In the present study, Illumina sequencing of control and infection treatment groups yielded 47,881 merged unigenes from spleen tissue of largemouth bass (M.salmoides).This study selected the spleen of largemouth bass 24 h after challenge as experimental samples.After challenge with N. seriolae we observed upregulations of many immune-related genes in the largemouth bass.Noticeably, immune-related pro-inflammatory cytokines and signal transduction related genes, including IL-1β, TNF receptor, CXC chemokine, TGF-β, and NF-κB, were the most significantly upregulated transcripts.
After assembly, 47,881 unigenes were generated with an average length of 1038 bp and an N50 of 1983 bp, longer than the sequences achieved in previous studies using a Roche GS FLX 454 system (Basel, Switzerland) with a MIRA assembler [30] or an Illumina/Hiseq-2000 with assembling program SOAP [31].This difference in sequence quality may be explained by differences in the sampling tissue and de novo assemblers.Since largemouth bass has an absence of a reference genome in the database, the Trinity program used in this study showed better performance compared to other tools in transcriptome assembly [32,33].In contrast to Trinity, SOAP or MIRA assemblies adopted in previous studies [30,31] have been shown to be more fragmented with high levels of errors in sequencing and polymorphism [33,34].In this study, the largemouth bass transcriptome yielded 47,881 merged unigenes from the Illumina/Hiseq-2000 RNA-Seq platform compared to 29,682 unigenes from the Roche 454 system and 2139 unigenes from a SMART cDNA library [35].
It is noteworthy that only 37,712 unigenes were annotated from the databases in this study based on sequence similarity; this annotation limitation also exists in other marine organism transcriptomes [36].This could be explained due to the absence of a genomic database and genomic studies on commercially important aquaculture species [32,[37][38][39].The GO, COG, and KEGG databases used in this study for functional annotation provide valuable information about biological features of largemouth bass challenged by N. seriolae.For example, in the KEGG analysis of 20,329 sequences assigned to 244 KEGG pathways, genetic information processing accounted for 9567 pathways related to pathogen infection (Figure 3).Together, these findings indicate that primary host immune pathways are conserved in largemouth bass which are activated to protect against pathogen infections.
Cytokines are proteins which transfer information among cells to initiate complex intracellular biological processes upon binding to corresponding cell-surface receptors.Moreover, cytokine levels initiate an inflammatory response to bacterial exposure which guides towards leukocyte attraction and activation of antimicrobial pathways [40,41].Against this background, tumour necrosis factor alpha (TNF-α), which is a first cytokine released during infection activates the downstream

Discussion
In the present study, Illumina sequencing of control and infection treatment groups yielded 47,881 merged unigenes from spleen tissue of largemouth bass (M.salmoides).This study selected the spleen of largemouth bass 24 h after challenge as experimental samples.After challenge with N. seriolae we observed upregulations of many immune-related genes in the largemouth bass.Noticeably, immune-related pro-inflammatory cytokines and signal transduction related genes, including IL-1β, TNF receptor, CXC chemokine, TGF-β, and NF-κB, were the most significantly upregulated transcripts.
After assembly, 47,881 unigenes were generated with an average length of 1038 bp and an N50 of 1983 bp, longer than the sequences achieved in previous studies using a Roche GS FLX 454 system (Basel, Switzerland) with a MIRA assembler [30] or an Illumina/Hiseq-2000 with assembling program SOAP [31].This difference in sequence quality may be explained by differences in the sampling tissue and de novo assemblers.Since largemouth bass has an absence of a reference genome in the database, the Trinity program used in this study showed better performance compared to other tools in transcriptome assembly [32,33].In contrast to Trinity, SOAP or MIRA assemblies adopted in previous studies [30,31] have been shown to be more fragmented with high levels of errors in sequencing and polymorphism [33,34].In this study, the largemouth bass transcriptome yielded 47,881 merged unigenes from the Illumina/Hiseq-2000 RNA-Seq platform compared to 29,682 unigenes from the Roche 454 system and 2139 unigenes from a SMART cDNA library [35].
It is noteworthy that only 37,712 unigenes were annotated from the databases in this study based on sequence similarity; this annotation limitation also exists in other marine organism transcriptomes [36].This could be explained due to the absence of a genomic database and genomic studies on commercially important aquaculture species [32,[37][38][39].The GO, COG, and KEGG databases used in this study for functional annotation provide valuable information about biological features of largemouth bass challenged by N. seriolae.For example, in the KEGG analysis of 20,329 sequences assigned to 244 KEGG pathways, genetic information processing accounted for 9567 pathways related to pathogen infection (Figure 3).Together, these findings indicate that primary host immune pathways are conserved in largemouth bass which are activated to protect against pathogen infections.
Cytokines are proteins which transfer information among cells to initiate complex intracellular biological processes upon binding to corresponding cell-surface receptors.Moreover, cytokine levels initiate an inflammatory response to bacterial exposure which guides towards leukocyte attraction and activation of antimicrobial pathways [40,41].Against this background, tumour necrosis factor alpha (TNF-α), which is a first cytokine released during infection activates the downstream expression of other cytokines such as IL-1β and chemokines [42,43].In the present study, after N. seriolae infection it was observed that different cytokines and cytokine receptor families are upregulated in cytokine-cytokine receptor interaction signalling pathways (Table 1), including chemokine receptors (CXCL10, CXCR3, XCR1, CCL 20, 25, 19, 21, 5, CCR3), hematopoietin receptors (IL11RA IL6R), TNF receptors (SF11B, TNFSF12, SF14, and SF6B), TGF-β receptors (TGFBR2), and IL-1 receptors (IL-1β, IL-18, and IL-1R1).These data indicate that, in the case of largemouth bass in early stages of N. seriolae infection, cytokine-cytokine receptor interaction may represent an important anti-bacterial mechanism.
In the host, pattern-recognition receptors (PRRs) recognise pathogen-associated molecular patterns (PAMPs) to defend against pathogen invasion and activate immune responses through signalling pathways, such as TLRs, RIG-I-like receptors (RLRs), NOD-like receptors (NLRs) [44], and C-type lectin receptors (CLRs) [45,46].In this study, a total of 29 gene transcripts, which are involved in the TLR signalling pathway, are found to be upregulated, including the fish-specific TLRs (TLR22), and downstream effector molecules, such as LBP, CASP8, IKK α, IKK β, TRAF6, TAK, TBK, IKK, and RANTES.Additionally, we observed downstream effector molecules of cytokines and transcription factors including p38, IRF3, IRF7, STAT1, IL-12, IL-8, CD40, CD86, and IP10.These suggest that TLR mechanisms are conserved from fish to mammals.We observed upregulations in the expression of pro-inflammatory cytokines in our study after N. seriolae infection including IL-1β, IL-8, and TNF-α (Figures S2 and S3).Our results on TNF-α and IL-1β were in agreement with the study on Japanese flounder (Paralichthys olivaceus) in spleen after immersion challenge with N. seriolae, wherein TNF-α and IL-β were upregulated at 24 h post challenge, while CC chemokine downregulated [47].Moreover, in the case of human monocytes, cytokines induced within 24 h following Gram-positive and Gram-negative bacterial infections [48].
The Janus kinase/signal transducers and activators of transcription (JAK-STAT) pathway initiated due to interleukins, IFNs, and growth factors present in the surrounding microenvironment [49].Different cytokine receptors are associated with JAK for proliferation, survival, and differentiation in lymphoid cell precursor [50,51], while STAT1 activated upon IFN-γ signalling, resulting in enhanced bacteria killing and protection [48].In this study, the members of the JAK-STAT, including STAM and Stat1, were upregulated (Figure S4).This can suggest that, the JAK-STAT pathway activated upon N. seriolae infection in largemouth bass, which can further induce other pathways, namely NF-κB signalling, the TGF-β activated SMAD pathway, and apoptosis [52].

Animal Maintenance
Healthy largemouth bass (Micropterus salmoides) without pathogen infection weighing 125 ˘10 g were used in this study.The fish were kept in an indoor facility at a constant temperature of 26 ˝C and fed daily with commercial feed.The experiment was performed two weeks after acclimatisation.Fish were anaesthetised for handling with 2-phenoxyethanol.Approval for the following animal studies was obtained from the Centre for Research Animal Care and Use Committee of the National Pingtung University of Science and Technology under protocol number 101-027, dated 19 March 2012.

Isolation, Cultivation, and Challenge with Nocardia seriolae
The bacterium N. seriolae was isolated from striped bass and found to be highly virulent in farmed fish [53].The species was identified by API ZYM and 16S rDNA sequencing, grown in Brain Heart Infusion (BHI) broth for five days at 25 ˝C, and enumerated prior to the challenge test.Fifteen fish were anaesthetised and injected intraperitoneally with 1.0 ˆ10 6 cfu N. seriolae that were suspended in 100 µL phosphate-buffered saline (PBS, pH 7.2).The remaining 15 fish per group received only PBS (pH 7.2) as a control.After the fish were returned to the observation tanks, samples were taken at 24 h post infection (hpi).Three fish each from the challenge (treatment) and control groups (n = 3) were examined.Spleen tissue was dissected and total RNA was isolated.

Total RNA Extraction, Preparation of cDNA Library, and Sequencing
Total RNA was extracted using TRIzol ® reagent (Invitrogen Corp., Carlsbad, CA, USA).RNA integrity was assessed using Agilent Bioanalyzer 2100 system (Agilent Technologies, Palo Alto, CA, USA).A TruSeq™ RNA Sample Preparation Kit (Illumina, Inc., San Diego, CA, USA) was used for cDNA library construction.Further, 40 µg total RNA was used for mRNA isolation using poly-T oligo-attached magnetic beads.First-strand cDNA was synthesized using random hexamer primers and Superscript III (Invitrogen, Carlsbad, CA, USA); this was followed by second-strand cDNA synthesis, end repair, and adaptor ligation.The RNA-Seq library was sequenced on the Illumina HiSeq™ 2000 (Illumina, Inc., San Diego, CA, USA) platform as paired-end reads to 100 bp at Genomics Bioscience Technology Co., Ltd.(Taipei, Taiwan).The transcriptome raw sequencing datasets are available from Sequence Read Archive (SRA) database in NCBI and the accession numbers are SRX1739692 and SRX1738842.All of the information on the assembled unigene sequences and annotations are available from the corresponding authors upon request.

Filtering of Sequencing Reads
Raw reads were defined as adaptor-polluted reads containing low-quality or unknown base (N) reads; these reads were removed before downstream analyses.Internal software was used to filter reads, removing (1) reads with adaptors; (2) reads in which unknown bases comprised greater than 5% of the read; and (3) low quality reads (defined as the percentage of bases for which quality is less than 10 and greater than 20% in a read).After filtering, the remaining reads were called "Clean Reads" and stored in FASTQ [54] format.

De Novo Transcriptome Assembly
Trinity [55] was used to perform de novo assembly with clean reads.Next, TIGR Gene Indices clustering tools, or Tgicl, was used to cluster transcripts to unigenes.In the case of two or more samples, Tgicl would be re-executed with each sample's unigene to obtain the final unigene for downstream analysis.Unigenes were divided into two classes: clusters (CL), comprised of several unigenes with shared similarity greater than 70%, and singletons (Unigenes).

Functional Unigene Annotation and Classification
For gene annotation, following database were used; NCBI non-redundant protein database [56], gene ontology (GO) [57], Clusters of Orthologous Groups [58], and the Kyoto Encyclopaedia of Genes and Genomes [59] with E-values less than 10 ´5 using BlastP (Version 2.2.25) [60].With functional annotation, we selected the region of the unigene that best mapped to functional databases in a priority order of NR, SwissProt, KEGG, and COG as its coding sequence (CDS), and displayed this sequence region from 5' to 3' in FASTA format.Unigenes that could not be aligned to any database mentioned above were predicted by ESTScan [61] using Blast-predicted CDS as the model.

Differentially Expressed Genes
Expression data from two libraries (treatment and control) were determined by mapping to the transcriptome assembly using Bowtie2 software [62,63].The fragments per kilobase of transcripts per million fragments mapped (FPKM) values were analysed further using RESM [64] to get differentially expressed genes (DEGs) in the spleen between the control and infected groups.Further, to determine the threshold p-value in multiple tests, a false discovery rate (FDR) was used.Furthermore, significant enrichment was calculated when FDR was <0.05 and FPKM values showed at least a two-fold difference between the two samples reads.

Real-Time Polymerase Chain Reaction
PCR primers were designed based on transcriptome sequences using Primer 2 Plus software (Table 2).cDNA was synthesised from 2 µg of total RNA using 200 U of M-MLV reverse transcriptase (Promega).β-Actin served as internal control and RT-qPCR was performed using iQSYBR Green Supermix (Bio-Rad Laboratories, Hercules, CA, USA), and each sample was run in triplicate.The thermal gradient feature (CFX96, Bio-Rad Laboratories) was used to determine the optimal annealing temperature for all primers.The real-time PCR program used was 95 ˝C for 3 min, followed by 40 cycles of 95 ˝C for 15 s, 58 ˝C for 15 s, and 72 ˝C for 35 s.Dissociation and melting curves of amplification products were performed and results were analysed using the CFX Manager Software package (Bio-Rad Laboratories).The 2 ´∆∆Ct method was chosen as the calculation method [65].The difference in the cycle threshold (C t ) value of the target gene and its housekeeping gene (β-actin), called ∆C t , was calculated using the following equation: ∆∆C t = (∆C t of bacterial challenge or PBS-injected group for the target gene at each time point) ´(∆C t of the initial control).

Statistical Analyses
Statistical analyses were performed using SPSS 16.0 software.All data are given as mean ˘SD.Significant differences between samples were analysed by one-way analysis of variance (ANOVA), and Duncan's tests at a significance level of 0.05.

Conclusions
This study provides necessary information on differential immune gene transcriptome profiling in largemouth bass (M.salmoides) infected with N. seriolae.Moreover, this transcriptome assembly could be used as a reference for studies related to comparative biology within the genus or family.Of course, we acknowledge that this transcriptome-level response to N. seriolae infections is a preliminary study and larger scale studies are required to further understand the defence mechanisms in largemouth bass.
Supplementary Materials: Supplementary materials can be found at www.mdpi.com/1422-0067/17/8/1315/s1.transcriptome sequencing.We are grateful to all those who contributed to the development of this research and provided input during the study.

Figure 1 .
Figure 1.The cluster of orthologous groups (COG) classification.8913 (23.63% of the total annotated putative proteins) were grouped into 25 different categories.

Figure 1 .
Figure 1.The cluster of orthologous groups (COG) classification.8913 (23.63% of the total annotated putative proteins) were grouped into 25 different categories.

Figure 2 .
Figure 2. Functional distribution of GO annotation.Figure 2. Functional distribution of GO annotation.

Figure 2 .
Figure 2. Functional distribution of GO annotation.Figure 2. Functional distribution of GO annotation.

Figure 3 .
Figure 3. KEGG classification of assembled unigenes from control and treated groups.(A) Cellular processes; (B) Environmental information processing; (C) Genetic information processing; (D) Human diseases; (E) Metabolism; and (F) Organismal systems.

Figure 3 .
Figure 3. KEGG classification of assembled unigenes from control and treated groups.(A) Cellular processes; (B) Environmental information processing; (C) Genetic information processing; (D) Human diseases; (E) Metabolism; and (F) Organismal systems.

Figure 4 .
Figure 4. Scatterplot of the top 20 enriched KEGG pathways.Rich Factor is the ratio of differentially expressed gene numbers annotated in this pathway terms to all gene numbers annotated in this pathway term.q ≤ 0.05 as significantly enriched.

Figure 4 .
Figure 4. Scatterplot of the top 20 enriched KEGG pathways.Rich Factor is the ratio of differentially expressed gene numbers annotated in this pathway terms to all gene numbers annotated in this pathway term.q ď 0.05 as significantly enriched.

Figure 5 .
Figure 5. Comparative gene expression analysis from qPCR and RNA-Seq in spleen from the infected largemouth bass with N. seriolae and compared with those in the control at the 24 h time point.Expression of target genes was normalized to β-actin as a reference gene.Statistically significant differences from control are presented, with * p < 0.05.

Figure 5 .
Figure 5. Comparative gene expression analysis from qPCR and RNA-Seq in spleen from the infected largemouth bass with N. seriolae and compared with those in the control at the 24 h time point.Expression of target genes was normalized to β-actin as a reference gene.Statistically significant differences from control are presented, with * p < 0.05.

Table 2 .
Primer name, sequence, target gene, and their application used in the present study.