Transcriptome Analysis of Male and Female Mature Gonads of Silver Sillago (Sillago sihama)

Silver sillago (Sillago sihama) is an emerging commercial marine aquaculture species in China. To date, fundamental information on S. sihama, such as genomic information, is lacking, and no data are available on the gonad transcriptome of S. sihama. Here, the first gonadal transcriptomes of S. sihama have been constructed and genes potentially involved in gonadal development and reproduction identified. Illumina sequencing generated 60.18 million clean reads for the testis and 59.10 million for the ovary. All reads were assembled into 74,038 unigenes with a mean length of 1,004 bp and N50 value of 2,190 bp. Among all the predictable unigenes, a total of 34,104 unigenes (46%) were searched against multiple databases, including 33,244 unigenes annotated in the RefSeq Non- Redundant database at NCBI, and 28,924 in Swiss-Prot. By comparing the ovary and testis, 35,367 unigenes were identified as being differentially expressed between males and females, of which 29,127 were upregulated in the testis and 6,240 were upregulated in the ovary. Numerous differentially expressed genes (DEGs) known to be involved in gonadal development and gametogenesis were identified, including amh, dmrt1, gsdf, cyp19a1a, gnrhr, and zps. Using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, the top 20 KEGG pathways with highest number of DEGs were found to be involved in regulating gonadal development and gametogenesis in S. sihama. Moreover, 22,666 simple sequence repeats (SSRs) were identified in 14,577 SSR-containing sequences. The findings provide a valuable dataset for future functional analyses of sex-associated genes and molecular marker assisted selection in S. sihama.


Introduction
Fishes are one of the most abundant vertebrates on Earth, including approximately 30,000 species [1]. Sex determination and evolution in fish is more complex than in other species and can be influenced by genetic and environmental factors [2,3]. To date, several master sex-determining genes have been reported in multiple fish species [4], including dmy (Y-specific DM-domain) in medaka, sdY (sexually dimorphic on the Y chromosome) in rainbow trout, amhr2 (anti-Müllerian hormone receptor type II) in fugu, amhy (anti-Müllerian hormone) in Patagonian pejerrey, gsdf (gonadal soma-derived growth factor on the Y chromosome) in rainbow trout and dmrt1 (doublesex and mab-3 related transcription factor 1) in the half-smooth tongue sole. Besides these master sex determining genes, some conserved genes (such as sox9, fox2, fox3, sf1, and cyp19) control sexual development and differentiation, mainly via the action of steroid hormones [4,5]. In addition, many biological pathways such as the Wnt signaling pathway, estrogen signaling pathway and transforming growth factor β (TGF-β) signaling pathway, also play essential roles in sex determination and reproduction [6]. To better understand sexual regulatory mechanisms in fish, it is necessary to explore sex-related genes in more detail.
Transcriptome sequencing provides a general representation of the genes that are expressed in specific tissues or organs [7] and has been widely applied to identifying sex-related genes and associated regulatory mechanisms, due to its advantages of high throughput and low cost. A set of sex-related genes or pathways has been identified by gonad transcriptomes in numerous aquaculture species, including the spotted knifejaw [6], Japanese scallop [8], olive flounder [9] and channel catfish [10].
As the most widespread species of Sillaginidae, the silver whiting Sillago sihama is naturally distributed in the Indo-Pacific region [11]. Due to its high meat quality, taste and year-round breeding pattern, it is emerging as a commercial marine aquaculture species in China. However, despite its economic importance, studies on this species have been limited to a description of its reproductive biology, artificial breeding, feeding habits and population genetics [12][13][14][15]. To date, fundamental details, such as genomic information, on S. sihama are lacking and no data are available on the gonad transcriptome of S. sihama. In this study, we performed the first gonadal transcriptome of S. sihama and annotated genes potentially involved in gonadal development and reproduction. This transcriptome dataset will provide a foundational understanding for further research on the regulatory mechanisms of sex determination and development in S. sihama.

RNA Extraction, Complementary DNA Library Construction and Illumina Sequencing
Sillago sihama reaches sexual maturity at one year [15]. Two two-year-old adult S. sihama individuals (male: body length 22 cm, body weight 87 g; female: body length 27 cm, body weight 108 g) were obtained from the East Island in Zhanjiang, China (21 • 00 33.00"N, 110 • 23 5.99"E). Gender was identified by morphological observation of the gonads. Samples of the testis and ovary were rapidly sampled in RNAlater stabilization reagent and stored at -80 • C. Then, total RNA of the mature gonads (male, named Ss_M; female, named Ss_F) was extracted using Trizol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. All experimental protocols were approved by the Animal Research and Ethics Committee of Guangdong Ocean University (NIH Pub. No. 85-23, revised 1996).
Complementary DNA (cDNA) libraries from one ovary and one testis were constructed separately with TrueSeq RNA Sample Prep Kit (Illumina, San Diego, CA, USA) according to the manufacturer's protocol. Messenger RNA (mRNA) was enriched from total RNA by Oligo-(dT) beads (Illumina), and fragments were subjected to end repair, adapter ligation and size selection by agarose gel electrophoresis, with suitable fragments selected as templates for PCR amplification. A 2100 Bioanaylzer (Agilent, Santa Clara, CA, USA) and StepOnePlus Real-Time PCR System (Thermo-Fisher Scientific, Santa Clara, CA, USA) were used for checking the quantity and quality of the sample library. RNA samples (displaying an RNA Integrity Number higher than 8) were selected for the preparation of sequencing libraries. The libraries were sequenced using an Illumina HiSeq TM 2500 platform (v4 reagents) with a 150-bp paired-end strategy. All raw data have been submitted to the NCBI sequence read archive (SRA) under accession number PRJNA503317 (Male: SRR8143991; Female: SRR8143992).

De Novo Assembly and Functional Annotation
After removing adaptors, leading and trailing low quality or N bases (below quality three), scanning the read with a 4-base wide sliding window, cutting when the average quality per base dropped below 15, and dropping reads below 36 bases long, clean data was then de novo assembled using Trinity software (version 2.0.6) [16] with min_kemer_cov set to 3 and all other parameters set as default (scripts can be found at Data Supplementary Data S1). Contigs longer than 200 bp were filtered out for the following analysis after assembly. Unigenes were formed when contigs could not be further stretched on either end. These unigenes were further spliced by a k-mer cut-off value of 25 and then assembled to acquire maximum length nonredundant unigenes using TGICL clustering software (J. Craig Venter Institute, Rockville, MD, USA). To annotate all unigenes, the BLASTx program (version 2.2.23) (https://blast.ncbi.nlm.nih.gov/) with an E-value threshold of 1 × 10 −5 was used to query against different databases: NCBI non-redundant protein (NR) database (https://www. ncbi.nlm.nih.gov/refseq/), the Swiss-Prot protein database (https://www.uniprot.org/), the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (https://www.genome.jp/kegg/), and the eukaryotic orthologous group (KOG) database [17]. Gene ontology (GO) annotation and functional classification of unigenes were performed using Blast2GO software (https://www.blast2go.com/; version 4.4) and WEGO software (http://wego.genomics.org.cn/; version 1.0), respectively.

Single Sequence Repeat Locus Identification
MIcroSAtellite (MISA) software (http://pgrc.ipk-gatersleben.de/misa/misa.html; version 1.0) was applied for single sequence repeat (SSR) mining of all unigenes. The default parameters of mining were as follows: the minimum number of repeat units for di-, tri-, tetra-, penta-and hexa-nucleotide motifs were set as 6, 5, 4, 4 and 4, respectively. Primer 1.1.4 was then used to design primer pairs in the flanking regions of SSRs for subsequent validation.

Differential Gene Expression Analysis
High quality clean reads were mapped to the reference transcriptome using the short reads alignment tool Bowtie2 with default parameters. The reads per kilobase per million reads (RPKM) method [18] was used to measure gene expression levels of S. sihama. The edgeR package (version 3.4.3) [19] was used to identify differentially expressed genes (DEGs) between ovary and testis (scripts can be found at Supplementary Data S1). Unigenes with a fold change ≥ 4 and a false discovery rate (FDR) < 0.01 were considered as DEGs. The DEGs were then subjected to enrichment analysis of GO functions and KEGG pathways, and those with p ≤ 0.05 were considered significantly enriched.

Experimental Validation by Quantitative real-time Polymerase Chain Reaction
A total of 15 genes were selected for quantitative real-time polymerase chain reaction (qRT-PCR) validation and primers were designed by an online software tool Primer-BLAST [20]. qRT-PCR was performed on an ABI 7500 qPCR system (Life Technologies Inc., Carlsbad, CA, USA) using SYBR Green Real time PCR Master Mix (TaKaRa Biotechnology, Dalian, China) as per the following process: denaturation at 95 • C for 1 min, followed by 40 cycles of 95 • C (15 s), 60 • C (30 s), and 72 • C (45 s). The ribosomal protein L7 (rpl7) gene was used to normalize the expression values [21]. The 2 −∆∆Ct method was used to calculate relative expression levels of the target gene. All primers used in real-time PCR are shown in Table S1.

Illumina Sequencing and De Novo Assembly
The first gonad transcriptome in S.sihama will provide a valuable genomic resource for future studies on gonadal development and gametogenesis, and enrich candidate genes involved in these processes in marine fish. In total, 17.89 Gb bases of data were generated (63.69 million raw reads for the testis and 62.05 million raw reads for the ovary) ( Table 1). After performing quality control, 60.18 million clean reads were retained for the testis and 59.10 million reads were retained for the ovary. A total of 74,038 unigenes were assembled by Trinity [16]. By categorizing both clusters and singletons as unigenes, a total of 74,038 unigenes were assembled by Trinity. The total number of unigenes looks much higher than commonly imagined, which might be due to the high heterozygosity and high repetition rate of the genome of Sillaginidae fishes [22]. Using the same assembly method, large numbers of unigenes were also found in crab [7] and shrimp [23], species whose genomes also have high heterozygosity and high repetition rates. The sequence length distribution of the unigenes ranged from 201 bp to 20,589 bp, with a mean length of 1,004 bp ( Table 2). The number of unigenes that exceeded 1,000 bp was 19,591 (26.5%).

Sequence Annotation
Among all the predictable unigenes, a total of 34,104 unigenes (46.07%) were searched against multiple databases, while 39,931 (53.93%) unigenes had no BLASTx hits. Among all the annotated unigenes, 33,244 unigenes matched against the NR database (97.48%) and 28,924 against Swiss-Prot (84.81%) ( Table 2). These annotated unigenes provided the basis for further study on specific molecular processes in S. sihama. This may be related to the limited genomic information available for members of the Sillaginidae, or because these unannotated unigenes were specific sequences containing unknown protein domains, genes' untranslated regions, or non-coding RNAs [23]. Based on the NR database, a total of 33,244 unigenes matched known sequences from 688 different species. Most unigene matches were to Larimichthys crocea (30.07%), Lates calcarifer (21.00%), Stegastes partitus (7.69%), Rhinolophus sinicus (6.80%) and Paralichthys olivaceus (3.75%) (Figure 1), for which genomic information is available. Functional prediction and classification of the unigenes was conducted by searching the KOG and GO databases. In the KOG database, 23,064 unigenes were annotated and classified into 25 functional categories (Figure 2), the top three of which were: signal transduction mechanisms (11,063, 48.0%); general function prediction only (7733, 33.5%); and posttranslational modification, protein turnover, chaperones (4613, 20.0%). 6,938 unigenes were classified into three major GO categories (Figure 3). Among these functional groups, the terms "cellular process" (56.04%), "binding" (48.00%), and "cell" (30.38%) were dominant in the biological process, molecular function ontologies, and cellular component categories, respectively. Similar proportions were observed in previous transcriptome analyses of other fish [24][25][26], suggesting that the genes encoding these functions are highly conserved across species and are more easily annotated [23]. In this study, 9,806 unigenes were mapped to 237 KEGG pathways, and the number of unigenes in different pathways ranged from 1 to 2,377 (Table S2). A total of 24.24% of unigenes were mapped to 'metabolic pathway', indicating this pathway plays an important role in the development and function of silver sillago gonads. Additionally, we identified several KEGG pathways related to reproduction and development, such as the MAPK signaling pathway (508 unigenes), Wnt signaling pathway (316 unigenes), GnRH signaling pathway (229 unigenes) and oocyte meiosis (209 unigenes).

Differential Gene Expression Identification and Enrichment Analysis
In this study, 35,367 unigenes were found to be DEGs between the male and female (FDR ≤ 0.01 and log 2 FC ≥ 2) (Figure 4). A total of 13,324 (37.67%) DEGs were annotated, with 9,815 up-and 3,509 down-regulated genes in the male. Many male-biased unigenes and female-biased unigenes were identified in S. sihama, as in gonadal transcriptome analyses of other species [23,27], indicating that reciprocal antagonism between male-promoting and female-promoting pathways is a central feature underlying sexual development and maintenance in vertebrates. GO annotation was performed to classify the 13,324 DEGs ( Figure 5). At the biological process level, the GO terms "cellular process" (1476 unigenes), "single-organism process" (1335 unigenes) and "metabolic process" (1131 unigenes) included more DEGs. There are also more DEGs in the GO terms "binding" (1218 unigenes) and "catalytic activity" (965 unigenes) at the molecular functional level. At the cellular component level, more DEGs were seen in the GO terms "cell" (668 unigenes), "cell part" (668 unigenes), and "membrance" (639 unigenes). GO enrichment analysis showed that 106, 10, and 60 GO terms were significantly enriched in the categories "biological process", "cellular component", and "molecular function", respectively (Table S3) (p < 0.05). Meanwhile, 3,323 DEGs were mapped to 217 KEGG pathways, and 27 significant KEGG pathways were obtained (with a threshold of Q value < 0.05, Table S4).
As a sex determining gene in fish, the dmrt1 gene encodes a transcription factor that is essential for the maintenance of male-specified germ cells and testis differentiation [28]. Anti-Müllerian hormone (amh) has been identified in several fish species. As a member of the TGF-β family, amh is a central player in gonad development and a target of gonadotropic follicle-stimulating hormone (fsh) in several fish species [29,30]. In rainbow trout, gsdf, a new TGF-β superfamily member, was identified and found to regulate the proliferation of primordial germ cells (PGCs) and spermatogonia [31]. A previous study has shown that overexpression of gsdf induced testis differentiation in Nile tilapia females [32]. In this study, these sex-determining genes were identified and showed a male-biased pattern of expression in S. sihama.
Steroid hormones play vital roles in germ cells, regulating sexual differentiation and sexual behaviour patterns [33]. Cyp11b and cyp21 play a key role in catalyzing the production of cortisol and are involved in sex steroid production [34]. Previous studies have shown that cyp11b mRNA was expressed at higher level in males than in females in several fish species, indicating cyp11b was involved in the regulation of testicular development [35,36]. The aromatase cytochrome P450 family 19 subfamily A member 1 (cyp19a1a) gene catalyzes a key step in the conversion of testosterone to estrogen [37]. Here, the high expression of steroid-metabolizing enzymes (cyp11b, cyp17a1, and cyp27a1) found in the testis, and cyp19a1a in the ovary (Table 3), indicated that steroid-metabolizing enzymes may also participate in the development of gonads and regulate the synthesis of steroid hormones in S. sihama.
Sox (Sry-type HMG box) proteins are involved in the regulation of fish sex determination and differentiation. Sox6, a SOX D transcription factor, was exclusively expressed in the testis and is involved in the later stages of spermatogenesis [38]. In mammals, SRY-related transcription factors from the group SOX E subfamily (Sox8, Sox9 and Sox10) play important roles in the regulation of testis organogenesis and spermatogenesis [39]. In this study, Sox6 and two SOX E genes (Sox8 and Sox10) were expressed at higher levels in the male gonads than in the female gonads, suggesting that the SOX genes play key roles in testis development in S. sihama.
Perm acrosome membrane-associated protein 6 (spaca6), sperm-associated antigen 17 (spag17), spermatogenesis-associated protein 17 (spata17), spermatogenesis-associated protein 4 (spata4) and sperm flagellar protein 2 (spef2) were significantly more highly expressed in male compared to female gonads, and the expression difference might be associated with the spermatogenesis in silver sillago. Spaca6 has been shown to be a sperm membrane component that mediates fusion between the ovum and the spermatozoa [40]. Spag17, Spef2, and Sp17 have been shown to have essential functions in spermatogenesis [8].
The zona pellucida (ZP) is an extracellular glycoproteinaceous matrix shell that is involved in oocyte and gamete development. In fish, the expression level of zp genes' mRNA is significantly increased during oogenesis, especially at the previtellogenic stage, which shows a higher expression level than at undeveloped stages [41]. Previous studies showed that the zp2 gene played a key role in the early formation of the oocyte envelope and zp3 acted as a major class of female-specific molecules for reproduction [42]. In this study, three ZP genes (zp2, zp3, and zp4) showed higher expression levels in the S. sihama ovary compared to the testis, suggesting that these oocyte-specific genes may also play important roles in folliculogenesis and reproduction.
A set of 15 selected genes was validated by qRT-PCR, including eight upregulated genes in the male (amh, cyp11b, cyp27a1, dmrt1, dmrtb1, foxl1, gsdf, izumo1), and seven upregulated genes in the female (cyp19a1a, dmrt3, gnrhr2, igfbp1, spaca4, zp2, zp4). The expression patterns of all genes indicated by qPCR analysis were similar to those indicated by RNA-seq (Table S5), indicating the reliability and accuracy of the transcriptome expression analysis.

Involvement of Candidate Pathways in Gonadal Development and Gametogenesis
In this study, we identified some enriched KEGG pathways related to silver sillago gonadal development and gametogenesis; i.e., neuroactive ligand-receptor interaction, calcium signaling pathway, adrenergic signaling in cardiomyocytes, cell adhesion molecules, MAPK signaling pathway, progesterone-mediated oocyte maturation, and GnRH signaling pathway ( Figure 6).
In teleost, it has been reported that the neuroactive ligand-receptor interaction pathway plays a vital role in teleost reproduction and gonadal development, such as in Nile tilapia [43], yellow perch [44] and spotted knifejaw [6] and olive flounder [9]. The GnRH signaling pathway plays an essential role in regulating the production and activating signal transduction cascades that ultimately modulate gonadotropin biosynthesis [45]. In this study, a large number of DEGs were found in this pathway, including growth hormone secretagogue receptor (ghsr), follicle stimulating hormone (fsh), gonadotropin-releasing hormone receptor (gnrhr), luteinizing hormone (lhcgr), thyroid stimulating hormone (tsh), and prostaglandin E receptor 1 (ptger1). These genes have been reported to be involved in the regulation of sex maturity, gonad development and reproduction by regulating the production and neuro-distribution of sex steroids [6].
It has been reported that Ca 2+ , PKCs and MAPKs may enable the gonadotrope to decode the nature of the pulsatile secretion of GnRH, and play an important role in the interaction network of the GnRHR [46]. Ca 2+ and calmodulin kinase I and II participate in decoding GnRH pulse frequencies [47]. Calcium signaling plays a key role in male reproductive health. Several studies have shown that this pathway participates in cyclic adenosine monophosphate (cAMP)-induced steroidogenesis and male fertility [48][49][50]. Cell adhesion molecules play important roles in spermatogenesis in mice [51]. In fish, this pathway was found to be significantly enriched in the sex-biased genes [9]. The MAPK signaling pathway plays an important role in the regulation of gonadotropin subunit gene expression and the transition of primary spermatocytes across the blood-testis barrier [52]. In the crucifix crab, the MAPK signaling pathway was identified as one of the enriched pathways related to the reproduction [7]. In fish, the GnRH signaling pathway plays an essential role in regulating the production of and activating signal transduction cascades that ultimately modulate gonadotropin biosynthesis [45].

Conclusions
In conclusion, this work represents the first gonadal transcriptomes of S. sihama and 74,038 unigenes were generated. By comparing ovary and testis transcriptomes, numerous DEGs known to be involved in gonadal development and gametogenesis were identified. By KEGG enrichment analyses, the top 20 KEGG pathways with the highest number of DEGs were found to be pivotal in regulating gonadal development and gametogenesis in S. sihama. Moreover, 22,666 SSRs were identified in 14,577 SSR-containing sequences. These findings provide a valuable dataset for future functional analyses of sex-associated genes and molecular marker assisted selections in S. sihama.