Comparative Transcriptome Analysis of Gonads for the Identification of Sex-Related Genes in Giant Freshwater Prawns (Macrobrachium Rosenbergii) Using RNA Sequencing

The giant freshwater prawn (Macrobrachium rosenbergii) exhibits sex dimorphism between the male and female individuals. To date, the molecular mechanism governing gonadal development was unclear, and limited data were available on the gonad transcriptome of M. rosenbergii. Here, we conducted comprehensive gonadal transcriptomic analysis of female (ZW), super female (WW), and male (ZZ) M. rosenbergii for gene discovery. A total of 70.33 gigabases (Gb) of sequences were generated. There were 115,338 unigenes assembled with a mean size of 1196 base pair (bp) and N50 of 2195 bp. Alignment against the National Center for Biotechnology Information (NCBI) non-redundant nucleotide/protein sequence database (NR and NT), the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, SwissProt database, Protein family (Pfam), Gene ontology (GO), and the eukaryotic orthologous group (KOG) database, 36,282 unigenes were annotated at least in one database. Comparative transcriptome analysis observed that 10,641, 16,903, and 3393 genes were significantly differentially expressed in ZW vs. ZZ, WW vs. ZZ, and WW vs. ZW samples, respectively. Enrichment analysis of differentially expressed genes (DEGs) resulted in 268, 153, and 42 significantly enriched GO terms, respectively, and a total of 56 significantly enriched KEGG pathways. Additionally, 23 putative sex-related genes, including Gtsf1, IR, HSP21, MRPINK, Mrr, and other potentially promising candidate genes were identified. Moreover, 56,241 simple sequence repeats (SSRs) were identified. Our findings provide a valuable archive for further functional analyses of sex-related genes and future discoveries of underlying molecular mechanisms of gonadal development and sex determination.


Introduction
The giant freshwater prawn, Macrobrachium rosenbergii (M. rosenbergii), as an important commercial freshwater prawn species, is widely distributed in China and Southeast Asian countries due to its

Sample Collection
The super female (WW genotype) individuals were purchased from the Enzootic Company from Israel (https://enzootic.com), and reared in a national M. rosenbergii seed multiplication farm in Nanning, Guangxi, China. The male (ZZ genotype) and female (ZW genotype) prawns from the same family were collected from this farm. For the WW individuals, they were achieved through a novel biotechnology with two steps: Females were sex-reversed into neo-males by injecting the suspended hypertrophied androgenic gland cells, crossing neo-males with normal females to validated WW females. Notably, the phenotypes of super females were not different from normal females and the fecundity of WW individuals has shown no significant deviation from the ZW individuals [24]. All the healthy experimental individuals used in this study were five months of age, and were maintained in aerated freshwater at 26 ± 2 • C in the previously mentioned farm. Ovaries from females (ZW) and super females (WW), and testes from males (ZZ), were collected and immediately frozen in liquid nitrogen, then stored at 80 • C until RNA extraction. For each of three groups, three biological replicates were used for RNA-seq, with each replicate being pooled by three prawns. The growth performance of the sample individuals is listed in Table 1 and Table S1.

RNA Extraction, Library Preparation, and Transcriptome Sequencing
Three individuals of each sample were pooled to extract the total RNA using Trizol Reagent (Invitrogen, Carlsbad, USA) according to the manufacturer's instructions, and then treated with RNase-free DNase I (TianGen, Beijing, China) to eliminate genomic DNA contamination. The integrity and concentration of RNA was determined using the Agilent Bioanalyzer 2100 system (Agilent Technologies, California, USA) and the Qubit ® 2.0 Flurometer (Life Technologies, Carlsbad, USA), respectively. High-quality RNA samples (OD260/280 = 1.8-2.2, OD260/230 > 2.0, RNA integrity number (RIN) > 7.5, 28S: 18S > 1) was stored at −80 • C and used for library construction. A total of nine sequencing libraries were constructed with NEBNext ® UltraTM RNA Library Prep Kit for Illumina ® (NEB, Beijing, China) following the manufacturer's recommendations. Finally, the libraries were sequenced with the Illumina Hiseq platform using paired-end strategy.

Transcriptome De Novo Assembly and Functional Annotation
The raw sequencing reads were processed through a Perl program for quality control. Clean reads were obtained by removing reads with adaptors and poly-N, removing ambiguous reads containing more than 10% unknown bases and the low-quality reads (the rate of reads in which the quality value ≤20 was more than 50%). Transcriptome assembly was accomplished by Trinity [25] with min_kmer_cov set to 2 by default and all other parameters set to default values. Then, a single set of nonredundant unigenes were acquired with the GICL clustering software [26].
Functional annotation of all assembled unigenes was performed with the following seven databases and four softwares: National Center for Biotechnology Information (NCBI) nonredundant nucleotide sequence database (Nt) based on the NCBI Blast (v2.2.28+) (https://blast.ncbi.nlm.nih.gov/Blast.cgi) (E-value < 1 × 10 −5 ) NCBI nonredundant protein sequence database (Nr), SwissProt database, Kyoto Encyclopedia of Genes and Genomes (KEGG) (E-value < 1 × 10 −10 ) using KEGG Automatic Annotation Server (KAAS) and eukaryotic orthologous group (KOG) database based on the diamond (version 0.8.22) (E-value < 1 × 10 −3 ), Protein family (Pfam) by Hmmscan with the E-value less than 0.01, and Gene Ontology (GO) through Blast2GO (version 3.0) (https://www.blast2go.com/) (E-value < 1 × 10 −6 ), respectively. The unigenes annotated by at least one database were considered as annotated successfully based on the corresponding threshold. When unigenes were different in the cases where searches against different databases disagreed with each other, the desired annotation information was according to our experiment objective and the reliability of different databases (NR > GO > KEGG > SwissProt > Pfam > KOG > NT). Thus, in our study, we preferred the annotated results of NR database.

Identification of Differentially Expressed Genes (DEGs)
RSEM software (version v1.2.15) was used for reads mapping to the assembled reference transcriptome with default parameters [27]. Gene expression levels were calculated based on the expected number of fragments per kilobase of transcript sequence per millions base pairs sequenced (FPKM). Differential gene expression analysis among the three groups was performed with the DESeq2 [28] within q-value < 0.01, |log2 (fold change)| > 2. Additionally, GO enrichment and KEGG pathway analysis of DEGs was implemented with the GOseq tool and online software KOBAS (Version 3.0) (http://kobas.cbi.pku.edu.cn/index.php) [29,30], respectively. Among them, those with p-value < 0.05 were considered to be significantly enriched.

Real-Time Quantitative Reverse Transcription PCR (qRT-PCR)
To validate the accuracy of gene expression data obtained by RNA-seq, 10 DEGs were selected to be verified by qRT-PCR using the same samples for RNA-seq. The 10 selected DEGs consisted of 6 sex-related candidate genes (ubiquitin carboxyl-terminal hydrolase 46, transcription factor SOX-10-like, heat shock protein 70, heat shock 70 kDa protein 1A/1B, male reproductive-related protein B, and male reproductive-related serum amyloid A), one DEG (heat shock protein 10) from the heat shock protein family, one DEG (slow-type skeletal muscle actin 4) with the highest expression in ZZ gonads, and the two DEGs (neuronal-specific septin-3 and tubulin α-1 chain-like) with two-fold different expression between WW gonads and ZW gonads. PCR Primers were designed with Primer 3 and Oligo 7.0 (Table S2). Then 18S was used as a housekeeping gene to normalize the mRNA levels of DEGs [31,32]. The qRT-PCR was performed in triplicate by using the Lightcycler 480 II (Roche Applied Science, Penzberg, Germany) with the following reaction conditions: Pre-denaturation at 95 • C for 10 s; amplification 45 cycles of 95 • C for 10 s, 59 • C for 10 s, and 72 • C for 10 s, and using the following program: 95 • C for 10 min; 45 cycles of 95 • C for 10 s, 60 • C for 10 s, and 72 • C for 10 s; 72 • C for 6 min. The relative expression levels were calculated with the method of 2 −∆∆Ct as described previously [33].

Sequencing Analysis and Transcriptome Assembly
In the present study, a total of 70.33 gigabases (Gb) of clean bases were generated. The clean reads ranging from 47,066,162 to 60,480,354 for nine libraries with the GC% of approximately 42% per library were used for further analysis. In addition, Q20 and Q30 of each library were higher than 97% and 93%, respectively ( Table 2). As illustrated in Figure 1, a total of 115,338 unigenes were assembled with a mean size of 1196 bp and N50 of 2195 bp.

DEGs Identification and Enrichment Analysis
As a result, 10,641 genes were significantly differentially expressed in ZW vs. ZZ and 16,903 genes were found differentially expressed in WW vs. ZZ. Between WW vs. ZW, only 3393 genes were revealed as DEGs ( Figure 3 and Table S4).
GO annotation was performed to classify the DEGs among the three groups. As a result, 268, 153, and 42 GO terms were significantly enriched, respectively ( Figure 4). Amongst them, the most enriched GO terms at the level of biological processes were "chitin metabolic process" and "glucosamine-containing compound metabolic process". At the molecular function level, "dynein binding" was the dominant term. At the cellular component, "dynein complex" was the most significantly enriched term.
Meanwhile, enrichment analysis of KEGG pathway was conducted. Totally, 56 KEGG pathways were significantly enriched. The top two signaling pathways were "amino sugar and nucleotide sugar metabolism" and "starch and sucrose metabolism" ( Figure 5).

Genes of Interest Related to Sex
Based on the functional annotation of unigenes, along with the previous publications, 23 sex-related candidate genes implicated in gonadal development and sex determination were identified (Table 4). Notably, among them, two genes (gametocyte-specific factor 1 and insulin-like receptor) showed specific expression in ovaries between female and super female. And seven genes, including heat shock protein 21, heat shock protein isoform 12Ai1, male reproductive tract-specific Kazal-type proteinase inhibitor, male reproductive-related protein, male reproductive-related protein B, male reproductive-related protein A, and male reproductive-related protein Mar-Mrr, showed male-specific expression patterns. These genes could be considered as the most promising sex-related candidates.

The qRT-PCR Validation
Furthermore, 10 DEGs were selected for qRT-PCR validation. As a result, all the detected genes showed similar expression patterns (Figure 7), which indicated the reliability and accuracy of our transcriptome analysis.

Discussion
As an economically important aquaculture species, a better understanding of the genetic and biological mechanisms underlying the complex ZW/ZZ sex determination system of M. rosenbergii is important, yet it is poorly elucidated. Sagi et al. firstly reported the sex dimorphism in M. rosenbergii [34]. Subsequently, two sex reversal experiments uncovered that M. rosenbergii bears the ZW/ZZ sex determination system. Mating of sex-reversed females (neo-males) with normal females (ZW) could increase a higher ratio of females, while mating of sex-reversed males (neo-females) with normal males could produce all-male (ZZ × ZZ) progeny [1,6,7]. Recently, along with the emergence of high-throughput sequencing, RNA-seq as a most powerful tool is available for illustrating the mechanisms of immune response [35], determining the expression pattern of reproduction, growth, and pheromone communication in hepatopancreas, gill, muscle, and antennal gland [36][37][38]. In the present study, we profiled to detect the transcriptomes of gonads in females, super females, and the males using RNA-seq, with the aim of interpreting the molecular mechanism involved in the sex determination, and identifying sex-related candidate genes.
The general consensus in the prawn (M. rosenbergii) culture industry is that the major approach to potentially boost the prawn production is to produce all females, due to their normal size distribution, higher production, and product values compared to all males [2]. Hence, in our study, to better illustrate the molecular mechanism of sex-determination, we used the normal females, super females, and males to detect the gonadal expression profile. Therein the male individuals and normal female individuals were from one full-sib family, and, along with the super females, they were reared in the same conditions and environments in the national M. rosenbergii seed multiplication farm. Moreover, all the healthy samples were five months; meanwhile, the body weight and body length of samples, as well as the development of ovaries for females and super females, were similar, thus reducing the background effects as far as possible.
We identified 23 sex-related candidate genes. Among them, two and seven genes showed specific expression patterns in ovaries and testes. Gametocyte-specific factor 1 (Gtsf1) as a member of unknown protein family 0224 (UPF0224) was involved in Piwi-interacting RNA (piRNA) pathway [39]. Some evidences indicated that Gtsf1 participated in spermatogenesis, and had significant impact on male germ cells [39,40]. Moreover, in teleost, Gtsf1 as a key candidate gene was involved in sex differentiation [41,42]. Intriguingly, in this study, Gtsf1 displayed sexual dimorphic expression and no expression was detected in testes. Furthermore, the expression in super females was more than two-fold higher than that in females. The current data insinuated that Gtsf1 could be considered as a key candidate gene related to gonadal development and sex determination.
Gonadotropin-releasing hormone II receptor (GnRHR2) was originally identified in humans [43]. Evidence has been accumulated that GnRH2 is required for sexual behavior [41]. Furthermore, initial studies have discovered that GnRHR2 together with its ligand GnRH2, were considered to be novel regulators in driving the reproduction process in mammals [44,45]. Toward this end, our data detected that GnRHR2 was upregulated in ovaries. Taken together, we conjectured that GnRHR2 has essential functions in female reproduction.
Insulin-like receptor (IR) functions as the pivotal member of insulin family signaling pathway and directs the male sexual differentiation in mammals [46]. Earlier evidence has shown that IR interacts with the insulin-like androgenic gland hormone (IAG) to regulate sex differentiation and spermatogenesis in crustaceans [13,15,46,47]. In our study, IR has shown higher expression in ovaries than in testes. Notably, the expression of IR in super females was more than two-fold than that in females. The results presented here suggest that IR could be a promising candidate gene for sex differentiation between females and super females.
Sex determination protein fruitless-like (Fru) was specially displayed in regulating the sexual orientation and sex behavior in Drosophila melanogaster [48,49]. In our transcriptome database, Fru showed sexual dimorphism in ovaries and testes, which displayed upregulated expression in ovaries. As observed in the Chinese mitten crab [50], Fru was implicated in sex determination pathway-like in Drosophila melanogaster.
Spermatogenesis-associated protein 5-like (SPATA5L) belonging to the ATPase family protein 2 homolog was first identified in mice and has essential functions in spermatogenesis, alopecia areata, intellectual disability, and other serious disorders [50]. Ge et al. have shown that down regulation of the expression of SPATA5L in males could decrease the fecundity of females during mating [51]. In our research, SPATA5L was expressed at higher levels in ovaries. Taken together, the current data provides a hint that SPATA5L might be a promising candidate for sex differentiation.
Transcription factor SOX-10-like as a member of SRY-related transcription factors from the Sox (Sry-type HMG box) E subfamily was reported to play a critical role in testis development [18,52]. In our research, it was expressed higher in ovaries than that in testes.
The ubiquitin-related homologous genes have been investigated and shown higher expression in testis and ovary, and further involved in reproductive process [53]. In our gonadal transcriptome database, two ubiquitin-related homologous genes (ubiquitin carboxyl-terminal hydrolase 46 and ubiquitin carboxyl-terminal hydrolase isozyme L5) were identified. It is important to further characterize these genes in M. rosenbergii to illustrate their potential role in sex differentiation and sex determination.
Cytochrome P450 302A1 (CYP302a1) was a new member of the cytochrome P450 (CYP) super-family. It encodes 22-hydroxylase and participates in ecdysteroid biosynthesis [54,55]. In drosophila and mosquito, CYP302a1 was highly expressed in females, which further suggested that CYP302a1 has an important role in the ovaries, as described by Chavez et al. [56] and Warren et al. [54]. Thus, our data have shown that CYP302a1 was also expressed predominantly in the ovaries, which was in accordance with previous studies and further supporting our finding of the function of CYP302a1 in the ovary.
Cytochrome P450 CYP315a1 (CYP315a1), another a member of the CYP super-family gene, was reported to be implicated in the ecdysteroidogenic pathway in Bombyx mori [57] and Drosophila melanogaster [54]. As we all know, steroid hormones were responsible for controlling reproduction and development in higher organisms and arthropods [58]. Thus, we deduced that CYP315a1 may be involved in testicular development.
Forkhead box l2 (Foxl2), a member of Fox gene family, encodes a conserved transcription factor and is a special marker of ovarian differentiation [59]. Moreover, Foxl2 plays a critical role in ovarian differentiation and maintenance [60]. In our study, interestingly, Foxl2 displayed upregulated expression in testes, and has shown similar expression pattern in Litopenaeus vannamei [21]. However, the corresponding role in M. rosenbergii remains to be determined.
Heat shock proteins (HSPs) contribute to the interaction with steroid hormone receptors, temperature, and estrogen signaling, etc. [61,62]. Furthermore, HSPs were considered as promising candidates with potential effects on temperature-dependent sex determination (TSD) [62]. In addition, Matsumoto et al. have verified that heat shock cognate 70 kDa proteins played a regulatory role in mouse spermatogenesis [63]. In the present study, all three HSPs (heat shock protein 27, heat shock protein 70, and heat shock protein 70 cognate3) exhibited upregulated expression pattern in the testes, while heat shock 70 kDa protein 1A/1B, heat shock protein 21, and heat shock protein isoform 12Ai1 displayed testis-specific expression patterns as well.
Some male reproductive-related genes were also identified. Among them, two male-biased genes (male reproductive-related protein B and male reproductive-related serum amyloid A) and two testis-specific genes (male reproductive-related protein and male reproductive-related protein A) were upregulated and showed a specific expression pattern in testes which was similar to the results of Dai et al. [64]. Meanwhile, the other two testis-specific genes, including male reproductive-related protein Mar-Mrr and male reproductive tract-specific Kazal-type proteinase inhibitor, showed a specific testicular expression pattern. Earlier studies had demonstrated that Mar-Mrr was specially expressed in the male reproductive tract, and was required for sperm function [11,14,65]. Thus, combining the current results indicates that Mar-Mrr is implicated in male-reproduction.
Peptidase inhibitors were not only functional on cell migration, signal transmission, wound healing, and tissue remodeling [66], but also on male-reproduction in Drosophila [67]. Previous studies have confirmed that the male reproduction-related peptidase inhibitor Kazal-type (MRPINK) gene has a potential effect on male reproductive processes in prawns and crab [12,68,69]. In our study, one peptidase inhibitor (male reproductive tract-specific Kazal-type proteinase inhibitor) was expressed at higher levels in male gonads, suggesting that the peptidase inhibitor plays a critical role in gonadal development.
Of note, these identified sex-related genes and RNA-seq data could promote our understanding on the molecular mechanism involved in the sex determination. In addition, RNA interference (RNAi) and clustered regularly interspaced short palindromic repeats/CRISPR associated (CRISPR/Cas9)-mediated genome editing has revolutionized the gene functional determination, and was applied to controlling of sex development in aquaculture species [15,70,71]. Definitely, the actual functional roles of these selected sex-related genes were urgently required to be investigated, and the sex-specific genes may provide strong support for further sex manipulation and monosex culture based on the genome editing strategy.
Notably, numerous SSRs were identified. Currently, SSRs, single nucleotide polymorphisms (SNPs), and insertions and deletions (indels) are widely used for genetic linkage mapping, quantitative trait locus (QTL) detection, genetic diversity assessment etc. In M. rosenbergii, See et al. have developed microsatellite markers to evaluate genetic diversity [72]. Using the high throughput sequencing, it is feasible to detect and discover the large numbers of SSRs [37,73]. Correspondingly, the SSRs identified in the present study could serve as genetic markers for further QTL mapping and marker-assisted selection (MAS) in M. rosenbergii.

Conclusions
This is the first comprehensive report on the gonadal transcriptome of the M. rosenbergii, and 115,338 unigenes were identified. By comparing ovary and testis transcriptomes, numerous DEGs were identified, and 23 sex-related genes were revealed. Moreover, 56,241 SSRs were discovered and could be used as genetic markers. These findings established a valuable database for future functional analyses of sex-related genes and understanding the molecular mechanisms of sex determination in M. rosenbergii.