De Novo Assembly, Characterization and Comparative Transcriptome Analysis of the Mature Gonads in Spinibarbus hollandi

Simple Summary Spinibarbus hollandi has been recognized as an economically important aquaculture species in southeastern China. However, the information on its reproductive regulation is scarce. The long maturity period and low egg laying amount remain a major challenge for large-scale breeding of S. hollandi. In the current study, the gonad transcriptomes of S. hollandi were first obtained through Illumina Novaseq technology. By transcriptome comparing of ovary and testis, a lot of differential expression genes were identified and most of them were supposed to participate in gonad formation, differentiation and gametogenesis. Moreover, the expression levels of some typic reproductive genes indicated they might play similar roles in gonad differentiation and development of S. hollandi. Finally, the gonad transcriptome information can help researchers understand the regulatory functions of sex-related genes in S. hollandi. Abstract Spinibarbus hollandi is an important commercial aquaculture species in southeastern China, but with long maturity period and low egg laying amount. However, there has been little study of its gonad development and reproductive regulation, which limits aquaculture production. Here, for the first time, gonadal transcriptomes of male and female S. hollandi were analyzed. A total of 167,152 unigenes were assembled, with only 48,275 annotated successfully. After comparison, a total of 21,903 differentially expressed genes were identified between male and female gonads, of which 16,395 were upregulated and 5508 were downregulated in the testis. In addition, a large number of differentially expressed genes participating in reproduction, gonad formation and differentiation, and gametogenesis were screened out and the differential expression profiles of partial genes were further validated using quantitative real-time PCR. These results will provide basic information for further research on gonad differentiation and development in S. hollandi.


Introduction
Spinibarbus hollandi, belonging to the subfamily Barbinae of Cyprinidae, is an important commercial aquaculture species in southeastern China, which is mainly distributed in Guangdong, Guangxi, Hunan, Hubei, Fujian, Anhui and Taiwan Provinces. S. hollandi has high nutritional value and medicinal value, and is popular as a table fish [1]. S. hollandi is an omnivorous species, which feeds mainly on aquatic insects, small fish, shrimp, and even blue-green algae and organic detritus [2]. Although S. hollandi is easy-to-raise, its growth rate is relatively slow, which limits the aquaculture efficiency and popularization of S. hollandi [3]. In recent years, due to its good taste and beautiful appearance, S. hollandi is increasingly becoming a popular fish with considerable economic value especially in 2 of 13 south China. S. hollandi is a gonochoristic fish species, where males mature earlier than females. The age of first maturation of S. hollandi normally takes 3 to 4 years in south China. In addition, although S. hollandi is a cyprinid fish, only a few thousand eggs are laid at one time for female fish [4]. However, dozens to millions of eggs are laid at one time in grass carp and common carp. In short, long maturity period and low egg laying amount are critical factors limiting the yield increase of S. hollandi. Thus, it is extremely important to develop our understanding of the reproductive biology of this species.
For fish, gonadal development, differentiation and maturation are more complex than other vertebrates, which are easily influenced by the environmental factors [5]. In recent years, cumulative practices have shown that understanding the mechanisms of reproductive regulation is critically important for reproduction management and breeding practice. However, previous studies mainly focused on genetic resources, rearing conditions, morphology and molecular markers associated with growth of S. hollandi [3], whereas less attention was paid to gonadal development, differentiation, maturation and gametogenesis. Thus, to better understand the reproductive mechanisms and facilitate reproductive regulation, it is necessary to explore the functional genes associated with gonad development and differentiation, and gametogenesis.
Despite the rapid development of next generation sequencing (NGS) technology, the available genetic information on S. hollandi is rather limited. Only brain and muscle transcriptome data were published to explore the mechanism of starvation response and compensatory growth [6,7]. Up to now, nearly no reproduction-related gene was characterized and fundamental information of their expression is still lacking. Due to low cost and high throughput, NGS-based transcriptome sequencing has been a most effective method to generate a large number of transcripts and gene expression profiles in recent years [8]. It has been widely used for functional gene discovery and analysis of gene expression in teleosts. In addition, using gonad comparative transcriptome analysis, the expression patterns of reproduction-related genes have been revealed in many fish species such as Spot-fin porcupine fish (Diodon hystrix) [9], spotted scat (Scatophagus argus) [10], spotted knifejaw (Oplegnathus punctatus) [11], yellow catfish (Pleteobagrus fulvidraco) [12], mandarin fish (Siniperca chuatsi) [13] and so on. All these studies characterized a lot of candidate reproduction-related genes and provided basic information for reproductive regulation.
In this study, for the first time, Illumina-based transcriptome sequencing and de novo assembly of female and male gonads was carried out in S. hollandi. After comparative transcriptome analysis, a large number of sex-biased genes were uncovered and differentially expressed genes involved in reproductive regulation were also identified and discussed. Our data will enrich the currently available genetic and genomic data on S. hollandi, and identify candidate genes participating in gonad differentiation, development, maturation, and gametogenesis.

Sample Collection
In this study, six 4-year-old S. hollandi individuals (three males and three females) were obtained from an aquaculture farm in Shaoguan, Guangdong province. The fish were anesthetized with MS-222 before euthanizing them. Gonads from each individual were collected and stored in liquid nitrogen immediately. All animal handling procedures and experimental protocols were approved by the Experimental Animal Ethics Committee of the Guangzhou University of China (No. GURBBB210405).

RNA Extraction and Library Construction
Total RNA was extracted from fresh frozen tissue using RNA isolater Total RNA Extraction Reagent (Vazyme, Nanjing, China) according to the manufacturer's instructions. The RNA concentration and purity were detected preliminarily using the Nanodrop2000 (Thermo Scientific, Wilmington, DE, USA). Then, integrity detection and quantification of RNA was carried out by Agilent 4200 Bioanalyzer (Agilent Technologies, Santa Clara,  Kit for Illumina ® (NEB, Ipswich, MA, USA). Eukaryotic mRNAs were enriched using magnetic beads with Oligo (dT). In a high temperature environment with metal ions, the RNA was fragmented and the first cDNA chain was synthesized with random hexamers, followed by the addition of enzyme, buffer, dNTPs (dATP, dTTP, dGTP, dCTP) to synthesis the second chain of cDNA. The synthesized double-stranded cDNA was purified by magnetic beads and repaired, and additional A was ligated to the tail end of cDNA, and the sequencing connector was connected. The sorted magnetic beads were used for fragment size sorting, and the sorted fragments were enriched by PCR, and finally purified to obtain the final library.

Library Sequencing, De Novo Assembly and Annotation
Using the Illumina Novaseq 6000 (Illumina, Inc., San Diego, CA, USA) high-throughput sequencing platform, the qualified libraries were sequenced with the sequencing strategy was Pair-End 150 (PE150). The amount of sequencing data for each library was not less than 6 Gb.
In order to obtain high quality assembly results for subsequent analysis, the raw reads were filtered, and low-quality reads with unknown base N beyond 5% and joint contamination were discarded. The remaining reads, also called "Clean reads", were saved in FASTQ format. Then, using Trinity software, de novo assembly was carried out [14].
The coding region of Unigene was predicted by three forward and reverse reading frames, which could produce altogether 6 kinds of coding protein sequences. After obtaining the coding protein sequences, the protein sequences were compared with the non-redundant protein sequences (Nr) (https://www.ncbi.nlm.nih.gov/ (accessed on 20 April 2021)) and the Uniprot protein (https://www.uniprot.org/ (accessed on 20 June 2021)) database. The encoding method with the best alignment (the largest alignment score) was chosen as the encoding method of the gene.
Using homology searches, unigenes were annotated against five major public databases including Nr database, the Clusters of euKaryotic Orthologous Groups (KOG, http:// www.ncbi.nlm.nih.gov/KOG (accessed on 27 June 2021)) database, the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg (accessed on 17 July 2021)) and the Uniprot protein database using BLASTx or Diamond. The NR results were further annotated by Blast2GO [15], and the unigenes were further grouped (GO, http://geneontology.org (accessed on 25 July 2021)) according to the classification of biological process, cell composition and molecular function.

Identification of Differentially Expressed Genes and Enrichment Analysis
The clean reads of each sample are first mapped to the assembled transcripts using hisat2 v2.1.0 [16]. Then, the transcripts per million (TPM), fragments per kilobase of exon model per million mapped fragments (FPKM), coverage and other gene-expressionrelated values were further calculated using stringtie v1.3.3b according to the comparison results [17]. Finally, using python prepDE.py, the output result of stringtie software was converted into a format recognized by the "edgeR" package (V3.6) [18,19], and the differential gene expression was analyzed using edgeR. The genes with p-value < 0.05 and |log2FC| > 2 are set as significantly differentially expressed genes (DEGs). The number of differentially expressed genes was counted by statistical significance, and then the clusterProfiler program of the R package [20] was applied against the background of DEGs. Based on Fisher's exact test and Benjamini correction, the pathways that were significantly enriched in differentially expressed genes compared with the whole genome background with p value < 0.05 were considered to be significantly enriched.

Validation of DEGs Using Quantitative Real-Time PCR
The quantitative real-time PCR (qRT-PCR) analysis was used to validate the reliability of these DEGs. A total of 14 differentially expressed reproduction-related genes were randomly selected for validation. First, based on the sequences of 14 differentially expressed reproduction-related genes and reference gene β-actin, specific primers were designed by Primer Premier 5.0 (Table 1). Then, the HiScript III RT SuperMix for qPCR (+gDNA wiper) (Vazyme, Nanjing, China) was used for synthesizing cDNA template. Each qRT-PCR was carried out using SYBR Green qPCR Mix (GDSBIO, Guangzhou, China) on a LightCycle 480 system. The qRT-PCR reaction included an initial denaturation step at 95 • C for 3 min, followed by 40 cycles of 95 • C for 10 s, 60 • C for 20 s and 72 • C for 20 s, and a final extension at 72 • C for 5 min, ending with a dissociation curve process. Each sample was amplified in triplicate, and the expression of each gene was normalized using β-actin by the comparative CT method (2-∆∆CT) [21] and was shown as mean ± standard error. Table 1. Primers of qRT-PCR.

Overview of Transcriptome Assembly Quality
A sum of six cDNA libraries was constructed in triplicates from testes (JC) and ovaries (LC). After quality control and data filtering, a total of 39.72 Gb clean reads was generated using an Illumina HiSeq platform, with a mean of 6.62 Gb, ranging from 6.0 to 7.28 Gb per sample. The mean values of Q20 and Q30 are higher than 95% (Table 2). After de novo assembly, 167,152 unigenes were assembled, with mean length and N50 of 871 bp and 1198 bp, respectively (Table 3). About 30.13% unigenes were in length of 300-500 bp, and 33,368 of them (19.97%) were more than 1000 bp in length ( Figure 1).

Unigene Annotation
After annotation, a total of 48,275 unigenes were successfully annotated. The highest percentage of annotated unigenes was in the Uniprot database (47,343; 98.07%) and second is the Nr database (46,151; 95.60%), while the lowest was in the GO database (23,186; 48.03%) (Figure 2A). The distribution of BLASTx top-hit species showed that Onychostoma macrolepis (12,937; 28.03%) has the biggest number of homologous genes to S. hollandi, followed by Cyprinus carpio (5924; 12.84%), Sinocyclocheilus rhinocerous (5554; 12.03%) and Sinocyclocheilus anshuiensis (4699; 10.18%) ( Figure 2B). ond is the Nr database (46,151; 95.60%), while the lowest was in the GO database (23,186; 48.03%) (Figure 2A). The distribution of BLASTx top-hit species showed that Onychostoma macrolepis (12,937; 28.03%) has the biggest number of homologous genes to S. hollandi, followed by Cyprinus carpio (5924; 12.84%), Sinocyclocheilus rhinocerous (5554; 12.03%) and Sinocyclocheilus anshuiensis (4699; 10.18%) ( Figure 2B). In addition, all unigenes were further annotated in GO, KOG and KEGG databases to get knowledge of their functional classification. A total of 23,186 (48.03%) unigenes were grouped into three categories of GO. In the biological processes category, cellular In addition, all unigenes were further annotated in GO, KOG and KEGG databases to get knowledge of their functional classification. A total of 23,186 (48.03%) unigenes were grouped into three categories of GO. In the biological processes category, cellular (19,188) was the most representative item. In the category of cellular component, the most represented term was cellular anatomical entity (22,121). In addition, in the molecular function category, the most representative terms were binding (17,708) and catalytic activity (11,582) ( Figure 3A). After KEGG annotation, a sum of 25,817 (53.48%) unigenes was divided into five different functional categories. The largest distribution was "signal transduction" (3943 unigenes), "global and overview maps" (2552 unigenes) and "immune system" (2030 unigenes) ( Figure 3B). According to KOG annotation, a total of 23,186 (48.03%) unigenes were grouped into 25 families, and the richest distribution was "General function prediction only" (5349 unigenes), followed by "Signal transduction mechanisms" (3452 unigenes); the smallest distribution was "Nuclear structure", with only 8 unigenes ( Figure 3C).

Differential Expression Analysis
The levels of gene expression were normalized using the TPM values. A total of 21,903 DEGs were identified between the ovary and testis samples, with 16,395 (74.85%) DEGs significantly high expressed in the testes and 5508 (25.15%) high expressed in the ovaries ( Figure 4A). A volcano plot shows the DEG information ( Figure 4B). The KEGG enrichment analysis revealed that the metabolic pathway was most representative, followed by the MAPK signaling pathway and the neuroactive ligand-receptor interaction pathway ( Figure 5).

Differential Expression Analysis
The levels of gene expression were normalized using the TPM values. A total of 21,903 DEGs were identified between the ovary and testis samples, with 16,395 (74.85%) DEGs significantly high expressed in the testes and 5508 (25.15%) high expressed in the ovaries ( Figure 4A). A volcano plot shows the DEG information ( Figure 4B). The KEGG enrichment analysis revealed that the metabolic pathway was most representative,     In addition, based on the GO and KEGG annotation, numerous genes related production and gonad development and differentiation were identified (Table 4). Mullerian hormone (amh), SRY-box transcription factor 9 (sox9), double sex-and m related transcription factor 2 (dmrt2) and androgen receptor (ar) were highly expres the testis, while cytochrome P450 (cypa19a), catenin beta-1 (ctnnb1) and bone morp netic protein 15 (bmp15) expressed highly in the ovary. The horizontal axis is the ratio of the number of differential genes annotated to the KEGG pathway to the total number of differential genes. The vertical is the enriched KEGG pathways. The size of the dots represents the number of genes annotated on the KEGG pathway term. The colors key, from red to blue, represent significant enrichment.

Validation of Transcriptomic Data
A total of fourteen DEGs related to sex differentiation and gonadal development were analyzed from the transcriptome data, including amh, hsd11b2, fshr, era, erb1, erb2, ar, smad4a, gdf6, bmp7, pax7, pax8, sox5 and sox9. In general, the results of qRT-PCR and RNA-seq analysis were consistent (Figure 6), and the expression of all tested genes was higher in testes than in ovaries, which indicated the gene expression profiles quantified by transcriptomic analysis were reliable and accurate.

Validation of Transcriptomic Data
A total of fourteen DEGs related to sex differentiation and gonadal development were analyzed from the transcriptome data, including amh, hsd11b2, fshr, era, erb1, erb2, ar,  smad4a, gdf6, bmp7, pax7, pax8, sox5 and sox9. In general, the results of qRT-PCR and RNAseq analysis were consistent (Figure 6), and the expression of all tested genes was higher in testes than in ovaries, which indicated the gene expression profiles quantified by transcriptomic analysis were reliable and accurate.

Discussion
Gonad development, differentiation and maturity are extremely complicated biological process, which contains a number of functional genes. In recent years, transcriptome sequencing has been demonstrated to be a highly effective method to obtain comprehensive information of gene expression for specific tissue. S. hollandi is an important commercial aquaculture species with high nutritional value and medicinal value. Its long sexual maturity time and low egg laying amount seriously hinder the development of the breeding industry. However, up to now, the molecular mechanisms of gonadal differentiation and development have not been revealed in S. hollandi. In this study, comparative transcriptome analysis of ovary and testis was first used to identify sex-related DEGs in S. hollandi.

DEGs Involved in Steroid Synthetic Pathway
Sex steroid hormones are critically important for gonadal differentiation, development and maturation in fish species, which mainly includes androgen and estrogen. In fish, the 11-ketotestosterone (11-KT) and 17-estradiol (E2) are the major androgen and estrogen that are essential to testicular and ovarian development [22]. In fact, sex steroid hormone synthesis needs a lot of steroid-metabolizing enzymes, which are encoded by many genes such as cyp19a1a, cyp11b2, hsd11b1, cyp11a1, hsd17b and so on.
Among them, the cyp19a1a gene is the most essential regulator, which can directly catalyze the switching of androgens to estrogens in the ovary. In Oreochromis niloticus and Danio rerio, knock out of the cyp19a1a gene caused a drop in E2 levels and further induced female-to-male sex reversal [23,24]. The Hsd11b2 gene encodes an important enzyme that turns testosterone into 11-KT, which is important in testis maintenance. In Danio rerio, the reproductive capability of hsd11b2 homozygous mutation adult males is almost completely abrogated [25]. The Cyp11b1 gene is also involved in the synthesis of 11-KT, encoding the critical enzyme that turns testosterone to 11-KT [26]. In the sea bass, cyp11b1

Discussion
Gonad development, differentiation and maturity are extremely complicated biological process, which contains a number of functional genes. In recent years, transcriptome sequencing has been demonstrated to be a highly effective method to obtain comprehensive information of gene expression for specific tissue. S. hollandi is an important commercial aquaculture species with high nutritional value and medicinal value. Its long sexual maturity time and low egg laying amount seriously hinder the development of the breeding industry. However, up to now, the molecular mechanisms of gonadal differentiation and development have not been revealed in S. hollandi. In this study, comparative transcriptome analysis of ovary and testis was first used to identify sex-related DEGs in S. hollandi.

DEGs Involved in Steroid Synthetic Pathway
Sex steroid hormones are critically important for gonadal differentiation, development and maturation in fish species, which mainly includes androgen and estrogen. In fish, the 11-ketotestosterone (11-KT) and 17-estradiol (E2) are the major androgen and estrogen that are essential to testicular and ovarian development [22]. In fact, sex steroid hormone synthesis needs a lot of steroid-metabolizing enzymes, which are encoded by many genes such as cyp19a1a, cyp11b2, hsd11b1, cyp11a1, hsd17b and so on.
Among them, the cyp19a1a gene is the most essential regulator, which can directly catalyze the switching of androgens to estrogens in the ovary. In Oreochromis niloticus and Danio rerio, knock out of the cyp19a1a gene caused a drop in E2 levels and further induced female-to-male sex reversal [23,24]. The Hsd11b2 gene encodes an important enzyme that turns testosterone into 11-KT, which is important in testis maintenance. In Danio rerio, the reproductive capability of hsd11b2 homozygous mutation adult males is almost completely abrogated [25]. The Cyp11b1 gene is also involved in the synthesis of 11-KT, encoding the critical enzyme that turns testosterone to 11-KT [26]. In the sea bass, cyp11b1 not only was predominantly expressed in the first stages of spermatogenesis but most likely was also expressed in spermatogonia [26]. In addition, high expression of hsd11b2 and cyp11b1 genes in the testis was also found in mandarin fish [27]. In S. hollandi, the expression of cyp11b1 and hsd11b2 genes was higher in the testis, while the cyp19a1a gene expressed highly in the ovary, indicating the expression pattern was similar among different fish species [9,10,27]. Thus, these results indicated that these DEGs play important roles in gonad development and reproduction in fish.

DEGs Involved in Gonad Differentiation and Development
In recent years, more and more genes were demonstrated to participate in the sex determination and gonad differentiation of teleosts. The upstream regulatory genes are diverse, but the downstream genes are much conserved. The common regulatory gene downstream of most sex-related genes is the cyp19a1a gene, which can directly catalyze the conversion of androgens to estrogens.
Anti-Mullerian hormone (amh), a transforming growth factor from the TGF-β family, controls the degeneration of female primordial germ cell ducts in mammals and promotes male development in vertebrates. Previous studies have demonstrated amh mutation could lead to male-to-female sex reversal in zebrafish [28] and tilapia [29]. Moreover, direct evidence revealed that amh could inhibit the transcription of cyp19a1a through Amhr2/Smads signaling [30]. Bmp15 and gdf9, as oocyte-produced signals, participate in maintaining adult female sex differentiation and, in zebrafish, females deficient in bmp15 begin development normally but switch sex during the mid-to late-juvenile stage, and become fertile males [31]. In S. hollandi, the results revealed that the expression of amh was higher in the testis, indicating the Amh pathway was of great importance in testicular differentiation regulation. In addition, bmp15 and gdf9 also showed ovary-biased expression in S. hollandi, suggesting a conserved role of bmp15 and gdf9 in teleosts.
Sox family genes are widely involved in the sex determination and differentiation process of vertebrates, and are a family of transcription factors that are closely related to the mammalian sex-determining region of the Y genes, namely sry [32]. Previous study has found that sox5 could downregulate the activity of dmy (main sex determining gene) by binding to its promoter, and females with the sox5 mutation will reverse to males in medaka [33]. Sox9 is known to have an essential role in Sertoli cell differentiation; conditional sox9 knockout testes fail to maintain germ cells in mice [34]. Previous evidence has proved that sox9 is capable of interfering with WNT/β-catenin signaling and repressing the ovarian-differentiating genes including follistatin, Iroquois-related homeobox 3 and foxl2 [35]. As in the medaka and mouse, sox5 and sox9 also showed high expression in the testis of S. hollandi, suggesting an important function of these genes in vertebrates.

DEGs Involved in Gametogenesis and Gamete Maturation
In aquaculture practice, it is extremely important to obtain parental fish with welldeveloped gametes. Many genes are involved in the regulation of gamete development and maturation. Here, the expression of many genes involved in gametogenesis and gamete maturation was also characterized, such as spata16, zp3 and zp4. Spata16, namely spermatogenesis-associated protein 16, is essential for spermiogenesis [36] and male fertility; homozygous mutation of spata16 leads to male infertility [37]. Zp3 is a major kind of female-specific factor involved in the regulation of the reproductive process, forming an extracellular matrix surrounding oocytes which mediates sperm binding. Similar to zp3, zp4 also surround oocytes and may act as a sperm receptor. In this study, the expression of zp3 and zp4 was higher in the ovary, indicating that they might participate in ovarian folliculogenesis in S. hollandi. In addition, spata13, spag5 and strbp also showed higher expression in the ovaries, suggesting an important role in oogenesis and oocyte maturation. In all, similar expression patterns of these reproduction-related genes were also found in Diodon hystrix [9], Scatophagus argus [10], Patinopecten yessonsis [38] and so on, indicating a conserved role of these genes in teleosts.

Conclusions
This work is the first study on the gonad transcriptome of S. hollandi. Based on testis and ovary transcriptome data, a number of 167,152 unigenes were identified. Comparative transcriptome analysis identified a large scale of DEGs involved in gonadal development, differentiation and gametogenesis. For these DEGs related to gonad development and reproduction, similar expression profiles were found, suggesting their conserved roles in gonad development and gametogenesis. Our results will provide a valuable resource for further research on sex determination and gonad development.