Transcriptome Sequencing Analysis Reveals Dynamic Changes in Major Biological Functions during the Early Development of Clearhead Icefish, Protosalanx chinensis

Early development, when many important developmental events occur, is a critical period for fish. However, research on the early development of clearhead icefish is very limited, especially in molecular research. In this study, we aimed to explore the dynamic changes in the biological functions of five key periods in clearhead icefish early development, namely the YL (embryonic), PM (first day after hatching), KK (fourth day after hatching), LC (seventh day after hatching), and SL (tenth day after hatching) stages, through transcriptome sequencing and different analysis strategies. A trend expression analysis and an enrichment analysis revealed that the expression ofgenes encoding G protein-coupled receptors and their ligands, i.e., prss1_2_3, pomc, npy, npb, sst, rln3, crh, gh, and prl that are associated with digestion and feeding regulation gradually increased during early development. In addition, a weighted gene co-expression network analysis (WGCNA) showed that eleven modules were significantly associated with early development, among which nine modules were significantly positively correlated. Through the enrichment analysis and hub gene identification results of these nine modules, it was found that the pathways related to eye, bone, and heart development were significantly enriched in the YL stage, and the ccnd2, seh1l, kdm6a, arf4, and ankrd28 genes that are associated with cell proliferation and differentiation played important roles in these developmental processes; the pak3, dlx3, dgat2, and tas1r1 genes that are associated with jaw and tooth development, TG (triacylglycerol) synthesis, and umami amino acid receptors were identified as hub genes for the PM stage; the pathways associated with aerobic metabolism and unsaturated fatty acid synthesis were significantly enriched in the KK stage, with the foxk, slc13a2_3_5, ndufa5, and lsc2 genes playing important roles; the pathways related to visual perception were significantly enriched in the LC stage; and the bile acid biosynthetic and serine-type peptidase activity pathways were significantly enriched in the SL stage. These results provide a more detailed understanding of the processes of early development of clearhead icefish.


Introduction
Clearhead icefish [1] (Salmoniformes, Salangidae, Protosalanx, Protosalanx chinensis) is the largest and the only ferocious carnivorous species in the family of Salangidae ( Figure 1) [2]. As an endemic species of East Asia, clearhead icefish is mainly distributed in China and the west coast of the Korean Peninsula, and now only exists as natural groups [3]. Because of its delicious taste, high nutritional value, and edible whole body, clearhead icefish has become an important export that inputs foreign exchange into China. In addition, its one-year life cycle and complex ecological niche make it an important ecological species in lake governance [4][5][6]. However, at present, the species cannot be artificially cultivated, and therefore, the annual replenishment in natural water can only rely on natural reproduction or artificially stocked fertilized eggs. Clearhead icefish is a rare winter spawning group, the sum of effective temperature (SET) of hatching is about 5100 degrees Celsius hours, and the parental generation has no nurturing behavior. Therefore, in a natural environment, the survival rate during the stages of early development is very low, and the population number is also prone to large fluctuations due to environmental changes [7,8]. The embryonic and larval stages are one of the most critical periods in the life cycle of the fish, during which many key developmental events occur, and the normal development of this process is key to the maintenance of population size for a fish with a shorter lifespan. Hence, adequate research into this process can deepen our understanding of the early development of fish and can provide basic data for factory farming. To date, many scholars have studied the developmental processes of fertilized eggs [9,10], feeding habits [6], whole genome [11], and skeletal and immune system development [2] of clearhead icefish, but the dynamic changes in the main biological functions as well as the underlying molecular regulatory mechanisms during its early developmental processes remain unexplored.

Sample Collection
The fertilized eggs of clearhead icefish were collected within 24 h of fertilization at the Doushan Reservoir (35.3328° N, 118.8806° E) in Linyi City, Shandong Province, China, on 18 December 2018, and were subsequently transported to the Wuxi base of the Chinese Academy of Fishery Sciences under 0 °C cold storage conditions. The hatching of fertilized eggs and larval rearing were both performed in a plastic box (length, width, and height 0.4 m × 0.3 m × 0.2 m) with fresh water at 9.2 ± 0.5 °C. Since the first mouth-open time of the P. chinensis was uncertain, and the first feeding time of marine fish was generally 3-20 days [27], therefore, we started feeding the larvae (twice a day, feeding egg yolks, rotifers, copepods, and Cladocera) from the third day after hatching, i.e., 18 January 2019, and daily microscopy was performed after half an hour of feeding each day to see if the larvae had eaten.
In this study, five representative key periods of early development were selected: YL (embryonic stage, organ formation, and tissue differentiation stage), PM (1st day after hatching, endogenous trophic period), KK (4th day after hatching, mouth-open period), LC (7th day after hatching, mixed nutrition period), and SL (10th day after hatching, exogenous nutrients period, with large zooplankton (Cladocera and copepods) for food), for regular sampling, three replicates per group (10-20 individuals per replicate), for a total of 15 samples. The collected samples were magnified 20-30× using an OLYMPUS SZX16 microscope (Tokyo, Japan) to ensure that the samples were at the same stage of development. Next, the collected samples were placed in a centrifuge tube, and then immediately placed in liquid nitrogen, and subsequently, stored in an ultra-low freezer at −80 °C for RNA extraction.

RNA Extraction, cDNA Library Construction, and Sequencing
The total RNA was extracted according to the instructions provided for a TRIzol reagent (Invitrogen, Carlsbad, CA, USA). Subsequently, the total RNA quality and integrity were measured using 1% agarose gel electrophoresis (voltage 200 V, time 25 min), and Andy Safe™ Nucleic Acid Gel Stain (Applied BioProbes, Davis, CA, USA) was Transcriptome sequencing is considered to be an effective method for rapid identification and quantification of gene transcripts, and has been widely used in cancer detection and treatment [12,13], biological development [14,15], gene response to environmental changes [16], discovery of tissue markers [17], and so on. Moreover, it has also been successfully applied to explore the internal molecular mechanism changes during the early development of fish. Based on transcriptomic sequencing, a large number of scholars have studied the molecular mechanisms in the processes of fish fertilized egg to embryo [18], endogenous to exogenous [19], embryonic development [20,21], and morphological changes of larva [22], which have improved our understanding of the changes in genetic information in the early development of fish. However, most studies have mainly been based on comparative transcriptome analyses that have explored the dynamic expression of transcripts in several periods, and there have been few studies on the functions of the main related genes in each period. A weighted gene co-expression network analysis (WGCNA) is a good tool to study growth and development, which can be used to identify modules and core regulatory genes related to each developmental stage. It has been widely used in plant seed development [23], human brain development [24], early skeletal muscle development [25,26], and so on, but its application in the early development of fish is still limited. Therefore, the combination of transcriptome sequencing and WGCNA can help to identify molecular features during early development, and accordingly, can provide deeper insights into the dynamics of early development in this species.
In order to comprehensively explore the dynamic changes in the main biological functions of the early development of clearhead icefish and the underlying molecular regulatory mechanisms, samples from five key periods were collected for transcriptome sequencing and assembly on the basis of the preliminary research of our group. Subsequently, the dynamic changes of the main biological mechanisms during the early development of this species were explored based on a time series analysis and a WGCNA analysis. The results of this study should further promote our understanding of the processes during the early developmental stages of this species, and should lay a theoretical basis for seedling foundation and suitable bait application.

Sample Collection
The fertilized eggs of clearhead icefish were collected within 24 h of fertilization at the Doushan Reservoir (35.3328 • N, 118.8806 • E) in Linyi City, Shandong Province, China, on 18 December 2018, and were subsequently transported to the Wuxi base of the Chinese Academy of Fishery Sciences under 0 • C cold storage conditions. The hatching of fertilized eggs and larval rearing were both performed in a plastic box (length, width, and height 0.4 m × 0.3 m × 0.2 m) with fresh water at 9.2 ± 0.5 • C. Since the first mouth-open time of the P. chinensis was uncertain, and the first feeding time of marine fish was generally 3-20 days [27], therefore, we started feeding the larvae (twice a day, feeding egg yolks, rotifers, copepods, and Cladocera) from the third day after hatching, i.e., 18 January 2019, and daily microscopy was performed after half an hour of feeding each day to see if the larvae had eaten.
In this study, five representative key periods of early development were selected: YL (embryonic stage, organ formation, and tissue differentiation stage), PM (1st day after hatching, endogenous trophic period), KK (4th day after hatching, mouth-open period), LC (7th day after hatching, mixed nutrition period), and SL (10th day after hatching, exogenous nutrients period, with large zooplankton (Cladocera and copepods) for food), for regular sampling, three replicates per group (10-20 individuals per replicate), for a total of 15 samples. The collected samples were magnified 20-30× using an OLYMPUS SZX16 microscope (Tokyo, Japan) to ensure that the samples were at the same stage of development. Next, the collected samples were placed in a centrifuge tube, and then immediately placed in liquid nitrogen, and subsequently, stored in an ultra-low freezer at −80 • C for RNA extraction.

RNA Extraction, cDNA Library Construction, and Sequencing
The total RNA was extracted according to the instructions provided for a TRIzol reagent (Invitrogen, Carlsbad, CA, USA). Subsequently, the total RNA quality and integrity were measured using 1% agarose gel electrophoresis (voltage 200 V, time 25 min), and Andy Safe™ Nucleic Acid Gel Stain (Applied BioProbes, Davis, CA, USA) was used for precast gel staining; RNA concentration and OD260/OD280 ratio were detected by Agilent 2100 (Agilent Technologies, Santa Clara, CA, USA). Well-detected RNA was used for library construction and sequencing, which was entrusted to the BGI (Wuhan, Hubei, China). The construction process of the cDNA library was as follows: The mRNA with polyA tail was enriched with magnetic beads with OligodT, and then interrupted with buffer. Reverse transcription of the interrupted mRNA with random N6 primers, and then synthesized two strands of cDNA to form double-stranded DNA. The ends of the synthesized doublestranded DNA were filled in and the 5 end was phosphorylated, and the 3 end was added with a dA tail and a bubble adapter. The ligation product was amplified by PCR with specific primers, and the PCR product was thermally denatured into single strands, and then the single-stranded DNA was circularized with a bridge primer to obtain a single-stranded circular DNA library. Library quality was evaluated using an Agilent 2100 bioanalyzer system, and then the libraries were loaded into the patterned nanoarrays, and the pair-end reads of 100 bp were read through on the BGISEQ-500 platform for subsequent data analysis. All raw data used in the main paper were deposited in the Sequence Read Archive (SRA) database at the National Centre for Biotechnology Information (NCBI).

Genome Mapping and Quantification of Gene Expression Levels
The raw data were processed, and reads with low quality, contaminated linkers, and high N content of unknown bases were filtered through the SOAPnuke (version 1.4.0) software to obtain clean reads. A subsequent analysis was based on high-quality clean reads. HISAT2 (version 2.1.0) [28] and Bowtie2 (version 2.2.5) [29] were used to align the clean reads onto the reference genome which was the existing clearhead icefish whole genome sequence (http://gigadb.org/dataset/view/id/100262, accessed on 5 May 2019). Then, the clean reads were assembled into transcripts using StringTie (version 1.0.4), and the abundance estimation and the expression levels of transcripts and genes were calculated through the RSEM (version 1.3.0) software [30]. Later, in order to make the results of a subsequent analysis more reliable, all low-expression genes (FPKM value < 1) were discarded.

Sample Correlation and Clustering Analysis
Based on the FPKM value of all genes (FPKM value ≥ 1) for the 15 samples, the sample correlation heat map was created using an online free website (http://www.bioinformatics. com.cn/plot_basic_matrix_heatmap_064, accessed on 25 July 2021). The principal component analysis (PCA) and hierarchical clustering analysis (HCA) were performed by using an online website (https://www.omicshare.com/tools/, accessed on 25 July 2021).

Trend Analysis of Gene Expression and Weighted Gene Co-Expression Network Analysis (WGCNA)
The trend analysis was performed on the OmicShare platform to exhibit the time sequence profile of gene expression (FPKM value ≥ 1) (https://www.omicshare.com/ tools/, accessed on 1 August 2021). Profiles with a p-value < 0.05 determined by STEM were considered to be statistically significant. The co-expression network was constructed using the WGCNA package [31] in R to identify target modules at different developmental stages. All the genes with FPKM levels ≥ 1 in the 15 libraries were used to construct the network. The soft threshold power network topology analysis was performed, and then a suitable power value was selected to use the one-step method to construct the correlation network. A gene dendrogram was generated based on TOM, and highly correlated genes were divided into a module with a threshold of minimum module size of 30 genes. Then, the module-sample correlation heat map was produced. A module with a p-value < 0.05 and a higher correlation coefficient was identified as a target module.

Functional Enrichment Analysis
In order to have a comprehensive understanding of the main physiological functions of clearhead icefish at each stage of early development, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed, and the p-values were calculated based on the phyper function in the R software (version 4.1.0). GO terms and KEGG pathways with p-values < 0.05 were considered to be significantly enriched. The KEGG enrichment map and GO enrichment map were generated through the online website (http://www.bioinformatics.com.cn/, accessed on 1 August 2021), and the KEGG mapper of the significant pathway was made through the online website (https://www.genome.jp/kegg/mapper/, accessed on 3 August 2021).

Identification and Visualization of Hub Genes
Genes in modules that were highly related to the developmental stages were highly correlated and might also have similar biological functions. In order to identify the hub genes of each target module, the top 200 genes with the highest correlation in each target module according to the cor.geneModuleMembership were selected and used to construct a protein-protein interaction (PPI) network. After that, the correlations among genes were calculated using the R package psych according to the expression levels of the genes at each stage, and then the packages stringr and dplyr were used to filter out the genes with low correlation (p > 0.05, |R| < 0.8) and output the result in a format accepted by the Cytoscape (version 3.8.2) software [32]. The correlation network was presented using the Cytoscape (version 3.8.2) software, and the top 3-6 genes with the highest node degree were identified as hub genes [33].

Quantitative Real-Time PCR (qPCR)
In order to verify the reliability of the transcriptome data, six genes were randomly selected for qRT-PCR (prss (LS_GLEAN_10003514), apob-48 (LS_GLEAN_10003911), cd36 (LS_GLEAN_10003179), apoa-1 (LS_GLEAN_10004674), muc2 (LS_GLEAN_10008706), and slc6a19 (LS_GLEAN_10011640)), and Primer Premier 5.0 was used to design primers for the selected genes. The gapdh (glyceraldehyde-3-phosphate dehydrogenase) gene was used as a reference gene to normalize the expression levels among samples [34] (Table S1). The primers were synthesized by the Yixin Biotechnology Co., Ltd. (Shanghai, China). Real-time PCR was performed using a LightCycler ® 480 II Real-time PCR Instrument (Roche, Basel, Switzerland) with 20 µL of PCR reaction mixture that included 1 µL of cDNA, 10 µL of ChamQ Universal SYBR Qpcr Master Mix, 0.4 µL of each primer (10 µM), and 8.2 µL of nuclease-free water. The PCR conditions were as follows: 95 • C for 30 s, and then 40 cycles of 95 • C for 10 s, 60 • C for 30 s, and 72 • C for 30 s. Each gene had three biological duplications. At the end of the PCR cycles, a melting curve analysis was performed to validate the specific generation of the expected PCR product. Relative expression was calculated using the 2 −(∆∆Ct) method and using Excel 2019 [35]. The OriginPro 2021was used for the qRT-PCR and transcriptome trend analyses. In the correlation analyses, the relative expression qRT-PCR and FPKM values were corrected using log transformation, and to avoid arithmetic error and large negative values in transformation, a pseudo count of 1 was added to each value before log transformation [20]. The SPSS Statistics v26 was used to perform a spearman correlation analysis, and the scatter plots were visualized using the OriginPro 2021.

Overview of Transcriptome Data of Every Developmental Stage
A total of 96.35 GB clean bases were obtained from the 15 cDNA libraries after quality control. The Q20 and Q30 of each transcriptome were 97.94-98.27% and 91.03-92.38%, respectively. In addition, the ratio of comparison to reference genome ranged from 71.89% to 77.51% (Table S2). A total of 22,082 genes were obtained from five stages, and 19,857 genes were expressed in all five stages ( Figure 2A). The original data were stored in the NCBI under BioProject accession: PRJNA687639.
The gene expression was calculated using the FPKM method (fragments per kilobase of transcript per million mapped reads). In order to make the results of subsequent analysis more reliable, the genes with low expression (FPKM < 1) in all groups were removed, and finally 16,600 genes with high expression were obtained. In addition, the real-time fluorescence quantitative (qRT-PCR) analysis showed that the expression trends of the six genes were basically consistent with the transcriptome analysis results, showing a trend of gradual increase in expression levels over time ( Figure S1). Significant positive correlations (r s = 0.624, p < 0.01) were observed between the RNA-seq and qRT-PCR data (log transformed) ( Figure S2), which indicated that the transcriptome data was reliable.

Sample Correlation and Cluster Analysis
A sample correlation analysis, principal component analysis (PCA), and hierarchical clustering analysis (HCA) were performed to obtain a comprehensive understanding of the relationships among samples. The sample correlation analysis revealed that the correlation between biological repeats was greater than 0.89, and the correlation between groups was greater than 0.77 ( Figure 2B), indicating good correlation among samples within groups and similar expression levels. The PCA and HCA showed that the samples of the YL and PM groups were clustered together, and those of the KK, LC, and SL groups were clustered together ( Figure 2C,D). clustering analysis (HCA) were performed to obtain a comprehensive understanding of the relationships among samples. The sample correlation analysis revealed that the correlation between biological repeats was greater than 0.89, and the correlation between groups was greater than 0.77 ( Figure 2B), indicating good correlation among samples within groups and similar expression levels. The PCA and HCA showed that the samples of the YL and PM groups were clustered together, and those of the KK, LC, and SL groups were clustered together ( Figure 2C,D).

Gene Expression Trend Analysis and Enrichment Analysis during Early Development of Clearhead Icefish
All genes with expression levels greater than 1 were grouped into 20 specific expression profiles, four of which were statistically significant, such as continuously increasing expression (Profile 19), continuously decreasing expression (Profile 0), and oscillatory expression (Profile 7 and Profile 15) ( Figure 3), suggesting that gene expression was tightly regulated during early development. Profile 19 could be involved in early development, because the expression levels of the genes in this profile were gradually upregulated with fish development.

Gene Expression Trend Analysis and Enrichment Analysis during Early Development of Clearhead Icefish
All genes with expression levels greater than 1 were grouped into 20 specific expression profiles, four of which were statistically significant, such as continuously increasing expression (Profile 19), continuously decreasing expression (Profile 0), and oscillatory expression (Profile 7 and Profile 15) ( Figure 3), suggesting that gene expression was tightly regulated during early development. Profile 19 could be involved in early development, because the expression levels of the genes in this profile were gradually upregulated with fish development.  The KEGG and GO enrichment analyses were performed to further understa 2032 genes in Profile19 ( Figure 4A,B). The GO enrichment analysis showed that protein-coupled receptor signaling pathway" was the most significantly en biological process term. In addition, biological processes related to vision and im  The KEGG and GO enrichment analyses were performed to further understand the 2032 genes in Profile19 ( Figure 4A,B). The GO enrichment analysis showed that the "G protein-coupled receptor signaling pathway" was the most significantly enriched biological process term. In addition, biological processes related to vision and immunity were significantly enriched. The KEGG enrichment results showed that the "neuroactive ligand-receptor interaction pathway" was the most significant enriched pathway, and large numbers of genes were enriched into pathways related to tissue systems and metabolism, such as: "phototransduction", "olfactory transduction", "circadian entrainment", and "arachidonic acid metabolism". The results indicated that the tissue system of clearhead icefish was gradually perfected, and that its digestive ability was enhanced during early development. The KEGG and GO enrichment analyses were performed to further understand the 2032 genes in Profile19 ( Figure 4A,B). The GO enrichment analysis showed that the "G protein-coupled receptor signaling pathway" was the most significantly enriched biological process term. In addition, biological processes related to vision and immunity were significantly enriched. The KEGG enrichment results showed that the "neuroactive ligand-receptor interaction pathway" was the most significant enriched pathway, and large numbers of genes were enriched into pathways related to tissue systems and metabolism, such as: "phototransduction", "olfactory transduction", "circadian entrainment", and "arachidonic acid metabolism". The results indicated that the tissue system of clearhead icefish was gradually perfected, and that its digestive ability was enhanced during early development. According to the above KEGG enrichment results, a total of 125 genes were enriched into the "neuroactive ligand-receptor interaction" pathway in Profile 19, suggesting that this pathway might play an important role in the early development of this species. For According to the above KEGG enrichment results, a total of 125 genes were enriched into the "neuroactive ligand-receptor interaction" pathway in Profile 19, suggesting that this pathway might play an important role in the early development of this species. For the sake of further comprehension of the genes in this pathway, we collated the pathway map, as shown in Figure 5. The expression levels of pomc, npy, npb, sst, prss1_2_3, rln3, crh, gh, prl, and their receptors gradually increased during early development, with the highest expression levels in the SL stage. The details of 125 genes are shown in Table S3.

Weighted Gene Co-Expression Network Analysis and Identification of Modules Related to Early Development
The 16,600 genes with high expression (FPKM ≥ 1) were used for the weighted gene co-expression network analysis. The weight parameters of the adjacency matrix (soft threshold) were calculated using the pick soft threshold function, and the results showed that when β = 23, the scale-free topology index was greater than 0.8 ( Figure 6A). At this point, the network conformed to the power-law distribution and was closer to the real biological network state. Therefore, with β = 23, the co-expression matrix was further constructed, and the genes with high correlation coefficients were gathered together to form a gene tree ( Figure 6B). Finally, 18 modules were obtained, with the number of genes in the modules from 46 to 3690, and then the module-sample correlation heat map was generated. Eleven of the 18 modules were identified as target modules (brown, red, purple, grey, pink, cyan, lightcyan, yellow, turquoise, black, and midlightblue) (R > 0.6 or R < −0.6, p < 0.05) (Figure 7). 022, 7, x FOR PEER REVIEW 8 the sake of further comprehension of the genes in this pathway, we collated the path map, as shown in Figure 5. The expression levels of pomc, npy, npb, sst, prss1_2_3, rln3 gh, prl, and their receptors gradually increased during early development, with highest expression levels in the SL stage. The details of 125 genes are shown in Table   Figure

Weighted Gene Co-Expression Network Analysis and Identification of Modules Related Early Development
The 16,600 genes with high expression (FPKM ≥ 1) were used for the weighted co-expression network analysis. The weight parameters of the adjacency matrix threshold) were calculated using the pick soft threshold function, and the results sho that when β = 23, the scale-free topology index was greater than 0.8 ( Figure 6A). A point, the network conformed to the power-law distribution and was closer to the biological network state. Therefore, with β = 23, the co-expression matrix was fu constructed, and the genes with high correlation coefficients were gathered togeth form a gene tree ( Figure 6B). Finally, 18 modules were obtained, with the number of g in the modules from 46 to 3690, and then the module-sample correlation heat map generated. Eleven of the 18 modules were identified as target modules (brown, purple, grey, pink, cyan, lightcyan, yellow, turquoise, black, and midlightblue) (R > 0 R < −0.6, p < 0.05) (Figure 7).

GO and KEGG Enrichment Analyses of Genes in Each Target Module
In order to understand the dynamic changes in the biological functions during the early development of this species, gene ontology (GO) and KEGG enrichment analyses were performed for the nine target modules that were significantly positively correlated with the early developmental stages (Table S4).
Brown and red modules (2077 and 1084 genes) were significantly positively

GO and KEGG Enrichment Analyses of Genes in Each Target Module
In order to understand the dynamic changes in the biological functions during the early development of this species, gene ontology (GO) and KEGG enrichment analyses were performed for the nine target modules that were significantly positively correlated with the early developmental stages (Table S4).
Brown and red modules (2077 and 1084 genes) were significantly positively correlated with the YL stage. The GO enrichment analysis results showed that genes in the brown module were significantly enriched in the cell cycle process (mitotic cell cycle and regulation of the metaphase/anaphase transition of the cell cycle) and development (embryo development, muscle organ development, muscle structure development, pronephros development, heart valve development, photoreceptor cell development, diencephalon development, bone development, hindbrain development, and neural crest cell development). The genes in the red module were mainly enriched in organ formation and development (chondrocyte differentiation, lymph vessel morphogenesis, pancreas development, cranial nerve morphogenesis, and digestive system development). According to the KEGG pathways enrichment results, genes in the brown and red modules were mainly enriched in pathways related to cell proliferation, such as the "notch signaling pathway", "wnt signaling pathway", and "cell cycle". In addition, in the red module, the "thyroid hormone signaling pathway" and "steroid biosynthesis" were also significantly enriched pathways.
The purple and grey modules (265 and 1281 genes) were significantly positively correlated with the PM stage. The GO enrichment analysis results showed: The genes assigned to the purple module were mainly enriched in organ and system development (central nervous system neuron development, pharyngeal system development, hard palate development, and limb development), and metabolic processes (regulation of the primary metabolic process and regulation of the cellular metabolic process); the genes in the grey module were enriched in the metabolic process (lipid metabolic process, retinoic acid metabolic process, and carbohydrate catabolic process) and biosynthetic process (lipid biosynthetic process, retinoic acid biosynthetic process, and L-methionine biosynthetic process). The KEGG enrichment results showed: The genes in the two modules were significantly enriched in the pathways relaed to lipid metabolism, such as "alpha-linolenic acid metabolism", "linoleic acid metabolism", and "glycerolipid metabolism".
The cyan and pink modules (100 and 455 genes) were significantly positively correlated with the KK stage. The GO enrichment analysis results showed: The muscle part (myofilament, contractile fiber part, sarcomere, and skeletal muscle contraction) and the metabolic process (regulation of fatty acid metabolic process, regulation of fatty acid oxidation, and fructose metabolic process) were significantly enriched in the cyan module; the genes in the pink module were significantly enriched in cellular respiration (tricarboxylic acid cycle, citrate metabolic process, tricarboxylic acid metabolic process, generation of precursor metabolites, and energy) and enzyme activity (S-succinyltransferase activity, citrate (Si)-synthase activity, NADH dehydrogenase activity, and succinate-CoA ligase activity). The KEGG enrichment results showed: "Oxidative phosphorylation", "thermogenesis", and "cardiac muscle contraction" were significantly enriched in both the cyan and pink modules; in addition, the "citrate cycle (TCA cycle)" was significantly enriched in the pink module and the pathways related to fatty acid synthesis and amino acid metabolism were also significantly enriched pathways, including "biosynthesis of unsaturated fatty acids" and "fatty acid degradation", "valine, leucine and isoleucine degradation", "tyrosine metabolism" and "alanine, aspartate and glutamate metabolism".
The lightcyan module (48 genes) was significantly positively correlated with the LC stage. The genes in this module were mainly enriched GO terms related to the visual process (visual perception, retinal binding, sensory perception of light stimulus, and visual behavior) and enzyme activity (6-phosphofructokinase activity and rhodopsin kinase activity). The KEGG enrichment results showed that "phototransduction" was the most significantly enriched pathway. In addition, the pathways related to fatty acid and amino acid metabolism were also significantly enriched, such as "fatty acid degradation" and "arginine and proline metabolism".
The yellow and turquoise modules (1661 and 3690 genes) were significantly positively correlated with the SL stage. The genes in the yellow module were significantly enriched in the biosynthetic process (bile acid biosynthetic process and tetrahydrofolate biosynthetic process), metabolic process (steroid catabolic process, cholesterol catabolic process, and bile acid metabolic process), development (embryonic skeletal system development), and enzyme activity (serine-type peptidase activity and serine hydrolase activity). The genes in the turquoise module were enriched in the channel activity (extracellularly glutamategated ion channel activity and extracellular ligand-gated ion channel activity), transporter activity (inorganic cation transmembrane transporter activity, inorganic molecular entity transmembrane transporter activity, and metal ion transmembrane transporter activity), enzyme activity (serine-type peptidase activity and serine hydrolase activity), and circadian rhythm. The KEGG enrichment results showed that "pancreatic secretion", "bile secretion", and "phenylalanine, tyrosine and tryptophan biosynthesis" were significantly enriched in the two modules. In addition, in the yellow module, "phototransduction" was the most significantly enriched pathway; "circadian rhythm" and "lipoic acid metabolism" were significantly enriched in the turquoise module.

Identification and Visualization of Hub Genes
For the sake of further comprehension of the relationship between significantly enriched genes in each target module, the correlation network was constructed based on the top 200 genes in seven target modules with a large number of genes, including brown, red, purple, grey, yellow, pink, and turquoise modules. It is worth noting that since there were only 100 and 48 genes in the cyan and lightcyan modules, all genes in these two modules were selected for constructing the correlation network. The first three to five genes with the highest degree of connection were identified as hub genes, which were marked with special colors, as shown in Figure 8. A complete list of hub genes and their descriptions are shown in Table S5. For example, ccnd2 and seh1l were identified as hub genes of the brown module ( Figure 8A and Table S5); kdm6a, arf4, and ankrd28 were identified as hub genes of red module ( Figure 8B, Table S5); in the purple module, pak3 and dlx3 were identified as hub genes ( Figure 8C and Table S5); tas1r1 and dgat2 were identified as hub genes of the grey module ( Figure 8D and Table S5); foxk, slc13a2_3_5, ndufa5, and lsc2 were identified as hub genes of the pink module ( Figure 8F and Table S5); in the lightcyan module, prph2 was identified as a hub gene ( Figure 8G and Table S5); in the yellow and turquoise modules, scn4b, syt12, sag, gabrp, chd3, tspan4, slc7a8, and nlgn were identified as hub genes ( Figure 8H,I and Table S5).

Identification and Visualization of Hub Genes
For the sake of further comprehension of the relationship between significantly enriched genes in each target module, the correlation network was constructed based on the top 200 genes in seven target modules with a large number of genes, including brown, red, purple, grey, yellow, pink, and turquoise modules. It is worth noting that since there were only 100 and 48 genes in the cyan and lightcyan modules, all genes in these two modules were selected for constructing the correlation network. The first three to five genes with the highest degree of connection were identified as hub genes, which were marked with special colors, as shown in Figure 8. A complete list of hub genes and their descriptions are shown in Table S5. For example, ccnd2 and seh1l were identified as hub genes of the brown module ( Figure 8A and Table S5); kdm6a, arf4, and ankrd28 were identified as hub genes of red module ( Figure 8B, Table S5); in the purple module, pak3 and dlx3 were identified as hub genes ( Figure 8C and Table S5); tas1r1 and dgat2 were identified as hub genes of the grey module ( Figure 8D and Table S5); foxk, slc13a2_3_5, ndufa5, and lsc2 were identified as hub genes of the pink module ( Figure 8F and Table S5); in the lightcyan module, prph2 was identified as a hub gene ( Figure 8G and Table S5); in the yellow and turquoise modules, scn4b, syt12, sag, gabrp, chd3, tspan4, slc7a8, and nlgn were identified as hub genes ( Figure 8H,I and Table S5).

Discussion
The embryonic and larval stages are the most critical stage in the life cycle of fish, and many key developmental events occur in these stages. Because the tissue system of a fish larva is not fully developed at these stages, the fish body is fragile and susceptible to

Discussion
The embryonic and larval stages are the most critical stage in the life cycle of fish, and many key developmental events occur in these stages. Because the tissue system of a fish larva is not fully developed at these stages, the fish body is fragile and susceptible to the influence of the external environment, leading to a high mortality rate. A comprehensive understanding of the molecular mechanisms behind these events is important for healthy growth of fish and factory farming. In this study, a total of 22,082 genes were identified in the five developmental stages, of which 16,600 genes had expression levels greater than one. We analyzed the expression trend analysis and WGCNA of the 16,600 genes. The dynamics of major biological events during the early development of clearhead icefish are shown in Figure 9. With continuous development in the early stages, its tissue system was constantly developing and improving, along with continuous improvement of digestive ability and feeding regulation ability, as well as increased feeding. These data provide insights into the molecular mechanism of early development, and may help in the selection of initial feeding and artificial culture of clearhead icefish. the influence of the external environment, leading to a high mortality rate. A comprehensive understanding of the molecular mechanisms behind these events is important for healthy growth of fish and factory farming. In this study, a total of 22,082 genes were identified in the five developmental stages, of which 16,600 genes had expression levels greater than one. We analyzed the expression trend analysis and WGCNA of the 16,600 genes. The dynamics of major biological events during the early development of clearhead icefish are shown in Figure 9. With continuous development in the early stages, its tissue system was constantly developing and improving, along with continuous improvement of digestive ability and feeding regulation ability, as well as increased feeding. These data provide insights into the molecular mechanism of early development, and may help in the selection of initial feeding and artificial culture of clearhead icefish.

G Protein-Coupled Receptors Play an Important Role in Early Development of P. chinensis
The "G protein-coupled receptor signaling pathway" and "neuroactive ligandreceptor interaction" were the most significant enrichments in GO terms and KEGG pathways of genes in continuously increasing expression (Profile 19), suggesting that G protein-coupled receptors might play an important role in larval development of this species. Further studies of the "neuroactive ligand-receptor interaction" pathway found that the expressions of prss1_2_3, pomc, npy, npb, sst, rln3, crh, gh, prl, and genes that encode their receptor proteins gradually increased during the early development of this species.

G Protein-Coupled Receptors Play an Important Role in Early Development of P. chinensis
The "G protein-coupled receptor signaling pathway" and "neuroactive ligand-receptor interaction" were the most significant enrichments in GO terms and KEGG pathways of genes in continuously increasing expression (Profile 19), suggesting that G protein-coupled receptors might play an important role in larval development of this species. Further studies of the "neuroactive ligand-receptor interaction" pathway found that the expressions of prss1_2_3, pomc, npy, npb, sst, rln3, crh, gh, prl, and genes that encode their receptor proteins gradually increased during the early development of this species. Among them, prss1_2_3 is one of the coding genes for trypsin, which is the main enzyme for protein digestion in the early stages of most teleost fish [36], and shows acquaintance expression patterns in the early stages of bullseye puffer (Sphoeroides annulatus) [37], Barramundi (Lates calcarifer) [38], and Atlantic cod (Gadus morhua) [39]. The expression levels of pomc [40,41], npy [42][43][44], npb, gh, prl [45], crh [46], and rln3 [47], which are involved in the regulation of food intake and growth, also gradually increased with development. To sum up, with the development of the larva, food intake progressively increased, and digestion ability also progressively increased.

Exploration of the Main Biological Functions of the Early Developmental Stages of Clearhead Icefish
Vertebrate embryogenesis is one of the most complex developmental procedures, marked by a series of key events, including cleavage, blastocyst formation, gastrulation, and somitogenesis, among which normal operation is determined by countless genes in the whole genome [48,49]. In the process of embryonic development of clearhead icefish, it undergoes a total of 30 developmental processes in nine physiological stages, including placental formation, division stage, blastocyst stage, protocellar stage, neural embryonic stage, organ formation and tissue differentiation stage, embryonic body peristalsis and heart beating stage, hatching gland emergence stage, and hatching stage [9]. For this study, embryos in the stages of organ formation and tissue differentiation were selected as samples for the YL stage. Genes in the brown and red modules associated with this stage were mainly enriched in GO terms associated with eye, bone, and heart development. The KEGG enrichment analysis also showed significant enrichment of pathways associated with cell proliferation, such as "notch signaling pathway", "wnt signaling pathway", and "cell cycle". Among them, the "notch signaling pathway" plays an important role in stimulating the proliferation of cardiomyocytes during embryonic development [50] as well as skeletal development and morphogenesis [20], and the "wnt signaling pathway" is involved in regulating the formation of the brain during the embryonic period and the development of the nervous system [51], bone formation [52], embryonic dorsal axis formation [53], and liver development [54]. Genes such as ccnd2, seh1l, kdm6a, arf4, and ankrd28 have been identified as hub genes, where the protein encoded by the ccnd2 and seh1l genes played an important role in mitosis [55,56], ankrd28 also played a huge role in promoting cell migration and cell proliferation [57,58], the arf4 gene was identified as a gene necessary for mammalian embryonic development [59], and the kdm6a gene played an important role in zebrafish axial formation [60]. It is implied that the development of the eye, bone, and heart during this phase is mainly achieved through cell proliferation and differentiation, while revealing that the "notch signaling pathway" and "wnt signaling pathway" play important roles in these developments.
The PM stage refers to the first day of hatching, and the organ tissues of the fish body are not yet fully developed. In this study, genes in the purple and grey modules that were significantly associated with the PM phase were significantly enriched in GO terms associated with the pharyngeal system development and hard palate development. In addition, the genes pak3 and dlx3 that are associated with bone development were identified as hub genes for this stage [61][62][63]. At the same time, another study pointed out that the upper and lower jaws of fish could continue to develop after hatching [64]. According to previous observations, adult clearhead icefish have relatively long and pointed upper and lower jaws, while the jaw surface of the juvenile stage is blunt. This suggests that the pak3 and dlx3 genes may play an important role in jaw development and tooth mineralization in this species. The taste buds of fish are widely distributed in the oropharyngeal cavity, craniofacial surface, fins, and trunk [65][66][67], but umami amino acid receptors are mainly distributed on the tip of the tongue and palate [68]. The tas1r1 gene [69] as the main umami amino acid receptor, which is widely distributed in carnivorous fish, was also identified as a hub gene at this stage, implying that clearhead icefish is a carnivorous fish and prefers initial feed with umami amino acids [70]. Equally important, the results of the KEGG enrichment analysis showed that "glycerolipid metabolism" was a significantly enriched pathway. The coding gene for the main enzyme associated with TG (triacylglycerol) storage, i.e., dgat2 [71], was identified as a hub gene, which could use endogenous and exogenous fatty acids to synthesize cytosolic and luminal apolipoprotein B associated LD-TAG [72]. This suggests that, at this stage, the larvae mainly use the lipids in the yolk sac to synthesize triglycerides and store them in the body.
The KK stage refers to the time node of the first feeding of larva, which is the beginning of the gradual transition from the endogenous to the exogenous nutrition stage. The PCA analysis results showed that the repeated samples in the KK stage group did not effectively cluster together, and the similarity was not very high, which may be related to the difficulty of determining the mouth-open time point of clearhead icefish. The enrichment analysis of the target modules of this stage showed that the pathways associated with cellular respiration, particularly aerobic metabolism ("oxidative phosphorylation" and "citrate cycle (TCA cycle)") were significantly enriched. It is worth noting that the GO terms associated with fatty acid metabolism and the KEGG metabolic pathway associated with amino acid metabolism were significantly enriched, suggesting that the organic matter metabolized by aerobic metabolism was mainly amino acids and fatty acids. Metabolismrelated genes, such as foxk, slc13a2_3_5, ndufa5, and lsc2 were also identified as key genes in the pink module. Among them, the foxk gene plays an important role in biological processes such as aerobic glycolysis [73] and inhibition of autophagy of muscle cells [74]. The slc13a2_3_5, ndufa5, and lsc2 genes play important roles in the citric acid cycle [75][76][77][78]. The citric acid cycle, one of the most significantly enriched pathways, is the final step in the oxidative catabolism of the carbon backbone of carbohydrates, amino acids, and fatty acids, producing large amounts of ATP [79]. It has been further explained that the clearhead icefish mainly produces a large amount of energy through aerobic metabolism at this stage, and the oxygen demand increases, which may be related to the increase in swimming intensity [80]. Another study has shown that the slc13a5 gene was associated with fat synthesis [81], and the KEGG metabolic pathway associated with unsaturated fatty acid synthesis was significantly enriched, suggesting that a portion of the metabolized organic matter was used to synthesize unsaturated fatty acids stored in the body, similar to the mouth-open stage of zebrafish (Danio rerio) [19] and Chinese tapertail anchovy (coilia nasus) [82].
The LC stage refers to the seventh day after hatching, when the larval fish begins to feed on rotifers, and the yolk sac has not completely disappeared, which is the mixed nutrient stage. Many GO terms and KEGG pathways related to visual perception were significantly enriched at this stage, such as "sensory perception of light stimulus", "rhodopsin kinase activity", and "phototransduction". Moreover, genes associated with the formation of the outer segment of the photoreceptor (OS) such as prph2 (peripherin-2) [83,84], were identified as hub genes. The prph2 gene is expressed only in the photosensitive chambers of rods and cone photoreceptors designated as outer segments (OS) and interacts with rhodopsin [85], and is essential for normal photoreceptor health and vision [86]. During the development of red-spotted grouper (Epinephelus akaara), pacific bluefin tuna (Thunnus orientalis), and Argentine anchovy (Engraulis anchoita), the retina gradually thickens, and the opsin appears accordingly, and generally begins to be expressed before and after the mouth-open stage [87][88][89]. In addition, studies have shown that visual, olfactory, and lateral line systems play important roles in fish predation [90], for example, zebrafish predominantly use vision systems to locate food [91], and juvenile American redfish (Sciaenops ocellatus) predominantly use lateral line systems (mechanical sensing) to locate food [92]. Early observations have found that the eyes occupy a large area of the head, and when the eyes have light sensitivity (at this stage), they begin to prey on rotifers with certain motor ability, suggesting that this species may rely on the visual system to locate food such as zebrafish.
The SL phase refers to the tenth day after hatching, when the yolk sac has been completely absorbed and the larva begins to feed on large zooplankton such as copepods and Cladocera. Many metabolic-related GO terms and KEGG metabolic pathways were markedly enriched at this stage, notably "bile acid biosynthetic and metabolism", "cholesterol catabolic process", "serine-type peptidase activity", and "serine hydrolase activity" pathways. Bile acids are the end product of cholesterol catabolism, and can combine with glycine or taurine to form conjugated bile acids (bile salts), which are crucial for the lytic absorption of cholesterol, lipids, and fat-soluble vitamins (A, D, E, and K) [93]. Studies have shown that taurine is an indispensable nutrient for carnivorous fish, such as juvenile rock bream (Oplegnathus fasciatus) [94], juvenile Japanese flounder (Paralichthys olivaceus) [95], red sea bream (Pagrus major) [96], and rainbow trout (Oncorhynchus mykiss) [97], and that it is the main amino acid that binds bile acids in carnivorous fish. The appropriate addition of this substance can increase the growth performance of juvenile fish. Clearhead icefish is a typical carnivorous fish that feeds mainly on zooplankton in the juvenile stage, such as copepods and Cladocera, but in the adult stage some large individuals begin to feed on small fish and shrimp [6], suggesting that taurine is also crucial for its early developmental stages. In addition, serine hydrolases are one of the largest families of enzymes in eukaryotes, mainly consisting of serine peptidases, lipases, and esterases [98], among which peptidases are mainly related to peptide chain hydrolysis. In this study, "serine-type peptidase activity" was one of the main enriched terms, indicating a higher protein need in the larva. Taken together, it suggests that cholesterol, lipids, and proteins may be the main nutrients at this stage, and taurine may be its essential nutrients.

Conclusions
This research was the first to utilize transcriptome to describe, in detail, the biological events in the early development of clearhead icefish. The results reveal that the processes of early development are strictly regulated by molecules in this species. The results showed that G protein-coupled receptors played important roles in the early development of this species, especially genes associated with growth and feeding (prss1_2_3, pomc, npy, npb, sst, rln3, crh, gh, prl, and genes that encode their receptor proteins). In the embryonic stage, the development of the eyes, bones, and heart was mainly carried out through cell proliferation and differentiation, in which ccnd2, seh1l, kdm6a, arf4, and ankrd28 played important roles; after hatching, the tissue system and digestive capacity of the larvae were gradually enhanced. In the endogenous nutrient stage, the larva mainly used the lipids in the yolk sac to synthesize triglycerides and stored them in the body; in the mouth-opening stage, the larva mainly used amino acids and fatty acids for aerobic metabolism, and it could synthesize unsaturated fatty acids and stored them in the body; after entering the exogenous nutrition, the main nutritional sources of the larva were cholesterol, lipids, and proteins. These findings provide basic data for further research on the early development of clearhead icefish, and at the same time, improve our understanding of the processes of the early developmental stages of fish.
Supplementary Materials: The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/fishes7030115/s1, Table S1: DEGs and the prime information of qRT-PCR verification, Table S2: Transcriptome sequencing quality indicators, Table S3: Genes and their expression levels in neuroactive ligand-receptor interaction pathway, Table S4: Supplement excel file including gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of genes in the target module (brown, red, purple, grey, cyan, pink, lightcyan, yellow, turquoise module), Table S5: List of hub genes associated with early development in clearhead icefish, Figure S1: Real-time fluorescent quantitative PCR verification of RNA-seq results, Figure S2: The correlations of quantitative real-time PCR (qRT-PCR) and RNA-seq expression data (FPKM) of all chosen unigenes.