Whole-Genome Sequencing Analyses Reveal the Evolution Mechanisms of Typical Biological Features of Decapterus maruadsi

Simple Summary In this study, a high-quality chromosomal-level genome of male Decapterus maruadsi was assembled based on Illumina, PacBio, and Hi-C technology. Notably, 23 chromosome-level genome sequences with lengths ranging between 21.74 and 44.53 Mb were assembled. A total of 22,716 protein-coding genes with an average transcript and a CDs length of 12,823.03 bp and 1676.66 bp, respectively, were successfully annotated. Based on positive selection analysis, some genes associated with the growth and development of bone, muscle, cardioid, and ovary were screened. These genes were likely involved in the evolution of typical biological features in D. maruadsi, such as fast growth rate, small body size, and strong fecundity. The newly established reference genome provides a fundamental genome resource for further genetic conservation, genomic-assisted breeding, and exploration of the molecular mechanism underlying adaptive evolution. Abstract Decapterus maruadsi is a typical representative of small pelagic fish characterized by fast growth rate, small body size, and high fecundity. It is a high-quality marine commercial fish with high nutritional value. However, the underlying genetics and genomics research focused on D. maruadsi is not comprehensive. Herein, a high-quality chromosome-level genome of a male D. maruadsi was assembled. The assembled genome length was 716.13 Mb with contig N50 of 19.70 Mb. Notably, we successfully anchored 95.73% contig sequences into 23 chromosomes with a total length of 685.54 Mb and a scaffold N50 of 30.77 Mb. A total of 22,716 protein-coding genes, 274.90 Mb repeat sequences, and 10,060 ncRNAs were predicted, among which 22,037 (97%) genes were successfully functionally annotated. The comparative genome analysis identified 459 unique, 73 expanded, and 52 contracted gene families. Moreover, 2804 genes were identified as candidates for positive selection, of which some that were related to the growth and development of bone, muscle, cardioid, and ovaries, such as some members of the TGF-β superfamily, were likely involved in the evolution of typical biological features in D. maruadsi. The study provides an accurate and complete chromosome-level reference genome for further genetic conservation, genomic-assisted breeding, and adaptive evolution research for D. maruadsi.


Introduction
The Japanese scad Decapterus maruadsi (Temminck and Schlegel, 1843) is a small pelagic fish common to warm-temperate coastal areas and belongs to the order Perciformes, family Carangidae [1].The Japanese scad is widely distributed across the marginal sea of the Indo-West Pacific and is an important economic fish in Asian coastal countries such as China, Malaysia, and Thailand [2][3][4].For example, the annual catch of D. maruadsi in reads, assembly difficulties, and error regions and gaps generated in the assembly process.Chen et al. [16] reported a chromosome-level reference genome of female D. maruadsi based on Nanopore sequencing and Hi-C technology, and the final genome size was 713.58 Mb and anchored on 23 chromosomes.Li et al. [17] obtained a 16,541bp mitochondrial genome of D. maruadsi using overlapped PCR.The genetic structure and diversity of D. maruadsi from Chinese coastal waters and northern Vietnam were evaluated based on mtDNA COI, Cyt b, and control region sequences [3,9,[18][19][20].Hou et al. [21] revealed spatial and temporal information on the distributions of D. maruadsi eggs with DNA barcodes in the northern South China Sea.Nonetheless, further genetic research on D. maruadsi is beneficial for better promoting the comprehensive analysis of the genetic mechanism behind the biology (body size, growth rate, and fertility), selection of good varieties and the comprehensive assessment of germplasm resources.
In this study, second-generation Illumina, three-generation PacBio, and Hi-C techniques were used to construct a chromosome-level genome of male D. maruadsi.The repeated sequences, protein-coding genes, and noncoding RNA (ncRNA) of the genome were annotated based on the RNA-seq data by Illumina and PacBio sequencing.Comparative genomic analysis was used to determine the phylogenetic relationship and divergence time of 17 fish species, screen the candidate genes associated with typical biological characteristics of D. maruadsi, and identify the collinear regions between the genome assembled in this study and the other two genomes.The findings of this study provide an accurate chromosome-level genome for the analysis of the genetic base of economic traits and genome selection breeding in D. maruadsi.The findings also contribute to the conservation of germplasm resources and the analysis of the molecular mechanisms of adaptive evolution for D. maruadsi and other important small pelagic fish based on their whole genomes.

Sample Collection
Three D. maruadsi samples (Figure 1) were collected from the Zhanjiang Bay of the South China Sea (Zhanjiang City, Guangdong Province, China) for genome sequencing.The body length and weight were measured after anesthetizing the fish with MS-222, followed by harvesting of the muscle, liver, and heart tissues in liquid nitrogen and the subsequent storage of the tissues at ultra-low temperature (−80 °C).Part of the muscle tissue was used for genomic DNA sequencing, while the muscle, liver, and heart tissues were used for transcriptome sequencing.The experimental animal protocols used in this study were approved by the Animal Ethics Committee of Guangdong Ocean University, China.

DNA Library Construction and Sequencing (Illumina and PacBio)
Genomic DNA was isolated from the muscle tissues using the phenol-chloroform extraction method [22].The degradation and contamination of the DNA samples were

DNA Library Construction and Sequencing (Illumina and PacBio)
Genomic DNA was isolated from the muscle tissues using the phenol-chloroform extraction method [22].The degradation and contamination of the DNA samples were checked on a 1% agarose gel electrophoresis.DNA purity and concentration were measured using a NanoPhotometer spectrophotometer (IMPLEN, Westlake Village, CA, USA) and a Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA), respectively.The qualified DNA was used to construct second-generation and third-generation sequencing libraries.
A short-fragment DNA library (350 bp) was constructed using the Truseq Nano DNA HT Sample Preparation Kit (Illumina, San Diego, CA, USA).Briefly, the qualified DNA was randomly fragmented using a Covaris ultrasonic crusher, followed by the preparation of a short-fragment DNA library through terminal repair, A-tail addition, sequencing adaptor addition, purification, and PCR amplification.The qualified libraries were sequenced on an Illumina NovaSeq6000 platform (Illumina, San Diego, CA, USA).The raw sequence reads were fine-filtered to obtain high-quality clean reads by removing the adaptor sequences, reads with more than 10% unknown bases (N), and single-end reads with low-quality bases greater than 20%.The sequence error rate, GC Content, Q20, and Q30 of the clean reads were then calculated to assess the sequence quality.Ten thousand randomly selected clean reads (read1 and read2 each 5000) were mapped to the NCBI-NT database by BLAST to check and exclude the exogenous contaminants.
A SMRTbell library (15 k, PacBio library) was constructed using a SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, Menlo Park, CA, USA).The qualified DNA was first randomly fragmented using a Covaris ultrasonic crusher, followed by the enrichment and purification of the large DNA fragments (5~20 kb) using Ampure PB beads.The Template Prep Kit was then used to do damage and end repair, followed by ligation of the fragments with circular sequencing adapters.The exonucleases were finally used to remove the failed ligation DNA fragments, and the library quality was inspected using Femto Pulse.The qualified SMRTbell library was sequenced on a PacBio Sequel II platform (Pacific Biosciences, Menlo Park, CA, USA).

De Novo Genome Assembly and Quality Assessment
K-mer frequency analysis was used to estimate the genome size, heterozygous ratio, and repeat ratio of the Illumina sequencing data using Jellyfish v2.2.7 [23].SMRTlink v10.2 software [24] was used to filter PacBio sequencing raw data to remove low-quality reads and obtain high-quality HiFi reads.The PacBio HiFi reads were subsequently assembled to obtain a draft genome using the Hifiasm v0.15.4 software [25].The completeness of the assembled genome was assessed by Bench-marking Universal Single-Copy Orthologs (BUSCO) and the Core Eukaryotic Genes Mapping Approach (CEGMA) [26,27].Qualified Illumina DNA sequencing data were mapped onto the assembled genome to calculate the mapping ratio of the clean reads and the degree and depth of genome coverage using the BWA v0.7.8 software [28], which also assessed the completeness of genome assembly and the uniformity of sequencing.Single nucleotide polymorphism (SNP) calling was performed using Samtools v 0.1.19[29] based on the BWA mapping results.The homozygous and heterozygous SNPs were identified to assess the accuracy of the genome assembly.The GC content of the assembled genome was calculated using 10 kp non-overlapping sliding windows.A GC content distribution analysis was done to detect the AT and GC separation of the sequencing data.

Chromosome-Level Genome Assembly
The high-throughput chromosome conformation capture (Hi-C) technique [30,31] was used to construct the chromosome-level genome assembly for D. maruadsi.A Hi-C library was first constructed.Cells were fixed with paraformaldehyde and lysed, followed by digesting the crosslinked DNA with restriction enzymes to obtain sticky ends.The ends of the DNA fragments were repaired, labelled with biotin, and ligated again.The cross-linked DNA was digested with proteinase, purified, and randomly sheared to 300~500 bp.The biotin-containing DNA fragments were captured by avidin beads.The purified DNA fragments were subjected to end-repair, A-tailing, sequencing adaptor ligation, PCR amplification, and purification for Hi-C library construction.The completeness of the DNA fragments and inserted fragment sizes were detected using Agilent 2100 (Agilent Technologies, Palo Alto, CA, USA).The effective library concentration was quantified using Qubit 2.0 and quantitative PCR (qPCR).Different qualified libraries were pooled based on the requirements of effective concentration and target data volume and were then sequenced on an Illumina PE150 platform.The quality of the Hi-C sequencing data was checked using a statistical comparison method (described in 2.2) and HiCUP quality control analysis to obtain effective whole-genome chromosome crosslinking information [32].The clean Hi-C reads were finally mapped to draft genome contigs using ALLHiC v0.9.8 [33] to obtain a chromosome-level genome of D. maruadsi.2.5.Genome Annotation 2.5.1.RNA Library Construction and Sequencing (Illumina and PacBio) A mixed RNA sample from muscle, liver, and heart tissues was prepared for PacBio full-length sequencing based on single-molecule, real-time (SMRT) sequencing technology to obtain an accurate, full-length transcript and to precisely annotate the genome.Each sample was also subjected to high-throughput sequencing on an Illumina NovaSeq6000 platform.The total RNA was extracted from muscle, liver, and heart tissues using the Trizol Reagent Kit (Invitrogen, Carlsbad, CA, USA).The Agilent Bioanalyzer 2100 and Nanodrop 2000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) were used to detect the completeness, concentration, and purity of the total RNA.Good-quality RNA samples were subsequently used for library preparation and sequencing.
The total RNA isolated from the three tissues was mixed in equal amounts, and the mRNA was enriched using oligo (dT) magnetic beads.The full-length cDNA was synthesized using a SMARTer PCR cDNA Synthesis Kit (Takara Bio, Beijing, China).The cDNA was then subjected to PCR amplification, terminal repair, and the attachment of dumbbell-shaped SMRT adapters to construct a full-length transcriptome library.The library quality was checked, and the good-quality libraries were sequenced on a PacBio Sequel sequencer (Pacific Biosciences).The subread sequences were obtained by processing the raw data on SMRTlink (PacBio) and correcting it to obtain circular consensus sequences (CCS).The sequences were divided into full-length sequences and non-full-length sequences based on whether the CCS contained 5 ′ -end primer, 3 ′ -end primer, or polyA tail.The full-length sequences were clustered to obtain the clustered consensus sequences.The accurate, full-length transcripts were finally obtained using isoseq3 in SMRTlink software (https://www.omicsclass.com/article/344,accessed on 26 September 2023).
The mRNA was enriched using oligo (dT) magnetic beads and fragmented in a fragmentation buffer.The first-strand cDNA was synthesized using random hexamer primers, while the second-strand cDNA was synthesized using DNA polymerase I and RNase H.The cDNA was purified and subjected to end repair, poly(A) addition, and Illumina sequencing adaptor ligation.Those with suitable sizes were selected for PCR amplification.The PCR products were subsequently purified using AMPure XP Beads to obtain a short-fragment sequencing library.Good-quality libraries were then sequenced on an Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA).Clean reads were obtained by the fine filtering of raw reads using Fastp v0.21.0 [34].
Candidate genes associated with the body size, growth velocity, and migratory habits of D. maruadsi were screened using three groups of positive selection analyses based on single-copy gene families: D. maruadsi vs. C. melampygus, S. dumerili and S. lalandi (group 1); D. maruadsi vs. T. albacares and T. maccoyii (group 2); and D. maruadsi vs. L. crocea, E. akaara and A. schlegelii (group 3).Multiple protein sequence alignments of single-copy gene families were conducted using MUSCLE v3.7 [57], and the alignment results were used as templates to generate the CDs alignment results.The codeml program in PAML was employed to test whether the genes were under positive selection using the branch-site specific model.GO and KEGG enrichment analyses were performed to obtain candidate genes associated with growth, body size, and migratory habits from the positive selection genes.
The NGenomeSyn v1.41 software [60] was used to construct the syntenic blocks based on the high-quality chromosome-level genomes of male D. maruadsi (this study), female D. maruadsi (GCA_030347415.2,723.8 Mb of genome size, 13.6 Mb of contig N50, 32.2 Mb of scaffold 50), and male T. maccoyii (GCF_910596095.1,782.4 Mb of genome size, 26.8 Mb of contig N50, 33.8 Mb of scaffold 50).This analysis was done to assess the accuracy and completeness of the assembled genome and identify the genome homologous regions between the assembled genome and other genomes.

Analysis of Genomic Characterization
In this study, 118,505,182 raw paired reads and 35,551,554,600 bp (35.55 G) raw base data were generated by Illumina DNA sequencing (Table 1).The removal of low-quality data yielded 91,050,982 clean paired reads and 27,315,294,600 bp clean data, with 96.48% of Q20, 91.15% of Q30, 0.04% of sequencing error rate, and 42.58% GC content.These values illustrated that the construction and sequencing quality of the genomic DNA short-fragment library was good and would guarantee accuracy in the subsequent analyses.Ten thousand clean reads of D. maruadsi were randomly selected and mapped to the nucleotide sequences (NT) database.Notably, the top six species with the highest sequence coverages were Dicentrarchus labrax (0.53%), Haplochromis burtoni (0.45%), O. niloticus (0.31%), Xiphophorus maculatus (0.19%), T. rubripes (0.19%), and G. aculeatus (0.16%), indicating that the alignments were orthologous and the D. maruadsi sample was not contaminated with external nucleotides.The genome of D. maruadsi was estimated to be approximately 739.40 Mbp (Figure S1), with 1.18% heterozygosity and 35.16% repeat sequences, based on the filtered sequence data (clean reads).The genome was subsequently revised to 722.79 Mbp using K-mer frequency analysis (accession number JBANGS000000000).

Genome Assembly and Evaluation
A total of 23,164,121,378 bp (23.16 G) sequencing data and 1,765,660 high-quality HiFi reads were generated by PacBio-SMRT sequencing.The average length and N50 length of the HiFi reads were 13,119 bp and 13,207 bp, respectively, and the sequencing depth was 32.04× (based on the estimated genome size of 722.79 M by survey).The 1,765,660 high-quality HiFi reads were assembled into 349 contigs that were further error-corrected based on Illumina sequencing data (27.32 G).Finally, 716,127,322 bp (716.13Mb) of the draft genome with 20,796,328 bp of N50 were obtained.Notably, this genome size was very close to the estimated genome size based on K-mer frequency analysis (722.79 Mb).
BUSCO analysis (Figure S2) yielded 3538 (97.2%) complete BUSCOs, of which 3494 (96.0%) were complete single-copy BUSCOs and 44 (1.2%) were complete duplicated BUSCOs.This finding suggested that the D. maruadsi genome assembled based on PacBio sequencing data had high coverage of all gene regions.CEGMA evaluation revealed that the assembled genome of D. maruadsi completely covered 233 of 248 (93.95%) conserved-core eukaryotic genes.The evaluation results of BUSCO and CEGMA strongly suggested that the assembled genome sequence of D. maruadsi was relatively complete.
All clean reads obtained by Illumina sequencing were mapped onto the assembled genome.The mapping rate, coverage rate, and average per-base sequencing depth were approximately 97.76%, 99.94%, and 36.05%,respectively, indicating excellent consistency between the Illumina reads and the assembled genome.The alignments yielded 5,341,607 SNPs (0.7547%), including 5,340,262 heterozygous SNPs (0.7545%) and 1345 homozygous SNPs (0.0002%), indicating that the assembled genome had a high single-base accuracy.The GC content of the assembled genome was concentrated around 42.49%, and there was no obvious GC separation, indicating that there was no exogenous pollution in the genome.

Chromosome-Level Genome Assembly by Hi-C
The draft genome of D. maruadsi was further scaffolded using Hi-C technology.Illumina sequencing (Hi-C library presequencing data) generated 12,383,804 raw paired reads and 3,715,141,200 bp (3.7 G) raw data.A total of 10,430,159 clean paired reads and 3,708,356,100 bp clean data were generated after quality control.The sequence quality values Q20 and Q30 were 90.75% and 96.34%, respectively, while the sequencing error rate and the GC content were 0.04% and 44.14%, respectively.These indicators suggested that the Hi-C library construction and sequencing were of high quality.HiCUP analysis (Table S1) revealed that 6,544,378 of the 10,430,159 clean paired reads were successfully matched (62.74% total paired ratio), 5,564,284 (85.02%) were valid read pairs (Di-tags), and 980,094 (14.98%) were invalid Di-tags (same circularized, same fragment dangling ends, same fragment internal).A total of 5,176,451 unique valid Di-tags were generated after the duplicate Di-tags in the valid Di-tags were filtered (Table S2).The unique valid Di-tags included 2,393,982 cis Di-tags (489,982 cis-close Di-tags and 1,904,000 cis-far Di-tags) and 2,782,469 trans-Di-tags.In summary, the effective utilization rate of Hi-C data was 49.63% (unique, valid Di-tags/clean paired reads), indicating that the Hi-C library construction, sequencing, and analytical results were valid.Hi-C library can thus be used for massive sequencing to derive chromosome-level genome assembly.
The total sequencing data of Hi-C showed that a total of 251,272,317 raw paired reads and 75,381,695,100 bp (75.38 G) raw data were generated by the Illumina sequencing platform.Initial quality control yielded 208,470,664 clean paired reads and 62,541,199,200 bp clean data.The Q20, Q30, sequencing error rate, and GC content were 96.30%, 90.91%, 0.04%, and 44.23%, respectively, indicating high-quality Hi-C library construction and sequencing.Ten thousand clean reads were randomly selected for comparison with the NT database.The top six species with the highest sequence coverages were H. burtoni (0.41%), O. niloticus (0.37%), D. labrax (0.37%), Trachurus japonicus (0.17%), D. maruadsi (0.14%, secondgeneration sequencing assembly), and X. maculatus (0.12%), showing that D. maruadsi was highly homologous with these fish and its draft genome was not contaminated with external nucleotides.
The draft genome assembled in Section 3.2 was loaded onto the chromosomes based on the valid Hi-C data after quality control (Table 2).A total of 357 contigs (≥100), with 716,127,322 bp of the total length and 19,703,568 bp of N50, were generated.These contigs were further assembled into 74 scaffolds with 716,155,622 bp of the total length and 30,768,099 bp of N50.Among the 74 scaffolds, 23 scaffolds with a total length of 685,544,437 bp were assembled into 23 chromosome-level sequences, while the remaining 51 scaffolds with a total length of 30,611,185 bp were not assembled into chromosome-level sequences (Figure 2).The final assembly spanned 23 chromosomes with sizes ranging between 21.74 and 44.53 Mb, representing 95.73% of the genome.S3).The clean data drawn from the muscle, liver, and heart tissues were 6.25 G, 5.67 G, and 6.25 G, respectively.The Q20, Q30, GC content, and sequencing error rate of the clean reads of the three tissues were 97.69%~97.83%,93.54%~93.85%,49.80%~52.83%,and 0.03%, respectively, which suggested that the Illumina sequencing quality was good.
Full-length transcriptomes of the mixed samples of the three tissues were also obtained after PacBio SMRT sequencing.A total of 835,845 polymerase reads and 83.33 G polymerase read bases were obtained, with a mean length of 99,697 bp and an N50 of 170,923 bp.These polymerase reads were split into 24,423,333 subreads (81.52 G) with an average length of 3338 bp and N50 of 3778 bp.The high-quality transcriptome data based on Illumina and PacBio sequencing were subsequently used to assist in genome annotation.S3).The clean data drawn from the muscle, liver, and heart tissues were 6.25 G, 5.67 G, and 6.25 G, respectively.The Q20, Q30, GC content, and sequencing error rate of the clean reads of the three tissues were 97.69%~97.83%,93.54%~93.85%,49.80%~52.83%,and 0.03%, respectively, which suggested that the Illumina sequencing quality was good.
Full-length transcriptomes of the mixed samples of the three tissues were also obtained after PacBio SMRT sequencing.A total of 835,845 polymerase reads and 83.33 G polymerase read bases were obtained, with a mean length of 99,697 bp and an N50 of 170,923 bp.These polymerase reads were split into 24,423,333 subreads (81.52 G) with an average length of 3338 bp and N50 of 3778 bp.The high-quality transcriptome data based on Illumina and PacBio sequencing were subsequently used to assist in genome annotation.

Structural and Functional Annotation of Protein-Coding Genes
A total of 27,885 protein-coding genes were annotated in the genome of D. maruadsi by combining homology-based, de novo, and RNA-seq-assisted prediction methods (Table 3).A total of 22,716 protein-coding genes with UTR regions were obtained after the variable shear, low-quality transcripts (overlapping with TE ≥ 20%, premature termination, only de novo evidence supported, less than one of rpkm expression in all tissues), and redundant single-exons were removed (Figure 4A).The average lengths of the protein-coding genes and coding region were 12,823.03bp and 1676.66 bp, respectively.Each gene contained an average of 9.65 exons.The average lengths of exons and introns were 173.83 bp and 1289.29 bp, respectively.Subsequently, 22,716 protein-coding genes were

Structural and Functional Annotation of Protein-Coding Genes
A total of 27,885 protein-coding genes were annotated in the genome of D. maruadsi by combining homology-based, de novo, and RNA-seq-assisted prediction methods (Table 3).A total of 22,716 protein-coding genes with UTR regions were obtained after the variable shear, low-quality transcripts (overlapping with TE ≥ 20%, premature termination, only de novo evidence supported, less than one of rpkm expression in all tissues), and redundant single-exons were removed (Figure 4A).The average lengths of the protein-coding genes and coding region were 12,823.03bp and 1676.66 bp, respectively.Each gene contained an average of 9.65 exons.The average lengths of exons and introns were 173.83 bp and 1289.29 bp, respectively.Subsequently, 22,716 protein-coding genes were functionally annotated using SwissProt, Nr, KEGG, and InterPro databases (Figure 4B).Finally, a total of 22,037 (97%) genes were functionally annotated in at least one database, while the remaining 679 (3%) genes were unannotated.

Gene Family Clustering, Expansions, and Contractions
Cluster analysis of 16,858 (C.maximus) and ~32,712 (C.idella) genes in D. maruadsi and 16 other fish revealed 23,981 gene families, among which 2468 were common single-copy gene families (Figure 5A and Table 4).A Venn diagram of the gene families (Figure 5B) showed that 11,871 orthologous gene families were shared between D. maruadsi and three other fish species of the same family Carangidae (C.melampygus, S. dumerili, and S. lalandi), while 459 gene families were unique to D. maruadsi.GO enrichment analysis (Figure S3 and Table S6) of the 459 unique gene families categorized them into 32 GO terms, including RNA-directed DNA polymerase activity (GO:0003964), RNA-dependent DNA biosynthetic process (GO:0006278), and carbohydrate-binding (GO:0030246), among other GO terms.KEGG enrichment analysis showed that the unique gene families were mainly involved in the regulation of lipolysis in adipocytes (map04923), the PPAR signaling pathway (map03320), the synaptic vesicle cycle (map04721), glycosaminoglycan biosynthesisheparan sulfate/heparin (map00534), the Apelin signaling pathway (map04371), peroxisome (map04146), the biosynthesis of unsaturated fatty acids (map01040), and the GABAergic synapse (map04727), among other functions and pathways.

Gene Family Clustering, Expansions, and Contractions
Cluster analysis of 16,858 (C.maximus) and ~32,712 (C.idella) genes in D. maruadsi and 16 other fish revealed 23,981 gene families, among which 2468 were common single-copy gene families (Figure 5A and Table 4).A Venn diagram of the gene families (Figure 5B) showed that 11,871 orthologous gene families were shared between D. maruadsi and three other fish species of the same family Carangidae (C.melampygus, S. dumerili, and S. lalandi), while 459 gene families were unique to D. maruadsi.GO enrichment analysis (Figure S3 and Table S6) of the 459 unique gene families categorized them into 32 GO terms, including RNA-directed DNA polymerase activity (GO:0003964), RNA-dependent DNA biosynthetic process (GO:0006278), and carbohydrate-binding (GO:0030246), among other GO terms.KEGG enrichment analysis showed that the unique gene families were mainly involved in the regulation of lipolysis in adipocytes (map04923), the PPAR signaling pathway (map03320), the synaptic vesicle cycle (map04721), glycosaminoglycan biosynthesis-heparan sulfate/heparin (map00534), the Apelin signaling pathway (map04371), peroxisome (map04146), the biosynthesis of unsaturated fatty acids (map01040), and the GA-BAergic synapse (map04727), among other functions and pathways.The expansion and contraction analysis of 23,981 gene families in 17 species (Figure 6) showed that 73 gene families were expanded while 52 gene families were contracted in D. maruadsi compared with the common ancestors of D. maruadsi and C. melampygus during the evolutionary process.Enrichment analysis (Figure S4A,B and Table S6) showed that the 73 expanded gene families were categorized into 34 GO terms, including exoalpha-sialidase activity (GO:0004308), nucleosome (GO:0000786) and nucleosome assembly (GO:0006334), and 27 KEGG pathways, including sphingolipid metabolism (map00600), PPAR signaling pathway (map03320), primary bile acid biosynthesis (map00120), longevity regulating pathway-multiple species (map04213), and regulation of lipolysis in adipocytes (map04923), among other functions and pathways.In the same line, the 52 contracted (Figure S4C,D and Table S6) gene families were mainly involved in 27 GO terms, including neurotransmitter: sodium symporter activity (GO:0005328), neurotransmitter transport (GO:0006836), and homophilic cell adhesion via plasma membrane adhesion molecules (GO:0007156), as well was 22 KEGG pathways, including synaptic vesicle cycle (map04721), GABAergic synapse (map04727), graft-versus-host disease (map05332), allograft rejection (map05330), viral myocarditis (map05416), among other functions and pathways.Notably, the unique gene families and the expansion and contraction gene families of D. maruadsi are involved in the PPAR signaling pathway, the regulation of lipolysis in adipocytes, the synaptic vesicle cycle, and GABAergic synapse, amongst other functions.

Phylogenetic Tree and Divergence Times
A phylogenetic tree constructed based on 2468 single-copy gene families shared by the 17 fish species showed that D. maruadsi and C. melampygus clustered into one clade with 100% bootstrap support and then clustered with two other Carangidae fish (S. lalandi and S. dumerili) with a bootstrap support of 100%.Four Carangidae fish were grouped with other seven Perciform fish and finally clustered with one Perciform fish, one Beloniform fish, and three Cypriniform fish.This result is not entirely consistent with the

Phylogenetic Tree and Divergence Times
A phylogenetic tree constructed based on 2468 single-copy gene families shared by the 17 fish species showed that D. maruadsi and C. melampygus clustered into one clade with 100% bootstrap support and then clustered with two other Carangidae fish (S. lalandi and S. dumerili) with a bootstrap support of 100%.Four Carangidae fish were grouped with other seven Perciform fish and finally clustered with one Perciform fish, one Beloniform fish, and three Cypriniform fish.This result is not entirely consistent with the taxonomic classification based on their morphological characteristics.Mismatches between morphological and molecular identifications are common for the classification of many organisms, and more powerful phylogenetic evidence will be needed for the accurate classification.The evolutionary tree of the 17 fish species (Figure 7

Collinearity Analysis
The 23 chromosomes of the male D. maruadsi displayed significant collinearity with 23 chromosomes of the female D. maruadsi and 24 chromosomes of the male T. maccoyii, a closely related species in the Carangidae family (Figure 9 and Table 6).Notably, the genomic collinearity between the male D. maruadsi and the male T. maccoyii outperformed that between the male D. maruadsi and the female D. maruadsi.Moreover, chromosome 14 of the male D. maruadsi corresponded strongly to chromosome 7 and chromosome 24 of the male T. maccoyii.These results collectively demonstrated the high accuracy, completeness, and continuity of the genomes assembled in this study.
x FOR PEER REVIEW 19 of 28 Table 6.Chromosome comparison of the male D. maruadsi, the female D. maruadsi, and the male T. maccoyii.

Genome Features
The chromosome-level genome of the male D. maruadsi were assembled based on Illumina, PacBio, and Hi-C technologies.Notably, the assembled genome size of the have actively contributed to D. maruadsi genome sizes and could be taxonomically and evolutionarily significant.
The TGF-β superfamily is a ubiquitous class of multi-effector cytokines in vertebrates, including TGF-β, BMPs, GDFs, and other subfamilies.TGF-β can interact with its receptors on the surface of target cells to mediate target-intracellular signaling by activating Smaddependent signaling pathways to regulate multiple biological processes, such as cellular proliferation, differentiation, growth control, and skeletal formation [85].The TGF-β subfamily thus plays important roles in the growth and development process in fish.Four members of TGF-β, TGF-β1, TGF-β2, TGF-β3, and TGF-β6 were found in fish [86].Of note, the TGF-β1 and TGF-β2 genes were significantly enriched in our study.TGF-β1 is the most widely expressed subtype of the TGF-β subfamily and is a multi-functional regulator of cell growth and differentiation.TGF-β2 delays the differentiation of muscle cells but increases cell proliferation.TGF-β1 has previously been shown to induce the differentiation of rainbow trout (Oncorhynchus mykiss) cardiac fibroblasts into myofibroblasts [87], inhibit zebrafish oocyte maturation at multiple sites [88,89], and limit the production of androgen and the maturation-inducing hormone (17α,20β-dihydroxy-4-pregnen-3-one) in goldfish (Carassius auratus) ovaries, further influencing follicular maturation as a local regulator [90].Previous studies postulate that the expression of TGF-β2 is dynamically regulated during muscle growth resumption in rainbow trout and satellite cell differentiation [91].TGF-β2-null mice exhibit a profound delay of hair follicle morphogenesis, with a 50% reduced number of hair follicles [92].These reports strongly suggest that TGF-β1 and TGF-β2 potentially play a vital regulatory role in the growth and development of the ovary, cardioid, and muscle of D. maruadsi.
BMPs are the earliest signaling molecules that induce bone formation and differentiation.BMPs specifically bind to BMPR on the surfaces of cell membranes and transmit signals to the R-Smad, thereby activating or inhibiting the expression of genes associated with the formation of cartilage and bone, embryonic development, neural differentiation, adipogenesis, and ovarian follicle development.Numerous BMPs, such as BMP2-7, BMP8a, and BMP9-16, have been reported in fish.Among these BMPs, BMP2, BMPR2, BMP3, BMP3b, BMP10, BMP14, and BMP15 were significantly enriched in the positive selection in this study.BMP2 and BMP14 positively regulate the growth and development of bone, carti-lage, and tendon by interacting with their receptors [93].The silencing of BMP2 and BMP14 leads to the loss of joints in Lethenteron japonicum [94], heart malformation and shortening of pectoral and median fins in zebrafish [95][96][97], and a characteristic malformation of the scapula in mice [98].BMP3 is a powerful negative and positive regulator of skeletal development.The significant up-regulation of BMP3 accelerates bone growth in Sinocyclocheilus graham [99], and the silencing of BMP3 results in the poor development of the zebrafish's head [100,101].Of note, BMP3-knockout mice show an increase in bone mass [102].BMP-3b functions predominantly in bone and cartilage development and can inhibit osteoblast differentiation by antagonizing BMP-2 and -4-mediated osteogenesis [103][104][105].BMP-3b injected into Xenopus embryos triggers secondary head formation autonomously, whereas the depletion of BMP-3b caused headless Xenopus embryos [106].BMP3b is upregulated immediately following a fracture and is constitutively expressed at a higher level throughout osteogenesis [107].BMP10, a cardiac-specific growth factor, promotes cell proliferation in the myocardium and plays a key role in heart development [108,109].Silencing BMP10 in zebrafish leads to the reduction and death of cardiomyocytes [110].Mutations in the BMP10 gene result in a hypoplastic ventricular wall, the loss of ventricular trabeculae, and a significant decrease in heat rate during mouse embryo development [111].BMP15, a growth factor secreted by oocytes in the ovaries, plays a crucial regulatory role in the follicular development of birds and mammals [112].Previous studies postulate that BMP-15 prevents premature oocyte maturation in zebrafish, which helps maintain oocyte quality and subsequent ovulation and fertilization [113].Silencing BMP15 leads to decreased ovulation and reduced fertility in mice [114].In summary, BMP2, BMPR2, BMP14, BMP3, BMP3b, BMP10, and BMP15 in D. maruadsi play vital regulatory roles in the growth and development processes of various tissues, such as bones, heart, muscles, and ovaries.
PRL is a single-chain polypeptide hormone produced by the anterior pituitary [115].It binds to the PRLR and activates signaling molecules that influence gene expression and transcription associated with growth, development, reproduction, and immunity [116].In this study, PRL and PRLR were significantly enriched, indicating that the two genes play important roles in D. maruadsi.Previous studies postulate that PRL and PRLR are associated with growth in Scophthalmus maximus [117].In Hippocampus abdominalis, the accumulation of PRL in the brood pouch reduces early embryonic mortality by regulating Na + /K + -ATPase and reducing Na + /K + concentration [118].Additionally, PRL enhances lymphocyte proliferation and inhibits cortisol-induced proliferation and apoptosis in rainbow trout [119].Platelet-derived growth factor (PDGF) is an important mitogenic factor and is comprised of PDGFA, PDGFB, PDGFC, and PDGFD.These factors stimulate the division and proliferation of specific cell populations and play regulatory roles in cell differentiation and ontogeny.Herein, PDGFA and PDGFB were significantly enriched and were both significantly involved in the physiological and pathological processes in the body, such as embryonic development and tissue repair.Zhang postulates that PDGFA expression is highest in the middle development stage of the tip-tissue of Sika deer antler, enhancing the proliferation rate of antler tip cells [120].Tallquist et al. [121] and Bjarnegård et al. [122] report that PDGFA and PDGFB knockout in mice results in embryonic lethality, cardiac enlargement, and ventricular septal defect.These reports highlight the vital regulatory roles of PRL, PRLR, PDGFA, and PDGFB in the growth, development, and reproduction of D. maruadsi.
LEPR was also significantly enriched in the three positive selection groups of D. maruadsi.LEPR specifically binds to Leptin and further activates many signaling pathways (JAK/STAT, MAPK, and PI3K, among others), thereby regulating feeding, glycolipid metabolism, growth and development, reproduction, immune response, and other physiological processes [123].The significant up-regulation of LEPR expression during early development and ovarian maturation efficiently regulates food intake, energy reserve, and reproduction in D. labrax and Cynoglossus semilaevis [124,125].A hypothesis that LEPR and its associated genes are correlated with the growth, development, and reproduction of D. maruadsi is thus put forward.All genes associated with growth, development, and

Figure 2 .
Figure 2. Primary chromosome contact map of the D. maruadsi genome based on Hi-C data.The color reflects the intensity of each contact, with deeper colors representing higher intensities.

3. 4 .
Genome Annotation 3.4.1.PacBio and Illumina RNA-Seq Data A total of 62,936,860 raw reads and 18.88 G raw data were obtained through transcriptome sequencing of the muscle, liver, and heart tissues.The removal of low-quality reads yielded 60,532,297 clean reads and 18.17 G clean data.The clean reads drawn from the muscle, liver, and heart tissues were 20,827,273, 18,887,824, and 20,817,200, respectively (Table

Figure 2 .
Figure 2. Primary chromosome contact map of the D. maruadsi genome based on Hi-C data.The color reflects the intensity of each contact, with deeper colors representing higher intensities.

3. 4 .
Genome Annotation 3.4.1.PacBio and Illumina RNA-Seq Data A total of 62,936,860 raw reads and 18.88 G raw data were obtained through transcriptome sequencing of the muscle, liver, and heart tissues.The removal of low-quality reads yielded 60,532,297 clean reads and 18.17 G clean data.The clean reads drawn from the muscle, liver, and heart tissues were 20,827,273, 18,887,824, and 20,817,200, respectively (Table

Figure 3 .
Figure 3. Genome coordinates and annotation information of the D. maruadsi genome, including (A) the length of assembled chromosomes, (B) the distribution of DNA transposable elements, (C) the distribution of RNA transposable elements, (D) the distribution of genes, and (E) the GC content of the genome.

Figure 3 .
Figure 3. Genome coordinates and annotation information of the D. maruadsi genome, including (A) the length of assembled chromosomes, (B) the distribution of DNA transposable elements, (C) the distribution of RNA transposable elements, (D) the distribution of genes, and (E) the GC content of the genome.

Figure 4 .
Figure 4. Prediction of gene structures in D. maruadsi (A).Venn diagram of functional annotation based on different databases (B).

Figure 4 .
Figure 4. Prediction of gene structures in D. maruadsi (A).Venn diagram of functional annotation based on different databases (B).

Figure 5 .
Figure 5. Types and numbers of gene families in 17 species (A) and quantitative analysis of gene families in D. maruadsi, C. melampygus, S. lalandi, and S. dumerili (B).

Figure 5 .
Figure 5. Types and numbers of gene families in 17 species (A) and quantitative analysis of gene families in D. maruadsi, C. melampygus, S. lalandi, and S. dumerili (B).

Figure 6 .
Figure 6.The expanded and contracted gene families of 17 fish species during the evolutionary process.MRCA: most recent common ancestor.

Figure 6 .
Figure 6.The expanded and contracted gene families of 17 fish species during the evolutionary process.MRCA: most recent common ancestor.

28 Figure 7 .
Figure 7.The phylogeny and divergence times of D. maruadsi and other fish.The red numbers on the nodes indicate the estimated divergence times.Cetorhinus maximus was chosen as the outgroup species.

Figure 7 .
Figure 7.The phylogeny and divergence times of D. maruadsi and other fish.The red numbers on the nodes indicate the estimated divergence times.Cetorhinus maximus was chosen as the outgroup species.

Figure 9 .
Figure 9. Collinearity analysis of the male D. maruadsi and the female D. maruadsi (A) and the male T. maccoyii (B).Each coloured line in (A,B) represents a 1 Kbp fragment match between two species.

Figure 9 .
Figure 9. Collinearity analysis of the male D. maruadsi and the female D. maruadsi (A) and the male T. maccoyii (B).Each coloured line in (A,B) represents a 1 Kbp fragment match between two species.

Table 1 .
Statistics of assembly data in genome sequencing.

Table 3 .
Statistics of gene structure and functional annotation predicted by three different methods.

Table 4 .
Gene family clustering in 17 fish species.

Table 4 .
Gene family clustering in 17 fish species.

Table 5 .
The results of the positive selection analysis of D. maruadsi.

Table 6 .
Chromosome comparison of the male D. maruadsi, the female D. maruadsi, and the male T. maccoyii.Male D. maruadsiFemale D. maruadsi Male T. maccoyii