Chromosome Genome Assembly of Cromileptes altivelis Reveals Loss of Genome Fragment in Cromileptes Compared with Epinephelus Species

The humpback grouper (Cromileptes altivelis), an Epinephelidae species, is patchily distributed in the reef habitats of Western Pacific water. This grouper possesses a remarkably different body shape and notably low growth rate compared with closely related grouper species. For promoting further research of the grouper, in the present study, a high-quality chromosome-level genome of humpback grouper was assembled using PacBio sequencing and high-throughput chromatin conformation capture (Hi-C) technology. The assembled genome was 1.013 Gb in size with 283 contigs, of which, a total of 143 contigs with 1.011 Gb in size were correctly anchored into 24 chromosomes. Moreover, a total of 26,037 protein-coding genes were predicted, of them, 25,243 (96.95%) genes could be functionally annotated. The high-quality chromosome-level genome assembly will provide pivotal genomic information for future research of the speciation, evolution and molecular-assisted breeding in humpback groupers. In addition, phylogenetic analysis based on shared single-copy orthologues of the grouper species showed that the humpback grouper is included in the Epinephelus genus and clustered with the giant grouper in one clade with a divergence time of 9.86 Myr. In addition, based on the results of collinearity analysis, a gap in chromosome 6 of the humpback grouper was detected; the missed genes were mainly associated with immunity, substance metabolism and the MAPK signal pathway. The loss of the parts of genes involved in these biological processes might affect the disease resistance, stress tolerance and growth traits in humpback groupers. The present research will provide new insight into the evolution and origin of the humpback grouper.


Introduction
The humpback grouper (Cromileptes altivelis), also called the mouse grouper, belongs to the Cromileptes genus of Epinephelidae. The genus only consists of the one species due to its special morphology compared to other groupers. Humpback groupers are mainly distributed in Western Pacific water [1]. The natural population of the humpback grouper is decreasing all over the world due to overfishing, climate change and habitat destruction (IUCN, https://www.iucnredlist.org/species/39774/100458943, accessed on 20 May 2021). Underwater surveys carried out in many areas, including Indonesia, Australia and New Caledonia, suggest that the humpback grouper is extremely rare and patchily distributed in reef habitats. The humpback grouper is one of the most expensive species in all of the grouper species in the live marine fish trade; the price has reached 130 USD per kilogram owing to its tender meat, beautiful color, special shape and rarity (IUCN). Though the artificial culture of the humpback grouper has been successful in recent years [2], the price is still high due to its low growth rate and low output of larva fish. Molecular-assisted selection based on the genome information is essential to increase the growth rate of humpback groupers in future breeding.
The humpback grouper possesses special morphology compared with Epinephelus species. It has an extremely small anterior part of the head and a moderately deep body which forms a concave dorsal profile on the postorbital part, hence why it is also called humpback grouper. Moreover, it possesses a subuliform head and basiconic jaw without canine teeth so it is also widely called a mouse grouper. Generally, grouper species possess 7-11 dorsal spines. Just four groupers possess 10 dorsal spines, aside from the humpback grouper, including Epinephelus snalogus, Hyporthodus exsul and H. nigritus. Based on molecular phylogenetic analysis of mitochondrial and nuclear genes of groupers, the humpback grouper is separated from Epinephelus species [3][4][5][6]. However, the growth ratio of humpback groupers is far less than the most closely related Epinephelus species, such as E. lanceolatus and E. fuscoguttatus. The speciation and evolution of humpback groupers is a controversial issue. At present, no valid genome and transcriptome have been reported for the species. Lack of genetic resources seriously hinders the research about humpback groupers. Recently, several grouper species genomes, including redspotted grouper (E. akaara) [7], giant grouper (E. lanceolatus) [8], leopard coral grouper (Plectropomus leopardus) [9], kelp grouper (Epinephelus moara) [10] and brown-marbled grouper (Epinephelus fuscoguttatus) [6], have been published. These high-quality genomes provide essential genetic resources for better understanding of the evolution and phylogeny of grouper species. In the present study, the assembly of the humpback grouper genome is important for further research on the speciation, evolution, molecular-assisted selection and the developmental mechanism of special shape.

Sample Collection, Library Construction and Sequencing
A humpback grouper with body weight of 183.0 g and total length of 25.4 cm (Figure 1) was collected from Chenhai Aquatic Co., Ltd. (Hainan, China). The fish was immediately dissected after anesthesia with MS-222. White muscle tissue in the dorsal was sampled and immediately stored in liquid nitrogen, which was used for genomic DNA sequencing and Hi-C library construction. Moreover, ten tissues, including skin, muscle, liver, kidney, brain, intestine, fat, spleen, heart and gill, were collected and stored in RNAlater for transcriptome sequencing. Total DNA was extracted from white muscle tissue with a TIANamp Marine Animals DNA Kit (Tiangen Biotech Co., Ltd., Beijing, China). Quality and quantity of total DNA were determined by NanoDrop 2000 (Thermo Fisher Scientific Inc., Waltham, MA, USA). A paired-end sequencing library with insert length of 350 bp was constructed using a TruSeq Nano DNA LT Library Preparation Kit (Illumina, San Diego, CA, USA). The obtained library was then sequenced on Illumina HiSeq X Ten platform.
Genome DNA was broken into fragments by Covaris and was recycled by AMpure PB beads (Pacific Biosciences, Menlo Park, CA, USA). A SMRTbell library was constructed using SMRTbell Template Prep Kit (Pacific Biosciences, USA), according to the manufacturer's instruction and was sequenced on PacBio Bioscience Sequel platform (Pacific Biosciences, USA).
Muscle samples were fixed with fresh paraformaldehyde and then DNA-protein bonds were created. The Mbo I restriction enzyme was used to digest the DNA and the overhanging 5' ends of the DNA fragments were repaired with a biotinylated residue. The fragments, closed to each other in the nucleus during fixation, were ligated and the denatured proteins were removed. The Hi-C fragments were further sheared by sonication and then pulled down with streptavidin beads. The library was sequenced on an Illumina HiSeq X Ten platform with PE150 strategy.
Total RNA of the 10 tissues (approximately 80 mg of each) was extracted using RNAiso reagents (Takara, Dalian, China), following the manufacturer's instructions. The quantity and quality of RNA samples were determined using a microplate spectrophotometer (BioTek Company, Winooski, VT, USA) and electrophoresis which was conducted using 1% agarose gel. Total RNA of the 10 tissues was mixed with equal amounts to generate a mixed RNA pool. An RNA-seq library was prepared using NEBNext UltraTM RNA Library Prep Kit (NEB, USA), following the manufacturer's protocols. The library's quality and quantity were measured using Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Finally, the libraries were sequenced on Illumina-Hiseq 2000 platform with PE150 paired-end approach. A total of 119.33 Gb clean data was obtained with depth of 114.26× (Table 1).

Genome Assembly
We estimated the main genome characteristics of the humpback grouper through k-mer frequency distribution analysis [11]. After filtering, the clean data were used to estimate genome size and heterozygosity using 19-mer.
For the PacBio sequencing data, after removing low-quality reads, clean data were corrected and assembled using Canu version 1.8 with parameters of genomeSize = 107,000,000, corOutCoverage = 80, corMhapSensitivity = low and correctedErrorRate = 0.025 [12]. Wtdbg was used to assemble genome by constructing fuzzy Brujin graph (https://github. com/ruanjue/wtdbg accessed on 20 May 2021). Quickmerge [13] was used to merge assemblies produced by Canu and wtdbg to produce a more contiguous assembly with a parameter of -hco 5.0 -c 1.5 -l 100,000 -ml 5000. In simple terms, contigs from Canu as query input and contigs from wtdbg as reference input are aligned through mummer version 4.0 [14]. The assembled genome was polished using Pilion version 1.22 [15]. For estimating genome completeness, Illumina data were mapped back to the humpback grouper genome to calculate the mapping rate using BWA v0.7.17 [16]. The genome integrality was verified based on the Core Eukaryotic Genes Mapping Approach (CEGMA) database [17] and Benchmarking Universal Single-Copy Orthologs (BUSCO) database [18].

Pseudochromosome Construction
The sequencing data of Hi-C were filtered to remove low-quality reads using Fastp version 0.12.6 [19]. The clean reads were aligned to draft genome of humpback grouper using bowtie2 version 2.3.2 [20] with end-to-end model and a parameter of very-sensitive. The draft genome was broken into 50 Kb fragments and reassembled with correct cluster, order and orient of the contigs using Lachesis with parameters of CLUSTER_MIN_RE_SITES = 100; CLUSTER_MAX_LINK_DENSITY = 2; CLUSTER_NONINFORMATIVE_RATIO = 2; OR-DER_MIN_N_RES_IN_TRUN = 15; and ORDER_MIN_N_RES_IN_SHREDS = 15 [21].
The phylogenetic tree was constructed using shared single-copy genes among the eight fish species mentioned above. Protein sequences of these single-copy genes were aligned using MAFFT version 7.394 [48]. Gblocks version 0.91b [49] was used to remove low-quality alignments. The phylogenetic tree was constructed using RAxML software version 8.0.9 [50] and IQ-tree version 1.6.11 [51] with bootstrap value of 1000.
The genomic collinearity analyses of brown-marbled grouper and other groupers were carried out using MCScanX [52]. The divergence time at each tree node was predicted using MCMCtree in PAML package version 4.9 [53]. The two calibration times including E. akaara vs. E. fuscoguttatus with 15.2~28.5 Myr and L. crocea vs. P. leopardus with 99~127 Myr were obtained from the TimeTree database [54].

Sequencing and Genome Assembly
Firstly, next-generation sequencing was applied to estimate the genome characteristics, including size, heterozygosity and repeat ratio, using the k-mer method. A total of 196.19 Gb clean data (184×) was obtained, with Q20 of 96.80% and Q30 of 91.90% (Table 1). The main peak of 19-mer in frequency distribution was at a depth of 152× (Figure 2A). The predicted genome size of the humpback grouper is 1.07 Gb with repeats of 29.53%. The heterozygosity and GC content were 0.09% and 41.2%, respectively.  (Table 1). A draft genome was assembled with 470 contigs with a length of 1.04 Gb and a contig N50 of 18.09 Mb using Canu and wtdbg ( Table 1).
The genome quality was assessed by illumina data using BWA. The results showed that 98.29% of reads mapped into the draft genome, while 96.25% of reads properly mapped into the genome. The genome was aligned against the CEGMA database, which was constructed by a set of 458 core eukaryotic genes (CEGs) that are present in a wide range of taxa. A total of 447 (97.60%) CEGs were mapped ( Figure 2B). The genome was then aligned to a reference gene set from the BUSCO database, which was constructed from 20 fish species, consisting of 4584 genes. A total of 4351 (94.92%) genes were mapped completely.

Annotation
A total of 1,618,118 repeat sequences were predicted with 380.56 Mb, which account for 37.55% of the humpback grouper genome (Table S1). Of them, 2184 microsatellite sequences with 1,738,061 bp were detected, which account for 0.17% of the humpback grouper genome.
For non-coding RNAs, a total of 490 miRNA, 530 rRNA and 1239 tRNA were predicted. For protein-coding genes, a total of 26,037 genes were predicted (Table S2); the total length and average length of these genes were 443. 64

Evolution analyses
The orthogroups of eight fish species were predicted using Orthofinder software. A total of 23,448 orthogroups were detected in all species. A total of 20,971 orthogroups were predicted in the humpback grouper. Of them, 25 species-specific orthogroups and 65 species-specific genes were detected ( Figure 3A). Parts of these genes could be annotated and associated with myofiber structure and immunization. The number of shared singlecopy orthogroups is 5066, which was applied in the establishment of the phylogenetic tree.
The phylogenetic tree from RAxML software was consistent with that from the IQ-tree software ( Figures 3B and S1): the humpback grouper and other Epinephelus species clustered into one branch with high bootstrap value (≥85). The giant grouper is the most closely related species to the humpback grouper with the shortest divergence time (3.22~16.30 Mya) compared to other grouper species. Then, the two groupers were clustered with the orangespotted grouper, brown-marbled grouper, kelp grouper, red-spotted grouper and leopard coral grouper in turn ( Figure 2B). The divergence time between the humpback grouper and all the Epinephelus species was less than 16.55 Mya, while the divergence time between Epinephelus species and the leopard coral grouper reached 16.20-92.98 Mya ( Figure 3C). The results suggested that the humpback grouper may diverge from Epinephelus species.
The collinearity analyses were carried out using MCScanX. There was high collinearity between the humpback grouper and giant grouper ( Figure 4A). However, there was a gap in chromosome 6 of the humpback grouper compared to the most closely related Epinephelus species (Figures 4A and S2). Most of the genes located in 29,818,799~43,791,507 of the giant grouper genome were loosed in the humpback grouper ( Figure 4B). Based on KEGG enrichment analysis ( Figure 4C), the missed genes were mainly involved in immunity (IL1s, KRABs and JAK), substance metabolism (UGTs and GSTs) and MAPK signal transduction (CACNAs, FGF, EREG, MAPK1/3, MAPKAPK5 and MKP).

Discussion
The humpback grouper, the only species in the Cromileptes genus of Epinephelidae, possesses special morphologies compared with other groupers. Thus, its speciation, evolution and phylogenetic position has attracted much attention. The high-quality chromosomelevel genome of the humpback grouper provides an important foundation for the analyses of its origin and evolution.
In the research, a chromosome-level genome of the humpback grouper was assembled using PacBio sequencing and high-throughput chromatin conformation capture (Hi-C) technology. The heterozygosity of the genome was 0.09%. Its heterozygosity is lower than other grouper species, such as 0.375% in red-spotted grouper [7], 0.42% in leopard coral grouper [9] and 0.35% in brown-marbled grouper [6], which implies that the genetic diversity of the humpback is lower than other grouper species. The assembled genome was 1.013 Gb with 283 contigs, of which, a total of 143 contigs with 1.011 Gb in size were correctly anchored into 24 chromosomes. The genome size is similar with groupers in Epinephelus, such as 1.054 Gb in giant grouper [8], 1.047 Gb in brown-marbled grouper [6], 1.135 Gb in red-spotted grouper [7] and 1.08 Gb in kelp grouper [10].
The percentage of repeat contents in the humpback grouper was 37.55%, which was slightly higher than 34.6% of leopard coral grouper [9] and lower than 43.02% of red-spotted grouper [7], 41.1% of giant grouper [8] and 43.16% of brown-marbled grouper [6]. For protein-coding genes, a total of 26,037 protein-coding genes were predicted, of them, 25,243 (96.95%) genes could be functionally annotated. The results showed a high annotation rate.
The specific genes were predicted using Orthofinder software. A total of 65 specific genes in the humpback grouper were obtained. These genes were mainly associated with myofiber structure and immunity, which may be involved in the growth and disease resistance of the humpback grouper. A total of 5066 shared single-copy orthogroups were obtained in the seven grouper species and the outgroup, which was higher than other research that used more outgroups [6,7,9]. The application of more single-copy orthogroups made the phylogenetic analysis more accurate. Based on the phylogenetic analyses of the humpback grouper and other Epinephelus species clustered into one branch with high bootstrap value, the giant grouper is the most closely related species of humpback grouper with a divergence time of 3.22~16.30 Mya. The divergence time between the humpback grouper and all the Epinephelus species was less than 16.55 Mya. Similar conclusions have been reported in several studies. For example, in the phylogenetic tree constructed by mitochondrial genome, the humpback grouper was clustered with the giant grouper as one clade, then clustered with all Epinephelus species as one clade [3]; in the phylogenetic tree based on cytochrome b gene, the humpback grouper was also clustered with the Epinephelus species as one clade [4]; and in the phylogenetic tree constructed by single-copy orthogroups, the humpback grouper was firstly clustered with the brownmarbled grouper, then was clustered with Epinephelus species [6].Though there were slight differences in topological structure, these results confirmed that Cromileptes originated from the Epinephelus genus.
There was high collinearity between the genomes of the humpback grouper and Epinephelus species. However, we found a gap in chromosome 6 which spanned 29,818,7994 3,791,507 bp in the humpback grouper genome compared to the giant grouper. The result was also observed in the collinearity analysis between the brown-marbled grouper and humpback grouper [6]. Missed genes in the gap were mainly involved in immunity, substance metabolism and the MAPK signal pathway. Recently published research also reported that there were expansions of gene families involved in immunity in the brownmarbled grouper compared with the humpback grouper [6]. The change of genes involving immunity might induce a difference in disease resistance between the two species. For missed genes involved in immunity, interleukin 1s (IL1s) play important roles in innate inflammation through stimulating thymocyte proliferation and B-cell maturation and proliferation [55]. Tyrosine protein kinase (JAK) is involved in various processes such as metabolism, immunity and cell cycle control [56][57][58][59]. For substance metabolism, glucuronosyltransferases (UGTs) are essential factors in the elimination and detoxification of drugs and metabolism of endogenous and xenobiotics substances [60][61][62]; glutathione S transferases (GST) as detoxification enzymes could catalyze a combination of glutathione with electrophilic groups of substances, which play important roles in resistance to drugs, environmental pollutants and reactive oxygen species [63]. The MAPK signal pathway plays an important role in the regulation of cell proliferation [64]. The loss of the parts of genes involved in these biological processes might affect the disease resistance, stress tolerance and growth traits in the humpback grouper. In the missed fragment, the gene directly associated with morphology was not detected. The gene involved in the form of the humpback in the humpback grouper still need to be explored.

Conclusions
In the research, a high-quality chromosome-level genome of humpback grouper was assembled using PacBio sequencing and high-throughput chromatin conformation capture (Hi-C) technology, which will provide pivotal genomic information for future research of speciation, evolution and molecular-assisted breeding in humpback groupers. In addition, phylogenetic analysis, based on single-copy orthologues of grouper species, showed that the humpback grouper is included in the Epinephelus genus and clustered with the giant grouper to one clade with a divergence time of 9.86 Myr. Moreover, a gap in chromosome 6 of the humpback grouper was detected based on collinearity analysis; the missing genes were mainly associated with immunity, substance metabolism and the MAPK signal pathway. The loss of the parts of genes involved in these biological processes might affect the disease resistance, stress tolerance and growth traits in the humpback grouper.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3 390/genes12121873/s1, Figure S1: The phylogenetic tree of humpback grouper and other groupers using IQ-tree software, Figure S2: Collinearity analysis of Cromileptes altivelis and Epinephelus coioides, Table S1: The statistics of repeat information in humpback grouper genome, Table S2: Gene prediction of humpback grouper, Table S3: Construction statistics of predicted genes in humpback grouper genome.

Data Availability Statement:
The chromosome-level genome assembly of the humpback grouper was deposited in the GenBank database with Whole Genome Shotgun projects: JAAWWW000000000. The raw data of NGS sequencing, Pacbio, Hi-C and RNA-seq for genome assembly of the humpback grouper were reserved in the GSA (Genome Sequence Archive) database and the accession numbers were CRA002771, CRA002772, CRA002773 and CRA002774.

Conflicts of Interest:
The authors declare no conflict of interest.