Novel Gene Rearrangement and the Complete Mitochondrial Genome of Cynoglossus monopus: Insights into the Envolution of the Family Cynoglossidae (Pleuronectiformes)

Cynoglossus monopus, a small benthic fish, belongs to the Cynoglossidae, Pleuronectiformes. It was rarely studied due to its low abundance and cryptical lifestyle. In order to understand the mitochondrial genome and the phylogeny in Cynoglossidae, the complete mitogenome of C. monopus has been sequenced and analyzed for the first time. The total length is 16,425 bp, typically containing 37 genes with novel gene rearrangements. The tRNA-Gln gene is inverted from the light to the heavy strand and translocated from the downstream of tRNA-Ile gene to its upstream. The control region (CR) translocated downstream to the 3’-end of ND1 gene adjoining to inverted to tRNA-Gln and left a 24 bp trace fragment in the original position. The phylogenetic trees were reconstructed by Bayesian inference (BI) and maximum likelihood (ML) methods based on the mitogenomic data of 32 tonguefish species and two outgroups. The results support the idea that Cynoglossidae is a monophyletic group and indicate that C. monopus has the closest phylogenetic relationship with C. puncticeps. By combining fossil records and mitogenome data, the time-calibrated evolutionary tree of families Cynoglossidae and Soleidae was firstly presented, and it was indicated that Cynoglossidae and Soleidae were differentiated from each other during Paleogene, and the evolutionary process of family Cynoglossidae covered the Quaternary, Neogene and Paleogene periods.


Introduction
Flatfishes (Pleuronectiformes) are unique animals with both eyes moved on one side of the body through asymmetrical development. Furtherly, according to the traditional morphological opinions, the tonguefishes (Cynoglossidae and Soleoidae) are the most specified and have an advanced classification in Pleuronectiformes [1,2]. Cynoglossus monopus belongs to Cynoglossidae in suborder Soleoidei, distributed from the Malay Archipelago to the Indian Ocean, including the west and northward along the South China Sea [1,2]. It is a small benthic fish with ctenoid scale covered on both-side, two lateral line on ocular-side and absent lateral line on blind-side. Especially, the small pedunculate eyes are the distinct diagnosis in C. monopus from other Cynoglossus species.
Traditionally, suborder Soleoidei includes three families: Achiridae, Cynoglossidae and Soleidae [3], among which Achiridae was thought to be separated from Soleidae and represented the primitive sister group to Soleidae and Cynoglossidae [4]. The Cynoglossidae was believed to be derived from the Soleidae as well, and have a closer relationship with Soleidae than Achiridae [5]. In morphology, both eyes of the species in Cynoglossidae and Soleidae turned to the left-side and the right-side, respectively, which is the principle basis to distinguish these two families [1]. Within Cynoglossidae, there are more than 140 species in three genera of two subfamilies. The genera Cynoglossus and Paraplagusia were sister groups belonging to subfamily Cynoglossinae, while Symphurus was the only genus in subfamily Symphurinae [3].
The mitochondrial DNA (mtDNA) is a double-stranded circular DNA molecule. Similar in composition and structure to most vertebrates, it is 15-20 kb in length, containing 13 protein-coding genes (PCGs), 2 ribosomal RNA genes (rRNA), 22 transfer RNA genes (tRNA), a light-strand replication origin (O L ), and a control region (CR) that possesses cis-regulatory elements [6,7]. Compared to the nuclear genome, the mitochondrial genome shares maternal inheritance, large copy number, stable gene composition, conserved gene arrangement, and high evolutionary rate [6,8]. Therefore, mitogenome has been widely used to infer phylogenetic relationships and population genetics in animal [9][10][11]. Furthermore, the gene rearrangements have been reported in various vertebrates [11][12][13][14][15][16], but the organization in most fish mitogenomes is generally considered quite conserved [17,18]. However, we analyzed the complete mitogenome sequence of C. monopus and found a novel gene order in this study. Although there are some phylogenetic studies focusing on genus Cynoglossus, the evolutionary histories of Cynoglossidae and Soleidae are still remain unclear since very few reports concentrated on the divergence time among them (Table 1). To fill the gap in genetic information, this study analyzes the phylogeny and evolutionary histories of Cynoglossidae and Soleidae based on the complete mitogenome of C. monopus and all available mitogenomic data of these two families.

Genome Organization and Nucleotide Composition
The complete mitochondrial genome of C. monopus is 16,425 bp in length (GenBank accession number MT798589), within the range of other reported Pleuronectiformes mitogenomes from 15,973 bp (Kareius bicoloratus) to 18,369 bp (Cynoglossus trigrammus). This mitogenome contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, the origin of light-strand replication (O L ) and a control region (CR) that possesses cis-regulatory elements ( Figure 1, Table 2). Except ND6 and eight tRNA genes encoded on the Light-strand, others are encoded on the Heavy-strand. The tRNA-Gln gene was inverted from the L-strand position to the H-strand in the other position. This result is consistent with the findings of previous studies where species in subfamily Cynoglossinae have been found to have large-scale gene rearrangements, and a unique gene order, CR-Gln-Ile-Met, which is different from the typical gene order of CR-Phe-12S-Val-16S-Leu1-ND1-Gln-Ile-Met [19][20][21][22][23][24][25][26][27].   The overall base composition is 30.80% A, 24.04% C, 14.77% G, and 30.39% T, with a high AT content (61.19%). The AT-skew and GC-skew of the C. monopus mitogenome are 0.01 and −0.24, respectively ( Table 3). C. monopus showed a higher AT content (63.19%) in 16S rRNA gene than the 12S rRNA (55.31%). The control region is 736 bp in length with the rich AT (70.92%) and poor G (11.41%) content. Moreover, the AT content of 13 PCGs ranged from 54.21% (ND4L) to 66.67% (ATP8). The AT content of tRNAs is 61.10% in average, while the CG content is 38.90%.

Protein-Coding Genes (PCGs), Transfer RNAs and Ribosomal RNAs
In the mitogenome of C. monopus, except for the ND3, started with ATT and COII terminated with T, most PCGs have typical initiation codons (ATG or GTG) and termination codons (TAA or TAG) ( Table 2). The size of 13 PCGs ranged from 165 bp (ATP8) to 1854 bp (ND5), and the total length of PCGs is 11,422 bp, similar to other Cynoglossidae species ( Figure 2B). The GC-skews of the 13 PCGs were all negative and the majority of the AT-skew values were negative, similarly. The AT skews of ND3 and ND6 were the lowest and the GC-skews of COI and COII were the highest (Table 3)

Mitochondrial Gene Codon Usage and Skewness
The amino acids were utilized by either two or four different codons, respectively. Ile was the most frequently used, while Met and Trp were the least frequently used ( Figure 3A). In addition, the relative synonymous codon usage (RSCU) analysis indicated that Arg (AGG, AGA), Pro (CCC) and Ala (GCU) were the most frequent, and Ser (UCG), Thr (ACG), Pro (CCG) and Lys (AAG) were rare.
Moreover, the AT-skew and GC-skew in the mitogenome of the subfamily Cynoglossinae were reflected in the codon usage consistently ( Figure 3B). The RSCU values indicated that codons with A or U in the third position were more frequent than C or G. Additionally, the GC-skew values are negative in the available mitogenomes of Cynoglossidae species (Figure 2A). In contrast, most of the AT skews are positive, except for C. itinus. Int   The 22 tRNA genes of C. monopus ranged from 65 bp (tRNA-Cys) to 76 bp (tRNA-Lys), with a total of 1540 bp. The 12S rRNA (946 bp) and 16S rRNA genes (1702 bp) of the C. monopus mitogenome are located in the typical position between tRNA-Phe and tRNA-Leu (UUR), and separated by tRNA-Val, with a high AT content of 60.42%. Additionally, the origin of light-strand is 24 bp in length between tRNA-Asn and tRNA-Cys.

Mitochondrial Gene Codon Usage and Skewness
The amino acids were utilized by either two or four different codons, respectively. Ile was the most frequently used, while Met and Trp were the least frequently used ( Figure 3A). In addition, the relative synonymous codon usage (RSCU) analysis indicated that Arg (AGG, AGA), Pro (CCC) and Ala (GCU) were the most frequent, and Ser (UCG), Thr (ACG), Pro (CCG) and Lys (AAG) were rare.

Gene Rearrangement
Similar to other Cynoglossus and Paraplagusia species which mitogenome available, the CR of C. monopus is translocated downstream to the place between ND1 and tRNA-Gln instead of the typical location between tRNA-Pro and tRNA-Phe (Figure 1). The translocation left a functionally undefined 24 bp trace fragment in the original CR position. Furthermore, the tRNA-Gln gene (Q) translocated from the downstream of tRNA-Ile gene (I) to its upstream with inverted encoding direction (Q'). The result was the formation of the Q'-I-M gene order, which is different from the typical I-Q-M gene order in most vertebrates (Figure 4a).
Among several mitochondrial gene rearrangement models [46][47][48], the intramitochondrial Moreover, the AT-skew and GC-skew in the mitogenome of the subfamily Cynoglossinae were reflected in the codon usage consistently ( Figure 3B). The RSCU values indicated that codons with A or U in the third position were more frequent than C or G. Additionally, the GC-skew values are negative in the available mitogenomes of Cynoglossidae species (Figure 2A). In contrast, most of the AT skews are positive, except for C. itinus.

Gene Rearrangement
Similar to other Cynoglossus and Paraplagusia species which mitogenome available, the CR of C. monopus is translocated downstream to the place between ND1 and tRNA-Gln instead of the typical location between tRNA-Pro and tRNA-Phe (Figure 1)

Phylogenetic Analyses
Both BI and ML analyses produced almost identical topologies with similar branch lengths. Most clades were strong supported by high bootstraps (ML) and posterior probabilities (Bayesian) values ( Figure 5). Contrary to the traditional classification [36,39,40], the molecular phylogenetic tree showed a clade with three Soleidae species (P. pavoninus, A. kobensis and L. melanospilos) clustered to Cynoglussidae as a sister group rather than other Soleidae species ( Figure 5). This suggests that Soleidae is not a monophyletic group. However, the insufficient species could cause questionable phylogeneic results [49], and as Soleidae is a large family including 32 genera and at least 180 species [50], more species will be necessarily needed for further investigation. Furthermore, the most terminal branch relationship within Cynoglossus is consistent with traditioanal taxon. This supports the opinion that lateral line number and scale characteristics are valuable morphological diagnoses in classification [1].
Among Cynoglussidae, compared to other Cynoglossus species, C. monopus had the closest phylogenetic relationship to C. puncticeps. Traditionally, they were also classified into the subgenus Cynoglossoides by sharing the most similarities with both sides covered with ctenoid scale, two lateral lines on the ocular-side and an absent lateral line on the blind-side [1]. However, the clade of C. monopus and C. puncticeps clustered to the Paraplagusia rather than other Cynoglossus species. This Among several mitochondrial gene rearrangement models [46][47][48], the intramitochondrial recombination is the most probable mechanism to explain the rearrangement events in the mitogenome of C. monopus based on the principle of parsimony. The hypothesized intermediate steps are as follows. Firstly, the whole Control region (likely carrying some neighbour sequences) translocated to the downstream of ND1 gene, and left a duplicated partial Control region in the original position. At the same time, the tRNA-Gln gene was inversely translocated to the upstream of tRNA-Ile with some neighbour sequences and left a duplicated partial tRNA-Gln fragment between tRNA-Ile and tRNA-Met. The rearrangement formed a new ND1-CR-Q'-I-M region in the mitogenome of C. monopus with unfunctional gene fragments connecting each gene (Figure 4b). Secondly, after a rapid deletion process under the strong selective pressure, the unfunctional gene fragments in each gene junction left a 24 bp trace fragment in the original CR position and 5 and 6 bp intergenic spacers between Q'-I and I-M gene junctions, respectively (Figure 4c,d).

Phylogenetic Analyses
Both BI and ML analyses produced almost identical topologies with similar branch lengths. Most clades were strong supported by high bootstraps (ML) and posterior probabilities (Bayesian) values ( Figure 5). Contrary to the traditional classification [36,39,40], the molecular phylogenetic tree showed a clade with three Soleidae species (P. pavoninus, A. kobensis and L. melanospilos) clustered to Cynoglussidae as a sister group rather than other Soleidae species ( Figure 5). This suggests that Soleidae is not a monophyletic group. However, the insufficient species could cause questionable phylogeneic results [49], and as Soleidae is a large family including 32 genera and at least 180 species [50], more species will be necessarily needed for further investigation. Furthermore, the most terminal branch relationship within Cynoglossus is consistent with traditioanal taxon. This supports the opinion that lateral line number and scale characteristics are valuable morphological diagnoses in classification [1].

Divergence Time Analyses
The advent of the phylogenomic era has significantly improved our understanding of the taxonomy and phylogenetic relationships of many animals [51]. Compared with other flatfish studies, the divergence time estimations within or between families Cynoglossidae and Soleidae were additionally conducted based on two calibration constraints in this study. The divergence time between Cynoglossidae and Soleidae could be occurred at about  Mya). Furthermore, the divergence time of the subfamily Cynoglossinae could be dated back to 20.92 Mya (17.11-28.44 Mya) earlier than the fossil record of C. leuchsi (lower and middle Miocene) [5].
The Chronogram for the 34 species of Pleuronectiformes covered three geological epochs, including the Quaternary, Neogene, and Paleogene periods ( Figure 6). It was indicated that family Soleidae (33.  Among Cynoglussidae, compared to other Cynoglossus species, C. monopus had the closest phylogenetic relationship to C. puncticeps. Traditionally, they were also classified into the subgenus Cynoglossoides by sharing the most similarities with both sides covered with ctenoid scale, two lateral lines on the ocular-side and an absent lateral line on the blind-side [1]. However, the clade of C. monopus and C. puncticeps clustered to the Paraplagusia rather than other Cynoglossus species. This suggests that the relationship between Cynoglossus and Paraplagusia is more complex than expected and needs further study.

Divergence Time Analyses
The advent of the phylogenomic era has significantly improved our understanding of the taxonomy and phylogenetic relationships of many animals [51]. Compared with other flatfish studies, the divergence time estimations within or between families Cynoglossidae and Soleidae were additionally conducted based on two calibration constraints in this study. The divergence time between Cynoglossidae and Soleidae could be occurred at about 45.15 Mya (37.47-49.99 Mya). Furthermore, the divergence time of the subfamily Cynoglossinae could be dated back to 20.92 Mya (17.11-28.44 Mya) earlier than the fossil record of C. leuchsi (lower and middle Miocene) [5].
The Chronogram for the 34 species of Pleuronectiformes covered three geological epochs, including the Quaternary, Neogene, and Paleogene periods ( Figure 6). It was indicated that family Soleidae (33.

Specimen Collection and DNA Extraction
A single specimen of C. monopus was collected from Sanya (E 108°56′, N 18°09′), Hainan Province. The voucher specimen (Voucher No. HNSY2010060432) was deposited in the College of Marine Sciences, South China Agricultural University, Guangzhou, China (SCAU). Animal experiments Figure 6. Chronogram for the 32 species of Pleuronectiformes with P. erumei and P. olivaceus as outgroups based on the concatenated nucleotide sequences of the 12 PCGs (except ND6 and the third codon positions) and 2 rRNA genes using BEAST analysis. Numbers near the nodes indicate the average estimated divergence time estimated in Mya, and the 95% confidence intervals for each node are shown in blue bars.

Specimen Collection and DNA Extraction
A single specimen of C. monopus was collected from Sanya (E 108 • 56 , N 18 • 09 ), Hainan Province. The voucher specimen (Voucher No. HNSY2010060432) was deposited in the College of Marine Sciences, South China Agricultural University, Guangzhou, China (SCAU). Animal experiments were conducted in accordance with the guidelines and approval of the Animal Research and Ethics Committees of SCAU. Genomic DNA was extracted from the muscle of C. monopus according to the standard phenol-chloroform procedure [52]. The data analysis method is based on the previous study [53].

PCR Amplification and Sequencing
To ensure a sufficient amount of DNA for the amplification and sequencing of these small fishes, the parameters of the LA-PCR reactions were mostly in accordance with the manufacturer's recommendations. PCR products were purified using the gel purification kit (Invitrogen) after gel-cutting (1.5% TBE agarose). Purified PCR products were sequenced directly on an ABI 3730 automated sequencer (Life Technologies Holdings Pte Ltd, Tuas, Singapore) with ABI PRISM BigDye Terminators v3.0 Cycle Sequencing (ABI) using the primer-walking strategy. The eight fragments were separated from the complete mitogenome of C. monopus, with universal primer for Cynoglossidae mitogenome.

Sequene Analysis
Sequence data were analyzed and compiled to create the complete genome using the SeqMan program from Lasergene soft package (DNASTAR, Madison, WI, USA), and manually adjusted in a few cases. The complete mitogenome was annotated using the software of Sequin v16.0 (National Library of Medicine, Bethesda, MD, USA). Mitochondrial tRNA genes and their secondary structures were obtained by ARWEN v1.2 [54], and identified by tRNAscan-SE Search Server v2.0 (Washington University School of Medicine, St Louis, MO, USA) [55] using default search mode, then anticodons were further confirmed. Annotation and accurate boundary determination of protein-coding and ribosomal RNA genes were first performed by NCBI-BLAST searches, and then by alignment and manual comparisons with the other released reference mitogenomes of Cynoglossidae species using DNAMAN v6.0 (Lynnon Biosoft, San Ramon, CA, USA).
The complete mitogenome of C. monopus was uploaded to GenBank with accession number MT798589. The graphical genome map of the C. monopus mitogenome was drawn using CGView Server v1.0 [56]. The base composition, codon usage and RSCU values were obtained using MEGA 7.0 (Tokyo Metropolitan University, Tokyo, Japan) [57]. Strand asymmetry was estimated using the following formulas by Perna and Kocher (1995) [58]:

Phylogenetic and Divergence Time Analyses
Phylogenetic analyses were conducted using 19 Cynoglossidae species and 13 Soleidae species with P. erumei and P. olivaceus as outgroups. All sequences were available in GenBank (27/7/2020). We aligned DNA sequences of 12 protein-coding genes (except ND6) and two rRNA genes in the 34 species using the MAFFT program with the default parameters [59]. The alignments of PCGs (except ND6) excluded the start codon and the stop codon. Then, ambiguously aligned fragments of two alignments were removed using Gblocks [60], and exported two gblocks alignments. We used a dataset comprised of concatenated the gblocks of 12 protein-coding genes (the first and second codon positions, except ND6) and 2 rRNA genes.
Then, the data were divided into three pre-defined partitions for the best partitioning scheme using PartitionFinder 2.0 (Macquarie University Genes to Geosciences Centre, North Ryde, NSW, Australia) [61], with the greedy algorithm. ModelFinder [62] lugin integrated into PhyloSuite v1.2.1 (Bio-Transduction Lab, Wuhan, China) [63] was used to select the best-fit partition model (Edge-unlinked). The best-fit model according to BIC: GTR + F + G4 was selected as the optimal model for the glocks of first codons of PCGs (except ND6) and rRNAs, respectively, whereas TVM + F + G4 was chosen for the gblocks of two rRNA genes. Additionally, phylogenetic analyses were performed using the Bayesian analyses (BI) and Maximum Likelihood (ML) methods [64,65]. ML analysis was inferred using IQ-TREE v1.6.2 [66] plugin integrated into PhyloSuite v1.2.1 under Edge-linked partition model for 10,000 ultrafast [67] bootstraps, approximate Bayes test [64], as well as the Shimodaira-Hasegawa-like approximate likelihood-ratio test [68]. Additionally, Bayesian inference with partition model was conducted in MrBayes 3.2.6 (University of California, La Jolla, San Diego, CA, USA) [65] under the partition model (two parallel runs, 2,000,001 generations), in which the initial 25% of sampled data were discarded as burn-in with default settings and 5 × 106 metropolis-coupled Markov chain Monte Carlo (MCMC) generations. The iTOL dataset files produced by PhyloSuite were then used to visualize and annotate the phylograms and gene orders in iTOL [69].
The evolutionary analysis was inferred by BEAST v1. 10.4 (open source under the GNU lesser general public license) using Bayesian Information Criterion based on the two gblocks (the first and second codon positions of PCGs, except ND6) and two rRNA genes of 34 Pleuronectiformes species [70]. The divergence times were presented in the Time Tree database (http://www.timetree.org/) [71] and fossil-based comparative analyses [72]. The time tree was computed using two calibration constraints: the most recent common ancestor (MRCA) of genus Cynoglossus was estimated to be lower and middle Miocene period based on fossilized C. leuchsi and the MRCA of both Cynoglossidae and Soleidae was estimated to be at the lower Eocene at least 45 Mya [5]. Molecular dating involved a Birth-Death process as the tree prior, and an uncorrelated relaxed clock as the best model. The chains of 1 × 10 8 samples were run for the MCMC analysis, and the 10% of all samples was burn-in using TreeAnnotator. Tracer v1.7.1 was used to confirm the output [73]. FigTree v1.4.3 was used to edit the results.

Conclusions
This study suggests that mitochondrial gene rearrangement in Cynoglossidae is a single originated evolutionary event, which occurred in the common ancestor of Cynoglossus and Paraplagusia before at least 22.82 Mya (17.11-28.44 Mya). The highly similar rearranged gene order in all available Cynoglossus and Paraplagusia species [18,[35][36][37][38][39][40][41][42][43][44][45] inferred that novel gene order possesses some selective advantage in this group, then kept it conserved in the whole lineage. Even the details of the molecular mechanism are still unclear; the intramitochondrial recombination is the most probable model to explain the process of gene rearrangement in this group based on the principle of parsimony.
The phylogenetic relationships constructed by ML and BI method are consistent. The time tree covers three geological epochs, including the Quaternary, Neogene, and Paleogene periods. The divergence times show that C. monopus and C. puncticeps begin to differentiate from other species at about 14.29 Mya (6.43-22.04 Mya) within the Neogene period. Tonguefish is a highly specialized body. The traditional morphological classifications of tonguefish are often controversial due to the unstable diagnosis caused by its asymmetrical development. The mitochondrial rearrangement is a helpful gene marker to reconstruct the phylogenetic relationship in this group with mitogenome and other molecular data.
Author Contributions: C.W. analyzed the data, prepared the figures and tables, and wrote the paper. H.C. collected the samples and revised the drafts of the paper. S.T. and C.Y. performed the experiments, analyzed the data. X.C. conceived and designed the experiments, supervised the work, and polished the paper. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.