The New Mitochondrial Genome of Hemiculterella wui (Cypriniformes, Xenocyprididae): Sequence, Structure, and Phylogenetic Analyses

Hemiculterella wui is a small endemic freshwater fish distributed in the Pearl River system and the Qiantang River, China. In this study, we sequenced and annotated the complete mitochondrial genome of H. wui. The mitochondrial genome was 16,619 bp in length and contained 13 protein coding genes (PCGs), two rRNA genes, 22 tRNA genes, and one control region. The nucleotide composition of the mitochondrial genome was 29.9% A, 25.3% T, 27.4% C, and 17.5% G. Most PCGs used the ATG start codon, except COI and ATPase 8, which started with GTG. Five PCGs used the TAA termination codon, ATPase 8 ended with the TAG stop codon, and the remaining seven genes used one of two incomplete stop codons (T or TA). Most of the tRNA genes showed the classical cloverleaf secondary structure, except tRNA-Ser(AGY), which lacked the dihydrouracil loop. The average Ka/Ks value of the ATPase 8 gene was the highest, while that of the COI gene was the lowest. Phylogenetic analyses showed that H. wui is closely related to Pseudohemiculter dispar and H. sauvagei. This study provides a valuable basis for further taxonomic and phylogenetic studies of H. wui and Xenocyprididae.


Introduction
Mitochondria are semi-autonomous organelles that exist widely in eukaryotic cells and possess their own genome (the mitochondrial genome) [1]. The mitochondrial genome is a covalently closed, circular, double-stranded DNA molecule that can independently encode some proteins for many biological processes [2]. The mitochondrial genomes of fish usually contain 37 genes, namely 13 protein coding genes (PCGs), 2 ribosomal RNA genes (rRNAs), and 22 transfer RNA genes (tRNAs), in addition to a control region (CR) [3,4]. The CR is a non-coding region with the largest variation in sequence and length in the entire mitochondrial genome and is generally found between the tRNA-Pro and tRNA-Phe genes [5]. Because the mitochondrial genome has a simple structure, small molecular weight, self-replication, strict maternal inheritance, and a fast evolutionary rate, it has been widely used in studies of fish phylogeny, species identification, population genetics, and adaptive evolution [5,6].
The Xenocyprididae is one of the most species-rich families of Cypriniformes, comprising approximately 160 species belonging to 45 genera [7]. H. wui (Wang, 1935) is an endemic fish distributed in the Pearl River system, the Poyang Lake system, and the Qiantang River system, China, and is used locally as a small economic species. The main characteristics of H. wui are the absence of spinous rays in the dorsal fin and a ventral ridge from the base of the pelvic fin to the anus [8]. The common name of H. wui in China is "LanDao". However, little is known about H. wui, and previous research has focused mainly on resource investigation. In this study, we sequenced, annotated, and characterized the complete mitochondrial genome of H. wui for the first time. A preliminary analysis of its gene composition and structural characteristics was conducted to provide molecular insights into the taxonomy and phylogeny of the family Xenocyprididae. On this basis, combined with data from the NCBI database, the phylogenetic relationships within Xenocyprididae were analyzed. Our results describe the mitochondrial genome of H. wui and the evolutionary relationships of the Xenocyprididae, and provide a valuable basis for further studies of the evolution of Hemiculterella and Xenocyprididae.

Sample Collection, DNA Extraction, and Illumina Sequencing
The fish samples were collected from the Duliujiang River, Guizhou Province, China, preserved in anhydrous ethanol, and stored at −20 °C. Genomic DNA was extracted from the muscle of a single specimen using the DNeasy Blood & Tissue Kit (Qiagen Inc., Hilden, Germany) according to the manufacturer's protocol. Next-generation sequencing was performed at the DNA Stories Bioinformatics Center (Chengdu, Sichuan, China). Library construction and Illumina sequencing were carried out according to Zhang et al. [9].

Mitochondrial Genome Organization of H. wui
After assembly and annotation, the complete mitochondrial genome of H. wui was determined to be 16,619 bp in length (GenBank Accession No. OR574832). Its size was almost the same as that of H. sauvagei Warpachowski, 1888 (16,618 bp). The mitochondrial genome of H. wui consisted of 13 PCGs, 2 rRNAs, 22 tRNAs, and a control region (D-loop) (Figure 1; Table 2). Only nine genes (tRNA-Gln, tRNA-Ala, tRNA-Asn, tRNA-Cys, tRNA-Tyr, tRNA-Ser(UCN), tRNA-Glu, tRNA-Pro, and ND6) were encoded on the light strand; the other genes were encoded on the heavy strand. The gene composition and order in H. wui were the same as in a typical fish mitochondrial genome [3,4,5]. Thirteen intergenic spacers were found in the H. wui mitochondrial genome, varying in length from 1 bp to 32 bp (Table 2). The longest intergenic spacer (32 bp) was located between tRNA-Asn and tRNA-Cys. Five gene overlaps were also found. The shortest overlap (1 bp) was located between tRNA-Thr and tRNA-Pro, and the longest overlaps (7 bp each) were located between ATPase 8 and ATPase 6 and between ND4L and ND4. Gene overlaps and intergenic spacers are common in the mitochondrial genomes of teleost species [9,21].
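The spacer and overlap lengths described above follow directly from annotated gene coordinates. The sketch below shows one way to derive them, assuming 1-based inclusive coordinates on a linearized genome. The function name `junctions` and the example genes are hypothetical and are not taken from Table 2.

```python
from typing import List, Tuple

# (name, start, end) with 1-based inclusive coordinates
Gene = Tuple[str, int, int]

def junctions(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (left gene, right gene, gap) for each adjacent gene pair.

    gap > 0: intergenic spacer of that many bp
    gap < 0: overlap of abs(gap) bp
    gap == 0: genes are directly adjacent
    """
    ordered = sorted(genes, key=lambda g: g[1])
    out = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        out.append((left, right, right_start - left_end - 1))
    return out

# Hypothetical coordinates, for illustration only.
example = [
    ("geneA", 1, 100),
    ("geneB", 94, 200),   # starts 7 bp before geneA ends: 7 bp overlap
    ("geneC", 233, 300),  # starts 32 bp after geneB ends: 32 bp spacer
]
result = junctions(example)
for left, right, gap in result:
    kind = "spacer" if gap > 0 else "overlap" if gap < 0 else "adjacent"
    print(f"{left}/{right}: {kind} {abs(gap)} bp")
```

This sketch ignores the junction that wraps around the origin of the circular genome; a complete tally would also compare the last gene with the first.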
The overall nucleotide composition of the mitochondrial genome was 29.9% A, 25.3% T, 27.4% C, and 17.5% G, with a slight AT bias (55.2%) (Table 3). This composition was similar to that of other fish species in the family Xenocyprididae (Table 1). The highest A + T content was found in the non-coding control region (64.4%) and the lowest in the first codon position of PCGs (47.5%). The AT skew value of the H. wui mitochondrial genome was positive (0.084), while the GC skew value was negative (−0.22), indicating a preference for A over T and for C over G.
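The skew values above can be approximately reproduced from the reported percentages. The sketch below assumes the conventional definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), which the text does not state explicitly. The function names are illustrative.

```python
def at_skew(a: float, t: float) -> float:
    """AT skew = (A - T) / (A + T); positive means more A than T."""
    return (a - t) / (a + t)

def gc_skew(g: float, c: float) -> float:
    """GC skew = (G - C) / (G + C); negative means more C than G."""
    return (g - c) / (g + c)

# Whole-genome base composition (%) as reported in the text.
comp = {"A": 29.9, "T": 25.3, "C": 27.4, "G": 17.5}

at_content = comp["A"] + comp["T"]
at = at_skew(comp["A"], comp["T"])
gc = gc_skew(comp["G"], comp["C"])

print(f"A+T content: {at_content:.1f}%")
print(f"AT skew: {at:.3f}")
print(f"GC skew: {gc:.3f}")
```

Because the inputs are percentages rounded to one decimal place, the AT skew computed here (about 0.083) differs slightly from the reported 0.084, which was presumably calculated from raw base counts.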

Protein Coding Genes and Codon Usage
There were 13 PCGs (ND1, ND2, ND3, ND4L, ND4, ND5, ND6, COI, COII, COIII, ATPase 6, ATPase 8, and Cyt b) in the mitochondrial genome, with a total size of 11,422 bp (Table 3). Only ND6 was encoded on the light strand, while the rest of the PCGs were located on the heavy strand. The A + T content of the PCGs (54.6%) was higher than the G + C content (45.4%). Most PCGs used the ATG start codon, but COI and ATPase 8 started with GTG. The GTG start codon for COI has been observed in many other fish species [3], but the use of GTG as the start codon of ATPase 8 is rare. The termination codon TAA was used by five genes (COI, ND1, ND4L, ND5, and ND6), and TAG was the stop codon of ATPase 8. The remaining seven genes used one of two incomplete stop codons (T or TA). Incomplete stop codons can be converted into complete stop codons by post-transcriptional modification, such as polyadenylation [22].
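The completion of an incomplete stop codon can be illustrated with a short sketch. It assumes the coding sequence ends in a bare T or TA and that polyadenylation supplies the missing A residues to form TAA. The function name `complete_stop` and the toy sequences are hypothetical and are not H. wui genes.

```python
def complete_stop(cds: str) -> str:
    """Append A residues, as polyadenylation would, so a trailing T or TA becomes TAA.

    Sequences whose length is already a multiple of 3 are returned unchanged.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        return cds
    tail = cds[-remainder:]
    if tail not in ("T", "TA"):
        raise ValueError(f"unexpected incomplete stop codon: {tail!r}")
    return cds + "A" * (3 - remainder)

# Toy sequences, for illustration only.
print(complete_stop("ATGGCCT"))    # incomplete stop T
print(complete_stop("ATGGCCTA"))   # incomplete stop TA
print(complete_stop("ATGGCCTAA"))  # already complete
```

All three toy inputs yield the same transcript ending in TAA, which is why genes annotated with T or TA can still be translated and terminated normally.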
The relative synonymous codon usage (RSCU) value is a measure of codon usage preference in the genome. The number of codons and the RSCU values of the 13 PCGs are presented in Table 4 and Figure 2. A total of 3804 codons were used in the 13 PCGs. The most commonly used amino acid was Leu, followed by Ala, Thr, Ile, and Gly. The least commonly used amino acid was Cys. The most commonly used codon was CUA, followed by UUC, AUU, and GCC. Excluding stop codons, the least frequent codon was CGU.
The mitochondrial genome of H. wui has two rRNAs (12S rRNA and 16S rRNA), both encoded on the heavy strand. They were close together in the genome, separated by a single tRNA. The 12S rRNA was located between tRNA-Phe and tRNA-Val with a length of 964 bp. The 16S rRNA was located between tRNA-Val and tRNA-Leu(UUR) with a length of 1690 bp. The A + T content of the two rRNAs was 53.70%. The two rRNA genes had a positive AT skew value of 0.245 and a negative GC skew value of −0.046 (Table 3).
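The RSCU measure used for the PCGs can be sketched as follows, assuming the usual definition: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The codon counts are hypothetical and are not taken from Table 4, and Leu is treated here as a single six-codon family, which is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict

def rscu(counts: Dict[str, int], codon_to_aa: Dict[str, str]) -> Dict[str, float]:
    """RSCU = codon count / mean count over its synonymous codon family."""
    families = defaultdict(list)
    for codon, aa in codon_to_aa.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        for c in codons:
            # An unused amino acid gets 0.0 to avoid division by zero.
            values[c] = counts.get(c, 0) * len(codons) / total if total else 0.0
    return values

# Two synonymous families only, to keep the example short.
codon_to_aa = {
    "UUA": "Leu", "UUG": "Leu", "CUU": "Leu",
    "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "UGU": "Cys", "UGC": "Cys",
}
# Hypothetical counts, for illustration only.
counts = {"UUA": 10, "UUG": 2, "CUU": 8, "CUC": 6, "CUA": 30, "CUG": 4,
          "UGU": 1, "UGC": 3}

values = rscu(counts, codon_to_aa)
for codon in sorted(values, key=values.get, reverse=True):
    print(codon, round(values[codon], 2))
```

An RSCU above 1 marks a codon used more often than expected under equal synonymous usage, and a value below 1 marks an under-used codon. Within each family the values sum to the number of synonymous codons.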
As shown in Figure 3, most transfer RNA (tRNA) genes had a typical cloverleaf secondary structure, except tRNA-Ser(AGY), which lacked the dihydrouracil loop (DHU loop). The tRNA genes varied in length from 68 bp (tRNA-Cys) to 76 bp (tRNA-Leu(UUR) and tRNA-Lys) and were scattered across the mitochondrial genome. The overall A + T content of the tRNAs was 55.8%, showing a preference for A and T bases. The AT skew and GC skew values were 0.029 and 0.056, respectively.

Control Region
The control region, also known as the AT-rich region or D-loop, was 934 bp in length. It was located between tRNA-Pro and tRNA-Phe, as in a typical fish mitochondrial genome [3]. The nucleotide composition was 32.9% A, 31.5% T, 14.9% G, and 20.8% C (Table 3). This region had the highest AT content in the entire mitochondrial genome.

Selection Analysis
Among the 13 PCGs of the Xenocyprididae, the average Ka/Ks value of ATPase 8 was the highest, while that of COI was the lowest (Figure 4). This implies that ATPase 8 might evolve more rapidly than the other mitochondrial protein coding genes. Although the genes were under different selection pressures, the evolutionary patterns of the 13 PCGs in Xenocyprididae were similar to those of Sinocyclocheilus (Fang, 1936) fishes [23] and Labeoninae [9]. Furthermore, the average Ka/Ks ratios of all PCGs were much lower than one, indicating that these genes were all under strong purifying selection.

Phylogenetic Analysis
We used a total of 75 species of Xenocyprididae to construct phylogenetic trees based on the 13 PCGs of the mitochondrial genome, with C. carpio and G. rarus as outgroups. The two tree-building methods yielded similar topologies (Figure 5). The phylogenetic trees showed that H. wui, P. dispar (Peters, 1881), and H. sauvagei clustered into one group (ML bootstrap value = 84%, Bayesian posterior probability = 0.5). The differences in external morphology between Pseudohemiculter Nichols & Pope, 1927 and Hemiculterella Warpachowski, 1888 are not obvious. The main difference is that the last unbranched dorsal fin ray of Pseudohemiculter is hard, whereas that of Hemiculterella is soft [7]. The monophyly of Hemiculterella, Pseudohemiculter, and Hemiculter was not supported. More extensive sampling and multilocus markers are required to robustly resolve the phylogenetic relationships of Pseudohemiculter, Hemiculterella, Hemiculter, and related genera.

Conclusions
In conclusion, we characterized the mitochondrial genome of H. wui and determined its phylogenetic position within the Xenocyprididae, which will provide a valuable basis for further studies of H. wui and the evolutionary relationships of Xenocyprididae.

Figure 1. Circular map of the H. wui mitochondrial genome.

Figure 3. Predicted secondary structures of 22 tRNAs in the mitochondrial genome of H. wui.

Figure 5. The phylogeny of H. wui with other species of Xenocyprididae based on the concatenated nucleotide sequences of 13 PCGs. The maximum likelihood bootstrap values and Bayesian posterior probabilities are superimposed on each node.

Table 1. Species information for phylogenetic analysis.

Table 2. Organization and characterization of the H. wui mitochondrial genome.

Table 3. Nucleotide composition, AT skew, and GC skew of the H. wui mitochondrial genome.