Complete Mitogenome of the Triplophysa bombifrons: Comparative Analysis and Phylogenetic Relationships among the Members of Triplophysa

In the last decade, the phylogenetic relationships within the genus Triplophysa have become controversial, due to a lack of molecular data. The mitochondrial genome plays a vital role in the reconstruction of phylogenetic relationships and in revealing the molecular evolution of bony fishes. Herein, we obtained the complete mitogenome of Triplophysa bombifrons via HiFi reads of the Pacbio Sequel II system and DNBSEQ short-reads. We compared all available mitogenomes of the Triplophysa genus and reconstructed the phylogeny of Nemacheilidae, based on the mitogenomes, using maximum likelihood (ML) methods. The results show that the complete mitogenome sequence of T. bombifrons was circular and 16,568 bp in length, including 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA), 2 ribosomal RNA (rRNA), and a typical control region (D-loop). The most common start codons were ATG, except for cox1, and TAA/TAG were the stop codons for all PCGs. In total, 677 SNPs and 9 INDELs have been found by comparing the sequence divergence between this study and previous reports. Purity selection was found in all PCGs. Phylogeny was inferred by analyzing the 13 PCGs and the concatenated nucleotide sequences of 30 mitogenomes. The phylogenetic analyses based on the nucleotides of the 13 PCGs supported the assumption that the Triplophysa genus can be divided into 4 main clades and demonstrated that T. bombifrons and T. tenuis are closely related species for the first time. This study laid the foundation for further study on the mitogenome and phylogeny of Nemacheilidae fishes.


Introduction
As the world's largest clade of primary freshwater fishes, the order Cypriniformes is divided into two superfamilies: Cyprinoidea (carp-like fishes) and Cobitoidea (loach fishes) [1]. The genus Triplophysa (Cobitoidea: Nemacheilidae) is a species-rich group that is an important component of the ichthyofauna of the Qinghai-Tibetan Plateau (QTP). The rapid and persistent elevation of the QTP is considered a major reason for the origin and diversity of this genus. Due to its strong adaptability to extreme environments, species in the Triplophysa genus are widely distributed in the QTP. This genus represents an ideal system by which to address questions about past climatic and geological events and their impacts on current biodiversity. Due to the morphological plasticity of this genus, traditional taxonomy cannot accurately distinguish all species, particularly in the case of morphologically similar and related species [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20].
The mitochondrion is a very important organelle in the eukaryocyte that exists in nearly all the bionts. Mitochondria are involved in energy metabolism, aging, apoptosis, and disease regulation [21]. Close circular double-stranded mitochondrial DNA represents a good molecular marker in systematic studies due to its simple structure, fast evolution, and high copy speed, along with its easy separation and purification. [21]. Over the last decade, more than 20 Triplophysa mitogenomes have been reported [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. Although some groups of Triplophysa have been addressed phylogenetically, a comprehensive phylogenetic analysis has never been presented. T. bombifrons is a bony fish with a narrow distribution in China. On the one hand, water hubs, environmental pollution, overfishing, and other human activities have contributed to its endangered status. On the other hand, the endangerment of indigenous fishes has intensified under degraded environments such as alpine climatic conditions and salinized water features. One mitogenome of T. bombifrons has been reported, which was collected from Balkhash Lake in China [17]. According to the contents of the zoography and the distribution surveys over the last ten years, T. bombifrons only can be found in the upper reaches of the Yarkand River in the Kashgar area, Yurungkax River, and Keriya River in the Hotan area at present, and no distribution report for Balkhash Lake has been published in other literature sources. We have not as yet been able to source a sample of T. bombifrons from Balkhash Lake, and cannot access any voucher information from previous reports [17]. In this study, we acquired a sample of T. bombifrons from the Yurungkax River and confirmed that its morphological characteristics are consistent with the description given in the literature (Figure 1). We wanted to compare the variation in the mitochondrial genome sequence, structure, and gene content in these two mitogenomes of T. bombifrons (NC_027189 and this study). In their study, a phylogenetic tree was reconstructed with only eight Triplophysa species, which placed T. bombifrons as a closed species with T. strauchii. With the rapid development of sequence technology, sequence costs have dropped dramatically, and sequencing read length and accuracy continue to improve [22,23]. High-fidelity (HiFi) reads in single-molecule sequences overcome the disadvantages of short-read sequencing technologies and can thereby bring us more accurate mitogenome information [24].  In this study, we report the complete mitogenome of T. bombifrons, assembled with HiFi reads of the Pacbio Sequel II system and DNBSEQ short-reads. We have carried out a comprehensive analysis of Triplophysa mitogenomes and reconstructed the phylogeny relationships of the genus Triplophysa, aiming to contribute the mitogenomic data of Nemacheilidae for future phylogenetic studies of the Cypriniformes.

Ethical Approval
The sample collection and animal experiments were conducted according to the regulations and guidelines for the care and use of laboratory animals and were approved by the Animal Care and Use Committee of Tarim University (protocol code TDDKYXF20220316).

Experimental Fish and Sampling
One adult T. bombifrons specimen was collected via nets in the Yurungkax River (37 • 6 39.6" E, 79 • 54 46.8" N), in the Hotan district of the Xinjiang Uygur Autonomous Regions, China. Voucher specimens were deposited at Tarim University (accession number GYQ2022030001, Xinyue Wang, 120050007@taru.edu.cn). The species and gender identification were determined by examining the dissected gonads. Pectoral fin clips were preserved in 75% ethanol and stored at −80 • C before DNA isolation.

DNA Isolation, Library Preparation, and Sequencing
The total genomic DNA was extracted using the TIANamp Genomic DNA Kit (TIAN-GEN, Beijing, China). The HiFi Library was prepared according to the manufacturer's protocol. First, a 15 µg sample was selected and the SMRTbell ® Express Template Preparation Kit v2 was used to construct the SMRTbell library. The small DNA fragments were removed with BluePippin. The SMRTbell template was annealed with sequence primer, and the complex was bound by DNA polymerase. The library was sequenced on the Sequel II sequencing platform (Pacific Biosciences of California, Inc., Menlo Park, CA, USA). CCS (v.6.4.0) was used to generate the HiFi reads.
A total amount of 0.2 µg of DNA was used and the genomic DNA sample was fragmented into 350 bp fragments. The sequencing library was constructed following the manufacturer's recommendations. The 5 end of the library was phosphorylated and cyclized. The cyclized library was amplified by the rolling loop. Finally, the DNA nanospheres (DNB) were loaded into flowcell and then sequenced on the MGI DNBSEQ-T7 platform. In total, 20 Gb of short reads was generated. FastQC (v0.11.5) was used to qualify the sequence data-quality software [25]. Fastp (v 0.23) was used to filter low-quality reads, including those reads that contain more than 50% of bases with a Q-value of less than 2, and those reads that contain more than 5% of unknown nucleotides [26].

Sequence Analyses
Codon W was used to calculate the composition of the base, the pattern of codon usage, and the relative synonymous codon usage (RSCU). Patterns of nucleotide diversity (Pi), the non-synonymous (Ka) to synonymous rate (Ks) ratio of 13 PCGs among Triplophysa were conducted in DnaSP (v6.12.03). The sequence diversity of each PCG was estimated using sliding window analyses (window length ≤ 100 and step size = 25) in DnaSP. MEGA (v7.0) was used to estimate the genetic distances, using a Kimura-2 parameter (K2P) [33]. The number of single-nucleotide polymorphisms (SNPs) and indel sites was detected using the DnaSP software (v6.12.03) [34].

Phylogenetic Analyses
To clarify the phylogenetic relationships between T. bombifrons and other species in the Triplophysa genus, the 13 concatenated PCGs of T. bombifrons and other species available in GenBank (Table 1) were aligned using MAFFT, with default parameters [35]. The best-fit mode was calculated using the Akaike information criterion (AIC) in ModelFinder. Subsequently, the maximum-likelihood phylogenetic tree was reconstructed using IQ-TREE (v 2.1.2) with 1000 ultrafast bootstraps, under the GTR+F+R6 model [36,37].

Genome Structure and Base Composition
The newly complete mitogenome of T. bombifrons was identified as circular doublestranded molecules with a length of 16,568 bp, which exhibits striking similarity with other Triplophysa mitogenome sequences, differing from them between 24 bp and 113 bp, and 1 bp less than the previously published T. bombifrons mitogenome ( Table 1). The mitogenome base composition is 27.46% A, 25.83% C, 18.58% G, and 28.13% T, with a slight AT bias (55.59%). Similar to other Triplophysa species, the mitogenomes of T. bombifrons contain 13 PCGs, 22 tRNAs, 2 rRNAs, and a putative control region (AT-rich region) (Figure 1, Table 2). The length of the 22 tRNAs ranged from 66 bp to 76 bp; tRNA Cys was the shortest (67 bp), whereas tRNA Lys was the longest (76 bp) in this study. The control region is 916 bp in length and is located between tRNA Pro and tRNA Phe .
Both the expression levels of the genes and the stability of the mRNA were affected by the codon preference, providing evidence in analyzing the evolutionary patterns and phylogenetic relationship [40]. The 13 PCGs encoded a total of 5522 codons in the T. bombifrons mitogenome. Isoleucine, lysine, leucine, proline, phenylalanine, alanine, asparagine, and threonine acid were the codons with the highest usage, the usage rate accounting for 3.13%, 2.88%, 2.70%, 2.64%, 2.52%, 2.48%, and 2.44% in all codons, respectively. The arginines were the codons with the lowest usage and only accounted for 0.83% of all codons ( Table 3). The stop codon (TAA) was the most frequently used in the PCGs of the T. bombifrons mitogenome in this study.  As a significant indicator to identify molecular adaptation, the Ka/Ks ratio (ω) is widely used in phylogenetic analyses of molecular evolution [41]. The Ka/Ks (ω) values of the 13 PCGs were far lower than 1 (<0.12) (Figure 2), indicating that purifying selection was detected in these PCSs and these genes were suitable for reconstructing the phylogenetic relationship of the Triplophysa genus. The Ka/Ks of atp8 (0.111), nad6 (0.072), and nad2 (0.065) are much higher than other PCGs, suggesting that these three PCGs had experienced more relaxed evolutionary pressure than other PGCs and retained more non-synonymous mutations in the genes. The lowest Ka/Ks ratio was found on the nad1 gene, implying that the nad1 gene had received the greatest evolutionary pressure. Mitochondrial DNA plays a vital role in encoding the essential components of the mitochondrial respiratory chain and its inheritance is strictly maternal, which makes deleterious mutations accumulate easily in the mitogenome. The nad genes are utilized as a co-substrate in non-redox reactions and play important roles in the signaling and regulatory pathways. The strong purifying selection detected in nad1 helps to erase the deleterious mutations and makes the nad1 gene a suitable molecular marker of phylogenetic analysis in the Triplophysa genus. The aligned sequences of 13 PCGs of six Triplophysa mitogenomes were used to detect DNA polymorphism (Figure 2, Supplementary Table S2). The highest nucleotide diversity The aligned sequences of 13 PCGs of six Triplophysa mitogenomes were used to detect DNA polymorphism (Figure 2, Supplementary Table S2). The highest nucleotide diversity (Pi) was found in the nad2 gene (0.203), followed by nad1 (0.182), nad5 (0.181), and nad6 (0.178). The cox3 (0.134), cox1 (0.130), and nad1 genes (0.129) have the lowest values. A similar pattern was also observed in terms of mean genetic distances (Table S3 in the  Supplementary Materials). Nad2, nad11, nad5, and nad6 genes showed high genetic distances with 0.24, 021, 0.21, and 0.21, whereas the cox3, cox2, and atp8 genes exhibit lower genetic distances of 0.14, 0.12, and 0.10, respectively.

Sequence Divergence within T. bombifrons Mitogenomes
Comparing the two mitogenomes of T. bombifrons between our study and the previous report (NC_027189), 4.14% nucleotide dissimilarity (677 SNPs and 9 INDELs) had been found (Table S1 in the Supplementary Materials) [17]. In total, 550 SNPs were distributed widely among the 13 PCGs, the nad5 gene (1839 bp) demonstrated a higher ratio (12.34%) than other PCGs relative to the size of the gene, whereas the atp6 gene (684 bp) had the lowest ratio (0.58%). A similar result has been reported in Branchinella kugenumaensis mitogenomes, indicating that sequence divergence within the same species is a common phenomenon [20].
The similarity patterns of amino acid composition and synonymous codon usage were found in two T. bombifrons mitogenome sequences (Figure 3 and Supplementary Figure S1 in the Supplementary Materials). The analysis of RSCU showed that the 13 PGCs contain all codons. The five most frequently used codons in our study were Met (AUA), Met2 (AUG), Ala (GCC), Leu (CUU), and Thr (ACC), while in a previous report (NC_027189), they were Met (AUA), Met2 (AUG), Ala (GCC), Leu (CUU), and Ter (UAA) [17].

Phylogenetic Analyses
To ensure the reliability of the phylogenetic analyses, we downloaded all 28 mitogenomes of the Triplophysa species that have been characterized to date (28 October 2022) from the NCBI reference sequence (RefSeq) database [42]. The ML analyses showed Triplophysa contains 4 main clades (Clades I, II, III, and IV) (Figure 4). Clade I is divided into two subclades (I-A and I-B), with strong support in our phylogenetic reconstructions. The phylogenetic position of T. bombifrons indicated that it is the closest species to T. tenuis in subclade I-A, which was not reported in the previous T. bombifrons mitogenome report [17]. Subclade I-B encompassed T. dalaica and T. wuweiensis. Clade II comprised two subclades, both including 1 monophyly with 4 species. Species of T. cuneicephala, T. pappenheimi, T. robusta, and T. siluroides are included in Clade III. The remaining 6 species were divided into two subclades (Subclade IV-A and Subclade IV-B) and belong to Clade IV; this clade can be considered an ancestral group.

Conclusions
The results of the present study reported the complete mitogenome sequence of T. bombifrons using a hybrid assembly strategy, with PacBio HiFi read and DNBSEQ shortread sequence technologies. The structure of the evaluated T. bombifrons was identical to the mitogenome structure of the Triplophysa genus, including 13 PCGs, 22 tRNAs, 2 rRNAs, and one control region. Phylogenetic analyses based on the 13 PCGs strongly supported the idea that the genus Triplophysa should be divided into 4 main clades and demonstrated that T. bombifrons and T. tenuis are closely related species. The findings of this study will enrich resources of mitogenome in the genus Triplophysa and improve our knowledge of molecular characteristics in the Nemacheilidae family, providing a foundation for future study of population genetic and phylogenetic relationship in the Nemacheilidae family.
Supplementary Materials: The following supporting information can be downloaded at: https:// zenodo.org/record/7299966 on 7 November 2022. Figure S1: Amino acid composition and relative synonymous codon usage in the mitogenomes of T. bombifrons; Table S1: Sequence divergence within T. bombifrons mitogenomes; Table S2: Nucleotide diversity, Ka, and Ks results of Triplophysa; Table S3: Genetic distances, based on 13 PCGs of Triplophysa.
Author Contributions: S.C. designed this study; X.W. conducted the experiments; S.L. analyzed the data; X.W. and S.L. wrote the manuscript; Y.S. and S.L. were in charge of writing, review, and editing; H.X. and F.Z. took samples. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The genome sequence data that support the findings of this study are openly available in GenBank of NCBI at (https://www.ncbi.nlm.nih.gov/) under accession no OP499856 on 26 December 2022. The associated BioProject, SRA, and Bio-Sample numbers are PRJNA914502, SAMN32338863, SRR22839356 and SRR22839357, respectively.

Conflicts of Interest:
The authors declare no conflict of interest.