The Complete Mitochondrial Genome of Eurasian Minnow (Phoxinus cf. phoxinus) from the Heilongjiang River, and Its Phylogenetic Implications

Simple Summary
Fishes of the genus Phoxinus belong to the family Leuciscidae and are widely distributed in Eurasia. Since the early 20th century, it has generally been accepted that the European minnow (P. phoxinus) was the only valid species of Phoxinus in Europe. After most North American and Asian species were removed from the genus, only those Eurasian taxa originally placed under the name P. phoxinus remain in it. In recent years, more than 20 lineages of the genus Phoxinus have been identified in Europe, more than 10 of which are considered valid species. Existing morphological and genetic evidence suggests distinct differences between Asian and European members of Phoxinus. Mitochondrial DNA from European Phoxinus spp. and their Asian relatives provides evidence that Far Eastern phoxinins should be divided into two genera: Phoxinus and Rhynchocypris. The updated phylogeny of Phoxinus indicates five mtDNA lineages in Asia. The validity of some of these lineages has been supported previously by morphological evidence.

Abstract
Over the past two decades, the genus Phoxinus has undergone extensive taxonomic revision and many new species or mitochondrial lineages have been found in Europe. However, Asian populations of Phoxinus spp. have received less attention and have rarely been compared with their European relatives. In this study, we deciphered the 16,789-nucleotide mitochondrial genome of Phoxinus cf. phoxinus from the Heilongjiang River (HLJ) and compared it with other known mitogenomes or partial mitochondrial DNA (mtDNA) sequences of Phoxinus spp. We found that all known mitochondrial genomes of Phoxinus share the typical vertebrate mtDNA architecture, but their D-loop regions vary greatly in length. An unusual repetitive structure, built from a motif of ~130 bp, was revealed at the beginning of the D-loop regions of all known mitogenomes of Phoxinus spp. The length differences of the D-loop region were attributed mainly to the number of repetitive motifs and the sequences inserted among them. However, this repetitive structure was absent in the other Far Eastern phoxinins. This is further evidence for the notion that Far Eastern phoxinins should be divided into two genera: Phoxinus and Rhynchocypris. All mtDNA sequences (including three mitogenomes) from South Korea represent the same genetic lineage, as there were only slight differences among them. The remaining six mtDNA sequences are highly divergent and represent different lineages of the genus, as supported by partial mtDNA sequences. The updated phylogeny of the genus Phoxinus suggests that there are five distinct mtDNA lineages in Asia. The Asian lineages have diverged markedly from their European relatives and should not be included with the European minnow (P. phoxinus).


Introduction
The fishes of the genus Phoxinus belong to the family Leuciscidae and are widely distributed in Eurasia [1]. The classification of the genus has changed dramatically over the past 20 years. Phoxinus was once considered to be the only genus with a Holarctic distribution in the suborder Cyprinoidei (the former Cyprinidae sensu lato), but molecular systematics suggests that North American species should not be attributed to the genus Phoxinus [2]. The European minnow (P. phoxinus) has long been considered the only valid species of the genus Phoxinus in Europe [3]. In recent years, a growing number of studies have shown that there are more than 10 valid species of Phoxinus in Europe [3-7]. In Asia, most phoxinins should be reclassified into another genus, Rhynchocypris, with only "Asian populations" of European minnow (P. phoxinus) left in the genus Phoxinus [8,9]. There are obvious differences in morphology and geographical distribution between Asian and European Phoxinus species, but Asian Phoxinus spp. are still mostly recognized as subspecies or populations of European minnow [1]. With the extensive revision of the taxonomy of Phoxinus, further clarification of the genus in Asia is urgently needed.
High phenotypic plasticity makes species delimitation of Phoxinus difficult and the wide distribution range makes comparisons between different types of minnows inconvenient [5,7,10-14]. Based on molecular data, a multispecies complex of Phoxinus in the Western Balkan Peninsula was reported and dispersion through subterranean water in karst formations was suggested [5]. Since this report, as many as 23 mtDNA lineages of genus Phoxinus have been reported [5,7,15-18]. Among the molecular markers used to study Phoxinus, the cytochrome oxidase subunit I (COI) gene encoded by mitochondrial DNA (mtDNA) seems to be the most successful [17,18] and has been generally used as a DNA barcode [19,20]. The application of molecular markers not only revealed substantial hidden diversity of Phoxinus, but also facilitated studies of geographical patterns, including invasion and diffusion patterns of the genus [18,21,22]. However, relevant studies on the molecular phylogeny and species delimitation of Phoxinus were mainly focused on European populations, and Asian populations have received less attention and have rarely been compared with European relatives.
Although there have been fewer comprehensive studies of Phoxinus spp. in Asia, the genetic structure of Phoxinus has been investigated with respect to their distribution [23-29] and mitochondrial DNA data have accumulated. Unfortunately, various mtDNA markers were used in these studies, and most of them were not COI genes, making direct comparison of different Asian populations or their European relatives difficult. The typical mtDNA of fish is a 16-17 kb circular molecule, characterized by a simple structure, maternal inheritance, lack of recombination and rapid evolution [30]. The complete sequence of the mitochondrial genome not only provides more genetic and evolutionary information than a partial sequence, but also provides the opportunity to integrate and compare different partial sequences. By searching GenBank and various publications, a total of 8 mitogenomic sequences of genus Phoxinus were retrieved, 2 from Europe and 6 from Asia [31-34]. By extracting the corresponding gene fragments from these known mitogenomes, previous results based on different fragments could be reanalyzed together.
The primary purpose of this study was to explore the relationship between Asian populations of Phoxinus spp. and their European relatives. Six of the 8 known mitogenomes were from South Korea, China and Mongolia, so they cover a relatively wide geographical range. However, according to "Fauna Sinica", there are 3 subspecies of "Phoxinus phoxinus" in China [35]. The complete mitogenomes of Phoxinus (phoxinus) ujmonensis and Phoxinus (phoxinus) tumensis have been determined [32,33]. In this study, the complete mitogenome of Phoxinus cf. phoxinus from the Heilongjiang River (HLJ) was successfully sequenced by an overlapping PCR method. Based on known mitogenomes of Phoxinus spp. and related studies, the phylogeny of Phoxinus was revised with special reference to species diversity in Asia.

Sample Collection and DNA Extraction
Specimens of Phoxinus cf. phoxinus from the Heilongjiang River (HLJ) were collected in Tahe County, Heilongjiang province, China (indicated in Figure 1 as Tahe). The Phoxinus cf. phoxinus samples were anesthetized before being preserved in 95% ethanol at room temperature. All animal procedures in this study were conducted according to the guidelines for the care and use of laboratory animals of Heilongjiang River Fisheries Research Institute, Chinese Academy of Fishery Sciences (CAFS). The studies in animals were reviewed and approved by the Committee for the Welfare and Ethics of Laboratory Animals of Heilongjiang River Fisheries Research Institute, CAFS.

Figure 1. Triangles represent Phoxinus cf. phoxinus from the Heilongjiang River; Pentagon represents P. tumensis from Tumen River; Diamond represents the Phoxinus cf. phoxinus from South Korea; Solid circles represent Phoxinus cf. phoxinus from Mongolian Plateau; Square represents P. ujmonensis from Irtysh River. Samples from Anadyr River (in Northeast of Russia) clustered together with Phoxinus cf. phoxinus from HLJ, indicating that this lineage may have a wide distribution.
Total DNA was obtained from a fin clip of one individual using Proteinase K digestion and phenol-chloroform extraction. Some studies have reported complete or partial mitochondrial sequences of Phoxinus spp. from Asia, and these publicly available mtDNA data were included in our analyses. Detailed information on the sequences used in this study is listed in Table 1, and the locations of Phoxinus populations in Asia are shown in Figure 1. The complete mitochondrial genome (X61010) of common carp (Cyprinus carpio) was used as the outgroup [36].
Table 1. AB100732-AB100733 40 [9] P. lumaireul Rhynchocypris spp. AB100697-AB100731 582 [9] Cyprinus carpio X61010 X61010 1 [36]
1 There are two mitogenomes from the same isolate DM856 of bleak (Alburnus alburnus) in GenBank, which were submitted by the same author (Leerhoei, F.) on 29 April 2020 and 9 June 2020, respectively. One (MT584105) is highly similar to the bleak mitochondrial reference genome, but the other (MT410946) appears to come from Phoxinus morella based on our analysis. This sequence (MT410946) was the only unpublished one involved in our study.

Sequencing and Annotation of the Mitogenome
Based on the known mitogenomes of Phoxinus spp. and related species (Table 1), twelve primer pairs were designed to cover the complete mtDNA. When necessary, additional internal walking primers were designed for sequencing. Primer details are listed in Table S1 (Supplementary Materials). All primers were diluted to 10 µM. Each 30 µL PCR reaction contained 1× PCRmix (Cowin bioscience, Beijing, China), 1 µL each of forward and reverse primer, and ~100 ng of template DNA. The amplification procedure was as follows: pre-denaturation at 94 °C for 2 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 58 °C for 30 s and extension at 72 °C for 1.5 min; and a final extension at 72 °C for 7 min. After amplification, the PCR products were checked by 1% agarose gel electrophoresis and sent to Sangon Biotech (Shanghai) Co., Ltd. (Shanghai, China) for purification and sequencing.
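The reaction set-up above implies a few quantities that are easy to check by hand. The following Python sketch, offered only as an illustration, computes the final primer concentration (from C1V1 = C2V2), the nominal hold time of the cycling program (ramp times ignored), and the minimum mean amplicon length needed for twelve primer pairs to tile a 16,789 bp genome with no overlap at all. The function names are hypothetical; the actual amplicon sizes are those in Table S1 and will be longer because adjacent amplicons overlap.

```python
import math

def final_concentration_um(stock_um, added_ul, total_ul):
    """Final concentration after dilution, from C1 * V1 = C2 * V2."""
    return stock_um * added_ul / total_ul

def program_minutes(initial_min, cycles, step_seconds, final_min):
    """Nominal hold time of a PCR program; ramping between steps is ignored."""
    return initial_min + cycles * sum(step_seconds) / 60.0 + final_min

def min_mean_amplicon_bp(genome_bp, primer_pairs):
    """Lower bound on mean amplicon length if amplicons tile the genome without overlap."""
    return math.ceil(genome_bp / primer_pairs)

# Values taken from the text: 10 uM primer stock, 1 uL per primer, 30 uL reaction,
# 2 min pre-denaturation, 35 cycles of 30 s / 30 s / 90 s, 7 min final extension.
primer_um = final_concentration_um(10.0, 1.0, 30.0)
hold_min = program_minutes(2.0, 35, (30, 30, 90), 7.0)
amplicon_bp = min_mean_amplicon_bp(16789, 12)

print(f"final primer concentration: {primer_um:.3f} uM")   # 0.333 uM
print(f"nominal program time: {hold_min:.1f} min")          # 96.5 min
print(f"minimum mean amplicon length: {amplicon_bp} bp")    # 1400 bp
```

A mean amplicon of roughly 1.4 kb or more is consistent with the 1.5 min extension step, although the extension rate of the polymerase in the commercial mix is not stated in the text.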
The sequencing trace files were viewed in FinchTV v.1.3.1 (Geospiza, Inc., Seattle, WA, USA), and the low-quality regions at the beginning and end of each read were trimmed. The mitogenome of HLJ was assembled with Cap3 [37] and submitted to MitoAnnotator (http://mitofish.aori.u-tokyo.ac.jp/annotation/input.html, accessed on 22 August 2022) for annotation [38]. The online software tRNAscan-SE 2.0 (http://trna.ucsc.edu/tRNAscan-SE, accessed on 22 August 2022) was used to locate the tRNA genes and analyze their secondary structure [39]. The D-loop regions of known mitogenomes were analyzed with Tandem Repeat Finder software v. 4.07b to identify the repetitive units [40]. Then, repeated units were aligned with the ClustalX v.1.83 software [41] with manual modification to analyze
Sequences were aligned with MUSCLE [44], and alignments were verified visually. Phylogeny reconstruction was performed using Bayesian inference (BI) and maximum-likelihood (ML) approaches. The best-fit nucleotide substitution models for BI were selected by MrModeltest v2.4 [45] using hierarchical likelihood ratio tests (hLRTs). Bayesian analysis was conducted using MrBayes v3.2 [46]. Four chains were run for 5,000,000 generations, sampling trees every 100 generations, and the first 12,500 trees were discarded as burn-in. The ML analysis was conducted by combining ModelFinder, tree search, SH-aLRT test and ultrafast bootstrap with 1000 replicates in IQ-TREE [47].
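The burn-in setting can be related to the run length with simple arithmetic. The short Python sketch below, with a hypothetical function name, shows that sampling every 100 generations over 5,000,000 generations yields 50,000 sampled trees, so discarding 12,500 corresponds to a 25% burn-in. It assumes the count excludes the starting tree that samplers often also write at generation 0.

```python
def mcmc_sample_summary(generations, sample_every, burnin_trees):
    """Summarize how many sampled trees remain after burn-in.

    Assumption: the tree at generation 0 is not counted, so the number of
    samples is generations // sample_every.
    """
    sampled = generations // sample_every
    if burnin_trees > sampled:
        raise ValueError("burn-in exceeds the number of sampled trees")
    return {
        "sampled": sampled,
        "retained": sampled - burnin_trees,
        "burnin_fraction": burnin_trees / sampled,
    }

# Settings quoted in the text.
summary = mcmc_sample_summary(5_000_000, 100, 12_500)
print(summary)
# {'sampled': 50000, 'retained': 37500, 'burnin_fraction': 0.25}
```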

Characterization of the Mitogenome
The complete mitochondrial genome of Phoxinus cf. phoxinus (HLJ) was 16,789 bp long, with an overall GC content of 43.79%, and has been deposited in GenBank under accession OP326577. Our annotation identified 2 rRNAs (12S and 16S rRNA), 22 tRNAs, 13 protein-coding genes (PCGs) and 1 D-loop region, consistent with the gene content of typical teleost mtDNA. As shown in Table 2, the arrangement of genes was quite compact. There were 14 spacers between genes or elements, ranging from 1 to 33 bp and totaling 64 bp. There were 10 overlapping regions, each sharing 1-7 bp and totaling 28 bp. Excluding the D-loop region, the longest non-coding sequence is the origin of light strand replication (OL). As is typical of teleosts, only 1 protein-coding gene (ND6) and 8 tRNA genes are encoded by the light strand (L-strand) and all other genes are encoded by the heavy strand (H-strand). Compared with the gene order of typical vertebrate mtDNA, no rearrangement was observed in the mtDNA of Phoxinus cf. phoxinus (HLJ).
Most ( tRNA-Glu, and tRNA-Pro) were encoded on the L-strand. All of the features described above are similar to the general features of the mitochondrial genome of cyprinids.
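Base-composition figures such as the overall GC content reported above can be recomputed directly from a sequence. The following is a minimal Python sketch, not the procedure used in this study; the function names are hypothetical and the toy sequences stand in for a real record such as OP326577. Ambiguous bases (for example N) are skipped, which is an assumption about how they should be treated.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)

def at_content(seq):
    """Percentage of A and T; the complement of gc_content."""
    return 100.0 - gc_content(seq)

# Toy sequences for illustration only.
print(gc_content("ATGC"))             # 50.0
print(round(at_content("AATTGN"), 1)) # 80.0 (N is ignored)
```

Applied to the full mitogenome and to the D-loop alone, the two functions would give the whole-genome GC content and the D-loop A+T content, respectively.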

The D-loop Region of Phoxinus Spp.
As in other cyprinids, the D-loop regions of Phoxinus spp. are located between tRNA-Pro and tRNA-Phe, but their length varies more widely, from 991 bp (MT410946) to 2544 bp (MK227443) in the genus Phoxinus. Excluding the D-loop region, mitochondrial genomes of Phoxinus spp. were highly conserved in length. We found a 130 bp repetitive unit at the beginning of the D-loop of Phoxinus cf. phoxinus (HLJ), and there are similar repetitive structures in the known mtDNA of the genus Phoxinus (Figure 2). Each repetitive unit can be further divided into a variable domain and 3 conserved domains. Our comparative analysis suggests that the length variation of the D-loop region is directly related to the number of repetitive units and the length of the sequences inserted between them. The longest D-loop region was found in MK227443, in which the repetitive unit appeared up to 6 times, and partial sequences of D-loop regions were also inserted between some repetitive units. In contrast, the shortest D-loop region (MT410946) contained only 1 copy of the repetitive unit and no inserted sequence. We speculate that the repetitive structure may have mediated the insertion and reorganization of fragments between repetitive units.
As shown in Figure 3, several important regulatory elements of cyprinids were present in the D-loop of Phoxinus cf. phoxinus (HLJ). The extended termination associated sequence (ETAS) domain was identified at the 5′ end of the control region. In cypriniforms, the consensus sequence of ETAS is TACAT---ATGTATTATCACCA---TATTTAACCA-TAAA [42], similar to the conserved domain of the repeat motif. The ETAS domain acts as a signal for termination of H-strand synthesis. Repeated termination signals have also been found in other vertebrates [48,49], but it appears that the first ETAS plays the major role [50]. This point is also supported by the D-loop structure of Phoxinus spp.: at the beginning of the D-loop, there is an independent ETAS in front of the repetitive region. The sequence after the repetitive region was similar to the homologous sequence of other cyprinids. CSB-D and CSB-E were also identified in the central domain, but a sequence (GTAGTGAGAGCCCACCAACTAGA) in HLJ showed limited similarity to the CSB-F identified in other cyprinids [42]. All three conserved sequence blocks (CSB1-3) were identifiable at the 3′ end of the control region [42,43]. The D-loop region of HLJ also shows a pronounced A+T bias (66.5%), higher than the A+T content of other regions of the mitochondrial genome.
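To illustrate how ETAS-like elements might be located in a control region, the Python sketch below turns the cypriniform consensus quoted above into a regular expression and scans a sequence for matches. It is a simplified illustration rather than the method used here: treating each run of dashes as a gap of 0 to 10 arbitrary bases is an assumption, exact matching of the conserved blocks is stricter than a real similarity search, and the function names and toy sequence are hypothetical.

```python
import re

# Consensus as quoted in the text; dashes mark gaps in the published alignment.
ETAS_CONSENSUS = "TACAT---ATGTATTATCACCA---TATTTAACCA-TAAA"

def consensus_to_regex(consensus, max_gap=10):
    """Join the conserved blocks with variable-length gaps (assumed 0..max_gap bases)."""
    blocks = [block for block in consensus.split("-") if block]
    gap = "[ACGT]{0,%d}" % max_gap
    return re.compile(gap.join(blocks))

def find_etas_like(seq, max_gap=10):
    """Return (start, end) pairs, 0-based and end-exclusive, of ETAS-like matches."""
    pattern = consensus_to_regex(ETAS_CONSENSUS, max_gap)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]

# Toy sequence: the four conserved blocks separated by short arbitrary spacers.
toy = "GG" + "TACAT" + "AAA" + "ATGTATTATCACCA" + "CC" + "TATTTAACCA" + "G" + "TAAA" + "TT"
print(find_etas_like(toy))  # [(2, 41)]
```

On a D-loop with several repeat units, the number of non-overlapping matches would give a rough count of ETAS-like copies, provided the conserved blocks are intact in each unit.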

Phylogeny of Phoxinus Spp.
We added the COI gene sequences from Table 1 to the COI dataset of Palandačić et al. [18] for phylogenetic analysis. The results obtained by the two methods (BI and ML) are highly consistent. The major difference between our results and those of previous studies is the number of identified clades. In previous studies, 22 lineages of Phoxinus were identified based on the COI dataset [18], and an additional lineage 21 was identified based on the Cyt b data [16-18]. In this study, we identified 27 lineages from the COI dataset (also excluding lineage 21) by including the COI sequences of Asian Phoxinus spp. The 5 additional clades are from South Korea, Tumen River, Irtysh River, Mongolian Plateau and Portugal. To avoid confusion, we numbered these 5 lineages following previous studies (Figure 4). The mtDNA of HLJ clustered with lineage 22 from the previous study. The newly defined lineage 24 was also present in previous studies, where it was grouped with lineage 16 [18] rather than treated as an independent clade. Of the COI fragments extracted from the whole mitogenome sequences, three from South Korea form one lineage (numbered 25) [26,34]. However, each of the remaining 5 COI sequences represented a distinct mitochondrial lineage. AP009309 and AP009147 belong to lineages 1 and 11, respectively, and the other 3 sequences from Asia correspond to 3 new lineages (lineages 26-28).
Based on allozyme and mitochondrial 16S rRNA sequences, Sakai et al. [9] concluded that Far Eastern phoxinins should be split into two genera: Phoxinus and Rhynchocypris. The corresponding 16S rRNA sequences of 9 mitochondrial genomes were extracted and added to the dataset of Sakai et al. [9]. The reanalyzed results were highly consistent with those of Sakai et al. [9], showing that the separation of Phoxinus and Rhynchocypris was the primary split (Figure 5). The results from different datasets are consistent and indicate that all Phoxinus samples from South Korea represent the same mitochondrial lineage (see Figures 4, 5 and S1). All mtDNA sequences of Phoxinus cf. phoxinus from HLJ cluster into the same mitochondrial lineage 22. Interestingly, samples from the Anadyr River (in Northeast Russia) also cluster into lineage 22, indicating that this lineage may have a wide distribution. Phoxinus spp. from the Irtysh River, Mongolian Plateau and Tumen River, each representing a distinct lineage, were not included in the previous data of Sakai et al. [9]. The results of the Cyt b data analysis were consistent with those of COI and 16S rRNA (Figure S1 in Supplementary Materials).
Kottelat studied the morphology of Phoxinus spp. from Mongolia and argued that they were markedly different from European minnow (Phoxinus phoxinus) [1]. He also pointed out that Phoxinus spp. from the Selenge, Kherlen and Bulgan drainages differed from each other, and suggested that they were distinct species [3]. Our results support the existence of 3 lineages in Mongolia. Samples of lineage 23, from the central Mongolian Plateau, were collected at Ulaanbaatar [29] and in the Tyva Republic [28]. Eastern samples from the Kherlen River (a tributary of the HLJ) belong to lineage 22. Populations from western Mongolia are speculated to be conspecific with Altai populations and discussed under P. ujmonensis (Irtysh River) [1]. In our phylogenetic analysis, the mtDNA of P. ujmonensis also formed an independent lineage (No. 27). "Fauna Sinica" recorded 3 subspecies of "Phoxinus phoxinus" in China: two are the above-mentioned P. ujmonensis from the Irtysh River and Phoxinus cf. phoxinus from HLJ, and the third is P. tumensis, which has a complete lateral line and is distributed only in the Tumen River Basin. The mtDNA of P. tumensis also formed a distinct lineage (No. 26). All mtDNA sequences of Phoxinus cf. phoxinus from South Korea cluster into a distinct lineage, though it is closely related to P. tumensis. Thus, a total of 5 Asian lineages of Phoxinus have now been identified. They are significantly different from their European relatives, suggesting that Asian populations of Phoxinus should no longer be identified as "Phoxinus phoxinus".
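The reanalysis above depends on cutting the same gene (COI, 16S rRNA or Cyt b) out of each complete mitogenome so that it can be combined with published partial sequences. The Python sketch below shows one way this step could be done from annotation coordinates; it is an illustration only. The function names are hypothetical, coordinates are assumed to be 1-based and inclusive as in GenBank feature tables, and the toy genome and positions are invented rather than taken from Table 2.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string (unambiguous bases only)."""
    return seq.translate(COMPLEMENT)[::-1]

def extract_feature(genome, start, end, strand="+"):
    """Extract a feature from a circular genome.

    start and end are 1-based, inclusive coordinates on the deposited strand.
    If start > end, the feature is taken to wrap around the origin.
    strand "-" returns the reverse complement (for example an L-strand gene).
    """
    if start <= end:
        fragment = genome[start - 1:end]
    else:
        fragment = genome[start - 1:] + genome[:end]
    return reverse_complement(fragment) if strand == "-" else fragment

# Toy circular genome and invented coordinates, for illustration only.
toy_genome = "AACCGGTTAC"
print(extract_feature(toy_genome, 3, 6))        # CCGG
print(extract_feature(toy_genome, 9, 2))        # ACAA (wraps the origin)
print(extract_feature(toy_genome, 3, 5, "-"))   # CGG
```

Fragments extracted in this way from each mitogenome could then be aligned with the corresponding partial sequences before tree building.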

Conclusions
In this study, we deciphered the complete mitochondrial genome of Phoxinus cf. phoxinus from the Heilongjiang River and compared it with other mtDNA datasets. An unusual repetitive structure was revealed at the beginning of the D-loop regions of all known mitogenomes of Phoxinus spp. However, this repetitive structure was absent in phoxinin populations that we propose reclassifying as Rhynchocypris spp. This is further evidence for the phylogenetic result that Far Eastern phoxinins should be divided into two genera: Phoxinus and Rhynchocypris. The updated phylogeny of Phoxinus also indicates 5 mtDNA lineages of Phoxinus in Asia that are deeply divergent from their European relatives. Northern Asia is suitable for the distribution of Phoxinus species, but many areas were not covered in previous publications. Given the tremendous progress made in the taxonomy and phylogeography of Phoxinus in Europe, it is likely that more lineages or species of Phoxinus will be found in Northern Asia.