Comparative Analysis and Phylogenetic Study of Dawkinsia filamentosa and Pethia nigrofasciata Mitochondrial Genomes

Smiliogastrinae are recognized for their high nutritional and ornamental value. In this study, we employed high-throughput sequencing technology to acquire the complete mitochondrial genome sequences of Dawkinsia filamentosa and Pethia nigrofasciata. The gene composition and arrangement order in these species were similar to those of typical vertebrates, comprising 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 non-coding region. The mitochondrial genomes of D. filamentosa and P. nigrofasciata measure 16,598 and 16,948 bp, respectively. Both D. filamentosa and P. nigrofasciata exhibit a significant preference for AT bases and an anti-G bias. Notably, the AT and GC skew values of the ND6 gene fluctuated markedly, suggesting that the selection and mutation pressures on this gene may differ from those affecting other genes. Phylogenetic analysis, based on the complete mitochondrial genomes of 23 Cyprinidae fishes, revealed that D. filamentosa is closely related to the sister group comprising Dawkinsia denisonii and Sahyadria chalakkudiensis. Similarly, P. nigrofasciata forms a sister group with Pethia ticto and Pethia stoliczkana.


Introduction
Tan and Armbruster [1] reclassified Puntius and related species under the subfamily Smiliogastrinae, following a phylogenetic analysis of fish genera within the order Cypriniformes. According to the National Center for Biotechnology Information, Smiliogastrinae currently comprises 25 genera, including 9 single-species genera. In Bangladesh, extensive research has been conducted on the abundance and distribution of barbs in various habitats, including rivers, floodplains, and mountain streams [2,3]. Most species of the subfamily Smiliogastrinae are brightly colored, and accurate identification and monitoring of these species are crucial, given their important economic and ornamental value.
Mitochondrial DNA (mtDNA) is a compact genome, typically 15-20 kb in size, that exists as a closed circular molecule separate from the nuclear genome. mtDNA exhibits properties such as maternal inheritance, simple structure, autonomous replication, high mutation rate, and consistent mutation probability, making it an invaluable tool for studying species evolution, phylogenetic relationships, and population genetics. mtDNA has been used to examine genetic differentiation within and between closely related species [4-6]. Recent advancements in DNA sequencing technology have greatly facilitated the rapid and accurate acquisition of fish mitochondrial genome data. Moreover, several complete fish mtDNA sequences have been reported, enriching our understanding of classification and systematic evolution within this group. Thus far, complete mitochondrial genome sequences have been reported for only 19 species of Smiliogastrinae [7].
In this study, we employed high-throughput sequencing technology to determine and analyze the complete mitochondrial genome sequences of Dawkinsia filamentosa and Pethia nigrofasciata. By comparing these genomes with those of 19 related species, we aimed to elucidate the molecular phylogenetic relationships within this group. These findings are expected to contribute fundamental insights for the germplasm identification and genetic diversity conservation of Smiliogastrinae.

Mitochondrial Genome Analysis
The total mitochondrial genome lengths of D. filamentosa and P. nigrofasciata are 16,598 and 16,948 bp, respectively (Figures 1 and 2). In D. filamentosa, the nucleotide compositions are 33.9, 24.2, 27.3, and 14.7% for A, T, C, and G, respectively, whereas in P. nigrofasciata they are 33.2, 26.7, 24.9, and 15.1% for A, T, C, and G, respectively. Both species exhibit high AT content in their mitochondrial genomes, which comprise 37 genes (13 protein-coding genes [PCGs], 22 tRNAs, and 2 rRNAs) and one non-coding control region (Tables 1 and 2). Seven gene overlaps were found in both species, with the longest overlaps (7 bp) between ATP8 and ATP6 and between ND4L and ND4. In the mitochondrial genomes of both species, the heavy and light strands encode different numbers of genes. The heavy strand encodes 28 genes, including 12 PCGs, 14 tRNAs, and 2 rRNAs; the light strand encodes 9 genes, comprising 1 PCG and 8 tRNAs. The non-coding control region of D. filamentosa, located between tRNA-Pro and tRNA-Phe, spans 931 bp, and the nucleotide compositions of A, T, G, and C are 36.2, 33.0, 11.8, and 19.0%, respectively. The non-coding control region of P. nigrofasciata is similarly positioned, with a length of 1288 bp, and the A, T, G, and C compositions are 36.7, 33.4, 11.5, and 18.4%, respectively. The overall AT content of the D. filamentosa mitochondrial genome is 58.1%, with AT skew and GC skew values of 0.166 and −0.300, respectively (Table 3); for P. nigrofasciata, these values are 59.9%, 0.108, and −0.245, respectively (Table 4).
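The skew statistics reported above can be reproduced with a few lines of code. The following Python sketch assumes the conventional definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), which the text does not state explicitly; the function name base_skews and the toy sequence are illustrative only and are not taken from either mitogenome. Under these definitions, the reported A and T percentages for D. filamentosa (33.9 and 24.2) give (33.9 − 24.2)/58.1 ≈ 0.167, in line with the reported AT skew of 0.166.

```python
from collections import Counter


def base_skews(seq: str) -> dict:
    """Return AT content (%), AT skew and GC skew for one DNA strand.

    Assumes the conventional definitions:
        AT skew = (A - T) / (A + T)
        GC skew = (G - C) / (G + C)
    The sequence must contain at least one A/T and one G/C base.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous symbols such as N are ignored
    return {
        "at_content": 100.0 * (a + t) / total,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Toy sequence (hypothetical): 4 A, 2 T, 3 C, 1 G.
toy = "AAAATTCCCG"
result = base_skews(toy)
print(result)
```

A positive AT skew means more A than T on the analyzed strand, and a negative GC skew means more C than G, which is the pattern reported for both species.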

Protein-Coding Gene Analysis
Among the 13 PCGs in D. filamentosa, only COI uses GTG as the start codon, while the remaining 12 PCGs initiate with ATG. COII, ND2, ND3, and ND4 have incomplete termination codons (T), ATP6 and COIII terminate with TA-, and the remaining seven PCGs end with the complete termination codons TAA or TAG. Among the 13 PCGs of P. nigrofasciata, only COI starts with GTG, while the remaining 12 PCGs begin with ATG. COII, COIII, Cytb, ND2, ND3, and ND4 have incomplete termination codons (T), ATP6 ends with TA-, and the remaining six PCGs terminate with the complete termination codons TAA or TAG.
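The distinction between complete and incomplete termination codons can be made mechanical. The Python sketch below is one possible heuristic, not the annotation procedure used in this study: it assumes each coding sequence is given 5'→3' on its own coding strand, and that an incomplete stop (T or TA-) is recognized by a sequence length that is not a multiple of three. The function name classify_stop and the example sequences are hypothetical.

```python
# Complete stop codons under the vertebrate mitochondrial genetic code
# (NCBI translation table 2).
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def classify_stop(cds: str) -> str:
    """Classify the 3' end of a coding sequence (heuristic sketch)."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0 and cds[-3:] in COMPLETE_STOPS:
        return cds[-3:]          # complete termination codon
    if remainder == 1 and cds.endswith("T"):
        return "T--"             # single-base incomplete stop
    if remainder == 2 and cds.endswith("TA"):
        return "TA-"             # two-base incomplete stop
    return "unrecognized"


# Hypothetical toy sequences, each starting with ATG.
examples = ["ATGGCCTAA", "ATGGCCT", "ATGGCCTA", "ATGGCC"]
labels = [classify_stop(s) for s in examples]
print(labels)
```

Incomplete stops of this kind are generally thought to be completed to TAA by polyadenylation of the transcript, which is why they are tolerated in vertebrate mitogenomes.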
A preliminary analysis of the relative synonymous codon usage (RSCU) and amino acid composition of the 13 PCGs in the mitochondrial genomes of D. filamentosa and P. nigrofasciata was performed. The results showed that leucine (Leu) was the most frequently used amino acid, constituting 12.69 and 11.25% of the 3798 and 3800 amino acids encoded, respectively. This was followed by alanine (Ala, 8.23 and 8.8%, respectively) and threonine (Thr, 8.84 and 8.22%, respectively). Cysteine (Cys) was the least abundant, accounting for only 0.66% (Tables 5 and 6). Dawkinsia filamentosa and P. nigrofasciata showed a preference for 25 and 26 codons (RSCU ≥ 1) in their 13 PCGs, respectively.
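For readers unfamiliar with RSCU, the following Python sketch illustrates the usual calculation: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It is a minimal sketch under stated assumptions, not the software used in this study. It uses the vertebrate mitochondrial genetic code (NCBI translation table 2) and, as a simplification, treats leucine and serine each as a single codon family, whereas mitogenome tables often split them into Leu1/Leu2 and Ser1/Ser2. The function name rscu and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial).
# Codons are enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for one in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing incomplete codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Count sense codons only; stops and ambiguous codons are skipped.
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            # observed count / (total / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy sequence (hypothetical): three CTA codons and one CTT codon, all Leu.
toy_rscu = rscu("CTACTACTACTT")
print(toy_rscu["CTA"], toy_rscu["CTT"], toy_rscu["TTA"])
```

An RSCU value above 1 marks a codon used more often than expected under equal usage of synonymous codons, which is the sense in which codons are described as preferred in the text.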

tRNA and rRNA Gene Analysis
The 22 tRNA genes in D. filamentosa are 67-77 bp long. The longest gene, Leu2-tRNA, is 74 bp long, while the shortest, Cys-tRNA, is 67 bp (Table 1). In P. nigrofasciata, the 22 tRNA genes are 67-77 bp long, with Lys-tRNA being the longest at 77 bp and Cys-tRNA the shortest at 67 bp (Table 2). Of the 24 RNA genes of D. filamentosa and P. nigrofasciata, 8 are located on the light (L) strand and 16 on the heavy (H) strand.
The 16S rRNA gene in the mitochondrial genome of D. filamentosa is 1695 bp long (Table 3), with an AT content of 57.4%. The 12S rRNA gene measures 955 bp (Table 3), with an AT content of 51.8%. In P. nigrofasciata, the 16S rRNA gene is 1684 bp in length (Table 4), with an AT content of 59.1%, and the 12S rRNA gene is 956 bp (Table 4), with an AT content of 51.7%.

Phylogenetic Analysis
Concatenated sequences of the 13 PCGs from the mitochondrial genomes of 23 Cyprinidae fishes were used to construct Bayesian inference (BI) and maximum likelihood (ML) phylogenetic trees (Figure 3). The trees consistently supported the monophyly of Enteromius, Hampala, and Pethia. However, further research is required regarding the monophyly of the other three genera: Barbodes, Dawkinsia, and Puntius. Notably, both phylogenetic trees strongly support D. filamentosa forming a sister group with Dawkinsia denisonii and Sahyadria chalakkudiensis (BS = 100, PP = 1). Similarly, P. nigrofasciata is strongly indicated to be a sister group of Pethia ticto and Pethia stoliczkana (BS = 100, PP = 1).
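The concatenation step that precedes tree building can be sketched as follows. This Python example is illustrative only: the text does not specify which software performed the alignment or concatenation, and the function name concatenate_alignments, the taxon labels, and the toy sequences are hypothetical. It assumes each gene has already been aligned so that all sequences within a gene have equal length, and it pads taxa that lack a gene with gaps.

```python
def concatenate_alignments(gene_alignments: dict) -> tuple:
    """Join per-gene alignments into one supermatrix plus partition ranges.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions), where partitions holds
    (gene, first column, last column) with 1-based inclusive columns.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa missing this gene are filled with gap characters.
            pieces[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((gene, start, start + width - 1))
        start += width
    supermatrix = {t: "".join(parts) for t, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy input: two genes, two taxa, one gene missing for sp_b.
toy_genes = {
    "ND1": {"sp_a": "ATGAAA", "sp_b": "ATGAAG"},
    "COI": {"sp_a": "GTGCCC"},
}
matrix, parts = concatenate_alignments(toy_genes)
print(matrix)
print(parts)
```

Recording the column range of each gene keeps the option of assigning a separate substitution model to each partition in the ML and BI analyses.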

Discussion
We used high-throughput sequencing technology to obtain the complete mitochondrial genome sequences of D. filamentosa and P. nigrofasciata, measuring 16,598 and 16,948 bp, respectively. The structural characteristics of these sequences align with those previously reported for Smiliogastrinae fishes, underscoring the high evolutionary conservation of the mitochondrial genome in this group [8]. AT and GC skew values indicate base content differences between the heavy and light strands, and larger absolute values indicate more pronounced differences in base composition between the two [9]. The mitochondrial DNA sequences of D. filamentosa and P. nigrofasciata have a higher AT content than GC content and thus show a clear AT preference. This preference has also been observed in other Smiliogastrinae species, although the AT content varies slightly among species. This base preference may be related to natural mutation and selection pressure during replication and transcription. Aside from differences in the number of bases in coding regions and tRNAs, each gene is similar to its counterpart in other Smiliogastrinae species and shows high homology. Typical AT repeat features are also present in the non-coding regions [10]. The base distribution in PCGs was relatively uniform at the first codon position, whereas the second and third codon positions exhibited a significant anti-G bias. This uneven distribution is likely due to constraints imposed by amino acid composition and differences in codon usage frequencies [11].
Leucine, a hydrophobic amino acid, was the most frequently used of the 20 amino acids encoded by the PCGs, a trend consistent with the 19 other Smiliogastrinae species [8]. This may be related to the composition of the transmembrane proteins encoded by mitochondrial genes. Notably, among all genes, ND6 has the highest absolute AT skew value and the only positive GC skew value, so its AT/GC bias deviates markedly from that of the other genes. The proton-transporting NADH-quinone oxidoreductase, also known as complex I, is a multi-subunit membrane protein complex that catalyzes electron transfer in the NADH respiratory chain and provides approximately 40% of the proton motive force required for ATP synthesis in vertebrates [12,13]. Because ND6 is a critical subunit of complex I, its distinctive skew pattern suggests that the selection and mutation pressures acting on it, possibly related to respiratory metabolism, differ from those acting on other genes.
The total lengths of the two rRNA genes of D. filamentosa and P. nigrofasciata were 2650 and 2640 bp, respectively. The 12S rRNA and 16S rRNA genes were separated by Val-tRNA, in line with the typical arrangement in Smiliogastrinae.
Given its stability, the 12S rRNA is often used for fish identification and phylogenetic studies [14]. Studies have shown that base mismatches are also commonly present in the secondary structure of tRNA, as they help to mitigate harmful mutations in the nonrecombinant mitochondrial genome [15]. The mitochondrial control region, which contains elements regulating mitochondrial genome replication and gene expression, offers insights into DNA replication, transcription mechanisms, and evolutionary patterns. The marked variation in D-loop length observed in our work has also been reported previously in the subfamily Smiliogastrinae [7].
Most current phylogenetic studies of Smiliogastrinae rely on individual genes [16,17]. In contrast, we used concatenated sequences of 13 PCGs to construct partial Smiliogastrinae phylogenetic trees using ML and BI methods. These analyses supported the monophyly of Enteromius, Hampala, and Pethia, while suggesting the need for further research on the monophyly of Barbodes, Dawkinsia, and Puntius. In addition, both analyses recovered D. filamentosa as sister to the group comprising Dawkinsia denisonii and Sahyadria chalakkudiensis, and P. nigrofasciata as sister to Pethia ticto and Pethia stoliczkana.
In summary, this study is the first to analyze the mitochondrial genome structure of D. filamentosa and P. nigrofasciata and to explore their phylogenetic relationships based on concatenated multi-gene sequences. These results provide new insights and reference points for further research into the species evolution, classification, identification, and diversity of Smiliogastrinae.

Experimental Materials and DNA Extraction
In September 2021, fish samples were collected from the Huadiwan Flower, Bird, Fish, and Insect Wholesale Market in Liwan District, Guangzhou City, Guangdong Province (23°3′48″ N, 113°12′18″ E). Following preliminary morphological identification, fresh muscle tissue from the back and abdomen was excised and stored in 1.5 mL Eppendorf centrifuge tubes. These samples were soaked in 95% alcohol for 48 h and then preserved at −20 °C. Genomic DNA was extracted using the traditional chloroform tris-saturated phenol method [18]. The concentration of genomic DNA was measured with a NanoDrop ultra-micro spectrophotometer, and DNA integrity was verified by 1% agarose gel electrophoresis.

High-Throughput Sequencing and Gene Annotation
DNA samples that passed the quality inspection were fragmented to approximately 200 bp using a Covaris ultrasonic crusher (Covaris, Woburn, MA, USA) for small genomic DNA library construction. The HiSeq 2500 high-throughput sequencing platform was used for sequencing (Guangzhou Tianyi Huiyuan Gene Technology Co., Ltd., Guangzhou, China). SOAPdenovo 2.04 software (http://soap.genomics.org.cn/soapdenovo.html, accessed on 25 January 2024) [19] was used with default parameters to assemble clean reads and optimize local assembly based on read pairings and overlap; gaps introduced during scaffold splicing were filled and repaired using GapCloser 1.12 software (http://soap.genomics.org.cn/soapdenovo.html, accessed on 25 January 2024) [20], with redundant segments subsequently removed to obtain the final assembly result. The mitochondrial genome was extracted and assembled using NOVOPlasty v4.3.1 software [21]. The assembled complete mitochondrial genome sequence was annotated on the MITOS web server (http://mitofish.aori.u-tokyo.ac.jp/, accessed on 25 January 2024), covering PCGs, RNA genes, and non-coding regions. This was supplemented by manual comparison and correction to precisely determine the position and length of each gene. MitoAnnotator (https://mitofish.aori.u-tokyo.ac.jp/annotation/input/, accessed on 25 January 2024) was used to create mitochondrial genome structure maps for D. filamentosa and P. nigrofasciata.
Institutional Review Board Statement: All specimens in this study were collected in accordance with Chinese laws. The collection and sampling of specimens were reviewed and approved by the Animal Ethics Committee of Nanjing Forestry University (NFU2017-2023187, 1 July 2021). All experiments were conducted with respect to animal welfare and care. This study was compliant with CBD and Nagoya protocols.
Informed Consent Statement: Not applicable.

Figure 3. Phylogenetic tree inferred from the nucleotide sequences of 13 protein-coding genes. Numbers at nodes represent the posterior probability values for Bayesian analysis and bootstrap values for maximum likelihood analysis.

Table 1. Annotation of the mitochondrial genome of Dawkinsia filamentosa.

Table 2. Annotation of the mitochondrial genome of Pethia nigrofasciata.

Table 5. Relative synonymous codon usage of 13 protein-coding genes of the Dawkinsia filamentosa mitochondrial genome.

Table 6. Relative synonymous codon usage of 13 protein-coding genes of the Pethia nigrofasciata mitochondrial genome.