Article

Building a Local Multi-Marker eDNA Reference Database Reveals the Limitations of Public Repositories for Freshwater Fish Monitoring in the Three Gorges Reservoir

National Agricultural Science Observing and Experimental Station of Chongqing, Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Wuhan 430223, China
*
Authors to whom correspondence should be addressed.
Fishes 2026, 11(5), 264; https://doi.org/10.3390/fishes11050264
Submission received: 22 March 2026 / Revised: 25 April 2026 / Accepted: 27 April 2026 / Published: 29 April 2026
(This article belongs to the Section Biology and Ecology)

Abstract

Environmental DNA (eDNA) metabarcoding has emerged as a powerful tool for biodiversity monitoring, yet its accuracy is fundamentally constrained by the completeness and taxonomic reliability of reference sequence databases. For the Three Gorges Reservoir (TGR), no integrated multi-marker eDNA reference library exists, hampering standardized fish conservation monitoring under the Yangtze River Ten-Year Fishing Ban. Here, we constructed a comprehensive, multi-marker eDNA reference database for the fish fauna of the TGR, encompassing mitochondrial 12S rRNA, 16S rRNA, and cytochrome c oxidase subunit I (COI) gene sequences from 173 specimens (120 species) collected between 2021 and 2024. After integrating publicly available sequences, the final database comprised 161 species. Then, we quantitatively compared species annotation performance between this local database and public repositories. Results showed that while public databases achieved higher nominal species coverage (94.67%), they exhibited critical deficiencies in annotation accuracy, correctly annotating only 77.97% (12S rRNA), 75.00% (16S rRNA), and 38.14% (COI) of sequences from shared species under controlled conditions. In contrast, the local database exhibited 92.37%, 93.10% and 100% annotation accuracy for the respective markers. Optimal interspecific Kimura 2-parameter (K2P) thresholds for species delimitation were 0.00448 (12S rRNA), 0.00531 (16S rRNA), and 0.00734 (COI). In addition, 15, 0, and 4 species pairs exhibited zero interspecific distance for 12S rRNA, 16S rRNA, and COI, respectively. These limitations reinforce the need for cautious interpretation of eDNA metabarcoding results and the integration of multiple markers or complementary nuclear loci. 
This study provides preliminary evidence that regionally curated, multi-marker reference libraries could improve taxonomic assignment reliability in eDNA metabarcoding compared to uncurated public repositories, providing a foundational resource for biodiversity conservation.
Key Contribution: This study established an integrated multi-marker eDNA reference database for the Three Gorges Reservoir and provided preliminary evidence that locally curated reference libraries outperformed public databases in species annotation accuracy, while also providing optimized interspecific genetic thresholds for reliable species delimitation in eDNA metabarcoding.

1. Introduction

Freshwater ecosystems are among the most imperiled habitats globally, experiencing rates of biodiversity loss far exceeding those of terrestrial and marine realms. Over the past century, human disturbances—such as hydrological regulation, pollution, habitat fragmentation, and overexploitation—have triggered an accelerating decline in freshwater fish diversity across continents. Accurate and comprehensive biodiversity assessments are therefore globally recognized as the cornerstone of effective conservation and management strategies for freshwater resources. Traditional survey methods, such as net capture, are inherently destructive to target organisms and their habitats. Species identification with these methods relies heavily on the taxonomic expertise of researchers and lacks standardized protocols [1]. Environmental DNA (eDNA) metabarcoding has emerged as a non-invasive and eco-friendly alternative for biodiversity monitoring. By amplifying and high-throughput-sequencing DNA extracted from environmental samples, and comparing the sequences against reference databases for taxonomic assignment, eDNA metabarcoding is increasingly used to characterize freshwater fish diversity, detect rare and endangered species, and identify biological invasions [2,3,4,5]. A complete and high-quality eDNA reference database is critical for the successful application of this technique. Incomplete databases can lead to inaccurate taxonomic assignments, directly undermining the reliability of eDNA-based fish diversity surveys [6,7].
Missing taxa, mislabeling, insufficient taxonomic resolution, and missing intraspecific variants represent pervasive, systemic limitations in public genetic databases, which collectively compromise the accuracy of taxonomic assignment in eDNA metabarcoding studies [8]. Currently, the reliability of eDNA-based fish surveys is frequently limited by the lack of regionally representative reference databases for native species, representing a major bottleneck in the field [9,10]. For freshwater fish eDNA metabarcoding, the mitochondrial 12S rRNA, 16S rRNA, and cytochrome c oxidase subunit I (COI) genes are the most widely used markers [11]. Several studies have applied these markers to develop eDNA databases, including a local eDNA metabarcoding database for freshwater fishes of Hainan Island, China [12].
The Three Gorges Reservoir (TGR) in the Yangtze River serves as a critical habitat for numerous rare and endemic fish species, representing a vital national germplasm repository for freshwater fishes. However, frequent water-level fluctuations caused by reservoir impoundment and flood discharge have disrupted fish feeding, spawning, and overwintering grounds. Coupled with long-term overfishing and environmental pollution, these pressures have driven sharp declines in fishery resources within the reservoir region, accompanied by severe losses in fish abundance and biodiversity [13]. To restore fishery resources, the Yangtze River Fishing Ban was implemented in 2021 to cease all basin-wide commercial fishing in China. Accurate species-level identification is therefore urgently needed for effective biomonitoring and ecosystem management in the TGR. Gao et al. [14] established a local COI gene database for 51 common fish species. However, the previous study was restricted to a single genetic marker and limited taxonomic coverage; to date, no integrated eDNA metabarcoding database incorporating 12S rRNA, 16S rRNA, and COI genes has been constructed for the TGR. Moreover, the TGR harbors many endemic and threatened fish species that are poorly represented in public databases, which often leads to low annotation accuracy or false negatives in field eDNA surveys. This lack of a comprehensive, high-accuracy, multi-marker reference database hinders the accurate and consistent application of eDNA technology for regional fish diversity monitoring.
To address this critical need, the present study aims to establish an integrated environmental DNA reference database for fish species in the TGR, encompassing mitochondrial 12S rRNA, 16S rRNA, and COI gene sequences. We then compared the species annotation performance of this local database against that of existing public databases. Using this integrated database as a foundation, we further evaluated interspecific genetic divergence, assessed the species discrimination ability of each molecular marker, and determined optimal interspecific divergence thresholds. This study will provide essential data resources, support monitoring efforts of the Yangtze River Ten-Year Fishing Ban, and contribute to the assessment of aquatic ecological quality in the reservoir area. Furthermore, it may offer a standardized methodological template for regional eDNA database construction in freshwater ecosystems across China.

2. Materials and Methods

2.1. Sample Collection and Morphological Identification

Fish specimens were collected from the mainstem and major tributaries of the Yangtze River within the TGR between 2021 and 2024 (Figure 1). Morphological species identification was performed following authoritative references, including the Fishes of Sichuan and Colored Atlas of Fishes of Sichuan. For specimens with unambiguous morphological identification, fin clips or dorsal muscle tissues were excised, preserved in absolute ethanol, and stored at −20 °C.

2.2. DNA Barcode Generation

Total genomic DNA was extracted using the Rapid Animal Genomic DNA Isolation Kit (Sangon, Shanghai, China). Extracted DNA was then quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Polymerase chain reaction (PCR) amplification was conducted using three universal primer sets widely employed in fish eDNA metabarcoding studies: MiFish-U, targeting a ~170 bp fragment of the mitochondrial 12S rRNA [15]; Ac16S, targeting a ~330 bp fragment of the mitochondrial 16S rRNA [16]; and Ps1, targeting a ~247 bp fragment of the COI gene [17]. All PCR amplifications were performed in a total volume of 25 μL, consisting of 12.5 μL of 2× HyperMB HiFi Ultra-Fast PCR Master Mix (Sangon, Shanghai, China), 1.0 μL of each forward and reverse primer (10 μM), 2.0 μL of template DNA, and 8.5 μL of nuclease-free ddH2O. The PCRs were initiated by denaturation at 95 °C for 3 min, followed by 35 amplification cycles of denaturation at 95 °C for 30 s, annealing for 30 s (annealing temperatures: 60 °C for 12S rRNA, 60 °C for 16S rRNA, and 54 °C for COI), and extension at 72 °C for 20 s, with a final extension at 72 °C for 5 min. PCR products were purified and subjected to bidirectional Sanger sequencing at Sangon Biotech Co., Ltd. (Shanghai, China).
For fish species with well-documented historical occurrence records in the TGR that were not captured during field sampling, corresponding mitochondrial sequences were retrieved from public databases. To ensure the reliability of these data, a strict quality-control (QC) workflow was implemented. Searches were conducted using the scientific name as the primary query. Priority was given to sequences derived from voucher specimens with published locality information within the Yangtze River basin or adjacent freshwater systems. Each candidate sequence was verified by BLAST against both the NCBI and BOLD databases to confirm correct taxonomic assignment and sequence identity. When a species was represented by multiple entries, only unique haplotypes or the longest high-quality sequence per species were retained to avoid redundancy. Following alignment and trimming to a uniform length, these sequences were incorporated as reference sequences into the local eDNA metabarcoding library for freshwater fishes in the TGR (Table S1).
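The redundancy filter applied to public sequences can be illustrated with a minimal sketch. It assumes the records are already aligned and trimmed, and that each record is a (species, sequence) pair. The function name `unique_haplotypes` is hypothetical and is not part of the workflow described in this study. It implements only the "unique haplotypes" branch of the rule, not the "longest high-quality sequence" branch.

```python
def unique_haplotypes(records):
    """Keep one record per distinct (species, sequence) pair, preserving input order.

    records: iterable of (species_name, sequence) tuples.
    Sequences are compared case-insensitively.
    """
    seen = set()
    kept = []
    for species, seq in records:
        key = (species, seq.upper())
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Hypothetical toy records: two identical entries for "Species A" collapse to one.
example = [
    ("Species A", "acgtacgt"),
    ("Species A", "ACGTACGT"),
    ("Species A", "ACGTACGA"),
    ("Species B", "ACGTACGT"),
]
filtered = unique_haplotypes(example)
```

Note that an identical haplotype shared by two different species is kept for both, since such shared sequences are informative about marker resolution.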

2.3. Local Database Construction and Sequence Annotation Analysis

For construction of the local database, we followed the protocol described by Gao and Jiang [14]. Briefly, BLAST-2.4 software was installed and configured in the system environment. Validated sequences of 12S rRNA, 16S rRNA and COI were compiled into three separate FASTA files (12S-barcode.fa, 16S-barcode.fa and COI-barcode.fa). These datasets were formatted and indexed using the makeblastdb command with the following parameters: -in 12S-barcode.fa -dbtype nucl -parse_seqids -out fish, -in 16S-barcode.fa -dbtype nucl -parse_seqids -out fish, and -in COI-barcode.fa -dbtype nucl -parse_seqids -out fish, thereby establishing a BLAST-compatible local database. Query sequences were compiled into a test.fa file, and database searches were performed using the blastn command with the following parameters: -query test.fa -db fish -evalue 1e-5 -outfmt 2.
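As an illustration of the database-building step, the following sketch assembles the `makeblastdb` and `blastn` argument lists for each marker. The flags are those given in the text. The sketch departs from the text in one respect, which is an assumption: each marker gets its own output name (`fish_12S`, `fish_16S`, `fish_COI`), so that the three indexes do not share the single name `fish`. The helper names are hypothetical.

```python
# FASTA files named in the text, keyed by marker.
MARKERS = {
    "12S": "12S-barcode.fa",
    "16S": "16S-barcode.fa",
    "COI": "COI-barcode.fa",
}


def makeblastdb_cmd(fasta, db_name):
    """Argument list for indexing a nucleotide FASTA file as a BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl",
            "-parse_seqids", "-out", db_name]


def blastn_cmd(query, db_name, evalue="1e-5", outfmt="2"):
    """Argument list for a blastn search with the parameters reported in the text."""
    return ["blastn", "-query", query, "-db", db_name,
            "-evalue", evalue, "-outfmt", outfmt]


commands = []
for marker, fasta in MARKERS.items():
    db = f"fish_{marker}"  # assumption: one database name per marker
    commands.append(makeblastdb_cmd(fasta, db))
    commands.append(blastn_cmd("test.fa", db))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed. The sketch only builds the commands and does not execute them.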
We simulated the standard similarity-based eDNA metabarcoding annotation workflow and quantitatively evaluated species annotation performance by querying our newly generated sequences against both public databases and our custom-built local database. For species not captured in this study but historically recorded in the TGR, sequences were obtained from public databases and added to the reference library only; they were not used as queries. For the local database, the queries in test.fa were searched with the blastn command using the parameters -query test.fa -db fish -evalue 1e-5 -outfmt 2. A query sequence was considered accurately annotated if and only if the reference database yielded a high-similarity match (≥98%) exclusively to its conspecific species (i.e., no other species exhibited an equal or higher similarity). For annotation against public databases, sequences were queried against the NCBI non-redundant nucleotide (nt) database using the BLASTn algorithm, and each query was assigned to one of three classes: (1) correctly annotated, if the top hit had ≥98% sequence similarity and ≥90% query coverage and matched the known species of the query; (2) incorrectly annotated, if multiple hits had ≥98% similarity and none of them matched the correct species; (3) unidentified, if no hit met the ≥98% similarity threshold. To ensure strict comparability between the local and public database evaluations, identical alignment algorithms (BLASTn), similarity thresholds (≥98%), coverage criteria (≥90%), and match selection rules were applied to both datasets. Any differences in annotation performance can therefore be attributed to database content rather than procedural artifacts.
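The classification rules above can be expressed as a short sketch. The `Hit` record and the `classify` function are hypothetical names. The text does not specify how ties are handled for public databases, so two choices here are assumptions: a tie at the best identity between the true species and another species is reported as "ambiguous", and a query whose best passing hit belongs to another species is reported as "incorrect" even if a lower-ranked hit matches.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    species: str
    identity: float  # percent sequence identity
    coverage: float  # percent query coverage


def classify(true_species, hits, min_identity=98.0, min_coverage=90.0):
    """Classify one query from its BLAST hits using the thresholds in the text."""
    passing = [h for h in hits
               if h.identity >= min_identity and h.coverage >= min_coverage]
    if not passing:
        return "unidentified"
    best = max(h.identity for h in passing)
    top_species = {h.species for h in passing if h.identity == best}
    if top_species == {true_species}:
        return "correct"
    if true_species in top_species:
        return "ambiguous"  # assumption: ties are not counted as correct
    return "incorrect"


# Hypothetical toy hits for a query known to be "Species A".
result_correct = classify("Species A", [Hit("Species A", 99.4, 100.0),
                                        Hit("Species B", 98.2, 100.0)])
result_tie = classify("Species A", [Hit("Species A", 100.0, 100.0),
                                    Hit("Species B", 100.0, 100.0)])
result_none = classify("Species A", [Hit("Species B", 95.0, 100.0)])
result_wrong = classify("Species A", [Hit("Species B", 99.0, 95.0)])
```

Under the local-database rule in the text, only the first case would count as accurately annotated, because the conspecific match must be exclusive.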

2.4. Phylogenetic Reconstruction and Genetic Threshold Analysis

Neighbor-joining (NJ) phylogenetic analysis and optimization of interspecific genetic divergence thresholds were performed as previously reported [18]. Specifically, raw sequencing reads were manually inspected, assembled, and trimmed using the SeqMan program within the DNAStar software package (version 7.1.0). Neighbor-joining phylogenetic trees were reconstructed based on the Kimura 2-parameter (K2P) model using MEGA 12 software, with branch support evaluated via 1000 bootstrap replicates. Pairwise genetic distances were calculated between sequences. A series of candidate interspecific divergence thresholds was evaluated at 0.00005 intervals within the range of 0 to 0.02. The threshold yielding the lowest species misidentification rate was identified as the optimal interspecific divergence threshold for species delimitation.
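For readers unfamiliar with the K2P model, the pairwise distance is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The sketch below computes this for two aligned sequences. It is an illustration only and is not the MEGA implementation used in this study. Skipping sites with gaps or ambiguity codes on a per-pair basis is an assumption, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy 10 bp alignments: one transition (A->G) and one transversion (A->C).
d_identical = k2p_distance("ACGTACGTAC", "ACGTACGTAC")
d_transition = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
d_transversion = k2p_distance("ACGTACGTAC", "CCGTACGTAC")
```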

3. Results

3.1. Overview of the Local Reference Database

A total of 173 fish specimens representing 120 species were collected from the mainstem and major tributaries of the Yangtze River within the TGR between 2021 and 2024. Of these, 80 species were represented by 2–5 individuals each, while the remaining 40 species were represented by a single individual. The collected samples included one national first-class protected fish species of China, Acipenser dabryanus Duméril, 1869, and seven national second-class protected species of China: Coreius guichenoti Sauvage et Dabry, 1874, Onychostoma macrolepis Bleeker, 1871, Procypris rabaudi Tchang, 1930, Percocypris pingi Tchang, 1930, Schizothorax davidi Sauvage, 1880, Leptobotia elongata Bleeker, 1870, and Myxocyprinus asiaticus Bleeker, 1864. Sequencing was successful for 119 species for the 12S rRNA (170 sequences), 117 species for the 16S rRNA (160 sequences), and 119 species for the COI gene (154 sequences). In parallel, we downloaded, from public databases, additional sequences for 42 (12S rRNA), 44 (16S rRNA), and 42 (COI) fish species. In total, the combined local and public databases covered 161 fish species, belonging to 12 orders and 26 families. Cypriniformes was the most dominant order, accounting for 115 species, followed by Siluriformes (21 species) and Perciformes (8 species). Other orders included Gobiiformes (six species); Acipenseriformes, Salmoniformes, and Cichliformes (two species each); and Cyprinodontiformes, Beloniformes, Synbranchiformes, Clupeiformes, and Characiformes (one species each) (Table S1). Notably, Culter oxycephalus Bleeker, 1871 had no reference sequences in public databases, and the sequences obtained in this study represent the first reported reference sequences for this species.

3.2. Comparison of Species Coverage

A total of 169 fish species have been documented in the TGR, based on a review of the historical literature and field sampling in this study (Table S1). Among the species collected in the TGR, sequences were successfully obtained for 119, 117, and 119 species for the 12S rRNA, 16S rRNA, and COI genes, respectively, corresponding to species coverage rates of 70.41% (119/169) for 12S rRNA, 69.23% (117/169) for 16S rRNA, and 70.41% (119/169) for COI. Public databases contained reference sequences for 160 of the 169 documented species, resulting in a higher overall coverage rate of 94.67% (160/169) compared to our self-constructed local database alone (Table 1). Eight species lacked reference sequences in both the public and our local databases: Megalobrama elongata Huang & Zhang, 1986, Onychostoma angustistomata Fang, 1940, Gobiobotia abbreviata Fang & Wang, 1931, Leiocassis tenuifurcatus Nichols, 1931, Pseudobagrus adiposalis Oshima, 1919, Paraprotomyzon lungkowensis Xie, Yang & Gong, 1984, Paracobitis wujiangensis Ding & Deng, 1990, and Leptobotia tientainensis Wu, 1930 (Table S1).

3.3. Comparison of Sequence Similarity

Comparisons of sequence similarity between local sequences and public database entries revealed marker-specific differences. For 12S rRNA, 93.28% (111/119) of species showed high similarity (≥98%) to public sequences. For 16S rRNA, 89.74% (105/117) of species showed high similarity. In contrast, only 43.70% (52/119) of species had high-similarity COI sequences in public databases. Accordingly, 6.72% (8/119) of species for 12S rRNA, 10.26% (12/117) for 16S rRNA, and a substantial 56.30% (67/119) for COI lacked high-similarity matches in public databases and could not be reliably identified to the species level using public resources alone (Table S1, Figure 2).

3.4. Comparison of Annotation Accuracy

To assess annotation accuracy, we used sequences from fish species that were present in both our local database and public databases as queries. These sequences were annotated against both databases, using sequence similarity and p-value as primary criteria, with cross-referencing against information from FishBase, Fishes of Sichuan, and Colored Atlas of Fishes of Sichuan. Given that the query sequences were also constituents of the local reference set, the accuracy observed for the local database represents its theoretical upper bound rather than an independent predictive performance. When evaluated under identical controlled conditions, the local database yielded substantially higher annotation accuracy than the public databases. For the 12S rRNA, the accuracy rate was 92.37% (109/118) with the local database, compared to 77.97% (92/118) with public databases. For the 16S rRNA, the accuracy was 93.10% (108/116) locally versus 75.00% (87/116) publicly. For the COI gene, the local database achieved 100% accuracy (118/118), whereas public databases achieved only 38.14% (45/118) (Table 2).
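The accuracy rates above follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes each percentage and the gap between the two databases in percentage points. The counts are those given in the text; the variable and function names are illustrative.

```python
# (correctly annotated, total queries) per marker, as reported in the text.
COUNTS = {
    "12S rRNA": {"local": (109, 118), "public": (92, 118)},
    "16S rRNA": {"local": (108, 116), "public": (87, 116)},
    "COI": {"local": (118, 118), "public": (45, 118)},
}


def pct(correct, total):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)


accuracy = {
    marker: {db: pct(*counts) for db, counts in dbs.items()}
    for marker, dbs in COUNTS.items()
}

# Difference between local and public accuracy, in percentage points.
gap = {
    marker: round(acc["local"] - acc["public"], 2)
    for marker, acc in accuracy.items()
}
```

The gap is largest for COI, which is consistent with the low proportion of species that had high-similarity COI matches in public databases.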

3.5. Phylogenetic and Interspecific Divergence Threshold Analysis

Based on the Kimura 2-parameter (K2P) model, we constructed neighbor-joining (NJ) phylogenetic trees for all sequences (from both our sequencing and public databases) of the 12S rRNA (Figure 3), 16S rRNA (Figure 4), and COI genes (Figure 5). The complete, high-resolution NJ trees with all individual tip labels are provided in Figures S1–S3 in the Supplementary Information. The overall mean genetic distances were 0.17702 for 12S rRNA, 0.23638 for 16S rRNA, and 0.19853 for COI. These markers showed insufficient discrimination power for some closely related species, with 15, 0, and 4 fish species exhibiting an interspecific genetic distance of zero for 12S rRNA, 16S rRNA, and COI, respectively (Table 3, Tables S2–S4). We determined the optimal interspecific divergence thresholds based on K2P genetic distances. The results showed optimal thresholds of 0.00448 for 12S rRNA, 0.00531 for 16S rRNA, and 0.00734 for COI (Tables S2–S4). At these thresholds, the corresponding species discrimination accuracy rates were 87.58% (141/161) for 12S rRNA, 90.06% (145/161) for 16S rRNA, and 95.03% (153/161) for COI.
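The threshold optimization can be sketched as a grid search over candidate values from 0 to 0.02 in steps of 0.00005, as described in the Methods. The definition of a misidentification used here is an assumption, since the text does not spell it out: a pair of sequences is predicted conspecific when its distance is at or below the threshold, and an error is counted whenever that prediction disagrees with the known species labels. The function name is hypothetical.

```python
def scan_thresholds(pairs, start=0.0, stop=0.02, step=0.00005):
    """Return (threshold, error_rate) for the candidate with the fewest errors.

    pairs: iterable of (distance, same_species) tuples, where same_species
    is True for conspecific pairs and False for heterospecific pairs.
    Ties are resolved in favour of the smallest threshold.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    n_steps = round((stop - start) / step)
    best_threshold, best_rate = None, float("inf")
    for i in range(n_steps + 1):
        t = start + i * step
        # Predicted conspecific if distance <= t; count disagreements.
        errors = sum(1 for d, same in pairs if (d <= t) != same)
        rate = errors / len(pairs)
        if rate < best_rate:
            best_threshold, best_rate = t, rate
    return best_threshold, best_rate


# Hypothetical toy distances, not data from this study.
toy_pairs = [
    (0.0, True), (0.0021, True), (0.0041, True),
    (0.0090, False), (0.0500, False),
]
threshold, error_rate = scan_thresholds(toy_pairs)
```

With real data, species pairs at zero interspecific distance cannot be separated by any threshold, so the minimum error rate stays above zero for the affected marker.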

4. Discussion

The development of curated, regionally representative DNA barcode libraries remains a critical bottleneck for reliable species detection using environmental DNA (eDNA) metabarcoding in freshwater ecosystems [19,20]. In the present study, we constructed a regional multi-marker eDNA reference database for the fish fauna of the TGR, based on morphologically validated voucher specimens. By comparing the local database with public repositories, we quantified disparities in species coverage, sequence similarity, and annotation accuracy, while also determining optimized interspecific genetic divergence thresholds for three widely used metabarcoding markers. Our findings address long-standing limitations of public databases in freshwater faunas [19] and provide preliminary evidence that a locally curated database outperforms general public databases for accurate species assignment under controlled conditions.

4.1. The Necessity of a Region-Specific, Multi-Marker Reference Database

In this study, public databases achieved nominally higher species coverage than our local database but exhibited deficits in taxonomic accuracy and regional representativeness. The high similarity (≥98%) of our local 12S rRNA and 16S rRNA sequences to those in public databases indicated that these markers are well conserved and useful for taxonomic assignment in this region. This aligns with findings from large-scale assessments like those of the European marine fish assemblage, where 12S rRNA was identified as the preferred region for fish-targeting studies due to its specificity [8]. However, a stark contrast was observed for the COI gene, where over 56% of our locally generated sequences lacked high-similarity matches in public databases. This issue might not be merely one of missing data, but one of actively misleading data. Studies on freshwater vertebrates indicated that COI coverage in NCBI/BOLD was less than 20% for most species groups in Sri Lanka, which severely limits the application of eDNA [21]. In addition, public databases are widely recognized to contain high rates of misidentified sequences, missing voucher information, and geographically mismatched individuals [20,22,23,24], all of which undermine taxonomic assignment in eDNA surveys. In this study, public sequences yielded high-similarity matches (≥98%) for only 43.70% of species for COI, which might point to a systemic problem of misidentification or poor sequence quality within public databases for this marker. Furthermore, public databases correctly annotated only 77.97%, 75.00%, and 38.14% of the shared species for the 12S rRNA, 16S rRNA, and COI genes, respectively, indicating that a substantial number of entries are likely mislabeled in public databases. Our results suggest that custom regional libraries are essential for eDNA metabarcoding in diverse freshwater ecosystems, as emphasized across marine and freshwater fish studies [2,19,25].
We note that since the query sequences were generated in this study and incorporated into the local database, the local database accuracy values represent an idealized upper bound of performance, and real-world in-field accuracy will likely be lower than these values. This limitation should be considered when interpreting the apparent superiority of the local database. Nevertheless, the comparison with public databases remains valid because public databases also contain sequences for these shared species; their lower annotation accuracy reflects mislabeling and quality issues rather than an absence of reference data. Even when accounting for the upper-bound nature of our local database accuracy estimates, the magnitude of the performance gap between the two databases suggests that public repositories are insufficient for reliable eDNA monitoring in the TGR, and a locally curated library is still critically needed. These results reinforce the conclusion that eDNA surveys cannot rely on public databases alone and that locally curated, voucher-validated references represent a necessary complement for achieving more reliable species-level taxonomic assignment [20,26].

4.2. Differential Performance and Complementarity of Molecular Markers

Our comparative evaluation of 12S rRNA, 16S rRNA, and COI revealed clear differences in species discrimination power, with COI exhibiting the highest resolution (95.03%), followed by 16S rRNA (90.06%) and 12S rRNA (87.58%). These results align with broad-scale DNA barcoding studies demonstrating that protein-coding genes such as COI generally outperform ribosomal RNA genes due to higher nucleotide substitution rates [22,23,27]. Notably, this observation is concordant with the finding that COI can resolve most closely related taxa, whereas 16S rRNA often fails to discriminate recently diverged or closely related congeners [28]. Similarly, a previous investigation of six cyprinid species in India further supported the superior discriminatory performance of COI relative to 16S rRNA, reinforcing that protein-coding genes provide more robust resolution for taxonomically challenging fish groups [29]. Although 12S rRNA remains popular for eDNA metabarcoding due to primer universality and short amplicon length [15], our results suggest that no single genetic marker is universally optimal for fish eDNA metabarcoding, and that a multi-marker strategy might provide more robust taxonomic coverage.

4.3. Determining and Applying Interspecific Genetic Thresholds

The determination of marker-specific optimal genetic thresholds is a cornerstone for objective and automated species taxonomic assignment in eDNA metabarcoding. Our K2P-based analyses yielded optimal thresholds of 0.00448 for 12S rRNA, 0.00531 for 16S rRNA, and 0.00734 for COI. These values are considerably lower than the often-cited 2–3% threshold in some animal groups [27]. Accumulating evidence demonstrates that eDNA-based investigations should adopt tailored thresholds for specific molecular markers and amplicons, rather than indiscriminately applying the universal 2% COI threshold [22,23,30,31]. Notably, for short ribosomal fragments such as 12S rRNA and 16S rRNA, the optimal K2P divergence thresholds typically range from 0.4% to 0.8% [32,33]. The short and highly conserved nature of 12S/16S rRNA amplicons leads to low evolutionary rates and consequently compressed interspecific genetic divergence; rigid application of the 2% threshold would therefore significantly increase the rate of false judgments [32,33,34,35]. In the present study, all amplicons targeting 12S rRNA, 16S rRNA, and COI were shorter than 400 bp, resulting in relatively low optimal threshold values. This observation supports recent critiques that fixed universal thresholds frequently overestimate or underestimate species diversity and should be replaced by clade-specific optimized thresholds.
The presence of species pairs with zero interspecific genetic distance across these markers indicated that certain closely related or recently diverged taxa cannot be resolved by these markers alone, a phenomenon often attributed to incomplete lineage sorting or introgression. This challenge is widespread in species-rich freshwater fish assemblages and reflects recent speciation, introgressive hybridization, or conserved barcode regions [23,36]. These results also highlight a fundamental limitation of single-locus approaches and reinforce the necessity of integrating multiple markers and, where possible, nuclear genomic data to resolve complex species boundaries [37]. The relatively high discrimination accuracy achieved at our calculated thresholds suggests their practical utility for distinguishing the vast majority of fish species in the TGR, thereby providing a provisional but empirically grounded foundation for future eDNA-based monitoring programs.
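Species pairs that a marker cannot separate can be flagged by scanning all pairwise comparisons for heterospecific sequences at zero distance. The sketch below is a minimal illustration with hypothetical names. It uses a simple proportion of mismatched sites as the distance, which is an assumption made to keep the example self-contained; a pair has zero K2P distance exactly when it has zero mismatches, so the flagged pairs are the same.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of mismatched sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq1.upper(), seq2.upper()) if a != b)
    return mismatches / len(seq1)


def zero_distance_species_pairs(records, distance=p_distance):
    """Return sorted species pairs sharing at least one identical sequence.

    records: list of (species_name, aligned_sequence) tuples.
    """
    pairs = set()
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        if sp1 != sp2 and distance(s1, s2) == 0:
            pairs.add(tuple(sorted((sp1, sp2))))
    return sorted(pairs)


# Hypothetical toy records: Species A and Species B share a haplotype.
toy_records = [
    ("Species A", "ACGTACGT"),
    ("Species B", "ACGTACGT"),
    ("Species B", "ACGTACGA"),
    ("Species C", "TCGTACGA"),
]
unresolved = zero_distance_species_pairs(toy_records)
```

Running the same scan on each marker separately would show which pairs are unresolved by one marker but separated by another, which is the practical argument for a multi-marker strategy.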

4.4. Implications for Conservation and Management

The TGR represents one of China’s most important freshwater ecosystems, harboring numerous endemic and threatened fish species while simultaneously facing intense anthropogenic pressures. The integrated reference database developed in this study could support the conservation and management goals outlined in the Yangtze River Ten-Year Fishing Ban and aquatic ecological monitoring programs. By providing high-quality reference sequences for 120 fish species, including first-reported sequences for Culter oxycephalus and sequences representing seven nationally protected species in China, our database will enable more accurate eDNA-based monitoring of fish diversity in the reservoir. This is particularly critical for rare and endangered species, which are often detected at low frequencies and require high-confidence assignments to inform conservation actions. Furthermore, our threshold analysis may provide quantitative criteria for converting eDNA sequencing reads into species presence records, facilitating standardized comparisons across monitoring campaigns and locations. The establishment of marker-specific thresholds allows for more objective biodiversity assessments, reducing the subjectivity inherent in arbitrary similarity cutoffs.

4.5. Study Limitations and Future Directions

Although the present study provides a critical foundational resource for regional DNA barcoding, several limitations warrant acknowledgment and outline clear directions for future work. First, among the 173 specimens collected, 40 species were represented by a single individual, restricting our ability to capture full intraspecific genetic variation within the TGR. Insufficient haplotype diversity could lead to false negatives if environmental sequences from unsampled variants fail to match the reference with high similarity. This limited sampling also constrains our ability to robustly characterize intraspecific genetic divergence, which may lead to underestimation of the optimal species delimitation thresholds. These singly represented taxa are largely rare or difficult-to-capture species, for which obtaining multiple vouchers is inherently challenging. Nevertheless, we are continuing targeted field sampling to expand haplotype coverage, and will periodically update the database with additional sequences as new specimens become available. Second, the current validation employed local Sanger sequences, rather than actual eDNA samples. This approach may not fully reflect the in-field performance of the database under real monitoring conditions. We therefore recognize that the true effectiveness of the local reference database can only be verified through blind testing with real eDNA samples. A dedicated pilot field study using water samples from the TGR is planned to empirically evaluate species detection rates and taxonomic resolution gained through use of the local database. Finally, because the current accuracy evaluation employed self-generated reference sequences as queries, the results should be interpreted as indicative rather than definitive. The circularity between query and reference datasets inherently inflates the apparent accuracy of the local database and must be regarded as a methodological limitation. 
Future validation using fully independent eDNA data will be necessary to measure real-world annotation performance.

5. Conclusions

The construction of a region-specific, multi-marker eDNA reference database represents an important step toward improving biodiversity monitoring accuracy. In this study, we established a multi-marker eDNA reference database for fishes in the TGR and provided preliminary evidence that locally curated libraries outperform public databases in taxonomic annotation accuracy under controlled conditions. However, given the circularity of the current validation design, these local accuracy rates must be interpreted as theoretical maximums, highlighting the need for future validation using independent environmental samples. The COI gene provided the highest species resolution when supported by local sequences, whereas 12S rRNA and 16S rRNA remained valuable for broad applicability and primer universality. The optimized genetic thresholds reported here could improve the reliability of species delimitation in this ecologically unique river–reservoir system. By highlighting the critical need for regional barcode libraries, this work supports biodiversity monitoring, conservation assessment, and enforcement of the Yangtze River Ten-Year Fishing Ban.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/fishes11050264/s1. Figure S1. Full expanded neighbor-joining tree of 12S rRNA sequences. Figure S2. Full expanded neighbor-joining tree of 16S rRNA sequences. Figure S3. Full expanded neighbor-joining tree of COI sequences. Table S1. List of fish species involved in the database. Table S2. Pairwise distance of genetic divergences (K2P) within various sequences for 12S rRNA. Table S3. Pairwise distance of genetic divergences (K2P) within various sequences for 16S rRNA. Table S4. Pairwise distance of genetic divergences (K2P) within various sequences for COI gene.

Author Contributions

Conceptualization, L.X. and Y.L.; methodology, L.X. and Y.P.; data curation, L.X., Z.S. and H.D.; writing—original draft preparation, L.X.; writing—review and editing, H.D., D.W. and H.T.; visualization, L.X.; supervision, Y.L. and Z.S.; funding acquisition, L.X., Y.L. and X.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2023YFC3205903), Hubei Provincial Natural Science Foundation of China (No. 2023AFB558), Central Public-interest Scientific Institution Basal Research Fund, CAFS (No. 2023TD09), and Central Public-interest Scientific Institution Basal Research Fund (No. YFI202414).

Institutional Review Board Statement

The animal study protocol was approved by the Animal Experimental Ethical Inspection of Laboratory Animal Centre, Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences (protocol code: 2022TFI-XL-01; approval date: 4 March 2022).

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article and available from the corresponding authors upon reasonable request.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TGR     Three Gorges Reservoir
eDNA    Environmental DNA
NJ      Neighbor joining
K2P     Kimura 2-parameter

References

  1. Beng, K.C.; Corlett, R.T. Applications of environmental DNA (eDNA) in ecology and conservation: Opportunities, challenges and prospects. Biodivers. Conserv. 2020, 29, 2089–2121. [Google Scholar] [CrossRef]
  2. Valentini, A.; Taberlet, P.; Miaud, C.; Civade, R.; Herder, J.; Thomsen, P.F.; Bellemain, E.; Besnard, A.; Coissac, E.; Boyer, F.; et al. Next-generation monitoring of aquatic biodiversity using environmental DNA metabarcoding. Mol. Ecol. 2016, 25, 929–942. [Google Scholar] [CrossRef]
  3. Liu, S.; Chen, J.; Cui, G.; Zhang, B.; Yan, B.; Nie, Q. Environmental DNA metabarcoding: Current applications and future prospects for freshwater fish monitoring. J. Environ. Manag. 2025, 376, 124531. [Google Scholar] [CrossRef]
  4. Belle, C.C.; Stoeckle, B.C.; Geist, J. Taxonomic and geographical representation of freshwater environmental DNA research in aquatic conservation. Aquat. Conserv. Mar. Freshw. Ecosyst. 2019, 29, 1996–2009. [Google Scholar] [CrossRef]
  5. Thomsen, P.F.; Willerslev, E. Environmental DNA—An emerging tool in conservation for monitoring past and present biodiversity. Biol. Conserv. 2015, 183, 4–18. [Google Scholar] [CrossRef]
  6. Keck, F.; Couton, M.; Altermatt, F. Navigating the seven challenges of taxonomic reference databases in metabarcoding analyses. Mol. Ecol. Resour. 2023, 23, 742–755. [Google Scholar] [CrossRef] [PubMed]
  7. Jerde, C.L.; Mahon, A.R.; Campbell, T.; McElroy, M.E.; Pin, K.; Childress, J.N.; Armstrong, M.N.; Zehnpfennig, J.R.; Kelson, S.J.; Koning, A.A.; et al. Are Genetic Reference Libraries Sufficient for Environmental DNA Metabarcoding of Mekong River Basin Fish? Water 2021, 13, 1767. [Google Scholar] [CrossRef]
  8. Claver, C.; Canals, O.; de Amézaga, L.G.; Mendibil, I.; Rodriguez-Ezpeleta, N. An automated workflow to assess completeness and curate GenBank for environmental DNA metabarcoding: The marine fish assemblage as case study. Environ. DNA 2023, 5, 634–647. [Google Scholar] [CrossRef]
  9. Schenekar, T.; Schletterer, M.; Lecaudey, L.A.; Weiss, S.J. Reference databases, primer choice, and assay sensitivity for environmental metabarcoding: Lessons learnt from a re-evaluation of an eDNA fish assessment in the Volga headwaters. River Res. Appl. 2020, 36, 1004–1013. [Google Scholar] [CrossRef]
  10. Roesma, D.I.; Tjong, D.H.; Syaifullah, S.; Nofrita, N.; Janra, M.N.; Prawira, F.D.L.; Salis, V.M.; Aidil, D.R. The importance of DNA barcode reference libraries and selection primer pair in monitoring fish diversity using environmental DNA metabarcoding. Biodiversitas J. Biol. Divers. 2023, 24, 2251–2260. [Google Scholar] [CrossRef]
  11. Zhang, S.; Zhao, J.; Yao, M. A comprehensive and comparative evaluation of primers for metabarcoding eDNA from fish. Methods Ecol. Evol. 2020, 11, 1609–1625. [Google Scholar] [CrossRef]
  12. Chen, Z.; Cai, X.; Zhang, Q.; Li, G.; Ma, C.; Shen, Z. Preliminary construction and comparative analysis of environmental DNA metabarcoding reference database of freshwater fishes in Hainan Island. South China Fish. Sci. 2022, 18, 1–12. [Google Scholar]
  13. Zhang, H.; Kang, M.; Shen, L.; Wu, J.; Li, J.; Du, H.; Wang, C.; Yang, H.; Zhou, Q.; Liu, Z.; et al. Rapid change in Yangtze fisheries and its implications for global freshwater ecosystem management. Fish Fish. 2020, 21, 601–620. [Google Scholar] [CrossRef]
  14. Gao, X.C.; Jiang, W. The Construction and Application of BLAST Database of DNA Barcode forCommon Fish in the Three Gorges Reservoir. Genom. Appl. Biol. 2021, 40, 1952–1960. [Google Scholar]
  15. Miya, M.; Sato, Y.; Fukunaga, T.; Sado, T.; Poulsen, J.Y.; Sato, K.; Minamoto, T.; Yamamoto, S.; Yamanaka, H.; Araki, H.; et al. MiFish, a set of universal PCR primers for metabarcoding environmental DNA from fishes: Detection of more than 230 subtropical marine species. R. Soc. Open Sci. 2015, 2, 150088. [Google Scholar] [CrossRef]
  16. Evans, N.T.; Olds, B.P.; Renshaw, M.A.; Turner, C.R.; Li, Y.; Jerde, C.L.; Mahon, A.R.; Pfrender, M.E.; Lamberti, G.A.; Lodge, D.M. Quantification of mesocosm fish and amphibian species diversity via environmental DNA metabarcoding. Mol. Ecol. Resour. 2016, 16, 29–41. [Google Scholar] [CrossRef] [PubMed]
  17. Wang, S.; Yan, Z.; Hänfling, B.; Zheng, X.; Wang, P.; Fan, J.; Li, J. Methodology of fish eDNA and its applications in ecology and environment. Sci. Total Environ. 2021, 755, 142622. [Google Scholar] [CrossRef]
  18. Saitou, N.; Nei, M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 1987, 4, 406–425. [Google Scholar] [CrossRef] [PubMed]
  19. Marques, V.; Milhau, T.; Albouy, C.; Dejean, T.; Manel, S.; Mouillot, D.; Juhel, J. GAPeDNA: Assessing and mapping global species gaps in genetic databases for eDNA metabarcoding. Divers. Distrib. 2021, 27, 1880–1892. [Google Scholar] [CrossRef]
  20. Weigand, H.; Beermann, A.J.; Čiampor, F.; Costa, F.O.; Csabai, Z.; Duarte, S.; Geigerg, M.F.; Grabowski, M.; Rimet, F.; Rulik, B.; et al. DNA barcode reference libraries for the monitoring of aquatic biota in Europe: Gap-analysis and recommendations for future work. Sci. Total Environ. 2019, 678, 499–524. [Google Scholar] [CrossRef] [PubMed]
  21. Ranaweera, J.B.S.; Fernando, T.S.P.; Weerakoon, D.K. eDNA Metabarcoding: Gaps of Publicly Available Reference Databases of Freshwater Vertebrates in Sri Lanka. Proc. Int. For. Environ. Symp. 2024, 27. [Google Scholar] [CrossRef]
  22. Zhang, H.; Bu, W. Exploring Large-Scale Patterns of Genetic Variation in the COI Gene among Insecta: Implications for DNA Barcoding and Threshold-Based Species Delimitation Studies. Insects 2022, 13, 425. [Google Scholar] [CrossRef] [PubMed]
  23. Koroiva, R.; Santana, D.J. Evaluation of partial 12S rRNA, 16S rRNA, COI and Cytb gene sequence datasets for potential single DNA barcode for hylids (Anura: Hylidae). An. Acad. Bras. Ciências 2022, 94, e20200825. [Google Scholar] [CrossRef] [PubMed]
  24. Fort, A.; McHale, M.; Cascella, K.; Potin, P.; Perrineau, M.; Kerrison, P.D.; da Costa, E.; Calado, R.; Domingues, M.D.R.; Azevedo, I.C.; et al. Exhaustive reanalysis of barcode sequences from public repositories highlights ongoing misidentifications and impacts taxa diversity and distribution. Mol. Ecol. Resour. 2021, 22, 86–101. [Google Scholar] [CrossRef]
  25. Stat, M.; John, J.; DiBattista, J.D.; Newman, S.J.; Bunce, M.; Harvey, E.S. Combined use of eDNA metabarcoding and video surveillance for the assessment of fish biodiversity. Biol. Conserv. 2019, 33, 196–205. [Google Scholar] [CrossRef]
  26. Jerde, C.L.; Wilson, E.A.; Dressler, T.L. Measuring global fish species richness with eDNA metabarcoding. Mol. Ecol. Resour. 2019, 19, 19–22. [Google Scholar] [CrossRef]
  27. Hebert, P.D.N.; Cywinska, A.; Ball, S.L.; Dewaard, J.R. Biological identifications through DNA barcodes. Proc. R. Soc. B Biol. Sci. 2003, 270, 313–321. [Google Scholar] [CrossRef]
  28. Kochzius, M.; Seidel, C.; Antoniou, A.; Botla, S.K.; Campo, D.; Cariani, A.; Vazquez, E.G.; Hauschild, J.; Hervet, C.; Hjörleifsdottir, S.; et al. Identifying Fishes through DNA Barcodes and Microarrays. PLoS ONE 2010, 5, e12620. [Google Scholar] [CrossRef]
  29. Mohanty, M.; Jayasankar, P.; Sahoo, L.; Das, P. A comparative study of COI and 16 S rRNA genes for DNA barcoding of cultivable carps in India. Mitochondrial DNA 2015, 26, 79–87. [Google Scholar] [CrossRef]
  30. Ma, Z.; Ren, J.; Zhang, R. Identifying the Genetic Distance Threshold for Entiminae (Coleoptera: Curculionidae) Species Delimitation via COI Barcodes. Insects 2022, 13, 261. [Google Scholar] [CrossRef]
  31. Magoga, G.; Fontaneto, D.; Montagna, M. Factors affecting the efficiency of molecular species delimitation in a species-rich insect family. Mol. Ecol. Resour. 2021, 21, 1475–1489. [Google Scholar] [CrossRef]
  32. Jackman, J.M.; Benvenuto, C.; Coscia, I.; Carvalho, C.O.; Ready, J.S.; Boubli, J.P.; Magnusson, W.E.; McDevitt, A.D.; Sales, N.G. eDNA in a bottleneck: Obstacles to fish metabarcoding studies in megadiverse freshwater systems. Environ. DNA 2021, 3, 837–849. [Google Scholar] [CrossRef]
  33. Milan, D.T.; Mendes, I.S.; Damasceno, J.S.; Teixeira, D.F.; Sales, N.G.; Carvalho, D.C. New 12S metabarcoding primers for enhanced Neotropical freshwater fish biodiversity assessment. Sci. Rep. 2020, 10, 17966. [Google Scholar] [CrossRef]
  34. Fontes, J.T.; Katoh, K.; Pires, R.; Soares, P.; Costa, F.O. Benchmarking the discrimination power of commonly used markers and amplicons in marine fish (e)DNA (meta)barcoding. Metabarcoding Metagenomics 2024, 8, e128646-320. [Google Scholar] [CrossRef]
  35. Shu, L.; Ludwig, A.; Peng, Z. Environmental DNA metabarcoding primers for freshwater fish detection and quantification: In silico and in tanks. Ecol. Evol. 2021, 11, 8281–8294. [Google Scholar] [CrossRef] [PubMed]
  36. Collins, R.A.; Bakker, J.; Wangensteen, O.S.; Soto, A.Z.; Corrigan, L.; Sims, D.W.; Genner, M.J.; Mariani, S. Non-specific amplification compromises environmental DNA metabarcoding with COI. Methods Ecol. Evol. 2019, 10, 1985–2001. [Google Scholar] [CrossRef]
  37. Doorenweerd, C.; Jose, M.S.; Leblanc, L.; Barr, N.; Geib, S.M.; Chung, A.Y.C.; Dupuis, J.R.; Ekayanti, A.; Fiegalan, E.; Hemachandra, K.S.; et al. Towards a better future for DNA barcoding: Evaluating monophyly- and distance-based species identification using COI gene fragments of Dacini fruit flies. Mol. Ecol. Resour. 2024, 24, e13987. [Google Scholar] [CrossRef]
Figure 1. Sampling site locations.
Figure 2. Comparison of sequence similarity between local and public databases.
Figure 3. Simplified neighbor-joining tree for the 12S rRNA. Major orders are collapsed into triangles to show taxonomic coverage.
Figure 4. Simplified neighbor-joining tree for the 16S rRNA. Major orders are collapsed into triangles to show taxonomic coverage.
Figure 5. Simplified neighbor-joining tree for COI. Major orders are collapsed into triangles to show taxonomic coverage.
Table 1. Comparison of species numbers between public databases and the custom local database for the Three Gorges Reservoir.
Gene        Local-Only   Shared   Public-Only   Missing   Total
12S rRNA    1            118      42            8         169
16S rRNA    1            116      44            8         169
COI         1            118      42            8         169
Local-only: local endemic species; Shared: shared species; Public-only: public database-specific species; Missing: species without reference sequences.
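As a quick arithmetic check on Table 1, the sketch below recomputes the nominal public-database coverage and the number of species with at least one reference sequence. It assumes each row reads as local-only, shared, public-only, and missing counts that sum to the total of 169; the dictionary layout and the function name are illustrative only.

```python
# Counts read from Table 1, in the order:
# (local-only, shared, public-only, missing).
TABLE_1 = {
    "12S rRNA": (1, 118, 42, 8),
    "16S rRNA": (1, 116, 44, 8),
    "COI": (1, 118, 42, 8),
}


def summarize(local_only: int, shared: int, public_only: int, missing: int):
    """Return (total species, public coverage in %, species with any reference)."""
    total = local_only + shared + public_only + missing
    public_coverage = round(100 * (shared + public_only) / total, 2)
    with_reference = total - missing
    return total, public_coverage, with_reference


for gene, counts in TABLE_1.items():
    total, public_pct, with_ref = summarize(*counts)
    print(f"{gene}: total={total}, public coverage={public_pct:.2f}%, "
          f"species with a reference={with_ref}")
```

Under this reading, every marker gives a public coverage of 94.67% and 161 species with at least one reference sequence, which agrees with the figures quoted in the Abstract.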
Table 2. Comparison of annotation accuracy for shared species between the local and public databases.
Reference Database   Target Gene   Number of Shared Species   Low-Similarity Species   Uncertain or Misannotated Species   Correctly Annotated Species   Annotation Accuracy (%)
Public Database      12S rRNA      118                        8                        18                                  92                            77.97
Public Database      16S rRNA      116                        12                       17                                  87                            75.00
Public Database      COI           118                        66                       7                                   45                            38.14
Local Database       12S rRNA      118                        0                        9                                   109                           92.37
Local Database       16S rRNA      116                        0                        8                                   108                           93.10
Local Database       COI           118                        0                        0                                   118                           100.00
Uncertain species were defined as cases where the query sequence showed identical matches (sequence similarity = 100%) to multiple candidate species, all of which had historical distribution records in the study area, making it difficult to confidently rule out erroneous species assignments.
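The accuracy column of Table 2 can be reproduced from the category counts, assuming accuracy is defined as correctly annotated species divided by shared species and that the three outcome categories partition the shared species. The sketch below is a consistency check under those assumptions; the `Row` type and field names are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    database: str
    gene: str
    shared: int
    low_similarity: int
    uncertain_or_misannotated: int
    correct: int


# Counts read from Table 2.
TABLE_2 = [
    Row("Public", "12S rRNA", 118, 8, 18, 92),
    Row("Public", "16S rRNA", 116, 12, 17, 87),
    Row("Public", "COI", 118, 66, 7, 45),
    Row("Local", "12S rRNA", 118, 0, 9, 109),
    Row("Local", "16S rRNA", 116, 0, 8, 108),
    Row("Local", "COI", 118, 0, 0, 118),
]


def accuracy_percent(row: Row) -> float:
    """Correctly annotated species as a percentage of shared species."""
    parts = row.low_similarity + row.uncertain_or_misannotated + row.correct
    if parts != row.shared:
        raise ValueError(f"categories do not sum to shared species for {row}")
    return round(100 * row.correct / row.shared, 2)


for row in TABLE_2:
    print(f"{row.database:6s} {row.gene:9s} {accuracy_percent(row):6.2f}%")
```

All six rows pass the partition check, and the recomputed percentages match the values reported in the table.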
Table 3. Summary of species pairs showing zero interspecific distance.
Marker      Species 1                       Species 2                       Pairwise Distance
12S rRNA    Culter oxycephalus              Ancherythroculter nigrocauda    0
12S rRNA    Culter dabryi                   Ancherythroculter nigrocauda    0
12S rRNA    Culter dabryi                   Culter oxycephalus              0
12S rRNA    Pseudobagrus pratti             Pseudobagrus truncatus          0
12S rRNA    Jinshaia abbreviata             Jinshaia sinensis               0
12S rRNA    Glyptothorax fokiensis          Glyptothorax sinensis           0
12S rRNA    Discogobio brachyphysallidos    Discogobio yunnanensis          0
12S rRNA    Xenocypris fangi                Xenocypris argentea             0
12S rRNA    Gnathopogon herzensteini        Gnathopogon imberbis            0
COI         Xenocypris fangi                Xenocypris argentea             0
COI         Gnathopogon herzensteini        Gnathopogon imberbis            0
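A zero pairwise distance means that the two aligned sequences are identical over the compared region, so the marker cannot separate the pair. The sketch below shows the standard Kimura 2-parameter formula, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is a minimal illustration on hypothetical sequences, not the software used in this study; the function name and the pairwise handling of gaps and ambiguous bases are assumptions.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two pre-aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (pairwise deletion, an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in BASES and y in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    transitions = 0    # purine<->purine or pyrimidine<->pyrimidine changes
    transversions = 0  # purine<->pyrimidine changes
    for x, y in pairs:
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    if transitions == 0 and transversions == 0:
        return 0.0  # identical sequences: the case listed in Table 3
    p = transitions / len(pairs)
    q = transversions / len(pairs)
    # math.log raises ValueError if the sequences are too divergent
    # for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 200 bp fragments: identical, and differing by one A->G transition.
reference = "ACGT" * 50
identical = "ACGT" * 50
one_transition = "GCGT" + "ACGT" * 49

print(k2p_distance(reference, identical))
print(k2p_distance(reference, one_transition))
```

Because the distance is a function of observed differences only, any pair of species sharing an identical haplotype for a marker will return zero regardless of the threshold chosen, which is why such pairs require an additional marker to resolve.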
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xie, L.; Pu, Y.; Deng, H.; Tian, H.; Wang, D.; Duan, X.; Shen, Z.; Li, Y. Building a Local Multi-Marker eDNA Reference Database Reveals the Limitations of Public Repositories for Freshwater Fish Monitoring in the Three Gorges Reservoir. Fishes 2026, 11, 264. https://doi.org/10.3390/fishes11050264

