Article

Comparison of Bioinformatic Pipelines for eDNA Metabarcoding Data Analysis of Fish Populations †

by Romulo A. dos Santos 1,2 and Petr Blabolil 1,2,*
1 Faculty of Science, University of South Bohemia in České Budějovice, Branišovská 1760, 370 05 České Budějovice, Czech Republic
2 Institute of Hydrobiology, Biology Centre of the Czech Academy of Sciences, Na Sádkách 7, 370 05 České Budějovice, Czech Republic
* Author to whom correspondence should be addressed.
† This work is part of the bachelor's thesis of the first author, Romulo A. dos Santos (bachelor's degree in applied informatics, University of South Bohemia, Faculty of Science, České Budějovice, Czech Republic).
Fishes 2025, 10(5), 214; https://doi.org/10.3390/fishes10050214
Submission received: 25 January 2025 / Revised: 16 April 2025 / Accepted: 30 April 2025 / Published: 6 May 2025
(This article belongs to the Special Issue Fish DNA Barcoding)

Abstract

Environmental DNA (eDNA) metabarcoding has gained popularity as a biomonitoring tool, leading to the emergence of various bioinformatic pipelines. However, comparisons are essential to assessing the reliability and similarity of results. In this study, we compared five bioinformatic pipelines (Anacapa, Barque, metaBEAT, MiFish, and SEQme) using samples collected from three reservoirs in the Czech Republic during the autumn and summer seasons. Negative and positive controls were used to monitor potential contamination during sample processing. eDNA was amplified, targeting the 12S fish rRNA gene, sequenced, and analyzed with the selected bioinformatic pipelines. Statistical analyses were applied to assess result similarity, including the number of detected taxa, read count, alpha and beta diversities, and the Mantel test. Our findings showed consistent taxa detection across pipelines, with increased sensitivity compared to traditional methods. Alpha and beta diversities and the Mantel test also exhibited significant similarities between pipelines. Divergences were observed based on the reservoir, season, and their interaction. In conclusion, the choice of bioinformatic pipeline did not significantly affect metabarcoding outcomes or their ecological interpretation.
Key Contribution: This study demonstrates that the choice of bioinformatic pipeline does not significantly influence metabarcoding outcomes or their ecological interpretation.

1. Introduction

Environmental DNA (eDNA) metabarcoding research has witnessed a recent surge in popularity and rapid advancement [1]. The eDNA metabarcoding technique is a non-invasive method for multi-taxa identification that analyzes DNA from various organisms extracted from environmental samples such as water and soil [2]. Notably, in detecting freshwater fish it has outperformed conventional methods, particularly in detecting elusive species and achieving a higher overall taxa count [3]. The successful implementation of eDNA metabarcoding experiments depends on careful consideration of the factors influencing taxa detection. Species display distinct habitat preferences, ranging from deep-water zones that are dark, cold, oxygen-poor, and under higher pressure to open-water zones that are brighter, warmer, richer in oxygen, and under lower pressure [4]. Consequently, sampling site selection is pivotal to capturing taxa with different habitats and behaviors. The persistence of eDNA in freshwater ecosystems is also time-dependent, because eDNA degrades after it is released [5]. This degradation is strongly modulated by environmental conditions, most notably temperature. The impact of temperature is further mediated by UV-B exposure and pH, with these factors collectively influencing microbial growth and activity, which in turn affects degradation dynamics [6]. Importantly, eDNA can be transported several kilometers downstream from its origin, which must be taken into account in both sampling design and subsequent data analysis [7]. Therefore, comprehensive sample collection across diverse environmental conditions, spanning both warm and cold seasons, is needed to ensure robust and representative results.
Environmental DNA metabarcoding outputs are dependent on primer pair selection. Primers are specifically designed to amplify short, conserved regions of mitochondrial DNA, such as the 12S rRNA, cytochrome c oxidase I (COI), 16S rRNA, and Cytochrome b (Cyt b) genes, which offer sufficient variability to distinguish between species [8]. The 12S gene is highly conserved across vertebrates, which allows for the design of universal primers that can amplify a wide range of fish species from a single eDNA sample. At the same time, it contains enough sequence variation to differentiate between species, making it suitable for taxonomic resolution at the species or genus level. Additionally, 12S amplicons are typically short (around 100–200 bp), which is ideal for eDNA studies since eDNA is often degraded and fragmented. The extensive reference databases available for 12S sequences further support accurate taxonomic assignment, making it a reliable and efficient choice for monitoring fish biodiversity in aquatic environments [9].
The other important aspect of eDNA metabarcoding is the selection of sequencing platforms. Platforms such as Illumina, Ion Torrent, and others differ significantly in their underlying chemistries, which directly influence the quality and characteristics of the sequencing output [10]. Illumina relies on sequencing by synthesis with reversible dye terminators, producing highly accurate, high-throughput short reads with uniform error profiles, predominantly substitution errors [11]. In contrast, Ion Torrent uses semiconductor-based sequencing that detects hydrogen ions released during nucleotide incorporation, which tends to introduce more insertion and deletion errors, especially in homopolymeric regions [12]. These differences in error types and rates impact how raw data are processed. For instance, the DADA2 pipeline, which is widely used for amplicon sequence variant (ASV) inference, takes these platform-specific error characteristics into account. The DADA2 manual emphasizes adjusting error modeling parameters and preprocessing steps based on the sequencing platform used, ensuring accurate denoising and variant calling [13]. Therefore, understanding the nuances of sequencing chemistry is critical to tailoring the bioinformatics workflow to maintain data quality and biological relevance.
Data analysis is conducted in silico through a series of well-defined steps: demultiplexing, trimming, merging, filtering, dereplication, clustering or inference, and taxonomic assignment. Demultiplexing uses the tags embedded in each sequence to allocate reads to their source samples. Trimming removes adapter sequences and low-quality bases. Merging combines forward and reverse reads into a single sequence, which can reduce sequencing-induced errors. Filtering discards sequences based on criteria such as length and quality, and a secondary filtering step removes chimeric sequences. Dereplication collapses identical sequences, which speeds up subsequent steps by avoiding redundant comparisons. Related sequences are then clustered into operational taxonomic units (OTUs) or resolved into ASVs by inference, which mitigates errors introduced during polymerase chain reaction (PCR) and sequencing. Finally, sequences are assigned to specific taxonomic ranks [14]. Together, these steps make the resulting taxonomic identification and analysis more robust and accurate.
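As a minimal illustration of two of these steps, the following Python sketch applies a length filter and then dereplicates the surviving reads. The function names, thresholds, and toy reads are assumptions made for illustration and are not taken from any of the pipelines compared in this study.

```python
from collections import Counter


def length_filter(reads, min_len, max_len):
    """Keep only reads whose length lies within [min_len, max_len]."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def dereplicate(reads):
    """Collapse identical reads into (sequence, abundance) pairs.

    Pairs are sorted by decreasing abundance, then alphabetically.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy merged reads (hypothetical, far shorter than real amplicons).
reads = ["ACGTAC", "ACGTAC", "ACGTTC", "ACG", "ACGTAC", "ACGTTC", "ACGTACGTACGT"]

kept = length_filter(reads, min_len=5, max_len=8)
unique = dereplicate(kept)
```

On this toy input, five reads pass the length filter and collapse into two unique sequences with abundances 3 and 2. Downstream steps then only need to compare the unique sequences while keeping track of their abundances.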
A multitude of distinct bioinformatic pipelines have been developed to analyze eDNA metabarcoding data [15,16,17]. However, it is crucial to compare these pipelines to ascertain their similarities and determine the most appropriate choice for a given research objective. Ensuring the reliability and reproducibility of eDNA metabarcoding studies demands a thorough exploration of the strengths and limitations of each tool before opting for one. This investigation is essential given the considerable variability in the performance of bioinformatic tools, some of which exhibit superior accuracy, sensitivity, and specificity over others [18]. Comparisons can be made by applying robust statistical procedures, such as a Z-score ranking procedure and a network meta-analysis method, to identify software tools that are consistently accurate for mapping DNA sequences to taxonomic hierarchies [19]. Alternatively, individual steps can be stressed to check the reliability of the tools and computational strategies that produce robust results, with emphasis on understanding how these tools influence ecological insights, thus providing a more comprehensive framework for selecting and using bioinformatic tools (this study). The performance of these tools is often contingent on the reference database or the matrix of OTUs employed. In essence, the careful selection of bioinformatic tools and reference databases or OTU matrices is of paramount importance in guaranteeing the trustworthiness and replicability of outcomes in eDNA metabarcoding studies [20].
This study undertook a comprehensive comparison of five bioinformatic pipelines, encompassing tools commonly employed in metabarcoding, using fish eDNA data from three Czech reservoirs. The Anacapa pipeline [16] relies mainly on DADA2 [13] for the various steps of metabarcoding data analysis and centers on the detection of ASVs rather than the conventional clustering of OTUs. Unlike OTUs, which aggregate sufficiently similar sequences, ASVs are inferred via an error model, allowing discrimination between authentic biological sequences and those containing errors [21]. Taxonomic assignment of sequences is achieved through the Bayesian lowest common ancestor (BLCA) method [22]. In contrast to standard machine learning models, BLCA requires no preparatory training step; instead, it relies on the alignment between reads and a reference database. The Barque pipeline [23], in contrast, performs neither OTU clustering nor ASV inference and relies solely on read annotation. Here, reads are matched against a reference database for taxonomic assignment using global alignment in VSEARCH [24]. The metaBEAT pipeline [25] shares similar tools with Barque throughout the stages of metabarcoding data analysis. Notable exceptions are the creation of OTUs through VSEARCH [24] and the use of BLAST [26] for local alignment-based taxonomic assignment. The MiFish pipeline [9] similarly relies on BLAST-based alignment for taxonomic classification, yet it differs in the programs adopted for the intermediate analysis steps. The SEQme pipeline [27] merges sequences before trimming, in contrast to the typical practice of the other bioinformatic pipelines. Finally, this pipeline applies a machine learning approach to taxonomic classification, using a Bayesian classifier from the Ribosomal Database Project (RDP) [28]. Collectively, these bioinformatic pipelines cover diverse methodologies for eDNA metabarcoding data analysis. The evaluation presented here provides a foundation for selecting a pipeline suited to the specific demands of a given research question.
A post-execution comparison of the bioinformatic pipelines centers around the crucial aspect of taxonomic assignment. This evaluation encompasses multiple dimensions, including execution time, number of sequences assigned, species detection count, alpha and beta diversities, the Mantel test, and the successful detection of both positive and negative control samples. Notably, execution time serves as a yardstick for assessing the impact of individual tools within the workflow on the overall processing time. Monitoring the number of sequence reads assigned at each step of the pipeline offers insights into potential significant deviations from the mean. Species detection within the bioinformatic pipelines is subjected to comprehensive scrutiny, drawing comparisons not only among the pipelines themselves but also against the species cataloged through conventional methodologies, yielding valuable insights into result reliability. The utilization of alpha diversity enables the quantification of species diversity, fostering inter-group comparisons [29]. Accompanying this is the computation of the Shannon index, which aids in verifying the equitable distribution of sequence reads across species [30]. The exploration of beta diversity parallels the efforts in alpha diversity, delving into species diversity considerations, while additionally accounting for the composition of the species—a dimension often overlooked despite its profound influence on ecological dynamics [31]. The assessment extends further, employing the Jaccard index to probe the presence/absence of species, and incorporating the Bray–Curtis index to factor in the allocation of sequence reads for individual species [32,33]. Notably, the inclusion of positive and negative control samples within both field and laboratory stages of eDNA metabarcoding furnishes an essential reliability checkpoint. The taxonomic assignment data derived from these control samples bolster the credibility of the dataset. 
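The indices named above can be written compactly. For a sample with read proportions p_i, the Shannon index is H = −Σ p_i ln p_i; for two samples, the Jaccard dissimilarity is 1 − |A ∩ B| / |A ∪ B| on the sets of detected taxa, and the Bray–Curtis dissimilarity is Σ|a_i − b_i| / Σ(a_i + b_i) on read counts. The following Python sketch implements these textbook definitions; it is an illustration only (the analyses in this study were run in R), and the function names and toy counts are assumptions.

```python
import math


def richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with reads."""
    total = sum(counts.values())
    props = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log(p) for p in props)


def jaccard_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B| on presence/absence of taxa."""
    set_a = {t for t, c in a.items() if c > 0}
    set_b = {t for t, c in b.items() if c > 0}
    union = set_a | set_b
    if not union:
        return 0.0
    return 1 - len(set_a & set_b) / len(union)


def bray_curtis(a, b):
    """sum|a_i - b_i| / sum(a_i + b_i) on read counts."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical read counts for two samples.
sample_1 = {"taxon_A": 50, "taxon_B": 50}
sample_2 = {"taxon_A": 100}

h1 = shannon(sample_1)
jac = jaccard_dissimilarity(sample_1, sample_2)
bc = bray_curtis(sample_1, sample_2)
```

In this toy case the first sample has two equally abundant taxa, so its Shannon index equals ln 2, while both dissimilarities between the samples equal 0.5. The Jaccard value depends only on which taxa are present, whereas the Bray–Curtis value would change if the read counts changed.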
Undoubtedly, eDNA metabarcoding studies hold immense ecological significance. The data output from these bioinformatic pipeline analyses assumes a critical role in guiding the actions of scientists, water and fisheries managers, and nature protection agencies, paving the way for informed decisions aimed at safeguarding natural habitats and wildlife. Consequently, the imperative nature of conducting robust comparisons among bioinformatic pipelines becomes strikingly evident, representing a collective stride toward preserving our planet for present and future generations.
In this study, we hypothesized that (i) pipeline-specific quality filtering strategies would result in different read-retention rates across pipelines; (ii) the use of different clustering or denoising methods would lead to varying numbers and compositions of ASVs or OTUs; (iii) pipelines using different taxonomic assignment algorithms would produce varying taxonomic profiles and confidence levels; (iv) the thresholds for sequence identity and query coverage used in taxonomic assignment would influence the number of taxa detected and their taxonomic resolution; (v) differences in primer trimming tools and strategies would affect downstream taxonomic assignments due to variations in the retained sequence regions; and (vi) pipelines incorporating additional filtering steps based on sequence length or ambiguous bases would yield cleaner but potentially smaller datasets, influencing biodiversity estimates.

2. Materials and Methods

2.1. Sample Collection

The study was conducted in Klíčava, Římov, and Žlutice, three reservoirs in the Czech Republic that have been monitored repeatedly during the last decade by boat electrofishing [34], shore seining [35], pelagic trawling [36,37], hook lines [38], and benthic and pelagic gillnets [39] (Table 1). All of these methods are included in the set of conventional methods, and the cumulative number of fish species detected by these methods in 2018–2020 is used to compare with the 2018 eDNA results. The reservoirs have distinct geographical and morphological characteristics (Table 1) and are dominated by cyprinids (Table 2).
The water samples for eDNA were collected in the littoral, pelagic surface, deep water (5, 10, 20 m, under the thermocline) in the entire longitudinal profile, and inflows, with one additional sample collected in the side bay, to cover all major habitats. A total of 29 samples were taken in the summer and 30 in the autumn season of 2018 in Klíčava, 38 and 35 in Římov, and 28 and 29 in Žlutice, respectively (Figure 1).
In the field, 2 L water samples were prepared by pooling five 400 mL subsamples, prefiltered through a 40 µm sterile plankton net to remove seston, stored in sterile bottles, and transported on ice. In the laboratory (within 24 h, stored at 4 °C), 1 L of water per sample was filtered (2 × 0.5 L) using sterile 0.45 µm mixed cellulose acetate and cellulose nitrate filters (diameter 47 mm, Whatman, Maidstone, UK) with a vacuum pump. DNA was extracted using the Mu-DNA water protocol [40], including two field blanks per sampling event. A subset of samples was used to test whole-filter extraction, but due to low DNA yields (average 7.7 ng/µL), the protocol was optimized by increasing the lysis solution and additive to 900 µL and 300 µL, respectively. Centrifugation time was extended to 2 min, and all lysates were transferred to tubes with flocculant for inhibitor removal. The eluate was reapplied to the column to enhance yield. Extractions were performed separately for each sampling campaign to reduce contamination, and blanks were included (field, filtration, extraction), with one pooled extraction blank per campaign. DNA was stored at −20 °C for up to a week before PCR.
PCR targeted a 73–110 bp fragment of the vertebrate mitochondrial 12S rRNA gene using primers from Riaz et al. [41]. Reactions were prepared in UV-sterilized laminar flow hoods using individually capped eight-strip tubes. Negative (molecular grade water) and positive controls (0.05 ng/µL Maylandia zebra tissue DNA) were included. Each campaign was processed separately with its own blanks. Primers had 24 unique indices and heterogeneity spacers. The first PCR (25 µL) included Q5® High-Fidelity 2X Master Mix (New England Biolabs®, Ipswich, MA, USA) (12.5 µL), primers (Integrated DNA Technologies, Solihull, UK) (1.5 µL each, 10 µM), BSA (Fisher Scientific, Loughborough, UK) (0.5 µL), DNA (5 µL), and molecular grade water (Fisher Scientific, UK) (4 µL). Thermocycling: 5 min at 98 °C; 35 cycles (98 °C 10 s, 58 °C 20 s, 72 °C 30 s); final extension at 72 °C for 7 min. PCRs were run in triplicate and pooled. Products were visualized on 2% agarose gels with GelRed (Cambridge Bioscience, Bar Hill, UK) and normalized into 12 sub-libraries based on band strength (5–20 µL). Controls were included. Sub-libraries were purified via double size selection using Mag-Bind® beads (Omega Bio-tek, Norcross, GA, USA) (0.9× and 0.15× ratios).
A second PCR added Illumina adapters (Illumina, San Diego, CA, USA) (50 µL reactions: 25 µL Q5® High-Fidelity 2X Master Mix (New England Biolabs®, MA, USA), 3 µL each primer, 4 µL purified product, and 15 µL molecular grade water (Fisher Scientific, UK)), with the following conditions: 95 °C for 3 min; 10 cycles (98 °C 20 s, 72 °C 1 min); final extension at 72 °C for 5 min. Duplicates were pooled and purified with magnetic beads (Omega Bio-tek, GA, USA) (0.7× and 0.15×). Sub-libraries were pooled based on sample count and DNA concentration, excluding blanks and negative controls. The final pooled library was purified again and quantified via qPCR (NEBNext® Kit, New England Biolabs®, MA, USA) and checked for secondary products using Agilent TapeStation (Agilent Technologies, Santa Clara, CA, USA). The sequencing was performed on an Illumina MiSeq® (600-cycle v3 kit, Illumina, CA, USA) at 13 pM with 10% PhiX Control v3 (Illumina, CA, USA). For methodological details, see Blabolil et al. [42].

2.2. Bioinformatic Processing

A Linux Ubuntu Mate v18.04 Server computer with Intel Xeon CPU E5-2620 v2 2.10 GHz × 12 and 24 GB RAM was used for the subsequent bioinformatic pipeline execution and data analysis. The reference database based on the 12S rRNA gene marker was created by updating the database developed by the Evolutionary and Environmental Genomics group (EvoHull) at the University of Hull, United Kingdom [3]. The database was updated to represent all fish species in Czechia based on the availability of new sequences in GenBank and de novo sequences (Leuciscus aspius, GenBank accession numbers MT163435, MT163450, and MT163449; Coregonus maraena, GenBank accession numbers MT163451, MT163458, and MT163460). In addition, the database was curated following the workflow from the EvoHull group (https://github.com/HullUni-bioinformatics/Curated_reference_databases, accessed on 3 January 2021), which consists of keeping only the sequence region of the 12S rRNA gene marker, removing redundant sequences, filtering sequences by length, and correcting taxonomically mislabeled sequences using the SATIVA tool v0.9-55-g0cbb090 (accessed on 4 January 2021) [43]. For species involved in a multiple hit, either the species names were joined or the genus name was used. Leuciscus aspius and Scardinius erythrophthalmus were joined to Aspius+Scardinius, Leuciscus idus and Leuciscus leuciscus to L.idus+leuciscus, Blicca bjoerkna and Vimba vimba to Blicca+Vimba, and Perca fluviatilis and Sander lucioperca to Sander+Perca. Furthermore, sturgeons were joined to Acipenser-sp and whitefishes to Coregonus-sp to represent the genus. Acipenser sturio and Acipenser ruthenus, although having different subsequences in the amplification region, were also joined to Acipenser-sp.
After the EvoHull workflow, a manual curation of eDNA sequenced reads was applied to remove sequences not covered by the amplification region after aligning the reference database file to the forward and reverse primers. Sequences with identical subsequences in the amplification region were also removed, although one representative sequence was kept when the identical sequences came from the same species. Before the bioinformatic pipeline execution, the raw FASTQ eDNA sequence reads resulting from sequencing the amplicons were demultiplexed using a custom Python script to assign the sequences to their respective source samples. In addition, Cutadapt v1.18 (accessed on 1 December 2020) [44] was used to remove any adapters at the 3’ end of the sequences. Finally, the reference database was converted to FASTA format using a custom Python script according to the requirements of each pipeline, with an additional table for the Anacapa and SEQme bioinformatic pipelines.
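The custom demultiplexing script is not reproduced here. The following Python sketch only illustrates the general idea, assuming that each read starts with a fixed-length index tag and that an exact-match lookup table from tag to sample is available. The tag sequences, tag length, and sample names are hypothetical, and the sketch ignores the paired indices and heterogeneity spacers used in the actual library design.

```python
def demultiplex(reads, tag_to_sample, tag_len):
    """Group reads by the index tag at their 5' end.

    The tag is stripped from every assigned read. Reads whose tag is
    not in the lookup table are returned separately, unmodified.
    """
    by_sample = {}
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample.setdefault(sample, []).append(insert)
    return by_sample, unassigned


# Hypothetical 4-base tags and toy reads.
tags = {"AAGG": "sample_1", "CCTT": "sample_2"}
reads = ["AAGGACTG", "CCTTACTG", "AAGGTTTT", "GGGGACTG"]

by_sample, unassigned = demultiplex(reads, tags, tag_len=4)
```

Keeping the unassigned reads rather than silently dropping them makes it possible to report how many reads were lost at this step.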
The Anacapa pipeline (accessed on 2 December 2020) [16] was developed at UCLA (University of California, Los Angeles, USA) by the CALeDNA program. The primer configuration files (forward_primers.txt and reverse_primers.txt files) were filled with the respective primer sequences (forward ACTGGGATTAGATACCCC and reverse TAGAACAGGCTCCTCTAG). In addition, the minimum merge length was set to 126. Primers and adapters were removed using Cutadapt v1.18 (accessed on 1 December 2020) [44]. Sequences with quality scores below 20 were trimmed, and sequences with lengths below 90 bases were removed using FASTX-Toolkit v0.0.14 (accessed on 2 December 2020) [45]. DADA2 v1.6.0 (accessed on 2 December 2020) [13] was used to filter, trim, dereplicate, and merge the reads, remove chimeras, and infer ASVs. The reference database, previously converted to FASTA format, and the taxonomic table generated during the conversion were used to build an index with Bowtie 2 v2.3.5.1 (accessed on 2 December 2020) [46]. Bowtie 2 was also used to align the sequences to the reference database. The alignment hits were then used by the Bayesian lowest common ancestor (BLCA) method [22] to assign taxonomy to the alignments and to generate the confidence scores for each assignment. Finally, only taxa having at least 50% confidence were kept in the final table.
The Barque pipeline v1.7.2 (accessed on 3 December 2020) [23] was developed in Louis Bernatchez’s laboratory at Laval University (Quebec City, Canada). The configuration file used in the execution can be found on GitHub (https://github.com/RomuloAS/eDNA_metabarcoding_pipelines_comparison). The file with PCR primer information was filled out with the forward (ACTGGGATTAGATACCCC) and reverse (TAGAACAGGCTCCTCTAG) primers, 90 and 110 for the minimum and maximum amplicon size, respectively, the name of the reference database file, and 100% for the species identity. Bases were trimmed using Trimmomatic v0.36 (accessed on 3 December 2020) [47] until a base with a quality score equal to or above 20 was found. In addition, windows of 20 bases were trimmed if the average quality score of their bases dropped below 20. Trimmomatic was also used to filter sequences shorter than 90 bases. FLASH (Fast Length Adjustment of SHort reads) v1.2.11 (accessed on 3 December 2020) [48] was used to merge forward (R1) and reverse (R2) reads from paired-end sequencing. Sequences shorter than 90 bases or longer than 110 bases were filtered out using a custom Python script. VSEARCH v2.14.2 (accessed on 3 December 2020) [24] was used to dereplicate (group identical sequences) and to remove chimeric sequences. Finally, a global alignment algorithm implemented in VSEARCH was used to align the sequences to the reference database for the taxonomic assignment using a threshold of 100% for the sequence identity and 85% for the sequence query cover.
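The quality trimming described for this pipeline can be sketched as follows. This is a simplified illustration and not Trimmomatic's implementation: it drops low-quality bases from both ends, cuts the read at the first window whose mean quality falls below the threshold, and discards reads that end up shorter than a minimum length. The function names and toy values are assumptions; the defaults mirror the window size, quality threshold, and minimum length reported above.

```python
def trim_ends(quals, threshold):
    """Return (start, end) bounds after dropping low-quality end bases."""
    start, end = 0, len(quals)
    while start < end and quals[start] < threshold:
        start += 1
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return start, end


def window_cut(quals, window, threshold):
    """Return how many bases to keep from the 5' end.

    The read is cut at the start of the first window whose mean
    quality is below the threshold.
    """
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < threshold:
            return i
    return len(quals)


def trim_read(seq, quals, window=20, threshold=20, min_len=90):
    """Trim a read; return None if it becomes shorter than min_len."""
    start, end = trim_ends(quals, threshold)
    seq, quals = seq[start:end], quals[start:end]
    seq = seq[:window_cut(quals, window, threshold)]
    return seq if len(seq) >= min_len else None


# Toy read with small parameters so the effect is visible.
toy_seq = "ACGTACGTA"
toy_quals = [10, 30, 30, 30, 30, 5, 5, 30, 8]
trimmed = trim_read(toy_seq, toy_quals, window=2, threshold=20, min_len=3)
```

For the toy read, the first and last bases are removed by the end trimming, and the window check then cuts the read where the two low-quality bases begin, leaving "CGT".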
MetaBEAT [25] is a pipeline for the analysis of metabarcoding data developed at the University of Hull, UK. The data processing and taxonomic assignment were performed using metaBEAT v0.97.10 (accessed on 3 January 2021) [25] and a workflow based on Hänfling et al. [3]. Trimmomatic v0.32 (accessed on 3 December 2020) [47] was used for quality trimming at both ends by removing bases until a score of 20 or higher was found. In addition, the average quality of a window of 5 bases was assessed, and subsequences were dropped if the average was below 20. Trimmomatic was also used to discard sequences shorter than 90 bases. Forward (R1) and reverse (R2) reads generated from paired-end (PE) sequencing were merged using FLASH v1.2.11 (accessed on 3 December 2020) [48]. Sequences were clustered by applying a threshold of 100% identity using a variant of the UCLUST algorithm [49] implemented in VSEARCH v1.1.0 (accessed on 3 January 2021) [24]. VSEARCH was also used to remove chimeric sequences. Clusters with fewer than 3 sequences were omitted from the taxonomic assignment. BLAST v2.2.28+ (accessed on 3 January 2021) [26] was used to align the read sequences to the reference database. Alignments with bit score, query cover, and identity smaller than 80, 85%, and 100%, respectively, were discarded. Finally, a custom Python script implementing the lowest common ancestor (LCA) algorithm was used to assign the lowest common taxonomic rank.
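The lowest common ancestor idea can be illustrated with a short Python sketch. It is not the script used by metaBEAT; it assumes that each retained hit is represented by a lineage listed from the highest to the lowest rank, and it returns the ranks shared by all hits. The lineages below are simplified to order, genus, and species for illustration.

```python
def lowest_common_ancestor(lineages):
    """Return the ranks shared by all lineages, from the top rank down.

    Each lineage is a list ordered from the highest rank to the lowest.
    The last element of the result is the lowest common taxon.
    """
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Simplified lineages (order, genus, species) for three possible hits.
idus = ["Cypriniformes", "Leuciscus", "Leuciscus idus"]
leuciscus = ["Cypriniformes", "Leuciscus", "Leuciscus leuciscus"]
perca = ["Perciformes", "Perca", "Perca fluviatilis"]

same_genus = lowest_common_ancestor([idus, leuciscus])
single_hit = lowest_common_ancestor([idus])
no_overlap = lowest_common_ancestor([idus, perca])
```

A read that matches both Leuciscus species equally well is assigned only to the genus, while a read with a single best hit keeps its species-level assignment. An empty result means the hits share no rank within the lineages provided.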
MiFish [9] is a metabarcoding pipeline created to be used together with a set of PCR primers, both developed by the same group at the University of Tokyo, Japan. FASTQC v0.11.9 (accessed on 23 December 2020) [50] was used to assess the quality of the sequences. A quality trimming using DynamicTrim v1.13 from the SolexaQA package (accessed on 23 December 2020) [51] was applied, keeping only sequences with a score greater than 20. FLASH v1.2.11 (accessed on 3 December 2020) [48] was used to merge forward (R1) and reverse (R2) reads, considering both “innie” and “outie” orientations and setting the minimum overlap and maximum overlap to 15 and 150, respectively. Sequences with ambiguous bases (represented by the letter N) or with lengths shorter than 90 or longer than 150 bases were discarded using Perl scripts. Primers were trimmed from sequences using TagCleaner v0.16 (accessed on 23 December 2020) [52]. USEARCH v11.0.667 (accessed on 23 December 2020) [49] was used to dereplicate, align, and cluster the sequences. The uc_size_fas_integrator.pl script was modified from /$OTUname/ to /\Q$OTUname\E/ and the uc_size_processor.pl script from /$otuname/ to /\Q$otuname\E/, both to handle the headers of Illumina FASTQ files. Finally, the read sequences were aligned to the reference database using BLAST v2.10.0+ (accessed on 3 January 2021) [26] with 100% identity, and a custom Perl script was used to calculate the logarithm of the odds for the taxonomic assignment.
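In Perl, \Q...\E makes an interpolated variable match literally instead of being interpreted as a regular expression. The Python sketch below shows the same effect with re.escape, which plays an analogous role. The header is made up, and the assumption that a character such as "+" in the header caused the mismatch is ours; the study does not state which characters were involved.

```python
import re

# Made-up Illumina-style header containing "+" between two index sequences.
header = "M00001:1:000000000-AAAAA:1:1101:1000:2000 1:N:0:ACGT+TGCA"
line = f">{header};size=3"

# Used as a raw pattern, "T+" means "one or more T", so the literal "+"
# in the line is never matched and the search fails.
unescaped = re.search(header, line)

# Escaping the header first makes every character match literally.
escaped = re.search(re.escape(header), line)
```

Here the unescaped search returns no match while the escaped search finds the header, which is the kind of failure that literal quoting avoids.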
The SEQme [27] pipeline workflow was presented at the Microbiome and Metagenome Data analysis workshop by the company of the same name from the Czech Republic (accessed on 4–8 June 2018). A custom Python script was created to automate the workflow for all FASTQ files. The forward (R1) and reverse (R2) reads were merged using the fastq-join program v1.3.1 (accessed on 26 December 2020) [53], allowing at most 15% mismatches and requiring a minimum overlap of 15 bases. The FASTX-Toolkit v0.0.14 (accessed on 2 December 2020) [45] collection of tools was used to filter sequences based on quality, keeping only reads in which at least 50% of the bases had a quality equal to or higher than 20. Length filtering was applied by discarding sequences shorter than 90 or longer than 150 bases using the Biopieces bioinformatic framework v2.0 (accessed on 26 December 2020) [54]. USEARCH v11.0.667 (accessed on 23 December 2020) [49] was used to group identical sequences (dereplication), to cluster sequences into OTUs with 97% identity, and to remove chimeric sequences. The taxonomic assignment of the read sequences was accomplished using the RDP classifier v2.11 (accessed on 26 December 2020) [28], which is based on Bayes’ theorem. The classifier was trained using the reference database in FASTA format and the table of taxonomy hierarchy created from the taxa in the reference file. Finally, only taxa with 100% confidence were saved to the final table.
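The read-level quality rule used here (a minimum share of bases at or above a quality threshold) can be sketched in Python as follows. This is an illustration of the rule, not the FASTX-Toolkit code, and it assumes Phred+33 encoded quality strings; the function name and toy strings are hypothetical.

```python
def passes_quality_filter(quality_string, min_quality=20, min_percent=50,
                          phred_offset=33):
    """Return True if enough bases reach the minimum quality.

    A read passes when at least min_percent of its bases have a Phred
    score >= min_quality. Phred+33 encoding is assumed.
    """
    scores = [ord(ch) - phred_offset for ch in quality_string]
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_quality)
    return 100 * good >= min_percent * len(scores)


# "I" encodes Phred 40 and "!" encodes Phred 0 under Phred+33.
half_good = passes_quality_filter("IIII!!!!")
mostly_bad = passes_quality_filter("III!!!!!")
```

With the default settings, a read with exactly half of its bases at high quality is kept, while a read with three of eight such bases is discarded. A filter of this kind tolerates a low-quality tail as long as most of the read is reliable.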
After the bioinformatic pipeline execution, the R environment for statistical computing v3.6.3 (accessed on 1 December 2020) [55] was used for the data analysis. For each sample, a threshold of 0.1% was applied to remove low-frequency read assignments, which represent false-positive taxa caused by sequencing and PCR errors [3,56]. The read counts were also rarefied using the ranacapa and microbiome packages to make the sequencing depth consistent across samples. The number of reads assigned and the number of species were summed for each pipeline, reservoir, and season. The vegan package v2.5-6 (accessed on 1 December 2020) [57] was used to calculate the alpha diversity, both as taxa richness, which considers only the presence and absence of taxa, and as the Shannon index, which also uses the number of reads assigned. The same package was used to calculate the beta diversity, with the Bray–Curtis index for relative abundance and the Jaccard index for presence/absence. Finally, the Mantel test, also from the vegan package, was applied. An analysis of variance (ANOVA) using the stats package from the R core distribution was applied to test whether alpha diversity differed among the bioinformatic pipelines, reservoirs, seasons, and the interaction between reservoirs and seasons. For beta diversity, a permutational multivariate analysis of variance (PERMANOVA) from the vegan package was used to test whether taxa composition differed among the bioinformatic pipelines, reservoirs, seasons, and the interaction between reservoirs and seasons. Finally, the distances and relationships from the beta diversity results were represented in a two-dimensional plot using principal coordinates analysis (PCoA) from the stats package.
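The per-sample low-frequency threshold can be illustrated with a short Python sketch. The analysis in this study was performed in R, so this is only an illustration of the rule; the function name and toy counts are hypothetical, and it is an assumption that a taxon exactly at the threshold is kept.

```python
def filter_low_frequency(sample_counts, threshold=0.001):
    """Drop taxa whose share of the sample's reads is below threshold.

    sample_counts maps taxon name to read count for a single sample.
    The default threshold of 0.001 corresponds to 0.1% of the reads.
    """
    total = sum(sample_counts.values())
    if total == 0:
        return {}
    return {taxon: count for taxon, count in sample_counts.items()
            if count / total >= threshold}


# Hypothetical read counts for one sample (10,000 reads in total).
sample = {"taxon_A": 9000, "taxon_B": 995, "taxon_C": 5}
filtered = filter_low_frequency(sample)
```

In this toy sample, taxon_C accounts for 0.05% of the reads and is removed, while the other two taxa are retained. Because the threshold is relative to each sample's total, the absolute number of reads needed to retain a taxon grows with sequencing depth.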

3. Results

3.1. eDNA Metabarcoding

A total of 22.46 million raw sequence reads were generated through Illumina MiSeq sequencing. After demultiplexing, 20.91 million reads (93.08%) remained available. Following the execution of the bioinformatic pipelines and the subsequent filtration of low-frequency reads, an average of 10.65 million reads (47.41%) were successfully assigned to taxa (Table 3). No contamination was observed in the negative and positive controls, with only Maylandia zebra detected in the positive control samples. Among the total reads in the sample set, MiFish yielded the lowest detection rate for positive control reads, ranging from 70.11% to 72.4%, while Anacapa displayed the highest rate, ranging from 96.22% to 96.73%. The execution times for each pipeline were as follows: Anacapa took 2 h and 59 min, Barque 21 min, metaBEAT 12 h and 45 min, MiFish 1 h and 51 min, and the SEQme pipeline 23 min. Subsequent analyses excluded the positive and negative control data points.

3.2. Number of Taxa Detected and Assigned Reads

In total, 38 distinct taxa were identified across all the bioinformatic pipelines. Anacapa identified 35 taxa; Barque, MiFish, and SEQme each detected 33 taxa; and metaBEAT detected 32 taxa. Among the reservoirs, 22 taxa were detected in Klíčava, 23 in Žlutice, and 35 in Římov. The two seasons differed by 10 taxa: 27 taxa were observed during the summer season and 37 during the autumn season (Table 4). Among the 38 identified taxa, 31 were consistently detected across all the bioinformatic pipelines, while 5 taxa were unique to a single pipeline (4 in Anacapa and 1 in SEQme). Regarding reservoirs, 15 taxa were shared among all reservoirs, while 11 taxa were exclusively detected in one reservoir (2 in Klíčava and 9 in Římov). Lastly, 26 taxa were identified in both the autumn and summer seasons, whereas 12 taxa appeared solely in one of the seasons (1 in summer and 11 in autumn). Taxa detection thus varied more among reservoirs and seasons than among bioinformatic pipelines.
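Counts of shared and unique taxa such as those above follow from simple set operations on the per-pipeline detection lists. The sketch below shows one way to derive them in Python; the pipeline labels and taxon names are placeholders, not the detections reported in this study.

```python
def shared_and_unique(detections):
    """Summarize taxa detected by several pipelines.

    detections: dict mapping a pipeline name to the set of taxa it detected.
    Returns (all taxa, taxa shared by every pipeline,
             dict of taxa detected by exactly one pipeline).
    """
    all_taxa = set().union(*detections.values())
    shared = set.intersection(*detections.values())
    unique = {}
    for name, taxa in detections.items():
        others = set().union(
            *(t for other, t in detections.items() if other != name)
        )
        unique[name] = taxa - others
    return all_taxa, shared, unique


# Hypothetical detections; labels and taxa are placeholders.
toy = {
    "pipeline_1": {"taxon_a", "taxon_b", "taxon_c"},
    "pipeline_2": {"taxon_a", "taxon_b"},
    "pipeline_3": {"taxon_a", "taxon_b", "taxon_d"},
}

all_taxa, shared, unique = shared_and_unique(toy)
print(len(all_taxa), sorted(shared), unique)
```

The same function can be applied with reservoirs or seasons as the dictionary keys to obtain the corresponding shared and exclusive counts.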
The average number of reads assigned to taxa (excluding the positive control) across all bioinformatic pipelines was 7,821,429 reads (Table 4). Among the pipelines, Barque had the highest count with 8,410,039 reads, while MiFish had the lowest with 6,820,393 reads. Among the reservoirs, the average read assignment was 1,205,830 reads for Klíčava, 4,994,657 reads for Římov, and 1,620,942 reads for Žlutice. By season, an average of 5,676,294 reads were assigned in the summer, compared with 2,387,314 reads in the autumn. The taxon with the highest read count was Rutilus rutilus, averaging 2,438,217 reads, while the lowest read assignment was attributed to Lampetra planeri, with only 50 reads (Figure 2). Together, these values describe how assigned reads were distributed among bioinformatic pipelines, reservoirs, and seasons (Figure 3).

3.3. Alpha and Beta Diversities

Alpha diversity measured as taxa richness, which considers only the presence and absence of taxa, was highly similar among the bioinformatic pipelines (ANOVA, F₂₅,₄ = 0.06, p = 0.99). The same held for the Shannon index, which also takes the number of assigned reads into account (ANOVA, F₂₅,₄ = 0.12, p = 0.97). For beta diversity, the Jaccard index, which considers the presence and absence of taxa, gave no strong evidence that the choice of pipeline significantly affects the community composition (PERMANOVA, F₂₅,₄ = 1.11, R² = 0.15, p = 0.36) (Figure 4). When the read count assigned to each taxon was included through the Bray–Curtis index, the bioinformatic pipelines were again statistically indistinguishable (PERMANOVA, F₂₅,₄ = 0.52, R² = 0.08, p = 0.92) (Figure 5). Taken together, the alpha and beta diversity metrics were consistent across the bioinformatic pipelines, which supports the robustness of eDNA metabarcoding data analysis to the choice of pipeline.
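For readers less familiar with the two beta diversity measures, the following Python sketch computes the Bray (Bray–Curtis) and Jaccard dissimilarities between two samples. It is a plain re-implementation of the textbook formulas rather than the vegan code used in this study; the function names and the toy counts are assumptions, and the counts are used as given without conversion to relative abundance.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    taxa = set(x) | set(y)
    numerator = sum(abs(x.get(t, 0) - y.get(t, 0)) for t in taxa)
    denominator = sum(x.get(t, 0) + y.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def jaccard(x, y):
    """Jaccard dissimilarity on presence/absence: 1 - |A & B| / |A | B|."""
    present_x = {t for t, n in x.items() if n > 0}
    present_y = {t for t, n in y.items() if n > 0}
    union = present_x | present_y
    if not union:
        return 0.0
    return 1 - len(present_x & present_y) / len(union)


# Hypothetical read counts for two samples; taxon names are placeholders.
sample_1 = {"taxon_a": 10, "taxon_b": 0, "taxon_c": 5}
sample_2 = {"taxon_a": 5, "taxon_b": 5, "taxon_c": 5}

print(bray_curtis(sample_1, sample_2), jaccard(sample_1, sample_2))
```

Both measures range from 0 (identical samples) to 1 (no taxa in common); the Jaccard value ignores read counts, whereas the Bray–Curtis value is weighted by them.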

4. Discussion

In this study, we compared five eDNA metabarcoding bioinformatic pipelines and contrasted the species composition they yielded with that detected by conventional methods. The eDNA metabarcoding approach outperformed traditional methods, which was particularly evident when evaluating the concordance between the bioinformatic pipeline results and the species detected by conventional means. Regarding species composition, the Barque and MiFish bioinformatic pipelines detected the same species. The species composition in metaBEAT closely resembled that of Barque and MiFish, with the absence of Squalius cephalus as the sole difference. The MiFish pipeline assigned the fewest reads to species within the reference database, while the Anacapa pipeline had the highest number of reads assigned and species detected. Both alpha and beta diversity analyses gave statistically similar outcomes. The total number of species detected across all bioinformatic pipelines surpassed that of conventional methods. Moreover, every species identified by traditional methods was also found in the bioinformatic pipeline results. Lota lota was excluded from the analysis due to its low frequency, representing less than 0.1% of the total assigned sequences in the sample [56]. More reads were assigned during the summer season than during the autumn; nevertheless, a greater diversity of species was detected during the autumn season. Seasonality thus influences both the quantity of reads and the variety of detected species.

4.1. Comparison of the Detection Among Bioinformatic Pipelines and Conventional Methods

The number of species detected was consistent across the bioinformatic pipelines, with the exception of five species that were each detected by a single pipeline. The most distinctive case was Anacapa, in which Lampetra planeri, Gasterosteus aculeatus, Micropterus salmoides, and Cottus poecilopus were exclusively identified. Among these, Lampetra planeri had also been detected by metaBEAT, MiFish, and SEQme before the removal of low-frequency sequences, which supports its status as a genuine positive detection. Its presence in the conventional surveys further supports its credibility as a valid detection. In contrast, Gasterosteus aculeatus, Micropterus salmoides, and Cottus poecilopus are more likely to be false positives. Gasterosteus aculeatus is absent from the regions of sample collection, being confined to the eastern parts of the Czech Republic and the Liberec region [58]. Micropterus salmoides appears to be a misidentification, likely of Lepomis gibbosus, while the reads assigned to Cottus poecilopus align more closely with Cottus gobio. The Anacapa pipeline employs a confidence score threshold of 50% for its classifier. Although a low confidence score can increase species detection [59], it also raises the potential for false positives. Gymnocephalus cernua was detected only by the SEQme pipeline, and its detection by conventional methods confirms it as a true positive. However, some species remained undetected in specific pipelines. Squalius cephalus was absent from both the Anacapa and metaBEAT results. Meanwhile, Hypophthalmichthys molitrix went undetected in the Anacapa pipeline, while Hypophthalmichthys nobilis was identified. Artificial spawning processes have been known to result in hybridization between Hypophthalmichthys molitrix and Hypophthalmichthys nobilis [60], offering a plausible explanation for their confusion in certain bioinformatic pipelines.
The species composition of the Barque and MiFish bioinformatic pipelines was identical overall. A difference appeared only when reservoirs and seasons were considered: within MiFish, Carassius auratus in Římov during the summer season was excluded following the elimination of low-frequency sequences. Apart from Squalius cephalus, metaBEAT’s species composition closely mirrored that of Barque, and this consistency held when reservoirs and seasons were considered. The two pipelines also share the same tools for the data analysis steps. They differ only in the taxonomic assignment, executed via BLAST [26] by metaBEAT and VSEARCH [24] by Barque, with minor differences in parameters and supplemental Python scripts. In the MiFish pipeline, a different selection of tools for the intermediate steps influenced read assignments without changing the composition of detected species. Barque, metaBEAT, and MiFish all base their taxonomic assignment on sequence alignment, which likely explains the similarity in their species composition. Compared with the other pipelines, the species composition of the SEQme pipeline mirrored that of Barque and MiFish. However, Hypophthalmichthys molitrix remained undetected within SEQme, while Gymnocephalus cernua was retained only within SEQme; it was also detected by Anacapa and metaBEAT but was discarded there with the low-frequency sequences. SEQme was also the only pipeline to detect Lota lota, although this detection was excluded when sequences with frequencies below 0.1% of the total for each sample were filtered out. In the conventional surveys, a single Lota lota specimen was detected across the 2018, 2019, and 2020 campaigns. A challenge of eDNA metabarcoding is the limited presence of eDNA from rare species [61], which makes their detection difficult. The scarcity of reads assigned to such species might also indicate false positives attributed to PCR and sequencing errors [56]. Moreover, nocturnal winter sampling could raise the odds of Lota lota detection, considering the species’ affinity for cold temperatures and heightened activity during the night [62,63]. Although the species composition of SEQme closely aligns with Barque and MiFish, this agreement does not extend uniformly to read assignments: SEQme detected Alburnus alburnus only in Římov during autumn, whereas all other bioinformatic pipelines identified it in Římov and Žlutice during both the summer and autumn seasons.
Seven species—Barbus barbus, Carassius carassius, Cottus gobio, Phoxinus phoxinus, Rhodeus amarus, Salvelinus fontinalis, and Thymallus thymallus—were consistently detected across all bioinformatic pipelines, yet they remained absent from observations during the 2018, 2019, and 2020 campaigns within the reservoirs when using conventional methods. Specifically, Barbus barbus and Carassius carassius were previously noted in the older campaigns at Klíčava and Římov, their presence being reaffirmed in the same reservoirs through bioinformatic pipelines. For Cottus gobio, Phoxinus phoxinus, and Thymallus thymallus, their preference for tributary habitats renders them less amenable to conventional detection methods. Rhodeus amarus, characterized by its diminutive size, could easily evade detection. The reliance of Salvelinus fontinalis on restocking further contributes to its absence in conventional assessments [42]. It is also plausible that the introduction of new species occurred during the release of predatory species, a routine practice in the reservoirs. Additionally, the possibility of downstream eDNA influx from the catchment cannot be discounted. Turning attention to the control samples, solely Maylandia zebra exhibited detection in the positive control, while no reads were assigned in the negative control. The inclusion of positive and negative controls serves to validate the results against contamination, and as such, no contamination was identified within the bioinformatic pipelines [14]. Of particular interest is the positive control Maylandia zebra, registering the highest average detection (96.45%) within the Anacapa pipeline, in contrast to the lowest detection average of 71.33% observed within the MiFish pipeline. These percentages align with the broader detection trends of other species, where Anacapa consistently records the highest read assignments, while MiFish manifests the lowest read assignments.

4.2. Comparison of Alpha and Beta Diversities Among the Bioinformatic Pipelines

The comparison of bioinformatic pipelines is rare; so far, Li et al. [18] have compared three pipelines and concluded that significant differences exist between fish diversity estimates and that an OTU-based pipeline has superior capabilities for fish eDNA metabarcoding monitoring. The comparison of alpha and beta diversities across bioinformatic pipelines, reservoirs, seasons, and the interactions between reservoirs and seasons yielded statistically indistinct results. The Mantel test between the pipelines corroborates the result. Alpha diversity was assessed considering both the mere presence and absence of detected species, as well as incorporating the quantification of assigned reads for comparison. Additionally, beta diversity accounted for species composition within the framework of the analogy [31]. Considering the 2018, 2019, and 2020 campaigns, conventional methods detected a total of 29 species, excluding cases where L.idus+leuciscus, Aspius+Scardinius, and Sander+Perca were considered distinct species. Without such considerations, the count stands at 32. Notably, the Anacapa pipeline boasted the highest observed alpha diversity richness, identifying 35 distinct species, 6 more than those detected through conventional methods. However, when contemplating the potential for false positives, this count adjusts to 32 in Anacapa, aligning more closely with the findings of traditional methods. A notable observation is the presence of seven species in all bioinformatic pipelines that were conspicuously absent during the three campaigns of conventional methods. This underlines the heightened sensitivity of eDNA for the detection of rare species [8]. Shannon indices, indicative of alpha diversity, emerged statistically similar across the bioinformatic pipelines. Barque, metaBEAT, and MiFish showcased congruency in the tally of reads assigned to each species, attributed to their shared alignment-based taxonomic assignment approaches. 
A departure from this pattern was evident in the SEQme pipeline, characterized by fewer reads allocated to specific species. For instance, Alburnus alburnus was exclusively detected within the Římov region during autumn by the SEQme pipeline. These findings accentuate the convergence of results within diverse bioinformatic pipelines, affording a deeper understanding of the complex interplay between species detection, sensitivity, and methodology.
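The Mantel test referred to in this section asks whether two distance matrices computed on the same set of samples are correlated, for example the beta diversity matrices obtained from two pipelines. A minimal permutation-based sketch in Python is given below. It is not the vegan implementation; the use of the Pearson correlation, 999 permutations, the one-sided p-value, and the toy matrices are assumptions made for illustration.

```python
import random


def upper_triangle(matrix):
    """Flatten the values above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value).

    Rows and columns of d1 are permuted together; d2 stays fixed.
    """
    rng = random.Random(seed)
    n = len(d1)
    flat2 = upper_triangle(d2)
    observed = pearson(upper_triangle(d1), flat2)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d1[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        # Small tolerance so rounding does not hide ties with the observed r.
        if pearson(upper_triangle(permuted), flat2) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical distance matrices for four samples; d2 is d1 rescaled.
d1 = [[0, 1, 2, 3],
      [1, 0, 1, 2],
      [2, 1, 0, 1],
      [3, 2, 1, 0]]
d2 = [[2 * value for value in row] for row in d1]

r, p_value = mantel(d1, d2)
print(round(r, 3), p_value)
```

Because the second toy matrix is a rescaled copy of the first, the observed correlation is 1; with only four samples the number of distinct permutations is small, so the p-value here is only a demonstration of the mechanics.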
In the context of reservoirs, the number of detections surpassed that of conventional methods. The morphology and trophic state of reservoirs influence the composition of fish communities [64,65]. Additionally, eDNA degrades faster in oligotrophic environments than in eutrophic ones [66]. Římov, a eutrophic reservoir with the largest area and volume among the three, had the highest count with 35 species detected. Žlutice, also eutrophic and the second largest in size, followed with 23 species. In contrast, Klíčava, a mesotrophic reservoir with the smallest dimensions, had the lowest count of 22 species. In Římov, all species detected with conventional methods were also detected by the bioinformatic pipelines. However, Hypophthalmichthys molitrix and Anguilla anguilla were missing from the bioinformatic pipeline results for Žlutice, as were Alburnus alburnus, L.idus+leuciscus, and Carassius auratus for Klíčava. Conventional methods yielded only a few specimens of Hypophthalmichthys molitrix and Anguilla anguilla in Žlutice, and of Alburnus alburnus, L.idus+leuciscus, and Carassius auratus in Klíčava. Given the often low concentration of eDNA from rare species [61], the likelihood of their detection via eDNA metabarcoding is diminished. The bioinformatic pipelines detected more species than conventional methods, confirming the heightened sensitivity of eDNA in species detection [8]. Specifically, 8 species not previously identified through traditional methods were detected in Římov, followed by 6 in Žlutice and 5 in Klíčava. Three species (Gasterosteus aculeatus, Micropterus salmoides, and Cottus poecilopus) are deemed likely false positives and have been excluded from consideration. This examination underscores the efficacy of eDNA metabarcoding in enhancing species detection and in describing species distribution within different reservoirs.
In terms of seasons, a total of 37 species were identified during the autumn, while 27 species were recorded in the summer. Among the 38 species in total, Lepomis gibbosus was the only one absent from the autumn detections. The behavior of species is closely tied to environmental temperature; as ectothermic organisms, fish regulate body temperature in accordance with external conditions [67,68]. Species are more active within specific temperature ranges, and such activity increases the probability of detection, so the chosen sampling season influences detection [69]. The number of species detected in the autumn season surpassed that of summer. Species exclusively detected in autumn, such as Lampetra planeri, Acipenser sp., Salmo trutta, and Coregonus sp., are generally intolerant of higher temperatures [67]. The autumn season also brings a greater influx of upstream water, potentially introducing additional eDNA to the reservoir [63]. Moreover, temperature has a significant influence on eDNA degradation; lower temperatures prolong eDNA persistence [6]. Lepomis gibbosus, noted for its affinity for warmer waters, is more active during summer, which is consistent with its detection during this season [63]. Regarding read assignments, summer had more than double the counts observed during the autumn. The reservoirs are dominated by cyprinid species, which prefer warmer waters and are more active throughout summer [70,71]. While more species were detected during the autumn season, the greater presence of cyprinids during summer contributed to the higher volume of reads during this season. Species behavior, environmental factors, and eDNA dynamics thus jointly shape the patterns of detection within different seasons.

4.3. Bioinformatic Pipelines Analogy

The alpha and beta diversities exhibited no significant differences, suggesting that the choice of the pipeline does not exert a substantial impact on the metabarcoding results. Beyond pipeline selection, the comprehensiveness of the reference database plays a crucial role; the inclusion of all species native to the study site is imperative to avoid potential false negatives in taxonomic assignment [72]. However, in scenarios where the primary requirement revolves around the identification of related groups rather than specific taxonomy, such as studies focusing on microbial communities [73], a reliance on OTUs or ASVs for grouping similar sequences may suffice. Both strategies are designed to mitigate PCR and sequencing errors [74]. ASVs, though, tend to overinflate richness by generating more groups than are actually present [75], while OTUs more efficiently approximate proxies for species [76]. Should the objective be to proxy haplotypes, ASVs are the apt choice. Conversely, when a more effective proxy for species is required, OTUs should be favored. Within the array of bioinformatic pipelines examined, only the Anacapa pipeline employs the amplicon sequence variant approach. In contrast, the Barque pipeline abstains from applying any sequence grouping approach. In instances where species detection is the central goal and a comprehensive, all-inclusive reference database is at hand, the creation of OTUs or ASVs can potentially be omitted, streamlining the analysis process [77]. This underlines the flexibility inherent in selecting the most appropriate strategy depending on the research objectives and data availability.
The intermediate steps in data curation, aimed at rectifying errors, exerted a direct impact on the ultimate count of assigned reads. Yet, a consensus regarding the correlation between read counts and species abundance remains elusive [78]. The taxonomic assignment process across bioinformatic pipelines was conducted through two distinct approaches: alignment-based classification and the Bayesian classifier. Notably, alignment-based methods for classification yielded consistent results in terms of both species detection and read assignment across the evaluated pipelines. However, for pipelines such as Anacapa and SEQme, which employ Bayesian classifiers, outcomes diverged to a certain extent. Although these pipelines demonstrated heightened sensitivity in detecting rare species, their results displayed greater variability. As such, if maximizing the identification of rare species constitutes a prime objective, pipelines employing the Bayesian approach could be deemed favorable.
Regarding the execution time of the bioinformatic pipelines, Barque was the fastest, finishing within approximately 22 min. In contrast, metaBEAT was the most time-consuming pipeline, running nearly 35 times slower than Barque. The execution time of the SEQme pipeline was close to that of Barque, at 23 min; both pipelines used multiple computer cores for parallelized data analysis. MiFish was about five times slower than Barque, and Anacapa roughly eight times slower. However, the speed advantage of the alignment-based approach, as seen in the fastest pipeline, may diminish as sequence length and volume increase [79]. These results illustrate the trade-off between execution speed and analytical robustness when considering the suitability of distinct bioinformatic pipelines.
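The relative execution times quoted above can be reproduced with a short calculation. The Python sketch below uses the run times reported in Section 3.1 together with the approximately 22 min stated in this paragraph for Barque (Section 3.1 lists 21 min, so the choice of baseline is an assumption of the sketch); the function and variable names are illustrative.

```python
def to_minutes(hours, minutes):
    """Convert hours and minutes to total minutes."""
    return hours * 60 + minutes


# Run times in minutes. The Barque value follows the "approximately 22 min"
# given in this paragraph; Section 3.1 reports 21 min.
runtimes = {
    "Anacapa": to_minutes(2, 59),
    "Barque": 22,
    "metaBEAT": to_minutes(12, 45),
    "MiFish": to_minutes(1, 51),
    "SEQme": 23,
}


def slowdown(times, baseline):
    """Express every run time as a multiple of the baseline pipeline's time."""
    base = times[baseline]
    return {name: round(t / base, 1) for name, t in times.items()}


ratios = slowdown(runtimes, "Barque")
print(ratios)
# {'Anacapa': 8.1, 'Barque': 1.0, 'metaBEAT': 34.8, 'MiFish': 5.0, 'SEQme': 1.0}
```

With a 22 min baseline, the ratios match the "nearly 35 times", "fivefold", and "roughly eight times" figures in the text.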
Additional bioinformatic pipelines could be introduced for a comprehensive comparison, thus reinforcing the validation of outcomes across the spectrum of pipelines [80,81]. Furthermore, diversifying the tools employed within these pipelines for data analysis could enhance the comprehensiveness of the evaluation. A notable aspect that warrants attention is the absence of user-friendly graphical interfaces akin to the SLIM pipeline [82]. Demonstrating distinct advantages over conventional methods, eDNA metabarcoding emerges as a non-invasive technique that excels in detecting rare species and exhibits heightened sensitivity in species identification. However, the technique does bear limitations as it is incapable of providing insights into attributes such as size, weight, or age. Moreover, the chosen genetic marker may at times fail to differentiate between closely related species, as exemplified in this study by the non-distinguishability of Perca fluviatilis and Sander lucioperca. The reference database, a critical component, still grapples with underrepresentation and the presence of occasionally suboptimal sequences [83]. Notably, the genetic marker employed in this study specifically targets vertebrates; thus, accommodating other vertebrate species in future investigations could be considered. In most fish eDNA metabarcoding studies, the 12S rRNA gene is the most widely used due to its compatibility with degraded DNA, taxonomic coverage, and strong support from reference databases [84]. However, Cytochrome c oxidase I may be preferred when detailed species-level identification is critical and the quality of DNA is better preserved [85]. Some studies even use multiple genes to increase detection breadth and confidence [86]. 
Reliability can be further increased by using an internal positive control to validate eDNA workflows; it facilitates reliable interpretation of the generated data, helps track down degraded and inhibited samples, helps avoid false-negative detections, and offers insights into extraction efficiency, which is indispensable for accurate quantification of population densities [87]. The DNA extraction protocol can also significantly influence both the yield and quality of extracted DNA, which, in turn, affects downstream applications. For instance, phenol–chloroform extraction often results in higher purity and greater DNA yield compared to other methods [88]. The technology of the filter columns may likewise affect the extracted DNA. We acknowledge that the sampled water volume was relatively low and that a higher volume might increase taxa detection [89]. Ultimately, this study supports the dependability of eDNA metabarcoding for effective biomonitoring, which contributes to the preservation of global fauna.

5. Conclusions

In this study, five bioinformatics pipelines were applied to one eDNA dataset. Parameters such as taxa detection, read count, alpha and beta diversities, and the Mantel test were compared, revealing significant similarity across pipelines. Therefore, the choice of bioinformatic pipeline did not significantly affect metabarcoding outcomes or their ecological interpretation. As a result, authors should describe the particular bioinformatic procedure used in their studies, including specific details about pipeline selection and related parameters.

Author Contributions

Research data curation, formal analysis, software, visualization, and writing—original draft: R.A.d.S.; funding acquisition, methodology, project administration, resources, supervision, and validation: P.B.; investigation and writing—review and editing: both authors. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the projects QK1920011 “Methodology of predatory fish quantification in drinking-water reservoirs to optimize the management of aquatic ecosystems”, MSM200961901 “The true picture of eDNA”, and by the CAS within the program of Strategy AV 21, Land conservation and restoration.

Institutional Review Board Statement

The animal study protocol was approved by the Ministry of the Environment of the Czech Republic, Department of Species Conservation and Implementation of International Commitments (Approval Code: MZP/2019/630/16, Approval Date: 3 January 2019).

Data Availability Statement

All codes, configuration files, and the reference database can be downloaded from the GitHub repository https://github.com/RomuloAS/eDNA_metabarcoding_pipelines_comparison/tree/Paper-comparison-of-pipelines.

Acknowledgments

We would like to thank the Fish Ecology Unit at the Institute of Hydrobiology, Biology Centre CAS (www.fishecu.cz), the EvoHull group at the University of Hull (www.evohull.org), and Marta Vohnoutová at the University of South Bohemia in České Budějovice for their help with field and laboratory work as well as valuable comments. We thank Chris Steer for English correction and the three reviewers for their valuable comments, which have improved the early version of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Seymour, M. Rapid Progression and Future of Environmental DNA Research. Commun. Biol. 2019, 2, 80.
  2. Taberlet, P.; Coissac, E.; Hajibabaei, M.; Rieseberg, L.H. Environmental DNA. Mol. Ecol. 2012, 21, 1789–1793.
  3. Hänfling, B.; Lawson Handley, L.; Read, D.S.; Hahn, C.; Li, J.; Nichols, P.; Blackman, R.C.; Oliver, A.; Winfield, I.J. Environmental DNA Metabarcoding of Lake Fish Communities Reflects Long-term Data from Established Survey Methods. Mol. Ecol. 2016, 25, 3101–3119.
  4. Kottelat, M.; Freyhof, J. Handbook of European Freshwater Fishes; Publications Kottelat: Cornol, Switzerland, 2007.
  5. Dejean, T.; Valentini, A.; Duparc, A.; Pellier-Cuit, S.; Pompanon, F.; Taberlet, P.; Miaud, C. Persistence of Environmental DNA in Freshwater Ecosystems. PLoS ONE 2011, 6, e23398.
  6. Strickler, K.M.; Fremier, A.K.; Goldberg, C.S. Quantifying Effects of UV-B, Temperature, and pH on eDNA Degradation in Aquatic Microcosms. Biol. Conserv. 2015, 183, 85–92.
  7. Deiner, K.; Altermatt, F. Transport Distance of Invertebrate Environmental DNA in a Natural River. PLoS ONE 2014, 9, e88786.
  8. Valentini, A.; Taberlet, P.; Miaud, C.; Civade, R.; Herder, J.; Thomsen, P.F.; Bellemain, E.; Besnard, A.; Coissac, E.; Boyer, F.; et al. Next-generation Monitoring of Aquatic Biodiversity Using Environmental DNA Metabarcoding. Mol. Ecol. 2016, 25, 929–942.
  9. Miya, M.; Sato, Y.; Fukunaga, T.; Sado, T.; Poulsen, J.Y.; Sato, K.; Minamoto, T.; Yamamoto, S.; Yamanaka, H.; Araki, H.; et al. MiFish, a Set of Universal PCR Primers for Metabarcoding Environmental DNA from Fishes: Detection of More than 230 Subtropical Marine Species. R. Soc. Open Sci. 2015, 2, 150088.
  10. Quail, M.; Smith, M.E.; Coupland, P.; Otto, T.D.; Harris, S.R.; Connor, T.R.; Bertoni, A.; Swerdlow, H.P.; Gu, Y. A Tale of Three next Generation Sequencing Platforms: Comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq Sequencers. BMC Genom. 2012, 13, 341.
  11. Stoler, N.; Nekrutenko, A. Sequencing Error Profiles of Illumina Sequencing Instruments. NAR Genom. Bioinform. 2021, 3, lqab019.
  12. Laehnemann, D.; Borkhardt, A.; McHardy, A.C. Denoising DNA Deep Sequencing Data—High-Throughput Sequencing Errors and Their Correction. Brief. Bioinform. 2016, 17, 154–179.
  13. Callahan, B.J.; McMurdie, P.J.; Rosen, M.J.; Han, A.W.; Johnson, A.J.A.; Holmes, S.P. DADA2: High-Resolution Sample Inference from Illumina Amplicon Data. Nat. Methods 2016, 13, 581–583.
  14. Taberlet, P.; Bonin, A.; Zinger, L.; Coissac, E. Environmental DNA: For Biodiversity Research and Monitoring, 1st ed.; Oxford University Press: Oxford, UK, 2018.
  15. Sato, Y.; Miya, M.; Fukunaga, T.; Sado, T.; Iwasaki, W. MitoFish and MiFish Pipeline: A Mitochondrial Genome Database of Fish with an Analysis Pipeline for Environmental DNA Metabarcoding. Mol. Biol. Evol. 2018, 35, 1553–1555.
  16. Curd, E.E.; Gold, Z.; Kandlikar, G.S.; Gomer, J.; Ogden, M.; O’Connell, T.; Pipes, L.; Schweizer, T.M.; Rabichow, L.; Lin, M.; et al. Anacapa Toolkit: An Environmental DNA Toolkit for Processing Multilocus Metabarcode Datasets. Methods Ecol. Evol. 2019, 10, 1469–1475.
  17. Zafeiropoulos, H.; Viet, H.Q.; Vasileiadou, K.; Potirakis, A.; Arvanitidis, C.; Topalis, P.; Pavloudi, C.; Pafilis, E. PEMA: A Flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S Ribosomal RNA, ITS, and COI Marker Genes. GigaScience 2020, 9, giaa022.
  18. Li, Z.; Zhao, W.; Jiang, Y.; Wen, Y.; Li, M.; Liu, L.; Zou, K. New Insights into Biologic Interpretation of Bioinformatic Pipelines for Fish eDNA Metabarcoding: A Case Study in Pearl River Estuary. J. Environ. Manag. 2024, 368, 122136.
  19. Gardner, P.P.; Watson, R.J.; Morgan, X.C.; Draper, J.L.; Finn, R.D.; Morales, S.E.; Stott, M.B. Identifying Accurate Metagenome and Amplicon Software via a Meta-Analysis of Sequence to Taxonomy Benchmarking Studies. PeerJ 2019, 7, e6160.
  20. Van Den Berg, C.P.; Troscianko, J.; Endler, J.A.; Marshall, N.J.; Cheney, K.L. Quantitative Colour Pattern Analysis (QCPA): A Comprehensive Framework for the Analysis of Colour Patterns in Nature. Methods Ecol. Evol. 2020, 11, 316–332.
  21. Callahan, B.J.; McMurdie, P.J.; Holmes, S.P. Exact Sequence Variants Should Replace Operational Taxonomic Units in Marker-Gene Data Analysis. ISME J. 2017, 11, 2639–2643. [Google Scholar] [CrossRef]
  22. Gao, X.; Lin, H.; Revanna, K.; Dong, Q. A Bayesian Taxonomic Classification Method for 16S rRNA Gene Sequences with Improved Species-Level Accuracy. BMC Bioinform. 2017, 18, 247. [Google Scholar] [CrossRef]
  23. Normandeau, E. Environmental DNA Metabarcoding Analysis. Available online: https://github.com/enormandeau/barque (accessed on 3 December 2020).
  24. Rognes, T.; Flouri, T.; Nichols, B.; Quince, C.; Mahé, F. VSEARCH: A Versatile Open Source Tool for Metagenomics. PeerJ 2016, 4, e2584. [Google Scholar] [CrossRef] [PubMed]
  25. Hahn, C.; Lund, D. metaBEAT—metaBarcoding and Environmental DNA Analysis Tool. Available online: https://github.com/HullUni-bioinformatics/metaBEAT (accessed on 3 January 2021).
  26. Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T.L. BLAST+: Architecture and Applications. BMC Bioinform. 2009, 10, 421. [Google Scholar] [CrossRef] [PubMed]
  27. SEQme. Microbiome and Metagenome Data Analysis Workshop, České Budějovice, Czech Republic. 2018. Available online: https://www.seqme.eu/en/ (accessed on 4 June 2018).
  28. Wang, Q.; Garrity, G.M.; Tiedje, J.M.; Cole, J.R. Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl. Environ. Microbiol. 2007, 73, 5261–5267. [Google Scholar] [CrossRef] [PubMed]
  29. Lobl, R.T.; Maenza, R.M. Androgenization: Alterations in Uterine Growth and Morphology. Biol. Reprod. 1975, 13, 255–268. [Google Scholar] [CrossRef]
  30. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  31. Whittier, T.R. Development of IBI Metrics for Lakes in Southern New England. In Assessing the Sustainability and Biological Integrity of Water Resources Using Fish Communities; Simon, T.P., Ed.; CRC Press: Boca Raton, FL, USA, 2020; pp. 563–582. [Google Scholar] [CrossRef]
  32. Koch, L.F. Index of Biotal Dispersity. Ecology 1957, 38, 145–148. [Google Scholar] [CrossRef]
  33. Bray, J.R.; Curtis, J.T. An Ordination of the Upland Forest Communities of Southern Wisconsin. Ecol. Monogr. 1957, 27, 325–349. [Google Scholar] [CrossRef]
  34. Říha, M.; Ricard, D.; Vašek, M.; Prchalová, M.; Mrkvička, T.; Jůza, T.; Čech, M.; Draštík, V.; Muška, M.; Kratochvíl, M.; et al. Patterns in Diel Habitat Use of Fish Covering the Littoral and Pelagic Zones in a Reservoir. Hydrobiologia 2015, 747, 111–131. [Google Scholar] [CrossRef]
  35. Blabolil, P.; Ricard, D.; Peterka, J.; Říha, M.; Jůza, T.; Vašek, M.; Prchalová, M.; Čech, M.; Muška, M.; Seďa, J.; et al. Predicting Asp and Pikeperch Recruitment in a Riverine Reservoir. Fish. Res. 2016, 173, 45–52. [Google Scholar] [CrossRef]
  36. Říha, M.; Jůza, T.; Prchalová, M.; Mrkvička, T.; Čech, M.; Draštík, V.; Muška, M.; Kratochvíl, M.; Peterka, J.; Tušer, M.; et al. The Size Selectivity of the Main Body of a Sampling Pelagic Pair Trawl in Freshwater Reservoirs during the Night. Fish. Res. 2012, 127–128, 56–60. [Google Scholar] [CrossRef]
  37. Jůza, T.; Ricard, D.; Blabolil, P.; Čech, M.; Draštík, V.; Frouzová, J.; Muška, M.; Peterka, J.; Prchalová, M.; Říha, M.; et al. Species-Specific Gradients of Juvenile Fish Density and Size in Pelagic Areas of Temperate Reservoirs. Hydrobiologia 2015, 762, 169–181. [Google Scholar] [CrossRef]
  38. Vejřík, L.; Vejříková, I.; Kočvara, L.; Blabolil, P.; Peterka, J.; Sajdlová, Z.; Jůza, T.; Šmejkal, M.; Kolařík, T.; Bartoň, D.; et al. The Pros and Cons of the Invasive Freshwater Apex Predator, European Catfish Silurus glanis, and Powerful Angling Technique for Its Population Control. J. Environ. Manag. 2019, 241, 374–382. [Google Scholar] [CrossRef] [PubMed]
  39. Blabolil, P.; Čech, M.; Draštík, V.; Holubová, M.; Kočvara, L.; Kubečka, J.; Muška, M.; Prchalová, M.; Říha, M.; Sajdlová, Z.; et al. Less Is More—Basic Quantitative Indices for Fish Can Be Achieved with Reduced Gillnet Sampling. Fish. Res. 2021, 240, 105983. [Google Scholar] [CrossRef]
  40. Sellers, G.S.; Di Muri, C.; Gómez, A.; Hänfling, B. Mu-DNA: A Modular Universal DNA Extraction Method Adaptable for a Wide Range of Sample Types. MBMG 2018, 2, e24556. [Google Scholar] [CrossRef]
  41. Riaz, T.; Shehzad, W.; Viari, A.; Pompanon, F.; Taberlet, P.; Coissac, E. ecoPrimers: Inference of New DNA Barcode Markers from Whole Genome Sequence Analysis. Nucleic Acids Res. 2011, 39, e145. [Google Scholar] [CrossRef]
  42. Blabolil, P.; Harper, L.R.; Říčanová, Š.; Sellers, G.; Di Muri, C.; Jůza, T.; Vašek, M.; Sajdlová, Z.; Rychtecký, P.; Znachor, P.; et al. Environmental DNA Metabarcoding Uncovers Environmental Correlates of Fish Communities in Spatially Heterogeneous Freshwater Habitats. Ecol. Indic. 2021, 126, 107698. [Google Scholar] [CrossRef]
  43. Kozlov, A.M.; Zhang, J.; Yilmaz, P.; Glöckner, F.O.; Stamatakis, A. Phylogeny-Aware Identification and Correction of Taxonomically Mislabeled Sequences. Nucleic Acids Res. 2016, 44, 5022–5033. [Google Scholar] [CrossRef]
  44. Martin, M. Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads. EMBnet J. 2011, 17, 10. [Google Scholar] [CrossRef]
  45. Hannon, G.J. Fastx-Toolkit. Available online: https://www.hannonlab.org/resources/ (accessed on 20 December 2020).
  46. Langmead, B.; Salzberg, S.L. Fast Gapped-Read Alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359. [Google Scholar] [CrossRef]
  47. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A Flexible Trimmer for Illumina Sequence Data. Bioinformatics 2014, 30, 2114–2120. [Google Scholar] [CrossRef]
  48. Magoč, T.; Salzberg, S.L. FLASH: Fast Length Adjustment of Short Reads to Improve Genome Assemblies. Bioinformatics 2011, 27, 2957–2963. [Google Scholar] [CrossRef] [PubMed]
  49. Edgar, R.C. Search and Clustering Orders of Magnitude Faster than BLAST. Bioinformatics 2010, 26, 2460–2461. [Google Scholar] [CrossRef] [PubMed]
  50. Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data. Available online: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (accessed on 23 December 2020).
  51. Cox, M.P.; Peterson, D.A.; Biggs, P.J. SolexaQA: At-a-Glance Quality Assessment of Illumina Second-Generation Sequencing Data. BMC Bioinform. 2010, 11, 485. [Google Scholar] [CrossRef] [PubMed]
  52. Schmieder, R.; Lim, Y.W.; Rohwer, F.; Edwards, R. TagCleaner: Identification and Removal of Tag Sequences from Genomic and Metagenomic Datasets. BMC Bioinform. 2010, 11, 341. [Google Scholar] [CrossRef]
  53. Aronesty, E. Comparison of Sequencing Utility Programs. Open Bioinform. J. 2013, 7, 1–8. [Google Scholar] [CrossRef]
  54. Hansen, M.A. Biopieces: A Bioinformatics Toolset and Framework. Available online: http://www.biopieces.org/ (accessed on 20 December 2020).
  55. R Core Team. R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. Available online: https://cran.r-project.org/ (accessed on 1 December 2020).
  56. Lawson Handley, L.; Read, D.S.; Winfield, I.J.; Kimbell, H.; Johnson, H.; Li, J.; Hahn, C.; Blackman, R.; Wilcox, R.; Donnelly, R.; et al. Temporal and Spatial Variation in Distribution of Fish Environmental DNA in England’s Largest Lake. Environ. DNA 2019, 1, 26–39. [Google Scholar] [CrossRef]
  57. Oksanen, J.; Simpson, G.L.; Blanchet, F.G.; Kindt, R.; Legendre, P.; Minchin, P.R.; O’Hara, R.B.; Solymos, P.; Stevens, M.H.H.; Szoecs, E.; et al. Vegan: Community Ecology Package, 2001, 2.6-10. Available online: https://doi.org/10.32614/CRAN.package.vegan (accessed on 1 December 2020).
  58. IUCN. Gasterosteus aculeatus; The IUCN Red List of Threatened Species. Available online: https://www.iucnredlist.org/ (accessed on 3 January 2021).
  59. O’Rourke, D.R.; Bokulich, N.A.; Jusino, M.A.; MacManes, M.D.; Foster, J.T. A Total Crapshoot? Evaluating Bioinformatic Decisions in Animal Diet Metabarcoding Analyses. Ecol. Evol. 2020, 10, 9721–9739. [Google Scholar] [CrossRef]
  60. Nosova, A.Y.; Kipen, V.N.; Tsar, A.I.; Lemesh, V.A. Differentiation of Hybrid Progeny of Silver Carp (Hypophthalmichthys molitrix Val.) and Bighead Carp (H. nobilis Rich.) Based on Microsatellite Polymorphism. Russ. J. Genet. 2020, 56, 317–323. [Google Scholar] [CrossRef]
  61. Sepulveda, A.J.; Schabacker, J.; Smith, S.; Al-Chokhachy, R.; Luikart, G.; Amish, S.J. Improved Detection of Rare, Endangered and Invasive Trout in Using a New Large-volume Sampling Method for eDNA Capture. Environ. DNA 2019, 1, 227–237. [Google Scholar] [CrossRef]
  62. Eick, D. Habitat Preferences of the Burbot (Lota lota) from the River Elbe: An Experimental Approach. J. Appl. Ichthyol. 2013, 29, 541–548. [Google Scholar] [CrossRef]
  63. Blabolil, P.; Duras, J.; Jůza, T.; Kočvara, L.; Matěna, J.; Muška, M.; Říha, M.; Vejřík, L.; Holubová, M.; Peterka, J. Assessment of Burbot Lota lota (L. 1758) Population Sustainability in Central European Reservoirs. J. Fish Biol. 2018, 92, 1545–1559. [Google Scholar] [CrossRef] [PubMed]
  64. Mehner, T.; Diekmann, M.; Brämick, U.; Lemcke, R. Composition of Fish Communities in German Lakes as Related to Lake Morphology, Trophic State, Shore Structure and Human-use Intensity. Freshw. Biol. 2005, 50, 70–85. [Google Scholar] [CrossRef]
  65. Willemsen, J. Fishery-Aspects of Eutrophication. Hydrobiol. Bull. 1980, 14, 12–21. [Google Scholar] [CrossRef]
  66. Eichmiller, J.J.; Best, S.E.; Sorensen, P.W. Effects of Temperature and Trophic State on Degradation of Environmental DNA in Lake Water. Environ. Sci. Technol. 2016, 50, 1859–1867. [Google Scholar] [CrossRef]
  67. Leuven, R.S.E.W.; Hendriks, A.J.; Huijbregts, M.A.J.; Lenders, H.J.R.; Matthews, J.; Velde, G.V.D. Differences in Sensitivity of Native and Exotic Fish Species to Changes in River Temperature. Curr. Zool. 2011, 57, 852–862. [Google Scholar] [CrossRef]
  68. Van De Pol, I.; Flik, G.; Gorissen, M. Comparative Physiology of Energy Metabolism: Fishing for Endocrine Signals in the Early Vertebrate Pool. Front. Endocrinol. 2017, 8, 36. [Google Scholar] [CrossRef]
  69. De Souza, L.S.; Godwin, J.C.; Renshaw, M.A.; Larson, E. Environmental DNA (eDNA) Detection Probability Is Influenced by Seasonal Activity of Organisms. PLoS ONE 2016, 11, e0165273. [Google Scholar] [CrossRef]
  70. Cherry, D.S.; Dickson, K.L.; Cairns, J., Jr.; Stauffer, J.R. Preferred, Avoided, and Lethal Temperatures of Fish During Rising Temperature Conditions. J. Fish. Res. Board Can. 1977, 34, 239–246. [Google Scholar] [CrossRef]
  71. Cherry, D.S.; Cairns, J. Biological Monitoring Part V—Preference and Avoidance Studies. Water Res. 1982, 16, 263–301. [Google Scholar] [CrossRef]
  72. Schenekar, T.; Schletterer, M.; Lecaudey, L.A.; Weiss, S.J. Reference Databases, Primer Choice, and Assay Sensitivity for Environmental Metabarcoding: Lessons Learnt from a Re-evaluation of an eDNA Fish Assessment in the Volga Headwaters. River Res. Appl. 2020, 36, 1004–1013. [Google Scholar] [CrossRef]
  73. Lladó Fernández, S.; Větrovský, T.; Baldrian, P. The Concept of Operational Taxonomic Units Revisited: Genomes of Bacteria That Are Regarded as Closely Related Are Often Highly Dissimilar. Folia Microbiol. 2019, 64, 19–23. [Google Scholar] [CrossRef] [PubMed]
  74. Nearing, J.T.; Douglas, G.M.; Comeau, A.M.; Langille, M.G.I. Denoising the Denoisers: An Independent Evaluation of Microbiome Sequence Error-Correction Approaches. PeerJ 2018, 6, e5364. [Google Scholar] [CrossRef] [PubMed]
  75. Barnes, C.J.; Rasmussen, L.; Asplund, M.; Knudsen, S.W.; Clausen, M.-L.; Agner, T.; Hansen, A.J. Comparing DADA2 and OTU Clustering Approaches in Studying the Bacterial Communities of Atopic Dermatitis. J. Med. Microbiol. 2020, 69, 1293–1302. [Google Scholar] [CrossRef] [PubMed]
  76. Pauvert, C.; Buée, M.; Laval, V.; Edel-Hermann, V.; Fauchery, L.; Gautier, A.; Lesur, I.; Vallance, J.; Vacher, C. Bioinformatics Matters: The Accuracy of Plant and Soil Fungal Community Data Is Highly Dependent on the Metabarcoding Pipeline. Fungal Ecol. 2019, 41, 23–33. [Google Scholar] [CrossRef]
  77. Antich, A.; Palacin, C.; Wangensteen, O.S.; Turon, X. To Denoise or to Cluster, That Is Not the Question: Optimizing Pipelines for COI Metabarcoding and Metaphylogeography. BMC Bioinform. 2021, 22, 177. [Google Scholar] [CrossRef]
  78. Lamb, P.D.; Hunter, E.; Pinnegar, J.K.; Creer, S.; Davies, R.G.; Taylor, M.I. How Quantitative Is Metabarcoding: A Meta-analytical Approach. Mol. Ecol. 2019, 28, 420–430. [Google Scholar] [CrossRef]
  79. Zielezinski, A.; Vinga, S.; Almeida, J.; Karlowski, W.M. Alignment-Free Sequence Comparison: Benefits, Applications, and Tools. Genome Biol. 2017, 18, 186. [Google Scholar] [CrossRef]
  80. Hakimzadeh, A.; Abdala Asbun, A.; Albanese, D.; Bernard, M.; Buchner, D.; Callahan, B.; Caporaso, J.G.; Curd, E.; Djemiel, C.; Brandström Durling, M.; et al. Pile of Pipelines: An Overview of the Bioinformatics Software for Metabarcoding Data Analyses. Mol. Ecol. Resour. 2024, 24, e13847. [Google Scholar] [CrossRef]
  81. Mathon, L.; Valentini, A.; Guérin, P.; Normandeau, E.; Noel, C.; Lionnet, C.; Boulanger, E.; Thuiller, W.; Bernatchez, L.; Mouillot, D.; et al. Benchmarking Bioinformatic Tools for Fast and Accurate eDNA Metabarcoding Species Identification. Mol. Ecol. Resour. 2021, 21, 2565–2579. [Google Scholar] [CrossRef]
  82. Dufresne, Y.; Lejzerowicz, F.; Perret-Gentil, L.A.; Pawlowski, J.; Cordier, T. SLIM: A Flexible Web Application for the Reproducible Processing of Environmental DNA Metabarcoding Data. BMC Bioinform. 2019, 20, 88. [Google Scholar] [CrossRef]
  83. Weigand, H.; Beermann, A.J.; Čiampor, F.; Costa, F.O.; Csabai, Z.; Duarte, S.; Geiger, M.F.; Grabowski, M.; Rimet, F.; Rulik, B.; et al. DNA Barcode Reference Libraries for the Monitoring of Aquatic Biota in Europe: Gap-Analysis and Recommendations for Future Work. Sci. Total Environ. 2019, 678, 499–524. [Google Scholar] [CrossRef] [PubMed]
  84. Günther, B.; Knebelsberger, T.; Neumann, H.; Laakmann, S.; Martínez Arbizu, P. Metabarcoding of Marine Environmental DNA Based on Mitochondrial and Nuclear Genes. Sci. Rep. 2018, 8, 14822. [Google Scholar] [CrossRef] [PubMed]
  85. Li, Y.; Tang, M.; Lu, S.; Zhang, X.; Fang, C.; Tan, L.; Xiong, F.; Zeng, H.; He, S. A Comparative Evaluation of eDNA Metabarcoding Primers in Fish Community Monitoring in the East Lake. Water 2024, 16, 631. [Google Scholar] [CrossRef]
  86. Shu, L.; Ludwig, A.; Peng, Z. Standards for Methods Utilizing Environmental DNA for Detection of Fish Species. Genes 2020, 11, 296. [Google Scholar] [CrossRef]
  87. Brys, R.; Everts, T.; Halfmaerten, D.; Neyrinck, S.; Mauvisseau, Q. The Use of Multiple Markers and Internal Positive Controls Significantly Improves Species eDNA Detection Rates and Data Reliability. ARPHA Conf. Abstr. 2021, 4, e65064. [Google Scholar] [CrossRef]
  88. Ruan, H.-T.; Wang, R.-L.; Li, H.-T.; Liu, L.; Kuang, T.-X.; Li, M.; Zou, K.-S. Effects of Sampling Strategies and DNA Extraction Methods on eDNA Metabarcoding: A Case Study of Estuarine Fish Diversity Monitoring. Zool. Res. 2022, 43, 192–204. [Google Scholar] [CrossRef]
  89. Schabacker, J.C.; Amish, S.J.; Ellis, B.K.; Gardner, B.; Miller, D.L.; Rutledge, E.A.; Sepulveda, A.J.; Luikart, G. Increased eDNA Detection Sensitivity Using a Novel High-volume Water Sampling Method. Environ. DNA 2020, 2, 244–251. [Google Scholar] [CrossRef]
Figure 1. Maps of Europe (a), Czechia (b), and the Klíčava (c), Římov (d), and Žlutice (e) reservoirs, showing the water sampling localities.
Figure 2. The number of reads assigned to taxa, considering pipelines, reservoirs, and seasons, excluding positive and negative controls. Taxa highlighted in red were removed after applying a threshold of 0.1% of the total number of reads assigned for each sample.
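The per-sample threshold described in the Figure 2 caption can be illustrated with a minimal sketch. The function name `filter_low_abundance`, the example read counts, and the choice to keep taxa that sit exactly at the cutoff are assumptions made for illustration; they are not taken from the study or from any of the compared pipelines.

```python
from typing import Dict


def filter_low_abundance(sample_reads: Dict[str, int],
                         threshold: float = 0.001) -> Dict[str, int]:
    """Keep taxa whose read count reaches `threshold` (0.1% by default)
    of the total reads assigned in this one sample."""
    total = sum(sample_reads.values())
    if total == 0:
        return {}
    cutoff = threshold * total
    # Assumption: a taxon exactly at the cutoff is kept (>=), not removed.
    return {taxon: n for taxon, n in sample_reads.items() if n >= cutoff}


# Hypothetical read counts for a single sample (not data from the study).
sample = {
    "Rutilus rutilus": 52_000,
    "Perca fluviatilis": 47_950,
    "Salmo trutta": 50,  # below 0.1% of 100,000 reads, so it is dropped
}
kept = filter_low_abundance(sample)
print(sorted(kept))
```

Because the cutoff is computed from each sample's own total, the same absolute read count can pass in a shallowly sequenced sample and fail in a deeply sequenced one.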
Figure 3. Venn diagrams for the number of taxa, considering pipelines, reservoirs, and seasons, excluding positive and negative controls. The numbers indicate the taxa shared among the bioinformatic pipelines used.
Figure 4. Beta diversity (Jaccard index), considering pipelines, reservoirs, and seasons. The X-axis explains 41.62% of the variance and the Y-axis explains 27.59%.
Figure 5. Beta diversity (Bray–Curtis index), considering pipelines, reservoirs, and seasons. The X-axis explains 40.08% of the variance and the Y-axis explains 33.50%.
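The two beta diversity measures named in Figures 4 and 5 can be sketched for a single pair of samples as follows. This is an illustration in plain Python rather than the authors' analysis code: the function names and the read counts are hypothetical, and the Jaccard measure is assumed here to be computed on presence/absence data.

```python
from typing import Dict


def jaccard_dissimilarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Presence/absence Jaccard dissimilarity: 1 - |A ∩ B| / |A ∪ B|."""
    present_a = {taxon for taxon, n in a.items() if n > 0}
    present_b = {taxon for taxon, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis_dissimilarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Abundance-based Bray-Curtis dissimilarity:
    1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    taxa = set(a) | set(b)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if total == 0:
        return 0.0
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1.0 - 2.0 * shared / total


# Hypothetical read counts for two samples (not data from the study).
sample_1 = {"Rutilus rutilus": 60, "Perca fluviatilis": 40}
sample_2 = {"Rutilus rutilus": 30, "Abramis brama": 70}

print(jaccard_dissimilarity(sample_1, sample_2))
print(bray_curtis_dissimilarity(sample_1, sample_2))
```

Jaccard responds only to which taxa are present, whereas Bray-Curtis also responds to how reads are distributed among them, so the two measures can rank the same pair of samples differently.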
Table 1. Geographical and morphological characteristics of the reservoirs and a list of traditional methods used for fish monitoring in the three studied reservoirs between 2018 and 2020.
| Parameter | Klíčava | Římov | Žlutice |
|---|---|---|---|
| Trophic state | mesotrophic | eutrophic | eutrophic |
| Elevation above sea level (m) | 294 | 470 | 509 |
| Volume (mil. m³) | 8.3 | 34 | 14 |
| Flooded area (km²) | 0.62 | 2.1 | 1.6 |
| Maximum depth (m) | 34 | 42 | 23 |
| Average depth (m) | 13 | 16 | 9 |
| Boat electrofishing | 2018–2020 | 2018–2020 | 2018–2020 |
| Shore seining | 2019–2020 | 2018–2020 | 2019 |
| Pelagic trawling | 2018–2020 | 2018–2020 | 2018–2020 |
| Hook lines | 2018–2020 | 2018–2019 | 2018–2019 |
| Benthic and pelagic gillnets | 2018–2020 | 2018–2020 | 2018–2020 |
Table 2. Species composition detected by FishEcU members (www.fishecu.cz) using traditional methods in the three reservoirs surveyed between 2018 and 2020.
Species | Klíčava | Římov | Žlutice
Lampetra planeri X
Acipenser baerii X
Anguilla anguillaXXX
Rutilus rutilusXXX
Chondrostoma nasus X
Squalius cephalusXXX
Alburnus alburnusXXX
Blicca bjoerkna X
Abramis bramaXXX
Leuciscus idus XX
Leuciscus leuciscusXXX
Leuciscus aspiusXXX
Scardinius erythrophthalmusXXX
Pseudorasbora parvaXXX
Gobio gobioXX
Tinca tincaXXX
Hypophthalmichthys molitrix X
Hypophthalmichthys nobilisX
Ctenopharyngodon idella XX
Cyprinus carpioXXX
Carassius auratusXXX
Barbatula barbatula X
Esox luciusXXX
Sander luciopercaXXX
Perca fluviatilisXXX
Gymnocephalus cernuaXXX
Lepomis gibbosus X
Oncorhynchus mykiss X
Salmo trutta X
Coregonus maraena X
Silurus glanisXXX
Lota lota XX
Table 3. Number of reads after each step of data processing, including positive and negative controls. The SEQme pipeline applies merging before trimming and filtering.
| Data Processing Steps | Anacapa | Barque | metaBEAT | MiFish | SEQme |
|---|---|---|---|---|---|
| Total from original data | 22,464,147 | 22,464,147 | 22,464,147 | 22,464,147 | 22,464,147 |
| Total after demultiplexing | 20,910,517 | 20,910,517 | 20,910,517 | 20,910,517 | 20,910,517 |
| Trimmed and filtered | 19,095,153 | 18,632,248 | 18,513,853 | 20,910,517 | 19,384,077 |
| Merged | 18,944,446 | 18,619,523 | 17,792,519 | 17,145,436 | 19,384,077 |
| Filtered and chimera removed | 18,271,442 | 17,889,794 | 17,782,513 | 15,876,672 | 17,454,382 |
| Assigned | 11,886,532 | 11,112,721 | 10,347,227 | 8,940,467 | 10,650,189 |
| Unassigned (original data) | 10,577,615 | 11,351,412 | 12,114,223 | 13,523,761 | 10,794,328 |
| Unassigned (demultiplexed data) | 9,023,985 | 9,797,796 | 10,563,290 | 11,970,037 | 9,950,328 |
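To compare pipelines on a common scale, the assigned read counts in Table 3 can be expressed as a share of the demultiplexed reads. The sketch below only restates the "Total after demultiplexing" and "Assigned" rows of the table; the variable and function names are illustrative assumptions and the calculation is not part of the original analysis.

```python
# Reads remaining after demultiplexing (identical for all pipelines, Table 3).
DEMULTIPLEXED_READS = 20_910_517

# "Assigned" row of Table 3, one entry per pipeline.
ASSIGNED_READS = {
    "Anacapa": 11_886_532,
    "Barque": 11_112_721,
    "metaBEAT": 10_347_227,
    "MiFish": 8_940_467,
    "SEQme": 10_650_189,
}


def assigned_fraction(assigned: int, total: int) -> float:
    """Share of the input reads that received a taxonomic assignment."""
    return assigned / total


fractions = {
    pipeline: assigned_fraction(reads, DEMULTIPLEXED_READS)
    for pipeline, reads in ASSIGNED_READS.items()
}

# Print pipelines from the highest to the lowest assigned share.
for pipeline, share in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pipeline}: {share:.1%}")
```

Normalizing by the demultiplexed total rather than the raw total removes the reads lost before any pipeline-specific step, so the remaining differences reflect the pipelines themselves.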
Table 4. The number of taxa detected and assigned reads, considering pipelines, reservoirs, and seasons, excluding positive and negative controls.
| Bioinformatic Pipeline | Metric | Klíčava (Summer) | Římov (Summer) | Žlutice (Summer) | Klíčava (Autumn) | Římov (Autumn) | Žlutice (Autumn) |
|---|---|---|---|---|---|---|---|
| Anacapa | Taxa | 12 | 20 | 13 | 17 | 30 | 21 |
| Anacapa | Reads | 1,027,223 | 3,207,005 | 1,171,369 | 152,115 | 1,748,946 | 509,967 |
| Barque | Taxa | 12 | 22 | 12 | 16 | 29 | 20 |
| Barque | Reads | 1,125,257 | 3,610,686 | 1,171,767 | 139,839 | 1,826,975 | 535,515 |
| metaBEAT | Taxa | 11 | 21 | 11 | 15 | 28 | 19 |
| metaBEAT | Reads | 1,060,167 | 3,417,605 | 1,102,758 | 129,156 | 1,697,086 | 452,972 |
| MiFish | Taxa | 12 | 21 | 12 | 16 | 29 | 20 |
| MiFish | Reads | 916,496 | 2,927,711 | 939,469 | 113,122 | 1,490,978 | 432,617 |
| SEQme | Taxa | 10 | 19 | 11 | 17 | 29 | 18 |
| SEQme | Reads | 1,230,336 | 3,316,388 | 1,228,640 | 135,439 | 1,729,903 | 559,636 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
