The eDNA-Container App: A Simple-to-Use Cross-Platform Package for the Reproducible Analysis of eDNA Sequencing Data

Abstract: The analysis of environmental DNA (eDNA) is a powerful and non-invasive method for monitoring the presence of species in ecosystems. However, ecologists and laboratory staff can find it challenging to use eDNA analysis software effectively due to the unfamiliar command-line interfaces used by many of these packages. Therefore, we developed the eDNA-container app, a free and open-source software package that provides a simple user-friendly interface for eDNA analysis. The application is based on the popular QIIME2 library and is distributed as a Docker image. The use of Docker makes it compatible with a wide range of operating systems and facilitates the reproducible analysis of data across different laboratories. The application includes a point-and-click user interface for selecting sequencing files, configuring parameters, and accessing the results. Key pipeline outputs, such as sequence quality plots, denoising, and ASV generation statistics, are automatically included in a PDF report. This open-source and freely available analysis package should be a valuable tool for scientists using eDNA in biodiversity and biosecurity applications.


Introduction
Environmental DNA (eDNA) is genetic material originating from organisms in the environment, including shed cells, secretions, and whole microorganisms [1]. The high-throughput sequencing of eDNA has proven to be a powerful tool in ecology and biosecurity as it can be used to monitor the presence of species, assess the impact of human activities on ecosystems, and track the spread of invasive species [2][3][4][5]. eDNA sampling is also less invasive than traditional biodiversity monitoring methods such as electrofishing, and being relatively inexpensive, it is a cost-effective option for large-scale sampling surveys. Due to the ease of sampling, eDNA can be used to survey a wide range of habitats, including those that are difficult to access, such as deep lakes and remote streams [6][7][8].
There are several bioinformatics tools that are commonly used for eDNA analysis, including vsearch, usearch, MiFish, and QIIME2 [9-13]. These tools clean and process short-read sequencing data, assign species to sequences by comparison with reference sequences, and quantify the species diversity of each sample. However, the unfamiliar command-line interface adopted by many of these packages can make it difficult for wet lab staff and field ecologists to use the software effectively. Additionally, Microsoft Windows versions of these packages are often not available due to the popularity of UNIX/Linux amongst developers of scientific software.
QIIME2 is one of the most popular software packages for DNA barcoding-based community analyses [14][15][16]. This open-source package includes plugins for workflows such as cutadapt for quality trimming, DADA2 for denoising and building amplicon sequence variants (ASVs), as well as tools for building custom taxonomic classifiers [17][18][19]. This package was developed to run natively on Linux systems; however, a command-line interface can be accessed on Microsoft Windows via Windows Subsystem for Linux (WSL) or using a Linux virtual machine. WSL and virtual machines are relatively advanced computer utilities, which could hinder the wider adoption of this package for eDNA analysis.
Docker is a software platform that allows developers to build, run, and share applications in containers. The containers are lightweight, standalone packages of software that include everything needed to run an application, including code, system libraries, and program settings. Docker makes it easy to deploy applications because it provides a consistent way to package and run software regardless of the underlying operating system. These features have made Docker an important tool for scientific software development because applications will generate consistent outputs irrespective of the underlying computer frameworks being utilized [20][21][22][23].
Here, we introduce the eDNA-container app, which is an eDNA analysis pipeline that uses QIIME2 for amplicon sequence variant generation and taxonomic assignment. The application includes a graphical user interface (GUI) that allows the user to configure runtime and quality control parameters, select primers, and utilize custom taxonomic databases. Key pipeline outputs, such as sequence quality plots and ASV generation statistics, are automatically included in a final PDF report. The final feature counts and taxonomic assignments across all samples are provided in a comma-separated values (CSV) file that can be viewed using spreadsheet software (e.g., Excel or LibreOffice Calc). This free open-source application is available on Docker Hub, and developers can access the underlying Python and Bash code by directly cloning the project's GitHub repository.

Methods
A Snakemake (version 7.24) file defines the execution of a QIIME2 (version 2023.2) eDNA analysis workflow [13,24]. A summary of the Snakemake rules is shown in Figure 1. Snakemake rules automatically build a QIIME2-compatible metadata file based on the FASTQ file names selected by the user through the GUI (Figure S1). The only requirements are that a unique sample name is included in each gzipped FASTQ filename and that the short-read data are paired-end. The metadata file is used to build a QIIME2 data object by running the qiime tools import and qiime demux summarize commands. The qiime cutadapt trim-paired plugin command removes primers and adaptor sequences [17]. The number of primers identified in the sequencing data and the number of reads that pass trimming filters are included in a final run PDF report. The DADA2 plugin is used through the qiime dada2 denoise-paired command, with the denoising parameters p-trunc-len-f, p-trunc-len-r, p-max-ee-f, p-max-ee-r, p-trunc-q, and p-chimera-method set via a configuration file [18]. This configuration file is modified by the user through the GUI (Figure S2).
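The file-name-driven metadata step described above can be sketched as follows. This is a minimal illustration rather than the app's own code: it assumes Illumina-style file names (for example, siteA_S1_L001_R1_001.fastq.gz) and assumes that the metadata file resembles QIIME2's paired-end manifest format (PairedEndFastqManifestPhred33V2). The regular expression and the function names build_manifest and manifest_to_tsv are hypothetical.

```python
import re
from pathlib import PurePath

# Assumed naming convention: <sample>[_S<n>][_L<nnn>]_R1|R2[_<nnn>].fastq.gz
# This pattern is an illustration only, not the rule used by the app.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_(?:S\d+_)?(?:L\d+_)?R(?P<read>[12])(?:_\d+)?\.fastq\.gz$"
)


def build_manifest(fastq_paths):
    """Group paired-end FASTQ paths by sample name.

    Returns a list of (sample, forward_path, reverse_path) tuples sorted by
    sample name, and raises ValueError if a file name is not recognised or a
    sample lacks one of its two read files.
    """
    pairs = {}
    for path in fastq_paths:
        match = READ_PATTERN.match(PurePath(path).name)
        if match is None:
            raise ValueError(f"Unrecognised FASTQ file name: {path}")
        sample, read = match.group("sample"), match.group("read")
        reads = pairs.setdefault(sample, {})
        if read in reads:
            raise ValueError(f"Duplicate R{read} file for sample {sample!r}")
        reads[read] = str(path)

    rows = []
    for sample in sorted(pairs):
        reads = pairs[sample]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"Sample {sample!r} is missing a read file")
        rows.append((sample, reads["1"], reads["2"]))
    return rows


def manifest_to_tsv(rows):
    """Render manifest rows as tab-separated text with a QIIME2-style header."""
    header = "sample-id\tforward-absolute-filepath\treverse-absolute-filepath"
    return "\n".join([header] + ["\t".join(row) for row in rows]) + "\n"
```

Failing early on an unpaired or unrecognised file mirrors the stated requirement that every sample has a unique name and paired-end reads, so that problems surface before the import step is run.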
The eDNA-container app is distributed as a Docker image based on continuumio miniconda3, a bootstrapped version of miniconda. The image was built using Docker version 18.09.7 on an HP workstation running Ubuntu 22.04 LTS. The QIIME2 pipeline's Bash and Python scripts are maintained in a separate git repository, which is cloned into the container as part of the build script. Software tools are installed inside the container through a conda environment YAML file. The Flask library (version 2.3.2) was used to build a browser-based GUI. The GUI is displayed using a virtual server running on the host computer, so no network connection is required, and no data are shared over the internet.
Taxonomy is assigned to the ASVs based on a QIIME2-compatible taxonomic database built using the feature-classifier classify-sklearn command [19]. The pipeline is distributed with a database based on the MIDORI2 (12S rRNA) reference sequences and the Teleo fish amplicon primers (teleoF: 5′-ACACCGCCCGTCACTCT-3′, teleoR: 5′-CTTCCGGTACACTTACCATG-3′) [25,26]. Any compatible QIIME2 database "qza" file, however, can be selected via the GUI (Figure S2). Alpha diversity rarefaction plots and taxonomic barplots are generated by the commands qiime diversity alpha-rarefaction and qiime taxa barplot, respectively. A PDF report (Supplemental File S1) containing run metrics is generated by pandoc (version 7.2.9) from a markdown template populated by the Python library jinja2 (version 3.1.2).

The eDNA-Container App
The eDNA-container app is based on a core QIIME2 pipeline, with data reformatting carried out using Python scripts [13]. The only software requirement is that the free application Docker Desktop is installed on the host system. Pipeline execution is managed by Snakemake with the entire application packaged in a Docker image so that it is cross-platform and will run reproducibly across different computer frameworks [24]. The image can be obtained freely from Docker Hub using the search tag "dwheelerau/edna" or using the Docker pull command from a terminal window. Advanced users with access to Linux can use the pipeline independently of Docker by cloning the conda environment from the supplied environment file and executing the Snakemake workflow manually. A Flask app is used as a GUI front-end that can be accessed using a standard web browser with no data shared across the internet. An extensive user-friendly guide targeted at ecologists and laboratory staff is provided with the package (Supplemental File S2).

The User Interface
The eDNA-container app GUI uses a Flask web interface served on the host computer, so no internet connection is required to use the package. Initially, a folder of paired-end sequencing data is selected using a folder selection dialogue (Figure S1). The pipeline is configured to accept paired-end FASTQ (gzip) sequencing data, which is the standard output from the Illumina MiSeq platform widely adopted by the eDNA research community.
After selecting the sequencing data, a project name is added, as well as the amplicon primer sequences, and the QIIME2 plugin parameters are adjusted (Table 1). The primer sequence information is used by the cutadapt plugin to remove any adaptor or primer sequences contained in the raw reads, with the percentage of reads containing primers included in the final project report [17]. Checking this percentage is a critical quality control step, as reads lacking the expected primer sequences could be an indication of sample misidentifications or poor read quality. The trunc-len-f and trunc-len-r parameters can be adjusted based on the read quality profiles and the size of the expected forward and reverse read overlap (Figure S2). Read quality plots and ASV statistics are presented in the final run report and can be used to adjust the previously described settings. Three chimera removal options are available, including consensus, pooled, or none, as described in the QIIME2 documentation. After entering the specific runtime settings, the pipeline will begin to process the eDNA data, and upon completion, the results are provided as a compressed zip file. In testing on an HP Z440 desktop workstation (Intel Xeon E5-1620, Intel Corporation, Santa Clara, CA, USA) with 20 GB of RAM, 12,000 paired-end reads were processed in <5 min. The taxonomic assignment step is computationally intensive in terms of RAM usage, and for this reason, a minimum of 8 GB of RAM is required to run the software (16 GB is recommended). The results include intermediate files and runtime logs that are useful for troubleshooting and parameter optimization. Output related to the pipeline progress is printed to the Docker terminal window, as this contains useful information should the pipeline report that the run has failed (Figure S3).

Table 1. A summary describing the key settings that are available to the user via the GUI.

Setting | Explanation
Project name | A name for the project (will be used as the project report PDF)
Forward primer | Forward PCR primer sequence for cutadapt primer/adapter removal
Reverse primer | Reverse PCR primer sequence for cutadapt primer/adapter removal
trunc-len-f | Retain n base-pairs of forward read (0 = no trimming)
trunc-len-r | Retain n base-pairs of reverse read (0 = no trimming)
max-ee-f | Forward reads with > number expected errors will be discarded
max-ee-r | Reverse reads with > number expected errors will be discarded
trunc-q | Truncate reads at first instance of quality score ≤ value
chimera-method | Chimera removal method: consensus, pooled, or none
Taxonomic database | File location for a QIIME2-compatible taxonomic database (optional)
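The settings in Table 1 correspond to options of the qiime dada2 denoise-paired command named in the Methods. The sketch below shows one way such settings could be turned into a command line; it is an illustration under stated assumptions, not the app's implementation. The function name build_dada2_command, the settings dictionary layout, and the output file names are hypothetical, and the numeric values in the usage example are placeholders only.

```python
import shlex

# Chimera removal options listed in Table 1.
VALID_CHIMERA_METHODS = {"consensus", "pooled", "none"}


def build_dada2_command(settings, demux_qza, out_prefix):
    """Return the argument list for a qiime dada2 denoise-paired call.

    `settings` is a hypothetical dict keyed by the Table 1 setting names.
    Output artifact names are derived from `out_prefix` for illustration.
    """
    method = settings["chimera-method"]
    if method not in VALID_CHIMERA_METHODS:
        raise ValueError(f"Unknown chimera method: {method!r}")
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-trunc-len-f", str(settings["trunc-len-f"]),
        "--p-trunc-len-r", str(settings["trunc-len-r"]),
        "--p-max-ee-f", str(settings["max-ee-f"]),
        "--p-max-ee-r", str(settings["max-ee-r"]),
        "--p-trunc-q", str(settings["trunc-q"]),
        "--p-chimera-method", method,
        "--o-table", f"{out_prefix}_table.qza",
        "--o-representative-sequences", f"{out_prefix}_rep_seqs.qza",
        "--o-denoising-stats", f"{out_prefix}_stats.qza",
    ]


if __name__ == "__main__":
    # Placeholder values, chosen only to show the shape of the call.
    example_settings = {
        "trunc-len-f": 0,
        "trunc-len-r": 0,
        "max-ee-f": 2.0,
        "max-ee-r": 2.0,
        "trunc-q": 2,
        "chimera-method": "consensus",
    }
    command = build_dada2_command(example_settings, "trimmed.qza", "project")
    print(shlex.join(command))
```

Returning a list of arguments rather than a single string means the command can be passed directly to subprocess.run without shell quoting concerns, while shlex.join gives a readable form for a log or report.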

Pipeline Outputs
The key pipeline outputs are shown in Table 2. The main taxa count spreadsheet summarizes species identifications across all samples and includes the ASV sequence used to assign the taxonomic label. The ASV sequence included in the spreadsheet allows for NCBI-BLAST searches so that the taxonomic identification can be independently verified, which is an important quality control step in eDNA analyses (Supplemental File S3). When multiple ASVs are given the same taxonomic assignment, their counts are summed and the most common variant is presented in the spreadsheet, with the number of variants included in the "Reference_variants" column. Alpha diversity plots are also created, allowing the researcher to determine if the sequencing depth was likely sufficient to identify all species found in each sample. These plots are interactive when viewed using the online QIIME2 viewer. A PDF report is populated with information on the key quality control metrics and ASV statistics (Supplemental File S1). This report contains the read quality plots and DADA2 denoise outputs that are important for quality control and can be used to adjust runtime settings to improve the number of forward and reverse read overlaps [18].

Table 2. A summary of the output files generated by the eDNA-container app (qzv files can be viewed interactively using the QIIME2 viewer).
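The collapsing of multiple ASVs that share a taxonomic assignment can be sketched as follows. This is a simplified illustration, not the app's code: it assumes one total count per ASV rather than per-sample counts, and the function name summarise_taxa and the output keys other than "Reference_variants" are hypothetical. The taxon names and sequences in the usage example are toy placeholders.

```python
from collections import defaultdict


def summarise_taxa(asv_records):
    """Collapse ASV records to one row per taxon.

    `asv_records` is an iterable of (taxon, asv_sequence, count) tuples.
    For each taxon, the counts of all variants are summed, the most common
    variant is kept as the representative sequence, and the number of
    distinct variants is reported in "Reference_variants".
    """
    by_taxon = defaultdict(dict)
    for taxon, sequence, count in asv_records:
        variants = by_taxon[taxon]
        variants[sequence] = variants.get(sequence, 0) + count

    summary = []
    for taxon in sorted(by_taxon):
        variants = by_taxon[taxon]
        # Ties on count are broken by sequence so the result is deterministic.
        top_sequence = max(variants, key=lambda seq: (variants[seq], seq))
        summary.append({
            "taxon": taxon,
            "total_count": sum(variants.values()),
            "sequence": top_sequence,
            "Reference_variants": len(variants),
        })
    return summary


if __name__ == "__main__":
    toy_records = [
        ("taxon_A", "ACGT", 10),
        ("taxon_A", "ACGA", 3),
        ("taxon_B", "TTTT", 5),
    ]
    for row in summarise_taxa(toy_records):
        print(row)
```

Keeping the most abundant variant as the representative sequence gives the user a single sequence per taxon to check with NCBI-BLAST, while the variant count signals when an assignment rests on several distinct ASVs.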

A Comparison to the MiFish eDNA Pipeline
MiFish is a mature eDNA processing pipeline that is under active development [11,12]. We were interested in testing the performance of the eDNA-container app against this well-established pipeline. Therefore, we used the eDNA-container app to analyze the test dataset provided by MiFish, which is based on an eDNA dataset generated by Brys et al. [27]. The species lists reported across all samples by both packages were very similar, with 18 of 19 species identified by the eDNA-container app (Supplemental File S3) also being found by MiFish (Supplemental File S4). The eDNA-container app identified a Bos taurus barcode that was not in the MiFish outputs, but this was a low confidence assignment (<90%). MiFish shared 19 of 21 species with the eDNA-container app, with the two unique species in the MiFish results being Anas platyrhynchos and Blicca bjoerkna. Once again, these species assignments unique to MiFish were flagged as low confidence by the software. Table 3 shows a comparison of the number of species detected in each sample found in the tested dataset. There is a high level of consistency across both pipelines, and differences were detected only for low-count taxa. The detection of low-count taxa is strongly influenced by sampling biases due to the different read quality filtering and ASV algorithms adopted by the eDNA-container app and MiFish.
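A comparison of this kind reduces to filtering each pipeline's assignments by confidence and then comparing the resulting species sets. The sketch below illustrates the idea using the thresholds stated for Table 3 (>90% for the eDNA-container app, medium or high for MiFish). The function names and the record layout are hypothetical, and the taxon names in the usage example are placeholders rather than results from the study.

```python
def confident_species(assignments, is_confident):
    """Return the set of species whose assignment passes `is_confident`.

    `assignments` is an iterable of (species, confidence) pairs, where the
    confidence value is whatever the pipeline reports (a fraction or a label).
    """
    return {species for species, confidence in assignments if is_confident(confidence)}


def compare_species(app_species, mifish_species):
    """Split two species sets into shared and pipeline-specific detections."""
    return {
        "shared": app_species & mifish_species,
        "app_only": app_species - mifish_species,
        "mifish_only": mifish_species - app_species,
    }


if __name__ == "__main__":
    # Placeholder records for illustration only.
    app_calls = [("taxon_A", 0.99), ("taxon_B", 0.95), ("taxon_C", 0.85)]
    mifish_calls = [("taxon_A", "high"), ("taxon_B", "medium"), ("taxon_D", "low")]

    app_set = confident_species(app_calls, lambda c: c > 0.90)
    mifish_set = confident_species(mifish_calls, lambda c: c in {"medium", "high"})
    print(compare_species(app_set, mifish_set))
```

Applying the confidence filter before the set comparison means that low confidence assignments made by only one pipeline do not appear as disagreements.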
The main difference in the outputs from the two pipelines is that MiFish consistently reports higher read counts for each taxon. The reason for these higher taxa counts is that MiFish reports single-end read counts, whilst the eDNA-container app reports amplicon fragment counts (the R1 and R2 read pairs are considered as a single fragment). Another reason for the difference in taxa counts between the two pipelines is that the eDNA-container app conservatively only counts amplicons if the PCR primers can be identified at the ends of the paired-end reads. Although this latter strategy reduces the influence of putative PCR artifacts, it does come at the cost of sampling depth.
In summary, despite differences in the detection of low-count taxa, the outputs from the eDNA-container app and MiFish are very similar in terms of high confidence identifications and abundance rankings of taxa. Users familiar with the popular QIIME2 ecosystem of tools will benefit from eDNA-container app outputs as they will be compatible with existing downstream processing pipelines. The ability to view raw data outputs from the eDNA-container app using the drag-and-drop QIIME2 viewer is also advantageous for scientists who prefer web-based interactive plotting tools. As MiFish and the eDNA-container app use different underlying algorithms, the availability of both packages provides scientists with two alternative pipelines to assess the robustness of captured eDNA profiles.

Discussion
The ability to monitor ecosystems using non-invasive and relatively inexpensive methods such as eDNA will contribute to the better management of these important habitats and the resources they contain [2,4,5,24,29]. However, many of the free and open-source tools used for eDNA analysis have complex command-line interfaces that can be challenging for wet lab staff to utilize [30,31]. The eDNA-container app uses a point-and-click interface that will allow lab scientists to analyze their own eDNA sequencing data using the latest bioinformatics software. The distribution of the app as a Docker image allows for cross-platform usage and supports reproducible eDNA analyses across different computer frameworks.
The eDNA-container app automatically generates a PDF report that describes important runtime parameters so that researchers can adjust quality trimming stringency and ASV generation settings. The final ASV spreadsheet includes the DNA sequence of the loci so taxonomic classifications can be confirmed quickly using tools such as NCBI-BLAST. Also, alpha diversity and species barplots are reported so scientists can assess the taxonomic sampling depth obtained in the experiment, as well as visualize the species diversity across samples.
The eDNA-container app is customizable and can be used to analyze data from any eDNA locus using any QIIME2-compatible taxonomic database. Importantly, the use of Docker supports reproducible research in the eDNA community, which is an important development as the underlying methodologies continue to be optimized [3,32]. The development of a free and easy-to-use analysis application will support the increased uptake of eDNA technologies and thus help improve biosecurity and ecosystem monitoring.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app14062641/s1. Supplemental_figs.docx: Supplemental Figures S1-S3. Supplemental File S1: an example PDF report generated using the eDNA-container app. Supplemental File S2: a guide for ecologists and lab scientists on using Docker and the eDNA-container app. Supplemental File S3: an example of the eDNA taxonomy counts spreadsheet generated by the app based on the data from [27]. Supplemental File S4: MiFish outputs from the [27] dataset.
Author Contributions: D.W. conceived the idea for the eDNA-container app, wrote the software, and drafted the manuscript. L.B., A.K. and M.L.R. provided critical feedback during testing, contributed to the project development, and reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Figure 1. A summary of the Snakemake rules used to execute the eDNA-container app pipeline. Each box represents a Snakemake rule that runs a specific step in the pipeline as indicated by the description. The rules are executed in the order indicated by the arrow.


Table 3. Species detected by the eDNA-container app and MiFish based on the data from Brys et al. (2021). When species are only detected by one pipeline, it is indicated in parentheses. Species are only considered if the assignment confidence level is medium/high for MiFish or >90% for the eDNA-container app. SRA accession numbers for each sample from Brys et al. (2021) are provided in column 1.