Aquatic Organisms Research with DNA Barcodes

Since their inception, DNA barcodes have become a powerful tool for understanding the biodiversity and biology of aquatic species, with multiple applications in diverse fields such as food security, fisheries, environmental DNA, conservation, and exotic species detection. Nevertheless, most aquatic ecosystems, from marine to freshwater, remain understudied, and many species are disappearing because of environmental stress, mostly caused by human activities. Here we highlight the progress that has been made in studying aquatic organisms with DNA barcodes and encourage their further development to support the sustainable use and conservation of aquatic resources.


Introduction
Since its inception as an ambitious global bioidentification system [1], DNA barcoding, the use of a standardized gene fragment as an internal tag for species identification, has established itself as an important method in biodiversity sciences, with more than 12,000 papers published (Web of Science search "DNA" and "barcod*", 10 June 2021). The initial proposal by Hebert and collaborators recommended the mitochondrial cytochrome c oxidase subunit I (COI) marker for animals. For plants and fungi, however, other more effective markers have been proposed, such as the chloroplast markers maturase K (matK) and ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) for flowering plants [2]. Several markers have been suggested as DNA barcodes for diatoms, ranging from 5.8S + ITS-2 [3] to rbcL [4], but studies on these taxa have been limited. For fungi, the nuclear ribosomal internal transcribed spacer (ITS) region has been broadly accepted [5]; however, its implementation also poses several problems, particularly in some aquatic species [6], and despite the importance of aquatic fungi, we found only six papers on their DNA barcoding.
DNA barcoding has repeatedly been shown to be a fit-for-purpose method for biodiversity surveys, with high congruence with traditional taxonomy in well-known groups such as fishes and birds [7-10], and its power as a predictive tool in biodiversity sciences quickly became apparent, spearheading new molecular frameworks for de novo species discovery [11][12][13]. Some striking examples of overlooked diversity have been revealed in this way [14,15], and similar patterns have been documented in numerous aquatic ecosystems. DNA barcoding can now accelerate biodiversity inventories and support the work of the dwindling number of taxonomists in many countries. The importance of data sharing and the potential for collaborative research were recognized early on, resulting in the creation of the Barcode of Life Data System (BOLD) [16]. In BOLD, sequence data can be associated with detailed specimen metadata, photographs, supporting trace files and, most importantly, vouchered specimens in museum collections [16]. The online workbench also provides the Barcode Index Number (BIN) system, which assigns every specimen meeting minimum data standards to a unit equivalent to a Molecular Operational Taxonomic Unit (MOTU) [12], creating a standardized reference tool for unidentified organisms.
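To make the MOTU idea concrete, the following Python sketch groups aligned COI sequences into clusters by single linkage on uncorrected p-distances: any two specimens closer than a distance threshold end up in the same unit. It is a simplified illustration only. The 2% threshold, the function names and the toy sequences are assumptions, and BOLD's actual BIN algorithm is more elaborate than plain single-linkage clustering.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def cluster_motus(seqs: dict[str, str], threshold: float = 0.02) -> list[set[str]]:
    """Single linkage: specimens within `threshold` of each other share a MOTU.

    The 0.02 default is an illustrative assumption, not a BOLD parameter.
    """
    parent = {name: name for name in seqs}  # union-find forest

    def find(x: str) -> str:
        # Follow parent pointers to the cluster root (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    # Sort clusters by their member names for reproducible output.
    return sorted(clusters.values(), key=sorted)


if __name__ == "__main__":
    base = "ACGT" * 25  # hypothetical 100 bp fragment
    specimens = {
        "specA_1": base,
        "specA_2": "G" + base[1:],  # one substitution, p = 0.01
        "specB_1": "T" * 100,       # highly divergent, p = 0.75
    }
    for i, motu in enumerate(cluster_motus(specimens), start=1):
        print(f"MOTU {i}: {sorted(motu)}")
```

In the toy run, the two near-identical specimens fall into one MOTU and the divergent one forms its own. Single linkage is chosen here only because it is the simplest threshold-based grouping; it can chain distinct lineages together, which is one reason refined algorithms are used in practice.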
In this overview, we will cover recent trends in the study of aquatic life with DNA barcodes and highlight examples illustrating its utility.

Progress in Aquatic DNA Barcoding Studies
General assessments of the use of DNA barcodes in the marine realm were published in 2011 [17] and 2016 [18], when DNA barcoding studies on aquatic biotas numbered fewer than about 160 per year (Figure 1A). Since then, DNA barcoding studies have shown a clear upward trend, with more than 2500 hits during the last two years (Figure 1A): Web of Science hits for the terms "barcod*" and "DNA" rose from zero to more than 1400 per year in 2019 and 2020 (consulted 10 June 2021) (Figure 1A). However, when the search is restricted to aquatic environments (search "DNA" and "barcod*" and "marin*" or "aquat*" or "freshwat*" or "estuar*" or "fish*" in the Web of Science), this figure falls to 320, with an increase since 2014 (Figure 1B) that lags behind the overall trend (Figure 1A). Given that water covers more than 70% of the Earth's surface, this is a modest increase in DNA barcoding studies of aquatic organisms. Fishes have been the main focal group of this increase, with DNA barcoding applied more widely to them than to invertebrates or aquatic plants (Figure 1A).
Most studies covered marine and freshwater environments (Figure 1B), whereas estuarine systems have been almost entirely overlooked in DNA barcoding analyses (Figure 1B).
A thorough account of barcoding progress in crustaceans was provided in 2015 by Raupach and Radulovici [19], who reported a total of 164 studies, most of them focusing on Decapoda. Knowledge of crustacean biodiversity is more advanced for marine environments than for freshwater ones [19].
Barcoding of aquatic insects has seen the most progress in Europe, with Germany the most advanced country, although its diversity is not high [20,21]. Another complex group with aquatic immature stages, the chironomids, is starting to be studied with DNA barcodes, with a focus on speciose genera such as Tanytarsus, which is found almost everywhere [34,35], and on other speciose groups [36][37][38].
In some invertebrate groups, such as Polychaeta, with more than 10,000 described species, only 65 barcoding studies had been published prior to 2020. Copepoda, which are among the most abundant organisms on our planet [39] and encompass 14,300 species (World Association of Copepodologists; @copepodology), have been targeted by only 87 studies so far, with eight and ten in 2019 and 2020, respectively (Figure 1C). This trend likely arises from an assortment of shortcomings hindering the development of DNA barcoding as a routine survey tool for several groups. In many specialist groups, there are likely difficulties in obtaining the funding and taxonomic expertise required to identify voucher specimens. In others, universal primers for amplifying COI are not readily available. In copepods, for instance, the target marker has been difficult to amplify reliably, and some researchers proposed adopting the nuclear 28S gene as an alternative to COI for barcoding [41]. However, 28S fails to distinguish between species in several crustacean groups, such as ostracods, and is therefore of limited use as a species diagnostic [42]. Preservation methods have also hampered the acquisition of COI sequences in some cases. In the cladoceran Holopedium, for example, the shift from liquid nitrogen, which damages voucher specimens, to ethanol resulted in a lower yield of mitochondrial DNA [43], whereas frozen samples preserved in liquid nitrogen gave higher sequencing success [44]. Because of these problems, new protocols involving cold ethanol were developed and proved successful in many freshwater zooplankton species, using only a single pair of primers [45].
Regarding aquatic invertebrates, studies have revealed similar trends. In freshwater mites of the Yucatan Peninsula (Mexico), a single DNA barcoding campaign across 24 karstic environments yielded 77 MOTUs, most of them new to science [107]. In Panama, a study of invertebrate communities in four streams, with two hours of sampling effort, yielded about 100-106 MOTUs [108]. García-Morales et al. [109] detected a complex of 13 species within the rotifer Lecane bulla across 25 localities from the southern United States to Mexico. Elías-Gutiérrez et al. [45] detected a total of 325 BINs among zooplanktonic invertebrates from lakes of Canada and Mexico, with only three BINs (two cladocerans and one copepod) shared between the two countries, suggesting that freshwater zooplankton species in North America have much narrower distribution ranges than previously thought. Moreover, in an important oligotrophic lake hosting the largest stromatolites in the world [110], the number of putative species increased from about 20 to more than 80, with a projected total of nearly 120 [45]. A deep sinkhole (64 m) only about 100 m from this lake hosts an almost entirely different zooplankton fauna, explained by differences in water chemistry [111][112][113].
In Asia, multiple cases of cryptic diversity have been detected among freshwater shrimps [114][115][116][117], other crustaceans and invertebrates [118][119][120][121][122][123], and parasites [124,125], with implications for conservation and phylogeny [126,127]. Some species under high fishing pressure have turned out to be species complexes [128] or different species than expected [129], with important implications for fisheries management. Studies of African invertebrates are much more limited, mostly focusing on fish parasite communities [124,130,131]. Other parasites studied include helminths, which use aquatic invertebrates as intermediate hosts and are of medical importance [23,132-134]; DNA barcoding has been used successfully to identify their larval stages [135]. Because of their medical importance or invasiveness, some studies have focused on molluscs [136,137], revealing, for instance, three cryptic species in Etheria [138].

Integrative Taxonomy
Numerous taxonomic revisions have been conducted using DNA barcodes [139], guiding the detection of diagnostic characters in new species descriptions of marine [140][141][142][143][144] and freshwater organisms [145][146][147][148][149][150], or helping to clarify species distribution ranges [88,90,118]. Some important species complexes used as toxicity indicators or as live food have also been explored by integrating DNA barcodes with other sources of information, such as morphology, biogeography and ecology, as part of integrative taxonomy [151,152]. Such studies have focused on organisms widely used as biological indicators, such as Moina micrura, one of the most ubiquitous freshwater cladocerans in the world. Although this species is widely used in ecotoxicological studies, DNA barcoding indicated that it constitutes a species complex, with the nominal species restricted to Eurasia [152]. Species descriptions of freshwater zooplankton based on integrative taxonomy provide an enriched framework that allows not only species delimitation but also access to a wealth of information guiding further research on their distributions and biology [44,119,145-148,153-155]. Some other groups of marine invertebrates, such as polychaetes, have also been described within an integrative framework [140,142], but a substantial amount of work remains to uncover their full diversity.

Applications
Once DNA barcode libraries are available, several applications have been readily demonstrated. In aquatic ecology, DNA barcoding has been used to identify fish larvae [52,56,156-163] and eggs [54] to species level, with important implications for the management of fisheries and breeding areas. In invertebrates, DNA barcoding has made it possible to link early life stages with adults in aquatic insects [25,164,165]. Along the same lines, DNA barcoding has been used for food security [166]. Species substitution in food products has been one of the most studied applications, with more than 50 papers devoted to this topic (search "food mislabel*" and "coi", Web of Science as of 3 March 2021).
The first study of market substitution, published in 2008, focused on North American seafood and revealed 25% mislabelling [167]; the substitution rate was lower in Mexico [168]. Papers on this topic, mostly dealing with seafood fraud, have also come from Taiwan [169], Europe [145][146][147][148] and other Latin American countries such as Argentina [149].
The detection of exotic species and their impacts has been another promising application, for example the invasion of the lionfish (Pterois volitans) in the Caribbean [55,150] or of the Amazon suckermouth catfish Pterygoplichthys in Mexico [151].

Future Trends
High-throughput sequencing (HTS) methods have had a significant impact on DNA barcoding. DNA barcode reference libraries can now be assembled at larger scales and lower cost [152] and can even be generated on the lab bench or in the field without expensive sequencing equipment [153,154]. However, the main advances have been in DNA metabarcoding: combining HTS with the principles of DNA barcoding has opened a diverse array of applications in species detection and biodiversity monitoring. Of particular relevance in aquatic ecosystems are studies of diet ecology and trophic interactions between organisms [155], a topic that moved in a short time from Sanger sequencing [150] to metabarcoding [170], as well as studies of the structure and dynamics of plankton communities [156,157], marine benthic biomonitoring [158], and water quality assessment based on freshwater invertebrates [159]. Gut contents can also lead to the discovery of unknown biodiversity, as demonstrated in marine and freshwater fishes [150,171]. COI barcode reference libraries for animals have become standard resources for DNA metabarcoding applications, and COI has been recommended as the standard metabarcode for metazoans [160]. Environmental DNA (eDNA) metabarcoding has further transformed biodiversity research by extending metabarcoding to the indirect sequencing of animal communities via the trace DNA they leave in the environment [161]. These advances have opened up numerous novel applications in aquatic sciences and ecological monitoring [162,163], but further work is required to optimize the use of the COI barcode for these applications [164].
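At its core, metabarcoding taxonomic assignment compares each sequence read with a barcode reference library and keeps the best match only if it is close enough. Real pipelines typically rely on alignment tools such as BLAST or on dedicated classifiers; the Python sketch below swaps in a much simpler alignment-free k-mer (Jaccard) similarity purely to illustrate best-hit assignment with a minimum-similarity cutoff. The k-mer size, the 0.5 cutoff, the function names and the toy sequences are all hypothetical choices for illustration.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_reads(
    reads: dict[str, str],
    reference: dict[str, str],
    k: int = 8,                  # hypothetical k-mer size
    min_similarity: float = 0.5,  # hypothetical acceptance cutoff
) -> dict[str, str]:
    """Assign each read to the reference taxon with the highest k-mer similarity."""
    ref_kmers = {taxon: kmer_set(seq, k) for taxon, seq in reference.items()}
    assignments: dict[str, str] = {}
    for read_id, read in reads.items():
        read_kmers = kmer_set(read, k)
        best_taxon, best_sim = "unassigned", 0.0
        for taxon, kmers in ref_kmers.items():
            sim = jaccard(read_kmers, kmers)
            if sim > best_sim:
                best_taxon, best_sim = taxon, sim
        # Reads without a sufficiently close reference stay unassigned.
        assignments[read_id] = best_taxon if best_sim >= min_similarity else "unassigned"
    return assignments


if __name__ == "__main__":
    # Hypothetical two-entry "reference library".
    reference = {
        "Species A": "CCTTTATCTAGTATTTGGTGCCTGAGCCGGAATAGTAGGAACAGC",
        "Species B": "ACTTTATATTTTATTTTTGGAATTTGAGCAGGAATGGTTGGAACTTC",
    }
    reads = {
        "read1": reference["Species A"],
        "read2": "A" * 30,  # matches nothing in the library
        "read3": reference["Species B"],
    }
    print(assign_reads(reads, reference))
```

The cutoff is what keeps reads from taxa missing from the library out of the assignments; it also illustrates why incomplete reference libraries limit metabarcoding, since any read without a close reference entry can only be reported as unassigned.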
Automation and big-data scientific initiatives also have the potential to provide deep insights into aquatic biodiversity and environmental functioning over extended spatial and temporal scales. For instance, DNA metabarcoding can be combined with machine learning to predict ecological quality [165] or used as an automated plankton recorder [156]. Where taxonomic information is unavailable, as in many understudied invertebrate groups, taxonomy-free MOTUs can be generated and standardized across studies [166]. New public platforms are under development (e.g., mbrave.net), offering scalable and standardized solutions for HTS-based approaches to biodiversity, biomonitoring, and biosecurity science. Expanding publicly available databases is essential not only for biodiversity discovery but also for monitoring traded animals [167,168], exotic species, parasites, pathogens, and almost any species on our planet [169].
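One possible way to make taxonomy-free units comparable across studies, assumed here rather than taken from [166], is to derive each label deterministically from the sequence itself, for example from a cryptographic hash of each exact sequence variant, so that the same variant receives the same identifier in every dataset without any shared registry. The sketch below does this with Python's standard hashlib module; the MOTU_ prefix, the 12-character truncation, the gap-stripping rule and the study names are hypothetical choices.

```python
import hashlib
from collections import Counter


def stable_motu_id(seq: str, prefix: str = "MOTU_") -> str:
    """Deterministic label: identical sequences get the same ID in any study."""
    # Normalize case and drop alignment gaps so trivial differences do not matter.
    normalized = seq.strip().upper().replace("-", "")
    digest = hashlib.sha1(normalized.encode("ascii")).hexdigest()
    return prefix + digest[:12]  # truncated for readability (hypothetical choice)


def motu_table(studies: dict[str, list[str]]) -> dict[str, Counter]:
    """Count reads per stable MOTU ID separately for each study."""
    return {study: Counter(stable_motu_id(s) for s in seqs) for study, seqs in studies.items()}


if __name__ == "__main__":
    studies = {
        "lake_2019": ["ACGTTGCA", "acgttgca", "TTGACCGA"],  # hypothetical reads
        "lake_2021": ["ACGT-TGCA"],
    }
    table = motu_table(studies)
    shared = set(table["lake_2019"]) & set(table["lake_2021"])
    print("MOTUs shared between studies:", shared)
```

Exact-variant hashing does not cluster similar sequences; it only guarantees that identical variants line up across datasets. Clustering (for example, into threshold-based MOTUs) would be a separate step, and truncating the digest trades a tiny collision risk for shorter labels.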
Undoubtedly, most aquatic ecosystems on our planet are tragically understudied, particularly in the tropics, and efforts to understand the interactions between anthropogenic pressures and global climate change will be partial at best, if not flawed, without accurate biodiversity knowledge. New bioinformatic tools and DNA barcoding workflows will enable important contributions to the conservation of marine, brackish and freshwater organisms.
Finally, we must stress that DNA barcodes should never replace taxonomists. On the contrary, DNA barcodes are an additional suite of characters that can be used in taxonomy, and they can also help non-specialists who need accurate identification of their specimens.