DNA Barcoding for Species Identification of Moss-Dwelling Invertebrates: Performance of Nanopore Sequencing and Coverage in Reference Database

In view of the current biodiversity crisis and our need to preserve and improve ecosystem functioning, efficient means for characterizing and monitoring biodiversity are required. DNA barcoding, especially when coupled with new sequencing technologies, is a promising method that can, in principle, also be employed by taxonomic lay people. In this study we compare the performance of DNA barcoding by means of a third-generation sequencing technology, nanopore sequencing, with classical Sanger sequencing, based on a sample of invertebrates collected from moss pads in a bog in Austria. We find that our nanopore sequencing pipeline generates DNA barcodes that are at least as good as barcodes generated with Sanger sequencing, with the MinION producing better results than the Flongle flowcell. We further find that while many arthropod taxa are well covered in the international reference DNA barcode database BOLD, this clearly is not the case for important taxa like mites and springtails, which hampers large-scale biodiversity assessments. Based on examples from our study we further highlight which factors might be responsible for ambiguous species identification based on BOLD and how this can, at least partly, be solved.


Introduction
To date, taxonomists have described >1.7 million species, which, however, is only a minor portion of the predicted total number of species on Earth [1]. Invertebrates constitute a major part of our planet's biodiversity. As such, they are crucial for ecosystem functioning and influence ecosystem services important to mankind both in positive and negative ways [2,3]. Therefore, there is an increasing interest in and need for monitoring the presence/absence and abundance of certain indicator species or entire species communities [4–7]. Yet, with the infamous taxonomic impediment [8] and its main consequence, the taxonomic gap (the discrepancy between the true species diversity and our knowledge of it; [9]), assessing whole species communities via traditional morphological taxonomy is virtually impossible for most ecosystems. In addition, morphological analyses of diverse mixed samples covering a broad taxonomic spectrum quickly become very time-consuming and require a multitude of taxonomic specialists. Moreover, the presence of cryptic diversity, i.e., phenotypically invariant but genetically clearly distinct species (e.g., [10–15]), inevitably leads to biased estimates of the actual biodiversity.
The advent of DNA barcoding, a standardized method for the identification of organisms based on specific sequences of their DNA [16], and the establishment of reference DNA barcode databases like the global database BOLD (www.boldsystems.org) have revolutionized biodiversity research. Now, it becomes feasible even for taxonomic laymen to assign unknown specimens or parts of specimens to species based on their DNA barcodes.
Identification success, however, strongly depends on the quality and completeness of the reference data. Even though there has been a tremendous increase in reference DNA barcode data available in BOLD in the last few years, species coverage is far from complete for most taxa and most parts of the world. Indeed, the number of DNA barcodes available in BOLD differs greatly among countries and does not reflect the distribution of biodiversity across our planet [17].
Currently, Sanger sequencing [18,19], which amplifies and sequences individual samples, is (still) regarded as the gold standard for DNA barcoding. It is considered a highly accurate method but is time-consuming and thus also expensive for larger sample sizes [20], which makes it unfeasible in large-scale biodiversity assessment/monitoring activities.
The first two decades of this century have seen huge developments in so-called next-generation sequencing technologies, allowing simultaneous sequencing of a large number of short fragments quickly. With these second-generation sequencing methods, the field of metabarcoding, the barcoding of pooled samples, either from bulk samples or from environmental DNA (eDNA), was opened [21,22]. However, even though (e)DNA-metabarcoding has clearly revolutionized biodiversity assessment, facilitating the rapid assessment of entire species communities in a cost-efficient way, the method does not come without shortcomings. Primer efficiency differs among taxa, such that, in bulk samples, some of them are preferentially amplified, depending on the number and position of mismatches between primer and template, while other taxa might not be amplified at all [23,24]. Thus, some taxa might go undetected when employing metabarcoding. While significant steps forward have been made in using (e)DNA-metabarcoding data for inferring species abundance and/or biomass [25–27], this approach still requires preliminary investigations to determine species- and context-specific factors that influence the relationship between read counts and species abundance/biomass [27], especially in multi-species communities affected by differences in primer efficiency [28].
More recently, third-generation sequencing technologies have become available, facilitating the sequencing of longer reads, thus overcoming another limitation of second-generation sequencing approaches, while still yielding amounts of data sufficient to allow for multiplexing of hundreds to thousands of samples. Among the currently available third-generation sequencing technologies, Oxford Nanopore Technologies' (ONT) nanopore sequencing technology, especially the MinION and Flongle systems, is particularly attractive for biodiversity monitoring and assessment. These systems are relatively low-cost, portable devices that deliver real-time sequencing [29]. They also require much less hands-on time per sample than Sanger sequencing, particularly when projects scale to thousands of samples. While for nanopore sequencing the cost per individual base pair is still high compared to second-generation sequencing technologies, it is considerably lower than for standard Sanger sequencing, especially for large sample sizes [19]. Furthermore, the latest releases of nanopore sequencing chemistry, flowcells and basecalling models promise to improve sequencing accuracy, which was perhaps considered the largest drawback of nanopore sequencing compared to Sanger and second-generation sequencing [30]. In addition, a range of software solutions have been developed specifically for the correction of nanopore amplicon reads in the context of DNA barcoding (NGSpeciesID [31], ONTbarcoder [32], Amplicon-sorter [33]). While nanopore sequencing can be used for metabarcoding [34–36], its real strength for biodiversity assessment and monitoring is its potential for rapid and cost-effective individual-based sequencing of DNA barcodes (or multiple markers), which not only provides presence/absence data but also reliable information on the abundance of individual species. Current protocols allow for the generation of DNA barcodes for up to 10,000 specimens in a single MinION run [32]. Thus, with increased automatization and optimization of the few steps involved from specimen sorting to actual sequencing, nanopore barcoding has great potential for becoming a standard tool for basic and applied biodiversity research.
In the present study, which was conducted in the frame of a student course on wetland restoration, we aimed at showing the potential and problems associated with nanopore sequencing for biodiversity assessment and monitoring, with a particular focus on the invertebrate fauna of mosses. Specifically, we assessed (1) whether the quality of DNA barcodes differed between nanopore sequencing (using both Oxford Nanopore Technologies' MinION and Flongle systems) and traditional Sanger sequencing and (2) to what extent the barcoded taxa were covered by the available reference data in BOLD, enabling reliable species identification. Furthermore, we discuss our findings in a broader context, i.e., with respect to current metabarcoding strategies employed for biodiversity research.

Sampling and DNA Extraction
We collected seven moss pads (~15 × 15 cm; five species: Sphagnum capillifolium, S. palustre, S. nemoreum, Leucobryum glaucum, and Pleurozium schreberi) from a peat moss lawn in a bog in the Natura 2000 area Gamperlacke, near Liezen, Austria (47.554° N, 14.283° E) on 13 June 2022 under the permit ABT13-198250/2020-9 issued by the provincial government of Styria. From these moss pads, 88 invertebrate specimens were collected in the field by shaking the moss above a white plastic foil. Individual invertebrate specimens were immediately put in 2 mL Eppendorf tubes with >99% ethanol. We did not try to identify the specimens to species level, as we wanted to simulate an approach in which species identification is purely based on available reference data, as propagated for efficient monitoring of biodiversity based on DNA data (individual barcoding, (e)DNA-metabarcoding). Whole genomic DNA was extracted from either single legs or other body parts (for larger specimens) or whole specimens (smaller specimens) using a rapid Chelex approach [37].
The samples were PCR-amplified for different sequencing approaches: nanopore sequencing using both the MinION and Flongle systems and the traditional Sanger sequencing.

Nanopore Sequencing
To prepare the samples for nanopore sequencing, we used the PCR primers LCO1490 and HCO2198 [38]. This primer pair is considered the standard barcoding primer pair for many animal taxa with high amplification success in invertebrates [39–43]. The primers for each individual sample were tagged with index sequences on both the forward and reverse primer (in our case a selection of the tags from [44]; Table S1). Later, during demultiplexing of nanopore sequencing reads, these indices allow for unambiguous sample assignment.
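The idea of dual-index sample assignment can be illustrated with a minimal sketch. This is not how ONTbarcoder works internally; the tag sequences, the function names, and the one-mismatch tolerance are all assumptions chosen for illustration, and the sketch ignores indels and read orientation, both of which matter for real nanopore reads.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def assign_sample(read: str, tag_pairs: dict, max_mismatch: int = 1):
    """Return the sample whose forward tag starts the read and whose
    reverse tag (reverse-complemented) ends it; None if zero or several match."""
    hits = []
    for sample, (fwd, rev) in tag_pairs.items():
        rev_rc = revcomp(rev)
        if len(read) < len(fwd) + len(rev_rc):
            continue
        start_ok = hamming(read[:len(fwd)], fwd) <= max_mismatch
        end_ok = hamming(read[-len(rev_rc):], rev_rc) <= max_mismatch
        if start_ok and end_ok:
            hits.append(sample)
    return hits[0] if len(hits) == 1 else None


# Hypothetical tags for illustration only (not the tags listed in Table S1).
TAGS = {
    "sample_01": ("AAGGCCTTAGCAT", "CCTTAAGGTCAGA"),
    "sample_02": ("GTCATGCAACGTT", "TGACCGTAGGTCA"),
}
amplicon = "ACGT" * 10  # stand-in for primers plus COI fragment
read = TAGS["sample_01"][0] + amplicon + revcomp(TAGS["sample_01"][1])
print(assign_sample(read, TAGS))  # sample_01
```

Requiring both tags to match, and rejecting reads that match more than one sample, is what makes the assignment unambiguous even when individual reads contain errors.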
We pooled 2 µL of each of the 88 PCR products into a single 1.5 mL Eppendorf tube. This pool was then cleaned up with AMPure XP magnetic beads (Beckman Coulter, Brea, CA, USA) and the final concentration was measured with a Qubit 4 Fluorometer using the Qubit dsDNA HS Assay Kit (Invitrogen by Thermo Fisher Scientific, Waltham, MA, USA). Two sequencing libraries were prepared using the Oxford Nanopore Ligation Sequencing kit versions SQK-LSK109 and SQK-LSK112, following the official protocols, and sequenced on Flongle (R9.4.1; FLO-FLG001) and MinION flowcells (R10.4; FLO-MIN112), respectively. The Flongle flowcell was new and the sequencing run was set to terminate automatically after 48 h. The MinION flowcell had already run for 48 h and was washed using the ONT wash kit (EXP-WSH004; Oxford Nanopore Technologies, Oxford, UK) before loading the library. An initial flowcell check revealed approx. 100 available pores. Sequencing was terminated after 16 h and 20 min, since the run had already generated three times the total number of reads obtained by the Flongle flowcell and fewer than 10 pores remained active at this time.

Demultiplexing and Generating Consensus Barcodes
For demultiplexing of nanopore sequencing data and generating individual consensus barcodes we used ONTbarcoder [32], employing the default settings. We opted to use ONTbarcoder as it is specifically designed for analyzing protein-coding genes. To investigate barcode recovery efficiency, we analyzed raw reads generated with different basecalling methods (guppy high accuracy (hac) vs. guppy super high accuracy (shac)) and ONT technologies (Flongle vs. MinION flowcell).
To test whether more input data increase the number of recovered barcodes, we performed two additional ONTbarcoder runs with the combined reads from the Flongle and the MinION flowcell.
Additionally, we wanted to test whether the basecalled reads from the MinION flowcell are inherently better than the reads generated with the Flongle. Therefore, we ran ONTbarcoder two additional times with the first 50,000 reads from the MinION flowcell, i.e., comparable sequencing depth to that obtained from the Flongle flowcell, basecalled with hac and shac. For downstream analysis we only used consensus barcode sequences from shac basecalled data with up to five ambiguous bases.
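The two selection steps described above (taking the first reads of a run and keeping only consensus barcodes with up to five ambiguous bases) can be sketched as follows. This is an illustrative sketch, not the procedure actually scripted for the study: the function names are hypothetical, FASTQ records are assumed to span exactly four lines, and any IUPAC code other than A, C, G, or T is counted as ambiguous.

```python
from itertools import islice

# IUPAC codes that do not denote a single unambiguous base.
AMBIGUITY_CODES = set("NRYSWKMBDHV")


def first_n_reads(fastq_lines, n):
    """Yield the lines of the first n FASTQ records (assumes 4 lines per record)."""
    return islice(fastq_lines, 4 * n)


def count_ambiguous(seq: str) -> int:
    """Number of ambiguous bases in a consensus sequence."""
    return sum(base in AMBIGUITY_CODES for base in seq.upper())


def filter_barcodes(barcodes: dict, max_ambiguous: int = 5) -> dict:
    """Keep consensus barcodes with at most max_ambiguous ambiguous bases."""
    return {
        sample: seq
        for sample, seq in barcodes.items()
        if count_ambiguous(seq) <= max_ambiguous
    }


# Toy consensus sequences (hypothetical, far shorter than a real barcode).
consensus = {
    "sample_a": "ACGTACGTACGT",
    "sample_b": "ACGTNNACGTRY",      # 4 ambiguous bases: kept
    "sample_c": "NNNNNNACGTACGT",    # 6 ambiguous bases: dropped
}
print(sorted(filter_barcodes(consensus)))  # ['sample_a', 'sample_b']
```

With real data, `first_n_reads(handle, 50000)` would be applied to an open FASTQ file handle before passing the subset to the barcode caller.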

Sanger Sequencing
To generate Sanger sequencing data for the same samples, PCR amplification followed the protocol mentioned above, but using untagged PCR primers. Again, a subset of the PCR products was run on a 2% agarose gel to ensure that the PCR was successful. Clean-up of PCR products was carried out using ExoSAP-IT (Thermo Fisher Scientific) by adding 1.0 µL of ddH2O and 0.7 µL of ExoSAP-IT to the PCR product. The mixture was incubated at 37 °C (45 min) to degrade remaining primers and nucleotides before the enzymes were inactivated at 80 °C (15 min). Bidirectional chain termination sequencing used 5.7 µL of ddH2O, 2.0 µL of 5× Sequencing Buffer (Thermo Fisher Scientific), 0.4 µL of primer, 0.3 µL of BigDye (Thermo Fisher Scientific), and 2 µL of purified PCR product. The run conditions were an initial denaturation at 94 °C (3 min) followed by 35 cycles of denaturation at 94 °C (30 s), annealing at 50 °C (30 s), and extension at 60 °C (3 min), followed by a final extension at 60 °C for 7 min. DNA fragments were purified with Sephadex G-50 (Amersham Biosciences, Amersham, UK) following the manufacturer's instructions and visualized on an ABI 3500xl Genetic Analyzer (Thermo Fisher Scientific). Trace files/sequences were checked, edited, and aligned in MEGA11 [45] by a single experienced person.

Comparison among Sequencing Methods and Reference Database Coverage
Barcodes obtained via different approaches (MinION, Flongle, Sanger sequencing) were compared in MEGA11. Initially, we inferred a neighbor-joining tree (based on uncorrected p-distances and complete deletion of gaps/missing data, as we wanted to infer the effect of estimated errors in the nanopore barcodes) including all Sanger sequencing data and all shac replicates (for reasons, see Results section) from different nanopore sequencing runs. We then checked positions that differed between nanopore and Sanger sequencing data in the electropherograms obtained by Sanger sequencing and checked the number of ambiguous bases and estimated gaps in the MinION and Flongle barcodes (as reported as quality criterion by ONTbarcoder).
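To make the distance measure concrete, the following sketch computes uncorrected p-distances after complete deletion. It is not MEGA11's implementation; the function names and toy sequences are hypothetical, and the sketch assumes that every character other than A, C, G, or T counts as a gap or missing data.

```python
def complete_deletion(alignment: dict) -> dict:
    """Drop every alignment column in which any sequence is not A, C, G, or T."""
    seqs = [s.upper() for s in alignment.values()]
    keep = [
        i for i in range(len(seqs[0]))
        if all(s[i] in "ACGT" for s in seqs)
    ]
    return {
        name: "".join(seq.upper()[i] for i in keep)
        for name, seq in alignment.items()
    }


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of sites at which a and b differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy alignment of one sample sequenced three ways (hypothetical data).
aln = {
    "sanger":  "ACGTNACGTA",
    "minion":  "ACGT-ACGTT",
    "flongle": "ACGTAACGTA",
}
clean = complete_deletion(aln)  # column 5 is removed for all sequences
print(p_distance(clean["sanger"], clean["minion"]))   # 1 difference in 9 sites
print(p_distance(clean["sanger"], clean["flongle"]))  # 0.0
```

Because no substitution model is applied, every remaining difference between two barcodes of the same specimen translates directly into a non-zero distance, which is what makes this setting suitable for spotting sequencing errors.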
Lastly, we blasted our final barcodes against barcoding data available in BOLD (www.boldsystems.org; accessed on 11 March 2024) and, if there was no match in BOLD, GenBank (https://www.ncbi.nlm.nih.gov/genbank/; accessed on 11 March 2024), to assess whether, as would be the case in a usual biodiversity monitoring/assessment situation, the available reference data would allow for a reliable species assignment. In addition, a neighbor-joining tree, based on K2P distances [46], which is the standard model of evolution used in DNA barcoding studies, was inferred in MEGA11 with one barcode sequence per sample. In cases where barcodes differed in quality between sequencing approaches the sequence with the fewest ambiguous bases was used.
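For readers unfamiliar with the K2P (Kimura two-parameter) model, the distance is d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The sketch below implements this formula with pairwise deletion; it is an illustration rather than MEGA11's code, and the function name and test sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence is not A, C, G, or T are skipped
    (pairwise deletion).
    """
    pairs = [
        (x, y) for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    diffs = [(x, y) for x, y in pairs if x != y]
    # A transition stays within purines or within pyrimidines.
    transitions = sum(
        1 for x, y in diffs
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    )
    transversions = len(diffs) - transitions
    p, q = transitions / n, transversions / n
    term = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term)


# One transition (A<->G) in ten sites: P = 0.1, Q = 0.
print(round(k2p_distance("AAAAACCCCC", "GAAAACCCCC"), 4))  # 0.1116
```

For closely related sequences the corrected distance is only slightly larger than the uncorrected proportion of differences (here 0.1116 versus 0.1), which is why the two measures give similar clustering at the species level.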

Different Nanopore Sequencing Approaches
Irrespective of the basecalling algorithm used (hac vs. shac), the Flongle flowcell generated ~60,000 reads and the MinION flowcell ~180,000 reads (Table 1). After filtering, roughly two thirds of these reads were used for demultiplexing. The individual number of reads per sample for further downstream analysis in ONTbarcoder ranged from 7 to 4050 (Figure 1). For most samples, more usable reads were recovered by the shac than the hac basecalling algorithm (Figure 1, Table 1). Depending on the sequencing method (Flongle, MinION, combined Flongle+MinION) and basecalling approach (hac and shac), the number of recovered barcodes differed only slightly (Table 1). Among the various nanopore sequencing and basecalling approaches tested, the lowest number of barcodes (N = 57) was obtained with the Flongle run with hac basecalling and the largest number of barcodes (N = 60) was recovered with the MinION run and shac basecalling. While most of these barcodes were QC-compliant for the MinION runs and the Flongle+MinION combination, this number was much lower for the Flongle barcodes; most of the non-QC-compliant barcodes had one to five errors. Interestingly, the MinION50000 run resulted in a much higher number of QC-compliant barcodes than the Flongle run, despite a comparable number of reads (Table 1). In general, shac basecalling produced (slightly) more high-quality barcodes than the less computationally intensive hac basecalling algorithm (Table 1). The number of ambiguities and errors generally decreased with increasing read numbers, but, for MinION data, most consensus barcodes based on >11 individual reads were free of both ambiguities and errors. Flongle data also yielded some ambiguities and errors for some samples with larger read numbers (Figures S1 and S2).

Nanopore vs. Sanger Sequencing
The final number of barcodes generated with Sanger sequencing (N = 63) was higher than for any of the nanopore sequencing approaches. The number of QC-compliant barcodes, however, was much lower and comparable with the results obtained with the Flongle flowcell (Table 1). In addition, a few of the barcodes generated by Sanger sequencing had to be trimmed to a shorter size because of ambiguous bases/background noise at the start/end of the individual sequences (see alignment in Data S1). Most QC-compliant Flongle, MinION, and combined Flongle+MinION shac barcodes were identical to Sanger-generated barcodes (Figure S3). In two cases (samples 2 and 62) the Flongle barcode differed from the MinION, combined Flongle+MinION, and Sanger barcodes by 1 bp. In five cases (samples 6, 10, 24, 40, and 71) the Sanger barcode differed by 1 bp from the barcodes generated by the other approaches. Indeed, upon further scrutiny, it turned out that, even though the automatically basecalled Sanger data had been checked by eye by an experienced person, wrong bases were scored/missed (Figure S4). In one case (sample 4), the Flongle barcodes did not cluster with the corresponding MinION and combined Flongle+MinION barcodes (no Sanger sequence could be obtained for this sample) but rather with the barcodes of another sample (sample 6; Figure S3). The reason for this is unclear as the index primers used for these two samples are quite different (Table S1) such that an accidental assignment of individual reads to the wrong sample, due to the comparatively high inherent error rate of nanopore sequencing [47], is highly unlikely.

Coverage in BOLD
For 17 of the 88 samples, none of the different sequencing approaches produced a DNA barcode. Of the remaining samples, 46 had matches in BOLD (similarity > 97%). Some of these samples could not be assigned to a single species by BOLD but rather to a set of species that required further scrutiny (Table S2). For 25 samples, BOLD (and GenBank) found no species-level matches at all. For one of these (sample 26) the best hit in BOLD was Arthropoda (84.13% similarity) but in GenBank the consensus sequences had significant matches to Arthropoda only across a part of its total length with the remainder of the sequence matching to completely different organisms (sponges, fungi) with comparable similarity. Given that the ONT consensus sequences contained ambiguities (Table S2) and Sanger sequencing for this sample was unsuccessful, we consider the consensus a spurious, potentially chimeric sequence which we omitted from further analyses. Two samples (samples 41 and 67) could be assigned to the dipluran family Campodeidae. Seven samples, belonging to five molecular taxonomic units (MOTUs), could be assigned to three mite orders and two families. Sixteen samples forming five MOTUs were clearly entomobryomorph Collembola, but could only be reliably identified to family level (Table S2, Figure 2).
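The way best hits were sorted into these categories can be written down as a simple decision rule. The sketch below is illustrative only: the function name and category labels are hypothetical, and the only values taken from the text are the 97% threshold and the similarities reported for samples 26 and 82.

```python
def classify_hit(similarity, n_candidate_species, threshold=97.0):
    """Sort a sample into an identification category from its best database hit.

    similarity: % similarity to the best match, or None if no barcode exists.
    n_candidate_species: number of species among the best matches.
    """
    if similarity is None:
        return "no barcode"
    if similarity <= threshold:
        return "no species-level match"
    if n_candidate_species == 1:
        return "single species"
    return "several candidate species"


print(classify_hit(None, 0))    # no barcode
print(classify_hit(84.13, 0))   # no species-level match (sample 26)
print(classify_hit(97.32, 1))   # single species (sample 82)
print(classify_hit(99.0, 4))    # several candidate species (hypothetical values)
```

The last category is the one that requires further scrutiny, since similarity alone cannot decide between species that share a BIN.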

Discussion
In this study we compared the performance of DNA barcoding via nanopore sequencing using Flongle and MinION flowcells with traditional DNA barcoding based on Sanger sequencing. We found that nanopore sequencing followed by barcode calling using ONTbarcoder produces high-quality barcodes. When Sanger barcodes and QC-compliant nanopore barcodes differed, this was usually due to erroneous basecalls or ambiguities in the Sanger sequences. Thus, our results are in line with previous studies that assessed the potential of nanopore sequencing for DNA barcoding, demonstrating that nanopore sequencing is a viable alternative for generating high-quality barcoding data [20,32,44]. Even though the automatically basecalled Sanger data had been checked and edited by an experienced person (without knowledge about the corresponding nanopore barcodes), some obviously wrongly called bases were overlooked. Thus, while we strongly encourage checking Sanger data prior to using them for further downstream analyses and uploading to BOLD instead of simply using automatically basecalled data, this procedure is clearly not totally fail-proof and some errors are to be expected, especially when working with large datasets. The alternative pipeline we used (nanopore sequencing followed by barcode calling in ONTbarcoder) excludes the possibility of such errors, yielding QC-compliant barcodes that are at least as good as the corresponding Sanger barcodes. In addition, nanopore sequencing is a considerably cheaper and more time-efficient alternative for larger DNA barcoding projects. Indeed, a recent study showed that barcoding with Flongle or MinION flowcells is more cost-efficient than Sanger sequencing already with 61 or 180 samples, respectively [20].
In our study, the number of high-quality barcodes recovered from Flongle runs was lower than for MinION runs and comparable to Sanger sequencing. A similar pattern has been observed previously (e.g., [20]). Probably, this difference in performance between Flongle and MinION can be attributed to the particular flowcell versions and basecalling models used. At the time of writing this article both Flongle and MinION flowcells are available for the latest V14 chemistry and current versions appear to produce more barcodes with higher quality [48]. However, as the newest versions of ONT flowcells and sequencing chemistry became available only very recently, many labs might still have data generated using older versions. Our findings show that already these data can yield results which are at least as reliable as DNA barcodes generated by Sanger sequencing.
Though shac basecalling produced slightly more QC-compliant barcodes than the faster hac algorithm, hac basecalling produces barcodes that are well suited for assignment to particular MOTUs. The shac model is more likely to produce reference-quality barcodes but the runtime of the basecalling algorithm is considerably longer than for hac unless GPU resources are available. It is noteworthy, however, that for most samples already the hac model produced QC-compliant barcodes. Currently, Sanger barcodes are still considered the gold standard in DNA barcoding and quality criteria of BOLD are still only available for barcodes generated via Sanger sequencing. Considering that now high-quality barcodes can also be generated by means of nanopore sequencing in a time- and cost-efficient way, equivalent standards are also urgently needed for nanopore sequencing data.
Probably not surprisingly, we found that some taxa were well covered in BOLD whereas for others no reference data were available. Spiders, chilopods, and most insects could be identified to species level, i.e., close matches were found in BOLD. For springtails and mites, this, however, was the exception and only a few samples could be identified to species level, highlighting the urgent need for additional reference data for these taxa. Even carrying out a BLAST search on GenBank did not help much in these cases, as, for these samples, identification was only possible to genus or family level and sometimes to order level. Springtails and mites are among the major functional groups in soil. Thus, accurate species identification is critical for our understanding of ecosystem functioning. Not only is a general lack of reference DNA barcodes hampering species assignment in these taxa but also high levels of cryptic diversity. There is increasing evidence that the species diversity in these taxa is considerably higher than generally assumed. For example, DNA barcoding of mites from a large number of sites across Canada revealed that the number of MOTUs was 2.4 times the number of species previously recorded for Canada [49]. Similar large-scale studies on mites or other soil invertebrates are lacking from other geographic regions, but it is likely that the same pattern holds true also for other regions and will be even more extreme in largely understudied parts of the world. On a more local scale and/or focused on particular taxa within springtails and mites, recent studies have indeed provided increasing evidence for high levels of cryptic diversity (springtails: [50–52]; mites: e.g., [10,11,13]). While for some questions in basic and applied research exact species identification might not be necessary and MOTUs might be used as species proxies instead, proper molecular species identification does not only allow for comparisons across DNA barcode-based studies but also for comparisons with other, often older, studies that employed traditional morphological species identification. Currently, much emphasis is put on safeguarding and monitoring biodiversity in various environments, including soil, to mitigate biodiversity loss with its negative impact on ecosystem services. Thus, the EU Soil Strategy for 2030, which contributes to the goals of the EU Green Deal and is part of the EU's Biodiversity Strategy, proposes specific actions related to climate change mitigation, circular economy, biodiversity, desertification, soil restoration and monitoring, and citizen engagement for the maintenance of or transition to healthy soils [53]. The accurate characterization of biodiversity is the paramount foundation for some of these actions. Large-scale characterization and monitoring of soil biodiversity are virtually impossible based on traditional morphological methods and thus have to rely on DNA-based approaches. For these to work, however, cryptic species complexes need to be resolved by employing integrative taxonomic approaches (e.g., [54]) and reference (DNA barcode) data at least for the major functional groups need to be available.
In BOLD, sequences are ideally assigned to a single BIN, which is a MOTU inferred based on BOLD's molecular species delimitation algorithm [55] that in most cases closely corresponds to biological species. In some cases, however, even when there is a match in BOLD, exact species assignment is not that straightforward. This is usually due to BIN sharing, a consequence of recent species divergence, introgression, or misidentification of specimens used for generating reference DNA barcode data. Misidentification of some reference specimens in BOLD prevented the straightforward identification of our Lithobius (Chilopoda) samples (Table S2). Among the best matches for one of the two Lithobius MOTUs in our dataset were mostly L. tenebrosus but also an alleged L. nodulipes. Similarly, the second Lithobius MOTU matched L. agilis but also a single L. pelidnus. Indeed, these L. nodulipes and L. pelidnus in BOLD seem to have been misidentified as the vast majority of L. nodulipes and L. pelidnus in BOLD cluster in different, species-specific BINs (BOLD:ACT7891 and BOLD:AAV8113, respectively). Other cases of BIN sharing in our dataset can be attributed to recent divergence. Two MOTUs of myrmicine ant species matched BINs in BOLD with four and three species, respectively. The first one includes Myrmica aloba, M. sabuleti, M. scabrinodis, and M. spinosior, with M. aloba and M. spinosior not occurring in Austria but in southwestern Europe, M. sabuleti being present in Central Europe and inhabiting predominantly xerothermic dry and semidry meadows, and M. scabrinodis also occurring in Central Europe with a clear preference for open peat moss lawns in bogs [56]. Consequently, we conclude that M. scabrinodis is the species we found in our moss samples. The second case includes Lasius niger, L. platythorax, and L. psammophilus. All three species do occur in Central Europe, but L. psammophilus has a clear preference for sandy habitats [57] and thus is very unlikely to be the species found in our study. Of the remaining two species, L. niger is very common and found in a variety of (mostly dry and warm) habitats whereas L. platythorax is a rarer species that mainly occurs in habitats not inhabited by the former species, such as colder and moister forests and bogs [57]. Even though it is likely that the species we sequenced is L. platythorax we cannot exclude the possibility that it is actually L. niger. Also, for the sample finally identified as the aphid Pemphigus populivenae, BOLD yielded an inconclusive result based on mere sequence similarity. Employing the standard 97% similarity threshold, a species-level match could not be made by BOLD and the following potential species were suggested: Pemphigus groenlandicus, P. monophagus, P. populiglobuli, P. populivenae, and P. sp. D. However, although these species are closely related and diverged only recently, tree-based clustering in BOLD clearly placed our sample in a clade with P. populivenae, well apart from the other species (Figure S5). Some samples were assigned to a single species in BOLD even though % similarity to the best match was <99%. Specifically, this concerned samples 51 and 82, which were assigned to the crane fly Tipula melanoceros and the cobweb spider Robertus lividus with similarities of 98.92% and 97.32%, respectively. While it is indeed possible that species assignment was correct, potential cryptic diversity or incomplete taxonomic coverage of closely related species in BOLD has to be considered as well. These cases show that the species identification suggested by BOLD should not be accepted without further scrutiny and that considering the species' distribution and biology is essential to further narrow down the range of candidate species in instances where there are several species sharing a BIN.
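The reasoning applied to the Myrmica BIN can be expressed as a two-step filter on distribution and habitat. The sketch below is a hypothetical illustration: the dictionary layout, the labels, and the function name are assumptions, and the annotations are simplified from the description in the text (no habitat is given there for M. aloba and M. spinosior, so these are left empty).

```python
# Candidate species sharing one BIN, annotated from the text (simplified).
MYRMICA_CANDIDATES = {
    "Myrmica aloba":       {"regions": {"southwestern Europe"}, "habitats": set()},
    "Myrmica spinosior":   {"regions": {"southwestern Europe"}, "habitats": set()},
    "Myrmica sabuleti":    {"regions": {"Central Europe"}, "habitats": {"dry meadow"}},
    "Myrmica scabrinodis": {"regions": {"Central Europe"}, "habitats": {"bog"}},
}


def narrow_candidates(candidates, region, habitat=None):
    """Keep species recorded from the region and, optionally, the habitat."""
    kept = []
    for name, info in candidates.items():
        if region not in info["regions"]:
            continue
        if habitat is not None and habitat not in info["habitats"]:
            continue
        kept.append(name)
    return sorted(kept)


print(narrow_candidates(MYRMICA_CANDIDATES, "Central Europe"))
# ['Myrmica sabuleti', 'Myrmica scabrinodis']
print(narrow_candidates(MYRMICA_CANDIDATES, "Central Europe", "bog"))
# ['Myrmica scabrinodis']
```

As the Lasius example shows, such a filter narrows the candidate list but does not prove identity: a habitat preference makes one species more likely without excluding a common, ecologically flexible relative.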
For 17 out of the 88 samples, no barcodes could be generated, neither with the nanopore nor the Sanger sequencing approach. The reasons for this could be that DNA extraction failed or that primers did not bind because of an excess of substitutions in the primer-binding region. A complete failure of DNA extraction for all these samples, however, seems unlikely. On the other hand, it is well known that the commonly used standard barcoding primers LCO1490 and HCO2198 [38] do not work for all arthropods such that alternative primers have been developed for a range of taxa (e.g., [58–60]). In metabarcoding approaches, i.e., the barcoding of pooled samples, species that fail to amplify will go undetected and result in a biased species diversity estimate. In addition, amplification bias also impacts relative abundance estimates of taxa in metabarcoding studies. Using individual-based nanopore sequencing instead of typical metabarcoding circumvents the problems associated with amplification bias. Samples that fail to be amplified with standard primers can be repeated with a different primer pair and since sequence data are linked to each individual, abundance estimates are possible. Though clearly more expensive and time-consuming than a typical metabarcoding study, individual-based nanopore sequencing is much more cost- and time-efficient and, as our and other studies have shown, less error-prone than classic Sanger sequencing. Thus, nanopore sequencing is a viable option for efficient biodiversity assessment, especially when inferring the entire diversity and (relative) abundances is particularly important.
To conclude, our study shows that (i) nanopore sequencing produces DNA barcodes that are at least as good as barcodes generated via classical Sanger sequencing, (ii) taxonomic coverage of mites and springtails in BOLD is poor, allowing only for limited inferences regarding actual species identity in these taxa, and (iii) molecular species identification by BOLD requires further scrutiny to sort out potential misidentification of voucher material in BOLD and resolve cases of BIN sharing. Thus, current nanopore sequencing pipelines are clearly a viable cost- and time-efficient alternative for generating high-quality DNA barcode data and might also be useful for efficient biodiversity monitoring when it is crucial that all species are identified or when (relative) abundance estimates are required.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/d16040196/s1, Data S1: Full alignment, including shac basecalled Flongle and MinION data plus Sanger sequences, used for inferring the neighbor-joining tree in Figure S3; Figure S1: Relationship between number of ambiguities and number of reads per sample, depending on nanopore sequencing approach; Figure S2: Relationship between number of mismatches and number of reads per sample, depending on nanopore sequencing approach; Figure S3: Neighbor-joining tree of Sanger-sequenced barcodes and all shac-basecalled barcodes from Flongle, MinION, and combined Flongle+MinION runs; Figure S4: Inconsistencies between QC-compliant nanopore sequences and Sanger sequences; Figure S5: Assignment of sample 30 to Pemphigus populivenae using the tree-based assignment in BOLD; Table S1: Index sequences (for nanopore sequencing) and PCR primers used for each individual sequencing; Table S2: Taxonomic assignment of shac-basecalled Flongle and MinION barcodes and barcodes generated via Sanger sequencing.
No Ns or errors are permitted in QC-compliant barcodes.

Figure 2. Neighbor-joining tree (K2P distances, pairwise deletion of missing data) of the barcodes of moss-dwelling invertebrates generated in this study. As measures of nodal support, bootstrap values are shown (1000 pseudoreplicates; only values >50 are shown). Sample IDs are followed by (conservative) taxonomic assignment and % similarity to the best match in BOLD or GenBank, if species assignment failed with BOLD. Samples with species identity inferred via BOLD are highlighted in bold.

Table 1. Read and barcode statistics for the different sequencing approaches.