Epiphytic Diatom-Based Biomonitoring in Mediterranean Ponds: Traditional Microscopy versus Metabarcoding Approaches

: Benthic diatoms have traditionally been used as bioindicators of aquatic ecosystems. Because diatom-based monitoring of water quality is required by European legislation, molecular-based methods had emerged as useful alternatives to classical methods based on morphological identiﬁcation using light microscopy. The aim of this study was to test the reliability of DNA metabarcoding combined with High-Throughput Sequencing (HTS) techniques in the bioassessment of the trophic status of 22 Mediterranean shallow ponds in NW Spain. For each pond, the Trophic Diatom Index (TDI) was calculated from inventories obtained by identiﬁcation using light microscopy (LM) followed by high-throughput sequencing (HTS) at the molecular level. Ponds were subsequently classiﬁed into ﬁve water quality classes. The results showed a good correspondence between both methods, especially after applying a correction factor that depended on the biovolume of the cells. This correspondence led to the assignment to the same quality class in 59% of the ponds. The determination and quantiﬁcation of valves or DNA sequences was one of the main pitfalls, which mainly included those related to the variability in the relative abundances of some species. Accordingly, ponds with similar relative abundances for the dominant species were assigned to the same quality class. Moreover, other difﬁculties leading the discrepancies were the misidentiﬁcation of some species due to the presence of semi-cryptic taxa, the incompleteness of the reference database and the bioinformatic protocol. Thus, the validation of DNA-based methods for the identiﬁcation of freshwater diatoms represents an important goal, as an alternative to using traditional methods in Mediterranean shallow ponds.


Introduction
Benthic diatoms are well-known bioindicators of water quality that are used in many aquatic ecosystems due to their sensitivity to changes in the environment, particularly those affecting water chemical conditions. Diatom-based methods are routinely used in water biomonitoring, and thus differentindices have been historically developed to classify the ecological status of waterbodies. Traditionally, these methods are based on morphological identifications at the species level (e.g., Pollution Sensitivity Index [1,2]).
However, shallow lakes, to the best of our knowledge, still remain unstudied. Shallow lakes, especially those in the Mediterranean region, are characterised by their shallow average depth and their small area, with different degrees of temporality. Despite their important role in the environment-providing habitat for animals and plants, diversifying the landscape or acting as water reserves-they have suffered an increasing pressure in terms of nutrients and organic pollution. Human activity, including changes in soil uses and high-intensity farmland, has contributed substantially to the degradation of these habitats. Products derived from agricultural and livestock activities (especially fertilisers and pesticides) are the main pollutants in these types of habitats. These lakes have been extensively studied from the perspective of their ecological quality using diatoms [17][18][19][20] and recently [21] applied molecular techniques under a metacommunity framework. However, metabarcoding techniques have not been applied to date together with Next-Generation Sequencing (NGS) to test their validity in water quality assessment in shallow lakes. Thus, in this study, as in previous studies [22][23][24][25][26], the use of epiphytic diatoms for biomonitoring purpose is evidenced in shallow lakes, where macrophytes may be the only available substratum to be found in virtually any system. Our study specifically addresses the question of whether there are differences between the ecological classifications of shallow lakes based on the diatom data acquired by the DNA barcoding and light microscopic identification of diatoms.
In this study, the aim was to compare the differences between diatom species inventories obtained by classical (light microscopy) and molecular (metabarcoding of diatom rbcL DNA) methods in a set of 22 Mediterranean shallow ponds in the Duero river basin (NW Spain). Thus, the reliability of molecular-based methods for the assessment of water quality in this system was tested.

Study Area
The study area ( Figure 1, Table 1) is located in the northwest of the Iberian Peninsula within the Duero river basin, with an area of 97,000 km 2 , of which 81% is in Spanish territory. Samples were obtained from shallow ponds situated 800 m above sea level (ASL) on the western side of this basin. The study site is mainly affected by agricultural intensification, turning to irrigated crops in recent decades.

Diatom Sampling
Diatoms were collected from the submerged stems of Schoenoplectus lacustris (L.) Palla, in July 2018 from 22 shallow ponds. In each pond, at least 10-12 stems that were randomly distributed throughout the pond surface were chosen and cut at 10 cm below the water surface, according to recommendations from previous studies [18,27]. The stems of each pond were placed together in a 1 L plastic bottle filled to 0.5 L with distilled water. The attached diatoms were dislodged by shaking for two minutes [28][29][30]. Subsamples for microscope analysis were preserved in 4% v/v formaldehyde, whereas subsamples for molecular analysis were preserved with ethanol (70% v/v final concentration) and stored in the dark. Simultaneously, water samples were taken in order to determine the trophic conditions in each system, and thus total nitrogen (mg L −1 ) and total phosphorus (µg L −1 ) levels were determined following standard methods [31].

Morphological Analysis
Samples for light microscopy (LM) analysis were treated with hydrogen peroxide and hydrochloric acid, according to standard methods [32]. Cleaned frustules were mounted on permanent slides using Naphrax, and at least 400 valves were identified and counted from each sample using a light microscope (Olympus BX60, Olympus Optical Co, Tokyo, Japan) at a magnification of 1000×, following the procedure for European diatom floras [33][34][35][36][37].

DNA Extraction, PCR Amplification and Sequencing Data Processing
DNA was extracted from a subsample of 2 mL which was centrifuged for 30 min at 11,000 g. Supernatant was removed and material in the pellet was resuspended in 200 µL of nuclease-free water. The Power Soil DNA Isolation Kit ® (Mo Bio Laboratories, Carlsbad, CA, USA) was used for extraction, following the manufacturer's recommendations. The gene marker rbcL was amplified by PCR using the primers proposed by [16,38], an equimolar mix of three PCR primers, one forward (Diat_rbcL_708F_1 (AGGT-GAAGTAAAAGGTTCWTACTTAAA), Diat_rbcL_708F_2 (AGGTGAAGTTAAAGGTTCW-TAYTTAAA) and Diat_rbcL_708F_3 (AGGTGAAACTAAAGGTTCWTACTTAAA)), and two reverse (Diat_rbcL_R3_1 (CCTTCTAATTTACCWACWACTG) and Diat_rbcL_R3_2 (CCTTCTAATTTACCWACAACAG)), including Illumina adapters P5 (CTTTCCCTACAC-GACGCTCTTCCGATCT) and P7 (GGAGTTCAGACGTGTGCTCTTCCGATC). For each DNA sample, six PCR replicates were carried out on 10-20 ng/µL of extracted DNA in a mixture (50 µL final volume) containing 2 U of Platinum II Taq Hot-Start DNA Polymerase (Invitrogen, Grand Island, NY, USA), 10 µL of 5× Platinum II PCR Buffer, 0.5 µM of each primer, 5 µL of dNTP mix (2 mM each), 10 µL of Platinum GC Enhancer and 9.6 µL of nuclease-free water. PCR conditions included an initial denaturalisation step at 94 • C for 4 min followed by 40 cycles of denaturalisation at 94 • C for 30 s, annealing at 55 • C for 30 s and extension at 68 • C for 30 s, and a final extension step at 68 • C for 10 min. After PCR, the amplification of the rbcL was confirmed by 1.5% agarose gel electrophoresis stained with ethidium bromide and visualised with UV light.
The sequencing process was carried out in Sistemas Genómicos SL (Valencia, Spain). Purified DNA from all six PCR replicates was pooled into a single sample and quantified using Qubit 3.0. (Invitrogen, Life Technologies, Grand Island, NY, USA). From each sample, 5 ng of pooled PCR products was subjected to indexing PCR on 50 µL of reaction mixture containing KAPA HiFi Hot Start DNA Polymerase (Kapa Biosystems, Wilmington, DE, USA) and a combination of specific primers to allow multiplexing of all PCR products in the same sequencing run. PCR products were purified with Agencourt AMPure XP beads (Beckman-Coulter, Brea, CA, USA). After that, a 4200 Tape Station (Agilent Technologies, Santa Clara, CA, USA) with a High Sensitivity D1000 Screen Tape was used to determine the quality and quantity of purified amplicons. Finally, rbcL libraries were pooled and paired-end sequenced (2 × 250 bp) on a MiSeq 2500 instrument (Illumina, San Diego, CA, USA) using a Micro MiSeq Reagent Kit v2.

Bioinformatic Analysis
For the bioinformatic treatment, all of the FASTQ files were treated together, following recommendations by [38] and process optimisation by [9], obtaining taxonomic assignments directly from individual sequence units (ISU). All of the analyses were performed using Mothur software v.1.42.1 [39]. DNA reads were filtered to remove reads with length > 250 bp, those with Phred quality score > 23 over a moving window of 25 bp (allowing up to 1 mismatch in primer sequence), homopolymer > 8 bp and sequences with ambiguous base calls (maxambig = 0). Taxonomic assignment was conducted directly from individual sequence units (ISUs) after removing chimeric DNA sequences using the Uchime algorithm [40] following [9], where those authors demonstrated the effectiveness of this method as an alternative to operational taxonomic unit (OTU) clustering in biomonitoring. The reference database was Diat.barcode v7 [41] (available at https://www6.inrae.fr/carrtel-collection/Barcoding-database/Database-download, accessed on 21 January 2020).
For each species, the correction factor (CF) proposed by [42] was applied to reduce the bias produced by the greater number of sequences in cells with large biovolumes. These biovolume values were calculated from Diat.barcode v7 [41] and applied following the CF modification proposed by [9].

Ecological Status Class Assignment
With the resulting floristic inventories, water quality was determined for each system by applying the Trophic Diatom Index (TDI) [2], computed using OMNIDIA software v. 6.0.8 [43]. This index was selected on the basis of the diatom taxa that were considered for its calculation. In most ponds, all species were considered for the calculation of TDI scores, and in the least of the cases 83.3% of the species were considered in order to compute the water quality index. Ponds were subsequently classified into five classes: High, Good, Moderate, Bad and Poor status, according to the criteria of the European Water Framework Directive (2000/60/EC).

HTS Analysis
A total of 4,150,073 reads were obtained by Illumina MiSeq sequencing. After trimming and applying quality filters, 2,699,154 DNA reads were conserved. After taxonomic assignment corresponding to 209,650 ISUs, 159 taxa were successfully determined at species level (50.1% of total reads). The number of species per sample ranged from 43 to 93, with an average of 59 species (Table S1). After applying the biovolume correction factor, the most abundant species were Achnanthidium minutissimum (Kützing) Czarnecki, Ulnaria acus (Kützing) Aboal, Fragilaria sp., Gomphonema saprophilum (Lange-Bertalot and E.Reichardt) Abarca et al. and Nitzschia palea (Kützing) W.Smith.

Comparison of LM and HTS Inventories
Only 20.1% of the taxa were concordant when comparing both methodologies ( Figure 2). The molecular method was unable to detect some species such as Gomphonema exilissimum (Grunow) Lange-Bertalot and E.Reichardt, Nitzschia palea var. debilis (Kützing) Grunow, Ulnaria biceps (Kützing) Compère or Gomphonema pumilum var. rigidum E.Reichardt and Lange-Bertalot. These species could have been misidentified during morphological identification as morphologically similar taxa. The pairwise comparisons of the most abundant species are summarised in Table 2. For instance, Fragilaria nanoides Lange-Bertalot was probably identified as Fragilaria tenera, as deduced from their similar relative abundances (9.1% and 3.9% by LM and HTS, respectively).   The correlation between relative abundances of valves (LM) and number of reads (HTS) for the most abundant species (Achnanthidium minutissimum, Eunotia bilunaris, Fragilaria tenera and Ulnaria acus) was positive and statistically significant (Figure 3). Larger diatoms were widely overrepresented in HTS inventories before applying the correction factor. This was the case for Ulnaria acus, which varied from a relative abundance of 15.8% to 10.7% after applying the correction factor, which was a similar value to that identified by LM (8.7%).

SPECIES
Regarding the potential for LM misidentification, Nitzschia gracilis was only identified in three samples by LM, but occurred in many samples according to HTS. In one of the ponds (Linos) a total of 11% of sequences were assigned to Nitzschia palea, but by LM up to 37% of valves were assigned to N. gracilis (not detected by HTS). Similarly, a confusion may have occurred between Fragilaria tenera and F. nanoides, identified only by LM or HTS, respectively. Another case of suspected misidentification affected Gomphonema parvulum sensu stricto; while Gomphonema saprophilum (Lange-Bertalot and Reichardt) Abarca Jahn Zimmermann and Enke was the most abundant species from the Gomphonema parvulum species complex in HTS inventories, Gomphonema parvulum f. parvulum (Kützing) Kützing was dominant in the datasets that were derived from light microscopy.
Overall, HTS was able to resolve a larger species diversity within certain genera. For instance, seven different taxa were identified at species level for Eunotia (Eunotia arcus, E. bilunaris, E. glacialis, E. implicata, E. minor, E. mucophila, Eunotia sp.), whereas only five taxa were identified at species level (Eunotia sp., E. bilunaris, E. implicata, E. minor, E. pseudoflexuosa) by LM. Of particular interest is the case of Eunotia glacialis, a species known to be present in northern Spain (Aboal et al. 2003) and which was actually recorded in all of the studied ponds by HTS but was apparently absent from the studied systems according to the results provided by LM analysis. Conversely, several genera were only detected by LM analysis, such as Fallacia Stickle and Mann, Amphora Ehrenberg ex Kützing, Cocconeis Ehrenberg, Cyclostephanos Round, Epithemia Kützing, Gyrosigma Hassall, Lemnicola Round and Basson, Luticola Mann, Planothidium Round and Bukhtiyarova, Rhoicosphenia Grunow, Tabellaria Ehrenberg ex Kützing. Finally, a number of taxa, such as Achnanthidium digitatum Pinseel, Vanormelingen, Hamilton and Van de Vijver, were not expected in the inventories obtained because, to our knowledge, these have not previously been recorded in Spain and to date are only known from high latitude localities (Pinseel et al., 2017).
The Trophic Diatom Index scores showed a better correlation between data from LM and from metabarcoding after applying the correction factors for cell size. The percentage of ponds ( Figure 4) with a congruent classification according to both methods increased from 45 to 54 after applying the correction factor. None of the ponds was classified as "Bad" quality by molecular and morphologic-based methods, whereas the only pond classified as "Poor" was assigned to the same class by both methods. In the case of ponds classified as "Moderate" or "Good", 7 ponds out of 22 were incongruently classified (Table 3). Regarding the "High" quality class (blue), only one system reached this quality range by conventional methods, which was underestimated by HTS. For 50% of the systems evaluated, TDI scores obtained by both methods differed by less than 20%.  For instance, the only pond classified as "High" quality by LM (LP04 TRIGO), showed differences regarding the dominant taxa that were found according to the methods that were compared. By light microscopy, the most abundant species were Fagilaria tenera (75.8%), Ulnaria acus (17.6%) and Nitzschia gracilis (3.1%), contrasting with Fragilaria sp. (47.2%), Achnanthidium minutissimum (19.2%) and Ulnaria acus (16.2%) derived from HTS analysis.
The results of the correlation test ( Figure 5) that was performed between index values that were calculated from morphological and molecular inventories showed significant correlation in the case of the TDI only, after applying the correction factor. Total phosphorus and total nitrogen were analysed as one of the main indicators of nutrient pollution. Total phosphorus (TP) concentrations, with an average concentration of 122.3 (µg L −1 ), ranging from 26 to 796.8 µg L −1 , correlated with TDI derived from morphological and molecular-based methods. A non-significant negative correlation was observed between total phosphorus and TDI calculated from both morphological (R = −0.27) and molecular (R = −0.23) approaches ( Figure 6). Total nitrogen (TN) levels ranged from 0.8 to 3.2 mg L −1 with an average of 1.82 mg L −1 . The correlation coefficients to TN and TDI that were calculated from both methods, LM (R = −0.23) and HTS (R = −0.40), were not significant (p > 0.05).

Discussion
The comparison of morphological and molecular-based inventories demonstrated a similarity of 20.1% in terms of species number. In fact, the results are comparable with the 15.7% found in [16] and other studies in rivers, ranging from 13 to 26.7% [9,11,15,38].
Comparing the relative abundance of the sequences (HTS inventories) and of the valves (LM inventories), it can be assumed that certain species (identified at the genus level) may have been misidentified under LM. In this regard, molecular methods can give an idea of the total richness within a sample. Notwithstanding, the comparison of inventories obtained both morphologically and by HTS point out some nonconformities in geographical distributions. On the one hand, some taxa found by HTS had not been detected in previous analyses of the diatom assemblages in the study ponds. For instance, Achnanthidium digitatum Pinseel, Vanormelingen, Hamilton and Van de Vijver, has been detected in most samples by HTS methods, and not identified as such under LM. This species had been detected only in arctic and sub-arctic regions [44][45][46]. Similarly, three Fragilaria species (F. aculeopectinalis, F. longipectinalis and F. taeniavaucheriae) were also not identified using light microscopy. In these cases, in which the species are not included in the OMNIDIA database, autoecological ranges need to be defined for an accurate diagnosis of the ecological status.
On the other hand, the appearance of different species between both inventories may be due to misidentification within diatom complexes. Some studies based on phylogenetic analysis have been conducted [4,[47][48][49] in order to explain their taxonomy and ecology, and these have indicated that geographical distribution may affect species determination in extremely similar morphological cases. Thus, traditional morphological characteristics are insufficient to distinguish varieties (or even species), whereby the authors in [50] describe pseudo-cryptic species. Since then, several studies have been aimed at providing an accurate identification of semi-cryptic taxa: Gomphonema parvulum (Kützing) Kützing complex [47], Nitzschia palea (Kützing) Smith complex [51], genus Fragilaria [52], Navicula cryptocephala Kützing [53]. In the case of Gomphonema parvulum complex, in their study, [48] elevated G. saprophilum to species rank, suggesting the biogeographic importance in determining this species, and yet this is not so easy to identify by microscopic methods.
In the case of some genera, they are over-represented in terms of species diversity, due to the database that was used for the taxonomic assignment. For instance, genus Eunotia was represented by a greater number of species using HTS, which [54] associated with the pipeline used. Several lightly silicified taxa could not be found by light microscopy because of the oxidant reactives that were used during the permanent slide preparation process [55]. For instance, Fistulifera saprophila, which was actually present in the samples, was not detected by LM in any pond. Our results are in agreement with this point, regarding other studies [10,11]. On the contrary, some other species were not recorded by HTS, probably because of the lack of some taxa in the reference library or because they were not detected by molecular methods; some dead diatom valves can be retained in the collected biofilm growing on the stems, so they could only be detected by LM. Finally, the presence of certain endemic species [38], not included in the reference database, cannot be ruled out.
With respect to the ecological classification according to TDI scores, 59% of ponds were assigned to the same quality class in both methods, with only one class of difference for the others, hence with similar results to studies performed in rivers (57% in [15]; 69.8% in [11]; 64% in [16]). Similarly, the comparison between transformed (applying CF) and untransformed data showed an improvement in the assignment to quality classes derived from morphological data, as also reported by the same authors. In our case, after applying the correction factor, the percentage of congruent water quality classifications increased from 45.4% to 59.0%. The correction factor proposed by [42] and modified by [9] seems to be a good strategy to avoid the bias produced by the difference in the rbcL copy number. This modification substantially improves the assignment to quality classes in the studied ponds, as has already been shown in previous studies [9,11]. Nutrient pollution caused by increasingly growing intensive agriculture is seriously affecting wetlands. As a consequence, high levels of nutrients are released into the water bodies, these having high nitrogen and phosphorus levels that contribute predominantly to this problem [56,57]. The results of the present study show a good correlation with the Trophic Diatom Index, thus highlighting the efficacy of this index, which was initially conceived for deep continental lakes, to monitor nutrient levels in shallow ponds.
As emphasised by many similar studies (e.g., [54]), one of the key steps for the correct application of DNA barcoding for biomonitoring purposes is the bioinformatic protocol. Standardised protocols are needed to reach an agreement about bioinformatic analysis. In our case, the results suggest a moderately poor correspondence in the species composition and abundances between the HTS inventories that were obtained from different bioinformatic treatments starting from the same sequences [21].
Differences in taxonomic assignment methods have been widely discussed in [54], where the strategies of different research groups and different software were compared.
Those authors suggested the development of an index based exclusively on the presence or absence of taxa. This strategy gives good results in other taxonomic groups, such as macroinvertebrates (using the IBMWP index, [58]). In the present study, taxonomic assignment was conducted directly with individual sequence units (ISUs), in accordance with previous studies [9,59], where the effectiveness of this method was demonstrated as an alternative to OTU clustering in biomonitoring. Nevertheless, a taxonomy-free analysis seems to be a better approach according to several other authors [10,13,49,60]. Indeed, a taxonomy-free approach based on diatom community samples has already been successfully tested to assess rivers in Portugal [13] and the UK [10]. However, the most reasonable approach seems to be to seek consensus between the equivalence of sequences (HTS) and valve number (LM) [10].

Conclusions
Compared to conventional analyses based on morphological characteristics that require more qualified personnel, DNA barcoding combined with High-Throughput Sequencing (HTS) provides an interesting complementary information source for morphological data. Nevertheless, the determination and quantification of valves or DNA sequences was one of the main pitfalls, which mainly included those related to the variability in the relative abundances of some species. Accordingly, ponds with similar relative abundances for the dominant species were assigned to the same quality class. The main difficulties leading the discrepancies were the misidentification of some species due to the presence of semi-cryptic taxa, the incompleteness of the reference database and the bioinformatic protocol.
In the present study, we demonstrated the efficacy of the use of a taxonomic assignment that was conducted directly using individual sequence units (ISUs). As far as these results show, new studies are needed to expand the study zone and the range of trophic statuses to assess how valuable the routine is in the bioassessment of shallow ponds. New studies are needed to establish policies aimed at protecting, conserving and managing these resources. In conclusion, the validation of DNA-based methods for the identification of freshwater diatoms represents an important goal, whereas this information from DNA-based methods is complementary to traditional methods in the monitoring and conservation of Mediterranean shallow ponds.