Lake Sedimentary DNA Research on Past Terrestrial and Aquatic Biodiversity: Overview and Recommendations

The use of lake sedimentary DNA to track long-term changes in both terrestrial and aquatic biota is a rapidly advancing field in paleoecological research. Although the approach is now widely applied, knowledge gaps remain, and further research is needed to ensure the reliability of the sedimentary DNA signal. Building on the most recent literature and seven original case studies, we synthesize the state-of-the-art analytical procedures for effective sampling, extraction, amplification, quantification and/or generation of DNA inventories from sedimentary ancient DNA (sedaDNA) via high-throughput sequencing technologies. We provide recommendations based on current knowledge and best practices.


Tracking Past Ecological Changes from Lakes and Catchments with sedaDNA
Lake sediments consist of both autochthonous (in-lake) and allochthonous (from the catchment and beyond) organic and inorganic matter. Paleoecological inquiries using biological archives stored in lake sediments have been largely dominated by microscopic analyses of the relatively limited number of aquatic and terrestrial groups that leave well-preserved and readily identifiable morphological remains in the sediment (e.g., silicified diatoms, calcified nannofossils, organic-walled or calcified dinoflagellates, chrysophyte cysts, Cladocera remains, chironomid head capsules, fungal spores, pollen and plant macrofossils) [1][2][3][4][5][6][7][8][9], keeping the remaining biological diversity out of reach. Where these morphological remains are not well preserved or cannot be taxonomically identified to the species level, alternative proxies have been sought to build a more detailed and comprehensive understanding of past biological diversity in a broader range of environments [10,11]. In particular, the use of biomarkers, such as pigments and lipids, has emerged as a reliable alternative for the study of past ecosystem functioning [12]. However, their taxonomic specificity remains limited.
A more recent alternative that is now gaining in popularity is to target nucleic acids (DNA) preserved in the sediment archive. The sequencing of targeted genetic regions or the metagenome (all DNA present) extracted from sedimentary DNA can be used to reconstruct changes in biodiversity, i.e., distribution of species based on the genetic [...]
Figure caption (beginning not recovered): [...] 25 November 2020 and was restricted to journal articles and review articles. The results are split into two categories for visualization: publications that contained the search terms sedaDNA or sedimentary ancient DNA (dark grey bars); and publications that had the search terms sedDNA or sedimentary DNA, but that did not contain the first two terms (pale grey bars).
In recent years, sedaDNA has shown great potential for reconstructing past biological assemblages within lakes and their terrestrial catchments (for review see [13][14][15][19][20][21][22]). Following extraction of sedaDNA, several analytical methods have been used (Table S1), including: (i) methods aiming to detect or quantify the occurrence of specific target organisms on the basis of endpoint PCR assays, quantitative real-time PCR (qPCR) and special applications such as droplet digital PCR (ddPCR); (ii) DNA metabarcoding-based methods combining PCR amplification of marker loci (often described as "barcodes") with high-throughput sequencing; (iii) metagenomic approaches based on untargeted shotgun sequencing of the total pool of DNA recovered from the sediment; and (iv) hybridization-based target enrichment methods to recover DNA fragments of interest from the sediment metagenome (see Section 3.8).
Because sedaDNA captures biological changes in both the aquatic and terrestrial ecosystems, past genetic signals have the potential to provide insights about landscape development (including terrestrial floral and faunal changes and anthropogenic impacts) and the lake ecology itself (Table S1). Using sedaDNA to reconstruct changes in local vegetation has become a particularly well-established method as a complement to traditional microscopic analyses of lake sediment such as pollen analysis (e.g., [23][24][25][26][27][28][29][30]). It also offers the opportunity to study the response of organisms to disturbances of natural and/or anthropogenic origin [31][32][33][34][35], the interactions of species at different trophic levels [36][37][38] and between native and introduced species [35,39], or species recovery and ecosystem restoration [40]. Lake sedaDNA is not only of interest to paleoecologists but also to archaeologists, because the recovered data can provide insights into human history and interactions with the environment such as agriculture and urbanisation (e.g., [41,42]). Lake sedaDNA can also be used by geomorphologists to trace sediment sources or by evolutionary biologists to study population changes through time.
Quaternary 2021, 4, 6
In the present work, we aim to provide an overview of the most recent advances in sedaDNA research on lakes. We also identify the major methodological knowledge gaps that remain and present original data, in the form of seven case studies, that tackle these challenges. Finally, we recommend a series of current best laboratory practices to successfully and robustly reconstruct past environmental change from lake sediment DNA archives, while acknowledging that this is a rapidly advancing field that requires continuous updating.

sedaDNA to Study Past Vegetation Changes from Lake Catchment
DNA-based studies of past vegetation have mainly focused on arctic, boreal, and alpine regions because of their high sensitivity to climate change, providing new insights on past vegetation dynamics and species distributions (e.g., [16,26,27,43–49]). In high-latitude regions, sedaDNA has, for example, contributed to increased knowledge on the occurrence of insect-pollinated plants, which are typically underestimated in pollen analyses [24,27,50].
Some recent examples of studies are described below.
An exceptionally rich sedaDNA record that covers 24,000 years of vegetation dynamics in the Polar Urals provides empirical evidence that arctic-alpine species survived early Holocene forest expansion in this heterogeneous landscape [27,47]. A study of 10 lakes in northern Fennoscandia shows that there was a continuous increase in the regional species pool from the onset of the Holocene until the last two millennia, suggesting a severe time lag in colonization [16]. In the European Alps, sedaDNA studies have provided new knowledge about the history of agricultural activities; for instance, fruit trees were detected by sedaDNA but not by microscopic pollen analyses [51]. Other sedaDNA studies have focused on the effects of soil evolution, climate change and pastoral activities on plant communities (e.g., [32]). Based on the current body of literature, we know that sedaDNA can also trace vegetation changes over millennia in environments other than cold high-latitude regions, including tropical regions, from high-altitude sites with lower temperatures to warmer lowland sites [52][53][54]. At the same time, some of the above-cited works highlighted confounding methodological issues related to accelerated degradation of sedaDNA with temperature, which may result in less informative historical DNA signals in lake sediments from moderate- to high-temperature environments (e.g., [53]).

sedaDNA to Detect Human and Animal Presence in the Lake Catchment
Human activities have been inferred from microscopic analyses of pollen from cultivated plants, native plants that are favored by livestock grazing or other disturbances, and coprophilous fungal spores associated with livestock [55]. The first studies that aimed to track the presence of humans and domesticated animals in a lake catchment with sedimentary DNA-based methods used bacterial DNA indicative of past human and animal faecal waste [56,57]. In contrast to pollen, which normally gives both a local and regional signal in a lake, bacterial DNA provides only a local, catchment-specific signal, although it does not allow identification of which livestock species were present. The metabarcoding approach applied to a high-altitude lake sediment record in the Alps revealed the first history of livestock compositional dynamics, including Bos (cow), Ovis (sheep), and Equus (horse/donkey) [31]. Other studies of lakes in the northwestern Alps have similarly provided new detailed insights into the temporal patterns of pastoral activities and changes in livestock species composition, especially during the Medieval Period [58][59][60][61]. Even in the very remote site of Kerguelen Island, an ongoing rabbit invasion could be successfully traced using sedaDNA, elucidating the impact that the invasion had on plant communities and landscape erosion [35]. Other studies have used shotgun metagenomics to detect the presence of mammalian megafauna and other animals formerly present in lake catchments. Such approaches have contributed to constraining the timing of an island woolly mammoth population extinction in Alaska during the Holocene [33] and to understanding when newly deglaciated landscapes became biologically viable in North America at the end of the last ice age [62].
However, due to methodological limitations and/or lack of animal sedaDNA preservation in lakes, recent studies that set out to target mammalian DNA in lake sediments have not been particularly successful [63] and instead resulted in the by-catch of other animal groups [64] (see Section 1.5).
Most of these studies describe the long-term responses of communities to environmental perturbations (see [13]), while others investigate ecological interactions through time, such as the co-existence of parasitic groups and their phytoplanktonic hosts [36,37] or the role of double-stranded DNA viruses in terminating past algal blooms [88]. One study succeeded in tracing within-population genetic variation in an algal population during the Last Glacial Maximum [81]. The presence of key functional genes from aquatic organisms can also be traced using the sedaDNA approach, as in the case of mcy genes responsible for microcystin production by specific cyanobacteria [89,90], amoA genes associated with ammonia oxidation [91] and merA genes involved in mercury detoxification [92,93]. Finally, several studies have shown that the past activity of methanotrophs in lake systems can be studied [66,94–97].

Influence of Taphonomic Processes on the Burial and Persistence of sedaDNA
The studies presented above show the utility of sedaDNA methods to unravel biodiversity changes in lakes and their catchments but several concerns remain about the interpretation of such data due to the limited knowledge we have on the taphonomic processes occurring in lakes (i.e., origin, transport, and preservation of genetic material under prevailing environmental conditions). While issues related to taphonomy in lakes have been reviewed elsewhere [21,39], we provide here a brief update.
From source environment to sediment: For aquatic organisms living in the lake, several factors can influence the processes of incorporation and burial of DNA in the sediments, e.g., organism abundance, spatial distribution, the ability to form cysts, and the edibility of the focal taxon [98,99]. For terrestrial organisms living in the lake catchment, it is also important to consider the transport of material and DNA from the catchment to the lake, which likely depends on multiple factors including soil erosion and hydrological connectivity [14,30,39,64,82].
Degradation and adsorption of DNA molecules at the water-sediment interface and in sediments: Environmental conditions in water, at the water-sediment interface and within the sediment column (such as temperature, redox state, conductivity and pH) influence the rates and extent of both abiotic and biotic DNA degradation (e.g., [66,67,100]). The adsorption of DNA to mineral particles was recently reported to be a strong factor controlling the persistence of DNA molecules in sedimentary archives [101,102]. Desorption of adsorbed DNA from the particulate phase in sediments can also occur and depends on the mineralogic composition, pore-water pH, and the valence and concentrations of cations in the sediment [103]. Changes in environmental conditions at the water-sediment interface are thus likely to be of crucial importance for the long-term preservation of DNA molecules during burial into sediments. Meanwhile, sedaDNA has also been successfully recovered and analyzed from aquatic systems that did not provide ideal preservation conditions, including sediments from warm, tropical lakes [52,83,104,105] and oxygenated deep-sea sediments [11,106]. However, preservation in such settings can be poor: in tropical Lake Sele (Africa), Bremond et al. [53] found that DNA preservation was poor except in the upper sediments corresponding to the last ~200 years. This may be due to accelerated DNA degradation rates related to higher temperatures (i.e., an average annual air temperature of 28 °C) and the associated higher bacterial activity at their study site. As such, there is still much to be learned regarding the conditions that promote and compromise sedaDNA preservation.
Early diagenesis of DNA molecules during burial in sediments: Experimental evidence from lake sediments suggests that there is only a limited effect of early diagenesis on the DNA signal of microbial eukaryotic communities [107]. However, it is possible that DNA is degraded or damaged during early diagenesis, e.g., by microbes using extracellular DNA as an energy source [108] or by environmentally induced strand breakage [109]. In contrast, DNA contained in resting stages, such as cyanobacterial akinetes, Cladocera resting eggs (ephippia) or protist resting propagules, is likely better protected than extracellular DNA or intracellular DNA from organisms with more fragile outer membranes (see Section 2.3). Similarly, recalcitrant structural elements such as lignin in terrestrial plants are likely to protect cellular DNA against microbial degradation more efficiently than is the case for most aquatic primary producers (algae and macrophytes).
Long-term persistence of DNA molecules in sediments: Overall, the processes involved in the preservation of DNA over long time scales in sediments are not yet fully understood, but DNA preservation seems to be affected mainly by environmental conditions (temperature and anoxia) during incorporation at the bottom of the lake. Nevertheless, sedaDNA molecules have been reported from more than 270,000-year-old sediments of Lake Van in Turkey [72], and from marine sediments nearly a million years old [110]. Such reports, however, are very close to the theoretical limit of ancient DNA preservation [111] and should therefore be treated with caution until they are repeated using methods that can properly authenticate ancient DNA from these substrates [112]. Today we know that strand breakage, miscoding lesions and crosslinks may heavily compromise the analysis of sedaDNA molecules [109,113–115]. Thus, the quality of DNA molecules needs to be carefully inspected to avoid confounding temporal biodiversity change with changes in DNA quality over time, particularly when inferring long-term patterns of species persistence and richness (e.g., [16,26,39,47,116]). Authenticity of sedaDNA older than 10,000 years and/or from poorer preservation settings can be demonstrated by identifying characteristic DNA damage patterns in metagenomic data and supported by corroborating proxy data whenever possible [33].
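As an illustration of what inspecting damage patterns can involve, the following minimal Python sketch tallies C-to-T mismatches by distance from the 5' end of aligned reads. An excess of such mismatches near read ends is a widely used signature of cytosine deamination in ancient DNA. The function name, the input format (pre-aligned, gap-free reference/read string pairs) and the toy sequences are assumptions made for this example; they are not the procedure of the studies cited above, and real analyses rely on dedicated tools working on alignment files.

```python
from collections import defaultdict


def c_to_t_by_position(alignments, max_pos=10):
    """Return the C->T mismatch frequency for the first `max_pos`
    positions counted from the 5' end of each read.

    `alignments` is an iterable of (reference, read) string pairs of
    equal length, aligned without gaps (a simplifying assumption).
    Positions where no reference C was observed yield None.
    """
    c_total = defaultdict(int)  # reference C observed at position i
    c_to_t = defaultdict(int)   # of those, read shows T at position i
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length")
        for i, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if i >= max_pos:
                break
            if r == "C":
                c_total[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [
        c_to_t[i] / c_total[i] if c_total[i] else None
        for i in range(max_pos)
    ]


# Hypothetical toy alignments: two of three reads starting on a
# reference C show a T at the first position.
toy_alignments = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CGCTA", "CGCTA"),
    ("ACCGT", "ACCGT"),
]
frequencies = c_to_t_by_position(toy_alignments, max_pos=5)
```

In this toy input the first position carries the only C-to-T mismatches, which is the qualitative shape expected for damaged molecules; a flat profile across positions would instead be more consistent with modern DNA.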

To What Degree Does the sedaDNA Signal Represent Past Communities?
Success in recovering targeted ancient DNA from sedimentary archives depends on the taphonomy and preservation of DNA molecules within the sediments (as described in Section 1.5) and the ability to extract, sequence and taxonomically identify the DNA molecules [82,117]. The recovery success can be evaluated by comparing plant DNA signals with flora surveys in the lake catchments and signals from other sedimentary proxies like macrofossils, pollen, biomarkers or historical records. Yet, it has become evident that the taxonomic breadth of environmental reconstructions based on metagenomics approaches (e.g., [62,81,118]) can now potentially expand to all groups of organisms and is therefore too vast to be systematically "validated" by other proxies. The degradation of DNA molecules over time in sediments and the presence of active microbes that compromise or obscure the archived ancient signal should also be considered carefully when doing sedaDNA analysis.

sedaDNA Data Compared to Historical, Archaeological and Monitoring Data
Terrestrial organisms: For the DNA molecules of an organism that lived in a lake's catchment, the chances of reaching the lake bottom depend on several factors [39]. One of these factors is organism biomass, which affects DNA production at the source [14,30,64,82]. A study comparing sedaDNA from surface samples with vegetation surveys from eleven lakes in the boreal/subarctic ecozone showed that the effective detection of plant taxa mainly depends on the abundance of the taxa in the vegetation and distance from the lake [82]. Additionally, there were differences among the lakes studied and among plant families. Overall, the sedaDNA signal was strong enough to reconstruct near-shore vegetation types, and the detection of aquatic species was particularly good [82]. When compared to historical maps, sedaDNA accurately tracked exotic conifer plantations, and in contrast to pollen analyses from the same site, it did not show any signal from taxa dispersed over longer distances [119]. Thus, the DNA signal from lake sediment represents the vegetation within the catchment, in particular that in close hydrologic connection to the lake (e.g., shoreline, streams).
Meanwhile, studies assessing the reliability of mammal DNA records from lake sediments in comparison to historical or archaeological data are rare. This is because most sedaDNA studies to date have focused on areas where such data are lacking, leaving this gap as yet unfilled. However, the first study reconstructing livestock composition has been validated by archaeological remains and historical data [31]. Finally, the temporal occurrence of sedaDNA from rabbits, which had invaded an island in the 19th Century, was consistent with known historical records [35].
Plankton DNA compared to water monitoring data: Several studies confirmed that DNA from most aquatic planktonic organisms, from photosynthetic microbes to fish, can be detected in lake sediments [11,18,73,85,120]. Recently, a few studies have provided first insights into how reliably DNA signals preserved in sediments represent lacustrine communities. For example, Capo et al. [120] revealed that 70% of the microbial eukaryotic taxa (phylogenetic units in this work) living in the water column were retrieved in the sedimentary archives of a temperate lake. However, a comparison of the protist composition in the water column and in the underlying surface sediments showed that some groups, including cryptophytes, were underrepresented in sediments [120]. This is thought to be related to their high nutritional value, which makes cryptophyte cells likely to be preferentially grazed by herbivorous zooplankton [121,122], which would explain why their DNA does not reach the sediment.
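The kind of water-column versus sediment comparison described above reduces, in its simplest form, to a set overlap. The Python sketch below computes the share of water-column taxa that are also found in the sediment and lists those that are missing. The function name and the taxon lists are hypothetical and chosen only for illustration; they do not reproduce the data or the workflow of Capo et al. [120].

```python
def fraction_retrieved(water_taxa, sediment_taxa):
    """Return (fraction, missing) where `fraction` is the share of
    water-column taxa also detected in the sediment and `missing` is
    the sorted list of water-column taxa absent from the sediment.
    """
    water = set(water_taxa)
    if not water:
        raise ValueError("water_taxa must contain at least one taxon")
    shared = water & set(sediment_taxa)
    return len(shared) / len(water), sorted(water - shared)


# Hypothetical taxon lists for one lake (illustrative only).
water_column = ["Cryptomonas", "Ceratium", "Cyclotella", "Daphnia", "Dinobryon"]
surface_sediment = ["Ceratium", "Cyclotella", "Daphnia", "Dinobryon", "Pinus"]

share, not_retrieved = fraction_retrieved(water_column, surface_sediment)
```

Taxa present only in the sediment (here a catchment plant) are deliberately ignored, because the question asked is how much of the water-column community is archived, not how similar the two inventories are overall.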
Processes involved during the transport of DNA molecules or dead cells in the water column are still poorly understood [99], especially for smaller aquatic prokaryotes and protists with poorly discernible morphological traits. Few bacterial taxa can be identified using traditional microscopic cell counts, with some exceptions such as (large-size fraction) cyanobacteria. The work of Monchamp et al. [73] indeed revealed a correlation between the pelagic cyanobacterial taxa identified morphologically in the water column and the genetic information obtained from sediment. Similarly, Garner et al. [80] identified DNA from heterotrophic bacteria and viruses in sediment using the contemporary microbial diversity in surface-water metagenomes from the corresponding lakes as references. The presence of DNA from pelagic zooplankton in lake sediments confirms the potential of using the sedaDNA approach to study temporal community changes in larger taxa, including rotifers, copepods, and Cladocera (e.g., [11,45,83,123]). Altogether, as recently stated by Armbrecht et al. [22], it is "reasonable to assume that obligate photosynthetic plankton and/or zooplankton do not survive and reproduce after burial". While there is relatively limited evidence to date of how the active growth of modern-day microbial communities affects the ancient DNA signal at depth (i.e., below the first few centimeters in the sediment column), sedaDNA studies investigating past bacterial and archaeal communities can be strengthened by the authentication of ancient DNA signals with procedures to map DNA damage patterns such as those described in Pedersen et al. [62] and Lammers et al. [81].

sedaDNA Data Compared to Other Sediment Proxies
Molecular reconstructions of past flora and fauna based on sedaDNA have been compared against a range of biological and geochemical sediment proxies, often showing complementarity between the different approaches [27][28][29]33,48]. In particular, comparisons with well-established proxies such as plant macrofossils, pollen and diatom enumeration, coprophilous fungal spores, or specific biomarker identification, have provided unique opportunities to evaluate the nature and reliability of the DNA signal obtained from sediments.
Plant DNA compared to pollen and macrofossil analyses: The first studies comparing pollen and sedaDNA showed rather limited overlap, therefore only partly confirming the reliability of the taxa detected by DNA signatures [23,28,124]. However, the detection of taxa in sedaDNA has greatly increased in recent years with expanded reference libraries and improved molecular methods, and we now see higher taxonomic overlap between DNA and pollen [25,27]. A major issue for interpreting sedaDNA data, which also applies to pollen data, is the source area. Plant abundances inferred from metabarcoding reads seem to decline with distance from the lake, suggesting that sedaDNA provides a local signal [82]. Pollen is likely not a major source of DNA in sediment records, as pollen from taxa typically dispersed over long distances is poorly recorded in sedaDNA [25,27,28,125–127]. A metagenomic study focusing on woody taxa from a Beringian island also found that sedaDNA from spruce (Picea) could not be detected, although it was present in the pollen record, whereas willow could be detected in both records, with the sedaDNA data authenticated [127]. Compared to pollen, sedaDNA provides a more detailed account of agricultural activities due to its higher taxonomic resolution (e.g., identification of fruit or vegetable plant species [39,58,128]). Thus, sedaDNA provides a more detailed description of the catchment area, particularly for areas with close connectivity to the lake, such as the shoreline and inflowing streams, and for areas impacted by erosion processes.
Fewer studies have compared sedaDNA with macrofossils [23–25,29,125], and the expectation is that plant species detected as macrofossils can also be detected with sedaDNA. However, some taxa detected as macrofossils may not be observed in the DNA record and vice versa. Which proxy is more sensitive may vary according to the nature and conditions of the site, the taxonomic group and the amount of biomass produced [24,25]. Nevertheless, the majority of the taxa detected are similar in comparative studies where macrofossil preservation is good.
In conclusion, comparison with pollen and macrofossils, as well as vegetation surveys and historical maps, confirms that sedaDNA detects the common species and has sufficient detectability and resolution to determine the vegetation types.
Mammal DNA compared to fossil records: The presence of herbivorous animal herds in a lake catchment can be inferred from sediments using proxies other than sedaDNA. The most established approach is based on the analysis of pollen and the detection of nitrophilous and ruderal taxa favored by animal faeces, trampling and selection due to plant consumption (e.g., Rumex sp., Urtica sp., Chenopodium sp., Plantago sp. [55]). Some studies also focused on the presence of DNA from bacterial lineages specific to the gut microbiota [56,57,129–131]. Other approaches involve the use of spores of coprophilous fungi, which develop on herbivore faeces (Sporormiella sp. [33,35,130]), lipid biomarkers [132], or corroborating data from radiocarbon-dated bones [33]. All these approaches can be used to assess the robustness of mammalian DNA records [33,35,39,130]. Indeed, several of these studies showed that the detection rate of mammalian DNA is lower than the rates inferred from other methods. Good detection of mammals has been observed at sites where there was only one source of drinking water [33] or where the concentration of mammals was high due to pastoral practices (e.g., presence of stabling areas [31,39]) or migration routes [62]. In contrast, the detection of mammals in tundra sites is poorer, likely because individuals are scattered and there are multiple sources of drinking water [64]. At a more southern site, mammal sedaDNA was not detected even in the presence of abundant coprophilous fungal spores [63]. Nevertheless, sedaDNA has the advantage of allowing identification at the species level, which is not possible using pollen, Sporormiella spores or specific Bacteroidales. While currently no mammalian DNA and faecal DNA datasets have been compared, this is, nevertheless, a promising complementary approach.
Indeed, ratios between different stanols and bile acids can be used to distinguish between omnivore and ruminant species and between humans and pigs [131,133–135]. Though limited, evidence from archaeological sites suggests that shotgun metagenomic reads assigned to a range of different animal taxa mirror their respective biomass estimated using classical analyses of bone remains [136]. Nevertheless, work remains to establish whether this quantitative relationship for sedaDNA holds across different types of archives.
DNA from aquatic biota compared to lipids, pigments and subfossils: The sedaDNA signal of past aquatic biota has mainly been compared to lipid, pigment, or subfossil (such as diatom frustules) records. The current view is that the DNA information in sediments degrades faster than pigments or lipids, although significant positive correlations have been observed between these different proxies [18,72,77,137–141]. Studies comparing diatom diversity retrieved from sediment using morphological and genetic approaches consistently show that beta-diversity values obtained from both methods were highly comparable (i.e., the turnover is similar), whereas alpha-diversity values and taxonomic assignment datasets were not [45,142–147]. This discrepancy in the reconstructions is at least in part attributable to the very sparsely populated genetic databases available for diatoms, a hypothesis that can be tested in the near future thanks to the exponential growth of genomic databases and initiatives such as the Earth Biogenome Project [148]. Heinecke et al. [149] demonstrated that the detection of sedaDNA identified as Potamogetonaceae was consistent with the recovery of subfossil remains from a species within this family of hydrophytes. Clarke et al. [26] recently detected Callitriche and Sparganium in both pollen and sedaDNA, while the two proxies detected an additional two and five taxa of aquatic macrophytes. DNA from animals, land plants, zooplankton, and from many photosynthetic bacteria and protists can be preserved in sediments. These data provide information on past ecosystems prevailing at the time of deposition. This is because the pool of DNA is unlikely to originate from organisms living in dark and/or anoxic conditions in the sediments upon burial.
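To make the distinction between alpha- and beta-diversity in the diatom comparison above concrete, the following Python sketch computes taxon richness per sample (one simple alpha-diversity measure) and the Jaccard dissimilarity between consecutive samples (one simple measure of turnover). The choice of these two indices, the function names and the toy counts are assumptions for illustration; the cited studies may use other metrics.

```python
def richness(sample):
    """Alpha diversity as the number of taxa with a non-zero count."""
    return sum(1 for count in sample.values() if count > 0)


def jaccard_dissimilarity(sample_a, sample_b):
    """Presence/absence turnover between two samples (0 = identical
    taxon lists, 1 = no shared taxa)."""
    present_a = {t for t, c in sample_a.items() if c > 0}
    present_b = {t for t, c in sample_b.items() if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def turnover_series(samples):
    """Jaccard dissimilarity between each pair of consecutive samples,
    ordered along the core."""
    return [jaccard_dissimilarity(a, b) for a, b in zip(samples, samples[1:])]


# Hypothetical downcore series for the same three depths (toy data).
# Taxon labels need not match between methods.
morphology = [{"A": 5, "B": 3}, {"A": 4, "C": 2}, {"C": 6, "D": 1}]
sedadna = [
    {"A": 50, "B": 20, "X": 3, "Y": 1},
    {"A": 30, "C": 10, "X": 2, "Z": 4},
    {"C": 40, "D": 5, "Z": 2, "W": 1},
]

alpha_morphology = [richness(s) for s in morphology]
alpha_sedadna = [richness(s) for s in sedadna]
beta_morphology = turnover_series(morphology)
beta_sedadna = turnover_series(sedadna)
```

In this constructed example the DNA-based series has twice the richness of the morphological series at every depth, yet both show the same turnover between consecutive samples, which is the pattern the comparison studies describe qualitatively.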
In contrast, subsurface microbial communities (notably facultative and obligate anaerobic microorganisms) are generally thought to be structured through in situ environmental conditions such as the availability of electron acceptors and donors, porosity, and sediment lithology (e.g., [150,151]). However, recent studies suggest that subsurface microbial taxa were present at the time of deposition and that their vertical distribution in the sedimentary record was shaped by the paleoenvironmental conditions that prevailed at the time of deposition [152][153][154][155][156]. For example, downcore sedimentary 16S rRNA gene profiling revealed that Holocene sediments of Laguna Potrok Aike in Argentina reflected a vertical stratification linked to electron acceptor availability, whereas in the Late Pleistocene samples, up to 50,000 years in age, salinity, organic matter type and the depositional conditions over the Last Glacial-Interglacial cycle were the most important selective stressors [153]. Analogously, shotgun metagenomic analyses of sediments from the Arabian Sea revealed subseafloor bacteria that were involved in denitrification processes during the formation of an extensive oxygen minimum zone [154]. A switch to fermentation is a likely explanation for their subsequent long-term post-depositional survival. However, none of these examples determined to what extent the identified communities represented dead, dormant, or metabolically active communities. Conversely, a recent study of Dead Sea sediments [157] illustrated a new pathway of carbon transformation in the subsurface and demonstrated how life can be maintained in extreme environments characterized by long-term isolation and minimal energetic resources.
Besides revival and cultivation of the living subset of the community or methods based on metabolic probing, several indirect approaches have been used to test for microbial viability (live/dead) and/or activity (see [158] for a review). Here we describe the most feasible approaches that can be optimised for use with sedimentary records and which are also compatible with downstream DNA sequencing. It is generally accepted that a cell must be intact, capable of reproduction, and metabolically active to be considered alive. The separate extraction of intracellular vs. extracellular DNA and subsequent amplicon or shotgun metagenomic sequencing analysis can reveal the diversity and metabolic potential of intact living vs. dead subsurface bacteria. This approach was applied to shallow sediments of tropical Lake Towuti (Indonesia) to reveal which microbial populations grew, declined, or persisted at low density with sediment depth [159]. Sequencing analysis of reverse transcribed sedimentary RNA markers most likely reflects the activities of microbes that were alive at the time of sampling [160]. That is because transcription is among the first levels of cellular response to environmental stimuli and RNA has a much shorter average half-life than DNA, being in the order of hours or days for ribosomal RNA and hours to minutes for messenger RNA [161]. Viability PCR via propidium monoazide (PMA) is another promising live/dead approach for sedimentary bacteria [162]. This nucleic acid intercalating dye binds to extracellular DNA and DNA inside damaged cells whereas it cannot enter living cells with intact membranes. Upon exposure to a bright light source, photoactivation causes PMA to form covalent bonds so that the irreversibly damaged DNA cannot be amplified in PCR assays. A comparison of microbial communities between untreated and PMA treated samples will reveal, respectively, total vs. living bacteria. 
However, this approach needs to be performed on freshly collected sediments and efficient exposure to the light source is essential and requires optimisation ( [158] and references therein).
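The comparison between untreated and PMA-treated samples can be made quantitative when both are measured with the same qPCR assay. The following is a minimal sketch, not a protocol from the studies cited here: it assumes identical input volumes and dilutions for both reactions and a known amplification efficiency (a perfect doubling per cycle by default). The function name and the Cq values are hypothetical.

```python
def viable_fraction(cq_untreated: float, cq_pma: float, efficiency: float = 1.0) -> float:
    """Estimate the fraction of amplifiable template that survives PMA treatment.

    cq_untreated: Cq of the untreated aliquot (total DNA).
    cq_pma:       Cq of the PMA-treated aliquot (DNA from membrane-intact cells).
    efficiency:   assumed amplification efficiency; 1.0 means doubling per cycle.
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    # A later Cq after PMA treatment means less amplifiable template remains.
    delta_cq = cq_pma - cq_untreated
    fraction = (1.0 + efficiency) ** (-delta_cq)
    # PMA cannot add template, so values above 1 are treated as measurement noise.
    return min(fraction, 1.0)


# Hypothetical Cq values for one sediment sample
print(round(viable_fraction(cq_untreated=24.0, cq_pma=27.0), 3))
```

With these illustrative values the PMA-treated aliquot amplifies three cycles later, which under the doubling assumption corresponds to about one eighth of the template originating from intact cells. Incomplete photoactivation in turbid sediment slurries would bias such an estimate upwards.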
Overall, short read lengths (<200 base pairs (bp)) have often been associated with the more damaged signatures of ancient DNA libraries (e.g., [33,62,104,118]) providing an idea about ancient and modern DNA in the sediment DNA pool. A few more sophisticated bioinformatic approaches have also been developed. For instance, the growth rate of environmental bacteria may be calculated by measuring genome replication rates from shotgun metagenomic data. The most promising of these approaches is the Growth Rate Index (GRiD) [163] because it can infer growth rates of specific microbial populations from complete or draft genomes as well as metagenomic bins at ultra-low sequence coverage (0.2x). If used in high throughput mode, prior knowledge of the microbial composition and coverage is not required [163]. Finally, the assessment of ancient DNA damage patterns, a bioinformatic method applied to metagenomic data (see Section 3.8), has recently been applied to sedaDNA to identify ancient DNA sequences with postmortem damage [29,33,62,81,118,127]. Such a procedure is a powerful method to ensure the authenticity of the DNA fragments assigned as ancient in sedaDNA studies.
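The postmortem damage signal mentioned above is commonly summarised as the frequency of C-to-T mismatches at the first positions of the 5' end of mapped reads, which is elevated in authentic ancient DNA. The sketch below only illustrates that calculation: it assumes ungapped, already aligned (reference, read) string pairs oriented 5' to 3', and the function name and toy sequences are hypothetical. Dedicated tools handle real alignment files, strand orientation and statistical modelling.

```python
from typing import Iterable, List, Tuple


def c_to_t_by_position(pairs: Iterable[Tuple[str, str]], n_positions: int = 5) -> List[float]:
    """Return the C-to-T mismatch frequency at each of the first n 5' positions.

    Each pair is (reference, read), aligned without gaps and of equal orientation.
    Positions where the reference never shows a C yield NaN.
    """
    c_count = [0] * n_positions   # reference C observed at this position
    ct_count = [0] * n_positions  # reference C read as T at this position
    for ref, read in pairs:
        for i in range(min(n_positions, len(ref), len(read))):
            if ref[i].upper() == "C":
                c_count[i] += 1
                if read[i].upper() == "T":
                    ct_count[i] += 1
    return [ct / c if c else float("nan") for ct, c in zip(ct_count, c_count)]


# Toy alignments (hypothetical): reference first, read second
toy_pairs = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CGCAT", "CGCAT"),
    ("ACCGT", "ACTGT"),
]
for position, freq in enumerate(c_to_t_by_position(toy_pairs, n_positions=3), start=1):
    print(f"5' position {position}: C->T frequency {freq:.2f}")
```

A frequency that is highest at the first position and decays towards the read interior is the pattern usually taken as support for authenticity; a flat profile is more consistent with modern DNA.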

State of the Art Lake sedaDNA Analyses
Working on sensitive samples such as sedaDNA requires the application of strict sampling and laboratory protocols to prevent contamination by modern exogenous DNA. Lake-sediment cores should be opened and sub-sampled in clean, dedicated ancient DNA laboratories [164,165]. The DNA is extracted from the samples, and molecular methods are then applied using targeted approaches (PCR, qPCR, ddPCR, metabarcoding) or whole-(meta)genome shotgun sequencing (shotgun metagenomics, target enrichment through hybridization capture) (see Table S1 for detailed information about the methodology used in each study). Recovered DNA sequences are then taxonomically and/or functionally annotated using a suite of bioinformatic tools to answer paleoecological questions. The potential and limitations of DNA approaches have been widely discussed [117,166,167]. However, several considerations specifically related to the study of ancient DNA in sediments are addressed here [13,14,21,22,103]. The appropriate procedure for the analysis of sedaDNA will inevitably depend on an array of parameters, from the origin and distribution of studied organisms in their ecosystems to the factors influencing sedaDNA preservation, extractability and, in the case of PCR-directed approaches, the probability of amplifying authentic sedaDNA. In this section, we synthesize the most recent literature concerning the different steps of experimental design for sedaDNA work, which we augment with seven original case studies described in detail in Appendix A.

Criteria for the Selection of Lakes
Lake selection is first and foremost defined by the specific scientific aims and budgetary considerations. For many purposes, like archaeological studies, choice might be limited to natural archives close to the site of interest and conditions that lead to efficient burial and preservation of DNA in sediments. DNA paleo-reconstructions have mostly used sediments from alpine, boreal and arctic lakes, where ancient DNA is well preserved due, for instance, to low temperatures, bottom water anoxia, and little to no bioturbation [100]. Nevertheless, successful sedaDNA studies have also been conducted in temperate lakes, where conditions appear to be less suitable for DNA preservation. Thus, we likely do not yet have a full understanding of the factors controlling DNA preservation. For example, it has been suggested that faster sedimentation rates in temperate and tropical regions [168] result in the DNA being buried faster below the active surface sediment zone which would cause more rapid preservation and immobilization in anoxic conditions. This may in turn overshadow higher degradation rates that result from the comparatively high temperatures that prevail at the bottom of such systems. More work is needed to accurately define the environmental conditions where sedaDNA is most likely to be efficiently preserved.
When considering the effect of past environmental changes on terrestrial ecosystems, the topography and size of a lake catchment relative to the lake size can influence the prospects of successfully using a sedaDNA approach. For extracellular DNA, which is readily adsorbed to clay particles, the source of material transported to lake sediment can be strongly influenced by erosional processes [39]. For example, surface soil horizons would be expected to provide more plant DNA than deep mineral horizons, bare soil or glacial flours. In addition, a well-developed hydrological network and higher rates of topsoil erosion may transport and deposit DNA that provides a better representation of a catchment flora, and thus of the different potential habitats, than less hydrologically connected areas [39]. Thus, for lakes with low allochthonous sediment input from areas with low relief and/or no major inflow streams, the DNA signal will mostly represent the plant community in the riparian zone within a few hundred meters [82], or the lake itself. On the other hand, the DNA signal from lakes with a larger hydrological catchment and considerable riverine input may represent a much larger source area [27,47]. No sedaDNA research, to our knowledge, has focused on transport processes in arid zones; however, it is likely that the intermittent hydrological connection to the catchment would also shift the representativity of the DNA signal towards a stronger riparian signature, in a way similar to what has been found for bulk organic carbon pools and biomarkers [169,170].
Additionally, different sediment lithologies, including those from within the same lake basin, pose varying challenges to successful PCR amplification of sedaDNA. This is the case for the presence of humic substances, co-extracted with DNA, that can act as inhibitors and have adverse effects on the performance of any PCR or other nucleic acid analysis (e.g., [171]). Depending on the concentration and type of inhibitors and the particular enzymes used for the PCR, effects can be highly variable, i.e., some enzymes being more sensitive than others. In our case study A1 (Appendix A), we evaluated to what extent PCR inhibition was found to be related to sediment quality by taking advantage of a minerogenic-rich to organic-rich sediment continuum. Analyzing PCR inhibition along a sediment profile (138 sediment samples), no clear relationships between inhibition and sediment type were observed. However, it was noted that minerogenic-rich sediments (~32,000-21,000 yr. BP) had little to no inhibition while organic-rich sediments (~21,000-14,000 yr. BP) resulted in stronger inhibition even if these effects were variable. A slow decrease in inhibition is observed towards the youngest organic part of the core, creating an overall V-shape pattern ( Figure 2). The reason for this remains unclear. It might be related to community changes in or around the lake, or (bio)chemical changes in the sediments. One key result of this case study is the strong negative correlation (r 2 −0.66, p < 0.01) between PCR inhibition and the number of plant DNA sequences amplified and sequenced in our work ( Figure A1 in Appendix A).

Figure 2. The plot shows three measures of DNA extraction/amplification quality with time on the x-axis (years before present (BP)). The first plot (blue dots) shows the DNA concentration (ng µL⁻¹) with a moderate to strongly negative correlation with age. The second plot (pink dots) shows the PCR inhibition as the dilution volume (in µL) necessary for qPCR reactions to succeed. Inhibition is absent until ~23,000 years BP and increases steadily afterwards. The third plot (green dots) shows the log10 of the mean number of raw reads from plant DNA metabarcoding. The number of reads is strongly and negatively correlated to the level of inhibition (see Appendix A). A simplified description of sediment lithology is provided at the top of the figure: minerogenic (M), organic (O), and minerogenic-organic (MO) sediment types.

Number of Sediment Cores to Collect for sedaDNA
In molecular paleoecological studies, it is common practice to collect a single sediment core from the deepest part of a lake because we assume that the genetic information is homogeneously distributed across the sediments and that this in turn reflects the biodiversity in the catchment. This practice is largely based on the assumption that DNA is distributed in a manner similar to fine-grained material such as organic matter or pollen, where one core is typically representative of the entire studied lake basin. However, this assumption can be questioned because we know that micro- and macrofossils and organic matter can have a patchy distribution in the lake basin [172][173][174]. Additionally, the transfer and deposition of organic matter, and therefore of catchment DNA, in lake sediments is not necessarily homogeneous and may depend on catchment features [39]. Indeed, while a single-core signal may be suitable for capturing the temporal dynamics of small planktonic organisms that are evenly distributed in the water mass, the detection of DNA from larger aquatic organisms (e.g., fish, hydrophytes, littoral mussel species) can be strongly influenced by their more heterogeneous in-lake distributions, as shown previously in eDNA studies [99,175,176]. A complex lake topography (e.g., lakes with two distinct basins) may also cause spatial variation in the DNA signal. Thus, there is a need to assess spatial variability in sedaDNA signals.
The use of "field replicates", i.e., the collection of several sediment cores within one lake basin, in sedaDNA studies may be used to assess (i) how consistent the signal is at a specific site (core-site replicates) and (ii) whether or not there is spatial variability in the sedaDNA signal. The work of Etienne et al. [177] showed that field replicates revealed a high spatial heterogeneity in the signal of fungal spores. In contrast, the recent work of Weisbrod et al. [178] with surface sediment DNA showed that a single sediment core can capture the dominant microbial taxa when targeting toxin-producing cyanobacteria. Regarding aquatic plants, their dispersion potential in the water has been proposed as a factor that can influence their detection in sedaDNA studies, particularly in large and deep lakes (5.45 km² surface area, 71 m maximum depth; [179]). In that study, free floating-leaf plants, which can be more easily dispersed, were also readily detected in the deepest part of the lake. In contrast, helophytes, which are rooted in the near-shore area (littoral zone), were less well detected and submerged plants were in an intermediate position.
These results contrast with the findings from a survey of 11 smaller and shallower lakes (0.04-27 ha; 1.7-20 m), where two samples taken 15 cm apart in the center of each lake allowed for the detection of 90% of the common and dominant and 30-60% of scattered and rare taxa of macrophytes [82].
Taking multiple sediment cores from the same site, i.e., "core-site replicates" may also facilitate the detection of organisms that are in relatively low abundance in aquatic systems or further away from the lake within the catchment. It may, for example, be necessary for detection of fish DNA in sediments. Indeed, despite numerous attempts by multiple authors involved in the present work, the detection of fish from sedaDNA archives has only been reported in a few lake systems to date [85][86][87]180]. Although there is still uncertainty about the reasons for this apparent failure, the low amounts of fish DNA, compared to microbial DNA, present in the sediment may be one explanation. The potential to increase the sensitivity of such analyses by capitalizing on the dramatic rise in sequencing capacity that we are currently experiencing may indeed help us tackle this problem.

Storage of Sediment Cores Prior to DNA Analysis
Inadequate handling during coring and subsequent storage of sediment samples can have unexpected consequences such as (i) degradation of DNA by fast-growing bacteria that use nucleic acids as a substrate or due to hydrolysis and oxidation and (ii) modification of the composition of the DNA pool, mainly its microbial fraction, through growth. For instance, even minimal exposure to oxygen results in rapid fungal and bacterial growth. Even storing sediment in well-sealed conditions at 4 °C can clearly affect the reliability of the sedaDNA signal, particularly when studying past microbial diversity with e.g., 16S or 18S rRNA metabarcoding. The 16S rRNA gene amplicon data from the case study A2 (Appendix A) highlight the effect of secondary anaerobic growth on the DNA signal observed from sedimentary archives. In particular, these data show that microbial seed banks can reactivate or, alternatively, that new microbes can colonize the sediments shortly after exposure to oxygen (Figure 3). In the two cases reported here, freeze-thaw cycles (Figure 3A) and oxygen diffusion (Figure 3B) increased the proportion of extractable DNA that mainly originated from growth of facultative anaerobic and metabolically versatile Gammaproteobacteria. Therefore, in 16S rRNA amplicon DNA inventories, secondary growth can induce a significant bias in the signal from past and present bacterial communities in the sediment (Figure 3B, Appendix A).
Figure 3. ... (2) depth, with sediments stored unfrozen under anoxia in hermetic bags (top) and those that experienced oxygen diffusion in Falcon tubes (bottom). Secondary growth occurred due to pore water oxidation during the 4 months of sample storage, which also resulted in higher intracellular DNA concentrations in the aforementioned oxidized (light green) than pristine (dark green) ferruginous sediment samples.
Overall, it is likely desirable to split sediment cores lengthwise into two halves-one for other analyses (e.g., geochemistry, dating) and one for DNA analysis. Additionally, it is prudent to sub-sample sediments immediately after opening the core and then store subsamples frozen at −20 °C or below. However, freezing is not needed in all cases and storing cores at 4 °C under a protected, oxygen-free atmosphere in tightly sealed containers may improve their preservation by avoiding unnecessary freeze-thaw cycles. For example, the pristineness of DNA extracted two years after sediment core sub-sampling was validated by using qPCR assays targeting facultative anaerobes [154]. These results ensure that metagenomic information can be interpreted in terms of past microbial processes in the water column at the time of deposition. Similarly, a study of vascular plant sedaDNA from lake sediment cores that were retrieved in 2009, left untouched and stored at 4 °C until being sub-sampled in 2014 yielded a detailed floristic sedaDNA record [26,27]. Other cores stored for 5-10 years at 4 °C have been successfully used to recover plant DNA [16,181]. Some early studies [83,105,182] stored the sediment samples in Queen's Tissue Buffer, but generally, the use of chemical DNA preservatives, such as ethanol, Longmire's lysis buffer or RNA later for sediment subsamples is not common for studies of sedimentary ancient DNA. While a dedicated test has not been published to the best of our knowledge, the addition of chemical preservatives can potentially cause problems in DNA extraction, and would introduce a further potential source of contamination.

Number of Analytical Replicates to Perform for sedaDNA Research
As with field replicates, the use of analytical replicates for both DNA extraction and PCR amplification is required for detecting taxa with low detection probabilities, such as rare species (e.g., specific fish populations) or species that are remotely located (e.g., terrestrial mammals). In order to increase the probability of detection, processing a number of replicates has proven to be helpful [39,183,184]. Numerous sedaDNA studies have included replicate samples at the extraction and/or PCR steps [16,24,25,31,35,39,49]. In the context of studying taxa with low detection probabilities (e.g., mammals), Ficetola et al. [183] recommended the use of at least eight PCR replicates. For catchment vegetation, comparison to macrofossils [24] and vegetation surveys [24] have shown that one positive PCR out of four or eight PCR replicates, respectively, may represent true positives (i.e., the presence of the desired targeted DNA in the environmental sample). However, increasing the number of analytical replicates can also increase the probability of false positives [185].
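The trade-off between detection and false positives can be illustrated with a simple probability calculation. The sketch below assumes that PCR replicates are independent and share the same per-replicate probability, which is a simplification; the function name and the probabilities used (0.2 for detection, 0.01 for a false positive) are hypothetical and are not taken from the cited studies.

```python
def prob_at_least_one(per_replicate_prob: float, n_replicates: int) -> float:
    """Probability that an event occurs in at least one of n independent replicates."""
    if not 0.0 <= per_replicate_prob <= 1.0:
        raise ValueError("per_replicate_prob must be between 0 and 1")
    if n_replicates < 0:
        raise ValueError("n_replicates must be non-negative")
    return 1.0 - (1.0 - per_replicate_prob) ** n_replicates


# Hypothetical per-replicate probabilities
p_detect = 0.2   # true detection of a rare taxon in one PCR replicate
p_false = 0.01   # false positive in one PCR replicate

for n in (1, 4, 8, 12):
    detect = prob_at_least_one(p_detect, n)
    false_pos = prob_at_least_one(p_false, n)
    print(f"{n:2d} replicates: detection {detect:.2f}, false positive {false_pos:.3f}")
```

Both probabilities rise with the number of replicates, which is why a rule for accepting a detection (for example a minimum number of positive replicates) has to be chosen together with the replicate number rather than independently of it.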
The use of analytical replicates to assess the diversity and composition of planktonic communities from sedaDNA has been shown to provide a highly consistent assemblage composition of dominant taxa e.g., [34,38,76,141]. For instance, Ibrahim et al. [38] revealed similarity between DNA inventories obtained from triplicate extraction replicates in terms of community structure of microbial eukaryotes, diatoms and cyanobacterial assemblages obtained from a sediment record covering the last 100 years. Although the relative abundances of dominant microbial molecular taxa do not vary much between analytical replicates, it is noticeable that the proportion of shared taxa can be relatively low (<40% shared taxa between extraction replicates; [34,107]). Such low levels of consistency can be caused by the detection of numerous rare taxa, illustrating the generic challenges involved in estimating absolute richness from DNA metabarcoding data [184].
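The proportion of shared taxa between replicates can be computed in several ways, and the studies cited above do not necessarily use the same definition. As one possible sketch, the function below reports the taxa found in every replicate as a fraction of the taxa found in any replicate (a Jaccard-type index extended to more than two sets). The function name and the taxon labels are hypothetical.

```python
from typing import Set


def shared_taxa_fraction(*replicates: Set[str]) -> float:
    """Fraction of taxa present in all replicates relative to taxa present in any."""
    union = set().union(*replicates)
    if not union:
        return 0.0  # no taxa detected at all
    shared = set.intersection(*(set(r) for r in replicates))
    return len(shared) / len(union)


# Hypothetical taxon lists from three extraction replicates of one sample
rep1 = {"taxon_A", "taxon_B", "taxon_C", "taxon_D"}
rep2 = {"taxon_A", "taxon_B", "taxon_E"}
rep3 = {"taxon_A", "taxon_B", "taxon_C", "taxon_F"}

print(f"Shared fraction: {shared_taxa_fraction(rep1, rep2, rep3):.2f}")
```

Because this index weights every taxon equally, a few rare taxa detected in only one replicate lower it sharply even when the dominant taxa are identical, which matches the pattern described above.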

Tracing Contamination of sedaDNA Samples
The process of recovering sedaDNA must be carried out following guidelines for ancient DNA, due to high contamination risks e.g., [186]. There are many ways in which sedaDNA samples can be contaminated, from the time of field sampling to sequencing. Contamination can originate from the equipment and consumables used during the sampling collection, sediment core extrusion and core splitting, but also derive from compromised cleanliness in the ancient DNA laboratory, non-sterile laboratory reagents and insufficient precautions taken by the operator(s) to avoid cross-contamination and introduction of exogenous DNA sources. Sterile tools, clean working environments, appropriate clothing (single-use suit, gloves, mask, hairnets) and molecular biology grade reagents that may need to be decontaminated with UV irradiation can help to minimize the introduction of exogenous modern DNA.
Because modern DNA molecules are intact, typically show limited post-mortem damage, and are normally present in higher concentration than ancient DNA, there is always a risk that such "false targets" are amplified during PCR. It should also be noted that samples with the lowest concentrations of sedaDNA are more susceptible to contamination, because of lower competitiveness during PCR [39]. PCR amplicons generated by qPCR or metabarcoding are particularly insidious forms of contamination, as they are highly concentrated and indistinguishable from authentic results. For this reason, it is crucial that pre- and post-PCR facilities are physically separated and that strict protocols govern the way reagents and personnel transit between these areas [164]. Contamination by modern DNA is more likely when studying microbial, fungal, and human aDNA [113], or plant species that are widespread and/or used for furniture/building construction such as pine and spruce [25]. Even when all precautions are taken and quality control procedures are followed (i.e., integrating all the required negative controls and taking multiple replicates), the authenticity of sedaDNA sequences can be difficult to demonstrate, especially from metabarcoding data (see also Section 3.8), and the bioinformatic filtering procedure required is not always straightforward (see also Section 3.10).
One way to trace contamination that has occurred during coring activities is to spray or paint the coring equipment with an artificial DNA tracer like DNA extracts or amplicons of a plasmid or an exotic species not likely to be present in the original sample ensuring that only contaminant-free internal parts of the core are analyzed [16,62]. During subsampling, DNA extraction, PCR amplification and library preparation, negative controls are always necessary to track potential contamination and can be used to filter DNA inventories for potential contaminants [14,25,66,185,186]. Positive PCR controls are largely used in environmental DNA research to verify the success of the molecular biology procedure and evaluate the presence of sequencing errors. For ancient DNA analyses, they should be avoided or used in a laboratory physically separated from the ancient DNA areas, to avoid potential cross-contamination [16]. Finally, the use of occupancy-detection models is a new approach for estimating the frequency of false positives and can be informed by the results of negative controls [187,188].
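Filtering a DNA inventory with negative controls can take many forms. The sketch below shows one simple rule, chosen here purely for illustration: a taxon is kept in a sample only if its read count is at least a fixed multiple of the highest count observed for that taxon in any negative control. The threshold, the function name and the read counts are assumptions and do not reproduce the procedure of any study cited above.

```python
from typing import Dict

Inventory = Dict[str, Dict[str, int]]  # sample name -> taxon -> read count


def filter_with_negative_controls(samples: Inventory, controls: Inventory,
                                  min_ratio: float = 10.0) -> Inventory:
    """Keep a taxon in a sample only if reads >= min_ratio * max reads in controls."""
    # Highest read count per taxon across all negative controls
    max_in_controls: Dict[str, int] = {}
    for counts in controls.values():
        for taxon, reads in counts.items():
            max_in_controls[taxon] = max(max_in_controls.get(taxon, 0), reads)

    filtered: Inventory = {}
    for sample, counts in samples.items():
        filtered[sample] = {
            taxon: reads
            for taxon, reads in counts.items()
            if reads > 0 and reads >= min_ratio * max_in_controls.get(taxon, 0)
        }
    return filtered


# Hypothetical read counts
samples = {
    "S1": {"Pinus": 40, "Betula": 500, "Salix": 120},
    "S2": {"Pinus": 900, "Betula": 30},
}
controls = {
    "extraction_blank": {"Pinus": 50},
    "pcr_blank": {"Pinus": 5, "Betula": 2},
}
print(filter_with_negative_controls(samples, controls))
```

In this toy example the low Pinus count in S1 is discarded because it does not exceed the level seen in the extraction blank, while the much higher count in S2 is retained. A fixed ratio is a blunt instrument, and occupancy-detection models offer a more formal alternative.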

DNA Extraction Methods for sedaDNA Research
The DNA extraction method used may influence the DNA signal obtained from sediments. The PowerSoil, PowerMax, and UltraClean DNA Isolation kits (Qiagen) have so far been widely used by terrestrial and aquatic molecular ecologists (Figure 4), while various other kits and custom protocols have also been used in a number of studies. Based on our review (Table S1), the choice of DNA extraction protocol appears to be driven largely by the prior success of the research group with a particular kit. As more studies comparing and/or optimizing extraction methods are published (case studies described in Box 1; [189]), the importance of selecting extraction methods optimized for the different sediment types becomes apparent.
Box 1. Optimizing DNA extraction protocols for molecular paleoecology.
In this box we describe the main findings of five case studies; additional information on each of them is presented in Appendix A. The case study A3 compares eukaryotic inventories obtained using a protocol extracting both exDNA and inDNA (NucleoSpin protocol) and a protocol favoring exDNA extraction (Taberlet2012 protocol). We showed that the composition of eukaryotes varies depending on the extraction protocol used, even when considering the high variability in the signal recovered from each lake (Figure 5A). This is also the case in terms of richness (Figure A3 in Appendix A). One striking finding is that the extraction of total DNA with the NucleoSpin protocol appeared to be more efficient for detecting rotifer DNA than the Taberlet2012 protocol for extracting exDNA, most likely because of the improved extraction of DNA from rotifer eggs with the lysis buffer from the NucleoSpin protocol. The case study A4 revealed that the qPCR amplification of several targeted aquatic and terrestrial taxa (bacteria, diatoms, eukaryotes, plants, arthropods and vertebrates) led to similar amplification levels (lower quantification cycle (Cq) values correspond to higher amplification success) when comparing inDNA and exDNA fractions obtained from a modified PowerSoil protocol, with the exception of arthropods, which were found to be amplified preferentially from intracellular DNA (Figure 5B). In addition, the use of the unmodified PowerSoil protocol for extracting total DNA was more efficient for detecting and amplifying sedaDNA from several biological groups (lower average Cq values). For plant DNA in surface sediments, similar results were obtained in the case study A5 across four different lakes, in which differences in the diversity retrieved from total DNA vs. exDNA (Taberlet2012) were investigated both for use with the NucleoSpin kit and the PowerMax Soil kit. For samples from lakes surrounded by a high taxonomic diversity of terrestrial plants, distinct differences in the number of plant molecular taxa retrieved were observed between extraction protocols, with the unmodified PowerMax kit revealing the highest number of Molecular Operational Taxonomic Units (MOTUs) (Figure 5C). In our case study A6, we show that a modified DNA extraction protocol designed to release the mineral-bound sedaDNA from calcitic minerals using EDTA-based chelation provides higher richness estimates of plant assemblages in calcite-rich sediments but was comparable for organic-rich sediments, though with a lower level of reproducibility (Figure 5D). Finally, while investigating the number of positive PCR replicates required for the analyses of domesticated mammal DNA (ovine and bovine) in the case study A7, we reported higher amplification success when using Amicon filters coupled with the Taberlet2012 protocol relative to using the standard approach (Taberlet2012 protocol) (Figure 5E).

DNA Extraction Methods for sedaDNA Research
The DNA extraction method used may influence the DNA signal obtained from sediments. The PowerSoil, PowerMax, and UltraClean DNA Isolation kits (Qiagen) have so far been widely used by terrestrial and aquatic molecular ecologists (Figure 4), while various other kits and custom protocols have also been used by a number of studies. Based on our review (Table S1), the choice of DNA extraction protocol appears to be driven largely by the prior success of the research group with a particular kit. As more studies comparing and/or optimizing extraction methods are published (case studies described in Box 1, In [189]), the importance of selecting extraction methods optimized for the different sediment types become apparent.  Table S1. DNA deposited at the water-sediment interface can be either extracellular (exDNA) or intracellular (inDNA) [13,22,103] and different DNA extraction protocols can be used to preferentially extract the two fractions. InDNA is likely to be more protected inside protective resting stages such as cysts or spores, cell membranes, or lignins, but can also  Table S1. DNA deposited at the water-sediment interface can be either extracellular (exDNA) or intracellular (inDNA) [13,22,103] and different DNA extraction protocols can be used to preferentially extract the two fractions. InDNA is likely to be more protected inside protective resting stages such as cysts or spores, cell membranes, or lignins, but can also be attacked by nucleases present in the cells. In contrast, exDNA released into the environment after cell lysis can be quickly adsorbed on clay minerals, which significantly promotes preservation by decreasing chemical and physical degradation processes [101] and making its molecules less accessible as a food source for indigenous sediment bacteria [190]. One of the protocols that allows desorption of exDNA fragments bound to minerogenic particles is described in Taberlet et al. [191] named hereafter the Taberlet2012 protocol. 
This protocol relies on the use of a phosphate buffer and is coupled to the NucleoSpin Soil kit protocol (Macherey-Nagel). This approach has been successfully used in several recent lake sediment studies (e.g., [31,32,39,51] but see also [25]). However, no explicit comparison has been made between molecular inventories obtained from total DNA (inDNA and exDNA), inDNA alone, and exDNA alone. Here, we provide three case studies (A3, A4 and A5) that highlight differences and similarities between extraction protocols targeting the different fractions of the DNA (see Box 1, Appendix A).
The physical and chemical properties of sediments can have strong effects on the efficiency of DNA extraction from sediments. For instance, clay minerals and calcite both bind tightly to DNA [101,102,192] but require different extraction approaches to maximize DNA recovery. In case study A6, we show that carbonate-rich sediments yield lower amounts of sedaDNA than organic-rich sediments when a conventional extraction protocol (PowerSoil kit) is used, and that an optimized extraction protocol can greatly improve vascular plant DNA recovery from carbonate-rich samples (Box 1, Appendix A). The co-extraction of inhibitors, particularly from organic-rich sediments, can cause major problems for downstream analyses. Although new DNA extraction methods are being developed to reduce the effects of inhibitors [189], further refinements and a better understanding of the nature of inhibition are still necessary. An alternative approach to reducing the concentration of inhibitors in organic-rich sediments is to include an additional cleanup step (e.g., [193]). This was shown in case study A1, in which the OneStep PCR inhibitor removal kit (Zymo Research), applied after the PowerSoil kit, successfully reduced the level of inhibition with only a limited loss of DNA (mean DNA recovery of 91%, see Appendix A).
To reduce reagent usage in downstream procedures, it may be desirable to concentrate the supernatant or eluate, either during or after DNA extraction, respectively. To achieve this, a concentration step using spin columns can be added, either just after the desorption when using the phosphate buffer or our carbonate-optimised approach or at the end of the extraction when using the PowerMax kit. This approach has been recently applied to sediments [29,35,42,53,62,81,154] and in the case study A7 we provide new data showing its efficiency (Box 1, Appendix A).
Altogether, the findings of our case studies allow us to provide recommendations about the type of DNA extraction protocols to use for specific ecosystems and target taxa. Ecosystems with different features (lake basin and catchment) can result in different relative abundances and richness because the studied biological communities and potentially their sources of origin are different. Additionally, contrasting sediment lithologies (e.g., minerogenic vs. organic-rich sediments) can influence the extractability of sedaDNA and thus the DNA signal recovered from sedimentary archives. For instance, carbonate-rich post-glacial lake sediments should be extracted with a protocol optimized for DNA extractability to maximize the recovery of plant richness from the sedimentary archives (case study A6). Furthermore, sediment records with changing lithologies over time can lead to varying efficiency in DNA extraction and varying levels of PCR inhibition (case study A1). Different extraction protocols may also yield significantly different relative abundances and richness in taxa and genes, and should therefore be selected following preliminary analyses for suitability, notably where the heterogeneity between sample types and reconstructed assemblages is expected to be important. In most cases, it is better to use a DNA extraction protocol that extracts both intracellular and extracellular DNA, such as the NucleoSpin Soil, PowerSoil and PowerMax kits, including a physical lysis step for improved detection of taxa, regardless of the targeted group (case studies A3, A4 and A5). While no test suggests the need for concentrating DNA to successfully recover the sedaDNA signal from relatively abundant and small organisms such as microbes (case studies A3 and A4), optimizing DNA extraction protocols to increase DNA amounts is recommended when targeting remotely located and scarce organisms such as terrestrial mammals (case study A7).
While our case studies show that sediment type and extraction protocol may have a significant impact on taxonomic composition recovered from ancient sedaDNA, much like in the modern environmental DNA literature, advances in measuring and modelling the taxonomic bias inherent to the metagenomic experimental workflow towards more quantitative results are promising [194].

Sediment Amount to Use for DNA Extraction
It makes sense to take as much sediment as possible to avoid a "nugget effect", whereby DNA is heterogeneously distributed, and to obtain a sample that best represents the variation present, although this depends on how heterogeneous the sediments are and whether larger samples have been homogenized before subsampling for extraction. Furthermore, trade-offs need to be made with other complementary analyses that also require sediments. Many DNA extraction kits commonly used in sedaDNA studies (e.g., DNeasy PowerSoil kit, NucleoSpin Soil kits) are limited to smaller amounts of sediment (<2 g wet sediment). However, one of the most widely used extraction kits (Figure 4) is the DNeasy PowerMax Soil DNA Isolation kit, which can use up to 10 g of starting material. Similarly, the Taberlet2012 protocol allows the extraction of DNA from a large amount of sediment (e.g., 15 g of wet sediment) to which phosphate buffer is added. A recent study by Kang et al. [195] showed that sediment mass input (0.5 g vs. 10 g) did not affect the resulting diatom richness or community structure inferred from metabarcoding. Case study A3 found differences in the total DNA concentration of eukaryotic sedaDNA extracted when applying the Taberlet2012 protocol with different sediment masses (0.75 and 4 g of wet sediment). Although only four lakes were examined, higher sediment masses did not consistently lead to higher total DNA concentrations in the extracts (Figure A3 in Appendix A). In addition, the predominant OTUs differed strongly between the two sediment masses for both micro-eukaryotic and metazoan groups: in Lake Lauzanier (LAZ), diatoms and rotifers were highly abundant in 0.75 g samples, whereas Cercozoa, nematodes, and unclassified Stramenopiles and Metazoa were predominant in 4 g of material.
In contrast, extraction using phosphate buffer (Taberlet2012 protocol) resulted in a lower number of reads, poorer repeatability, and less diversity detected compared to PowerMax Soil kit both for minerogenic [24] and organic sediments [25].

Molecular Methods for Generating sedaDNA Data
The method used to generate sedaDNA data is first and foremost constrained by the ecological question of the specific project. We describe below the three main methods that are currently used in sedaDNA research.
Targeted quantitative analysis: is used to detect and/or quantify specific taxa through the use of methods such as qPCR and ddPCR. In qPCR, target relative abundance is quantified to provide information about the occurrence of historical taxa. However, inhibition during qPCR reactions can bias this quantification. Alternatively, this bias can be utilized to quantify inhibition (e.g., [189]). Unlike qPCR, the recently developed ddPCR does not require standard curves and inhibition assays, due to pre-amplification partitioning of target templates into thousands of droplets of defined minute volumes where individual PCR reactions will take place (see [196,197] for application to modern environmental samples). For this method, the detection limit is very low, which may be advantageous given the issues that can be present in sedaDNA extracts. However, as in environmental DNA studies [167], the use of these methods should be validated by providing supporting data to confirm that the DNA amplified truly corresponds to the target. Two options include the use of TaqMan probes, increasing the specificity of binding to the targeted marker region, or the sequencing of qPCR products.
[Figure caption, truncated in the source: The samples were obtained from three small lakes from the Southern Taymyr Peninsula, Siberia, and from three locations within Lake Karakul, Pamir Mountains, Tajikistan. The molecular taxa were split into three categories: terrestrial plants, ...]
DNA metabarcoding: can be used to assess the diversity and composition of specific assemblages (e.g., plankton, vegetation, fish) [167]. This method is based on the barcoding principle, which consists of sequencing standardised markers that are conserved enough to be specific to the target higher taxonomic group but variable enough to contain enough information to discriminate lower taxonomic groups, such as species or genera [198].
Available reference sequences are compiled into databases to which the metabarcode sequences are compared for taxonomic assignment. Metabarcoding is a robust and powerful tool that has been widely applied in sedaDNA studies (Table S1) [11,13,74,181,199]. The power, but also the limitation, of metabarcoding is that it is PCR-based. This makes it possible to amplify minute quantities of template molecules, but also introduces PCR and amplification biases (e.g., [200]). Because the approach is targeted, reference databases can often be relatively complete for specific biological groups, and the bioinformatics become easier to handle. Many samples can also be sequenced together. This is achieved by adding unique combinations of forward and reverse primers that include short unique tags of 6-15 nucleotides, which enables pooling of a large number of PCR products that can be sequenced on the same run. After sequencing, the DNA reads obtained can be demultiplexed by subsetting the DNA reads based on the tag associated with each primer. Metabarcoding requires the template to be present in the sample, and, if the fragments in the sample are too degraded, then an intact template, including the primer binding sites, might no longer be present. Thus, generally short barcodes are targeted, which may have lower taxonomic resolution compared to longer ones [201]. Furthermore, this method does not retain the damage patterns found at or near the ends of aDNA molecules, which are typically used to authenticate ancient DNA. This is because primers, which are artificial constructs, would either bind to or exclude the ends of the template molecules where damage is present. The damage signal is therefore lost during PCR amplification. Nevertheless, it remains the most used approach in studies of many eukaryotes, as large numbers of samples can be run with comparatively low processing efforts and costs (see [167]).
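The tag-based demultiplexing described above can be illustrated with a minimal Python sketch. It is not taken from any published pipeline: the tag sequences, sample names, and the assumed read layout (forward tag, amplicon, reverse complement of the reverse tag) are hypothetical, and real tools also handle primers, sequencing errors, and quality scores.

```python
from collections import defaultdict

# Hypothetical (forward tag, reverse tag) pairs and sample names, for illustration only.
TAG_TO_SAMPLE = {
    ("ACGTAC", "TTGCAC"): "core1_10cm",
    ("ACGTAC", "GGATCA"): "core1_20cm",
    ("CTAGTC", "TTGCAC"): "extraction_blank",
}
TAG_LEN = 6  # the text gives 6-15 nucleotides; 6 is assumed here


def revcomp(seq):
    """Return the reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def demultiplex(reads, tag_to_sample=TAG_TO_SAMPLE, tag_len=TAG_LEN):
    """Group reads by sample using the tags found at both ends of each read.

    Assumed layout: forward tag + amplicon + reverse complement of reverse tag.
    Tags are trimmed from assigned reads; reads with an unknown tag pair are
    returned untrimmed under the key 'unassigned'.
    """
    by_sample = defaultdict(list)
    for read in reads:
        if len(read) <= 2 * tag_len:
            by_sample["unassigned"].append(read)
            continue
        pair = (read[:tag_len], revcomp(read[-tag_len:]))
        sample = tag_to_sample.get(pair)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[tag_len:-tag_len])
    return dict(by_sample)
```

Exact tag matching is used here for simplicity; a stricter or more tolerant matching rule is a design choice that depends on how the tags were designed.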
Shotgun and target-enriched metagenomics: represent nascent but promising approaches to reconstructing past biodiversity preserved in sedaDNA [29,33,62,202]. The immediate advantage of metagenomic over PCR-based approaches is that they can resolve the ultrashort DNA sequences that cannot be amplified by PCR but are characteristic of the vast majority of sedaDNA molecules. In particular, shotgun metagenomics does not have the taxonomic biases and blind-spots that are inherent to PCR approaches, and which may preclude this latter approach for certain ecological questions and/or taxonomic groups.
Unlike PCR-based approaches, metagenomic approaches allow for the ends of ancient DNA molecules to be sequenced. This allows one to differentiate between modern DNA (from indigenous sediment microbiota or exogenous contamination) and sedaDNA by identifying patterns of DNA damage that accumulate via age- and temperature-related hydrolytic and oxidative decay (e.g., cytosine deamination, depurination-induced DNA strand breakage) [29,33,62,81,118,127]. However, to retain damage patterns in metagenomic libraries, it is necessary that the polymerase used during the indexing PCR step does not stall when amplifying these templates. Great care is also required when selecting a proofreading, high-fidelity polymerase, because such polymerases often cannot read through uracils. Notably, polymerases such as PfuTurbo Cx HotStart (Agilent) or AccuPrime polymerase (Thermo Fisher) are specially engineered to be able to amplify templates containing uracil bases with minimal bias [203].
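Cytosine deamination is commonly summarized as the frequency of C-to-T mismatches at the first positions of aligned reads, which is expected to be elevated near the 5' ends of damaged molecules. The Python sketch below computes that summary under simplifying assumptions: reads are already aligned to a reference without gaps and oriented 5' to 3', and the function name and input format are illustrative. Dedicated tools do considerably more (e.g., handling 3' ends, indels, and statistical modelling).

```python
def c_to_t_by_position(pairs, n_positions=5):
    """Fraction of reference C bases read as T at each of the first read positions.

    `pairs` holds (reference, read) strings, aligned without gaps and
    oriented 5'->3'. Returns one value per position, or None where no
    reference C was observed at that position.
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in pairs:
        for i in range(min(n_positions, len(ref), len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / n if n else None for t, n in zip(c_to_t, c_total)]
```

A profile that is high at position 0 and decays towards the read interior would be consistent with ancient templates, whereas a flat, near-zero profile would be more typical of modern DNA.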
The phylogenomic resolving power of metagenomics, either through non-targeted shotgun sequencing or targeted enrichment via hybridization-based capture, can be harnessed to reconstruct population-level diversity. Schulte et al. [202] demonstrated that ancient Siberian larch species can be resolved by designing hybridization probes based on contemporary chloroplast genomes. To reconstruct an ancient algal population, Lammers et al. [81] used an iterative-mapping approach to reconstruct full organellar genomes and were able to distinguish multiple haplogroups. Fragment recruitment strategies, as used by these studies, could also be deployed to reconstruct microbial ecotypes which, if matching with high similarity to contemporary ecotypes that are sufficiently described, can lend insights into the ecological niches of paleoenvironments [80].
Metagenomics is positioned to become a powerful approach to reconstructing historical lake systems through microbial functional analysis and may enable access to the functional diversity of past microbiomes. For example, a novel approach to describing paleo-oceanographic conditions through the functional analysis of Black Sea sediment metagenomes draws an interesting perspective on reconstructing microbial functions by leveraging the depositional-age functional traits that survive in the metabolisms of extant sediment microbiota [193].
When sufficiently comprehensive genomic reference databases are available, the detection of species may be dramatically enhanced [29]. With the growth of these databases, the potential for identifying species using the shotgun metagenomics approach will greatly increase. One such promising initiative is the recently launched Earth BioGenome Project, which sets out to provide a high-quality genome inventory of eukaryotic life on Earth [148]. Currently, there is a strong bias towards mammals, agricultural and medicinal plants, and pathogenic microbes in available databases. However, this will change as an increasing amount of large-scale genome skimming data (i.e., shallow shotgun sequencing of genomes to reconstruct multicopy markers and organellar genomes) becomes available [204,205]. Thus, we expect metagenomics-based approaches to be increasingly used in the near future.
Published comparisons between methods: In the wider molecular ecology literature, comparisons between these methods have been performed on environmental DNA samples and mock communities. For example, Wood et al. [206] found that ddPCR had the highest detection rate of the Mediterranean fanworm in water and biofouling samples when compared with qPCR and metabarcoding (which had the lowest detection rate). qPCR was also found to have a higher detection rate than metabarcoding by Harper et al. [207] in an eDNA study; however, other studies have emphasized the power of metabarcoding for discovering members of the community that are not anticipated at the stage of selecting more specific primers [208]. The choice between metabarcoding and metagenomics is equally blurred. Studies have generally shown that metabarcoding provides slightly more accurate assignment [209,210], with higher detection frequencies [211], but may also generate more spurious Molecular Operational Taxonomic Units (MOTUs) than metagenomic approaches. We note that the performance of metagenomics will almost certainly improve further as genomic reference databases become more complete. Differences in taxonomic composition can be observed between metabarcoding and metagenomics datasets, especially if the DNA is very degraded [189,212]. In all cases, comparisons will be context-dependent, relying on parameters such as DNA degradation, sequencing depth, primer selection, detection thresholds, available reference libraries, and appropriate use of controls.

DNA Markers and Reference Databases Used in Current sedaDNA Research
sedaDNA from lake sediments is degraded into short fragments. For instance, shotgun libraries from Holocene lake sediments consisted mainly of DNA reads between 30 and 100 bp [33,62], although fragments up to 560 bp have been found in 10,000-year-old sediments in the Black Sea [11]. Thus, targeting long barcodes (typically >150 bp; [213]), which generally have higher taxonomic resolution [198,214,215], is not a viable option for sedaDNA approaches, even if shorter barcodes generally have lower taxonomic resolution [201,216,217]. An alternative is to use multiple barcodes and compare the results [48]. If shotgun metagenomics is used, then a representative sample of all DNA fragments present in an extract is sequenced, independent of the length.
Nevertheless, as mentioned in Section 3.8, there will be a bias in the data related to representation in the reference databases [29]. This bias increases the risk of random matches to model organisms that are well covered in reference libraries [81], although fortunately, methods have recently been developed to address this issue (e.g., [218]). For any of the methods, there is a real risk of false positive assignment due to incomplete reference libraries or inappropriate filtering (see Section 3.10). This risk is inversely related to the taxonomic resolution of the marker, as there is a higher chance of obtaining incorrect assignments for a conserved sequence than a highly specific one. An error in the reference library will furthermore have a greater effect the more conserved the marker is.
Here, we propose DNA markers that can be used to study sedaDNA for various types of organisms (Table 1). For each type of organism, we identify the current most-suitable DNA marker to use but acknowledge that as overall genomic resources increase [148,219,220], the potential for developing new markers, or using full organellar and/or nuclear genomes, also increases.
DNA markers to reconstruct past vegetation changes: Most environmental aDNA studies of plants target the short and variable P6 loop of the trnL (UAA) intron [14,44,201], which is not a standard plant barcode (CBOL 2009, Ref. [214]). There is a well-curated trnL (UAA) intron reference library (ArctBorBryo) containing 2445 sequences of 815 arctic and 835 boreal vascular plants [50,226], as well as 455 low-arctic bryophytes [227]. Recent large-scale genome skimming reference libraries have been generated for Norway/Polar regions, the European Alps, the Carpathians (n = 6655, Ref. [25]), China (n = 1659, Ref. [204]) and Australia (n = 672, Ref. [205]). These new reference libraries not only provide the full nuclear ribosomal DNA and chloroplast genome sequences (including the P6 loop), but also allow for improved detection using shotgun metagenomics [29] and the design of probes for target enrichment [189,202].
DNA markers to detect mammalian presence in a lake catchment: Several universal mammalian primer sets were initially designed for ancient DNA analyses from animal remains [228], which were then applied to environmental samples, such as coprolites, frozen soils, and cave sediments [43,229,230]. Ultimately, a large set of mitochondrial universal and species-specific markers were developed [43,229,231-234]. Because it is near-impossible to avoid modern human contamination, universal primers have to be combined with a human-blocking probe to inhibit the exponential amplification of human DNA templates (such as in [235]). In lake sedaDNA studies, a new universal primer, MamP007, leading to the amplification of a fragment of the mitochondrial 16S rRNA gene, was proposed and has been applied successfully [31,35,39,51,59,60]. Because of the low number of differences between this mammalian primer and the binding sites for avian species and clitellate worms, as well as the generally low mammal sedaDNA template content in lake sediments, MamP007 was also able to amplify these non-target groups of taxa [64], along with fish taxa and amphibians [179,236]. Consequently, the mammalian DNA of targeted species has to be considered as "rare" for lake sedaDNA applications. The scarcity of mammalian aDNA in lake sediment archives will also depend on biomass, behavior, human practices for domestic herbivores, and, as explained in Section 2.1, the transfer capacity in the catchment-lake system [39,64]. The quantity of mammalian DNA can thus strongly vary from one site to another and according to the animals of interest.
DNA markers to study the past diversity of aquatic organisms: To date, 16S rRNA genes have been the most common target used to investigate ancient bacterial and archaeal diversity in metabarcoding studies [18,66,91,94-97,104]. For instance, 16S rRNA and amoA genes allowed for the detection of ammonia-oxidizing Archaea and inference of past variation in nutrient and salinity levels in lake sediments [91]. Total bacteria, type I methanotrophs, type II methanotrophs and the NC10 phylum were traced using primers for amplicons between 111 and 200 bp long encoding their 16S rRNA genes to study past methane oxidation across freshwater sediment profiles spanning <2000 years [94-97]. For cyanobacterial assemblages, Monchamp et al. [73,74,84] used a 16S primer set (400 bp) specifically designed for cyanobacteria by Nübel et al. [237].
To track past changes in the diversity and composition of microbial eukaryotic communities (including phytoplankton and fungi), the 18S rRNA gene V7 region marker used in Capo et al. [76] offers a good tradeoff between taxonomic resolution and fragment length (260 bp). It has been shown to capture past modifications related to environmental change [34,38,76,141]. Coolen et al. [11] targeted the V1-V3 region of 18S rRNA genes, while Kisand et al. [78] and More et al. [79] targeted the 18S V4 and V9 regions, respectively. Specific databases, such as PR2 [238] and SILVA [239], can be used to identify microbial eukaryotes from any of the 18S genetic regions. In addition, the ITS region has been proposed as a DNA marker for fungal-specific barcoding from environmental samples [240]. Such DNA markers have been used, for instance, by Ortega-Arbulú et al. [241], targeting ITS 1, and Tõnno et al. [242], targeting ITS 2, to assign fungal taxonomy against the UNITE database [243].
Diatom communities can be detected in environmental DNA by targeting the chloroplast rbcL gene [105,144,147,244] and the nuclear 18S V4 region [245], because both markers are well represented in current reference databases, which facilitates the assignment of sequences to lower taxonomic levels. The short rbcL metabarcode (67-76 bp) [105] has been shown to facilitate specific amplification of diatoms from tropical [105] and arctic lake sediments [143,146,147], as well as marine deposits [246]. Moreover, the amplification of larger (577 bp) rbcL fragments could be achieved from more recent sedimentary deposits [142,247], while the amplification of the hypervariable V4 region of the 18S results in the detection of additional taxa besides diatoms [248], which include other Stramenopiles, Alveolata and Rhizaria. A more specific diatom amplification with 18S V4 is enabled when applying the marker to filtered water DNA [249] or modern biofilm samples [250]. In addition, primers targeting a 230-bp region of the viral major capsid protein (mcp) gene have been used to study the diversity of dsDNA algal viruses (Phycodnaviridae) in Holocene sediments [88].
Complete zooplankton assemblages have not yet been extensively studied from sedimentary archives. Studies have targeted specific groups, i.e., copepods [45,123] and rotifers [83], but studies employing universal primers are rare. A combination of both 18S and COI primers has been used to analyze eukaryotic organisms [251] but, to our knowledge, no work has been published to specifically study sedaDNA for zooplankton assemblages across multiple phylogenetic groups. One potential bias is that the V7 region of the 18S rRNA gene of Daphnia (and probably other cladoceran taxa) is longer than that of other zooplankton species [252], which may skew PCR amplification and/or sequencing when this marker is used to reconstruct past zooplankton communities.
Additionally, in the case study A3 (Box 1, Appendix A), 18S rRNA gene V7 region amplicon sequencing showed that microeukaryotes dominated in terms of taxonomic richness (76 to 96% of the MOTUs are represented by microeukaryotes).
There are few sedaDNA studies that have successfully applied metabarcoding techniques to fish. Published studies have focused on short single-marker PCR or qPCR approaches, designed for species such as Perca flavescens [253], Coregonus lavaretus [85], and Oncorhynchus sp. [86]. Evaluations of markers have been carried out in the eDNA literature for applications such as biomonitoring [254,255] and seafood identification [256]. Achieving a balance between specificity and breadth in a marker has proven challenging due to the diversity of fish taxa and ecotypes; however, several studies have highlighted the potential utility of the 12S ribosomal region. Collins et al. [254] found that a 12S rRNA gene marker achieved higher levels of universality and taxonomic discrimination, combined with lower levels of non-specific amplification when compared with COI, 16S rRNA, and cytb genes. Although COI has a much greater reference database coverage, due to being the universal metazoan barcode [198], many reads are assigned to non-target groups making it difficult to obtain reproducible signals for fish taxa, which will be especially problematic given the low amounts of fish DNA compared to bacterial or planktonic groups in sediments. The MiFish-U 12S primers, in combination with a high annealing temperature, have been used to successfully amplify a hypervariable 163-185 bp fragment of fish DNA in surface sediments [221], although applications to older sediments remain uncertain.

Bioinformatic Filtering and Analysis of sedaDNA Data
While a comprehensive review of bioinformatic methods for sedaDNA goes beyond the scope of this work, we provide here an overview of some of the main tools employed as background to our case studies and we discuss some of the specific implications related to study design and selection of laboratory methods recommended in Section 4. Currently, there is no standard procedure or set of criteria proposed for data "filtering" or "cleaning", either for metabarcoding or for shotgun metagenomics data. This is because the development, and eventual standardization, of bioinformatic filtering criteria is still in its infancy. Generally, taxa that are more frequently detected in negative controls than in samples can be considered suspect and should be removed from the data set. Regardless of the procedure applied, "true" presences can be removed (type II error), and "false" presences retained (type I error) [82,183]. To limit these errors, sedaDNA data can be cross-validated by comparing them with independent proxies [22,24,39,82]. When no cross-validation is possible, one can set the threshold where the majority of retained taxa are biogeographically likely to be true positives whereas the taxa filtered out are not expected in the sample [27,47]. However, such an approach has the potential to be overly conservative and bias the taxonomic inventory toward the present. The detection of a given suspect taxon in the same sample as taxa with similar ecological preferences can also be used to validate the presence of this suspect taxon [39]. For metagenomic datasets, taxa with very low read counts (e.g., n ≈ 3-5 per sample or 1 count in each of 3-5 samples) are generally excluded as spurious, and controls are thoroughly screened [62]. In metagenomic data sets, it is also imperative that a strict minimum fragment length cutoff (e.g., 35 bp) is applied to reduce spurious alignments [257].
Where doubt remains, it is important that the filtering procedure does not impact the conclusions of the study [25].
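As an illustration of the filtering rules mentioned above, the following Python sketch applies a minimum fragment length, a comparison of detection frequency between negative controls and samples, and a minimum read count per sample. It is a simplified sketch rather than a standard procedure: the function names, the table layout, and the choice to count a single read as a "detection" when comparing controls with samples are assumptions, and the thresholds are only the example values given in the text.

```python
MIN_READS = 3          # the text suggests roughly 3-5 reads per sample
MIN_FRAGMENT_BP = 35   # example cutoff given in the text


def length_filter(reads, min_len=MIN_FRAGMENT_BP):
    """Keep only reads that are at least `min_len` bases long."""
    return [r for r in reads if len(r) >= min_len]


def detection_rate(counts, names, min_reads=1):
    """Share of `names` in which a taxon has at least `min_reads` reads."""
    if not names:
        return 0.0
    return sum(counts.get(n, 0) >= min_reads for n in names) / len(names)


def filter_taxa(table, samples, controls, min_reads=MIN_READS):
    """Return {taxon: {sample: reads}} after two simple filters.

    `table` maps taxon -> {sample or control name: read count}.
    1. Drop taxa detected in a larger share of negative controls than samples.
    2. For the remaining taxa, drop per-sample counts below `min_reads`.
    """
    kept = {}
    for taxon, counts in table.items():
        if detection_rate(counts, controls) > detection_rate(counts, samples):
            continue  # more frequent in controls: treated as suspect
        passing = {s: counts[s] for s in samples if counts.get(s, 0) >= min_reads}
        if passing:
            kept[taxon] = passing
    return kept
```

Running the same data through several threshold values is one way to check that the conclusions do not hinge on the filtering choices.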
A major issue with the lack of standard bioinformatic pipelines to analyze DNA metabarcoding or shotgun metagenomics data is that slightly different procedures may lead to contrasting results [167,258-260]. Many sedaDNA works use OBITools [261] to analyze DNA metabarcoding data (e.g., [16,31,35,199]), using mainly the EMBL nucleotide database or a custom reference database depending on the target taxonomic group or geographic region. This package, along with similar packages such as the Anacapa toolkit [262], contains various functions allowing for comprehensive control of different sections of the data handling pipeline, from read alignment, cleaning and filtering (i.e., removal of potential sequence variants generated by sequencing errors) to taxonomic classification, with simple user-customizable parameters. For shotgun metagenomics data obtained from sedaDNA, an example of an emerging procedure is the Holi pipeline [62]. By mapping shotgun reads to the full EMBL genomic and/or nucleotide database, which encompasses all realms of life, and subsequently assigning those matches to the lowest common ancestor taxa, Holi greatly diminishes the risk of misassignments [62]. In addition to the cleaning, merging, mapping, and annotation steps provided by the tools compiled in this pipeline, tools such as PMDtools [263] and mapDamage [264] can be used to identify ancient DNA present in alignments and ultimately to differentiate between modern and ancient DNA molecules, thereby ensuring the reliability of the sedaDNA signal obtained, as described in Section 2.3.
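The lowest-common-ancestor idea can be sketched in a few lines of Python. This is a generic illustration, not the Holi implementation: it assumes that each database match for a read is available as a root-to-tip list of taxon names, and the function name and example lineages are chosen for illustration only.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, or None if none is shared.

    Each lineage is a list of names ordered from root to tip,
    e.g. ["Eukaryota", "Viridiplantae", "Pinaceae", "Larix"].
    """
    if not lineages:
        return None
    shared = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this level
        shared = ranks[0]
    return shared
```

A read that matches two larch species equally well would thus be reported at genus level, and a read that also matches a pine would be pushed up to family level, which trades taxonomic resolution for a lower risk of misassignment.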
The choice of reference library used to annotate DNA sequences is a key component for obtaining reliable ecological information. Indeed, a major source of false positives may also be due to errors in reference libraries or sequence sharing combined with local species lacking in the reference library. Hence, the choice between the use of curated and non-curated reference databases (using more liberal alignment parameters) and the use of regionally local reference databases (like that made for the Norway/Polar regions and the European Alps) is of utmost importance to avoid misidentifications.
Details on statistical data analysis for processed sedaDNA data (univariate and multivariate) are beyond the scope of the present review. We instead direct readers to some key publications providing relevant recommendations about this step of the workflow [188,265-267].

Recommendations
In this section, we use state-of-the-art developments in lake sedaDNA research identified from our synthesis of previous studies and compiled case studies to provide methodological recommendations for future sedaDNA work. Because of the high variability in the sedaDNA signal from lakes with contrasting catchment morphologies as well as numerous other factors influencing the transport of DNA from sources to sediments, its extractability and amplification, it remains difficult to provide clear and concise guidelines about how to collect, analyze, and interpret sedaDNA data. However, we aim for this effort to further promote and guide future sedaDNA research, which will result in more robust reconstructions of past changes in aquatic and terrestrial biodiversity and offer predictions of the consequences of current and future climatic and environmental changes on biota.
Lake selection: Select the lake based on ecological questions and, if needed, adapt protocols to improve the ability to extract and recover DNA from the studied sediments. As a coring campaign is a major effort, especially in remote areas, we suggest, as a pilot study, analyses of surface sediments to test the amenability of the sedimentary record for sedaDNA analysis. If studying DNA from terrestrial organisms, it is important to first consider the size and topography of lake catchment and the hydrographic network to estimate the efficiency of DNA transfer to the lake following detailed recommendations from Giguet-Covex et al. [39]. Sediment lithology (e.g., clay mineral content) may also impact the preservation potential of DNA in lake sediment [101], but further study is required to create a predictive framework for lithology-based site selection.
Field replicates: The ideal number of sediment cores to sample depends on the target taxon and question. If one wants to assess the signal from unequally distributed large organisms (e.g., fish), the use of spatial field replicates may be beneficial for the detection of your target species. For investigation of terrestrial or more evenly distributed aquatic organisms such as plankton, a single core taken in the central part of the lake may suffice.
Core-site replicates: If the target organisms are rare or remotely located, the use of multiple replicate cores from the same site is recommended to increase the probability of detection. However, taking multiple cores is time-consuming and costly, and requires inter-calibration between cores for dating. Coring with broad-diameter tubes (e.g., 90 mm) may be an alternative to core-site replicates to increase the amount of material collected.
Analytical replicates: When enough material (i.e., sediment) is available, use multiple analytical replicates to ensure the reliability of the data obtained. When material is limited: if targeting abundant biological organisms (i.e., plankton), two to three analytical replicates (extraction and/or PCR replicates) should suffice to capture their sedaDNA signal; if targeting rare or remotely-located organisms, the use of at least six to eight analytical replicates is recommended. For further discussion about how to calculate the suitable number of replicates needed for statistical considerations, we direct readers to relevant publications [167,188].
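As a rough illustration of how the number of analytical replicates affects detection, a simple binomial sketch can be useful. It assumes, as a simplification that is not taken from the cited publications, that each replicate detects the target independently and with the same probability p, so that the chance of at least one detection in n replicates is 1 − (1 − p)^n. The function names and example values below are hypothetical.

```python
def detection_probability(p: float, n: int) -> float:
    """Probability of at least one detection in n independent replicates,
    each with per-replicate detection probability p (assumed constant)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def replicates_needed(p: float, target: float = 0.95) -> int:
    """Smallest number of replicates reaching the target cumulative
    detection probability under the same independence assumption."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while detection_probability(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative per-replicate detection probabilities (not measured values).
    for p in (0.8, 0.5, 0.3):
        print(p, replicates_needed(p, target=0.95))
```

Under these assumptions, a taxon detected in about half of the replicates would need five replicates to reach a 95% cumulative detection probability, whereas a taxon detected in about a third of replicates would need more, which is consistent in spirit with recommending more replicates for rare targets. Real replicates are rarely fully independent, so such numbers should be treated as a lower bound rather than a design rule.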
Handling of analytical replicates: Extraction replicates can be pooled prior to the PCR step, and PCR replicates can be multiplexed with the same tagged primers to reduce costs and increase the number of reads per sample. However, such an approach will result only in presence-absence data and will not allow for assessments of replicability or semiquantification of the results. If replicability data are desired, then it is essential that different tagged primer sets are used so that replicates can be separated and analyzed independently in silico.
Storage of cores and sediment subsampling: For core storage, recreate in situ conditions in the sediment column (cold, dark, anoxic) as much as possible. Perform subsampling immediately after core collection, or immediately after splitting the core in half. After the core liners are cut, a wire cutter is often used to split the core, which often contaminates the surfaces of the core halves. It is therefore essential that these surfaces are removed using sterile implements, so that an unexposed layer is sampled. Sterile syringes with the ends cut off with sterilized scissors, blades or a heated wire can serve as mini corers to obtain subsamples (e.g., procedure outlined by Pedersen et al. 2016). The collected material can then be transferred to sterile tubes, while avoiding the potentially cross-contaminated smeared surface (e.g., [18]). Directly store the subsampled sediment at −20 °C or proceed to DNA extraction. For vascular plant and mammalian sedaDNA analysis, the core can be stored cool, ideally unopened, for several years prior to sampling, although secondary growth by fungi, for example, may lead to reduced efficiency of targeted sedaDNA recovery. To avoid freeze-thaw cycles of the sediment, sub-sample the sediment into multiple tubes for later extraction, if needed.
Monitoring of contamination: Consider using a synthetic DNA or exotic amplicon tracer during coring. Follow established protocols for minimizing contamination in ancient DNA laboratories [164]. Include and sequence negative controls from sampling, extraction, and downstream steps, as a minimum. Compare sequences/taxa that appear in both the controls and samples; these should, at a minimum, be treated with caution or removed entirely. Whenever possible, perform a multi-proxy approach with diagnostic macroscopic, microscopic, cellular, and/or DNA markers for specific taxa to cross-validate the sedaDNA approach, especially if results are unexpected or ground-breaking. Use analytical methods that remove putative contaminants from the dataset.
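The comparison of controls and samples described above can be sketched in a few lines. This is a minimal illustration, not an established decontamination method: the data layout (dictionaries of read counts per taxon), the function names, the threshold parameter, and the example taxa are all hypothetical.

```python
def flag_contaminants(samples, controls, max_control_reads=0):
    """Return taxa found in negative controls (above a read threshold)
    that also occur in at least one sample.

    samples, controls: dict mapping a sample/control ID to a dict of
    {taxon: read_count}. The threshold is an illustrative parameter.
    """
    control_totals = {}
    for counts in controls.values():
        for taxon, reads in counts.items():
            control_totals[taxon] = control_totals.get(taxon, 0) + reads
    sample_taxa = {
        taxon
        for counts in samples.values()
        for taxon, reads in counts.items()
        if reads > 0
    }
    return sorted(
        taxon
        for taxon, reads in control_totals.items()
        if reads > max_control_reads and taxon in sample_taxa
    )


def remove_taxa(samples, taxa):
    """Return a copy of the sample table without the listed taxa."""
    drop = set(taxa)
    return {
        sample_id: {t: r for t, r in counts.items() if t not in drop}
        for sample_id, counts in samples.items()
    }


if __name__ == "__main__":
    # Hypothetical read counts for two samples and two negative controls.
    samples = {"S1": {"Pinus": 120, "Homo": 4}, "S2": {"Betula": 60}}
    controls = {"NC_extraction": {"Homo": 9}, "NC_pcr": {"Zea": 2}}
    flagged = flag_contaminants(samples, controls)
    print(flagged)
    print(remove_taxa(samples, flagged))
```

Whether flagged taxa are removed outright or only treated with caution remains a judgment call that depends on the study question; the sketch only makes the overlap between controls and samples explicit.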
Choice of DNA extraction protocol: In most cases, the following protocols should efficiently extract DNA from sediments: PowerSoil (Pro) kit for small amounts of sediment (approx. 0.25 g), PowerMax or FastDNA kit for larger sample sizes (up to 10 g), although the sediment type (organic-, clay-, carbonate-rich, etc.) should also inform the protocol used. Different extraction protocols may yield significantly different richness and relative abundances in taxa and genes, and should therefore be selected following preliminary analyses for suitability, notably where the heterogeneity between sample types and reconstructed assemblages is expected to be important. For shotgun sequencing approaches on very old sediments, it is better to use a protocol that retains ultrashort DNA fragments, such as the protocol from Pedersen et al. [62] or Murchie et al. [189]. Combine DNA inventories obtained from different DNA extraction protocols to maximize the chances of detecting a specific target and/or robustly reconstructing the studied biological assemblage.
Choice of the data generation method: The choice depends on the question and target organism(s) of interest. Shotgun metagenomics can give an indication of the degradation state of the DNA molecules, which can be necessary when data authentication is required (e.g., working with microbes that can live in, or contaminate, lake sediments), but may not be strictly necessary if interested in terrestrial organisms or aquatic macroorganisms. However, metabarcoding and quantitative methods (qPCR and ddPCR) can be used to process more samples at lower cost and are therefore recommended for long time series or large data sets, provided there is sufficient sedaDNA preservation. If DNA preservation is poor and molecules of the target taxa are likely to be rare in the sedaDNA mixture, then a target enrichment approach is recommended.

Acknowledgments: For case studies A1 and A4, we thank the Swedish Phytogeographical Society (SVS) for funding this work through the B. Lundman's fund for botanical studies scholarship. For case study A3, we thank Cécile Chardon and Louis Jacas (UMR CARRTEL INRAE, France) for the laboratory work. For case study A5, we thank Claudia Havel for assistance in the lab and Paolo Ballota, Zafar Mahmoudov and Rasmus Thiede for assistance during fieldwork. For case study A6, we thank Francisco Javier Ancin Murguzur for assistance in the field, Iva Pitelkova for conducting LOI, Enrique Tejero Caballo and Karina Monsen for support with XRF scanning and high-resolution imagery, and Youri Lammers for assistance with bioinformatics. For case study A7, we thank Ludovic Gielly for his assistance in labwork. We would also like to thank ASTERS, the manager of the Haute Savoie natural reserves, for constant help since the onset of the palaeoenvironmental studies on Lake Anterne. We are grateful for the useful comments and suggestions from three anonymous reviewers.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A.
Appendix A.1. Case Study A1-Effects of Sediment Type on PCR Amplification Success
By Kevin Nota 5 and Laura Parducci 5,31
When amplifying DNA using PCR, or performing any other enzymatic reaction, inhibitors co-extracted with DNA can have adverse effects on performance. Depending on the concentration and type of the inhibitor and the particular enzymes, effects can be highly variable; for example, some enzymes are more sensitive than others.
The aim of this case study is to establish whether PCR inhibition of lake sedaDNA is related to the type of sediment, here ranging from minerogenic to organic. Briefly, we sub-sampled 138 sediment layers from a single varved lake sediment record (Lago Grande di Monticchio, southern Italy; time period ~1993-31,190 years before present, yr BP). DNA was extracted using the PowerSoil kit (Qiagen) with some modifications (see Material and Methods section). PCR inhibition was analyzed using a synthetic oligonucleotide (85 bp, including unique primer binding sites) spiked into qPCR reactions at a standardized amount (0.0001 pM). To evaluate the level of inhibition, the DNA extracts were diluted until no PCR inhibition was detected for the standardized amount of synthetic oligonucleotide. Specifically, we considered the effect of PCR inhibitors removed when the qPCR amplification curve of the synthetic oligonucleotide matched that of a qPCR reaction with ultrapure water instead of the DNA template. In addition, the effects of PCR inhibitors on the success of a DNA metabarcoding approach (here, the number of raw reads/DNA sequences) were studied using primers developed to amplify vascular plant communities (trnL-p6 loop, [201]). The Zymo OneStep PCR inhibitor removal kit was tested on ten samples with varying degrees of inhibition to investigate whether adding this purification step after DNA extraction with the PowerSoil kit improves qPCR amplification success.
qPCR, rather than the regular PCR used for metabarcoding, was used for testing inhibition because it visualizes PCR amplification in real time. This allows inhibition to be scored much more accurately and reduces hands-on lab time because an agarose gel is not needed to visualize PCR success. One disadvantage of qPCR is that its polymerases tend to be more sensitive to inhibitors than those commonly used for metabarcoding. Dilution factors obtained in the qPCR are, therefore, not directly applicable to more robust PCR enzymes, which are developed to work with more challenging samples. However, the qPCR does give a representation of relative levels of inhibition between samples, which is highly valuable for PCR optimization because it makes it possible to select the most inhibited samples to test the metabarcoding protocol. The idea is that if a PCR protocol works for the most inhibited sample in the core, it will likely also work for the remaining samples.
PCR inhibition was largely detected in sediments younger than 25,000 years (Figure A1). However, from 25,000 years onward, no clear relationship was found between inhibition and sediment type, which varied from minerogenic at the end of the late-glacial period (~15,000 years) to organic in the late Holocene period (c. 2000 years ago). In addition, PCR inhibition was not significantly correlated with total organic content (TOC) (results not shown; TOC data were available for samples between 11,000 and 22,600 yr BP), indicating that PCR inhibition is not related to the proportion of organic compounds in the sediment. One key result of this case study is the strong negative correlation (r² = −0.659, p < 0.01) between PCR inhibition and the number of DNA amplicons sequenced (Figure A2). In contrast, no correlation was detected between the DNA concentration and PCR inhibition. DNA concentration was also negatively correlated with age (r² = 0.608, p < 0.01). The Zymo OneStep PCR inhibitor removal kit was able to reduce the dilution factor required to remove inhibition by ~50% (Table A1). Passing the extract through multiple Zymo OneStep columns, or running the extract multiple times through the same column, did not remove more inhibitors.
Overall, it is unclear what exactly the inhibition signal represents. It might be related to the presence of certain plant taxa that produce different levels of inhibitors that are more difficult to remove during DNA extraction. Although we do not know the cause and origin of inhibition, we should recognize its importance when performing DNA metabarcoding. It is therefore very important to understand the levels of inhibition present in sediment samples in order to either dilute DNA extracts or, preferably, optimize PCR reactions to guarantee the best amplification and avoid bias during metabarcoding analyses. Adding a Zymo purification step after extraction reduces inhibition in most cases but does not remove it entirely.
Figure A2. Comparison of inhibition in PowerSoil extracts with or without additional purification. The Zymo column was used once according to the kit manual (green); inhibition is also shown after using two clean columns (light blue) and after running the extract three times through the same Zymo column (pink).

Material and Methods
DNA extraction-All 138 sediment samples were homogenized by vortexing before DNA extraction. DNA was extracted using the PowerSoil kit (Qiagen) with the following modifications. After bead-beating, 2 µL of 20 mg/mL Proteinase K and 25 µL of 1 M dithiothreitol (DTT) were added to the bead-beating tubes, which were incubated overnight at 37 °C in a rotating incubator. The volume of Solution C3 was increased from 200 µL to 250 µL and that of Solution C4 from 1200 µL to 1400 µL. Samples were incubated for 10 min at room temperature before eluting twice in 60 µL elution buffer containing 10 mM Tris-HCl and 0.05% Tween-20. Bead-beating was done on a standard vortexer at the highest speed for 10 min, with the tubes taped horizontally to the vortexer.
Inhibitor control template design-The oligonucleotide used as template for the inhibition control was designed by generating random 85-nucleotide sequences in R with a GC content close to 50%. Candidate sequences were checked for secondary structures using the IDT oligo analyzer (https://eu.idtdna.com/calc/analyzer (accessed on 1 February 2021)). Sequences that did not show any secondary structures were blasted against all sequences present in GenBank. We selected and further used only sequences with no matches in GenBank. The primer binding sites were manually edited to create the most "optimal" primer pair (Table A2). The 85-nucleotide "Inh_Nota85" oligonucleotides were synthesized as primers, diluted to 0.0001 pM and used as the template in qPCR.
Inhibition testing with qPCR amplification-All qPCR reactions were run on the same qPCR machine (CFX96, BioRad). qPCR amplification was performed in 10-µL reaction volumes containing 1X TATAA SYBR® GrandMaster Mix, 0.5 µM forward primer, 0.5 µM reverse primer, and 0.1 pM inhibitor control oligo. The thermocycling program was as follows: 30 s at 95 °C, followed by 50 cycles of 5 s at 95 °C, 30 s at 55 °C, and 10 s at 72 °C. After the qPCR cycles, a melt curve was obtained by increasing the temperature from 60 to 95 °C, with a 0.5 °C increase every 5 s. qPCR amplifications were tested using 3 µL undiluted DNA extracts, 1 µL undiluted extracts, and 1 µL extracts diluted 2X, 5X, 10X, 15X, or 20X. The dilution factor was used as the inhibition value (e.g., an inhibition score of 20 was given to samples that first amplified as expected at a 20X dilution).
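The scoring rule described above (the inhibition score is the first dilution factor at which the spiked control amplifies as expected) can be written as a short sketch. The function name and data layout are hypothetical, and two choices here are assumptions rather than details given in the text: the 1 µL undiluted reaction is represented as a factor of 1, and the 3 µL undiluted reaction is left out because its score is not stated.

```python
from typing import Dict, Optional, Sequence

# Dilution factors tested with 1 µL of extract; 1 stands for the
# undiluted extract (an assumption made for this illustration).
DILUTIONS = (1, 2, 5, 10, 15, 20)


def inhibition_score(
    amplified_ok: Dict[int, bool],
    dilutions: Sequence[int] = DILUTIONS,
) -> Optional[int]:
    """Return the lowest dilution factor at which the synthetic control
    amplified as expected, or None if it never did within the series.

    amplified_ok maps a dilution factor to True when the qPCR curve of
    the spiked control matched the water-only reference.
    """
    for factor in sorted(dilutions):
        if amplified_ok.get(factor, False):
            return factor
    return None


if __name__ == "__main__":
    # Hypothetical sample: inhibited until diluted five-fold.
    observed = {1: False, 2: False, 5: True, 10: True, 15: True, 20: True}
    print(inhibition_score(observed))
```

Returning None for extracts that remain inhibited at the highest dilution keeps them distinguishable from a genuine score of 20; how such samples were handled in the case study is not stated here.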
Metabarcoding-Metabarcoding was done using the trnL g/h primers [201]. PCR reactions were done in 20-µL reactions containing 0.04 U/µL Platinum™ II Taq Hot-Start DNA Polymerase (Invitrogen), 1X Platinum™ II PCR Buffer, 0.2 mM of each dNTP, 0.2 µM tagged forward primer, 0.2 µM tagged reverse primer, and 4 µL of DNA extract. The PCR thermal cycler program was as follows: 94 °C for 2 min, followed by 55 cycles of 94 °C for 30 s, 55 °C for 30 s, and 68 °C for 15 s. Both forward and reverse primers contained eight-nucleotide tags at the 5′ ends. We used 96 dual unique tags in both primers. PCRs were set up in 96-well plates; each plate contained 2 × 40 samples randomly distributed in the plate, 4 PCR negatives, and 2 × 4 extraction negatives, and 4 wells were kept empty to control for background primer contamination (4 tagged primers were not used; their presence in the sequences would indicate primer contamination). In total, 8 × 120 samples were amplified over 12 96-well plates. From all PCR reactions on a plate, 10 µL was taken and pooled together. Three times 100 µL of pooled PCR products were purified using the MinElute PCR Purification kit (Qiagen); DNA was eluted in 20 µL and the three purifications from the same pool were pooled together. DNA was quantified using Qubit, and ~100 ng of pooled DNA was used for library prep using the Carøe et al. [268] single-tube protocol with the following modifications: the reaction volume was increased to 40 µL; a MinElute clean-up was done after adapter ligation, rather than after the fill-in step, to remove adapter dimers; and no clean-up was done after the fill-in step. The index PCR contained 2.5 U Pfu Turbo polymerase (Agilent Technologies), 0.2 mM of each dNTP, 1X Pfu reaction buffer, and 0.2 µM of both index primers [269]. Each plate was indexed with unique dual indexes; plates 1-4 and 9-12 were indexed for 10 cycles, and plates 5-8 for 16 cycles (low copy number of libraries, highly likely due to PCR inhibitors).
Plates 1-4, 5-8, and 9-12 were pooled together in equal volume and sequenced on 3 MiSeq lanes using v2 300 cycle paired-end chemistry (Table A3). Sequencing was done at SciLifeLab, Uppsala (Sweden).
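The plate composition described above adds up to a full plate (2 × 40 sample wells + 4 PCR negatives + 2 × 4 extraction negatives + 4 empty wells = 96). A minimal sketch of a randomized layout with this composition follows. The function name, well labels, and seed are hypothetical, and the sketch randomizes controls together with samples, which is an assumption; the text only states that samples were randomly distributed.

```python
import random
import string


def build_plate(sample_ids, seed=0):
    """Assign 80 sample wells, 4 PCR negatives, 8 extraction negatives,
    and 4 empty wells to random positions on a 96-well plate.

    Returns a dict mapping well names (A1..H12) to contents.
    """
    sample_ids = list(sample_ids)
    if len(sample_ids) != 80:
        raise ValueError("expected 80 sample wells (2 x 40)")
    contents = (
        sample_ids
        + ["PCR_NEG"] * 4
        + ["EXT_NEG"] * 8
        + ["EMPTY"] * 4
    )
    wells = [
        f"{row}{col}"
        for row in string.ascii_uppercase[:8]
        for col in range(1, 13)
    ]
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    rng.shuffle(contents)
    return dict(zip(wells, contents))


if __name__ == "__main__":
    # Hypothetical IDs: 40 samples, each in two PCR replicates.
    ids = [f"S{i:02d}_rep{r}" for i in range(1, 41) for r in (1, 2)]
    plate = build_plate(ids, seed=42)
    print(plate["A1"], plate["H12"])
```

Recording the seed alongside the plate map makes it possible to regenerate the layout later when demultiplexing or tracing suspected cross-contamination between neighboring wells.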
Zymo OneStep Inhibitor Removal Kit-For 10 samples, ~100 µL of the extract was run through the Zymo inhibitor removal kit following the instructions of the manufacturer. For two of the samples, the extract was run through the inhibitor removal column three times. Inhibitor tests as described before were run after every flowthrough. For another two samples, the extract was run through two separate Zymo columns and the inhibitor removal assay was run after every flowthrough. For 16 samples, the DNA concentration was measured with Qubit (dsDNA high sensitivity, Invitrogen) before and after purification (see Table A1). The mean recovery of DNA after purification was 91%. For some samples the recovery was nearly 100%; however, the three oldest samples had a recovery between 75% and 78%.

Anoxic sediments from the abyssal Northern Atlantic Ocean and ferruginous Lake Towuti, Indonesia, experienced inadvertent short exposure to oxygen during sampling, storage and aliquoting. This resulted in substantial secondary growth of metabolically versatile and facultative anaerobes in the samples. Oxidation of redox-sensitive pore water species is difficult to monitor and has the consequence of replenishing energetically more favorable terminal electron acceptors for microbial respiration (i.e., O₂, NO₃⁻, Fe³⁺, SO₄²⁻) from their initially reduced counterparts (e.g., Fe²⁺, NH₄⁺, H₂S, H₂, CO₂). Although facultative anaerobes are normally minor in pristine samples, fast growers are known to outcompete slow growers in sediment incubations [270] and thereby rapidly mask the taxonomic assemblage corresponding to in situ conditions. Here, we briefly report two cases of secondary growth in anoxic sediments due to unexpected pore water oxidation.
In the first case, the inner part of a piston core consisting of anoxic abyssal clay was subsampled on board using sterile end-cut syringes that were directly frozen in liquid nitrogen. The end of the syringe was systematically discarded [271]. We extracted and sequenced total DNA in four successive biological replicates [160]. Although sediments were immediately returned to storage at −80 °C, pore water oxidation occurred due to partial thawing of the sample, which allowed secondary growth to resume briefly with each freeze-thaw cycle (Figure 3A in main text). The taxonomic assignment pointed to microorganisms from the water column that remained viable in the sediment, for instance Alteromonas and Alcanivorax among Gammaproteobacteria, Comamonadaceae (e.g., Variovorax, Hydrogenophaga) among Betaproteobacteria and Sphingomonadaceae (e.g., Sphingobium, Sphingomonas) among Alphaproteobacteria. These taxa are all known hydrogen-oxidizing oxalotrophic bacteria, using oxalate (C₂O₄²⁻) as both carbon and energy source [272].
In the second case, anoxic iron-rich sediments were retrieved via gravity coring and aliquoted into successive 2-cm and 5-cm sections inside a glove bag filled with nitrogen [104]. The sediment rims were discarded using a sterile spatula, and the remainder of each section was transferred into aluminum-foil bags flushed with nitrogen and hermetically heat-sealed. The upper samples of one core were instead transferred into Falcon tubes and sealed with plastic foil. For DNA extraction, sediments were pressed out through a hole poked in the bag without letting any air in, and the bag was heat-sealed again, whereas Falcon tubes were not hermetic enough to fully prevent oxygen diffusion into the sample. Because bottom waters in tropical Lake Towuti are 28 °C throughout the year [104], we stored all samples at room temperature to keep them close to in situ conditions.
Ammonia is also a redox-sensitive species in pore water, whose oxidation replenishes energetically favorable electron acceptors, mostly nitrate and nitrite. The subsequent reduction of these electron acceptors via denitrification can be coupled with microbial oxidation of iron, implying that secondary growth can still proceed once anoxic conditions are restored in the sample. Here, we compare pristine and oxidized sample replicates (Figure 3B in main text) and show that secondary growth resulted in a doubled amount of extracted intracellular DNA [159]. The main taxa that actively grew during the four months of sample storage were identified as Betaproteobacteria, including iron-oxidizing Gallionella and the facultative photoheterotrophic anaerobe Rhodocyclus. These strains perform iron oxidation coupled to nitrate reduction and are often misinterpreted as obligate instead of facultative anaerobes. In addition, obligate anaerobic Deltaproteobacteria and Firmicutes such as Desulfovibrio, Desulfosporosinus and Pelotomaculum also grew in the samples due to metabolic versatility in the use of terminal electron acceptors (e.g., NO₃⁻, Fe³⁺, SO₄²⁻, S₂O₃²⁻) [153]. To conclude, we draw attention to the presence of metabolically versatile and facultative anaerobes that remain viable in the sediment under anoxic conditions. We warn of rapid oxidation of pore water that would result in increased bacterial respiration rates with rapid secondary growth in the samples. For successive DNA extractions, we advise heating a sterile blade and cutting the whole frozen sediment at once into single-use 2-3-g aliquots to be kept at −80 °C.

Appendix A.3. Case Study A3-Variability in Eukaryotic Inventories across Different DNA Extraction Protocols
By Isabelle Domaizon 29,30, Eric Capo 1, Charline Giguet Covex 2 and Irene Gregory-Eaves 18
We evaluated how the use of different DNA extraction methods affected DNA inventories obtained for eukaryotic assemblages. Our case study focused on recent sediment samples (less than 100 years old) collected from four lakes. We compared samples from two large (>44 km²) deep hard-water lakes (Bourget (LDB), Geneva (LEM)) and two small (<0.1 km²), shallow and organic-rich high-altitude lakes (Lauzanier (LAZ), Serre de l'Homme (SDH)). For each lake, the DNA was extracted using two protocols: the NucleoSpin Soil extraction kit (NucleoSpin protocol, hereafter NS) and the Taberlet2012 protocol (hereafter PB), the latter with two different sediment masses (0.75 and 4 g). Extraction duplicates were performed for the three protocols (NS-0.75g, PB-0.75g and PB-4g), resulting in a total of six DNA extracts per lake. An 18S metabarcoding approach was used to compare DNA inventories obtained for eukaryotic diversity, including microbial eukaryotes (e.g., bacillariophyta, chlorophytes, ciliates) and metazoans (metazooplankton, oligochaeta, teleostei).
Differences were detected in both total DNA concentrations and the composition of eukaryotes across the DNA extraction protocols (Figure A3). For the two deep lakes (LDB and LEM), the NS-0.75g and PB-0.75g protocols resulted in similar DNA concentrations (maximum of 503 ng g⁻¹ wet sediment for one LEM sample with NS-0.75g), while the PB-4g protocol yielded much higher DNA concentrations (from 900 to 2205 ng g⁻¹ wet sediment). In contrast, higher DNA concentrations were obtained for the two shallow altitude lakes with the NS-0.75g protocol compared to the PB protocol with either 0.75 or 4 g (at least 2.5 and 6 times higher for LAZ and SDH, respectively). In terms of the composition of the assemblages, we found that the proportion of micro-eukaryotic DNA reads ranged from 65 to 99% of inventories (with the exception of one NS-0.75g LDB replicate). In the two deep peri-alpine lakes, the richness in microbial eukaryotes and metazoans was found to be higher for DNA extracted using the NS protocol (in contrast with the richness obtained from the PB protocol with either 0.75 or 4 g of sediment), while the opposite pattern was found for the two altitude lakes. We did not find a clear pattern relating MOTU numbers to the type of extraction or the amount of sediment extracted.
The structure of the eukaryotic assemblages (here shown as hierarchical clustering trees) from the two deep peri-alpine lakes appears to be primarily associated with the DNA extraction protocol, whereas the quantity of sediment used was a more distinguishing variable for the two shallow altitude lakes (Figure A3). The composition of both micro-eukaryotic and metazoan inventories varies between lakes but also between DNA extraction protocols. For example, ciliate reads were more abundant in PB-0.75g than NS-0.75g inventories in LDB, but the opposite pattern was observed for LEM and SDH. Similarly, higher bicosoecidan read numbers were observed in the LEM inventory obtained from the NS-0.75g compared to the PB-0.75g protocol (the opposite for LAZ). Nevertheless, similarities were observed between inventories from low sediment masses (0.75 g) compared to the 4 g extraction protocol, with the predominance of bacillariophyta reads in LAZ in the 0.75 g protocols. Nematodes were more easily detected in inventories from low (0.75 g) rather than high sediment masses (4 g). Additionally, we noticed a predominance of rotifers within the metazoan DNA reads when the NS protocol was used for LEM, LAZ and SDH but not LDB.
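To make comparisons between inventories from different extraction protocols concrete, one common option is to convert read counts to relative abundances and compute a pairwise dissimilarity before clustering. The sketch below uses Bray-Curtis dissimilarity, which is an assumption for illustration: the distance measure behind the clustering in this case study is not stated. The function names and read counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert a {taxon: read_count} inventory to proportions."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("inventory has no reads")
    return {taxon: reads / total for taxon, reads in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles
    (0 = identical composition, 1 = no shared taxa)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical read counts for one sample under two protocols.
    ns = relative_abundance({"ciliates": 50, "rotifers": 50})
    pb = relative_abundance({"ciliates": 50, "nematodes": 50})
    print(bray_curtis(ns, pb))
```

Working on proportions rather than raw counts removes the effect of unequal sequencing depth between extracts, although it does not correct for protocol-specific biases in which taxa are recovered.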
Altogether, our findings highlight that the eukaryotic community DNA signal is sensitive to the extraction protocol applied and the initial sediment sample mass extracted. Finally, the differences in signal also vary depending on the lakes studied (here four different lakes), which may be related to the composition of the microbial communities and the type of lake (and catchment) studied, which could in turn have an effect on pre- and post-depositional taphonomic processes.
Figure A3. For each sample, the total DNA concentration (in ng g⁻¹ sediment), number of microbial eukaryotic and metazoan MOTUs, and proportion of reads from microbial eukaryotes and metazoan groups. Samples were obtained from two large (>44 km²) deep peri-alpine lakes (Bourget (LDB), Geneva (LEM)) and two small (<0.1 km²), shallow, high-altitude lakes (Lauzanier (LAZ), Serre de l'Homme (SDH)). A color code discriminates between the three DNA extraction treatments (red for NS-0.75g, green for PB-0.75g and blue for PB-4g). For each lake, hierarchical clustering analysis was independently performed from the total MOTU abundance table for all DNA inventories. NS: NucleoSpin protocol; PB: Taberlet2012 protocol.

Material and Methods
Study site-Lake Bourget (45°44′N 5°52′E, 18 km long, 2.8 km wide, maximum depth 145 m, mesotrophic, id: LDB) and Lake Geneva (46°26′N 6°33′E, 73 km long, 14 km wide, maximum depth 310 m, mesotrophic, id: LEM) are peri-alpine hard-water lakes. While Lake Bourget is the largest French natural freshwater reserve, Lake Geneva, located on the border between France and Switzerland at the northern end of the French Alps, is the largest European lake. These lakes have experienced increases in nutrient inputs, especially phosphorus, since the 1950s. Lakes Serre de l'Homme (44°46′28.3" N 6°23′45.4" E, 2235 m asl, 0.25 ha lake surface area, 1.95 ha catchment surface area, 1.5 m maximum depth, id: SDH) and Lauzanier (44°22′43.9" N 6°52′20.4" E, 2285 m asl, 3.85 ha lake surface area, 148 ha catchment surface area and 8 m maximum depth, id: LAZ) are small and shallow subalpine lakes located in the Southern French Alps (Parcs naturels des Ecrins and du Mercantour, respectively). Their catchments are made of sandstones; for Lauzanier, marls and calcareous rocks are also present. Sedimentological, geochemical and DNA analyses performed on Serre de l'Homme showed high development of aquatic plants in the last three hundred years, which led to higher organic matter accumulation in the sediments [39]. The ecological state of Lake Lauzanier changed substantially during the last 2000 years, with a marked increase in productivity, and thus in organic content, in the last 60 years [273]. In addition to the geology, size and depth, these high-altitude lakes differ from the peri-alpine lakes by the presence of ice cover for approximately 7 months each year.
Sediment sampling-Sediment cores from Lake Bourget and Lake Geneva were sampled during the ANR Iper Retro program. The detailed sampling and subsampling procedures were reported in previous articles where sedimentary DNA was analyzed (e.g., [40]). Sediment cores from lakes Serre de l'Homme (SDH 09 P1; N° IGSN: IEFRA00AW) and Lauzanier (LAZ 12 P1; N° IGSN: IEFRA0082) were taken in 2009 and 2012, respectively. The sampling procedure is described in previous articles (e.g., [39]) as well as in protocols.io (dx.doi.org/10.17504/protocols.io.bdwsi7ee). Ages and sedimentological features of sediment samples are presented in Table A4.
DNA extractions-For each lake, DNA was extracted from two sediment layers using the three following DNA extraction protocols: the NucleoSpin Soil kit (NS protocol) with 0.75 g of sediment (NS-0.75g), and the Taberlet2012 protocol (PB protocol) with 0.75 g (PB-0.75g) and 4 g of sediment (PB-4g). The three extraction protocols were performed in duplicate for each lake. All protocols (NS and PB) are based on the same approach, with five main steps: cell lysis (NS) or DNA desorption (Taberlet2012), filtration through a pre-column (to retain contaminants), precipitation and fixing of DNA on a membrane, washing of the membrane and elution of extracted DNA. For the NS protocol, DNA extraction was performed on approximately 0.75 g of wet sediment using the NucleoSpin® Soil kit according to the manufacturer's instructions (Macherey-Nagel, Düren, Germany). This kit includes a step for cell lysis to release intracellular DNA (both chemical treatment via the use of lysis buffer and physical treatment via the use of magnetic beads), but is limited in terms of the amount of sediment that can be treated (less than 1 g). For the Taberlet2012 protocol, DNA extraction was performed on approximately 0.75 and 4 g of wet sediment by using the procedure adapted from Taberlet et al. [191] and Giguet-Covex et al. [31]. The detailed protocols are accessible at https://dx.doi.org/10.17504/protocols.io.beenjbde (accessed on 1 February 2021) for extraction from 4 g of sediment and https://dx.doi.org/10.17504/protocols.io.betsjene (accessed on 1 February 2021) for extraction from 0.75 g of sediment. As a first step, phosphate buffer (0.12 M Na2HPO4; pH ≈ 8) was used to desorb DNA attached to particles. For the extraction from 0.75 g of sediment, we then used the NucleoSpin® Soil kit (Macherey-Nagel). For the extraction from 4 g of sediment, the NucleoSpin® Plant II Midi kit (Macherey-Nagel) was then used to fix the desorbed DNA; however, we also used the SB buffer (a solution from the NucleoSpin Soil kit) to precipitate the DNA. After elution, DNA was kept at −20 °C until PCR and downstream analysis. The bulk DNA concentration was estimated using a Nanodrop ND-1000 Spectrophotometer (Thermo Scientific, Wilmington, DE, USA).
PCR amplifications, high-throughput sequencing-A 260-bp long fragment of the V7 region of the 18S rRNA gene was PCR amplified from ca. 25 ng of environmental DNA extract for each sample, in duplicate, using the general eukaryotic primers 960f (5′-GGCTTAATTTGACTCAACRCG-3′) [274] and NSR1438 (5′-GGGCATCACAGACCTGTTAT-3′) [275]. Each PCR was performed in duplicate in a total volume of 25 µL containing 3 µL of 10x NH4 reaction buffer, 1.
Bioinformatics and data analysis-The paired-end reads were merged using UPARSE tools (option -fastq_mergepairs, with a minimum overlap of 150 and no mismatch allowed) [276], yielding 5,847,791 raw DNA sequences. These DNA sequences were then submitted to three cleaning procedures: (i) no undefined bases (Ns), (ii) a minimum sequence length of 200 bp, and (iii) no sequencing error in the forward and reverse primers. Putative chimeras were detected by UCHIME [277]. Combining these cleaning procedures with a read length (2 × 250 bp) that covers the entire amplicon (~220 bp without the primers) meant that each base was sequenced twice, which drastically reduced sequencing errors. After this cleaning step and the demultiplexing process, the remaining DNA sequences were clustered at a 95% similarity threshold with UPARSE 7.0 [276] to obtain the seed MOTUs (option -cluster_fast). The taxonomic affiliation was performed by BLAST against the SSURef SILVA database [278] after application of the following selection criteria: length >1200 bp, quality score >75% and a pintail value >50; the SILVA database was enriched with lacustrine DNA sequences originating from various studies on lacustrine systems [279][280][281][282]. The taxonomy of a MOTU corresponded to the best hit given by the similarity search. If a MOTU was associated with several best hits (hits with the same identity), then its taxonomy was the taxonomy common to these hits, i.e., the lowest common ancestor.
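The lowest-common-ancestor rule for tied best hits can be illustrated with a minimal Python sketch. The function names, the (identity, lineage) hit format and the example lineages are assumptions made for illustration only; they are not taken from the pipeline used in this case study.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[float, Sequence[str]]  # (percent identity, lineage from root to tip)


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest prefix of ranks shared by all lineages."""
    shared: List[str] = []
    if not lineages:
        return shared
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def assign_taxonomy(hits: Sequence[Hit]) -> List[str]:
    """Keep the hits tied for the best identity and return their common taxonomy."""
    if not hits:
        return []
    best = max(identity for identity, _ in hits)
    tied = [lineage for identity, lineage in hits if identity == best]
    return lowest_common_ancestor(tied)


# Hypothetical BLAST hits for one MOTU: two tied best hits and one weaker hit.
example_hits: List[Hit] = [
    (98.5, ["Eukaryota", "Alveolata", "Ciliophora", "Oligohymenophorea"]),
    (98.5, ["Eukaryota", "Alveolata", "Ciliophora", "Spirotrichea"]),
    (97.0, ["Eukaryota", "Fungi"]),
]
example_taxonomy = assign_taxonomy(example_hits)
```

With a single best hit, the function simply returns the full lineage of that hit; with ties, the assignment stops at the deepest rank on which the tied hits agree.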
MOTUs affiliated to embryophytes, cercozoa nuclear and unclassified Eukaryota were removed so that only unicellular eukaryotes, fungi and metazoa were considered in this analysis. The two molecular inventories obtained from one SDH DNA extract produced with the PB-0.75g protocol had low read abundances and were thus removed from further analysis of the dataset. For each sample, the molecular inventories were merged and MOTUs detected in only one of the two duplicates were discarded. In addition, all MOTUs with a total abundance lower than 3 DNA read sequences were removed. Finally, the resulting 23 molecular inventories were standardized to 26,584 reads, resulting in a final MOTU relative abundance table with a total of 4802 MOTUs. For each lake independently, a hierarchical clustering tree was built from the Bray-Curtis dissimilarity matrices calculated from the final MOTU relative abundance table, using the hclust function in R together with the package "vegan" [283].
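The filtering rules and the dissimilarity measure described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the R code used by the authors: the function names and toy read counts are hypothetical, duplicates are assumed to be merged by summing their reads, and the rarefaction to a common read depth is omitted.

```python
from typing import Dict

Inventory = Dict[str, int]  # MOTU identifier -> read count


def merge_duplicates(rep_a: Inventory, rep_b: Inventory) -> Inventory:
    """Sum two PCR duplicates, keeping only MOTUs detected in both of them."""
    merged: Inventory = {}
    for motu, reads_a in rep_a.items():
        reads_b = rep_b.get(motu, 0)
        if reads_a > 0 and reads_b > 0:
            merged[motu] = reads_a + reads_b
    return merged


def drop_rare_motus(samples: Dict[str, Inventory], min_total: int = 3) -> Dict[str, Inventory]:
    """Remove MOTUs whose total abundance over all samples is below min_total."""
    totals: Dict[str, int] = {}
    for inventory in samples.values():
        for motu, reads in inventory.items():
            totals[motu] = totals.get(motu, 0) + reads
    return {
        name: {m: n for m, n in inventory.items() if totals[m] >= min_total}
        for name, inventory in samples.items()
    }


def relative_abundance(inventory: Inventory) -> Dict[str, float]:
    """Convert read counts to proportions of the sample total."""
    total = sum(inventory.values())
    return {m: n / total for m, n in inventory.items()} if total else {}


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    shared = sum(min(a.get(m, 0.0), b.get(m, 0.0)) for m in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Toy data: m3 is absent from one duplicate, m2 has fewer than 3 reads overall.
sample_1 = merge_duplicates({"m1": 10, "m2": 1, "m3": 0}, {"m1": 5, "m2": 1, "m3": 4})
filtered = drop_rare_motus({"s1": sample_1, "s2": {"m1": 5, "m4": 6}})
dissimilarity = bray_curtis(
    relative_abundance(filtered["s1"]), relative_abundance(filtered["s2"])
)
```

The resulting pairwise dissimilarities would then be passed to a hierarchical clustering routine, one lake at a time.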
Appendix A.4. Case Study A4-Variability in Biological Groups across Different DNA Extraction Methods
By Kevin Nota 5 and Laura Parducci 5,31
We extracted extracellular DNA (exDNA), intracellular DNA (inDNA) and total DNA using four different extraction protocols and compared qPCR amplification results for six biological groups (bacteria, eukaryotes, diatoms, plants, vertebrates and arthropods). DNA was extracted from seven sediment samples (each ca 0.25 g) collected from European and Russian lakes and dated from ca 2000 to 42,000 years BP. Total DNA was extracted using three protocols: (i) the standard PowerSoil kit protocol (PS protocol), (ii) a modified PowerSoil kit protocol using Andersen's lysis buffer (aPS protocol), (iii) a PowerSoil protocol coupled with Phosphate Buffer to extract the extracellular (exPS protocol) and intracellular DNA (inPS protocol) fractions. All samples were extracted and qPCR amplified in duplicate. In all samples, including controls, we measured the quantification cycle (Cq) value, i.e., the number of PCR cycles necessary to detect a signal above fluorescence background, and the melting temperature (Tm) value, which gives information about the sequence composition of the amplicons.
The PowerSoil protocol extracted DNA amounts (557 ± 448 ng.g sed⁻¹) that were on average at least 10-fold higher than those of the other protocols (34 ± 29, 24 ± 25 and 24 ± 15 ng.g sed⁻¹ for the exPS, inPS and aPS protocols, respectively; see Table A5). Adding together the exDNA and inDNA fractions did not result in the amount obtained with the PowerSoil protocol. The qPCR results are shown in Figure A4. The majority of Cq values from samples were lower than those from extraction and PCR negatives, suggesting successful amplification of the targeted organisms from sedaDNA. The extraction negatives show on average lower Cq values than the PCR negatives in all groups of taxa, suggesting that contamination may have occurred during DNA extraction but not during the PCR step. Tm values tend to differ between the negatives (extraction and PCR) and the samples, indicating that the amplified products are different, with the exception of bacteria. The within-protocol variation in Tm is likely due to the different ages and locations of the samples. However, if all extraction protocols extracted the same diversity, the within-protocol and between-protocol variation should be similar, which is not the case for most of the extraction protocols. With the exception of bacteria, Tm values varied extensively between extraction protocols. The exPS and aPS protocols show melting temperatures dissimilar to those of the inDNA and PowerSoil protocols for arthropods and eukaryotes. For plants, Tm values were different for the four DNA extracts, indicating that the different protocols may have an impact on the amplified diversity, and that there may be a difference between the inDNA and exDNA fractions. The results show a similar trend in amplification between the different taxa and extraction protocols and, overall, the lowest Cq values were obtained when the total DNA was extracted (PS protocol). With the inPS and exPS protocols, Cq values did not vary between taxa, except for arthropods, where the inDNA fraction was better amplified than the exDNA fraction and shows Cq values close to those obtained using the PS protocol.
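To make the reading of Cq values concrete, the following Python sketch shows how a Cq gap translates into an approximate fold difference in starting template, and how a sample can be compared with its negative controls. It assumes a constant amplification efficiency (perfect doubling per cycle by default); the function names and the Cq values used in the example are hypothetical and are not data from this case study.

```python
from typing import Optional, Sequence


def fold_difference(cq_reference: float, cq_sample: float, efficiency: float = 1.0) -> float:
    """Approximate fold difference in starting template implied by a Cq gap.

    efficiency = 1.0 corresponds to a perfect doubling of product in every cycle.
    """
    return (1.0 + efficiency) ** (cq_reference - cq_sample)


def amplified_above_negatives(
    sample_cq: float,
    negative_cqs: Sequence[Optional[float]],
    min_gap: float = 0.0,
) -> bool:
    """True if the sample Cq is lower than every amplifying negative control.

    Negative controls that never crossed the threshold are passed as None
    and are ignored in the comparison.
    """
    observed = [cq for cq in negative_cqs if cq is not None]
    return all(sample_cq + min_gap < cq for cq in observed)


# Hypothetical values: a sample at Cq 30 versus an extraction negative at Cq 35.
example_fold = fold_difference(cq_reference=35.0, cq_sample=30.0)
example_call = amplified_above_negatives(28.0, [36.5, None])
```

A lower Cq therefore indicates more starting template, which is why samples with Cq values below those of the negatives are interpreted as successful amplifications.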
Overall, our results show that the PS protocol is the most efficient in extracting DNA from the sediments for all groups of taxa. The PowerSoil extraction with modified lysis buffer (aPS protocol) seems to reduce extraction efficiency for all taxa. The combined inDNA and exDNA fractions do not come close to the amount of DNA extracted with the PowerSoil kit, indicating that the inDNA and exDNA fractions together do not represent the whole DNA pool, which makes direct comparisons difficult. inDNA and exDNA show similar results for most taxa, except for insects.
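The yield comparison can be checked with simple arithmetic on the mean values quoted above from Table A5. The short Python sketch below uses only those four means (standard deviations are ignored); the variable names are illustrative.

```python
# Mean DNA yields in ng per g of sediment, as quoted in the text from Table A5.
mean_yield = {"PS": 557.0, "exPS": 34.0, "inPS": 24.0, "aPS": 24.0}

# Fold difference between the PowerSoil protocol and each other protocol.
fold_vs_ps = {
    protocol: mean_yield["PS"] / value
    for protocol, value in mean_yield.items()
    if protocol != "PS"
}

# Share of the PowerSoil yield recovered by summing the exDNA and inDNA fractions.
combined_fraction = (mean_yield["exPS"] + mean_yield["inPS"]) / mean_yield["PS"]
```

On these means, the PowerSoil yield is roughly 16-fold to 23-fold higher than the other protocols, and the summed exDNA and inDNA fractions amount to about one tenth of it.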

Material and Methods
Study sites-The study sites are listed in Table A6.
DNA extraction-Three extraction protocols were conducted on seven sediment samples in duplicate; see Figure A5 for the extraction schemes. All extractions were performed from ca 0.25 g of sediment. The Sodium Phosphate buffer approach according to Taberlet et al. [191], with the additional extraction of the cell pellet according to Alawi et al. [286], was used. A volume of 675 µL of 0.12 M NaP buffer (pH 8) was added to the sediment and incubated at room temperature for 15 min. Samples were vortexed and centrifuged for 10 min at 500× g at room temperature, and the supernatant was transferred to a clean 2-mL tube. This was repeated twice (three times in total). The pooled supernatant was centrifuged at 10,000× g for 30 min at room temperature. The supernatant was transferred to a 2-mL 30 kDa Centrifugal Filter Unit to concentrate DNA. Concentrated DNA was further purified with the PowerSoil kit after the lysis step. The pellet was resuspended in 150 µL of nuclease-free water before extraction with the PowerSoil kit. Total DNA was extracted using an unmodified PowerSoil kit protocol (PowerSoil protocol), except for a 10 min incubation at room temperature before eluting. Bead-beating was done on a "normal" vortexer at highest speed for 10 min by taping the tubes horizontally on the vortex. For the aPS protocol, 675 µL of Andersen's lysis buffer was added to the sediment [62] and incubated overnight at 37 °C in a rotating incubator rack. Samples were centrifuged for 1 min at 10,000× g and the supernatant was transferred to a clean 2-mL tube. The supernatant was extracted further with the PowerSoil kit after the kit lysis step. The lysis buffer contained the following: 8 mM N-lauroylsarcosine sodium salt, 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 20 mM EDTA (pH 8.0), 50 µL 2-mercaptoethanol (per 1 mL of lysis buffer), 33 µL DTT (per 1 mL of lysis buffer), and Proteinase K [287].
DNA in all extracts was quantified with a Qubit® 2.0 Fluorometer and the Qubit™ 1X dsDNA µHS Assay Kit. The total DNA extraction has on average at least a 10-fold higher DNA concentration than the other extraction protocols. Counting the exDNA and inDNA together does not result in a similar amount of DNA.

Figure A5. Overview of the extractions. In the graph, a represents the intracellular and extracellular DNA extraction, and b the Andersen lysis buffer total DNA extraction.
qPCR amplification-All qPCR reactions were done on the same qPCR machine (CFX96, BioRad). qPCR amplification was done in 10-µL reaction volumes containing the following: 1X TATAA SYBR® GrandMaster Mix, 0.5 µM forward primer and 0.5 µM reverse primer (0.2 µM was used instead of 0.5 µM for the diatom primers). The thermocycling program was the same for all amplifications except for the annealing temperatures (see Table A7). The thermocycling program was as follows: 30 s at 95 °C, followed by 60 cycles of 5 s at 95 °C, 30 s annealing (48-59 °C), and 10 s at 72 °C. After the qPCR cycles, a melting curve was obtained by increasing the temperature from 60 to 95 °C in 0.2 °C increments every 5 s.
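As a rough planning aid, the programmed hold time of this thermocycling protocol can be computed from the step durations given above. The Python sketch below is only an estimate under the assumption that temperature ramping between steps is ignored, so the real run time on the instrument will be longer; the function name and parameter names are illustrative.

```python
from typing import Tuple


def program_hold_time_s(
    initial_s: int = 30,                       # initial 30 s at 95 °C
    cycles: int = 60,                          # number of amplification cycles
    step_s: Tuple[int, int, int] = (5, 30, 10),  # denaturation, annealing, extension
    melt_start_c: float = 60.0,
    melt_end_c: float = 95.0,
    melt_step_c: float = 0.2,
    melt_hold_s: int = 5,
) -> int:
    """Sum of programmed hold times in seconds, ignoring ramp times."""
    cycling = cycles * sum(step_s)
    # Number of 0.2 °C increments needed to go from 60 to 95 °C.
    melt_steps = round((melt_end_c - melt_start_c) / melt_step_c)
    return initial_s + cycling + melt_steps * melt_hold_s


total_seconds = program_hold_time_s()
total_minutes = total_seconds / 60.0
```

With the values from the text, the amplification cycles account for 2700 s and the melting curve for 175 increments of 5 s each, giving about one hour of hold time in total.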

Summary
We tested a suite of different kit-based extraction protocols on six different lake surface sediment samples from two geographical regions to evaluate the effects of the extraction procedure on the recovered plant diversity. The two regions, the Southern Taymyr Peninsula and the high Pamir Mountains, differ strongly in terrestrial vegetation, and in the type of lakes targeted. For two extraction kits, we compared protocols for extracting total DNA (standard kit buffers including a physical lysis) with protocols for extracting extracellular DNA, the latter based on using a Sodium Phosphate buffer wash as initial step [191]. Extracted DNA was amplified using the universal plant metabarcoding primers g and h [201], and PCR products were Illumina sequenced. We found that for samples from the Taymyr Peninsula, which has a high diversity and biomass of terrestrial vegetation, and where we targeted three small, shallow lakes, results were highly dependent on the extraction protocol, with the number of identified MOTUs ranging from below 10 to over 75 for a single sample. For samples from the high Pamir Mountains, where terrestrial vegetation is very sparse and low in diversity, and which originate from a very large, alkaline lake, the number of MOTUs was low (mostly below 10) for all extraction protocols, and no particular difference between protocols could be observed. Overall, protocols including a physical lysis step yielded a higher diversity than protocols relying on extracellular DNA.

Experimental Procedures
Six lake surface sediment samples from two geographical areas were processed for this study: from three small lakes from the Southern Taymyr Peninsula, Siberia, and from three locations within Lake Karakul, Pamir Mountains, Tajikistan (Table A8). The lakes differ in their properties, as does the surrounding vegetation. The high latitude Southern Taymyr Peninsula is characterized by the transition from boreal larch forest to forest tundra and tundra, and altogether the area has a high vegetation cover [30]. The catchment of Lake Karakul is very arid and only sparsely vegetated [149,291]. Furthermore, the water of Lake Karakul is alkaline (pH ~9), which negatively affects DNA preservation, while the Siberian lakes have near neutral pH values. DNA from the six surface sediment samples was extracted using four different DNA extraction kits (Table A9), some with modified lysis buffers. In particular, both for the NucleoSpin Soil Kit and the PowerMax Soil Kit, we conducted DNA extractions targeting extracellular DNA by replacing the kit lysis buffer by a wash with a Sodium Phosphate buffer [191]. For the PowerMax Soil Kit, we additionally used a third lysis buffer [292] that was used in a number of earlier sedaDNA studies [45,50,181,293]. Except for the NucleoSpin Soil Mini Kit, for which we used about 0.25 g, we used approximately 5 g of sediment for each extraction. Protocols using the kit lysis buffers were carried out according to the manufacturer's instructions. The wash with the Sodium Phosphate buffer was carried out as in Gebremedhin et al. [294]. For lysis with the "Bulat" buffer and Proteinase K, the sediments were incubated with the buffer overnight at 56 °C in a shaking incubator. Extracted DNA was amplified with the plant specific metabarcoding primers g and h [201] modified with an 8-bp identifier tag preceded by NNN at the 5′ end and prepared for Illumina sequencing as in Heinecke et al. [149]. 
Bioinformatic processing was performed with the OBItools as in Epp et al. [46], with the modification that after filtering no minimum number of reads was required for a sequence type to be further considered.

Results
Samples from the lakes in the Southern Taymyr Peninsula yielded MOTU numbers between fewer than 10 and more than 75, with a strong difference between different extraction protocols (Figure A6). PCR products with a high number of MOTUs contained a high proportion of terrestrial vascular plants. Extractions of the surface sediment samples from Lake Karakul showed a much lower diversity. With a single exception, the PCR products yielded fewer than 10 MOTUs each, and the proportion of terrestrial vascular plants compared to aquatic plants was lower. The diversity retrieved in the samples from Lake Karakul was quite uniform among the samples.
In the samples from the Southern Taymyr Peninsula, highest MOTU numbers were retrieved with either the PowerMax Soil Kit with standard kit buffers, or with the NucleoSpin Mini Kit with standard kit buffers. In all three analyzed lakes, these two extraction protocols were first and second in terms of MOTU numbers. The highest numbers overall, and the highest degree of reproducibility between PCRs, was achieved with the PowerMax Soil Kit for the sample from Lake CH12. Apart from this sample, the degree of reproducibility was unfortunately low between PCR replicates in this experiment.
For the two protocols, which were carried out with and without a physical lysis step, the physical lysis resulted in higher diversity than a wash with a sodium phosphate buffer without lysis. This was true even when a much lower amount of sediment was inserted into the lysis and extraction, as was done for the NucleoSpin Soil Mini Kit.

Conclusions
In our tests, DNA extraction protocols made a large difference in sediments from the Southern Taymyr Peninsula, an area that harbors a high diversity of terrestrial vascular plants. Surface sediment samples from Lake Karakul, which is surrounded by very sparse vegetation, yielded a much lower diversity, irrespective of the extraction method. For the Southern Taymyr Peninsula samples, extraction protocols employing a physical lysis step resulted in higher diversity of plant MOTUs and two of the tested protocols (PowerMax Soil Kit and NucleoSpin Soil Mini Kit) yielded a relatively high diversity across all three samples, but reproducibility of results was highly variable. Our results suggest that preliminary analyses of samples from new areas, prior to conducting large-scale experiments with many samples, can decisively influence the quality of the results.

Appendix A.6. Case Study A6-A Protocol for Ancient DNA Extraction from Calcite-Rich Minerogenic Lake Sediments
Peter D. Heintzman 6 , Dilli P. Rijal 6 , Antony G. Brown 6,11 and Inger G. Alsos 6
Postglacial lake sediment records are often characterized by minerogenic sediments at their base, which were deposited before the development of organic-rich soils in the catchment and high rates of within-lake organic production. These sediments may be calcite- or silicate-based, or a mixture, depending on the catchment bedrock geology. During analysis of a lake sediment core derived from a calcite-rich catchment in northern Norway (Figure A7A), we found that vascular plant DNA metabarcoding results became far poorer (reduced taxonomic richness and PCR replicability) in these basal minerogenic layers. As calcitic minerals can tightly bind DNA [102,192], we hypothesized that our DNA extraction protocol was not effectively releasing DNA during the lysis step. We therefore developed a modification to digest calcite, using EDTA-based chelation, thereby releasing matrix-bound DNA before continuing with our existing protocol. We tested this new method on six sediment layers previously extracted using our original protocol.
We first confirmed that the basal minerogenic sediments of Lake Gauptjern are calcite-rich by visual examination (Figure A7B), mass loss-on-ignition (LOI) analysis at 950 °C, and calcium/titanium (Ca/Ti) values derived from X-ray fluorescence (Table A10). Greatly elevated LOI950 and Ca/Ti values in the basal minerogenic sediments suggest a significant contribution from calcium carbonate and confirm these sediments as likely representing a biogenic calcitic marl [295,296].
Across all samples and DNA extraction protocols, we detected 60 plant taxa. Our original extraction protocol worked well for the two organic-rich (O) sediment layers, which had LOI550 values of 51-79%, resulting in a richness of 32-44 taxa (Table A10). However, the intermediate minerogenic-organic (M-O) through to minerogenic-rich (M) sediments display a declining taxonomic richness, with only 9-10 taxa identified in the basal calcitic sediment. The optimized protocol also exhibits a decline in richness, although this is far less severe, with 35-38 taxa in the organic-rich layers through to 24-27 in the basal calcitic sediment. The two protocols gave comparable richness results for the organic-rich sediment samples, although we note that PCR replicability was reduced with the optimized protocol. All minerogenic sediment samples (M, M-O), with an elevated LOI950 (>20%) and Ca/Ti (>100), exhibited a 1.5-3.0-fold increase in richness, with PCR replicability increasing by an average of 64%, when the optimized protocol was applied (Table A10, Figure 5D in the main text). The majority of taxa detected in the minerogenic sediment samples using the original protocol are also recovered with the optimized protocol (mean of 85%, Table A10). Altogether, these data demonstrate that our optimized DNA extraction protocol yields superior results for this particular type of minerogenic-rich sediment.
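The two comparison statistics used above, the fold change in richness and the proportion of taxa from the original protocol that are also recovered with the optimized protocol, can be sketched in Python as follows. The function names and the taxon sets are hypothetical and serve only to show the calculation; they are not the data of Table A10.

```python
from typing import Set


def richness_fold_change(original: Set[str], optimized: Set[str]) -> float:
    """Ratio of taxonomic richness: optimized protocol over original protocol."""
    if not original:
        raise ValueError("original protocol detected no taxa")
    return len(optimized) / len(original)


def proportion_recovered(original: Set[str], optimized: Set[str]) -> float:
    """Share of taxa found with the original protocol that are also found with the optimized one."""
    if not original:
        raise ValueError("original protocol detected no taxa")
    return len(original & optimized) / len(original)


# Hypothetical taxon lists for one minerogenic sample.
original_taxa = {"Salix", "Betula", "Dryas", "Saxifraga"}
optimized_taxa = {"Salix", "Betula", "Dryas", "Empetrum", "Vaccinium", "Poaceae"}

example_fold = richness_fold_change(original_taxa, optimized_taxa)
example_prop = proportion_recovered(original_taxa, optimized_taxa)
```

In this toy case the optimized protocol gives a 1.5-fold increase in richness while recovering three of the four taxa (75%) detected with the original protocol.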
As well as impacting taxonomic richness estimates, the results from the two DNA extraction protocols would lead to differing ecological interpretations of the early postglacial vegetation history of the catchment surrounding Lake Gauptjern. Based on the results from the original extraction protocol, we would interpret a species-poor vascular plant community with a higher-than-expected fern component (e.g., Gymnocarpium, Dryopteris) in the early postglacial (162.5 cm core depth), followed by a tripling of plant community richness and corresponding increase in community complexity between 158.5 and 142.5 cm core depth. On the other hand, the results from the optimized extraction protocol give a more reasonable interpretation, with a diverse plant community in the early postglacial and richness less than doubling over the same interval.
In addition to these general patterns, the two protocols give contrasting results for the inferred first occurrence of key woody taxa, with crowberry (Empetrum nigrum), blueberry (Vaccinium myrtillus), lingonberry (V. vitis-idaea), pine (Pinus sylvestris), and alder (Alnus) first appearing within an 8-cm interval between 154.5 and 142.5 cm, based on data from the original protocol (Figure 5D in main text). However, the optimized protocol indicates that crowberry was already present in the catchment during the early postglacial (162.5 cm), with blueberry and lingonberry appearing from at least 158.5 cm. The first appearances of pine and alder, on the other hand, are unaffected by the protocol used, which gives us high confidence that these taxa first appear in the record between 152.5 and 142.5 cm.
Although inhibition of DNA extracts, especially from organic-rich sediments, is a major problem and the focus of methodological optimizations [189], we show here that minerogenic-rich substrates can also impact DNA recovery. In this study, we used a modified digestion strategy to improve the release of mineral-bound DNA from calcite-rich sediments. Although our study concerned biogenic marl, we believe that the results would also hold for calcites of detrital or pedogenic origin. Our findings highlight that a one-method-fits-all approach to lake sediment DNA extraction would have been inappropriate and would have led to erroneous interpretations of the early postglacial succession of vascular plants around this lake.
Footnotes: LOI, loss-on-ignition (at 550 and 550-950 °C); XRF, X-ray fluorescence; M, minerogenic-rich; O, organic-rich; Replic., replicability; Prop., proportion of taxa detected by the original protocol that were also detected by the optimized protocol; NEC, negative extraction control; NPC, negative PCR control.

Material and Methods
Study site, sediment sampling and geochemistry-Lake Gauptjern is located on a calcite-marble bedrock in northern Norway (400 m a.s.l.; 68.85647° N, 19.61843° E; Figure A7A) and has a lake area of 0.78 ha and a catchment area of 0.13 km². The Gauptjern sediment record has previously been described and analyzed for palaeoecological proxies, including pollen and macrofossils [297]. In March 2017, we collected a 163-cm long sediment core (EG13) from Lake Gauptjern using a 10-cm diameter Nesje corer [298], which included the basal minerogenic layers identified by Jensen and Vorren [297]. We compared DNA extraction methods for six samples, which were taken from the minerogenic (M), intermediate (M-O), and organic-rich (O) layers (Table A10, Figure A7B). To confirm that the minerogenic sediments were a calcitic marl, we calculated mass loss-on-ignition (LOI) of dried sediment and performed X-ray fluorescence (XRF) scanning. For LOI and XRF, the methods followed Clarke et al. (2019), except that XRF was conducted at 5 mm resolution and LOI was calculated at both 550 °C (LOI550; for organic carbon content) and 550-950 °C (LOI950; for carbonate/inorganic carbon content) [295,296]. We took DNA and LOI samples concurrently in the specialized clean-room ancient DNA facilities at The Arctic University Museum of Norway in Tromsø.
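The two loss-on-ignition values can be computed from three weighings of the same sample. The Python sketch below uses the common sequential formulation, in which both losses are expressed as a percentage of the initial dry mass; this formulation, the function name and the example masses are assumptions for illustration and may differ in detail from the procedure of the cited methods.

```python
from typing import Tuple


def loss_on_ignition(
    dry_mass_g: float, mass_after_550_g: float, mass_after_950_g: float
) -> Tuple[float, float]:
    """Return (LOI550, LOI950) as percentages of the initial dry mass.

    LOI550 reflects mass lost up to 550 °C (organic matter); LOI950 reflects
    the additional mass lost between 550 and 950 °C (carbonate-derived CO2).
    """
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    loi_550 = 100.0 * (dry_mass_g - mass_after_550_g) / dry_mass_g
    loi_950 = 100.0 * (mass_after_550_g - mass_after_950_g) / dry_mass_g
    return loi_550, loi_950


# Hypothetical weighings (in grams) for a carbonate-rich, organic-poor sample.
example_loi_550, example_loi_950 = loss_on_ignition(1.00, 0.90, 0.60)
```

In this hypothetical example the sample loses 10% of its dry mass by 550 °C and a further 30% between 550 and 950 °C, a pattern of low LOI550 and elevated LOI950 that would point towards a carbonate-rich sediment.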
DNA extraction-We initially extracted DNA following Rijal et al. [16], whose method uses 0.25-0.35 g of input and is modified from the Qiagen DNeasy PowerSoil kit (Qiagen Norge, Oslo, Norway). Due to poorer results from the minerogenic layers (Table A10), we re-extracted DNA using a modified protocol to improve the release of DNA from calcite-rich sediments. For this optimized protocol, we used a lysis buffer, consisting of 1 mL of 0.5 M EDTA (pH 8.0), 2.5 µL of 20 mg/mL proteinase K, and 32 µL of 1 M Dithiothreitol (DTT), in which the sample was incubated overnight at 56 °C. We then centrifuged the digested mixture to separate the supernatant from the pellet. We removed the supernatant, which we concentrated to 100 µL in a 30 kDa Vivaspin-500 column (GE Healthcare, Oslo, Norway). The concentrated supernatant was then used as input to the standard method, but with the bead-beating and overnight lysis steps omitted.
PCR amplifications, high-throughput sequencing-We amplified each DNA extract using unique dual-tagged 'gh' primers [201] to target the vascular plant trnL p6-loop locus (following [49]), for eight PCR replicates per extract (see Table in figshare folder https://doi.org/10.6084/m9.figshare.13007279.v1 (accessed on 1 February 2021)). We applied negative controls during both DNA extraction and PCR setup. After PCR, we pooled and purified PCR products into two pools following Clarke et al. [26]. One pool (JIE4) was shipped to FASTERIS, SA (Switzerland) and converted into a single-indexed PCR-free MetaFast library (following [26]). The second pool (AOHL-3-8) was converted into a unique dual-indexed library in Tromsø following the Illumina TruSeq DNA PCR-free protocol (Illumina, Inc., CA, USA), with the bead-cleanup steps modified to retain our short amplicon inserts. Both libraries were sequenced on ~10% of separate 2 × 150 cycle mid-output flow cells on the Illumina NextSeq platform at FASTERIS.
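For readers who wish to scale the optimized lysis buffer described in the DNA extraction paragraph above, the final concentration of each component follows from a simple dilution calculation. The Python sketch below assumes that the three volumes are additive and ignores any liquid contributed by the sediment itself; the function and variable names are illustrative.

```python
from typing import Dict, Tuple


def final_concentration(stock_conc: float, stock_volume_ul: float, total_volume_ul: float) -> float:
    """Concentration after dilution: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Component -> (stock concentration, volume added in µL), as given in the text.
components: Dict[str, Tuple[float, float]] = {
    "EDTA (M)": (0.5, 1000.0),
    "proteinase K (mg/mL)": (20.0, 2.5),
    "DTT (M)": (1.0, 32.0),
}

total_volume_ul = sum(volume for _, volume in components.values())
final_concentrations = {
    name: final_concentration(conc, volume, total_volume_ul)
    for name, (conc, volume) in components.items()
}
```

Under these assumptions the buffer has a total volume of 1034.5 µL, with approximately 0.48 M EDTA, 0.048 mg/mL proteinase K and 31 mM DTT.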
Bioinformatics and data analysis-We followed the bioinformatics pipeline presented by Rijal et al. (2020), which uses the ObiTools software package [261] and custom R scripts (available at https://github.com/Y-Lammers/MergeAndFilter (accessed on 1 February 2021)). Briefly, we merged overlapping paired-end reads and retained only those that merged. We then demultiplexed our tagged amplicons using the tag-PCR replicate lookup presented in. We then collapsed identical sequences, removed putative artifactual sequences that may have derived from Illumina library index-swaps or PCR/sequencing errors, and removed sequences that had fewer than 3 reads in all PCR replicates. We next identified amplicon sequence variants (ASVs) that had 100% identity with sequences in the ArctBorBryo and/or EMBL nucleotide (rl133 release) databases, following Volstad et al. [49]. We further removed identified ASVs that matched, at 100% identity, either of two "blacklists" consisting of known contaminants/exotics or synthetic sequences [16]. The final taxonomic assignment of the 71 retained ASVs was determined by one of us (Alsos) using regional botanical taxonomic expertise. We calculated the proportion of weighted replicates (wtRep) for each retained ASV in each sample using the method presented by Rijal et al. [16], which weights PCR replicate detections based on relative sequencing depth. For each of the six taxa that were represented by more than one ASV, we selected the ASV that had the greatest dataset-combined wtRep. This resulted in a final data set of 60 taxa that was used for all taxonomic richness and PCR replicability statistics. For visualization, we selected the 40 taxa that had the greatest dataset-combined wtRep. We visualized the results in R v3.4.4 (R Core Team 2018) using the rioja v0.9-21 package [299]. PCR replicability was calculated as the mean proportion of detections in PCR replicates across all taxa within a sample.
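The PCR replicability statistic defined at the end of the paragraph above can be sketched in a few lines of Python. This is only an illustration: the function name, the data layout and the example detections are hypothetical, and the choice to average only over taxa detected at least once in the sample is an assumption, since the text does not state how undetected taxa are handled.

```python
def pcr_replicability(detections):
    """Mean proportion of PCR replicates with a detection, across taxa.

    detections: dict mapping taxon name -> list of booleans, one entry per
    PCR replicate of a single sample (True = taxon detected in that replicate).
    Assumption: taxa with no detection in the sample are left out of the mean.
    Returns None when no taxon was detected at all.
    """
    proportions = []
    for replicate_hits in detections.values():
        if not any(replicate_hits):
            continue  # taxon absent from this sample
        proportions.append(sum(replicate_hits) / len(replicate_hits))
    if not proportions:
        return None
    return sum(proportions) / len(proportions)


# Hypothetical sample with eight PCR replicates per taxon
sample = {
    "Betula": [True] * 8,                # detected in 8/8 replicates
    "Salix": [True] * 4 + [False] * 4,   # detected in 4/8 replicates
    "Pinus": [False] * 8,                # not detected, skipped
}
print(pcr_replicability(sample))  # mean of 1.0 and 0.5
```

A value close to 1 means that most taxa found in a sample were found in most of its replicates, which is how the statistic can be used to compare extraction methods.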
Appendix A.7. Case Study A7-Improvement of DNA Extraction Methods for the Detection of Catchment Mammal DNA Signal
By Charline Giguet-Covex 2, Francesco Gentile Ficetola 14,15 and Pierre Taberlet 6,15
We tested whether adding a concentration step to a DNA extraction protocol can improve the performance of sedaDNA extraction. Starting from the established DNA extraction protocol that combines a phosphate buffer step with the NucleoSpin Soil kit protocol (Taberlet2012 protocol), successfully used in several studies analyzing lake sedaDNA (e.g., [31,32,51]), we added a step with Amicon® Ultra-15 10k centrifugal filters (Millipore) to concentrate the DNA extract (AmTaberlet2012 protocol). eDNA was extracted from four sediment samples from Lake Anterne (French Alps), already analyzed by Giguet-Covex et al. [31]. We used one sample dating back to the Neolithic (roughly 4800 cal. years BP), one dating back to the Bronze Age (3160 cal. years BP), one from the Roman period (2010 cal. years BP) and one that was very recent (-56 years BP). In the AmTaberlet2012 protocol, 15 g of sediment was mixed with 15 mL of phosphate buffer and centrifuged for 10 min at 10,000× g. Then, 12 mL of the resulting supernatant was transferred to Amicon® Ultra-15 10k centrifugal filters (Millipore) and centrifuged at 4000× g for ultra-filtration and concentration of the buffer with DNA. The centrifugation time varied among sediments; the aim of this step was to reduce the volume as much as possible, from 12 mL to 500-700 µL of supernatant containing a high concentration of DNA. For most samples, approx. 20 min of centrifugation reduced the volume to the desired level. However, a few samples required a longer centrifugation time (up to 30 min).
We then kept 400 µL of the resulting concentrate as starting material for the following extraction steps, using the standard protocols with the NucleoSpin® Soil kit (Macherey-Nagel, Düren, Germany) [167,300]. The extracted DNA was then amplified using the primers MamP007, following Giguet-Covex et al. [31]. These primers were developed to target mammal eDNA; they are able to identify >96% of mammals at the genus level or better [167]. We also amplified two DNA extraction controls plus three PCR controls to check for contamination; each sample was amplified in eight replicate PCRs. The amplified DNA was sequenced on an Illumina HiSeq 2000 platform; the retrieved sequences were analyzed using the ObiTools software [261] following the pipeline described in Pansu et al. [32]. To compare the performance of the AmTaberlet2012 method with that of the standard Taberlet2012 method, we used occupancy modelling to calculate the detection probability of species per PCR replicate, which measures how well each approach detects these species [183]. This approach was applied to the two most frequent mammal taxa (Bos and Ovis, i.e., cattle and sheep). Given the low amplification rate with the standard approach, the results obtained in this analysis were merged with those produced by Giguet-Covex et al. [31], which used exactly the same protocol on samples of the same age collected during the same coring activities. Therefore, with the standard approach we have 16 PCR replicates per sample, which greatly increases the power of statistical analyses [183].
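To make the idea of a per-replicate detection probability concrete, the following Python sketch fits the simplest possible occupancy model by maximum likelihood. It is a simplified illustration and not the procedure of [183]: it assumes a single occupancy probability psi shared by all samples and a single detection probability p shared by all PCR replicates, it uses a coarse grid search instead of a dedicated occupancy package, and the detection counts are hypothetical.

```python
import math


def log_likelihood(psi, p, histories):
    """Log-likelihood of a constant-psi, constant-p occupancy model.

    histories: list of (detections, replicates) pairs, one per sample.
    A sample with zero detections is either unoccupied (probability 1 - psi)
    or occupied but missed in every replicate.
    """
    total = 0.0
    for detections, replicates in histories:
        binom = (math.comb(replicates, detections)
                 * p ** detections
                 * (1.0 - p) ** (replicates - detections))
        likelihood = psi * binom
        if detections == 0:
            likelihood += 1.0 - psi
        total += math.log(likelihood)
    return total


def fit_occupancy(histories, steps=200):
    """Grid-search maximum-likelihood estimates of (psi, p)."""
    # Midpoints of the grid cells keep psi and p strictly inside (0, 1)
    grid = [(i + 0.5) / steps for i in range(steps)]
    best = None
    for psi in grid:
        for p in grid:
            ll = log_likelihood(psi, p, histories)
            if best is None or ll > best[0]:
                best = (ll, psi, p)
    return best[1], best[2]


# Hypothetical data: four samples, eight PCR replicates each
histories = [(0, 8), (0, 8), (2, 8), (4, 8)]
psi_hat, p_hat = fit_occupancy(histories)
print(f"psi = {psi_hat:.3f}, p = {p_hat:.3f}")
```

Comparing the estimate of p obtained from the replicates of one extraction method with that obtained from the other is the kind of comparison the occupancy approach enables; a full analysis would also report uncertainty, such as the confidence intervals given in the results.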
Using both approaches, we obtained Bos and Ovis sequences only from the Roman and the recent sediment samples. Overall, for both species the amplification rate was higher using the AmTaberlet2012 method than the standard Taberlet2012 method (Figure 5E in the main text). With the AmTaberlet2012 method, the detection probability of both cattle and sheep was 0.562 (95% confidence intervals: 0.32-0.78). The estimated detection probability was significantly lower when using the Taberlet2012 method (sheep: 0.05; 95% confidence intervals: 0.008-0.28; cattle: 0.10; 95% confidence intervals: 0.03-0.29). Today, sheep are present in the lake catchment whereas cattle herds are absent. However, cattle were still there a hundred years ago. On the one hand, the better detection of sheep (2/8 vs. 0/16) in the very recent sample confirms the better performance of the AmTaberlet2012 method. On the other hand, the unexpectedly higher detection of cattle (3/8 vs. 2/16) may reflect a soil DNA memory effect. In alpine soils, potato DNA sequences can persist for at least 50 years [301]. In cultivated lands from Northern France, DNA sequences from grapevine are still detected in high quantity even though the crop had already strongly declined 60 years ago [302]. The frequency of positive PCRs and the detection probability of domestic mammals were significantly higher with the Amicons (AmTaberlet2012 method), confirming their usefulness. Several recent studies used this approach on lake sediments and were able to successfully track (1) the presence of species, otherwise supported by other proxies (e.g., spores of coprophilous fungi; [35]) and (2) the relationships between species distribution and environmental changes across time (e.g., [35,51,58,59]). A higher detection probability means that we can be more confident in inferring the presence or absence of species.
By using the Amicons to concentrate DNA, we can thus better estimate the variation of species occupancy through time while using fewer technical replicates, which reduces the time and cost of analyses [183,188].
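The link between detection probability and the number of technical replicates can be illustrated with a short calculation. If a species' DNA is present in an extract and each PCR replicate detects it independently with probability p, the probability of at least one detection in n replicates is 1 - (1 - p)^n. The Python sketch below applies this to the detection probabilities reported above; the independence of replicates and the 95% target are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def prob_at_least_one(p, n):
    """Probability of at least one detection in n independent replicates."""
    return 1.0 - (1.0 - p) ** n


def replicates_needed(p, target=0.95):
    """Smallest number of replicates giving at least `target` probability
    of one or more detections, assuming independent replicates."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Per-replicate detection probabilities reported for the two methods
for label, p in [("AmTaberlet2012, cattle and sheep", 0.562),
                 ("Taberlet2012, cattle", 0.10),
                 ("Taberlet2012, sheep", 0.05)]:
    n = replicates_needed(p)
    print(f"{label}: p = {p}, replicates for 95% = {n}")
```

Under these assumptions, 4 replicates suffice with the AmTaberlet2012 estimate, against 29 (cattle) and 59 (sheep) with the Taberlet2012 estimates, which illustrates why a higher detection probability reduces the replication effort.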