The History of Tree and Shrub Taxa on Bol’shoy Lyakhovsky Island (New Siberian Archipelago) since the Last Interglacial Uncovered by Sedimentary Ancient DNA and Pollen Data

Ecosystem boundaries, such as the Arctic-Boreal treeline, are strongly coupled with climate and were spatially highly dynamic during past glacial-interglacial cycles. Only a few studies cover vegetation changes since the last interglacial, as most of the former landscapes are inundated and difficult to access. Using pollen analysis and sedimentary ancient DNA (sedaDNA) metabarcoding, we reveal vegetation changes on Bol’shoy Lyakhovsky Island since the last interglacial from permafrost sediments. Last interglacial samples depict high levels of floral diversity with the presence of trees (Larix, Picea, Populus) and shrubs (Alnus, Betula, Ribes, Cornus, Saliceae) on the currently treeless island. After the Last Glacial Maximum, Larix re-colonised the island but disappeared along with most shrub taxa. This was probably caused by Holocene sea-level rise, which led to increased oceanic conditions on the island. Additionally, we applied two newly developed larch-specific chloroplast markers to evaluate their potential for tracking past population dynamics from environmental samples. The novel markers were successfully re-sequenced and exhibited two variants of each marker in last interglacial samples. SedaDNA can track vegetation changes as well as genetic changes across geographic space through time and can improve our understanding of past processes that shape modern patterns.


Introduction
The Pleistocene epoch is characterised by glacial-interglacial cycles leading to pronounced changes of temperature and sea level [1][2][3], which in turn affect the prevailing ecosystems [4][5][6] and periglacial landscapes [7,8]. During glacial periods sea level was much lower than today, exposing the shallow arctic shelf, which extended the Eurasian continent northwards and connected it with the North American continent. Rapid climate warming at the onset of interglacial periods was accompanied by  [48]). Permafrost coring took place at the southern coast of Bol'shoy Lyakhovsky, west of Zimov'e River in April 2014 [48]. Permafrost records often do not reflect the full sequence of Quaternary periods due to complex periglacial processes [51]. Hence, sediments of different ages and accumulation types are located at different positions and elevations [51] and are exposed on steep bluffs of thermokarst depressions, thermo-erosional valleys and Yedoma hills [7]. Based on stratigraphic knowledge from previous studies [4,7,8,25,52], coring was undertaken at four localities to obtain a record spanning from the last interglacial to the Holocene (MIS 5-MIS 1). The cores were transported and stored continuously frozen. The placement of the cores within the local stratigraphy is shown in Figure 2. The cores are described from the oldest to the youngest deposits.  [48] (modified after [4,8,25]). The lower panel shows the schematic core recoveries with photographic examples [48].

Core Material
2.2.1. Core L14-04 and Hand-Pieces L14-04B and L14-04C  [48] (modified after [4,8,25]). The lower panel shows the schematic core recoveries with photographic examples [48]. The core L14-04 has a length of 8.1 m and was drilled in a thermo terrace about 2.5 km west of the Zimov'e River mouth [48]. The core can be subdivided into two units ( Figure 2): first (8.1-6.43 m depth), grey to brown ice-poor silt, faintly laminated with distinct black spots of reduced organic material and a micro lens-like cryostructure, and second (6.43-0 m), grey to brown ice-rich silt with rare spots of peaty inclusions, mostly with lens-like to blocky cryostructure [48]. A core barrel loss in the borehole prevented further drilling and four samples (hand-pieces) L14-04B were collected from the coastal bluff from layers resembling the upper part of core L14-04 while five samples L14-04C originated from ice-wedge casts resembling the lower part of the core.

Core L14-03
The core L14-03 has a length of 15.49 m and was drilled in a thermo-terrace about 1.0 km west of the Zimov'e River mouth and extends the record of the next core (L14-02) from a lower topographic position [48]. The core can be subdivided into six units from bottom to the top ( Figure 2): first (15.49-13.7 m), sand and gravel layers, which could not be sub-sampled (these stony deposits stopped further progress in the borehole), second (13.7-11.95 m), composite ice-sand wedge, ice-supported pebbles and partly clear ice, third (11.95-9.59 m), rich in vertical ice bands, fourth (9.59-8.62 m), ice-rich sand and pebble layers, fifth (8.62-6.02 m), composite ice-sand wedge, rich in vertical ice bands, and sixth (6.02-0 m), grey to brown silt with rare plant remains and lens-like cryostructure, partly with cm-thick ice bands.

Core L14-02
The core L14-02 has a length of 20.02 m. The drilling site was located on a Yedoma hill around 800 m west of the Zimov'e River mouth. The core can be subdivided into two main units ( Figure 2). The first (10.92-20.02 m) is composed of ice-wedge ice, which was not sampled for DNA analyses. The second unit (0-10.92 m) comprises Yedoma deposits composed of grey to brown and olive ice-rich silts with scattered plant remains and peaty inclusions and a mainly coarse lense-like cryotexture, partly with cm-thick ice bands. The core segment consists of a succession of palaeosol horizons.

Core L14-05
The core L14-05 has a length of 7.89 m and was drilled about 4 km west of the Zimov'e River mouth in a thermokarst depression (alas). The core is described as one unit of grey to brown silt with scattered plant remains and thin layers of peaty material, mostly with lattice to lens-like cryostructure ( Figure 2). In general, the ice-content increased towards the surface [48].

Radiocarbon Dating
Radiocarbon dating was performed using accelerator mass spectrometry (AMS) (6 MV Tandetron (High Voltage Engineering Europa B.V. (HVE), Amersfoort, Netherlands)) on plant macrofossils from 27 samples at the CologneAMS laboratory (University of Cologne, Germany) [53] following the same procedure as described in detail in [54]. Samples were corrected for carbon contributions from exogenous sources using size-matched, radiocarbon-free coal, which was processed similar to the samples. Samples close to the radiocarbon method limit were blank corrected only (a) if the 14 C concentration (fraction modern-Fm) of the sample was larger than those of the corresponding process blank; and (b) if twice the error of the sample's Fm value was smaller than the sample Fm [55]. The limiting age is calculated as −8033 × ln (2σ) because the 6 MV AMS at Cologne produces low blank values, due to its higher energy. The conventional radiocarbon ages are given in years before present (yr BP) and were calibrated (cal yr BP) in CALIB 7.0.2 using the IntCal13 calibration curve [56,57] ( Table 1).

Core Sub-Sampling
The sub-sampling of the permafrost sediment cores took place in a windowless climate chamber without any contact to laboratories in which molecular genetics work is performed. The temperature was set to −10 • C to prevent thawing of the core material. The sub-sampling procedure was carried out following the protocol described in [30] using the same equipment, treatments and facilities. In total, 72 subsamples (~2 samples per m) were collected for both sedaDNA and palynological analyses and stored at −20 • C, and +4 • C, respectively.

Sedimentary Ancient DNA Metabarcoding Approach
Sedimentary ancient DNA extractions and polymerase chain reactions (PCRs) followed the same procedures as described in [30]. In total, 72 samples and 8 extraction negative controls were processed, with extraction negative controls processed in the same way as the samples.
Generally, two positive PCR products were pooled for sequencing under the condition that the extraction and PCR negative controls remained negative. There were two exceptions. First, for one sample (core L14-03, 5.13 m depth) only one positive amplification could be retrieved. Second, for one PCR batch, which contained the deepest sample of core L14-04 and all hand-pieces, 2 out of 3 PCRs showed a PCR-product after gel-electrophoresis in the extraction negative control. Therefore, only the single PCR-products with the corresponding clean extraction negative controls were sequenced in the same run as all other samples. To identify the source taxa responsible for the PCR product of this extraction negative control, we pooled the two corresponding PCR products of the whole PCR batch (samples, extraction negative controls and PCR negative controls) and sequenced them in another sequencing run. This batch showed only exotic taxa in the extraction negative control (Musaceae, PACMAD clade, Persea and Ruta graveolens). The PCR negative controls of this batch showed no bands after the gel-electophoresis, yet 4 taxa with at least 10 sequence counts were obtained (Saliceae 724 counts, Potentilla 42 counts, Anthemidae 35 counts and Puccinellia 10 counts), which is why we decided to use the clean batch for our analyses.
We purified all PCR products using the MinElute PCR Purification Kit (Qiagen, Hilden, Germany), following the manufacturer's recommendations. Elution was carried out twice to a final volume of 40 µL. The DNA concentrations were estimated with the double stranded (dsDNA) BR Assay and the Qubit ® 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA) using 1 µL of the purified amplifications. Aiming for balanced DNA concentration among all samples, the purified PCR-products were pooled such that 60 ng DNA of each PCR-product was in the final multiplex. All extraction and PCR negative controls were included in the sequencing run. When their DNA concentration was below the Qubit detection limit, a standardised volume of 10 µL was added. Library preparation and paired-end sequencing were performed by Fasteris SA sequencing service (Geneva, Switzerland) using a Illumina HiSeq 2500 platform (2 × 125 bp) (Illumina Inc., San Diego, CA, USA) aiming for 10 Gb output. The library preparation using the HiSeq SBS V4 kit followed the MetaFast protocol, which has been developed by Fasteris especially for PCR amplicon metagenomic analyses, which is characterized by low diversity and low complexity of the amplicons.

Specific Amplification of Larix from sedaDNA
In 12 samples we detected reads assigned to Larix at the genus level or to Larix sibirica. To validate the presence of Larix-derived DNA in these samples (especially in those with less than 20 sequence counts) and to assess the potential of analysing single-nucleotide polymorphisms (SNPs) in sedimentary ancient DNA we developed new genetic markers. Based on a chloroplast genome-wide comparison between 12 individuals from the range of Larix gmelinii and 7 from the range of Larix cajanderi [58] we chose two SNPs to develop Larix-specific primer pairs for short amplicons using Primer3 [59] (Table 2). Primer specificity was assessed with in silico PCR on a database created from the EMBL Nucleotide Sequence Database release 129 using ecoPCR [60], which is also implemented in the OBITools [61]. EcoPCR generates a list of all sequences, including their associated taxonomic name, which can potentially be amplified with the given primers ( Table 2). We allowed for three mismatches in the primer sequence, but no mismatches in the last two bases at the 3' end. The PCRs were prepared in the exact same reaction composition and volume as described for trnL g and h primers in [30], with the exception of adding 4 µL DNA. The PCRs were carried out in the TProfessional Basic thermo cycler (Biometra GmbH, Göttingen, Germany): 94 • C for 5 min, followed by 55 cycles of 94 • C for 30 s, 58 • C for 30 s, 68 • C for 30 s and a final extension at 72 • C for 10 min. To monitor contamination the corresponding extraction negative controls were included as well as a PCR negative control. The negative controls were treated identically to the samples. The amplification success and product sizes were visually checked by gel-electrophoresis. The PCR products were purified using the MinElute Purification kit as described in Section 2.5.1. An A' overhang was produced by incubation with Sigma Taq DNA polymerase (Sigma-Aldrich Chemie GmbH, Taufkirchen, Germany) and 0. The primers of the retrieved sequences were trimmed without allowing for mismatches in the forward or reverse primer and aligned using the MAFFT multiple aligner v. 1.3.3 plug-in [62] in Geneious v. 7.1.4 [63] with default settings. To check target region specificity, the trimmed sequences were mapped to the chloroplast genome reference sequences [58] and subjected to a BLASTn search [64] in the National Center for Biotechnology Information (NCBI) sequence database.

Filtering of Illumina Sequencing Data and Taxonomic Assignments
The sequence quality was checked using FastQC [65] and all bioinformatic filtering steps were performed using OBITools [61] as described in [30]. The taxonomic assignments were performed using two reference databases as described in [66]: (1) the quality-checked and curated Arctic and Boreal vascular plant and bryophyte reference libraries [28,41,67]; and (2) the European Molecular Biology Laboratory (EMBL) Nucleotide Database standard sequence release 127 [68]. We use the term "sequence" to refer to the nucleotide sequence, which was generated during sequencing and to "sequence counts" to refer to the number of times this nucleotide sequence was generated. The sequences were assigned their taxonomic names due to sequence similarity to each of the reference databases using ecotag, following the NCBI taxonomy nomenclature [69]. If the query sequence is identical with exactly one database entry, the taxonomic name of the species will be assigned. However, if several database entries are identical, the query sequence will be assigned on a higher taxonomic level to the most recent common ancestor [61,70], e.g., on tribe (e.g., Saliceae) or family level. Hence, the taxonomic resolution of the sedaDNA dataset depends on the marker resolution for vascular plant taxa as well as the completeness of the reference database. When the same taxonomic name was assigned more than once, we grouped them together to mitigate a possible over-estimation of taxonomic richness.
Rare sequences that exhibited less than 10 counts in a sample were excluded from the analyses as they can arise through tag-jumps [71] and might increase the rate of false positives. Furthermore, only sequences were kept that matched 100% to an entry in a reference database. Exotic sequences assigned to cultivated plants or those highly unlikely to occur in arctic or boreal landscapes were also excluded from our analyses.

Pollen Sample Treatment and Analysis
In total, 72 samples were palynologically analysed. Pollen extraction of 1 mL sediment was performed following the standard procedure of [72] including HCL, KOH, HF (including boiling) and actetolysis. A Lycopodium spore tablet (Batch no. 1031; n = 20,848 ± 1460) was added to each sample to estimate the pollen concentration [73]. Pollen slides were analysed with a Zeiss Axioskop 2 light microscope (400-600× magnification). At least 300 pollen grains were identified in each sample. Taxonomic identification was based on published pollen and non-pollen palynomorph atlases [74][75][76][77][78][79][80][81][82][83][84] and a pollen reference collection at the Arctic and Antarctic Research Institute (Saint Petersburg) and the Alfred Wegener Institute (Potsdam). The systematic placement of Chenopodiaceae changed after the application of molecular techniques and is now included into Amaranthaceae [85]. To facilitate comparability with previous palynological studies in the Laptev region, we use Chenopodiaceae in this manuscript.

Statistical Analyses and Visualisation
For statistical analyses we excluded aquatic macrophytes, bryophytes, algae and taxa associated with wetlands and bogs, such as Cyperaceae from both datasets to reduce the effect of different depositional conditions, since we focus on changes in the terrestrial vegetation. Furthermore, we are interested in comparing the richness of terrestrial vascular plant taxa between the different time periods covered. Comparisons of taxonomic richness between samples with different count sizes can be biased as higher count sizes increase the chance of finding rare taxa. Therefore, we rarefied the data using R-functions rarecurve and rarefy. This permitted us to compare the richness at the same count size, determined by the minimum number of retrieved sequence counts among all samples (L14-03, 1.91 m = 893 counts) or by the minimum number of pollen (L14-03, 12.8 m = 82 counts) [86,87]. Two samples were excluded from the rarefaction analysis as they had very low sequence count sizes (L14-04, 2.05 m = 32 counts, L14-05, 7.22 m = 182 counts) and would have decreased the rarefied taxonomic richness far too much.
We used multivariate statistical analyses to explore the patterns of the multivariate data and to identify samples with a similar taxonomic composition from among the different cores, whose deposits comprise partially overlapping time-slices. We grouped the taxa to family levels, except for taxa with erect shrub or tree growth-form to get a better signal. If a family was represented by only one species or genus, we retained the species or genus name. Afterwards, we calculated the relative proportions of each taxon within each sample, and applied transformation by the fourth root for sedaDNA data and by square root for pollen data to mitigate the effects of very dominant and very rare taxa. We ran a principal component analysis using rda to assess the major structure of the multivariate dataset (see also Figures S2 and S3 in Supplement1.docx). We plotted the stratigraphic diagrams for each core separately using strat.plot. All statistical analyses were performed in R v. 3.0.3 [88] using the packages "vegan" [89], "rioja" [90] and "analogue" [91,92]. The original raw data are available at PANGAEA [93] and the terrestrial vascular plant datasets are available as supplementary material (Tables S1 and S5).

Radiocarbon Ages
Twenty-seven samples were radiocarbon dated ( Table 2). The lower part of core L14-03 had sediments older than 52 kyr BP, while the upper part was deposited between about 29.3 and 4 kyr BP (33.4 to 4.5 cal kyr BP). The deepest part of core L14-02 exhibited ages near the analytical capacity of the 6 MV Tandetron in Cologne until about 5.33-5.56 m depth. The upper part accumulated between 46.3 and 37.3 kyr BP, although samples between 3.47-3.77 m and 4.62-4.78 m do not age continuously with increasing depth. Such age inversions result from cryoturbations as well as from the high uncertainty range of ages near the radiocarbon dating limit [8]. The deepest unit below 5.83 m of core L14-05 returned ages of 51 to 54 kyr BP. They belong to thawed and refrozen Yedoma deposits below a thermokarst depression, called taberite. The sample at 5.00-5.17 m showed a very young age of 1.2 kyr BP, while samples from between 1.39-1.69 m and 0.65-0.85 m accumulated around 10.1 kyr BP and 2.2 kyr BP, respectively. The very young age at 5.00-5.17 m is probably due to the relocation of material during drilling. Overall, the radiocarbon ages support stratigraphical interpretations based on previous work [4,7,8,25,52,94].

Overall Composition of the sedaDNA Metabarcoding and Pollen Data
Illumina sequencing generated a total of 28,827,030 sequence counts. After bioinformatic filtering 11,808,836 sequence counts were assigned to 347 plant and algae sequences with a 100% match to an entry in a reference database. Of these, we assigned 5 sequences (233,081 counts) to trees, 5 (2,780,400 counts) to erect shrubs, 212 (5,200,502 counts) to herbs graminoids and non-erect dwarf-shrubs (including graminoids), 39 (1,737,207 counts) to swamp and aquatic (including sedges) taxa, 52 (123,958 counts) to bryophytes and 6 (3884 counts) to algae. Furthermore, we assigned and subsequently discarded 28 sequences (1,729,804 counts) to exotic taxa, since they are unlikely to have occurred in the Arctic and belong to cultivated and food plants. Three samples contained only exotic sequences after filtering, hence only 69 samples were further analysed.
The PCR negative controls of the sequencing run were all clean. Seven out of eight extraction negative controls were clean in the gel-electrophoresis picture while only five out of eight extractions negative controls were clean after sequencing (Table S2-S4). However, as we show in the Supplementary material (Tables S1-S4), the sequences found in those extraction negative controls do not compromise the results and interpretations of the associated samples.
The palynological analyses identified 57 terrestrial vascular plant taxa. Of these, the dominant taxa in the pollen assemblages are Poaceae and Asteraceae (especially Artemisia), followed by Alnus, Betula and Caryophyllaceae.

SedaDNA Metabarcoding
The core L14-04 is mainly composed of sequences assigned to Saliceae, Poaceae, Asteraceae and Rosaceae ( Figure 3). The rarefied richness ranges between 8 and 33 taxa (Figure 4). With increasing depth, richness first decreases strongly until 3.65 m (MIS 4) and then increases again towards high richness in the lowest part of the core between 8.03 and 6.15 m (MIS 5). This part of the core is characterised by the presence of trees (Larix, Picea, Populus and Pinus) and shrubs (Alnus, Betula, Ribes, Saliceae and Cornus) as well as Ericaceae, Galium and Kobresia, which suggests a warm period. Above 6.15 m no tree or shrub taxa are present, except for Pinus at 4.11 m and Saliceae.    The taxonomic richness of core L14-05 increases with decreasing depth until 3.7 m and remains high to medium until 1.44 m (transition from MIS 2 to MIS 1). Then the richness drops sharply at 0.9 m and remains low until the uppermost sample (MIS 1). The deepest part of the core is characterised by the absence of Saliceae sequences, which first occur at 5.05 m-the sample in which Populus is the only tree taxon. Samples between 4.8 and 1.44 m are characterised by the presence of shrubs (Alnus, Betula, Ribes, Cornus, Saliceae) and Ericaceae. Larix is present at 3.7 m, but we note that in the sample at 3.3 m Larix was recorded with 8 read counts but discarded during the filtering process as a putative false positive. Pinus sequences are recorded in the two deepest samples as well as the one at 0.7 m, but no other tree or shrub taxa are present in these samples.
The principal component analysis displays the major structure of the terrestrial vascular plant composition for the 69 samples processed by sedaDNA metabarcoding (Figure 5). A high proportion of the variance (41.97%) in the multivariate dataset is explained by the first two axes, with PC1 explaining 28% and PC2 14% of the variance. The variance along PC1 and PC2 can mostly be explained by Asteraceae, Saliceae, Rosaceae, Poaceae and Polygonaceae, which have the highest species scores. Except for Pinus, the vectors of all tree and shrub taxa point towards the lower right quadrant, along with taxa that are associated with subarctic/boreal vegetation such as Galium or

Pollen
The rarefied richness of terrestrial vascular plant taxa varies between 7 and 16 and remains relatively constant in comparison to the richness displayed by the sedaDNA data (  The terrestrial pollen assemblage of core L14-03 exhibits high proportions of tree and shrub taxa at 6.72, 7.59 and 11.77 m (MIS 5). Wood cells determined to the Pinaceae family support the presence of conifers in sample 11.77 m ( Figure 6). Furthermore, a peak of Pinus pollen is found at 12.87 m. This pattern is similar to the sedaDNA dataset. The major differences between the two datasets are first, the absence of Populus, Cornus and Ribes in the pollen record; second, the highest proportion of Salix pollen is found at 5.13 m, while no Saliceae is present in the sedaDNA record; and third, Pinus sedaDNA is only recorded in the deepest sample, while it is absent at the level where it has the highest proportion in the pollen record.
In core L14-02 (MIS 3) the proportions of tree and shrub taxa in the pollen record are low. Shrub taxa are represented throughout the core and increase in the two uppermost samples (MIS 1), similar to the sedaDNA record. The proportions of trees peak at 7.53 m, with Larix being present between 9.07 and 4.67 m. This supports the presence of Larix in the sedaDNA record at 7.53 and 7.35 m as being real, although the sequences were discarded as potential false positives. Picea and Pinus are present in most of the samples at proportions of less than 1%, except for samples between 9.07 and 6.25 m, which show increased proportions but less than 3% for Picea and less than 10% for Pinus. In

Validation of Sediment-Derived Larix Sequences and First Insights into Past Genetic Variation
EcoPCR results confirmed that the developed primer pairs are Larix specific. In silico PCR using primers cp77444 retrieved 3 hits, with only Larix showing no mismatches in both primer sequences ( Table 2). The in silico PCR using the second primer pair cp103595 resulted in 14 hits comprising all genera within the Pinaceae family. Except for Larix all other hits have at least 2 mismatches in the reverse primer sequence. Additionally, a BLASTn search of the target sequence for both primer pairs confirmed the Larix specificity. Based on the current status of the NCBI database, this allows for the validation of Larix-derived sedaDNA in our samples. Even though Larix was filtered out due to a low number of sequence counts at 3.3 m depth in core L14-05, it was successfully re-amplified and resequenced by both markers.
High-throughput sequencing allowed the detection of Larix sequences in 12 samples from three of the cores (L14-03, L14-04 and L14-05) and three hand-pieces (L14-04C). Out of these, 10 samples were successfully re-amplified by the marker cp77444 and 11 samples by the marker cp103595 (Table

Validation of Sediment-Derived Larix Sequences and First Insights into Past Genetic Variation
EcoPCR results confirmed that the developed primer pairs are Larix specific. In silico PCR using primers cp77444 retrieved 3 hits, with only Larix showing no mismatches in both primer sequences ( Table 2). The in silico PCR using the second primer pair cp103595 resulted in 14 hits comprising all genera within the Pinaceae family. Except for Larix all other hits have at least 2 mismatches in the reverse primer sequence. Additionally, a BLASTn search of the target sequence for both primer pairs confirmed the Larix specificity. Based on the current status of the NCBI database, this allows for the validation of Larix-derived sedaDNA in our samples. Even though Larix was filtered out due to a low number of sequence counts at 3.3 m depth in core L14-05, it was successfully re-amplified and re-sequenced by both markers.
High-throughput sequencing allowed the detection of Larix sequences in 12 samples from three of the cores (L14-03, L14-04 and L14-05) and three hand-pieces (L14-04C). Out of these, 10 samples were successfully re-amplified by the marker cp77444 and 11 samples by the marker cp103595 (Table 3). Cloning and Sanger re-sequencing was successful for 9 samples amplified by cp77444 and for 10 samples amplified by cp103595 (Table 3). All sequences aligned specifically with the target regions on the Larix gmelinii and Larix cajanderi chloroplast reference genomes and showed no mismatches in any of the primer sequences. Table 3. List of samples containing Larix sequences in the metabarcoding data, with their extraction negative controls (EN), sorted by the core or hand-pieces (HP) from which they were collected. PCR and cloning success (+: successful, −: not successful) are given, as is the number of retrieved sequences for each sample and the detected SNP variants of the two markers cp77444 and cp103595 including their abundance. If PCRs were not successful, the samples were not cloned (n.c.).

Core/HP Sample Depth
The first marker, cp77444, contains a SNP (A/C) at position 12 of the sequence. At this position all samples showed variant A, except for the sample at 8.03 m in core L14-04, which comprises variants of both A and C. The second marker, cp103595, contains a SNP (T/C) at the second-to-last position. In two samples only variant C was retrieved, in four samples only variant T and in four samples both variants were retrieved (Table 3).

Geochronology
The association between stratigraphic units and the approximate period of deposition is based on a combination of AMS radiocarbon dating of 27 samples with pre-existing dating results and palaeoenvironmental interpretations of local outcrop sections [4,7,8,25,52,94]. In permafrost landscapes, warm interglacial conditions lead to the melting of ground ice resulting in ground subsidence (thermokarst) and lake development [95]. The core L14-04 was taken from the Krest Yuriakh Suite (MIS 5), which is located in a stratigraphical context below deposits of the Yedoma Ice Complex (MIS 3). The core contains lacustrine deposits suggesting deposition in a thermokarst lake and its lower part can be assigned to MIS 5 [7,48]. As the suggested age of the core exceeds the limits of the AMS radiocarbon method, no samples were dated with this method. However, two infrared stimulated luminescence (IRSL) dates from the adjacent outcrop [25] suggest an MIS 5 age.
According to its stratigraphy, core L14-03 was taken in a thermo-terrace and extends the record of the core L14-02 from a lower topographic position. It contains deposits of the Kuchchugui suite associated to MIS 4. The radiocarbon ages within the upper 2 m of the core suggest deposition during MIS 3 while the uppermost sample suggests deposition during MIS 1.
The core L14-02 was drilled from the ground surface at the top of a Yedoma hill. Palaeoenvironmental interpretations [8] suggest interstadial deposition during MIS 3 with a succession of palaeosols in an ice-wedge polygon. Most of the radiocarbon-dated samples exhibit ages also near the analytical capacity of the 6 MV Tandetron in Cologne. This is often observed for deposits of MIS 3 origin, leading to high uncertainties or infinite ages [8].
Due to erosion and thermokarst processes, MIS 1 deposits are mostly preserved as lake deposits in thermokarst depressions, boggy deposits after lake drainage or as cover deposits on Yedoma hills [7]. According to stratigraphic interpretations from the adjacent outcrop section [7], the sediments below 5.5 m of core L14-05 are thawed and refrozen taberite deposits of the Yedoma Ice Complex (MIS 3). Between 0.9 m and 5.5 m, the core contains lacustrine deposits, which likely formed during the late glacial (MIS 2) or the transition from the late glacial to the early Holocene. (MIS 2 to MIS 1). Above 0.9 m, there are MIS 1 sub-aeolian deposits from a boggy environment that formed after lake drainage [7].

Assessment of Tree and Shrub Taxa in the sedaDNA and Pollen Records
Our vascular plant derived sedimentary ancient DNA record contains tree (Larix, Picea, Populus, Pinus) and erect shrub taxa (Saliceae, Alnus, Betula, Ribes, Cornus) in several strata, which can be associated to MIS 5, the transition between MIS 2 to MIS 1 and MIS 1. Generally, the presence of trees is associated with the presence of shrubs other than Saliceae, as well as with higher taxonomic richness among terrestrial taxa. We detected Larix sequences in three different cores and three hand-pieces (large soil nuggets), which were collected from four localities. When Larix pollen proportions are more than 4%, Larix is also represented in the sedaDNA record. Conversely, for 5 out of 16 occurrences of Larix in the sedaDNA record, the pollen proportion is less than 2%, including one sample where Larix pollen is absent. Additionally, some samples have Larix pollen proportions of up to 3%, but our criteria for sequence counts to be included in the sedaDNA analysis were not met. We were able to validate the presence of Larix-derived sedaDNA by re-amplification and re-sequencing with different primer pairs, corroborating that the low sequence counts were not an artefact. This is in contrast to previous results based on macrofossil and pollen analyses from outcrop sections at the same sampling locations. Kienast et al. [6,26] did not find any Larix macro-remains on Bol'shoy Layakhovsky Island, but did find some at the Oyogos Yar coast across the Dmitry Laptev Strait, leading them to suggest that the Larix treeline was located in the Dmitry Laptev Strait during MIS 5. Andreev et al. [25] detected Larix pollen with proportions of less than 1%, which they interpreted as re-worked pollen in the putative MIS 5 sequence R22+60 ( Figure 2). Hence, they conclude that Larix was not present at Bol'shoy Lyakhovsky at that time and that the island was covered by tundra dominated by Poaceae and Artemisia at the beginning of MIS 5 and by shrub tundra including Alnus, Betula and Salix at more protected sites during the MIS 5 optimum. However, a recent study [94] analysed pollen records from the Buchchagy Ice Complex (MIS 5) on Bol'shoy Lyakhovsky Island and detected well-preserved Larix pollen and Larix-type stomata, which supports our findings. As Larix pollen is generally under-represented in pollen records due to low pollen productivity and low pollen dispersal capacity [37,96], even a single pollen grain is usually accepted as evidence for the local to regional presence of Larix [97]. Further evidence, that Larix has the potential to grow at such high latitudes and even farther north was provided recently [98] by Larix wood from Kotelny Island (New Siberian Archipelago) dated to~38 kyr BP (MIS 3 interstadial). However, in the corresponding time-slice of core L14-02 no Larix sequences were detected.
Overall, Picea sequences are congruent with the presence of other tree taxa. In the deepest part of core L14-02 the sedaDNA record contains Picea sequences, but other taxa imply an open tundra landscape, in which Picea was not likely to be a component of the regional species pool.
Populus is mostly present in samples in which other tree taxa are present, except for one sample (L14-05, 5.05 m). Further evidence of Populus is needed to validate its actual occurrence in the Siberian high Arctic during MIS 5. Populus pollen is generally under-represented in the pollen record in comparison with its abundance in the vegetation [99] and due to its very low preservation potential as it is easily destroyed [100].
While the vectors of Larix, Picea and Populus point towards the lower right quadrant of the ordination plot ( Figure 3) along with shrubs, the Pinus vector is directed towards samples of low taxonomic richness, which are dominated by Poaceae, most likely reflecting sparse grass-tundra under colder climatic conditions. The presence of Pinus in the sedaDNA record conforms, in several samples, with the pollen data and thus could have resulted from long-distance transported pollen. As Pinus has a much higher pollen productivity and a much lower pollen fall speed in comparison to Picea and Larix [96,101], Pinus is known to display notable proportions in the pollen spectra of treeless environments such as the Arctic tundra [35,36]. The contribution of DNA from pollen to the sedaDNA record was also considered in [31]. However, it has been reported that DNA from pollen is not easily amplified [43,102,103]. In samples without a notable proportion of Pinus pollen, contamination could be a possible explanation, although all associated negative controls did not contain Pinus. The occurrence of Pinus in the sedaDNA record is dubious and hence not deemed an authentic part of the regional vegetation.
We consider Saliceae, Alnus, Betula, Ribes and Cornus as shrub taxa, but Alnus, Betula and Cornus can also take a tree growth-form. The genus Cornus contains species (e.g., Cornus suecica) that even exhibit the herb growth-form. The genera Ribes and Cornus were, however, not represented in the pollen record. Both genera are characterized by insect pollination [104,105], which usually results in an under-representation in comparison to wind-pollinated taxa. Saliceae is over-represented in the whole dataset, as was noted in a recent study [30] possibly because of the massive below-ground biomass of Salix dwarf-shrubs in tundra environments [106]. The short trnL (UAA) intron marker does not contain sufficient variation to allow the distinction between most species of Salix and Populus, leaving the resolution at tribe level (Saliceae). During colder periods, we interpret the presence of Saliceae as Salix, while during warmer phases, we cannot rule out a contribution of Populus.
Macrofossil evidence from the Oyogos Yar coast suggests that the past forest community during MIS 5 differs in comparison to the present-day northern larch forests in that they contained tree birch and alder, with birches probably dominating the vegetation [6]. As we still do not know much about the vegetation during MIS 5, we believe it could be possible that Populus was a component of the past vegetation mosaic, along with larch, spruce, birch and alder.

Terrestrial Plant Community Changes of Warm Phases since the Last Interglacial
Generally, samples reflecting warm phases display a high taxonomic richness of terrestrial vascular plants and are characterised by the presence of trees and shrubs in both the sedaDNA and the pollen records. Samples from the lower part of core L14-04 and the hand-pieces L14-04C are among the most diverse samples recorded in this study and comprise the tree taxa Larix, Picea and Populus, shrub taxa such as Ribes, Cornus, Alnus and Betula, as well as herbs such as Galium and Ericaceae. The sedaDNA and pollen records align well within the stratigraphic units. Core L14-03 contains a warm phase older than 58.3 ± 3.6 14 C kyr BP, starting at 11.77 m and ending approximately at 6.72 m in both the pollen and the sedaDNA data. The associated samples intermix with those from the core L14-04 and the hand-pieces L14-04C in the PCA ordination plot. Hence, an MIS 5 origin for these samples can be confirmed. The vegetation was likely a mosaic composed of open Larix, Picea and Populus forest, accompanied by erect (dwarf-) shrubs such as Alnus, Betula, Salix, Cornus and Ribes, as well as Ericaceae and Galium in the understorey. There were probably also open spots with Kobresia, a typical steppe taxon, and Papaver, a typical pioneer plant [6].
When the last interglacial ended, the cool Weichselian glacial period began and prompted pronounced vegetation responses. Our record shows reduced diversity in MIS 4 and an absence of trees and shrubs (except for Saliceae), which indicates at least a strong fragmentation or a general treeline retreat. Subsequently, the warmer MIS 3 interstadial shows intermediate taxonomic richness, at a level comparable with the MIS 4 period. The vegetation was open and dominated by herbs.
The Yedoma Ice Complex core L14-02 contains two warm phases. The palynological data suggest a first warm phase between 8.56 m and 5.38 m in which radiocarbon dating suggests was deposited during the early MIS 3 interstadial. This part contains very rare sequences of Larix at 7.35 and 7.53 m depth, which were discarded during the filtering process as probable artefacts. However, we were able to re-amplify Larix from a sample at 7.53 m depth with a different Larix-specific marker, corroborating the actual presence of Larix-DNA in the sedimentary record. Additionally, these two samples contain Populus (7.35 m) or Alnus and Galium (7.53 m) sequences. In comparison to the samples attributed to MIS 5, these samples show a much lower taxonomic richness and Ericaceae, Picea, Ribes or Cornus are not detected. The second warm phase is captured by both proxies in the near-surface samples at 0.36 m and 0.05 m and post-dates the Last Glacial Maximum (LGM). It is interpreted to be of Holocene origin and is characterised by Ericaceae/Ericales, Alnus and Betula, but it does not record any tree taxa, except for low proportions of Picea pollen.
The thermokarst depression deposits of core L14-05, especially between 3.7 m and 1.44 m reflect a warm phase during the transition from MIS 2 to MIS 1 and early MIS 1. On Bol'shoy Lyakhovsky Island, this time interval displays high taxonomic richness and comprises Larix as well as Betula, Alnus, Ribes and Cornus in addition to Saliceae. The presence of Larix on Bol'shoy Lyakhovsky Island after the LGM is remarkable, and gives weight to the current hypothesis that Larix persisted at high latitudes during the LGM [107,108] on sites with a favourable microclimate [109].
Over the course of the MIS 1, the record shows a decrease in diversity. The vegetation changes into high arctic treeless tundra with Saliceae as the only shrub taxon in the sedaDNA data. This supports the findings of a pollen record that displayed shrub associations at Bol'shoy Lyakhovsky in the early MIS 1 with a gradual disappearance of shrubs after approximately 7.6 14 C kyr BP [4].
Today, Bol'shoy Lyakhovsky is treeless and the question is, at which point in time did Larix disappear and why? Our sedaDNA data indicate the presence of Larix after the LGM, putatively during the Bølling-Allerød interstadial complex; yet Larix pollen is recorded in the late Holocene, in samples younger than about 2000 years. A possible explanation for the disappearance of larch could be the strong change in environmental conditions from a continental to a maritime setting caused by the MIS 1 marine transgression. In contrast to continental conditions, which are characterised by strong differences in seasonality with very cold winters and warm summers, maritime conditions buffer such temperature differences leading to relatively cool summers and milder winters [110]. Cooler summers could have decreased local evaporation rates leading to moister conditions [6,111]. As larch is adapted to extreme continentality, they are especially prone to water stress. Near Yakutsk, for example, a strong increase in precipitation in two consecutive years resulted in anoxic conditions in the rooting zone, leading at first to premature withering, and subsequently to the dieback of Larix cajanderi trees in the third year [112].
The present abiotic conditions in the Laptev Region are not comparable to those of the last interglacial. Based on macrofossil analyses Kienast et al. [6,26] suggested continental climate conditions due to tectonic subsidence leading to a putatively a less inundated Laptev Sea Shelf, which could have led in combination with higher solar insolation during summer to a longer growing season and drier soils.
Climate-vegetation models project an advance of the treeline in northern Siberia, reaching approximately to the coast by AD 2100, as well as a dominance of evergreen conifers [18]. Hence, MIS 5 could eventually provide a putative analogue for the future vegetation of northern Siberia under ongoing climate warming. The northwards spread of trees into areas which are presently covered by tundra and of evergreen Picea into deciduous Larix taiga would strongly affect the snow cover and albedo at high latitudes, leading to a positive climate feedback [22]; something which should be considered in future climate models. Yet, time-lags [24] due to, for example, the long generation time of conifers or competition, probably restrict the speed of a general treeline advance and the northward shift of Picea, making the modelled scenario of a coastal treeline by AD 2100 unlikely.

Past Genetic Diversity of Larch Populations on Bol'shoy Lyakhovsky Island
Initial attempts at analysing the past genetic diversity of larches [108] successfully analysed ancient DNA from 2000 to 7000 year-old larch macrofossils and a 16,000 year old larch twig from a mammoth intestine. By comparing the mitochondrial haplotypes between these macrofossils with the distribution of haplotypes in modern populations, it was concluded that admixed Larix sibirica × Larix sukacewii populations were already present on the southern Yamal Peninsula during the early Holocene [108]. This demonstrates that knowledge about the past distribution of genetic variation can improve our phylogeographic understanding of Larix. However, retrieving and analysing genetic variation from sediments is challenging as the primer pairs used to amplify the target region have to be specific for the taxon in question. The second objective of this paper was therefore to assess the potential of two newly designed paternally-inherited chloroplast markers, cp77444 and cp103595, to detect past genetic diversity from ancient sediments.
Overall, the amplification and cloning success is highest in samples of MIS 5 origin. As higher read counts are detected in these samples compared to some younger ones, we think that the amount of Larix-derived DNA is higher in these samples, increasing the chance of amplifying the target sequence. Regarding marker cp77444, most samples show only variant A, while the sample at 8.03 m depth of core L14-04 shows two variants. This implies that during MIS 5 both variants were present at Bol'shoy Lyakhovsky. Today, the prevailing larch species on the Siberian mainland south of Bol'shoy Lyakhovsky is Larix cajanderi Mayr, which covers a vast area (~120-160 • E [113]) between the Lena River in the west and the easternmost parts of Siberia in the Chukotka region. Adjacent to Larix cajanderi, Larix gmelinii (Rupr.) Rupr. covers an area (~90-120 • E [113]) from the Lena River in the east to the southern Taymyr Peninsula in the west. In a chloroplast genome-wide analysis of 19 individuals from across the tundra-taiga ecotone covering the ranges of Larix gmelinii (12 individuals) and of Larix cajanderi (7 individuals) a variant C of marker cp77444 was found, but only in individuals from the range of Larix gmelinii (southern Taymyr region, north-central Siberia), while individuals from the range of Larix cajanderi (lower Omoloy and lower Kolyma regions, north-eastern Siberia) exhibited variant A [58]. This suggests that variant A is probably not only dominant in the present but was also dominant in the past. The presence of variant C on Bol'shoy Lyakhovsky Island, which was only detected in individuals from the southern Taymyr region [87], indicates that this variant was, however, not as restricted geographically in the past as it probably is today.
Similar to the first marker, cp103595 displays both possible variants in the samples attributed to MIS 5 from core L14-04, implying that both variants were present at Bol'shoy Lyakhovsky during that time. Both variants of the cp103595 marker are distributed across both species ranges [58].

Conclusions
In contrast to previous pollen and macrofossil studies, which have reconstructed shrub-tundra as the prevailing vegetation on Bol'shoy Lyakhovsky Island, our proxies of pollen and sedaDNA indicate that trees were present during MIS 5 and during the transition from MIS 2 to MIS 1. Furthermore, the sedaDNA record suggests that Siberian northern forests were more diverse during the last interglacial than they are today. In comparison to the contemporary monodominant Larix forest, forests of MIS 5 additionally contained evergreen Picea and deciduous Populus. Picea and Populus likely underwent a range contraction, retreating southward, due to climate cooling after MIS 5 and only Larix was present afterwards. According to the pollen data, Larix, Alnus and Betula gradually disappeared over the course of MIS 1, which was probably the result of the marine transgression and the associated change from continental to oceanic conditions.
We successfully validated the presence of Larix in the sedaDNA using two novel chloroplast markers and were able to detect two possible variants for each SNP in one MIS 5 sample. For one marker we found that variant C, which has so far only been detected in the Southern Taymyr region among modern populations, was distributed across a much larger area in the past than it is today.
The novel markers are suitable for amplicon sequencing of ancient and modern Larix-DNA from environmental samples such as peat, lake or permafrost sediments.