Centuries-Old DNA from an Extinct Population of Aesculapian Snake (Zamenis longissimus) Offers New Phylogeographic Insight

: The Aesculapian snake ( Zamenis longissimus ) is distributed in Central and Southern Europe, the Balkans, Anatolia, and Iran, but had a wider mid-Holocene distribution into Northern Europe. To investigate the genetic afﬁnity of a Danish population that went extinct in historical times, we analysed three ethanol-preserved individuals dating back to 1810 using a silica-in-solution ancient DNA extraction method, combined with next-generation sequencing. Bioinformatic mapping of the reads against the published genome of a related colubrid snake revealed that two of the three specimens contained endogenous snake DNA (up to 8.6% of the reads), and this was evident for tooth, bone, and soft tissue samples. The DNA was highly degraded, observed by very short average sequence lengths (<50 bp) and 11–15% C to T deamination damage at the ﬁrst 5 (cid:48) position. This is an effect of specimen age, combined with suboptimal, and possibly damaging, molecular preservation conditions. Phylogeographic analyses of a 1638 bp mtDNA sequence securely placed the two Danish Aesculapian snakes in the Eastern (Balkan glacial refugium) clade within this species, and revealed one previously-undescribed haplotype. These results provide new information on the past distribution and postglacial re-colonization patterns of this species.


Introduction
Collections of biological specimens represent an essential resource for studying biodiversity on our planet. Over the past decade it has become increasingly clear that such museum collections not only offer a wealth of information on morphological biological diversity, but also serve as a record of past and present genetic diversity [1,2]. DNA obtained from historical collections can offer direct insight into the gene pools of extinct species or populations, thus providing unique phylogenetic and phylogeographic information, not possible to obtain from living individuals. The extinct Danish population of the Aesculapian snake (Zamenis longissimus) represents such a case.
The Aesculapian snake is a non-venomous colubrid snake distributed in Central and Southern Europe including the Balkans, Anatolia, and Iran, with introduced populations in Wales and London ( Figure 1). The species' northern distribution limit coincides with the 48th to 50th parallel north, but c. 10 isolated populations in Germany, Czech Republic, and Poland exist outside the main distribution, restricted to warm and humid microhabitats in river valleys [3][4][5]. A further five northern populations have gone extinct during the last 150 years [6], including the population Diversity 2018, 10, 14 2 of 10 in Denmark. These isolated populations, whether extinct or extant, are possible remnants of an earlier, more widespread distribution, evidenced by numerous fossil findings of this species [7,8] ( Figure 1). In the mid-Holocene, the mean temperature in the northern hemisphere was higher than the present day which is why some thermophilous European species, including Z. longissimus, had a distribution extending further north than their current distribution [9,10]. Aesculapian snake fossils from Denmark represent the species' Mid-Holocene northernmost distribution border [9,11,12]. The presence of this species during historical times in Denmark is evidenced by a number of direct observations and museum specimens collected throughout the 17th to 19th centuries [13][14][15]. All reliable observations of living Aesculapian snakes in Denmark derive from the southern tip of the island Zealand (Sjaelland) [16,17] (Figure 1). This area was earlier characterized by coppice of hazel [18], but the management practice ceased in late 1800s and the habitat changed, likely becoming unfavourable for the snakes. Three ethanol-preserved specimens exist from the historical Danish population from Southern Zealand, collected in 1810, 1851, and 1863, respectively, but no previously-published study on Z. longissimus morphology or phylogenetics has included these specimens.
Diversity 2018, 10, x FOR PEER REVIEW 2 of 10 northern populations have gone extinct during the last 150 years [6], including the population in Denmark. These isolated populations, whether extinct or extant, are possible remnants of an earlier, more widespread distribution, evidenced by numerous fossil findings of this species [7,8] (Figure 1). In the mid-Holocene, the mean temperature in the northern hemisphere was higher than the present day which is why some thermophilous European species, including Z. longissimus, had a distribution extending further north than their current distribution [9,10]. Aesculapian snake fossils from Denmark represent the species' Mid-Holocene northernmost distribution border [9,11,12]. The presence of this species during historical times in Denmark is evidenced by a number of direct observations and museum specimens collected throughout the 17th to 19th centuries [13][14][15]. All reliable observations of living Aesculapian snakes in Denmark derive from the southern tip of the island Zealand (Sjaelland) [16,17] (Figure 1). This area was earlier characterized by coppice of hazel [18], but the management practice ceased in late 1800s and the habitat changed, likely becoming unfavourable for the snakes. Three ethanol-preserved specimens exist from the historical Danish population from Southern Zealand, collected in 1810, 1851, and 1863, respectively, but no previously-published study on Z. longissimus morphology or phylogenetics has included these specimens. The objective in the current study is two-fold. First and foremost, we aim to investigate the genetic relationship between the extinct Danish population and the genetically well-described extant Z. longissimus populations. Musilová et al. [19] demonstrated a clear population genetic structuring with four main mtDNA clades in this species. Furthermore they showed that the current small and isolated populations north of the core distribution belong to the Eastern clade, suggesting that it was snakes from a Southern Balkan glacial refugium that colonized Northern Europe during the  [19] and the extinct Danish population from Southern Zealand (map insert). B: the current relictual (continent) or introduced (UK) populations marked with black dots. Populations that went extinct in historical times are marked with orange dots and fossil finds are marked with green dots. Modified from Gomille [6], Richter and Noe-Nygaard [11], and Kristensen [12]. Three historical Danish specimens were sampled for this study: The objective in the current study is two-fold. First and foremost, we aim to investigate the genetic relationship between the extinct Danish population and the genetically well-described extant Z. longissimus populations. Musilová et al. [19] demonstrated a clear population genetic structuring with four main mtDNA clades in this species. Furthermore they showed that the current small and isolated populations north of the core distribution belong to the Eastern clade, suggesting that it was snakes from a Southern Balkan glacial refugium that colonized Northern Europe during the mid-Holocene climatic optimum. We aim to investigate if the genetic profiles of the Danish Aesculapian snakes are consistent with this scenario. A second objective is to explore the molecular potential in these old ethanol-preserved specimens. By using highly-optimized ancient DNA (aDNA) extraction protocols combined with next-generation sequencing technology it is possible to obtain data from severely degraded DNA that will fail with standard extraction methods and PCR-based technology. We test the Aesculapian snake material using a silica-in-solution extraction method that has previously been used in large-scale genomic studies on ancient human skeletal material [20]. In this context we compare DNA preservation in different types of snake tissue (tooth, bone, and soft tissue). We believe that the protocols and results presented here have high relevance to scientists that engage in museum collection-based molecular research.

Samples
Three historical specimens were sampled for this study (Figure 1), representing all known museum specimens of Z. longissimus from Denmark. HK2489 is a 102 cm long female collected in 1810 (noted locality: Petersgaard Skovdistrikt, Sydsjaelland); ZMUC R8974 is a 128 cm long female, collected in 1863 (noted locality: Petersvaerft Skovridergård, Sydsjaelland), and ZMUC R8975 is an 81 cm long female collected in 1851 (Noted locality: Petersgaard Skovdistrikt, Sydsjaelland), received alive at the Natural History Museum of Denmark and killed and stored in ethanol. The exact preservation history of the other two specimens is unknown but today they are stored in ethanol ( Figure 1) and have presumably been so since the time of collection. Small pieces of muscle and skin tissue (<0.2 g) were removed from all three, and we also obtained 3-4 teeth from each of two individuals, and a complete vertebra from one individual (Table 1). Basic sequencing, mapping, and DNA damage statistics per sample and substrate. Total, total number of sequences generated; Retained, number of sequences retained after adapter trimming and removing <30 bp sequences; Mapped, number of sequences successfully aligned to the T. sirtalis genome; Non-clonal, number of sequences retained after removing identical sequences (clones) within each library; Hits, percentage of retained sequences that could be aligned; Efficiency, the percentage of non-clonal aligned sequences among total number of generated sequences; C-T, percentage of sequences showing DNA deamination damage at position 1 (5 end); Av. Length, average length (bp) of the aligned sequences.

DNA Extraction
A total of six DNA extracts were prepared for this study: three soft tissue samples (muscle/skin), two tooth samples, and one bone sample. The DNA was extracted using a silica-in-solution method optimized for retaining short and degraded DNA molecules [20]. The samples were incubated for 24 h at 45 • C in 5 mL digestion buffer containing 4.7 mL 0.5 M EDTA buffer, 50 µL of Proteinase K (0.14-0.22 mg/mL, Roche, Basel, Switzerland), 250 µL 10% N-laurylsarcosyl, and 50 µL TE buffer (100×). The solution was spun down and the supernatant transferred to a 50 mL tube, where it was mixed with 100 µL of silica suspension (see [20]) and 40 mL of binding buffer. The binding buffer was prepared in bulk by mixing 500 mL buffer PB (Qiagen, Hilden, Germany) with 9 mL of sodium acetate (5 M), and 2.5 mL of sodium chloride (5 M), pH adjusted to 4-5 with 37% HCL. After 1 h of incubation (room temperature) with binding buffer and silica, the supernatant was removed and the pelleted silica was re-suspended in 1 mL of new binding buffer, spun down, and washed twice with 1 mL 80% cold ethanol. Finally, the DNA was eluted in 80 µL of EB buffer (Qiagen). Extraction blanks were included with each round of extractions. The work was conducted using strict aDNA guidelines (e.g., [21]) in a dedicated clean lab at the Centre for GeoGenetics at the Natural History Museum of Denmark.

Library Preparation and Sequencing
DNA extract was built into a blunt-end library using the NEBNext DNA Sample Prep kit E6070 (New England Biolabs, Ipswich, MA, USA) and Illumina-specific adapters (Illumina, San Diego, CA, USA). The libraries were prepared as described previously [22] with a few modifications: the end-repair step was performed with 20 µL of DNA extract, 2.5 µL repair buffer, and 1.25 µL repair enzyme mix. The solution was incubated for 20 min at 12 • C, followed by 15 min at 37 • C, and purified using the same binding buffer as used for extractions (see above) with Qiagen MinElute columns, and eluted in 15 µL of EB buffer. Next, Illumina-specific adapters (prepared as in [23]) were ligated to the end-repaired DNA in 25

Bioinformatics
The data were base-called using the Illumina software CASAVA 1.8.2 and sequences were de-multiplexed with a requirement of a full match of the six nucleotide indices that were used. The raw reads were trimmed for adapters using AdapterRemoval2 [24], and trimmed reads shorter than 30 bp were discarded. Since no reference genome exists for Z. longissimus, we performed an initial round of mapping against the draft genome sequence of another colubrid snake species, Thamnophis sirtalis [25] allowing to test for the presence of genomic snake DNA in our samples. The trimmed reads were mapped using BWA v. 0.7.15 [26] with default settings, and Samtools v. 1.5 [27] was used to remove duplicate sequences. To investigate the level of DNA degradation in our samples, the mapped sequences were analysed with mapDamage v.2.0.5 [28].
For the phylogeographic analyses, we mapped the shotgun data (as above) against two published Z. longissimus mtDNA sequences (cytochrome c oxidase subunit I and cytochrome b, coxI and cytb) representing each of the four different phylogeographic lineages (Eastern E1, Western W1, Greek G1, Asian T1) as defined previously [19]. The individual bamfiles were then imported into Geneious v.8.1.7 [29] for manual inspection and consensus sequence construction, applying a 75% site identity threshold and a minimum site coverage of 3×. The consensus sequences are available at GenBank under accession numbers MH018690-MH018693.

Alignment and Network Analysis
The consensus sequences were then concatenated (coxI + cytb) and aligned in Geneious against 92 Z. longissimus sequences (1638 bp), representing the dataset analysed in Musilová et al. [19]. The alignment was converted to a Nexus file format and imported into POPART [30] to construct an unrooted median-joining network.

Sequencing, Trimming, and Mapping
Roughly 500 million DNA sequences were generated for this project, ranging from 33 to 172 million per extract (Table 1). Following bioinformatic trimming and mapping against the T. sirtalis genome between 0.6% and 8.6% of the sequences could be aligned, representing an overall mapping efficiency (clonal reads, low quality reads, and <30 bp reads removed) between 0.06% and 1.3% ( Table 1). Given that we had to rely on the T. sirtalis genome as reference for the genome-wide mapping, these numbers are likely lower than the true endogenous DNA content of the samples. Regardless, based on these data it is evident that two of the three specimens (HK2489 and ZMUC R8974) showed promising potential. Moreover, for ZMUC R8974 all three substrate types (muscle/skin, bone, tooth) yielded snake DNA; the vertebrae sample performed best with an overall mapping efficiency of 1.3% (Table 1).

DNA Preservation
When mapped against the T. sirtalis genome and analysed in mapDamage2, the data from all DNA extracts showed a marked increase in C to T deamination damage towards the 5 -ends of the sequences, ranging from ca. 11% to 15% at position 1 (Table 1, Figure 2). The C-T damage rates appeared slightly lower in the soft tissue samples compared to bone and tooth (10% vs. 14% in ZMUC R8974 and 11% vs. 15% in ZMUC R8975). The DNA was highly degraded with length distributions of the mapped reads peaking well below 50 bp ( Figure 2). There was a tendency for soft tissue samples to display slightly longer average sequence lengths than bone and tooth ( Figure 2). alignment was converted to a Nexus file format and imported into POPART [30] to construct an unrooted median-joining network.

Sequencing, Trimming, and Mapping
Roughly 500 million DNA sequences were generated for this project, ranging from 33 to 172 million per extract (Table 1). Following bioinformatic trimming and mapping against the T. sirtalis genome between 0.6% and 8.6% of the sequences could be aligned, representing an overall mapping efficiency (clonal reads, low quality reads, and <30 bp reads removed) between 0.06% and 1.3% ( Table 1). Given that we had to rely on the T. sirtalis genome as reference for the genome-wide mapping, these numbers are likely lower than the true endogenous DNA content of the samples. Regardless, based on these data it is evident that two of the three specimens (HK2489 and ZMUC R8974) showed promising potential. Moreover, for ZMUC R8974 all three substrate types (muscle/skin, bone, tooth) yielded snake DNA; the vertebrae sample performed best with an overall mapping efficiency of 1.3% (Table 1).

DNA Preservation
When mapped against the T. sirtalis genome and analysed in mapDamage2, the data from all DNA extracts showed a marked increase in C to T deamination damage towards the 5′-ends of the sequences, ranging from ca. 11% to 15% at position 1 (Table 1, Figure 2). The C-T damage rates appeared slightly lower in the soft tissue samples compared to bone and tooth (10% vs. 14% in ZMUC R8974 and 11% vs. 15% in ZMUC R8975). The DNA was highly degraded with length distributions of the mapped reads peaking well below 50 bp ( Figure 2). There was a tendency for soft tissue samples to display slightly longer average sequence lengths than bone and tooth ( Figure 2).  distribution showing very short average read lengths and an exponentially declining distribution as typical for degraded DNA [31]. The peaks at ca. 38 and 64 bp represent bioinformatic mapping artefacts. In BWA the number of allowed mismatches when aligning each sequence to the reference genome is defined by the sequence length, increasing from 2 to 3 allowed mismatches at 38 bp, and 4 allowed mismatches at 64 bp. This relax in stringency is observed as more reads mapping. The peak observed at 81 bp represents the tail of the distribution piling up at the maximum sequencing length applied in this run.

Phylogeography
When the sequences were mapped against the coxI + cytb mitochondrial sequences from Z. longissimus obtained from GenBank, HK2489 and ZMUC R8974 again displayed far more sequences aligning than ZMUC R8975 ( Table 2). With only very few sequences retained it was impossible to generate a consensus sequence for ZMUC R8975 and was, therefore, excluded in downstream analyses. For the two good samples we obtained approximately twice as many hits when mapping against Z. longissimus mitochondrial genes compared to sequences of Z. lineatus (Table 2), confirming the taxonomic identity of these two specimens. Moreover, we observed that both specimens had slightly higher affinity (more hits) to the Eastern lineage within Z. longissimus than to the other three lineages (Western, Greek, Asian) described previously [19]. Number of non-clonal sequences from each sample that could be mapped against the 1638 bp concatenated sequence (cytb + coxI) from each of the four Z. longissimus clades and a closely-related outgroup species (Z. lineatus).
The two specimens with sufficient DNA preserved for analyses (HK2489 and ZMUC R8974) display the highest number of matches when mapped against the Eastern clade sequence, indicative of their genetic affinity to this clade.
The network analysis securely placed the sequences of the two Danish individuals in the Eastern lineage ( Figure 3). The specimen HK2489 displayed the previously defined haplotype E1, whereas ZMUC R8974 showed one unique mutation separating it from E1, and not previously observed in the Z. longissimus gene pool (Figure 3). This nucleotide (position 1616 in the concatenated sequence) is a T in ZMUC R8974 instead of the C observed in all previously published sequences. The position was covered 12 times with unique non-clonal reads, all showing a T nucleotide, ruling out that this variant call could be a result of deamination damage.

DNA Preservation
By using a silica-in-solution DNA extraction method with a binding buffer optimized for recovering very short molecules [20], we succeeded in obtaining and sequencing DNA from two of the three 19th century specimens of Z. longissimus included in this study. Despite not being a large systematic study, comparing different methods on many specimens, the results are encouraging nonetheless and show that this aDNA extraction method is certainly applicable to old ethanol-preserved specimens. More systematic studies are needed in order to test the performance in comparison to other extraction methods that have worked on similar material (e.g., [32][33][34]). In terms of utilizing different parts of a specimen, we observe that soft tissue, bone, and tooth samples all yielded positive results for one of the snakes. To our knowledge this is the first study to report endogenous DNA from snake teeth. There may be situations where one substrate is preferred over another and now the molecular arguments are in place for targeting each of these substrates. The vertebra yielded the best DNA library (highest efficiency) of the three substrates, but more tests are needed before this can be confirmed as a general feature. In ZMUC R8975 both soft tissue and teeth failed to yield DNA at levels that were feasible for further sequencing. Whichever factors that have affected DNA preservation negatively in this specimen did not discriminate between the substrates.
The DNA in the snakes was highly degraded, showing short fragment lengths and C-T damage fractions that are comparable to DNA from bones thousands of years old (e.g., [20]). This explains why PCR-based experiments often fail on such material; the template molecules are too short to include both primer binding regions and the target region between them. Although ethanol preservation has ensured long-term morphological integrity of these snake specimens, the DNA molecules have clearly been highly compromised. This is perhaps not surprising considering that many such specimens are preserved in only 70% ethanol solution, likely causing DNA to degrade spontaneously in the presence of 30% water [35]. Moreover, although these particular snakes were collected prior to the use of formalin in preservation practices [36], formalin could very well have been added at a later stage, perhaps explaining the high level of DNA degradation we observe [36].
Nonetheless, the fact that we can successfully align > 4% of the reads to the genome of a distantly-related snake species (T. sirtalis) is encouraging and highlights the genomic potential in

DNA Preservation
By using a silica-in-solution DNA extraction method with a binding buffer optimized for recovering very short molecules [20], we succeeded in obtaining and sequencing DNA from two of the three 19th century specimens of Z. longissimus included in this study. Despite not being a large systematic study, comparing different methods on many specimens, the results are encouraging nonetheless and show that this aDNA extraction method is certainly applicable to old ethanol-preserved specimens. More systematic studies are needed in order to test the performance in comparison to other extraction methods that have worked on similar material (e.g., [32][33][34]). In terms of utilizing different parts of a specimen, we observe that soft tissue, bone, and tooth samples all yielded positive results for one of the snakes. To our knowledge this is the first study to report endogenous DNA from snake teeth. There may be situations where one substrate is preferred over another and now the molecular arguments are in place for targeting each of these substrates. The vertebra yielded the best DNA library (highest efficiency) of the three substrates, but more tests are needed before this can be confirmed as a general feature. In ZMUC R8975 both soft tissue and teeth failed to yield DNA at levels that were feasible for further sequencing. Whichever factors that have affected DNA preservation negatively in this specimen did not discriminate between the substrates.
The DNA in the snakes was highly degraded, showing short fragment lengths and C-T damage fractions that are comparable to DNA from bones thousands of years old (e.g., [20]). This explains why PCR-based experiments often fail on such material; the template molecules are too short to include both primer binding regions and the target region between them. Although ethanol preservation has ensured long-term morphological integrity of these snake specimens, the DNA molecules have clearly been highly compromised. This is perhaps not surprising considering that many such specimens are preserved in only 70% ethanol solution, likely causing DNA to degrade spontaneously in the presence of 30% water [35]. Moreover, although these particular snakes were collected prior to the use of formalin in preservation practices [36], formalin could very well have been added at a later stage, perhaps explaining the high level of DNA degradation we observe [36].
Nonetheless, the fact that we can successfully align > 4% of the reads to the genome of a distantly-related snake species (T. sirtalis) is encouraging and highlights the genomic potential in these samples. With a Z. longissimus reference genome available, our two historical specimens would likely qualify for complete genome assembly and future studies will hopefully be able to take advantage of the genome-wide shotgun data we have generated in this study.

Phylogeography
The primary objective of this investigation was to determine the genetic affinity of the extinct Danish Z. longissimus population. Four major clades have been described within this species: two small and basal clades near the Black Sea and in Greece, respectively, and two larger clades that expanded northwards from their southwestern and southeastern refugia-likely in Southern France and the Southern Balkans, respectively [19,37,38]. Our analyses clearly place the two Danish individuals in the eastern clade, likely deriving from a Balkan refugium. The south of the Balkans served as a glacial refugium for many reptiles (e.g., [39][40][41][42][43]) and our results suggest that it was an Z. longissimus expansion from this refugium that reached as far north as Denmark during the mid-Holocene warming period. This expansion mirrors the results from Musilová et al. [19] who demonstrated that the extant isolated populations in Germany and Czech Republic also belong to this Eastern clade.
The question remains, however, whether these isolated extant and recently-extinct populations are relicts of the wider mid-Holocene distribution demonstrated by fossils, or if they were introduced at a later stage. Before fossil evidence was available, it was suggested that these isolated populations could have been introduced, amongst others, by the ancient Romans, or later by Italian or French noble families or Greek merchants [6,44,45]. While the presented data (here and elsewhere) cannot exclude a later re-introduction based on individuals from the Eastern clade, they certainly exclude an origin from the Western clade (Italy, France, etc.) or Greek clade, which would have been the null hypothesis if they had been introduced by Romans or later merchants. Thus, our results suggest that the now extinct Danish population from Southern Zealand was probably a true relict population from the mid-Holocene warming period. In summary, this suggests that Z. longissimus occurred in Denmark from at least 7500 BC up to the early 1900s, when the last observations were made.