An eDNA-Based SNP Assay for Ungulate Species and Sex Identification

Many processes in wild populations are difficult to study. Genetic data, often collected non-invasively, may help overcome these difficulties and are increasingly used to study behavioral, demographic, ecological, and evolutionary processes. Moreover, the improved sensitivity of genetic methods now allows analyses of trace amounts of DNA left by animals in their environment (e.g., saliva, urine, epithelial cells). Environmental DNA (eDNA) thus offers new opportunities to study a range of historic and contemporary questions. Here, we present a species and sex diagnostic kit for studying browsing in a multispecies temperate ungulate assemblage. Using mitochondrial sequences deposited in GenBank, we developed four single nucleotide polymorphisms (SNPs) for identifying four temperate ungulate species. We also sequenced portions of the Amelogenin gene on the X- and Y-chromosomes and developed six SNPs (three on the X-chromosome and three on the Y-chromosome) for sex determination. We tested the SNP assays on DNA samples of high and low quality and quantity.


Introduction
Environmental DNA (eDNA) is becoming an increasingly popular tool for determining species' presence or absence [1]. The term eDNA refers to any DNA that an organism sheds into its environment via skin cells, feathers, hair, feces, saliva, etc. For example, protocols for identifying fish species from eDNA in water have been developed [2][3][4], and DNA from soil has been used to reflect above- and below-ground species compositions [5,6]. DNA can also be isolated from food items such as browsed twigs [7][8][9] and salmon carcasses [10], allowing the identification of browsing ungulates [8] or predators [10]. Browsed-twig environmental DNA (biteDNA) allows the quantification of species-specific browsing patterns of temperate ungulates from trace amounts of DNA left during foraging [7]. In that work, species-specific primers were designed from cytochrome b sequences to amplify short DNA fragments (74-83 base pairs). Amplification success reached 75%, and a time series showed that over 50% of samples still amplified 10 weeks after the animal had browsed [8]. A logical next step is to extend this method to determine sex-specific browsing patterns.
Some studies have determined sex from eDNA samples using traditional fragment-based methods [11,12]. Forensic applications have used single nucleotide polymorphisms (SNPs) for sexing samples, but few eDNA applications use SNPs. The higher sensitivity of some SNP microarray platforms suggests that this approach may be better suited to the low quantities of DNA in browse samples. Genetic analyses of environmental samples face the challenge of obtaining DNA of sufficient quality and quantity for successful amplification, because eDNA is often degraded. Depending on the degree of degradation and the fragment lengths, some fragments may fail to amplify. This results in allelic drop-out, where one or both alleles are not detected because no successful amplification event occurs [13]. In addition, eDNA samples often yield low quantities of DNA. For fragment-based methods, this increases the risk of incorrect genotyping due to stochastic misprinting events (e.g., polymerase slippage on tandem repeats), and it further increases the risk of drop-out. Moreover, unlike analyses based on mitochondrial DNA (mtDNA), sexing requires nuclear DNA, which is far less abundant than mtDNA. Thus, the problems of allelic drop-out and misprinting increase dramatically with classical DNA-based sexing methods [14]. SNP markers require shorter amplicons than other markers, and running them on platforms with single-copy detection thresholds greatly reduces these problems. This typically makes microfluidic SNP systems more suitable for low-template DNA samples [15,16].
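The dependence of drop-out on template quantity can be made concrete with a simple model. The Python sketch below assumes, as an illustration rather than a result of this study, that the number of template copies pipetted into a reaction follows a Poisson distribution with mean `mean_copies`; a reaction that receives no copy cannot amplify, so its drop-out probability is exp(-mean_copies). The function names and copy numbers are hypothetical.

```python
import math


def dropout_probability(mean_copies: float) -> float:
    """P(a reaction receives zero template copies) under a Poisson model."""
    return math.exp(-mean_copies)


def all_replicates_fail(mean_copies: float, replicates: int) -> float:
    """P(every one of `replicates` independent reactions receives no template)."""
    return dropout_probability(mean_copies) ** replicates


# Hypothetical extract: on average 0.5 copies of a single-copy nuclear locus
# per reaction, versus 50 copies of a mitochondrial locus.
for label, lam in [("nuclear", 0.5), ("mitochondrial", 50.0)]:
    print(f"{label}: per-reaction drop-out {dropout_probability(lam):.3f}, "
          f"all 4 replicates fail {all_replicates_fail(lam, 4):.3g}")
```

Under this model, a locus present in many copies, such as mtDNA, almost never drops out, whereas a single-copy nuclear locus at low template amounts fails in most reactions; replicate reactions reduce, but do not eliminate, the chance of missing it entirely.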
Here, we present a novel method for determining the species and sex of unknown browsing samples using biteDNA and SNPs. We developed mitochondrial and sex-specific SNPs for four temperate ungulate species: moose (Alces alces), roe deer (Capreolus capreolus), fallow deer (Cervus dama), and red deer (Cervus elaphus). We ran all trials on a Fluidigm® 96.96 Dynamic Array Integrated Fluidic Circuit (Fluidigm, San Francisco, CA, USA), which offers single-copy detection and allowed us to use a multiple-tube approach to replicate our assays.

Samples
Tissue samples from red deer, roe deer, moose, and fallow deer were collected by hunters or from animals that died of other causes (e.g., natural death or vehicle collisions) in Sweden. DNA from all these samples was originally extracted and analyzed for other studies [17,18]. We were given full permission to use the extracts. We used the extracts both for sequencing and for validating the SNP array. No animals were killed or injured for the purposes of this study.

Amelogenin Gene Sequencing
To obtain X- and Y-chromosome SNPs that were conserved across the tested species, we sequenced portions of the X- and Y-chromosome copies of the Amelogenin gene (Amel-X and Amel-Y) using primers from Gurgul et al. [19]. Note that these SNPs were developed to detect the presence of a Y-chromosome, so they may be monomorphic within or between species without losing their diagnostic value (the same applies to our mitochondrial markers). As a quality control, we also included X-chromosome SNPs, because the X-chromosome is present in every individual. We sequenced four males and four females of each species. We used the same primers as Gurgul et al. [19] but changed the chemistry and thermal profile of the polymerase chain reaction (PCR). We used the Qiagen® Multiplex kit (Qiagen, Hilden, Germany) with the standard chemical recipe, although adding Q-solution increased amplification success for Amel-X but not for Amel-Y. Thermal profiles followed touchdown protocols starting at 60 °C and decreasing by 0.5 °C each cycle until the final temperature was reached. For all Amel-X amplifications except roe deer, 30 additional cycles were run at the final temperature (50 °C). For all Amel-Y amplifications and for roe deer Amel-X, 35 additional cycles were run at the final temperature (52 °C). All Amel-X reactions produced a single fragment (confirmed by gel electrophoresis), so these PCR products were sent for sequencing without purification. Despite attempts to optimize the Amel-Y reactions, they still produced multiple fragments, so we cut the bands of the expected size (around 700 bp) from agarose gels, extracted them using the QIAquick Gel Extraction kit, and sent them for Sanger sequencing using the BigDye Terminator 3.1 Kit. We aligned the Amel-X and Amel-Y sequences in BioEdit (v.7.3.1.0) and deposited the sequences for each species in GenBank under the accession number series KJ542359-KJ542366.
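The touchdown programs above can be written out cycle by cycle. The Python sketch below is one reading of the protocol, assuming that the annealing temperature drops by 0.5 °C per cycle until the next step would reach the final temperature, which is then held for the stated number of extra cycles; whether the original program counted the final temperature among the touchdown cycles is not stated, and the function name is hypothetical.

```python
def touchdown_schedule(start_c: float, final_c: float, step_c: float,
                       hold_cycles: int) -> list[float]:
    """Annealing temperature (°C) for each cycle of a touchdown PCR program.

    Assumes the temperature drops by `step_c` per cycle from `start_c` and
    stops one step above `final_c`, which is then held for `hold_cycles`.
    """
    # Work in tenths of a degree so 0.5 °C steps stay exact integers.
    start, final, step = (round(x * 10) for x in (start_c, final_c, step_c))
    touchdown = list(range(start, final, -step))  # excludes the final value
    hold = [final] * hold_cycles
    return [t / 10 for t in touchdown + hold]


# Amel-X (all species except roe deer): 60 -> 50 °C, then 30 cycles at 50 °C.
amel_x = touchdown_schedule(60, 50, 0.5, 30)
# Amel-Y (and roe deer Amel-X): 60 -> 52 °C, then 35 cycles at 52 °C.
amel_y = touchdown_schedule(60, 52, 0.5, 35)
print(len(amel_x), len(amel_y))  # 50 51
```

Under this reading, the Amel-X program runs 20 touchdown cycles plus 30 hold cycles, and the Amel-Y program 16 touchdown cycles plus 35 hold cycles.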

SNP Identification and Design
After aligning the sequences in BioEdit, we identified SNPs by visual inspection, for both the mitochondrial and the Amelogenin sequences. We designed the assays using Fluidigm®'s custom SNPtype assay design criteria. The sequences used for SNP development and the resulting primer sequences are provided in the supplementary material.
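We identified SNPs by eye, but the same screening can be expressed programmatically. The Python sketch below is not the procedure used in this study: it assumes the sequences are already aligned, equal-length, upper-case strings grouped by category (species, or Amel-X versus Amel-Y), and it reports alignment columns where every group is monomorphic and at least two groups carry different bases. Gaps and ambiguity codes are treated as missing data; the function name and toy sequences are hypothetical.

```python
VALID = set("ACGT")


def diagnostic_sites(groups: dict[str, list[str]]) -> dict[int, dict[str, str]]:
    """Find alignment columns that separate groups of aligned sequences.

    A column is reported when every group has exactly one observed base
    (ignoring gaps and ambiguity codes) and not all groups share that base.
    """
    length = len(next(iter(groups.values()))[0])
    sites = {}
    for pos in range(length):
        alleles = {}
        for name, seqs in groups.items():
            bases = {s[pos] for s in seqs if s[pos] in VALID}
            if len(bases) != 1:
                break  # group is polymorphic or has no data at this column
            alleles[name] = bases.pop()
        else:  # runs only if no group triggered `break`
            if len(set(alleles.values())) > 1:
                sites[pos] = alleles
    return sites


# Hypothetical toy alignment of Amel-X and Amel-Y fragments.
toy = {
    "Amel-X": ["ACGTA-C", "ACGTAGC"],
    "Amel-Y": ["ACCTGGC", "ACCTAGC"],
}
print(diagnostic_sites(toy))  # {2: {'Amel-X': 'G', 'Amel-Y': 'C'}}
```

Column 4 is skipped because the Amel-Y sequences disagree there, so it would not make a reliable presence-absence marker.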

SNP Arrays
We tested all SNP assays on a BioMark system with a Fluidigm® 96.96 Dynamic Array Integrated Fluidic Circuit, following the manufacturer's protocol. We used four replicates for each assay. For all DNA samples, we used the pre-amplification step to enrich the target (14 cycles for high-quality DNA samples and 30 cycles for all browsed twig samples). We used four high-quality (tissue) DNA samples per species (two males and two females). We tested eight browsed twig samples collected from the field and extracted in May 2013, which we considered fresh samples. We also tested 32 known browsing samples collected at different time points after browsing at Lycksele Zoo. These zoo samples were collected and extracted in March 2010, and some were also used in a time-series experiment spanning 30 March to 19 August 2010, when we optimized the species-specific primers [8]. For the zoo and time-series samples, we recorded species but not sex. We also tested 27 unknown field samples collected in November 2010 and extracted in July 2011. Finally, we included three no-template control (NTC) blanks, which remained blank after analysis. Collection and extraction protocols for all browsed twig samples are described in Nichols et al. [8].

Genotyping
Taberlet et al. [14] developed an early guide for assigning genotypes to samples with very low quantities of DNA. However, it was written when microsatellites were the standard for genotyping. SNP diagnostics are fundamentally different from fragment-based microsatellite approaches, so some problems inherent to microsatellites, such as misprinting, may not apply to SNP genotyping. In microsatellite applications, misprinting arises from polymerase slippage during PCR amplification, which cannot occur in SNP genotyping. Instead, imprecise clustering of SNP genotypes may produce the same outcome as misprinting. However, unlike the approach in other genotyping guides [15], our assignment of males and females did not depend on allelic differences: we designed the SNPs as presence-absence markers for the X- and Y-chromosomes separately, and we replicated each assay four times.
A mitochondrial or Y-marker was considered present only if at least three of the four replicates were positive; an X-marker required all four replicates to be positive. A sample was assigned as male if at least one Y-marker was present, and as female if no Y-marker and at least one X-marker were present. Our reasoning for these criteria is as follows. Because DNA deposited at browse bites eventually deteriorates, the absence of Y-markers or of X-markers cannot by itself be used to determine sex. However, if X-markers are present, the probability of a false negative male (equivalently, a false positive female) can be assessed, that is, the chance that a male sample is incorrectly assigned as female. For low-quality samples, pipetting may transfer an incomplete set of the genome to the PCR reaction due to stochastic processes. Given that the Y- and X-chromosomes occur in equal numbers in a male, the chance of amplifying one rather than the other is 50% (assuming equal transfer and amplifiability). Hence, the probability that a male sample displays only X-markers is $0.5^{R}$, where $R$ is the number of independent reactions. While only additional replicates can reveal which individual samples are incorrectly labeled as female, the numbers of males and females in the dataset can be corrected. Note that because four positive amplifications of X-markers are required, the chance of calling a true male a female is only $0.5^{4} = 6.25\%$.
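The calling rules and the misassignment probability above can be summarized in a short Python sketch. It is an illustration, not the software used in this study: the function names are hypothetical, the inputs are counts of positive replicates (out of four) per marker, and `corrected_counts` assumes the simplified model in the text, in which each true male is called female with probability $p = 0.5^{R}$ and the observed counts equal their expected values.

```python
# Positive replicates (out of four) required to call a marker present, as in
# the text: mitochondrial ("mt") and Y ("ay") markers need three, X ("ax") four.
THRESHOLDS = {"mt": 3, "ay": 3, "ax": 4}


def is_present(marker_type: str, positive_replicates: int) -> bool:
    """True if enough replicates were positive for this marker type."""
    return positive_replicates >= THRESHOLDS[marker_type]


def assign_sex(x_counts: dict[str, int], y_counts: dict[str, int]) -> str:
    """Apply the sex rules to positive-replicate counts per marker."""
    if any(is_present("ay", n) for n in y_counts.values()):
        return "male"  # at least one Y-marker present
    if any(is_present("ax", n) for n in x_counts.values()):
        return "female"  # no Y-marker, at least one X-marker present
    return "undetermined"  # neither chromosome reliably detected


def false_female_probability(replicates: int) -> float:
    """Chance a male shows only X-markers across R reactions: 0.5 ** R."""
    return 0.5 ** replicates


def corrected_counts(males_obs: int, females_obs: int,
                     replicates: int) -> tuple[float, float]:
    """Expected true numbers of males and females (hypothetical helper).

    Assumes observed males = M * (1 - p) and observed females = F + M * p,
    with p = false_female_probability(replicates).
    """
    p = false_female_probability(replicates)
    males = males_obs / (1 - p)
    return males, females_obs - males * p


print(assign_sex({"Ce02ax": 4}, {"Ce10ay": 3}))  # male
print(assign_sex({"Ce02ax": 4}, {"Ce10ay": 2}))  # female
print(false_female_probability(4))  # 0.0625
print(corrected_counts(15, 11, 4))  # (16.0, 10.0); counts are hypothetical
```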

SNP Design and Utility
We initially designed nine mitochondrial SNPs, but after testing, only four reliably identified species (Ce16mt, Ce17mt, Ce18mt, and Ce19mt). Table 1 shows the resulting species-specific alleles. We initially designed seven Amel-X SNPs and seven Amel-Y SNPs. After testing, three Amel-X SNPs performed reliably in all species (Ce02ax, Ce03ax, and Ce04ax). The performance of the Amel-Y SNPs differed by species and sex, but the most successful were Ce10ay, Ce11ay, and Ce12ay. Using the high-quality DNA samples, we determined that Ce10ay reliably distinguished males from females in moose, roe deer, and red deer; Ce11ay distinguished males from females in moose and roe deer; and Ce12ay distinguished males from females in roe deer and red deer. None of the markers reliably sexed fallow deer samples. Table 1 also shows the sex-specific alleles.
Table 1. Single Nucleotide Polymorphism (SNP) genotyping for species and sex identification. Ce19mt, Ce18mt, Ce17mt, and Ce16mt show the bases that correspond to the different species they identify. Ce04ax, Ce03ax, and Ce02ax show the bases specific to Amel-X SNPs. Ce12ay, Ce11ay, and Ce10ay show the bases specific to Amel-Y SNPs. Amel-X and Amel-Y SNPs are designed as presence-absence markers. A blank indicates that the SNP will not give a result in that species/sex. The bottom row shows drop-out rates (in percentages).
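Species assignment with drop-outs amounts to asking which species in the reference table are consistent with the markers that did amplify. The Python sketch below illustrates this; the reference alleles are hypothetical placeholders, not the alleles in Table 1, and the function name is ours.

```python
from typing import Optional


def compatible_species(observed: dict[str, Optional[str]],
                       reference: dict[str, dict[str, str]]) -> set[str]:
    """Species whose reference alleles match every marker that gave a call.

    Markers that dropped out (None) are skipped. Provided the calls that did
    amplify are correct, drop-outs can only widen the candidate set.
    """
    return {
        species
        for species, alleles in reference.items()
        if all(call is None or alleles.get(marker) == call
               for marker, call in observed.items())
    }


# Hypothetical reference alleles for three markers; NOT the values in Table 1.
REFERENCE = {
    "species A": {"m1": "1", "m2": "1", "m3": "2"},
    "species B": {"m1": "1", "m2": "2", "m3": "2"},
    "species C": {"m1": "2", "m2": "2", "m3": "1"},
}

print(compatible_species({"m1": "1", "m2": "2", "m3": "2"}, REFERENCE))  # {'species B'}
print(sorted(compatible_species({"m1": "1", "m2": None, "m3": None}, REFERENCE)))
# ['species A', 'species B']
```

With all three markers, only one candidate remains; when two markers drop out, the call widens to two species, the same kind of ambiguity reported for samples in which Ce18mt and Ce19mt dropped out.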

SNP Genotyping
Using the SNPs in Table 1, we successfully identified the species in seven of eight freshly browsed twig samples (Table 2). The eighth sample remained ambiguous, but it could be narrowed down to either moose or red deer (Ce18mt and Ce19mt both dropped out). Of the 32 known browsing samples, 30 amplified and 28 gave the expected genotype. Thirteen of the 27 unknown field samples gave clear species genotypes, two of which differed from the species assigned using the previous method. These samples had been stored at very low concentrations for an extended period, which may have caused the DNA to fragment further, leading to poor resolution in the analyses [19]. All results are summarized in Table 2.
Table 2. Genotyping by species using the coding in Table 1. "Fresh" indicates samples collected from the field and extracted just prior to SNP analysis. "Known" indicates samples collected from animals at the zoo. "Field" indicates samples collected from the field, extracted, and stored in the freezer for two years. "Ambiguous" means that some markers amplified, but not enough to distinguish a species.
For the mitochondrial SNPs, drop-out rates averaged 23.7% across all browsed twig samples, but they differed among SNPs (Table 2). Ce19mt was the mitochondrial marker most likely to drop out, and Ce16mt the least likely. Drop-out rates for the Amel-X markers averaged 73% across the three markers. For Ce04ax, however, we observed misprinting of the second allele. This SNP was designed from a monomorphic locus, so the first allele signified the presence of the X-chromosome, whereas the second allele should not exist according to our sequence data. This misprinting occurred in 17% of browsed twig sample reactions.
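The text does not spell out how drop-out rates were computed. One plausible definition, assumed here, is the percentage of replicate reactions in which a marker gave no call. The Python sketch below computes it per marker from hypothetical replicate calls; the data layout and function name are illustrative only.

```python
from typing import Optional


def dropout_rates(calls: dict[str, dict[str, list[Optional[str]]]]) -> dict[str, float]:
    """Percentage of replicate reactions with no call, per marker.

    calls[sample][marker] lists one entry per replicate reaction; None marks
    a reaction in which the marker gave no call (a drop-out).
    """
    failed: dict[str, int] = {}
    total: dict[str, int] = {}
    for per_marker in calls.values():
        for marker, replicates in per_marker.items():
            total[marker] = total.get(marker, 0) + len(replicates)
            failed[marker] = failed.get(marker, 0) + replicates.count(None)
    return {m: 100 * failed[m] / total[m] for m in total}


# Hypothetical replicate calls for two browsed twig samples and two markers.
calls = {
    "twig1": {"m1": ["1", "1", "1", None], "m2": ["2", None, None, None]},
    "twig2": {"m1": ["1", "1", "1", "1"], "m2": ["2", "2", "2", "2"]},
}
rates = dropout_rates(calls)
mean_rate = sum(rates.values()) / len(rates)
print(rates, mean_rate)  # {'m1': 12.5, 'm2': 37.5} 25.0
```

Averaging the per-marker rates, as here, weights every marker equally; pooling all reactions instead would weight markers by how often they were run.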

Sex
We were able to assign sex in 28% of our browsed twig samples. The number of positive sex identifications was highest in the first few days after browsing (Table 3), whereas the number of positive species identifications declined less over time. Nevertheless, we still made some positive sex identifications up to 145 days after browsing. For sex identification to be effective, we recommend collecting browsed bites as fresh as possible.
Table 3. Genotyping successes categorized by days after browsing. Zero days after browsing refers to the zoo samples, for which ungulates were allowed to take bites from twigs and the remaining browsed twig with saliva was then taken to the lab. Positive values refer to samples from a time-series experiment in which ungulates browsed twigs that were then set up in an enclosure mimicking a forest stand and sampled at regular intervals. "Unknown" indicates that the samples came from the field.

Discussion
This SNP assay allows identification of species and sex from eDNA samples. For freshly collected and extracted field samples, the success rate for species identification was 87.5%, and for the time-series samples it was 97%, with species successfully identified up to 155 days after browsing. As expected, our sex-specific assays showed lower amplification rates than the mitochondrial markers; the overall success rate for sexing was 28% (Table 3). A lower success rate for nuclear than for mitochondrial markers is expected because mitochondrial DNA is far more abundant per cell than nuclear DNA, yet we have shown that nuclear DNA can be detected by SNP genotyping in a substantial proportion of samples. This demonstrates that nuclear DNA is present in these samples, which may allow individual genotypes to be determined from eDNA left during browsing. Some of our older samples may have been less successful because they had been collected and extracted up to two years before we applied the SNP protocols, and DNA stored at low concentrations tends to fragment faster than DNA stored at higher concentrations [20]. Some of our samples showed allelic drop-outs, which affected our sexing protocol much more than species identification. Among the mitochondrial markers, Ce19mt was the most likely to drop out, but when it was the only mitochondrial locus that dropped out, the remaining markers still identified the correct species. The same applied when only Ce18mt or only Ce16mt dropped out. If Ce17mt dropped out, however, we could not differentiate between moose and fallow deer. Additional markers would reduce these ambiguities. How to deal with allelic drop-outs and misprinting is a subject of debate [21]. The fundamental problem of sexing is that females are identified by the absence of a PCR product from the Y-chromosome. Because we work with eDNA samples, males whose Y-chromosome fails to amplify could be incorrectly identified as females. To gauge the risk of false assignments, we calculated the probability of the Y-chromosome dropping out, which gives a measure of certainty when sexing. However, without known-sex browsed twig samples, we cannot test this empirically. In addition, increasing the number of SNPs would increase the certainty of genotyping, which requires sequencing additional regions on the X- and Y-chromosomes. For Ce04ax, allele two was not observed in our original sequence data, but it appeared in 17% of browsed twig sample reactions. This may reflect misprinting at this locus; alternatively, the allele may simply have been missed in our initial SNP screening. Additional sequencing may reveal whether this allele occurs in other samples. We could not empirically determine drop-out or misprinting rates for the Amel-Y markers because the sex of the biteDNA samples was not known.
In summary, the species-level SNP loci performed well in the tested samples, with the highest success in the higher-quality DNA samples. For biteDNA, the method works best for very recently browsed samples. Because of the possibility of drop-out, however, additional species-level SNPs could be developed to differentiate fallow deer from moose in especially challenging samples. For the sex-level SNPs, further testing could determine how well the assays differentiate sex in other ungulate species. In addition, testing with known-sex browsed twig samples could be used to estimate drop-out rates in Amel-Y.
This study presents a new DNA-based technique for identifying ungulate species and sex from browsed twig samples.It highlights the increased sensitivity of SNPs over classical fragment-based approaches and provides evidence that individual genotyping of these samples is possible.With the addition of more SNPs for identifying individuals, we could increase the amount of information per sample, providing a new tool for monitoring species abundances, sex ratios, and genetic diversity using non-invasive genetic samples.