Sequencing Red Fox Y Chromosome Fragments to Develop Phylogenetically Informative SNP Markers and Glimpse Male-Specific Trans-Pacific Phylogeography

The red fox (Vulpes vulpes) has a wide global distribution with many ecotypes and has been bred in captivity for various traits, making it a useful evolutionary model system. The Y chromosome represents one of the most informative markers of phylogeography, yet it has not been well-studied in the red fox due to a lack of the necessary genomic resources. We used a target capture approach to sequence a portion of the red fox Y chromosome in a geographically diverse red fox sample, along with other canid species, to develop single nucleotide polymorphism (SNP) markers, 13 of which we validated for use in subsequent studies. Phylogenetic analyses of the Y chromosome sequences, including calibration to outgroups, confirmed previous estimates of the timing of two intercontinental exchanges of red foxes, the initial colonization of North America from Eurasia approximately half a million years ago and a subsequent continental exchange before the last Pleistocene glaciation (~100,000 years ago). However, in contrast to mtDNA, which showed unidirectional transfer from Eurasia to North America prior to the last glaciation, the Y chromosome appears to have been transferred from North America to Eurasia during this period. Additional sampling is needed to confirm this pattern and to further clarify red fox Y chromosome phylogeography.


Introduction
The phylogeography of red fox (Vulpes vulpes) has been well-characterized in terms of mitochondrial and nuclear patterns of diversity over most of its global distribution [1][2][3][4][5][6][7]. For example, the most divergent mitochondrial split dates to approximately half a million years ago and separates a clade that evolved in North America south of the ice sheets prior to the last glaciation (Nearctic clade) and one that spans Eurasia and Alaska-western Canada, reflecting a secondary continental exchange event~100 thousand years ago (ky) [2,4]. However, little is known about red fox Y chromosome diversity. Except for microsatellites, no markers exist for such investigations in red foxes [4,8]. Although microsatellites are useful for populationgenetic questions about contemporary gene flow and recent historical demography, more conserved mutations are needed for deeper reconstruction of Y chromosome phylogenies.
Sequencing the Y chromosome is challenging, even in model species, due to an evolutionarily plastic history that has resulted in a high frequency of repetitive DNA, paralogs of autosomal and X chromosomal regions, and palindromes [9][10][11]. The challenge is even more pronounced in non-model organisms due to a lack of reference genomes. Notwithstanding these obstacles, it is feasible to enrich libraries for DNA resembling Y chromosome sequences from other mammals, sequence these enriched libraries and use read depth as a criterion for distinguishing unique loci from repetitive motifs [10,12]. As a final confirmation, primers can be designed to genotype the presumptive unique loci and to test them in male and female individuals [13]. Those that turn out to be male-specific and provide no more than one allele per individual can be inferred to be male-specific Y chromosome markers. Because the entire male-specific length of the Y chromosome is linked, these markers can be used in tandem, regardless of their relative locations on the chromosome, as multi-single nucleotide polymorphism (SNP) haplotypes to reconstruct phylogenetic topologies.
In this study, we used information from the dog genome to enrich genomic fox libraries for Y chromosome DNA and sequenced the libraries in a diverse geographic sample of red fox to develop Y chromosome SNP markers and provide a preliminary look at male-specific phylogeography. Although dogs and red foxes are relatively close relatives, they have distinct karyotypes; sex chromosomes, in particular, are evolutionarily unstable [11]. Therefore, we emphasized use of Y chromosome regions that were conserved across multiple mammalian orders [13] and validated SNP markers on male and female red foxes. To obtain a red fox Y chromosome tree and divergence time estimates for major Y chromosome clades, we conducted a phylogenetic analysis rooting and calibrated branch lengths among red fox Y chromosome sequences, as well as those of the most basal extant canids (Urocyon spp.) and of coyotes (Canis latrans).

Samples
For high-throughput sequencing, we used 18 male red fox samples (Figure 1), along with 2 gray foxes (Urocyon cinereoargenteus), an island fox (U. littoralis) and 6 coyotes (Canis latrans) as outgroups (Table S1). These samples were used for SNP discovery and phylogenetic analyses. To validate a genotyping assay (see below), we used 9 of the same red fox males (i.e., to confirm that the assay produced the correct genotypes) and 19 red fox females. We also genotyped one additional red fox male from Alaska. Sequencing the Y chromosome is challenging, even in model species, due to an evolutionarily plastic history that has resulted in a high frequency of repetitive DNA, paralogs of autosomal and X chromosomal regions, and palindromes [9][10][11]. The challenge is even more pronounced in non-model organisms due to a lack of reference genomes. Notwithstanding these obstacles, it is feasible to enrich libraries for DNA resembling Y chromosome sequences from other mammals, sequence these enriched libraries and use read depth as a criterion for distinguishing unique loci from repetitive motifs [10,12]. As a final confirmation, primers can be designed to genotype the presumptive unique loci and to test them in male and female individuals [13]. Those that turn out to be male-specific and provide no more than one allele per individual can be inferred to be male-specific Y chromosome markers. Because the entire male-specific length of the Y chromosome is linked, these markers can be used in tandem, regardless of their relative locations on the chromosome, as multi-single nucleotide polymorphism (SNP) haplotypes to reconstruct phylogenetic topologies.
In this study, we used information from the dog genome to enrich genomic fox libraries for Y chromosome DNA and sequenced the libraries in a diverse geographic sample of red fox to develop Y chromosome SNP markers and provide a preliminary look at male-specific phylogeography. Although dogs and red foxes are relatively close relatives, they have distinct karyotypes; sex chromosomes, in particular, are evolutionarily unstable [11]. Therefore, we emphasized use of Y chromosome regions that were conserved across multiple mammalian orders [13] and validated SNP markers on male and female red foxes. To obtain a red fox Y chromosome tree and divergence time estimates for major Y chromosome clades, we conducted a phylogenetic analysis rooting and calibrated branch lengths among red fox Y chromosome sequences, as well as those of the most basal extant canids (Urocyon spp.) and of coyotes (Canis latrans).

Samples
For high-throughput sequencing, we used 18 male red fox samples (Figure 1), along with 2 gray foxes (Urocyon cinereoargenteus), an island fox (U. littoralis) and 6 coyotes (Canis latrans) as outgroups (Table S1). These samples were used for SNP discovery and phylogenetic analyses. To validate a genotyping assay (see below), we used 9 of the same red fox males (i.e., to confirm that the assay produced the correct genotypes) and 19 red fox females. We also genotyped one additional red fox male from Alaska.  . One sample from Alaska was genotyped for selected single nucleotide polymorphisms (SNPs) but not sequenced. Numbers refer to the sample size indicated by a marker. Markers with no numbers represent a single sample.

Capture Enrichment and Sequencing
We constructed individually barcoded whole-genome shotgun libraries using the NEB-Next Ultra DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) and 6-bp custom indexed primers. We then pooled libraries (10 ng each) in batches of 47 samples (which included samples for another project) for hybridization to capture baits. Although the entire dog Y chromosome assembly is now available [14], we began this study just prior to its publication and therefore relied on a smaller subset of canine Y chromosome fragments totaling 39,246 bp. However, we subsequently mapped our markers to the canine Y chromosome assembly (GenBank Accession No. KP081776.1) and present that information. We designed baits that were complementary to previously published and validated dog Y chromosome DNA sequences (Supplementary Information 1) that had been prescreened to minimize paralogy to X chromosome sequences from canids and other mammals [13,[15][16][17]. We also included capture baits for exons from the Y chromosome amelogenin (AMELY) and zinc finger (ZFY) genes, which have high paralogy with the X chromosome but could have unique regions as well. Baits were 80 bp each in length and overlapped adjacent baits by 40 bp (i.e., tiling density = 2 ×), as appropriate for capture between closely related species. The RNA baits were synthesized and biotinylated by MyBaits (Arbor Biosciences, Ann Arbor, MI, USA). We conducted the hybridization with blockers for 36 h at 65 • C according to the manufacturer's protocols, using the MyBaits Target Enrichment Kit (Arbor Biosciences, Ann Arbor, MI, USA) We bound hybridization products to streptavidin-coated magnetic beads, washed away the unbound library, isolated the bound (i.e., enriched) library from baits and beads, and PCR-amplified the enriched library for 14 cycles. We then sequenced the pooled enriched library on an Illumina MiSeq lane SR300 at the UC Davis Genome Center. Additionally, we included HiSeq2500 reads (PE 150) from 2 male red fox whole-genome shotgun libraries sequenced for a different project [18].

Bioinformatic Processing
We trimmed adapter sequences and filtered reads for quality, discarding reads with >50% of bases with quality scores < 2 and trimmed low-quality 3 end bases using ngsShoRT [19]. We aligned reads to the original bait sequences using bwa-mem [20] and processed the alignments with Samtools [21]. We then used freeBayes v0.9.21 [22] to call variants against the original bait sequence references, allowing "heterozygotes" to flag paralogs downstream, and exported all positions for each individual (variable and otherwise) in variable call format (vcf). Because we had no independent knowledge of X-degenerate (uniquely mapping) red fox or Urocyon Y chromosome sequences, we used coverage depth as a basis for selecting positions empirically. The distribution of coverage depths within a species contained multiple modes, which we presumed to correspond on the low end to erroneous sites (e.g., exogenous contamination), followed by uniquely aligning sites, sites aligning to both sex chromosomes and, lastly, a long tail corresponding to variable frequency repetitive motifs. Therefore, after visualizing the distributions for red foxes (see Results), we selected a coverage depth range encompassing the second mode (10-160 × across all foxes). We used these sites to obtain intraspecific SNPs.

Phylogenetic Analysis
We further filtered the sites inferred above to be uniquely aligning to the red fox Y chromosome based on the same criteria described above. We also removed any interspecific indels to obtain Y chromosome sequences that were orthologous across all taxa for calibrating substitution rates against independently determined node ages [23][24][25]. We used Mega 6 [26] to construct a maximum likelihood tree with 500 bootstrap replicates based on the Tamura 3 parameter mutation model (T92) [27], which was supported over 23 other models by the lowest Bayesian Information Criterion. To estimate the divergence time between red fox clades, we computed pairwise T92 distances and standard errors between red fox clades and calibrated these to pairwise distances to coyote and gray fox, previously estimated to be 8.8 millions of years (MY) divergent from red foxes [25].

SNP Assay
We used Assay Design Suite 2.0 (https://mysequenom.com/Tools) to design primers for multiplexing on the MassARRAY iPLEX platform (Agena Biosciences Inc., San Diego, CA, USA). To validate these SNPs as uniquely representing the non-recombining portion of the Y chromosome, we tested them in both male and female foxes and retained only those yielding male-specific genotypes.

Intraspecific Red Fox Network
After combining sequencing and genotyping calls, a small number of remaining missing calls were imputed parsimoniously with the allele found in otherwise identical complete sequences. We constructed statistical parsimony network using the median joining method in Networks (v 5.0) [28,29] and estimated the age of clades in terms of the average (and standard error) numbers of mutations separating derived nodes from inferred ancestral nodes [29,30].

Results
Excluding one red fox sample that failed sequencing, we obtained 6,486,865 raw 300-bp reads from a MiSeq lane of the 23 target-enriched samples, of which 2,393,840 (36.8%) reads aligned back to the original bait sequences (i.e., 41 targeted fragments of the dog Y chromosome). This total included 1,342,241 of 3,304,125 reads (40.6%) from 14 red foxes, 738,554 of 2,296,795 reads (32.2%) from six coyotes and 313,045 of 898,307 reads (34.8%) from three Urocyon foxes. We also obtained 86,687,990 bp of HiSeq sequence for two additional red foxes, of which 1,835,416 bp (2.1%) aligned to the bait sequences. The ratio of targeted sequence obtained from the capture enrichment vs. whole genome shotgun approaches for red foxes (i.e., 40.6% vs. 2.1%) indicates that the enrichment process increased yield approximately 20-fold over random sequencing.
In total, we recovered 37,080 sites, of which 31,624 (85.3%) had coverage depths in the range we presumed to correspond to X-degenerate sites on the Y chromosome (10-160×) and no heterozygous sites in red foxes ( Figure 2). For phylogenetic analyses, we retained 20,931 of these sites after removing interspecific indels and confirming that read depths in the other species were also consistent with uniquely aligning sites. These  (Table S3). Only sites with depths ranging from 10 to 160 × were used in analyses. Sites with depths of < 10 × were presumed to reflect errors and those > 160 × were presumed to reflect paralogs, not necessarily all on the Y chromosome.  (14 Ti,9 Tv) in the six coyotes. A maximum likelihood tree indicated two well-supported, reciprocally monophyletic red fox clades (Figure 3). Based on the Tamura 3 parameter model distances (and SE) calibrated to the Urocyon and coyote divergence times (8.8 MY [25]), the two red fox Y chromosome clades diverged approximately 470,000 (402,000-563,000) years ago, which is consistent with mitochondrial and autosomal estimates [4].

Figure 2.
Coverage depths from sequencing reads of 19 red foxes mapped to 37,080 sites of 41 orthologous dog Y chromosome fragments shown (A) with read depths arrayed across ordered sites on the reference fragments and (B) as a frequency distribution, illustrating a systematic pattern of coverage presumably corresponding to uniquely mapping Y chromosome sites (10-160 ×), sites likely to reflect two paralogs (including one on the X chromosome, 161-210 ×) and sites with > 2 paralogs (> 210 ×). Only sites with depths ranging from 10 to 160 × were used in analyses. Sites with depths of < 10 × were presumed to reflect errors and those > 160 × were presumed to reflect paralogs, not necessarily all on the Y chromosome.  Red fox Y chromosome phylogeny displayed as a maximum likelihood tree inferred from 20,931 bp of sequence in 17 red foxes, 2 gray foxes, an island fox and 6 coyotes with node support inferred from 500 bootstrapped trees. Genetic distance was computed according to the Tamura 3-parameter model [27] and calibrated to divergence time in millions of years (MY). The locations of red fox samples are shown in Table 1. Table 1. Y chromosome haplotypes of 19 male red foxes and an ancestral node inferred from orthologous positions in Canis and Urocyon, which are composed of 31 variable sites, discovered 31,624 bp of the Y chromosome sequence. We also designed a genotyping assay for 13 of the loci (first 13 from left to right). Eighteen male red foxes were sequenced, including nine foxes that were also genotyped (*) and one fox that was genotyped but not sequenced (**). One locus (20_349AT) that was amplified in 14 of 19 females was excluded from this table. None of the other 13 genotyped loci was consistently amplified in females, although we observed a low (3.5%) false-positive rate. Genotype and sequencing calls matched 100% of the time. Two foxes that were genotyped but not sequenced (or not successfully sequenced, i.e., S12-1163) have missing data (X) across all non-assayed sites.

--A T T C --G -C T ----C -A -C --T G T G G G --
Alaska, USA S12-1163 * --A T T C --G ----X X X X X X X X X X X X X X X X X X Alaska, USA S12-1161 ** --A T T C --G -C --X X X X X X X X X X X X X X X X X X For intraspecific analyses of red foxes, we retained all of the 31,624 presumptive Y chromosome sites. Among these sites, we observed 32 SNPs within red foxes (including the 22 intraspecific sites used above, along with 10 additional sites). From the 22 sites discovered initially, we designed SNP assays for 14 loci that had flanking sequences with sufficiently low complexity to facilitate primer design (Table 1, Tables S2 and S3). All genotypes for the nine males previously sequenced matched those based on the original sequences. All except one of the loci failed to amplify consistently in 19 females. Locus 20_357AT was amplified in 14 of 19 female samples, indicating that the primers we designed were not sufficiently specific to the Y chromosome target. Otherwise, the false-positive rate among the other 13 loci was 3.5% in females, with no single locus or individual yielding notably more than any other. The positions of the 13 validated SNPs were determined based on the Y chromosome dog assembly (Table S3).
We constructed two median joining networks: one using all 31 variable sites (i.e., excluding only 20_349AT, which was amplified in females) in the 17 successfully sequenced red foxes ( Figure 4A), and one using the subset of 13 assayed and validated sites that additionally included two Alaskan male red foxes genotyped using the assay ( Figure 4B and Table 1). The former network provided more resolution of haplotypes, particularly within continents, whereas the latter network included two additional Alaskan foxes with haplotypes that were basal within their clade. For the network based on all 31 sites, the rho estimates (i.e., the average number of mutations since divergence from the most recent common ancestor) were 13.00 (SD = 2.55) for the entire network and 3.82 (SD = 1.48) for the North American-Siberian clade, indicating a ratio of 0.294. For the network based on 13 sites, the rho estimates were 4.80 (SD = 1.55) for the entire network and 1.77 (SD = 0.98) for the North American-Siberian clade, indicating a ratio of 0.369. Assuming a 470,000 year divergence between the two clades, these estimates imply the age of the most recent common ancestor of Siberian and North American red foxes to range from 77,000 to 270,000 years ago based on the 13-site network or 85,000 to 190,000 years ago based on the 31-site network, both of which are consistent with previous mtDNA and autosomal nuclear estimates [4].

Discussion
The Y chromosome is replete with repetitive elements and regions paralogous to other chromosomes, posing several significant obstacles to sequencing even in species for which the Y chromosome genome has been assembled and provides a reference [10]. Our reliance on the dog Y chromosome sequence, which is likely to share only partial homology to red and gray foxes, complicated this task further. Nevertheless, our efforts to enrich fox libraries for Y chromosome reads and obtain sequences for non-repetitive fox Y chromosome fragments were successful.
Overall, we obtained approximately 75% of the targeted Y chromosome sequence as uniquely aligning in male red foxes. Although 1 of the 14 SNPs tested in females yielded regular amplification products, this may have been due to the non-specificity of the primers themselves rather than of the contiguous sequence reflected in the sequencing reads. The read depth at this SNP in the original sequencing reads was 132× among red foxes, which was well within the range of depths observed in most positions and substantially smaller than the range corresponding to a larger mode and is presumed to correspond to

Discussion
The Y chromosome is replete with repetitive elements and regions paralogous to other chromosomes, posing several significant obstacles to sequencing even in species for which the Y chromosome genome has been assembled and provides a reference [10]. Our reliance on the dog Y chromosome sequence, which is likely to share only partial homology to red and gray foxes, complicated this task further. Nevertheless, our efforts to enrich fox libraries for Y chromosome reads and obtain sequences for non-repetitive fox Y chromosome fragments were successful.
Overall, we obtained approximately 75% of the targeted Y chromosome sequence as uniquely aligning in male red foxes. Although 1 of the 14 SNPs tested in females yielded regular amplification products, this may have been due to the non-specificity of the primers themselves rather than of the contiguous sequence reflected in the sequencing reads. The read depth at this SNP in the original sequencing reads was 132× among red foxes, which was well within the range of depths observed in most positions and substantially smaller than the range corresponding to a larger mode and is presumed to correspond to loci with X and Y paralogs. Regardless, 13 of the loci for which we developed a genotyping assay resolved two major clades, as well as several haplotypes within each. We also discovered 18 additional polymorphisms for which SNP assays could be developed and validated in the future. Most of these 18 SNPs would further differentiate the already well-defined major clades (Table 1). More importantly, three intraspecific polymorphisms (i.e., 29_231AC, MS41A_209CT and MS41A_248AG) may be useful in further resolving North American haplotypes.
Given the nearly global distribution of red foxes and many translocations [31], these markers will be useful in complementing mitochondrial markers as a way of easily identifying continent of origin; for example, in the case of introductions (e.g., [32]). Moreover, combining these SNPs with Y-linked microsatellites [8] can elucidate historical demographic patterns on Holocene time scales [29,33].
Although our study was not designed primarily as a phylogeography study, which would require many more samples, our findings nevertheless provided some insights about the red fox's continental Y chromosome phylogeography in the context of previous models of continental exchange between Eurasian and North American red foxes. Most fundamentally, our Y chromosome phylogeny provided independent support for previous estimates of the timing of continental exchanges between red foxes, specifically, an initial colonization of North America by Eurasian red foxes approximately 500 ky and a secondary continental exchange around the beginning of the last glaciation~100 ky [2,4].
Prior to this study, discordance observed between mitochondrial and nuclear genetic patterns resulted in unresolved hypotheses about the nature and extent of the secondary exchange [4,7]. In particular, nearly all of the mitochondrial haplotypes observed in modern red foxes from Alaska and much of northwest Canada traced to Eurasia around the time of the secondary exchange (e.g., [2,4,6]). In contrast, most of the nuclear genetic ancestry in this same region traced to the original colonization of North America 500 ky, with little genetic exchange in either direction corresponding to the secondary contact event [4,7]. These discordant patterns suggested two hypotheses: (1) a selective sweep on Eurasian mtDNA introduced into North American red foxes following the secondary exchange, or (2) a male-mediated expansion of southern North American red foxes after the recession of the two major North American ice sheets after the last glacial maximum (< 20 ky), leading to partial replacement of a Beringian population formerly composed of Eurasian red foxes. These hypotheses differ as to whether the pre-event (continental exchange or expansion) population in Beringia was composed of Eurasian or North American red foxes and in the timing of the event that led to the discordance.
In the present study, we found the major split in the Y chromosome phylogeny dated to 500 ky, presumably reflecting the same secondary intercontinental split seen in nuclear and mitochondrial data. However, one of the clades included all North American, as well as Siberian, samples. The estimated age of the most recent common ancestral haplotype of Siberian and North American haplotypes dated to approximately 100 ky, consistent with Y chromosome movement from North America to Eurasia during the intercontinental exchange. This observation is consistent with Hypothesis 1 but not Hypothesis 2, which implies that North American patrilines did not occur in Beringia until after the second continental exchange~100 ky. Thus, the exclusivity of Eurasian mitochondrial haplotypes, combined with the general rarity of Eurasian nuclear ancestry in modern-day Alaskan red foxes, most likely traces to a selective sweep~100 ky on the mitochondrial genome.
Clearly, our sample size was too small to address the extent to which North American Y chromosomes contributed to East Asian red fox diversity or whether some reciprocal exchange could have occurred. These questions can be investigated further in the future by genotyping larger samples of foxes with the SNPs developed in this study, and in combination with Y chromosome microsatellite markers [4,8].
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-442 5/12/1/97/s1. Supplementary Information 1: Fasta sequences (and Genbank numbers or references) used to synthesize 2 × overlapping 80 bp RNA baits for the canine Y chromosome. Table S1: Information for 19 male (M) and 19 female (F) red fox and 2 male gray fox, 1 male island fox and 6 coyote samples used for Y chromosome sequencing or SNP typing. Table S2: Names and primers sequences of 14 SNPs used in the red fox Y chromosome Sequenom assay, including whether SNPs were used in the current study and reasons for exclusion. Table S3: Flanking sequences and SNP positions from which primers were designed for 14 Y chromosome SNPs.