The Location of the Pseudoautosomal Boundary in Silene latifolia

Y-chromosomes contain a non-recombining region (NRY), and in many organisms it was shown that the NRY expanded over time. How and why the NRY expands remains unclear. Young sex chromosomes, where NRY expansion occurred recently or is on-going, offer an opportunity to study the causes of this process. Here, we used the plant Silene latifolia, where sex chromosomes evolved ~11 million years ago, to study the location of the boundary between the NRY and the recombining pseudoautosomal region (PAR). The previous work devoted to the NRY/PAR boundary in S. latifolia was based on a handful of genes with locations approximately known from the genetic map. Here, we report the analysis of 86 pseudoautosomal and sex-linked genes adjacent to the S. latifolia NRY/PAR boundary to establish the location of the boundary more precisely. We take advantage of the dense genetic map and polymorphism data from wild populations to identify 20 partially sex-linked genes located in the “fuzzy boundary”, that rarely recombines in male meiosis. Genes proximal to this fuzzy boundary show no evidence of recombination in males, while the genes distal to this partially-sex-linked region are actively recombining in males. Our results provide a more accurate location for the PAR boundary in S. latifolia, which will help to elucidate the causes of PAR boundary shifts leading to NRY expansion over time.


Introduction
Sex chromosomes are known to evolve from autosomes (e.g., [1,2]) following acquisition of sex-determining gene(s) and evolution of a non-recombining region around the sex locus (reviewed in [3,4]). Following formation of a non-recombining sex-determining region (SDR), a part of sex chromosomes continues to recombine in heterogametic sex, comprising the so-called pseudoautosomal region (PAR). As the PAR is partially sex-linked, its properties are intermediate between the sex chromosomes and the autosomes, but they also possess some features unique to these peculiar genomic regions [5]. In particular, the recombination rate may be unusually high in this region-e.g., in humans the average recombination rate in the p-arm PAR is at least 10 times higher than the genomic average [6,7], while local recombination in mouse PAR is 100 times higher than the genomic average [8]. This elevated recombination in the PAR is the consequence of X:Y pairing only in this region during male meiosis and the physical size of the region is negatively proportionate to recombination density. Frequent recombination in the PAR may inflate GC-content via biased gene conversion [9] and increase mutation rate in this region [10].
The genes in the PAR show a unique evolutionary dynamic specific to this region [5,11,12]. In particular, the PAR genes closely linked to the SDR are expected to maintain some sequence divergence between the X-and Y-linked alleles, which should inflate polymorphism in the PAR. Furthermore, the genes evolving under sex-specific or sexually antagonistic (SA) selection may accumulate divergence in frequencies of alleles linked to the X and Y chromosomes, which could favor recombination suppression and lead to shrinking of the PAR and expansion of the non-recombining SDR [5]. Indeed, the non-recombining SDRs, such as male-specific region on mammalian Y-chromosome (NRY) or female-specific region on bird W-chromosome (NRW), show a tendency to expand over evolutionary time. In human ancestry, this expansion has occurred in four [13] or five [14] steps, giving rise to so-called "evolutionary strata", with the most recent expansion about 30 million years ago [14]. The analyses of bird W-chromosomes show evidence of multiple independent expansion events in different lineages [2]. NRY (or NRW) expansion has also been reported for other lineages, such as snakes [15], and dioecious plants including the Silene genus [16]. SA selection is often mentioned as the cause of this expansion [17,18], though there is relatively little experimental evidence supporting this view [19,20]. Furthermore, resolving SA does not have to involve NRY expansion; e.g., it can be resolved by limiting SA gene expression to one sex. The latter mechanism was suggested to be at play in ratite birds, such as emus, where the NRW is relatively small, while the extent of sex-biased expression in the pseudoautosomal genes is substantial [21], though it remains unclear whether these two observations are causally linked. Sex-biased expression evolved multiple times in eukaryotes and has been reported in other species such as in the plant Silene latifolia [22].
The proximate mechanisms that cause NRY expansion and recombination suppression between sex chromosomes are not well understood [19]. Such mechanisms could involve chromosomal rearrangements preventing recombination, or operate via regulation of rate and/or distribution of recombination in the genome. The latter type includes sex-specific achiasmy, when recombination occurs only in the homogametic sex, as found in Drosophila and butterflies [23], but may also include more subtle changes in local recombination rate in the region adjacent to the NRY/PAR boundary. Translocations of chromosomal segments from autosomes to the Y-chromosome, resulting in recombination suppression in the translocated region, can be regarded as an instance of the former mechanism. Such translocations can lead to formation of additional sex chromosomes, the co-called neo-sex chromosomes reported for many species, including Drosophila miranda (e.g., [24]) and the plant Silene diclinis [25]. In some cases, this process of translocation leading to the formation of neo-sex chromosomes has been repeated multiple times. This is particularly the case in monotremes, where this process resulted in formation of five X-and five Y-chromosomes in platypus [26]. In addition, chromosomal inversion(s) may also play a significant role in sex chromosome evolution and NRY expansion [19], such as reported in papaya [27] and sticklebacks [28]. However, in some cases, NRY expansions appear to have occurred without inversions involved [29]. Moreover, inversions detectable in NRY regions could be the consequence rather than the cause of recombination suppression, as they can occur between the sequences present in multiple copies, such as transposable elements that tend to accumulate in the non-recombining regions [30,31]. The analysis of genes in the region adjacent to the PAR boundary (e.g., [8,32,33]) is essential to understand the mechanisms underpinning NRY expansion and shifts in PAR boundary location.
The studies of recently evolved sex (or neo-sex) chromosomes have contributed significantly to our understanding of sex chromosome evolution, notably in plants [34], such as found in Silene latifolia and its close relatives [35]. They represent convenient study systems to investigate the processes shaping sex chromosome at the early stage of their evolution [36,37]. In particular, the S. latifolia sex chromosomes have been actively used to study many aspects of sex chromosome evolution, ranging from the origin of sex chromosomes evolving de novo [1], to sex chromosome structure [38,39], to Y-degeneration [40][41][42][43][44], to evolution of dosage compensation [45][46][47][48], to NRY expansion [16,33,[49][50][51]. In particular, it has been demonstrated that NRY expansion has created distinct evolutionary strata on S. latifolia sex chromosomes [16,52] that are analogous to evolutionary strata described on sex chromosomes of humans [13] and other species [15,53]. Furthermore, there is some evidence that the NRY expansion in S. latifolia is an on-going gradual process, as the NRY/PAR boundary in S. latifolia is "fuzzy" [49] and its location differs between close relatives of S. latifolia [33]. The previous work on this system was limited to a relatively small number of genetic markers in the PAR and the adjacent region. In this paper, we report the analysis of 86 genes adjacent to the PAR boundary, which allows us to substantially improve resolution in this region. A more accurate location of the PAR boundary reported by our study will significantly facilitate the downstream work devoted to the analysis of the processes and mechanisms involved in NRY expansion.

Finding the Markers Common with Other Studies
The sequences of sex-linked and pseudoautosomal genes mapped previously [47] were blast-searched against the markers of the other study that analyzed the location of the PAR boundary [49] with blastall v2.2.26 (-p blastn) to identify the markers common to the two studies. In an attempt to increase the number of common markers, we also blast-searched the markers of each of these studies against the partial genome assemblies published previously [41,47]. In all cases, we kept blast hits with a P-value below 1.0 × 10 −80 and identity higher than 97.5%.

Finding the Location of the PAR/NRY Boundary
To identify Y-linked alleles in the genes with gametologs on the X and Y-chromosomes, we used transcriptome sequencing data from parents and 52 progeny (20 males and 32 females) of S. latifolia genetic cross df108 [47]. Y-linked alleles were identified as alleles always inherited from father to sons across two generations. To test whether occasional recombination occurs in the sex-linked genes located closely to the PAR boundary, we checked the presence of these Y-linked alleles in wild S. latifolia females sampled around Europe (5 females and 3 males, Table 1), with the expectation that fully Y-linked alleles are never present in the females.
Plants used for RNA extraction and transcriptome sequencing were grown in the glasshouse (16h light, ambient temperature) from wild-collected seeds. Total RNA was extracted from young actively growing leaves using the Qiagen RNeasy Plant Mini kit (Qiagen, Manchester, UK) with the optional DNase digestions step, following the manufacturer's instructions. RNA was poly-A enriched and sequenced on Illumina HiSeq 2000 at the WTCHG genomics facility in Oxford (UK). The newly generated sequence data is available from NCBI under bioproject number PRJNA629313 (biosamples accessions SAMN14776665 and SAMN14776666). Raw sequence reads were aligned against the reference transcriptome [47] with RSEM v.1.2.31 [54], bam files processed with Samtools v.1.2.1 [55] and single nucleotide polymorphism (SNP) calling carried out with HaplotypeCaller from GATK v. 4.1.2.0 [56]. Then, resulting VCF files for separate samples were merged together with bcftools v1.2 to calculate population genomics statistics (F st between females and males and π calculated for two genders separately) with vcftools v0.1.15. We calculated the statistics for each gene with window size corresponding to the gene length and by removing indels (options: -chr, -remove-indels, -window-pi and -fst-window-size). Last, we generated the fasta files for each wild individual with FastaAlternateReferenceMaker (from GATK v. 4.1.2.0) and aligned the gene sequences with muscle v3.8.31 [57] to calculate Tajima's D [58] with mstatspop v.0.1 [59]. The fasta alignments for the genes analyzed are available in Supplementary Materials.

Finding the Markers in Common Between X-maps of Different Studies
The PAR and NRY/PAR boundary in S. latifolia have been actively studied using genetic mapping and population genetic approaches [33,[49][50][51]. This paper takes advantage of a larger number of X-linked and pseudoautosomal genes in the genetic map we published previously [47] to more accurately locate the PAR boundary and test how wide the "fuzzy" boundary region is. We started by finding the markers in common between our map [47] and the most detailed map delimiting the S. latifolia PAR published by others ( Figure 1 in [49]). Blast-searching the sequences of the two studies against each other identified only eight markers in common between the two maps of S. latifolia X-chromosome (shown in bold in Table 2). In order to achieve better integration between the mapping results of different studies, we used previously published genomic assembly [47] to identify the genomic contigs that contained genetic markers from different studies. This allowed us to add additional six markers corresponding to nearly identical genomic position, though over half of the markers from Qiu et al. [49] could not be found (Table 2), which is not too surprising given the published genomic assembly covers only a fraction of large (~3 Gb) S. latifolia genome [47]. Table 2. The markers in common between different studies of the pseudoautosomal region (PAR) and the X chromosome. The markers in bold are the same genes in both maps; the other markers (not in bold) co-locate on genomic scaffolds, but they are not the same genes.  The markers in common between the X-chromosome maps of [49] and [47] show very good correspondence in marker order ( Table 2). The location of the PAR border differed slightly between the maps, with marker contig8488 designated as fully sex-linked by Papadopulos et al. [47], while Qiu et al. [49] concluded that the corresponding marker E780X is pseudoautosomal. Both studies agree that the S. latifolia PAR boundary is located somewhere more distally to the marker E779X/contig675. Thus, we focused our analyses on the genes between the sex-linked marker E779X/contig675 and the pseudoautosomal E241/contig3920.

Finding the Location of the PAR Boundary
The region between contig675 and contig3920 contains 86 genes in the previously published genetic map [47]. According to this map, the PAR boundary is located between the genes encoding transcripts contig9011 and contig16617 (Table 3) with the former being sex-linked and the latter being pseudoautosomal [47]. While segregation analysis of markers in the genetic cross is informative about the approximate location of the PAR boundary, it is unlikely to detect rare recombination events that may occur proximally to the putative PAR boundary. However, such rare events may be detected in the analysis of sequence polymorphism data from wild populations because such data contain information about multiple meioses that occurred since the common ancestor of the alleles in the sample.  In order to look for such rare recombination events, we searched for "Y-SNPs"-Y-alleles identified by a segregation pattern in the df108 mapping family [47], in transcriptome data of five wild females ( Table 1). As no Y-alleles were found in any of the genes designated as pseudoautosomal by [47], we focused this analysis on 48 genes located proximally to contig16617 (Table 3) in the genetic map from [47]. Out of these 48 genes, Y-alleles were found for 19 genes (Table 3 and Table S1). Six out of these 19 genes showed the presence of Y-alleles in some of the wild female samples (Table 3 and  Table S1), indicating occasional recombination in male meiosis in these genes. Interestingly, most of the genes showing evidence for recombination in male meiosis are adjacent to the PAR boundary, while no such recombining genes were detected more proximally along the X-chromosome region analyzed (Table 3). Furthermore, the genes next to the PAR (between contigs 6406 and 1798) contained multiple Y-SNPs in several females, indicating that recombination in male meiosis is not too rare in this region. On the other hand, the genes proximally to contig6406 (mapped to 63 cM [47]) contained zero or one Y-SNPs in females (Table 3 and Table S1). Thus, contig6406 may represent a boundary between regions of relatively frequent and very rare recombination in male meiosis. For convenience, we refer to the partially sex-linked region between contigs 8558 and 16617 as a "fuzzy boundary" between the PAR and fully sex-linked genes and the region proximally and distally to contig6406 as fuzzy boundary I and II, respectively (Table 3).

Patterns of Genetic Diversity Around the PAR Boundary
The patterns of polymorphism, summarized by such statistics as average per nucleotide heterozygosity (π) difference between males and females, or population differentiation (F st ) between the two sexes, are informative about the recombination between the partially X-and Y-linked alleles in males (e.g., [33,49]). In particular, in the absence of recombination, divergence between the X-and Y-linked alleles of a sex-linked gene inflates heterozygosity in males, but not in the females, so the difference in π between males and females indicates lack of recombination in a gene in males. Similarly, F st can be used to measure "population differentiation" between males and females that is expected to be high in the absence of recombination in male meiosis and low if recombination is present.
The population analysis using eight wild individuals (Table 1) focused on 86 genes (Table 3) from the previously published map [47], including the region from the fully sex-linked gene E779X/contig675 to the pseudoautosomal gene E241/contig3920. The overall genetic diversity shows contrasting patterns between the X-linked and pseudoautosomal genes and between the males and females ( Figure 1A).
The polymorphism is higher in males than females but the difference varies between the genes. On average, the polymorphisms in males and females for the PAR genes (distal to contig9011) are π males = 0.0051 and π females = 0.0025 (Student t-test, p-value < 0.01, with π males /π females = 2.02). In the fuzzy boundary genes (from contigs 1798 to 9011), π males = 0.0105 and π females = 0.0025 (Student t-test, p-value < 0.01, with π males /π females = 4.22), indicating that recombination in male meiosis is sufficiently rare in this fuzzy boundary region for the Y-and X-linked gametologs to accumulate significant sequence divergence. Last, the X-linked genes (proximal to contig1798) have π males = 0.0064 and π females = 0.0017 (Student t-test, p-value < 0.01, with π males /π females = 3.72). F st between males and females also sharply rises proximally to 64.0 cM map position ( Figure 1B). Tajima's D [58] is variable among the genes analyzed with an increase for genes in the sex-linked (Tajima's D average of 0.059) and the fuzzy boundary regions (Tajima's D average of 0.087) compared to the PAR (Tajima's D average of −0.268) ( Figure 1C). In the pseudoautosomal genes (distally to contig9011) Tajima's D shows significant decline with the distance from the PAR boundary (linear regression model, p-value = 0.00915, Figure 1C).
Genes 2020, 11, x FOR PEER REVIEW 7 of 14 The patterns of polymorphism, summarized by such statistics as average per nucleotide heterozygosity (π) difference between males and females, or population differentiation (Fst) between the two sexes, are informative about the recombination between the partially X-and Y-linked alleles in males (e.g., [33,49]). In particular, in the absence of recombination, divergence between the X-and

Integration of Genetic Map and Genome Sequence for the PAR Boundary Region
In order to estimate physical size of the PAR and the fuzzy PAR boundary region we used the sequences of the sequenced transcripts encoded by genes adjacent to the PAR boundary to find the corresponding genomic scaffolds in the partial S. latifolia genome assembly published previously [47]. BLAST searches with stringent parameters (see methods) identified corresponding genomic scaffolds for 76 out of 86 genes in the proximity of the PAR boundary (Table S2). In total we identified 72 genomic scaffolds with the total length 2.84 Mb. Most genes analyzed corresponded to separate scaffolds, reflecting a highly fragmented state of the genome assembly. The only two exceptions to this were scaffolds QBIE01000100.1 and QBIE01001489.1 that contained four and two genes, respectively (shown in bold in Table S2). Reassuringly, the genes corresponding to the same scaffold are located closely in the genetic map (<1.9 cM apart), though not always adjacent to each other (Table S2). The total length of genomic scaffolds corresponding to markers in the PAR, fuzzy boundary and the sex-linked region adjacent to PAR boundary are 1.57 Mb, 0.49 Mb and 1.2 Mb, respectively (Table S2). However, given the fragmented state of the genomic assembly, these numbers represent gross underestimates of the actual physical size of these genomic regions.

Discussion
Here, we integrated the data from genetic mapping, genome sequencing and population genetic analyses to establish the location of the boundary between the pseudoautosomal region and the X-chromosome in S. latifolia. There are many reasons that make PAR boundary region particularly interesting for evolutionary and molecular genetic studies [5,62]. The genes proximally to PAR boundary do not recombine in male meiosis, while the genes located distally, in the pseudoautosomal region, do recombine and the recombination rate may be unusually high [8]. How such dramatic difference in the recombination rate in the adjacent regions is determined at the molecular and chromosomal level is not entirely clear. Furthermore, the recombination (or lack of it) affects many evolutionary processes that shape the genome [63][64][65], and PAR boundary regions provide an interesting comparison between the "deserts" and "jungles" (cold-and hotspots) of recombination next to each other. Finally, the shifts of the PAR boundary are thought to play central role in evolution of the non-recombining region on the sex chromosomes [3,19].
Despite the intriguing evolutionary and molecular genetic aspects of the PAR boundary [5], its location is known only for a few species, including humans [66,67], some mammals [68][69][70][71], birds [2] and plants [32,49]. Many of these studies reported the changes in the location of the PAR boundary between closely related species [33,69], or even within a species [8,32,49], indicating that the location of the PAR boundary is often unstable and evolutionary labile. In particular, for S. latifolia the PAR boundary was reported to be "fuzzy" [49], implying that there is a region between the PAR and the sex-linked region where recombination in male meiosis occurs only rarely, and our results are consistent with this. The fuzzy boundary may represent a region of on-going recombination suppression leading to gradual expansion of the NRY. However, shifts of the PAR boundary could also lead to the opposite-an expansion of the PAR and shrinking of the sex-linked region, as was reported for Mus spretus, where a previously fully sex-linked 400 kb region adjacent to the PAR became pseudoautosomal [8,69]. Regardless of the direction of evolutionary change, the PAR boundary region provides an extremely interesting location in the genome to study evolution of recombination, which is key to our understanding of sex chromosome evolution and NRY formation.
In this paper, we expanded the number of genes in the proximity of the PAR boundary analyzed for the presence of recombination in male meiosis and patterns of genetic diversity. Instead of using a conventional genetic mapping approach that would fail to detect rare male recombination in the region adjacent to the PAR boundary, we opted to focus on the detection of Y-alleles in wild females. The latter approach effectively integrates over many thousands of meioses in the history of the sample and thus has a lot more power to detect rare recombination in males compared to conventional genetic mapping. Indeed, this analysis detected male recombination in six NRY genes located near the PAR boundary, but not in the genes further away from the PAR. These results indicate that the PAR boundary is located between the genes contig8598 (62.6 cM) and contig16617 (64.8 cM), with the former being fully sex-linked and the latter fully pseudoautosomal, with no male-specific alleles detectable and evidence of recombination in male meiosis in the genetic cross df108 [47]. The region between these markers must be partially sex-linked, as it shows intermediate properties between the fully sex-linked and pseudoautosomal genes. In particular, the genes in this partially sex-linked region show full sex-linkage in the genetic cross df108, and also show evidence of occasional recombination in male meiosis as some of the wild females contain the alleles that should be Y-linked based on segregation in df108 family. Interestingly, the genes proximally to contig6406 (mapped to 63 cM [47]) contained zero or one Y-SNPs in females (Table 3 and Table S1), perhaps because of lower recombination rate compared to the genes from 63.4 to 63.6 cM, or because of the action of gene conversion between the Xand Y-linked gametologs.
To compare our results with the previous work [33,44,[49][50][51], we identified 14 markers (listed in Table 2) in common between our dataset and the markers in the most detailed map delimiting the PAR published by others (Figure 1 in [49]). The comparison of the markers reveals that the results of this study support the conclusion of [49] that PAR boundary is located more proximally compared to what had been concluded by [47]. The results of the latter study are based entirely on segregation in a genetic cross and the aim of that study was not to precisely locate the S. latifolia PAR boundary. On the other hand, both the current paper and the study published by Qiu et al. [49] used additional analyses based on DNA polymorphism in wild populations, which allowed these studies to identify rare recombination events undetectable in a genetic cross. We improved on the results of [49] by adding substantially more genes in the vicinity of the S. latifolia PAR boundary, helping to locate the PAR boundary more accurately and facilitating further studies on the evolution of on-going recombination suppression and NRY expansion. For example, it will be particularly interesting to study chromatin structure in this partially sex-linked region and compare it with the chromatin in the fully sex-linked region and the PAR.
The results of polymorphism analyses are consistent with the PAR boundary located proximally to contig16617. Most genes proximally to contig16617 show much higher genetic diversity in males compared to females, which indicates some degree of divergence between the X-and Y-linked gametologs that contributes to polymorphism in males. Interestingly, this is true even for the presumably pseudoautosomal genes between contigs 16617 and 8598 where Y-alleles are occasionally found in females, indicating that recombination in male meiosis is sufficiently rare in this fuzzy boundary region for the Y-and X-linked gametologs to accumulate sequence divergence.
The patterns of polymorphism in the pseudoautosomal genes are consistent with balancing selection maintaining excess of intermediate frequency polymorphisms at the genes in the vicinity of the NRY/PAR-boundary. This is evidenced by elevated F st and the difference in polymorphism in males and females, as well as the negative regression of Tajima's D with distance from the NRY/PAR-boundary ( Figure 1C). The latter result is consistent with the previous report of inflated Tajima's D in pseudoautosomal genes adjacent to the PAR-boundary [49]. However, that study reported positive Tajima's D only in the two genes immediately adjacent to the PAR boundary, E559 and cs3297, while more distally located genes between E523 and E241 showed negative Tajima's D ( Figure 2B in [49]). Although we found no homologs for the markers E559 and cs3297 in our dataset, they are likely to be located in the fuzzy boundary region (proximally to contig16617) rather than in the PAR (Table 2). Our analysis also showed that many genes in the fuzzy boundary region have positive Tajima's D and for more distal PAR genes Tajima's D is lower and becomes negative closer to E241/contig3920 ( Figure 1C). It is possible that this pattern is caused by linkage to the NRY and the presence of sexually antagonistic genes partially linked to NRY, although, according to theory, this is likely only for pseudoautosomal genes located very close to the PAR boundary [12,18,72] and most PAR genes analyzed in our study are likely located too far for linkage with NRY to affect the patterns of polymorphism.
The lack of adequate genome sequence assembly for S. latifolia genome generally, and for the PAR boundary region specifically, remains a significant limitation of what could be carried out next in this species. In particular, nearly all genes analyzed in this study fall into separate genomic scaffolds, reflecting the highly fragmented state of the assembly that does not allow us to even approximately estimate the physical sizes of the fully sex-linked, partially sex-linked and pseudoautosomal regions of the X-chromosome. The total lengths of the genomic scaffolds corresponding to genes mapped to these regions can serve only as a lower boundary. Nevertheless, it is clear that the PAR should comprise a significant proportion of S. latifolia X-chromosome because a third of the genes mapped to that chromosome are pseudoautosomal (108 out of 327, [47]). Our results demonstrate that at least 20 more genes previously designated as sex-linked [47] are actually pseudoautosomal (or only partly sex-linked), increasing the proportion of the PAR genes further. The physical size of this partially sex-linked fuzzy boundary region must be substantial, as even the minimal estimate, provided by the total length of genomic scaffolds in this region, is nearly half a megabase (Table S2), which is likely to be a gross underestimation of the actual size of this region. Better, more contiguous assembly for the S. latifolia genome is long overdue and will significantly advance the analysis of sex chromosome structure and evolution in this interesting plant model species.
Author Contributions: D.A.F. conceived the study, M.K. did population genetics analysis, M.K. and Y.Z. did recombination map analysis, M.K. and D.A.F. drafted the manuscript, all authors contributed to final writing and manuscript editing. All authors have read and agreed to the published version of the manuscript.

Funding:
The work was funded by BBSRC grant to DAF (BB/P009808/1).

Conflicts of Interest:
The authors declare no conflict of interest.