Dynamic patterns of sex chromosome evolution in neognath birds: Many independent barriers to recombination at the ATP5F1A locus

Simple Summary: The evolution of distinct sex chromosomes, like the X and Y chromosomes of humans, has long been of interest. Unlike most chromosomes, sex chromosomes do not fully recombine during meiosis. The non-recombining region is typically small early in their evolution, but additional regions stop recombining over time. Avian sex chromosomes, where males are ZZ and females ZW, are relatively young, and previous studies have demonstrated the size of the non-recombining parts of the sex chromosomes vary among bird species. However, previous studies have sampled relatively few avian orders. Here we examine one locus from species in 22 different avian orders. Our results indicate that this locus has stopped recombining in almost all orders sampled. However, our results also show that the recombination stopped independently in each order of birds, highlighting the dynamic nature of avian sex chromosome evolution. The cessation of recombination is thought to have functional consequences because it is expected to weaken natural selection. We compared the proteins encoded by the recombining Z chromosome with those on the W chromosome. Most W chromosome proteins appeared to be functional, but they also appeared to be subject to weaker selection than the Z chromosome proteins. Abstract: Avian sex chromosomes evolved after the divergence birds and crocodilians from their common ancestor, so they are much younger than the better-studied chromosomes of mammals. It has long been recognized that there may have been several stages to the evolution of avian sex chromosomes. For example, the CHD1 undergoes recombination in paleognaths but not neognaths. Genome assemblies have suggested there may be variation in the timing of barriers to recombination among Neognathae, but there remains little understanding of the extent of this variability. Here, we look at partial sequences of ATP5F1A, which is on the avian Z and W chromosomes. It is known that recombination of this gene has independently ceased in Galliformes, Anseriformes, and at least five neoavian orders, but whether there are other independent cessations of recombination among Neoaves is not understood. We used a combination of data extracted from published chromosomal-level genomes with data collected using PCR and cloning to identify Z and W copies in 22 orders. Our results suggest there may be at least 19 independent cessations of recombination within Neognathae, and 3 clades that may still be undergoing recombination (or have only recently ceased recombination). Analyses of ATP5F1A protein sequences revealed an increased amino acid substitution rate for W chromosome gametologs, suggesting relaxed purifying selection on the W chromosome. Supporting this hypothesis, we found that the increased substitution rate was particularly pronounced for buried residues, which are expected to be more strongly constrained by purifying selection. This highlights the dynamic nature of avian sex chromosomes, and that this level of variation among clades means they should be a good system to understand sex chromosome evolution.


Introduction
Sex chromosomes have evolved independently in many different organisms, including birds, which are characterized by a ZW (males ZZ and females ZW) sex determination system). More generally, there have been many different origins of partially non-recombining chromosome pairs, called XY, ZW, and UV, depending on their distribution in the sexes [1]. The general hypothesis regarding sex chromosome evolution (e.g., Rice [2] and Charlesworth et al. [3]; but see Ponnikas et al. [4], Furman et al. [5]) and Jeffries et al. [6]) begins with a pair of autosomes that undergo recombination but include a sex determining locus (Figure 1, left hand side). When a gene with a sex-specific function is gained, sexual antagonism favors a barrier to recombination. Over time, the non-recombining region may expand so a large portion of the chromosome does not undergo recombination, and thus parts of the chromosomes begin to evolve independently ( Figure 1, purple segments). Gene content of sex chromosomes can be dynamic, with both losses of genes (particularly on the Y or W chromosmes [7,8]) and gains of genes [9,10]. Due to the need to pair during meiosis, there is not a complete cessation of recombination along the entire length of the chromosome. Instead, recombination continues on at least a small region, referred to as the pseudoautosomal region or PAR [11]. Figure 1. Steps in the evolution of heteromorphic sex chromosomes. White segments of chromosomes indicate recombination is occurring, while purple areas have ceased recombination and the chromosomal segments are evolving independently. Left to right: Autosomes in which recombination at a locus (red X) is selected against, leading to a loss of recombination. Changes in gene content and structure may cause other loci to cease recombining, with potential loss of some genetic material from one chromosome. Part of the chromosome continues to recombine to allow pairing during meiosis.
Birds may be an excellent group in which to study the evolution of sex chromosomes. Avian sex chromosomes are thought to have evolved largely independently of those in mammals [12,13], likely from an ancestor that had temperature-dependent sex determination [14]. Studies suggest multiple stages in the evolution of avian sex chromosomes [15,16]. After an initial barrier to recombination prior to the divergence of Palaeognathae and Neognathae, there was a second stage in neognaths prior to the divergence between the Galloanserae and Neoaves (reviewed in Zhou et al. [16]). This included cessation of recombination at the CHD1 (chromohelicase domain 1) locus, which is commonly used for sexing in birds [17]. This led to the origin of independently evolving CHD1-Z and CHD1-W loci that arose by the cessation of recombination, a relationship called gametology [18]. Within neognaths, examination of genes known to be present on both the Z and W chromosomes have demonstrated multiple independent barriers to recombination at the same loci across different avian clades [16,[19][20][21]. That sex chromosomes among neognath birds are changing independently is further reinforced by the variable size of the W chromosome [22] and variation in the size of the PAR [16]. Further, in one avian clade, it appears to have been formation of a neo-sex chromosome by fusion of part of an autosome with the sex chromosomes [23,24]. Thus, avian sex chromosomes are highly dynamic and variable across taxa, in contrast to groups such as mammals.
The cessation of recombination in the region surrounding any sex-linked gene should have a number of consequences. First, the rate of neutral site evolution should increase for one gametolog (either the Y or Z, depending on the sex chromosome system) due to the "male-driven molecular evolution" effect, in which the larger number of cell divisions in the male germ line leads to a larger number of mutations and, consequently, a larger number of neutral substitutions [25]. Second, the fast-X/Z effect reflects the increased efficiency of selection for beneficial recessive substitutions when the sex chromosome is in a hemizygous state [26,27], which should lead to increased non-synonymous substitutions on the X or Z chromosome. However, there is evidence both for increased positive selection and for increased negative selection on the human X chromosome [28], and it seems reasonable to expect this to be the case for avian Z chromosomes as well. There is also evidence that drift plays a major role in avian Z chromosome evolution [29] due to the reduced effective population size of the Z chromosome relative to autosomes. The consequences for the W chromosome are likely to be driven by the absence of recombination and reduced efficiency of selection (see Charlesworth and Charlesworth [7] for a description of this effect on Y chromosomes). In fact, the linkages of the W chromosome and mitogenome in birds is likely to further exacerbate this effect [30][31][32][33]. The relaxed purifying selection of W-linked loci is likely to lead to an elevated rate of non-synonymous changes [34], especially classes of non-synonymous substitutions that are likely to be deleterious, and it may also be correlated with the accumulation of transposable elements and other repetitive DNA [33,[35][36][37].
One sex-linked locus that has been the focus of several studies is ATP5F1A (also known as ATP5A1), which encodes the α subunit of the mitochondrial ATPase [38]. Ellegren and Carmichael [19] identified at least three independent cessations of recombination at the avian ATP5F1A locus -one in each of the three orders included in their study (Galliformes, Anseriformes and Charadriiformes). Suh et al. [21] built on that study, identifying three additional orders which had independently ceased recombining at this locus (Podicipediformes, Ciconiiformes and Passeriformes). That many different avian groups are undergoing independent trajectories in sex chromosome evolution is supported by data suggesting the PAR is quite variable in size among species [16]. In addition, the size of the W chromosome is also quite variable, even among relatively closely related species [22] and appears to be changing in distinct ways across taxa [39,40].
However, understanding these processes can be challenging. Studies such as those by Ellegren and Carmichael [19] and Suh et al. [21] used PCR amplification in females to isolate loci of interest, followed by cloning to obtain each gametolog. However, this may be misleading due to several possible problems. First, there may be biases in amplification (e.g., amplifying one gametolog but not the other), cloning (both gametologs amplified but only one gametolog is cloned) or sequencing (both gametologs cloned, but clones of only one gametolog are sequenced). This could lead to assumptions that some taxa are recombining (since only one gametolog identified) when that may not be correct. Since PCR amplification can result in errors among amplicons, which are fixed in clones, identifying what differences may be due to error versus mutations can be hard to sort out.
Alternatively, data can be extracted from whole genome sequences from heterogametic individuals (e.g., females in birds). However, in these cases, coverage of the sex chromosomes is half that of the autosomes, meaning that the quality of assembly is lower for these chromosomes. In addition, W-chromosomes appear to accumulate high levels of repetitive DNA [33,37], which can further confound assembly. For example, in the whole genome study of Zhou et al. [16], genomes of six neognathous birds were examined for presence of 14 loci hypothesized to be on the W chromosome (see Table S3 in Zhou et al. [16]). They only identified 10% of loci expected to be on the avian W chromosome across the six samples (8 loci were present of 84 possible). Thus, while using whole genomes should in theory be a better strategy for understanding sex chromosome evolution, many challenges still remain.
High-quality, chromosome-level genome assemblies (e.g., Rhie et al. [41]) for female individuals are likely to provide sufficient information to examine the evolution of Zand W-linked loci in detail [42]. At this point, there are very few assemblies of this quality so it is likely to be necessary to combine genomic data with PCR data. Here, we focused on the ATP5F1A locus to gain greater insights into its evolution across birds. We used a combination of traditional PCR and cloning, and also extracted data from chromosomal-level assemblies. Using data from 49 species representing 22 avian orders allowed us to better assess the number of independent barriers to recombination at the ATP5F1A locus. Because the cessation of recombination is expected to lead to different selective pressures on proteins encoded on the the Z and W chromosomes, we complemented this analysis by examining the molecular evolution of ATP5A1Z and ATP4A1W protein sequences from 13 taxa with genome assemblies and annotations of sufficient quality. Finally, we discuss some insights into the sources of errors and the effectiveness of each method for studying sex chromosome evolution that were revealed by our combined approach.

DNA extraction, PCR amplification, cloning and sequencing
We used tissues from 31 avian species, from 20 orders, based on the IOC world bird list v. 11.2 [43]. No live birds were collected or sampled for this research (see Supplementary File S1 for sample information including museum voucher numbers). DNA extraction was done using the Gentra Puregene Kit (Qiagen) following manufacturer's protocols. Samples were identified as female on a museum tag and/or must have been identified as female by amplification of CHD1 [17]. Only samples that were clearly genetically female were given further consideration.
PCR amplification of ATP5F1A intron 3 was done using primers ATP5A1_208mod (5' GTCCAAGCAGAAGAAATGGTTGA 3'; modified from Ellegren and Carmichael [19]) and ATP5A1_ex4R (5' AACTAGAACATCCACAATGGCACC 3'). PCR products were cleaned using an equal volume of PEG:NaCl (20%:2.5M) for precipitation, followed by washing with 70% EtOH. Cloning was done using pGEM-T-Easy vector (Promega Corp.), and clones were screened using PCR amplification of whole cells with universal primers (T7 and SP6). After screening, PCR products from clones with appropriate inserts were cleaned using a PEG-NaCl precipitation [44]. To be 95% certain that both gametologs (or alleles, if not recombining) were sampled (assuming both gametologs were amplified and cloned in equal frequencies), a minimum of 5 clones should be sampled, so we targeted multiple plasmids per sample.
Sequencing was done using ABI BigDye Terminator 3.1. Manufacturer's recommendations were followed, except that reaction volumes were cut to ¼-⅙ of recommended reagent volume. Sequences were analyzed on an ABI Prism 3100-Avant genetic analyzer (PE Applied Biosystems). Sequencing was initially done using universal primers (SP6 and T7). In some cases, internal primers were designed for specific taxa and used for sequencing to result in double-stranded sequencing.
Sequences were trimmed of vector, and examined by eye in Sequencher™ 4.1 (Gene Codes Corp. Ann Arbor, MI). Care was taken to assess when two distinct gametologs might be present. In cases where there were two divergent sets of sequences for a species, these were split into two separate contigs for assembly. In cases where there were not two clear gametologs, a challenge was to assess whether differences among sequences represented PCR error or might represent different, possibly recombining, alleles. To be retained in our dataset, we required at least two differences be identified, and that each of those differences had to be present in at least two clones. We felt that most PCR errors would likely be in a single sequenced amplicon, and the differences present in two amplicons were more likely to represent biological differences. This may have led to the exclusion of data for species still undergoing recombination at this locus, and where we sampled a homozygous individual (or where there was only a single difference between alleles). However, we felt that falsely inferring recombination was worse than reducing the number of species sampled. Since this approach did not allow determination of whether a gametolog was the Z or W version, we labeled the two variants with A and B.

Extraction of data from genome assemblies
To identify genomes with assemblies of the W-chromosome, we used the NCBI Genome Browser and restricted consideration to those genomes that included the number of assembled chromosomes available on Aug. 30, 2021. Of the 59 chromosomal-level assemblies, only 33 included an assembly of the W chromosome. For each of these 33 genomes (see Supplementary File S2), we identified the region of interest using genome BLAST (blastn). Our BLAST query was the Gallus gallus (chicken) Z chromosome segment that included exons 3 and 4, plus the intervening intron 3 of the ATP5F1A gene. The flanking exons were used as they were expected to be conserved between both gametologs, and so maximized identification of the appropriate regions. We only retained data in which the BLAST results included two good hits that covered most or all of intron 3. We labeled these sequences as being Z, W or unplaced based on the BLAST results.
To obtain protein sequences, we conducted tblastn searches of the genomes identified for intron 3. For landbirds, we used Taeniopygia guttata (zebra finch) ATP5F1AW sequence as the query, while for non-landbirds we used the Gallus gallus ATP5F1AW sequence. From the GenBank record identified using tblastn, we identified the protein accession and obtained the sequence for both the Z and W copies. Based on the existence of two in-frame stop codons and a frameshift, Melopsittacus undulatus (budgerigar) ATP5F1AW sequence appears to be a pseudogene (similar to several other parrots [45]). Therefore, we adjusted the inferred amino acid sequence in the alignment, recoding the stop codons as ambiguous (X). In a number of cases, the genome assemblies lacked annotations, and so we did not obtain the protein sequences for those taxa.

Identification of transposable element insertions
All sequences were submitted to the RepBase database using the CENSOR tool [46] to identify long transposable element (TE) insertions. Since long insertions in one or a few taxa can make alignment challenging, we used the output from CENSOR to identify boundaries and then excluded the TE insertions from our analyzed data. We noted where and when this occurred to explore and understand the frequency and distribution of TE insertions.

Sequence alignment and phylogenetic analyses
We combined our intron data (PCR-amplified and from genomes) with published ATP5F1A sequences from Ellegren and Carmichael [19] and Suh et al. [21]. One species (Bucorvus abyssinicus; Abyssinian ground hornbill) was sampled both by our PCR approach as well as from a whole genome. These were aligned in Mafft v. 7, using the online server [47]. The final alignment was trimmed to just include the intron 3 region (see Supplementary File S3 for the alignment). This was analyzed in IQ-TREE 1.6.12 [48] as implemented in the W-IQ-TREE web server [49]. For this, we had IQ-TREE identify the best-fitting model [50], allowing free rate models, and then estimated the maximum likelihood tree and 1000 ultrafast bootstrap replicates [51].
For the amino acid data, we aligned sequences as we had done for the nucleotides (see Supplementary File S4 for the alignment). Using the alignment, we evaluated the fit of three model sets to the aligned ATP5F1A protein sequences: 1) the models included in IQ-TREE; 2) the clade-specific models described by Pandey and Braun [52] and Minh et al. [53]; and 3) the clade-specific structural mixture models described by Pandey and Braun [52]. We used the fit_XBmixture_models.pl program, available from github (https://github.com/ebraun68/clade_specific_prot_models; last accessed 24 November 2021) to evaluate the fit of the clade-specific structural mixture models. In all cases, we used the corrected Akaike information criterion (AICc) to evaluate model fit. As described above, we conducted the tree searches and used 1000 ultrafast bootstrap replicates to assess support. We divided sites into solvent exposed and buried subsets using the site mixture probabilities (obtained using the -wspm option); sites where the contribution of the buried mixture class was >0.5 were classified as buried and those with ≤0.5 were classified as solvent exposed. We used PAUP* 4.0a169 [54] to count numbers of amino acid substitutions using the maximum parsimony (using ACCTRAN [55] reconstructions to assign changes to specific branches).

Expectations
Phylogenetic analyses of the sequence data should allow identification of likely barriers to recombination ( Figure 2). Several different patterns represent evidence for ancient barriers to recombination. First, cases where the same gametolog from multiple species cluster together to the exclusion of the second gametolog (green clade in Figure 2) suggest that the barrier to recombination occurred prior to divergence of those taxa. Depending on the taxa sampled it might be possible for a deep divergence to be limited to the two gametologs of the same species (blue clade in Figure 2). Due to the relatively short sequence for intron 3 of ATP5F1A (under 1 kb), it may be expected that in some cases the phylogenetic tree obtained using the ATP5F1A sequences may not have the power to recover relationships among avian orders that reflect likely divergence patterns among orders within Neoaves [56]. However, if we observe the gametologs of the same species in very different positions (e.g., brown branches in Figure 2), this would also reflect a case where there is an ancient barrier to recombination. Interpretation of results to assess when there have been barriers to recombination. Barriers to recombination are shown using red bars, and possible barriers are shown using gray bars. For taxon "g", the limited analyzed data have been unable to reconstruct this as a clade, but the divergence between gametologs suggests recombination has also ceased in the history of that species.
In contrast to the case for ancient barriers to recombination, it is difficult to distinguish between recent barriers to recombination and cases where recombination is occurring, because they are expected to result in few differences between gametologs (pink clade, Figure 2). Suh et al. [21] concluded that recombination had ceased in Cathartes aura (turkey vulture) and Podiceps cristatus (great crested grebe). Based on our p-distances, we estimate that the differences between gametologs for these two taxa were 0.015 and 0.005. Given this, we decided to classify any gametologs with a p-distance of at least 0.005 as having ceased recombining, and values below this as cases of ongoing recombination. In principle, cases where only a single sequence was obtained (rather than two gametologs or alleles, sequence e in Figure 2) could reflect continued recombination or even a very recent barrier to recombination. However, they can also reflect failure to amplify one of the two gametologs. Therefore, we discarded those sequences (see above).
Since we examined ATP5F1A intronic sequences, we expect them to behave in a manner consistent with male-driven molecular evolution (i.e., the Z chromosome gametologs will typically have longer branches; Figure 2). Although the introns in the W chromosome gametologs are expected to exhibit fewer substitutions, they are also expected to harbor a larger number of transposable element insertions [33,35,37].
In contrast to the intronic sequences, coding sequences are expected to exhibit a distinct set of patterns. Coding regions in recombining regions like the PAR or Z chromosome are likely to be subject to relatively strong purifying selection and, therefore, trees generated with those sequences are likely to have shorter branches when the lengths are measured in amino acid substitutions. On the other hand, coding sequences in non-recombining regions of the W chromosome, are expected to exhibit evidence of relaxed purifying selection, likely due to Hill-Robertson interference [57] involving other W-linked loci and/or the mitogenome [30][31][32][33]. In the extreme, this might lead to pseudogenization and/or complete loss of W-linked coding regions. However, even if the gene remains functional, the simplest expectation is a higher rate of non-synonymous substitution for W gametologs (e.g., Berlin and Ellegren [34]), although the signature might be more subtle (i.e., a higher proportion of non-synonymous substitutions that are likely to reduce fitness). Sassi et al. [58] reported that comparing numbers of amino acid substitutions in buried versus solvent exposed structural environments was a very sensitive way to differentiate between pseudogenes and functional genes; we propose that this metric can serve as a way to assess the relaxation of purifying selection on the W chromosome.

Success in identifying both gametologs
We obtained between 4-15 clones for 31 species (all but one species was sampled for at least 5 clones), with an average of 8.9 clones per species (Supplementary File S1). We could clearly identify two gametologs or alleles in only 17 species (55%) ( Table 1). Species where two gametologs were identified were sampled at about the same level (an average of 9.2 clones per species, with a range of 4-15) as those where only one gametolog was clearly identified (an average of 8.7 clones, with a range of 5-15).
For the sequences obtained from genomes, we found two good hits in 26 of 33 (79%) ( Table 1) genomes with an assembly of the W chromosome (Supplementary File S2). For 21 of these 26, BLAST identified a match on both the Z and W chromosomes. In five cases the match was to the Z chromosome and an unplaced scaffold, while in one case both sequences were identified as unplaced. In other cases, either there were no matches (two cases), or a match to one chromosome (three cases) or where too little data was present for one gametolog to include in analyses (one case).  20  31  14  17  10  14  Genomes  13  33  10  26  5  7  Published  5  7  5  7  0  0  Combined  25  71  22  50  13  21 The novel data we collected using PCR and cloning was congruent with data extracted from genomes. For example, for Bucorvus abyssinicus where data was obtained using both methods (PCR/cloning as well as extracted from a genome), the sequences were virtually identical. In other cases, where sampling included taxa that were related, our data clustered as expected with genome data. Thus, it appears both methods (once potentially problematic samples were excluded) yielded biologically relevant data. section may be divided by subheadings. It should provide a concise and precise description of the experimental results, their interpretation, as well as the experimental conclusions that can be drawn.

Transposable elements
For the sequences we obtained via PCR, there were no TE insertions in any samples. For the genomes, there were TE insertions in eight sequences. One TE insertion was shared among the W-chromosome sequence of three duck species. The other five sequences were from unrelated taxa and none of the TE insertions were shared. Of these five sequences, four were W-chromosome sequences and just one was a Z-chromosome sequence. In some cases, multiple TE insertions were present in the same sequence (e.g., Myiopsitta monachus [monk parakeet] had four TE insertions on the W-chromosome, one of which prevented identification of the 3' end of the intron).

Barriers to recombination
Phylogenetic analyses of the ATP5F1A intron yielded a tree consistent with reciprocal monophyly of Galloanserae and Neognathae ( Figure 3). Moreover, many orders that had multiple samples were recovered as monophyletic. However, a few orders (e.g., Apodiformes, Bucerotiformes) were not recovered as monophyletic. This is likely to reflect the fact that a single, relatively short segment of DNA is expected to fail to recover expected relationships in many cases [56]. However, the tree topology was able to provide information that was sufficient to identify the positions of barriers to recombination. The estimated phylogenetic tree makes it clear that there were a large number of independent barriers to recombination across Neognathae (Figure 3). . Samples collected using PCR and cloning as part of this study have A or B following the species name (since we could not a priori assign gametologs to the Z or W chromosome). Species names followed by chromosome and then Z, W, or Un are from the chromosomal level genome with the designation (Z, W, or unassigned) based on annotation in NCBI. If this is followed by "TE" the sequence contained a transposable element insertion. Species names followed by Z or W and then either Ellegren or Suh are from those published studies with the Z and W designation determined by those authors. Colors relate to clade colors in Kimball et al. [59].
Based on these results, we estimate at least 19 cessations of recombination at the ATP5F1A locus (Figure 4, red bars) based on either deep divergences within a species, or else evidence of cessation to recombination prior to speciation events. While the best interpretation of our results (Figure 3) suggested independent cessations to recombination in Tyto alba (barn owl) and Athene cunicularia (burrowing owl), the short internode and limited bootstrap support between these taxa make it possible that instead there was a single barrier prior to the divergence of Tytonidae and Strigidae. There were six species where the gametologs within a species were very similar (Figure 3), some of which may represent cases of alleles of ATP5F1A present in a region that is still undergoing recombination. The p-distances between sequences within a species for these six taxa was low (x = 0.006, range 0.000 -0.015; the value of 0 was in Phaethon lepturus where the only differences were indels, which are considered as missing data in p-distance calculations). In contrast, for the other 44 species, the differences were much greater (x=0.166, range 0.044 -0.306). If we consider a p-distance of 0.005 as the minimum difference to define cessation of recombination (see methods), then there are three additional barriers to recombination (Figure 3, gray bars), for a likely total of 22 independent cessations of recombination. That leaves three species (Pelecanus occidentalis, Phaethon lepturus, and Phoeniconaias minor) that may still be undergoing recombination.

Purifying selection on ATP5F1A proteins
Many Estimated branch lengths for protein sequences obtained from available chromosome-level genome assemblies with annotations were quite heterogeneous (Figure 5 and Supplementary Figure S1), with the W gametologs generally being associated with longer branches than the Z gametologs, indicating increased amino acid substitu-  tions for W gametologs. In addition to the evidence for increased amino acid substitutions in W gametologs, we found evidence for a disproportionate increase in the rate of amino acid substitution for the buried residues (numbers of exposed and buried substitutions for all terminal branches are presented in Figure 5). The ratio of the numbers of buried to exposed substitutions (normalized for the number of sites in each category) was 0.171 for Z gametologs and 0.463 for W gametologs (Supplementary File S5). Figure 5. Maximum likelihood branch length estimates of ATP5F1A protein sequences using a topology that maximizes congruence with the best estimate of the species tree [59] and places the recombination blocks in positions compatible with the intronic tree. These branch lengths were estimated using the MamXB model and numbers of exposed and buried substitutions on each terminal branch are presented to the right of each taxon. Colors correspond to those used in other figures. The maximum likelihood topology with ultrafast bootstrap support is presented in Supplementary Figure S1.
The two longest branches appeared to be pseudogenes; the W gametolog in Melopsittacus undulatus is a clear pseudogene with two in-frame stop codons and a frameshift and one of two W chromosome sequences in the Cygnus olor (mute swan) is an intronless sequence. The likely processed pseudogene from the Cygnus olor also has a substitution in a putative nucleotide binding site (the conserved QGQ sequence is RGQ in the conceptual translation of the mute swan pseudogene; see Supplementary File S4). The Cygnus olor processed pseudogene is sister to the anseriform Z chromosome gametologs and support for this relationship is relatively high (89% ultrafast bootstrap support; Supplementary Figure S1). The two pseudogene branches had an especially elevated buried to exposed rate ratio (0.689; see Figure 5 and Supplementary File S5), as would be expected for non-functional sequences. Excluding the pseudogenes does not alter the conclusion that the W gametologs have an elevated buried to exposed rate ratio (0.3; see Figure 5 and Supplementary File S5) that is elevated relative to the Z gametologs, emphasizing the relaxed purifying selection on the W chromosome.

Avian sex chromosome evolution is highly dynamic
Our results further highlight the dynamic nature of sex chromosomes in birds. We did not identify any orders that were united with another order by an ancient cessation of recombination, indicating that the ATP5F1A locus has independent evolutionary trajec-
The PAR is highly variable in length among birds [29] and it is likely that the ATP5F1A remains in the PAR in some cases, and outside of the PAR in others. Studies comparing gene order of the Z chromosomes have found that while gene content is similar across taxa, there are rearrangements of gene order on the Z chromosome [61,62]. Such rearrangements could result in ATP5F1A being located near loci where selection favors cessation of recombination, but away from those regions in other taxa.
Although we did not identify the Z and W gametologs for data we collected using PCR, the expectation is that the Z gametolog will be evolving more rapidly (Figure 2). When looking at the data extracted from genomes, this appears to be largely true ( Figure  3); longer branches were almost always associated with the Z gametolog. However, there may be some exceptions to this rule. For example, within Falco, the W gametolog of F. rusticolus (gyrfalcon) has a longer branch than the Z gametolog, and the total divergence within F. naumaniii (lesser kestrel) appears to be approximately similar for the two gametologs. Whether this is a biological difference (perhaps Falconiformes exhibit less striking patterns of male-driven molecular evolution) or it reflects a simple stochastic effect (e.g., it is possible most neutral regions on the falconiform Z chromosome would have longer branches than the gametologous region on the W but the region we sampled for this study did not) remains to be determined.
We also observed variation among orders when we analyzed protein sequences extracted from annotated genome assemblies. Even with the limited taxon sampling for proteins, we observed pseudogene formation in one order (Psittaciformes), whereas the other orders exhibited intact (and presumably functional) W gametologs. That there is high variation at the ATP5F1AW locus within Psittaciformes has been observed [45], with not only pseudogene formation but in at least one case, the apparent loss of the W gametolog. The protein analyses also highlighted a duplication within one order (Anseriformes). As more data becomes available, it is likely more such variation will likely be observed.

Cessation of recombination and relaxed purifying selection
Many of the changes in patterns of evolution for W chromosomes (and Y chromosomes in XY systems) can be explained by the increased strength of Hill-Robertson interference [57] after the cessation of recombination. This includes increases in the rate of non-synonymous substitution [34], instances of gene loss [29,35,63], and the accumulation of transposable elements and other repetitive DNA [33,[35][36][37]. Although it is tempting to view all of these as simple consequences of the weaker purifying selection in the non-recombining region of the W, it is possible that the transposable element insertions are actually causal, initiating the cessation of recombination rather than being a consequence of other mechanisms that suppress recombination [35]. Moreover, the strength of these effects is expected to vary over time; for example, it is generally thought that the strength of Hill-Robertson interference is likely to decrease [64] as gene loss proceeds. However, we note that the potential for Hill-Robertson interference between the W chromosome and mitogenome [32] might alter this expectation for W chromosomes.
Our findings were consistent with all of these expectations despite our narrow focus on the ATP5F1A locus. We observed the accumulation of transposable elements ( Figure  3), although our focus on a single locus made it impossible to examine overall rates of insertion. We also observed a higher rate of amino acid substitution in W gametologs ( Figure 5). Finally, we observed pseudogenization of one W gametolog and even found evidence for a processed pseudogene in Cygnus olar that likely arose from the Z gametolog of ATP5F1A. The Cygnus olar pseudogene is especially striking because processed pseudogenes appear to be rare in birds [65]; perhaps the W chromosome will prove to have a disproportionately large number of these genomic features as well.
To complement our observation that the ATP5F1A coding sequences from high-quality genome assemblies had a higher rate of amino acid substitution we examined the ratio of buried to solvent exposed substitution rates. This analysis revealed clear evidence of relaxed purifying selection. Specifically, we observed three buried to exposed rate ratios: one for Z gametologs, a second for the apparently functional W gametologs (almost twice the value for Z gametologs), and a third, even higher value, for pseudogenes. Improved taxon sampling might refine that finding further by providing information about fine-scale variation in the buried to exposed rate ratio.
We found the observation that the buried to exposed rate ratio for ATP5F1AW gametologs is intermediate between Z gametologs and pseudogenes to be somewhat surprising. The very high buried to exposed rate ratio for pseudogenes is straightforward to explain based on their loss of function, but the intermediate rate ratio implies the maintenance of function combined with substitutions that likely reduce fitness. ATP5F1A encodes the α subunit of the F1 portion of the mitochondrial ATPase. This raises a number of issues, all related to the question of whether female birds with a non-recombining W chromosome segment that contains the ATP5F1A have reduced mitochondrial ATPase function. F1-ATPase comprises three α and three β subunits [66], implying that female birds are likely to have mixed ATPase complexes with both Z-encoded and W-encoded subunits, unless there is some reason that the Z-linked (and presumably higher fitness) subunits will tend to dominate the functional enzyme complexes. One way this could be accomplished is lower expression of ATP5F1AW. Smeds et al. [33] examined Ficedula albicollis (collared flycatcher) RNA-seq data from Uebbing et al. [67], finding lower expression of ATP5F1AW in muscle (a tissue with high ATP requirements) but not in other tissues. Xu and Zhou [63] reanalyzed that data, finding generally lower expression of ATP5F1AW. Xu and Zhou [63] also analyzed Gallus gallus RNA-seq data, also finding lower expression of ATP5F1AW in multiple tissues. Another possibility is that the inclusion of at least one protein encoded by the presumably higher fitness Z gametolog in the α3β3 structure will yield a fully functional ATPase complex. Regardless of the details, if the Z-linked gametolog is sufficient for function (and for males, this is critical), why has ATP5F1AW apparently maintained function in so many lineages? Additional sequencing of the ATP5F1A region of avian W chromosomes, along with other studies focused on transcriptional activity and protein function, is likely to yield additional insights, both into the narrow issue of ATPase function and into the broader issues of W chromosome evolution.

Challenges in studying sex chromosome evolution
Our results also highlight the challenges of examining sex chromosome evolution. One reason we may have failed to identify two gametologs is that, in some cases, insertions of transposable elements may have led to much larger fragments for one gametolog than the other. Larger fragments may not amplify as robustly (and so may be in low abundance in a pool of amplicons) and/or may not clone as efficiently. PCR amplicons of vastly different sizes than expected were also not selected as likely clones, so we may have inadvertently excluded actual gametologs when selecting clones for sequencing.
Given that our results suggest that the majority of transposable element insertions are on the W chromosome (as expected [33,35,37]), this would bias us towards sequencing only the Z gametolog. Another factor that may prevent sampling both gametologs is that the primers may not bind (or may bind less efficiently) to one gametolog as compared with the other. That may lead to a pool of amplicons enriched for just one gametolog. Given that the Z chromosome evolves more rapidly in birds (e.g., Berlin and Ellegren [34] and Wang et al. [29]), it is most likely this would lead to amplification of the more conserved W gametolog.
We failed to amplify two gametologs in six species from four orders where we expected very divergent gametologs, and where there was a chromosomal-level genome that sampled both the Z and W gametologs for ATP5F1A. Using BLAST to a genome in the same order, we found our PCR/cloning results led to sequencing the Z gametolog in three species from two orders (Bucerotiformes and Psittaciiformes), and the W gametolog in three species from two orders (Caprimulgiformes and Passeriformes). This suggests that our failure to identify both gametologs may have arisen from multiple causes (though transposable element insertions may occur on the Z chromosome as well and the primers may have failed to bind to the W chromosome in some cases).
In at least some cases, an examination of our results suggests our conservative approach was likely justified. For example, we sampled two species of Passeriformes (Terpsiphone viridis [African paradise flycatcher] and Sitta pusilla [brown-headed nuthatch]) where our single sampled gametolog might have suggested ongoing recombination, yet where sampling in other passerines suggests we should have found very divergent gametologs. At the sequence level, passeriforms are among the most rapidly evolving avian order (see Wang et al. [68] and Berv and Field [69] for avian rate comparisons). This could make it even more challenging to amplify the Z gametolog, which would be consistent with our only sequencing the W gametolog in that case.
It also remains possible that our PCR and cloning results were correct for some or all of the data we excluded. Given the dynamic nature of sex chromosomes overall [10], and particularly the variation in W chromosomes across birds [22,35,39], our results could also reflect cases where one gametolog (most likely the W) may have been lost in some species. Similar to the passerine example in the previous paragraph, we also sampled a single gametolog in two parrots (Psittacus erithacus [grey parrot] and Trichoglossus haematodus [coconut lorikeet]) where our results from genomes also suggest we should have found two gametologs. However, southern blots of ATP5F1A suggest that the W gametolog has been lost in Psittacus erithacus [45]. Thus, our results (at least those where we sequenced the Z gametolog) could reflect biology (loss of the W gametolog) rather than a failure of our methods.
In other cases, we excluded data that may represent true cases where recombination is occurring. Based on data from Suh et al. [21], the New World vulture Cathartes aura likely ceased recombining very recently. We sampled a different New World vulture, Coragyps atratus (black vulture), and failed to find clear gametologs. It may be that recombination is still occurring in some New World vultures, or that an independent cessation has recently occurred within Coragyps atratus but the gametologs are sufficiently similar to each other that we classified this as a single gametolog sampled. We also sampled two other Accipitriformes from a different family (Buteo jamaicensis [red-tailed hawk] and Rostrhamus sociabilis [snail kite]) and could not identify two clear gametologs, which might be consistent if recombination is ongoing (or recently stopped) throughout this order.
Although our success rate at identifying both gametologs was much higher using the chromosomal-level genomes, there were still some cases where we did not identify both gametologs in samples that included an assembly of the W chromosome. Additionally, in some cases it appeared both gametologs were present, though not necessarily assigned to the sex chromosomes (typically not to the W chromosome). However, as the cost and ease of obtaining chromosome-level assemblies makes them more common, and likely improvements to assemblies will leave fewer unassigned fragments (particularly from regions likely on the W chromosome), this should become a more and more valuable resource for studying sex chromosome evolution [42]. Loss of genes on the W chromosome, however, will remain challenging to identify as analysis of genomes may not allow teasing apart true gene loss versus failure to sequence or assemble the W gametolog. In such cases, it may be necessary to incorporate other approaches (including older methods like Southern blots) to better understand the true dynamic nature of avian sex chromosomes Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Figure S1: Phylogenetic analyses of protein data, File S1: Sample information for new sequences, File S2: Information on genome sequences, File S3: Alignment of intron sequences, File S4: Alignment of protein sequences, File S5: Data on buried and exposed amino acid residues.  Data Availability Statement: Newly collected sequence data has been deposited in GenBank, accession numbers XXXX-XXXX.