Comparative Phylogenomics, a Stepping Stone for Bird Biodiversity Studies

Abstract: Birds are a group with immense availability of genomic resources, and hundreds of forthcoming genomes at the doorstep. We review recent developments in whole genome sequencing, phylogenomics, and comparative genomics of birds. Short read based genome assemblies are common, largely due to efforts of the Bird 10K genome project (B10K). Chromosome-level assemblies are expected to increase due to improved long-read sequencing. The available genomic data has enabled the reconstruction of the bird tree of life with increasing confidence and resolution, but challenges remain in the early splits of Neoaves due to their explosive diversification after the Cretaceous-Paleogene (K-Pg) event. Continued genomic sampling of the bird tree of life will not just better reflect their evolutionary history but also shine new light onto the organization of phylogenetic signal and conflict across the genome. The comparatively simple architecture of avian genomes makes them a powerful system to study the molecular foundation of bird-specific traits. Birds are on the verge of becoming an extremely resource-rich system to study biodiversity from the nucleotide up.


Introduction
Birds are a species-rich vertebrate group and are unparalleled in their adaptations to a wide range of habitats. Modern birds achieved their current diversity during a long evolutionary 'journey', beginning with their divergence from the theropod dinosaurs when they evolved the ability to fly, followed by several bursts of diversification over ca. 100 million years of evolutionary history [1]. Birds exhibit an extraordinary range of ecological, morphological, and behavioral traits that have inspired researchers for centuries to study the basic principles of nature. Birds are often key species in ecosystems as herbivores, predators, scavengers, seed dispersers and pollinators [2], so their biodiversity and population density are used as indicators to evaluate the root causes of biodiversity changes in ecosystems around the world [3,4]. Many bird species are also under threat, and 24% of all species are listed on the IUCN Red List under a category of concern. Our understanding of bird biodiversity relies on our knowledge of their genomes, because genomes are both logs of evolutionary history and blueprints for future adaptive potential.
Genome sequences are being generated at an ever-increasing pace and at greater evolutionary breadth for thousands of species across the tree of life. Sequencing technology has reached most fields of biology and has expanded our knowledge of the principles of biology. After the milestones of genome sequencing of chicken [5], turkey [6], and zebra finch [7], the Avian Phylogenomics Project added 45 newly sequenced genomes across the tree of life of birds [8][9][10]. Building on this success, the Bird Genome 10K (B10K) Consortium was announced in 2015 with the aim to sequence all ~10,500 extant bird species [11]. To date, over 300 bird genomes have been sequenced and are being analyzed by the B10K, and the number is growing every week. These genomes are already providing insights into the mechanisms of evolution and the speciation process, physiology, morphology and development, genome composition, and how we can use genomic information for bird conservation.
The B10K project aims to more finely resolve the phylogenetic relationships among birds. Phylogeny is central to biology not only because it reflects evolutionary history, but also because it distinguishes similarities due to shared history from similarities due to shared function, and thus provides the foundation for understanding genomic organization and function. The increased availability of avian reference genomes shows that phylogenetic signal, or lack thereof, is not evenly distributed across bird genomes, but differs among genomic compartments such as protein-coding regions and introns [10,12]. Phylogenomics, inherently a comparative discipline comparing homologies between different taxa, is becoming even more comparative when we can design phylogenomic analyses to interrogate different genomic compartments for phylogenetic information content.
With increasing numbers of bird genomes due for sequencing, it is timely to review the current status of the bird genomic sequencing efforts and what we have learnt from comparative genomics about avian evolutionary history. In this review, we will summarize the state-of-the-art strategies specifically for bird genome sequencing, the phylogenomic efforts and challenges, and the latest progress on comparative genomics and conservation genomics of birds. Given the rapid pace at which we are moving towards denser and denser sampling of the bird tree of life, we will give an outlook for ornithological research empowered by wide access to genomes.

Comparison of Currently Available Sequencing Strategies
In the past decade, we have witnessed a rapid change of DNA sequencing technologies, with a new strategy for reference genome sequencing and assembly being proposed every two years. Here, we compare the most commonly used strategies for four factors that need to be considered by the investigator before embarking on bird genome sequencing (Table 1). These four factors are the required DNA input, the contiguity of the assembly, the completeness of the assembly, and the sequencing cost.
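Assembly contiguity is commonly summarized with the N50 statistic: the length of the shortest contig or scaffold in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation; the function name `n50` and the example lengths are illustrative assumptions and are not taken from any of the cited assembly pipelines.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it says nothing about completeness or correctness, which is why the comparison treats these as separate factors.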
Since short read sequencing became commercially available in 2006, it has been quickly applied for de novo reference assembly for large eukaryotic genomes [13]. Short read sequencing produces millions of pairs of reads (presently 100 to a few hundred bp) that must be stitched together into contigs and scaffolds. The first approach, proposed in 2008, involved the sophisticated design of sequencing mate-pair libraries with a series of insert sizes ranging from 200 bp to 40 kb and produced highly continuous scaffolds (scaffold N50 size >1 Mb) for animal genomes. About 100× coverage of short sequencing reads was produced from these libraries and assembled with the de Bruijn graph algorithm [14,15]. This was the first demonstration of the feasibility of using short reads for reference genome assembly and reduced the cost for producing a reference of human genome size from $3 billion to $3 million at the time. It became the dominant approach for producing de novo assemblies for plant and animal genomes between 2008 and 2012, including the first batch of 101 vertebrate genomes at BGI for the Genome 10K consortium, which also included some birds [16,17]. A drawback of this approach is that it requires much input DNA, prohibiting its use for many biodiversity studies, where fresh samples are often not accessible. A simplified approach was developed specifically for sequencing of bird genomes for the Avian Phylogenomics Project [18]. It only uses two sequencing libraries because most bird species have relatively simple genomic features with small genome size and low repeat content. This approach allows the use of appropriately preserved museum specimens in lieu of fresh tissues. Owing to the relatively narrow span of the insert size of the sequencing library, the assemblies produced with this second method are normally highly fragmented due to the difficulty in sequencing regions with deviating nucleotide composition and repetitive elements. A "whole genome" assembly for a bird may be missing 20% of the average genome [19]. Despite advances in read length capabilities that improve resolution in GC-rich regions and repetitive regions, even the longest reads (currently ~2 Mb with Oxford Nanopore) still do not span avian microchromosomes (>3 Mb) nor macrochromosomes [20].
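The idea behind de Bruijn graph assembly of short reads can be illustrated with a toy example. The sketch below is a deliberately simplified illustration and not the implementation of the assemblers cited above: it assumes error-free reads from one strand, ignores coverage, mate-pair information and in-degree branching, and all function names and reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while each node has exactly one distinct successor."""
    contig = start
    node = start
    visited = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch (e.g., a repeat): stop the contig here
        nxt = successors.pop()
        if nxt in visited:
            break  # avoid looping forever on a cycle
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Three overlapping, hypothetical reads sampled from one short sequence.
reads = ["ACGTTG", "GTTGCA", "TGCATC"]
graph = de_bruijn_edges(reads, k=4)
print(walk_unambiguous_path(graph, "ACG"))
```

The walk stops wherever a node has more than one distinct successor, which is how repeats that are longer than the k-mer size break contigs and contribute to the fragmentation described above.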
The following two approaches can produce extraordinary assemblies with long contigs and scaffolds, given that high molecular weight DNA is available as input. The 10X Genomics sequencing approach provided the first demonstration of a highly continuous genome assembly for the human genome and was soon adopted for other species. This technology relies on high molecular weight DNA to produce long-range information for short reads, but with input DNA as low as 300 ng. The most recently developed approach, single-tube long fragment read sequencing (stLFR), can also generate long fragment reads by sequencing short reads for sub-fragments of one long DNA molecule with DNA barcoding [21]. These two strategies provide a good balance between input DNA, cost and assembly quality. Nevertheless, the assembly quality of both methods heavily depends on the quality of the input DNA, which hinders their broad application to species for which fresh tissues with high molecular weight DNA are hard to collect.
Without further genetic linkage or physical mapping data, the assemblies from these four methods are still highly fragmented and cannot be assigned to chromosomes. Of the 147 bird genomes on NCBI (as of May 2019), only 13 have been assembled into chromosomes. Several methods have been developed to scale up the scaffolding to chromosome level (Bionano, optical mapping, Hi-C) [22][23][24][25]. The DNA Zoo consortium (dnazoo.org) has proposed a two-fold strategy for reference genome assembly by first generating short reads with 100× coverage from one short insert size library for contig assembly, and secondly another 100× coverage of short read data from one Hi-C library for scaffolding [26]. The first release of 100 mammalian genomes using this strategy produced remarkable assemblies, of which many are at the chromosome level, while the contig sizes are of a similar range to those produced by other short read assemblies. This strategy thus represents the most economical way to produce a chromosome-level assembly. Nevertheless, a standard Hi-C library construction protocol is not yet available [27], and the data produced from different protocols behave differently during assembly and require time-consuming manual correction [28,29]. Therefore, the accuracy of the assemblies produced with this strategy still needs validation from other datasets.
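To put a target such as 100× coverage into perspective, the required amount of sequencing follows from the relation coverage = (number of read pairs × 2 × read length) / genome size. The sketch below applies this arithmetic; the genome size of 1.2 Gb and the read length of 150 bp are assumed round numbers for illustration, not values reported in this review, and the function name is hypothetical.

```python
def read_pairs_needed(genome_size_bp, coverage, read_length_bp):
    """Read pairs required so that 2 * pairs * read_length / genome_size >= coverage."""
    numerator = coverage * genome_size_bp
    denominator = 2 * read_length_bp
    # Integer ceiling division avoids floating-point rounding on large numbers.
    return (numerator + denominator - 1) // denominator


# Assumed values for illustration: a 1.2 Gb genome and 150 bp paired-end reads.
pairs_per_library = read_pairs_needed(1_200_000_000, 100, 150)
print(pairs_per_library)      # read pairs for one 100x library
print(2 * pairs_per_library)  # short-insert library plus Hi-C library, 100x each
```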
The recently initiated Vertebrate Genome Project (VGP) has set a high standard for reference genome assembly with the goal of almost gapless chromosome-level assemblies by integrating various sequencing technologies including PacBio long read sequencing, Bionano, 10X, and Hi-C data (https://vertebrategenomesproject.org). To facilitate processing of these heterogeneous data sources, the VGP is also working on a standard computational pipeline. Though the VGP approach is expensive in terms of cost and input DNA, an increasing number of genomes have now been assembled with this approach, including several birds (https://vgp.github.io/genomeark/). The VGP consortium and B10K are now also making a joint effort to produce chromosome-level genome assemblies for at least one species representing each avian order using the VGP approach.

Towards Complete Genomic Representation of Birds
To date, birds are the vertebrate group with the most genomes available for different species. As of May 2019, a total of 456 bird genomes have been sequenced and assembled, of which 147 are available on NCBI. The B10K Consortium is a major contributor of these genomic resources. The B10K sequencing plan is organized into four phases with the ultimate goal to decode the genomes of all bird species [11]. Each phase samples the bird tree of life at increasing resolution, moving from deep nodes to increasingly shallower divergences (Figure 1). The first phase of the B10K, the Avian Phylogenomics Project, concluded in 2014 with the release of 45 bird genomes of at least one species from each neoavian order [9,10], hence sampling all of the deepest branches of the phylogeny. The second phase started in 2017 to sequence at least one representative per family. Sequencing was successful for 218 families (following the taxonomy of Howard & Moore 4th edition [30]), while 18 families could not be represented due to the lack of appropriate samples. Altogether, phase 2 analyzes 363 genomes. Of these, 90 are published and 273 were newly sequenced, including 32 still unpublished genomes that were contributed by the community to bolster B10K sampling. The analyses for phase 2 of the B10K are underway to provide a densely sampled phylogeny, new comparative genomics insights and a flood of new data for the community, including a novel whole genome alignment.
The third phase was initiated concurrently in 2019, aiming to cover one representative from each of the 2341 extant genera, which will involve sequencing genomes for at least 2250 species in addition to the publicly available genomes. The fourth phase of the B10K will make a big leap to sequence all 10,135 species [30]. It will require tremendous effort from the entire consortium to provide resources, laboratory support and funding. At the same time, the sequencing of individual species with specific biological research interests has also been initiated in individual labs all over the world. As of May 2019, 109 birds have been sequenced outside of the B10K and released to NCBI. Moving forward with ever denser sampling, the probability of duplicated efforts of sequencing the same species increases, and resources can be saved by interrogating the B10K progress database (https://b10k.genomics.cn/species.html). With sequencing technologies becoming affordable for researchers and an ever-growing base of available genomes that can be used as frameworks for new genomes, the generation of more reference genomes can be expected to grow rapidly.
Figure 1. Overview of genomes available and in preparation for all 10,135 species of birds. We use the organization of the B10K into four successive phases to show the increasing density of genome-enabled species across the phylogeny of birds. Each phase aims to cover more fine-scale relationships among bird taxa: Phase 1 sequenced 48 species from most orders (branches highlighted in blue), phase 2 includes 363 birds from most families (red), phase 3 is planning to sequence 2250 birds from all genera (yellow), and eventually all species will have genomes upon completion of phase 4 (grey). About 60% of bird species are Passeriformes and hence sampling will proportionally increase in this group starting from phase 2. The shown tree is a 'synthetic' phylogeny of birds that is estimated from taxonomic information and previous phylogenetic studies [31] because a complete, sequence-based bird phylogeny is not available. Subspecies were removed from the tree and species names were matched to the taxonomy of Howard & Moore 4th edition [30].

Centrality of the Phylogeny for Comparative Studies
'Tree thinking' has been a fundamental concept for every branch of biology since Darwin's Origin. Phylogeny is essential to understand biodiversity because not only does the species tree record the evolutionary history, the divergence events and their pace, but it is also the framework for comparative genomics. Explicitly tree-based analyses of associations between phenotypic and genomic traits are necessary to evaluate the statistical (in)dependence of data points [32]. We can only understand the origin and later modifications of phenotypes when they can be mapped onto correctly reconstructed phylogenetic trees. Faulty trees produce incorrect inferences of evolutionary history and could misguide inferences of future resilience of species to environmental change. Therefore, producing an accurate species tree is a fundamental first step of any biodiversity study.
Despite decades of efforts on reconstructing the tree of life for birds, the avian phylogeny is still under debate. Agreement exists widely on the split between Palaeognathae (ostriches and tinamous) and Neognathae, and between Galloanseres (waterfowl and landfowl) and Neoaves. The deep branches of the Neoaves, which contain 95% of all extant species, have been the subject of numerous phylogenetic hypotheses (summarized by [31,33]). The reason for the difficulty of resolving these branches is thought to be grounded in the evolutionary history of the group. Neoaves appear to have diversified rapidly [34][35][36]. Such a scenario of explosive diversification of Neoaves was confirmed by the genome-scale dated phylogeny, showing that all major groups of birds formed within just 15 million years after the Cretaceous-Paleogene (K-Pg) boundary, ca. 66 million years ago (Figure 2a, [10]). New fossil evidence of an Early Paleocene mousebird suggests that morphological differentiation may have occurred even faster after the K-Pg extinction event [37]. Multiple divergences within an evolutionary blink of the eye can leave their signature in phylogenies as a tight succession of nodes [38]. It can be expected that one must scan a large proportion of the genome to find sufficient signal that captured this branching sequence. The avian phylogenetic community has readily embraced the evolving technologies to sequence ever-increasing parts of the genome and numbers of species, from whole mitochondrial genomes [39], nuclear loci from Sanger sequencing [12,40] or targeted capture of hundreds to thousands of loci [41,42], to whole genome data [10].
The tidal flood of whole genome data used for phylogenetic analysis of 48 birds lived up to the promise of providing the most robust tree for birds to date (Figure 2a, [10]). The main hypothesis was based on over 40 million base pairs from protein-coding and non-coding data, which were exposed to great phylogenetic scrutiny. Another 322 million base pairs aligned across lineages, a dataset too large to experiment much with in terms of subsetting and filtering, was largely in agreement with the main hypothesis. Neoaves were found to be composed of two main clades, Columbea and Passerea, each of which contained both land and water birds. But even with genome-scale data, there was difficulty in resolving the early branches of Passerea, which was attributed to artifacts introduced by molecular convergence of protein-coding genes and incomplete lineage sorting (see next section). This was not the end of alternative hypotheses: in the following year, a study with less sequence data (0.39 million base pairs) but denser sampling of 198 species proposed quite different evolutionary relationships among Neoaves (Figure 2b, [42]). While statistical support for some of the nodes was often low, a number of conflicting hypotheses were put forward. These included the position of the enigmatic hoatzin as sister to the core landbirds (Inopinaves of Prum et al. [42], equivalent to Telluraves of Jarvis et al. [10]), while it was supported as the sister to cranes and waders in the Jarvis et al. [10] tree. The placement of Caprimulgiformes (swifts, nightjars, hummingbirds) as the sister group to all other Neoaves [42] differs dramatically from the placement as the sister to cuckoos, turacos and bustards within Passerea [10]. Birds with an aquatic lifestyle were found to occur both in Columbea and Passerea (Figure 2a), which argued for a convergent evolution of the water bird phenotype. This differs from Prum et al. [42], where water birds (except for Gruiformes) evolve from a common ancestor (Aequorlitornithes, Figure 2b). This example illustrates that these rearrangements of the deep nodes are not simply phylogenetic details, but they alter our interpretations about ancestors and the evolution of specific traits and lifestyles.
While these different phylogenetic hypotheses may be disheartening if one simply needs the 'right tree' for comparative or functional analyses, incongruences can point to some fascinating evolutionary processes that left their mark on the genomes [43,44]. In addition, the difficulty of confidently resolving the bird phylogeny has made this data set important for phylogenetic method development (e.g., [45][46][47]) and for the exploration of incongruence [12,31].

Challenges to Phylogenetic Resolution
Phylogenetic incongruence is conflict in the branching order between two or more phylogenetic trees that cannot be reconciled [48]. Disagreement can occur between data sets, between analysis methods, and between different regions of the genome. Causes are diverse, including (1) sampling issues, (2) inadequacies of model assumptions, (3) differing evolutionary histories of genomic compartments, and (4) biological processes. The first two stem from errors introduced by how we generate the data and how we infer evolutionary relationships from them. The latter two are manifestations of the evolutionary process on the genome. Incongruence can be an important pointer to sort genuine phylogenetic signal, i.e., information that reflects the true evolutionary history, from technical errors or biological factors that mislead phylogenetic estimation.
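Conflict in branching order between two trees can be quantified by counting the clades that occur in one tree but not in the other, in the spirit of the Robinson-Foulds distance. The following is a minimal sketch for rooted trees on the same set of taxa; the nested-tuple tree encoding, the function names and the placeholder taxon labels A to E are assumptions made for illustration and do not represent the trees in Figure 2.

```python
def clades(tree):
    """Return the non-trivial clades (frozensets of leaf names) of a rooted nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by any two trees on the same taxa
    return found, all_leaves


def rooted_rf_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees."""
    clades_a, leaves_a = clades(tree_a)
    clades_b, leaves_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return len(clades_a ^ clades_b)


# Two hypothetical five-taxon trees that disagree on the placement of B and C.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
print(rooted_rf_distance(tree_1, tree_2))
```

A distance of zero means identical topologies. Such a count measures how much two trees disagree, but it does not by itself indicate which of the four causes listed above is responsible.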

Incongruence as a Sign of Data Problems
Failure to confidently resolve certain nodes may simply be due to the lack of power, either by insufficient sequence information or by insufficient sampling of branches. The neoavian backbone tree has the 'undesirable' feature of tightly packed nodes that reflect the short amount of time for lineages to sort and new mutations to occur [10,36]. On the other hand, at least 50 million years have passed since these divergences, during which the little phylogenetic signal may have been eroded or altered in a misleading way. One solution could lie in screening even larger parts of the genomes for informative sites than the 322 Mb analyzed by Jarvis et al. [10]. About 68% of the average bird genome was left unexplored for potential phylogenetic signal. Improved whole genome aligners that identify greater proportions of orthologous regions than previously possible could make more of the bird genome accessible for phylogenomics [49][50][51].
With genomes from as few taxa as currently available, branches have to extend deep into the phylogeny to connect with their relatives, which can lead to problems with long branch attraction [52][53][54]. The problem may be ameliorated by adding branches to the tree, which spans greater parts of the evolutionary history and shifts the distribution of node ages from deep to younger divergences [55][56][57][58][59]. In some cases, the positive effect of adding taxa can outweigh the improvement from adding sequence data [60,61]. Such a scenario was also proposed to explain the alternative branching order for Neoaves put forward by Prum et al. [42]. A reanalysis of the data did not confirm the hypothesis that increased sampling explains the different topology, but rather found a strong impact of the data type used for phylogenetic reconstruction [12]. The last word may not be spoken on this matter, and the hundreds of forthcoming bird genomes will provide a powerful dataset to test the effects of increased sampling on difficult nodes.
Modeling the evolutionary processes that influence sites is a complex problem, and inappropriate model choices can produce faulty topologies and branch lengths [62,63]. Model violations are a sneaky problem: unlike the stochastic errors from insufficient sampling discussed before, they cannot be fixed by adding more data, and they can even be exacerbated by adding more loci, leading to faulty inferences with strong statistical support [64,65]. The assumptions that commonly used substitution models make about the evolutionary process at each site are frequently violated across the genome [66,67]. In birds, this is likely also the case because GC content varies considerably between avian lineages [10,68]. Complex partitioning schemes and more realistic models that account for heterogeneous processes on each alignment site (e.g., CAT, [69]) can alleviate problems [70], but are often computationally prohibitive on genome-scale datasets.
Figure 2. (…) [42] with 198 species but only 0.4 Mb of conserved coding sequence data. The two hypotheses are difficult to reconcile because of differences in the extent of genomic and taxonomic sampling, the type of loci, and the method of analysis. Clades that are discussed in the text are annotated; for the complete systematic classification, the reader is referred to the original studies.

Phylogenetic Incongruence as a Pointer to Evolutionary Processes
Rather than viewing conflicting patterns of sequence data as a nuisance to the goal of deciphering the tree of life, some incompatibilities can be the footprint of evolution acting on the genome [71][72][73]. Introgression and hybridization can be detected from incongruent gene trees [74,75], with the hybrid being grouped with either parent species when analyzing different genomic sections [76]. Hybridization appears to be common in birds, with 16% of bird species known to interbreed across species [77], and it is gaining increasing appreciation with whole genome data [78]. Incomplete lineage sorting (ILS, deep coalescence) is a process during which ancestral genetic variation is retained throughout the speciation process [79,80]. From population genetics theory, it is understood that ILS arises (1) when population size is large, which increases the probability that an ancestral polymorphism traverses through the split, and (2) when few generations pass between successive splitting events, which decreases the probability of fixation [81]. Although it is a population process, ILS is not limited to recent splits but can affect deep divergences [82]. Therefore, the strong ILS affecting the entire neoavian backbone implies that ancient populations had large effective population sizes and diverged rapidly [10,83]. A similar scenario can be drawn for the strong ILS within Palaeognathae, which confounds the estimation of their species tree [84]. At the extreme, speciation events may have occurred too quickly for populations to fix their own set of alleles, so it may simply be impossible to distinguish the order of events with a bifurcating tree. The existence of a hard polytomy across most of the neoavian backbone has been repeatedly suggested [33,36,85] but also refuted [12,40]. Formal gene-tree-based tests based on genome-wide loci reject a polytomy at least for the first three neoavian nodes [86].
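The two conditions for ILS can be made concrete with the standard result from coalescent theory for a rooted three-species tree: if the internal branch lasts t generations and the ancestral population has a diploid effective size N_e, the probability that a gene tree disagrees with the species tree is (2/3)·exp(−t/(2N_e)). The sketch below evaluates this textbook expression; it assumes a neutral locus, constant population size and no gene flow, and the parameter values are hypothetical rather than estimates for any bird lineage.

```python
import math


def discordance_probability(generations, effective_size):
    """Probability that a three-taxon gene tree differs from the species tree.

    `generations` is the length of the internal branch and `effective_size`
    the diploid effective population size of the ancestral population.
    """
    coalescent_units = generations / (2 * effective_size)
    return (2.0 / 3.0) * math.exp(-coalescent_units)


# Hypothetical scenarios: same internode length, small versus large ancestral population.
for n_e in (10_000, 1_000_000):
    p = discordance_probability(generations=100_000, effective_size=n_e)
    print(f"Ne = {n_e:>9,d}: P(discordant gene tree) = {p:.3f}")
```

The probability approaches its maximum of 2/3 as the internode becomes short relative to N_e, at which point the three possible gene tree topologies are equally frequent and the branching order cannot be told apart from a polytomy.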
Different genomic compartments, such as coding regions, intergenic regions, mitochondrial genomes, but also structural variations, can be expected to be under different evolutionary constraints and hence have different phylogenetic signal in birds [10,41,87,88,89]. Effects of the data type have been put forward to explain some of the alternative hypotheses of the neoavian backbone [10,12,61]. If a locus is under selective constraint, phylogenetic signal may be overwritten by homoplasious substitutions in unrelated lineages, multiple substitutions on the same site, and heterogeneous base composition [90]. Site saturation and heterogeneous base composition can be incorporated into substitution models, but molecular homoplasy can only be identified when another data type, ideally not affected by homoplasy, is available for comparison. This is where we return to the water birds. In the Prum et al. [42] tree, which is based on mostly protein-coding sequences, most water birds get grouped together (Figure 2b). The same was found by Jarvis et al. [10], but only when analyzing protein-coding genes [12]. The majority of the genome, which is non-coding, supports independent origins, which implies that the similarities of the protein-coding sequences of 'water birds' arose by molecular homoplasy [10]. Another example is the Caprimulgiformes (swifts, nightjars and hummingbirds), which are the sister group to turacos, cuckoos and bustards when whole genome sequences are analyzed (Figure 2a), but shift to be the sister group to all other Neoaves when only protein-coding genes are analyzed (Figure 2b) [10,12,42].

Outlook
Bird phylogenomics is looking at a resource-rich future with new genomes coming for hundreds of branches of the bird tree of life. The data will provide new angles to address long-standing questions regarding the evolutionary history of birds and to study the distribution of phylogenetic signal throughout the genome, and will further the development of new computational tools. A more densely sampled phylogeny will also be informative beyond the backbone of Neoaves. For instance, Passeriformes comprise about 60% of all bird species but only a handful of species have whole genomes available. Phase 2 of the B10K will release new genomic resources for over 200 passerines, which will provide a new scaffold to test macroevolutionary questions, such as the impact of historical factors on their extraordinary diversification [91][92][93].
With data production largely outpacing the development of appropriate inference methods, continued improvements of the scalability, accuracy and integration of the phylogenomic workflow are needed [94][95][96][97][98]. Another focus is on genome-scale reconstruction of phylogenetic networks, allowing the species tree to have reticulations pointing to hybridization [99]. The abundance of phylogenomic data also opens a new chapter for locus selection, where genomic loci can be chosen based on their characteristics, rather than being predefined with PCR or targeted capture [43]. The ability to design genomic experiments to test phylogenetic signal, or lack thereof, in different genomic compartments, across the same set of taxa, adds strength to whole genome datasets over studies with just a single data type. Selections of loci can be designed to answer specific questions (e.g., resolve the position of a specific taxon, [100]), or to have particular properties (e.g., low base-composition heterogeneity).
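Selecting loci by their properties can be as simple as computing a summary statistic per locus and applying a threshold. The sketch below keeps loci whose GC content varies little among taxa, as one possible proxy for low base-composition heterogeneity; the data structure, the function names, the toy sequences and the threshold of 0.05 are all assumptions made for illustration, not a published filtering protocol.

```python
from statistics import pstdev


def gc_content(sequence):
    """Fraction of G and C among unambiguous bases (gaps and Ns are ignored)."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in called) / len(called)


def select_homogeneous_loci(alignments, max_gc_sd):
    """Keep loci whose among-taxon standard deviation of GC content is <= max_gc_sd.

    `alignments` maps locus name -> {taxon name -> aligned sequence}.
    """
    kept = []
    for locus, sequences in alignments.items():
        gc_values = [gc_content(seq) for seq in sequences.values()]
        if pstdev(gc_values) <= max_gc_sd:
            kept.append(locus)
    return kept


# Toy alignments with hypothetical taxa; locus2 has strongly divergent GC content.
alignments = {
    "locus1": {"taxonA": "ACGTACGT", "taxonB": "ACGTACGA", "taxonC": "ACGTTCGT"},
    "locus2": {"taxonA": "GCGCGCGC", "taxonB": "ATATATAT", "taxonC": "GCGCATAT"},
}
print(select_homogeneous_loci(alignments, max_gc_sd=0.05))
```

The same pattern extends to other locus properties, such as alignment length or the proportion of missing data, by swapping the summary statistic.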

Genomic Architecture of Birds
Avian genomes make connecting genomic differences to phenotypes simpler than in other vertebrates due to their small genome size and conserved genomic structure. Birds have about the same number of genes as other amniotes [101], while their genomes are much smaller than those of other amniotes and most vertebrates [102] (Figure 3a, left). Compact genomes have been commonly related to flight, as genomes are small in birds and bats [103], smaller in flying than in flightless birds [104], and smaller in metabolically active birds [105,106]. The ancestral bird genomes have experienced massive large segmental deletions of >10,000 base pairs and purging of transposable elements (TEs) [9,103]. Bird genomes contain less repetitive DNA than those of other vertebrates, whose genomes can consist of up to 60% TEs [107]. The exception is the downy woodpecker with about 22% TEs [9] (Figure 3a, center). Autosomes of the collared flycatcher have a similarly low TE content (5.9%) as other birds, but the species has a much higher density of TEs (48.5%) on the non-recombining region of the W chromosome [108].
Protein-coding genes of birds are on average 50% shorter than mammalian genes, which is due to the shortening of introns and intergenic regions, while the exon length is about the same [9] (Figure 3a, right). Synteny is high even between distantly related birds, as are rates of gene and chromosomal duplications [109][110][111]. Therefore, karyotypes are generally highly stable, with some exceptions in which chromosome numbers range from 40 to 142 [112]. The ZW sex chromosome system of birds has heterogametic females (ZW) and homogametic males (ZZ). The Z chromosome formed from an autosome through an inversion, which placed the putative sex-determining gene DMRT1 on the Z chromosome, while it degenerated on the W chromosome [113][114][115]. A karyotype modification has occurred in Sylvioidea, where a new sex chromosome appears to have formed by the fusion of autosomes and conventional sex chromosomes [116].

Understanding Adaptation by Integrating Genomic Data with Phenotypes
Centuries of natural history study on birds have recorded the stunning diversity of their morphology and life histories. Genomics now makes it possible to unveil the mechanisms underlying this variety, of which we illustrate some examples in Figure 3b. Conducting searches for candidate genes in a phylogenetic framework helps to find the candidate 'needle' in the genomic 'haystack' of countless genetic differences that exist even between closely related species. Integration with the phylogeny also permits the reconstruction of ancestral states, such as the loss of teeth in the ancestor of birds that inactivated all genes involved in the formation of dentin and enamel [117]. The bird beak functionally replaced the teeth [118] and diversified into a tremendously effective tool in different bird lineages. The classical system to study beak shape diversity is Darwin's finches, in which cross-species genomic comparisons identified two genomic regions that are strongly associated with beak shape and size, respectively [119,120]. Beaks, like feathers, scales and claws, are mainly composed of keratins [121]. Birds have greatly diversified their beta-keratin gene repertoire compared to other vertebrates, with a subfamily of feather beta-keratins that is unique to birds, while claw and scale beta-keratins are also found in turtles and crocodiles [9,122]. Visual opsin genes are diverse in birds, supporting tetrachromatic vision in the ancestor [9], and they coevolve with plumage coloration genes [123]. Penguins have only three functional opsins [9,124] and the nocturnal kiwi has inactivated several opsin genes [125].
The availability of genomes for a greater number of species makes it possible to target the genomic foundation of special traits of certain lineages. The large variety of rock pigeon breeds was one of Charles Darwin's inspirations, and comparative genomics showed that the characteristic head crests of many breeds are associated with just a single genetic change [126]. Bird song evolved in hummingbirds, and again either once in the ancestor of parrots and passerines (with subsequent loss in suboscines), or independently within oscine passerines and parrots [10,88]. This phenotypic convergence is matched by underlying molecular parallelism, because the vocal-learning birds show convergent accelerated evolutionary rates in several hundred genes and regulatory regions [9]. The convergence goes even further, with gene expression in the brains of song-learning humans and songbirds showing surprisingly similar patterns [127].
Comparisons between closely related sister taxa with different phenotypes are a powerful approach for understanding the genomic basis of adaptation. This approach was used to identify candidate genes for flightlessness by comparing the flightless Galapagos cormorant with its flying relatives [128]. Flight may have been lost 3 to 6 times in palaeognaths, and the convergence is also found on the molecular level, affecting the regulatory regions of genes involved in forelimb development [129]. The blue and yellow feather pigmentation of budgerigars is maintained by a single nucleotide difference in a polyketide synthase [130]. Adaptation of birds to low oxygen levels at high altitudes involves increased oxygen affinity of hemoglobin. In high-altitude hummingbirds, the same molecular substitutions confer parallel adaptation of hemoglobin [131]. To test whether this molecular predictability holds across a wider range of birds, hemoglobin sequences were compared between 28 species pairs of low- and high-altitude birds, which showed that higher oxygen affinity can be achieved through multiple routes of amino acid changes [132].

Outlook
Birds will remain an important source of comparative genomic insights because of their simple genomic structure and the availability of genomic and phenotypic resources. Phenotypic data are ideally compiled and curated by experts and made accessible in databases for integration with genomes. More genomes will give us an understanding of lineage-specific genomic innovations and of changes in gene function in different lineages. Accessibility of a growing number of genomes through NCBI and genome browsers, but also the continued development of bird-specific portals (e.g., Avianbase, [133]), will be important for investigations of specific gene families or genomic regions.

Conclusions
We are in an age of unparalleled access to the evolutionary chronicles of birds. Every year a growing amount of whole-genome, RNAseq, targeted capture, and RADseq data becomes available for birds. The choice among several sequencing technologies allows investigators to tailor the genome quality to their needs. Genomes based on short reads are relatively cheap and easy to produce, while long-read sequencing technologies can be employed if continuity is required. While data generation has never been easier, the bottleneck in sequence analysis is largely set by the efficiency of algorithms and by computational skills and resources. Each new dataset is a small step towards complete coverage of the bird tree of life, towards linking genomic variability to phenotypes, and towards predicting species responses to environmental change. Comparisons with other vertebrates and other animals will shed further light on what makes a bird a bird and on their fascinating evolutionary history.

Figure 1. Overview of genomes available and in preparation for all 10,135 species of birds. We use the organization of the B10K into four successive phases to show the increasing density of genome-enabled species across the phylogeny of birds. Each phase aims to cover more fine-scale relationships among bird taxa: Phase 1 sequenced 48 species from most orders (branches highlighted in blue), phase 2 includes 363 birds from most families (red), phase 3 is planning to sequence 2250 birds from all genera (yellow), and eventually all species will have genomes upon completion of phase 4 (grey). About 60% of bird species are Passeriformes and hence sampling will proportionally increase in this group starting from phase 2. The tree shown is a 'synthetic' phylogeny of birds that is estimated from taxonomic information and previous phylogenetic studies [31] because a complete, sequence-based bird phylogeny is not available. Subspecies were removed from the tree and species names were matched to the taxonomy of Howard & Moore 4th edition [30].

Figure 2. Overview of the main competing hypotheses based on two studies with different strengths in their data acquisition: (a) Sequence-focused analysis of Jarvis et al. [10] based on whole genome sequence data for 48 birds analyzing 8251 protein-coding loci, 2516 introns, and 3679 ultraconserved elements; (b) Taxon-sampling-focused analysis by Prum et al. [42] with 198 species but only 0.4 Mb of conserved coding sequence data. The two hypotheses are difficult to reconcile because of differences in the extent of genomic and taxonomic sampling, the type of loci, and the method of analysis. Clades that are discussed in the text are annotated; for the complete systematic classification the reader is referred to the original studies.

Figure 3. Insights from comparative genomics. (a) Characteristics of the bird genome. Left: The genomes of birds are small for vertebrates and smaller than those of all other amniotes, as shown in boxplots of C-values (the haploid nuclear DNA content in picograms) for each clade and species-specific points. If multiple values for one species were available, we show the average. Data from http://www.genomesize.com/. Center: Birds have lower abundance of transposable elements (TEs) than other vertebrates. If multiple values for one species were available, we show the average. The asterisk marks the high TE density on a non-recombining region of the collared flycatcher, while the autosomes are within the range of other birds [108]. All other data were compiled from [9,107]. Right: Bird genes are shorter than those of other amniotes because of the reduction of intergenic and intron length. Data for amniotes from [9], unavailable for other vertebrates. (b) Selection of insights into phenotype-genotype interactions from bird genomes.
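The per-species averaging described for panel (a) of Figure 3 can be illustrated with a minimal Python sketch. It assumes the input is a list of (clade, species, C-value) records; the record layout, the function names, and all values below are hypothetical placeholders and are not taken from the genome size database or from the original analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (clade, species, C-value in picograms).
# The numbers are placeholders for illustration, not real measurements.
records = [
    ("Birds", "species_a", 1.2),
    ("Birds", "species_a", 1.4),
    ("Birds", "species_b", 1.5),
    ("Mammals", "species_c", 3.0),
    ("Mammals", "species_c", 4.0),
]


def average_per_species(records):
    """Collapse repeated measurements into one mean C-value per species."""
    grouped = defaultdict(list)
    for clade, species, c_value in records:
        grouped[(clade, species)].append(c_value)
    return {key: mean(values) for key, values in grouped.items()}


def values_per_clade(species_means):
    """Collect the per-species means by clade, e.g. as input for one boxplot per clade."""
    by_clade = defaultdict(list)
    for (clade, _species), value in species_means.items():
        by_clade[clade].append(value)
    return dict(by_clade)


species_means = average_per_species(records)
clade_values = values_per_clade(species_means)
```

Averaging before grouping by clade means that each species contributes a single point, so species with many repeated measurements do not dominate the distribution shown for their clade.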