The Vertebrate TLR Supergene Family Evolved Dynamically by Gene Gain/Loss and Positive Selection Revealing a Host-Pathogen Arms Race in Birds

: The vertebrate toll-like receptor (TLRs) supergene family is a ﬁrst-line immune defense against viral and non-viral pathogens. Here, comparative evolutionary-genomics of 79 vertebrate species (8 mammals, 48 birds, 11 reptiles, 1 amphibian, and 11 ﬁshes) revealed di ﬀ erential gain / loss of 26 TLRs, including 6 (TLR3, TLR7, TLR8, TLR14, TLR21, and TLR22) that originated early in vertebrate evolution before the diversiﬁcation of Agnatha and Gnathostomata. Subsequent dynamic gene gain / loss led to lineage-speciﬁc diversiﬁcation with TLR repertoires ranging from 8 subfamilies in birds to 20 in ﬁshes. Lineage-speciﬁc loss of TLR8-9 and TLR13 in birds and gains of TLR6 and TLR10-12 in mammals and TLR19-20 and TLR23-27 in ﬁshes. Among avian species, 5–10% of the sites were under positive selection (PS) (omega 1.5–2.5) with radical amino-acid changes likely a ﬀ ecting TLR structure / functionality. In non-viral TLR4 the 20 PS sites (posterior probability PP > 0.99) likely increased ability to cope with diversiﬁed ligands (e.g., lipopolysaccharide and lipoteichoic). For viral TLR7, 23 PS sites (PP > 0.99) possibly improved recognition of highly variable viral ssRNAs. Rapid evolution of the TLR supergene family reﬂects the host–pathogen arms race and the coevolution of ligands / receptors, which follows the premise that birds have been important vectors of zoonotic Our results elucidated the dynamic evolution of TLR super gene family with di ﬀ erent level of gene gain and loss leading to variable species and lineages speciﬁc TLR repertoire across vertebrates and positive selection analysis of avian TLRs showed extensive adaptive evolution shaping host pathogen arms race in the avian TLRs. Our results show that the TLR supergene family is broadly divided into six major families, including TLR1, TLR3, TLR4, TLR5, TLR7, and TLR11, with each having di ﬀ erent numbers of subfamilies—e.g., 10 subfamilies in TLR1, 10 subfamilies in TLR11, 3 subfamilies in TLR7, and a single subfamily in families TLR3, TLR4, and TLR5. We conclude that most of the TLRs originated early during vertebrate evolution and that genome duplication together with di ﬀ erential rate of gene gain and loss shaped the TLR gene family evolution in vertebrates, leading to species and lineage-speciﬁc TLR variations. We found that gene loss was common in the avian lineage, resulting in only eight extant subfamilies, the least observed among vertebrates. In contrast, 20 subfamilies were detected in ﬁshes, reﬂecting the strong impact of gene duplication. The genomic scan of diverse reptilian genomes conﬁrmed the presence of TLR13, TLR15, TLR18, TLR21, and TLR22 in reptiles (Table 1), supporting other evidence that these TLRs are widely distributed and not limited to a particular vertebrate species or group For example, TLR15 was not bird-speciﬁc and is also found in most reptilian genomes. frogs, birds.


Introduction
The toll-like receptor (TLRs) supergene family members are type-I transmembrane glycoproteins belonging to the pathogen recognition receptor (PRRs) class of proteins expressed in the cell membrane and intracellular vesicles, including the endoplasmic reticulum, endosomes, lysosomes, and endolysosomes [1,2]. TLRs constitute one of the first lines of the immune defense system and recognize a variety of pathogen-associated-molecular patterns (PAMPs) during pathogen invasion, triggering the cascade of signaling pathways leading to the adaptive immune response [3][4][5]. The vertebrate TLRs studied to date are involved solely in immune response, in contrast with invertebrate TLR-like proteins, which are also associated with developmental functions [6].
The diversity of TLRs has facilitated the detection of a diverse group of pathogenic ligands [7]. The diverse mechanisms used by TLRs paralogs for TLR-ligand recognition and the formation of a m-shaped homo or heterodimeric complex, lead to activation of downstream signaling cascades by Toll/interleukin-1 receptor (TIR) domains [1,[8][9][10][11][12][13][14][15]. The TIR dimer is recognized by the TIR domain present in different signaling adaptor proteins-including MyD88, MAL, TRIF, and TRAM [16]-and resulting in the activation of NFkB, the expression of various inflammatory and anti-pathogenic proteins [17], the initiation of the adaptive immune response, and the elimination of the invading pathogens [5,[18][19][20].
Each TLR gene consists of a highly conserved intracellular (cytoplasmic) TIR domain, which is responsible for signal transduction [21], a conserved single transmembrane region and a variable extracellular domain (ECD), involved in the ligand recognition and dimerization. ECD consists of variable numbers (~16 to 28) of leucine-rich repeats (LRRs) motifs [22].
TLRs identified to date are grouped phylogenetically into six major families (TLR1, TLR3, TLR4, TLR5, TLR7, and TLR11) [23]. Gene gain and loss have been prominent features in the evolution of this supergene family. The large and variable number of TLR genes facilitate recognition of a wide variety of ligands from diverse pathogens (bacterial, fungal, protozoan, and virus). For example, recent studies have revealed important compensatory mechanisms of the TLR supergene family in both adaptive and innate response in the absence of major histocompatibility complex (MHC) II, CD4 and invariant chain (Ii) in cod fish [24], where TLRs act as an alternative to MHCII, a well-known conserved feature of the adaptive immune system of jawed vertebrates [24][25][26]. The innate immune receptors of the coelacanth are a mixture of mammalian-and teleost-specific TLRs, affirming the transitional position of coelacanth [27], evolutionarily connecting fish and tetrapods, along with its unique immune system lacking IgM [28].
We performed comparative evolutionary genomic analyses of vertebrate TLRs using whole genome sequencing data from 79 species (mammals, birds, reptiles, amphibians, and fishes) to understand the dynamics of gene gain and loss in shaping the TLR gene family repertoire in vertebrate genomes. The most important immunological function of TLRs comprehends protection from pathogens that requires the genes to evolve rapidly in response to selective pressure exerted by constantly evolving pathogens. Because avian species are often important vectors of zoonotic pathogens and reservoirs for viruses we analyzed avian TLRs (viral and non-viral TLRs) to determine how positive diversifying selection affecting the functional and structural variation of TLRs has improved pathogen recognition in birds. We discuss our findings in the context of host-pathogen interactions and the adaptive evolution of avian TLRs to diverse ecological conditions and adaptive requirements.

Genome Scan and Synteny Analysis of TLR Supergene Family
We used representative sequences of the vertebrate TLR1-27 as the query to exhaustively perform blast [29] searches to retrieve all available TLRs from 79 species, including the genomes of 48 birds [30,31] plus 31 other vertebrates which included 11 reptiles, a frog, and 11 fish genomes (Supplementary File 1), providing updated TLR information across vertebrate. TLRs, coding sequences retrieved from the Avian genome consortium [30,31] were used in subsequent adaptive evolution analysis. For the synteny analysis of TLRs we scanned vertebrate genomes (Supplementary File 2) for each TLR gene using extensive Blast searches [29] complemented with the Genomicus tool [32,33]. Gene loss can be hard to ascertain due to constraints imposed by levels of genome coverage [34]. Here we define gene loss as genes that are not found in a given assembly after extensive blast searches.

Phylogenetic Analysis
For the vertebrate TLR supergene family tree, sequences were aligned using ClustalW as implemented in MEGA5 [35] and were corrected manually. The tree was estimated with MEGA5 [35] software using the neighbor joining method with 1000 bootstrap replications. For the phylogenetic and adaptive analyses of the avian TLRs, one to one orthologs were aligned using Muscle [36], implemented in Seaview [37], and manually corrected. We chose the best substitution model based on the Akaike information criteria (AIC) in jModelTest 2 [38]. The sequences were used for maximum likelihood tree construction using PhyML [39] with 500 bootstrap replicates to check the robustness and reliability of the tree [40]. Sequences were tested for nucleotide substitution saturation using DAMBE 5 [41] by plotting the number of transition-transversion against the genetic distance using the F84 model [42], which allows for different equilibrium base frequency and transition-transversion rate bias for nucleotide substitution. The Xia test [43] implemented in DAMBE5 [41] was used to compare the index score (ISS) with critical score (ISS.C) at third and other codon positions to obtain estimates of saturation.

Gene Conversion and Recombination
The sequence alignments were tested for recombination using the online version of GARD (Genetic Algorithm for Recombination Detection) [44] available at http://www.datamonkey.org. The GENE-CONV [45] software was used for the detection of gene conversion events with 1000 permutation and Bonferroni corrected p-value cutoff of p < 0.01 and mismatch allowed (/g1 = 1).

Positive Selection in Avian TLRs
The important immunological function of TLRs, i.e., providing protection of host from pathogens, requires them to evolve rapidly in response to selective pressure exerted by rapidly and constantly evolving pathogens. In proteins, different functional sites experience distinct selection pressures with advantageous changes being positively selected due to adaptive benefits. The Avian TLRs are assumed to have similar ligands as those reported in mammalian TLRs [46,47]. In this study, we used multiple approaches to find signals of positive selection (i.e., ω > 1, assessed by comparing the dN-number of non-synonymous substitutions per non-synonymous sites with that of dS-synonymous substitutions per synonymous sites) in avian TLRs (TLR1A, TLR1B, TLR2A, TLR2B, TLR3, TLR4, TLR5, TLR7,  TLR15). We excluded TLR21 from positive selection analysis as insufficient sequences were found. The avian TLR7 is reported to have been duplicated in a few passerine birds [48][49][50] while TLR5 is pseudogenized in others [46]. Therefore, the TLR5 pseudogene and duplicated TLR7 sequences were excluded from the positive selection analysis.
We employed codon models implemented in PAML [51][52][53] and Datamonkey [54] together with amino acid models in TreeSAAP [55]. CODEML in PAML [51] implements likelihood ratio tests (LRT) for comparisons of sophisticated nested site-specific models calculated as twice the difference of the log likelihood between the two models following Chi square distribution with the degrees of freedom corresponding to the difference in the number of parameters between the nested model, i.e., null model and alternate model (positive selection). A significant LRT implies that the null model is rejected and sites are under positive selection. We compared M1a (nearly neutral) vs. M2a (positive selection), and M7 (beta: assumes a beta distribution for 0 ≤ ω ≤ 1; therefore this model does not allow for positively selected sites) vs. M8 (beta&ω: same as the M7 plus an extra class of ω > 1, allowing positive selection) to find sites under positive selection. Bayes empirical Bayes (BEB) inferred the posterior probabilities of positive selected sites where higher PP mean high confidence. We also used the Hyphy package (http://www.hyphy.org) [56] (http://www.datamonkey.org/) [54,57], which provides several approaches for detection of positive selected sites, including single-likelihood ancestral counting (SLAC), fixed effects likelihood (FEL), random effects likelihood (REL) [58], mixed effects model of evolution (MEME) [59], fast unconstrained Bayesian approximation (FUBAR) [60], and an integrative approach. The SLAC model uses ancestral sequences reconstruction, FEL calculates site-by-site dN/dS without assuming a prior distribution. REL assumes a prior distribution across site, FUBAR ensures robustness against model misspecification, and MEME is most appropriate to detect episodic diversifying selection affecting individual codon sites. In addition, the integrative approach incorporates all sites detected by SLAC, FEL, REL, FUBAR, and MEME. Sites detected by two different methods are interpreted as having more support of positive selection [61][62][63].
Further support for our results was gained by the complementary protein level approach implemented in TreeSAAP [55]. It uses ancestral sequence reconstruction to detect the physiochemical properties changes of the amino acid replacements considering 31 amino acid properties. The amino acid replacement can lead to conservative or radical change in physiochemical properties. The positive radical changes can modify the protein structure and/or function and the number of radical changes at a site can be used as an indicator of the positive selection strength. To facilitate interpretation of the level of changes at a site we compared sites having six or more radical changes (defined as type I) with sites with less than six properties (defined as type II).

Comparison of Domain Architecture, Homology Modeling, and Structure Analysis across Vertebrate TLRs
The LRRfinder was used to predict the domain architecture and define locations of specific amino acid residues in the TLR protein domain. This was also verified, whenever possible using the Uniprot protein database (http://www.uniprot.org). The structure of each TLR was predicted using the CPHmodels 3.2 protein homology modeling server, which resulted in significant model structure for the complete regions of TLR5 and the ECD region of TLR1A, TLR2A, TLR2B, TLR3, TLR4, and TLR7. No significant structure was predicted for TLR1B and TLR15. All the highly significant positive selected sites were displayed on the respective predicted structures to show the potential functional or structural significance of specific amino acid residues. The residues were mapped onto the predicted structure using PyMOL (http://www.pymol.org).

Dynamic Gene Gain and Loss Shapes Vertebrate TLR Supergene Family Repertoire
The dynamics of gene gain and loss have an important role in the evolution and diversification of gene families [64][65][66][67][68][69][70][71]. These lineage and species-specific changes are important evolutionary mechanisms of adaptation [72][73][74]. Comprehensive TLR data from varied vertebrate species/lineages is crucial for assessing the major evolutionary events influencing the diversification of the vertebrate TLR supergene family. To describe and interpret TLR gene family evolution across vertebrates, we assessed representatives from major vertebrate groups by comparing closely and distantly related species (e.g., in mammals we covered monotremes-egg laying platypus, marsupial-short-tailed opossum, to placental mammals). TLR data was collected from diverse vertebrates. These included birds [30,31] and eight divergent reptiles from diverse environments (aquatic, semi-aquatic, and terrestrial) and under unique adaptive pressures (pathogens), including three lizards (Anolis carolinensis, Pogona vitticeps, and Gekko Japonicus), two snakes (Python molurus bivittatus and Protobothrops mucrosquamatus), three turtles (Pelodiscus sinensis, Chrysemys picta bellii, and Chelonia mydas), and three crocodilians (Alligator mississippiensis, Gavialis gangeticus, and Crocodylus porosus). Together with mammals, amphibian, and fishes (see Table 1), our analyses revealed a comprehensive picture of the TLR supergene family evolution (Figures 1 and 2).
The fishes had the highest number of TLR subfamilies (20 subfamilies) with only six subfamilies (TLR6, TLR10-TLR12, TLR15, and TLR16) missing in fishes. This showed that most of the TLR subfamilies originated early in the fish lineage. Fishes also had the highest number of genes derived from duplication events ( Table 1 and Figure 2), with cod having 41 TLR gene copies. These patterns highlight the importance of differential gene gain, and its prominence among fish lineages (Table 1 and Figure 2).
The comparative genomics of the TLR supergene family showed lineage and species-specific distribution-e.g., TLR16 is frog-specific, TLR24 lamprey-specific, TLR26 catfish-specific, and TLR27 coelacanth-specific-whereas TLR11-12 are present only in mammals and TLR19-20 and TLR26 are only found in teleost fishes. The TLR18 and TLR22 were lost in both the avian and mammalian lineages, suggesting similar adaptive pressure (Table 1 and Figure 2). Loss of TLR15 and TLR21 in the mammalian lineage possibly occurred after the divergence of sauropsida from the synapsida lineage (Table 1 and Figure 2). Likewise, the finding of TLR13, TLR18, and TLR22, together with TLR15 and TLR21 in reptiles (Table 1) suggests that these TLRs are not limited to particular vertebrate species and lineages, as previously suggested [23,45]. Our study reveals that dynamic gene gain and loss of TLRs in vertebrate genomes facilitated vertebrate TLR supergene family evolution under diverse ecological requirements. Together with mammals, amphibian, and fishes (see Table 1), our analyses revealed a comprehensive picture of the TLR supergene family evolution (Figures 1 and 2). The fishes had the highest number of TLR subfamilies (20 subfamilies) with only six subfamilies (TLR6, TLR10-TLR12, TLR15, and TLR16) missing in fishes. This showed that most of the TLR subfamilies originated early in the fish lineage. Fishes also had the highest number of genes derived from duplication events (Table 1 and Figure 2), with cod having 41 TLR gene copies. These patterns highlight the importance of differential gene gain, and its prominence among fish lineages (Table 1 and Figure 2).
The comparative genomics of the TLR supergene family showed lineage and species-specific distribution-e.g., TLR16 is frog-specific, TLR24 lamprey-specific, TLR26 catfish-specific, and TLR27 coelacanth-specific-whereas TLR11-12 are present only in mammals and TLR19-20 and TLR26 are only found in teleost fishes. The TLR18 and TLR22 were lost in both the avian and mammalian lineages, suggesting similar adaptive pressure (Table 1 and Figure 2). Loss of TLR15 and TLR21 in the mammalian lineage possibly occurred after the divergence of sauropsida from the synapsida lineage (Table 1 and Figure 2). Likewise, the finding of TLR13, TLR18, and TLR22, together with TLR15 and TLR21 in reptiles (Table 1) suggests that these TLRs are not limited to particular vertebrate species and lineages, as previously suggested [23,45]. Our study reveals that dynamic gene gain and loss of TLRs in vertebrate genomes facilitated vertebrate TLR supergene family evolution under diverse ecological requirements.  Table 1). The NJ tree was made in MEGA5 with 1000 bootstrap replication.  Table 1). The NJ tree was made in MEGA5 with 1000 bootstrap replication.

Synteny of TLR Supergene Family in Vertebrates
Synteny analysis has been fundamental in identifying conserved genomic arrangements and orthologous relationships. Synteny analyses highlighted the genomic organization of TLRs ( Figure 3, Supplementary File 2 A-P) which helped evaluate and interpret TLR gene family evolution and levels of homology of TLR genes in different genomes. Lineage-specific TLRs, e.g., fish-specific and tetrapod-specific TLRs, showed highly conserved synteny. In contrast, the coelacanth showed shared features with both fish and tetrapods. Only TLR3, TLR5M, TLR7, and TLR8 genes showed conserved synteny across vertebrates (from fishes to mammals) ( Figure 3).

Synteny of TLR Supergene Family in Vertebrates
Synteny analysis has been fundamental in identifying conserved genomic arrangements and orthologous relationships. Synteny analyses highlighted the genomic organization of TLRs (Figure 3, Supplementary File 2 A-P) which helped evaluate and interpret TLR gene family evolution and levels of homology of TLR genes in different genomes. Lineage-specific TLRs, e.g., fish-specific and tetrapod-specific TLRs, showed highly conserved synteny. In contrast, the coelacanth showed shared features with both fish and tetrapods. Only TLR3, TLR5M, TLR7, and TLR8 genes showed conserved synteny across vertebrates (from fishes to mammals) ( Figure 3).
We found that the TLR13 clade consists of two diverse groups, one with coelacanth and mammals (labeled as 13M) and another with amphibian and reptiles (labeled as 13X) (Figure 1), which were supported by high bootstrap values and our synteny analysis (Supplementary File 2 H1-H2). We clarified the dubious naming of TLR14, TLR18, and TLR25 and address this ambiguity by assessing synteny (genomic arrangement of gene and flanking genes) and relative phylogenetic relationships. Our phylogeny and syntenic analyses suggest that TLR14 [75], which was first described in Fugu, should also be placed within TLR18 with zebrafish and catfish and be renamed TLR18 (Supplementary File 2-I2). The genes that have been previously referred to as TLR18 in medaka, tilapia, and cod are syntenic and form a monophyletic clade with TLR25 from catfish [76] and should thus be classified as TLR25 (Figure 1 and Supplementary File 2P).
Since the frog TLR12 is more similar to the zebrafish TLR19 than the mammalian TLR12, and does not retain a conserved syntenic relationship with either TLR12 and TLR19, we retained it as TLR16 [23]. Overall, this TLR dataset, derived from an extended evaluation of additional newly
We found that the TLR13 clade consists of two diverse groups, one with coelacanth and mammals (labeled as 13M) and another with amphibian and reptiles (labeled as 13X) (Figure 1), which were supported by high bootstrap values and our synteny analysis (Supplementary File 2 H1-H2). We clarified the dubious naming of TLR14, TLR18, and TLR25 and address this ambiguity by assessing synteny (genomic arrangement of gene and flanking genes) and relative phylogenetic relationships. Our phylogeny and syntenic analyses suggest that TLR14 [75], which was first described in Fugu, should also be placed within TLR18 with zebrafish and catfish and be renamed TLR18 ( Supplementary  File 2-I2). The genes that have been previously referred to as TLR18 in medaka, tilapia, and cod are syntenic and form a monophyletic clade with TLR25 from catfish [76] and should thus be classified as TLR25 (Figure 1 and Supplementary File 2P).
Since the frog TLR12 is more similar to the zebrafish TLR19 than the mammalian TLR12, and does not retain a conserved syntenic relationship with either TLR12 and TLR19, we retained it as TLR16 [23]. Overall, this TLR dataset, derived from an extended evaluation of additional newly sequenced genomes, provided an improved evolutionary assessment and a more accurate classification of the TLR supergene family phylogeny across vertebrates (Table 1, Figure 2).

Gene Conversion and Recombination
Gene conversion is common in tandem-duplicated gene paralogs located in close proximity. Gene conversion and recombination can bias phylogenetic inference and therefore care should be taken to identify signals of those events. The duplicated avian TLR gene paralogs of TLR1 and TLR2 are closely related to their orthologous counterpart from other species (i.e., within species paralogs TLR1A is closely related to TLR1B and TLR2A with TLR2B). This is caused by concerted evolution, possibly through gene conversion, leading to homogenization of the paralogs and reduced phylogenetic signals. Gene conversions have been reported in avian TLR1 and TLR2 [77]. TLR1 and TLR2 are closely located and are separated by a short physical distance (~12 kb and~5 Kb between TLR1 and TLR2 duplicates, respectively) [46,64]. The C terminal of TLR1A/B ranging from LRR14 to the TIR domain is under gene conversion, whereas in TLR2A/B, the N terminal region (N terminal to LRR8) and C terminal region (LRR15 till TIR) are homogenized [46,64,77]. These homogenized regions may have important conserved functions [77].

Vertebrate TLRs Domain Architecture
TLRs consist of three major characteristic domains (extracellular domain (ECD), transmembrane domain (TM), and toll/interleukin-I receptor domain (TIR)). The one exception is TLR5S, which lacks the characteristic TIR domain. The TM domain connects the ECD with the cytoplasmic domain. The ECD is a solenoid shaped structure involved in interaction with PAMPs present in varied pathogens and is also involved in the formation of a M shaped homo and/or hetero dimer, which leads to signaling cascade by TIR activation [78].
Each LRR was around 20-30 amino acids long and had a conserved LxxLxLxxN leucine rich motif and a remaining variable region. The hydrophobic leucine residues constitute the conserved concave surface of the parallel beta strands forming the hydrophobic core, where asparagine is involved in the hydrogen bonding providing structural integrity [79]. Leucine can be replaced with other hydrophobic amino acids, whereas asparagine can also be replaced with other hydrogen donor's like threonine, serine, and cysteine. The variable 'x' residues are responsible for the TLR function.
The exposed and convex surface of ECD is formed by the variable part of LRR repeat and this region is involved in PAMPs recognition. The cysteine clusters capping present in the terminal LRRs (LRR-NT and LRR-CT) protect the terminal hydrophobic residues. The human TLR1, 2, 4, 6, and 10 have all three domains, whereas TLR 3, 5, 7, 8, and 9 have a single domain [79]. This categorization is due to the interrupted asparagine in LRRs of the central domain, which results in structural flexibility, whereas uniform LRRs repeats with continuous asparagine results in single domain architecture [78]. The comparison of the domain architecture of representative TLRs from stickleback, coelacanth, Chinese soft shell turtle, Anolis lizard, and chicken was consistent with the human TLRs (   Tables (For  TLR1B and TLR15 structure prediction was not significant).

Rapid Adaptive Evolution of Avian TLRs
We found variable numbers of positive selection sites among different TLR genes. The positive selection model M2a and M8 implemented in PAML4.7 [51] detected significant signals of positive selection in all genes ( Table 2). The reliability of sites was well supported by the fact that most positive selected sites with PP > 0.99 under M8 also had PP > 0.90 in M2a. The M8 model found higher number of sites compared to the more conservative model M2a (Table 2). We found significant evidence of positive selection in all avian TLRs studied with different percentage of positively selected sites for different genes ranging from 11% in TLR15 to 5% in TLR7. The highest omega values were found in the TLR4, TLR2A, and TLR7 with 2.6, 2.6, and 2.5, respectively, while TLR15 had the lowest value of 1.5.
The number of positive selected sites with the highest (BEB) posterior probabilities (PP) also varied and the maximum number of sites with high PP were found in TLR7 (viral) and TLR4 (nonviral), which shows for the first time that both viral and non-viral TLR genes follow a positive selective regime (Table 2). In TLR7, Model M8 detected high numbers of positive selected (PS) sites The avian TLR1/TLR2 forms heterodimers that are activated by both diacyl (Malp-2) and triacyl (Pam3) lipopeptides [13,79,80], with the exception of TLR2a/ TLR1b, which are activated by Pam3 but not by Malp-2. In addition, TLR2a/TLRL1b is activated by peptidoglycan [77]. We found 12 positively-selected (PS) sites in TLR1A, 26 PS sites in TLR1B, 34 PS sites in TLR2A, and 25 PS sites in  Tables (For  TLR1B and TLR15 structure prediction was not significant).

Rapid Adaptive Evolution of Avian TLRs
We found variable numbers of positive selection sites among different TLR genes. The positive selection model M2a and M8 implemented in PAML4.7 [51] detected significant signals of positive selection in all genes ( Table 2). The reliability of sites was well supported by the fact that most positive selected sites with PP > 0.99 under M8 also had PP > 0.90 in M2a. The M8 model found higher number of sites compared to the more conservative model M2a (Table 2). We found significant evidence of positive selection in all avian TLRs studied with different percentage of positively selected sites for different genes ranging from 11% in TLR15 to 5% in TLR7. The highest omega values were found in the TLR4, TLR2A, and TLR7 with 2.6, 2.6, and 2.5, respectively, while TLR15 had the lowest value of 1.5.
The number of positive selected sites with the highest (BEB) posterior probabilities (PP) also varied and the maximum number of sites with high PP were found in TLR7 (viral) and TLR4 (non-viral), which shows for the first time that both viral and non-viral TLR genes follow a positive selective regime (Table 2). In TLR7, Model M8 detected high numbers of positive selected (PS) sites (4.5% sites, total of 42 positively selected sites, 30 sites PP > 0.90, 25 sites PP > 0.95 and 23 sites PP > 0.99) and a somewhat similar scenario was found in TLR4 (5.5% sites, total of 45 sites, 29 sites PP > 0.90, 27 sites PP > 0.95 and 20 sites PP > 0.99). TLR3 had fewer sites with high PP (e.g., of 5% positive selected sites in TLR3 only five sites had PP > 0.90). Overall these results show that the number and strength of positive selection sites varies in the TLR supergene family and that both viral (TLR7) and non-viral (TLR4) TLRs evolved under strong positive selection.
Overall, of 12 sites in TLR1A (Tables 2-4 Figure 4, Supplementary File 4), all of them in LRRs with 14 in the variable region and 11 in the conserved region. The PS sites found in our study could be related with the a wide range of ligands as TLR2 recognizes a variety of compounds other than triacyl lipoproteins, including lipoteichoic acids, lipoarabinomannan, and zymosan [18]. Also, the formation of combinatorial binding sites by selection of TLR1 or TLR6 as the dimerization partner can explain, at least in part, the broad ligand specificity. A similar mechanism might explain the extent that PS sites found in TLR1 and TLR2 are shared with their counterparts in birds.  [13,81].
TLR3 recognizes dsRNA and prevents the spread of most viruses. TLR3-ECD binds with dsRNA at two sites located at opposite ends of the TLR3 horseshoe structure. The first dsRNA:TLR3 interaction site is located close to the C-terminus, on LRR19-LRR21 and the second dsRNA:TLR3 interaction site is located on the N-terminal end (LRR-NT-LRR3) [82]. The intermolecular contact between the two C-terminal domain regions of TLR3 coordinates and stabilizes the dimer by a series of protein-protein interactions. Of 22 PS sites in TLR3 (Tables 2-4, Figure 4, Supplementary File 4), 2 are in signal peptides, 2 in LRR-NT, 2 in TM domains, 3 in the TIR domain, and 13 in LRRs. Of 13 sites in LRRs, 9 were in the variable region. The PS sites found in TLR3 likely favor the recognition and protection against rapidly evolving viral RNA [82].
TLR4 forms a heterodimer with myeloid differentiation factor 2 (MD-2) and recognizes diverse LPS molecules [83,84] along with components of yeast, trypanosoma, and even viruses [20,81,85]. The ECD of TLR4 consists of three subdomains. N subdomain consists of LRR-NT and LRR 1-6, the central subdomain ranges from LRR7-12 and the C subdomain consists of LRR 13-22 and LRR-CT. LPS causes dimerization of the TLR4-MD-2 complex at the central and/or the C-terminal, and interaction between TLR4 and LPS-MD-2 complex takes place at the concave surface between N and central sub-domain [84]. We found 41 PS sites in TLR4 (Tables 2-4 The TLR4 has a 330-amino-acid-long variable region in ECD which is also known as the middle region. Within the middle region there are 82 amino acids that are hyper-variable across species with numerous species-specific changes [86]. The majority of the PS sites found in TLR4 were concentrated near to the hyper-variable region. The primary contact interface between TLR4 and MD-2 involves two chemically distinct regions, the A and B patches provided by the N-terminal and the central domains of TLR4 and main dimerization interface of TLR4 located in the central C-terminal domain. The presence of positive selected sites in TLR4-MD-2-LPS complex may support the remarkable versatility of the ligand recognition mechanisms employed by the TLR family, which is essential for defense against diverse microbial infections [84,86]. The bacterial flagellin is a virulence factor recognized by TLR5 [87]. The residues 174-401 in ECD of TLR5 are responsible for species-specific flagellin recognition [88]. The alpha and epsilon Proteobacteria are able to evade TLR5 recognition by mutating key residues in the TLR5 recognition site [89] and it is suggested that positive selection in primate TLR5 may be related with coevolution between PRRs and their microbial ligands [81]. Twenty-six PS sites were found in TLR5 (Tables 2-4, Figure 4, Supplementary File 4), with two sites each in signal peptide, LRR-NT, TM and TIR domains, one site in LRR-CT and the remaining 17 sites in LRRs, of which seven were in the variable region and 10 in the conserved region. Seven of these PS sites were located in 228 amino acid region identified previously [88] and thus could play an important role in the species-specific flagellin recognition and defense.
TLR7 recognize single-stranded RNAs [90]. In TLR7, we found 4, 1 and 4 PS sites in LRR-NT, TM, and TIR domain, respectively, and 39 PS sites were present in LRRs of which 33 sites were in the variable region (Tables 2-4, Figure 4, Supplementary File 4). This variable region directly interacts with the ssRNA. Thus, the high proportion of PS sites in viral TLR7 is probably related with the host pathogen arms race. TLR7 is also known to be evolving under strong selection pressure in mammals [79,85]. Recent studies found TLR7 to be under positive selection in bats, which are natural reservoirs and carriers of numerous deadly viruses [91]. The long-term coexistence of bats and viruses imposed strong selective pressures on the bat genome, with strong positive selection occurring in genes like TLR7 involved in the innate immune system, the first line of anti-viral defense [92]. The rapid evolution of viral TLR and its associated role in maintaining and disseminating viruses (e.g., in bats), suggest that it might have a similar role in the rapid evolution of viral TLR7 in birds. The avian TLR7 is a candidate for the detection of influenza [93] and has evolved to recognize ssRNA (with very high mutation rates due to lack of mismatch repair), pointing towards the long coexistence of viruses and birds. This may help understand the role of birds as natural reservoirs and vectors of zoonotic pathogens [93].
TLR15s recognize yeast-derived agonists [94]. We found 41 PS sites (Table 3 and Supplementary File 4) with 1, 3, 4, 2, and 1 PS sites in the signal peptide, LRR-NT, LRR-CT, TM, and TIR domains, respectively, and 30 sites in LRRs, of which 14 were present in the variable region. None of the PS sites were located in the highly conserved three box regions of the TIR domain [95,96]. The exact mechanism of action of TLR15 is unknown but the presence of PS sites is in consistent with other TLRs involved in immune defense.
Evidence of positive-selected sites was further assessed using by multiple complementary approaches (SLAC, FEL, REL, MEME, and FUBAR) implemented in the HyPhy package (Table 3) (http://www.hyphy.org) [56] (http://www.datamonkey.org/) [54,57] in viral and non-viral avian TLRs (Table 3). Sites detected by more than one method (concordant of two or more methods) were thus considered to be robust candidates for positive selection (Table 3) [79]. We further implemented TreeSAAP [55] to detect positive radical changes in physiochemical properties of amino acids, which in turn can affect the structure and function of proteins. The TreeSAAP results also complemented our findings of positive selection (Table 4 and Figure 4), with most sites under selection belonging to type I (sites with six or more positive radical changes) or type II radical changes (sites with less than six positive radical changes).
Finally, homology modeling elucidated the role of the highly significant positive selected sites (Table 4) on the structural and functional diversification of TLRs (Figure 4). Since the majority of positive selected sites restricted to the variable extracellular domain (ECD) are involved in the ligand recognition and dimerization, this highlights their importance in the structure and function of these proteins (Figure 4). This strongly supports the hypothesis that the host-pathogen arms race drives the rapid evolution of TLRs.

Discussion
The vertebrate toll-like receptors (TLRs) supergene family constitutes the first line of immune defense against diverse pathogens and provides a fascinating example of the host-pathogen evolutionary arms race. Here, employing state-of-the-art DNA and protein level analyses, we provided a comprehensive comparative evolutionary genomics characterization of the vertebrate TLR supergene family using whole genome sequencing data from 79 species (mammals, birds, reptiles, amphibians, and fishes) together with strong positive selection of avian TLRs (viral and non-viral TLRs).
Our results elucidated the dynamic evolution of TLR super gene family with different level of gene gain and loss leading to variable species and lineages specific TLR repertoire across vertebrates and positive selection analysis of avian TLRs showed extensive adaptive evolution shaping host pathogen arms race in the avian TLRs.
Our results show that the TLR supergene family is broadly divided into six major families, including TLR1, TLR3, TLR4, TLR5, TLR7, and TLR11, with each having different numbers of subfamilies-e.g., 10 subfamilies in TLR1, 10 subfamilies in TLR11, 3 subfamilies in TLR7, and a single subfamily in families TLR3, TLR4, and TLR5. We conclude that most of the TLRs originated early during vertebrate evolution and that genome duplication together with differential rate of gene gain and loss shaped the TLR gene family evolution in vertebrates, leading to species and lineage-specific TLR variations. We found that gene loss was common in the avian lineage, resulting in only eight extant subfamilies, the least observed among vertebrates. In contrast, 20 subfamilies were detected in fishes, reflecting the strong impact of gene duplication. The genomic scan of diverse reptilian genomes confirmed the presence of TLR13, TLR15, TLR18, TLR21, and TLR22 in reptiles (Table 1), supporting other evidence that these TLRs are widely distributed and not limited to a particular vertebrate species or group [23,45]. For example, TLR15 was not bird-specific and is also found in most reptilian genomes. Similarly, TLR21 was also found in reptiles, as well as fishes, frogs, and birds. The subfamilies TLR11-TLR12 are mammal-specific and subfamilies TLR19-20 and TLR23-27 are specific to fishes. More duplication events occurred in fishes (TLR2, TLR4, TLR5, TLR7, TLR8,  TLR9, TLR14, TLR24, and TLR19-TLR23) compared with tetrapods (TLR1, TLR2, TLR5, TLR8, TLR13, and TLR14), clearly showing a reduction of the TLR supergene family repertoire of fishes relative to other vertebrates (Table 1), e.g., maximum of 41 TLR copies in cod genome among fishes compared to maximum 14 copies in mammals suggest the greater role of TLRs in fishes. The importance of TLRs in fishes is also supported by the loss of the MHCII, CD4, and invariant chain (Ii) in cod [24] and IgM in coelacanth [28], which suggests that TLRs have an important alternative and compensatory role in the fish lineage.
The synteny of TLR supports fish-and tetrapod-specific genomic arrangements, with the exception of subfamilies TLR3, TLR5M, TLR7, and TLR8 having conserved syntenic organization across all vertebrates. The coelacanth was an exception to this trend as its TLRs had features shared with both tetrapods (TLR1, TLR2, and TLR13) and fishes (TLR21), supporting its evolutionary proximity with both groups. This pattern also probably reflects its unique immune system, which lacks IgM as part of its adaptive immune system [27,28].
The important immunological function of TLRs-i.e., protection of host from pathogens-requires them to evolve rapidly in response to selective pressure exerted by rapidly and constantly evolving pathogens. Birds have been important vectors of zoonotic pathogens and reservoirs for viruses, thus the availability of avian genomes allowed us to test the adaptive evolution of avian TLRs responsible for both viral and non-viral pathogens.
Positive selection is one of the hallmarks of immune-defense-related genes [97,98] and especially those encoding recognition proteins, evolve under positive selection [99]. There is growing evidence of positive selected sites in TLR loci [46,50,65,76,86]. These positive-selected sites can provide increased number of advantageous variations, which is important for pathogen recognition and the host-pathogen arms-race that facilitate successful adaptation against changing environments and pathogens [99]. The use of different codon and protein-level approaches revealed extensive positive selected sites in birds TLRs and the homology modeling of these important changes in TLR proteins (Tables 2-4, Figure 4 and Supplementary File 4) confirms their possible effect on structural and functional diversification of the respective TLRs, and is discussed below.
Our results were further supported by protein level approaches. Most PS sites found under high PP > 0. 99 (Table 4 and Figure 4), had the highest number of positive radical physiochemical changes. Of the 71 sites with high PP > 0.99 (Table 4), 59 PS sites showed positive radical changes in physiochemical properties (TreeSAAP categories 6, 7, and 8). We also found 15 PS sites with type-I radical changes (sites with six or more positive radical changes). TLR4 (non-viral) and TLR7 (viral) had the most sites with PP > 0.99 (Table 4). In TLR4 we found 20 such sites (Table 4 and Figure 4), among which 17 showed type II changes (sites with less than six positive radical changes) and 6 PS sites showed type I changes. Most of the type I changes were restricted to TLR2A (5 sites) and TLR4 (6 sites) ( Table 4 and Figure 4). In TLR7 we found 23 sites with PP > 0.99, among which 19 showed type II changes and two PS sites showed type I changes ( Table 4). The homology modeling showed that the majority of sites having PP > 0.99 were found in the ECD with only few (4 sites) found outside the ECD (Figure 4)-e.g., site 16 from TLR2A, sites 640 and 655 from TLR4, and site 920 from TLR7-were found in the signal peptide, TM domain, and TIR domain of the respective gene.
Overall positive selection in the avian TLRs suggests that the host pathogen arms race has played an important role in the rapid evolution of the avian TLRs immune genes to adapt against pathogens. Our finding suggests rapid positive selection in both viral and non-viral TLRs (Table 3) with non-viral TLR4 having the highest PS sites (20 sites with highest PP > 0.99) and among viral TLRs, TLR7 had the highest number of PS sites (23 sites with PP > 0.99). The majority of PS sites with type I and II positive radical changes were located in the LRRs of ECD, which is mainly involved in PAMPs recognition. The finding of non-viral TLR4 and viral TLR7 genes with highest number of PS cites coincides with similar finding for TLR7 [87] and non-viral TLR4 [92] in mammals. The role of non-viral TLR4 in recognizing diverse ligands (e.g., LPS-lipopolysaccharide and LTA-lipoteichoic) and TLR7 in recognizing ssRNA and possibly influenza [94] may be linked with the mechanism of how birds maintain and disseminate numerous deadly viruses [95].
These findings support the premise that the host-pathogen arms race led to co-evolution and possibly explains the strong selective pressure imposed by the long-term coexistence of viruses and birds and is consistent with the proposed role of birds as natural reservoirs and vectors of zoonotic pathogens [100]. The rapid evolution of the TLR supergene family that we have documented using newly-available bird genomes fits well with other evidence that these immune genes experienced strong adaptive selection due to the rapid evolution of pathogens [101] and that the host-pathogen arms race increases the chance of developing novel protection against diverse species of pathogens.

Conclusions
This study revealed lineage and species-specific differences in the distribution of the TLR supergene family among vertebrates (n = 79 species), with varied rates of gene gain/loss resulting in different repertoires of TLRs that would have facilitated recognition and protection from diverse pathogens. Our results provide strong support for the rapid evolution of both viral and non-viral avian TLRs and strengthens the hypothesis that the long-term coexistence of birds and viruses contributed to the strong selective pressure observed in the viral TLR immune genes. The patterns of gene gain, gene loss, and positive selection in the TLR gene superfamily provides compelling support of the co-evolutionary host-pathogen arms race.