Phylodynamics of H5N1 Highly Pathogenic Avian Influenza in Europe, 2005–2010: Potential for Molecular Surveillance of New Outbreaks

Previous Bayesian phylogeographic studies of H5N1 highly pathogenic avian influenza viruses (HPAIVs) explored the origin and spread of the epidemic from China into Russia, indicating that HPAIV circulated in Russia prior to its detection there in 2005. In this study, we extend this research to explore the evolution and spread of HPAIV within Europe during the 2005–2010 epidemic, using all available sequences of the hemagglutinin (HA) and neuraminidase (NA) gene regions that were collected in Europe and Russia during the outbreak. We use discrete-trait phylodynamic models within a Bayesian statistical framework to explore the evolution of HPAIV. Our results indicate that the genetic diversity and effective population size of HPAIV peaked between mid-2005 and early 2006, followed by drastic decline in 2007, which coincides with the end of the epidemic in Europe. Our results also suggest that domestic birds were the most likely source of the spread of the virus from Russia into Europe. Additionally, estimates of viral dispersal routes indicate that Russia, Romania, and Germany were key epicenters of these outbreaks. Our study quantifies the dynamics of a major European HPAIV pandemic and substantiates the ability of phylodynamic models to improve molecular surveillance of novel AIVs.


Introduction
The epidemiology of the H5N1 highly pathogenic avian influenza viruses (HPAIVs) is characterized both by the far-reaching economic impact for commercial poultry production, and also by the risk of spread into mammalian species, including humans. Transmission into humans has mainly been attributed to direct contact of susceptible individuals with infected poultry, including activities associated with the slaughter and dressing of poultry for human consumption [1]. The maintenance and spread of avian influenza (AI) infection in domesticated poultry are mainly associated with the establishment of the virus in free-range (or backyard) poultry and its subsequent introduction to commercial animals via live-bird markets. Furthermore, wild birds are the natural reservoirs of the virus, and the movement of wild birds infected with HPAIV is known to play an important role in the spread, circulation, and maintenance of the virus through either direct or indirect contact with poultry [2][3][4].
The ancestral strain of H5N1 HPAIV, referred to as A/goose/Guangdong/1/96, emerged in commercial domesticated geese in the Guangdong province of China in 1996 [5]. Since then, the strain has undergone numerous genetic changes that have given rise to several distinct lineages, also referred to as clades [6]. To date, ten H5N1 HPAIV clades have been identified based on comparisons of 859 hemagglutinin (HA) gene sequences [7]. These viruses are now widespread in countries throughout Asia, Europe, and Africa. Between 2004 and 2005, a new H5N1 HPAIV strain (sub-clade 2.2) emerged that caused severe outbreaks in wild birds and poultry in Northwestern China [6,8]. In Europe, HPAI H5N1 was first detected at the end of 2005 in both poultry (in Russia, Romania, and Turkey) and wild birds (in Croatia) [2,[9][10][11]. In January 2006, the first human infection outside Southeast Asia was reported in Turkey. In February 2006, the first of occurrence of HPAI H5N1 in Northern Europe was identified from a wild mute swan (Cygnus olor) in Germany in the southwestern part of the Baltic Sea [11,12]. Between 2006 and 2010, the epidemic spread through 13 European Union (EU) member states located eastward of a line extending from Southeastern Sweden to Southwestern Italy. Both poultry and wild-bird populations were infected in Austria, Bulgaria, Czech Republic, Denmark, France, Greece, Hungary, Poland, Slovak Republic, Slovenia, Switzerland, and the UK [10,13]. Phylogenetic analysis of HPAIVs isolated from European outbreaks distinguished at least two different sub-lineages of the 2005 Qinghai strain, which confirmed that the virus spread from southeastern Asia into Europe [10][11][12][13].
Since 2005, surveillance of avian influenza viruses in poultry and wild birds has become compulsory in the EU. Member states have conducted extensive molecular surveillance for AI, with most of the studies focused on genetic analysis of partially or fully sequenced AI viral genomes [14][15][16][17][18][19][20]. Molecular epidemiological studies of H5N1 HPAIV sequence data isolated between 2005 and 2010 in EU member states, for example, were focused on the molecular characterization, genetic classification, identification of genetic shifts and drifts, and the degree of relatedness to previously isolated viruses. Phylogenies of H5N1 HPAIV isolated in the EU were estimated using traditional parsimony, neighbor-joining, and maximum-likelihood methods. However, those methods typically ignore various sources of uncertainty, including uncertainty associated with estimates of the phylogenetic relationships, divergence times, and history of geographic dispersal [21]. Furthermore, epidemiological studies that explored risk factors and the spatial and temporal evolution of H5N1 HPAIV in the EU between 2005 and 2010 were mainly conducted separately from phylogenetic studies [22][23][24][25][26]. Phylodynamics is an emerging field that aims to characterize the joint evolutionary and epidemical behavior of rapidly evolving infectious diseases using tools borrowed from the field of phylogenetics [27]. The probabilistic models used in the phylodynamic approach are mostly pursued in a Bayesian statistical framework [28]. This approach treats parameters of the phylodynamic model as random variables, such that each parameter is described by a specified prior probability distribution (and a corresponding inferred posterior probability distribution). Accordingly, the Bayesian approach provides a natural way to estimate (and accommodate) uncertainty in the phylodynamic model parameters, including the virus phylogeny, divergence times, and history of geographic spread [29].
Recent evidence suggests that Bayesian methods may be more accurate in assessing HPAIV evolution than traditional phylogenetic methods. For example, Lemey et al. [29] adopted a Bayesian phylodymanic approach to re-analyze a H5N1 HPAIV dataset initially studied using conventional methods [30,31]. Results of the Bayesian analysis did not support the earlier conclusion regarding the epidemiological link between Guangdong and Indonesia, but instead suggested that the ancestral strains of the Indonesian outbreaks originated from the Hunan province in south central China [30,31]. Furthermore, these studies demonstrated the potential of Bayesian phylogeographic methods to estimate the posterior probability of each geographic location at any point in the phylogenetic tree, and the ability for phylogenetic parameters to be estimated from different genomic segments of the H5N1 HPAIV genome without assuming that the sequences shared an identical phylogenetic history [29].
Here, we extend the landmark study of Lemey et al. [29] to estimate the history of HPAIV dispersal into and within Europe after its initial detection in Russia in 2005. Our data comprise 277 H5N1 HPAIV HA and neuraminidase (NA) gene sequences (referred to as isolates) collected between 2005 and 2010 in Europe and Russia. We adopt a discrete-trait phylodynamic model to estimate both the history of viral migration between geographic areas, and also to infer the movement of the virus among host species (wild vs. domesticated birds). We also estimate temporal changes in viral population size, and discuss the potential of phylodynamic methods to improve AI surveillance. Our study provides quantitative estimates of the mechanisms that led to the rapid spread of HPAIV throughout Europe in 2005-2010 and further illustrates the potential of this approach to improve the prevention and control of novel emerging AIVs.

Sequence Data
Our dataset comprised complete (or nearly complete) H5N1 HPAIV haemagglutinin (HA) and neuraminidase (NA) gene sequences isolated in Europe and the Russia between May 2005 and June 2010, including information on the date, location, and host from which the sequences were isolated. These data were obtained from the Global Initiative on Sharing All influenza data (GISAID) public database [32], with the exception of sequences from Romania, which have been described elsewhere [33]. Laboratory recombinants or highly cultured sequences were excluded from the dataset. The length of the HA and NA gene regions ranged from 1621-1803 and 1211-1458 bp, respectively. Our dataset comprised a total of 347 sequences, of which 277 (79.8%) were from Europe and Russia, and 70 (20.2%) were from Asian and African countries that were included to root the phylogenetic tree (Table  S1). Furthermore, we cross-matched each retrieved sequence with the NCBI Influenza Virus Resource (IVR) [34] database to obtain information on the date, location, and host of origin of the sample. Because both IVR and GISAID databases do not report the exact longitude and latitude (and occasionally the actual date of collection), we consulted the metadata field for each sequence isolate to approximate the location and date of collection, as well as to specify the host species. We identified metadata for each sequence by comparing strain names between databases (e.g., A/swan/Germany/6/2007), and, when available, we cross-matched information with the corresponding publication [10][11][12][13][14][15][16][17][18][19][20]. We specified the location for each sample (latitude-longitude) using the centroid of the country from which samples were isolated. We converted the collection date for each sequence into fractional years (decimal days) in order to estimate divergence times. If only the year of isolation was available (1.2% of the sequences), we specified the ages as the mid-point of the corresponding year. Similarly, if only the month of isolation was available (10.4% of the sequences), we recorded the date as the mid-point of the corresponding month. Finally, we classified host species as either wild birds, domestic birds, or other (non-avian hosts). Table S2 summarizes the profile of 277 H5N1 HPAIV sequences isolated from reported outbreaks in Europe and the Russia between May 2005 and June 2010. Finally, an epidemic curve was plotted using approximately 1130 outbreak reports retrieved from the Food and Agriculture Organization of the United Nation (FAO) Global Animal Disease Information System EMPRES-i [35]. The outbreak data included dates and locations of detected cases in domesticated poultry and wild birds in Europe and Russia between 2005 and 2010.

Preliminary Phylogenetic Analysis
We aligned the H5N1 HPAIV HA and NA gene regions using MUSCLE version 3.8 [36]. The resulting alignments were adjusted manually to ensure that these protein-coding gene regions remained in frame (assessed by amino-acid translation using Mesquite version 3.01) [37]. We removed all sequences (25%) with 100% nucleotide identity and both gene regions were found to be free of homologous recombination using Recombination Detection Program version 3 (RDP3) [38]. For each gene region, we selected a mixed substitution model (assignment of data partitions to substitution models) by first defining six data subsets (the codon positions of each gene region), and then selecting among mixed-substitution model using the Bayesian Information Criterion (BIC) implemented in PartitionFinder v 1.1 [39]. Finally, we assessed the degree of topological (in)congruence between the two gene regions by performing maximum-likelihood estimates of the phylogeny for the individual gene regions under the selected mixed-substitution model; these analyses entailed 100 non-parametric bootstrap replicate searches of each gene region using RAxML version 8 [40].

Divergence-Time Estimation
We estimated divergence times for the H5N1 HPAIV sequence dataset using the (relaxed) molecular-clock models implemented in BEAST v 1.8 [41]. We used the mixed-substitution model for each gene region identified in the preliminary analyses detailed above. We assessed the fit of the sequence data to three branch-rate prior models: (1) a strict molecular clock model (which assumes that substitution rates are stochastically constant across branches of the tree); (2) the uncorrelated exponential relaxed clock (UCED) model, which assumes that substitution rates on adjacent branches are sampled from a shared exponential distribution, and; (3) the uncorrelated lognormal relaxed clock (UCLN) model, which assumes that substitution rates on adjacent branches are drawn from a shared lognormal distribution. We assessed the fit of the sequence data to these three branch-rate models by comparing their corresponding marginal likelihoods, which were estimated using Tracer version 1.6 [42,43]. Bayes factor (BF) comparisons indicated that the UCED branch-rate model provided the best fit for both HA and NA gene regions (Bayes factor > 25 for the log marginal likelihood). Estimation of divergence times also requires a node-age model; we assumed a Bayesian skyline coalescent tree prior, which allowed us to estimate changes in the effective population size through time [44]. Isolation dates of the sequences were used to calibrate divergence-time estimates.
We approximated the joint posterior probability density of these model parameters using the Markov Chain Monte Carlo (MCMC) algorithms implemented in BEAST v 1.8. Each MCMC simulation was run for 1 × 10 8 cycles, and thinned by sampling every 10,000 cycle. We performed four replicate MCMC simulations to aid in assessing performance of the simulations. We assessed MCMC convergence by comparing the parameter estimates from independent analyses for each parameter, ensuring that the estimates of the marginal posterior probability densities were effectively identical for the four independent chains. We also assessed convergence by the estimating effective sample sizes (ESS) for each parameter using Tracer, and assessed mixing based on the acceptance ratios. Those evaluations suggested that the MCMC algorithms provided a reliable approximation of the posterior probability density, and suggested that the first 10% of the samples (the "burn-in") should be discarded. We summarized posterior results as a maximum clade credibility (MCC) tree using Tree Annotator. We then generated a Bayesian skyride plot to infer the population demographics of H5N1 HPAIV HA and NA gene segments in Europe and the Russia. We plotted the inferred effective population size of the virus between 2005 and 2010 in terms of relative genetic diversity (N e T), where N e is the effective population size and T the generation time [44].

Estimation of Geographic History under the Discrete Phylodynamic Model
We incorporated information on the geographic locations where sequences were isolated to describe the spatial dynamics of viral epidemics following a procedure described by Lemey et al. [29]. Briefly, the approach jointly estimates phylogeny and history of discrete traits in a Bayesian statistical framework, which enables inference of the timing of viral dispersal patterns while accommodating phylogenetic uncertainty. Here, the discrete states are geographic areas, and the goal is to estimate the history of viral migration (transitions) between these areas through time. This is achieved using a model averaging approach (based on Bayesian stochastic search variable selection; BSSVS) to describe the spatial evolution of H5N1 HPAIV epidemic. We modeled the geographic transition of the H5N1 HPAIV between areas as discrete states under a continuous-time Markov model comprising a number of non-zero transition rates that were identified by means of BSSVS. We assessed the fit of the H5N1 HPAIV data to a number of candidate discrete phylogeographic models (Table S3), including both symmetric and asymmetric discrete-trait models with irreversible and reversible transitions, respectively (using a mean-one exponential prior for the rate parameters). Accordingly, our analyses were based on a composite phylogenetic model that comprised: (1) the previously selected mixed-substitution model; (2) the strict, UCLN, UCED branch-rate models to describe variation in substitution rate across branches; (3) the coalescent Gaussian Markov Random field (GMRF) Bayesian Skyride model as a prior on the node times in the tree, and; (4) the symmetric and asymmetric discrete-state phylodynamic models to describe the history of viral migration between discrete geographic areas. Accordingly, we explored a total of six candidate composite phylodynamic models (the symmetric and asymmetric variants of the geographic model and the three branch-rate models).
We assessed the relative fit of the six candidate phylodynamic models to the HA and NA sequence datasets using Bayes factors. To this end, we first estimated the marginal likelihood for each of the six candidate phylodynamic models from the resulting posterior samples using the posterior simulation-based analogue of Akaike's information criterion (AICm) [45]. Bael et al. [46] have recently shown the AICm to provide more reliable estimates of the marginal likelihood than the harmonic-mean estimator. We then used the marginal-likelihood estimates to compute Bayes factors to select among the candidate models. The UCED branch-rate model was again preferred for both HA and NA gene segments (BF > 25). Bayes factor (BF) comparisons indicated that the symmetric UCED branch-rate model with irreversible transitions provided the best fit for both HA and NA gene regions (Bayes factor > 25 for the log marginal likelihood). The posterior probability distribution of trees sampled under the preferred model was summarized as an MCC consensus tree with posterior probabilities of geographic areas plotted at interior nodes using FigTree version 1.4 [47]. We used SPREAD version 1.0.6 [48] to assess the strength of transition rates between discrete geographic areas, using a BF cutoff = 6 to identify non-zero transition rates. Finally, we generated a keyhole markup language (KML) file to visualize the geographic migration of the virus.

Exploring the Evolution of H5N1 HPAIV Host Infection
We modeled the evolutionary movement of H5N1 HPAIV within and between host types (domestic birds, wild birds, other). To this end, we again adopted a discrete-trait model, where the discrete states in this case correspond to the host type, where the objective is to infer the history of H5N1 HPAIV migration between hosts through time. The number of non-zero transition rates in the model was again estimated using BSSVS. The relative strength of transition rates (e.g., wild → domestic) was estimated using Bayes factors. Following [49], we estimated the ancestral states (host type) at internal nodes of the tree under a composite phylogenetic model that included: (1) the previously selected mixed-substitution model to describe evolution of nucleotides over the tree with branch lengths; (2) the preferred uncorrelated lognormal model to describe the variation of substitution rates across branches of the tree [49]; (3) a constant population-size coalescent model to describe the temporal distribution of node heights in the phylogeny, and; (4) the asymmetric discrete-state phylodynamic model to describe the history of viral migration between host types as a continuous-time Markov process (with a number of non-zero transition rates determined by the BSSVS procedure, using a uniform prior with mean = 1 for the rate parameters). We summarized the posterior probability distribution of parameter estimates as an MCC consensus tree with the posterior probability of host state mapped to interior nodes of the tree using FigTree. Additionally, we assessed the strength of transition rates between states (hosts) by means of Bayes factors using SPREAD with a cutoff of six to identify non-zero transition rates. Use of the asymmetric discrete trait model allowed us to assess the strength of directionality between states (e.g., wild → domestic and domestic → wild bird).

Assessing Uncertainty in Discrete-Trait Mappings and Association Statistics
We used the Kullback-Leibler (KL) divergence statistic [50] to accommodate phylogenetic uncertainty in the discrete-trait estimates (for host type and geographic location). The KL divergence measures the departure between prior probability distribution and the corresponding posterior probability distribution for a given parameter. The premise is simple: The posterior probability distribution inferred for a given parameter is the updated version of prior probability distribution-it is updated by the information in the data via the likelihood function. Accordingly, if the data contain little information regarding the value of a parameter, its posterior probability distribution will closely resemble the corresponding prior probability distribution, and the KL divergence between these two probability distributions will therefore be small [29]. We calculated the KL divergence for each selected posterior distribution of trees using a function written by Razavi [51] implemented in Matlab v 2013a [52]. The Association Index (AI) was calculated using Bayesian Tip-Significance Testing (BaTS) software version 1.0 to test the hypothesis that a taxon with a given trait (host type or geographic location) are more likely to share traits with adjoining taxa than that expected by chance [53].

Results and Discussion
Our analyses of the viral population dynamics for both HA and NA genes reveal a strong increase in genetic diversity (and, therefore, effective population size) in mid-2005 followed by a second increase in early 2006 ( Figure 1A,B). This coincides with the highest frequency of reported cases of H5N1 infections during the epidemic, when several independent introductions of H5N1 viruses belonging to sub-clade 2.2 have been confirmed ( Figure 1C) [14][15][16][17][18][19][20]. The relatively high viral genetic diversity during the early stage of the epidemic is consistent with the introduction of the virus into a novel region, with a naïve population, where it was exposed to selection pressure imposed by environmental and demographic factors [10,25,54]. We also inferred a third (and relatively minor) change in viral genetic diversity during mid-2007, which coincides with the entry of the virus into new EU countries. From 2005 to 2006 conditions seem to have been relatively favorable for the virus to perpetuate in the susceptible population and, given the genetic diversity, reassortment may have occurred during this time frame. The viral genetic diversity is then inferred to decline toward 2007 and to stabilize until 2010. The inferred decrease of viral genetic diversity coincides with the decline of the epidemic, suggesting that prevailing epidemiological factors, such as host density or weather conditions, in 2007-2010 were not sufficient to favor the establishment of the newly emerging H5N1 strain in the population. In contrast to typical AIV epidemics (which exhibit seasonal variation in viral genetic diversity) the genetic diversity of the H5N1 virus did not exhibit seasonal variation in genetic diversity during the 2005-2010 period of the epidemic in Europe and Russia, which suggests that transmission of the virus was influenced by factors different from those typically observed in AIV epidemics.  Our analyses indicate that domestic birds were the most probable ancestral host type for both genes. Domestic birds are inferred to be the most probable host type along many branches of the HA and NA phylogenies, suggesting that the evolution of the H5N1 HPAIV virus largely occurred in association with domestic-bird populations. By contrast, the large proportion of wild-bird isolates ( Figure 2) and the inferred high transition rates between host types (Table 1) suggest that wild birds also played an important role in viral dispersal between host populations and geographic areas. This finding is consistent with a number of alternative hypotheses, including scenarios where (1) domestic birds were the source of the spread of H5N1 HPAIV into Europe; or (2) where the HPAI strains were first transmitted from non-EU wild birds to non-EU domestic birds (with little or no genetic change), and later transmitted from non-EU domestic birds to EU domestic birds (Figure 2). The first scenario may be explained by the illegal introduction of domestic birds into Europe, which has been implicated in the spread of other diseases within some European countries [55]. The second scenario assumes very low selective pressure on the virus within-And rapid transmission of the virus among-Wild-bird populations, which would explain the high mortality in wild-bird populations and the reported prevalence of the virus along migratory routes [56]. These two seemingly contradictory scenarios may be reconciled as follows: H5N1 HPAIV was initially transmitted from non-EU wild birds to domestic poultry as a less virulent and non-reassortant strain of the virus. Subsequently, the virulent strain of H5N1 HPAIV that caused the EU epidemic emerged as a reassortant strain from poultry, as suggested elsewhere [57]. The high density of poultry populations may have resulted in elevated contact and transmission rates, ultimately enhancing opportunities for viral evolution in poultry compared to wild birds. Additionally, the three sequences isolated from non-avian species were obtained from hosts that prey on birds (two from cats and one from a marten), which is both consistent with a wild-bird origin of infection (Table 1) and with the results of previous studies [11,58].
Our divergence-time estimates (Figure 2) suggest that H5N1 HPAIV originated and experienced reassortment among hosts between 1995 and1996 ( Figure S1). These results are consistent with the frequent reports of highly reassorted H5N1 HPAIVs introductions into eastern and southeastern Asia since the emergence of the ancestral strain in 1996 (A/goose/Guangdong/1/96) [5].  Results of our discrete phylodynamic analyses indicate that Russia, Romania, and Germany were important epicenters of the H5N1 HPAIV epidemic in Europe. Specifically, our results suggest that the virus first originated and accumulated in Russia, with subsequent significant dispersal rates out of Russia into Romania (Figures 3-5 and S2). Later, significant dispersal rates occurred between Romania and Germany. In fact, exchange between Russia-Romania and Romania-Germany represent the most significant dispersal rates of the virus during the course of the epidemic ( Figure 5, Table S4). Finally, in late 2005 and early 2006 the virus spread independently from Romania and Germany, respectively into other areas of the EU ( Figure S3); a scenario that is consistent with the results of previous studies [1,[14][15][16]30]. Results of our discrete phylodynamic analyses indicate that Russia, Romania, and Germany were important epicenters of the H5N1 HPAIV epidemic in Europe. Specifically, our results suggest that the virus first originated and accumulated in Russia, with subsequent significant dispersal rates out of Russia into Romania (Figures 3-5 and Figure S2). Later, significant dispersal rates occurred between Romania and Germany. In fact, exchange between Russia-Romania and Romania-Germany represent the most significant dispersal rates of the virus during the course of the epidemic ( Figure 5, Table S4). Finally, in late 2005 and early 2006 the virus spread independently from Romania and Germany, respectively into other areas of the EU ( Figure S3); a scenario that is consistent with the results of previous studies [1,[14][15][16]30].        (Figures 1 and S3). The geographic expansion of the virus entailed two major waves. The first wave began in early 2005 from Russia, intensified in Romania, and resulted in the spread of the virus into Germany, Ukraine, and Turkey. The second wave began in early 2006, also commencing in Russia, intensified in both Romania and Germany, and resulted in the spread of the virus into Switzerland, Czech Republic, Denmark, and Sweden [11][12][13][14][15][16][17][18][19][20]. Additionally, a minor wave in 2007 emerged and intensified within Germany, and resulted in the spread of the virus into France, Hungary, and Poland [13,19].
The results of our phylodynamic analyses generally appear to be robust ( Table 2). The significance (p-value < 0.001) of the observed association-index (AI) values and their 95% credible intervals for both genes indicate a strong relationship between the inferred phylogeny of the H5N1 virus and the  (Figure 1 and Figure S3). The geographic expansion of the virus entailed two major waves. The first wave began in early 2005 from Russia, intensified in Romania, and resulted in the spread of the virus into Germany, Ukraine, and Turkey. The second wave began in early 2006, also commencing in Russia, intensified in both Romania and Germany, and resulted in the spread of the virus into Switzerland, Czech Republic, Denmark, and Sweden [11][12][13][14][15][16][17][18][19][20]. Additionally, a minor wave in 2007 emerged and intensified within Germany, and resulted in the spread of the virus into France, Hungary, and Poland [13,19].
The results of our phylodynamic analyses generally appear to be robust ( Table 2). The significance (p-value < 0.001) of the observed association-index (AI) values and their 95% credible intervals for both genes indicate a strong relationship between the inferred phylogeny of the H5N1 virus and the discrete traits (geographic area and host type), and also indicate that both host type and geography played important roles in the transmission of the virus. By contrast, the Kullback-Leibler (KL) values for the discrete host-type models were fairly small for both genes (suggesting relatively poor model fit), whereas the KL values for the geographic models were large, indicating good fit the between model and the geographic data [29]. Although generally robust, the results of this study are based on analyses that necessarily entailed several compromises, including: (1) imprecise geographic information on the isolation of H5N1 sequences; and; (2) incomplete and possibly biased sampling of H5N1 isolates. First, precise latitude/longitude coordinates were unavailable for 65.8% of the sequences used in this study; instead only the country of isolation was recorded. In these cases, we used the centroid of the country as a proxy for the location, which is apt to depart substantially from the true location in most cases, which may bias our inferences of the phylogeographic history of the H5N1 virus. This sampling issue precludes our use of the continuous phylodynamic model described by Lemey et al. [59], as these models require more precise information on the locations of the viral samples. Unfortunately, this information in typically not recorded for publically available viral sequence data in GenBank or disease databases. The greatest obstacle for applying continuous phylodynamic models to the H5N1 system is therefore mainly an issue of data confidentiality. The impact of incomplete spatial metadata on the performance of phylogeographic models has been discussed elsewhere [60,61]. Inferences under the phylodynamic models-like all statistical inference methods-assume that we have either a complete or random sample of sequence data. In the present case, this requires that the H5N1 sequences were collected randomly with respect to time (between ∼1990 and 2010), geographic location, and host type. Myriad practical considerations almost certainly result in very strongly biased samples, and the impact of these departures from random sampling on the estimates are difficult to quantify. Although certainly incomplete, and almost certainly non-random, our study is based on all available sequence data for the HA and NA gene regions associated with the H5N1 epidemic in the EU during 2005-2010, and therefore reflects our best understanding based of the available data.
Phylodynamic analysis has not yet been widely embraced as a resource by public health agencies to support the design of disease-surveillance strategies. For example, most of the published literature on avian influenza-including disease summary reports and peer-reviewed scientific studies-upon which global disease agencies base their decision-making process primarily include numerical summaries of reported cases in each country, risk maps based purely on spatial and temporal distribution of the disease, and conventional phylogenetic analysis of sequenced isolates from these outbreaks [11][12][13][14][15][16][17][18][19]. The 2009 AI global pandemic demonstrates the ability of phylodynamic analyses to provide novel insights for decisions regarding animal and public health [62]. Our phylodynamic analyses of an H5N1 HPAIV sequence dataset and associated metadata provides insights on the origin of the outbreak in Europe (e.g., the ancestral location or host type) and the temporal and the spatial progression of this epidemic. These inferences could be used to inform prevention and control measures to block the virus at the source (e.g., high-risk geographical areas or bird species), which can limit the spread of the virus both to the domestic poultry population or to uninfected geographic areas. Furthermore, the ability to infer and predict routes of viral transmission has clear implications for informing the selection of appropriate strains for vaccine production to more effectively control new reassortant strains of AIV in future epidemics. Accordingly, incorporating phylodynamic analyses as a standard tool for the molecular surveillance of AI might support the development of more effective (and cost effective) policy decisions for the control of this virus in high-risk regions.

Conclusions
The purpose of this study was to illustrate the potential of a Bayesian phylodynamic approach to inform AI surveillance and control programs and associated decision-making processes. Our results indicate that the peak in viral genetic diversity between 2005 and 2006 coincided with its geographic spread from Russia into Romania and Germany, and throughout the rest of the EU. This expansion was followed by a drastic decline in viral genetic diversity, which corresponds with the epidemic decline in most EU member states. Our findings corroborate previous conclusions that wild birds were the most important vectors for the spread of the disease. However, our analyses indicate that domestic (rather than wild) birds were the source of the epidemic, and that viral reassortment most likely occurred in domestic species. Finally, our results provide further demonstration of the potential of the Bayesian phylodynamic approach to effectively study the dynamics of disease origin and spread, which promises to maximize the utility of molecular sequence data for the surveillance and control of emerging diseases. These results will contribute to the design of surveillance and control strategies for H5N1 HPAIV in Europe and provide a methodological framework for molecular epidemiological modeling of emerging novel HPAIVs at regional, national, and international scales. This work can also assist global and European disease agencies in their effort to improve the efficiency of their surveillance systems by sharing and improving their genomic databases in the course of emergence of new AIVs.