Robust Phylodynamic Analysis of Genetic Sequencing Data from Structured Populations

The multi-type birth–death model with sampling is a phylodynamic model which enables the quantification of past population dynamics in structured populations based on phylogenetic trees. The BEAST 2 package bdmm implements an algorithm for numerically computing the probability density of a phylogenetic tree given the population dynamic parameters under this model. In the initial release of bdmm, analyses were computationally limited to trees consisting of up to approximately 250 genetic samples. We implemented important algorithmic changes to bdmm which dramatically increased the number of genetic samples that could be analyzed and which improved the numerical robustness and efficiency of the calculations. Including more samples led to the improved precision of parameter estimates, particularly for structured models with a high number of inferred parameters. Furthermore, we report on several model extensions to bdmm, inspired by properties common to empirical datasets. We applied this improved algorithm to two partly overlapping datasets of the Influenza A virus HA sequences sampled around the world—one with 500 samples and the other with only 175—for comparison. We report and compare the global migration patterns and seasonal dynamics inferred from each dataset. In this way, we show the information that is gained by analyzing the bigger dataset, which became possible with the presented algorithmic changes to bdmm. In summary, bdmm allows for the robust, faster, and more general phylodynamic inference of larger datasets.


Introduction
Genetic sequencing data taken from a measurably evolving population contain fingerprints of past population dynamics [1]. In particular, the phylogeny spanning the sampled genetic data contains information about the mixing pattern of different populations and thus contains information beyond what is encoded in classic occurrence data; see, e.g., Hey and Machado [2], Stadler and Bonhoeffer [3]. Phylodynamic methods [4,5] aim at quantifying past population dynamic parameters, such as migration rates, from genetic sequencing data. Such tools have been widely used to study the spread of infectious diseases in structured populations; see, e.g., Dudas et al. [6], Faria et al. [7] as examples of analyses of recent epidemic outbreaks. The Bayesian phylodynamic inference framework BEAST2 [8] is one of the software frameworks within which such analyses can be carried out. With BEAST2, tree topologies, parameters from phylodynamic, molecular clock, and substitution models can be jointly inferred via Markov-Chain Monte-Carlo (MCMC) sampling. Both the host population and the pathogen population may be structured (e.g., the host population may be geographically structured), and the pathogen population may consist of a drug-sensitive and a drug-resistant subpopulation. Understanding how these subpopulations interact with one another-whether they are separated by geographic distance, lifestyles of the hosts, or other barriers-is a key determinant in understanding how an epidemic spreads. In macroevolution, different species may also be structured into different "subpopulations", e.g., due to their geographic distribution or to trait variations; see, e.g., Hodges [9]. Phylodynamic tools aim at quantifying the rates at which species migrate or the rates at which traits are gained or lost, as well as the rates of speciation and extinction within the "subpopulations"; see, e.g., Goldberg et al. [10], Mayrose et al. [11], Goldberg et al. [12].
The phylodynamic analysis of structured populations can be performed using two classes of models, namely coalescent-based and birth-death-based approaches. Both have their unique advantages and disadvantages [13,14]. Here, we report major improvements on a multi-type birth-death-based approach.
A multi-type birth-death model is a linear birth-death model accounting for structured populations. Under this model, the probability density of a phylogenetic tree can be calculated by numerically integrating a system of differential equations. The use of this model within a phylodynamic setting and the associated computational approach were initially proposed for the analysis of species phylogenies [15] and later for the analysis of pathogen phylogenies [3,13]. The BEAST2 package bdmm generalizes the assumptions of these two initial approaches [16]. It further allows for co-inferring phylogenetic trees together with the model parameters and thus explicitly takes phylogenetic uncertainty into account. Datasets containing more than 250 genetic sequences could not be analyzed using the original bdmm package [16] due to numerical instability. This limitation was a strong impediment to obtaining reliable results, particularly for the analysis of structured populations as the quantification of parameters which characterize each subpopulation requires a significant amount of samples from each of them. The instability was due to numerical underflow in the probability density calculations, which meant that probability values extremely close to zero could not be accurately calculated and stored. We were able to solve the numerical instability issue of bdmm, thereby lifting the hard limit on the number of samples that could be analyzed. In addition, the practical usefulness of the bdmm package had previously been restricted by the amount of computation time required to carry out the analyses. Here, we report on significant improvements in computation efficiency. As a result, bdmm can now handle datasets containing several hundred genetic samples. Finally, we made the multi-type birth-death model more general in several ways: homochronous sampling events can now occur at multiple time points (not only the present), individuals are no longer necessarily removed upon sampling, and the migration rate specification has been made more flexible by allowing for piecewise-constant changes through time.
Overall, these model generalizations and implementation improvements enable more reliable and ambitious empirical data analyses. Below, we use the new release of bdmm to quantify the Influenza A virus spread around the globe as a sample application and compare the results obtained with those from the reduced dataset analyzed in Kühnert et al. [16].

Description of the Extended Multi-Type Birth-Death Model
First, we formally define the multi-type birth-death model on d types [16], including the generalizations introduced in this work. The process starts at time 0 with one individual; this is also called the origin of the process and the origin of the resulting tree. This individual is of type i ∈ {1 . . . d}, with a probability of h i (where ∑ d i=1 h i = 1). The process ends after T time units (in the present). The time interval (0, T) is partitioned into n intervals through 0 < t 1 < . . . < t n−1 < T, and we define t 0 := 0 and t n := T. Each individual at time t, t k−1 ≤ t < t k , k ∈ {1 . . . n} of type i ∈ {1 . . . d}, gives rise to an additional individual of type j ∈ {1 . . . d}, with a birth rate of λ ij,k , migrates to type j with a rate of m ij,k (with m ii,k = 0), dies with a rate of µ i,k , and is sampled with a rate of ψ i,k . At time t k , each individual of type i is sampled with a probability of ρ i,k . Upon sampling (either with a rate of ψ i,k or a probability of ρ i,k ), the individual is removed from the infectious pool with a probability of r i,k . We summarize all birth rates λ ij,k in λ, migration rates m ij,k in m, death rates µ i,k in µ, sampling rates ψ i,k in ψ, sampling probabilities ρ i,k in ρ, and removal probabilities r i,k in r, i, j ∈ {1, . . . , d}, k ∈ {1, . . . , n}. The model described in Kühnert et al. [16] is a special case of the above and assumes that migration rates are constant through time (i.e., do not depend on k), removal probabilities are constant through time and across types (i.e., do not depend on k and i), and that ρ i,k = 0 for k < n and i ∈ {1 . . . d}.
This process gives rise to complete trees on sampled and non-sampled individuals, with types being assigned to all branches at all times (Figure 1, left). Following each branching event, one offspring is assigned to be the "left" offspring and another to be the "right" offspring, with each assignment having a probability of 1 2 . In the figure, we draw the branch with the assignment "left" on the left and the branch with the assignment "right" on the right. Such trees are called oriented trees, and considering oriented trees will facilitate the calculation of the probability densities of trees. Pruning all lineages without sampled descendants leads to the sampled phylogeny ( Figure 1, middle and right). The orientation of a branch in the sampled phylogeny is the orientation of the corresponding branch descending from the common branching event in the complete tree. When the sampled phylogeny is annotated with the types along each branch, we refer to it as a branch-typed tree (Figure 1, middle). On the other hand, if we discard these annotations but keep the types of the sampled individuals, we call the resulting object a sample-typed (or tip-typed) tree (Figure 1, right).

Complete tree
Branch-typed tree Sample-typed tree T 0 time t 1 Figure 1. Complete tree (left) and sampled trees (middle and right) obtained from a multi-type birthdeath process with two types. The orange and blue dots on the trees represent sampled individuals and are colored according to the type these individuals belong to. A ρ-sampling event happens at time t 1 . The grey squares represent degree-2 nodes added to branches crossing this event. ρ-sampling also happens in the present (time T). As seen in the complete tree, the three individuals who were first sampled were not removed from the population upon sampling, whereas the three individuals sampled at time t 1 were.
Here, we give an overview of the computation of the probability density of the sampled tree (i.e., the sample-typed or branch-typed tree) given the multi-type birth-death parameters λ, m, µ, ψ, ρ, r, h, T. This probability density is obtained by integrating probability densities g from the leaf nodes (or "tips"), backwards along all the edges (or "branches") to the origin of the tree. Our notation here is based on previous work [16,17], and the probabilities p i,k (t) and g e i,k (t) relate to E and D in Stadler and Bonhoeffer [3], Maddison et al. [15], respectively.
Every branching event in the sampled tree gives rise to a node of degree 3 (i.e., 3 branches are attached). Every sampling event gives rise to a node of degree 2 (called "sampled ancestor") or 1 (called "tip", as defined above). A sampling event at time t = t k , k ∈ {1, . . . , n}, is referred to as a ρ-sampled node. All other nodes corresponding to samples are referred to as ψ-sampled nodes.
Furthermore, degree-2 nodes are placed at time t k on all lineages crossing time t k , k = 1, . . . , n − 1, as shown at time t 1 in Figure 1. In a branch-typed tree, a node of degree 2 also occurs on a lineage at a time point when a type change occurs. Such type changes may be the result of either migrations or birth events in which one of the descendant subtrees is unsampled (Figure 1, middle).
We highlight that in bdmm, we assume that the most recent sampling event happens at time T. This is equivalent to assuming that the sampling effort was terminated directly after the last sample was collected and overcomes the necessity for users to specify the time between the last sample and the termination of the sampling effort at time T.
The derivation of the probability density of a sampled tree under the extended multitype birth-death model is developed in Appendix A. This probability density, also called a "phylodynamic likelihood", can be used to estimate the multi-type birth-death parameters λ, µ, ψ, m, T, when used in a Bayesian phylodynamic framework such as BEAST 2 by Bouckaert et al. [8]. Note that unlike other parameters of this model, h is typically not estimated via MCMC sampling. h i values can be set according to different rationales: the root type can be fixed to a particular type k (h k = 1, h i = 0 for i = k), or all types can be equally likely (h i = 1 n ), or they can be set to the equilibrium distribution (derived by Stadler and Bonhoeffer [3]) given that the process was already in equilibrium at the time of origin.

Implementation Improvements
The computation of the probability densities of sampled trees under the multi-type birth-death model requires the numerical solving of Ordinary Differential Equations (ODEs) along each tree branch. We were able to significantly improve the robustness of the original bdmm implementation, which suffered from instabilities caused by the underflow of these numerical calculations. Compared to the original implementation, we prevented the underflow by implementing an extended precision floating point representation (EPFP) for storing intermediary calculation results. In addition to this improvement in stability, we improved the efficiency of the probability density calculations by (1) using an adaptive step-size integrator for numerical integration, (2) performing preliminary calculations and storing the results for use during the main calculation step, and (3) distributing calculations among threads running in parallel. Details can be found in Appendix B.
The latest release with our updates, bdmm 1.0, is freely available as an open access package of BEAST 2. The source code can be accessed at https://github.com/denisekuehnert/ bdmm (accessed on 26 July 2022).

Evaluation of Numerical Improvements
We compared the robustness and efficiency of the improved bdmm package against its original version. We considered tree sizes varying between 10 and 1000 samples. For each tree size, we simulated 50 branch-typed and 50 sample-typed trees under the multi-type birth-death model using randomly drawn parameter values from the distributions shown in Table A1. The distributions from which the parameters were drawn were selected to reflect a wide range of scenarios. For each simulated tree, we measured the time taken to perform the calculation of the probability density, given the parameter values under which the tree was simulated, using the updated and the original bdmm implementation. We report here the wall-clock time taken to perform this calculation 5000 times ( Figure 2). All computations were performed on a MacBook Pro with a dual-core 2.3 GHz Intel Core i5 processor. The new implementation of bdmm is on average nine times faster than the original ( Figure 2A). The robustness of the updated implementation was demonstrated by reporting how often the implementations returned −∞ for the probability density in log space. We call these calculations "failures", the most likely cause of error being the underflow. Our new implementation showed no calculation failure for trees containing a thousand samples, while in the original implementation calculations often failed for trees with more than 250 samples ( Figure 2B).

Validation against Original Implementation
To ensure that no errors were introduced into the updated bdmm, we validated the improved implementation against the original bdmm version by comparing the results of likelihood computation on a handful of randomly simulated trees. We used simulated trees with 10 or 100 tips, well below the limit of reliability of the original bdmm version (approximately 250 tips). Details of this procedure can be found in Appendix C. Figure 3 shows one of such simulated trees along with tree likelihood values (or the probability density of the sampled tree given the multi-type birth-death parameters) computed with each bdmm version. Likelihood computation results are identical for all trees and parameters tested for both implementations (difference in log-likelihoods < 1 × 10 −6 ). Figure A3 shows that the same results were obtained with other trees or when varying other parameters. Therefore, we conclude that the results of the full validation, along with error and bias assessment performed by Kühnert et al. [16] on the original bdmm version, hold true for the improved bdmm implementation we present in this article.

Influenza A Virus (H3N2) Analysis
As an example of a biological question that can be investigated with the use of bdmm, we analyzed 500 H3N2 influenza virus HA sequences sampled around the globe from 2000 to 2006; we aimed to recover the seasonal dynamics of the global epidemics. The dataset is a random subset of the data analyzed by Vaughan et al. [18], taken from three different regions around the globe: New York (North, n = 167), New Zealand (South, n = 215), and Hong Kong (Tropics, n = 118). The dataset of 980 samples assembled by Vaughan et al. [18] was built with the aim of gathering samples from three locations with relatively similar population sizes, each representative of the northern, southern, or equatorial regions.
As a comparison, we performed an identical analysis of the H3N2 influenza dataset with 175 sequences sampled between 2003 and 2006 that was used in [16]. This dataset of 175 sequences was also a subset of the data by Vaughan et al. [18], and it also grouped samples from three locations denoted as North (for the northern hemisphere), South (for the southern hemisphere), and Tropic (for tropical regions). Note that the latter dataset came from more geographically spread samples, and thus we did not expect results from both analyses to be perfectly comparable. As we were dealing with pathogen sequence data, we adopted the epidemiological parametrization of the multi-type birth-death model, as detailed in Kühnert et al. [16]. The epidemiological parametrization substitutes birth, death, and sampling rates with effective reproduction numbers within types, rate at which hosts become noninfectious, and sampling proportions. To study the seasonal dynamics of the global epidemic, we allowed the effective reproduction number R e to vary through time. To do so, we subdivided time into six-month intervals (starting April 1st and October 1st), and we constrained the effective reproduction number values corresponding to the same season across different years to be equal for each particular location, assuming that the R e values were the same in the summer seasons and the same in the winter seasons. The testing of this hypothesis' validity by estimating the R e values that varied in each six-month period was not performed as we expected little information from the data for the additional parameters. For the same reason, the migration rate was not varied through time. Further details on the data analysis configuration can be found in Appendix D.
The analysis of the larger dataset (500 samples) allowed for the reconstruction of the phylogenetic tree encompassing a longer time period, and therefore gave a more long-term and detailed view of the evolution of the global epidemic (see Figure 4 for the Maximum Clade Credibility trees). As can be expected for the tropical location, in both analyses, the effective reproduction numbers for H3N2 influenza A were inferred to be close to one throughout the year ( Figure 5A). Conversely, strong seasonal variations can be observed in the Northern and Southern hemisphere locations, where the effective reproduction number was close to one in winter and was much lower in summer. Inferences from the small and large datasets are mostly in agreement. A subtle difference appears: in the larger dataset, the effective reproduction numbers in the winter seasons and in the tropical location are closer to one, with less variation across estimates. This seems to indicate that the variations between estimates observed with the smaller dataset, including samples from 2003 to 2006 (for instance, R e in winter in the North compared to R e in winter in the South), are due to stochastic fluctuations, which are averaged out when considering a longer period of transmission dynamics in the larger dataset covering the years 2000-2006.
The precise inference of migration rates is more difficult, as reflected by the significant uncertainty we obtained on the estimates ( Figure 5B). However, we still observed that the uncertainty was generally reduced for the inference performed with the larger dataset, as expected. A significant difference between the migration rates inferred from the Southern to Tropical locations arose between the two analyses. With the larger dataset, the estimated rate was much lower than that with the smaller one, and it was more in range with the other migration rate estimates. Detailed results of all the parameter estimates for both analyses are available in Table A4. Most notably, the estimates of the root locations for both datasets are very similar. In both cases, the tropical location is most likely to be the location of the root; however, neither of the two other locations can be entirely excluded.

Properties of bdmm 3.4.1. Identifiability of Parameters
In birth-death models with sampling through time or in the present, only two of the three parameters of birth rate, death rate, and sampling rate/sampling probability can be jointly estimated [19,20]. Thus, independent prior data need to be employed to quantify all three parameters. In recent work, the question of identifiability in time-dependent birth-death sampling models has been thoroughly investigated [20,21]. The issue of identifiability in state-dependent birth-death sampling models remains, to our knowledge, largely unanswered. The interactions between migration rates, rates of birth-among-demes, and other multi-type birth-death parameters is not well-known. It is likely that different parameter combinations of the multi-type birth-death model can yield the same likelihood value. Informative prior information on some of the birth-death parameters mitigates parameter non-identifiability issues.

Computational Costs
Despite the implementation improvements presented in this manuscript, phylodynamics analyses performed in bdmm are still limited in practice by the number of genetic sequences they can handle. This limitation, unlike the previously existing limitation caused by underflow, is not a hard boundary but rather a soft boundary imposed by the practical constraints of computational analyses. Limitations with regard to the complexity of the analyses that could be carried out with the improved version of bdmm derive from the time required to carry out computations and from the complexity of the probability space that must be explored. For instance, each MCMC chain for the 500-sample Influenza A analysis required about 15 days to compute. Keeping the same analysis setup and increasing the number of genetic samples will have a linear effect on the time required to compute the phylodynamic likelihood with bdmm. With our updated bdmm implementation, the core bottleneck is the complexity of exploring tree space, which increases exponentially with additional samples. Due to this complexity, only trees with up to around 1000 samples can be successfully estimated with BEAST2.

Discussion
The multi-type birth-death model with its updated implementation in the bdmm package for BEAST 2 provides a flexible method for taking into account the effect of population structure when performing a phylodynamic genetic sequence analysis. Compared to the original implementation, we now prevent the underflow of numerical calculations and speed up calculations by almost an order of magnitude. The size limit of around 250 samples for datasets that could be handled by bdmm is thus lifted, and significantly larger datasets can now be analyzed. Now, the bottleneck lies in the search for tree space with MCMC rather than with bdmm. We demonstrate this improvement by analyzing two datasets of Influenza A virus H3N2 genetic data from around the globe. One dataset has 500 samples and could not have been analyzed with the original version of bdmm, while the other one contains 175 samples and is the original sample dataset analyzed in [16]. Overall, we observed that analyzing a dataset with more samples, as expected, gives a more long-term picture of the global transmission patterns and reduces the general uncertainty concerning parameter estimates.
With the addition of the so-called ρ-sampling events in the past, intense sampling efforts limited to short periods of time (leading to many samples being taken nearly simultaneously) can be easily modeled as instantaneous sampling events across the entire population (or subpopulation) rather than as non-instantaneous sampling over small sampling intervals. This simplifies the modeling of intense pathogen sequencing efforts in very short time windows. By allowing the removal probability r (the probability for an individual to be removed from the infectious population upon sampling) to be typedependent and to vary across time intervals, as well as allowing migration rates between types to vary across time intervals, we further increase the generality and flexibility of the multi-type birth-death model. A sample bdmm analysis with a ρ-sampling event in the past was added to the software package to guide users who may want to set up such an analysis with their own data.
We focused on an epidemiological application of bdmm, where we co-infer the phylogenetic trees to take into account the phylogenetic uncertainty. However, the bdmm modeling assumptions are equally applicable to the analysis of macroevolutionary data, in which context bdmm allows for the joint inference of trees with fossil samples under structured models [22]. When using a multi-type birth-death model in the macroevolutionary framework, ρ-sampling can be used to model fossil samples originating from the same rock layer. In the context of the exploration of trait-dependent speciation, structured birth-death models such as the binary-state speciation and extinction model (BiSSE) [23,24] have been shown to possibly produce spurious associations between character state and speciation rate when applied to empirical phylogenies [25]. When used in this fashion, users of bdmm should assess the propensity of their dataset analysis for Type I errors through neutral trait simulations, as suggested by Rabosky and Goldberg [25].
In summary, the new release of bdmm overcomes several constraints when analyzing sequencing data in BEAST2. As it stands, the main constraint now is given by the efficiency of the BEAST2 MCMC tree space sampler rather than bdmm itself. We expect the new release of bdmm to become a standard tool for the phylodynamic analysis of sequencing data and phylogenetic trees from structured populations.
Supplementary Materials: The following are available at https://www.mdpi.com/article/10.339 0/v14081648/s1. The XML files for the replication of the BEAST2 analyses of the two datasets with 175 and 500 samples of the H3N2 influenza virus HA sequences are available as supplementary files. To run these XML files using BEAST2, the bdmm and feast packages must be installed.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Derivation of the Probability Density of a Sampled Tree
Appendix A. 1

. Probability of Having No Sampled Descendants
In order to compute the probability density of a sampled tree, we need to calculate the probability of an individual with type i ∈ {1 . . . d} at time t k−1 ≤ t ≤ t k to have no sampled descendant, which we denote by p i,k (t). In the interval t k−1 ≤ t < t k , this probability satisfies the ordinary differential equation (ODE) The terms on the right-hand side in the first line correspond to the probability of no event happening, and the terms in the second line correspond to either a branching, migration, or death event happening. For a detailed derivation via Master equations, the reader is referred to Maddison et al. [15] or Stadler and Bonhoeffer [3]. For t = t k , k ∈ {0 . . . n}, we have

. Probability Density of a Sample-Typed Subtree
We use g e i,k (t) to denote the probability density that an individual represented by branch e at time t (with t k−1 ≤ t ≤ t k , k ∈ {1 . . . n}) in state i ∈ {1, . . . , d} has evolved between t and T, as observed in the tree. Branch e is connected to two nodes, and we denote the more recent node with n e , occurring at time t k−1 < t e ≤ t k . Node n e represents either a sampling event ( Figure A1A), a branching event (Figure A1B), or a degree-2 node at t e = t k with or without sampling ( Figure A1C,D). One can show that g e i,k (t) for t < t e satisfies the ODE with the derivation being analogous to that of p i,k (t).  Figure A1. Possible configurations for node n e on branch e at time t k−1 < t e ≤ t k . (A) n e is a ψ-sampled node at time t e < t k , with or without sampled descendants. (B) n e is a branching event at time t e < t k . (C) n e is a ρ-sampling node at time t e = t k , with or without sampled descendants. (D) n e is a degree-2 node at time t e = t k without sampling.
We denote the branch (or two branches) descending from n e at time t e with e 1 (respectively e 1 and e 2 ). The initial conditions for the differential equations at node n e , i.e., the values of the probability densities at the most recent end of the branch e, are as follows: if n e is a ψ-sampled tip of type i, if n e is a ψ-sampled ancestor of type i, if n e is a ρ-sampled tip of type i, if n e is a ρ-sampled ancestor of type i, 0 if n e is a sample of type j = i, if n e is not a sample and t e = t k , i,k (t e )g e 2 j,k (t e ) + g e 1 j,k (t e )g e 2 i,k (t e ) if n e has two descendant branches.
The 1 2 in the last equation is needed since we are computing the probability density of an oriented tree.

Appendix A.3. Probability Density of a Branch-Typed Subtree
When inferring branch-typed trees, we condition on the type of a branch at all times. Thus, we do not integrate over migration events or unobserved branching events that change the type of the tree lineage. We define η i,j = 1 for i = j and η i,j = 2 for i = j. Equation (A3) is replaced by with the initial conditions if n e is a ψ-sampled tip of type i, if n e is a ψ-sampled ancestor of type i, if n e is a ρ-sampled tip of type i, if n e is a ρ-sampled ancestor of type i, 0 if n e is a sampled tip of type j = i, if n e is not a sample and t e = t k , m ij,k g e 1 j,k (t e ) + λ ij,k g e 1 j,k (t e ) p i,k (t e ) if n e has one descendant branch with type j = i, if n e has two descendant branches. (A6) Here, the 1 2 in the last equation is also needed since we are computing the probability density of an oriented tree. In branch-typed trees, a branch is always of one single type. The type of branch e being i implies that g e j,k (t) = 0 for j = i. Indeed, Equation (A6) states that g e j,k (t e ) = 0 for i = j. Furthermore, Equation (A5) specifies g e j,k (t) = 0 for all t < t e .
is on the order of 2 −2 10 10 −320 . We calculate a probability densityf over a tree space (together with intermediate values g e i,k (t) as part of the ODE integration), and these values can fall well below the lower limit imposed by the standard 64-bit DPFP representation, resulting in underflow errors.
To work around this limitation, our new implementation of bdmm employs a 96-bit extended precision floating point representation (EPFP), in which the mantissa is represented using a standard 64-bit double precision floating point number, and the exponent is represented using a 32-bit signed integer value. This dramatically extends the range of possible values; in particular, the smallest non-zero absolute value is 2 −2 31 10 −646456993 .
A naive approach to employing the EPFP representation would be to use numbers of this kind at each and every step in the probability density calculation. This would ensure that all intermediate calculation results were accurately stored and would allow for accurate calculations at the next step.
However, this approach has two main drawbacks. Firstly, existing numerical integration libraries almost exclusively use the DFPF representation. While integration algorithms could certainly be implemented for other representations such as the EPFP, this would be extremely time-consuming. Secondly, since DPFP calculations are implemented at a hardware level on modern processors, calculations using this representation are performed very efficiently. Abandoning this primitive data type thus makes basic calculation steps considerably less efficient.
For these reasons, our approach involves a mixture of using both representations. The numerical integration of the ODEs along each branch of the sampled phylogeny (Equations (A3) and (A5)) is performed using the DPFP representation, while the combination of these integration results-accomplished to provide the boundary condition for the next integration (Equations (A4) and (A6))-is achieved using the EPFP representation.
In order to avoid underflow at all times, the EPFP numbers are scaled before they are converted to the DPFP representation used by the integrator. The scaling procedure amounts to an integration by substitution. We scale g e i,k (t k ) by the linear substitution function G e i,k (t k ) = δ × g e i,k (t k ), with δ being a scale factor. Since the original Equations (A3) and (A5) are both linear in g e i , this scaling does not affect the results of their integration once the inverse scaling at time t k−1 has been completed. The method employed to choose an appropriate scale factor is described in detail below. The goal of scaling is to make sure all values stored as DPFP actually fit the window of values that this representation can express. After each numerical integration step, the results are converted back to EPFP and rescaled accordingly. The differential equation for the probability p i,k (t) of having no sampled descendants is not scaled at all as its integration does not cause any underflows in practice.

Appendix B.2. Choice of a Scale Factor
The scale factor δ is shared among all subpopulations. Therefore, it has to be carefully chosen so that all initial conditions fit into the window of values that can be represented in DPFP. To choose δ, we use three rules: A, B, and C. We apply rule A if possible, otherwise we apply rule B, and otherwise, rule C. These rules are described below and illustrated in Figure A2, with a brief summary in the legend.
Let (x i ) represent a set of floating point representations of real positive numbers (x i ), as defined in Equation (A8). (x i ) are the scaled valueŝ (A9) q min and q max denote the minimum and maximum exponents over allx i . We define q low = −1022 and q high = 1023, respectively the lowest and highest acceptable exponents for a scaled valuex i . We define q gap = 2040, the largest accepted difference between exponents across pairs of scaled valuesx i . The values q low , q high , and q gap were chosen to accommodate 64-bit DPFP representation, with some wiggle room to eliminate the need to account for edge cases.
Rule A is applied if: In this case, the scale factor δ is simply −q min . With this rule, the minimal exponent of scaled values becomes zero. Thus, scaled values are located between 1 and the maximum value in DPFP representation. Rule B is applied if: The scale factor is δ = q low − q min + (q gap − (q max − q min ))/2. With this rule, the exponents of scaled values are approximately centered around zero. Sets of valuesx i with greater variation between the smallest and the biggest elements can be handled as compared with rule A.
In the rare case that neither rule A nor rule B can be applied, rule C is applied. All (x i ) values with exponents q i such that are set to zero. These are the smallest values amongx i values. Then, rule B is applied to all non-zerox i values. In all panels, the white rectangle represents values that can be represented using DPFP. Dots represent the values of initial conditions for the differential equations of the multi-type birth-death model, before (1) and after (2)  We assume that r = 1 for all i, k in our analyses, i.e., individuals become noninfectious upon sampling. Furthermore, we assume that δ is constant across locations i and time intervals k. Sampling proportion and migration are assumed not to change through time.
To study the seasonal dynamics of the global epidemic, we allow the effective reproduction number R to vary through time. To do so, we subdivide time into six-month intervals (starting on April 1st and October 1st) for the time period during which we have samples. We set all values corresponding to the same season across different years to be equal for a particular location. Therefore, we infer two different values of R for each location X, one which corresponds to the April-September period: R X 1 , and another one for the October-March period: R X 2 . The location-specific sampling proportions s i are assumed to be constant for the time interval in which we have samples and null before the first sample.
Migration rates between types are inferred as products of a unique migration factor σ m and relative rates M i,j : This setting allows us to use rather informative priors of around 1 for the relative rates, and only requires a less informative prior for the unique migration factor. Table A3 lists the prior distributions assumed for the birth-death parameters.
Following [18], we use a GTR+Γ substitution model [29] and a strict molecular clock with the same priors as in [18].
The BEAST 2 analysis infers the birth-death model parameters, the substitution model parameters, and the clock model parameters together with the phylogenetic tree. In our analysis, the inferred phylogenetic trees are sample-typed trees. Thus, we do not attempt to reconstruct the history of migrations; rather, we marginalize over all the possible migration histories. This is realized in order to allow the MCMC chain to converge in a shorter time as compared to an analysis inferring branch-typed trees.
For each data analysis, we ran 10 parallel MCMC chains for 9 × 10 7 steps, each using the new implementation of bdmm as a package of BEAST 2.6 [8,30]. The XML files providing all analyses specifications are available as Supplementary Materials. The computations were run on the Euler computation cluster at ETH Zürich. Each MCMC chain in the 175-sample analysis required 150 computation hours on average (1500 h in total for ten chains). In the 500-sample analysis, each MCMC required 360 computation hours on average (3600 h for ten chains). We used Tracer v1.6 [31] to check for convergence. The effective sample size was above 200 for all parameters. We combined the chains run in parallel into one using LogCombiner. We obtained the MCC tree using TreeAnnotator. Both LogCombiner and TreeAnnotator are available as part of BEAST 2.6. Finally, we plotted the numerical results with the R package ggplot2 [32] and the MCC tree with ggtree [33]. Tables   Table A1. Distributions from which parameters were sampled for the simulation of trees. All parameters were constant through time.

Parameter Distribution
Unif(0, 0.5) ψ i Unif(0.05, 0.5) r i Unif(0, 1) . N, S, and T refer respectively to the North, South, and Tropics. For effective reproduction numbers R, the first subscript is the location while the second one refers to the period of the year. s, w, as, om respectively refer to summer, winter, april-september, and october-march. Thus, for instance, R S w refers to the effective reproduction number of samples from the southern hemisphere during the winter season. The tree height t is given in number of years, M is given in migrations/lineage/year, and δ is given in years −1 . The remaining parameters are unitless.  Figure A3. Comparisons of likelihood computation results between the original and improved bdmm versions for additional trees. (A,B) Randomly simulated ten-tip tree and log-likelihood computation results against λ 1 (birth rate of red deme). (C,D) Randomly simulated hundred-tip tree and loglikelihood computation results against λ 1 (birth rate of red deme). (E,F) Randomly simulated ten-tip tree and log-likelihood computation results against µ 1 (death rate of red deme).