Two Gaussian Bridge Processes for Mapping Continuous Trait Evolution along Phylogenetic Trees

: Gaussian processes are powerful tools for modeling trait evolution along phylogenetic trees. As the value of a trait may change randomly throughout the evolution, two Gaussian bridge processes, the Brownian bridge (BB) and the Ornstein–Uhlenbeck bridge (OUB), are proposed for mapping continuous trait evolution for a group of related species along a phylogenetic tree, respectively. The corresponding traitgrams to the two bridge processes are created to display the evolutionary trajectories. The novel models are applied to study the body mass evolution of a group of marsupial species.


Introduction
Evolution is the change in the heritable traits of biological populations over successive generations [1]. Evolution occurs ubiquitously on our planet because all species need to survive by adapting themselves to the living environment. For phenotypic evolution, the trait value, such as body mass, may change across generations in the evolutionary history. Scientists use mathematical models and computing tools to describe the behavior of change in trait evolution. In particular, the change of a trait value for a species can be modeled by a stochastic variable of continuous type varying with time.
Let the stochastic variable x t be a species' trait value at time t, x t solves the onedimensional stochastic differential Equation (SDE) in Equation (1) where dx t is the infinitesimal change in the character x over the infinitesimal interval from time t to time t + dt, b(x t , t, Θ) is called the drift coefficient, b(x t , t, Θ)dt measures the deterministic trait value in an infinitesimal time, σ(x t , t, Θ) is the diffusion coefficient, σ(x t , t, Θ)dt measures the change as a result of random perturbations in an infinitesimal time, W t is a Wiener process starting from W(0) = 0 with an independent Gaussian increment W(t) − W(s) of mean 0 and variance t − s, and Θ is a parameter vector (see [2,3]).
In the family of stochastic processes, the Brownian motion (BM) [4][5][6] and the Ornstein-Uhlenbeck (OU) processes [2,3,7] are the two most popular Gaussian processes for modeling trait evolution. A Brownian motion trait variable x t without the drift effect (i.e., b(x t , t, Θ) = 0) and with σ(x t , t, Θ) = σ solves the SDE in Equation (2) dx t = σdW t , (2) and due to the property of the Winer process, the variation of x t is proportional to time (i.e., Var(x t ) = σ 2 t), where the positive parameter σ is regarded as the rate of evolution.
When treating x t followed as an OU stochastic variable (i.e., b(x t , t, Θ) = α(θ − x t ) and σ(x t , t, Θ) = σ) [7], the corresponding trait variable x t solves the SDE in Equation (3) where the deterministic term α(θ − x t )dt is interpreted as the force of selection [2], α > 0 measures the strength of selection, θ ∈ R is the long-term mean level optimum of the trait, θ − x t measures the distance of the current trait value x t from the optimum θ, and σ > 0 is the rate of evolution, as defined earlier for the Brownian motion. In particular, when the force parameter α is 0, the OU process reduces to the Brownian motion. For studying the trait evolution for a group of related species, scientists apply the phylogenetic comparative methods (PCMs) to understand their evolutionary history [5,[8][9][10]. PCMs incorporate phylogenetic trees (acylic-directed diagrams that represent the evolutionary history) to analyze phenotypic data. For instance, an ornithologist applies PCMs to study the evolutionary divergence of waterfowl (e.g., swan) and sea-fowl (e.g., sea gull) by investigating their common traits (e.g., wing shape, flight speed, or/and wing span length) [11], while an evolutionary biologist uses PCMs to study the primate evolution by comparing their body size and brain size [12]. PCMs can also be applied to estimate the rate of evolution [6,13], estimate the ancestral status [8,14], test the evolutionary phylogenetic signal [15,16], and detect the evolutionary correlation among traits [17,18].
This work aims to provide a tool for exploring the evolutionary trajectories built along a given phylogenetic tree with known topology and branch lengths and the traits observed at the tips of the tree. The procedure starts at the tips of the tree with known trait data and works backwards up the tree to fill in trait values at early branching points. Then, by conditioning these known internal values, a bridge process is applied to simulate trajectories. Two Gaussian bridge processes, the Brownian bridge process and the Ornstein-Uhlenbeck bridge, are introduced for this purpose herein. The goal of this work is two-fold: first, the new method generates a stochastic map under a bridge process to provide an overview of the entire change for a group of species; second, the new method creates a traitgram that allows the fluctuation of trait changes along the branch of the phylogenetic tree. Note that Revell [19] presented graphical methods for visualizing both the discrete and continuous phenotypic evolution along a phylogenetic tree. While Revell [19] used a weighted average method to generate the map where change of trait value was changed monotonically in an increase/decrease manner, our work implements bridge processes to allow random changes of the trait value .
The background and implementation of the two bridge processes to model trait evolution along the phylogenetic tree are provided in Section 2. The results are shown in Section 3, where simulation results for assessing the new models is provided in Section 3.1, and analysis of an empirical dataset using the traitgrams can be accessed in Section 3.2. The conclusion and discussion of this work are provided in Sections 4 and 5, respectively.

Bridge Process
The bridge process is a member of the continuous-type stochastic process family. Formally, a bridge process' random variable y t has a starting value a at time t = t 0 and an end value b at a known time t = T. A bridge process with the initial condition y(t 0 ) = a at t = t 0 can be constructed by conditioning on y T and yields to a fixed value of b at time t = T.
A bridge process random variable y t solves the SDE in Equation (4) where c T (y t , t, Θ) is the drift coefficient of y t , and σ(y t , t, Θ) is the diffusion coefficient of y t . Szavits-Nossan and Evans [20] expressed c T (y t , t, Θ) in terms of the drift coefficient b(x t , t, Θ) and the diffusion coefficient σ(x t , t, Θ), defined in Equation (1), as the following, where p(b, T|x, t, Θ) is the transition density function of x t corresponding to the stochastic differential equation in Equation (1). Furthermore, p(x, t|x , t ) solves the Fokker-Planck equation (forward Kolmogorov equation) , a Dirac delta function.

The Brownian Bridge
The Brownian bridge is a well-known continuous-time stochastic process defined as is a Wiener process with W 0 = a and an independent Gaussian increment and continuous paths. In the literature, the Brownian bridge has been broadly applied in many fields. For instance, Buchin et al. [21] used the Brownian bridge to study the linear movement between sample points under the circumstance that the trajectory data are with high uncertainty.
Since the probability density p for the Brownian motion to the Fokker-Plank equation subject to p(x, t|a, 0) = δ(x) is given by and due to the process being both space and time homogeneous for the Brownian motion, the expression for p(b, T|x, t) is the same as for p(b − x, T − t|0, 0), implying Using Equation (5) and adopting the derivation in [20], the coefficient c T for the Brownian bridge is shown in Equation (8) Hence, the Brownian bridge random variable y t solves the SDE in Equation (9) The solution of Equation (9) can be obtained by the variation of constants method [22] and is shown in Equation (10) Equation (10) is used to simulate the trajectory given the starting and end values under the Brownian bridge. Take the expectation of y t and y 2 t and using the Itô isometry property, the expected value and variance of y t are shown in Equation (11) and Equation (12), respectively. and Note that Equation (12) implies that the most uncertainty is in the middle of the bridge, with zero uncertainty at the nodes.

The OU Bridge
The OU bridge [20] can be constructed by utilizing the transition density of the OU process given in Equation (13).
where κ(α, t) = Using Equation (5) Note that when α approaches 0, it can be algebraically verified that the c T (x, t) in Equation (15) for the OU bridge converges to the c T (x, t) in Equation (8) for the Brownian bridge. The stochastic differential equation for the OU bridge is thus given by which can be simplified into where coth and csch are hyperbolic tangent and hyperbolic cosecant functions, respectively. The solution y t to the Equation (17) for the OU bridge can be obtained by applying the variation of constants method by two steps [22]. First, the homogeneous solution of Equation (17) can be obtained by solving the first-order linear differential equation The ordinary differential equation in Equation (18), given y(0) = a, has the solution Second, the particular solution y p (t) = k(t) sinh(α(T − t)) for Equation (17) can be obtained by solving which yield the solution Therefore, the full solution for the OU bridge in Equation (17) is Taking the expectation of y t and y 2 t and using the Itô isometry property, the expected value and the variance of y t are and Note that the OU bridge also reduces to the Brownian bridge when the force α approaches zero. This property can be algebraically verified by directly computing the limit of the first (23) converges to E[y t ] in Equation (11), and Var[y t ] in Equation (24) converges to Var[y t ] in Equation (12), respectively. Sample paths for the Brownian bridge and for the Ornstein-Uhlenbeck bridge are shown in Figure 1.

Bridge Process along Phylogenetic Tree
In this section, we implement the bridge process for trait evolution along a root phylogenetic tree. A rooted phylogenetic tree T has a fixed topology equipped with branch lengths that are represented as evolutionary times (e.g., the six horizontal lengths ν 1 , · · · , ν 6 shown in the Figure 2B). Let {ν k } m k=1 be the set of branches, where m is the number of branches and let Y t = (y 1,t , y 2,t , · · · , y n,t ) T be a stochastic trait vector of n species with the dependency described by the tree topology. To implement the bridge process for trait evolution, each branch ν k is first divided into fine grids. Denote the integer N k , k = 1, 2, · · · , m as the number of grids on the kth branch. Samples of y t are then drawn from the path space of the discrete version of Equation (4). Here, the Euler-Maruyama method [23] is used to approximate the numerical solution of the SDE in Equation (4), which is shown in Equation (25) where and σ(y i , t i , Θ) = σ for the OU bridge model. We then use Equation (25) to generate the trajectories of {y i } N k i=1 on each branch constraint on the topology of the tree T . Sample paths generated under the bridge models along a four-taxa balanced phylogenetic tree is shown in Figure 2. Prior to generating the full trajectories along the tree, trait values at the internal nodes of the tree are required. Then, the bridge process can be applied therein on each branch. Since only the tree and the trait value observed at the tip (Y tip = (y 1 , y 2 , · · · , y n ) T ) are available, we obtained samples of the internal nodes as follows [24][25][26]: Let Y node be the trait vector of internal nodes, and let Σ be the covariance matrix of the trait vector Y full = (Y tip , Y node ), represented as where those Σs in the RHS of Equation (26) can be determined according to the corresponding model of trait evolution. Since the joint distribution of Y full under the bridge model is unknown, we instead using the Brownian motion (BM) model and Ornstein-Uhlenbeck (OU) model to obtain the estimates of the internal nodes [27,28]. For the BM model, Σ = [σ 2 ij ] = σ 2 g ij , where g ij is the element representing the shared branch length between a pair of nodes. For the OU model, Σ = [σ 2 ij ] = σ 2 exp(−2α(1 − g ij ))(1 − exp(−2αg ij ))/(2α), where α > 0 is the force parameter [29,30]. Since both the BM model and the OU model are multivariate normal, the distribution of Y node conditioned on Y tip is again a normal distribution, as shown in Equation (27) Y node |Y tip ∼ MVN(µ tip , Σ tip ), where µ node = µ tip + Σ tip,node Σ −1 tip (Y tip − µ tip ) and Σ node = Σ tip − Σ node,tip Σ −1 tip Σ tip,node . (28) Hence, given the tip Y tip and tree T , samples of Y node can be drawn accordingly. Since the tree topology represents the dependency of the trait variable, the simulation starts from the root node of the tree and adopts the tree traversal algorithm [31], where one-dimensional bridge process is applied to generate the trajectories on each branch.

Simulation
The simulation is used to describe the state space of y t given the known parameters Θ on each branch of the tree. Balanced trees and comb trees of 4, 8, 32, 64, and 128 taxa were generated by the R ape package [32]. Trait vectors at the tips Y tips of the tree were simulated using fastBM in the R package phytools [19] with rate parameter σ = 1 and root value µ = 0 under the BM model for the Brownian bridge model and with parameters α = 0.05, θ = 1, σ = 1.85, and root µ = 0 under the OU model for the OU bridge model. The internal node state Y node is then estimated using Equation (28). For each size of tree, 1000 trajectories are simulated on each branch. Below, we report the simulation results for the 4 taxa case for the left tree case. The results of the balanced tree using taxa 8, 32, 64, 128 and the results of comb tree using taxa 4, 8, 32, 64, 128 can be accessed through the link listed in Appendix A.1.

The Brownian Bridge Model
The distribution of the trajectories under the Brownian bridge model for each branch of the four taxa balanced tree is reported by the 2.5%, 50%, and 97.5% level trajectories, the density plots in Figure 3, summary statistics in Table 1, and the boxplots in Figure 4.  Table 1 reports the summary statistics of the trajectories on each branch under the Brownian bridge model. From Table 1, the trajectories on each branch vary widely, as the difference between the Min and Max of the samples are large on each branch. This result is also supported by the sd column where the values of the standard deviation are considerably large (sd = 2.421 for branch 5 and sd = 3.945) when comparing with the σ = 1 used for simulation.
The skewness computed by (y is − y s ) 3/2 ) and the kurtosis computed by (y is − y s ) 2 ) 2 using S = 1000 replicates for the six branches are reported in Table 1. In this trial, the distributions of trajectories under the Brownian Bridges model for branch 1 (black: from time 0 until branching) and for branch 2 (red: from time 0 until branching) are more symmetric (b 1 ≈ 0) but platykurtic (g 2 < 3), for branch 3 (green) and branch 5 (sky blue) are negative skew (b 1 < 0) and kurtic (g 2 ≈ 3), and of branch 4 (blue) and branch 6 (purple) are positive skew (b 1 > 0) and kurtic (g 2 ≈ 3). The skewness for all samples is 0.004 and the kurtosis for all samples is 2.469 meaning that the distribution of the trajectories under the Brownian bridge model is symmetric and platykurtic (less outliers are produced than the normal distribution).  Figure 3 bottom right panel. The last row is the summary statistics for all trajectories. The Min., 1st Qu., Median, Mean, 3rd Qu., Max., sd., skewness, and kurtosis are reported in each column, respectively.   Figure 4 shows the boxplots of the trajectories of each branch for the four-taxa balanced tree. On each panel, it can be seen that from the leftmost to the middle then to the rightmost, the interquartile range (IQR: the height of the gray box) increases and reaches its maximum at the middle, then decreases. This indicates that the samples have the largest range drawn in the middle stage compared to samples drawn from other stages. The result is concordant with the property of the Brownian bridge that the middle accounts for the most uncertainty.

The OU Bridge Model
The distribution of the trajectories under the Ornstein-Uhlenbeck model for each branch of the four-taxa balanced tree is reported by the 2.5%, 50%, and 97.5% level trajectories, the density plots in Figure 5, summary statistics in Table 2, and the boxplots in Figure 6.   Table 2 shows the summary statistics for the trajectories on each branch under the Ornstein-Uhlenbeck bridge model. From Table 2, with α = 0.05, θ = 1, σ = 1.85, the trajectories vary in a narrower range compared to the trajectories generated under the Brownian bridge model. In the sd column, the standard deviation of the trajectories across all branches varies within 1.5 unit, which is smaller than the σ = 1.85. As expected, due to the stabilizing property of the OU bridge model, the range of sampled trajectories are relatively narrower. The skewness and kurtosis using S = 1000 replicates for the six branches are reported in Table 2. In this trial, the distributions of the Ornstein-Uhlenbeck bridge on each branch are less skewed; branch 1 and branch 2 are more kurtic; the corresponding g 2 s are closed to 3; branch 3 and branch 4 are platykurtic g 2 < 3; and branch 5 and branch 6 are leptokurtic g 2 > 3. The overall skewness is −0.447, and the overall kurtosis is 2.981, meaning that less outliers were sampled from the OU bridge model. We notice readers that the distributions of the trajectories on each branch under the two bridge models depend both on the parameters and on the size of tree. See more discussion for the impact of the distribution on the size of the tree in Section 4. Table 2. Summary statistics for the trajectories simulated under the Ornstein-Uhlenbeck bridge model using a balanced tree of 4 taxa. Each row of the Table represents the results in the branch corresponding to the branch number in Figure 5 bottom right panel. The last row is the summary statistics for all trajectories. The Min., 1st Qu., Median, Mean, 3rd Qu., Max., sd., skewness, and kurtosis are reported in each column, respectively.  Figure 6 shows the boxplots of the trajectories for each branch in the four-taxa balanced tree. Similar to the result of the Brownian bridge, the OU bridge also accounts for the most uncertainty in the middle. In each panel, the largest IQR (height of the gray box) can be located in the middle and then decreases toward both ends.

Empirical Study by Traitgram
The two bridge models are applied to analyze a real dataset [33] where the corresponding traitgrams for mapping the evolution of the body mass in marsupial species are generated to overview the change in trait values. The phylogenetic tree is shown in Figure 7A, where 14 marsupial species are included and their body masses are shown in Table 3.
Prior to analysis, the body mass data are logarithm transformed. The parameter σ = 0.010 used for the traitgram under the Brownian bridge model is estimated by the Brownian motion model [27]. The parameter set (α, θ, σ) = (0.008, 10.045, 0.009) used for the traitgram under the OU bridge model is estimated by the OU model [3]. Both parameter estimates are obtained by applying the functions mvBM and mvOU in the R package mvMORPH package [28]. We modified several functions (contMap and plot.contMap) in the R package phytools [19] and use them for our bridge models to generate traitgrams.
Revell [19] used the interpolating method to average between two successive grids. The states along each edge is interpolated using equation y k = (ν j y i + ν i y j )/(ν i + ν j ) which is the weighted average of y i and y j . We also report the traitgram under the interpolating method in Figure 7 (right panel).
The traitgrams using marsupial data under the Brownian bridge model and the OU bridge model are shown in Figure 8.   Table 3. The Traitgrams using a smaller phylogeny of five taxa marsupial species are shown in Figure 9, where the visualization of color change along the edges of the tree are presented. In Figure 9, the traitgram using the weighted method [13] shows that change in trait values is in a monotonic manner; the traitgram using the Brownian bridge model shows that trait values can vary drastically within each branch, and the traitgram using the OU bridge model shows that trait values can be gradually and randomly changed within each branch.

Discussion
When comparing work in the literature, the interpolating method [13] is based on the belief that trait evolution shall gradually change with trends. Our bridge models instead take an alternative point of view and assume that a trait is allowed to fluctuate tremendously (the Brownian bridge model) or is limited to change in a mild and stably manner (the OU bridge model).
Notice that the simulation result for the distribution of model trajectories of the four-taxa tree case is reported here. However, there remains challenges to describe the distribution of the two bridge models for general cases. First, one can envision that the statistical measure of the distribution of trajectories (e.g., skewness and kurtosis) depends on the values of parameters. Second, the distribution of model trajectories may depend on the size of the tree. For a bifurcated rooted tree with n = 2 k taxa, there are 3 × 2 k−1 branches. While a 4-taxa balanced tree has 6 branches, a 64-taxa balanced tree would have 96 branches. This size condition does increase the difficulty of the inference on the distribution of bridge models. Therefore, specifying the distribution for the trajectories remains an interesting question that may require further investigation and more intensive simulations.
Due to the deficiency of the trait data (only the tip trait data is available), we also encounter a challenge of estimating parameters (the inverse problem) for our bridge model. Castiglione et al. [35] developed models that integrated the tip data and the fossil data for phylogenetic comparative analysis. In the future, one may consider developing models that integrate fossil data for mapping the trait evolution.
While the PCMs developed for studying continuous trait evolution mainly use the Gaussian process family; there are also PCMs that use non-Gaussian processes to model trait evolution [36,37]. Those PCMs mainly serve as tools to address the questions from evolutionary biology. The two bridge models presented here merely serve as add-on tools to hopefully reveal useful information embedded in the past.

Conclusions
Two Gaussian bridge processes, the Brownian bridge and the Ornstein-Uhlenbeck bridge, are proposed for stochastic trait mapping to describe trait evolution along a phylogenetic tree. The Brownian bridge model allows trait values randomly and fluctuated tremendously, while the OU bridge model assumes a trait shall randomly change in a more stable manner. The simulation study shows that the two models perform well and are consistent with the mathematical property of the bridge process that the middle accounts for the most uncertainty. The two novel models are applied to analyze a real dataset of marsupial body mass evolution where the evolution trajectories for the 14 species are displayed in the traitgram. The two models provide community-novel tools for mapping trait evolution under different evolutionary assumptions. The links for the scripts to generate the Figures and Tables in this work can be accessed in Appendix A.2.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.