A Simulation-Based Study on Bayesian Estimators for the Skew Brownian Motion

In analyzing a temporal data set from a continuous variable, diffusion processes can be suitable under certain conditions, depending on the distribution of increments. We are interested in processes where a semi-permeable barrier splits the state space, producing a skewed diffusion that can have different rates on each side. In this work, the asymptotic behavior of some Bayesian inferences for this class of processes is discussed and validated through simulations. As an application, we model the location of South American sea lions (Otaria flavescens) on the coast of Calbuco, southern Chile, which can be used to understand how the foraging behavior of apex predators varies temporally and spatially.


Introduction
For some problems involving time series from continuous variates, considering semi-permeable barriers allows us to construct more accurate models for the phenomenon studied.For instance, in finance we could consider psychological barriers involving asset prices which lead to different decision making procedures on each side of the barrier.
We propose here a model capturing that feature in terms of local times, giving rise to the so-called skew Brownian motion (sBm).
The sBm has attracted interest within other facts due to its relationship with diffusions with discontinuous coefficients or related to media with semi-permeable barriers.It states the first example of the solution of a stochastic differential equation having the local time of the solution as the drift [1].
Recently, several papers have considered the sBm in modeling or in simulation issues, as well as in some optimization problems.See the review by Lejay [2] for references on the subject, and for a survey on the various possible constructions and applications of the sBm.
In this paper, we are interested in the statistical estimation of the skewness parameter when we observe a trajectory of the process at an equally spaced time grid.
From the statistical point of view, this problem represents an intermediate analysis between the classical problem of drift estimation in a diffusion [3,4], and the estimation of the variance of a diffusion, see [5,6] and references therein, where the probability measures generated by the trajectories are singular for different values of the parameter.
As far as we know, there are few works about the estimation of this parameter.In [7], the authors assume that the sBm is reflected at levels 0 and 1 to ensure ergodicity, and in [8], the maximum likelihood estimator is constructed and its asymptotic behavior analyzed, under the null hypothesis of the sBm being the standard Brownian motion.Under general hypotheses, however, the problem is not analytically handled and has not been studied yet.
In this paper, we present a simulation-based analysis on the asymptotic behavior of the Bayes estimators for the skewness parameter, showing that their consistency is supported.The methodology is applied to the trajectory that South American sea lions (SASL, Otaria flavescens) follow during their foraging trips, understood as the feeding trips at sea alternating with visits to colonies in order to rest and for lactating females to nurse pups ( [9]), on the coast of southern Chile.
The paper is organized as follows.Section 2 provides the model settings and the basic definitions.Section 3 presents a simulation-based analysis on the asymptotic behavior of the mean, mode and quantiles of the posterior distribution.Numerical approximations and real data analysis are exemplified in Section 4. Finally, in Section 5, some aspects of the results obtained are discussed.

Skew Brownian Motion
Consider the following stochastic differential equation (SDE), for a process X ∈ R, of the form where x ∈ R, (B t ) t≥0 is a standard Brownian motion, and l t is the local time of the unique strong solution X at 0. Given T > 0 fixed, for any interval I ⊂ [0, T], define where λ is the Lebesgue measure on R + , and l(I, •) denotes the occupation measure on R of X during the time interval I.If l(I, •) is absolutely continuous with respect to the Lebesgue measure λ on R, we denote its Radon-Nikodym derivative by and we call l(I, x) the occupation density of X at level x during the time interval I.We write l t (A) = l([0, t], A) and l t (x) = l([0, t], x) for the occupation measure and for the density (local time), respectively.An equivalent alternative for the definition of the local time l x = {l x t : 0 ≤ t ≤ +∞} at level zero of the (unknown) solution X of the Equation (1), departing from x, is given by the following limit, Recall that a strong solution X of a stochastic differential equation driven by (1) on a given probability space (Ω, F , P), with respect to the fixed Brownian motion B and independent initial condition x over this probability space, is a stochastic process (X t , t ≥ 0) satisfying: 1. X is adapted to the filtration (F t ), where F t := σ(B s , 0 ≤ s ≤ t); 2. X is a continuous process; 3. P(X 0 = x) = 1; 4. with probability one, we have Then, the skew Brownian motion (sBm), X = {X t : 0 ≤ t ≤ +∞}, can be defined as the strong solution of the stochastic differential Equation (1), where B = {B t : 0 ≤ t ≤ +∞} is a standard Brownian motion defined on a given probability space.
For the study of the strong uniqueness of the solution of stochastic differential equations and local times (see [10]), and [2] for a comprehensive survey on the sBm and the SDE involving the local time.
In Equation (1), the initial condition is x ≥ 0 (the case x < 0 is analogous), σ is associated to the volatility parameter, and θ ∈ [−1, 1] is the skewness parameter.

Transition Probabilities
Consider the SBm X with parameter θ ∈ [−1, 1] defined in Equation ( 1), for t ∈ [0, T], and the sampling scheme denoted by X i := X i ∆ , i = 0, . . ., n, and ∆ = T/n.As shown in [11], the transition density of X, shown in Figure 1, is given by where is the density of a Gaussian random variable with variance tσ 2 and mean 0, and sgn We can rewrite the transitions between states as when x > 0, when x < 0, and, when x = 0, Observe, from the transition kernel, that the parameter θ has a strong influence on the trajectory when it passes close to the origin.Indeed, for a fixed amount of time t, when x is far enough from the origin, the second term is nearly zero.In this case, the trajectory is practically driven by a Brownian motion, almost without influence from θ.On the other hand, when a trajectory approaches the origin from positive values, say, if θ < 0 the trajectory is pushed across the boundary to negative values, and if θ > 0, it has a higher probability to be reflected and remain with a positive value.
In the extreme case θ = 1, the solution to Equation ( 1) is the reflected Brownian motion above the origin, and when θ = −1, it is the reflected Brownian motion under the origin as soon as the trajectory becomes negative.The case θ = 0 corresponds to the standard Brownian motion.

Exiting Times
Let us consider τ x = inf{t > 0 : X t = x}, the first hitting time of the point x ∈ R. Let J = (a, b) be any open interval such that the exiting probability P x (τ {a,b} < ∞) equals 1, for any x ∈ J. Then there exists a continuous, strictly increasing function S(x) on R , such that The function S is called the scale function of the process.For more details, refer to [12], for instance.
For the sBm, the scale function is given by where α = (1 + θ)/2, as in [2].This implies that, if a = −b < 0 < b, Observe that this value does not depend on b, but only on θ: for every b > 0. In other words, if θ < 0, once the particle reaches the barrier, it is pushed downwards with a probability (1 − α) greater than half; and if θ > 0, the particle is pushed upwards with probability α greater than half.
From the point of view of the inference procedure, given an observed trajectory, for those data close enough to the barrier, the proportions of them at each side of the barrier are linear functions of θ.This property can guide us to define an appropriate prior distribution for this parameter, as we discuss in the next section.

Likelihood and Prior Settings
For the model (1), let us consider the parametric space For a given sample d = (x 1 , . . ., x n ), of the observed process at times (t 1 , . . ., t n ), the likelihood function for ω ∈ Ω, after observing d, is given by with q as in (3), and τ i = t i+1 − t i .To ease the notation, let With this notation, the likelihood of (θ, σ 2 ) is proportional to where n −+ is the number of crossings from the negative to the positive side, n +− , the number of crossings from the positive to the negative side, and b j /a j = exp{−2x j x j+1 /(τ j σ 2 )} < 1.
Observe that, for each step crossing the barrier, if x i ≤ 0 and x i+1 > 0, then q(τ i , When the first happens, θ = −1 becomes a root of the likelihood, and when the latter case happens, θ = 1 becomes a root. On the other hand, for each two consecutive positive or negative steps, q(τ j , x j , x j+1 | θ, σ 2 ), as a linear function in θ, has a root, ±a j /b j , outside the interval [−1, 1], with maximum in θ = 1 if x j > 0, and in θ = −1, otherwise.This implies, for instance, that if we observe a positive trajectory without crossings, the maximum likelihood estimate for θ is 1, giving more evidence for a reflected Brownian motion, as already pointed out in [8] (Lemma 1).
Suppose now we have prior information about the skewness and the volatility, and let us denote by f (ω) the prior density for ω quantifying that information.By Bayes's theorem, the posterior density for ω, given the data d, is determined by the relation By the previous remarks, if we assume prior symmetry around θ = 0, the posterior symmetry for θ basically depends on the polynomial factors defined by the steps performed at each side of the barrier.If, for instance, a trajectory with at least two crossings has more steps at the positive side than at the negative one, then the posterior distribution is skewed to the left, giving more mass probability to θ > 0.
Also, when a trajectory has no crossings, if x i > 0, for all i, then the posterior distribution is skewed to the left, with a modal value in θ = 1, and, conversely, if x i < 0, for all i, then the posterior distribution is skewed to the right, with a modal value in θ = −1.
In this work, we will consider that, a priori, θ and σ 2 are independent random variables.For the inverse of the variance we adopt a Gamma distribution, G(α s , β s ), mainly because of the first product in the likelihood expression (4), so we can obtain a kind of conjugacy for 1/σ 2 .On the other hand, from the interpretation of θ as a linear function of a proportion given in Section 2.3 and again for obtaining a partial conjugacy for θ, we adopt for θ a Beta distribution in [−1, 1] with hyperparameters α t , β t .With this, the prior density is and, consequently, the posterior density can be expressed by The last two factors show the dependency between θ and σ 2 , given by the updated information after observing d, as stronger the more the trajectory remains near the barrier.As the trajectory moves away from the barrier, those factors converge exponentially to one.
Observe that this is a precise hypothesis, that is, it is a sub-manifold of Ω with dimension verifying dim(Ω 0 ) < dim(Ω).For any absolutely continuous posterior distribution, the posterior probability of a precise hypothesis is zero.As a test criterion for sharp hypothesis, the Full Bayesian Significance Test (FBST) deals with the posterior probability of a region defined by the hypothesis and the level surfaces of the posterior density f (ω | d).
Given a hypothesis Ω 0 , let us define the tangential set, T 0 , as Intuitively, the tangential set to Ω 0 considers all points more probable than the most probable value inside Ω 0 , according to the posterior law.
The e-value (briefly denoted by ev) for the null hypothesis is an evidence measure defined by Therefore, if the tangential set has high posterior probability, the evidence in favor of Ω 0 is small; if it has low posterior probability, the evidence against Ω 0 is small, see [13].
As observed in [14], by the absolute continuity of the posterior distribution on Ω, there is a representation of f (ω | d) such that where . This property allows us to obtain f 0 without having to elicit a prior density conditional on the null hypothesis, using the limit (6) instead.Under a decision theoretical point of view, the e-value defined by (5) allows us to perform a significance test for the hypothesis H : θ ∈ Ω 0 .Let us consider a decision space D = {to reject H(a 1 ), not to reject H(a 0 )}, and the loss function l : D × Ω → R given by with w 0 , w 1 , c > 0. The value c represents the additional loss of not rejecting H when in fact θ belongs to T 0 and then has posterior density greater than any value in Ω 0 .According to [15], the decision rule defined by "reject H" if and only if e − value < w 1 + c w 0 + c minimizes the expected value E(l(a, θ)|d).
We can also measure the evidence for H using the well-known Bayes factor [16].Let us note that its determination requires us to define a prior distribution on Ω 0 , f (θ|Ω 0 ).The sensitivity of Bayes factors to the choice of such priors is one of the usual criticism of that measure (see [17]).Let B denote the Bayes factor against the hypothesis H.According to [18], one might adopt the following decision rule to test H: if 1 < B < 3 there is little evidence against H; if 3 < B < 20, the evidence against H is positive; if 20 < B < 150, it is strong; and if it is greater than 150, the evidence is very strong.These categories are though prescriptive, since there is no explicit mention to any loss function associated to the test.
Finally, we will also compute posterior probabilities, as presented in Section 4, for some regions of interest.

Considerations on the Asymptotic Behavior of the Bayesian Posterior
This section presents a simulation-based analysis of the asymptotic behavior of the posterior distribution of θ, given σ 2 , as the sample size increases.
In [8], a first step towards a convergence result for the maximum likelihood estimator (MLE) has been achieved.The authors characterize the limiting distribution under the null hypothesis of the standard Brownian motion, θ = 0, and the convergence is related to the G-stability (see [6] for the definition and properties).That kind of convergence is proved to be true there in a neighborhood of θ = 0, but even in that case it is not a sufficient condition for the consistency of the estimator.To the best of our knowledge, reference [8] is the only analytical result on the convergence of estimators for the skewness parameter θ of the sBm.
Also, for the MLE, high-frequency estimation, obtained for a sequence of times {iT/n} i , is equivalent to long-time estimation, obtained for a sequence of time-steps equal to one.The simulations we performed consider then increasing times {1, . . ., n} and give support for the consistency of the posterior distribution when the sample size n increases, as we can see in what follows.
For this discussion, we present the simulated sampling distribution of the posterior mode, the posterior mean, and the posterior 0.025 and 0.975 quantiles for θ.Those distributions were obtained by a simulation of 10 thousand trajectories with n steps, for n = 100, 1000 and 10, 000, for different values of θ ranging from −1 to 1.We consider for the simulations a fixed value of σ 2 = 1; analogous results were obtained for other values of σ 2 .
Figure 2 represents the sampling distribution for the posterior mode of θ, from trajectories generated with θ = 0. Observe that for a trajectory with n = 100 steps, the sampling distribution of the posterior mode is trimodal, with modes −1, 0, 1.These extreme modes occur because there is a positive probability that a trajectory does not hit the origin, giving then evidences for θ = 1 if the trajectory is mostly positive, and for θ = −1, if it is mostly negative.By symmetry, both cases can happen with the same probability, as confirmed by the simulations.As n increases, this probability tends to zero, as we can see in the histograms for n = 1000 and n = 10, 000 steps, where the mass probability around −1 and 1 almost vanishes.
Figure 3 represents the sampling distribution for the posterior mean and the joint sampling distribution for the posterior quantiles q(0.025) and q(0.975).It is clear that the sampling distribution for the posterior mean concentrates around θ = 0 as n increases.Also, the range q(0.975)-q(0.025)diminishes as n increases, showing that the whole posterior distribution concentrates asymptotically.Posterior mean, and quantiles q(0.025) and q(0.975), for trajectories with n steps, n = 100, 1000, 10,000, assuming θ = 0.
The convergence can also be observed in the sampling distribution of the posterior mean for different values of θ, as shown in Figure 4. There, the horizontal axis represents the nominal values for θ, from −1 to 1.For each θ, the solid lines represent the 95% central values of the posterior mean, for a sample of size: (a) n = 100, (b) n = 1000, (c) n = 10, 000 steps.Observe that these intervals become sharper as n increases.
We must point out that the number of crossings is a relevant statistic for having a more precise inference on θ.In each graph, the dashed line shows the same interval for the posterior mean, but considering only those trajectories that have five crossings or more, and the dot-dashed line, those with 10 crossings or more.The information gained is remarkable for small samples, as we see in Figure 4, mainly for θ close to 1 or −1.
As seen in Figures 2 and 3, for sample sizes less than 1000, the posterior mode for the skewness parameter presents a more dispersed sampling distribution than the posterior mean.That makes, in some sense, the posterior mean a more reliable estimator for θ.On the other hand, both estimators improve as the number of crossings increases in the sample, and in that sense their consistency is well supported by simulations, for any value of θ.

Simulated Data
In this section we describe the procedure followed to obtain posterior inferences from one single trajectory, and the subsequent point estimates and e-value.For illustrative purposes, let us consider the simulated trajectory of n = 1000 steps from a skew-Brownian motion with parameters σ 2 = 1 and θ = 0.8, represented in Figure 5a.The evidence given by the Bayes factor against H is favorable to this hypotheses.However, considering the decision rule (7) and an uniform prior distribution for θ, we should take the decision to reject H if and only if In other words, if we have for instance w 1 ≈ c, and the cost of the error of type II is at least half the cost of the error of type I, then H is to be rejected.
Also, the probabilities P(θ < 0 | d) > 0.4, for the different priors, indicate that the subject likely did not always travel to the same foraging area but instead some of their trips were to lower latitudes.Biologically, at least three different explanations not mutually exclusive could explain why SASL move to different areas to forage.First, prey aggregations may change location due to variations in oceanographic features, thus obligating predators to change their foraging areas, see [22].Second, individuals usually segregate and change their foraging locations in order to decrease intra-specific resource competition ( [23]).Finally, SASL are considered to be a generalist predator, which means that they consume different prey species [24].Thus, if the prey abundance and availability in a determined foraging area decrease, the subject may display behavioral plasticity and abandon the area and forage at a different place or look for different prey ( [25]).

Conclusions
In a model selection problem, some aspects should be taken into account, as theoretical considerations on the question itself or past experience.The observed sample may provide insights about the skew Brownian model, mainly if there are a number of crossings through the barrier, but supplementary considerations are required for choosing the skew Brownian as a proper model for a specific problem.
From the point of view of the estimation, as seen in Figure 3, for sample sizes less than 1000, the posterior mode and, in particular, the maximum likelihood estimator for the skewness parameter present more dispersed sampling distribution than the posterior mean.
That makes, in some sense, the posterior mean a more reliable estimator for θ.On the other hand, both estimators improve as the number of crossings increases in the sample, and in that sense, their consistency is well supported by simulations.
A possible generalization of this problem is to consider that, on each side of the barrier, there are different values for the volatility parameter, σ 2 1 and σ 2 2 , allowing in that way different dynamics depending on the barrier.On this extended parametric space, the proposed methodology can be applied as well.Considering that parametrization, we obtain a precise hypothesis stating that those diffusions have the equal value to the parameter σ 2 1 = σ 2 2 , whose evidence can be measured by the e-value, by instance.Another possible extension consists of considering more than one barrier and several diffusion processes between them, or, even more, considering a time-dependent barrier, underlying dynamic models problems.
The analysis proposed here could be used to model behavioral changes in apex predators, such as marine mammals.Understanding the foraging behavior of the animals, and how such behavior varies temporally and spatially may be an indicator of ecosystem events and links shifts in predator and prey dynamics with environmental variability.

Figure 4 .
Figure 4. Sampling interval for the posterior mean of θ, with σ 2 = 1, considering θ ∈ [−1, 1] and: (a) n = 100; (b) n = 1000; (c) n = 10, 000 steps.For each graphic, the solid line represent the 95% central values of the posterior mean, the dashed line shows the same interval considering only those trajectories that have five crossings or more, and the dot-dashed line, those with 10 crossings or more.

Table 2 .
Posterior estimates for (θ, σ 2 ), for different priors on θ, for the SASL observed trajectory.The last three columns show the evidence in favor of the hypothesis θ = 0, given by the e-value, the Bayes factor and the posterior probability of θ < 0.