Bayesian Inference on the Memory Parameter for Gamma-Modulated Regression Models

In this work, we propose a Bayesian methodology for making inferences about the memory parameter and other characteristics under non-standard assumptions for a class of stochastic processes. This class generalizes the Gamma-modulated process, with trajectories that exhibit long memory behavior, as well as decreasing variability as time increases. Different values of the memory parameter influence the speed of this decrease, making this heteroscedastic model very flexible. Its properties are used to implement an approximate Bayesian computation and MCMC scheme to obtain posterior estimates. We test and validate our method through simulations and real data from the big earthquake that occurred in 2010 in Chile.

Entropy 2015, 17, 6577


Introduction
Diffusion processes have been a cornerstone of stochastic modeling of time series data, particularly in areas such as finance [1] and hydrology [2]. Many extensions to the classic diffusion model have been developed in recent years, addressing such diverse issues as asymmetry, kurtosis, heteroscedasticity and long memory; see, for instance, [3].
In the simplest case, the increments of a diffusion model are taken as independent Gaussian random variables, making the process a Brownian motion. In this work, by contrast, we consider processes with long memory and non-Gaussian increments.
The proposed model is a generalization of the Gamma-modulated (G-M) diffusion process, in terms of the memory parameter. This model was developed in [4] to address an asset market problem, extending the ideas of the Black-Scholes paradigm and using Bayesian procedures for model fitting. In that work, the memory parameter was assumed to be known and fixed, with some particular cases, such as the standard Brownian motion and the Student process. The latter is a generalization of the Student process previously presented in [5], the marginals of which have a t-Student distribution with fixed degrees of freedom and a long memory structure.
Here, we enlarge the parameter space by treating the memory parameter as also unknown, assuming a prior distribution on it.
This extension allows flexibility for the dependence structure of the process, where the Brownian motion and the G-M process become particular cases.
Typically, the trajectories generated by this process exhibit heteroscedasticity, with higher variability at the beginning, which we call "explosion at zero". In addition, as time increases, the variability decreases at a rate depending on the long memory parameter.
In particular, we will focus on estimation procedures for long-range memory stochastic models from a Bayesian perspective. Other parameters, such as location and dispersion, are also considered.
For the location and scale parameters, we can straightforwardly find natural conjugate prior distributions. However, the same does not occur for the memory parameter, whose marginal likelihood is not analytically tractable. This implies that methods used for obtaining a posterior distribution, such as commonly-used likelihood-based solutions, are not suitable for this purpose.
In order to approximate the posterior distribution for the parameters involved, we propose a blended approximate Bayesian computation ABC-MCMC algorithm. This family of ABC algorithms and its very broad set of applications are well reviewed in [6]. In this work, the MCMC part is built for those components with tractable full conditional posterior distributions, and the ABC part is implemented for the memory parameter. Grounded on previous results [7], for the ABC steps, an appropriate summary statistic was defined, based on the path properties and on the m-block variances. With this proposal, we obtain very precise estimates for that parameter.
After generating samples from the posterior, a by-product is the e-value, an evidence measure for precise hypotheses, such as the Brownian motion and G-M cases. This measure was defined in [8] and used afterward, for instance, in [9][10][11].
We test and validate our method through simulations and illustrate it with data from the big earthquake that occurred in Chile in 2010.
The definition of the process and some properties are presented in Section 2. In Section 3, we describe the ABC-MCMC algorithm. The simulated and real data results are shown in Section 4, and finally, in Section 5, we give some final remarks.

Generalized Gamma-Modulated Process
Let us consider the standard Brownian motion, {B_t}_{t>0}, and a Gamma process, {γ_t}_{t>0}, as defined in [4].
A Gamma process is a pure-jump increasing Lévy process with independent and stationary Gamma increments for non-overlapping intervals. For this process, the intensity measure is given by κ(x) = a x^{-1} exp(-bx), for any positive x. That is, jumps whose size lies in the interval [x, x + dx] occur as a Poisson process with intensity κ(x)dx. The parameters involved in the intensity measure are a, which controls the rate of jump arrivals, and b, the scaling parameter, which controls the jump sizes.
The marginal distribution of a Gamma process at time t is a Gamma distribution with mean at/b and variance at/b^2, allowing also the parametrization in terms of the mean, µ, and variance, σ^2, per unit time, that is, a = µ^2/σ^2 and b = µ/σ^2.
For the one-dimensional distributions, we have that αγ_t(a, b) = γ_t(a, b/α) in distribution; E(γ_t^n) = b^{-n} Γ(at + n)/Γ(at), n ≥ 0, where Γ(z) is the Gamma function; E(exp(θγ_t)) = (1 - θ/b)^{-at}, for θ < b; and its characteristic function is:

φ_{γ_t}(u) = E(exp(iuγ_t)) = (1 - iu/b)^{-at}.

Given times s < t, Corr(γ_s, γ_t) = √(s/t), and given h > 0, the density, f_h(y), of the increment γ_{t+h} - γ_t is the Gamma density function with mean ah/b and variance ah/b^2. In this work, we will consider a = b = 1/2. Given a real value α ∈ [-1, 0], we define the generalized Gamma-modulated (G-M) process by:

αX_t = B_t γ_t^α, t > 0. (1)

Figure 1 shows typical realizations of the generalized G-M process for different values of the parameter α. In particular, the value α = 0 corresponds to the Brownian motion and α = -0.5 to the Student process studied in [4].
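A path of the process can be simulated directly from Equation (1) by building the Brownian and Gamma components as cumulative sums of independent increments. The following minimal sketch assumes a = b = 1/2 and a unit-step integer grid; the helper name `simulate_ggm` is our own, not from the paper:

```python
import numpy as np

def simulate_ggm(n, alpha, a=0.5, b=0.5, rng=None):
    """Simulate the generalized Gamma-modulated process
    X_t = B_t * gamma_t**alpha on the integer grid t = 1, ..., n,
    where B_t is a standard Brownian motion and gamma_t a Gamma
    process with Gamma(a, rate=b) increments per unit time."""
    rng = np.random.default_rng(rng)
    t = np.arange(1, n + 1)
    # Brownian motion: cumulative sum of N(0, 1) increments (dt = 1).
    B = np.cumsum(rng.standard_normal(n))
    # Gamma process: cumulative sum of Gamma(a, rate b) increments,
    # so gamma_t has mean a*t/b (equal to t when a = b = 1/2).
    gamma = np.cumsum(rng.gamma(shape=a, scale=1.0 / b, size=n))
    return t, B * gamma**alpha

# alpha = 0 recovers Brownian motion; alpha = -0.5 gives the Student process.
t, x = simulate_ggm(1000, alpha=-0.5, rng=42)
```

Setting alpha = 0 makes the modulation factor identically one, so the returned path is exactly the simulated Brownian motion.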
The next subsections present some useful path characteristics of the process that could lead us to choose this model as an appropriate one for a given problem and to help us make inferences about its parameters, such as the presence of long memory, the variability profile and the variance of the increment process.

Explosion at Zero
The graphs in Figure 1a,b, with α < 0, show that the process is highly variable near t = 0, but as t → ∞, its variability decreases. We call this path property "explosion at zero" and prove it in the next result.

Proposition 1. Let αX_t be the generalized Gamma-modulated process as defined in Equation (1). Then, for all M > 0 and α < 0, we have:

lim_{s→0} P(|αX_s| > M) = 1.

Proof. Let us consider M > 0 and α < 0. Conditioning on γ_s, we obtain:

P(|αX_s| > M) = E[2(1 - Φ(M γ_s^{-α}/√s))],

where Φ is the standard normal cdf. The last quantity tends to one as s → 0 for α < 0, since the shape parameter of γ_s vanishes as s → 0, so γ_s^{-α}/√s → 0 in probability.

Let us consider now the case α = 0: here 0X_s = B_s, so P(|B_s| > M) = 2(1 - Φ(M/√s)), which tends to zero as s → 0.

The Increment Process
Let us now consider the increment process, ∆(αX_t) = B_t γ_t^α - B_{t-1} γ_{t-1}^α. The next results describe the asymptotic behavior of the variance-covariance structure for these differences and, hence, for the process itself, since αX_t has zero expectation.

Proposition 2. For the increment process, Var(∆(αX_t)) ∼ t^{2α} as t increases.

Proof. Observe that the increment process can be written as ∆(αX_t) = W_t + V_t, where:

W_t = (B_t - B_{t-1}) γ_t^α and V_t = B_{t-1} (γ_t^α - γ_{t-1}^α).

Working out each term, using the moment formula E(γ_t^n) = b^{-n} Γ(at + n)/Γ(at) with a = b = 1/2, we obtain that Var(W_t) = E(γ_t^{2α}) ∼ t^{2α}, while the contribution of V_t is of smaller order.

This property leads us to consider the variance of the observed increment process, Var(∆(αX_t)) ∼ t^{2α}, as informative for the parameter α and, therefore, helpful for implementing the approximated simulation for its marginal posterior.
In a data exploratory phase, we can examine the graph of the empirical variances of ∆(αX_t), from t to the end of the process, as a function of t on a logarithmic scale. As t increases, that graph should become linear with slope 2α. Figure 2 exhibits this result for some values of α. Observe that the asymptotic result can be visualized from log t ≈ 3, that is, from t ≈ 20, for α < 0.

Proposition 3. The auto-covariance at lag n of the increment process, denoted by M_α(n), is of order n^{α-1} if -1 < α < 0, and it is zero if α = 0, as n increases.
Proof. Let us consider n ≥ 3, because of the explosion at zero. Expanding M_α(n) leads to terms of the form E(γ_t^α γ_{t+n}^α), which involve ratios of Gamma functions. If -1 < α < 0, we can apply the Gautschi inequality [12] to bound those ratios, obtaining that M_α(n) is of order n^{α-1}. On the other hand, for α = 0, E(γ_t^α γ_s^α) = 1, and therefore, M_0(n) = 0. In other words, for α = 0, the process has no memory, and we recover the Brownian motion case, as already mentioned; for α < 0, the process is anti-persistent. Recalling the Hurst parameter H associated with the fractional Brownian motion, the relationship between H and α is α = 2H - 1; see [4], for instance.

ABC-MCMC Study
When dealing with non-standard posterior distributions, the usual procedure is to use Markov chain Monte Carlo simulation to produce an approximate sample from the posterior. However, when the likelihood function is intractable, MCMC methods cannot be implemented. The class of likelihood-free methods termed approximate Bayesian computation (ABC) can deal with this problem, as long as we are able to simulate from the probabilistic model and a suitable set of summary statistics is available.
The ABC idea was proposed by Pritchard et al. [13] and developed further in the last decade. In particular, with respect to the choice of the summary statistics from among diverse options, in [14], the authors consider a sequential test procedure for deciding whether the inclusion of a new statistic improves the estimation. In [15], the discussion refers to the choice of informative statistics related to the algorithmic properties.
In our case, in order to perform the ABC algorithm for the memory parameter, α, we have to be able to generate a realization of the target process, and we have to identify the important observable characteristics of the process that increase our information about α.
Observe that we already know, by Equation (1), how to simulate from the generalized G-M process for each value of the parameter. After obtaining a simulated trajectory, we can then compare it to the observed trajectory through adequate statistics. Intuitively, if they are similar enough, the chosen value of the parameter can be regarded as an appropriate one.
More concretely, suppose that we take the observed increments d_t = ∆(x_t) from a sample x_t and that, for each α ∈ [-1, 0], we generate a realization from a Brownian motion, b_t, and a realization from a Gamma process, γ_t, obtaining a simulated path αx_t = b_t γ_t^α. For ease of notation, set αy_t = ∆(αx_t). To compute the proximity between d_t and αy_t, we determine the distance between some statistics of each sample. The usual choices for the memory parameter are, among others, the rescaled range R/S, the rescaled variance V/S, the periodogram, quadratic variations, aggregated variances, the Whittle estimator, and functions (such as the logarithm, inverse or square root) of these [15,16]. In [7], we used those statistics to obtain approximate posterior samples for the memory parameter of fractional self-similar processes.
We tried all of them in this work; however, their performance was not good enough, due to the non-stationarity and the inherent heteroscedasticity of the process. To solve this, in view of the results in Section 2.2, we considered the slope of the regression of the sampling variance of m-size blocks on time, and this solution worked much better than the former ones.
Let T* denote the following statistic. Given n observations and an integer m << n, we take consecutive blocks of size m and obtain the sampling variance of each block, s^2(k), for k = 1, ..., ⌊n/m⌋, where ⌊ξ⌋ is the integer part of ξ. Those values are used as estimates for the variances at times m, 2m, ..., ⌊n/m⌋m. In log-log scale, the slope of the regression line through the points (km, s^2(k)), k = 1, ..., ⌊n/m⌋, should be of order 2α as time increases. We define T* as this slope.
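As a concrete reading of this definition, T* is an ordinary least-squares slope in log-log scale. The sketch below follows the block construction just described; the function name `t_star` is our own choice:

```python
import numpy as np

def t_star(increments, m):
    """T*: slope of the log-log regression of the m-block sampling
    variances s2(k) on the block end-times k*m; for the generalized
    G-M process this slope should be of order 2*alpha as time grows."""
    d = np.asarray(increments, dtype=float)
    n_blocks = len(d) // m                        # integer part of n/m
    # Sampling variance of each consecutive block of size m.
    s2 = np.array([d[k * m:(k + 1) * m].var(ddof=1)
                   for k in range(n_blocks)])
    times = m * np.arange(1, n_blocks + 1)        # m, 2m, ..., floor(n/m)*m
    slope, _ = np.polyfit(np.log(times), np.log(s2), 1)
    return slope
```

On synthetic heteroscedastic increments whose variance decays like t^{2α}, the returned slope recovers 2α closely once enough blocks are available.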
The ABC steps for α are then given by the following algorithm:

Step 1. Draw a candidate α* from the prior distribution on [-1, 0].
Step 2. Generate a trajectory from the model under α* and compute the statistic T* for its increments.
Step 3. If the distance between the simulated and observed statistics is at most ε, accept α*; otherwise, reject it.
Step 4. Repeat Steps 1-3 until the desired number of accepted values is reached, collecting them in the sample S_α.
The sample S α obtained in the last step is an approximate sample from the posterior distribution for α, the goodness of which strongly depends on the choice of the statistics and the threshold ε in Step 3.
Note that if ε is too small, the rejection rate is high, and the algorithm becomes too slow; on the other hand, if ε is too big, the algorithm accepts too many values of α, giving a less precise approximate sample. In general, what is done is to take a small percentile of the simulated distances, that is, we select those α's giving the closest values to the observed one. In this work, after some trials, we used the first percentile, as proposed in [17].
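A rejection-ABC step with this adaptive first-percentile threshold can be sketched as follows. The names `abc_posterior_alpha` and `simulate_tstar` are hypothetical: `simulate_tstar` stands for "simulate a trajectory under this α and compute T* for its increments", and a uniform prior on [-1, 0] is assumed:

```python
import numpy as np

def abc_posterior_alpha(t_star_obs, simulate_tstar, n_sims=2000,
                        keep_quantile=0.01, rng=None):
    """Rejection-ABC sketch for the memory parameter alpha.

    Draws alpha from a uniform prior on [-1, 0], simulates the T*
    statistic for each draw, and keeps the draws whose simulated T*
    is closest to the observed one (first percentile of distances)."""
    rng = np.random.default_rng(rng)
    alphas = rng.uniform(-1.0, 0.0, size=n_sims)           # prior draws
    dist = np.array([abs(simulate_tstar(a, rng) - t_star_obs)
                     for a in alphas])                     # |T*_sim - T*_obs|
    eps = np.quantile(dist, keep_quantile)                 # adaptive threshold
    return alphas[dist <= eps]                             # approx. posterior sample
```

Choosing the threshold as a quantile of the simulated distances, rather than a fixed ε, keeps the acceptance rate constant regardless of the scale of the statistic.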
Let us consider now the more general model for the increments, for µ ∈ R and τ > 0, with τ representing the precision of the fluctuations. Since B_t is independent of γ_t for all t > 0, and following the theory associated with the Gamma-modulated process [4], it is possible to assume a hierarchical representation.
The whole procedure is then to use the ABC algorithm for α and the MCMC scheme for (µ, τ) to perform the Bayesian inference of our proposal.
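The paper's exact full conditionals come from the hierarchical representation in [4]; as an illustration of the MCMC half, here is a generic Gibbs sketch for (µ, τ) when the increments are conditionally Gaussian with known variance weights v_t. The function name, the weight vector v and the default hyperparameters are our assumptions; the hyperparameter names follow the prior specification b_0, B_0, a_0, d_0 used in the text:

```python
import numpy as np

def gibbs_mu_tau(y, v, n_iter=1000, b0=0.0, B0=100.0,
                 a0=0.01, d0=0.01, rng=None):
    """Gibbs sketch for (mu, tau) when y_t | mu, tau ~ N(mu, v_t / tau)
    with known variance weights v_t, under the conjugate priors
    mu ~ N(b0, B0) and tau ~ Gamma(a0, d0)."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(v, dtype=float)      # precision weights
    n = len(y)
    mu, tau = y.mean(), 1.0
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        # mu | tau, y: normal with precision 1/B0 + tau * sum(w).
        prec = 1.0 / B0 + tau * w.sum()
        mean = (b0 / B0 + tau * (w * y).sum()) / prec
        mu = rng.normal(mean, prec ** -0.5)
        # tau | mu, y: Gamma(a0 + n/2, rate d0 + 0.5 * sum(w * (y - mu)**2)).
        rate = d0 + 0.5 * (w * (y - mu) ** 2).sum()
        tau = rng.gamma(a0 + n / 2.0, 1.0 / rate)
        draws[i] = mu, tau
    return draws
```

In the blended scheme, these conjugate updates would run inside each iteration, with the weights supplied by the simulated Gamma path accepted in the ABC step.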

Posterior Evidence for Sharp Hypotheses
As a by-product of the previous algorithm, we are able to compute approximately the so-called e-value, an evidence measure defined in [8] for precise hypothesis testing, which we describe briefly below.
Let us consider a hypothesis of interest H : α ∈ Ω_0, and define the tangential set T_0 to Ω_0 as the set:

T_0 = {θ : π(θ | y) > sup_{θ_0 ∈ Ω_0} π(θ_0 | y)}.

In other words, the tangential set to Ω_0 contains all points "more probable" than those in Ω_0, according to the posterior law.
The evidence measure e-value for the hypothesis H is defined as:

ev(H) = 1 - P(θ ∈ T_0 | y).

Therefore, if the tangential set has high posterior probability, the evidence in favor of H is small; if it has low posterior probability, the evidence against H is small. For instance, suppose that we are considering the sharp hypothesis H : α = α_0. If the subset α = α_0 has high density (that is, it lies near the mode of π(µ, τ, α | y)), then the e-value will be large, giving high evidence for that sharp hypothesis. Some interesting hypotheses are the precise ones defining the Brownian motion case, when α = 0, and the G-M process, when α = -0.5.
In this work, we approximate the integral in the e-value empirically, using the posterior sample obtained by the previous algorithms. As usual in the Bayesian paradigm, the predictive distribution for the next steps can be computed using this ABC-MCMC sample and the model Equation (2).
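Given a posterior sample for α, this empirical approximation can be sketched with a kernel density estimate. The helper name `e_value` and the use of SciPy's `gaussian_kde` are our choices; the paper does not specify which density estimator it employs:

```python
import numpy as np
from scipy.stats import gaussian_kde

def e_value(posterior_sample, alpha0):
    """Empirical e-value for the sharp hypothesis alpha = alpha0.

    Estimates the posterior density with a Gaussian KDE; the tangential
    set is represented by the sample points whose density exceeds the
    density at alpha0, and the e-value is one minus its posterior mass."""
    kde = gaussian_kde(posterior_sample)
    dens = kde(posterior_sample)          # density at each sample point
    d0 = kde([alpha0])[0]                 # density over the sharp hypothesis
    return 1.0 - np.mean(dens > d0)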

Numerical Results
In this section, we present the main results after implementing our algorithms in the R software [18], using simulated and real data, showing the performance of our proposal and its use in practice.

Simulated Results
To illustrate the performance of our procedure, we simulated 500 replicates of length n = 1000, for α on a grid over [-1, 0] with step 0.1, with µ = 0 and τ = 1 fixed. Figure 4 shows the sampling distribution of the posterior mean for α. Observe that the estimates we obtained are fairly precise for all values in the interval [-1, 0].
In order to confirm that precision, we obtained the e-values associated with the hypothesis α_0 = -0.5, when the nominal value for α varies in [-1, 0], as shown in Figure 5a. The boxplots show the sampling distribution of these e-values from the 500 replicates for each α ∈ {-1, -0.9, ..., -0.1, 0}. Observe that the obtained e-values are coherent with the previous estimates, giving high evidence to α_0 = -0.5 when the nominal value is close to it and low evidence otherwise. In the case α_0 = 0, shown in Figure 5b, even though the e-value is not as high for α = 0 as in the previous case, it does allow for good discrimination in favor of the null hypothesis.
In the general case, when µ and τ are unknown, we applied Algorithm 2, reporting the following results for the nominal values µ = 10 and τ = 1, as illustrated in Figure 6.The numerical results were very similar for other values in terms of precision and computational time.
It is worth noting the symmetry of the sampling distribution of the posterior mean for µ, as well as the asymmetry of the sampling distribution of the posterior mean for τ, as anticipated by Equation (3). Furthermore, the estimates are consistent around the nominal value. The results suggest that, as α → -1, the data become more informative for the location parameter. Inversely, for the precision parameter τ, the estimates seem to be less accurate as α goes to -1. Both features are related to the behavior of the trajectories: as α decreases to -1, the increments stabilize faster around zero than for α closer to zero, and consequently, we obtain more precise estimates for µ. This very behavior explains the posterior for τ: the fast stabilization of the increments leads to underestimating the variance, that is, to overestimating the precision. The same results were observed for other nominal values of µ and τ.

Earthquake Acceleration Data
Our proposal is illustrated by extending the ideas in [4]. We analyze sequential data obtained from an accelerometer that recorded the big earthquake in Southern Chile (27 February 2010), with epicenter in Cobquecura, approximately 335 km southwest of Santiago, which reached 8.8 on the Richter scale. The dataset was obtained from the Hydrographic and Oceanographic Service of the Chilean Navy. The time series was recorded at a rate of 50 observations per second, with n = 1653, as shown in Figure 7a.
A brief exploratory analysis of the data, summarized in Figure 7b-d, led us to believe that the proposed model can be well fitted according to the properties described in Section 2.
The marginal posterior distributions are shown in Figure 8, with posterior means for the parameters equal to α̂ = -0.975 (±0.021), μ̂ = -0.0007 (±0.007) and τ̂ = 106,305 (±3,800). Observe, in the first scatterplot, the sharp negative posterior correlation between α and τ, as expected from Equation (3). Furthermore, the e-values for the hypotheses α_0 = -0.5 and α_0 = 0 are clearly zero, as confirmed by both graphs.

Figure 8. Marginal posterior distributions for the parameters in the acceleration earthquake series.

Final Remarks
The model proposed here seems to be suitable for the phenomena under study.In particular, interesting properties, including long memory and trajectory behavior, are discussed, on which our inference methodology is grounded.
For the ABC algorithm, we at first considered a minimum entropy criterion for selecting the approximate posterior sample, because that choice had performed well in estimating the long memory parameter for stationary non-Gaussian processes such as, for instance, binary and Rosenblatt processes [7]. However, given the trajectory behavior of the G-M process, the most informative statistic we found is the m-block variance statistic T*.
Our results show, firstly, a clear and easy way of implementing the ABC-MCMC algorithms in standard software. For the real data (n = 1653), the computational cost was moderate: a posterior sample of size 500 from the ABC-MCMC proposal was obtained in about one hour.
In our simulations, we obtained very high precision in the estimates given by this procedure. In addition, the estimates for the precision parameter, τ, behave sensibly under scale changes: for the rescaled data z = c × y, we estimated τ as τ* ≈ τ/c^2.
The chosen parameterization also allows for α ∈ (0, 1]. However, we did not study that case, since the process diverges as time increases, making the inference procedure for α harder. We recommend treating this problem as a separate case.
Finally, we believe that this model has wider applications, mainly for its parsimony and the straightforward interpretation of its parameters. For diagnostic purposes, for example, the predictive series could be used to detect increasing foreshocks before the arrival of a new earthquake.
Proposition 4. The one-dimensional densities of T_t, ∆T_t = T_t - T_s and (∆T_t, T_s), for fixed 1 < s < t, are given in terms of Γ_t, the cdf of γ_t, and φ(x; µ, σ^2), the univariate normal density at x with mean µ and variance σ^2, for i = 1, ..., n. We will assume that (µ, σ^2) is independent of {B_t}_{t≥0}. Under this framework, the conditional setting is a Gaussian regression with Σ = I_n, X = 1_n the n-dimensional vector of ones, and µ = β.
Considering the prior specification where B_0 = τ_0^2 and b_0, a_0, d_0 and τ_0^2 are known, it is then straightforward to obtain the complete conditional distributions, from which we sample to approximate the posterior distributions.
It is straightforward to prove that, if we integrate out W, we obtain an n-dimensional t distribution, t_n(µ, Σ, ν).

Figure 3. Realizations of the increments of the generalized Gamma-modulated process for different values of α and the posterior density obtained by the approximate Bayesian computation (ABC) algorithm for α.

Figure 7. (a) Acceleration earthquake series, X_t (n = 1653, at 50 observations per second), and exploratory graphics: (b) density of the raw data; (c) sample variance of the increment process, Var(∆(X_t)), in log-scale; (d) sample auto-covariance, M(n), in log-scale.