Least-Squares Estimators of Drift Parameter for Discretely Observed Fractional Ornstein–Uhlenbeck Processes

Abstract: We introduce three new estimators of the drift parameter of a fractional Ornstein–Uhlenbeck process. These estimators are based on modifications of the least-squares procedure, utilizing the explicit formula for the process and the covariance structure of fractional Brownian motion. We demonstrate their advantageous properties in the setting of discrete-time observations with fixed mesh size, where they outperform the existing estimators. Numerical experiments based on Monte Carlo simulations are conducted to confirm and illustrate the theoretical findings. The new estimation techniques can improve calibration of models in the form of linear stochastic differential equations driven by fractional Brownian motion, which are used in diverse fields such as biology, neuroscience, finance and many others.


Introduction
Stochastic models with fractional Brownian motion (fBm) as the noise source have recently attained increasing popularity. This is because fBm is a continuous Gaussian process whose increments are positively correlated if the Hurst parameter satisfies H > 1/2 and negatively correlated if H < 1/2. If H = 1/2, fBm coincides with classical Brownian motion and its increments are independent. The ability of fBm to incorporate memory into the noise process makes it possible to build more realistic models in such diverse fields as biology, neuroscience, hydrology, climatology, finance and many others. The interested reader may consult the monographs [1,2], the more recent paper [3], and the references therein for more information.
Let {B_t^(H)}_{t∈[0,∞)} be a fractional Brownian motion with Hurst parameter H defined on an appropriate probability space (Ω, A, P). The fractional Ornstein–Uhlenbeck process (fOU) is the unique solution to the linear stochastic differential equation

dX_t = −λ X_t dt + σ dB_t^(H),  X_0 = x_0,  t ≥ 0,   (1)

where λ > 0 is the drift parameter (we consider the ergodic case only) and σ > 0 is the noise intensity (or volatility). Recall that the solution to Equation (1) can be expressed by the exact analytical formula

X_t = x_0 e^(−λt) + σ ∫_0^t e^(−λ(t−s)) dB_s^(H).   (2)

A single realization of the random process {X_t(ω)}_{t∈[0,∞)} for a particular ω ∈ Ω models a single real-valued trajectory, part of which is observed. Two examples of such trajectories are given in Figure 1. We assume H > 1/2 throughout this paper, so that the fOU exhibits long-range dependence.
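The exact formula (2) is also convenient for simulation, since it yields an exact one-step recursion between consecutive observations. The following Python sketch (our illustration, not the paper's R/YUIMA code) simulates a trajectory by that recursion; for brevity it takes H = 1/2, where the stochastic integral over each step is Gaussian with an explicitly known variance.

```python
import numpy as np
from math import exp, sqrt

def simulate_ou_exact(lam, sigma, x0, h, N, seed=0):
    """Simulate X_0, X_h, ..., X_{Nh} by the exact recursion
    X_{(n+1)h} = e^{-lam*h} X_{nh} + xi_n.  Here H = 1/2, so the
    integral term xi_n is N(0, sigma^2 (1 - e^{-2 lam h}) / (2 lam))."""
    rng = np.random.default_rng(seed)
    a = exp(-lam * h)
    s = sigma * sqrt((1.0 - exp(-2.0 * lam * h)) / (2.0 * lam))
    x = np.empty(N + 1)
    x[0] = x0
    for n in range(N):
        x[n + 1] = a * x[n] + s * rng.standard_normal()
    return x

# long trajectory: sample variance approaches the stationary value sigma^2/(2 lam)
traj = simulate_ou_exact(lam=1.0, sigma=1.0, x0=0.0, h=0.1, N=50000)
```

For H ≠ 1/2 the Gaussian step above would be replaced by correlated fBm-integral increments (e.g., sampled via the Cholesky factor of their covariance matrix); the recursion itself is unchanged.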
For an example of application, see the neuronal model based on the fOU described in the recent work [4]. The aim of this paper is to study the problem of estimating the drift parameter λ based on an observation of a single trajectory of a fOU at discrete time instants t = 0, h, . . . , Nh with fixed mesh size h > 0 and increasing time horizon T = Nh → ∞ (long-span asymptotics). Estimating the drift parameter of a fOU observed in continuous time has been considered in [5,6], where a least-squares estimator (LSE) and an ergodic-type estimator are studied. These have advantageous asymptotic properties: they are strongly consistent and, if H ≤ 3/4, also asymptotically normal. The ergodic-type estimator is easy to implement, but it has greater asymptotic variance than the LSE, requires a priori knowledge of H and σ, and does not provide acceptable results for non-stationary processes with limited time horizon.
A straightforward discretization of the least-squares estimator for a fOU has been introduced and studied in [7] for H > 1/2 and in [8] for H < 1/2. For the precise formula, see (8). This estimator is consistent provided that both the time horizon T = Nh → ∞ and the mesh size h → 0 (mixed in-fill and long-span asymptotics). However, it is not consistent when h is fixed and T → ∞. This has led us to construct and study LSE-type estimators that converge in this long-span setting.
An easy modification of the ergodic-type estimator to the discrete-time setting with fixed time step was given in [9] (see (10) for the precise formula), and its strong consistency (assuming H ≥ 1/2) and asymptotic normality (for 1/2 ≤ H < 3/4) as N → ∞ were proved, although with a possibly incorrect technique (as pointed out in [10]). Correct proofs of asymptotic normality for 0 < H ≤ 3/4 and strong consistency for 0 < H < 1 of this estimator were provided (in a more general setup) in [10]. Note that the use of this discrete ergodic estimator requires knowledge of the parameter σ (in contrast to the estimators of least-squares type introduced below). Other works related to estimating the drift parameter for discretely observed fOU include [11–13], but this list is by no means complete.
This work contributes to the problem of estimating the drift parameter of a fOU by introducing three new LSE-type estimators: the least-squares estimator from exact solution, the asymptotic least-squares estimator and the conditional least-squares estimator. These estimators are tailored to discrete-time observations with fixed time step. We provide proofs of their asymptotic properties and identify situations in which the new estimators perform better than the already known ones. In particular, we eliminate the discretization error (the LSE from exact solution), construct strongly consistent estimators in the long-span regime without assuming an in-fill condition (the asymptotic LSE and the conditional LSE), and eliminate the bias in the least-squares procedure caused by autocorrelation of the noise term (the conditional LSE). The conditional LSE in particular demonstrates outstanding performance in all studied scenarios. This suggests that the newly introduced (to the best of our knowledge) concept of conditioning in the least-squares procedure, applied to models with fractional noise, provides a powerful framework for parameter estimation in this type of model. The proof of its strong consistency, presented within this paper, is rather non-trivial and may serve as a starting point for the investigation of similar estimators in possibly different settings. A certain disadvantage of the conditional LSE is its complicated implementation (involving an optimization procedure), in contrast to the other studied estimators.
Let us explain the strength of the conditional least-squares estimator in more detail. Comparison of the two trajectories in Figure 1 demonstrates the effect of different values of λ on trajectories of the fOU. In particular, λ affects the speed of exponential decay in the initial non-stationary phase and the variability in the stationary phase. As we illustrate below, the discretized least-squares estimator, cf. (8), utilizes information about λ from the exponential decay in the initial phase, but is not capable of making use of the information contained in the variability of the stationary phase. As a consequence, it is not consistent (in the long-span setting). On the contrary, the ergodic-type estimator, cf. (10), is derived from the variance of the stationary distribution of the process. It works well for stationary processes (and is consistent), but leaves idle (and, even worse, is corrupted by) the observations of the process in its initial non-stationary phase. As a result, neither of these estimators can efficiently estimate the drift from long trajectories with far-from-stationary initial values. This gap is best filled by the conditional least-squares estimator, cf. (25), which effectively utilizes both the information stored in the non-stationary phase and that stored in the stationary phase of the observed process. This unique property is demonstrated in Results and Discussion, where the conditional LSE (denoted by λ̂_5) dominates the other estimators. For the three newly introduced estimators, the value of the Hurst parameter H is considered to be known a priori, whereas knowledge of the volatility parameter σ is not required, which is an advantage of these methods. If H is not known, it can be estimated in advance by one of many methods, such as methods based on quadratic variations (cf. [14]), on sample quantiles or trimmed means (cf. [15]), or on a wavelet transform (cf. [16]), to name just a few.
Other useful works in this direction include the simultaneous estimation of σ and H using powers of second-order variations (see [17], Chapter 3.3). The estimates of H (obtained independently of λ) can subsequently be used in the LSE-type estimators of λ introduced below, in a way similar to [18].
In Section 2, some elements of stochastic calculus with respect to fBm are recalled, the stationary fOU is introduced and precise formulas for the two existing drift estimators λ̂_1 and λ̂_2 are provided. Section 3 is devoted to the construction of a new LSE-type estimator (λ̂_3) based on the exact formula for the fOU. A certain modification of λ̂_3 (denoted λ̂_4), which ensures long-span consistency, is introduced in Section 4. In Section 5, we rewrite the linear model using conditional expectations to overcome the bias in the LSE caused by autocorrelation of the noise. The least-squares method, applied to the conditional model with explicit formulas for the conditional expectations, results in the conditional least-squares estimator (λ̂_5). We prove strong consistency of this estimator. The actual performance of the newly introduced estimators λ̂_3, λ̂_4 and λ̂_5, as well as their comparison with the already known λ̂_1 and λ̂_2, is studied by Monte Carlo simulations in various scenarios and reported in Section 6. The simulated trajectories have been obtained in the software R with the YUIMA package (see [19]). Section 7 summarizes the key points of the article and provides possible future extensions.

Preliminaries
For the reader's convenience, we briefly review basic concepts from the theory of stochastic models with fractional noise in this section, including the definition of fBm, the Wiener integral of deterministic functions w.r.t. fBm and the stationary fOU. This exposition follows [2,20]. For further reading, see also the monograph [1]. At the end of this section, we also recall the formulas for the discretized LSE and the discrete ergodic estimator.
Fractional Brownian motion with Hurst parameter H ∈ (0, 1) is a centered (zero-mean) continuous Gaussian process {B_t^(H)} with B_0^(H) = 0 and covariance function

E[B_t^(H) B_s^(H)] = (1/2)(|t|^(2H) + |s|^(2H) − |t − s|^(2H)).

Note that for the purpose of construction of the stationary fOU, we need a two-sided fBm {B_t^(H)}_{t∈R} with t ranging over the whole R; in this case the same covariance formula holds for all t, s ∈ R. As a consequence, the increments of fBm are negatively correlated for H < 1/2, independent for H = 1/2 and positively correlated for H > 1/2.
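The sign of the increment correlation can be checked directly from the covariance function by bilinearity. A small self-contained Python check (our illustration, not part of the paper):

```python
def fbm_cov(t, s, H):
    # Covariance of fBm: E[B_t B_s] = (|t|^{2H} + |s|^{2H} - |t-s|^{2H}) / 2
    return 0.5 * (abs(t) ** (2 * H) + abs(s) ** (2 * H) - abs(t - s) ** (2 * H))

def increment_cov(t, s, h, H):
    # Cov(B_{t+h} - B_t, B_{s+h} - B_s), expanded by bilinearity of covariance
    return (fbm_cov(t + h, s + h, H) - fbm_cov(t + h, s, H)
            - fbm_cov(t, s + h, H) + fbm_cov(t, s, H))

# disjoint unit increments over [1, 2] and [3, 4]
pos = increment_cov(1, 3, 1, H=0.7)   # positive for H > 1/2
neg = increment_cov(1, 3, 1, H=0.3)   # negative for H < 1/2
zero = increment_cov(1, 3, 1, H=0.5)  # zero for H = 1/2 (independent increments)
```

This reproduces exactly the trichotomy stated above for disjoint increments.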
Consider a two-sided fBm with H > 1/2 and define the Wiener integral of a deterministic step function with respect to the fBm in the natural way, as the corresponding linear combination of increments of the fBm. This definition yields the following isometry for any pair of deterministic step functions f and g:

E[ ∫_R f(t) dB_t^(H) ∫_R g(s) dB_s^(H) ] = ∫_R ∫_R f(t) g(s) ϕ(t, s) dt ds =: ⟨f, g⟩_H,   (3)

where ϕ(t, s) = H(2H − 1)|t − s|^(2H−2). Using this isometry, we can extend the definition of the Wiener integral w.r.t. fBm to all elements of the space L²_H(R), defined as the completion of the space of deterministic step functions w.r.t. the scalar product ⟨., .⟩_H defined above. As a result, formula (3) holds true for any f, g ∈ L²_H(R); see also [21]. We will frequently use this formula in what follows, mainly to calculate covariances of Wiener integrals.
The stationary solution to Equation (1) is referred to as the stationary fOU; it can be expressed as

Z_t = σ ∫_(−∞)^t e^(−λ(t−s)) dB_s^(H),  t ≥ 0.

Note that the stationary fOU is an ergodic stationary Gaussian process (its autocorrelation function vanishes at infinity).
Consider now a stationary fOU {Z_t}_{t∈[0,∞)} observed at discrete time instants t = 0, h, 2h, . . . . The ergodicity and the formula for the second moment of the stationary fOU (see e.g., [20]) yield the almost sure convergence

(1/N) Σ_(n=1)^N Z_(nh)² → E[Z_0²] = σ² H Γ(2H) λ^(−2H) as N → ∞,   (4)

where the expectation can be calculated using (3) and the change-of-variables formula. The rest of this section is devoted to the two popular estimators of the drift parameter of a fOU observed at discrete time instants described in the Introduction: the discretized LSE and the discrete ergodic estimator. Start with the former. Consider a straightforward discrete approximation of Equation (1):

X_((n+1)h) − X_(nh) ≈ −λ X_(nh) h + σ (B_((n+1)h)^(H) − B_(nh)^(H)).   (7)

Application of the standard least-squares procedure to the linear approximation above provides the discretized LSE studied in [7,8], which takes the form

λ̂_1 = (1 − F_N)/h,   (8)

where h is the mesh size (time step) and

F_N = Σ_(n=0)^(N−1) X_(nh) X_((n+1)h) / Σ_(n=0)^(N−1) X_(nh)²,   (9)

with X_(nh) and X_((n+1)h) being the observations at the adjacent time instants t = nh and t = (n + 1)h, respectively, of the process X_t defined by (1) or (2). Note that having λ̂_1 expressed in terms of F_N simplifies its comparison with the estimators newly constructed in this paper. Recall that for consistency of λ̂_1, mixed in-fill and long-span asymptotics is required due to the approximation error in (7). The discrete ergodic estimator is derived from the asymptotic behavior of the (stationary) fOU. Recall the convergence in (4). Rearranging the terms provides an asymptotic formula for the drift parameter λ in terms of the limit of the second sample moment of the stationary fOU. Substituting the observed fOU X_h, X_2h, . . . , X_Nh for the stationary fOU in this asymptotic formula results in the discrete ergodic estimator

λ̂_2 = ( (1/(N σ² H Γ(2H))) Σ_(n=1)^N X_(nh)² )^(−1/(2H)),   (10)

which was studied in [9,10]. Recall that this estimator is strongly consistent in the long-span regime (no in-fill condition needed); however, it heavily builds upon the asymptotic (stationary) behavior of the process and fails for processes with a non-stationary initial phase (as illustrated by the numerical experiments below).
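Both existing estimators are straightforward to compute from a sample. The sketch below (our illustration, not the paper's code) evaluates F_N, the discretized LSE and the discrete ergodic estimator on a simulated trajectory; to stay self-contained and testable it uses H = 1/2, where exact simulation needs only independent Gaussian steps.

```python
import numpy as np
from math import exp, sqrt, gamma

# --- simulate an OU path (H = 1/2) by the exact one-step recursion ---
lam, sigma, h, N, H = 0.5, 1.0, 0.05, 20000, 0.5
rng = np.random.default_rng(1)
a = exp(-lam * h)
s = sigma * sqrt((1.0 - exp(-2.0 * lam * h)) / (2.0 * lam))
x = np.empty(N + 1)
x[0] = 0.0
for n in range(N):
    x[n + 1] = a * x[n] + s * rng.standard_normal()

# --- F_N from (9), discretized LSE (8), discrete ergodic estimator (10) ---
F_N = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)
lam1 = (1.0 - F_N) / h
lam2 = (np.mean(x[1:] ** 2) / (sigma ** 2 * H * gamma(2 * H))) ** (-1.0 / (2 * H))
```

With a small mesh (h = 0.05) and a stationary start, both estimates land close to the true value λ = 0.5; λ̂_1 carries a small O(h) discretization bias, and λ̂_2 needs σ as an input, consistent with the discussion above.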

Least-Squares Estimator from Exact Solution
Since the estimator λ̂_1 obtained from the naive discretization of (1) provides reasonable approximations only for non-stationary solutions with short time horizon and small time step h > 0 (as seen from the numerical simulations below), we eliminate the discretization error by considering the exact analytical formula for X_t, see (2), and the corresponding exact discrete formula

X_(t+h) = e^(−λh) X_t + ξ_t,   (11)

where ξ_t = σ ∫_t^(t+h) e^(−λ(t+h−s)) dB_s^(H). The least-squares estimator for β = e^(−λh) w.r.t. the linear model (11) is given by β̂ = F_N, cf. (9), and the estimator for λ can be defined as

λ̂_3 = −(1/h) ln F_N.   (12)

Numerical simulations show that λ̂_3 works well for non-stationary solutions and short time horizon (T = 10 in simulations; see the results for H = 0.6 below). On the other hand, estimator λ̂_3 does not provide good results for observations with long time horizon (T = 1000 in simulations), since λ̂_3 is not consistent if N → ∞ and h > 0 is fixed. The reason is that ξ_t and X_t in (11) are correlated. In fact, we can calculate the almost sure limit of λ̂_3 exactly.
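The gain from removing the discretization error can be seen numerically. The hedged sketch below uses H = 1/2 (where ξ_t and X_t are uncorrelated, so λ̂_3 is in fact consistent; see Remark 1) and a deliberately coarse mesh, on which λ̂_1 is visibly biased while λ̂_3 is not. Note that λ̂_3 > λ̂_1 always holds, since −ln F > 1 − F for F ∈ (0, 1).

```python
import numpy as np
from math import exp, sqrt, log

lam, sigma, h, N = 0.5, 1.0, 0.5, 8000   # coarse mesh: discretization bias visible
rng = np.random.default_rng(7)
a = exp(-lam * h)
s = sigma * sqrt((1.0 - exp(-2.0 * lam * h)) / (2.0 * lam))
x = np.empty(N + 1)
x[0] = 10.0   # far-from-stationary start
for n in range(N):
    x[n + 1] = a * x[n] + s * rng.standard_normal()

F_N = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)
lam1 = (1.0 - F_N) / h        # discretized LSE (8): biased down for coarse h
lam3 = -log(F_N) / h          # LSE from exact solution (12): no discretization error
```

Here λ̂_1 concentrates near (1 − e^(−λh))/h ≈ 0.44 rather than λ = 0.5, while λ̂_3 concentrates near λ.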
The limit is provided in Theorem 1. Its proof uses the following simple lemma (see [22]) to show the diminishing effect of the initial condition on the limiting behaviour of sample averages. The lemma is later used in the proof of Lemma 3 as well.
Since the effect of the initial condition vanishes at infinity, the limiting behaviour of the non-stationary solution (X_(nh))_(n=0)^∞ is the same as that of the stationary one. Indeed, the convergence of the first summand to zero follows from the stated bounds together with Lemma 1, and a similar argument guarantees the convergence of the second summand to zero as well. The remaining convergence can be shown correspondingly. As a result, we obtain the almost sure convergence F_N → f(λh) as N → ∞, and the claim follows immediately from the definition of λ̂_3.

Remark 1.
Note that the convergence in (14) holds true also for H = 1/2 (the double integral in f disappears) and for H < 1/2 (utilizing the fact that a relation analogous to (3) is true even for H < 1/2 if the two domains of integration are disjoint). Consequently, the almost sure limit of λ̂_3 derived above remains valid in these cases as well.

Asymptotic Least-Squares Estimator
Our goal in this section is to modify λ̂_3 so that it converges to λ when N → ∞ and h > 0 is fixed. The combination of (12) and (14) yields F_N → f(λh) a.s. Thus, we can define the asymptotic least-squares estimator λ̂_4 by the relation f(h λ̂_4) = F_N. Since f is one-to-one (see below), the explicit formula for λ̂_4 reads

λ̂_4 = (1/h) f^(−1)(F_N).   (16)

The following lemma justifies the invertibility of f in the definition of λ̂_4.
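In practice, evaluating (16) requires numerically inverting the strictly decreasing function f; this is the root-finding step mentioned in the Conclusions. The sketch below shows the inversion scaffolding with a simple bisection; the correction function f from (13) is represented by a placeholder (for H = 1/2 one has f(x) = e^(−x), in which case λ̂_4 reduces to λ̂_3), and the true fractional f would be plugged in instead.

```python
from math import exp

def invert_decreasing(f, y, lo=1e-9, hi=50.0, iters=200):
    """Solve f(x) = y for a strictly decreasing f on (0, inf) by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > y:
            lo = mid     # f still above target: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Placeholder correction function: for H = 1/2, f(x) = e^{-x}.
f = lambda x: exp(-x)
h = 0.5
F_N = f(0.5 * h)                      # pretend the sample gave the exact limit f(lambda*h)
lam4 = invert_decreasing(f, F_N) / h  # recovers lambda = 0.5
```

Any bracketing root-finder (e.g., Brent's method) can replace the bisection; the monotonicity of f established in Lemma 2 is what guarantees a unique root.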

Proof.
Calculate the derivative of f, estimate the resulting terms, and plug this estimate into the formula for f′(x) to see that f′(x) < 0 on (0, ∞), i.e., f is strictly decreasing.

Remark 2. Note that f is monotone also for H = 1/2, but it is not monotone if H < 1/2, which rules out the use of estimator λ̂_4 in this singular case.

Proof.
Recall the definition of λ̂_4 in (16) and the limit in (15). Further recall that f : (0, ∞) → (0, ∞) is differentiable with strictly negative derivative (see (17)). Thus, f^(−1) is also differentiable with strictly negative derivative, and this implies the almost sure convergence λ̂_4 → λ. The strongly consistent estimator λ̂_4 works well for stationary solutions or observations with long time horizon (see Figure 3). Moreover, it does not require explicit knowledge of σ (in contrast to λ̂_2, which is also strongly consistent). On the other hand, it does not provide adequate results for non-stationary solutions and short time horizon, since the correction function f reflects the stationary behavior of the process (see Figure 4).

Conditional Least-Squares Estimator
Non-stationary trajectories with long time horizon contain a lot of information about λ, encoded mainly in two aspects: the speed of decay in the initial non-stationary phase and the variance in the stationary phase (see Figure 1). However, none of the estimators λ̂_1, λ̂_2, λ̂_3 or λ̂_4 can utilize all this information effectively. This motivates us to introduce another estimator. Recall that λ̂_3 fails to be consistent because of the bias in the LSE caused by the correlation between X_t and ξ_t in Equation (11). To eliminate the correlation between the explanatory variable and the noise term in the linear model, we switch to conditional expectations. Start from the following equation, which defines η_t:

X_(t+h) = E_λ[X_(t+h) | X_t] + η_t,   (18)

where λ is the true value of the unknown drift parameter and E_Λ denotes the (conditional) expectation with respect to the measure generated by the fOU {X_t}_{t∈[0,∞)} with drift value Λ and initial condition x_0 (Λ stands for an unknown throughout this section). In other words, E_Λ[X_(t+h)|X_t] means the conditional expectation of X_(t+h) given X_t, where the process X is given by (2) with drift λ = Λ. Hence, E_λ has the same meaning as E in the previous sections.
As a result, we apply the least-squares technique to Equation (18), where λ is to be estimated; i.e., we would like to minimize the sum of squared deviations of the observations X_((n+1)h) from the conditional means c_(nh)(X_(nh), Λ).   (19)

To calculate c_t(X_t, Λ) = E_Λ[X_(t+h)|X_t] explicitly, use (11) and obtain c_t(X_t, Λ) = e^(−Λh) X_t + E_Λ[ξ_t | X_t]. Note that the random vector (ξ_t, X_t) has a two-dimensional normal distribution (dependent on the parameter Λ), and we can use the explicit expression for the conditional expectation in a bivariate Gaussian vector to write

E_Λ[ξ_t | X_t] = (σ_(ξ,X)(Λ)/σ_X²(Λ)) (X_t − E_Λ X_t),   (20)

where σ_(ξ,X)(Λ) denotes the covariance of ξ_t and X_t and σ_X²(Λ) the variance of X_t under the drift value Λ. With respect to the exact formula for X_t given by (2) and relation (3), σ_(ξ,X)(Λ) can be computed explicitly, where we use the change-of-variables formula in the last step; σ_X²(Λ) is obtained analogously. Using these expressions for σ_(ξ,X)(Λ) and σ_X²(Λ) in (20), we obtain the explicit form (21) of the conditional expectation. Combining formula (21) with (19), we can reformulate Equation (18) for the observed process X as the following model (linear in X_t, but non-linear in Λ), with coefficients A_(Λt,Λh) and B_(Λt,Λh) defined in (22) and (23):

X_(t+h) = c_t(X_t, Λ)|_(Λ=λ) + η_t.   (24)

Now we aim to apply the least-squares method to the reformulated model to obtain the conditional least-squares estimator λ̂_5. To ensure the existence of a global minimum, we choose a closed interval [Λ_L, Λ_U] ⊂ (0, ∞) and define λ̂_5 as the minimizer of the sum-of-squares function on this interval:

λ̂_5 = argmin_(Λ∈[Λ_L,Λ_U]) S_N(Λ),   (25)

with the criterion function S_N defined as

S_N(Λ) = Σ_(n=0)^(N−1) (X_((n+1)h) − c_(nh)(X_(nh), Λ))²,   (26)

where we used (24) with t = nh.
Note that S_N(Λ) is continuous in Λ, and therefore a minimum over the compact interval [Λ_L, Λ_U] exists. Although model (24) is linear in X_t, the coefficients A and B depend on t, which complicates the numerical minimization of S_N.
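Because the coefficients depend on t, λ̂_5 is obtained by numerically minimizing S_N over [Λ_L, Λ_U]. The following sketch shows only the optimization scaffolding: the conditional-mean function below is a placeholder that is exact for H = 1/2 (where E_Λ[X_(t+h)|X_t] = e^(−Λh) X_t and no correction term is needed); the full fractional case would replace it by the expression built from (22) and (23).

```python
import numpy as np
from math import exp, sqrt

# --- simulated data (H = 1/2 so the example is self-contained) ---
lam, sigma, h, N = 0.5, 1.0, 0.5, 4000
rng = np.random.default_rng(3)
a = exp(-lam * h)
s = sigma * sqrt((1.0 - exp(-2.0 * lam * h)) / (2.0 * lam))
x = np.empty(N + 1)
x[0] = 20.0   # far-from-stationary start
for n in range(N):
    x[n + 1] = a * x[n] + s * rng.standard_normal()

def cond_mean(xn, t, Lam):
    # Placeholder for c_t(x, Lam): exact for H = 1/2; for H > 1/2 the
    # time-dependent coefficients from (22)-(23) would enter here.
    return np.exp(-Lam * h) * xn

def S(Lam):
    times = h * np.arange(N)
    return float(np.sum((x[1:] - cond_mean(x[:-1], times, Lam)) ** 2))

# crude minimization of S_N over [Lam_L, Lam_U] by a grid search
grid = np.linspace(0.05, 3.0, 1200)
lam5 = float(grid[np.argmin([S(L) for L in grid])])
```

In practice the grid search would be replaced by a proper bounded scalar optimizer (e.g., golden-section search); the compactness of [Λ_L, Λ_U] and the continuity of S_N guarantee that a minimizer exists, as noted above.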
For the stationary solution, the conditional mean simplifies to c_t(Z_t, Λ) = f(Λh) Z_t, where f is defined in (13) and Λ > 0 is arbitrary. Since the coefficient f(Λh) does not depend on t, it is possible to calculate the LSE for f(λh) explicitly and to construct the estimator of λ by applying f^(−1). Such an estimator coincides with λ̂_4 introduced in the previous section. Thus λ̂_4 can be understood as the special case of the conditional LSE for the stationary solution.
In order to prove strong consistency of the estimator λ̂_5, we need to verify uniform convergence of (1/N) S_N(Λ) to a function S_∞(Λ) specified below. Let us start with the following proposition on the uniform convergence of A_(τ,x) and B_(τ,x). This proposition will help us in the sequel to investigate the limiting behaviour of the two terms A_(Λnh,Λh) and B_(Λnh,Λh) in the sum-of-squares function S_N.

Proposition 1. Consider A_(τ,x) and B_(τ,x) defined by (22) and (23), and f defined by (13). Then, uniformly in x on compact subsets of (0, ∞),

lim_(τ→∞) A_(τ,x) = f(x)   (27)

and

lim_(τ→∞) B_(τ,x) = 0.   (28)

Proof. In order to simplify the notation, denote the integral expressions appearing in A_(τ,x) and B_(τ,x) by short symbols and begin with (27); the convergence in (28) can be shown analogously.

Choose any 0 < Λ_L < Λ_U < ∞ and recall that h > 0 is fixed. The uniform convergences in (27) and (28) imply the following convergences, uniformly in Λ ∈ [Λ_L, Λ_U]:

lim_(n→∞) A_(Λnh,Λh) = f(Λh)   (29)

and

lim_(n→∞) B_(Λnh,Λh) = 0,   (30)

respectively. Indeed, set x_L = Λ_L h and x_U = Λ_U h and fix any ε > 0. There is τ_0 > 0 such that |A_(τ,x) − f(x)| < ε for all x ∈ [x_L, x_U] and any τ > τ_0, which proves (29). The convergence in (30) can be shown analogously. These uniform convergences will be helpful in the proof of the following lemma, which provides uniform convergence of (1/N) S_N(Λ) to a limiting function S_∞(Λ). This uniform convergence is the key ingredient for the convergence of the minimizers λ̂_5.
Lemma 3. Let f be defined by (13) and let S_N(Λ) be defined by (26). Define

S_∞(Λ) = σ² H Γ(2H) λ^(−2H) (1 − 2 f(λh) f(Λh) + f(Λh)²).   (31)

Then, with probability one,

sup_(Λ∈[Λ_L,Λ_U]) |(1/N) S_N(Λ) − S_∞(Λ)| → 0 as N → ∞.   (32)

Proof. First consider the stationary solution {Z_t}_{t∈[0,∞)} to (1) corresponding to the drift value λ.
Comparison of (6) with (13) yields the identity relating the autocovariance of the stationary fOU to f. It enables us to write, for any Λ > 0, a decomposition of (1/N) S_N(Λ) into several terms, and, consequently, to treat each term separately. Recall that {Z_t}_{t∈[0,∞)} is ergodic and that |Z_t − X_t| vanishes at infinity. Using Lemma 1 in the same way as in the proof of Theorem 1 implies the almost sure convergence of the first term, uniformly in Λ. For the second term, an application of Lemma 1, the convergence in (29) and the continuity of f ensure the convergence with probability one of both summands to zero as N → ∞.
The uniform convergence of the third term can be shown analogously, using a bound that follows directly from (29) and the continuity of f.
The last term in (33) can be treated similarly, and Lemma 1 concludes the proof.

The previous considerations lead to the convergence of λ̂_5, the minimizers of (1/N) S_N(Λ), to the minimizer of S_∞(Λ). The next lemma ensures that this minimizer coincides with the true drift value λ.

Lemma 4. S_∞ defined by (31) is continuous on (0, ∞) and λ is the unique minimizer of S_∞, i.e., S_∞(λ) < S_∞(Λ) for every Λ ∈ (0, ∞) with Λ ≠ λ.

Proof. By definition, S_∞(Λ) can be rewritten as σ² H Γ(2H) λ^(−2H) ((f(Λh) − f(λh))² + 1 − f(λh)²), and the claim follows immediately, because f is one-to-one (it is strictly decreasing). Continuity of S_∞ is a direct consequence of the continuity of f.

Now we are in a position to prove the strong consistency of λ̂_5.

Theorem 3. Consider bounds 0 < Λ_L < Λ_U < ∞ that cover the true drift value λ of the observed solution {X_t}_{t∈[0,∞)} to Equation (1), i.e., λ ∈ (Λ_L, Λ_U). Then λ̂_5 defined in (25) is strongly consistent, i.e., λ̂_5 → λ almost surely as N → ∞.

Proof. The proof follows a standard argument from nonlinear regression and utilizes Lemmas 3 and 4. Choose ε > 0 small enough that [Λ_L, Λ_U] \ (λ − ε, λ + ε) ≠ ∅ and set

δ = min { S_∞(Λ) − S_∞(λ) : Λ ∈ [Λ_L, Λ_U], |Λ − λ| ≥ ε } > 0.

Consider a set of full measure on which the uniform convergence (32) holds and take N_0 > 0 such that

sup_(Λ∈[Λ_L,Λ_U]) |(1/N) S_N(Λ) − S_∞(Λ)| < δ/2 for all N ≥ N_0.

Since λ̂_5 minimizes S_N, for all N ≥ N_0 we have S_∞(λ̂_5) < (1/N) S_N(λ̂_5) + δ/2 ≤ (1/N) S_N(λ) + δ/2 < S_∞(λ) + δ, and hence |λ̂_5 − λ| < ε. Since ε was arbitrary (if small enough), we obtain the convergence on a set of full measure.

Results and Discussion
In Table 1 we present a comparison of the root mean square errors (RMSE) of all considered estimators for λ = 0.5 and several combinations of x_0, T and H. Estimators λ̂_1 and λ̂_3 demonstrate good performance in scenarios with far-from-zero initial condition (x_0 = 100) and short time horizon (T = 10). This illustrates the fact that these estimators mainly reflect the speed of convergence to zero of the observed process in its initial phase. Increasing the time horizon to T = 1000 adds a stationary phase to the observed trajectories, which distorts the estimators λ̂_1 and λ̂_3. Estimators λ̂_2 and λ̂_4 perform well in settings with stationary-like initial condition (x_0 = 0) and long time horizon (T = 1000), because they are constructed from the stationary behavior of the process. Taking a far-from-zero initial condition ruins these estimators unless the trajectory is very long.
The conditional LSE, λ̂_5, shows reasonable performance in all studied scenarios, and it significantly outperforms the other estimators in the scenario with far-from-stationary initial condition (x_0 = 100) and long time horizon (T = 1000). This results from the unique ability of this estimator to reflect and utilize information about the drift from both the non-stationary (decreasing) phase and the stationary (oscillating) phase. This is also illustrated in Figure 5. On the other hand, evaluation of λ̂_5 is the most numerically demanding among the studied estimators.
If x_0 = 0 and T = 10, λ̂_5 shows greater RMSE than λ̂_1 and λ̂_3 in Table 1, due to λ = 1/2 being relatively close to zero. As a consequence, λ̂_1 and λ̂_3 have smaller variance (although greater bias) than λ̂_5 (see Figure 6). To illustrate this effect, we have calculated the RMSE for simulations in the same scenario but with λ = 3/2 (see Table 2); λ̂_5 provides smaller RMSE than the other estimators in this setting.

Table 2. Root mean square errors of the studied estimators calculated using 100 numerical simulations with λ = 3/2.

Conclusions
Three new estimators were defined and studied:

1. The least-squares estimator from exact solution (λ̂_3), which improves the popular discretized LSE (λ̂_1) by eliminating the discretization error. It is easy to implement, since it can be calculated by a closed formula. However, it fails to be strongly consistent in the long-span regime.

2. The asymptotic least-squares estimator (λ̂_4), which is a modification of λ̂_3 with respect to its asymptotic behavior. As a result, λ̂_4 is strongly consistent in the long-span regime and behaves similarly to the well-established discrete ergodic estimator (λ̂_2). The advantage of λ̂_4 is that it does not require a priori knowledge of the volatility σ. On the other hand, its implementation includes a root-finding numerical procedure.

3. The conditional least-squares estimator (λ̂_5), which eliminates the bias in the least-squares procedure by considering the conditional expectation of the response as the explanatory variable. The possibility to express the conditional expectation explicitly makes this approach feasible. This conditioning idea (which is new in the context of models with fractional noise, to the best of our knowledge) provides an exceptionally reliable estimator, which outperforms all the other studied estimators. We proved the strong consistency (in the long-span regime) of this estimator. The implementation comprises solving an optimization problem.
These new estimation procedures can help practitioners and scientists from various fields to improve the calibration of their models based on available data with autocorrelated noise (typically observed or measured at discrete time instants) and, consequently, to obtain more reliable conclusions from the calibrated models.
An interesting future extension would be to explore the potential of the promising idea of conditioning within the least-squares procedure in more general models and settings (including d-dimensional fOU, fOU with H < 1/2, non-linear drift, multiplicative noise, etc.).