Institutional Knowledge at Singapore Management University

Maximum likelihood estimation for the fractional Vasicek model

This paper is concerned with the problem of estimating the drift parameters in the fractional Vasicek model from a continuous record of observations. Based on the Girsanov theorem for the fractional Brownian motion, the maximum likelihood (ML) method is used. The asymptotic theory for the ML estimates (MLE) is established in the stationary case, the explosive case, and the null recurrent case for the entire range of the Hurst parameter, providing a complete treatment of asymptotic analysis. It is shown that changing the sign of the persistence parameter changes the asymptotic theory for the MLE, including the rate of convergence and the limiting distribution. It is also found that the asymptotic theory depends on the value of the Hurst parameter.


Introduction
Since Vasicek (1977) introduced a model to describe the evolution of short-term interest rates, the so-called Vasicek model has enjoyed a wide range of applications. Jamshidian (1989) used it to price bond options. Scott (1987) used it to model the evolution of the instantaneous volatility of stock prices and to price European call options.
* Katsuto Tanaka, Faculty of Economics, Gakushuin University, Japan. Email: katsuto.tanaka@gakushuin.ac.jp. Weilin Xiao, School of Management, Zhejiang University, Hangzhou, 310058, China. Email: wlxiao@zju.edu

Many extensions have been made to generalize the specification of Vasicek. For example, motivated by the phenomenon of long-range dependence found in hydrological data, the standard Brownian motion in the Vasicek model can be replaced by a fractional Brownian motion (fBm). This gives the fractional Vasicek model (fVm)

$$dX_t = \kappa(\mu - X_t)\,dt + \sigma\,dB^H_t, \qquad (1.1)$$

where σ is a positive constant, µ, κ ∈ R, and $B^H_t$ is an fBm with H ∈ (0, 1) being the Hurst parameter. An fBm $B^H_t$ is a zero-mean Gaussian process, defined on a complete probability space (Ω, F, P), with the covariance function

$$E\left[B^H_t B^H_s\right] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right). \qquad (1.2)$$

The process $B^H_t$ is self-similar in the sense that $B^H_{at} \overset{d}{=} a^H B^H_t$ for all $a > 0$. It becomes the standard Brownian motion $W_t$ when H = 1/2 and can be represented as a stochastic integral with respect to the standard Brownian motion. Its increments are negatively correlated when 0 < H < 1/2. When 1/2 < H < 1, it has long-range dependence in the sense that

$$\sum_{n=1}^{\infty} E\left[B^H_1\left(B^H_{n+1} - B^H_n\right)\right] = \infty.$$

In this case, positive (negative) increments are likely to be followed by positive (negative) increments. The parameter H, also called the self-similarity parameter, measures the intensity of the long-range dependence.
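The covariance (1.2) is all that is needed to simulate the fBm on a finite grid. The sketch below codes (1.2), the lag-n autocovariance of unit increments (negative for H < 1/2, positive for H > 1/2), and an exact but O(n³) sampler built on a plain Cholesky factorisation. The helper names `fbm_cov`, `increment_autocov`, `cholesky` and `sample_fbm` are ours. For long paths, a circulant-embedding (Davies-Harte) sampler would be the usual faster choice.

```python
import math
import random


def fbm_cov(t: float, s: float, H: float) -> float:
    """Covariance function (1.2): E[B^H_t B^H_s]."""
    return 0.5 * (abs(t) ** (2 * H) + abs(s) ** (2 * H) - abs(t - s) ** (2 * H))


def increment_autocov(n: int, H: float) -> float:
    """E[B^H_1 (B^H_{n+1} - B^H_n)], the lag-n autocovariance of unit increments."""
    return 0.5 * ((n + 1) ** (2 * H) + (n - 1) ** (2 * H) - 2 * n ** (2 * H))


def cholesky(a):
    """Lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_fbm(H: float, T: float, n: int, rng: random.Random):
    """Exact draw of B^H on the grid t_k = k T / n, k = 0..n (cost O(n^3))."""
    times = [(k + 1) * T / n for k in range(n)]  # t = 0 excluded since B^H_0 = 0
    cov = [[fbm_cov(t, s, H) for s in times] for t in times]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    values = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    return [0.0] + times, [0.0] + values
```

For example, `sample_fbm(0.7, 1.0, 200, random.Random(0))` returns the grid and one long-memory path on [0, 1].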
Parameter κ is often referred to as the persistence parameter. When κ > 0, $X_t$ is stationary and ergodic. In this case, µ is the unconditional mean of $X_t$ and κ is the mean-reversion parameter. When κ < 0, $X_t$ is explosive and hence non-ergodic. When κ = 0, $X_t$ is null recurrent and the drift term κ(µ − $X_t$)dt disappears, so µ is superfluous in this case. The ergodic fVm has been used to model the evolution of instantaneous volatility in Comte and Renault (1998), the evolution of quadratic variation in Aït-Sahalia and Mancini (2008), the evolution of realized variance in Gatheral et al. (2018), and the evolution of VIX in Xiao et al. (2019).
An alternative to, and perhaps a slightly more general specification than, Model (1.1) is

$$dX_t = (\alpha - \kappa X_t)\,dt + \sigma\,dB^H_t. \qquad (1.3)$$

When κ ≠ 0, (1.3) coincides with (1.1) with α = κµ. In Model (1.3), however, the drift term does not vanish even when κ = 0; it is then α dt. This alternative specification for the drift term was used in Chan et al. (1992) and Yu and Phillips (2001).
When α in (1.3) is known (without loss of generality, it is assumed to be zero), (1.3) becomes the fractional Ornstein-Uhlenbeck (fOU) process.
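To make Model (1.3) concrete, here is a minimal Euler-scheme sketch. It assumes that the caller supplies the fBm increments on a grid of step h, for example from any exact fBm sampler, or i.i.d. N(0, h) draws when H = 1/2. The function name `euler_fvm` is ours. Setting `alpha=0` gives the fOU process, and setting `kappa=0` gives the null recurrent case with drift α dt.

```python
import math
import random


def euler_fvm(alpha, kappa, sigma, x0, h, dB):
    """Euler scheme for Model (1.3): dX_t = (alpha - kappa X_t) dt + sigma dB^H_t.

    dB is a sequence of fBm increments B^H_{(k+1)h} - B^H_{kh}; the returned
    list holds X at the grid points 0, h, 2h, ..., len(dB) * h.
    """
    x = [x0]
    for inc in dB:
        x.append(x[-1] + (alpha - kappa * x[-1]) * h + sigma * inc)
    return x


if __name__ == "__main__":
    rng = random.Random(42)
    h, n = 0.01, 1000
    dW = [rng.gauss(0.0, math.sqrt(h)) for _ in range(n)]  # H = 1/2 increments
    path = euler_fvm(alpha=1.0, kappa=2.0, sigma=0.3, x0=0.0, h=h, dB=dW)
    print(path[-1])  # fluctuates around alpha / kappa = 0.5
```

With σ = 0 and κ > 0, the recursion converges to α/κ, the unconditional mean µ of the ergodic case. With κ = 0, it produces the straight line $X_0 + \alpha t$.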
Assuming that a continuous record of observations is available for $X_t$ with t ∈ [0, T], a number of studies have introduced methods to estimate κ and α (or µ) and developed asymptotic distributions for the proposed estimators as T → ∞. When H > 1/2 and κ > 0, borrowing the ideas of Hu and Nualart (2010) and Hu et al. (2017), Xiao and Yu (2019a) considered two methods: the least squares (LS) estimates and the ergodic-type estimates of κ and µ. When H ≥ 1/2 and κ = 0 or κ < 0, Xiao and Yu (2019a) considered the LS method. Xiao and Yu (2019b) extended the results of Xiao and Yu (2019a) from H ∈ (1/2, 1) to H ∈ (0, 1/2). Lohvinenko and Ralchenko (2017) considered the maximum likelihood (ML) estimates of κ and α when κ > 0 and H ∈ (1/2, 1).
The rest of the paper is organized as follows. Section 2 introduces the MLE of κ and α. Section 3 is devoted to the asymptotic theory for the stationary case (i.e., κ > 0) with H ∈ (0, 1/2]. Section 4 studies the asymptotic properties of the MLE in the null recurrent case (i.e., κ = 0) for the entire range of the Hurst parameter, H ∈ (0, 1). In Section 5, we establish the asymptotic behavior of the MLE in the explosive, non-ergodic case (i.e., κ < 0), again for H ∈ (0, 1). Section 6 contains some concluding remarks and gives directions for further research. All the proofs are collected in the Appendix.
We use the following notation throughout the paper: $\xrightarrow{p}$, $\xrightarrow{d}$ and ∼ denote convergence in probability, convergence in distribution, and asymptotic equivalence, respectively, as T → ∞.
Throughout this paper, C denotes a generic constant that depends only on H and whose value may differ from place to place.

ML Estimation
Following Kleptsyna et al. (2000) and Lohvinenko and Ralchenko (2017), and applying the Girsanov theorem for the fBm developed in Norros et al. (1999), one can obtain the continuous-record log-likelihood function for Model (1.3). Lohvinenko and Ralchenko (2017) took the derivatives of the log-likelihood function with respect to κ and α, set them to zero, and obtained explicit expressions for the MLE of α and κ, given in (2.6) and (2.7). Combining (1.3), (2.2) and (2.9), we deduce (2.13). Using the idea of Kleptsyna and Le Breton (2002), Lohvinenko and Ralchenko (2017) obtained the results in (2.14).

The process $M^H_t$, the so-called fundamental martingale, is a Gaussian martingale with variance function $\omega^H_t$. Moreover, the natural filtration of the martingale $M^H$ coincides with the natural filtration of the fBm. Based on (2.15) and (2.16), the MLE of α and κ can be rewritten in an equivalent form.

When a continuous record of observations of $X_t$ is available, Lohvinenko and Ralchenko (2017) studied the consistency and the asymptotic normality of the MLE defined by (2.6) and (2.7) when H > 1/2 and κ > 0. The goal of the present paper is to establish the asymptotic theory for the MLE of α and κ in all the other cases: H < 1/2 and κ > 0; H ∈ (0, 1) and κ = 0; and H ∈ (0, 1) and κ < 0.
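For orientation, the sketch below codes the standard objects of Kleptsyna and Le Breton (2002) that underlie the fundamental martingale:

- the kernel $k_H(t,s) = \kappa_H^{-1} s^{1/2-H}(t-s)^{1/2-H}$ with $\kappa_H = 2H\Gamma(3/2-H)\Gamma(H+1/2)$;
- the constant $\lambda_H = 2H\Gamma(3-2H)\Gamma(H+1/2)/\Gamma(3/2-H)$;
- the variance function $\omega^H_t = \lambda_H^{-1} t^{2-2H}$.

We assume, without checking the paper's exact displays, that these match its definitions up to notation, in particular that $\lambda_H$ is the constant of (2.4). The numeric check illustrates the identity $\int_0^t k_H(t,s)\,ds = \omega^H_t$. At H = 1/2, everything collapses to $k_H \equiv 1$, $\lambda_H = 1$ and $\omega^H_t = t$.

```python
import math


def kappa_H(H):
    """Kernel normalising constant 2H Gamma(3/2 - H) Gamma(H + 1/2)."""
    return 2 * H * math.gamma(1.5 - H) * math.gamma(H + 0.5)


def lambda_H(H):
    """lambda_H = 2H Gamma(3 - 2H) Gamma(H + 1/2) / Gamma(3/2 - H)."""
    return 2 * H * math.gamma(3 - 2 * H) * math.gamma(H + 0.5) / math.gamma(1.5 - H)


def omega_H(t, H):
    """Variance function of the fundamental martingale: t^(2 - 2H) / lambda_H."""
    return t ** (2 - 2 * H) / lambda_H(H)


def k_H(t, s, H):
    """Kernel k_H(t, s) = s^(1/2 - H) (t - s)^(1/2 - H) / kappa_H, for 0 < s < t."""
    return s ** (0.5 - H) * (t - s) ** (0.5 - H) / kappa_H(H)


def kernel_mass(t, H, n=200_000):
    """Midpoint-rule approximation of int_0^t k_H(t, s) ds (avoids the endpoints)."""
    c = kappa_H(H)
    h = t / n
    e = 0.5 - H
    return h * sum(((i + 0.5) * h) ** e * (t - (i + 0.5) * h) ** e for i in range(n)) / c


if __name__ == "__main__":
    print(kernel_mass(2.0, 0.3), omega_H(2.0, 0.3))  # agree to about 1e-4 or better
```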
Asymptotic Theory When κ > 0

In this section, inspired by Lohvinenko and Ralchenko (2017), we extend the asymptotic properties of $\hat\alpha_T$ and $\hat\kappa_T$ from the case H ∈ (1/2, 1) to the case H ∈ (0, 1/2]. For the sake of comparison, we first state the main result of Lohvinenko and Ralchenko (2017).
When H > 1/2, Lohvinenko and Ralchenko (2017) obtained the asymptotic normality of the MLE of α and κ, stated in (3.1) and (3.2). The objective of this section is to obtain the consistency and the asymptotic normality of $\hat\alpha_T$ and $\hat\kappa_T$ when H ∈ (0, 1/2]. Since the asymptotic law of $\hat\alpha_T$ for H ∈ (0, 1/2) differs from that for H = 1/2, we treat the two cases separately.
Theorem 3.1 For κ > 0 and H ∈ (0, 1/2) in Model (1.3), the results (3.3) to (3.11) hold as T → ∞.

Remark 3.1 Comparing this asymptotic theory with that obtained in Lohvinenko and Ralchenko (2017), we see that asymptotic normality continues to hold for both estimators. Moreover, comparing (3.11) with (3.2), the asymptotic theory for $\hat\kappa_T$ is the same whether H ∈ (0, 1/2) or H ∈ (1/2, 1). Comparing (3.10) with (3.1), the asymptotic variance of $\hat\alpha_T$ depends on H: it is $\lambda_H\sigma^2$, with the consistency order given in (3.10).

Remark 3.2 The asymptotic theory for the MLE of κ in the fOU process when H ∈ (0, 1/2) has been developed in the literature; see, for example, Theorem 2 in Brouste and Kleptsyna (2010). It is the same as (3.11). So, although an additional parameter α has to be estimated, there is no asymptotic efficiency loss in estimating κ. The LS estimator of κ is given by

$$\hat\kappa_{LS} = -\frac{\int_0^T X_t\,dX_t}{\int_0^T X_t^2\,dt},$$

where the stochastic integral $\int_0^T X_t\,dX_t$ is interpreted as a divergence integral. The ergodic-type estimate of κ is given by

$$\hat\kappa_{HN} = \left(\frac{1}{\sigma^2 H\Gamma(2H)\,T}\int_0^T X_t^2\,dt\right)^{-1/(2H)}.$$

Moreover, Xiao and Yu (2019b) derived the asymptotic variances $\delta^2_{LS}$ and $\delta^2_{HN}$ of these two estimators. Figure 1 compares the efficiency of the ML, LS and ergodic-type estimates of κ by plotting $\delta^2_{LS}$ and $\delta^2_{HN}$ against 2 for H ∈ (0, 0.5). The LS estimate is the most efficient, followed by the MLE and then by the ergodic-type estimate. The efficiency gap is larger for smaller values of H and disappears when H = 1/2.

Since our model is slightly different from that in Xiao and Yu (2019a) (i.e., µ versus α), before reporting our asymptotic theory we review the asymptotic theory of the LS estimates of κ and µ in the Vasicek model given in Xiao and Yu (2019a).
In (3.19), the integral $\int_0^T X_t\,dX_t$ is interpreted as an Itô integral.
Although this was not shown in Xiao and Yu (2019a), $\hat\kappa_T$ and $\hat\mu_T$ are asymptotically independent. Using the results of Lemma 3.2 and this independence, we can obtain the asymptotic laws of $\hat\alpha_T$ and $\hat\kappa_T$ defined by (2.6) and (2.7).
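Returning to the ergodic-type estimator of κ in Remark 3.2 for the fOU process ($\alpha = 0$), here is a minimal sketch. It assumes the Hu and Nualart (2010) form, which inverts the ergodic limit $T^{-1}\int_0^T X_t^2\,dt \to \sigma^2 H\Gamma(2H)\kappa^{-2H}$. It also assumes the path is observed on a grid of step h, with the integral replaced by a left Riemann sum; that discretization is our choice. The name `ergodic_kappa` is ours.

```python
import math


def ergodic_kappa(path, h, sigma, H):
    """Ergodic-type estimate of kappa for the fOU dX = -kappa X dt + sigma dB^H.

    path holds X at 0, h, ..., n h; int_0^T X_t^2 dt is approximated by a
    left Riemann sum, and the ergodic limit sigma^2 H Gamma(2H) kappa^(-2H)
    of T^-1 int_0^T X_t^2 dt is inverted for kappa.
    """
    n = len(path) - 1
    T = n * h
    mean_sq = h * sum(x * x for x in path[:-1]) / T
    return (mean_sq / (sigma ** 2 * H * math.gamma(2 * H))) ** (-1.0 / (2 * H))
```

At H = 1/2, the ergodic limit reduces to the familiar stationary Ornstein-Uhlenbeck variance $\sigma^2/(2\kappa)$. The estimator only uses the second moment of the path, which is one intuition for why it is less efficient than the ML and LS estimates in Figure 1.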
Remark 3.4 When α ≠ 0, the three sets of asymptotic theory for the MLE of α, corresponding to H ∈ (0, 1/2), H = 1/2 and H ∈ (1/2, 1), can be compared; the last one was obtained in Theorem 3.4 of Lohvinenko and Ralchenko (2017). While the three sets of asymptotic theory for $\hat\kappa_T$ are identical, those for $\hat\alpha_T$ differ. When H changes from a value in (0, 1/2) to 1/2, the rate of convergence stays the same (i.e., $\sqrt{T}$), while the asymptotic variance changes from $2\alpha^2/\kappa$ to $\sigma^2 + 2\alpha^2/\kappa$. When H changes from a value in (0, 1/2] to (1/2, 1), both the rate of convergence and the asymptotic variance change.
Remark 3.5 When α is known and assumed to be zero and H = 1/2, the asymptotic theory for the MLE of κ was obtained in Brown and Hewitt (1975) and in Feigin (1976). The two sets of asymptotic theory, with α known and with α estimated, are the same. This suggests that estimating α entails no asymptotic efficiency loss for κ.
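For H = 1/2, the kernel is trivial and the MLE of (α, κ) reduces to a 2×2 linear system. Setting the derivatives of the Brownian log-likelihood $\sigma^{-2}\left[\int_0^T(\alpha-\kappa X_t)\,dX_t - \tfrac12\int_0^T(\alpha-\kappa X_t)^2\,dt\right]$ to zero gives

$$\alpha T - \kappa A = D, \qquad \alpha A - \kappa B = C,$$

where $A=\int X_t\,dt$, $B=\int X_t^2\,dt$, $C=\int X_t\,dX_t$ and $D = X_T - X_0$.

The sketch below assumes a path on a fine grid of step h and replaces the integrals by left-point (Itô) sums. This discretization is our choice, not part of the paper, and the name `mle_h_half` is ours.

```python
import random


def mle_h_half(path, h):
    """Discretised MLE of (alpha, kappa) in dX = (alpha - kappa X) dt + sigma dW.

    Solves alpha*T - kappa*A = D and alpha*A - kappa*B = C with left-point sums
    A = int X dt, B = int X^2 dt, C = int X dX (Ito), D = X_T - X_0.
    """
    T = h * (len(path) - 1)
    A = B = C = 0.0
    for x_prev, x_next in zip(path[:-1], path[1:]):
        A += x_prev * h
        B += x_prev * x_prev * h
        C += x_prev * (x_next - x_prev)  # left-point (Ito) sum
    D = path[-1] - path[0]
    det = T * B - A * A
    return (D * B - A * C) / det, (A * D - T * C) / det


if __name__ == "__main__":
    rng = random.Random(1)
    x = [0.5]
    for _ in range(20000):  # alpha = 1, kappa = 2, sigma = 0.5, h = 0.01, T = 200
        x.append(x[-1] + (1.0 - 2.0 * x[-1]) * 0.01 + 0.5 * rng.gauss(0.0, 0.1))
    print(mle_h_half(x, 0.01))  # close to (1, 2); sd of kappa_hat is about sqrt(2 kappa / T)
```

For a noiseless path that follows the Euler recursion exactly, the sums satisfy both score equations identically, so the sketch returns (α, κ) up to rounding.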

Asymptotic Theory When κ = 0
In this section, we consider the asymptotic laws of $\hat\alpha_T$ and $\hat\kappa_T$ for the entire range of the Hurst parameter, i.e., H ∈ (0, 1). Note that when κ = 0, Model (1.3) reduces to $X_t = X_0 + \alpha t + \sigma B^H_t$, and it is well known that the MLE of κ can then be expressed as in (4.2). Before considering the asymptotic properties of $\hat\alpha_T$ and $\hat\kappa_T$, we first introduce a lemma, which will be used to derive the asymptotic theory. In the lemma, B(·, ·) is the Beta function, $\lambda_H$ is defined by (2.4), and $a_H = \frac{3-2H}{4(1-H)}$.
We can now describe the asymptotic behavior of $\hat\alpha_T$ and $\hat\kappa_T$ as T → ∞.
Remark 4.2 In the case of H = 1/2 and α = 0, with both α and κ being estimated, the scaling property of the Brownian motion implies that the limits of the suitably normalized $\hat\alpha_T$ and $\hat\kappa_T$ are functionals of a standard Brownian motion. Thus, the limiting distributions of $\hat\alpha_T$ and $\hat\kappa_T$ are not normal. In particular, the asymptotic distribution of $\hat\kappa_T$ is a Dickey-Fuller-Phillips type distribution with rate of convergence T. Hence, when κ = 0 but this is unknown, the value of α plays an important role in the asymptotic laws of the MLE.
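The rate T in Remark 4.2 can be traced to a deterministic scaling identity of the discretized H = 1/2 estimator. Suppose a path X on a grid of step h over [0, T] is mapped to the Brownian rescaling Y(s) = √c X(s/c), on step ch over [0, cT]. Then the left-point score-equation solution satisfies $\hat\kappa(Y) = \hat\kappa(X)/c$ and $\hat\alpha(Y) = \hat\alpha(X)/\sqrt{c}$. Because √c W(·/c) is again a Brownian motion, $T\hat\kappa_T$ and $\sqrt{T}\hat\alpha_T$ then have laws that do not depend on T when α = κ = 0, σ = 1 and $X_0 = 0$. These settings are the sketch's assumptions, and the discretization is ours.

```python
import math
import random


def mle_h_half(path, h):
    """Discretised H = 1/2 MLE of (alpha, kappa) from the score equations (left-point sums)."""
    T = h * (len(path) - 1)
    A = sum(x * h for x in path[:-1])
    B = sum(x * x * h for x in path[:-1])
    C = sum(x0 * (x1 - x0) for x0, x1 in zip(path[:-1], path[1:]))
    D = path[-1] - path[0]
    det = T * B - A * A
    return (D * B - A * C) / det, (A * D - T * C) / det


rng = random.Random(7)
h, n, c = 0.01, 2000, 9.0
x = [0.0]
for _ in range(n):
    x.append(x[-1] + rng.gauss(0.0, math.sqrt(h)))  # alpha = kappa = 0, sigma = 1, X_0 = 0
y = [math.sqrt(c) * v for v in x]  # Y(s) = sqrt(c) X(s / c) on the grid of step c * h

T = n * h
a_x, k_x = mle_h_half(x, h)
a_y, k_y = mle_h_half(y, c * h)  # Y lives on [0, c T]
print(T * k_x, (c * T) * k_y)  # equal up to rounding
print(math.sqrt(T) * a_x, math.sqrt(c * T) * a_y)  # likewise
```

Repeating the draw many times and histogramming `T * k_x` gives a Monte Carlo picture of the non-normal Dickey-Fuller-Phillips type limit.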

Asymptotic theory when H = 1/2
We now state the key results of the asymptotic theory for $\hat\alpha_T$ and $\hat\kappa_T$ when H = 1/2.
Remark 5.1 In (5.4), if we set $X_0 = \alpha/\kappa$, the limiting distribution of $\frac{e^{-\kappa T}}{2\kappa}(\hat\kappa_T - \kappa)$ becomes a standard Cauchy variate. This limiting distribution is the same as that in the Vasicek model driven by a standard Brownian motion (see, e.g., Feigin, 1976). The asymptotic theory in (5.4) is similar to that in explosive discrete-time and continuous-time models when discretely sampled data are available (see, e.g., White, 1958; Anderson, 1959; Phillips and Magdalinos, 2007; Yu, 2015, 2016).
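Remark 5.1 can be turned into a rough interval for an explosive κ. The sketch below assumes H = 1/2 and $X_0 = \alpha/\kappa$, so that $\frac{e^{-\kappa T}}{2\kappa}(\hat\kappa_T-\kappa)$ is approximately standard Cauchy. It also substitutes $\hat\kappa_T$ for κ in the scale $2|\kappa|e^{\kappa T}$, a plug-in step the paper does not discuss. The names `cauchy_quantile` and `explosive_kappa_ci` are ours.

```python
import math


def cauchy_quantile(p):
    """Quantile function of the standard Cauchy distribution."""
    return math.tan(math.pi * (p - 0.5))


def explosive_kappa_ci(kappa_hat, T, level=0.95):
    """Plug-in interval for kappa < 0 when H = 1/2 and X_0 = alpha / kappa.

    Uses e^{-kappa T} (kappa_hat - kappa) / (2 kappa) ~ standard Cauchy, with
    kappa replaced by kappa_hat in the scale 2 |kappa| e^{kappa T}.
    """
    q = cauchy_quantile(0.5 + level / 2)
    half = q * 2.0 * abs(kappa_hat) * math.exp(kappa_hat * T)
    return kappa_hat - half, kappa_hat + half


if __name__ == "__main__":
    print(cauchy_quantile(0.975))  # about 12.706, much wider than the normal 1.96
    print(explosive_kappa_ci(-0.5, 20.0))
```

The heavy Cauchy tail inflates the quantile, but the exponential factor $e^{\kappa T}$ more than compensates, which is the practical meaning of the $e^{-\kappa T}$ rate.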
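Applying (2.10) and (2.11), we can obtain an expression in terms of $I_\nu(z)$, the modified Bessel function of the first kind, defined by

$$I_\nu(z) = \sum_{k=0}^{\infty} \frac{(z/2)^{\nu+2k}}{k!\,\Gamma(\nu+k+1)},$$

together with the asymptotic expansion

$$I_\nu(z) \sim \frac{e^{z}}{\sqrt{2\pi z}}, \qquad z \to \infty.$$

Consequently, we can state the following lemma.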
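The two Bessel-function facts used above can be checked numerically. The sketch below sums the defining power series in log space to avoid overflow of the Gamma function. It compares the result with the closed form $I_{1/2}(z)=\sqrt{2/(\pi z)}\sinh z$ and with the large-z expansion. The names `bessel_i` and `bessel_i_large_z` are ours.

```python
import math


def bessel_i(nu, z, terms=200):
    """Modified Bessel function of the first kind I_nu(z), z > 0, via its power series."""
    total = 0.0
    for k in range(terms):
        log_term = (nu + 2 * k) * math.log(z / 2) - math.lgamma(k + 1) - math.lgamma(nu + k + 1)
        total += math.exp(log_term)
    return total


def bessel_i_large_z(z):
    """Leading term of the expansion I_nu(z) ~ e^z / sqrt(2 pi z) as z -> infinity."""
    return math.exp(z) / math.sqrt(2 * math.pi * z)


if __name__ == "__main__":
    for z in (5.0, 20.0, 40.0):
        print(z, bessel_i(0.3, z) / bessel_i_large_z(z))  # ratio tends to 1
```

The leading term does not depend on ν, and the relative error of the expansion shrinks like 1/z.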
Remark 5.2 For the entire range H ∈ (0, 1), the asymptotic distribution of $\hat\alpha_T$ is normal with rate of convergence $T^{1-H}$ and variance σ². This asymptotic distribution is the same as that of the LS estimate (see Theorem 3.5 in Xiao and Yu (2019a) and Section 3 in Xiao and Yu (2019b)).
In this case, obtaining the asymptotic distribution of $e^{-\kappa T}(\hat\kappa_T - \kappa)/(2\kappa)$ requires the Laplace transform of an integral with respect to $d\omega^H_t$. This is complicated, and we leave it for future work.
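Returning to Remark 5.2, the rate $T^{1-H}$ for $\hat\alpha_T$ when κ < 0 directly controls how long a time span is needed for a given precision. The sketch below assumes the normal approximation $T^{1-H}(\hat\alpha_T-\alpha)\approx N(0,\sigma^2)$ and inverts the half-width $z\sigma/T^{1-H}$ for T. The function names are ours.

```python
from statistics import NormalDist


def alpha_ci_halfwidth(sigma, T, H, level=0.95):
    """Half-width z * sigma / T^(1-H) of the normal interval for alpha (kappa < 0)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma / T ** (1 - H)


def span_for_halfwidth(eps, sigma, H, level=0.95):
    """Time span T at which the half-width equals eps: (z sigma / eps)^(1 / (1 - H))."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (z * sigma / eps) ** (1 / (1 - H))


if __name__ == "__main__":
    for H in (0.3, 0.5, 0.7):
        print(H, span_for_halfwidth(0.1, 1.0, H))  # grows quickly as H increases
```

Because the exponent 1/(1 − H) increases with H, long memory (H > 1/2) makes α markedly harder to pin down than short memory.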

Concluding Remarks and Future Directions
The fVm has found more and more applications in practice. In this paper, we consider the MLE of the parameters in the drift term when a continuous record of observations is available. ML estimation is made possible by the fundamental martingale and the generalized Girsanov theorem. The asymptotic theory is developed under the assumption that the time span T goes to infinity. It is shown that the MLE of α is asymptotically normal regardless of the sign of κ.
However, the asymptotic law of the MLE of κ critically depends on the sign of κ. More precisely, when κ > 0 and H ∈ (0, 1), we have shown that the asymptotic distribution of the MLE of κ is normal with rate of convergence $\sqrt{T}$ and asymptotic variance 2κ, which is independent of H. When κ = 0 and α ≠ 0, the asymptotic distribution of the MLE of κ is normal with rate of convergence $T^{2-H}$, and the asymptotic variance depends on H. When κ = 0 and α = 0, the asymptotic distribution of the MLE of κ is a Dickey-Fuller-Phillips distribution with rate of convergence T. When κ < 0, the limiting distribution is of Cauchy type with rate of convergence $e^{-\kappa T}$. If one further assumes that $X_0 = \alpha/\kappa$, the limiting distribution becomes a standard Cauchy variate multiplied by sin(πH).
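The case analysis above can be summarized in a few lines of code. The sketch below encodes the normalizing rate for the MLE of κ in each regime, exactly as stated in this paragraph. For κ > 0, it also builds the normal-approximation interval implied by $\sqrt{T}(\hat\kappa_T-\kappa)\approx N(0,2\kappa)$, with $\hat\kappa_T$ plugged into the variance. The function names and the plug-in step are ours.

```python
import math
from statistics import NormalDist


def kappa_rate(T, H, kappa, alpha):
    """Normalising rate for the MLE of kappa, by regime, as in the concluding remarks."""
    if kappa > 0:
        return math.sqrt(T)  # normal limit, variance 2 kappa, free of H
    if kappa == 0:
        return T ** (2 - H) if alpha != 0 else T  # normal vs Dickey-Fuller-Phillips
    return math.exp(-kappa * T)  # Cauchy-type limit


def stationary_kappa_ci(kappa_hat, T, level=0.95):
    """Normal interval for kappa > 0 from sqrt(T)(kappa_hat - kappa) ~ N(0, 2 kappa)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(2.0 * kappa_hat / T)
    return kappa_hat - half, kappa_hat + half


if __name__ == "__main__":
    print(stationary_kappa_ci(2.0, 100.0))  # about (1.608, 2.392)
```

The comparison is stark for moderate T. At T = 100 and H = 1/2, the rates are 10, 1000, 100 and, for κ = −0.5, $e^{50}$.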
This study also suggests several important directions for future research. First, it would be worthwhile to generalize the results in this paper to nonlinear stochastic differential equations driven by the fBm. The ergodic theorem, fractional calculus and Malliavin calculus could be employed to obtain the asymptotic properties of both the MLE and the LS estimators.
Second, in this paper, H and σ are assumed to be known. It would be interesting to study the asymptotic properties of the above estimators when H and σ are unknown, which will be reported in later work.
Third, this paper assumes that a continuous record over an increasing time span is available for the development of the asymptotic theory. In practice, data are typically observed at discrete time points (0, h, 2h, ..., Nh (:= T)), where h is the sampling interval and T is the time span. When high-frequency data over a long time span are available, one may consider a double asymptotic scheme by assuming h → 0 and T → ∞. The discretized model corresponding to (1.3) is expressed in terms of the lag operator L and the fractional integration parameter d = H − 1/2. As shown in Wang and Yu (2016), under the double asymptotic scheme the autoregressive coefficient exp(−κh) approaches unity; indeed, exp(−κh) = 1 − κh + O(h²) as h → 0. This implies an autoregressive (AR) model with the AR root moderately deviating from unity and with a fractionally integrated error term with d ∈ (−1/2, 0). This model is closely related to the model considered in Magdalinos (2012), where it is assumed that d ∈ (0, 1/2). Developing a double asymptotic theory based on discretely sampled data would allow one to extend the results of Magdalinos (2012) to the case where d ∈ (−1/2, 1/2). The development of the MLE and its asymptotic theory is beyond the scope of this paper and will be reported in later work.
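A stylized version of this discretized model can be simulated with an AR(1) recursion whose root is exp(−κh), driven by fractionally integrated noise $(1-L)^{-d}\varepsilon_t$ with d = H − 1/2. This is not the paper's exact discretized model, which is not reproduced here. The MA weights use the standard recursion $\psi_0 = 1$, $\psi_k = \psi_{k-1}(k-1+d)/k$, and the filter is truncated at the start of the sample. Both the truncation and the unit noise scale are assumptions of the sketch, and the function names are ours.

```python
import math
import random


def frac_integration_weights(d, n):
    """First n MA weights of (1 - L)^(-d): psi_0 = 1, psi_k = psi_{k-1} (k - 1 + d) / k."""
    psi = [1.0]
    for k in range(1, n):
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


def simulate_ar_fractional(kappa, H, h, n, sigma=1.0, rng=None):
    """x_t = exp(-kappa h) x_{t-1} + u_t, with u_t = (1 - L)^(-d) eps_t and d = H - 1/2.

    The fractional filter is truncated at the first observation (O(n^2) cost).
    """
    rng = rng or random.Random()
    d = H - 0.5
    rho = math.exp(-kappa * h)  # AR root close to unity when h is small
    psi = frac_integration_weights(d, n)
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    x, prev = [], 0.0
    for t in range(n):
        u = sum(psi[j] * eps[t - j] for j in range(t + 1))
        prev = rho * prev + u
        x.append(prev)
    return x


if __name__ == "__main__":
    path = simulate_ar_fractional(kappa=1.0, H=0.3, h=0.01, n=500, rng=random.Random(0))
    print(path[-5:])
```

With H = 1/2, d = 0 and the noise is white, so the recursion reduces to an ordinary near-unit-root AR(1).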

Proof of Theorem 3.1
We first consider (3.3). Using (2.11), (2.2) and the properties of the modified Bessel function of the first kind, we obtain an asymptotic expression as T tends to infinity. Then, as T → ∞, using Lemma 4.2 of Lohvinenko and Ralchenko (2017), we obtain the required limit. By the proof of Theorem 3 in Tanaka (2013), we can easily obtain (3.5) and (3.6). The result (3.7) follows directly from a limit established in the proof of Lemma 4.7 in Lohvinenko and Ralchenko (2017). Applying Lemma 4.5 in Lohvinenko and Ralchenko (2017), we can easily obtain (3.8). Finally, we are left with (3.9). Using the Cauchy-Schwarz inequality, (3.6) and (3.7), we obtain a bound that implies (3.9).

Proof of Lemma 3.2
For H = 1/2, using arguments similar to the proof of Theorem 3.1 in Xiao and Yu (2019a), we can easily obtain Now, we consider the second term on the right-hand side of (7.10). For convenience, let Moreover, using some basic facts on the Malliavin calculus for Gaussian processes (for details, see Nualart, 2006), we obtain Consequently, we have A standard calculation yields Moreover, a standard calculation implies (7.14). By combining (7.13) with (7.14), we obtain that $J_1$ converges in $L^2$ to $\sigma^2/(2\kappa)$ as T → ∞. For $J_2$, we can easily obtain Hence $J_2$ also converges to $\sigma^2/(2\kappa)$ in $L^2$ as T → ∞. Finally, we consider $J_3$. A standard calculation yields Then, a simple calculation shows (7.16). From (7.10) to (7.16), we obtain where $\xrightarrow{L^2}$ denotes convergence in mean square. Using (7.11), (7.17) and Theorem 4 in Nualart and Ortiz-Latorre (2008), we have On the other hand, from (2.17), we have Finally, combining (7.8), (7.9), (7.10), (7.18) and (7.19) with Slutsky's theorem, we obtain (3.20). The proof of (3.21) is analogous to the proof of (3.4) in Xiao and Yu (2019a) and is omitted.

Proof of Lemma 5.1
Let us observe that (5.6) can be obtained easily from Theorem 2 in Tanaka (2015); the details are omitted here. For (5.7), using the Cauchy-Schwarz inequality, we obtain a bound that directly implies (5.7).
Let ${}_1F_1(\cdot;\cdot;\cdot)$ be the confluent hypergeometric function of the first kind. From (5.5) and a well-known result on the confluent hypergeometric function (see, for example, Eq. 3.383 (1) in Gradshteyn and Ryzhik, 2007), we have We now deal with (5.9). Let ζ t = σ We now turn to the term (5.10). Using (5.5), we can easily obtain which yields (5.10).
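As a numerical aid for the confluent hypergeometric step, the sketch below checks a standard identity of this type, the Euler integral representation

$$\int_0^u x^{\nu-1}(u-x)^{\mu-1}e^{\beta x}\,dx = B(\mu,\nu)\,u^{\mu+\nu-1}\,{}_1F_1(\nu;\mu+\nu;\beta u).$$

We do not reproduce Eq. 3.383 (1) of Gradshteyn and Ryzhik here; we only assume it is of this form. ${}_1F_1$ is evaluated by its power series, the integral by the midpoint rule, and the function names are ours.

```python
import math


def hyp1f1(a, b, z, terms=200):
    """Confluent hypergeometric function 1F1(a; b; z) via its power series."""
    term, total = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) / (b + k) * z / (k + 1)  # ratio of consecutive series terms
        total += term
    return total


def euler_integral(nu, mu, beta, u, n=20000):
    """Midpoint rule for int_0^u x^(nu-1) (u-x)^(mu-1) exp(beta x) dx."""
    h = u / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x ** (nu - 1) * (u - x) ** (mu - 1) * math.exp(beta * x)
    return h * total


def euler_closed_form(nu, mu, beta, u):
    """B(mu, nu) u^(mu+nu-1) 1F1(nu; mu+nu; beta u)."""
    beta_fn = math.gamma(mu) * math.gamma(nu) / math.gamma(mu + nu)
    return beta_fn * u ** (mu + nu - 1) * hyp1f1(nu, mu + nu, beta * u)


if __name__ == "__main__":
    print(euler_integral(2.0, 3.0, -1.5, 2.0), euler_closed_form(2.0, 3.0, -1.5, 2.0))
```

The check uses ν, μ ≥ 1 so that the integrand is smooth. For ν < 1 or μ < 1 the endpoint singularities call for a substitution or an adaptive quadrature.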