Parametric Estimation of Diffusion Processes: A Review and Comparative Study

This paper provides an in-depth review of parametric estimation methods for stationary stochastic differential equations (SDEs) driven by Wiener noise and observed at discrete times. Short-term interest rate dynamics are commonly described by continuous-time diffusion processes, whose parameters are subject to estimation bias, as the data are highly persistent, and to discretization bias, as the data are discretely sampled despite the continuous-time nature of the model. To assess the role of persistence and the impact of the sampling frequency on the estimation, we conducted a simulation study under different settings to compare the performance of the procedures and illustrate their finite sample behavior. To complete the survey, an application of the procedures to real data is provided.


Introduction
Diffusion processes described by stochastic differential equations (SDEs) are frequently applied in the physical, biological and financial sciences to model dynamical systems with a disturbance term. Their use in mathematical finance for modeling the evolution of important economic variables, such as the interest rate, has become increasingly important over the last decades. The need to develop the analysis of the term structure of interest rates in a stochastic environment emerged as a consequence of market turbulence throughout the seventies. New theories of the term structure of interest rates, based on arbitrage-free pricing models in a stochastic environment, were emerging: Merton [1] used the interest rate in option pricing, modeling it as a stochastic process. Subsequently, Black and Scholes [2] had an important impact on arbitrage models of the term structure of interest rates, as shown in [3][4][5][6][7]. In these models, the interest rate is the solution of a stochastic differential equation, so we can use the framework of Markov process theory for its analytical treatment. The continuous-time paradigm proves to be an especially useful tool, but the continuous-time nature of the model does complicate the estimation of the parameters, because the available data are sampled in discrete time. Thus, parameter estimates are subject to discretization bias together with estimation bias, and these issues have been addressed using different estimation methods. The finite sample bias is especially acute when the process is highly persistent, as is the case for time series of interest rates, and it alters the valuation of derivatives, since short-term interest rate models are used to price these instruments [8].
We consider the SDE defined on a filtered probability space (Ω, F, {F_t}_{t≥0}, P), where Ω is a nonempty set, F is a σ-algebra of subsets of Ω and P is a probability measure with P(Ω) = 1. We focus on parametric time-homogeneous stochastic differential equations, where X_t is an Itô process,

dX_t = m(X_t, θ) dt + σ(X_t, θ) dW_t, X_0 = x_0, 0 ≤ t ≤ T, (1)

with X_t ∈ R, W_t an {F_t}_{t≥0}-adapted standard Wiener process and θ an unknown parameter vector such that θ ∈ Θ ⊂ R^d, with d a positive integer and Θ a compact set. We assume that the parametric structure of the drift and diffusion functions, m(·, θ) : R × Θ → R and σ(·, θ) : R × Θ → (0, ∞), respectively, is known, and that neither is time dependent. Jump-diffusions, fractional Brownian motion and Lévy-driven SDEs are out of the scope of this review. A parametric specification of (1) that encloses different interest rate models is the Chan-Karolyi-Longstaff-Sanders (CKLS) model proposed in [6], given by

dX_t = κ(µ − X_t) dt + σ X_t^γ dW_t,

which is a mean-reverting process that allows the conditional mean and variance to depend on the interest rate level X_t. The drift parameter µ is the long-term mean and κ is the speed of reversion, while the diffusion parameter σ is the volatility and γ is the proportional volatility exponent, which measures the sensitivity of the volatility to the level of the process at time t. This model nests earlier interest rate models, which are recovered by imposing restrictions on the parameters: when γ is 0 or 0.5 (which yields the Vasicek [3] and CIR [5] models, respectively), the process is tractable and admits an analytical solution.
In this article, we consider SDEs with a deterministic volatility function, though the current option pricing literature drops the constant volatility assumption and adopts instead a stochastic volatility framework. Nevertheless, the issues addressed here are inherited by stochastic volatility models, and the estimation methods can be extended to SDEs with volatility described by a stochastic process. Furthermore, models based on the Vasicek process are still used in the financial market (see, e.g., [9] or [10]) and new estimation procedures are being proposed for jump-diffusions [11] or Lévy-driven processes [12].
Several studies of estimation methods for diffusion processes can be found in the literature, see [13] for a theoretical comparison or [14] for a more practical approach. In [15], a comparative study of different discretization methods and moment-based estimation is carried out, while [16] focused on simulation-based approaches. The aim of this paper is to complement these comparative studies by extending the evaluation of the finite sample performance to different settings-near unit-root time series, various degrees of persistence without changes in the marginal density, and different sampling intervals and observation times-whose impact on estimation has been hinted at in the financial literature. The methods here considered are maximum likelihood estimation, local linearization [17,18], Hermite polynomial expansion [19], Kalman filter [20], Markov Chain Monte Carlo [21,22] and generalized method of moments [6,23]. The procedures are provided in the companion estsde R package [24], implemented in C and C++ for the sake of efficiency.
The sections are organized as follows: Section 2 provides an outline of the estimation methods, Section 3 designs the Monte Carlo experiment and discusses the finite sample performance of the procedures. Real data applications to interest rate series are presented in Section 4 and conclusions are drawn in Section 5. Tabulated simulation results are deferred to Appendix A.
Although the model is formulated in continuous time, data are recorded at discrete time points. Hence, for the estimation of the continuous-time model parameters we must consider a discrete version of it. The observation scheme assumed is the fixed-∆ scheme, in which the time step ∆ between two consecutive observations is fixed and the sample size n ∈ N increases, as does the time interval [0, T = n∆]. One of the most used approximation schemes is the Euler-Maruyama method [26]: given an Itô process {X_t, 0 ≤ t ≤ T}, solution of the SDE in (1) with initial value X_{t_0} = x_0, and the discretization of the time interval [0, T], 0 = t_0 < t_1 < · · · < t_n = T, the Euler-Maruyama approximation of X is a continuous stochastic process satisfying the iterative scheme

X_{t_{i+1}} = X_{t_i} + m(X_{t_i}, θ)(t_{i+1} − t_i) + σ(X_{t_i}, θ)(W_{t_{i+1}} − W_{t_i}), i = 0, . . . , n − 1. (2)

In the remainder of this section, we provide an outline of the procedures and their implementation to estimate the unknown parameter vector θ. The methods described fall into two categories: likelihood-based methods and the method of moments. The method of moments provides estimates by matching population and sample moments and minimizing a quadratic form; hence we assume that the moments of X_t are bounded:

Assumption 3 (Bounded moments). For all k > 0, all moments of order k of the diffusion process exist and satisfy sup_t E|X_t|^k < ∞.

The maximum likelihood (ML) estimates are obtained by different methods: exact and discrete (piecewise constant or linear approximations) ML, univariate Hermite expansion of the transition function, a filtering algorithm (linear quadratic estimation) and a Bayesian approach.
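As a minimal illustration of the fixed-∆ Euler-Maruyama scheme, the iterative recursion above can be sketched in a few lines. The sketch is in Python rather than the companion estsde R package; the function name and the CKLS parameter values are illustrative, not taken from the paper's experiments.

```python
import numpy as np

def euler_maruyama(m, sigma, theta, x0, n, dt, rng):
    """Simulate n steps of dX = m(X, theta) dt + sigma(X, theta) dW
    on a fixed-dt grid with the Euler-Maruyama scheme."""
    x = np.empty(n + 1)
    x[0] = x0
    dw = rng.normal(0.0, np.sqrt(dt), size=n)  # Wiener increments W_{t_{i+1}} - W_{t_i}
    for i in range(n):
        x[i + 1] = x[i] + m(x[i], theta) * dt + sigma(x[i], theta) * dw[i]
    return x

# CKLS drift and diffusion: dX = kappa*(mu - X) dt + s * X^gamma dW
theta = dict(mu=0.09, kappa=0.2, s=0.02, gamma=1.5)
m_fun = lambda x, th: th["kappa"] * (th["mu"] - x)
s_fun = lambda x, th: th["s"] * x ** th["gamma"]

rng = np.random.default_rng(0)
path = euler_maruyama(m_fun, s_fun, theta, x0=0.09, n=520, dt=1 / 52, rng=rng)
```

Here n = 520 weekly steps (∆ = 1/52) mimic ten years of annualized data.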

Exact Maximum Likelihood
As X_t is a Markov process, we can obtain the likelihood function L_n(θ) of the discretely observed process using Bayes' rule,

L_n(θ) = ∏_{i=1}^{n} p_θ(∆, X_{t_i} | X_{t_{i−1}}),

where p_θ(∆, X_{t_i} | X_{t_{i−1}}) denotes the transition density function associated with the parametric diffusion model, with unknown parameter θ. If the parametric form of the model that generates the observations {X_{t_i}}_{i=0}^{n} is known, we can use maximum likelihood, so that the maximum likelihood estimator (MLE) of the true parameter is

θ̂_n = arg max_{θ∈Θ} ℓ_n(θ),

where ℓ_n(θ) = log L_n(θ) is the log-likelihood function. This estimation method can seldom be used with diffusion processes, as few models have a closed-form transition density, e.g., [2,3,5]. As a consequence, new procedures have been proposed in the ML framework based on different approximations of the transition density function.
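For the Vasicek model the transition density is Gaussian with mean µ + (x − µ)e^{−κ∆} and variance σ²(1 − e^{−2κ∆})/(2κ), so exact ML is feasible. The following sketch (illustrative names and starting values, not the estsde implementation) simulates a path from the exact transition density and maximizes the exact likelihood numerically:

```python
import numpy as np
from scipy.optimize import minimize

def vasicek_nll(params, x, dt):
    """Exact negative log-likelihood of the Vasicek model, built from the
    closed-form Gaussian transition density."""
    mu, kappa, sigma = params
    if kappa <= 0 or sigma <= 0:
        return np.inf
    e = np.exp(-kappa * dt)
    mean = mu + (x[:-1] - mu) * e               # conditional mean
    var = sigma**2 * (1 - e**2) / (2 * kappa)   # conditional variance
    r = x[1:] - mean
    return 0.5 * np.sum(np.log(2 * np.pi * var) + r**2 / var)

# simulate a path by sampling the exact transition density, then fit
rng = np.random.default_rng(1)
mu0, kappa0, sigma0, dt, n = 0.09, 0.9, 0.05, 1 / 52, 2600
e0 = np.exp(-kappa0 * dt)
sd0 = sigma0 * np.sqrt((1 - e0**2) / (2 * kappa0))
x = np.empty(n)
x[0] = mu0
for i in range(1, n):
    x[i] = mu0 + (x[i - 1] - mu0) * e0 + sd0 * rng.normal()
fit = minimize(vasicek_nll, x0=[0.1, 1.0, 0.1], args=(x, dt), method="Nelder-Mead")
```

With T = 50 years of weekly data, µ and σ are recovered accurately, while κ̂ carries the finite sample bias discussed in Section 3.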

Discrete Maximum Likelihood
Discrete-time likelihood (also known as pseudo-likelihood) methods emerge as an alternative approach to approximate the unknown transition density of SDEs when exact maximum likelihood is unfeasible: the diffusion model is discretized with a certain numerical scheme. In some cases, analytical expressions for the parameter estimates can be obtained; otherwise, a numerical optimization routine is needed to maximize (minimize) the (negative) log-likelihood. Several algorithms have been proposed to approximate SDEs (see, e.g., [18,27,28]); here we briefly detail two of them.

Euler Method
To estimate the model we can use an approximation scheme such as the Euler-Maruyama method [26]. With this method we do not approximate the transition density directly; instead, the trajectory of the process is approximated so that we can use the likelihood of the discretized version of the model, given by

X_{t_{i+1}} = X_{t_i} + m(X_{t_i}, θ)∆ + σ(X_{t_i}, θ)√∆ ε_{t_i},

where the ε_{t_i} are i.i.d. N(0, 1) and ∆ = t_{i+1} − t_i. Therefore, the estimation with the Euler-Maruyama method [29] proceeds as if the observations follow a Gaussian distribution, with conditional mean determined by the drift function and conditional standard deviation by the diffusion function. Thus, the transition density is approximated by

p_θ(∆, X_{t_{i+1}} | X_{t_i}) ≈ N(X_{t_i} + m(X_{t_i}, θ)∆, σ²(X_{t_i}, θ)∆).

The implementation of this method is straightforward; however, Euler-type schemes depend on the sampling interval ∆ > 0 and introduce discretization bias in the estimates, although they converge to the exact ML estimates as ∆ → 0. Departures from the Gaussian distribution can also increase the bias, since the Euler scheme increments are conditionally Gaussian.
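The Euler pseudo-likelihood can be sketched generically, with the drift and diffusion passed in as functions (illustrative Python, not the package implementation):

```python
import numpy as np

def euler_nll(theta, x, dt, m, sigma):
    """Euler pseudo negative log-likelihood: each transition is treated as
    Gaussian with mean x + m(x, theta)*dt and variance sigma(x, theta)^2 * dt."""
    mean = x[:-1] + m(x[:-1], theta) * dt
    var = sigma(x[:-1], theta) ** 2 * dt
    r = x[1:] - mean
    return 0.5 * np.sum(np.log(2 * np.pi * var) + r**2 / var)

# Vasicek drift/diffusion; theta = (mu, kappa, sigma)
m_fun = lambda x, th: th[1] * (th[0] - x)
s_fun = lambda x, th: th[2] * np.ones_like(x)

rng = np.random.default_rng(3)
dt, n, theta_true = 1 / 52, 2000, (0.09, 0.9, 0.05)
x = np.empty(n)
x[0] = 0.09
for i in range(1, n):  # simulate with the same Euler scheme
    x[i] = (x[i - 1] + m_fun(x[i - 1], theta_true) * dt
            + s_fun(x[i - 1], theta_true) * np.sqrt(dt) * rng.normal())
```

Evaluating `euler_nll` at the true parameters gives a smaller criterion value than at perturbed ones, and it can be handed to any numerical optimizer to obtain the discrete ML estimates.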

Local Linearization
While the Euler-Maruyama approximation restricts the coefficients of the drift and diffusion terms to be piecewise constant, the local linearization instead uses a linear approximation. Consider the SDE

dX_t = m(X_t, θ) dt + σ dW_t, (3)

where σ > 0 is assumed constant. The local linearization (LL) method developed in [17,18] is an approximation method by which the drift function m(·, θ) is locally approximated by a linear function of X_t (no expansion is required for the diffusion function, as it is constant). The numerical scheme is based on the local linearization of the SDE's drift coefficient by means of a truncated Itô-Taylor expansion. The process discretized by the LL method is

X_{(i+1)∆} = X_{i∆} + (m(X_{i∆}, θ)/L_{i∆})(e^{L_{i∆}∆} − 1) + σ ∫_{i∆}^{(i+1)∆} e^{L_{i∆}((i+1)∆−s)} dW_s,

with L_t = ∂m(X_t, θ)/∂X. The linear function K_t approximates the drift function m(·), with K_t constant in the interval [i∆, (i + 1)∆]. Given that the stochastic integral is a Gaussian random variable, the transition density of X_{(i+1)∆} given X_{i∆} is indeed Gaussian. Thus, X_{(i+1)∆} | X_{i∆} follows a normal distribution with mean and variance

E[X_{(i+1)∆} | X_{i∆}] = X_{i∆} + (m(X_{i∆}, θ)/L_{i∆})(e^{L_{i∆}∆} − 1),
Var[X_{(i+1)∆} | X_{i∆}] = σ²(e^{2L_{i∆}∆} − 1)/(2L_{i∆}),

respectively, and therefore maximum likelihood can be used to obtain the parameter estimates. As the SDE in (3) has a constant diffusion function, if the parametric specification we want to estimate is more intricate, a transformation is needed (e.g., standardizing the diffusion term with the Lamperti transform, see Section 2.3). The LL approximation can provide more accurate estimates than the Euler scheme (especially for nonlinear drifts), though the implementation is more troublesome, as a prior transformation of the process is needed, as well as the derivative of the transformed process drift function (which can be computed numerically or, for higher efficiency, analytically).
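For a linear drift the local linearization is exact, which gives a convenient sanity check of the LL transition moments: for the Vasicek drift they must coincide with the exact conditional mean and variance. A sketch (`ll_moments` is an illustrative name, not a package function):

```python
import numpy as np

def ll_moments(x, dt, m, dm, sigma):
    """Local-linearization transition moments for dX = m(X)dt + sigma dW:
    the drift is linearized around x with slope L = dm(x)."""
    L = dm(x)
    mean = x + m(x) / L * (np.exp(L * dt) - 1.0)
    var = sigma**2 * (np.exp(2.0 * L * dt) - 1.0) / (2.0 * L)
    return mean, var

# Vasicek drift: the linearization is exact, so LL reproduces the true moments
mu, kappa, sigma, dt, x0 = 0.09, 0.9, 0.05, 1 / 12, 0.05
mean, var = ll_moments(x0, dt, lambda x: kappa * (mu - x), lambda x: -kappa, sigma)
```

With L = −κ the LL mean reduces to µ + (x − µ)e^{−κ∆} and the LL variance to σ²(1 − e^{−2κ∆})/(2κ), the exact Vasicek moments; for nonlinear drifts the two differ and LL is only an approximation.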

Hermite Polynomial Expansion
Bayes' rule combined with the Markovian nature of the diffusion process, inherited by the discrete data, implies that the log-likelihood is of the form

ℓ_n(θ) = ∑_{i=1}^{n} log p_θ(∆, X_{i∆} | X_{(i−1)∆}), (4)

assuming that the process is observed at the time points {i∆ | i = 0, . . . , n}, with fixed ∆. Aït-Sahalia [19] proposed a maximum likelihood method for diffusion processes with discrete samples, based on an approximation of the likelihood function using Hermite polynomials. The author constructed a sequence of approximations {p_{X,θ}^{(J)} | J ≥ 0} of the transition density, which, substituted in (4), yields a sequence of approximations ℓ_n^{(J)} of ℓ_n. We need to standardize the diffusion coefficient of X, which is achieved using the Lamperti transform,

U_t = ψ(X_t, θ) = ∫^{X_t} du/σ(u, θ),

and, applying Itô's formula to the new process U_t, we obtain the unit-diffusion process

dU_t = µ_U(U_t, θ) dt + dW_t, with µ_U(u, θ) = m(ψ^{−1}(u, θ), θ)/σ(ψ^{−1}(u, θ), θ) − (1/2) ∂σ/∂x(ψ^{−1}(u, θ), θ),

provided that ψ^{−1}(·, θ) exists. This transformation allows the computation of the transition density p_{X,θ} from p_{U,θ} through the Jacobian formula

p_{X,θ}(∆, x | x_0) = p_{U,θ}(∆, ψ(x, θ) | ψ(x_0, θ))/σ(x, θ),

where the sequence of explicit functions p̃_U^{(J)}, based on Hermite expansions of the density p_U around a Gaussian density function up to order J, approximates p_U. In ([19], Theorem 1) it is proved that p̃_U^{(J)} converges to the true transition density p_U as J → ∞. The coefficients of the density expansion terms can be calculated with a Taylor series expansion in ∆, denoting by p_U^{(J,K)} the order K Taylor series in ∆ of p_U^{(J)}. Usually, J = 6 is taken, so the first seven Hermite coefficients (j = 0, . . . , 6) are used, along with a Taylor series up to order K = 3.
To obtain the MLE, the approximation ℓ_n^{(J)}(θ) of the log-likelihood function is maximized. Thus, we obtain an estimator θ̂_n^{(J)} close to the exact θ̂_n ([19], Theorem 2). For stationary processes, the estimator inherits, as J → ∞, the asymptotic properties of the exact maximum likelihood estimator. The practical implementation of this procedure is limited by the existence of an explicit inverse ψ^{−1}(·, θ), and its complexity stems from the analytical approximation of the Hermite expansion coefficients, though in return the accuracy of the estimates is high.
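The Lamperti transform that this method requires is often available in closed form. A small sketch for the CKLS diffusion σ(x) = s x^γ with γ ≠ 1 (illustrative names; the anchor point x_ref of the indefinite integral is an arbitrary choice), checked against numerical quadrature:

```python
import numpy as np
from scipy.integrate import quad

def lamperti_ckls(x, s, gamma, x_ref=1.0):
    """Closed-form Lamperti transform U = psi(X) for the CKLS diffusion
    sigma(x) = s * x^gamma (gamma != 1), anchored at x_ref."""
    return (x ** (1 - gamma) - x_ref ** (1 - gamma)) / (s * (1 - gamma))

s, gamma = 0.5, 1.5
u = lamperti_ckls(0.09, s, gamma)
# check against numerical integration of 1/sigma over [x_ref, x]
u_num, _ = quad(lambda v: 1.0 / (s * v**gamma), 1.0, 0.09)
```

Because ψ is defined only up to an additive constant, the anchor x_ref drops out of the transition density after the change of variables.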

Kalman Filter
The state-space model or dynamic linear model, introduced in [20], employs an order one vector autoregression as the state equation and assumes that we do not observe the state vector x_t directly, but a noisy linear transformation of it, y_t. The state-space representation of the dynamics of y_t is given by the system of equations

x_t = Φ x_{t−1} + Υ u_t + w_t, (5)
y_t = A_t x_t + Γ u_t + v_t, (6)

where the state vector x_t is p × 1, the observed data vector y_t is q × 1, the state transition matrix Φ is p × p, the observation matrix A_t is q × p, u_t is an r × 1 vector of inputs, Υ is p × r, Γ is q × r, w_t ∼ N(0, Q) and v_t ∼ N(0, R) and, for simplicity, we assume that {w_t} and {v_t} are uncorrelated. Equation (5) is known as the state equation and Equation (6) as the observation equation.
For the model in (5) and (6), the Kalman filter equations, with initial state x_0 ∼ N(x_{0|0}, P_{0|0}), are given by the prediction step

x_{t|t−1} = Φ x_{t−1|t−1} + Υ u_t, (7)
P_{t|t−1} = Φ P_{t−1|t−1} Φ' + Q, (8)

and the update step

K_t = P_{t|t−1} A_t' (A_t P_{t|t−1} A_t' + R)^{−1}, (9)
x_{t|t} = x_{t|t−1} + K_t (y_t − A_t x_{t|t−1} − Γ u_t), (10)
P_{t|t} = (I − K_t A_t) P_{t|t−1}. (11)

The Kalman filter is a recursive algorithm: Equations (7) and (8) predict the state and its covariance, while Equations (9)-(11) update them once y_t is observed. The Kalman filter can be set up to evaluate the likelihood function, which can be computed with the innovations {ε_t}_{t=1}^{n}, where ε_t = y_t − A_t x_{t|t−1} − Γ u_t. Because the innovations are independent Gaussian random vectors with zero mean and covariance matrix Σ_t = A_t P_{t|t−1} A_t' + R, we can write the log-likelihood, log L_Y(θ), as

log L_Y(θ) = −(1/2) ∑_{t=1}^{n} [q log 2π + log |Σ_t| + ε_t' Σ_t^{−1} ε_t]. (12)

The log-likelihood in (12) can be maximized by numerical search procedures to obtain the estimate of θ, where the derivatives of (12) can be calculated numerically or analytically. The analytical derivatives can be obtained recursively by differentiating the Kalman filter recursion, see [30]. Algorithms such as the EM [31] or Newton-Raphson can be used to maximize the log-likelihood; see [32,33] for an example of each approach.
Under general conditions, let θ̂_n be the estimator of the true parameter θ_0 obtained by maximizing the innovation log-likelihood in (12). Subject to certain regularity conditions, when n → ∞,

√n (θ̂_n − θ_0) →_d N(0, i(θ_0)^{−1}),

where i(θ_0) is the asymptotic Fisher information matrix. The Kalman filter will generate strongly consistent estimates of θ_0 when the parameters lie in a compact set. More details on regularity conditions, convergence and asymptotic properties can be found in [30,34-37]. The Kalman filter can be used to estimate the parameters of the diffusion process (1) by writing in state-space form the discrete version given in (2),

Y_{t_i} = m(X_{t_i}, θ) + σ(X_{t_i}, θ) ε_{t_i}/√∆, (13)

with Y_{t_i} = (X_{t_{i+1}} − X_{t_i})/∆ and ε_{t_i} i.i.d. N(0, 1) random variables. The discretized model (13) is obtained by means of a Euler-Maruyama scheme, which makes the likelihood function in (12) equivalent to the one obtained using the Euler method (see Section 2.2.1); hence the parameter estimates for both methods will be close. The Kalman filter algorithm provides a computationally efficient method for evaluating the log-likelihood. One of the advantages of the state-space representation in (5) and (6) is that it allows latent variables, which simplifies the extension to SDEs with stochastic volatility. Furthermore, this framework also admits the estimation of multidimensional models. Some extensions of the filter to nonlinear systems have been proposed in the literature, such as the extended Kalman filter [38], which is closely related to the local linearization method introduced in Section 2.2.2, see [39].
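A scalar sketch of the innovations likelihood recursion (illustrative names; in the degenerate case A = 1, R = 0, P_0 = 0 the filter collapses to the Gaussian AR(1) likelihood, which gives a direct check):

```python
import numpy as np

def kf_loglik(y, Phi, Q, A, R, x0, P0):
    """Innovations log-likelihood of the scalar state-space model
    x_t = Phi*x_{t-1} + w_t,  y_t = A*x_t + v_t,  w ~ N(0,Q), v ~ N(0,R)."""
    x, P, ll = x0, P0, 0.0
    for yt in y:
        # prediction step
        xp = Phi * x
        Pp = Phi * P * Phi + Q
        # innovation and its variance
        eps = yt - A * xp
        S = A * Pp * A + R
        ll -= 0.5 * (np.log(2 * np.pi * S) + eps**2 / S)
        # update step with Kalman gain K
        K = Pp * A / S
        x = xp + K * eps
        P = (1 - K * A) * Pp
    return ll

# with A = 1, R = 0 and P0 = 0 the filter reproduces the AR(1) likelihood
rng = np.random.default_rng(2)
y = rng.normal(size=50)
ll = kf_loglik(y[1:], 0.9, 1.0, 1.0, 0.0, y[0], 0.0)
direct = -0.5 * np.sum(np.log(2 * np.pi) + (y[1:] - 0.9 * y[:-1]) ** 2)
```

In practice `kf_loglik` would be wrapped in a numerical optimizer over (Φ, Q, R, …), exactly as described for (12).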

Markov Chain Monte Carlo
The estimation of the parameters of the continuous-time model by means of the Markov Chain Monte Carlo (MCMC) procedure, given the discretely observed data, requires a discrete version of the model. Discretizing the model with the Euler-Maruyama approach, as in (2), we have

X_{t_{i+1}} = X_{t_i} + m(X_{t_i}, θ)∆ + σ(X_{t_i}, θ)(W_{t_{i+1}} − W_{t_i}). (14)

This discrete-time approximation of the SDE can be too coarse to approximate the true transition density accurately (see [40] for the strong convergence criterion for SDEs). Elerian et al. [21] and Eraker [22] proposed MCMC approaches involving data augmentation, where the missing data between two neighboring observations are treated as unknown parameters: the observed data, X_{t_i} and X_{t_{i+1}}, are augmented by introducing (m − 1) unobserved data points, dividing the interval [0, T] into n = mT equidistant points, as suggested by [21]. A proposal for the augmented data is accepted as the next item of the chain with the usual Metropolis-Hastings probability, α = min{1, [π(x*) q(x | x*)]/[π(x) q(x* | x)]}, where π is the target density of the unobserved points and q the proposal density.
The mean and covariance matrix of the multivariate Gaussian proposal distribution are obtained by a Newton-Raphson iterative procedure, where the mean is given by the mode of ln f(· | x*_{t_i,k−1}, x*_{t_i,k+M}; θ) and the covariance matrix is the negative of the inverse Hessian evaluated at the mode (see [21] for details regarding the gradient and Hessian matrix of the target density).
To complete one cycle of the MCMC sampler, we need to sample θ ∼ π(θ | X, X*) conditioned on the augmented sample, comprising both the observed states X and the simulated auxiliary states X*. Assuming a noninformative prior, the likelihood of the augmented sample under the Euler-Maruyama discretization scheme is the product of the Gaussian transition densities over the fine grid of observed and augmented points. This method is accurate and can be extended to multidimensional models-at the cost of further computational demand-and to partially observed processes, such as stochastic volatility models. The main drawbacks are related to its model-specific nature and its more troublesome implementation. Moreover, determining the convergence of the algorithm is not straightforward, nor is the number of parameter draws or the number of initial iterations to be discarded.
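A deliberately simplified sketch of the sampler for the Vasicek model (no data augmentation, i.e., m = 1, flat prior, illustrative names and tuning values): under a flat prior the Metropolis-Hastings acceptance probability reduces to the Euler likelihood ratio.

```python
import numpy as np

def euler_loglik(theta, x, dt):
    """Euler log-likelihood for the Vasicek model, theta = (mu, kappa, sigma)."""
    mu, kappa, sig = theta
    mean = x[:-1] + kappa * (mu - x[:-1]) * dt
    var = sig**2 * dt
    r = x[1:] - mean
    return -0.5 * np.sum(np.log(2 * np.pi * var) + r**2 / var)

def metropolis(x, dt, theta0, n_iter, step, rng):
    """Random-walk Metropolis under a flat prior: accept with probability
    min(1, L(theta*)/L(theta))."""
    theta = np.asarray(theta0, dtype=float)
    ll = euler_loglik(theta, x, dt)
    draws = np.empty((n_iter, theta.size))
    for k in range(n_iter):
        prop = theta + step * rng.normal(size=theta.size)
        if prop[1] > 0 and prop[2] > 0:  # kappa and sigma must stay positive
            ll_prop = euler_loglik(prop, x, dt)
            if np.log(rng.uniform()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        draws[k] = theta
    return draws

# simulate Vasicek data with the Euler scheme and sample the posterior
rng = np.random.default_rng(4)
dt, n, (mu0, kappa0, sig0) = 1 / 52, 1000, (0.09, 0.9, 0.05)
x = np.empty(n)
x[0] = mu0
for i in range(1, n):
    x[i] = x[i - 1] + kappa0 * (mu0 - x[i - 1]) * dt + sig0 * np.sqrt(dt) * rng.normal()
start = (x.mean(), 1.0, np.std(np.diff(x)) / np.sqrt(dt))
draws = metropolis(x, dt, start, n_iter=2000, step=np.array([0.01, 0.1, 0.002]), rng=rng)
post = draws[500:].mean(axis=0)  # posterior mean after burn-in
```

The data augmentation step of [21,22] would alternate this parameter move with a block update of the (m − 1) latent points between each pair of observations.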

Generalized Method of Moments
The generalized method of moments (GMM), introduced by Hansen [23], is a special case of minimum distance estimation based on moment conditions. Let θ be the vector of parameters; we define

f_t(θ) = u_t ⊗ z_t,

where u_t is an unobservable vector of disturbance terms, z_t is a vector of instrumental variables and "⊗" denotes the Kronecker product. We obtain the moment conditions assuming that the error term is uncorrelated with the instrumental variables; thus we have the orthogonality conditions E{f_t(θ)} = 0. Replacing the theoretical moment conditions with their sample counterpart, g_n(θ) = (1/n) ∑_{t=1}^{n} f_t(θ), the GMM estimator is one that minimizes a squared distance of the sample moments from their population counterpart of zero, given by the quadratic form

Q_n(θ) = g_n(θ)' W_n(θ) g_n(θ), (17)

where W_n(θ) is a positive definite weighting matrix. Choosing W_n(θ) as a consistent estimator of the inverse of the long-run covariance matrix of the moments gives the GMM estimator of θ with the smallest asymptotic covariance matrix, see [23]. For a weighting matrix W_n(θ), the GMM estimator is

θ̂_n = arg min_{θ∈Θ} Q_n(θ).

We assume that there are at least as many moment functions as parameters and that, on the compact parameter space Θ, E[f_t(θ)] = 0 if and only if θ = θ_0, to achieve the identification condition for consistency of the GMM estimator. Consistency results and conditions for asymptotic normality are given in ([41], Theorems 2.6 and 3.1).
To implement the GMM for the diffusion model in (1), we can consider the discretized process

X_{t_{i+1}} − X_{t_i} = κ(µ − X_{t_i})∆ + σ X_{t_i}^γ √∆ ε_{t_{i+1}}.

The error term can be defined as

ε_{t_{i+1}} = X_{t_{i+1}} − X_{t_i} − κ(µ − X_{t_i})∆, (16)

with first and second conditional moments, over the time period ∆, E[ε_{t_{i+1}}] = 0 and E[ε²_{t_{i+1}}] = σ² X_{t_i}^{2γ} ∆. Due to the independence of increments of the Wiener process, we can build from (16) the moments vector

f_t(θ) = (ε_{t_{i+1}}, ε_{t_{i+1}} X_{t_i}, ε²_{t_{i+1}} − σ² X_{t_i}^{2γ} ∆, (ε²_{t_{i+1}} − σ² X_{t_i}^{2γ} ∆) X_{t_i})';

therefore, we have the orthogonality condition E[f_t(θ)] = 0 to construct the GMM estimator of θ. In the rare cases where the true moments are known, they should be used instead of their discretized counterparts to avoid discretization bias.
The GMM provides a more flexible framework than maximum likelihood methods, as no prior knowledge of the transition density of the SDE is assumed. The simple empirical implementation of method-of-moments type estimators, along with a rather low computational cost, has motivated the development of related methods. However, these procedures have poor finite sample properties, and if the moment conditions provide weak parameter identification, which can happen with highly persistent time series, the estimates are subject to large finite sample bias. Besides, the occurrence of local minima in the quadratic form (17) can lead to optimization problems.
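Using four moment conditions of the kind discussed above, the GMM criterion with identity weighting can be sketched as follows (illustrative Python; a second step would replace W = I with an estimate of the optimal weighting matrix):

```python
import numpy as np

def gmm_objective(theta, x, dt):
    """Quadratic form g_n' W g_n with W = I, built from four moment
    conditions based on the Euler residuals of the CKLS model."""
    mu, kappa, sig, gamma = theta
    eps = x[1:] - x[:-1] - kappa * (mu - x[:-1]) * dt   # Euler residual
    v = sig**2 * x[:-1] ** (2 * gamma) * dt             # its conditional variance
    f = np.column_stack([eps, eps * x[:-1], eps**2 - v, (eps**2 - v) * x[:-1]])
    g = f.mean(axis=0)                                  # sample moments g_n
    return g @ g

# simulate a CKLS path with the Euler scheme and inspect the objective
rng = np.random.default_rng(5)
dt, n, theta_true = 1 / 52, 2000, (0.09, 0.9, 0.5, 1.5)
mu0, kappa0, sig0, gamma0 = theta_true
x = np.empty(n)
x[0] = mu0
for i in range(1, n):
    x[i] = (x[i - 1] + kappa0 * (mu0 - x[i - 1]) * dt
            + sig0 * x[i - 1] ** gamma0 * np.sqrt(dt) * rng.normal())
q_true = gmm_objective(theta_true, x, dt)
q_off = gmm_objective((0.2, 0.9, 0.5, 1.5), x, dt)
```

The criterion is close to zero at the true parameters and grows as θ moves away from them; minimizing it numerically yields the GMM estimate, with the local minima caveat noted above.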

Simulation Study
In this section, a simulation study is conducted to compare the performance of the estimation methods described in this article. The different settings were designed to (i) assess the role of persistence in estimation bias, (ii) compare the accuracy of the procedures, (iii) confirm that T → ∞ plays the role that n → ∞ plays in standard discrete-time asymptotics, (iv) examine discretization bias and the impact of different sampling frequencies ∆, and (v) study the effect of volatility on the estimators' performance.

Experimental Design
We consider two models: the one proposed by Vasicek [3], based on the Ornstein-Uhlenbeck [42] process, and the CKLS model proposed by [6]. The latter lacks a tractable likelihood function, while the former admits a closed-form expression for the transition and marginal densities, which allows us to perform exact maximum likelihood estimation and to avoid simulation error by sampling directly from the continuous-time model. The Vasicek model is given by

dX_t = κ(µ − X_t) dt + σ dW_t,

where µ, κ and σ are positive constants and X_{t_0} = x_0 is the initial condition. The parameter µ represents the long-term mean, κ is the speed of mean reversion and σ is the volatility; in the CKLS model, γ is additionally the elasticity of variance. The Monte Carlo setup consists of the Vasicek and CKLS models, under a low mean reversion scenario (highly persistent dependence) and a high mean reversion scenario (low persistent dependence). As it is common to record X_t annualized, we use weekly (∆ = 1/52) and monthly (∆ = 1/12) frequencies. For each case, we also consider two different volatility scenarios, with increasing unconditional variance.
A thousand realizations of random sample paths {X_{i∆}}_{i=1}^{n} are generated for n = 520, 2600, with ∆ = 1/52, which corresponds to weekly data over observation windows of T = 10 and 50 years, respectively, along with sample paths with n = 520 and ∆ = 1/12, which corresponds to monthly data over approximately 43 years. With this design we can evaluate the performance of the estimation methods with a larger sample size n and when the total observation time T is increased while keeping the sample size constant. In addition, the different values of ∆ can give rise to discretization bias, which may appear jointly with estimation bias in those methods that rely on discretization schemes. Table 1 shows the eight simulated scenarios for the Vasicek model, along with the unconditional and conditional means and variances, as the analytical density is available. The first two scenarios correspond to low mean reversion, with increasing unconditional volatility, and the third and fourth are the high mean reversion cases. For the highly persistent scenarios (1 and 2) the parameter values are θ_1 = (µ, κ, σ²) = (0.09, 0.2, 4 × 10^-5) and θ_2 = (0.09, 0.2, 4 × 10^-4), and for the low persistence scenarios (3 and 4) the parameters are θ_3 = (0.09, 0.9, 1.8 × 10^-4) and θ_4 = (0.09, 0.9, 1.8 × 10^-3). The marginal density was kept unchanged while varying the speed of mean reversion; therefore scenarios 1 and 3, and scenarios 2 and 4, share the same marginal density, as illustrated in Figure A1. The settings of scenarios 1 and 2 allow us to check the performance in a near unit-root case, as the autoregressive coefficient of E{X_{t_{i+1}} | X_{t_i}} is 0.996. Scenarios 5-8, although unrelated to real interest rate processes, are limiting cases designed to quantify the impact of persistence separately from changes in the marginal density.
In scenarios 5 and 6 (7 and 8), the mean-reverting force is increased by a factor of 25 (49) from scenarios 1 and 2 and σ is changed in the same proportion to hold the marginal density fixed.
Regarding the scenarios for the CKLS model, a similar scheme was set up, as shown in Table 1, keeping in all scenarios the same drift parameters as in the Vasicek model. The first two scenarios correspond to highly persistent dependence, with parameter values θ_1 = (µ, κ, σ², γ) = (0.09, 0.2, 0.25, 1.5) and θ_2 = (0.09, 0.2, 1, 1.5), and for the low persistence scenarios (3 and 4) the parameters are θ_3 = (0.09, 0.9, 0.5, 1.5) and θ_4 = (0.09, 0.9, 2, 1.5). Figure A1 illustrates the stationary density for scenarios 1 and 2, where the parameter σ² was increased by a factor of 4, just as from scenario 3 to 4. Tables 2 and 3 show the simulation results for the Vasicek model (along with Tables A1-A4, deferred to Appendix A), Table 4 shows the computational performance of the methods, and Tables 5 and 6 (and Tables A5 and A6) refer to the CKLS model. The mean of the parameter estimates over the 1000 replications of the experiment is reported, as well as the standard deviation (SD) and root mean squared error (RMSE), for the estimation methods in Section 2.

Implementation Details
The simulated sample paths for the Vasicek model were constructed from the closed-form transition density, and for the CKLS model the Milstein scheme was used. To reduce discretization bias, paths were generated with daily frequency (∆ = 1/364) and subsamples were taken on a weekly (∆ = 1/52) or monthly (∆ = 1/12) basis. The initial condition X_{t_0} was set to the mean, and the first 1000 observations were discarded as a burn-in period to remove the dependence on the initial value.
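The Milstein step used here for the CKLS paths adds a correction term to the Euler step involving the derivative of the diffusion function. A sketch with illustrative parameter values, subsampling the daily path to the weekly grid as described above:

```python
import numpy as np

def milstein_ckls(mu, kappa, s, gamma, x0, n, dt, rng):
    """Milstein scheme for dX = kappa*(mu - X) dt + s*X^gamma dW: the Euler
    step plus the correction 0.5*sigma*sigma'*(dW^2 - dt)."""
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))
        sig = s * x[i] ** gamma
        dsig = s * gamma * x[i] ** (gamma - 1)  # derivative of sigma in x
        x[i + 1] = (x[i] + kappa * (mu - x[i]) * dt + sig * dw
                    + 0.5 * sig * dsig * (dw * dw - dt))
    return x

rng = np.random.default_rng(6)
daily = milstein_ckls(0.09, 0.2, 0.5, 1.5, x0=0.09, n=3640, dt=1 / 364, rng=rng)
weekly = daily[::7]  # subsample to the weekly frequency, Delta = 1/52
```

The correction term raises the strong convergence order from 0.5 (Euler) to 1, which is why the finer daily grid plus subsampling keeps the discretization bias of the simulated data small.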
As a closed-form expression for the transition density is not available for the CKLS model, the exact maximum likelihood method is not included, as it is unfeasible. The GMM was implemented with four moment conditions, where the first two identify the marginal distribution and the higher-order conditions are nonlinear functions of the first two moments. As the Vasicek model does have a closed-form transition density, the true moments were used. Regarding the MCMC setup, for the Vasicek model the algorithm is iterated 2500 times and the first 500 iterations were discarded. The number of iterations was increased for the CKLS model, as there is an additional parameter to sample; thus 5000 iterations were executed and the first 1000 were discarded. For both models, m = 5 was fixed in the data augmentation step. For all methods, θ was estimated jointly. The drift parameter µ is calculated indirectly, as the drift specification was rewritten to estimate the intercept α = κµ.
Regarding optimization, we chose to minimize the negative log-likelihood function using the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm [24], where the gradient and Hessian matrix are calculated numerically in the optimization routine. As for the choice of initial values, the approach is the following: we fit a linear regression, writing the discrete version of the SDE as in Equation (13), to obtain a rough estimate of the parameters and set the starting values. For the CKLS diffusion parameter γ, we specified a power variance function for the heteroscedasticity structure. Since some of the procedures are computationally expensive, we chose to integrate C and C++ code [43] into the R routines. Table 4 shows run-times for each Monte Carlo iteration, with sample size n = 2600, and evidences the computational cost of simulation-based techniques such as the MCMC. The Kalman filter benefits from using a lower-level programming language like C and has low run-times, though the remaining methods have a similar performance in base R, with the GMM being the slowest.

Discussion
Tables 2 and 5 report the estimates for scenario 1 for the Vasicek and CKLS models, respectively, which features low mean reversion and very small unconditional volatility: a quiet process whose estimation can be challenging. The process is nearly unit-root, which can increase the estimation bias in the drift parameter κ, as reported in [44]. This large bias in the estimation of κ, which controls the speed of mean reversion, is encountered in both tables for the small sample size n across all estimation procedures, with more than 200% relative bias (see Figure 2). The RMSE for κ is similar in both models; the other parameter estimates have small RMSE, but the parameters in the diffusion function incur more bias in the CKLS model, as this model is relatively hard to identify, since it may yield similar volatility functions for different values of σ and γ, while the Vasicek model has a simpler diffusion function. Both bias and RMSE decrease as n and the total observation time T are increased, especially for κ. As opposed to discrete time series, where the sample size n controls the estimation error, it is T that determines the bias and variance of the estimates: the last two columns (weekly and monthly frequency) have similar observation time T but different sample size n, and the RMSEs are close. Exact ML is available for the Vasicek model, thus avoiding discretization bias; nevertheless, the biases are homogeneous across all methods, with the GMM presenting a slightly higher RMSE. When the volatility is increased while keeping the same drift function (Tables A1 and A5, in Appendix A), the bias is moderately increased for κ and the standard deviation is higher than in scenario 1 for the Vasicek model, while both are reduced for the diffusion parameters in the CKLS model, especially γ. All biases shrink as the sample size and observation time increase. The GMM method shows less efficiency, with a larger RMSE in the CKLS model.
The estimation bias of the drift parameter κ increases when the diffusion process lacks dynamics, which happens when κ is small. Scenarios 1 and 2 have a mean reversion parameter κ close to zero, while scenarios 3 (Tables 3 and 6) and 4 (Tables A2 and A6) show high mean reversion. For the low volatility case, scenario 3, the finite sample bias of κ is significantly smaller (close to 40%, see Figure 2) than in the high persistence scenarios for both models, and the estimation bias and standard deviation of µ are also reduced. For the CKLS model, the estimation bias of σ is reduced and the estimation error of γ is similar to scenario 1. RMSE and bias diminish with increasing n and T, with lower RMSE in the weekly scenario for the CKLS model compared with the monthly simulation, which may be due to the discretization error. The higher volatility scenario shows a similar behavior, with smaller bias in the drift parameters than the low mean reversion scenarios, but moderately higher in µ with respect to scenario 3, and the estimation errors for the diffusion parameters of the CKLS model are lower than in scenario 3.

Table 6. Monte Carlo simulation for the CKLS model, (µ, κ, σ, γ) = (0.09, 0.9, 0.707, 1.5), scenario 3: high mean reversion and low volatility. Boldface denotes the best results in terms of bias, standard deviation and RMSE.

Scenario 6 (Table A3) has a mean-reverting force five times higher than scenario 2 while keeping the same marginal distribution. In this setting, the estimation bias shrinks substantially in all schemes, as we are departing from the unit-root case. However, discretization bias starts to arise, as shown in the estimates of the DML and KF methods with monthly frequency (note that scenario 5 is not included, as the results were very close and so too were the conclusions drawn).
This is magnified in the limiting case of scenario 8 (Table A4), where the estimation bias is very small in all schemes, but the discretization bias in DML, KF and, to a lesser extent, LL for κ and σ is large, mostly at the coarser discretization (monthly), noticeably underestimating both parameters (the performance of the methods was analogous in scenario 7, so those results are not reported). Figure 3 illustrates the true (black) and discretized (gray) log-likelihoods for scenarios 1, 3, 6 and 8: departures from the true log-likelihood are larger in the low mean reversion scenarios, while scenarios 6 and 8 display discretization bias. The analysis of the Monte Carlo evidence reveals the following insights:

(i) The dynamics of the process are governed by the drift parameter κ, which determines the persistence of the process by controlling the reversion towards the unconditional mean. As κ → 0, mean reversion vanishes and the correlation between observations approaches one. This increases persistence, which introduces finite sample bias in parametric estimation [45]. The simulations show that increasing κ lowered persistence and, therefore, estimation bias also diminished (see Figure 2). High-persistence scenarios, near the unit-root case, revealed significant estimation bias in the drift parameter κ, but an almost negligible one in the diffusion parameter σ.

(ii) Increasing the volatility parameters had a minor effect on the estimators' performance. In the Vasicek model, higher values of the volatility parameter slightly increased the RMSE in the estimation of σ. On the other hand, the estimation of the parameters in the CKLS diffusion function benefits from richer volatility dynamics, reducing the RMSE.

(iii) In discrete time series, the bias and variance of the estimators are controlled by the sample size n, so that they shrink as n → ∞. In continuous-time models sampled at discrete time points, the bias and variance in the estimation of the drift parameter κ are dominated by the total observation time T = n∆. Under quite general conditions, the estimators of the drift parameters are of order O(T⁻¹), while the estimator of the diffusion parameter σ is of order O(n⁻¹) [46]. The simulated scenarios corroborate this: estimation biases with T = 50 and 43 years were close despite the different frequencies (weekly and monthly, respectively) and sample sizes n (2600 and 520, respectively).

(iv) Discretization bias arises in the DML and KF methods in scenarios with low sampling frequency and low persistence (see Figure 3), and correcting the DML estimates with local linearization does not always remove the bias: both κ and σ are underestimated.

(v) The estimation bias in the drift parameters is similar for the Vasicek and CKLS models. However, the more flexible parametric form of the CKLS volatility function makes estimation more challenging, and the bias and RMSE for those parameters are higher than for the Vasicek model.

Figure 3. Vasicek model log-likelihood for the drift parameter κ with known and estimated parameters, where κ₀ is the true value. The true log-likelihood and its discretized version (∆ = 1/52) are illustrated, along with the estimates obtained by EML and DML, respectively. Note that scenarios with the same κ₀ but a different volatility parameter (scenarios 2, 4, 5 and 7, respectively) are not included, as the estimates were very close and the figures very similar to the ones displayed.
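The dominance of the total observation time T over the sample size n noted above can be checked with a short Monte Carlo sketch (a simplification with hypothetical parameter values, not the exact scenario settings of the study). Exact Vasicek paths are simulated at weekly (n = 2600) and monthly (n = 520) frequencies, so that T is similar in both designs, and κ is estimated in closed form from the AR(1) representation; the relative biases come out comparable despite the five-fold difference in n:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_vasicek(mu, kappa, sigma, x0, n, dt, reps, rng):
    """Simulate `reps` Vasicek paths of length n+1 using the exact transition."""
    phi = np.exp(-kappa * dt)                          # AR(1) autoregressive coefficient
    sd = sigma * np.sqrt((1 - phi**2) / (2 * kappa))   # exact conditional std. dev.
    paths = np.empty((n + 1, reps))
    paths[0] = x0
    for t in range(1, n + 1):
        paths[t] = mu + phi * (paths[t - 1] - mu) + sd * rng.standard_normal(reps)
    return paths

def estimate_kappa(paths, dt):
    """Exact-ML (conditional least squares) estimate of kappa from each path."""
    x, y = paths[:-1], paths[1:]
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    phi_hat = (xc * yc).sum(axis=0) / (xc**2).sum(axis=0)
    return -np.log(phi_hat) / dt

mu, kappa, sigma = 0.09, 0.09, 0.03   # hypothetical low mean-reversion (persistent) setting
reps = 500
# Similar total observation time (T = 50 and ~43 years) at different frequencies.
k_weekly = estimate_kappa(simulate_vasicek(mu, kappa, sigma, mu, 2600, 1/52, reps, rng), 1/52)
k_monthly = estimate_kappa(simulate_vasicek(mu, kappa, sigma, mu, 520, 1/12, reps, rng), 1/12)
print(f"relative bias, weekly : {np.mean(k_weekly) / kappa - 1:+.2f}")
print(f"relative bias, monthly: {np.mean(k_monthly) / kappa - 1:+.2f}")
```

Both designs show a large upward bias in κ of similar magnitude, illustrating that T, not n, governs the quality of the drift estimate.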
Table 7 provides a summary of the properties and finite sample performance of the methods. Regarding accuracy, the differences among the methods are not always clear-cut, and the context of application (e.g., the sampling interval) should be considered when choosing the estimation procedure. The HP method shows the best trade-off between efficiency and speed, followed by the LL method, and MCMC exhibits good accuracy across all scenarios; however, its inherently time-consuming implementation (both methodological and computational) is a major disadvantage. The performance of simpler discretization schemes, like the ones used in DML and KF, depends on the sampling interval, and they should be avoided for large ∆; otherwise, their efficiency is close to that of the other alternatives (see Figure 2). The generalized method of moments was outperformed in the majority of scenarios, notably for higher dimensions of the parameter vector θ. Table 7. Summary of estimation procedures for SDE parameters.

Application to Euribor Series
In this section, we consider four data sets corresponding to four maturities (three, six, nine and twelve months) of the Euribor (Euro Interbank Offered Rate) interest rate series. These daily series span from 15 October 2001 to 30 December 2005 (sample size n = 1077); see Figure 4. Numerous models have been proposed to capture the dynamics of the short-term interest rate, including those by [1,3,4,5,49]. As these models can be nested within the unrestricted model proposed by [6], we estimate the parameters with the methods presented in the previous section. We chose a different drift parametrization of the CKLS model in (18) so that standard errors can be provided for all estimates, with none of them estimated indirectly. The results of the parameter estimation for the CKLS model are shown in Table 8, with the associated standard errors in parentheses. The estimates obtained by the different approaches are quite close, with the generalized method of moments presenting the most noticeable disparity, mainly in the level effect parameter θ4. This was already noticed in [50], where different values of θ4 were obtained using MLE and a GMM estimator. It is important to note that the standard errors associated with the GMM estimation are larger than in the other approaches, as already manifested in the simulations (see Tables 5 and 6 and Tables A5 and A6). Regarding the estimated values for the different maturities, all series are persistent and the main variation comes from the parameters of the volatility function: θ3 increases with maturity and θ4 decreases. The parameter θ4 controls the relationship between the interest rate and the volatility; for all series we have θ4 > 1, which indicates that volatility tends to increase as the rate rises. The estimate of θ4 for the Euribor 12 months is close to 1, which would correspond to the diffusion process proposed by Brennan and Schwartz [49].
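To illustrate how a discrete (Euler) pseudo-maximum likelihood fit can be set up for the CKLS parametrization dX_t = (θ1 − θ2 X_t) dt + θ3 X_t^θ4 dW_t, the following sketch simulates an Euler path with hypothetical parameter values (chosen only so the path stays positive; they are not the Euribor estimates) and maximizes the Gaussian pseudo-likelihood implied by the Euler scheme:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
dt = 1 / 52

# Simulate a CKLS-type path with the Euler scheme (hypothetical parameter values).
theta_true = np.array([0.27, 3.0, 0.7, 1.5])   # (theta1, theta2, theta3, theta4)
n = 2000
x = np.empty(n + 1)
x[0] = 0.09
for t in range(n):
    drift = theta_true[0] - theta_true[1] * x[t]
    vol = theta_true[2] * x[t] ** theta_true[3]
    x[t + 1] = x[t] + drift * dt + vol * np.sqrt(dt) * rng.standard_normal()

def negloglik(theta, x, dt):
    """Negative Gaussian pseudo-log-likelihood from the Euler discretization (DML)."""
    t1, t2, t3, t4 = theta
    if t2 <= 0 or t3 <= 0:
        return np.inf
    mean = x[:-1] + (t1 - t2 * x[:-1]) * dt
    var = (t3 * x[:-1] ** t4) ** 2 * dt
    resid = x[1:] - mean
    return 0.5 * np.sum(np.log(2 * np.pi * var) + resid**2 / var)

res = minimize(negloglik, x0=theta_true * 1.2, args=(x, dt), method="Nelder-Mead")
print("DML estimates:", np.round(res.x, 3))
```

As noted in the simulation study, θ3 and θ4 are only weakly identified when the level X_t varies little, so the diffusion estimates can be strongly correlated and imprecise even when the optimizer converges.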
As goodness-of-fit tests for diffusion processes are available in the literature, we test the parametric form of the drift and diffusion functions estimated for the Euribor series. Table 9 shows the p-values for the goodness-of-fit test suggested by Monsalve-Cobis et al. [51]. The empirical p-value for the drift function is non-significant for every maturity, so the parametric drift specification is not rejected. Conversely, the p-value for the volatility function leads to a strong rejection of the null hypothesis for every maturity, implying that the model is inadequate to explain the volatility of the series.
To further analyze the fitted CKLS model, we use a resampling procedure in the context of state space models (see Section 2.4), as it can provide insight into the validity of the model. The bootstrap technique developed in [52] is easily implemented with the innovations form of the Kalman filter and allows us to approximate the sampling distributions of the parameter estimates. Inference in state space models estimated with the Kalman filter is feasible because an asymptotic theory exists: as seen in Section 2.4, under general conditions the parameter estimates of a state space model are consistent and asymptotically normal. Focusing on the estimates for the Euribor 3 months, Table 10 shows the standard errors obtained from B = 1000 bootstrap resamples, along with the asymptotic standard errors and parameter estimates, and Figure 5 illustrates the bootstrap distribution (histogram) and the asymptotic Gaussian distribution (dashed line) of the estimates of θi, i ∈ {1, 2, 3, 4}. For the drift parameters (θ1 and θ2), the bootstrap and asymptotic distributions are close; however, for the diffusion parameters (θ3 and θ4), the two distributions differ. This is consistent with the conclusions drawn from the goodness-of-fit test (see Table 9), where the parametric form of the drift was not rejected but the diffusion function led to a strong rejection. The bootstrap standard errors of θ3 and θ4 are notably larger and the histograms show slightly skewed distributions. This implies that the CKLS diffusion function specification is not able to explain the dynamics of the Euribor series, although the linear drift with mean reversion seems adequate.
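The idea behind this resampling scheme can be sketched in a simplified form. The sketch below is a plain residual bootstrap on the discretized AR(1) representation of a Vasicek-type model, not the full Kalman-filter innovations bootstrap of [52], and all values are hypothetical: fitted residuals are centered, resampled with replacement, and used to rebuild bootstrap paths from which the estimator is recomputed.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_ar1(x):
    """Closed-form conditional LS fit of x_{t+1} = a + b x_t + e_t."""
    X, Y = x[:-1], x[1:]
    b = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
    a = Y.mean() - b * X.mean()
    resid = Y - a - b * X
    return a, b, resid

# Hypothetical persistent series standing in for an interest-rate sample.
n, a0, b0 = 1000, 0.002, 0.98
x = np.empty(n)
x[0] = a0 / (1 - b0)
for t in range(n - 1):
    x[t + 1] = a0 + b0 * x[t] + 0.001 * rng.standard_normal()

a_hat, b_hat, resid = fit_ar1(x)
B = 500
boot = np.empty(B)
for bi in range(B):
    e = rng.choice(resid - resid.mean(), size=n - 1, replace=True)  # resample centered residuals
    xb = np.empty(n)
    xb[0] = x[0]
    for t in range(n - 1):
        xb[t + 1] = a_hat + b_hat * xb[t] + e[t]   # rebuild a bootstrap path
    boot[bi] = fit_ar1(xb)[1]                      # re-estimate on the bootstrap path
print(f"bootstrap SE of AR(1) coefficient: {boot.std():.4f}")
```

The collection `boot` approximates the sampling distribution of the persistence estimate, which is what Table 10 and Figure 5 report (via the Kalman-filter innovations form) for the four CKLS parameters.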
A deterministic parametric form of the diffusion function might be incapable of capturing the volatility behavior, and a more intricate form, such as stochastic volatility, may be more suitable; this has already been pointed out in the econometric literature, which finds that the data are more accurately represented by stochastic volatility models.

Table 8. Estimated parameters and standard errors (in parentheses) for the CKLS model, dX_t = (θ1 − θ2 X_t) dt + θ3 X_t^θ4 dW_t, fitted to the Euribor series using six estimation methods.

Table 10. Parameter estimates, asymptotic standard errors and bootstrap standard errors for the CKLS model, dX_t = (θ1 − θ2 X_t) dt + θ3 X_t^θ4 dW_t, fitted to the Euribor 3 months series.

Conclusions
We reviewed parametric estimation methods for univariate time-homogeneous SDEs. Interest rate time series are highly persistent, and this strong correlation through time challenges estimation, as the discretized counterpart corresponds to a near unit-root model. To address the problem of estimation and discretization bias, a comparative study of estimation methods was conducted under different settings. Based on the analysis of the simulation results, the following conclusions can be reached. First, the estimation bias is large for the drift parameter κ, which controls the speed of mean reversion, though smaller for the diffusion parameters. The lack of dynamics that emerges when κ is small increases this bias. Discretization bias was discerned in very low persistence scenarios with coarse sampling frequency. Second, the parameter of the diffusion function was accurately estimated in the Vasicek model, but the performance in the CKLS model was worse. Increasing the observation time T reduced the bias, as expected, and increasing the conditional volatility resulted in more accurate estimates. Third, estimation bias and variance are reduced as T = n∆ → ∞, rather than when only the sample size n is increased. This was illustrated in the simulations, where scenarios with different sample size n but similar T had comparable performance. Finally, regarding the parameter estimators, GMM is the least efficient, while the discrete maximum likelihood methods performed similarly, with LL and HP improving on the discretization bias of DML and KF. The MCMC method also provides efficient estimates; however, its more ad hoc, highly model-dependent implementation makes its use more intricate than that of the other methods. All the procedures reviewed in this article are provided in the companion estsde R package.

Acknowledgments: The authors would like to thank two anonymous referees for their insightful suggestions and comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The appendix contains the tabulated results of the simulation study for the Vasicek and CKLS models. The Vasicek scenarios included here are 2, 4, 6 and 8 (see Table 1), all of which have the same marginal distribution (see Figure A1), together with scenarios 2 and 4 of the CKLS model.