Nonparametric Regression Estimation for Multivariate Null Recurrent Processes

Biqing Cai; Dag Tjøstheim

doi:10.3390/econometrics3020265


Department of Mathematics, University of Bergen, 5020 Bergen, Norway

* Author to whom correspondence should be addressed.

Econometrics 2015, 3(2), 265-288; https://doi.org/10.3390/econometrics3020265

This article belongs to the Special Issue Non-Linear Regression Modeling


Abstract

This paper discusses nonparametric kernel regression with the regressor being a d-dimensional β-null recurrent process in the presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate $\sqrt{n (T) h^{d}}$, where $n (T)$ is the number of regenerations for a β-null recurrent process, and that the limiting distribution (with proper normalization) is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of the Federal funds rate with the 3-month and 5-year T-bill rates and find evidence of nonlinearity in the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than that of the linear model.

Keywords:

β-null recurrent; cointegration; conditional heteroscedasticity; Markov chain; nonparametric regression

JEL classifications:

C13; C14; C22; E43

1. Introduction

The interplay of nonlinearity and nonstationarity has been an important topic in recent developments of econometrics. Karlsen and Tjøstheim [1] and Karlsen et al. [2], respectively, discuss the asymptotics for nonparametric estimation of autoregression and cointegrating regression when the regressor is a β-null recurrent Markov process. Using different data generating assumptions (i.e., the regressor is a unit root process with innovations being a linear process), Wang and Phillips [3,4] discuss asymptotics for nonparametric estimation for nonlinear cointegrating regression models. The two frameworks have their own advantages and drawbacks. The β-null recurrence framework generalizes the unit root framework by incorporating more kinds of processes than the unit root process although it encounters some other restrictions. For example, the processes need to be Markov. For more discussion of linkage and difference of these two frameworks, we refer to [5].

The papers [1,2,3,4] focus on nonparametric estimation when the regressor is a univariate process. The issue of estimation when the regressor is a multivariate process has received less attention. As argued by Park and Phillips [6], the difficulty of extending the theory from a univariate regressor to a multivariate regressor is due to the fact that the recurrence property of a higher dimensional random walk process is different from the one dimensional random walk. Dong et al. [7] provide an intuitive example showing that the nonparametric estimate, when the regressor is a bivariate independent random walk, is not consistent. One way to avoid this problem is to use semi-parametric models rather than nonparametric models. For example, Schienle [8] considers an additive model rather than a pure nonparametric model to avoid this problem while in [9] a partial linear model is considered. However, nonparametric estimation is still possible in the multivariate case when the regressors are not independent random walks. Gao and Phillips [10] provide the theory of nonparametric estimation for multivariate regressors when one regressor is a unit root process and the other regressors are stationary processes. The reason why the model setup of [10] works in nonparametric estimation while the two-dimensional random walk does not is due to the fact that a one-dimensional random walk together with a multi-dimensional positive recurrent process form a

1 / 2

-null recurrent system while the two dimensional random walk process is null recurrent but not β-null recurrent for any

β \in (0, 1)

. As discussed in [1], the β-null recurrence property plays a vital role to guarantee validity of nonparametric estimates.

In this paper, we introduce the theory of nonparametric estimation for a multivariate β-null recurrent system. The multivariate β-null recurrent processes include but are not restricted to the case of [10]. For example, our theory can cover the case where two regressors are both random walks but at the same time are cointegrated which is not covered by [10]. This will be discussed in more detail in Section 2. The cointegrated case is of importance in economics because it is well known that many macroeconomic time series are nonstationary but cointegrated such that they are driven by a common stochastic trend. Furthermore, in this paper, we use different mathematical techniques compared to [10]. In their paper, the technique of local time approximation for partial sums of functionals of unit root process is used, while in our paper, we use the Markov chain null recurrence framework.

It is well known that for the nonparametric kernel estimation for a d-dimensional stationary process, the convergence rate is

\sqrt{T h^{d}}

with h being the bandwidth. In our model, the convergence rate is

\sqrt{n (T) h^{d}}

, where

n (T)

is the number of regenerations of the β-null recurrent process which is discussed in more detail in [1] (see also Appendix A). The difference of these two rates is due to the fact that for β-null recurrent processes, the number of observations in a small set (we refer to p. 376 in [1] for the definition and discussion of the small set) is

O_{P} (T^{β} L_{s} (T))

rather than

O_{P} (T)

(in the stationary case), where

L_{s} (.)

denotes a function slowly varying at infinity (cf. p. 6, [11]).
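The slower accumulation of observations in a bounded set can be illustrated numerically. The following is a minimal sketch, assuming a Gaussian random walk (for which β = 1/2) and using the interval [-1, 1] purely as an illustrative bounded set; the function name `mean_visits`, the number of replications and the seed are hypothetical choices, not part of the paper.

```python
import numpy as np

def mean_visits(T, n_rep=300, half_width=1.0, seed=0):
    """Average number of time points at which a Gaussian random walk
    of length T lies inside [-half_width, half_width]."""
    rng = np.random.default_rng(seed)
    counts = np.empty(n_rep)
    for r in range(n_rep):
        walk = np.cumsum(rng.standard_normal(T))  # x_t = x_{t-1} + u_t, x_0 = 0
        counts[r] = np.count_nonzero(np.abs(walk) <= half_width)
    return counts.mean()

visits_T = mean_visits(10_000)
visits_4T = mean_visits(40_000)

# Under a T^{1/2} rate, quadrupling T should roughly double the count,
# whereas a stationary process would roughly quadruple it.
ratio = visits_4T / visits_T
print(visits_T, visits_4T, ratio)
```

The ratio should come out near 2 rather than 4, which is the practical meaning of replacing T by $T^{β} L_{s} (T)$ in the convergence rate.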

Furthermore, unlike [2], which assumes the regression error has constant variance, we allow for conditional heteroscedasticity. This is important for economic or financial time series modeling because many of these series are regarded as containing conditional heteroscedasticity (cf. [12,13,14]). In [15,16], estimation of conditional variance functions in autoregressive and regression models is discussed when the data are stationary. Wang and Wang [17] discuss nonparametric estimation of conditional mean and variance functions when the regressor is a unit root process. Our paper differs from [17] in two ways: first, in our model, the regressor is multivariate rather than univariate; second, we employ the Markov β-null recurrence technique, which is different from the local time approximation technique used in [17].

The rest of this paper is organized as follows. In Section 2, we introduce the model and the nonparametric estimate; in Section 3, the asymptotic properties for the estimator will be provided; in Section 4, Monte Carlo simulations will be conducted to examine the finite sample performance of the estimator; in Section 5, we apply the method to estimate the relationship of Federal funds rate with short term and long term T-bill rates; in Section 6, concluding remarks are made. To make this paper self-contained, we provide some basic notations and theory of Markov processes, especially the β-null recurrent processes in Appendix A. The mathematical proofs are contained in Appendix B.

Throughout this paper, all limits are taken as $T \to \infty$, where T is the sample size; $\to_{d}$ denotes weak convergence and $\to_{p}$ denotes convergence in probability. $O_{P} (.)$ denotes a term of the same stochastic order as its argument, and $o_{P} (.)$ denotes a term of smaller stochastic order.

2. Model and Estimation

We are going to discuss estimation for the following model

y_{t} = g (x_{t}) + ε_{t},

(1)

where

t = 1, \dots, T

,

{x_{t}} = {x_{1, t}, \dots, x_{d, t}}^{τ}

is a d-dimensional β-null recurrent process (see Appendix A for a precise definition),

ε_{t} = σ (x_{t}) e_{t}

with

{e_{t}}

being a positive recurrent process and

σ (.)

being a positive function. When the data are stationary, the model (1) has been widely studied, see, e.g., [15,16] (with univariate regressor) and [18] (with multivariate regressor). Recently, Wang and Wang [17] study the estimation of model (1) with

{x_{t}}

being a univariate unit root process. As we have mentioned in the introduction, our paper has important differences from their paper.

The examples of univariate β-null recurrent processes with

β = 1 / 2

include the random walk process with the innovation having second moment (cf. [19]); the threshold unit root model (cf. [20]) with arbitrary behavior in a compact set and unit root behavior outside the compact set. Moreover, under some regularity conditions, it has been shown that several multivariate Markov processes are β-null recurrent. For example, when

d = 2

, the following models of

{x_{1, t}, x_{2, t}}

are 1/2-null recurrent:

Example 1: ${x_{1, t}}$ is a 1/2-null recurrent process and ${x_{2, t}}$ is a positive recurrent process independent of ${x_{1, t}}$ . This is proved by Lemma 3.1 of [2]. In fact, the independence assumption of ${x_{1, t}}$ and ${x_{2, t}}$ can be relaxed to asymptotic independence. We refer to Example 4.1 of [2] for more discussion of this.
Example 2: ${x_{1, t}}$ and ${x_{2, t}}$ are both unit root processes and cointegrated. More specifically

$x_{t} = A x_{t - 1} + e_{t},$

where ${e_{t}}$ is a bivariate i.i.d. process satisfying some regularity conditions, A is a $2 \times 2$ matrix having one eigenvalue equal to 1 and the other eigenvalue with absolute value less than 1. Myklebust et al. [5] show that in this model, ${x_{t}}$ is $1 / 2$ -null recurrent.
Example 3: ${x_{t}}$ can be a threshold cointegrated process. Consider the model

$x_{t} = A x_{t - 1} I (x_{t - 1} \in C^{c}) + B x_{t - 1} I (x_{t - 1} \in C) + e_{t},$

where C is a compact set in $R^{2}$ , A is a $2 \times 2$ matrix having one eigenvalue equal to 1 and the other eigenvalue with absolute value less than 1, B is an arbitrary matrix and ${e_{t}}$ is a bivariate i.i.d. process satisfying some regularity conditions. Cai et al. [21] prove that ${x_{t}}$ is $1 / 2$ -null recurrent under this model setup.
Example 4: ${x_{t}}$ is generated from $x_{2, t} = f (x_{1, t}) + u_{t}$ , with ${x_{1, t}}$ being a 1/2-null recurrent process and ${u_{t}}$ being an i.i.d. sequence independent of ${x_{1, t}}$ . This is the nonlinear cointegration type model of [2].

Remark 1:

The cases discussed can be extended to dimensions higher than 2. For example, Myklebust et al. [5] show that a d-dimensional VAR(1) model is 1/2-null recurrent if the autoregressive matrix has one eigenvalue equal to 1 and the other eigenvalues have absolute values less than 1. Moreover, Theorem 2 of [5] shows that a one-to-one transformation of a β-null recurrent process is also a β-null recurrent process.

Remark 2:

We can see that the model of [10] is related to Example 1. Our methodology can be applied to other models listed above.

We propose to estimate the functional form

g (x)

at

(x_{1}, \dots, x_{d})

by the conventional local constant method through minimizing

\frac{1}{T} \sum_{t = 1}^{T} {(y_{t} - α)}^{2} K (\frac{x_{t} - x}{h}) over α = g (x_{1}, \dots, x_{d}),

(2)

where

K (\frac{x_{t} - x}{h}) = Π_{i = 1}^{d} k_{i} (\frac{x_{i, t} - x_{i}}{h})

with

k_{i} (.)

being univariate kernel functions and h being the bandwidth parameter^{1}.

Equation (2) implies the resulting estimate is given by

\hat{g} (x) = \frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) y_{t}}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})} .

(3)

So that we have

\begin{matrix} \hat{g} (x) - g (x) = \frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x))}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})} + \frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) ε_{t}}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})} \\ \equiv I_{1} + I_{2}, \end{matrix}

(4)

where

I_{1}

and

I_{2}

are respectively the bias and variance terms for the nonparametric estimate.
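The estimator in Equation (3) is a kernel-weighted average of the $y_{t}$, which can be sketched in a few lines. This is a minimal illustration, assuming the same quartic kernel for every coordinate (the kernel used in Section 4); the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel k(u) = (15/16) (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def product_kernel_weights(X, x, h):
    """K((x_t - x)/h) = prod_i k((x_{i,t} - x_i)/h) for every row x_t of X (shape T x d)."""
    return np.prod(quartic_kernel((X - x) / h), axis=1)

def local_constant_estimate(X, y, x, h):
    """Equation (3): kernel-weighted average of y_t at the point x.
    Returns NaN when no observation falls inside the kernel window."""
    w = product_kernel_weights(X, x, h)
    denom = w.sum()
    return np.nan if denom == 0.0 else float(w @ y / denom)

# Usage on a small regular grid (illustrative data only).
grid = np.linspace(-1.0, 1.0, 21)
X_demo = np.array([[a, b] for a in grid for b in grid])
y_demo = X_demo[:, 0] + 2.0 * X_demo[:, 1]
print(local_constant_estimate(X_demo, y_demo, np.zeros(2), h=0.8))
```

Returning NaN for an empty window is a design choice of this sketch; it matters for null recurrent regressors, where a fixed evaluation point may have no nearby observations in a given sample.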

3. Asymptotic Theory

To study the asymptotics for the estimate (3), we need to introduce some technical assumptions.

A.1.

{x_{t}} = {x_{1, t}, \dots, x_{d, t}}

is a d-dimensional Harris β-null recurrent Markov chain. Let

π_{s} (.)

be an invariant measure of the recurrent process admitting a locally Lipschitz continuous density

p_{s} (.)

which is locally bounded.

σ (.)

is a locally bounded, positive and Lipschitz continuous function such that for a vector x, there exists a constant C such that when y is in a neighbourhood of x, we have

∣ σ (y) - σ (x) ∣ < C ∥ x - y ∥

, where

∥ . ∥

is the Euclidean norm.

{e_{t}}

is an i.i.d. sequence independent of

{x_{t}}

with

E (e_{1}) = 0

,

E (e_{1}^{2}) = 1

and

E (| e_{1} |^{4 + δ}) < \infty

for some

δ > 0

.

A.2.

For any given x,

g (x)

is twice continuously differentiable and the second order partial derivatives are locally bounded and Lipschitz continuous, i.e.,

∣ \frac{\partial^{2} g (x)}{\partial x_{i} \partial x_{j}} - \frac{\partial^{2} g (y)}{\partial x_{i} \partial x_{j}} ∣ < C ∥ x - y ∥

when y is in a neighbourhood of x,

i = 1, \dots, d

and

j = 1, \dots, d

.

A.3.

For

i = 1, \dots, d

, each

k_{i}

is a symmetric, nonnegative and bounded probability density function with compact support. Furthermore, the supports of the kernel functions are small sets.

A.4.

h \to 0

as

T \to \infty

,

u (T) h^{d} \to \infty

as

T \to \infty

, and

h^{2} (\sqrt{u (T) h^{d}}) \to 0

as

T \to \infty

, where

u (T) = T^{β} L_{s} (T)

, see Appendix A (cf. [1]), which is associated with

n (T)

, where

n (T)

is the number of regenerations for the null recurrent Markov chain.

Remark:

Assumption A.1 restricts the regressors to be a β-null recurrent system with some of the examples having been given in Section 2. The assumption on the error term is quite common in the literature, see, e.g., [18]. As shown in Lemma 3.1 of [2], the compound process

{x_{t}, e_{t}}

is also a β-null recurrent process. It is possible to relax the assumption on

{e_{t}}

such that endogeneity and autocorrelation are involved by applying some results in [2]. For the current paper, we use this assumption for illustrative purpose. Assumptions A.2 and A.3 are often used in nonparametric kernel estimation problems. We assume the support of the kernel function to be compact and a small set as in [1]. The bandwidth restriction in Assumption A.4 ensures that the nonparametric estimator is consistent and the estimation bias converges to 0 in probability.

To derive the asymptotic theory for the nonparametric estimator, we need the following three lemmas.

Lemma 1.

Under assumptions A.1–A.4,

\frac{1}{n (T) h^{d}} \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) \to_{p} p_{s} (x) .

Remark.

Lemma 1 shows that the denominator of the nonparametric estimator converges to an invariant density of the null recurrent process which is different from the positive recurrent case where the denominator converges to the probability density of the process. In Example 1 of Section 2,

p_{s} (x) = p_{s_{1}} (x_{1}) \times p_{s_{2}} (x_{2})

, where

p_{s_{1}} (x_{1})

is an invariant density of

{x_{1, t}}

and

p_{s_{2}} (x_{2})

is the (unique) invariant stationary density of

{x_{2, t}}

.

Lemma 2.

Under assumptions A.1–A.4,

\frac{1}{\sqrt{n (T) h^{d}}} \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) ε_{t} \to_{d} N (0, p_{s} (x) σ^{2} (x) \int K^{2} (u) d u),

where

N (., .)

denotes a normal variable and

\int K^{2} (u) d u = Π_{i} \int k_{i}^{2} (v) d v

.

Lemma 3.

Under assumptions A.1–A.4,

\frac{1}{n (T) h^{d}} \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x)) = O_{P} (h^{2}) .

After proving Lemmas 1–3, we can derive the asymptotic distribution for

\hat{g} (x) - g (x)

. We have the following theorem.

Theorem 1.

Under assumptions A.1–A.4,

\sqrt{n (T) h^{d}} (\hat{g} (x) - g (x)) \to_{d} N (0, p_{s}^{- 1} (x) σ^{2} (x) \int K^{2} (u) d u) .

Remark 1.

In this theorem, stochastic normalization is used. As suggested by Equation (A.3) in Appendix A, we also have

\sqrt{u (T) h^{d}} (\hat{g} (x) - g (x)) \to_{d} M N (0, M_{β}^{- 1} (1) p_{s}^{- 1} (x) σ^{2} (x) \int K^{2} (u) d u),

where

M N (.)

denotes a mixed-normal variable. See also the discussion in [1].

Remark 2.

Combining this theorem and Lemma 1, we have

\sqrt{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})} (\hat{g} (x) - g (x)) \to_{d} N (0, σ^{2} (x) \int K^{2} (u) d u) .

This self-normalization quantity is the same as that used in [2].

\int K^{2} (u) d u

is known when a specific kernel function is used^{2}. Hence, for statistical inference or the construction of confidence bands, we need to estimate

σ^{2} (x)

.

In [15,16], different methods for the variance estimation are proposed. Similar to [4] and [17], we estimate this quantity by a localized version of the usual residual-based method, i.e.,

{\hat{σ}}^{2} (x) = \frac{\sum_{t = 1}^{T} {(y_{t} - \hat{g} (x))}^{2} K (\frac{x_{t} - x}{h_{σ}})}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}})},

(5)

where

h_{σ}

is another bandwidth. This is a two-step estimator which corresponds to local constant regression of the square of the residuals on the regressors. To investigate Equation (5), we impose the following assumption on

h_{σ}

.

A.5.

h_{σ}

satisfies the same restrictions as h in Assumption A.4.

Then we have the following theorem.

Theorem 2.

Under assumptions A.1–A.5,

{\hat{σ}}^{2} (x) \to_{p} σ^{2} (x) .

Remark 1.

Theorem 2 shows that the estimator is consistent. In this paper, we focus on the mean function estimation. The investigation of more efficient estimation of

σ^{2} (x)

(cf. [16]) is left for future research.

Remark 2.

Combining Theorem 1 and Theorem 2, and using the Slutsky theorem, we have

\sqrt{\frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})}{{\hat{σ}}^{2} (x) \int K^{2} (u) d u}} (\hat{g} (x) - g (x)) \to_{d} N (0, 1) .

(6)

Thus we can construct the

95 %

confidence interval for the mean function estimator as

\hat{g} (x) \pm 1.96 \times \sqrt{\frac{{\hat{σ}}^{2} (x) \int K^{2} (u) d u}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})}} .

(7)
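Equations (5) and (7) can be combined into a short routine. The sketch below assumes the quartic kernel of Section 4 in every coordinate, for which $\int k^{2} (v) d v = 5 / 7$, so that $\int K^{2} (u) d u = (5 / 7)^{d}$; the function names and the error handling are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel k(u) = (15/16) (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def kernel_weights(X, x, h):
    """Product kernel weights K((x_t - x)/h) for every row of X (shape T x d)."""
    return np.prod(quartic_kernel((X - x) / h), axis=1)

def pointwise_interval(X, y, x, h, h_sigma, z=1.96):
    """Return (g_hat, sigma2_hat, (lower, upper)) at the point x."""
    w = kernel_weights(X, x, h)
    w_s = kernel_weights(X, x, h_sigma)
    if w.sum() == 0.0 or w_s.sum() == 0.0:
        raise ValueError("no observations inside the kernel window at x")
    g_hat = w @ y / w.sum()                               # Equation (3)
    sigma2_hat = w_s @ (y - g_hat) ** 2 / w_s.sum()       # Equation (5)
    int_K2 = (5.0 / 7.0) ** X.shape[1]                    # product quartic kernel
    half = z * np.sqrt(sigma2_hat * int_K2 / w.sum())     # Equation (7)
    return float(g_hat), float(sigma2_hat), (float(g_hat - half), float(g_hat + half))

# Usage with illustrative data: constant mean 1 and noise standard deviation 0.5.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(2000, 2))
y_demo = 1.0 + 0.5 * rng.standard_normal(2000)
g_hat, s2_hat, (lower, upper) = pointwise_interval(X_demo, y_demo, np.zeros(2), 0.5, 0.5)
print(g_hat, s2_hat, lower, upper)
```

Note that the interval width depends on the data only through $\sum_{t} K ((x_{t} - x) / h)$ and ${\hat{σ}}^{2} (x)$, so neither β nor $n (T)$ needs to be known.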

4. Monte Carlo Simulation

In this section, we conduct Monte Carlo simulations to evaluate the finite sample performance of the nonparametric estimator. We focus on the case where

d = 2

. The performance of nonparametric estimation is evaluated by the root mean squared error (RMSE), which is defined by:

R M S E = \sqrt{\frac{1}{N} \frac{1}{T} \sum_{t = 1}^{T} \sum_{n = 1}^{N} {(\hat{g} (x_{1, t_{n}}, x_{2, t_{n}}) - g (x_{1, t_{n}}, x_{2, t_{n}}))}^{2}},

(8)

where

x_{i, t_{n}}

with

i = 1

or 2 denotes the observation at time-t in the n-th replication with total replication number

N = 1000

. For both variables, the Quartic kernel (i.e.,

k (u) = \frac{15}{16} {(1 - u^{2})}^{2} 1 (| u | \leq 1)

) is used. It is well known that the kernel selection plays little role in performance of nonparametric estimation. Bandwidth selection plays an important role in the performance of nonparametric estimates such that a large bandwidth may lead to large bias but small variance and vice versa. From the theoretical analysis, we know that the variance of the estimator is of order

{(T^{β} L_{s} (T) h^{d})}^{- 1}

and the square of bias is of order

h^{4}

. Hence the optimal bandwidth minimizing the mean square error (MSE) of the estimator should be of order

{(T^{- β} L_{s}^{- 1} (T))}^{\frac{1}{4 + d}}

.
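The order of this bandwidth follows from a standard bias-variance calculation, sketched here with unspecified positive constants $C_{1}$ and $C_{2}$ (introduced only for illustration) and with $u (T) = T^{β} L_{s} (T)$:

```latex
\mathrm{MSE} (h) \approx C_{1} h^{4} + \frac{C_{2}}{u (T) h^{d}},
\qquad
\frac{\partial \, \mathrm{MSE} (h)}{\partial h} = 4 C_{1} h^{3} - \frac{d \, C_{2}}{u (T) h^{d + 1}} = 0
\;\Longrightarrow\;
h_{opt} = {\left( \frac{d \, C_{2}}{4 C_{1} u (T)} \right)}^{\frac{1}{4 + d}}
\propto {(T^{- β} L_{s}^{- 1} (T))}^{\frac{1}{4 + d}} .
```

For example, with β = 1/2 and d = 2, and ignoring the slowly varying factor, the exponent of T is $- (1 / 2) / (4 + 2) = - 1 / 12$.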

In the simulation, we concentrate on the case

β = 1 / 2

and

d = 2

, so that we report the simulation results with bandwidth being

c T^{- 1 / 12}

with some pre-specified values of c. We also report the results when

h = h^{*}

, with

h^{*}

chosen by the leave-one-out cross validation method, which is widely used when the data is positive recurrent (cf. p. 50, [22]). From the simulation results below, we can find that the method performs quite well even when the regressors are null recurrent.
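Leave-one-out cross validation over a grid of bandwidths $c T^{- 1 / 12}$ can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the quartic kernel is used in every coordinate, and because that kernel has compact support some observations may have no neighbours, so a bandwidth is discarded when fewer than a fraction `min_coverage` of the points can be predicted (an ad hoc safeguard introduced here).

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel k(u) = (15/16) (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def loo_cv_score(X, y, h, min_coverage=0.9):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    diffs = (X[:, None, :] - X[None, :, :]) / h      # shape (T, T, d)
    W = np.prod(quartic_kernel(diffs), axis=2)       # W[s, t] = K((x_t - x_s)/h)
    np.fill_diagonal(W, 0.0)                         # leave observation s out
    denom = W.sum(axis=1)
    ok = denom > 0.0                                 # points with at least one neighbour
    if ok.mean() < min_coverage:
        return np.inf                                # bandwidth too small for this sample
    g_loo = W[ok] @ y / denom[ok]
    return float(np.mean((y[ok] - g_loo) ** 2))

def select_bandwidth(X, y, c_grid):
    """Pick h* among c * T^(-1/12), c in c_grid, by minimizing the CV score."""
    T = len(y)
    candidates = np.asarray(c_grid, dtype=float) * T ** (-1.0 / 12.0)
    scores = [loo_cv_score(X, y, h) for h in candidates]
    return float(candidates[int(np.argmin(scores))])

# Usage on illustrative data with a clearly nonlinear mean function.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3.0, 3.0, size=(300, 2))
y_demo = np.sin(2.0 * X_demo[:, 0]) + 0.1 * rng.standard_normal(300)
h_star = select_bandwidth(X_demo, y_demo, [1.0, 20.0])
print(h_star)
```

The pairwise weight matrix needs memory of order $T^{2} d$, which is unproblematic for the sample sizes considered in this section.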

Specifically, we consider the following models:

Model 1:

{x_{1, t}} \sim i . i . d . N (0, 1),

x_{2, t} = x_{2, t - 1} + u_{t},

with

{u_{t}} \sim i . i . d . N (0, 1)

y_{t} = g (x_{1, t}, x_{2, t}) + ε_{t},

where

{ε_{t}} \sim i . i . d . N (0, 1)

and is independent of

{x_{1, t}}

and

{x_{2, t}}

. In this model,

{x_{1, t}}

is

i . i . d .

,

{x_{2, t}}

is a random walk and

{x_{t}}

is a 1/2-null recurrent process because it is a special case of Example 1 in Section 2 (it is also a special case of [10]). We let

g (x_{1}, x_{2}) = x_{1} + \frac{1}{1 + x_{2}^{2}}

or

g (x_{1}, x_{2}) = x_{1} \times \frac{1}{1 + x_{2}^{2}}

. The results for Model 1 are summarized in Table 1.

**Table 1.** RMSEs for Model 1.

| Functional Form | c | T = 200 | T = 400 | T = 800 |
|---|---|---|---|---|
| $g (x_{1}, x_{2}) = x_{1} + \frac{1}{1 + x_{2}^{2}}$ | 1 | 0.3707 | 0.2996 | 0.2427 |
| | 2 | 0.3046 | 0.2678 | 0.2357 |
| | 3 | 0.4104 | 0.3777 | 0.3454 |
| | 4 | 0.5490 | 0.5131 | 0.4770 |
| | $h^{*}$ | 0.3183 | 0.2629 | 0.2192 |
| $g (x_{1}, x_{2}) = x_{1} \times \frac{1}{1 + x_{2}^{2}}$ | 1 | 0.3636 | 0.2915 | 0.2360 |
| | 2 | 0.2456 | 0.2071 | 0.1765 |
| | 3 | 0.2295 | 0.2010 | 0.1750 |
| | 4 | 0.2425 | 0.2111 | 0.1818 |
| | $h^{*}$ | 0.2471 | 0.2054 | 0.1744 |

From Table 1, we can see that, because of the trade-off between bias and variance, a bandwidth that is either too large or too small makes the estimator less precise by the RMSE criterion. The bandwidth selected by the cross validation method balances the bias and variance and performs reasonably well, especially when the sample size is large.
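A reduced version of the Model 1 experiment can be sketched as follows. It assumes the additive functional form, the quartic product kernel, a fixed bandwidth $c T^{- 1 / 12}$, and fitted values that include the own observation (whether the paper's fitted values do so is not stated, so this is an assumption). The function names are hypothetical, and the default number of replications is far smaller than the 1000 used in the paper, so the output is not expected to reproduce Table 1 exactly.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel k(u) = (15/16) (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nw_fitted(X, y, h):
    """Local constant fitted values at every sample point (own observation included,
    so every row of W has a strictly positive sum)."""
    W = np.prod(quartic_kernel((X[:, None, :] - X[None, :, :]) / h), axis=2)
    return W @ y / W.sum(axis=1)

def simulate_model1(T, rng):
    """Model 1: x1 i.i.d. N(0,1), x2 a Gaussian random walk, additive g, N(0,1) errors."""
    x1 = rng.standard_normal(T)
    x2 = np.cumsum(rng.standard_normal(T))
    g = x1 + 1.0 / (1.0 + x2**2)
    y = g + rng.standard_normal(T)
    return np.column_stack([x1, x2]), y, g

def rmse_model1(T, c, n_rep=50, seed=0):
    """Equation (8) with N = n_rep replications and bandwidth h = c * T^(-1/12)."""
    rng = np.random.default_rng(seed)
    h = c * T ** (-1.0 / 12.0)
    total = 0.0
    for _ in range(n_rep):
        X, y, g = simulate_model1(T, rng)
        total += np.mean((nw_fitted(X, y, h) - g) ** 2)
    return float(np.sqrt(total / n_rep))

print(rmse_model1(T=200, c=2.0, n_rep=10))
```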

To further assess the finite sample approximation, we compare the normalized quantity in Equation (6) with the standard normal density. We compute the quantity at

[x_{1} x_{2}] = [0 0]

, the sample sizes are respectively 200, 400 and 800, the bandwidth used is

2 T^{- 1 / 12}

, the functional form is

g (x_{1}, x_{2}) = x_{1} \times \frac{1}{1 + x_{2}^{2}}

, the variance is 1 (we use the true function of the variance rather than the estimated quantity), and the replication number is 1000.

Figure 1. In (a) we compute

\sqrt{\frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})}{\int K {(u)}^{2} d u}} (\hat{g} (x) - g (x))

at point

[0 0]

in each replication; in (b) we compute the normalized quantity in

\sqrt{\frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})}{\int K {(u)}^{2} d u}} (\hat{g} (x) - g (x))

at the median of each replication. The number of replications is 1000.

From Figure 1a above, we can see that the approximation to normality is quite good. However, this is not always the case. For other choices of points or functional forms for evaluation, the performance may be much worse. For example, we find that when the true functional form is

g (x_{1}, x_{2}) = x_{1} + \frac{1}{1 + x_{2}^{2}}

, there is a systematic bias in the estimation when the evaluation point is

[0 0]

. This phenomenon looks strange at first glance; however, it is typical when the regressors are not positive recurrent. When the regressors are null recurrent, the simulated realizations may cover very different regions of the x-axis. Hence, for a fixed evaluation point, there may be many observations in the neighbourhood in some replications and few in others (see more discussion in [1]). Figure 1b provides the finite sample approximation of the quantity using different evaluation points for different replications, i.e., for each replication, the evaluation point is the median of the observations.

In the second model, we assume

{x_{t}}

is a cointegrated process. Specifically, the model is:

Model 2:

x_{t} = A x_{t - 1} + u_{t},

with

{u_{t}} \sim i . i . d . N (0, Σ)

with

Σ = (\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix})

,

A = (\begin{matrix} 3 / 4 & 1 / 4 \\ 1 / 4 & 3 / 4 \end{matrix})

.

y_{t} = g (x_{1, t}, x_{2, t}) + ε_{t},

where

{ε_{t}} \sim i . i . d . N (0, 1)

and is independent of

{x_{1, t}}

and

{x_{2, t}}

. In this model,

{x_{t}}

is a 1/2-null recurrent process. We let

g (x_{1}, x_{2}) = x_{1} + \frac{1}{1 + x_{2}^{2}}

or

g (x_{1}, x_{2}) = x_{1} \times \frac{1}{1 + x_{2}^{2}}

. This model falls into the category of Example 2 in Section 2. The results for Model 2 are summarized in Table 2.

**Table 2.** RMSEs for Model 2.

| Functional Form | c | T = 200 | T = 400 | T = 800 |
|---|---|---|---|---|
| $g (x_{1}, x_{2}) = x_{1} + \frac{1}{1 + x_{2}^{2}}$ | 0.1 | 0.5098 | 0.3891 | 0.2922 |
| | 0.2 | 0.3853 | 0.2914 | 0.2234 |
| | 0.4 | 0.3038 | 0.2518 | 0.2437 |
| | 0.8 | 0.3481 | 0.3892 | 0.4826 |
| | $h^{*}$ | 0.2941 | 0.2445 | 0.2009 |
| $g (x_{1}, x_{2}) = x_{1} \times \frac{1}{1 + x_{2}^{2}}$ | 0.5 | 0.2513 | 0.1869 | 0.1412 |
| | 1 | 0.1971 | 0.1560 | 0.1323 |
| | 2 | 0.1890 | 0.1683 | 0.1467 |
| | 3 | 0.2044 | 0.1773 | 0.1583 |
| | 4 | 0.2111 | 0.1864 | 0.1654 |
| | $h^{*}$ | 0.1991 | 0.1637 | 0.1334 |

From Table 2, we can see that for different data generating mechanisms of ${x_{t}}$ or functional forms of g, the appropriate choice of h can be quite different. In addition, the cross validation method serves as a good choice.

Next, we consider the model where the regressors are threshold cointegrated.

Model 3:

x_{t} = A x_{t - 1} 1 (x_{t - 1} \in D) + B x_{t - 1} 1 (x_{t - 1} \in D^{c}) + u_{t},

with

{u_{t}} \sim i . i . d . N (0, Σ)

with

Σ = (\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix})

,

B = (\begin{matrix} 3 / 4 & 1 / 8 \\ 1 & 1 / 2 \end{matrix})

,

A = (\begin{matrix} 3 / 4 & 1 / 4 \\ 1 / 4 & 3 / 4 \end{matrix})

,

D = [τ_{1} τ_{2}] \times [τ_{3} τ_{4}] = [- 3 3] \times [- 2 2]

and

y_{t} = g (x_{1, t}, x_{2, t}) + ε_{t},

where

{ε_{t}} \sim i . i . d . N (0, 1)

and is independent of

{x_{1, t}}

and

{x_{2, t}}

. We know in this model,

{x_{1, t}}

and

{x_{2, t}}

are threshold cointegrated processes with different cointegrating coefficients in different regimes. According to [21], they form a bivariate 1/2-null recurrent system. We let

g (x_{1}, x_{2}) = x_{1} + \frac{1}{1 + x_{2}^{2}}

or

g (x_{1}, x_{2}) = x_{1} \times \frac{1}{1 + x_{2}^{2}}

. The results for Model 3 are summarized in Table 3.

**Table 3.** RMSEs for Model 3.

| Functional Form | c | T = 200 | T = 400 | T = 800 |
|---|---|---|---|---|
| $g (x_{1}, x_{2}) = x_{1} + \frac{1}{1 + x_{2}^{2}}$ | 0.1 | 0.8941 | 0.7909 | 0.6446 |
| | 0.5 | 0.4422 | 0.3715 | 0.3806 |
| | 1 | 0.4502 | 0.5221 | 0.6176 |
| | 2 | 0.7344 | 0.8827 | 1.1070 |
| | $h^{*}$ | 0.4104 | 0.3410 | 0.3031 |
| $g (x_{1}, x_{2}) = x_{1} \times \frac{1}{1 + x_{2}^{2}}$ | 0.5 | 0.4230 | 0.3037 | 0.2298 |
| | 1 | 0.3042 | 0.2625 | 0.2417 |
| | 2 | 0.3411 | 0.3173 | 0.2874 |
| | 3 | 0.3732 | 0.3412 | 0.3042 |
| | $h^{*}$ | 0.3175 | 0.2564 | 0.2155 |

Our conclusion for Table 3 is the same as that for Table 2.
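The eigenvalue condition of Examples 2 and 3 can be checked directly for the matrices A and B used in Models 2 and 3. The following sketch also verifies the implied cointegrating vectors, here taken as left eigenvectors associated with the stable eigenvalue; the vectors $(1, - 1)$ and $(1, - 1 / 2)$ are computed for this illustration and are not quoted from the paper.

```python
import numpy as np

A = np.array([[0.75, 0.25],
              [0.25, 0.75]])
B = np.array([[0.75, 0.125],
              [1.0, 0.5]])

# Both matrices have real eigenvalues; sort them for a stable comparison.
eig_A = np.sort(np.linalg.eigvals(A).real)   # one unit root, one stable root
eig_B = np.sort(np.linalg.eigvals(B).real)

# A vector b with b'M = lambda * b' and |lambda| < 1 makes b'x_t stationary,
# because b'x_t = lambda * b'x_{t-1} + b'u_t within that regime.
b_A = np.array([1.0, -1.0])
b_B = np.array([1.0, -0.5])

print(eig_A, eig_B)
print(np.allclose(b_A @ A, 0.5 * b_A), np.allclose(b_B @ B, 0.25 * b_B))
```

Both matrices have one eigenvalue equal to 1 and one inside the unit circle (1/2 for A and 1/4 for B), and the two regimes have different cointegrating vectors, in line with the description of Model 3.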

The above three models assume the error term to be homoscedastic. In the final example, heteroscedasticity will be taken into account. The model is:

Model 4:

{x_{1, t}} \sim i . i . d . N (0, 1),

x_{2, t} = x_{2, t - 1} + u_{t},

with

{u_{t}} \sim i . i . d . N (0, 1)

y_{t} = g (x_{1, t}, x_{2, t}) + σ (x_{1, t}, x_{2, t}) e_{t},

where

{e_{t}} \sim i . i . d . N (0, 1)

and is independent of

{x_{1, t}}

and

{x_{2, t}}

. We let

g (x_{1}, x_{2}) = x_{1} + \frac{1}{1 + x_{2}^{2}}

with

σ^{2} (x_{1}, x_{2}) = \frac{x_{1}^{2}}{1 + x_{2}^{2}}

. We estimate the conditional variance function using Equation (5) and then calculate the RMSE. In our simulation, the same bandwidth for the mean and variance functions is used.

The results for Model 4 are summarized in Table 4.

**Table 4.** RMSEs for Model 4.

| Functions | c | T = 200 | T = 400 | T = 800 |
|---|---|---|---|---|
| $g (.)$ | 1 | 0.2224 | 0.1796 | 0.1542 |
| | 2 | 0.2615 | 0.2352 | 0.2140 |
| | 3 | 0.3955 | 0.3674 | 0.3395 |
| | 4 | 0.5415 | 0.5082 | 0.4741 |
| | $h^{*}$ | 0.2837 | 0.2565 | 0.2167 |
| $σ^{2} (.)$ | 1 | 0.3890 | 0.3196 | 0.2689 |
| | 2 | 0.3507 | 0.2996 | 0.2651 |
| | 3 | 0.3788 | 0.3369 | 0.2971 |
| | 4 | 0.4321 | 0.3897 | 0.3410 |
| | $h^{*}$ | 0.7900 | 0.3686 | 0.3093 |

From Table 4, we can see that in general the cross validation method does not perform as well as in the homoscedastic case. For some choices of fixed bandwidth, the RMSEs for the mean and variance functions can be smaller. However, cross validation is still a reasonable choice because in practice we do not know the true DGP, which makes it difficult to use a pre-specified bandwidth.

In summary, we can see that the choice of bandwidth has a large impact on the performance of the nonparametric estimates. The leave-one-out cross validation method is still a reasonable option for multivariate nonparametric estimates with β-null recurrent regressors. In the empirical analysis in the next section, this method will be used.

5. Empirical Application to the Relationship of Interest Rates

In this section, we apply the proposed method to study the relationship of three interest rates: the effective Federal funds rate (FF), the 3-month Treasury bill rate (TB3m) and the 5-year Treasury bill rate (TB5y). The Federal Reserve (Fed) implements monetary policy by targeting the effective FF (see, e.g., [23]); the TB3m is a preeminent risk-free rate in the U.S. money market and is often used by researchers as a proxy for the risk-free asset (see, e.g., [24]), and TB5y is often used to represent the long term interest rate (see, e.g., [25]).

In the literature (cf. [23,26]), it is often argued that the interest rates move together according to the expectation hypothesis (EH), such that the Treasury bill rates are equal to the market's expectation for the FF over the term of the TB rates plus a risk premium. Thus, according to conventional EH/monetary policy views, FF “anchors” the U.S. money market. However, according to [27], the short run T-bill rate adjusts before the FF, rather than vice versa. This is possibly due to the fact that if the market anticipates changes in the FF, the T-bill rates will move in advance of the Federal funds rate. Thus, the market should have anticipated the information of the FF before its announcement. Under the same reasoning, if the short term T-bill rate contains short term information of the market, we expect that the long term T-bill rate contains long term information of the market. Hence both short term and long term T-bill rates will influence the FF.

We use monthly data of FF, TB3m and TB5y from the website of the Federal Reserve Bank of St Louis. The sample period is from January 1962 to May 2014 and the sample size is 629. The ADF test suggests that all three series contain unit roots, with p-values for TB3m, TB5y and FF being respectively 0.2807, 0.3513 and 0.2648. We also perform the ADF test for the interest rate differential series ${T B 3 m_{t} - T B 5 y_{t}}$ and the result suggests that the series is stationary, with p-value less than 0.01. This means that the short term and long term T-bill rates are cointegrated with cointegrating vector [1 −1] ^{3}. Thus TB3m and TB5y form a 1/2-null recurrent system, so that we can apply the proposed method to estimate the relationship of FF with TB3m and TB5y ^{4}.

The time series plot of the three series is shown in Figure 2.

Figure 2. Time series plot of FF, TB3m and TB5y: 1962–2014.

From Figure 2, we can see that the three series move together. In particular, FF and TB3m seem to have a close relationship.

To study the relationship of the variables, in an exploratory phase we first examine the relationship of FF with TB5y when TB3m is fixed. Specifically, we plot the fitted values against TB5y when TB3m is 4 or 8. Notice that because TB5y and TB3m are cointegrated, when TB3m is fixed, we can only study the relationship of FF with TB5y when TB5y is within some small region. Otherwise, there will be insufficient observations for the nonparametric estimate^{5}. Hence, when TB3m is 4, we study the relationship with TB5y within

[3.8 4.4]

and when TB3m is 8, we study the relationship with TB5y within

[7.8 8.4]

. In addition, we can plot the

95 %

point-wise confidence intervals according to Equation (7). The results are shown in Figure 3.

Figure 3. In (a) we estimate the relationship of FF and TB5y when TB3m is 4; in (b) we estimate the relationship of FF and TB5y when TB3m is 8.

Figure 3 indicates that the relationship of FF with TB5y is nonlinear and the relationship is largely affected by TB3m.

Similarly, we can examine the relationship of FF with TB3m when TB5y is fixed. When TB5y is 4, we study the relationship with TB3m within

[3.7 4.3]

and when TB5y is 8, we study the relationship with TB3m within

[7.7 8.3]

. The results are shown in Figure 4.

Figure 4. In (a) we estimate the relationship of FF and TB3m when TB5y is 4; in (b) we estimate the relationship of FF and TB3m when TB5y is 8.

From Figure 4, we can see that the relationship of FF with TB3m is nonlinear and that changing TB5y does not make a large difference to the relationship.

In summary, the pairwise exploratory phase suggests that the relationship of FF with TB3m and TB5y may not be linear. The nonlinearity may be due to transaction costs, as suggested by [28], or to policy interventions, as suggested by [30].

Next, we estimate our model (1) by estimating g nonparametrically. We compare the in-sample mean square error of the nonparametric model with the linear model. Define the following MSE

M S E = \frac{1}{T} \sum_{t = 1}^{T} {(F F_{t} - {\hat{F F}}_{t})}^{2},

where

\hat{F F}

is the fitted value from the nonparametric model or the linear model (regress

F F

on

T B 3 m

,

T B 5 y

and a constant). The MSE for the nonparametric model is 0.0995 while the MSE for the linear model is 0.2802. It can be seen that the nonparametric model outperforms the linear model.
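The in-sample computation for the nonparametric fit can be sketched as follows. The code is a minimal illustration, assuming a local constant (Nadaraya-Watson) estimator with a product quartic kernel and a single bandwidth; the kernel used in the empirical application is not stated in this section, and the data below are simulated placeholders rather than the FF, TB3m and TB5y series. The names `quartic`, `nw_fit` and `mse` are hypothetical.

```python
import math
import random


def quartic(u):
    """Quartic (biweight) kernel on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def nw_fit(x0, xs, ys, h):
    """Local constant estimate of g at x0 with a product kernel and one bandwidth h.
    Returns None if no observation lies within the kernel support around x0.
    """
    num = den = 0.0
    for x, y in zip(xs, ys):
        w = 1.0
        for a, b in zip(x, x0):
            w *= quartic((a - b) / h)
        num += w * y
        den += w
    return num / den if den > 0.0 else None


def mse(ys, fitted):
    """Mean squared error between observations and fitted values."""
    return sum((y - f) ** 2 for y, f in zip(ys, fitted)) / len(ys)


# Simulated placeholder data: two cointegrated random walks and a nonlinear response.
rng = random.Random(0)
x1, xs, ys = 0.0, [], []
for _ in range(300):
    x1 += rng.gauss(0.0, 0.1)
    x2 = x1 + rng.gauss(0.0, 0.05)
    xs.append((x1, x2))
    ys.append(math.sin(x1) + 0.5 * x2 + rng.gauss(0.0, 0.05))

h = 0.2  # illustrative bandwidth, not the cross-validated value from the paper
fitted = [nw_fit(x, xs, ys, h) for x in xs]
print(mse(ys, fitted))
```

Every in-sample fitted value is defined here because each observation receives positive weight at its own location.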

However, it is well known that comparison based on the in-sample performance as done above is sensitive to outliers and data mining, see, e.g., [31]. Empirical evidence of out-of-sample forecast performance is generally regarded as more trustworthy. In addition, out-of-sample forecasts can better reflect the information available to forecasters in “real time” (cf. [32]). As emphasized by [33], out-of-sample forecast is the “ultimate test of forecasting model”.

We study the out-of-sample forecasts of the nonparametric model and linear model by comparing their one-step-ahead forecasting performance. Specifically, we define the out-of-sample MSE (OMSE) as

O M S E = \frac{1}{m} \sum_{t = T - m + 1}^{T} {(F F_{t} - {\tilde{F F}}_{t})}^{2},

where

{\tilde{F F}}_{t}

is the fitted value at

(T B 3 m_{t}, T B 5 y_{t})

with the model estimated using the observations up to time

t - 1

^6. Furthermore, the out-of-sample size m is taken to be

[1 / 10 T]

,

[1 / 15 T]

,

[1 / 20 T]

, where T is the full sample size. We choose a relatively small out-of-sample evaluation period because the nonparametric forecast relies on local estimation: if the in-sample proportion is too small and the observation to be forecast is an “outlier”, there may not be enough observations in its neighborhood, which makes the forecast impossible.
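A minimal sketch of the recursive (expanding-window) evaluation described above follows. It assumes a local constant estimator with a product quartic kernel as an illustrative choice, reads [·] as the integer part, and skips any forecast date with no in-sample observation inside the kernel support; the function names are hypothetical.

```python
def quartic(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def nw_fit(x0, xs, ys, h):
    """Local constant fit at x0; None when no observation is near x0."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        w = 1.0
        for a, b in zip(x, x0):
            w *= quartic((a - b) / h)
        num += w * y
        den += w
    return num / den if den > 0.0 else None


def omse(xs, ys, m, h):
    """One-step-ahead out-of-sample MSE over the last m observations.

    For each evaluation date t, the model is fitted on observations before t
    only and evaluated at the regressor value observed at t.
    Returns (OMSE over feasible forecasts, number of infeasible forecasts).
    """
    T = len(ys)
    errors, skipped = [], 0
    for t in range(T - m, T):
        pred = nw_fit(xs[t], xs[:t], ys[:t], h)
        if pred is None:
            skipped += 1          # no neighbours: forecast not possible
        else:
            errors.append((ys[t] - pred) ** 2)
    value = sum(errors) / len(errors) if errors else None
    return value, skipped


# Out-of-sample sizes for the sample size reported in the paper (T = 629).
T = 629
print([T // 10, T // 15, T // 20])  # [62, 41, 31]
```

The count of skipped dates makes the "outlier" problem discussed above visible: a forecast date whose regressor value has no in-sample neighbour within the bandwidth yields no forecast.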

The results for the out-of-sample forecasts are reported in Table 5.

Table 5. OMSE Comparison.

Model	m= [1/10 T]	m= [1/15 T]	m= [1/20 T]
Linear model	0.1018	0.0671	0.0519
Nonparametric model	0.0021	0.0022	0.0015

From Table 5, we can see that the nonparametric model consistently outperforms the linear model in the out-of-sample evaluation. It is also interesting that the OMSE is smaller than the in-sample MSE. This is possible because the volatility increases with the regressors (see Figure 5 below) and our out-of-sample forecasting periods are periods in which interest rates are very low, as seen from Figure 2. The comparison of OMSE provides additional strong evidence of nonlinearity in the relationship.

Figure 5. In (a) we estimate the shape of the variance with TB3m when TB5y = 4; in (b) we estimate the shape of the variance with TB3m when TB5y = 8.

Similarly, we can estimate the conditional variance function using Equation (5) (with the same bandwidth as that used for the mean function estimation). Figure 5 studies the variance conditional on TB3m when TB5y is fixed. When TB5y is 4, we study the shape of the variance with TB3m within

[3.7 4.3]

and when TB5y is 8, we study the relationship with TB3m within

[7.7 8.3]

.
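The two-step variance estimate can be sketched as below, following the form of the estimator written out in the proof of Theorem 2: first estimate the mean at x, then take a kernel-weighted average of squared deviations of the responses from that estimate. The product quartic kernel, the function names and the toy inputs are assumptions for illustration.

```python
def quartic(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def weights(x0, xs, h):
    """Product-kernel weights K((x_t - x0) / h) for every observation."""
    out = []
    for x in xs:
        w = 1.0
        for a, b in zip(x, x0):
            w *= quartic((a - b) / h)
        out.append(w)
    return out


def two_step_variance(x0, xs, ys, h, h_sigma=None):
    """Step 1: local constant mean estimate at x0 with bandwidth h.
    Step 2: kernel-weighted average of (y_t - mean)^2 with bandwidth h_sigma
    (defaults to h, matching the same-bandwidth choice in the text).
    Returns (mean, variance), or None if x0 has no neighbours.
    """
    h_sigma = h if h_sigma is None else h_sigma
    w1 = weights(x0, xs, h)
    if sum(w1) == 0.0:
        return None
    g_hat = sum(w * y for w, y in zip(w1, ys)) / sum(w1)
    w2 = weights(x0, xs, h_sigma)
    if sum(w2) == 0.0:
        return None
    var = sum(w * (y - g_hat) ** 2 for w, y in zip(w2, ys)) / sum(w2)
    return g_hat, var


# Toy check: two observations at the same point with responses 1 and 3.
print(two_step_variance((4.0, 4.0), [(4.0, 4.0), (4.0, 4.0)], [1.0, 3.0], 0.1))
# (2.0, 1.0)
```

Evaluating the function on a grid of TB3m values with TB5y held fixed would trace out a curve of the kind shown in Figure 5.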

Figure 5 indicates that there exists conditional heteroscedasticity.^7 The conditional heteroscedasticity of FF is also found in [34], where a univariate model with an ARCH-type (cf. [12]) volatility function is used. Chan et al. [35] study different models for the short-term interest rate encompassed in the following stochastic differential equation (SDE):

d r_{t} = (α + β r_{t}) d t + σ r_{t}^{γ} d W_{t},

(9)

where

r_{t}

is the (spot) interest rate,

W_{t}

is the standard Brownian motion, α, β, σ and γ are some constants. When

γ = 0

, Equation (9) is the Vasicek [36] model and when

γ = 1 / 2

, Equation (9) is the famous Cox-Ingersoll-Ross (CIR, [37]) model. In the Vasicek model [36], the process is conditionally homoscedastic, and when

γ \neq 0

, there exists conditional heteroscedasticity. The empirical findings of [35] suggest that γ is significantly larger than 0, so that the volatility increases with the interest rate. Our model setup is regression rather than autoregression. However, the interest rates are cointegrated and move in the same direction, so that the finding of [35] implies the volatility in our model also increases with the regressors. This is consistent with the empirical results in our paper.
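To illustrate how γ governs conditional heteroscedasticity in Equation (9), the sketch below discretizes the SDE with an Euler scheme. The discretization, the truncation of negative rates at zero inside the diffusion term, and all parameter values are assumptions chosen for illustration, not estimates from [35] or from this paper.

```python
import math
import random


def diffusion_std(r, sigma, gamma, dt):
    """Conditional standard deviation of the Euler increment given the rate r."""
    return sigma * max(r, 0.0) ** gamma * math.sqrt(dt)


def simulate_ckls(r0, alpha, beta, sigma, gamma, dt, n, seed=0):
    """Euler scheme for d r = (alpha + beta r) dt + sigma r^gamma dW."""
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n):
        r = path[-1]
        drift = (alpha + beta * r) * dt
        shock = diffusion_std(r, sigma, gamma, dt) * rng.gauss(0.0, 1.0)
        path.append(r + drift + shock)
    return path


dt = 1.0 / 12.0  # monthly step, hypothetical
# gamma = 0 (Vasicek): the conditional standard deviation does not depend on r.
print(diffusion_std(4.0, 0.5, 0.0, dt), diffusion_std(8.0, 0.5, 0.0, dt))
# gamma = 1/2 (CIR): the conditional standard deviation grows with r.
print(diffusion_std(4.0, 0.5, 0.5, dt), diffusion_std(8.0, 0.5, 0.5, dt))
path = simulate_ckls(r0=4.0, alpha=0.4, beta=-0.1, sigma=0.5, gamma=0.5, dt=dt, n=120)
print(len(path))
```

The two rate levels 4 and 8 mirror the values at which the regressors are fixed in Figures 3 to 5; any positive γ makes the second standard deviation larger than the first.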

6. Conclusions

In this paper, we establish the asymptotic theory of nonparametric estimation for multivariate β-null recurrent processes in presence of heteroscedasticity. The Monte Carlo simulation results suggest that the nonparametric estimator performs well using bandwidth selected by the cross validation method. The application to the relationship of Federal funds rate with short term and long term T-bill rates indicates the existence of nonlinearity.

In the current paper, the variance function estimation is not discussed in as much detail as the mean function estimation; this is left for future research. In empirical applications, because the regressors may have different ranges, especially in this multivariate nonstationary case, different bandwidths may be used for different regressors to make the estimate more efficient (cf. [18]).

Acknowledgements

We thank the editor and three anonymous referees for invaluable feedback which greatly improved the structure of the paper.

Author Contributions

Biqing Cai is the main author of this paper, and both authors contributed to it.

Appendix

A. Some Markov Theory

To make this paper self-contained, in this Appendix, we introduce necessary notions of Markov chain properties. They are fundamental for deriving the asymptotic theory in this paper. We refer to [1,2] and [38] for more comprehensive treatments.

Let

{X_{t}, t \geq 0}

be a ϕ-irreducible Markov chain on the state space

(E, E)

with transition probability P. This means that for any set

A \in E

with

ϕ (A) > 0

, we have

\sum_{t = 1}^{\infty} P^{t} (x, A) > 0

for all

x \in E

. In this paper, we take

E \subseteq R^{d}

. We further assume that the ϕ-irreducible Markov chain

{X_{t}}

is Harris recurrent.

Definition 1.

A Markov chain

{X_{t}}

is Harris recurrent if, given a neighborhood

N_{x}

of x with

ϕ (N_{x}) > 0

,

{X_{t}}

returns to

N_{x}

with probability one, for any

x \in E

.

The Harris recurrent chain is positive recurrent if there exists an invariant probability measure such that

{X_{t}}

is strictly stationary and is null recurrent otherwise. The Harris recurrence allows one to construct a split chain, which decomposes the partial sum of functions of

{X_{t}}

into blocks of i.i.d. parts and two asymptotically negligible remaining parts. Let

τ_{k}

be the regeneration times, T the number of observations and

n (T)

the number of regenerations as in [1].^8

For the process

{G (X_{t}) : t \geq 0}

, defining

U_{k} = \begin{cases} \sum_{t = 0}^{τ_{0}} G (X_{t}), & k = 0, \\ \sum_{t = τ_{k - 1} + 1}^{τ_{k}} G (X_{t}), & 1 \leq k \leq n (T), \\ \sum_{t = τ_{n (T)} + 1}^{T} G (X_{t}), & k = n (T) + 1, \end{cases}

where

G (\cdot)

is a real function defined on

R^{d}

, then we have

S_{n} (G) = \sum_{t = 0}^{T} G (X_{t}) = U_{0} + \sum_{k = 1}^{n (T)} U_{k} + U_{n (T) + 1} .

(A.1)

From [38], we know that

{U_{k}, k \geq 1}

is a sequence of i.i.d. random variables, and

U_{0}

and

U_{n (T) + 1}

converge to zero almost surely (a.s.) when they are divided by the number of regenerations

n (T)

(using Lemma 3.2 in [1]).
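The decomposition (A.1) can be illustrated with a chain that has an atom. In the sketch below the chain is a path on the integers and the regeneration times are taken to be the visits to state 0; this choice, the function names and the short path are assumptions for illustration, and the general construction uses the split chain of [38].

```python
def regeneration_times(path, atom=0):
    """Times at which the path visits the atom (used here as regeneration times)."""
    return [t for t, x in enumerate(path) if x == atom]


def block_sums(path, taus, G):
    """Return (U_0, [U_1, ..., U_n], U_{n+1}) as in (A.1), with n = len(taus) - 1."""
    T = len(path) - 1
    u0 = sum(G(path[t]) for t in range(0, taus[0] + 1))
    blocks = [
        sum(G(path[t]) for t in range(taus[k - 1] + 1, taus[k] + 1))
        for k in range(1, len(taus))
    ]
    tail = sum(G(path[t]) for t in range(taus[-1] + 1, T + 1))
    return u0, blocks, tail


path = [0, 1, 0, -1, -2, -1, 0, 1, 2]   # observations X_0, ..., X_8
taus = regeneration_times(path)          # [0, 2, 6]
u0, blocks, tail = block_sums(path, taus, abs)
print(u0, blocks, tail)                  # 0 [1, 4] 3
print(u0 + sum(blocks) + tail == sum(abs(x) for x in path))  # True
```

Here n(T) = 2 complete blocks lie between consecutive regeneration times, and the first and last pieces play the role of the two remainder terms.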

The general Harris recurrence only yields stochastic rates of convergence in asymptotic theory of the nonparametric estimators, where distribution and size of the number of regenerations

n (T)

have no a priori known structure but fully depend on the underlying process

{X_{t}}

. To obtain a specific rate of

n (T)

in our asymptotic theory for the null recurrent process, we next impose some restrictions on the tail behavior of the distribution of the recurrence times of the Markov chain.

Definition 2.

A Markov chain

{X_{t}}

is β-null recurrent if there exist a small nonnegative function f, an initial measure λ, a constant

β \in (0, 1)

, and a slowly varying function

L_{f} (\cdot)

such that

E_{λ} (\sum_{t = 1}^{T} f (X_{t})) \sim \frac{1}{Γ (1 + β)} T^{β} L_{f} (T),

(A.2)

as

T \to \infty

, where

E_{λ}

stands for the expectation with initial distribution λ and

Γ (1 + β)

is the Gamma function with parameter

1 + β

.

The definition of a small function f in the above definition can be found in some existing literature (cf. p. 15, [38]). Assuming β-null recurrence restricts the tail behavior of the recurrence time of the process to be a regularly varying function. In fact, for all small functions f, by Lemma 3.1 in [1], we can find an

L_{s} (\cdot)

such that Equation (A.2) holds for the β-null recurrent Markov chain with

L_{f} (\cdot) = π_{s} (f) L_{s} (\cdot)

, where

π_{s}

is an invariant measure of the Markov chain

{X_{t}}

,

π_{s} (f) = \int f (x) π_{s} (d x)

and s is the small function in the minorization inequality (3.4) of [1]. Letting

L_{s} (T) = L_{f} (T) / (π_{s} (f))

and following the argument in [1], we may show that the regeneration number

n (T)

of the β-null recurrent Markov chain

{X_{t}}

has the following asymptotic distribution

\frac{n (T)}{T^{β} L_{s} (T)} \to_{d} M_{β} (1),

(A.3)

where

M_{β} (1)

is the Mittag-Leffler distribution with parameter β (cf. [39]). Since

n (T) < T

a.s. for the null recurrent case by Equation (A.3), the rates of convergence for the nonparametric kernel estimators are slower than those for the stationary time series case (cf. [2]). We also denote

u (T) = T^{β} L_{s} (T)

, which is used in the main text of this paper.
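For intuition about (A.3), consider the simple symmetric random walk on the integers, a standard example of a 1/2-null recurrent chain. The simulation below counts returns to 0 as a stand-in for n(T); quadrupling T should roughly double the average count, in line with the T^{1/2} rate. The use of returns to 0 as regenerations, the replication count and the seed are assumptions for illustration.

```python
import random


def count_returns(T, rng):
    """Number of visits to 0 in T steps of a simple symmetric random walk started at 0."""
    x, n = 0, 0
    for _ in range(T):
        x += 1 if rng.random() < 0.5 else -1
        if x == 0:
            n += 1
    return n


def mean_returns(T, reps=200, seed=0):
    """Average number of returns to 0 over independent replications."""
    rng = random.Random(seed)
    return sum(count_returns(T, rng) for _ in range(reps)) / reps


m1, m4 = mean_returns(1000), mean_returns(4000)
print(m1, m4, m4 / m1)  # the ratio should be near 2 = 4 ** 0.5
```

The counts are far smaller than T itself, which is the source of the slower convergence rates noted above.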

In Section 2 of this paper, some typical examples of multivariate β-null recurrent processes are provided.

B. Mathematical Proofs

Proof of Lemma 1.

According to the split chain technique (cf. (A.1)), we have

\frac{1}{n (T) h^{d}} \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) = \frac{1}{n (T) h^{d}} \{U_{0} (K_{h}) + \sum_{k = 1}^{n (T)} U_{k} (K_{h}) + U_{n (T) + 1} (K_{h})\},

where

K_{h} = Π_{i = 1}^{d} k_{i} (\frac{x_{i, t} - x_{i}}{h})

. We have for

k = 1, \dots, n (T)

E (U_{k} (K_{h})) = π_{s} (K_{h}) = \int k_{1} (\frac{y_{1} - x_{1}}{h}) \dots k_{d} (\frac{y_{d} - x_{d}}{h}) p_{s} (y_{1}, \dots, y_{d}) d y .

Denote

\frac{y_{1} - x_{1}}{h} = z_{1}

, ⋯,

\frac{y_{d} - x_{d}}{h} = z_{d}

, then

\int k_{1} (\frac{y_{1} - x_{1}}{h}) \dots k_{d} (\frac{y_{d} - x_{d}}{h}) p_{s} (y_{1}, \dots, y_{d}) d y = h^{d} \int k_{1} (z_{1}) \dots k_{d} (z_{d}) p_{s} (x_{1} + z_{1} h, \dots, x_{d} + z_{d} h) d z

= h^{d} p_{s} (x) + o (h^{d}) .

And similar to the proof of Lemma 3.2 of [1], we have

P_{λ} (∣ U_{0} (K_{h}) ∣ < \infty) = 1

and

P_{λ} (∣ U_{n (T) + 1} (K_{h}) ∣ < \infty) = 1,

where λ is an arbitrary initial measure.

Then Lemma 1 follows from the weak law of large numbers because

n (T) h^{d} \to \infty

as

T \to \infty

. ☐

Proof of Lemma 2.

We have

\frac{1}{\sqrt{n (T) h^{d}}} \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) ε_{t} = \frac{1}{\sqrt{n (T) h^{d}}} \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) σ (x_{t}) e_{t}

= \frac{1}{\sqrt{n (T) h^{d}}} \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) σ (x) e_{t} + \frac{1}{\sqrt{n (T) h^{d}}} \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) [σ (x_{t}) - σ (x)] e_{t}

(B.1)

\equiv I_{1} + I_{2} .

It follows from Theorem 3.5 of [2] (it is easy to show that the conditions of the theorem hold under our assumptions A.1–A.4) that

I_{1} \to_{d} N (0, p_{s} (x) σ^{2} (x) \int K^{2} (u) d u) .

(B.2)

For the term

I_{2}

, by the i.i.d. assumption on

{e_{t}}

E {[\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) [σ (x_{t}) - σ (x)] e_{t}]}^{2} = E \sum_{t = 1}^{T} K^{2} (\frac{x_{t} - x}{h}) {[σ (x_{t}) - σ (x)]}^{2}

\leq C^{2} h^{2} E \sum_{t = 1}^{T} K^{2} (\frac{x_{t} - x}{h}) = o (u (T) h^{d})

by Assumption A.1 on

σ (.)

. So that by the Markov inequality

I_{2} = o_{P} (1) .

(B.3)

Combining the results (B.1), (B.2) and (B.3), we have proved this lemma.

☐

Proof of Lemma 3.

Using the same split chain technique as in the proof of Lemma 1,

\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x)) = U_{0} (L_{h}) + \sum_{k = 1}^{n (T)} U_{k} (L_{h}) + U_{n (T) + 1} (L_{h}),

with

L_{h} = K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x))

. We have

E (U_{k} (L_{h})) = π_{s} (L_{h}) = \int K (\frac{y - x}{h}) (g (y) - g (x)) p_{s} (y) d y .

Using the change of variables

= h^{d} \int k_{1} (z_{1}) \dots k_{d} (z_{d}) (g (x_{1} + z_{1} h, \dots, x_{d} + z_{d} h) - g (x_{1}, \dots, x_{d})) p_{s} (x_{1} + z_{1} h, \dots, x_{d} + z_{d} h) d z

and by Taylor expansion

= h^{d} \{\int k_{1} (z_{1}) \dots k_{d} (z_{d}) (g_{1}^{\prime} (x_{1}, \dots, x_{d}) z_{1} h + \dots + g_{d}^{\prime} (x_{1}, \dots, x_{d}) z_{d} h

+ \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} g_{i j}^{\prime\prime} ({\tilde{x}}_{1}, \dots, {\tilde{x}}_{d}) z_{i} z_{j} h^{2}) p_{s} (x_{1} + z_{1} h, \dots, x_{d} + z_{d} h) d z\},

where

{\tilde{x}}_{i}

is a real number between

x_{i}

and

x_{i, t}

. So that

E (U_{k} (L_{h})) = O (h^{2 + d})

and

E (\frac{1}{n (T) h^{d}} \sum_{k = 1}^{n (T)} U_{k} (L_{h})) = O (h^{2}) .

(B.4)

Moreover, we have

E (U_{k}^{2} (L_{h})) = \int K^{2} (\frac{y - x}{h}) {(g (y) - g (x))}^{2} p_{s} (y) d y = O (h^{2 d + 2})

and

E (\frac{1}{n (T) h^{d}} \sum_{k = 1}^{n (T)} U_{k} (L_{h}))^{2} = O (\frac{1}{u (T) h^{2 d}} \times h^{2 d + 2}) .

(B.5)

By Equations (B.4) and (B.5) and the Markov inequality

\frac{1}{n (T) h^{d}} \sum_{k = 1}^{n (T)} U_{k} (L_{h}) = O_{P} (h^{2}) + O_{P} (\frac{h}{\sqrt{n (T)}}) = O_{P} (h^{2})

(B.6)

by Assumption A.4.

Using the fact that

| U_{0} (L_{h}) | = | \sum_{t = 1}^{τ_{0}} K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x)) | \leq \sum_{t = 1}^{τ_{0}} | K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x)) |

and by the definition of

τ_{0}

,

E (\sum_{t = 1}^{τ_{0}} | K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x)) |) \leq E (\sum_{t = τ_{0}}^{τ_{1}} | K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x)) |)

= \int K (\frac{y - x}{h}) | g (y) - g (x) | p_{s} (y) d y = O (h^{d + 1}) .

So that by the Markov inequality

\frac{1}{n (T) h^{d}} U_{0} (L_{h}) = O_{P} (\frac{h}{n (T)}) = o_{P} (h^{2})

(B.7)

by Assumption A.4. Similarly, we have

\frac{1}{n (T) h^{d}} U_{n (T) + 1} (L_{h}) = o_{P} (h^{2}) .

(B.8)

Lemma 3 holds because of Equations (B.6)–(B.8).

☐

Proof of Theorem 1.

We have

\hat{g} (x) - g (x) = \frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) (g (x_{t}) - g (x))}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})} + \frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h}) ε_{t}}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h})}

\equiv I_{1} + I_{2} .

So that Theorem 1 follows from Lemmas 1–3 and the Slutsky theorem.

☐

Proof of Theorem 2.

We have

{\hat{σ}}^{2} (x) = \frac{\sum_{t = 1}^{T} {(y_{t} - \hat{g} (x))}^{2} K (\frac{x_{t} - x}{h_{σ}})}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}})} \equiv \frac{B_{1} + B_{2} + B_{3}}{B_{0}},

where

B_{0} = \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}})

,

B_{1} = \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) ε_{t}^{2}

,

B_{2} = \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) {(g (x_{t}) - \hat{g} (x))}^{2}

and

B_{3} = 2 \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) (g (x_{t}) - \hat{g} (x)) ε_{t}

. We have

\frac{B_{1}}{B_{0}} = \frac{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) σ^{2} (x) + \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) (ε_{t}^{2} - σ^{2} (x_{t})) + \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) (σ {(x_{t})}^{2} - σ^{2} (x))}{\sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}})}

= σ^{2} (x) + O_{P} ({(u (T) h_{σ}^{d})}^{- 1 / 2}) + O_{P} (h_{σ}) = σ^{2} (x) + o_{P} (1)

by Lemma 2 and Assumption A.1 on

σ (.)

.

Thus Theorem 2 holds if we can prove

\frac{B_{2}}{B_{0}} = o_{P} (1)

and

\frac{B_{3}}{B_{0}} = o_{P} (1)

.

Adding and subtracting g (x) and expanding the square, we have

B_{2} = \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) {(g (x_{t}) - g (x) + g (x) - \hat{g} (x))}^{2}

\equiv B_{21} + B_{22} + B_{23},

where

B_{21} = \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) {(g (x_{t}) - g (x))}^{2}

,

B_{22} = \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) {(g (x) - \hat{g} (x))}^{2}

, and

B_{23} = 2 \sum_{t = 1}^{T} K (\frac{x_{t} - x}{h_{σ}}) (g (x_{t}) - g (x)) (g (x) - \hat{g} (x))

.

By Lemma 3, we have

\frac{B_{21}}{B_{0}} = o_{P} (1) .

And by Theorem 1,

\frac{B_{22}}{B_{0}} = {(g (x) - \hat{g} (x))}^{2} = o_{P} (1) .

By the Cauchy-Schwarz inequality,

B_{23}^{2} \leq 4 B_{21} B_{22},

which implies

\frac{B_{23}}{B_{0}} = o_{P} (1) .

Again by the Cauchy-Schwarz inequality,

B_{3}^{2} \leq 4 B_{1} B_{2} .

Thus

\frac{B_{3}}{B_{0}} = o_{P} (1) .

So that

{\hat{σ}}^{2} (x) \to_{p} σ^{2} (x) .

☐

Conflicts of Interest

The authors declare no conflict of interest.

References

1. H.A. Karlsen, and D. Tjøstheim. “Nonparametric estimation in null recurrent time series.” Ann. Statist. 29 (2001): 372–416.
2. H.A. Karlsen, T. Myklebust, and D. Tjøstheim. “Nonparametric estimation in a nonlinear cointegration type model.” Ann. Statist. 35 (2007): 252–299.
3. Q. Wang, and P.C.B. Phillips. “Asymptotic theory for local time density and nonparametric cointegrating regression.” Econometr. Theory 25 (2009a): 710–738.
4. Q. Wang, and P.C.B. Phillips. “Structural nonparametric cointegrating regression.” Econometrica 77 (2009b): 1901–1948.
5. T. Myklebust, H.A. Karlsen, and D. Tjøstheim. “Null recurrent unit root processes.” Econometr. Theory 28 (2012): 1–41.
6. J. Park, and P.C.B. Phillips. “Nonlinear regression with integrated time series.” Econometrica 69 (2001): 117–161.
7. C. Dong, J. Gao, D. Tjøstheim, and J. Yin. Specification Testing for Nonlinear Multivariate Cointegrating Regressions. Monash University working paper; Melbourne, Australia: Monash University, 2014.
8. M. Schienle. Nonparametric Nonstationary Regression. University of Mannheim working paper; Mannheim, Germany: University of Mannheim, 2008.
9. J. Chen, J. Gao, and D. Li. “Estimation in semi-parametric regression with non-stationary regressors.” Bernoulli 18 (2012): 678–702.
10. J. Gao, and P.C.B. Phillips. Functional Coefficient Nonstationary Regression with Non- and Semi-parametric Cointegration. Monash University working paper; Melbourne, Australia: Monash University, 2013.
11. N.H. Bingham, C.M. Goldie, and J.L. Teugels. Regular Variation. Cambridge, UK: Cambridge University Press, 1989.
12. R.F. Engle. “Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation.” Econometrica 50 (1982): 987–1008.
13. T. Bollerslev. “Generalized autoregressive conditional heteroskedasticity.” J. Econometr. 31 (1986): 307–327.
14. T. Bollerslev, R.Y. Chou, and K.F. Kroner. “ARCH modeling in finance: A review of the theory and empirical evidence.” J. Econometr. 52 (1992): 5–59.
15. W. Härdle, and A. Tsybakov. “Local polynomial estimators of the volatility function in nonparametric autoregression.” J. Econometr. 81 (1997): 223–242.
16. J. Fan, and Q. Yao. “Efficient estimation of conditional variance functions in stochastic regression.” Biometrika 85 (1998): 645–660.
17. Q. Wang, and Y. Wang. “Nonparametric cointegrating regression with NNH errors.” Econometr. Theory 29 (2013): 1–27.
18. L. Yang, and R. Tschernig. “Multivariate bandwidth selection for local linear regression.” J. R. Statist. Soc.: Ser. B 61 (1999): 793–815.
19. G. Kallianpur, and H. Robbins. “The sequence of sums of random variables.” Duke Math. J. 21 (1954): 285–307.
20. J. Gao, D. Tjøstheim, and J. Yin. “Estimation in threshold autoregressive models with a stationary and a unit root regime.” J. Econometr. 172 (2013): 1–13.
21. B. Cai, J. Gao, and D. Tjøstheim. A New Class of Bivariate Threshold Cointegration Models. University of Bergen working paper; Bergen, Norway: University of Bergen, 2014.
22. A. Pagan, and A. Ullah. Nonparametric Econometrics. Cambridge, UK: Cambridge University Press, 1999.
23. G.D. Rudebusch. “Federal Reserve interest rate targeting, rational expectations, and the term structure.” J. Monet. Econ. 35 (1995): 245–274.
24. T.G. Andersen, and J. Lund. “Estimating continuous-time stochastic volatility models of the short-term interest rate.” J. Econometr. 77 (1997): 343–377.
25. R.J. Shiller, J.Y. Campbell, K.L. Schoenholtz, and L. Weiss. “Forward rates and future policy: Interpreting the term structure of interest rates.” Brook. Papers Econ. Activ. 1 (1983): 173–223.
26. G.D. Rudebusch. “Term structure evidence on interest rate smoothing and monetary policy inertia.” J. Monet. Econ. 49 (2002): 1161–1187.
27. L. Sarno, and D.L. Thornton. “The dynamic relationship between the federal funds rate and the Treasury bill rate: An empirical investigation.” J. Bank. Finan. 27 (2003): 1079–1110.
28. H.M. Anderson. “Transaction costs and non-linear adjustment towards equilibrium in the US Treasury bill market.” Oxford Bull. Econ. Statist. 59 (1997): 465–484.
29. R.H. Clarida, L. Sarno, M.P. Taylor, and G. Valente. “The role of asymmetries and regime shifts in the term structure of interest rates.” J. Bus. 79 (2006): 1193–1224.
30. N.S. Balke, and T.B. Fomby. “Threshold cointegration.” Int. Econ. Rev. 38 (1997): 627–645.
31. H. White. “A reality check for data snooping.” Econometrica 68 (2000): 1097–1126.
32. F.X. Diebold, and G. Rudebusch. “Forecasting output with the composite leading index: A real-time analysis.” J. Am. Statist. Assoc. 86 (1991): 603–610.
33. J. Stock, and M. Watson. “Why has U.S. inflation become harder to forecast?” J. Money Credit Bank. 39 (2007): 3–33.
34. J. Hamilton. “The daily market for Federal funds.” J. Polit. Econ. 104 (1996): 26–56.
35. K.C. Chan, G.A. Karolyi, F.A. Longstaff, and A.B. Sanders. “An empirical comparison of alternative models of the short-term interest rate.” J. Finan. 47 (1992): 1209–1227.
36. O. Vasicek. “An equilibrium characterization of the term structure.” J. Finan. Econ. 5 (1977): 177–188.
37. J.C. Cox, J.E. Ingersoll, and S.A. Ross. “A theory of the term structure of interest rates.” Econometrica 53 (1985): 385–407.
38. E. Nummelin. General Irreducible Markov Chains and Non-negative Operators. Cambridge, UK: Cambridge University Press, 1984.
39. Y. Kasahara. “Limit theorems for Lévy processes and Poisson point processes and their applications to Brownian excursions.” J. Math. Kyoto Univ. 24 (1984): 521–538.

^1. In this paper, we use the same bandwidth for different regressors. This is somewhat restrictive in practice. We leave the case of different bandwidths for different regressors to future research.
^2. For the quartic kernel used in the Monte Carlo simulation, it is equal to ${(\frac{5}{7})}^{d}$.
^3. Recently, many papers have considered the nonlinear dynamics of the term structure of interest rates using nonlinear error correction models; see, e.g., [28,29]. This kind of process is also covered by our theory.
^4. The bandwidth is selected by the leave-one-out cross validation method and is 0.1130.
^5. That is also the reason why we do not try to display the joint estimation results and instead display the relationship of FF and TB5y with TB3m fixed.
^6. That is, we use recursive (expanding) windows in estimation. Compared with rolling-window forecasting, which uses a fixed in-sample size, the recursive window utilizes all the available data when forecasting the next period and thus may increase the precision of the in-sample estimation, which in turn leads to better out-of-sample forecasting.
^7. We also find that the volatility with TB3m fixed is not constant. To save space, we do not report the results here.
^8. They use the notation $T (n)$ instead of $n (T)$.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license ( http://creativecommons.org/licenses/by/4.0/).
