Outlier Detection in Regression Using an Iterated One-Step Approximation to the Huber-Skip Estimator

Johansen, Søren; Nielsen, Bent

doi:10.3390/econometrics1010053

by Søren Johansen ^1,2 and Bent Nielsen ^3,*

¹ Department of Economics, University of Copenhagen, Øster Farimagsgade 5, 1353 Copenhagen, Denmark

² CREATES, Department of Economics and Business, Aarhus University, Fuglesangs Alle 4, 8210 Aarhus, Denmark

³ Department of Economics, University of Oxford & Nuffield College, OX1 1NF, Oxford, UK

* Author to whom correspondence should be addressed.

Econometrics 2013, 1(1), 53-70; https://doi.org/10.3390/econometrics1010053

Submission received: 28 January 2013 / Revised: 3 April 2013 / Accepted: 3 April 2013 / Published: 13 May 2013


Abstract

In regression we can delete outliers based upon a preliminary estimator and re-estimate the parameters by least squares based upon the retained observations. We study the properties of an iteratively defined sequence of estimators based on this idea. We relate the sequence to the Huber-skip estimator. We provide a stochastic recursion equation for the estimation error in terms of a kernel, the previous estimation error and a uniformly small error term. The main contribution is the analysis of the solution of the stochastic recursion equation as a fixed point, and the results that the normalized estimation errors are tight and are close to a linear function of the kernel, thus providing a stochastic expansion of the estimators, which is the same as for the Huber-skip. This implies that the iterated estimator is a close approximation of the Huber-skip.

Keywords:

Huber-skip; iteration; one-step M-estimators; unit roots

1. Introduction and Main Results

Outlier detection in regression is an important topic in econometrics. The idea is to find an estimation method that is robust to the presence of outliers, and the statistical literature abounds in robust methods, since the introduction of M-estimators by Huber [1], see also the monographs Maronna, Martin, and Yohai [2], Huber and Ronchetti [3], and Jurečková, Sen, and Picek [4]. Recent contributions are the impulse indicator saturation method, see Hendry, Johansen, and Santos [5] and Johansen and Nielsen [6], and the Forward Search, see Atkinson, Riani, and Cerioli [7].

The present paper is a contribution to the theory of the robust estimators, where we focus on the Huber [1] skip-estimator that minimizes

\sum_{i = 1}^{n} ρ (y_{i} - β^{'} X_{i}),

where the objective function,

ρ,

is given by

ρ (z) = \frac{1}{2} min (z^{2}, c^{2}) = \frac{1}{2} (z^{2} 1_{(| z | \leq c)} + c^{2} 1_{(| z | > c)}) .

This estimator removes the observations with large residuals, something that, at least in the analysis of economic time series, appears to be a reasonable method.

It is seen that ρ is absolutely continuous with derivative

ρ^{'} (z) = z 1_{(| z | \leq c)},

but

ρ^{'} (z)

is neither monotone nor absolutely continuous, which makes the calculation of the minimizer somewhat tricky, and the asymptotic analysis rather difficult.

Thus the estimator is often replaced by the Winsorized estimator, which has convex objective function

ρ_{1} (z) = \frac{1}{2} z^{2} 1_{(| z | \leq c)} + c (| z | - \frac{1}{2} c) 1_{(| z | > c)}

with derivative

ρ_{1}^{'} (z) = z 1_{(| z | \leq c)} + c sign (z) 1_{(| z | > c)},

which is both monotone and absolutely continuous and hence a lot easier to analyse, see Huber [1]. Note, however, that the function

ρ_{1}

replaces the large residuals by

\pm c,

instead of removing the observation. This is a less common method in time series econometrics.
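To make the contrast between skipping and Winsorizing concrete, the two objective functions and their derivatives can be written as small Python functions. This is only an illustrative sketch; the function names are chosen here and do not come from the paper.

```python
def rho_skip(z, c):
    """Huber-skip objective: 0.5 * min(z^2, c^2)."""
    return 0.5 * min(z * z, c * c)


def rho_skip_prime(z, c):
    """Derivative z * 1(|z| <= c): zero beyond the cut-off, so not monotone."""
    return z if abs(z) <= c else 0.0


def rho_winsor(z, c):
    """Winsorized (convex) objective rho_1."""
    if abs(z) <= c:
        return 0.5 * z * z
    return c * (abs(z) - 0.5 * c)


def rho_winsor_prime(z, c):
    """Derivative of rho_1: residuals beyond c are replaced by +c or -c."""
    if abs(z) <= c:
        return z
    return c if z > 0 else -c
```

For example, with c = 2 a residual of 3 contributes a derivative of 0 under the Huber-skip, so the observation drops out of the first order condition, whereas it contributes a derivative of 2 under the Winsorized objective.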

An alternative simplification was formulated by Bickel [8], who suggested applying a preliminary estimator {\hat{β}}_{n 0} and defining the one-step estimator, {\hat{β}}_{n 1}, by linearising the first order condition. He also suggested iterating this by using {\hat{β}}_{n 1} as initial estimator for {\hat{β}}_{n 2} etc., but no results were given.

In the analysis of the Huber-skip, derived from ρ, we shall replace β by a preliminary estimator in the indicator function, which leads to eliminating the outlying observations, and run a regression on the retained observations. We shall do so iteratively and study the sequence of recursively defined estimators {\hat{β}}_{n m}. We prove under fairly general assumptions on regressors and distribution that for (m, n) \to \infty, the estimator {\hat{β}}_{n m} has the same asymptotic expansion as the Huber-skip, and in this sense {\hat{β}}_{n m}, which is easy to calculate, is a very good approximation to the Huber-skip.

One-step M-estimators have been analysed previously in various situations. Apart from Bickel [8], who considered a situation with fixed regressors and weight functions satisfying certain smoothness and integrability conditions, Ruppert and Carroll [9] considered one-step Huber-skip L-estimators. Welsh and Ronchetti [10] analysed the one-step Huber-skip estimator when the initial estimator is the least squares estimator, as well as one-step M-estimators with general initial estimator but with a function ρ with absolutely continuous derivative. Recently Cavaliere and Georgiev [11] analysed a sequence of Huber-skip estimators for the parameter of an AR(1) model with infinite variance errors in the case where the autoregressive coefficient is 1. Johansen and Nielsen [6] analysed one-step Huber-skip estimators for general n^{1 / 2}-consistent initial estimators and stationary as well as some non-stationary regressors.

Iterated one-step M-estimators are related to iteratively reweighted least squares estimators. Indeed the one-step Huber-skip estimator corresponds to a reweighted least squares estimator with weights of zero or unity. Dollinger and Staudte considered a situation with smooth weights, hence ruling out Huber-skips, and gave conditions for convergence [12]. Their argument was cast in terms of influence functions. Our result for iteration of Huber-skip estimators is similar, but the employed tightness argument is different because of the non-smooth weight function.

Notation: The Euclidean norm for vectors x is denoted | x |. We write (m, n) \to \infty if both m and n tend to infinity. We use the notation o_{P} (1) and O_{P} (1) implicitly assuming that n \to \infty, and \overset{P}{\to} means convergence in probability and \overset{D}{\to} denotes convergence in distribution. For matrices M we choose the spectral norm | | M | | = max \{eigen (M^{'} M)\}^{1 / 2}, so that | | x | | = | x | for vectors x.

2. The Model and the Definition of the One-step Huber-skip

We consider the multiple regression model with p regressors X

y_{i} = β^{'} X_{i} + ε_{i}, i = 1, \dots, n,

(2.1)

and

ε_{i}

is assumed independent of

(X_{1}, \dots, X_{i}, ε_{1}, \dots, ε_{i - 1})

with known density

f,

which does not have to be symmetric. These assumptions allow for both deterministic and stochastic regressors. In particular

X_{i}

can be the lagged dependent variables as for an autoregressive process, and the process can be stationary or non-stationary.

We consider estimation of both β and

σ^{2} .

Thus we start with some preliminary estimator

({\hat{β}}_{n 0}, {\hat{σ}}_{n 0}^{2})

and seek to improve it through an iterative procedure by using it to identify outliers, discard them and then run a regression on the remaining observations. The technical assumptions are listed in Assumption A, see §2.2 below, and allow the regressors to be deterministic or stochastic and stationary or trending.

The preliminary estimator

({\hat{β}}_{n 0}, {\hat{σ}}_{n 0}^{2})

could be a least squares estimator on the full sample, although that is not a good idea from a robustness viewpoint, see Welsh and Ronchetti [10]. Alternatively, the initial estimator,

{\hat{β}}_{n 0},

could be chosen as a robust estimator, as for instance the least trimmed squares estimator of Rousseeuw [13], Rousseeuw and Leroy [14] (p. 180). When the trimming proportion is at most a half, this converges in distribution at the usual n^{1 / 2}-rate, see Víšek [15,16,17], and as {\hat{σ}}_{n 0}^{2} we would choose the least squares residual variance among the trimmed observations, bias corrected as in (2.7) below.

The outliers are identified by first choosing a ψ giving the proportion of good, central observations and then, because

f

is not assumed symmetric, introducing two critical values

\underset{̲}{c}

and

\bar{c}

so

\int_{\underset{̲}{c}}^{\bar{c}} f (v) d v = ψ and \int_{\underset{̲}{c}}^{\bar{c}} v f (v) d v = 0 .

(2.2)

This can also be written as

τ_{0} = ψ

and

τ_{1} = 0

, where

τ_{k}

are the truncated moments

τ_{k} = \int_{\underset{̲}{c}}^{\bar{c}} v^{k} f (v) d v for k \in N_{0} .

(2.3)

If

f

is symmetric we find

c = - \underset{̲}{c} = \bar{c}

and

τ_{2 k + 1} = 0, k \in N_{0} .

Observations are retained based on

({\hat{β}}_{n 0}, {\hat{σ}}_{n 0}^{2})

if their residuals

y_{i} - {\hat{β}}_{n 0}^{'} X_{i}

are in the interval

[\underset{̲}{c} {\hat{σ}}_{n 0}, \bar{c} {\hat{σ}}_{n 0}]

and otherwise deleted from the sample.
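For a standard normal reference density, the cut-off in (2.2) and the truncated second moment in (2.3) have closed forms, and the retention rule is a simple interval check. The sketch below assumes a Gaussian f (an assumption made here for illustration; the text allows general, possibly asymmetric f) and uses hypothetical function names.

```python
from statistics import NormalDist


def gaussian_cutoffs(psi):
    """Return (c, tau_2) for a standard normal reference density.

    The density is symmetric, so the lower and upper critical values are
    -c and c; tau_0 = psi fixes c and tau_1 = 0 holds automatically.
    """
    std = NormalDist()
    c = std.inv_cdf((1.0 + psi) / 2.0)
    # Integration by parts: int_{-c}^{c} v^2 phi(v) dv = psi - 2 c phi(c).
    tau2 = psi - 2.0 * c * std.pdf(c)
    return c, tau2


def retained(residuals, sigma_hat, c_lower, c_upper):
    """Indices whose residuals lie in [c_lower * sigma_hat, c_upper * sigma_hat]."""
    return [i for i, r in enumerate(residuals)
            if c_lower * sigma_hat <= r <= c_upper * sigma_hat]


c, tau2 = gaussian_cutoffs(0.95)
print(c, tau2)
print(retained([0.1, 5.0, -0.3, -4.0], 1.0, -c, c))
```

With ψ = 0.95 the cut-off is the familiar two-sided 5% Gaussian critical value, and the second and fourth residuals in the toy list fall outside the retention interval.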

The Huber-skip,

{\hat{β}}_{n H},

is defined by minimizing

\frac{1}{2} \sum_{i = 1}^{n} [{(y_{i} - X_{i}^{'} β)}^{2} 1_{(\underset{̲}{c} σ \leq y_{i} - X_{i}^{'} β \leq \bar{c} σ)} + {\underset{̲}{c}}^{2} 1_{(y_{i} - X_{i}^{'} β \leq \underset{̲}{c} σ)} + {\bar{c}}^{2} 1_{(\bar{c} σ \leq y_{i} - X_{i}^{'} β)}],

for a given

σ .

If the minimum is attained at a point of differentiability of the objective function, then the solution solves the equation

{\hat{β}}_{n H} = {(\sum_{i = 1}^{n} X_{i} X_{i}^{'} 1_{(\underset{̲}{c} σ \leq y_{i} - X_{i}^{'} {\hat{β}}_{n H} \leq \bar{c} σ)})}^{- 1} \sum_{i = 1}^{n} X_{i} y_{i} 1_{(\underset{̲}{c} σ \leq y_{i} - X_{i}^{'} {\hat{β}}_{n H} \leq \bar{c} σ)} = g_{n} ({\hat{β}}_{n H}) .

(2.4)

We apply this to propose a sequence of recursively defined estimators

({\hat{β}}_{n m}, {\hat{σ}}_{n m}^{2})

by starting with

({\hat{β}}_{n 0}, {\hat{σ}}_{n 0}^{2})

and defining for

m, n = 1, 2, \dots

\begin{matrix} S_{n, m - 1} & = & {i : \underset{̲}{c} {\hat{σ}}_{n, m - 1} \leq y_{i} - X_{i}^{'} {\hat{β}}_{n, m - 1} \leq \bar{c} {\hat{σ}}_{n, m - 1}}, \end{matrix}

(2.5)

\begin{matrix} {\hat{β}}_{n m} & = & {(\sum_{i \in S_{n, m - 1}} X_{i} X_{i}^{'})}^{- 1} \sum_{i \in S_{n, m - 1}} X_{i} y_{i}, \end{matrix}

(2.6)

\begin{matrix} {\hat{σ}}_{n m}^{2} & = & ψ τ_{2}^{- 1} {(\sum_{i \in S_{n, m - 1}} 1)}^{- 1} \sum_{i \in S_{n, m - 1}} {(y_{i} - X_{i}^{'} {\hat{β}}_{n, m})}^{2} . \end{matrix}

(2.7)

Thus, the iterated one-step Huber-skip estimators

{\hat{β}}_{n m}

and

{\hat{σ}}_{n m}^{2}

are the least squares estimator of

y_{i}

on

X_{i}

among the retained observations in

S_{n, m - 1}

based upon

{\hat{β}}_{n, m - 1}

and

{\hat{σ}}_{n, m - 1}^{2} .

The bias correction factor

ψ τ_{2}^{- 1}

in

{\hat{σ}}_{n m}^{2}

is needed to obtain consistency.
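The recursion (2.5)-(2.7) can be sketched in a few lines of Python. The example below is a minimal illustration and not the authors' implementation: it assumes a Gaussian reference density, a regression with an intercept and a single regressor, simulated data with artificially shifted observations, and a full-sample least squares starting point; all function names are hypothetical.

```python
import random
from statistics import NormalDist


def ols(xs, ys):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def iterated_huber_skip(xs, ys, beta0, sigma0, psi=0.95, steps=20):
    """Iterate (2.5)-(2.7) for a Gaussian reference density (an assumption)."""
    std = NormalDist()
    c = std.inv_cdf((1.0 + psi) / 2.0)      # symmetric cut-off, tau_0 = psi
    tau2 = psi - 2.0 * c * std.pdf(c)        # truncated second moment tau_2
    beta, sigma, keep = beta0, sigma0, list(range(len(xs)))
    for _ in range(steps):
        # (2.5): retain observations whose residual lies in [-c*sigma, c*sigma]
        keep = [i for i in range(len(xs))
                if abs(ys[i] - beta[0] - beta[1] * xs[i]) <= c * sigma]
        if len(keep) < 3:
            raise ValueError("too few retained observations")
        xk = [xs[i] for i in keep]
        yk = [ys[i] for i in keep]
        # (2.6): least squares on the retained observations
        beta = ols(xk, yk)
        # (2.7): bias-corrected residual variance among retained observations
        rss = sum((y - beta[0] - beta[1] * x) ** 2 for x, y in zip(xk, yk))
        sigma = (psi / tau2 * rss / len(keep)) ** 0.5
    return beta, sigma, keep


def simulate(n=200, seed=0):
    """Hypothetical data: y = 1 + 2x + noise, with every 20th point shifted by 10."""
    rng = random.Random(seed)
    xs = [i / n for i in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    outliers = list(range(0, n, 20))
    for i in outliers:
        ys[i] += 10.0
    return xs, ys, outliers


xs, ys, outliers = simulate()
b_init = ols(xs, ys)                          # non-robust starting point
s_init = (sum((y - b_init[0] - b_init[1] * x) ** 2
              for x, y in zip(xs, ys)) / len(xs)) ** 0.5
beta_hat, sigma_hat, keep = iterated_huber_skip(xs, ys, b_init, s_init)
print(beta_hat, sigma_hat, len(keep))
```

The number of steps is fixed here for simplicity; in practice one might instead stop when the retained set no longer changes.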

Note that if

{\hat{β}}_{n, m - 1}

and

{\hat{σ}}_{n, m - 1}

are regression- and scale-equivariant, then the updated estimators

{\hat{β}}_{n m}

and

{\hat{σ}}_{n m}

are also regression- and scale-equivariant. Indeed, if

y_{i}

is replaced by

s y_{i} + X_{i}^{'} d

for all i for a scalar

s > 0

and a vector

d,

then

{\hat{β}}_{n, m - 1}

and

{\hat{σ}}_{n, m - 1}

are replaced by

s {\hat{β}}_{n, m - 1} + d

and

s {\hat{σ}}_{n, m - 1}

so that the sets S_{n, m - 1} are unaltered, which in turn leads to regression- and scale-equivariance of {\hat{β}}_{n m} and {\hat{σ}}_{n m} .

2.1. Asymptotic Results

To obtain asymptotic results we need a normalisation matrix N for the regressors. If

X_{i}

is stationary then

N = n^{- 1 / 2} I_{p} .

If

X_{i}

is trending, a different normalisation is needed. For a linear trend component the normalisation is n^{- 3 / 2} and for a random walk component it is n^{- 1} .

We assume that N has been chosen such that matrices Σ and μ exist for which

{\hat{Σ}}_{n} = N^{'} \sum_{i = 1}^{n} X_{i} X_{i}^{'} N \overset{D}{\to} Σ \overset{a . s .}{>} 0, {\hat{μ}}_{n} = n^{- 1 / 2} N^{'} \sum_{i = 1}^{n} X_{i} \overset{D}{\to} μ .

Note that Σ and μ may be stochastic as for instance when

X_{i}

is a random walk and

N = n^{- 1}

.

The estimation errors are denoted

{\hat{u}}_{n m} = \{\begin{matrix} N^{- 1} ({\hat{β}}_{n m} - β) \\ n^{1 / 2} ({\hat{σ}}_{n m} - σ) \end{matrix}\},

(2.8)

and the recursion defined in (2.5), (2.6), and (2.7) can be expressed as

{\hat{u}}_{n m} = G_{n} ({\hat{u}}_{n, m - 1}) .

(2.9)

We introduce coefficient matrices

{\hat{Ψ}}_{n 1} = (\begin{matrix} ψ {\hat{Σ}}_{n} & 0 \\ 0 & 2 τ_{2} \end{matrix}), {\hat{Ψ}}_{n 2} = (\begin{matrix} ξ_{1} {\hat{Σ}}_{n} & ξ_{2} {\hat{μ}}_{n} \\ ζ_{2} {\hat{μ}}_{n}^{'} & ζ_{3} \end{matrix}),

(2.10)

where

ξ_{n} = {(\bar{c})}^{n} f (\bar{c}) - {(\underset{̲}{c})}^{n} f (\underset{̲}{c}), n = 0, \dots, 3 and ζ_{n} = ξ_{n} - ξ_{n - 2} τ_{2} / ψ, n = 2, 3,

(2.11)

and

τ_{2}

is defined in (2.3), and define

{\hat{Γ}}_{n} = {\hat{Ψ}}_{n 1}^{- 1} {\hat{Ψ}}_{n 2} = (\begin{matrix} ψ^{- 1} ξ_{1} I_{p} & ψ^{- 1} ξ_{2} {\hat{Σ}}_{n}^{- 1} {\hat{μ}}_{n} \\ {(2 τ_{2})}^{- 1} ζ_{2} {\hat{μ}}_{n}^{'} & {(2 τ_{2})}^{- 1} ζ_{3} \end{matrix}) .

(2.12)

Here

({\hat{Γ}}_{n}, {\hat{Ψ}}_{n 1}, {\hat{Ψ}}_{n 2}) \overset{D}{\to} (Γ, Ψ_{1}, Ψ_{2}),

where the limits are defined similarly in terms of Σ and

μ .

When

f

is symmetric we let

c = - \underset{̲}{c} = \bar{c}

and find

ζ_{2} = ξ_{2} = 0,

so that Γ is diagonal. Moreover from

ξ_{2 k + 1} = 2 c^{2 k + 1} f (c),

we find

ξ_{1} / ψ = 2 c f (c) / ψ,

and

ζ_{3} / (2 τ_{2}) = c^{3} f (c) / τ_{2} - c f (c) / ψ

and therefore

Γ = diag {2 c f (c) / ψ I_{p}, c f (c) (c^{2} / τ_{2} - 1 / ψ)}

.
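The two diagonal entries of Γ in the symmetric case are easy to evaluate numerically. The sketch below does this for a standard normal density, which is an assumption made here for illustration; the function name is hypothetical.

```python
from statistics import NormalDist


def gamma_diagonal_gaussian(psi):
    """Diagonal entries of Gamma for a standard normal density.

    Returns (xi_1 / psi, zeta_3 / (2 tau_2)), i.e. the entry multiplying
    I_p and the scalar entry for the scale component.
    """
    std = NormalDist()
    c = std.inv_cdf((1.0 + psi) / 2.0)
    fc = std.pdf(c)
    tau2 = psi - 2.0 * c * fc                 # truncated second moment
    beta_entry = 2.0 * c * fc / psi           # 2 c f(c) / psi
    sigma_entry = c * fc * (c * c / tau2 - 1.0 / psi)
    return beta_entry, sigma_entry


for psi in (0.5, 0.8, 0.95, 0.99):
    print(psi, gamma_diagonal_gaussian(psi))
```

For these values of ψ both entries lie strictly between zero and one, so the recursion is a contraction in the Gaussian case; the entries move towards one as ψ decreases, which suggests slower convergence for more aggressive trimming.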

Finally, we define a kernel

K_{n} = {\hat{Ψ}}_{n 1}^{- 1} \sum_{i = 1}^{n} \{\begin{matrix} N^{'} X_{i} ε_{i} \\ n^{- 1 / 2} (ε_{i}^{2} - σ^{2} τ_{2} / ψ) \end{matrix}\} 1_{(\underset{̲}{c} σ \leq ε_{i} \leq σ \bar{c})} .

(2.13)

The analysis of the one-step estimator in Johansen and Nielsen [6] shows that, by linearising

G_{n},

the one-step estimation errors

{\hat{u}}_{n m}

satisfy the recursion equation

{\hat{u}}_{n m} = G_{n} ({\hat{u}}_{n, m - 1}) = {\hat{Γ}}_{n} {\hat{u}}_{n, m - 1} + K_{n} + R_{n} ({\hat{u}}_{n, m - 1}),

(2.14)

for some remainder term

R_{n} ({\hat{u}}_{n, m - 1})

. In this notation it is emphasized that the remainder term is a function of the previous estimation error

{\hat{u}}_{n, m - 1},

see Lemma 5.1 in the Appendix for a precise formulation.

It will be shown in Section 3 that if max | eigen (Γ) | < 1 a.s., so that Γ is a contraction, then

{\hat{u}}_{n m} - {(I_{1 + p} - {\hat{Γ}}_{n})}^{- 1} K_{n} \overset{P}{\to} 0 for (m, n) \to \infty,

that is, for any η and

ϵ > 0

there exist

m_{0}

and

n_{0}

such that for

m \geq m_{0}

and

n \geq n_{0}

it holds that

P (| {\hat{u}}_{n m} - {(I_{1 + p} - {\hat{Γ}}_{n})}^{- 1} K_{n} | \geq η) \leq ϵ .

We therefore define

{\hat{u}}_{n *} = {(I_{1 + p} - {\hat{Γ}}_{n})}^{- 1} K_{n}

and note that it satisfies the equation

{\hat{u}}_{n *} = {\hat{Γ}}_{n} {\hat{u}}_{n *} + K_{n},

(2.15)

and in this sense the estimation error of

(β, σ)

has the same limit distribution as the fixed point of the linear function

u ⟼ {\hat{Γ}}_{n} u + K_{n}

.
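The role of the contraction condition can be seen in a scalar toy version of the linear map: iterating u ↦ γ u + k from any starting value approaches the fixed point k / (1 - γ) geometrically when |γ| < 1. The sketch below ignores the remainder term and uses made-up numbers; the function name is hypothetical.

```python
def iterate_linear_map(gamma, k, u0, steps):
    """Apply u -> gamma * u + k repeatedly, starting from u0."""
    u = u0
    for _ in range(steps):
        u = gamma * u + k
    return u


gamma, k = 0.24, 1.0
fixed_point = k / (1.0 - gamma)
for m in (1, 5, 20):
    # The distance to the fixed point shrinks by the factor gamma per step.
    print(m, abs(iterate_linear_map(gamma, k, 50.0, m) - fixed_point))
```

The distance to the fixed point after m steps is |γ|^m times the initial distance, which mirrors the exponentially decreasing first term in the matrix case.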

Moreover it follows from Johansen and Nielsen [19] that, for the case of known

σ = 1

and symmetric density, the Huber skip has the stochastic expansion

{\hat{β}}_{n H} = (I_{p}, 0) {(I_{1 + p} - {\hat{Γ}}_{n})}^{- 1} K_{n} + o_{P} (1)

and hence the same asymptotic distribution as

(I_{p}, 0) {\hat{u}}_{n *} .

In addition, it holds that

n^{1 / 2} ({\hat{β}}_{n H} - {\hat{β}}_{n m}) \overset{P}{\to} 0 for (n, m) \to \infty .

Finally the asymptotic distribution of

K_{n},

and therefore

{\hat{u}}_{n *},

is discussed in Section 4.

2.2. Assumptions for the Asymptotic Analysis

The assumptions are fairly general, in particular we do not assume that

f

is symmetric.

Assumption A Consider model (2.1). Assume

(i) The density f has continuous derivative f^{'} and satisfies

(a): ${sup}_{v \in R} {(1 + v^{4}) f (v) + (1 + v^{2}) | f^{'} (v) |} < \infty,$
(b): it has mean zero, variance one, and finite fourth moment,
(c): $\bar{c}, \underset{̲}{c}$ are chosen so $τ_{0} = ψ$ and $τ_{1} = 0$.

(ii) For a suitable normalization matrix N \to 0, the regressors satisfy, jointly,

(a): ${\hat{Σ}}_{n} = N^{'} \sum_{i = 1}^{n} X_{i} X_{i}^{'} N \overset{D}{\to} Σ \overset{a . s .}{>} 0,$
(b): ${\hat{μ}}_{n} = n^{- 1 / 2} N^{'} \sum_{i = 1}^{n} X_{i} \overset{D}{\to} μ,$
(c): ${max}_{i \leq n} E {| n^{1 / 2} N^{'} X_{i} |}^{4} = O (1) .$

(iii) The initial estimator error satisfies (N^{- 1} ({\hat{β}}_{n 0} - β), n^{1 / 2} ({\hat{σ}}_{n 0} - σ)) = O_{P} (1) .

3. The Fixed Point Result

The fixed point result is primarily a tightness result. Thus, for the moment, only tightness of the kernel

K_{n}

is needed, and it is not necessary to establish the limit distribution, which is discussed in Section 4. The first result is a tightness result for the kernel, see (2.13).

Theorem 3.1 Suppose Assumption A (ib, iic) holds. Then K_{n}, see (2.10) and (2.13), is tight, that is,

K_{n} = {\hat{Ψ}}_{n 1}^{- 1} \sum_{i = 1}^{n} \{\begin{matrix} N^{'} X_{i} ε_{i} \\ n^{- 1 / 2} (ε_{i}^{2} - σ^{2} τ_{2} / ψ) \end{matrix}\} 1_{(\underset{̲}{c} σ \leq ε_{i} \leq σ \bar{c})} = O_{P} (1) .

The proof follows from Chebyshev’s inequality and the details are given in the appendix.

The next result discusses one step of the iteration (2.14), and it is shown that the remainder term

R_{n} (u)

in (2.14) vanishes in probability uniformly in

| u | \leq U .

Theorem 3.2 Let m be fixed. Suppose Assumption A holds for the initial estimator

{\hat{u}}_{n, m - 1},

see (2.8). Then, for all

U > 0

, it holds that

{\hat{u}}_{n m} = {\hat{Γ}}_{n} {\hat{u}}_{n, m - 1} + K_{n} + R_{n} ({\hat{u}}_{n, m - 1}),

where the remainder term satisfies

sup_{| u | \leq U} | R_{n} (u) | = o_{P} (1) .

The proof involves a chaining argument that was given in Johansen and Nielsen [6], although there the result was written up in a slightly different way as discussed in the appendix.

The iterated estimators start with an initial estimator ({\hat{β}}_{n 0}, {\hat{σ}}_{n 0}) with tight estimation error, see Assumption A(iii). This is iterated through the one-step recursion (2.14) and defines the sequence of estimation errors {\hat{u}}_{n m}. We next show that this sequence is tight uniformly in m.

Theorem 3.3 Suppose Assumption A holds and that

max | eigen (Γ) | < 1

a.s. so that Γ is a contraction. Then the sequence of estimation errors

{\hat{u}}_{n m}

is tight uniformly in m

sup_{0 \leq m < \infty} | {\hat{u}}_{n m} | = O_{P} (1) .

That is, for all

ϵ > 0

there exist

U > 0

and

n_{0} > 0,

so that for all

n \geq n_{0}

it holds that

P (sup_{0 \leq m < \infty} | {\hat{u}}_{n m} | > U) < ϵ .

The proof is given in the appendix, but the idea of the proof is to write the solution of the recursive relation (2.14) as

{\hat{u}}_{n m} = {\hat{Γ}}_{n}^{m} {\hat{u}}_{n 0} + \sum_{ℓ = 1}^{m} {\hat{Γ}}_{n}^{ℓ - 1} \{K_{n} + R_{n} ({\hat{u}}_{n, m - ℓ})\} .

(3.1)

Then, if the initial estimator

{\hat{u}}_{n 0}

takes values in a large compact set with large probability, it follows from (3.1), by finite induction, that also

{\hat{u}}_{n m}

takes values in the same compact set for all

m

, and therefore

{\hat{u}}_{n m}

is tight uniformly in

m

.

Finally we give the fixed point result. Theorem 3.4 shows that the estimator has the same limit distribution as the solution of equation (2.15),

{\hat{u}}_{n *} = {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n},

which is a fixed point of the linear function

u \mapsto {\hat{Γ}}_{n} u + K_{n} .

Theorem 3.4 Suppose Assumption A holds and that

max | eigen (Γ) | < 1

a.s. so that Γ is a contraction. Then

{\hat{u}}_{n m} - {\hat{u}}_{n *} = {\hat{u}}_{n m} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} \overset{P}{\to} 0 for (m, n) \to \infty .

That is, for all ϵ and

η > 0,

an

n_{0} > 0

and

m_{0} > 0

exist so that for all

n \geq n_{0}

and

m \geq m_{0}

it holds

P (| {\hat{u}}_{n m} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} | > η) < ϵ .

Using

\sum_{ℓ = 1}^{m} {\hat{Γ}}_{n}^{ℓ - 1} = {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} (I_{p + 1} - {\hat{Γ}}_{n}^{m})

we find from (3.1) that

{\hat{u}}_{n m} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} = {\hat{Γ}}_{n}^{m} ({\hat{u}}_{n 0} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n}) + \sum_{ℓ = 1}^{m} {\hat{Γ}}_{n}^{ℓ - 1} R_{n} ({\hat{u}}_{n, m - ℓ}) .

(3.2)

From (3.2) it can be seen that

| {\hat{u}}_{n m} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} |

is the sum of two terms vanishing in probability, where the first decreases exponentially. The details are given in the Appendix.
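The step from (3.1) to (3.2) can be spelled out as follows. This is a sketch that starts from the unrolled recursion with the remainder in the ℓ-th summand evaluated at {\hat{u}}_{n, m - ℓ}, as in (3.2), and uses only the geometric-sum identity and the fact that {\hat{Γ}}_{n} commutes with {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1}. The shorthand in the comments is introduced here for readability and is not the paper's notation.

```latex
% Shorthand (not in the original): \Gamma = \hat{\Gamma}_n, K = K_n,
% u_m = \hat{u}_{nm}, I = I_{p+1}.
\begin{align*}
u_m &= \Gamma^m u_0 + \sum_{\ell=1}^{m} \Gamma^{\ell-1} K
       + \sum_{\ell=1}^{m} \Gamma^{\ell-1} R_n(u_{m-\ell}) \\
    &= \Gamma^m u_0 + (I-\Gamma)^{-1}(I-\Gamma^m) K
       + \sum_{\ell=1}^{m} \Gamma^{\ell-1} R_n(u_{m-\ell}) \\
    &= (I-\Gamma)^{-1} K + \Gamma^m \bigl\{ u_0 - (I-\Gamma)^{-1} K \bigr\}
       + \sum_{\ell=1}^{m} \Gamma^{\ell-1} R_n(u_{m-\ell}).
\end{align*}
```

Moving {(I - Γ)}^{- 1} K to the left-hand side gives (3.2). The first term on the right is then controlled by the powers of the contraction, and the second by the uniform bound on the remainder together with the summability of the norms of those powers.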

In the special case where σ is known, then

{\hat{u}}_{n m}

reduces to

{\hat{b}}_{n m} = N^{- 1} ({\hat{β}}_{n m} - β)

and

Γ = ψ^{- 1} ξ_{1} I_{p},

and

{\hat{β}}_{n H}

becomes a fixed point of the mapping

g_{n}

defined in (2.4). The estimator

{\hat{b}}_{n *} = {(ψ - ξ_{1})}^{- 1} {\hat{Σ}}_{n}^{- 1} \sum_{i = 1}^{n} N^{'} X_{i} ε_{i} 1_{(\underset{̲}{c} σ < ε_{i} \leq \bar{c} σ)}

appears as the leading term for other robust estimators, such as the Least Trimmed Squares estimator discussed later on.

A necessary condition for the result is that the autoregressive coefficient matrix Γ is contracting. Therefore Γ is analyzed next.

Theorem 3.5 The autoregressive coefficient matrix Γ in (2.12) has

p - 1

eigenvalues equal to

ξ_{1} / ψ

and two eigenvalues solving

λ^{2} - (\frac{ζ_{3}}{2 τ_{2}} + \frac{ξ_{1}}{ψ}) λ + \frac{1}{2 τ_{2} ψ} (ζ_{3} ξ_{1} - ζ_{2} ξ_{2} μ^{'} Σ^{- 1} μ) = 0,

where the coefficients

ζ_{n}

and

ξ_{n}

are given in (2.11).

Further results can be given about the eigenvalues of Γ for symmetric densities, where

ξ_{2} = 0,

and

Γ = diag (ξ_{1} ψ^{- 1} I_{p}, ζ_{3} / (2 τ_{2}))

. Note that the quantities

(c, τ, ξ_{n}, ζ_{n})

all depend on

ψ,

see (2.2), (2.3), and (2.11). If

f

is symmetric, we show below,

(a),

that

ξ_{1} < ψ

and a condition,

(c),

is given for

ζ_{3} < 2 τ_{2},

in which case the eigenvalues of Γ are less than one, and Γ is a contraction. Finally

(d)

shows that Γ is a contraction if

f

is log-concave.

Theorem 3.6 Suppose

f

is symmetric with third moment,

f^{'} (c) \leq 0

for

c > 0,

and

{lim}_{c \to 0} f^{''} (c) < 0

. Then

(a)

0 < ξ_{1} / ψ < 1

for

0 < ψ < 1

while

{lim}_{ψ \to 0} ξ_{1} / ψ = 1

and

{lim}_{ψ \to 1} ξ_{1} / ψ = 0;

(b)

0 < ζ_{3} / (2 τ_{2})

for

0 < ψ < 1

and

{lim}_{ψ \to 0} ζ_{3} / (2 τ_{2}) = 1

and

{lim}_{ψ \to 1} ζ_{3} / (2 τ_{2}) = 0;

(c)

if

{[c {log \int_{0}^{c} f (x) d x}^{'}]}^{'} < 0

for

c > 0

then

ζ_{3} / (2 τ_{2}) < 1

for

0 < ψ < 1;

(d)

{log f (c)}^{''} < 0 \Rightarrow {[c {log f (c)}^{'}]}^{'} < 0 \Rightarrow {[c {log \int_{0}^{c} f (x) d x}^{'}]}^{'} < 0 .

The condition

{[c {log \int_{0}^{c} f (x) d x}^{'}]}^{'} < 0

is satisfied for the Gaussian density that is log-concave and by t-densities that are not log-concave but satisfy

{[c {log f (c)}^{'}]}^{'} < 0 .

In the robust statistics literature, Rousseeuw uses the condition

{[c {log f (c)}^{'}]}^{'} < 0

when discussing change-of-variance curves for M-estimators and assumes log-concave densities [18].

A consequence of Theorem 3.6 is that if

f

is symmetric, the roots of the coefficient matrix Γ are bounded away from unity for

ψ_{0} \leq ψ \leq 1

for all

ψ_{0} > 0 .

The uniform distribution on

[- a, a]

provides an example where Γ is not contracting since in this situation

ξ_{1} = ψ

over the entire support. However, the weak unimodality condition

f^{'} (c) \leq 0

in Theorem 3.6 is not necessary, as long as the mode at the origin is large in comparison with other modes.
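The uniform counterexample can be verified directly from the definitions (2.2) and (2.11). The short derivation below is a sketch for 0 < c < a; it is written for a general half-width a, although the variance-one normalisation in Assumption A(ib) would correspond to a = \sqrt{3}.

```latex
% Uniform density on [-a, a]: f(v) = 1/(2a) for |v| \le a, and 0 < c < a.
\begin{align*}
\psi  &= \int_{-c}^{c} \frac{1}{2a}\, dv = \frac{c}{a}, \\
\xi_1 &= c\, f(c) - (-c)\, f(-c) = 2c \cdot \frac{1}{2a} = \frac{c}{a} = \psi .
\end{align*}
% Hence \xi_1 / \psi = 1 for every 0 < \psi < 1, so the block of \Gamma
% acting on the regression coefficients has eigenvalue one.
```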

4. Distribution of the Kernel

It follows from Theorem 3.4 that

{\hat{u}}_{n *} = {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n}

has the same limit as

{\hat{u}}_{n m},

and we therefore find the limit distribution of the kernel

K_{n}

in a few situations.

4.1. Stationary Case

Suppose the regressors are a stationary time series. Then the limits Σ and μ in Assumption A (iia, iib) are deterministic and ({\hat{Σ}}_{n}, {\hat{μ}}_{n}) \overset{P}{\to} (Σ, μ). The central limit theorem then shows that

K_{n} \overset{D}{\to} N_{p + 1} (0, Φ),

(4.1)

where

Φ = [\begin{matrix} ψ^{- 2} σ^{2} τ_{2} Σ^{- 1} & {(2 ψ τ_{2})}^{- 1} σ^{3} τ_{3} Σ^{- 1} μ \\ {(2 ψ τ_{2})}^{- 1} σ^{3} τ_{3} μ^{'} Σ^{- 1} & 4^{- 1} σ^{4} {τ_{4} τ_{2}^{- 2} - ψ^{- 1}} \end{matrix}] .

(4.2)

As a consequence, the fully iterated estimator has limit distribution

{\hat{u}}_{n *} = {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} \overset{D}{\to} {(I_{p + 1} - Γ)}^{- 1} N_{p + 1} (0, Φ) .

(4.3)

In the special case where the errors are symmetric, we find

\begin{aligned}
N^{- 1} ({\hat{β}}_{n *} - β) & = \frac{1}{ψ - ξ_{1}} Σ^{- 1} \sum_{i = 1}^{n} N^{'} X_{i} ε_{i} 1_{(| ε_{i} | \leq σ c)} + o_{P} (1) \overset{D}{\to} N_{p} \left(0, \frac{σ^{2} τ_{2}}{{(ψ - ξ_{1})}^{2}} Σ^{- 1}\right), \\
n^{1 / 2} ({\hat{σ}}_{n *}^{2} - σ^{2} τ_{2} / ψ) & = {\{1 - ζ_{3} {(2 τ_{2})}^{- 1}\}}^{- 1} \sum_{i = 1}^{n} n^{- 1 / 2} (ε_{i}^{2} - σ^{2} τ_{2} ψ^{- 1}) 1_{(| ε_{i} | \leq σ c)} + o_{P} (1) \\
& \overset{D}{\to} N_{1} \left(0, \frac{σ^{4} τ_{2}^{2} (τ_{4} - ψ^{- 1} τ_{2}^{2})}{{(2 τ_{2} - ζ_{3})}^{2}}\right),
\end{aligned}

noting that ψ > ξ_{1} and ζ_{3} < 2 τ_{2} hold for symmetric, unimodal distributions satisfying the conditions of Theorem 3.6 (a, c) .
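The factor τ_{2} / {(ψ - ξ_{1})}^{2} multiplying σ^{2} Σ^{- 1} in the limiting variance of the regression coefficients measures the efficiency loss from trimming when there are in fact no outliers. The sketch below evaluates it for a standard normal density, an assumption made here for illustration; the function name is hypothetical.

```python
from statistics import NormalDist


def beta_variance_factor_gaussian(psi):
    """tau_2 / (psi - xi_1)^2 for a standard normal density."""
    std = NormalDist()
    c = std.inv_cdf((1.0 + psi) / 2.0)
    xi1 = 2.0 * c * std.pdf(c)
    tau2 = psi - xi1          # Gaussian identity: tau_2 = psi - 2 c phi(c)
    return tau2 / (psi - xi1) ** 2


for psi in (0.9, 0.95, 0.99, 0.9999):
    print(psi, beta_variance_factor_gaussian(psi))
```

In the Gaussian case τ_{2} = ψ - ξ_{1}, so the factor reduces to 1 / τ_{2}; it exceeds one for ψ < 1 and approaches one, the least squares value, as ψ tends to one.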

The limiting distribution of

N^{- 1} ({\hat{β}}_{n *} - β)

is also seen elsewhere in the robust statistics literature.

First, Víšek [15] (Theorem 1, p. 215) analysed the least trimmed squares estimator of Rousseeuw [13]. The estimator is given by

{\hat{β}}_{n}^{L T S} = arg {min}_{β \in R^{p}} \sum_{i = 1}^{int (n ψ)} r_{(i)}^{2} (β),

where

r_{(1)}^{2} (β) < \dots < r_{(n)}^{2} (β)

are the ordered squared residuals

r_{i} = y_{i} - X_{i}^{'} β

. The estimator has the property that it does not depend on the scale of the problem. Víšek argued that in the symmetric case, the least trimmed squares estimator satisfies

N^{- 1} ({\hat{β}}_{n}^{L T S} - β) = \frac{1}{(ψ - ξ_{1})} Σ^{- 1} \sum_{i = 1}^{n} N^{'} X_{i} ε_{i} 1_{(| ε_{i} | \leq c σ)} + o_{P} (1),

(4.4)

that is, the main term is the same as for

{\hat{β}}_{n *},

and it follows from Theorem 3.4 that because

{\hat{β}}_{n}^{L T S}

and

{\hat{β}}_{n *}

have the same expansions we have

| N^{- 1} ({\hat{β}}_{n m} - {\hat{β}}_{n}^{L T S}) | \overset{P}{\to} 0

for

(m, n) \to \infty .

Thus

{\hat{β}}_{n m}

can be seen as an approximation to the LTS estimator when there are no outliers.
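The least trimmed squares criterion itself is simple to evaluate for a given β, even though minimizing it is hard. The following sketch computes the objective for a regression with an intercept and a single regressor; the function name and the toy data are hypothetical.

```python
def lts_objective(beta, xs, ys, psi):
    """Sum of the int(n * psi) smallest squared residuals of y = b0 + b1 * x."""
    squared = sorted((y - beta[0] - beta[1] * x) ** 2 for x, y in zip(xs, ys))
    h = int(len(squared) * psi)
    return sum(squared[:h])


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 10.0]          # the last observation is an outlier
print(lts_objective((0.0, 1.0), xs, ys, 0.75))   # outlier is trimmed away
print(lts_objective((0.0, 1.0), xs, ys, 1.0))    # ordinary sum of squares
```

The objective depends on β only through the ordering of the squared residuals, which is why no scale estimate is needed, in contrast to the retention rule (2.5).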

Second, Jurečková, Sen, and Picek [4] (Theorem 5.5, p. 176) considered a pure location problem with regressor

X_{i} = 1

and known

σ = 1

, and found an asymptotic expansion like (4.4) for the Huber-skip, and Johansen and Nielsen [19] showed the similar result for the general regression model. A consequence of this is that the iterated 1-step Huber-skip has the same limit distribution as the Huber-skip, and because

{\hat{β}}_{n m}

and

{\hat{β}}_{n H}

have the same expansion, it follows from Theorem 3.4 that

n^{1 / 2} | {\hat{β}}_{n m} - {\hat{β}}_{n H} | \overset{P}{\to} 0 for (m, n) \to \infty,

(4.5)

so the iterated estimator is in this sense an approximation to the Huber-skip.

4.2. Deterministic Trends

As a simple example with i.i.d. errors, consider the regression

y_{i} = β_{1} + β_{2} i + ε_{i},

where

ε_{i} \in R

satisfies Assumption A

(i) .

Define the normalisation

N = (\begin{matrix} n^{- 1 / 2} & 0 \\ 0 & n^{- 3 / 2} \end{matrix}) .

Then Assumption A

(i i)

is met with

X_{i} = {(1, i)}^{'}

and

Σ = (\begin{matrix} 1 & 1 / 2 \\ 1 / 2 & 1 / 3 \end{matrix}), μ = (\begin{matrix} 1 \\ 1 / 2 \end{matrix}),

(4.6)

and

{max}_{i \leq n} E {| n^{1 / 2} N^{'} X_{i} |}^{4} \leq 4 .

The kernel has a limit distribution given by (4.1), where the matrix Φ in (4.2) is computed in terms of the Σ and μ derived in (4.6).
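The limits in (4.6) can be checked numerically from the closed-form sums of i and i^2. The sketch below computes {\hat{Σ}}_{n} and {\hat{μ}}_{n} for X_{i} = {(1, i)}^{'} and the diagonal normalisation above; the helper name is hypothetical.

```python
def trend_moments(n):
    """Return (Sigma_hat, mu_hat) for X_i = (1, i)' and N = diag(n^-1/2, n^-3/2)."""
    s1 = n * (n + 1) / 2                      # sum of i for i = 1..n
    s2 = n * (n + 1) * (2 * n + 1) / 6        # sum of i^2 for i = 1..n
    sigma_hat = [[1.0, s1 / n ** 2],
                 [s1 / n ** 2, s2 / n ** 3]]
    mu_hat = [1.0, s1 / n ** 2]
    return sigma_hat, mu_hat


for n in (10, 1000, 10 ** 6):
    print(n, trend_moments(n))
```

For n = 1000 the off-diagonal entry is about 0.5005 and the lower-right entry about 0.3338, already close to the limits 1/2 and 1/3.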

If the errors are autoregressive, the derivation is in principle similar, but involves a notationally tedious detrending argument. The argument is similar to that of Johansen and Nielsen [6] (Section 1.5.1), and (4.5) holds.

4.3. Unit Roots

Consider as an example the autoregression

y_{i} = β y_{i - 1} + ε_{i}, i = 1, \dots, n .

If

β = 1

then

X_{i} = y_{i - 1} = y_{0} + \sum_{s = 1}^{i - 1} ε_{s}

and we have to choose

N = n^{- 1} .

By the functional Central Limit Theorem

n^{- 1 / 2} \sum_{i = 1}^{int (n u)} \{\begin{matrix} ε_{i} \\ ε_{i} 1_{(\underset{̲}{c} σ \leq ε_{i} \leq σ \bar{c})} \\ (ε_{i}^{2} - σ^{2} τ_{2} / ψ) 1_{(\underset{̲}{c} σ \leq ε_{i} \leq σ \bar{c})} \end{matrix}\} \overset{D}{\to} (\begin{matrix} W_{x, u} \\ W_{1, u} \\ W_{2, u} \end{matrix}),

where the limit is a Brownian motion with zero mean and variance

Φ_{W} = [\begin{matrix} σ^{2} & σ^{2} τ_{2} & σ^{3} τ_{3} \\ σ^{2} τ_{2} & σ^{2} τ_{2} & σ^{3} τ_{3} \\ σ^{3} τ_{3} & σ^{3} τ_{3} & σ^{4} {τ_{4} - τ_{2}^{2} / ψ} \end{matrix}] .

Thus the limit variables Σ and μ in Assumption A (ii) are

Σ = \int_{0}^{1} W_{x, u}^{2} d u, μ = \int_{0}^{1} W_{x, u} d u,

while the kernel has limit distribution

K_{n} \overset{D}{\to} Ψ_{1}^{- 1} (\begin{matrix} \int_{0}^{1} W_{x, u} d W_{1, u} \\ W_{2, 1} \end{matrix}),

and (4.5) holds. Thus, when the density of

ε_{i}

is symmetric,

{\hat{β}}_{n *}

has limit distribution

n ({\hat{β}}_{n *} - β) \overset{D}{\to} \frac{\int_{0}^{1} W_{x, u} d W_{1, u}}{(ψ - ξ_{1}) \int_{0}^{1} W_{x, u}^{2} d u} .

When

ψ \to 1

then

ξ_{1} \to 0

and

τ_{2} \to 1

so

W_{1, u}

and

W_{x, u}

become identical and the limit distribution becomes the usual Dickey–Fuller distribution. See also Johansen and Nielsen [6] (Section 1.5.4) for a related and more detailed derivation.

5. Discussion of Possible Extensions

The iteration result in Theorem 3.4 has a variety of possible extensions. An issue of interest in the literature is whether a slow initial convergence rate can be improved upon through iteration. This would make it possible to use robust estimators converging at, for instance, an n^{1 / 3} rate as initial estimators. Such a result would complement the result of He and Portnoy [20], who find that the convergence rate cannot be improved in a single step by this procedure of applying least squares to the retained observations.

The key is to show that the remainder term of the one-step estimator in Theorem 3.2 remains small in an appropriately larger neighbourhood. The proof of Theorem 3.4 then applies the same way leading to the same fixed point result. The necessary techniques are developed by Johansen and Nielsen [21].

A related algorithm is the Forward Search of Atkinson, Riani, and Cerioli [7,22]. This involves finding an initial set of “good” observations using for instance the least trimmed squares estimator of Rousseeuw [13] and then increasing the number of “good” observations using a recursive test procedure. The algorithm involves iteration of one-step Huber-skip estimators, see Johansen and Nielsen [23]. Again the key to its analysis is to improve Theorem 3.2, in this instance to hold uniformly in the cut-off fraction ψ, see Johansen and Nielsen [21] for details.

It would also be of interest to analyse algorithms such as Autometrics of Hendry and Krolzig [24] and Doornik [25], which involve selection over observations as well as regressors.

In practice it is not a trivial matter to compute the least trimmed squares estimator of Rousseeuw [13]. A number of algorithms have been suggested in the literature, see for instance Hawkins and Olive [26]. Algorithms based on a “concentration” approach start with an initial trial fit that is iterated towards a final fit. It is possible that the abovementioned results will extend to shed some further light on the properties of such resampling algorithms.

Acknowledgments

The authors would like to thank the two referees for their useful comments. Søren Johansen is grateful to CREATES—Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation. Bent Nielsen gratefully acknowledges financial support from the Programme of Economic Modelling, Oxford.

References

1. P.J. Huber. “Robust estimation of a location parameter.” Ann. Math. Stat. 35 (1964): 73–101.
2. R.A. Maronna, R.D. Martin, and V.J. Yohai. Robust Statistics: Theory and Methods. New York, NY, USA: Wiley, 2006.
3. P.J. Huber, and E.M. Ronchetti. Robust Statistics, 2nd ed. New York, NY, USA: Wiley, 2009.
4. J. Jurečková, P.K. Sen, and J. Picek. Methodological Tools in Robust and Nonparametric Statistics. London, UK: Chapman & Hall/CRC Press, 2012.
5. D.F. Hendry, S. Johansen, and C. Santos. “Automatic selection of indicators in a fully saturated regression.” Comput. Stat. 23 (2008): 317–335, and Erratum 337–339.
6. S. Johansen, and B. Nielsen. “An analysis of the indicator saturation estimator.” In The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Edited by J.L. Castle and N. Shephard. Oxford, UK: Oxford University Press, 2009, pp. 1–36.
7. A.C. Atkinson, M. Riani, and A. Cerioli. Exploring Multivariate Data with the Forward Search. New York, NY, USA: Springer, 2004.
8. P.J. Bickel. “One-step Huber estimates in the linear model.” J. Am. Statist. Assoc. 70 (1975): 428–434.
9. D. Ruppert, and R.J. Carroll. “Trimmed least squares estimation in the linear model.” J. Am. Statist. Assoc. 75 (1980): 828–838.
10. A.H. Welsh, and E. Ronchetti. “A journey in single steps: robust one step M-estimation in linear regression.” J. Stat. Plan. Infer. 103 (2002): 287–310.
11. G. Cavaliere, and I. Georgiev. Exploiting Infinite Variance Through Dummy Variables in an AR Model. Discussion paper; Lisbon, Portugal: Universidade Nova de Lisboa, 2011.
12. M.B. Dollinger, and R.G. Staudte. “Influence functions of iteratively reweighted least squares estimators.” J. Am. Statist. Assoc. 86 (1991): 709–716.
13. P.J. Rousseeuw. “Least median of squares regression.” J. Am. Statist. Assoc. 79 (1984): 871–880.
14. P.J. Rousseeuw, and A.M. Leroy. Robust Regression and Outlier Detection. New Jersey, NJ, USA: Wiley, 1987.
15. J.Á. Víšek. “The least trimmed squares. Part I: Consistency.” Kybernetika 42 (2006): 1–36.
16. J.Á. Víšek. “The least trimmed squares. Part II: $\sqrt{n}$-consistency.” Kybernetika 42 (2006): 181–202.
17. J.Á. Víšek. “The least trimmed squares. Part III: Asymptotic normality.” Kybernetika 42 (2006): 203–224.
18. P.J. Rousseeuw. “Most robust M-estimators in the infinitesimal sense.” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 61 (1982): 541–551.
19. S. Johansen, and B. Nielsen. A stochastic expansion of the Huber-skip estimator for regression analysis. Discussion paper; Copenhagen, Denmark: University of Copenhagen, work in progress; 2013.
20. X. He, and S. Portnoy. “Reweighted LS estimators converge at the same rate as the initial estimator.” Ann. Stat. 20 (1992): 2161–2167.
21. S. Johansen, and B. Nielsen. Asymptotic analysis of the Forward Search. Discussion paper 13-01; Copenhagen, Denmark: University of Copenhagen, 2013.
22. A.C. Atkinson, M. Riani, and A. Cerioli. “The forward search: Theory and data analysis.” J. Korean Stat. Soc. 39 (2010): 117–134.
23. S. Johansen, and B. Nielsen. “Discussion: The forward search: Theory and data analysis.” J. Korean Stat. Soc. 39 (2010): 137–145.
24. D.F. Hendry, and H.-M. Krolzig. “The properties of automatic Gets modelling.” Economic J. 115 (2005): C32–C61.
25. J.A. Doornik. “Autometrics.” In The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Edited by J.L. Castle and N. Shephard. Oxford, UK: Oxford University Press, 2009, pp. 88–121.
26. D.M. Hawkins, and D.J. Olive. “Inconsistency of resampling algorithms for high-breakdown regression estimators and a new algorithm.” J. Am. Statist. Assoc. 97 (2002): 136–148.
27. R.S. Varga. Matrix Iterative Analysis, 2nd ed. Berlin, Germany: Springer, 2000.

Appendix

Proof of Theorem 3.1. The process

{\tilde{K}}_{n} = \sum_{i = 1}^{n} \begin{pmatrix} N^{'} X_{i} ε_{i} \\ n^{- 1 / 2} (ε_{i}^{2} - σ^{2} τ_{2} / ψ) \end{pmatrix} 1_{(\underline{c} σ < ε_{i} \leq \bar{c} σ)}

is a martingale, and we find that

E {\tilde{K}}_{n} {\tilde{K}}_{n}^{'} = (\begin{matrix} σ^{2} τ_{2} \sum_{i = 1}^{n} E (N^{'} X_{i} X_{i}^{'} N) & σ^{3} τ_{3} \sum_{i = 1}^{n} E (N^{'} X_{i}) \\ σ^{3} τ_{3} \sum_{i = 1}^{n} E {(N^{'} X_{i})}^{'} & σ^{4} (τ_{4} - τ_{2}^{2} ψ^{- 1}) \end{matrix}) .

Due to assumptions

(iic), (iiib)

this is bounded in n. Chebyshev’s inequality gives

P (| {\tilde{K}}_{n} | > C) \leq C^{- 2} E {| {\tilde{K}}_{n} |}^{2} .

Thus, both

{\tilde{K}}_{n}

and

{\hat{Ψ}}_{n 1}^{- 1}

, and hence their product, are tight.

■

The key to proving Theorem 3.2 is to understand the remainder terms of the moment matrices. This was done by Johansen and Nielsen [6]. As that paper was concerned only with the convergence of the 1-step estimator, the main Theorem 1.1 simply stated that the remainder terms vanish as

n \to \infty

. A more detailed result can, however, be extracted from the proof. To draw that out, let a and b be the scale and location coordinates of

u = (b, a),

respectively, and define, for

g_{i}, h_{i} \in \{1, X_{i}, ε_{i}\},

the product moment matrices

{\tilde{S}}_{g h} (u) = \sum_{i = 1}^{n} g_{i} h_{i}^{'} 1_{\{(σ + n^{- 1 / 2} a) \underline{c} < ε_{i} - X_{i}^{'} N b \leq (σ + n^{- 1 / 2} a) \bar{c}\}} .

Lemma 5.1 Suppose Assumption A holds. Define the remainder terms

R_{11} (u), R_{x x} (u), R_{x 1} (u), R_{x ε} (u),

and

R_{ε ε} (u)

by the equations

\begin{aligned} n^{- 1} {\tilde{S}}_{11} (u) & = ψ + R_{11} (u), \\ N^{'} {\tilde{S}}_{x x} (u) N & = ψ {\hat{Σ}}_{n} + R_{x x} (u), \\ n^{- 1 / 2} N^{'} {\tilde{S}}_{x 1} (u) & = ψ {\hat{μ}}_{n} + R_{x 1} (u), \\ \begin{pmatrix} N^{'} {\tilde{S}}_{x ε} (u) \\ n^{- 1 / 2} \{{\tilde{S}}_{ε ε} (u) - σ^{2} τ_{2} ψ^{- 1} {\tilde{S}}_{11} (u)\} \end{pmatrix} & = \sum_{i = 1}^{n} \begin{pmatrix} N^{'} X_{i} ε_{i} \\ n^{- 1 / 2} (ε_{i}^{2} - σ^{2} τ_{2} ψ^{- 1}) \end{pmatrix} 1_{(\underline{c} σ < ε_{i} \leq \bar{c} σ)} \\ & \quad + \begin{pmatrix} ξ_{1} {\hat{Σ}}_{n} & ξ_{2} {\hat{μ}}_{n} \\ σ ζ_{2} {\hat{μ}}_{n}^{'} & σ ζ_{3} \end{pmatrix} \begin{pmatrix} b \\ a \end{pmatrix} + \begin{pmatrix} R_{x ε} (u) \\ R_{ε ε} (u) \end{pmatrix}, \end{aligned}

where, for notational convenience, the dependence of n in the remainder terms is suppressed. Then for all

U > 0

and

n \to \infty

it holds that

\sup_{| u | < U} \{| R_{11} (u) | + | R_{x x} (u) | + | R_{x 1} (u) | + | R_{x ε} (u) | + | R_{ε ε} (u) |\} = o_{P} (1) .

(5.1)
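To give a feel for the first line of Lemma 5.1, the sketch below simulates the remainder term R_{11}(u) in the simplest special case. It is an illustration under assumptions that are not part of the lemma: a location-scale model with X_i = 1 and N = n^{-1/2}, standard normal errors, and symmetric cut-offs at plus and minus c, so that ψ = P(|ε_i| ≤ cσ). The function names are hypothetical.

```python
import math
import random


def psi(c):
    """P(|e| <= c) for a standard normal e."""
    return math.erf(c / math.sqrt(2.0))


def remainder_r11(n, b, a, c=2.0, sigma=1.0, seed=0):
    """Simulate R_11(u) = n^{-1} S_11(u) - psi for u = (b, a).

    Assumes X_i = 1 and N = n^{-1/2}, so X_i' N b = n^{-1/2} b.
    """
    rng = random.Random(seed)
    scale = sigma + a / math.sqrt(n)   # perturbed scale sigma + n^{-1/2} a
    shift = b / math.sqrt(n)           # perturbed location n^{-1/2} b
    retained = 0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        if -c * scale < eps - shift <= c * scale:
            retained += 1
    return retained / n - psi(c)
```

For a fixed u, the returned value should shrink towards zero as n grows, which is what the lemma asserts uniformly over bounded u.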

Proof of Lemma 5.1. Theorem 1.1 in Johansen and Nielsen [6] states that

| R_{11} (u) |,

| R_{x x} (u) |,

| R_{x 1} (u) |,

| R_{x ε} (u) |,

| R_{ε ε} (u) |

vanish when u is evaluated at

\hat{u} = \{N^{- 1} (\hat{β} - β), n^{1 / 2} (\hat{σ} - σ)\}

under the assumption that

\hat{u} = O_{P} (1),

as

n \to \infty

. The proof of that result proceeds by noting that the assumption

\hat{u} = O_{P} (1)

means that for all

ϵ > 0

, a U exists so that

P (| \hat{u} | \geq U) < ϵ

and therefore it suffices to prove that (5.1) holds. The remainder of that proof establishes precisely the statement (5.1), which is the desired result here. ■

Proof of Theorem 3.2. The updated estimator

({\hat{β}}_{n m}, {\hat{σ}}_{n m}^{2})

is defined in (2.6) and (2.7) in terms of the initial estimator

({\hat{β}}_{n, m - 1}, {\hat{σ}}_{n, m - 1}^{2})

, and we express them in terms of

S_{g h} = {\tilde{S}}_{g h} ({\hat{u}}_{n, m - 1})

where

{\hat{u}}_{n, m - 1} = \{N^{- 1} ({\hat{β}}_{n, m - 1} - β), n^{1 / 2} ({\hat{σ}}_{n, m - 1} - σ)\},

as follows

\begin{aligned} N^{- 1} ({\hat{β}}_{n m} - β) & = {(N^{'} S_{x x} N)}^{- 1} N^{'} S_{x ε}, \\ n^{1 / 2} ({\hat{σ}}_{n m}^{2} - σ^{2}) & = ψ τ_{2}^{- 1} {(S_{11})}^{- 1} n^{1 / 2} \{S_{ε ε} - S_{ε x} N {(N^{'} S_{x x} N)}^{- 1} N^{'} S_{x ε} - σ^{2} τ_{2} ψ^{- 1} S_{11}\} . \end{aligned}

For

{\hat{u}}_{n, m - 1} = ({\hat{b}}_{n, m - 1}, {\hat{a}}_{n, m - 1})

we get, by inserting the definitions from Lemma 5.1,

{\hat{b}}_{n m} = {\{ψ {\hat{Σ}}_{n} + R_{x x} ({\hat{u}}_{n, m - 1})\}}^{- 1} \{\sum_{i = 1}^{n} (N^{'} X_{i} ε_{i}) 1_{(\underline{c} σ < ε_{i} \leq \bar{c} σ)} + ξ_{1} {\hat{Σ}}_{n} {\hat{b}}_{n, m - 1} + ξ_{2} {\hat{μ}}_{n} {\hat{a}}_{n, m - 1} + R_{x ε} ({\hat{u}}_{n, m - 1})\} .

Since

\sum_{i = 1}^{n} (N^{'} X_{i} ε_{i}) 1_{(\underline{c} σ < ε_{i} \leq \bar{c} σ)}

is tight by Theorem 3.1,

{\hat{u}}_{n, m - 1}

is

O_{P} (1),

and the remainders are vanishing by Lemma 5.1 for

n \to \infty

, then

{\hat{b}}_{n m} = {(ψ {\hat{Σ}}_{n})}^{- 1} \sum_{i = 1}^{n} (N^{'} X_{i} ε_{i}) 1_{(\underline{c} σ < ε_{i} \leq \bar{c} σ)} + {(ψ {\hat{Σ}}_{n})}^{- 1} (ξ_{1} {\hat{Σ}}_{n} {\hat{b}}_{n, m - 1} + ξ_{2} {\hat{μ}}_{n} {\hat{a}}_{n, m - 1}) + R_{b, n} ({\hat{u}}_{n, m - 1}),

where

\sup_{| u | < U} | R_{b, n} (u) | = o_{P} (1) .

From

n^{1 / 2} ({\hat{σ}}_{n m}^{2} - σ^{2}) = ({\hat{σ}}_{n m} + σ) n^{1 / 2} ({\hat{σ}}_{n m} - σ) = 2 σ {\hat{a}}_{n m} (1 + o_{P} (1))

we find that a similar argument shows

{\hat{a}}_{n m} = {(2 σ τ_{2})}^{- 1} n^{- 1 / 2} \sum_{i = 1}^{n} (ε_{i}^{2} - ψ^{- 1} σ^{2} τ_{2}) 1_{(σ \underline{c} < ε_{i} \leq σ \bar{c})} + {(2 τ_{2})}^{- 1} (ζ_{2} {\hat{μ}}_{n}^{'} {\hat{b}}_{n, m - 1} + ζ_{3} {\hat{a}}_{n, m - 1}) + R_{a, n} ({\hat{u}}_{n, m - 1}),

where

\sup_{| u | < U} | R_{a, n} (u) | = o_{P} (1) .

■

Proof of Theorem 3.3. We want to show that for all

ϵ > 0

there exists

U > 0,

and

n_{0}

so that for

n \geq n_{0}

it holds that

P (\sup_{0 \leq m < \infty} | {\hat{u}}_{n m} | \leq U) \geq 1 - ϵ .

(5.2)

From the recursion (2.14) we find the representation

{\hat{u}}_{n m} = {\hat{Γ}}_{n}^{m} {\hat{u}}_{n 0} + \sum_{ℓ = 1}^{m} {\hat{Γ}}_{n}^{ℓ - 1} \{K_{n} + R_{n} ({\hat{u}}_{n, m - ℓ})\} .

(5.3)

The spectral norm and the Euclidean norm are compatible,

| M x | \leq | | M | | | x |,

see Varga [27] (Theorem 1.5). Therefore it holds that

| {\hat{u}}_{n m} | \leq | | {\hat{Γ}}_{n}^{m} | | | {\hat{u}}_{n 0} | + (| K_{n} | + \max_{0 \leq ℓ \leq m - 1} | R_{n} ({\hat{u}}_{n ℓ}) |) \sum_{ℓ = 1}^{m} | | {\hat{Γ}}_{n}^{ℓ - 1} | | .

By assumption a

δ

exists so that the spectral radius

max | eigen (Γ) | < δ < 1

with large probability. Because

{\hat{Γ}}_{n} \overset{D}{\to} Γ,

then

n_{0} > 0

and

δ < δ_{0} < 1

exist so that for all

n \geq n_{0}

then

max | eigen ({\hat{Γ}}_{n}) | < δ_{0} < 1

with probability larger than

1 - ϵ / 2 .

Then Gelfand’s formula, Varga [27] (Theorem 3.4), shows that there is an

m_{0} > 0

so that for all

m > m_{0}

it holds that

| | {\hat{Γ}}_{n}^{m} | | \leq δ_{0}^{m}

. This in turn implies for some

c > 1,

that

\max_{0 \leq m < \infty} | | {\hat{Γ}}_{n}^{m} | | < \sum_{ℓ = 0}^{\infty} | | {\hat{Γ}}_{n}^{ℓ} | | < c,

and hence

| {\hat{u}}_{n m} | \leq c \{| {\hat{u}}_{n 0} | + | K_{n} | + \max_{0 \leq ℓ \leq m - 1} | R_{n} ({\hat{u}}_{n ℓ}) |\} .

(5.4)

Because it is assumed that

{\hat{u}}_{n 0}

is tight, and the sequence

\{K_{n}\}

is tight by Theorem 3.1, and

\max_{| u | \leq U_{1}} | R_{n} (u) | = o_{P} (1)

by Theorem 3.2, then constants

U_{0} > η / 2, n_{0} > 0

exist so that for

n \geq n_{0},

the set

A_{n} = (max | eigen ({\hat{Γ}}_{n}) | < δ_{0}) \cap (c | {\hat{u}}_{n 0} | \leq U_{0}) \cap (c | K_{n} | \leq U_{0}) \cap (c max_{| u | \leq 3 U_{0}} | R_{n} (u) | \leq η / 2)

(5.5)

has probability larger than

1 - ϵ .

An induction over m is now used to show that

\sup_{0 \leq m < \infty} | {\hat{u}}_{n m} | \leq 3 U_{0}

on the set

A_{n}

. As induction start, for

m = 0,

then

| {\hat{u}}_{n 0} | \leq c^{- 1} U_{0} < 3 U_{0}

by the tightness assumption on

{\hat{u}}_{n 0}

and

c > 1

. The induction assumption is that

\max_{0 \leq ℓ \leq m - 1} | {\hat{u}}_{n ℓ} | \leq 3 U_{0} .

This implies that on the set

A_{n}

then

c \max_{0 \leq ℓ \leq m - 1} | R_{n} ({\hat{u}}_{n ℓ}) | \leq c \max_{| u | \leq 3 U_{0}} | R_{n} (u) | \leq η / 2 .

Thus, the bound (5.4) becomes

| {\hat{u}}_{n m} | \leq 2 U_{0} + η / 2 \leq 3 U_{0} .

It follows that

\max_{0 \leq ℓ \leq m} | {\hat{u}}_{n ℓ} | \leq 3 U_{0} .

This proves (5.2) for

U = 3 U_{0}

. ■

Proof of Theorem 3.4. We want to show that for all

η, ϵ > 0

there is an

n_{0}

and

m_{0}

so that for

n \geq n_{0}

and

m \geq m_{0}

it holds that

P (| {\hat{u}}_{n m} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} | > η) < ϵ .

(5.6)

In order to show (5.6), note that on the set

A_{n}

we find

\sum_{ℓ = 1}^{m} {\hat{Γ}}_{n}^{ℓ - 1} = (I_{p + 1} - {\hat{Γ}}_{n}^{m}) {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1}

where

{(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} = \sum_{ℓ = 0}^{\infty} {\hat{Γ}}_{n}^{ℓ} .

Therefore equation (5.3) shows that

{\hat{u}}_{n m} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} = {\hat{Γ}}_{n}^{m} \{{\hat{u}}_{n 0} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n}\} + \sum_{ℓ = 1}^{m} {\hat{Γ}}_{n}^{ℓ - 1} R_{n} ({\hat{u}}_{n, m - ℓ}) .

To bound this, note first that

| | {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} | | = | | \sum_{ℓ = 0}^{\infty} {\hat{Γ}}_{n}^{ℓ} | | \leq \sum_{ℓ = 0}^{\infty} | | {\hat{Γ}}_{n}^{ℓ} | | < c

. Thus on the set

A_{n},

see (5.5), it holds that

| {\hat{u}}_{n m} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} | \leq | | {\hat{Γ}}_{n}^{m} | | (c^{- 1} U_{0} + U_{0}) + c max_{0 \leq ℓ \leq m - 1} | R_{n} ({\hat{u}}_{n ℓ}) | \leq | | {\hat{Γ}}_{n}^{m} | | 2 U_{0} + η / 2 .

Now, for

m \geq m_{0}

then

| | {\hat{Γ}}_{n}^{m} | | \leq δ_{0}^{m}

. Since

δ_{0}^{m}

declines exponentially,

m_{0}

can be chosen so large that it also holds that

| | {\hat{Γ}}_{n}^{m} | | 2 U_{0} \leq η / 2 .

Thus

P (| {\hat{u}}_{n m} - {(I_{p + 1} - {\hat{Γ}}_{n})}^{- 1} K_{n} | > η) < ϵ,

for

m \geq m_{0}

and

n \geq n_{0}

, which proves (5.6). ■
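The fixed point argument can be illustrated numerically. The sketch below iterates the recursion u_m = Γ u_{m-1} + K with the remainder term set to zero, and compares the result with (I - Γ)^{-1} K. The 2 x 2 matrix, the kernel and the starting values are hypothetical numbers chosen so that the spectral radius of Γ is below one; they are not taken from the paper.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def iterate_recursion(gamma, kernel, u0, steps):
    """Iterate u_m = gamma u_{m-1} + kernel, ignoring the remainder term."""
    u = list(u0)
    for _ in range(steps):
        gu = mat_vec(gamma, u)
        u = [g + k for g, k in zip(gu, kernel)]
    return u


def fixed_point_2x2(gamma, kernel):
    """Solve (I - gamma) u = kernel by Cramer's rule for a 2 x 2 gamma."""
    a, b = 1.0 - gamma[0][0], -gamma[0][1]
    c, d = -gamma[1][0], 1.0 - gamma[1][1]
    det = a * d - b * c
    return [(d * kernel[0] - b * kernel[1]) / det,
            (a * kernel[1] - c * kernel[0]) / det]


# Hypothetical inputs: the eigenvalues of GAMMA are about 0.57 and 0.23.
GAMMA = [[0.5, 0.2], [0.1, 0.3]]
KERNEL = [1.0, -2.0]
```

Starting the recursion from very different values of u0 leads to the same limit, which mirrors the statement that the initial estimator only needs to be tight.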

Proof of Theorem 3.5. The matrices Γ and

Γ - λ I_{p + 1}

are of the form

(\begin{matrix} a I_{p} & b \\ c^{'} & d \end{matrix}),

and the result follows from the identity

a det (\begin{matrix} a I_{p} & b \\ c^{'} & d \end{matrix}) = det (\begin{matrix} I_{p} & 0 \\ - c^{'} & a \end{matrix}) det (\begin{matrix} a I_{p} & b \\ c^{'} & d \end{matrix}) = det (\begin{matrix} a I_{p} & b \\ 0 & a d - c^{'} b \end{matrix}) = a^{p} (a d - c^{'} b) .

■
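The determinant identity is easy to check numerically. Dividing it by a (for a ≠ 0) gives det = a^{p-1}(ad - c'b), and the sketch below compares that closed form with a determinant computed by Gaussian elimination. The helper names and the numerical values are hypothetical.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n):
                m[row][k] -= factor * m[col][k]
    return det


def bordered_matrix(a, b, c, d):
    """Build [[a I_p, b], [c', d]] for vectors b and c of length p."""
    p = len(b)
    rows = [[a if i == j else 0.0 for j in range(p)] + [b[i]] for i in range(p)]
    rows.append(list(c) + [d])
    return rows


def closed_form(a, b, c, d):
    """a^(p-1) (a d - c'b), the determinant implied by the identity."""
    p = len(b)
    return a ** (p - 1) * (a * d - sum(ci * bi for ci, bi in zip(c, b)))
```

With a = 2, b = (1, 2, 3), c = (4, 5, 6) and d = 7, both routes give 2^2 (14 - 32) = -72.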

Proof of Theorem 3.6.

(a)

For

c > 0

then

f (x) 1_{(| x | \leq c)} \geq f (c) 1_{(| x | \leq c)}

because

f

is symmetric and non-increasing. Integration gives

ψ = \int_{- c}^{c} f (x) d x \geq 2 c f (c) = ξ_{1},

where, by continuity of f, equality holds only if

f (x) = f (c)

for all

| x | \leq c .

This is, however, ruled out by assuming

\lim_{c \to 0} f^{''} (c) < 0 .

It holds that

\lim_{c \to 0} c^{- 1} \int_{0}^{c} f (x) d x = f (0)

and

\lim_{c \to 0} ξ_{1} / (2 c) = f (0)

so

\lim_{c \to 0} ξ_{1} / ψ = 1 .

Similarly,

\int_{- \infty}^{\infty} f (x) d x = 1

and

\lim_{ψ \to 1} c f (c) = 0

so

\lim_{ψ \to 1} ξ_{1} / ψ = 0 .

(b)

We find

g (c) = ζ_{3} / (2 τ_{2}) = ξ_{3} / (2 τ_{2}) - ξ_{1} / (2 τ_{0}) = \frac{2 c f (c) {\int_{0}^{c} (c^{2} - x^{2}) f (x) d x}}{τ_{2} τ_{0}} > 0 .

(5.7)

For

c \to 0,

or

ψ \to 0,

we find the approximations for

k = 0, 1 :

τ_{2 k} = 2 \int_{0}^{c} x^{2 k} f (x) d x \approx 2 c^{2 k + 1} f (0) / (2 k + 1),

which show that

g (c) \to 1

.

For

c \to \infty,

or

ψ \to 1,

we find

τ_{0} \to 1, τ_{2} \to 1

and

g (c) \approx c f (c) (c^{2} - 1) \to 0

because

f

is assumed to have finite third moment.
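The limits in parts (a) and (b) can be checked numerically for a specific density. The sketch below assumes a standard normal f, which is an illustrative choice rather than a requirement of the theorem, and uses the closed forms τ_0 = ψ = erf(c/√2) and τ_2 = τ_0 - 2cf(c). Since c^2 τ_0 - τ_2 = 2∫_0^c (c^2 - x^2) f(x) dx, expression (5.7) is evaluated as g(c) = c f(c)(c^2 τ_0 - τ_2)/(τ_2 τ_0). The function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def tau0(c):
    """tau_0 = psi = integral of f over [-c, c]."""
    return math.erf(c / math.sqrt(2.0))


def tau2(c):
    """tau_2 = integral of x^2 f(x) over [-c, c] (by integration by parts)."""
    return tau0(c) - 2.0 * c * phi(c)


def xi1_over_psi(c):
    """The ratio xi_1 / psi = 2 c f(c) / psi from part (a)."""
    return 2.0 * c * phi(c) / tau0(c)


def g(c):
    """The function g(c) of (5.7) for the standard normal density."""
    return c * phi(c) * (c * c * tau0(c) - tau2(c)) / (tau2(c) * tau0(c))
```

Evaluating `xi1_over_psi` and `g` on a grid of cut-offs shows both functions close to one for small c, close to zero for large c, and strictly between zero and one in between, in line with parts (a) to (c).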

(c)

Using

c τ_{0}^{'} = 2 c f (c)

we find from (5.7) that

g (c) < 1

if

h (c) = \frac{c τ_{0}^{'}}{2 τ_{0}} (c^{2} τ_{0} - τ_{2}) - τ_{2} = \frac{2 c f (c)}{τ_{0}} \{\int_{0}^{c} (c^{2} - x^{2}) f (x) d x\} - τ_{2} < 0,

and because the limit for

c \to 0

is zero it is enough to show that

h^{'} (c) < 0 .

We find

h^{'} (c) = {(\frac{c τ_{0}^{'}}{2 τ_{0}})}^{'} (c^{2} τ_{0} - τ_{2}) + \frac{c τ_{0}^{'}}{2 τ_{0}} (2 c τ_{0} + c^{2} τ_{0}^{'} - τ_{2}^{'}) - τ_{2}^{'} = {(\frac{c τ_{0}^{'}}{2 τ_{0}})}^{'} (c^{2} τ_{0} - τ_{2}),

because the extra term vanishes:

\frac{c τ_{0}^{'}}{2 τ_{0}} (2 c τ_{0} + c^{2} τ_{0}^{'} - τ_{2}^{'}) - τ_{2}^{'} = 2 c^{2} f (c) + \frac{2 c^{3} {f (c)}^{2}}{τ_{0}} - \frac{2 c^{3} {f (c)}^{2}}{τ_{0}} - 2 c^{2} f (c) = 0 .

Because

c^{2} τ_{0} - τ_{2} > 0

and

{(\frac{c τ_{0}^{'}}{τ_{0}})}^{'} = {[c \{\log \int_{0}^{c} f (x) d x\}^{'}]}^{'} < 0

by assumption we find

g (c) < 1 .

(d)

First, assume

\{\log f (c)\}^{''} < 0

and

f^{'} (c) < 0

for

c > 0 .

Then

{[c \{\log f (c)\}^{'}]}^{'} = \{\log f (c)\}^{'} + c \{\log f (c)\}^{''} = \frac{f^{'} (c)}{f (c)} + c \{\log f (c)\}^{''} < 0 .

Secondly, assume

{[c \{\log f (c)\}^{'}]}^{'} < 0 .

Denote

F (c) = \int_{0}^{c} f (x) d x .

Then

{[c \{\log F (c)\}^{'}]}^{'} = \frac{\{c f (c)\}^{'} F (c) - c {f (c)}^{2}}{{F (c)}^{2}} = \frac{f (c)}{{F (c)}^{2}} L,

where

L = [1 + c \{\log f (c)\}^{'}] F (c) - c f (c) .

Since

f (c) \geq 0

and

F (c) > 0

for

c > 0

it has to be argued that

L < 0 .

Now

\lim_{c \to 0} L = 0

so it suffices to argue that

L^{'} < 0

for

c > 0 .

But

L^{'} = {[c \{\log f (c)\}^{'}]}^{'} F (c)

, which is negative by assumption. ■
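As a worked illustration of the conditions in part (d), and assuming only for this example that f is the standard normal density, both the log-concavity condition and the weaker condition on c{log f(c)}' can be verified directly:

```latex
\begin{aligned}
\log f(c) &= -\tfrac{1}{2} \log(2\pi) - \tfrac{1}{2} c^{2}, \\
\{\log f(c)\}' &= -c, \qquad \{\log f(c)\}'' = -1 < 0, \qquad f'(c) = -c f(c) < 0 \quad \text{for } c > 0, \\
[c \{\log f(c)\}']' &= (-c^{2})' = -2c < 0 \quad \text{for } c > 0 .
\end{aligned}
```

So in the Gaussian case the first set of assumptions holds, and the second follows from it exactly as in the first step of the proof.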

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
