# Outlier Detection in Regression Using an Iterated One-Step Approximation to the Huber-Skip Estimator

## Abstract


## 1. Introduction and Main Results

## 2. The Model and the Definition of the One-step Huber-skip
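As a concrete illustration of the estimator this section defines, the sketch below implements a one-step Huber-skip update (least squares on the observations whose absolute residual does not exceed $\widehat{\sigma}c$) and its iteration in Python with NumPy. The symmetric cutoff `c`, the normal-error consistency correction for the truncated variance, and all function names are illustrative assumptions, not the paper's exact notation or tuning.

```python
import math
import numpy as np

def huber_skip_step(y, X, beta, sigma, c=2.576):
    """One Huber-skip step: keep observations with |residual| <= sigma*c,
    then recompute least squares and a bias-corrected scale on them."""
    keep = np.abs(y - X @ beta) <= sigma * c
    Xk, yk = X[keep], y[keep]
    beta_new = np.linalg.solve(Xk.T @ Xk, Xk.T @ yk)
    # consistency correction for the truncated variance under normal errors
    phi_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    Phi_c = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    zeta = 1.0 - 2.0 * c * phi_c / (2.0 * Phi_c - 1.0)
    resid = yk - Xk @ beta_new
    sigma_new = math.sqrt(resid @ resid / (keep.sum() * zeta))
    return beta_new, sigma_new

def iterate_huber_skip(y, X, beta0, sigma0, c=2.576, m=50, tol=1e-8):
    """Iterate the one-step update from an initial estimator until
    the estimates stop moving (or m steps have been taken)."""
    beta, sigma = beta0, sigma0
    for _ in range(m):
        beta_new, sigma_new = huber_skip_step(y, X, beta, sigma, c)
        if np.max(np.abs(beta_new - beta)) < tol and abs(sigma_new - sigma) < tol:
            return beta_new, sigma_new
        beta, sigma = beta_new, sigma_new
    return beta, sigma
```

A natural choice of initial estimator is full-sample least squares; gross outliers then fall outside the first cutoff and are skipped in subsequent steps.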

#### 2.1. Asymptotic Results

#### 2.2. Assumptions for the Asymptotic Analysis

**Assumption A.** Consider model (2.1). Assume:

(i) the density $\mathsf{f}$ satisfies

- (a) $\sup_{v\in\mathbb{R}}\{(1+v^{4})\mathsf{f}(v)+(1+v^{2})|\mathsf{f}^{\prime}(v)|\}<\infty$,
- (b) it has mean zero, variance one, and a finite fourth moment,
- (c) $\overline{c},\underline{c}$ are chosen so that $\tau_{0}=\psi$ and $\tau_{1}=0$;

(ii) the regressors satisfy

- (a) $\widehat{\Sigma}_{n}=N^{\prime}\sum_{i=1}^{n}X_{i}X_{i}^{\prime}N\stackrel{\mathsf{D}}{\to}\Sigma\stackrel{a.s.}{>}0$,
- (b) $\widehat{\mu}_{n}=n^{-1/2}N^{\prime}\sum_{i=1}^{n}X_{i}\stackrel{\mathsf{D}}{\to}\mu$,
- (c) $\max_{i\le n}\mathsf{E}|n^{1/2}N^{\prime}X_{i}|^{4}=\mathrm{O}(1)$.

## 3. The Fixed Point Result

**Theorem 3.1.** Suppose Assumption A(i.b, ii.c) holds. Then ${K}_{n}$, see (2.10) and (2.13), is tight, that is,

**Theorem 3.2.** Let $m$ be fixed. Suppose Assumption A holds for the initial estimator ${\widehat{u}}_{n,m-1}$, see (2.8). Then, for all $U>0$, it holds that

**Theorem 3.3.** Suppose Assumption A holds and that $\max|\mathrm{eigen}(\Gamma)|<1$ a.s., so that $\Gamma$ is a contraction. Then the sequence of estimation errors ${\widehat{u}}_{nm}$ is tight uniformly in $m$.

**Theorem 3.4.** Suppose Assumption A holds and that $\max|\mathrm{eigen}(\Gamma)|<1$ a.s., so that $\Gamma$ is a contraction. Then
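The contraction condition in Theorems 3.3 and 3.4 is the standard requirement for a linear recursion driven by $\Gamma$ to converge to a unique fixed point. A minimal numerical illustration in Python follows; the matrix `Gamma` below is hypothetical and is not the matrix from (2.12), which the extraction has not preserved.

```python
import numpy as np

def is_contraction(Gamma):
    """Check the spectral-radius condition max|eigen(Gamma)| < 1."""
    return np.max(np.abs(np.linalg.eigvals(Gamma))) < 1.0

def fixed_point_iteration(Gamma, k, m=200):
    """Iterate the linear recursion u_m = Gamma u_{m-1} + k from u_0 = 0."""
    u = np.zeros_like(k)
    for _ in range(m):
        u = Gamma @ u + k
    return u
```

When `is_contraction(Gamma)` holds, the iterates converge geometrically to the unique fixed point $(I-\Gamma)^{-1}k$, which is the mechanism behind the uniform-in-$m$ tightness asserted above.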

**Theorem 3.5.** The autoregressive coefficient matrix $\Gamma$ in (2.12) has $p-1$ eigenvalues equal to ${\xi}_{1}/\psi$ and two eigenvalues solving

**Theorem 3.6.** Suppose $\mathsf{f}$ is symmetric with a finite third moment, ${\mathsf{f}}^{\prime}(c)\le 0$ for $c>0$, and $\lim_{c\to 0}{\mathsf{f}}^{\prime\prime}(c)<0$. Then

## 4. Distribution of the Kernel

#### 4.1. Stationary Case

#### 4.2. Deterministic Trends

#### 4.3. Unit Roots

## 5. Discussion of Possible Extensions

## Acknowledgments

## References

- P.J. Huber. “Robust estimation of a location parameter.” Ann. Math. Stat. 35 (1964): 73–101. [Google Scholar]
- R.A. Maronna, R.D. Martin, and V.J. Yohai. Robust Statistics: Theory and Methods. New York, NY, USA: Wiley, 2006. [Google Scholar]
- P.J. Huber, and E.M. Ronchetti. Robust Statistics, 2nd ed. New York, NY, USA: Wiley, 2009. [Google Scholar]
- J. Jurečková, P.K. Sen, and J. Picek. Methodological Tools in Robust and Nonparametric Statistics. London, UK: Chapman & Hall/CRC Press, 2012. [Google Scholar]
- D.F. Hendry, S. Johansen, and C. Santos. “Automatic selection of indicators in a fully saturated regression.” Comput. Stat. 23 (2008): 317–335; Erratum: 337–339. [Google Scholar]
- S. Johansen, and B. Nielsen. “An analysis of the indicator saturation estimator.” In The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Edited by J.L. Castle and N. Shephard. Oxford, UK: Oxford University Press, 2009, pp. 1–36. [Google Scholar]
- A.C. Atkinson, M. Riani, and A. Cerioli. Exploring Multivariate Data with the Forward Search. New York, NY, USA: Springer, 2004. [Google Scholar]
- P.J. Bickel. “One-step Huber estimates in the linear model.” J. Am. Statist. Assoc. 70 (1975): 428–434. [Google Scholar] [CrossRef]
- D. Ruppert, and R.J. Carroll. “Trimmed least squares estimation in the linear model.” J. Am. Statist. Assoc. 75 (1980): 828–838. [Google Scholar] [CrossRef]
- A.H. Welsh, and E. Ronchetti. “A journey in single steps: robust one step M-estimation in linear regression.” J. Stat. Plan. Infer. 103 (2002): 287–310. [Google Scholar] [CrossRef]
- G. Cavaliere, and I. Georgiev. Exploiting Infinite Variance Through Dummy Variables in an AR Model. Discussion paper; Lisbon, Portugal: Universidade Nova de Lisboa, 2011. [Google Scholar]
- M.B. Dollinger, and R.G. Staudte. “Influence functions of iteratively reweighted least squares estimators.” J. Am. Statist. Assoc. 86 (1991): 709–716. [Google Scholar] [CrossRef]
- P.J. Rousseeuw. “Least median of squares regression.” J. Am. Statist. Assoc. 79 (1984): 871–880. [Google Scholar] [CrossRef]
- P.J. Rousseeuw, and A.M. Leroy. Robust Regression and Outlier Detection. New York, NY, USA: Wiley, 1987. [Google Scholar]
- J.Á. Víšek. “The least trimmed squares. Part I: Consistency.” Kybernetika 42 (2006): 1–36. [Google Scholar]
- J.Á. Víšek. “The least trimmed squares. Part II: $\sqrt{n}$-consistency.” Kybernetika 42 (2006): 181–202. [Google Scholar]
- J.Á. Víšek. “The least trimmed squares. Part III: Asymptotic normality.” Kybernetika 42 (2006): 203–224. [Google Scholar]
- P.J. Rousseeuw. “Most robust M-estimators in the infinitesimal sense.” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 61 (1982): 541–551. [Google Scholar] [CrossRef]
- S. Johansen, and B. Nielsen. A Stochastic Expansion of the Huber-Skip Estimator for Regression Analysis. Discussion paper, work in progress; Copenhagen, Denmark: University of Copenhagen, 2013. [Google Scholar]
- X. He, and S. Portnoy. “Reweighted LS estimators converge at the same rate as the initial estimator.” Ann. Stat. 20 (1992): 2161–2167. [Google Scholar]
- S. Johansen, and B. Nielsen. Asymptotic Analysis of the Forward Search. Discussion paper 13-01; Copenhagen, Denmark: University of Copenhagen, 2013. [Google Scholar]
- A.C. Atkinson, M. Riani, and A. Cerioli. “The forward search: Theory and data analysis.” J. Korean Stat. Soc. 39 (2010): 117–134. [Google Scholar]
- S. Johansen, and B. Nielsen. “Discussion: The forward search: Theory and data analysis.” J. Korean Stat. Soc. 39 (2010): 137–145. [Google Scholar] [CrossRef]
- D.F. Hendry, and H.-M. Krolzig. “The properties of automatic Gets modelling.” Economic J. 115 (2005): C32–C61. [Google Scholar] [CrossRef]
- J.A. Doornik. “Autometrics.” In The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Edited by J.L. Castle and N. Shephard. Oxford, UK: Oxford University Press, 2009, pp. 88–121. [Google Scholar]
- D.M. Hawkins, and D.J. Olive. “Inconsistency of resampling algorithms for high-breakdown regression estimators and a new algorithm.” J. Am. Statist. Assoc. 97 (2002): 136–148. [Google Scholar] [CrossRef]
- R.S. Varga. Matrix Iterative Analysis, 2nd ed. Berlin, Germany: Springer, 2000. [Google Scholar]

## Appendix

**Proof of Theorem 3.1.** The process

**Lemma 5.1.** Suppose Assumption A holds. Define the remainder terms ${R}_{11}(u)$, ${R}_{xx}(u)$, ${R}_{x1}(u)$, ${R}_{x\epsilon}(u)$, and ${R}_{\epsilon\epsilon}(u)$ by the equations

**Proof of Lemma 5.1.** Theorem 1.1 in Johansen and Nielsen [6] states that $|{R}_{11}(u)|$, $|{R}_{xx}(u)|$, $|{R}_{x1}(u)|$, $|{R}_{x\epsilon}(u)|$, $|{R}_{\epsilon\epsilon}(u)|$ vanish when $u$ is evaluated at $\widehat{u}=\{{N}^{-1}(\widehat{\beta}-\beta),{n}^{1/2}(\widehat{\sigma}-\sigma)\}$ under the assumption that $\widehat{u}={\mathrm{O}}_{\mathsf{P}}(1)$ as $n\to\infty$. The proof of that result proceeds by noting that the assumption $\widehat{u}={\mathrm{O}}_{\mathsf{P}}(1)$ means that for all $\epsilon>0$ there exists a $U$ so that $\mathsf{P}(|\widehat{u}|\ge U)<\epsilon$, and it therefore suffices to prove that (5.1) holds. The proof of that theorem thus establishes precisely the statement (5.1), which is the desired result here. ■

**Proof of Theorem 3.2.** The updated estimator $({\widehat{\beta}}_{nm},{\widehat{\sigma}}_{nm}^{2})$ is defined in (2.6) and (2.7) in terms of the initial estimator $({\widehat{\beta}}_{n,m-1},{\widehat{\sigma}}_{n,m-1}^{2})$, and we express them in terms of ${S}_{gh}={\tilde{S}}_{gh}({\widehat{u}}_{n,m-1})$, where ${\widehat{u}}_{n,m-1}=\{{N}^{-1}({\widehat{\beta}}_{n,m-1}-\beta),{n}^{1/2}({\widehat{\sigma}}_{n,m-1}-\sigma)\}$, as follows

**Proof of Theorem 3.3.** We want to show that for all $\epsilon>0$ there exist $U>0$ and ${n}_{0}$ so that for $n\ge {n}_{0}$ it holds

**Proof of Theorem 3.4.** We want to show that for all $\eta,\epsilon>0$ there exist an ${n}_{0}$ and ${m}_{0}$ so that for $n\ge {n}_{0}$ and $m\ge {m}_{0}$ it holds that

**Proof of Theorem 3.5.** The matrices $\Gamma$ and $\Gamma-\lambda {I}_{p+1}$ are of the form

**Proof of Theorem 3.6.** $(a)$ For $c>0$ we have $\mathsf{f}(x){1}_{(|x|\le c)}\ge \mathsf{f}(c){1}_{(|x|\le c)}$ because $\mathsf{f}$ is symmetric and non-increasing. Integration gives
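Assuming a symmetric cutoff at $c$, the integration step just mentioned reads:

```latex
\int_{-c}^{c}\mathsf{f}(x)\,dx \;\ge\; \mathsf{f}(c)\int_{-c}^{c}dx \;=\; 2c\,\mathsf{f}(c),
```

so the probability mass retained by the skip is bounded below by $2c\,\mathsf{f}(c)$.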

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

Johansen, S.; Nielsen, B. Outlier Detection in Regression Using an Iterated One-Step Approximation to the Huber-Skip Estimator. *Econometrics* **2013**, *1*, 53-70.
https://doi.org/10.3390/econometrics1010053
