Article

A Smoothed Three-Part Redescending M-Estimator

by Alistair J. Martin and Brenton R. Clarke *
School of Mathematics, Statistics, Chemistry and Physics, Murdoch University, Murdoch, WA 6150, Australia
* Author to whom correspondence should be addressed.
Stats 2025, 8(2), 33; https://doi.org/10.3390/stats8020033
Submission received: 22 March 2025 / Revised: 25 April 2025 / Accepted: 26 April 2025 / Published: 30 April 2025
(This article belongs to the Section Statistical Methods)

Abstract
A smoothed M-estimator is derived from Hampel’s three-part redescending estimator for location and scale. The estimator is shown to be weakly continuous and Fréchet differentiable in the neighbourhood of the normal distribution. Asymptotic assessment is conducted at asymmetric contaminating distributions, where smoothing is shown to improve variance and change-of-variance sensitivity. Other robust metrics compared are largely unchanged, and therefore, the smoothed functions represent an improvement for asymmetric contamination near the rejection point with little downside.

1. Introduction

In this paper, we describe a refinement of the Hampel three-part redescenders [1] that smooths sharp corners in the spirit of [2,3]. In doing so, all the asymptotic theories of weak continuity and Fréchet differentiability apply to the introduced M-estimators. The central parametric family studied here is assumed to be
$$\mathcal{F} = \left\{ F_{(\mu,\sigma)}(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) : -\infty < \mu < \infty,\ \sigma > 0 \right\},$$
where $\Phi(x)$ is the standard cumulative normal distribution on the real line. Without loss of generality, estimators, including the M-estimators described in [4,5,6], can be studied at the standard normal distribution, where $(\mu, \sigma) = (0, 1)$. This paper differs from many in that the asymptotics and the performance of estimators are discussed in a neighbourhood of underlying distributions. The main distribution is standard normal, with a contamination parameterised by $\mu_2$ and $\alpha$, as follows:
$$G_0(x) = (1-\epsilon)\,\Phi(x) + \epsilon\,\Phi\!\left(\frac{x-\mu_2}{\alpha}\right), \tag{1}$$
where $0 < \epsilon < 1/2$ is considered small, $\mu_2 \in \mathbb{R}$ (the real line), and $\alpha \in \mathbb{R}^{+}$. The distribution is symmetric when $\mu_2 = 0$ and asymmetric otherwise. Contamination processes themselves may be subject to the central limit theorem, whence the mixture model is apt. Moreover, alternative heavy-tailed contamination may be represented by choosing $\mu_2$ to be arbitrarily large. Since there is no restriction placed on $\mu_2$ and $\alpha$, very large outliers can be represented using the mixture.
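As an illustration of the contaminated model (1), the following minimal sketch evaluates the mixture distribution function and draws a sample from it. The function names, the seed, and the parameter values ($\epsilon = 0.05$, $\mu_2 = 3$, $\alpha = 1$) are illustrative assumptions and are not taken from the paper's software.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mixture_cdf(x, eps, mu2, alpha):
    """G0(x) = (1 - eps) * Phi(x) + eps * Phi((x - mu2) / alpha)."""
    return (1.0 - eps) * normal_cdf(x) + eps * normal_cdf((x - mu2) / alpha)


def sample_mixture(n, eps, mu2, alpha, seed=0):
    """Draw n observations from G0 by picking a component, then sampling it."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < eps:
            draws.append(rng.gauss(mu2, alpha))  # contaminating component
        else:
            draws.append(rng.gauss(0.0, 1.0))    # central standard normal
    return draws


# Hypothetical example values: 5% contamination centred three units away.
sample = sample_mixture(20000, eps=0.05, mu2=3.0, alpha=1.0)
```

The mean of the mixture is $\epsilon \mu_2$, so the sample mean here should sit near 0.15 rather than at the central location 0, which is the kind of pull a robust estimator is meant to resist.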
In a sense, [1], following the publication of [7], predicated contaminated populations with the description of a rejection point, beyond which observations are to have zero weight in the estimation procedure. This accompanied the introduction of the first $\psi$-function in robust estimation to continuously redescend to zero, as follows:
$$\psi_{1;a,b,c}(x) = \begin{cases} x & \text{for } 0 \le |x| \le a \\ a\,\operatorname{sign} x & \text{for } a \le |x| \le b \\ a\,\dfrac{c-|x|}{c-b}\,\operatorname{sign} x & \text{for } b \le |x| \le c \\ 0 & \text{for } c \le |x|, \end{cases}$$
where $0 < a \le b \le c \le \infty$. The tuning constant $a$ defines a region about which the estimator is equivalent to the maximum likelihood estimate at $F = \Phi$, $\psi(x) = x$. The value of $b$ can be used to “Winsorise” the observations between $a$ and $b$ standard deviations from the true location. The rejection point is set by the tuning constant $c$. Figure 1 plots $\psi_{1;a,b,c}$. Setting $b = c = \infty$ yields Huber’s $\psi$-function. See [8].
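The piecewise definition of Hampel's $\psi$-function translates directly into code. The sketch below is a plain transcription; the function name `hampel_psi1` is hypothetical, and the tuning constants in the usage note ($a = 2$, $b = 2.6$, $c = 3.3$) are borrowed from the comparison reported later in the paper.

```python
import math


def hampel_psi1(x, a, b, c):
    """Hampel's three-part redescending psi-function for location.

    Assumes 0 < a <= b <= c; b = c = math.inf gives Huber's psi-function.
    (A finite b with an infinite c is not handled by this sketch.)
    """
    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax <= a:
        return x                                 # central region: psi(x) = x
    if ax <= b:
        return a * sign                          # flat ("Winsorised") region
    if ax <= c:
        return a * (c - ax) / (c - b) * sign     # linear descent to zero
    return 0.0                                   # beyond the rejection point
```

For example, `hampel_psi1(2.3, 2.0, 2.6, 3.3)` lies on the flat part and equals `a`, while any `|x|` beyond `c` is given zero weight.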
The study of the Hampel three-part redescender considered estimating location only and made use of an auxiliary estimator of scale. The work in [1,8,9,10] addresses the question of the scale of the central distribution by suggesting a robust estimator of scale, such as the median of the absolute deviations (MAD) estimator or its consistent version, that is,
$$\mathrm{MADN} = \operatorname{median}\{\,|X_i - \operatorname{median}(X_1, X_2, \ldots, X_n)|\,\} / 0.6745,$$
where the value 0.6745 makes the scale estimator consistent for the scale when the underlying distribution is indeed normal. See [11] for a recent synthesis of MADN. The estimator of location is then solved via a scaled location equation. Given the sample $X_1, X_2, \ldots, X_n$ from a hypothesised distribution in $\mathcal{F}$ and an estimate of the standard deviation $\sigma$, say, $\hat{\sigma}$, the estimating equation for location is then of the form
$$\sum_{i=1}^{n} \psi_{1;a,b,c}\!\left(\frac{X_i - \mu}{\hat{\sigma}}\right) = 0.$$
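A minimal sketch of this two-stage procedure is given below: MADN is computed first, and the location equation is then solved by a simple fixed-point iteration started at the median. The iteration scheme, the function names, and the default tuning constants are assumptions made for illustration; the paper does not prescribe a particular numerical solver.

```python
import math
import statistics


def hampel_psi1(x, a, b, c):
    """Hampel's three-part psi-function for location (0 < a <= b < c)."""
    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax <= a:
        return x
    if ax <= b:
        return a * sign
    if ax <= c:
        return a * (c - ax) / (c - b) * sign
    return 0.0


def madn(sample):
    """Normalised median of absolute deviations about the median."""
    med = statistics.median(sample)
    return statistics.median(abs(x - med) for x in sample) / 0.6745


def location_estimate(sample, a=2.0, b=2.6, c=3.3, tol=1e-12, max_iter=200):
    """Solve sum(psi((x - mu) / scale)) = 0 for mu, with scale fixed at MADN.

    Uses the update mu <- mu + scale * mean(psi(residuals)), started at the
    median so that the root nearest the bulk of the data is found.
    """
    scale = madn(sample)
    mu = statistics.median(sample)
    for _ in range(max_iter):
        residuals = ((x - mu) / scale for x in sample)
        step = scale * sum(hampel_psi1(r, a, b, c) for r in residuals) / len(sample)
        mu += step
        if abs(step) < tol:
            break
    return mu, scale


# Hypothetical data: seven symmetric observations and one gross outlier.
data = [-1.2, -0.7, -0.3, 0.0, 0.3, 0.7, 1.2, 50.0]
mu_hat, sigma_hat = location_estimate(data)
```

Because the outlier at 50 lies beyond the rejection point, it receives zero weight and the location estimate settles at the centre of the remaining symmetric observations.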
Alternatively, the estimator of both location and scale can be obtained as the solution of the simultaneous equations
$$K_n(\tau_1, \tau_2) = \frac{1}{n}\sum_{i=1}^{n} \Psi\!\left(\frac{X_i - \tau_1}{\tau_2}\right) = \int \Psi\!\left(\frac{x - \tau_1}{\tau_2}\right) dF_n(x) = 0. \tag{3}$$
Here, $F_n(x)$ is the empirical distribution of the sample, and $\Psi = (\psi_1, \psi_2)$ is a vector function used in the estimation for location and scale. In this work, we propose estimators for both location and scale and, as such, focus on joint estimation. A three-part redescending M-estimator of scale may be constructed in the following form:
$$\psi_{2;a,b,c}(x) = \begin{cases} x^2 - 1 - P & \text{for } 0 \le |x| \le a \\ a^2 - 1 - P & \text{for } a \le |x| \le b \\ (a^2 - 1 - P)\,\dfrac{c-|x|}{c-b} & \text{for } b \le |x| \le c \\ 0 & \text{for } c \le |x|, \end{cases} \tag{4}$$
where $P$ is defined implicitly from the equation $\int \psi_{2;a,b,c}(x)\, d\Phi(x) = 0$. See, for example, [12]. Setting $b = c = \infty$ yields Huber’s Proposal 2 estimator for scale, defined in [8]. Equation (4) is illustrated in Figure 1. Then to obtain the estimator, one solves (3) with $\Psi = \Psi_{a,b,c} = (\psi_{1;a,b,c}, \psi_{2;a,b,c})$. Note that these functions are continuous but not smooth at $a$, $b$, and $c$.
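Since (4) can be written as $w(x)\,(\min(x^2, a^2) - 1 - P)$, where $w$ is 1 up to $b$, descends linearly to 0 at $c$, and vanishes beyond, the constant $P$ is a ratio of two integrals against the standard normal density. The sketch below computes it with a composite Simpson rule; the helper names and the choice of quadrature are assumptions for illustration (the paper's Supplementary Materials give explicit forms instead).

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def weight(x, b, c):
    """1 up to b, linear descent to 0 at c, and 0 beyond the rejection point."""
    ax = abs(x)
    if ax <= b:
        return 1.0
    if ax <= c:
        return (c - ax) / (c - b)
    return 0.0


def hampel_psi2(x, a, b, c, P):
    """Three-part redescending psi-function for scale, Equation (4)."""
    return weight(x, b, c) * (min(x * x, a * a) - 1.0 - P)


def solve_P(a, b, c):
    """P such that the integral of psi_2 against the normal density is zero."""
    num = den = 0.0
    for lo, hi in ((0.0, a), (a, b), (b, c)):  # piece by piece, x >= 0 only
        num += simpson(lambda x: weight(x, b, c) * (min(x * x, a * a) - 1.0) * phi(x), lo, hi)
        den += simpson(lambda x: weight(x, b, c) * phi(x), lo, hi)
    return num / den  # the factor 2 from symmetry cancels in the ratio


P = solve_P(2.0, 2.6, 3.3)
```

Integrating over each polynomial piece separately keeps the quadrature accurate at the corners $a$, $b$, and $c$.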
Important robustness properties are a result of Fréchet differentiability, including uniform asymptotic consistency of the estimate’s variability assessment based on the influence function. See [2]. Proofs of Fréchet differentiability in the neighbourhood of a central distribution are challenging for non-smooth ψ functions; hence, smoothing to achieve a proof of Fréchet differentiability is desirable.
The remainder of this paper is structured as follows: In Section 2.1, we describe the smoothed M-estimator. Section 2.2 describes Fréchet differentiability and introduces theorems necessary to prove weak continuity and Fréchet differentiability of the introduced estimator. Additional robustness measures are introduced in Section 2.3, before comparing the smoothed and non-smoothed variants against these measures and considering their asymptotic behaviour at G 0 in Section 3.

2. Materials and Methods

2.1. A Smoothed Three-Part Redescender

The Hampel three-part redescending $\psi$-functions are continuous but not continuously differentiable at $x = i \in \{\pm a, \pm b, \pm c\}$. To rectify this, one may insert smoothing functions in the $2\delta$ neighbourhood of each $i$, where $\delta \le \tfrac{1}{2}\min\{a,\ b-a,\ c-b\}$. As the three-part redescender is defined in terms of piecewise polynomials with heuristic tuning constants, it is desirable to retain the same properties while attaining twice differentiable functions such that higher-order asymptotics could be explored in later works. Interpolation methods, such as that outlined by ([13] p. 319), may be extended to fifth-degree polynomials, $g_j^{i}(x)$, with the following constraints to ensure the resultant $\psi$ functions are twice differentiable:
$$\begin{aligned} g_j^{i}(i-\delta) &= \psi_{j;a,b,c}(i-\delta), & g_j^{i}(i+\delta) &= \psi_{j;a,b,c}(i+\delta), \\ {g_j^{i}}'(i-\delta) &= \psi_{j;a,b,c}'(i-\delta), & {g_j^{i}}'(i+\delta) &= \psi_{j;a,b,c}'(i+\delta), \\ {g_j^{i}}''(i-\delta) &= \psi_{j;a,b,c}''(i-\delta), & {g_j^{i}}''(i+\delta) &= \psi_{j;a,b,c}''(i+\delta), \end{aligned}$$
where $\psi_{j;a,b,c}$ for $j \in \{1, 2\}$ denotes Hampel’s location and Clarke and Milne’s scale functions for the three-part redescender with $a$, $b$, and $c$ as given in [12]. From these boundary conditions, it is simple to formulate and solve a system of linear equations, as follows:
$$A_j^{i}\, x_j^{i} = b_j^{i} \tag{5}$$
to identify the polynomial coefficients $x_j^{i}$ for each $g_j^{i}(x)$. Here, $A_j^{i} x_j^{i}$ represents the left side of the aforementioned constraints, and $b_j^{i}$ the right side. For example, the coefficients for
$$\begin{aligned} g_1^{a}(x) &= \alpha_{1,5}x^5 + \alpha_{1,4}x^4 + \alpha_{1,3}x^3 + \alpha_{1,2}x^2 + \alpha_{1,1}x + \alpha_{1,0} \\ {g_1^{a}}'(x) &= 5\alpha_{1,5}x^4 + 4\alpha_{1,4}x^3 + 3\alpha_{1,3}x^2 + 2\alpha_{1,2}x + \alpha_{1,1} \\ {g_1^{a}}''(x) &= 20\alpha_{1,5}x^3 + 12\alpha_{1,4}x^2 + 6\alpha_{1,3}x + 2\alpha_{1,2} \end{aligned}$$
are the solution of (5), where
$$A_1^{a} = \begin{pmatrix} (a-\delta)^5 & (a-\delta)^4 & (a-\delta)^3 & (a-\delta)^2 & (a-\delta) & 1 \\ (a+\delta)^5 & (a+\delta)^4 & (a+\delta)^3 & (a+\delta)^2 & (a+\delta) & 1 \\ 5(a-\delta)^4 & 4(a-\delta)^3 & 3(a-\delta)^2 & 2(a-\delta) & 1 & 0 \\ 5(a+\delta)^4 & 4(a+\delta)^3 & 3(a+\delta)^2 & 2(a+\delta) & 1 & 0 \\ 20(a-\delta)^3 & 12(a-\delta)^2 & 6(a-\delta) & 2 & 0 & 0 \\ 20(a+\delta)^3 & 12(a+\delta)^2 & 6(a+\delta) & 2 & 0 & 0 \end{pmatrix}, \qquad x_1^{a} = \begin{pmatrix} \alpha_{1,5} & \alpha_{1,4} & \alpha_{1,3} & \alpha_{1,2} & \alpha_{1,1} & \alpha_{1,0} \end{pmatrix}^{T},$$
and
$$b_1^{a} = \begin{pmatrix} \psi_{1;a,b,c}(a-\delta) \\ \psi_{1;a,b,c}(a+\delta) \\ \psi_{1;a,b,c}'(a-\delta) \\ \psi_{1;a,b,c}'(a+\delta) \\ \psi_{1;a,b,c}''(a-\delta) \\ \psi_{1;a,b,c}''(a+\delta) \end{pmatrix} = \begin{pmatrix} (a-\delta) \\ a \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
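The linear system (5) for $g_1^{a}$ is small enough to solve exactly. The sketch below builds $A_1^{a}$ and $b_1^{a}$ as above and applies Gauss-Jordan elimination over rational numbers, which sidesteps the ill-conditioning of the monomial basis. The function names and the example values $a = 2$, $\delta = 3/10$ are assumptions for illustration.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def quintic_rows(t):
    """Rows for the value, first and second derivative of a quintic at t."""
    return (
        [t**5, t**4, t**3, t**2, t, 1],
        [5 * t**4, 4 * t**3, 3 * t**2, 2 * t, 1, 0],
        [20 * t**3, 12 * t**2, 6 * t, 2, 0, 0],
    )


def smoothing_coefficients(a, delta):
    """Coefficients (alpha_5, ..., alpha_0) of g_1^a from the system (5)."""
    lo, hi = a - delta, a + delta
    value_lo, slope_lo, curve_lo = quintic_rows(lo)
    value_hi, slope_hi, curve_hi = quintic_rows(hi)
    A = [value_lo, value_hi, slope_lo, slope_hi, curve_lo, curve_hi]
    b = [lo, a, 1, 0, 0, 0]  # psi_1 and its derivatives either side of a
    return solve_linear(A, b)


coefficients = smoothing_coefficients(Fraction(2), Fraction(3, 10))
```

For these boundary values the leading coefficient $\alpha_{1,5}$ comes out as exactly zero, so the location smoother is in fact a quartic.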

2.1.1. Smoothed Three-Part Redescender for Location

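Inserting polynomial smoothing functions in the $2\delta$ regions around $a$, $b$, and $c$ yields the following $\psi$ function for location:
$$\psi_{1;a,b,c,\delta}(x) = \begin{cases} x & |x| \le (a-\delta) \\ g_1^{a}(x) & (a-\delta) < |x| \le (a+\delta) \\ a\,\operatorname{sign} x & (a+\delta) \le |x| \le (b-\delta) \\ g_1^{b}(x) & (b-\delta) < |x| \le (b+\delta) \\ a\,\dfrac{c-|x|}{c-b}\,\operatorname{sign} x & (b+\delta) \le |x| \le (c-\delta) \\ g_1^{c}(x) & (c-\delta) < |x| \le (c+\delta) \\ 0 & \text{otherwise} \end{cases}$$
where
$$\begin{aligned} g_1^{a}(x) &= \left(\alpha_{1,4}x^4 + \alpha_{1,2}x^2 + \alpha_{1,0}\right)\operatorname{sign} x + \alpha_{1,3}x^3 + \alpha_{1,1}x \\ g_1^{b}(x) &= \left(\beta_{1,4}x^4 + \beta_{1,2}x^2 + \beta_{1,0}\right)\operatorname{sign} x + \beta_{1,3}x^3 + \beta_{1,1}x \\ g_1^{c}(x) &= \left(\gamma_{1,4}x^4 + \gamma_{1,2}x^2 + \gamma_{1,0}\right)\operatorname{sign} x + \gamma_{1,3}x^3 + \gamma_{1,1}x \end{aligned}$$
and
$$\begin{aligned} \alpha_{1,4} &= \zeta \\ \alpha_{1,3} &= -4a\zeta \\ \alpha_{1,2} &= 6\left(a^2 - \delta^2\right)\zeta \\ \alpha_{1,1} &= -4\left(a^3 - 3a\delta^2 - 2\delta^3\right)\zeta \\ \alpha_{1,0} &= \left(a^4 - 6a^2\delta^2 + 8a\delta^3 - 3\delta^4\right)\zeta \end{aligned}$$
$$\begin{aligned} \beta_{1,4} &= -a\xi \\ \beta_{1,3} &= 4ab\xi \\ \beta_{1,2} &= -6a\left(b^2 - \delta^2\right)\xi \\ \beta_{1,1} &= 4a\left(b^3 - 3b\delta^2 + 2\delta^3\right)\xi \\ \beta_{1,0} &= -a\left(b^4 - 6b^2\delta^2 - 8b\delta^3 - 3\delta^4 + 16c\delta^3\right)\xi \end{aligned}$$
$$\begin{aligned} \gamma_{1,4} &= a\xi \\ \gamma_{1,3} &= -4ac\xi \\ \gamma_{1,2} &= 6a\left(c^2 - \delta^2\right)\xi \\ \gamma_{1,1} &= -4a\left(c^3 - 3c\delta^2 - 2\delta^3\right)\xi \\ \gamma_{1,0} &= a\left(c^4 - 6c^2\delta^2 - 8c\delta^3 - 3\delta^4\right)\xi \end{aligned}$$
for $\zeta = \left(16\delta^3\right)^{-1}$ and $\xi = \left(16\delta^3(b-c)\right)^{-1}$. Figure 2 plots the smoothed three-part redescender for location. In this general form, the functional may appear to suffer from too many parameters; however, the coefficients $\alpha_{1,\cdot}$, $\beta_{1,\cdot}$, $\gamma_{1,\cdot}$ are wholly dependent upon $a$, $b$, $c$, and $\delta$. The resultant $\psi$-function is piecewise polynomial.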
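The closed-form coefficients above can be checked numerically. The sketch below evaluates the smoothed location function directly from them, using the fact that each smoother is the odd extension of a quartic in $|x|$; the function name and the example constants ($a = 2$, $b = 2.6$, $c = 3.3$, $\delta = 0.3$) are assumptions for illustration.

```python
import math


def smoothed_psi1(x, a, b, c, delta):
    """Smoothed three-part redescending psi-function for location."""
    d = delta
    zeta = 1.0 / (16.0 * d**3)
    xi = 1.0 / (16.0 * d**3 * (b - c))
    # Coefficients ordered from x^4 down to the constant term.
    alpha = (zeta, -4 * a * zeta, 6 * (a * a - d * d) * zeta,
             -4 * (a**3 - 3 * a * d * d - 2 * d**3) * zeta,
             (a**4 - 6 * a * a * d * d + 8 * a * d**3 - 3 * d**4) * zeta)
    beta = (-a * xi, 4 * a * b * xi, -6 * a * (b * b - d * d) * xi,
            4 * a * (b**3 - 3 * b * d * d + 2 * d**3) * xi,
            -a * (b**4 - 6 * b * b * d * d - 8 * b * d**3 - 3 * d**4 + 16 * c * d**3) * xi)
    gamma = (a * xi, -4 * a * c * xi, 6 * a * (c * c - d * d) * xi,
             -4 * a * (c**3 - 3 * c * d * d - 2 * d**3) * xi,
             a * (c**4 - 6 * c * c * d * d - 8 * c * d**3 - 3 * d**4) * xi)

    def quartic(k, t):
        """Horner evaluation of k[0] t^4 + k[1] t^3 + k[2] t^2 + k[3] t + k[4]."""
        return (((k[0] * t + k[1]) * t + k[2]) * t + k[3]) * t + k[4]

    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax <= a - d:
        return x
    if ax <= a + d:
        return sign * quartic(alpha, ax)   # g_1^a, odd in x
    if ax <= b - d:
        return sign * a
    if ax <= b + d:
        return sign * quartic(beta, ax)    # g_1^b
    if ax <= c - d:
        return sign * a * (c - ax) / (c - b)
    if ax <= c + d:
        return sign * quartic(gamma, ax)   # g_1^c
    return 0.0
```

With these constants the function agrees with the non-smoothed version outside the $2\delta$ windows and joins it continuously at $a \pm \delta$, $b \pm \delta$, and $c \pm \delta$.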

2.1.2. Smoothed Three-Part Redescender for Scale

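The smoothing procedure for the scale $\psi$ function yields the following definition:
$$\psi_{2;a,b,c,\delta}(x) = \begin{cases} x^2 - 1 - P & |x| \le (a-\delta) \\ g_2^{a}(x) & (a-\delta) \le |x| \le (a+\delta) \\ \eta & (a+\delta) \le |x| \le (b-\delta) \\ g_2^{b}(x) & (b-\delta) < |x| \le (b+\delta) \\ \eta\,\dfrac{c-|x|}{c-b} & (b+\delta) \le |x| \le (c-\delta) \\ g_2^{c}(x) & (c-\delta) \le |x| \le (c+\delta) \\ 0 & \text{otherwise} \end{cases}$$
where
$$\begin{aligned} g_2^{a}(x) &= \left(\alpha_{2,5}x^5 + \alpha_{2,3}x^3 + \alpha_{2,1}x\right)\operatorname{sign} x + \alpha_{2,4}x^4 + \alpha_{2,2}x^2 + \alpha_{2,0} \\ g_2^{b}(x) &= \beta_{2,4}x^4 + \beta_{2,2}x^2 + \left(\beta_{2,3}x^3 + \beta_{2,1}x\right)\operatorname{sign} x + \beta_{2,0} \\ g_2^{c}(x) &= \gamma_{2,4}x^4 + \gamma_{2,2}x^2 + \left(\gamma_{2,3}x^3 + \gamma_{2,1}x\right)\operatorname{sign} x + \gamma_{2,0} \end{aligned}$$
and
$$\begin{aligned} \alpha_{2,5} &= \zeta \\ \alpha_{2,4} &= -3a\zeta \\ \alpha_{2,3} &= 2\left(a^2 - 3\delta^2\right)\zeta \\ \alpha_{2,2} &= 2\left(a^3 + 3a\delta^2 + 4\delta^3\right)\zeta \\ \alpha_{2,1} &= -3\left(a^4 - 2a^2\delta^2 + \delta^4\right)\zeta \\ \alpha_{2,0} &= -\left(16P\delta^3 + 3a\delta^4 - a^5 + 16\delta^3 - 8a^2\delta^3 + 6a^3\delta^2\right)\zeta \end{aligned}$$
$$\begin{aligned} \beta_{2,4} &= -\eta\xi \\ \beta_{2,3} &= 4b\eta\xi \\ \beta_{2,2} &= -6\left(b^2 - \delta^2\right)\eta\xi \\ \beta_{2,1} &= 4\left(b^3 + 2\delta^3 - 3b\delta^2\right)\eta\xi \\ \beta_{2,0} &= \left(6b^2\delta^2 + 8b\delta^3 - 16c\delta^3 + 3\delta^4 - b^4\right)\eta\xi \end{aligned}$$
$$\begin{aligned} \gamma_{2,4} &= \eta\xi \\ \gamma_{2,3} &= -4c\eta\xi \\ \gamma_{2,2} &= 6\left(c^2 - \delta^2\right)\eta\xi \\ \gamma_{2,1} &= 4\left(2\delta^3 - c^3 + 3c\delta^2\right)\eta\xi \\ \gamma_{2,0} &= -\left(3\delta^4 - c^4 + 6c^2\delta^2 + 8c\delta^3\right)\eta\xi \end{aligned}$$
for $\zeta = \left(16\delta^3\right)^{-1}$ and $\xi = \left(16\delta^3(b-c)\right)^{-1}$, as before, and $\eta = a^2 - 1 - P$. Once again, the coefficients $\alpha_{2,\cdot}$, $\beta_{2,\cdot}$, $\gamma_{2,\cdot}$ are not free variables; they are dependent upon $a$, $b$, $c$, and $\delta$.
As for the non-smoothed three-part redescender, $P$ may be found as the root of the implicit equation $\int \psi_2(x)\,\phi(x)\,dx = 0$, or can be expanded to an explicit form in terms of the standard normal distribution pdf and cdf (as shown in the Supplementary Materials). Figure 3 plots the above smoothed function for scale.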
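Every piece of the smoothed scale function is affine in $P$ (it enters only through $\eta$ and through the constant term $\alpha_{2,0}$), so the implicit equation for $P$ can be solved from two evaluations of the integral. The sketch below does this with a composite Simpson rule; the function names, the quadrature, and the example constants are assumptions for illustration and are not the explicit form given in the Supplementary Materials.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def smoothed_psi2(x, a, b, c, delta, P):
    """Smoothed three-part redescending psi-function for scale."""
    d = delta
    zeta = 1.0 / (16.0 * d**3)
    xi = 1.0 / (16.0 * d**3 * (b - c))
    eta = a * a - 1.0 - P
    # Coefficients ordered from the highest power down to the constant term.
    alpha = (zeta, -3 * a * zeta, 2 * (a * a - 3 * d * d) * zeta,
             2 * (a**3 + 3 * a * d * d + 4 * d**3) * zeta,
             -3 * (a**4 - 2 * a * a * d * d + d**4) * zeta,
             -(16 * P * d**3 + 3 * a * d**4 - a**5 + 16 * d**3
               - 8 * a * a * d**3 + 6 * a**3 * d * d) * zeta)
    beta = (-eta * xi, 4 * b * eta * xi, -6 * (b * b - d * d) * eta * xi,
            4 * (b**3 + 2 * d**3 - 3 * b * d * d) * eta * xi,
            (6 * b * b * d * d + 8 * b * d**3 - 16 * c * d**3 + 3 * d**4 - b**4) * eta * xi)
    gamma = (eta * xi, -4 * c * eta * xi, 6 * (c * c - d * d) * eta * xi,
             4 * (2 * d**3 - c**3 + 3 * c * d * d) * eta * xi,
             -(3 * d**4 - c**4 + 6 * c * c * d * d + 8 * c * d**3) * eta * xi)

    def poly(k, t):
        value = 0.0
        for coefficient in k:          # Horner evaluation
            value = value * t + coefficient
        return value

    ax = abs(x)                        # the function is even in x
    if ax <= a - d:
        return ax * ax - 1.0 - P
    if ax <= a + d:
        return poly(alpha, ax)
    if ax <= b - d:
        return eta
    if ax <= b + d:
        return poly(beta, ax)
    if ax <= c - d:
        return eta * (c - ax) / (c - b)
    if ax <= c + d:
        return poly(gamma, ax)
    return 0.0


def normal_integral(a, b, c, delta, P):
    """Integral of psi_2 against the normal density, piece by piece."""
    knots = (0.0, a - delta, a + delta, b - delta, b + delta, c - delta, c + delta)
    return 2.0 * sum(
        simpson(lambda x: smoothed_psi2(x, a, b, c, delta, P) * phi(x), lo, hi)
        for lo, hi in zip(knots, knots[1:]))


def solve_P(a, b, c, delta):
    """Use the affine dependence on P: I(P) = I(0) + P * (I(1) - I(0))."""
    i0 = normal_integral(a, b, c, delta, 0.0)
    i1 = normal_integral(a, b, c, delta, 1.0)
    return i0 / (i0 - i1)


P = solve_P(2.0, 2.6, 3.3, 0.3)
```

A root-finder such as bisection would serve equally well; the two-point solve is used here only because the dependence on $P$ is exactly linear.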

2.2. The Influence Function and Fréchet Derivative

To compare robustness and the behaviour of estimators, one requires appropriate measures and tools. The estimator, which is a solution of Equation (3), is related here in its implicit functional form. Quoting from [6] (p. 29), where now $\mathcal{F}$ is any suitable parametric family, “A single functional root of Equation (3) may be written $T[\Psi, F_n]$ where $F_n$ is the empirical distribution function. More generally $T[\Psi, G]$ can be defined as a functional root of equations
$$K_G(\tau) = \int_{\mathbb{R}} \Psi(x, \tau)\, dG(x) = 0. \tag{6}$$
$T[\Psi, G] = +\infty$ if no root exists. In the event of several roots of (6), a distance criterion $\rho_o$ is employed to select the estimator from them. Classically, the roots correspond to extremes of the distance, but this is not assumed here. Specifically, the general functional $T[\Psi, \rho_o, \cdot]$ is defined as the solution to
$$\inf_{\tau \in S(\Psi, G)} \rho_o(G, \tau) = \rho_o(G, T[\Psi, \rho_o, G]),$$
where
$$S(\Psi, G) = \left\{ \tau \in \mathcal{T} \;\middle|\; \int_{\mathbb{R}} \Psi(x, \tau)\, dG(x) = 0 \right\},$$
if a solution exists. Otherwise $T[\Psi, \rho_o, G] = +\infty$. Conditions on both $\Psi$ and $\rho_o$ determine consistency, the very minimum requirement being Fisher consistency, which means that $T[\Psi, \rho_o, F_\tau] = \tau$ for all $\tau \in \mathcal{T}$. That is if you have the whole population you must get the corresponding true parameter. This implicitly also assumes the parametric model for the population is identifiable. A parametric family $\mathcal{F}$ is identifiable if and only if $F_{\tau_1} = F_{\tau_2}$ implies $\tau_1 = \tau_2$ for all $\tau_1, \tau_2 \in \mathcal{T}$. That is, two different parameters cannot give the same distribution”.
An easily afforded selection functional $\rho_o$ at the location and scale parametric model is
$$\rho_o(G, \tau) = \left\| \tau - \left( G^{-1}\!\left(\tfrac{1}{2}\right),\ \mathrm{MADN}(G) \right) \right\|^2 \tag{9}$$
Here, $\|\cdot\|$ is the Euclidean norm. The functional form of MADN and the influence function of MAD are discussed in [10] Section 5.2 and also [11,14].
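In sample terms, the selection functional picks, among all candidate roots of the estimating equations, the one closest to the pair (median, MADN). The sketch below shows that selection step only; the function names, the data, and the candidate roots are hypothetical, and the squared Euclidean distance is used (squaring does not change which root is nearest).

```python
import statistics


def madn(sample):
    """Normalised median of absolute deviations about the median."""
    med = statistics.median(sample)
    return statistics.median(abs(x - med) for x in sample) / 0.6745


def select_root(roots, sample):
    """Return the (location, scale) root nearest to (median, MADN)."""
    centre = (statistics.median(sample), madn(sample))

    def distance(tau):
        return (tau[0] - centre[0]) ** 2 + (tau[1] - centre[1]) ** 2

    return min(roots, key=distance)


# Hypothetical sample and hypothetical roots of the estimating equations.
data = [-1.2, -0.7, -0.3, 0.0, 0.3, 0.7, 1.2, 50.0]
candidate_roots = [(0.02, 1.05), (48.0, 0.3), (3.1, 6.0)]
chosen = select_root(candidate_roots, data)
```

Redescending $\psi$-functions can produce spurious roots centred on clusters of outliers, which is why a selection rule of this kind is needed at all.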
First introduced by [1,15], the Influence Function (IF) is a heuristic tool that indicates the infinitesimal influence of contamination at a given point, $x = z_0$, on the asymptotic value $T[\psi, G_0]$. In our discussion, $\tau_0 = T[\Psi, \rho_o, G_0]$. For the estimating functional $T$ at a distribution $G_0$, the IF is defined as
$$\mathrm{IF}(z_0; G_0, T) = \lim_{\varepsilon \to 0^{+}} \frac{T[(1-\varepsilon)G_0 + \varepsilon\Delta_{z_0}] - T[G_0]}{\varepsilon}$$
where $\Delta_{z_0}$ puts mass 1 at the point $z_0$. In [5,16], the authors write an equivalent definition, as follows:
$$\mathrm{IF}(z_0; G_0, T) = -M(\tau_0, G_0)^{-1}\, \Psi(z_0, \tau_0).$$
The matrix
$$M(\tau_0, G_0) = \int_{\mathbb{R}} \left. \frac{\partial}{\partial \tau} \Psi(x, \tau) \right|_{\tau = \tau_0} dG_0(x) \tag{12}$$
is assumed to be nonsingular. In this form, we see that the Influence Function is proportional to $\Psi$, and indeed, the Fréchet derivative in Theorem 2 to be expressed below is exactly
$$T'_{G_0}(G - G_0) = \int_{\mathbb{R}} \mathrm{IF}(x; G_0, T)\, d(G - G_0)(x).$$
See [17] for an earlier chronological account of the development of the Fréchet derivative in statistics.
So long as we have a compact set D that excludes the solution for scale τ 2 = 0 in which we search for the roots, the conditions set out below are easy to check and are sufficient to provide weak continuity and Fréchet differentiability.
  • Conditions $W$
$W_0$: $T[\Psi, \rho_o, G_0] = \tau_0$.
$W_1$: $\Psi$ is a $2 \times 1$ vector function on $\mathbb{R} \times \Theta$ and has continuous partial derivatives on $\mathbb{R} \times D$, where $D \subset \Theta$ is some non-degenerate compact interval containing $\tau_0 = (\tau_{01}, \tau_{02})$ in its interior and for which the scale $\tau_{02}$ is bounded away from zero.
$W_2$: $\Psi(x, \tau)|_{\tau \in D}$ and $\frac{\partial}{\partial \tau}\Psi(x, \tau)|_{\tau \in D}$ are bounded above in Euclidean norm $|A| = \{\operatorname{trace}(A^{T}A)\}^{1/2}$ by a constant.
$W_3$: The matrix $M(\tau_0, G_0)$, given by (12), is nonsingular.
The innovation in the conditions made here is to describe the Conditions at $G_0$, rather than an assumed $F_\tau$. Note that we discuss the behaviour of the estimating functional at a solution $T[\Psi, \rho_o, G_0] = \tau_0$ different from $T[\Psi, \rho_o, F_\tau] = \tau$. The restriction to a compact set $D$ is for convenience rather than necessity, since it is easier to check the conditions. Since $G_0$ is chosen to be from the mixture distribution (1), which has for any given $G_0$ a bounded density and is an absolutely continuous distribution, the Conditions $W$ imply Conditions $A$ analogous to those defined in [6] (pp. 31–32) for the neighbourhoods generated by any one of the metric distances, these being Kolmogorov (also known as the Kolmogorov–Smirnov), Lévy, or Prohorov metrics. For example, the Kolmogorov–Smirnov metric or supremum norm between the two distributions $F$ and $G$ defined on the real line is $d_{KS}(F, G) = \sup_x |F(x) - G(x)|$. At an absolutely continuous distribution $G_0$ with a density that is bounded by a constant $C > 0$, it is noted, compare [6] problem 2.3, as follows:
$$d_{KS}(G, G_0) \le (C + 1)\, d_L(G, G_0) \le (C + 1)\, d_P(G, G_0),$$
where $d_L$ is the Lévy metric and $d_P$ is the Prohorov metric. The Prohorov metric governs the weak topology, as discussed in [18], and it is known that the empirical distribution generated by $G_0$ converges weakly to $G_0$ on a set of probability one. Hence, for a weakly continuous estimator, that is, an estimator that is continuous with respect to the Prohorov metric, we have automatic consistency, as explained further below. Moreover, the smoothed M-estimators discussed in this paper will be robust at $G_0$ in the sense of Theorem 2 of [18], as demonstrated in Theorem 2.5 and Corollary 2.3 in [6].
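To give a feel for the size of these neighbourhoods, the sketch below approximates the Kolmogorov–Smirnov distance between the mixture (1) and the standard normal on a grid. Since $G_0 - \Phi = \epsilon\,[\Phi((x - \mu_2)/\alpha) - \Phi]$, the distance can never exceed $\epsilon$. The function names, grid limits, and example values are assumptions for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_distance_to_normal(eps, mu2, alpha, lo=-15.0, hi=25.0, steps=40000):
    """Grid approximation of sup_x |G0(x) - Phi(x)| for the mixture (1)."""
    best = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        gap = eps * abs(normal_cdf((x - mu2) / alpha) - normal_cdf(x))
        best = max(best, gap)
    return best


distance = ks_distance_to_normal(eps=0.05, mu2=3.0, alpha=1.0)
```

As $\mu_2$ grows the distance approaches $\epsilon$, so even arbitrarily remote contamination stays within a Kolmogorov–Smirnov neighbourhood of radius $\epsilon$ around $\Phi$.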
However, for $\Psi$ choices with continuous partial derivatives, as is the case of $\Psi = (\psi_{1;a,b,c,\delta}, \psi_{2;a,b,c,\delta})$, and noting that the functions are of total bounded variation, we can claim the existence of a weakly continuous and Fréchet differentiable root as a corollary to [6] [Theorem 2.3, Theorem 2.6] and [6] [Theorem 2.5] using the auxiliary selection functional $\rho(G, \tau) = |\tau - \tau_0|$.
Theorem 1
(Adapting Theorem 2.5 of [6]). Assume Conditions $W$. Then for $\epsilon$ small and non-negative, there exists a Prohorov neighbourhood $U(G_0)$ of $G_0$ such that the functional defined via $T[\Psi, \rho, \cdot]$ is weakly continuous at each $G \in U(G_0)$.
See Appendix A for the proof of Theorem 1. In addition, we can write
Theorem 2
(Adapting Theorem 2.6 of [6]). Assume Conditions $W$ and assume for all $G \in \mathcal{G}$, the space of distributions on the real line, that
$$\int_{\mathbb{R}} \Psi(x, \tau_0)\, d(G - G_0)(x) = O(d_{KS}(G, G_0)) \tag{13}$$
as $d_{KS}(G, G_0) \to 0$. Then
$$T[\Psi, \rho, G] - T[\Psi, \rho, G_0] - T'_{G_0}(G - G_0) = o(d_{KS}(G, G_0)),$$
where
$$T'_{G_0}(G - G_0) = -M(\tau_0, G_0)^{-1} \int_{\mathbb{R}} \Psi(x, \tau_0)\, d(G - G_0)(x)$$
That is, $T[\Psi, \rho, \cdot]$ is Fréchet differentiable at $G_0$ with respect to the Kolmogorov–Smirnov metric.
See the proof of Theorem 2 in Appendix B.
As argued in [10] (pp. 38–39) and [6], Corollary 2.5, with $F_\theta$ replaced by $G_0$, it is possible to state that there exists a unique consistent root of equations $T[\Psi, \rho, F_n]$ to $T[\Psi, \rho, G_0]$ for which the asymptotic distribution
$$\sqrt{n}\,\left( T[\Psi, \rho, F_n] - T[\Psi, \rho, G_0] \right)$$
is multivariate normal with mean zero and asymptotic variance $\Sigma(T, G_0)$, which is given below in (14).
For example, the estimator is Fréchet differentiable at $G_0$ with respect to the supremum norm, essentially using the Kolmogorov–Smirnov metric distance to measure the distance between distributions. This can therefore give the asymptotic distribution of $\sqrt{n}\,(T[\Psi, F_n] - T[\Psi, G_0])$ as multivariate normal with a mean zero and variance–covariance matrix, given by $\Sigma(T, G_0)$, which is
$$M(\tau_0, G_0)^{-1} \int_{\mathbb{R}} \Psi(x, \tau_0)\, \Psi(x, \tau_0)^{T}\, dG_0(x)\, \{M(\tau_0, G_0)^{-1}\}^{T}, \tag{14}$$
where we interpret $\Psi(x, \tau)$ to be $\Psi\!\left(\frac{x - \tau_1}{\tau_2}\right)$ when $(\tau_1, \tau_2) = \tau$. See [6] (p. 46), Corollary 2.4 and Corollary 2.5.
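For a location estimator alone at $\Phi$, with the scale treated as known and equal to 1 (a simplifying assumption made only for this sketch), the sandwich form (14) collapses to the scalar $\int \psi^2\, d\Phi \,/\, (\int \psi'\, d\Phi)^2$. The code below evaluates it by quadrature, replacing $\int \psi'\, d\Phi$ with $\int \psi(x)\, x\, \phi(x)\, dx$ through integration by parts so that no derivative of $\psi$ is needed. Function names and tuning constants are illustrative.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def hampel_psi1(x, a, b, c):
    """Hampel's three-part psi-function for location (0 < a <= b < c)."""
    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax <= a:
        return x
    if ax <= b:
        return a * sign
    if ax <= c:
        return a * (c - ax) / (c - b) * sign
    return 0.0


def location_variance(psi, lo=-10.0, hi=10.0):
    """Asymptotic variance of a location M-estimator at the standard normal."""
    A = simpson(lambda x: psi(x) ** 2 * phi(x), lo, hi)
    B = simpson(lambda x: psi(x) * x * phi(x), lo, hi)  # equals E[psi'(X)]
    return A / (B * B)


variance_mean = location_variance(lambda x: x)
variance_hampel = location_variance(lambda x: hampel_psi1(x, 2.0, 2.6, 3.3))
```

The sample mean ($\psi(x) = x$) attains variance 1 at $\Phi$, and any other $\psi$ gives a larger value, whose reciprocal is the asymptotic efficiency.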
When the functions Ψ have sharp corners, as for the three-part redescender, the weak continuity and asymptotic robustness arguments are only at G 0 and not at all distributions in a neighbourhood of G 0 . Essentially, only Theorem 1 and not Theorem 2 of [18] can be established. Asymptotic normality at G 0 does follow if one uses Conditions A in [19], again for ϵ small and again replacing the parametric distribution by G 0 and noting that the aforesaid Ψ functions are Lipschitz.
On the other hand, with the suggested smoothed three-part redescending ψ -functions described in Section 2.1, all the asymptotic theory, including both Theorem 1 and Theorem 2 of [18], is covered as Conditions W are satisfied.
It remains to show that, for small $\epsilon$, the auxiliary selection functional $\rho$ can be replaced by the practical selection functional $\rho_o$ in the above results. It can be noted that, in every open neighbourhood $N$ of $(\mu, \sigma)$ for $F_{(\mu,\sigma)} = \Phi\!\left(\frac{x - \mu}{\sigma}\right)$,
$$\inf_{\tau \notin N} \rho_o\!\left(F_{(\mu,\sigma)}, \tau\right) - \rho_o\!\left(F_{(\mu,\sigma)}, (\mu, \sigma)\right) > 0 \tag{15}$$
and since both the median and MADN are weakly continuous at absolutely continuous distributions, it follows that, for every $\eta > 0$, there exists an $\varepsilon > 0$ such that $G \in n(\varepsilon, F_{(\mu,\sigma)})$ implies that $\rho_o(G, \tau)$ is continuous in $\tau \in \mathcal{T}$ and satisfies
$$\sup_{\tau \in \mathcal{T}} \left| \rho_o(G, \tau) - \rho_o\!\left(F_{(\mu,\sigma)}, \tau\right) \right| < \eta. \tag{16}$$
Here, the neighbourhoods $n(\varepsilon, F_{(\mu,\sigma)})$ are defined in [6] (p. 31) and can be defined by the metric distance between distributions or, for example, an epsilon contaminated neighbourhood, as we deal with here. Since (15) and (16) are the equivalent of [6] Equations (2.12) and (2.13), where, here, $\rho_o$ is defined in (9), it follows that this is a valid selection functional. Moreover, according to [10], both the Median and MADN estimators are weakly continuous at the normal and mixed normal distributions considered in this paper, since they are both absolutely continuous distributions. The consequent continuity of the function $\rho_o$ in both its arguments assures a weakly continuous functional $T[\Psi, \rho_o, \cdot]$. By [20], the empirical distribution function $F_n$ generated by $G_0$ converges weakly to the distribution $G_0$ on a set of probability one, and a result of [21] gives that $d_P(F_n, G_0) \to 0$ almost surely. Thus, we have almost sure consistency in the sense that $T[\Psi, \rho_o, F_n] \xrightarrow{a.s.} T[\Psi, \rho_o, G_0]$.

2.3. Quantifiable Measures of Robustness

Gross-error sensitivity quantifies the worst possible influence an infinitesimal gross error may have on the estimator, and is defined for a one-dimensional estimate by
$$\gamma^{*}(T, G_0) = \sup_{x \in \mathbb{R}} \left| \mathrm{IF}(x; G_0, T) \right|.$$
A bounded IF has a finite gross-error sensitivity, and the estimator is then said to be bias robust, or B-Robust [22]. There exists a trade-off between B-Robustness and efficiency. For example, the sample mean is the most efficient estimator at $\Phi$ but is not robust, as $\gamma^{*} = \infty$.
Several methods exist for multi-dimensional $\gamma^{*}$; however, for simplicity, we consider $\gamma^{*}(T, \Phi)$ for location and scale estimation separately, as does [5] in the discussion of the tanh estimator.
Local-shift sensitivity measures local fluctuations, such as that which might arise from rounding, grouping, or observational inaccuracies, and complements the global worst-case measure provided by gross-error sensitivity. It is defined by the standardised, worst-case slope, as follows:
$$\lambda^{*} = \sup_{x \ne y} \frac{\left| \mathrm{IF}(y; G_0, T) - \mathrm{IF}(x; G_0, T) \right|}{|y - x|}.$$
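Both sensitivities are straightforward to approximate on a grid. The sketch below does so for the non-smoothed location function at $\Phi$ with the scale treated as known, in which case $\mathrm{IF}(x) = \psi(x) / \int \psi'\, d\Phi$. That reduction, the function names, and the tuning constants are assumptions made for illustration, not the joint location and scale computation reported in the paper's tables.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def hampel_psi1(x, a, b, c):
    """Hampel's three-part psi-function for location (0 < a <= b < c)."""
    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax <= a:
        return x
    if ax <= b:
        return a * sign
    if ax <= c:
        return a * (c - ax) / (c - b) * sign
    return 0.0


def sensitivities(psi, grid_max=6.0, step=1e-3):
    """Grid approximations of gamma* and lambda* for an odd psi at Phi."""
    B = simpson(lambda x: psi(x) * x * phi(x), -10.0, 10.0)  # E[psi'(X)]
    xs = [i * step for i in range(round(grid_max / step) + 1)]
    values = [psi(x) for x in xs]          # x >= 0 suffices because psi is odd
    gross_error = max(abs(v) for v in values) / B
    local_shift = max(abs(v2 - v1) for v1, v2 in zip(values, values[1:])) / step / B
    return gross_error, local_shift, B


gamma_star, lambda_star, B = sensitivities(lambda x: hampel_psi1(x, 2.0, 2.6, 3.3))
```

For the three-part function the supremum of $|\psi|$ is $a$ and the steepest slope is $\max\{1, a/(c-b)\}$, so the grid values can be checked against $a/B$ and $\max\{1, a/(c-b)\}/B$.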
The change of variance function (CVF) introduced by [22,23] provides insight into the robustness of the estimator’s asymptotic variance, complementing the assessment of asymptotic value robustness provided by the IF. For continuous $\psi$-functions at $G_0$, under conditions outlined in [5] (pp. 125–126), the CVF is defined as
$$\mathrm{CVF}(z_0; \psi, G_0) = \left. \frac{\partial}{\partial \epsilon} \left[ V\!\left(\psi,\ (1-\epsilon)G_0 + \frac{\epsilon}{2}\left(\Delta_{z_0} + \Delta_{-z_0}\right)\right) \right] \right|_{\epsilon = 0},$$
where $V(\psi, G_0)$ is the asymptotic variance of the $\psi$ estimator at $G_0$. All estimators under consideration in this study adhere to the aforementioned conditions. A more complete introduction to the CVF may be found in [5], Chapter 2.5.
Similar to the IF, the CVF indicates the influence of contamination at $z_0$ upon the asymptotic variance of the estimator. Unlike the IF, where all large deviations from zero, positive or negative, are indicative of potential bias, CVF values are interpreted asymmetrically: negative CVF values indicate a decrease in asymptotic variance, so one is more concerned with positive CVF values.
The standardised change of variance sensitivity for continuous $\psi$-functions is defined by
$$\kappa^{*}(\psi, G_0) = \sup_{x \in \mathbb{R}} \frac{\mathrm{CVF}(x; \psi, G_0)}{V(\psi, G_0)}.$$
In [5], the authors provide a more complete definition, including for the case of piecewise $\psi$-functions with jump discontinuities. An estimator with finite $\kappa^{*}$ is said to be variance robust, or V-Robust.
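For a location estimator with an odd $\psi$ at $\Phi$ and known scale, the derivative in the CVF definition can be taken by hand: with $A = \int \psi^2\, d\Phi$ and $B = \int \psi'\, d\Phi$, it gives $\mathrm{CVF}(z) = (A/B^2)\,(1 + \psi(z)^2/A - 2\psi'(z)/B)$. The sketch below compares this closed form with a finite-difference version of the definition, away from the corners of $\psi$. The location-only setting, the function names, and the tuning constants are assumptions made for illustration.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def hampel_psi1(x, a, b, c):
    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax <= a:
        return x
    if ax <= b:
        return a * sign
    if ax <= c:
        return a * (c - ax) / (c - b) * sign
    return 0.0


def hampel_dpsi1(x, a, b, c):
    """Derivative of psi_1 away from the corners a, b and c."""
    ax = abs(x)
    if ax < a:
        return 1.0
    if ax < b:
        return 0.0
    if ax < c:
        return -a / (c - b)
    return 0.0


def location_moments(psi):
    """A = E[psi^2] and B = E[psi'] at Phi; B via integration by parts."""
    A = simpson(lambda x: psi(x) ** 2 * phi(x), -10.0, 10.0)
    B = simpson(lambda x: psi(x) * x * phi(x), -10.0, 10.0)
    return A, B


def cvf_closed_form(z, psi, dpsi, A, B):
    return (A / B**2) * (1.0 + psi(z) ** 2 / A - 2.0 * dpsi(z) / B)


def cvf_finite_difference(z, psi, dpsi, A, B, eps=1e-7):
    """Difference quotient of V under symmetric point contamination at +/- z."""
    def variance(e):
        A_e = (1.0 - e) * A + e * psi(z) ** 2
        B_e = (1.0 - e) * B + e * dpsi(z)
        return A_e / B_e**2
    return (variance(eps) - variance(0.0)) / eps


psi = lambda x: hampel_psi1(x, 2.0, 2.6, 3.3)
dpsi = lambda x: hampel_dpsi1(x, 2.0, 2.6, 3.3)
A, B = location_moments(psi)
```

The term $-2\psi'(z)/B$ shows why the CVF is largest on the descending segment, where $\psi'$ is most negative, and why the jumps of $\psi'$ at the corners of the non-smoothed function translate into jumps of the CVF.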

3. Results

We compute asymptotic efficiency, $\gamma^{*}$, $\lambda^{*}$, and $\kappa^{*}$ at $\Phi$, and asymptotic variance (14) at asymmetric distributions $G_0$ for each estimator. For polynomial-defined $\psi$-functions, $T[\Psi, G_0]$ and $\Sigma(T, G_0)$ can be computed more efficiently via simple-but-elaborate calculus, which we provide in the Supplementary Materials.

3.1. Smoothed and Non-Smoothed Estimator Comparison

In Table 1, the asymptotic variance of the smoothed and non-smoothed three-part estimator at $\Phi$ is tabulated with $a$, $b$, and $c$ values as used in [12]. $\delta$ was set to its largest admissible value, $\tfrac{1}{2}\min\{a,\ b-a,\ c-b\}$. $\nu_{11}$ and $\nu_{22}$ denote the asymptotic variances, which are the diagonal elements of (14). While $\nu_{11}$ is by and large equivalent for the smoothed estimator, $\nu_{22}$ is consistently improved.
Table 2 lists measures of robustness for a smaller set of the three-part functional parameters. Estimator efficiency and local-shift sensitivity are seen to be equivalent for the smoothed and non-smoothed estimators. Smoothing greatly improves the change-of-variance sensitivity, at a much smaller expense of increased gross-error sensitivity. This improvement may be visualised by plots of the CVF, as shown in Figure 4 for location and scale, where smoothing blunts the sharp corners of the maxima. The smoothed CVFs for all parameter values in Table 2, for location and scale, are compared in Figure 5. Note that more efficient parameter choices yield higher change-of-variance peaks (which are lower than what would appear for the non-smoothed estimator).
It is worth noting that the rejection point for the three-part redescender is $\pm c$, and $\pm(c + \delta)$ for the smoothed variant. In practice, the value of the latter at $c$ is approximately zero, $\Psi_{a,b,c}(c) \approx 0$, and $\delta$ is small, so this change has little impact on the estimator’s robustness, as can be seen in Table 1 and Table 2.

3.2. Contaminated Distribution Asymptotics

We now consider the asymptotic value and variance of $T[\psi, G_0]$, the multivariate location and scale solution of (6) under the contaminated model (1). Setting $\epsilon = 0.05$, we consider the case where around 5% of the data pollutes our otherwise standard normal distribution. We hold $\alpha \in \{1, 2, 3\}$ and allow $\mu_2$ to vary. Figure 6 and Figure 7 show the asymptotic values for location and scale, respectively, for the smoothed three-part estimator with parameters as tabulated in Table 2.
Note that if $\alpha \to 0$, Figure 6 and Figure 7 would resemble the IF and Figure 8 and Figure 9 would resemble the CVF for the respective estimators. Instead, we are interested in the effect of scattered contamination in the vicinity of $\mu_2$.
Clearly, if $\mu_2$ is non-zero and $\epsilon > 0$, the estimator, which is a solution of Equation (3), will be consistent for a value other than $(\mu, \sigma)^{T} = (0, 1)^{T}$. This leads to the asymptotic bias, as given by $T[\psi, G_0] - T[\psi, \Phi]$. One can see that as scattered contamination moves beyond the $(c + \delta)\sigma$ rejection point, the estimates tend towards the true $\mu$ and $\sigma$.
Figure 10 illustrates the benefits of smoothing for higher levels of contamination. The smoothed three-part estimator with $a = 2$, $b = 2.6$, $c = 3.3$ is compared for $\delta \in \{0, 0.3\}$. Clearly, asymptotic variance rises sharply around $\mu_2 \approx 3.6$; however, with smoothing, the peak is ≈75% lower, indicating that smoothing is particularly useful when a higher level of contamination may be present in the vicinity of the rejection point.
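The qualitative behaviour of the asymptotic bias can be reproduced with a much simpler calculation than the joint one reported in the figures. The sketch below solves the location equation $\int \psi_1(x - \tau)\, dG_0(x) = 0$ for the non-smoothed function with the scale held fixed at 1; that simplification, the bisection bracket, the quadrature, and all function names are assumptions made for illustration only.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def hampel_psi1(x, a, b, c):
    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax <= a:
        return x
    if ax <= b:
        return a * sign
    if ax <= c:
        return a * (c - ax) / (c - b) * sign
    return 0.0


def mixture_pdf(x, eps, mu2, alpha):
    """Density of the contaminated model (1)."""
    return (1.0 - eps) * phi(x) + eps * phi((x - mu2) / alpha) / alpha


def asymptotic_location(eps, mu2, alpha, a=2.0, b=2.6, c=3.3):
    """Root near zero of tau -> integral of psi_1(x - tau) dG0(x), scale = 1."""
    lo = min(0.0, mu2) - 10.0 * max(1.0, alpha)
    hi = max(0.0, mu2) + 10.0 * max(1.0, alpha)

    def score(tau):
        return simpson(
            lambda x: hampel_psi1(x - tau, a, b, c) * mixture_pdf(x, eps, mu2, alpha),
            lo, hi)

    left, right = -1.0, 1.0            # score is decreasing on this bracket
    for _ in range(50):
        mid = 0.5 * (left + right)
        if score(mid) > 0.0:
            left = mid
        else:
            right = mid
    return 0.5 * (left + right)


bias_near = asymptotic_location(eps=0.05, mu2=3.0, alpha=1.0)
bias_far = asymptotic_location(eps=0.05, mu2=8.0, alpha=1.0)
```

Contamination centred inside the rejection point shifts the asymptotic location away from zero, whereas contamination centred well beyond it is rejected almost entirely and the bias all but vanishes.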

4. Conclusions

A smoothed three-part redescender is introduced, as a refinement of Hampel’s three-part estimators, with definitions based entirely upon piecewise polynomials. Smoothing is shown to yield a significant improvement in both change of variance sensitivity and asymptotic variance near the rejection point when the distribution has higher levels of asymmetric contamination. Additionally, the resultant functional is shown to be both weakly continuous and Fréchet differentiable in the neighbourhood of the normal distribution. Other robustness measures, including efficiency and asymptotic bias, are largely unchanged from the non-smoothed functions, indicating little-to-no downside to using the smoothed $\psi$-functions in place of the traditional non-smoothed variants. Any statistical application that employs an M-estimator for location and/or scale may be enhanced by using our approach.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/stats8020033/s1. Calculus for a Smoothed Three Part Redescending M-Estimator.

Author Contributions

Conceptualisation, A.J.M. and B.R.C.; methodology, B.R.C. and A.J.M.; software, A.J.M.; validation, formal analysis, investigation, resources, and data curation, A.J.M. and B.R.C.; writing—original draft preparation, A.J.M. and B.R.C.; writing—review and editing, A.J.M. and B.R.C.; visualisation, A.J.M. and B.R.C.; supervision, B.R.C.; project administration, B.R.C.; funding acquisition, A.J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by an Australian Government Research Training Program (RTP) Scholarship.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analysed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors report that there are no competing interests to declare.

Abbreviations

The following abbreviations are used in this manuscript:
MAD: median of absolute deviations
MADN: normalised median of absolute deviations

Appendix A. Proof of Theorem 1

The only difference between Conditions $W$ in [6] (p. 43) and Conditions $W$ in this paper is in Conditions $W_0$ and $W_3$, but the argument is exactly the same, resulting in Theorem 1 in this paper. Note that, in $W_0$, $F_\theta$ is replaced by $G_0$ and $\theta$ is replaced by $\tau_0$. In $W_3$, the matrix $M(\theta)$ is replaced with $M(\tau_0, G_0)$ (see (12)), which is assumed to be nonsingular.

Appendix B. Description of Proof of Theorem 2

We illustrate the proof of Theorem 2 of this paper for the assumed choice of
$$\Psi_{a,b,c,\delta}(x, \tau) = \left( \psi_{1;a,b,c,\delta}\left(\frac{x - \tau_1}{\tau_2}\right),\ \psi_{2;a,b,c,\delta}\left(\frac{x - \tau_1}{\tau_2}\right) \right),$$
the smoothed Hampel three-part redescender for location and scale, when $\delta > 0$.
Formula (13) of this paper is satisfied because the component functions of $\Psi_{a,b,c,\delta}$ are polynomials of degree at most five in their arguments, with bounded coefficients. The next step is to integrate by parts each component power $x^k$, for $1 \le k \le 5$. For the $i$th partition on which $x^k$ is evaluated, call it $P_i$, denote the limits of integration by $D_{\min;P_i}$ and $D_{\max;P_i}$. Because the assumed parameter set is compact, with a scale $\sigma$ that is bounded away from zero, both limits are finite in absolute value. Integrating by parts with $A = D_{\min;P_i}$ and $B = D_{\max;P_i}$ gives, for example,
$$\int_A^B x^k \, d(G - G_0) = B^k \left( G(B) - G_0(B) \right) - A^k \left( G(A) - G_0(A) \right) - \int_A^B \left( G(x) - G_0(x) \right) k x^{k-1} \, dx.$$
Noting also that
$$\left| G(B) - G_0(B) \right| \le d_{KS}(G, G_0)$$
and similarly
$$\left| G(A) - G_0(A) \right| \le d_{KS}(G, G_0),$$
and seeing also that
$$\left| \int_A^B \left( G(x) - G_0(x) \right) k x^{k-1} \, dx \right| \le \int_A^B d_{KS}(G, G_0) \, k |x|^{k-1} \, dx < \mathrm{Const} \times d_{KS}(G, G_0),$$
it follows from the above integration that
$$\left| \int_A^B x^k \, d(G - G_0) \right| \le \mathrm{const} \times d_{KS}(G, G_0),$$
whence combining the integrals over the finite number of partitions gives
$$\int_{\mathbb{R}} \Psi_{a,b,c,\delta}(x, \tau_0) \, d(G - G_0)(x) = O\left( d_{KS}(G, G_0) \right)$$
as $d_{KS}(G, G_0) \to 0$.
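The integration-by-parts bound above can be checked numerically. The following is a minimal sketch, not part of the original proof: it takes $G_0 = \Phi$ and $G = (1 - \epsilon)\Phi + \epsilon N(\mu_2, \alpha^2)$ purely for illustration, and the values of `EPS`, `MU2`, `ALPHA`, the interval $[A, B]$ and all function names are hypothetical choices. It compares the direct integral of $x^k$ against $d(G - G_0)$ with the integrated-by-parts form, and with the bound $\mathrm{const} \times d_{KS}(G, G_0)$, where the constant is taken as $|A|^k + |B|^k + \int_A^B k|x|^{k-1}\,dx$.

```python
import math

# Hypothetical illustration values (not taken from the paper's proof).
EPS, MU2, ALPHA = 0.05, 3.0, 1.0


def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def norm_pdf(x, mu=0.0, sigma=1.0):
    """Normal density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def cdf_diff(x):
    """G(x) - G0(x) for G = (1 - EPS) * Phi + EPS * N(MU2, ALPHA^2), G0 = Phi."""
    return EPS * (norm_cdf(x, MU2, ALPHA) - norm_cdf(x))


def pdf_diff(x):
    """Density of the signed measure G - G0."""
    return EPS * (norm_pdf(x, MU2, ALPHA) - norm_pdf(x))


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def check(k, A, B):
    """Return (direct integral, integrated-by-parts form, const * d_KS)."""
    direct = simpson(lambda x: x**k * pdf_diff(x), A, B)
    by_parts = (
        B**k * cdf_diff(B)
        - A**k * cdf_diff(A)
        - simpson(lambda x: cdf_diff(x) * k * x ** (k - 1), A, B)
    )
    # Grid approximation of the Kolmogorov distance sup_x |G(x) - G0(x)|.
    grid = [-10.0 + 0.001 * i for i in range(20001)] + [A, B]
    d_ks = max(abs(cdf_diff(x)) for x in grid)
    const = abs(A) ** k + abs(B) ** k + simpson(lambda x: k * abs(x) ** (k - 1), A, B)
    return direct, by_parts, const * d_ks


for k in range(1, 6):
    direct, by_parts, bound = check(k, -1.5, 2.5)
    print(f"k={k}: direct={direct:.6f} by_parts={by_parts:.6f} bound={bound:.6f}")
```

The two forms of the integral should agree up to quadrature error, and the absolute value of either should not exceed the printed bound. The grid-based Kolmogorov distance is only an approximation from below, which is adequate here because the bound is far from tight.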
Now, consider $G_0 \in n(\epsilon, \Phi) \subset n_p(\epsilon, \Phi)$. According to [6], Theorem 2.5 and Corollary 2.3, there exists a Prohorov neighbourhood $U(F_\theta)$ such that $T[\Psi, \rho, \cdot]$ exists and is weakly continuous at each $G \in U(F_\theta)$. The arguments needed for a vector parameter, for example $\theta = [0, 1]$, are the same as in [6]. Assuming Conditions $W$, there exists a qualitatively robust M-functional estimator $T_n$ that is consistent for $T[G]$ for all $G \in U(F_\theta)$. This includes $G_0$ for small $\epsilon$.
Finally, Huber's monograph [10] establishes the weak continuity of the MAD estimating functional at an absolutely continuous distribution. Combined with the weak continuity of the median, this suggests that the selection functional $\rho_o\left(F_{(\mu,\sigma)}, \tau\right)$ (9) is also weakly continuous at an absolutely continuous distribution such as $G_0$ in the main paper. Using arguments equivalent to [6], Theorem 2.4, it can be noted that $T[\Psi_{a,b,c,\delta}, \rho, \cdot] = T[\Psi_{a,b,c,\delta}, \rho_0, \cdot]$ for all $G$ in a small-enough Prohorov neighbourhood of $\Phi(x)$, which includes the distribution $G_0$ for small-enough $\epsilon$.

References

  1. Hampel, F.R. The influence curve and its role in robust estimation. J. Am. Stat. Assoc. 1974, 69, 383–393.
  2. Bednarski, T.; Zontek, S. Robust estimation of parameters in a mixed unbalanced model. Ann. Stat. 1996, 24, 1493–1510.
  3. Bachmaier, M. Consistency of completely outlier-adjusted simultaneous redescending M-estimators of location and scale. Adv. Stat. Anal. 2007, 91, 197–219.
  4. Huber, P.J.; Ronchetti, E.M. Robust Statistics, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2009.
  5. Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics, the Approach Based on Influence Functions; John Wiley & Sons: New York, NY, USA, 1986.
  6. Clarke, B.R. Robustness Theory and Application; John Wiley & Sons: Hoboken, NJ, USA, 2018.
  7. Andrews, D.F.; Bickel, P.J.; Hampel, F.R.; Huber, P.J.; Rogers, W.H.; Tukey, J.W. Robust Estimates of Location: Survey and Advances; Princeton University Press: Princeton, NJ, USA, 1972.
  8. Huber, P.J. Robust estimation of a location parameter. Ann. Math. Statist. 1964, 35, 73–101.
  9. Huber, P.J. Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1973, 1, 799–821.
  10. Huber, P.J. Robust Statistics, 1st ed.; John Wiley & Sons: New York, NY, USA, 1981.
  11. Arachchige, C.N.P.G.; Prendergast, L.A. Confidence intervals for median absolute deviations. Commun. Stat. Simul. Comput. 2024, 52, 1–10.
  12. Clarke, B.R.; Milne, C.J. A small sample bias correction and implications for inference. In Proceedings of the 59th ISI World Statistics Congress, Hong Kong, China, 25–30 August 2013.
  13. Hearn, D.; Baker, M.P. Computer Graphics, 2nd ed.; Prentice Hall, Inc.: New York, NY, USA, 1997.
  14. Rousseeuw, P.J.; Croux, C. Alternatives to the median absolute deviation. J. Am. Stat. Assoc. 1993, 88, 1273–1283.
  15. Hampel, F.R. Contributions to the Theory of Robust Estimation. Ph.D. Thesis, University of California, Berkeley, CA, USA, 1968.
  16. Huber, P.J. Robust Statistical Procedures, 2nd ed.; CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1996.
  17. Bednarski, T. Fréchet differentiability and robust estimation. In Asymptotic Statistics, Proceedings of the Fifth Prague Symposium 1994; Mandl, P., Hušková, M., Eds.; Springer: Berlin/Heidelberg, Germany, 1993; pp. 49–58.
  18. Hampel, F.R. A general qualitative definition of robustness. Ann. Math. Statist. 1971, 42, 1887–1896.
  19. Clarke, B.R. Nonsmooth analysis and Fréchet differentiability of M-functionals. Probab. Theory Relat. Fields 1986, 73, 197–209.
  20. Varadarajan, V.S. On the convergence of probability distributions. Sankhyā 1958, 19, 23–26.
  21. Prohorov, Y.V. Convergence of random processes and limit theorems in probability. Theory Probab. Appl. 1956, 1, 157–214.
  22. Rousseeuw, P.J. A new infinitesimal approach to robust estimation. Z. Wahrsch. Verw. Geb. 1981, 56, 127–132.
  23. Rousseeuw, P. New Infinitesimal Methods in Robust Statistics. Ph.D. Thesis, Vrije Universiteit Brussel, Brussels, Belgium, 1981.
Figure 1. Plots of the three-part redescenders.
Figure 2. Plot of the smoothed $\psi_{1;a,b,c,\delta}$ redescender for location.
Figure 3. Plot of the smoothed $\psi_{2;a,b,c,\delta}$ redescender for scale.
Figure 4. The $\mathrm{CVF}(x; \psi, \Phi)$ of the three-part $\psi$-function, with $a = 1.96$, $b = 2.4$, $c = 3.3$, shown with and without $\delta = 0.2$ smoothing.
Figure 5. The smoothed three-part $\mathrm{CVF}(x; \psi, \Phi)$ with smoothed functional parameters, as described in Table 2.
Figure 6. Plots of the asymptotic location solution of (6), $\tau_1$, for the estimators at (1) on $\mu_2 \in [0, 15]$. $\epsilon = 0.05$, representing an increasing scatter for the contaminating process. Smoothed $\psi$-function parameters are as described for Table 2.
Figure 7. Plots of the asymptotic scale solution of (6), $\tau_2$, for the estimators at (1) on $\mu_2 \in [0, 15]$. $\epsilon = 0.05$, representing an increasing scatter for the contaminating process. Smoothed $\psi$-function parameters are as described for Table 2.
Figure 8. Plots of asymptotic variance for the location solution of (6), labelled $\nu_{11}$, for the estimators at (1) on $\mu_2 \in [0, 15]$. $\epsilon = 0.05$, representing an increasing scatter for the contaminating process. Smoothed $\psi$-function parameters are as described for Table 2.
Figure 9. Plots of asymptotic variance for the scale solution of (6), labelled $\nu_{22}$, for the estimators at (1) on $\mu_2 \in [0, 15]$. $\epsilon = 0.05$, representing an increasing scatter for the contaminating process. Smoothed $\psi$-function parameters are as described for Table 2.
Figure 10. Asymptotic variance on $\mu_2 \in [0, 15]$ with $\epsilon = 0.15$ and $\alpha = 1$. The functional parameters are $a = 2$, $b = 2.6$, $c = 3.2$, and $\delta = 0$ versus $\delta = 0.3$.
Table 1. Asymptotic variance, rounded to four decimal places, at the standard normal distribution for location and scale, comparing the smoothed and non-smoothed three-part redescender, under the same parameters as [12]. Lower variance for equal $a, b, c$ parameters is highlighted in bold.

| $a$ | $b$ | $c$ | Smoothed $P$ | Smoothed $\delta$ | Smoothed $\nu_{11}$ | Smoothed $\nu_{22}$ | Non-smoothed $P$ | Non-smoothed $\nu_{11}$ | Non-smoothed $\nu_{22}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1.285 | 1.96 | 2.575 | −0.35516 | 0.3075 | **1.2164** | **1.1292** | −0.34505 | 1.2182 | 1.1493 |
| 1.31 | 2.039 | 2.575 | −0.33817 | 0.268 | **1.1998** | **1.0862** | −0.33063 | 1.2013 | 1.1000 |
| 1.31 | 2.039 | 4 | −0.32757 | 0.3645 | **1.0958** | **0.8542** | −0.31500 | 1.0966 | 0.8747 |
| 1.31 | 2.575 | 3.5 | −0.33002 | 0.4625 | **1.0795** | **0.8104** | −0.31035 | 1.0802 | 0.8381 |
| 1.5 | 2.5 | 3.5 | −0.24814 | 0.5 | 1.0645 | **0.7367** | −0.22728 | **1.0637** | 0.7513 |
| 1.645 |  |  | −0.1748 | 0.3 | **1.0259** | **0.6352** | −0.16868 | 1.0262 | 0.6402 |
| 1.645 | 2 | 3.3 | −0.19578 | 0.1775 | **1.0942** | **0.7822** | −0.19312 | 1.0943 | 0.7841 |
| 1.645 | 2.24 | 3.3 | −0.19083 | 0.2975 | 1.0754 | **0.7426** | −0.18377 | **1.0751** | 0.7461 |
| 1.645 | 2.4 | 4 | −0.18557 | 0.3775 | 1.0470 | **0.6818** | −0.17506 | **1.0466** | 0.6874 |
| 1.96 |  |  | −0.09117 | 0.3 | **1.0115** | **0.5692** | −0.08702 | 1.0116 | 0.5710 |
| 1.96 | 2.4 | 3.3 | −0.1069 | 0.22 | 1.0501 | **0.6537** | −0.10399 | **1.0500** | 0.6542 |
| 1.96 | 2.575 | 4 | −0.09846 | 0.3075 | 1.0263 | **0.6042** | −0.09352 | **1.0259** | 0.6050 |
| - | - | - | - | - | - | - | 0 | 1.000 | 0.500 |
Table 2. Measures of robustness for the smoothed ($\delta > 0$) and non-smoothed ($\delta = 0$) three-part redescending M-estimators at $\Phi$, where $e$ is efficiency, $\gamma^*$ is gross-error sensitivity, $\lambda^*$ is local-shift sensitivity, and $\kappa^*$ is change-of-variance sensitivity.

| $a$ | $b$ | $c$ | $P$ | $\delta$ | Location $e$ | Location $\gamma^*$ | Location $\lambda^*$ | Location $\kappa^*$ | Scale $e$ | Scale $\gamma^*$ | Scale $\lambda^*$ | Scale $\kappa^*$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.645 | 2 | 3.3 | −0.1931 | 0 | 0.914 | 1.950 | 1.265 | 7.474 | 0.638 | 1.960 | 3.290 | 11.931 |
| 1.645 | 2 | 3.3 | −0.1957 | 0.175 | 0.914 | 1.954 | 1.265 | 6.699 | 0.639 | 1.973 | 2.940 | 11.350 |
| 1.96 | 2.4 | 3.3 | −0.1040 | 0 | 0.952 | 2.139 | 2.178 | 10.110 | 0.764 | 2.255 | 3.920 | 20.794 |
| 1.96 | 2.4 | 3.3 | −0.1064 | 0.2 | 0.952 | 2.143 | 2.178 | 8.588 | 0.765 | 2.270 | 3.520 | 18.960 |
| 2 | 2.6 | 3.2 | −0.0926 | 0 | 0.959 | 2.155 | 3.333 | 12.636 | 0.786 | 2.271 | 4.000 | 28.793 |
| 2 | 2.6 | 3.2 | −0.0976 | 0.3 | 0.958 | 2.164 | 3.333 | 9.651 | 0.786 | 2.304 | 3.400 | 25.417 |
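The efficiency column of Table 2 can be related to the asymptotic variances of Table 1. The sketch below assumes, as is usual but not stated in this section, that $e$ is the ratio of the reference asymptotic variance to the estimator's asymptotic variance at $\Phi$, and that the final row of Table 1 (variances 1.000 and 0.500) supplies that reference. The dictionary and function names are hypothetical; the numbers are copied from the non-smoothed columns of Tables 1 and 2 for the two parameter sets the tables share.

```python
# Reference variances: assumed to be the final row of Table 1.
REFERENCE_VAR = {"location": 1.000, "scale": 0.500}

# Non-smoothed asymptotic variances (nu_11, nu_22) from Table 1, keyed by (a, b, c).
TABLE1_NONSMOOTHED = {
    (1.645, 2.0, 3.3): (1.0943, 0.7841),
    (1.96, 2.4, 3.3): (1.0500, 0.6542),
}

# Non-smoothed (delta = 0) efficiencies (location, scale) from Table 2.
TABLE2_EFFICIENCY = {
    (1.645, 2.0, 3.3): (0.914, 0.638),
    (1.96, 2.4, 3.3): (0.952, 0.764),
}


def efficiency(variance, kind):
    """Efficiency as reference variance over estimator variance (an assumed definition)."""
    return REFERENCE_VAR[kind] / variance


for params, (nu11, nu22) in TABLE1_NONSMOOTHED.items():
    e_loc = efficiency(nu11, "location")
    e_scale = efficiency(nu22, "scale")
    tab_loc, tab_scale = TABLE2_EFFICIENCY[params]
    print(params, f"location {e_loc:.4f} vs {tab_loc}", f"scale {e_scale:.4f} vs {tab_scale}")
```

Under these assumptions the computed ratios agree with the tabulated efficiencies to within rounding, which is a useful consistency check when reading the two tables together.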
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
