On the Robustness and Sensitivity of Several Nonparametric Estimators via the Influence Curve Measure: A Brief Study

Abstract: The use of the influence curve as a measure of sensitivity is not new in the literature but, to the best of our knowledge, has not been properly explored. In particular, the mathematical derivation of the influence function for several popular nonparametric estimators (such as the trimmed mean, the α-winsorized mean, and the Pearson product moment correlation coefficient, among other notable ones) is not given in adequate detail. Moreover, the summary of the final expressions given in some sporadic cases does not appear to be correct. In this article, we aim to examine and summarize the derivation of the influence curve for various well-known estimators of the location of a population, many of which are considered in the nonparametric paradigm.


Introduction
In the context of assessing the robustness (or, equivalently, the "sensitivity") of an estimator, there are several methods that are well documented in the literature. A non-exhaustive list of such references can be cited as follows. Refs. [1,2] have developed and studied a wide collection of robust methods of estimation to reduce the influence of outliers in the data on the estimates. An outlying observation, or 'outlier', is one that appears to deviate markedly from the other members of the sample in which it occurs; see, for example, [3]. It may arise because it was generated by a different mechanism or under a different assumption, according to [4,5]. Once an observation (or a set of observations) is deemed to be an outlier, the estimation procedure may fail to produce an efficient as well as a robust estimator. A natural remedy against this malady could be to remove the contaminated observation(s) from the sample or to replace them by the correct observation(s). Another strategy would be to consider estimators that are not sensitive to outliers, such as the median, and/or estimators based on sample quantiles. However, one major limitation of such measures is that they discard a significant number of observations in the process, which may not always be desirable. Consequently, in statistical data analysis, the rejection of outliers from the data may have serious consequences for further analysis of the reduced sample. If the outliers are rejected from the data, then the data are no longer complete but censored. In practice, replacing the rejected outliers by statistical equivalents, i.e., by simulated random observations from the assumed underlying distribution, may have similar consequences. Outliers are observations that markedly deviate from the prevailing tendency. For example, in the case of parametric motion estimation, the presence of outliers, due to noisy measurements or poorly defined support regions, will lead to inaccurate model estimates. In the case of global motion estimation, foreground moving objects also correspond
to outliers. In order to reduce the impact of outliers, robust estimation has been proposed; see [1] and the references cited therein. One indicator of the performance of a robust estimator is its breakdown point, roughly defined as the highest percentage of outliers that the robust estimator can tolerate. Three classes of robust estimators may be mentioned in this regard:
• M-estimators: M-estimators are a generalization of maximum likelihood estimators. They involve the minimization of a function of the form ∑_j ∆(r_j), where r_j is the residual error between a sample data point and its fitted value, and ∆ is a symmetric positive-definite function with a unique minimum at x = 0. For a robust estimator, the function ∆ grows slowly at large values of x, so that large residuals are downweighted.
In this scenario of assessing the robustness (or, equivalently, the sensitivity) of an estimator, several statistical procedures have been developed in the literature which do not directly examine the outliers but seek to accommodate them in such a manner that their influence on the estimation procedure becomes less serious. The robust methods usually used in this situation to characterize the underlying distribution are known as "winsorization" and "trimming". The main purpose of this paper is to discuss one of the robustness properties, which we will evaluate via the Influence Curve (in short, IC, henceforth), of several location estimators, such as the regular mean, the trimmed mean, the winsorized mean, and the directional mean, that are used in the nonparametric paradigm as well. The influence function of an estimator measures the amount of change in the estimator that can be produced by the change of an individual observation. In their seminal work, Ref. [1] introduced the idea of the influence function, or influence curve (IC). Ref.
[6] have extended the use of influence curves to a more general type. The authors stated that the condition of Fisher-consistency is necessary in Hampel's theory. In fact, we will provide the definition of the influence curve due to [6] later on and will cite several results from their work that are quite useful in this context.
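As a concrete illustration of the M-estimator class mentioned above, the following sketch computes a Huber-type M-estimate of location by iteratively reweighted averaging. The tuning constant c = 1.345, the tolerance, the data, and all names here are illustrative choices, not taken from the text.

```python
# Sketch: Huber-type M-estimator of location via iteratively reweighted
# averaging. Tuning constant c, tolerance, and data are illustrative.

def huber_m_location(xs, c=1.345, tol=1e-8, max_iter=200):
    mu = sorted(xs)[len(xs) // 2]  # start from (a) sample median
    for _ in range(max_iter):
        # Weight w(r) = min(1, c/|r|) downweights large residuals, so the
        # implied loss grows only linearly in the tails.
        weights = [min(1.0, c / abs(x - mu)) if x != mu else 1.0 for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

data = [0.1, -0.3, 0.2, 0.05, -0.1, 50.0]  # one gross outlier
print(round(huber_m_location(data), 3))    # stays near the data bulk
```

Compared with the plain mean of the same data (about 8.3), the M-estimate remains close to the bulk of the observations, illustrating how downweighting large residuals bounds an outlier's influence.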
It is well known that robust estimation provides an alternative approach to classical methods that is not unduly affected by the presence of outliers. Recently, these robust estimators have also been considered for models with functional data. In this paper, we try to summarize a compendium of influence curves for various nonparametric as well as parametric estimators of the location of a population. In this work, we survey results that are already in the literature but, in some cases, without the necessary details. In addition, we particularly focus on examining the point (or the set of points) that has the largest influence on the estimator. Along with this, we also examine (as appropriate) the discontinuity of the influence curve (as in the case of the winsorized mean). Next, we provide a non-exhaustive list of references on the notion of the robustness of estimators and the associated IC measures for assessing sensitivity in the presence of outlier(s). The first instance of a statistician referring to the robustness of estimators was by [7]. Shortly after, Ref. [8] began to notice something interesting about the standard statistical procedures of the time; he realized that they were optimized for the Gaussian (normal) distribution but volatile for some contaminated distributions. This realization became part of the foundation for the developments of the influence curve by Huber and Hampel in the mid-to-late 1960s; see [9] for more details.
An influence function, or influence curve, is in many ways considered "the first derivative of an estimator" at a certain distribution. This function can be useful for detecting various attributes of estimators. In this paper, we focus on the advantageous features of influence functions in the sense of assessing the impact of outlier(s) on these estimators. A robust estimator is unaffected, or minimally affected, by deviations in the distribution, especially from an assumed or idealized distribution; see [9]. These influence functions provide a useful tool for nonparametric statistical inference. For a population characterized by the distribution function F and indexed by the parameter θ (i.e., F(θ)), a common estimator of F is given by the empirical cdf F_n(x) = n⁻¹ ∑_{i=1}^n I(X_i ≤ x), where I(u) is an indicator function, and θ ∈ Θ, the parameter space. Nonparametric statistics estimates statistical functionals expressed as θ = T(F).
Using this functional, one can derive an influence curve that allows us to better understand how deviations in the data and the distribution affect various estimators. The assessment of the robustness (alias sensitivity) of several nonparametric estimators of scale and shape parameters on the basis of the influence curve will be the subject matter of a separate article. There are other pertinent references in the context of computing the influence curve under the Bayesian paradigm. For example, Ref. [10] developed a general framework of Bayesian influence analysis for various perturbation schemes of the data, the prior, and the sampling distribution for a class of statistical models. The remainder of the paper is organized as follows. In Section 2, we provide the basic preliminaries related to the influence curve following [6], together with several useful results on the influence curve for one-sample estimators (several of them nonparametric) as well as for two-sample estimators. In Section 3, we provide the mathematical details of the influence curve for several estimators, starting with the sample mean and several other types of mean estimators, such as the trimmed mean, the directional mean, etc. Also in Section 3, we discuss the influence curve for the Pearson product moment correlation coefficient and highlight the error in the computation of its IC by [11]. Section 4 deals with the computation of the IC for certain special types of distributions and for the convolution of two distribution functions. Finally, some concluding remarks are presented in Section 5.

Basic Preliminaries on Influence Curve
In this section, we provide the definition of and some useful preliminaries related to the influence curve. We start with the definition in the one-sample situation. This section draws heavily on [6].

Influence curve for the one-sample scenario
Let us consider the triplet (Ω, F, P), where Ω, the sample space, is a subset of the real line R, F is the associated Borel σ-field of events, and P is the probability measure defined on this sample space. Furthermore, assume that the parameter space, denoted by Θ, is a convex subset of R. The fixed model consists of a family of probability measures characterized by F_θ, which is identifiable with the associated cumulative distribution function (cdf). We note that one might consider θ as either a location or a scale parameter, but in the context of our current discussion, unless otherwise stated, θ will be considered the location parameter. Next, we define a sequence of statistics {T_n = T_n(X_1, . . . , X_n)}. Suppose there exists a functional T : η(Ω) → R, where η(Ω) is the space of all signed measures with mass 1 on Ω, such that T_n → T(H) in probability as n → ∞ when the observations are independently and identically distributed (i.i.d.) according to the true underlying distribution of the population, H. When T is Fisher-consistent, i.e., T(F_θ) = θ, ∀θ ∈ Θ, the associated influence curve has been defined in [1]. Next, we provide some more pertinent details in this regard. The influence curve is essentially the derivative of a statistical functional T(F) with respect to the function F itself. To take a derivative in this way, we need to invoke the definition of the generalized directional Gâteaux derivative. Given a functional T(F), where F is itself a function, the derivative of T with respect to F in the direction of G is expressed as

lim_{ε→0} [T(F + ε(G − F)) − T(F)]/ε.

Statistically speaking, using this definition of a derivative helps us to identify and measure "the rate of change in a statistical functional" under some amount of contamination from another distribution which has its total probability concentrated at a point mass.
In this case, we consider the contamination of F by Δ_x, which can be written as

F_ε = (1 − ε)F + εΔ_x,   (1)

where 0 < ε < 1, and Δ_x represents the distribution function with the entire probability concentrated at the mass point x. Equivalently, we can write F_ε = F + ε(Δ_x − F). Consequently, we can express the influence curve of a functional T(F) as

IC(T, F; x) = lim_{ε→0⁺} [T((1 − ε)F + εΔ_x) − T(F)]/ε.

However, according to [6], in the testing of statistical hypotheses several test statistics converge to non-Fisher-consistent functionals. As a remedial measure, Ref. [6] developed the following alternative definition of the IC. Writing ξ(θ) = T(F_θ) for the asymptotic value of the functional under the model, the influence curve of the functional T at F_θ is defined, for all x ∈ Ω where the limit (possibly +∞ or −∞) exists, by

IC(T, F_θ; x) = lim_{ε→0} [U((1 − ε)F_θ + εΔ_x) − U(F_θ)]/ε,

where U(H) = ξ⁻¹[T(H)]; this functional provides the parameter value that the true underlying distribution H would have if it belonged to the model. This U is clearly Fisher-consistent, since U(F_θ) = ξ⁻¹(T(F_θ)) = θ. Hampel's influence curve is computed based on U. Observe that in the original paper, Ref. [1] defined the IC as a right-hand limit. However, Refs. [2,12,13] have replaced it with a two-sided limit for mathematical convenience. The influence curve thus describes the influence of outliers in the sample on the value of the statistic; a bounded IC consequently indicates a finite sensitivity to outliers.
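The limit above can be checked numerically by mixing an empirical distribution with a point mass Δ_x and forming a finite-ε difference quotient. A minimal sketch (the step size ε, the sample, and all names are illustrative; the mean functional serves as the test case):

```python
# Sketch: numerical approximation of the influence curve
#   IC(x) = lim_{eps -> 0} [T((1 - eps) F + eps Delta_x) - T(F)] / eps
# with a finite sample standing in for F. Functionals act on weighted points
# so that the mixture (1 - eps) F_n + eps Delta_x can be formed exactly.

def weighted_mean(pairs):
    # pairs: list of (value, weight) with weights summing to 1
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)

def empirical_ic(T, sample, x, eps=1e-6):
    n = len(sample)
    clean = [(v, 1.0 / n) for v in sample]
    mixed = [(v, (1.0 - eps) / n) for v in sample] + [(x, eps)]
    return (T(mixed) - T(clean)) / eps

sample = [1.0, 2.0, 3.0, 4.0]
# For T = mean, theory gives IC(x) = x - mu(F); here mu(F_n) = 2.5.
print(empirical_ic(weighted_mean, sample, 10.0))  # close to 10 - 2.5 = 7.5
```

Because the mean is linear in F, the difference quotient equals x − μ(F_n) here up to rounding; for nonlinear functionals it converges only as ε → 0.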

Influence surface for the two-sample case
The definition of a two-sample IC has been independently discussed and derived by [6]. Here, we provide some useful preliminaries, extending the IC for one sample given earlier and mimicking the ideas described in [6]. Let us assume that Ω_1 = Ω_2 are subsets of R with the associated σ-algebra. For each n, we partition the entire sample in such a way that we select m_1(n) and m_2(n) with the constraint that m_1(n) + m_2(n) = n. Specifically, m_1(n) points are selected from Ω_1 and m_2(n) points are selected from Ω_2, respectively. Here, m_1(n) and m_2(n) go to infinity as n → ∞. In the context of estimation of the location of a population, the "robust model" (i.e., a model without any contamination effect) states that the sample from Ω_1 is distributed according to the distribution function H and the sample from Ω_2 is distributed according to the distribution function F, where the relation H(x) = F(x − θ) must hold for all x, with θ ∈ Θ, the parameter space. We want either to estimate θ or to conduct a test of hypothesis for θ, say to test the null hypothesis H_0 : θ = θ_0. Regardless of the nature of the inferential problem, one may consider a test statistic T_n together with an associated functional T defined on pairs of distributions. Note that W represents the joint cdf, and the observations are i.i.d. according to (W_1, W_2). Assume that T_n is invariant with respect to an identical shift of both samples. We want to evaluate the IC at a pair (H, F) with H(x) = F(x − θ). It may be noted that the straight line going through H and F is completely determined. On this straight line, the expected value of T_n, when the samples originate from the two distributions H and F differing by the location parameter θ, depends on θ alone. Let us denote this expectation by η_n(θ), and we further assume that η_n(θ) goes to η(θ) (the expected value of T) as n → ∞. In addition, we assume that the inverse
of η exists, which is denoted by η⁻¹. Next, in analogy with the one-sample case, one defines the Fisher-consistent functional U(W_1, W_2) = η⁻¹[T(W_1, W_2)], valid for all W_1, W_2.
We note that outliers can occur in the first sample, in the second sample, or in both. Therefore, the influence curve (to be more precise, the influence surface) needs to be defined appropriately. For a definition of the influence surface, the reader is referred to Definition 2.2 of [6].
In the next section, we discuss the computation of the influence curve for several location parameters.

Influence Curve of Several Location Estimators
Throughout this paper, we will be using (1) to derive and assess influence curves for the estimators, in an attempt to better understand the influence that individual observations have on the value of the estimator itself. We begin our discussion with the sample mean.

Influence curve for the sample mean
The mean of the data averages all observations. Therefore, it takes into account every observation and is highly affected by outliers. Because of this, the mean is not considered a robust statistic, as it will be swayed by any contamination of the data.
Using the idea of the influence curve, we can better understand this non-robustness of the mean estimator. Let μ(F) denote the mean of an absolutely continuous distribution F; similar derivations can be made when F is discrete. Given that μ(F) = ∫ x dF(x), plugging in Equation (1), the mean of our contaminated model will be

μ(F_ε) = (1 − ε)μ(F) + εx.

Next, we obtain the influence curve for the mean estimator in the following proposition.
Proposition 1. The influence function for the mean estimator is denoted by IC(μ, F; x) and is given by x − μ(F).

Proof. From μ(F_ε) = (1 − ε)μ(F) + εx, we have [μ(F_ε) − μ(F)]/ε = x − μ(F), which does not depend on ε; letting ε → 0 gives the result.
As we can see, each observation x has an influence on the overall estimate that is unbounded in x. This demonstrates that if x is contaminated in any way, or x is an outlier, the mean will be heavily distorted, as this bad observation holds a substantial amount of influence over the estimate. Therefore, because the mean cannot hold up under a contaminated model, it is not considered to be a robust statistic. In the next section, we consider the influence function for the median, a robust statistic.
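A quick numerical illustration of this unbounded influence, with illustrative data: adding one point x to a sample of size n shifts the sample mean by (x − x̄_n)/(n + 1), so the shift grows linearly in x, in line with IC(x) = x − μ(F).

```python
# Sketch: the sample mean's empirical influence grows without bound in the
# contaminating point x, matching IC(x) = x - mu(F). Data are illustrative.

def mean(xs):
    return sum(xs) / len(xs)

base = [2.0, 3.0, 4.0, 5.0]          # mu(F_n) = 3.5
for x in [10.0, 100.0, 1000.0]:
    shifted = mean(base + [x]) - mean(base)
    # One added point out of n + 1 shifts the mean by (x - mu)/(n + 1),
    # so rescaling by n + 1 recovers x - mu.
    print(x, round(shifted * (len(base) + 1), 1))
```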

Influence curve for the median
We begin the discussion by evaluating the influence curve for the quantile functional T_p(F) = F⁻¹(p). Obviously, the median is obtained by substituting p = 1/2. A general expression for the influence curve, denoted by IC(x), will be given by

IC(x) = [p − I(x ≤ F⁻¹(p))]/f(F⁻¹(p)),

assuming that F is differentiable at F⁻¹(p), with density f.
In fact, it has been shown that, when calculating the median, up to half of the data could be corrupted before seeing any effect on the estimate; see [14]. We can observe the median's robustness from its influence curve. The statistical functional for the median is T(F) = F⁻¹(1/2); writing m = F⁻¹(1/2) for the population median, the general quantile expression above with p = 1/2, together with the contaminated model, gives the influence function of the median as

IC(x) = sign(x − m)/[2 f(m)].

We can see that the median is far less impacted by a single observation, since its IC is bounded. This indicates the robustness of the median as an estimator. In the next section, we will delve into the robustness and influence function of a more efficient robust mean estimator.

Note:
The sign test statistic has the same IC expression as the median estimator.
Furthermore, for the one-sample Wilcoxon test statistic, the associated IC is equal to that of the Hodges–Lehmann estimator.
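The bounded IC of the median can also be seen empirically: moving a single contaminating point arbitrarily far changes the sample median by a fixed, small amount. A sketch with illustrative data:

```python
# Sketch: a single contaminating point, however extreme, moves the sample
# median by a bounded amount -- unlike the sample mean. Data illustrative.

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

base = [1.0, 2.0, 3.0, 4.0, 5.0]     # median = 3
for x in [10.0, 1e6]:
    # Adding one far-right point only moves the median to 3.5.
    print(median(base + [x]) - median(base))   # prints 0.5 both times
```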

Influence function for the trimmed mean
The trimmed mean is a "smooth" L-estimator with the following functional:

T(F) = (1/(1 − 2α)) ∫_α^{1−α} F⁻¹(t) dt.

Next, using the alternative definition of the influence curve and, for F(x) < α, the integration-by-parts formula ∫u dv = uv − ∫v du, we can rewrite the resulting expression to obtain

IC(T, F; x) = (1/(1 − 2α)) [ψ(x) − ∫ψ(t) dF(t)], where ψ(x) = max{F⁻¹(α), min(x, F⁻¹(1 − α))},

which is bounded. The sample analogue, with the integral restricted to the central region, is the average of the n(1 − 2α) middle observations.
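A minimal sketch of the sample α-trimmed mean, averaging the middle n(1 − 2α) order statistics; the value α = 0.1 and the data are illustrative:

```python
# Sketch: alpha-trimmed mean averaging the middle n(1 - 2*alpha) order
# statistics; alpha = 0.1 is an illustrative choice.

def trimmed_mean(xs, alpha=0.1):
    s = sorted(xs)
    k = int(alpha * len(s))          # number trimmed from each tail
    kept = s[k:len(s) - k] if k else s
    return sum(kept) / len(kept)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 500.0]
print(trimmed_mean(data, alpha=0.1))   # averages 2..9, ignoring 1 and 500
```

The single gross outlier (500) is discarded along with the smallest observation, so the estimate reflects the bulk of the data, consistent with the bounded influence curve of the trimmed mean.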

Influence curve for the Winsorized mean
The winsorized mean W = W(F_n) is obtained by replacing the αn smallest observations by x_(nα+1) and the αn largest observations by x_(n(1−α)), and taking the mean of the modified sample (if nα is not an integer, then consider the nearest integer). Consequently, at the population level, the winsorized mean will be given by

W(F) = α F⁻¹(α) + ∫_α^{1−α} F⁻¹(t) dt + α F⁻¹(1 − α).

Next, we derive the influence curve for this estimator and show that it is discontinuous at F⁻¹(α) and F⁻¹(1 − α). Here, using the influence curve of the quantile functional, IC(F⁻¹(p); x) = [p − I(x ≤ F⁻¹(p))]/f(F⁻¹(p)), together with the linearity of the influence curve in the functional, we have

IC(W, F; x) = ψ(x) − ∫ψ(t) dF(t) + α[α − I(x ≤ F⁻¹(α))]/f(F⁻¹(α)) + α[1 − α − I(x ≤ F⁻¹(1 − α))]/f(F⁻¹(1 − α)),

where ψ(x) = max{F⁻¹(α), min(x, F⁻¹(1 − α))}. This is linear between F⁻¹(α) and F⁻¹(1 − α). Therefore, we can write that the IC has jump discontinuities of sizes α/f(F⁻¹(α)) and α/f(F⁻¹(1 − α)) at these two quantiles.
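A sketch of the sample winsorized mean as described above, with k = ⌊αn⌋ replaced values in each tail; α = 0.1 and the data are illustrative:

```python
# Sketch: winsorized mean replacing the k = floor(alpha*n) smallest
# observations by the (k+1)-th order statistic and the k largest by the
# (n-k)-th, then averaging. alpha = 0.1 is illustrative.

def winsorized_mean(xs, alpha=0.1):
    s = sorted(xs)
    n = len(s)
    k = int(alpha * n)
    # Cap each tail at the nearest retained order statistic.
    w = [s[k]] * k + s[k:n - k] + [s[n - k - 1]] * k
    return sum(w) / n

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 500.0]
print(winsorized_mean(data, alpha=0.1))   # outlier 500 pulled back to 9
```

Unlike trimming, winsorizing keeps all n observations but caps the extremes: here 500 is replaced by the 9th order statistic and 1 by the 2nd.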

Influence curve for the α-winsorized mean
An α-winsorized mean is calculated by first determining a proportion α of the data points from each side (tail) of the data (for details, see [15] and the references cited therein). This proportion of data points is then replaced by the next closest observation, and the mean is taken over the altered data set. Whereas the regular mean estimator loses its robustness when an alteration of the data occurs, the α-winsorized mean estimator is far more robust and less likely to be impacted by small deviations from the original data. To observe this better, we can derive the influence function for the α-winsorized mean estimator, starting from its functional together with the contaminated model (1). Here, it is important to note that F(x_α) = α, as x_α is the α-th quantile. Using the two relations above, we can now derive the influence function for the α-winsorized mean by considering a small contamination at the point x. It appears that further numerical evaluation is required to obtain and simplify the influence function for the α-winsorized mean.

Influence curve for the directional mean
This problem originally appeared in [16]. Let X_1, X_2, . . . , X_n be a set of independent and identically distributed observations on the unit circle, so that x_iᵀx_i = 1, with probability density function f(x, μ), where μᵀμ = 1. We define the directional mean through the functional T(F) = C(F)/||C(F)||, where C(F) = ∫ x dF(x), and F and G are the marginal distributions of the coordinates of x.
Then, it can be shown that:
• The functional T is robust.
We begin our discussion with the contaminated model (1). Consequently, we can write the influence curve of T, whose norm satisfies ||IC(T, F; x)|| = (||C(F)||)⁻¹ |sin(δ − δ_0)|, where δ is the direction of x and δ_0 is the direction of T(F). The influence is largest, equal to (||C(F)||)⁻¹, the gross error sensitivity (GES, in short), when x is orthogonal to T(F). The GES is bounded since C(F) ≠ 0, and therefore T is robust. The most influential direction corresponds to an observation located at 90° from the mean direction.
Next, after some algebraic simplification, our result immediately follows by taking the positive square root of both sides of (8).
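In angular form, the directional mean is the direction of the resultant vector (∑cos t_i, ∑sin t_i). A minimal sketch, with illustrative angles:

```python
# Sketch: the directional (circular) mean of unit vectors, computed as the
# direction of the resultant vector; the angles below are illustrative.
import math

def directional_mean(angles):
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    return math.atan2(s, c)   # direction of the mean resultant

angles = [0.1, -0.1, 0.2, -0.2]   # clustered symmetrically around angle 0
print(round(directional_mean(angles), 6))
```

An observation at 90° from the mean direction rotates the resultant the most per unit of contaminating mass, matching the most influential direction noted above.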
In the next section, we consider the influence function for a well-known dependence function, popularly known as the product moment correlation coefficient.

Influence Function for the Pearson Product Moment Correlation Coefficient (ρ)
The bivariate (Pearson product moment) correlation coefficient is an estimator that is incredibly useful in multivariate analysis; see [11]. We examine its robustness by determining the influence curve for the Pearson product moment correlation coefficient, denoted by ρ. Assuming that the population is absolutely continuous, the associated functional is given by

T(F) = [E(X_1 X_2) − μ_1 μ_2]/(σ_1 σ_2),

where μ_j = E(X_j) and σ_j² = Var(X_j), for j = 1, 2. Next, using our familiar contaminated model in (1), we can express it for the bivariate situation. That is,

F_ε = (1 − ε)F + εΔ_x,

where Δ_x represents the point mass 1 at the bivariate observation x = (x_1, x_2). This is necessary, as we need to consider contamination by a mixture of two distributions. In this case, a bivariate observation is randomly sampled from the distribution F = F_{x,y} with probability (1 − ε); in other words, our observed value is x with probability ε. Next, T(F_ε) may be computed in terms of this contaminated model. Let us define Y_1 = (X_1 − μ_1)/σ_1 and Y_2 = (X_2 − μ_2)/σ_2. Furthermore, let G be the joint distribution function of the random vector (Y_1, Y_2). Since the transformation (X_1, X_2) → (Y_1, Y_2) does not affect the value of the correlation coefficient, the following relationship holds: IC(T(F), F; x) = IC(T(G), G; y).
Noticeably, E(Y_1) = E(Y_2) = 0 and Var(Y_1) = Var(Y_2) = 1. Consequently, (9) reduces to a ratio whose numerator and denominator, A², can each be expanded in powers of ε. Combining all of the above, the influence function for the bivariate correlation coefficient is given in the next proposition.

Proof.
Since the limit is of the form 0/0, we need to apply L'Hospital's rule. By combining the results from all of the above, we can conclude that the influence function is

IC(T(G), G; y) = y_1 y_2 − (ρ/2)(y_1² + y_2²).

The derivation above has been independently studied by [11]. However, in that paper, the author carried out the proof by introducing a bivariate indicator function which, based on what we have found in this derivation, is not necessary. In this paper, we have provided an alternative proof.
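This influence function, IC(T(G), G; y) = y_1y_2 − (ρ/2)(y_1² + y_2²) for standardized variables, can be verified numerically by contaminating a discrete standardized bivariate distribution with a point mass at y and forming a finite-ε difference quotient. The base distribution, ε, and the contamination point below are illustrative choices:

```python
# Sketch: numerical check of the Pearson correlation influence function
# IC(y) = y1*y2 - (rho/2)*(y1^2 + y2^2) on standardized variables, using a
# weighted discrete base distribution (illustrative) with rho = 0.6.

def weighted_corr(points):
    # points: list of (x, y, w) with positive weights
    W = sum(w for _, _, w in points)
    mx = sum(x * w for x, _, w in points) / W
    my = sum(y * w for _, y, w in points) / W
    vx = sum(w * (x - mx) ** 2 for x, _, w in points) / W
    vy = sum(w * (y - my) ** 2 for _, y, w in points) / W
    cxy = sum(w * (x - mx) * (y - my) for x, y, w in points) / W
    return cxy / (vx * vy) ** 0.5

# Means 0, variances 1, correlation 0.6 by construction.
base = [(1, 1, 0.4), (-1, -1, 0.4), (1, -1, 0.1), (-1, 1, 0.1)]
rho = weighted_corr(base)

eps = 1e-6
y1, y2 = 2.0, 1.0
mixed = [(x, y, (1 - eps) * w) for x, y, w in base] + [(y1, y2, eps)]
ic_num = (weighted_corr(mixed) - rho) / eps
ic_theory = y1 * y2 - 0.5 * rho * (y1 ** 2 + y2 ** 2)
print(round(ic_theory, 3))   # the finite-eps quotient agrees closely
```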
In the next section, we discuss evaluating the influence curve for parameters related to certain special types of probability distributions, which differ from univariate probability models and their associated location parameters in the usual sense.

Influence Curve for Certain Types of Probability Distributions
We begin this section by computing the influence curve for the convolution of two absolutely continuous distribution functions. We conjecture at this point that a similar development can be made in the discrete domain, albeit with additional computational complexity.

1. Influence curve for the convolution of two distribution functions

Let T(F) = ∫ F⁻¹(t) dG(t), where G is a cdf defined on [ξ, 1 − ξ], with 0 ≤ ξ ≤ 1/2. Then, the associated influence curve for T will be

IC(T, F; x) = ∫ [t − I(x ≤ F⁻¹(t))]/f(F⁻¹(t)) dG(t).

Proof. We begin by noting that the IC for the quantile functional F⁻¹(t) is [t − I(x ≤ F⁻¹(t))]/f(F⁻¹(t)). In this case, T(F) is a linear functional of F⁻¹; subsequently, the IC for T(F) is obtained by integrating the quantile IC against dG.
Hence, the proof.
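The functional T(F) = ∫ F⁻¹(t) dG(t) can be estimated by weighting the order statistics with the G-measure of each interval ((i − 1)/n, i/n]. A sketch assuming G is uniform on [α, 1 − α], in which case T reduces to the α-trimmed mean; α and the data are illustrative:

```python
# Sketch: plug-in estimate of the L-functional T(F) = int F^{-1}(t) dG(t),
# weighting order statistics by increments of G. With G uniform on
# [alpha, 1 - alpha], T reduces to the alpha-trimmed mean (here alpha = 0.2).

def l_functional(xs, alpha=0.2):
    s = sorted(xs)
    n = len(s)
    lo, hi = alpha, 1.0 - alpha
    total, weight_sum = 0.0, 0.0
    for i, v in enumerate(s):
        a, b = i / n, (i + 1) / n
        # G-mass assigned to the i-th order statistic's interval
        w = max(0.0, min(b, hi) - max(a, lo))
        total += w * v
        weight_sum += w
    return total / weight_sum

data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(l_functional(data, alpha=0.2))   # approx. 3, the middle average
```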

2. Influence curve for the von Mises distribution

The associated pdf is f(x; μ, κ) = c(κ) exp(κ xᵀμ), where c(κ) = [2π I_0(κ)]⁻¹ is the normalizing constant and I_0 denotes the modified Bessel function of the first kind of order zero.

Proof.
At first, we note the following:
• If we write x and μ in terms of angles t and θ, such that x = (cos t, sin t)ᵀ and μ = (cos θ, sin θ)ᵀ, respectively, then the reparametrized family is a location family when κ is known.

• In this case, the directional mean is the maximum likelihood estimate of μ (irrespective of whether κ is known or not).
Next, for simplicity in deriving the main result, we note that our result immediately follows by substituting (11) and (13) into (12).

Simulation Study
We performed a simulation study to confirm some aspects of our theoretical findings. For illustrative purposes, we generate a random sample of size 500 from an exponential distribution (with rate λ = 5) and compute the numerical values of the influence curves for each of the population functionals discussed earlier at varying contamination levels. Precisely, the sample was contaminated by a Cauchy distribution at the levels ε = 0.02, 0.04, and 0.1, respectively. The findings are summarized in Table 1. The expressions of the influence curve for the directional mean and the correlation coefficient are not reported here, as we feel that a separate study is required. Regarding the gross error sensitivity of the Pearson correlation coefficient based on a random sample drawn from a standard normal population, see [1].
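A sketch reproducing the flavor of this experiment: an exponential (rate = 5) sample of size 500 is contaminated by standard Cauchy draws at the stated levels, and the resulting shifts of the sample mean and median are compared. The seed, the use of the standard Cauchy location/scale, and the replacement scheme are illustrative assumptions, not details from the text:

```python
# Sketch: contaminate an exponential(5) sample of size 500 with standard
# Cauchy draws at the stated levels and compare mean vs. median shifts.
# Seed, Cauchy location/scale, and replacement scheme are illustrative.
import math
import random

random.seed(42)
n, rate = 500, 5.0
sample = [random.expovariate(rate) for _ in range(n)]

def contaminate(xs, eps):
    # Replace a fraction eps of the points by standard Cauchy draws
    # (inverse-cdf sampling: tan(pi*(U - 1/2)) is standard Cauchy).
    k = int(eps * len(xs))
    cauchy = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(k)]
    return xs[k:] + cauchy

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    m = len(s) // 2
    return 0.5 * (s[m - 1] + s[m]) if len(s) % 2 == 0 else s[m]

for eps in (0.02, 0.04, 0.10):
    c = contaminate(sample, eps)
    print(f"eps={eps}: mean shift={abs(mean(c) - mean(sample)):.3f}, "
          f"median shift={abs(median(c) - median(sample)):.3f}")
```

The median shift stays small at every level, reflecting its bounded IC, while the mean shift is at the mercy of the heavy-tailed Cauchy draws.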
• L-estimators: L-estimators are linear combinations of order statistics. Two examples are the median and the α-trimmed mean.
• R-estimators: These estimators are based on rank tests.

Proposition 2. The influence function for the Pearson correlation coefficient is

IC(T(G), G; y) = y_1 y_2 − (ρ/2)(y_1² + y_2²).

Table 1. Numerical computation of the robustness of several estimators via the influence curve.