On the M-Estimator under Third Moment Condition

Luo, Rundong; Chen, Yiming; Song, Shuai

doi:10.3390/math10101713

Open AccessArticle

On the M-Estimator under Third Moment Condition

by

Rundong Luo

¹,

Yiming Chen

^2,*

and

Shuai Song

³

¹

School of Business, Shandong University, Weihai 264209, China

²

Institute for Financial Studies, Shandong University, Jinan 250100, China

³

School of Economics, Shandong University, Jinan 250100, China

^*

Author to whom correspondence should be addressed.

Mathematics 2022, 10(10), 1713; https://doi.org/10.3390/math10101713

Submission received: 7 April 2022 / Revised: 30 April 2022 / Accepted: 7 May 2022 / Published: 17 May 2022

(This article belongs to the Special Issue Limit Theorems of Probability Theory)

Download

Browse Figures

Versions Notes

Abstract

:

Estimating the expected value of a random variable by data-driven methods is one of the most fundamental problems in statistics. In this study, we present an extension of Olivier Catoni’s classical M-estimators of the empirical mean, which focus on the heavy-tailed data by imposing more precise inequalities on exponential moments of Catoni’s estimator. We show that our works behave better than Catoni‘s both in practice and theory. The performances are illustrated in the simulation and real data.

Keywords:

M-estimator; Catoni’s estimator; empirical mean

MSC:

60E15; 62F35

1. Introduction

In this study, we focused on estimating the mean

m = E X

of a real random variable X, supposing that

X_{1}, \dots, X_{n}

are independent and identically distributed drawn from X. It is well known that the empirical mean

{\hat{m}}_{n} = n^{- 1} \sum_{i = 1}^{n} X_{i}

is the most popular estimator of m, and theoretical properties have been thoroughly studied [1].

However, recent works have concentrated more on the performance of the estimator when the distribution is heavy-tailed (the second moment or fourth moment of the distribution does not exist), which is becoming more and more common in many research fields (see, e.g., Embrechts, Klüppelberg, and Mikosch [2]). When the data have a heavy tail, the traditional method such as the empirical mean performs poorly, and appropriate robust estimators are required, which drives related research on M-estimator (generalizations of Maximum Likelihood estimator) for correction of the outliers (Huber [3]).

There has been a renewed interest in the area of robust statistics over the last several decades. Nemirovsky and Yudin [4], Hsu and Sabato [5], and Jerrum et al. [6] proposed various forms of Median-of-means (MOM) estimators to handle data in different situations. They called for dividing the data into several groups with equal size and then calculating the empirical mean within each group, finally taking the median of these empirical means as the formal MOM estimator, which reduces the impact of heavy-tailed data. Tukey and McLaughlin [7] and Huber and Ronchetti [8] tried to improve the performance of the empirical mean by using a truncation of X (they name it truncated mean), which removed part of the sample containing

γ n

maximum and minimum values depending on the parameter

γ \in (0, 1)

and then averaged the remaining values to improve the robustness. Catoni [9], Audibert, and Catoni [10] studied the properties of M-estimation for regression problems. The relevant works about robust techniques in various fields are summarized in Bartlett and Mendelson [11], Maronna [12], Bubeck, and Lugosi [13].

Recently, Catoni [14] modified the empirical mean to a new robust estimator. It is easy to observe that the empirical mean is the solution of the following estimation equation

\sum_{i = 1}^{n} (X_{i} - μ) = 0 .

(1)

If we change the form of Equation (1) to

\sum_{i = 1}^{n} ϕ [α (X_{i} - μ)] = 0 .

(2)

The solution of (2) is called Catoni’s mean estimator, where

ϕ : R \to R

is a non-decreasing differentiable truncation function such that for any

x \in R

,

- log (1 - x + x^{2} / 2)

\leq ϕ (x) \leq log (1 + x + x^{2} / 2)

, and

α

is a parameter to ensure the existence of the estimator. We denote Catoni’s mean estimator by

{\tilde{m}}_{n, α}

. The main purpose of the truncation function is to make

ϕ (x)

grow slower than x, and then the effect of outliers due to heavy tails in X will be diminished. Although

ϕ (x)

is not the derivative of some explicit error function, it still can be considered as an influence function in robust theory.

By a mild assumption that the variance

v = E [{(X - m)}^{2}]

of the distribution exists and choosing the parameter

α

to optimize the bounds, Catoni [14] obtained the following performance of

{\tilde{m}}_{n, α}

.

Theorem 1.

Let

X_{1}, \dots, X_{n}

be independent, identically distributed random variables, which are drawn from X. We assume that the mean m and variance v of X exist. For any

x \in R_{+}

, and positive integer n such that

n > 2 x

. Catoni’s mean estimator

{\tilde{m}}_{n, α}

with parameter

α = \sqrt{\frac{2 x}{n v (1 + \frac{2 x}{n - 2 x)})}}

satisfies,

P \{|{\tilde{m}}_{n, α} - m| \geq \sqrt{\frac{2 v x}{n - 2 x}}\} \leq 2 e^{- x} .

(3)

Moreover, if we choose α to be independent from x as follows, and assume

n > 2 (1 + x)

,

α = \sqrt{\frac{2}{n v}},

then

P \{|{\tilde{m}}_{n, α} - m| \geq \frac{1 + x}{1 - (1 + x) / n} \sqrt{\frac{v}{2 n}}\} \leq 2 e^{- x} .

(4)

The method of Catoni [14] is widely promoted as a robust estimator by Brownlees, Joly, and Lugosi [15], Minsker [16], and Wang et al. [17]. We need to point out here that the parameter

α

is the solution of the equation where the derivative of Catoni’s estimator’s deviation with respect to

α

equals to 0. When

v = 0

, the Catoni’s estimator’s deviation is 0, and no specific

α

is needed. This also holds for Theorem 2.

The main contribution of this article is to improve Catoni’s estimator under the assumption of the third moment condition, and we named it the third-moment Catoni estimator. Starting from the adjustment of the truncation function denoted by

ψ (x)

in our work, as Figure 1 shows, the influence function with the third moment performs closer to the true value than the original one of Catoni’s. We obtained a more precise exponential moment upper bound, which leads to a better error bound.

Simultaneously, our work had a better performance for the samples drawn from the t-distribution, which is common in many fields of research(see Jones and Faddy [18]). As a special case of the heavy-tailed distribution, the third moment of the t-distribution exists, which satisfies our assumptions about the distribution. We present the superiority of our estimator in a Monte Carlo simulation. We also show the performance of the proposed estimator under a skewed normal distribution to evaluate the adaptability of the estimator to other distributions.

The rest of the article is organized as follows. In Section 2, we introduce the main result of the third-moment Catoni’s estimator. A Monte Carlo simulation is provided in Section 3 to compare the performance of the proposed estimator with Catoni’s estimator for t-distribution. Section 4 examines the performance of the proposed estimator on real data.

2. Main Result

Let

{(X_{i})}_{i = 1}^{n}

denote an i.i.d. sample drawn from the distribution of X. Let m, v, and s be the mean, variance, and third central moment of X, respectively, which is

E (X) = m, E [{(X - m)}^{2}] = v, and E [{(X - m)}^{3}]

= s

.

The influence function

ψ (x)

here should be considered wider than the original function as Catoni’s to obtain a more accurate exponential moment. In this study, we assumed that

ψ (x) = \{\begin{matrix} log (1 + x + x^{2} / 2 + x^{3} / 6), & x \geq 0 \\ - log (1 - x + x^{2} / 2 - x^{3} / 6), & x < 0 . \end{matrix}

(5)

Our mean estimator

{\hat{m}}_{n, α}

is the unique solution of

R_{n, α} (μ) = 0

, where

R_{n, α} (μ) = \sum_{i = 1}^{n} ψ (α (X_{i} - μ)) .

(6)

Next, we present our main result that bounds the

{\hat{m}}_{n, α} - m

with the appropriate choice of negative parameter

α

:

Theorem 2.

Let

X_{1}, \dots, X_{n}

be independent, identically distributed random variables with finite mean m, variance v, and third central moment s. For any

x > 0

, the error bound between the estimator and the empirical mean satisfies

P \{|{\hat{m}}_{n, α} - m| \geq 2 (\sqrt[3]{\frac{q}{2} + \sqrt{Δ}} + \sqrt[3]{\frac{q}{2} - \sqrt{Δ}})\} \leq 2 e^{- x},

(7)

where

Δ = {(\frac{q}{2})}^{2} + {(\frac{p}{3})}^{3}, p = \frac{3 + 3 v α^{2}}{α^{2}}, q = \frac{n α^{3} s + 6 x - 4 n}{n α^{3}} .

Under some technical assumptions that will be mentioned in the following corollary, we have the following upper bound on the probability of the exponential tail:

Corollary 1.

Let

X_{1}, \dots, X_{n}

be independent, identically distributed random variables with finite mean m, variance v and third central moment s. For any

x > 0

and assume that

n > \frac{3}{2} (1 + x)

and

- \sqrt{\frac{4 n^{3} v^{3}}{729}} ⩽ s ⩽ \sqrt{\frac{4 n^{3} v^{3}}{729}}

,

P \{|{\hat{m}}_{n, α} - m| \geq (1 + x) \sqrt{\frac{v}{n}}\} \leq 2 e^{- x} .

(8)

Remark 1.

It is obvious that with the assumption that n is a positive integer and satisfies

n > \frac{3}{2} (1 + x)

and

- \sqrt{\frac{4 n^{3} v^{3}}{729}} ⩽ s ⩽ \sqrt{\frac{4 n^{3} v^{3}}{729}}

, then

\frac{1 + x}{1 - (1 + x) / n} \sqrt{\frac{v}{2 n}} \geq (1 + x) \sqrt{\frac{v}{n}},

By assuming that

α < 0

, we obtained a better estimator bias than (4) in Catoni’s result.

Remark 2.

When the sample was small, our result was still valid with a small s. We might consider the following example. Let

X_{1}, \dots, X_{n}

be independent, identically distributed random variables, which are drawn from X. Assuming that the mean

m = 0.01

, variance

v = 1

,

x = 1

,

n = 4

, and whenever

- \sqrt{\frac{4 n^{3} v^{3}}{729}} ⩽ s ⩽ \sqrt{\frac{4 n^{3} v^{3}}{729}}

such as

s = 0.2

, which satisfies the assumption we have

P (| {\hat{m}}_{n, α} - 1 | \geq 1) \leq \frac{2}{e} .

For the convenience of proof, we first present the following lemma (Cardano formula); refer to Høyrup’s [19] for more details.

Lemma 1.

For any general cubic equation of the form

x^{3} + p x + q = 0

, one of the roots over the field of real numbers has the form:

x = \sqrt[3]{- \frac{q}{2} + \sqrt{Δ}} + \sqrt[3]{- \frac{q}{2} - \sqrt{Δ}},

(9)

where the discriminant of the root Δ is as follows, when

Δ > 0

, the cubic equation has one real root; the cubic equation has three real roots when

Δ \leq 0

.

Δ = \frac{q^{2}}{4} + \frac{p^{3}}{27} .

Proof of Theorem 2.

Due to the inequality (5) about the

ψ (x)

, we have the following exponential moment inequality of

R_{n, α} (μ)

, for all

μ \in R

:

\begin{matrix} E [e^{R_{n, α} (μ)}] \leq {(E [1 + α (X - μ) + \frac{α^{2} {(X - μ)}^{2}}{2} + \frac{α^{3} {(X - μ)}^{3}}{6}])}^{n} \\ = {(1 + E [α (X - μ)] + E [\frac{α^{2} {(X - μ)}^{2}}{2}] + E [\frac{α^{3} {(X - μ)}^{3}}{6}])}^{n}, \end{matrix}

(10)

with a brief calculation, we have

E {[X - μ]}^{2} = v + {(m - μ)}^{2}

and

E {[X - μ]}^{3} = {(m - μ)}^{3} + 3 v (m - μ) + s

; so, the inequality (10) can be bounded by the following term:

exp (n α (m - μ) + \frac{n α^{2} (v + {(m - μ)}^{2})}{2} + \frac{n α^{3}}{6} ({(m - μ)}^{3} + 3 v (m - μ) + s)) .

Similarly,

\begin{matrix} E [e^{- R_{n, α} (μ)}] \leq {(E [1 - α (X - μ) + \frac{α^{2} {(X - μ)}^{2}}{2} - \frac{α^{3} {(X - μ)}^{3}}{6}])}^{n} \\ = {(1 - E [α (X - μ)] + E [\frac{α^{2} {(X - μ)}^{2}}{2}] - E [\frac{α^{3} {(X - μ)}^{3}}{6}])}^{n} \\ = {(1 - α (m - μ) + \frac{α^{2} (v + {(m - μ)}^{2})}{2} - \frac{α^{3}}{6} ({(m - μ)}^{3} + 3 v (m - μ) + s))}^{n} \\ \leq exp (- n α (m - μ) + \frac{n α^{2} (v + {(m - μ)}^{2})}{2} - \frac{n α^{3}}{6} ({(m - μ)}^{3} + 3 v (m - μ) + s)) . \end{matrix}

(11)

Let

A_{1} = n α (m - μ) + \frac{n α^{2} (v + {(m - μ)}^{2})}{2} + \frac{n α^{3}}{6} ({(m - μ)}^{3} + 3 v (m - μ) + s),

A_{2} = - n α (m - μ) + \frac{n α^{2} (v + {(m - μ)}^{2})}{2} - \frac{n α^{3}}{6} ({(m - μ)}^{3} + 3 v (m - μ) + s),

whenever

X_{i}

has a finite third moment s. We can obtain from the Markov inequality that for any

μ \in R

and

x \in R_{+}

,

\begin{matrix} P \{R_{n, α} (μ) \geq A_{1} + x\} \\ = P \{exp (R_{n, α} (μ)) \geq exp (A_{1} + x)\} \\ \leq E [e^{R_{n, α} (μ)}] / exp (A_{1} + x) \\ \leq e^{- x} . \end{matrix}

(12)

In the same way, we have

P \{- R_{n, α} (μ) \geq A_{2} + x\} \leq e^{- x} .

(13)

Then, as shown in Figure 2.

We can control the estimator

{\hat{m}}_{n, α}

by the roots of the cubic equation as follows:

\begin{matrix} C_{+} (μ) = A_{1} + x = 0, \\ C_{-} (μ) = - A_{2} - x = 0 . \end{matrix}

(14)

Equation (13) above can be regarded as a cubic equation about

m - μ

. To solve (13), we first convert it into a standard-form one-dimensional cubic equation by letting

y_{n} = m - μ - \frac{1}{α}, (n = 1, 2)

, and then we obtain the following equations:

\begin{matrix} y_{1}^{3} + \frac{3 + 3 v α^{2}}{α^{2}} y_{1} + \frac{n α^{3} s + 6 x - 4 n}{n α^{3}} = 0, \\ y_{2}^{3} + \frac{3 + 3 v α^{2}}{α^{2}} y_{2} - \frac{n α^{3} s + 6 x - 4 n}{n α^{3}} = 0 . \end{matrix}

(15)

For any

α \in R_{-}

, according to Lemma 1, since

(3 + 3 v α^{2}) / α^{2}

is always positive,

Δ

is always greater than 0. In this case, our equation has one real root and two imaginary roots, which means we can control the

{\hat{m}}_{n, α}

by the root of (13) as follows:

μ_{+} = m - \frac{1}{α} + \sqrt[3]{\frac{q}{2} - \sqrt{Δ}} + \sqrt[3]{\frac{q}{2} + \sqrt{Δ}},

μ_{-} = m - \frac{1}{α} - \sqrt[3]{\frac{q}{2} + \sqrt{Δ}} - \sqrt[3]{\frac{q}{2} - \sqrt{Δ}},

where the

Δ

, p, and q are the same as above. We can easily obtain from the formula above that

R_{n, α} (μ_{+}) \leq 0

, implying that

{\hat{m}}_{α, n} < μ_{+}

with probability at least

1 - e^{- x}

, since

R_{n, α} (μ)

is a non-increasing function. Similarly,

{\hat{m}}_{α, n} > μ_{-}

with probability at least

1 - e^{- x}

. Then, by choosing the parameter

α

, we can derive the performance of the estimator

{\hat{m}}_{α, n}

for the bias of the mean m. That is, with probability at least

1 - 2 e^{- x}

, we have

μ_{-} < {\hat{m}}_{α, n} < μ_{+} .

The proof of Theorem 2 is completed. □

Proof of Corollary 1.

In fact, the right-hand side of (7) can be bounded as follows without limiting the sign of s:

\begin{matrix} |{\hat{m}}_{n, α} - m| & \leq 2 (|\sqrt[3]{\frac{q}{2} + \sqrt{Δ}} + \sqrt[3]{\frac{q}{2} - \sqrt{Δ}}|) \\ < 4 |\sqrt[3]{\frac{n α^{3} s + 6 x - 4 n}{2 n α^{3}}}| \\ = 4 |\sqrt[3]{- \frac{2}{α^{3}} + \frac{3 x}{n α^{3}} + \frac{s}{2}}|, \end{matrix}

(16)

with the assumption

n > \frac{3}{2} (1 + x)

, which is weaker than Catoni’s, (16) can be bounded by

\begin{matrix} 4 |\sqrt[3]{\frac{2}{α^{3}} - \frac{2}{α^{3}} + \frac{s}{2}}| \\ = 4 |\sqrt[3]{\frac{s}{2}}| . \end{matrix}

(17)

Moreover, assuming that

- \sqrt{\frac{4 n^{3} v^{3}}{729}} ⩽ s ⩽ \sqrt{\frac{4 n^{3} v^{3}}{729}}

, we can obtain that (17) is bounded by

(1 + x) \sqrt{\frac{v}{n}}

; then, (8) holds. □

3. Simulation

In this section, we considered the performance of the estimator with respect to the t-distribution on applications by Monte Carlo simulation exercise results. We focused on the performance of the estimator in

L_{1}

regression. Our data were simulated from a linear model generated from a t-distribution regressed by our proposed estimator; we measured the loss of the regression by the minimization of the

L_{1}

norm.

The details of the simulation are as follows: we considered n independent, identically distributed real random variables pairs

(X_{1}, Y_{1}), (X_{2}, Y_{2}) \dots

,

(X_{n}, Y_{n})

where

X_{i}

take their values in

R^{3}

while

Y_{i}

in

R

, and the explanatory variables

X_{i}

are drawn from a multivariate normal distribution with 0 mean, and variance is a three-dimensional identity matrix. The response variable

Y_{i}

is generated as follows:

Y_{i} = X_{i}^{T} θ + ϵ_{i},

(18)

where the parameter vector

θ

is set to be

(0.25, - 0.25, 0.50)

, and

ϵ_{i}

is an error term with zero mean and unit variance, which is drawn from a Student t-distribution. Our main goal was to estimate the parameter

θ

by minimizing the

L_{1}

risk

E |Y - X_{i}^{T} θ|,

and then we defined the the

L_{1}

estimators

{\hat{θ}}_{1}

, the classical Catoni mean estimator

{\hat{θ}}_{2}

, and the third-moment Catoni’s estimator

{\hat{θ}}_{3}

as follows

\begin{matrix} {\hat{θ}}_{1} = arg min_{θ} {\hat{R}}_{1} (θ) = arg min_{θ} \frac{1}{n} \sum_{i = 1}^{n} |Y_{i} - X_{i}^{T} θ|, \\ {\hat{θ}}_{2} = arg min_{θ} {\hat{R}}_{2} (μ) = arg min_{θ} \frac{1}{n α} \sum_{i = 1}^{n} ϕ (α (|Y_{i} - X_{i}^{T} θ| - μ)) = 0, \\ {\hat{θ}}_{3} = arg min_{θ} {\hat{R}}_{3} (μ) = arg min_{θ} \frac{1}{n α} \sum_{i = 1}^{n} ψ (α (|Y_{i} - X_{i}^{T} θ| - μ)) = 0, \end{matrix}

(19)

where the

{\hat{R}}_{2} (μ)

,

{\hat{R}}_{3} (μ)

is the root of the right side of the equation, respectively;

ϕ (x)

is the widest choice defined in Catoni’s result, the parameter

α = 1

, which is the same as Brownless’s work;

ψ (x)

was set as above; and the parameter

α = - 1

. The measures for the performance of the estimator are as follows:

\begin{matrix} R ({\hat{θ}}_{1}) - R (θ) = E |Y - X^{T} {\hat{θ}}_{1}| - E |Y - X^{T} θ|, \\ R ({\hat{θ}}_{2}) - R (θ) = E |Y - X^{T} {\hat{θ}}_{2}| - E |Y - X^{T} θ|, \\ R ({\hat{θ}}_{3}) - R (θ) = E |Y - X^{T} {\hat{θ}}_{3}| - E |Y - X^{T} θ| . \end{matrix}

(20)

The simulation experiments repeated with different sample sizes, which ranged from 50 to 1000 and with degrees of freedom of the t-distribution ranging from 1 to 7. Each set of the sample size experiments was replicated 1000 times, and for each replication, we evaluated the performance of the regression by the mean of the sample

(X_{1}^{'}, Y_{1}^{'}), (X_{2}^{'}, Y_{2}^{'}), \dots, (X_{m}^{'}, Y_{m}^{'})

—that is, i.i.d.with the sample

(X_{1}, Y_{1}), (X_{2}, Y_{2}) \dots

,

(X_{n}, Y_{n})

. We used the following equation to evaluate the performance of the regression, which called excess risk.

\begin{matrix} \tilde{R} ({\hat{θ}}_{1}) = \frac{1}{m} \sum_{i = 1}^{m} {|Y_{i}^{'} - Z_{i}^{T} {\hat{θ}}_{1}|}^{2}, \\ \tilde{R} ({\hat{θ}}_{2}) = \frac{1}{m} \sum_{i = 1}^{m} {|Y_{i}^{'} - Z_{i}^{T} {\hat{θ}}_{2}|}^{2}, \\ \tilde{R} ({\hat{θ}}_{3}) = \frac{1}{m} \sum_{i = 1}^{m} {|Y_{i}^{'} - Z_{i}^{T} {\hat{θ}}_{3}|}^{2} . \end{matrix}

(21)

Figure 3 displays the performance of the excess risk for three estimators when

n = 500

and the degrees of freedom of the t-distribution ranged from 1 to 7; we can obtain that the proposed estimator performs better than the other estimators, which means more stability on the outliers.

The results of the Monte Carlo simulation including the performance of the estimator for different n are presented in Table 1, and we also compared the performance between the proposed estimator and other estimators with various risks in Table 2 where sample size

n = 500

and degrees of freedom

d = 1

; the

L_{1}

represents the general

L_{1}

regression; the C and

C_{3}

denote the original Catoni estimator and our third-moment Catoni estimator, respectively; and the ER, RB, and SMSE represents the excess risk, relative risk (

\frac{| ∥ \bar{\hat{θ}} ∥_{2} - {∥ θ ∥}_{2} |}{{∥ θ ∥}_{2}}

, with

\bar{\hat{θ}} = \frac{1}{1000} \sum_{j = 1}^{1000} {\hat{θ}}^{(j)}

), and the square root of the mean square error (

\sqrt{M S E} = \sqrt{\frac{1}{1000} \sum_{j = 1}^{1000} [∥ \hat{θ} {(j) ∥}_{2} - {∥ θ ∥}_{2}]^{2}}

).

We can derive from the table that when the distribution has a heavy tail, our estimator performs better in most cases than the other two estimators, and the excess risk of the estimator decreases as the sample size increases. At the same time, with the degrees of freedom of the t-distribution rising, the tail of the t-distribution becomes thinner, which becomes closer to the normal distribution, and the performance of all procedures on excess risk is significantly improved; additionally, the proposed estimator also performs well for different risks.

We also examined the performance of the third-moment Catoni estimator on a skewed normal distribution in Table 3; the model still follows (18) where

ϵ

follows a skewed normal distribution with shape parameter

α = 1, 3, 5

and with other settings unchanged. We can draw conclusions from the table that the bias of the improved estimator is still smaller than the original one. However, the deviation in the estimator did not display a significant difference as the shape parameter

α

changed. We suppose that this results from the tail behavior of the skew normal distribution in that the existence of its fourth moment conflicts with the usual assumption that the fourth moment of heavy tail distribution does not exist. At the same time, neither Catoni’s estimator nor our estimator performed better than the estimator obtained by L1 regression.

4. Empirical Analysis

In this section, we used the proposed procedure to research the dataset “tumor cell resistance to death,” an artificial dataset consisting of two different types of tumor cells A and B, and the experiment records their resistance to different doses of experimental drugs. The explanatory variable

X_{i}

here is the dose of the drug, and the response variable

Y_{i}

is the score representing the resistance to death, ranging from 0 to 4. These data are available in the R lqr package; Galarza et al. [20] have studied these data by the quantile regression method.

In Figure 4, Figure 5, Figure 6 and Figure 7, we display the QQplot and the log-QQplot about the scores for cell A and cell B, and it can be seen that the distribution of both cells lacks normality; however, the normality is satisfied between cells and log-scores; besides, the boxplot and the bee colony diagram in Figure 8 and Figure 9 shows that both cell A and cell B have heavy-tails, which allows us to focus on the following regression model:

log (Y_{i}) = β_{0} + β_{1} X_{i},

where

Y_{i}

and

X_{i}

are defined before. Our focus was estimating the parameters

β_{0}

and

β_{1}

, the solution of the following equation:

{\hat{r}}_{β} (u) = \frac{1}{n α} \sum_{i = 1}^{n} ψ (α (|log (Y_{i}) - β_{0} - X_{i}^{⊤} β_{1}| - u)) = 0 .

Let

{\hat{R}}_{C} (β)

denote the solution of the

{\hat{r}}_{β} (u) = 0

; then, the Catoni regression estimator of

β_{0}

and

β_{1}

is in the form as follows:

arg min_{β_{0}, β_{1}} {\hat{R}}_{C} (β) .

Moreover, we compared the proposed estimator with the classical OLS estimator in Figure 10 and Figure 11. The residuals plots are shown in Figure 12, Figure 13, Figure 14 and Figure 15, from which we can draw the conclusion that the distribution of the residual of the three-order Catoni regression performs more uniformly; furthermore, the Mean Squared Error of the third-moment Catoni regression and OLS regression was 0.1120, 0.1255 for cell A and 0.2268, 0.2335 for cell B, respectively, which indicates that the proposed method is a better regression.

5. Discussion

Estimating the mean of random variables is a classical issue in statistics [1], and it has been well studied in classical statistics; however, with the discovery of heavy-tailed distribution in many research fields, its existence is an important challenge in statistics. When the data have heavy tails, the traditional estimators such as the empirical mean usually perform poorly. Therefore, how to find an appropriate robust procedure is a well-known problem and has aroused great interest. A new estimator based on reconstructing the structure of the empirical mean was proposed by Catoni, which has excellent theoretical properties on the bias.

The Catoni’s estimator is based on the existence of the variance v of the random variable. Therefore, with a weaker assumption on the moment conditions, it is an interesting issue whether the estimator has a better performance. In this study, we assumed that the third moment s of the data exists, and a more accurate upper bound of the exponential moment was obtained, which motivates an estimator with a better bias. To a certain extent, the assumption reduces the robustness to outliers, but it has a minimal effect in heavy tails distribution (the fourth moment does not exist). In future work, we have the following goals: first, we believe that our method can be applied as an improved mean estimator to any relevant model as long as the third moment of the distribution has good theoretical properties and wide application; second, it is an interesting idea to discuss and compare the bias bound of the proposed estimator with the minimax bound; finally, the estimation of the variance in regression models is very important in statistical inference. Considering that the deviation of the estimator given in our main theoretical results from the true value can be regarded as the confidence interval based on the known variance; therefore, the proposed estimator is not suitable for the estimation of variance, but it is an interesting issue how a proper variance estimator affects the bias of our estimator; additionally, we will consider variance estimation under heavy-tailed distributions in later work.

Author Contributions

Conceptualization, Y.C.; methodology, Y.C. and R.L.; investigation, Y.C. and R.L.; software, S.S.; writing, Y.C. and R.L.; Y.C. has designed the framework of this study and substantively revised it; R.L. and Y.C. have performed the methodology and simulation; S.S. implements research on empirical analysis research. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (No. 72073082).

Data Availability Statement

The dataset for the empirical analysis can be derived from the following resource available in CRAN, https://cran.r-project.org/web/packages/lqr/index.html, accessed on 12 February 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

Lugosi, G.; Mendelson, S. Mean estimation and regression under heavy-tailed distributions: A survey. Found. Comput. Math. 2019, 19, 1145–1190. [Google Scholar] [CrossRef] [Green Version]
Embrechts, P.; Kluppelberg, C.; Mikosch, T. Modelling Extremal Events for Insurance and Finance; Springer: Berlin/Heidelberg, Germany, 1997. [Google Scholar]
Huber, P. Robust estimation of a location parameter. Ann. Math. Stat. 1964, 35, 73–101. [Google Scholar] [CrossRef]
Nemirovsky, A.; Yudin, D. Problem Complexity and Method Efficiency in Optimization; Springer: Berlin/Heidelberg, Germany; Wiley: New York, NY, USA, 1983. [Google Scholar]
Hsu, D.; Sabato, S. Loss minimization and parameter estimation with heavy tails. J. Mach. Learn. Res. 2016, 17, 1–40. [Google Scholar]
Jerrum, M.; Valiant, L.; Vazirani, V. Random generation of combinatorial structures from a uniform distribution. Theoret. Comput. Sci. 1986, 43, 169–188. [Google Scholar] [CrossRef] [Green Version]
Tukey, J.; McLaughlin, D. Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/Winsorization. I. Sankhyā Ser. A 1963, 25, 331–352. [Google Scholar]
Huber, P.; Ronchetti, E. Robust Statistics; Wiley: New York, NY, USA, 2009. [Google Scholar]
Catoni, O. Statistical Learning Theory and Stochastic Optimization; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
Audibert, J.; Catoni, O. Robust linear least squares regression. Ann. Stat. 2011, 39, 2766–2794. [Google Scholar] [CrossRef]
Bartlett, P.; Mendelson, S. Empirical minimization. Probab. Theory Relat. Fields 2006, 311–334. [Google Scholar] [CrossRef] [Green Version]
Maronna, R.A.; Martin, D.R.; Yohai, V.J. Robust Statistics: Theory and Methods; Springer: New York, NY, USA; Wiley: New York, NY, USA, 2006. [Google Scholar]
Bubeck, S.; Cesa-Bianchi, N.; Lugosi, G. Bandits with heavy tail. IEEE Trans. Inform. Theory 2013, 7711–7717. [Google Scholar] [CrossRef]
Catoni, O. Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. Henri Poincaré Probab. Stat. 2012, 48, 1148–1185. [Google Scholar] [CrossRef]
Brownlees, C.; Joly, E.; Lugosi, G. Empirical risk minimization for heavy-tailed losses. Ann. Stat. 2015, 43, 2507–2536. [Google Scholar] [CrossRef] [Green Version]
Minsker, S. Geometric median and robust estimation in Banach spaces. Bernoulli 2015, 21, 2308–2335. [Google Scholar] [CrossRef] [Green Version]
Wang, Z.; Liu, H.; Zhang, T. Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Stat. 2014, 42, 2164–2201. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Jones, M.C.; Faddy, M.J. A skew extension of the t-distribution, with applications. J. R. Stat. Soc. Ser. B Stat. Methodol. 2003, 65, 159–174, 62E10 (62F10). [Google Scholar] [CrossRef]
Høyrup, J. The Babylonian Cellar Text BM 85200+ VAT 6599; Springer: Berlin/Heidelberg, Germany; Birkhäuser: Basel, Switzerland, 1992; pp. 315–358. [Google Scholar]
Galarza, C.; Zhang, P.; Lachos, V. Logistic quantile regression for bounded outcomes using a family of heavy-tailed distributions. Sankhya B 2021, 83, S325–S349. [Google Scholar] [CrossRef]

Figure 1. Different chooses of influence function.

Figure 2. Representation of

{\hat{R}}_{f} (μ)

and the cubic equation

C_{+} (μ)

and

C_{-} (μ)

.

Figure 2. Representation of

{\hat{R}}_{f} (μ)

and the cubic equation

C_{+} (μ)

and

C_{-} (μ)

.

Figure 3. Excess risk varies with degrees of freedom.

Figure 4. QQplot for cell A.

Figure 5. QQplot for cell B.

Figure 6. log-QQplot for cell A.

Figure 7. log-QQplot for cell B.

Figure 8. Boxplot about the log-scores for the two types of cells.

Figure 9. The bee colony diagram about the log-scores for the two types of cells.

Figure 10. Regression for cell A.

Figure 11. Regression for cell B.

Figure 12. OLS regression residual plot for cell A.

Figure 13. Third-moment Catoni regression residual plot for cell A.

Figure 14. OLS regression residual plot for cell B.

Figure 15. Third-moment Catoni regression residual plot for cell B.

Table 1. The excess risk of the

L_{1}

, Catoni, and third-moment Catoni regression estimator for different degrees of freedom and sample size n.

Table 1. The excess risk of the

L_{1}

, Catoni, and third-moment Catoni regression estimator for different degrees of freedom and sample size n.

		$n = 50$	$n = 100$	$n = 250$	$n = 500$	$n = 1000$
$d = 1$	$L_{1}$	$8.79$	$5.91$	$5.75$	$4.15$	$3.67$
	C	$7.53$	$4.63$	$4.90$	$4.07$	$3.49$
	$C_{3}$	$7.46$	$4.06$	$4.84$	$4.06$	$3.38$
$d = 3$	$L_{1}$	$1.51$	$1.34$	$1.22$	$1.15$	$1.14$
	C	$1.38$	$1.27$	$1.20$	$1.15$	$1.12$
	$C_{3}$	$1.27$	$1.21$	$1.14$	$1.10$	$1.08$
$d = 5$	$L_{1}$	$1.09$	$1.11$	$1.08$	$1.08$	$1.07$
	C	$1.08$	$1.13$	$1.09$	$1.10$	$1.08$
	$C_{3}$	$1.06$	$1.08$	$1.03$	$1.04$	$1.04$
$d = 7$	$L_{1}$	$1.08$	$1.02$	$1.05$	$1.01$	$1.00$
	C	$1.05$	$0.94$	$1.03$	$1.00$	$0.98$
	$C_{3}$	$0.97$	$0.94$	$0.90$	$0.85$	$0.86$

Table 2. Comparisons of the performance between the proposed estimator and other estimators with various risks.

	$L_{1}$	C	$C_{3}$
$E R$	4.1561	4.0747	4.0628
$R B$	0.0398	0.0385	0.0383
$S M S E$	0.0970	0.0952	0.0947

Table 3. The excess risk of the

L_{1}

, Catoni, and third-moment Catoni regression estimator on a skewed normal distribution.

Table 3. The excess risk of the

L_{1}

, Catoni, and third-moment Catoni regression estimator on a skewed normal distribution.

		$n = 50$	$n = 100$	$n = 250$	$n = 500$	$n = 1000$
$s = 1$	$L_{1}$	$0.847$	$0.829$	$0.820$	$0.807$	$0.784$
	C	$0.865$	$0.844$	$0.825$	$0.785$	$0.779$
	$C_{3}$	$0.859$	$0.837$	$0.823$	$0.789$	$0.781$
$s = 3$	$L_{1}$	$0.857$	$0.833$	$0.809$	$0.819$	$0.798$
	C	$0.861$	$0.842$	$0.829$	$0.835$	$0.828$
	$C_{3}$	$0.861$	$0.843$	$0.827$	$0.835$	$0.824$
$s = 5$	$L_{1}$	$0.831$	$0.825$	$0.812$	$0.792$	$0.782$
	C	$0.856$	$0.855$	$0.850$	$0.839$	$0.828$
	$C_{3}$	$0.855$	$0.855$	$0.848$	$0.827$	$0.822$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Luo, R.; Chen, Y.; Song, S. On the M-Estimator under Third Moment Condition. Mathematics 2022, 10, 1713. https://doi.org/10.3390/math10101713

AMA Style

Luo R, Chen Y, Song S. On the M-Estimator under Third Moment Condition. Mathematics. 2022; 10(10):1713. https://doi.org/10.3390/math10101713

Chicago/Turabian Style

Luo, Rundong, Yiming Chen, and Shuai Song. 2022. "On the M-Estimator under Third Moment Condition" Mathematics 10, no. 10: 1713. https://doi.org/10.3390/math10101713

APA Style

Luo, R., Chen, Y., & Song, S. (2022). On the M-Estimator under Third Moment Condition. Mathematics, 10(10), 1713. https://doi.org/10.3390/math10101713

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

On the M-Estimator under Third Moment Condition

Abstract

1. Introduction

2. Main Result

3. Simulation

4. Empirical Analysis

5. Discussion

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI