Depth Induced Regression Medians and Uniqueness

Zuo, Yijun

doi:10.3390/stats3020009

Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA

Stats 2020, 3(2), 94-106; https://doi.org/10.3390/stats3020009

Submission received: 14 February 2020 / Revised: 16 March 2020 / Accepted: 31 March 2020 / Published: 10 April 2020

Abstract

The notion of median in one dimension is a foundational element in nonparametric statistics. It has been extended to multi-dimensional cases both in location and in regression via notions of data depth. Regression depth (RD) and projection regression depth (PRD) represent the two most promising notions in regression. Carrizosa depth

D_{C}

is another depth notion in regression. Depth-induced regression medians (maximum depth estimators) serve as robust alternatives to the classical least squares estimator. The uniqueness of regression medians is indispensable in the discussion of their properties and the asymptotics (consistency and limiting distribution) of sample regression medians. Are the regression medians induced from RD, PRD, and

D_{C}

unique? Answering this question is the main goal of this article. It is found that only the regression median induced from PRD possesses the desired uniqueness property. The conventional remedy for non-uniqueness, taking the average of all medians, might yield an estimator that no longer possesses the maximum depth in both the RD and

D_{C}

cases. These and other findings indicate that the PRD and its induced median are highly favorable among their leading competitors.

Keywords:

uniqueness; regression depth; maximum depth estimator; regression median; robustness

1. Introduction

The regular univariate sample median, defined as the innermost (deepest) point of a data set, is unique (if the sample median is defined to be the point

θ

that minimizes the sum of its distances to sample points (i.e.,

θ = arg {min}_{θ \in R^{1}} \sum_{i = 1}^{n} | θ - x_{i} |

, where

x_{i}, i = 1, \dots, n

are the given n sample points in

R^{1}

), then it is not unique. However, to overcome this drawback, conventionally it is defined as

θ = Median {x_{i}} : = \frac{x_{(⌊ \frac{n + 1}{2} ⌋)} + x_{(⌊ \frac{n + 2}{2} ⌋)}}{2}

, where

x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}

are ordered values of

x_{i}

’s and

⌊ \cdot ⌋

is the floor function. Namely, it is the innermost point (from both left and right direction) or the average of two deepest sample points. Hence, it is unique). The population median defined as the

\frac{1}{2}

-th quantile (Recall, for any univariate distribution function F, and for

0 < p < 1

, the quantity

F^{- 1} (p) : = inf {x : F (x) \geq p}

is called the pth quantile or fractile of F (see page 3 of Serfling (1980) [1])) of the underlying distribution (there are other versions of the definition) is also unique. The most outstanding feature of the univariate median is its robustness. In fact, among all translation equivariant location estimators, it has the best possible breakdown point (Donoho (1982) [2]) and the minimum maximum bias if the underlying distribution has a unimodal symmetric density (Huber (1964) [3]). Besides serving as a promising robust location estimator, the univariate median also provides a base for a center-outward ordering (in terms of the deviations from the median), an alternative to the traditional left-to-right ordering.
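The two definitions of the sample median discussed above can be compared numerically. The following is a minimal Python sketch, not taken from the article; the helper names `sample_median` and `sum_abs_dev` are illustrative assumptions. It evaluates the order-statistic formula and checks that, for an even sample size, several points attain the same minimal sum of absolute deviations, whereas the order-statistic formula returns a single value.

```python
def sample_median(xs):
    """Average of the order statistics x_(floor((n+1)/2)) and x_(floor((n+2)/2))."""
    s = sorted(xs)
    n = len(s)
    lo = (n + 1) // 2  # 1-based index of the lower middle order statistic
    hi = (n + 2) // 2  # 1-based index of the upper middle order statistic
    return (s[lo - 1] + s[hi - 1]) / 2


def sum_abs_dev(theta, xs):
    """Objective sum_i |theta - x_i|; its minimizer need not be unique."""
    return sum(abs(theta - x) for x in xs)


even_sample = [4, 1, 3, 2]
# Each of 2, 2.5 and 3 attains the same minimal objective value for this sample.
objective_values = [sum_abs_dev(t, even_sample) for t in (2, 2.5, 3)]
# The order-statistic definition picks one value, the midpoint of the two middle points.
unique_median = sample_median(even_sample)
```

For an odd sample size the two indices coincide, so the formula returns the middle order statistic itself.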

Extending the univariate median to multidimensional settings, so that multidimensional data can share its outstanding robustness property and its alternative ordering scheme, is desirable. One approach, among others, is via notions of data depth. General notions of data depth have been increasingly pursued and studied (Liu, et al. (1999) [4], Zuo and Serfling (2000) (ZS00) [5]) since the pioneering proposal of Tukey (1975) [6] (see Donoho and Gasko (1992) [7]). Besides Tukey depth, another prevailing depth, among others, is the projection depth (PD) [5] (Liu (1992) [8], and Zuo (2003) [9]).

Depth notions in location have also been extended to regression. Regression depth (RD) of Rousseeuw and Hubert (1999) (RH99) [10], the most famous, exemplifies a direct extension of Tukey location depth to regression. Projection regression depth (PRD) of Zuo (2018a) (Z18a) [11] is another example of the extension of prominent PD in location to regression. The RD and PRD represent the two leading notions of depth in regression ([11]) which satisfy desirable axiomatic properties. Carrizosa depth

D_{C}

(Carrizosa (1996) (C96)) [12] (defined in Section 2.2) is one of the other notions of depth in regression ([11]). One of the outstanding advantages of depth notions is that they can be directly employed to introduce median-type deepest estimating functionals (or estimators in the empirical case) for the location or regression parameters in a multi-dimensional setting based on a general min-max stratagem. The maximum (deepest) regression depth estimator (also called regression median) serves as a robust alternative to the classical least squares or least absolute deviations estimator for the unknown parameters in a general linear regression model:

y_{i} = x_{i}^{⊤} β + e_{i}, \quad i = 1, \dots, n,

(1)

where ⊤ denotes the transpose of a vector, and vector

x_{i} = {(1, x_{i 1}, \dots, x_{i (p - 1)})}^{⊤}

and parameter vector

β = (β_{1}, \dots, β_{p})

are in

R^{p}

(

p \geq 2

), and

e_{i}

is a random variable in

R

. One can regard the observations

(y_{i}, x_{i}^{⊤})

as a sample from random vector

(y, x^{⊤}) \in R^{p + 1}

.

The robustness of the medians induced from RD and PRD has been investigated in Van Aelst and Rousseeuw (2000) (VAR00) [13] and Zuo (2018b) [14], respectively. These medians, just like their location or univariate counterparts, indeed possess high breakdown point robustness.

The regression median, as the deepest regression hyperplane, is expected to be unique, just like its location or univariate counterpart, because non-uniqueness would result in vagueness in the inference (prediction and estimation) via the regression median. Uniqueness is an indispensable feature and axiomatic property when one (i) investigates the population median, or (ii) deals with the convergence in probability or in distribution of the sample regression median to its population version, which must itself be unique; (iii) it is also an essential property in the computation of sample regression medians, for the convergence of approximate algorithms. The uniqueness issue of multidimensional location medians has been addressed in Zuo (2013) [15].

Are the medians induced from regression depth notions via the min-max scheme generally unique? Answering this question is the goal of this article. It turns out that the regression depth-induced medians are not necessarily unique. The conventional remedy for this issue is to take the average of all medians. This, however, might not work (in the sense that the resulting estimator might no longer possess the maximum depth) for both RD of RH99 and

D_{C}

of C96. On the other hand, PRD-induced regression medians are unique.

The rest of the article is organized as follows. Section 2 introduces leading regression depth notions and their induced medians and shows that these medians indeed recover the regular univariate sample median in the special univariate case. Empirical examples of regression depth and medians and their behavior are illustrated in Section 3. Section 4 establishes general results on the uniqueness of regression medians. Brief concluding remarks in Section 5 end the article.

2. Maximum Depth Functionals (Regression Medians)

Let

D (β; P)

be a generic non-negative functional on

R^{p} \times P

, where

β \in R^{p}

and

P

is a collection of distributions

F_{Z}

of

Z = {(y, x^{⊤})}^{⊤} \in R^{p + 1}

(

F_{Z}

and P are used interchangeably).

If

D (β; P)

satisfies four axiomatic properties: (P1) (regression, scale and affine) invariance; (P2) maximality at center; (P3) monotonicity relative to any deepest point and (P4) vanishing at infinity, then it is called a regression depth functional (see [11] for details). The maximum regression depth functional, or the regression median, can be defined as

β^{*} (F_{Z}) : = arg max_{β \in R^{p}} D (β; F_{Z}) .

(2)

Note that

β^{*}

might not be unique, and a conventional remedy measure is to take the average of all maximum depth points. Unfortunately, this could lead to a scenario where the resulting functional (or estimator) might not have the maximum depth any more. For detailed discussions on

D (β; F_{Z})

and

β^{*} (F_{Z})

, see [11]. In the following we elaborate three examples.

2.1. Median Induced from Regression Depth of RH99

Definition 1.

For any

β \in R^{p}

and the joint distribution P of

(y, x^{⊤})

in (1), [10] defined the regression depth of β, denoted hence by

{R D}_{R H} (β; P)

, to be the minimum probability mass that needs to be passed when tilting (the hyperplane induced from) β in any way until it is vertical. The maximum regression depth functional

β_{{R D}_{R H}}^{*}

(regression median) is defined as

β_{{R D}_{R H}}^{*} (P) = arg max_{β \in R^{p}} {R D}_{R H} (β; P)

(3)

The

{R D}_{R H} (β; P)

definition above is rather abstract and not easy to comprehend. Many characterizations of it, or equivalent definitions, have been given in the literature, though; see, e.g., [11] and references cited therein.

2.2. Median Induced from Carrizosa Depth of C96

Among regression depth notions investigated in [11], Carrizosa depth

D_{C} (β; P)

D_{C} (β; P) = inf_{α \in R^{p}} P (| r (β) | \leq | r (α) |),

(4)

for any

β \in R^{p}

and underlying probability measure P associated with

(y, x^{⊤})

, was a pioneer regression depth notion introduced in [12] and thoroughly investigated in [11], where

r (γ) = y - x^{⊤} γ

. As characterized in [11] (see Proposition 2.2 there), it turns out that (

p \geq 2

)

D_{C} (β; P) = P (r (β) = 0) .

(5)

The maximum regression depth functional (or regression median) was then defined as

β_{D_{C}}^{*} (P) = arg max_{β \in R^{p}} D_{C} (β; P) .

(6)

As shown in [11],

β_{D_{C}}^{*}

always exists if the assumption: (A)

P (H_{v}) = 0

, for any vertical hyperplane

H_{v}

, holds. Unfortunately, as

D_{C}

violates (P3) generally (see [11]), we will not focus on it in the sequel. On the other hand, under (A) RD

_{R H}

above satisfies (P1)–(P4).

2.3. Median Induced from Projection Regression Depth of Z18a

Hereafter, assume that R is a univariate regression estimating functional which satisfies

(A1) regression, scale and affine equivariant. That is, respectively,

$R (F_{(y + x b, x)}) = R (F_{(y, x)}) + b$ , $\forall b \in R^{1}$ , and
$R (F_{(s y, x)}) = s R (F_{(y, x)})$ , $\forall s \in R^{1}$ , and
$R (F_{(y, a x)}) = a^{- 1} R (F_{(y, x)})$ , $\forall a (\neq 0) \in R^{1}$ .

where

x, y \in R^{1}

are random variables. Throughout, the lower case x stands for a variable in

R^{1}

while the bold

x

for a vector in

R^{p}

(

p > 1

).

F_{(x_{1}, x_{2})}

is the distribution of vector

(x_{1}, x_{2})

.

(A2)

{sup}_{v \in S^{p - 1}} | R (F_{(y, x ⊤ v)}) | < \infty

, where

S^{p - 1} : = {u \in R^{p}, ∥ u ∥ = 1}

.

(A3)

R (F_{(y - x ⊤ β, x ⊤ v)})

is continuous in

β

and

v

, and quasi-convex in

β

, for

β \in R^{p}

,

v \in S^{p - 1}

.

Let S be a positive scale estimating functional that is scale equivariant and location invariant.

R will be restricted to the form

R (F_{(y - x ⊤ β, x ⊤ v)}) = T (F_{(y - x ⊤ β) / (x ⊤ v)})

and T will be a univariate location functional that is location, scale and affine equivariant (see pages 158–159 of Rousseeuw and Leroy (1987) (RL87) [16] for definitions). Hereafter we assume that (A0)

P (x ⊤ v = 0) = 0

for any

v \in S^{p - 1}

(see (I) of Remark 1 for the explanation).

Examples of T include, among others, the mean, weighted mean, and quantile functionals. A particular example of

R (F_{(y - x ⊤ β, x ⊤ v)})

is

{Med}_{x ⊤ v \neq 0} (F_{(y - x ⊤ β) / x ⊤ v})

, where Med stands for the median functional. Typical examples of S include the standard deviation and weighted deviation functionals (Wu and Zuo (2008) [17]) and the median of absolute deviations (MAD) functional.

Equipped with a pair of T and S, we can introduce a corresponding projection-based regression estimating functional. By modifying a functional in Maronna and Yohai (1993) [18] to achieve scale equivariance, [11] defined

{U F}_{v} (β; F_{(y, x^{⊤})}, T) : = | T (F_{(y - x ⊤ β) / x ⊤ v}) | / S (F_{y}),

(7)

which represents unfitness of

β

at

F_{(y, x^{⊤})}

w.r.t. T along the

v \in S^{p - 1}

. If R is a Fisher consistent regression estimating functional, then

T (F_{(y - x ⊤ β_{0}) / x ⊤ v}) = 0

for some

β_{0}

(the true parameter of the model) and

\forall v \in S^{p - 1}

. Thus overall, one expects

| T |

to be small and close to zero for a candidate

β

, independent of the choice of

v

and

x ⊤ v

. The magnitude of

| T |

measures the unfitness of

β

along the

v

. Taking the supremum over all

v \in S^{p - 1}

yields

U F (β; F_{(y, x^{⊤})}, T) = sup_{∥ v ∥ = 1} {U F}_{v} (β; F_{(y, x^{⊤})}, T),

(8)

the unfitness of

β

at

F_{(y, x^{⊤})}

w.r.t. T. Now applying the min-max scheme, [11] obtained the projection regression estimating functional (also denoted by

β_{PRD}^{*}

) w.r.t. the pair

(T, S)

\begin{matrix} β^{*} (F_{(y, x^{⊤})}, T) & = & arg min_{β \in R^{p}} U F (β; F_{(y, x^{⊤})}, T) \\ & = & arg max_{β \in R^{p}} PRD (β; F_{(y, x^{⊤})}, T), \end{matrix}

(9)

where the projection regression depth (PRD) functional was defined in [11] as

PRD (β; F_{(y, x^{⊤})}, T) = {(1 + U F (β; F_{(y, x^{⊤})}, T))}^{- 1},

(10)

Just like S (which is for achieving scale invariance and is nominal), T is sometimes also suppressed in the above functionals for simplicity. The authors of [11] showed that PRD satisfies (P1)–(P4).

For robustness consideration, in the sequel,

(T, S)

is the fixed pair

(Med, MAD)

, unless otherwise stated. Hereafter, we write

Med (Z)

rather than

Med (F_{Z})

. For this special choice of T and S, we have that

\begin{matrix} T (F_{(y - x ⊤ β) / x ⊤ v}) & = & {Med}_{x ⊤ v \neq 0} (\frac{y - x ⊤ β}{x ⊤ v}), \\ S (F_{y}) & = & MAD (F_{y}) . \end{matrix}
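To make definitions (7), (8) and (10) concrete for the pair (Med, MAD), the following Python sketch evaluates an empirical PRD for a simple regression (p = 2). It is an illustration under stated assumptions, not the algorithm of [22]: the supremum over unit vectors v is replaced by a maximum over a finite grid of angles, the MAD is taken without any consistency factor, and the function names `mad`, `unfitness` and `prd` are hypothetical.

```python
import math
from statistics import median


def mad(values):
    """Median of absolute deviations from the median (no consistency factor)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def unfitness(beta, xs, ys, n_dirs=360, eps=1e-12):
    """Grid approximation of UF(beta) = sup_v |Med((y - x'beta)/(x'v))| / MAD(y)."""
    scale = mad(ys)
    worst = 0.0
    for k in range(n_dirs):
        angle = 2 * math.pi * k / n_dirs
        v = (math.cos(angle), math.sin(angle))  # a unit vector in R^2
        ratios = []
        for x1, y in zip(xs, ys):
            proj = v[0] + v[1] * x1  # x'v with x = (1, x1)
            if abs(proj) > eps:  # the median is taken over x'v != 0 only
                resid = y - (beta[0] + beta[1] * x1)
                ratios.append(resid / proj)
        if ratios:
            worst = max(worst, abs(median(ratios)))
    return worst / scale


def prd(beta, xs, ys):
    """Approximate projection regression depth (1 + UF)^(-1)."""
    return 1.0 / (1.0 + unfitness(beta, xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]  # points lying exactly on y = 1 + 2x
depth_at_truth = prd((1.0, 2.0), xs, ys)
depth_elsewhere = prd((0.0, 0.0), xs, ys)
```

Because the grid only approximates the supremum, the computed unfitness is a lower bound on UF and the computed depth is an upper bound on PRD; a finer grid tightens both.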

To end this section, we show that the three maximum depth estimators above indeed deserve to be called regression median since they recover the regular univariate sample median in the special univariate case. (The result below also holds true for the population case).

Proposition 1.

For univariate data, the

β_{R D_{R H}}^{*}

,

β_{D_{C}}^{*}

and

β_{P R D}^{*}

all recover the univariate sample median.

Proof.

(i) For

β_{R D_{R H}}^{*}

, this has already been discussed and claimed in [10] (page 390). So we only need to focus on the other two.

(ii) For

β_{D_{C}}^{*}

, we no longer can use (5) and have to invoke (4). Note that

r (β) = y - β

in this case (no slope term any more).

\begin{matrix} D_{C} (β, P) & = & inf_{α \in R^{1}} P (| y - β | \leq | y - α |) \\ & = & min \{inf_{α > β} P (| y - β | \leq | y - α |), inf_{α \leq β} P (| y - β | \leq | y - α |)\} \\ & = & min \{P (y \leq β), P (y \geq β)\} . \end{matrix}

That is,

D_{C} (β, P_{n}) = min \{\sum_{i = 1}^{n} I (y_{i} \leq β) / n, \sum_{i = 1}^{n} I (y_{i} \geq β) / n\}

. The latter immediately leads to

β_{D_{C}}^{*} = {Med}_{i} {y_{i}}

(the average of all solutions).

(iii) For

β_{P R D}^{*}

, first we note that (without loss of generality, assume that

S (F_{y}) = 1

)

β_{P R D}^{*} = arg min_{β \in R^{p}} sup_{v \in S^{p - 1}} | {Med}_{i} {\frac{y_{i} - x ⊤_{i} β}{x ⊤_{i} v}} | .

(11)

When

p = 1

, it reduces to the following

β_{P R D}^{*} = arg min_{β \in R} sup_{v = \pm 1} | {Med}_{i} {\frac{y_{i} - β}{v}} | .

(12)

It is readily seen that

β_{P R D}^{*} = arg min_{β \in R} | {Med}_{i} {y_{i} - β} | = arg min_{β \in R} | {Med}_{i} {y_{i}} - β |,

(13)

where the first equality follows from (12) and the oddness of the median operator, and the second follows from the translation equivariance (see page 249 of [16] for the definition) of the median as a location estimator. The last display means that

β_{P R D}^{*}

recovers the sample median. □
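Parts (ii) and (iii) of the proof can be checked numerically on a small sample. The sketch below is illustrative only; the function names and the data are assumptions, and the search is restricted to the sample points for simplicity. It evaluates the empirical univariate Carrizosa depth min{#(y_i ≤ β)/n, #(y_i ≥ β)/n} and the PRD objective |Med_i{y_i − β}| from (13).

```python
from statistics import median


def carrizosa_depth_1d(beta, ys):
    """Empirical D_C(beta) = min{ #(y_i <= beta)/n, #(y_i >= beta)/n }."""
    n = len(ys)
    below = sum(1 for y in ys if y <= beta)
    above = sum(1 for y in ys if y >= beta)
    return min(below, above) / n


def prd_objective_1d(beta, ys):
    """|Med_i{y_i - beta}|, the quantity minimized in display (13)."""
    return abs(median([y - beta for y in ys]))


ys = [1, 2, 3, 4, 10]
# Search over the sample points only (an assumption made to keep the sketch short).
deepest_dc = max(ys, key=lambda b: carrizosa_depth_1d(b, ys))
deepest_prd = min(ys, key=lambda b: prd_objective_1d(b, ys))
```

For this odd-sized sample both searches return the sample median 3, even though the observation 10 is far from the bulk of the data.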

3. Examples of Regression Depths and Regression Medians

For a better comprehension of depth notions and depth-induced medians in the last section, we present empirical examples below. We will confine attention to RD and PRD only since

D_{C} (β, P)

is just the probability mass carried by the hyperplane determined by

y = x ⊤ β

.

Example 1.

(Empirical RD $_{R H}$ and PRD). What do empirical RD

_{R H}

and PRD look like? To answer the question, 30 random bivariate standard normal points are generated (plotted in Figure 1) and RD

_{R H}

and PRD are computed w.r.t. these points.

We select 961 equally spaced grid points from the square of

[x, y]

with range of

| x | \leq 3

and

| y | \leq 3

, then treat each point

(x, y)

as a

β ⊤ = (β_{1}, β_{2})

and compute its regression depth (RD

_{R H}

and PRD) w.r.t. the 30 bivariate normal points. The depths of these 961 points are plotted in Figure 2.

Inspecting the Figure reveals that (i) sample RD

_{R H}

function is a step-wise increasing function (each step in this case is

1 / 30

). For this roughly symmetric data set, it attains its maximum depth around the center of symmetry (the origin); (ii) PRD, on the other hand, is a strictly monotonically increasing function and attains its maximum value at the center of symmetry, in sharp contrast to the behavior of RD

_{R H}

around the center (PRD has a unique maximum depth point, whereas RD $_{R H}$ has multiple maximum depth points).

Example 2.

(Uniqueness of medians induced from empirical RD $_{R H}$ and PRD). This example illustrates the uniqueness behavior of the regression depth (RD

_{R H}

and PRD)-induced medians in the empirical distribution case via a concrete example on the real data from the Hertzsprung–Russell diagram of the star cluster CYG OB1 (see Table 3 in chapter 2 of [16]), which contains 47 stars in the direction of Cygnus. Here, x is the logarithm of the effective temperature at the surface of the star (

T_{e}

), and y is the logarithm of its light intensity (

L / L_{0}

); see Figure 3 for the plot of the data set.

Five regression lines are plotted in Figure 3. Among them, three (dashed red, dotted blue, and dotdash green) are regression medians from RD

_{R H}

, one (solid black) from PRD, and the other (longdash purple) is the least squares line. Note that the classical least squares regression estimator (as well as many traditional regression estimators) could be regarded as a depth-induced median under the general “objective depth”

D_{O b j}

framework (see [11]). Thus, for the benchmark purpose, the least squares line is also plotted in Figure 3 alongside the four other median lines.

The LS line also justifies the legitimacy of the existence of RD

_{R H}

- and PRD-induced medians (as robust alternatives) since the LS line fails to capture the main-sequence/pattern of the data cloud (stars) and is heavily affected by four giant stars whereas the other four depth medians resist the four leverage points (outliers) and catch the main trend/cluster.

It turns out that there exist three maximum depth lines (medians) induced from RD

_{R H}

. Each of the three lines goes through exactly two data points. In terms of (intercept, slope) form, they are (−6.065000, 2.500000), (−8.586500, 3.075000), and (−7.903043, 2.913043). These lines are plotted in dashed red, dotted blue, and dotdash green in Figure 3. All three possess regression depth

21 / 47

. Note that the average of the three deepest lines is (−7.518181, 2.829348), which possesses RD

20 / 47

. That is, it no longer possesses the maximum regression depth.
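The averaged line quoted above is the coordinate-wise mean of the three deepest (intercept, slope) pairs, which can be reproduced with a few lines of Python. This sketch only checks the arithmetic; verifying the depth values 21/47 and 20/47 would require the star data from [16], which are not reproduced here.

```python
# The three deepest RD lines reported in the text, in (intercept, slope) form.
deepest_rd_lines = [
    (-6.065000, 2.500000),
    (-8.586500, 3.075000),
    (-7.903043, 2.913043),
]

n_lines = len(deepest_rd_lines)
avg_intercept = sum(b0 for b0, _ in deepest_rd_lines) / n_lines
avg_slope = sum(b1 for _, b1 in deepest_rd_lines) / n_lines

# Rounded to six decimals for comparison with the values quoted in the text.
average_line = (round(avg_intercept, 6), round(avg_slope, 6))
```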

On the other hand, there exists only one maximum regression line (median), (−7.453665, 2.829416), induced from PRD, plotted in solid black in Figure 3, with PRD value

0.8585901

. Incidentally, the LS line is (6.7934673, −0.4133039), plotted in longdash purple.

The computation issues of RD

_{R H}

have been discussed in RH99, Rousseeuw and Struyf (1998) [20], and Liu and Zuo (2014) [21]. For the discussion on the computation of the PRD and induced regression medians, see Zuo (2019b) (Z19b) [22].

After obtaining

({\hat{β}}_{1}, {\hat{β}}_{2})

, one can immediately get the fitted line

\hat{y} = {\hat{β}}_{1} + {\hat{β}}_{2} x

(which has actually already been plotted in Figure 3), and the predicted values:

{\hat{y}}_{i} = {\hat{β}}_{1} + {\hat{β}}_{2} x_{i}

, and hence the residuals:

r_{i} : = y_{i} - {\hat{y}}_{i}

. All of these involve the uniqueness issue: we first need a unique fitted line for each method. Here, due to the non-uniqueness of the deepest RD lines, we select the first deepest line

(- 6.065000, 2.500000)

among the three as a representative. Then we construct a table with nine columns: 1st is the id’s of observations, 2nd is the explanatory variable

x_{i}

values, 3rd is dependent variable

y_{i}

values, 4th–6th are the predicted

\hat{y_{i}}

values for LS, RD, PRD methods, respectively, 7th–9th are the residuals

r_{i}

for LS, RD, PRD methods, respectively (Table 1).

Next, the residuals of the three methods are plotted in Figure 4.

Inspecting the residual plot immediately reveals that, in this case, the residuals of the LS method are deceptively homogeneous: the plot fails to identify any outliers, whereas the robust regression median lines all easily spot the four obvious outliers and the two groups of stars. Based on the residual plot, one can draw some conclusions. For example, the four outliers are not necessarily errors but might be exceptional observations (they come from a different group of stars), and the LS line does not provide a good fit (it explains only

4.4 %

of total variation in observations of y).

In the empirical distribution case, one can always take the average over all regression medians to take care of the non-uniqueness issue. Nevertheless, challenges arise computationally if there exist infinitely many medians in higher dimensions. Furthermore, the average sometimes will no longer be a deepest line/hyperplane (as seen in this example and more in Section 4).

The non-uniqueness issue is more critical in the population case, since without uniqueness there is no uniquely defined median and it is impossible to discuss the convergence (or consistency) and the limiting distribution of the empirical regression median.

4. Uniqueness of Regression Medians

From the empirical example in the previous section, we see that there can exist multiple empirical regression medians induced from RD

_{R H}

while in the case of PRD there exists a unique one. These results are just empirical special examples and not for general cases. In the following, we address general cases and draw general conclusions.

4.1. Non-Uniqueness of $β_{R D_{R H}}^{*}$ and $β_{D_{C}}^{*}$

Under certain symmetry assumptions (e.g., the regression symmetry of Rousseeuw and Struyf (2004) (RS04) [19]) and other conditions, the regression median induced from RD

_{R H}

can be unique (see Theorem 3 and Corollary 3 of [19]). However, generally speaking, we have

Proposition 2.

β_{R D_{R H}}^{*} (F_{(y, x)})

is not unique in general. The average of all

β_{R D_{R H}}^{*} (F_{(y, x)})

might not possess the maximum depth any more.

Proof.

A counterexample suffices.

In fact, the real data Example 2 could serve as one counterexample, where one has three maximum depth lines and the average line no longer possesses the maximum RD

_{R H}

value.

An even simpler counterexample could be constructed. Assume that there are three sample points

A = (- 1, 0)

,

B = (0, 1)

and

C = (1, 0)

. Then it is readily seen that the three lines, each determined by two sample points, are

(1, - 1); (1, 1)

and

(0, 0)

in terms of (intercept, slope) form and each line has the maximum RD

_{R H}

2 / 3

whereas the average of all maximum depth lines is

(2 / 3, 0)

which has RD

_{R H}

only

1 / 3

. □
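The depth values in this three-point counterexample can be checked with a short Python sketch. It relies on a counting characterization of regression depth for simple regression (for every split point on the x-axis, count the non-negative residuals on one side plus the non-positive residuals on the other, and take the smallest such count). That characterization is brought in here as an assumption, since the article itself only states the tilting definition, and the function name `regression_depth_2d` is hypothetical.

```python
def regression_depth_2d(intercept, slope, points, tol=1e-12):
    """Count-valued regression depth of the line y = intercept + slope * x."""
    xs = sorted({x for x, _ in points})
    # Candidate split positions: left of all x, between distinct x values, right of all x.
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    res = [(x, y - (intercept + slope * x)) for x, y in points]
    best = len(points)
    for u in cuts:
        # A zero residual counts as both non-negative and non-positive.
        l_pos = sum(1 for x, r in res if x < u and r >= -tol)
        l_neg = sum(1 for x, r in res if x < u and r <= tol)
        r_pos = sum(1 for x, r in res if x > u and r >= -tol)
        r_neg = sum(1 for x, r in res if x > u and r <= tol)
        best = min(best, l_pos + r_neg, r_pos + l_neg)
    return best


points = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]          # A, B, C
lines = [(1.0, -1.0), (1.0, 1.0), (0.0, 0.0)]           # (intercept, slope)
depths = [regression_depth_2d(b0, b1, points) for b0, b1 in lines]

# Coordinate-wise average of the three deepest lines.
avg_line = (sum(b0 for b0, _ in lines) / 3, sum(b1 for _, b1 in lines) / 3)
avg_depth = regression_depth_2d(avg_line[0], avg_line[1], points)
```

Dividing the counts by n = 3 gives the depth values 2/3 for each of the three lines and 1/3 for their average, matching the proof.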

For special distributions, the median induced from Carrizosa depth can also be unique. But generally speaking, it is not.

Proposition 3.

β_{D_{C}}^{*} (F_{(y, x)})

is not unique in general. The average of all

β_{D_{C}}^{*} (F_{(y, x)})

might not possess the maximum depth value any more.

Proof.

A counterexample suffices.

Denote by

H_{β}

the hyperplane determined by

y = x ⊤ β

for any

β \in R^{p}

and by

θ_{β}

the acute angle formed between the hyperplane

H_{β}

and the horizontal hyperplane

H_{h}

(y=0).

Assume that

β_{i} \in R^{p}

, (

i = 1, 2

),

β_{1} \neq β_{2}

, and

H_{β_{i}}

each contains

1 / 2

probability mass; any hyperline in

H_{β_{i}}

contains no probability mass (

i = 1 or 2

);

θ_{β_{1}} = θ_{β_{2}}

, and

H_{β_{1}}

intersects with

H_{β_{2}}

at a hyperline in the horizontal hyperplane

H_{h}

.

Now in light of characterization (5) of

D_{C}

, it is readily seen that at each

β_{i}

, D

_{C} (β_{i}; P)

attains the maximum depth value

1 / 2

.

Let

γ = (β_{1} + β_{2}) / 2

, then it is readily seen that the D

_{C} (γ; P) = 0

, and

H_{γ}

is no longer a hyperplane with the maximum depth value. □
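A discrete analogue of this construction can be sketched in Python for p = 2. The sketch is illustrative only: an empirical distribution places mass on individual points, so it does not literally satisfy the condition that hyperlines carry no mass, and the data and the function name `carrizosa_depth` are assumptions. Half of the points lie on y = x and half on y = −x, two lines that form equal angles with the horizontal axis and meet on it; by characterization (5), the empirical depth of a line is the fraction of points lying on it.

```python
def carrizosa_depth(beta, points, tol=1e-12):
    """Empirical version of (5): fraction of points with zero residual."""
    b0, b1 = beta
    on_line = sum(1 for x, y in points if abs(y - (b0 + b1 * x)) <= tol)
    return on_line / len(points)


# Three points on y = x and three on y = -x; none at the intersection (the origin).
points = [(x, x) for x in (1.0, 2.0, 3.0)] + [(x, -x) for x in (1.0, 2.0, 3.0)]

beta_1 = (0.0, 1.0)   # (intercept, slope) of y = x
beta_2 = (0.0, -1.0)  # (intercept, slope) of y = -x
gamma = tuple((a + b) / 2 for a, b in zip(beta_1, beta_2))  # the averaged line y = 0

depths = [carrizosa_depth(b, points) for b in (beta_1, beta_2, gamma)]
```

The averaged line y = 0 passes through none of the six points, so its depth drops from 1/2 to 0.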

4.2. Uniqueness of $β_{P R D}^{*}$

For two univariate random variables

X, Y

defined on the sample space

Ω

,

X < Y

stands for

X (ω) < Y (ω)

,

\forall ω \in Ω

. We say that

T (F_{(y - x ⊤ β) / x ⊤ v})

is strictly monotonic at point

β_{0}

iff

T (F_{(y - x ⊤ β_{0}) / x ⊤ v}) > T (F_{(y - x ⊤ β_{1}) / x ⊤ v})

whenever

- x ⊤ β_{0} > - x ⊤ β_{1}

\forall β_{1} \in R^{p}

, for any

v \in S^{p - 1}

.

Proposition 4.

If (A0) holds and

T (F_{(y - x ⊤ β) / x ⊤ v})

(i) is strictly monotonic at

0

and (ii) satisfies (A1), (A2), and (A3), then

β_{P R D}^{*} (F_{(y, x)})

exists uniquely.

Proof.

To prove the proposition, we first invoke the following result.

Lemma 1

([11]). The PRD and

β_{P R D}^{*}

satisfy the following properties.

(i): The $β_{P R D}^{*} (F_{(y, x^{⊤})})$ is regression, scale and affine equivariant in the sense that

$\begin{matrix} β^{*} (F_{(y + x ⊤ b, x^{⊤})}) = β^{*} (F_{(y, x^{⊤})}) + b, \forall b \in R^{p}; \\ β^{*} (F_{(s y, x^{⊤})}) = s β^{*} (F_{(y, x^{⊤})}), \forall \text{ scalar } s (\neq 0) \in R; \\ β^{*} (F_{(y, A ⊤ x)}) = A^{- 1} β^{*} (F_{(y, x^{⊤})}), \forall \text{ nonsingular } A \in R^{p \times p}, \end{matrix}$

respectively.
(ii): The maximum of PRD $(β; F_{(y, x^{⊤})})$ exists and is attained at a $β_{0} \in R^{p}$ with $∥ β_{0} ∥ < \infty$ .
(iii): The PRD $(β; F_{(y, x^{⊤})})$ monotonically decreases along any ray stemming from a deepest point in the sense that for any $β \in R^{p}$ and $λ \in [0, 1]$ ,

$\begin{matrix} P R D (λ β^{*} + (1 - λ) β; F_{(y, x^{⊤})}) & \geq & P R D (β; F_{(y, x^{⊤})}), \end{matrix}$

where $β^{*}$ is a maximum depth point of PRD $(β; F_{(y, x^{⊤})})$ for any $β \in R^{p}$ .

Now we are in a position to prove the proposition.

Assume, w.l.o.g., that

S (F_{y}) = 1

(since it does not involve

v

and

β

, it has nothing to do with the maximum depth point

β_{P R D}^{*}

). The existence of the maximum depth point (the regression median) is guaranteed in light of Lemma 1 above. We thus focus on the uniqueness. Assume that there are two maximum depth points

β_{1}^{*} \neq β_{2}^{*}

. We seek a contradiction.

Let

β_{0}^{*} = (β_{1}^{*} + β_{2}^{*}) / 2

. By virtue of (iii) of Lemma 1 above,

β_{0}^{*}

is also a maximum depth point. By the invariance of the projection regression depth functional (see [11]) and Lemma 1 above, assume (w.l.o.g.) that

β_{0}^{*} = 0

.

For a given

β \in R^{p}

, write

g (β, v) : = T (F_{(y - x ⊤ β) / x ⊤ v})

. In light of the continuity of T in

v

, the generalized extreme value theorem on a compact set, and (A1), there exists a

v_{β} \in S^{p - 1}

such that

g (β, v_{β}) = sup_{v \in S^{p - 1}} | T (F_{(y - x ⊤ β) / x ⊤ v}) |

(14)

For simplicity, denote by

v_{0}

for

v_{β_{0}^{*}}

. Then we have

g (β_{0}^{*}, v_{0}) = T (F_{(y - x ⊤ β_{0}^{*}) / x ⊤ v_{0}}) = T (F_{y / x ⊤ v_{0}}) = sup_{v \in S^{p - 1}} | T (F_{(y - x ⊤ β_{0}^{*}) / x ⊤ v}) | : = α^{*} .

(15)

Denote by

l (β_{1}^{*}, β_{2}^{*})

the hyperline that connects

β_{1}^{*}

and

β_{2}^{*}

in the parameter space of

β \in R^{p}

. Consider two cases.

Case I

x

does not concentrate on any single hyperplane. In light of this assumption, there exists at least one

γ \in R^{p}

on

l (β_{1}^{*}, β_{2}^{*})

in the parameter space

R^{p}

such that

- x ⊤ γ \neq 0

. Assume (w.l.o.g.) that

- x ⊤ γ < 0 = - x ⊤ β_{0}^{*}

. By (15) and the strict monotonicity of T, one has that for the

v_{γ}

defined in (14)

\begin{matrix} α^{*} = inf_{β \in R^{p}} sup_{v \in S^{p - 1}} | T (F_{(y - x ⊤ β) / x ⊤ v}) | & \leq & sup_{v \in S^{p - 1}} | T (F_{(y - x ⊤ γ) / x ⊤ v}) | = T (F_{(y - x ⊤ γ) / x ⊤ v_{γ}}) \\ & < & T (F_{(y - x ⊤ β_{0}^{*}) / x ⊤ v_{γ}}) \leq sup_{v \in S^{p - 1}} | T (F_{y / x ⊤ v}) | \\ & = & T (F_{y / x ⊤ v_{0}}) = α^{*} \end{matrix}

(16)

which is a contradiction. This completes the proof of Case I.

Case II

x

concentrates on a single hyperplane. This implies that there is a

v \in S^{p - 1}

such that

x ⊤ (ω) v = 0

for any

ω \in Ω

. This contradicts (A0), however. This completes the proof of Case II and thus the proposition. □

Remark 1.

(I) (A0) automatically holds if

x

has density or if

x

is not degenerated to concentrate on a single

(p - 1)

dimensional hyperplane. The latter means all x lie on the same point for

p = 1

, and they lie on a single line for

p = 2

, and lie on a plane for

p = 3

, and so on.

(II) (A1), (A2) and (A3) hold for a large class of T, such as the mean, weighted mean (Wu and Zuo (2009) [23]), and quantile functionals.

(III) There also exists a large class of T that is strictly monotonic. For example (i) If

T (F_{(y - x ⊤ β) / x ⊤ v})

=

E ((y - x ⊤ β) / x ⊤ v)

, then T is strictly monotonic at any β as long as the related expectations exist and

E (x ⊤ α / x ⊤ v) > 0

whenever

x ⊤ α > 0

for any

α \in R^{p}

and

v \in S^{p - 1}

. (ii) When

T (F_{(y - x ⊤ β) / x ⊤ v}) = Q_{q} ((y - x ⊤ β) / x ⊤ v)

,

q \in (0, 1)

, where

Q_{q} (Z)

is the qth quantile associated with the random variable Z (i.e.,

Q_{q} (Z) = inf {z : P (Z \leq z) \geq q}

), then T is strictly monotonic at any β as long as the CDF of

Z (β; v, y, x) : = (y - x ⊤ β) / x ⊤ v

is not flat at β for a given

v \in S^{p - 1}

.

(IV) The proposition covers the sample case. That is, when

F_{(y, x^{⊤})}

is replaced by its sample version in the proposition, we have the uniqueness of the sample regression median induced from PRD, which is very helpful in the practical computation of the median and consistent with the finding in Figure 2.

5. Concluding Remarks

5.1. Why Do We Care About the Non-Uniqueness of Regression Medians?

Uniqueness is actually implicitly assumed when we discuss the properties (such as Fisher consistency, regression, scale and affine equivariance, or asymptotic breakdown point) of regression medians. Without uniqueness, (i) the sample regression median can never converge in probability or in distribution to its population version, (ii) the deepest regression will yield more than one response and residual for a given

x

, (iii) algorithms for the approximate computation of sample medians can never converge.

Uniqueness is so essential in our discussion of medians that there is a conventional remedy for non-uniqueness: to take the average of all medians. This works in many scenarios, but not for

β_{R D_{R H}}^{*}

and

β_{D_{C}}^{*}

. This phenomenon for

β_{R D_{R H}}^{*}

was first noticed by Mizera and Volauf (2002) [24] and Van Aelst et al. (2002) [25]. Concrete examples, namely the real data Example 2 and the artificially constructed one in the proof of Proposition 2, are presented here.

5.2. Why Do We Just Treat Three Regression Medians?

D_{C} ([12]) and RD_{R H} ([10]) are two pioneering notions of regression depth. PRD was recently introduced in [11], which systematically studied the three regression depth notions w.r.t. four axiomatic properties, (P1), (P2), (P3), and (P4) (see Section 2). It was found that both regression depth RD_{R H} and projection regression depth PRD are real depth notions in regression since both satisfy (P1)–(P4). While the former needs an extra assumption (A) (see Section 2.2), the latter does not need any extra assumptions. On the other hand, Carrizosa depth D_{C} violates (P3) in general and hence is not a real regression depth notion w.r.t. the definition in [23]. That motivates us to focus just on RD_{R H} and PRD throughout.

5.3. Summary and Conclusions

In terms of robustness, both depth-induced medians are indeed robust. In fact, the median β_{R D_{R H}}^{*} can asymptotically resist up to 33% [13] contamination, whereas β_{P R D}^{*} can resist up to 50% [14] contamination without breakdown, in sharp contrast to the 0% of the classical LS estimator.
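To make the 0% figure for LS concrete, the following sketch fits a simple least squares line to clean data and then to the same data with a single response replaced by a large value. The data and the function name are hypothetical. The sketch illustrates only the LS side of the comparison; the 33% and 50% figures come from [13] and [14] and are not reproduced here.

```python
def least_squares_line(x, y):
    """Closed-form simple least squares fit; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical clean data lying exactly on y = x.
x = [float(i) for i in range(10)]
y_clean = list(x)

# Contaminate one observation out of ten with an arbitrarily large response.
y_bad = list(y_clean)
y_bad[-1] = 1e6

_, clean_slope = least_squares_line(x, y_clean)
_, bad_slope = least_squares_line(x, y_bad)
```

One corrupted response is enough to move the LS slope from 1 to a value above 50,000 here, and making the corrupted value larger moves the slope further without bound.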

In terms of efficiency, sample β_{P R D}^{*} could possess a higher relative efficiency than sample β_{R D_{R H}}^{*} (see [22]).

In terms of uniqueness, β_{P R D}^{*} again distinguishes itself from the leading depth median β_{R D_{R H}}^{*} by generally possessing the desirable uniqueness property.

From the computational point of view, RD (and β_{R D_{R H}}^{*}) has an edge over PRD (and β_{P R D}^{*}): the former is easier to compute than the latter (see [22]).

By virtue of the performance criteria above, we conclude that PRD and β_{P R D}^{*} are promising options among the leading regression depths and their induced medians.

Funding

This research received no external funding.

Acknowledgments

The author thanks Hanshi Zuo for his careful proofreading of the manuscript and two anonymous referees for their helpful and constructive comments and suggestions, all of which have led to improvements in the manuscript. Special thanks go to the Managing Editor, Yuanyuan Yang, for her enthusiastic invitation, encouragement, and full support.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Serfling, R.J. Approximation Theorems for Mathematical Statistics; John Wiley & Sons Inc.: New York, NY, USA, 1980.
2. Donoho, D.L. Breakdown Properties of Multivariate Location Estimators. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1982.
3. Huber, P.J. Robust estimation of a location parameter. Ann. Math. Stat. 1964, 35, 73–101.
4. Liu, R.Y.; Parelius, J.M.; Singh, K. Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion). Ann. Stat. 1999, 27, 783–858.
5. Zuo, Y.; Serfling, R. General notions of statistical depth function. Ann. Stat. 2000, 28, 461–482.
6. Tukey, J.W. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver; James, R.D., Ed.; Canadian Mathematical Congress: Montreal, QC, Canada, 1975; Volume 2, pp. 523–531.
7. Donoho, D.L.; Gasko, M. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann. Stat. 1992, 20, 1803–1827.
8. Liu, R.Y. Data depth and multivariate rank tests. In L1-Statistical Analysis and Related Methods; Dodge, Y., Ed.; North-Holland: Amsterdam, The Netherlands, 1992; pp. 279–294.
9. Zuo, Y. Projection-based depth functions and associated medians. Ann. Stat. 2003, 31, 1460–1490.
10. Rousseeuw, P.J.; Hubert, M. Regression depth (with discussion). J. Am. Stat. Assoc. 1999, 94, 388–433.
11. Zuo, Y. On general notions of depth in regression. arXiv 2018, arXiv:1805.02046.
12. Carrizosa, E. A characterization of halfspace depth. J. Multivar. Anal. 1996, 58, 21–26.
13. Van Aelst, S.; Rousseeuw, P.J. Robustness of Deepest Regression. J. Multivar. Anal. 2000, 73, 82–106.
14. Zuo, Y. Robustness of Deepest Projection Regression Depth Functional. Statistical Papers 2019. Available online: https://doi.org/10.1007/s00362-019-01129-4 (accessed on 13 February 2020).
15. Zuo, Y. Multidimensional medians and uniqueness. Comput. Stat. Data Anal. 2013, 66, 82–88.
16. Rousseeuw, P.J.; Leroy, A. Robust Regression and Outlier Detection; Wiley: New York, NY, USA, 1987.
17. Wu, M.; Zuo, Y. Trimmed and Winsorized Standard Deviations based on a scaled deviation. J. Nonparametr. Stat. 2008, 20, 319–335.
18. Maronna, R.A.; Yohai, V.J. Bias-Robust Estimates of Regression Based on Projections. Ann. Stat. 1993, 21, 965–990.
19. Rousseeuw, P.J.; Struyf, A. Characterizing angular symmetry and regression symmetry. J. Stat. Plan. Inference 2004, 122, 161–173.
20. Rousseeuw, P.J.; Struyf, A. Computing location depth and regression depth in higher dimensions. Stat. Comput. 1998, 8, 193–203.
21. Liu, X.; Zuo, Y. Computing halfspace depth and regression depth. Commun. Stat. Simul. Comput. 2014, 43, 969–985.
22. Zuo, Y. Computation of projection regression depth and its induced median. arXiv 2019, arXiv:1905.11846.
23. Wu, M.; Zuo, Y. Trimmed and Winsorized means based on a scaled deviation. J. Stat. Plan. Inference 2009, 139, 350–365.
24. Mizera, I.; Volauf, M. Continuity of halfspace depth contours and maximum depth estimators: Diagnostics of depth-related methods. J. Multivar. Anal. 2002, 83, 365–388.
25. Van Aelst, S.; Rousseeuw, P.J.; Hubert, M.; Struyf, A. The deepest regression method. J. Multivar. Anal. 2002, 81, 138–166.

Figure 1. Thirty bivariate standard normal points.

Figure 2. Regression depth (RD_{R H}) (left) and projection regression depth (PRD) (right) of 961 candidate parameter β^{⊤}’s w.r.t. 30 bivariate standard normal points.

Figure 3. Five regression depth median lines based on the data from the Hertzsprung–Russell diagram of the star cluster CYG OB1 (solid black for β_{P R D}^{*}; dashed red, dotted blue, and dotdash green all for β_{R D}^{*}; longdash purple for LS).

Figure 4. Residual plots for three types of regression methods for the star cluster data. (a) Residual plot for the LS method. (b) Residual plot for one of the deepest lines induced from RD. (c) Residual plot for the unique deepest line induced from PRD.

Table 1. Residual analysis for three regression methods ($\hat{y}$: fitted value; r: residual).

i	x	y	$\hat{y}$ (ls)	$\hat{y}$ (rd)	$\hat{y}$ (prd)	r (ls)	r (rd)	r (prd)
1	4.37	5.23	4.987329	4.860	4.910883	0.2426707	0.370	0.3191171
2	4.56	5.74	4.908802	5.335	5.448472	0.8311985	0.405	0.2915280
3	4.26	4.93	5.032793	4.585	4.599647	−0.1027927	0.345	0.3303528
4	4.56	5.74	4.908802	5.335	5.448472	0.8311985	0.405	0.2915280
5	4.30	5.19	5.016261	4.685	4.712824	0.1737395	0.505	0.4771762
6	4.46	5.46	4.950132	5.085	5.165530	0.5098681	0.375	0.2944696
7	3.84	4.65	5.206380	3.535	3.411292	−0.5563803	1.115	1.2387076
8	4.57	5.27	4.904668	5.360	5.476766	0.3653315	−0.090	−0.2067661
9	4.26	5.57	5.032793	4.585	4.599647	0.5372073	0.985	0.9703528
10	4.37	5.12	4.987329	4.860	4.910883	0.1326707	0.260	0.2091171
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮
45	4.55	5.54	4.912935	5.310	5.420178	0.62706545	0.230	0.1198222
46	4.45	4.98	4.954265	5.060	5.137236	0.02573506	−0.080	−0.1572362
47	4.42	4.50	4.966664	4.985	5.052354	−0.46666406	−0.485	−0.5523537
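The entries of Table 1 can be cross-checked with a few lines of arithmetic. The sketch below copies four rows from the table and verifies that each residual equals y minus the fitted value, up to the rounding of the printed digits. It also recovers a line from two entries of the rd column; that line (intercept −6.065, slope 2.5) is inferred here from the printed values and is an assumption, not a figure stated in this section.

```python
# Rows copied from Table 1:
# (i, x, y, yhat_ls, yhat_rd, yhat_prd, r_ls, r_rd, r_prd)
rows = [
    (1, 4.37, 5.23, 4.987329, 4.860, 4.910883, 0.2426707, 0.370, 0.3191171),
    (3, 4.26, 4.93, 5.032793, 4.585, 4.599647, -0.1027927, 0.345, 0.3303528),
    (7, 3.84, 4.65, 5.206380, 3.535, 3.411292, -0.5563803, 1.115, 1.2387076),
    (8, 4.57, 5.27, 4.904668, 5.360, 5.476766, 0.3653315, -0.090, -0.2067661),
]


def max_residual_mismatch(table_rows):
    """Largest |(y - yhat) - r| over all rows and all three methods."""
    worst = 0.0
    for _, _, y, *rest in table_rows:
        fitted, residuals = rest[:3], rest[3:]
        for y_hat, r in zip(fitted, residuals):
            worst = max(worst, abs((y - y_hat) - r))
    return worst


def line_through(p, q):
    """Return (intercept, slope) of the line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1, slope


mismatch = max_residual_mismatch(rows)

# Line implied by the rd fitted values of rows 7 and 8.
rd_intercept, rd_slope = line_through((3.84, 3.535), (4.57, 5.360))

# Fitted value this line gives at x = 4.37 (row 1 lists 4.860).
rd_fit_row1 = rd_intercept + rd_slope * 4.37
```

The mismatch stays below one unit in the sixth decimal place, which is what rounding of the printed fitted values would produce, and the inferred rd line reproduces the fitted value listed for row 1.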

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
