Regression medians and uniqueness

Notions of depth in regression have been introduced and studied in the literature. The regression depth (RD) of Rousseeuw and Hubert (1999), the most famous, is a direct extension of Tukey location depth (Tukey (1975)) to regression. The extension of another prevailing location depth, the projection depth (Liu (1992), and Zuo and Serfling (2000)), to regression is called the projection regression depth (PRD) (Zuo (2018a) (Z18a)). These two represent the most promising depth notions in regression (Z18a). Carrizosa depth $D_C$ (Carrizosa (1996)) is another notion of depth in regression. The most remarkable advantage of depth in regression is that it directly induces a median-type estimator, the maximum (or deepest) regression depth estimator (regression median), for regression parameters in a multi-dimensional setting. The maximum (deepest) regression depth estimators (regression medians) serve as robust alternatives to the classical least squares or least absolute deviations estimators of the unknown parameters in a general linear regression model. The uniqueness of regression medians is central to the asymptotics of sample regression medians and is a desirable property for the convergence of approximate algorithms that compute sample regression medians. Are the regression medians induced from RD, PRD, and $D_C$ unique? Answering this question is the goal of this article. It is found that the regression median induced from PRD possesses the uniqueness property, unlike its leading competitors. This and other findings on the performance of PRD and its induced median indicate that PRD and its induced median are highly favorable among their leading competitors. AMS 2000 Classification: Primary 62G08, 62G35; Secondary 62J05, 62J99.


Introduction
The regular univariate sample median, defined as the innermost (deepest) point of a data set, is unique. The population median, defined as the 1/2-th quantile of the underlying distribution, is also unique. The most outstanding feature of the univariate median is its robustness. In fact, among all translation equivariant location estimators, it has the best possible breakdown point (Donoho (1982)) and the minimax bias when the underlying distribution has a unimodal symmetric density (Huber (1964)). The univariate median also provides a basis for a center-outward ordering (in terms of deviations from the median), an alternative to the traditional left-to-right ordering.
Extending the univariate median to multidimensional settings, while retaining its outstanding robustness and its center-outward ordering, is highly desirable for multidimensional data. One approach, among others, is via notions of data depth. General notions of data depth have been increasingly pursued and studied (Zuo and Serfling (2000) (ZS00)) since the pioneering proposal of Tukey (1975) (see Donoho and Gasko (1992)). Besides Tukey depth, another prevailing depth is the projection depth (PD) (Liu (1992), ZS00, and Zuo (2013)).
Depth notions in location have also been extended to regression. The regression depth (RD) of Rousseeuw and Hubert (1999) (RH99), the most famous, is a direct extension of Tukey location depth to regression. The projection regression depth (PRD) of Zuo (2018a) (Z18a) extends the prominent PD in location to regression. RD and PRD represent the two leading notions of depth in regression (Z18a). Carrizosa depth $D_C$ (Carrizosa (1996)) is another notion of depth in regression (see Z18a).
One of the outstanding advantages of depth notions is that they can be directly employed to introduce median-type deepest estimating functionals (or estimators in the empirical case) for location or regression parameters in a multi-dimensional setting, based on a general min-max strategy. The maximum (deepest) regression depth estimator (also called the regression median) serves as a robust alternative to the classical least squares or least absolute deviations estimator for the unknown parameters in a general linear regression model:

$$y = \mathbf{x}'\beta + e, \qquad (1)$$

where $'$ denotes the transpose of a vector, the random vector $\mathbf{x} = (x_1, \cdots, x_p)'$ and the parameter vector $\beta$ are in $\mathbb{R}^p$ ($p \ge 2$), and the random variables $y$ and $e$ are in $\mathbb{R}^1$.
The robustness of the medians induced from RD and PRD has been investigated in Van Aelst and Rousseeuw (2000) (VAR00) and Zuo (2018b), respectively. These medians, just like their location or univariate counterparts, indeed possess high breakdown point robustness.
Regression medians, just like their location or univariate counterparts, are expected to be unique. Uniqueness is central when one deals with the convergence or the limiting distribution of sample regression medians (see Zuo (2019a)). It is also desirable in the computation of sample regression medians, for the convergence of approximate algorithms. The uniqueness of multidimensional location medians has been addressed in Zuo (2013). Are the medians induced from regression depth notions generally unique? Answering this question is the goal of this article.
The rest of the article is organized as follows. Section 2 introduces the leading regression depth notions and the induced regression medians. Section 3 illustrates empirical examples of regression depths and medians and their behavior. Section 4 establishes general results on the uniqueness of regression medians. Brief concluding remarks in Section 5 end the article.

Maximum depth functionals (regression medians)

Regression median induced from regression depth
Definition 2.1 For any $\beta \in \mathbb{R}^p$ and the joint distribution P of $(y, \mathbf{x})$ in (1), RH99 defined the regression depth of $\beta$, denoted here by $\mathrm{RD}_{RH}(\beta; P)$, to be the minimum probability mass that needs to be passed when tilting (the hyperplane induced from) $\beta$ in any way until it is vertical. The maximum regression depth estimating functional $\beta^*_{\mathrm{RD}_{RH}}$ is defined as

$$\beta^*_{\mathrm{RD}_{RH}}(P) = \arg\max_{\beta \in \mathbb{R}^p} \mathrm{RD}_{RH}(\beta; P).$$

Many characterizations of $\mathrm{RD}_{RH}(\beta; P)$, or equivalent definitions, have been given in the literature; see, e.g., Z18a and the references cited therein.
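For simple regression ($p = 2$, $\mathbf{x} = (1, x)'$, $\beta = (a, b)'$) the tilting definition can be evaluated by counting. A minimal sketch follows, assuming the counting form for a sample of n points with residuals $r_i = y_i - a - b x_i$: the depth is the minimum, over vertical tilting positions t, of $\min\{L^+(t) + R^-(t),\ R^+(t) + L^-(t)\}$, where $L^{+}(t)$ and $L^{-}(t)$ count points with $x_i \le t$ and $r_i \ge 0$ (resp. $r_i \le 0$), and $R^{\pm}(t)$ count the analogous points with $x_i > t$. The function name `regression_depth_simple` is hypothetical; dividing the returned count by n gives the probability-mass version for the empirical distribution.

```python
import numpy as np


def regression_depth_simple(a, b, x, y):
    """Regression depth count of the line y = a + b*x w.r.t. points (x_i, y_i).

    Counting form for simple regression: minimum over tilting positions t of
    min(L+(t) + R-(t), R+(t) + L-(t)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = y - (a + b * x)              # residuals r_i(beta)
    pos, neg = r >= 0, r <= 0        # a zero residual counts on both sides
    # t below every x_i: all points lie to the right of the tilting position
    depth = min(pos.sum(), neg.sum())
    for t in np.unique(x):           # each distinct split {x <= t} vs {x > t}
        left = x <= t
        right = ~left
        depth = min(depth,
                    np.sum(left & pos) + np.sum(right & neg),
                    np.sum(right & pos) + np.sum(left & neg))
    return int(depth)


# Three collinear points: the line through them has full depth 3.
print(regression_depth_simple(0.0, 1.0, [1, 2, 3], [1, 2, 3]))  # 3
```

Only the split below all data and the splits at the distinct $x_i$ need checking, because the counts change only when t crosses a data point. The loop is quadratic in n, which is fine for illustration but not for large samples.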

Regression median induced from Carrizosa depth
Among the regression depth notions investigated in Z18a, the Carrizosa depth $D_C(\beta; P)$, for any $\beta \in \mathbb{R}^p$ and the underlying probability measure P associated with $(y, \mathbf{x})$, was first introduced in Carrizosa (1996) and thoroughly investigated in Z18a; it is defined through the residual $r(\beta) = y - \mathbf{x}'\beta$. As characterized in Z18a (see Proposition 2.2 there), it turns out that $D_C(\beta; P) = P(r(\beta) = 0)$.
The maximum depth regression functional (or regression median) was then defined as

$$\beta^*_{D_C}(P) = \arg\max_{\beta \in \mathbb{R}^p} D_C(\beta; P).$$

As shown in Z18a, $\beta^*_{D_C}$ always exists if the assumption (A) $P(H_v) = 0$ for any vertical hyperplane $H_v$ holds. Unfortunately, as $D_C$ lacks a key desired property (see Z18a), we will not focus on it in the sequel.

Regression median induced from projection regression depth
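Hereafter, assume that T is a univariate regression estimating functional which satisfies (A1): T is regression, scale, and affine equivariant, that is, respectively,

$$T(F_{(y + bx,\, x)}) = T(F_{(y,x)}) + b, \qquad T(F_{(sy,\, x)}) = s\, T(F_{(y,x)}), \qquad T(F_{(y,\, sx)}) = s^{-1}\, T(F_{(y,x)}),$$

for any $b \in \mathbb{R}$ and $s \neq 0$, where $x, y \in \mathbb{R}$ are random variables. Throughout, the lower-case $x$ stands for a variable in $\mathbb{R}$, the bold $\mathbf{x}$ for a vector in $\mathbb{R}^p$ ($p > 1$), and $\|\cdot\|$ for the Euclidean norm of a vector.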
Let S be a positive scale estimating functional that is scale equivariant and location invariant.
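Examples of T that satisfy (A1) include, among others, the mean, weighted mean, and quantile functionals. A particular example is

$$T(F_{(y,x)}) = \mathrm{Med}(y/x).$$

Equipped with a pair of T and S, we can introduce a corresponding projection-based regression estimating functional. By modifying a functional in Maronna and Yohai (1993) to achieve scale equivariance, Z18a defined

$$\mathrm{UF}_v(\beta; F_{(y,\mathbf{x})}, T) = \frac{\big|T(F_{(y - \mathbf{x}'\beta,\; \mathbf{x}'v)})\big|}{S(F_y)}, \qquad v \in S^{p-1},$$

which represents the unfitness of $\beta$ at $F_{(y,\mathbf{x})}$ along the direction $v$. If $\beta$ fits well, the residual $y - \mathbf{x}'\beta$ should show no linear trend against any projection $\mathbf{x}'v$; thus, overall, one expects $|T|$ to be small and close to zero for a good candidate $\beta$, independent of the choice of $v$ and $\mathbf{x}'v$. The magnitude of $|T|$ measures the unfitness of $\beta$ along $v$. Taking the supremum over all $v \in S^{p-1}$ yields the unfitness of $\beta$ at $F_{(y,\mathbf{x})}$ w.r.t. T:

$$\mathrm{UF}(\beta; F_{(y,\mathbf{x})}, T) = \sup_{v \in S^{p-1}} \mathrm{UF}_v(\beta; F_{(y,\mathbf{x})}, T).$$

Now applying the min-max scheme, Z18a obtained the projection regression estimating functional (also denoted by $\beta^*_{\mathrm{PRD}}$) w.r.t. the pair (T, S):

$$\beta^*_{\mathrm{PRD}}(F_{(y,\mathbf{x})}, T) = \arg\min_{\beta \in \mathbb{R}^p} \mathrm{UF}(\beta; F_{(y,\mathbf{x})}, T) = \arg\max_{\beta \in \mathbb{R}^p} \mathrm{PRD}(\beta; F_{(y,\mathbf{x})}, T),$$

where the projection regression depth (PRD) function was defined in Z18a as

$$\mathrm{PRD}(\beta; F_{(y,\mathbf{x})}, T) = \big(1 + \mathrm{UF}(\beta; F_{(y,\mathbf{x})}, T)\big)^{-1}.$$

Just like S (which serves to achieve scale invariance and is nominal), T is sometimes also suppressed in the above functionals for simplicity.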
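As a quick numerical sanity check of (A1), the sketch below takes the example $T(F_{(y,x)}) = \mathrm{Med}(y/x)$ at an empirical distribution and verifies the three equivariance identities on simulated data. The helper name `T_med`, the simulated sample, and the rule of dropping points with $x_i = 0$ are illustrative assumptions, not part of the source.

```python
import numpy as np


def T_med(y, x):
    """Hypothetical T(F_(y,x)) = Med(y/x) at the empirical distribution.

    Points with x_i = 0 are dropped (an assumption made for this sketch).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    keep = x != 0
    return float(np.median(y[keep] / x[keep]))


rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.standard_t(df=2, size=50)   # heavy-tailed errors
b, s = 1.5, -3.0

# regression equivariance: T(F_(y+bx, x)) = T(F_(y,x)) + b
print(np.isclose(T_med(y + b * x, x), T_med(y, x) + b))
# scale equivariance: T(F_(sy, x)) = s T(F_(y,x))
print(np.isclose(T_med(s * y, x), s * T_med(y, x)))
# affine equivariance: T(F_(y, sx)) = T(F_(y,x)) / s
print(np.isclose(T_med(y, s * x), T_med(y, x) / s))
```

A negative s is used on purpose: the median of negated values is the negated median, so the identities hold for either sign.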
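For robustness considerations, in the sequel, (T, S) is the fixed pair (Med, MAD) unless otherwise stated. Hereafter, we write Med(Z) rather than Med($F_Z$). For this special choice of T and S, we have that

$$\mathrm{UF}(\beta; F_{(y,\mathbf{x})}) = \sup_{v \in S^{p-1}} \frac{\left|\mathrm{Med}\!\left(\dfrac{y - \mathbf{x}'\beta}{\mathbf{x}'v}\right)\right|}{\mathrm{MAD}(y)}.$$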
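The unfitness above can be approximated directly for simple regression. A minimal sketch, assuming design rows $\mathbf{x}_i = (1, x_i)'$ and replacing the supremum over $v \in S^1$ by a maximum over `n_dir` equally spaced directions on a half circle (by (A1), $v$ and $-v$ give the same $|T|$). The function names, the unscaled MAD (no consistency factor), and the direction grid are assumptions for illustration; the MAD scale constant changes PRD values but not the maximizer. Points with $\mathbf{x}_i'v = 0$ are dropped from the median.

```python
import numpy as np


def mad(z):
    """Unscaled median absolute deviation Med|z - Med(z)|."""
    z = np.asarray(z, dtype=float)
    return float(np.median(np.abs(z - np.median(z))))


def prd_simple(beta, x, y, n_dir=360):
    """Approximate PRD(beta) with (T, S) = (Med, MAD) for y = beta0 + beta1 * x.

    Design rows are (1, x_i). The sup over v in S^1 is replaced by a max over
    n_dir equally spaced directions on a half circle; assumes MAD(y) > 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    r = y - X @ np.asarray(beta, dtype=float)   # residuals y - x'beta
    uf = 0.0
    # v and -v give the same |Med(r / x'v)|, so half the circle suffices
    for angle in np.linspace(0.0, np.pi, n_dir, endpoint=False):
        v = np.array([np.cos(angle), np.sin(angle)])
        xv = X @ v
        keep = np.abs(xv) > 1e-12               # drop points with x'v = 0
        if keep.any():
            uf = max(uf, abs(float(np.median(r[keep] / xv[keep]))))
    uf /= mad(y)                                # divide by S(F_y) = MAD(y)
    return 1.0 / (1.0 + uf)


x = np.arange(1, 11, dtype=float)
y = 2.0 + 3.0 * x
print(prd_simple([2.0, 3.0], x, y))   # exact fit: depth 1.0
print(prd_simple([0.0, 0.0], x, y))   # a poor fit has smaller depth
```

A perfectly fitting line has zero unfitness and hence depth 1. Because the maximum is taken over a finite subset of directions, the sketch can only underestimate UF, and therefore can only overestimate PRD.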

Examples of regression depths and regression medians
Example 3.1 This example illustrates the behavior of the empirical regression depth $\mathrm{RD}_{RH}$ and projection regression depth (PRD) based on 30 randomly generated bivariate standard normal points (plotted in Figure 1).
We select 961 (= 31 × 31) equally spaced grid points from the square $|x| \le 3$, $|y| \le 3$, treat each point as a $\beta'$, and compute its regression depth ($\mathrm{RD}_{RH}$ and PRD) w.r.t. the 30 bivariate normal points. The depths of these 961 points are plotted in Figure 2.

Performance of regression depth medians on uniqueness
Example 3.2 This example illustrates the uniqueness behavior of the $\mathrm{RD}_{RH}$- and PRD-induced medians in the empirical distribution case, using the real data from the Hertzsprung-Russell diagram of the star cluster CYG OB1 (see Table 3 of Rousseeuw and Leroy (1987)), which contains 47 stars in the direction of Cygnus. Here x is the logarithm of the effective temperature at the surface of the star ($T_e$), and y is the logarithm of its light intensity ($L/L_0$); see Figure 3 for a plot of the data set.
Five regression lines are plotted in Figure 3. Among them, three (red, blue, and green) are regression medians from $\mathrm{RD}_{RH}$, one (black) is from PRD, and the remaining one (purple) is the least squares (LS) line. Note that the classical least squares regression estimator (as well as many traditional regression estimators) can be regarded as a depth-induced median under the general "objective depth" $D_{Obj}$ framework (see Z18a). Thus, for benchmark purposes, the LS line is plotted in Figure 3 alongside the four median lines.
The LS line also shows why $\mathrm{RD}_{RH}$- and PRD-induced medians are needed as robust alternatives: it fails to capture the main sequence (pattern) of the data cloud (stars) and is heavily affected by four giant stars, whereas the four depth medians resist these four leverage points (outliers) and capture the main trend (cluster). It turns out that there exist infinitely many maximum depth lines (medians) induced from $\mathrm{RD}_{RH}$. Among them, three lines each go through exactly two data points. In (intercept, slope) form, they are (-6.065000, 2.500000), (-8.586500, 3.075000), and (-7.903043, 2.913043), colored red, blue, and green in Figure 3. All three possess regression depth 21/47. On the other hand, there exists only one maximum depth line (median) induced from PRD, (-7.453665, 2.829416), colored black in Figure 3, with PRD 0.8585901. Incidentally, the LS line is (6.7934673, -0.4133039).
Computational issues of $\mathrm{RD}_{RH}$ have been discussed in RH99, Rousseeuw and Struyf (1998), and Liu and Zuo (2014). For the computation of PRD and the induced regression medians, see Zuo (2019b) (Z19b).
In the empirical distribution case, one can always average over all regression medians to resolve the non-uniqueness issue. The uniqueness issue is more central in the population case since, without uniqueness of the population median, it is impossible to discuss the convergence and the limiting distribution of a uniquely defined empirical regression median.
From the empirical example above, we see that the empirical regression medians induced from $\mathrm{RD}_{RH}$ can be infinitely many, while in the case of PRD there exists a unique regression median. Certainly, a single empirical example does not represent the general behavior of $\mathrm{RD}_{RH}$ or PRD. To draw general conclusions, we must establish general results for arbitrary distributions.

Uniqueness of regression medians
The results in the last section are only special empirical examples, not general cases. In the following, we address general cases and draw general conclusions.
Under certain symmetry assumptions (e.g., the regression symmetry of Rousseeuw and Struyf (2004) (RS04)) and other conditions, the regression median induced from $\mathrm{RD}_{RH}$ can be unique (see Theorem 3 and Corollary 3 of RS04). In general, however, we have the following.
Proposition 4.1 $\beta^*_{\mathrm{RD}_{RH}}(F_{(y,\mathbf{x})})$ is not unique in general.
Proof: Denote by $H_\beta$ the hyperplane determined by $y = \mathbf{x}'\beta$ for any $\beta \in \mathbb{R}^p$, and by $\theta_\beta$ the acute angle formed between the hyperplane $H_\beta$ and the horizontal hyperplane $H_h$ ($y = 0$).
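Assume that $\beta_i \in \mathbb{R}^p$ (i = 1, 2), $\beta_1 \neq \beta_2$, and each $H_{\beta_i}$ contains 1/2 probability mass; any hyperline in $H_{\beta_i}$ contains no probability mass (i = 1, 2); $\theta_{\beta_1} = \theta_{\beta_2}$; and $H_{\beta_1}$ intersects $H_{\beta_2}$ at a hyperline in the horizontal hyperplane $H_h$. In light of Definition 2.1 of $\mathrm{RD}_{RH}$, it is readily seen that at each $\beta_i$, $\mathrm{RD}_{RH}(\beta_i; P)$ attains the maximum depth value 1/2. Hence the maximum regression depth is attained at the two distinct points $\beta_1$ and $\beta_2$, and the regression median is not unique.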
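The construction can be mimicked in a finite sample. In the sketch below (toy data chosen for illustration, not from the source), half the points lie on $y = x$ and half on $y = -x$; both lines make a 45° acute angle with the horizontal axis and meet at the origin, which lies on $y = 0$. The compact `rdepth` helper is a hypothetical name implementing a simple-regression counting form of $\mathrm{RD}_{RH}$ (tilting about vertical positions t). Both $\beta_1 = (0, 1)'$ and $\beta_2 = (0, -1)'$ receive depth n/2, the sample counterpart of the common value 1/2.

```python
import numpy as np


def rdepth(a, b, x, y):
    """Hypothetical helper: regression depth count of y = a + b*x (simple regression)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = y - (a + b * x)
    pos, neg = r >= 0, r <= 0
    d = min(pos.sum(), neg.sum())          # tilting position left of all x_i
    for t in np.unique(x):
        left = x <= t
        d = min(d,
                np.sum(left & pos) + np.sum(~left & neg),
                np.sum(~left & pos) + np.sum(left & neg))
    return int(d)


m = 5
xs = np.concatenate([np.arange(-m, 0), np.arange(1, m + 1)]).astype(float)
x = np.concatenate([xs, xs])               # same symmetric x-values on both lines
y = np.concatenate([xs, -xs])              # first half on y = x, second on y = -x
n = x.size

d1 = rdepth(0.0, 1.0, x, y)                # beta_1 = (0, 1)': line y = x
d2 = rdepth(0.0, -1.0, x, y)               # beta_2 = (0, -1)': line y = -x
print(n, d1, d2)                           # 20 10 10
```

The reflection $y \mapsto -y$ maps this data set onto itself and $\beta_1$ onto $\beta_2$, which is why the two depths coincide.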
For special distributions, the median induced from Carrizosa depth can also be unique, but generally speaking it is not.
Proposition 4.2 $\beta^*_{D_C}(F_{(y,\mathbf{x})})$ is not unique in general.
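Proof: In light of the characterization $D_C(\beta; P) = P(r(\beta) = 0)$ and the construction in the proof of Proposition 4.1 above, $D_C(\beta_i; P) = 1/2$ for i = 1, 2, while any other $\beta$ has $D_C(\beta; P) = 0$ because $H_\beta$ meets each $H_{\beta_i}$ in at most a hyperline. The rest is straightforward and thus omitted.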
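For an empirical distribution, $D_C(\beta; P_n)$ is simply the fraction of points lying exactly on the hyperplane $H_\beta$. The sketch below (hypothetical helper `carrizosa_depth`, toy data chosen for illustration) places four points on each of $y = x$ and $y = -x$. Both lines get depth 1/2, while any other line meets each group in at most one point and so gets depth at most 2/8; hence the empirical $D_C$ median is not unique either.

```python
import numpy as np


def carrizosa_depth(beta, X, y, tol=1e-12):
    """Hypothetical helper: empirical D_C(beta) = fraction of points with r(beta) = 0.

    X has rows x_i' (first column 1 for the intercept); tol absorbs rounding.
    """
    r = y - X @ np.asarray(beta, dtype=float)
    return float(np.mean(np.abs(r) <= tol))


xs = np.array([-2.0, -1.0, 1.0, 2.0])
X = np.column_stack([np.ones(8), np.concatenate([xs, xs])])   # rows (1, x_i)
y = np.concatenate([xs, -xs])        # four points on y = x, four on y = -x

print(carrizosa_depth([0.0, 1.0], X, y))    # line y = x: 0.5
print(carrizosa_depth([0.0, -1.0], X, y))   # line y = -x: 0.5
```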
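To prove Proposition 4.3, we first invoke the following result.
Lemma 3.1 (i) $\beta^*_{\mathrm{PRD}}(F_{(y,\mathbf{x})})$ is regression, scale, and affine equivariant in the sense that

$$\beta^*_{\mathrm{PRD}}(F_{(y + \mathbf{x}'b,\, \mathbf{x})}) = \beta^*_{\mathrm{PRD}}(F_{(y,\mathbf{x})}) + b, \quad \beta^*_{\mathrm{PRD}}(F_{(sy,\, \mathbf{x})}) = s\,\beta^*_{\mathrm{PRD}}(F_{(y,\mathbf{x})}), \quad \beta^*_{\mathrm{PRD}}(F_{(y,\, A'\mathbf{x})}) = A^{-1}\beta^*_{\mathrm{PRD}}(F_{(y,\mathbf{x})}),$$

respectively, for any $b \in \mathbb{R}^p$, nonzero scalar s, and nonsingular $p \times p$ matrix A.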
(ii) The maximum of $\mathrm{PRD}(\beta; F_{(y,\mathbf{x})})$ exists and is attained at a bounded $\beta_0 \in \mathbb{R}^p$.
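(iii) $\mathrm{PRD}(\beta; F_{(y,\mathbf{x})})$ monotonically decreases along any ray stemming from a deepest point, in the sense that for any $\beta \in \mathbb{R}^p$ and $\lambda \in [0, 1]$,

$$\mathrm{PRD}(\beta; F_{(y,\mathbf{x})}) \le \mathrm{PRD}(\beta^* + \lambda(\beta - \beta^*); F_{(y,\mathbf{x})}),$$

where $\beta^*$ is a maximum depth point of $\mathrm{PRD}(\beta; F_{(y,\mathbf{x})})$.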
Proof: See Zuo (2018a). We are now in a position to prove the Proposition.

Proof of Proposition 4.3:
Assume, w.l.o.g., that $S(F_y) = 1$ (since it involves neither v nor $\beta$, it has nothing to do with the maximum depth point $\beta^*_{\mathrm{PRD}}$). The existence of the maximum depth point (the regression median) is guaranteed by Lemma 3.1 above. We thus focus on uniqueness. Assume that there are two maximum depth points $\beta^*_1 \neq \beta^*_2$. We seek a contradiction.
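For a given $\beta \in \mathbb{R}^p$, write $g(\beta, v) := T(F_{(y - \mathbf{x}'\beta,\, \mathbf{x}'v)})$. In light of the continuity of T in v, the generalized extreme value theorem on a compact set, and (A1), there exists a $v_\beta \in S^{p-1}$ such that

$$\mathrm{UF}(\beta; F_{(y,\mathbf{x})}) = \sup_{v \in S^{p-1}} |g(\beta, v)| = |g(\beta, v_\beta)|. \qquad (10)$$

For simplicity, write $v_0$ for $v_{\beta^*_0}$. Then we have … Denote by $l(\beta^*_1, \beta^*_2)$ the line that connects $\beta^*_1$ and $\beta^*_2$ in the parameter space $\mathbb{R}^p$. Consider two cases.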
Case I: $\mathbf{x}$ does not concentrate on any single hyperline. In light of this assumption, there exists at least one $\gamma \in \mathbb{R}^p$ on $l(\beta^*_1, \beta^*_2)$ … (11) and the strict monotonicity of T, one has, for the $v_\gamma$ defined in (10), … which is a contradiction. This completes the proof of Case I.
Case II: $\mathbf{x}$ concentrates on a single hyperline. Then the distribution of $(y, \mathbf{x})$ degenerates to a vertical two-dimensional hyperplane. By the affine equivariance of the regression median (see Lemma 3.1), assume (w.l.o.g.) that we have a simple regression problem in the two-dimensional (x, y) plane.
Following the proof of Case I: (i) if there is a $\gamma \in \mathbb{R}^2$ on $l(\beta^*_1, \beta^*_2)$ in the parameter space, we reach a contradiction as in (12); (ii) if there is no such $\gamma \in \mathbb{R}^2$ on $l(\beta^*_1, \beta^*_2)$ in the parameter space, then for any $\gamma \in \mathbb{R}^2$ on the line $l(\beta^*_1, \beta^*_2)$ in the parameter space, …, which implies that either (a) l degenerates to a single point $(0, 0)'$ (the origin), or (b) x degenerates to 0, since $x(\omega) = 0$ for all $\omega \in \Omega$. The latter case is excluded by the given condition, and the former case means exactly the uniqueness of $\beta^*$. This completes the proof of Case II and thus of the Proposition.
(III) There also exists a large class of T that is strictly monotonic. For example, (i) if $T(F_{(y - \mathbf{x}'\beta,\, \mathbf{x}'v)}) = E\big((y - \mathbf{x}'\beta)/(\mathbf{x}'v)\big)$, then T is strictly monotonic at any $\beta$ as long as the related expectations exist; (ii) if $T(F_{(y - \mathbf{x}'\beta,\, \mathbf{x}'v)}) = Q_q(Z(\beta; v, y, \mathbf{x}))$, where $Z(\beta; v, y, \mathbf{x}) = (y - \mathbf{x}'\beta)/(\mathbf{x}'v)$ and $Q_q(Z)$ is the qth quantile of the random variable Z (i.e., $Q_q(Z) = \inf\{z : P(Z \le z) \ge q\}$), then T is strictly monotonic at any $\beta$ as long as the CDF of $Z(\beta; v, y, \mathbf{x})$ is not flat at $\beta$ for a given $v \in S^{p-1}$.
(IV) The Proposition covers the sample case. That is, when $F_{(y,\mathbf{x})}$ is replaced by its sample version in the Proposition, we obtain the uniqueness of the sample regression median induced from PRD, which is very helpful in the practical computation of the median and is consistent with the finding in Figure 2.

Concluding remarks
(A) In terms of the four axiomatic properties (see Z18a), namely (P1) invariance, (P2) maximality at center, (P3) monotonicity relative to the deepest point, and (P4) vanishing at infinity, both the regression depth RD and the projection regression depth PRD are real depth notions in regression, since both satisfy (P1) to (P4); the former needs the extra assumption (A), whereas the latter needs no extra assumption (see Z18a). On the other hand, Carrizosa depth $D_C$ violates (P3) in general and hence is not a real regression depth notion w.r.t. the definition in Z18a. This motivates our focus on $\mathrm{RD}_{RH}$ and PRD here.
(B) In terms of robustness, both depth-induced medians are indeed robust. In fact, the median induced from RD, $\beta^*_{\mathrm{RD}_{RH}}$, can asymptotically resist up to 33% contamination (VAR00), whereas the one from PRD, $\beta^*_{\mathrm{PRD}}$, can resist up to 50% contamination (Zuo (2018b)) without breakdown, in sharp contrast to the 0% of the classical LS estimator.
(C) In terms of efficiency, the sample $\beta^*_{\mathrm{PRD}}$ possesses a higher relative efficiency than the sample $\beta^*_{\mathrm{RD}_{RH}}$ (see Z19b).
(D) In terms of uniqueness, $\beta^*_{\mathrm{PRD}}$ again distinguishes itself from its competitors by generally possessing the desirable uniqueness property.
By virtue of performance criteria (A)-(D) above, we conclude that PRD and $\beta^*_{\mathrm{PRD}}$ are highly competitive among their leading competitors.