On the Convergence and Law of Large Numbers for the Non-Euclidean Lp-Means

This paper states and proves two theorems that compose the Law of Large Numbers for the non-Euclidean Lp-means, known to be true for the Euclidean L2-means. Let the Lp-mean estimator be the functional that estimates the Lp-mean of N independent and identically distributed random variables. Then, (i) the expectation value of the Lp-mean estimator equals the mean of the distributions of the random variables; and (ii) the limit N → ∞ of the Lp-mean estimator also equals the mean of the distributions.

In [1][2][3], a generalized characterization of means was introduced, namely, the non-Euclidean means, based on metrics induced by Lp-norms, wherein the median is included as a special case for p = 1 (L1) and the ordinary Euclidean mean for p = 2 (L2) (see also [4,5]). Let the set of y-values {y_k}_{k=1}^W (y_k ∈ D_y ⊆ ℝ) be associated with the probabilities {p_k}_{k=1}^W; then, the non-Euclidean means μ_p, based on Lp-norms, are defined implicitly by

∑_{k=1}^W p_k |y_k − μ_p|^{p−1} sign(y_k − μ_p) = 0, (1)

where the median μ_1 and the arithmetic mean μ_2 follow as special cases when the Taxicab L1- and Euclidean L2-norms are respectively considered. Both the median μ_1 and the arithmetic mean μ_2 can be implicitly written in the form of Equation (1) as ∑_{k=1}^W p_k sign(y_k − μ_1) = 0 and ∑_{k=1}^W p_k (y_k − μ_2) = 0, respectively. Note that the solution of Equation (1) is a specific case of the so-called M-estimators [6], while it is also related to the Fréchet means [7]. The Euclidean norm L2 is also known as the "Pythagorean" norm. In [3], we preferred referring to the non-Pythagorean norms as non-Euclidean, inheriting the same characterization to Statistics. One may adopt the more explicit characterization of "non-Euclidean-normed" Statistics, to avoid any confusion with the non-Euclidean metrics of (Euclidean-normed) Riemannian geometry. As an example of an application in physics, the Lp expectation value of an energy spectrum {ε_k}_{k=1}^W is defined by

∑_{k=1}^W p_k |ε_k − U_p|^{p−1} sign(ε_k − U_p) = 0,

representing the non-Euclidean adaptation of the internal energy U_p [8].

Figure 1 illustrates an example of Lp-means. We use the Poisson distribution p_k = e^{−λ} λ^k / k! and the dataset y_k = k, for k = 1, ..., W; hence, the Lp-means are implicitly given by

∑_{k=1}^W (λ^k / k!) |k − μ_p|^{p−1} sign(k − μ_p) = 0

(note that the constant term e^{−λ} can be ignored). The function μ_p = μ_p(λ) is examined for various values of the p-norm, either (a) super-Euclidean, p > 2, or (b) sub-Euclidean, p < 2.
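As a concrete numerical illustration of Equation (1), the sketch below solves the defining condition for the Poisson example above. Since the left-hand side of Equation (1) is strictly decreasing in μ_p for p > 1, a simple bisection suffices. The truncation W, the support starting at k = 0 (so that μ_2 recovers λ exactly), and the helper name `lp_mean` are choices made here for illustration, not part of the original text.

```python
import math

def lp_mean(values, weights, p, iters=200):
    """Solve sum_k w_k |y_k - mu|^(p-1) sign(y_k - mu) = 0 (Equation (1)).
    The left-hand side is strictly decreasing in mu for p > 1, so bisect."""
    lo, hi = min(values), max(values)
    def F(mu):
        return sum(w * abs(y - mu) ** (p - 1) * ((y > mu) - (y < mu))
                   for y, w in zip(values, weights))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Poisson weights p_k = e^(-lam) lam^k / k!; the constant e^(-lam) does not
# affect the root, so it is dropped.  The support is truncated at W and starts
# at k = 0, so that the Euclidean mean mu_2 equals lam exactly.
lam, W = 0.5, 60
ys = list(range(W + 1))
ws = [lam ** k / math.factorial(k) for k in ys]

mu2 = lp_mean(ys, ws, p=2.0)  # Euclidean case: the arithmetic mean, lam
mu3 = lp_mean(ys, ws, p=3.0)  # super-Euclidean case: exceeds lam
print(mu2, mu3)
```

The same routine handles any discrete distribution; only the weights change.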
The mean value for the Euclidean case, p = 2, is μ_2 = λ, which is represented by the diagonal line in both panels. We observe that for p > 2 we always have μ_p > λ, while for p < 2 there is a critical value λ*(p), for which μ_p > λ for λ > λ* and μ_p < λ for λ < λ*. The critical value λ*(p) increases with p, and as p → 2, μ_p → λ. For λ = 1, μ_p = 1 for any p ≤ 2, while for λ = 0, μ_p = 0 for any value of p.

The Law of Large Numbers is a theorem that guarantees the stability of long-term averages of random events, but it is valid only for Euclidean metrics based on L2-norms. The purpose of this paper is to extend the Law of Large Numbers to the non-Euclidean Lp-means. Namely, (i) the expectation value of the Lp-mean estimator (corresponding to Equation (1)) equals the mean of the distribution of each of the random variables; and (ii) the limit N → ∞ of the Lp-mean estimator also equals the mean of the distributions. These are numbered as Theorems 2 and 3, respectively.

The paper is organized as follows: In Section 2, we prove the theorem of uniqueness of the Lp-means (Theorem 1). This will be used in the proofs of Theorems 2 and 3, shown in Sections 3 and 4, respectively. Finally, Section 5 briefly summarizes the conclusions. Several examples are used to illustrate the validity of Theorems 1–3, using the Poisson distribution (discrete description) and a superposition of normal distributions (continuous description).

Uniqueness of Lp-Means
Here, we show the theorem of uniqueness of the Lp-means for any p > 1. The theorem will be used in Theorems 2 and 3 of the next sections.
Theorem 1. The curve μ_p(p) is single-valued, namely, for each p > 1, there is a unique value of the Lp-mean μ_p(p).
Proof of Theorem 1. Using the implicit function theorem [9], we can easily show the uniqueness in a sufficiently small neighbourhood of p = 2. Indeed, there is at least one point, namely the Euclidean point (p = 2, μ_p = μ_2), for which the function μ_p(p) exists and is single-valued. Then, the values of μ_p(p), ∀ p > 1, can be approximated to any accuracy, starting from the Euclidean point. The implicit function F(p, μ_p) = 0, defined by Equation (1), is continuous, and

∂F(p, μ_p)/∂μ_p = −(p − 1) ∑_{k=1}^W p_k |y_k − μ_p|^{p−2} ≠ 0 (2)

for any p > 1. Hence, the derivative dp/dμ_p of the inverse function p(μ_p) is non-zero for any p, and p(μ_p) is continuous and differentiable according to Equation (2). If μ_p(p) were multi-valued, then the inverse function p(μ_p) would have local minima or maxima, i.e., points where dp/dμ_p = 0. However, the derivative dp/dμ_p is non-zero. Therefore, we conclude that p(μ_p) cannot be multi-valued, and there is a unique curve μ_p(p) that passes through (p = 2, μ_p = μ_2).
As an example, Figure 2 plots the Lp-means of the Poisson distribution shown in Figure 1, but now as a function of the p-norm, and for various values of 0 < λ < 1. For λ < ln 2, the function μ_p(p) is monotonically increasing with p. On the contrary, for λ > ln 2, the function μ_p(p) is not monotonic, having a minimum in the region of sub-Euclidean norms, 1 < p < 2. The separatrix between these two behaviors of μ_p(p) is given for λ = ln 2. We observe that the function μ_p(p) is differentiable, ∂μ_p/∂p is always finite, or ∂p/∂μ_p is always non-zero; thus, μ_p(p) is unique for any value of p.
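The two regimes described above can be checked numerically. The sketch below traces μ_p(p) on a small p-grid for one λ below ln 2 and one above; the grid, the truncation W, and the bisection helper are choices made here for illustration (the paper's figure presumably uses a finer sweep).

```python
import math

def lp_mean(values, weights, p, iters=200):
    """Bisection on F(mu) = sum_k w_k |y_k - mu|^(p-1) sign(y_k - mu),
    which is strictly decreasing in mu for p > 1."""
    lo, hi = min(values), max(values)
    def F(mu):
        return sum(w * abs(y - mu) ** (p - 1) * ((y > mu) - (y < mu))
                   for y, w in zip(values, weights))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

W = 60
ys = list(range(W + 1))
grid = [1.01, 1.2, 1.4, 1.6, 1.8, 2.0]

curves = {}
for lam in (0.5, 0.9):  # one value below and one above the separatrix ln 2
    ws = [lam ** k / math.factorial(k) for k in ys]
    curves[lam] = [lp_mean(ys, ws, p) for p in grid]

print(curves[0.5])  # expected: monotonically increasing in p
print(curves[0.9])  # expected: dips below the value near p = 1
```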
Finally, we note that the uniqueness of μ_p for a given p does not ensure monotonicity, as different values of p may lead to the same Lp-mean. Such an example is the Lp-means of the Poisson distribution for λ > ln 2, shown in Figure 2. As stated and illustrated in [3], when the examined probability distribution is symmetric, the whole set of Lp-means degenerates to one single value, while when it is asymmetric, a spectrum-like range of Lp-means is generated.
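The degeneracy for symmetric distributions is easy to verify numerically. In this sketch, the two example distributions are choices made here, not taken from the paper: one symmetric about zero, for which every Lp-mean coincides with the centre of symmetry, and one skewed, for which the Lp-means spread out.

```python
def lp_mean(values, weights, p, iters=200):
    # Bisection on the decreasing function F(mu) = sum w|y-mu|^(p-1) sign(y-mu).
    lo, hi = min(values), max(values)
    F = lambda mu: sum(w * abs(y - mu) ** (p - 1) * ((y > mu) - (y < mu))
                       for y, w in zip(values, weights))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

ps = [1.5, 2.0, 3.0]

# Symmetric about 0: every Lp-mean equals the centre of symmetry.
sym = [lp_mean([-2, -1, 0, 1, 2], [1, 2, 3, 2, 1], p) for p in ps]

# Skewed (geometric-like weights): the Lp-means form a spectrum of values.
skw = [lp_mean(list(range(6)), [2.0 ** -k for k in range(6)], p) for p in ps]

print(sym)  # all ~0
print(skw)  # distinct values
```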

The Concept of Lp-Expectation Values
Given the sampling {y_j}_{j=1}^N, the Lp-mean estimator μ̂_{p,N} = μ̂_{p,N}({y_j}_{j=1}^N; p) is implicitly expressed by

∑_{j=1}^N |y_j − μ̂_{p,N}|^{p−1} sign(y_j − μ̂_{p,N}) = 0.

Then, the Lp expectation value of μ̂_{p,N}({y_j}_{j=1}^N; p), namely ⟨μ̂_{p,N}⟩_p ≡ Ê_p[μ̂_{p,N}({y_j}_{j=1}^N; p)], is implicitly given by

∫_{D_y} ··· ∫_{D_y} |μ̂_{p,N} − ⟨μ̂_{p,N}⟩_p|^{p−1} sign(μ̂_{p,N} − ⟨μ̂_{p,N}⟩_p) P({y_j}_{j=1}^N) dy_1 ··· dy_N = 0,

where P({y_j}_{j=1}^N) is the normalized joint probability density, so that

∫_{D_y} ··· ∫_{D_y} P({y_j}_{j=1}^N) dy_1 ··· dy_N = 1.

Throughout, the random variables are symmetrically distributed, i.e., the joint density is invariant under any permutation of its arguments. This property is formally called exchangeability [10] and will be used in Lemmas 1 and 2.
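For a finite sample, the estimator above is the root of a one-dimensional strictly decreasing function (equivalently, the minimizer of ∑_j |y_j − μ|^p), so it can again be computed by bisection. A minimal sketch, where the sample values and the helper name are illustrative choices made here:

```python
def lp_mean_estimator(sample, p, iters=200):
    """Solve sum_j |y_j - mu|^(p-1) sign(y_j - mu) = 0 for mu by bisection."""
    lo, hi = min(sample), max(sample)
    F = lambda mu: sum(abs(y - mu) ** (p - 1) * ((y > mu) - (y < mu))
                       for y in sample)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

sample = [1.0, 2.0, 3.0, 10.0]
est2 = lp_mean_estimator(sample, p=2.0)   # arithmetic mean: 4.0
est15 = lp_mean_estimator(sample, p=1.5)  # pulled toward the median of this skewed sample
print(est2, est15)
```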
Next, we state and prove Lemmas 1 and 2, which are necessary for the following Theorem 2 about the expectation value of the Lp-mean estimator.
Lemma 1. The symmetrically distributed random variables {Y_i}_{i=1}^N are characterized by the same Lp expectation value, namely, ⟨Y_i⟩_p = Ê_p(Y_i) = μ_p ∈ ℝ, ∀ i = 1, ..., N, which is implicitly given by

∫_{D_y} |u − μ_p|^{p−1} sign(u − μ_p) P_y(u) du = 0, (6)

where P_{y_i}(u) ≡ P_y(u), ∀ i = 1, ..., N, is the marginal distribution density, which is identical for all the random variables {Y_i}_{i=1}^N.
Proof of Lemma 1. The y_i-marginal probability density, P_{y_i}(y_i), is

P_{y_i}(y_i) = ∫_{D_y} ··· ∫_{D_y} P({y_j}_{j=1}^N) ∏_{j≠i} dy_j,

so that

∫_{D_y} P_{y_i}(u) du = 1.

Given the symmetrical joint distribution, we have P_{y_i}(u) = P_{y_k}(u), ∀ i, k = 1, ..., N. Hence, the expression of the marginal distribution density P_{y_i}(u) is identical ∀ i = 1, ..., N, i.e., for all the random variables, P_{y_i}(u) ≡ P_y(u).
Then, we readily derive that the random variables {Y_i}_{i=1}^N are characterized by the same Lp expectation value, namely, ⟨Y_i⟩_p = Ê_p(Y_i) = μ_p ∈ ℝ, ∀ i = 1, ..., N, which is implicitly expressed by Equation (6). Indeed, if we had ⟨Y_i⟩_p = μ_{p,i} ∈ ℝ, ∀ i = 1, ..., N, then

∫_{D_y} |u − μ_{p,i}|^{p−1} sign(u − μ_{p,i}) P_y(u) du = 0, (11)

and for k ≠ i,

∫_{D_y} |u − μ_{p,k}|^{p−1} sign(u − μ_{p,k}) P_y(u) du = 0. (12)

However, given the uniqueness of the Lp-means, Equations (11) and (12) lead to μ_{p,i} = μ_{p,k} ≡ μ_p, ∀ i, k = 1, ..., N.

Lemma 2. Let the auxiliary functionals {G_i}_{i=1}^N be defined by G_i({y_j}_{j=1}^N; p) ≡ y_i − μ̂_{p,N}({y_j}_{j=1}^N; p). Then, their Lp expectation values vanish, ⟨G_i⟩_p = Ê_p(G_i) = 0, ∀ i = 1, ..., N.
Proof of Lemma 2. If ⟨G_i⟩_p = 0, then its implicit definition reads

∫_{D_y} ··· ∫_{D_y} |y_i − μ̂_{p,N}({y_j}_{j=1}^N; p)|^{p−1} sign(y_i − μ̂_{p,N}({y_j}_{j=1}^N; p)) P({y_j}_{j=1}^N) dy_1 ··· dy_N = 0, (14)

while if ⟨G_i⟩_p ≠ 0, then the above integral has to be non-zero, because of the uniqueness of Lp expectation values, namely,

∫_{D_y} ··· ∫_{D_y} |y_i − μ̂_{p,N}|^{p−1} sign(y_i − μ̂_{p,N}) P({y_j}_{j=1}^N) dy_1 ··· dy_N = C_i(p, N) ≠ 0. (15)

Now, rewriting Equation (15) for an index k (≠ i), we obtain the same value, because of the symmetrical distribution of the random variables {y_j}_{j=1}^N, i.e., P(y_1, ..., y_k, ..., y_i, ..., y_N) = P(y_1, ..., y_i, ..., y_k, ..., y_N), ∀ i, k (≠ i) = 1, ..., N (the same symmetry holds also for the estimator μ̂_{p,N}, while the integration on each y_i spans the same interval D_y). Hence, C_i(p, N) = C_k(p, N) ≡ C(p, N). Then, by summing both sides of Equation (15) with ∑_{i=1}^N and using the defining equation of the estimator, ∑_{i=1}^N |y_i − μ̂_{p,N}|^{p−1} sign(y_i − μ̂_{p,N}) = 0, we conclude that N · C(p, N) = 0, or C(p, N) = 0. Thus, Equation (14) holds, and given the uniqueness of Lp expectation values, we conclude that ⟨G_i⟩_p = 0, ∀ i = 1, ..., N.
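The key step in the proof is that summing the integrand over i reproduces the estimator's defining equation, which vanishes identically for every sample. This pointwise identity can be checked numerically; a sketch with an arbitrary sample (the sample values and helper name are illustrative choices made here):

```python
def lp_mean_estimator(sample, p, iters=200):
    # Bisection for sum_j |y_j - mu|^(p-1) sign(y_j - mu) = 0.
    lo, hi = min(sample), max(sample)
    F = lambda mu: sum(abs(y - mu) ** (p - 1) * ((y > mu) - (y < mu))
                       for y in sample)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

p = 1.7
sample = [0.3, 1.7, 2.2, 5.0, 9.1]
mu_hat = lp_mean_estimator(sample, p)

# Sum over i of |y_i - mu_hat|^(p-1) sign(y_i - mu_hat): zero by construction.
total = sum(abs(y - mu_hat) ** (p - 1) * ((y > mu_hat) - (y < mu_hat))
            for y in sample)
print(total)  # ~0 up to floating-point error
```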
Theorem 2. Consider the sampling {y_i}_{i=1}^N, y_i ∈ D_y ⊆ ℝ, ∀ i = 1, ..., N, of the symmetrically distributed random variables {Y_i}_{i=1}^N. According to Lemma 1, the random variables are characterized by the same Lp expectation value (assuming that this exists), namely, ⟨Y_i⟩_p = Ê_p(Y_i) = μ_p ∈ ℝ, ∀ i = 1, ..., N, which is implicitly expressed by Equation (6). Then, the Lp expectation value of the Lp-mean estimator μ̂_{p,N}({y_j}_{j=1}^N; p) is equal to μ_p, i.e., ⟨μ̂_{p,N}⟩_p = Ê_p[μ̂_{p,N}({y_j}_{j=1}^N; p)] = μ_p.

Proof of Theorem 2. (For useful inequalities, see [11].) The following integral inequality holds, ∀ i = 1, ..., N:

0 = ∫_{y_i ∈ D_y} |y_i − μ_p|^{p−1} sign(y_i − μ_p) P_y(y_i) dy_i ≤ ∫_{y_i ∈ D_y} |y_i − μ_p|^{p−1} P_y(y_i) dy_i.

Furthermore, we consider the Lp expectation value of the functional g({y_j}_{j=1}^N; p) ≡ μ̂_{p,N}({y_j}_{j=1}^N; p) − μ_p, namely, ⟨g⟩_p = Ê_p(g({y_j}_{j=1}^N; p)), which is implicitly given by

∫_{D_y} ··· ∫_{D_y} |g({y_j}_{j=1}^N; p) − ⟨g⟩_p|^{p−1} sign(g({y_j}_{j=1}^N; p) − ⟨g⟩_p) P({y_j}_{j=1}^N) dy_1 ··· dy_N = 0.

As the examined distribution becomes symmetric, the deviation |μ̂_{p,N} − μ_p| obtains small values (while its minimization at a certain p loses its meaning). We observe that for δa ≈ 0.01 or smaller, the deviation is small enough, of the order of 10^{−4}–10^{−3} (it is non-zero because of the computation errors caused by the finite N), so that μ̂_{p,N} ≈ μ_p.
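The statements of Theorems 2 and 3 can be illustrated with a short Monte Carlo sketch: for a symmetric distribution (here uniform on [0, 2], so that μ_p = 1 for every p by symmetry), the Lp-mean estimator approaches μ_p as N grows. The distribution, seed, p-value, and sample sizes are choices made here for illustration, not taken from the paper.

```python
import random

def lp_mean_estimator(sample, p, iters=100):
    # Bisection for sum_j |y_j - mu|^(p-1) sign(y_j - mu) = 0.
    lo, hi = min(sample), max(sample)
    F = lambda mu: sum(abs(y - mu) ** (p - 1) * ((y > mu) - (y < mu))
                       for y in sample)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

random.seed(0)
p = 1.5
mu_p = 1.0  # Lp-mean of Uniform(0, 2), independent of p, by symmetry

for N in (100, 1000, 20000):
    ys = [random.uniform(0.0, 2.0) for _ in range(N)]
    est = lp_mean_estimator(ys, p)
    print(N, est)  # estimates scatter around mu_p = 1, tightening as N grows
```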

Conclusions
The Euclidean L2-means are derived by minimizing the sum of the total square deviations, i.e., the Euclidean variance. In a similar way, the non-Euclidean Lp-means were developed by minimizing the sum of the Lp deviations, which is proportional to the Lp variance [3]. The main advantage of the new statistical approach is that the p-norm is a free parameter; thus, both the Lp-normed expectation values and their variance are flexible enough to analyze new phenomena that cannot be described under the notions of classical statistics based on Euclidean norms. The least squares method, based on the Euclidean norm, p = 2, and the least absolute deviations method, based on the "Taxicab" norm, p = 1, are special cases of the general fitting methods based on Lp-norms (e.g., [15]; for more applications of the fitting methods based on Lp-norms, see [2,4,16,17]). Several other applications can be found in signal processing optimization and block entropy analysis, e.g., [2]; in image processing, e.g., [18]; in general data analysis, e.g., [5]; and in statistical mechanics, e.g., [3,8,19]. The Law of Large Numbers is a theorem that guarantees the stability of long-term averages of random events, but it is valid only for metrics induced by the Euclidean L2-norm. The importance of this paper lies in extending this theorem to Lp-norms. Another interesting direction will be to establish a central limit theorem for the Lp-means.

Figure 2. Uniqueness of the Lp-means of the Poisson distribution. The means are plotted as a function of the p-norm, and for various values of 0 < λ < 1, that is, λ < ln 2 (red solid), λ = ln 2 (black dash), and λ > ln 2 (blue solid).