Expectation Values and Variance Based on L p-Norms

This analysis introduces a generalization of the basic statistical concepts of expectation values and variance for non-Euclidean metrics induced by $L_p$ norms. The non-Euclidean $L_p$ means are defined by exploiting the fundamental property of minimizing the $L_p$ deviations that compose the $L_p$ variance. These $L_p$ expectation values embody a generic formal scheme of means characterization. With the p-norm as a free parameter, both the $L_p$-normed expectation values and their variance are flexible enough to analyze new phenomena that cannot be described under the notions of classical statistics based on Euclidean norms. The new statistical approach provides insights into regression theory and Statistical Physics. Several illuminating examples are examined.

1. Introduction
Some 2500 years after the Pythagorean school and its intense, "self-imposed" study of the arithmetic, geometric and harmonic means, the power-means of the elements $\{y_i\}_{i=1}^N$, $y_i \in D_y \subseteq \mathbb{R}$, $\forall\, i = 1, \ldots, N$, given by $M_p(\{y_i\}_{i=1}^N) = \left(\sum_{i=1}^N y_i^p / N\right)^{1/p}$, were introduced [1] as a suitable generalization of the Pythagorean means, which are recovered for p = 1, 0, −1, respectively (the case p = 0 being understood as the limit p → 0). An even more general characterization, which includes the power-means, concerns the Kolmogorov-Nagumo means [2,3], also known as φ-means, which are expressed in terms of a strictly monotonic function φ as $M_\varphi(\{y_i\}_{i=1}^N) = \varphi^{-1}\left[\sum_{i=1}^N \varphi(y_i)/N\right]$. Hardy et al. [4] showed that the φ-means are characterized by various fundamental properties of the ordinary arithmetic means. Later, Ben-Tal [5] showed that the φ-means are indeed ordinary arithmetic means defined on linear spaces with suitably chosen operations of addition and multiplication. The latter justifies the alternative names "quasilinear" [4] or "quasiarithmetic" [6] for the φ-means. The series $\{y_i\}_{i=1}^N$ can be rearranged, according to the y-values of its elements, into the set $\{y_k\}_{k=1}^W$, where each value $y_k$ has probability $p_k = \Delta N_k / N$, with $\Delta N_k$ being the number of elements of $\{y_i\}_{i=1}^N$ that satisfy $y_i = y_k$ (for examples, see [7]). However, the probability distribution $\{p_k\}_{k=1}^W$ can be constructed directly in association with $\{y_k\}_{k=1}^W$, without the need for the series $\{y_i\}_{i=1}^N$. Then, the relation $p_k = p_k(\{y_k\}_{k=1}^W)$, $\forall\, k = 1, \ldots, W$, can be derived. These weighted φ-means were introduced by [8] and are expressed in terms of the probability distribution $\{p_k\}_{k=1}^W$, namely, $M_\varphi(\{y_k\}_{k=1}^W) = \varphi^{-1}\left[\sum_{k=1}^W p_k\, \varphi(y_k)\right]$, which advanced the axiomatic theory of information functions [9].
The φ-means have found useful applications in a variety of topics, namely, in statistics [10], in decision theory [11], in signal processing [12], in thermostatistics [13,14], etc. Within the framework of signal processing, [12] succeeded in specifying classes of signals associated with the quasiarithmetic mean of two variables. However, the main efforts to use the φ-means in this topic are typically directed at signal denoising. For example, by applying the power-means, the moving average shifts towards small signal values for small p and emphasizes large signal values for large p, thus highlighting the fluctuations of the preferred scaling. As soon as the nonlinear function φ is appropriately chosen, φ-means-based filters can efficiently reduce noise at a preferred scale of signal values. Another significant nonlinear filtering scheme, with interesting properties in signal denoising, is based on the moving median instead of the moving φ-means technique [15,16]. Lately, the median was found to play a key role in signal processing optimization and block entropy analysis [7,17]. The median is based on the non-Euclidean norm $L_1$ (Taxicab norm), and thus it does not fall within the characterization of φ-means, which are Euclidean in their basis, i.e., they are induced by the Euclidean norm $L_2$ [18].
In [7] a novel generalized characterization of means was introduced, namely, the non-Euclidean means, based on metrics induced by $L_p$ norms, wherein the median is included as a special case for p = 1 ($L_1$), while the non-Euclidean φ-means can also be defined. (See also the work of [19], where general clustering approaches are investigated using, among others, similar well-defined means based on non-Euclidean optimization.) In this way, the $L_p$ expectation value of a given energy spectrum $\{\varepsilon_k\}_{k=1}^W$ is defined, representing the non-Euclidean adaptation of the internal energy $U_p$. (This issue is mentioned in Section 3.1; it can be further examined in the framework of non-Euclidean-normed Statistical Mechanics.) The Euclidean norm $L_2$ is also known as the "Pythagorean" norm. Hereafter, we refer to the non-Pythagorean norms as non-Euclidean, and carry the same characterization over to Statistics. One may adopt the more explicit term "non-Euclidean-normed" statistics, to avoid any confusion with the non-Euclidean metric of the (Euclidean-normed) Riemannian geometry.
The paper is organized as follows. In Section 2, the $L_p$ means are defined, resulting from the minimization of the respective $L_p$ deviations. Similarly, the more generic Φ-normed φ-means are also defined. The general scheme of means characterization, wherein the $L_p$ means are embodied, is given. In Section 3, the concept of $L_p$-expectation values is thoroughly studied. The $L_p$ means are expressed in terms of the $\hat{L}_p$ operator, which helps to automatically retrieve the non-Euclidean representation of a formulation from its respective Euclidean one. The non-Euclidean expectation and variance operators are accordingly deduced. Several examples of Statistical Mechanics are examined: gas in thermal equilibrium, space plasmas out of thermal equilibrium, and the multi-dimensional quantum harmonic oscillator at thermal equilibrium. Section 4 introduces the $L_p$ variance; its expression is justified and examined in detail. In Section 5, some further analytical and numerical examples are examined. It is shown that, in the case of symmetric distributions of data, the whole set of $L_p$ means degenerates to one single value, while for asymmetric ones a spectrum-like range of $L_p$ means is generated. In addition, we deal with numerical data of the magnitude of the Earth's total magnetic field. Finally, Section 6 draws the conclusions.

The Means Characterization Based on Optimization Methods
The non-Euclidean means $\mu_p$, based on $L_p$ norms, are defined implicitly as follows:
$$\sum_{k=1}^{W} p_k\, |y_k - \mu_p|^{p-1}\, \mathrm{sign}(y_k - \mu_p) = 0, \tag{1}$$
where the median $\mu_1$ and the arithmetic mean $\mu_2$ follow as special cases for the Taxicab $L_1$ and Euclidean $L_2$ norms, respectively. Both the median $\mu_1$ and the arithmetic mean $\mu_2$ can be implicitly written in the form of Equation (1) as $\sum_{k=1}^{W} p_k\, \mathrm{sign}(y_k - \alpha^*) = 0$ and $\sum_{k=1}^{W} p_k\, (y_k - \alpha^*) = 0$, describing the optimal values $\alpha^* = \mu_1$ and $\alpha^* = \mu_2$, respectively [7,18,20]. Similarly, the $L_p$ mean $\mu_p$ emerges from minimizing the Total p-Deviations, i.e.,
$$TD_p(\{y_k\}_{k=1}^{W}; \alpha; p)^p = \sum_{k=1}^{W} p_k\, |y_k - \alpha|^p. \tag{2}$$
The optimization leads to (i) the normal Equation (1), from which the optimal parameter $\alpha^* = \mu_p$ can be derived as an implicit expression of p, and (ii) the minimum value of the total deviations, $\sum_{k=1}^{W} p_k |y_k - \mu_p|^p$, which is the pth absolute moment at the $L_p$ mean $\alpha^* = \mu_p$. Therefore, the Euclidean mean $\mu_2$, known as the one minimizing the (Euclidean) variance, is generalized to the $L_p$ mean $\mu_p$, which is the one minimizing the $L_p$ variance, a quantity proportional to $TD_p(\{y_k\}_{k=1}^{W}; \alpha; p)^p$ (Sections 3.1, 4).
The generalized $L_p$ φ-means $\mu_{(\varphi,p)}$ can be defined, given the strictly monotonic function φ, through the normal equation derived from the Total (φ, p)-Deviations. We then arrive at the classical φ-means $\mu_{(\varphi,p=2)} = M_\varphi$ by considering the Euclidean norm. Even further, by considering an arbitrary functional norm Φ : u → Φ(|u|) (Φ(0) = 0), the Total (φ, Φ)-Deviations are formulated (Equation (5)), leading to the normal equation (Equation (6)) from which we obtain the Φ-normed φ-means $\mu_{(\varphi,\Phi)}$. It is noted that the solution of Equation (5) is generally called an M-estimator, a broad class of estimators that are obtained by minimizing sums of functions of the data. An M-estimator can be defined as a zero of an estimating function, which often is the derivative of another statistical function. When this differentiation is possible, leading to Equation (6), the M-estimator is said to be of ψ-type. (For more on the theory of M-estimators, see [21,22].)
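The minimization described above can be checked numerically. The following is a minimal Python sketch, not the author's implementation: it solves the normal Equation (1) by bisection, assuming p > 1 (for p = 1 the solution is the median and need not be unique). The function name `lp_mean` and its arguments are illustrative choices.

```python
def lp_mean(values, probs, p, iterations=200):
    """Solve sum_k p_k |y_k - a|^(p-1) sign(y_k - a) = 0 for a, assuming p > 1."""
    if p <= 1:
        raise ValueError("this sketch assumes p > 1")

    def normal_eq(a):
        # Left-hand side of the normal equation; it decreases monotonically in a.
        total = 0.0
        for y, w in zip(values, probs):
            d = y - a
            if d > 0:
                total += w * d ** (p - 1)
            elif d < 0:
                total -= w * (-d) ** (p - 1)
        return total

    # Internness: the root lies between the smallest and the largest value.
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normal_eq(mid) > 0:
            lo = mid   # root is to the right of mid
        else:
            hi = mid   # root is at or to the left of mid
    return 0.5 * (lo + hi)
```

For p = 2 the routine reduces to the weighted arithmetic mean; for instance, `lp_mean([1, 2, 3, 10], [0.25] * 4, 2)` agrees with 4 up to rounding.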

Formal Scheme of Means Characterization
The characterization of means based on optimization methods has already been found useful, providing insights into optimization theory and its applications (e.g., see [7]). The most important application involves reestablishing the concept of expectation values. However, we have to be certain that the optimization characterization of means, as described in Section 2.1, can embody the general scheme of means characterization.
Aczél also pointed out [23] that the "internness" property, i.e., $\min(y_1, y_2) \le M(y_1, y_2) \le \max(y_1, y_2)$, follows from preconditions (i), (ii), (iv). However, it is evident that internness, together with continuity (i), leads to reflexivity (iv). This remark implies that we can take internness as a precondition, instead of strict monotonicity and reflexivity. (Besides, the median does not satisfy strict monotonicity, while it does satisfy internness. In addition, internness ensures that the mean preserves the units of the y-values under unit transformations.) Then, setting aside preconditions (ii) and (iii), which entail the Euclidean character of the φ-means, an alternative characterization of means can be given by the univalued, N-multivariable function $M(\{y_i\}_{i=1}^N)$ fulfilling the three preconditions of continuity, internness, and symmetry. (This characterization was also considered by [25] for N = 2.) It can be easily verified that the non-Euclidean, Φ-normed φ-means $\mu_{(\varphi,\Phi)}$ obey the above characterization. Throughout, we deal with the $L_p$ means $\mu_p$. The uniqueness of the $\mu_p$ means can be established for any p ≥ 1. (The restriction p ≥ 1 is required for the triangle inequality of the norm's definition to hold.) In particular, the non-zero derivative, and the fact that the derivatives of any order of $\mu_p(p)$ are, inductively, analytically expressed in terms of the (univalued) Euclidean mean $\mu_p(p)|_{p=2} = \mu_2$, ensure the uniqueness of $\mu_p$ for a given p > 1 (for p = 1, see [7]). Note that for the specific case where $\exists\, k = k' : y_{k'} = \mu_p$, then for p < 2, Equation (7)

The Non-Euclidean Norm Operator Lp
The non-Euclidean $L_p$-expectation value $\langle y\rangle_p$ is implicitly given by
$$\sum_{k=1}^{W} p_k\, |y_k - \langle y\rangle_p|^{p-1}\, \mathrm{sign}(y_k - \langle y\rangle_p) = 0, \tag{8}$$
and it is apparent that most of the fundamental properties of the Euclidean expectation value are not inherited by the non-Euclidean expectation values. In particular, we distinguish, among others, the following two Euclidean properties: (i) $\langle y\rangle_2 = \sum_{k=1}^{W} p_k y_k$ (by definition), and (ii) …, W, and … . Now, we examine whether the above two properties can be fulfilled even for the non-Euclidean $L_p$-expectation value $\langle y\rangle_p$ and a suitably transformed data set. Indeed, this is true for the specific transformation $\hat{L}_p$. Finally, $\hat{L}_p$ exhibits the following properties: (iii) Norm-derivative (Equation (7)). (iv) In the Euclidean case, $\hat{L}_p$ degenerates to the identity operator, $\hat{L}_{p=2} = 1$.
(vi) Non-additivity of $\hat{L}_p$. As we mentioned in property (v), Equation (8) can be rewritten in terms of $\hat{L}_p(y_k)$; obviously, this leads back to Equation (8) for any value of the scalar $C \neq 0$. However, property (ii) is fulfilled if and only if C has the specific expression $C = 1/[(p-1)\phi_p]$ (see further below). Property (ii) is important when we incorporate the non-Euclidean-normed Statistics in Statistical Mechanics:
(a) The Canonical probability distribution can be automatically derived and explicitly expressed. (If the scalar C were given by any other expression then, after the extremization of entropy in the Canonical Ensemble, we would not be able to solve for the probability, namely, to express the probability explicitly in terms of the energy.)
(b) The basic relation that connects Statistical Mechanics with Thermodynamics has to remain the same as in the classical case. Namely, the classical relation between the derivative of the partition function $Z_p$ and the mean energy (internal energy) $U_p$ has to remain invariant, independently of the p-norm. The specific expression $C = 1/[(p-1)\phi_p]$ yields exactly this scheme, in which the non-Euclidean $L_p$ expectation value of the energy states $\{\varepsilon_k\}_{k=1}^W$ yields the internal energy $\langle\varepsilon\rangle_p = U_p$: the term obtained from the derivative of the $L_p$ Canonical partition function $Z_p$ (in Boltzmann-Gibbs Statistical Mechanics) equals $U_p$ if and only if $C = 1/[(p-1)\phi_p]$, while the remaining term equals zero, leading to Equation (11). As an example, the $L_p$ Canonical partition function in Boltzmann-Gibbs Statistical Mechanics is given by $Z_p = \sum_{k=1}^{W} e^{-\beta \hat{L}_p(\varepsilon_k)}$, where $\{\varepsilon_k\}_{k=1}^W$ is a discrete energy spectrum.
Consequently, according to the above considerations, the non-Euclidean $L_p$-expectation estimator $\hat{E}_p$, acting on a random variable Y, is expressed in terms of $\hat{E}_2$ ($\equiv \hat{E}$), the classical (Euclidean) expectation estimator.
On the other hand, the non-Euclidean $L_p$-variance estimator $\hat{\sigma}_p^2$ has to reduce to the Total p-Deviations of Equation (2), or at least be proportional to them, so that its minimization leads to Equation (1). This can be achieved by the setting of Equation (17). (Section 4 revisits the concept of the non-Euclidean $L_p$-variance, providing a convincing and consistent justification of Equation (17).) Finally, given a set of random variables $\{Y_i\}_{i=1}^n$, the non-Euclidean covariance estimator $(\hat{\sigma}_p^2)_{ij}$ can be defined accordingly.
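The operator and the variance discussed above can be illustrated with a short Python sketch. The explicit expressions used here are assumptions, since they are not spelled out in the surviving text: $\phi_p = \sum_k p_k |y_k - \langle y\rangle_p|^{p-2}$ and $\hat{L}_p(y_k) = \langle y\rangle_p + C\,|y_k - \langle y\rangle_p|^{p-1}\mathrm{sign}(y_k - \langle y\rangle_p)$. They were chosen because they agree with the stated properties: $C = 1/[(p-1)\phi_p]$, $\sigma_p^2 = C \cdot TD_p^p$, and $\hat{L}_{p=2}$ being the identity. All function names are hypothetical, and the sketch is restricted to p ≥ 2.

```python
import math


def lp_mean(values, probs, p, iterations=200):
    """Bisection on the normal equation; assumes p > 1."""
    def normal_eq(a):
        return sum(w * math.copysign(abs(y - a) ** (p - 1), y - a)
                   for y, w in zip(values, probs))
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normal_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lp_statistics(values, probs, p):
    """Return (mu_p, C, sigma_p^2) under the assumed definitions, for p >= 2."""
    mu = lp_mean(values, probs, p)
    # Assumed form of phi_p (not given explicitly in the surviving text).
    phi = sum(w * abs(y - mu) ** (p - 2) for y, w in zip(values, probs))
    c = 1.0 / ((p - 1) * phi)
    # Total p-deviations evaluated at the L_p mean.
    total_dev = sum(w * abs(y - mu) ** p for y, w in zip(values, probs))
    return mu, c, c * total_dev


def lp_operator(y, mu, c, p):
    """Assumed form of the transformation applied to a single value y."""
    d = y - mu
    return mu + c * math.copysign(abs(d) ** (p - 1), d)
```

Under these assumptions, the Euclidean weighted average of the transformed values reproduces $\langle y\rangle_p$ for any C (this is just the normal equation), and for p = 2 the sketch returns the ordinary variance.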

The Non-Euclidean L p -Mean Estimator and Its Expectation Value
Two basic theorems of the ordinary expectation values are inherited to the non-Euclidean ones: (1) The L p -expectation value of the L p -mean estimator µ p, N ({y j } N j=1 ; p) is equal to the L p -expectation value µ p of any of the independent and identically distributed random variables Given the sampling {y i } N i=1 , the non-Euclidean, L p -mean estimator Then, the non-Euclidean L p -expectation value of µ p, N , namely, where L({y j } N j=1 ) is the normalized joint probability density, so that Consider the sampling Namely, the joint distribution density has the property L({y Lemma 1: The symmetrically distributed random variables {Y i } N i=1 are characterized by the same non-Euclidean L p -expectation value, namely, where L y i (u) ≡ L y (u), ∀ i = 1, . . ., N , is the marginal distribution density, which is identical for all the random variables {Y i } N i=1 .Lemma 2: Let the auxiliary functionals {G i } N i=1 , with G i = G i ({y j } N j=1 ; p) ≡ y i − µ p, N ({y j } N j=1 ; p), ∀ i = 1, . . ., N .Then, their L p -expectation values are zero, namely, ⟨G i ⟩ p = Êp (G i ) = 0, ∀ i = 1, . . ., N .

Theorem 1: Consider the sampling {y
According to Lemma 1, the random variables are characterized by the same non-Euclidean L p -expectation value, namely, which is implicitly expressed by Equation (22).Then, the L p -expectation value of the L p -mean estimator Theorem 2: Consider the sampling {y i } N i=1 , y i ∈ D y ⊆ R, ∀ i = 1, . . ., N , of the independent and identically distributed random variables The independent and identically distributed random variables are also symmetrically distributed.Then, from Theorem 1 we have The L p -expectation value ⟨Y i ⟩ p = µ p is calculated given the marginal distribution density L y (y i ), ∀ i = 1, . . ., N .However, the expression of this distribution is in the generic case unknown, and thus, we estimate µ p by means of µ p, N for N >> 1.

Examples
In the following three examples from Statistical Mechanics, we examine the systems of (1) gas in thermal equilibrium, (2) space plasmas out of thermal equilibrium, and (3) the multi-dimensional quantum harmonic oscillator at thermal equilibrium. The non-Euclidean-normed internal energy $U_p$ is derived by utilizing the classical Euclidean probability distribution of the Canonical Ensemble.

Gas at Thermal Equilibrium
For the continuous energy spectrum ε ∈ [0, ∞) with distribution p(ε) and degeneracy g(ε), the $L_p$ internal energy $\langle\varepsilon\rangle_p$ is implicitly given by
$$\int_0^\infty p(\varepsilon)\, g(\varepsilon)\, |\varepsilon - \langle\varepsilon\rangle_p|^{p-1}\, \mathrm{sign}(\varepsilon - \langle\varepsilon\rangle_p)\, d\varepsilon = 0.$$
At classical thermal equilibrium, the energy distribution is given by the Boltzmann-Gibbs distribution, where f denotes the degrees of freedom. Hence, the internal energy, $\langle\varepsilon\rangle_p/(k_B T)$, is implicitly expressed in terms of p and f, where we set $x \equiv \varepsilon/(k_B T)$. In the Euclidean case, the internal energy is $\langle\varepsilon\rangle_2 = (f/2)\, k_B T$. In the non-Euclidean case, this is written as $\langle\varepsilon\rangle_p = (f_p/2)\, k_B T$, where $f_p$ represents the reflected degrees of freedom; for a sufficiently high number of degrees of freedom we find $f_p \simeq f + 0.66 \cdot (p - 2)$.
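The implicit relation for the gas can be explored numerically. The sketch below is only an illustration under a stated assumption: the product of the Boltzmann-Gibbs distribution and the degeneracy is taken proportional to $x^{f/2-1} e^{-x}$ in the reduced variable $x = \varepsilon/(k_B T)$, which reproduces the Euclidean result $\langle\varepsilon\rangle_2 = (f/2) k_B T$ quoted above. The function name, the midpoint-rule grid, and the cutoff `x_max` are hypothetical choices.

```python
import math


def gas_internal_energy(f, p, x_max=40.0, n=6000, iterations=50):
    """Return <eps>_p / (k_B T) for an assumed density ~ x^(f/2 - 1) exp(-x)."""
    h = x_max / n
    xs = [(i + 0.5) * h for i in range(n)]                      # midpoint grid
    ws = [x ** (f / 2.0 - 1.0) * math.exp(-x) * h for x in xs]  # unnormalized weights

    def normal_eq(a):
        # Discretized integral of w(x) |x - a|^(p-1) sign(x - a).
        return sum(w * math.copysign(abs(x - a) ** (p - 1), x - a)
                   for x, w in zip(xs, ws))

    lo, hi = 0.0, x_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normal_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For f = 3 and p = 2 the routine should return a value close to 1.5, and the result grows with p, in line with the reflected degrees of freedom $f_p$ increasing with p.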

Plasma Out of Thermal Equilibrium
Classical systems are said to be in thermal equilibrium when any flow of heat (thermal conduction, thermal radiation) is in balance. However, thermal equilibrium is not the only possible state that is stationary (i.e., one in which the phase space distribution does not explicitly depend on time). For example, space plasmas are systems residing in stationary states but out of thermal equilibrium. For these systems, the energy distribution is well described by the empirical kappa distribution (see [26] and references therein). Moreover, the kappa distribution was shown to be connected [26] with the solid background of non-extensive Statistical Mechanics [27], and represents the probability distribution that maximizes entropy in the Canonical Ensemble. The kappa distribution is the generalization of the classical Boltzmann-Gibbs exponential distribution, which describes systems only at thermal equilibrium. The temperature and the kappa index that govern these distributions are the two independent controlling parameters of non-equilibrium systems. In the invariant form of the kappa distribution, the temperature T, the kappa index $\kappa_0$ and the total degrees of freedom f are all independent variables [28], and the kappa index ($\kappa_0 > 0$) provides a measure of how far the system is from thermal equilibrium [29]. The kappa distribution recovers the Boltzmannian exponential distribution for $\kappa_0 \to \infty$, which is the value of the kappa index characterizing thermal equilibrium. The smallest possible value of the kappa index is $\kappa_0 \to 0$, which determines the furthest stationary state from thermal equilibrium [30].
In Figure 1 the kappa distribution $P(\varepsilon; T; \kappa_0; f)\, g(\varepsilon) \times (k_B T)$ is depicted in terms of $\varepsilon/(k_B T)$ for $\kappa_0$ = 0.01, 0.1, 1, 10. The internal energy $\langle\varepsilon\rangle_p$ is again given implicitly. (Again, we set $x \equiv \varepsilon/(k_B T)$ and the degeneracy is … .) In Figure 2a we depict the internal energy, $\langle\varepsilon\rangle_p/(\tfrac{1}{2} k_B T)$, with respect to the degrees of freedom f, for p = 1.5, 2, 2.5 and for kappa indices $\kappa_0 = 1.5$ and $\kappa_0 = 100$ (the latter practically equals the Boltzmannian exponential distribution at thermal equilibrium). Figure 2b plots the internal energy over the degrees of freedom, $\langle\varepsilon\rangle_p/(\tfrac{f}{2} k_B T)$, for the same p and $\kappa_0$. Figure 2c shows the dependence on the kappa index (for f = 3). Interestingly, the internal energy is kappa-dependent for any of the non-Euclidean norms, i.e., $\langle\varepsilon\rangle_p = \tfrac{f_p}{2} k_B T$, with $f_p = f_p(\kappa_0; p)$. (Hence, T is not a well-defined kinetic temperature for p ≠ 2 [26,30].) Note that the integral converges only for $\kappa_0 > p - 2$, which generalizes the inequality $\kappa_0 > 0$ of the Euclidean case.

D-Dimensional Quantum Harmonic Oscillator in Thermal Equilibrium
For the energy states $\{\varepsilon_n\}_{n=1}^W$ with degeneracy $\{g_n\}_{n=1}^W$ that are associated with the discrete energy distribution $\{p_n\}_{n=1}^W$, the $L_p$ internal energy $\langle\varepsilon\rangle_p$ is implicitly given by
$$\sum_{n} g_n\, p_n\, |\varepsilon_n - \langle\varepsilon\rangle_p|^{p-1}\, \mathrm{sign}(\varepsilon_n - \langle\varepsilon\rangle_p) = 0,$$
with Boltzmann's energy distribution $p_n \propto e^{-\varepsilon_n/(k_B T)}$ at thermal equilibrium. The D-dimensional quantum harmonic oscillator has the discrete energy spectrum $\varepsilon_n = \hbar\omega(n + \tfrac{1}{2})$ with degeneracy $g_n = \binom{n+D-1}{n} \propto (n + D - 1)!/n!$, $\forall\, n = 0, 1, 2, \ldots$, from which the internal energy $\langle\varepsilon\rangle_p$ follows implicitly. In Figure 3a, we depict $\langle\varepsilon\rangle_p/(\hbar\omega)$ with respect to $k_B T/(\hbar\omega)$, for p = 1.8, 2, 2.5 and D = 2. The relevant heat capacity, $c_{V,p} = (\partial\langle\varepsilon\rangle_p/\partial T)_V$, is shown in Figure 3b. For high temperatures, the Dulong-Petit limit is generalized, with the reflected degrees of freedom $f_p \equiv 2D + 0.6 \cdot (p - 2)$; the involved constant is $\alpha_p \simeq 0.015\, D\, (p - 2)$. Each dimension of the Euclidean quantum oscillator has two degrees of freedom, so that $f_2 = 2D$; the non-Euclidean quantum oscillator deviates in proportion to the factor $f_p - f_2 \propto p - 2$. In Figure 3c,d we depict $\langle\varepsilon\rangle_p$ and $c_{V,p}$ as functions of the dimensionality D, for p = 1.8, 2, 2.5 and $k_B T/(\hbar\omega) = 1$. While for the Euclidean norm the heat capacity gives $c_{V,p}/k_B = D$, for the sub- and super-Euclidean norms we have $c_{V,p}/k_B < D$ and $> D$, respectively. Finally, we remark that the heat capacity appears to have periodic cusps that become smaller at higher temperatures.
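A discrete spectrum makes the implicit equation easy to evaluate directly. The following Python sketch is a hedged illustration rather than the procedure used for Figure 3: it works in units of ℏω, truncates the spectrum at a hypothetical `n_max`, and exposes the zero-point offset as a parameter (`zero_point`), because conventions for the D-dimensional oscillator differ. The function name and all argument names are assumptions.

```python
import math


def oscillator_energy(t, p, dim=1, zero_point=0.5, n_max=400, iterations=100):
    """Return <eps>_p / (hbar*omega) at reduced temperature t = k_B T / (hbar*omega)."""
    levels = [n + zero_point for n in range(n_max)]
    # Degeneracy times Boltzmann factor; the common zero-point factor cancels out.
    weights = [math.comb(n + dim - 1, n) * math.exp(-n / t) for n in range(n_max)]

    def normal_eq(u):
        return sum(w * math.copysign(abs(e - u) ** (p - 1), e - u)
                   for e, w in zip(levels, weights))

    lo, hi = levels[0], levels[-1]
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normal_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For p = 2 and D = 1 the sketch reproduces the textbook Planck result $\tfrac{1}{2} + 1/(e^{1/t} - 1)$; other values of p can then be compared against this Euclidean baseline.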
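with $\langle y\rangle_p$ implicitly given by Equation (8), or, in the continuous description $y \in D_y \subseteq \mathbb{R}$ with probability distribution p(y), with $\langle y\rangle_p$ implicitly given by $\int_{D_y} p(y)\, |y - \langle y\rangle_p|^{p-1}\, \mathrm{sign}(y - \langle y\rangle_p)\, dy = 0$. For the Gaussian distribution, the $L_p$ variance is evaluated by setting $u \equiv (y - \mu)/\sigma$ and $z \equiv u^2/2$. Thus, the $L_p$ variance yields the well-known Euclidean variance $\sigma^2$ of the Gaussian distribution.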

Example 2: Generalized Gaussian distribution
Consider the Generalized Gaussian probability distribution density (Equation (35)), where Q ≥ 0 is the shape parameter, $C_{Q,p}$ is the normalization constant, and $\eta_{Q,p}$ is calculated from the expression of the $L_p$ variance, where we have set $u \equiv (y - \mu)/\sigma$ and $z \equiv \eta_{Q,p}\, |u|^Q$. Of particular interest is the case where the shape parameter equals the norm, Q = p. (Note that $\langle y\rangle_p = \mu$ for both examples. See Section 5.1.)

Justification of the L p -Variance Expression
Let $\{Y_i\}_{i=1}^N$ be a set of independent and identically distributed random variables with sampling values $\{y_i\}_{i=1}^N$. First, we consider that each variable is distributed as $Y_i \sim N(\mu, \sigma)$, namely, by the Gaussian probability distribution density $f_G(y_i; \mu; \sigma)$, $\forall\, i = 1, \ldots, N$. Then, the likelihood function is constructed and maximized. Thus, for the normally distributed $\{y_i\}_{i=1}^N$, the best estimators for the true values of the parameters µ and $\sigma^2$ are the Euclidean estimators of the mean and variance, $\mu_{2,N}$ and $\sigma^2_{2,N}$, respectively. What is important here is not the variance $\sigma^2$ of the theoretical distribution density $f_G(y_i; \mu; \sigma)$, but the variance $S^2_{2,N}$ of the Euclidean mean estimator $\mu_{2,N}(\{y_i\}_{i=1}^N)$ [31] (Equation (43)), so that the value of µ, estimated by $\mu_{2,N}$, has a variance $S^2_{2,N}$.
Now consider that each variable is distributed as $Y_i \sim GG(\mu, \sigma; Q, p)$, namely, by the Generalized Gaussian probability distribution density of Equation (35), $\forall\, i = 1, \ldots, N$. Then, the likelihood function is constructed and maximized. Thus, for $\{y_i\}_{i=1}^N$ distributed by the Generalized Gaussian, the best estimator for the true value of the parameter µ is the non-Euclidean $L_p$-mean estimator $\mu_{p,N}(\{y_i\}_{i=1}^N; p)$, defined in Equation (19). In addition, $\frac{\partial}{\partial\sigma} \ln[L_y(\{y_i\}_{i=1}^N; \mu, \sigma; p)] = 0$ leads to the estimator of σ, and thus the variance $S^2_{p,N}$ of the estimator $\mu_{p,N}(\{y_i\}_{i=1}^N; p)$ is given by an expression similar to Equation (43), after replacing the Euclidean variance, $\sigma^2_{2,N}$, with the non-Euclidean one, $\sigma^2_{p,N}$.
Finally, the $L_p$ variance (e.g., as given in Equation (30)) can be written in the form of Equation (17). Note that the $L_p$ mean $\langle y\rangle_p$ (Equations (1) and (8)) is derived from the minimization of $\sigma_p^2(\alpha)$, where the first derivative of the Total p-Deviations (in terms of the parameter α) is equal to zero, while $A_2(p)$ is the curvature factor. As shown in [7], the variance $\sigma_p^2$, which is proportional to the square error of the optimal value $\alpha^*(p)$, is inversely proportional to $A_2(p)$. On the other hand, it is expected that the variance $\sigma_p^2$ will also be proportional to $A_0(p)$, which completes the proportionality of Equation (51), leading to Equation (30).
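The step from the likelihood to the $L_p$-mean estimator can be written out explicitly for the case Q = p. The derivation below is a sketch under an assumption: the Generalized Gaussian density is taken in the form $f(y; \mu, \sigma) = C \exp\left[-\eta\, |(y-\mu)/\sigma|^p\right]$, with C and η standing for $C_{Q,p}$ and $\eta_{Q,p}$ at Q = p; the exact constants of Equation (35) are not needed for the argument.

```latex
\begin{aligned}
\ln L_y(\{y_i\}_{i=1}^N; \mu, \sigma; p)
  &= N \ln C \;-\; \frac{\eta}{\sigma^{p}} \sum_{i=1}^{N} |y_i - \mu|^{p}, \\[4pt]
\frac{\partial}{\partial \mu} \ln L_y
  &= \frac{\eta\, p}{\sigma^{p}} \sum_{i=1}^{N} |y_i - \mu|^{p-1}\,
     \mathrm{sign}(y_i - \mu) \;=\; 0 \\[4pt]
\Longrightarrow\quad
  & \sum_{i=1}^{N} \frac{1}{N}\, |y_i - \mu|^{p-1}\, \mathrm{sign}(y_i - \mu) = 0 .
\end{aligned}
```

The last line is the normal equation of the $L_p$ mean with equal weights $p_k = 1/N$, so maximizing the likelihood in µ is the same as minimizing the Total p-Deviations of the sample.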
There are two more ways to show the specific expression of the $L_p$-variance. The proportionality factor C connects either the variance with the Total Deviations, i.e., $\sigma_p^2 = C \cdot TD_p^p$, or the energy states with their transformed values under $\hat{L}_p$. Property (ii) of the non-Euclidean norm operator $\hat{L}_p$ holds if and only if $C = 1/[(p-1)\phi_p]$. This can be shown in the following two ways: (1) Due to property (ii) of the non-Euclidean norm operator, the respective Canonical probability distribution can be automatically derived. Indeed, the internal energy constraint is expressed in terms of $\hat{L}_p(\varepsilon_k)$, so that by maximizing the Boltzmann-Gibbs entropy we end up with $p_k \sim e^{-\beta \hat{L}_p(\varepsilon_k)}$. (2) The connection with thermodynamics can be achieved if and only if $C = 1/[(p-1)\phi_p]$ (Equation (15)).
(the sign is an odd function, while $p(y)|y|^{p-1}$ is even). Hence, given the uniqueness of $\langle y\rangle_p$ for a given p, we conclude that $\langle y\rangle_p = 0$, $\forall\, p \ge 1$.

Further Analytical and Numerical Examples
Therefore, in the case where the distribution p(y) is symmetric, the whole set of $\langle y\rangle_p$-values degenerates to one single value, which can thus be found by the usual Euclidean norm, namely $\langle y\rangle_p = \langle y\rangle_2$. (The opposite statement is also true.) However, when p(y) is asymmetric, a spectrum-like range of different $\langle y\rangle_p$-values is generated [7]. For example, the distribution $p(y) \simeq 1 + \delta(1 - 3y^2)$ in the interval y ∈ [0, 1] is symmetric for δ = 0, but becomes asymmetric for 0 < δ << 1. Then, we find
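The contrast between symmetric and asymmetric data can be seen on a toy example. The Python sketch below is illustrative only: the two small samples and the helper name `lp_mean_uniform` are hypothetical, equal weights are assumed, and the $L_p$ mean is obtained by bisection on the normal equation for p > 1.

```python
import math


def lp_mean_uniform(values, p, iterations=200):
    """L_p mean of equally weighted values (p > 1), by bisection."""
    def normal_eq(a):
        return sum(math.copysign(abs(y - a) ** (p - 1), y - a) for y in values)
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normal_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


norms = [1.5, 2.0, 3.0, 4.0]
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]    # symmetric about zero
asymmetric = [0.0, 0.1, 0.2, 0.5, 4.0]     # one large value skews the sample

symmetric_means = [lp_mean_uniform(symmetric, p) for p in norms]
asymmetric_means = [lp_mean_uniform(asymmetric, p) for p in norms]
```

For the symmetric sample all entries of `symmetric_means` coincide with the center of symmetry, whereas `asymmetric_means` spreads over a range of values, which is the spectrum-like behavior described above.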

Numerical Example: Earth's Magnetic Field
We consider the time series of the Earth's magnetic field magnitude (in nT). In particular, we focus on a stationary segment recorded by the GOES-12 satellite between 1/1/2008 and 1/2/2008, sampled at one measurement per minute and constituting a segment of N = 46,080 data points, depicted in Figure 4a. This segment is characterized by a roughly symmetric distribution p(B) (in nT$^{-1}$), depicted in Figure 4b, resulting in a narrow spectrum of $\langle B\rangle_p$-values, depicted in Figure 4c. Notice that the extreme values of $\langle B\rangle_p$ do not coincide with the respective values for p = 1 and p → ∞. Indeed, a minimum of the $L_p$-expectation values, $\langle B\rangle_{p,\min} \approx 96.614$ nT, is found for the non-Euclidean norm p ≈ 5.63, while a maximum of about $\langle B\rangle_{p,\max} \approx 98.819$ nT is located at p ≈ 25.83. As p → ∞, $\langle B\rangle_p$ tends to $(B_{\min} + B_{\max})/2 \approx 98.535$ nT. Hence, it is evident that the $L_p$ means $\mu_p$ are not necessarily monotonic functions of p.
The expectation value $\langle B\rangle_p$ is given by the estimator $\mu_{p,N}(\{B_i\}_{i=1}^N; p)$. On the other hand, the error $\delta\langle B\rangle_p$ is given by the square root of the variance $S^2_{p,N}$ of the estimator $\mu_{p,N}(\{B_i\}_{i=1}^N; p)$. In Figure 5a,b, the $L_p$-expectation value of the Earth's magnetic field magnitude (shown in Figure 4a), $\langle B\rangle_p$, together with its error $\delta\langle B\rangle_p$, are respectively depicted as functions of the p-norm. A local minimum of the error $\delta\langle B\rangle_p$ can be detected for p ≈ 2.05, for which $\langle B\rangle_p \approx 97.88$ nT and $\delta\langle B\rangle_{p,\min} \approx 0.071$ nT, shown in the magnified inset of Figure 5c. For p ≳ 2.05 the error increases as p increases, until it reaches a local maximum at p ≈ 7.88, for which $\langle B\rangle_p \approx 97.07$ nT and $\delta\langle B\rangle_{p,\max} \approx 0.091$ nT. Then, for p ≳ 7.88 the error decreases monotonically as p increases. We can readily derive that … For p ≲ 2.05 the error increases as p decreases, but numerous fluctuations appear that become denser as p → 1. This "instability cloud" is due to the reading errors of the data values, with their effect being magnified as p − 1 tends to zero. This effect is demonstrated in Figure 6, where the error $\delta\langle B\rangle_p$ is depicted when an additive noise is inserted into the $\{B_i\}_{i=1}^N$ values. In particular, we consider the perturbed values $\{\tilde{B}_i\}_{i=1}^N$, where $\tilde{B}_i \equiv B_i + \epsilon\, r_i$, $\forall\, i = 1, \ldots, N$, with $\{r_i\}_{i=1}^N$ being equidistributed in [0, 1] and |ϵ| being the amplitude of the perturbations. We set |ϵ| = 0.01 nT, which is equal to the resolution (reading error) of the values $\{B_i\}_{i=1}^N$. In Figure 6a we set ϵ = −0.01 nT, while in Figure 6c we set ϵ = +0.01 nT. In Figure 6b we depict the unperturbed error for convenience. We observe that the "instability cloud", occurring for p → 1, is different for the three cases ϵ = −0.01, 0, and +0.01 (nT). However, the minimum at p ≈ 2.05 remains unaffected.
The existence of a local minimum of the error, such as the minimum at p ≈ 2.05, is of great importance. It suggests that, for this specific norm, the expectation value comes with the minimum error. Therefore, after the minimization of the total deviations, which leads to the normal Equation (1) from which the optimal parameter $\alpha^*(p)$ is derived, the optimization is completed by determining the specific norm $p^*$ for which the variance $\sigma_p^2(p)$ has a local minimum (if one exists). The importance of the local minimum of $\sigma_p^2(p)$ is that, for any deviation of the norm from $p = p^*$, either $p < p^*$ or $p > p^*$, the error increases, thus introducing a type of norm-stability. Hence, the $L_p$-norm that corresponds to the minimized error, p ≈ 2.05, is distinguished. It is interesting that the norm p ≈ 2.05 is very close to the Euclidean one (p = 2). However, it has to be stressed that there is no universally preferred norm, since this depends on the specific data values.
Both the diagrams of the L p mean µ p (p) and its relevant error δµ p (p), depicted in terms of the norm p, constitute a "metricogram".In a metricogram, we are able to observe the whole spectrum of the L p mean and its error, and to recognize the preferable norms.
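A metricogram can be tabulated with a few lines of code. The Python sketch below is not the procedure applied to the GOES-12 data: it uses a synthetic sample, equal weights, and two labeled assumptions, namely $\sigma_p^2 = \sum_i |y_i - \mu_p|^p / [(p-1) \sum_i |y_i - \mu_p|^{p-2}]$ for the $L_p$ variance and $S^2_{p,N} = \sigma_p^2 / N$ for the variance of the mean estimator. The grid is restricted to p ≥ 2, and the names `lp_mean_uniform` and `metricogram` are hypothetical.

```python
import math
import random


def lp_mean_uniform(values, p, iterations=100):
    """L_p mean of equally weighted values (p > 1), by bisection."""
    def normal_eq(a):
        return sum(math.copysign(abs(y - a) ** (p - 1), y - a) for y in values)
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normal_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def metricogram(values, p_grid):
    """Return rows (p, mu_p, error) under the assumed variance expressions."""
    n = len(values)
    rows = []
    for p in p_grid:
        mu = lp_mean_uniform(values, p)
        total_dev = sum(abs(y - mu) ** p for y in values) / n
        phi = sum(abs(y - mu) ** (p - 2) for y in values) / n   # assumed phi_p
        sigma2 = total_dev / ((p - 1) * phi)                    # assumed L_p variance
        rows.append((p, mu, math.sqrt(sigma2 / n)))             # assumed S^2 = sigma^2 / N
    return rows


rng = random.Random(0)                                   # fixed seed, synthetic data
sample = [rng.gammavariate(4.0, 1.0) for _ in range(500)]
rows = metricogram(sample, [2.0 + 0.5 * i for i in range(9)])
```

Scanning the third column of `rows` for a local minimum corresponds to locating a preferable norm $p^*$; whether such a minimum exists depends on the data.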

Conclusions
This analysis introduced a possible generalization of the basic statistical concepts of the expectation value and variance for non-Euclidean metrics induced by $L_p$ norms. The Euclidean $L_2$ mean is derived by minimizing the sum of the total square deviations TSD, which is the Euclidean variance. Similarly, the non-Euclidean $L_p$ means were developed by minimizing the sum of the $L_p$ deviations $TD_p^p$, which is proportional to the $L_p$ variance. The main advantage of the new statistical approach is that the p-norm is a free parameter; thus both the $L_p$-normed expectation values and their variance are flexible enough to analyze new phenomena that cannot be described under the notions of classical statistics based on Euclidean norms. As was shown, the $L_p$ means embody a generic formal scheme of means characterization, given the sampling values $\{y_i\}_{i=1}^N$. This involves the existence of a univalued, N-multivariable function $M(\{y_i\}_{i=1}^N)$ fulfilling the following three preconditions: (i) Continuity; (ii) Internness; (iii) Symmetry. (This axiomatic scheme of means characterization generalizes the one proposed by Aczél [23], which led to the Euclidean φ-means.) In the numerical example of the Earth's magnetic field magnitude (time series recorded within January of 2008), a preferable norm was found for p ≈ 2.05, for which $\frac{\partial}{\partial p}\delta\langle B\rangle_p(p) = 0$, $\frac{\partial^2}{\partial p^2}\delta\langle B\rangle_p(p) > 0$. The classical concept of expectation values was generalized to the non-Euclidean $L_p$-normed representation, highlighting its implications in Statistical Mechanics. Indeed, the $L_p$ expectation value of a given energy spectrum $\{\varepsilon_k\}_{k=1}^W$ represents the non-Euclidean adaptation of the internal energy $U_p$. This is an issue that has to be considered in Statistical Mechanics. Several pedagogical examples were examined: gas in thermal equilibrium, space plasmas out of thermal equilibrium, and the multi-dimensional quantum harmonic oscillator at thermal equilibrium.
respectively. These forms can be thought of as the result of the corresponding optimizations of the Total Absolute Deviations $TAD(\{y_k\}_{k=1}^W; \alpha) = \sum_{k=1}^{W} p_k |y_k - \alpha|$ (also called Total Boundary Area, TBA), and the Total Square Deviations $TSD(\{y_k\}_{k=1}^W; \alpha)^2 = \sum_{k=1}^{W} p_k |y_k - \alpha|^2$.

Figure 2. The dependence of the internal energy on the degrees of freedom f and the kappa index κ 0 .(a) Plot of ⟨ε⟩ p /(1/2 k B T ) vs. the degrees of freedom f , for p = 1.5 (red), 2 (blue), 2.5 (green), and κ 0 = 1.5 (solid), and 100 (dash).(b) Plot of ⟨ε⟩ p /(f /2 k B T ), vs. f , for the same p and κ 0 .(c) The same plot as (b), but vs. κ 0 and for f = 3.We observe larger deviation from the Euclidean norm for smaller kappa indices.Note that the graph is restricted to κ 0 > p − 2 (see text).

Figure 4. The magnitude of the Earth's total magnetic field. (a) The time series recorded between 1/1/2008 and 1/2/2008. (b) The relevant distribution p(B) is roughly symmetric. As a result, the numerically calculated $L_p$-expectation values, $\langle B\rangle_p$, configure a narrow spectrum within the interval between the two horizontal dotted lines, where the dependence of the $\langle B\rangle_p$-values on the p-norm is shown within the magnified inset (c).

Figure 5. The $L_p$-expectation value of the magnitude of the Earth's total magnetic field (shown in Figure 4a), $\langle B\rangle_p$, together with its error $\delta\langle B\rangle_p$, are depicted as functions of the p-norm (panels (a) and (b), respectively). A local minimum of the error is found close to the Euclidean norm, i.e., for p ≈ 2.05, as shown within the magnified inset (c).

Figure 6. The error $\delta\langle B\rangle_p$ is depicted with additive equidistributed noise inserted into the $\{B_i\}_{i=1}^N$ values (N = 46,080). The amplitude of the noise is equal to the resolution of the values $\{B_i\}_{i=1}^N$. Namely, we set ϵ = −0.01 nT (a), and ϵ = +0.01 nT (c). In panel (b) we depict the unperturbed error for convenience. The magnified panels (d), (e) and (f) of the respective panels (a), (b) and (c) demonstrate that the minimum error at p ≈ 2.05 remains unaffected for amplitudes of additive noise less than or equal to the reading error, i.e., |ϵ| ≤ 0.01 nT, in contrast to the fluctuations appearing for p − 1 → 0, which are affected by the additive noise.
, if and only if $C = 1/[(p-1)\phi_p]$. Given the set of states $\{y_k\}_{k=1}^W$ of a physical quantity, which are included in the Euclidean representation of a formulation $F[\{y_k\}_{k=1}^W]$, the respective non-Euclidean representation can often be retrieved automatically by replacing $y_k$ with $\hat{L}_p(y_k$