Means as Improper Integrals

The aim of this work is to study generalizations of the notion of the mean. Kolmogorov proposed a generalization based on an improper integral with a decay rate for the tail probabilities. This weak or Kolmogorov mean relates to the weak law of large numbers in the same way that the ordinary mean relates to the strong law. We propose a further generalization, also based on an improper integral, called the doubly-weak mean, applicable to heavy-tailed distributions such as the Cauchy distribution and the other symmetric stable distributions. We also consider generalizations arising from Abel–Feynman-type mollifiers that damp the behavior at infinity and alternative formulations of the mean in terms of the cumulative distribution and the characteristic function.


Introduction
The mean is sometimes taken as the foundational principle for all of probability theory (see, for example, [1]). Nate Silver [2], in the President's Invited Address at the Joint Statistical Meetings 2013 in Montreal, said that "the average is still the most useful statistical tool ever invented." Steven Levitt [3], in the Arthur M. Sackler Lecture at the National Academy of Sciences in March 2015, said, "When I work with companies . . ., I like to show a comparison of means and that's often more effective than very complicated things . . .." Kosko [4] suggested, on the other hand, jettisoning the mean in favor of the median because of its shortcomings. The chief shortcoming is that the mean sometimes does not exist, whereas the median always exists (even though it is multi-valued in some cases). Kosko cited the Cauchy distribution as a leading example where the mean fails to exist.
Here we study generalizations of the mean intended to extend its reach as much as possible. At the outset, we note that these generalizations add something only for distributions with heavy tails at both ends, that is, those satisfying both ∫_[0,∞) x P(dx) = ∞ and ∫_(−∞,0] x P(dx) = −∞. If neither of these equations is true, then E(X) exists. If only one of them holds, then E(X) is ±∞. Thus, our ideas will extend only to distributions heavy-tailed at both ends and indeed having equally heavy tails at both ends. This includes the symmetric stable distributions [5-7], a family arising in signal processing, among which are the normal and Cauchy distributions and which satisfy a generalized central limit theorem.
See also [8], where we provide an axiomatic treatment of the ordinary mean based on a condensation principle due to Bemporad and a continuity principle.
Three ways to extend the notion of the mean come to mind. One way, always available, is to transform the variable so that the mean exists. We apply a strictly-increasing function f to the variable X and study E_f(X) = f^{−1}(E(f(X))). The function f can be a power function, a log function, a logistic function, or even the function arctan x. The last of these converts X into a bounded variable that necessarily has a finite mean. The difficulty with this approach is that the extensions do not recover the ordinary mean: if E_f(X) = E(X) whenever E(X) exists, then f is necessarily a linear transformation f(x) = Cx + D with C ≠ 0, and then E_f(X) exists only when E(X) exists.
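As a small numerical illustration (the two-point distribution below is our own choosing, not from the text): with f = ln, the transformed mean E_f(X) is the geometric mean, which already differs from E(X) for a non-degenerate distribution, showing why a non-affine f cannot recover the ordinary mean.

```python
import math

# A hypothetical two-point random variable: X = 1 or X = 3, each with probability 1/2.
values = [1.0, 3.0]
probs = [0.5, 0.5]

# Ordinary mean E(X).
mean = sum(p * x for p, x in zip(probs, values))

# Transformed mean E_f(X) = f^(-1)(E(f(X))) with f = ln, i.e., the geometric mean.
f_mean = math.exp(sum(p * math.log(x) for p, x in zip(probs, values)))

print(mean, f_mean)  # 2.0 versus sqrt(3) ~ 1.732
```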
A second way to extend the mean, an obvious and straightforward approach, is to replace the usual definition by an improper integral. Instead of integrating x over R with respect to the probability distribution P of the random variable X, we integrate over the interval [c − M, c + M] for some choice of a center c and determine what happens as M tends to ∞. We follow this approach here.
We itemize the different cases that can result in this way (Theorem 1) and give examples of each. We examine a weak mean proposed by Kolmogorov that corresponds to the weak law of large numbers, and we show that it is additive. We also consider two further generalizations: one with a weakening of the decay conditions that Kolmogorov imposed for his weak mean, which we call the doubly-weak mean; and another, the superweak mean, for which the improper integral exists for at least one choice of the center c. These topics occupy Sections 2 and 3.
A third way to extend the notion of the mean is to introduce what we call a "mollifier", namely, a parameter-dependent multiplier φ_λ(x) such that x → φ_λ(x)x is integrable with respect to P and φ_λ(x) tends to one as λ tends to λ_0. Then, we consider what happens to E(φ_λ(X)X) as λ tends to λ_0. This approach is examined in Section 4. We note some dangers associated with mollifiers, consider some examples, and determine cases where this approach reduces to the previous one. Richard Feynman (see [9]) sometimes used mollifier methods, and his approach motivated us. A well-known theorem of Abel is also related to these methods.
In Section 5, we compare and restate the results discussed earlier in terms of the cumulative distribution and the characteristic function of the variable X. In Section 6, we offer a brief conclusion.

Improper Integrals Extending the Mean
Consider a real-valued random variable X. We suppose that associated with X is a Borel probability measure P that takes each Borel subset A of the real numbers to P(A), the probability that X belongs to the set A.
We shall also use the notation P(X ∈ A). The mean of X, denoted by E(X) or µ_X, is defined, when x is integrable with respect to P, to be:

E(X) = ∫_R x P(dx).

One direction of the remarkable strong law of large numbers (see Pollard [10], pp. 37-38, 78) states that if {X_n} is a sequence of independent random variables with common distribution P and if there exists a constant m such that:

(X_1 + X_2 + · · · + X_n)/n → m almost surely

as n tends to ∞, then each X_n has mean m. Here "almost surely" means outside a set of measure zero in the countably-infinite product space induced by the measure P (see [10], pp. 99-102). Simply put, if the sample mean of independent copies of X settles down to a specific number, then that number is E(X). This can be regarded as the motivation for the transition from the sample mean to the mathematical mean E(X): the general notion of mean is derived from the finitary notion of the sample mean.
The other direction of the strong law of large numbers asserts that if E(X) exists, then the sample mean of n identical independent copies of X converges almost surely to E(X) as n tends to infinity. For a proof of both directions of the strong law, see [10], pp. 95-102, 105. For an alternate proof due to N. Etemadi, see [10], pp. 106-107.
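As a quick numerical sketch of the strong law (using Uniform(0, 1) draws, an example of our choosing): the sample mean of many independent copies settles down near the true mean 1/2.

```python
import random

random.seed(0)

# Sample mean of n independent Uniform(0, 1) draws; by the strong law it
# settles down near the true mean 1/2 as n grows.
n = 200_000
sample_mean = sum(random.random() for _ in range(n)) / n
print(sample_mean)  # close to 0.5
```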
When x is not integrable with respect to P, the notion E(X) above is inapplicable, and we must rely on other notions of the mean. The most obvious generalization is the following improper integral:

L(c) = lim_{M→∞} ∫_[c−M, c+M] x P(dx)

for a real number c.
By the Lebesgue dominated convergence theorem ([11], p. 172), this notion coincides with the ordinary mean when x is integrable with respect to P. In his great foundational work ([12], p. 40), Kolmogorov noted this option in the case c = 0 and observed that it does not require integrability of |x|. Indeed, if X is a random variable obeying the Cauchy distribution with density f(x) = 1/(π(1 + x²)), then X satisfies L(c) ≡ 0 for every choice of c.
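The truncated integrals for the Cauchy density can be evaluated in closed form, which makes L(c) ≡ 0 easy to check numerically; a small sketch:

```python
import math

def truncated_cauchy_mean(c, M):
    """Closed form of the integral of x / (pi * (1 + x^2)) over [c - M, c + M]."""
    a, b = c - M, c + M
    return (math.log(1.0 + b * b) - math.log(1.0 + a * a)) / (2.0 * math.pi)

# For any fixed center c, the truncated mean tends to 0 as M grows.
for M in (10.0, 1e3, 1e6):
    print(truncated_cauchy_mean(5.0, M))
```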
Two related notions of mean are:

(L-1) lim_{M→∞} ∫_[a−M, b+M] x P(dx)

and:

(L-2) lim_{a→−∞, b→∞} ∫_[a,b] x P(dx),

where a ≤ b. It is easily seen that the expression in L-1 coincides with L((a + b)/2), since [a − M, b + M] is the interval with center (a + b)/2 and half-width M + (b − a)/2. As for L-2, we have the following result.
Proposition 1. Let X be a random variable with probability measure P. Then, lim_{a→−∞, b→∞} ∫_[a,b] x P(dx), with a and b tending to their limits independently, exists if and only if x is integrable with respect to P. Furthermore, when this limit exists, it is equal to E(X).
Returning now to L(c), we stipulate that −∞ ≤ L(c) ≤ ∞. This gives us flexibility in characterizing what can happen. A necessary, but not sufficient, condition for L(c) to be finite, we note in passing, is that for each positive real number p:

lim_{M→∞} ∫_(c+M, c+M+p] x P(dx) = 0 and lim_{M→∞} ∫_[c−M−p, c−M) x P(dx) = 0.

Lemma 1. Let X be a random variable with probability measure P, and let c_1 and c_2 be real numbers with c_1 < c_2. Then, there are three possibilities: (i) if L(c_1) = ∞, then L(c_2) = ∞; (ii) if L(c_2) = −∞, then L(c_1) = −∞; and (iii) if L(c_1) and L(c_2) both exist in [−∞, ∞], then L(c_1) = L(c_2).

Proof. Suppose c_1 < c_2. Then:

∫_[c_2−M, c_2+M] x P(dx) = ∫_[c_1−M, c_1+M] x P(dx) + ∫_(c_1+M, c_2+M] x P(dx) − ∫_[c_1−M, c_2−M) x P(dx). (1)

The second and third terms on the right side of Equation (1) are both non-negative for M sufficiently large (the integrand in the second is positive there, while the integrand in the third is negative, so subtracting it contributes non-negatively), and (i) and (ii) of Lemma 1 follow at once.
In the case of (iii), note that (i) and (ii) imply that if both L(c_1) and L(c_2) exist, then L(c_1) ≤ L(c_2). Suppose then, for contradiction, that L(c_1) and L(c_2) both exist with c_1 < c_2 and −∞ ≤ L(c_1) < L(c_2) ≤ ∞. Then, there is a positive number K such that K < L(c_2) − L(c_1), and for M sufficiently large:

K < ∫_(c_1+M, c_2+M] x P(dx) − ∫_[c_1−M, c_2−M) x P(dx) ≤ (M + d)[P(c_1 + M < X ≤ c_2 + M) + P(c_1 − M ≤ X < c_2 − M)],

where d = max{|c_2|, |c_1|}. Thus:

P(c_1 + M < X ≤ c_2 + M) + P(c_1 − M ≤ X < c_2 − M) > K/(M + d).

Now, replace M by M_j = M + j(c_2 − c_1) for each non-negative integer j to get:

P(c_1 + M_j < X ≤ c_2 + M_j) + P(c_1 − M_j ≤ X < c_2 − M_j) > K/(M_j + d).

Summing over these inequalities and noting that c_2 + M_j = c_1 + M_{j+1} and c_1 − M_j = c_2 − M_{j+1}, so that the intervals involved are pairwise disjoint, we obtain:

2 ≥ ∑_{j≥0} [P(c_1 + M_j < X ≤ c_2 + M_j) + P(c_1 − M_j ≤ X < c_2 − M_j)] ≥ ∑_{j≥0} K/(M_j + d) = ∞,

which is a contradiction. Thus, the possibility L(c_1) < L(c_2) is eliminated, and L(c_1) = L(c_2).
Theorem 1. Let X be a random variable with probability measure P. Then exactly one of these possibilities holds: (i) L(c) does not exist in [−∞, ∞] for any real number c; (ii) L(c) exists in (−∞, ∞) for exactly one real number c; (iii) L(c) exists in [−∞, ∞] for all real numbers c and is independent of c; (iv) there is a number c_0 such that L(c) = ∞ for c > c_0, L(c) does not exist for c < c_0, and L(c_0) equals ∞ or does not exist; or (v) there is a number c_0 such that L(c) = −∞ for c < c_0, L(c) does not exist for c > c_0, and L(c_0) equals −∞ or does not exist.
Now, suppose L(c) ≠ ±∞ for every real number c. Then, either L(c) does not exist for any c and Theorem 1 Part (i) holds, or L(c) exists for exactly one c and Theorem 1 Part (ii) holds, or L(c) exists at two or more points and is a finite number.
In the latter case, we can find c_1 and c_2 with c_1 < c_2 so that, by Lemma 1 Part (iii), L(c_1) = L(c_2) ∈ R. In this case, the last two terms in Equation (1) each tend to zero as M tends to infinity. By changes of the variable, we conclude that:

lim_{M→∞} ∫_(M, M+p] x P(dx) = 0 (2)

and:

lim_{M→∞} ∫_[−M−p, −M) x P(dx) = 0 (3)

hold, where p = c_2 − c_1. It follows immediately that (2) and (3) also hold for any p' with 0 < p' ≤ p. Likewise, if (2) and (3) hold for p, they also hold for 2p, 3p, ..., and indeed for np, where n is any fixed positive integer. Accordingly, (2) and (3) hold for all positive real numbers p. Applying this result to the two terms on the far right side of Equation (1), we find that:

L(c) = L(c_1)

for all real numbers c, and thus, Theorem 1 Part (iii) holds.
We give some examples to illustrate that each of the possibilities enumerated in Theorem 1 can occur. For simplicity, we use discrete random variables in most of our examples.
Example 1. Consider a random variable X whose probability measure is of the form: where δ_z is the (Dirac) probability measure whose value is one on any Borel subset of R that contains the real number z and whose value is zero on the remaining Borel subsets of R. The sum of the nonzero values is one, so this obviously defines a probability measure. However, the integral of x over the interval [c − M, c + M] is the difference between the size of the first set and the size of the second set below: the size of the set {n: the size of the set {n: where ⌊·⌋ is the floor function. For fixed c and sufficiently large M, the difference of the above quantities can assume the values zero and −1, and the integral does not settle down to either one. This is an instance of Theorem 1, Part (i).
Example 2. A random variable X can also be defined with the probability measure of the form:

P = ∑_{n≥1} (1/2^{n+1}) (δ_{2^n} + δ_{−2^n}).

Therefore, P is concentrated at the points ±2^n and assigns probability 1/2^{n+1} to each of these points. For this measure, L(0) equals zero by symmetry. However, L(c) does not exist for other choices of c. If c is positive, the integral of x over the closed interval [c − M, c + M] reduces by cancellation to its integral over the half-open interval (M − c, M + c]. This integral oscillates between zero and 1/2 for large M depending on whether a point 2^n is in the interval (M − c, M + c] or not. A similar behavior occurs when c < 0. This example is an instance of Theorem 1 Part (ii).

Example 3. Now, consider a random variable X having a probability density (with respect to Lebesgue measure on the real line) of the form:

f(x) = A/(C + x^a) for x ≥ 0 and f(x) = B/(D + |x|^b) for x < 0,

where a and b are numbers in (1, 2] and A, B, C, and D are suitable positive constants that guarantee that the density integrates to one. It is easy to see that L(c) ≡ ∞ for all c or −∞ for all c according as b > a or a > b. When a = b, then L(c) ≡ +∞ if AD − BC > 0 and L(c) ≡ −∞ if AD − BC < 0. When a = b and AD = BC, then L(c) is a real number independent of c, as in Part (iii) of Theorem 1. If we let a = b = 2 and (A, C) = (B, D) = (1/π, 1), we obtain the Cauchy distribution, which also illustrates Part (iii) of Theorem 1, with L(c) ≡ 0.

Example 4. Consider the probability measure: The integral of x over [c − M, c + M] has the value (2^{n_0} − 1)/3 or (2^{n_0} + 2^{n_0−1} − 1)/3, where n_0 = ⌊log(c + M)/log 3⌋, for large M. Since M and n_0 tend to infinity together, it follows that L(c) ≡ ∞ for c ≥ 0.
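For the measure of Example 2 (mass 1/2^{n+1} at each of ±2^n), the truncated integral over [c − M, c + M] can be computed directly; a small sketch showing the oscillation when c ≠ 0:

```python
def truncated_mean(c, M, n_max=60):
    """Integral of x over [c - M, c + M] for the measure putting mass
    2^-(n+1) at +2^n and at -2^n for n = 1, 2, ..., n_max."""
    total = 0.0
    for n in range(1, n_max + 1):
        for x in (2.0**n, -(2.0**n)):
            if c - M <= x <= c + M:
                total += x * 2.0 ** -(n + 1)  # each included atom contributes +/- 1/2
    return total

print(truncated_mean(0.0, 2.0**20))        # symmetric cancellation: 0.0
print(truncated_mean(1.0, 2.0**20))        # atom +2^20 is in, -2^20 is out: 0.5
print(truncated_mean(1.0, 2.0**20 + 2.0))  # both atoms are in again: 0.0
```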
On the other hand, if c = −d, where d > 0, the integral of x over [c − M, c + M] reduces to (2^{n_0} − 1)/3 or to (−1)/3 for large M, where n_0 = ⌊log(M − d)/log 3⌋, depending on whether a positive integer lies in the interval (log(M − d)/log 3, log(M + d)/log 3] or not. Thus, L(c) does not exist for c < 0.

Example 5. Another example for Part (iv) of Theorem 1 is the case where the probability measure is given by: Here, K is a suitably-chosen positive normalizer, which is easily seen to be in the interval (1/3, 1). For c > 0, the integral of x over [c − M, c + M] is greater than or equal to K(2^{n_0} − 1) for M sufficiently large, where n_0 = ⌊log(M − c)/log 3⌋. Since n_0 and M tend to infinity together, L(c) ≡ ∞ for all c > 0. When c = 0, the integral of x over [−M, M] reduces to K(2^{n_0} − 1) or to −K, where n_0 is the largest integer such that 3^{n_0} + (1/n_0) ≤ M, and the first or second reduction occurs according as M < 3^{n_0+1} or not. Thus, L(0) does not exist. By Theorem 1, L(c) does not exist for c < 0.
Other cases arising in Theorem 1, such as Part (v), are obtained by modifying the examples above, e.g., replacing X by −X or by X + a.

Weak Means
An implication of Theorem 1 is that if L(c) exists for more than one choice of c and is finite for one c, then L(c) exists for all c, is finite, and is independent of c. The case of the Cauchy distribution shows that this can happen without the ordinary mean existing. Accordingly, for a random variable X, we define the doubly-weak mean of X, denoted by E_WW(X), to be the common value of L(c) for all c when this common value exists and is in (−∞, ∞). In Theorem 4 below, we will consider alternative characterizations of the doubly-weak mean.
An intermediate notion, due to Kolmogorov ([12], pp. 64-66), exists between the ordinary mean and the doubly-weak one; it motivates our terminology. The weak mean of X, denoted by E_W(X), is defined as follows: E_W(X) is the quantity L(0) provided the latter exists in (−∞, ∞) and:

lim_{M→∞} M P(|X| > M) = 0. (4)

The following theorem is due to Kolmogorov. It indicates that the existence of the weak mean coincides precisely with the existence of a number for which the weak law of large numbers holds.
Theorem 2 (Kolmogorov, 1928). Let X be a random variable. Suppose that X_1, · · · , X_n, · · · are independent identically distributed copies of X with P_n the n-fold product distribution. Then, there is a real number m such that for each ε > 0:

lim_{n→∞} P_n(|(X_1 + · · · + X_n)/n − m| > ε) = 0

if and only if X has a weak mean, in which case m = E_W(X).

Proof. See [12], p. 65, [13], and [14], Theorems XII and XIII.
Proposition 2. Let X be a random variable.
(i) If X has a mean, then X has a weak mean, and E(X) = E_W(X); and (ii) if X has a weak mean, then X has a doubly-weak mean, and E_W(X) = E_WW(X).
Proof. In the case when X has a mean, the identity function x → x is integrable with respect to the probability measure P on the real line. In particular, the tail integrals:

∫_[n,∞) x P(dx) and ∫_(−∞,−n] x P(dx)

tend to zero as n tends to infinity. Since the absolute values of these integrals are larger, respectively, than nP(X ≥ n) and nP(X ≤ −n), it follows that lim_{n→∞} nP(|X| > n) = 0. Likewise, by Lebesgue's dominated convergence theorem, L(0) = E(X). This proves (i). Now, suppose X has a weak mean. If c_1 < c_2 and ε > 0 are given, then for sufficiently large M:

∫_(c_1+M, c_2+M] x P(dx) ≤ (c_2 + M) P(|X| > c_1 + M),

and by Equation (4) the right side is as close to zero as we like. Thus:

lim_{M→∞} ∫_(c_1+M, c_2+M] x P(dx) = 0,

and similarly for the interval [c_1 − M, c_2 − M).
Accordingly, from Equation (1) in the proof of Lemma 1, it follows that whenever one of L(c_2) or L(c_1) exists and is finite, the other exists and is equal to it. Since L(0) = m, then L(c) exists for all c, L(c) ≡ m, and m is the doubly-weak mean of X.
Kolmogorov, in [12], p. 66, gave an example where the weak law holds, but the strong law does not. Kolmogorov's example is a random variable X whose probability distribution P is given by: where C is a suitable normalizing constant and A is any Borel set in the reals. Cauchy random variables have L(c) existing for all c, independent of c, but violate the weak law by not decaying rapidly enough at infinity: M P(|X| > M) does not tend to zero. These examples demonstrate that the notions of mean, weak mean, and doubly-weak mean are strictly distinct. Since the strong law and the weak law correspond precisely to the mean and the weak mean, it is natural to wonder if another such law corresponds to the doubly-weak mean.
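Kolmogorov's tail condition (Equation (4)) fails for the standard Cauchy distribution in an explicit way: M P(|X| > M) tends to 2/π rather than 0. A quick numerical check:

```python
import math

def cauchy_weighted_tail(M):
    """M * P(|X| > M) for the standard Cauchy distribution,
    using P(|X| > M) = 1 - (2 / pi) * arctan(M)."""
    return M * (1.0 - 2.0 * math.atan(M) / math.pi)

for M in (1e2, 1e4, 1e6):
    print(cauchy_weighted_tail(M))  # approaches 2/pi ~ 0.6366, not 0
```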
Theorem 3 (Doubly-weak law of large numbers). Let X be a random variable. For c in R and M > 0, let cX_M denote the random variable defined by:

cX_M = X if c − M ≤ X ≤ c + M, and cX_M = 0 otherwise.

Suppose that (cX_M)_1, (cX_M)_2, · · · are independent identically distributed copies of cX_M with P_n their n-fold product distribution. If there is a real number m such that for each ε > 0, for c = c_1 and c = c_2 for distinct real numbers c_1 and c_2, and for M sufficiently large:

P_n(|((cX_M)_1 + · · · + (cX_M)_n)/n − m| > ε) < ε for all n sufficiently large, (5)

then X has doubly-weak mean E_WW(X) = m. Conversely, if X has doubly-weak mean E_WW(X) = m, then Equation (5) holds for each ε > 0, for each c ∈ R, and for M sufficiently large.
Proof. Suppose that cX_M is as above and (5) holds. Note that the probability distribution of cX_M agrees with P on Borel subsets of [c − M, c + M] not containing zero and places all remaining mass at zero. The variable cX_M is a bounded variable and hence has a mean given by:

E(cX_M) = ∫_[c−M, c+M] x P(dx).
Furthermore, this variable obeys the weak law of large numbers, i.e.,

lim_{n→∞} P_n(|((cX_M)_1 + · · · + (cX_M)_n)/n − E(cX_M)| > ε) = 0 (6)

for every ε > 0, every c ∈ R, and every M > 0. Combining (5) and (6), we conclude that:

|E(cX_M) − m| ≤ 2ε

for c ∈ {c_1, c_2} and M sufficiently large. Since the inequality |E(cX_M) − m| ≤ 2ε does not depend on n, we conclude that it must be true for all M sufficiently large. However, ε is an arbitrary positive number. Therefore,

lim_{M→∞} E(cX_M) = m for c ∈ {c_1, c_2}, (7)

that is to say, L(c_1) and L(c_2) exist and are equal to the real number m. By Theorem 1, L(c) ≡ m for every real number c, and m is the doubly-weak mean of X.
To prove the converse, note that since L(c) ≡ E_WW(X) = m for all c, Equation (7) holds for each c ∈ R. Thus, for any ε > 0, any c, and any M sufficiently large (dependent on both ε and c):

|E(cX_M) − m| < ε.

Combining the above result with (6), we obtain (5) with ε replaced by 2ε. Since ε is an arbitrary positive number, the factor of two is irrelevant. Thus, (5) holds for all ε, all c, and M sufficiently large.
The doubly-weak mean can be characterized in a different manner as follows, based on the argument in Theorem 1.
Theorem 4. The random variable X has a doubly-weak mean if and only if any of the following equivalent conditions holds: (i) L(c) exists in (−∞, ∞) for all real numbers c and is independent of c; or (ii) L(c) exists in (−∞, ∞) for two distinct real numbers c; or (iii) L(c) exists in (−∞, ∞) for some real number c, and for every positive real number p, lim_{M→∞} M P(M < |X| ≤ M + p) = 0; or (iv) L(c) exists in (−∞, ∞) for some real number c, and there exists a real number p ≥ 1 such that lim_{n→∞} n P(n < |X| ≤ n + p) = 0. Of course, n above denotes an integer variable, while M is a real variable. Any one of these conditions can be taken as defining when a doubly-weak mean exists, in which case the doubly-weak mean is the (common) value of L(c). The closest in spirit to Kolmogorov's definition for the weak mean (cf. Equation (4)) is Condition (iv).
Proof. That (i) and (ii) are equivalent is a consequence of Theorem 1. If (ii) holds, it also follows from Theorem 1 that −∞ < L(c_1) = L(c_2) < ∞ for real numbers c_1 and c_2 with c_1 < c_2. The proof of Theorem 1 then shows that Equations (2) and (3) hold for any positive number p. Since:

M P(M < |X| ≤ M + p) ≤ ∫_(M, M+p] x P(dx) − ∫_[−M−p, −M) x P(dx),

it follows that (ii) implies (iii). Condition (iv) is a special case of (iii).
We must still show that (iv) implies (i) or (ii). The condition that:

lim_{n→∞} n P(n < |X| ≤ n + p) = 0

for some p ≥ 1 implies the same for any smaller p > 0. For a number k larger than p, lying in the interval (jp, (j + 1)p] with j a positive integer, choose n_0 so large that for all integers n ≥ n_0:

n P(n < |X| ≤ n + p) < ε/(2(j + 1)).

The interval (n, n + k] is covered by the j + 1 intervals (n + ip, n + (i + 1)p], i = 0, . . . , j, and since p ≥ 1, each of these is covered by at most two intervals of the form (n', n' + p] with n' ≥ n an integer. Then:

n P(n < |X| ≤ n + k) < ε

for n ≥ n_0. Thus, (iv) holds for any p > 0.
If L(c) exists for some number c and d is another number, larger than c without loss of generality, then in imitation of Equation (1):

∫_[d−M, d+M] x P(dx) = ∫_[c−M, c+M] x P(dx) + ∫_(c+M, d+M] x P(dx) − ∫_[c−M, d−M) x P(dx). (8)

However, for a positive number M > max(−c, d):

∫_(c+M, d+M] x P(dx) ≤ (d + M) P(n_1 < |X| ≤ n_1 + p)

and:

|∫_[c−M, d−M) x P(dx)| ≤ (M − c) P(n_2 < |X| ≤ n_2 + p),

where n_1 and n_2 are integers such that (c + M, d + M] ⊆ (n_1, n_1 + p] and (M − d, M − c] ⊆ (n_2, n_2 + p]; such integers exist, say n_1 = ⌊c + M⌋ and n_2 = ⌊M − d⌋, once p > d − c + 2. For M sufficiently large, n_1 and n_2 are as large as we like, but with p fixed, the right sides of these two inequalities tend to zero. Accordingly, in the limit as M tends to ∞ in Equation (8), we obtain L(d) = L(c). This establishes (ii), and hence (i).
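For the standard Cauchy distribution, condition (iii) of Theorem 4 can be verified in closed form; a small sketch:

```python
import math

def annulus_mass(M, p):
    """M * P(M < |X| <= M + p) for the standard Cauchy distribution."""
    return M * (2.0 / math.pi) * (math.atan(M + p) - math.atan(M))

# The quantity behaves like 2p / (pi * M) and hence tends to zero.
for M in (1e1, 1e3, 1e5):
    print(annulus_mass(M, 1.0))
```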
The doubly-weak mean may appear to be the last possibility for generalizing the mean. However, Theorem 1 suggests yet another generalization. We say that X has a superweak mean if L(c) exists in [−∞, ∞] for some real number c. Example 2 is a case where a finite superweak mean exists, but not a doubly-weak or weak mean. We have thus covered every case in Theorem 1, since Example 1 shows that there are cases where even the superweak mean does not exist.
We are now in a position to treat symmetric stable distributions.
Proposition 3. Let X be a real random variable having a symmetric stable distribution with location parameter a. Then, X has a doubly-weak mean E_WW(X) = a.
Proof. In the notation of [7], p. 14, when the characteristic exponent α is in (1, 2], the ordinary mean E(X) = a exists; see [7], p. 22. We assume the symmetry parameter β = 0. When α = 1, we are dealing with the Cauchy distribution, which obviously satisfies (iii) and (iv) of Theorem 4. Therefore, it suffices to consider the cases 0 < α < 1. Furthermore, the variable X can be taken to have location parameter a = 0 and scale parameter γ = 1, since we can replace X by (X − a)/γ^{1/α} without loss of generality. Then, note that the density function f of such an X is symmetric about zero, and hence, L(0) = 0. It thus suffices to show that M P(M < X ≤ M + p) tends to zero as M tends to ∞ for a fixed p > 0. Using Equation 2.9 of [7], p. 16, we obtain:

M P(M < X ≤ M + p) = M p f(M*),

which tends to zero as M tends to ∞, since f(M*) = O((M*)^{−(1+α)}). Here, M* is a number in (M, M + p) that depends on M and p and is guaranteed to exist by the integral form of the mean value theorem.
Another proposition that can be established is the following.
Proposition 4. Let X and Y be random variables both of which have a weak mean. Then, X + Y has a weak mean, and E_W(X + Y) = E_W(X) + E_W(Y).

Proof. This can be shown directly from the definition of the weak mean and Equation (4) at the beginning of Section 3. However, here, we will use the weak law. Let (X + Y)_1, ..., (X + Y)_n, ... be a sequence of independent copies of X + Y with associated product measure P_n for the first n of these variables. Let X_1, ..., X_n, ... be a sequence of independent copies of X and Y_1, ..., Y_n, ... a sequence of independent copies of Y, and without loss of generality, suppose X_1 + Y_1, ..., X_n + Y_n, ... are independent. Let Q_n, Q'_n, and Q_2n be the product measure spaces associated respectively with the first n terms of these sequences. Then, for ε > 0:

P_n(|((X + Y)_1 + · · · + (X + Y)_n)/n − (E_W(X) + E_W(Y))| ≤ ε)
≥ Q_n(|(X_1 + · · · + X_n)/n − E_W(X)| ≤ ε/2) + Q'_n(|(Y_1 + · · · + Y_n)/n − E_W(Y)| ≤ ε/2) − 1.

Since X and Y have weak means, the last two lines converge to 1 + 1 − 1 = 1 as n tends to ∞. Accordingly, the first expression does also, and our result follows by Theorem 2.
A similar result is not available for doubly-weak means except in special cases (e.g., linear combinations of independent copies of symmetric stable distributions).

Mollifiers
Richard Feynman was famous for his integration techniques, some of which are recorded in the book of Mathews and Walker [9], based on lectures Feynman gave at Cornell. Feynman's ideas, as noted earlier, partly motivated our investigation.
Mollifiers are used to aid approximation of the delta function and to smooth functions, but another use is to regularize behavior at ±∞. Mollifiers fall under the heading of summation methods [15]. A mollifier can be used to reinterpret integrals, or renormalize them, in a manner that makes them finite. This method is used to "evaluate" the integrals of sin bx and sin x/x on [0, ∞) in [9], pp. 60, 91.
This idea can be used to generalize the notion of the mean. We introduce a function φ_λ(x) that depends on a parameter λ so that x → φ_λ(x)x is integrable with respect to P for λ ≠ λ_0 and φ_λ(x) → 1 for each x as λ → λ_0. Then, we define:

L(φ) = lim_{λ→λ_0} ∫_R φ_λ(x) x P(dx).

In the case of the means L(c) discussed in earlier sections, the multiplier can be taken to be:

φ_λ(x) = χ_[c−M, c+M](x), (9)

where χ_A is the characteristic function of the set A, |λ| = 1/M, and λ_0 = 0. Then, L(φ) is the same as L(c). However, there are dangers associated with mollifiers that the following example illustrates.
Example 6. Define a function φ_{λ,D} for λ and D in R by: Evidently, φ_{λ,D} is a well-behaved function, integrable and dying off at ±∞. Furthermore, {φ_{λ,D}} converges pointwise to the constant function identically equal to one as λ tends to λ_0 = 0 with D fixed.
Suppose we use this family of functions as a mollifier to determine a mean for a variable obeying the Cauchy distribution. Let m(λ, D) be defined by: Now: for any positive real number K. Thus: Letting K tend to infinity, we find that lim_{λ→0} m(λ, 1) = 1. Hence, the mollifier-induced mean of the standard Cauchy distribution is: However, D is arbitrary and depends on the choice of the mollifier!
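The same D-dependence can be reproduced with a simpler asymmetric scheme of our own (the interval [−M, DM] below is our stand-in for the asymmetric mollifier, not the φ_{λ,D} of the example): truncating the Cauchy integral over [−M, DM] produces the limit (ln D)/π, which depends on the arbitrary asymmetry parameter D rather than on X.

```python
import math

def asymmetric_truncated_mean(M, D):
    """Closed form of the integral of x / (pi * (1 + x^2)) over [-M, D * M]."""
    return (math.log(1.0 + (D * M) ** 2) - math.log(1.0 + M * M)) / (2.0 * math.pi)

# The limit as M grows is log(D) / pi: it depends on the asymmetry D.
for D in (1.0, 2.0, 10.0):
    print(asymmetric_truncated_mean(1e8, D), math.log(D) / math.pi)
```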
The underlying problem is that the choice of the mollifier is asymmetric. There are two issues: symmetry and a center of symmetry. The doubly-weak mean discussed in the previous section addresses these two issues by making use of the family of improper integrals (9) and requiring that E_WW(X) = L(c) be the same regardless of the center c.
This motivates the following definition. We call {φ_λ} a mollifier for X with center c if and only if, for each real number λ ≠ 0 in some neighborhood of zero, we are given a function x → φ_λ(x) taking R into R with the following properties:
1. x → φ_λ(x)x is integrable with respect to P;
2. φ_λ(x) tends to one for each x as λ tends to zero;
3. φ_λ is symmetric about the center c, that is, φ_λ(c + t) = φ_λ(c − t) for all real t;
4. φ_λ(c) = 1.

Without loss of generality, we have taken the limiting value of λ to be zero in this definition. Furthermore, since φ_λ(c) tends to 1 as λ tends to zero, if {φ_λ} does not satisfy (4), we can replace this family by {φ_λ/φ_λ(c)}.
Examples of mollifiers, in addition to (9), include: The examples in (9) have center c, while those just mentioned have center zero for suitable X's. In general, a mollifier with center c can be created by taking one with center zero, call it φ_λ(x), and replacing it by φ_λ(x − c). However, we cannot be sure that the new mollifier will satisfy the property that x → xφ_λ(x − c) remains integrable with respect to P (although it will be with respect to P_Y, where Y = X − c).
Let {φ_λ} be a mollifier for the random variable X, and as before, let:

L(φ) = lim_{λ→0} ∫_R φ_λ(x) x P(dx).

It is natural to consider how L(φ) behaves in relation to the family {L(c) : c ∈ R} described in Theorem 1. We content ourselves with the following theorem and a few examples.

Theorem 5. Let {φ_λ} be a mollifier for the random variable X with center c. Suppose that for each λ ≠ 0, φ_λ(x) is an absolutely continuous function of x on each finite subinterval of R. Suppose also that there is a positive constant K such that for each λ ≠ 0, x → φ_λ(x) has variation bounded by K on R. If L(c) exists for X and the number c, then L(φ) exists and is equal to L(c).
Proof. Without loss of generality, we assume that c = 0. Consider the integral identities (10), where 0 < M < M'. Here, we have used the differentiability of φ_λ, the integrability of φ_λ', and Fubini's theorem. In the last line of (10), we have made a change of variables from t to −t and used the identity φ_λ'(−t) = −φ_λ'(t), which follows from the symmetry of φ_λ about zero.
The last line of (10) has an absolute value less than or equal to the integral (11) of |φ_λ'(t)| against the expression in absolute values. Since L(0) is finite, the second expression in absolute values in this integral can be made smaller than ε/(2K), where ε is a given positive number, for M sufficiently large with M ≤ t ≤ M'. However, φ_λ has variation bounded by K, so that the total mass of |φ_λ'(t)| dt is at most K. Therefore, the entire integral in (11) is smaller than ε/2.
Let us also require that M be so large that condition (12) holds. Now, pick a positive number δ so that |λ| < δ implies (13). This can be done since, for fixed M, the integrand in (13) is bounded due to the bounded variation of φ_λ, and hence, the Lebesgue dominated convergence theorem implies that the integral inside the absolute value tends to zero as λ tends to zero.
Theorem 5 indicates that some mollifiers are guaranteed to yield the same answer for the mean as our earlier techniques. One question, though, is whether they will sometimes give an answer when the previous methods do not.
A theorem of Abel asserts that a power series in z that converges at a point z_0 on the unit circle is absolutely convergent at each point in the open unit disk. Moreover, the analytic function f(z) to which it converges has the property that lim_{z→z_0} f(z) = f(z_0), provided z tends to z_0 non-tangentially from the interior of the unit disk. In particular, lim_{r→1−} ∑ a_n r^n = ∑ a_n, provided the latter converges.
A well-known counterexample to the converse of Abel's theorem is the series ∑_{n≥0} (−1)^n. This series diverges since its partial sums oscillate between one and zero. However, ∑_{n≥0} (−1)^n r^n converges to 1/(1 + r) for |r| < 1 and, as r tends to 1−, tends to 1/2. The quantity r^n serves as a kind of mollifier for the original series and enables a kind of convergence.
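A small numerical sketch of this Abel summation:

```python
def abel_mean(r, n_terms=10_000):
    """Partial Abel sum: sum of (-1)^n * r^n over n < n_terms, for 0 < r < 1."""
    return sum((-1.0) ** n * r**n for n in range(n_terms))

# The raw partial sums oscillate (the series diverges), yet the Abel means
# approach 1 / (1 + r), which tends to 1/2 as r -> 1-.
for r in (0.9, 0.99, 0.999):
    print(abel_mean(r), 1.0 / (1.0 + r))
```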
Example 7. We can use the above to find a counterexample to the converse of Theorem 5, i.e., a case where L(φ) exists even though L(c) fails to exist for all c. Let X be a random variable taking the values x_n, for each non-zero integer n, with probabilities P(X = x_n) as given below. Let θ : [0, ∞) → [0, ∞) be defined in the following way: for each positive integer n, set θ(x) = n if n³ − n² ≤ x ≤ n³ + n², and let θ take the interval [n³ + n², (n + 1)³ − (n + 1)²] onto the interval [n, n + 1] in an increasing (and smooth) fashion. A consequence of this definition is that θ(|x_n|) = |n| for non-zero integers n.
If we now try to calculate the improper integral L(c) for X for some number c, we obtain an expression whose last form combines two sums. To see that L(c) does not exist for any c, consider the following. The first sum in the last expression, for M sufficiently large, will be the sum of consecutive nonzero integers from n_0 to n_1 with n_0 < 0 < n_1, and the second sum will be ((−1)^{n_0} + (−1)^{n_0+1} + · · · + (−1)^{n_1} − (−1)^0), where the power (−1)^0 is added and then subtracted. The second sum will thus be either 0, −1, or −2, depending on the parity of n_0 and n_1. The first sum will oscillate among zero, an increasingly large positive number, and an increasingly large negative number. Unless the first sum is eventually equal to zero for all sufficiently large M, there is no way the combined sum can have a finite limit. However, if the first sum is identically zero for large M, then n_0 = −n_1, and the second sum equals zero or −2, depending on the parity of n_1. If M yields one parity for n_1, a slight increase in M will reverse the parity. Therefore, the sum will oscillate between zero and −2.
Now, consider the existence of L(φ), where we use the mollifier φ_λ just defined. As λ tends to zero, r tends to one, and the above tends to L(φ) = −K. Hence, L(φ) exists even though L(c) exists for no c.

Cumulative Distributions and Characteristic Functions
Two familiar alternative formulas for the ordinary mean are:

E(X) = ∫_0^∞ (1 − F(t−)) dt − ∫_0^∞ F(−t) dt

and:

E(X) = −i φ_X'(0), where φ_X(t) = E(e^{itX}) is the characteristic function of X.

Here, by definition, F(t) = P(X ≤ t) for t in R and F(t−) = P(X < t) = lim_{x→t−} F(x). We review these formulas below and relate them to our notions of the mean.
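The first formula is easy to check numerically for a distribution with a known mean; a sketch with X exponential of rate one (our choice of example), where F(t) = 1 − e^{−t} for t ≥ 0 and F(−t) = 0:

```python
import math

def tail_integral(T=50.0, steps=200_000):
    """Midpoint-rule approximation of the integral of 1 - F(t) = exp(-t)
    over [0, T]; the neglected tail beyond T is below exp(-50)."""
    h = T / steps
    return h * sum(math.exp(-(i + 0.5) * h) for i in range(steps))

print(tail_integral())  # ~1.0, the mean of an Exponential(1) variable
```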
In the case of the cumulative distribution, we have the following theorem.
Theorem 6. Let X be a random variable with probability distribution P and cumulative distribution F. Then: (i) E(X) exists if and only if 1 − F(t−) and F(−t) are in L^1([0, ∞)) with respect to the Lebesgue measure, in which case E(X) = ∫_0^∞ (1 − F(t−) − F(−t)) dt; and (ii) for a real number c, L(c) exists and equals lim_{M_1→∞} ∫_0^{M_1} (1 − F(t−) − F(−t)) dt if and only if the latter limit exists and lim_{M→∞} M(P(X > c + M) − P(c − M > X)) = 0.

Proof. In order for E(X) to exist, it is necessary and sufficient that x be integrable with respect to P on (−∞, ∞). In particular, this is equivalent to x being integrable with respect to P on both [0, ∞) and (−∞, 0], with:

∫_[0,∞) x P(dx) = ∫_0^∞ P(X ≥ t) dt = ∫_0^∞ (1 − F(t−)) dt and −∫_(−∞,0] x P(dx) = ∫_0^∞ P(X ≤ −t) dt = ∫_0^∞ F(−t) dt,

where again Fubini's theorem has been used. This proves (i).

As for (ii), the integral of x over [c − M, c + M] differs from:

∫_0^{M_1} (1 − F(t−) − F(−t)) dt = ∫_0^{M_1} (P(X ≥ t) − P(−t ≥ X)) dt,

where we have introduced a new variable M_1, by the quantity:

−∫_0^{c+M} P(X > c + M) dt + ∫_0^{M−c} P(c − M > X) dt + ∫_{M_1}^{c+M} P(X ≥ t) dt − ∫_{M_1}^{M−c} P(−t > X) dt
= −M P(X > c + M) + M P(c − M > X) − c P(X > c + M) − c P(c − M > X) + ∫_{M_1}^{c+M} P(X ≥ t) dt − ∫_{M_1}^{M−c} P(−t > X) dt.

If we suppose that lim_{K→∞} ∫_0^K (1 − F(t−) − F(−t)) dt exists, then, taking M_1 between M − c and M + c, the combination of the last two terms tends to zero as M and M_1 tend to ∞, since ∫_{M−c}^{M+c} P(X ≥ t) dt and ∫_{M−c}^{M+c} P(X ≤ −t) dt tend to zero as M gets large. Likewise, the previous two terms with multiplier c tend to zero as M gets large. Therefore, L(c) exists and equals lim_{M_1→∞} ∫_0^{M_1} (1 − F(t−) − F(−t)) dt if and only if lim_{M→∞} M(P(X > c + M) − P(c − M > X)) = 0.