Article

Asymptotic Results in Broken Stick Models: The Approach via Lorenz Curves

“Gheorghe Mihoc-Caius Iacob” Institute of Mathematical Statistics and Applied Mathematics, 050711 Bucharest, Romania
Mathematics 2020, 8(4), 625; https://doi.org/10.3390/math8040625
Submission received: 29 January 2020 / Revised: 1 April 2020 / Accepted: 7 April 2020 / Published: 18 April 2020
(This article belongs to the Special Issue Probability, Statistics and Their Applications)

Abstract

A stick of length 1 is broken at random into $n$ smaller sticks. How much inequality does this procedure produce? What happens if, instead of breaking a stick, we break a square? What happens asymptotically? Which is the most egalitarian distribution of the smaller sticks (or rectangles)? Inequality is usually studied by means of a Lorenz curve: the more egalitarian a distribution, the closer its Lorenz curve is to the first diagonal of $[0,1]^2$. This is why in the first section we study the space of Lorenz curves. What is the limit of a convergent sequence of Lorenz curves? We try to answer these questions first in the deterministic case and then, based on the results obtained there, in the stochastic one.

1. Introduction: Defining the Problem

Some time ago we became interested in the emergence and evolution of inequality when a whole is divided at random into $n$ pieces. How much inequality is there in that division? The whole may be a surface, a population, the total GDP of the world, the number of species of insects or birds, etc. What does “at random” mean, and how can we model this phenomenon?
This is the so-called “broken stick model”. It was introduced by MacArthur [1] in 1957, and it has been extensively studied since then. A probabilistic approach can be found in Cohen [2]. Since “at random” usually means “uniformly distributed”, these papers were about uniform spacings. Many books and papers deal with them, e.g., Aly [3], Bansal et al. [4], Beirlant [5], Barton and David [6], Csörgö [7], Cucala [8], Durbin [9], Le Cam [10], Pyke [11,12], Rao [13], Stephens [14,15], Tung [16], Wilks [17].
The inequality produced by these models was also considered via Lorenz curves, generalized Lorenz curves, and Gini coefficients (see for instance Arnold [18,19], Sarabia et al. [20] or Stephens [14,15]). The convergence of the empirical Lorenz curves to a theoretical one was studied by Goldie in 1977 (see Reference [21]). The order between Lorenz curves is a particular stochastic ordering (see Marshall and Olkin [22] and Stoyan [23]).
Our study was prompted by a curious observation: if we replace “at random = uniformly distributed” with “at random = following any absolutely continuous distribution on [0, 1]”, the inequality produced in the broken stick model seems to increase. Simulations with different distributions led us to conjecture that the Lorenz curves of the spacings always have a limit, and that this limit lies below the graph of $L(x) = x + (1-x)\ln(1-x)$, meaning that all the other distributions of spacings dominate, in the Lorenz order, the spacings produced by the uniform distribution.
We prove this in a particular case.
In order that this paper be as self-contained as possible, we study in the first three sections the case when the spacings are produced by a deterministic algorithm, not by a random one. The main results are Proposition 14 and its analog in terms of spacings, Proposition 20, which say that some mixtures increase the inequality.
The last two sections focus on the random case. The main result is Proposition 27, which implies that if the density of the random variables used to break the stick is a step function, then the limit of the corresponding Lorenz curves exists and lies below the Lorenz curve of the exponential distribution. The last section deals with the broken rectangle: the limit of the Lorenz curves exists, and one can compute its Gini coefficient even though we do not have an analytic formula for the Lorenz curve.
We have collected in Appendix A several results which may be well known but for which we were not able to find appropriate references.

2. Generalities: The Lorenz and Pre-Lorenz Curve of a Probability Distribution on the Non-Negative Half-Line

Notation. Let $X \ge 0$ be a non-negative random variable. We denote by the same letter $F$ (or $F_X$) both its distribution and its distribution function. There is no danger of confusion: if $B$ is a Borel set, then $F(B)$ means $P(X \in B)$, while if $x \in \mathbb{R}$, then $F(x)$ means $F((-\infty, x]) = P(X \le x)$. Moreover, as $F_X(x) = 0$ for all $x < 0$, we may as well regard $F$ as a function $F : [0, \infty) \to \mathbb{R}$.
We denote by $F^{-1} : [0, 1) \to \mathbb{R}$ its superior quantile, defined by $F^{-1}(p) = \inf\{t : F(t) > p\}$.
Notice that we define the quantiles only on the interval $[0, 1)$, since $F^{-1}(1)$ may be $\infty$ if the distribution does not have compact support.
The expectation $EX$ depends only on $F$; in terms of the distribution it will be denoted by $e(F)$. Since $X$ is non-negative, it is more or less obvious that
$$e(F) = \int x\,dF(x) = \int_0^1 F^{-1}(y)\,dy,$$
where by $\int_0^1 F^{-1}(y)\,dy$ we understand the (improper) Riemann integral $\lim_{\varepsilon \to 0} \int_0^{1-\varepsilon} F^{-1}(y)\,dy$. If $X$ is integrable, then $e(F) < \infty$; otherwise $e(F) = \infty$.
Let $\Lambda_F : [0, 1) \to [0, \infty)$ be defined by
$$\Lambda_F(p) = \int_0^p F^{-1}(y)\,dy.$$
As $F^{-1}$ is non-decreasing, $\Lambda_F(p) \le p\,F^{-1}(p) < \infty$; thus the definition of $\Lambda_F$ makes sense. We call $\Lambda_F$ the pre-Lorenz curve of the distribution $F$.
The Lorenz curve is defined by
$$L_F(p) = \frac{\Lambda_F(p)}{\Lambda_F(1-0)}, \qquad 0 \le p < 1,$$
and makes sense only for distributions with finite (and positive) expectation. Here and in the sequel, $\Lambda_F(1-0) = \lim_{p \uparrow 1} \Lambda_F(p)$; this limit exists by monotonicity.
The pre-Lorenz curve makes sense for any probability distribution on [0, ∞), even if its expectation is infinite.
The pre-Lorenz curve was introduced by Shorrocks [24] in 1983 under the name of “Generalized Lorenz Curve”, and it was studied by many authors under this name. We think that “pre-Lorenz” is a more appropriate name, since one computes the Lorenz curve after computing its pre-Lorenz one and dividing by the expectation of the distribution.
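A minimal numerical sketch of these definitions, not taken from the article: it approximates $\Lambda_F(p)$ by a midpoint Riemann sum of a user-supplied quantile function and divides by the same sum taken up to $1$. The names `pre_lorenz` and `lorenz` are hypothetical, and the midpoint rule is our own choice, made because it never evaluates $F^{-1}$ at $y = 1$, where the quantile may be infinite. For the exponential law it should reproduce $L(p) = p + (1-p)\ln(1-p)$, the curve mentioned in the Introduction.

```python
import math
from typing import Callable

Quantile = Callable[[float], float]


def pre_lorenz(quantile: Quantile, p: float, steps: int = 100_000) -> float:
    """Approximate Lambda_F(p) = int_0^p F^{-1}(y) dy with the midpoint rule."""
    h = p / steps
    # The midpoints (i + 1/2) h lie strictly inside (0, p), so y = 1 is never used.
    return h * sum(quantile((i + 0.5) * h) for i in range(steps))


def lorenz(quantile: Quantile, p: float, steps: int = 100_000) -> float:
    """Approximate L_F(p) = Lambda_F(p) / Lambda_F(1 - 0); assumes 0 < e(F) < infinity."""
    return pre_lorenz(quantile, p, steps) / pre_lorenz(quantile, 1.0, steps)


if __name__ == "__main__":
    # Uniform(0, 1): F^{-1}(y) = y, hence L_F(p) = p^2.
    print(lorenz(lambda y: y, 0.5))
    # Exponential(1): F^{-1}(y) = -ln(1 - y), hence L_F(p) = p + (1 - p) ln(1 - p).
    print(lorenz(lambda y: -math.log(1.0 - y), 0.5), 0.5 + 0.5 * math.log(0.5))
```

The approximation degrades when $F^{-1}$ grows very quickly near 1; whenever a closed form for $\Lambda_F$ is available, it should be preferred.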
Example 1.
Let $F = \begin{pmatrix} a & b \\ p & q \end{pmatrix} = p\,\delta_a + q\,\delta_b$ with $0 \le a < b$ and $q = 1 - p$.
Here and in the sequel, $\delta_a$ denotes the Dirac measure at $a$, defined by $\delta_a(B) = 1_B(a) = \begin{cases} 1 & \text{if } a \in B \\ 0 & \text{elsewhere.} \end{cases}$
Then $F^{-1}(t) = a\,1_{[0,p)}(t) + b\,1_{[p,1)}(t)$; therefore $\Lambda_F(t) = a t\,1_{[0,p)}(t) + (ap + b(t-p))\,1_{[p,1)}(t)$, $\Lambda_F(1-0) = ap + bq$, and
$$L_F(t) = \begin{cases} \dfrac{at}{ap + bq} & \text{if } t < p \\[2mm] \dfrac{ap + b(t-p)}{ap + bq} & \text{if } p \le t < 1. \end{cases}$$
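The piecewise formula of Example 1 translates directly into code. In this sketch (our own illustration; the helper names are hypothetical), `two_point_lorenz` implements the closed form, while `step_quantile_integral` integrates the step quantile $a\,1_{[0,p)} + b\,1_{[p,1)}$ directly, so that the two can be compared.

```python
def two_point_lorenz(a: float, b: float, p: float, t: float) -> float:
    """Closed form of L_F(t) for F = p*delta_a + (1 - p)*delta_b, 0 <= a < b, 0 < p < 1."""
    mean = a * p + b * (1.0 - p)          # Lambda_F(1 - 0) = ap + bq
    if t < p:
        return a * t / mean               # only the smaller atom a is used on [0, t)
    return (a * p + b * (t - p)) / mean   # all of a's mass, then part of b's


def step_quantile_integral(a: float, b: float, p: float, t: float) -> float:
    """Lambda_F(t) = int_0^t (a 1_[0,p) + b 1_[p,1)), computed piece by piece."""
    return a * min(t, p) + b * max(t - p, 0.0)


if __name__ == "__main__":
    a, b, p = 1.0, 3.0, 0.5
    for t in (0.25, 0.5, 0.75):
        direct = step_quantile_integral(a, b, p, t) / (a * p + b * (1.0 - p))
        print(t, two_point_lorenz(a, b, p, t), direct)
```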
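Example 2.
Let $F(x) = \left(1 - \frac{1}{x^\alpha}\right) 1_{[1,\infty)}(x)$ be the Pareto distribution with parameter $\alpha > 0$.
Then $F^{-1}(p) = (1-p)^{-1/\alpha}$ and
$$\Lambda_F(p) = \begin{cases} \dfrac{\alpha}{1-\alpha}\left((1-p)^{-\frac{1-\alpha}{\alpha}} - 1\right) & \text{if } 0 < \alpha < 1 \\[2mm] -\ln(1-p) & \text{if } \alpha = 1 \\[2mm] \dfrac{\alpha}{\alpha-1}\left(1 - (1-p)^{\frac{\alpha-1}{\alpha}}\right) & \text{if } \alpha > 1, \end{cases} \qquad \Lambda_F(1-0) = \begin{cases} \infty & \text{if } \alpha \le 1 \\[1mm] \dfrac{\alpha}{\alpha-1} & \text{if } \alpha > 1, \end{cases}$$
$$L_F(p) = \begin{cases} 0 & \text{if } \alpha \le 1 \text{ (formally, since } \Lambda_F(1-0) = \infty) \\[1mm] 1 - (1-p)^{\frac{\alpha-1}{\alpha}} & \text{if } \alpha > 1. \end{cases}$$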
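A quick consistency check of Example 2, under the assumption that a midpoint Riemann sum is accurate enough on $[0, 0.9]$: the sketch compares the three closed-form cases of $\Lambda_F$ with a numerical integral of $F^{-1}(y) = (1-y)^{-1/\alpha}$ and evaluates $L_F$ when $\alpha > 1$. All function names are hypothetical.

```python
import math


def pareto_pre_lorenz(alpha: float, p: float) -> float:
    """Closed form of Lambda_F(p) for the Pareto law F(x) = 1 - x**(-alpha), x >= 1."""
    if alpha == 1.0:
        return -math.log(1.0 - p)
    if alpha < 1.0:
        return alpha / (1.0 - alpha) * ((1.0 - p) ** (-(1.0 - alpha) / alpha) - 1.0)
    return alpha / (alpha - 1.0) * (1.0 - (1.0 - p) ** ((alpha - 1.0) / alpha))


def pareto_lorenz(alpha: float, p: float) -> float:
    """L_F(p) = 1 - (1 - p)**((alpha - 1)/alpha); only meaningful for alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("infinite expectation: L_F is (formally) identically 0")
    return 1.0 - (1.0 - p) ** ((alpha - 1.0) / alpha)


def numeric_pre_lorenz(alpha: float, p: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of int_0^p (1 - y)**(-1/alpha) dy."""
    h = p / steps
    return h * sum((1.0 - (i + 0.5) * h) ** (-1.0 / alpha) for i in range(steps))


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 3.0):
        print(alpha, pareto_pre_lorenz(alpha, 0.9), numeric_pre_lorenz(alpha, 0.9))
    print(pareto_lorenz(2.0, 0.75))  # 1 - 0.25 ** 0.5
```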
Unlike the mapping $F \mapsto L_F$, the mapping $F \mapsto \Lambda_F$ is one-to-one.
Proposition 1.
Let $\mathcal{P}$ be the set of all probability distributions on $([0,\infty), \mathcal{B}([0,\infty)))$ and let $F \in \mathcal{P}$. Then $\Lambda_F(0) = 0$ and $\Lambda_F(1-0) = \Lambda_F(1) = e(F)$. Moreover, $\Lambda_F$ is convex and non-decreasing, and its right derivative is $F^{-1}$.
Proof. 
Obvious: convexity and monotonicity follow from the fact that the integrand $F^{-1}$ is non-decreasing and non-negative. Lemma A1 says that any convex function defined on $[a,b]$ is continuous on $(a,b)$ and has both right and left derivatives there. The right derivative of $\Lambda_F$ is $(\Lambda_F)'_r(p) = \lim_{\delta \downarrow 0} \frac{1}{\delta}\int_p^{p+\delta} F^{-1}(y)\,dy = F^{-1}(p)$ for all $p \in [0,1)$, because the superior quantile $F^{-1}$ is right continuous. □
Let now $\mathcal{C}$ be the set of all functions $\Lambda : [0,1) \to [0,\infty)$ which are convex and non-decreasing. The following result concerns uniqueness.
Proposition 2.
The mapping $F \mapsto \Lambda_F$ from $\mathcal{P}$ to $\mathcal{C}$ is onto and one-to-one.
Proof. 
Let $F, G \in \mathcal{P}$ be such that $\Lambda_F = \Lambda_G$. Taking right derivatives, it follows that $F^{-1} = G^{-1}$. As $F$ and $G$ are right continuous, it follows that $F = G$.
This is a consequence of the following elementary fact. Let $F, G : \mathbb{R} \to \mathbb{R}$ be non-decreasing functions, and let $\alpha : (F(-\infty), F(\infty)) \to \mathbb{R}$ and $\beta : (G(-\infty), G(\infty)) \to \mathbb{R}$ be two functions such that $F(\alpha(p) - 0) \le p \le F(\alpha(p) + 0)$ and $G(\beta(p) - 0) \le p \le G(\beta(p) + 0)$. If $\alpha = \beta$, then $F(x) = G(x)$ at every point $x$ where $F$ or $G$ is continuous. Indeed, suppose, to fix ideas, that $F(x) < G(x)$, and let $p$ be such that $F(x) < p < G(x)$. Then $F(x) < p \le F(\alpha(p) + 0)$ forces $x \le \alpha(p)$, while $G(x) > p \ge G(\alpha(p) - 0)$ forces $x \ge \alpha(p)$; hence $x = \alpha(p)$, and $F(x) < F(x+0)$, $G(x) > G(x-0)$. Thus $x$ must be a discontinuity point of both $F$ and $G$, and the set of such points is at most countable.
On the other hand, if $\Lambda \in \mathcal{C}$, then its right derivative $\lambda = \Lambda'_r$ is right continuous and non-decreasing. According to Lemma A4, there exists a distribution function $F$ such that $\lambda = F^{-1}$; hence $\Lambda_F = \Lambda$. □
Remark 1.
If $F$ has finite expectation, then $L_F(1-0) = 1$. Indeed, $L_F(p) = \frac{\Lambda_F(p)}{\Lambda_F(1)} = \frac{\Lambda_F(p)}{e(F)}$. If a function $L : [0,1] \to \mathbb{R}$ is convex and non-decreasing with $L(0) = 0$ and $L(1) = 1$, but $L(1-0) < 1$, we call it a defective Lorenz curve.
If $F_n$ and $F$ are distributions on the real line, their weak convergence will be denoted by $F_n \Rightarrow F$. Thus $F_n \Rightarrow F$ means $\int f\,dF_n \to \int f\,dF$ for any continuous and bounded $f : [0,\infty) \to \mathbb{R}$.
The connection between the $\Lambda$-transform and weak convergence is given by the following result.
Proposition 3.
Let $F_n, F \in \mathcal{P}$ be probability distributions on $[0,\infty)$. Then $F_n \Rightarrow F$ if and only if $\Lambda_{F_n}(p) \to \Lambda_F(p)$ for every $p \in (0,1)$. Moreover, if the limit $\Lambda$ of the sequence $(\Lambda_{F_n})_n$ exists and is finite, then the weak limit of the sequence $(F_n)_n$ exists, too, and $\Lambda = \Lambda_{\lim F_n}$.
Proof. 
According to Lemma A5 from Appendix A, $F_n \Rightarrow F$ if and only if $F_n^{-1}(p) \to F^{-1}(p)$ for $p \in [0,1)$, with the possible exception of a countable set $N$.
If $p \notin N$, then for any $\varepsilon > 0$ there exists $n_\varepsilon$ such that $n \ge n_\varepsilon$ implies $F_n^{-1}(p) \le F^{-1}(p) + \varepsilon$. As the functions $F_n^{-1}$ are non-decreasing and non-negative, we get $0 \le F_n^{-1}(t) \le F^{-1}(p) + \varepsilon$ for all $t \le p$ and all $n \ge n_\varepsilon$. Therefore we may apply Lebesgue's dominated convergence theorem to infer that $\int_0^p F_n^{-1}(t)\,dt \to \int_0^p F^{-1}(t)\,dt$, or, which is the same thing, that $\Lambda_{F_n}(p) \to \Lambda_F(p)$.
Conversely, if $\Lambda_{F_n}(p) \to \Lambda_F(p)$, then, according to Lemma A3, the right derivatives converge too: $(\Lambda_{F_n})'_r(p) \to (\Lambda_F)'_r(p)$ at all points $p$ where $(\Lambda_F)'_r(p) = (\Lambda_F)'_l(p)$. But these right derivatives are $F_n^{-1}(p)$ and $F^{-1}(p)$, so $F_n^{-1} \to F^{-1}$ with the possible exception of a countable set. Applying Lemma A5 (iv) again, we derive that $F_n$ converges to $F$ with the possible exception of a countable set. But this means precisely that $F_n \Rightarrow F$.
If we only know that $\Lambda_{F_n}(p) \to \Lambda(p)$ for all $0 < p < 1$ and $\Lambda$ is finite, then its right derivative $\Lambda'_r$ is the superior quantile of some probability distribution on $[0,\infty)$, and the proof goes the same way. □
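The Lorenz curve does not have such good properties: it is possible that $F_n \Rightarrow F$ while $L_{F_n}(p)$ does not converge to $L_F(p)$.
Example 3.
Let $F_n = \begin{pmatrix} a & na \\ 1 - \frac1n & \frac1n \end{pmatrix}$, $a > 0$. Obviously $F_n \Rightarrow \delta_a$, but
$$L_{F_n}(t) = \begin{cases} \dfrac{at}{2a - \frac{a}{n}} & \text{if } t < 1 - \frac1n \\[2mm] \dfrac{a\left(1 - \frac1n\right) + na\left(t - 1 + \frac1n\right)}{2a - \frac{a}{n}} & \text{if } 1 - \frac1n \le t < 1 \end{cases}$$
converges to $\frac{t}{2}$ and not to $L_{\delta_a}(t) = t$.
Example 4.
The same, modified a bit: $F_n = \begin{pmatrix} 1 & a_n + 1 \\ 1 - \frac1n & \frac1n \end{pmatrix}$ with $a_n \ge 0$. Then $F_n \Rightarrow \delta_1$ and $\Lambda_{F_n}(p) \to p$ for all $p \in [0,1)$, but $L_{F_n}(p) = \frac{p}{1 + \frac{a_n}{n}}$ for $p \in \left[0, 1 - \frac1n\right)$ may have no limit at all if the sequence $\left(\frac{a_n}{n}\right)_n$ has no limit.
The reason is the loss of mass at infinity: if $X_n \ge 0$ and $X_n \to X$ in distribution, then $EX \le \liminf_n EX_n$, a consequence of Fatou's Lemma, and the inequality may be strict.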
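Examples 3 and 4 can be reproduced with an exact routine for finite discrete distributions. The sketch below is our own (the name `discrete_lorenz` is hypothetical): it walks through the atoms in increasing order, which is exactly how the step quantile $F^{-1}$ is integrated. It then shows $L_{F_n}(1/2)$ approaching $1/4$ in Example 3, and alternating between $1/4$ and $1/2$ in Example 4 for the illustrative choice $a_n = n$ for even $n$ and $a_n = 0$ for odd $n$.

```python
from typing import Sequence


def discrete_lorenz(atoms: Sequence[float], probs: Sequence[float], t: float) -> float:
    """Exact L_F(t) for a finite discrete F = sum_i probs[i] * delta_{atoms[i]}.

    F^{-1} is a step function, so Lambda_F(t) is accumulated by walking
    through the atoms in ascending order; assumes e(F) > 0.
    """
    pairs = sorted(zip(atoms, probs))
    mean = sum(x * w for x, w in pairs)
    acc = 0.0  # Lambda_F at the start of the current step
    cum = 0.0  # probability mass already used
    for x, w in pairs:
        if t <= cum + w:
            return (acc + x * (t - cum)) / mean
        acc += x * w
        cum += w
    return acc / mean


if __name__ == "__main__":
    a = 1.0
    for n in (10, 1_000, 1_000_000):
        # Example 3: F_n -> delta_a weakly, yet L_{F_n}(1/2) -> 1/4 instead of 1/2.
        print(n, discrete_lorenz([a, n * a], [1 - 1 / n, 1 / n], 0.5))
    for n in (1_000, 1_001):
        # Example 4 with the hypothetical choice a_n = n (n even), a_n = 0 (n odd).
        a_n = n if n % 2 == 0 else 0
        print(n, discrete_lorenz([1.0, a_n + 1.0], [1 - 1 / n, 1 / n], 0.5))
```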
If there is no loss of mass, we can state:
Proposition 4.
Let $\mathcal{P}_1$ be the set of the probability distributions from $\mathcal{P}$ which have finite expectation. Suppose that $F_n, F \in \mathcal{P}_1$, $F_n \Rightarrow F$ and $e(F_n) \to e(F) > 0$. Then $L_{F_n}(p) \to L_F(p)$ for all $0 \le p \le 1$.
Conversely, if $L_n : [0,1] \to [0,1]$ are Lorenz curves, $L_n \to L$ and $L$ is continuous at $p = 1$, then there exist probability distributions $F_n$ and $F$ from $\mathcal{P}_1$ such that $e(F_n) = e(F) = 1$ and $F_n \Rightarrow F$.
Or, reformulated in terms of random variables:
Let $(X_n)_n$ be non-negative random variables from $L^1$, and let $X \ge 0$ be another random variable such that $EX = 1$. Then $\frac{X_n}{EX_n} \xrightarrow{D} X$ implies $L_{X_n} \to L_X$. Conversely, if $L_{X_n} \to L$ and $L$ is continuous at $p = 1$, then there exists a non-negative random variable $X$ on the probability space $(\Omega, \mathcal{K}, P) = ((0,1), \mathcal{B}((0,1)), \text{Lebesgue})$ such that $\frac{X_n}{EX_n} \xrightarrow{L^1} X$.
Proof. 
From Proposition 3 we know that $F_n \Rightarrow F$ if and only if $\Lambda_{F_n}(p) \to \Lambda_F(p)$. On the other hand, $e(F_n) = \Lambda_{F_n}(1-0)$ and $e(F) = \Lambda_F(1-0)$. Thus
$$F_n \Rightarrow F,\ e(F_n) \to e(F) \implies \Lambda_{F_n}(p) \to \Lambda_F(p),\ \Lambda_{F_n}(1-0) \to \Lambda_F(1-0) \implies \frac{\Lambda_{F_n}(p)}{\Lambda_{F_n}(1-0)} \to \frac{\Lambda_F(p)}{\Lambda_F(1-0)} \iff L_{F_n}(p) \to L_F(p)$$
for all $0 \le p \le 1$. Conversely, if $L_n(p) \to L(p)$, we apply Lemma A3 from Appendix A: the right derivatives converge in $L^1$. The right derivatives $\lambda_n = (L_n)'_r$ and $\lambda = L'_r$ are right continuous and non-decreasing; thus there exist distribution functions $F_n$ and $F$ such that $F_n^{-1} = \lambda_n$ and $F^{-1} = \lambda$ (Lemma A4 (iv)). Moreover, $e(F_n) = \int_0^1 \lambda_n(x)\,dx = L_n(1-0) = 1$ and $e(F) = \int_0^1 \lambda(x)\,dx = L(1-0) = 1$, the latter because $L$ is continuous at $p = 1$; finally, according to Lemma A5, $F_n \Rightarrow F$. □
The situation changes if $e(F_n)$ does not converge to $e(F)$.
If the sequence $(e(F_n))_n$ has no limit at all, then, as Example 4 shows, the sequence $L_{F_n}(p) = \frac{\Lambda_{F_n}(p)}{e(F_n)}$ may have no limit either, even though $\Lambda_{F_n}(p) \to \Lambda_F(p)$ for every $0 < p < 1$.
If $e(F_n) \to k\,e(F)$, we know that $k \ge 1$ (Fatou's Lemma). Then the limit is a defective Lorenz curve whenever $k > 1$:
$$\lim_n L_{F_n}(p) = \lim_n \frac{\Lambda_{F_n}(p)}{e(F_n)} = \frac{\Lambda_F(p)}{k\,e(F)} = \frac{1}{k}L_F(p).$$
Thus we can add the following fact to Proposition 4.
Proposition 5.
Suppose that $F_n, F \in \mathcal{P}_1$, $F_n \Rightarrow F$ and $e(F_n) \to k\,e(F) > 0$. Then $L_{F_n}(p) \to \frac1k L_F(p)$ for all $0 \le p < 1$. Hence, if the limit of a sequence of Lorenz curves is defective, the convergence does not ensure the convergence of the expected values. As a limit case, if $L_{F_n}(p) \to 0$ for $0 \le p < 1$ and $0 < e(F_n) < M$ for some $M > 0$, then $F_n \Rightarrow \delta_0$.
Proof. 
Only the last claim needs a proof. Remark that $\Lambda_{\delta_0}(p) = 0$. Since $\Lambda_{F_n}(p) = e(F_n)\,L_{F_n}(p) \le M\,L_{F_n}(p)$, if $L_{F_n}(p) \to 0$ then $\Lambda_{F_n} \to \Lambda_{\delta_0}$, and Proposition 3 gives $F_n \Rightarrow \delta_0$. If the sequence $(e(F_n))_n$ is not bounded, the claim fails. For instance, if $F_n = \begin{pmatrix} 1 & n^2 \\ 1 - \frac1n & \frac1n \end{pmatrix}$, then $L_{F_n}(p) \to 0$, but $F_n \Rightarrow \delta_1$ and not $\delta_0$. □

3. The Empirical Lorenz and Pre-Lorenz Curves of a Sequence of Non-Negative Real Numbers

Here we study the deterministic case, in which the sequence of points that generates the smaller sticks is a given (non-random) one. We generalize a bit, giving up the restriction that the points lie between 0 and 1.
Let $a = (a_n)_{n \ge 1}$ be a sequence of non-negative reals. We attach to it the sequences of probability distributions
$$F_n = F_n(a) = \frac1n\sum_{k=1}^n \delta_{a_k}, \qquad F_n^* = F_n^*(a) = \frac1n\sum_{k=1}^n \delta_{\frac{n a_k}{s_n}}, \qquad s_n = a_1 + \dots + a_n.$$
Next, we denote by $\Lambda_n = \Lambda_n(a)$ and $L_n = L_n(a)$ the pre-Lorenz and the Lorenz curves of $F_n$. Thus
$$\Lambda_n = \Lambda_n(a) = \Lambda_{F_n(a)}, \qquad L_n = L_n(a) = L_{F_n(a)}.$$
We want to know when these curves have a limit. First, a remark which is useful for computations:
Proposition 6.
Let $a = (a_n)_{n \ge 1}$ be a sequence of non-negative reals and let $n \ge 2$ be fixed. Let also $O_n(a) = (a_{1:n}, a_{2:n}, \dots, a_{n:n})$ be the values $(a_1, a_2, \dots, a_n)$ sorted in ascending order. Denote by $S_{k:n}$ the sum
$$S_{k:n} = a_{1:n} + a_{2:n} + \dots + a_{k:n}, \qquad 1 \le k \le n,$$
with $S_{0:n} = 0$. Then:
(i) 
$\Lambda_n$ is the polygonal line which joins the points $(0,0), \left(\frac1n, \frac{S_{1:n}}{n}\right), \left(\frac2n, \frac{S_{2:n}}{n}\right), \dots, \left(\frac{n}{n}, \frac{S_{n:n}}{n}\right)$, and $L_n$ is the polygonal line which joins the points $(0,0), \left(\frac1n, \frac{S_{1:n}}{S_{n:n}}\right), \left(\frac2n, \frac{S_{2:n}}{S_{n:n}}\right), \dots, \left(\frac{n}{n}, \frac{S_{n:n}}{S_{n:n}}\right)$.
Or, formally, for any $1 \le k \le n$ and $0 \le \varepsilon \le 1$ we have:
$$\Lambda_n\left(\frac{k-1+\varepsilon}{n}\right) = \frac{(1-\varepsilon)S_{k-1:n} + \varepsilon S_{k:n}}{n}, \qquad L_n\left(\frac{k-1+\varepsilon}{n}\right) = \frac{(1-\varepsilon)S_{k-1:n} + \varepsilon S_{k:n}}{S_{n:n}}.$$
(ii) 
$L_{F_n(a)} = \Lambda_{F_n^*(a)}$.
(iii) 
$\Lambda_n(\alpha a) = \alpha\Lambda_n(a)$ and $L_n(\alpha a) = L_n(a)$ for any $\alpha > 0$: the Lorenz curve is invariant under homotheties.
(iv) 
If $\sigma \in \mathfrak{S}_n$ is a permutation, then $\Lambda_n(a_{\sigma(1)}, a_{\sigma(2)}, \dots, a_{\sigma(n)}) = \Lambda_n(a)$. Both $\Lambda_n$ and $L_n$ are invariant with respect to permutations of the first $n$ terms of $a$.
Proof. 
The distribution function of $F_n$ is $F_n(x) = \frac1n\sum_{k=1}^n 1_{[a_k,\infty)}(x)$, and so on. The assertions are elementary and are left to the reader.
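Proposition 6 (i) is essentially an algorithm: sort, take partial sums, normalize, interpolate. The following minimal Python sketch (hypothetical function names; exact `Fraction` arithmetic is our choice, so that vertices can be compared exactly) computes the vertices of $L_n(a)$ and evaluates $L_n(a)(t)$ by linear interpolation. It assumes $S_{n:n} > 0$.

```python
import math
from fractions import Fraction
from typing import List, Sequence, Tuple

Point = Tuple[Fraction, Fraction]


def lorenz_vertices(a: Sequence[float]) -> List[Point]:
    """Vertices (k/n, S_{k:n}/S_{n:n}), 0 <= k <= n, of the empirical Lorenz curve L_n(a)."""
    xs = sorted(Fraction(x) for x in a)   # O_n(a): ascending order, exact arithmetic
    n, total = len(xs), sum(xs)
    points: List[Point] = [(Fraction(0), Fraction(0))]
    partial = Fraction(0)
    for k, x in enumerate(xs, start=1):
        partial += x                      # S_{k:n}
        points.append((Fraction(k, n), partial / total))
    return points


def lorenz_at(a: Sequence[float], t: Fraction) -> Fraction:
    """L_n(a)(t) by linear interpolation between consecutive vertices."""
    n = len(a)
    if t >= 1:
        return Fraction(1)
    pts = lorenz_vertices(a)
    k = math.floor(t * n)                 # t = (k + eps)/n with 0 <= eps < 1
    eps = t * n - k
    return (1 - eps) * pts[k][1] + eps * pts[k + 1][1]


if __name__ == "__main__":
    a = [1, 0, 1, 0]                      # Example 5 (i) below, with n = 4
    print(lorenz_vertices(a))
    print(lorenz_at(a, Fraction(3, 4)))
```

Here the interpolation index runs from $k = 0$, which is the formula of (i) with $k$ shifted by one.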
Example 5.
If $z_j$ are points in the plane, we denote by $\mathrm{Poly}(z_1 z_2 \dots z_k) = \mathrm{Poly}(z_j;\ 1 \le j \le k)$ the polygonal line which joins the points $z_1, z_2, \dots, z_k$ (in this order). Then:
(i) 
$a = (1, 0, 1, 0, 1, 0, \dots)$. Then $O_n(a) = (0, \dots, 0, 1, \dots, 1)$ where, if $n = 2k$, we have $k$ zeros and $k$ ones, and if $n = 2k+1$, we have $k$ zeros and $k+1$ ones. If $n = 2k$, then $\Lambda_n$ is the polygonal line which joins the points $(0,0), \left(\frac1n, 0\right), \dots, \left(\frac{k}{n}, 0\right), \left(\frac{k+1}{n}, \frac1n\right), \dots, \left(\frac{n}{n}, \frac{k}{n}\right)$. Many of these points are collinear, so it is easier to write $\Lambda_{2k} = \mathrm{Poly}\left((0,0)\left(\tfrac12, 0\right)\left(1, \tfrac12\right)\right)$ and $\Lambda_{2k+1} = \mathrm{Poly}\left((0,0)\left(\tfrac{k}{2k+1}, 0\right)\left(1, \tfrac{k+1}{2k+1}\right)\right)$. As $S_{n:n} = k$ if $n = 2k$ and $S_{n:n} = k+1$ if $n = 2k+1$, it follows that $L_{2k} = \mathrm{Poly}\left((0,0)\left(\tfrac12, 0\right)(1,1)\right)$ and $L_{2k+1} = \mathrm{Poly}\left((0,0)\left(\tfrac{k}{2k+1}, 0\right)(1,1)\right)$.
(ii) 
More generally, if $a$ is a periodic sequence, $a = (x_1, \dots, x_k, x_1, \dots, x_k, \dots)$, then describing $\Lambda_n$ and $L_n$ becomes tedious if we do not know the ordering of the numbers $x_j$. But if we assume that $x_1 \le \dots \le x_k$, the description goes as follows. Write $n = km + r$ with $0 \le r < k$; then $O_n(a) = (x_1, \dots, x_1, x_2, \dots, x_2, \dots, x_r, \dots, x_r, x_{r+1}, \dots, x_{r+1}, \dots, x_k, \dots, x_k)$, where $x_j$ occurs $m+1$ times if $j \le r$ and $m$ times if $j > r$. Thus $S_{n:n} = m S_{k:k} + S_{r:k}$, where $S_{k:k} = x_1 + \dots + x_k$ and $S_{r:k} = x_1 + \dots + x_r$, and
$$\Lambda_n = \mathrm{Poly}\Big((0,0)\Big(\tfrac{m+1}{n}, \tfrac{m+1}{n}x_1\Big)\Big(\tfrac{2(m+1)}{n}, \tfrac{m+1}{n}(x_1+x_2)\Big)\cdots\Big(\tfrac{r(m+1)}{n}, \tfrac{m+1}{n}(x_1+\dots+x_r)\Big)\Big(\tfrac{r(m+1)+m}{n}, \tfrac{m+1}{n}(x_1+\dots+x_r) + \tfrac{m}{n}x_{r+1}\Big)\cdots\Big(\tfrac{r(m+1)+m(k-r)}{n}, \tfrac{m+1}{n}(x_1+\dots+x_r) + \tfrac{m}{n}(x_{r+1}+\dots+x_k)\Big)\Big),$$
the last point being $\left(1, \frac{m S_{k:k} + S_{r:k}}{n}\right)$, and
$$L_n = \mathrm{Poly}\Big((0,0)\Big(\tfrac{m+1}{n}, \tfrac{m+1}{S_{n:n}}x_1\Big)\Big(\tfrac{2(m+1)}{n}, \tfrac{m+1}{S_{n:n}}(x_1+x_2)\Big)\cdots\Big(\tfrac{r(m+1)}{n}, \tfrac{m+1}{S_{n:n}}(x_1+\dots+x_r)\Big)\Big(\tfrac{r(m+1)+m}{n}, \tfrac{m+1}{S_{n:n}}(x_1+\dots+x_r) + \tfrac{m}{S_{n:n}}x_{r+1}\Big)\cdots\Big(\tfrac{r(m+1)+m(k-r)}{n}, \tfrac{m+1}{S_{n:n}}(x_1+\dots+x_r) + \tfrac{m}{S_{n:n}}(x_{r+1}+\dots+x_k)\Big)\Big),$$
the last point being $(1, 1)$.
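(iii) 
Suppose that $a_n = n^\alpha$. If $\alpha > 0$ the sequence is increasing, if $\alpha = 0$ it is constant, and if $\alpha < 0$ it is decreasing. In the first two cases no sorting is necessary, hence
$$\Lambda_n = \mathrm{Poly}\left((0,0)\left\{\left(\tfrac{k}{n}, \tfrac{s_k}{n}\right);\ 1 \le k \le n\right\}\right), \qquad L_n = \mathrm{Poly}\left((0,0)\left\{\left(\tfrac{k}{n}, \tfrac{s_k}{s_n}\right);\ 1 \le k \le n\right\}\right),$$
where $s_n = 1^\alpha + 2^\alpha + \dots + n^\alpha$ and $s_0 = 0$. If $\alpha = -\beta < 0$, then $O_n(a) = \left(\frac{1}{n^\beta}, \frac{1}{(n-1)^\beta}, \dots, \frac{1}{2^\beta}, 1\right)$; therefore
$$\Lambda_n = \mathrm{Poly}\left((0,0)\left\{\left(\tfrac{k}{n}, \tfrac{s_n - s_{n-k}}{n}\right);\ 1 \le k \le n\right\}\right), \qquad L_n = \mathrm{Poly}\left((0,0)\left\{\left(\tfrac{k}{n}, \tfrac{s_n - s_{n-k}}{s_n}\right);\ 1 \le k \le n\right\}\right).$$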
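(iv) 
$a = (1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, \dots\ (2^4 \text{ times}), 0, 0, \dots\ (2^5 \text{ times}), \dots)$, i.e., $a$ consists of consecutive blocks of lengths $1, 2, 4, 8, \dots$, filled alternately with ones and zeros.
Let $N = 1 + 2 + \dots + 2^{n-1} = 2^n - 1$ be the number of terms in the first $n$ blocks.
If $n = 2m$ is even, then $O_N(a)$ consists of $\frac23(2^n - 1)$ zeros followed by $\frac13(2^n - 1)$ ones; if $n = 2m+1$ is odd, it consists of $\frac13(2^n - 2)$ zeros followed by $\frac13(2^{n+1} - 1)$ ones, that is, roughly one third zeros and two thirds ones. In the first case $F_N = \begin{pmatrix} 0 & 1 \\ \frac23 & \frac13 \end{pmatrix}$, and in the second one $F_N$ is approximately $\begin{pmatrix} 0 & 1 \\ \frac13 & \frac23 \end{pmatrix}$. For the other numbers of terms we obtain intermediate values. The pre-Lorenz curves oscillate between $\left(p - \frac23\right)_+$ and $\left(p - \frac13\right)_+$, and the Lorenz curves oscillate between $3\left(p - \frac23\right)_+$ and $\frac32\left(p - \frac13\right)_+$. None of them has a limit.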
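The oscillation in Example 5 (iv) is easy to observe numerically. In this sketch (hypothetical helper names, our own illustration), `block_sequence` builds the first $2^n - 1$ terms; since all terms are 0 or 1, the empirical Lorenz curve depends only on the share $s$ of ones, through $L(p) = (p - (1-s))_+/s$.

```python
from typing import List


def block_sequence(num_blocks: int) -> List[int]:
    """First 2**num_blocks - 1 terms of the sequence of Example 5 (iv).

    Block j (j = 0, 1, 2, ...) has length 2**j and is filled with ones
    for even j and with zeros for odd j.
    """
    a: List[int] = []
    for j in range(num_blocks):
        a.extend([1 if j % 2 == 0 else 0] * 2 ** j)
    return a


def zero_one_lorenz(s: float, p: float) -> float:
    """Lorenz curve of (1 - s) delta_0 + s delta_1, i.e. (p - (1 - s))_+ / s."""
    return max(p - (1.0 - s), 0.0) / s


if __name__ == "__main__":
    for blocks in (16, 17, 18, 19):
        a = block_sequence(blocks)
        s = sum(a) / len(a)              # share of ones after N = 2**blocks - 1 terms
        print(blocks, round(s, 4), round(zero_one_lorenz(s, 0.9), 4))
```

With an even number of blocks the value at $p = 0.9$ is $0.7 = 3(0.9 - \frac23)$; with an odd number it is about $0.85 = \frac32(0.9 - \frac13)$.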
The first problem is: for what kind of sequences does $\Lambda_n(a)$ have a finite limit? This is the easy one.
Proposition 7.
Let $a = (a_n)_{n \ge 1}$ be a sequence of non-negative reals. The following assertions are equivalent:
(i) 
The sequence $(F_n(a))_{n \ge 1}$ has a weak limit $F = F(a)$;
(ii) 
The sequence $(\Lambda_n(a))_{n \ge 1}$ has a finite limit $\Lambda_{F(a)}$;
(iii) 
The sequence $(f(a_n))_{n \ge 1}$ has a finite Cesàro limit for any continuous bounded $f : [0,\infty) \to \mathbb{R}$, and this limit is equal to $\int f\,dF$.
Proof. 
The equivalence (i) ⇔ (ii) is actually Proposition 3. According to the well-known Portmanteau theorem (see, for instance, References [25,26]), $F_n \Rightarrow F$ if and only if $\int f\,dF_n \to \int f\,dF$ for any bounded continuous $f : \mathbb{R} \to \mathbb{R}$. In our case, $\int f\,dF_n(a) = \frac{f(a_1) + f(a_2) + \dots + f(a_n)}{n}$ is the $n$-th Cesàro mean of $(f(a_k))_k$. If this limit exists for any continuous and bounded $f$, then the sequence of probability measures $(F_n)_n$ has a weak limit, too. □
In order to identify the limit, we have to study the empirical distribution functions $F_n(x) = \frac{|\{k \le n : a_k \le x\}|}{n}$; the hard way is to compute $\Lambda_n(a)$. If we want to find the limit of $L_n(a)$, the fact that $F_n \Rightarrow F$ does not help very much unless we can prove that $e(F_n)$ converges to $e(F)$, too.
Sometimes the following result may help.
Lemma 1.
Let $f_n : [a,b] \to [m,M]$ be convex and non-decreasing. Suppose that for any $x \in (a,b)$ there exists a sequence $(x_n)_n$ such that $x_n \to x$ and the sequence $(f_n(x_n))_n$ is convergent.
Then the limit $f(x) := \lim_n f_n(x)$ exists for any $x \in (a,b)$, and $f$ is convex and non-decreasing. Moreover, $f(x) = \lim_n f_n(x_n)$.
Proof. 
Let $a < x < b$ and let $f(x) = \lim_n f_n(x_n)$. Write $f(x) - f_n(x) = f(x) - f_n(x_n) + f_n(x_n) - f_n(x)$. Let $\varepsilon > 0$ be small enough that $a < x - \varepsilon < x + \varepsilon < b$, and let $n_\varepsilon$ be such that $n \ge n_\varepsilon$ implies $x - \varepsilon < x_n < x + \varepsilon$. For $n \ge n_\varepsilon$ we have $(f_n)'_l(x - \varepsilon) \le \frac{f_n(x) - f_n(x_n)}{x - x_n} \le (f_n)'_r(x + \varepsilon)$, hence $|f_n(x) - f_n(x_n)| \le (f_n)'_r(x + \varepsilon)\,|x - x_n|$, since the one-sided derivatives of a non-decreasing function are non-negative. By the chord inequality, $(f_n)'_r(x + \varepsilon) \le \frac{f_n(b) - f_n(x + \varepsilon)}{b - (x + \varepsilon)}$, and the last quantity is at most $\frac{M - m}{b - x - \varepsilon}$. Thus we obtain the estimate
$$|f(x) - f_n(x)| \le |f(x) - f_n(x_n)| + \frac{M - m}{b - x - \varepsilon}\,|x - x_n| \to 0 \quad \text{as } n \to \infty.$$
Convexity and monotonicity are preserved by pointwise limits. □
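A first positive answer concerning the existence of the limit is:
Proposition 8.
Let $a \ge 0$ and suppose that the sequence $\mathbf{a} = (a_n)_{n \ge 1}$ satisfies $a_n \to a$. If $a > 0$, then $F_n(\mathbf{a}) \Rightarrow \delta_a$, $F_n^*(\mathbf{a}) \Rightarrow \delta_1$ and $L_n(\mathbf{a})(t) \to t$, $0 \le t \le 1$. If $a = 0$, then $F_n(\mathbf{a}) \Rightarrow \delta_0$. If, moreover, the series $\sum_{k=1}^\infty a_k$ is convergent (and not all $a_k$ vanish), then $F_n^*(\mathbf{a}) \Rightarrow \delta_0$ and $L_n(\mathbf{a})(t) \to 0$ for $0 \le t < 1$.
Proof. 
The first claim is obvious: if $f$ is continuous and bounded, then $\lim_n \frac1n\sum_{k=1}^n f(a_k) = \lim_n f(a_n) = f(a)$, since a convergent sequence has the same Cesàro limit. To prove that $F_n^*(\mathbf{a}) \Rightarrow \delta_1$ when $a > 0$, write $a_n = a + \varepsilon_n$ with $\varepsilon_n \to 0$. Let $T_n = \varepsilon_1 + \dots + \varepsilon_n$ and $S_n = a_1 + \dots + a_n = na + T_n$. Notice that
$$\frac{n a_k}{S_n} = \frac{na + n\varepsilon_k}{na + T_n} = \frac{1 + \varepsilon_k/a}{1 + T_n/(na)},$$
which is close to 1 as soon as $k$ and $n$ are large, since $\varepsilon_k \to 0$ and $T_n/n \to 0$. Let $f : [0,\infty) \to [0,\infty)$ be uniformly continuous and bounded, let $\varepsilon > 0$ be arbitrarily small and let $\delta = \delta(\varepsilon)$ be such that $|x - y| < \delta \Rightarrow |f(x) - f(y)| < \varepsilon$.
Let $n_\varepsilon$ be such that $n \ge k > n_\varepsilon \Rightarrow \left|\frac{n a_k}{S_n} - 1\right| < \delta$. Since $f$ is bounded, the first $n_\varepsilon$ terms do not affect the limit, that is, $\lim_n \frac1n\sum_{k=1}^n f\left(\frac{n a_k}{S_n}\right) = \lim_n \frac1n\sum_{k=n_\varepsilon+1}^n f\left(\frac{n a_k}{S_n}\right)$; hence
$$f(1) - \varepsilon = \lim_n \frac{(n - n_\varepsilon)(f(1) - \varepsilon)}{n} \le \liminf_n \frac1n\sum_{k=n_\varepsilon+1}^n f\left(\frac{n a_k}{S_n}\right) \le \limsup_n \frac1n\sum_{k=n_\varepsilon+1}^n f\left(\frac{n a_k}{S_n}\right) \le \lim_n \frac{(n - n_\varepsilon)(f(1) + \varepsilon)}{n} = f(1) + \varepsilon.$$
As $\varepsilon$ is arbitrary, $\lim_n \frac1n\sum_{k=1}^n f\left(\frac{n a_k}{S_n}\right) = f(1) = \int f\,d\delta_1$. As $e(\delta_1) = e(F_n^*) = 1$, the Lorenz curves $L_n(\mathbf{a})$ converge to the Lorenz curve of $\delta_1$, namely to $L(t) = t$ (by Proposition 4 and Proposition 6 (ii)).
If $a = 0$, we compute the Lorenz curve, i.e., the polygonal line through the points $\left(\frac{k}{n}, \frac{b_1 + \dots + b_k}{S_n}\right)$, where $b_k = a_{k:n}$. Let $c_k = b_{n-k+1}$ and $\Sigma_k = c_1 + \dots + c_k$. Note that the sequence $(c_k)_k$ is non-increasing, $S_n = \Sigma_n = c_1 + \dots + c_n$ and $b_1 + \dots + b_k = \Sigma_n - \Sigma_{n-k}$. If we write
$$\frac{b_1 + \dots + b_{[nt]}}{S_n} = \frac{\Sigma_n - \Sigma_{n-[nt]}}{\Sigma_n} = 1 - \frac{\Sigma_{n-[nt]}}{\Sigma_n}$$
and remark that, for $0 < t < 1$, $\lim_n \Sigma_n = \lim_n \Sigma_{n-[nt]} = \sum_{k=1}^\infty a_k$ (indeed, $a_1 + \dots + a_{n-[nt]} \le \Sigma_{n-[nt]} \le \Sigma_n = S_n$, because $\Sigma_{n-[nt]}$ is the sum of the $n - [nt]$ largest terms among $a_1, \dots, a_n$), it becomes obvious that $\lim_n \frac{b_1 + \dots + b_{[nt]}}{S_n} = 0$. Hence $L_n(\mathbf{a}) \to 0$.
By Proposition 6 (ii), we know that $L_{F_n(\mathbf{a})} = \Lambda_{F_n^*(\mathbf{a})}$. As $\Lambda_{F_n^*(\mathbf{a})}$ converges to $0 = \Lambda_{\delta_0}$, Proposition 3 yields $F_n^*(\mathbf{a}) \Rightarrow \delta_0$. □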
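Both regimes of Proposition 8 can be illustrated in a few lines. This sketch (the helper `lorenz_at_grid` is hypothetical, and the two sequences are our own choices) evaluates $L_n(a)$ at grid points for $a_n = 1 + 1/n \to 1$, where the value at $t = 1/2$ should approach $1/2$, and for the summable sequence $a_n = 2^{-n}$, where it should approach 0.

```python
from typing import Sequence


def lorenz_at_grid(a: Sequence[float], t: float) -> float:
    """L_n(a)(k/n) with k = floor(n t): share of the k smallest terms in the total."""
    xs = sorted(a)
    k = int(len(xs) * t)
    return sum(xs[:k]) / sum(xs)


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        tends_to_one = [1.0 + 1.0 / j for j in range(1, n + 1)]   # a_n -> 1 > 0
        print(n, lorenz_at_grid(tends_to_one, 0.5))             # approaches 1/2
    for n in (10, 20, 40):
        summable = [2.0 ** (-j) for j in range(1, n + 1)]       # a_n -> 0, finite sum
        print(n, lorenz_at_grid(summable, 0.5))                 # roughly 2**(-n/2)
```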
Remark 2.
If $\sum_{k=1}^\infty a_k = \infty$ and $a_n \to 0$, we can say nothing in general about the limit of $L_n(a)$. For instance, if $a = \left(\frac{1}{n^\alpha}\right)_{n \ge 1}$, $0 < \alpha < 1$, is the sequence from Example 5 (iii) above, then $a_n \to 0$, but the sequence $L_n(a)(p)$ has the limit $L(p) = 1 - (1-p)^{1-\alpha}$. Therefore the limit of $F_n^*(a)$ is the probability distribution with distribution function $F^*(x) = \left(1 - \left(\frac{1-\alpha}{x}\right)^{1/\alpha}\right)_+$, which is of Pareto type. Thus it is quite possible that $F_n(a)$ and $F_n(b)$ have the same limit while $F_n^*(a)$ and $F_n^*(b)$ have different limits, if any. Conversely, $F_n^*(a)$ and $F_n^*(\lambda a)$ have the same limit, while $F_n(a)$ and $F_n(\lambda a)$ may have different limits. Many other examples will follow.
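The limit claimed in Remark 2 can be checked numerically. The sketch below is our own (`power_sequence_lorenz` is a hypothetical name): it sorts $a_j = j^{-\alpha}$, $1 \le j \le n$, and evaluates $L_n(a)$ at the grid point $k/n$ with $k = [np]$. Convergence towards $1 - (1-p)^{1-\alpha}$ is slow, with an error roughly of order $n^{\alpha - 1}$.

```python
def power_sequence_lorenz(alpha: float, n: int, p: float) -> float:
    """L_n(a)(k/n), k = floor(n p), for a_j = j**(-alpha), 1 <= j <= n."""
    xs = sorted(j ** (-alpha) for j in range(1, n + 1))   # O_n(a), ascending
    k = int(n * p)
    return sum(xs[:k]) / sum(xs)


if __name__ == "__main__":
    alpha, p = 0.5, 0.5
    limit = 1.0 - (1.0 - p) ** (1.0 - alpha)   # L(p) from Remark 2
    for n in (1_000, 10_000, 100_000):
        print(n, power_sequence_lorenz(alpha, n, p), limit)
```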
Remark 3.
The order does matter. Precisely, if we know that $F_n(a) \Rightarrow F$ and we rearrange the terms of $a$ according to some permutation $\sigma : \{1, 2, \dots\} \to \{1, 2, \dots\}$, then we can say nothing about the limit of $F_n(a_\sigma)$. Suppose, for instance, that $a = (0, 1, 0, 1, 0, 1, \dots)$ and that $\sigma$ is chosen in such a way that $a_\sigma = (0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0\ (2^3 \text{ times}), 1\ (2^3 \text{ times}), 0\ (2^4 \text{ times}), 1\ (2^4 \text{ times}), \dots)$. The reader can check that $(F_n(a_\sigma))_n$ has no limit. This is a major difference between Cesàro convergence and ordinary convergence.
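For Remark 3, the share $s_n$ of ones among the first $n$ terms of $a_\sigma$ can be tracked block by block without building the sequence. Since $F_n(a_\sigma) = (1 - s_n)\delta_0 + s_n\delta_1$, the sketch below (hypothetical name `rearranged_prefix_shares`, our own illustration) shows $s_n$ returning to $1/2$ after every block of ones and approaching $1/3$ after every block of zeros, so $F_n(a_\sigma)$ cannot converge.

```python
from typing import List, Tuple


def rearranged_prefix_shares(levels: int) -> List[Tuple[str, int, float]]:
    """Share of ones in a_sigma at the end of each block of Remark 3.

    Level j contributes 2**j zeros followed by 2**j ones; after each block we
    record (which block just ended, number of terms n, share of ones s_n).
    """
    ones = total = 0
    shares: List[Tuple[str, int, float]] = []
    for j in range(levels):
        total += 2 ** j                          # a block of 2**j zeros
        shares.append(("after zeros", total, ones / total))
        ones += 2 ** j                           # a block of 2**j ones
        total += 2 ** j
        shares.append(("after ones", total, ones / total))
    return shares


if __name__ == "__main__":
    for tag, n, s in rearranged_prefix_shares(12)[-4:]:
        print(tag, n, round(s, 4))
```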
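We now study what happens if we modify the sequence $a$. The questions are: how sensitive is $L$ to such a change, and, more generally, if $F_n(a)$ (or $F_n^*(a)$) has a limit, what can we say about the limits of $F_n(\varphi(a))$ or $F_n^*(\varphi(a))$ for $\varphi : [0,\infty) \to [0,\infty)$?
A partial answer is:
Proposition 9.
(i) Let $a$ be a non-decreasing and unbounded sequence of non-negative reals, and let $b$ be a bounded sequence of non-negative reals. Suppose that $(F_n^*(a))_n$ converges weakly to some probability distribution $F$ and that $\frac{S_n^2}{n^2 a_n} \to \infty$ as $n \to \infty$, where $S_n = a_1 + a_2 + \dots + a_n$.
Then $F_n^*(a + b) \Rightarrow F$, too, and $L_n(a + b) \to L_F$.
(ii) If $F_n(a) \Rightarrow F$ and $\varphi$ is continuous, then $F_n(\varphi(a)) \Rightarrow F \circ \varphi^{-1}$.
(iii) If $L_n(a) \to L$, then
$$L_n(\alpha a + \beta)(t) \to \frac{L(t) + \frac{\beta}{\alpha\mu}t}{1 + \frac{\beta}{\alpha\mu}},$$
where $\mu = \lim_n \frac{S_n}{n} \in (0, \infty]$ (with $\frac{\beta}{\alpha\mu} = 0$ if $\mu = \infty$), $\alpha > 0$, $\beta > 0$ and $t \in (0,1)$.
Proof. 
(i) Let $T_n = b_1 + b_2 + \dots + b_n$. The assumption is that $\int f\,dF_n^*(a) \to \int f\,dF$ for any bounded and uniformly continuous $f : \mathbb{R} \to \mathbb{R}$. By the definition of $F_n^*$, this is the same as
$$\frac1n\sum_{j=1}^n f\left(\frac{n}{S_n}a_j\right) \to \int f\,dF.$$
The claim is that
$$\frac1n\sum_{j=1}^n f\left(\frac{n}{S_n + T_n}(a_j + b_j)\right) \to \int f\,dF.$$
Let $\varepsilon > 0$ be arbitrary and let $\delta = \delta(\varepsilon)$ be such that $|x - x'| \le \delta \Rightarrow |f(x) - f(x')| \le \varepsilon$. We prove that there exists a positive integer $n_0$ such that $n \ge n_0 \Rightarrow \left|\frac{n}{S_n}a_j - \frac{n}{S_n + T_n}(a_j + b_j)\right| < \delta$ for any $1 \le j \le n$. Indeed, let $M > 0$ be such that $b_j \le M$ for all $j$. Since $a$ is non-decreasing, $a_j \le a_n$; moreover $T_n \le nM$, so
$$\left|\frac{n a_j}{S_n} - \frac{n(a_j + b_j)}{S_n + T_n}\right| = \frac{n}{S_n}\cdot\frac{|T_n a_j - S_n b_j|}{S_n + T_n} \le \frac{n}{S_n}\cdot\frac{T_n a_j + S_n b_j}{S_n + T_n} \le \frac{n}{S_n}\cdot\frac{nM a_n + S_n M}{S_n + T_n} \le M\left(\frac{n^2 a_n}{S_n^2} + \frac{n}{S_n}\right).$$
By assumption, $\frac{S_n^2}{n^2 a_n} \to \infty$, i.e., $\frac{n^2 a_n}{S_n^2} \to 0$. On the other hand, $\frac{n}{S_n} \to 0$, since $a$ is non-decreasing and unbounded, so that $\frac{S_n}{n} \to \infty$. It follows that there exists $n_0$ such that, for $n \ge n_0$ and $1 \le j \le n$, $\left|\frac{n a_j}{S_n} - \frac{n(a_j + b_j)}{S_n + T_n}\right| < \delta$, hence $\left|f\left(\frac{n a_j}{S_n}\right) - f\left(\frac{n(a_j + b_j)}{S_n + T_n}\right)\right| \le \varepsilon$. Therefore
$$\frac1n\left|\sum_{j=1}^n f\left(\frac{n}{S_n + T_n}(a_j + b_j)\right) - \sum_{j=1}^n f\left(\frac{n a_j}{S_n}\right)\right| \le \varepsilon \quad \text{for any } n \ge n_0.$$
As $\varepsilon$ is arbitrary, the claim immediately follows.
(ii) is obvious.
(iii) The assumption is that the sequence of polygonal lines given by the points $\left(\frac{k}{n}, \frac{a_{1:n} + \dots + a_{k:n}}{S_n}\right)_{0 \le k \le n}$ converges to some Lorenz curve $L$. Since $\alpha > 0$, the map $x \mapsto \alpha x + \beta$ preserves the order, so we have to see what happens with the polygonal lines given by the points
$$\left(\frac{k}{n}, \frac{\alpha(a_{1:n} + \dots + a_{k:n}) + k\beta}{\alpha S_n + n\beta}\right)_{0 \le k \le n} = \left(\frac{k}{n}, \frac{\frac{a_{1:n} + \dots + a_{k:n}}{S_n} + \frac{k\beta}{\alpha S_n}}{1 + \frac{n\beta}{\alpha S_n}}\right)_{0 \le k \le n}.$$
If $k = [nt]$ with $0 < t < 1$, then, as $n \to \infty$, $\frac{a_{1:n} + \dots + a_{k:n}}{S_n} \to L(t)$, $\frac{k\beta}{\alpha S_n} \to \frac{\beta t}{\alpha\mu}$ and $\frac{n\beta}{\alpha S_n} \to \frac{\beta}{\alpha\mu}$ if $\mu < \infty$. If $\mu = \infty$, the limit remains $L(t)$. □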
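Proposition 9 (iii) can be checked exactly on a periodic example of our own choosing: for $a = (1, 3, 1, 3, \dots)$ we have $\mu = 2$, and with $\alpha = 2$, $\beta = 1$ the transformed sequence is $(3, 7, 3, 7, \dots)$. The helper names below are hypothetical; `predicted_limit` is simply the right-hand side of (iii), with $L(t)$ replaced by its finite-$n$ value.

```python
from typing import Sequence


def empirical_lorenz(a: Sequence[float], t: float) -> float:
    """L_n(a)(k/n), k = floor(n t), computed from the sorted terms."""
    xs = sorted(a)
    k = int(len(xs) * t)
    return sum(xs[:k]) / sum(xs)


def predicted_limit(l_t: float, t: float, alpha: float, beta: float, mu: float) -> float:
    """Right-hand side of Proposition 9 (iii): (L(t) + c t) / (1 + c), c = beta / (alpha mu)."""
    c = beta / (alpha * mu)
    return (l_t + c * t) / (1.0 + c)


if __name__ == "__main__":
    alpha, beta = 2.0, 1.0
    a = [1.0, 3.0] * 500                   # periodic example, S_n / n = 2 = mu
    mu = sum(a) / len(a)
    transformed = [alpha * x + beta for x in a]
    for t in (0.25, 0.75):
        print(t, empirical_lorenz(transformed, t),
              predicted_limit(empirical_lorenz(a, t), t, alpha, beta, mu))
```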
Remark 4.
If $\mu = 0$, the result fails to be true. However, one may check that if $\sum_{k=1}^\infty a_k < \infty$, then $F_n^*(\alpha a + \beta) \Rightarrow \delta_1$.
Remark 5.
One may see that if $\beta \ge 0$, the Lorenz curve increases: since $L(t) \le t$, the limit in Proposition 9 (iii) is at least $L(t)$. This is a particular case of a theorem of Arnold [27], which says that if $X \ge 0$ is a random variable with $EX > 0$, then the Lorenz curve of $f(X)$ lies above the Lorenz curve of $X$ if and only if $f$ is non-decreasing and $\frac{f(x)}{x}$ is non-increasing. In our case $f(x) = \alpha x + \beta$.
Combining. 
Now we study what happens if we combine two sequences $a$ and $b$. Precisely: if we know the limits of $L_n(a)$ and $L_n(b)$, what can we say about the limit of their combination?
By combination we mean the sequence $a \& b = (a_1, b_1, a_2, b_2, \dots)$.
A problem that arises is to describe the limit of $F_n^*(a \& b)$ when we know the limits of $F_n^*(a)$ and $F_n^*(b)$.
We cannot expect the result to be some mixture of the two limit distributions, as the following example shows:
Proposition 10.
Let $a = (1, 2, 3, 4, \dots)$ and $b = (\alpha, 2\alpha, 3\alpha, \dots)$ with $\alpha > 0$. Then $F_n^*(a) = F_n^*(b) \Rightarrow \mathrm{Uniform}(0,2)$. However,
$$F_n^*(a \& b) \Rightarrow \frac12\left(\mathrm{Uniform}\left(0, \frac{4}{1+\alpha}\right) + \mathrm{Uniform}\left(0, \frac{4\alpha}{1+\alpha}\right)\right).$$
Proof. 
$F_n^*(a) = F_n^*(b) = \frac1n\sum_{k=1}^n \delta_{\frac{2k}{n+1}}$. If $f$ is bounded and continuous, then
$$\int f\,dF_n^*(a) = \frac1n\sum_{k=1}^n f\left(\frac{2k}{n+1}\right) \to \int_0^1 f(2x)\,dx = \frac12\int_0^2 f(x)\,dx = \int f\,d\mu, \qquad \text{where } \mu = \mathrm{Uniform}(0,2).$$
In order to compute $\int f\,dF_n^*(a \& b)$, we have to consider two cases: $n$ odd or $n$ even. If $n = 2m$, then the sum of the first $n$ terms of $a \& b$ is $(1+\alpha)\frac{m(m+1)}{2}$, hence
$$\int f\,dF_n^*(a \& b) = \frac{1}{2m}\left(\sum_{j=1}^m f\left(\frac{4j}{(1+\alpha)(m+1)}\right) + \sum_{j=1}^m f\left(\frac{4\alpha j}{(1+\alpha)(m+1)}\right)\right) = \frac{m+1}{2m}\left(\frac{1}{m+1}\sum_{j=1}^m f\left(\frac{4}{1+\alpha}\cdot\frac{j}{m+1}\right) + \frac{1}{m+1}\sum_{j=1}^m f\left(\frac{4\alpha}{1+\alpha}\cdot\frac{j}{m+1}\right)\right),$$
and the last expression converges to
$$\frac12\left(\int_0^1 f\left(\frac{4}{1+\alpha}x\right)dx + \int_0^1 f\left(\frac{4\alpha}{1+\alpha}x\right)dx\right) = \frac12\int f\,d\left(\mathrm{Uniform}\left(0, \frac{4}{1+\alpha}\right) + \mathrm{Uniform}\left(0, \frac{4\alpha}{1+\alpha}\right)\right).$$
If $n = 2m+1$, the computations are almost the same. □
Notice that if $\alpha = 1$ (hence $a = b$), then $F_n^*(a \& b) \Rightarrow \mathrm{Uniform}(0,2)$.
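A numerical check of Proposition 10, using the bounded continuous test function $f = \cos$ (our choice), for which $\int_0^1 \cos(cx)\,dx = \sin(c)/c$. The sketch (hypothetical names) compares $\int f\,dF_n^*(a \& b)$ with the predicted mixture of uniforms, and with $\mathrm{Uniform}(0,2)$, which is what a naive mixture of the two individual limits would give.

```python
import math
from typing import Callable


def combined_mean(alpha: float, m: int, f: Callable[[float], float]) -> float:
    """int f dF_n^*(a & b) for a = (1, 2, 3, ...), b = (alpha, 2 alpha, ...), n = 2m."""
    a = [float(k) for k in range(1, m + 1)]
    b = [alpha * k for k in range(1, m + 1)]
    c = [x for pair in zip(a, b) for x in pair]   # a & b = (a1, b1, a2, b2, ...)
    n, total = len(c), sum(c)
    return sum(f(n * x / total) for x in c) / n   # atoms n c_i / U_n, each of weight 1/n


def predicted_cos_mean(alpha: float) -> float:
    """int cos d[(U(0, 4/(1+alpha)) + U(0, 4 alpha/(1+alpha))) / 2]."""
    c1, c2 = 4.0 / (1.0 + alpha), 4.0 * alpha / (1.0 + alpha)
    return 0.5 * (math.sin(c1) / c1 + math.sin(c2) / c2)


if __name__ == "__main__":
    for alpha in (1.0, 3.0):
        print(alpha, combined_mean(alpha, 10_000, math.cos),
              predicted_cos_mean(alpha), math.sin(2.0) / 2.0)  # last: Uniform(0, 2)
```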
There is no general result concerning the limit of L n ( a & b ) . However, we can state some partial ones.
Proposition 11.
Let a , b be two non-decreasing sequences of positive numbers and let S n = a 1 + a 2 + + a n . Suppose that a & b is also non-decreasing, S n + 1 S n 1 as n   and that the weak limit l i m n F n * ( a ) = F does exist and e ( F ) = 1 .
Then, both L n ( b ) and L n ( a & b ) converge to the Lorenz curve of F , L F .
Or, otherwise written, l i m n F n * ( b ) = l i m n F n * ( a & b ) = F .
Proof. 
The assumption is that the points ( [ n p ] n , S [ n p ] S n )   ( p , L F ( p ) as n for any 0 < p < 1 . Let then T n = b 1 + b 2 + + b n . Notice that S n T n S n + 1 . It means that S [ n p ] S n + 1 T [ n p ] S n S [ n p ] + 1 S n . We supposed that S n + 1 S n 1 as n . It follows that l i m n S [ n p ] + 1 S n = l i m n S [ n p ] + 1 S [ n p ] l i m n S [ n p ] S n = 1 L F ( p ) and l i m n S [ n p ] S n + 1 = l i m n S [ n p ] S n l i m n S n S n + 1 = L F ( p ) 1 hence l i m n ( [ n p ] n , T [ n p ] T n ) = ( p , L F ( p ) , or, which is the same thing, that L n ( b )     L F . □
Let $c = a \,\&\, b$ and let $U_n = c_1 + \cdots + c_n$. If $n = 2m$ is even, then $U_n = S_m + T_m$, and if $n = 2m - 1$ is odd, then $U_n = S_m + T_{m-1}$. On the other hand:
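$$\frac{U_k}{U_n} = \begin{cases} \dfrac{S_j + T_j}{S_m + T_m} & \text{if } k = 2j,\ n = 2m \\[1ex] \dfrac{S_j + T_{j-1}}{S_m + T_m} & \text{if } k = 2j-1,\ n = 2m \\[1ex] \dfrac{S_j + T_j}{S_m + T_{m-1}} & \text{if } k = 2j,\ n = 2m-1 \\[1ex] \dfrac{S_j + T_{j-1}}{S_m + T_{m-1}} & \text{if } k = 2j-1,\ n = 2m-1 \end{cases} \tag{13}$$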
Suppose that $n, k \to \infty$ in such a way that $k/n \to p$ for some $0 < p < 1$. Then $[k/2]/[n/2] \to p$ too. In other words, $j/m \to p$, where $j$ and $m$ are those from (13). In all four cases, $U_k/U_n \to L_F(p)$.
For instance:
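$$\left|\frac{S_j + T_j}{S_m + T_m} - L_F(p)\right| = \left|\frac{S_m}{S_m + T_m}\left(\frac{S_j}{S_m} - L_F(p)\right) + \frac{T_m}{S_m + T_m}\left(\frac{T_j}{T_m} - L_F(p)\right)\right| \le \max\left(\left|\frac{S_j}{S_m} - L_F(p)\right|, \left|\frac{T_j}{T_m} - L_F(p)\right|\right),$$
and both terms on the right tend to 0.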
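A quick numerical check of Proposition 11, as a sketch under our own assumptions: $a_j = j$ (so $F = \mathrm{Uniform}(0, 2)$ and $L_F(p) = p^2$) and the hypothetical choice $b_j = j + 1/2$, which makes $a \,\&\, b$ non-decreasing. The function `lorenz_at` evaluates the empirical Lorenz curve at the vertex $[np]$.

```python
import math

def lorenz_at(values, p):
    """Vertex of the empirical Lorenz curve: S_[np] / S_n computed on the sorted values."""
    xs = sorted(values)
    k = math.floor(len(xs) * p)
    return math.fsum(xs[:k]) / math.fsum(xs)

m = 10_000
a = [float(j) for j in range(1, m + 1)]       # F_n^*(a) -> Uniform(0, 2), so L_F(p) = p^2
b = [j + 0.5 for j in range(1, m + 1)]        # a_j <= b_j <= a_{j+1}: a & b is non-decreasing
c = [x for pair in zip(a, b) for x in pair]   # c = a & b

for p in (0.25, 0.5, 0.75):
    print(f"p={p}: L_n(b)={lorenz_at(b, p):.5f}  L_n(a&b)={lorenz_at(c, p):.5f}  L_F={p * p:.5f}")
```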
As a byproduct, we may notice:
Proposition 12.
Let $a, b$ be two non-decreasing sequences of positive numbers. Suppose that $a \,\&\, b$ is also non-decreasing and that the weak limits coincide: $\lim_{n\to\infty} F_n^*(a) = \lim_{n\to\infty} F_n^*(b) = F$, with $e(F) = 1$. Then $\lim_{n\to\infty} F_n^*(a \,\&\, b) = F$. As a particular case, $\lim_{n\to\infty} F_n^*(a \,\&\, a) = F$.
Proof. 
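It is the same as the second part of the previous proof; since now $T_j/T_m \to L_F(p)$ is part of the hypothesis, we do not need the assumption that $S_{n+1}/S_n \to 1$.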
Mixture. 
Mixture is a generalization of combining. If $b$ and $c$ are two sequences and $a = b \,\&\, c$, then $b$ and $c$ have the same weight in $a$. The weights may also be different; then we have a mixture.
Definition 1.
Let $\mathbb{N}$ be the set of positive integers. Let $B = \{i_1 < i_2 < i_3 < \cdots\}$ and $C = \{j_1 < j_2 < j_3 < \cdots\}$ be subsets of $\mathbb{N}$ such that $B \cup C = \mathbb{N}$ and $B \cap C = \emptyset$. Let $b$ and $c$ be two sequences of reals. Let $\beta_n = |B \cap \{1, \ldots, n\}|$ and $\gamma_n = |C \cap \{1, \ldots, n\}|$. Suppose that $\beta_n/n \to p \in (0, 1)$, hence $\gamma_n/n \to q = 1 - p$. Define $a_n = b_k$ if $n \in B$, $n = i_k$, and $a_n = c_k$ if $n \in C$, $n = j_k$. Then $a$ is called a $(B, C)$-mixture of $b$ and $c$. Let $S_n(b) = b_1 + \cdots + b_n$ and $S_n(c) = c_1 + \cdots + c_n$. Then, obviously, $S_n(a) = S_{\beta_n}(b) + S_{\gamma_n}(c)$. If, moreover, the limits $\lim_n S_{\beta_n}(b)/S_n(a) = \pi_b$ and $\lim_n S_{\gamma_n}(c)/S_n(a) = \pi_c$ exist, then we call $a$ a good $(B, C)$-mixture of $b$ and $c$. Notice that if $B = \{1, 3, 5, \ldots\}$ and $C = \{2, 4, 6, \ldots\}$, then $a = b \,\&\, c$. Thus, combining is indeed a particular case of mixture.
The generalization is obvious. Split the positive integers into disjoint sets $B_1, \ldots, B_k$ and consider $k$ sequences $b_1, \ldots, b_k$. Suppose that all the sets $B_j = \{i_{j,1} < i_{j,2} < \cdots\}$ are infinite. Let $\beta_{j,n} = |B_j \cap \{1, 2, \ldots, n\}|$ and $p_j = \lim_{n} \beta_{j,n}/n$. Define the mixture by $a_n = b_{j,k}$ if $n \in B_j$, $n = i_{j,k}$. Then $S_n(a) = S_{\beta_{1,n}}(b_1) + S_{\beta_{2,n}}(b_2) + \cdots + S_{\beta_{k,n}}(b_k)$.
Good Mixtures.
The mixture $(B_1, \ldots, B_k)$ of $(b_1, \ldots, b_k)$ is called a good mixture if all the limits $\pi_j = \lim_{n} S_{\beta_{j,n}}(b_j)/S_n(a)$ exist. Remark that always $\pi_1 + \pi_2 + \cdots + \pi_k = 1$.
Proposition 13.
(i) Let $a$ be a $(B, C)$-mixture of $b$ and $c$. Suppose that $F_n(b) \to H_b$ and $F_n(c) \to H_c$.
Then $F_n(a) \to pH_b + qH_c$.
(ii) Suppose that $a$ is a good $(B, C)$-mixture of $b$ and $c$ and that $F_n^*(b) \to H_b^*$ and $F_n^*(c) \to H_c^*$. Then $F_n^*(a) \to pH_b^* \circ (h_{\pi_b/p})^{-1} + qH_c^* \circ (h_{\pi_c/q})^{-1}$, where $h_\lambda(x) = \lambda x$ is the homothety. Here $H \circ f^{-1}$ denotes the image of $H$ under $f$, defined by $H \circ f^{-1}(B) = H(f^{-1}(B))$ for all Borel sets $B$.
(iii) Generalization.
If $a$ is a $(B_1, B_2, \ldots, B_k)$-mixture of $b_1, \ldots, b_k$ and $F_n(b_j) \to H_j$, then $F_n(a) \to \sum_{j=1}^{k} p_j H_j$. If the mixture is good and $F_n^*(b_j) \to H_{b_j}^*$, then $F_n^*(a) \to \sum_{j=1}^{k} p_j H_{b_j}^* \circ (h_{\pi_j/p_j})^{-1}$.
Proof. 
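(i) Let $f: [0, \infty) \to [0, \infty)$ be continuous and bounded. We know that, as $n \to \infty$, $\frac{1}{n}\sum_{k=1}^{n} f(b_k) \to \int f\,dH_b$ and $\frac{1}{n}\sum_{k=1}^{n} f(c_k) \to \int f\,dH_c$. We also know that $\beta_n/n \to p \in (0, 1)$ and $\gamma_n/n \to q$. Then $\frac{1}{n}\sum_{k=1}^{n} f(a_k) = \frac{\beta_n}{n}\cdot\frac{1}{\beta_n}\sum_{k=1}^{\beta_n} f(b_k) + \frac{\gamma_n}{n}\cdot\frac{1}{\gamma_n}\sum_{k=1}^{\gamma_n} f(c_k) \to p\int f\,dH_b + q\int f\,dH_c = \int f\,d(pH_b + qH_c)$.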
(ii) Recall that $F_n^*(a) = \frac{1}{n}\sum_{k=1}^{n} \delta_{n a_k/S_n(a)}$. We know by hypothesis that $\frac{1}{n}\sum_{k=1}^{n} f\left(\frac{n b_k}{S_n(b)}\right) \to \int f\,dH_b^*$ and $\frac{1}{n}\sum_{k=1}^{n} f\left(\frac{n c_k}{S_n(c)}\right) \to \int f\,dH_c^*$ for any uniformly continuous and bounded $f: [0, \infty) \to [0, \infty)$. We want to prove that $\frac{1}{n}\sum_{k=1}^{n} f\left(\frac{n a_k}{S_n(a)}\right) \to p\int f \circ h_{\pi_b/p}\,dH_b^* + q\int f \circ h_{\pi_c/q}\,dH_c^*$ for any bounded continuous $f$. It is known (see, for instance, Reference [26], p. 371) that it is enough to consider only continuous functions with compact support, provided that the guessed limit is indeed a probability. This is our case. Thus, we write:
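$$\frac{1}{n}\sum_{k=1}^{n} f\left(\frac{n a_k}{S_n(a)}\right) = \frac{\beta_n}{n}\cdot\frac{1}{\beta_n}\sum_{k=1}^{\beta_n} f\left(\frac{n S_{\beta_n}(b)}{S_n(a)\,\beta_n}\cdot\frac{\beta_n b_k}{S_{\beta_n}(b)}\right) + \frac{\gamma_n}{n}\cdot\frac{1}{\gamma_n}\sum_{k=1}^{\gamma_n} f\left(\frac{n S_{\gamma_n}(c)}{S_n(a)\,\gamma_n}\cdot\frac{\gamma_n c_k}{S_{\gamma_n}(c)}\right)$$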
Notice that, as $n \to \infty$, $\frac{n S_{\beta_n}(b)}{S_n(a)\,\beta_n} = \frac{n}{\beta_n}\cdot\frac{S_{\beta_n}(b)}{S_n(a)} \to \frac{\pi_b}{p}$ and $\frac{n S_{\gamma_n}(c)}{S_n(a)\,\gamma_n} \to \frac{\pi_c}{q}$. Taking into account the fact that $f$ is uniformly continuous and its support is included in some interval $[0, M]$, the reader can check that:
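$\lim_n \frac{1}{\beta_n}\sum_{k=1}^{\beta_n} f\left(\frac{n S_{\beta_n}(b)}{S_n(a)\,\beta_n}\cdot\frac{\beta_n b_k}{S_{\beta_n}(b)}\right) = \lim_n \frac{1}{\beta_n}\sum_{k=1}^{\beta_n} f\left(\frac{\pi_b}{p}\cdot\frac{\beta_n b_k}{S_{\beta_n}(b)}\right)$ and, in the same way, $\lim_n \frac{1}{\gamma_n}\sum_{k=1}^{\gamma_n} f\left(\frac{n S_{\gamma_n}(c)}{S_n(a)\,\gamma_n}\cdot\frac{\gamma_n c_k}{S_{\gamma_n}(c)}\right) = \lim_n \frac{1}{\gamma_n}\sum_{k=1}^{\gamma_n} f\left(\frac{\pi_c}{q}\cdot\frac{\gamma_n c_k}{S_{\gamma_n}(c)}\right)$. But, according to our hypothesis, $\lim_n \frac{1}{\beta_n}\sum_{k=1}^{\beta_n} f\left(\frac{\pi_b}{p}\cdot\frac{\beta_n b_k}{S_{\beta_n}(b)}\right) = \int f\left(\frac{\pi_b}{p}x\right)dH_b^*(x)$ and $\lim_n \frac{1}{\gamma_n}\sum_{k=1}^{\gamma_n} f\left(\frac{\pi_c}{q}\cdot\frac{\gamma_n c_k}{S_{\gamma_n}(c)}\right) = \int f\left(\frac{\pi_c}{q}x\right)dH_c^*(x)$.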
The result is that
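$$\lim_n \frac{1}{n}\sum_{k=1}^{n} f\left(\frac{n a_k}{S_n(a)}\right) = p\int f\left(\frac{\pi_b}{p}x\right)dH_b^*(x) + q\int f\left(\frac{\pi_c}{q}x\right)dH_c^*(x) = \int f\,d\left(pH_b^* \circ (h_{\pi_b/p})^{-1} + qH_c^* \circ (h_{\pi_c/q})^{-1}\right)$$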
(iii) Same proof. □
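Proposition 13(ii) can be illustrated on a concrete mixture. In this sketch (our own example, with hypothetical helper names) $b = c = (1, 2, 3, \ldots)$, so $H_b^* = H_c^* = \mathrm{Uniform}(0, 2)$, and $B$ is the set of multiples of 3, so $p = 1/3$, $\pi_b \approx 1/5$, $\pi_c \approx 4/5$. It compares $\int \cos\,dF_n^*(a)$ with the predicted limit.

```python
import math

def mixture(b, c, in_b, n):
    """First n terms of the (B, C)-mixture of b and c (Definition 1).

    in_b(i) says whether the index i belongs to B; the k-th index of B
    receives b[k - 1] and the k-th index of C receives c[k - 1].
    Returns the terms together with beta_n and gamma_n.
    """
    terms, beta, gamma = [], 0, 0
    for i in range(1, n + 1):
        if in_b(i):
            terms.append(b[beta])
            beta += 1
        else:
            terms.append(c[gamma])
            gamma += 1
    return terms, beta, gamma

def unit_integral(g, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / steps
    return h * math.fsum(g((i + 0.5) * h) for i in range(steps))

n = 30_000
seq = [float(k) for k in range(1, n + 1)]        # b = c = (1, 2, 3, ...)
a, beta, gamma = mixture(seq, seq, lambda i: i % 3 == 0, n)   # B = multiples of 3
s_a = math.fsum(a)
p, q = beta / n, gamma / n
pi_b = math.fsum(seq[:beta]) / s_a
pi_c = math.fsum(seq[:gamma]) / s_a

f = math.cos
empirical = math.fsum(f(n * x / s_a) for x in a) / n
# Integral of f(lam * x) against Uniform(0, 2) = integral of f(2 * lam * u) over u in [0, 1].
predicted = (p * unit_integral(lambda u: f(2 * (pi_b / p) * u))
             + q * unit_integral(lambda u: f(2 * (pi_c / q) * u)))
print(f"p={p:.4f} q={q:.4f} pi_b={pi_b:.4f} pi_c={pi_c:.4f}")
print(f"empirical {empirical:.5f}  predicted {predicted:.5f}")
```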
Corollary 1.
Assume the same conditions as in Proposition 13. If $H_{b_j} = H$ for all $j$, then $F_n(a) \to H$, and if $H_{b_j}^* = H^*$ for all $j$, then $F_n^*(a) \to \sum_{j=1}^{k} p_j H^* \circ (h_{\pi_j/p_j})^{-1}$.
Proof. 
Obvious.
Lorenz Domination.
Let $F$ and $G$ be two probability distributions on $[0, \infty)$ and let $L_F$, $L_G$ be their Lorenz curves. If $L_F \ge L_G$, we say that $F$ is Lorenz dominated by $G$ or, to quote Barry Arnold [28], that “$G$ exhibits more inequality than $F$”. Usually one denotes this domination by $F \le_L G$. It is known that, for $F$ and $G$ with the same expectation, $F \le_L G \iff \int u\,dF \le \int u\,dG$ for all convex functions $u: [0, \infty) \to \mathbb{R}$ for which the integrals are finite. See Reference [29].
It is interesting that a mixture of the type described in Corollary 1 always increases the inequality.
Proposition 14.
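Let $F$ be a probability distribution on $[0, \infty)$. Let $(p_j)_{1 \le j \le k}$ and $(\pi_j)_{1 \le j \le k}$ be convex combinations with positive coefficients (that is, $p_j, \pi_j > 0$ and $\sum_j p_j = \sum_j \pi_j = 1$). Let $\lambda_j = \pi_j/p_j$. Then $F \le_L \sum_{j=1}^{k} p_j F \circ (h_{\lambda_j})^{-1}$.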
Proof. 
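Both distributions have the same expectation, since $\sum_j p_j\lambda_j = \sum_j \pi_j = 1$. Let $u$ be convex. Then, by Jensen's inequality,
$$\int u\,d\left(\sum_{j=1}^{k} p_j F \circ (h_{\lambda_j})^{-1}\right) = \int \sum_{j=1}^{k} p_j u(\lambda_j x)\,dF(x) \ge \int u\left(\sum_{j=1}^{k} p_j\lambda_j x\right)dF(x) = \int u\,dF.$$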
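The domination in Proposition 14 can be seen on a discretised example. The sketch below is an illustration under our own assumptions: $F$ is a 400-point discretisation of $\mathrm{Uniform}(0, 2)$ (mean 1), and the weights $p = (0.5, 0.3, 0.2)$, $\pi = (0.2, 0.3, 0.5)$ are arbitrary. Since the proposition holds for any distribution on $[0, \infty)$, the inequality $L_F \ge L_{\text{mix}}$ should hold exactly, up to rounding.

```python
def lorenz(atoms, t):
    """Lorenz curve at t of the discrete law sum_i w_i * delta_{x_i}; atoms = [(x_i, w_i)], x_i >= 0."""
    atoms = sorted(atoms)
    mean = sum(x * w for x, w in atoms)
    acc_w = acc_v = 0.0
    for x, w in atoms:
        if acc_w + w >= t:
            return (acc_v + (t - acc_w) * x) / mean   # take only part of the current atom
        acc_w += w
        acc_v += x * w
    return 1.0

def integral(atoms, u):
    """Integral of u against the discrete law given by atoms."""
    return sum(w * u(x) for x, w in atoms)

N = 400
F = [(2 * (i + 0.5) / N, 1 / N) for i in range(N)]   # discretised Uniform(0, 2), mean 1
p = [0.5, 0.3, 0.2]                                   # weights p_j (hypothetical)
pi = [0.2, 0.3, 0.5]                                  # shares pi_j (hypothetical)
lam = [pi_j / p_j for pi_j, p_j in zip(pi, p)]        # lambda_j = pi_j / p_j
mix = [(lam_j * x, p_j * w) for p_j, lam_j in zip(p, lam) for x, w in F]

grid = [i / 100 for i in range(101)]
gaps = [lorenz(F, t) - lorenz(mix, t) for t in grid]  # L_F - L_mix, should be >= 0
print(f"min gap {min(gaps):.2e}, max gap {max(gaps):.4f}")
print(integral(F, lambda x: x * x), integral(mix, lambda x: x * x))   # convex u(x) = x^2
```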
Example 6.
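$a_n = n^\alpha$.
The sequence $(\Lambda_n)_n$. If $\alpha > 0$, then $a_n \to \infty$ and the sequence is increasing; $(F_n(a))_n$ has no limit (unless we consider the abstract distribution $\delta_\infty$). Thus $\Lambda_n(a)(p) \to \infty$. If $\alpha = 0$, then $a_n = 1$ and $F_n = \delta_1$. If $\alpha < 0$, then $a_n \to 0$ and $F_n \to \delta_0$. Hence
$$\lim_{n\to\infty} \Lambda_n(p) = \begin{cases} 0 & \text{if } \alpha < 0 \\ p & \text{if } \alpha = 0 \\ \infty & \text{if } \alpha > 0. \end{cases}$$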
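The sequence $(L_n)_n$. If $\alpha > 0$, the sequence $a$ is increasing. The extreme points of the polygonal line $L_n$ are $\left(\frac{k}{n}, \frac{s_k}{s_n}\right)$, $1 \le k \le n$, where $s_k = 1^\alpha + 2^\alpha + \cdots + k^\alpha$. Comparing the sum with integrals, $\frac{k^{\alpha+1} + \alpha}{\alpha+1} \le s_k \le \frac{(k+1)^{\alpha+1} - 1}{\alpha+1}$, hence $\frac{k^{\alpha+1} + \alpha}{(n+1)^{\alpha+1} - 1} \le \frac{s_k}{s_n} \le \frac{(k+1)^{\alpha+1} - 1}{n^{\alpha+1} + \alpha}$. Let $0 < p < 1$ and let $(k_n)_n$ be a sequence such that $k_n/n \to p$. Then
$$\frac{\left(\frac{k_n}{n}\right)^{\alpha+1} + \frac{\alpha}{n^{\alpha+1}}}{\left(1 + \frac{1}{n}\right)^{\alpha+1} - \frac{1}{n^{\alpha+1}}} \le \frac{s_{k_n}}{s_n} \le \frac{\left(\frac{k_n+1}{n}\right)^{\alpha+1} - \frac{1}{n^{\alpha+1}}}{1 + \frac{\alpha}{n^{\alpha+1}}}.$$
Passing to the limit as $n \to \infty$, we obtain $s_{k_n}/s_n \to p^{\alpha+1}$. Thus, $L_n(k_n/n) \to p^{\alpha+1}$. By Lemma 1, we get that $L_n(p) \to p^{\alpha+1}$. Similar considerations may be applied in the other cases; the result is that
$$\lim_{n\to\infty} L_n(p) = \begin{cases} 0 & \text{if } \alpha \le -1 \\ 1 - (1-p)^{\alpha+1} & \text{if } -1 < \alpha \le 0 \\ p^{\alpha+1} & \text{if } \alpha > 0. \end{cases}$$
Notice that if $\alpha \le -1$, the limit is a degenerate Lorenz curve.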
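The limits of Example 6 can be compared with the empirical Lorenz curves at a finite $n$. This sketch evaluates $L_n(1/2)$ for $n = 200\,000$ and a few values of $\alpha$ chosen by us; convergence is slow for $\alpha$ near $-1$, so that case is left out.

```python
import math

def lorenz_at(values, p):
    """Empirical Lorenz curve L_n(p) at the vertex k = [n p]."""
    xs = sorted(values)
    k = math.floor(len(xs) * p)
    return math.fsum(xs[:k]) / math.fsum(xs)

def example6_limit(alpha, p):
    """Limit of L_n(p) for a_n = n**alpha, as stated in Example 6."""
    if alpha <= -1:
        return 0.0
    if alpha <= 0:
        return 1 - (1 - p) ** (alpha + 1)
    return p ** (alpha + 1)

n, p = 200_000, 0.5
results = {}
for alpha in (2.0, 1.0, 0.0, -0.5, -2.0):
    a = [k ** alpha for k in range(1, n + 1)]
    results[alpha] = (lorenz_at(a, p), example6_limit(alpha, p))
    print(alpha, *(round(v, 4) for v in results[alpha]))
```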
Example 7.
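$a = (1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, \ldots\ (2^4 \text{ times}), 0, \ldots\ (2^5 \text{ times}), \ldots)$: blocks of lengths $1, 2, 4, 8, \ldots$ filled alternately with ones and zeros.
Let $N = 1 + 2 + \cdots + 2^{n-1} = 2^n - 1$. If $n = 2m$ is even, $O_N(a) = (0, \ldots, 0\ (\tfrac{2}{3}(2^n - 1) \text{ times}), 1, \ldots, 1\ (\tfrac{1}{3}(2^n - 1) \text{ times}))$, and if $n = 2m+1$ is odd, $O_N(a)$ consists of about $\tfrac{1}{3}(2^n - 1)$ zeros followed by about $\tfrac{2}{3}(2^n - 1)$ ones. In the first case, $F_N = \begin{pmatrix} 0 & 1 \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix}$ and in the second one it is (approximately) $F_N = \begin{pmatrix} 0 & 1 \\ \frac{1}{3} & \frac{2}{3} \end{pmatrix}$. For other values of $N$ we obtain intermediate values. The pre-Lorenz curves move between $\left(p - \tfrac{2}{3}\right)_+$ and $\left(p - \tfrac{1}{3}\right)_+$, and the Lorenz ones move between $3\left(p - \tfrac{2}{3}\right)_+$ and $\tfrac{3}{2}\left(p - \tfrac{1}{3}\right)_+$. The limits do not exist.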
Example 8.
A generic example of injective sequences $a$ with the property that $F_n(a) \to U(0, 1)$. Let $(A_k)_k$ be a sequence of positive numbers. Suppose that $A_k > 1$, that $(A_k)_k$ is increasing, and that $\lim_{k} A_k = \infty$. Let $\sigma(0) = 0$ and $\sigma(k) = [A_1] + [A_2] + \cdots + [A_k]$ for $k \ge 1$ ($[A]$ means the integer part of $A$). Suppose, moreover, that $A_{k+1}/A_k \to 1$ and that the ratio $A_m/A_k$ is irrational for $m \ne k$. Any positive integer $n$ can be written uniquely as $n = \sigma(k) + j$ for some $k = k(n)$ and $j = j(n) \in \{0, \ldots, [A_{k+1}] - 1\}$. For this $n$, define $a_n = \frac{j+1}{A_{k+1}}$.
Then all the elements of $a$ are different, $F_n(a) \to U(0, 1)$ and $L_n(a)(p) \to p^2 = L_{U(0,1)}(p)$.
Proof. 
The first assertion is obvious: two terms in the same block are clearly different, and if $a_m = a_n$ for some $m \ne n$ in different blocks, then $i/A_k = j/A_l$ for some positive integers $i, j$ and some $k \ne l$, contradicting the fact that $A_l/A_k$ is irrational. For the second one, we have to check that $\frac{f(a_1) + f(a_2) + \cdots + f(a_n)}{n} \to \int_0^1 f(x)\,dx$ for any bounded and continuous $f: \mathbb{R} \to \mathbb{R}$ (Proposition 7). Let us denote by $U_A$ the distribution $\frac{1}{[A]}\sum_{i=1}^{[A]} \delta_{i/A}$. Then $\int f\,dU_A = \frac{1}{[A]}\sum_{i=1}^{[A]} f\left(\frac{i}{A}\right)$. It is easy to see that $\lim_{A\to\infty} \int f\,dU_A = \int_0^1 f(x)\,dx$ for any bounded continuous $f$ (more generally, for every Riemann integrable $f$ on $[0, 1]$).
Let now $k \ge 1$, $j < [A_{k+1}]$ and $n = \sigma(k) + j$.
Then $\frac{f(a_1) + \cdots + f(a_n)}{n} = \frac{[A_1]\int f\,dU_{A_1} + [A_2]\int f\,dU_{A_2} + \cdots + [A_k]\int f\,dU_{A_k} + f(a_{\sigma(k)+1}) + \cdots + f(a_{\sigma(k)+j})}{n}$. Suppose that $f \ge 0$. Then, as $\sigma(k) \le n \le \sigma(k+1)$, we have the estimate
$$\frac{\sum_{i=1}^{k} [A_i]\int f\,dU_{A_i}}{\sigma(k+1)} \le \frac{\sum_{j=1}^{n} f(a_j)}{n} \le \frac{\sum_{i=1}^{k+1} [A_i]\int f\,dU_{A_i}}{\sigma(k)}.$$
Apply the Stolz-Cesàro lemma: as $k \to \infty$, the right-hand term has the same limit as $\frac{[A_{k+1}]\int f\,dU_{A_{k+1}}}{[A_k]}$, and the left-hand one has the same limit as $\frac{[A_k]\int f\,dU_{A_k}}{[A_{k+1}]}$. Since $A_{k+1}/A_k \to 1$, both limits are the same, namely, $\int_0^1 f(x)\,dx$. This ends the proof if $f \ge 0$. If not, write $f = f^+ - f^-$ and that is all. Thus, $F_n(a) \to U(0, 1)$ and $\Lambda_n(p) \to p^2/2$. To prove the last assertion, we have to check that the expectations $e(F_n(a))$ converge to the expectation of $U(0, 1)$, i.e., to $1/2$. The reasoning is the same as before. The expectation of $U_A$ is $e_A := \frac{[A] + 1}{2A}$, hence
$$\frac{\sum_{i=1}^{k} [A_i]\, e_{A_i}}{\sigma(k+1)} \le \frac{\sum_{j=1}^{n} a_j}{n} \le \frac{\sum_{i=1}^{k+1} [A_i]\, e_{A_i}}{\sigma(k)},$$
and both bounds converge to $1/2$ by the same Stolz-Cesàro argument.

4. The Empirical Lorenz and Pre-Lorenz Curves in the Broken Stick Models

Here we suppose that the stick is not broken at random but by a deterministic sequence of points. What happens when $n$ is large? The resulting small sticks are called spacings or lags. We shall be interested in the existence of a limit for the spacings of $a$.
Definition 2.
Let $a = (a_n)_n$ be a sequence of reals. For any $n \ge 2$ we sort the first $n$ terms of $a$ as $a_{1:n} \le a_{2:n} \le \cdots \le a_{n:n}$. For $1 \le j \le n-1$ let $b_{j,n} = a_{(j+1):n} - a_{j:n}$. These are the spacings of $a$ at level $n$.
The vector $D_n(a) = (b_{1,n}, \ldots, b_{n-1,n})$ is the vector of spacings after $n$ trials.
The probability measure $G_n = G_n(a) = \frac{1}{n-1}\sum_{j=1}^{n-1} \delta_{b_{j,n}}$ is the empirical distribution of spacings after $n$ trials.
Remark 6.
Let $m_n(a) = a_{1:n} = \min(a_1, \ldots, a_n)$ and $M_n(a) = a_{n:n} = \max(a_1, \ldots, a_n)$. We may think of the interval $(m_n(a), M_n(a))$ as a stick which is broken at the points $a_{j:n}$, $2 \le j \le n-1$. The $n-1$ segments obtained after this operation are the spacings. If two points coincide, the corresponding segment is degenerate.
From the definition, we see that the Lorenz curve of the spacings is invariant under positive scalings and translations.
Proposition 15.
$L_{G_n(a)} = L_{G_n(\alpha a + \beta)}$ for any $\alpha > 0$, $\beta \in \mathbb{R}$.
Proof. 
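The spacings which correspond to $(\alpha a_n + \beta)_n$ are $\alpha b_{j,n} = \alpha(a_{(j+1):n} - a_{j:n})$, and a Lorenz curve does not change when all the values are multiplied by $\alpha > 0$.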
Proposition 16.
If the sequence $a$ is bounded, then $G_n \to \delta_0$ and $e(G_n) \to 0$ as $n \to \infty$.
Proof. 
The second assertion is obvious: $e(G_n) = \frac{1}{n-1}\sum_{j=1}^{n-1} b_{j,n} = \frac{M_n - m_n}{n-1} \to 0$ as $n \to \infty$. To check the first one, let $f \in C_b(\mathbb{R})$. Then $\int f\,dG_{n+1}(a) = \frac{f(b_1) + f(b_2) + \cdots + f(b_n)}{n}$ with $b_j = a_{(j+1):(n+1)} - a_{j:(n+1)}$. Suppose that $m \le a_n \le M$ for some $m < M$ and for every $n$. Then $\frac{b_1 + b_2 + \cdots + b_n}{n} \le \frac{M - m}{n} \to 0$ as $n \to \infty$. Let $\varepsilon > 0$ be fixed and let $\delta = \delta(\varepsilon) > 0$ be such that $|x| < \delta \Rightarrow |f(x) - f(0)| < \varepsilon$. Notice that $k_n := |\{j \le n : b_j \ge \delta\}|$ is always at most $(M - m)/\delta$, hence $k_n/n \to 0$.
Let $A = \inf f$ and $B = \sup f$. Then $\frac{f(b_1) + \cdots + f(b_n)}{n} = \frac{\sum_{j: b_j < \delta} f(b_j) + \sum_{j: b_j \ge \delta} f(b_j)}{n} \le \frac{(n - k_n)(f(0) + \varepsilon) + Bk_n}{n}$ and, similarly, $\frac{f(b_1) + \cdots + f(b_n)}{n} \ge \frac{(n - k_n)(f(0) - \varepsilon) + Ak_n}{n}$. Thus,
$$\frac{(n - k_n)(f(0) - \varepsilon) + Ak_n}{n} \le \int f\,dG_{n+1}(a) \le \frac{(n - k_n)(f(0) + \varepsilon) + Bk_n}{n}.$$
Passing to the limit as $n \to \infty$, we get $f(0) - \varepsilon \le \liminf_n \int f\,dG_{n+1}(a) \le \limsup_n \int f\,dG_{n+1}(a) \le f(0) + \varepsilon$. As $\varepsilon$ is arbitrary, $G_n \to \delta_0$, as claimed.
Remark 7.
That is why the distributions $G_n$ are not very informative if we want to study the asymptotic behavior of the broken stick. We normalize them in such a way that their expectation is equal to 1, replacing them with
$$G_n^* = \frac{1}{n-1}\sum_{j=1}^{n-1} \delta_{(n-1)b_{j,n}/(M_n - m_n)}.$$
Now $e(G_n^*) = 1$ for any $n$. The Lorenz curve remains the same, but we may hope that the weak limit of $G_n^*$ exists and is not trivial. Moreover, $\Lambda_{G_n^*} = L_{G_n}$.
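The objects of Definition 2 and Remark 7 translate directly into code. This is a minimal sketch with hypothetical helper names, using the bounded sequence $a_n = \sin n$ of Remark 8: it checks that $e(G_n) = (M_n - m_n)/(n-1)$ is small (Proposition 16), that $e(G_n^*) = 1$, and that the Lorenz curve of the spacings is unchanged by the affine map $x \mapsto 3x + 5$ (Proposition 15).

```python
import math

def spacings(values):
    """Spacings b_{j,n} = a_{(j+1):n} - a_{j:n} of the given first n terms (Definition 2)."""
    xs = sorted(values)
    return [y - x for x, y in zip(xs, xs[1:])]

def normalized(values):
    """Atoms (n - 1) * b_{j,n} / (M_n - m_n) of G_n^* (Remark 7); their mean is 1."""
    d = spacings(values)
    width = max(values) - min(values)            # M_n - m_n
    return [len(d) * x / width for x in d]

def lorenz_at(values, p):
    """Empirical Lorenz curve of the given values at the vertex [len * p]."""
    xs = sorted(values)
    k = math.floor(len(xs) * p)
    return math.fsum(xs[:k]) / math.fsum(xs)

n = 2_000
a = [math.sin(k) for k in range(1, n + 1)]       # a bounded sequence, as in Remark 8
e_G = math.fsum(spacings(a)) / (n - 1)           # e(G_n) = (M_n - m_n) / (n - 1) -> 0
e_G_star = math.fsum(normalized(a)) / (n - 1)    # e(G_n^*) = 1
shifted = [3 * x + 5 for x in a]                  # alpha = 3, beta = 5 in Proposition 15
print(e_G, e_G_star)
print(lorenz_at(spacings(a), 0.5), lorenz_at(spacings(shifted), 0.5))
```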
Definition 3.
$G_n^*(a)$ is called the normalized empirical distribution of the spacings generated by $a$.
Example 9.
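$a = (1, 0, 1, 0, 1, 0, \ldots)$. As $O_n(a) = (0, \ldots, 0, 1, \ldots, 1)$, the spacings are $(0, \ldots, 0, 1, 0, \ldots, 0)$ ($n-2$ times 0 and once a 1). Thus, $\Lambda_{G_n} \to 0$ as $n \to \infty$. As $G_n^* = \begin{pmatrix} 0 & n-1 \\ \frac{n-2}{n-1} & \frac{1}{n-1} \end{pmatrix} \to \delta_0$, we get $\Lambda_{G_n^*} = L_{G_n} \to 0$ too as $n \to \infty$. Recall that $F_n(a) \to \begin{pmatrix} 0 & 1 \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}$.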
Example 10.
$a$ is a periodic sequence: $a = (x_1, \ldots, x_k, x_1, \ldots, x_k, \ldots)$ with $x_1 < x_2 < \cdots < x_k$.
As $O_n(a) = (x_1, \ldots, x_1, x_2, \ldots, x_2, \ldots, x_k, \ldots, x_k)$, we see that, for $n \ge k$, the spacings are $(0, \ldots, 0, x_2 - x_1, 0, \ldots, 0, x_3 - x_2, 0, \ldots, 0, \ldots, x_k - x_{k-1}, 0, \ldots, 0)$ ($n - k$ times 0 and once each difference $x_{j+1} - x_j$). If $n = km$, then:
$$G_n^* = \begin{pmatrix} 0 & \frac{(n-1)(x_2 - x_1)}{x_k - x_1} & \frac{(n-1)(x_3 - x_2)}{x_k - x_1} & \cdots & \frac{(n-1)(x_k - x_{k-1})}{x_k - x_1} \\ \frac{n-k}{n-1} & \frac{1}{n-1} & \frac{1}{n-1} & \cdots & \frac{1}{n-1} \end{pmatrix}.$$
Thus, $\Lambda_{G_n^*} = L_{G_n} \to 0$.
Example 11.
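$a_n = n^\alpha$. If $\alpha > 1$, the spacings are $(2^\alpha - 1, 3^\alpha - 2^\alpha, \ldots, n^\alpha - (n-1)^\alpha)$, an increasing sequence with sum $n^\alpha - 1$; hence
$$G_n^* = \begin{pmatrix} \frac{(n-1)(2^\alpha - 1)}{n^\alpha - 1} & \frac{(n-1)(3^\alpha - 2^\alpha)}{n^\alpha - 1} & \cdots & \frac{(n-1)(n^\alpha - (n-1)^\alpha)}{n^\alpha - 1} \\ \frac{1}{n-1} & \frac{1}{n-1} & \cdots & \frac{1}{n-1} \end{pmatrix}.$$
Then $\Lambda_{G_n^*}$ is the polygonal line given by the points $\left(\frac{k-1}{n-1}, \frac{k^\alpha - 1}{n^\alpha - 1}\right)$, $1 \le k \le n$. If $k = k_n$ with $k_n/n \to p$, these points converge to $(p, p^\alpha)$. According to Lemma 1, $\Lambda_{G_n^*}(p) = L_{G_n}(p) \to p^\alpha$ as $n \to \infty$.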
If $\alpha = 1$, the spacings are $(1, 1, \ldots, 1)$, so $G_n = \delta_1$ and $\Lambda_{G_n}(p) = L_{G_n}(p) = p$.
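If $\alpha \in (0, 1)$, the spacings are $(2^\alpha - 1, 3^\alpha - 2^\alpha, \ldots, n^\alpha - (n-1)^\alpha)$. As they are decreasing:
$$G_n^* = \begin{pmatrix} \frac{(n-1)(n^\alpha - (n-1)^\alpha)}{n^\alpha - 1} & \frac{(n-1)((n-1)^\alpha - (n-2)^\alpha)}{n^\alpha - 1} & \cdots & \frac{(n-1)(2^\alpha - 1)}{n^\alpha - 1} \\ \frac{1}{n-1} & \frac{1}{n-1} & \cdots & \frac{1}{n-1} \end{pmatrix}.$$
Then $\Lambda_{G_n^*}$ is the polygonal line given by the points $\left(\frac{k}{n-1}, \frac{n^\alpha - (n-k)^\alpha}{n^\alpha - 1}\right)$, $0 \le k \le n-1$. If $k = [np]$, these points converge to $(p, 1 - (1-p)^\alpha)$. According to Lemma 1, $\Lambda_{G_n^*}(p) = L_{G_n}(p) \to 1 - (1-p)^\alpha$ as $n \to \infty$.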
If $\alpha = 0$, the spacings are $(0, 0, \ldots, 0)$ and $G_n = \delta_0$. The Lorenz curve makes no sense.
If $\alpha < 0$, the sequence $a$ is convergent, hence $L_{G_n} \to 0$ according to Proposition 17 below.
Consider again the sequence $a = (1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, \ldots\ (2^4 \text{ times}), 0, \ldots\ (2^5 \text{ times}), \ldots)$ from Example 7. Its spacings are the same as in Example 9. Hence $\Lambda_{G_n^*} = L_{G_n} \to 0$, but $F_n(a)$ has no limit.
A first question is: if we know that the sequence $(F_n^*(a))_n$ has a limit, does it follow that the sequence $(G_n^*(a))_n$ has a limit? First, we should notice that there is no general connection between the limit of $F_n^*$ and the limit of $G_n^*$.
Here is an extension of Proposition 16: sometimes not only $G_n(a) \to \delta_0$, but also $G_n^*(a) \to \delta_0$. In addition, it is possible that $F_n(a)$ has no limit even if $G_n(a)$ has one.
Proposition 17.
Let $a = (a_n)_n$ be bounded. Suppose that it has only a discrete set of limit points. Then $G_n^*(a) \to \delta_0$ and $L_{G_n(a)} \to 0$. It is possible that $L_{F_n(a)}$ and $\Lambda_{F_n(a)}$ have no limit.
Proof. 
A discrete set is at most countable. Let $A = \{a_n : n \ge 1\}$. By hypothesis, the set of limit points $B = \mathrm{Closure}(A) \setminus A$ is at most countable, hence $\mathrm{Closure}(A) = A \cup B$ is at most countable, too. Let $m = \inf_n a_n$ and $M = \sup_n a_n$.
The set $D = (m, M) \setminus \mathrm{Closure}(A)$ is open, hence it can be written as $D = \bigcup_{j \in J} I_j$, where $I_j = (\alpha_j, \beta_j)$ are disjoint open intervals with endpoints in $\mathrm{Closure}(A)$. Let $\mu_j = \beta_j - \alpha_j$ and $\mu = (\mu_j)_j$. As $\mathrm{Closure}(A)$ is countable, it is a Lebesgue null set, so $\sum_{j \in J} \mu_j = M - m$; hence the series is convergent. According to Proposition 8, $L_n(\mu) \to 0$ as $n \to \infty$. If we relabel the sequence $a$ in such a way that the sequence $(\mu_n)_n$ is non-increasing, we see that $L_{G_n(a)} = L_n(\mu) \to 0$. Otherwise written, $\lim_{n} L_{G_n^*(a)}(p) = 0$.
It is easy to find cases when $L_{G_n}$ has a limit but $L_{F_n}$ does not: it is enough to consider the sequence of Example 7, discussed after Example 11 above. It has the same spacings as the one from Example 9, thus $L_{G_n} \to 0$, but $L_{F_n}$ has no limit.
Remark 8.
A consequence of Proposition 17 is that, if the sequence $a$ is bounded, a necessary condition for $G_n^*$ to have a limit different from $\delta_0$ is that $\lambda(\mathrm{Closure}\{a_n : n \ge 1\}) > 0$, where $\lambda$ is the Lebesgue measure: the sequence should be dense in a set of positive Lebesgue measure. For instance, if $a_n = \sin n$, then $\mathrm{Closure}\{a_n : n \ge 1\} = [-1, 1]$. (For this very sequence we are not able to decide if $G_n^*(a)$ has a limit. The computer says it does not, if you can believe it.)
Conversely, it is possible that $F_n^*(a)$ has a limit but $G_n^*(a)$ does not.
Proposition 18.
Let $\varepsilon = (\varepsilon_n)_n \subset [0, 1]$ and $a = (1, 1 + \varepsilon_1, 2, 2 + \varepsilon_2, 3, 3 + \varepsilon_3, \ldots)$.
Then $F_n^*(a) \to \mathrm{Uniform}(0, 2)$, but it is possible that $G_n^*(a)$ has no limit.
Proof. 
Let $b = (1, 2, 3, \ldots)$ and $c = (1 + \varepsilon_1, 2 + \varepsilon_2, 3 + \varepsilon_3, \ldots)$.
Then $\Lambda_{F_n^*(b)}(p) = L_{F_n(b)}(p) \to p^2 = \Lambda_{\mathrm{Uniform}(0,2)}(p)$.
According to Proposition 10, the fact that $\Lambda_{F_n^*(b)} \to \Lambda_{\mathrm{Uniform}(0,2)}$ implies that $F_n^*(b) \to \mathrm{Uniform}(0, 2)$. The sequence $c = b + \varepsilon$ has the same property: $F_n^*(c)$ converges to $\mathrm{Uniform}(0, 2)$, too; this is also a consequence of Proposition 10. Notice that $a = b \,\&\, c$. As $a$ is non-decreasing too, Proposition 11 says that $F_n^*(a) \to \mathrm{Uniform}(0, 2)$.
The spacings are easy to compute:
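If $n = 2m$ is even, then $D_n(a) = (\varepsilon_1, 1 - \varepsilon_1, \varepsilon_2, 1 - \varepsilon_2, \ldots, \varepsilon_{m-1}, 1 - \varepsilon_{m-1}, \varepsilon_m)$, and
if $n = 2m + 1$ is odd, then $D_n(a) = (\varepsilon_1, 1 - \varepsilon_1, \varepsilon_2, 1 - \varepsilon_2, \ldots, \varepsilon_m, 1 - \varepsilon_m)$.
Their sum is equal to $[n/2]$ if $n$ is odd and to $n/2 - 1 + \varepsilon_{n/2}$ if $n$ is even.
If, for example, $n = 2m + 1$, then $G_n^*(a) = \frac{1}{2m}\sum_{k=1}^{m} \left(\delta_{2\varepsilon_k} + \delta_{2 - 2\varepsilon_k}\right)$, and the limit of this sequence has no reason to exist.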
Remark 9.
Sometimes it is not very difficult to compute $G_n^*(a)$. If $\varepsilon_n \in \{\alpha, \beta\}$, where $0 < \alpha < \beta < 1 - \beta < 1 - \alpha < 1$, and $n = 2m + 1$, then
$$G_n^* = \begin{pmatrix} 2\alpha & 2\beta & 2(1-\beta) & 2(1-\alpha) \\ \frac{i_m}{2m} & \frac{j_m}{2m} & \frac{j_m}{2m} & \frac{i_m}{2m} \end{pmatrix}$$
(the case $n = 2m$ is slightly different). Here we denoted $i_m = |\{j \le m : \varepsilon_j = \alpha\}|$ and $j_m = m - i_m$. Thus, $G_n^*$ is a distribution of the form $G_n^* = \begin{pmatrix} 2\alpha & 2\beta & 2(1-\beta) & 2(1-\alpha) \\ p_n & q_n & q_n & p_n \end{pmatrix}$, where $p_n + q_n = \frac{1}{2}$. If the frequencies $(p_n)_n$ have a limit as $n \to \infty$, then $G_n^*$ has a limit, and therefore $L_{G_n^*}$ has a limit too. But that is by no means the rule.
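A sequence $\varepsilon$ for which the frequencies $p_n$ do not converge is easy to build. In this sketch the choice is ours: $\alpha = 0.1$, $\beta = 0.3$, and $\varepsilon_k = \alpha$ on dyadic blocks of even index and $\beta$ on those of odd index (block $j$ has length $2^j$, as in Example 7). The mass $G_n^*(\{2\alpha\}) = p_n$ then oscillates between about $1/6$ and $1/3$.

```python
import math

def dyadic_eps(m, alpha, beta):
    """eps_1..eps_m: alpha on dyadic blocks of even index, beta on odd ones (hypothetical choice)."""
    out, j = [], 0
    while len(out) < m:
        out.extend([alpha if j % 2 == 0 else beta] * 2 ** j)
        j += 1
    return out[:m]

def g_star_atoms(m, alpha, beta):
    """Atoms of G_n^*(a), n = 2m + 1, for a = (1, 1 + eps_1, 2, 2 + eps_2, ..., m + 1)."""
    eps = dyadic_eps(m, alpha, beta)
    a = [v for k in range(1, m + 1) for v in (k, k + eps[k - 1])] + [m + 1]
    d = [y - x for x, y in zip(a, a[1:])]        # a is already non-decreasing
    total = math.fsum(d)                         # equals m
    return [len(d) * x / total for x in d]       # atoms 2 eps_k and 2 - 2 eps_k

alpha, beta = 0.1, 0.3
masses = {}
for J in (15, 16, 17):
    m = 2 ** J - 1
    atoms = g_star_atoms(m, alpha, beta)
    masses[J] = sum(1 for x in atoms if abs(x - 2 * alpha) < 1e-6) / len(atoms)   # p_n
    print(J, round(masses[J], 4))
```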
Remark 10.
If $G = \begin{pmatrix} 2\alpha & 2\beta & 2(1-\beta) & 2(1-\alpha) \\ p & q & q & p \end{pmatrix}$ with $\alpha < \beta < \frac{1}{2}$ and $p + q = \frac{1}{2}$, then $L_G$ is the polygonal line given by the points $(0, 0)$, $(p, 2\alpha p)$, $\left(\frac{1}{2}, 2\alpha p + 2\beta q\right)$, $(1 - p, 2(\alpha p + q))$, $(1, 1)$. Its Gini coefficient, $\mathrm{Gini}(G) = 1 - 2\int_0^1 L_G(x)\,dx$, is equal to $\frac{1}{2} - \beta + 4p(\beta - \alpha) - 4p^2(\beta - \alpha)$ and lies between $\frac{1}{2} - \beta$ and $\frac{1}{2} - \alpha$. We think that the distributions $G_n^*$ given by Remark 9 always have the property that $\mathrm{Gini}(G_n^*) \le \frac{1}{2}$. For instance, if $\varepsilon_n = \alpha < \frac{1}{2}$ is constant, then $\mathrm{Gini}(G_n^*) = \frac{1 - 2\alpha}{2}$.
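The Gini formula of Remark 10 can be verified by integrating the polygonal Lorenz curve exactly with the trapezoid rule. The parameter triples below are arbitrary choices of ours; the last one ($p = 1/2$) is the constant case $\varepsilon_n = \alpha$.

```python
def gini_polygon(points):
    """Gini = 1 - 2 * (area under a piecewise-linear Lorenz curve); the trapezoid rule is exact here."""
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1 - 2 * area

def remark10_points(alpha, beta, p):
    """Vertices of L_G for G with atoms 2a, 2b, 2(1-b), 2(1-a) and weights p, q, q, p."""
    q = 0.5 - p
    return [(0.0, 0.0), (p, 2 * alpha * p), (0.5, 2 * alpha * p + 2 * beta * q),
            (1 - p, 2 * (alpha * p + q)), (1.0, 1.0)]

def remark10_closed_form(alpha, beta, p):
    return 0.5 - beta + 4 * p * (beta - alpha) - 4 * p ** 2 * (beta - alpha)

cases = [(0.1, 0.3, 0.2), (0.05, 0.4, 0.35), (0.2, 0.45, 0.5)]
for alpha, beta, p in cases:
    print(gini_polygon(remark10_points(alpha, beta, p)), remark10_closed_form(alpha, beta, p))
```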
Combining
Proposition 19.
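Suppose that the sequence $a$ has the property that $\lim_{n} G_n^*(a) = G$ and $\lim_{n} L_n(a) = L$. Then $\lim_{n} G_n^*(a \,\&\, a) = \frac{1}{2}\left(\delta_0 + G \circ (h_2)^{-1}\right)$, where $h_2(x) = 2x$, and $\lim_{n} L_n(a \,\&\, a)(p) = L\left((2p - 1)_+\right)$.
Proof. 
Let $n = 2m + 1$.
The first $n$ terms of $a \,\&\, a$ are $a_1, a_1, \ldots, a_m, a_m, a_{m+1}$; thus $O_n(a \,\&\, a)$ contains each of $a_{1:m+1}, \ldots, a_{m+1:m+1}$ twice, except one of them, which appears once. Hence, up to order, the spacings are $D_n(a \,\&\, a) = (0, d_1, 0, d_2, \ldots, 0, d_m)$, where $d_j = a_{j+1:m+1} - a_{j:m+1}$. Then
$$G_n^*(a \,\&\, a) = \frac{1}{2m}\left(m\,\delta_0 + \sum_{j=1}^{m} \delta_{2m d_j/\Delta_m}\right), \quad \text{where } \Delta_m = a_{m+1:m+1} - a_{1:m+1}.$$
Since $G_{m+1}^*(a) = \frac{1}{m}\sum_{j=1}^{m} \delta_{m d_j/\Delta_m}$, this is $\frac{1}{2}\delta_0 + \frac{1}{2}G_{m+1}^*(a) \circ (h_2)^{-1}$, which converges to $\frac{1}{2}\left(\delta_0 + G \circ (h_2)^{-1}\right)$; the case $n = 2m$ is similar. The Lorenz curve of this limit is $L\left((2p-1)_+\right)$.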
Mixture. 
We can say something about a special case of mixture.
Proposition 20.
Let $T_0 < T_1 < \cdots < T_k$ and $I_j = [T_{j-1}, T_j)$. Let $a$ be a sequence valued in $[T_0, T_k) = \bigcup_{j=1}^{k} I_j$ having the property that $\inf a = T_0$ and $\sup a = T_k$. For $n \ge 2$ let $n_j = |\{i \le n : a_i \in I_j\}|$. Suppose that the limits $p_j = \lim_{n} n_j/n$ exist and $p_j > 0$ for all $1 \le j \le k$.
Let $A_j = \{n : a_n \in I_j\} = \{i_{j,1} < i_{j,2} < \cdots\}$ and let the sequences $b_j$ be defined by $b_{j,n} = a_{i_{j,n}}$. Suppose that $G_n^*(b_j) \to H_j$, where $H_j$ are some probability distributions on $[0, \infty)$ with $e(H_j) = 1$.
For any fixed $n \ge 2$, let $d_{i,n} = a_{i+1:n+1} - a_{i:n+1}$, $1 \le i \le n$, be the $n$ spacings corresponding to $(a_1, \ldots, a_{n+1})$.
Suppose, finally, that there are no gaps: $\inf\{a_n : n \in A_j\} = T_{j-1}$ and $\sup\{a_n : n \in A_j\} = T_j$. Then
(i) $G_n^*(a) \to \sum_{j=1}^{k} p_j H_j \circ (h_{\pi_j/p_j})^{-1}$ with $\pi_j = \frac{T_j - T_{j-1}}{T_k - T_0}$.
(ii) The Lorenz curves $L_n(a)$ converge to the Lorenz curve of $\sum_{j=1}^{k} p_j H_j \circ (h_{\pi_j/p_j})^{-1}$.
(iii) If all the probability distributions $H_j$ coincide with the same $H$, then the Lorenz curves $L_n(a)$ converge to the Lorenz curve of $\sum_{j=1}^{k} p_j H \circ (h_{\pi_j/p_j})^{-1}$, which is always below $L_H$. In words, the mixture of sequences of the same type always exhibits more inequality.
Proof. 
(i) Let $f: [0, \infty) \to \mathbb{R}$ be continuous with compact support. We have to compute the limit of $\frac{1}{n}\sum_{i=1}^{n} f\left(\frac{n\,d_{i,n}}{a_{n+1:n+1} - a_{1:n+1}}\right)$. As $\inf a = T_0$ and $\sup a = T_k$, this limit is the same as the limit of $\frac{1}{n}\sum_{i=1}^{n} f\left(\frac{n\,d_{i,n}}{T}\right)$ for $T = T_k - T_0$, if it exists.
There are two kinds of spacings. The first kind are the spacings $d_{i,n}$ for which both $a_{i+1:n+1}$ and $a_{i:n+1}$ are in the same interval $I_j$. Call them spacings of type $j$ and denote their set by $B_{j,n}$. If $n$ is large enough, $|B_{j,n}| = n_j - 1$.
The other kind of spacings are those $d_{i,n}$ for which $a_{i:n+1} \in I_j$ but $a_{i+1:n+1} \in I_{j+1}$ for some $1 \le j \le k-1$. The number of spacings of the second kind is finite; actually, if $n$ is large enough, it must be equal to $k - 1$, because of our hypothesis that all $p_j$ are positive. That is why we can neglect them in computing the limit. Then:
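$$\lim_{n} \frac{1}{n}\sum_{i=1}^{n} f\left(\frac{n\,d_{i,n}}{T}\right) = \lim_{n} \sum_{j=1}^{k} \frac{1}{n}\sum_{i \in B_{j,n}} f\left(\frac{n\,d_{i,n}}{T}\right) = \sum_{j=1}^{k} \lim_{n} \frac{n_j}{n}\cdot\frac{1}{n_j}\sum_{i \in B_{j,n}} f\left(\frac{n(T_j - T_{j-1})}{n_j T}\cdot\frac{n_j\,d_{i,n}}{T_j - T_{j-1}}\right).$$
Notice that $\frac{n_j}{n} \to p_j$, $\frac{n(T_j - T_{j-1})}{n_j T} \to \frac{\pi_j}{p_j}$ and $\frac{1}{n_j}\sum_{i \in B_{j,n}} f\left(\frac{n_j\,d_{i,n}}{T_j - T_{j-1}}\right) \to \int f\,dH_j$.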
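As $f$ is uniformly continuous:
$$\frac{1}{n_j}\sum_{i \in B_{j,n}} f\left(\frac{n(T_j - T_{j-1})}{n_j T}\cdot\frac{n_j\,d_{i,n}}{T_j - T_{j-1}}\right) \to \int f\left(\frac{\pi_j}{p_j}x\right)dH_j(x).$$
Therefore, $\lim_{n} \frac{1}{n}\sum_{i=1}^{n} f\left(\frac{n\,d_{i,n}}{T}\right) = \sum_{j=1}^{k} p_j \int f\left(\frac{\pi_j}{p_j}x\right)dH_j(x)$, as claimed.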
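(ii) Notice that $e\left(\sum_{j=1}^{k} p_j H_j \circ (h_{\pi_j/p_j})^{-1}\right) = \sum_{j=1}^{k} p_j\frac{\pi_j}{p_j}e(H_j) = \sum_{j=1}^{k} \pi_j = 1$ (since $e(H_j) = 1$ for all $j$); in that case $\Lambda = L$, so we may apply Proposition 3.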
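(iii) If $u$ is convex, then, by Jensen's inequality and $\sum_j \pi_j = 1$:
$$\int u\,d\left(\sum_{j=1}^{k} p_j H \circ (h_{\pi_j/p_j})^{-1}\right) = \int \sum_{j=1}^{k} p_j u\left(\frac{\pi_j}{p_j}x\right)dH(x) \ge \int u\left(\sum_{j=1}^{k} p_j\frac{\pi_j}{p_j}x\right)dH(x) = \int u\,dH,$$
hence $H \le_L \sum_{j=1}^{k} p_j H \circ (h_{\pi_j/p_j})^{-1}$ (this is Proposition 14).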
Remark 11.
Here we supposed that there are no gaps: the spacings tend to 0. What happens if we accept some gaps?
Proposition 21.
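Finite number of gaps. Let $a$ be a bounded sequence, $\alpha = \inf a$, $\beta = \sup a$. Suppose that $\mathrm{Cl}\{a_n : n \ge 1\} = I_1 \cup I_2 \cup \cdots \cup I_k$, where $I_j = [\alpha_j, \beta_j]$ with $\alpha = \alpha_1 < \beta_1 < \alpha_2 < \beta_2 < \cdots < \alpha_k < \beta_k = \beta$.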
Let $n_j = |\{i \le n : a_i \in I_j\}|$. Suppose that the limits $p_j = \lim_{n} n_j/n$ exist and $p_j > 0$ for all $1 \le j \le k$. Let $A_j = \{n : a_n \in I_j\} = \{i_{j,1} < i_{j,2} < \cdots\}$ and let the sequences $b_j$ be defined by $b_{j,n} = a_{i_{j,n}}$. Suppose that $G_n^*(b_j) \to H_j$, where $H_j$ are some probability distributions on $[0, \infty)$ with $e(H_j) = 1$. For any fixed $n \ge 2$, let $d_{i,n} = a_{i+1:n+1} - a_{i:n+1}$ be the $n$ spacings corresponding to $(a_1, \ldots, a_{n+1})$.
Let $\pi_j = \frac{\beta_j - \alpha_j}{\beta - \alpha}$. Then $G_n^*(a) \to \sum_{j=1}^{k} p_j H_j \circ (h_{\pi_j/p_j})^{-1}$.
The Lorenz curves $L_n(a)$ converge to $\kappa\, L_{\sum_{j=1}^{k} p_j H_j \circ (h_{\pi_j/p_j})^{-1}}$, where $\kappa = \sum_{j=1}^{k} \frac{\beta_j - \alpha_j}{\beta - \alpha} < 1$. The limit is a degenerate Lorenz curve.
Proof. 
The same proof; the difference is that now $\sum_{j=1}^{k} \pi_j < 1$. Notice that $e\left(\sum_{j=1}^{k} p_j H_j \circ (h_{\pi_j/p_j})^{-1}\right) = \sum_{j=1}^{k} \pi_j$. Apply Proposition 5.
Conclusion: gaps or no gaps, the Lorenz curve of a mixture of sequences of the same type lies below the mother Lorenz curve.

5. The Empirical Lorenz Curves in the Stochastic Broken Stick Models

Now suppose that a stick of length 1 is broken at random into $n$ smaller sticks by some i.i.d. random variables $X_1, \ldots, X_{n+1}$. Simulations suggest that the empirical distribution function of the normed spacings always has a limit.
We shall prove this in some situations.
Precisely, let $X = (X_n)_{n \ge 0}$ be i.i.d. random variables having a common distribution $F$. Suppose that they are integrable and $0 < \mu = EX_j < \infty$. Let $a = \operatorname{ess\,inf} X_j$, $b = \operatorname{ess\,sup} X_j$, and $D_{1,n} = X_{1:n} - a$, $D_{2,n} = X_{2:n} - X_{1:n}$, $\ldots$, $D_{n,n} = X_{n:n} - X_{n-1:n}$. If $b < \infty$, we could also add the $(n+1)$-th spacing, namely $D_{n+1,n} = b - X_{n:n}$.
The Glivenko theorem says that the empirical distribution $F_n(X) = \frac{1}{n}\sum_{j=1}^{n} \delta_{X_j}$ converges weakly to $F$ almost surely. More than that, the distribution functions $F_n(x) = \frac{|\{j \le n : X_j \le x\}|}{n}$ converge almost surely, uniformly in $x$, to $F(x)$.
We prove that even the normed empirical distributions $F_n^*(X) = \frac{1}{n}\sum_{k=1}^{n} \delta_{nX_k/S_n}$, where $S_n = X_1 + \cdots + X_n$, converge almost surely to the distribution of $X_k/\mu$:
Lemma 2.
Let $F_n$ and $F$ be distribution functions and $(\alpha_n)_n$ be a sequence of non-negative numbers. Suppose that $F_n \to F$ weakly and $\alpha_n \to \alpha$. Then $F_n(\alpha_n x) \to F(\alpha x)$ for all $x$ such that $F$ is continuous at $\alpha x$.
Proof. 
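Let $\varepsilon > 0$ and let $n_\varepsilon$ be such that $n > n_\varepsilon \Rightarrow \alpha - \varepsilon < \alpha_n < \alpha + \varepsilon$. Let $x \ge 0$ (the case $x < 0$ is symmetric). Then $F_n((\alpha - \varepsilon)x) \le F_n(\alpha_n x) \le F_n((\alpha + \varepsilon)x)$. Suppose that $F$ is continuous at $(\alpha - \varepsilon)x$ and $(\alpha + \varepsilon)x$ (this is no problem: the set of discontinuity points of $F$ is at most countable, so such $\varepsilon$ can be chosen arbitrarily small). Then $F((\alpha - \varepsilon)x) \le \liminf_n F_n(\alpha_n x) \le \limsup_n F_n(\alpha_n x) \le F((\alpha + \varepsilon)x)$. As $\varepsilon$ is arbitrarily small and $F$ is continuous at $\alpha x$, the result immediately follows.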
Remark 12.
Another version of Glivenko’s theorem is in terms of order statistics. Note that $F_n(x) = k/n$ if and only if $X_{k:n} \le x < X_{k+1:n}$ (when there are no ties). Let $p \in (0, 1)$ and let $(k_n)_n$ be a sequence of positive integers with the property that $k_n/n \to p$ (for instance, $k_n = [np]$). Then $F_n(X_{k_n:n}) = k_n/n \to p$, hence $X_{k_n:n} \to F^{-1}(p)$ almost surely for all $p$ for which the quantile $F^{-1}(p)$ is unique. As a special case, $X_{[np]:n} \to F^{-1}(p)$ a.s. for all $p \in (0, 1)$ if $F$ is bijective.
Proposition 22.
Let $X = (X_n)_n$ be i.i.d. non-negative $F$-distributed random variables.
Let $F_n^*(X) = \frac{1}{n}\sum_{k=1}^{n} \delta_{nX_k/S_n}$. Then $F_n^*(X) \to F \circ (h_{1/\mu})^{-1}$ almost surely, where $\mu = EX_j \in (0, \infty)$. In other words, $F_n^*(X)$ converges to the distribution of $X_j/\mu$. Moreover, the empirical Lorenz curves $L_n(X)$ converge almost surely to the Lorenz curve of $F$, denoted by $L_F$.
Proof. 
According to the strong law of large numbers, the sequence $\alpha_n = S_n/n$ converges almost surely to $\mu$. Let $F_n(x)$ be the empirical distribution function of $X$. The distribution function of $\frac{1}{n}\sum_{k=1}^{n} \delta_{nX_k/S_n}$ is $F_n(\alpha_n x)$. We know that $F_n(x) \to F(x)$ almost surely at all continuity points of $F$ and that $\alpha_n \to \mu$ almost surely. Then, by Lemma 2, $F_n(\alpha_n x) \to F(\mu x)$ almost surely at all continuity points of $G(x) = F(\mu x)$. The distributions $F_n^*(X)$ were constructed in such a way that $e(F_n^*(X)) = 1$. By Proposition 4, their Lorenz curves converge to the Lorenz curve of $F \circ (h_{1/\mu})^{-1}$, which coincides with $L_F$ because Lorenz curves are invariant under scaling.
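A simulation of Proposition 22, assuming (our choice) $F = \mathrm{Uniform}(0, 1)$, so that $\mu = 1/2$, the limit of $F_n^*(X)$ is $\mathrm{Uniform}(0, 2)$, and $L_F(p) = p^2$. The seed and sample size are arbitrary.

```python
import math
import random

rng = random.Random(7)
n = 100_000
X = [rng.random() for _ in range(n)]             # hypothetical choice F = Uniform(0, 1), mu = 1/2
S = math.fsum(X)
atoms = sorted(n * x / S for x in X)             # atoms of F_n^*(X), mean exactly 1

cdf_at_1 = sum(1 for x in atoms if x <= 1.0) / n   # F_n^*(X) -> law of X / mu = Uniform(0, 2): about 1/2
total = math.fsum(atoms)
lorenz = {p: math.fsum(atoms[: math.floor(n * p)]) / total for p in (0.25, 0.5, 0.75)}
print(round(cdf_at_1, 4), {p: round(v, 4) for p, v in lorenz.items()})   # compare with p^2
```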
The problem of convergence of $G_n^*$ to some limit distribution is more delicate. We believe that the almost sure weak limit always exists, but we are not able to prove it. However, we can prove some weaker convergence results.
Definition 4.
Let $F$ be a distribution on the real line. We say that $F$ has the property (D) if the weak limit of $G_n^*(X) = \frac{1}{n-1}\sum_{k=2}^{n} \delta_{nD_{k,n}/(X_{n:n} - X_{1:n})}$ exists in probability for any sequence $X$ of i.i.d. $F$-distributed random variables. It means that the empirical distribution functions $G_n^*(X)(x) = \frac{|\{2 \le k \le n : nD_{k,n}/(X_{n:n} - X_{1:n}) \le x\}|}{n-1}$, which are random variables, converge in probability to some distribution function $G(x)$ at all continuity points of $G$. In that case we say that $F$ is the mother distribution and that $G$ is the born distribution. We call the quantities $\frac{D_{k,n}}{X_{n:n} - X_{1:n}}$ the normed spacings.
Proposition 23.
If $F$ has the property (D) and $h(x) = \alpha x + \beta$ with $\alpha \ne 0$, then $F \circ h^{-1}$ has the same property. Moreover, $F$ and $F \circ h^{-1}$ give birth to the same $G$.
Proof. 
Obvious. If $Y_j = \alpha X_j + \beta$, then the normed spacings are the same (up to order) for $X$ and $Y$.
The first positive result is that all discrete probability distributions have the property (D). All of them give birth to $\delta_0$.
Proposition 24.
Suppose $F = \sum_{k=1}^{\infty} p_k \delta_{a_k}$ is discrete. Then $G_n^*(X) \to \delta_0$ in probability.
Thus, any discrete distribution on the real line has property (D): it gives birth to the Dirac distribution $\delta_0$.
Proof. 
Let $X = (X_n)_{n \ge 1}$ be a sequence of $F$-distributed i.i.d. random variables, let $\varepsilon > 0$ be arbitrarily small, and let $k$ be such that $p_1 + \cdots + p_k > 1 - \varepsilon$. For any $n \ge 1$ let $n_j = |\{i \le n : X_i = a_j\}|$. By the law of large numbers, $n_j/n \to p_j$ almost surely. At least $(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)$ spacings are equal to 0. Thus, $G_n^*(\{0\}) \ge \frac{\sum_{j=1}^{k} n_j - k}{n-1} \to \sum_{j=1}^{k} p_j > 1 - \varepsilon$. It means that $\lim_{n} G_n^*(\{0\}) = 1$ almost surely. That obviously implies $G_n^*(X) \to \delta_0$ in probability.
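Proposition 24 is visible even for a discrete law with infinite support. In this sketch the law is our own choice, the geometric distribution $P(k) = 2^{-(k+1)}$ on $\{0, 1, 2, \ldots\}$, sampled by inversion. Only the number of distinct values produces non-zero spacings, so $G_n^*(\{0\}) = (n - \#\text{distinct})/(n-1)$ is close to 1.

```python
import math
import random

rng = random.Random(11)
n = 10_000
# Hypothetical discrete law: geometric on {0, 1, 2, ...} with P(k) = 2^{-(k+1)}, sampled by inversion.
X = [math.floor(math.log(1.0 - rng.random()) / math.log(0.5)) for _ in range(n)]
xs = sorted(X)
d = [y - x for x, y in zip(xs, xs[1:])]
zero_share = sum(1 for x in d if x == 0) / len(d)   # G_n^*({0}); normalisation does not matter here
print(len(set(X)), round(zero_share, 4))
```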
Remark 13.
All the random variables produced by a computer using pseudo-random numbers are discrete. Using the computer to guess a limit can therefore be misleading.
The second positive result is that $F = \mathrm{Uniform}(0, 1)$ has the property (D): it gives birth to the standard exponential distribution, denoted by Exp(1). According to Proposition 25 below, this holds for any $F = \mathrm{Uniform}(a, b)$. We shall use the following result:
Lemma 3.
Let $X = (X_n)_n$ and $Y = (Y_n)_n$ be two sequences of random variables. Suppose that $X_n$ and $Y_n$ have the same distribution for every $n$. If $X_n \to c$ (a.s.) for some constant $c$, then $Y_n \to c$ (in probability).
Proof. 
$P(|Y_n - c| > \varepsilon) = P(|X_n - c| > \varepsilon) \to 0$, since almost sure convergence implies convergence in probability.
Remark 14.
Unfortunately, it is not true that if we know that X n 0 (a.s.) and Υ n ~ X n Υ n 0 a.s. A simple counterexample on ( 0 , 1 ) is if:
Υ = ( 1 ( 0 , 1 2 ) , 1 ( 1 2 , 1 ) , 1 ( 0 , 1 3 ) , 1 ( 1 3 , 2 3 ) , 1 ( 2 3 , 1 ) , 1 ( 0 , 1 4 ) , 1 ( 1 4 , 2 4 ) , 1 ( 2 4 , 3 4 ) , 1 ( 3 4 , 1 ) , 1 ( 0 , 1 5 ) , ) and
X = ( 1 ( 0 , 1 2 ) , 1 ( 0 , 1 2 ) , 1 ( 0 , 1 3 ) , 1 ( 0 , 1 3 ) , 1 ( 0 , 1 3 ) , 1 ( 0 , 1 4 ) , 1 ( 0 , 1 4 ) , 1 ( 0 , 1 4 ) , 1 ( 0 , 1 4 ) , 1 ( 0 , 1 5 ) , ) .
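Here $X_n(\omega)=0$ as soon as the block index exceeds $1/\omega$, so $X_n\to0$ everywhere on $(0,1)$, while for every $\omega\in(0,1)$ the sequence $\Upsilon_n(\omega)$ takes both values 0 and 1 infinitely often, so $(\Upsilon_n)$ converges nowhere. Moreover, the conclusion of Lemma 3 fails if the limit is not a constant: for $U$ uniform on $(0,1)$, $X_n=U$ and $\Upsilon_n=1-U$ have the same distribution and $X_n\to U$, but $\Upsilon_n\to1-U$.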
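The counterexample of Remark 14 can be checked pointwise. The sketch below is only an illustration under our reading of the two displayed sequences (block $k=2,3,\dots$ consists of $k$ consecutive terms); the functions `x_term` and `upsilon_term` are hypothetical names. For a fixed $\omega$ it counts how many terms equal 1 in the first $K$ blocks.

```python
import math

def blocks(num_blocks):
    """Yield (k, j): block k = 2, 3, ... has k consecutive terms, j = 0, ..., k - 1."""
    for k in range(2, num_blocks + 2):
        for j in range(k):
            yield k, j

def x_term(k, j, omega):
    # every term of block k of X is the indicator of (0, 1/k); j is unused
    return 1 if 0 < omega < 1 / k else 0

def upsilon_term(k, j, omega):
    # term j of block k of Upsilon is the indicator of (j/k, (j+1)/k)
    return 1 if j / k < omega < (j + 1) / k else 0

omega = math.sqrt(2) - 1   # a fixed point of (0, 1), not of the form j/k for small k
K = 200                    # number of blocks examined
x_hits = sum(x_term(k, j, omega) for k, j in blocks(K))
u_hits = sum(upsilon_term(k, j, omega) for k, j in blocks(K))
print(x_hits, u_hits)      # 2 200
```

For $\omega=\sqrt2-1$, $X_n(\omega)=1$ only in the block $k=2$, so $X_n(\omega)\to0$, while $\Upsilon_n(\omega)=1$ exactly once in every block, so $\Upsilon_n(\omega)$ does not converge.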
Proposition 25.
If $F=\text{Uniform}(a,b)$, then $F$ has the property (D). It gives birth to the standard exponential distribution Exp(1).
As a consequence, the sequence of Lorenz curves $L_n(X)$ converges in probability to $L(t)=t+(1-t)\ln(1-t)$.
Proof. 
Let $(\Omega,\mathcal K,P)$ be a probability space. According to Proposition 23, we can put $a=0$, $b=1$. Let $X=(X_n)_{n\ge1}$ be a sequence of i.i.d. random variables uniformly distributed on $(0,1)$, let $O_n(X)=(X_{1:n},X_{2:n},\dots,X_{n:n})$, and let $D_{n+1}(X)=(D_{j,n+1})_{0\le j\le n}$ be the $n+1$ spacings $D_{j,n+1}=X_{j+1:n}-X_{j:n}$, with the conventions $X_{0:n}=0$ and $X_{n+1:n}=1$.
It is well known that $D_{n+1}(X)\sim\text{Uniform}(P_{n+1})$, where $P_n=\{x\in\mathbb R_+^n:x_1+\dots+x_n=1\}$ is the unit simplex; see any book on order statistics, for instance, References [10,11,12,17].
Let, on the other hand, $\xi=(\xi_n)_{n\ge1}$ be a sequence of i.i.d. random variables distributed as Exp(1), and let $S_n=\xi_1+\dots+\xi_n$. It is also known (see References [10,11,12,17]) that $\left(\frac{\xi_1}{S_n},\frac{\xi_2}{S_n},\dots,\frac{\xi_n}{S_n}\right)$ is uniformly distributed on $P_n$, too. Thus, $D_n(X)\overset{d}{=}\left(\frac{\xi_1}{S_n},\dots,\frac{\xi_n}{S_n}\right)$ for any $n\ge2$. Therefore, $nD_n(X)$ has the same distribution as $\left(\frac{n\xi_1}{S_n},\dots,\frac{n\xi_n}{S_n}\right)$. Consequently, the Lorenz curves of $G_n^*(X)=\frac1n\sum_{j=1}^n\delta_{nD_{j,n}}$ and $F_n^*(\xi)=\frac1n\sum_{j=1}^n\delta_{n\xi_j/S_n}$ have the same distribution.
According to Glivenko's theorem, the empirical distribution functions $F_n(\xi)(x)=\frac1n\sum_{j=1}^n1_{\{\xi_j\le x\}}$ converge almost surely, uniformly in $x$, to the distribution function of Exp(1), namely, to $F(x)=1-e^{-x}$. Otherwise written, the empirical distributions $F_n=\frac1n\sum_{j=1}^n\delta_{\xi_j}$ converge weakly, almost surely, to $F=\text{Exp}(1)$. Let $\Omega_1=\left\{\omega\in\Omega:\frac1n\sum_{j=1}^n\delta_{\xi_j(\omega)}\to F\right\}$ and $\Omega_2=\left\{\omega\in\Omega:\frac1n\sum_{j=1}^n\xi_j(\omega)\to1\right\}$. According to Glivenko's theorem and to the strong law of large numbers, $P(\Omega_1)=P(\Omega_2)=1$. Thus, if $\omega\in\Omega_1\cap\Omega_2$, the sequence of empirical distributions $F_n^*(\xi(\omega))=\frac1n\sum_{j=1}^n\delta_{n\xi_j(\omega)/(\xi_1(\omega)+\dots+\xi_n(\omega))}$ converges weakly to Exp(1). All these distributions have the same expectation, namely 1; hence, for almost all $\omega\in\Omega$, the limit of the Lorenz curves $L_{F_n^*(\xi(\omega))}(t)$ is the Lorenz curve of Exp(1), which is $L(t)=t+(1-t)\ln(1-t)$. The curves $L_{G_n^*(X)}(t)$ have the same distribution as $L_{F_n^*(\xi)}(t)$; therefore, by Lemma 3, they converge in probability to the same limit, namely $L(t)$.
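Proposition 25 can also be checked by simulation. The following sketch (function names are ours; the seed and sample size are arbitrary) breaks $[0,1)$ at $n$ uniform points and compares the empirical Lorenz curve of the $n+1$ spacings with $L(t)=t+(1-t)\ln(1-t)$. Since Lorenz curves are scale invariant, the normalization by $n$ is omitted, and the empirical curve uses the simple convention of summing the $\lfloor tm\rfloor$ smallest of the $m$ values.

```python
import math
import random

def uniform_spacings(n, rng):
    """The n + 1 spacings of [0, 1) cut at n i.i.d. Uniform(0, 1) points."""
    pts = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    return [b - a for a, b in zip(pts, pts[1:])]

def lorenz_at(values, t):
    """Empirical Lorenz curve at t: share of the total held by the floor(t * m) smallest values."""
    xs = sorted(values)
    k = math.floor(t * len(xs))
    return sum(xs[:k]) / sum(xs)

def exp_lorenz(t):
    """Lorenz curve of Exp(1): L(t) = t + (1 - t) ln(1 - t), for 0 <= t < 1."""
    return t + (1 - t) * math.log(1 - t)

rng = random.Random(1)
spacings = uniform_spacings(200_000, rng)
for t in (0.25, 0.5, 0.75, 0.9):
    print(t, round(lorenz_at(spacings, t), 4), round(exp_lorenz(t), 4))
```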
Remark 15.
The result has been known for some time and can be proved in several ways. For instance, Stephens [14,15] gives an alternative proof of the fact that the limit of the Lorenz curves in the broken stick model is $L(t)=t+(1-t)\ln(1-t)$.
Proposition 26.
The exponential distribution has the property (D), too: it gives birth to $\delta_0$.
Proof. 
Let $X=(X_n)_{n\ge1}$ be i.i.d. standard exponentially distributed random variables. The density is $\pi(x)=e^{-x}1_{(0,\infty)}(x)$. The reader can check that the density of the order statistics $O_n(X)$ is $p(x_1,\dots,x_n)=n!\,e^{-(x_1+x_2+\dots+x_n)}1_{\{0<x_1<x_2<\dots<x_n\}}$ and that the density of the spacings $D_n(X)=(X_{1:n}-X_{0:n},X_{2:n}-X_{1:n},\dots,X_{n:n}-X_{n-1:n})$, with $X_{0:n}=0$, is $q(x_1,\dots,x_n)=n!\,e^{-(nx_1+(n-1)x_2+\dots+x_n)}1_{[0,\infty)^n}(x)$.
Thus, the distribution of $D_n(X)$ is $\bigotimes_{j=1}^n\text{Exp}(n+1-j)$; otherwise written, $D_n(X)\overset{d}{=}\left(\frac{\xi_1}{n},\frac{\xi_2}{n-1},\dots,\frac{\xi_n}{1}\right)$, where $(\xi_j)_{1\le j\le n}$ are independent and standard exponentially distributed.
The normalized empirical distribution is $G_n^*(X)=\frac1n\sum_{j=1}^n\delta_{nD_{j,n}/X_n^*}$ (since $\sum_{j=1}^nD_{j,n}=X_n^*=\max(X_1,\dots,X_n)$), and its empirical tail is $\bar G_n(x)=\frac1n\sum_{j=1}^n1_{\{nD_{j,n}/X_n^*>x\}}$. We shall prove that $\bar G_n(x)\to0$ in probability for any $x>0$, which means that $G_n^*(X)\to\delta_0$.
The idea is to prove that $\bar G_n(x)\to0$ in $L^1$. As these random variables are non-negative, we only need to check that $E\bar G_n(x)\to0$. By the representation above, $D_{j,n}\overset{d}{=}\frac{\xi_j}{n+1-j}$ jointly in $j$, with $X_n^*=\sum_{i=1}^n\frac{\xi_i}{n+1-i}$; hence
$$E\bar G_n(x)=\frac1n\sum_{j=1}^nP\left(\frac{nD_{j,n}}{X_n^*}>x\right)=\frac1n\sum_{j=1}^nP\left(\xi_j>\frac{(n+1-j)x}{n}X_n^*\right).$$
The random variables $\xi_j$ and $X_n^*$ are not independent, so we separate them with the threshold $c_n=\frac12\ln n$:
$$P\left(\xi_j>\frac{(n+1-j)x}{n}X_n^*\right)\le P(X_n^*<c_n)+P\left(\xi_j>\frac{(n+1-j)x}{n}c_n\right).$$
Let $\psi_n(t)=P(\xi_1>tc_n)=e^{-tc_n}$. Summing over $j$ and reindexing with $k=n+1-j$, we get
$$E\bar G_n(x)\le P(X_n^*<c_n)+\frac1n\sum_{k=1}^n\psi_n\left(\frac{kx}{n}\right).$$
Notice that $\psi_n$ is decreasing; therefore, the sum $\frac1n\sum_{k=1}^n\psi_n\left(\frac{kx}{n}\right)$ is smaller than $\int_0^1\psi_n(tx)\,dt\le\frac{1}{xc_n}=\frac{2}{x\ln n}$. On the other hand, $X_n^*=\max(X_1,\dots,X_n)$ tends to infinity; precisely, $P(X_n^*<c_n)=(1-e^{-c_n})^n=(1-n^{-1/2})^n\le e^{-\sqrt n}$. It follows that, for every $x>0$, $E\bar G_n(x)\to0$ as $n\to\infty$; hence $\bar G_n(x)\to0$ in $L^1$ and, therefore, in probability.
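The convergence in Proposition 26 is slow, which is worth seeing numerically (compare Remark 13). The sketch below, with our own helper name and an arbitrary seed, computes the normalized spacings $nD_{j,n}/X_n^*$ for $n$ i.i.d. Exp(1) points and prints their empirical tail at $x=1$. Their mean is exactly 1 by construction, while the tail decreases only slowly, roughly like $1/\ln n$.

```python
import random

def normalized_exp_spacings(n, rng):
    """n * D_{j,n} / X_n^* for n i.i.d. Exp(1) points, with the convention X_{0:n} = 0."""
    pts = sorted(rng.expovariate(1.0) for _ in range(n))
    spacings = [b - a for a, b in zip([0.0] + pts[:-1], pts)]
    x_max = pts[-1]                 # X_n^* = max(X_1, ..., X_n) = sum of the spacings
    return [n * d / x_max for d in spacings]

rng = random.Random(7)
x = 1.0
for n in (1_000, 10_000, 200_000):
    g = normalized_exp_spacings(n, rng)
    tail = sum(1 for v in g if v > x) / n   # empirical tail of G_n^*(X) at x
    print(n, round(tail, 4))
```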
Mixtures. 
We now give the stochastic equivalents of Propositions 20 and 21.
Proposition 27.
Let $f:[a,b)\to(0,\infty)$ be a positive density and let $F$ be the corresponding distribution. Next, let $a=T_0<T_1<\dots<T_k=b$ be a division of $[a,b)$ and $I_j=[T_{j-1},T_j)$, $1\le j\le k$. Let $F_j$ be the probability distributions on $I_j$ given by the densities $f_j=\frac{f\,1_{[T_{j-1},T_j)}}{F(T_j)-F(T_{j-1})}$. Suppose that all the probability distributions $F_j$ have the property (D) and that $F_j$ bears $H_j$. Then $F$ has the property (D). Moreover:
(i) 
$G_n^*(X)$ converges weakly in probability to $\sum_{j=1}^kp_j\,H_j\circ h_{\pi_j/p_j}^{-1}$, where $h_c(x)=cx$, $\pi_j=\frac{\lambda_j}{b-a}$, $\lambda_j=T_j-T_{j-1}$ and $p_j=F(T_j)-F(T_{j-1})$.
(ii) 
The Lorenz curves $L_n(X)$ converge in probability to the Lorenz curve of $\sum_{j=1}^kp_j\,H_j\circ h_{\pi_j/p_j}^{-1}$.
If all the probability distributions $H_j$ coincide with the same $H$, then the Lorenz curves $L_n(X)$ converge to the Lorenz curve of $\sum_{j=1}^kp_j\,H\circ h_{\pi_j/p_j}^{-1}$, which always lies below $L_H$.
In other words, the mixture of sequences of the same type always exhibits more inequality.
Proof. 
Let $X=(X_n)_n$ be a sequence of i.i.d. $F$-distributed random variables. Let $A_j=\{n:X_n\in I_j\}=\{i_{j,1}<i_{j,2}<i_{j,3}<\dots\}$ and $n_j=|A_j\cap\{1,2,\dots,n\}|$. By the law of large numbers, $\frac{n_j}{n}\to p_j=F(T_j)-F(T_{j-1})>0$ almost surely. Define $\Upsilon_{j,m}=X_{i_{j,m}}$, the $m$-th term of $X$ which falls into $I_j$.
The sequences $\Upsilon_j=(\Upsilon_{j,m})_m$ are again i.i.d., but their distribution is $F_j$. We know that $F_j$ bears $H_j$, meaning that $\frac1{n_j}\sum_{i=1}^{n_j}g\left(\frac{n_j}{\lambda_j}\Delta_{i,n_j}^{(j)}\right)\xrightarrow{P}\int g\,dH_j$ for any continuous $g$ with compact support. Here $\Delta_{i,m}^{(j)}$ are the spacings of $\Upsilon_{j,1},\dots,\Upsilon_{j,m}$. We want to find the limit in probability of $\frac1n\sum_{i=1}^{n-1}g\left(\frac{n}{b-a}D_{i,n}\right)$, where $D_{i,n}=X_{i+1:n}-X_{i:n}$, $1\le i\le n-1$, are the $n-1$ spacings of $X$.
Let $B_{j,n}=\{i<n:X_{i:n}\in I_j\text{ and }X_{i+1:n}\in I_j\}$. Then $|B_{j,n}|=n_j-1$, and for every $i\in B_{j,n}$ we have $D_{i,n}=\Delta_{m,n_j}^{(j)}$ for some $m<n_j$. All but at most $k-1$ spacings are of one of these $k$ types, and the remaining ones do not affect the limit, because $g$ is bounded. Then, as $\frac1{b-a}=\frac{\pi_j}{\lambda_j}$, we can write
$$\lim\frac1n\sum_{i=1}^{n-1}g\left(\frac{n}{b-a}D_{i,n}\right)=\lim\sum_{j=1}^k\frac{n_j}{n}\cdot\frac1{n_j}\sum_{i=1}^{n_j-1}g\left(\frac{n\pi_j}{n_j}\cdot\frac{n_j\Delta_{i,n_j}^{(j)}}{\lambda_j}\right).$$
When $n\to\infty$, $n_j\to\infty$ almost surely and $\frac{n\pi_j}{n_j}\to\frac{\pi_j}{p_j}$ almost surely. Then, as $g$ is uniformly continuous and has compact support, $\frac1{n_j}\sum_{i=1}^{n_j-1}g\left(\frac{n\pi_j}{n_j}\cdot\frac{n_j\Delta_{i,n_j}^{(j)}}{\lambda_j}\right)\xrightarrow{P}\int g\left(\frac{\pi_j}{p_j}x\right)dH_j(x)$. The remaining claims have the same proof as in Proposition 21.
Corollary 2.
Let $a=T_0<T_1<\dots<T_k=b$. If $F$ has the density $f=\sum_{j=1}^k\alpha_j1_{[T_{j-1},T_j)}$, then $F$ has the property (D). It bears the distribution $H=\sum_{j=1}^k\alpha_j\lambda_j\,\text{Exp}((b-a)\alpha_j)$, which always dominates Exp(1) in the Lorenz order. Here, $\lambda_j=T_j-T_{j-1}$. Therefore, the Lorenz curves $L_n(X)$ converge in probability to the Lorenz curve of $H$.
Proof. 
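Let $\lambda_j=T_j-T_{j-1}$. Since $f$ is a density, $\sum_{j=1}^k\alpha_j\lambda_j$ must be equal to 1. Suppose that $\alpha_j>0$ for all $1\le j\le k$. Apply Proposition 27. The densities are $f_j=\frac1{\lambda_j}1_{[T_{j-1},T_j)}$ and $p_j=\alpha_j\lambda_j$. By Proposition 25, all $H_j$ are the same, namely, $H=\text{Exp}(1)$.
Next, $\frac{\pi_j}{p_j}=\frac{1}{(b-a)\alpha_j}$; hence $\sum_{j=1}^kp_j\,H\circ h_{\pi_j/p_j}^{-1}=\sum_{j=1}^k\alpha_j\lambda_j\,\text{Exp}((b-a)\alpha_j)$, since the image of Exp(1) under $x\mapsto cx$ is the exponential distribution with rate $1/c$.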
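Corollary 2 can be checked numerically through the Gini index, using the formula $\text{Gini}=1-E\min(Y,Y')/EY$ for an independent copy $Y'$ (recalled in Remark 19 below). For a mixture of exponential distributions this is explicit, since the minimum of independent Exp($r_i$) and Exp($r_l$) variables is Exp($r_i+r_l$). The sketch assumes a hypothetical density on $[0,1)$ equal to 1.5 on $[0,0.5)$ and 0.5 on $[0.5,1)$; for it, $H=0.75\,\text{Exp}(1.5)+0.25\,\text{Exp}(0.5)$ and the closed form gives $0.5625>\frac12$, consistent with $H$ dominating Exp(1) in the Lorenz order. Function names are ours.

```python
import random

def sample_gini(values):
    """Gini index of a non-negative sample: mean |Y - Y'| / (2 * mean) over all ordered pairs."""
    ys = sorted(values)
    m = len(ys)
    return sum((2 * (i + 1) - m - 1) * y for i, y in enumerate(ys)) / (m * sum(ys))

def mixture_exp_gini(weights, rates):
    """Gini index of H = sum_j w_j Exp(r_j) (mean 1 assumed), computed as 1 - E min(Y, Y')."""
    # min of independent Exp(r_i) and Exp(r_l) variables is Exp(r_i + r_l), with mean 1 / (r_i + r_l)
    e_min = sum(wi * wl / (ri + rl)
                for wi, ri in zip(weights, rates)
                for wl, rl in zip(weights, rates))
    return 1 - e_min

# Hypothetical step density on [a, b) = [0, 1): f = 1.5 on [0, 0.5) and f = 0.5 on [0.5, 1)
T = [0.0, 0.5, 1.0]
alpha = [1.5, 0.5]
lam = [T[j + 1] - T[j] for j in range(len(alpha))]
weights = [a_j * l_j for a_j, l_j in zip(alpha, lam)]   # alpha_j * lambda_j, summing to 1
rates = [(T[-1] - T[0]) * a_j for a_j in alpha]         # (b - a) * alpha_j

rng = random.Random(42)
n = 200_000
# pick piece j with probability alpha_j * lambda_j, then a uniform point inside it
pieces = rng.choices(range(len(alpha)), weights=weights, k=n)
pts = [T[0]] + sorted(rng.uniform(T[j], T[j + 1]) for j in pieces) + [T[-1]]
spacings = [b - a for a, b in zip(pts, pts[1:])]
print(round(sample_gini(spacings), 4), mixture_exp_gini(weights, rates))
```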
Remark 16.
What happens if we have gaps? Then we apply Proposition 21. The weak limit remains the same, but the Lorenz curves converge to a defective one. Precisely:
Corollary 3.
Suppose the mother distribution has the density $f=\sum_{j=1}^k\alpha_j1_{[s_j,t_j)}$, where $s_1<t_1\le s_2<t_2\le\dots\le s_k<t_k$. Let $\lambda_j=t_j-s_j$, $L=t_k-s_1$ and $q=\frac{\lambda_1+\dots+\lambda_k}{L}$. Then $F$ has the property (D) and bears the same $H=\sum_{j=1}^k\alpha_j\lambda_j\,\text{Exp}((b-a)\alpha_j)$ as in Corollary 2, now with $b-a=L$, but the Lorenz curves $L_n(X)$ converge in probability to $qL_H$, which is defective if $t_j<s_{j+1}$ for some $j$.
Remark 17.
One may see that the order of the intervals $I_j$ and of the possible gaps $[t_j,s_{j+1})$ is not important; what matters is their lengths. For instance, the probability distributions $F_1,F_2$ having the densities $f_1=\alpha1_{[1,3)}+\beta1_{[4,5)}$ and $f_2=\beta1_{[0,1)}+\alpha1_{[2,4)}$, with $\alpha,\beta>0$ and $2\alpha+\beta=1$, bear the same probability distribution $H=2\alpha\,\text{Exp}(4\alpha)+\beta\,\text{Exp}(4\beta)$. In both cases, the empirical Lorenz curves $L_n(X)$ converge to the defective curve $L^*=\frac34L_H$. Here $L=4$, $\lambda_1=2$, $\lambda_2=1$, $\pi_1=\frac12$, $\pi_2=\frac14$.
The Broken Rectangle.
Suppose that the rectangle $R=[a,a+L)\times[b,b+l)$ is broken by a sequence of points $z=(x_n,y_n)_{n\ge1}$ which belong to $R$, as follows: for any $n\ge1$ we sort the components $(x_j)_{1\le j\le n}$ and $(y_j)_{1\le j\le n}$ increasingly, obtaining $(x_{j:n})_{1\le j\le n}$ and $(y_{j:n})_{1\le j\le n}$. Then we add the endpoints $x_{0:n}=a$, $x_{n+1:n}=a+L$, $y_{0:n}=b$, $y_{n+1:n}=b+l$ and construct the rectangles $D_{i,j}=[x_{i:n},x_{i+1:n})\times[y_{j:n},y_{j+1:n})$ for $0\le i,j\le n$. The area of $D_{i,j}$ is $\sigma_{i,j}=(x_{i+1:n}-x_{i:n})(y_{j+1:n}-y_{j:n})$. Then consider the normalized empirical distributions $G_n^*(z)=\frac{1}{(n+1)^2}\sum_{0\le i,j\le n}\delta_{(n+1)^2\sigma_{i,j}/(Ll)}$ (all of them have expectation equal to 1).
Do they have a weak limit?
Proposition 28.
Let $X=(X_n)_{n\ge1}$ and $Y=(Y_n)_{n\ge1}$ be two independent sequences of i.i.d. random variables. Let $F_X$ be the distribution of $X_n$, $F_Y$ the distribution of $Y_n$, and $Z=(X_n,Y_n)_{n\ge1}$. Suppose both $F_X$ and $F_Y$ have the property (D), namely $F_X$ gives birth to $H_X$ and $F_Y$ gives birth to $H_Y$. Let $\xi_X$ and $\xi_Y$ be two independent random variables distributed as $H_X$ and $H_Y$, respectively, and let $H$ be the distribution of $\xi_X\xi_Y$. Let $D_n(X)=(X_{i+1:n+1}-X_{i:n+1})_{1\le i\le n}=(U_1,\dots,U_n)$ and $D_n(Y)=(Y_{j+1:n+1}-Y_{j:n+1})_{1\le j\le n}=(V_1,\dots,V_n)$. Thus, the assumption is that $G_n^*(X)\to H_X$ and $G_n^*(Y)\to H_Y$ in probability. Finally, let $G_n^*(Z)=\frac1{n^2}\sum_{1\le i,j\le n}\delta_{\frac{n^2U_iV_j}{(M_{n+1}(X)-m_{n+1}(X))(M_{n+1}(Y)-m_{n+1}(Y))}}$ be the empirical distribution of the normalized spacings produced by $Z$. Here $M_n(X)=\max(X_1,\dots,X_n)$, $m_n(X)=\min(X_1,\dots,X_n)$, and similarly for $Y$. Then
(i) 
$G_n^*(Z)\to H$ in probability, and the Lorenz curves of $(U_iV_j)_{1\le i,j\le n}$ tend to $L(p)=E\xi_X\,E\xi_Y\,L_{\xi_X\xi_Y}(p)$.
(ii) 
If $E\xi_X=E\xi_Y=1$, the limit is the Lorenz curve $L(p)=L_{\xi_X\xi_Y}(p)$.
(iii) 
As a particular case, if $F_X=F_Y=\text{Uniform}(0,1)$, then $\xi_X$ and $\xi_Y$ are exponentially distributed; hence $H$ has the tail $\bar H(x)=\int_0^\infty e^{-(y+x/y)}\,dy$. The empirical Lorenz curves converge in probability to the Lorenz curve of $H$.
(iv) 
$L_{\xi_X\xi_Y}\le\min(L_{H_X},L_{H_Y})$; hence, always, $L\le\min(L_{H_X},L_{H_Y})$.
In the particular case (iii), $L(p)\le p+(1-p)\ln(1-p)$.
Proof. 
(i) We know that $G_n^*(X)\to H_X$ and $G_n^*(Y)\to H_Y$ in probability. It means that $\int g\,dG_n^*(X)\xrightarrow{P}\int g\,dH_X$ and $\int g\,dG_n^*(Y)\xrightarrow{P}\int g\,dH_Y$ for any bounded continuous $g$. We claim that $G_n^*(X)\otimes G_n^*(Y)\to H_X\otimes H_Y$ in probability. The easy way is to use the equivalence “$X_n\xrightarrow{P}X$ if and only if from any subsequence $(X_{k_n})_n$ one may extract a sub-subsequence $(X_{k_{\sigma(n)}})_n$ such that $X_{k_{\sigma(n)}}\xrightarrow{a.s.}X$”.
In our case, from any subsequence $(k_1<k_2<\dots)$ we can extract a sub-subsequence such that $G_{k_{\sigma(n)}}^*(X)\to H_X$ and $G_{k_{\sigma(n)}}^*(Y)\to H_Y$ almost surely. In that case, $G_{k_{\sigma(n)}}^*(X)\otimes G_{k_{\sigma(n)}}^*(Y)\to H_X\otimes H_Y$ almost surely, because it is known that if $F_n,G_n,F,G$ are probability distributions, then $F_n\to F$ and $G_n\to G$ imply $F_n\otimes G_n\to F\otimes G$ (the proof follows immediately using characteristic functions).
In our case:
$$G_n^*(X)=\frac1n\sum_{1\le i\le n}\delta_{\frac{nU_i}{M_{n+1}(X)-m_{n+1}(X)}},$$
$$G_n^*(Y)=\frac1n\sum_{1\le j\le n}\delta_{\frac{nV_j}{M_{n+1}(Y)-m_{n+1}(Y)}},\qquad G_n^*(X)\otimes G_n^*(Y)=\frac1{n^2}\sum_{1\le i,j\le n}\delta_{\left(\frac{nU_i}{M_{n+1}(X)-m_{n+1}(X)},\,\frac{nV_j}{M_{n+1}(Y)-m_{n+1}(Y)}\right)}.$$
It follows that
$$\frac1{n^2}\sum_{1\le i,j\le n}g\left(\frac{nU_i}{M_{n+1}(X)-m_{n+1}(X)},\frac{nV_j}{M_{n+1}(Y)-m_{n+1}(Y)}\right)\xrightarrow{P}\int g\,d(H_X\otimes H_Y)$$
for any bounded continuous $g:\mathbb R^2\to\mathbb R$.
As a particular case, we set $g(x,y)=h(xy)$ with $h:\mathbb R\to\mathbb R$ bounded continuous, and the result is that
$$\frac1{n^2}\sum_{1\le i,j\le n}h\left(\frac{n^2U_iV_j}{(M_{n+1}(X)-m_{n+1}(X))(M_{n+1}(Y)-m_{n+1}(Y))}\right)\xrightarrow{P}\int h(xy)\,d(H_X\otimes H_Y)(x,y)=Eh(\xi_X\xi_Y).$$
The remaining part of (i) is a consequence of Proposition 5.
(iii) If $F_X=F_Y=\text{Uniform}(0,1)$, then $H_X=H_Y=\text{Exp}(1)$. If $\xi_X,\xi_Y$ are independent and exponentially distributed, then $P(\xi_X\xi_Y>x)=E\left(P\left(\xi_Y>\frac{x}{\xi_X}\,\middle|\,\xi_X\right)\right)=E\left(e^{-x/\xi_X}\right)=\int_0^\infty e^{-(y+x/y)}\,dy$.
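(iv) If $X,Y$ are two non-negative independent random variables with $EX=a>0$ and $EY=b>0$, then $L_{XY}\le\min(L_X,L_Y)$. Indeed, as Lorenz curves are scale invariant, it is enough to consider $a=b=1$. If $u:[0,\infty)\to\mathbb R$ is convex, then, by Jensen's inequality for conditional expectations, $Eu(XY)=E\left(E(u(XY)\mid Y)\right)\ge E\,u\left(E(XY\mid Y)\right)=Eu\left(Y\,E(X\mid Y)\right)=Eu(Y\,EX)=Eu(Y)$.
In the same way, $Eu(XY)\ge Eu(X)$. As $E(XY)=EX=EY=1$, these inequalities for every convex $u$ mean that $XY$ dominates both $X$ and $Y$ in the convex order, that is, $L_{XY}\le\min(L_X,L_Y)$.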
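The tail formula in (iii) can be sanity-checked numerically. In the sketch below (our helper names; the truncation point 50 and the number of midpoints are arbitrary choices), a midpoint-rule evaluation of $\int_0^\infty e^{-(y+x/y)}dy$ is compared with a Monte Carlo estimate of $P(\xi_X\xi_Y>x)$ for independent Exp(1) variables. At $x=0$ the integral reduces to $\int_0^\infty e^{-y}dy=1$, which gives a simple check of the quadrature.

```python
import math
import random

def tail_H(x, upper=50.0, steps=200_000):
    """Midpoint-rule approximation of the tail int_0^inf exp(-(y + x / y)) dy, for x >= 0.

    The integral is truncated at `upper`; the neglected part is at most exp(-upper).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h                  # midpoints, so y = 0 is never evaluated
        total += math.exp(-(y + x / y))
    return total * h

rng = random.Random(5)
n = 200_000
# products xi_X * xi_Y of independent Exp(1) variables
products = [rng.expovariate(1.0) * rng.expovariate(1.0) for _ in range(n)]
for x in (0.5, 1.0, 2.0):
    mc = sum(1 for p in products if p > x) / n   # Monte Carlo estimate of P(xi_X xi_Y > x)
    print(x, round(mc, 4), round(tail_H(x), 4))
```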
Remark 18.
In other words, a stick broken at random always exhibits less inequality than a rectangle broken at random.
The generalization is easy. If, instead of a square, we have the hypercube $[0,1]^k$, then, in the uniform case, the empirical Lorenz curves tend to the Lorenz curve of $\xi_1\xi_2\cdots\xi_k$, where $(\xi_j)_{1\le j\le k}$ are i.i.d. standard exponentially distributed random variables.
Remark 19.
Gini index. It is interesting that we can compute the Gini index for the broken rectangle, even though we do not know its distribution analytically. It is known (see [3,27,28]) that the Gini index can be computed as $\text{Gini}(L_X)=1-\frac{E\min(X,Y)}{EX}$, where $Y$ is an independent copy of $X$.
If $X\sim\text{Exp}(1)$, then $\min(X,Y)\sim\text{Exp}(2)$; hence, in the uniform broken stick model, the limit of the empirical Gini index is $1-\frac12=\frac12$.
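In the broken square model, the limit distribution of the spacings has the tail $\bar H(x)=\int_0^\infty e^{-(y+x/y)}\,dy$. If $X$ has the distribution $H$ and $Y$ is an independent copy of $X$, then the tail of $\min(X,Y)$ is $\bar H^2(x)$. As $EX=1$, integrating first in $x$ and then substituting $u=y+z$, we get
$$\begin{aligned}
\text{Gini}(L_X)&=1-E\min(X,Y)=1-\int_0^\infty\left(\int_0^\infty e^{-(y+x/y)}\,dy\right)^2dx=1-\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty e^{-(y+x/y)}e^{-(z+x/z)}\,dz\,dy\,dx\\
&=1-\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty e^{-\left(y+z+x\left(\frac1y+\frac1z\right)\right)}\,dx\,dz\,dy=1-\int_0^\infty\!\!\int_0^\infty\frac{yz\,e^{-(y+z)}}{y+z}\,dz\,dy\\
&=1-\int_0^\infty\!\!\int_y^\infty\frac{y(u-y)\,e^{-u}}{u}\,du\,dy=1-\int_0^\infty y\left(e^{-y}-y\int_y^\infty\frac{e^{-u}}{u}\,du\right)dy\\
&=1-\int_0^\infty ye^{-y}\,dy+\int_0^\infty\!\!\int_y^\infty\frac{y^2e^{-u}}{u}\,du\,dy=\int_0^\infty\frac{e^{-u}}{u}\left(\int_0^uy^2\,dy\right)du=\int_0^\infty\frac{u^2e^{-u}}{3}\,du=\frac23.
\end{aligned}$$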
Therefore, in the broken square model the limit of the Gini index is $\frac23$. This is confirmed empirically by computer simulations.
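A minimal Monte Carlo sketch of such a confirmation (our own code, not the author's): cut $[0,1)^k$ along each axis at $n$ uniform points, draw pairs of cells independently and with replacement, and estimate $\text{Gini}=E|A-B|/(2EA)$, which is equivalent to the formula $1-E\min(A,B)/EA$ above. With $k=1$ this is the broken stick (limit $\frac12$) and with $k=2$ the broken square (limit $\frac23$). Sampling pairs instead of sorting all $(n+1)^k$ cells is a shortcut we chose to keep the run time small; it adds Monte Carlo noise of order $(\text{pairs})^{-1/2}$.

```python
import random

def uniform_spacings(n, rng):
    """The n + 1 spacings of [0, 1) cut at n i.i.d. Uniform(0, 1) points."""
    pts = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    return [b - a for a, b in zip(pts, pts[1:])]

def broken_cube_gini(n, dim, pairs, rng):
    """Monte Carlo Gini index of the (n + 1) ** dim cells of [0, 1) ** dim.

    Each axis is cut independently at n uniform points; a cell volume is the
    product of one spacing per axis. Gini = E|A - B| / (2 * mean), where A and B
    are cell volumes drawn independently and with replacement.
    """
    axes = [uniform_spacings(n, rng) for _ in range(dim)]
    mean_volume = 1.0 / (n + 1) ** dim      # the cell volumes sum to 1
    total = 0.0
    for _ in range(pairs):
        a = b = 1.0
        for sp in axes:
            a *= rng.choice(sp)
            b *= rng.choice(sp)
        total += abs(a - b)
    return total / pairs / (2 * mean_volume)

rng = random.Random(2020)
g_stick = broken_cube_gini(100_000, 1, 200_000, rng)    # broken stick: limit 1/2
g_square = broken_cube_gini(100_000, 2, 200_000, rng)   # broken square: limit 2/3
print(round(g_stick, 3), round(g_square, 3))
```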

6. Conclusions and Open Problems

We have proved that, in many cases, the empirical distributions of the normed spacings generated in the process of breaking the stick, denoted by $G_n^*(X)$, have a limit and that the limit of their associated Lorenz curves $L_n(X)$ lies below the Lorenz curve of the exponential distribution. Thus, they are more inegalitarian than the exponential distribution. As a byproduct, the limit of their Gini coefficient is always at least $\frac12$, the value attained in the uniform case. As a principle, no random division of a whole among many individuals according to the broken stick principle, be it land, wealth or income, can produce a Gini index less than 0.5.
We were not able to solve some problems, such as:
  • Prove or disprove that if $X=(X_n)$ are i.i.d., then $G_n^*(X)$ always has a weak limit ALMOST SURELY;
  • Prove or disprove that if $\operatorname{ess\,inf}X_n=-\infty$ or $\operatorname{ess\,sup}X_n=+\infty$, then $G_n^*(X)\to\delta_0$ ALMOST SURELY;
  • Let $a_n=\{n\alpha\}$, where $\alpha\notin\mathbb Q$. Is it possible to find a formula for the weak limit of $G_n^*((a_n)_n)$ as $n\to\infty$?
  • Let $a=\left(0,1,0,\frac12,1,0,\frac13,\frac23,1,0,\frac14,\frac24,\frac34,1,0,\frac15,\frac25,\frac35,\frac45,1,0,\frac16,\frac26,\dots\right)$. Find $\lim_{n\to\infty}F_n^*(a)$ and $\lim_{n\to\infty}G_n^*(a)$;
  • The same question for $a=\left(0,1,\frac12,\frac13,\frac23,\frac14,\frac34,\frac15,\frac25,\frac35,\frac45,\frac16,\frac56,\frac17,\frac27,\frac37,\frac47,\frac57,\frac67,\frac18,\frac38,\frac58,\frac78,\frac19,\frac29,\dots\right)$;
  • Rearrangement. Empirical evidence suggests that if a mother distribution $F$ has density $f$ and $g$ is another density with the property that $\lambda(\{f>t\})=\lambda(\{g>t\})$ for every $t>0$, then the distribution $G$ with density $g$ is also a mother distribution and bears the same $H$ as $F$. For instance, $f_1(x)=2x1_{(0,1)}(x)$, $f_2(x)=2(1-x)1_{(0,1)}(x)$ and $f_3(x)=4\min(x,1-x)1_{(0,1)}(x)$ bear the same $H$. Or, in terms of random variables: let $U=(U_n)_n$ and $V=(V_n)_n$ be two independent sequences of i.i.d. Uniform(0,1) random variables. Then $W_1=\max(U,V)$, $W_2=\min(U,V)$ and $W_3=\frac12(U+V)$ have the same limit distribution of the spacings. Prove or disprove this.
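The rearrangement question lends itself to numerical exploration. The sketch below (our own helpers; seed and sample size arbitrary) builds $W_1,W_2,W_3$ from the same uniform pairs and compares the Gini index of the spacings they produce. The Gini index is only one summary of $H$, so matching values are evidence in favour of the conjecture, not a proof.

```python
import random

def sample_gini(values):
    """Gini index of a non-negative sample: mean |Y - Y'| / (2 * mean) over all ordered pairs."""
    ys = sorted(values)
    m = len(ys)
    return sum((2 * (i + 1) - m - 1) * y for i, y in enumerate(ys)) / (m * sum(ys))

def spacings_gini(points):
    """Gini index of the n + 1 spacings of [0, 1) cut at the given points."""
    pts = [0.0] + sorted(points) + [1.0]
    return sample_gini([b - a for a, b in zip(pts, pts[1:])])

rng = random.Random(3)
n = 200_000
uv = [(rng.random(), rng.random()) for _ in range(n)]
w1 = [max(u, v) for u, v in uv]          # density f1(x) = 2x
w2 = [min(u, v) for u, v in uv]          # density f2(x) = 2(1 - x)
w3 = [(u + v) / 2 for u, v in uv]        # density f3(x) = 4 min(x, 1 - x)
ginis = [spacings_gini(w) for w in (w1, w2, w3)]
print([round(g, 4) for g in ginis])
```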

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

We prove some (possibly well-known) facts about convex functions, quantiles and other technicalities, for which we were not able to find appropriate references, and also for the sake of self-containment.
Convexity
Lemma A1.
Let $f:(a,b)\to\mathbb R$ be convex. Then
(i) 
its right derivative $f_r(x)=\lim_{\delta\downarrow0}\frac{f(x+\delta)-f(x)}{\delta}$ is non-decreasing and right continuous;
(ii) 
its left derivative $f_l(x)=\lim_{\delta\downarrow0}\frac{f(x)-f(x-\delta)}{\delta}$ is non-decreasing and left continuous;
(iii) 
moreover, $f_r(x)=f_l(x+0)$ for any $x\in(a,b)$;
(iv) 
the two derivatives coincide with the possible exception of an at most countable set.
Proof. 
The convexity of $f$ is equivalent to the chord inequality: for any $a<x_1<x_2<x_3<b$ one has $\frac{f(x_2)-f(x_1)}{x_2-x_1}\le\frac{f(x_3)-f(x_1)}{x_3-x_1}\le\frac{f(x_3)-f(x_2)}{x_3-x_2}$, with the obvious consequence:
$$a<x_1<x_2<b\ \Rightarrow\ f_r(x_1)\le\frac{f(x_2)-f(x_1)}{x_2-x_1}\le f_l(x_2)\qquad\text{(A1)}$$
These two derivatives are non-decreasing; therefore, all the lateral limits $f_r(x+0)$, $f_r(x-0)$, $f_l(x+0)$ and $f_l(x-0)$ exist. Keeping $x_1$ fixed in (A1) and letting $x_2\downarrow x_1$, one gets $f_r(x_1)\le f_l(x_1+0)$; if one keeps $x_2$ fixed and lets $x_1\uparrow x_2$, one gets $f_r(x_2-0)\le f_l(x_2)$. Since, moreover, $f_l(x)\le f_r(x)$ by the chord inequality, we obtain
$$f_r(x-0)\le f_l(x)\le f_r(x)\le f_l(x+0)\quad\text{for any }a<x<b$$
In order to prove the continuities, we remark that for any $a<x_1<x_2<b$ there exist $y_1,y_2$ such that:
$$a<x_1<y_1<y_2<x_2<b\quad\text{and}\quad\frac{f(y_1)-f(x_1)}{y_1-x_1}\le\frac{f(y_2)-f(y_1)}{y_2-y_1}\le\frac{f(x_2)-f(x_1)}{x_2-x_1}\qquad\text{(A3)}$$
Indeed, if $\frac{f(y)-f(x_1)}{y-x_1}=\frac{f(x_2)-f(x_1)}{x_2-x_1}$ for some $y\in(x_1,x_2)$, then $f$ is affine on $(x_1,x_2)$; hence any $y_1<y_2$ in $(x_1,x_2)$ satisfy (A3). If not, choose $y_2\in(x_1,x_2)$ arbitrarily; then $\frac{f(y_2)-f(x_1)}{y_2-x_1}<\frac{f(x_2)-f(x_1)}{x_2-x_1}$. As $f$ is continuous, there must exist $y_1\in(x_1,y_2)$ such that $\frac{f(y_2)-f(y_1)}{y_2-y_1}<\frac{f(x_2)-f(x_1)}{x_2-x_1}$. Then, by the chord inequality:
$$f_r(x_1)\le\frac{f(y_1)-f(x_1)}{y_1-x_1}\le\frac{f(y_2)-f(y_1)}{y_2-y_1}\le\frac{f(x_2)-f(x_1)}{x_2-x_1}\le f_l(x_2)\qquad\text{(A4)}$$
Since, by (A1) and the monotonicity of $f_r$, $\frac{f(y_2)-f(y_1)}{y_2-y_1}\ge f_r(y_1)\ge f_r(x_1+0)$, inequality (A4) yields $f_r(x_1)\le f_r(x_1+0)\le\frac{f(x_2)-f(x_1)}{x_2-x_1}\le f_l(x_2)$. If we now let $x_2\downarrow x_1$, we find that $f_r(x_1+0)\le f_r(x_1)\le f_l(x_1+0)$. Thus, $f_r$ is right continuous. Moreover, letting $y\downarrow x$ in $f_l(y)\le f_r(y)$ gives $f_l(x+0)\le f_r(x+0)=f_r(x)$, which, together with the inequality $f_r(x)\le f_l(x+0)$ obtained above, yields the equality $f_r(x)=f_l(x+0)$. The fact that $f_l$ is left continuous can be proved in a similar way. Finally, both $f_l$ and $f_r$ are non-decreasing; hence they are continuous with the possible exception of an at most countable set. The equality $f_r(x)=f_l(x+0)$ implies that $f_r$ and $f_l$ coincide at the continuity points of $f_l$.
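A small exact-arithmetic illustration of the chord inequality and of (iv), using a convex function of our own choosing, $f(x)=\max(-x,2x)+x^2$: at $x=0$ the right difference quotients equal $2+\delta$ and decrease to $f_r(0)=2$, while the left ones equal $-1-\delta$ and increase to $f_l(0)=-1$, so $0$ is a point where the two one-sided derivatives differ. `Fraction` avoids floating-point noise; the function names are ours.

```python
from fractions import Fraction

def f(x):
    """A convex function with a kink at 0: f_l(0) = -1 and f_r(0) = 2."""
    return max(-x, 2 * x) + x * x

def right_quotient(func, x, delta):
    return (func(x + delta) - func(x)) / delta

def left_quotient(func, x, delta):
    return (func(x) - func(x - delta)) / delta

deltas = [Fraction(1, 2 ** k) for k in range(1, 11)]   # delta decreasing to 0
right = [right_quotient(f, 0, d) for d in deltas]      # non-increasing, bounded below by f_r(0)
left = [left_quotient(f, 0, d) for d in deltas]        # non-decreasing, bounded above by f_l(0)
print([float(r) for r in right[:3]], [float(l) for l in left[:3]])
# [2.5, 2.25, 2.125] [-1.5, -1.25, -1.125]
```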
Lemma A2.
Let $f:[a,b)\to\mathbb R$ be convex and continuous at $a$. Let $\lambda=f_r$ be the right derivative. Then $\lambda$ is Riemann integrable on any interval $[a,x]$ with $a<x<b$ and, moreover:
$$f(x)=f(a)+\int_a^x\lambda(t)\,dt\qquad\text{(A5)}$$
Proof. 
Let $a=x_0<x_1<x_2<\dots<x_n=x$ be a division of $[a,x]$. As $f(x)-f(a)=\sum_{k=1}^n\frac{f(x_k)-f(x_{k-1})}{x_k-x_{k-1}}(x_k-x_{k-1})$, we have, according to (A1), the evaluation:
$$\sum_{k=1}^nf_r(x_{k-1})(x_k-x_{k-1})\le f(x)-f(a)\le\sum_{k=1}^nf_l(x_k)(x_k-x_{k-1})$$
Both $f_r$ and $f_l$ are bounded on $[a,x]$ and continuous with the possible exception of a countable set. Therefore, they are Riemann integrable and, passing to the limit when the norm of the division tends to 0, we get $\int_a^xf_r(t)\,dt\le f(x)-f(a)\le\int_a^xf_l(t)\,dt$. The claimed equality results from the fact that $f_l\le f_r$. □
Convex functions have a property that is not true in general: the convergence of the functions implies the almost everywhere convergence of their derivatives. Precisely:
Lemma A3.
Let $f_n,f:[a,b]\to\mathbb R$ be convex and continuous. Suppose that $f_n\to f$ pointwise. Then
(i)
$(f_n)_r(x)\to f_r(x)$ at any point $x\in(a,b)$ for which $f_l(x)=f_r(x)$;
(ii)
If, moreover, $f_n$ and $f$ are non-decreasing, then the convergence of $(f_n)_r$ to $f_r$ holds in $L^1(a,b)$, too.
Proof. 
(i)
Let $x\in(a,b)$ and let $\delta>0$ be such that $a<x-\delta<x+\delta<b$. Then, according to (A1),
$$\frac{f(x)-f(x-\delta)}{\delta}\le f_l(x)\le f_r(x)\le\frac{f(x+\delta)-f(x)}{\delta}.$$
It follows that:
$$\frac{f_n(x)-f_n(x-\delta)}{\delta}-\frac{f(x+\delta)-f(x)}{\delta}\le(f_n)_r(x)-f_r(x)\le\frac{f_n(x+\delta)-f_n(x)}{\delta}-\frac{f(x)-f(x-\delta)}{\delta}.$$
Letting $n\to\infty$, we get:
$$\frac{f(x)-f(x-\delta)}{\delta}-\frac{f(x+\delta)-f(x)}{\delta}\le\liminf_{n\to\infty}\left((f_n)_r(x)-f_r(x)\right)\le\limsup_{n\to\infty}\left((f_n)_r(x)-f_r(x)\right)\le\frac{f(x+\delta)-f(x)}{\delta}-\frac{f(x)-f(x-\delta)}{\delta},$$
which further implies:
$$\limsup_{n\to\infty}\left|(f_n)_r(x)-f_r(x)\right|\le\frac{f(x+\delta)-f(x)}{\delta}-\frac{f(x)-f(x-\delta)}{\delta}\qquad\text{(A6)}$$
Now let $\delta\downarrow0$ in (A6). It follows that
$$\limsup_{n\to\infty}\left|(f_n)_r(x)-f_r(x)\right|\le f_r(x)-f_l(x),$$
which vanishes whenever $f_l(x)=f_r(x)$.
(ii)
If we take into account Lemma A1 (iv), it follows that $(f_n)_r(x)\to f_r(x)$ for $x\in[a,b]$ with the possible exception of a countable set. As any countable set is a null set for the Lebesgue measure, we can write that $(f_n)_r\to f_r$ almost everywhere (a.e.). From (A5) we have that $\int_a^b(f_n)_r(x)\,dx=f_n(b)-f_n(a)$ and $\int_a^bf_r(x)\,dx=f(b)-f(a)$. To prove the convergence in $L^1$, notice that $x^+=\frac{x+|x|}{2}$; hence $|x|=2x^+-x$. We write
$$\int_a^b\left|f_r(x)-(f_n)_r(x)\right|dx=2\int_a^b\left(f_r(x)-(f_n)_r(x)\right)^+dx-\int_a^b\left(f_r(x)-(f_n)_r(x)\right)dx$$
The second integral is equal to $(f(b)-f(a))-(f_n(b)-f_n(a))$ and tends to 0 as $n\to\infty$, since $f_n\to f$. The first one tends to 0, too, by dominated convergence, because of the domination $\left(f_r-(f_n)_r\right)^+\le f_r\in L^1(a,b)$ (recall that $(f_n)_r\ge0$, as $f_n$ is non-decreasing). Thus, the convergence in $L^1$ is proved.
Now we know that the right derivative $\lambda$ of a convex function $\Lambda:[0,1)\to\mathbb R_+$ with the property that $\Lambda(0)=0$ is a mapping which is non-negative, right continuous and non-decreasing and, moreover, that $\Lambda$ has the representation
$$\Lambda(x)=\int_0^x\lambda(t)\,dt\quad\text{for all }x\in(0,1)$$
We claim that $\lambda$ is the superior quantile of some distribution function $F:[0,\infty)\to[0,1]$.
Quantiles.
Definition 5.
Let $F:\mathbb R\to\mathbb R$ be non-decreasing and not constant. Let $m=F(-\infty)$ and $M=F(+\infty)$. A quantile of $F$ is any function $Q:(m,M)\to\mathbb R$ with the property that $Q(p)=x\Rightarrow F(x-0)\le p\le F(x+0)$.
The mappings $Q^+(p)=\sup F^{-1}((-\infty,p])$ and $Q^-(p)=\inf F^{-1}([p,\infty))$ are called the superior and the inferior quantiles of $F$, respectively. Sometimes one denotes the superior quantile by $F^{-1}$.
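For the empirical distribution function $F$ of a sample $x_{(1)}\le\dots\le x_{(m)}$, the two quantiles of Definition 5 have explicit index formulas for $0<p<1$: $Q^+(p)=x_{(\lfloor pm\rfloor+1)}$ and $Q^-(p)=x_{(\lceil pm\rceil)}$. The sketch below (our helper names) uses them. With four points and $p=\frac12$, the level set $\{F=\frac12\}=[x_{(2)},x_{(3)})$ is a non-degenerate interval, so the two quantiles differ, as in Lemma A4 (ii); the floor in $Q^+$ also makes it right continuous, as in Lemma A4 (iii).

```python
import math

def superior_quantile(sorted_sample, p):
    """Q^+(p) = sup{x : F(x) <= p} for the empirical distribution function F, 0 < p < 1."""
    m = len(sorted_sample)
    # p * m is computed in floating point; exact rationals would be safer near integers
    return sorted_sample[math.floor(p * m)]

def inferior_quantile(sorted_sample, p):
    """Q^-(p) = inf{x : F(x) >= p} for the empirical distribution function F, 0 < p < 1."""
    m = len(sorted_sample)
    return sorted_sample[math.ceil(p * m) - 1]

xs = sorted([3.0, 1.0, 4.0, 2.0])                                 # F jumps by 1/4 at 1, 2, 3, 4
print(superior_quantile(xs, 0.5), inferior_quantile(xs, 0.5))     # 3.0 2.0
print(superior_quantile(xs, 0.3), inferior_quantile(xs, 0.3))     # 2.0 2.0
```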
Lemma A4.
(i). If $F$ is one to one, then the quantile is unique. Moreover, if $Q_1$ and $Q_2$ are two quantiles of $F$, then they coincide with the possible exception of a countable set.
(ii). $Q(p-0)<Q(p+0)$ if and only if the level set $I_F(p)=\{x:F(x)=p\}$ is a non-degenerate interval. In other words, the continuity points of $Q$ are those points $p$ for which the level set $I_F(p)$ is either empty or a singleton. As a consequence, the unique quantile of a one-to-one non-decreasing function $F$ is continuous.
(iii). The superior quantile is always right continuous. Moreover, the only right-continuous quantile is the superior one.
(iv). If $\lambda:(m,M)\to\mathbb R$ is non-decreasing and right continuous, then there exists $F:\mathbb R\to[m,M]$ which is non-decreasing and right continuous such that $\lambda$ is the superior quantile of $F$. As a particular case, if $\lambda:(0,1)\to\mathbb R$ is non-decreasing and right continuous, then there exists a distribution function $F$ such that $\lambda=F^{-1}$.
Proof. 
(i) Suppose that $Q_1$ and $Q_2$ are two quantiles of $F$ and that there exists some $p$ such that $Q_1(p)=x_1$ and $Q_2(p)=x_2$ with $x_1\ne x_2$. Thus, $F(x_1-0)\le p\le F(x_1+0)$ and $F(x_2-0)\le p\le F(x_2+0)$. Suppose $x_1<x_2$. It follows that $p\le F(x_1+0)\le F(x_2-0)\le p$; hence $F(x_1+0)=F(x_2-0)=p$ and $(x_1,x_2)\subset I_F(p)$. This is impossible if $F$ is one to one; thus, in that case, $Q_1=Q_2$. In general, the set $\{p:I_F(p)\text{ contains an open interval}\}$ is at most countable, since the corresponding intervals are disjoint.
(ii) Suppose that $I_F(p)$ contains the interval $(x_1,x_2)$, with $x_1<x_2$. We claim that $Q(p-0)\le x_1$ and $Q(p+0)\ge x_2$. Indeed, let $\varepsilon,\delta>0$ be small enough that $x_1<x_1+\varepsilon<x_2-\delta<x_2$. Let $p'<p$ and $x'=Q(p')$. Then $x'\le x_1+\varepsilon$. Indeed, we know that $F(x'-0)\le p'\le F(x'+0)$. If $x'>x_1+\varepsilon$, then $F(x'-0)\ge F(x_1+\varepsilon)=p$, and we get the absurdity $p\le F(x'-0)\le p'<p$. Thus, $p'<p\Rightarrow Q(p')\le x_1+\varepsilon$ for any $\varepsilon>0$, which means that $Q(p-0)\le x_1$. In the same way one proves that $p'>p\Rightarrow Q(p')\ge x_2-\delta$ for any $\delta>0$; hence $Q(p+0)\ge x_2$.
On the other hand, let us suppose that $p\in(m,M)$ has the property that $I_F(p)$ has at most one point. Then there exists $x$ such that $x'<x\Rightarrow F(x'+0)<p$ and $x'>x\Rightarrow F(x'-0)>p$. We claim that $Q(p-0)=Q(p+0)=x$. Indeed, suppose that $Q(p-0)=x'<x$ or that $Q(p+0)=x''>x$. Suppose that we are in the first situation. Let $p_n<p$ be a sequence such that $p_n\uparrow p$, and let $x_n=Q(p_n)$, so that $F(x_n-0)\le p_n\le F(x_n+0)$. As $x_n\le x'$, we see that $p_n\le F(x_n+0)\le F(x'+0)<p$ for all $n$. This is absurd, since $p_n\uparrow p$. We are forced to accept that $x'=x$. In the same way one checks that $x''=x$.
(iii) The superior quantile is $Q(p)=\sup\{x:F(x)\le p\}$. If $\{x:F(x)=p\}$ is at most a singleton, $Q$ is continuous at $p$ and we have nothing to prove. If not, then $Q(p)=\sup\{x:F(x)=p\}$; hence, if $Q(p)=x$, then $F(x-0)=p$ and $F(y)>p$ for every $y>x$. Let $p_n>p$ be a sequence such that $p_n\downarrow p$, let $x_n=Q(p_n)$ and let $x'=\lim x_n$.
If $x'>x$, then $p<F(x'-0)$; therefore, $p<F(x'-0)\le F(x_n-0)\le p_n$. Passing to the limit, we find the absurdity $p<F(x'-0)\le p$. Hence $x'=x$, that is, $Q$ is right continuous at $p$.
Now suppose that $Q$ is a quantile for $F$ and that it is right continuous. The discontinuities of $Q$ are at those points $p$ where the level set $I_F(p)$ contains non-degenerate intervals. This set is at most countable. For those points, let $\alpha(p)=\inf I_F(p)<\beta(p)=\sup I_F(p)$. We claim that $Q(p+0)=\beta(p)$. Indeed, it is clear that $Q(p+0)\ge\beta(p)$. If, ad absurdum, $Q(p+0)>\beta(p)$, then there exists $x^*>\beta(p)$ such that $Q(p')\ge x^*$ for any $p'>p$. Then $p<F(x^*-0)\le F(Q(p')-0)\le p'$ for all $p'>p$. This is an absurdity: if $p<p'<F(x^*-0)$, then $p'$ cannot be greater than or equal to $F(x^*-0)$. It follows that $Q(p+0)=\beta(p)$. If $Q$ is right continuous, then $Q(p)=Q(p+0)=\beta(p)$. But this is precisely the superior quantile, $F^{-1}$.
(iv) Define $F(x)=\sup\left(\{m\}\cup\{p\in(m,M):\lambda(p)\le x\}\right)$, so that $F(x)=m$ if $x<\lambda(m+0)$ and $F(x)=M$ if $x\ge\lambda(M-0)$. Then $F$ is non-decreasing and right continuous. It will be enough to prove that $\lambda$ is a quantile for $F$, since the only right-continuous quantile is the superior one. We have to check that $\lambda(p)=x\Rightarrow F(x-0)\le p\le F(x)$. The level set $\{q:\lambda(q)=x\}$ is an interval with endpoints $p_1\le p_2$ which contains $p$. Then $F(x)=p_2\ge p$ and $F(x-0)=p_1\le p$; hence $F(x-0)\le p\le F(x)$. It follows that $\lambda$ is the superior quantile of $F$.
The connection between the weak convergence of probability measures on the real line and quantiles is given by the following result.
Lemma A5.
Let $F_n,F:\mathbb R\to[0,1]$ be distribution functions. Then $F_n(x)\to F(x)$ for all $x$ which are continuity points of $F$ if and only if $F_n^{-1}(p)\to F^{-1}(p)$ for all $p\in(0,1)$ with the possible exception of a countable set.
Proof. 
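Let $\Gamma=\{x:F_n(x)\to F(x)\}$. The complement of $\Gamma$ is at most countable; hence $\Gamma$ is a dense subset of $\mathbb R$. For $x\in\Gamma$ we have the equivalences
$$F(x)\le p\iff\forall k\ \exists m=m(k)\ \text{such that}\ F_{m+n}(x)\le p+\tfrac1k\ \ \forall n$$
$$F(x)\ge p\iff\forall k\ \exists m=m(k)\ \text{such that}\ F_{m+n}(x)\ge p-\tfrac1k\ \ \forall n$$
Here, $m,n,k$ are positive integers.
In terms of sets,
$$\{F\le p\}\cap\Gamma=\bigcap_{k=1}^\infty\bigcup_{m=1}^\infty\bigcap_{n=1}^\infty\{F_{m+n}\le p+\tfrac1k\}\cap\Gamma=\bigcap_{k=1}^\infty\liminf_{n\to\infty}\{F_n\le p+\tfrac1k\}\cap\Gamma\qquad\text{(A9)}$$
$$\{F\ge p\}\cap\Gamma=\bigcap_{k=1}^\infty\bigcup_{m=1}^\infty\bigcap_{n=1}^\infty\{F_{m+n}\ge p-\tfrac1k\}\cap\Gamma=\bigcap_{k=1}^\infty\liminf_{n\to\infty}\{F_n\ge p-\tfrac1k\}\cap\Gamma\qquad\text{(A10)}$$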
As $\Gamma$ is dense, $\sup(\{F\le p\}\cap\Gamma)=\sup\{F\le p\}$ and $\inf(\{F\ge p\}\cap\Gamma)=\inf\{F\ge p\}$.
Let $Q^+(p)=\sup\{F\le p\}$, $Q_n^+(p)=\sup\{F_n\le p\}$, $Q^-(p)=\inf\{F\ge p\}$ and $Q_n^-(p)=\inf\{F_n\ge p\}$ be the superior (inferior) quantiles of $F$ and $F_n$. Taking the supremum in (A9) and the infimum in (A10), we get
$$Q^+(p)\le\lim_{k\to\infty}\left(\liminf_{n\to\infty}Q_n^+\right)\left(p+\tfrac1k\right)=\left(\liminf_{n\to\infty}Q_n^+\right)(p+0),$$
$$Q^-(p)\ge\lim_{k\to\infty}\left(\limsup_{n\to\infty}Q_n^-\right)\left(p-\tfrac1k\right)=\left(\limsup_{n\to\infty}Q_n^-\right)(p-0)\qquad\text{(A11)}$$
Since all four functions in (A11), namely $Q^+$, $Q^-$, $\liminf_nQ_n^+$ and $\limsup_nQ_n^-$, are non-decreasing, they are all continuous with the possible exception of an at most countable set.
Moreover, except on another at most countable set, $Q^+(p)=Q^-(p)$ and $Q_n^+(p)=Q_n^-(p)$ for every $n$. Put all these exceptional sets in a set $N$, which is at most countable. It follows that, if $p\notin N$, we have the inequalities:
$\left(\liminf_nQ_n^+\right)(p)=\left(\liminf_nQ_n^+\right)(p+0)\ge Q^+(p)$ and $\left(\limsup_nQ_n^-\right)(p)=\left(\limsup_nQ_n^-\right)(p-0)\le Q^-(p)$, which further imply $\left(\liminf_nQ_n^+\right)(p)\ge Q^+(p)=Q^-(p)\ge\left(\limsup_nQ_n^-\right)(p)=\left(\limsup_nQ_n^+\right)(p)$; otherwise written, $F_n^{-1}(p)\to F^{-1}(p)$ for all $p\in(0,1)\setminus N$.
Conversely, suppose that $F_n^{-1}(p)\to F^{-1}(p)$ for all $p$ in a dense set $\Gamma\subset(0,1)$. According to Lemma A4 (iv), $F_n$ is the superior quantile of $F_n^{-1}$ and $F$ is the superior quantile of $F^{-1}$. The proof goes in the same way as before.
Remark A1.
Actually, one can be more precise: $F_n^{-1}(p)\to F^{-1}(p)$ for all $p$ which are continuity points of $F^{-1}$ (see [30], Proposition 5, p. 250, or, for more general cases, [7,21,31]).

References

  1. MacArthur, R.H. On the relative abundance of bird species. Proc. Natl. Acad. Sci. USA 1957, 43, 293–295.
  2. Cohen, J.E. Alternate derivation of a species-abundance relation. Am. Nat. 1968, 102, 165–172.
  3. Aly, E.A.A.; Beirlant, J.; Horvath, L. Strong and weak approximation of k-spacings processes. Z. Wahrschein. Gebiete 1984, 66, 461–484.
  4. Bansal, P.; Arora, S.; Mahajan, K.K. Estimates of Inequality Indices Based on Simple Random, Ranked Set, and Systematic Sampling. ISRN Probab. Stat. 2013.
  5. Beirlant, J. Strong approximations of the empirical and quantile processes of uniform spacings. Limit Theorems Probab. Stat. 1984, 1, 77–89.
  6. Barton, D.E.; David, F.N. Tests for randomness of points on a line. Biometrika 1956, 43, 104–112.
  7. Csörgö, M.; Yu, H. Weak Approximations for Empirical Lorenz Curves and Their Goldie Inverses of Stationary Observations. Adv. Appl. Probab. 1999, 31, 698–719.
  8. Cucala, L. Le Cam spacings theorem in dimension two. arXiv 2005, arXiv:Math/0507367.
  9. Durbin, J. Kolmogorov-Smirnov tests when parameters are estimated with applications to tests of exponentiality and tests on spacings. Biometrika 1975, 62, 5–22.
  10. Le Cam, L. Un theoreme sur la division d’une intervalle par des points pres au hazard. Publ. Inst. Statist. Univ. Paris 1958, 7, 7–16. [Google Scholar]
  11. Pyke, R. Spacings. J. Roy. Statist. Sot. Ser. B 1965, 27, 395–449. [Google Scholar]
  12. Pyke, R. Spacings revisited. Sixth Berkeley Symp. Math. Statist. Probab. 1972, 1, 417–427. [Google Scholar]
  13. Rao, J.S.; Sethuraman, J. Weak convergence of empirical distribution functions of random variables subject to perturbations and scale factors. Ann. Stat. 1975, 3, 299–313. [Google Scholar] [CrossRef]
  14. Stephens, M.A. Tests for the Uniform Distribution. In Goodness-of-Fit Techniques; D’Agostino, R., Stephens, M.A., Eds.; Marcel Dekker: New York, NY, USA, 1986. [Google Scholar]
  15. Stephens, C.J. Towards Understanding the Lorenz Curve Using the Uniform distribution. In Proceedings of the Gini-Lorenz Conference, Siena, Italy, 23–26 May 2005; University of Siena: Siena, Italy, 2005. [Google Scholar]
  16. Tung, D.T.; Jamalamadaka, S.R. U statistics based on spacings. J. Stat. Plann. Infer. 2012, 142, 673–684. [Google Scholar]
  17. Wilks, S.S. Mathematical Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1962. [Google Scholar]
  18. Arnold, B.C. Majorization: Here, there and everywhere. Stat. Sci. 2007, 22, 407–413. [Google Scholar]
  19. Arnold, B.C. Majorization and the Lorenz Order: A Brief Introduction; Springer: New York, NY, USA, 1987. [Google Scholar]
  20. Sarabia, J.M.; Castillo, E.; Slottje, D.J. An ordered family of Lorenz curves. J. Econom. 1999, 91, 43–60. [Google Scholar]
  21. Goldie, C.M. Convergence theorems for empirical Lorenz curves and their inverses. Adv. Appl. Probab. 1977, 9, 765–791. [Google Scholar]
  22. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications, 2nd ed.; Springer: New York, NY, USA, 2010. [Google Scholar]
  23. Stoyan, D. Qualitative Eigenschaften und Abschatzungen stochastischer Modelle; Academie Verlag: Berlin, Germany, 1977. [Google Scholar]
  24. Shorrocks, A.F. Ranking Income Distributions. Economica 1983, 50, 3–17. [Google Scholar] [CrossRef] [Green Version]
  25. Billingsley, P. Probability and Measure; John Wiley & Sons: Hoboken, NJ, USA, 1995. [Google Scholar]
  26. Billingsley, P. Convergence of Probability Measures; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
  27. Arnold, B.C. Lorenz ordering of exponential order statistics. Stat. Probab. Lett. 1984, 11, 485–490. [Google Scholar]
  28. Arnold, B.C. The Lorenz Curve: Evergreen after 100 years. In Advances on Income Inequality and Concentration Measures; Betti, G., Lemmi, A., Eds.; Routledge Frontiers of Political Economy; Routledge: London, UK, 2008; pp. 34–46. [Google Scholar]
  29. Shaked, M.; Shanthikumar, J. Stochastic Orders; Springer: New York, NY, USA, 2007. [Google Scholar]
  30. Fristedt, B.E.; Gray, L.F. A Modern Approach to Probability Theory; Springer Science & Business Media: New York, NY, USA, 2013. [Google Scholar]
  31. Ahidar-Coutrix, A.; Berthet, P. Convergence of Multivariate Quantile Surfaces. Available online: https://hal.archives-ouvertes.fr/hal-01754841/document (accessed on 10 March 2020).

Share and Cite

MDPI and ACS Style

Zbăganu, G. Asymptotic Results in Broken Stick Models: The Approach via Lorenz Curves. Mathematics 2020, 8, 625. https://doi.org/10.3390/math8040625

AMA Style

Zbăganu G. Asymptotic Results in Broken Stick Models: The Approach via Lorenz Curves. Mathematics. 2020; 8(4):625. https://doi.org/10.3390/math8040625

Chicago/Turabian Style

Zbăganu, Gheorghiță. 2020. "Asymptotic Results in Broken Stick Models: The Approach via Lorenz Curves" Mathematics 8, no. 4: 625. https://doi.org/10.3390/math8040625