Lorenz Curves, Size Classification, and Dimensions of Bubble Size Distributions

Lorenz curves of bubble size distributions and their Gini coefficients characterize demixing processes. Through a systematic size classification, bubble size histograms are generated and investigated with respect to their statistical entropy. It turns out that the temporal development of the entropy is preserved although characteristics of the histograms, such as the number of size classes and the modality, are remarkably reduced. Examinations by Rényi dimensions show that the bubble size distributions are multifractal and provide information about underlying structures such as self-similarity.


Introduction
It has been shown that applying majorization [1][2][3][4][5][6] to time series of bubble size distributions yields new insights into decaying foam [7][8][9][10][11][12][13][14][15]. The decay of liquid foams comprises two processes: drainage at the beginning and subsequent ageing. Generating foam by ultrasound treatment enables a separation of the two decay processes. Not only measurements of the shrinking foam volume [7][8][9] but also the evaluation of bubble size histograms by classical majorization and its order-preserving functions have revealed a temporal separation of drainage and ageing [11,12]. As a first approach, ten size classes were chosen for forming the bubble size histograms. Drainage is then characterized by classical majorization in time, whereas histograms from the ageing phase are predominantly incomparable. As an order-preserving function of classical majorization, the Shannon entropy [16] was used to map the histograms onto real values. Since drainage corresponds to a classical majorization in time, the Shannon entropy values increase during drainage; during the ageing phase, where the histograms are incomparable, the values decrease and behave irregularly.
In [14] the same foam experiment was used, but the area imaged by the camera and the size classes of the bubble size histograms were smaller by factors of 4 and 2, respectively. Under these conditions a process separation could not be found; instead, the Shannon entropy increased monotonically. Hence, the question arises whether the process separation depends on the image size or on the chosen data binning. More generally, if bubble sizes are to be binned, do criteria for an optimal bin width exist? Is there a particular data binning that reveals other foam characteristics?
In this paper systematic data binning is applied to bubble size distributions obtained under the conditions described in [14]. To gain insight into the temporal development of the raw data, concepts are used that go back to Lorenz [17] and Dalton [18] and represent a predecessor of majorization. Originally introduced to measure inequality of income or wealth, Lorenz curves are used here to describe the underlying statistical process of bubble size distributions. Bubble size histograms are then generated in which the bin width depends on the smallest element (bubble). The development of both the Lorenz curves and the histograms of different bin classifications is evaluated by corresponding entropy measures: the Gini coefficient [19] for Lorenz curves and the Shannon entropy [16] for histograms.
This work is organized as follows: first, the mathematical background is introduced (Section 2). In Section 3, Lorenz curves and Gini coefficients of bubble size distributions are generated, followed by a systematic size classification for bubble size histograms. The latter leads both to a criterion for an optimal size classification and to a further foam characteristic based on the concept of Rényi dimensions [21], see Section 4. An optimal size classification is suggested and a simple model depicting the structure of the raw data is discussed in Section 5, followed by a conclusion.

Mathematical Background
In the following, the notation for Lorenz curves and histograms with their entropy measures is introduced. The original example of income distributions of a population is used.

Lorenz Curve and Gini Coefficient
The Lorenz curve [17] is a method to describe inequality in wealth or size. Mapping the cumulative proportion of ordered individuals onto the corresponding cumulative proportion of their size yields a Lorenz curve. Let a population consist of n individuals and let x_i be the wealth of individual i, i = 1, ..., n. The individuals are ordered increasingly, that is, from poorest to richest, x_1 ≤ x_2 ≤ ... ≤ x_n. Then the points (k/n, S_k/S_n), k = 0, 1, ..., n, are plotted, where S_k = ∑_{i=1}^{k} x_i is the partial sum of the ordered sizes and S_0 = 0. The polygon connecting these points is the Lorenz curve. If all individuals are of the same size, the Lorenz curve is a straight line from (0, 0) to (1, 1), called the line of equality. If there is any inequality in size, the Lorenz curve is convex and lies below the line of equality, see Figure 1. As a measure of inequality (and for a Lorenz curve) the Gini coefficient G (or Gini ratio) [19] is used. The Gini coefficient is the ratio between the area A enclosed by the line of equality and the Lorenz curve and the total triangular area A + B under the line of equality, G = A/(A + B), see Figure 1. The Gini coefficient thus ranges from zero, when all individuals are equal, to one, when every individual except one has the size zero.
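The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the study: the function names `lorenz_points` and `gini` are hypothetical, the partial sums are written as S_k, and the area under the polygon is obtained with the trapezoid rule. It assumes that the total size is positive.

```python
def lorenz_points(sizes):
    """Return the Lorenz polygon vertices (k/n, S_k/S_n) for k = 0..n."""
    xs = sorted(sizes)                      # order from poorest to richest
    n, total = len(xs), sum(xs)             # assumes total > 0
    points, partial = [(0.0, 0.0)], 0.0     # S_0 = 0
    for k, x in enumerate(xs, start=1):
        partial += x                        # S_k = x_1 + ... + x_k
        points.append((k / n, partial / total))
    return points


def gini(sizes):
    """Gini coefficient G = A / (A + B) of the Lorenz polygon."""
    pts = lorenz_points(sizes)
    # B is the area under the Lorenz polygon (trapezoid rule);
    # A + B = 1/2 is the triangle under the line of equality.
    area_b = sum((x1 - x0) * (y0 + y1) / 2.0
                 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return (0.5 - area_b) / 0.5
```

For a finite population the polygon gives G = 1 − 1/n when only one individual has a non-zero size, so the upper limit of one is approached for large n.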

Histogram and Shannon Entropy
For simplicity, a histogram is defined by a number of size classes m (income classes) and a number of individuals n which are distributed over these classes. The number of individuals within each size class gives the frequency f_i, with ∑_{i=1}^{m} f_i = n. By normalizing the frequencies to unity one obtains the relative frequencies p_i = f_i/n. A classical measure for histograms is the Shannon entropy [16]
I(p) = −∑_{i=1}^{m} p_i log p_i, (1)
where p = (p_i) is the vector of relative frequencies.
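As a minimal sketch of this mapping, the function below computes the Shannon entropy −∑ p_i log p_i from the class frequencies f_i. The name `shannon_entropy` is hypothetical, and the base-2 logarithm is an assumption made for illustration; empty classes are skipped, following the usual convention 0 log 0 = 0.

```python
import math


def shannon_entropy(frequencies, base=2.0):
    """Shannon entropy of a histogram given by its class frequencies f_i."""
    n = sum(frequencies)                    # total number of individuals
    entropy = 0.0
    for f in frequencies:
        if f > 0:                           # convention: 0 * log(0) = 0
            p = f / n                       # relative frequency p_i = f_i / n
            entropy -= p * math.log(p, base)
    return entropy
```

With this choice of base, a histogram with two equally occupied classes has entropy one, and a histogram whose individuals all fall into a single class has entropy zero.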

Application to Foam
In this section the foam experiment is introduced and the raw data of bubble sizes are plotted as Lorenz curves. The corresponding Gini coefficients of the curves reproduce the underlying statistical development of the bubble sizes during decay. Furthermore, a systematic data binning is presented, leading to time series of bubble size histograms whose statistical entropy development is investigated. The latter serves to clarify the questions raised in Section 1.

Lorenz Curves of Bubble Sizes
A rectangular glass vessel 2.5 cm × 20 cm × 2.5 cm was filled with 20 mL of frothless beer (for this investigation Haake Beck beer was used) at a temperature of 24 ± 1 °C. The beer was frothed up by ultrasound treatment (Ultrasonik 28x; NEY). A CV-M10 CCD-camera with a telecentric lens JEN metar T M 1/12LD registered the bubbles at the 22 mL mark of the rectangular glass vessel. For illumination, a cold light source KL 2500 LC was used. Images were taken in five-second intervals. The size of the recorded image area was 6.4 mm × 5 mm. Such an image is shown on the left in Figure 2. For 52 images, the number of bubbles n and their sizes (diameters) a_i were determined. Each image is then defined by a size vector a(t) = (a_i(t)). Since the number of bubbles n decreases in time [12,14], the size vectors are defined by a(t) = (a_i(t)) ∈ R^n, where n is the maximum number of bubbles given by the first size vector a(t_0), and subsequent size vectors are filled with zeros. In other words, the number of bubbles is held constant but the loss of bubbles is taken into account by allowing the size zero. Note that the normalization of the individuals introduced by Lorenz (Section 2.1) is not applied. Originally, the concept of Lorenz curves was used to compare different populations and their income distributions without dynamics. Hence, the extension of the size vectors with zeros is necessary in order to take into account the loss and the size development of bubbles. Not only the number of bubbles decreases in time; the sum of diameters ∑_{i=1}^{n} a_i(t) also tends to decrease in time. In Figure 2 (right) the decrease of the sum of the bubble diameters is shown and illustrated by a smooth variation from cyan (t_0) to magenta (t_end).
Firstly, the partial sum vectors of the increasingly sorted size vectors a = (a_1, ..., a_n) are plotted, see Figure 3 (left). The plot is given by (k, A_k), k = 0, 1, ..., n, where A_k = ∑_{i=1}^{k} a_i and A_0 = 0. The temporal development of these curves is again illustrated by the smooth variation from cyan (t_0) to magenta (t_end).
By normalizing the partial sum vectors in Figure 3 (left) the Lorenz curves are obtained. The plot is given by (k, A_k/A_n), k = 0, 1, ..., n, where A_k = ∑_{i=1}^{k} a_i and A_0 = 0, see Figure 3 (middle). Additionally, a blue line is shown which gives the line of equality.
It seems for both plots in Figure 3 (left and middle) that each curve at t + 1 lies below the curve at t, which would mean that the size vectors continuously develop away from equality (demixing process). However, a few curves cross. Despite the crossing curves, the temporal development of the Gini coefficient, which measures the distance to the line of equality, is monotonic, see Figure 3 (right).
In terms of Lorenz curves and the Gini coefficient, the temporal development of the bubble sizes can be characterized as a demixing process. It can be expected that the corresponding histograms of the raw data show a similar behaviour. The small image size would then not allow a process separation to be found; to verify this, a linear scaling of the bubble size classes is introduced in the following. Note that the demixing process of the bubble sizes described by Lorenz curves changes to a mixing process of bubble size histograms, in which bubble individuals are distributed over size classes.
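The zero-padding and the two plots described above can be sketched as follows. The helper names `padded_partial_sums` and `lorenz_ordinates` and the argument `n_max` (the number of bubbles in the first size vector) are hypothetical and chosen only for illustration; the abscissa is simply the index k = 0, 1, ..., n, since the individuals are not normalized.

```python
from itertools import accumulate


def padded_partial_sums(sizes, n_max):
    """Partial sums A_0..A_n of the sorted size vector, padded with zeros to n_max."""
    padded = sorted(list(sizes) + [0.0] * (n_max - len(sizes)))  # lost bubbles get size 0
    return [0.0] + list(accumulate(padded))                      # A_0 = 0, A_k = a_1 + ... + a_k


def lorenz_ordinates(sizes, n_max):
    """Normalized partial sums A_k / A_n, the ordinates of the Lorenz curve."""
    sums = padded_partial_sums(sizes, n_max)
    return [a_k / sums[-1] for a_k in sums]
```

For example, a later size vector with two remaining bubbles of sizes 3 and 1, padded to n_max = 4, has the partial sums 0, 0, 0, 1, 4, so the padded zeros shift the rise of the curve to the right.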

Linear Scaling
As reference, the smallest size of the initial bubble size vector a(t_0), min(a(t_0)), is used. The classes are first defined by the left-open and right-closed intervals ((j − 1) min(a(t_0)), j min(a(t_0))] with j = 1, 2, ..., where j is incremented until all bubble sizes are taken into account. To reduce the number of size classes, the classes are next defined by multiples of min(a(t_0)): the intervals become (q (j − 1) min(a(t_0)), q j min(a(t_0))], where q is the multiplication or scaling factor, which is incremented until all bubble sizes fall into one class. This procedure can be illustrated by horizontal bars whose spacing is q min(a(t_0)). In Figure 4 the increasingly sorted bubble sizes of the initial bubble size vector are shown with the size classifications q = 1, 5, 10 from left to right.
It is easy to see that for q = 40 all bubbles of the initial bubble size vector belong to one class. More precisely, the minimum scaling factor that assigns all sizes of the first distribution to one class is q = 39. The histograms corresponding to the classifications in Figure 4 are given in Figure 5. The characteristics of the histograms clearly change with increasing q. In particular, the number of size classes (intervals) and the modality of the distributions diminish from q = 1 to q = 10. One may expect a great loss of information from reducing the number of size classes, which is accompanied by a decreasing modality. Of course, no information remains if the size classes are so large that all sizes of all size vectors fall into one size class. Mapping the normalized histograms onto real values by the Shannon entropy (1) provides information about the underlying process of the temporal development of the histograms depending on q.
In Figure 6 (left) the time development of the Shannon entropy values of the histograms for q = 1 (black plot), q = 5 (red plot), q = 10 (blue plot), and q = 40 (magenta plot) is shown. The latter is a kind of limit, since the size classes become too large and the histograms at the beginning are mapped onto zero by the Shannon entropy (1).
It is very interesting that the basic behaviour is preserved although the histograms are strongly reduced. The Shannon entropy values tend to increase in time, which corresponds to a mixing process. The next step is to use q values smaller than one.
Figure 6. Left: The temporal development of the Shannon entropy (1) of the histograms for q = 1 (black plot), q = 5 (red plot), q = 10 (blue plot), and q = 40 (magenta plot). Right: The temporal development of the Shannon entropy (1) of the histograms for q = 1 (black plot), q = 0.5 (light blue plot), q = 0.25 (cyan plot), and q = 0.1 (mauve plot).
In Figure 6 (right) the temporal development of the Shannon entropy values of the histograms for q = 1 as reference (black plot), q = 0.5 (light blue plot), q = 0.25 (cyan plot), and q = 0.1 (mauve plot) is shown. With decreasing q the Shannon entropy plot changes considerably. In particular, for q = 0.1 the mixing process becomes a demixing process after 15 time units. However, setting q = 0.1 is not recommended, since the corresponding bin width is very small (a tenth of the smallest bubble size of the initial distribution). Hence, there is a further restriction for small q. The limits of q can be illustrated by plotting (log_2(q), I(p)), where p = (p_i) are the relative frequencies of the normalized histograms. The mixing process of the normalized histograms is preserved where the plot (log_2(q), I(p)) is linear, that is, where ∆I/∆log_2(q) = const., see Figure 7. The time dependence of the histograms is indicated by a smooth variation from cyan to magenta.
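The classification by left-open, right-closed intervals of width q min(a(t_0)) can be sketched as below. The function names `class_index` and `size_histogram` are hypothetical. As an assumption of this sketch, padded zero sizes are skipped, because zero lies outside every interval ((j − 1) w, j w]; sizes that lie exactly on a class boundary may also be affected by floating-point rounding.

```python
import math
from collections import Counter


def class_index(size, width):
    """Index j of the left-open, right-closed interval ((j-1)*width, j*width]."""
    return math.ceil(size / width)


def size_histogram(sizes, reference_min, q=1.0):
    """Frequencies per size class for class width q * reference_min."""
    width = q * reference_min                      # reference_min = min(a(t_0))
    counts = Counter(class_index(a, width) for a in sizes if a > 0)
    n_classes = max(counts)                        # largest occupied class index
    return [counts.get(j, 0) for j in range(1, n_classes + 1)]
```

With the made-up sizes 1.0, 1.5, 2.0, 3.9 and reference minimum 1.0, the call with q = 1 yields the frequencies [1, 2, 0, 1], q = 2 yields [3, 1], and q = 4 places all sizes in one class.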
Figure 7. Development of the Shannon entropy of the histograms depending on the binary logarithm of the scaling factor q, (log_2(q), I(p)). One sees that for a certain range of log_2(q) the value of ∆I/∆log_2(q) remains constant. The coloration of the plot gives the time dependence as before.
In particular, for q values between 8 and 16 (log_2(q) ∈ [3,4]) the slope is approximately the same for all distributions. These q values are the best selection for the size classification.
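The criterion of a constant slope ∆I/∆log_2(q) can be checked numerically with finite differences. The following is a sketch only: the function names, the tolerance, and the entropy values are made up for illustration and are not data from the experiment.

```python
import math


def entropy_slopes(qs, entropies):
    """Finite-difference slopes dI/dlog2(q) between consecutive scaling factors."""
    logs = [math.log2(q) for q in qs]
    return [(entropies[k + 1] - entropies[k]) / (logs[k + 1] - logs[k])
            for k in range(len(qs) - 1)]


def is_linear(slopes, tolerance=0.05):
    """True if all slopes agree with their mean within the (assumed) tolerance."""
    mean = sum(slopes) / len(slopes)
    return all(abs(s - mean) <= tolerance for s in slopes)


# Made-up entropy values of one histogram series, for illustration only.
qs = [4, 8, 16, 32]
entropies = [3.1, 2.2, 1.3, 0.2]
slopes = entropy_slopes(qs, entropies)
```

In this toy series the first two slopes coincide, so `is_linear(slopes[:2])` holds, whereas the last interval deviates and the full series fails the test.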
As expected the demixing process of Lorenz curves is mapped onto a mixing process of histograms for all size classifications with q ∈ [1,40].A process separation of drainage and ageing as in [11] cannot be found, but a region for best size classification is suggested and can be defined by a common slope, see Figure 7. Consequently, a statistical process separation only occurs for sufficiently large foam images.

Dimensions
From the plot (log_2(q), I(p)) in Figure 7 new characteristics of bubble size distributions can be derived. The Rényi entropy of order f [20],
H_f(p) = (1/(1 − f)) log ∑_{i=1}^{N} p_i^f,
gives the Shannon entropy in the limiting case f → 1, H_1(p) = I(p). N represents the number of occupied classes. For f = 0 and f = 2 one obtains H_0(p) = log N = log |p| (logarithm of the cardinality of p) and H_2(p) = − log ∑_{i=1}^{N} p_i^2 (correlation entropy). The plot of these entropies H_f depending on different scales gives the so-called Rényi dimensions [21]:
D_f = lim_{s→∞} H_f(p(s)) / log s,
where s is the scaling factor. Classically, for f = 0, 1, 2 one obtains the capacity (fractal or Hausdorff dimension [22]), the information, and the correlation dimension:
D_0 = lim_{s→∞} log N(s) / log s,
D_1 = lim_{s→∞} (−∑_{i=1}^{N(s)} p_i log p_i) / log s,
D_2 = lim_{s→∞} (− log ∑_{i=1}^{N(s)} p_i^2) / log s.
For D_0 the number of occupied size classes N depending on s is considered. D_1 is comparable to the plot (log_2(q), I(p)) in Figure 7. The sum ∑_{i=1}^{N(s)} p_i^2 gives the probability that two bubbles are in the same size class. The scaling factor s is defined by s = ⌈max(a(t))/min(a(t_0))⌉/q, where the brackets ⌈·⌉ indicate the ceiling function. The dimensions are thus generated over the individual data scale of each bubble size distribution. Additionally, the ceiling function is used in order to define a minimum integer value of q as a starting point, which is systematically reduced by the factor s. For example, the first bubble size distribution has the minimum integer value q = 39, for which all sizes fall into one size class (see Section 3.2). The second step, s = 2, leads to a q value of 39/2, which defines the size class, and so on. The minimum integer q of the second distribution a(t_0 + 1) is 35, for instance. The temporal development of these minimum integer q factors tends to increase; this is shown in Figure 8.
In the beginning, the capacity dimension equals one, which means that with each size reduction of the classes all resulting classes are occupied. For further reductions, this dimension deviates from one, which indicates an increase of multimodality. As mentioned above, the information dimension gives the gain of information with shrinking size classes. In Figure 7 the loss of information is considered with another scale (log_2(q)). The high value of the information dimension indicates that with each reduction of the size classes the relative frequencies of these classes are more mixed. This and the meaning of the value of the correlation dimension will be discussed in the following section.
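The quantities of this section can be sketched in code as follows. This is an illustrative sketch under stated assumptions: the Rényi entropy of order f is taken in its standard form (1/(1 − f)) log ∑ p_i^f with the Shannon entropy as the case f = 1, the base-2 logarithm is assumed, and the names `renyi_entropy`, `min_integer_q` and `class_width_factor` are hypothetical.

```python
import math


def renyi_entropy(probabilities, order, base=2.0):
    """Renyi entropy H_f of the occupied classes; order 1 gives the Shannon entropy."""
    ps = [p for p in probabilities if p > 0]       # only occupied classes count
    if order == 1:
        return -sum(p * math.log(p, base) for p in ps)
    return math.log(sum(p ** order for p in ps), base) / (1.0 - order)


def min_integer_q(sizes, reference_min):
    """Smallest integer q for which all sizes fall into one class."""
    return math.ceil(max(sizes) / reference_min)


def class_width_factor(sizes, reference_min, s):
    """Scaling factor q used at scale s, i.e. the minimum integer q divided by s."""
    return min_integer_q(sizes, reference_min) / s
```

For a distribution whose largest size is 38.2 times the reference minimum, the minimum integer q is 39, and the scale s = 2 corresponds to q = 39/2.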

Discussion
For this statistical investigation it is assumed that the development of the bubble sizes is a deterministic process. The initial bubble size distribution determines successive distributions. Hence, size classes are defined on the basis of the first distribution. Additionally, the chosen size classes are equal and closed over the data set, contrary to former investigations [7][8][9][10][11][12][13][14][15] where the last size class was open. In the literature several recipes for the computation of the number of size classes and their sizes can be found [23][24][25]. However, these approaches are either not directly applicable to dynamical systems or already included in the systematic data binning introduced here.
It is surprising that no specific size classification for the optimal evaluation of the foam experiment exists, but size limits of size classes can be defined. In Figure 7 one sees that there is a region where, with increasing log_2(q) (the size classes become larger), the loss of information is approximately constant for all distributions. Although salient features of the bubble size histograms vanish with increasing scaling factor q (compare Figure 5), the temporal development of the Shannon entropy values of the histograms is preserved for a certain region, Figure 6 (left). As reference for the temporal development of the Shannon entropy values, the Lorenz curves mapped onto the Gini coefficient are used. This concept uses the original data without a classification.
The Rényi dimensions of the bubble size distributions for f = 0, 1, 2 decrease, D_0 > D_1 > D_2, which indicates that the distributions are multifractal [26]. But which insights are gained by this characteristic? Considering the construction of a self-similar distribution by iterated bisecting [27], it can be assumed that the resulting structure can be partially found in the bubble size distributions. If for each bisecting the resulting proportions are p = 0.66 for one size class and 1 − p for the other one, see Figure 10, the dimensions will be D_0 = 1, D_1 = 0.925, D_2 = 0.859. These values are entirely comparable to the dimensions of the bubble size distributions. In particular, the information and the correlation dimension practically coincide within experimental accuracy (D_1 = 0.921 and D_2 = 0.863).
Of course, the bubble size distributions do not obey this construction exactly, but it can be supposed that the bubble size distributions are self-similar in certain regions. The construction of the distributions in Figure 10 and the value of the information dimension of the bubble size distributions indicate that the differences of the increasingly sorted bubble sizes a_{i+1} − a_i are quite small. Hence, each size class reduction leads to relative frequencies which are more mixed. For less mixed relative frequencies, p > 0.66, the information and correlation dimension become smaller. The dimensions become one for p = 0.5, which leads to equal distributions. The small differences of the bubble sizes also cause the relatively high value of the correlation dimension. The larger the value of the correlation dimension, the larger the probability that two bubbles are in the same size class. Since the foam system is small (the number of bubble individuals ranges in time from 1150 to 139), higher Rényi dimensions, which correspond to the probability of f-tuples (f ≥ 2) of bubbles with marginally differing sizes, are omitted. It would be interesting to investigate larger foam systems in order to generate a sufficiently large multifractal spectrum for the determination of the Lipschitz-Hölder mass exponent [26]. Moreover, large foam systems including process separation ought to be investigated in order to prove the general applicability of the above-mentioned concepts to bubble size distributions.
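The dimensions quoted for the bisecting model can be reproduced with a short calculation. The sketch below assumes that each class is split into two subclasses carrying the proportions p and 1 − p of its content, so that after k bisections there are 2^k classes and the scale is s = 2^k; the dimension of order f is then the base-2 Rényi entropy divided by k. The function names `cascade` and `dimension` and the choice of eight levels are illustrative assumptions.

```python
import math
from itertools import product


def cascade(p, levels):
    """Class probabilities after `levels` bisections with proportions p and 1 - p."""
    return [math.prod(branch) for branch in product((p, 1.0 - p), repeat=levels)]


def dimension(probabilities, order, levels):
    """Base-2 Renyi entropy of the given order divided by log2(s), with s = 2**levels."""
    if order == 1:
        h = -sum(q * math.log2(q) for q in probabilities)
    else:
        h = math.log2(sum(q ** order for q in probabilities)) / (1.0 - order)
    return h / levels


probs = cascade(0.66, levels=8)
d0, d1, d2 = (dimension(probs, f, 8) for f in (0, 1, 2))
print(round(d0, 3), round(d1, 3), round(d2, 3))   # prints: 1.0 0.925 0.859
```

Because the construction is exactly self-similar, the result does not depend on the number of levels; for p = 0.5 all three dimensions equal one.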

Conclusions
Neither Lorenz curves nor systematic data binning of bubble size distributions show a statistical process separation of drainage and ageing for the presented foam measurement. Consequently, a sufficiently large foam image size is required to divide the bubble size development into drainage and ageing. The systematic data binning leads to an approach for an optimal bin width selection. Moreover, Rényi dimensions can be calculated and indicate multifractality of bubble size distributions. By means of a simple bisecting model it can be assumed that bubble size distributions are partially self-similar. Both the Rényi dimensions and the self-similarity indicate that the distribution structure is preserved during the whole bubble size development. Only the size range of the distributions changes in time. This range is described by the q factor for which the scaling factor s is one.

Figure 1. A Lorenz curve (blue) with the line of equality (black). The Gini coefficient is calculated by the ratio of the areas A/(A + B).

Figure 2. Left: Bubble image. Right: By smoothly varying from cyan to magenta the time development of the sum of the bubble diameters is shown.

Figure 3. Left: The plot of the partial sum vectors of the increasingly sorted size vectors a. The plot is given by (k, A_k), where k corresponds to the bubble individuals (k = 0, 1, ..., n), A_k = ∑_{i=1}^{k} a_i is the partial sum of the bubble diameters, and A_0 is set to zero. The temporal development is illustrated by the coloration of the curves from cyan (t_0) to magenta (t_end). Middle: The Lorenz curves of the size vectors a. The plot is given by the bubble individuals k = 0, 1, ..., n and the normalized partial sum vectors A_k/A_n of the increasingly sorted size vectors a, (k, A_k/A_n), with A_0 = 0. The coloring gives the temporal development (cyan = t_0 and magenta = t_end). The line of equality is shown in blue. Right: The temporal development of the Gini coefficient G of the Lorenz curves (plot in the middle). The Gini coefficient increases monotonically in time.

Figure 4. The increasingly sorted bubble sizes of the initial bubble size vector a(t_0). On the left the classification is formed by the multiples of q min(a(t_0)) with q = 1 and j = 1, ..., 39; in the middle the classification is set by q = 5, j = 1, ..., 8, and on the right with q = 10, j = 1, ..., 4.

Figure 8. The temporal development of the minimum integer q factor of the bubble size distributions.