Measuring the Complexity of Continuous Distributions

We extend previously proposed measures of complexity, emergence, and self-organization to continuous distributions using differential entropy. This allows us to calculate the complexity of phenomena for which distributions are known. We find that a broad range of common parameters found in Gaussian and scale-free distributions present high complexity values. We also explore the relationship between our measure of complexity and information adaptation.


Introduction
We all agree that complexity is everywhere. Yet, there is no agreed definition of complexity. Perhaps complexity is so general that it resists definition [1]. Still, it is useful to have formal measures of complexity to study and compare different phenomena [2]. We have proposed measures of emergence, self-organization, and complexity [3,4] based on information theory [5]. Shannon information can be seen as a measure of novelty, so we use it as a measure of emergence, which is correlated with chaotic dynamics. Self-organization can be seen as a measure of order [6], which can be estimated with the inverse of Shannon's information and is correlated with regularity. Complexity can be seen as a balance between order and chaos [7,8], between emergence and self-organization [4,9].
We have studied the complexity of different phenomena for different purposes [10][11][12][13][14]. Instead of searching for more data and measuring its complexity, we decided to explore different distributions with our measures. This allows us to study broad classes of dynamical systems in a general way, obtaining a deeper understanding of the nature of complexity, emergence, and self-organization. However, our previously proposed measures use discrete Shannon information. Although any distribution can be discretized, this always comes with caveats [15]. For this reason, we base ourselves on differential entropy [15,16] to propose measures for continuous distributions.
The next section provides background concepts related to information and entropies. Next, discrete measures of emergence, self-organization, and complexity are reviewed [4]. Section 4 presents continuous versions of these measures, based on differential entropy. The probability density functions used in the experiments are described in Section 5. Section 6 presents results, which are discussed and related to information adaptation [17] in Section 7.

Information Theory
Consider a set of possible events whose probabilities of occurrence are $p_1, p_2, \ldots, p_n \in P(X)$. Can we measure the uncertainty described by the probability distribution P(X)? To answer this question in the context of telecommunications, Shannon proposed a measure of entropy [5], which corresponds to Boltzmann-Gibbs entropy in thermodynamics. This measure, as originally proposed by Shannon, possesses a dual meaning of both uncertainty and information, even though the latter term was later discouraged by Shannon himself [18]. Moreover, we favor the interpretation of entropy as the average uncertainty, given the asymptotic equipartition property (described later in this section). From an information-theoretic perspective, entropy measures the average number of binary questions required to determine the outcome of X. In cybernetics, it is related to variety [19], a measure of the number of distinct states a system can be in.
In general, entropy is discussed regarding a discrete probability distribution. Shannon extended this concept to the continuous domain with differential entropy. However, some of the properties of its discrete counterpart are not maintained. This has relevant implications for extending to the continuous domain the measures proposed in [3,4]. Before delving into these differences, first we introduce the discrete entropy, the asymptotic equipartition property (AEP), and the properties of discrete entropy. Next, differential entropy is described, along with its relation to discrete entropy.

Discrete Entropy
Let X be a discrete random variable with probability mass function $p(x) = \Pr\{X = x\}$, $x \in \mathcal{X}$. The entropy H(X) of a discrete random variable X is then defined by
$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x). \tag{1}$$
The logarithm base determines the unit of entropy. For instance, base two measures entropy in bits and base e in nats. If the base of the logarithm is β, we denote the entropy as $H_\beta(X)$. Unless otherwise stated, we take all logarithms to be base two. Note that entropy does not depend on the values X takes, but on the probabilities of those values. Furthermore, Eq. 1 can be understood as the expected value of the information of the distribution.
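As a minimal sketch of Eq. 1, the function below computes the entropy of a probability mass function given as a list of probabilities. The name `shannon_entropy` and its `base` argument are illustrative choices, not notation from the original measures.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Entropy of a discrete distribution given as a sequence of probabilities."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 are skipped, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin_bits = shannon_entropy([0.5, 0.5])            # base two: bits
fair_die_nats = shannon_entropy([1 / 6] * 6, math.e)    # base e: nats
```

Changing `base` only rescales the result, which is why the unit (bits or nats) has to be stated alongside any reported entropy.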

Asymptotic Equipartition Property for Discrete Random Variables
In probability, the law of large numbers states that, for a sequence of n i.i.d. (independent and identically distributed) elements of a sample X, the sample average $\frac{1}{n}\sum_{i=1}^{n} X_i$ approximates the expected value E(X). In this sense, the Asymptotic Equipartition Property (AEP) establishes that H(X) can be approximated by
$$H(X) \approx -\frac{1}{n} \log p(x_1, \ldots, x_n) = -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i)$$
as n → ∞, where the $x_i \in X$ are i.i.d. Therefore, discrete entropy can also be written as
$$H(X) = E\left[\log \frac{1}{p(X)}\right], \tag{2}$$
where E is the expected value with respect to P(X). Consequently, Eq. 2 describes the expected or average uncertainty of the probability distribution P(X).
A final note about entropy is that, in general, any process that makes the probability distribution more uniform increases its entropy [15].

Properties of Discrete Entropy
The following are properties of the discrete entropy function. Proofs and details can be found in textbooks [15].
4. H (X) ≤ log |X| , with equality iff X is distributed uniformly over X.

Differential Entropy
Entropy was first formulated for discrete random variables and was then generalized to continuous random variables, in which case it is called differential entropy [20]. It has been related to the shortest description length and is thus similar to the entropy of a discrete random variable [21]. The differential entropy H(X) of a continuous random variable X with density f(x) is defined as
$$H(X) = -\int_{S} f(x) \log f(x)\, dx, \tag{3}$$
where S is the support set of the random variable. It is well known that this integral exists iff the density function of the random variable is Riemann-integrable [15,16]. The Riemann integral is fundamental in modern calculus. Loosely speaking, it approximates the area under a continuous curve by summing over ever smaller sub-intervals, and it implies a well-defined concept of limit [21]. H(f) can also be used to denote differential entropy, and in the rest of the article we employ this notation.

Asymptotic Equipartition Property of Continuous Random Variables
Given a set of i.i.d. random variables $X_1, \ldots, X_n$ drawn from a continuous distribution with probability density f(x), its differential entropy H(f) is given by
$$-\frac{1}{n} \log f(X_1, \ldots, X_n) \to E\left[-\log f(X)\right] = H(f)$$
as n → ∞. The convergence to the expectation is a direct application of the weak law of large numbers.
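The convergence described above can be illustrated numerically. The sketch below assumes a Gaussian density and estimates H(f) as the sample mean of −log2 f(X_i); the helper names, the sample size, and the seed are arbitrary choices made for this illustration.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def aep_entropy_estimate(n, mu=0.0, sigma=1.0, seed=0):
    """Estimate H(f) in bits as the average of -log2 f(X_i) over n i.i.d. draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        total += -math.log2(gaussian_pdf(x, mu, sigma))
    return total / n

# Closed-form differential entropy of N(0, 1) in bits, used as the reference.
exact = 0.5 * math.log2(2.0 * math.pi * math.e)
estimate = aep_entropy_estimate(200_000)
```

With a few hundred thousand draws the estimate typically agrees with the closed form to about two decimal places; the gap shrinks as n grows, in line with the weak law of large numbers.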

Properties of Differential Entropy
1. H (f ) depends on the coordinates.
For different choices of coordinate systems for a given probability distribution P (X), the corresponding differential entropies might be distinct.
The H(f) of a Dirac delta probability distribution is considered the lower bound of H(f), which corresponds to H(f) = −∞.
5. Information measures such as relative entropy and mutual information are consistent, either in the discrete or continuous domain [22].

Differences between Discrete and Continuous Entropies
The derivation of Eq. 3 assumes that the probability density is Riemann-integrable. If this is the case, then differential entropy can be defined just like discrete entropy. However, the notion of "average uncertainty" carried by Eq. 1 cannot be extended to its differential equivalent. Differential entropy is rather a function of the parameters of a distribution that describes how uncertainty changes as the parameters are modified [15].
To understand the differences between Eqs. 1 and 3 we will quantize a probability density function, and then calculate its discrete entropy [15,16].
First, consider a continuous random variable X with probability density function f(x). This function is quantized by dividing its range into h bins of length ∆. Then, by the Mean Value Theorem, within each bin $h_i$ spanning $[i\Delta, (i+1)\Delta]$ there exists a value $x_i^*$ that satisfies
$$f(x_i^*)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\, dx.$$
A quantized random variable $X^{\Delta}$ is then defined as
$$X^{\Delta} = x_i^* \quad \text{if } i\Delta \le X < (i+1)\Delta,$$
and its probability is
$$p_i = \int_{i\Delta}^{(i+1)\Delta} f(x)\, dx = f(x_i^*)\,\Delta.$$
Consequently, the discrete entropy of the quantized variable $X^{\Delta}$ is
$$H(X^{\Delta}) = -\sum_i p_i \log_2 p_i = -\log_2(\Delta) \sum_i f(x_i^*)\,\Delta - \sum_i \Delta f(x_i^*) \log_2 f(x_i^*). \tag{8}$$
To understand the final form of Eq. 8, notice that the first term of Eq. 8 reduces to $-\log_2(\Delta)$. This is a consequence of
$$\sum_i f(x_i^*)\,\Delta = \int f(x)\, dx = 1.$$
Furthermore, as ∆ → 0, the second term of Eq. 8 approximates the differential entropy of X, such that
$$\lim_{\Delta \to 0} -\sum_i \Delta f(x_i^*) \log_2 f(x_i^*) = -\int f(x) \log_2 f(x)\, dx = H(f).$$
Note that $\log_2(\Delta)$ diverges towards minus infinity,
$$\lim_{\Delta \to 0} \log_2(\Delta) = -\infty.$$
Therefore, the difference between H(f) and $H(X^{\Delta})$ is $H(f) - H(X^{\Delta}) = \log_2(\Delta)$, which approaches −∞ as the bin size becomes infinitesimal. Consistent with this is the fact that the differential entropy of a discrete value is −∞ [16].
Lastly, in accordance with [15], the average number of bits required to describe a continuous variable X with n-bit accuracy (quantization) is approximately H(f) + n. Setting $\Delta = 2^{-n}$ gives $-\log_2(\Delta) = n$, such that
$$H(X^{\Delta}) \approx H(f) + n.$$
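The relation between the quantized and differential entropies can be checked numerically. The following sketch assumes a zero-mean Gaussian whose bin probabilities are obtained from its cumulative distribution function; `gaussian_cdf`, `quantized_entropy`, and the truncation to ±10σ are assumptions made for this illustration rather than the procedure used for the reported experiments.

```python
import math

def gaussian_cdf(x, sigma):
    """Cumulative distribution function of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def quantized_entropy(sigma, delta, half_width):
    """Discrete entropy (bits) of N(0, sigma^2) binned into cells of width delta."""
    n_bins = int(round(2 * half_width / delta))
    h = 0.0
    for i in range(n_bins):
        lo = -half_width + i * delta
        # Probability mass of the bin [lo, lo + delta).
        p = gaussian_cdf(lo + delta, sigma) - gaussian_cdf(lo, sigma)
        if p > 0:
            h -= p * math.log2(p)
    return h

sigma = 1.0
h_f = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)  # analytical H(f)
for delta in (0.5, 0.1, 0.01):
    h_q = quantized_entropy(sigma, delta, half_width=10.0)
    # h_q grows as delta shrinks, while h_q + log2(delta) approaches h_f.
    print(delta, h_q, h_q + math.log2(delta), h_f)
```

The printed columns show the quantized entropy increasing without bound as ∆ decreases, while the corrected value H(X^∆) + log2(∆) settles on H(f).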

Discrete Complexity Measures
Emergence E, self-organization S, and complexity C are close relatives of Shannon's entropy. These information-based measures inherit most of the properties of Shannon's discrete entropy [4], the most valuable being that discrete entropy quantifies the average uncertainty of a probability distribution. In this sense, complexity C and its related measures (E and S) are based on quantifying the average information contained in a process described by its probability distribution.

Emergence
Another form of entropy, related to the concept of information as uncertainty, is called emergence E [4]. Intuitively, E measures the ratio of uncertainty a process produces through new information that results from changes in (a) dynamics or (b) scale [4]. However, its formulation is more closely related to thermodynamic entropy. It is defined as
$$E = -K \sum_{i} p_i \log_2 p_i, \tag{10}$$
where $p_i = P(X = x_i)$ is the probability of element i, and K is a normalizing constant.

Multiple Scales
In thermodynamics, the Boltzmann constant K is employed to normalize the entropy in accordance with the probability of each state. However, the typical formulation of Shannon's entropy [15][16][17] omits K from Eq. 10 (its only constraint being that K > 0 [4]). Nonetheless, for emergence as a measure of the average production of information for a given distribution, K plays a fundamental role. In the cybernetic definition of variety [18], K is a function of the number of distinct states a system can be in, i.e. the system's alphabet size. Formally, it is defined as
$$K = \frac{1}{\log_2(b)}, \tag{11}$$
where b corresponds to the size of the alphabet of the sample, or the number of bins of a discrete probability distribution. Furthermore, K should guarantee that 0 ≤ E ≤ 1; therefore, b should be at least equal to the number of bins of the discrete probability distribution. It is also worth noting that the denominator of Eq. 11, $\log_2(b)$, is equivalent to the maximum entropy for the same alphabet size, which is attained by the uniform distribution. Consequently, emergence can be understood as the ratio between the entropy of a given distribution P(X) and the maximum entropy for the same alphabet size, H(U) [23], that is,
$$E = \frac{H(X)}{H(U)}. \tag{12}$$

Self-Organization
Entropy can also provide a measure of a system's organization and its predictability [23]. In this sense, more uncertainty means less predictability, and vice versa. Thus, an entirely random process (e.g. a uniform distribution) has the lowest organization, and a completely deterministic one (a Dirac delta distribution) has the highest. Furthermore, an extremely organized system yields no information with respect to novelty, whereas the more chaotic a system is, the more information it yields [4,23].
The metric of self-organization S was proposed to measure the organization a system has regarding its average uncertainty [4,24]. S is also related to the cybernetic concept of constraint, which measures changes in entropy due to restrictions on the state space of a system [8]. These constraints confine the system's behavior, increasing its predictability and reducing the (novel) information it provides to an observer. Consequently, the more self-organized a system is, the less average uncertainty it has. Formally, S is defined as
$$S = 1 - E = 1 - \frac{H(X)}{H(U)}, \tag{13}$$
such that 0 ≤ S ≤ 1. It is worth noting that S is the complement of E. Moreover, the maximal S (i.e. S = 1) is only achievable when the entropy of a given probability density function (PDF) is such that H(P(X)) → 0, which corresponds to the entropy of a Dirac delta (only in the discrete case).

Complexity
Complexity C can be described as a balance between order (stability) and chaos (scale or dynamical changes) [4]. More precisely, this function describes a system's behavior in terms of the average uncertainty produced by its probability distribution in relation to the dynamics of the system. Thus, the complexity measure is defined as
$$C = 4 \cdot E \cdot S, \tag{14}$$
such that 0 ≤ C ≤ 1. The constant 4 normalizes the product, since E · S = E(1 − E) is at most 1/4, attained at E = S = 0.5.
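The three discrete measures can be sketched in a few lines. The code below assumes E = H(X)/log2(b), S = 1 − E, and C = 4·E·S, with b defaulting to the number of bins; the function names and the default for `b` are illustrative assumptions.

```python
import math

def emergence(probs, b=None):
    """E = K * H(X) with K = 1 / log2(b); b defaults to the number of bins."""
    b = len(probs) if b is None else b
    if b < 2 or b < len(probs):
        raise ValueError("alphabet size b must be >= 2 and >= number of bins")
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return h / math.log2(b)

def self_organization(probs, b=None):
    """S is the complement of E."""
    return 1.0 - emergence(probs, b)

def complexity(probs, b=None):
    """C = 4 * E * S, which peaks at 1 when E = S = 0.5."""
    e = emergence(probs, b)
    return 4.0 * e * (1.0 - e)

uniform = [0.25] * 4            # maximal emergence, no self-organization
delta = [1.0, 0.0, 0.0, 0.0]    # no emergence, maximal self-organization
biased = [0.11, 0.89]           # entropy close to 0.5 bits, so E is close to S
```

Both extreme cases give C = 0, while the biased two-state distribution, whose entropy is close to half a bit, gives a complexity close to 1.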

Continuous Complexity Measures
As mentioned before, discrete and differential entropies do not share the same properties. In fact, the interpretation of discrete entropy as the average uncertainty in terms of probability cannot be extended to its continuous counterpart. As a consequence, the proposed continuous information-based measures describe how the production of information changes with respect to the parameters of the probability distribution. In particular, this characteristic could be employed as a feature selection method, where the most relevant variables are those with high emergence (the most informative).
The proposed measures are differential emergence ($E_D$), differential self-organization ($S_D$), and differential complexity ($C_D$). However, given that the interpretation and formulation (in terms of emergence) of discrete and continuous S (Eq. 13) and C (Eq. 14) are the same, we only provide details on $E_D$. The difference between $S_D$, $C_D$ and S, C is that the former are defined on $E_D$, while the latter are defined on E. Furthermore, we emphasize the definition of the normalizing constant K, which plays a significant role in constraining $E_D \in [0, 1]$ and, consequently, $S_D$ and $C_D$ as well.

Differential Emergence
As in its discrete form, the emergence for continuous random variables is defined as
$$E_D = -K \int_{\upsilon}^{\zeta} f(x) \log_2 f(x)\, dx, \tag{15}$$
where [υ, ζ] is the domain and K is a normalizing constant related to the distribution's alphabet size. This formulation is closely related to the view of emergence as the ratio of the information production of a probability distribution with respect to the maximum differential entropy for the same range. However, since $E_D$ can be negative (e.g. for the entropy of a single discrete value), we choose $E'_D$ such that
$$E'_D = \begin{cases} E_D & \text{if } E_D > 0, \\ 0 & \text{otherwise.} \end{cases}$$
$E'_D$ is a more convenient function than $E_D$, as $0 \le E'_D \le 1$. This choice is justified by the fact that the differential entropy of a discrete value is −∞ [15]. In practice, differential entropy becomes negative only when the probability distribution is extremely narrow, i.e. there is a high probability for few states. In the context of information changes due to parameter manipulation, $E_D < 0$ means that the probability distribution is becoming a Dirac delta distribution. For notational convenience, from now on we employ $E_D$ and $E'_D$ interchangeably.

Multiple Scales
The constant K expresses the relation between the uncertainty of a given P(X), defined by H(X), and the entropy of a maximum-entropy distribution over the same domain [23]. In this setup, as the uncertainty grows, $E_D$ becomes closer to unity.
To constrain the value of H(X) to [0, 1] in the discrete emergence case, it was enough to establish the distribution's alphabet size, b of Eq. 10, such that b ≥ # bins [4]. However, for any PDF, the number of elements between a pair of points a and b, such that a ≠ b, is infinite. Moreover, as the size of each bin becomes infinitesimal, ∆ → 0, the entropy for each bin becomes −∞ [15]. It has also been stated that the value of b should be equal to the cardinality of X [23]; however, this applies only to discrete emergence. Therefore, rather than a generalization, we propose a heuristic for the selection of a proper K in the case of differential emergence. Moreover, we differentiate between b for H(f) and b′ for $H(X^{\Delta})$.
As in the discrete case, K is defined as in Eq. 11. To determine the proper alphabet size b, we propose the following algorithm:
1. If we know a priori the true P(X), we calculate H(f), and b = |P(X)| is the cardinality within the interval of Eq. 15. In this sense, a large value will denote the cardinality of a "ghost" sample [16].¹
2. If we do not know the true P(X), or we are rather interested in $H(X^{\Delta})$, where a sample of finite size is involved, we calculate b′ as
$$b' = \sum_{i} \mathrm{ind}(x_i),$$
where the non-negative function ind(·) is defined as
$$\mathrm{ind}(x_i) = \begin{cases} 1 & \text{if } P(x_i) > 0, \\ 0 & \text{otherwise.} \end{cases}$$
For instance, in the quantized version of the standard normal distribution (N(0, 1)), only values within ±3σ satisfy this constraint, regardless of the domain of Eq. 15. In particular, if we employ b = |X| rather than b′, we compress the $E_D$ value, as will be shown in the next section. On the other hand, for a uniform distribution or a power-law (such that 0 < x_min < x), the whole range of points satisfies this constraint.
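The second step of the heuristic can be sketched for a finite sample as follows. The code assumes that bin probabilities are estimated from a histogram with ∆ = 1 and that b′ counts the occupied bins; `quantized_emergence`, the sample size, and the seed are hypothetical choices for this illustration.

```python
import math
import random

def ind(p):
    """Indicator used to count occupied bins: 1 if p > 0, else 0."""
    return 1 if p > 0 else 0

def quantized_emergence(sample, lo, hi, delta):
    """Return (b', E using b', E using all bins) for a histogram of the sample."""
    n_bins = int(math.ceil((hi - lo) / delta))
    counts = [0] * n_bins
    kept = 0
    for x in sample:
        if lo <= x < hi:
            counts[min(int((x - lo) / delta), n_bins - 1)] += 1
            kept += 1
    probs = [c / kept for c in counts]
    b_prime = sum(ind(p) for p in probs)       # bins with positive probability
    if b_prime < 2:
        return b_prime, 0.0, 0.0               # a single occupied bin carries no uncertainty
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return b_prime, h / math.log2(b_prime), h / math.log2(n_bins)

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
b_prime, e_occupied, e_all_bins = quantized_emergence(sample, -50.0, 50.0, 1.0)
```

For a standard normal sample over [−50, 50], only a handful of the 100 bins are occupied, so normalizing by all bins yields a visibly smaller emergence than normalizing by b′, which is the compression effect noted above.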

Probability Density Functions
In communication and information theory, uniform (U) and normal, a.k.a. Gaussian (G), distributions play a significant role. Both are reference points for maximum entropy: on the one hand, U has the maximum entropy within a continuous domain; on the other hand, G has the maximum entropy among distributions with a fixed mean (µ) and a fixed standard deviation (σ) [15,16]. Moreover, as mentioned earlier, H(f) is useful when comparing the entropies of two distributions over some reference space [15,16,25]. Consequently, U, but mainly G, are heavily used in the context of telecommunications for signal processing [16]. Nevertheless, many natural and man-made phenomena can be approximated with power-law (PL) distributions. These types of distributions typically present complex patterns that are difficult to predict, making them a relevant research topic [26]. Furthermore, power-laws have been related to the presence of multifractal structures in certain types of processes [25]. Moreover, power-laws are closely related to self-organization and criticality theory, and have been studied under information frameworks before (e.g. Tsallis' and Renyi's maximum entropy principles) [26,27].
¹ It is a ghost in the concrete sense that it does not exist. Its only purpose is to provide a bound for the maximum entropy according to some large alphabet size.
Therefore, in this work we focus our attention on these three PDFs. First, we provide a short description of each PDF; then, we summarize its formulation and the corresponding H(f) in Table 1.

Uniform Distribution.
The simplest PDF, as its name states, establishes that for each possible value of X, the probability is constant over the whole support set (defined by the range between a and b), and 0 elsewhere. This PDF has no parameters besides the starting and ending points of the support set. Furthermore, this distribution appears frequently in signal processing as white noise, and it has the maximum entropy for continuous random variables [16].
Its PDF and its corresponding H(f) are shown in the first row of Table 1. It is worth noting that, as the cardinality of the domain of U grows, its differential entropy increases as well.

Normal Distribution.
The normal or Gaussian distribution is one of the most important probability distribution families [28]. It is fundamental in the central limit theorem [16], time series forecasting models such as classical autoregressive models [29], modelling economic instruments [30], encryption, modelling electronic noise [16], error analysis, and statistical hypothesis testing. Its PDF is a symmetric, bell-shaped function whose parameters are location (i.e. mean µ) and dispersion (i.e. variance σ², the square of the standard deviation σ). The standard normal distribution is the simplest and most used case of this family; its parameters are N(µ = 0, σ² = 1). A continuous random variable x ∈ X is said to follow a Gaussian distribution, X ∼ N(µ, σ²), if its PDF p(x) is given by the one described in the second row of Table 1. As shown in the table, the differential entropy of G depends only on the standard deviation. Furthermore, it is well known that its differential entropy is monotonically increasing and concave in σ [28]. This is consistent with the aforementioned fact that H(f) is translation-invariant. Thus, as σ grows, so does the value of H(G), while as σ → 0 the distribution approaches a Dirac delta and H(f) → −∞.

Power-Law Distribution.
Power-law distributions are commonly employed to describe multiple phenomena (e.g. turbulence, DNA sequences, city populations, linguistics, cosmic rays, moon craters, biological networks, data storage in organisms, chaotic open systems, and so on) across numerous scientific disciplines [25][26][27][31][32][33][34]. These types of processes are known for being scale invariant, with typical scale exponents (α, see below) in nature between one and 3.5 [27]. Also, the closeness of this type of PDF to chaotic systems and fractals is such that some fractal dimensions are called entropy dimensions (e.g. box-counting dimension and Renyi entropy) [33].
Power-law distributions can be described by continuous and discrete distributions. Furthermore, compared with the normal distribution, power-laws generate events of large orders of magnitude more often and are not well represented by a simple mean. A power-law density is defined as
$$p(x) = C x^{-\alpha},$$
where C is a normalization factor, α is the scale exponent, and X | x > x_min > 0 is the observed continuous random variable. This PDF diverges as x → 0 and so does not hold for all x ≥ 0 [34]. Thus, x_min corresponds to the lower bound of a power-law. Consequently, in Table 1 we provide the PDF of a power-law as proposed by [32], and its corresponding H(f) as proposed by [35]. The aforementioned PDFs and their corresponding H(f) are shown in Table 1. Further details about the derivation of H(f) for U and G can be found in [15,16]. For additional details on the differential entropy of the power-law, we refer the reader to [25,35].
Table 1. Studied PDFs (left column) with their corresponding analytical differential entropies (right column).

Distribution PDF Differential Entropy
Uniform
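For reference, the closed-form differential entropies of U and G are standard results [15,16] and can be sketched as below, in bits. The function names are hypothetical, and the power-law expression of [35] is deliberately not reproduced here.

```python
import math

def h_uniform(a, b):
    """Differential entropy (bits) of U(a, b): log2(b - a)."""
    if b <= a:
        raise ValueError("support must satisfy a < b")
    return math.log2(b - a)

def h_gaussian(sigma):
    """Differential entropy (bits) of N(mu, sigma^2): 0.5 * log2(2*pi*e*sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)

narrow_uniform = h_uniform(0.0, 0.5)                     # support shorter than 1
sigma_zero = 1.0 / math.sqrt(2.0 * math.pi * math.e)     # sigma at which H(G) = 0
```

Both expressions become negative for sufficiently narrow distributions (a support shorter than 1 for U, σ below roughly 0.24 for G), which is the behavior that motivates bounding the differential emergence at zero.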

Results
In this section, we compare theoretical and quantized differential entropies for the PDFs considered. Next, we provide differential complexity results ($E_D$, $S_D$, and $C_D$) for these PDFs. In the case of power-laws, we also provide and discuss the corresponding complexity results for real-world phenomena already described in [36]. It is also worth noting that, since the quantized H(f) of the power-law yielded poor results, the analytical form of the power-law's H(f) was used.

Theoretical vs Quantized Differential Entropies
Numerical results of theoretical and quantized differential entropies are shown in Figs. 1 and 2.
Analytical H(f) results are displayed in blue, whereas the quantized $H(X^{\Delta})$ results are shown in red. For each PDF, a sample of one million (i.e. $1 \times 10^{6} \equiv 1\text{M}$) points was employed for calculations. The bin size ∆ required by $H(X^{\Delta})$ is obtained as the ratio $\Delta = \frac{\text{Range}}{|\text{Sample}|}$. However, the value of ∆ has considerable influence on the resulting quantized differential entropy. The results for U were as expected. We tested several values of the cardinality of P(X), such that $b = 2^{i}$, i = 1, …, 15. Using the analytical H(f) formula of Table 1, the quantized $H(X^{\Delta})$, and ∆ = 1, we obtained exactly the same differential entropy values. Results for U are shown on the left side of Fig. 1. As mentioned earlier, as the cardinality of the distribution grows, so does the differential entropy of U.

Normal Distribution.
Results for the Gaussian distribution were less trivial. As in the U case, we calculated both H(f) and $H(X^{\Delta})$ for a fixed µ = 0, and varied the standard deviation such that $\sigma = 2^{i}$, i = 0, 1, …, 14. Notice that the first tested distribution is the standard normal distribution.
In Fig. 2, results obtained for the n-bit quantized differential entropy and for the analytical form of Table 1 are shown. We display two cases of the normal distribution: the left side of Fig. 2 shows results for P(X) with range [−50, 50] and a bin size $\Delta = \frac{100}{1\text{M}} = 1 \times 10^{-4}$, whereas the right side shows results for P(X) with range [−500e3, 500e3] and ∆ = 1. In the former case, the quantized differential entropy shows a discrepancy with H(f) after only $\sigma = 2^{4} = 16$, which quickly increases with growing σ. In the latter case, there is an almost perfect match between the analytical and quantized differential entropies; however, the same mismatch will be observed if the standard deviation is allowed to grow unboundedly (σ → ∞). This is a consequence of how $H(X^{\Delta})$ is computed. As mentioned earlier, as ∆ → 0 the value of each quantized $X^{\Delta}$ tends towards −∞. Therefore, in the G case, it seems convenient to employ a Probability Mass Function (PMF) rather than a PDF. Consequently, the experimental setup of the right-side image of Fig. 2 is employed for the calculation of the continuous complexity measures of G.
Figure 2. Two comparisons of theoretical vs quantized differential entropy for the Gaussian distribution.

Power-Law Distribution.
Results for the power-law distribution are shown on the right side of Fig. 1. For both U and G, a PMF instead of a PDF was used to avoid cumbersome results (as depicted in the corresponding images). However, for the power-law distribution, the use of a PDF is more convenient. As shown in Fig. 4 and highlighted by [32], x_min has a considerable impact on the value of H(f). For Fig. 1, the range employed was [1, 50], with a bin size of $\Delta = 1 \times 10^{-5}$ and x_min = 0.99, and we varied the scale exponent such that α = i, i = 1, …, 15. For this particular setup, we can observe that as α increases, H(f) and $H(X^{\Delta})$ decrease towards −∞. This effect is a consequence of increasing the scale of the power-law such that the slope of the function in log-log space approaches zero. In this sense, with larger α, P(X) becomes closer to a Dirac delta distribution, and thus H(f) → −∞. However, as will be discussed later, for larger α, larger x_min values are required for H(f) to take positive values.
H(U) = 1, which is exactly the same as its discrete counterpart. Thus, U results are not considered in the following analysis.
Continuous complexity results for G and PL are shown in Figs. 3 and 4, respectively. In the following we provide details of these measures.

Normal Distribution.
It was stated in Section 4 that the size of the alphabet is given by the function ind(P(X)). This rule establishes a valid cardinality such that P(X) > 0; thus, only those states with a positive probability are considered. For $P(X^{\Delta})$, such an operation can be performed. Nevertheless, when the analytical H(f) is used, the proper cardinality of the set is unavailable. Therefore, in the Gaussian distribution case, we tested two criteria for selecting the value of b:
1. $\sum_{x_i} \mathrm{ind}(\cdot)$ is employed for $H(X^{\Delta})$.
2. A constant with a large value ($C = 1 \times 10^{6}$) is used for the analytical formula of H(f).
In Fig. 3, solid dots are used when K is equal to the cardinality of P(X) > 0, whereas solid squares are used for an arbitrarily large constant. Moreover, for the quantized case of P(G), Table 2 shows the cardinality for each sigma, $b'_i$, and its corresponding $K_i$. As can be observed, for a large normalizing constant K, a logarithmic relation is displayed for $E_D$ and $S_D$. Also, the maximum $C_D$ is achieved for $\sigma = 2^{8} = 256$, which is where $E_D = S_D$. However, for $H(X^{\Delta})$ the maximum $C_D$ is found around $\sigma = 2^{1,2,3} = 2, 4, 8$, such that C D ≤ | → 0. A word of advice is in order here. The cardinality required to normalize the continuous complexity measures such that $0 \le E_D, S_D, C_D \le 1$ must have a lower bound. This bound should be related to the scale of P(X) [37] and the quantization size ∆. In our case, when a large cardinality $|U| = 1 \times 10^{6}$ and ∆ = 1 are used, the normalizing constant flattens the $E_D$ results with respect to those obtained with b′; moreover, the large constant increases $S_D$, and greater standard deviations are needed to achieve the maximum $C_D$. However, these complexity results are rather artificial in the sense that, if we arbitrarily let |U| → ∞, then we trivially obtain $E_D = 0$, $S_D = 1$, and $C_D = 0$. Moreover, it has been stated that the cardinality of P(X) should be employed as a proper size of b [23]. Therefore, when $H(X^{\Delta})$ is employed, the cardinality of P(X) > 0 must be used. On the contrary, when H(f) is employed, a coarse search over increasing alphabet sizes could be used so that the maximal H(f) satisfies $\frac{H(f)}{H(U)} \le 1$.
Figure 3. Complexity of the Gaussian distribution.
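The large-constant criterion can be reproduced with the analytical entropy of G. The sketch below assumes K = 1/log2(b) with b = 1 × 10^6, emergence bounded below at zero, S_D = 1 − E_D, and C_D = 4·E_D·S_D; `differential_measures` is a hypothetical helper, and the sweep mirrors the tested values σ = 2^i, i = 0, …, 14.

```python
import math

def differential_measures(h_bits, b):
    """Return (E_D, S_D, C_D) from a differential entropy in bits and alphabet size b."""
    k = 1.0 / math.log2(b)
    e = max(k * h_bits, 0.0)   # negative differential entropy maps to zero emergence
    s = 1.0 - e
    return e, s, 4.0 * e * s

B = 1_000_000  # large constant standing in for the unavailable cardinality
results = []
for i in range(15):
    sigma = 2.0 ** i
    h = 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)  # analytical H(G)
    results.append((sigma,) + differential_measures(h, B))

# Each entry is (sigma, E_D, S_D, C_D); pick the sigma with the largest C_D.
best_sigma, best_e, best_s, best_c = max(results, key=lambda r: r[3])
```

Under these assumptions the sweep places the maximum of C_D at σ = 256, where E_D and S_D are nearly equal, and E_D grows linearly in log2(σ), which corresponds to the logarithmic relation in σ described above.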

Power-Law Distribution
In this case, H(f) rather than $H(X^{\Delta})$ is used for computational convenience. Although the cardinality of P(X) > 0 is not available, by simply evaluating $p(x_i) > 0$ for $x = \{1, \ldots, 1 \times 10^{6}\}$ we can see that the condition is fulfilled by the whole set. Therefore, the large-C criterion detailed earlier is used. Still, given that a numerical power-law distribution is given by two parameters, a lower bound x_min and the scale exponent α, we depict our results in 3D in Fig. 4. From left to right, $E_D$, $S_D$, and $C_D$ for the power-law distribution are shown, respectively. In the three images, the same coding is used: the x-axis displays the scale exponent (α) values, the y-axis shows x_min values, and the z-axis depicts the continuous measure values; lower values of α are displayed in dark blue, turning into reddish colors for larger exponents.
Table 2. Alphabet size b′ and its corresponding normalizing constant K for the normal distribution G.
As can be appreciated in Fig. 4, for small x_min values (e.g. x_min = 1), low emergence is produced regardless of the scale exponent. Moreover, maximal self-organization (i.e. $S_D = 1$) is quickly achieved (i.e. at α = 4), providing a PL with at most fair complexity values. However, if we let x_min take larger values, $E_D$ grows, achieving the maximal complexity (i.e. $C_D \approx 0.8$) of this experimental setup at x_min = 15, α = 1. This behavior is also observed for other scale exponent values, where emergence of new information is produced as the x_min value grows. Furthermore, it has been stated that for P(X) to display power-law behavior it is required that $\forall x_i \in P(X) \mid x_i > x_{min}$ [34]. Thus, for every α there should be an x_min such that $E_D > 0$. Moreover, for larger scale exponents, larger x_min values are required for the distribution to show any emergence of new information at all.

Real World Phenomena and their Complexity
Data on phenomena that follow a power law are provided in Table 3. These power laws have been studied in [32,34,36], and the power-law parameters were published in [36]. The phenomena in the table are identified below by the number in parentheses; more details about these power laws can be found in [32,34,36]. For each phenomenon, the corresponding differential entropy and complexity measures are shown in Table 3. We also provide Table 5, which gives the color coding for complexity measures proposed in [4]. Five colors are employed to simplify the different value ranges of the $E_D$, $S_D$, and $C_D$ results.
According to the nomenclature suggested in [4], very high complexity ($0.8 \le C_D \le 1$) is obtained for the number of citations (set 2) and the intensity of solar flares (set 7). High complexity ($0.6 \le C_D < 0.8$) is obtained for received telephone calls (set 4), intensity of wars (set 8), and frequency of family names (set 9). Fair complexity ($0.4 \le C_D < 0.6$) is displayed by earthquake magnitudes (set 5) and population of U.S. cities (set 10). Low complexity ($0.2 \le C_D < 0.4$) is obtained for frequency of words used in Moby Dick (set 1) and web hits (set 3), whereas moon craters (set 6) have very low complexity ($0 \le C_D < 0.2$).
In fact, earthquakes and web hits have been found not to follow a power law [32]. Furthermore, if such sets were to follow a power law, a greater value of $x_{min}$ would be required, as can be observed in Fig. 4. This is the case for the frequency of words used in Moby Dick: Table 3 uses the parameters proposed in [36], whereas another set of parameters is estimated in [32] ($x_{min} = 7$, $\alpha = 1.95$). For this more recently estimated set of parameters, a high complexity is achieved ($C_D = 0.74$), which is more consistent with the literature on Zipf's law [36]. Lastly, in the case of moon craters, $x_{min} = 0.01$ is a rather poor choice according to Fig. 4: for the chosen scale exponent, at least $x_{min} \approx 1$ would be required for the power law to produce any information at all.
It should be noted that $x_{min}$ can be adjusted to change the values of all measures. It is also worth mentioning that if we were to normalize and discretize a power-law distribution to calculate its discrete entropy (as in [4]), all power-law distributions would present a very high complexity, independently of $x_{min}$ and $\alpha$, precisely because they are normalized. Such a normalization is therefore not useful for comparing different power-law distributions.
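The dependence on x min discussed above can be illustrated with a minimal sketch. It assumes the standard continuous power-law density p(x) = ((α − 1)/x_min)(x/x_min)^(−α) for x ≥ x_min, whose differential entropy has a closed form; the function names are hypothetical and chosen only for illustration. The sketch does not include the normalization that maps entropy to E D, S D, and C D, so it is not guaranteed to reproduce the values in Table 3 or Fig. 4. The band labels follow the value ranges quoted above from [4].

```python
import math


def power_law_entropy_bits(x_min: float, alpha: float) -> float:
    """Differential entropy, in bits, of the continuous power law
    p(x) = ((alpha - 1) / x_min) * (x / x_min) ** (-alpha), x >= x_min.
    Assumption: this standard density is the one intended in the text."""
    if x_min <= 0 or alpha <= 1:
        raise ValueError("requires x_min > 0 and alpha > 1")
    # Closed form in nats: ln(x_min / (alpha - 1)) + alpha / (alpha - 1)
    h_nats = math.log(x_min / (alpha - 1.0)) + alpha / (alpha - 1.0)
    return h_nats / math.log(2.0)


def xmin_for_zero_entropy(alpha: float) -> float:
    """Value of x_min at which the entropy above is exactly zero.
    Below this value the differential entropy is negative."""
    if alpha <= 1:
        raise ValueError("requires alpha > 1")
    return (alpha - 1.0) * math.exp(-alpha / (alpha - 1.0))


def complexity_band(c: float) -> str:
    """Label a complexity value with the five ranges quoted from [4]."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("complexity must lie in [0, 1]")
    if c >= 0.8:
        return "very high"
    if c >= 0.6:
        return "high"
    if c >= 0.4:
        return "fair"
    if c >= 0.2:
        return "low"
    return "very low"
```

For example, `power_law_entropy_bits(7, 1.95)` evaluates the entropy for the alternative Moby Dick parameters, and `xmin_for_zero_entropy(alpha)` shows how small x min can become, for a given exponent, before the differential entropy turns negative under this density.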

Discussion
The relevance of the work presented here lies in the fact that it is now possible to calculate measures of emergence, self-organization, and complexity directly from probability distributions, without needing access to raw data. The interpretation of the measures is not fixed, however, as it will depend on the specific purposes for which the measures are used.
From exploring the parameter space of the uniform, normal, and scale-free distributions, we can corroborate that high complexity values require a form of balance between extreme cases. On the one hand, uniform distributions are, by definition, homogeneous: all states are equiprobable, which yields the highest emergence. This is also the case for normal distributions with a very large standard deviation and for power-law distributions with an exponent close to zero. On the other hand, highly biased distributions (very small standard deviation in G or very large exponent in PL) yield a high self-organization, as few states accumulate most of the probability. Complexity is found between these two extremes. Judging from the values of σ and α, this coincides with a broad range of phenomena. This does not tell us anything new: complexity is common. The relevant aspect is that this provides a common framework to study the processes that lead phenomena to have a high complexity [38]. It should be noted that this also depends on the time scales at which change occurs [39].
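The balance described above can be sketched numerically for the normal distribution. The differential entropy of a Gaussian, 0.5 log2(2πeσ²), is standard; everything else in the sketch is an assumption made for illustration: emergence is taken as entropy divided by a hypothetical constant `log2_b` and clipped to [0, 1], and self-organization and complexity are assumed to take the forms S = 1 − E and C = 4·E·S used for the discrete measures in [4]. The continuous definitions given earlier in the paper may differ in their normalization.

```python
import math
from typing import Tuple


def gaussian_entropy_bits(sigma: float) -> float:
    """Differential entropy, in bits, of a normal distribution
    with standard deviation sigma (independent of the mean)."""
    if sigma <= 0:
        raise ValueError("requires sigma > 0")
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)


def measures(h_bits: float, log2_b: float) -> Tuple[float, float, float]:
    """Return (E, S, C) from an entropy value.
    Assumptions (not taken from this section): E = h / log2_b clipped
    to [0, 1], S = 1 - E, C = 4 * E * S. log2_b is a hypothetical
    normalization constant, i.e. the entropy treated as maximal."""
    if log2_b <= 0:
        raise ValueError("requires log2_b > 0")
    e = min(max(h_bits / log2_b, 0.0), 1.0)
    s = 1.0 - e
    c = 4.0 * e * s
    return e, s, c


if __name__ == "__main__":
    LOG2_B = 16.0  # hypothetical choice, for illustration only
    for sigma in (0.01, 1.0, 100.0, 1e4):
        e, s, c = measures(gaussian_entropy_bits(sigma), LOG2_B)
        print(f"sigma={sigma:g}  E={e:.3f}  S={s:.3f}  C={c:.3f}")
```

Under these assumptions, very small σ drives E toward 0 (high self-organization), very large σ drives E toward 1 (high emergence), and C peaks where E = S = 0.5, which is the intermediate regime referred to in the text.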
In this context, it is interesting to relate our results with information adaptation [17]. In a variety of systems, adaptation takes place by inflating or deflating information, so that the "right" balance is achieved. Certainly, this precise balance can change from system to system and from context to context. Still, the capability of information adaptation has to be correlated with complexity, as the measure also reflects a balance between emergence (inflated information) and self-organization (deflated information).
As future work, it will be interesting to study the relationship between complexity and semantic information. There seems to be a connection with complexity here as well, as we have proposed a measure of autopoiesis as the ratio of the complexity of a system over the complexity of its environment [4,40]. These efforts should be valuable in the study of the relationship between information and meaning, in particular in cognitive systems.
Another future line of research lies in the relationship between the proposed measures and complex networks [41][42][43][44], exploring questions such as: how does the topology of a network affect its dynamics? How much can we predict the dynamics of a network based on its topology? What is the relationship between topological complexity and dynamic complexity? How controllable are networks [45] depending on their complexity?