1. Purpose of This Paper
This paper describes the recent developments in a new statistical theory describing Evolution and SETI by mathematical equations. I call this the Evo-SETI mathematical model of Evolution and SETI.
The main question which this paper focuses on is: whenever a new exoplanet is discovered, what evolutionary stage has life on that exoplanet reached, compared with life on Earth today? This is the central question for Evo-SETI. In this paper, it is also shown that the (Shannon) Entropy of b-lognormals addresses this question, thus allowing the creation of an Evo-SETI SCALE that may be applied to exoplanets.
An important new result presented in this paper is that the cubic in the work of Markov-Korotayev [1,2,3,4,5,6,7,8] can be taken as the mean value curve of a lognormal process, thus reconciling their deterministic work with our probabilistic Evo-SETI theory.
2. During the Last 3.5 Billion Years, Life Forms Increased as in a (Lognormal) Stochastic Process
Figure 1 shows the time $t$ on the horizontal axis, with the convention that negative values of $t$ are past times, zero is now, and positive values are future times. The starting point on the time axis is $ts$ = 3.5 billion ($10^9$) years ago, i.e., the accepted time of the origin of life on Earth. If the origin of life started earlier than that, for example 3.8 billion years ago, the following equations would remain the same and their numerical values would only be slightly changed. On the vertical axis is the number of species living on Earth at time $t$, denoted $L(t)$ and standing for “life at time $t$”. We do not know this “function of the time” in detail, and so it must be regarded as a random function, or stochastic process, $L(t)$. This paper adopts the convention that capital letters represent random variables, i.e., stochastic processes if they depend on the time, while lower-case letters signify ordinary variables or functions.
3. Mean Value of the Lognormal Process L(t)
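The most important, ordinary and continuous function of the time associated with a stochastic process like $L(t)$ is its mean value, denoted by:
$$m_L(t) \equiv \langle L(t) \rangle. \tag{1}$$
The probability density function (pdf) of a stochastic process like $L(t)$ is assumed in the Evo-SETI theory to be a b-lognormal, and its equation thus reads:
$$f_{L(t)}(n) = \frac{e^{-\frac{[\ln(n) - M_L(t)]^2}{2\sigma_L^2 (t-ts)}}}{\sqrt{2\pi}\,\sigma_L \sqrt{t-ts}\; n}, \quad n \ge 0,\ t \ge ts,\ \sigma_L \ge 0, \tag{2}$$
where $M_L(t)$ is an arbitrary function of the time and $\sigma_L$ is a constant.
This assumption is in line with the extension in time of the statistical Drake equation, namely the foundational and statistical equation of SETI, as shown in [9].
The mean value (Equation (1)) is related to the pdf Equation (2) by the relevant integral in the number $n$ of living species on Earth at time $t$, as follows:
$$m_L(t) = \int_0^\infty n \cdot \frac{e^{-\frac{[\ln(n) - M_L(t)]^2}{2\sigma_L^2 (t-ts)}}}{\sqrt{2\pi}\,\sigma_L \sqrt{t-ts}\; n}\, dn. \tag{3}$$
The “surprise” is that this integral in Equation (3) may be computed exactly, so that the mean value $m_L(t)$ is given by:
$$m_L(t) = e^{M_L(t)}\, e^{\frac{\sigma_L^2}{2}(t-ts)}. \tag{4}$$
In turn, the last equation has the “surprising” property that it may be exactly inverted, i.e., solved for $M_L(t)$:
$$M_L(t) = \ln(m_L(t)) - \frac{\sigma_L^2}{2}(t-ts). \tag{5}$$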
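The closed form of the mean value can be checked numerically. The following is a minimal sketch, assuming the pdf of Equation (2) is the lognormal in $n$ with log-mean $M_L(t)$ and log-variance $\sigma_L^2 (t-ts)$; the function names and the sample values are illustrative and are not taken from the paper.

```python
import math

def lognormal_pdf(n, M, sigma, elapsed):
    """Lognormal pdf in n: log-mean M, log-variance sigma**2 * elapsed (elapsed = t - ts)."""
    var = sigma ** 2 * elapsed
    return math.exp(-(math.log(n) - M) ** 2 / (2.0 * var)) / (
        math.sqrt(2.0 * math.pi * var) * n)

def mean_by_quadrature(M, sigma, elapsed, steps=20000, width=10.0):
    """Approximate the integral of n * pdf(n) dn by the midpoint rule in x = ln(n)."""
    spread = sigma * math.sqrt(elapsed)
    lo = M - width * spread
    h = 2.0 * width * spread / steps
    total = 0.0
    for i in range(steps):
        n = math.exp(lo + (i + 0.5) * h)
        total += n * lognormal_pdf(n, M, sigma, elapsed) * n * h  # dn = n dx
    return total

def mean_closed_form(M, sigma, elapsed):
    """Closed-form mean: exp(M) * exp(sigma**2 * elapsed / 2)."""
    return math.exp(M) * math.exp(0.5 * sigma ** 2 * elapsed)

# Illustrative (hypothetical) values: M_L(t) = 2.0, sigma_L = 0.5, t - ts = 1.0
numeric = mean_by_quadrature(2.0, 0.5, 1.0)
exact = mean_closed_form(2.0, 0.5, 1.0)
print(numeric, exact)
```

The two printed numbers should agree to many digits, and taking the logarithm of the closed form and subtracting $\sigma_L^2 (t-ts)/2$ recovers $M_L(t)$, which is the inversion step.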
4. L(t) Initial Conditions at ts
In relation to the initial conditions of the stochastic process $L(t)$, namely concerning the value $L(ts)$, it is assumed that the exact positive number
$$L(ts) = Ns \tag{6}$$
is always known, i.e., with a probability of one:
$$\Pr\{L(ts) = Ns\} = 1. \tag{7}$$
In practice, $Ns$ will be equal to one in the theories of the evolution of life on Earth or on an exoplanet (i.e., there must have been a time in the past when the number of living species was just one, be it RNA or something else), and it is considered as equal to the number of living species just before the asteroid/comet impacted in the theories of mass extinction of life on a planet.
The mean value $m_L(t)$ of $L(t)$ must also equal the initial number $Ns$ at the initial time $ts$, that is:
$$m_L(ts) = Ns. \tag{8}$$
Replacing $t$ with $ts$ in Equation (4), one then finds:
$$m_L(ts) = e^{M_L(ts)}. \tag{9}$$
That, checked against Equation (8), immediately yields:
$$Ns = e^{M_L(ts)}, \quad \text{that is} \quad M_L(ts) = \ln(Ns). \tag{10}$$
These are the initial conditions for the mean value.
After the initial instant $ts$, the stochastic process $L(t)$ unfolds, oscillating above or below the mean value in an unpredictable way. Statistically speaking, however, it is expected that $L(t)$ does not “depart too much” from $m_L(t)$, and this fact is graphically shown in Figure 1 by the two dot-dot blue curves above and below the mean value solid red curve $m_L(t)$. These two curves are the upper standard deviation curve
$$m_L(t)\left[1 + \sqrt{e^{\sigma_L^2 (t-ts)} - 1}\right] \tag{11}$$
and the lower standard deviation curve
$$m_L(t)\left[1 - \sqrt{e^{\sigma_L^2 (t-ts)} - 1}\right] \tag{12}$$
respectively (see [4]). Both Equations (11) and (12), at the initial time $t = ts$, equal the mean value $m_L(ts) = Ns$. With a probability of one, the initial value $Ns$ is the same for all of the three curves shown in Figure 1. The function of the time
$$\sqrt{e^{\sigma_L^2 (t-ts)} - 1} \tag{13}$$
is called the variation coefficient, since the standard deviation of $L(t)$ (noting that this is just the standard deviation $\Delta(t)$ of $L(t)$ and not either of the above two “upper” and “lower” standard deviation curves given by Equations (11) and (12), respectively) is:
$$\Delta(t) = m_L(t)\sqrt{e^{\sigma_L^2 (t-ts)} - 1}. \tag{14}$$
Thus, Equation (14) shows that the variation coefficient of Equation (13) is the ratio of $\Delta(t)$ to $m_L(t)$, i.e., it expresses how much the standard deviation “varies” with respect to the mean value. Having understood this fact, the two curves of Equations (11) and (12) are obtained:
$$m_L(t) \pm \Delta(t) = m_L(t)\left[1 \pm \sqrt{e^{\sigma_L^2 (t-ts)} - 1}\right]. \tag{15}$$
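The mean value and the two standard deviation curves can be tabulated with a few lines of code. This is only a sketch under stated assumptions: the variation coefficient is taken to be $\sqrt{e^{\sigma_L^2 (t-ts)} - 1}$, and the exponential mean value and all numeric values below are hypothetical choices made for illustration.

```python
import math

def variation_coefficient(t, ts, sigma_L):
    """Ratio of the standard deviation of L(t) to its mean value."""
    return math.sqrt(math.exp(sigma_L ** 2 * (t - ts)) - 1.0)

def std_curves(t, ts, sigma_L, mean_value):
    """Return (lower, mean, upper) standard deviation curves at time t."""
    m = mean_value(t)
    delta = m * variation_coefficient(t, ts, sigma_L)
    # The lower curve becomes negative once the variation coefficient exceeds 1.
    return m - delta, m, m + delta

# Hypothetical inputs: ts = 0, Ns = 1, sigma_L = 0.2, exponential mean value.
TS, NS, SIGMA_L = 0.0, 1.0, 0.2

def example_mean(t):
    return NS * math.exp(0.3 * (t - TS))

for t in (0.0, 5.0, 10.0):
    lower, mean, upper = std_curves(t, TS, SIGMA_L, example_mean)
    print(t, lower, mean, upper)
```

At $t = ts$ the three curves coincide at $Ns$, and they fan out as $t$ grows.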
5. L(t) Final Conditions at te > ts
With reference to the final conditions for the mean value curve, as well as for the two standard deviation curves, the final instant can be termed $te$, reflecting the end time of this mathematical analysis. In practice, this $te$ is zero (i.e., now) in the theories of the evolution of life on Earth or exoplanets, but it is the time when the mass extinction ends (and life starts to evolve again) in the theories of mass extinction of life on a planet. First of all, it is clear that, in full analogy to the initial condition Equation (8) for the mean value, the final condition has the form:
$$m_L(te) = Ne, \tag{16}$$
where $Ne$ is a positive number denoting the number of species alive at the end time $te$. However, it is not known what random value $L(te)$ will take, but only that its standard deviation curve Equation (14) will, at time $te$, have a certain positive value that will differ by a certain amount $\delta Ne$ from the mean value Equation (16). In other words, from Equation (14):
$$\delta Ne = \Delta(te) = m_L(te)\sqrt{e^{\sigma_L^2 (te-ts)} - 1}. \tag{17}$$
When dividing Equation (17) by Equation (16), the common factor $m_L(te)$ is cancelled out, and one is left with:
$$\frac{\delta Ne}{Ne} = \sqrt{e^{\sigma_L^2 (te-ts)} - 1}. \tag{18}$$
Solving this for $\sigma_L$ finally yields:
$$\sigma_L = \frac{\sqrt{\ln\left[1 + \left(\frac{\delta Ne}{Ne}\right)^2\right]}}{\sqrt{te - ts}}. \tag{19}$$
This equation expresses the so far unknown numerical parameter $\sigma_L$ in terms of the initial time $ts$ plus the three final-time parameters $(te, Ne, \delta Ne)$.
Therefore, in conclusion, it is shown that once the five parameters $(ts, Ns, te, Ne, \delta Ne)$ are assigned numerically, the lognormal stochastic process $L(t)$ is completely determined.
Finally, notice that the square of Equation (19) may be rewritten as:
$$\sigma_L^2 = \frac{\ln\left[1 + \left(\frac{\delta Ne}{Ne}\right)^2\right]}{te - ts} = \ln\left\{\left[1 + \left(\frac{\delta Ne}{Ne}\right)^2\right]^{\frac{1}{te-ts}}\right\}, \tag{20}$$
from which the following formula is inferred:
$$e^{\sigma_L^2} = \left[1 + \left(\frac{\delta Ne}{Ne}\right)^2\right]^{\frac{1}{te-ts}}. \tag{21}$$
This Equation (21) enables one to get rid of $e^{\sigma_L^2}$, replacing it by virtue of the four boundary parameters: $(ts, te, Ne, \delta Ne)$. It will be later used in Section 8 to rewrite the Peak-Locus Theorem in terms of the boundary conditions, rather than in terms of $\sigma_L$.
6. Important Special Cases of m(t)
- (1)
The particular case of Equation (1) when the mean value $m_L(t)$ is given by the generic exponential:
$$m_{GBM}(t) = N_0\, e^{\mu_{GBM}\, t} \tag{22}$$
is called the Geometric Brownian Motion (GBM), and is widely used in financial mathematics, where it represents the “underlying process” of the stock values (Black-Scholes models). This author used the GBM in his previous models of Evolution and SETI ([9,10,11,12,13,14]), since it was assumed that the number of ET civilizations in the Galaxy, or, alternatively, the number of living species on Earth over the last 3.5 billion years, grew exponentially (Malthusian growth). Upon equating the two right-hand-sides of Equations (4) and (22) (with $t$ replaced by $(t-ts)$), we find:
$$e^{M_{GBM}(t)}\, e^{\frac{\sigma_L^2}{2}(t-ts)} = N_0\, e^{\mu_{GBM}(t-ts)}. \tag{23}$$
Solving this equation for $M_{GBM}(t)$ yields:
$$M_{GBM}(t) = \ln N_0 + \left(\mu_{GBM} - \frac{\sigma_L^2}{2}\right)(t-ts). \tag{24}$$
This is (with $ts = 0$) the mean value at the exponent of the well-known GBM pdf, i.e.,:
$$f_{GBM}(n; t) = \frac{e^{-\frac{\left[\ln(n) - \left(\ln N_0 + \left(\mu_{GBM} - \frac{\sigma_L^2}{2}\right) t\right)\right]^2}{2\sigma_L^2\, t}}}{\sqrt{2\pi}\,\sigma_L \sqrt{t}\; n}, \quad n \ge 0. \tag{25}$$
This short description of the GBM as the exponential sub-case of the general lognormal process Equation (2) is concluded by warning that GBM is a misleading name, since GBM is a lognormal process and not a Gaussian one, as the Brownian Motion is.
- (2)
As has been mentioned already, another interesting case of the mean value function $m_L(t)$ in Equation (1) is when it equals a generic polynomial in $t$ starting at $ts$, namely (with $c_k$ being the coefficient of the $k$-th power of the time $t-ts$ in the polynomial)
$$m_L(t) = \sum_{k=0}^{\text{polynomial degree}} c_k\, (t-ts)^k. \tag{26}$$
The case where Equation (26) is a second-degree polynomial (i.e., a parabola in $t-ts$) may be used to describe the Mass Extinctions on Earth over the last 3.5 billion years (see [13]).
- (3)
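Having so said, the notion of a b-lognormal must also be introduced, for $t > b$ = birth, representing the lifetime of living entities, such as single cells, plants, animals, humans, civilizations of humans, or even extra-terrestrial (ET) civilizations (see [12], in particular pages 227–245):
$$b\text{-lognormal}(t; \mu, \sigma, b) = \frac{e^{-\frac{[\ln(t-b) - \mu]^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma\,(t-b)}, \quad t > b. \tag{27}$$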
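The GBM case in item (1) lends itself to a quick Monte Carlo check. The sketch below assumes $ts = 0$ and samples the GBM at a single time $t$ from its exact lognormal law; the parameter values and the seed are hypothetical. It illustrates that the sample mean follows the exponential $N_0 e^{\mu_{GBM} t}$, while the median follows the smaller value $N_0 e^{(\mu_{GBM} - \sigma^2/2) t}$, which is one way to see that the process is lognormal rather than Gaussian.

```python
import math
import random

def sample_gbm(N0, mu, sigma, t, rng):
    """Draw one value of a GBM at time t (start time assumed to be 0)."""
    z = rng.gauss(0.0, 1.0)
    return N0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)

rng = random.Random(12345)              # fixed seed for reproducibility
N0, mu, sigma, t = 1.0, 0.5, 0.4, 2.0   # hypothetical values

samples = [sample_gbm(N0, mu, sigma, t, rng) for _ in range(100000)]
sample_mean = sum(samples) / len(samples)
sample_median = sorted(samples)[len(samples) // 2]

expected_mean = N0 * math.exp(mu * t)
expected_median = N0 * math.exp((mu - 0.5 * sigma ** 2) * t)
print(sample_mean, expected_mean)
print(sample_median, expected_median)
```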
7. Boundary Conditions when m(t) is a First, Second, or Third Degree Polynomial in the Time (t-ts)
In [13], the reader may find a mathematical model of Darwinian Evolution different from the GBM model. That model is the Markov-Korotayev model, for which this author proved the mean value (1) to be a cubic, i.e., a third degree polynomial in $t$.
In summary, the key formulae proven in [13], relating to the case when the assigned mean value $m_L(t)$ is a polynomial in $t$ starting at $ts$, can be shown as:
- (1)
The mean value is a straight line. This straight line is the line through the two points, $(ts, Ns)$ and $(te, Ne)$, that, after a few rearrangements, becomes:
$$m_L(t) = (Ne - Ns)\,\frac{t - ts}{te - ts} + Ns. \tag{29}$$
- (2)
The mean value is a parabola, i.e., a quadratic polynomial in $t-ts$. Then, the equation of such a parabola reads:
Equation (30) was first derived by this author in [13] (pp. 299–301), in relation to Mass Extinctions, i.e., it is a decreasing function of time.
- (3)
The mean value is a cubic. In [13] (pp. 304–307), this author proved, in relation to the Markov-Korotayev model of Evolution, that the cubic mean value of the lognormal stochastic process $L(t)$ is given by the cubic equation in $t$:
Notice that, in Equation (31), one has, in addition to the usual initial and final conditions $(ts, Ns)$ and $(te, Ne)$, two more “middle conditions” referring to the two instants at which the Maximum and the minimum of the cubic $m_L(t)$ occur, respectively:
8. Peak-Locus Theorem
The Peak-Locus theorem is a new mathematical discovery of ours, playing a central role in Evo-SETI. In its most general formulation, it can be used for any lognormal process $L(t)$ or arbitrary mean value $m_L(t)$. In the case of GBM, it is shown in Figure 2.
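The Peak-Locus theorem states that the family of b-lognormals, each having its own peak located exactly upon the mean value curve (1), is given by the following three equations, specifying the three parameters $\mu(p)$, $\sigma(p)$, and $b(p)$ appearing in Equation (27) as three functions of the peak abscissa, i.e., the independent variable $p$. In other words, we were actually pleased to find out that these three equations may be written directly in terms of $m_L(p)$ as follows:
$$\begin{cases} \mu(p) = \dfrac{e^{\sigma_L^2\, p}}{4\pi\, [m_L(p)]^2} - p\,\dfrac{\sigma_L^2}{2} \\[2ex] \sigma(p) = \dfrac{e^{\frac{\sigma_L^2}{2}\, p}}{\sqrt{2\pi}\; m_L(p)} \\[2ex] b(p) = p - e^{\mu(p) - [\sigma(p)]^2} \end{cases} \tag{33}$$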
The proof of Equation (33) is lengthy and was given as a special file (written in the language of the Maxima symbolic manipulator) that the reader may freely download from the web site of [13].
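An important new result is now presented. The Peak-Locus Theorem Equation (33) is rewritten, not in terms of $\sigma_L$, but in terms of the four boundary parameters known as: $(ts, te, Ne, \delta Ne)$. To this end, we must insert Equations (21) and (20) into Equation (33), producing the following result:
$$\begin{cases} \mu(p) = \dfrac{\left[1 + \left(\frac{\delta Ne}{Ne}\right)^2\right]^{\frac{p}{te-ts}}}{4\pi\, [m_L(p)]^2} - \dfrac{p}{2}\,\dfrac{\ln\left[1 + \left(\frac{\delta Ne}{Ne}\right)^2\right]}{te-ts} \\[2ex] \sigma(p) = \dfrac{\left[1 + \left(\frac{\delta Ne}{Ne}\right)^2\right]^{\frac{p}{2(te-ts)}}}{\sqrt{2\pi}\; m_L(p)} \\[2ex] b(p) = p - e^{\mu(p) - [\sigma(p)]^2} \end{cases} \tag{34}$$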
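In the particular GBM case, the mean value is Equation (22) with $\mu_{GBM} = B$, $ts = 0$ and $N_0 = Ns = A$. Then, the Peak-Locus theorem Equation (33) with $\sigma_L = \sqrt{2B}$ yields:
$$\begin{cases} \mu(p) = \dfrac{1}{4\pi A^2} - B\,p \\[2ex] \sigma = \dfrac{1}{\sqrt{2\pi}\, A} \\[2ex] b(p) = p - e^{\mu(p) - \sigma^2} \end{cases} \tag{35}$$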
In this simpler form, the Peak-Locus theorem had already been published by the author in [10,11,12], while its most general form is Equations (33) and (34).
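The defining property of the Peak-Locus theorem, namely that each b-lognormal has its peak exactly on the mean value curve, can be tested numerically. The sketch below rests on an explicit assumption: that Equation (33) has the form $\mu(p) = e^{\sigma_L^2 p}/(4\pi [m_L(p)]^2) - p\,\sigma_L^2/2$, $\sigma(p) = e^{\sigma_L^2 p/2}/(\sqrt{2\pi}\, m_L(p))$, $b(p) = p - e^{\mu(p) - [\sigma(p)]^2}$. The linear mean value and the numbers used are hypothetical.

```python
import math

def b_lognormal_pdf(t, mu, sigma, b):
    """b-lognormal pdf: a lognormal in (t - b), zero for t <= b."""
    if t <= b:
        return 0.0
    return math.exp(-(math.log(t - b) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        math.sqrt(2.0 * math.pi) * sigma * (t - b))

def peak_locus(p, mean_value, sigma_L):
    """Assumed form of the Peak-Locus equations: (mu, sigma, b) for peak abscissa p."""
    m = mean_value(p)
    mu = math.exp(sigma_L ** 2 * p) / (4.0 * math.pi * m ** 2) - p * sigma_L ** 2 / 2.0
    sigma = math.exp(sigma_L ** 2 * p / 2.0) / (math.sqrt(2.0 * math.pi) * m)
    b = p - math.exp(mu - sigma ** 2)
    return mu, sigma, b

def linear_mean(p):
    """Hypothetical mean value curve used only for this check."""
    return 2.0 + 0.5 * p

p, sigma_L = 4.0, 0.3
mu, sigma, b = peak_locus(p, linear_mean, sigma_L)
peak_height = b_lognormal_pdf(p, mu, sigma, b)
print(peak_height, linear_mean(p))
```

Under the stated assumption, the pdf evaluated at $p$ equals $m_L(p)$, and $p$ is the mode of the b-lognormal because $b + e^{\mu - \sigma^2} = p$.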
9. EvoEntropy(p) as a Measure of Evolution
The (Shannon) Entropy of the b-lognormal Equation (27) is (for the proof, see [11], page 686):
$$H = \frac{1}{\ln 2}\left[\ln\left(\sqrt{2\pi}\,\sigma\right) + \mu + \frac{1}{2}\right]. \tag{36}$$
This is a function of the peak abscissa $p$ and is measured in bits, as in Shannon’s Information Theory. By virtue of the Peak-Locus Theorem Equation (33), it becomes:
One may also directly rewrite Equation (37) in terms of the four boundary parameters $(ts, te, Ne, \delta Ne)$, upon inserting Equation (21) into Equation (37), with the result:
Thus, Equation (37) and Equation (38) yield the entropy of each member of the family of b-lognormals (the family’s parameter is $p$) peaked upon the mean value curve (1). The b-lognormal Entropy Equation (36) is thus the measure of the extent of evolution of the b-lognormal: it measures the decreasing disorganization in time of what that b-lognormal represents.
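The closed-form Shannon entropy of a b-lognormal can be compared with a direct numerical evaluation of $-\int f \log_2 f \, dt$. The sketch below assumes the standard result for a lognormal shifted by $b$, namely $H = [\ln(\sqrt{2\pi}\,\sigma) + \mu + 1/2]/\ln 2$ bits; the parameter values are hypothetical.

```python
import math

def b_lognormal_pdf(t, mu, sigma, b):
    """b-lognormal pdf: a lognormal in (t - b), zero for t <= b."""
    if t <= b:
        return 0.0
    return math.exp(-(math.log(t - b) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        math.sqrt(2.0 * math.pi) * sigma * (t - b))

def entropy_bits_closed_form(mu, sigma):
    """Shannon entropy in bits; it does not depend on the birth time b."""
    return (math.log(math.sqrt(2.0 * math.pi) * sigma) + mu + 0.5) / math.log(2.0)

def entropy_bits_quadrature(mu, sigma, b, steps=20000, width=10.0):
    """-integral of f * log2(f) dt, by the midpoint rule in x = ln(t - b)."""
    lo = mu - width * sigma
    h = 2.0 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        f = b_lognormal_pdf(b + math.exp(x), mu, sigma, b)
        total -= f * math.log2(f) * math.exp(x) * h  # dt = exp(x) dx
    return total

# Hypothetical parameters
mu, sigma, b = 1.0, 0.5, -3.0
h_exact = entropy_bits_closed_form(mu, sigma)
h_numeric = entropy_bits_quadrature(mu, sigma, b)
print(h_exact, h_numeric)
```

Changing $b$ shifts the curve in time without changing its entropy, which is why only $\mu$ and $\sigma$ appear in the closed form.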
Entropy is thus disorganization decreasing in time. However, one would prefer to use a measure of the increasing organization in time. This is what we call the EvoEntropy of $p$:
$$EvoEntropy(p) = -\left[H(p) - H(ts)\right]. \tag{39}$$
The Entropy of evolution is a function that has a minus sign in front of Equation (36), thus changing the decreasing trend of the (Shannon) entropy Equation (36) into the increasing trend of this EvoEntropy Equation (39). In addition, this EvoEntropy starts at zero at the initial time $ts$, as expected.
By virtue of Equation (37), the EvoEntropy Equation (39), invoking also the initial condition Equation (8), becomes:
Alternatively, this could be directly rewritten in terms of the five boundary parameters $(ts, Ns, te, Ne, \delta Ne)$, upon inserting Equation (38) into Equation (39), thus finding:
It is worth noting that the standard deviation at the end time, $\delta Ne$, is irrelevant for the purpose of computing the simple curve of the EvoEntropy Equation (39). In fact, the latter is just a continuous curve, and not a stochastic process. Therefore, any arbitrary numeric value may be assigned to $\delta Ne$, and the EvoEntropy curve must not change. Keeping this in mind, it can be seen that the true EvoEntropy curve is obtained by “squashing” down Equation (42) into the mean value curve $m_L(t)$, and this only occurs if we let:
$$\delta Ne = 0. \tag{43}$$
Inserting Equation (43) into Equation (42), the latter can be simplified into:
which is the final form of the EvoEntropy curve. Equation (44) will be used in the sequel. It can now be clearly seen that the final EvoEntropy Equation (44) is made up of three terms, as follows:
- (a)
The constant term
whose numeric value in the particularly important case of $Ns = 1$ is:
that is, it is close to zero.
- (b)
The denominator square term in Equation (44) rapidly approaches zero as $m_L(p)$ increases to infinity. In other words, this inverse-square term
may become almost negligible for large values of the time $p$.
- (c)
Finally, the dominant, natural logarithmic, term, i.e., that which is the major term in this EvoEntropy Equation (44) for large values of the time $p$.
In conclusion, the EvoEntropy Equation (44) depends upon its natural logarithmic term Equation (48), and so its shape in time must be similar to the shape of a logarithm, i.e., nearly vertical at the beginning of the curve and then progressively approaching the horizontal, though never reaching it. This curve has no maxima nor minima, nor any inflexions.
10. Perfectly Linear EvoEntropy When the Mean Value Is Perfectly Exponential (GBM): This Is Just the Molecular Clock
In the GBM case of Equation (22) (with $t$ replaced by $(t-ts)$), when the mean value is given by the exponential
the EvoEntropy Equation (44) is exactly a linear function of the time $p$, since the first two terms inside the braces in Equation (44) cancel each other out, as we now prove.
Proof. Insert Equation (49) into Equation (44) and then simplify:
In other words, the GBM EvoEntropy is given by:
This is a straight line in the time $p$, starting at the time $ts$ of the origin of life on Earth and increasing linearly thereafter. It is measured in bits/individual and is shown in Figure 3.
This is the same linear behaviour in time as the molecular clock, which is the technique in molecular evolution that uses fossil constraints and rates of molecular change to deduce the time in geological history when two species or other taxa diverged. The molecular data used for such calculations are usually nucleotide sequences for DNA or amino acid sequences for proteins (see [16,17,18]).
In conclusion, we have ascertained that the EvoEntropy in our Evo-SETI theory and the molecular clock are the same linear time function, apart from multiplicative constants (depending on the adopted units, like bits, seconds, etc.). This conclusion appears to be of key importance when assessing the stage at which a newly discovered exoplanet is in the process of its chemical evolution towards life.
11. Markov-Korotayev Alternative to Exponential: A Cubic Growth
Figure 3, showing the linear growth of the EvoEntropy over the last 3.5 billion years of evolution of life on Earth, illustrates the key factor in molecular evolution and allows for an immediate quantitative estimate of how much (in bits per individual) any two species differ from each other; this being the key to cladistics. However, after 2007, this exponential vision was shaken by the alternative “cubic vision” now outlined.
This cubic vision is detailed in the full list of papers published by Andrey Korotayev and Alexander V. Markov et al., since 2007 [1,2,3,4,5,6,7]. Another important publication is their mathematical paper [8] relating to the new research field entitled “Big History”. In addition, a synthetic summary of the Markov-Korotayev theory of evolution appears on Wikipedia at http://en.wikipedia.org/wiki/Andrey_Korotayev, from which an adapted excerpt is given below:
“According to the above list of published papers, in 2007–2008 the Russian scientists Alexander V. Markov and Andrey Korotayev showed that a ‘hyperbolic’ mathematical model can be developed to describe the macrotrends of biological evolution. These authors demonstrated that changes in biodiversity through the Phanerozoic correlate much better with the hyperbolic model (widely used in demography and macrosociology) than with the exponential and logistic models (traditionally used in population biology and extensively applied to fossil biodiversity as well). The latter models imply that changes in diversity are guided by a first-order positive feedback (more ancestors, more descendants) and/or a negative feedback arising from resource limitation. Hyperbolic model implies a second-order positive feedback. The hyperbolic pattern of the world population growth has been demonstrated by Markov and Korotayev to arise from a second-order positive feedback between the population size and the rate of technological growth. According to Markov and Korotayev, the hyperbolic character of biodiversity growth can be similarly accounted for by a feedback between the diversity and community structure complexity. They suggest that the similarity between the curves of biodiversity and human population probably comes from the fact that both are derived from the interference of the hyperbolic trend with cyclical and stochastic dynamics [1,2,3,4,5,6,7].”
This author was inspired by the following Figure 4 (taken from the Wikipedia site http://en.wikipedia.org/wiki/Andrey_Korotayev), showing the increase, but not monotonic increase, of the number of Genera (in thousands) during the last 542 million years of life on Earth, making up the Phanerozoic. Thus, it is postulated that the red curve in Figure 4 could be regarded as the “Cubic mean value curve” of a lognormal stochastic process, just as the exponential mean value curve is typical of GBMs.
The Cubic Equation (31) may be used to represent the red line in Figure 4, thus reconciling the Markov-Korotayev theory with our Evo-SETI theory. This is realized when considering the following numerical inputs to the Cubic Equation (31), which we derive from looking at Figure 4. The precision of these numerical inputs is relatively unimportant at this early stage of matching the two theories (this one and the Markov-Korotayev one), as we are just aiming for a “proof of concept”, and better numeric approximations might follow in the future.
In other words, the first two equations of Equation (52) mean that the first of the genera appeared on Earth about 530 million years ago, i.e., the number of genera on Earth was zero before 530 million years ago. In addition, the last two equations of Equation (52) mean that, at the present time $te = 0$, the number of genera on Earth is approximately 4000, noting that a standard deviation of about ±1000 affects the average value of 4000. This is shown in Figure 4 by the grey stochastic process referred to as all genera. It is re-phrased mathematically by assigning the fifth numeric input:
Then, as a consequence of the five numeric boundary inputs $(ts, Ns, te, Ne, \delta Ne)$, the last being the standard deviation on the current value of genera, Equation (19) yields the numeric value of the positive parameter $\sigma_L$:
Having thus assigned numerical values to the first five conditions, only the conditions on the two abscissae of the Cubic maximum and minimum, respectively, remain to be assigned. Figure 4 establishes them (in millions of years ago) as:
Finally, inserting these seven numeric inputs into the Cubic Equation (31), as well as into both of the equations of Equation (15) of the upper and lower standard deviation curves, the final plot shown in Figure 5 is produced.
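The step that uses Equation (19) can be reproduced numerically from the values quoted in the text (first genera about 530 million years ago, about 4000 genera today, standard deviation of about 1000). The sketch below assumes Equation (19) in the form $\sigma_L = \sqrt{\ln[1 + (\delta Ne/Ne)^2]}/\sqrt{te - ts}$ and, as a further assumption, measures time in millions of years, so that $\sigma_L$ comes out per square root of a million years; the function name is illustrative.

```python
import math

def sigma_from_boundary(ts, te, Ne, dNe):
    """Assumed form of Equation (19): sigma_L from the boundary parameters."""
    if te <= ts:
        raise ValueError("te must be later than ts")
    return math.sqrt(math.log(1.0 + (dNe / Ne) ** 2) / (te - ts))

# Inputs read from the text; the unit of time (millions of years) is an assumption.
ts = -530.0     # start of the analysis, millions of years
te = 0.0        # now
Ne = 4000.0     # genera alive today
dNe = 1000.0    # standard deviation on Ne

sigma_L = sigma_from_boundary(ts, te, Ne, dNe)

# Round trip: the variation coefficient at te must give back dNe / Ne.
variation_at_te = math.sqrt(math.exp(sigma_L ** 2 * (te - ts)) - 1.0)
print(sigma_L, variation_at_te)
```

The round trip at the end is a consistency check only: it confirms that the computed $\sigma_L$ reproduces the ratio $\delta Ne/Ne$ at the final time.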
12. EvoEntropy of the Markov-Korotayev Cubic Growth
What is the EvoEntropy Equation (44) of the Markov-Korotayev Cubic growth Equation (31)?
To answer this question, Equation (31) needs to be inserted into Equation (44) and the resulting equation can then be plotted:
The plot of this function of $t$ is shown in Figure 6.
13. Comparing the EvoEntropy of the Markov-Korotayev Cubic Growth, to the Hypothetical (1) Linear and (2) Parabolic Growth
It is a good idea to consider two more types of growth in the Phanerozoic:
- (1)
The LINEAR (= straight line) growth, given by the mean value of Equation (29)
- (2)
The PARABOLIC (= quadratic) growth, given by the mean value of Equation (30).
These can be compared with the CUBIC growth Equation (31) typical for the Markov-Korotayev model.
The results of this comparison are shown in the two diagrams (upper one and lower one) in Figure 7.
For the sake of simplicity, we omit all detailed mathematical calculations and confine ourselves to writing down the equation of the:
- (1)
LINEAR (straight line) EvoEntropy:
- (2)
PARABOLIC (quadratic) EvoEntropy:
- (3)
CUBIC (MARKOV-KOROTAYEV) EVOENTROPY, i.e., Equation (56).
14. Conclusions
The evolution of life on Earth over the last 3.5 to 4 billion years has barely been demonstrated in a mathematical form. Since 2012, I have attempted to rectify this deficiency by resorting to lognormal probability distributions in time, each starting at a different time instant $b$ (birth), called b-lognormals [9,10,11,12,13,14,19]. My discovery of the Peak-Locus Theorem, which is valid for any enveloping mean value (and not just the exponential one (GBM); for the general proof see [16], in particular the supplementary materials there), has made it possible to use the Shannon Entropy of Information Theory as the correct mathematical tool for measuring the evolution of life in bits/individual.
In conclusion, the processes which occurred on Earth during the past 4 billion years can now be summarized by statistical equations, noting that this only relates to the evolution of life on Earth, and not on exoplanets. Extending this Evo-SETI theory to life on exoplanets will only be possible when SETI, the current scientific search for extra-terrestrial intelligence, achieves the first contact between humans and an alien civilization.