Abstract
Suppose we have n different types of self-replicating entity, with the population $P_i$ of the ith type changing at a rate equal to $P_i$ times the fitness $f_i$ of that type. Suppose the fitness $f_i$ is any continuous function of all the populations $P_1, \dots, P_n$. Let $p_i$ be the fraction of replicators that are of the ith type. Then $p = (p_1, \dots, p_n)$ is a time-dependent probability distribution, and we prove that the square of its speed as measured by the Fisher information metric equals the variance in fitness. In rough terms, this says that the speed at which information is updated through natural selection is governed by the variance in fitness. This result can be seen as a modified version of Fisher’s fundamental theorem of natural selection. We compare it to Fisher’s original result as interpreted by Price, Ewens and Edwards.
  1. Introduction
In 1930, Fisher [1] stated his “fundamental theorem of natural selection” as follows:
The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.
Some tried to make this statement precise as follows:
The time derivative of the mean fitness of a population equals the variance of its fitness.
However, this is only true under very restrictive conditions, so a controversy was ignited.
An interesting resolution was proposed by Price [2], and later amplified by Ewens [3] and Edwards [4]. We can formalize their idea as follows. Suppose we have n types of self-replicating entity, and idealize the population of the ith type as a positive real-valued function $P_i(t)$. Suppose
$$\frac{d}{dt} P_i(t) \;=\; f_i\bigl(P_1(t), \dots, P_n(t)\bigr)\, P_i(t)$$
where the fitness $f_i$ is a differentiable function of the populations of every type of replicator. The mean fitness at time t is
$$\overline{f}(t) \;=\; \sum_{i=1}^n p_i(t)\, f_i\bigl(P_1(t), \dots, P_n(t)\bigr)$$
where $p_i(t)$ is the fraction of replicators of the ith type:
$$p_i(t) \;=\; \frac{P_i(t)}{\sum_{j=1}^n P_j(t)}$$
	  By the product rule, the rate of change of the mean fitness is the sum of two terms:
$$\frac{d}{dt}\overline{f}(t) \;=\; \sum_{i=1}^n \dot{p}_i(t)\, f_i\bigl(P_1(t), \dots, P_n(t)\bigr) \;+\; \sum_{i=1}^n p_i(t)\, \frac{d}{dt} f_i\bigl(P_1(t), \dots, P_n(t)\bigr)$$
	  The first of these two terms equals the variance of the fitness at time t. We give the easy proof in Theorem 1. Unfortunately, the conceptual significance of this first term is much less clear than that of the total rate of change of mean fitness. Ewens concluded that “the theorem does not provide the substantial biological statement that Fisher claimed”.
However, there is another way out, based on an idea Fisher himself introduced in 1922: Fisher information [5]. Fisher information gives rise to a Riemannian metric on the space of probability distributions on a finite set, called the ‘Fisher information metric’—or in the context of evolutionary game theory, the ‘Shahshahani metric’ [6,7,8]. Using this metric we can define the speed at which a time-dependent probability distribution changes with time. We call this its ‘Fisher speed’. Under just the assumptions already stated, we prove in Theorem 2 that the square of the Fisher speed of the probability distribution
$$p(t) \;=\; \bigl(p_1(t), \dots, p_n(t)\bigr)$$
      is the variance of the fitness at time t.
As explained by Harper [9,10], natural selection can be thought of as a learning process, and studied using ideas from information geometry [11]—that is, the geometry of the space of probability distributions. As $p(t)$ changes with time, the rate at which information is updated is closely connected to its Fisher speed. Thus, our revised version of the fundamental theorem of natural selection can be loosely stated as follows:
As a population changes with time, the rate at which information is updated equals the variance of fitness.
The precise statement, with all the hypotheses, is in Theorem 2. However, one lesson is this: variance in fitness may not cause ‘progress’ in the sense of increased mean fitness, but it does cause change.
2. The Time Derivative of Mean Fitness
Suppose we have n different types of entity, which we call replicators. Let $P_i(t)$, or $P_i$ for short, be the population of the ith type of replicator at time t, which we idealize as taking positive real values. Then a very general form of the Lotka–Volterra equations says that
$$\frac{d P_i}{dt} \;=\; f_i(P_1, \dots, P_n)\, P_i \qquad (1)$$
where $f_i \colon (0,\infty)^n \to \mathbb{R}$ is the fitness function of the ith type of replicator. One might also consider fitness functions with explicit time dependence, but we do not do so here.
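For readers who want to experiment numerically, here is a minimal Python sketch (our addition, not part of the original paper) that integrates Equation (1) by forward Euler steps; the fitness function used is hypothetical, chosen only for illustration.

import numpy as np

def simulate(P0, fitness, dt=1e-3, steps=5000):
    # Integrate dP_i/dt = f_i(P_1, ..., P_n) * P_i with forward Euler steps.
    P = np.array(P0, dtype=float)
    history = [P.copy()]
    for _ in range(steps):
        P = P + dt * fitness(P) * P
        history.append(P.copy())
    return np.array(history)

# Hypothetical example: two types with density-dependent fitness.
fitness = lambda P: np.array([1.0 - 0.01 * P.sum(), 1.2 - 0.02 * P.sum()])
history = simulate([10.0, 5.0], fitness)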
Let $p_i(t)$, or $p_i$ for short, be the probability at time t that a randomly chosen replicator will be of the ith type. More precisely, this is the fraction of replicators of the ith type:
$$p_i \;=\; \frac{P_i}{\sum_{j=1}^n P_j} \qquad (2)$$
Using these probabilities, we can define the mean fitness $\overline{f}$ by
$$\overline{f} \;=\; \sum_{i=1}^n f_i(P_1, \dots, P_n)\, p_i \qquad (3)$$
      and the variance in fitness by
$$\mathrm{Var}(f) \;=\; \sum_{i=1}^n \bigl(f_i(P_1, \dots, P_n) - \overline{f}\,\bigr)^2\, p_i \qquad (4)$$
These quantities are also functions of t, but we suppress the t dependence in our notation.
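As a concrete companion to Equations (2)–(4), the following Python sketch (ours, not the paper’s) computes the probabilities, the mean fitness and the variance in fitness from a population vector; the argument `fitness` stands for any function returning the vector of fitnesses $f_i(P_1, \dots, P_n)$.

import numpy as np

def probabilities(P):
    # p_i = P_i / sum_j P_j, as in Equation (2).
    return P / P.sum()

def mean_fitness(P, fitness):
    # fbar = sum_i f_i(P) p_i, as in Equation (3).
    return np.dot(fitness(P), probabilities(P))

def variance_in_fitness(P, fitness):
    # Var(f) = sum_i (f_i(P) - fbar)^2 p_i, as in Equation (4).
    p = probabilities(P)
    fbar = np.dot(fitness(P), p)
    return np.dot((fitness(P) - fbar) ** 2, p)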
Fisher said that the variance in fitness equals the rate of change of mean fitness. Price [2], Ewens [3] and Edwards [4] argued that Fisher only meant to equate part of the rate of change in mean fitness to the variance in fitness. We can see this in the present context as follows. The time derivative of the mean fitness is the sum of two terms:
$$\frac{d}{dt}\overline{f} \;=\; \sum_{i=1}^n f_i(P_1, \dots, P_n)\, \dot{p}_i \;+\; \sum_{i=1}^n \frac{d}{dt}\Bigl(f_i(P_1, \dots, P_n)\Bigr)\, p_i \qquad (5)$$
      and as we now show, the first term equals the variance in fitness.
Theorem 1. 
Suppose positive real-valued functions $P_i(t)$ obey the Lotka–Volterra equations for some continuous functions $f_i \colon (0,\infty)^n \to \mathbb{R}$. Then
$$\sum_{i=1}^n f_i(P_1, \dots, P_n)\, \dot{p}_i \;=\; \mathrm{Var}(f).$$
Proof.  
First we recall a standard formula for the time derivative $\dot{p}_i$. Using the definition of $p_i$ in Equation (2), the quotient rule gives
$$\dot{p}_i \;=\; \frac{\dot{P}_i}{\sum_j P_j} \;-\; \frac{P_i \sum_j \dot{P}_j}{\bigl(\sum_j P_j\bigr)^2}$$
        where all sums are from 1 to n. Using the Lotka–Volterra equations this becomes
$$\dot{p}_i \;=\; \frac{f_i\, P_i}{\sum_j P_j} \;-\; \frac{P_i \sum_j f_j\, P_j}{\bigl(\sum_j P_j\bigr)^2}$$
where we write $f_i$ to mean $f_i(P_1, \dots, P_n)$, and similarly for $f_j$. Using the definition of $p_i$ again, this simplifies to:
$$\dot{p}_i \;=\; f_i\, p_i \;-\; \Bigl(\sum_j f_j\, p_j\Bigr)\, p_i$$
        and thanks to the definition of mean fitness in Equation (3), this reduces to the well-known replicator equation:
$$\dot{p}_i \;=\; \bigl(f_i - \overline{f}\,\bigr)\, p_i$$
Now, the replicator equation implies
$$\sum_{i=1}^n f_i\, \dot{p}_i \;=\; \sum_{i=1}^n f_i\,\bigl(f_i - \overline{f}\,\bigr)\, p_i \;=\; \sum_{i=1}^n \bigl(f_i - \overline{f}\,\bigr)^2\, p_i \;=\; \mathrm{Var}(f),$$
where the second step uses $\sum_{i=1}^n \overline{f}\,\bigl(f_i - \overline{f}\,\bigr)\, p_i = \overline{f}\,\bigl(\overline{f} - \overline{f}\,\bigr) = 0$. This completes the proof.  □
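A quick finite-difference check of Theorem 1 takes only a few lines of Python; in this sketch (ours, with a hypothetical fitness function) the derivative $\dot{p}_i$ is approximated by a single Euler step, so the two printed numbers agree up to an error of order dt.

import numpy as np

# Check that sum_i f_i(P) dp_i/dt equals Var(f) along one Euler step of Equation (1).
f = lambda P: np.array([1.0 - 0.01 * P.sum(), 1.2 - 0.02 * P.sum()])
P = np.array([10.0, 5.0])
dt = 1e-6
P_next = P + dt * f(P) * P                        # one Euler step of Equation (1)
p, p_next = P / P.sum(), P_next / P_next.sum()    # Equation (2)
p_dot = (p_next - p) / dt                         # approximate time derivative of p_i
first_term = np.dot(f(P), p_dot)                  # first term of Equation (5)
fbar = np.dot(f(P), p)
variance = np.dot((f(P) - fbar) ** 2, p)          # Equation (4)
print(first_term, variance)                       # approximately equal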
The second term of Equation (5) only vanishes in special cases, e.g., when the fitness functions $f_i$ are constant. When the second term vanishes we have
$$\frac{d}{dt}\overline{f} \;=\; \mathrm{Var}(f) \;\ge\; 0.$$
      This is a satisfying result. It says the mean fitness does not decrease, and it increases whenever some replicators are more fit than others, at a rate equal to the variance in fitness. However, we would like a more general result, and we can state one using a concept from information theory: the Fisher speed.
3. The Fisher Speed
While Theorem 1 allows us to express the variance in fitness in terms of the time derivatives of the probabilities $p_i$, it does so in a way that also explicitly involves the fitness functions $f_i$. We now prove a simpler formula for the variance in fitness, which equates it with the square of the ‘Fisher speed’ of the probability distribution $p(t) = (p_1(t), \dots, p_n(t))$.
The space of probability distributions on the set $\{1, \dots, n\}$ is the $(n-1)$-simplex
$$\Delta^{n-1} \;=\; \Bigl\{\, (x_1, \dots, x_n) \in \mathbb{R}^n \;:\; x_i \ge 0, \ \sum_{i=1}^n x_i = 1 \,\Bigr\}.$$
The Fisher metric is the Riemannian metric g on the interior of the $(n-1)$-simplex such that given a point x in the interior of $\Delta^{n-1}$ and two tangent vectors $v, w$ we have
$$g(v, w) \;=\; \sum_{i=1}^n \frac{v_i\, w_i}{x_i}.$$
Here we are describing the tangent vectors $v, w$ as vectors in $\mathbb{R}^n$ with the property that the sum of their components is zero: this makes them tangent to the $(n-1)$-simplex. We are demanding that x be in the interior of the simplex to avoid dividing by zero, since on the boundary of the simplex we have $x_i = 0$ for at least one choice of i.
If we have a time-dependent probability distribution $p(t)$ moving in the interior of the $(n-1)$-simplex as a function of time, its Fisher speed is defined by
$$\sqrt{g\bigl(\dot{p}(t), \dot{p}(t)\bigr)} \;=\; \left(\sum_{i=1}^n \frac{\dot{p}_i(t)^2}{p_i(t)}\right)^{1/2}$$
if the derivative $\dot{p}(t)$ exists. This is the usual formula for the speed of a curve moving in a Riemannian manifold, specialized to the case at hand.
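In code, the square of the Fisher speed is a one-line formula. The sketch below (our addition) assumes p lies in the interior of the simplex and that the components of p_dot sum to zero.

import numpy as np

def fisher_speed_squared(p, p_dot):
    # g(pdot, pdot) = sum_i pdot_i^2 / p_i, valid when every p_i > 0.
    return np.sum(p_dot ** 2 / p)

# Example with a hypothetical point and tangent vector (components sum to zero):
print(fisher_speed_squared(np.array([0.5, 0.3, 0.2]), np.array([0.02, -0.01, -0.01])))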
These are all the formulas needed to prove our result. However, for readers unfamiliar with the Fisher metric, a few words may provide some intuition. The factor of $1/x_i$ in the Fisher metric changes the geometry of the simplex so that it becomes round, with the geometry of a portion of a sphere in $\mathbb{R}^n$. But more relevant here is the Fisher metric’s connection to relative information—a generalization of Shannon information that depends on two probability distributions rather than just one [12]. Given probability distributions $p, q \in \Delta^{n-1}$, the information of q relative to p is
$$I(q, p) \;=\; \sum_{i=1}^n q_i \ln\!\left(\frac{q_i}{p_i}\right).$$
This is the amount of information that has been updated if one replaces the prior distribution p with the posterior q. So, sometimes relative information is called the ‘information gain’. It is also called ‘relative entropy’ or ‘Kullback–Leibler divergence’. It has many applications to biology [9,10,13,14].
Suppose $p(t)$ is a smooth curve in the interior of the $(n-1)$-simplex. We can ask the rate at which information is being updated as time passes. Perhaps surprisingly, an easy calculation gives
$$\left.\frac{d}{dt}\, I\bigl(p(t), p(t_0)\bigr)\right|_{t = t_0} \;=\; 0.$$
Thus, to first order, information is not being updated at all at any time $t_0$. However, another well-known calculation (see, e.g., [15]) shows that
$$\left.\frac{d^2}{dt^2}\, I\bigl(p(t), p(t_0)\bigr)\right|_{t = t_0} \;=\; g\bigl(\dot{p}(t_0), \dot{p}(t_0)\bigr).$$
So, to second order in $t - t_0$, the square of the Fisher speed determines how much information is updated when we pass from $p(t_0)$ to $p(t)$.
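This second-order relationship is easy to check numerically. In the sketch below (an illustration of ours, using an arbitrary straight-line curve through the simplex), the relative information $I(p(t), p(t_0))$ agrees with $\tfrac{1}{2}\, g(\dot{p}(t_0), \dot{p}(t_0))\,(t - t_0)^2$ up to higher-order terms.

import numpy as np

def relative_information(q, p):
    # I(q, p) = sum_i q_i ln(q_i / p_i)
    return np.sum(q * np.log(q / p))

p0 = np.array([0.5, 0.3, 0.2])        # a point in the interior of the simplex
v = np.array([0.02, -0.01, -0.01])    # tangent vector: components sum to zero
eps = 1e-3                            # a small time increment t - t0
p1 = p0 + eps * v                     # p(t) along a straight-line curve
second_order = 0.5 * np.sum(v ** 2 / p0) * eps ** 2
print(relative_information(p1, p0), second_order)   # agree to second order in eps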
Theorem 2. 
Suppose positive real-valued functions $P_i(t)$ obey the Lotka–Volterra equations for some continuous functions $f_i \colon (0,\infty)^n \to \mathbb{R}$. Then the square of the Fisher speed of the probability distribution $p(t) = (p_1(t), \dots, p_n(t))$ is the variance of the fitness:
$$g(\dot{p}, \dot{p}) \;=\; \mathrm{Var}(f).$$
Proof.  
Consider the square of the Fisher speed
$$g(\dot{p}, \dot{p}) \;=\; \sum_{i=1}^n \frac{\dot{p}_i^{\,2}}{p_i}$$
        and use the replicator equation
$$\dot{p}_i \;=\; \bigl(f_i - \overline{f}\,\bigr)\, p_i,$$
        obtaining
$$g(\dot{p}, \dot{p}) \;=\; \sum_{i=1}^n \bigl(f_i - \overline{f}\,\bigr)^2\, p_i \;=\; \mathrm{Var}(f)$$
        as desired.    □
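Theorem 2 is also easy to test numerically. The sketch below (ours, with a hypothetical three-type fitness function) compares the square of the Fisher speed, estimated by finite differences, with the variance in fitness.

import numpy as np

f = lambda P: np.array([1.0 - 0.01 * P.sum(), 1.2 - 0.02 * P.sum(), 0.9])
P = np.array([10.0, 5.0, 2.0])
dt = 1e-6
P_next = P + dt * f(P) * P                         # one Euler step of Equation (1)
p, p_next = P / P.sum(), P_next / P_next.sum()     # Equation (2)
p_dot = (p_next - p) / dt                          # approximate time derivative of p_i
speed_squared = np.sum(p_dot ** 2 / p)             # g(pdot, pdot)
fbar = np.dot(f(P), p)
variance = np.dot((f(P) - fbar) ** 2, p)           # Var(f), Equation (4)
print(speed_squared, variance)                     # approximately equal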
The generality of this result is remarkable. Formally, any autonomous system of first-order differential equations
$$\frac{d P_i}{dt} \;=\; F_i(P_1, \dots, P_n)$$
      can be rewritten as Lotka–Volterra equations
$$\frac{d P_i}{dt} \;=\; f_i(P_1, \dots, P_n)\, P_i$$
      simply by setting
$$f_i(P_1, \dots, P_n) \;=\; \frac{F_i(P_1, \dots, P_n)}{P_i}.$$
In general $f_i$ is undefined when $P_i = 0$, but this is not a problem if we restrict ourselves to situations where all the populations $P_i$ are positive; in these situations, Theorems 1 and 2 apply.
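As a simple illustration (our example, not one from the original text), consider a single population with constant immigration at rate a and per-capita death rate b:
$$\frac{d P_1}{dt} \;=\; a - b\, P_1 \quad\Longrightarrow\quad f_1(P_1) \;=\; \frac{a - b\, P_1}{P_1} \;=\; \frac{a}{P_1} - b.$$
Here $f_1$ is continuous on $(0,\infty)$ but blows up as $P_1 \to 0$, which illustrates why we restrict attention to positive populations.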
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Acknowledgments
This research was done at the Topos Institute. I thank Marc Harper for his invaluable continued help with this subject, and evolutionary game theory more generally. I also thank Rob Spekkens for some helpful comments.
Conflicts of Interest
The author declares no conflict of interest.
References
1. Fisher, R.A. The Genetical Theory of Natural Selection; Clarendon Press: Oxford, UK, 1930.
2. Price, G.R. Fisher’s “fundamental theorem” made clear. Ann. Hum. Genet. 1972, 36, 129–140.
3. Ewens, W.J. An interpretation and proof of the Fundamental Theorem of Natural Selection. Theor. Popul. Biol. 1989, 36, 167–180.
4. Edwards, A.W.F. The fundamental theorem of natural selection. Biol. Rev. 1994, 69, 443–474.
5. Fisher, R.A. On the mathematical foundations of theoretical statistics. Philos. Trans. A Math. Phys. Eng. Sci. 1922, 222, 309–368.
6. Akin, E. The Geometry of Population Genetics; Springer: Berlin/Heidelberg, Germany, 1979.
7. Akin, E. The differential geometry of population genetics and evolutionary games. In Mathematical and Statistical Developments of Evolutionary Theory; Lessard, S., Ed.; Springer: Berlin/Heidelberg, Germany, 1990; pp. 1–93.
8. Shahshahani, S. A new mathematical framework for the study of linkage and selection. Mem. Am. Math. Soc. 1979, 17, 211.
9. Harper, M. Information geometry and evolutionary game theory. arXiv 2009, arXiv:0911.1383.
10. Harper, M. The replicator equation as an inference dynamic. arXiv 2009, arXiv:0911.1763.
11. Amari, S. Information Geometry and Its Applications; Springer: Berlin/Heidelberg, Germany, 2016.
12. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: New York, NY, USA, 2006.
13. Baez, J.C.; Pollard, B.S. Relative entropy in biological systems. Entropy 2016, 18, 46.
14. Leinster, T. Entropy and Diversity: The Axiomatic Approach; Cambridge University Press: Cambridge, UK, 2021.
15. Baez, J.C. Information Geometry, Part 7. 2011. Available online: https://math.ucr.edu/home/baez/information (accessed on 28 October 2021).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).