Information Theory in a Darwinian Evolution Population Dynamics Model

Kwessi, Eddy

doi:10.3390/sym16111522


Department of Mathematics, Trinity University, 1 Trinity Place, San Antonio, TX 78212, USA

Symmetry 2024, 16(11), 1522; https://doi.org/10.3390/sym16111522

Submission received: 14 October 2024 / Revised: 1 November 2024 / Accepted: 11 November 2024 / Published: 13 November 2024

(This article belongs to the Section Mathematics)


Abstract

Since Darwin, evolutionary population dynamics has captivated scientists and has applications beyond biology, such as in game theory where economists use it to explore evolution in new ways. This approach has renewed interest in dynamic evolutionary systems. In this paper, we propose an information-theoretic method to estimate trait parameters in a Darwinian model for species with single or multiple traits. Using Fisher information, we assess estimation errors and demonstrate the method through simulations.

Keywords:

information theory; evolution; Fisher’s information; population dynamics

1. Introduction

Evolution can be seen as the dynamic changes in organisms’ traits, driven largely by environmental factors and occurring through mutations or random genetic variations. Darwin’s theory of natural selection ([1]) suggests that species best adapted to their environment pass beneficial traits to their offspring. Over time, advantageous traits may lead to the emergence of new species, making the tracking of trait variations central to understanding evolution.

Information theory offers tools to track such evolutionary changes. For a random variable X depending on a trait parameter $θ$, the Fisher’s information $I(θ)$ measures the information X provides about $θ$ ([2]). Since its introduction ([3]), the Fisher’s information has been widely applied, including in Bayesian statistics ([4]), to derive the Wald test ([5]), to estimate Cramér–Rao bounds ([6,7]), in optimal experimental design ([8]), in machine learning ([9]), and in epidemiology ([10]), to name a few. Notably, it connects to relative entropy (or Kullback–Leibler divergence): the Fisher’s information is the Hessian of the relative entropy $D(p_{θ} \| p_{θ'})$ with respect to $θ'$, evaluated at $θ' = θ$. In evolutionary biology, information in genomes is expected to change as environments evolve.

Our focus on evolutionary population dynamics and Fisher information builds on the work of [11], where Darwinian dynamics are explored through evolutionary game theory. They analyze a static game with n players, each choosing a strategy $θ_{i}$ to maximize their payoff $f_{i}(Θ)$, defined by

$$f_{i}(Θ), \quad i \in \{1, 2, \dots, n\},$$

where $Θ = (θ_{1}, θ_{2}, \dots, θ_{n})$. This evolves into a dynamic game represented by an ordinary differential equation

$$\dot{x}_{i} = F_{i}(x, Θ), \quad i \in \{1, 2, \dots, n\},$$

where $x = (x_{1}, x_{2}, \dots, x_{n})$ and $F_{i}$ is the instantaneous payoff for player i. In ecology, this approach adapts to n species, with $f_{i}(Θ)$ as the per capita growth rate of a species with density $x_{i}$ and trait $θ_{i}$. Ref. [12] introduced the G-function to generalize the growth rate:

$$G(x, θ, Θ)\big|_{θ = θ_{i}} = f_{i}(x, Θ), \quad i \in \{1, 2, \dots, n\},$$

where $θ$ is a “virtual variable”. This leads to an evolution equation for strategies:

$$\dot{θ}_{i} = σ^{2} g(x, θ, Θ)\big|_{θ = θ_{i}}, \quad \text{where } g(x, θ, Θ) = \frac{\partial \ln G(x, θ, Θ)}{\partial θ}.$$

Table 1 summarizes the connection between game theory and evolutionary population dynamics.

In light of this comparison, estimation of the parameters $θ_{i}$, either in a game theory context or in a population dynamics context, is of paramount importance if one aims to make meaningful predictions about the underlying model from data. Now, consider the following population dynamics system:

$$\begin{cases} \dot{x}_{i} = x_{i}\, G(x, θ, Θ)\big|_{θ = θ_{i}} \\ \dot{θ}_{i} = σ^{2} g(x, θ, Θ)\big|_{θ = θ_{i}} \end{cases},$$

where $σ^{2}$ represents the variance of the trait distribution. Stable equilibrium solutions of this system, called Evolutionarily Stable Strategies (ESSs), provide insight into stable trait configurations ([13,14]). For an ESS, G must attain a maximum at zero with respect to $θ$ ([12]). Recent work on discrete versions of these models includes studies of evolutionary stability ([15]), difference equation schemes ([16]), and applications to disease resistance ([17]). This paper addresses two questions raised in [11]:

1. Are there interpretable relations between the maximum of G, g, and $\frac{\partial g}{\partial θ} < 0$?
2. Are there similar interpretable relations for $g = 0$ and $\frac{\partial g}{\partial θ} > 0$?

Our literature review found no comprehensive answers to these questions. To explore them, we frame G in the context of random variables. For one species, if $G = G(x, θ)$ represents a density function for a random variable X depending on an unknown trait $θ$, the Fisher information quantifies the information a sample of X contains about $θ$. By analyzing G and its derivatives, we gain insights into evolutionary traits and parameter estimation. This paper develops a statistical framework for estimating species traits in Darwinian models. By minimizing or maximizing the Fisher’s information in this context, we not only address the questions above but also offer a method for estimating unknown parameters in evolution models. This paper is organized as follows: Section 2 provides an overview of Fisher information in statistics. Section 3 discusses Fisher information within discrete evolutionary population dynamics. Section 4 concludes with final remarks.

2. Review of Fisher’s Information Theory

Let $p(x, θ)$ represent the probability density function (pdf) of a random variable X, which can be continuous or discrete, on an open set $X \times Ω \subset \mathbb{R} \times \mathbb{R}^{n}$. Here, $θ = (θ_{1}, θ_{2}, \dots, θ_{n})$ denotes a single parameter or a vector of parameters. Since $p(x, θ)$ is a pdf, it must satisfy $\int_{X} p(x, θ)\, dx = 1$. If we have a nonnegative integrable function $p_{0}(x, θ)$ defined on $X$, it can be normalized to become a pdf by defining $p(x, θ) = c\, p_{0}(x, θ)$, where $c = \left(\int_{X} p_{0}(x, θ)\, dx\right)^{-1}$. In what follows, we assume the following conditions on $p(x, θ)$:

(1): Support independence: The support $\{x \in X : p(x, θ) \neq 0\}$ of p is independent of $θ$.
(2): Nonnegativity and integrability: $p(\cdot, θ)$ is nonnegative for all $θ \in Ω$ and $p(\cdot, θ) \in L^{1}(X)$.
(3): Smoothness: $p(x, \cdot) \in C^{2}(Ω)$, meaning $p(x, θ)$ is twice continuously differentiable with respect to $θ$ for all $x \in X$.

The first condition excludes cases like the uniform distribution $p(x, θ) = \frac{1}{θ}$ on $[0, θ]$, whose support varies with $θ$. The second and third conditions ensure that the log-density $f_{0}(x, θ) = \ln p(x, θ)$ (on the support), its first derivative (the score function) $f_{1}(x, θ) = \nabla_{θ} f_{0}(x, θ)$, and its second derivative $f_{2}(x, θ) = \nabla_{θ}^{2} f_{0}(x, θ)$ are well defined. We denote the expectation of a random variable X by $E[X]$.

Definition 1

(Fisher information). For a random variable X with density $p(x, θ)$ satisfying assumptions (1) to (3), the Fisher information of X is defined as

$$I(θ) = E_{X}\left[(f_{1}(X, θ))^{2}\right] = -E_{X}\left[f_{2}(X, θ)\right].$$

If θ is a vector, then $I(θ)$ is a symmetric positive semidefinite matrix (positive definite in nondegenerate cases), given by

$$I_{kl}(θ) = -E_{X}\left[\frac{\partial^{2} f_{0}(X, θ)}{\partial θ_{k}\, \partial θ_{l}}\right], \quad 1 \leq k, l \leq n.$$

The Fisher information $I(θ)$ quantifies the amount of information about $θ$ carried by X. When $θ$ is estimated from a random sample $X_{1}, X_{2}, \dots, X_{n}$ drawn from $p(x, θ)$, the Fisher information in the sample is

$$I_{n}(θ) = n\, I(θ).$$

Definition 2

(estimator and properties). Let X be a random variable depending on the parameter vector θ, and let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from X. We define the following:

Estimator: $T = T (X_{1}, X_{2}, \dots, X_{n})$ is called an estimator of θ.
Unbiased estimator: T is unbiased if $E [T] = θ$ .
Efficient estimator: An unbiased estimator T is efficient if its variance satisfies $Var (T) = \frac{1}{I_{n} (θ)}$ .

In particular, if $T = T(Θ)$ is an estimator of $Θ$ based on a sample $X_{1}, X_{2}, \dots, X_{n}$, the Cramér–Rao bound (see for instance [6,7]) gives the following lower bound for the variance of T:

$$\mathrm{Var}(T) \geq \left(\nabla_{Θ} E[T]\right)^{\top} I_{n}(Θ)^{-1} \left(\nabla_{Θ} E[T]\right).$$

(1)

Equality is obtained in (1) if T is efficient. If T is unbiased, then $E[T] = Θ$, and consequently $\nabla_{Θ} E[T]$ is the $n \times n$ identity matrix. Therefore, (1) becomes $\mathrm{Var}(T) \geq I_{n}(Θ)^{-1}$, where for a vector-valued T the variance is the covariance matrix and the inequality holds in the positive semidefinite sense. For an unbiased, efficient estimator,

$$\mathrm{Var}(T) = I_{n}(Θ)^{-1}.$$

(2)

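Example 1.

Suppose X is a random variable with density $f(x, θ) = θ e^{-θ x}$, $x > 0$, where $θ > 0$. Writing $λ = f_{0}$, $g = f_{1}$, and $h = f_{2}$ for the log-density and its first two derivatives with respect to θ, we have $λ(x, θ) = \ln(θ) - θ x$, $g(x, θ) = \frac{1}{θ} - x$, and $h(x, θ) = -\frac{1}{θ^{2}}$. It follows that $I(θ) = -E_{X}[h(X, θ)] = \frac{1}{θ^{2}}$ and $I_{n}(θ) = \frac{n}{θ^{2}}$. If we now consider a random sample $X_{1}, X_{2}, \dots, X_{n}$ from X and $T = \frac{1}{n} \sum_{i=1}^{n} X_{i}$, we note that $E[X_{i}] = \frac{1}{θ}$ and $\mathrm{Var}(X_{i}) = \frac{1}{θ^{2}}$, so that $E[T] = \frac{1}{θ}$ and $\mathrm{Var}(T) = \frac{1}{n θ^{2}}$. Thus, T is an unbiased estimator of the mean $μ = \frac{1}{θ}$ rather than of θ itself, and (1) with $\frac{d}{dθ} E[T] = -\frac{1}{θ^{2}}$ gives the bound $\frac{θ^{-4}}{n θ^{-2}} = \frac{1}{n θ^{2}}$, which T attains. Equivalently, in the mean parametrization $f(x, μ) = \frac{1}{μ} e^{-x/μ}$, one finds $I_{n}(μ) = \frac{n}{μ^{2}}$, and we verify from (2) that indeed $\mathrm{Var}(T) = \frac{μ^{2}}{n} = \frac{1}{I_{n}(μ)}$, so T is efficient for μ. Consequently, parameters with large Fisher’s information (matrix) components can be estimated accurately, whereas those with small Fisher’s information components cannot.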
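The computations in Example 1 can be checked numerically. The sketch below, which uses only the Python standard library, estimates $I(θ) = E[g(X, θ)^{2}]$ by Monte Carlo for the density $θ e^{-θ x}$ and compares the empirical variance of the sample mean with $1/(nθ^{2})$, its exact value since $\mathrm{Var}(X) = 1/θ^{2}$. The function names, the value θ = 2, the sample sizes, and the random seeds are illustrative choices, not part of the original analysis.

```python
import random
import statistics


def score(x, theta):
    """Score g(x, θ) = d/dθ [ln θ - θ x] for the exponential density θ e^{-θ x}."""
    return 1.0 / theta - x


def fisher_info_mc(theta, n_draws=200_000, seed=0):
    """Monte Carlo estimate of I(θ) = E[g(X, θ)^2]; the exact value is 1/θ^2."""
    rng = random.Random(seed)
    # random.expovariate takes the rate parameter, matching the density θ e^{-θ x}
    draws = (rng.expovariate(theta) for _ in range(n_draws))
    return statistics.fmean(score(x, theta) ** 2 for x in draws)


def sample_mean_variance(theta, n, reps=20_000, seed=1):
    """Empirical variance of T = mean of n draws; theory gives 1/(n θ^2)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(theta) for _ in range(n))
             for _ in range(reps)]
    return statistics.variance(means)


if __name__ == "__main__":
    theta, n = 2.0, 50
    print(fisher_info_mc(theta), 1 / theta**2)                 # both close to 0.25
    print(sample_mean_variance(theta, n), 1 / (n * theta**2))  # both close to 0.005
```

Reusing fixed seeds keeps the numbers reproducible; increasing `n_draws` or `reps` tightens the agreement at the cost of run time.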

3. Evolution Population Dynamics and Information Theory

3.1. Single Population Model with One Trait

Consider the following discrete evolutionary dynamical model:

$$\begin{cases} x_{t+1} = x_{t}\, G(x_{t}, θ_{t}, u_{t}) \\ θ_{t+1} = θ_{t} + σ^{2} g(x_{t}, θ_{t}, u_{t}) \end{cases}$$

(3)

where $G(x, θ, u) = b(θ) e^{-c_{u}(θ) x}$, with $b(θ) = b_{0} e^{-\frac{θ^{2}}{2 w^{2}}}$ and $c_{u}(θ) = c_{0} e^{-κ(θ - u)}$, for a constant u and for some positive constants $σ$ (speed of evolution), $b_{0}$ (initial birth rate), $c_{0}$ (competition constant), $κ$, and w (standard deviation of the distribution of birth rates). More generally, $c_{u}(θ)$ can be any differentiable function of $θ$ and $b(θ)$ any positive, differentiable function. Since $g = \frac{\partial \ln G}{\partial θ} = \frac{b'(θ)}{b(θ)} - c_{u}'(θ)\, x$, this system has nontrivial fixed points $(x^{*}, θ^{*})$ if they satisfy the equations

$$\begin{cases} 1 = b(θ) e^{-c_{u}(θ) x} \\ \frac{b'(θ)}{b(θ)} = c_{u}'(θ)\, x \end{cases}.$$

(4)

Solving the first equation for $x = \frac{\ln b(θ)}{c_{u}(θ)}$ and substituting it into the second, this reduces to the following condition on $b(θ)$ and $c_{u}(θ)$:

$$\frac{\ln b(θ)}{c_{u}(θ)} = \frac{b'(θ)}{b(θ)\, c_{u}'(θ)}.$$

(5)

The theorem below shows how to obtain the Fisher’s information of the above system as a function of the system’s parameters.

Theorem 1.

Let $Γ_{u}(θ) = \frac{b(θ)}{c_{u}(θ)}$. Then the Fisher’s information of this system is given by

$$I(θ) = \frac{1}{w^{2}} + κ^{2} Γ_{u}(θ).$$

(6)

The proof can be found in Appendix A.1.

The corollary below shows that the Fisher’s information attains a maximum value; see Figure 1 for an illustration.

Corollary 1.

The Fisher’s information $I(θ)$ attains its maximum value $I_{max}$ at $θ_{max} = w^{2} κ$, and the maximum value is

$$I_{max} = \frac{1}{w^{2}} + κ^{2} \frac{b_{0}}{c_{0}} e^{\frac{1}{2}(w κ)^{2} - κ u}.$$

(7)

The proof can be found in Appendix A.2.
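As a quick numerical sanity check of Theorem 1 and Corollary 1, the following sketch evaluates Equation (6) on a grid of θ values, using the parameter values listed for Figure 2 ($w = 3$, $κ = 3$, $b_{0} = 10$, $c_{0} = 0.5$, $u = 1$), and compares the grid maximizer with $θ_{max} = w^{2} κ$ and the value $I_{max}$ from (7). The grid range and step, as well as the function names, are arbitrary choices made for illustration.

```python
import math


def fisher_info_one_trait(theta, w, kappa, b0, c0, u):
    """Equation (6): I(θ) = 1/w^2 + κ^2 Γ_u(θ), with Γ_u(θ) = b(θ)/c_u(θ)."""
    b = b0 * math.exp(-theta**2 / (2 * w**2))   # birth rate b(θ)
    c = c0 * math.exp(-kappa * (theta - u))     # competition c_u(θ)
    return 1.0 / w**2 + kappa**2 * b / c


def corollary1_max(w, kappa, b0, c0, u):
    """Equation (7): maximizer θ_max = w^2 κ and maximum value I_max."""
    theta_max = w**2 * kappa
    i_max = 1.0 / w**2 + kappa**2 * (b0 / c0) * math.exp(0.5 * (w * kappa)**2 - kappa * u)
    return theta_max, i_max


params = dict(w=3.0, kappa=3.0, b0=10.0, c0=0.5, u=1.0)
grid = [0.01 * k for k in range(5001)]          # θ in [0, 50]
best = max(grid, key=lambda th: fisher_info_one_trait(th, **params))
theta_max, i_max = corollary1_max(**params)
print(best, theta_max)                               # grid maximizer vs. w^2 κ = 27
print(fisher_info_one_trait(best, **params) / i_max)  # ratio close to 1
```

Computing $b(θ)$ and $c_{u}(θ)$ separately mirrors the definitions in the text; for much larger grids one could instead work with the exponent $-θ^{2}/(2w^{2}) + κ(θ - u)$ directly to avoid overflow.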

In the proposition below, we give precise conditions for the existence of nontrivial fixed points of the system above.

Proposition 1.

Let

$$ξ_{1} = \frac{1}{κ^{2}} + 2 w^{2} \ln(b_{0}).$$

If $ξ_{1} < 0$, then the Darwinian system (3) does not have a nontrivial fixed point.

If $ξ_{1} = 0$, then the Darwinian system has a unique nontrivial fixed point $(x_{*}, θ_{*})$ given as

$$x_{*} = \frac{\ln(b_{0}) - \frac{θ_{*}^{2}}{2 w^{2}}}{c_{0} e^{-κ(θ_{*} - u)}}, \quad θ_{*} = -\frac{1}{κ}.$$

If $ξ_{1} > 0$, then the Darwinian system has two nontrivial fixed points $(x_{*+}, θ_{*+})$ and $(x_{*-}, θ_{*-})$ given by

$$x_{*\pm} = \frac{\ln(b_{0}) - \frac{θ_{*\pm}^{2}}{2 w^{2}}}{c_{0} e^{-κ(θ_{*\pm} - u)}}, \quad θ_{*\pm} = -\frac{1}{κ} \pm \sqrt{ξ_{1}}.$$
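The fixed points of Proposition 1 can be computed directly and plugged back into the model to confirm that $G(x_{*}, θ_{*}) = 1$ and $g(x_{*}, θ_{*}) = 0$. The sketch below does this for the Figure 2 parameter values; the helper names are hypothetical. For these values $ξ_{1} > 0$, and only the root $θ_{*+}$ yields a positive density $x_{*+}$ (the other root gives $x_{*-} < 0$, which is not biologically meaningful).

```python
import math


def G(x, theta, w, kappa, b0, c0, u):
    """G(x, θ, u) = b(θ) exp(-c_u(θ) x) for the single-trait model (3)."""
    b = b0 * math.exp(-theta**2 / (2 * w**2))
    c = c0 * math.exp(-kappa * (theta - u))
    return b * math.exp(-c * x)


def g(x, theta, w, kappa, b0, c0, u):
    """g = d ln G / dθ = -θ/w^2 + κ c_u(θ) x."""
    return -theta / w**2 + kappa * c0 * math.exp(-kappa * (theta - u)) * x


def fixed_points(w, kappa, b0, c0, u):
    """Nontrivial fixed points (x*, θ*) from Proposition 1, sorted by θ*."""
    xi1 = 1.0 / kappa**2 + 2 * w**2 * math.log(b0)
    if xi1 < 0:
        return []
    # A set collapses the two roots into one when ξ1 = 0.
    thetas = sorted({-1.0 / kappa + s * math.sqrt(xi1) for s in (1.0, -1.0)})
    points = []
    for theta in thetas:
        c = c0 * math.exp(-kappa * (theta - u))
        x = (math.log(b0) - theta**2 / (2 * w**2)) / c
        points.append((x, theta))
    return points


params = dict(w=3.0, kappa=3.0, b0=10.0, c0=0.5, u=1.0)
for x_star, theta_star in fixed_points(**params):
    print(f"θ* = {theta_star:.4f}, x* = {x_star:.4g}, "
          f"G = {G(x_star, theta_star, **params):.12f}, "
          f"g = {g(x_star, theta_star, **params):.2e}")
```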

Proposition 1 and Theorem 1 are important in that when $θ_{t} \to θ_{*}$ as $t \to \infty$, then by continuity of the function $I(θ)$ with respect to $θ$, we have $I(θ_{t}) \to I(θ_{*})$ as $t \to \infty$. This means that, over time, the Fisher’s information settles at its value at the critical point $(x_{*}, θ_{*})$ of the dynamical system. Therefore, for estimation purposes, the reciprocal of $I_{n}(θ_{*}) = n I(θ_{*})$ will be the smallest variance achievable by any unbiased estimator of the trait $θ$ based on n observations. In Figure 2, we use the following parameters: $w = 3$, $κ = 3$, $b_{0} = 10$, $c_{0} = 0.5$, $u = 1$, $σ = 5$, $n = 750$, $x_{0} = 1$, $θ_{0} = 10$.

Special case:

We will now discuss the particular case of an exponential distribution, that is, $b(θ) = c_{u}(θ) = θ$. Clearly, condition (5) is satisfied with $θ = θ_{*} := e$ and $x = x_{*} := e^{-1}$. Therefore,

$$G(x, θ) = θ e^{-θ x}, \quad λ(x, θ) = \ln(θ) - θ x, \quad g(x, θ) = \frac{1}{θ} - x, \quad h(x, θ) = -\frac{1}{θ^{2}}.$$

(8)

This implies that $I(θ) = -E_{X}[h(X, θ)] = \frac{1}{θ^{2}}$. It follows that there are two equilibrium points: the extinction equilibrium (trivial point) $E_{0} = (0, 0)$ and the interior equilibrium (nontrivial point) $E_{1} = (e^{-1}, e)$.

In Figure 3, we represent the functions $G(x, θ)$, $F(x, θ) = x\, G(x, θ)$, and $λ(x, θ)$ for $θ = \frac{3}{2}$. Since $λ = \ln G$ and the logarithm is increasing, $λ(x, θ)$ is maximized exactly where $G(x, θ)$ is maximized, providing a clue as to the relation between the maximum of G and the critical points of g and $\frac{\partial g}{\partial θ}$. Another clue can be found in Figure 4.

From a dynamical systems perspective, this means that $X_{t}$, the value of X at time t, is generated from the distribution $G(x, θ)$ and used to calculate the value of $X_{t+1}$. Therefore, the role of the first choice of $θ$ is to initialize the dynamical system. Once the system is initialized for $X_{t}$, we can use an information theory approach to provide an estimator $T(θ)$ of $θ_{t}$, the value of $θ$ at time t. That estimator, if efficient, will have variance $I_{n}(θ_{t})^{-1}$. We can then use the dynamical system (3) to estimate $θ_{t+1}$, and the Fisher’s information will provide its variance. We note that $θ_{t+1}$ will only be an estimate of the true value and will therefore carry an error as t changes. It is therefore expected that at the nontrivial critical point (ESS) $(x^{*}, θ^{*})$ of the dynamical system, the estimator $T(θ)$ converges to $θ^{*}$ and the variance of $T(θ)$ converges to $I_{n}(θ^{*})^{-1}$ as $t \to \infty$.

Figure 5 illustrates this fact for $G(x, θ) = θ e^{-θ x}$.

Indeed, we generated two random samples of size $n = 50$ (Figure 5a,b) and $n = 250$ (Figure 5c,d) with $t = 1, 2, \dots, m$, where $m = 10$ and $σ = 0.04$, from exponential distributions with respective initial parameters $θ_{0} = 0.2$ and $θ_{0} = 2.13$. We choose $T_{n}(θ_{t}) = \frac{1}{n} \sum_{i=1}^{n} X_{i}$, the sample mean, which is an efficient estimator of the mean of the exponential distribution. The dashed lines represent the respective values of $T_{n}(θ_{t})$, and the black lines represent their 95% confidence intervals

$$\left(T_{n}(θ_{t}) - 1.96\, I_{n}(θ_{t})^{-1/2},\ T_{n}(θ_{t}) + 1.96\, I_{n}(θ_{t})^{-1/2}\right),$$

where the half-width uses the standard deviation $I_{n}(θ_{t})^{-1/2}$ rather than the variance. This shows in particular that on average, $T_{n}(θ_{t})$ converges to e (dashed line), the fixed point of the dynamical system, as expected from the above. It also shows that as the Fisher’s information gets larger, the variance of the estimator gets smaller, and thus the width of the confidence interval shrinks and quickly approaches zero, as in Figure 5c,d. The conclusion is that, given an evolution dynamical system, we can estimate the evolution parameter vector $Θ$ and quantify the error of that estimate through the inverse of the Fisher’s information. Moreover, over time, the estimate converges to the ESS of the system (when it exists), and the error of the estimate becomes negligible once convergence is reached.
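A minimal sketch of a Fisher-information-based 95% interval at a single time step is given below. It assumes that the sample mean $T_{n}$ is used to estimate the exponential mean $μ = 1/θ_{t}$ of the density $θ_{t} e^{-θ_{t} x}$, for which $I_{n}(μ) = n/μ^{2}$; the unknown μ in the standard error $I_{n}(μ)^{-1/2} = μ/\sqrt{n}$ is replaced by its estimate (a plug-in choice). The coverage check, the value $θ_{t} = 2.13$ borrowed from the text, and all function names are illustrative assumptions.

```python
import math
import random
import statistics

Z95 = 1.96  # two-sided 95% standard normal quantile


def fisher_interval(sample):
    """Return (T_n, lower, upper): the sample mean and T_n ± 1.96 * I_n(μ)^(-1/2),
    with I_n(μ) = n / μ^2 evaluated at the plug-in estimate μ ≈ T_n."""
    n = len(sample)
    t_n = statistics.fmean(sample)
    se = t_n / math.sqrt(n)  # I_n(T_n)^(-1/2)
    return t_n, t_n - Z95 * se, t_n + Z95 * se


def coverage(theta, n, reps=2000, seed=42):
    """Fraction of simulated intervals that contain the true mean 1/θ."""
    rng = random.Random(seed)
    mu = 1.0 / theta
    hits = 0
    for _ in range(reps):
        _, lo, hi = fisher_interval([rng.expovariate(theta) for _ in range(n)])
        hits += lo <= mu <= hi
    return hits / reps


if __name__ == "__main__":
    print(coverage(theta=2.13, n=250))  # should be near 0.95
```

The interval is a Wald-type interval, so its coverage is only approximately 95%, with the approximation improving as n grows.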

Remark 1.

We observe that the convergence of $θ_{t}$ towards e, as predicted, depends on choosing an appropriate value of σ. Large values of σ make the system unstable: oscillations appear and grow over time; see Figure 6.
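To see why the choice of σ matters in Remark 1, one can linearize the special case $G(x, θ) = θ e^{-θ x}$, $g(x, θ) = 1/θ - x$ of system (3) at $E_{1} = (e^{-1}, e)$. Both partial derivatives of $x\, G(x, θ)$ carry the factor $1 - θ x$, which vanishes there, so the Jacobian is $\begin{pmatrix} 0 & 0 \\ -σ^{2} & 1 - σ^{2}/e^{2} \end{pmatrix}$, with eigenvalues 0 and $1 - σ^{2}/e^{2}$. Under this linearization, $E_{1}$ is locally asymptotically stable when $σ^{2} < 2e^{2}$ (that is, $σ < \sqrt{2}\, e \approx 3.84$) and unstable, with sign-alternating (oscillating) deviations, when $σ^{2} > 2e^{2}$. The sketch below, whose function names and test values (σ = 1 and σ = 4) are illustrative, iterates the map to check this.

```python
import math


def step(x, theta, sigma):
    """One iteration of system (3) with G = θ e^{-θx} and g = 1/θ - x."""
    x_next = x * theta * math.exp(-theta * x)
    theta_next = theta + sigma**2 * (1.0 / theta - x)
    return x_next, theta_next


def simulate(x0, theta0, sigma, steps):
    """Return the trajectory [(x_0, θ_0), ..., (x_steps, θ_steps)]."""
    traj = [(x0, theta0)]
    for _ in range(steps):
        traj.append(step(*traj[-1], sigma))
    return traj


def linear_rate(sigma):
    """Nonzero Jacobian eigenvalue at E1 = (1/e, e)."""
    return 1.0 - sigma**2 / math.e**2


if __name__ == "__main__":
    x, theta = simulate(1.0, 2.0, sigma=1.0, steps=500)[-1]
    print(x, theta)                            # approaches (1/e, e) ≈ (0.3679, 2.7183)
    print(linear_rate(1.0), linear_rate(4.0))  # |rate| < 1 versus |rate| > 1
```

Starting exactly at $E_{1}$ with a tiny perturbation of θ and σ = 4 shows the deviation growing by a factor of about 1.17 per step, with alternating sign.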

3.2. Discussion

Assumption (2) is important in that we only require that G be nonnegative and $G \in L^{1}(X)$, which guarantees that it can be normalized into the density of a random variable. It does not, however, guarantee that we can easily obtain a sample from it! If G happens to be a classical distribution (normal, exponential, t-distribution, Weibull, etc.), then sampling methods are already available. If G has a nonclassic expression, we may have to resort to either the Probability Integral Transform (see Theorem 2.1.10, p. 54 in [18]) or Markov Chain Monte Carlo (MCMC) to obtain a sample, and these methods can themselves be time-consuming. Obtaining an efficient estimator of $θ$ is straightforward when G is a classic distribution. While efficiency is desirable, it may not be necessary, since over time the estimator would still converge, albeit slowly, to the fixed point of $θ$. We observe that the estimates we obtain in this case are point estimates $T_{n}(θ_{t})$ (mean, median, etc.), computed at each time t. A Bayesian estimate is also possible, provided that the initial distribution of $θ_{0}$ is chosen as a well-defined Jeffreys prior. As for the questions raised in the Introduction, we can say, based on the above, that for fixed x the set of traits $θ$ where $G(x, θ)$ is maximized coincides with the set where $\ln G(x, θ)$ is maximized, since the logarithm is increasing, and that, Ω being open, every such maximizer is a critical point of $\ln G$, that is, satisfies $g(x, θ) = 0$ (with $\frac{\partial g}{\partial θ} \leq 0$ there). This can be written formally as

$$\operatorname{Arg\,max}_{θ \in Ω} G(x, θ) = \operatorname{Arg\,max}_{θ \in Ω} \ln G(x, θ) \subseteq \{θ \in Ω : g(x, θ) = 0\}.$$
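The Probability Integral Transform mentioned in this discussion can be applied numerically even when G has no classical form: normalize G on a truncated range, tabulate its cumulative distribution function, and invert it at uniform draws. The sketch below assumes the support is $[0, x_{max}]$ with negligible mass beyond $x_{max}$; `make_sampler` is a hypothetical helper, and the exponential-type G used in the check ($b e^{-c x}$ with c = 2, whose normalized mean is $1/c = 0.5$) stands in for a nonclassic G only to make the result verifiable.

```python
import bisect
import math
import random


def make_sampler(G, x_max, n_grid=20_000):
    """Tabulate the normalized CDF of a nonnegative G on [0, x_max] (trapezoid rule)
    and return a sampler that inverts it (Probability Integral Transform)."""
    dx = x_max / n_grid
    xs = [i * dx for i in range(n_grid + 1)]
    cdf = [0.0]
    for i in range(n_grid):
        cdf.append(cdf[-1] + 0.5 * (G(xs[i]) + G(xs[i + 1])) * dx)
    total = cdf[-1]
    cdf = [c / total for c in cdf]  # normalization by 1 / ∫ G

    def sample(rng):
        u = rng.random()
        k = bisect.bisect_left(cdf, u)
        k = min(max(k, 1), n_grid)
        # Linear interpolation of the inverse CDF between grid points k-1 and k
        lo, hi = cdf[k - 1], cdf[k]
        frac = 0.0 if hi == lo else (u - lo) / (hi - lo)
        return xs[k - 1] + frac * dx

    return sample


if __name__ == "__main__":
    b, c = 5.0, 2.0
    sampler = make_sampler(lambda x: b * math.exp(-c * x), x_max=20.0)
    rng = random.Random(0)
    draws = [sampler(rng) for _ in range(50_000)]
    print(sum(draws) / len(draws))  # close to 1/c = 0.5
```

Note that the constant b drops out after normalization, just as the normalizing constant c does in Section 2.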

3.3. Single Population Model with Multiple Traits

Now suppose we are in the presence of one species with density x possessing n traits given by the vector $Θ = (θ_{1}, θ_{2}, \dots, θ_{n})$, and let $U = (u_{1}, u_{2}, \dots, u_{n})$ be a given vector.

($H_{1}$) We assume that $b(Θ) = b_{0} \exp\left(-\sum_{i=1}^{n} \frac{θ_{i}^{2}}{2 w_{i}^{2}}\right)$ is proportional to the joint density of the independent traits $θ_{i}$, each with mean 0 and variance $w_{i}^{2}$.

($H_{2}$) We also assume that $c_{U}(Θ) = c_{0} \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i} - u_{i})\right)$.

($H_{3}$) We assume that the density of $x_{t}$ is given as $G(x, Θ, U) = b(Θ) \exp(-c_{U}(Θ)\, x)$ at $t = 1$.

Under $H_{1}$, $H_{2}$, and $H_{3}$, we consider the discrete dynamical system

$$\begin{cases} x_{t+1} = x_{t}\, G(x_{t}, Θ_{t}, U_{t}) \\ Θ_{t+1} = Θ_{t} + Σ\, g(x_{t}, Θ_{t}, U_{t}) \end{cases}$$

(9)

where $g = \nabla_{Θ} \ln G$ denotes the gradient of $\ln G$ with respect to Θ and

$$Σ = \begin{pmatrix} σ_{11} & σ_{12} & \cdots & σ_{1n} \\ σ_{21} & σ_{22} & \cdots & σ_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ σ_{n1} & σ_{n2} & \cdots & σ_{nn} \end{pmatrix}.$$

(10)

Remark 2.

There are a couple of distinctions between this model and the ones encountered in the recent literature; see for instance [16,19]. Firstly, the matrix Σ considered here is not necessarily symmetric ($σ_{ij} \neq σ_{ji}$ in general). Secondly, the competition function $c_{U}(Θ)$ depends on a vector U that need not be the mean of Θ, as is often assumed. We note, however, that when $Θ = U$, then $c_{U}(Θ) = c_{0}$. Ecologically, this happens when competition is maximal. This leads to recovering the uncoupled Darwinian model (2) in [19].

The result below shows how to obtain the Fisher’s information of the Darwinian dynamical system (9).

Theorem 2.

Let $Γ_{U}(Θ) = \frac{b(Θ)}{c_{U}(Θ)}$. Then, under assumptions $H_{1}$, $H_{2}$, and $H_{3}$, the dynamical system above has the Fisher’s information matrix $I(Θ)$ given as

$$I(Θ) = \begin{pmatrix} \frac{1}{w_{1}^{2}} + κ_{1}^{2} Γ_{U}(Θ) & κ_{1} κ_{2} Γ_{U}(Θ) & \cdots & κ_{1} κ_{n} Γ_{U}(Θ) \\ κ_{1} κ_{2} Γ_{U}(Θ) & \frac{1}{w_{2}^{2}} + κ_{2}^{2} Γ_{U}(Θ) & \cdots & κ_{2} κ_{n} Γ_{U}(Θ) \\ \vdots & \vdots & \ddots & \vdots \\ κ_{1} κ_{n} Γ_{U}(Θ) & κ_{2} κ_{n} Γ_{U}(Θ) & \cdots & \frac{1}{w_{n}^{2}} + κ_{n}^{2} Γ_{U}(Θ) \end{pmatrix}.$$

(11)

The proof of this result can be found in Appendix B.

Remark 3.

We observe that in the case of a high-dimensional parameter vector Θ, the theorem above gives, through the inverse $I_{n}(Θ)^{-1}$ and the Cramér–Rao bound (1), not only lower bounds on the variance of the estimator of each component of Θ (the diagonal terms) but also the corresponding covariances between pairs of estimates of different components, or traits (the off-diagonal terms). This may be important, especially to distinguish correlated and uncorrelated traits or strategies.
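Equation (11) can be written compactly as $I(Θ) = \mathrm{diag}(w_{1}^{-2}, \dots, w_{n}^{-2}) + Γ_{U}(Θ)\, κ κ^{\top}$ with $κ = (κ_{1}, \dots, κ_{n})^{\top}$, a diagonal matrix plus a rank-one term. The sketch below, a NumPy illustration rather than the author’s code, builds this matrix and inverts it to obtain the Cramér–Rao covariance bound of Remark 3 for a sample of size n; it also compares the inverse with the Sherman–Morrison formula, which applies to exactly this diagonal-plus-rank-one structure. The values Θ = (3, 5), w = (4, 1), κ = (3, 0.5), U = (3, 0), and $c_{0} = 0.1$ follow those listed for Figure 7, $b_{0} = 10$ is an assumption (it is not among the listed values), and the sample size is arbitrary.

```python
import numpy as np


def fisher_matrix(theta, U, w, kappa, b0, c0):
    """Equation (11): I(Θ) = diag(1/w_i^2) + Γ_U(Θ) κ κ^T. Returns (I, Γ_U)."""
    theta, U, w, kappa = (np.asarray(a, dtype=float) for a in (theta, U, w, kappa))
    b = b0 * np.exp(-np.sum(theta**2 / (2 * w**2)))  # b(Θ) from H1
    c = c0 * np.exp(-np.sum(kappa * (theta - U)))    # c_U(Θ) from H2
    gamma = b / c                                    # Γ_U(Θ)
    return np.diag(1.0 / w**2) + gamma * np.outer(kappa, kappa), gamma


def sherman_morrison_inverse(w, kappa, gamma):
    """Closed-form inverse of diag(1/w^2) + γ κ κ^T."""
    w, kappa = np.asarray(w, dtype=float), np.asarray(kappa, dtype=float)
    d_inv = np.diag(w**2)
    v = d_inv @ kappa
    return d_inv - gamma * np.outer(v, v) / (1.0 + gamma * kappa @ v)


def cramer_rao_cov(info, n_obs):
    """Covariance of an efficient unbiased estimator: I_n(Θ)^{-1} = (n I(Θ))^{-1}."""
    return np.linalg.inv(n_obs * info)


theta, U = [3.0, 5.0], [3.0, 0.0]
w, kappa = [4.0, 1.0], [3.0, 0.5]
info, gamma = fisher_matrix(theta, U, w, kappa, b0=10.0, c0=0.1)
bound = cramer_rao_cov(info, n_obs=100)
print(info)
print(np.allclose(np.linalg.inv(info), sherman_morrison_inverse(w, kappa, gamma)))
print(np.sqrt(np.diag(bound)))  # smallest attainable standard error per trait
print(bound[0, 1])              # covariance between the two trait estimates
```

The closed form is convenient when n is large, since it avoids a general matrix inversion.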

A necessary condition for the existence of an extinction equilibrium $(0, 0)$ for this system is that $\det(Σ) \neq 0$. Letting $ρ(A)$ represent the spectral radius of a matrix A, it was proved in [16] that if $ρ(I + Σ h(0, 0)) < 1$ and $b_{0} < 1$, then the extinction equilibrium is asymptotically stable, and that it is unstable if $ρ(I + Σ h(0, 0)) < 1$ and $b_{0} > 1$, or if $ρ(I + Σ h(0, 0)) > 1$ for all $b_{0} > 0$; here I is the identity matrix and h the Hessian of $\ln G$ with respect to Θ. This is particularly true if Σ is diagonally dominant and $σ_{12} = σ_{21} = 0$. This system admits a nontrivial fixed point (positive equilibrium) $(x_{*}, Θ_{*})$ if $G(x_{*}, Θ_{*}) = 1$ and $\det(Σ) = 0$. In the proposition below, we give a more precise characterization of the nontrivial fixed points of system (9).

Proposition 2.

Assume $\det(Σ) = 0$ and $σ_{ii} \neq 0$ for $i = 1, \dots, n$. Put

$$μ_{ij} = \frac{σ_{ij}}{σ_{ii}}, \quad ν_{j} = κ_{j} + \sum_{i = 1, i \neq j}^{n} μ_{ij} κ_{i},$$

and, given $j \in \{1, 2, \dots, n\}$,

$$ξ_{nj} := \frac{1}{ν_{j}^{2}} \left(\frac{1}{w_{j}^{2}} + \sum_{i = 1, i \neq j}^{n} \frac{μ_{ij}^{2}}{w_{i}^{2}}\right) + 2 \ln(b_{0}).$$

If $ξ_{nj} < 0$, then there is no nontrivial solution for the system (9).

Now suppose $ξ_{nj} \geq 0$. Then the system (9) has a nontrivial solution $(x_{*}, Θ_{*})$ given as

$$x_{*} = \frac{\ln(b_{0}) - \sum_{i=1}^{n} \frac{(θ_{i*})^{2}}{2 w_{i}^{2}}}{c_{0} \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i*} - u_{i})\right)},$$

(12)

and

(i): If $ξ_{nj} = 0$, then $Θ_{*} = \left(-\frac{1}{ν}, -\frac{μ_{2}}{ν}, \dots, -\frac{μ_{n}}{ν}\right)$.
(ii): If $ξ_{nj} > 0$, then the coordinates of the vector $Θ_{*}$ are points that lie on the curve of equation

$$\frac{1}{w_{j}^{2}} \left(θ_{j} + \frac{1}{ν_{j}}\right)^{2} + \sum_{i = 1, i \neq j}^{n} \frac{1}{w_{i}^{2}} \left(θ_{i} + \frac{μ_{ij}}{ν_{j}}\right)^{2} - ξ_{nj} = 0.$$

(13)

The proof can be found in Appendix C.

Special case: single species with two traits.

Here we consider the particular case of a system of one species with two traits, namely, the coupled dynamical system

$$\begin{cases} x_{t+1} = x_{t}\, G(x_{t}, θ_{1,t}, θ_{2,t}) \\ θ_{1,t+1} = θ_{1,t} + σ_{11} g_{1}(x_{t}, θ_{1,t}, θ_{2,t}) + σ_{12} g_{2}(x_{t}, θ_{1,t}, θ_{2,t}) \\ θ_{2,t+1} = θ_{2,t} + σ_{21} g_{1}(x_{t}, θ_{1,t}, θ_{2,t}) + σ_{22} g_{2}(x_{t}, θ_{1,t}, θ_{2,t}) \end{cases}.$$

(14)

The fixed points $(x_{*}, θ_{1*}, θ_{2*})$ of this model are solutions of the system of equations

$$\begin{cases} x\, G(x, u, v) = x \\ Σ\, g(x, u, v) = 0 \end{cases},$$

(15)

where

$$Σ = \begin{pmatrix} σ_{11} & σ_{12} \\ σ_{21} & σ_{22} \end{pmatrix}, \quad (u, v) \in \mathbb{R}^{2}.$$

Corollary 2.

Assume $σ_{11}, σ_{22} \neq 0$ and $σ_{11} σ_{22} = σ_{12} σ_{21}$. We put

$$μ_{1} = \frac{σ_{21}}{σ_{22}}, \quad μ_{2} = \frac{σ_{12}}{σ_{11}}, \quad ν = κ_{1} + μ_{2} κ_{2},$$

and

$$ξ_{2} := \frac{1}{ν^{2}} \left(\frac{1}{w_{1}^{2}} + \frac{μ_{2}^{2}}{w_{2}^{2}}\right) + 2 \ln(b_{0}).$$

If $ξ_{2} < 0$, then there is no nontrivial solution for the system (14).

Now suppose $ξ_{2} \geq 0$. Then the system (14) has a nontrivial solution $(x_{*}, θ_{1*}, θ_{2*})$ given as

$$x_{*} = \frac{\ln(b_{0}) - \sum_{i=1}^{n} \frac{(θ_{i*})^{2}}{2 w_{i}^{2}}}{c_{0} \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i*} - u_{i})\right)}, \quad \text{for } n = 2,$$

(16)

and

(i): If $ξ_{2} = 0$, then $(θ_{1*}, θ_{2*}) = \left(-\frac{1}{ν}, -\frac{μ_{2}}{ν}\right)$.
(ii): If $ξ_{2} > 0$, then $(θ_{1*}, θ_{2*})$ are points that lie on the ellipse of equation

$$\frac{\left(θ_{1} + \frac{1}{ν}\right)^{2}}{a^{2}} + \frac{\left(θ_{2} + \frac{μ_{2}}{ν}\right)^{2}}{b^{2}} = 1, \quad \text{where } a = w_{1} \sqrt{ξ_{2}} \text{ and } b = w_{2} \sqrt{ξ_{2}}.$$

(17)

In Figure 7, we illustrate Proposition 2 for $x_{0} = 1$, $θ_{10} = 3$, $θ_{20} = 5$, $w_{1} = 4$, $w_{2} = 1$, $κ_{1} = 3$, $κ_{2} = 0.5$, $σ_{11} = 0.1$, $σ_{21} = 1$, $σ_{11} = 1$, $σ_{12} = 2$, $σ_{22} = 2$, $u_{1} = 3$, $u_{2} = 0$, $c_{0} = 0.1$. We verify that $μ_{1} = σ_{21}/σ_{22} = 0.5$ and $μ_{2} = σ_{12}/σ_{11} = 2$. Therefore, $μ_{1} μ_{2} = 1$, that is, $\det(Σ) = 0$. We also have that $ν = 4$ and that $ξ_{2} \approx 4.86 > 0$. Hence, according to Proposition 2 above, $(θ_{1*}, θ_{2*})$ is expected to lie on the ellipse centered at $(-1/ν, -μ_{2}/ν) = (-0.25, -0.5)$ with semi-axis lengths $a = w_{1} \sqrt{ξ_{2}} \approx 8.82$ and $b = w_{2} \sqrt{ξ_{2}} \approx 2.20$.
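The quantities quoted for Figure 7 can be reproduced with a few lines of arithmetic. The sketch below assumes $b_{0} = 10$, which is not in the listed parameters but is the value consistent with $ξ_{2} \approx 4.86$ (and the value used for Figure 2), and uses $σ_{11} = 1$, the value consistent with $μ_{2} = 2$. It computes $μ_{1}$, $μ_{2}$, ν, $ξ_{2}$, the center and semi-axes of ellipse (17), and a few points on it; the function name is hypothetical.

```python
import math


def corollary2_ellipse(s11, s12, s21, s22, k1, k2, w1, w2, b0):
    """Quantities from Corollary 2; requires det(Σ) = 0 and s11, s22 != 0."""
    assert math.isclose(s11 * s22, s12 * s21), "Σ must be singular"
    mu1, mu2 = s21 / s22, s12 / s11
    nu = k1 + mu2 * k2
    xi2 = (1.0 / w1**2 + mu2**2 / w2**2) / nu**2 + 2.0 * math.log(b0)
    if xi2 < 0:
        return None  # no nontrivial fixed point
    center = (-1.0 / nu, -mu2 / nu)
    a, b = w1 * math.sqrt(xi2), w2 * math.sqrt(xi2)
    return dict(mu1=mu1, mu2=mu2, nu=nu, xi2=xi2, center=center, a=a, b=b)


res = corollary2_ellipse(s11=1.0, s12=2.0, s21=1.0, s22=2.0,
                         k1=3.0, k2=0.5, w1=4.0, w2=1.0, b0=10.0)
print(res)
# A few points (θ1, θ2) on ellipse (17), parametrized by an angle t
cx, cy = res["center"]
for t in (0.0, math.pi / 2, math.pi):
    print(cx + res["a"] * math.cos(t), cy + res["b"] * math.sin(t))
```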

The parameters for Figure 8 are the same except for $θ_{10} = 3$, $θ_{20} = 1$, and $x_{0}$, which is generated from an exponential distribution with parameter $c_{U}(Θ) = c_{0} \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i0} - u_{i})\right)$.

3.4. Discussion

An important remark about Theorem 2 is that we assume the vector U is given. In fact, if the vector $Θ$ were given, the same technique could have been used for the estimation of U, up to a negative sign on the Fisher’s information matrix. Allowing U to differ from the average of $Θ$ provides generality, in that $Θ - U$ represents the difference between a set of traits $Θ$ and a given set of traits U, which need not be the average of $Θ$. One thing we have not emphasized in this paper is the type of estimator of $Θ$ itself. We do not need to specify which estimator to use, since the inverse of the Fisher’s information is the smallest variance achievable by any unbiased estimator. One consequence of Proposition 2 above is that once we have estimated $Θ_{*}$, we can deduce the value of $x_{*}$ from (12). The results above also show that there may be many equilibria when $ξ_{nj} > 0$. In the context of evolution and natural selection, one should focus on equilibria that ensure better adaptation to environmental fluctuations. This specifically means adding stochasticity to the model by means of, say, a Wiener process and finding traits that ensure, for example, that on average, the species density is bounded away from the extinction equilibrium. Another possibility could be to focus on equilibria that maximize the species density in order to increase the prospects of survival of the species. This can be performed using a constrained optimization problem

$$\max_{Θ \in \mathbb{R}^{n}} f(Θ), \quad \text{where } f(Θ) = x_{*},$$

subject to the constraint that the point Θ lie on the curve defined in Equation (13). In the particular case where $n = 2$ with $Θ = (u, v)$, we have

$$f(u, v) := \frac{\ln(b_{0}) - \frac{u^{2}}{2 w_{1}^{2}} - \frac{v^{2}}{2 w_{2}^{2}}}{c_{0} \exp\left(-κ_{1}(u - u_{1}) - κ_{2}(v - u_{2})\right)}.$$

However, the form of the function $f(u, v)$ makes this a challenging problem. Rewriting, we have

$$f(u, v) = c_{0}^{-1} \left(\ln(b_{0}) - \frac{u^{2}}{2 w_{1}^{2}} - \frac{v^{2}}{2 w_{2}^{2}}\right) e^{κ_{1}(u - u_{1}) + κ_{2}(v - u_{2})}.$$

We observe that for positive $κ_{1}$ and $κ_{2}$, the function $(u, v) \mapsto e^{κ_{1}(u - u_{1}) + κ_{2}(v - u_{2})}$ is positive and increasing in u and v. Therefore, $f(u, v)$ has the same sign as

$$f_{*}(u, v) = \ln(b_{0}) - \frac{u^{2}}{2 w_{1}^{2}} - \frac{v^{2}}{2 w_{2}^{2}},$$

(18)

so a positive equilibrium density requires $f_{*}(u, v) > 0$, and a simplified version of the problem is to maximize $f_{*}$ instead of f, keeping in mind that the increasing exponential factor can shift the maximizer of f. Geometrically, the graph of $f_{*}$ is a downward-opening paraboloid whose level curves are ellipses, so the simplified constrained problem amounts to intersecting these level curves with the ellipse (17). Since two distinct ellipses meet in at most four points, there can be between 0 and 4 points of intersection for each level. Finally, let us observe that the expression of $x^{*}$ in the case of multiple traits is just a generalization of the case of one trait. In fact, the first equation in the system (4) can be written as $x^{*} = \frac{\ln b(θ)}{c_{u}(θ)}$. We see that this is similar to the expression of $x^{*}$ given in Equation (A3) for two traits, which naturally generalizes to the case of one species with $n \geq 2$ traits.
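Rather than relying on the paraboloid simplification, the constrained problem can also be attacked by brute force: parametrize ellipse (17) by an angle t and evaluate f(u, v) along it. The sketch below does this for the Figure 7 setting, again assuming $b_{0} = 10$ and reading the reference traits in the exponent of f as $u_{1} = 3$ and $u_{2} = 0$; the grid size and the helper names are illustrative, and a finer grid or a one-dimensional optimizer could be substituted.

```python
import math

W1, W2, K1, K2 = 4.0, 1.0, 3.0, 0.5   # w_1, w_2, κ_1, κ_2 (Figure 7)
U1, U2, C0, B0 = 3.0, 0.0, 0.1, 10.0  # u_1, u_2, c_0 and the assumed b_0
NU, MU2 = 4.0, 2.0                    # ν and μ_2 from Corollary 2
XI2 = (1 / W1**2 + MU2**2 / W2**2) / NU**2 + 2 * math.log(B0)


def f(u, v):
    """Equilibrium density x* as a function of the traits (u, v)."""
    f_star = math.log(B0) - u**2 / (2 * W1**2) - v**2 / (2 * W2**2)
    return f_star * math.exp(K1 * (u - U1) + K2 * (v - U2)) / C0


def ellipse_point(t):
    """Point of ellipse (17) at angle t."""
    a, b = W1 * math.sqrt(XI2), W2 * math.sqrt(XI2)
    return -1 / NU + a * math.cos(t), -MU2 / NU + b * math.sin(t)


def maximize_on_ellipse(n_grid=100_000):
    """Grid search for the point of the ellipse with the largest x*."""
    ts = (2 * math.pi * k / n_grid for k in range(n_grid))
    return max((ellipse_point(t) for t in ts), key=lambda p: f(*p))


u_best, v_best = maximize_on_ellipse()
print(u_best, v_best, f(u_best, v_best))
```

Because the constraint set is a closed curve, a dense angular grid is a simple and robust first pass; a local refinement around the best grid point would sharpen the result.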

4. Conclusions

In this paper, we introduced a method for estimating trait coefficients in a Darwinian evolution population dynamics model by employing the Fisher information matrix. This approach enables us not only to estimate traits effectively but also to characterize the uncertainty inherent in the estimation process. Our study focuses on two specific cases: a single species with one trait and a single species with multiple traits. Extending this framework to scenarios involving multiple species with one or more traits represents an intriguing avenue for future research, offering the potential to understand interspecies interactions in greater detail.

An essential contribution of our work is the proposed relationship between the G-function, its natural logarithm, and its derivative g. This relationship facilitates a more nuanced understanding of evolutionary dynamics and provides a clearer basis for examining trait adaptation over time. As a by-product, our method yields a precise characterization of the nontrivial fixed points of the model. Specifically, we demonstrated that once the critical density $x^{*}$ is determined, the set of critical traits $Θ_{*}$ aligns along a well-defined curve in $\mathbb{R}^{n}$. Conversely, if the set of critical traits $Θ_{*}$ is known, there may exist a unique critical density $x_{*}$ for the Darwinian system. Notably, this density may not necessarily ensure the survival of the species, adding complexity to our understanding of stable evolutionary states.

In addition to the traditional approach, we explored the potential of using modern machine learning techniques for trait parameter estimation. Traits could be estimated by minimizing relative information, such as the Kullback–Leibler divergence, within a Darwinian evolution population model. This could be achieved via either classical gradient descent or stochastic gradient descent methods, with careful consideration given to selecting appropriate weights for the minimization process. Both approaches would need to be tailored to suit supervised or unsupervised learning environments, depending on the data availability and modeling goals.

Looking ahead, an extension of this work could involve introducing stochasticity into the model, for example, by incorporating a Wiener process. Such a modification would allow us to examine strong persistence on average, study the existence of global solutions, and explore stationary distributions within the population dynamics framework. This stochastic extension would provide a richer understanding of the model’s behavior under real-world conditions, where randomness and environmental variability play a significant role in shaping evolutionary outcomes.

Funding

This research was funded by an AMS-Simons Research Enhancement for PUI grant.

Data Availability Statement

The data used in this paper are all simulated.

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Appendix A.1. Proof of Theorem 1

Proof.

We have that

λ (x, θ) = ln (G (x, θ)) = ln (b_{0}) - \frac{θ^{2}}{2 w^{2}} - c_{0} x e^{- κ (θ - u)}

.

It follows that

\begin{matrix} g (x, θ) & = & - \frac{θ}{w^{2}} + κ c_{0} x e^{- κ (θ - u)} \\ = & - \frac{θ}{w^{2}} + κ c_{u} (θ) x, \end{matrix}

and

h (x, θ) = - \frac{1}{w^{2}} - κ^{2} c_{0} x e^{- κ (θ - u)} .

From Definition 1 above, it follows that

I (θ) = - E_{X} [h (X, θ)] = - E_{X} [- \frac{1}{w^{2}} - κ^{2} c_{u} (θ) X] = \frac{1}{w^{2}} + κ^{2} c_{u} (θ) E [X] .

We observe that $X$ has probability distribution $G(x, θ) = b(θ) e^{-c_{u}(θ) x}$; therefore,

\begin{aligned} E[X] &= \int_{0}^{\infty} x G(x, θ)\, dx \\ &= \frac{b(θ)}{c_{u}(θ)} \int_{0}^{\infty} x\, c_{u}(θ) e^{-c_{u}(θ) x}\, dx \\ &= \frac{b(θ)}{c_{u}(θ)^{2}} . \end{aligned}

It then follows that

I(θ) = \frac{1}{w^{2}} + κ^{2} c_{u}(θ) \frac{b(θ)}{c_{u}(θ)^{2}} = \frac{1}{w^{2}} + κ^{2} \frac{b(θ)}{c_{u}(θ)} = \frac{1}{w^{2}} + κ^{2} Γ(θ) . \tag{A1}

This ends the proof of the Theorem. □
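The key step of the proof is $\int_{0}^{\infty} x G(x, θ)\, dx = b(θ)/c_{u}(θ)^{2}$. The sketch below checks it, and hence Equation (A1), by midpoint-rule quadrature. It assumes $b(θ) = b_{0} e^{-θ^{2}/(2w^{2})}$ and $c_{u}(θ) = c_{0} e^{-κ(θ - u)}$, as in the proof, and borrows $w$, $κ$, $u$, and $b_{0}/c_{0}$ from the caption of Figure 1, with $c_{0} = 1$ chosen for illustration; the function names are hypothetical.

```python
import math


def b_theta(theta, w, b0):
    """b(theta) = b0 * exp(-theta^2 / (2 w^2))."""
    return b0 * math.exp(-theta**2 / (2 * w**2))


def c_u(theta, kappa, u, c0):
    """c_u(theta) = c0 * exp(-kappa (theta - u))."""
    return c0 * math.exp(-kappa * (theta - u))


def mean_x_quadrature(b, c, n=100_000):
    """Midpoint rule for int_0^inf x * b * exp(-c x) dx, truncated where exp(-c x) < e^-50."""
    dx = (50.0 / c) / n
    return dx * sum((k + 0.5) * dx * b * math.exp(-c * (k + 0.5) * dx) for k in range(n))


def fisher_info_closed_form(theta, w, kappa, u, b0, c0):
    """Equation (A1): I(theta) = 1/w^2 + kappa^2 b(theta) / c_u(theta)."""
    return 1 / w**2 + kappa**2 * b_theta(theta, w, b0) / c_u(theta, kappa, u, c0)


def fisher_info_quadrature(theta, w, kappa, u, b0, c0):
    """-E[h(X, theta)] = 1/w^2 + kappa^2 c_u(theta) E[X], with E[X] computed numerically."""
    b, c = b_theta(theta, w, b0), c_u(theta, kappa, u, c0)
    return 1 / w**2 + kappa**2 * c * mean_x_quadrature(b, c)


# w, kappa, u and b0/c0 as in the caption of Figure 1; c0 = 1 is an illustrative choice.
w, kappa, u, b0, c0 = 7.0, 0.5, 0.02, 1e-4, 1.0
for theta in (0.0, 3.0, 10.0):
    print(theta, fisher_info_closed_form(theta, w, kappa, u, b0, c0),
          fisher_info_quadrature(theta, w, kappa, u, b0, c0))
```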

Appendix A.2. Proof of Corollary 1

Suppose that

\frac{b_{0}}{c_{0}} = \frac{1}{w κ^{2} \sqrt{2π}} e^{κ u - \frac{(w κ)^{2}}{2}} .

From Equation (A1) above, we have that

I (θ) = \frac{1}{w^{2}} + κ^{2} \frac{b_{0}}{c_{0}} e^{- \frac{θ^{2}}{2 w^{2}} + κ (θ - u)} .

Completing the square, we have that

\begin{aligned} -\frac{θ^{2}}{2 w^{2}} + κ(θ - u) &= -\frac{1}{2 w^{2}} \left[θ^{2} - 2 w^{2} κ θ + 2 w^{2} κ u\right] \\ &= -\frac{1}{2 w^{2}} \left[(θ - w^{2} κ)^{2} - w^{4} κ^{2} + 2 w^{2} κ u\right] \\ &= -\frac{(θ - w^{2} κ)^{2}}{2 w^{2}} + \frac{(w κ)^{2}}{2} - κ u . \end{aligned}

It follows that

I(θ) = \frac{1}{w^{2}} + κ^{2} \frac{b_{0}}{c_{0}} e^{\frac{(w κ)^{2}}{2} - κ u} e^{-\frac{(θ - w^{2} κ)^{2}}{2 w^{2}}} .

Substituting the given expression for $\frac{b_{0}}{c_{0}}$, it follows that

I(θ) = \frac{1}{w^{2}} + \frac{1}{w \sqrt{2π}} e^{-\frac{(θ - w^{2} κ)^{2}}{2 w^{2}}} .
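A quick numerical check of Corollary 1, assuming $b_{0}/c_{0} = e^{κu - (wκ)^{2}/2}/(w κ^{2}\sqrt{2π})$, the ratio for which the last display holds exactly. Under that assumption, $I(θ) - 1/w^{2}$ is the $\mathcal{N}(w^{2}κ, w^{2})$ density, so $I$ should peak at $θ = w^{2}κ$ with maximum $1/w^{2} + 1/(w\sqrt{2π})$. The parameter values and function names below are hypothetical.

```python
import math


def fisher_info_a1(theta, w, kappa, u, ratio):
    """Equation (A1) with Gamma(theta) = (b0/c0) exp(-theta^2/(2 w^2) + kappa (theta - u))."""
    return 1 / w**2 + kappa**2 * ratio * math.exp(-theta**2 / (2 * w**2) + kappa * (theta - u))


def fisher_info_gaussian(theta, w, kappa):
    """Closed form of Corollary 1: 1/w^2 plus the N(w^2 kappa, w^2) density at theta."""
    return 1 / w**2 + math.exp(-(theta - w**2 * kappa)**2 / (2 * w**2)) / (w * math.sqrt(2 * math.pi))


w, kappa, u = 2.0, 0.5, 0.1  # hypothetical values
ratio = math.exp(kappa * u - (w * kappa)**2 / 2) / (w * kappa**2 * math.sqrt(2 * math.pi))

# Grid search for the maximizer; Corollary 1 predicts theta = w^2 kappa = 2.
grid = [i / 100 for i in range(-500, 1001)]
theta_max = max(grid, key=lambda t: fisher_info_a1(t, w, kappa, u, ratio))
print(theta_max, fisher_info_a1(theta_max, w, kappa, u, ratio))
```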

Appendix B. Proof of Theorem 2

Proof.

Let $b(Θ) = b_{0} \exp\left(-\sum_{i=1}^{n} \frac{θ_{i}^{2}}{2 w_{i}^{2}}\right)$ and $c_{U}(Θ) = c_{0} \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i} - u_{i})\right)$.

We have that

\begin{aligned} G(x, Θ) &= b_{0} \exp\left(-\sum_{i=1}^{n} \frac{θ_{i}^{2}}{2 w_{i}^{2}} - c_{0} x \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i} - u_{i})\right)\right) \\ &= b(Θ) e^{-c_{U}(Θ) x}, \end{aligned}

from which we can deduce

\begin{aligned} λ(x, Θ) &= \ln b_{0} - \sum_{i=1}^{n} \frac{θ_{i}^{2}}{2 w_{i}^{2}} - c_{0} x \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i} - u_{i})\right) \\ &= \ln b(Θ) - x c_{U}(Θ) . \end{aligned}

It follows that $g(x, Θ)$ is the vector given by

g(x, Θ) := \left(g_{i}(x, Θ)\right)_{i = 1, \dots, n} = \left(-\frac{θ_{i}}{w_{i}^{2}} + c_{0} κ_{i} x \exp\left(-\sum_{j=1}^{n} κ_{j}(θ_{j} - u_{j})\right)\right)_{i = 1, \dots, n} = \left(-\frac{θ_{i}}{w_{i}^{2}} + κ_{i} c_{U}(Θ) x\right)_{i = 1, \dots, n} .

Hence, $h(x, Θ)$ is the $n \times n$ matrix given by

h(x, Θ) = \begin{pmatrix} g_{11}(x, Θ) & g_{12}(x, Θ) & \cdots & g_{1n}(x, Θ) \\ g_{21}(x, Θ) & g_{22}(x, Θ) & \cdots & g_{2n}(x, Θ) \\ \vdots & \vdots & \ddots & \vdots \\ g_{n1}(x, Θ) & g_{n2}(x, Θ) & \cdots & g_{nn}(x, Θ) \end{pmatrix},

where, for $i = 1, 2, \dots, n$, we have

g_{ii}(x, Θ) = \frac{\partial^{2} λ}{\partial θ_{i}^{2}} = -\frac{1}{w_{i}^{2}} - c_{0} κ_{i}^{2} x \exp\left(-\sum_{k=1}^{n} κ_{k}(θ_{k} - u_{k})\right) = -\frac{1}{w_{i}^{2}} - κ_{i}^{2} c_{U}(Θ) x ,

and, for $i \neq j$,

g_{ij}(x, Θ) = \frac{\partial^{2} λ}{\partial θ_{i} \partial θ_{j}} = -c_{0} κ_{i} κ_{j} x \exp\left(-\sum_{k=1}^{n} κ_{k}(θ_{k} - u_{k})\right) = -κ_{i} κ_{j} c_{U}(Θ) x .

We can deduce that $I(Θ)$ is the $n \times n$ matrix given by

I(Θ) = \begin{pmatrix} I_{11}(Θ) & I_{12}(Θ) & \cdots & I_{1n}(Θ) \\ I_{21}(Θ) & I_{22}(Θ) & \cdots & I_{2n}(Θ) \\ \vdots & \vdots & \ddots & \vdots \\ I_{n1}(Θ) & I_{n2}(Θ) & \cdots & I_{nn}(Θ) \end{pmatrix},

where

I_{i i} (Θ) = - E_{X} [\frac{\partial^{2} λ (X, Θ)}{\partial θ_{i}^{2}}] = \frac{1}{w_{i}^{2}} + κ_{i}^{2} c_{U} (Θ) E [X] .

Since $X$ has distribution $G(x, Θ)$, we have that

\begin{aligned} E[X] &= \int_{0}^{\infty} x G(x, Θ)\, dx \\ &= \frac{b(Θ)}{c_{U}(Θ)} \int_{0}^{\infty} x\, c_{U}(Θ) e^{-c_{U}(Θ) x}\, dx \\ &= \frac{b(Θ)}{c_{U}(Θ)} \cdot \frac{1}{c_{U}(Θ)} . \end{aligned}

Therefore, we have that

I_{i i} (Θ) = \frac{1}{w_{i}^{2}} + κ_{i}^{2} \frac{b (Θ)}{c_{U} (Θ)} = \frac{1}{w_{i}^{2}} + κ_{i}^{2} Γ_{U} (Θ) .

Likewise, for $i \neq j$,

I_{ij}(Θ) = -E_{X}\left[\frac{\partial^{2} λ(X, Θ)}{\partial θ_{i} \partial θ_{j}}\right] = κ_{i} κ_{j} c_{U}(Θ) E[X] = κ_{i} κ_{j} \frac{b(Θ)}{c_{U}(Θ)} = κ_{i} κ_{j} Γ_{U}(Θ) .

□
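Because $λ(x, Θ)$ is linear in $x$, $-E_{X}[h(X, Θ)]$ equals minus the Hessian in $Θ$ of $λ(\bar{x}, Θ)$, with $\bar{x} = E[X] = b(Θ)/c_{U}(Θ)^{2}$ held fixed. The sketch below uses this observation to compare the closed form of Theorem 2 with a central-difference Hessian. All function names and parameter values are illustrative assumptions.

```python
import math


def b_and_c(theta, w, kappa, u, b0, c0):
    """b(Theta) and c_U(Theta) as defined in Appendix B."""
    b = b0 * math.exp(-sum(t**2 / (2 * wi**2) for t, wi in zip(theta, w)))
    c = c0 * math.exp(-sum(k * (t - ui) for k, t, ui in zip(kappa, theta, u)))
    return b, c


def log_growth(x, theta, w, kappa, u, b0, c0):
    """lambda(x, Theta) = ln b(Theta) - x c_U(Theta)."""
    b, c = b_and_c(theta, w, kappa, u, b0, c0)
    return math.log(b) - x * c


def fisher_matrix(theta, w, kappa, u, b0, c0):
    """Theorem 2: I_ii = 1/w_i^2 + kappa_i^2 Gamma_U, I_ij = kappa_i kappa_j Gamma_U."""
    b, c = b_and_c(theta, w, kappa, u, b0, c0)
    gamma = b / c
    n = len(theta)
    return [[(1 / w[i]**2 if i == j else 0.0) + kappa[i] * kappa[j] * gamma
             for j in range(n)] for i in range(n)]


def fisher_matrix_fd(theta, w, kappa, u, b0, c0, eps=1e-4):
    """Minus the central-difference Hessian of lambda(E[X], Theta), E[X] fixed at b/c^2."""
    b, c = b_and_c(theta, w, kappa, u, b0, c0)
    mean_x = b / c**2
    n = len(theta)

    def f(i, si, j, sj):
        th = list(theta)
        th[i] += si * eps
        th[j] += sj * eps
        return log_growth(mean_x, th, w, kappa, u, b0, c0)

    return [[-(f(i, 1, j, 1) - f(i, 1, j, -1) - f(i, -1, j, 1) + f(i, -1, j, -1)) / (4 * eps**2)
             for j in range(n)] for i in range(n)]


theta, w, kappa, u = [0.5, -0.3], [1.0, 2.0], [0.4, 0.7], [0.1, 0.2]  # hypothetical
print(fisher_matrix(theta, w, kappa, u, b0=2.0, c0=1.0))
print(fisher_matrix_fd(theta, w, kappa, u, b0=2.0, c0=1.0))
```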

Appendix C. Proof of Proposition 2 and Corollary 2

Proof.

The proof of Proposition 2 is a straightforward generalization of the two-trait case.

First, assume that there are two traits.

We have that

\det (Σ) = 0 ⟺ σ_{11} σ_{22} = σ_{12} σ_{21} ⟺ μ_{1} μ_{2} = 1 .

The system in (14) has a nontrivial solution if $Σ g(x, Θ) = 0$, that is,

\begin{cases} g_{1}(x, θ_{1}, θ_{2}) + μ_{2} g_{2}(x, θ_{1}, θ_{2}) = 0, \\ μ_{1} g_{1}(x, θ_{1}, θ_{2}) + g_{2}(x, θ_{1}, θ_{2}) = 0 . \end{cases} \tag{A2}

We will show in the sequel that either equation in (A2) can be used to characterize the solutions $(θ_{1}, θ_{2})$. For $n = 2$, let $H = c_{0} \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i*} - u_{i})\right)$. From the first equation in (15), we have

\begin{aligned} G(x_{*}, θ_{1*}, θ_{2*}) = 1 &\iff b_{0} \exp\left(-\sum_{i=1}^{n} \frac{θ_{i*}^{2}}{2 w_{i}^{2}} - c_{0} x_{*} \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i*} - u_{i})\right)\right) = 1 \\ &\iff b_{0} = \exp\left(\sum_{i=1}^{n} \frac{θ_{i*}^{2}}{2 w_{i}^{2}} + c_{0} x_{*} \exp\left(-\sum_{i=1}^{n} κ_{i}(θ_{i*} - u_{i})\right)\right) \\ &\iff b_{0} = \exp\left(\sum_{i=1}^{n} \frac{θ_{i*}^{2}}{2 w_{i}^{2}} + x_{*} H\right) \\ &\iff \ln b_{0} = \sum_{i=1}^{n} \frac{θ_{i*}^{2}}{2 w_{i}^{2}} + x_{*} H . \end{aligned}

It therefore follows that

x_{*} = \frac{\ln b_{0} - \sum_{i=1}^{n} \frac{θ_{i*}^{2}}{2 w_{i}^{2}}}{H} . \tag{A3}

Next, with $ν = κ_{1} + μ_{2} κ_{2}$, we define

ξ_{2} = \frac{1}{ν^{2}}\left(\frac{1}{w_{1}^{2}} + \frac{μ_{2}^{2}}{w_{2}^{2}}\right) + 2 \ln b_{0} .

Therefore, for a solution $(x, θ_{1}, θ_{2})$ of (15), we have

\begin{aligned} g_{1}(x, θ_{1}, θ_{2}) + μ_{2} g_{2}(x, θ_{1}, θ_{2}) &= -\frac{θ_{1}}{w_{1}^{2}} + x κ_{1} H + μ_{2}\left(-\frac{θ_{2}}{w_{2}^{2}} + x κ_{2} H\right) \\ &= -\left(\frac{θ_{1}}{w_{1}^{2}} + μ_{2} \frac{θ_{2}}{w_{2}^{2}}\right) + x H (κ_{1} + μ_{2} κ_{2}) \\ &= -\left(\frac{θ_{1}}{w_{1}^{2}} + μ_{2} \frac{θ_{2}}{w_{2}^{2}}\right) + ν\left(\ln b_{0} - \frac{θ_{1}^{2}}{2 w_{1}^{2}} - \frac{θ_{2}^{2}}{2 w_{2}^{2}}\right) \\ &= -\frac{ν}{2 w_{1}^{2}} θ_{1}^{2} - \frac{1}{w_{1}^{2}} θ_{1} - \frac{ν}{2 w_{2}^{2}} θ_{2}^{2} - \frac{μ_{2}}{w_{2}^{2}} θ_{2} + ν \ln b_{0} \\ &= -\frac{ν}{2 w_{1}^{2}}\left(θ_{1}^{2} + \frac{2}{ν} θ_{1}\right) - \frac{ν}{2 w_{2}^{2}}\left(θ_{2}^{2} + \frac{2 μ_{2}}{ν} θ_{2}\right) + ν \ln b_{0} \\ &= -\frac{ν}{2 w_{1}^{2}}\left(θ_{1} + \frac{1}{ν}\right)^{2} + \frac{ν}{2 w_{1}^{2}} \frac{1}{ν^{2}} - \frac{ν}{2 w_{2}^{2}}\left(θ_{2} + \frac{μ_{2}}{ν}\right)^{2} + \frac{ν}{2 w_{2}^{2}} \frac{μ_{2}^{2}}{ν^{2}} + ν \ln b_{0} \\ &= -\frac{ν}{2 w_{1}^{2}}\left(θ_{1} + \frac{1}{ν}\right)^{2} - \frac{ν}{2 w_{2}^{2}}\left(θ_{2} + \frac{μ_{2}}{ν}\right)^{2} + \frac{1}{2 w_{1}^{2} ν} + \frac{μ_{2}^{2}}{2 w_{2}^{2} ν} + ν \ln b_{0} . \end{aligned}

Dividing the latter by $\frac{ν}{2}$, we have that

\begin{aligned} g_{1}(x, θ_{1}, θ_{2}) + μ_{2} g_{2}(x, θ_{1}, θ_{2}) = 0 &\iff -\frac{1}{w_{1}^{2}}\left(θ_{1} + \frac{1}{ν}\right)^{2} - \frac{1}{w_{2}^{2}}\left(θ_{2} + \frac{μ_{2}}{ν}\right)^{2} + \frac{1}{w_{1}^{2} ν^{2}} + \frac{μ_{2}^{2}}{w_{2}^{2} ν^{2}} + 2 \ln b_{0} = 0 \\ &\iff \frac{1}{w_{1}^{2}}\left(θ_{1} + \frac{1}{ν}\right)^{2} + \frac{1}{w_{2}^{2}}\left(θ_{2} + \frac{μ_{2}}{ν}\right)^{2} - ξ_{2} = 0 . \end{aligned}
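The completing-the-square step can be checked numerically. The sketch below takes $x$ from Equation (A3) for an arbitrary $(θ_{1}, θ_{2})$, so that $G = 1$, and compares $g_{1} + μ_{2} g_{2}$ with $-\frac{ν}{2}\left[\frac{(θ_{1} + 1/ν)^{2}}{w_{1}^{2}} + \frac{(θ_{2} + μ_{2}/ν)^{2}}{w_{2}^{2}} - ξ_{2}\right]$. It assumes $ν = κ_{1} + μ_{2} κ_{2}$ and $g_{i}(x, Θ) = -θ_{i}/w_{i}^{2} + κ_{i} c_{U}(Θ) x$ as in Appendix B; the parameter values and function names are hypothetical.

```python
import math


def nu_xi2(p):
    """nu = k1 + mu2 k2 and xi_2 as defined in the proof."""
    nu = p["k1"] + p["mu2"] * p["k2"]
    xi2 = (1 / p["w1"]**2 + p["mu2"]**2 / p["w2"]**2) / nu**2 + 2 * math.log(p["b0"])
    return nu, xi2


def combined_score(theta1, theta2, p):
    """Return (g1 + mu2 g2 at x = x_* from (A3), completed-square closed form)."""
    H = p["c0"] * math.exp(-(p["k1"] * (theta1 - p["u1"]) + p["k2"] * (theta2 - p["u2"])))
    x = (math.log(p["b0"]) - theta1**2 / (2 * p["w1"]**2) - theta2**2 / (2 * p["w2"]**2)) / H
    g1 = -theta1 / p["w1"]**2 + p["k1"] * H * x
    g2 = -theta2 / p["w2"]**2 + p["k2"] * H * x
    nu, xi2 = nu_xi2(p)
    closed = -(nu / 2) * ((theta1 + 1 / nu)**2 / p["w1"]**2
                          + (theta2 + p["mu2"] / nu)**2 / p["w2"]**2 - xi2)
    return g1 + p["mu2"] * g2, closed


def ellipse_point(t, p):
    """Point on the ellipse centered at (-1/nu, -mu2/nu) with semi-axes a = w1 sqrt(xi2), b = w2 sqrt(xi2)."""
    nu, xi2 = nu_xi2(p)
    a, b = p["w1"] * math.sqrt(xi2), p["w2"] * math.sqrt(xi2)
    return -1 / nu + a * math.cos(t), -p["mu2"] / nu + b * math.sin(t)


params = dict(w1=1.5, w2=2.0, k1=0.4, k2=0.6, u1=0.1, u2=0.2, mu2=0.8, b0=3.0, c0=0.5)
print(combined_score(1.2, -0.7, params))
print(combined_score(*ellipse_point(1.0, params), params))  # first entry should be ~0
```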

Clearly, if $ξ_{2} < 0$, there is no solution to $g_{1}(x, θ_{1}, θ_{2}) + μ_{2} g_{2}(x, θ_{1}, θ_{2}) = 0$.

If $ξ_{2} = 0$, then

g_{1}(x, θ_{1}, θ_{2}) + μ_{2} g_{2}(x, θ_{1}, θ_{2}) = 0 \iff (θ_{1}, θ_{2}) = \left(-\frac{1}{ν}, -\frac{μ_{2}}{ν}\right) .

If $ξ_{2} > 0$, we let $a = w_{1} \sqrt{ξ_{2}}$ and $b = w_{2} \sqrt{ξ_{2}}$. It follows that

g_{1}(x, θ_{1}, θ_{2}) + μ_{2} g_{2}(x, θ_{1}, θ_{2}) = 0 \iff \frac{(θ_{1} + \frac{1}{ν})^{2}}{a^{2}} + \frac{(θ_{2} + \frac{μ_{2}}{ν})^{2}}{b^{2}} = 1 .

This is the ellipse centered at $(θ_{1}^{0}, θ_{2}^{0}) = \left(-\frac{1}{ν}, -\frac{μ_{2}}{ν}\right)$ with semi-axes of lengths $a$ and $b$ along the $θ_{1}$- and $θ_{2}$-directions, respectively. Similarly, we define

ξ_{*} = \frac{1}{ν_{*}^{2}}\left(\frac{μ_{1}^{2}}{w_{1}^{2}} + \frac{1}{w_{2}^{2}}\right) + 2 \ln b_{0}, \quad \text{where } ν_{*} = μ_{1} κ_{1} + κ_{2} .

If $ξ_{*} > 0$, then

μ_{1} g_{1}(x, θ_{1}, θ_{2}) + g_{2}(x, θ_{1}, θ_{2}) = 0 \iff \frac{(θ_{1} + \frac{μ_{1}}{ν_{*}})^{2}}{a_{*}^{2}} + \frac{(θ_{2} + \frac{1}{ν_{*}})^{2}}{b_{*}^{2}} = 1 .

This is the ellipse centered at $(θ_{1*}^{0}, θ_{2*}^{0}) = \left(-\frac{μ_{1}}{ν_{*}}, -\frac{1}{ν_{*}}\right)$ with semi-axes of lengths $a_{*} = w_{1} \sqrt{ξ_{*}}$ and $b_{*} = w_{2} \sqrt{ξ_{*}}$.

We observe that the two ellipses are the same:

(1): They have identical centers. Indeed, since $\det(Σ) = 0$ implies that $μ_{1} μ_{2} = 1$, we have $ν = κ_{1} + μ_{2} κ_{2} = μ_{2}(μ_{1} κ_{1} + κ_{2}) = μ_{2} ν_{*}$, and hence $\frac{1}{ν} = \frac{1}{μ_{2} ν_{*}} = \frac{μ_{1}}{ν_{*}}$. Likewise, $\frac{1}{ν_{*}} = \frac{μ_{2}}{ν}$. This proves that $(θ_{1}^{0}, θ_{2}^{0}) = (θ_{1*}^{0}, θ_{2*}^{0})$, that is, the two centers are identical.
(2): They have the same semi-axes. Indeed, we have

\begin{aligned} \frac{1}{ν^{2}}\left(\frac{1}{w_{1}^{2}} + \frac{μ_{2}^{2}}{w_{2}^{2}}\right) &= \frac{1}{μ_{2}^{2} ν_{*}^{2}}\left(\frac{1}{w_{1}^{2}} + \frac{μ_{2}^{2}}{w_{2}^{2}}\right) \\ &= \frac{1}{ν_{*}^{2}}\left(\frac{1}{μ_{2}^{2} w_{1}^{2}} + \frac{1}{w_{2}^{2}}\right) \\ &= \frac{1}{ν_{*}^{2}}\left(\frac{μ_{1}^{2}}{w_{1}^{2}} + \frac{1}{w_{2}^{2}}\right), \quad \text{since } μ_{1} μ_{2} = 1 . \end{aligned}

This implies that $ξ_{2} = ξ_{*}$, and thus $a = a_{*}$ and $b = b_{*}$.
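A short numerical illustration, with hypothetical parameter values, that the two ellipses share their center and their $ξ$ when $μ_{1} μ_{2} = 1$ (that is, when $\det(Σ) = 0$), and generally differ otherwise. The function names are illustrative only.

```python
import math


def ellipse_first(w1, w2, k1, k2, mu2, b0):
    """Center and xi_2 of the ellipse from g1 + mu2 g2 = 0, with nu = k1 + mu2 k2."""
    nu = k1 + mu2 * k2
    xi = (1 / w1**2 + mu2**2 / w2**2) / nu**2 + 2 * math.log(b0)
    return (-1 / nu, -mu2 / nu), xi


def ellipse_second(w1, w2, k1, k2, mu1, b0):
    """Center and xi_* of the ellipse from mu1 g1 + g2 = 0, with nu_* = mu1 k1 + k2."""
    nu_s = mu1 * k1 + k2
    xi = (mu1**2 / w1**2 + 1 / w2**2) / nu_s**2 + 2 * math.log(b0)
    return (-mu1 / nu_s, -1 / nu_s), xi


w1, w2, k1, k2, b0 = 1.5, 2.0, 0.4, 0.6, 3.0  # hypothetical values
mu2 = 0.8
print(ellipse_first(w1, w2, k1, k2, mu2, b0))
print(ellipse_second(w1, w2, k1, k2, 1 / mu2, b0))  # det(Sigma) = 0 gives mu1 = 1/mu2
```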

To generalize, we note that $Σ g(x, Θ) = 0$ implies that, for each $1 \leq j \leq n$,

\sum_{i=1}^{n} σ_{ji} g_{i}(x, Θ) = 0 .

Since we assume that $σ_{ii} \neq 0$, we may, without loss of generality, take $j = 1$ and divide by $σ_{11}$. Then

\sum_{i=1}^{n} σ_{1i} g_{i}(x, Θ) = 0 \iff g_{1}(x, Θ) + μ_{12} g_{2}(x, Θ) + \dots + μ_{1n} g_{n}(x, Θ) = 0, \quad \text{where } μ_{1i} = \frac{σ_{1i}}{σ_{11}} .

Rearranging the terms and completing the squares, we obtain the result as announced. □

References

Darwin, C. On the Origin of Species by Means of Natural Selection; John Murray: London, UK, 1859.
Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1998.
Fisher, R.A. On the Mathematical Foundations of Theoretical Statistics. Philos. Trans. R. Soc. Lond. Ser. A 1922, 222, 309–368.
Bernardo, J.M.; Smith, A.F.M. Bayesian Theory; John Wiley & Sons: Hoboken, NJ, USA, 1994.
Ward, M.D.; Ahlquist, J. Maximum Likelihood for Social Science: Strategies for Analysis; Cambridge University Press: Cambridge, UK, 2015.
Rao, C.R. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–89.
Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946.
Smith, K. On the Standard Deviations of Adjusted and Interpolated Values of an Observed Polynomial Function and its Constants and the Guidance they give Towards a Proper Choice of the Distribution of Observations. Biometrika 1918, 12, 1–85.
Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T. Overcoming Catastrophic Forgetting in Neural Networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
Parag, K.; Donnelly, C.A.; Zarebski, A.E. Quantifying the Information in Noisy Epidemic Curves. Nat. Comput. Sci. 2022, 2, 584–594.
Vincent, T.L.; Vincent, T.L.S.; Cohen, Y. Darwinian Dynamics and Evolutionary Game Theory. J. Biol. Dyn. 2011, 5, 215–226.
Vincent, T.L.; Brown, J.S. Evolutionary Game Theory, Natural Selection, and Darwinian Dynamics; Cambridge University Press: Cambridge, UK, 2005.
Maynard Smith, J.; Price, G.R. The Logic of Animal Conflict. Nature 1973, 246, 15–18.
Maynard Smith, J. Evolution and the Theory of Games; Cambridge University Press: Cambridge, UK, 1982.
Ackleh, A.S.; Cushing, J.M.; Salceanu, P.L. On the dynamics of evolutionary competition models. Nat. Resour. Model. 2015, 28, 380–397.
Cushing, J.M. Difference Equations as Models of Evolutionary Population Dynamics. J. Biol. Dyn. 2019, 13, 103–127.
Cushing, J.M.; Park, J.; Farrell, A.; Chitnis, N. Treatment of outcome in an SI Model with Evolutionary Resistance: A Darwinian Model for the Evolutionary Resistance. J. Biol. Dyn. 2023, 17.
Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Cengage: Boston, MA, USA, 2002.
Elaydi, S.; Kang, Y.; Luis, R. The effects of Evolution on the Stability of Competing Species. J. Biol. Dyn. 2022, 16, 816–839.

Figure 1. The blue curve represents the Fisher’s information in Equation (6), with its maximum value shown by the red dashed line, for $w = 7$, $κ_{1} = 0.5$, $u = 0.02$, and $b_{0}/c_{0} = 10^{-4}$.

Figure 2. (a) Time series of $x_{t}$, converging to $x^{*} \approx 20.789 \times 10^{5}$ (blue dashed line). (b) Time series of $θ_{t}$, converging to $θ_{*+} \approx 6.113$ (blue dashed line). (c) Time series of the Fisher’s information $I(θ_{t})$, converging to $I(θ_{*+}) \approx 345.43 \times 10^{5}$ (red dashed line). (d) Plot of $I(θ_{t})$ versus $θ_{t}$, showing that the Fisher’s information is maximized once the fixed point $θ_{*+}$ is reached, as indicated by the intersection of the blue and red dashed lines.

Figure 3. This figure shows $F(x, θ) := x G(x, θ)$ in blue, $G(x, θ)$ in black, and $λ(x, θ)$ in red for $θ = 1.5$. The green dots mark the intersections of the vertical line $x = 1/θ = 2/3$ with these curves. We observe that $G(x, θ)$ is maximized at the same point $x$ where $λ(x, θ)$ is minimized (green dots), and vice versa (red dots).

Figure 4. (a) Time series of $x_{t}$ in the special case above, converging to $x^{*} = e^{-1}$ (blue dashed line). (b) Time series of $θ_{t}$, converging to $θ_{*+} = e$ (blue dashed line). (c) Time series of the Fisher’s information $I_{n}(θ_{t})$, converging to $I(θ_{*}) \approx 0.125$ (red dashed line) as $t \to \infty$. (d) Plot of $I(θ_{t})$ versus $θ_{t}$, showing that the Fisher’s information is maximized once the fixed point $θ_{*+} = e$ is reached, as indicated by the intersection of the blue and red dashed lines.

Figure 5. Values of $T_{n}(θ_{t})$ (dashed line) for two different starting values of $θ$, each with its 95% confidence band (shaded area). In each case, $T_{n}(θ_{t})$ converges to $e$ (light dashed line) as $t \to \infty$. When $n$ is relatively small, as in (a,b), the confidence bands are relatively wide. When $n$ is large, as in (c,d), $I_{n}(θ)^{-1}$ is smaller, and so is the width of the confidence bands.

Figure 6. Time series of the Darwinian model when $σ$ is large. The oscillations indicate that the critical point (blue dashed line) is unstable.

Figure 7. (a) The solid curve shows the dynamics of $x_{t}$ in system (14), and the dashed line shows the nontrivial equilibrium point $x_{*}$. The blue line shows the value of $x_{*}$ given by Equation (A3), computed with $θ_{1*} \approx 3.873$ and $θ_{2*} \approx 1.873$, the nontrivial fixed points obtained from the last two equations in (14). The agreement between the blue and dashed lines numerically confirms the first part of Proposition 2. (b,c) The solid curves show the dynamics of $θ_{1,t}$ and $θ_{2,t}$, respectively, and the dashed lines show the nontrivial fixed points $(θ_{1*}, θ_{2*})$. (d) The blue curve is the ellipse given in Equation (17), with center $(-1/ν, -μ_{2}/ν)$, and the red dot is the nontrivial fixed point $(θ_{1*}, θ_{2*})$. This point lies almost exactly on the ellipse (the small discrepancy is due to accumulated numerical error), which is consistent with the second part of Proposition 2.

Figure 8. (a) In black, 100 trajectories of $x_{t}$ with starting points drawn at random from an exponential distribution with parameter $c_{u}(θ)$. The red curve is their average over time, which converges to $x_{*} \approx 1.417$. (b) In light blue, the 95% confidence bands for the corresponding trajectories of $θ_{1,t}$. The red curve is their average, and all trajectories converge to $θ_{1*} \approx 3.083$. These confidence bands are constructed from the Fisher’s information as $\bar{θ}_{1,t} \pm 1.96 / \sqrt{I_{11}(Θ_{t})}$, where $\bar{θ}_{1,t}$ is the average at time $t$. (c) Similarly, in light blue, the 95% confidence bands for $θ_{2,t}$, with the corresponding sample average in red; they converge to $θ_{2*} \approx 1.083$. (d) The ellipse given in Equation (17). We verify that the point $(θ_{1*}, θ_{2*})$ lies on the ellipse and that the value of $x^{*}$ obtained from Equation (A3) equals the limit of $x_{t}$.
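The band $\bar{θ}_{1,t} \pm 1.96/\sqrt{I_{11}(Θ_{t})}$ of Figure 8(b) can be sketched as follows. As an illustration, this assumes that $I_{11}$ is taken from Theorem 2 with $n = 2$ and evaluated at the across-run average of $Θ_{t}$; the caption does not specify this detail, and the function names, parameters, and trajectories are hypothetical.

```python
import math


def i11(theta1, theta2, p):
    """I_11(Theta) = 1/w1^2 + k1^2 Gamma_U(Theta), from Theorem 2 with n = 2."""
    b = p["b0"] * math.exp(-theta1**2 / (2 * p["w1"]**2) - theta2**2 / (2 * p["w2"]**2))
    c = p["c0"] * math.exp(-(p["k1"] * (theta1 - p["u1"]) + p["k2"] * (theta2 - p["u2"])))
    return 1 / p["w1"]**2 + p["k1"]**2 * b / c


def theta1_bands(runs1, runs2, p, z=1.96):
    """For each t, return (lower, mean, upper) = mean_1 -/+ z / sqrt(I_11(mean_1, mean_2))."""
    bands = []
    for t in range(len(runs1[0])):
        m1 = sum(run[t] for run in runs1) / len(runs1)
        m2 = sum(run[t] for run in runs2) / len(runs2)
        half = z / math.sqrt(i11(m1, m2, p))
        bands.append((m1 - half, m1, m1 + half))
    return bands


params = dict(w1=1.5, w2=2.0, k1=0.4, k2=0.6, u1=0.1, u2=0.2, b0=3.0, c0=0.5)
runs1 = [[3.0, 3.05, 3.08], [3.2, 3.1, 3.083]]  # hypothetical theta_1 trajectories
runs2 = [[1.0, 1.05, 1.08], [1.2, 1.1, 1.083]]  # hypothetical theta_2 trajectories
for band in theta1_bands(runs1, runs2, params):
    print(band)
```

Since $I_{11}(Θ) > 1/w_{1}^{2}$, each band is narrower than $\pm 1.96\, w_{1}$.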

Table 1. Interpretation of game-theory variables in population dynamics.

Parameter	Game Theory	Population Dynamics
$x_{i}$	Gain of player i	Density of population i
$θ_{i}$	Strategy of player i	Trait value of population i
$σ^{2}$	Variance of distribution of strategies	Variance of population’s traits
$f_{i} (x, Θ)$	Instantaneous payoff of player i	Per capita growth rate of species with density $x_{i}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
