New Method of Randomized Forecasting Using Entropy-Robust Estimation: Application to the World Population Prediction

Popkov, Yuri S.; Dubnov, Yuri A.; Popkov, Alexey Yu.

doi:10.3390/math4010016


by Yuri S. Popkov ^1,2,3,*, Yuri A. Dubnov ^1,2 and Alexey Yu. Popkov ^1,2

¹ Institute for Systems Analysis of Russian Academy of Sciences, Moscow 117312, Russia

² Moscow Institute of Physics and Technology (State University), Moscow 141700, Russia

³ National Research University Higher School of Economics, Moscow 101000, Russia

^* Author to whom correspondence should be addressed.

Mathematics 2016, 4(1), 16; https://doi.org/10.3390/math4010016

Submission received: 13 October 2015 / Revised: 25 January 2016 / Accepted: 4 March 2016 / Published: 11 March 2016


Abstract:

We propose a new method of randomized forecasting (RF-method), which operates with models described by systems of linear ordinary differential equations with random parameters. The RF-method is based on entropy-robust estimation of the probability density functions (PDFs) of model parameters and measurement noises. The entropy-optimal estimator uses a limited amount of data. The method of randomized forecasting is applied to World population prediction. Ensembles of entropy-optimal prognostic trajectories of World population and their probability characteristics are generated. We show the potential advantages of the proposed method in comparison with existing methods.

Keywords:

entropy; randomized model; randomized forecasting; the exponential World population model

1. Introduction

For a studied process, forecasting as a procedure consists of four consecutive stages: modeling (model design), learning (estimation of model characteristics), testing (of the “learned” model) and prediction of future development.

Forecasting is based on retrospective data analysis with its subsequent extrapolation to future periods. Consider the state of a studied process at moment $t_{0}$, and suppose that the problem is to forecast further evolution of the process on a time interval $T_{frc} = [t_{0}, T]$. Then, it is necessary to use existing data on its past dynamics on a time interval $T_{rts} = [T^{-}, t_{0}]$, where $T^{-} < t_{0}$ (the so-called retrospective data). Generally, the retrospective data and the time interval $T_{rts}$ are divided into two groups, namely data serving for the estimation of the model’s characteristics (on a time interval $T_{est} = [T^{-}, t_{e}]$) and data serving for model testing (on a time interval $T_{tst} = [t_{e}, t_{0}]$).

There exist at least three forecasting techniques, which differ in the degree of objectivity of the constructed forecasts. The first technique, referred to as scenario forecasting [1], proceeds from the scenario approach, in which objectivity is replaced by the opinion of an expert group. Actually, it implements only the stages of modeling and prediction: learning and testing are replaced by the opinion of experts, who choose an appropriate mathematical model of a studied process and form value sets (scenarios) of the model parameters. Then, the model with the scenario parameter values generates forecasting trajectories. As a matter of fact, this forecasting technique is the most widespread in demographic prediction [2,3,4]. Note that real retrospective data about a studied process are not used directly. They are indirectly reflected by the knowledge and experience of the invited experts.

The second technique of forecasting explicitly involves real data for model learning and testing. The framework of mathematical statistics provides numerous estimation methods for model parameters; see [5,6,7,8]. Here, a major assumption is that the model possesses deterministic parameters, the values of which are defined using sets of real retrospective data. The latter are treated as a stochastic object with certain properties (a sample from a universe, normal distribution, etc.). In this case, one may assign different probabilistic characteristics (variances, confidence intervals, and so on) to the derived estimates of model parameters.

Sometimes, these characteristics assist in constructing the probabilistic characteristics of forecasting trajectories. The described technique will be termed probabilistic forecasting (PF) [9,10,11,12,13,14]. We emphasize that the above hypotheses regarding the stochastic properties of real datasets are almost impossible to verify, especially for small data arrays. This leads to the low efficiency of forecasts [15,16].

Finally, the third technique of forecasting proposed in this paper stems from the randomized model (RM) of a studied process, where model parameters are supposed random. Hence, we characterize RMs using the probability density functions (PDFs) of their parameters. At the learning stage, the PDFs of the model parameters are estimated on the basis of real retrospective data.

A randomized model generates an ensemble of forecasting trajectories, where each trajectory corresponds to a set of random realizations of parameters with the derived estimates of the PDFs. Computer simulation of such models employs the Monte Carlo method. Below, this technique will be called randomized forecasting (RF).

As opposed to existing methods [17,18,19], the proposed method of randomized forecasting is based on entropy-optimal estimations of PDFs for real datasets. The structure of the randomized dynamic model, used in this method, is based on ordinary differential equations.

The developed method serves for obtaining randomized predictions of the World population dynamics. Modeling the World population variations in time and space forms a major problem of demographic analysis [20,21,22].

Throughout the paper, we describe the above-mentioned dynamics by the exponential model incorporating several parameters associated with fertility and mortality rates, as well as their change in time. In the randomized setting, these parameters are assumed random, whereas World population is measured with random errors. To find the corresponding PDFs, the method involves the retrospective population data provided by the UN (see the UNdata service at https://data.un.org/). In addition, we perform a comparative analysis of the PF- and RF-based approaches.

2. Randomized Model: Linear Differential Form

Consider a dynamic object having an input $f (t) = \{f_{1} (t), \dots, f_{m} (t)\}$ and an output $x (t) = \{x_{1} (t), \dots, x_{n} (t)\}$. The components of the input and output vectors can be observed (measured) on a time interval $T_{rts} = [T^{-}, t_{0}]$.

The relationship between the input and output of the object is described by the linear nonautonomous system of ordinary differential equations:

\begin{matrix} \frac{d x (t)}{d t} = A x (t) + B f (t), x (T^{-}) = x^{0} \end{matrix}

(1)

where $x \in R^{n}$; $f \in R^{m}$, $m \leq n$; $A = [a_{ij} \,|\, i, j = \overline{1, n}]$ and $B = [b_{ik} \,|\, i = \overline{1, n}, k = \overline{1, m}]$ denote matrices of appropriate dimensions.

The object’s output is observed with inevitable disturbances modeled by a vector noise $\bar{ξ} (t) = \{ξ_{1} (t), \dots, ξ_{n} (t)\}$. Therefore, the observed output of the model acquires the form:

\begin{matrix} v (t) = x (t) + \bar{ξ} (t) \end{matrix}

(2)

where $v (t) \in R^{n}$.

Equations (1) and (2) define a linear dynamical randomized model (LDRM) if:

the matrix A is a random matrix (with independent random elements or elements with independent random components) of the interval type:

$\begin{matrix} \mathcal{A} = [A : A^{-} \leq A \leq A^{+}] \end{matrix}$

(3)

where $A^{-}, A^{+}$ are given matrices;
there exists a probability density function (PDF) $P (A), A \in \mathcal{A}$;
the vector $\bar{ξ}$ is a random vector (i.e., contains independent random components) of the interval type:

$\begin{matrix} Ξ = [\bar{ξ} : {\bar{ξ}}^{-} \leq \bar{ξ} \leq {\bar{ξ}}^{+}] \end{matrix}$

(4)

there exists a probability density function (PDF) $Q (\bar{ξ}), \bar{ξ} \in Ξ$;
the matrix B possesses fixed known elements.

Under the stated conditions, the LDRM generates an ensemble of random trajectories on the time interval $\mathcal{T} = [T^{-}, T]$.

Let us rewrite the LDRM state Equation (1) in the input-output representation using the notion of the matrix exponential [23]:

\begin{matrix} W (A | t - τ) = exp [A (t - τ)] \end{matrix}

(5)

The input and output are measured at discrete moments with step h. Hence, on the time interval $T_{est}$, we have:

\begin{matrix} x [T^{-} + i h] & = & W (A | i h) x^{0} + \\ + & \int_{T^{-}}^{T^{-} + i h} W (A | T^{-} + i h - τ) B f (τ) d τ, \\ i = \overline{0, N_{est}} \end{matrix}

(6)

Here, $N_{est} = [(t_{e} - T^{-}) / h]$ and $[•]$ indicates the integer part of •.
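As a small worked example of the index count, the sketch below evaluates $N_{est}$ and the observation moments $T^{-} + i h$. It assumes the setting used later for World population (estimation data from 1960 to 1995 with a five-year step); the function name `n_est` is hypothetical.

```python
import math

def n_est(t_minus: float, t_e: float, h: float) -> int:
    """Integer part of (t_e - T^-) / h, i.e. the last index on the estimation interval."""
    if h <= 0:
        raise ValueError("step h must be positive")
    return math.floor((t_e - t_minus) / h)

# Assumed setting: T^- = 1960, t_e = 1995, h = 5 years
N = n_est(1960, 1995, 5)
# Observation moments T^- + i*h for i = 0..N_est
moments = [1960 + i * 5 for i in range(N + 1)]
```

With these assumed values, the index runs over eight observation moments, which agrees with the range $i \in [0, 7]$ used on the estimation interval in Section 5.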

The LDRM output Equation (2), observed at discrete moments, has the following form:

\begin{matrix} v [T^{-} + i h] = x [T^{-} + i h] + \bar{ξ} [T^{-} + i h], i = \bar{0, N_{e s t}} \end{matrix}

(7)

Let us denote ${\bar{ξ}}^{(i)} = \bar{ξ} [T^{-} + i h], i = \overline{0, N_{est}}$. They are random vectors with independent components of the interval type. Now, we introduce the block-vector $\hat{ξ} = \{{\bar{ξ}}^{(0)}, \dots, {\bar{ξ}}^{(N_{est})}\}$. Since these vectors and their components are assumed independent, the joint PDF is:

\begin{matrix} Q (\hat{ξ}) = \prod_{i = 0}^{N_{e s t}} Q_{i} ({\bar{ξ}}^{(i)}) = \prod_{i = 0}^{N_{e s t}} \prod_{j = 1}^{n} q_{i j} (ξ_{j}^{(i)}) \end{matrix}

(8)

The domain of this function is:

\begin{matrix} \hat{Ξ} = \underbrace{Ξ \times Ξ \times \dots \times Ξ}_{(N_{est} + 1) \text{ factors}} \end{matrix}

(9)

Here, $\bar{ξ} (T^{-}), \bar{ξ} (T^{-} + h), \dots, \bar{ξ} (T^{-} + N_{est} h)$ is a sequence of n-dimensional independent random vectors of the interval type with the corresponding PDFs.

Since the matrix A and the noise vector $\hat{ξ}$ are random and characterized by the PDFs $P (A)$ and $Q (\hat{ξ})$, respectively, the observed output of the LDRM represents an ensemble $\mathcal{V}$ of random trajectories $v [T^{-} + i h], i = \overline{0, N_{est}}$.

3. $S_{PQ}^{1}$ Entropy-Robust Estimation

The first stage of randomized forecasting (RF) is the estimation of the PDFs of the RM parameters and of the measurement noises. This is a classical problem of the Bayesian approach, and the classical maximum likelihood method (or maximum likelihood ratio method; see [6]) exists for solving it.

Let us recall the definition of the likelihood ratio function (FRL) in the terms of Section 2. The FRL is built from the a priori PDFs $P^{0} (A), Q^{0} (\hat{ξ})$ and the a posteriori PDFs $P (A), Q (\hat{ξ})$. It takes the form:

\begin{matrix} L (A) = ln \frac{P (A)}{P^{0} (A)}, L (\hat{ξ}) = ln \frac{Q (\hat{ξ})}{Q^{0} (\hat{ξ})} \end{matrix}

(10)

If the PDFs in these expressions are restored as functions of the parameters, then maximization of these functions gives “optimal” estimations. The principle of maximization of the FRL takes the form:

\begin{matrix} \hat{A} = arg max_{A} L (A) \end{matrix}

(11)

As a declaration, this principle is fine. However, how is it possible to restore the PDFs $P (A)$ and $P^{0} (A)$ (and also $Q (\hat{ξ})$ and $Q^{0} (\hat{ξ})$)? The problem of restoring the PDFs remains outside the scope of the FRL.

Let us consider the likelihood ratio functional (FuRL) in the following form:

\begin{matrix} L [P (A)] = \int_{A} P (A) ln \frac{P (A)}{P^{0} (A)} d A, L [Q (\hat{ξ})] = \int_{\hat{Ξ}} Q (\hat{ξ}) ln \frac{Q (\hat{ξ})}{Q^{0} (\hat{ξ})} d \hat{ξ} \end{matrix}

(12)

From Equation (12), we can see that the FuRL is the mathematical expectation of the FRL. On the other hand, the FuRL coincides with the Kullback–Leibler distance, so it is the negative of the generalized Boltzmann information entropy [24,25], that is:

\begin{matrix} H_{A} [P (A)] = - L [P (A)], H [Q (\hat{ξ})] = - L [Q (\hat{ξ})] \end{matrix}

(13)

According to [26], maximization of entropy functions gives the best robust solution under high uncertainty. This idea, with the addition of balance conditions on real data, forms the basis of the $S_{PQ}^{1}$ entropy-robust estimation method [27].

The $S_{PQ}^{1}$ entropy-robust estimation can be reformulated as a problem of functional nonlinear programming [28,29]:

\begin{matrix} - H [P (A), Q (\hat{ξ})] & = & \int_{\mathcal{A}} P (A) ln P (A) d A + \\ + & \int_{\hat{Ξ}} Q (\hat{ξ}) ln Q (\hat{ξ}) d \hat{ξ} \Rightarrow min \end{matrix}

(14)

subject to the constraints imposed on:

-: the class of (normalized) PDFs:

$\begin{matrix} \int_{A} P (A) d A = 1 \\ \int_{\hat{Ξ}} Q (\hat{ξ}) d \hat{ξ} = 1 \end{matrix}$

(15)

-: the balance between the first moment vector of the observed output $v [T^{-} + i h] = v^{(i)}$ in the LDRM Equation (7) and the real data vector $y [T^{-} + i h] = y^{(i)}$ :

$\begin{matrix} M {v^{(i)}} & = & {\bar{v}}^{(i)} [P (A), Q (\hat{ξ})] = \int_{A} u^{(i)} (A) P (A) d A + \\ + & \int_{\hat{Ξ}} {\bar{ξ}}^{(i)} Q (\hat{ξ}) d \hat{ξ} = y^{(i)}, i \in \bar{0, N_{e s t}} \end{matrix}$

(16)

where:

$\begin{matrix} u^{(i)} (A) = W (A | i h) x^{0} + \int_{T^{-}}^{T^{-} + i h} W (A | T^{-} + i h - τ) B f (τ) d τ \end{matrix}$

(17)

$S_{PQ}^{1}$ entropy-robust estimation uses the first moment vector of the LDRM output. Moments of higher order can also be used, depending on which real data are measurable. If the data represent the k-moment, then the balance condition can be formulated in the following form:

\begin{matrix} {(M \{v^{k}\})}^{1 / k} = y \end{matrix}

(18)

where $v^{k}$ is the vector of k-moment components. In this case, we will have $S_{PQ}^{k}$ entropy-robust estimation.

Problem (14)–(17) is related to the Lyapunov problem [28,29], in which the goal functional and the constraints are of an integral type. Here, we will use the necessary condition for the following integral to be equal to zero:

\begin{matrix} \int_{X} h (x) g (x) d x = 0 \end{matrix}

(19)

where the function $h (x)$ is continuous and equal to zero on the boundary of the set X (the class $\tilde{C}$), and the function $g (x)$ is differentiable (the class $D$). This equality is valid for every function $h (x)$ with the mentioned properties only if:

\begin{matrix} g (x) = 0, \forall x \in X \end{matrix}

(20)

Now, we return to the problem Equations (14)–(17) and introduce the Lagrange functional:

\begin{matrix} L [P (A), Q (\hat{ξ})] & = & - H [P (A), Q (\hat{ξ})] + λ (\int_{A} P (A) d A - 1) + μ (\int_{\hat{Ξ}} Q (\hat{ξ}) d \hat{ξ} - 1) + \\ + & \sum_{i = 0}^{N_{e s t}} 〈 {\bar{θ}}^{(i)}, m^{(i)} [P (A), Q (\hat{ξ})] 〉 \end{matrix}

(21)

where:

\begin{matrix} m^{(i)} [P (A), Q (\hat{ξ})] = {\bar{v}}^{(i)} [P (A), Q (\hat{ξ})] - y^{(i)} \end{matrix}

(22)

The sign $〈 •, • 〉$ denotes a scalar product.

Since the solution of problem (14)–(17) is sought in the class of differentiable functions, the Gâteaux derivative can be used to determine the variation of the Lagrange functional (21).

Let us denote the solution of problem (14)–(17) as $P^{*} (A)$ and $Q^{*} (\hat{ξ})$. Furthermore, introduce the functions $ϕ (A) \in \tilde{C}, ψ (\hat{ξ}) \in D$ and two scalar variables $α, β$ to present the functions $P (A), Q (\hat{ξ})$ in the following form:

\begin{matrix} P (A) = P^{*} (A) + α ϕ (A), Q (\hat{ξ}) = Q^{*} (\hat{ξ}) + β ψ (\hat{ξ}) \end{matrix}

(23)

The functions $P^{*} (A), Q^{*} (\hat{ξ})$, as solutions of problem (14)–(17), are fixed. The optimality conditions for problem (14)–(17) take the form:

\begin{matrix} {\frac{d L}{d α}|}_{α = β = 0} = 0, {\frac{d L}{d β}|}_{α = β = 0} = 0 \end{matrix}

(24)

The application of these conditions leads to the following systems of integral equations:

\begin{matrix} \int_{\mathcal{A}} ϕ (A) (ln P^{*} (A) + 1 + λ + \sum_{i = 0}^{N_{est}} 〈 {\bar{θ}}^{(i)}, u^{(i)} (A) 〉) d A & = & 0 \\ \int_{\hat{Ξ}} ψ (\hat{ξ}) (ln Q^{*} (\hat{ξ}) + 1 + μ + \sum_{i = 0}^{N_{est}} 〈 {\bar{θ}}^{(i)}, {\bar{ξ}}^{(i)} 〉) d \hat{ξ} & = & 0 \end{matrix}

(25)

According to Equations (19) and (20), we obtain the following equations, which are the necessary optimality conditions (conditions of stationarity of the Lagrange functional) for problem (14)–(17):

\begin{matrix} ln P^{*} (A) + 1 + λ + \sum_{i = 0}^{N_{est}} 〈 {\bar{θ}}^{(i)}, u^{(i)} (A) 〉 & = & 0 \\ ln Q^{*} (\hat{ξ}) + 1 + μ + \sum_{i = 0}^{N_{est}} 〈 {\bar{θ}}^{(i)}, {\bar{ξ}}^{(i)} 〉 & = & 0 \end{matrix}

(26)

The solution of the problem Equations (14)–(17) takes the form:

\begin{matrix} P^{*} (A) & = & \frac{exp (- \sum_{i = 0}^{N_{e s t}} 〈 {\bar{θ}}^{(i)}, u^{(i)} (A) 〉)}{R ({\bar{θ}}^{(0)}, \dots, {\bar{θ}}^{(N_{e s t})})} \\ Q^{*} (\hat{ξ}) & = & \frac{exp (- \sum_{i = 0}^{N_{e s t}} 〈 {\bar{θ}}^{(i)}, {\bar{ξ}}^{(i)} 〉)}{Q ({\bar{θ}}^{(0)}, \dots, {\bar{θ}}^{(N_{e s t})})} \end{matrix}

(27)

where:

\begin{matrix} R ({\bar{θ}}^{(0)}, \dots, {\bar{θ}}^{(N_{e s t})}) & = & \int_{A} exp (- \sum_{i = 0}^{N_{e s t}} 〈 {\bar{θ}}^{(i)}, u^{(i)} (A) 〉) d A \\ Q ({\bar{θ}}^{(0)}, \dots, {\bar{θ}}^{(N_{e s t})}) & = & \int_{\hat{Ξ}} exp (- \sum_{i = 0}^{N_{e s t}} 〈 {\bar{θ}}^{(i)}, {\bar{ξ}}^{(i)} 〉) d {\bar{ξ}}^{(0)} \dots d {\bar{ξ}}^{(N_{e s t})} \end{matrix}

(28)
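The passage from the stationarity conditions (26) to the normalized form (27) with the normalizer (28) can be spelled out as follows. This is a sketch of the standard manipulation, written for $P^{*}$ only, with $\mathcal{A}$ denoting the admissible set of matrices from Equation (3); the same steps with μ in place of λ give $Q^{*}$.

```latex
\begin{aligned}
\ln P^{*}(A) &= -1 - \lambda - \sum_{i=0}^{N_{est}} \langle \bar{\theta}^{(i)}, u^{(i)}(A) \rangle
\;\Longrightarrow\;
P^{*}(A) = e^{-1-\lambda} \exp\Big( -\sum_{i=0}^{N_{est}} \langle \bar{\theta}^{(i)}, u^{(i)}(A) \rangle \Big), \\
1 &= \int_{\mathcal{A}} P^{*}(A)\, dA
   = e^{-1-\lambda}\, R(\bar{\theta}^{(0)}, \dots, \bar{\theta}^{(N_{est})})
\;\Longrightarrow\;
e^{1+\lambda} = R(\bar{\theta}^{(0)}, \dots, \bar{\theta}^{(N_{est})}).
\end{aligned}
```

In other words, the normalization constraint (15) eliminates the multiplier λ, so only the multipliers ${\bar{θ}}^{(i)}$ of the balance constraints remain to be determined.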

The vectors of Lagrange multipliers are determined from the following equations:

\begin{matrix} U ({\bar{θ}}^{(0)}, \dots, {\bar{θ}}^{(N_{est})}) & = & \int_{\mathcal{A}} u^{(i)} (A) \frac{exp (- \sum_{j = 0}^{N_{est}} 〈 {\bar{θ}}^{(j)}, u^{(j)} (A) 〉)}{R ({\bar{θ}}^{(0)}, \dots, {\bar{θ}}^{(N_{est})})} d A + \\ + & \int_{\hat{Ξ}} {\bar{ξ}}^{(i)} \frac{exp (- \sum_{j = 0}^{N_{est}} 〈 {\bar{θ}}^{(j)}, {\bar{ξ}}^{(j)} 〉)}{Q ({\bar{θ}}^{(0)}, \dots, {\bar{θ}}^{(N_{est})})} d {\bar{ξ}}^{(0)} \dots d {\bar{ξ}}^{(N_{est})} - y^{(i)} = 0, \\ i = \overline{0, N_{est}} \end{matrix}

(29)

Calculation of the vectors ${\hat{θ}}^{*} = \{{\bar{θ}}_{*}^{(0)}, \dots, {\bar{θ}}_{*}^{(N_{est})}\}$ reduces to a search for the global minimum of the residual function:

J (\hat{θ}) = {∥ U (\hat{θ}) ∥}_{L_{2}}

(30)

The global optimization algorithm is based on the simple Monte Carlo trials proposed in [30]. However, since the $L_{2}$ metric is a convex function, one of the traditional gradient-based local optimization methods can also be used for solving this problem.
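To make the role of the Lagrange multipliers concrete, the following sketch solves a scalar analog of the balance equations: it looks for a single multiplier θ such that the mean of the density proportional to exp(-θx) on an interval equals a given value y. This is an illustration under simplifying assumptions, not the procedure of [30]: it uses bisection (the mean is strictly decreasing in θ) and a midpoint quadrature, and the names `tilted_mean` and `solve_multiplier`, the bracket and the target value are hypothetical.

```python
import math

def tilted_mean(theta, a, b, n=2000):
    """Mean of the density proportional to exp(-theta * x) on [a, b] (midpoint rule)."""
    h = (b - a) / n
    num = den = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        w = math.exp(-theta * x)
        num += x * w
        den += w
    return num / den

def solve_multiplier(y, a, b, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisection for theta such that tilted_mean(theta, a, b) = y.

    Relies on the mean being strictly decreasing in theta.
    """
    if not (tilted_mean(hi, a, b) < y < tilted_mean(lo, a, b)):
        raise ValueError("target mean is not reachable within the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid, a, b) > y:
            lo = mid  # mean still too large, so theta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical target: mean 0.2 on the interval [-0.5, 0.5]
theta_star = solve_multiplier(0.2, -0.5, 0.5)
residual = abs(tilted_mean(theta_star, -0.5, 0.5) - 0.2)
```

In the full problem, the multipliers are coupled through the normalizers, so a multidimensional residual minimization is needed instead of a one-dimensional root search.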

4. Randomized Forecast Implementation

We understand a randomized forecast as an ensemble of trajectories on a forecasting interval $T_{frc} = [t_{0}, T]$, which has to be generated using the model Equations (6) and (7) with the random matrix A and noise $\bar{ξ}$ described by the PDFs $P^{*} (A)$ and $Q^{*} (\bar{ξ})$, respectively; see Equations (27) and (28). The matrices A and the vector noise $\bar{ξ}$ belong to the parallelepipeds $\mathcal{A}$ from Equation (3) and Ξ from Equation (4), respectively.

Let us study the problem of generating random matrices with the PDF from Equation (27). First, we transform a matrix into a vector through concatenation of its rows. This procedure yields a vector $a$ of length $m = n^{2}$. Additionally, the domain of random matrices becomes an m-dimensional parallelepiped:

\begin{matrix} \mathcal{A} = [a : a^{-} \leq a \leq a^{+}] \end{matrix}

(31)

where the vectors $a^{-}$ and $a^{+}$ result from the row concatenation of the matrices $A^{-}$ and $A^{+}$, respectively.

Let us consider a transformation of a vector $a$ into a vector $q$ belonging to the m-dimensional unit nonnegative cube $\mathcal{Q}$ (the product below is taken componentwise):

\begin{matrix} a = (a^{+} - a^{-}) q + a^{-}, \mathcal{Q} = \{q : 0 \leq q \leq 1\} \end{matrix}

(32)

Therefore, the entropy-optimal PDF undergoes the following chain of transformations:

P^{*} (A) \Rightarrow P (a) \Rightarrow P (q)

(33)

Hence, it is necessary to generate random vectors $q \in \mathcal{Q}$ with the PDF $P (q)$. The generation was implemented by the acceptance-rejection algorithm [24].
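The generation step can be illustrated by a minimal acceptance-rejection sketch on the unit cube, followed by the mapping (32) back to the original parallelepiped. It is not the implementation used in the paper: the function names, the two-dimensional exponential-type density and its multipliers are hypothetical and chosen only so that an upper bound of the density is known in closed form.

```python
import math
import random

def rejection_sample(density, dim, bound, rng, max_tries=1_000_000):
    """Draw one vector q from a density on the unit cube [0, 1]^dim.

    `density` may be unnormalized; `bound` must satisfy density(q) <= bound on the cube.
    """
    for _ in range(max_tries):
        q = [rng.random() for _ in range(dim)]
        # Accept the candidate with probability density(q) / bound
        if rng.random() * bound <= density(q):
            return q
    raise RuntimeError("no sample accepted; check the bound")

def to_original(q, a_minus, a_plus):
    """Componentwise map a = (a+ - a-) q + a- from Equation (32)."""
    return [(hi - lo) * qi + lo for qi, lo, hi in zip(q, a_minus, a_plus)]

# Hypothetical unnormalized exponential-type density on the unit square
theta = (2.0, -1.0)

def density(q):
    return math.exp(-(theta[0] * q[0] + theta[1] * q[1]))

bound = math.exp(1.0)  # maximum of the density, attained at q = (0, 1)

rng = random.Random(0)
samples = [rejection_sample(density, 2, bound, rng) for _ in range(5000)]
mean_q0 = sum(s[0] for s in samples) / len(samples)
```

The acceptance rate equals the ratio of the average density to the bound, so a tight bound matters when the density is sharply peaked.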

5. Application of the RF Method for World Population Prediction

5.1. The World Population Prediction Problem

The state of an isolated population is characterized by its size $E (t)$ on a calendar time interval $\mathcal{T} = [T^{-}, t_{0}]$. Since World population is an isolated system, its size varies only under the impact of fertility and mortality processes. Within the framework of the linear population dynamics model, fertility and mortality are described by the corresponding rates, and the flows of newborns and decedents are proportional to population size. The fertility (b) and mortality (m) rates are considered as linear time-dependent parameters [22].

5.1.1. Randomized population model

World population evolves according to the following differential equation that has an analytical solution:

\begin{matrix} \frac{d E (t)}{d t} & = & (b (t) - m (t)) E (t), b (t) = b_{0} + u_{b} t, m (t) = m_{0} + u_{m} t \\ E (t) & = & E_{0} exp (((b_{0} - m_{0}) + \frac{1}{2} (u_{b} - u_{m}) t) t), where \\ E_{0} & = & E (T^{-}), b_{0} = b (T^{-}), m_{0} = m (T^{-}) \end{matrix}

(34)

Real measurements of the population size dynamics modeled by Equation (34) take place at discrete moments. Hence, the population size at discrete moments $i h$ (where h specifies a given increment) is defined by the expression:

\begin{matrix} Φ_{i} (r, u_{r} | E_{0}) & = & E_{0} exp [(r + u_{r} i) i h], i \in I \\ r & = & b_{0} - m_{0}, u_{r} = \frac{1}{2} (u_{b} - u_{m}) h \end{matrix}

(35)

with parameters r and $u_{r}$, which describe the net result of the fertility and mortality flows.
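The discrete model (35) is easy to check against the analytical solution (34). The sketch below does this for one set of rates; the numerical values are hypothetical and serve only to confirm that the substitution $r = b_{0} - m_{0}$, $u_{r} = \frac{1}{2} (u_{b} - u_{m}) h$ reproduces $E (i h)$, with time counted from $T^{-}$.

```python
import math

def phi(i, r, u_r, e0, h):
    """Population size at the discrete moment i*h, Equation (35)."""
    return e0 * math.exp((r + u_r * i) * i * h)

def continuous(t, b0, m0, u_b, u_m, e0):
    """Analytical solution of Equation (34), with t counted from T^-."""
    return e0 * math.exp(((b0 - m0) + 0.5 * (u_b - u_m) * t) * t)

# Hypothetical rates, chosen only to check that (35) agrees with (34)
b0, m0, u_b, u_m, e0, h = 0.035, 0.015, -2e-4, -1e-4, 3.0, 5.0
r = b0 - m0
u_r = 0.5 * (u_b - u_m) * h
values = [phi(i, r, u_r, e0, h) for i in range(8)]
```

Because $u_{r}$ already absorbs the step h, the pair $(r, u_{r})$ is tied to the chosen increment and has to be recomputed if the step changes.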

World population is measured in billions of people. Fertility and mortality processes aggregate many factors, including measurement errors, whose quantitative analysis is impossible or complicated. On the other hand, the mass nature of fertility and mortality processes admits their modeling based on the probabilistic approach.

Thus, the resultant flow rate and its change in time are supposed to be random variables with a joint probability density function $P (r, u_{r})$ defined on the rectangle:

\begin{matrix} J = I_{r} \times I_{u_{r}}, I_{r} = [r^{-}, r^{+}], I_{u_{r}} = [u_{r}^{-}, u_{r}^{+}] \end{matrix}

(36)

Generally, measurement errors are modeled by an additive noise $ξ [i h]$ of the interval type:

\begin{matrix} ξ [i h] \in Ξ_{i} = [ξ_{i}^{-}, ξ_{i}^{+}], i \in I \end{matrix}

(37)

where I indicates the index set of the measurement moments. By assumption, the PDFs $q_{i} (ξ [i h]), i \in I$ are specified on the intervals $Ξ_{i}$ from Equation (37). Due to the independence of the random variables $ξ [i h], i \in I$, their joint PDF has the form:

\begin{matrix} Q (\bar{ξ}) = \prod_{i \in I} q_{i} (ξ [i h]) \end{matrix}

(38)

Therefore, the randomized model of World population dynamics can be described by:

\begin{matrix} v [i h] = Φ_{i} (r, u_{r} | E_{0}) + ξ [i h], i \in I \end{matrix}

(39)

where the function $Φ_{i} (r, u_{r} | E_{0})$ is given by Equation (35).

5.1.2. Real and forecasting data

For PDF estimation, we use the World population measurements for the period 1960–1995 (http://data.un.org/; see Table 1).

The entropy-optimal RM is tested on the measurements of the World population dynamics during the period 1995–2015 and on the values $E_{1985}^{prn}$ for this period (according to the UN forecast announced in 1985) (http://www.irbis.vegu.ru/repos/1002/Html/27.htm; see Table 2).

The World population prediction till 2050 using the UN data is illustrated by Table 3 (http://data.un.org/; [31]). United Nations’ projections both for the testing interval and the forecasting interval were made in accordance with the commonly-used cohort-component method, based on age-specific estimates of the components of population change (fertility, mortality and international migration) [32,33]. We will compare this prediction with its randomized analog.

In the sequel, the subscript $real$ indicates the measured population data.

To summarize, the RM has the following forms on corresponding time intervals:

on the estimation interval $T_{e s t}$ (see Table 1):

$\begin{matrix} v [i h] = Φ_{i} (r, u_{r} | E_{r e a l}^{e s t} [0]) + ξ [i h], i \in [0, 7] \end{matrix}$

(40)
on the testing interval $T_{t s t}$ (see Table 2):

$\begin{matrix} v [i h] = Φ_{i} (r, u_{r} | E_{r e a l}^{t s t} [0]) + ξ [i h], i \in [0, 4] \end{matrix}$

(41)
on the forecasting interval $T_{f r c}$ (see Table 3):

$\begin{matrix} v [i h] = Φ_{i} (r, u_{r} | E_{r e a l}^{f r c} [0]), i \in [0, 5] \end{matrix}$

(42)

where r and $u_{r}$ are random parameters with the entropy-optimal PDF $P^{*} (r, u_{r})$ and $\bar{ξ}$ is a vector of random noise with the entropy-optimal PDF $Q^{*} (\bar{ξ})$.

5.1.3. The entropy-optimal PDFs of the parameters and noise

According to the general entropy-robust estimation procedure of PDFs (see Section 3), we have:

the PDF of the parameters r and $u_{r}$ in the form:

$\begin{matrix} P^{*} (r, u_{r}) = \frac{1}{R (\bar{θ} | E_{r e a l}^{e s t} [0])} \prod_{j = 0}^{7} p_{j}^{*} (r, u_{r} | θ_{j}), p_{j}^{*} (r, u_{r} | θ_{j}) = exp (- θ_{j} Φ_{j} (r, u_{r} | E_{r e a l}^{e s t} [0])) \end{matrix}$

(43)
the PDF of the noise in the form:

$\begin{matrix} Q^{*} (\bar{ξ}) = \frac{1}{Q (\bar{θ})} \prod_{j = 0}^{7} q_{j}^{*} (ξ [j h] | θ_{j}), q_{j}^{*} (ξ [j h] | θ_{j}) = exp (- θ_{j} ξ [j h]) \end{matrix}$

(44)

where:

$\begin{matrix} R (\bar{θ} | E_{r e a l}^{e s t} [0]) & = & \int_{J} \prod_{j = 0}^{7} exp (- θ_{j} Φ_{j} (r, u_{r} | E_{r e a l}^{e s t} [0])) d r d u_{r} \end{matrix}$

(45)

$\begin{matrix} Q (\bar{θ}) & = & \prod_{j = 0}^{7} \int_{ξ_{j}^{-}}^{ξ_{j}^{+}} exp (- θ_{j} ξ [j h]) d ξ [j h] = \\ & = & \prod_{j = 0}^{7} \frac{1}{θ_{j}} (exp (- θ_{j} ξ_{j}^{-}) - exp (- θ_{j} ξ_{j}^{+})) \end{matrix}$

(46)

To calculate the Lagrange multipliers, we will solve the system of balance equations (see Equations (16) and (17)):

\begin{matrix} \frac{1}{R (\bar{θ} | E_{real}^{est} [0])} \int_{J} Φ_{i} (r, u_{r} | E_{real}^{est} [0]) \prod_{j = 0}^{7} exp (- θ_{j} Φ_{j} (r, u_{r} | E_{real}^{est} [0])) d r d u_{r} + \\ + & \frac{1}{Q (\bar{θ})} \int_{Ξ} ξ [i h] \prod_{j = 0}^{7} exp (- θ_{j} ξ [j h]) d \bar{ξ} - E_{real}^{est} [i h] = 0, i \in [0, 7] \end{matrix}

(47)

We will denote:

\begin{matrix} N_{i} (\bar{θ} | E_{r e a l}^{e s t} [0]) = \int_{J} Φ_{i} (r, u_{r} | E_{r e a l}^{e s t} [0]) \prod_{j = 0}^{7} exp (- θ_{j} Φ_{j} (r, u_{r} | E_{r e a l}^{e s t} [0])) d r d u_{r} \end{matrix}

(48)

Then, Equation (47) can be rewritten as:

\begin{matrix} G_{i} (\bar{θ} | E_{real}^{est} [0]) & = & \frac{N_{i} (\bar{θ} | E_{real}^{est} [0])}{R (\bar{θ} | E_{real}^{est} [0])} + L_{i} (θ_{i}) - y_{i} = 0 \\ y_{i} & = & E_{real}^{est} [i h], i \in [0, 7] \end{matrix}

(49)

where:

\begin{matrix} L_{i} (θ_{i}) = \frac{exp (- θ_{i} ξ_{i}^{-}) (ξ_{i}^{-} + \frac{1}{θ_{i}}) - exp (- θ_{i} ξ_{i}^{+}) (ξ_{i}^{+} + \frac{1}{θ_{i}})}{exp (- θ_{i} ξ_{i}^{-}) - exp (- θ_{i} ξ_{i}^{+})} \end{matrix}

(50)
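The closed form (50) is the mean of the noise density proportional to $exp (- θ_{i} ξ)$ on $[ξ_{i}^{-}, ξ_{i}^{+}]$. A minimal sketch of its evaluation is given below; the function name `noise_mean` is hypothetical, and the small-θ branch is an added assumption that replaces the formula by its first-order expansion around the uniform case, because the closed form loses precision to cancellation as θ approaches zero.

```python
import math

def noise_mean(theta, xi_minus, xi_plus):
    """Mean of the density proportional to exp(-theta * xi) on [xi_minus, xi_plus]."""
    width = xi_plus - xi_minus
    if abs(theta) < 1e-4:
        # Expansion around theta = 0: midpoint minus theta times the uniform variance
        return 0.5 * (xi_minus + xi_plus) - theta * width * width / 12.0
    e_minus = math.exp(-theta * xi_minus)
    e_plus = math.exp(-theta * xi_plus)
    num = e_minus * (xi_minus + 1.0 / theta) - e_plus * (xi_plus + 1.0 / theta)
    return num / (e_minus - e_plus)

# On the symmetric range [-0.5, 0.5], a positive multiplier shifts the mean below zero
m_pos = noise_mean(3.0, -0.5, 0.5)
m_neg = noise_mean(-3.0, -0.5, 0.5)
```

On a symmetric interval, the mean is an odd function of the multiplier, which gives a quick sanity check of any implementation.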

We endeavor to solve these equations through minimizing the residual function:

\begin{matrix} J (\bar{θ}) = ∥ G (\bar{θ}) ∥ \Rightarrow min \end{matrix}

(51)

5.2. The Results of Computer Experiment

5.2.1. General conditions

We have performed calculations for the stages of estimation, testing and prediction with the following ranges for the model parameters, which include the possibility for both positive and negative trends of World population growth:

\begin{matrix} I_{r} & = & [- 0.025; 0.075], I_{u_{r}} = [- 0.002; 0.001] \end{matrix}

(52)

and with the same range for all measurement noises:

\begin{matrix} Ξ_{j} = [- 0.5; 0.5], j \in [0, 7] \end{matrix}

(53)

Under the above ranges of the model parameters and measurement noises, the computer experiment has constructed the entropy-optimal PDFs and generated the ensembles of corresponding random trajectories for RM testing and randomized forecasting.

5.2.2. Estimation

On the estimation interval, we employ the data from Table 1. The residual function $J (\bar{θ})$ from Equation (51) is a function of eight variables, and it contains the integral components from Equations (45)–(49), which can be evaluated only numerically. For this, we have selected the so-called tiled method of two-dimensional integration, which represents a combination of several quadrature formulas [34].

The idea of the tiled method consists of: (1) partitioning the whole domain of integration into a set of smaller-area subdomains having the rectangular or trapezoidal shape; and (2) applying appropriate quadrature formulas on each subdomain. The described method is implemented in MATLAB by the function quad2d.
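For readers without MATLAB, the normalizer (45) can be approximated with a much simpler rule. The sketch below uses a composite midpoint rule over the rectangle as a stand-in; it is not the algorithm of quad2d, and the function names, the grid size and the values of the initial size and step are hypothetical.

```python
import math

def integrate_2d(f, x_range, y_range, nx=200, ny=200):
    """Composite midpoint rule over a rectangle (a simple stand-in for tiled quadrature)."""
    (x0, x1), (y0, y1) = x_range, y_range
    hx, hy = (x1 - x0) / nx, (y1 - y0) / ny
    total = 0.0
    for i in range(nx):
        x = x0 + (i + 0.5) * hx
        for j in range(ny):
            y = y0 + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

def normalizer(theta, e0, h, r_range, ur_range):
    """R(theta | e0) from Equation (45): integral of prod_j exp(-theta_j * Phi_j)."""
    def integrand(r, u_r):
        s = sum(t * e0 * math.exp((r + u_r * j) * j * h) for j, t in enumerate(theta))
        return math.exp(-s)
    return integrate_2d(integrand, r_range, ur_range)

# With all multipliers equal to zero, the integrand is 1 and R is the area of the rectangle
R0 = normalizer([0.0] * 8, 3.0, 5.0, (-0.025, 0.075), (-0.002, 0.001))
```

The zero-multiplier case gives a convenient check, since the result must equal the area of the parameter rectangle from Equation (52).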

Minimization of the residual function $J (\bar{θ})$ from Equation (51) is performed by the nonlinear trust-region method implemented by the function lsqnonlin from the Optimization Toolbox. The function lsqnonlin is tailored [35] to nonlinear least-squares problems.

The function lsqnonlin has several user-defined options for stopping criteria, such as the function tolerance ($10^{- 6}$), the step size tolerance ($10^{- 6}$) and the maximum number of iterations (500). Table 4 presents the calculated Lagrange multipliers for the above ranges.

Figure 1 and Figure 2 demonstrate the entropy-optimal PDFs for the parameters of Model (35) and the noise components. The functions $P^{*} (r, u_{r}, \bar{θ})$ and $q^{*} (ξ [i h]), i \in [0, 7]$ represent entropy-optimal distributions of the random variables on the corresponding intervals and will be used for making randomized predictions.

5.2.3. Testing

Testing of the model has been performed using the data from Table 2. The population size has been evaluated by Formula (41), where $r, u_{r}$ are random variables with the PDF $P^{*} (r, u_{r})$ and $ξ [i h]$ are random noises with the PDFs $q_{i} (ξ [i h]), i \in [0, 4]$ (see Figure 1 and Figure 2).

To generate an ensemble of random variables, the two-dimensional modification of the Ulam–von Neumann method has been used [24]. The size of the generated sample is $k = 100{,}000$.

Each pair of the random values r and $u_{r}$ defines a separate exponential growth curve; moreover, for each point $i h$, a random value of the noise $ξ [i h]$ is added according to its PDF. As a result, a constructed trajectory of World population dynamics is not an exponential function.
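The construction of the ensemble and of its summary statistics can be sketched as follows. To keep the example self-contained, uniform draws stand in for the entropy-optimal PDFs $P^{*}$ and $q_{i}$; this is a placeholder assumption, and the initial size, ranges, step and sample size are hypothetical rather than taken from the tables.

```python
import math
import random
import statistics

def simulate_ensemble(e0, steps, h, r_range, ur_range, noise_range, k, rng):
    """Generate k noisy trajectories v[i*h] = Phi_i(r, u_r | e0) + xi[i*h]."""
    ensemble = []
    for _ in range(k):
        # Placeholder: uniform draws stand in for the entropy-optimal PDFs
        r = rng.uniform(*r_range)
        u_r = rng.uniform(*ur_range)
        traj = [e0 * math.exp((r + u_r * i) * i * h) + rng.uniform(*noise_range)
                for i in range(steps)]
        ensemble.append(traj)
    return ensemble

def summarize(ensemble):
    """Per-moment mean and quartiles (Q1, median, Q3) across the ensemble."""
    summary = []
    for column in zip(*ensemble):
        q1, med, q3 = statistics.quantiles(column, n=4)
        summary.append((statistics.fmean(column), q1, med, q3))
    return summary

rng = random.Random(1)
ens = simulate_ensemble(5.7, 5, 5.0, (0.0, 0.02), (-1e-4, 1e-4), (-0.05, 0.05), 2000, rng)
stats = summarize(ens)
```

Replacing the uniform draws by samples from the estimated PDFs (for instance, via acceptance-rejection) turns the same loop into the randomized testing procedure described here.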

The test procedure yields the probabilistic characteristics of the parameters r, $u_{r}$ and ξ. It depends on the initial population size for the testing interval. Figure 3 shows the ensemble of the model-based trajectories for the parameter range from Equation (52) and the noise range from Equation (53), with a data-based selection of the initial point:

\begin{matrix} E^{t s t} [0] = E_{r e a l}^{t s t} [0] \end{matrix}

(54)

Figure 3 has the following notation: 1, the ensemble-average trajectory of population dynamics; 2, real population dynamics on the testing interval; 3, population dynamics on the testing interval according to the UN forecast of the year 1985; 4, the boundaries of the first and the third quartiles. The same model-generated ensemble for the testing interval can be presented as a box plot with median marks and interquartile ranges; see Figure 4.

Testing quality is assessed by the root-sum-square error (the Euclidean norm of the deviation) of the average trajectory with respect to its real counterpart:

\begin{matrix} δ = ∥ E^{t s t} - E_{r e a l}^{t s t} ∥ = \sqrt{\sum_{i = 0}^{4} {(E_{r e a l}^{t s t} [i h] - E^{t s t} [i h])}^{2}} \end{matrix}

(55)

as well as by the relative error:

\begin{matrix} ε = \frac{δ}{∥ E_{r e a l}^{t s t} ∥ + ∥ E^{t s t} ∥} \end{matrix}

(56)
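Both error measures follow directly from Equations (55) and (56). The sketch below computes them for a pair of trajectories; the function names are hypothetical, and the numbers are illustrative values in billions, not the data of Table 2.

```python
import math

def norm(xs):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in xs))

def forecast_errors(model, real):
    """Absolute error (55) and relative error (56) of a model trajectory."""
    if len(model) != len(real):
        raise ValueError("trajectories must have equal length")
    delta = norm([m - y for m, y in zip(model, real)])
    eps = delta / (norm(real) + norm(model))
    return delta, eps

# Hypothetical values in billions, not taken from Table 2
real = [5.7, 6.1, 6.5, 6.9, 7.3]
model = [5.7, 6.15, 6.5, 6.85, 7.3]
delta, eps = forecast_errors(model, real)
```

Note that the denominator of the relative error sums the norms of both trajectories, so ε is roughly half of the error measured relative to the real data alone.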

For instance, the UN forecast for the testing interval (Table 2) has the following errors:

\begin{matrix} δ_{1985} = 0.228, ε_{1985} = 0.008 \end{matrix}

(57)

In our case, the deviation of the model-average trajectory from the real one (Figure 3) appears to be appreciably smaller and demonstrates the following errors:

\begin{matrix} δ_{R M} = 0.079, ε_{R M} = 0.003 \end{matrix}

(58)

5.2.4. Prediction

This simple RM has been applied to predict World population dynamics for the period 2015–2050. The trajectory ensemble on the interval covered by the UN predictions (Table 3) is shown in Figure 5: 1, the ensemble-average trajectory; 2, the UN projection; 4, the boundaries of the interquartile range (IQR zone). The corresponding box plot is presented in Figure 6.

The presented results show that randomized forecasting, unlike existing methods, gives a set of probabilistic characteristics of the World population prediction, calculated from the ensemble of prognostic trajectories. The ensemble is generated by the randomized dynamic model with entropy-optimal PDFs of the parameters. On the testing interval, the randomized projection algorithm gives numerical results that are significantly closer to the real data (compare Equations (57) and (58)). For the future, it gives a stable projection that is higher than, but close to, the current UN forecast. According to our randomized model, 2026 will be the first year in which the World population reaches eight billion.
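The year in which a trajectory first reaches a given level can be read off by linear interpolation between tabulated points. The sketch below is an illustration under stated assumptions: the function `first_crossing_year` is hypothetical, and the UN projection from Table 3 is used only as a stand-in input, because the ensemble-average trajectory of the randomized model is not tabulated in this section.

```python
def first_crossing_year(years, values, threshold):
    """Linearly interpolate the first time a trajectory reaches `threshold`."""
    points = list(zip(years, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 < threshold <= v1:
            return t0 + (t1 - t0) * (threshold - v0) / (v1 - v0)
    return None  # threshold not reached on the given grid

# Table 3: UN projection, used here only as a stand-in trajectory.
years = [2015, 2020, 2025, 2030, 2040, 2050]
E_un = [7.359, 7.644, 7.964, 8.284, 8.924, 9.564]
crossing = first_crossing_year(years, E_un, 8.0)
```

Applied to each trajectory of an ensemble, the same function would give a distribution of crossing years rather than a single value.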

6. Conclusions

In this paper, we have suggested a randomized forecasting method that operates with dynamic models described by linear ordinary differential equations with random parameters. Entropy-robust estimation, based on entropy maximization, has been developed for the probability density functions (PDFs) of model parameters and measurement noises. It has been shown that these PDFs belong to the exponential class. The randomized forecasting technique has been applied to the prediction of World population dynamics. It has been demonstrated that randomized forecasting gives a set of probabilistic characteristics of the World population dynamics.

Acknowledgments

This work was supported by the Russian Foundation for Basic Research (Project No. 16-07-00743). We also thank the three anonymous reviewers for their comments, which improved the quality of the paper.

Author Contributions

All authors have contributed significantly to the work reported in this article. Y.S. developed and declared the method; Y.S. and Y.A. conceived and designed the experiments; Y.A. performed the experiments; Y.A. and A.Y. analyzed the data; A.Y. contributed reagents/materials/analysis tools; Y.S., Y.A. and A.Y. wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. De Beer, J.; van Wissen, L. (Eds.) Europe: One Continent, Different Worlds. Population Scenarios for the 21st Century; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1999.
2. Hilderink, H. World Population in Transition. An Integrated Regional Modelling Framework; Groningen University Press: Groningen, The Netherlands, 2000.
3. Feichtinger, G. (Ed.) Vienna Yearbook of Population Research; Vienna Institute of Demography, Austrian Academy of Sciences: Vienna, Austria, 2003.
4. Wissen, L.J.G. PROFILE: Population Research in the Netherlands. Public Serv. Rev. Eur. Union 2013, 25, 33.
5. Kendall, M.G.; Stuart, A. The Advanced Theory of Statistics; Vol. II, Inference and Relationship, 3rd ed.; Griffin: London, UK, 1973.
6. Cramer, H. Mathematical Methods of Statistics; Princeton Univ. Press: Princeton, NJ, USA, 1999.
7. Aivazyan, S.A.; Mkhitryan, V.S. Prikladnaya statistika i osnovy ekonometriki (Applied Statistics and Foundations of Econometrics); Interreklama: Moscow, Russia, 2003.
8. Kobzar, A.I. Prikladnaya matematicheskaya statistika (Applied Mathematical Statistics); Fizmatlit: Moscow, Russia, 2006.
9. Cho, S. A Linear Bayesian Stochastic Approximation to Update Project Duration Estimation. Eur. J. Oper. Res. 2009, 196, 585–593.
10. Zellner, A. Bayesian Shrinkage Estimation and Forecasts of Individual and Total or Aggregate Outcomes. Econ. Model. 2010, 27, 1392–1397.
11. Horvath, R. Research & Development and Growth: A Bayesian Model Averaging Analysis. Econ. Model. 2011, 28, 2669–2673.
12. Kim, M.J.; Jiang, R.; Makis, V.; Lee, C.-G. Optimal-Bayesian Fault Prediction Scheme for Partially Observable System Subject to Random Failure. Eur. J. Oper. Res. 2011, 214, 331–339.
13. Musal, R.M.; Soyer, R.; McCabe, C.; Kharroubi, S.A. Estimating the Population Utility Function: A Parametric Bayesian Approach. Eur. J. Oper. Res. 2012, 218, 538–547.
14. Borisov, A.V. A posteriori Minimax Estimation with Limit Likelihood. Autom. Remote Control 2012, 9, 49–56.
15. Lawrence, M.; Goodwin, P.; O’Connor, M.; Önkal, D. Judgemental Forecasting: A Review of Progress over the Last 25 years. Int. J. Forecast. 2006, 22, 493–518.
16. Kociecki, A.; Kolasa, M.; Rubaszek, M. A Bayesian Method of Combining Judgmental and Model-Based Density Forecasts. Econ. Model. 2012, 29, 1349–1355.
17. Lahiri, K.; Peng, H.; Zhao, Y. Testing the Value of Probability Forecasts for Calibrated Combining. Int. J. Forecast. 2015, 31, 113–129.
18. Jacobson, M.Z. Fundamentals of Atmospheric Modeling; Cambridge University Press: New York, NY, USA, 2005; p. 828.
19. Allen, M.R.; Stainforth, D.A. Towards Objective Probabilistic Climate Forecasting. Nature 2002, 419.
20. Popkov, A.Y.; Popkov, Y.S.; van Wissen, L. Positive Dynamic Systems with Entropy Operator: Application to Labour Market Modeling. Eur. J. Oper. Res. 2005, 164, 811–828.
21. Kapitsa, S.P. Obshchaya teoriya rosta naseleniya Zemli (The General Theory of World Population Growth); Nauka: Moscow, Russia, 1999.
22. Popkov, Y.S. Mathematical Demoeconomy. Integrating Demographic and Economic Approach; De Gruyter: Berlin, Germany, 2014; p. 534.
23. Kaashoek, M.A.; Seatzu, S.; van der Mee, C. Recent Advances in Operator Theory and Its Applications; Springer: Berlin, Germany; Heidelberg, Germany, 2006; p. 478.
24. Rubinstein, R.Y.; Kroese, D.P. Simulation and the Monte Carlo Method; John Wiley & Sons: Chichester, UK, 2008.
25. Popkov, Y.S. Macrosystems Theory and its Applications; Springer: New York, NY, USA, 1995; p. 340.
26. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630.
27. Popkov, Y.S.; Popkov, A.Y. New Methods of Entropy-Robust Estimation for Randomized Models under Limited Data. Entropy 2014, 16, 675–698.
28. Ioffe, A.D.; Tihomirov, V.M. Theory of Extremal Problems (Studies in Mathematics and Its Applications); Nauka: Moscow, Russia, 1979.
29. Alekseev, V.M.; Tihomirov, V.M.; Fomin, S.V. Optimal Control; Nauka: Moscow, Russia, 1987.
30. Darkhovskii, B.S.; Popkov, Y.S.; Popkov, A.Y. Monte Carlo Method of Batch Iterations: Probabilistic Characteristics. Autom. Remote Control 2015, 76, 775–784.
31. Gonzalo, J.A.; Munoz, F.-F.; Santos, D.J. Using a Rate Equation Approach to Model World Population Trends. Simul. Trans. Soc. Model. Simul. Int. 2013, 89, 192–198.
32. Shryock, H.S.; Siegel, J.S. The Methods and Materials of Demography; United Nations Department of Commerce: Washington, DC, USA, 1973.
33. World Population Prospects: The 2015 Revision, Methodology of the United Nations Population Estimates and Projections; Working Paper No. ESA/P/WP.242; Department of Economic and Social Affairs, Population Division: New York, NY, USA, 2015.
34. Shampine, L.F. MatLab Program for Quadrature in 2D. Appl. Math. Comput. 2008, 202, 266–274.
35. Coleman, T.F.; Li, Y. An Interior, Trust Region Approach for Nonlinear Minimization Subject to Bounds. SIAM J. Optim. 1996, 6, 418–445.

Figure 1. The joint PDF of the random parameters $r$ and $u_{r}$ for the range $J = I_{r} ⋃ I_{u_{r}}$.

Figure 2. The family of the PDFs of the noise $ξ_{i}$, $i \in [0, 7]$.

Figure 3. The ensemble of projection trajectories on the testing interval.

Figure 4. Box plot for the ensemble of projection trajectories on the testing interval.

Figure 5. The ensemble of projection trajectories on the prediction interval.

Figure 6. Box plot for the ensemble of projection trajectories on the prediction interval.

**Table 1.** Estimation interval $T_{est}$.
i	0	1	2	3	4	5	6	7
year	1960	1965	1970	1975	1980	1985	1990	1995
$E_{real}^{est}$	3.026	3.358	3.691	4.070	4.449	4.884	5.320	5.724

**Table 2.** Testing interval $T_{tst}$.
i	0	1	2	3	4
year	1995	2000	2005	2010	2015
$E_{real}^{tst}$	5.724	6.128	6.514	6.916	7.359
$E_{1985}^{prn}$	5.666	5.962	6.450	6.985	7.469

**Table 3.** Forecasting interval $T_{frc}$.
i	0	1	2	3	4	5
year	2015	2020	2025	2030	2040	2050
$E_{UN}^{frc}$	7.359	7.644	7.964	8.284	8.924	9.564

**Table 4.** Calculated Lagrange multipliers.
Range $I_{r}, I_{u_{r}}$
Measurements	0	1	2	3	4	5	6	7
$\bar{θ}$	0.0000	−0.3833	−0.3984	−0.5839	−0.3802	−0.4679	−0.1812	0.8881

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
