Entropy-Randomized Forecasting of Stochastic Dynamic Regression Models

Popkov, Yuri S.; Popkov, Alexey Yu.; Dubnov, Yuri A.; Solomatine, Dimitri

doi:10.3390/math8071119

Open AccessArticle

Entropy-Randomized Forecasting of Stochastic Dynamic Regression Models

by

Yuri S. Popkov

^1,2,3,*,

Alexey Yu. Popkov

¹,

Yuri A. Dubnov

^1,4 and

Dimitri Solomatine

⁵

¹

Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, Moscow 119333, Russia

²

Institute of Control Sciences of Russian Academy of Sciences, Moscow 117997, Russia

³

Department of Software Engineering, ORT Braude College, Carmiel 2161002, Israel

⁴

National Research University “Higher School of Economics”, Moscow 101000, Russia

⁵

IHE Delft Institute for Water Education, 2601 Delft, The Netherlands

^*

Author to whom correspondence should be addressed.

Mathematics 2020, 8(7), 1119; https://doi.org/10.3390/math8071119

Submission received: 20 May 2020 / Revised: 6 July 2020 / Accepted: 6 July 2020 / Published: 8 July 2020

(This article belongs to the Special Issue Machine Learning and Data Mining in Pattern Recognition)

Download

Browse Figures

Versions Notes

Abstract

:

We propose a new forecasting procedure that includes randomized hierarchical dynamic regression models with random parameters, measurement noises and random input. We developed the technology of entropy-randomized machine learning, which includes the estimation of characteristics of a dynamic regression model and its testing by generating ensembles of predicted trajectories through the sampling of the entropy-optimal probability density functions of the model parameters and measurement noises. The density functions are determined at the learning stage by solving the constrained maximization problem of an information entropy functional subject to the empirical balances with real data. The proposed procedure is applied to the randomized forecasting of the daily electrical load in a regional power system. We construct a two-layer dynamic model of the daily electrical load. One of the layers describes the dependence of electrical load on ambient temperature while the other simulates the stochastic quasi-fluctuating temperature dynamics.

Keywords:

forecasting; randomization; dynamic regression; information entropy; empirical balance; randomized machine learning

1. Introduction

Due to the gradually increasing resources and computational power of computers, huge amounts of data can be accumulated and stored, both in natural and digitized formats. Then, the following question arises immediately: what should be done with these data, except for storage? Extracting new knowledge from data seems to be a very interesting idea. The concepts of Data Mining (DM) [1,2], Big Data (BD) [3] and Data Science (DS) [4] were formulated and further developed by researchers accordingly.

A very tempting goal—extracting new knowledge from data—inevitably leads to the verbal or formal (mathematical) modeling of the “expected” knowledge. Therefore, any model has some predictive properties, which can be implemented only under known values of its quantitative characteristics (parameters). Data are a fundamental component of the three concepts above: data are adopted for estimating the characteristics of a model using machine learning (ML) procedures, which allows extracting new knowledge.

Unlike DM, BD, and DS, the concept of ML has a rich history of over 70 years as well as vast experience in solving numerous problems. The first publication in this field of research dates back to 1957; see [5]. The notion of empirical risk, a key element of ML procedures, was introduced in 1970 in the monograph [6]. The method of potential functions for classification and recognition problems was also presented in 1970 in another monograph [7]. The modern concept of ML is based on the deterministic parametrization of models and estimates using data sets with postulated properties. The quality of estimation is characterized by empirical risk functions, and their minimization gives optimal estimates [8,9].

As a rule, real problems that are solved by ML procedures are immersed in some uncertain environment. If the matter concerns data, they are acquired with inevitable errors, omissions or low reliability. The design and parametrization of models is a non-formalizable and subjective process that depends on the individual knowledge of a researcher. Therefore, in the mass application of ML procedures, the level of uncertainty is quite high.

All these circumstances indicate that it is necessary to somehow compensate uncertainty. Here a general trend is the application of its stochastic description to parametrized models and data. This means that the model parameters are assumed to be random (appropriately randomized), and the data are assumed to have random errors. Machine learning procedures with these properties belong to the class of randomized machine learning (RML) procedures. Their difference from the conventional ML procedures is that optimal estimates are constructed not for the parameters but for the probability density functions (PDFs) of the random parameters and the PDFs of the worst-case random errors in the data. In the entropy-based RML procedures, the functional of generalized information entropy [10] is used as an optimality criterion for estimates.

The core of an RML procedure is a parametrized predictive model designed for simulating the temporal (or spatiotemporal) evolution of a process under study. Therefore, such a model belongs to the class of dynamic models. Parametric dynamic regression models (PDRMs) are most widespread representatives of this class, in which the current state of a model is determined by its past states on a certain time interval [11,12]. A formal image of PDRMs is difference equations, in the general case of the pth order [13]. Most applications are described by linear PDRMs. In particular, they naturally occur in many problems of macroeconomic modeling and forecasting, e.g., time series analysis of economic indices [14], adequacy analysis of PDRMs [15], and prediction of exchange rates [16]. Linear PDRMs are effective enough for short-term forecasting, yet causing significant errors on large forecasting horizons. Therefore, attempts to improve forecasts by introducing various nonlinearities into PDRMs are quite natural. The monograph [17] was dedicated to a general approach to the formation and use of nonlinear PDRMs. However, applications require a more “personalized” approach to choosing the most useful and effective nonlinearity. On this pathway, it seems fruitful, e.g., to forecast exchange rates using logistic and exponential nonlinearities [18], or to predict the daily electrical load of a power system using periodic autoregressive models [19] or multidimensional time series [20].

Since forecasting is performed under uncertainty, the resulting errors caused by some unaccounted factors are often compensated, if possible, by assigning some probabilities to forecasts [21,22]. The most common approach is the use of Bayes’ theorem on posterior probability. Let a parametrized conditional probability density function of data and an a priori probability density function of parameters be specified; then their normalized product will determine the posterior probability density function of the parameters under fixed data. Fundamental problems in this field of investigations are connected with the structural choice of the conditional and prior PDFs. Typically, Gaussian PDFs or their mixture are selected, and the mixture weights are estimated using retrospective data [23,24,25]. A similar approach was adopted in applications: population genetics [26], where the method of numerical approximation of posterior PDFs was developed; the interaction between the financial sector of the economy and the labor market [27], where the Metropolis–Hastings algorithm was used to estimate the parameters of the above PDFs; population dynamics [28], where a hierarchy of Bayesian models was designed to predict fertility, mortality and migration rates. Probabilistic forecasts are constructed by other methods, taking into account the specifics of applications. In meteorology, retrospective weather forecasts are accumulated for estimating the PDFs; subsequently, these PDFs are used for short-term forecasting [29,30,31,32]. A rather interesting procedure is to form a probabilistic forecast as a mixture of forecasts obtained by different methods [33].

In this paper, we propose a fundamentally different forecasting method—the so-called entropy-randomized forecasting (ERF). In accordance with this method, an ensemble of random forecasts is generated by a predictive dynamic regression model (PDRM) with random input and parameters. The corresponding probabilistic characteristics, namely the probability density functions, are determined using the entropy randomized machine learning procedure. The ensembles of forecasting trajectories are constructed by the sampling of the entropy-optimal PDFs.

The proposed method is adopted for randomized prediction of the daily electrical load of a regional power system. A hierarchical randomized dynamic regression model that describes the dependence of the load on the ambient temperature is constructed. The temporal evolution of the ambient temperature is represented by an oscillatory second-order dynamic regression model with a random parameter and a random input. The results of randomized learning of this model on the GEFCom2014 dataset [34] are given. A randomized forecasting technology is suggested and its adequacy is investigated depending on the length of forecasting horizon.

2. Procedure of Entropy-Randomized Forecasting

Randomization as a means of imparting artificial and rationally organized random properties to naturally nonrandom events, indicators, or methods is a fairly common technique that yields a positive effect. There exist many examples in various fields of science, management, economics: randomized numerical optimization methods [35,36]; the mixed (random) strategies of trading on a stock exchange [37]; the randomized forecasting of population dynamics [38]; vibration control of industrial processes [39]. As the result of randomization, nonrandom objects gain artificial stochastic properties with optimal probabilistic characteristics in a chosen sense. The question on appropriate quantitative characteristics of optimality has always been controversial and ambiguous. It requires arguments that would somehow reflect the important specifics of a randomized object. In particular, a fundamental feature of forecasting procedures is uncertainty in the data, predictive models, methods for generating forecasts, etc.

In what follows, information entropy [40] will be used as a characteristic of uncertainty. In the works [41,42,43], using the first law of thermodynamics it was demonstrated that entropy is a natural functional describing the processes of universal evolution. Moreover, in accordance with the second law of thermodynamics, entropy maximization determines the best state of an evolutionary process under the worst-case external disturbance (maximum uncertainty). Also, note another quality of information entropy associated with measurement errors and other types of errors, which are important attributes of data: with the factor of such errors being considered in terms of informational entropy, the probabilistic characteristics of noises exerting the worst-case impact on forecasting procedures can be estimated in explicit form.

The technology of entropy-randomized forecasting consists of the following stages. At the beginning (the first stage), a predictive randomized model (PRM) of a studied object is formed and its parameters are designed. A PRM transforms real data into a model output. In the general case, these transformations are assumed to be dynamic, i.e., the model output observed at a time instant n depends on the states observed on some past interval. The PRM parameters are assumed to be of the interval type and random, and their probabilistic properties are characterized by the corresponding PDFs.

The second stage of the technology under consideration—randomized machine learning (more specifically, its entropy version)—is intended to estimate the PDFs. At this stage, the estimates of the PDFs are calculated using learning data sets and also a learning algorithm in the form of a functional entropy-linear programming problem.

At the third stage, the optimized PRM (with the entropy-optimal PDFs) is tested using a test data set and accepted quantitative characteristics of the quality of learning. The optimized PRM actually generates an ensemble of random trajectories, vectors, or events with the entropy-optimal values of their parameters.

The learned and tested PRMs serve for forecasting. In this case, the ensembles of random forecasted trajectories generated by the entropy-optimal PRMs are used to calculate their numerical characteristics such as mean trajectories, variance curves, median trajectories, the PDF evolution of forecasted trajectories, etc.

3. Randomized Dynamic Regression Models with Random Input and Parameters

Randomized dynamic regression models (RDRMs) form a class of dynamic models with random parameters that describe a parametrized dependence of the object’s state at a given time instant on external factors and its states at some past time instants.

The structures of models are designed on the basis of existing knowledge and hypotheses about the properties of an object, which often turn out to be very inaccurate. Moreover, the external factors themselves can change over time and therefore should be predicted to model the object’s dynamics. Reliable information on realistically measured impacts leading to the temporal evolution of external factors is often unavailable. The aforementioned indicates the presence of uncertainty, both in the development and further use of models. In [10], a method for reducing the influence of uncertainty based on the randomization of models (including the class of RDRMs) was proposed. In the latter case, this method extends the idea of randomization to the modeling of external factors and their evolution.

The structure of the RDRM is shown in Figure 1. It consists of a model of the main object (RDRM-O) with random parameters

a \in R^{p}

and a model of external factors (RDRM-F) with random parameters

b \in R^{s}

and a random input

ζ \in R^{q} .

The states of the object and its model belong to the vector space

R^{m}

, in which

\hat{x} [n]

are the state vectors of the object and

x [n] \in R^{m}

are the state vectors of RDRM-O. The external factors are characterized by the vector

\hat{y} [n] \in R^{q}

while the changes in the state of RDRM-F over time by the vector

y [n] \in R^{q} .

The variable n denotes discrete time taking integer values on the interval

L = [n^{-}, n^{+}] .

Consider the linear version of RDRM-O. Its state

x [n]

at a time instant n is changing under the influence of p retrospective states

x [n - 1], \dots, x [n - p]

and measurable external factors

z [n] \in R^{q}

. The corresponding equation has the form

x [n] = X^{(n, p)} A^{(p)} + A_{(p + 1)} z [n],

(1)

with the following notations:

$A^{(p)} = {⌊ A_{1}, \dots, A_{p} ⌋}^{⊺}$

(2)

as the block column vector of parameters, where $A_{i}$ is a random matrix of dimensions $(m \times m)$ with random elements of the interval type, i.e.,

$A_{i} \in A_{i} = [A_{i}^{-}, A_{i}^{+}], i = \bar{1, p};$

(3)
$A_{(p + 1)}$ as a matrix of dimensions $(m \times q)$ with random elements of the interval type, i.e.,

$A_{(p + 1)} \in A_{(p + 1)} = [A_{(p + 1)}^{-}, A_{(p + 1)}^{+}];$

(4)
$X^{(n, p)} = ⌊ x [n - 1], \dots, x [n - p] ⌋$

(5)

as the block row vector of p retrospective states, where $⌊ • ⌋$ denotes a block row vector.

The probabilistic properties of the block vector

A^{(p)}

and the matrix

A_{(p + 1)}

are characterized by a joint PDF

P (A^{(p)})

and a PDF

F (A_{(p + 1)}),

respectively.

The state of RDRM-O is assumed to be measurable at each time instant n and also to contain an additive noise

μ [n]

:

v [n] = x [n] + μ [n] .

(6)

The random vectors

μ [n]

are of the interval type, i.e.,

μ [n] \in M_{n} = [μ_{n}^{-}, μ_{n}^{+}],

(7)

with a PDF

M_{n} (μ [n])

. The random vectors measured at different time instants are assumed to be statistically independent.

Consider the linear version of RDRM-F, which has a similar structure described by the equation

y [n] = Y^{(n, s)} B^{(s)} + ζ [n]

(8)

with the following notations:

$B^{(s)} = {⌊ B_{1}, \dots, B_{s} ⌋}^{⊺}$

(9)

as a block column vector formed by matrices $B_{i}$ of dimensions $(q \times q)$ with random elements of the interval type, i.e.,

$B_{i} \in B_{i} = [B_{i}^{-}, B_{i}^{+}], i = \bar{1, s};$

(10)
$Y^{(n, p)} = ⌊ y [n - 1], \dots, y [n - s] ⌋$

(11)

as a block row vector.

The probabilistic properties of the parameters are characterized by a continuously differentiable PDF

W (B^{(s)}) .

The random vector

ζ [n]

is of the interval type, i.e.,

ζ [n] \in E_{n} = [ζ_{n}^{-}, ζ_{n}^{+}],

(12)

with a continuously differentiable PDF

Q_{n} (ζ [n]) .

The random vectors

ζ [n]

measured at different time instants are statistically independent.

By analogy with RDRM-O, the state of RDRM-F is assumed to be measurable at each time instant n and also to contain an additive noise

ξ [n]

:

z [n] = y [n] + ξ [n] .

(13)

The random vectors

ξ [n]

are of the interval type, i.e.,

ξ [n] \in Ξ_{n} = [ξ_{n}^{-}, ξ_{n}^{+}],

(14)

with a continuously differentiable PDF

G_{n} (ξ [n])

. The random vectors measured at different time instants are assumed to be statistically independent.

Thus, in the RDRM (RDRM-O and RDRM-F), the unknown characteristics are the PDFs

P (A^{(p)})

,

F (A_{(p + 1)})

and

W (B^{(s)})

of the model parameters and also the PDFs

M_{n} (μ [n])

,

Q_{n} (ζ [n])

and

G_{n} (ξ [n])

of the measurement noises,

n \in L

.

4. Models of Learning Data Sets

The desired PDFs (see the previous section) are estimated using the learning data sets that are obtained on a learning interval

n \in L = [n^{-}, n^{+}]

and are consistent with the RDRM.

Consider RDRM-O. On the learning interval,

\begin{matrix} x [n^{-}] & = & X^{(n^{-}, p)} A^{(p)} + A_{(p + 1)} z [n^{-}], \\ x [n^{-} + 1] & = & X^{(n^{-} + 1, p)} A^{(p)} + A_{(p + 1)} z [n^{-} + 1], \\ \dots & = & \dots \dots \dots \dots \dots, \\ x [n^{+}] & = & X^{(n^{+}, p)} A^{(p)} + A_{(p + 1)} z [n^{+}] . \end{matrix}

(15)

The observable states of RDRM-O on the learning interval

L

represent the collection of vectors

v [n] = x [n] + μ [n], n = \bar{n^{-}, n^{+}} .

(16)

Hence, the learning data set consists of the data on retrospective states of the object,

{\hat{X}}^{(n^{-}, p)}, {\hat{X}}^{(n^{-} + 1, p)}, \dots, {\hat{X}}^{(n^{+}, p)},

(17)

and the data on observable current states,

\hat{v} [n^{-}], \dots, \hat{v} [n^{+}], \hat{z} [n^{-}], \dots, \hat{z} [n^{+}] .

(18)

Consider RDRM-F. On the learning interval,

\begin{matrix} y [n^{-}] & = & Y^{(n^{-}, s)} B^{(s)} + ζ [n^{-}], \\ y [n^{-} + 1] & = & Y^{(n^{-} + 1, s)} B^{(s)} + ζ [n^{-} + 1], \\ \dots & = & \dots \dots \dots \dots \dots, \\ y [n^{+}] & = & Y^{(n^{+}, p)} B^{(s)} + ζ [n^{+}] . \end{matrix}

(19)

The observable states of RDRM-F on the learning interval

L

represent the collection of vectors

z [n] = y [n] + ξ [n], n = \bar{n^{-}, n^{+}} .

(20)

Hence, the learning data set consists of the data on retrospective states of the factors,

{\hat{Y}}^{(n^{-}, s)}, {\hat{Y}}^{(n^{-} + 1, s)}, \dots, {\hat{Y}}^{(n^{+}, s)},

(21)

and the data on observable current states,

\hat{z} [n^{-}], \dots, \hat{z} [n^{+}] .

(22)

Thus, the learning procedure of the RDRM involves three data sets, (18), (21) and (22).

5. Algorithm of Randomized Machine Learning

The entropy version [10] of RML algorithms is used for estimating the PDFs of the model parameters and measurements noises of RDRM-O and RDRM-F. For RDRM-O, the corresponding algorithm has the form

\begin{matrix} H_{O} = - \int_{A} P (A^{(p)}) ln P (A^{(p)}) d A^{(p)} - \int_{A_{(p + 1)}} F (A_{(p + 1)}) ln F (A_{(p + 1)}) d A_{(p + 1)} - \\ - & \sum_{n = n^{-}}^{n^{+}} \int_{M_{n}} M_{n} (μ [n]) ln M_{n} (μ [n]) d M_{n} (μ [n]) \Rightarrow max \end{matrix}

(23)

subject to the following constraints:

-: the normalization conditions of the PDFs given by

$\begin{matrix} \int_{A} P (A^{(p)}) d A^{(p)} = 1, \int_{A_{(p + 1)}} F (A_{(p + 1)}) d A_{(p + 1)} = 1, \\ \int_{M_{n}} M_{n} (μ [n]) d M_{n} (μ [n]) = 1, n = \bar{n^{-}, n^{+}}; \end{matrix}$
-: the empirical balances given by

$\begin{matrix} \int_{A} P (A^{(p)}) {\hat{X}}^{(n, p)} A^{(p)} d A^{(p)} + \int_{A_{(p + 1)}} F (A_{(p + 1)}) A_{(p + 1)} \hat{z} [n] d A_{(p + 1)} + \\ + & \int_{M_{n}} M_{n} (μ [n]) μ [n] d μ [n] = \hat{v} [n], n = \bar{n^{-}, n^{+}} . \end{matrix}$

(24)

Please note that the empirical balances represent a system of

(n^{+} - n^{-})

blocks composed of m equations. With each block an m-dimensional vector of Lagrange multipliers

θ^{(n)}

is associated. This optimization problem belongs to the class of entropy-linear programming problems of the Lyapunov type [44]. It has an analytic solution parametrized by the Lagrange multipliers:

\begin{matrix} P^{*} (A^{(p)}) & = & \frac{exp (- \sum_{n = n^{-}}^{n^{+}} 〈 θ^{(n)}, {\hat{X}}^{(n, p)} A^{(p)} 〉)}{P (θ)}, \\ F^{*} (A_{(p + 1)}) & = & \frac{exp (- \sum_{n = n^{-}}^{n^{+}} 〈 θ^{(n)}, A_{(p + 1)} \hat{z} [n] 〉)}{F (θ)}, \\ M_{n}^{*} (μ [n]) & = & \frac{exp (- 〈 θ^{(n)}, μ [n] 〉)}{M_{n} (θ^{(n)})}, n = \bar{n^{-}, n^{+}} . \end{matrix}

(25)

In the above formulas,

\begin{matrix} P (θ) & = & \int_{A} exp (- \sum_{n = n^{-}}^{n^{+}} 〈 θ^{(n)}, {\hat{X}}^{(n, p)} A^{(p)} 〉) d A^{(p)}, \\ F (θ) & = & \int_{A_{(p + 1)}} exp (- \sum_{n = n^{-}}^{n^{+}} 〈 θ^{(n)}, A_{(p + 1)} \hat{z} [n] 〉) d A_{(p + 1)}, \\ M_{n} (θ^{(n)}) & = & \int_{M_{n}} exp (- 〈 θ^{(n)}, μ [n] 〉) d μ [n], n = \bar{n^{-}, n^{+}} . \end{matrix}

(26)

The matrix of Lagrange multipliers

θ = [θ^{(n^{-})}, \dots, θ^{(n^{+})}]

is determined by solving the balance equations

\begin{matrix} \frac{1}{P (θ)} \int_{A} exp (- \sum_{n = n^{-}}^{n^{+}} 〈 θ^{(n)}, {\hat{X}}^{(n, p)} A^{(p)} 〉) {\hat{X}}^{(n, p)} A^{(p)} d A^{(p)} + \\ + & \frac{1}{F (θ)} \int_{A_{(p + 1)}} exp (- \sum_{n = n^{-}}^{n^{+}} 〈 θ^{(n)}, A_{(p + 1)} \hat{z} [n] 〉) A_{(p + 1)} \hat{z} [n] d A_{(p + 1)} + \\ + & \frac{1}{M_{n} (θ^{(n)})} \int_{M_{n}} exp (- 〈 θ^{(n)}, μ [n] 〉) μ [n] d μ [n] = \hat{x} [n], n = \bar{n^{-}, n^{+}} . \end{matrix}

(27)

From (25)–(27) it follows that the PDFs

P^{*} (A^{(p)})

and

F^{*} (A_{(p + 1)})

of the model parameters of RDRM-O and the PDFs

M_{n}^{*} (μ [n]), n = \bar{n^{-}, n^{+}},

of the measurement noises are found using the retrospective learning data sets

\hat{X} (n^{-}, p), \hat{X} (n^{-} + 1, p), \dots, \hat{X} (n^{+}, p)

, the current state data sets

\hat{x} [n^{-}], \dots, \hat{x} [n^{+}]

and the data sets

\hat{z} [n^{-}], \dots, \hat{z} [n^{+}]

generated by RDRM-F.

For obtaining the latter collections, the RML algorithm is applied to estimate the PDFs of the model parameters and measurement noises of RDRM-F. In accordance with [10],

\begin{matrix} H_{F} = - \int_{B} W (B) ln W (B) d B - \\ - & \sum_{n = n^{-}}^{n^{+}} \int_{E_{n}} Q_{n} (ζ [n]) ln Q_{n} (ζ [n]) d Q_{n} (ζ [n]) - \\ - & \sum_{n = n^{-}}^{n^{+}} \int_{Ξ_{n}} G_{n} (ξ [n]) ln G_{n} (ξ [n]) d G_{n} (ξ [n]) \Rightarrow max \end{matrix}

(28)

subject to the following constraints:

-: the normalization conditions of the PDFs given by

$\begin{matrix} \int_{B} W (B) d B = 1, \\ \int_{E_{n}} Q_{n} (ζ [n]) d Q_{n} (ζ [n]) = 1, \int_{Ξ_{n}} G_{n} (ξ [n]) d G_{n} (ξ [n]) = 1, n = \bar{n^{-}, n^{+}}; \end{matrix}$
-: the empirical balances given by

$\begin{matrix} \int_{B} W (B^{(s)}) {\hat{Y}}^{(n, s)} B^{(s)} d B^{(s)} + \int_{E_{n}} Q_{n} (ζ [n]) ζ [n] d ζ [n] + \\ + & \int_{Ξ_{n}} G_{n} (ξ [n]) ξ [n] d ξ [n] = \hat{z} [n], n = \bar{n^{-}, n^{+}} . \end{matrix}$

(29)

This problem is from the same class as (25)–(27). It has the following analytic solution in terms of Lagrange multipliers:

\begin{matrix} W^{*} (B^{(s)}) = \frac{exp (- \sum_{n = n^{-}}^{n^{+}} 〈 η^{(n)}, {\hat{Y}}^{(n, s)} B^{(s)} 〉)}{W (η)}, \\ Q_{n}^{*} (ζ [n]) = \frac{exp (- 〈 ζ [n], η^{(n)} 〉)}{Q_{n} (η^{(n)})}, \\ G_{n}^{*} (ξ [n]) = \frac{exp (- 〈 ξ [n], η^{(n)} 〉)}{G_{n} (η^{(n)})}, n = \bar{n^{-}, n^{+}} . \\ η = [η^{(n^{-})}, \dots, η^{(n^{+})}] . \end{matrix}

(30)

In the above formulas,

\begin{matrix} W (η) = \int_{B} exp (- \sum_{n = n^{-}}^{n^{+}} 〈 η^{(n)}, {\hat{Y}}^{(n, s)} B^{(s)}) d B^{(s)}, \\ Q_{n} (η^{(n)}) = \int_{E_{n}} exp (- 〈 ζ [n], η^{(n)} 〉) d ζ [n], G_{n} (η^{(n)}) = \int_{Ξ_{n}} exp (- 〈 ξ [n], η^{(n)} 〉) d ξ [n], \\ n = \bar{n^{-}, n^{+}} . \end{matrix}

(31)

The matrix of Lagrange multipliers

η

is determined by solving the balance equations

\begin{matrix} W^{(- 1)} (η) \int_{B} exp (- \sum_{n = n^{-}}^{n^{+}} 〈 η^{(n)}, {\hat{Y}}^{(n, s)} B^{(s)}) {\hat{Y}}^{(n, s)} B^{(s)} d B^{(s)} + \\ + & Q_{n}^{(- 1)} (η^{(n)}) \int_{E_{n}} exp (- 〈 ζ [n], η^{(n)} 〉) ζ [n] d ζ [n] + \\ + & G_{n}^{(- 1)} (η^{(n)}) \int_{Ξ_{n}} exp (- 〈 ξ [n], η^{(n)} 〉) ξ [n] d ξ [n] = \hat{z} [n], \\ n = \bar{n^{-}, n^{+}} . \end{matrix}

(32)

6. Entropy-Randomized Forecasting of Daily Electrical Load of Power System

The daily electrical load L of a power system depends on many various factors. The analysis below is restricted to one of the most significant external factors—the ambient temperature

T .

The daily temperature variations are fluctuating [45,46]. These fluctuations affect electrical load, but with some time delay due the inertia of a power network supplying electrical energy from the generator to consumers.

(1). Dynamic Regression Model.

In accordance with the general structure of the RDRM (see Section 2), the electrical load model (the

L T

model) describes the dynamic relationship between electrical load and ambient temperature while the ambient temperature model (the

T ξ

model) describes the daily dynamics of ambient temperature. There exist quite a few versions of the

L T

model, albeit all being static, i.e., describing the relationship between electrical load and ambient temperature at current time instants [47]. The daily temperature dynamics are fluctuating, and such fluctuations are described, in particular, by the periodic autoregressive model [48].

Please note that the effect of ambient temperature on electrical load is dynamic, i.e., the change in load due to temperature at a given time instant depends on its value at a previous time instant. A similar property applies to ambient temperature fluctuations.

Therefore, following the general randomized approach, the

L T

model is designed as a first-order dynamic regression model with random parameters, while the

T ξ

model is designed as a second-order dynamic regression with a random parameter and a random input

ξ .

Then the

L T ξ

model is the composition of the two models above.

In the class of linear models, the randomized dynamic regression load–temperature model (

L T

model) of the first order can be written in the form

\begin{matrix} L [n] = a L [n - 1] + b T [n], \\ v [n] = L [n] + μ [n], n = \bar{n^{-}, n^{+}}, \end{matrix}

(33)

where random independent parameters a and b take values within intervals

a \in A = [a^{-}, a^{+}], b \in B = [b^{-}, b^{+}] .

(34)

The probabilistic properties are characterized by PDFs

P (a)

and

F (b)

defined on the sets

A

and

B,

respectively. The random noise

μ [n]

that simulates electrical load measurement errors is of the interval type as well. In the general case, for each time instant the intervals may have different limits, i.e.,

μ [n] \in M_{n} = [μ^{-} [n]], μ^{+} [n]],

(35)

with PDFs

M_{n} (μ [n]),]

,

n = \bar{n^{-}, n^{+}}

.

Consider the

T ξ

model. The fluctuating character of the daily temperature variations is described by the randomized dynamic regression model of the second order

\begin{matrix} τ [n] & = & c (2.1 τ [n - 1] - 1.1 τ [n - 2]), \\ T [n] & = & t + τ [n] + ξ [n], \end{matrix}

(36)

where t is the mean daily temperature. These parameters are random and take values within given intervals

c \in [c^{-}, c^{+}]

. The probabilistic properties of the parameters are characterized by PDFs

W (c)

defined on corresponding intervals.

Equation (36) contains random noises described by independent random variables

ξ [n]

; in each measurement

n,

their values may lie in different intervals, i.e.,

ξ [n] \in Ξ_{n} = [ξ_{n}^{-}, ξ_{n}^{+}] .

(37)

The probabilistic properties of the random variable

ξ [n]

are characterized by a PDF

Q_{n} (ξ [n]), n = \bar{n^{-}, n^{+}}

.

Thus, Equations (33) and (36) describing electrical load dynamics in a power system are characterized by the following PDFs:

the $L T$ model, by the PDFs $P (a)$ and $F (b)$ of the model parameters and the PDF $M_{n} (μ [n])$ , of the measurement noises, $n = \bar{n^{-}, n^{+}}$ ;
the $T ξ$ model, by the PDFs $W (c)$ of the model parameters and the PDFs $Q_{n} (ξ [n])$ of the measurement noises, $n = \bar{n^{-}, n^{+}}$ .

(2). Learning Data Set.

For estimating the PDFs, the normalized real data from the GEFCom2014 dataset (see [34]) on daily electrical load variations

0 \leq L_{r}^{(i)} [n] \leq 1

, mean daily temperature variations

0 \leq t_{r}^{(i)} \leq 1

and temperature deviations

0 \leq τ_{r}^{(i)} [n] \leq 1

from the mean daily value can be used. (Here normalization means the reduction to the unit interval.)

The normalization procedure is performed in the following way:

\begin{matrix} L_{r}^{(i)} [n] & = & \frac{{\hat{L}}_{r}^{(i)} [n] - {\hat{L}}_{m i n}^{(i)}}{{\hat{L}}_{m a x}^{(i)} - {\hat{L}}_{m i n}^{(i)}}, \\ τ_{r}^{(i)} [n] & = & \frac{{\hat{τ}}_{r}^{(i)} [n] - {\hat{τ}}_{m i n}^{(i)}}{{\hat{τ}}_{m a x}^{(i)} - {\hat{τ}}_{m i n}^{(i)}}, \\ t_{r}^{(i)} & = & \frac{1}{n^{+} - n^{-}} \sum_{n = n^{-}}^{n^{+}} τ_{r}^{(i)} [n], \end{matrix}

(38)

where

{\hat{L}}_{m i n}^{(i)} = min_{n} {\hat{L}}^{(i)} [n]

,

{\hat{L}}_{m a x}^{(i)} = max_{n} {\hat{L}}^{(i)} [n]

,

{\hat{τ}}_{m i n}^{(i)} = min_{n} {\hat{τ}}^{(i)} [n]

,

{\hat{τ}}_{m a x}^{(i)} = max_{n} {\hat{τ}}^{(i)} [n]

.

In accordance with (33) and (36), the model variables and the corresponding real data on the learning interval

n \in T_{l}

are described by the vectors

\begin{matrix} L^{(i)} (T_{l}) = {L^{(i)} [1], \dots, L^{(i)} [24]}, & L_{r}^{(i)} (T_{l}) = {L_{r}^{(i)} [1], \dots, L_{r}^{(i)} [24]}, \\ L^{(i)} (T_{l} - 1) = {L^{(i)} [0], \dots, L^{(i)} [23]}, & L_{r}^{(i)} (T_{l} - 1) = {L_{r}^{(i)} [0], \dots, L_{r}^{(i)} [23]}, \\ V^{(i)} (T_{l}) = {v^{(i)} [1], \dots, v^{(i)} [24]}, & V_{r}^{(i)} (T_{l}) = {v_{r}^{(i)} [1], \dots, v_{r}^{(i)} [24]}, \\ T^{(i)} (T_{l}) = {τ^{(i)} [1], \dots, τ^{(i)} [24]}, & T_{r}^{(i)} (T_{l}) = {τ_{r}^{(i)} [1], \dots, τ_{r}^{(i)} [24]}, \end{matrix}

\begin{matrix} {\tilde{T}}^{(i)} (T_{l} - 1, T_{l} - 2) = {2, 1 τ^{(i)} [0] - 1, 1 τ^{(i)} [- 1], \dots, 2, 1 τ^{(i)} [23] - 1, 1 τ^{(i)} [22]}, \\ {\tilde{T}}_{r}^{(i)} (T_{l} - 1, T_{l} - 2) = {2, 1 τ_{r}^{(i)} [0] - 1, 1 τ_{r}^{(i)} [- 1], \dots, 2, 1 τ_{r}^{(i)} [23] - 1, 1 τ_{r}^{(i)} [22]}, \end{matrix}

μ^{(i)} (T_{l}) = {μ^{(i)} [1], \dots, μ^{(i)} [24]}, ξ^{(i)} (T_{l}) = {ξ^{(i)} [1], \dots, ξ^{(i)} [24]} .

In terms of (39), the

L T

and

T ξ

models on the learning interval

T_{l}

have the form

\begin{matrix} L^{(i)} (T_{l}) = a L^{(i)} (T_{l} - 1) + b T^{(i)} (T_{l}), \\ V^{(i)} (T_{l}) = L^{(i)} (T_{l}) + μ^{(i)} (T_{l}), \\ T^{(i)} (T_{l}) = c {\tilde{T}}^{(i)} (T_{l} - 1, T_{l} - 2), \\ T^{(i)} (T_{l}) = t + {\tilde{T}}^{(i)} (T_{l}) + ξ^{(i)} (T_{l}) . \end{matrix}

(39)

The random parameters take values within the intervals

A = [0.05, 0.15], B = [0.5, 1.0], C = [0.75, 0.85] .

(40)

The measurement noises take values within the intervals

M_{n} = [- 0.1, 0.1], Ξ_{n} = [- 0.1, 0.1] .

(41)

(3). Entropy-Optimal Probability Density Functions of Parameters and Noises.

In accordance with the approach described in Section 5, for the

L T

model (33)–(35) the PDFs parametrized by the Lagrange multipliers

θ^{(i)} = {θ_{1}^{(i)}, \dots, θ_{24}^{(i)}}

have the form

\begin{matrix} P_{i}^{*} (a, θ^{(i)}) = \frac{l_{r}^{(i)} (θ) exp (- a l_{r}^{(i)} (θ))}{exp (- a^{-} l_{r}^{(i)} (θ)) - exp (- a^{+} l_{r}^{(i)} (θ))}, \\ F_{i}^{*} (b, θ^{(i)}) = \frac{h_{r}^{(i)} (θ) exp (- b h_{r}^{(i)} (θ))}{exp (- b^{-} h_{r}^{(i)} (θ)) - exp (- b^{+} h_{r}^{(i)} (θ))}, \\ M_{i, n}^{*} (μ [n]) = \frac{θ_{n}^{(i)} exp (- θ_{n}^{(i)} μ [n])}{exp (- μ^{-} [n] θ_{n}^{(i)}) - exp (- μ^{+} [n] θ_{n}^{(i)})}, n = \bar{1, 24}, \end{matrix}

(42)

where

l_{r}^{(i)} (θ) = \sum_{n = 1}^{24} θ_{n} L_{r}^{(i)} [n - 1], h_{r}^{(i)} (θ) = \sum_{n = 1}^{24} θ_{n} T_{r}^{(i)} [n] .

(43)

The Lagrange multipliers

θ^{(i)}

are calculated by solving the system of balance equations

L^{(i)} (θ^{(i)}) + T^{(i)} (θ^{(i)}) + M_{n}^{(i)} (θ_{n}^{(i)}) = L_{r}^{(i)} [n], n = \bar{1, 24},

(44)

where

\begin{matrix} L^{(i)} (θ^{(i)}) = \frac{exp (- a^{-} l_{r}^{(i)} (θ^{(i)})) (a^{-} l^{(i)} (θ^{(i)}) + 1) - exp (- a^{+} l_{r}^{(i)} (θ^{(i)})) (a^{+} l^{(i)} (θ^{(i)}) + 1)}{exp (- a^{-} l^{(i)} (θ^{(i)})) - exp (- a^{+} l^{(i)} (θ^{(i)}))}, \\ T^{(i)} (θ^{(i)}) = \frac{exp (- b^{-} h_{r}^{(i)} (θ^{(i)})) (b^{-} h^{(i)} (θ^{(i)}) + 1) - exp (- b^{+} h_{r}^{(i)} (θ^{(i)})) (b^{+} h^{(i)} (θ^{(i)}) + 1)}{exp (- b^{-} h^{(i)} (θ^{(i)})) - exp (- b^{+} h^{(i)} (θ^{(i)}))}, \\ M_{n}^{(i)} (θ_{n}^{(i)}) = \frac{exp (- μ^{-} [n] θ_{n}^{(i)}) (μ^{-} [n] θ_{n}^{(i)}) + 1) - exp (- μ^{+} [n] θ_{n}^{(i)}) (μ^{+} [n] θ_{n}^{(i)}) + 1)}{θ_{n}^{(i)} (exp (- μ^{-} [n] θ_{n}^{(i)}) - exp (- μ^{+} [n] θ_{n}^{(i)}))} . \end{matrix}

(45)

Consider the

T ξ

model. The corresponding entropy-optimal PDFs parametrized by the Lagrange multipliers have the form

\begin{matrix} W_{i}^{*} (c, η^{(i)}) = \frac{{\tilde{h}}_{r}^{(i)} (η) exp (- c {\tilde{h}}_{r}^{(i)} (η))}{exp (- a^{-} {\tilde{h}}_{r}^{(i)} (η)) - exp (- a^{+} {\tilde{h}}_{r}^{(i)} (η))}, \\ Q_{i, n}^{*} (ξ [n]) = \frac{η_{n}^{(i)} exp (- η_{n}^{(i)} ξ [n])}{exp (- ξ^{-} [n] η_{n}^{(i)}) - exp (- ξ^{+} [n] η_{n}^{(i)})}, n = \bar{1, 24}, \end{matrix}

where

{\tilde{h}}_{r}^{(i)} (η) = \sum_{n = 1}^{24} η_{n} (2, 1 T_{r}^{(i)} [n - 1] - 1, 1 T_{r}^{(i)} [n - 2]), q^{(i)} (η^{(i)}) = \sum_{n = 1}^{24} η_{n}^{(i)} .

(46)

The Lagrange multipliers

η^{(i)}

are calculated by solving the system of balance equations

D^{(i)} (η^{(i)}) + N^{(i)} (η^{(i)}) + K_{n}^{(i)} (η_{n}^{(i)}) = T_{r}^{(i)} [n], n = \bar{1, 24},

(47)

where

\begin{matrix} D^{(i)} (η^{(i)}) = \frac{exp (- t^{-} q^{(i)} (η^{(i)})) (t^{-} q^{(i)} (η^{(i)}) + 1) - exp (- t^{+} q^{(i)} (η^{(i)})) (t^{+} q^{(i)} (η^{(i)}) + 1)}{exp (- t^{-} q^{(i)} (η^{(i)})) - exp (- t^{+} q^{(i)} (η^{(i)}))}, \\ N^{(i)} (η^{(i)}) = \frac{exp (- c^{-} {\tilde{h}}_{r}^{(i)} (η^{(i)})) (c^{-} {\tilde{h}}^{(i)} (η^{(i)}) + 1) - exp (- c^{+} {\tilde{h}}_{r}^{(i)} (η^{(i)})) (c^{+} {\tilde{h}}^{(i)} (η^{(i)}) + 1)}{exp (- c^{-} {\tilde{h}}^{(i)} (η^{(i)})) - exp (- c^{+} {\tilde{h}}^{(i)} (η^{(i)}))}, \\ K_{n}^{(i)} (η_{n}^{(i)}) = \frac{exp (- ξ^{-} [n] η_{n}^{(i)}) (ξ^{-} [n] η_{n}^{(i)}) + 1) - exp (- ξ^{+} [n] η_{n}^{(i)}) (ξ^{+} [n] η_{n}^{(i)}) + 1)}{η_{n}^{(i)} (exp (- ξ^{-} [n] η_{n}^{(i)}) - exp (- ξ^{+} [n] η_{n}^{(i)}))} . \end{matrix}

(48)

(4). Results of Model Learning.

Using the available data on daily variations of electrical load and ambient temperature (see Figure 1) for the three days indicated above, the balance Equations (44), (45), (47) and (48) were formed. Their solution was determined by minimizing the quadratic residual between the left- and right-hand sides of the equations. Since the equations are significantly nonlinear, the resulting values of the Lagrange multipliers (see Table 1) correspond to a local minimum of the residual. All calculations were implemented in MATLAB; optimization was performed using the fsolve function.

Because the parameters of the

L T

model are independent, the joint PDFs

U_{i}^{*} (a, b) = P_{i}^{*} (a) F_{i}^{*} (b)

of the parameters and noises have the form

\begin{matrix} U_{1}^{*} (a, b) & = & 53.09 exp (- 9.72 a) exp (0.06 b), \\ U_{2}^{*} (a, b) & = & 55.49 exp (- 6.04 a) exp (0.58 b), \\ U_{3}^{*} (a, b) & = & 65.81 exp (- 6.09 a) exp (- 0.81 b), \end{matrix}

(49)

(a, b) \in [0.05, 0.15] ⋃ [0.5, 1.0], μ \in [- 0.1, 0.1], i = \bar{1, 24} .

Clearly, the PDFs are of exponential type. For

i = 1,

the graphs are shown in Figure 2.

For the

T ξ

model, the PDFs of the parameters and noises have the form

\begin{matrix} W_{1}^{*} (c) & = & 13.90 exp (- 0.41 c), \\ W_{2}^{*} (c) & = & 10.43 exp (- 0.05 c), \\ W_{3}^{*} (c) & = & 11.65 exp (- 0.19 c), \end{matrix}

c \in [0.75, 0.85], ξ \in [- 0.1, 0.1], i = \bar{1, 24} .

For

i = 1,

the graphs can be seen in Figure 3.

Thus, the randomized

L T ξ

model generates random trajectories with the entropy-optimal PDFs of the model parameters and measurement noises:

\begin{matrix} L [n] & = & a L [n - 1] + b T [n], (P^{*} (a), F^{*} (b)), \\ v [n] & = & L [n] + μ [n], M_{n}^{*} (μ [n]); \\ τ [n] & = & c (2.1 τ [n - 1] - 1.1 τ [n - 2]), W^{*} (c), i = \bar{1, 3}, \\ T [n] & = & t + τ [n] + ξ [n], Q_{n}^{*} (ξ [n]) . \end{matrix}

(50)

The corresponding ensembles are generated by the sampling procedure of the resulting PDFs of the parameters and noises using the acceptance-rejection (AR) method (also known as rejection sampling (RS); see [49]). During calculations, 100 samples for each parameter and 100 samples for each noise were used; in other words, the ensemble consisted of

10^{4}

trajectories.

(5). Model Testing.

The adequacy of the model was analyzed by the self- and cross-testing of the

L T

and

T ξ

models on the real load–temperature data for 3–5 July 2016 (

i = 1, 2, 3

). Self-testing means generating an ensemble of trajectories with the entropy-optimal parameters and noises for day i, calculating the mean (mean) and median (med) trajectories and also the variance curve (std±) of the ensemble, and comparing the mean trajectory with the real counterparts by electrical load and ambient temperature for the same day i. The quality of approximation is characterized by relative errors,

δ_{L}^{(i)} = \frac{\sum_{n = 1}^{24} {(L_{m e a n}^{(i)} [n] - L_{r}^{(i)} [n])}^{2}}{\sum_{n = 1}^{24} {(L_{m e a n}^{(i)} [n])}^{2} + \sum_{n = 1}^{24} {(L_{r}^{(i)} [n])}^{2}}, i = \bar{1, 3},

(51)

in electrical load and

δ_{T}^{(i)} = \frac{\sum_{n = 1}^{24} {(T_{m e a n}^{(i)} [n] - T_{r}^{(i)} [n])}^{2}}{\sum_{n = 1}^{24} {(T_{m e a n}^{(i)} [n])}^{2} + \sum_{n = 1}^{24} {(T_{r}^{(i)} [n])}^{2}}, i = \bar{1, 3},

(52)

in ambient temperature.

Cross-testing represents a similar procedure in which the mean trajectories are compared with the real counterparts in terms of electrical load and ambient temperature for days

j \neq i .

The quality of approximation is characterized by relative errors,

δ_{L}^{(i, j)} = \frac{\sum_{n = 1}^{24} {(L_{m e a n}^{(i)} [n] - L_{r}^{(j)} [n])}^{2}}{\sum_{n = 1}^{24} {(L_{m e a n}^{(i)} [n])}^{2} + \sum_{n = 1}^{24} {(L_{r}^{(j)} [n])}^{2}}, i = \bar{1, 3}, i \neq j,

(53)

in electrical load and

δ_{T}^{(i, j)} = \frac{\sum_{n = 1}^{24} {(T_{m e a n}^{(i)} [n] - T_{r}^{(j)} [n])}^{2}}{\sum_{n = 1}^{24} {(T_{m e a n}^{(i)} [n])}^{2} + \sum_{n = 1}^{24} {(T_{r}^{(j)} [n])}^{2}}, i = \bar{1, 3}, i \neq j,

(54)

in ambient temperature.

Self-testing. For the

L T

model, the real ambient temperature data

T_{r}^{(i)} [n]

as well as the entropy-optimal PDFs

P_{i}^{*} (a)

and

F_{i}^{*} (b)

of the parameters

(a, b)

and the PDFs

M_{1}^{*} (μ [1]), \dots, M_{24}^{*} (μ [24])

of the measurement noises

μ [n]

were used. The ensembles

L^{(i)}

were generated using the sampling procedure of the above PDFs. The mean trajectory

L_{m e a n}^{(i)} [n]

, the median trajectory

L_{m e d}^{(i)} [n]

and also the trajectories

L_{s t d \pm}^{(i)} [n]

corresponding to the limits of the variance graph were found. The errors

δ_{L}^{(i)}

were calculated. The resulting ensembles and relative errors

δ_{L}^{(i)}

for the three indicated days are demonstrated in Figure 4.

The

T ξ

model was tested by generating the ensemble

T^{(i)}

of random trajectories

T^{(i)} [n]

,

n = \bar{1, 24}

with the entropy-optimal PDFs

W^{(i)} (c)

and

Q_{1}^{*} (ξ [1]), \dots, Q_{24}^{*} (ξ [24])

through sampling. The mean trajectory

T_{m e a n}^{(i)} [n]

, the median trajectory

T_{m e d}^{(i)} [n]

and the trajectory

T_{s t d \pm}^{(i)} [n]

corresponding to the limits of the variance curve are calculated. The resulting ensembles and relative errors

δ_{T}^{(i)}

for the three days are shown in Figure 5.

Cross-testing. For cross-testing, the

L T

and

L T ξ

models learned on the data for day i were used, and their mean trajectories were compared with the data for days

j \neq i

. The resulting errors are combined in Table 2, Table 3 and Table 4.

(6). Randomized Prediction of N-Daily Load.

In the randomized prediction of the N-daily load, the

L T ξ

model learned on the interval

T_{l}

was used. The quality of the forecast was characterized using the

L T ξ

model with the entropy-optimal PDFs obtained on the real data for the first (

i = 1

) day.

The 1-day (

n \in [1, 24]

), 2-day (

n \in [1, 48]

) and 3-day (

n \in [1, 72]

) ensembles were constructed by the sampling procedure of the above PDFs. For these ensembles, the mean trajectories

L_{m e a n} [n]

, the median trajectories

L_{m e d} [n]

and also the limiting trajectories

L_{s t d \pm} [n]

of the variance curve were found. The forecast results were compared with the real data for 3–7 July 2006 (

i = \bar{1, 4}

). The forecasting quality was characterized by the relative errors calculated similar to (53) and (54).

The resulting 24-h, 48-h and 72-h randomized forecasts of electrical load and their probabilistic characteristics (the mean and median trajectories, the limit trajectories of the variance curves) are presented in Figure 6. The errors, i.e., the deviations between the model forecasts and real data, can be seen in Table 5.

7. Conclusions

The article proposes a new forecasting approach based on the idea of generation not the only forecast and not a set of forecasts with the scenario’s model parameters, and not forecasts with assigned probabilities, but the ensemble of random forecasts with entropy-optimal model parameters and measurement noises.

For the randomized forecasting we propose a structure of predictive dynamical model which uses as real data as optimized noises. The latter are the source of ensemble of predictive trajectories, which allow computing deterministic trajectories of its different numerical characteristics and probabilistic estimates.

Author Contributions

Conceptualization, Y.S.P.; Data curation, A.Y.P.; Methodology, Y.S.P., A.Y.P., Y.A.D. and D.S.; Software, A.Y.P. and Y.A.D.; Supervision, D.S.; Writing—original draft, Y.S.P., A.Y.P., Y.A.D. and D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by Russian Foundation for Basic Research (project Nos. 20-07-00223, 20-07-00683, 20-07-00470).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Frawley, W.J.; Piatetsky-Shapiro, G.; Matheus, C.J. Knowledge discovery in databases: An overview. AI Mag. 1992, 13, 57. [Google Scholar]
Witten, I.H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2005. [Google Scholar]
Campbell, P. Editorial on special issue on big data: Community cleverness required. Nature 2008, 455, 1. [Google Scholar]
Dhar, V. Data Science and Prediction. Commun. ACM 2013, 56, 64–73. [Google Scholar] [CrossRef]
Rosenblatt, F. The Perceptron, a Perceiving and Recognizing Automaton Project Para; Cornell Aeronautical Laboratory: Buffalo, NY, USA, 1957. [Google Scholar]
Tsypkin, Y.Z. Osnovi Teorii Obuchaiuschichsia Sistem (Foundations of the Theory of Learning Systems); Nauka: Moscow, Russia, 1970. [Google Scholar]
Ayzerman, M.A.; Braverman, E.M.; Rozonoer, L.I. Metod Potencialnikh Funkcii v Teorii Obuchenia Mashin (Method of Potential Functions in the Theory of Machine Learning); Nauka: Moscow, Russia, 1970. [Google Scholar]
Vapnik, V.N. Statistical Learning Theory; Wiley: New York, NY, USA, 1998. [Google Scholar]
Bishop, C. Pattern Recognition and Machine Learning (Information Science and Statistics), 1st ed.; Springer: New York, NY, USA, 2006; Reprint in 2007. [Google Scholar]
Popkov, Y.S.; Popkov, A.Y.; Dubnov, Y.A. Randomizirovannoe Mashinnoe Obichenie: Ot Empiricheskoi Veroiatnosti k Entropiinoi Randomizacii (Randomized Machine Learning: From Empirical Probability to Entropy Randomization); LENAND: Moscow, Russia, 2019. [Google Scholar]
Friedman, J.; Hastie, T.; Tibshirani, R. The Elements of Statistical Learning; Springer Series in Statistics; Springer: Berlin, Germany, 2001; Volume 1. [Google Scholar]
Aivazyan, S.A.; Mhitaryan, V.S. Prikladnaia Statistika i Osnovi Econometriki (Applied Statistics and Basics of Econometrics); Unity: Moscow, Russia, 1998.
Tarassow, A. Forecasting U.S. money growth using economic uncertainty measures and regularisation techniques. Int. J. Forecast. 2019, 35, 443–457. [Google Scholar] [CrossRef]
Marcellino, M.; Stock, J.H.; Watson, M.W. A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series. J. Econ. 2006, 135, 499–526. [Google Scholar] [CrossRef] [Green Version]
Eitrheim, Ø.; Teräsvirta, T. Testing the adequacy of smooth transition autoregressive models. J. Econ. 1996, 74, 59–75. [Google Scholar] [CrossRef]
Molodtsova, T.; Papell, D. Out-of-sample exchange rate predictability with Taylor rule fundamentals. J. Int. Econ. 2009, 77, 167–180. [Google Scholar] [CrossRef]
Granger, C.; Teräsvirta, T. Modelling Non-Linear Economic Relationships; Oxford University Press: Oxford, UK, 1993. [Google Scholar]
Wang, R.; Morley, B.; Stamatogiannis, M.P. Forecasting the exchange rate using nonlinear Taylor rule based models. Int. J. Forecast. 2019, 35, 429–442. [Google Scholar] [CrossRef]
Bessec, M.; Fouquau, J. Short-run electricity load forecasting with combinations of stationary wavelet transforms. Eur. J. Oper. Res. 2018, 264, 149–164. [Google Scholar] [CrossRef]
Clements, A.E.; Hurn, A.; Li, Z. Forecasting day-ahead electricity load using a multiple equation time series approach. Eur. J. Oper. Res. 2016, 251, 522–530. [Google Scholar] [CrossRef] [Green Version]
Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 2016, 32, 914–938. [Google Scholar] [CrossRef]
Wheatcroft, E. Interpreting the skill score form of forecast performance metrics. Int. J. Forecast. 2019, 35, 573–579. [Google Scholar] [CrossRef]
Canale, A.; Ruggiero, M. Bayesian nonparametric forecasting of monotonic functional time series. Electron. J. Stat. 2016, 10, 3265–3286. [Google Scholar] [CrossRef]
Dubnov, Y.; Boulytchev, A.V. Bayesian Identification of a Gaussian Mixture Model. Inform. Tekhn. Vychisl. Sist. 2017, 1, 101–114. [Google Scholar]
Frazier, D.T.; Maneesoonthorn, W.; Martin, G.M.; McCabe, B.P. Approximate bayesian forecasting. Int. J. Forecast. 2019, 35, 521–539. [Google Scholar] [CrossRef] [Green Version]
Beaumont, M.A.; Zhang, W.; Balding, D.J. Approximate Bayesian computation in population genetics. Genetics 2002, 162, 2025–2035. [Google Scholar]
McAdam, P.; Warne, A. Euro area real-time density forecasting with financial or labor market frictions. Int. J. Forecast. 2019, 35, 580–600. [Google Scholar] [CrossRef] [Green Version]
Alkema, L.; Gerland, P.; Raftery, A.; Wilmoth, J. The United Nations probabilistic population projections: An introduction to demographic forecasting with uncertainty. Foresight (Colch. VT) 2015, 2015, 19. [Google Scholar]
Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3. [Google Scholar] [CrossRef]
Bröcker, J.; Smith, L.A. From ensemble forecasts to predictive distribution functions. Tellus A Dyn. Meteorol. Oceanogr. 2008, 60, 663–678. [Google Scholar] [CrossRef] [Green Version]
Christensen, H.; Moroz, I.; Palmer, T. Evaluation of ensemble forecast uncertainty using a new proper score: Application to medium-range and seasonal forecasts. Q. J. R. Meteorol. Soc. 2015, 141, 538–549. [Google Scholar] [CrossRef]
Gneiting, T.; Katzfuss, M. Probabilistic forecasting. Annu. Rev. Stat. Appl. 2014, 1, 125–151. [Google Scholar] [CrossRef]
Lahiri, K.; Wang, J.G. Evaluating probability forecasts for GDP declines using alternative methodologies. Int. J. Forecast. 2013, 29, 175–190. [Google Scholar] [CrossRef]
Hong, T.; Pinson, P.; Fan, S.; Zareipour, H.; Troccoli, A.; Hyndman, R.J. Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond. Int. J. Forecast. 2016, 32, 896–913. [Google Scholar] [CrossRef] [Green Version]
Vidyasagar, M. Randomized Algorithms for Robust Controller Synthesis Using Statistical Learning Theory: A Tutorial Overview. Eur. J. Control 2001, 7, 287–310. [Google Scholar] [CrossRef]
Granichin, O.N.; Polyak, B.T. Randomizirovannie Algoritmi Ocenivania i Optimizacii pri Pochti Proizvolnikh Pomekhakh (Randomized Algorithms of Estimation and Optimization under Almost Arbitrary Disturbances); Nauka: Moscow, Russia, 2003. [Google Scholar]
Biondo, A.E.; Pluchino, A.; Rapisarda, A.; Helbing, D. Are random trading strategies more successful than technical ones? PLoS ONE 2013, 8, e68344. [Google Scholar] [CrossRef]
Lutz, W.; Sanderson, W.; Scherbov, S. The end of world population growth. Nature 2001, 412, 543. [Google Scholar] [CrossRef] [Green Version]
Tsirlin, A.M. Metody Usrednennoi Optimizatsii i Ikh Primenenie (Average Optimization Methods and Their Application); Fizmatlit: Moscow, Russia, 1997. [Google Scholar]
Shannon, C.E. Communication theory of secrecy systems. Bell Labs Tech. J. 1949, 28, 656–715. [Google Scholar] [CrossRef]
Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
Rosenkrantz, R.D.; Jaynes, E.T. Papers on Probability, Statistics, and Statistical Physics; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1989. [Google Scholar]
Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
Joffe, A.D.; Tihomirov, A.M. Teoriya Ekstremalnykh Zadach (Theory of Extreme Problems); Nauka: Moscow, Russia, 1974. [Google Scholar]
Wang, P.; Liu, B.; Hong, T. Electric load forecasting with recency effect: A big data approach. Int. J. Forecast. 2016, 32, 585–597. [Google Scholar] [CrossRef] [Green Version]
Gaillard, P.; Goude, Y.; Nedellec, R. Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting. Int. J. Forecast. 2016, 32, 1038–1050. [Google Scholar] [CrossRef]
Fiedner, G. Hierarchical Forecasting: Issues and Use Guidelines. Ind. Manag. Data Syst. 2001, 101, 5–12. [Google Scholar] [CrossRef]
Amaral, L.F.; Souza, R.C.; Stevenson, M. A smooth transition periodic autoregressive (STPAR) model for short-term load forecasting. Int. J. Forecast. 2008, 24, 603–615. [Google Scholar] [CrossRef]
Von Neumann, J. 13. Various Techniques Used in Connection With Random Digits. Appl. Math. Ser. 1951, 12, 36–38. [Google Scholar]

Figure 1. Structure of the RDRM.

Figure 2. PDFs of the parameters and noises for i = 1.

Figure 3. PDFs of the parameters and noises for i = 1.

Figure 4. Ensembles of LT model.

Figure 5. Ensembles of

T ξ

model.

Figure 5. Ensembles of

T ξ

model.

Figure 6. 24-h, 48-h and 72-h forecasts using

L T ξ

model.

Figure 6. 24-h, 48-h and 72-h forecasts using

L T ξ

model.

Table 1. Lagrange multipliers

θ, η

.

Table 1. Lagrange multipliers

θ, η

.

Time Instants	$θ^{(1)}$	$θ^{(2)}$	$θ^{(3)}$	$η^{(1)}$	$η^{(2)}$	$η^{(3)}$
1	−29.72	7009.28	1038.07	14.63	21.22	17.34
2	1.58	230.89	35.35	19.52	26.71	28.32
3	−4.09	369.96	26.23	35.91	31.60	26.33
4	−4.68	29.93	11.96	55.83	127.82	52.08
5	−7.21	24.25	1.03	96.85	642.35	110.94
6	−9.26	13.72	−15.76	592.99	7009.28	4729.52
7	−59.09	−5.96	−7009.28	7009.28	183.92	7009.28
8	−7009.28	−33.99	−767.99	48.21	39.94	23.16
9	−766.00	−1409.28	−22.91	66.58	12.28	−1.26
10	−50.90	−4229.90	−4.27	37.38	2.35	−19.78
11	−18.97	−45.22	3.72	22.51	−8.82	−22.73
12	−11.42	−15.07	9.17	7.16	−27.06	−23.06
13	−13.94	2.59	17.38	5.72	−172.25	−27.06
14	−17.62	5.82	14.94	2.83	−65.29	−23.02
15	−18.18	9.33	17.74	−0.30	−57.45	−23.15
16	−27.28	11.35	21.85	−1.24	−482.69	−47.78
17	−49.55	4.50	22.68	−5.49	−889.02	−130.49
18	−25.41	−7.09	29.39	−0.89	−28.12	−60.71
19	−8.20	−4.66	98.03	−4.23	−14.20	−270.17
20	0.95	−4.89	52.27	3.70	−6.41	−31.47
21	1.01	−16.37	8.15	21.16	2.48	−1.23
22	22.00	−8.24	7.45	17.85	12.15	14.86
23	2881.43	17.00	902.73	26.78	9.98	26.39
24	512.14	36.30	355.47	27.32	24.65	121.44
$l_{r}^{(i)} (θ^{*})$	9.71	6.04	6.09
$h_{r}^{(i)} (θ^{*})$	0.06	0.58	0.81
${\tilde{h}}_{r}^{(i)} (η^{*})$				0.41	0.05	0.19

Table 2. Values

δ_{L}

obtained by cross-testing of

L T

model. Mean value

δ_{L} = 0.0530

.

Table 2. Values

δ_{L}

obtained by cross-testing of

L T

model. Mean value

δ_{L} = 0.0530

.

$i / j$	1	2	3
1		0.0495	0.1052
2	0.0858		0.1428
3	0.0569	0.0364

Table 3. Values

δ_{T}

obtained by cross-testing of

T ξ

model. Mean value

δ_{L} = 0.0757

.

Table 3. Values

δ_{T}

obtained by cross-testing of

T ξ

model. Mean value

δ_{L} = 0.0757

.

$i / j$	1	2	3
1		0.1051	0.1079
2	0.1506		0.1185
3	0.1315	0.0676

Table 4. Values

δ_{T}

obtained by cross-testing of

L T ξ

model. Mean value

δ_{L} = 0.1478

.

Table 4. Values

δ_{T}

obtained by cross-testing of

L T ξ

model. Mean value

δ_{L} = 0.1478

.

$i / j$	1	2	3
1		0.1437	0.2659
2	0.1756		0.2322
3	0.3475	0.1655

Table 5. Accuracy of 24-h, 48-h and 72-h forecasts using

L T ξ

model.

Table 5. Accuracy of 24-h, 48-h and 72-h forecasts using

L T ξ

model.

$δ_{L}^{(2)}$	$δ_{L}^{(3)}$	$δ_{L}^{(4)}$
0.1509	0.2515	0.2133

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Popkov, Y.S.; Popkov, A.Y.; Dubnov, Y.A.; Solomatine, D. Entropy-Randomized Forecasting of Stochastic Dynamic Regression Models. Mathematics 2020, 8, 1119. https://doi.org/10.3390/math8071119

AMA Style

Popkov YS, Popkov AY, Dubnov YA, Solomatine D. Entropy-Randomized Forecasting of Stochastic Dynamic Regression Models. Mathematics. 2020; 8(7):1119. https://doi.org/10.3390/math8071119

Chicago/Turabian Style

Popkov, Yuri S., Alexey Yu. Popkov, Yuri A. Dubnov, and Dimitri Solomatine. 2020. "Entropy-Randomized Forecasting of Stochastic Dynamic Regression Models" Mathematics 8, no. 7: 1119. https://doi.org/10.3390/math8071119

APA Style

Popkov, Y. S., Popkov, A. Y., Dubnov, Y. A., & Solomatine, D. (2020). Entropy-Randomized Forecasting of Stochastic Dynamic Regression Models. Mathematics, 8(7), 1119. https://doi.org/10.3390/math8071119

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Entropy-Randomized Forecasting of Stochastic Dynamic Regression Models

Abstract

1. Introduction

2. Procedure of Entropy-Randomized Forecasting

3. Randomized Dynamic Regression Models with Random Input and Parameters

4. Models of Learning Data Sets

5. Algorithm of Randomized Machine Learning

6. Entropy-Randomized Forecasting of Daily Electrical Load of Power System

7. Conclusions

Author Contributions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI