Entropy-Randomized Forecasting of Stochastic Dynamic Regression Models

Abstract: We propose a new forecasting procedure based on randomized hierarchical dynamic regression models with random parameters, measurement noises, and a random input. We develop the technology of entropy-randomized machine learning, which includes the estimation of the characteristics of a dynamic regression model and its testing by generating ensembles of predicted trajectories through the sampling of the entropy-optimal probability density functions of the model parameters and measurement noises. The density functions are determined at the learning stage by solving the constrained maximization problem of an information entropy functional subject to empirical balances with real data. The proposed procedure is applied to the randomized forecasting of the daily electrical load in a regional power system. We construct a two-layer dynamic model of the daily electrical load. One of the layers describes the dependence of the electrical load on the ambient temperature, while the other simulates the stochastic quasi-fluctuating temperature dynamics.


Introduction
Due to the steadily increasing resources and computational power of computers, huge amounts of data can be accumulated and stored, both in natural and digitized formats. The following question then arises immediately: what should be done with these data, except for storage? Extracting new knowledge from data seems to be a very promising idea. The concepts of Data Mining (DM) [1,2], Big Data (BD) [3] and Data Science (DS) [4] were formulated and further developed by researchers accordingly.
A very tempting goal, extracting new knowledge from data, inevitably leads to the verbal or formal (mathematical) modeling of the "expected" knowledge. Consequently, any model has some predictive properties, which can be realized only under known values of its quantitative characteristics (parameters). Data are a fundamental component of the three concepts above: data are used for estimating the characteristics of a model via machine learning (ML) procedures, which is what allows extracting new knowledge.
Unlike DM, BD, and DS, the concept of ML has a rich history of over 70 years as well as vast experience in solving numerous problems. The first publication in this field of research dates back to 1957; see [5]. The notion of empirical risk, a key element of ML procedures, was introduced in 1970 in the monograph [6]. The method of potential functions was later developed for classification and recognition problems.
In this paper, we propose a fundamentally different forecasting method, the so-called entropy-randomized forecasting (ERF). In accordance with this method, an ensemble of random forecasts is generated by a predictive dynamic regression model (PDRM) with random input and parameters. The corresponding probabilistic characteristics, namely the probability density functions (PDFs), are determined using the entropy-randomized machine learning procedure. The ensembles of forecasting trajectories are constructed by the sampling of the entropy-optimal PDFs.
The proposed method is adopted for the randomized prediction of the daily electrical load of a regional power system. A hierarchical randomized dynamic regression model that describes the dependence of the load on the ambient temperature is constructed. The temporal evolution of the ambient temperature is represented by an oscillatory second-order dynamic regression model with a random parameter and a random input. The results of randomized learning of this model on the GEFCom2014 dataset [34] are given. A randomized forecasting technology is suggested, and its adequacy is investigated depending on the length of the forecasting horizon.

Procedure of Entropy-Randomized Forecasting
Randomization, as a means of imparting artificial and rationally organized random properties to naturally nonrandom events, indicators, or methods, is a fairly common technique that yields a positive effect. There exist many examples in various fields of science, management, and economics: randomized numerical optimization methods [35,36]; the mixed (random) strategies of trading on a stock exchange [37]; the randomized forecasting of population dynamics [38]; vibration control of industrial processes [39]. As a result of randomization, nonrandom objects gain artificial stochastic properties with probabilistic characteristics that are optimal in a chosen sense. The question of appropriate quantitative characteristics of optimality has always been controversial and ambiguous. It requires arguments that would somehow reflect the important specifics of a randomized object. In particular, a fundamental feature of forecasting procedures is uncertainty in the data, predictive models, methods for generating forecasts, etc.
In what follows, information entropy [40] will be used as a characteristic of uncertainty. In the works [41-43], using the first law of thermodynamics, it was demonstrated that entropy is a natural functional describing the processes of universal evolution. Moreover, in accordance with the second law of thermodynamics, entropy maximization determines the best state of an evolutionary process under the worst-case external disturbance (maximum uncertainty). Also note another quality of information entropy, associated with measurement errors and other types of errors, which are important attributes of data: when the factor of such errors is considered in terms of information entropy, the probabilistic characteristics of noises exerting the worst-case impact on forecasting procedures can be estimated in explicit form.
The technology of entropy-randomized forecasting consists of the following stages. At the beginning (the first stage), a predictive randomized model (PRM) of the studied object is formed and its parameters are specified. A PRM transforms real data into a model output. In the general case, these transformations are assumed to be dynamic, i.e., the model output observed at a time instant n depends on the states observed on some past interval. The PRM parameters are assumed to be of the interval type and random, and their probabilistic properties are characterized by the corresponding PDFs.
The second stage of the technology under consideration-randomized machine learning (more specifically, its entropy version)-is intended to estimate the PDFs. At this stage, the estimates of the PDFs are calculated using learning data sets and also a learning algorithm in the form of a functional entropy-linear programming problem.
At the third stage, the optimized PRM (with the entropy-optimal PDFs) is tested using a test data set and accepted quantitative characteristics of the quality of learning. The optimized PRM actually generates an ensemble of random trajectories, vectors, or events with the entropy-optimal values of their parameters.
The learned and tested PRMs serve for forecasting. In this case, the ensembles of random forecasted trajectories generated by the entropy-optimal PRMs are used to calculate their numerical characteristics such as mean trajectories, variance curves, median trajectories, the PDF evolution of forecasted trajectories, etc.
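As a minimal illustration of the last stage, the following Python sketch generates an ensemble of random forecasted trajectories and computes its numerical characteristics. The parameter density and the model form here are illustrative assumptions, not a learned PRM: a uniform density stands in for an entropy-optimal PDF, and the predictive model is a toy first-order recursion.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_parameter(n_samples):
    # Stand-in for sampling an entropy-optimal PDF: here the learned
    # parameter density is replaced by a uniform one on [0.7, 0.9],
    # purely for illustration.
    return rng.uniform(0.7, 0.9, size=n_samples)

def forecast_trajectory(a, x0, horizon):
    # An illustrative first-order predictive model x[n] = a * x[n-1].
    traj = np.empty(horizon)
    x = x0
    for n in range(horizon):
        x = a * x
        traj[n] = x
    return traj

# Build the ensemble of random forecasts and its numerical characteristics.
params = sample_parameter(1000)
ensemble = np.array([forecast_trajectory(a, x0=1.0, horizon=24) for a in params])
mean_traj = ensemble.mean(axis=0)        # mean trajectory
med_traj = np.median(ensemble, axis=0)   # median trajectory
std_band = ensemble.std(axis=0)          # variance curve
```

In the actual procedure, `sample_parameter` would draw from the entropy-optimal PDFs obtained at the learning stage rather than from an assumed uniform density.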

Randomized Dynamic Regression Models with Random Input and Parameters
Randomized dynamic regression models (RDRMs) form a class of dynamic models with random parameters that describe a parametrized dependence of the object's state at a given time instant on external factors and its states at some past time instants.
The structures of models are designed on the basis of existing knowledge and hypotheses about the properties of an object, which often turn out to be very inaccurate. Moreover, the external factors themselves can change over time and therefore should be predicted to model the object's dynamics. Reliable information on realistically measured impacts leading to the temporal evolution of external factors is often unavailable. The aforementioned indicates the presence of uncertainty, both in the development and further use of models. In [10], a method for reducing the influence of uncertainty based on the randomization of models (including the class of RDRMs) was proposed. In the latter case, this method extends the idea of randomization to the modeling of external factors and their evolution.
The structure of the RDRM is shown in Figure 1. It consists of a model of the main object (RDRM-O) with random parameters a ∈ R^p and a model of external factors (RDRM-F) with random parameters b ∈ R^s and a random input ζ ∈ R^q. The states of the object and its model belong to the vector space R^m. In the linear case, RDRM-O is described by the equation

x[n] = Σ_{i=1}^{p} A_i x[n − i] + A^(p+1) z[n],

with the following notations:
• A^(p) = [A_1, . . . , A_p] as the block column vector of parameters, where A_i is a random matrix of dimensions (m × m) with random elements of the interval type, i.e., a_kl^(i) ∈ [a_kl^(i)−, a_kl^(i)+];
• A^(p+1) as a matrix of dimensions (m × q) with random elements of the interval type, i.e., a_kj^(p+1) ∈ [a_kj^(p+1)−, a_kj^(p+1)+];
• X(n, p) = [x[n − 1], . . . , x[n − p]] as the block row vector of p retrospective states.
The probabilistic properties of the block vector A^(p) and the matrix A^(p+1) are characterized by a joint PDF P(A^(p)) and a PDF F(A^(p+1)), respectively. The state of RDRM-O is assumed to be measurable at each time instant n and also to contain an additive noise µ[n]:

x̂[n] = x[n] + µ[n].

The random vectors µ[n] are of the interval type, i.e., µ[n] ∈ [µ^−[n], µ^+[n]], with a PDF M_n(µ[n]). The random vectors measured at different time instants are assumed to be statistically independent.
Consider the linear version of RDRM-F, which has a similar structure described by the equation

z[n] = Σ_{i=1}^{s} B_i z[n − i] + ζ[n],

with the following notations:
• B^(s) = [B_1, . . . , B_s] as a block column vector formed by matrices B_i of dimensions (q × q) with random elements of the interval type, i.e., b_kl^(i) ∈ [b_kl^(i)−, b_kl^(i)+];
• Z(n, s) = [z[n − 1], . . . , z[n − s]] as a block row vector.
The probabilistic properties of the parameters are characterized by a continuously differentiable PDF W(B (s) ).
The random vector ζ[n] is of the interval type, i.e., ζ[n] ∈ [ζ^−[n], ζ^+[n]], with a continuously differentiable PDF Q_n(ζ[n]). The random vectors ζ[n] measured at different time instants are statistically independent. By analogy with RDRM-O, the state of RDRM-F is assumed to be measurable at each time instant n and also to contain an additive noise ξ[n]:

ẑ[n] = z[n] + ξ[n].

The random vectors ξ[n] are of the interval type, i.e., ξ[n] ∈ [ξ^−[n], ξ^+[n]], with a continuously differentiable PDF G_n(ξ[n]). The random vectors measured at different time instants are assumed to be statistically independent.
Thus, in the RDRM (RDRM-O and RDRM-F), the unknown characteristics are the PDFs P(A (p) ), F(A (p+1) ) and W(B (s) ) of the model parameters and also the PDFs M n (µ[n]), Q n (ζ[n]) and G n (ξ[n]) of the measurement noises, n ∈ L.
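A hedged simulation sketch of this two-layer structure may clarify how RDRM-F drives RDRM-O. The dimensions, model orders, intervals, and uniform densities below are illustrative assumptions standing in for the unknown PDFs:

```python
import numpy as np

rng = np.random.default_rng(1)
m, q, p, s = 2, 1, 1, 2   # state dimensions and model orders (illustrative)

def interval_matrix(shape, lo, hi):
    # A random matrix whose elements are of the "interval type": each
    # element is drawn from a density supported on [lo, hi] (uniform here).
    return rng.uniform(lo, hi, size=shape)

# One joint realization of the two-layer model:
#   RDRM-F: z[n] = B1 z[n-1] + B2 z[n-2] + zeta[n]
#   RDRM-O: x[n] = A1 x[n-1] + A_{p+1} z[n]
B1 = interval_matrix((q, q), -0.5, 0.5)
B2 = interval_matrix((q, q), -0.5, 0.5)
A1 = interval_matrix((m, m), -0.3, 0.3)
Ap1 = interval_matrix((m, q), 0.0, 1.0)

z_prev, z_prev2 = np.ones(q), np.zeros(q)
x = np.ones(m)
x_obs, z_obs = [], []
for n in range(48):
    zeta = rng.uniform(-0.1, 0.1, size=q)              # random input
    z = B1 @ z_prev + B2 @ z_prev2 + zeta
    x = A1 @ x + Ap1 @ z
    # Observed states carry additive interval-type noises mu[n] and xi[n].
    x_obs.append(x + rng.uniform(-0.05, 0.05, size=m))
    z_obs.append(z + rng.uniform(-0.05, 0.05, size=q))
    z_prev2, z_prev = z_prev, z
```

Repeating this loop with fresh draws of the parameter matrices and noises produces the ensembles of random trajectories discussed above.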

Models of Learning Data Sets
The desired PDFs (see the previous section) are estimated using the learning data sets that are obtained on a learning interval n ∈ L = [n − , n + ] and are consistent with the RDRM.
Consider RDRM-O. On the learning interval, the retrospective states form the block row vectors

X̂(n, p) = [x̂[n − 1], . . . , x̂[n − p]], n ∈ L.

The observable states of RDRM-O on the learning interval L represent the collection of vectors x̂[n−], . . . , x̂[n+]. Hence, the learning data set consists of the data on retrospective states of the object,

X̂(n−, p), X̂(n− + 1, p), . . . , X̂(n+, p),

and the data on observable current states,

x̂[n−], . . . , x̂[n+].

Consider RDRM-F. On the learning interval, the retrospective states form the block row vectors

Ẑ(n, s) = [ẑ[n − 1], . . . , ẑ[n − s]], n ∈ L.

The observable states of RDRM-F on the learning interval L represent the collection of vectors ẑ[n−], . . . , ẑ[n+]. Hence, the learning data set consists of the data on retrospective states of the factors, Ẑ(n−, s), . . . , Ẑ(n+, s), and the data on observable current states, ẑ[n−], . . . , ẑ[n+]. Thus, the learning procedure of the RDRM involves three data sets, (18), (21) and (22).
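The retrospective data sets X̂(n, p) can be assembled from an observed state sequence as follows. This is a small Python sketch with illustrative toy data; the function name and the 0-based indexing are assumptions of the sketch, not the paper's notation:

```python
import numpy as np

def retrospective_blocks(series, p, n_minus, n_plus):
    """Collect the block rows X(n, p) = [x[n-1], ..., x[n-p]] for each
    learning instant n in [n_minus, n_plus] (0-based indices here)."""
    return [np.stack([series[n - i] for i in range(1, p + 1)])
            for n in range(n_minus, n_plus + 1)]

x_hat = np.arange(10.0).reshape(10, 1)   # toy observed states, m = 1
blocks = retrospective_blocks(x_hat, p=2, n_minus=2, n_plus=9)
# blocks[0] stacks x[1] and x[0]; there is one block per learning instant.
```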

Algorithm of Randomized Machine Learning
The entropy version [10] of RML algorithms is used for estimating the PDFs of the model parameters and measurement noises of RDRM-O and RDRM-F. For RDRM-O, the corresponding algorithm maximizes the information entropy functional

H[P, F, M] = − ∫ P(A^(p)) ln P(A^(p)) dA^(p) − ∫ F(A^(p+1)) ln F(A^(p+1)) dA^(p+1) − Σ_{n=n−}^{n+} ∫ M_n(µ[n]) ln M_n(µ[n]) dµ[n]

subject to the following constraints: the normalization conditions of the PDFs and the empirical balances, which equate the mean model output to the observed states,

E[ Σ_{i=1}^{p} A_i x̂[n − i] + A^(p+1) ẑ[n] + µ[n] ] = x̂[n], n = n−, . . . , n+.

Please note that the empirical balances represent a system of (n+ − n−) blocks composed of m equations. With each block, an m-dimensional vector of Lagrange multipliers θ^(n) is associated. This optimization problem belongs to the class of entropy-linear programming problems of the Lyapunov type [44]. It has an analytic solution of exponential (Gibbs) form parametrized by the Lagrange multipliers. The matrix of Lagrange multipliers θ = [θ^(n−), . . . , θ^(n+)] is determined by solving the balance equations. From (25)-(27) it follows that the PDFs P*(A^(p)) and F*(A^(p+1)) of the model parameters of RDRM-O and the PDFs M*_n(µ[n]), n = n−, . . . , n+, of the measurement noises are found using the retrospective learning data sets X̂(n−, p), X̂(n− + 1, p), . . . , X̂(n+, p), the current state data sets x̂[n−], . . . , x̂[n+] and the data sets ẑ[n−], . . . , ẑ[n+] generated by RDRM-F.
For obtaining the latter collections, the RML algorithm is applied to estimate the PDFs of the model parameters and measurement noises of RDRM-F. In accordance with [10], the information entropy functional of the PDFs W(B^(s)), Q_n(ζ[n]) and G_n(ξ[n]) is maximized subject to the following constraints: the normalization conditions of the PDFs and the empirical balances with the observed data ẑ[n−], . . . , ẑ[n+]. This problem is from the same class as (25)-(27). It has an analytic solution of exponential (Gibbs) form in terms of the Lagrange multipliers η = [η^(n−), . . . , η^(n+)], which are determined by solving the corresponding balance equations for n = n−, . . . , n+.
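To make the role of the Lagrange multipliers concrete, here is a hedged one-dimensional toy, not the paper's multidimensional system: a single scalar parameter on [0, 1] with a Gibbs-form entropy-optimal density, whose multiplier is found from the scalar balance equation by bisection (the target value and grid are arbitrary illustrative choices):

```python
import numpy as np

# Toy balance equation: a scalar parameter a on [0, 1] with the
# Gibbs-form entropy-optimal density P*(a) proportional to exp(-theta*a).
# The Lagrange multiplier theta must satisfy E[a] = target.
grid = np.linspace(0.0, 1.0, 20001)

def mean_under_gibbs(theta):
    w = np.exp(-theta * grid)
    return (grid * w).sum() / w.sum()    # Riemann-sum approximation of E[a]

target = 0.3                              # empirical balance value (toy)
lo, hi = -50.0, 50.0                      # E[a] decreases monotonically in theta
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_under_gibbs(mid) > target:
        lo = mid                          # mean too large -> increase theta
    else:
        hi = mid
theta_star = 0.5 * (lo + hi)
```

In the multidimensional case the balances form a nonlinear system in the whole matrix of multipliers, which is why the paper resorts to a numerical solver rather than bisection.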

Entropy-Randomized Forecasting of Daily Electrical Load of Power System
The daily electrical load L of a power system depends on many various factors. The analysis below is restricted to one of the most significant external factors, the ambient temperature T. The daily temperature variations are fluctuating [45,46]. These fluctuations affect the electrical load, but with some time delay due to the inertia of the power network supplying electrical energy from generators to consumers.

(1). Dynamic Regression Model.
In accordance with the general structure of the RDRM (see Section 2), the electrical load model (the LT model) describes the dynamic relationship between electrical load and ambient temperature, while the ambient temperature model (the Tξ model) describes the daily dynamics of ambient temperature. There exist quite a few versions of the LT model, all of them static, i.e., describing the relationship between electrical load and ambient temperature at the current time instant [47]. The daily temperature dynamics are fluctuating, and such fluctuations are described, in particular, by the periodic autoregressive model [48].
Please note that the effect of ambient temperature on electrical load is dynamic, i.e., the change in load due to temperature at a given time instant depends on its value at a previous time instant. A similar property applies to ambient temperature fluctuations. Therefore, following the general randomized approach, the LT model is designed as a first-order dynamic regression model with random parameters, while the Tξ model is designed as a second-order dynamic regression with a random parameter and a random input ξ. Then the LTξ model is the composition of the two models above.
In the class of linear models, the randomized dynamic regression load-temperature model (LT model) of the first order expresses the load L[n] through its previous value L[n − 1] and the ambient temperature, where the random independent parameters a and b take values within the intervals a ∈ [a−, a+] and b ∈ [b−, b+]. The probabilistic properties are characterized by PDFs P(a) and F(b) defined on the sets A and B, respectively. The random noise µ[n] that simulates electrical load measurement errors is of the interval type as well. In the general case, for each time instant the intervals may have different limits, i.e., µ[n] ∈ [µ−[n], µ+[n]], with PDFs M_n(µ[n]), n = n−, . . . , n+.
Consider the Tξ model. The fluctuating character of the daily temperature variations is described by the randomized dynamic regression model of the second order in the deviations of the temperature T[n] from the mean daily temperature t. Its parameters are random and take values within given intervals, e.g., c ∈ [c−, c+]; their probabilistic properties are characterized by the PDF W(c) defined on the corresponding intervals. Equation (36) contains random noises described by independent random variables ξ[n]; in each measurement n, their values may lie in different intervals, i.e., ξ[n] ∈ [ξ−[n], ξ+[n]]. The probabilistic properties of the random variable ξ[n] are characterized by a PDF Q_n(ξ[n]), n = n−, . . . , n+. Thus, Equations (33) and (36) together form the LTξ model.
The normalization procedure is performed in the following way: in accordance with (33) and (36), the model variables and the corresponding real data on the learning interval n ∈ T_l are described by the vectors L^(i) = {L^(i)[1], . . . , L^(i)[24]}, T^(i) = {T^(i)[1], . . . , T^(i)[24]} and ξ^(i) = {ξ^(i)[1], . . . , ξ^(i)[24]}.
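A hedged Python sketch of one concrete instance of this model pair follows. The coefficient names a, b, c, d, their numerical values, and the exact lag structure are assumptions for illustration, not the paper's learned LTξ model; the second-order coefficients are chosen so that the temperature layer has complex characteristic roots of modulus below one, producing decaying quasi-fluctuations around the mean daily temperature.

```python
import numpy as np

rng = np.random.default_rng(2)
t_bar = 20.0                  # mean daily temperature (illustrative value)

# Assumed forms (illustrative, not the paper's equations (33), (36)):
#   T-xi layer (2nd order, oscillatory around t_bar):
#     T[n] = t_bar + c*(T[n-1] - t_bar) - d*(T[n-2] - t_bar) + xi[n]
#   LT layer (1st order):
#     L[n] = a*L[n-1] + b*T[n]
a, b = 0.6, 2.0
c, d = 1.6, 0.9               # complex roots, modulus < 1 -> quasi-fluctuations

T = [t_bar + 3.0, t_bar + 2.0]
L = [100.0]
for n in range(24):
    xi = rng.uniform(-0.5, 0.5)
    T.append(t_bar + c * (T[-1] - t_bar) - d * (T[-2] - t_bar) + xi)
    L.append(a * L[-1] + b * T[-1])
```

Randomizing a, b, c and the noise intervals, as in the paper, turns each such run into one member of the forecast ensemble.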
In terms of (39), the LT and Tξ models on the learning interval T_l take the corresponding normalized form.

(4). Results of Model Learning.
Using the available data on daily variations of electrical load and ambient temperature (see Figure 1) for the three days indicated above, the balance Equations (44), (45), (47) and (48) were formed. Their solution was determined by minimizing the quadratic residual between the left- and right-hand sides of the equations. Since the equations are substantially nonlinear, the resulting values of the Lagrange multipliers (see Table 1) correspond to a local minimum of the residual. All calculations were implemented in MATLAB; optimization was performed using the fsolve function.
Table 1. Lagrange multipliers θ, η.

Because the parameters of the LT model are independent, the joint PDFs U*_i(a, b) = P*_i(a) F*_i(b) of the parameters and the PDFs of the noises have the product form. For i = 1, the graphs can be seen in Figure 3.
Thus, the randomized LTξ model generates random trajectories with the entropy-optimal PDFs of the model parameters and measurement noises. The corresponding ensembles are generated by the sampling procedure of the resulting PDFs of the parameters and noises using the acceptance-rejection (AR) method (also known as rejection sampling (RS); see [49]). During the calculations, 100 samples for each parameter and 100 samples for each noise were used; in other words, the ensemble consisted of 10^4 trajectories.
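A minimal sketch of the acceptance-rejection step is given below. The density used is an illustrative Gibbs-type stand-in for an entropy-optimal PDF (the value theta = 2 and the support [0, 1] are assumptions), with a uniform proposal distribution:

```python
import numpy as np

rng = np.random.default_rng(3)

def rejection_sample(pdf, lo, hi, pdf_max, n_samples):
    """Acceptance-rejection (rejection sampling): draw a uniform proposal
    on [lo, hi] and accept it with probability pdf(x) / pdf_max, where
    pdf_max is an upper bound on the (possibly unnormalized) density."""
    out = []
    while len(out) < n_samples:
        x = rng.uniform(lo, hi)
        if rng.uniform(0.0, pdf_max) <= pdf(x):
            out.append(x)
    return np.array(out)

# Illustrative Gibbs-type density for a parameter a on [0, 1] (an assumed
# stand-in for an entropy-optimal PDF, with theta = 2).
samples = rejection_sample(lambda x: np.exp(-2.0 * x), 0.0, 1.0, 1.0, 2000)
```

The method needs only pointwise evaluation of the (unnormalized) density, which is why it suits the Gibbs-form PDFs produced by the learning stage.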

(5). Model Testing.
The adequacy of the model was analyzed by the self- and cross-testing of the LT and Tξ models on the real load-temperature data for 3-5 July 2016 (i = 1, 2, 3). Self-testing means generating an ensemble of trajectories with the entropy-optimal parameters and noises for day i, calculating the mean (mean) and median (med) trajectories and also the variance curve (std±) of the ensemble, and comparing the mean trajectory with its real counterparts in electrical load and ambient temperature for the same day i. Cross-testing represents a similar procedure in which the mean trajectories are compared with the real counterparts in terms of electrical load and ambient temperature for days j ≠ i. In both cases, the quality of approximation is characterized by relative errors in electrical load and in ambient temperature.
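The paper's exact error formulas (53)-(54) are not reproduced above; as an illustration, one common variant of a relative approximation error between a mean model trajectory and its real counterpart can be computed as follows (the normalization choice is an assumption of this sketch):

```python
import numpy as np

def relative_error(model_traj, real_traj):
    # Ratio of the l2 norm of the deviation to the l2 norm of the real
    # trajectory; the paper's formulas (53)-(54) may normalize differently.
    model_traj = np.asarray(model_traj, dtype=float)
    real_traj = np.asarray(real_traj, dtype=float)
    return np.linalg.norm(model_traj - real_traj) / np.linalg.norm(real_traj)
```

The same function applies to both the load and the temperature trajectories, in self- and cross-testing alike.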
Self-testing. For the LT model, the real ambient temperature data T^(i)_r[n] as well as the entropy-optimal PDFs P*_i(a) and F*_i(b) of the parameters (a, b) and the PDFs M*_1(µ[1]), . . . , M*_24(µ[24]) of the measurement noises µ[n] were used. The ensembles of load trajectories L^(i) were generated using the sampling procedure of the above PDFs. The mean trajectories of L and T for the three days are shown in Figure 5.
Cross-testing. For cross-testing, the LT and LTξ models learned on the data for day i were used, and their mean trajectories were compared with the data for days j ≠ i. The resulting errors are combined in Tables 2-4.

(6). Randomized Prediction of N-Daily Load.
In the randomized prediction of the N-daily load, the LTξ model learned on the interval T l was used. The quality of the forecast was characterized using the LTξ model with the entropy-optimal PDFs obtained on the real data for the first (i = 1) day.
The 1-day (n ∈ [1, 24]), 2-day (n ∈ [1, 48]) and 3-day (n ∈ [1, 72]) ensembles were constructed by the sampling procedure of the above PDFs. For these ensembles, the mean trajectories L_mean[n], the median trajectories L_med[n] and also the limiting trajectories L_std±[n] of the variance curve were found. The forecast results were compared with the real data for 3-7 July 2016 (i = 1, . . . , 4). The forecasting quality was characterized by relative errors calculated similarly to (53) and (54).
The resulting 24-h, 48-h and 72-h randomized forecasts of electrical load and their probabilistic characteristics (the mean and median trajectories, the limit trajectories of the variance curves) are presented in Figure 6. The errors, i.e., the deviations between the model forecasts and real data, can be seen in Table 5.

Conclusions
The article proposes a new forecasting approach based on the idea of generating not a single forecast, nor a set of forecasts with scenario-based model parameters, nor forecasts with assigned probabilities, but an ensemble of random forecasts with entropy-optimal model parameters and measurement noises.
For randomized forecasting, we propose a structure of a predictive dynamic model that uses both real data and optimized noises. The latter are the source of the ensemble of predicted trajectories, which allows computing deterministic trajectories of its various numerical characteristics as well as probabilistic estimates.