Prediction Intervals: A Geometric View

: This article provides a review of the approaches to the construction of prediction intervals. To increase the reliability of prediction, point prediction methods are replaced by intervals for many aims. The interval prediction generates a pair as future values, including the upper and lower bounds for each prediction point. That is, according to historical data, which include a graph of a continuous and discrete function, two functions will be obtained as a prediction, i.e., the upper and lower bounds of estimation. In this case, the prediction boundaries should provide guaranteed probability of the location of the true values inside the boundaries found. The task of building a model from a time series is, by its very nature, incorrect. This means that there is an inﬁnite set of equations whose solution is close to the time series for machine learning. In the case of interval use, the inverse problem of dynamics allows us to choose from the entire range of modeling methods, using conﬁdence intervals as solutions, or intervals of a given width, or those chosen as a solution to the problems of multi-criteria optimization of the criteria for evaluating interval solutions. This article considers a geometric view of the prediction intervals and a new approach is given.


Introduction
A dynamic system is defined as a model of the evolution of a process, the state of which is determined based on the initial state and the equations of dynamics. The tasks involved in predicting or obtaining numerical estimates of the future values of the observed process based on historical data include the process of determining a discrete dynamic system whose behavior would correspond to real data (perhaps obtaining a continuous model, a decision that is decreed in time intervals that correspond to the relevant changes-the real process). From the point of view of geometry, the graph that corresponds to the solution should differ as little as possible from the graph that corresponds to the observed process, that is, there is a transformation close to unity, under conditions of an insignificant change in symmetry.
Modern methods for building predictive models [1,2], based on machine learning, including artificial intelligence methods [3,4], aim to find the most suitable systems that minimize the deviation of the observed time series from the model graph (solution of the equation of evolution of a dynamic system), which requires achieving a given accuracy or qualitative adequacy of the phase trajectories. From the point of view of geometry, we must eliminate the violation of symmetry. For a dynamic system, the concepts of robust models stand in opposition to sensitivity [5], while, as a rule, a robust model is defined as insensitive in its dynamic behavior and dynamic and frequency characteristics when the model parameters change (which can be called parametric robustness). As opposed to interval parameter assignments, one can introduce dynamic output robustness, i.e., the solution determined by a dynamic system with fixed parameters has an interval form, known as prediction intervals (PI).
Predictions that are not in point form, but in the form of ranges of values, are called forecast intervals (PI). However, this involves not only forecasting, but also interval extrapolation from point historical data. It seems appropriate to call such a model a robust model, with robust output. Thus, in the problem of building a model according to a time series, i.e., an ill-posed problem (the inverse problem of dynamics), robust interval models can be understood as dynamic systems, the solution of which is given by values at intervals, while the interval is reliable and adequately includes the original time series and gives a reliable prediction. Under these conditions, several (completely different) models can determine the initial interval around the observed states of the system.
In other words, for the time series {X(t), (t = 0, 1, 2, . . .)}, the following functions are defined f i (x(t)) = y(t + 1) ∈ [y − ; y + ] i , where f i represents various analytic functions, or functional expansions; t represents discrete time (point number); x(t) is the value of the state of the system at the moment t; y is the computed value; y − , y + are the lower and upper limits of the interval, respectively; I is the function number. The interval can be defined as a confidence interval in a stochastic model and it can be based on interval estimates with measurement errors, or it can be fixed and set based on the conditions of the problem, and the value of the interval can be the result of solving a variational or optimization issue of the given criteria presented in intervals.
Thus, it is possible to define a new class of systems dynamic models (stochastic or deterministic), the parameters of which are functions without uncertainties, and the solution is an interval form. Here, we also consider the transformation of a plot of a function into an interval. In addition, the operation of the idempotent addition of intervals is introduced, which transforms the set of graphs into a semiring. One can recall that if we consider particular solutions of (evolutionary) differential equations, they form Lie groups (a symmetry group). This paper considers PI from geometric positions. When using PI for a group of interconnected series, for example, a portfolio of financial instruments, forecasting the energy consumption of a group or other structurally complex dynamic objects [6] allows us to reduce the requirements for the width of forecasts, paying attention to any other characteristics, for example, absenteeism forecast values beyond the lower limit of the forecast.
The contribution of this article is presented in the following list: -A geometric approach for interval forecasts is proposed; -A review of PI methods based on a geometric view was carried out; -A new approach to the construction of robust output models is proposed.
In this article, we review and construct new types of models. The structure of the article is as follows: Section 2, a review of PIs is carried out, in Section 3, a selection of criteria for evaluating interval solutions is presented, in Section 4, a new theory of interval models is proposed, and an example of construction is given. Section 5 contains the conclusion.

Prediction Intervals: Review
The task of prediction is the task of finding the values of the observed process in the future for the number of reports (for discrete systems and time series) or a time interval (for continuous systems) called the forecast horizon. In terms of geometry, it is required to extend or extrapolate the graph (continuous or discrete) by a given value. Traditional time series methods look for the next point at the next point in time, provided the current and historical data are known. Unlike the point prediction, the prediction interval (PI) provides the numerical value of the range in which the system will be at the next point in time. At the same time, the graph itself (initial data) is the same as for the point prediction. In other words, for a function graph (time series), the prediction will be an interval characterized by upper and lower boundaries. It is often argued that the PI is the most probable interval of values, but this is not entirely true, as it all depends on the conditions for the width of the interval.
With the use of PIs, there is a blurring of the accuracy of the prediction, but at the same time, the degree of uncertainty is reduced, and ensures the robustness of the model in terms of output. A guaranteed hit in each prediction interval is a reduction in the error, with a sufficiently wide interval and a complete elimination of the error. Based on historical data (time interval) in the form of a graph, the prediction includes two non-intersecting graphs for a given forecast interval; the first graph is the upper limit, while the lower graph is the lower forecast limit. We are looking for two transformations-the original chart into the chart of the upper boundary and the chart of the lower boundary. In the resulting range, an infinite number of functions can be included, including those corresponding to real data.
Currently, PIs, as the most intuitive approach, have attracted a significant amount of attention [7][8][9][10]. This approach is intuitive, since a weather forecast is usually a range of predicted temperatures for each moment of time or each day. The methods are widely used both for weather data, wind forecasting [7], in energy [8] and in other applied areas [9,10]. The generality of application can be characterized as follows: (1) the problem has pronounced trends and seasonal components; (2) there are portfolios of several signals; (3) for forecasts, there are requirements for reliable upper or lower bounds.
In recent years, probabilistic prediction methods have been extensively studied to effectively quantify uncertainties. A probabilistic prediction generates probability density functions [11], quantiles [12], or intervals [13] in applied prediction problems. Traditional interval prediction methods include fuzzy inference [14], beta distribution function [15] and Gaussian processes, autoregressive integrated moving average models and log-normal processes for obtaining probabilistic predictions [16][17][18]. The features of such solutions are the assumptions about the type of distribution, while Gaussian processes require significant computational resources to estimate the covariance matrices [19]. Numerical methods for obtaining quasi-Gaussian noise can obtain negative values of the interval, which is inconsistent with the physical meaning of the problem. For short-term prediction, the following were proposed: a combination of a kernel-based support vector quantile regression model and Copula theory [20]; the use of Yeo-Johnson transformation quantile regression and Gaussian kernel function [21]; machine learning methods, which are now widely used [22,23]. In these papers, the results were evaluated based on the PI coverage probability and the normalized mean PI width. The features of the methods are high mathematical computational complexity, with the need to identify seasonal components.
As a non-parametric method, the proposed lower and upper bound estimation (LUBE) [24,25] can effectively solve the above problem by constructing the PI directly from the input data, without any additional assumptions about the distribution of the data. The LUBE method was proposed in [26] and is aimed at constructing narrow PIs with a high probability. In general, LUBE can be solved in single-purpose and multi-purpose frameworks. For example, in [27], a single-target structure for LUBE is proposed in which the average PI width is minimized with restrictions. In [28], it was proposed to decompose and group the series into different components using the empirical mode and sample decomposition methods. In a multi-criteria framework [29], the coverage probability and the average PI width are simultaneously optimized to obtain a set of Pareto-optimal solutions. Here, partial decomposition and smoothing of fluctuations in wind power series were used to correctly preprocess data for the forecast model. As a rule, multicriteria involve building a compromise between the maximum probability of real values falling into the found interval and obtaining a compression of the average PI width, which is successfully solved based on evolutionary algorithms [29]. To adjust the parameters of the PI model, heuristic algorithms were used, including the genetic algorithm for quick sorting with nondominance (NSGA-II) [30], differential evolution (DE) [31], particle swarm optimization (PSO) [32] and the dragonfly algorithm (DA) [33].
The existing methods for constructing a prediction can be divided into two main types-direct modeling methods and inverse modeling methods. Prediction tools can be classified as follows: direct, reverse and separate interval methods. Direct methods involve the use of engineering physical models built based on physical laws (white boxes or internal models). Inverse methods consist of building models in the form of a "black box", that is, a known process that transforms input characteristics into output or observable processes. The purpose of the inverse methods is to solve the inverse problem of dynamics by finding the type of system that can generate the same signal. This problem is an ill-posed problem because it has an infinite number of solutions. In addition, this kind of model can be called an external model. When modeling, we are not trying to penetrate the physics of the occurring phenomena, but are trying to find any type of transformation that transforms the input process into an output process. The prediction of PI is an extension of any of the methods used for finding upper and lower bounds. This can be represented as a search for two transformations-input data to the upper boundary of the output and input data to the lower boundary of the output.
Direct physical methods [34] are based not only on the initial data, but also on the knowledge of the physical phenomena that occur in the simulated process. This includes, for example, control methods of technical systems. The advantages of these methods are their high prediction accuracy and high interpretability [35] The disadvantage is that sometimes, it is impossible to represent the system as a white box. These models require a significant number of calculations and a detailed, often redundant description.
Inverse methods, in turn, can be divided into the use of series, statistical and stochastic methods, and machine learning methods. The series are based on known function expansions such as Fourier series. Statistical methods are data-driven using historical time series data to predict future values based on autoregressive dependencies, such as the moving average model (ARMA) [36,37] and the autoregressive integrated moving average model (ARIMA) [38]. In recent years, many machine learning technologies have been applied. Among them, the artificial neural network (ANN) has become a common prediction method because it can capture the non-linear relationship between historical data [39]. Many studies use shallow ANNs and some use deep learning (DL) to capture complex non-linear features [40,41]. In different applied areas, special methods are being developed that are oriented to considering the rate of change in parameters. Thus, for prediction in the electric power industry, where the reaction of processes to disturbances occurs very quickly, data preprocessing includes noise filtering using artificial intelligence [42][43][44]. Modern neural networks are effective at identifying trends; in neural networks with periodic characters, a sinusoidal activation function [45] is used to replace the sigmoid activation function. Some studies combine ANNs with statistical methods to capture both linear and non-linear characteristics [46]. However, there are some shortcomings in direct and inverse methods of point prediction, including errors in the prediction accuracy, since, as a rule, the real value differs from the predicted value [47][48][49].
Thus, the development of prediction tools based on interval prediction forms a separate, essentially important group, combining both direct and inverse methods, and the need to build a solution in the form of a range. The PI can assess the potential uncertainty and risk level more reasonably and provide better planning information. From the point of view of geometry and differential geometry [50][51][52][53], the prediction is an area limited by two graphs (upper and lower boundaries) and in this area, there is an unlimited number of graphs that fit into the given boundaries, and the autocorrelation between any two points can be different. Both a graph with a large correlation (slow and monotonous) and a graph with a small correlation (quickly changing) can be inscribed in the boundaries. The only requirements for these graphs (solutions of finite difference or differential equations, or decomposition in the form of neural networks or any series) include their limitation to a given interval. This fact allows us to solve the inverse problem-obtaining an interval as an approximation built based on machine learning from the historical data of a time series of several completely different models. Table 1 shows the classification of predictive modeling methods with an analysis of the advantages and disadvantages. Can use any of the above models One is required to solve the problem of choosing the width of the interval

Criteria for Evaluating and Choosing the Optimal Width of the Prediction Intervals
The LUBE method generates a PI in one step, resulting in a lower and an upper bound. Thus, the LUBE model can be viewed as a type of direct mapping of the input data to the PI without any assumptions about the distribution of the data. In addition, the efficiency of LUBE is determined by the following two conflicting criteria: on the one hand, it is necessary to ensure that the predicted values fall within the predicted interval and on the other hand, the interval cannot be extended to infinity, so the estimates must be finite and rather narrow. Thus, evaluating the accuracy and efficiency of the PI is a major challenge in LUBE. This section introduces some PI score indexes.
PICP is a critical appraisal index for PI that indicates the likelihood that future values will be covered by lower and upper bounds. It is evident that a larger PICP means that the PIs built can more accurately reflect future uncertainties. PICP is defined as follows [69]: where N is the size of the data sample, and a i is a binary variable defined by the following formula: In Equation (2), the values y i_ ,y i+ are the estimated lower and upper bounds. To ensure the accuracy of the PI, it is usually required that the PICP exceeds a predetermined level of confidence (1 − α).
In general, PICP is considered as a very important indicator of PIs, which represents the accuracy of the PI, that is, the probability that the target value overlaps the upper and lower bounds of the PI.
Although PICP is a key indicator of PI accuracy, the effectiveness of PICP is offset by interval widening because a large PICP (even approaching 100%) can easily be obtained with an extremely wide prediction interval. However, wide intervals may not provide meaningful information or provide effective management or monitoring of system performance.
PINAW is introduced to measure the effectiveness of PIs [70].
where W is the width range. The purpose of normalization is to achieve the objective assessment of the PI width from absolute values. Typically, PICP and PINAW are two conflicting goals when building a PI. Increasing the coverage probability inevitably increases the width, and narrowing the width results in a low coverage probability.
To evaluate the overall performance of the PI, CWC [71] is proposed by combining PICP and PINAW together, as shown in the following formula: where is a binary value defined as follows: Equation (5) also has two control parameters, γ, µ, which for CWC reflect the PI coverage probability requirement and can be calculated from a predetermined confidence level (1 − α). η is a penalty factor when the resulting PI fails to meet the coverage probability requirement. In a unified optimization framework, CWC helps to find a trade-off between accuracy and PI efficiency.
When accepting an existing PICP index, the test data samples are turned into binary variables to indicate whether they are within the lower and upper bounds. Because continuous values can convey more information than binary variables, the data information provided by the data set is not fully utilized if a PICP is received. In addition, most of the existing studies focus on the probability of coverage but ignore the risk beyond the obtained intervals. In our opinion, this is not advisable, since the risk outside the forecasting intervals can have a significant impact on decision-making processes. In addition, estimation errors outside of the PI can also have a significant impact on decision-making processes. For example, if the real value goes beyond the limits of the interval, it can lead to equipment failure when the limits correspond to the critical values of technologies. Therefore, to ensure reliability, when constructing an PI, the estimation error should be considered. In [72], the Winkler score (WS) was used to estimate the quality of PI, which is calculated as a weighted sum of the width and the error of the PI estimate.
Usually, at a given confidence level, PIs with a small absolute value of WS are of high quality. However, WS does not distinguish between the contribution of the mean width and estimation error. In some real-world applications, decision makers wish to obtain the PI estimation error to estimate the operational risk. In addition, to reduce the operational risk, it is necessary to independently minimize the error in PI estimation when training the interval forecast model. To mitigate the above shortcomings, this study proposes a new estimation index called the PI estimation error (PIEE) to eliminate PI estimation errors [7].
In addition, the total PIEE can be calculated as follows: (7) Based on the PIEE index above, the PI estimation error can be measured separately with the average PI width. In this way, decision makers can assess and mitigate the operational risk beyond the scope of the PI. While PIEE is proposed to measure PI accuracy, the PI performance is ignored. For example, a small PIEE (even approaching 0) can easily be achieved using an extremely wide PI. However, this is meaningless, since no useful uncertainty information can be given to the operators when the PIs are extremely wide. Therefore, PI width is also an important factor in LUBE, which is usually estimated by PINAW. Based on the above, PIEE is optimized together with PINAW to obtain high quality PIs.
Stochastic sensitivity (SS) is calculated from the average of the output deviations of the model by a small perturbation of the singularity [73]. If the output of the model is severely disturbed by small perturbations, then the robustness and stability of the model are weak, and this usually results in its weak generalization ability regarding future unknown samples. The model is less likely to successfully predict future invisible patterns. SS is defined as the average difference of the predicted value of random perturbed samples to the label, which is formulated as follows: where x denotes the given training set, x p , is the perturbed sample around x, y is the number of perturbed samples, and β the value predicted by the model h, respectively. Various application works use indexes. The multi-objective particle swarm optimization algorithm and least-squares support vector regression are two examples [70][71][72][73].

New Approach
In this section, we will consider two approximate models, the solution of which is graphs with interval solutions, which allows us to operate with the following intervals: We can calculate the value of the intervals as a solution to some optimization problems, and so on.
Here, we require the attraction of phase trajectories into a closed region. The area in the phase space in the series of dynamics corresponds to a limited range of values. That is, the definition requires that there is a finite-dimensional interval in the development of the evolution of the system. Therefore, the definition requires the dissipativity of the system.
We will call such a model a robust interval model.

Definition 2.
Let there given interval models y 1 = f 1 (x(n)), y 2 = f 2 (x(n)), were y '1 ≈ y 2 ≈ x(n); x(n) ∈ [y 1− , y 1+ ], x(n) ∈ [y 2− , y 2+ ]; f 1 , f 2 are different models; y 1− , y 2− are lower limits of intervals; y 1+ , y 2+ are upper limits of intervals, Then the operation of combining models ⊕: Lemma 1. If f 1 (x) does not satisfy definition 1, then a function f 2 (x) may exist, which is a robust interval model of the original system. If we consider the operation of combining models, then the resulting model will also be robust.
The proof is obvious, since the union of intervals by Definition 2 includes a robust interval.
In the case of divergent models, the interval increases, but the robustness property does not disappear for dissipative stationary systems. As a robust model, it uses the union of confidence intervals of all used models. The validity and meaning of the proposed operation and definition take place in the case of stationary processes, as well as for dissipative systems, i.e., the phase portraits of which, stable or unstable, have limit cycles and do not have bifurcation points. Many financial series belong to cyclic dissipative systems and this is especially true for aggregated portfolio financial instruments, which include several tens of thousands of series, both macro-indicators and rapidly changing micro-series. Example We can consider the problem of targeting residuals of the Federal Treasury. The data that enter the system are divided into the following two types: (1) planned balances time series with a period of 1 year (current year) and monthly detailing; (2) actual balances time series with a period from the beginning of the current year to the current day and daily detailing. The final output summary prediction should be a time series with a period of 1 year (current year) and daily granularity. Accordingly, for its construction, it is relevant to solve the following problems: splitting monthly totals into daily totals for series of planned balances and forecasting with a given horizon (until the end of the year) and for series of actual balances and those converted into daily series of planned balances.
Historical data were taken for the period 2015-2020. Significant series were selected for analysis (for example, income/expenses of the Social Insurance Fund, Pension Fund, Federal Compulsory Medical Insurance Fund) and aggregated time series of several insignificant series (for example, receipts of the municipal level).
To solve the extrapolation problem with a forecast horizon for a year ahead with daily discretization, the following four models were studied and applied: To assess the accuracy of the models, the MAPE metric was used, including the average absolute percentage error, i.e., the average percentage of how often the predictive model is wrong.
The data on the actual account balances are characterized by significant structural shifts (see Figure 1) and these are due to many reasons, for example, the client's decision to split the balances between several of their accounts or transfer an amount of the balance to the "reserve". The DSS [11] is able to take into account the opinion of experts on the choice of the main model. and for series of actual balances and those converted into daily series of planned balances.
Historical data were taken for the period 2015-2020. Significant series were selected for analysis (for example, income/expenses of the Social Insurance Fund, Pension Fund, Federal Compulsory Medical Insurance Fund) and aggregated time series of several insignificant series (for example, receipts of the municipal level).
To solve the extrapolation problem with a forecast horizon for a year ahead with daily discretization, the following four models were studied and applied: (1) The weighted average; (2) Triple exponential smoothing (Holt-Winters); (3) ARIMA (Box-Jenkins); (4) Linear regression.
To assess the accuracy of the models, the MAPE metric was used, including the average absolute percentage error, i.e., the average percentage of how often the predictive model is wrong.
The data on the actual account balances are characterized by significant structural shifts (see Figure 1) and these are due to many reasons, for example, the client's decision to split the balances between several of their accounts or transfer an amount of the balance to the "reserve". The DSS [11] is able to take into account the opinion of experts on the choice of the main model.  It is possible to construct a robust interval model based on the introduced concepts. At the entrance to the structurally complex system, we can submit the most aggregated time series using machine learning models, take simple moving averages that show the worst results in the studies of individual series and predict monthly data.
Using historical data for the period 2015-2020, we can emulate the behavior of the system in 2020. By sequentially adding data for the next month to the system, we can extrapolate increasing series using moving averages, aggregating them into the final forecast with an interval of 99%. At the output (Figure 2), we can obtain 11 pairs of upper and lower intervals, respectively, in the form of a set of separate rows with dimensions from 11 points to 1 point.
worst results in the studies of individual series and predict monthly data.
Using historical data for the period 2015-2020, we can emulate the behavior of the system in 2020. By sequentially adding data for the next month to the system, we can extrapolate increasing series using moving averages, aggregating them into the final forecast with an interval of 99%. At the output (Figure 2), we can obtain 11 pairs of upper and lower intervals, respectively, in the form of a set of separate rows with dimensions from 11 points to 1 point. By combining the neighboring first points of the upper and lower rows of the confidence intervals, we can obtain a set of intervals (corridor) in which the actual values of the final prediction should be located (Figure 3). As can be observed from Figure 3, despite the roughness of the model, the actual data of the residual on the EKC during the experiment extended beyond the lower limit of the interval once in August 2020. By combining the neighboring first points of the upper and lower rows of the confidence intervals, we can obtain a set of intervals (corridor) in which the actual values of the final prediction should be located ( Figure 3).
worst results in the studies of individual series and predict monthly data.
Using historical data for the period 2015-2020, we can emulate the behavior of the system in 2020. By sequentially adding data for the next month to the system, we can extrapolate increasing series using moving averages, aggregating them into the final forecast with an interval of 99%. At the output (Figure 2), we can obtain 11 pairs of upper and lower intervals, respectively, in the form of a set of separate rows with dimensions from 11 points to 1 point. By combining the neighboring first points of the upper and lower rows of the confidence intervals, we can obtain a set of intervals (corridor) in which the actual values of the final prediction should be located ( Figure 3). As can be observed from Figure 3, despite the roughness of the model, the actual data of the residual on the EKC during the experiment extended beyond the lower limit of the interval once in August 2020. As can be observed from Figure 3, despite the roughness of the model, the actual data of the residual on the EKC during the experiment extended beyond the lower limit of the interval once in August 2020.

Conclusions
It is possible to define a new class of systems dynamic models, in which the parameters are functions without uncertainties, and the solution is an interval. In addition, this is the first time that the prediction interval has been analyzed in this way, but this view is based on many modern studies, which we showed in this review.
In this review, we also presented a geometric view on the questions surrounding building an interval prediction. We can use graphs (solutions to evolutionary equations or solutions obtained in the course of machine learning of artificial neural networks) as objects that describe the same process in intervals with a given degree of deviation from the initial data, on the basis of which the model was trained.