Analysis and Forecasting of Monthly Electricity Demand Time Series Using Pattern-Based Statistical Methods

This article presents a solution based on statistical methods (ARIMA, ETS, and Prophet) for predicting monthly electricity demand by approximating the relationship between historical and future demand patterns. Energy demand time series exhibit seasonal fluctuation cycles, long-term trends, instability, and random noise. To simplify the prediction problem, the monthly load time series is represented by annual cycle patterns, which unify the data and filter out the trends. A simulation study performed on the monthly electricity load time series of 35 European countries confirmed the high accuracy of the proposed models.


Introduction
Forecasting loads on power systems is an integral activity embedded in long-term system operation planning and in the ongoing control of system operation. The system cannot function without accurate forecasts, because electricity cannot be stored in large quantities. Demand must be covered on an ongoing basis by production, subject to the limitations resulting from the flexibility of the generating units and the requirements of reliable and safe system operation. The accuracy of forecasts translates into the costs of production and transmission and the degree of reliability of electricity supply to consumers. Inflated forecasts lead to maintaining too many generating units in order to meet the safety requirement of an adequate reserve capacity margin. Underestimated forecasts have the opposite effect: too few generating units are scheduled, and they cannot cover the actual demand. In such a situation, additional quick-start units must be brought into operation, generating additional operating costs [1].
Medium-term forecasting most often concerns the monthly electricity load (MEL). MEL time series contain a nonlinear trend, annual seasonality, and random disturbances. They show significant variation in the variance and in the shape of the annual cycle over time. MEL is highly dependent on economic and socioeconomic as well as climatic and weather variables. Factors disturbing the MEL include unpredictable economic events, extreme weather changes, and political decisions [2]. The importance of MEL forecasting in the power sector and the complexity of the problem encourage the search for forecasting models that meet the requirements of this specific task and generate accurate forecasts.
Medium-term electricity demand forecasting is a very well-researched issue. Many solutions have been applied to this problem, including classical/statistical methods [3,4], neural networks (NNs) [5][6][7], deep learning [8][9][10][11][12], and similarity-based methods [13][14][15][16][17][18]. Classical methods were the first applied to electricity load forecasting. Linear regression, ARIMA, and exponential smoothing (ETS) methods have been widely used [19]. The limited flexibility of these methods and their linear nature have increased interest in artificial intelligence techniques [5]. In [5], neural networks are used to predict the trend of the MEL time series, and Fourier series are used to forecast the seasonal component; the two forecasts, trend and seasonal, are then summed. In [20], a long short-term memory (LSTM) network was applied to the probabilistic forecasting of customer load profiles. The LSTM method is also used for predicting electricity prices in [21], with good accuracy.
The simplest classical models are naive models, which assume a selected historical load value as the forecast. In series that exhibit seasonality, such as monthly load series, this is the value shifted back by twelve months. Random fluctuations and the trend may negatively affect the forecast results.
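A naive seasonal model of this kind can be sketched in a few lines of Python (the function name and the data are illustrative):

```python
import numpy as np

def naive_seasonal_forecast(series, horizon=12, season=12):
    """Forecast each future month with the value observed 12 months earlier."""
    series = np.asarray(series, dtype=float)
    history = series[-season:]
    # Repeat the last observed annual cycle over the forecast horizon.
    reps = int(np.ceil(horizon / season))
    return np.tile(history, reps)[:horizon]

# Example: two years of monthly loads; the forecast repeats the last year.
loads = np.arange(1, 25, dtype=float)
print(naive_seasonal_forecast(loads, horizon=3))  # [13. 14. 15.]
```

Any trend present in the data is ignored, which is exactly the weakness noted above.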
Linear regression models can account for the trend (only a linear one), but implementing seasonal cycles in the model requires additional operations, e.g., decomposing the series into individual months. An example of the application of a linear model for medium- and long-term forecasting of electricity loads can be found in [22]. The model uses strong daily (24 h) and annual (52 weeks) correlations to forecast daily load profiles over horizons from several weeks to several years. The forecast results are corrected for annual load increments. For forecasting with a one-year horizon, MAPE errors not greater than 3.8% were obtained. In [3], a linear regression model and an ARIMA model were compared in the task of forecasting monthly peak loads up to 12 months ahead. The models are fed the same set of inputs, including historical peak load data, weather data, and economic data. The ARIMA model was shown experimentally to be about twice as accurate. For non-stationary time series with an irregular periodic trend, [23] proposed a linear regression model extended with periodic components implemented by sine functions of different frequencies.
The spatial autoregression model for medium-term load forecasting is described in [24]. The authors noted a strong correlation between the system load and GDP in the thirty analyzed Chinese provinces. To forecast the load in a given province, they used not only the local dependence of the load on GDP but also the relationships identified for neighboring provinces. This allowed forecast errors to be reduced from 5.2-5.4% to 3.5-3.9%. Ref. [25] described the ARIMA and ETS forecasting methods in combination with bootstrap aggregation. The time series were initially processed using the Box-Cox transform (a procedure often used to stabilize the variance) and decomposed. The ARMA model was used to generate bootstrap samples, and the ARIMA and ETS models to forecast them.
Classical methods also use Fourier series. In [5], they were used to model the seasonal component of the time series of monthly loads. Based on spectral analysis, six fundamental frequencies were identified, and a Fourier series was created for them. The forecast of the seasonal component calculated by the Fourier series was added to the trend forecast. Markov chains were used in [26] to analyze the input data and select the best forecasting model. This approach is especially useful when the upward trend of the time series is unstable. The classical models also include the model described in [27], which is based on a simple logistic function. The input variables are time, maximum atmospheric temperature, and a social factor taking into account religious holidays (the model was developed for Arab countries).
In this work, statistical methods for forecasting monthly power demand are used. What distinguishes these models from other classical approaches is their use of pattern representations of the seasonal cycles of the time series. The patterns allow us to unify the data and filter out the trend. The input and output variables in the pattern space are related in a less complex way than in the original space. Thus, owing to the pattern representation, the classical forecasting method has an easier task to perform. Preprocessing the time series by computing yearly patterns constitutes the direct novelty of the proposed models.
Patterns represent fragments of the time series and extract information about the shape of these fragments, filtering out the trend and normalizing the variance. The pattern-based representation simplifies the relationship between the input and output variables, and hence the predictive model. Unlike many machine-learning systems, which are most often "black box" models, the principle of operation of the statistical methods is understandable, which is important in practical industrial applications and translates into a greater degree of confidence in the forecasts.

MEL Time Series Analysis
A time series is a set of observations of a certain variable ordered in the time domain: {z_t : t = 1, 2, . . ., N}. Formally, the observations z_t are realizations of a sequence of random variables {Z_t : t = 1, 2, . . ., N} having a certain specified cumulative distribution. These definitions apply to a discrete time variable t taking values at equal intervals (seconds, hours, days, months).
The purpose of time series analysis is to detect and describe the regularities governing the phenomenon expressed in the form of a time series. The following components of a time series, which result from the influence of various factors on the studied phenomenon, are distinguished [28,29]: the trend, periodic fluctuations, and random fluctuations. Periodic fluctuations, also known as seasonal fluctuations [30], are characterized by a constant period, in contrast to cyclical fluctuations, which have no fixed period.
A time series is called stationary if its statistical properties do not change over time; that is, the time series {Z_t} has the same properties as the time-shifted series {Z_{t+l}} for every shift l. In practice, weak stationarity (so-called wide-sense stationarity) is tested, which requires the invariance of the mean level, the variance, and the time correlation. This can be written as follows [28]:

E(Z_t) = μ for all t,
Var(Z_t) = σ² < ∞ for all t,
Cov(Z_t, Z_{t+l}) = γ(l) for all t,

i.e., the covariance between observations depends solely on the time shift l.
The stationarity of a time series plays a key role in forecasting model selection and in the construction of forecasts. Producing forecasts for a stationary time series, whose basic properties do not change over time, is much easier than for a non-stationary series. Therefore, in practice, the time series is often transformed into a stationary form before the forecasting process begins.
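One common transformation toward stationarity is seasonal differencing, which removes the annual cycle and reduces a linear trend to a constant shift. A minimal numpy sketch (the synthetic series is illustrative):

```python
import numpy as np

def seasonal_difference(series, season=12):
    """Seasonal differencing: y_t = z_t - z_{t-season}."""
    series = np.asarray(series, dtype=float)
    return series[season:] - series[:-season]

# A linear trend plus an annual cycle is non-stationary; after seasonal
# differencing the cycle vanishes and the trend becomes the constant 0.5*12 = 6.
t = np.arange(48)
z = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
d = seasonal_difference(z)
print(np.allclose(d, 6.0))  # True
```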
A series of MELs for Poland, with a box plot showing the distribution of the monthly values in successive years, is shown in Figure 1. The variability of the median and variance over time is clearly visible. The last year of observation has an interquartile range (box height) almost three times smaller than the first one. Figure 2 shows the MEL time series for 35 European countries. The time series vary in length from 60 to 288 months, ending with the year 2014. As can be seen, these time series are non-stationary: they show variability of the mean and variance, strong annual seasonality, and non-linear trends.
Other forms of graphical presentation of MEL time series are shown in Figures 3 and 4. Figure 3 presents the trends in the individual months of the year. Strong upward trends are observed in the summer months, which is not the case in the winter months. The seasonal plot (Figure 4) allows us to observe distinctive features of the annual cycles and assess the shape similarity of these cycles over the years. Despite the reduction in the amplitude of the yearly fluctuations in subsequent years, the shapes of the annual cycles remain similar. The demand in the winter months is significantly greater than in the summer months. A reduced demand is observed in February relative to the neighboring months (which cannot be explained only by the smaller number of days in this month) and an elevated demand in March and October. The shape similarities of the annual cycles are particularly important in nonparametric regression models [31]. The seasonal plot can also be compiled in polar coordinates (Figure 4).

Relationships between observations of the time series separated by l time units are assessed using the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The former has the form [28]:

ρ(l) = γ(l)/γ(0),

where γ(l) is the sample autocovariance function:

γ(l) = (1/N) Σ_{t=1}^{N−l} (z_t − z̄)(z_{t+l} − z̄),

and z̄ is the mean value of the time series. In the case of the PACF, when determining the autocorrelation between Z_t and Z_{t+l}, the influence of the intermediate observations Z_{t+1}, . . ., Z_{t+l−1} on this dependency is eliminated. The sample partial autocorrelation is an estimate of the theoretical PACF of the form:

α(l) = Corr(Z_{t+l} − P_{t,l}(Z_{t+l}), Z_t − P_{t,l}(Z_t)),

where P_{t,l}(·) is the orthogonal projection operator onto the linear subspace spanned by the variables Z_{t+1}, . . ., Z_{t+l−1}.
Examples of ACF and PACF charts for the MEL time series are shown in Figure 5. The horizontal lines designate the confidence intervals, which allow a conclusion to be drawn about the statistical significance of the autocorrelation. The 95% confidence interval has the form (−1.96/√N, 1.96/√N). Autocorrelations outside this range are considered statistically significant. On the ACF chart, we can see very strong oscillations, proving a clear seasonality of the series. As expected, the strongest autocorrelation occurs for lags that are multiples of twelve, which confirms the annual cycle. The large PACF value for lag l = 1, close to 1, can signal the presence of an upward trend.
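The sample ACF and its 95% significance bound can be computed directly from the definitions above; a numpy sketch on an illustrative pure annual cycle:

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation rho(l) = gamma(l) / gamma(0)."""
    z = np.asarray(series, dtype=float)
    zbar = z.mean()
    n = len(z)
    gamma0 = np.sum((z - zbar) ** 2) / n
    return np.array([np.sum((z[:n - l] - zbar) * (z[l:] - zbar)) / n / gamma0
                     for l in range(1, max_lag + 1)])

t = np.arange(120)
z = np.sin(2 * np.pi * t / 12)      # ten years of a pure annual cycle
rho = acf(z, 24)
bound = 1.96 / np.sqrt(len(z))      # 95% significance bound
print(rho[11] > bound)              # lag 12 is strongly significant -> True
```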
Seasonal fluctuations can also be identified using harmonic (spectral) analysis [32]. It leads to a model consisting of a sum of sine and cosine functions of different frequencies. A time series of length n is expressed using the Fourier series as follows [33,34]:

z_t = a_0 + Σ_{i=1}^{n/2} (a_i cos(2πit/n) + b_i sin(2πit/n)),

where a_0, a_i, and b_i are coefficients determined from the formulas:

a_0 = z̄,  a_i = (2/n) Σ_{t=1}^{n} z_t cos(2πit/n),  b_i = (2/n) Σ_{t=1}^{n} z_t sin(2πit/n).

The magnitudes of the amplitudes for successive harmonics are:

A_i = √(a_i² + b_i²).

The amplitude of the i-th harmonic, A_i, indicates the contribution of this harmonic to explaining the variance of the considered variable. This share is expressed by the following formula [33]:

u_i = (A_i²/2)/s² · 100%, (7)

where s² is the variance of the series. The amplitudes of successive harmonics for the MEL time series for Poland are shown in a periodogram (Figure 6). As can be seen, the dominant period in this series is one year.
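The harmonic variance shares can be sketched directly from the coefficient and amplitude formulas above (the synthetic series is illustrative):

```python
import numpy as np

def harmonic_shares(series):
    """Variance shares u_i of the Fourier harmonics of a series."""
    z = np.asarray(series, dtype=float)
    n = len(z)
    t = np.arange(1, n + 1)
    var = z.var()
    shares = {}
    for i in range(1, n // 2):
        a = 2.0 / n * np.sum(z * np.cos(2 * np.pi * i * t / n))
        b = 2.0 / n * np.sum(z * np.sin(2 * np.pi * i * t / n))
        shares[i] = (a ** 2 + b ** 2) / 2.0 / var   # u_i as a fraction
    return shares

# Ten years of a pure annual cycle: the harmonic i = n/12 explains ~100%
# of the variance, mirroring the dominant one-year bar in the periodogram.
n = 120
z = 10 * np.sin(2 * np.pi * np.arange(1, n + 1) / 12)
u = harmonic_shares(z)
print(round(u[n // 12], 3))  # 1.0
```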

Forecasting Model
Let us consider a monthly electricity demand time series beginning in January and ending in December: E = {E_t : t = 1, 2, . . ., N}. We divide the time series into yearly fragments E_i = {E_t : t = 12(i − 1) + 1, 12(i − 1) + 2, . . ., 12(i − 1) + 12}, i = 1, 2, . . ., N/12. Every fragment can be described by a vector. Let us formulate an x-pattern x_i = [x_{i,1} x_{i,2} . . . x_{i,12}]^T as a vector representing the yearly fragment E_i. The function transforming time series points into patterns is chosen taking into account the character of the time series, such as trend, variance, and seasonality. We proposed several definitions of this function in [35]. In this study, x-patterns are defined as [8]

x_{i,j} = (E_{i,j} − Ē_i)/σ_i, (8)

where j = 1, 2, . . ., 12, E_{i,j} is the j-th element of fragment E_i, Ē_i is the mean of sequence E_i, and

σ_i = √( Σ_{j=1}^{12} (E_{i,j} − Ē_i)² )

is its dispersion. The x-pattern is a normalized E_i vector. Note that the yearly fragments E_i have different means and dispersions. After the normalization, the time-series fragments are unified: all x-patterns have unit length, the same variance, and zero mean. The x-patterns carry information about the shapes of the yearly fragments. Then, a new time series composed of x-patterns representing successive yearly periods is created: x = {x_i : i = 1, 2, . . ., N/12} = {x_{1,1}, x_{1,2}, . . ., x_{N/12,12}}. Note that it is distinguished by its regular character and stationarity.
The forecasting procedure using the time series composed of x-patterns requires converting the x-pattern forecast into a demand forecast. After the forecasting model generates an x-pattern, the MELs in the forecasted yearly period are computed from the forecasted x-pattern using the transformed Equation (8) (this is called decoding):

E_{i,j} = x_{i,j} σ_i + Ē_i.

However, in this equation, the coding variables Ē_i and σ_i are not known, because they are the mean and dispersion of the future fragment being forecasted. Thus, the coding variables must be forecasted based on their historical values. ARIMA and ETS models are used for this purpose in this work [8]. Figure 7 presents the idea of pattern-based forecasting using a block diagram of the proposed methodology, showing the data structures and illustrating the data flow.
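The encoding and decoding steps can be sketched as follows, assuming the Euclidean dispersion of Equation (8); the function names and the sample year are illustrative, and in the full procedure the mean and dispersion used for decoding would come from ARIMA or ETS forecasts rather than being known:

```python
import numpy as np

def encode(E_i):
    """Map a yearly fragment to its x-pattern (zero mean, unit length)."""
    E_i = np.asarray(E_i, dtype=float)
    mean = E_i.mean()
    disp = np.sqrt(np.sum((E_i - mean) ** 2))   # Euclidean dispersion sigma_i
    return (E_i - mean) / disp, mean, disp

def decode(x_i, mean, disp):
    """Inverse transform: rebuild monthly demands from a (forecast) x-pattern."""
    return np.asarray(x_i) * disp + mean

year = np.array([30, 28, 27, 24, 22, 21, 20, 21, 23, 26, 28, 31], dtype=float)
x, m, d = encode(year)
# The pattern has unit length, and decoding recovers the original fragment.
print(np.isclose(np.linalg.norm(x), 1.0), np.allclose(decode(x, m, d), year))
```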

Autoregressive Integrated Moving Average ARIMA Model
The autoregressive integrated moving average model ARIMA(p, d, q)(P, D, Q)_m [30,36] was used to model the MEL time series:

φ(B) Φ(B^m) (1 − B)^d (1 − B^m)^D z_t = c + θ(B) Θ(B^m) ξ_t,

where z_t are the terms of the time series (in the considered case, of the series {E_t : t = 1, 2, . . ., N}), m is the length of the annual cycle (m = 12), B is the backward shift operator, d and D are the orders of ordinary and seasonal differencing, respectively, φ(.), Φ(.), θ(.), and Θ(.) are polynomials of degrees p, P, q, and Q, respectively, c denotes a constant, and ξ_t is a white noise process with zero mean and variance σ².
The ARIMA model was also used to construct predictive models for the coding variables Ē_i and σ_i needed to decode the x-pattern. In this case, the ARIMA model simplifies to the non-seasonal form ARIMA(p, d, q):

φ(B)(1 − B)^d z_t = c + θ(B) ξ_t,

where z_t represents the terms of the time series {Ē_1, Ē_2, . . ., Ē_{N−1}} or {σ_1, σ_2, . . ., σ_{N−1}}.
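As a minimal sketch of this step, the code below hand-rolls an ARIMA(1,1,0) fitted by ordinary least squares and applies it to a hypothetical series of yearly means Ē_i; the paper instead relies on automatic order and parameter selection:

```python
import numpy as np

def arima_110_forecast(z, horizon):
    """Minimal ARIMA(1,1,0): fit AR(1) on first differences by least squares,
    then integrate the differenced forecasts back to the original level."""
    z = np.asarray(z, dtype=float)
    d = np.diff(z)
    # Least-squares estimate of (c, phi) in d_t = c + phi * d_{t-1} + e_t.
    X = np.column_stack([np.ones(len(d) - 1), d[:-1]])
    c, phi = np.linalg.lstsq(X, d[1:], rcond=None)[0]
    forecasts, last, prev_d = [], z[-1], d[-1]
    for _ in range(horizon):
        prev_d = c + phi * prev_d     # forecast the next difference
        last = last + prev_d          # integrate back
        forecasts.append(last)
    return np.array(forecasts)

# Hypothetical coding-variable series: yearly means growing linearly.
means = 100 + 2.0 * np.arange(20)
print(np.round(arima_110_forecast(means, 3)))  # [140. 142. 144.]
```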

Exponential Smoothing ETS
The seasonal exponential smoothing method (Holt-Winters method), known since the 1960s, is, next to the ARIMA method, one of the methods most frequently used in practice. The idea of ETS is to assign exponentially declining weights to observations from previous periods. Thus, observations from recent periods have a greater impact on the forecast. The Holt-Winters model is based on three smoothing equations that represent the level of the forecast variable, its increment, and the seasonality. There are two types of Holt-Winters methods, which depend on how seasonality is modeled. The additive version of the Holt-Winters method is used when the seasonal fluctuations are independent of the trend. The multiplicative version, on the other hand, is used when there is a proportional relationship between the seasonal fluctuations and the trend [28].
The equations for the additive version of the Holt-Winters method are [36]:

Level:    l_t = α(z_t − s_{t−m}) + (1 − α)(l_{t−1} + b_{t−1}),
Trend:    b_t = β(l_t − l_{t−1}) + (1 − β) b_{t−1},
Seasonal: s_t = γ(z_t − l_{t−1} − b_{t−1}) + (1 − γ) s_{t−m},
Forecast: ẑ_{t+h} = l_t + h b_t + s_{t−m+h_m^+},

where m is the period of the seasonal fluctuations (m = 12 for the MEL series), h is the forecast horizon, h_m^+ = [(h − 1) mod m] + 1, and α, β, γ are the smoothing coefficients from the range (0, 1).
Before using the model, the initial values of the states l_0, b_0, s_{1−m}, . . ., s_0 and of the smoothing parameters α, β, and γ must be given. All these values are then estimated from the observed data.
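The recursions above can be sketched directly; the initialization below is a simple heuristic, whereas in the paper the initial states and α, β, γ are estimated from the data:

```python
import numpy as np

def holt_winters_additive(z, alpha, beta, gamma, m=12, horizon=12):
    """Additive Holt-Winters smoothing with heuristic initialization."""
    z = np.asarray(z, dtype=float)
    # Heuristic initial states: level from the first year, trend from the
    # first two yearly means, seasonal indices as first-year deviations.
    level = z[:m].mean()
    trend = (z[m:2 * m].mean() - z[:m].mean()) / m
    season = list(z[:m] - z[:m].mean())
    for t in range(len(z)):
        s_old = season[t % m]
        level_prev, trend_prev = level, trend
        level = alpha * (z[t] - s_old) + (1 - alpha) * (level_prev + trend_prev)
        trend = beta * (level - level_prev) + (1 - beta) * trend_prev
        season[t % m] = gamma * (z[t] - level_prev - trend_prev) + (1 - gamma) * s_old
    # Forecast: level + h * trend + matching seasonal index.
    return np.array([level + h * trend + season[(len(z) + h - 1) % m]
                     for h in range(1, horizon + 1)])

t = np.arange(96)
z = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)  # eight years of data
fc = holt_winters_additive(z, alpha=0.3, beta=0.1, gamma=0.2)
print(fc.shape)  # (12,)
```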
In [36], ETS models were defined by state-space equations and classified into 30 types. The types differ in how the model includes the trend components (the sum of the level and the increment), the seasonality, and the error. The components can be expressed as additive or multiplicative, and the trend can additionally be damped.
The ETS model was also used to build forecasting models for the coding variables Ē_i and σ_i.

Prophet
Prophet is a time series forecasting method designed by Facebook for direct use in business applications [37]. It is distinguished by a fully automatic forecasting procedure with an intuitive selection of parameter values that can be adjusted without knowing the details of the underlying model. Prophet is robust to missing data and trend changes, and it usually copes well with outliers. The model implementation is publicly available in the Python and R environments.
Prophet is an additive model with three components: a non-linear trend, seasonality, and a component representing holidays:

y(t) = g(t) + s(t) + h(t) + ε_t,

where g(t) is a trend function modeling non-periodic changes in the value of the time series, s(t) represents seasonal changes (e.g., weekly and annual seasonality), h(t) represents holiday effects, and ε_t is a residual component with a normal distribution.
The model specification is similar to a generalized additive model (GAM), with nonlinear smoothing and time t as the sole regressor. The trend is modeled in one of two ways: using a piecewise-linear model or a limited-growth model. In the latter case, a logistic curve is used:

g(t) = C(t) / (1 + exp(−k(t)(t − q(t)))), (14)

where C(t) is the carrying capacity, k(t) is the increment function, and q(t) is the offset function.
In (14), the carrying capacity, the increment, and the offset are all functions of time. They vary depending on the characteristics of the time series, which makes it possible to shape the trend function flexibly.
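A simplified, constant-parameter version of the logistic trend (14) illustrates the roles of C, k, and q; in Prophet itself all three vary with time:

```python
import numpy as np

def logistic_trend(t, C, k, q):
    """Saturating-growth trend g(t) = C / (1 + exp(-k (t - q)))
    with constant capacity C, growth rate k, and offset q."""
    return C / (1.0 + np.exp(-k * (t - q)))

t = np.linspace(0, 10, 5)
g = logistic_trend(t, C=100.0, k=1.0, q=5.0)
print(np.isclose(g[2], 50.0))  # at t = q the curve is at half capacity -> True
```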
The seasonal component is modeled using a Fourier series:

s(t) = Σ_{i=1}^{n} (a_i cos(2πit/m) + b_i sin(2πit/m)),

where m is the length of the seasonal cycle, and a_i and b_i are coefficients.
The component representing the effects of public holidays, h(t), is not applicable in forecasting MEL time series.

Simulation Study
In this section, we analyze the monthly electricity demand time series for 35 European countries and verify the proposed forecasting models on these time series. The data were downloaded from the ENTSO-E repository (www.entsoe.eu, accessed on 12 April 2016).

MEL Time Series Analysis Results
Table 1 shows the statistics and parameters describing the analyzed time series:

• Median — the median as a measure of the average level of the series;
• IQR — the average of the annual interquartile ranges as a measure of the annual dispersion of the series;
• iqr% — the mean relative annual dispersion, i.e., the mean ratio of the annual interquartile ranges to the annual medians:

iqr% = (100%/M) Σ_{i=1}^{M} IQR_i/Median_i,

where M is the length of the series in years, and IQR_i and Median_i are the interquartile range and the median of year i;
• ACF(12) — the value of the autocorrelation function for lag l = 12;
• u_12 — the share of the annual-period harmonic in the series variance (7).
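These statistics can be sketched for a series reshaped into yearly rows (the data are synthetic, and the series length is assumed to be a multiple of 12):

```python
import numpy as np

def annual_stats(series):
    """Median, mean annual IQR, and mean relative annual dispersion iqr%."""
    z = np.asarray(series, dtype=float).reshape(-1, 12)  # one row per year
    q1, med, q3 = np.percentile(z, [25, 50, 75], axis=1)
    iqr = q3 - q1
    return {"Median": np.median(series),
            "IQR": iqr.mean(),
            "iqr%": np.mean(iqr / med) * 100.0}

# Two synthetic years with identical shape but different level: iqr% is
# scale-invariant, so both years contribute the same relative dispersion.
year = np.array([30, 28, 27, 24, 22, 21, 20, 21, 23, 26, 28, 31], dtype=float)
stats = annual_stats(np.concatenate([year, year * 1.1]))
print(round(stats["iqr%"], 1))  # 25.0
```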
The countries classified as the largest, most developed, and most populous in Europe have the greatest demand for electricity, i.e., Germany, France, Italy, and Great Britain. The greatest relative annual volatility iqr% is observed, in decreasing order, for Norway, Sweden, Montenegro, Bulgaria, Estonia, and France. In these cases, the interquartile range exceeds 25% of the median. The lowest annual volatility is observed for Italy and Iceland (less than 7%).
The strongest autocorrelation for the annual lag l = 12 (ACF(12) ≥ 0.9), signaling the clearest annual cycles, occurs for the MEL time series of Switzerland, Spain, Portugal, France, and Italy. The MEL time series of Montenegro, Northern Ireland, and Iceland show the lowest ACF(12) values.
The annual-period harmonic share is highest for Norway, Sweden, Finland, and Estonia. There is a high correlation between u_12 and the mean relative annual dispersion iqr%. In some cases, the harmonic analysis gives very low u_12 values despite high annual autocorrelation, e.g., for Italy, Greece, and Spain. The periodograms for these countries show a high bar for the one-year period but many lower bars for higher frequencies. This means that the series of the countries with lower u_12 values contain disturbances masking the picture of the annual cycle. Figure 8 shows pie charts presenting the shares of the decomposition components, i.e., the trend, periodic fluctuations, and random fluctuations, in the total variance of the MEL time series. The decomposition was performed using seasonal-trend decomposition using LOESS (STL). The shares were calculated as the ratios of the variances of the components T_t, S_t, and R_t (the trend, seasonal, and random components, respectively) to the total variance of the MEL time series E_t. By analyzing the pie charts, one can classify the MEL time series according to the dominant component. The countries whose time series have the highest trend content, above 80%, are Spain, Portugal, the Netherlands, and Italy. The highest share of the seasonal component, over 90%, distinguishes Norway, Finland, Estonia, Sweden, and Ireland. The countries whose MEL time series are dominated by the seasonal component form the most numerous group. It should be noted that the results of these analyses depend on the length of the time series: the trend shows through in the longer series, while the seasonal component dominates in the short ones. Montenegro is the only country whose series has the disturbance component as dominant (53%). Other countries with a high proportion of random fluctuations, over 30%, are, in decreasing order, Northern Ireland, Serbia, Iceland, and Slovenia [31].
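The idea of component variance shares can be illustrated with a classical moving-average decomposition standing in for STL (the paper uses STL; this numpy sketch is only illustrative):

```python
import numpy as np

def variance_shares(z, period=12):
    """Shares of trend, seasonal and remainder in the series variance,
    using a classical moving-average decomposition as an STL stand-in."""
    z = np.asarray(z, dtype=float)
    # Centred 12-term moving average as the trend estimate.
    kernel = np.ones(period + 1)
    kernel[0] = kernel[-1] = 0.5
    trend = np.convolve(z, kernel / period, mode="valid")
    core = z[period // 2: period // 2 + len(trend)]
    detr = core - trend
    # Seasonal component: zero-mean average of each calendar month.
    seas = np.array([detr[i::period].mean() for i in range(period)])
    seas -= seas.mean()
    seasonal = np.tile(seas, len(core) // period + 1)[:len(core)]
    resid = detr - seasonal
    v = np.array([trend.var(), seasonal.var(), resid.var()])
    return 100.0 * v / v.sum()   # percentages summing to 100

t = np.arange(120)
z = 100 + 0.05 * t + 10 * np.sin(2 * np.pi * t / 12)  # weak trend, strong cycle
shares = variance_shares(z)
print(shares.argmax())  # 1 -> the seasonal component dominates here
```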

MEL Time Series Forecasting Results
The time series show large contrasts and thus allow predictive models to be assessed reliably. The forecasting task is to produce multi-step-ahead forecasts for each month of 2014 (the last year of data) using the data from the preceding period for training. For hyperparameter selection, the models were trained and validated on data up to 2013.
In this work, classical statistical models combined with preprocessing are proposed. The proposed models are compared with other computational intelligence models as well as classical statistical models. They include the following:
• k-NN — k-nearest neighbors, an ML-type model designed for medium-term load forecasting (MTLF) [13]. It learns from patterns defined by (8).
• N-WE — the Nadaraya-Watson estimator [39] for MTLF using the pattern representation (8).
• LSTM — a long short-term memory network developed for MEL forecasting in [8]. It works on patterns (8).
• MLP — a multilayer perceptron proposed in [6], developed for MEL forecasting. It learns from patterns defined by (8).
• ANFIS — an adaptive neuro-fuzzy inference system proposed for MTLF in [40]. It works on patterns (8).
• SVM — a support vector machine proposed for MTLF in [41]. It works on patterns (8).
• ES-RNN — a hybrid method combining exponential smoothing and recurrent neural networks [42][43][44].
The length of the x-patterns is one of the main hyperparameters of the k-NN, N-WE, LSTM, MLP, ANFIS, and SVM models. Despite the natural choice for this hyperparameter being the seasonal cycle length, i.e., 12 for monthly time series, the optimal value of n was selected from the range 3 to 24 for each model and each time series in a leave-one-out procedure using historical data.
The parameters of the ARIMA, ETS, and Prophet models were selected by the optimization procedures implemented in the auto.arima, ets, and prophet functions, respectively. These functions ensure a fully automatic selection of the model structure and parameters for each time series individually. The MLP model learned from x-patterns. A separate MLP network was trained for each time series and each month of the forecast period (2014). A single-hidden-layer network with sigmoid activation functions was used. The networks were trained using the Levenberg-Marquardt method with Bayesian regularization, which helped prevent overfitting. The number of hidden nodes was selected from 1 to 10, individually for each time series and each forecasted month. The adaptive neuro-fuzzy inference system, ANFIS, like MLP, learned from x-patterns. A separate ANFIS model was trained for each time series and month of the forecast period. The initial parameters of the Gaussian membership functions in the premise parts of the rules were selected using fuzzy c-means clustering. ANFIS was trained with a hybrid method that combines least squares estimation of the consequent parameters with backpropagation gradient descent for the premise parameters. The number of fuzzy rules M was selected from 2 to 13.
SVM, like MLP and ANFIS, learns from x-patterns. For each time series and each month of the forecast period, a separate SVM model was trained with kernels in the form of a dot product, K(x_i, x_j) = x_i^T x_j. The length of the input pattern was selected for each series. The remaining hyperparameters (BoxConstraint, KernelScale, and Epsilon) were selected by the automatic optimization procedure implemented in the fitrsvm function from the Statistics and Machine Learning Toolbox in the Matlab environment. The LSTM networks in all variants were trained using the Adam (adaptive moment estimation) optimization algorithm. The number of hidden units was selected for each time series individually from the set {1, 2, . . ., 10, 15, . . ., 50, 60, . . ., 200}. The remaining hyperparameters assumed default values: number of epochs, 250; initial learning rate, 0.005; and gradient threshold (to prevent gradient explosion), 1. In the middle of the learning process, the learning rate was reduced to 0.001. The ETS+RD-LSTM model learned on all time series simultaneously (cross-learning). The optimal hyperparameter values of this model for the set of 35 time series were as follows: number of epochs, 16; learning rate, 0.001; length of the cell and hidden states, 40; pinball loss parameter, 0.4; regularization parameter, 50; and ensembling parameters, L = 5, K = 3, and R = 3.
In the case of the k-NN, MLP, ANFIS, SVM, and LSTM models, the optimal hyperparameter values were selected individually for each of the 35 time series. The selection was carried out separately for each of the model variants V1, V2, and V3. The hyperparameters were selected on the training set in a grid search procedure using cross-validation: either leave-one-out cross-validation or, for the SVM and LSTM models, by treating the last year of training data (2013) as validation data. The training set contained all terms of a given series from the historical period, up to and including 2013. The model with the optimal set of hyperparameter values was used to forecast the time series in the test period, which covered the 12 months of 2014. The optimization procedures for the ARIMA, ETS, and Prophet models are fully automatic, built into the functions that implement these models (auto.arima, ets, prophet). The ETS+RD-LSTM model was optimized on the training data, treating the last year of training data (2013) as validation data.
The k-NN, N-WE, ARIMA, and ETS models are deterministic and return the same results for the same data. The trained models, i.e., MLP, ANFIS, LSTM, and SVM, return different results for the same data because of the stochastic nature of the learning process. In this study, these models were run 100 times, and the final errors were averaged over the 100 independent trials.
Taking into account the x-pattern encoding variants described in Section 3, three variants of each model working with the pattern representation are considered:
• V1. The basic variant, where the coding variables for the x-patterns are the mean and dispersion of the previous sequence X_{i−1} for k-NN, N-WE, LSTM, MLP, ANFIS, and SVM.
No patterns and coding variables are used for ES-RNN, ARIMA, ETS, LSTM, and Prophet in this case. This variant can be used to forecast the MEL from (8) without additional forecasting of the coding variables.
• V2. The variant for which the mean and dispersion of sequence X_i serve as the coding variables. Using the ARIMA model, they are both independently forecasted for the query pattern based on their previous values. This variant extends the denotations by "+AR", e.g., "k-NN + AR" and "ANFIS + AR".
• V3. The mean and dispersion of sequence X_i serve as the coding variables, as in variant V2, but here they are predicted using ETS. "+ETS" extends the denotations in this variant, e.g., "k-NN + ETS" and "ANFIS + ETS".
The following measures are used to assess the quality of forecasts and forecasting models:
• Percentage error (PE):

PE = (E − Ê)/E · 100%,

where E is the actual value and Ê is the forecasted value.
• Mean percentage error (MPE):

MPE = (1/n) Σ_{i=1}^{n} PE_i.

• Absolute percentage error (APE):

APE = |PE|.

• Mean absolute percentage error (MAPE):

MAPE = (1/n) Σ_{i=1}^{n} APE_i.

• Interquartile range of the absolute percentage error (IQR):

IQR = Q3(APE) − Q1(APE),

where Q1(APE) and Q3(APE) are the lower and upper quartiles, respectively. The interquartile range allows the variability of the APE error to be assessed. It covers the central 50% of the observations in the distribution.
• Root mean squared error (RMSE):

RMSE = √( (1/n) Σ_{i=1}^{n} (E_i − Ê_i)² ).

• Standard deviation of the percentage errors (stdPE):

stdPE = √( (1/n) Σ_{i=1}^{n} (PE_i − MPE)² ).

• Coefficient of asymmetry (skewness) of the percentage error distribution (skewPE):

skewPE = (1/n) Σ_{i=1}^{n} (PE_i − MPE)³ / stdPE³.

The coefficient of asymmetry is zero for symmetric distributions, negative for left-skewed distributions (most of the population is below the average), and positive for right-skewed distributions (most of the population is above the average).
• Kurtosis of percentage error distribution (kuPE): kuPE is a measure of the clustering of PE errors around the mean value of MPE.The higher the kurtosis value, the more slender the distribution of errors and the greater the concentration of their values around the mean.
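These measures can be computed directly; a compact sketch using NumPy (the function name is ours; kurtosis is computed without the −3 excess correction, matching the interpretation given above):

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Error measures for actual values E and forecasts E-hat (PE in %)."""
    E = np.asarray(actual, dtype=float)
    F = np.asarray(forecast, dtype=float)
    pe = 100.0 * (E - F) / E              # percentage errors
    ape = np.abs(pe)                      # absolute percentage errors
    mpe = pe.mean()
    std_pe = pe.std()                     # population standard deviation
    q1, q3 = np.percentile(ape, [25, 75])
    return {
        "MPE": mpe,
        "MAPE": ape.mean(),
        "MdAPE": np.median(ape),
        "IQR": q3 - q1,
        "RMSE": np.sqrt(np.mean((E - F) ** 2)),
        "stdPE": std_pe,
        "skewPE": np.mean((pe - mpe) ** 3) / std_pe ** 3,
        "kuPE": np.mean((pe - mpe) ** 4) / std_pe ** 4,
    }
```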
Table 2 shows the median absolute percentage error (MdAPE), mean absolute percentage error (MAPE), interquartile range of absolute percentage error (IQR) as a measure of forecast dispersion, and root mean squared error (RMSE). As can be seen from this table, the RMSE measure indicates that Prophet + ETS is the most accurate model among its rivals. In terms of the MdAPE and MAPE metrics, the Prophet + ETS results are very close to those of the state-of-the-art ES-RNN + ETS model, but its predictions are fully interpretable and the model has a small number of parameters to estimate. Table 3 shows the forecast errors and the descriptive statistics of the percentage errors. The bias of the forecasts can be estimated from the mean PE. For all models, MPE is negative, which means that the forecasts are overstated. The least biased forecasts are produced by the Prophet model (MPE = −0.70) and the most biased by the LSTM model (MPE = −3.12). The distribution of PE errors is described by the statistics medPE, stdPE, skewPE, and kuPE. The distribution of the Prophet + ETS errors is centered closest to zero (medPE = 0.00), and that of LSTM the farthest (medPE = −1.81). The PE errors of the SVM model show the greatest dispersion around the mean (stdPE = 16.76), and those of Prophet + AR the smallest (stdPE = 7.02). The most slender error distributions, concentrated around the mean, are those of MLP + AR, MLP + ETS, ANFIS + AR, ARIMA + AR, ETS, ETS + AR, Prophet, and Prophet + AR (kuPE > 15), and the most flattened is the error distribution of MLP + ETS (kuPE = 11.83). The negative skewness values, skewPE, which characterize the PE distributions of most models, indicate left asymmetry of the distributions. The greatest symmetry was obtained for ANFIS + ETS (skewPE = 0.96), and the most skewed distribution for SVM (skewPE < −13.40). Figure 11 shows the
MAPE errors broken down into individual months of the forecast period (2014). Note that the errors are lower for months 8-10 and higher for months 1-4 and 12 [18]. The lowest errors were most often achieved by the models forecasting the coding variables with ETS. Examples of forecasts generated by the models with the +ETS representation for several European countries are shown in Figure 12. The forecast errors for the PL series, except for the ETS model, do not exceed 2%, which should be considered a very good result. The DE, ES, and IT series are forecasted with a slightly greater error. In the case of DE, there are larger deviations in the forecasts produced by ANFIS + ETS for September. The GB series is forecasted well below its actual course. This is due to the unexpected increase in demand in the UK system in 2014, despite the downward trend observed in 2010-2013 (see Figure 5). The opposite situation for FR, ES, and IT resulted in a slight overestimation of the forecasts. MAPE and RMSE results are presented in Figure 13. The time series of the coding variables and their forecasts for six sample countries are shown in Figure 14. Figure 15 shows the Wilcoxon test results and the distribution of APE errors. The models were ranked from the lowest to the highest median APE. The first two positions are occupied by the ES-RNN models using forecasts of the coding variables. It should be noted that the proposed statistical models achieve higher positions when ETS is used to forecast the coding variables, while the ANFIS models take the final positions. To confirm the statistical significance of the differences in APE errors, Wilcoxon tests were performed for each pair of models. A white diagram element means that the models intersecting at this element do not differ statistically in terms of APE. A yellow element means that the model indicated by this element on the OY axis achieved a smaller error than the model indicated on the OX axis. A red element means that the model indicated by this element on
the OY axis achieved a greater error than the model indicated on the OX axis. Figure 15 shows that models without forecasts of the coding variables in many cases produce greater errors than the hybrid models that include forecasts of these variables. The Wilcoxon test results for the ES-RNN + ETS model, which is characterized by the lowest MAPE error (see Table 3), are similar to those of the Prophet + ETS model. Both of these models show an advantage over at least twenty-three other models (a statistically significant difference in errors) and are as accurate as the remaining ones. The ES-RNN + ETS model shows the greatest advantage: in terms of APE, it is more accurate than twenty-five other models. Summarizing the preliminary results, it should be noted that the performance of a prediction model is strongly dependent on proper time series preprocessing. Introducing the initial normalization and forecasting the coding variables with ETS significantly improves the performance of the classical models.
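The pairwise comparison underlying the colour-coded diagram can be reproduced with a standard Wilcoxon signed-rank test; a sketch using SciPy (the function name and colour labels are ours, and deciding the direction by comparing medians is an assumption):

```python
import statistics
from scipy.stats import wilcoxon

def compare_models(ape_a, ape_b, alpha=0.05):
    """Pairwise Wilcoxon signed-rank test on paired APE samples.

    Returns 'white' (no statistically significant difference),
    'yellow' (model A significantly more accurate) or 'red'
    (model A significantly worse), mirroring the colour coding
    of the diagram described in the text.
    """
    _, p = wilcoxon(ape_a, ape_b)
    if p >= alpha:
        return "white"
    # Direction decided by the median APE (an assumption for this sketch)
    if statistics.median(ape_a) < statistics.median(ape_b):
        return "yellow"
    return "red"
```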

Discussion
In the ARIMA and ETS models, the optimization is global, i.e., their parameters are adjusted to ensure the lowest error over all terms of the time series (although the series can be restricted to its most recent terms, so that the model captures only the specificity of the last historical period). The memory of the ARIMA model is limited to the last values of the time series (the size of the memory is determined by the orders of the AR and MA processes). Thus, the predicted value is "constructed" from the last terms. The ETS model is based on all values, but their influence on the predicted value decreases exponentially with time, i.e., more distant points have lower weights. Unlike these models, k-NN, for example, constructs a local model individually for each query pattern. To construct the forecast, it uses all terms of the time series, without being limited to the last period, as in ARIMA, and without introducing weights that depend on the time distance, as in ETS.
To construct the local regression function, k-NN considers values that may be distant in time from the query pattern, provided that such distant cases are similar in shape to the query pattern. It is worth noting, however, that time-distance information can easily be introduced into, for example, N-WE by assigning additional weights (decreasing in time) to the output patterns in the model. Weights based on seasonality can also be introduced. Outliers in the time series interfere with the selection of the ARIMA and ETS parameters, leading to suboptimal models. In k-NN, for example, as previously stated, the impact of outliers is partly reduced. An additional distinction between the statistical models and, for example, N-WE is that the former generate forecasts one step ahead. Forecasts with longer horizons are produced recursively, taking the prediction for the preceding time step as an input for the prediction of the following time step. In contrast, k-NN, for example, predicts the x-pattern representing the entire forecasted sequence in one step.
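The recursive multi-step strategy described above can be sketched as follows (the function name is ours; `one_step_model` stands in for any one-step-ahead model such as ARIMA or ETS):

```python
def recursive_forecast(history, one_step_model, horizon):
    """Multi-step forecasting by recursion, as in ARIMA/ETS: each
    one-step-ahead prediction is appended to the input window and
    used to produce the next one. `one_step_model` is any callable
    mapping a history list to the next value (a stand-in here)."""
    hist = list(history)
    forecasts = []
    for _ in range(horizon):
        y_next = one_step_model(hist)
        forecasts.append(y_next)
        hist.append(y_next)  # the forecast becomes part of the input
    return forecasts
```

A pattern-based model such as k-NN instead outputs all twelve components of the forecasted x-pattern in a single step, so no recursion and no accumulation of one-step errors is involved.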

Conclusions
In this paper, statistical methods for mid-term load forecasting were proposed. The input data for the models represent the normalized annual seasonal cycle of a load time series with filtered trend and unified variance. They express the shapes of the yearly cycles. The proposed approach uses statistical methods both for forecasting the annual patterns and for forecasting the coding variables used to decode these patterns. Due to the pattern representation of the time series, the forecasting models do not need to grasp the essence of the complex time series, which simplifies the predictive problem.
In the experimental part of the work, classical models with the pattern representation were tested on the MEL forecasting problem for 35 European countries. The outcomes showed the advantages of the proposed approach over an alternative approach without forecasting of the coding variables. For the statistical models, as well as for many comparative models, the proposed approach improved the accuracy. For example, Prophet + ETS, ETS + ETS, and ARIMA + ETS outperform their predecessors Prophet, ETS, and ARIMA by about 13.7%, 17.4%, and 25%, respectively, in terms of MAPE. The proposed models can be further adapted for short-term load forecasting.

Figure 5. Graphs of autocorrelation and partial autocorrelation of the MEL time series for Poland (lag means delay l).
Figure 6. Periodogram.

Figure 7. A block diagram of the pattern-based MEL forecasting methodology.

Figure 8. Shares of the individual decomposition components in the total variance of the MEL time series.

Figure 9 provides more specific results, i.e., MAPE for each country. It is worth noting that Prophet + ETS is frequently among the most accurate models. The model rankings based on MAPE and RMSE are shown in Figure 10. They display the models' average positions in the country-by-country rankings. Note the top position of Prophet + ETS in both rankings.

Figure 10. Rankings of the models.

Figure 13. Errors of the models.

Figure 15. APE and Wilcoxon test results for each model.

Table 1. Statistics and parameters describing the analyzed MEL time series.
• ARIMA -- autoregressive integrated moving average model implemented in function auto.arima (R package forecast), which combines unit root tests, maximum likelihood estimation and minimization of the Akaike information criterion (AICc) to obtain the optimal ARIMA model [38].
• ETS -- exponential smoothing state space model [4] used in function ets (R package forecast). This implementation covers many types of ETS models, depending on how the trend, seasonal and error components are considered: they can be expressed multiplicatively or additively, and the trend can be damped or not. Similar to the case of auto.arima, it returns the optimal model with parameters estimated using AICc [38].
• Prophet -- modular additive regression model with nonlinear trend and seasonal components [37] implemented in function prophet in the R environment (package prophet).

Table 2. Comparison of results between the proposed and comparative models.

Table 3. Descriptive statistics of the percentage errors of the proposed and comparative models.