Hybrid Load Forecasting Using Gaussian Process Regression and Novel Residual Prediction

: Short-term electricity load forecasting has attracted considerable attention as a result of the crucial role that it plays in power systems and electricity markets. This paper presents a novel hybrid forecasting method that combines an autoregressive model with Gaussian process regression. Mixed-user, hourly, historical data are used to train, validate, and evaluate the model. The empirical wavelet transform was used to preprocess the data. Among the perturbing factors, the most influential predictors that were recorded were the weather factors and day type. The developed methodology is upgraded using a novel closed-loop algorithm that uses the forecasting values and influential factors to predict the residuals. Most performance indicators that are computed indicate that forecasting the residuals actually improves the method’s precision, decreasing the mean absolute percentage error from 5.04% to 4.28%. Measured data are used to validate the effectiveness of the presented approach, making it a suitable tool for use in load forecasting by utility companies. H.B.: resources; D.M., S.P., S.D.C.: supervision and writing of


Introduction
In recent years, considerable effort has been invested in the reduction of energy use, the promotion of renewable energy sources (RES), and the implementation of mitigation measures that will lead to a reduction in the environmental cost. The industrial sector is believed to account for 38% of total global energy consumption. Studies have shown that by simply implementing the energy efficiency solutions that are currently available, by 2040, industry could produce almost double the value per unit of energy, compared to its current levels [1]. Considering the emerging trends in energy consumption coupled with rising energy prices, it is crucial that the implementation of new forecast methodology is customized to users' specific profiles and capabilities of user profile.
Energy load forecasting has been one of the main focal points of energy-related research in both industry and academia in recent years. Nowadays, almost all decisions regarding the electricity market and utility industry use load forecasting as their primary criterion. For example, system operators use energy load forecasting to ensure power system reliability. It has also been proven to be a useful dynamical method for aggregators participating in the energy trading market or managing electricity consumption. Forecasting methods may essentially be classified as follows: very short-term load forecasting (a few minutes ahead to a several hours), short-term (one day ahead to one week ahead), medium-term load forecasting (ranging from one month to one year), and longterm load forecasting (more than one year) [2]. Short-term predictions are used in demand and supply management and to balance electricity generation [3], transmission, distribution and use on different scales [4].
Another aspect of load forecasting in the context of the energy sector is the integration of RES into the power grid [5]. The use of a high-quality forecasting model may increase the impact of renewable sources [6]. In terms of energy management, load forecasting is an indispensable tool in demand response (DR) strategies [7]. However, the execution of reliable energy load forecasting is not easy, and several challenges arise in the process [8]. For this reason, researchers focus on overcoming these obstacles using new technologies, such as big data [9].
When energy load forecasting was still nascent, simple methods, such as linear regression [10], hybrid regressions [11], Kalman filters [12], Kalman estimators [13], or autoregressive integrating moving average (ARIMA) [14] and metabolic nonlinear grey models [15], were used. While these models are effective when dealing with linear data, they are unable to provide accurate forecasting in complex, non-linear energy load time series. Load forecasting methods have changed in response to advances in artificial intelligence neural network development [16]. Paras et al. used a neural networks approach for very short energy forecasting [17] and Nima et al. integrated a feature selection to the neural network algorithm [18]. As these models used optimal parameters [19], hybrid methods-basically, artificial neural network optimized methods-were proposed [20]. Many of the optimization methods used rely on feature selection [21] or hybrid feature selection to improve the parameters used for forecasting methodology [22]. Recent research has shown that performing a feature selection when starting to train the algorithm yields better results, as the model will be more likely to learn the characteristics of the time series when dealing with the optimum input-output set [23].
Hybrid algorithms that combine at least two prediction methods when developing the forecasting model have been developed [24]. Huaiguang et al. proposed a support vector regression method enforced with feature selection of data [25]. Sibonelo Motepe also proposed a hybrid methodology but using deep learning algorithms [26]. Due to the proven efficiency of these methods, the research presented herein is also based on a hybrid approach. Unlike traditional forecasting methods that are used to predict the load on a single node (e.g., a substation), the focus of this paper is on short-term total energy load forecasting, which yields greater accuracy by using the measured information in all nodes of interest. Another benefit of the proposed methodology over the classical methods mentioned above is the use of the residuals as predictors. This approach will customize the algorithm to better respond to the characteristics of the data. By doing this with each iteration the algorithm will better adapt to the data sets.
The hybrid algorithm described in Figure 1, uses the autoregressive (AR) time series model in combination with Gaussian process regression (GPR), coupled with a data pre-processing algorithm, the empirical wavelet transform (EWT), for data optimization. This combination of parametric and non-parametric time series, AR and GPR, showed great results in predicting the energy load consumption characteristics [27]. For optimizing the forecasting model, the use of rational quadratic (RQ) kernel function is proposed. This covariance function was chosen after testing different other kernels, due to its compatibility with the non-linear profile of load data.
The remainder of this paper is organized as follows: Section 2 presents the energy load data used to develop the forecasting method. The subsequent section deals with data preprocessing and presents the EWT used. Section 4 presents the GPR hybrid forecasting method proposed in this paper. Subsequently, the model performance indicators used to assess the algorithm are described in Section 5. Section 6 presents the results and evaluates the forecasting method's performance. The final section of the paper concludes the study with some of the authors' observations and recommendations for future research regarding the presented topic.

Data Analysis
This paper tests how different factors impact energy consumption behavior. To develop the forecasting algorithms, load data were gathered from the Romanian energy transmission and system operator [28] and correlated with weather measurements for the city of Bucharest [29]. The weather data were correlated with energy load data to provide information at hourly intervals on the same time samples. Energy load data consist of energy consumption for every hour, starting with January 1, 2018 and ending on June 1, 2019. The weather data that were taken into account consist of five variables that were sampled in the same time intervals. The weather factors that were considered to have the greatest impact on load consumption were temperature, changes in atmospheric pressure, dew point temperature, the level of precipitation in mm, and air humidity, all measured at two meters above ground reference. Another important factor considered was whether or not the day was a working day. This data type is binary and is 1 if the day is a weekend day or a national holiday and is 0 otherwise. The final factor to be considered was the date (month, day, day of the week); this criterion was added as, upon data analysis, this principle was observed to create patterns in load demand. Prior to algorithm implementation, all data were processed, and an outlier test was performed to improve the datasets.

Empirical Wavelet Transform
EWT [30] is used to preprocess the energy load data. This will ensure that the analyzed data will follow a smooth path that shapes accordingly to data changes without capturing the peak values introduced by field equipment. All data are preprocessed before using the forecasting algorithm. This method is not used in real time because data are collected until the end of the day. After acquiring the new data set, the EWT preprocess starts and the result is then added to the historical data. This wavelet method adapts to the processed signal by building a set of bandpass filters. To construct the wavelet filter bank, a robust peak detection is used prior to determining a maxima for the spectrum segmentation. This method is proposed based on multiple studies that have validated it as a perfect fit for non-stationary time signals as an energy load measurements vector [31].
To perform a short description of the algorithm, first, a limit, , is defined between each step segment. The segments are denoted with The corresponding wavelets are then defined as: After defining the wavelet methodology, the detailed coefficients and the approximation coefficients required for the signal reconstruction must be defined. The detailed coefficients are obtained from the inner product of the signal with the empirical wavelets, and the approximation coefficients result from the inner product of the signal with the scaling function: The signal reconstruction is obtained by summing the approximation with all the detailed coefficients, as shown below: For the energy load signal, a level three decomposition with the proposed EWT method is applied. A graphical comparison of the original signal and the reconstructed signal after processing is illustrated below in Figure 6. As presented, the EWT, smoothens the input data which will prove very beneficial for the forecasting algorithm because it will help form patterns more easily.

Energy Load Forecasting Method
For the proposed energy load prediction, several forecasting models were tested. The best match for the above-presented historical data was an AR model combined with GPR and an error-adjusting function specific to the datasets.
First, a simple AR model is presented, beginning with the mathematical representation [32]: where ci is the method coefficient for the current iteration and represents how the previous value influences the current value, em represents the error for the mth value, and ct is the first coefficient of the function with a constant value.
To compute the order of the AR model, a partial autocorrelation function is used with Yule-Walker equations: where k is the value at which the correlation function Φpk cuts off after exceeding the model order, M represents the number of observations, and ct is the mean value iteration of the finite time series. The next step for the method after determining the order of the model is to estimate the model coefficients. To accomplish this, the first equation is expressed in vector form to express the interconnection between the input and the output: where w represents the number of input/output pairs that are constructed using the original time series. Having written the equation in vector form, we can determine the coefficients of the model using the least squares method: Next, the proposed GPR model is presented [33]. It can be written as ( )~(m( ), k( , ′)) , an expression of its mean and covariance functions. The expression of the output regarding the n-order input vector is computed as follows: where e represents the noise, and may be approximated with a Gaussian distribution of zeromean value, a variance of , x is the input vector, and y is the scalar output: The model, together with the noise assumption, determines the likelihood that represents the probability density based on the parameters. The objective is to construct a training set = ( , )| = 1,2 … , ⱳ using the input vector and scalar output described above: The likelihood presented in the above expression is given by the Gaussian distribution with | mean and variance, = [y , y … y ⱳ ] , = [f( ), f( ) … f( ⱳ )] , and I being the unit matrix of ⱳ order. Marginal distribution p( ) is defined as having mean zero and a Gram K matrix: With the above knowledge, the marginal distribution of y is computed: To ground the predictions in theory, for the predicted output * with the input vector * , all the parameters' values are averaged and weighted according to their posterior probability: * ~ , * * * * + If there are ⱳ training data points, then * is the ⱳ order matrix of the covariances evaluated with all the data points for training and testing. Having considered the constraints of the conditioning Gaussians [33], the prediction can be determined: where * * = ( * , * ) and is the covariance matrix. The kernel, or covariance, function is that which defines the distance and similarity between the training data points. After testing various functions, the best performing kernel function was the RQ: This can also be represented as an infinite sum of squared exponential covariance functions with different length scale characteristics (l) and scale mixture (α). RQ kernel was determined as the best choice to minimize the distance between two neighbors through experimental testing. It was decided that for this specific data string from all tested functions the best match was RQ. After integrating the kernel function, the prediction mean and covariance may be calculated: This paper's forecasting method is based on a hybrid of the autoregressive model and a double GPR method, which takes a predicted error into consideration: where, h(x) ~ (0,k(x,x ′ )), c represents the method coefficients matrix and ( ) represents the predicted residuals. This formulation indicates that the output is the summation of a zero-mean Gaussian process, a set of fixed basis functions, and the zero-mean process of the residuals.

Results and Analysis
To test the method's accuracy, three of the most commonly used performance metrics in machine learning were used: mean absolute percentage error (MAPE), root mean square error (RMSE) and mean absolute error (MAE) [34]. Each of these metrics measures the difference between the experimental and the predicted data. The smaller their value, the better the algorithm will perform: where * are the measured and predicted parameters at i time and n represents the number of the considered data.
The algorithm was trained and tested using the energy load historical data described above. Before starting the validation process, and in order to enhance the processm some measurements against overfitting are required. Overfitting represents the set of data that corresponds too closely or exactly to another particular set of data and may therefore fail to predict future observations reliably. To protect against overfitting, a fivefold cross-validation method was used. The described performance metrics of the algorithm are MAPE= 4.89%, RMSE= 245.3 MWh, MAE=188.4 MWh. Figure 7 presents a comparison between the predicted and actual values of the load. The first observation is that the forecasted load mimics the characteristics of the historical data. Also, as computed above, the results are expected to generate solid forecasting data for this specific energy load trend.  As proposed in the methodology section, to further improve the algorithm, the residuals were predicted using the same regression methodology. Applying this dual-layer prediction process improves the forecasting values and the performance metrics: MAPE= 4.09%, RMSE=221.6 MWh, and MAE=153.2 MWh. Figure 9 illustrates the load's actual value in blue, the predicted value in red, and the dual-layer prediction in magenta. To further validate and test the algorithm's fitness for the specific load pattern, another test was performed. As presented above, the algorithm was implemented on historic energy load data from January 2018 to June 2019. Data collected from the month of July were used to test the algorithm's performance. Figure 10 presents a graphical representation of the measured and predicted energy load values for this specific period of time. This graph validates the algorithm that uses residual prediction and also shows its superiority over the hybrid method without error prediction. A comparison was made between the proposed forecasting method with and without error prediction. Table 1 presents the performance metrics for the July testing.

Conclusions
Accurate electricity load forecast plays an essential role in planning for utilities and electricity markets in electricity power networks. This paper proposes a novel hybrid GPR forecasting method of the aggregated energy load of a mixture of residential, commercial, and industrial clients. The relatively new EWT method was used for data preprocessing. To ensure a precise load prediction, some of the most important influencing factors were selected. Among these, the most influential were temperature, working/non-working day, and humidity. The hybrid method assumes the combination of an autoregressive time series model and GRP. Using the same methodology, further forecasting of the residuals was added. Following implementation of the algorithm, the method was validated on a new set of data. It was observed that a better approach would be to further apply an error prediction for the residuals. The proposed forecasting method was evaluated using three performance metrics. For the presented load data, this algorithm presents good results that mimic the characteristics of the aggregated plot. The novelty of the approach consists of applying EWT to preprocess the data, combining an AR method with GPR, and using the generated residuals as predictors to increase the algorithm's precision. A future development of the method may be developed by adding a closed-loop system methodology in which the method is executed until the residuals decrease no more.

RQ
Rational quadratic

MAPE
Mean absolute percentage error

RMSE
Root mean square error

MAE
Mean absolute error