Short Term Electric Power Load Forecasting Using Principal Component Analysis and Recurrent Neural Networks

Electrical load forecasting is required in electric power systems for different applications over specific time horizons, such as optimal operations, grid stability, Demand Side Management (DSM) and long-term strategic planning. In this context, machine learning and data analytics models represent a valuable tool to cope with the intrinsic complexity of the problem and, especially, to design future demand-side advanced services. The main novelty of this paper is the proposed combination of a Recurrent Neural Network (RNN) and Principal Component Analysis (PCA) to improve the forecasting of the hourly load on an electric power substation. A historical dataset of loads measured at a 33/11 kV MV substation in India is considered as a case study, in order to properly validate the designed method. Based on the presented numerical results, the proposed approach proved able to accurately predict loads with a reduced dimensionality of the input data, thus minimizing the overall computational effort.


Introduction
Nowadays, the energy system is facing a radical revolution towards a green transition, with increasing penetration of renewable energy sources (RES), migration to distributed systems, with new actors like prosumers, and storage integration, both utility scale and domestic, which represent a key technology to decouple energy production and consumption [1].
In this regard, distributed sensor architectures, digital technology, data analytics and computational tools would represent crucial enabling technologies for monitoring, forecasting and maintenance purposes, to better manage the balance between power demand and supply, and to improve the embedding of distributed RES; additionally, for the particular case of stand-alone hybrid systems, energy forecasting will particularly help to anticipate customers' behavior, size the electrical infrastructure and improve overall system reliability [2]. Therefore, forecasting capability brings helpful insights for security of energy supply, supporting power companies in providing their end-users with advanced demand-side services and a safe and stable system.
Utility companies gain several advantages from accurate load forecasting, such as reduced operation and maintenance costs, optimized demand-supply management, increased system reliability, and effective long-term strategic planning for future investments [3,4]. Electrical load forecasting can generally be divided into four main categories based on the forecasting horizon, namely very short-term, short-term [5], medium-term and long-term load forecasting [6]. Load forecasting serves different applications with respect to the specific time horizon, such as optimal operations [7], grid stability [8], Demand Side Management (DSM) [9] and long-term strategic planning [10].
On the other hand, with respect to short-term load forecasting, energy trading is another important task through which utilities can increase revenues on day-ahead energy markets. Wholesale power markets around the world adopt many different mechanisms and day-ahead or intra-day sessions; in India, for example, two categories exist based on trading time: the Hourly Ahead Market (HAM) and the Day Ahead Market (DAM). In HAM, trading opens one hour before the time of energy use; similarly, in DAM, energy is traded one day before the time of use [11].
A methodology was developed for short-term load forecasting by combining Light Gradient Boosting Machine (LGBM), eXtreme Gradient Boosting machine (XGB) and Multi-Layer Perceptron (MLP) models in [12]. In this hybrid model, the XGB-LGBM combination is used for meta-data generation. A multi-temporal-spatial-scale temporal convolutional network was used in [13] to predict the active power load; the multi-temporal-spatial-scale technique is used to minimize noise in the load data. A hybrid clustering-based deep learning methodology was developed in [14] for short-term load forecasting, where a clustering technique was used to group distribution transformers into clusters based on their load profiles. A Markov-chain mixture distribution model was developed in [15] to predict the load of residential customers 30 min ahead. A study on load forecasting using various machine learning models like SVM, Random Forest and LSTM was carried out in [16], both individually and with a fusion prediction approach. Short-term load forecasting was performed using convolutional neural networks (CNN) and sequence models like LSTM and GRU in [17], where the CNN was used for feature extraction and the sequence models for load forecasting. A machine learning model based on a CNN and a Deep Residual Network was developed in [18] for short-term load forecasting. Various regression models, along with correlation-based dimensionality reduction, were used for load forecasting in [19]. A deep learning model based on LSTM and factor analysis was developed in [20] for load forecasting within a smart-city environment. Artificial neural network based machine learning models were developed both for photovoltaic power forecasting [21] and for load forecasting on MV distribution networks [22].
Most of the literature on probabilistic renewable generation forecasting over the last ten years or so has focused on different variants of statistical and machine learning approaches: in [23], a comparison of non-parametric approaches to this probabilistic forecasting problem was performed. All these methodologies contributed significantly to short-term electric power load forecasting. In order to improve the forecasting accuracy and also to build a lightweight model for active power load forecasting applied to a 33/11 kV substation, a new approach is developed in this paper, using recurrent neural networks for load forecasting and Principal Component Analysis for dimensionality reduction.
The novelty of the proposed approach consists in a hybrid scheme combining a heterogeneous input structure with PCA: in particular, the new approach considers the temporal impact of the load in the previous three hours, at the same hour in the previous three days, and at the same hour in the previous three weeks, thus enabling the model to predict the load with good accuracy by properly capturing temporal diversity (e.g., the weekend load pattern); additionally, PCA extracts the most essential features from the nine input signals, thus compacting the input layer and reducing the computational load while maintaining the same overall accuracy. The combination of RNN and PCA is used for the first time in the short-term load forecasting problem. The RNN models were trained using the self-adaptive Adam optimizer, as shown in [24]. A complete literature summary of the short-term load forecasting domain with various machine learning approaches is presented in Table 1. All these methodologies provide valuable contributions towards short-term load forecasting, but have some limitations, such as model complexity, limited accuracy, or the weekly impact not being considered. In this paper, the accuracy of the load prediction is improved by tuning the RNN model parameters, the model complexity is reduced by using Principal Component Analysis, and the weekly impact is considered by using features like P(h − 168), P(h − 336) and P(h − 504).

Table 1. Literature summary on short-term load forecasting.

Reference  Year  Contribution                                   Disadvantage
[12]       2021  Novel stacking ensemble-based algorithm        Model complexity
[13]       2021  Multi-temporal-spatial-scale technique         Missing weekly impact
[14]       2021  k-Medoid based algorithm                       Model complexity
[15]       2021  Markov-chain mixture distribution model        Accuracy
[16]       2021  Fusion forecasting approach                    Accuracy
[17]       2021  Bi-directional GRU and LSTM                    Model complexity
[18]       2021  Deep Residual Network with convolution layer   Model complexity
[19]       2021  Regression models                              Accuracy
[20]       2021  LSTM and Factor Analysis                       Accuracy
[22]       2020  ANN                                            Accuracy


Methodology

Dimensionality Reduction Using Principal Component Analysis (PCA)
Principal Component Analysis (PCA) uses a feature extraction approach to compress the original dataset into a lower-dimensional feature subspace, with the aim of retaining most of the relevant information. The detailed procedure for extracting the most relevant features using PCA is drawn from [25].
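The component-selection step described above can be sketched as follows (a minimal illustration assuming scikit-learn; the synthetic input matrix is only a hypothetical stand-in for the lagged-load features):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for the nine lagged-load input features:
# a few latent factors plus noise, so the features are correlated.
latent = rng.normal(size=(1680, 4))
X = latent @ rng.normal(size=(4, 9)) + 0.05 * rng.normal(size=(1680, 9))

pca = PCA().fit(X)
cevr = np.cumsum(pca.explained_variance_ratio_)   # cumulative explained variance ratio
n_components = int(np.searchsorted(cevr, 0.90) + 1)  # smallest k with CEVR >= 90%

X_reduced = PCA(n_components=n_components).fit_transform(X)
print(n_components, X_reduced.shape)
```

The reduced matrix `X_reduced` then replaces the original features as the network input, exactly as done for RHM-2 and RDM-2 later in the paper.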

Recurrent Neural Network (RNN)
The Recurrent Neural Network (RNN) is a network where the activation state of each hidden neuron for the previous input is used to compute the activation state of that hidden neuron for the current input [26]. The main and most important feature of the RNN is the hidden state, which retains information about previous samples. In this study, RHM-1 is designed to predict the load based on the load in the last three hours, the load at the same hour in the last three days, and the load at the same hour in the last three weeks. The architecture of the proposed RNN model is shown in Figure 1. The PCA algorithm is applied to the input features of the load dataset to find the principal components. It has been observed from the Cumulative Explained Variance Ratio (CEVR) that six principal components cover almost 90% of the load dataset variance. Thus, the nine input features, i.e., P(h − 1), P(h − 2), P(h − 3), P(h − 24), P(h − 48), P(h − 72), P(h − 168), P(h − 336) and P(h − 504), in each dataset sample are replaced by the corresponding six principal components. These six principal components were used to train RHM-2. RHM-2 was therefore designed with six input neurons and one output neuron. The architecture of the proposed RNN model is the same as shown in Figure 1, where the number of inputs (Ni) is reduced to 6 by the PCA.
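The hidden-state recurrence can be sketched with a minimal NumPy forward pass (illustrative only: the tanh activation, weight shapes and random values are assumptions, not the authors' exact implementation):

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b_h, W_y, b_y):
    """Simple RNN: the hidden state h carries information from
    previous inputs into the computation for the current input."""
    h = np.zeros(W_h.shape[0])
    for x_t in x_seq:                              # iterate over the input sequence
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)     # hidden-state update
    return W_y @ h + b_y                           # linear output layer (predicted load)

rng = np.random.default_rng(1)
n_in, n_hidden = 9, 13                             # RHM-1-like sizes: 9 inputs, 13 hidden neurons
W_x = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
W_y = rng.normal(scale=0.1, size=(1, n_hidden))
b_y = np.zeros(1)

x_seq = rng.normal(size=(5, n_in))                 # a toy sequence of 5 feature vectors
y_hat = rnn_forward(x_seq, W_x, W_h, b_h, W_y, b_y)
print(y_hat.shape)
```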
RDM-1 is designed to predict the load based on the load at the forecast hour in the last three days and at the forecast hour in the last three weeks. The architecture of the proposed RNN model is the same as shown in Figure 1, where only six input features are considered, i.e., P(h − 24), P(h − 48), P(h − 72), P(h − 168), P(h − 336) and P(h − 504). The PCA algorithm is applied to the input features of the load dataset to find the principal components. The load dataset consists of a total of six input features, i.e., P(h − 24), P(h − 48), P(h − 72), P(h − 168), P(h − 336) and P(h − 504), and one output feature P(h). It has been observed from the CEVR that four principal components cover almost 90% of the load dataset variance. Thus, the six input features for each dataset sample are converted into four principal components. These four principal components were used to train RDM-2. RDM-2 was therefore designed with four input neurons and one output neuron. The architecture of the proposed RNN model is the same as shown in Figure 1, where Ni is finally reduced to 4 by the PCA. Table 2 summarizes this information about the analyzed RNN models with respect to the considered architecture. The trained RNN model can predict P(h) based on the input features (X) using Equations (1) and (2). The performance of all these RNN models has been observed in terms of error metrics, namely Mean Square Error (MSE), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) [27], as shown in Equations (3)-(5), respectively.
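The error metrics of Equations (3)-(5) correspond to the standard definitions below, rendered in NumPy:

```python
import numpy as np

def mse(y, y_hat):
    """Mean Square Error, Equation (3)."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def mae(y, y_hat):
    """Mean Absolute Error, Equation (4)."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

def rmse(y, y_hat):
    """Root Mean Square Error, Equation (5)."""
    return float(np.sqrt(mse(y, y_hat)))

# Toy check: a single 1 MW error over three hourly samples.
y_true = [10.0, 12.0, 11.0]
y_pred = [10.0, 12.0, 12.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```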
The complete work done in this paper is presented in Figure 2.

Result Analysis
Data was captured from [28] to train and test the models. The load data consist of a total of 2184 samples (91 days × 24 h), and these data are rearranged into a 1680 × 10 matrix, since 2184 − (3 weeks × 7 days × 24 h) = 1680 samples remain once the longest lag, P(h − 504), is accounted for. The first nine columns represent the nine input features, whereas the 10th column represents the target output (load). The statistical features of the load dataset used to train the RNN model are presented in Table 3. The frequency distribution of the output load values is represented as a histogram in Figure 3.
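The rearrangement into a 1680 × 10 matrix can be sketched as follows (the lag set matches the nine features described earlier; the synthetic series is only a placeholder for the measured loads):

```python
import numpy as np

lags = [1, 2, 3, 24, 48, 72, 168, 336, 504]    # hours, days and weeks back
load = np.arange(2184, dtype=float)            # placeholder for 91 days x 24 h of measured load

max_lag = max(lags)                            # 504 = 3 weeks x 7 days x 24 h
rows = []
for h in range(max_lag, len(load)):            # the first 504 hours lack a complete history
    rows.append([load[h - lag] for lag in lags] + [load[h]])
data = np.array(rows)                          # nine inputs + target P(h) per row

print(data.shape)                              # (1680, 10)
```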

Load Forecasting for HAM-(RHM-1)
The train and test datasets for "RHM-1" comprise a total of 1680 observations. Out of the 1680 load data samples, 1512 were chosen for training and 168 for validation. The performance of the model for several numbers of hidden neurons was observed, in terms of the performance metrics, during both training and testing, as shown in Table 4. Table 4 indicates that the model's training and test accuracy improves up to 13 hidden neurons. If the number of hidden neurons exceeds 13, the model overfits, leading to higher test errors. At this point, "RHM-1" with 13 hidden neurons is deemed the optimal model. In addition, the number of hidden layers in the RNN model was increased to try to boost the model's performance (RHM-1). The performance of the model with different numbers of layers was measured using the performance metrics, as illustrated in Table 5. Each hidden layer consists of 13 neurons. It can be seen from Table 5 that the model performs well with only one hidden layer. The test error values rise for the same training loss if the number of hidden layers is greater than one, indicating that the model overfits. Furthermore, as the number of hidden layers rises, the number of training parameters increases, and with it the required memory and processing time. The suggested model, i.e., RHM-1, has been trained ten times on the same dataset and is judged an ideal model for real-time load prediction, given its best values of training and validation errors. The performance of the suggested "RHM-1" model in a stochastic environment is shown in Table 6. For all error metrics, the standard deviation is noted to be virtually zero, which reflects the robust behavior of the "RHM-1" architecture.

Load Forecasting for HAM-(RHM-2)
The PCA algorithm is applied to the input features of the load dataset to find the principal components. The total variance in the dataset covered by each principal component, together with the cumulative variance, is shown in Figure 4. It shows that six principal components cover almost 90% of the variance in the load dataset. The outcome of PCA that feeds the RNN as input is presented for the first 10 data samples in Table 7. The suggested "RHM-2" has been trained and tested with different numbers of hidden neurons to detect the optimal configuration. The model is observed in terms of the performance metrics, during both training and testing, in Table 8. From Table 8, the performance of the model improves up to 11 hidden neurons in terms of training and test accuracy; the optimal "RHM-2" therefore has 11 hidden neurons. The number of hidden layers of this model (RHM-2), which is meant to predict the load one hour ahead, has also been increased. Each layer comprises 11 neurons, and the performance metrics of the model with different numbers of layers are shown in Table 9. In Table 9, good test performance was found with only one hidden layer; if the number of hidden layers is greater than one, the test error values rise and the model overfits. The model, i.e., RHM-2, is trained ten times on the same dataset and is regarded as an ideal model for real-time load prediction in terms of training and validation errors. Table 10 presents the performance of the suggested "RHM-2" model in a stochastic environment, showing that for all error metrics the standard deviation is practically zero, which indicates the robust behavior of the RHM-2 architecture.
In Table 11, the original model, i.e., RHM-1, is compared with the compressed model, i.e., RHM-2. RHM-2 is small, with 210 parameters compared to the 313 of RHM-1. Due to the dimensional compression of the model, the RHM-2 losses are somewhat greater than those of RHM-1. Although the trainable parameters of "RHM-2" were compressed by 32.91%, the MSE, RMSE and MAE losses rose by only 4.5%, 1.7% and 5%, respectively.
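The parameter counts above are consistent with a single-hidden-layer RNN with one linear output neuron, counted as Nh·(Ni + Nh + 1) + (Nh + 1); this counting is our reconstruction from the reported figures, not a formula stated in the paper:

```python
def rnn_params(n_in, n_hidden):
    """Trainable parameters of a single-hidden-layer RNN with one output:
    input-to-hidden and hidden-to-hidden weights plus hidden biases,
    then hidden-to-output weights plus one output bias."""
    hidden = n_hidden * (n_in + n_hidden + 1)
    output = n_hidden + 1
    return hidden + output

# (inputs, hidden neurons) for the four models discussed in the paper
models = {"RHM-1": (9, 13), "RHM-2": (6, 11), "RDM-1": (6, 13), "RDM-2": (4, 7)}
for name, (ni, nh) in models.items():
    print(name, rnn_params(ni, nh))
# -> RHM-1: 313, RHM-2: 210, RDM-1: 274, RDM-2: 92, matching Tables 11 and 18
```

The reported compression ratios also follow: (313 − 210)/313 ≈ 32.91% and (274 − 92)/274 ≈ 66.42%.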

Load Forecasting for DAM-(RDM-1)
The suggested model is trained and assessed using different numbers of hidden neurons in order to identify the best RDM-1. The model performance during training and testing is noted in terms of the performance metrics given in Table 12. From Table 12, the training and test accuracy increases up to 13 hidden neurons; RDM-1 with 13 hidden neurons is therefore deemed the optimal model. In order to enhance performance (RDM-1), the number of hidden layers in the RNN model was also increased. With 13 neurons per hidden layer, the performance of the model with different numbers of layers is illustrated by the performance metrics in Table 13. From Table 13, the model with only one hidden layer shows the best test performance. The recommended model, i.e., RDM-1, was trained on the same dataset ten times and is regarded as the best model to forecast loads in real time in terms of training and validation errors. Statistical analysis of the training behaviour, shown in Table 14, indicates that the standard deviation is practically zero for all error metrics, which describes the robust behaviour of the RDM-1 architecture.

Load Forecasting for DAM-(RDM-2)
The PCA algorithm is applied to the input features of the load dataset to find the principal components. The load dataset consists of a total of six input features, i.e., P(h − 24), P(h − 48), P(h − 72), P(h − 168), P(h − 336) and P(h − 504), and one output P(h). The total variance in the dataset covered by each principal component, together with the cumulative variance, is shown in Figure 5. Figure 5 shows that four principal components cover almost 90% of the variance in the load dataset. Thus, the six input features for each dataset sample are translated into four principal components. These four principal components were used to train RDM-2. RDM-2 was therefore designed with four input neurons and one output neuron. In order to find the optimal "RDM-2" in terms of the number of hidden neurons, the proposed "RDM-2" is trained and evaluated with different numbers of hidden neurons. The performance of the model during both training and testing is observed in terms of the performance metrics shown in Table 15. From Table 15, it has been observed that the performance of the model increases up to 7 hidden neurons in terms of training and test accuracy. At this point, therefore, RDM-2 with 7 hidden neurons is considered the optimal model. In addition, the number of hidden layers was increased to try to improve the model's efficiency (RDM-2) for load prediction. Each hidden layer has 7 neurons, and the performance metrics given in Table 16 demonstrate the output of the model with different numbers of layers. From Table 16, good test performance with just one hidden layer has been noticed. If the number of hidden layers is greater than one, the test error values increase and the model becomes overfit.
The recommended model, i.e., RDM-2, is trained ten times on the same dataset and is deemed an ideal model for forecasting the load in real time, having given the best values of training and validation errors. In Table 17, the statistical analysis of the suggested model's training runs reveals that the standard deviation is practically zero for all the error metrics, defining the resilient behaviour of the RDM-2 architecture. In Table 18, the original model, namely RDM-1, is compared with the compressed model. Compared with RDM-1, which has 274 parameters, RDM-2 is modest in size with 92 parameters. The model RDM-2 exhibited somewhat higher test losses than RDM-1, due to the lower dimensionality of the model. Although the trainable parameters of "RDM-2" have been reduced by 66.42%, the MSE, RMSE and MAE losses rose by only 2.5%, 0.7% and 1.9%, respectively.

Comparative Result Analysis
The performance of the proposed RNN model was verified by comparison with ANN models [22,29,30], regression models [19] and an LSTM model [20], as presented in Table 19. It can be observed that the RNN model was able to predict the load with good accuracy. The performance of the model was also compared statistically with the models proposed in [22,29,30], with the statistical metrics presented in Table 20, showing that the proposed RNN model is statistically robust with practically zero standard deviation. The comparison between the forecast loads and the real load on 30 November 2018 is shown in Figure 6, using the several suggested RNN models for the hourly-ahead and day-ahead markets. The predicted load of RHM-1 and RHM-2 is closer to the real load than that of RDM-1 and RDM-2, since the former models forecast the load only one hour ahead, whereas the latter forecast it one day in advance.
In Table 21, the total training time for the various RNN models with varying batch sizes is reported. As clearly shown, with batch size 32 (last row) the number of backpropagation updates is significantly reduced with respect to batch size 1, thus resulting in the lower computational effort intended by the authors' initial design. In order to show the advantages of using a non-linear approach, the performance of the proposed RNN model was verified by comparison with commonly used linear models, namely Auto Regression (AR) [31], Moving Average (MA) [32], Auto-regressive Moving Average (ARMA) [33], Auto-regressive Integrated Moving Average (ARIMA) [34] and Simple Exponential Smoothing (SES) [35], as presented in Table 22. It can be observed that the RNN model performed better than the traditional linear methods in terms of both the RMSE and MAE of the predicted load. Although some concerns have been reported in the literature with respect to using MAE as an accuracy indicator [36], we preferred to show both RMSE and MAE error metrics for the sake of comparison with the results in the previously cited references.
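For reference, the simplest of the linear baselines in Table 22, Simple Exponential Smoothing, can be written in a few lines (a generic sketch; the smoothing factor alpha is a hypothetical choice, not the value behind the reported results):

```python
import numpy as np

def ses_forecast(series, alpha=0.3):
    """Simple Exponential Smoothing: the one-step-ahead forecast is a
    geometrically weighted average of all past observations."""
    s = series[0]                          # initialize with the first observation
    for y in series[1:]:
        s = alpha * y + (1 - alpha) * s    # smoothing update
    return s                               # forecast for the next step

# Toy sanity check: for a constant load the forecast equals that constant.
flat = np.full(48, 25.0)                   # 48 h of a constant 25 MW load
print(ses_forecast(flat))
```

Unlike the RNN, such a model cannot capture the non-linear interaction between hourly, daily and weekly patterns, which is the point of the comparison in Table 22.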

Conclusions
An accurate short-term projection of the electric load allows utilities to efficiently sell their electricity and to manage the system on the basis of steadier, more trustworthy expected information.
In order to ensure that utilities can efficiently trade in energy, the authors proposed different RNN models, notably RHM-1 and RDM-1, for predicting the load accurately. Lightweight models, i.e., RHM-2 and RDM-2, present reduced input features by means of PCA. These lightweight models predicted the load with nearly the same accuracy as the original ones, while greatly reducing model complexity.
In this paper, real time load data were obtained from a 33/11 kV substation near the Kakatiya University in Warangal (India) for training and testing different RNN models in a practical case study. In order to identify outliers and also to observe the skewedness of data, suitable preprocessing techniques were employed.
The suggested RNN models were verified in terms of error measures by comparing them to those reported in the literature. The randomness in the forecasts produced by the suggested RNN models was also observed and compared to existing models.
Future works could additionally take into account external factors and habits, e.g., climate, weather conditions and particular human behavioral patterns.