Forecasting of Electric Load Using a Hybrid LSTM-Neural Prophet Model

Load forecasting (LF) is an essential factor in power system management. LF helps the utility maximize the utilization of power-generating plants and schedule them both reliably and economically. In this paper, a novel hybrid forecasting method is proposed, combining a long short-term memory network (LSTM) and neural prophet (NP) through an artificial neural network. The paper aims to predict electric load for different time horizons with improved accuracy as well as consistency. The proposed model uses historical load data, weather data, and statistical features obtained from the historical data. Multiple case studies have been conducted with two different real-time data sets on three different types of load forecasting. The hybrid model is then compared with a few established methods of load forecasting found in the literature using different performance metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), sum of square error (SSE), and regression coefficient (R). Moreover, a guideline with various attributes is provided for different types of load forecasting considering the applications of the proposed model. The results and comparisons from our test cases showed that the proposed hybrid model improved the forecasting accuracy for three different types of load forecasting over other forecasting techniques.


Introduction
Forecasting is the prediction of the behavior of elements that are intermittent in nature [1] based on historical data or on other parameters related to the elements. Forecasting is an important tool for any industry as it helps to plan ahead and choose the best possible solution [2]. Researchers use different artificial intelligence (AI)-based techniques for different types of forecasting. AI techniques are based on the development of computational techniques and algorithms that automatically improve from experience and learn from historical data. Some AI-based forecasting models include machine learning, deep learning, neural networks, and support vector machines [3]. These methods are also combined with lagged and additional regressors (e.g., weather information, statistical features, demographic data) to improve their accuracy. Although these AI models are more complex than traditional forecasting models, they provide more accurate predictions for different types of forecasting [3]. Nevertheless, not all of them are suitable for every type of forecasting (e.g., some may provide better solutions for a short prediction period, some may require a larger data set, some may require more time, etc.). Researchers have also tried to combine several of these models with classical models to improve the accuracy of forecasting, as well as its interpretability [4].
Electric load forecasting or load forecasting (LF) is an estimation of load in advance to predict the behavior of load consumption in an area. In the power industry, load forecasting has already become a vital tool. LF has also become an essential part of modern grids due to the high utilization of renewable energy resources such as wind and solar [5]. Starting from generation, through to end-user consumption, along with transmission and distribution, all the companies in these sectors need load forecasting to plan their scheduling and to maintain the system's reliability [6]. Moreover, LF has now become more vital with the inclusion of smart grids, or smart energy management systems, as they require accurate prediction to ensure optimum grid performance. LF depends on several components. Major factors that affect LF are [7]:
• Demographic factors: population, income, type of industry, and so on;
• Time factors: seasonal effects, day of the week, hour of the day, holidays, and so on;
• Weather factors: temperature, dew point, humidity, wind speed, cloud cover, and so on;
• Pricing factors: real-time electricity pricing, fuel pricing.
These factors affect LF differently depending on the type of forecasting. LF can generally be classified into three categories [8]: (a) short-term load forecasting, (b) medium-term load forecasting, and (c) long-term load forecasting. Short-term load forecasting (STLF) predicts the load in the range of minutes to weeks ahead [9]. STLF is essential to perform daily operations, such as load flow estimation (to prevent overloading and take necessary corrective actions), scheduling the generating units economically, and so on. Medium-term load forecasting (MTLF) has a time horizon of weeks to months. MTLF is helpful for planning maintenance of the generating units [10], predicting the power that must be purchased from or sold to the neighboring networks, and scheduling energy storage facilities [11]. Finally, long-term load forecasting (LTLF) has a time horizon of years to decades [9]. LTLF is vital to make decisions regarding increasing the number of generating units, estimating the fuel supply required for the future, and so on.
Over the decades, LF has been performed using many different methods with reasonable prediction accuracy. Among all the techniques, the regression model is one of the simpler and more traditional ones used for load forecasting [12,13]. Different types of regression analysis have been used for load forecasting, such as linear regression [14], multiple regression [15], exponential regression [16], and so on. The authors used an incremental regression tree to predict the load in [17] using historical data, and support vector regression was used to forecast load in [18]. The authors in [19] used deep residual networks to predict the load for the short term only but did not investigate the performance of the model for medium- or long-term forecasting. Most of these are only used for short-term load forecasting as they are not suitable for long-term load forecasting. Another popular way to perform different kinds of load forecasting is using artificial neural networks [20][21][22]. In [10], the authors used an improved artificial neural network (ANN) technique to forecast short-term load using 10-year historical data for New England and showed the proposed model performed better than the regular ANN. However, the prediction was only performed for STLF, while the performance of the model on either MTLF or LTLF was not shown in the article. Researchers have recently been using deep neural networks (DNN) for load forecasting [7,23]. For example, DNN was used to predict substation-based hourly load forecast in [24], and residential hourly load forecast in [25]. Long short-term memory was used in [26][27][28] for short-term load forecasting for a small region due to its recurrent nature. Most of the techniques that have been implemented to predict the load are focused on short-term load forecasting. Therefore, more research is needed on medium- and long-term load forecasting to improve the accuracy [29].
The aim of this paper is to predict load for three different types of forecasts with different prediction periods.
As technologies are evolving, additional new parameters such as integration of renewable resources and weather-related events are affecting prediction accuracy. In [16], the authors showed that the mean absolute percentage error for predicting MTLF was approximately 10 using different regression techniques. However, this number is on the higher side of the expected outcomes for this kind of forecasting [11]. Chen et al. used a support vector machine to solve a daily maximum peak load forecasting problem where the prediction period was one month [30]. The paper did not show the effects of adding weather data as input features on the forecasting results. The authors used decision trees to perform a long-term forecast in [31], but the error percentage of the model was on the higher side. Short-term correlation data were extracted from each of these models, and later, an iterative algorithm was designed for forecasting long-term electric load consumption by decomposing the forecasting problem into multiple simple linear regression models in [32]. Researchers have also been trying to use enhanced models and combine multiple models for power systems applications [33]. In [11], the authors tested their enhanced deep network model for medium-term load forecasting, resulting in improved accuracy compared to the traditional model. However, these deep networks tend to take a lot of time for training, which is true for this case as well. Authors in [34][35][36][37] combined different classical models and/or different AI-based models to predict the load of a power system and showed that the hybrid models improve the accuracy of the prediction.
From the discussion above, it can be stated that load forecasting is an important topic and many methods have been proposed to solve the forecasting problem. However, with the rapid increase of time series data and machine learning techniques, developing explainable forecasting techniques remains a challenging task in the decision-making process. That is why hybrid solutions are needed to narrow the gap between classical interpretable techniques and machine learning models. Moreover, most of the load forecasting methods discussed above are focused on predicting load for a short time horizon, where higher accuracy can be found quite often. However, it is harder to get high accuracy for long-term forecast models. In this paper, a novel hybrid solution is proposed by combining neural prophet (NP) with a long short-term memory (LSTM) network through an ANN that can perform prediction with better accuracy than the individual models. LSTM is selected for the hybrid model because of its recurrent nature and the fact that short-term power demand is closely related to its previous time step values. NP is one of the newest entries in the field of time series forecasting, and its application in electric load forecasting is yet to be explored. NP can be more interpretable than the traditional AI-based models [4]. Thus, this paper adds these interpretable elements to the solution. The paper also focuses on both short-term and long-term load forecasting. Multiple real-world tests of the model are performed using real data sets from the National Renewable Energy Laboratory (NREL), USA, and the Electric Reliability Council of Texas (ERCOT) for different kinds of load forecasting.

Methodology of the Proposed Method
The hybrid model is a combination of LSTM and NP, and results from these models are then fed into an ANN to get the forecasting. Therefore, the hybrid model consists of three different models: ANN, LSTM, and NP. The basics of each of these models are presented here, and later, the structure of our proposed hybrid model is discussed.

Artificial Neural Network (ANN)
An ANN [38] is a mathematical model that mimics the function of the brain and is useful for pattern recognition, optimization, and prediction. Figure 1 represents a basic ANN model. An ANN model can be implemented with different algorithms. For the hybrid model, backpropagation is used to implement the ANN model, as it is empirically observed that this method provides higher accuracy for the data sets. In the figure, $w_{ij}$ is the weight vector that links the input and the hidden layer, and $w_{jk}$ is the weight vector that connects the hidden layer and the output layer. Steps required for the implementation of the ANN with backpropagation using the Levenberg-Marquardt (LM) algorithm are provided below.
Step 1: Initialization of the weights and input parameters;
Step 2: Training and propagating the data set using the network;
Step 3: Minimization of the error with the comparison between the actual and predicted results;
Step 4: Updating the weights using the Levenberg-Marquardt algorithm and repeating this process for each pattern;
Step 5: Continuation of the process until it matches the tolerance level.
Weights can be updated with various algorithms. The LM algorithm is used for the hybrid model because it requires less time to train the model, even though it requires more memory. Therefore, the larger the data sets are, the higher the memory requirements will be. However, training time is an important factor for STLF. The hybrid model consists of three models, and among them, the LSTM model is computationally expensive. As a result, the training time for the ANN becomes a vital factor for the hybrid model. For this reason, the LM algorithm is chosen for the model. However, the Bayesian regularization algorithm can be used for LTLF, as training time is not a concern for this type of load forecasting. The performance index for the Levenberg-Marquardt algorithm can be expressed by Equation (1) [39].
where $w = [w_1, w_2, \ldots, w_n]^T$ is the vector of network weights, k is the number of outputs, n is the number of weights, p is the number of patterns, $P_k$ is the k-th predicted value, and $A_k$ is the k-th actual value. The update rule for the weights is provided by Equations (2) and (3).
where J is the Jacobian matrix that contains the derivatives of each error with respect to each weight, µ is Levenberg's damping factor, and e is the error vector. If µ is very large, the update approximates gradient descent with a small step, and if µ is small, it approaches the Gauss-Newton method.
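The role of the damping factor can be illustrated with a minimal sketch of one Levenberg-Marquardt iteration. It assumes the widely used textbook form of the update, Δw = -(JᵀJ + µI)⁻¹Jᵀe, with a sum-of-squared-errors index; the toy straight-line model, the data, the factor-of-ten rule for adapting µ, and the function names `lm_step` and `fit_line_lm` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def lm_step(w, jacobian, error, mu):
    """Return w - (J^T J + mu*I)^{-1} J^T e for one damped update."""
    hessian_approx = jacobian.T @ jacobian + mu * np.eye(w.size)
    return w - np.linalg.solve(hessian_approx, jacobian.T @ error)


def fit_line_lm(x, y, mu=0.01, iterations=20):
    """Fit y ~ w[0] + w[1]*x with LM, adapting mu after every trial step."""
    w = np.zeros(2)
    # Derivative of each error with respect to each weight (constant for a line).
    jacobian = np.column_stack([np.ones_like(x), x])
    for _ in range(iterations):
        error = jacobian @ w - y                  # predicted minus actual
        candidate = lm_step(w, jacobian, error, mu)
        new_error = jacobian @ candidate - y
        if new_error @ new_error < error @ error:
            w, mu = candidate, mu / 10.0          # accept: behave more like Gauss-Newton
        else:
            mu *= 10.0                            # reject: behave more like gradient descent
    return w


x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                                 # synthetic targets, not paper data
weights = fit_line_lm(x, y)
print(weights)                                    # close to [1. 2.]
```

Solving the damped linear system with `np.linalg.solve` avoids forming an explicit matrix inverse; the n-by-n system is also the source of the higher memory demand mentioned above when the number of weights grows.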

Long Short-Term Memory (LSTM)
LSTM is a kind of recurrent neural network (RNN) model that can solve the short-term dependency problem by learning the long-term dependencies of the parameters [40]. The basic structure of the LSTM model consists of four layers, which are represented in Figure 2: the forget gate layer, the input gate layer, the memory cell layer, and the output gate layer. The equations that describe these layers are provided in Equations (4)-(9).
where x is the input signal, f is the forget layer cell, h is the hidden layer, $\tilde{C}$ is the candidate hidden state, C is the unit's internal memory, $o_t$ is the output, U is the weight matrix that connects the input layer to the hidden layer, and W is the connection between the previous and current hidden layer.
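As a concrete illustration of how these quantities interact, the sketch below implements one time step of a standard LSTM cell in NumPy. It assumes the common textbook gate equations with input weights U and recurrent weights W and omits bias terms; the function name `lstm_step`, the dictionary keys, and the dimensions are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, U, W):
    """One LSTM time step; U and W are dicts keyed by gate: f, i, c, o."""
    f_t = sigmoid(U["f"] @ x_t + W["f"] @ h_prev)      # forget gate
    i_t = sigmoid(U["i"] @ x_t + W["i"] @ h_prev)      # input gate
    c_tilde = np.tanh(U["c"] @ x_t + W["c"] @ h_prev)  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                 # internal memory
    o_t = sigmoid(U["o"] @ x_t + W["o"] @ h_prev)      # output gate
    h_t = o_t * np.tanh(c_t)                           # new hidden state
    return h_t, c_t


rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4                                  # illustrative sizes
U = {g: rng.normal(scale=0.1, size=(n_hidden, n_in)) for g in "fico"}
W = {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for g in "fico"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):                 # five synthetic time steps
    h, c = lstm_step(x_t, h, c, U, W)
```

Because the internal memory is carried forward through the element-wise product with the forget gate, information from earlier time steps can persist across many steps, which is the property that motivates using LSTM for load series.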

Prophet and Neural Prophet (NP)
Prophet, also known as FbProphet, is a decomposable time series forecasting model developed by Facebook's Core Data Science Team [41]. NP consists of different components such as trend, seasonality, auto-regression, additional regressors, and so on. Prophet has three main model components, which are trend, seasonality, and holidays. These components are combined with Equation (10).
$$y(t) = g(t) + s(t) + h(t) + e(t) \tag{10}$$
Here,
• g(t) is a trend-modeling function that can be specified as a linear function or a logistic function;
• s(t) represents a seasonality function that can be daily, weekly, and/or yearly, which is handled with Fourier terms;
• h(t) is a holiday function that considers the effect of holidays, which occur irregularly;
• e(t) represents the error, that is, the changes that are not fitted by the model.
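The additive structure of these components can be sketched in a few lines of NumPy. This is an illustrative decomposition only, assuming a linear trend, a truncated Fourier series for seasonality, and a constant offset on holiday dates; it does not call the Prophet or NeuralProphet libraries, and all function names, coefficients, and dates below are hypothetical.

```python
import numpy as np


def linear_trend(t, slope, offset):
    """g(t): a linear trend."""
    return slope * t + offset


def fourier_seasonality(t, period, coeffs):
    """s(t): sum over n of a_n*cos(2*pi*n*t/P) + b_n*sin(2*pi*n*t/P)."""
    s = np.zeros_like(t, dtype=float)
    for n, (a_n, b_n) in enumerate(coeffs, start=1):
        angle = 2.0 * np.pi * n * t / period
        s += a_n * np.cos(angle) + b_n * np.sin(angle)
    return s


def holiday_effect(t, holiday_days, kappa):
    """h(t): adds kappa on days listed in holiday_days (t measured in days)."""
    return kappa * np.isin(np.floor(t), holiday_days)


def additive_forecast(t, slope, offset, period, coeffs, holiday_days, kappa):
    """Trend + seasonality + holidays; the error term is what remains unfitted."""
    return (linear_trend(t, slope, offset)
            + fourier_seasonality(t, period, coeffs)
            + holiday_effect(t, holiday_days, kappa))


t = np.arange(0.0, 14.0, 0.25)                    # two weeks, six-hour steps
weekly_coeffs = np.array([[1.0, 0.5], [0.2, -0.3]])   # hypothetical a_n, b_n
y_hat = additive_forecast(t, slope=0.1, offset=50.0, period=7.0,
                          coeffs=weekly_coeffs, holiday_days=[3], kappa=-5.0)
```

Keeping each component as a separate function mirrors why the model is described as decomposable: each term can be inspected or plotted on its own.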
Neural prophet (NP) [4] is a successor of FbProphet that has not yet been applied in the field of power load forecasting. The fundamental difference between these two methods is that NP integrates deep learning terms, which are fitted on lagged data, into the equations. The hyperparameters of this model can also be tuned automatically for optimum performance. Otherwise, NP has the same basic design flow and forecasting process, with similar model components. Figure 3 represents the forecasting design flow in FbProphet and neural prophet.
In the forecasting process, the forecasting of time series data is initially produced using different parameters and specifications that have direct human interpretation. After that, the forecasting performance is evaluated in the model, and if any problem arises (i.e., poor performance), the model will notify a human analyst to intervene. The analyst can then adjust the model properly based on its feedback.

Combining LSTM with Neural Prophet through ANN
In the hybrid model, the outputs from the LSTM and the neural prophet are combined through an artificial neural network. The block diagram of the proposed hybrid model can be found in Figure 4. First, NP and LSTM are used separately to produce output features $f_N$ and $f_L$, respectively, from the time series data. These outputs are then fed into the ANN model to get the final prediction. In addition, the ANN model takes temperature as an additional regressor to improve the accuracy of our hybrid model. The ANN model also considers other features along with them (e.g., time series data, statistical parameters from historical load) to produce the forecasted load. Other weather parameters were also considered alongside temperature, but temperature was found to be the most correlated parameter among all the different weather-related features. In order to make sure the accuracy is not hampered by the shortcomings of one of the methods, a safety net is added to the hybrid model. If the accuracy of one of the individual methods drops significantly below that of the other, the lower-performing model is excluded from the hybrid model. The threshold for this action is selected to be a 2% mean absolute percentage error. This specific number was chosen by trial and error, by performing tests on different data sets.
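The safety net can be sketched as a small selection rule applied before the ANN inputs are assembled. The text gives the 2% MAPE threshold but not the exact comparison, so the sketch below assumes the rule is applied to the gap in MAPE (in percentage points) between the two base models on held-out data; the function names, the dictionary keys, and the sample numbers are hypothetical.

```python
import numpy as np


def select_base_outputs(mape_np, mape_lstm, threshold=2.0):
    """Return which base-model outputs feed the ANN.

    Assumption: a base model is dropped when its MAPE exceeds the other
    model's MAPE by more than `threshold` percentage points.
    """
    if mape_lstm - mape_np > threshold:
        return ("f_N",)            # LSTM lags too far behind: keep NP only
    if mape_np - mape_lstm > threshold:
        return ("f_L",)            # NP lags too far behind: keep LSTM only
    return ("f_N", "f_L")


def build_ann_inputs(outputs, temperature, selected):
    """Stack the selected base-model outputs and temperature column-wise."""
    columns = [outputs[name] for name in selected] + [temperature]
    return np.column_stack(columns)


# Hypothetical base-model outputs and temperatures for four time steps.
outputs = {
    "f_N": np.array([410.0, 420.0, 415.0, 430.0]),
    "f_L": np.array([405.0, 425.0, 418.0, 428.0]),
}
temperature = np.array([18.0, 19.5, 21.0, 20.0])

selected = select_base_outputs(mape_np=1.4, mape_lstm=1.1)
ann_inputs = build_ann_inputs(outputs, temperature, selected)
```

With both validation errors within two percentage points of each other, both outputs are kept and `ann_inputs` has three columns; if the LSTM error were much larger, only the NP output and temperature would be passed on.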

Training of the Model
In order to train and test the model, a data set from a city in Florida is initially used. For this, the model is trained with hourly load demand data from 2016-2019 for the city, which are collected from [42], and weather information, taken from [43]. Temperature data are taken at different city locations, and the calculated average is used for the training and testing phases. Three years of load demand, from 2016 to 2018, are used for the training phase, and the data set is split 75:25 into training and validation sets. The model is tested using the data set of the year 2019. The variation of load for 2016, 2017, and 2018 is shown in Figure 5. From the figure, it can be seen that the load consumption over the years is quite similar, with slight variations. However, there are power dips for very short periods in the month of September in 2016 and 2017 and in the month of October in 2018. All these drops are caused by hurricanes, which destroyed power lines. There are some power spikes in the month of January in 2016, 2017, and 2018 due to sudden drops in temperature during those periods. There is also a power spike in the month of July in 2018, along with some other spikes, which could be because of festivals, the addition of huge loads, weather-related events, and so on. Next, the 15-year electrical load data set for the state of Texas collected from [44] is used to train and test the model. The three kinds of time horizons of load forecasting used to test the model are mentioned below.

1. Hour Ahead Load Forecasting: For this case, the objective is to forecast the load one hour ahead of time, every day. Two prediction periods are used here: one for winter and one for summer. The training is performed using different statistical and weather features, and it is empirically observed that the selection of the variables mentioned below provides better results [45]. Moreover, there is a close relationship between temperature and power load.
2. Day Ahead Load Forecasting: The objective for this type of forecasting is to forecast electrical load consumption one day ahead. Therefore, the one-hour prior load cannot be used as an input variable for prediction. That is why the previous 24 h average load and the 1-day prior load are used as input features for this type of forecasting. Other input variables used for this case are similar to the previous one. However, the temperature needs to be predicted a day ahead as well. According to [46], the error percentage for day ahead weather temperature is less than 1.5. Therefore, the error in predicted temperature will not have any significant impact on the results. Two prediction periods are used for this case as well, winter and summer.
3. Year Ahead Load Forecasting: The objective of this forecasting is to estimate daily peak load consumption a year ahead. The error percentage is relatively high for predicting weather temperature a year ahead, which will affect our results significantly. Initially, predicted temperature is added as an input feature to predict the load. Later, this forecasting is performed based only on the historical load data set, ignoring the temperature. The rest of the input variables used for this case are provided in the list below.

Four performance metrics are used to evaluate the test results. They are the mean absolute percentage error (MAPE), the root mean square error (RMSE), the sum of square error (SSE), and the coefficient of determination ($R^2$). Equations (11)-(14) are used for calculating the performance parameters, where N is the number of data points, $A_t$ is the actual value, and $F_t$ is the forecasted value.
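The four metrics can be computed as in the following sketch, which uses the standard definitions with A as the actual values and F as the forecasted values. It is assumed, not confirmed by the text above, that these standard forms match Equations (11)-(14); the sample arrays are made up for illustration.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))


def rmse(actual, forecast):
    """Root mean square error, in the units of the load."""
    return np.sqrt(np.mean((actual - forecast) ** 2))


def sse(actual, forecast):
    """Sum of squared errors."""
    return np.sum((actual - forecast) ** 2)


def r_squared(actual, forecast):
    """Coefficient of determination: 1 - SSE / total sum of squares."""
    total = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - sse(actual, forecast) / total


# Illustrative values only, not taken from the paper's data sets.
A = np.array([100.0, 200.0, 300.0])
F = np.array([110.0, 190.0, 300.0])

print(mape(A, F))       # 5.0 (within floating-point rounding)
print(sse(A, F))        # 200.0
print(r_squared(A, F))  # 0.99
```

MAPE is scale-free, which makes it convenient for comparing the city-level and state-level data sets, whereas RMSE and SSE grow with the magnitude of the load and with the number of samples.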

Hyperparameter Tuning
With the three years of training data, different models are trained. The proposed hybrid model is compared with regression tree (RT), ANN, LSTM, and NP. The number of hidden units for the ANN is optimized based on the error percentage: the training is repeated with different numbers of hidden units, and the number that gives the best solution is chosen. The momentum of the ANN model is set to 0.9. For the LSTM model, increasing the number of hidden units improves the performance but also lengthens the training time. Beyond a certain point, the performance does not improve significantly, and the model only becomes more computationally expensive. Therefore, the number of hidden units is chosen to be 200 to train the LSTM model, balancing accuracy against training cost. For the NP model, the number of hidden layers is chosen to be 2, as the model gives good results with this number. Other important hyperparameters used for the hybrid model are provided in Table 1. The optimized parameters in the table differ between data sets. The same hyperparameters are also used for testing each model individually. In the case of RT, the performance depends on the size of the leaf node. The leaf size of the RT model is optimized with an optimization algorithm whose objective is to minimize the percentage of error for the different cases by varying the leaf size. After training the models, hour ahead load forecasting is performed for different times of the year in 2019. All the training and testing are performed on a laptop with the same configuration. The performances of the different techniques for load forecasting for a city in Florida are shown in Figures 6-9 for the first seven days of January 2019. From the results, it can be seen that the proposed hybrid model performs better than all other models. The model is tested for the summer season as well.
Figure 10 shows the performance of the different techniques along with the hybrid model during the summer season for a city in Florida. The result is consistent with the previous one, as the proposed hybrid model outperforms the other models in this paper. However, the LSTM model is the one that performs closest to the hybrid model. In order to check the performance of the model, a different data set from northern Texas during the winter season is also used. In all three cases, the proposed hybrid model improves the forecasting accuracy compared to the individual models applied separately.
A detailed comparison of hour ahead load forecasting with different data sets with the three performance metrics can be found in Table 2. From the table, it can be seen that the hybrid model has the lowest RMSE, and its regression coefficient is the closest to 1 compared to the other models. Later, hour ahead LF was performed for different years on the Florida data set to check the consistency of the hybrid model's performance. From Table 3, it can be seen that the hybrid model outperforms the other models during both the summer and the winter periods. Comparing the hybrid model to its closest-performing model, the hybrid model reduces the error during winter by 0.33%, 1.5%, 0%, 5.4%, and 6.4% for the years 2020, 2019, 2018, 2017, and 2016, respectively. The hybrid model reduces the error during summer by 1.6%, 5.4%, 2.5%, 3.7%, and 8.6% for the years 2020, 2019, 2018, 2017, and 2016, respectively. On average, the hybrid model therefore improves forecasting accuracy by about 3.5% over its closest-performing model for this type of forecasting.

Case II: Day Ahead Load Forecasting
In this case, the objective is to predict a day ahead electric load for both seasons with different data sets. The hybrid model performs the best for the day ahead LF, which can be seen in Figure 11. The error percentage is higher in this case than in the earlier one, which is expected due to the longer forecasting horizon. From Figure 11, it can be seen that the error mainly occurs at the maximum or minimum points. These peak points are harder to predict accurately because the conditions for daily peak load are not constant. They can vary due to the demography of the place, industrial factors, sudden changes in weather, and so on. If these events can be predicted or known quite early, the error percentage can be much lower. A detailed comparison of the other AI techniques' performance with the hybrid model using different data sets can be found in Table 4.
Compared with the model that performs closest to it, the hybrid model improves the accuracy by 3.8% for the Florida data set during the winter season, 8.1% during the summer season, and 16.1% for the northern Texas data set during the winter season. The hybrid model has also performed better in all four performance metrics. In order to test the impact of other weather parameters on load forecasting, two additional input signals, wind speed and relative humidity, are introduced to our model. First, wind speed is fed into the artificial neural network instead of temperature as one of the inputs. Using wind speed as one of the features does not improve the accuracy; rather, it increases the error percentage from 5.39 to 6.3324 during summer. The experiment is also performed with relative humidity, resulting in a MAPE of 6.0476 for the Florida city data set. Therefore, it is evident that temperature has a significant positive effect on load forecasting results compared to the other weather parameters. Later, forecasting is performed while considering temperature along with wind speed and relative humidity as input features. The performance parameters do not show any improvement over the model where only the temperature is fed into the neural network. Therefore, the hybrid model uses only temperature, rather than any other weather parameter, to produce the prediction. As a result, the number of measurement features is reduced. Day ahead LF has also been performed for different years on the Florida data set to check the consistency of the model for this type of forecasting. From Table 5, it can be seen that the hybrid model consistently outperforms the individual AI techniques in terms of our performance metrics. Comparing the hybrid model to its closest-performing model, the hybrid model reduces the error during winter by 4%, 3.8%, 6.4%, 11.6%, and 9.2% for the years 2020, 2019, 2018, 2017, and 2016, respectively.
On average, the hybrid model therefore improves forecasting accuracy by almost 8% over its closest-performing model for this type of forecasting.
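The averages quoted for the hour ahead and day ahead cases are simple means of the per-year error reductions. The short sketch below reproduces that arithmetic from the values listed in the text, ordered 2020 to 2016. Only winter values are listed for the day ahead case, so the day ahead mean computed here covers winter alone and is not expected to reproduce the combined figure; the variable and function names are illustrative.

```python
from statistics import mean

# Per-year error reductions (%) versus the closest-performing model,
# copied from the text in the order 2020, 2019, 2018, 2017, 2016.
hour_ahead_winter = [0.33, 1.5, 0.0, 5.4, 6.4]
hour_ahead_summer = [1.6, 5.4, 2.5, 3.7, 8.6]
day_ahead_winter = [4.0, 3.8, 6.4, 11.6, 9.2]


def average_reduction(*seasons):
    """Mean of all per-year reductions across the given seasons."""
    values = [value for season in seasons for value in season]
    return mean(values)


hour_ahead_avg = average_reduction(hour_ahead_winter, hour_ahead_summer)
day_ahead_winter_avg = average_reduction(day_ahead_winter)

print(f"hour ahead, both seasons: {hour_ahead_avg:.2f}%")
print(f"day ahead, winter only:   {day_ahead_winter_avg:.2f}%")
```

The hour ahead mean comes out at roughly 3.5%, in line with the figure stated for that case.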

Case III: Year Ahead Load Forecasting
Initially, the year ahead LF is performed on the Florida data set, which consists of three years of data. For long-term forecasting, it has been observed that daily peak load points are harder to predict, and the objective for this type of forecasting is to predict the daily peak load one year ahead. The results of year ahead load forecasting can be seen in Figure 12. From the results, it can be stated that the hybrid model improves the accuracy of electric load prediction. However, the LSTM does not perform well for this type of load forecasting. Therefore, the hybrid model ignores the LSTM model, and only the output from the NP model is fed into the neural network to produce the forecast. An improved LSTM model for this type of forecasting could result in better performance of the hybrid model, as the hybrid model would again consider LSTM features. As this is a year ahead prediction, the temperature needs to be known one year ahead; historical temperature data from 2013-2018 are used to perform a temperature prediction with NP. The predicted temperature has an error percentage of about 15%. Then, year ahead LF is performed again considering the predicted temperature. The higher error percentage in the predicted temperature is also the reason that the accuracy of this load forecasting is lower than in the other two cases. The year ahead LF is also performed with the Texas data set, which has 15 years of historical data. The hybrid model provides the best solution for this case as well, which is evident in Figure 13. All the other models perform similarly to how they did on the Florida data set. From Table 6, it can be seen that the hybrid model improves the accuracy compared to its closest-performing model by 2.65% for the Florida data set and by 1.02% for the Texas data set. The detailed comparisons with four performance metrics can be found in Table 6 for all data sets. Even though a more extensive data set is used for the Texas region, it does not impact our results significantly.
Therefore, it can be stated that the hybrid model does not require a huge data set to provide a good solution.

Conclusions
In this paper, a novel hybrid LSTM-NP model is proposed to predict the electricity load at a utility-level scale. Three different types of forecasts are performed, which include hour ahead, day ahead, and year ahead load forecasting, to test the performance of the model. In order to analyze the results, four statistical performance metrics are also used. The results obtained from the test cases show that the hybrid model provides higher accuracy in all three different types of forecasting than the models compared here. Furthermore, two different data sets are used to test the model's consistency, where one set is of a small city, and the other set is a big part of a state. The hybrid model consistently outperforms the other techniques in both cases. A summarized comparison of these techniques in different types of forecasting with recommendations is provided in Table 7. This paper has also explored the effects of other weather information, such as wind speed and humidity, to improve forecasting accuracy. From the results, it can be said that temperature is the principal weather factor that affects the results positively. However, the model can perform better for year ahead load forecasting if the accuracy of the predicted temperature can be improved.

Attributes | Hour Ahead Load Forecasting | Day Ahead Load Forecasting | Year Ahead Load Forecasting

Training Time
Hour ahead: Among the individual models, LSTM takes the longest to train, ranging from 10 to 20 min depending on the size of the data and the number of epochs. The other three take seconds to a few minutes. However, the hybrid model takes the most time overall, as it combines LSTM with NP.
Day ahead: As this type of forecasting has one day available for the prediction, training time does not have much significance.
Year ahead: As this type of forecasting is performed one year ahead, training time does not have much significance.

Additional Regressor
Hour ahead: The hybrid model uses temperature as an additional regressor, which improves the accuracy. LSTM and NP do not use any additional regressors for their forecasting when used separately.
Day ahead: Similar to hour ahead load forecasting.
Year ahead: Any kind of weather information is very hard to incorporate, as the regressor also needs to be known a year ahead. Very accurate predicted weather features can improve the model's accuracy.

Size of Data Requirement
Hour ahead: A minimum of 3 years of hourly data is good enough for this kind of forecasting, as the accuracy does not go below 98%.
Day ahead: Using 5 years of hourly data for Texas instead of 3 years does not change the result significantly for any of the techniques, including the hybrid model. Therefore, a minimum of 3 years gives quite accurate results.
Year ahead: Using 15 years of data for Texas improves the accuracy by 1 percent over using 3 years of data. Therefore, the hybrid model performs quite well with less historical data.

Overall Performance
Hour ahead: The merit order for all the techniques mentioned here is hybrid, LSTM, NP, ANN, and RT.
Day ahead: The merit order for all the techniques mentioned here is hybrid, LSTM, NP, ANN, and RT.
Year ahead: The merit order for all the techniques mentioned here is hybrid, NP, ANN, RT, and LSTM.