1. Introduction
Short-term load forecasting has an important impact on major decisions, such as the day-to-day operation of the power grid and dispatch planning [1]. Many scholars have studied power system load prediction; traditional methods include the time series method [2], regression analysis [3,4], trend extrapolation [5], etc. These methods are simple to implement and fast to run, which makes them suitable for small datasets with uniform characteristics. However, they handle nonlinear characteristics and large data volumes poorly and lack robustness and adaptability. Modern load prediction methods mainly include gray mathematical theory [6,7], the fuzzy prediction method [8], the neural network method [9,10], and so on. In recent years, artificial intelligence has come to be widely used in image processing, speech recognition, power systems [11], and other fields. In the smart grid, artificial intelligence is applied in power generation, transmission, distribution, and the power market. On the generation side, accurate load prediction with artificial intelligence algorithms [12] effectively reduces the cost of power generation and supports reasonable generation planning. Through real-time prediction of user-side power consumption, grid dispatch can be carried out in a punctual and appropriate manner, maintaining the safe and stable operation of the power grid.
Deep learning algorithms exhibit a good ability to extract data characteristics when processing large amounts of power data, and power system load prediction aims to extract typical features from complex and variable historical load data in order to make accurate predictions. Power system load data is typical time-series data, so deep learning algorithms achieve good results in processing it. A time series decomposition model [13] effectively reflects the factors that affect load prediction and achieves accurate prediction of load data; however, it tends to ignore the correlation between time periods, so the prediction for some periods is significantly biased. A time series prediction method based on lifting wavelets [14] predicts the electricity consumption of residential areas by denoising the historical load data, exhibiting stronger nonlinear feature extraction ability than time series models. Studying the factors influencing load prediction, a load prediction model based on an artificial neural network (ANN) was developed [15]; however, the model is easily trapped in local extrema during training and lacks modeling of time dependence. By using an ant colony optimization algorithm to optimize a recurrent neural network (RNN) prediction model, the accuracy of the traditional RNN was improved [16], but the problems of vanishing gradients and long-term dependence remain. To solve these problems, a long short-term memory (LSTM) prediction model combined with real-time electricity load data was adopted [17], but it lacks consideration of the influence of historical and future information on the current state. Autoencoders are unsupervised deep learning models with strong feature extraction abilities, and a stacked autoencoder prediction model was introduced that extracts the characteristics of the input data in a comprehensive way [18], showing strong prediction accuracy and generalization ability.
Based on our review of the above literature and the shortcomings of these prediction methods, this paper proposes a short-term load prediction model based on an LSTM Seq2seq encoder-decoder and improves the performance of the multi-layer LSTM network by adopting a Residual mechanism. Introducing an Attention mechanism into the decoding process achieves selective feature extraction of the load data, strengthens the correlation between input and output, and improves the accuracy of model prediction. Comparison with other prediction methods indicates that the proposed method has a better prediction effect.
2. Motivation and Problem Statement
At present, although the power industry is vigorously developing large energy storage devices, because of the special nature of electricity, large-scale storage is still difficult to implement at this stage. Electricity load prediction is carried out to plan power generation reasonably and reduce power wastage; each 1% increase in load forecast accuracy saves about 0.1% to 0.3% in energy costs [19]. With the large-scale grid interconnection of new loads, such as renewable energy and flexible loads (e.g., electric vehicles), the composition of user-side load consumption is growing more complex every day. The uncertainty and nonlinearity of the electricity load are gradually increasing, and balancing source-side supply with load-side consumption places ever higher demands on the accuracy of load prediction. Improving the accuracy of user-side load prediction therefore plays an important role in power grid generation planning and dispatch [20].
Improving the load forecasting level saves coal and reduces power generation costs, helping to formulate reasonable power supply construction plans and thereby increasing the economic benefits of power systems and society [21,22].
A key question in power load forecasting is how existing historical data can be used to establish suitable forecasting models to predict the load at a future time or over a given period. The reliability of the historical load data and the choice of predictive model are therefore the main factors influencing precision. As the nonlinearity and uncertainty of power datasets increase, obtaining accurate load forecasting results becomes more difficult. From traditional regression prediction methods to current deep learning algorithms [23,24], prediction methods have been improving constantly, and the accuracy of the results continues to be refined. Deep learning algorithms feature information memory, self-learning, and optimization calculation, and offer strong computing power, complex mapping ability, and various intelligent processing capabilities [25].
5. Simulation Experiments
5.1. Introduction to the Dataset
This research study was carried out on a personal computer with a single 2.6 GHz CPU and 8 GB of memory. The simulation was implemented in Python 3.6.8 with the TensorFlow deep learning framework developed by Google.
In this paper, the New York State historical power system load data published by NYISO [36] was selected as input for the model training and testing process. Hourly data was used, one load point per hour, for a total of 8760 load points, of which 80% form the training set and the remaining 20% the test set. The first 7008 load points were used as the univariate load prediction training set, and the remaining 1752 as the univariate load prediction test set.
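The data preparation described above can be sketched as follows. The 80/20 split and the 24-hour input windows (used later in Section 5.3) come from the text; the function and variable names, and the random placeholder series standing in for the NYISO data, are illustrative only:

```python
import numpy as np

def make_windows(series, input_len=24, output_len=1):
    """Roll a 1-D load series into (input, target) pairs for supervised training."""
    X, y = [], []
    for i in range(len(series) - input_len - output_len + 1):
        X.append(series[i:i + input_len])
        y.append(series[i + input_len:i + input_len + output_len])
    return np.array(X), np.array(y)

# 8760 hourly load points: first 80% (7008) for training, the rest (1752) for testing.
# A random series stands in for the real NYISO load data here.
load = np.random.default_rng(0).uniform(10_000, 20_000, size=8760)
train, test = load[:7008], load[7008:]
X_train, y_train = make_windows(train)
X_test, y_test = make_windows(test)
```

Each training sample is the previous 24 hours of load, and the target is the next hour, matching the rolling prediction scheme described in Section 5.1.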
Figure 6 clearly shows that the load data fluctuates periodically; the 168-h load data for one week are shown in Figure 7.
The electricity load is a random process. In the study of random processes, the autocorrelation coefficient indicates whether the process is stationary and helps choose an appropriate model order. Therefore, the autocorrelation coefficient of the training data was calculated, as shown in Figure 8.
Figure 7 shows the electricity load in a particular area, with periodic fluctuations, and Figure 8 shows that the correlation coefficient decreases at first as the lag increases; at a lag of 24 h, it reaches its maximum peak of 0.87439. Therefore, the selected time series model order is 24, i.e., the historical load data of the previous 24 hours is used as the feature vector for rolling prediction. This shows that the model remains applicable even when built from single-dimensional (load-only) data.
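The order-selection step can be illustrated with a simple sample-autocorrelation sketch on a synthetic series with a 24-h period; the helper function and the synthetic data are illustrative, not the paper's code:

```python
import numpy as np

def autocorr(series, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)])

# Synthetic load with a daily (24-h) period: the autocorrelation peaks at lag 24,
# which is then chosen as the time series model order.
t = np.arange(24 * 60)
load = 100 + 10 * np.sin(2 * np.pi * t / 24)
acf = autocorr(load, max_lag=48)
order = int(np.argmax(acf)) + 1   # lag with the largest coefficient
```

On the real NYISO data the peak coefficient at lag 24 is 0.87439 rather than near 1, but the selection logic is the same: take the lag of the dominant peak as the model order.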
5.2. Performance Indices
For the performance evaluation of the proposed model, this paper uses the Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The measures of error are expressed as follows:

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

RMSE = √[ (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² ]

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|

MAPE = (100%/n) Σ_{i=1}^{n} |(y_i − ŷ_i)/y_i|

Here, y_i represents the true value, ŷ_i represents the predicted value, and n represents the number of data points.
The MSE is the expectation of the squared difference between the estimated and true values. RMSE is the square root of the MSE, expressing the error magnitude in the original units, which makes decision making more convenient for users. MAE is the average of the absolute errors between the estimated and true values, reflecting the actual error magnitude. MAPE expresses the error as a percentage of the true values. Lower values of MSE, RMSE, MAE, and MAPE indicate better prediction performance.
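Using the standard definitions of these four metrics, they can be computed as in the sketch below (function names are ours):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Square Error: expectation of the squared prediction error."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse(y_true, y_pred):
    """Root Mean Square Error: MSE in the original units of the data."""
    return float(np.sqrt(mse(y_true, y_pred)))

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: error relative to the true values, in %."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs((y_true - np.asarray(y_pred)) / y_true)) * 100)
```

Note that MAPE is undefined when a true value is zero; for load data this is rarely an issue, since demand is strictly positive.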
5.3. Seq2seq Preferred Model Parameters
In this paper, the Seq2seq encoder-decoder structure is used to predict the electricity load, with both the encoder and the decoder built from LSTM layers; the number of LSTM layers is then tuned to optimize the model.
The initial learning rate of the experiment was set to 0.01, the decay rate to 0.5, and the number of hidden layer nodes to 100. After 100 training iterations, the training error and test error obtained by the model with different numbers of layers are shown in Figures 9 and 10, respectively.
Figure 9 shows that the training error of the five-layer LSTM structure is large, while the test errors after single-, double-, and triple-layer training are comparatively lower. Figure 10 shows that the test errors of the Seq2seq model with single-, double-, and five-layer LSTM structures are highly volatile. Therefore, comparing Figures 9 and 10 together, the three-layer LSTM structure is selected, as its error is minimal.
In deep learning, the model learns the "universal law" of all samples through training, which can lead to overfitting or underfitting. Increasing the number of training iterations can overcome underfitting; enlarging the dataset and introducing regularization can overcome overfitting. On this basis, this paper adopts Dropout [37], in which neural units are temporarily removed from the network with a probability of 0.5 during training, and introduces the Attention and Residual mechanisms.
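Dropout as described, i.e., temporarily removing units with probability 0.5 during training, can be sketched with an inverted-dropout mask. This is an illustrative NumPy version, not the paper's TensorFlow implementation:

```python
import numpy as np

def dropout(x, p=0.5, rng=None, training=True):
    """Inverted dropout: zero each unit with probability p during training
    and scale the survivors by 1/(1-p) so the expected activation is unchanged.
    At inference time (training=False) the input passes through untouched."""
    if not training:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(42)
activations = np.ones(1000)
dropped = dropout(activations, p=0.5, rng=rng)
```

In TensorFlow/Keras the same behavior is provided by the built-in `Dropout(0.5)` layer, which is only active during training.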
The combinations of the Residual mechanism and the two Attention mechanisms were simulated and verified as shown in Figure 11; the training error, test error, and training time of each model are compared in Table 1.
In Table 1, true and false indicate whether the model uses the corresponding Residual or Attention mechanism, respectively. Table 1 shows that when the model adopts the Residual mechanism, the training error and test error are significantly reduced, indicating that the Residual mechanism improves the predictive performance of the model. Comparing the two Attention mechanisms, the model combining the Residual mechanism with the Bahdanau mechanism performs best, with minimal training and test errors. Therefore, the Seq2seq model used for short-term load prediction adopts the combination of the Residual and Bahdanau mechanisms, as shown in Figure 12.
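For reference, Bahdanau (additive) attention in its standard formulation scores each encoder hidden state $h_j$ against the previous decoder state $s_{i-1}$, normalizes the scores with a softmax, and forms a context vector for decoding step $i$:

```latex
e_{ij} = v_a^{\top} \tanh\left(W_a s_{i-1} + U_a h_j\right), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j
```

Here $T_x$ is the encoder input length (24 in this paper) and $W_a$, $U_a$, $v_a$ are learned parameters; this is what allows the decoder to weight the most relevant hours of the input window when producing each prediction.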
When the Seq2seq model is trained, iterative prediction is used, and the desired accuracy is obtained by adjusting and selecting the model parameters. The parameter settings are shown in Table 2.
Of the large power system dataset, 80% of the data are used to train the model and the rest to test it. The input length is set to 24 and the output length to 1, with a learning rate of 0.01 and a decay rate of 0.5. The Residual LSTM has 100 hidden neurons, with 200 decay steps. The model is trained for up to 300 iterations with a batch size of 200. The Adam optimizer is used to minimize the loss function, and the gradient clipping value is set to 5.0.
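Two of these settings can be made concrete: the exponential learning-rate decay (base rate 0.01, halved every 200 steps) and the gradient limit, under the assumption that "gradient value set to 5.0" refers to clipping the gradient L2 norm. A plain-Python sketch (function names are ours):

```python
import math

def decayed_lr(step, base_lr=0.01, decay_rate=0.5, decay_steps=200):
    """Exponential learning-rate decay: the rate is halved every decay_steps steps."""
    return base_lr * decay_rate ** (step / decay_steps)

def clip_by_norm(grads, max_norm=5.0):
    """Scale the gradient vector down if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]
```

In TensorFlow these correspond to an exponential-decay learning-rate schedule and to norm-based gradient clipping applied before the Adam update; clipping keeps the LSTM training stable when occasional large gradients occur.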
5.4. Experimental Results and Analysis
After training, the model is used to predict the test data set. The test set is fed as input to the trained model, which computes the predicted values, and the forecast results obtained by the load prediction model are compared with the real values. As seen in Figure 13, the predicted values and the true values basically coincide.
The results show that the Seq2seq load prediction model has stronger optimization capabilities. To demonstrate the superiority of the proposed method, its initial and final fitness were compared with those of the RNN, LSTM, and GRU models, over which the Seq2seq model was significantly superior. The comparison is shown in Figure 14.
Figure 14 shows that, compared to RNN, LSTM, and GRU, short-term load forecasting with the Seq2seq model is better: the predictions obtained by the proposed Seq2seq model are smooth and fit the data well, while the errors between the predictions of the RNN, LSTM, and GRU algorithms and the real values are large. The errors of the different algorithms are shown in Tables 3 and 4.
The tables show that the errors of short-term load prediction with the proposed algorithm are comparatively smaller than those of the RNN, LSTM, and GRU algorithms, demonstrating a better prediction effect.
5.5. Supplementary Experiment
To illustrate that the method also yields good prediction results for the load forecasting of small power grids, this paper uses the data of a small power grid as the experimental data set and applies the proposed Seq2seq model to carry out load forecasting. In [38], the authors considered the impact of different day types, which is important for load prediction; in the forecasting process here, weather data (temperature, humidity, holidays, etc.) were also used as input variables. The detailed input data types are shown in Table 5. The experimental results, with comparatively small errors, are shown in Figures 15 and 16.
Figures 15 and 16 show that, under the same model training parameters, selecting additional features such as date and weather improves model learning. With the same number of training iterations, introducing other relevant features yields much better learning performance than training on the pure load data alone, improving accuracy when the training data is small as well as the overall prediction accuracy. When trained on both large and small power system datasets, the proposed model behaves smoothly, as seen in Figure 15, and can thus be considered more stable.
6. Conclusions
The outcomes of load forecasting are conducive to determining the power that needs to be generated in the coming days, planning the installation of new generating units, determining the size, location, and timing of installed capacity, and guiding the expansion, reconstruction, construction, and development of the power grid. Moreover, load forecasting assists the stable operation of the power system by predicting demand. Therefore, the accuracy of load forecasting directly affects the stable and efficient operation of the power grid. This paper proposes a novel Seq2seq model for more precise power system load forecasting.
The main contributions of this paper are as follows:
- (1)
The application of the Seq2seq model to load forecasting. The model was initially used widely in the field of machine translation; here it is applied to load forecasting and obtains better forecasting results.
- (2)
According to the periodic characteristics of historical load data, the correlation coefficient method is used to determine the order of the input historical load, and the accuracy of data feature extraction is improved.
- (3)
The coalescence of Residual and Attention mechanisms is used to optimize the Seq2seq model, which overcomes shortcomings, such as model instability and lower precision, ensuring the effectiveness of power load forecasting.
- (4)
To demonstrate the robustness and the stability of the proposed model, the electricity dataset of the small power grid is used for prediction, also considering different weather conditions and user behaviors.
In this paper, a short-term load prediction model based on the LSTM Seq2seq algorithm is developed through the coalescence of the Residual and Attention mechanisms, and the model extracts the effective characteristics of historical load data. This reduces the error of short-term load prediction, improving the prediction performance of the model and presenting a new method for short-term load prediction. By continually optimizing the performance of various deep learning algorithms to improve prediction accuracy, it will be possible to develop even more advanced, faster, and more accurate models for load prediction.
In the future, we look forward to studying the method’s applicability for long-term forecasting. Moreover, price prediction with respect to load forecasting can be studied comparatively. Furthermore, efficiency and prediction accuracy of load forecasting may be improved by combining various forecasting methods in order to develop a robust and stable forecasting model.