An Ultra-Short-Term Electrical Load Forecasting Method Based on Temperature-Factor-Weight and LSTM Model

Abstract: Ultra-short-term electrical load forecasting is an important guarantee of safe and efficient energy system operation. Temperature is an important factor affecting changes in electric load. However, the impact of temperature on load forecasting varies greatly from case to case, and including it can sometimes even decrease forecasting accuracy, which poses considerable difficulties for researchers. To make more principled use of the temperature factor in ultra-short-term electrical load forecasting, and especially to avoid its negative influence, we propose an ultra-short-term electrical load forecasting method based on temperature factor weight and the long short-term memory model. The proposed method evaluates the importance of temperature for the current prediction task based on the magnitude of recent load changes and the correlation between temperature and load, so that the negative impact of the temperature factor can be avoided. On average, the mean absolute percentage error of the proposed method is 1.24%, 1.86%, and 6.21% lower than that of the traditional long short-term memory model, the back-propagation neural network, and the gray model, respectively. The experimental results demonstrate that the method has clear advantages in prediction accuracy and generalization ability.

L.X. and X.D.; formal analysis, F.L.; investigation, D.Z.; resources, F.L.; data curation, H.T.; writing (original draft preparation), H.T. and D.Z.; writing (review and editing), D.Z., L.X. and X.D.; visualization, X.D.; supervision, L.X.; project administration, X.D.; funding acquisition, F.L. All authors have read and agreed to the published version of the


Introduction
Ultra-short-term electrical load forecasting (USTLF), which refers to forecasting the load within one day [1], underpins the safe, reliable, and economical operation of energy systems. Owing to the increasing demand from distributed energy and diverse users, load changes have become more random, which increases the difficulty of load forecasting. USTLF also provides a basis for determining trends in the electricity market price [2], and an excessively large prediction error results in higher operating costs.
There are many load forecasting methods utilized at present. Zhengyuan et al. [3] proposed an original data sequence by third parties for the opening sequence of operations to generate new data. Furthermore, it can be used to establish an improved GM (1,1) model. Song et al. [4] built a combined

Long Short-Term Memory Artificial Neural Networks
LSTM artificial neural networks are a special type of recurrent neural network (RNN). LSTM mainly addresses the "exploding gradient" and "vanishing gradient" problems of RNNs, which makes it better able to handle long-range dependencies. A multilayer LSTM network structure is shown in Figure 1, and the construction of the LSTM network unit is depicted in Figure 2. Here, $C$ represents the long-term memory of the LSTM, which is updated with new memory in real time as the network operates; $h_{t-1}$ denotes the output at the previous time point, $h_t$ is the output at the current time point, and $X_t$ represents the current input. The internal function modules of the LSTM unit are introduced below.
1. Forget gate: The forget gate determines, through the forgetting coefficient $f_t$, how much of the long-term memory $C_{t-1}$ of the previous moment should be retained. It concatenates the output $h_{t-1}$ of the previous time point with the input $X_t$ of the current time point into an input matrix $[h_{t-1}, X_t]$. The sigmoid activation function then outputs a real number in the range (0,1), where 1 means that all memories should be retained and 0 means that all memories should be forgotten. In the standard LSTM formulation:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \qquad (1)$$

Here, $\sigma$ is the activation function, $W_f$ represents the weight matrix of the fully connected layer, $b_f$ indicates the bias matrix of the fully connected layer, and $f_t$ is the forgetting coefficient.

2. Input gate: The input gate determines how much of the current input $X_t$ is saved into the long-term memory $C_t$. In the standard LSTM formulation:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right), \quad \tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_C\right), \quad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (2)$$

Here, $W_i$ and $b_i$ denote the weight and bias parameters, respectively, of the sigmoid function of the fully connected layer, while $W_c$ and $b_C$ are the weight and bias parameters, respectively, of the tanh function of the fully connected layer; $i_t$ is the input gate coefficient, $\tilde{C}_t$ is the candidate memory, and $\odot$ denotes element-wise multiplication.

3. Output gate: The intermediate parameter $o_t$ determines the extent to which the long-term memory $C_t$ affects the current cell output. In the standard LSTM formulation:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right), \quad h_t = o_t \odot \tanh(C_t) \qquad (3)$$

Here, $W_o$ and $b_o$ are the weight and bias parameters of the output gate.
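As a rough illustration of the three gates described above, the following NumPy sketch performs one forward step of a standard LSTM cell. The function names (`lstm_cell_step`, `init_params`), the dictionary keys (including `W_o` and `b_o`), and the random initialization are illustrative assumptions; this is not the TensorFlow implementation used in the experiments.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Create small random weights and zero biases for the four gate layers.

    The key names and the initialization scale are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"W_{gate}"] = rng.normal(
            0.0, 0.1, size=(hidden_size, hidden_size + input_size)
        )
        params[f"b_{gate}"] = np.zeros(hidden_size)
    return params


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    x_t: current input X_t, h_prev: previous output h_{t-1},
    c_prev: previous long-term memory C_{t-1}.
    """
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, X_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde                     # new long-term memory
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                               # current output
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate memory to 0, so the step simply halves the previous memory; this makes a convenient sanity check for the update rule.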

TFW-LSTM Method
Traditional USTLF methods do not treat temperature comprehensively enough. When the correlation between temperature and power load is strong, temperature can improve the precision of power load prediction; when this correlation is weak, however, including temperature decreases the precision. Similarly, load forecasting that does not consider temperature can in fact achieve higher prediction accuracy, provided that the recent load is stable enough. To make better use of the temperature information, we propose a USTLF method that combines the temperature factor weight (TFW) with the LSTM model. Accordingly, we add a TFW calculation module to the LSTM-based load forecasting method. After analyzing the historical load and temperature data of the current prediction task, the module returns the TFW value, which determines whether the system needs to consider the temperature.

Data Acquisition and Preprocessing
We use the electrical load and temperature data of a city in Hunan province in 2019. The load data provided by the power company have a temporal resolution of 15 min, while the weather data were obtained from an open-source weather website (available: http://www.tianqihoubao.com/lishi/changsha.html) using a web crawler. The original electric load data cannot be used directly in the present experiment because some values are missing; the power company marks these with the value −999. The missing data are therefore estimated and filled according to the trend of the data over time. To accomplish this, a data filling algorithm is proposed that fills in the missing values so that they are as close as possible to the real values. This algorithm, which fills a missing value with the average of the values in the cells adjacent to the missing data cell, is named the adjacent cell average (ACA) method and operates as follows.
Step 1: Get the next Excel cell location (row, col) and check the cell data; if the data is normal, repeat this step, and if it is abnormal, execute step 2.
Step 2: Determine whether the abnormal cell is surrounded by valid data in adjacent cells; if not, record the location and wait for manual processing; if so, perform step 3.
Step 3: Execute the ACA method to calculate the load value of the abnormal data point.
Step 4: Determine whether the traversal of all data has been completed; if so, exit the program; if not, return to step 1.
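The steps above can be sketched as follows. This is a simplified, hypothetical version that works on a one-dimensional list of load values along the time axis rather than on Excel cells, and it treats the immediately preceding and following values as the "adjacent cells"; the function name `aca_fill` and this choice of neighbors are assumptions.

```python
MISSING = -999  # marker used for missing load values, as stated in the text


def aca_fill(values, missing=MISSING):
    """Fill isolated missing values with the average of their two neighbors.

    Returns the filled list and the indices that could not be filled
    (missing values without valid data on both sides), which are left
    for manual processing.
    """
    filled = list(values)
    manual = []
    for idx, value in enumerate(values):
        if value != missing:
            continue  # Step 1: normal data, move on to the next cell
        # Step 2: check that both neighbors exist and hold valid data
        has_prev = idx > 0 and values[idx - 1] != missing
        has_next = idx + 1 < len(values) and values[idx + 1] != missing
        if has_prev and has_next:
            # Step 3: adjacent cell average
            filled[idx] = (values[idx - 1] + values[idx + 1]) / 2
        else:
            manual.append(idx)
    return filled, manual
```

Neighbors are read from the original list rather than the partially filled one, so a run of consecutive missing values is never filled from an estimate; such runs are reported for manual processing instead.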

Construction of LSTM Model
To select the hyperparameters, we compare different hyperparameter combinations experimentally. We select the hyperparameter combination with the lowest mean absolute percentage error (MAPE, %) in Table 1. The input data include the historical load data of the four most recent time points, the sampling point, and the temperature data of the current time point (if the system determines that temperature should not be taken into account, the input value is 5). The keep-prob parameter sets the probability with which each neuron remains active during training. The LSTM neural network structure employed in this paper is illustrated in Figure 3. The LSTM model consists of one input layer, three hidden layers, and one output layer, and all hidden layers have the same number of cells.
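To make the input layout concrete, the sketch below assembles one feature vector per predicted time point from the four most recent load values, the sampling point, and optionally the current temperature. The function name `build_samples`, the encoding of the sampling point as an index from 0 to 95 within the day, and the choice to drop the temperature column when temperature is not used are all assumptions made for illustration.

```python
POINTS_PER_DAY = 96  # 15-minute resolution gives 96 points per day
WINDOW = 4           # number of recent load values used as input


def build_samples(load, temperature=None):
    """Build (features, target) pairs from a load series.

    features = four most recent loads, sampling point, and (optionally)
    the temperature at the predicted time point; target = load at that point.
    """
    samples = []
    for t in range(WINDOW, len(load)):
        features = list(load[t - WINDOW:t])
        features.append(t % POINTS_PER_DAY)  # assumed sampling-point encoding
        if temperature is not None:
            features.append(temperature[t])
        samples.append((features, load[t]))
    return samples
```

Under these assumptions each sample has six input values when temperature is included and five when it is not.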

TFW
In this paper, a TFW calculation module is proposed to quantify how much temperature is worth considering when predicting the current power load. The structure of the TFW calculation module is presented in Figure 4, and the algorithm flow is shown in Algorithm 1. The module takes historical temperature data and historical load data as input. It then outputs the TFW value $W_{temp}$ through an intermediate variable, the temperature influence coefficient (TIC) $T_{if}$, and the mapping relation $f: T_{if} \to W_{temp}$. The module calculates the variance of the load value at the same time point across all dates in the historical data; the sum of these variances over the 96 time points of a day is denoted Var. Here, Var reflects the degree of load fluctuation in the training data. As load fluctuation is mainly derived from weather-sensitive load, this variable can reflect the degree to which abrupt changes in weather-sensitive load are present in the training data. The variance calculation block is used to calculate Var.

Input: Historical load data L; Historical temperature data T;
Output:
Calculate the variance of the sequence of historical data consisting of the jth point of each day;
4: The calculated results are temporarily stored in x;

Var is calculated as in Equation (4):

$$Var = \sum_{j=1}^{96} \frac{1}{N} \sum_{i=1}^{N} \left( L_{ij} - \bar{L}_j \right)^2 \qquad (4)$$

Here, $N$ represents the total number of days of historical data used, $L_{ij}$ represents the load value at the $j$th time point of the $i$th day, and $\bar{L}_j$ is the mean of $L_{ij}$ over the $N$ days. The covariance and the correlation coefficient are calculated as shown in Equations (5) and (6):

$$COV(X, Y) = E\left[ (X - E[X]) (Y - E[Y]) \right] \qquad (5)$$

$$\rho_{XY} = \frac{COV(X, Y)}{\sigma_X \sigma_Y} \qquad (6)$$

Here, COV represents the covariance, $X$ and $Y$ denote the temperature and power load, respectively, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. The normalization module is used to normalize the data. $T_{if}$ is calculated according to Equation (7).

The TFW mapping block maps the calculated TIC value $T_{if}$ to the corresponding interval. As shown in Figure 5, the add temperature factor interval indicates that the TFW $W_{temp}$ is 100%, meaning that the temperature must be considered in the calculation; the no temperature factor interval indicates that $W_{temp}$ is 0, meaning that the temperature should not be considered. However, the fuzzy endpoint $T_\sigma$ is a critical value that fluctuates because of the randomness and volatility of the power load and temperature. In this paper, the floating range of the fuzzy endpoint $T_\sigma$ is obtained via experimental study. When the TIC $T_{if}$ falls within the floating interval of the fuzzy endpoint $T_\sigma$, (0.450, 0.533), $W_{temp}$ is calculated according to Equation (8).
Finally, the mapping relation $f: T_{if} \to W_{temp}$ between the TIC and the TFW is presented in Table 2.
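A minimal sketch of the two statistics that the TFW calculation module relies on is given below: the summed per-time-point variance Var and the correlation coefficient between temperature and load. The function names, the use of the population variance (division by the number of days), and the nested-list data layout are assumptions; the formula that combines these quantities into $T_{if}$ and the mapping to $W_{temp}$ are not reproduced here.

```python
import math


def load_variance_sum(load_by_day):
    """Sum, over the time points of a day, of the load variance across days.

    load_by_day[i][j] is the load at time point j of day i.
    The population variance (divide by the number of days) is assumed.
    """
    n_days = len(load_by_day)
    n_points = len(load_by_day[0])
    total = 0.0
    for j in range(n_points):
        column = [day[j] for day in load_by_day]  # same time point, all days
        mean_j = sum(column) / n_days
        total += sum((v - mean_j) ** 2 for v in column) / n_days
    return total


def correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)
```

In the setting described in the text, `load_by_day` would hold one row of 96 values per historical day, and `correlation` would be applied to the temperature and load series of the current prediction task.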

Structure of TFW-LSTM Method
The present paper proposes an ultra-short-term power load forecasting method based on TFW and the LSTM model; its block diagram is illustrated in Figure 6. In phase 1, the historical load data and historical temperature data are input into the TFW calculation module, after which the corresponding $W_{temp}$ is calculated and output to the control block. The control block is a logical unit that controls whether the historical temperature data are input into the neural network training module. When the TFW satisfies $W_{temp} \geq 50\%$, the control block decides that the temperature factor should be considered in the current prediction task and passes the historical temperature data through; otherwise, the historical temperature data are blocked and the output value is None. In the next step, the training block simultaneously receives the historical load data and the historical sampling point data. Training uses the AdamOptimizer of the TensorFlow framework to find the optimal model parameters. Once training is completed, the optimal parameters are passed on to the LSTM model for testing.
The control block of phase 2 receives the $W_{temp}$ calculated in phase 1 to control the temperature data used in the current forecast. The LSTM model receives the sampling point, the return value of the control block, and the optimal model parameters as input, and then outputs the corresponding power load prediction results.
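The gating logic of the control block is simple enough to sketch directly. The function name `control_block` and the representation of the TFW as a fraction between 0 and 1 (so that 50% is 0.5) are assumptions for illustration.

```python
TFW_THRESHOLD = 0.5  # the 50% threshold from the text, expressed as a fraction


def control_block(w_temp, temperature):
    """Pass the temperature data through only if the TFW reaches the threshold.

    Returns the temperature data unchanged when w_temp >= 0.5,
    and None otherwise.
    """
    if w_temp >= TFW_THRESHOLD:
        return temperature
    return None
```

The same function can serve both phases: in phase 1 it gates the historical temperature data used for training, and in phase 2 it gates the temperature data of the current forecast.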

Experimental Configuration
In this paper, 39 dates in 2019 are randomly selected as the testing set. For each prediction date, the data of the 10 preceding days are used for training, so the ratio of the training set to the test set is 10:1. During the experiments, the trained model outputs the predicted load value for each predicted time point, and the model output is compared with the label value to calculate the error. Finally, four test sets are extracted to facilitate comparison between the proposed method and traditional power load forecasting methods. Because of the large number of missing data points, the data for February are not used in this paper.
This article employs three performance metrics to evaluate the results of the model testing: MAPE, mean absolute error (MAE), and root mean square error (RMSE).
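The MAPE is defined as follows.

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\tilde{y}_i - y_i}{y_i} \right| \times 100\% \qquad (9)$$

The MAE is defined as follows.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \tilde{y}_i - y_i \right| \qquad (10)$$

The RMSE is defined as follows.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \tilde{y}_i - y_i \right)^2} \qquad (11)$$

Here, $\tilde{y}_i$ denotes the result of the model, $y_i$ represents the true value, and $n$ is the total number of calculated values.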
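The three metrics can be computed with a few lines of plain Python, as sketched below using their standard definitions. The function names are illustrative, and the MAPE is returned in percent; true values of zero are not handled.

```python
import math


def mape(y_pred, y_true):
    """Mean absolute percentage error, in percent."""
    n = len(y_true)
    return 100.0 * sum(abs((p - t) / t) for p, t in zip(y_pred, y_true)) / n


def mae(y_pred, y_true):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / n


def rmse(y_pred, y_true):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)
```

MAE and RMSE are expressed in the units of the load, whereas MAPE is scale-free, which is why it is the metric used when comparing results across dates with different load levels.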

Results and Discussion
To verify the performance of the TFW-LSTM method, data from a region in Hunan, China in 2019 were selected for comparative experiments. Among them, 39 dates were randomly selected to compare the performance of the proposed method with that of the traditional LSTM, and the TIC $T_{if}$ of each date was calculated at the same time. The experimental results are shown in Table 3. The experiments were conducted on a laptop with an Intel(R) Core(TM) i7-8750H CPU at 2.20 GHz, a 64-bit Windows 10 operating system, and 8 GB of memory, using Python 3.7.4 with the TensorFlow framework. The MAPE (%) comparison between the TFW-LSTM method and the traditional LSTM model is presented in Figure 7; we also selected four typical days for further study, and the comparison results are illustrated in Figure 8. The experimental results show that the proposed method performs better overall and is generally more stable than the traditional LSTM model. Moreover, because the TFW-LSTM method applies the temperature factor flexibly in the power load forecasting process, it can better exploit the advantages of the temperature factor while avoiding the associated disadvantages.
Next, to demonstrate the superiority of the proposed method more objectively, it was compared with the BP neural network and the traditional gray model on the four typical dates above. The metric results are listed in Table 4, and the line graphs are presented in Figure 9. In the figure, "Real" stands for the actual value, "TFW-LSTM" is our method, "GM" is the traditional gray model, and "BP" is the traditional back-propagation neural network model.
As the results show, the TFW-LSTM method was clearly superior to the other traditional methods on every metric. On the four typical dates, the proposed method reduced MAPE by 1.24%, 1.68%, and 6.21% on average compared with the traditional LSTM, BP, and GM, respectively. Compared with LSTM, the TFW-LSTM method adds a mechanism that dynamically controls the input features, and it shows higher stability and prediction accuracy in USTLF. Compared with the other traditional prediction methods, the TFW-LSTM method has a clear advantage because of its inherent ability to process nonlinear and temporal data.

Conclusions
To eliminate the negative influence of temperature on load prediction in USTLF, we propose a USTLF method based on TFW and the LSTM model. The TFW calculation module, which determines whether the temperature factor should be considered, is the core of the proposed method.
The performance of the proposed method is verified using real data from a region in Hunan Province, China in 2019. The results show that, unlike traditional load forecasting methods, the proposed method evaluates the importance of temperature to the forecast at the current time. It thereby dynamically avoids the negative impact of temperature and, combined with the LSTM model, achieves higher prediction accuracy. The performance metrics MAPE, MAE, and RMSE reflect the superiority of the proposed method.
In the future, as deep learning theory comes to be utilized more widely in data processing [32][33][34], we will attempt to use additional methods to improve both the accuracy of power load prediction and the overall model stability. Given the recent development of nonlinear system theory and research [35,36], we will try to adopt nonlinear time series forecasting models based on chaos theory for power load forecasting. We will also consider adopting image data processing methods [37,38] for power load forecasting.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript.