A Multiscale Electricity Price Forecasting Model Based on Tensor Fusion and Deep Learning

Abstract: The price of electricity is an important factor in the electricity market. Accurate electricity price forecasting (EPF) is very important to all competing electricity market parties. Decision-making in the electricity market is highly dependent on electricity prices, making an EPF model an important part of the orderly and efficient operation of the electricity market. Especially during the COVID-19 pandemic, the prices of raw materials for electricity production, such as coal, have risen sharply, making electricity price forecasting particularly important. Existing EPF models face two main challenges: first, how to integrate multiscale electricity price data to obtain higher prediction accuracy; second, how to remove the data noise introduced by the EPF samples themselves and by the fusion of multiscale data. To solve these problems, we innovatively propose a tensor decomposition method to integrate multiscale electricity price data and use L1 regularization and a wavelet transform to remove data noise. Overall, this paper proposes a deep learning EPF model, named WT_TDLSTM, based on tensor decomposition, a wavelet transform, and long short-term memory (LSTM), in which the LSTM is used to predict electricity prices. We conducted experiments on three datasets. The experimental results on all three demonstrate that the WT_TDLSTM model outperforms the compared EPF models: in terms of MSE and RMSE, it is 33.65–99.97% better than the comparison models. We believe that the WT_TDLSTM model is a good supplement to existing EPF models.


Introduction
The price of electricity is an important factor in the electricity market [1]. Accurate electricity price forecasting (EPF) is very important to all parties in power market competition [2][3][4]. Decision-making in the electricity market is highly dependent on the electricity price, making an EPF model an important part of the orderly and efficient operation of the electricity market. EPF exhibits a certain periodicity, as well as a high degree of randomness and time-varying behaviour [5]. Therefore, building a model for forecasting electricity prices is a very challenging task. The accuracy of EPF in the power market affects the efficiency and rationality of energy resource optimization [6]. For enterprises in particular, accurately predicting the price of electricity is important for controlling production costs. In addition, accurate electricity price forecasting can help balance electricity supply and demand, which is conducive to stable electricity market operation. At present, amid the global COVID-19 pandemic, the costs of raw materials in the power industry have risen sharply [7]. This exacerbates the fluctuation of electricity prices and makes electricity price forecasting even more important.
Existing EPF methods can be divided into three categories, namely, physical methods, statistical methods, and machine learning methods (Figure 1). Physical methods are based on security-constrained unit commitment (SCUC) and security-constrained economic dispatch (SCED) models, which simulate the day-to-day power market situation from boundary conditions and physical theory [8,9]. Although physical methods are effective from the perspective of predictive logic, their main problem is that the SCUC and SCED models require a large amount of real-time operating data, such as line transmission capacity, electricity load, and competitors' bids, which leads to very complicated calculations. Statistical methods aim to use curve fitting to reveal the dynamic trends in historical electricity price series. These methods have the advantages of high speed, simple modelling and convenience. Statistical models include the autoregressive moving average (ARMA) [10,11], generalized autoregressive conditional heteroscedasticity (GARCH) [12], and chaos theory [13]. Machine learning methods can be roughly divided into two categories: shallow learning models and deep learning models [14,15]. Shallow learning models are based on the principle of error minimization and generally perform better than physical and statistical methods [16]. Due to their remarkable ability to extract features, they have been among the most commonly used methods for electricity price forecasting. Shallow learning models include support vector regression (SVR) [17], artificial neural networks (ANNs) [18], and regression trees [19]. Since 2021, some researchers have also proposed kernel-based extreme learning machine electricity price prediction models [20,21]. Recently, with the development of deep learning, electricity price prediction models based on deep learning have been continuously proposed, including BP, CNN, and RNN models [22][23][24].
Recently, LSTM-based electricity price prediction methods have received widespread attention because of their excellent performance. For example, Huang et al. proposed an LSTM-based electricity price prediction model [25]. Memarzadeh et al. proposed an improved LSTM model named LSTM-NN to achieve better short-term electricity price prediction [26]. However, existing EPF machine learning prediction models still face two main limitations. First, most existing EPF models are single-scale models. Existing research has shown that, compared with a single-scale model, a multiscale EPF model can obtain more comprehensive data features and thus potentially achieve better prediction accuracy [27]. However, the feature lengths of multiscale time series data are inconsistent, and the data contain noise, which makes it difficult to establish an end-to-end prediction model. The second limitation is that existing data denoising methods are still imperfect, leaving room for improvement in model results. Electricity price data are high-dimensional and noisy. Especially after the data are segmented using a sliding window, the sample quality problem of each sliding window becomes more prominent. In addition, the multiscale fusion process of a model may introduce additional noise. Therefore, it is difficult for researchers to obtain an ideal model based on all samples and features.
In this paper, we propose an EPF model based on tensor decomposition, a wavelet transform and long short-term memory. The method of using tensors for feature fusion has been widely used in various artificial intelligence fields and has achieved good results [28][29][30]. This paper proposes a tensor decomposition method based on the L1 norm to fuse multiscale electricity price data, in which the L1 regularizer is used to address the data noise arising both from the EPF samples themselves and from multiscale tensor fusion. We also use the wavelet transform method for data denoising. A wavelet transform (WT) is a transform analysis method that inherits and develops the localization idea of the short-time Fourier transform while overcoming the shortcoming of a window size that does not change with frequency, by using a "time-frequency" window that does change with frequency. This method has been widely used in the denoising of time series data [31][32][33]. Finally, we apply a long short-term memory (LSTM) model to the processed features for prediction.
The contributions of this study can be summarized as follows:
• We innovatively propose a tensor decomposition method based on L1 regularization, which can effectively fuse multiscale electricity price data and remove the noise generated during data fusion.
• To the best of our knowledge, WT_TDLSTM is the first multiscale integration model that incorporates a wavelet transform and tensor fusion into a single computational framework.
• The experimental results show that, compared with neural network models that do not denoise and fuse multiscale data, the WT_TDLSTM model achieves significantly better prediction performance.

Datasets
In this study, the monthly electricity price samples adopted for validating the superiority of the proposed method were selected from the U.S. Energy Information Administration (EIA) (https://www.eia.gov accessed: October 2021) and include residential, commercial, and industrial monthly electricity price samples (see Table 1). Each of the residential, commercial, and industrial monthly electricity price series contains 245 samples (January 2001 to May 2021). Before training, we preprocess the residential monthly electricity price data as follows. First, we segment the data samples with a sliding window. For example, electricity price data from January to November 2001 are used to predict the electricity price in December 2001, and electricity price data from February to December 2001 are used to predict the electricity price in January 2002. Based on the above preprocessing strategy, we obtain a new dataset with 234 samples. Next, we divide these data into multiple scales, with time lengths of 12, 8, and 5 months.
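The sliding-window segmentation above can be sketched in a few lines of numpy. The function name and exact indexing convention are ours, not the paper's code; we pair each window with the immediately following month as the prediction target:

```python
import numpy as np

def make_multiscale_windows(prices, horizons=(12, 8, 5)):
    """Segment a monthly price series into sliding windows of several
    scale lengths (12, 8, 5 months, as in the paper), each paired with
    the next month's price as the prediction target."""
    max_len = max(horizons)
    scales = {h: [] for h in horizons}
    targets = []
    for end in range(max_len, len(prices)):      # `end` indexes the target month
        targets.append(prices[end])
        for h in horizons:
            scales[h].append(prices[end - h:end])  # last h months before the target
    return {h: np.array(v) for h, v in scales.items()}, np.array(targets)
```

All three scale matrices share the same row count (one row per target month), which is what allows the tensor fusion described later.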

Tensor Fusion
In this paper, we divide the electricity price data into multiple scales and obtain a matrix A ∈ R^{m×12} with a scale length of 12, a matrix B ∈ R^{m×8} with a scale length of 8, and a matrix C ∈ R^{m×5} with a scale length of 5. To perform tensor fusion on the multiscale data, we reshape the matrices A, B and C into tensors A ∈ R^{m×12×1}, B ∈ R^{m×1×8} and C ∈ R^{m×1×5}. Figure 2 shows the process of tensor fusion for multiscale data. Next, we briefly describe the tensor fusion strategy, whose result is given by the following formula:

D = A ⊗ B, (1)

where D ∈ R^{m×12×8} and ⊗ denotes matrix multiplication over the last two dimensions of the tensors. To further merge with the tensor C, we reshape D to size m × 96 × 1 and use the following formula:

E = D ⊗ C, (2)

where E ∈ R^{m×96×5} is our final multiscale fusion tensor.
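The two fusion formulas amount to per-sample outer products. A minimal numpy sketch of the strategy (the function name and the use of `einsum` are ours, not the paper's implementation):

```python
import numpy as np

def fuse_multiscale(A, B, C):
    """Fuse three scale matrices A (m x 12), B (m x 8), C (m x 5)
    into a third-order tensor E (m x 96 x 5) by batched outer products."""
    m = A.shape[0]
    # D[i] = A[i] (12x1) @ B[i] (1x8): per-sample outer product, D is m x 12 x 8
    D = np.einsum('ij,ik->ijk', A, B)
    # flatten D to m x 96, then repeat the outer product with C: E is m x 96 x 5
    E = np.einsum('ij,ik->ijk', D.reshape(m, -1), C)
    return E
```

Each entry of E is a product of one feature from each scale, so the fused tensor encodes cross-scale interactions rather than a simple concatenation.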

Tucker Decomposition
We can regard Tucker tensor decomposition as a form of higher-order PCA [34]. Tucker tensor decomposition factorizes X into N factor matrices and a core tensor G that constitutes a compressed version of X. If we assume that X is a tensor of size I_1 × I_2 × ... × I_N, then the optimization problem to be solved to calculate the Tucker decomposition is

min_{G, U_1, ..., U_N} ‖X − G ×_1 U_1 ×_2 U_2 ... ×_N U_N‖_F^2, (3)

where the factor matrices U_n ∈ R^{I_n×R_n} are column-wise orthogonal for n = 1, 2, ..., N, ×_n denotes the mode-n product, and ‖·‖_F denotes the L2 (Frobenius) norm; an exact Tucker decomposition of rank (R_1, R_2, ..., R_N) can easily be found when the ranks equal the mode ranks of X. If the solution of Equation (3) is optimal, then the core tensor G must satisfy

G = X ×_1 U_1^T ×_2 U_2^T ... ×_N U_N^T. (4)

Substituting Equation (4) into Equation (3), the optimization goal can be recast as the following maximization problem:

max_{U_1, ..., U_N} ‖X ×_1 U_1^T ×_2 U_2^T ... ×_N U_N^T‖_F^2. (5)

We can use the alternating least squares (ALS) method to solve (5); if U_n^* is a solution to (5), then G^* is the corresponding Tucker core of X, and X is "low-rank" approximated by

X̂ = G^* ×_1 U_1^* ×_2 U_2^* ... ×_N U_N^*, (6)

where X̂ is the reconstructed tensor. The exact solution to (5) is generally unknown and is commonly approximated by means of HOSVD-type algorithms based on the ALS method.
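The truncated HOSVD referred to above can be sketched compactly in numpy. This is a sketch of the standard (non-iterative) HOSVD, not the paper's implementation, and the function names are illustrative:

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: move axis n to the front and flatten the rest."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def hosvd(X, ranks):
    """Truncated HOSVD: factor matrix U_n = leading R_n left singular
    vectors of the mode-n unfolding; core G = X x_1 U1^T ... x_N UN^T."""
    U = []
    for n, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(X, n), full_matrices=False)
        U.append(u[:, :r])
    G = X
    for n, u in enumerate(U):
        # mode-n product with u.T: contract axis n of G with columns of u
        G = np.moveaxis(np.tensordot(u.T, np.moveaxis(G, n, 0), axes=1), 0, n)
    return G, U

def reconstruct(G, U):
    """X_hat = G x_1 U1 x_2 U2 ... x_N UN (Equation (6))."""
    X = G
    for n, u in enumerate(U):
        X = np.moveaxis(np.tensordot(u, np.moveaxis(X, n, 0), axes=1), 0, n)
    return X
```

With full ranks the reconstruction is exact; choosing smaller ranks yields the compressed core used for denoising.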

L1-HOSVD
As noise will inevitably be introduced in the process of tensor fusion of multiscale data, we use the L1-HOSVD [35] decomposition method based on Tucker decomposition to denoise the fused tensor. L1-HOSVD is derived by simply replacing the L2-norm in (5) with the corruption-resistant L1-norm as follows:

max_{U_1, ..., U_N} ‖X ×_1 U_1^T ×_2 U_2^T ... ×_N U_N^T‖_1. (8)

To solve Equation (8), it is necessary to iteratively solve for each factor matrix U_n. For the solution of the factor matrix U_n, the following objective function needs to be optimized:

U_n = argmax_{U ∈ R^{I_n×R_n}, U^T U = I_{R_n}} ‖U^T X_(n)‖_1, (9)

where I_{R_n} refers to the identity matrix of size R_n × R_n and X_(n) is the mode-n matricization of the tensor X. We pursue the solution to (9) approximately by means of a fixed-point iteration (FPI) algorithm. According to [36], Equation (9) can be rewritten as

max_{U^T U = I_{R_n}, B ∈ {±1}^{P_n×R_n}} tr(U^T X_(n) B), (10)

where B is an indicator variable and P_n is the number of columns of X_(n). We develop an alternate updating rule to optimize the objective function of Equation (10) as follows:

B_t = sgn(X_(n)^T U_{n,t−1}), (11)

U_{n,t} = argmax_{U ∈ R^{I_n×R_n}, U^T U = I_{R_n}} tr(U^T X_(n) B_t) = Φ(X_(n) B_t), (12)

where sgn(·) returns the ±1 signs of the entries of its argument (sgn(0) = 1), and t refers to the t-th iteration. Moreover, for any matrix H with singular value decomposition H = W Σ V^T, it holds that Φ(H) = argmax_{U: U^T U = I} tr(U^T H) = W V^T by the Procrustes Theorem [37].

To simplify the updating process, we integrate Equations (11) and (12) as follows:

U_{n,t} = Φ(X_(n) sgn(X_(n)^T U_{n,t−1})). (13)

Finally, we randomly initialize the factor matrices U_{n,1} and update them by Equation (13) until the model converges. In this paper, the tensor X is replaced with the multiscale fusion third-order tensor E ∈ R^{m×96×5} for denoising.
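The fixed-point update of Equation (13) for a single factor matrix can be sketched in numpy as follows. The function names are illustrative, and the random orthonormal initialization is our assumption (the paper only states that the factor matrices are initialized randomly):

```python
import numpy as np

def procrustes(H):
    """Phi(H): the column-orthonormal U maximizing tr(U^T H),
    obtained from the SVD H = W S V^T as W V^T (Procrustes solution)."""
    W, _, Vt = np.linalg.svd(H, full_matrices=False)
    return W @ Vt

def l1_factor(Xn, r, iters=50, seed=0):
    """Fixed-point iteration U_t = Phi(X_(n) sgn(X_(n)^T U_{t-1}))
    for one L1-HOSVD factor matrix; Xn is the mode-n unfolding."""
    rng = np.random.default_rng(seed)
    # random orthonormal start of size I_n x r
    U = np.linalg.qr(rng.standard_normal((Xn.shape[0], r)))[0]
    for _ in range(iters):
        B = np.where(Xn.T @ U >= 0, 1.0, -1.0)  # sgn(.), with sgn(0) = 1
        U = procrustes(Xn @ B)
    return U
```

Each iteration alternates between fixing the sign matrix B (Equation (11)) and solving the resulting Procrustes problem (Equation (12)), so the L1 objective is nondecreasing across iterations.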

Wavelet Transform
A wavelet transform is a new transform analysis method that inherits and develops the idea of the localization of a short-time Fourier transform and simultaneously overcomes the shortcomings of the window size not changing with frequency by using a "time-frequency" window that changes with frequency. This makes it an ideal tool for signal time-frequency analysis and processing.
The main feature of a WT is that it can fully highlight the characteristics of certain aspects of a problem through transformation and can localize the analysis of time (space) frequency. It gradually refines the signal (function) at multiple scales through expansion and translation operations and finally achieves time subdivision at high frequencies and frequency subdivision at low frequencies. It can automatically adapt to the requirements of time-frequency signal analysis so that it can focus on any detail of the signal.
The basic formula of a wavelet transform is as follows:

WT_f(α, τ) = (1/√α) ∫ f(t) ψ*((t − τ)/α) dt, (14)

where α is the scale, τ is the displacement, f(t) is the analysed signal, and ψ*(·) is the complex conjugate of the mother wavelet.
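Wavelet denoising in practice decomposes the signal, shrinks the high-frequency detail coefficients, and reconstructs. The paper does not specify its wavelet family or threshold rule, so the following is a minimal one-level Haar illustration in numpy with soft thresholding, under those stated assumptions:

```python
import numpy as np

def haar_denoise(x, thresh):
    """One-level Haar wavelet denoising with soft thresholding.
    len(x) must be even. A minimal illustration, not the paper's setup."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency content
    detail = (even - odd) / np.sqrt(2)   # high-frequency (noise-prone) content
    # soft threshold: shrink detail coefficients toward zero
    detail = np.sign(detail) * np.maximum(np.abs(detail) - thresh, 0.0)
    # inverse Haar transform
    y = np.empty_like(x)
    y[0::2] = (approx + detail) / np.sqrt(2)
    y[1::2] = (approx - detail) / np.sqrt(2)
    return y
```

With a zero threshold the transform round-trips exactly; a large threshold removes all detail, leaving each sample pair at its mean.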

Long Short-Term Memory
A long short-term memory network is a special type of RNN that can learn long-term dependence information. The LSTM method was proposed by Hochreiter and Schmidhuber (1997) [38] and was recently improved. LSTM has achieved considerable success and has been widely used [39][40][41].
The key to LSTM is the cell state. The first step in the LSTM method is to decide what information the model will discard from the cell state. This decision is made through a layer called the forget gate. Suppose the cell state after the last cycle is C_{t−1}. The gate reads the hidden state h_{t−1} from the last cycle and the current input x_t and outputs a value between 0 and 1 for each entry of C_{t−1}, where 1 means "completely retained" and 0 means "completely discarded".
We define a function f_t to determine the information discarded by the model in this cycle:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f), (15)

where σ represents the sigmoid activation function and W_f and b_f are the weight and bias, respectively. The next step is to determine what new information is stored in the cell state. There are two parts here. First, the input gate layer determines the values i_t that the model needs to update:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i). (16)

Second, a new candidate variable C̃_t is created, which is added to the state:

C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C). (17)
Finally, the cell state is updated using the following formula:

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t, (18)

where ⊙ denotes elementwise multiplication. Although the LSTM maintains a structure similar to that of the standard RNN, the cell composition is different. Because of its unique structure, the LSTM introduced above can effectively mitigate the problems of gradient vanishing and gradient explosion in the RNN training process. Figure 3 illustrates the schematic diagram of LSTM network training.
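The gate computations above can be sketched as a single numpy cell step. Stacking the four gate weight matrices into one array is our own convention for brevity, and the output gate (standard in LSTM but only implicit in the description above) is included for completeness:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step implementing the gate equations above.
    W stacks the four gate weight matrices [f; i; c~; o], shape (4h, d+h);
    b stacks the four biases, shape (4h,)."""
    h = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:h])              # forget gate
    i_t = sigmoid(z[h:2 * h])          # input gate
    c_tilde = np.tanh(z[2 * h:3 * h])  # candidate cell state
    o_t = sigmoid(z[3 * h:4 * h])      # output gate
    c_t = f_t * c_prev + i_t * c_tilde  # cell-state update, Equation above
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

Because the cell state is carried forward additively (gated by f_t) rather than through repeated matrix multiplication, gradients decay far more slowly than in a vanilla RNN.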
In this paper, we construct an LSTM deep learning model with two hidden layers; the activation functions are sigmoid and tanh, and the number of hidden-layer neurons is set to 5.

WT_TDLSTM
To overcome the influence of noise caused by multiscale data fusion and establish a high-precision regression model, we innovatively combine the wavelet transform and tensor decomposition with LSTM (see Figure 4). The specific workflow is as follows:
Step 1: We divide the electricity price data into multiple scales and use a wavelet transform to denoise the data of each scale.
Step 2: We integrate the multiscale data into a tensor and use L1-HOSVD based on Tucker decomposition to decompose the tensor, obtaining factor matrices for each dimension and the compressed core tensor.
Step 3: We use the factor matrices and the core tensor to obtain the reconstructed (denoised) tensor. The denoised tensor is then converted to a matrix, and the matrix is normalized.
Step 4: Based on the normalized matrix, we use the LSTM model for the final electricity price prediction.

Results
In this section, we first show how to preprocess residential, commercial, and industrial monthly electricity price samples. Then, we introduce the evaluation metrics of the model and the adjustment of hyperparameters. Finally, we compare and analyze the performance of our model with other models.

Data Preprocessing
Considering that activation functions such as ReLU in a neural network map all negative values to 0, all sample data are normalized before network training to reduce the training time and improve the training effect. In this paper, we use the min-max normalization method to preprocess the electricity price data. The normalization formulation is as follows:

X* = (X − min(X)) / (max(X) − min(X)), (19)

where X* denotes the result of normalizing the input data and max(X) and min(X) refer to the maximum and minimum values of the input data, respectively.
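Equation (19) is a one-liner in numpy; a sketch (the function name is ours):

```python
import numpy as np

def minmax_normalize(X):
    """Min-max normalization, Equation (19): maps the input data to [0, 1]."""
    lo, hi = X.min(), X.max()
    return (X - lo) / (hi - lo)
```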

Performance Metrics
In this study, three evaluation indicators, the mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE), are applied. The calculations of these indices are shown in Equations (20)–(22):

MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)^2, (20)

RMSE = sqrt((1/N) Σ_{i=1}^{N} (y_i − ŷ_i)^2), (21)

MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|, (22)

where y_i and ŷ_i denote the actual and predicted electricity prices, respectively, and N is the number of test samples.
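The three metrics are straightforward in numpy; a sketch with illustrative function names:

```python
import numpy as np

def mse(y, yhat):
    """Mean squared error, Equation (20)."""
    return float(np.mean((y - yhat) ** 2))

def rmse(y, yhat):
    """Root mean squared error, Equation (21)."""
    return float(np.sqrt(mse(y, yhat)))

def mae(y, yhat):
    """Mean absolute error, Equation (22)."""
    return float(np.mean(np.abs(y - yhat)))
```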

The Effects of the Hyperparameters
In our model, some hyperparameters have different effects on the experimental performance. We take the residential monthly electricity price dataset as an example to show the process of adjusting the hyperparameters. Here, we focus on two specific hyperparameters, i.e., the number of neurons and the learning rate λ in the WT_TDLSTM model. To better evaluate the influence of the hyperparameters on the model, we use the MSE and MAE to select them. First, we fix the value of the learning rate at 0.001 and search for the optimal values of the other parameters. We vary the number of neurons in the hidden layer within {3, 5, 10, 20} to find the best value. From Table 2, we can see that when the number of hidden neurons is 5 in the WT_TDLSTM model, the values of the MSE and MAE are the smallest (MSE = 0.001973, MAE = 0.033888). Finally, after the above hyperparameters are determined, we vary the value of the learning rate within {1, 0.1, 0.01, 0.001, 0.0001} to find the optimal value. In Figure 5, we find that when the learning rate is 0.001, the values of the MSE and MAE are the smallest. It is worth noting that the commercial and industrial monthly electricity price datasets also use the above steps for hyperparameter selection.

Comparison with Baseline Models
Tables 3-5 show the prediction performance of different models on the residential, commercial and industrial monthly electricity price datasets, respectively. In this paper, we choose five deep learning-based models as our baseline models, including BP, CNN, RNN, LSTM, and LSTM-NN. For our model, the average values of the MSE, RMSE, and MAE in five randomized experiments are 0.001985, 3.940309 × 10−6, and 0.035024, respectively, as shown in Table 3.
The values of the evaluation metrics of the WT_TDLSTM model are 98.25%, 99.97%, and 87.55% better than those of the BP model; 71.04%, 91.61%, and 51.21% better than those of the CNN model; 98.09%, 99.96%, and 85.35% better than those of the RNN model; 98.04%, 99.96%, and 85.33% better than those of the LSTM model; and 97.83%, 99.95%, and 84.50% better than those of the LSTM-NN model. Similarly, from Tables 4 and 5, we can also see that the average values of the MSE, RMSE, and MAE of the WT_TDLSTM model are significantly better than those of the baseline models in five randomized experiments. More specifically, in terms of MSE and RMSE, the prediction performance of the WT_TDLSTM model is better than that of the baseline models by 33.65–93.18% and 40.71–88.40%, respectively, as shown in Tables 4 and 5. Furthermore, compared with the LSTM-NN and LSTM models, the WT_TDLSTM model achieves better fitting ability on the residential monthly electricity price test datasets, as shown in Figure 6a-c. All these results demonstrate that the WT_TDLSTM model is significantly better than the existing electricity price prediction models.

Comparison of Convergence Curves
Figure 7a-c show the comparison of the convergence curves of different models on the residential, commercial, and industrial monthly electricity price test datasets, respectively. Comparing the convergence processes of LSTM, LSTM-NN, and WT_TDLSTM, we can see that the convergence curve of the WT_TDLSTM model has no obvious fluctuations on the three electricity price datasets. More specifically, in Figure 7a,b, the fluctuation range of the convergence curves of the LSTM and LSTM-NN models grows as the number of iterations increases, and in Figure 7c their convergence curves also fluctuate significantly. In contrast, as the number of iterations increases, the convergence curve of the WT_TDLSTM model, especially in Figure 7a,b, gradually flattens. These results illustrate that the WT_TDLSTM model has better denoising ability and is more robust.

Discussion and Conclusions
In this paper, we propose an electricity price prediction model based on tensor decomposition, a wavelet transform and long short-term memory. We innovatively propose a tensor decomposition method based on L1 regularization for denoising during multiscale data fusion. We also integrate the wavelet transform and the LSTM model into the framework, where the wavelet transform is used to remove the noise of the data itself and the LSTM model is used to achieve high-precision electricity price prediction. In the experiments, we fused data at three time scales and built a two-layer LSTM model.
The experimental results on these three datasets show that the model proposed in this paper is better than the existing models by 33.65–99.97% in terms of MSE and RMSE. More specifically, the prediction performance of the WT_TDLSTM model is significantly better than that of the LSTM model, which shows that the use of multiscale data, a wavelet transform, and tensor fusion is very helpful for improving the prediction performance of the model. Meanwhile, compared with the other baseline models, the WT_TDLSTM model also achieves better prediction performance.
In conclusion, the WT_TDLSTM model can serve as a powerful tool to forecast short-term multiscale electricity prices. If the model can be used in practice, it will play an important role in the operation of the electricity market. Although the model has achieved good results, it still has some limitations: due to the tensor decomposition, the time complexity of the model is relatively high, the tensor operations require many computing resources, and the memory usage is large. In the future, we plan to further improve the tensor calculation steps to reduce memory usage and speed up the calculations.