1. Introduction
Electricity prices are the core of the electricity market and have strong economic leverage. Fluctuations in electricity prices affect the flow and allocation of resources in the market, so accurate electricity price forecasting is of great importance to all market participants [1,2,3]. The increasing penetration of renewable energy in the power system has made power generation more volatile and the resulting electricity prices more unpredictable than ever before.
Existing studies on electricity price forecasting can be classified into deterministic and probabilistic forecasts based on the form of the results. Deterministic forecasts usually have a single point forecast value as output, while probabilistic forecasts can have quantile estimates, forecast interval estimates, and probability density estimates as output.
Deterministic forecasting methods mainly include statistical and artificial intelligence methods. Statistical models rely on linear regression and represent forecasts as linear combinations of explanatory variables. They are effective on linear data but perform poorly on non-stationary and non-linear data. ARIMA (autoregressive integrated moving average) [4,5] and GARCH (generalized autoregressive conditional heteroskedasticity) [6] are commonly used statistical models. Artificial intelligence models, especially deep neural networks, handle non-stationary and non-linear data better than statistical models. The recurrent neural network (RNN) is a powerful model for processing time series data; it achieves impressive results by maintaining a hidden state that preserves information from past inputs. As important variants of the RNN, LSTM (long short-term memory) and GRU (gated recurrent unit) were designed to alleviate the vanishing gradient problem of the RNN [7,8]. Reference [9] decomposed a nonlinear electricity price series using the wavelet transform and then captured the behavior of electricity prices using an Adam-optimized LSTM model. The validity of the hybrid model was verified with Australian and French datasets. Reference [10] divided electricity price prediction into two parts: ARIMA predicted the linear part of the price series and Bi-LSTM (bidirectional long short-term memory) predicted the nonlinear part. The final prediction was obtained by combining the linear and nonlinear parts. Reference [11] used differential evolution (DE), an evolutionary algorithm, to efficiently identify suitable hyperparameters for LSTM. Convolutional neural networks (CNNs) excel in image-related tasks, and many electricity price prediction studies use CNNs to extract time-series features [12]. Reference [13] used feature selection and feature extraction techniques to reduce the dimensionality of the input data and eliminate redundancy. Moreover, reference [13] used an enhanced convolutional neural network (ECNN) and enhanced support vector regression (ESVR) as prediction models to reduce overfitting. Case studies on electricity load forecasting and price forecasting verified the accuracy and stability of the model. Reference [14] proposed the LSTNet model, which can extract both long-term and short-term dependency patterns of electricity price sequences: a CNN extracts short-term dependency patterns, and GRU-based RNN and RNN-skip components extract long-term dependency patterns. In studies on real-world data with complicated mixtures of repetitive patterns, LSTNet significantly outperformed several state-of-the-art baseline methods.
Traditional research on power system forecasting is dominated by deterministic forecasting, which cannot avoid forecast errors. Deterministic forecasting is difficult to apply in new energy power systems because it cannot quantify or estimate the fluctuation range of forecast errors. Probabilistic forecasting, as a theory and method for quantifying prediction uncertainty, yields the probability distribution of the predicted object and provides more comprehensive information for decision makers [15]. A probabilistic prediction is commonly expressed through the conditional probability distribution of the predicted object given the input information, for which both probability density functions and cumulative distribution functions are widely used. Additionally, quantiles and prediction intervals are discrete expressions of probability distributions and are frequently used for more understandable and intuitive probabilistic predictions.
Probabilistic forecasting can be divided into parametric and nonparametric methods, depending on whether a distribution model of the predicted object or of the prediction error is presupposed. The parametric method relies on prior knowledge of the distribution of the predicted object and assumes that the predicted object or the overall prediction error follows a specific probability distribution model (e.g., normal distribution [16], beta distribution [17], Weibull distribution, etc. [18]). Based on this assumption, the parameters of the distribution model can be estimated to obtain the prediction results. The parametric method can also construct a forecasting model by making direct parametric assumptions about the probability distribution of the predicted object, without relying on deterministic forecasting results [19]. Refining the distribution model has been successful in improving the performance of parametric probabilistic prediction. However, a single presupposed distribution model is not always applicable to the complex and stochastic prediction problems in new energy power systems.
The objects predicted in new energy power systems are highly stochastic and volatile, with distributions that exhibit pronounced multimodal and fat-tailed characteristics [20]. It is challenging to model such objects correctly with traditional parametric distribution models, which necessitates more flexible nonparametric models to characterize prediction errors. The nonparametric method of probabilistic prediction avoids issues such as unreasonable distribution assumptions in the parametric method [21] by directly describing the prediction distribution rather than making parametric assumptions about the prediction error or the distribution of the predicted object [22,23]. It thus offers a solution to the limitations of existing parametric distribution models [24]. Because it makes no prior assumptions about the distribution of the predicted object or of the prediction error, it can describe the prediction distribution more accurately and approximate more complex distributions. It can provide both continuous and discrete representations of the probability distribution through kernel density estimation [25,26], mixture density networks [27], interval prediction [28], and quantile regression [29,30]. Moreover, it requires less manual intervention and yields a predictive distribution that is more consistent with the true distribution. As a research hotspot in forecasting for new energy power systems [31,32], the nonparametric method of probabilistic prediction has gained considerable attention.
In response to renewable energy uncertainty and an expanding feature set in electricity price forecasting, we propose a new probabilistic forecasting method that combines SHAP feature selection and LSTNet quantile regression to predict day-ahead electricity prices. First, the SHAP method is used to select features from the electricity dataset to reduce redundancy and achieve feature dimensionality reduction. The SHAP method is an additive feature attribution method that identifies the contribution of each feature to the model and associates these features with the electricity market. Additionally, the SHAP method can be used to replace traditional feature selection methods by using feature importance. Next, we introduce a probabilistic forecasting model that is based on LSTNet quantile regression. With the neural network quantile regression approach, we obtain predicted electricity price quantiles at different levels of probability by applying the LSTNet model to test data. Finally, we use the kernel density estimation algorithm to estimate the probability distribution of the predicted electricity prices and generate prediction intervals at different confidence levels.
The rest of the paper is organized as follows.
Section 2 describes the key techniques used in the prediction method.
Section 3 conducts a case analysis to demonstrate the effectiveness of our proposed method.
Section 4 concludes the paper.
3. Case Studies
This section is dedicated to validating the effectiveness of our proposed method for day-ahead electricity price forecasting. First, we present an overview of the Danish electricity market. Then, using the SHAP method, we assess each feature’s importance on electricity prices and select the feature set with a relatively higher impact on the prices. Furthermore, we provide specific examples of how the selected features impact the prices. Finally, in order to verify the validity of the proposed method, we conduct point forecasting and probabilistic forecasting, respectively. By comparing the performance of the proposed model in point forecasting and probabilistic forecasting, we demonstrate that probabilistic forecasting can effectively quantify the uncertainty of prediction while guaranteeing accuracy. In both point forecasting and probabilistic forecasting, the models using SHAP feature selection have obvious improvements compared with the original models, which proves the effectiveness of the SHAP feature selection. Comparing the accuracy of the benchmark models with the proposed model, the latter is obviously superior to the others, which proves the advantage of the LSTNet quantile regression.
3.1. Overview of the Danish Electricity Market
This paper uses the Danish electricity market dataset. Denmark is a pioneer country in the green energy transition. According to statistics, fossil energy generation accounts for 25.9% of the Danish energy mix, while clean energy generation accounts for 74.1%. Among them, 50% of Denmark’s electricity consumption comes from wind and solar energy, while the proportion of coal generation is only 13%. The Danish Energy Agency estimates that Denmark’s green energy production will exceed the country’s total electricity consumption in 2028. The Danish grid is divided into two parts: the eastern grid (Zealand, DK2) and the western grid (Jutland and Funen, DK1). The eastern grid DK2 is connected to Sweden via AC links and forms part of the Nordic synchronous grid, while the western grid DK1 is connected to Germany via AC links and forms part of the central European synchronous grid. In addition, the eastern grid is connected to Germany via DC links and the western grid is connected to Norway via DC links. Denmark’s eastern grid is connected to the western grid via a 400 kV DC line with a transmission capacity of 600,000 kW. The maximum export capacity of Denmark’s connections to neighboring countries is 6.52 million kW and the maximum import capacity is 5.73 million kW. The exchange of electricity between Denmark and neighboring countries depends mainly on price differences and transmission capacity limitations. For example, in 2015, Denmark imported large amounts of electricity from Norway and Sweden due to their high hydroelectric generation capacity and low electricity prices. The general situation of the Danish electricity market is shown in Figure 3.
All data in this paper are available from www.energidataservice.dk. The dataset spans 1 May 2019 to 31 August 2019. The forecast target is the day-ahead price for the DK1 region, as shown in Figure 4. The features contained in the dataset are listed in Table 1. The training set covers 1 May 2019 to 31 July 2019 and the test set covers 1 August 2019 to 31 August 2019.
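The chronological split described above can be sketched as follows. This is a minimal illustration that assumes hourly timestamps (the resolution is not stated in this section) and uses hypothetical helper and variable names; it only shows how the date boundaries partition the period and does not load the actual dataset.

```python
from datetime import datetime, timedelta


def hourly_range(start, end_exclusive):
    """Return hourly timestamps from start up to, but not including, end_exclusive."""
    stamps = []
    current = start
    while current < end_exclusive:
        stamps.append(current)
        current += timedelta(hours=1)
    return stamps


# Assumption: one observation per hour between 1 May 2019 and 31 August 2019.
all_hours = hourly_range(datetime(2019, 5, 1), datetime(2019, 9, 1))

# The test set starts on 1 August 2019; everything earlier is training data.
split_point = datetime(2019, 8, 1)
train_hours = [t for t in all_hours if t < split_point]
test_hours = [t for t in all_hours if t >= split_point]

print(len(train_hours) // 24, "training days,", len(test_hours) // 24, "test days")
```

Splitting by date rather than at random keeps every test observation strictly later than every training observation, which is the usual choice for time series.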
3.2. Feature Selection and Analysis
In this paper, the SHAP method is used for feature selection on the dataset. The prediction model is LSTNet and the optimization algorithm is adaptive moment estimation (Adam), which computes individual adaptive learning rates for different parameters and is therefore well suited to problems with non-stationary objectives and noisy or sparse gradients. After hyperparameter tuning, the number of neurons in the LSTNet model is 128 for the RNN and RNN-skip components and 64 for the CNN, and the skip parameter is set to 24.
Figure 5 shows the feature importance ranking, presented in the same way as for traditional feature selection methods. A traditional feature selection method usually compares each feature series with the price series and uses the correlation coefficient between them, such as the Pearson correlation coefficient, as the importance of that feature. In contrast, the SHAP method characterizes the importance of each feature by averaging the absolute SHAP values of that feature over all samples. It can be seen that the feature DK2_7 has the largest influence on the predicted electricity price. Among the neighboring countries, the German electricity market has the largest influence on the Danish electricity market, with DE_1 ranked fourth in the feature importance ranking. Among the renewable energy sources, the hydropower feature HydroPower_1 is the most important, followed by the wind power feature Wind_1 and then the photovoltaic feature Solar_1. Among the electric energy exchange features, ExGE_1 (exchange between the DK1 region and Germany) is the most important, followed by ExGB_1 (exchange between the DK1 region and the DK2 region) and then ExNO_1 (exchange between the DK1 region and Norway). In this paper, we select the top seven features as the input feature set.
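The ranking rule described above (mean absolute SHAP value per feature, then keep the top-ranked features) can be sketched in a few lines. The SHAP values below are made-up toy numbers, not results from this paper, and the function names are hypothetical; in practice the per-sample SHAP values would come from an explainer applied to the trained model.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Global importance of each feature: mean of |SHAP value| over all samples."""
    n_samples = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }


def top_k_features(importance, k):
    """Names of the k features with the largest global importance."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Toy example: 3 samples x 4 features (illustrative values only).
feature_names = ["DK2_7", "DK1_7", "SE4_1", "Solar_1"]
shap_rows = [
    [2.0, -1.0, 0.5, 0.1],
    [-3.0, 1.5, -0.5, -0.1],
    [1.0, -0.5, 0.2, 0.0],
]

importance = mean_abs_shap(shap_rows, feature_names)
selected = top_k_features(importance, k=2)
print(selected)
```

Taking absolute values before averaging matters: a feature that pushes the price up for some samples and down for others would otherwise average out to a misleadingly small importance.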
Figure 6 shows the effect of each sample on the predicted electricity price under different features. Each point on each feature row represents a sample from the test set. The color of a point is determined by the value of the corresponding feature for that sample. DK2_7 represents the price in the DK2 region with a lag of one week from the forecast date. It can be seen that most of the blue sample points are in the left half of the region and most of the purple sample points are in the right half. This means that the feature pulls down the forecasted electricity price when the price in the DK2 region one week earlier is lower, and conversely raises the forecasted price when the price in the DK2 region one week earlier is higher. The case of feature DK1_7 is similar to that of feature DK2_7, except that the peak importance of feature DK1_7 is greater. The case of feature SE4_1 differs from that of features DK1_7 and DK2_7. Most of the blue sample points in this feature row are located in the right half of the region, while the purple sample points are located in the left half. This means that when predicting DK1 electricity prices, the feature raises the predicted price when the price in the SE4 region one day earlier is lower, and conversely pulls down the predicted price when the price in the SE4 region one day earlier is higher.
In order to gain more insight and analyze the Danish electricity market, we plotted the feature dependence between the values of the features and the importance of the features.
Figure 7 shows the feature dependence plot for the wind power feature Wind_1. The vertical axis is the SHAP value and the horizontal axis is wind power generation. It can be seen that when the wind power generation is less than 1000, the sample points are concentrated below a SHAP value of 0, and when it is greater than 1000, they are concentrated above 0. This means that for the Danish electricity market, the threshold at which wind power generation one day earlier changes its effect on the electricity price is 1000: when the feature Wind_1 is less than 1000, the wind power feature reduces the predicted electricity price; when it is greater than 1000, the wind power feature increases the predicted electricity price.
Figure 8 shows the feature dependence plot for the electric energy exchange feature ExGE_1. It can be seen that when the electric energy exchange between the DK1 region and Germany is less than 800, the sample points are concentrated above a SHAP value of 0, and when it is greater than 800, they are concentrated below 0. This means that for the Danish electricity market, the threshold value of the electricity exchange between DK1 and Germany is 800: when the feature ExGE_1 is less than 800, the electricity exchange feature increases the predicted electricity price; when it is greater than 800, the electricity exchange feature decreases the predicted electricity price.
3.3. Probabilistic Forecasting Results
In this paper, we propose a probability density prediction method based on LSTNet quantile regression. After obtaining the predicted electricity prices at different quantiles, we use the kernel density estimation algorithm to estimate the probability density of the predicted electricity prices and obtain prediction intervals at different confidence levels. We set LSTM and BPNN as the benchmark models, and the spacing between adjacent quantile levels is 0.1.
It is worth mentioning that when the quantile is 0.5, the predicted price quantile is the point forecasting price.
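For readers unfamiliar with quantile regression, the following sketch shows the standard pinball (quantile) loss that such models typically minimize, one quantile level at a time. It is an illustration under assumptions: the function name is hypothetical, the grid of levels from 0.1 to 0.9 is inferred from the stated spacing of 0.1, and the exact loss used with LSTNet in this paper is not given in this section.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball loss of predictions y_pred for quantile level tau (0 < tau < 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff >= 0) is weighted by tau,
        # over-prediction (diff < 0) by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# Assumed grid of quantile levels with a spacing of 0.1.
quantile_levels = [round(0.1 * i, 1) for i in range(1, 10)]

# For tau = 0.9, predicting too low is penalized nine times as heavily as too high.
too_low = pinball_loss([10.0], [8.0], tau=0.9)
too_high = pinball_loss([10.0], [12.0], tau=0.9)

# For tau = 0.5 the loss is symmetric: half the absolute error.
median_loss = pinball_loss([10.0], [8.0], tau=0.5)
```

The symmetric case at level 0.5 is why the 0.5 quantile can be read as a point forecast: minimizing it is equivalent to minimizing the mean absolute error.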
Table 2 shows the forecasts of different models when the quantile is 0.5. In this paper, we adopt RMSE and MAPE as the point prediction model evaluation metrics.
It can be seen that among all neural network models, the LSTNet model has the best prediction performance and is the most suitable for the day-ahead electricity price forecasting task. Specifically, among the models without feature selection, the LSTNet model has the smallest RMSE and MAPE; among the models with feature selection, the SHAP–LSTNet model has the smallest error metrics. Compared with the LSTM model, the RMSE and MAPE of the LSTNet model are reduced by 36.41% and 42.61%, respectively. Compared with the SHAP–BPNN model, the RMSE and MAPE of the SHAP–LSTNet model are reduced by 44.96% and 47.87%, respectively. Comparing the models with and without feature selection, the error metrics of the models with feature selection are smaller. For example, the RMSE and MAPE of the SHAP–LSTNet model are reduced by 52.62% and 35.76%, respectively, compared with the LSTNet model. This is mainly because feature selection removes redundant features and reduces the risk of model overfitting.
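The two point-forecast metrics and the percentage reductions quoted above follow the usual definitions, sketched below with hypothetical function names and toy numbers (not values from Table 2). MAPE is expressed in percent here, which is an assumption about the table's convention.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if any actual value is 0."""
    ratios = [abs((y - p) / y) for y, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(y_true)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Toy prices and forecasts (illustrative only).
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(rmse(actual, forecast), mape(actual, forecast))
print(relative_reduction(10.0, 5.0))
```

Because MAPE divides by the actual price, it becomes unstable when prices are close to zero, which can occur in day-ahead markets; this is worth checking before comparing MAPE across datasets.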
Figure 9 shows the point forecast price for a three-day period in August 2019. It can be seen that the SHAP–LSTNet model fits the real electricity price curve best.
After obtaining the different price quantiles, kernel density estimation is used to obtain the probability density functions at different moments of the forecast day and the prediction intervals at different confidence levels.
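One way to carry out this step is sketched below: the predicted quantiles for a single hour are treated as samples, a Gaussian kernel density estimate is fitted to them, and a central prediction interval is read off the estimated distribution. The Gaussian kernel, the normal-reference bandwidth rule, the central-interval construction, and all names and numbers are assumptions made for illustration; the paper's exact kernel and bandwidth choices are not stated in this section.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return (pdf, bandwidth) for a Gaussian kernel density estimate of samples."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf, bandwidth


def central_interval(samples, bandwidth, level):
    """Central prediction interval at the given confidence level from the KDE."""

    def cdf(x):
        # The CDF of a Gaussian KDE is the average of normal CDFs centred on the samples.
        z = bandwidth * math.sqrt(2.0)
        return sum(0.5 * (1.0 + math.erf((x - s) / z)) for s in samples) / len(samples)

    def inverse_cdf(p):
        # Bisection on the monotone CDF.
        lo = min(samples) - 10.0 * bandwidth
        hi = max(samples) + 10.0 * bandwidth
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if cdf(mid) < p:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    alpha = 1.0 - level
    return inverse_cdf(alpha / 2.0), inverse_cdf(1.0 - alpha / 2.0)


# Hypothetical predicted price quantiles (levels 0.1 to 0.9) for one hour.
hour_quantiles = [30.0, 32.0, 33.5, 34.5, 35.0, 36.0, 37.0, 38.5, 41.0]
pdf, h = gaussian_kde(hour_quantiles)
lo90, hi90 = central_interval(hour_quantiles, h, level=0.9)
lo70, hi70 = central_interval(hour_quantiles, h, level=0.7)
```

Repeating this for every hour of the forecast day gives one density and one set of intervals per hour.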
Table 3 shows the metrics at different confidence levels. The confidence levels are 0.9, 0.8, and 0.7, respectively.
It can be seen that the metrics differ across confidence levels. For example, at the confidence level of 0.9, the SHAP–LSTNet model has the largest PINAW among all models, whereas the SHAP–BPNN model has the smallest PICP. In this case, it is difficult to select the best model based on PINAW and PICP alone, so we use the average pinball loss (AL) to assist our decision. The AL values of the SHAP–LSTNet and SHAP–BPNN models are 0.41 and 2.61, respectively, so the former is clearly more suitable for the day-ahead price forecasting task.
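The two interval metrics discussed above are commonly defined as follows: PICP (prediction interval coverage probability) is the fraction of actual values that fall inside the interval, and PINAW (prediction interval normalized average width) is the mean interval width divided by the range of the actual values. The sketch below uses these common definitions with hypothetical function names and toy numbers; the paper's own formulas are not given in this section and may differ in detail.

```python
def picp(y_true, lower, upper):
    """Fraction of actual values covered by their prediction interval."""
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return covered / len(y_true)


def pinaw(y_true, lower, upper):
    """Mean interval width normalized by the range of the actual values."""
    value_range = max(y_true) - min(y_true)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y_true)
    return mean_width / value_range


# Toy example (illustrative values only): the second price falls outside its interval.
actual = [10.0, 20.0, 30.0, 40.0]
lower = [8.0, 22.0, 25.0, 35.0]
upper = [12.0, 28.0, 35.0, 45.0]

coverage = picp(actual, lower, upper)
width = pinaw(actual, lower, upper)
print(coverage, width)
```

The two metrics pull in opposite directions: widening every interval raises PICP but also raises PINAW, which is why a third score such as the average pinball loss is useful for ranking models.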
Figure 10 shows the prediction intervals of different models at the confidence level of 0.9. It can be seen that the SHAP–BPNN model has the narrowest interval, but most of the actual price points are not covered by this interval, which is clearly unacceptable. The most desirable model is SHAP–LSTNet, whose interval covers all but a few of the actual price points while remaining relatively narrow.
Moreover, the table shows that a high confidence level yields a larger PICP but also a higher PINAW, whereas a low confidence level yields a lower PINAW but also a smaller PICP.
Figure 11 shows the prediction intervals of the SHAP–LSTNet model at different confidence levels. It can be seen that the higher the confidence level, the more actual price points are covered by the prediction interval but the larger the interval width, while the lower the confidence level, the smaller the interval width but the fewer actual price points are covered. As before, we use the average pinball loss to assist our decision. Among all confidence levels, the 0.9 confidence level has the smallest AL.
In summary, the best model is the SHAP–LSTNet model at a 0.9 confidence level.