Point-Interval Forecasting for Electricity Load Based on Regular Fluctuation Component Extraction

Abstract: The fluctuation and uncertainty of the electricity load bring challenges to load forecasting. Traditional point forecasting struggles to avoid errors, and pure interval forecasting may produce intervals that are too wide. In this paper, we combine point forecasting and interval forecasting and propose a point-interval forecasting model for electricity load based on regular fluctuation component extraction. Firstly, variational modal decomposition is combined with sample entropy to decompose the original load series into a strong regular fluctuation component and a weak regular fluctuation component. Then, the gate recurrent unit neural network is used for point forecasting of the strong regular fluctuation component, the support vector quantile regression model is used for interval forecasting of the weak regular fluctuation component, and the results are accumulated to obtain the final forecasting intervals. Finally, experiments were conducted using electricity load data from two regional electricity grids in Shaanxi Province, China. The results show that combining the point-interval idea, with point forecasting and interval forecasting applied to components of different fluctuation regularity, can effectively reduce the forecasting interval width while maintaining high accuracy. The proposed model has higher forecasting accuracy and smaller mean interval width at various confidence levels compared to the commonly used models.


Introduction
With the emergence of various new electricity-consuming products, daily electricity consumption keeps increasing and the electricity load keeps growing, which increases the complexity and uncertainty of the electricity system [1]. Therefore, accurate electricity load forecasting is crucial: it helps decision-makers reasonably arrange the start and stop of generating units within the electricity grids, maintain the safety and stability of grid operation, effectively reduce the cost of electricity generation, and improve the efficiency of the electricity system [2]. However, electricity load forecasting is not easy, in particular because the fluctuation and uncertainty of the electricity load pose great challenges [3].
Currently, research on electricity load forecasting is mainly divided into deterministic point forecasting and uncertainty forecasting. There are three main types of models for deterministic point forecasting: conventional statistical models [4], machine learning models [5], and deep learning models [6]. Conventional statistical models, such as exponential smoothing [7], the autoregressive moving average [8], and the autoregressive integrated moving average [9], have long been used in load forecasting, but their accuracy decreases as the forecasting horizon increases, and their ability to fit volatile and highly nonlinear data is weak. Machine learning methods can capture complex nonlinear relationships, and support vector regression [10], random forest [11], and other machine learning methods have been successfully applied to load forecasting. However, the forecasting results obtained by machine learning are often less accurate than those of deep learning. Deep learning conforms to the trend of big data and has a stronger learning ability for massive data. Lin et al. [12] used graph neural networks for short-term load forecasting. Rafi et al. [13] proposed a combination model of a convolutional neural network (CNN) and a long short-term memory network (LSTM) for electricity load forecasting. Fang et al. [14] proposed a combination model of CNN and bidirectional gated recurrent unit (BIGRU) based on the attention mechanism (ATT) for short-term electricity load forecasting. Although scholars have continuously improved the common models in load forecasting and increased their forecasting accuracy, the error inherent in point forecasting can never be fully eliminated by model optimization.
Compared with deterministic point forecasting, uncertainty forecasting can adequately consider the fluctuation of electricity load [15], reasonably quantify the potential fluctuation, and obtain a possible fluctuation range of electricity load, thus providing a more comprehensive reference for the planning, operation, and scheduling of electricity systems.
Uncertainty forecasting is mainly divided into probability density forecasting and interval forecasting. He et al. [16] proposed a probability density estimation method based on copula theory for short-term electricity load forecasting. Ding et al. [17] proposed a kernel density estimation method based on bootstrap for natural gas demand forecasting. Although these probability density estimation methods achieve good forecasting results, they require the data to obey a certain hypothetical distribution, which may not hold in reality, resulting in unreliable forecasting intervals [18]. Gan et al. [19] performed interval forecasting with the temporal convolutional network. Saeed et al. [20] used a bidirectional LSTM model for interval forecasting. However, when a neural network method is used for interval forecasting, if the training data fluctuate a lot, the interval forecasting is not effective [21]. Quantile regression (QR) does not depend on an assumed initial distribution of the data and is not affected by extreme values, so many scholars have started to combine quantile regression with other models for interval forecasting [22]. Zhao et al. [23] combined the LS-SVM model with quantile regression for wind power forecasting and obtained better forecasting results. Ma et al. [24] combined Gaussian distribution correction with the CNN model and used quantile regression for electricity load interval forecasting, which improved the forecasting accuracy and flexibility of the model. Hu et al. [25] used quantile regression combined with a temporal convolutional network (TCN) for wind power interval forecasting, and their results proved that the model has significant advantages in interval accuracy and width.
The above researchers all performed interval forecasting on the original load series directly, without decomposing the fluctuation of the original load series, extracting the irregular fluctuation from it, and performing interval forecasting on the irregular fluctuation only. Direct interval forecasting on the original load series makes the forecasting interval too wide, and when the interval width is too wide, the obtained forecasting results lose their value.
The key to interval forecasting is to discover the regularity of fluctuation and to forecast the possible fluctuation. Cartagena et al. [26] proposed that the forecasting interval is composed of a data-generating function and additive noise. Thus, it is necessary to separate the components with strong and weak fluctuation regularity from the original load series before performing interval forecasting, modeling the different fluctuation components separately to obtain more accurate forecasting results. Ding et al. [27] combined empirical modal decomposition (EMD) with singular spectrum analysis (SSA) to extract linear and nonlinear components, then used an autoregressive integrated moving average (ARIMA) model for point forecasting of the linear component and a BP neural network combined with an improved first-order Markov chain (IFOMC) model for interval forecasting of the nonlinear component; however, EMD suffers from modal mixing, which affects the final forecasting results. Wang et al. [28] proposed an EEMD-SE-RVM combined model for interval forecasting: firstly, the original load series was decomposed into multiple components using the ensemble empirical modal decomposition (EEMD) method; then, the components were reconstructed using sample entropy (SE); finally, a relevance vector machine (RVM) was used to forecast each component, and the results were accumulated to obtain the forecasting intervals. The experiment results showed that using sample entropy for reconstruction reduced the complexity of the model and ensured the accuracy and reliability of the forecasting intervals. Although EEMD solves the modal mixing phenomenon of EMD, the white noise it adds to the original signal contaminates the fluctuation trend of the original signal [29]. In contrast, variational modal decomposition (VMD) solves this problem well. Liu et al. [30] used the VMD technique to decompose the original series, used the ATT-GRU model to forecast the individual components, and accumulated the forecasting results to obtain the forecasting intervals. Zhang et al. [31] combined VMD with phase space reconstruction (PSR) for decomposition and reconstruction of the original series, then used the GRUQR model to perform interval forecasting for the reconstructed components and accumulated them to obtain the forecasting intervals. That model reconstructed the decomposed components of the original series, integrated the components with similar features, and solved the problem of model complexity caused by too many components. These researchers decomposed the original load series and built models to forecast the different components, obtaining better forecasting intervals, but they still performed interval forecasting for all components, which leads to a wide interval width.
Based on the above research, this paper proposes a point-interval forecasting model for electricity load based on regular fluctuation component extraction. Firstly, VMD is used to decompose the original load series, and SE is used to reconstruct the decomposed components into strong regular fluctuation components and weak regular fluctuation components. Then, we build the GRU model for point forecasting of the strong regular fluctuation component and the support vector quantile regression (SVQR) model for interval forecasting of the weak regular fluctuation component. Finally, the fluctuation intervals obtained from interval forecasting and the point forecasting results are accumulated to obtain the forecasting intervals of electricity load. In addition, by setting up a variety of comparison experiments, the model forecasting results are analyzed and evaluated using root mean square error, mean absolute error, prediction interval coverage probability, prediction interval normalized average width, and other indicators. The innovations and contributions of this paper include the following:

1. Reasonable interval forecasting needs to uncover the regularity of load fluctuation and forecast the possible fluctuation. This paper combines point forecasting and interval forecasting, applying interval forecasting to the weak regular fluctuation component and point forecasting to the strong regular fluctuation component, which ensures the accuracy of the forecasting results while effectively reducing the forecasting interval width.
2. The decomposed components are reconstructed, and components with similar fluctuation regularity are merged, to avoid an overly complex model.
3. A high-accuracy deep learning model is used for point forecasting, and a machine learning model with strong anti-fluctuation ability and short training time is used for interval forecasting. This avoids the poor interval forecasting results of deep learning when the training data fluctuate strongly and reduces the training time of the whole model.
The rest of this paper is organized as follows: Section 2 introduces the theoretical basis, model structure, and forecasting process; Section 3 describes the datasets and model evaluation metrics used in this paper; experiments and results are analyzed in Section 4; and Section 5 gives the conclusion of this research.

Variational Modal Decomposition
VMD is a non-recursive signal processing method that reduces the non-smoothness of time series with high complexity and strong nonlinearity. Using an iterative search for the optimal solution of the variational modals, the time series data are decomposed into multiple components with limited bandwidth, which are called intrinsic mode functions (IMF). Compared with EMD, VMD has better noise immunity and overcomes the problem of modal mixing [32]. The decomposition steps of VMD are as follows.
Firstly, the original series S is decomposed into k subseries u_k such that the sum of the estimated bandwidths of the decomposed components is minimized while the components sum to the original series, giving the constrained variational problem

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_k u_k(t) = f(t), \tag{1}$$

where {u_k} represents the modal components of the decomposition, {ω_k} denotes the central frequency of each component, f(t) describes the original series, ∂_t is the gradient with respect to the moment t, (δ(t) + j/πt) * u_k(t) is the Hilbert transform of the component, and e^{-jω_k t} shifts each component to its estimated central frequency. Then, a penalty parameter and a Lagrange multiplication operator are added, transforming the constrained variational problem into an unconstrained one with the augmented Lagrange expression

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\; f(t) - \sum_k u_k(t) \right\rangle, \tag{2}$$

where α is the penalty parameter and λ is the Lagrange multiplication operator. Finally, the alternating direction method of multipliers continuously updates each component, its central frequency, and the Lagrange multiplier according to Equations (3)-(5):

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2}, \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}, \tag{4}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^n(\omega) + \tau_v \left( \hat{f}(\omega) - \sum_k \hat{u}_k^{n+1}(\omega) \right), \tag{5}$$

where û_k^{n+1}(ω) represents the Fourier transform of u_k^{n+1}(t) (the other hatted symbols are Fourier transforms likewise) and τ_v is the tolerance of signal noise.
When the series obtained after n + 1 cycles is not much different from the one obtained in the previous cycle, the cycle is stopped; the stopping condition is

$$\sum_k \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon, \tag{6}$$

where ε is a positive number that tends to zero.
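As a compact illustration of the update loop, the sketch below implements Equations (3)-(6) directly in the frequency domain with NumPy. It is a simplified stand-in (no signal mirroring, a symmetric |ω| weighting so the modes stay real, and assumed default parameters), not the exact implementation used in the paper:

```python
import numpy as np

def vmd(f, K, alpha=2000.0, tau=0.0, n_iter=500, tol=1e-7):
    """Minimal VMD sketch via frequency-domain ADMM updates.

    f     : real 1-D signal
    K     : number of modes
    alpha : bandwidth penalty
    tau   : dual ascent step (noise tolerance; 0 disables exact reconstruction)
    """
    T = len(f)
    f_hat = np.fft.fft(f)
    freqs = np.fft.fftfreq(T)                  # normalized frequencies, cycles/sample
    pos = freqs >= 0
    u_hat = np.zeros((K, T), dtype=complex)    # mode spectra
    omega = np.linspace(0.05, 0.45, K)         # initial centre frequencies (assumed)
    lam = np.zeros(T, dtype=complex)           # Lagrange multiplier spectrum
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-like update of Eq. (3); |freqs| keeps the filter symmetric
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (np.abs(freqs) - omega[k]) ** 2)
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-14)  # Eq. (4)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))                        # Eq. (5)
        diff = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-14)
        if diff < tol:                                                       # Eq. (6)
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega

# two well-separated tones should be split into two modes
x = np.arange(1000)
f = np.cos(2 * np.pi * 0.05 * x) + np.cos(2 * np.pi * 0.25 * x)
modes, omega = vmd(f, K=2)
```

For this two-tone test signal, the centre frequencies converge near 0.05 and 0.25 cycles/sample and the modes sum back to the original signal.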

Sample Entropy
Multiple components are obtained after decomposition using VMD; building a model for each component for interval forecasting not only greatly increases the complexity of the model but also ignores the similar characteristics among the components. Therefore, this paper uses sample entropy to evaluate the regularity of each component fluctuation.
Sample entropy is an improvement of approximate entropy, a method proposed to measure the complexity of time series [33]. The lower the sample entropy, the higher the self-similarity of the series and the stronger its regularity; conversely, the more complex the sample series, the weaker the regularity [34]. Sample entropy has an excellent effect in measuring the complexity of time series. It is calculated as follows.
First, the length of the time series is noted as l. The time series is reconstructed by combining every m adjacent time points into a vector, giving a set of m-dimensional vectors X_m(i). Then, the distance between the vectors X_m(i) and X_m(j) is calculated as the absolute value of the maximum difference of their corresponding elements, according to Equation (7), where j is a moment in the series that is not equal to i:

$$d[X_m(i), X_m(j)] = \max_{0 \le k \le m-1} |x(i+k) - x(j+k)|. \tag{7}$$

For each node satisfying {i : 1 ≤ i ≤ l − m + 1}, give a tolerance deviation r, count the number of d[X_m(i), X_m(j)] < r, noted as B_i, and calculate the ratio of B_i to the total number of compared vectors by Equation (8):

$$B_i^m(r) = \frac{B_i}{l - m}. \tag{8}$$

Next, the average value of B_i^m(r) is calculated as in Equation (9):

$$B^m(r) = \frac{1}{l - m + 1} \sum_{i=1}^{l-m+1} B_i^m(r). \tag{9}$$

Finally, increase the dimension to m + 1, repeat the above steps to calculate B^{m+1}(r), and obtain the sample entropy of the finite-length series by Equation (10):

$$\mathrm{SampEn}(m, r, l) = -\ln \frac{B^{m+1}(r)}{B^m(r)}. \tag{10}$$
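The calculation above translates directly into Python; the defaults m = 2 and r = 0.2·std below are conventional choices rather than values taken from the paper:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy of a 1-D series (direct, unoptimized implementation)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)
    l = len(x)

    def match_count(dim):
        # template vectors of the given dimension; the same count is used for
        # m and m + 1 so that the final ratio is well defined
        templates = np.array([x[i:i + dim] for i in range(l - m)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance; subtract 1 to exclude the self-match
            d = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(d < r) - 1
        return count

    B = match_count(m)       # matches at dimension m
    A = match_count(m + 1)   # matches at dimension m + 1
    return np.inf if A == 0 or B == 0 else -np.log(A / B)
```

A regular series (e.g., a sine wave) yields a much lower sample entropy than white noise of the same length, which is exactly the property used to sort components by fluctuation regularity.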

Gate Recurrent Unit
The most important feature of recurrent neural networks (RNN) is the ability to use the output of the previous moment as the input of the next moment, which allows data with time series features to form connections in the time dimension. However, an ordinary RNN cannot handle long-term time series dependencies, so the LSTM and GRU neural networks appeared in succession. Cho et al. [35] proposed the GRU model as an improvement on the LSTM neural network: the network structure is simplified by replacing the three-gate structure of forgetting, input, and output gates with an update gate and a reset gate. Many studies show that GRU possesses similar performance to LSTM while requiring less computation [36-38]. The GRU network structure diagram is given in Figure 1.
The parametric relationships of the GRU network are as follows:

$$z_t = \sigma\big(W_z \cdot [s_{t-1}, x_t]\big), \tag{11}$$
$$r_t = \sigma\big(W_r \cdot [s_{t-1}, x_t]\big), \tag{12}$$
$$\tilde{s}_t = \tanh\big(W_s \cdot [r_t \odot s_{t-1}, x_t]\big), \tag{13}$$
$$s_t = (1 - z_t) \odot s_{t-1} + z_t \odot \tilde{s}_t, \tag{14}$$

where σ represents the sigmoid function, r_t stands for the reset gate, s̃_t means the new hidden state, s_{t−1} denotes the output state at the previous moment, s_t shows the current state, z_t is the update gate, x_t is the input at the current moment, and W is the weight matrix.
The reset gate determines how much information from the previous moment is retained, and the update gate determines how the retained information is combined with the current information. Compared with the LSTM model, the GRU neural network has a more simplified network structure, fewer model parameters, and faster model training speed.
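The gate computations can be made concrete with a minimal NumPy cell; the weights below are randomly initialized for illustration and are not the trained network from the paper:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, s_prev, Wz, Wr, Ws):
    """One GRU time step: update gate, reset gate, candidate state, new state."""
    v = np.concatenate([s_prev, x_t])
    z_t = sigmoid(Wz @ v)                                        # update gate
    r_t = sigmoid(Wr @ v)                                        # reset gate
    s_cand = np.tanh(Ws @ np.concatenate([r_t * s_prev, x_t]))   # candidate state
    return (1.0 - z_t) * s_prev + z_t * s_cand                   # new hidden state

# push a random sequence through a randomly initialized cell
rng = np.random.default_rng(0)
hidden, n_in = 8, 3
Wz, Wr, Ws = (rng.normal(0.0, 0.1, (hidden, hidden + n_in)) for _ in range(3))
s = np.zeros(hidden)
for _ in range(20):
    s = gru_step(rng.normal(size=n_in), s, Wz, Wr, Ws)
```

Because the new state is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays within (−1, 1), which illustrates the gating mechanism's stabilizing effect.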

Support Vector Quantile Regression
Support vector machine (SVM) is a supervised learning model that is widely used in classification and regression analysis problems [39]. When applied in time series regression problems, it is called support vector regression (SVR). SVR is a supervised learning algorithm for forecasting discrete values, its computational complexity does not depend on the dimensionality of the input space, and it has strong generalization ability and high forecasting accuracy [40].
For a given set of time series data samples A = {(x_i, y_i)}, the regression relationship is

$$y_i = w^{T} \varphi(x_i) + b, \tag{15}$$

where x_i represents the input vector, y_i means the output vector, w denotes the weight vector, b stands for the deviation term, and φ(·) is the nonlinear function that maps the input vector to the higher space. The parameters w and b are calculated by minimizing the objective function

$$\min_{w,b}\ \frac{1}{2}\|w\|^{2} + C \sum_{t} L\big(y_t, f(x_t)\big), \tag{16}$$

where C represents the penalty parameter, y_t means the true value, and f(x) = w^T φ(x) + b is the forecast value.
Because of the multiple and complex factors affecting electricity load, traditional mean regression is less effective. Koenker et al. [41] proposed quantile regression, which not only retains the statistical information of explanatory and response variables but also reduces the effect of heteroscedasticity. They also proposed to optimize the quantile regression by using the check (pinball) function and obtaining the optimal parameters by minimizing it:

$$\rho_{\tau}(u) = \begin{cases} \tau u, & u \ge 0 \\ (\tau - 1)u, & u < 0 \end{cases} \tag{17}$$

where τ is the quantile point, so that the quantile coefficients are

$$\beta(\tau) = \arg\min_{\beta} \sum_{t} \rho_{\tau}\big(y_t - x_t^{T}\beta\big), \tag{18}$$

where β represents the quantile coefficient matrix. Introducing QR into SVR [42], the loss part of the SVR is replaced by the check function, which gives the support vector quantile regression (SVQR) model

$$\min_{w_{\tau}, b_{\tau}}\ \frac{1}{2}\|w_{\tau}\|^{2} + C \sum_{t} \rho_{\tau}\big(y_t - w_{\tau}^{T}\varphi(x_t) - b_{\tau}\big), \tag{19}$$

where ρ_τ denotes the value of the check function in Equation (17). Equation (19) is rewritten as a quadratic program (Equations (20) and (21)); introducing slack variables α, α* to construct the Lagrange function and solving the quadratic programming problems yields the regression solution, where U is the matrix composed of (1, u_t^T), K_t means the kernel matrix obtained from the kernel function, the parameters subscripted by τ all represent the parameter values at the τ quantile, and the parameters subscripted by t all represent the parameter values at the moment t.
In the SVQR model, the selection of the kernel function plays an important role in the forecasting results. Since the electricity load data are very complex and not linearly separable, they need to be processed through ascending dimensionality. In this paper, the Gaussian kernel is used as the kernel function of the SVQR model:

$$K(x, \bar{x}) = \exp\left(-\frac{\|x - \bar{x}\|^{2}}{2\sigma_k^{2}}\right),$$

where x̄ is the kernel function center and σ_k is the width parameter of the kernel function.
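The quantile (pinball) loss underlying SVQR can be made concrete by fitting lower and upper quantile lines to heteroscedastic toy data with subgradient descent. This is a simplified linear stand-in for the kernelized SVQR; the data, step size, and iteration count are illustrative:

```python
import numpy as np

def pinball(u, tau):
    """Check (pinball) loss: tau*u for u >= 0, (tau - 1)*u otherwise."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def fit_quantile(X, y, tau, lr=0.05, epochs=3000):
    """Linear quantile regression via subgradient descent on the mean pinball loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        u = y - (X @ w + b)                    # residuals
        g = np.where(u >= 0, -tau, 1.0 - tau)  # d(pinball)/d(prediction)
        w -= lr * (X.T @ g) / n
        b -= lr * g.mean()
    return w, b

# heteroscedastic toy data: noise grows with x, so the 5% and 95%
# quantile lines diverge and form a widening forecasting interval
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 3000)
y = 2 * x + (0.2 + 0.5 * x) * rng.standard_normal(3000)
X = x.reshape(-1, 1)
w_lo, b_lo = fit_quantile(X, y, tau=0.05)
w_hi, b_hi = fit_quantile(X, y, tau=0.95)
lower, upper = X @ w_lo + b_lo, X @ w_hi + b_hi
coverage = np.mean((y >= lower) & (y <= upper))   # should be close to 0.90
```

The empirical coverage of the band between the τ = 0.05 and τ = 0.95 lines comes out near the nominal 90%, which is the defining property the SVQR model exploits for interval forecasting.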

Point-Interval Forecasting Model
Based on the above methods, this paper proposes a point-interval forecasting model based on regular fluctuation component extraction. Firstly, the maximum-minimum method is used to normalize the electricity load data to eliminate the influence of dimension differences between variables on the experiment results. Next, aiming at the fluctuation and uncertainty of the electricity load, the original electricity load series is decomposed into multiple components with different fluctuation regularity using VMD. Since the sum of the components after VMD decomposition is approximately equal to the original load series, the accumulation method is used in the component reconstruction. By calculating and comparing the sample entropy of the original series and each component, the components with larger sample entropy than the original series are combined into the weak regular fluctuation component, and the smaller ones into the strong regular fluctuation component. This simplifies the model, separates the components by the strength of their fluctuation regularity, and allows an appropriate model to be used for each component.
Figure 2 shows the forecasting process of the proposed model. The specific steps are as follows:

1. Use the maximum-minimum method to normalize the original data and obtain the dataset y(t).
2. Apply the VMD technique to decompose the electricity load series into multiple subseries u_k with different fluctuation regularity.
3. Calculate the sample entropy of each component; the components with sample entropy larger than that of the original series are combined into the weak regular fluctuation component, and those with smaller sample entropy are combined into the strong regular fluctuation component by accumulation.
4. Perform point forecasting for the strong regular fluctuation component by the GRU neural network and obtain the forecasting results Q.
5. Perform interval forecasting for the weak regular fluctuation component by the SVQR model and obtain the fluctuation intervals [Q_low, Q_up].
6. Accumulate the point forecasting results with the fluctuation intervals to obtain the final forecasting intervals [Q + Q_low, Q + Q_up].
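The final accumulation step is a simple element-wise addition; with illustrative numbers:

```python
import numpy as np

Q = np.array([100.0, 102.0, 98.0])     # GRU point forecasts of the strong component
Q_low = np.array([-5.0, -4.0, -6.0])   # SVQR lower fluctuation bounds of the weak component
Q_up = np.array([6.0, 5.0, 7.0])       # SVQR upper fluctuation bounds

lower = Q + Q_low                      # final lower interval limits
upper = Q + Q_up                       # final upper interval limits
```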


Data Description
Electricity load data from the electricity grids in central and southern China's Shaanxi Province are used as the experimental datasets, noted as dataset A and dataset B. The datasets contain meteorological factors, week types, and electricity load values at 15-min intervals from 1 January to 31 December 2017, for a total of 35,040 time points. The meteorological factors include average temperature, maximum temperature, minimum temperature, relative humidity, and precipitation. The division of the dataset is shown in Figure 3, and the statistical descriptive information of the electricity load dataset is given in Table 1, including mean, maximum, minimum, standard deviation, skewness, and kurtosis. To reflect the applicability of the proposed model, the datasets are divided into training and test sets in a 4:1 ratio, and one week from each quarter is taken as the validation of the model, with 11 to 17 April in spring, 26 July to 1 August in summer, 17 September to 25 September in fall, and 25 December to 31 December in winter.


Evaluation Metrics
To better evaluate the accuracy and applicability of the model, this paper uses root mean square error (RMSE) and mean absolute error (MAE) as evaluation indicators for the point forecasting, and prediction interval coverage probability (PICP) and prediction interval normalized average width (PINAW) as evaluation metrics for the whole interval forecasting.
RMSE is used to measure the deviation between the forecasting values and the true values and is often used as a measure of a machine learning model's forecasting results. MAE is the mean of the absolute error, which better reflects the actual error of the forecasting results. These indicators are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{k} \sum_{i=1}^{k} (y_i - \hat{y}_i)^2},$$

$$\mathrm{MAE} = \frac{1}{k} \sum_{i=1}^{k} |y_i - \hat{y}_i|,$$

where ŷ_i is the forecast value.
PICP and PINAW are commonly used indicators to evaluate the validity of the forecasting intervals. The larger the PICP and the smaller the PINAW of the forecasting intervals, the higher the correctness and reasonableness of the intervals. PICP and PINAW are calculated as follows:

$$\mathrm{PICP} = \frac{1}{k} \sum_{i=1}^{k} c_i, \quad c_i = \begin{cases} 1, & L_i \le y_i \le U_i \\ 0, & \text{otherwise} \end{cases}$$

$$\mathrm{PINAW} = \frac{1}{kR} \sum_{i=1}^{k} (U_i - L_i),$$

where k is the total number of samples, y_i represents the true value, U_i denotes the upper limit of the interval, L_i denotes the lower limit of the interval, and R means the sample range, which is the maximum value minus the minimum value. While considering the PICP, it is also important to keep the PINAW as small as possible: a very large PINAW guarantees a large PICP, but the obtained forecasting results are not helpful for decision making. Therefore, MC is introduced as a composite evaluation indicator for interval forecasting [43]. The smaller the MC of the model, i.e., the smaller the PINAW and the larger the PICP, the better the forecasting effect of the model:

$$\mathrm{MC} = \frac{\mathrm{PINAW}}{\mathrm{PICP}}.$$
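The interval metrics are straightforward to compute. The sketch below assumes the ratio form MC = PINAW / PICP, which is consistent with the description above (smaller PINAW and larger PICP give smaller MC) but should be checked against [43]:

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of true values falling inside their forecast intervals."""
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    """Mean interval width normalized by the sample range R."""
    R = y.max() - y.min()
    return np.mean(upper - lower) / R

def mc(y, lower, upper):
    """Composite indicator (assumed ratio form): smaller is better."""
    return pinaw(y, lower, upper) / picp(y, lower, upper)
```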

Data Standardization
In this paper, there are no missing values in the experimental data, but in order to eliminate the influence of the dimension difference between different variables on the experiment results, the maximum-minimum method is used to normalize the data. The method is as follows:

$$x = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}},$$

where x represents the normalized load value, x_i denotes the original load value, x_max is the maximum load value, and x_min is the minimum load value.
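The normalization and its inverse (needed to map forecasts back to the original load scale) can be written as:

```python
import numpy as np

def minmax_fit_transform(x):
    """Scale to [0, 1]; keep min/max so forecasts can be de-normalized later."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def minmax_inverse(x_norm, x_min, x_max):
    """Map normalized values back to the original scale."""
    return np.asarray(x_norm) * (x_max - x_min) + x_min
```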

Data Decomposition and Reconstruction
Before training the interval forecasting model, the VMD-SE method is used to decompose and reconstruct the original load series. In the decomposition process of VMD, the results are mainly affected by the modal number K. When K takes too small a value, the decomposition of the original series is incomplete and some important information is filtered out; when K takes too large a value, excessive decomposition leads to modal duplication or additional noise. The main difference between modals is the difference in their center frequencies, so the value of K is determined by the center frequency of the new modal after decomposition: when increasing K by 1 no longer changes the center frequency of the new modal much (the rightmost modal of each column in the table is the new modal), the current K is the optimal number of decompositions. Tables 2 and 3 show the center frequencies of each modal after decomposition. In dataset A, the center frequency of the new modal starts to stabilize after K = 5; in dataset B, it starts to stabilize after K = 6. Therefore, the VMD method is used to decompose the electricity load series of dataset A into five modals and that of dataset B into six modals; each modal is considered a subseries. Figure 4 shows the decomposition results of VMD.
From Figure 4, it can be seen that the electricity loads of the two datasets exhibit fluctuating and nonlinear characteristics, and the subseries differ in both the degree and the pattern of fluctuation. Therefore, we calculate the sample entropy of each subseries after decomposition and compare it with that of the original series. Table 4 shows the sample entropy of the subseries and the original series. The sample entropy of the original load series in dataset A is larger than that in dataset B; for example, the sample entropy of subseries 1 is 0.123 in dataset A but only 0.031 in dataset B. Therefore, the fluctuation of dataset A is more irregular than that of dataset B. In dataset A, only the sample entropy of subseries 1 is smaller than that of the original series, so subseries 1 is taken as the strong regular fluctuation component and the remaining subseries are reconstructed as the weak regular fluctuation component. In dataset B, the sample entropies of subseries 1 and 2 are smaller than that of the original series, so subseries 1 and 2 are reconstructed as the strong regular fluctuation component and the remaining subseries as the weak regular fluctuation component. Figure 5 shows the reconstructed component series.
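As background, sample entropy SampEn(m, r) measures series irregularity as the negative log-ratio of template matches of length m + 1 to those of length m (lower values mean more regular fluctuation). A minimal numpy sketch, using the common defaults m = 2 and r = 0.2 times the series standard deviation; the test signals are synthetic, not the paper's load data:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) = -ln(A/B), where B counts template pairs of length m
    and A counts pairs of length m + 1 within Chebyshev distance r."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)
    n = len(x)

    def count_matches(mm):
        templates = np.array([x[i:i + mm] for i in range(n - mm)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d <= r)
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

rng = np.random.default_rng(0)
t = np.arange(500)
regular = np.sin(2 * np.pi * t / 50)   # strongly regular fluctuation
noisy = rng.standard_normal(500)       # weakly regular fluctuation
print(sample_entropy(regular), sample_entropy(noisy))
```

A periodic series yields a much smaller sample entropy than noise, which is exactly the property used above to separate strong and weak regular fluctuation components.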

Point Forecast
The strong regular fluctuation component obtained after decomposition and reconstruction is fed into the point forecasting model. In addition to the VMD-SE-GRU-SVQR model proposed in this paper, five representative models are compared: the SVQR and GRUQR models without decomposition and reconstruction, the SVQR and GRUQR models with VMD-SE decomposition and reconstruction, and the VMD-SE-SVR-GRUQR model. Since the plain SVQR and GRUQR models have no point forecasting step, the point forecasting comparison involves the VMD-SE-SVQR, VMD-SE-GRUQR, VMD-SE-GRU-SVQR, and VMD-SE-SVR-GRUQR models, which use either SVR or GRU for point forecasting. The point forecasting results of the SVR and GRU models on the strong regular fluctuation component are shown in Figures 6 and 7.
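As background on the GRU used for point forecasting, the cell updates its hidden state through an update gate and a reset gate. A minimal numpy sketch of one GRU step, using one common gate convention (frameworks differ in whether z weights the old or the candidate state); the weights here are random placeholders, not trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U, b):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde.
    W, U, b each stack the z / r / h parameter blocks."""
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    z = sigmoid(x @ Wz + h @ Uz + bz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)  # candidate state
    return (1.0 - z) * h + z * h_tilde             # interpolated new state

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = [rng.standard_normal((d_in, d_h)) * 0.1 for _ in range(3)]
U = [rng.standard_normal((d_h, d_h)) * 0.1 for _ in range(3)]
b = [np.zeros(d_h) for _ in range(3)]
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):           # run a short input sequence
    h = gru_cell(x, h, W, U, b)
print(h.shape)
```

Because the new state is a convex combination of the old state and a tanh-bounded candidate, the hidden state stays bounded, which helps the GRU track the smooth, strongly regular component.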

From Figures 6 and 7, it can be concluded that point forecasting with the GRU model is more accurate than with the SVR model in both dataset A and dataset B. This is because the GRU neural network, a deep learning method, outperforms the SVR model, a classical machine learning method, in accuracy and precision. To show the forecasting errors of the two models more directly, Figure 8 plots their RMSE and MAE. From Figure 8, it can be concluded that the GRU model is more accurate than the SVR model in every season: the root mean square error is reduced by at least 100 and the mean absolute error by at least 50. The GRU errors are consistently much smaller, further demonstrating its advantage over the SVR model.
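For reference, the two error metrics compared in Figure 8 can be computed as below; the load values and forecasts in the example are hypothetical, not taken from the paper's datasets:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(err)))

# Hypothetical loads (MW) with two competing point forecasts.
actual = np.array([5200.0, 5350.0, 5100.0, 4980.0])
pred_a = np.array([5180.0, 5400.0, 5150.0, 4950.0])  # the more accurate forecast
pred_b = np.array([5050.0, 5500.0, 4900.0, 5100.0])
print(rmse(actual, pred_a), mae(actual, pred_a))
print(rmse(actual, pred_b), mae(actual, pred_b))
```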

Interval Forecast
The weak regular fluctuation component obtained after decomposition and reconstruction is fed into the interval forecasting model, and the fluctuation intervals obtained from interval forecasting are accumulated with the point forecasting results to obtain the electricity load forecasting intervals. To better analyze and compare the forecasting results, electricity load interval forecasting is performed at the 80%, 85%, and 90% confidence levels. The forecasting results of dataset A and dataset B at the 90% confidence level are given in Figures 9 and 10; to show the details more clearly, a partial enlargement is given at the lower right of each figure.
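The accumulation step can be sketched as follows. This is a minimal illustration assuming the fluctuation interval is simply added to the point forecast, and it uses empirical quantiles of the weak component as a stand-in for the paper's SVQR quantile estimates; all numbers are hypothetical:

```python
import numpy as np

def forecast_interval(point_fc, weak_component_samples, confidence=0.90):
    """Add empirical quantiles of the weak-regularity component to the
    point forecast of the strong-regularity component (SVQR stand-in)."""
    alpha = 1.0 - confidence
    lo = np.quantile(weak_component_samples, alpha / 2)
    hi = np.quantile(weak_component_samples, 1 - alpha / 2)
    point_fc = np.asarray(point_fc, dtype=float)
    return point_fc + lo, point_fc + hi

rng = np.random.default_rng(1)
point = np.array([5000.0, 5100.0, 5050.0])     # hypothetical point forecasts (MW)
weak = rng.normal(0.0, 30.0, size=2000)        # hypothetical residual fluctuation
lower, upper = forecast_interval(point, weak, confidence=0.90)
print(lower, upper)
```

A narrower residual distribution directly yields a narrower forecasting interval, which is why routing only the weak regular component through interval forecasting keeps the final intervals tight.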
The following conclusions can be drawn from Figures 9 and 10:
1. Compared to the pure interval forecasting model, the addition of the point-interval idea ensures interval accuracy while reducing the interval width.
2. The accuracy of the intervals obtained by using SVR for point forecasting in the combined model is not as high as that obtained by using the GRU model, which indicates that deep learning still has a clear advantage over machine learning in terms of accuracy.
3. According to interval width and interval coverage, the models can be broadly classified into three levels: the first level comprises models that use the deep learning GRU network for point forecasting, such as VMD-SE-GRUQR and VMD-SE-GRU-SVQR; the second level comprises models that use machine learning SVR for point forecasting, such as VMD-SE-SVQR and VMD-SE-SVR-GRUQR; and the third level comprises forecasting models that do not use the point-interval idea, such as SVQR and GRUQR. The forecasting intervals obtained from the first-level models have higher coverage and narrower width, and are closer to the actual electricity load values.

Tables 5 and 6 show the MC values of the forecasting intervals obtained by each model in different seasons. Table 7 shows the training time of each model.
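The interval quality metrics referred to above can be computed as below. The excerpt does not define MC explicitly; the sketch assumes the composite criterion common in this literature, namely the ratio of PINAW to PICP (lower is better, penalizing wide or poorly covering intervals); the data are hypothetical:

```python
import numpy as np

def picp(y, lo, hi):
    """Prediction interval coverage probability."""
    return float(np.mean((y >= lo) & (y <= hi)))

def pinaw(y, lo, hi):
    """Prediction interval normalized average width (normalized by target range)."""
    return float(np.mean(hi - lo) / (np.max(y) - np.min(y)))

def mc(y, lo, hi):
    """Assumed composite metric: normalized width per unit of coverage."""
    return pinaw(y, lo, hi) / picp(y, lo, hi)

y = np.array([100.0, 110.0, 105.0, 98.0, 120.0])
lo = y - 5.0     # tight interval that still covers every point
hi = y + 5.0
lo2 = y - 20.0   # wide interval: same coverage, worse MC
hi2 = y + 20.0
print(mc(y, lo, hi), mc(y, lo2, hi2))
```

Two interval sets with identical coverage are thus ranked by width, which is the trade-off the MC comparisons in Tables 5 and 6 capture.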

In order to show the results of each model more visually, Figure 11 compares the MC values of the different models in each quarter. From Tables 5-7 and Figure 11, the following conclusions can be drawn:
1. The VMD-SE-GRUQR and VMD-SE-GRU-SVQR models produce better forecasting intervals than the other models.
2. Comparing the GRUQR, VMD-SE-GRUQR, and VMD-SE-GRU-SVQR models shows that decomposing and reconstructing the time series before forecasting can effectively reduce the MC values.
3. Comparing the VMD-SE-GRUQR, VMD-SE-SVQR, VMD-SE-SVR-GRUQR, and VMD-SE-GRU-SVQR models, the models that use GRU for point forecasting possess smaller MC values.
4. Comparing the VMD-SE-GRUQR and VMD-SE-GRU-SVQR models, both have low MC values, indicating that the forecasting intervals obtained by both are superior, but the latter requires a shorter training time than the former.

To better evaluate the forecasting ability of each model on the two datasets, Table 8 shows the average values of PICP, PINAW, and MC at different confidence levels. From Table 8, it can be found that all models except the VMD-SE-SVQR model achieve more than 90% interval coverage; the shortfall is caused by the limited accuracy of the machine learning point forecast and is remedied by the deep learning model. In terms of average interval width, the VMD-SE-SVQR and VMD-SE-GRU-SVQR models have narrower intervals, while the VMD-SE-GRUQR and VMD-SE-SVR-GRUQR models have relatively wider intervals, because neural network interval forecasting is less effective when the training data fluctuate strongly. In addition, the VMD-SE-GRU-SVQR model proposed in this paper has the smallest MC, only 0.13, lower than all other models.

Since the models that do not combine point-interval forecasting clearly have worse MC values than the other models, the models that combine point-interval forecasting are compared again.
Figure 12 shows the average MC values of the four combined point-interval forecasting models. The VMD-SE-GRU-SVQR model proposed in this paper has the lowest average MC at all three confidence levels; the MC values of VMD-SE-GRUQR, VMD-SE-SVR-GRUQR, and VMD-SE-SVQR are slightly higher, while the SVQR and GRUQR models have excessively large MC values, which fully demonstrates the superiority of the point-interval forecasting model proposed in this paper.