Article

Multi-Step Ahead Forecasting of the Energy Consumed by the Residential and Commercial Sectors in the United States Based on a Hybrid CNN-BiLSTM Model

School of Public Affairs and Administration, University of Electronic Science and Technology of China, Chengdu 611731, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(3), 1895; https://doi.org/10.3390/su15031895
Submission received: 2 January 2023 / Revised: 16 January 2023 / Accepted: 16 January 2023 / Published: 19 January 2023
(This article belongs to the Special Issue Development Trends of Environmental and Energy Economics)

Abstract:
COVID-19 has continuously influenced energy security and caused an enormous impact on human life and social activities due to stay-at-home orders. After the Omicron wave, the economy and the energy system are gradually recovering, but uncertainty remains due to virus mutations that could arise. Accurate forecasting of the energy consumed by the residential and commercial sectors is challenging but essential for efficient emergency management and policy-making. Affected by geographical location and long-term evolution, the time series of the energy consumed by the residential and commercial sectors has prominent temporal and spatial characteristics. A hybrid model (CNN-BiLSTM) based on a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) is proposed to extract the time series features, where the spatial features of the time series are captured by the CNN layer and the temporal features are extracted by the BiLSTM layer. Then, a recursive multi-step ahead forecasting strategy is designed for multi-step ahead forecasting, and grid search is employed to tune the model hyperparameters. Four cases of 24-step ahead forecasting of the energy consumed by the residential and commercial sectors in the United States are given to evaluate the performance of the proposed model, in comparison with 4 deep learning models and 6 popular machine learning models based on 12 evaluation metrics. Results show that the CNN-BiLSTM outperforms all other models in the four cases, with MAPEs ranging from 4.0034% to 5.4774%, an improvement of 0.1252% to 49.1410% over the other models; its MAPE is also about 5 times lower than that of the CNN and 5.9559% lower than that of the BiLSTM on average. It is evident that the proposed CNN-BiLSTM improves the prediction accuracy of the CNN and BiLSTM and has great potential in forecasting the energy consumed by the residential and commercial sectors.

1. Introduction

With the declaration of the COVID-19 outbreak as a pandemic on 11 March 2020 by the World Health Organization (WHO), COVID-19 has resulted in a massive global toll and an unprecedented impact on public health, economic growth, and energy security. As of 23 December 2022, there have been 99,027,628 confirmed cases of COVID-19, with 1,080,010 deaths, reported to WHO [1], and the economy also took a devastating hit due to the COVID-19 pandemic, with rising unemployment rates and a dramatically slowed GDP growth rate, as shown in Figure 1. With energy security remaining a priority issue, the United States has also been affected by the pandemic, with total energy consumption falling to 93 quads, a decrease of 7% compared to 2019, according to EIA's Energy Review [2]. In an effort to reduce the spread of COVID-19 associated with community activities, the government and the Centers for Disease Control and Prevention (CDC) issued mandatory stay-at-home orders in 42 states and territories from 1 March through 31 May 2020 [3]. Affected by the lockdowns (such as closed offices, businesses, schools, and industrial facilities, and curtailed recreation), energy consumption by the commercial and residential sectors fell by 7% to less than 17 quads and by 1% to less than 21 quads in 2020, respectively (the latter reflecting less energy consumption for home heating due to the warm weather in 2020). Specifically, compared to April 2019, U.S. residential electricity sales were up 8%, while the commercial sector was down 11% [4]. Several studies have further documented the significant changes in the energy consumption of the residential and commercial sectors and analyzed the reasons from the perspective of the lockdowns during the COVID-19 pandemic [5,6,7], revealing that it will take some time for the energy sector to return to its "normal" status.
More importantly, with more relaxed measures by governments and societies worldwide after the Omicron wave and economic recovery activities, but also with the risk of a new phase of new variants, COVID-19 will continue to be a major source of uncertainty for energy consumption, especially in the residential and commercial sectors that are tightly connected to human life. Therefore, accurate forecasting of residential and commercial energy consumption is of great significance for energy security, social operation, and safe living in the face of these challenges.
For energy consumption forecasting, the forecasting horizon is conventionally divided into short-term, medium-term, and long-term forecasting: short-term forecasting usually predicts the data from one hour to one week (hourly, daily, and weekly horizons), medium-term forecasting is usually applied to intervals ranging from one week to one year (monthly and quarterly horizons), and long-term forecasting is for one year or more than one year (annual horizons) [8,9,10]. For the application fields, Soldo et al. [11] divided the application forecasting fields of energy consumption mainly from the world level, national or state level, and urban areas, based on the existing research. Currently, the forecasting methods for energy consumption can be roughly divided into statistical forecasting models (mainly time series models), grey system models, deep learning models, and machine learning models.
The time series models are the most traditional forecasting methods. In recent years, many scholars have extended the traditional time series models at both the modeling and application levels and made a series of achievements in the energy consumption forecasting field. For instance, the seasonal autoregressive integrated moving average with exogenous factors (SARIMAX) was proposed to forecast residential natural gas consumption in Turkey [12], a recursive autoregressive model with extra input was utilized for short-term gas consumption forecasting [13], an autoregressive distributed lag mixed-data sampling model was applied to the long-term energy demand forecasting of China [14], a wavelet transform autoregressive integrated moving average model was proposed for load forecasting [15], and a Bayesian vector autoregressive model was used for energy supply and demand forecasting [16]. Energy consumption forecasting based on time series models exploits both the variations and the historical patterns in the data, with consideration of trend or seasonal effects, and the forecasting horizons are diverse, ranging from short-term to long-term. However, time series models generally perform better in medium-term and long-term forecasting than in short-term forecasting, as they are limited in dealing with large-scale data with complex nonlinearity.
Grey system models can be used in nonlinear time series forecasting and have been widely applied in energy forecasting in recent years. Zhang et al. [17] proposed the grey Lotka–Volterra model to analyze the internal relationship between the energy consumption structures of China, the United States (US), and Germany. A fractional nonlinear grey Bernoulli model with rolling was proposed to forecast the share of renewables in primary energy consumption [18]. Khan and Osińska [19] proposed an optimized nonlinear grey Bernoulli model for energy consumption forecasting in Brazil and India. Chen et al. [20] proposed the optimal nonlinear metabolic grey system model to forecast the energy consumption in the Yangtze River Delta region. Ding et al. [21] proposed the seasonal adaptive grey model to forecast the medium-term renewable energy consumption of the commercial sector in the US. A nonlinear interactive grey system model based on dynamic compensation was proposed for total energy consumption forecasting in China [22]. With their outstanding ability to cope with uncertain problems characterized by small samples and inadequate information, grey system models are efficient in long-term yearly forecasting and often yield higher precision than models that heavily rely on historical data [23]. Besides, grey system models also perform well in nonlinear long-term forecasting tasks thanks to the development of "grey-box" models combined with other nonlinear approaches. However, few studies have applied grey system models to medium-term and short-term forecasting, and those that have often report poor precision [9].
Compared to the conventional time series models and grey system models, data-driven machine learning models are preferred for short-term and medium-term forecasting, as these models need to learn useful information from the data, along with hyperparameter tuning, to obtain efficient forecasting capability. At present, many scholars directly apply single machine learning models to energy forecasting and obtain good prediction results. For example, Gaussian process regression was used in short/medium-term forecasting of solar and wind power [24]. Using multiple machine learning models for energy forecasting and comparing their prediction performance is also a popular research approach [25,26]. However, many studies have shown that hybrid models outperform single models, such as hybrid models based on support vector regression (SVR) [27,28,29,30], a hybrid model based on extreme gradient boosting (XGB) [31], and hybrid models based on multiple machine learning models [32]. Machine learning models have the advantage of dealing with complex nonlinear problems in short-term and medium-term forecasting. However, they have limitations in monthly and quarterly forecasting and often present a "time delay" problem, leading to relatively low prediction accuracy.
With the popularity of deep learning, many neural network models are widely utilized in energy consumption forecasting. Similar to machine learning models, deep learning models are also data-driven models that need a large amount of data for training and for optimizing the model hyperparameters. Therefore, deep learning models are widely used in short-term and medium-term forecasting, with a strong capability of handling complex nonlinear problems, as well as outstanding precision and robustness. At present, the long short-term memory (LSTM), the gated recurrent unit (GRU), the recurrent neural network (RNN), and the convolutional neural network (CNN) are the most popular neural network models, and many studies have been carried out based on them. Mustaqeem et al. [33] proposed a CNN-assisted deep echo state network for short-term solar energy forecasting. Amalou et al. [34] compared the RNN, LSTM, and GRU in different energy consumption forecasting tasks. A hybrid model based on the variational auto-encoder and LSTM was proposed to forecast the renewable electricity supply in South Korea [35]. Yang et al. [36] used a nonlinear mapping network to forecast wind power. Ren et al. [37] used the LSTM, stacked LSTM, and bidirectional LSTM to forecast the multiple energy loads of a university. Ding et al. [38] forecasted renewable energy generation using seasonal and trend decomposition using Loess (STL) and LSTM. Khan et al. [39] proposed an integrated model based on the artificial neural network (ANN) and the LSTM for solar PV generation forecasting.
Many studies show that connecting neural networks with different layers is an effective method to improve the prediction accuracy of a single neural network, such as LSTM-RNN [40], A-CNN-LSTM (A is an attention layer) [41], CNN-GRU [42], CNN-LSTM-AE (AE is an autoencoder) [43], RNN-LSTM [44], ConvLSTM-BDGRU-MLP (ConvLSTM is convolutional LSTM; BDGRU is bidirectional GRU; MLP is multilayer perceptron) [45], CNN-BiGRU (BiGRU is bidirectional GRU) [46], EWT-attention-LSTM (EWT is the empirical wavelet transform) [47], LSTM-GAM (GAM is a global attention mechanism) [48], and CNN-LSTM [49]. It can be seen from the above research that CNN and LSTM are the most popular layers, as they have a strong ability to capture data features. In fact, the bidirectional LSTM (BiLSTM) has been shown to perform better than the LSTM in forecasting energy consumption [50,51]. Therefore, a hybrid model based on the CNN and the BiLSTM is proposed in this paper to forecast the energy consumed by the residential and commercial sectors.
The rest of this paper is arranged as follows: Section 2 introduces the modeling process; Section 3 presents the parameter-solving process of the proposed model; Section 4 gives the four forecasting cases of the energy consumed by the residential and commercial sectors; Section 5 presents the discussion; Section 6 gives the conclusions.

2. Methodology

2.1. Spatial Features Extraction of Time Series Based on Convolutional Neural Network

The time series of the energy consumption of the residential and commercial sectors presents nonlinear and nonstationary features affected by the geographical location of residential and commercial areas, and these spatial features are difficult to capture. However, the convolutional neural network (CNN) can solve this problem, as it can learn the structure of the time series very well. The CNN is a neural network with convolution, which reduces the complexity of the network. Local feature extraction and weight sharing are the key operations of the CNN. Since the spatial connection of the time series is local, each neuron only needs to capture local features; the global information can then be obtained by combining these local neurons, which reduces the number of connections. Weight sharing reduces the number of parameters of the CNN, making the structure of the network simpler and more adaptable. The convolution layer and the pooling layer are the main modules that achieve the feature extraction of the CNN.
Convolution layer: The main effect of the convolution layer in this paper is to capture the spatial features of the time series. Different from the traditional weight connection layer, this layer extracts features through the convolution operation, and the one-dimensional convolution layer is used here. When performing a convolution operation, the convolution kernel slides over the input and calculates the inner product with the input to obtain a new feature matrix. Assuming a one-dimensional time series input $x = \{x(\xi)\}_{\xi=1}^{N}$, the output from the convolution layer with one convolution kernel $w = \{w(\xi)\}_{\xi=1}^{\kappa}$ is
$$H = \big(h(j)\big)_{j=1}^{N-\kappa+1}, \quad h(j) = w \cdot x_j = \sum_{\xi=1}^{\kappa} w(\xi)\, x_j(\xi),$$
where $x_j$ is the input subsequence covered by the $j$th slide of $w$ and $\kappa$ is the kernel size. The convolution kernel has only one channel, as the input is one-dimensional. In this way, the output size is entirely determined by the kernel size and the input size.
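The sliding inner product above can be sketched in a few lines of numpy; this is a minimal illustration of the convolution formula, not the paper's implementation, and the input values are arbitrary:

```python
import numpy as np

def conv1d_valid(x, w):
    """'Valid' 1-D convolution (cross-correlation, as used in CNN layers):
    the kernel w of size kappa slides over x and takes inner products,
    giving an output of length N - kappa + 1."""
    N, kappa = len(x), len(w)
    return np.array([np.dot(w, x[j:j + kappa]) for j in range(N - kappa + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # N = 5
w = np.array([1.0, 0.0, -1.0])            # kappa = 3
h = conv1d_valid(x, w)                    # length N - kappa + 1 = 3
```

Note that the output length is fully determined by the kernel size and input size, as stated above.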
Pooling layer: The main function of this layer is to compress the feature matrix by reducing the number of features to avoid overfitting. The two commonly used pooling methods are max pooling and average pooling. Max pooling divides the feature matrix output from the convolution layer into different subregions and then takes the maximum value of each subregion to obtain a new feature matrix that is smaller than the raw feature matrix. Similarly, average pooling takes the average value of each subregion to obtain a new feature matrix. Average pooling preserves the overall features of the data but reduces the feature differences, while max pooling extracts the most responsive part of the features. Therefore, the one-dimensional max pooling layer is utilized in this paper. Assuming the input size of the max pooling is $(M, C, N_{\mathrm{in}})$, the output can be described as
$$\mathrm{out}(M_j, C_i, \xi) = \max_{t=0,\dots,\kappa-1} \mathrm{input}(M_j, C_i, \alpha \times \xi + t),$$
where $\alpha$ is the stride of the sliding window. The output size is $(M, C, N_{\mathrm{out}})$, where
$$N_{\mathrm{out}} = \left\lfloor \frac{N_{\mathrm{in}} - \kappa}{\alpha} \right\rfloor + 1.$$
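A minimal numpy sketch of one-dimensional max pooling, illustrating the output-size formula above (the input values and window settings are arbitrary):

```python
import numpy as np

def max_pool1d(x, kappa, alpha):
    """1-D max pooling: take the maximum over windows of size kappa,
    moving with stride alpha; the output length is
    floor((N_in - kappa) / alpha) + 1."""
    n_out = (len(x) - kappa) // alpha + 1
    return np.array([x[alpha * j:alpha * j + kappa].max() for j in range(n_out)])

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])  # N_in = 6
y = max_pool1d(x, kappa=2, alpha=2)           # N_out = (6 - 2) // 2 + 1 = 3
```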

2.2. Temporal Features Extraction of Time Series Based on Bidirectional Long Short-Term Memory (BiLSTM)

The long short-term memory (LSTM) introduced in 1997 is an extension of the RNN that solves the problem of exploding and vanishing gradients more effectively than conventional RNN models [52]. It has been popularized in a variety of tasks across different research fields, such as machine translation [53], handwriting recognition [54], time series prediction [55], and so on. Among its various applications, LSTMs are well suited to time series forecasting, with the advantage of precisely learning long-term dependencies. With the popularity of the LSTM and its outstanding behavior, several variants of the LSTM have appeared in recent years, of which the BiLSTM developed by Jürgen Schmidhuber [56] is one of the most widely applied and often provides better forecasting performance than regular LSTM models in time series forecasting [57]. Therefore, the BiLSTM model is utilized in this work.
For a common LSTM unit, let $C_t$ denote the cell state, which represents the memory of the LSTM, containing both previous and current information. Besides, there are three gates in each LSTM unit to control the flow of information selectively: the input gate $i_t$, the output gate $o_t$, and the forget gate $f_t$. With the weight matrices $\{W_j \mid j = f, C, i, o\}$, bias terms $\{b_j \mid j = f, C, i, o\}$, current input data $x_t$, and previous hidden state $h_{t-1}$, the whole process of a single LSTM unit can be explained in three parts: the forget gate selects the information, the input gate and new memory store and update new information, and the output gate determines what to output from the cell state. A more direct illustration of the LSTM model is shown in Figure 2.
The forget gate: The output value of the forget gate is obtained by a Sigmoid function $\sigma$ applied to the combination of $h_{t-1}$ and $x_t$, deciding which information should be deleted from the memory and which should be retained. It is given as
$$f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big).$$
The input gate and new memory network: With the same $x_t$ and $h_{t-1}$, this section consists of two parts: one is the input gate, deciding which new information to add according to the Sigmoid function $\sigma$, similar to the forget gate; the other is a new vector $\hat{C}_t$ created by the tanh function, which maps the output into the interval $[-1, 1]$. In short, they are described by the following formulations:
$$i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big),$$
$$\hat{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big).$$
Then, the new cell state $C_t$ is updated by multiplying $f_t$ and $C_{t-1}$ and adding $i_t \odot \hat{C}_t$, as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \hat{C}_t.$$
The output gate: The output gate determines what output information to generate; the formulas for calculating $o_t$ and the new hidden state $h_t$ are given as
$$o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big), \quad h_t = o_t \odot \tanh(C_t).$$
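The gate equations above can be traced with a single numpy LSTM step; this is an illustrative sketch with randomly initialized toy weights, not a trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step following the gate equations: each weight matrix
    acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    C_hat = np.tanh(W["C"] @ z + b["C"])     # new candidate memory
    C_t = f_t * C_prev + i_t * C_hat         # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(C_t)                 # new hidden state
    return h_t, C_t

# Toy dimensions: input size 2, hidden size 3 (illustrative values only).
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 5)) for k in "fiCo"}
b = {k: np.zeros(3) for k in "fiCo"}
h_t, C_t = lstm_step(rng.standard_normal(2), np.zeros(3), np.zeros(3), W, b)
```

Because the hidden state is an output-gated tanh of the cell state, each component of `h_t` necessarily lies in (-1, 1).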
Compared with the conventional one-directional LSTM, the BiLSTM contains two independent LSTM structures in which feature learning is performed in forward and reverse order over the input sequence. In this way, the model is trained not only from input to output but also from output to input, thereby effectively capturing longer-range dependencies and improving the model's forecasting accuracy. For the BiLSTM, the hidden state $h_t$ consists of both the forward $\overrightarrow{h}_t$ and backward $\overleftarrow{h}_t$, which is expressed as follows:
$$h_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t,$$
where ⊕ represents the element-wise summation of the forward and backward output components, and the network structure of the BiLSTM is presented in Figure 3.

2.3. The CNN-BiLSTM

The hybrid model that combines the CNN, the BiLSTM and the connection layer is proposed to forecast the energy consumed by the residential and commercial sectors, abbreviated as CNN-BiLSTM. The input of the CNN-BiLSTM first enters the CNN layer, and a new feature matrix is generated after convolution computation and max pooling. The feature matrix obtained from the CNN is utilized as the input of the BiLSTM, and then the hidden output of the BiLSTM is obtained. The hidden output is pushed into the connection layer, which consists of a linear layer. Then, the final predicted results are obtained from the connection layer.
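The layer stacking described above can be sketched in PyTorch. The channel counts, kernel size, and hidden size below are illustrative assumptions, not the paper's tuned configuration, and the single-linear-layer head stands in for the connection layer:

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Minimal CNN-BiLSTM sketch: Conv1d + MaxPool1d extract spatial
    features, a bidirectional LSTM extracts temporal features, and a
    linear connection layer produces the one-step-ahead forecast."""
    def __init__(self, kernel_size=3, hidden_size=16, channels=8):
        super().__init__()
        self.conv = nn.Conv1d(1, channels, kernel_size)
        self.pool = nn.MaxPool1d(2)
        self.bilstm = nn.LSTM(channels, hidden_size,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, 1)  # forward + backward halves

    def forward(self, x):                          # x: (batch, 1, lag)
        z = self.pool(torch.relu(self.conv(x)))    # (batch, channels, L)
        z = z.transpose(1, 2)                      # (batch, L, channels)
        out, _ = self.bilstm(z)
        return self.fc(out[:, -1, :])              # (batch, 1)

model = CNNBiLSTM()
y_hat = model(torch.randn(4, 1, 12))               # batch of 4, lag 12
```

The `2 * hidden_size` input to the linear layer reflects the concatenated forward and backward BiLSTM outputs.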

2.4. Forecasting Strategy for Univariate Time Series

The hybrid model CNN-BiLSTM describes the relationship between input and output. A common method is to use the recursive multi-step forecasting strategy to construct the univariate time series forecasting scheme. Since the CNN-BiLSTM is a supervised learning method, the univariate time series needs to be reconstructed to match the input and output of the CNN-BiLSTM. Assuming a univariate time series sample $\{y(1), y(2), \dots, y(n)\}$ with lag $\tau$, the predicted value of $y(\tau+1)$ can be obtained from the previous $\tau$ steps. Then, the one-dimensional vector is reconstructed into an $(n-\tau) \times (\tau+1)$ matrix. The reconstructed sample matrix $\Theta$ can be generated as
$$\Theta = [Y(1), Y(2), \dots, Y(\tau), Y(\tau+1)] = \begin{bmatrix} y(1) & y(2) & \cdots & y(\tau) & y(\tau+1) \\ y(2) & y(3) & \cdots & y(\tau+1) & y(\tau+2) \\ \vdots & \vdots & & \vdots & \vdots \\ y(n-\tau-1) & y(n-\tau) & \cdots & y(n-2) & y(n-1) \\ y(n-\tau) & y(n-\tau+1) & \cdots & y(n-1) & y(n) \end{bmatrix}_{(n-\tau) \times (\tau+1)},$$
where $Y(j) = [y(j), y(j+1), \dots, y(n-\tau+j-1)]^T$ is the $j$th column vector of $\Theta$. Then, the input of the CNN-BiLSTM is the matrix $X$ consisting of the first $\tau$ column vectors $[Y(1), Y(2), \dots, Y(\tau)]$, and the output is the $(\tau+1)$th column vector $Y(\tau+1)$ in Equation (10). The detailed input and output can be presented by
$$X = \begin{bmatrix} y(1) & y(2) & \cdots & y(\tau) \\ y(2) & y(3) & \cdots & y(\tau+1) \\ \vdots & \vdots & & \vdots \\ y(n-\tau-1) & y(n-\tau) & \cdots & y(n-2) \\ y(n-\tau) & y(n-\tau+1) & \cdots & y(n-1) \end{bmatrix}_{(n-\tau) \times \tau}, \quad Y = \begin{bmatrix} y(\tau+1) \\ y(\tau+2) \\ \vdots \\ y(n-1) \\ y(n) \end{bmatrix}_{(n-\tau) \times 1}.$$
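The reconstruction of the univariate series into supervised (X, Y) pairs can be sketched as follows; the toy series is arbitrary:

```python
import numpy as np

def reconstruct(y, tau):
    """Rebuild a univariate series into supervised pairs: each row of X
    holds tau consecutive values, and the matching entry of Y is the
    next value in the series."""
    n = len(y)
    X = np.array([y[j:j + tau] for j in range(n - tau)])   # (n - tau, tau)
    Y = np.array([y[j + tau] for j in range(n - tau)])     # (n - tau,)
    return X, Y

y = np.arange(1, 8, dtype=float)   # y(1), ..., y(7), so n = 7
X, Y = reconstruct(y, tau=3)       # X is (7-3) x 3, Y has length 4
```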
When forecasting the time series, the new predicted value of y ( ξ ) is added to the input for the next time step, then the future points can be forecasted by the CNN-BiLSTM based on the recursive strategy. The detailed recursive process is expressed as follows:
$$\hat{y}(n+1) = g\big(y(n-\tau+1), \dots, y(n-1), y(n)\big),$$
$$\hat{y}(n+2) = g\big(y(n-\tau+2), \dots, y(n), \hat{y}(n+1)\big),$$
$$\vdots$$
$$\hat{y}(n+\tau+1) = g\big(\hat{y}(n+1), \hat{y}(n+2), \dots, \hat{y}(n+\tau)\big).$$
It is interesting to see that once the forecasting reaches step $\tau+1$, the input vector consists entirely of forecasted values, which implies that a complete extrapolation has been achieved. The detailed calculation process is given by Algorithm 1.
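The recursive strategy above can be sketched with a stand-in predictor; the function `g` below is a hypothetical toy model (linear continuation), used only to show how each prediction is fed back into the input window:

```python
import numpy as np

def recursive_forecast(g, history, tau, steps):
    """Recursive multi-step strategy: each new prediction is appended to
    the window and fed back as input for the next step. g maps a
    length-tau window to the next value (a stand-in for the fitted
    CNN-BiLSTM)."""
    window = list(history[-tau:])
    preds = []
    for _ in range(steps):
        y_hat = g(np.array(window))
        preds.append(y_hat)
        window = window[1:] + [y_hat]   # slide the window forward
    return preds

# Hypothetical predictor: continue the series by its mean increment.
g = lambda w: w[-1] + np.mean(np.diff(w))
preds = recursive_forecast(g, [1.0, 2.0, 3.0, 4.0], tau=3, steps=4)
```

After step tau + 1 = 4, the window consists entirely of predicted values, matching the extrapolation remark above.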
Algorithm 1: Algorithm of recursive multi-step forecasting strategy.
The overall workflow of the CNN-BiLSTM is plotted in Figure 4.

3. Hyperparameters Tuning Based on the Grid Search Approach

It is generally acknowledged that the performance of a deep learning model heavily depends on the selection of hyperparameters, since neural networks are notoriously difficult to configure and require multiple hyperparameters to be set. In recent years, swarm intelligence optimization algorithms have been widely used to tune model hyperparameters. However, some essential factors may affect the optimization precision, such as the high computational cost, algorithm-based parameter tuning and control, and the rate of convergence [58]. For instance, the popular swarm-based particle swarm optimization (PSO) is weak in local exploitation and easily falls into a local optimum with a low convergence rate, resulting in low precision or even failure [59,60,61]. The grid search method is one of the most common and direct ways of efficiently tuning model hyperparameters and has been successfully applied to tuning the hyperparameters of deep learning models [62,63,64]. Therefore, the grid search approach is adopted to tune the optimal hyperparameters of the models in this work.
The framework for the hyperparameter optimization based on grid search is developed in mainly two aspects: data division and the grid search scheme. Firstly, nested cross-validation [65] is utilized in this work for data division, a popular method for hyperparameter optimization and model selection to overcome the overfitting problem in time series forecasting. Assuming a training size $\varsigma$ (80% of the data), a validation size $\upsilon$ (10% of the data), and a testing size $p$ (10% of the data), the raw input $X$ and output $Y$ data are split into three parts: the training data $(X_\varsigma, Y_\varsigma)$ for model fitting, the validation data $(X_\upsilon, Y_\upsilon)$ for model validation, and the testing data $(X_p, Y_p)$ for evaluating the out-of-sample forecasting performance of the model. Then, the grid search approach is employed to find the optimal model hyperparameters based on the training and validation data. For the CNN-BiLSTM model, there are three main hyperparameters to be tuned: the kernel size $\kappa$ of the CNN, the hidden size of the BiLSTM, and the learning rate $lr$, an important hyperparameter for training a good deep learning model. Within the grid of hyperparameters, the key to grid-search-based hyperparameter optimization is to find the combination that minimizes the mean squared error (MSE) on the validation data $(X_\upsilon, Y_\upsilon)$, which can be expressed as
$$\min \frac{1}{\upsilon} \sum_{k=1}^{\upsilon} \big(y_\upsilon(k) - \hat{y}_\upsilon(k)\big)^2.$$
Based on the optimally selected hyperparameters, the CNN-BiLSTM model is refitted on the in-sample data (containing both the training data $X_\varsigma$ and the validation data $X_\upsilon$). The fitted model with the optimal hyperparameters is then used to evaluate the model's predictive ability on the out-of-sample testing data $X_p$. The complete computational procedure of grid-search-based hyperparameter tuning is presented in Algorithm 2.
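The grid search loop can be sketched in plain Python; the `fit_predict` function below is a hypothetical stand-in for training the CNN-BiLSTM and predicting on the validation set, and the grid values are arbitrary:

```python
import numpy as np
from itertools import product

def grid_search(fit_predict, grid, X_train, y_train, X_val, y_val):
    """Exhaustive grid search: fit a model for every hyperparameter
    combination and keep the one with the lowest validation MSE."""
    best, best_mse = None, np.inf
    for params in product(*grid.values()):
        cfg = dict(zip(grid.keys(), params))
        y_hat = fit_predict(cfg, X_train, y_train, X_val)
        mse = np.mean((y_val - y_hat) ** 2)
        if mse < best_mse:
            best, best_mse = cfg, mse
    return best, best_mse

# Toy stand-in: the 'model' just scales the last input value by cfg['lr'].
fit_predict = lambda cfg, Xt, yt, Xv: cfg["lr"] * Xv[:, -1]
grid = {"kernel_size": [2, 3], "hidden_size": [8, 16], "lr": [0.5, 1.0]}
X_val = np.array([[1.0, 2.0], [2.0, 3.0]])
y_val = np.array([2.0, 3.0])
best, best_mse = grid_search(fit_predict, grid, None, None, X_val, y_val)
```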
Algorithm 2: Grid search for tuning the optimal hyperparameters of the CNN-BiLSTM.

4. Case Studies

4.1. Data Collection and Pre-Processing

In this paper, the data were collected from the EIA Monthly Energy Review (https://www.eia.gov/totalenergy/data/monthly/, accessed on 13 June 2020) from January 1973 to May 2020, including the total energy consumed by the residential sector, the end-use energy consumed by the residential sector, the primary energy consumed by the commercial sector, and the end-use energy consumed by the commercial sector. Before data division, the data were first normalized into the interval $[0, 1]$ by MinMax scaling. Then, data from January 1973 to May 2018 (535 samples) were used as the in-sample data to fit the models and tune their hyperparameters, and data from June 2018 to May 2020 (24 steps ahead) were used as the testing data to verify the out-of-sample predictive ability of the models.
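The MinMax scaling step can be sketched as follows, together with the inverse transform needed to map the model's forecasts back to the original units (the toy values are arbitrary):

```python
import numpy as np

def minmax_scale(y):
    """MinMax scaling into [0, 1], returning the scaled series and an
    inverse function for restoring the original units."""
    y_min, y_max = y.min(), y.max()
    scaled = (y - y_min) / (y_max - y_min)
    inverse = lambda s: s * (y_max - y_min) + y_min
    return scaled, inverse

y = np.array([10.0, 15.0, 30.0])
scaled, inverse = minmax_scale(y)
restored = inverse(scaled)
```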

4.2. Benchmarked Models for Comparisons and Evaluation Metrics

In this study, four benchmarked deep learning models, including the single BiLSTM [56], CNN [66], and LSTM [52] and the combined CNN-LSTM [49], as well as six popular machine learning models, including the SVR [67] with the RBF kernel, the least squares support vector regression (LSSVR) [68] with the RBF kernel, random forest (RF) [69], gradient boosting with categorical features support (CatBoost) [70], the light gradient boosting machine (LGBM) [71], and XGBoost [72], were developed for model comparison. In addition, twelve evaluation metrics were utilized to verify the forecasting performance of the models. Detailed definitions of the metrics are presented in Table 1.
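As an example of the evaluation metrics, the standard MAPE definition can be computed as below; this assumes the conventional formula, with the paper's exact metric definitions given in Table 1:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (standard definition), reported
    here in percent."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

y_true = np.array([100.0, 200.0])
y_pred = np.array([110.0, 190.0])
err = mape(y_true, y_pred)   # (10% + 5%) / 2 = 7.5
```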

4.3. Forecasting Results Analysis

In order to comprehensively compare the CNN-BiLSTM model with other models, four actual cases (including the total energy consumed by the residential sector, end-use energy consumed by the residential sector, primary energy consumed by the commercial sector, and end-use energy consumed by the commercial sector) with the same lag are carried out.

4.3.1. Case I: Total Energy Consumed by the Residential Sector

In the 24 multi-step ahead forecasting of the total energy consumed by the residential sector, the out-of-sample predicted values are displayed in Figure 5, and the corresponding evaluation metrics are presented in Table 2. It is clear that the predicted values of the CNN-BiLSTM are the closest to the raw data. Additionally, the predicted curve of the BiLSTM is similar to that of the CNN-BiLSTM, which is easy to explain, as the BiLSTM is one of the layers of the CNN-BiLSTM. However, the BiLSTM failed to forecast the last two points. The CNN-BiLSTM clearly outperforms the CNN, which shows that the hybrid model can effectively improve the prediction performance of the single model. The predicted values of the other deep learning models are nearly constant; they fail to track the raw data because these models are overfitted. Among the machine learning models, the RF and LGBM perform well in this case, but the CatBoost and XGBoost only catch the rough trend of the raw data, and the SVR and LSSVR even obtain the wrong trends due to overfitting.
It is clear that the CNN-BiLSTM outperforms all the other models, as all its metrics are the best. Although the predicted values of the RF and LGBM are similar to those of the CNN-BiLSTM, their metrics are worse. Additionally, the prediction accuracy of the CNN-BiLSTM is obviously better than that of the single CNN and BiLSTM, and its MAPE is at least four times smaller than those of the other deep learning models. Among the machine learning models, the LSSVR performs the worst, and the SVR performs similarly. In general, the CNN-BiLSTM performs best in this case.

4.3.2. Case II: End-Use Energy Consumed by the Residential Sector

In the 24 multi-step ahead forecasting of the end-use energy consumed by the residential sector, the out-of-sample predicted values are plotted in Figure 6, and the corresponding evaluation metrics are presented in Table 3. It can be seen that the predicted points of the first 12 steps of the CNN-BiLSTM almost coincide with the original points. Although the prediction accuracy of most models worsens as the number of forecasting steps increases, the CNN-BiLSTM presents the best stability. The CNN-BiLSTM clearly outperforms the CNN and the BiLSTM, even though the predicted values of the BiLSTM are the closest to those of the CNN-BiLSTM. For the later points, however, the BiLSTM performs worse than the CNN-BiLSTM, which implies that its stability is also worse. The CNN-LSTM presents the worst performance among all the models, as the model is overfitted, and the CNN and the LSTM only capture the rough trends. The predicted curves of the machine learning models, except the SVR and LSSVR, are quite similar, but their predicted values are obviously not as close to the raw data as those of the CNN-BiLSTM.
Of all the models, the CNN-BiLSTM has the best metrics, while the CNN-LSTM has the largest error. Notably, the CNN-BiLSTM performs better than the single CNN and BiLSTM, which further shows that the hybrid of a convolution layer and a BiLSTM layer has stronger prediction ability. The prediction performance of the LSTM is still far worse than that of the BiLSTM. Although the predicted curves of most machine learning models look close to the raw data, their metrics are poor. The LGBM presents the best prediction accuracy among the machine learning models, yet it is still worse than the CNN-BiLSTM, and the SVR and LSSVR perform the worst among the machine learning models.

4.3.3. Case III: Primary Energy Consumed by the Commercial Sector

In the 24-step ahead forecasting of the primary energy consumed by the commercial sector, the out-of-sample predicted values are shown in Figure 7, and the corresponding evaluation metrics are presented in Table 4. In general, the CNN-BiLSTM model performs well, with all multi-step ahead predicted values close to the raw samples, and outperforms the other benchmark models. Only the CNN-BiLSTM successfully captured the fluctuations from July to September 2019, whereas the other models handle these fluctuations poorly. Moreover, the CNN-BiLSTM follows the trend with little time delay, while the predictions of all the other models exhibit a pronounced "time delay" characteristic. Compared with the single benchmark deep learning models (the BiLSTM, CNN, and LSTM), the CNN-BiLSTM model improves predictive ability considerably, and it is also interesting to see that the CNN-LSTM model performs quite poorly in this case, producing a nearly horizontal forecasting line due to overfitting. The CNN-BiLSTM is likewise more capable of coping with peak points and time delay than the machine learning models.
From the perspective of the out-of-sample evaluation metrics in Table 4, the CNN-BiLSTM model consistently produces lower error metrics (and the largest IA and R²) than the other benchmark deep learning and machine learning models, with a MAPE that is 6.6096% to 41.2681% lower. The R² values of the other deep learning models are relatively small, especially the CNN-LSTM with an R² of −1.8351, and none exceeds 0.9, whereas the R² of the CNN-BiLSTM reaches 0.972, revealing that the CNN-BiLSTM is well suited to out-of-sample forecasting.

4.3.4. Case IV: End-Use Energy Consumed by the Commercial Sector

In the 24-step ahead forecasting of the end-use energy consumed by the commercial sector, the out-of-sample predicted values are shown in Figure 8, and the corresponding evaluation metrics are presented in Table 5. As Figure 8 reflects, the CNN-BiLSTM model is the most effective in out-of-sample multi-step ahead forecasting. Among the deep learning models (plotted in (a)–(e)), the CNN-LSTM and LSTM perform very poorly in the 24-step ahead forecasting, with predicted values far from the raw samples; the reason for this phenomenon is that the models overfit. The CNN likewise fails to capture the continuing periodicity and instead produces an incorrect growing linear trend, as it is also overfitted. Compared to the single BiLSTM, combining the BiLSTM with the CNN improves forecasting ability, both in handling the fluctuations and in following the original trend. Among the machine learning models, a common phenomenon is that the predicted values lag by more steps, and these models struggle to predict values near the peak points. In contrast, the CNN-BiLSTM model captures peak points efficiently, with predicted values closely following the original data and a smaller "lag" characteristic in out-of-sample multi-step ahead forecasting.
The out-of-sample metrics of all models in Table 5 provide more precise numerical evidence that the CNN-BiLSTM model outperforms the popular machine learning models, the single deep learning models, and the combined CNN-LSTM model in out-of-sample time series forecasting. Relative to the benchmark deep learning models, the MAPE of the CNN-BiLSTM is almost five times smaller, reduced by between 14.0755% and 49.1410%, and its R² is 0.9330, while the R² values of the other deep learning models are negative. The out-of-sample multi-step predictive ability of the machine learning models is likewise inferior to that of the CNN-BiLSTM, with MAPEs exceeding 10%. In brief, the CNN-BiLSTM model consistently yields favorable forecasting performance.

5. Discussion

In summary, the CNN-BiLSTM model maintains preferable performance in all the cases, with out-of-sample predicted values that track the raw data closely and follow an accurate trend, and with metrics superior to those of all the benchmark deep learning models and the popular machine learning models. More detailed analyses of the model's predictive ability and application performance are presented below.

5.1. Comparisons of the CNN-BiLSTM Model and Benchmarked Machine Learning Models

The kernel models (the SVR and LSSVR) and tree models (RF, CatBoost, etc.) used in this work have produced favorable forecasting performance in recent years, yet they share homogeneous drawbacks: they are unable to follow the raw trends tightly and exhibit an apparent "lag" characteristic in multi-step ahead forecasting (noticeable in Case III and Case IV for the commercial sector). The machine learning models also weaken around peak points, with predicted values deviating above or below the raw data (in Case I and Case II for the residential sector). The deep learning CNN-BiLSTM model is more adaptable to these problems and performs stably in all cases, showing strength in extracting temporal features while simultaneously processing the spatial features of the time series. The metrics also reveal that the CNN-BiLSTM outperforms all the machine learning models in the four cases: the best-performing machine learning model yields RMSEs of 127.5901, 95.2964, 65.7881, and 95.5099, respectively, while the RMSEs of the CNN-BiLSTM are 110.8020, 68.3823, 27.1889, and 37.4739.
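Before any of the kernel or tree regressors above can be applied, the univariate consumption series must be framed as a supervised problem of lagged inputs predicting the next value. The following sketch shows this sliding-window framing; the synthetic series, the window length of 12, and the ordinary-least-squares fit (a dependency-free stand-in for SVR, RF, or LGBM) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def make_supervised(series, window):
    """Frame a univariate series as (lag window -> next value) pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X, y

# Hypothetical toy series standing in for the monthly consumption data:
# a 12-month seasonal cycle plus a mild upward trend.
series = np.sin(np.arange(120) * 2 * np.pi / 12) + np.arange(120) * 0.01
X, y = make_supervised(series, window=12)

# Any regressor (SVR, RF, LGBM, ...) would fit (X, y) here; ordinary
# least squares is used only to keep the sketch dependency-free.
coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
```

Because each row of `X` contains only past observations, the "lag" behavior discussed above arises naturally: a model trained this way can only react to turning points after they appear in its input window.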

5.2. Comparisons of the CNN-BiLSTM Hybrid Model and Benchmarked Deep Learning Models

In terms of the deep learning models, our results coincide with existing works comparing the BiLSTM and LSTM models in time series forecasting, which indicate that the BiLSTM consistently outperforms the LSTM in energy consumption forecasting [57] (the LSTM yields MAPEs of 37.6811, 28.0344, 19.9738, and 44.4674, while the BiLSTM yields 5.8555, 5.3492, 12.6515, and 18.0989). The forward LSTM of the BiLSTM extracts past information from the input sequence, and the reverse LSTM extracts future information, so the time series is trained in both directions, which further improves the global and complete feature-extraction ability. Moreover, combining the CNN and BiLSTM improves predictive ability relative to the single CNN and BiLSTM models (especially in Case IV, where the MAPEs of the CNN and BiLSTM are 24.0272 and 18.0789, respectively, while the CNN-BiLSTM achieves a MAPE of 4.0034): the hybrid model is more capable of reaching the peak points, with predicted values tightly following the raw trend.
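The forward/reverse mechanism described above can be made concrete with a deliberately simplified sketch: a plain tanh recurrence stands in for the LSTM cells (real LSTMs add input, forget, and output gates), and the two directional passes are concatenated per time step, as a BiLSTM layer does. All shapes and weight matrices here are illustrative, not the paper's architecture.

```python
import numpy as np

def simple_rnn(x, Wx, Wh, reverse=False):
    """One tanh recurrent pass over a sequence; reverse=True scans the
    sequence backwards, so each output carries future-side context."""
    seq = x[::-1] if reverse else x
    h = np.zeros(Wh.shape[0])
    out = []
    for step in seq:
        h = np.tanh(Wx @ step + Wh @ h)
        out.append(h)
    out = np.array(out)
    # Flip the backward outputs so position t aligns with input t.
    return out[::-1] if reverse else out

rng = np.random.default_rng(0)
x = rng.normal(size=(24, 1))                 # 24 time steps, 1 feature
Wx = rng.normal(size=(8, 1)) * 0.5           # input weights (8 hidden units)
Wh = rng.normal(size=(8, 8)) * 0.5           # recurrent weights
fwd = simple_rnn(x, Wx, Wh)                  # past-to-future context
bwd = simple_rnn(x, Wx, Wh, reverse=True)    # future-to-past context
h_bi = np.concatenate([fwd, bwd], axis=1)    # bidirectional features
```

At each step the concatenated state thus summarizes both what preceded and what follows that point in the sequence, which is the "dual training" property credited above for the BiLSTM's advantage over the one-directional LSTM.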

5.3. Performance Analysis of the Models Considering the COVID-19 Stay-at-Home Orders

In terms of the time period, the in-sample data used for model fitting with the optimal hyperparameters (January 1973 to May 2018) are entirely unaffected by COVID-19, while the out-of-sample data used to validate forecasting accuracy include both a period without COVID-19 influence (June 2018 to December 2019, the first 19 steps ahead) and a period affected by COVID-19 (January 2020 to May 2020, the 20th to 24th steps ahead). The figures show that the developed CNN-BiLSTM model not only performs well over the unaffected 19 steps but also adapts more effectively to the decreased consumption caused by COVID-19. In contrast, the other models are less adaptable during the COVID-19 period, even when their forecasts for the preceding period are promising, as with the BiLSTM and the tree models in Figure 5. Specifically, the United States government issued mandatory stay-at-home orders in March–May 2020, and the energy consumed by the residential and commercial sectors was severely affected by the lockdown. The CNN-BiLSTM model copes better with this period: its 22nd to 24th step predictions remain close to the raw data during the 2020 lockdown, and its multi-step forecasts for 2018–2019 show no significant deterioration. The other benchmark deep learning and machine learning models are comparatively poor at predicting the samples during the lockdown.

6. Conclusions

Forecasting the energy consumed by the residential and commercial sectors is of great significance. However, it is hard for traditional machine learning and deep learning models to obtain accurate forecasts, since the time series of the energy consumed by these sectors has obvious temporal and spatial characteristics. Hence, this paper proposes a hybrid CNN-BiLSTM model based on a convolution neural network and bidirectional long short-term memory to forecast the energy consumed by the residential and commercial sectors. The convolution layer of the CNN-BiLSTM captures the spatial features of the time series, while the BiLSTM layer captures its long-term evolution and short-term changes. The CNN-BiLSTM can be fully extrapolated using the recursive multi-step prediction strategy, and grid search is used to find its optimal hyperparameters.
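The recursive multi-step strategy mentioned above can be sketched in a few lines: a fitted one-step model repeatedly predicts the next value, and each prediction is appended to the input window and fed back in for the following step. The one-step model here is a hypothetical stand-in (a window mean), not the paper's CNN-BiLSTM, and all names are illustrative.

```python
import numpy as np

def recursive_forecast(model, history, steps, window):
    """Recursive multi-step strategy: each one-step prediction is
    appended to the input buffer and fed back for the next step."""
    buf = list(history[-window:])
    preds = []
    for _ in range(steps):
        yhat = model(np.array(buf[-window:]))
        preds.append(yhat)
        buf.append(yhat)  # the prediction becomes part of the next input
    return preds

# Hypothetical one-step model: the mean of the current window.
preds = recursive_forecast(lambda w: float(w.mean()),
                           history=[1.0, 2.0, 3.0, 4.0],
                           steps=24, window=2)
```

A known trade-off of this design choice is that errors compound: step 20 is predicted from 19 earlier predictions rather than from observations, which is why stability over later steps is emphasized in the case studies above.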
The case study focuses on the energy consumed by the residential and commercial sectors in the US, including primary, end-use, and total energy consumption, with raw data from January 1973 to July 2022 covering 595 points. Across the four cases, the proposed CNN-BiLSTM improves on the MAPEs of the CNN by 16.6286, 16.2584, 22.372, and 20.0238, respectively, and on the MAPEs of the BiLSTM by 0.8138, 0.32, 7.1741, and 14.0755. Moreover, the MAPEs of the CNN-BiLSTM are 0.1252, 2.3126, 6.6096, and 5.0384 smaller than those of the best-performing competing model in each case. The forecasting and metrics results of the four cases show that the CNN-BiLSTM clearly improves the adaptability and stability of the single CNN and BiLSTM and achieves the best forecasting performance among all the deep learning and machine learning models considered. The CNN-BiLSTM can thus make accurate short- and medium-term forecasts of the energy consumed by the residential and commercial sectors in the US, and it is expected to be applied to more kinds of energy forecasting in the future.
A possible limitation of the CNN-BiLSTM is that it may not be suitable for modeling small samples, as its complex layers require a large amount of data to train. In addition, the traditional grid search adopted to optimize the CNN-BiLSTM in this paper may converge to a local optimum. Future work can therefore be extended by improving the parameter-optimization algorithm.
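The exhaustive grid search discussed above amounts to evaluating every combination of candidate hyperparameter values and keeping the best scorer. The following minimal sketch shows that loop; the grid values and the `evaluate` callback (which in practice would train and validate the model, e.g. by validation MAPE) are hypothetical, not the paper's actual search space.

```python
import itertools

def grid_search(grid, evaluate):
    """Exhaustive grid search: score every combination, keep the best."""
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = evaluate(params)  # lower is better, e.g. validation MAPE
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid; a real evaluate() would fit and validate the model.
grid = {"lr": [0.001, 0.01, 0.1], "units": [20, 40, 60]}
best, _ = grid_search(grid,
                      lambda p: abs(p["lr"] - 0.01) + abs(p["units"] - 60))
```

The cost grows multiplicatively with each added hyperparameter, and only the listed candidate values are ever tried, which is why smarter search strategies are suggested as future work.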

Author Contributions

Methodology, Y.C.; Software, Y.C.; Validation, Z.F.; Writing—original draft, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in EIA Monthly Energy Review at https://www.eia.gov/totalenergy/data/monthly/, accessed on 13 June 2020.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. WHO COVID-19 Dashboard; World Health Organization: Geneva, Switzerland, 2020; Available online: https://covid19.who.int/ (accessed on 23 December 2022).
  2. U.S. Energy Consumption Fell by a Record 7% in 2020. Available online: https://www.eia.gov/todayinenergy/detail.php?id=47397 (accessed on 5 April 2021).
  3. Moreland, A.; Herlihy, C.; Tynan, M.A.; Sunshine, G.; McCord, R.F.; Hilton, C.; Poovey, J.; Werner, A.K.; Jones, C.D.; Fulmer, E.B.; et al. Timing of state and territorial COVID-19 stay-at-home orders and changes in population movement—United States, March 1–May 31, 2020. Morb. Mortal. Wkly. Rep. 2020, 69, 1198. [Google Scholar] [CrossRef]
  4. Stay-at-Home Orders Led to Less Commercial and Industrial Electricity Use in April. Available online: https://www.eia.gov/todayinenergy/detail.php?id=44276# (accessed on 30 June 2020).
  5. Chen, C.f.; de Rubens, G.Z.; Xu, X.; Li, J. Coronavirus comes home? Energy use, home energy management, and the social-psychological factors of COVID-19. Energy Res. Soc. Sci. 2020, 68, 101688. [Google Scholar] [CrossRef]
  6. Krarti, M.; Aldubyan, M. Review analysis of COVID-19 impact on electricity demand for residential buildings. Renew. Sustain. Energy Rev. 2021, 143, 110888. [Google Scholar] [CrossRef] [PubMed]
  7. Chinthavali, S.; Tansakul, V.; Lee, S.; Whitehead, M.; Tabassum, A.; Bhandari, M.; Munk, J.; Zandi, H.; Buckberry, H.; Kuruganti, T.; et al. COVID-19 pandemic ramifications on residential Smart homes energy use load profiles. Energy Build. 2022, 259, 111847. [Google Scholar] [CrossRef] [PubMed]
  8. Tamba, J.G.; Essiane, S.N.; Sapnken, E.F.; Koffi, F.D.; Nsouandélé, J.L.; Soldo, B.; Njomo, D. Forecasting natural gas: A literature survey. Int. J. Energy Econ. Policy 2018, 8, 216. [Google Scholar]
  9. Wei, N.; Li, C.; Peng, X.; Zeng, F.; Lu, X. Conventional models and artificial intelligence-based models for energy consumption forecasting: A review. J. Pet. Sci. Eng. 2019, 181, 106187. [Google Scholar] [CrossRef]
  10. Wright, C.; Chan, C.W.; Laforge, P. Towards developing a decision support system for electricity load forecast. In Decision Support Systems; IntechOpen: London, UK, 2012. [Google Scholar]
  11. Soldo, B. Forecasting natural gas consumption. Appl. Energy 2012, 92, 26–37. [Google Scholar] [CrossRef]
  12. Taşpınar, F.; Çelebi, N.; Tutkun, N. Forecasting of daily natural gas consumption on regional basis in Turkey using various computational methods. Energy Build. 2013, 56, 23–31. [Google Scholar] [CrossRef]
  13. Potočnik, P.; Soldo, B.; Šimunović, G.; Šarić, T.; Jeromen, A.; Govekar, E. Comparison of static and adaptive models for short-term residential natural gas forecasting in Croatia. Appl. Energy 2014, 129, 94–103. [Google Scholar] [CrossRef]
  14. He, Y.; Lin, B. Forecasting China’s total energy demand and its structure using ADL-MIDAS model. Energy 2018, 151, 420–429. [Google Scholar] [CrossRef]
  15. Jain, R.; Mahajan, V. Load forecasting and risk assessment for energy market with renewable based distributed generation. Renewable Energy Focus 2022, 42, 190–205. [Google Scholar] [CrossRef]
  16. Rakpho, P.; Yamaka, W. The forecasting power of economic policy uncertainty for energy demand and supply. Energy Reports 2021, 7, 338–343. [Google Scholar] [CrossRef]
  17. Zhang, Y.; Guo, H.; Sun, M.; Liu, S.; Forrest, J. A novel grey Lotka–Volterra model driven by the mechanism of competition and cooperation for energy consumption forecasting. Energy 2023, 264, 126154. [Google Scholar] [CrossRef]
  18. Şahin, U. Forecasting share of renewables in primary energy consumption and CO2 emissions of China and the United States under Covid-19 pandemic using a novel fractional nonlinear grey model. Expert Syst. Appl. 2022, 209, 118429. [Google Scholar] [CrossRef]
  19. Khan, A.M.; Osińska, M. Comparing forecasting accuracy of selected grey and time series models based on energy consumption in Brazil and India. Expert Syst. Appl. 2023, 212, 118840. [Google Scholar] [CrossRef]
  20. Chen, H.; Yang, Z.; Peng, C.; Qi, K. Regional energy forecasting and risk assessment for energy security: New evidence from the Yangtze River Delta region in China. J. Clean. Prod. 2022, 361, 132235. [Google Scholar] [CrossRef]
  21. Ding, S.; Tao, Z.; Li, R.; Qin, X. A novel seasonal adaptive grey model with the data-restacking technique for monthly renewable energy consumption forecasting. Expert Syst. Appl. 2022, 208, 118115. [Google Scholar] [CrossRef]
  22. Ye, L.; Dang, Y.; Fang, L.; Wang, J. A nonlinear interactive grey multivariable model based on dynamic compensation for forecasting the economy-energy-environment system. Appl. Energy 2023, 331, 120189. [Google Scholar] [CrossRef]
  23. Xie, N.; Liu, S. Discrete GM (1, 1) and mechanism of grey forecasting model. Syst.-Eng.-Theory Pract. 2005, 25, 93–99. [Google Scholar]
  24. Ahmad, T.; Zhang, D.; Huang, C. Methodological framework for short-and medium-term energy, solar and wind power forecasting with stochastic-based machine learning approach to monetary and energy policy applications. Energy 2021, 231, 120911. [Google Scholar] [CrossRef]
  25. Mehmood, F.; Ghani, M.U.; Ghafoor, H.; Shahzadi, R.; Asim, M.N.; Mahmood, W. EGD-SNet: A computational search engine for predicting an end-to-end machine learning pipeline for Energy Generation & Demand Forecasting. Appl. Energy 2022, 324, 119754. [Google Scholar] [CrossRef]
  26. Aras, S.; Hanifi Van, M. An interpretable forecasting framework for energy consumption and CO2 emissions. Appl. Energy 2022, 328, 120163. [Google Scholar] [CrossRef]
  27. Feng, Z.; Zhang, M.; Wei, N.; Zhao, J.; Zhang, T.; He, X. An office building energy consumption forecasting model with dynamically combined residual error correction based on the optimal model. Energy Rep. 2022, 8, 12442–12455. [Google Scholar] [CrossRef]
  28. Rao, C.; Zhang, Y.; Wen, J.; Xiao, X.; Goh, M. Energy demand forecasting in China: A support vector regression-compositional data second exponential smoothing model. Energy 2023, 263, 125955. [Google Scholar] [CrossRef]
  29. Zhu, L.; Li, M.; Wu, Q.; Jiang, L. Short-term natural gas demand prediction based on support vector regression with false neighbours filtered. Energy 2015, 80, 428–436. [Google Scholar] [CrossRef]
  30. Liu, H.; Tang, Y.; Pu, Y.; Mei, F.; Sidorov, D. Short-term Load Forecasting of Multi-Energy in Integrated Energy System Based on Multivariate Phase Space Reconstruction and Support Vector Regression Mode. Electr. Power Syst. Res. 2022, 210, 108066. [Google Scholar] [CrossRef]
  31. Jamei, M.; Ali, M.; Karbasi, M.; Xiang, Y.; Ahmadianfar, I.; Yaseen, Z.M. Designing a Multi-Stage Expert System for daily ocean wave energy forecasting: A multivariate data decomposition-based approach. Appl. Energy 2022, 326, 119925. [Google Scholar] [CrossRef]
  32. Li, R.; Song, X. A multi-scale model with feature recognition for the use of energy futures price forecasting. Expert Syst. Appl. 2023, 211, 118622. [Google Scholar] [CrossRef]
  33. Mustaqeem; Ishaq, M.; Kwon, S. A CNN-Assisted deep echo state network using multiple Time-Scale dynamic learning reservoirs for generating Short-Term solar energy forecasting. Sustain. Energy Technol. Assess. 2022, 52, 102275. [Google Scholar] [CrossRef]
  34. Amalou, I.; Mouhni, N.; Abdali, A. Multivariate time series prediction by RNN architectures for energy consumption forecasting. Energy Rep. 2022, 8, 1084–1091. [Google Scholar] [CrossRef]
  35. Lee, Y.; Ha, B.; Hwangbo, S. Generative model-based hybrid forecasting model for renewable electricity supply using long short-term memory networks: A case study of South Korea’s energy transition policy. Renew. Energy 2022, 200, 69–87. [Google Scholar] [CrossRef]
  36. Yang, B.; Yuan, X.; Tang, F. Improved nonlinear mapping network for wind power forecasting in renewable energy power system dispatch. Energy Rep. 2022, 8, 124–133. [Google Scholar] [CrossRef]
  37. Ren, H.; Li, Q.; Wu, Q.; Zhang, C.; Dou, Z.; Chen, J. Joint forecasting of multi-energy loads for a university based on copula theory and improved LSTM network. Energy Rep. 2022, 8, 605–612. [Google Scholar] [CrossRef]
  38. Ding, S.; Zhang, H.; Tao, Z.; Li, R. Integrating data decomposition and machine learning methods: An empirical proposition and analysis for renewable energy generation forecasting. Expert Syst. Appl. 2022, 204, 117635. [Google Scholar] [CrossRef]
  39. Khan, W.; Walker, S.; Zeiler, W. Improved solar photovoltaic energy generation forecast using deep learning-based ensemble stacking approach. Energy 2022, 240, 122812. [Google Scholar] [CrossRef]
  40. Chaturvedi, S.; Rajasekar, E.; Natarajan, S.; McCullen, N. A comparative assessment of SARIMA, LSTM RNN and Fb Prophet models to forecast total and peak monthly energy demand for India. Energy Policy 2022, 168, 113097. [Google Scholar] [CrossRef]
  41. Zheng, J.; Du, J.; Wang, B.; Klemeš, J.J.; Liao, Q.; Liang, Y. A hybrid framework for forecasting power generation of multiple renewable energy sources. Renew. Sustain. Energy Rev. 2023, 172, 113046. [Google Scholar] [CrossRef]
  42. Li, C.; Li, G.; Wang, K.; Han, B. A multi-energy load forecasting method based on parallel architecture CNN-GRU and transfer learning for data deficient integrated energy systems. Energy 2022, 259, 124967. [Google Scholar] [CrossRef]
  43. Rick, R.; Berton, L. Energy forecasting model based on CNN-LSTM-AE for many time series with unequal lengths. Eng. Appl. Artif. Intell. 2022, 113, 104998. [Google Scholar] [CrossRef]
  44. Shabbir, N.; Kütt, L.; Raja, H.A.; Jawad, M.; Allik, A.; Husev, O. Techno-economic analysis and energy forecasting study of domestic and commercial photovoltaic system installations in Estonia. Energy 2022, 253, 124156. [Google Scholar] [CrossRef]
  45. Khan, S.U.; Khan, N.; Ullah, F.U.M.; Kim, M.J.; Lee, M.Y.; Baik, S.W. Towards intelligent building energy management: AI-based framework for power consumption and generation forecasting. Energy Build. 2023, 279, 112705. [Google Scholar] [CrossRef]
  46. Niu, D.; Yu, M.; Sun, L.; Gao, T.; Wang, K. Short-term multi-energy load forecasting for integrated energy systems based on CNN-BiGRU optimized by attention mechanism. Appl. Energy 2022, 313, 118801. [Google Scholar] [CrossRef]
  47. Peng, L.; Wang, L.; Xia, D.; Gao, Q. Effective energy consumption forecasting using empirical wavelet transform and long short-term memory. Energy 2022, 238, 121756. [Google Scholar] [CrossRef]
  48. Kim, H.; Kim, M. A novel deep learning-based forecasting model optimized by heuristic algorithm for energy management of microgrid. Appl. Energy 2023, 332, 120525. [Google Scholar] [CrossRef]
  49. Khan, Z.A.; Ullah, A.; Ul Haq, I.; Hamdy, M.; Maria Mauro, G.; Muhammad, K.; Hijji, M.; Baik, S.W. Efficient Short-Term Electricity Load Forecasting for Effective Energy Management. Sustain. Energy Technol. Assessments 2022, 53, 102337. [Google Scholar] [CrossRef]
  50. Yan, K.; Zhou, X.; Chen, J. Collaborative deep learning framework on IoT data with bidirectional NLSTM neural networks for energy consumption forecasting. J. Parallel Distrib. Comput. 2022, 163, 248–255. [Google Scholar] [CrossRef]
  51. He, Y.L.; Chen, L.; Gao, Y.; Ma, J.H.; Xu, Y.; Zhu, Q.X. Novel double-layer bidirectional LSTM network with improved attention mechanism for predicting energy consumption. ISA Trans. 2022, 127, 350–360. [Google Scholar] [CrossRef] [PubMed]
  52. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  53. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144. [Google Scholar]
  54. Graves, A.; Schmidhuber, J. Offline handwriting recognition with multidimensional recurrent neural networks. In Proceedings of the Advances in Neural Information Processing Systems, San Francisco, CA, USA, 30 November–3 December 2008; Volume 21. [Google Scholar]
  55. Schmidhuber, J.; Wierstra, D.; Gomez, F.J. Evolino: Hybrid neuroevolution/optimal linear search for sequence prediction. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Scotland, UK, 30 July–5 August 2005. [Google Scholar]
  56. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar]
  57. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, 9–12 December 2019; pp. 3285–3292. [Google Scholar]
  58. Yang, X.S.; Deb, S.; Zhao, Y.X.; Fong, S.; He, X. Swarm intelligence: Past, present and future. Soft Comput. 2018, 22, 5923–5933. [Google Scholar] [CrossRef] [Green Version]
  59. Li, M.; Du, W.; Nian, F. An adaptive particle swarm optimization algorithm based on directed weighted complex network. Math. Probl. Eng. 2014, 2014, 434972. [Google Scholar] [CrossRef] [Green Version]
  60. Van, P.T.; Van, T.H.; Tangaramvong, S. Performance Comparison of Variants Based on Swarm Intelligence Algorithm of Mathematical and Structural Optimization. Iop Conf. Ser. Mater. Sci. Eng. 2022, 1222, 012013. [Google Scholar] [CrossRef]
  61. Cui, Y.; Jia, L.; Fan, W. Estimation of actual evapotranspiration and its components in an irrigated area by integrating the Shuttleworth-Wallace and surface temperature-vegetation index schemes using the particle swarm optimization algorithm. Agric. For. Meteorol. 2021, 307, 108488. [Google Scholar] [CrossRef]
  62. Priyadarshini, I.; Cotton, C. A novel LSTM–CNN–grid search-based deep neural network for sentiment analysis. J. Supercomput. 2021, 77, 13911–13932. [Google Scholar] [CrossRef] [PubMed]
  63. Thanh, T.; Van Dai, L.; Minh, L. Effects of Data Standardization on Hyperparameter Optimization with the Grid Search Algorithm Based on Deep Learning: A Case Study of Electric Load Forecasting. Adv. Technol. Innov. 2022, 7, 258–269. [Google Scholar] [CrossRef]
  64. Subramanian, S.; Rao, A. Deep-learning based Time Series Forecasting of Go-around Incidents in the National Airspace System. In Proceedings of the 2018 AIAA Modeling and Simulation Technologies Conference, Atlanta, GA, USA, 25–29 June 2018. [Google Scholar] [CrossRef]
  65. Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B 1974, 36, 111–133. [Google Scholar] [CrossRef]
  66. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef] [Green Version]
  67. Schölkopf, B.; Smola, A.J.; Bach, F. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2002. [Google Scholar]
  68. Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
  69. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22. [Google Scholar]
  70. Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363. [Google Scholar]
  71. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  72. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016; pp. 785–794. [Google Scholar]
Figure 1. Background in the United States: (a) Cumulative confirmed COVID-19 cases and deaths per million people; (b) Daily new confirmed COVID-19 cases and deaths per million people; (c) Gross domestic product (GDP) in the United States from 1960 to 2021; (d) End-Use energy consumed by the residential and commercial sectors from March to May in recent five years.
Figure 2. The framework of the LSTM model.
Figure 3. The network structure of the BiLSTM.
Figure 4. The overall workflow of the CNN-BiLSTM.
Figure 5. 24 multi-step ahead out-of-sample forecasting results from June 2018 to May 2020 in Case I of all models: predicted values of (a) CNN-BiLSTM; (b) BiLSTM; (c) CNN-LSTM; (d) CNN; (e) LSTM; (f) SVR; (g) LSSVR; (h) RF; (i) CatBoost; (j) LGBM; (k) XGBoost.
Figure 6. 24 multi-step ahead out-of-sample forecasting results from June 2018 to May 2020 in Case II of all models: predicted values of (a) CNN-BiLSTM; (b) BiLSTM; (c) CNN-LSTM; (d) CNN; (e) LSTM; (f) SVR; (g) LSSVR; (h) RF; (i) CatBoost; (j) LGBM; (k) XGBoost.
Figure 7. 24 multi-step ahead out-of-sample forecasting results from June 2018 to May 2020 in Case III of all models: predicted values of (a) CNN-BiLSTM; (b) BiLSTM; (c) CNN-LSTM; (d) CNN; (e) LSTM; (f) SVR; (g) LSSVR; (h) RF; (i) CatBoost; (j) LGBM; (k) XGBoost.
Figure 8. 24 multi-step ahead out-of-sample forecasting results from June 2018 to May 2020 in Case IV of all models: predicted values of (a) CNN-BiLSTM; (b) BiLSTM; (c) CNN-LSTM; (d) CNN; (e) LSTM; (f) SVR; (g) LSSVR; (h) RF; (i) CatBoost; (j) LGBM; (k) XGBoost.
Figure 8. 24 multi-step ahead out-of-sample forecasting results from 2018 June to 2020 May in Case IV of all models: predicted values of (a) CNN-BiLSTM; (b) BiLSTM; (c) CNN-LSTM; (d) CNN; (e) LSTM; (f) SVR; (g) LSSVR; (h) RF; (i) CatBoost; (j) LGBM; (k) XGBoost.
Sustainability 15 01895 g008
Table 1. Metrics to evaluate the predictive quality of the CNN-BiLSTM model.

| Abbreviation | Definition | Expression |
|---|---|---|
| MAPE | Mean Absolute Percentage Error | $\frac{1}{p}\sum_{\xi=1}^{p}\left\lvert\frac{y_p(\xi)-\hat{y}_p(\xi)}{y_p(\xi)}\right\rvert\times 100\%$ |
| MAE | Mean Absolute Error | $\frac{1}{p}\sum_{\xi=1}^{p}\lvert y_p(\xi)-\hat{y}_p(\xi)\rvert$ |
| RMSE | Root Mean Square Error | $\sqrt{\frac{1}{p}\sum_{\xi=1}^{p}\bigl(y_p(\xi)-\hat{y}_p(\xi)\bigr)^2}$ |
| MSE | Mean Squared Error | $\frac{1}{p}\sum_{\xi=1}^{p}\bigl(y_p(\xi)-\hat{y}_p(\xi)\bigr)^2$ |
| MAAPE | Mean Arctangent Absolute Percentage Error | $\frac{1}{p}\sum_{\xi=1}^{p}\arctan\left\lvert\frac{y_p(\xi)-\hat{y}_p(\xi)}{y_p(\xi)}\right\rvert$ |
| NRMSE | Normalized Root Mean Square Error | $\frac{1}{y_{p,\max}-y_{p,\min}}\sqrt{\frac{1}{p}\sum_{\xi=1}^{p}\bigl(y_p(\xi)-\hat{y}_p(\xi)\bigr)^2}$ |
| RMSPE | Root Mean Square Percentage Error | $\sqrt{\frac{1}{p}\sum_{\xi=1}^{p}\left(\frac{y_p(\xi)-\hat{y}_p(\xi)}{y_p(\xi)}\right)^2}$ |
| SMAPE | Symmetric Mean Absolute Percentage Error | $\frac{1}{p}\sum_{\xi=1}^{p}\frac{\lvert y_p(\xi)-\hat{y}_p(\xi)\rvert}{0.5\,y_p(\xi)+0.5\,\hat{y}_p(\xi)}\times 100\%$ |
| U1 | Theil U Statistic 1 | $\frac{\sqrt{\frac{1}{p}\sum_{\xi=1}^{p}\bigl(y_p(\xi)-\hat{y}_p(\xi)\bigr)^2}}{\sqrt{\frac{1}{p}\sum_{\xi=1}^{p}y_p(\xi)^2}+\sqrt{\frac{1}{p}\sum_{\xi=1}^{p}\hat{y}_p(\xi)^2}}$ |
| U2 | Theil U Statistic 2 | $\sqrt{\frac{\sum_{\xi=1}^{p}\bigl(y_p(\xi)-\hat{y}_p(\xi)\bigr)^2}{\sum_{\xi=1}^{p}y_p(\xi)^2}}$ |
| IA | Index of Agreement | $1-\frac{\sum_{\xi=1}^{p}\bigl(y_p(\xi)-\hat{y}_p(\xi)\bigr)^2}{\sum_{\xi=1}^{p}\bigl(\lvert y_p(\xi)-\bar{y}_p\rvert+\lvert\hat{y}_p(\xi)-\bar{\hat{y}}_p\rvert\bigr)^2}$ |
| R² | Coefficient of Determination | $1-\frac{\sum_{\xi=1}^{p}\bigl(y_p(\xi)-\hat{y}_p(\xi)\bigr)^2}{\sum_{\xi=1}^{p}\bigl(y_p(\xi)-\bar{y}_p\bigr)^2}$ |
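The Table 1 metrics can be computed in a few lines of NumPy. The following sketch is illustrative (the function name `forecast_metrics` and the dictionary layout are our own), assuming the actuals $y_p(\xi)$ and predictions $\hat{y}_p(\xi)$ arrive as equal-length arrays with nonzero actuals:

```python
import numpy as np

def forecast_metrics(y, yhat):
    """Evaluate a forecast yhat against actuals y with the Table 1 metrics."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    e = y - yhat                 # forecast errors
    ape = np.abs(e / y)          # absolute percentage errors
    mse = np.mean(e ** 2)
    rmse = np.sqrt(mse)
    return {
        "MAPE": np.mean(ape) * 100,
        "MAE": np.mean(np.abs(e)),
        "RMSE": rmse,
        "MSE": mse,
        "MAAPE": np.mean(np.arctan(ape)),
        "NRMSE": rmse / (y.max() - y.min()),
        "RMSPE": np.sqrt(np.mean((e / y) ** 2)),
        "SMAPE": np.mean(np.abs(e) / (0.5 * y + 0.5 * yhat)) * 100,
        "U1": rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(yhat ** 2))),
        "U2": np.sqrt(np.sum(e ** 2) / np.sum(y ** 2)),
        "IA": 1 - np.sum(e ** 2)
              / np.sum((np.abs(y - y.mean()) + np.abs(yhat - yhat.mean())) ** 2),
        "R2": 1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2),
    }
```

A perfect forecast drives the error-based metrics (MAPE, MAE, RMSE, MSE, MAAPE, NRMSE, RMSPE, SMAPE, U1, U2) to 0 and the agreement-based metrics (IA, R²) to 1, which matches the ordering used in Tables 2–5: smaller is better for the first ten rows, larger is better for the last two.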
Table 2. 24-step-ahead out-of-sample forecasting metrics of all models in Case I.

Tuned hyperparameters:
- CNN-BiLSTM: lr = 0.01, = 60, κ = 2
- BiLSTM: lr = 0.1, = 20
- CNN-LSTM: lr = 0.001, = 25, κ = 2
- CNN: lr = 0.1, κ = 2
- LSTM: lr = 0.001, = 60
- SVR: C = 55, ϵ = 1 × 10⁻⁶, γ = 0.03125
- LSSVR: C = 625, λ = 0.03125
- RF: max_depth = 7, min_samples_leaf = 1, min_samples_split = 2, n_estimators = 50
- CatBoost: depth = 10, l2_leaf_reg = 100.0, lr = 0.01
- LGBM: max_depth = 5, num_leaves = 50, reg_alpha = 0.01, reg_lambda = 1
- XGBoost: gamma = 0.0, lr = 0.1, max_depth = 5, min_child_weight = 1, reg_alpha = 1

| Metric | CNN-BiLSTM | BiLSTM | CNN-LSTM | CNN | LSTM | SVR | LSSVR | RF | CatBoost | LGBM | XGBoost |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MAPE | 5.0417 | 5.8555 | 44.2395 | 22.0456 | 37.6811 | 20.8846 | 28.8258 | 5.1669 | 13.1710 | 5.9160 | 15.6317 |
| MAE | 87.7030 | 102.3284 | 803.9523 | 428.3851 | 692.8386 | 343.4988 | 462.9180 | 91.8243 | 246.8932 | 105.8958 | 277.2572 |
| RMSE | 110.8020 | 131.6346 | 869.9062 | 540.1201 | 767.0836 | 410.9314 | 571.3807 | 127.5901 | 318.3963 | 136.3436 | 371.4026 |
| MSE | 12,277.0771 | 17,327.6641 | 756,736.8125 | 291,729.7188 | 588,417.2500 | 168,864.6442 | 326,475.9121 | 16,279.2250 | 101,376.2171 | 18,589.5845 | 137,939.8906 |
| MAAPE | 0.0503 | 0.0584 | 0.4135 | 0.2132 | 0.3569 | 0.2011 | 0.2660 | 0.0514 | 0.1299 | 0.0590 | 0.1521 |
| NRMSE | 0.0939 | 0.1115 | 0.7368 | 0.4575 | 0.6497 | 0.3481 | 0.4840 | 0.1081 | 0.2697 | 0.1155 | 0.3146 |
| RMSPE | 0.0626 | 0.0727 | 0.4534 | 0.2593 | 0.3924 | 0.2577 | 0.3731 | 0.0708 | 0.1603 | 0.0730 | 0.2040 |
| SMAPE | 5.0855 | 6.0200 | 57.8681 | 26.1360 | 47.5717 | 19.0293 | 24.1186 | 5.2382 | 13.8647 | 5.8904 | 16.0521 |
| U1 | 0.0314 | 0.0375 | 0.3202 | 0.1746 | 0.2713 | 0.1123 | 0.1484 | 0.0360 | 0.0936 | 0.0383 | 0.1070 |
| U2 | 0.0624 | 0.0741 | 0.4899 | 0.3042 | 0.4320 | 0.2314 | 0.3218 | 0.0719 | 0.1793 | 0.0768 | 0.2092 |
| IA | 0.9720 | 0.9666 | 0.3690 | 0.4518 | 0.3935 | 0.3917 | 0.3762 | 0.9640 | 0.5246 | 0.9638 | 0.6462 |
| R² | 0.8883 | 0.8423 | -5.8872 | -1.6551 | -4.3553 | -0.5369 | -1.9713 | 0.8518 | 0.0774 | 0.8308 | -0.2554 |
Table 3. 24-step-ahead out-of-sample forecasting metrics of all models in Case II.

Tuned hyperparameters:
- CNN-BiLSTM: lr = 0.01, = 50, κ = 2
- BiLSTM: lr = 0.01, = 40
- CNN-LSTM: lr = 0.01, = 65, κ = 3
- CNN: lr = 0.1, κ = 2
- LSTM: lr = 0.1, = 45
- SVR: C = 5, ϵ = 1 × 10⁻⁶, γ = 1.72844
- LSSVR: C = 15625, λ = 0.13446
- RF: max_depth = 7, min_samples_leaf = 1, min_samples_split = 5, n_estimators = 50
- CatBoost: depth = 8, l2_leaf_reg = 0, lr = 0.01
- LGBM: max_depth = 5, num_leaves = 50, reg_alpha = 0.001, reg_lambda = 0.1
- XGBoost: gamma = 0.0, lr = 0.5, max_depth = 5, min_child_weight = 1, reg_alpha = 1

| Metric | CNN-BiLSTM | BiLSTM | CNN-LSTM | CNN | LSTM | SVR | LSSVR | RF | CatBoost | LGBM | XGBoost |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MAPE | 5.0292 | 5.3492 | 40.6992 | 21.2876 | 28.0344 | 11.0438 | 13.8404 | 9.0470 | 7.8271 | 7.3418 | 7.3984 |
| MAE | 47.1754 | 50.2112 | 453.6898 | 227.9078 | 273.9267 | 111.5945 | 135.6584 | 89.3437 | 76.8737 | 72.0425 | 74.2770 |
| RMSE | 68.3823 | 83.3169 | 555.3765 | 292.4198 | 339.5989 | 143.5384 | 166.9033 | 110.3985 | 97.0129 | 95.2964 | 96.7993 |
| MSE | 4676.1406 | 6941.7031 | 308,443.0938 | 85,509.3438 | 115,327.4141 | 20,603.2779 | 27,856.7086 | 12,187.8240 | 9411.4937 | 9081.4098 | 9370.1064 |
| MAAPE | 0.0501 | 0.0528 | 0.3781 | 0.2067 | 0.2621 | 0.1093 | 0.1366 | 0.0899 | 0.0779 | 0.0730 | 0.0736 |
| NRMSE | 0.0676 | 0.0824 | 0.5491 | 0.2891 | 0.3358 | 0.1419 | 0.1650 | 0.1091 | 0.0959 | 0.0942 | 0.0957 |
| RMSPE | 0.0692 | 0.0916 | 0.4404 | 0.2482 | 0.3513 | 0.1322 | 0.1622 | 0.1071 | 0.0926 | 0.0908 | 0.0918 |
| SMAPE | 5.0129 | 5.6868 | 54.0280 | 23.7873 | 27.1095 | 11.8818 | 14.7027 | 9.5339 | 8.1516 | 7.5961 | 7.7344 |
| U1 | 0.0329 | 0.0402 | 0.3546 | 0.1549 | 0.1689 | 0.0721 | 0.0848 | 0.0545 | 0.0476 | 0.0465 | 0.0476 |
| U2 | 0.0660 | 0.0805 | 0.5364 | 0.2824 | 0.3280 | 0.1386 | 0.1612 | 0.1066 | 0.0937 | 0.0920 | 0.0935 |
| IA | 0.9895 | 0.9840 | 0.4640 | 0.6817 | 0.5731 | 0.9491 | 0.9192 | 0.9717 | 0.9786 | 0.9795 | 0.9780 |
| R² | 0.9546 | 0.9327 | -1.9923 | 0.1704 | -0.1188 | 0.8001 | 0.7298 | 0.8818 | 0.9087 | 0.9119 | 0.9091 |
Table 4. 24-step-ahead out-of-sample forecasting metrics of all models in Case III.

Tuned hyperparameters:
- CNN-BiLSTM: lr = 0.01, = 55, κ = 2
- BiLSTM: lr = 0.01, = 25
- CNN-LSTM: lr = 0.01, = 70, κ = 2
- CNN: lr = 0.01, κ = 2
- LSTM: lr = 0.1, = 70
- SVR: C = 20, ϵ = 1 × 10⁻⁶, γ = 0.04500
- LSSVR: C = 625, λ = 0.40171
- RF: max_depth = 7, min_samples_leaf = 1, min_samples_split = 2, n_estimators = 100
- CatBoost: depth = 7, l2_leaf_reg = 1, lr = 0.01
- LGBM: max_depth = 5, num_leaves = 50, reg_alpha = 0.01, reg_lambda = 1
- XGBoost: gamma = 0.0, lr = 0.5, max_depth = 3, min_child_weight = 2, reg_alpha = 0.01

| Metric | CNN-BiLSTM | BiLSTM | CNN-LSTM | CNN | LSTM | SVR | LSSVR | RF | CatBoost | LGBM | XGBoost |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MAPE | 5.4774 | 12.6515 | 46.7455 | 27.8494 | 19.9738 | 15.4584 | 13.0736 | 13.0576 | 12.0870 | 14.0688 | 16.0561 |
| MAE | 21.9951 | 47.7852 | 217.1153 | 93.8588 | 76.8539 | 69.5279 | 54.7455 | 56.1637 | 49.5175 | 56.9160 | 69.7948 |
| RMSE | 27.1889 | 60.9706 | 273.3944 | 103.2772 | 90.8298 | 94.1978 | 74.6178 | 76.6894 | 65.7881 | 77.1039 | 96.7247 |
| MSE | 739.2354 | 3717.4133 | 74,744.4766 | 10,666.1846 | 8250.0557 | 8873.2195 | 5567.8181 | 5881.2619 | 4328.0769 | 5945.0158 | 9355.6621 |
| MAAPE | 0.0547 | 0.1248 | 0.4225 | 0.2650 | 0.1952 | 0.1515 | 0.1287 | 0.1285 | 0.1194 | 0.1382 | 0.1569 |
| NRMSE | 0.0565 | 0.1267 | 0.5681 | 0.2146 | 0.1888 | 0.1958 | 0.1551 | 0.1594 | 0.1367 | 0.1602 | 0.2010 |
| RMSPE | 0.0625 | 0.1545 | 0.5176 | 0.3233 | 0.2237 | 0.1887 | 0.1604 | 0.1611 | 0.1445 | 0.1719 | 0.1976 |
| SMAPE | 5.5394 | 11.5774 | 66.3500 | 32.5211 | 20.7827 | 17.2753 | 14.5297 | 14.5490 | 13.2565 | 15.6626 | 18.3556 |
| U1 | 0.0322 | 0.0680 | 0.4539 | 0.1319 | 0.1092 | 0.1204 | 0.0943 | 0.0975 | 0.0828 | 0.0972 | 0.1253 |
| U2 | 0.0642 | 0.1439 | 0.6453 | 0.2438 | 0.2144 | 0.2223 | 0.1761 | 0.1810 | 0.1553 | 0.1820 | 0.2283 |
| IA | 0.9930 | 0.9680 | 0.4525 | 0.8915 | 0.9243 | 0.8941 | 0.9408 | 0.9360 | 0.9545 | 0.9381 | 0.8952 |
| R² | 0.9720 | 0.8590 | -1.8351 | 0.5954 | 0.6871 | 0.6634 | 0.7888 | 0.7769 | 0.8358 | 0.7745 | 0.6451 |
Table 5. 24-step-ahead out-of-sample forecasting metrics of all models in Case IV.

Tuned hyperparameters:
- CNN-BiLSTM: lr = 0.01, = 125, κ = 2
- BiLSTM: lr = 0.001, = 150
- CNN-LSTM: lr = 0.01, = 5, κ = 2
- CNN: lr = 0.1, κ = 3
- LSTM: lr = 0.01, = 10
- SVR: C = 100, ϵ = 1 × 10⁻⁶, γ = 1.72844
- LSSVR: C = 390625, λ = 0.09336
- RF: max_depth = 11, min_samples_leaf = 2, min_samples_split = 10, n_estimators = 100
- CatBoost: depth = 6, l2_leaf_reg = 1, lr = 0.01
- LGBM: max_depth = 5, num_leaves = 50, reg_alpha = 1, reg_lambda = 1
- XGBoost: gamma = 0.0, lr = 0.05, max_depth = 3, min_child_weight = 1, reg_alpha = 1

| Metric | CNN-BiLSTM | BiLSTM | CNN-LSTM | CNN | LSTM | SVR | LSSVR | RF | CatBoost | LGBM | XGBoost |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MAPE | 4.0034 | 18.0789 | 53.1444 | 24.0272 | 44.4674 | 9.0418 | 10.6892 | 11.9223 | 10.8338 | 12.0526 | 11.0120 |
| MAE | 30.5129 | 138.1454 | 424.3060 | 176.0921 | 353.9534 | 75.3807 | 85.4468 | 95.7201 | 86.7298 | 96.8677 | 88.6650 |
| RMSE | 37.4739 | 144.8520 | 449.4491 | 197.1012 | 374.8922 | 96.5099 | 111.9645 | 119.9051 | 106.6557 | 124.0479 | 106.3436 |
| MSE | 1404.2955 | 20,982.1133 | 202,004.5000 | 38,848.8867 | 140,544.1250 | 9314.1634 | 12,536.0451 | 14,377.2259 | 11,375.4350 | 15,387.8716 | 11,308.9697 |
| MAAPE | 0.0400 | 0.1784 | 0.4859 | 0.2318 | 0.4159 | 0.0898 | 0.1057 | 0.1178 | 0.1074 | 0.1190 | 0.1093 |
| NRMSE | 0.0754 | 0.2913 | 0.9039 | 0.3964 | 0.7540 | 0.1941 | 0.2252 | 0.2412 | 0.2145 | 0.2495 | 0.2139 |
| RMSPE | 0.0491 | 0.1885 | 0.5391 | 0.2751 | 0.4538 | 0.1097 | 0.1355 | 0.1446 | 0.1273 | 0.1472 | 0.1261 |
| SMAPE | 4.0839 | 20.0636 | 73.4016 | 23.1385 | 58.0351 | 9.6521 | 11.0102 | 12.5102 | 11.4853 | 12.7886 | 11.5984 |
| U1 | 0.0240 | 0.1005 | 0.3939 | 0.1240 | 0.3088 | 0.0641 | 0.0727 | 0.0788 | 0.0704 | 0.0818 | 0.0704 |
| U2 | 0.0475 | 0.1836 | 0.5696 | 0.2498 | 0.4751 | 0.1223 | 0.1419 | 0.1520 | 0.1352 | 0.1572 | 0.1348 |
| IA | 0.9840 | 0.7973 | 0.3550 | 0.4935 | 0.4142 | 0.8586 | 0.7996 | 0.8019 | 0.8464 | 0.7894 | 0.8249 |
| R² | 0.9330 | -0.0004 | -8.6318 | -0.8523 | -5.7013 | 0.5559 | 0.4023 | 0.3145 | 0.4576 | 0.2663 | 0.4608 |
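The per-case hyperparameters reported in Tables 2–5 were selected by grid search. A minimal sketch of that tuning step for the SVR baseline, assuming scikit-learn, a 12-month lag window, and an illustrative synthetic monthly series (the candidate values echo the C, γ, and ϵ settings in the tables above but are not the paper's exact search space or data):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Illustrative monthly series with a 12-month seasonal cycle (stand-in for the real data).
rng = np.random.default_rng(0)
series = 100 + 20 * np.sin(np.arange(120) * 2 * np.pi / 12) + rng.normal(0, 1, 120)

# Sliding window: the previous 12 months predict the next one.
lags = 12
X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]

# Candidate grid echoing the tuned SVR settings reported in Tables 2-5.
grid = {"C": [5, 20, 55, 100], "gamma": [0.03125, 0.045, 1.72844], "epsilon": [1e-6]}
search = GridSearchCV(SVR(kernel="rbf"), grid,
                      cv=TimeSeriesSplit(n_splits=3),
                      scoring="neg_mean_absolute_percentage_error")
search.fit(X, y)
print(search.best_params_)
```

`TimeSeriesSplit` keeps each validation fold chronologically after its training fold, so the search never scores a candidate on data earlier than what it was fitted on, which matters for out-of-sample forecasting comparisons like the ones tabulated here.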
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Chen, Y.; Fu, Z. Multi-Step Ahead Forecasting of the Energy Consumed by the Residential and Commercial Sectors in the United States Based on a Hybrid CNN-BiLSTM Model. Sustainability 2023, 15, 1895. https://doi.org/10.3390/su15031895

