Sustainability
  • Article
  • Open Access

27 January 2025

Short-Term Residential Load Forecasting Based on the Fusion of Customer Load Uncertainty Feature Extraction and Meteorological Factors

1 School of Frontier Crossover Studies, Hunan University of Technology and Business, Changsha 410205, China
2 Xiangjiang Laboratory, Changsha 410205, China
3 State Grid Hunan Electric Power Co., Ltd. Information and Communication Branch, Changsha 410205, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Future Directions in Energy Transition and Sustainable Management

Abstract

With the proliferation of distributed energy resources, advanced metering infrastructure, and advanced communication technologies, the grid is transforming into a flexible, intelligent, and collaborative system. Short-term electric load forecasting for individual residential customers plays an increasingly important role in the operation and planning of the future grid. Predicting the electrical load of individual households is more challenging than predicting the total load at the feeder and regional levels, owing to the higher uncertainty and volatility at the household level. Previous research shows that the accuracy of forecasts from machine learning and single deep learning models is far from adequate, and there is still room for improvement.

1. Introduction

Electricity, the most important energy source available today, is essential to the security and stability of urban growth. With the advent of the smart era, traditional power grids are evolving into smart grids, whose goal is to improve energy utilization and reduce the consumption of non-renewable energy. In smart grids, one effective way to improve renewable energy utilization is to reduce the peak–valley difference and dispatch electricity rationally. The share of residential electricity in end-use energy consumption is rising as a result of rapid economic development and advances in smart grid technology. With the aid of accurate residential electricity load forecasts, power providers can develop reasonable demand response strategies, encourage residents to adjust their habitual electricity consumption patterns, lower customers' electricity costs, and achieve peak-shaving and valley-filling goals.
Power load forecasting in smart grids guarantees the safe, reliable, and cost-effective operation of power systems. Traditional load forecasting focuses on regional, system-level, feeder-level, or whole-building loads. With the widespread adoption of smart meters, which allow high-frequency sampling, large amounts of high-frequency single-user electricity data are collected, making user-level load forecasting possible [1]. Smart meter data provide important information such as load profiles and individual consumption habits, which can be used to improve the accuracy of both individual and overall load forecasts. Accurate short-term load forecasting (STLF) can effectively support Home Energy Management Systems (HEMSs) [2]. Additionally, load forecasting enables power companies to determine effective electricity pricing structures and implement more rational demand response strategies, thereby reducing operational costs [3]. As a means of improving energy efficiency, integrating renewable energy, lowering carbon emissions, maintaining grid stability, and realizing economic and social benefits, user load forecasting is a crucial tool for advancing energy sustainability.
However, because users' electricity consumption behavior exhibits high randomness and uncertainty, forecasting individual user behavior is more challenging than forecasting system-level aggregated load curves, as the timing of usage activities introduces further randomness and noise. Compared to traditional load forecasting problems, residential load forecasting presents unique challenges. For instance, the load scale of substations or buses is relatively large and does not change within a short period, and load curves for commercial and industrial users often exhibit typical peak–valley patterns, which aid accurate prediction. In contrast, residential user behavior has its own characteristics: electricity consumption tends to vary periodically with work, rest, and leisure cycles, and usage behavior is easily influenced by external stimuli; for example, when temperature and humidity change, users may activate equipment to control these conditions, which is ultimately reflected in energy consumption.
In the field of power load forecasting, many researchers have conducted extensive studies, applying statistical and machine learning algorithms such as the Autoregressive Integrated Moving Average (ARIMA) [4] and Gaussian Process (GP) [5] to forecasting tasks, primarily at the system level. For household load forecasting, however, the instability of the data and the uncertainty of user behavior often limit these models' effectiveness. Since single-user load data typically form a long, high-frequency sequence, previous studies [6] have characterized residential power load as a combination of periodic, uncertain, and noisy components. The periodic component can often be predicted using statistical and traditional machine learning methods, while the uncertainty introduces complex nonlinear patterns that are challenging for these methods to handle. With the rapid development of computing power, the ability to apply deep learning methods has become critical in the rapidly evolving energy industry [7], and deep learning presents a promising solution for such data: its multilayered nonlinear structure allows it to capture complex feature abstractions and nonlinear mappings. In recent years, deep learning has made significant advances in fields such as image processing and speech recognition, leading many researchers to apply it to load forecasting tasks [8].
In summary, the following problems still exist in current deep learning-based models for short-term household electricity load forecasting:
(1)
Acquired user load data typically consist of long, high-frequency sequences with considerable randomness and uncertainty, which still require data preprocessing and feature extraction before being input to the model.
(2)
In user-level load prediction tasks, deep learning models need to be built with enhanced awareness of data uncertainty.
(3)
Most studies feed additional features and load data jointly as input vectors for model training; however, deep learning practice shows that features can be learned individually by independent networks and then adapted to downstream prediction tasks through feature fusion.
To address the above problems, this paper proposes segmenting long sequences into multiple subsequences, aligning the data points in these subsequences relative to time for use as input vectors for the neural network, and extracting baselines from these subsequences. The proposed model first utilizes a CNN in the encoder to perform local feature extraction on the input subsequences, followed by an input attention mechanism (IAM) for weight allocation. The weighted vectors are then sequentially fed into LSTM for learning uncertainty features, yielding a power load feature vector. Since this changes the input data dimensions, causing the model to lack user power load context information, an autoregressive component is introduced for information compensation. To mitigate overfitting and enhance load data features, a separate CNN is introduced to learn local features of meteorological factors such as temperature, humidity, and dew point, resulting in a meteorological feature vector. The power load and meteorological feature vectors are then fused in a residual connected fusion module at the same time step to create a combined vector. Finally, another LSTM predicts the fused vector, yielding a nonlinear prediction value, which is added to the output of the autoregressive component for the final forecast. In summary, the main contributions of this paper are as follows.
(1)
A nonlinear relationship extraction method for customer electric load is proposed, combining subsequence partitioning and a CNN-IAM-LSTM module for feature extraction to reduce noise and capture deep load features.
(2)
Meteorological data are integrated using a first-order difference and CNN-based feature extraction network for local meteorological feature analysis.
(3)
A multifactor fusion sub-network is developed, utilizing residual connections and layer normalization to align and integrate electric load and meteorological features, enhancing model performance.
(4)
A fused multi-stage LSTM model is designed, combining feature extraction, multi-feature fusion, and prediction networks to improve user-level nonlinear capability and forecasting accuracy.
The remainder of this paper is organized as follows: Section 3 introduces the proposed model and provides a brief overview of its components. Section 4 describes the UMass dataset and the parameter settings for model training and prediction, and discusses the prediction results and error analysis of the proposed model on three user datasets. Section 5 presents the conclusions.

3. Model

3.1. Problem Description

We intend to achieve a high-accuracy short-term forecast of household power load based on household electricity consumption and external characteristic data, such as meteorological data. The specific definition of the problem is as follows: given a household electricity consumption time series $X = (x_1, x_2, \ldots, x_T)$ over the past $T$ time steps, where $x_t$ denotes the consumption reading within period $t$, and a meteorological time series $Q = (q_1, q_2, \ldots, q_m)$ composed of $m$ locally reported meteorological factors, where $q_i = (q_{i,1}, q_{i,2}, \ldots, q_{i,T})$, $i \in (1, m)$, represents the time series of readings for the $i$-th meteorological factor over the past $T$ time steps, the aim of the model is to learn a nonlinear model $F(X, Q)$ to forecast the household electricity consumption $\hat{Y}$ for the next time step.
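To make the formulation concrete, the following is a minimal sketch (NumPy, with hypothetical array names standing in for real data) of how one training sample $(X, Q, y)$ could be sliced from the raw series under this definition; the window length and factor count follow the settings reported later in the paper.

```python
# Minimal sketch: slicing one (X, Q, y) sample from the raw series.
# `load` and `weather` are hypothetical stand-ins for the real data.
import numpy as np

T = 72   # history window length (time steps), as set in Section 4.3.2
m = 3    # number of meteorological factors (temperature, humidity, dew point)

rng = np.random.default_rng(0)
load = rng.random(2160)            # stand-in for the hourly load series
weather = rng.random((m, 2160))    # stand-in for the m weather series

def make_sample(t):
    """Return (X, Q, y) for forecast origin t."""
    X = load[t - T:t]              # x_1 ... x_T
    Q = weather[:, t - T:t]        # q_{i,1} ... q_{i,T} for each factor i
    y = load[t]                    # next-step target
    return X, Q, y

X, Q, y = make_sample(500)
print(X.shape, Q.shape)            # (72,) (3, 72)
```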

3.2. Model Construction

To address the challenges in short-term load forecasting—characterized by high uncertainty, numerous and random user behavior factors, and the strong generalization needed for long-sequence data—this paper proposes a short-term household electricity load forecasting model based on uncertainty feature extraction of user load and the integration of meteorological factors. The proposed model is composed of four main parts: user electricity load data subsequence partitioning and feature extraction, meteorological data feature extraction, multi-feature fusion, and prediction. In the electricity data subsequence partitioning and feature extraction stage, the long sequence is divided into multiple subsequences, and feature learning is performed using a CNN-IAM-LSTM-based neural network to obtain multiple local feature vectors. For meteorological data feature extraction, local features are obtained using a combination of first-order differencing and a CNN. The multi-feature fusion network utilizes a residual connection network to merge feature vectors from electricity data and meteorological factors. The prediction layer, composed of an independent LSTM, learns from the fused feature vectors output by the fusion layer and produces nonlinear forecast data. The final prediction result is obtained by adding this output to the autoregressive information compensation. The model structure is illustrated in Figure 1. In the remainder of this section, we briefly introduce the convolutional neural network (CNN), the long short-term memory (LSTM) network, the input attention mechanism, and the residual connection mechanism.
Figure 1. Model organization structure.

3.2.1. User Load Subsequence Partitioning and Local Feature Extraction

Subsequence Partitioning and CNN Feature Extraction
To further analyze the uncertainty in users' electricity consumption, the power load sequence over the past $T$ hours is divided into subsequences, where each subsequence has length $L$ and the number of subsequences is $N$, so that $T = LN$. The choice of $L$ typically takes into account human daily routines and activity cycles to preserve temporal correlations between subsequences, and the partition is represented as $X = (s_1, s_2, \ldots, s_N)$, where $s_i = (s_{i,1}, s_{i,2}, \ldots, s_{i,L})^T$, $i \in (1, N)$. The optimal subsequence length is determined experimentally. Local features are extracted using a convolutional neural network (CNN) without pooling layers. The CNN, proposed by LeCun et al. in 1998, is widely applied in various deep learning fields, including image and speech recognition [34]. Reference [35] demonstrates that a 1D-CNN is effective at extracting temporal feature information from time series. A CNN extracts local features through local connections, weight sharing, and spatial pooling, which enhances its capacity for abstract representation; its main structure comprises convolutional layers, pooling layers, and fully connected layers [36]. The main formula of the CNN is shown in Equation (1):
$$y_i^k = f\left(y^{k-1} * w_i^k + b_i^k\right) \tag{1}$$
where $y^{k-1}$ represents the input of the $k$-th convolutional layer, $*$ denotes the convolution operation, $w_i^k$ is the weight of the $i$-th filter in the $k$-th convolutional layer, and $b_i^k$ is the bias term for the $k$-th convolutional layer. By applying the CNN convolution operation of Equation (1) to the input subsequences, the output sequence $\hat{S} = (\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_k)$ is obtained, where $k$ is the number of output channels.
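As one plausible reading of this stage (a sketch, not the authors' exact configuration), the $N$ time-aligned subsequences can be treated as the input channels of a pooling-free 1D convolution, which produces $k$ local-feature channels of length $L$; the channel count and kernel size below are assumptions.

```python
# Sketch of subsequence partitioning plus the pooling-free convolution
# of Eq. (1). Channel count k and kernel size are assumptions.
import torch
import torch.nn as nn

T, L = 72, 24
N = T // L                          # number of subsequences, T = L * N
k = 16                              # assumed number of output channels

x = torch.randn(1, T)               # one load history window
subseqs = x.view(1, N, L)           # (batch, N, L): rows are time-aligned subsequences

conv = nn.Conv1d(in_channels=N, out_channels=k, kernel_size=3, padding=1)
s_hat = torch.relu(conv(subseqs))   # S-hat: (1, k, L) local feature sequences
print(s_hat.shape)                  # torch.Size([1, 16, 24])
```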
Input Attention Weighting
The input attention mechanism (IAM) employs a deterministic attention model, namely a multilayer perceptron. IAM applies attention at the input layer, assigning different weights to the model's inputs and thereby significantly reducing the number of learnable parameters, because it does not require the encoder–decoder network typical of existing attention mechanisms (Figure 2).
Figure 2. Input attention.
Input attention weights are calculated for multiple feature time-series vectors after local feature extraction. The input attention weight for the j -th feature sequence is computed as follows:
$$e_j = V_{in}^{T}\tanh\left(W_{in}\hat{s}_j + b_{in}\right) + b'_{in} \tag{2}$$
$$\alpha_j = \frac{\exp(e_j)}{\sum_{i=1}^{k}\exp(e_i)} \tag{3}$$
where $V_{in}$, $W_{in}$, $b_{in}$, and $b'_{in}$ are learnable parameters, and $\alpha_j$ is the input weight for the $j$-th feature sequence. The input attention-weighted matrix over all local features can then be represented as $\bar{S} = (\alpha_1\hat{s}_1, \alpha_2\hat{s}_2, \ldots, \alpha_k\hat{s}_k)$, with the weighted input vector at time $t$ denoted as $\bar{S}_t = (\alpha_1\hat{s}_{1,t}, \alpha_2\hat{s}_{2,t}, \ldots, \alpha_k\hat{s}_{k,t})^T$.
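A minimal PyTorch sketch of Equations (2) and (3), with parameter names mirroring $V_{in}$ and $W_{in}$ (the hidden size is an assumption): a single-layer perceptron scores each of the $k$ feature sequences, softmax normalizes the scores, and each channel is rescaled by its weight.

```python
# Sketch of the input attention mechanism (IAM) of Eqs. (2)-(3).
import torch
import torch.nn as nn

class InputAttention(nn.Module):
    def __init__(self, length, hidden=32):
        super().__init__()
        self.W_in = nn.Linear(length, hidden)  # W_in s_j + b_in
        self.V_in = nn.Linear(hidden, 1)       # V_in^T tanh(.) + b'_in

    def forward(self, s_hat):                  # s_hat: (batch, k, L)
        e = self.V_in(torch.tanh(self.W_in(s_hat)))  # scores e_j: (batch, k, 1)
        alpha = torch.softmax(e, dim=1)              # weights alpha_j over the k channels
        return alpha * s_hat                         # weighted matrix S-bar

s_bar = InputAttention(length=24)(torch.randn(1, 16, 24))
print(s_bar.shape)                             # torch.Size([1, 16, 24])
```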
LSTM Time-Series Feature Extraction
Compared with traditional RNNs, which have only a single hidden state, LSTM introduces the concept of a cell state, which accounts for temporal correlations embedded in long-term states. The cell structure of the LSTM is shown in Figure 3.
Figure 3. LSTM neuron structure.
LSTM extends the recurrent neural network by adding input gates, forget gates, and output gates that control the flow of information into and out of the cell and thereby update the cell state. Computing the output of an LSTM cell requires evaluating the input gate, forget gate, and output gate on the current input and the previous state; at time $t$, the LSTM cell is calculated as follows:
$$\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + V_i c_{t-1} + b_i\right)\\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + V_f c_{t-1} + b_f\right)\\
\hat{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \hat{c}_t\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + V_o c_{t-1} + b_o\right)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned} \tag{4}$$
where $c_t$ is the cell state of the memory cell at time $t$; $c_{t-1}$ denotes the cell state at the previous time step; $\hat{c}_t$ denotes the candidate state of the input; $h_t$ is the output of the LSTM unit at time $t$; $W$, $U$, $V$ are coefficient matrices and $b$ the bias vectors; $\sigma$ is the Sigmoid activation function; $x_t$ is the input at time $t$; and $i_t$, $f_t$, $o_t$ are the outputs of the input, forget, and output gates at time $t$, respectively. The LSTM neural network in the encoder has a multilayer structure, and each layer consists of multiple cells, whose hidden state $h_t$ at time $t$ can be calculated by the formula:
$$h_t = f_e\left(h_{t-1}, \bar{S}_t\right) \tag{5}$$
where $f_e$ is the LSTM network, and $h_{t-1}$ is the output of the LSTM at the previous time step. After LSTM encoding, $L$ electrical load codes $H = (h_1, h_2, \ldots, h_L)$ are obtained.
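A sketch of the encoder step under the shapes assumed above: the weighted vectors $\bar{S}_t$ are consumed step by step by an LSTM, yielding the $L$ hidden states $H = (h_1, \ldots, h_L)$; the hidden size 64 is one of the grid-searched values reported later, not a fixed choice.

```python
# Sketch of the encoder LSTM producing H = (h_1, ..., h_L).
import torch
import torch.nn as nn

k, L, E_s = 16, 24, 64                     # E_s is grid-searched in the paper
lstm = nn.LSTM(input_size=k, hidden_size=E_s, batch_first=True)

s_bar = torch.randn(1, k, L)               # weighted features from the attention stage
H, _ = lstm(s_bar.transpose(1, 2))         # reorder to (batch, L, k), then encode
print(H.shape)                             # torch.Size([1, 24, 64]): h_1 ... h_L
```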
Autoregressive Information Compensation
Through subsequence partitioning and feature extraction of the customer load, the uncertainty features of the electrical load are extracted by the deep neural network. For the periodic features, we compensate for the information by introducing an autoregressive component at the LSTM prediction stage: the classical autoregressive (AR) model is used as the linear component to obtain the linear prediction value $\hat{y}_{T+1,2}$:
$$\hat{y}_{T+1,2} = \sum_{j=0}^{T-1} w_j^{ar} x_{T-j} + b^{ar} \tag{6}$$
where $w_j^{ar}$ and $b^{ar}$ are the coefficients of the autoregressive model. The final predicted value of the model is then the superposition of the linear and nonlinear results.
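The AR component of Equation (6) amounts to a single linear layer over the raw $T$-step window; a minimal sketch:

```python
# Sketch of the autoregressive information compensation of Eq. (6):
# a linear map from the raw load window to the linear prediction.
import torch
import torch.nn as nn

T = 72
ar = nn.Linear(T, 1)              # weights w_j^{ar} and bias b^{ar}

x = torch.randn(1, T)             # raw (unsegmented) load window
y_linear = ar(x)                  # linear prediction y-hat_{T+1,2}
print(y_linear.shape)             # torch.Size([1, 1])
```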

3.2.2. Meteorological Factor Feature Extraction

Meteorological factors are indirectly related to residential electrical loads, and introducing meteorological time-series data improves the generalization performance of the model. The encoding of meteorological factors is divided into two steps. The first is differencing: compared with electricity consumption, meteorological data change with smaller magnitude and a more obvious trend, and differencing makes these changes explicit. Moreover, after differencing, the meteorological data are smooth, unlike residential electricity load data. In encoding the meteorological data, we again use CNNs to extract features of meteorological changes on a local time scale; unlike the residential load branch, no further LSTMs are subsequently used for feature learning, so as not to increase the complexity of the model. The differenced value of the $i$-th meteorological factor at time $t$ can be expressed as
$$q'_{i,t} = q_{i,t} - q_{i,t-1} \tag{7}$$
Each of the $m$ meteorological factors undergoes first-order differencing to obtain the sequence $Q' = (q'_1, q'_2, \ldots, q'_m)$, and local features are extracted by the convolution formula of Equation (1) to obtain the meteorological factor coding sequence $\hat{Q} = (\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_m)$.
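A sketch of the weather branch under assumed channel counts: first-order differencing of the $m$ factor series (Equation (7)), followed by a pooling-free 1D convolution for local features.

```python
# Sketch of the meteorological branch: differencing (Eq. 7) + Conv1d.
import torch
import torch.nn as nn

m, T, O_w = 3, 72, 16                            # O_w is an assumed channel count
q = torch.randn(1, m, T)                         # (batch, m factors, T steps)

dq = q[:, :, 1:] - q[:, :, :-1]                  # q'_{i,t} = q_{i,t} - q_{i,t-1}
conv_w = nn.Conv1d(m, O_w, kernel_size=3, padding=1)
q_hat = torch.relu(conv_w(dq))                   # meteorological codes Q-hat
print(q_hat.shape)                               # torch.Size([1, 16, 71])
```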

3.2.3. Multi-Feature Fusion

After the first two stages of feature learning and temporal coding, the two coding vectors $H$ and $\hat{Q}$, extracted from the electrical and meteorological data, need to be aligned and fused before predictions can be made. The multi-feature fusion layer constructed in this paper is based on the residual connection mechanism, which was first proposed to solve the gradient degradation problem of deep networks [37]. In a traditional neural network, a direct mapping function is generally learned from input $X$ to output $Y$, expressed as $Y = F(X)$, where $F(\cdot)$ is the mapping relationship from $X$ to $Y$. Unlike a conventional network, the residual connection does not learn the direct mapping $F(\cdot)$; instead, it defines the output as the linear superposition of the input and a nonlinear transformation of it:
$$Y = F(X) + X \tag{8}$$
The structure of the multi-feature fusion network is shown in Figure 4. The two encoding matrices $H$ and $\hat{Q}$ output by the encoder are first aligned in temporal order and then fused by a residual network with layer normalization.
The residual network is shown in Figure 4, where FC denotes a fully connected layer. The fusion module is a stack of residual blocks, and the fusion vector at time $t$ is computed at the $i$-th residual block as follows:
$$\hat{H}_{t,i} = V_{res,i}^{T}\,\mathrm{ReLU}\left(W_{res,i}\hat{H}_t + b_{res,i}\right) + b'_{res,i} + \hat{H}_t \tag{9}$$
Figure 4. Multi-feature fusion module.
Layer normalization (LN) was proposed by Jimmy Lei Ba et al. in 2016. LN improves the training speed of neural networks by directly estimating the normalization statistics over the summed inputs of the neurons within a hidden layer [38]. If the number of hidden units in a given fully connected layer is $H_N$ and the input of the corresponding neuron is $\alpha$, the normalization statistics $\mu$ and $\sigma$ of the LN are computed using Equations (10) and (11):
$$\mu = \frac{1}{H_N}\sum_{i=1}^{H_N}\alpha_i \tag{10}$$
$$\sigma = \sqrt{\frac{1}{H_N}\sum_{i=1}^{H_N}\left(\alpha_i - \mu\right)^2} \tag{11}$$
So as not to destroy the previously learned information, LN introduces a set of gain $g$ and bias $b$ parameters, whose dimensions are kept consistent with the output dimension; letting $f$ be the activation function, the output after LN is given in Equation (12):
$$h = f\left(\frac{g}{\sqrt{\sigma^2 + \epsilon}} \odot \left(a - \mu\right) + b\right) \tag{12}$$
where $\epsilon$ is a very small constant introduced to avoid division by zero.
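Putting Equations (9) and (12) together, one residual fusion block can be sketched as follows (the dimensions are assumptions, and the concatenation stands in for the time-step alignment of the load and weather codes):

```python
# Sketch of one residual fusion block with layer normalization (Eqs. 9, 12).
import torch
import torch.nn as nn

class ResidualFusionBlock(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.W_res = nn.Linear(dim, hidden)   # W_res H-hat_t + b_res
        self.V_res = nn.Linear(hidden, dim)   # V_res^T ReLU(.) + b'_res
        self.ln = nn.LayerNorm(dim)           # LN with learnable gain g and bias b

    def forward(self, h_t):                   # h_t: (batch, dim)
        out = self.V_res(torch.relu(self.W_res(h_t))) + h_t  # F(X) + X
        return self.ln(out)

# Time-step alignment sketched as concatenation of a load code (64-d)
# and a weather code (16-d) at the same time step.
h_t = torch.cat([torch.randn(1, 64), torch.randn(1, 16)], dim=-1)
print(ResidualFusionBlock(dim=80)(h_t).shape)  # torch.Size([1, 80])
```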

3.2.4. Forecasting

The prediction layer is an LSTM that learns from the multiple time-step vectors $\hat{H} = (\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_L)$ output by the fusion layer; its hidden state at time $t$ is $d_t = f_d(d_{t-1}, \hat{h}_t)$ according to the LSTM unit calculation given above, where $f_d$ denotes the decoder LSTM, whose internal neurons are computed as in Formula (4). The output vector of the last time step is used as the input of a linear fully connected layer, whose output yields the nonlinear prediction value $\hat{y}_{T+1,1}$:
$$\hat{y}_{T+1,1} = V_l^{T}\left(W_l d_L + b_{l,1}\right) + b'_{l,1} \tag{13}$$
The final prediction of the model is the superposition of the linear and nonlinear results:
$$\hat{Y} = \hat{y}_{T+1,1} + \hat{y}_{T+1,2} \tag{14}$$
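An end-to-end sketch of the prediction stage under the shapes assumed above: a decoder LSTM reads the fused vectors, a fully connected head produces the nonlinear term of Equation (13), and the AR output is added as in Equation (14).

```python
# Sketch of the prediction stage: decoder LSTM + FC head + AR sum.
import torch
import torch.nn as nn

L, dim, E_d, T = 24, 80, 64, 72
decoder = nn.LSTM(input_size=dim, hidden_size=E_d, batch_first=True)
head = nn.Linear(E_d, 1)                  # nonlinear prediction y-hat_{T+1,1}
ar = nn.Linear(T, 1)                      # linear compensation y-hat_{T+1,2}

fused_seq = torch.randn(1, L, dim)        # fused vectors from the fusion layer
raw_load = torch.randn(1, T)              # raw load window for the AR branch

d, _ = decoder(fused_seq)                 # hidden states d_1 ... d_L
y_hat = head(d[:, -1, :]) + ar(raw_load)  # Y-hat = y-hat_{T+1,1} + y-hat_{T+1,2}
print(y_hat.shape)                        # torch.Size([1, 1])
```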

4. Case Study

In this section, we first describe an empirical study on an open-source electricity dataset. We then describe the parameter settings and evaluation metrics of the model. Finally, we train and test the model on each of the three users and analyze the errors. Training and testing were performed on an Ubuntu 16.04 server with an i9-10100 CPU, an RTX 3090 GPU, and four A5000 GPUs. All code was implemented in Python 3.8 with PyTorch 1.18, scikit-learn 1.11, and other frameworks.

4.1. Datasets

The UMass Smart Dataset—2017 release [39] contains data from multiple single-family apartments, along with weather data from a nearby observation station, including temperature, humidity, atmospheric pressure, dew point, wind speed, and wind direction. In this dataset, apartment power consumption is recorded every 15 min, while weather data are recorded hourly. In this study, the power consumption data from the first 90 days of 2015 for the users identified as APT14, APT16, and APT101 were selected for analysis. Temperature, humidity, and dew point were used as the meteorological features. Before training, the power consumption data were resampled to an hourly interval, and all time-series data were normalized to a 0–1 range to eliminate differences in units. Finally, each user dataset was split into training, validation, and test sets at a ratio of 8:1:1.
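A minimal pandas sketch of this preprocessing, with hypothetical column names (the hourly aggregation via the mean and the exact min–max formula are assumptions; the paper states only that the data were resampled and normalized to 0–1):

```python
# Sketch of the preprocessing: 15-min -> hourly resampling, 0-1 scaling,
# and a chronological 8:1:1 split.
import numpy as np
import pandas as pd

idx = pd.date_range("2015-01-01", periods=90 * 96, freq="15min")   # 90 days
df = pd.DataFrame({"power": np.random.rand(len(idx))}, index=idx)  # stand-in data

hourly = df.resample("1h").mean()                                  # hourly interval
scaled = (hourly - hourly.min()) / (hourly.max() - hourly.min())   # 0-1 range

n = len(scaled)
train = scaled.iloc[:int(0.8 * n)]
val = scaled.iloc[int(0.8 * n):int(0.9 * n)]
test = scaled.iloc[int(0.9 * n):]
print(len(train), len(val), len(test))
```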

4.2. Performance Evaluation

In this paper, root mean square error (RMSE), mean absolute error (MAE), and mean square error (MSE) are used to assess the prediction error, with smaller values indicating more accurate model predictions.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{15}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{16}$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 \tag{17}$$
where n is the number of load values, y i is the true load value, and y ^ i is the predicted value from the forecasting model.
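The three metrics are straightforward to compute; a NumPy sketch:

```python
# Sketch of the error metrics defined above.
import numpy as np

def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))

def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)

def rmse(y_hat, y):
    return np.sqrt(mse(y_hat, y))

y = np.array([0.20, 0.50, 0.30])
y_hat = np.array([0.25, 0.45, 0.35])
print(mae(y_hat, y), mse(y_hat, y), rmse(y_hat, y))
```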

4.3. Experimental Settings

4.3.1. Loss Function Setting

The neural network is trained by the backpropagation algorithm. In the training phase, the Adam optimizer [40] is used to train the model with the MSE loss, formulated as follows:
$$L(\theta) = \left\|\hat{y} - y\right\|_2^2 \tag{18}$$
where $\theta$ denotes all the trainable parameters in the model.

4.3.2. Hyperparameter Setting

The model proposed in this paper has six hyperparameters: subsequence length $l$, subsequence CNN output channels $O_s$, LSTM hidden dimension $E_s$, number of weather encoder output channels $O_w$, time window size $T$, and prediction LSTM hidden dimension $E_d$. The time window length is set to $T = 72$. For simplicity, $E_s$ and $E_d$ are set equal, $E_s = E_d \in \{32, 64, 128, 256\}$, and a grid search is used to find the optimum. During training, we tested five learning rates ranging from 1 × 10−1 to 1 × 10−5 with dropout set to 0 or 0.5 for each model, using early stopping with a patience of 5 to stop training when the loss on the validation set no longer decreases. Finally, we fine-tuned the models for 150 epochs with a learning rate of 1 × 10−4, an exponential decay rate of 0.98 per step, an early stopping patience of 5, and Adam as the optimizer.
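A minimal training-loop sketch matching these settings (Adam, MSE loss, learning rate 1 × 10−4, exponential decay 0.98, early stopping with patience 5); the placeholder model and random tensors stand in for the full architecture and the real data:

```python
# Sketch of the training procedure with early stopping (patience = 5).
import torch
import torch.nn as nn

model = nn.Linear(72, 1)                        # placeholder for the full model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.98)
loss_fn = nn.MSELoss()

X_tr, y_tr = torch.randn(256, 72), torch.randn(256, 1)  # stand-in training data
X_va, y_va = torch.randn(64, 72), torch.randn(64, 1)    # stand-in validation data

best, patience, wait = float("inf"), 5, 0
for epoch in range(150):
    opt.zero_grad()
    loss_fn(model(X_tr), y_tr).backward()
    opt.step()
    sched.step()                                # exponential decay of 0.98
    with torch.no_grad():
        val = loss_fn(model(X_va), y_va).item()
    if val < best:
        best, wait = val, 0                     # validation loss improved
    else:
        wait += 1
        if wait >= patience:                    # early stopping
            break
```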
For prediction, the final performance of every model was averaged over 10 runs. To verify the experimental effects, the proposed model is compared with five models: MLP, LSTM, GRU, AE-MLP, and CNN-LSTM. A brief description of the comparison models is given in Table 1, and their hyperparameter settings are summarized in Table 2.
Table 1. Contrast model introduction.
Table 2. Contrast model hyperparameters.

4.4. Subsequence Partitioning and Information Compensation

Subsequence partitioning enhances the nonlinear capability of the model by changing the dimensionality of the long customer load series, allowing the model to better learn the uncertain data features. Our task requires forecasting the residential electricity load time series from a given past period $T$. $T$ is divided into $n$ equal-length subsequences of length $l$ for uncertainty feature extraction; the value of $l$ is clearly related to the periodic characteristics of users' electricity use, and 24 h is a natural candidate. To study this periodicity further, we examined the autocorrelation coefficients of the three users at lags of 0–72 h, where the peak points indicate periodic structure in the load. As shown in Figure 5, the autocorrelation coefficients of users APT14 and APT101 show peak points at a lag of 24 h; for APT101, in addition to 24 h, peak points also appear at 2 h, i.e., this user has a minimum period of 2 h, while APT16 shows smaller variation in the 12–24 h interval without significant peaks, i.e., its periodicity is weak. Based on this study of lag and autocorrelation, we selected 24 as the appropriate subsequence length. To further validate the reasonableness of this subsequence division, as shown in Figure 6 and Table 3, we evaluated the model's prediction performance under the same conditions for subsequence lengths of 6, 9, 12, 24, and 36 on the load data of the three users.
Figure 5. Hysteresis and autocorrelation coefficient.
Figure 6. MAEs for different length subseries.
Table 3. Prediction performance of different subsequence length models.
The results from multiple experiments indicate that, for all three users, the model performs best when the subsequence length is set to 24, so this choice of optimal subsequence length is reasonable. Compared to the unsegmented sequence (length l = 72), the model with subsequence partitioning provides better predictive performance, and the model is sensitive to the subsequence length: an inappropriate length adversely affects its ability to process user power load information. Notably, for APT14, subsequence lengths of 12 and 36 yielded poor performance, which aligns with the information in the autocorrelation plot. For APT16, the model performs poorly at subsequence lengths of 9 and 36; cross-referencing the autocorrelation coefficients shows that APT16 lacks significant periodicity within a 24 h interval, with a trough at 36 h, indicating that when the subsequence partition does not match the user's periodicity, the model's inference ability decreases. A similar conclusion can be drawn for APT101, although, owing to its minimum period of 2 h, the model's error range remains relatively stable across subsequence lengths for this user.
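The lag–autocorrelation analysis behind the choice of $l = 24$ can be reproduced along the following lines (a sketch using a synthetic daily-periodic series in place of the real load data):

```python
# Sketch of the autocorrelation check at lags 0-72 h (cf. Figure 5).
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
hours = np.arange(90 * 24)
load = np.sin(2 * np.pi * hours / 24) + 0.1 * rng.standard_normal(len(hours))

r = acf(load, nlags=72)               # autocorrelation coefficients at lags 0..72
peak_lag = int(np.argmax(r[1:]) + 1)  # most prominent nonzero lag
print(peak_lag)                       # 24 for this daily-periodic series
```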

4.5. Forecasting and Error Analysis

Based on the above parameters, the model was trained and tested on the data of the three users APT14, APT16, and APT101; the prediction results are shown in Figure 7, Figure 8 and Figure 9.
Figure 7. Prediction results of the six models for user APT14.
Figure 8. Prediction results for the six models of the user APT16.
Figure 9. Prediction results for the six models of the user APT101.
Based on the error measures described above, an error analysis of the proposed model and the comparison models on the three users' predictions was carried out; the error results are shown in Table 4:
Table 4. Error statistics for the proposed model and the comparison model.
The prediction errors of the proposed model for users APT14, APT16, and APT101 are as follows: MAE: 4.23 × 10−2, 3.67 × 10−2, 5.00 × 10−2; MSE: 3.46 × 10−3, 3.77 × 10−3, 4.95 × 10−3; RMSE: 5.88 × 10−2, 6.14 × 10−2, 7.04 × 10−2. Compared with the five comparison models (MLP, LSTM, GRU, AE-MLP, and CNN-LSTM), our model achieves the best results on all three error metrics. The forecasting results for the three users show that direct forecasting with the fully connected MLP was the least effective of all the models, although the MLP could still predict upward and downward trends. The improvement of the AE-MLP over direct MLP regression on all three error indicators demonstrates the validity of analyzing one indicator series with multiple fully connected networks followed by data fusion, which informed our approach to constructing external factor inputs for data fusion. However, AE-MLP is prone to overfitting and has a large number of hyperparameters, making it difficult to train and debug, which poses a challenge to its use for load forecasting. Comparing the CNN-augmented model with the traditional LSTM, CNN-LSTM does not noticeably improve prediction performance over the LSTM: local features extracted by CNN convolution from sequences with multiple different metrics still require further processing. Our proposed model applies input attention to the multiple local feature sequences extracted by the CNN, giving each feature a different weight, which works well in the subsequent prediction task.
Further analysis of the errors via the box plot of the error distributions in Figure 10 shows that our proposed model has the smallest spread between the first and third quartiles of its error distribution and thus the most stable error.
Figure 10. Prediction error distribution.
Figure 11, Figure 12 and Figure 13 present an analysis of the MAE in 24 h predictions for each of the three users. For user APT14, our proposed model shows relatively stable error levels between 13:00 and 21:00. In the case of user APT16, all models exhibit higher error rates in the first 12 h, followed by a more stable error pattern in the latter 12 h. For user APT101, the proposed model demonstrates more pronounced fluctuations in prediction error throughout the 24 h period.
Figure 11. The 24 h error distribution of APT14 user prediction results.
Figure 12. The 24 h error distribution of APT16 user prediction results.
Figure 13. The 24 h error distribution of APT101 user prediction results.
Comparing the 24 h prediction error statistics of all models, the prediction stability of our proposed model is better for users APT14 and APT16, while for APT101 the LSTM exhibits the most stable error for most of the period.

4.6. Ablation

As mentioned above, the constructed model incorporates meteorological feature extraction and multi-feature fusion, together with an information compensation component and a weather factor fusion component. Subsequence division changes the dimensionality of the data input to the network for extracting the uncertainty of user behavior, but it also makes the model ignore part of the contextual information of the time-series data; we therefore introduce an autoregressive component to compensate for the global context information. When introducing meteorological factors into the model, we did not input them together with the electric load. Instead, independent sub-networks separately extract features from the data, and the results are fused with the coded output of the subsequence feature extraction in a structure equipped with residual connections and an LN mechanism. To verify the contribution of these components to prediction performance, we performed carefully designed ablation experiments under the optimal subsequence length l = 24 and the same hyperparameter settings as above, with the following three cases.
Case 1: Delete the information compensation component.
Case 2: No meteorological factors are introduced.
Case 3: Introduce meteorological data directly without fusion.
As shown in Table 5, further analysis of the Case 1 results shows that adding information compensation effectively improves model prediction performance for users APT14 and APT101; for APT16, there is no significant improvement in MAE, but the MSE and RMSE errors improve noticeably. Although the improvement from information compensation is small in some cases, the component is still effective across several error metrics. Overall, the introduction of the information compensation component enhances the stability of the model and effectively improves its overall performance.
Table 5. Errors in ablation experiments.
Comparing the results of Case 2 and Case 3 shows that the introduction of meteorological data improves the prediction ability of the proposed model. Analysis of the Case 3 results shows that the prediction model without the multifactor fusion layer has higher MAE, MSE, and RMSE values than the proposed model with the fusion layer. For APT14, removing the multifactor fusion process has little impact on the prediction results, whereas for APT16 and APT101 removing the fusion layer has a greater impact on prediction performance. This proves that, for meteorologically sensitive users, the introduced multifactor fusion layer can improve model performance, while, for users who are not sensitive to meteorological data, adding meteorological data can still improve the fault tolerance of the model and safeguard prediction performance.

4.7. Complexity and Scalability

During training, the LSTM model took 102 min, the CNN-LSTM model took 114 min, and the proposed model took 217 min. Due to the more complex model structure, the proposed model’s training time is approximately twice that of the LSTM and CNN-LSTM models. Moreover, the model’s design incorporates a 1D CNN module for extracting external features and a residual-based fusion module for integration, showcasing its scalability in adapting to additional features effectively.

5. Conclusions

This paper presents a short-term residential electricity load forecasting model based on the fusion of customer load uncertainty feature extraction and meteorological factors. The model first performs subsequence partitioning and alignment on the long sequences of customer electricity loads, processes local features using a CNN, and captures the customer's electricity consumption behavior over historical periods using an input attention mechanism. Meanwhile, for external meteorological factors, the input data are preprocessed with first-order differences, and a CNN is subsequently used for feature extraction and coding. In the feature fusion layer, user load features and meteorological factors are fused time step by time step in a fusion module based on residual connections; finally, the fused data are predicted using an LSTM network with an autoregressive module. The superior performance of the proposed model was verified by testing on user loads from the real UMass dataset and comparing against five benchmark models: MLP, LSTM, GRU, AE-MLP, and CNN-LSTM.
However, only meteorological factors are considered as external influences on customer behavior in this paper; social and calendar factors should also be taken into account for customer load forecasting, and the subsequence length used here is fixed at 24 h. In future work, we will therefore try to integrate calendar factors and social information factors as a way to mitigate model overfitting and improve prediction accuracy, and we will explore the impact of variable-length subsequences and other user load uncertainty analysis methods on model accuracy.

Author Contributions

Conceptualization, W.C. and Y.Z.; methodology, W.C. and H.L.; software, H.L. and X.Z.; validation, Y.Z. and X.L.; formal analysis, W.C. and H.L.; investigation, Y.Z. and X.Z.; data curation, W.C. and H.L.; writing—original draft, W.C. and H.L.; writing—review and editing, W.C., Y.Z., and H.L.; visualization, W.C. and H.L.; supervision, W.C. and Y.Z.; project administration, Y.Z.; funding acquisition, W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant no. 72274058), the Hunan Province Education Department General Project for Teaching Reform in Colleges and Universities (Grant no. HNJG-20230794), the General Project of Xiangjiang Laboratory in China (Grant no. 22XJ03021), and the Interdisciplinary Research Project at Hunan University of Technology and Business, China (Grant no. 2023SZJ01).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the first author.

Conflicts of Interest

Author Xiao Ling was employed by the company State Grid Hunan Electric Power Co., Ltd. Information and Communication Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Wang, S.; Chen, H.; Wu, L.; Wang, J. A novel smart meter data compression method via stacked convolutional sparse auto-encoder. Int. J. Electr. Power Energy Syst. 2020, 118, 105761. [Google Scholar] [CrossRef]
  2. Raza, A.; Jingzhao, L.; Ghadi, Y.; Adnan, M.; Ali, M. Smart home energy management systems: Research challenges and survey. Alex. Eng. J. 2024, 92, 117–170. [Google Scholar] [CrossRef]
  3. O’Donnell, J.; Su, W. Attention-Focused Machine Learning Method to Provide the Stochastic Load Forecasts Needed by Electric Utilities for the Evolving Electrical Distribution System. Energies 2023, 16, 5661. [Google Scholar] [CrossRef]
  4. Wu, F.; Cattani, C.; Song, W.; Zio, E. Fractional ARIMA with an improved cuckoo search optimization for the efficient Short-term power load forecasting. Alex. Eng. J. 2020, 59, 3111–3118. [Google Scholar] [CrossRef]
  5. Aflaki, A.; Gitizadeh, M.; Kantarci, B. Accuracy improvement of electrical load forecasting against new cyber-attack architectures. Sustain. Cities Soc. 2022, 77, 103523. [Google Scholar] [CrossRef]
  6. Heng, S.; Minghao, X.; Ran, L. Deep Learning for Household Load Forecasting—A Novel Pooling Deep RNN. IEEE Trans. Smart Grid 2018, 9, 5271–5280. [Google Scholar] [CrossRef]
  7. Danish, M.S.S.; Ahmadi, M.; Ibrahimi, A.M.; Dinçer, H.; Shirmohammadi, Z.; Khosravy, M.; Senjyu, T. Data-Driven Pathways to Sustainable Energy Solutions; Springer Nature: Cham, Switzerland, 2024; Volume 2024, pp. 1–31. [Google Scholar] [CrossRef]
  8. Wang, Y.; Chen, Q.; Hong, T.; Kang, C. Review of Smart Meter Data Analytics: Applications, Methodologies, and Challenges. IEEE Trans. Smart Grid 2019, 10, 3125–3148. [Google Scholar] [CrossRef]
  9. Aseeri, A.O. Effective RNN-Based Forecasting Methodology Design for Improving Short-Term Power Load Forecasts: Application to Large-Scale Power-Grid Time Series. J. Comput. Sci. 2023, 68, 101984. [Google Scholar] [CrossRef]
  10. Yu, P.; Fang, J.; Xu, Y.B.; Shi, Q. Application of Variational Mode Decomposition and Deep Learning in Short-Term Power Load Forecasting. J. Phys. Conf. Ser. 2021, 1883, 012128. [Google Scholar] [CrossRef]
  11. Wang, J.; Du, Y.; Wang, J. LSTM based long-term energy consumption prediction with periodicity. Energy 2020, 197, 117197. [Google Scholar] [CrossRef]
  12. Kong, W.; Dong, Z.; Jia, Y.; Hill, D.; Xu, Y.; Zhang, Y. Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network. IEEE Trans. Smart Grid 2019, 10, 841–851. [Google Scholar] [CrossRef]
  13. Cho, K.; van Merrienboer, B.; Gülçehre, Ç.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  14. Wang, Y.; Liu, M.; Bao, Z.; Zhang, S. Short-Term Load Forecasting with Multi-Source Data Using Gated Recurrent Unit Neural Networks. Energies 2018, 11, 1138. [Google Scholar] [CrossRef]
  15. Hossen, T.; Nair, A.; Chinnathambi, R.; Ranganathan, P. Residential Load Forecasting Using Deep Neural Networks (DNN). In Proceedings of the 2018 North American Power Symposium (NAPS), Fargo, ND, USA, 9–11 September 2018; pp. 1–5. [Google Scholar] [CrossRef]
  16. Tang, X.; Dai, Y.; Liu, Q.; Dang, X.; Xu, J. Short-term Power Load Prediction Based on Multilayer Bidirectional Recurrent Neural Network. Power Capacit. React. Power Compens. 2022, 43, 96–104. [Google Scholar] [CrossRef]
  17. Pavlatos, C.; Makris, E.; Fotis, G.; Vita, V.; Mladenov, V. Enhancing Electrical Load Prediction Using a Bidirectional LSTM Neural Network. Electronics 2023, 12, 4652. [Google Scholar] [CrossRef]
  18. Zou, Z.; Wang, J.; E, N.; Zhang, C.; Wang, Z.; Jiang, E. Short-Term Power Load Forecasting: An Integrated Approach Utilizing Variational Mode Decomposition and TCN–BiGRU. Energies 2023, 16, 6625. [Google Scholar] [CrossRef]
  19. Zhang, M.; Zhou, Y.; Xu, Z. Short-Term Load Forecasting Using Recurrent Neural Networks with Input Attention Mechanism and Hidden Connection Mechanism. IEEE Access 2020, 8, 186514–186529. [Google Scholar] [CrossRef]
  20. Rahman, A.; Srikumar, V.; Smith, A. Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks. Appl. Energy 2018, 212, 372–385. [Google Scholar] [CrossRef]
  21. Liu, T.; Tan, Z.; Xu, C.; Chen, H.; Li, Z. Study on deep reinforcement learning techniques for building energy consumption forecasting. Energy Build. 2020, 208, 109675. [Google Scholar] [CrossRef]
  22. Kim, T.; Cho, S. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
  23. Zang, H.; Xu, R.; Cheng, L.; Ding, T.; Liu, L.; Wei, Z.; Sun, G. Residential load forecasting based on LSTM fusing self-attention mechanism with pooling. Energy 2021, 229, 120682. [Google Scholar] [CrossRef]
  24. Chen, H.; Wang, S.; Wang, S.; Li, Y. Day-ahead aggregated load forecasting based on two-terminal sparse coding and deep neural network fusion. Electr. Power Syst. Res. 2019, 177, 105987. [Google Scholar] [CrossRef]
  25. Khan, Z.; Hussain, T.; Ullah, A.; Rho, S.; Lee, M.; Baik, S. Towards Efficient Electricity Forecasting in Residential and Commercial Buildings: A Novel Hybrid CNN with a LSTM-AE based Framework. Sensors 2020, 20, 1399. [Google Scholar] [CrossRef]
  26. Hosein, S.; Hosein, P. Load forecasting using deep neural networks. In Proceedings of the 2017 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Torino, Italy, 26–29 September 2017; pp. 1–5. [Google Scholar] [CrossRef]
  27. Geng, G.; He, Y.; Zhang, J.; Qin, T.X.; Yang, B. Short-Term Power Load Forecasting Based on PSO-Optimized VMD-TCN-Attention Mechanism. Energies 2023, 16, 4616. [Google Scholar] [CrossRef]
  28. Cui, J.; Li, Y.; Liu, J.L.; Li, J.; Yang, Z.; Yin, C.L. Efficient Self-attention with Relative Position Encoding for Electric Power Load Forecasting. J. Phys. Conf. Ser. 2022, 2205, 012009. [Google Scholar] [CrossRef]
  29. Gou, Y.; Guo, C.; Qin, R. Ultra short term power load forecasting based on the fusion of Seq2Seq BiLSTM and multi head attention mechanism. PLoS ONE 2024, 19, e0299632. [Google Scholar] [CrossRef]
  30. Wu, Y.; Zhang, Z. Attention-BiLSTM Short-Term Electricity Load Forecasting Based on Sparrow Search Optimization. Process Autom. Instrum. 2023, 44, 91–95. [Google Scholar] [CrossRef]
  31. Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G. A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. arXiv 2017, arXiv:1704.02971. [Google Scholar] [CrossRef]
  32. Maryam, I. Electrical load-temperature CNN for residential load forecasting. Energy 2021, 227, 120480. [Google Scholar] [CrossRef]
  33. Wang, Y.; Zhang, N.; Chen, X. A Short-Term Residential Load Forecasting Model Based on LSTM Recurrent Neural Network Considering Weather Features. Energies 2021, 14, 2737. [Google Scholar] [CrossRef]
  34. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  35. Abdeljaber, O.; Avci, O.; Kiranyaz, S.; Gabbouj, M.; Inman, D. Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks. J. Sound Vib. 2017, 388, 154–170. [Google Scholar] [CrossRef]
  36. Chen, K.; Chen, F.; Lai, B.; Jin, Z.; Liu, Y.; Li, K. Dynamic Spatio-Temporal Graph-Based CNNs for Traffic Flow Prediction. IEEE Access 2020, 8, 185136–185145. [Google Scholar] [CrossRef]
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  38. Ba, L.; Kiros, R.; Hinton, G. Layer Normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar] [CrossRef]
  39. Smart—UMass Trace Repository. Available online: https://traces.cs.umass.edu/docs/traces/smartstar (accessed on 21 April 2022).
  40. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
