1. Introduction
Environmental protection and the low-carbon economy have gradually become focal points of public attention, and energy exploitation has gradually shifted toward sustainability [1]. Energy consumption is an important factor in economic benefits, and energy conservation and emission reduction contribute to the realization of sustainable social development. The role of electricity in energy supply has been gradually strengthened. Grasping the future development trend of energy and electricity has become a necessary path for economic growth and environmental protection. Correctly predicting household electricity consumption on the power grid is of great significance for energy transformation and the pursuit of sustainable development goals [2]. With economic development and population growth, global power consumption has increased significantly. Accurate power consumption prediction is very important for energy supply, power dispatching and distribution, capital investment and market research management [3]. More and more scholars devote themselves to the research of accurate and reliable energy consumption prediction methods. In essence, energy consumption prediction is a typical time series problem, including univariate prediction and multivariate prediction. Due to the uncertainty and volatility of power data, traditional machine learning methods cannot predict power consumption well. In contrast, deep learning techniques can achieve higher accuracy. For the problem of power consumption prediction, numerous studies have investigated prediction models based on machine learning and deep learning. Summarizing the relevant literature, the prediction models for power consumption fall roughly into three categories: statistical methods, machine learning techniques [4,5,6,7] and hybrid models [8,9,10,11,12,13,14].
The first category focuses on the statistical analysis of historical time series data with different features, and on identifying the relationships among variables. Classical statistical learning methods mainly include the Markov process, regression models, exponential smoothing, the autoregressive model (AR), the moving average model (MA), the autoregressive moving average model (ARMA), the autoregressive integrated moving average model (ARIMA) and so on [15]. These methods have the merits of fast calculation and strong interpretability in predicting energy generation and power consumption. However, they generally cannot handle sudden non-stationary data. More specifically, power consumption data are prone to being influenced by the seasons, working days and other factors. This high variability can greatly reduce the prediction accuracy of statistical learning methods. Machine learning models, in contrast, can learn the complex nonlinear relationships and other relevant parameters in time series. Generally, these advanced learning models are superior to statistical learning methods because of their powerful representation ability.
With the maturity of artificial intelligence technology, machine learning models have been widely applied in the field of energy generation and power consumption [16,17,18]. These models can effectively extract high-dimensional complex features and construct nonlinear mappings directly from input to output. Machine learning models adopted for power consumption mainly include support vector machines (SVM) [19], support vector regression (SVR) [20,21], decision trees and artificial neural networks (ANN) [22,23]. In the early stage of ANN research, shallow network architectures were easier to implement and thus widely used. The number of hidden layers in a shallow neural network is kept relatively small to avoid over-fitting, falling into a local minimum and gradient vanishing. One disadvantage of such shallow topologies is that the features contained in the data may not be completely extracted. As a result, the prediction accuracy needs further improvement.
Deep learning techniques have emerged and developed rapidly with the continuous improvement of computer hardware, software and big data technology. So far, the boom of deep neural network technology has bred numerous deep learning models, including the deep belief network [24], convolutional neural network (CNN), long short-term memory (LSTM) [25,26], generative adversarial network, deep residual network [27], etc. As a branch of neural networks, the CNN has also been employed for time series in recent years. Compared with a fully connected network, a CNN has fewer parameters and lower complexity. In addition, LSTM is a variant of the recurrent neural network (RNN) that uses feedback connections to update the state of previous input neurons. The main difference between an RNN and LSTM is that the latter has a long-term memory unit, which overcomes the problem of unstable gradients in sequences with long-term dependencies. Nowadays, the LSTM model has been extensively applied in power systems [28,29,30].
Hybrid methods combine the advantages of physical methods and data-driven methods, and they are widely utilized in building, household and urban power load forecasting. Yang et al. [31] explored the potential of hybrid models based on extreme learning machines, RNN, SVM and multi-objective particle swarm optimization (MOPSO) in multi-step load forecasting. The experimental results in [31] showed that the combined prediction model achieves higher prediction accuracy than single models. Deep learning methods have also been extensively adopted in multi-step-ahead energy generation and power load forecasting [32,33]. In [20], an integration of a generalized RNN and SVM was proposed to predict power demand, ensuring both the accuracy of the prediction results and the robustness of the model. In [34], a power consumption prediction method was developed based on bidirectional LSTM (BiLSTM) and ANN, enhanced with a multiple simultaneously decreasing delays approach coupled with function fitting neural networks. The hybrid method in [34] was also employed to predict the total power consumption of business center consumers and a refrigerator storage room. Wu et al. [35] explored a CNN–LSTM–BiLSTM model with an attention mechanism to predict short-term power load. Furthermore, the combination of clustering methods and deep learning models has also been comprehensively studied in power system prediction [36,37], such as energy consumption based on k-means clustering [38,39], residential power load [40] and residential and small commercial building power demand by means of k-nearest neighbors [41].
In the power system, the accurate prediction of energy generation and power consumption has increasingly become a research hotspot. Various methods and technologies have been proposed to predict energy generation, power demand or consumption, so as to provide a basic guarantee for the safe and stable operation of the power grid.
The aim of this paper is to forecast household electric power by fusing landmark-based spectral clustering (LSC) and deep learning. Firstly, missing values are recovered by matrix completion. Then, linear normalization is employed to guarantee that the features are scaled to a proper order of magnitude. Next, household electric power samples are partitioned by LSC, a clustering method that scales to massive datasets. Ultimately, a deep learning model is established by combining the advantages of CNN and LSTM. As a consequence, a hybrid of LSC and deep learning is proposed for household electric power forecasting.
The remaining structure of the paper is organized as follows. Section 2 elaborates on the investigated dataset, reviews methods for data preprocessing and provides a framework of the hybrid model. Section 3 introduces landmark-based spectral clustering and several deep learning techniques and proposes the LSC–CNN–LSTM model. The experimental results and analysis are discussed in Section 4. Finally, the conclusions are drawn in Section 5.
3. A Deep Learning Framework by Combining LSC, CNN and LSTM
3.1. Landmark-Based Spectral Clustering
Spectral clustering is a kind of clustering method based on matrix decomposition [43]. Compared with other algorithms, spectral clustering usually produces better experimental performance because it considers the manifold structure of samples instead of the Euclidean space. A key step in spectral clustering is calculating the eigenvectors of the Laplacian matrix constructed from a similarity matrix. However, the eigen-decomposition has high computational complexity. Hence, this unscalable clustering method is of limited use in large scale data applications.
Landmark-based spectral clustering (LSC) can effectively deal with the issue of large scale samples [44]. The basic principle of LSC is to design an efficient eigen-decomposition of the Laplacian matrix by constructing a novel graph. In implementation, $p$ representative data points are first selected as landmarks and each sample in the original data is approximately represented by a sparse linear combination of these landmarks, where $p \ll n$, and $n$ is the number of samples. More specially, the sparse representation coefficients are directly obtained through the landmark-based representation matrix.
Matrix decomposition attempts to compress data by seeking a set of basis vectors such that each data point is written as a linear combination of the bases [44]. Given the samples matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$, it is approximately decomposed into the product of two low-rank matrices:
$$X \approx UZ, \quad (5)$$
where the basis matrix $U \in \mathbb{R}^{m \times p}$ and the coefficient matrix $Z \in \mathbb{R}^{p \times n}$. As a basis vector, each column of $U$ captures the high-dimensional features of the original sample space. Meanwhile, each column of $Z$ is the $p$-dimensional representation coefficient of the corresponding input instance under the new basis vectors. The difference between two matrices is frequently measured by the Frobenius norm $\|\cdot\|_F$ of the residual. Hence, the optimal low-rank representation can be obtained by solving the following optimization problem:
$$\min_{U,\,Z} \| X - UZ \|_F^2. \quad (6)$$
The optimal coefficient matrix $Z$ in the above minimization problem is usually dense, which means that each sample is a linear combination of all bases. These dense coefficients may have a negative effect on classification performance. Sparse coding in matrix decomposition is a popular method to overcome this defect. To this end, a sparse regularization term is imposed on the objective function in Equation (6), and a new optimization problem is formulated as below:
$$\min_{U,\,Z} \| X - UZ \|_F^2 + \lambda \sum_{i=1}^{n} \varphi(z_i), \quad (7)$$
where $\varphi(\cdot)$ is a function evaluating the sparsity of each column $z_i$ of $Z$ and $\lambda$ is a tradeoff coefficient controlling the sparsity penalty.
To simplify the model and reduce the computational complexity, LSC takes the landmarks as the basis vectors, which means that the elements of the basis matrix are fixed rather than treated as variables. The landmarks can be randomly chosen from the sample set or acquired via the k-means clustering algorithm, where all cluster centers are regarded as the landmarks.
For a data point $x_i$, its estimated value is denoted by $\hat{x}_i = \sum_{j=1}^{p} z_{ji} u_j$, where $u_j$ is the $j$-th column of $U$ and $z_{ji}$ is the element in the $j$-th row and $i$-th column of $Z$. A natural assumption is that the closer $u_j$ is to $x_i$, the larger the value of $z_{ji}$ should be. Let $N_i$ be the index set composed of the $r$ nearest neighbors of $x_i$ from $U$ and $r < p$. If $j \notin N_i$, then $z_{ji}$ is set to 0; otherwise, the value of $z_{ji}$ is determined by the following formula:
$$z_{ji} = \frac{K(x_i, u_j)}{\sum_{j' \in N_i} K(x_i, u_{j'})}, \quad (8)$$
where $K(\cdot,\cdot)$ is a kernel function. In practice, the Gaussian function or thermal kernel is the recommended choice of kernel function:
$$K(x_i, u_j) = \exp\!\left( -\frac{\| x_i - u_j \|^2}{2\sigma^2} \right), \quad (9)$$
where $\sigma$ is a fixed bandwidth. The value of $r$ ensures that $Z$ is a sparse matrix.
According to the sparse representation matrix $Z$ calculated from the landmarks, the affinity matrix, also known as the similarity matrix, can be obtained as follows:
$$W = \hat{Z}^{\top} \hat{Z}, \quad (10)$$
where $\hat{Z} = D^{-1/2} Z$, and $D$ is a $p$-order diagonal square matrix whose diagonal elements are the sums of each row of matrix $Z$, i.e., $D_{jj} = \sum_{i=1}^{n} z_{ji}$.
The thin singular value decomposition is performed on matrix $\hat{Z}$ and thus $\hat{Z} = A \Sigma B^{\top}$ is obtained, where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$ are the singular values of $\hat{Z}$. Denote $A = [a_1, \ldots, a_p] \in \mathbb{R}^{p \times p}$ and $B = [b_1, \ldots, b_p] \in \mathbb{R}^{n \times p}$, where each $a_i$ is called a left singular vector and each $b_i$ a right singular vector. It is easy to draw the following conclusions: each column of matrix $A$ is an eigenvector of matrix $\hat{Z} \hat{Z}^{\top}$, each column of matrix $B$ is an eigenvector of matrix $\hat{Z}^{\top} \hat{Z} = W$, and $\sigma_i^2$ is the $i$-th largest eigenvalue of $\hat{Z} \hat{Z}^{\top}$ or $\hat{Z}^{\top} \hat{Z}$. Furthermore, it holds that:
$$B^{\top} = \Sigma^{-1} A^{\top} \hat{Z}. \quad (11)$$
The procedure of LSC is summarized in Algorithm 1.
Algorithm 1 LSC
input: $X$: samples set, $k$: number of clusters, $p$: number of landmarks.
output: $k$ clusters after the clustering process.
1. Generate $p$ landmarks using k-means clustering or a random selection from $X$.
2. Construct the sparse affinity matrix $Z \in \mathbb{R}^{p \times n}$ between data points and landmarks according to Equation (8) and normalize it by $\hat{Z} = D^{-1/2} Z$.
3. Calculate the first $k$ top eigenvectors of matrix $\hat{Z} \hat{Z}^{\top}$ and form a matrix $A \in \mathbb{R}^{p \times k}$ by stacking them column by column.
4. Compute matrix $B$ through Equation (11).
5. Regard each row of matrix $B$ as a new data point and cluster them with k-means clustering or other algorithms to obtain the final result.
In Algorithm 1, the constructed samples matrix $X$ is used as the input of clustering. Each column of $X$ is regarded as an original sample, and all samples are eventually grouped into three clusters in our experiments. Under ideal conditions, the similarity of data points from the same cluster is high, while the similarity across clusters is low. The resulting three clusters are respectively adopted as the training sets of the deep neural network for prediction.
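As a companion to Algorithm 1, the following minimal Python sketch mirrors Equations (8)–(11); the landmark count p, neighbor count r and bandwidth sigma are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def lsc(X, k, p=100, r=5, sigma=1.0, seed=0):
    """Landmark-based spectral clustering (sketch of Algorithm 1).

    X: (n, m) array, one sample per row; k: number of clusters;
    p: number of landmarks; r: nearest landmarks kept per sample.
    """
    n = X.shape[0]
    # Step 1: landmarks as k-means centers (random selection also works).
    U = KMeans(n_clusters=p, n_init=4, random_state=seed).fit(X).cluster_centers_
    # Step 2: sparse representation Z via Equations (8) and (9).
    d2 = ((X[:, None, :] - U[None, :, :]) ** 2).sum(-1)  # (n, p) squared distances
    Z = np.zeros((p, n))
    for i in range(n):
        nn = np.argsort(d2[i])[:r]                       # r nearest landmarks
        w = np.exp(-d2[i, nn] / (2 * sigma ** 2))        # Gaussian kernel
        Z[nn, i] = w / w.sum()                           # normalized coefficients
    # Normalize: Z_hat = D^{-1/2} Z with D_jj the row sums of Z (Equation (10)).
    D = Z.sum(axis=1)
    Z_hat = Z / np.sqrt(D + 1e-12)[:, None]
    # Steps 3-4: top-k eigenvectors of Z_hat Z_hat^T, then B^T = Sigma^{-1} A^T Z_hat.
    A, S, _ = np.linalg.svd(Z_hat @ Z_hat.T)             # S holds sigma_i^2
    A_k, sv = A[:, :k], np.sqrt(S[:k])                   # singular values of Z_hat
    B = (np.diag(1.0 / sv) @ A_k.T @ Z_hat).T            # (n, k), Equation (11)
    # Step 5: k-means on the rows of B.
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(B)
```

The eigen-decomposition is applied to the small $p \times p$ matrix $\hat{Z}\hat{Z}^{\top}$ rather than the full $n \times n$ affinity matrix, which is precisely where LSC gains its scalability.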
3.2. Bootstrap Aggregating
Generally, the number of training samples in each cluster decreases significantly owing to the utilization of LSC, whereas deep learning techniques require sufficient training samples. Under such circumstances, the proposed clustering method may aggravate the risk of overfitting. Meanwhile, fewer sample points will also seriously affect the prediction performance of deep learning. For these reasons, a sample augmentation strategy is proposed.
Bootstrap aggregating, also called bagging, is a commonly used ensemble learning method. For a given sample set, it generates a new sample set using a resampling technique. In the process of resampling, each point is drawn from the original sample set uniformly at random, and duplicate samples are permitted in the resulting set. Through bootstrap aggregating, the samples in each cluster can be augmented arbitrarily. For each cluster, this paper stochastically generates an updated sample set whose sample number is $n$.
Once each cluster is augmented, the prediction process is executed based on deep learning methods. It is worth noting that the bootstrap aggregating technique allows us to fairly compare the experimental results before and after clustering, and overfitting is simultaneously alleviated to a certain extent. In addition, the data variability caused by bootstrap aggregating carries a low risk of overfitting because it has little impact on batches.
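A minimal sketch of this augmentation step, assuming each cluster is resampled with replacement up to the original sample count $n$ (function and variable names are illustrative):

```python
import numpy as np

def augment_cluster(samples, n, seed=0):
    """Bootstrap aggregating: resample one cluster with replacement
    until it contains n points (duplicates are allowed)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(samples), size=n)  # uniform draw with replacement
    return samples[idx]

# Usage: grow each LSC cluster back to the size of the full sample set.
# clusters = [X[labels == c] for c in range(3)]
# augmented = [augment_cluster(c, n=len(X), seed=i) for i, c in enumerate(clusters)]
```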
3.3. Convolutional Neural Network
As a special case of the feedforward neural network, the convolutional neural network (CNN) is characterized by convolution operations and a deep structure, and it has become one of the representative deep learning models. Moreover, a CNN can effectively learn latent features from input data, respond to surrounding units within its receptive field, and exhibit a strong representation learning ability.
Although CNNs are usually applied to extract features from two-dimensional images, they also support one-dimensional inputs with multiple channels and are well suited to multi-step time series prediction. This paper focuses on the one-dimensional CNN, which mainly includes an input layer, several hidden layers and an output layer. Among them, the hidden layers consist of convolution operators, pooling operators and fully connected operators.
The first operator involves multiple convolution kernels, and its function is to extract features from the corresponding input. Each element of a convolution kernel represents a weight coefficient. The size of a convolution kernel determines the size of the receptive field, and the step size controls the sliding length of the receptive field.
After feature extraction via the convolution operator, the output feature map is passed to the pooling operation for feature selection and information filtering. The effect of pooling is to replace single-point results in the feature map with feature statistics of adjacent regions. The pooling area and the sliding size are controlled by the pooling size and the step size, respectively, where the step plays the same role as in the convolution kernel when scanning the feature map.
The fully connected operator is equivalent to the hidden layer in a traditional feedforward neural network (FFNN). Both the convolution and pooling operators extract features from the input data, while the fully connected operator nonlinearly combines the extracted features to obtain the output. The layer preceding the output layer in a CNN is usually fully connected.
An analytical diagram of the one-dimensional convolution and pooling process is shown in Figure 5. The input vector with $m$ dimensions is convolved with $q$ convolution kernels. In the later experiments, the size of the convolution kernel and the step size are set to 3 and 1, respectively. The convolution operation is a process of continuously sliding weighted summation, where the height of the sliding window is set to 3. When the input sample $x = (x_1, \ldots, x_m)^{\top}$ is passed through the sliding window operation with $q$ convolution kernels, a matrix $C \in \mathbb{R}^{(m-2) \times q}$ can be calculated through weighted summation:
$$c_{ij} = \sum_{h=1}^{3} w_{hj}\, x_{i+h-1} + b_j, \quad i = 1, \ldots, m-2, \; j = 1, \ldots, q, \quad (12)$$
where $w_{hj}$ and $b_j$ denote the weights and bias of the $j$-th kernel. The sliding window operation in pooling is similar to the convolution operation, and the pooling size is set to 2 in the experimental section. The pooling operator reduces dimensionality by continuously sliding a window of height 2 and width 1. Maximum pooling selects the largest value within the fixed region of interest as the output, where the region is obtained by convolution in the sliding window. After the convolution and pooling operators, the previous feature matrix $C$ is eventually transformed into another matrix $P \in \mathbb{R}^{m' \times q}$, where $m' = \lfloor (m-2)/2 \rfloor$ and $p_{ij} = \max\{ c_{2i-1,j},\, c_{2i,j} \}$.
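As a concrete check of Equation (12) and the pooling step, the following NumPy sketch applies $q$ kernels of size 3 with stride 1 to an $m$-dimensional input and then performs max pooling of size 2; the function and variable names are illustrative:

```python
import numpy as np

def conv1d_maxpool(x, W, b):
    """x: (m,) input; W: (3, q) kernel weights; b: (q,) biases.
    Returns C of shape (m-2, q) and the pooled matrix P of shape
    (floor((m-2)/2), q)."""
    m, q = x.shape[0], W.shape[1]
    # Equation (12): sliding weighted summation, kernel size 3, stride 1.
    C = np.stack([x[i:i + 3] @ W + b for i in range(m - 2)])   # (m-2, q)
    # Max pooling with size 2 along the time axis.
    n_pool = (m - 2) // 2
    P = C[:2 * n_pool].reshape(n_pool, 2, q).max(axis=1)       # (n_pool, q)
    return C, P

C, P = conv1d_maxpool(np.arange(24.0), np.ones((3, 4)), np.zeros(4))
print(C.shape, P.shape)   # (22, 4) (11, 4)
```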
3.4. Long Short-Term Memory
A recurrent neural network (RNN) carries memory information from prior inputs and outputs, and it is especially suitable for sequential or time series data. As a popular variant of the RNN, long short-term memory (LSTM) overcomes the problems of vanishing and exploding gradients by fusing self-connected gates in the hidden cells.
Figure 6 gives the architecture of an LSTM unit. There exist three types of gates in LSTM, i.e., the input gate, the forget gate and the output gate. The first gate controls the entry of new information into the memory cell. The second gate is responsible for deciding whether previous information should be forgotten from the memory cell. The last gate controls the output of information. Information is transmitted in the following order. The current information first passes through the input gate, which determines whether there is input information. The second step is to judge whether the forget gate chooses to forget the information in the memory cell. Finally, the available information is further transmitted to the output gate, which judges whether to output the information at that moment.
Given the input vector $x_t$ of an LSTM unit at time $t$, the calculation process is shown in Equations (13)–(18):
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f), \quad (13)$$
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i), \quad (14)$$
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o), \quad (15)$$
$$\tilde{c}_t = \tanh(W_c h_{t-1} + U_c x_t + b_c), \quad (16)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad (17)$$
$$h_t = o_t \odot \tanh(c_t), \quad (18)$$
where the notation $\odot$ indicates the Hadamard product, $\sigma(\cdot)$ is the sigmoid activation function and $\tanh(\cdot)$ is the hyperbolic tangent activation function. In Equations (13)–(15), $f_t$, $i_t$ and $o_t$ are, respectively, the forget gate, the input gate and the output gate; $W_f$, $W_i$ and $W_o$ are the weight matrices of the forget gate, input gate and output gate connecting the output $h_{t-1}$ of the previous unit, respectively; $U_f$, $U_i$ and $U_o$ are the weight matrices of the aforementioned three gates connecting the input $x_t$ of the current unit, respectively. In Equations (16)–(18), $\tilde{c}_t$, $c_t$ and $h_t$ are, respectively, the candidate values, the state vector and the output vector; $W_c$ and $U_c$ are the input weight matrices connecting the output of the previous unit and the input of the current unit, respectively. Moreover, $b_f$, $b_i$ and $b_o$ are the bias vectors of the three gates, respectively, and $b_c$ is the input bias vector.
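A direct NumPy transcription of Equations (13)–(18) makes the gate interactions concrete; this is a minimal sketch, and the parameter container P is an illustrative convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM step implementing Equations (13)-(18).
    P is a dict of weight matrices W_*/U_* and bias vectors b_*."""
    f = sigmoid(P["Wf"] @ h_prev + P["Uf"] @ x_t + P["bf"])        # forget gate (13)
    i = sigmoid(P["Wi"] @ h_prev + P["Ui"] @ x_t + P["bi"])        # input gate  (14)
    o = sigmoid(P["Wo"] @ h_prev + P["Uo"] @ x_t + P["bo"])        # output gate (15)
    c_tilde = np.tanh(P["Wc"] @ h_prev + P["Uc"] @ x_t + P["bc"])  # candidate   (16)
    c = f * c_prev + i * c_tilde                                   # cell state  (17)
    h = o * np.tanh(c)                                             # output      (18)
    return h, c
```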
3.5. Integration of LSC, CNN and LSTM
LSC partitions the original samples into several groups. In general, the samples in each group exhibit lower model complexity than the full sample set, which means that the trained forecasting model tends to achieve higher performance. A CNN has a powerful ability in feature extraction and representation learning. In addition, LSTM has a natural advantage in transmitting information through time series data. Combining the advantages of these three methods, this paper presents a hybrid forecasting model, called LSC–CNN–LSTM.
Figure 7 sketches the framework of the proposed novel deep learning model. In this figure, the structure of LSC–CNN–LSTM mainly includes input feature vector module, large scale spectral clustering model, resampling module, CNN feature extraction module and sequence training prediction module.
Samples generated from the global active power data are first passed to the input feature vector module, and then all input samples are partitioned into three clusters through LSC. Subsequently, bootstrap aggregating is applied to each cluster to increase the number of samples. For each cluster, the global active power of household electricity in the previous 24 h is used as the input of the deep neural network. The CNN feature extraction module mainly includes two one-dimensional convolution layers and a maximum pooling layer. Specifically, the one-dimensional convolutions read the sequence input features and automatically learn the latent features. The first convolution layer establishes a feature map from the input sequence, and the second performs the same convolution operation on the feature map created by the first layer to enlarge its salient features. The maximum pooling layer simplifies the feature map obtained by convolution for the sake of dimensionality reduction. Finally, the reduced feature map is flattened into a long vector.
The Repeat Vector layer repeats the internal representation of the input sequence once for each time step in the output sequence. The output of the CNN is fed into the LSTM unit. The subsequent sequence prediction module mainly consists of one LSTM layer and two fully connected layers. The LSTM layer is followed by the two fully connected layers, which apply a nonlinear transformation to the previously extracted features, extract the correlations between features and map them to the output space. The global active power of the next hour is ultimately obtained after feature extraction and sequence training prediction.
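Putting the modules of Figure 7 together, a Keras sketch of the per-cluster CNN–LSTM branch might look as follows. The 24-h input window, two Conv1D layers, max pooling, flattening, RepeatVector, one LSTM layer and two fully connected layers follow the description above, while the filter counts, unit counts and activations are illustrative assumptions rather than the authors' exact configuration:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_lstm(n_steps_in=24, n_steps_out=1):
    """CNN-LSTM for one cluster: 24 hourly readings in, next hour out."""
    model = keras.Sequential([
        keras.Input(shape=(n_steps_in, 1)),
        layers.Conv1D(64, kernel_size=3, activation="relu"),  # first feature map
        layers.Conv1D(64, kernel_size=3, activation="relu"),  # enlarge salient features
        layers.MaxPooling1D(pool_size=2),                     # dimensionality reduction
        layers.Flatten(),                                     # long feature vector
        layers.RepeatVector(n_steps_out),                     # one copy per output step
        layers.LSTM(100, return_sequences=True),              # sequence prediction
        layers.TimeDistributed(layers.Dense(50, activation="relu")),
        layers.TimeDistributed(layers.Dense(1)),              # global active power
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# One model is trained per augmented LSC cluster:
# model = build_cnn_lstm()
# model.fit(X_cluster, y_cluster, epochs=50, batch_size=32)
```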
5. Conclusions
This paper investigates the problem of household electric power prediction. The fluctuation and uncertainty of power consumption bring great challenges to the prediction of future power consumption. To address these issues, a hybrid forecasting model is proposed by combining LSC, CNN and LSTM. Firstly, the landmark-based large scale spectral clustering method is employed to cluster all samples according to the periodicity and seasonality of electricity consumption. Then, the samples in each cluster are expanded via the bootstrap aggregating technique. Next, the combined deep learning model is used to accomplish the prediction task on all clusters. Finally, the prediction performances with and without clustering are compared. The experimental results show that the proposed LSC–CNN–LSTM model outperforms other machine learning and deep learning methods in predicting household power consumption. Simultaneously, these results verify the effectiveness of the hybrid deep learning model based on the LSC method and CNN–LSTM.
The combination of clustering and deep learning can effectively deal with the massive samples of the big data era and has considerable development prospects in the field of artificial intelligence. This paper only considers one-step prediction of a single attribute of household power consumption. Multi-step prediction involving multiple attributes is quite worthwhile for further research. In addition, although bootstrap aggregating is employed for sample augmentation in the proposed method, the ensemble learning technique itself is not considered. An ensemble of deep learning models can yield better generalization performance and provide interval predictions. Hence, a deep ensemble model will be a promising research subject. All aforementioned models can also be applied more widely in renewable energy fields, such as solar energy, wind energy and geothermal power generation, which is a promising research direction.