1. Introduction
Accurate and stable load forecasting provides indispensable guidance for optimal unit commitment, efficient power distribution, and energy efficiency and thus plays a crucial role in power systems. Recently, the Energy Internet (EI) [1], as a further upgrade of the smart grid, has attracted considerable attention from academia and industry. The EI not only innovates the infrastructure, but also puts higher demands on the intelligent management of energy [2]. On the one hand, due to the high penetration of plug-and-play renewable energy resources in the EI, their variable output will exacerbate grid load volatility; on the other hand, the EI transforms the centralized infrastructure into distributed energy management, in which the distributed microgrids with different scales are responsible for managing densely deployed energy supply/demand units [3,4]. Fully exploiting the spatiotemporal correlation of load trends between geographically close microgrids will greatly facilitate forecasting tasks. Therefore, load forecasting faces new challenges and opportunities against the backdrop of the EI.
Various load forecasting techniques have emerged over the past decades [5,6,7,8,9], among which artificial-intelligence-based methods have become promising solutions because they excel at mapping the relationship between dependent and independent variables [10,11]. Reis et al. [6] embedded the discrete wavelet transform (WT) into the multilayer perceptron and proposed a multi-model short-term load forecasting scheme. In [7], the authors employed support vector regression (SVR) machines to build a parallel forecasting architecture for the hourly load in a day and used particle swarm search to optimize the SVR hyperparameters. Lin et al. [8] proposed an ensemble model based on variational mode decomposition (VMD) and the extreme learning machine (ELM) for multi-step-ahead load forecasting. Ko et al. [12] developed a model based on SVR and radial basis function networks and leveraged the extended Kalman algorithm to filter the model parameters. In addition, the ELM [7,13], the bi-square kernel regression model [14], and the k-nearest neighbor algorithm [15] have been employed for forecasting tasks. However, these traditional machine learning methods and shallow networks cannot fully model the complexity of the power demand side and thus often suffer from limited accuracy or stability.
Over the past decade, deep learning (DL) has achieved great success in many fields [16,17]. DL can automatically extract valuable information from a variety of related factors and then leverage its powerful nonlinear representation ability to pursue ideal results. Inspired by this, some popular DL-based models, such as the feed-forward deep neural network (FF-DNN) [18], convolutional neural network (CNN) [19], deep recurrent neural network (RNN) [20], and long short-term memory (LSTM) [21,22], have been applied to load forecasting and have shown excellent performance.
In [23], the authors evaluated deep stacked autoencoders and RNN-LSTM forecasting models, and the test results suggested the superiority of the DL-based methods over traditional models. In [18], the authors built models using FF-DNN and deep RNN and extracted up to 13 features from the raw load and meteorological data to drive the models. Li et al. [19] employed a deep CNN to cluster two-dimensional input loads, then combined it with various weather variables and a feed-forward network to obtain better forecast results. Shi et al. [20] developed a deep recurrent architecture using the LSTM unit and merged residential load and associated weather information as input data. Among recent studies, the work in [22] used a density-based clustering technique to preprocess historical information such as loads and holidays and built an LSTM-based model that can process data from multiple time steps simultaneously. The Copula function and a deep belief network were used in [24] to establish an hour-ahead forecasting method, where electricity prices and temperatures were introduced into the input. Farfar et al. [25] proposed a two-stage forecasting system, in which k-means first clusters the forecasted load according to the estimated temperature, and multiple stacked denoising autoencoder (SDAE) based models, one per load cluster, then perform prediction in the second stage. In [26], deep residual networks were modified into a forecasting model and combined with Monte Carlo dropout to achieve probabilistic forecasting.
The above-mentioned deep models rely on redundant connections to accommodate as many fluctuation patterns as possible, thereby improving inference robustness. Unfortunately, this goal is difficult to achieve for complex and volatile electricity consumption problems, and these DL-based methods often suffer performance degradation caused by two major problems: (1) The constructed input data usually fail to exploit historical load information fully. Therefore, to achieve the desired result, the methods have to resort to more related variables (e.g., over 10 types of variables in [18,26]), along with the necessary data preprocessing. (2) The developed deep framework usually focuses on creating a precise mapping from input to output, which may be unreliable or may not even hold in practice. This makes the models less flexible and self-regulating in the face of increasingly complex and diverse load situations [11]. To improve the forecasting performance, some methods [20,25] have to train separate models for different load data patterns.
The major challenge of load forecasting technology is how to cope with the diversity of data patterns brought by strong randomness and volatility. A daily load curve typically consists of (1) the cyclical load (accounting for a relatively large proportion) that follows a regular pattern, (2) the uncertain load (a small proportion) caused by external factors such as weather, holidays, and customer behavior, and (3) the noise that cannot be physically explained (the smallest proportion) [20], all of which are ultimately quantified and superimposed as a load sequence. A sufficient number of load sequences represents a variety of data patterns, and thus, reliable forecasts can be achieved by mining large amounts of highly correlated load data only, which has also been confirmed in several recent studies [9,20,22]. However, since the uncertain load and the noise account for a smaller share of the load, their contribution to the total forecast error is relatively small, which makes it easier for the forecasting model to memorize the data pattern of the cyclical load than that of the fluctuations. As a result, many models, on the one hand, need to resort to more types of input variables to compensate for neglecting the fluctuation patterns due to poor representation learning or nonlinear mapping capabilities; on the other hand, they have to face the complex data preprocessing and cumulative errors resulting from multivariate inputs.
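The error argument above can be illustrated with a small synthetic example. The sketch below is not drawn from the data used in this work: the sinusoidal cyclical term, the afternoon bump standing in for the uncertain load, and the noise level are all assumed values chosen only to reproduce the stated proportions. A forecaster that outputs the cyclical component alone misses the entire fluctuation, yet its mean absolute percentage error stays small.

```python
import numpy as np

# All magnitudes below are illustrative assumptions, not measured values.
rng = np.random.default_rng(0)
hours = np.arange(24)

# (1) Cyclical load: large, regular daily pattern.
cyclical = 100.0 + 30.0 * np.sin(2 * np.pi * (hours - 6) / 24)
# (2) Uncertain load: small bump, e.g. a weather-driven afternoon peak.
uncertain = 8.0 * np.exp(-(((hours - 15) / 2.0) ** 2))
# (3) Noise: smallest component.
noise = rng.normal(0.0, 1.0, size=hours.size)

load = cyclical + uncertain + noise

# A forecaster that has only memorized the cyclical pattern.
forecast = cyclical
residual = load - forecast

# Mean absolute percentage error of the cyclical-only forecast.
mape = np.mean(np.abs(residual) / load)
# Fraction of the fluctuation energy that the forecast fails to capture.
fluctuation_missed = np.sum(residual**2) / np.sum((uncertain + noise) ** 2)

print(f"MAPE of cyclical-only forecast: {mape:.2%}")
print(f"Share of fluctuation energy missed: {fluctuation_missed:.0%}")
```

Because the loss is dominated by the cyclical term, a model trained on such data has little incentive to learn the fluctuation pattern unless the input representation exposes it explicitly.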
In this work, we propose an ultra-short-term (one-hour-ahead) load forecasting scheme to provide decision support for power quality control, online operation safety monitoring and prevention, and emergency control. The scheme designs an input data plan based on historical load only and develops a deep model with strong representation learning and regression capabilities, aiming to address the two problems identified above.
Specifically, we first formulated a historical load matrix (HLM) in the context of the EI, which covers load data for multiple zones at different time points. Second, we leveraged the HLMs to create a historical load tensor (HLT), aiming to provide a forecasting model with enough source materials. Third, based on the spatiotemporal correlation of our HLM, we proposed a novel matrix decomposition algorithm to separate the base load and random components in the HLT effectively. Finally, we calculated the gradient information of the HLTs and formed all the HLT based preprocessing results into a multidimensional array to drive the forecasting model.
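The four preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the algorithm proposed in this work: the HLM layout (zones by time points), the stacking of HLMs from the same time window on consecutive days, and all function names are hypothetical, and a truncated SVD is substituted for the proposed matrix decomposition algorithm as a generic low-rank stand-in.

```python
import numpy as np

def build_hlm(load, t_end, n_steps):
    """HLM: loads of all zones over the n_steps time points before t_end.
    `load` has shape (n_zones, n_hours); the layout is an assumption."""
    return load[:, t_end - n_steps:t_end]

def build_hlt(load, t_end, n_steps, n_days, period=24):
    """HLT: HLMs for the same window on n_days consecutive days (assumed
    stacking rule), oldest first. Shape: (n_days, n_zones, n_steps)."""
    offsets = range(n_days - 1, -1, -1)
    return np.stack([build_hlm(load, t_end - d * period, n_steps) for d in offsets])

def low_rank_split(matrix, rank=1):
    """Stand-in decomposition: truncated SVD gives a low-rank 'base load'
    part; the remainder is treated as the random component."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    base = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return base, matrix - base

def preprocess(hlt, rank=1):
    """Stack raw load, base load, random component, and temporal gradient
    into one array of shape (n_days, n_zones, n_steps, 4)."""
    parts = [low_rank_split(hlm, rank) for hlm in hlt]
    base = np.stack([b for b, _ in parts])
    random_part = np.stack([r for _, r in parts])
    gradient = np.gradient(hlt, axis=-1)  # finite differences along time
    return np.stack([hlt, base, random_part, gradient], axis=-1)

# Usage with synthetic data: 5 zones, 10 days of hourly load.
rng = np.random.default_rng(0)
load = 100.0 + 10.0 * rng.random((5, 240))
hlt = build_hlt(load, t_end=240, n_steps=6, n_days=4)
features = preprocess(hlt, rank=1)
print(hlt.shape, features.shape)
```

Keeping the raw load, the two decomposed components, and the gradient as separate channels lets the downstream model weight the fluctuation information independently of the dominant base load.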
Further, to transform the constructed input into the desired forecast results, we developed a forecasting model consisting of a feature learning module and a regression module. The feature learning module is based on the 3D CNN [27] architecture and can extract valuable information along three input dimensions (i.e., depth, width, and height), which greatly improves the richness of the learned features. For the regression module, given the consecutive time attribute of the learned feature sequences, we implemented the nonlinear mapping with a gated recurrent unit (GRU) [28], which works well on time-series tasks. The GRU belongs to the family of recurrent networks and handles long-term dependencies as well as the LSTM does, but with far fewer network parameters [29,30]. The proposed model, the 3D CNN-GRU, closely matches the constructed input data and can automatically explore various data patterns in the multidimensional input arrays, thereby improving forecasting performance.
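The parameter saving of the GRU relative to the LSTM follows directly from the number of weight blocks in the textbook formulation of each cell: four for the LSTM (input, forget, and output gates plus the candidate state) and three for the GRU (update and reset gates plus the candidate state). The short sketch below counts them; the helper names and the layer sizes are illustrative assumptions and do not correspond to the configuration used in this work. Some library implementations keep an additional bias vector per block, which changes the totals slightly.

```python
def recurrent_params(input_dim, hidden_dim, n_blocks):
    """Trainable parameters of one recurrent layer in the textbook
    formulation: each block has input weights, recurrent weights, and a bias."""
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return n_blocks * per_block

def lstm_params(input_dim, hidden_dim):
    # Input, forget, and output gates plus the candidate cell state.
    return recurrent_params(input_dim, hidden_dim, n_blocks=4)

def gru_params(input_dim, hidden_dim):
    # Update and reset gates plus the candidate hidden state.
    return recurrent_params(input_dim, hidden_dim, n_blocks=3)

# Illustrative sizes only (assumed, not taken from the proposed model).
input_dim, hidden_dim = 64, 128
n_lstm = lstm_params(input_dim, hidden_dim)
n_gru = gru_params(input_dim, hidden_dim)
print(n_lstm, n_gru, n_gru / n_lstm)
```

For equal input and hidden sizes, the GRU therefore needs three quarters of the LSTM's recurrent parameters under this formulation.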
Figure 1 presents an overview of our load forecasting scheme against the backdrop of the EI. The main contributions of this work are threefold: (1) In the EI context, we created an HLT with spatiotemporal correlation and then preprocessed it based on matrix low-rank decomposition and the load gradient to obtain refined hierarchical input data. (2) We developed a novel 3D CNN-GRU model, which consists of two functional modules and can forecast the load trend of any zone covered by the HLM by changing the load label. (3) We constructed the input data from real-world data and built the 3D CNN-GRU model with TensorFlow [31]. Self-assessments from multiple perspectives and comprehensive comparisons with several advanced methods were carried out.
The rest of this paper is organized as follows. Section 2 details the proposed forecasting scheme. The experimental evaluations and analyses are presented in Section 3. Section 4 gives the conclusions and prospects.