Article

Stride-TCN for Energy Consumption Forecasting and Its Optimization

Le Hoang Anh, Gwang Hyun Yu, Dang Thanh Vu, Jin Sul Kim, Jung Il Lee, Jun Churl Yoon and Jin Young Kim
1 Department of ICT Convergence System Engineering, Chonnam National University, 77, Yongbong-ro, Buk-gu, Gwangju 61186, Korea
2 Korea Electric Power Research Institute (KEPRI), 105, Munji-ro, Yuseong-ku, Daejeon 34056, Korea
3 Korea Electric Power Corporation (KEPCO), 55, Jeollyeok-ro, Jeollanam-do, Naju-si 58322, Korea
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2022, 12(19), 9422; https://doi.org/10.3390/app12199422
Submission received: 5 August 2022 / Revised: 16 September 2022 / Accepted: 16 September 2022 / Published: 20 September 2022
(This article belongs to the Special Issue Advances in Applied Smart Mobile Media & Network Computing)

Abstract

Forecasting, commonly used in econometrics, meteorology, and energy consumption prediction, is the field of study that deals with time series data to predict future trends. Previous studies have shown that both traditional statistical models and recent deep learning-based approaches achieve good forecasting performance. In particular, temporal convolutional networks (TCNs) have proved their effectiveness on several time series benchmarks. However, existing TCN models are too heavy to deploy on resource-constrained systems, such as edge devices. As a resolution, this study proposes a stride–dilation mechanism for the TCN that favors a lightweight model while still achieving on-par accuracy with its heavy counterparts. We also present the Chonnam National University (CNU) Electric Power Consumption dataset, a dataset of energy consumption measured hourly at CNU by smart meters. The experimental results indicate that our best model reduces the mean squared error by 32.7%, while its model size is only 1.6% of that of the baseline TCN.

1. Introduction

Research on energy usage management has a long tradition, particularly regarding energy consumption. According to Enerdata (https://yearbook.enerdata.net/electricity/electricity-domestic-consumption-data.html, accessed on 18 July 2022) records for 2021, electricity accounts for 10% of all types of global energy, mostly consumed by China (42%), the United States (21%), and India (7%). Forecasting models are necessary to optimize energy consumption programs and reduce energy loss efficiently. Many forecasting models, both classical statistical methods and deep learning approaches, are available.
The time-series forecasting problem is usually addressed by classical statistical methods, such as autoregressive (AR) models [1] and the Gaussian process [2], or by deep learning methods, such as LSTMs, GRUs, and Transformers [3]. Although these methods operate well in specific circumstances [4], their practical potential is limited by heavy computation.
In this paper, we propose stride–dilated temporal convolutional networks (TCNs), a family of lightweight models for predicting energy consumption. The proposed method is based on the periodic patterns that are usually visible in energy consumption time series. Given the capability of detecting periodic patterns, the proposed model can automatically focus on learning the important parts of the data to make predictions. To predict a period in the future, we focus on extracting information from moments in the past. We hypothesize that only a few time points are important for forecasting, and most of the history has a low correlation and can therefore be ignored to reduce the computational burden. Moreover, we observe that different time series have different cyclic patterns. Therefore, we suggest using a search algorithm to determine an optimized TCN architecture with appropriate stride hyperparameters; to this end, this paper adopts Bayesian optimization.
The aim of this work is two-fold:
- We propose a new TCN architecture with performance on par with state-of-the-art models on several benchmarks, but the number of parameters is greatly reduced based on the stride mechanism. We search for the best stride hyperparameter, representing the cyclic patterns in the data.
- We introduce a dataset of electrical energy consumption measured at Chonnam National University (CNU), South Korea. Along with the dataset, we also present the baseline benchmark.
The rest of the paper is organized as follows: Section 2 presents the background work. Section 3 details the methodology, and Section 4 describes the experiments on the dataset. Then, Section 5 discusses the work and presents a conclusion. Additionally, we publish the source code and datasets used for the experiments (https://github.com/andrewlee1807/tcns-with-nas).

2. Background

Model staleness, data quality, and long prediction ranges are common ill-posed aspects of time-series forecasting [5]. Various methods have been proposed to solve these problems, not only with classical statistics but also with recent deep-learning-based techniques.
As a representative of traditional methods, autoregressive integrated moving average (ARIMA) [5] has been used for many years in the field of time series modeling. An ARIMA model combines AR and the moving average to account for seasonality, long-term trends, autoregression, and autocorrelation embedded in the data. However, long-term models are inevitably prone to overfitting and high computational costs [3].
Moreover, deep learning techniques, such as CNNs and RNNs, have also been introduced and widely applied to time series forecasting. The advantage of a deep learning approach for time series forecasting is that it does not require in-depth manual preparation of the data beforehand, because representative features are automatically extracted during training. Extensions of RNNs, such as long short-term memory (LSTM) and gated recurrent units (GRU), are well suited for time series forecasting [3]. However, these memory- and gate-based models still suffer from the problem of long-term dependencies in time-series data. In particular, the number of parameters increases significantly even when only a short additional time interval is considered [6].
The TCN [7] was proposed to address the long-term dependency issue in time series data. As far as we know, the TCN is currently a prominent model in time series forecasting due to two significant modifications: dilation and causal one-dimensional (1D) convolutional layers. The causal 1D convolutional layer offers a learnable kernel filter with padding only on the past side, so that no future information leaks into the prediction. Moreover, dilation is the main factor that grants the TCN the ability to enlarge its history coverage by stacking layers. Therefore, the TCN is very effective in dealing with time-series problems and is easily extensible. However, to ensure a sufficient range of history for forecasting, a TCN often stacks many convolutional layers on top of each other, significantly increasing the number of parameters and requiring considerable time to train the model. With the principles of the TCN in mind, we develop a simplified architecture that uses correlation factors from the dataset to achieve a compact model with lower complexity and robust performance.
A large number of existing studies in the broader literature have examined the TCN and its variants for various tasks and types of datasets. Gan et al. employed a TCN with an interval width adjustment strategy for wind speed forecasting [8]. For the same task of wind speed forecasting, Li et al. proposed a framework that combined patch transformation, mode decomposition with adaptive noise, and a TCN [9]. In mechanical systems, Cao et al. introduced a TCN with a residual self-attention mechanism for remaining useful life prediction [10]. For video segmentation, Singhania et al. proposed a coarse-to-fine multi-resolution encoder–decoder TCN to ensure smoothness and temporal coherency [11]. Ma et al. presented a densely connected TCN with a squeeze-and-excitation block and attention mechanism to increase the receptive field's size for solving the lip-reading problem in videos [12].

3. Methodology

In this section, the time series forecasting problem is formulated first. Next, the TCN model used as the baseline in the comparative evaluation is described. Finally, the stride-TCN is introduced.

3.1. Time Series Forecasting Problem

First, we highlight the nature of the time-series forecasting task. Given an input sequence $X = \{x_0, \dots, x_T\}$, the task is to predict the outputs $\hat{Y} = (\hat{y}_0, \dots, \hat{y}_P)$; the key constraint is that, from an ordered sequence of $T$ observed data points, the following $P$ data points must be predicted in chronological order. Formally, a modeling network is any function $f: X \rightarrow Y$ that produces the mapping
$$\hat{Y} = f(X) \qquad (1)$$
Building the function $f$ is the process of learning the optimal parameters of the network $f$ from a set of time series $\{x^{(i)}_{0:T}\}_{i=1}^{N}$, with the corresponding future time series denoted as $\{x^{(i)}_{(T+1):(T+P)}\}_{i=1}^{N}$, where $N$ is the number of series, $T$ represents the length of the historical observations, and $P$ indicates the length of the forecasting horizon. The learning process minimizes an error function $L(Y, f(X))$ between the actual outputs and the predictions. Because time-series forecasting generally concerns the prediction of real values, typical loss functions are the mean squared error (MSE), the mean absolute error (MAE), or their variants (mean absolute percentage error, root mean square error, etc.). The MSE puts more weight on learning the outliers in the dataset, whereas the MAE tends to ignore them. However, in some cases the data are less sensitive to outliers, and those points should not have high priority. The Huber loss, which combines the MAE and MSE, solves this problem [6]. The mathematical form of the Huber loss is:
$$L(Y, f(X)) = \begin{cases} \frac{1}{2}\left(Y - f(X)\right)^2 & \text{for } |Y - f(X)| \le \delta \\ \delta\,|Y - f(X)| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases} \qquad (2)$$
where the $\delta$ parameter controls the sensitivity to outliers. In this study, we used the Huber loss as the error function to calculate the difference between the current and expected output, with the $\delta$ parameter fine-tuned by the optimization algorithm.
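For reference, a minimal sketch of this loss in Keras is shown below; the built-in Huber implementation corresponds to Equation (2) up to the reduction over the batch, and the δ value shown is only a placeholder, since in this work δ is tuned automatically.
```python
import tensorflow as tf

# Minimal sketch: Keras' built-in Huber loss corresponds to Equation (2).
# delta=1.0 is a placeholder; in this work the value is tuned rather than fixed.
huber = tf.keras.losses.Huber(delta=1.0)

y_true = tf.constant([[0.12], [0.30], [0.95]])
y_pred = tf.constant([[0.10], [0.45], [0.20]])
print(float(huber(y_true, y_pred)))  # mean Huber error over the batch
```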

3.2. TCN Architecture

A reasonable volume of training data was available in this work; thus, we decided to use deep learning models to predict hour-ahead energy consumption. The TCN is a variation of the CNN for sequence modeling tasks that combines aspects of the RNN and CNN architectures. The TCN achieves better performance than RNNs in many tasks while avoiding the common drawbacks of recurrent models, such as the exploding/vanishing gradient problem or lack of memory retention [7]. The original TCN architecture (Figure 1) includes two components: dilated and causal 1D convolutional layers [13], which smooth the input time series. Thus, we do not need to add rolling mean or rolling standard deviation values to the input features.
Dilated convolution can be applied to the long information dependency problem of the sequence to determine the output $\mathcal{Y}$ at position $t$ for a sequence input $x \in \mathbb{R}^n$, expressed as:
$$\mathcal{Y}(t) = (x \ast_d w)(t) = \sum_{i=0}^{K-1} w(i) \cdot x_{t - d \cdot i} \qquad (3)$$
where $d$ represents the dilation factor, $K$ indicates the 1D convolutional window size, and $t - d \cdot i$ denotes the direction of the past with the kernel $w: \{0, \dots, K-1\} \rightarrow \mathbb{R}$. However, to construct a deep model whose increased depth captures a more extended history based on the TCN, the use of skip connections is recommended [7]. Accordingly, shortcut connections across layers are added to TCNs to counter the degradation problem, in which accuracy saturates as the network converges.
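To make Equation (3) concrete, a plain NumPy sketch of a single dilated causal convolution is given below; treating pre-history positions as zeros and the particular kernel values are illustrative assumptions.
```python
import numpy as np

def dilated_causal_conv(x, w, d):
    """Sketch of Equation (3): y(t) = sum_i w(i) * x[t - d*i].
    x is a 1D series, w a kernel of size K, d the dilation factor.
    Positions before the start of the series are treated as zeros (causal padding)."""
    K = len(w)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(K):
            j = t - d * i
            if j >= 0:          # causal: only current and past samples contribute
                y[t] += w[i] * x[j]
    return y

x = np.arange(10, dtype=float)
print(dilated_causal_conv(x, w=np.array([0.5, 0.3, 0.2]), d=2))
```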
Stacking multiple dilated convolutions enables networks to have extensive receptive fields and to capture long-range temporal dependencies with a smaller number of layers [14]. In addition, the dilation factor $d_l$ increases over consecutive layers within a block and is calculated as $d_l = 2^l$ for layer $l$ of the network. Therefore, each TCN block contains $\gamma$ elements, identified based on $R_{field}^{max}$, the maximum supported receptive field $R_{field}$:
$$\gamma = \left[\log_2\left(R_{field}^{max} - 1\right)\right] + 1 \qquad (4)$$
As defined in Equation (4), we can consider γ a parameter to determine the number of dilations of a TCN block. However, setting the TCN hyperparameters by hand requires an empirical and time-consuming trial-and-error process and is not optimal [7]. In this paper, we reduce this work by automatically searching the hyperparameters. Table 1 lists the hyperparameters automatically searched for in the TCN using Bayesian optimization.
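As a concrete reading of Equation (4), the following sketch derives the number of dilated layers and the corresponding dilation list for a desired receptive field; it assumes the square bracket denotes a ceiling, and the target receptive field of 169 steps (about one week of hourly history) is only illustrative.
```python
import math

def num_dilations(max_receptive_field):
    # Hedged reading of Equation (4): the square bracket is taken as a ceiling.
    return math.ceil(math.log2(max_receptive_field - 1)) + 1

gamma = num_dilations(169)                    # e.g., about one week of hourly history
dilations = [2 ** l for l in range(gamma)]    # d_l = 2^l for layer l = 0..gamma-1
print(gamma, dilations)                       # 9 [1, 2, 4, 8, 16, 32, 64, 128, 256]
```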

3.3. Stride-TCN

This study focuses on energy consumption and was tested on three energy-consumption datasets. The data analysis found that the energy consumption data are seasonal (Figure 2). If we break the time series down into small components based on that period and compare them, we observe a similarity in the shape of the time series. Therefore, the information at the same positions within the periods has a strong relationship and supports making predictions for the next periods.
The proposed architecture extracts and learns the information at the corresponding locations of each related period, making the model efficient and reducing its complexity. Similar to the TCN, the proposed architecture consists of two components: dilated and causal 1D convolutional layers. However, the proposed model adjusts the calculation of the 1D convolution. Layer $L_n$ is calculated directly from the lower layer $L_{n-1}$. Node $X_i^{L_n}$ in layer $L_n$ is determined by convolving a kernel of size $K$ over layer $L_{n-1}$. Node $X_{i+1}^{L_n}$ in layer $L_n$ is calculated in the same way as $X_i^{L_n}$, but the convolution position on layer $L_{n-1}$ is shifted by a distance $S$ from the position used for $X_i^{L_n}$. This distance corresponds to the time dependence of the data mentioned above.
The autocorrelation method is used to determine the time dependence $S$ of the data series [15]. We call the model built by this approach the heuristic–stride–TCN. Although the number of model parameters was reduced, the model error remained high compared to the TCN.
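A minimal sketch of this heuristic is given below, assuming the stride S is taken as the lag with the strongest autocorrelation beyond lag 1; the function name and the lag search range are illustrative.
```python
import numpy as np
from statsmodels.tsa.stattools import acf

def heuristic_stride(series, max_lag=36):
    """Pick the stride S as the non-trivial lag with the strongest autocorrelation.
    For hourly energy consumption this is typically the daily period (24)."""
    r = acf(series, nlags=max_lag)
    return int(np.argmax(r[2:]) + 2)   # skip lags 0 and 1

# toy hourly series with a daily (24 h) cycle
t = np.arange(24 * 60)
toy = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(len(t))
print(heuristic_stride(toy))           # expected to be 24 (the daily period)
```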
In another approach to determining the time dependence $S$ of the data series, we propose the stride–TCN architecture, which applies Bayesian optimization to determine the hyperparameters automatically; this reduces both the model parameters and the errors. Table 2 lists the hyperparameters automatically searched for in the stride-TCN.
An overview of stride–TCN architecture is illustrated in Figure 3.
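The sketch below illustrates one way to realize this idea in Keras, reading the stride mechanism as strided causal 1D convolutions; the filter count, kernel size, dropout rate, and the strides (24, 7) are placeholders borrowed from the heuristic configuration in Table 4 rather than a definitive implementation.
```python
import tensorflow as tf

def build_stride_tcn(input_len=168, horizon=24, filters=32, kernel_size=3,
                     strides=(24, 7), dropout=0.1):
    """Sketch of a two-layer stride-TCN: causal 1D convolutions whose stride S
    jumps to the correlated positions of earlier periods, followed by a dense
    forecasting head. All hyperparameter values are placeholders."""
    inputs = tf.keras.Input(shape=(input_len, 1))
    x = inputs
    for s in strides:
        x = tf.keras.layers.Conv1D(filters, kernel_size, strides=s,
                                   padding='causal', activation='relu')(x)
        x = tf.keras.layers.Dropout(dropout)(x)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(horizon)(x)
    return tf.keras.Model(inputs, outputs)

model = build_stride_tcn()
model.summary()
```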

3.4. Bayesian Optimization

The Bayesian optimization technique builds a probabilistic model $p(\theta \mid \lambda)$ of an evaluation index $\theta$ (i.e., the loss or accuracy on the test) given a set of hyperparameters $\lambda$ [16]. Bayesian optimization uses a surrogate model to estimate the function to be optimized, as demonstrated in Algorithm 1.
Algorithm 1 Bayesian Optimization (BO)
Input: Search space $\Lambda$, black-box function $F$, acquisition function $S$, maximal number of function evaluations $m$
1. $D_0$ = initialize($\Lambda$)
2. for $n$ = 1 to $m - |D_0|$ do
3.   $p(\theta \mid \lambda, D)$ = fit predictive model on $D_{n-1}$;
4.   select $\lambda_n$ by optimizing $\lambda_n = \arg\max_{\lambda \in \Lambda} S(\lambda; D_{n-1}, p(\theta \mid \lambda, D))$;
5.   query $\theta_n := F(\lambda_n)$;
6.   add the observation to the data, $D_n = D_{n-1} \cup \{(\lambda_n, \theta_n)\}$;
7. end for
8. return best $\lambda^*$
Bayesian optimization determines an optimal configuration $\lambda^*$ of the black-box function $F: \Lambda \rightarrow \mathbb{R}$. It performs an iterative process to determine the probabilistic model $p(\theta \mid \lambda, D)$ based on the previous observations $D = \{(\lambda_0, \theta_0), (\lambda_1, \theta_1), \dots, (\lambda_{n-1}, \theta_{n-1})\}$, where it is assumed to access only noisy observations $\theta = F(\lambda) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma_{noise}^2)$. To select the next $\lambda$, the expected improvement acquisition function $S$ [17] is used to determine the point that maximizes it. Then, $F$ is evaluated at $\lambda_n$ to obtain $\theta_n$, the probabilistic model is updated, and the process iterates [18].
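As one possible realization of Algorithm 1, the sketch below uses KerasTuner's BayesianOptimization to search a space that loosely mirrors Table 2; the paper does not prescribe this library, and the hyperparameter names, ranges, and single-block model are illustrative assumptions.
```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # Search space loosely mirroring Table 2 (kernel size, filters, stride, dropout).
    kernel = hp.Choice('kernel_size', [1, 3, 5, 7, 9])
    filters = hp.Choice('filters', [8, 16, 32, 64, 128, 256, 512])
    stride = hp.Int('stride', 1, 24)
    dropout = hp.Float('dropout', 0.0, 0.5, step=0.1)

    inputs = tf.keras.Input(shape=(168, 1))
    x = tf.keras.layers.Conv1D(filters, kernel, strides=stride,
                               padding='causal', activation='relu')(inputs)
    x = tf.keras.layers.Dropout(dropout)(x)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(24)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss=tf.keras.losses.Huber(delta=1.0))
    return model

tuner = kt.BayesianOptimization(build_model, objective='val_loss', max_trials=20)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=20)
# best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
```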

3.5. Training Procedure

In this paper, both the TCN and the stride-TCN use Bayesian optimization to determine their hyperparameters before training. The training procedure for each model is described in Algorithm 2, where $W$ denotes the model parameters to be learned, $\lambda^*$ represents the set of optimal tunable hyperparameters, and loss($W$) is calculated using the Huber loss.
Algorithm 2 Training procedure
Input: Search space $\Lambda$, epochs = 100
1. $\lambda^*$ = Algorithm 1($\Lambda$)
2. Initialize $W$ with $\lambda^*$
3. for $n$ = 1 to epochs do
4.   update $W$ based on the Huber loss($W$)
5. end for

4. Experiments

We evaluated the number of model parameters and predictive power of the stride-TCN compared with the TCN architecture and RNNs, such as LSTMs and GRUs.

4.1. Setup

4.1.1. Datasets

We perform experiments on two public and one private dataset for empirical studies. All datasets are available for online access. Because this research focuses on univariate time-series forecasting, we only study time series with a single dimension for each dataset above. Table 3 presents an overview of the corpus statistics.
Dataset 1, individual household electric power consumption, is available online (https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption, accessed on 17 July 2022). It contains the minute-by-minute electric power consumption of one household in France over 47 months (December 2006 to November 2010). The time series includes the total active power consumed, total reactive power consumed, average current intensity, active energy for the kitchen, active energy for the laundry, and active energy for climate control systems [19]. In total, there are 2,075,259 multivariate records. For this dataset, 34,589 hourly univariate values of the global active power were used as the study variable.
Dataset 2, the energy consumption curves of 499 customers from Spain, is available online (https://fordatis.fraunhofer.de/handle/fordatis/215, accessed on 17 July 2022). The dataset contains hourly energy consumption data, outside temperatures for the region, and metadata for 499 customers in Spain over about one year (1 January 2019 to 31 December 2019). The entire dataset consists of 8760 data points. The energy consumption values were used as the study variable for this dataset.
Dataset 3, the CNU energy consumption data, is available online (https://github.com/andrewlee1807/tcns-with-nas/tree/main/Dataset/cnu-dataset, accessed on 5 August 2022). It is a real-world dataset with energy consumption values of 90 locations at CNU, collected continuously every hour for 1.3 years (from 1 January 2021 to 14 January 2022). Each location has 11,232 data points. In this research, we focus on the total electricity consumption of a particular location: Engineering-Building-07.monitor_02.
For each dataset, preprocessing is necessary before training because the power values in the datasets are relatively high; for example, in Dataset 3, the mean is 130.48 and the standard deviation is 46.97. Therefore, to avoid overflow, increased computational cost, and dataset distortion, each dataset is rescaled to the range [0, 1] using min–max normalization, calculated as in Equation (5):
$$z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \qquad (5)$$
where $x = (x_1, \dots, x_n)$ and $z_i$ is the $i$-th normalized data point.
We evaluated the time series forecasting task on these three datasets. More specifically, most models use an input length of 168 h and output lengths of 1 to 84 h. Each dataset was split into a training set (80%), validation set (10%), and testing set (10%) in chronological order.
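A minimal preprocessing sketch is shown below: min–max scaling as in Equation (5), 168-step input windows, and the chronological 80/10/10 split. For brevity the scaling statistics are computed over the full series, whereas in practice they would normally be taken from the training portion, and the synthetic series only stands in for the real data.
```python
import numpy as np

def make_windows(series, history=168, horizon=24):
    """Slice a 1D series into (input, target) pairs: 168 past hours -> next horizon hours."""
    X, Y = [], []
    for i in range(len(series) - history - horizon + 1):
        X.append(series[i:i + history])
        Y.append(series[i + history:i + history + horizon])
    return np.array(X)[..., None], np.array(Y)

series = np.random.rand(11232) * 200 + 30            # stand-in for hourly consumption
scaled = (series - series.min()) / (series.max() - series.min())   # Equation (5)

X, Y = make_windows(scaled)
n = len(X)
X_train, Y_train = X[:int(0.8 * n)], Y[:int(0.8 * n)]
X_val,   Y_val   = X[int(0.8 * n):int(0.9 * n)], Y[int(0.8 * n):int(0.9 * n)]
X_test,  Y_test  = X[int(0.9 * n):], Y[int(0.9 * n):]
```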

4.1.2. Model Variants

We conducted experiments on two model variants, the heuristic–stride–TCN and the stride-TCN. Depending on the dataset, the models take the appropriate configuration. The heuristic–stride–TCN is built with two hidden layers, relying on the pattern of the individual dataset to determine the value of the stride. Table 4 presents the configuration for the three datasets.
For each dataset, we experimented with three different stride–TCN models, with two, three, and four hidden layers. The stride–TCN relies on Bayesian optimization to determine the optimal hyperparameters listed in Table 2.

4.1.3. Evaluation Metrics

Two evaluation metrics, the MAE and MSE for univariate forecasting, are employed, defined as follows:
$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left|Y_i^{real} - Y_i^{predict}\right| \qquad (6)$$
$$MSE = \frac{1}{n}\sum_{i=1}^{n} \left(Y_i^{real} - Y_i^{predict}\right)^2 \qquad (7)$$
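Equations (6) and (7) correspond to the following short NumPy sketch; the toy arrays are illustrative.
```python
import numpy as np

def mae(y_real, y_pred):
    return np.mean(np.abs(y_real - y_pred))   # Equation (6)

def mse(y_real, y_pred):
    return np.mean((y_real - y_pred) ** 2)    # Equation (7)

y_real = np.array([0.12, 0.30, 0.95])
y_pred = np.array([0.10, 0.45, 0.20])
print(mae(y_real, y_pred), mse(y_real, y_pred))
```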
The implementations of the proposed methods were built based on the Keras library with a Tensorflow backend. All models were trained and tested on four Nvidia Quadro RTX A5000 24 GB GPUs. The source code is available online (https://github.com/andrewlee1807/tcns-with-nas).

4.2. Experimental Results and Comparison

4.2.1. Baselines and Configurations

The LSTM, GRU, and TCN models are included in the evaluation to build a baseline benchmark. The LSTM model is built with two hidden LSTM layers: the first LSTM layer has 200 hidden nodes, and the second has 150 nodes; the final layer is dense. The GRU model is built with two hidden GRU layers of 103 hidden nodes each; the final layer is dense with the rectified linear unit activation function. In both the LSTM and GRU models, the dropout is 25%, the optimizer is Adam, and the MSE is used as the loss function for training. The TCN model was built using Bayesian optimization to determine the optimal hyperparameters listed in Table 1.
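A sketch of the LSTM baseline as described above is shown below (the GRU baseline is analogous, with two 103-unit GRU layers and a ReLU dense head); the exact placement of the 25% dropout is an assumption.
```python
import tensorflow as tf

def build_lstm_baseline(input_len=168, horizon=24):
    """Sketch of the LSTM baseline: 200- and 150-unit LSTM layers, a dense
    output layer, 25% dropout, Adam optimizer, and MSE loss."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_len, 1)),
        tf.keras.layers.LSTM(200, return_sequences=True),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.LSTM(150),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(horizon),
    ])
    model.compile(optimizer='adam', loss='mse')
    return model
```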
Both the TCN baseline and our proposed stride-TCN are built with BO to find the optimal hyperparameters. To find a robust optimum that generalizes to an arbitrary training process, we keep the training hyperparameters unchanged and search only over the TCN architecture, namely the kernel size, dilation, and number of layers, which are the most important hyperparameters. On the other hand, we further shrink the search space of the stride-TCN family, where only the stride, the number of layers, the kernel size, and whether dropout is used are considered. Our purpose is to drive the search process to focus on finding the best stride hyperparameters, yet not so much as to suppress the contribution of the other important hyperparameters. The range of each hyperparameter of the stride-TCN family is given in Table 2. Additionally, the TCN and stride-TCN use the Huber loss during training; the parameter δ is set to 1 in all cases.
As mentioned above, the configuration of the training phase is kept unchanged for all learning models. In detail, we set the starting learning rate to 0.001 and reduce it by 1% when there is no improvement in the validation loss. We train each model for 100 epochs with the Adam optimizer and a batch size of 32. In addition, we apply early stopping when there is no improvement after 20 epochs.
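In Keras, this schedule can be sketched with standard callbacks as below; the patience of one epoch for the learning-rate reduction and the use of restore_best_weights are assumptions not stated in the text.
```python
import tensorflow as tf

callbacks = [
    # reduce the learning rate by 1% (factor 0.99) when the validation loss stalls;
    # the patience of 1 epoch is an assumption, the text only states the 1% reduction
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.99, patience=1),
    # stop training when there is no improvement for 20 consecutive epochs
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20,
                                     restore_best_weights=True),
]

# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
#               loss=tf.keras.losses.Huber(delta=1.0))
# model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
#           epochs=100, batch_size=32, callbacks=callbacks)
```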

4.2.2. Results and Analysis

The comparison was made using the MSE and MAE for forecast horizons from the first hour to the 84th hour. The forecast horizon is the length of time into the future for which forecasts are prepared. To avoid an excessive number of observations, we report results only at particular time steps, namely 1, 12, 24, 36, 48, 60, 72, and 84 h. In time series forecasting, larger horizons make prediction more challenging; thus, the experiments offer a detailed analysis of the results over this wide horizon. We compared the results of the proposed method with other algorithms (LSTM, GRU, and TCN) to prove the effectiveness of this approach. For each horizon, lower values indicate better results.
Table 5 reports the MSE and MAE values of the predicted energy consumption on the CNU dataset (dataset 3) over the time steps for the baseline LSTM, GRU, and TCN and our proposed stride–TCNs. As clearly depicted in Table 5, our stride–TCNs steadily achieve the lowest errors between 60 h and 84 h. Notably, the two-layer stride–TCN reaches the lowest prediction error at 84 h. Compared to the LSTM, GRU, heuristic–stride–TCN, and TCN baseline, our two-layer auto–stride model is 2.07%, 7.19%, 30.05%, and 32.7% better in terms of average relative error, respectively. In the case of dataset 2, the TCN baseline demonstrates its superiority, being substantially better than the other models for short-term prediction, as shown in Table 6. However, the limitations become clear in the experiments on dataset 1, where our proposed architecture could not overcome the baseline models, as shown in Table 7. This indicates that the model lacks the complexity to capture the information in a large dataset, a phenomenon well known as underfitting. Overall, our proposed architecture achieves results on par with the other baseline models for time series forecasting, with errors that are only slightly higher, and in some specific settings even lower, than those of the baselines.
We obtain a significant reduction in model complexity with the stride–TCNs, as seen in Table 4. Remarkably, our heuristic–stride–TCN for long-term forecasting (84 h) has approximately 6K parameters, which is only 1.6%, 5%, and 1% of the number of parameters of the baseline LSTM, GRU, and TCN, respectively. We also note that a model's complexity depends on the length of both the history and the forecast horizon, so each model's complexity varies across datasets. Details are given in Table 8 (dataset 2), Table 9 (dataset 3), and Table 10 (dataset 1).
Figure 4 illustrates the relationship between performance and complexity for seven models on the CNU dataset, including the LSTM, GRU, and TCN baselines together with our proposed stride–TCNs and heuristic–stride–TCN. In most cases, the baseline TCN has the largest number of parameters, followed by the LSTM and GRU. On the contrary, the heuristic–stride–TCN model consistently has the smallest number of parameters. Besides, the stride–TCN family also has a relatively small number of parameters compared to the baseline models. Figure 4 also demonstrates that using BO usually leads to a model with better performance, but at the cost of a complexity trade-off. Another observation is that the baseline models outperform the stride-TCN only in short-term forecasting, since the difference in MSE between the baselines and the proposed architecture is no longer significant at longer horizons. Generally, the results confirm that our proposed TCN architecture yields a family of lightweight models capable of being deployed on resource-constrained devices. Last but not least, among the models in the stride–TCN family, the heuristic–stride–TCN offers not only the lowest complexity but also the fastest training, because the stride hyperparameter can be predefined from the data's pattern instead of being searched by BO.

5. Conclusions

This paper presents three contributions. First, we propose a lightweight TCN family with the stride mechanism; second, we introduce a new dataset of electrical energy consumption along with its benchmark; and third, we search for a robust model based on Bayesian optimization. The experiments have shown that our architecture achieves comparable results on small and medium datasets while significantly reducing model complexity compared to the baselines. We argue that the performance on the large dataset is not as high as expected because we limit the number of dilation layers, which makes our model underfit. Importantly, our results provide evidence for the hypothesis that highly correlated time points are crucial for the forecasting task. Furthermore, we suggest that the stride factors should be trained alongside the model's parameters to make the model adaptable to various datasets. This assumption might be addressed in future studies.

Author Contributions

Conceptualization, L.H.A. and J.Y.K.; methodology, L.H.A., G.H.Y. and D.T.V.; software, L.H.A.; validation, J.Y.K., G.H.Y. and J.C.Y.; formal analysis, D.T.V.; investigation, J.I.L. and J.C.Y.; resources, J.S.K.; data curation, L.H.A. and J.Y.K.; writing—original draft preparation, L.H.A.; writing—review and editing, J.Y.K., J.S.K. and J.C.Y.; visualization, L.H.A.; supervision, J.Y.K. and J.S.K.; project administration, J.C.Y., J.S.K. and J.Y.K.; funding acquisition, J.I.L., J.S.K. and J.Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Electric Power Research Institute (KEPRI) grant funded by the Korea Electric Power Corporation (KEPCO) (No. R20IA02). This work was also supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Individual household electric power consumption is available online at https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption (accessed on 17 July 2022). The energy consumption curves of 499 customers from Spain are available online at https://fordatis.fraunhofer.de/handle/fordatis/215 (accessed on 17 July 2022). The CNU energy consumption is available online at https://github.com/andrewlee1807/tcns-with-nas/tree/main/Dataset/cnu-dataset (accessed on 5 August 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Akaike, H. Fitting autoregressive models for prediction. Ann. Inst. Stat. Math. 1969, 21, 243–247. [Google Scholar] [CrossRef]
  2. Frigola, F.; Rasmussen, C.E. Integrated pre-processing for Bayesian nonlinear system identification with Gaussian processes. In Proceedings of the 52nd IEEE Conference on Decision and Control, Firenze, Italy, 10–13 December 2013; pp. 552–560. [Google Scholar]
  3. Shi, J.; Jain, M.; Narasimhan, G. Time Series Forecasting (TSF) Using Various Deep Learning Models. arXiv 2022, arXiv:2204.11115. [Google Scholar]
  4. Jadon, S.; Milczek, J.K.; Patankar, A. Challenges and approaches to time-series forecasting in data center telemetry: A Survey. arXiv 2021, arXiv:2101.04224. [Google Scholar]
  5. Nelson, B.K. Time series analysis using autoregressive integrated moving average (ARIMA) models. Acad. Emerg. Med. 1998, 5, 739–744. [Google Scholar] [CrossRef] [PubMed]
  6. Rousseeuw, P.J.; Hampel, F.R.; Ronchetti, E.M.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
  7. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  8. Gan, Z.; Li, C.; Zhou, J.; Tang, G. Temporal convolutional networks interval prediction model for wind speed forecasting. Electr. Power Syst. Res. 2021, 191, 106865. [Google Scholar] [CrossRef]
  9. Li, D.; Jiang, F.; Chen, M.; Qian, T. Multi-step-ahead wind speed forecasting based on a hybrid decomposition method and temporal convolutional networks. Energy 2022, 238, 121981. [Google Scholar] [CrossRef]
  10. Cao, Y.; Ding, Y.; Jia, M.; Tian, R. A novel temporal convolutional network with residual self-attention mechanism for remaining useful life prediction of rolling bearings. Reliab. Eng. Syst. Saf. 2021, 215, 107813. [Google Scholar] [CrossRef]
  11. Singhania, D.; Rahaman, R.; Yao, A. Coarse to Fine Multi-Resolution Temporal Convolutional Network. arXiv 2021, arXiv:2105.10859. [Google Scholar]
  12. Ma, P.; Wang, Y.; Shen, J.; Petridis, S.; Pantic, M. Lip-Reading with Densely Connected Temporal Convolutional Networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021. Available online: https://openaccess.thecvf.com/content/WACV2021/html/Ma_Lip-Reading_With_Densely_Connected_Temporal_Convolutional_Networks_WACV_2021_paper.html (accessed on 4 August 2022).
  13. Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499. [Google Scholar]
  14. Churchill, R.M.; Tobias, B.; Zhu, Y.; DIII-D Team. Deep convolutional neural networks for multi-scale time-series classification and application to tokamak disruption prediction using raw, high temporal resolution diagnostic data. Phys. Plasmas 2020, 27, 062510. [Google Scholar] [CrossRef]
  15. Broersen, P.M. Automatic Autocorrelation and Spectral Analysis; Springer Science & Business Media: New York, NY, USA, 2006. [Google Scholar]
  16. Dewancker, I.; McCourt, M.; Clark, S. Bayesian Optimization Primer; 2015. Available online: https://app.sigopt.com/static/pdf/SigOpt_Bayesian_Optimization_Primer.pdf (accessed on 15 September 2022).
  17. Jones, D.R.; Schonlau, M.; Welch, W.J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 1998, 13, 455–492. [Google Scholar] [CrossRef]
  18. Lindauer, M.; Feurer, M.; Eggensperger, K.; Biedenkapp, A.; Hutter, F. Towards Assessing the Impact of Bayesian Optimization’s Own Hyperparameters. In Proceedings of the IJCAI 2019 DSO Workshop, Macao, China, 10–16 August 2019. [Google Scholar]
  19. Parate, A.; Bholte, S. Individual household electric power consumption forecasting using machine learning algorithms. Int. J. Comput. Appl. Technol. Res. 2019. Available online: https://www.researchgate.net/publication/335911657_Individual_Household_Electric_Power_Consumption_Forecasting_using_Machine_Learning_Algorithms- (accessed on 7 July 2022). [CrossRef]
Figure 1. TCN architecture.
Figure 2. Sample in Chonnam National University energy consumption.
Figure 3. Stride-TCN architecture.
Figure 4. Comparison between models with respect to performance and complexity. The comparison is conducted on the CNU dataset. The x-axis depicts the MSE error, and the y-axis depicts the forecast horizon at eight milestones (1, 12, 24, 36, 48, 60, 72, and 84 h). Each circle represents a model whose color indicates its category, and the circle radius describes its complexity.
Table 1. Search space of the TCN using Bayesian optimization.

Hyperparameter | Symbol | Choices
1D convolutional window size | K | 1, 3, 5, 7, 9
Number of filters in each convolution layer | N_i | 8, 16, 32, 64, 128, 256, 512
Number of TCN layers | N_t | 2
Dilation factor | γ | 1
Skip connection | – | Yes, No
Batch normalization | – | Yes, No
Table 2. Search space of the stride-TCN used by Bayesian optimization.

Hyperparameter | Symbol | Choices
Kernel size | K | 1, 3, 5, 7, 9
Number of filters | N_i | 8, 16, 32, 64, 128, 256, 512
Stride | S | 1, 2, 3, …, 24
Dropout rate | ρ | 0, 0.1, …, 0.5
Table 3. Dataset statistics.

Dataset | Length of Time Series | Total Number of Variables | Attributes
Dataset 1 | 2,075,259 | 7 | Global active power; Global reactive power; Voltage; Global intensity; Submetering 1; Submetering 2; Submetering 3
Dataset 2 | 8760 | 2 | Energy consumption; Outside temperature
Dataset 3 | 11,232 | 1 | Energy consumption
Table 4. Configuration of the heuristic-stride-TCN for the three datasets.

Hyperparameter | Dataset 1 | Dataset 2 | Dataset 3
1D convolutional window size | 3 | 3 | 3
Number of filters in each convolution layer | 32 | 32 | 32
Stride 1 | 12 | 24 | 24
Stride 2 | 7 | 7 | 7
Table 5. Performance (MSE and MAE) of all models on the CNU dataset.

Method | 1 h | 12 h | 24 h | 36 h | 48 h | 60 h | 72 h | 84 h
(each cell: MSE / MAE)
LSTM | 0.0020 / 0.0297 | 0.0109 / 0.0687 | 0.0097 / 0.0670 | 0.0113 / 0.0741 | 0.0140 / 0.0813 | 0.014 / 0.083 | 0.0140 / 0.0830 | 0.0145 / 0.0861
GRU | 0.0022 / 0.0309 | 0.0109 / 0.0702 | 0.0128 / 0.0764 | 0.0145 / 0.0818 | 0.0157 / 0.0853 | 0.0156 / 0.0866 | 0.0156 / 0.0866 | 0.0153 / 0.0871
TCN | 0.0020 / 0.0298 | 0.0113 / 0.0718 | 0.0084 / 0.0639 | 0.0107 / 0.0714 | 0.0157 / 0.0852 | 0.0168 / 0.0886 | 0.0179 / 0.0927 | 0.0203 / 0.0980
Heuristic–stride–TCN | 0.0121 / 0.0799 | 0.0191 / 0.1012 | 0.0208 / 0.1049 | 0.0207 / 0.1044 | 0.0212 / 0.105 | 0.0213 / 0.1053 | 0.0213 / 0.1053 | 0.0211 / 0.1047
Stride–TCN (2 layers) | 0.0025 / 0.034 | 0.0115 / 0.0763 | 0.0136 / 0.0795 | 0.0129 / 0.0793 | 0.0148 / 0.0843 | 0.0137 / 0.0823 | 0.0133 / 0.0793 | 0.0142 / 0.0849
Stride–TCN (3 layers) | 0.0024 / 0.0331 | 0.012 / 0.0725 | 0.0109 / 0.071 | 0.0149 / 0.0837 | 0.0153 / 0.0841 | 0.0165 / 0.0865 | 0.0123 / 0.0771 | 0.016 / 0.0868
Stride–TCN (4 layers) | 0.0023 / 0.0322 | 0.0117 / 0.0765 | 0.0107 / 0.0727 | 0.0139 / 0.0833 | 0.0154 / 0.0828 | 0.0174 / 0.0922 | 0.0169 / 0.0888 | 0.016 / 0.0865
Table 6. Performance (MSE and MAE) of all models on the Spain dataset.

Method | 1 h | 12 h | 24 h | 36 h | 48 h | 60 h | 72 h | 84 h
(each cell: MSE / MAE)
LSTM | 0.0081 / 0.0677 | 0.0169 / 0.0971 | 0.0165 / 0.096 | 0.0169 / 0.0974 | 0.016 / 0.0949 | 0.0173 / 0.0982 | 0.0158 / 0.0949 | 0.0163 / 0.0957
GRU | 0.0095 / 0.0723 | 0.0161 / 0.0937 | 0.0167 / 0.0955 | 0.0188 / 0.1015 | 0.0197 / 0.1037 | 0.02 / 0.1044 | 0.0198 / 0.104 | 0.0205 / 0.1061
TCN | 0.008 / 0.0672 | 0.0149 / 0.0894 | 0.0156 / 0.0921 | 0.0166 / 0.096 | 0.0173 / 0.0981 | 0.0187 / 0.1015 | 0.0182 / 0.1 | 0.0182 / 0.1005
Heuristic–stride–TCN | 0.0247 / 0.1166 | 0.0403 / 0.1562 | 0.0424 / 0.1607 | 0.0421 / 0.1604 | 0.0426 / 0.1615 | 0.0426 / 0.1613 | 0.0428 / 0.1621 | 0.0432 / 0.1626
Stride-TCN (2 layers) | 0.0116 / 0.0792 | 0.0175 / 0.0989 | 0.0209 / 0.1071 | 0.0192 / 0.1036 | 0.0215 / 0.1083 | 0.0199 / 0.106 | 0.02 / 0.1046 | 0.0202 / 0.107
Stride-TCN (3 layers) | 0.0179 / 0.0989 | 0.018 / 0.1004 | 0.0223 / 0.1104 | 0.0181 / 0.1006 | 0.0202 / 0.1079 | 0.0203 / 0.1047 | 0.0203 / 0.1047 | 0.0213 / 0.109
Stride-TCN (4 layers) | 0.0085 / 0.0717 | 0.0178 / 0.0998 | 0.0205 / 0.1089 | 0.0199 / 0.1043 | 0.0204 / 0.1102 | 0.02 / 0.1049 | 0.0202 / 0.1064 | 0.0184 / 0.1001
Table 7. Performance (MSE and MAE) of all models on the Household dataset.

Method | 1 h | 12 h | 24 h | 36 h | 48 h | 60 h | 72 h | 84 h
(each cell: MSE / MAE)
LSTM | 0.0056 / 0.0517 | 0.0081 / 0.0656 | 0.0083 / 0.066 | 0.0085 / 0.0673 | 0.0086 / 0.0679 | 0.0091 / 0.0714 | 0.0089 / 0.0694 | 0.009 / 0.0711
GRU | 0.0065 / 0.0569 | 0.0079 / 0.0657 | 0.0082 / 0.068 | 0.0083 / 0.0687 | 0.0085 / 0.0693 | 0.0085 / 0.0696 | 0.0086 / 0.0704 | 0.009 / 0.0732
TCN | 0.0052 / 0.0496 | 0.0082 / 0.0658 | 0.0085 / 0.0671 | 0.0087 / 0.0689 | 0.0089 / 0.0693 | 0.0095 / 0.0724 | 0.0091 / 0.0712 | 0.009 / 0.0715
Heuristic–stride–TCN | 0.0104 / 0.0798 | 0.0115 / 0.0863 | 0.0116 / 0.0871 | 0.0116 / 0.0873 | 0.0116 / 0.0874 | 0.0116 / 0.087 | 0.0116 / 0.0871 | 0.0117 / 0.0874
Stride-TCN (2 layers) | 0.0063 / 0.0598 | 0.0088 / 0.0687 | 0.0089 / 0.0725 | 0.0088 / 0.0707 | 0.009 / 0.0727 | 0.0093 / 0.0741 | 0.0093 / 0.0726 | 0.0093 / 0.0743
Stride-TCN (3 layers) | 0.0055 / 0.0503 | 0.0086 / 0.0707 | 0.0088 / 0.072 | 0.009 / 0.0736 | 0.009 / 0.0726 | 0.009 / 0.0732 | 0.0093 / 0.0757 | 0.0096 / 0.078
Stride-TCN (4 layers) | 0.0054 / 0.0508 | 0.0101 / 0.0808 | 0.0104 / 0.0789 | 0.0094 / 0.073 | 0.0096 / 0.0751 | 0.0093 / 0.0754 | 0.0101 / 0.0808 | 0.0097 / 0.0777
Table 8. Models' complexity in the Household dataset (dataset 1), demonstrated by the number of parameters.

Model | 1 h | 12 h | 24 h | 36 h | 48 h | 60 h | 72 h | 84 h
LSTM | 372,351 | 374,012 | 375,824 | 377,636 | 379,448 | 381,260 | 383,072 | 384,884
GRU | 103,747 | 104,462 | 105,242 | 106,022 | 106,802 | 107,582 | 108,362 | 109,142
TCN | 23,681 | 1,495,436 | 269,144 | 646,052 | 647,600 | 1,501,628 | 650,696 | 11,716
Heuristic–stride–TCN | 3521 | 3884 | 4280 | 4676 | 5072 | 5468 | 5864 | 6260
Stride-TCN (2 layers) | 2081 | 30,540 | 31,320 | 32,100 | 32,880 | 9692 | 34,440 | 3492
Stride-TCN (3 layers) | 58,817 | 59,532 | 60,312 | 61,092 | 61,872 | 62,652 | 63,432 | 64,212
Stride-TCN (4 layers) | 87,809 | 1268 | 89,304 | 23,556 | 1992 | 91,644 | 1808 | 2316
Table 9. Models' complexity in the CNU dataset (dataset 3), demonstrated by the number of parameters.

Model | 1 h | 12 h | 24 h | 36 h | 48 h | 60 h | 72 h | 84 h
LSTM | 372,351 | 374,012 | 375,824 | 377,636 | 379,448 | 381,260 | 383,072 | 384,884
GRU | 103,747 | 104,462 | 105,242 | 106,022 | 106,802 | 107,582 | 108,362 | 109,142
TCN | 6433 | 88,652 | 1,037,720 | 580,004 | 353,968 | 355,516 | 587,720 | 589,268
Heuristic–stride–TCN | 3521 | 3884 | 4280 | 4676 | 5072 | 5468 | 5864 | 6260
Stride-TCN (2 layers) | 593 | 8108 | 800 | 32,100 | 1016 | 33,660 | 34,440 | 10,484
Stride-TCN (3 layers) | 1081 | 59,532 | 60,312 | 61,092 | 16,624 | 62,652 | 63,432 | 64,212
Stride-TCN (4 layers) | 38,401 | 88,524 | 89,304 | 90,084 | 90,864 | 91,644 | 2208 | 68,500
Table 10. Models' complexity in the Spain dataset (dataset 2), demonstrated by the number of parameters.

Model | 1 h | 12 h | 24 h | 36 h | 48 h | 60 h | 72 h | 84 h
LSTM | 372,351 | 374,012 | 375,824 | 377,636 | 379,448 | 381,260 | 383,072 | 384,884
GRU | 103,747 | 104,462 | 105,242 | 106,022 | 106,802 | 107,582 | 108,362 | 109,142
TCN | 345,857 | 374,988 | 260,824 | 319,076 | 188,528 | 52,700 | 814,280 | 1,504,724
Heuristic–stride–TCN | 3521 | 3884 | 4280 | 4676 | 5072 | 5468 | 5864 | 6260
Stride-TCN (2 layers) | 7745 | 692 | 31,320 | 2676 | 9296 | 3084 | 34,440 | 10,484
Stride-TCN (3 layers) | 58,817 | 5953 | 260,312 | 4548 | 1504 | 62,652 | 1720 | 17,812
Stride-TCN (4 layers) | 22,401 | 6012 | 89,304 | 90,084 | 90,864 | 91,644 | 92,424 | 93,204
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
