Energy Demand Forecasting Using Temporal Variational Residual Network
Abstract
1. Introduction
- We propose TVRN, a novel hybrid architecture that combines a ResNet-embedded VAE with a BiLSTM network to address nonlinearities, irregular seasonality, and deep latent structure in energy demand forecasting.
- To the best of our knowledge, this is the first application of a ResNet-embedded VAE for time series forecasting. While ResNet has primarily been applied in image recognition, our adaptation enables deeper VAE structures without gradient degradation, enhancing the model’s capacity to extract local, seasonal, and global features from complex energy data.
- The TVRN model integrates ResNet-embedded VAEs with BiLSTM to improve forecasting by combining denoising, deep residual learning, and bidirectional temporal modeling.
- We conduct extensive evaluations of TVRN on both hourly and daily energy consumption data, comparing it against a suite of strong baselines: traditional statistical models (ARIMA), deep learning models (DNN, CNN, BiLSTM), and hybrid approaches (PCA-BiLSTM, CPL). TVRN consistently outperforms all benchmarks in RMSE and MAE, with substantial gains across all forecasting horizons.
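The variational component of such a hybrid rests on the standard VAE machinery: the reparameterization trick and a KL regularizer on the latent code. A minimal NumPy sketch of these two pieces (illustrative only, not the authors' implementation):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps; the noise eps carries the randomness,
    so gradients can flow through mu and log_var during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)), the regularization term of the VAE loss."""
    return float(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var)))

# A standard-normal posterior (mu = 0, log_var = 0) incurs zero KL penalty.
mu, log_var = np.zeros(8), np.zeros(8)
z = reparameterize(mu, log_var, np.random.default_rng(0))
```

In a ResNet-embedded VAE, `mu` and `log_var` would come from the residual encoder, and the sampled `z` would presumably feed the decoder and the downstream BiLSTM.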
2. Methodology
2.1. TVRN Algorithm Architecture
2.2. Out-of-Sample Energy Demand Forecasting Using Recursive Algorithms and Evaluation Metrics
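The recursive out-of-sample scheme named in this section's title, together with the RMSE and MAE metrics used throughout, can be sketched in NumPy. The `one_step_model` callable below is a hypothetical stand-in for any trained one-step forecaster, not the TVRN itself:

```python
import numpy as np

def recursive_forecast(one_step_model, history, horizon):
    """Multi-step forecast: repeatedly apply a one-step model,
    feeding each prediction back into the sliding input window."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        y_hat = one_step_model(np.asarray(window))
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # drop oldest value, append prediction
    return np.asarray(preds)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(err)))

# Toy stand-in: a "model" that predicts the mean of its input window.
forecast = recursive_forecast(lambda w: float(w.mean()), [1.0, 2.0, 3.0], horizon=2)
```

Because each predicted value re-enters the input window, errors compound with the horizon, which is why multi-step scores (h = 12, 24) are expected to exceed one-step-ahead scores.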
3. Data Descriptions
Design and Optimization of Hyperparameters
4. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
TVRN | Temporal Variational Residual Network |
ARIMA | Autoregressive Integrated Moving Average |
ARMAX | Autoregressive Moving Average with eXogenous inputs |
GARCH | Generalized Autoregressive Conditional Heteroskedasticity |
ARX | Autoregressive with eXtra Input |
TARX | Threshold Autoregressive with eXogenous inputs |
SVM | Support Vector Machine |
ANN | Artificial Neural Network |
RNN | Recurrent Neural Network |
CNN | Convolutional Neural Network |
LSTM | Long Short-Term Memory |
BiLSTM | Bidirectional Long Short-Term Memory |
GRU | Gated Recurrent Unit |
PCA | Principal Component Analysis |
CEEMD | Complementary Ensemble Empirical Mode Decomposition |
AE | Autoencoder |
VAE | Variational Autoencoder |
MWDN | Multilevel Wavelet Decomposition Network |
SVD | Singular Value Decomposition |
References
- US Energy Information Administration (EIA). International Energy Outlook 2013; EIA: Washington, DC, USA, 2013; p. 25.
- Suganthi, L.; Samuel, A.A. Energy models for demand forecasting—A review. Renew. Sustain. Energy Rev. 2012, 16, 1223–1240.
- Chen, S. Data centres will use twice as much energy by 2030—driven by AI. Nature 2025.
- Zhao, H.X.; Magoulès, F. A review on the prediction of building energy consumption. Renew. Sustain. Energy Rev. 2012, 16, 3586–3592.
- Wei, N.; Li, C.; Peng, X.; Zeng, F.; Lu, X. Conventional models and artificial intelligence-based models for energy consumption forecasting: A review. J. Pet. Sci. Eng. 2019, 181, 106187.
- Klyuev, R.V.; Morgoev, I.D.; Morgoeva, A.D.; Gavrina, O.A.; Martyushev, N.V.; Efremenkov, E.A.; Mengxu, Q. Methods of forecasting electric energy consumption: A literature review. Energies 2022, 15, 8919.
- Zhou, E.; Gadzanku, S.; Hodge, C.; Campton, M.; Can, S.d.R.d.; Zhang, J. Best Practices in Electricity Load Modeling and Forecasting for Long-Term Power System Planning; National Renewable Energy Laboratory (NREL): Golden, CO, USA, 2023.
- Cuaresma, J.C.; Hlouskova, J.; Kossmeier, S.; Obersteiner, M. Forecasting electricity spot-prices using linear univariate time-series models. Appl. Energy 2004, 77, 87–106.
- Bakhat, M.; Rosselló, J. Estimation of tourism-induced electricity consumption: The case study of the Balearic Islands, Spain. Energy Econ. 2011, 33, 437–444.
- Garcia, R.C.; Contreras, J.; Van Akkeren, M.; Garcia, J.B.C. A GARCH forecasting model to predict day-ahead electricity prices. IEEE Trans. Power Syst. 2005, 20, 867–874.
- Moghram, I.; Rahman, S. Analysis and evaluation of five short-term load forecasting techniques. IEEE Trans. Power Syst. 1989, 4, 1484–1491.
- Weron, R.; Misiorek, A. Forecasting spot electricity prices: A comparison of parametric and semiparametric time series models. Int. J. Forecast. 2008, 24, 744–763.
- Dong, B.; Cao, C.; Lee, S.E. Applying support vector machines to predict building energy consumption in tropical region. Energy Build. 2005, 37, 545–553.
- Rodrigues, F.; Cardeira, C.; Calado, J.M.F. The daily and hourly energy consumption and load forecasting using artificial neural network method: A case study using a set of 93 households in Portugal. Energy Procedia 2014, 62, 220–229.
- Lahouar, A.; Slama, J.B.H. Day-ahead load forecast using random forest and expert input selection. Energy Convers. Manag. 2015, 103, 1040–1051.
- Taieb, S.B.; Hyndman, R.J. A gradient boosting approach to the Kaggle load forecasting competition. Int. J. Forecast. 2014, 30, 382–394.
- Shi, H.; Xu, M.; Li, R. Deep learning for household load forecasting—A novel pooling deep RNN. IEEE Trans. Smart Grid 2018, 9, 5271–5280.
- Wang, H.Z.; Li, G.Q.; Wang, G.B.; Peng, J.C.; Jiang, H.; Liu, Y.T. Deep learning-based ensemble approach for probabilistic wind power forecasting. Appl. Energy 2017, 188, 56–70.
- Bedi, J.; Toshniwal, D. Deep learning framework for forecasting electricity demand. Appl. Energy 2019, 238, 1312–1326.
- Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.
- Wang, X.; Wang, X.; Zhao, Q.; Wang, S.; Fu, L. A multi-energy load prediction model based on deep multi-task learning and an ensemble approach for regional integrated energy systems. Int. J. Electr. Power Energy Syst. 2021, 126, 106583.
- Wu, K.; Wu, J.; Feng, L.; Yang, B.; Liang, R.; Yang, S.; Zhao, R. An attention-based CNN-LSTM-BiLSTM model for short-term electric load forecasting in an integrated energy system. Int. Trans. Electr. Energy Syst. 2021, 31, e12637.
- Kim, T.; Lee, D.; Hwangbo, S. A deep-learning framework for forecasting renewable demands using variational auto-encoder and bidirectional long short-term memory. Sustain. Energy Grids Netw. 2024, 38, 101245.
- Maćkiewicz, A.; Ratajczak, W. Principal components analysis (PCA). Comput. Geosci. 1993, 19, 303–342.
- Lee, D.; Seung, H.S. Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst. 2001, 13, 556–562.
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
- Ng, A. Sparse autoencoder. CS294A Lect. Notes 2011, 72, 1–19.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
- Kingma, D.P.; Welling, M. An introduction to variational autoencoders. Found. Trends Mach. Learn. 2019, 12, 307–392.
- Zhang, Y.; Yan, B.; Aasma, M. A novel deep learning framework: Prediction and analysis of financial time series using CEEMD and LSTM. Expert Syst. Appl. 2020, 159, 113609.
- Wang, J.; Wang, Z.; Li, J.; Wu, J. Multilevel wavelet decomposition network for interpretable time series analysis. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2437–2446.
- Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544.
- Cai, B.; Yang, S.; Gao, L.; Xiang, Y. Hybrid variational autoencoder for time series forecasting. Knowl.-Based Syst. 2023, 281, 111079.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Chen, K.; Chen, K.; Wang, Q.; He, Z.; Hu, J.; He, J. Short-term load forecasting with deep residual networks. IEEE Trans. Smart Grid 2019, 10, 3943–3952.
- Orr, G.B.; Müller, K.R. (Eds.) Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
- Doersch, C. Tutorial on variational autoencoders. arXiv 2016, arXiv:1606.05908.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292.
- Cheng, H.; Tan, P.N.; Gao, J.; Scripps, J. Multistep-ahead time series prediction. In Advances in Knowledge Discovery and Data Mining: Proceedings of the 10th Pacific-Asia Conference, PAKDD 2006, Singapore, 9–12 April 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 765–774.
- Sorjamaa, A.; Hao, J.; Reyhani, N.; Ji, Y.; Lendasse, A. Methodology for long-term prediction of time series. Neurocomputing 2007, 70, 2861–2869.
- Chevillon, G. Direct multi-step estimation and forecasting. J. Econ. Surv. 2007, 21, 746–785.
- Aras, S.; Kocakoç, İ.D. A new model selection strategy in time series forecasting with artificial neural networks: IHTS. Neurocomputing 2016, 174, 974–987.
- Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431.
- Kwiatkowski, D.; Phillips, P.C.; Schmidt, P.; Shin, Y. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? J. Econom. 1992, 54, 159–178.
- Ullah, S.; Xu, Z.; Wang, H.; Menzel, S.; Sendhoff, B.; Bäck, T. Exploring clinical time series forecasting with meta-features in variational recurrent models. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–9.
- Chen, W.; Tian, L.; Chen, B.; Dai, L.; Duan, Z.; Zhou, M. Deep variational graph convolutional recurrent network for multivariate time series anomaly detection. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 3621–3633.
- Choi, H.; Ryu, S.; Kim, H. Short-term load forecasting based on ResNet and LSTM. In Proceedings of the 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Aalborg, Denmark, 29–31 October 2018; pp. 1–6.
- Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
Operation | Forward LSTM | Backward LSTM |
---|---|---|
Forget Gate | $\overrightarrow{f}_t = \sigma(W_f x_t + U_f \overrightarrow{h}_{t-1} + b_f)$ | $\overleftarrow{f}_t = \sigma(W'_f x_t + U'_f \overleftarrow{h}_{t+1} + b'_f)$ |
Input Gate | $\overrightarrow{i}_t = \sigma(W_i x_t + U_i \overrightarrow{h}_{t-1} + b_i)$ | $\overleftarrow{i}_t = \sigma(W'_i x_t + U'_i \overleftarrow{h}_{t+1} + b'_i)$ |
Output Gate | $\overrightarrow{o}_t = \sigma(W_o x_t + U_o \overrightarrow{h}_{t-1} + b_o)$ | $\overleftarrow{o}_t = \sigma(W'_o x_t + U'_o \overleftarrow{h}_{t+1} + b'_o)$ |
Cell Input | $\overrightarrow{\tilde{c}}_t = \tanh(W_c x_t + U_c \overrightarrow{h}_{t-1} + b_c)$ | $\overleftarrow{\tilde{c}}_t = \tanh(W'_c x_t + U'_c \overleftarrow{h}_{t+1} + b'_c)$ |
Cell State | $\overrightarrow{c}_t = \overrightarrow{f}_t \odot \overrightarrow{c}_{t-1} + \overrightarrow{i}_t \odot \overrightarrow{\tilde{c}}_t$ | $\overleftarrow{c}_t = \overleftarrow{f}_t \odot \overleftarrow{c}_{t+1} + \overleftarrow{i}_t \odot \overleftarrow{\tilde{c}}_t$ |
Hidden State | $\overrightarrow{h}_t = \overrightarrow{o}_t \odot \tanh(\overrightarrow{c}_t)$ | $\overleftarrow{h}_t = \overleftarrow{o}_t \odot \tanh(\overleftarrow{c}_t)$ |
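The forward/backward LSTM operations listed in the table above can be written out in NumPy. This is an illustrative sketch of the standard BiLSTM formulation, not the paper's implementation; the stacked weight layout is an assumption of the sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the forget/input/output/cell-input blocks."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[:n])           # forget gate
    i = sigmoid(z[n:2 * n])      # input gate
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:])       # cell input
    c = f * c_prev + i * g       # cell state
    h = o * np.tanh(c)           # hidden state
    return h, c

def bilstm(xs, params_fwd, params_bwd, n):
    """Forward pass over xs, backward pass over reversed xs, states concatenated."""
    def run(seq, params):
        h, c = np.zeros(n), np.zeros(n)
        states = []
        for x in seq:
            h, c = lstm_step(x, h, c, *params)
            states.append(h)
        return states
    fwd = run(xs, params_fwd)
    bwd = run(xs[::-1], params_bwd)[::-1]  # re-align backward states with time order
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
```

Each output at time t concatenates a forward state summarizing the past with a backward state summarizing the future, which is what lets the BiLSTM model temporal structure in both directions.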
Parameter | Value | Description |
---|---|---|
Initial Convolution | 64 Filters, Kernel Size 3, Stride 1 | Initial layer settings |
ResNet Block 1 | 64 Filters, Stride 1, Kernel Size 3 | Includes Shortcut |
ResNet Block 2 | 32 Filters, Stride 1, Kernel Size 3 | Includes Shortcut |
ResNet Block 3 | 16 Filters, Stride 1, Kernel Size 3 | Includes Shortcut |
ResNet Block 4 | 16 Filters, Stride 1, Kernel Size 3 | Includes Shortcut |
Pooling Type | MaxPooling, Size 2, Stride 1, padding='same' | Pooling settings
Activation Functions | ReLU, Tanh | Types of activation used |
Regularization | L2 (0.0001) | Regularization for BiLSTM layers |
Batch Normalization | Applied | Used in all ResNet blocks and shortcuts |
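A single residual block with the settings in the table above (kernel size 3, stride 1, 'same' padding, identity shortcut) can be sketched in NumPy. Batch normalization, which the table applies in every block, is omitted here for brevity, and the weight layout is an assumption of this sketch:

```python
import numpy as np

def conv1d_same(x, kernels):
    """'same'-padded 1-D convolution with stride 1.
    x: (length, in_channels); kernels: (width, in_channels, out_channels)."""
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    return np.stack([
        np.tensordot(xp[t:t + k], kernels, axes=([0, 1], [0, 1]))
        for t in range(x.shape[0])
    ])

def residual_block(x, kernels1, kernels2):
    """Two convolutions plus an identity shortcut, so gradients can bypass
    the convolutional branch (the 'Includes Shortcut' entries above)."""
    y = np.maximum(conv1d_same(x, kernels1), 0.0)  # conv + ReLU
    y = conv1d_same(y, kernels2)                   # second conv
    return np.maximum(y + x, 0.0)                  # add shortcut, then ReLU
```

The shortcut addition is the key to training deeper encoders: even if both convolutions contribute nothing, the block still passes its input through unchanged.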
Hourly data: RMSE and MAE by forecast horizon (in hours).

Horizon | ARIMA | DNN | BiLSTM | CNN | CPL | PCA-BiLSTM | TVRN |
---|---|---|---|---|---|---|---|
RMSE | | | | | | | |
train | 0.228 | 0.070 | 0.024 | 0.034 | 0.175 | 0.015 | 0.076 |
h = 1 | 0.052 | 0.079 | 0.028 | 0.032 | 0.259 | 0.021 | 0.003 |
h = 12 | 0.424 | 0.209 | 0.298 | 0.250 | 0.270 | 0.304 | 0.158 |
h = 24 | 0.402 | 0.304 | 0.324 | 0.297 | 0.297 | 0.322 | 0.261 |
MAE | | | | | | | |
train | 0.182 | 0.058 | 0.023 | 0.027 | 0.139 | 0.013 | 0.038 |
h = 1 | 0.052 | 0.079 | 0.028 | 0.032 | 0.259 | 0.021 | 0.003 |
h = 12 | 0.369 | 0.177 | 0.258 | 0.212 | 0.269 | 0.261 | 0.132 |
h = 24 | 0.363 | 0.269 | 0.291 | 0.264 | 0.296 | 0.288 | 0.224 |
Daily data: RMSE and MAE by forecast horizon (in days).

Horizon | ARIMA | DNN | BiLSTM | CNN | CPL | PCA-BiLSTM | TVRN |
---|---|---|---|---|---|---|---|
RMSE | | | | | | | |
train | 0.186 | 0.062 | 0.061 | 0.062 | 0.125 | 0.060 | 0.068 |
h = 1 | 0.077 | 0.073 | 0.083 | 0.078 | 0.059 | 0.099 | 0.005 |
h = 7 | 0.180 | 0.128 | 0.114 | 0.125 | 0.200 | 0.118 | 0.079 |
h = 15 | 0.178 | 0.129 | 0.120 | 0.133 | 0.167 | 0.123 | 0.086 |
MAE | | | | | | | |
train | 0.047 | 0.048 | 0.047 | 0.048 | 0.095 | 0.047 | 0.050 |
h = 1 | 0.077 | 0.073 | 0.083 | 0.078 | 0.059 | 0.099 | 0.005 |
h = 7 | 0.149 | 0.107 | 0.094 | 0.106 | 0.190 | 0.100 | 0.069 |
h = 15 | 0.159 | 0.111 | 0.103 | 0.115 | 0.147 | 0.106 | 0.074 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ashebir, S.; Kim, S. Energy Demand Forecasting Using Temporal Variational Residual Network. Forecasting 2025, 7, 42. https://doi.org/10.3390/forecast7030042