Improving Short-Term Load Forecasting with Multi-Scale Convolutional Neural Networks and Transformer-Based Multi-Head Attention Mechanisms
Abstract
1. Introduction
2. Related Works
3. Method
3.1. Selection of Time Window Length
3.2. Multi-Scale Convolutional Layer
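The idea of a multi-scale convolutional layer can be illustrated with a minimal NumPy sketch: parallel 1-D convolutions with different receptive fields are applied to the same series and their feature maps stacked. The kernel sizes (3, 5, 7) and the averaging filters are illustrative assumptions for this sketch, not the paper's learned kernels.

```python
import numpy as np

def multi_scale_conv(x, kernel_sizes=(3, 5, 7)):
    """Apply parallel 1-D convolutions with different receptive fields
    to a univariate load series and stack the resulting feature maps.
    Kernel sizes and the averaging filters are illustrative stand-ins
    for learned convolution kernels."""
    features = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k              # simple moving-average filter
        features.append(np.convolve(x, kernel, mode="same"))
    return np.stack(features, axis=0)        # shape: (n_scales, len(x))

x = np.sin(np.linspace(0, 8 * np.pi, 168))   # one week of hourly "load"
feat = multi_scale_conv(x)
print(feat.shape)                            # (3, 168)
```

Each scale smooths the series over a different horizon, so short kernels preserve rapid fluctuations while long kernels emphasize daily trends.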
3.3. Single Transformer Block (Temporal Attention Mechanism)
3.3.1. Data Preprocessing: Polynomial Interpolation
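A generic illustration of polynomial-interpolation preprocessing: missing hourly readings are filled by fitting a polynomial to the observed points. The function name `fill_missing_poly` and the degree-3 choice are assumptions for this sketch; the paper's exact degree and gap-handling are not reproduced here.

```python
import numpy as np

def fill_missing_poly(series, degree=3):
    """Fill NaN gaps in an hourly load series by fitting a single
    polynomial to the observed points. Degree 3 is an illustrative
    choice; observed values are left untouched."""
    t = np.arange(len(series))
    mask = ~np.isnan(series)
    coeffs = np.polyfit(t[mask], series[mask], deg=degree)
    filled = series.copy()
    filled[~mask] = np.polyval(coeffs, t[~mask])
    return filled

y = np.array([1.0, 2.1, np.nan, 4.2, 5.0, np.nan, 7.1])
print(fill_missing_poly(y))
```

For long series, fitting one global polynomial can oscillate; piecewise (local) fits around each gap are a common refinement.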
3.3.2. Transformer Reshaping Operation
3.4. Feature Fusion and Output Layer
4. Experiments
4.1. Results
4.2. Sensitivity Analysis of Embedding Dimension and Time Delay Parameters
- Embedding Dimensions: d ∈ {2, 3, 4, 5};
- Time Delay: τ ∈ {1, 2, 3}.
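The two parameters swept above can be illustrated with a standard delay-embedding (phase-space reconstruction); `delay_embed` is a generic sketch, not the paper's implementation.

```python
import numpy as np

def delay_embed(x, d, tau):
    """Delay embedding: row t is [x[t], x[t + tau], ..., x[t + (d-1)*tau]],
    where d is the embedding dimension and tau the time delay."""
    n = len(x) - (d - 1) * tau               # number of complete rows
    return np.stack([x[i * tau : i * tau + n] for i in range(d)], axis=1)

x = np.arange(10.0)
emb = delay_embed(x, d=3, tau=2)
print(emb.shape)                             # (6, 3)
```

Larger d captures longer dependencies at the cost of fewer training rows; larger τ spaces the coordinates further apart, which matches the trade-off the sensitivity analysis explores.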
4.3. Ablation Study
4.4. Comparison to Other Models
4.5. Evaluation on Diverse Datasets
- Residential Electricity Consumption Dataset: Contains electricity consumption data from 50 households, sampled at 1 h intervals over one year. The data capture daily and seasonal usage patterns, allowing the model to learn both short-term fluctuations and long-term trends in residential energy consumption [31].
- Commercial Electricity Consumption Dataset: Contains hourly electricity consumption data from 30 commercial establishments, covering both business hours and after-hours periods. Different business types exhibit distinct operating schedules and peak usage periods [32].
- Geographically Diverse Electricity Consumption Dataset: To assess consumption patterns across regions and climates, we collected hourly electricity consumption data for three regions with different climatic conditions, following the power-data sampling method of [33]. This dataset reflects the influence of environmental and socio-economic factors on energy use.
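Before training, the hourly series from each dataset must be sliced into supervised (input window, target) pairs. The sketch below assumes a 24 h input window and one-step-ahead horizon, which are illustrative choices rather than the paper's settings.

```python
import numpy as np

def make_windows(series, window=24, horizon=1):
    """Slice an hourly series into (input window, future target) pairs.
    window: number of past hours fed to the model.
    horizon: how many steps ahead the target lies."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i : i + window])
        y.append(series[i + window + horizon - 1])
    return np.array(X), np.array(y)

series = np.arange(30.0)                 # 30 hourly readings
X, y = make_windows(series)
print(X.shape, y.shape)                  # (6, 24) (6,)
```

The same slicing applies unchanged to residential, commercial, and regional series, which is what makes the cross-dataset comparison in this section possible.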
- Residential Dataset: Achieves the lowest MSE and highest R2 value, indicating precise predictions closely aligned with actual consumption patterns.
- Commercial Dataset: Exhibits slightly higher MSE and lower R2 compared to the Residential dataset, reflecting the more complex and variable nature of commercial electricity usage.
- Geographically Diverse Dataset: Performance metrics suggest that the model effectively adapts to different regional consumption behaviors and climatic influences.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Arvanitidis, A.; Bargiotas, D.; Daskalopulu, A.; Laitsos, V.; Tsoukalas, L. Enhanced Short-Term Load Forecasting Using Artificial Neural Networks. Energies 2021, 14, 7788. [Google Scholar] [CrossRef]
- Gross, G.; Galiana, F. Short-term load forecasting. Proc. IEEE 1987, 75, 1558–1573. [Google Scholar] [CrossRef]
- Zhang, Z.; Liu, M.; Sun, M.; Deng, R.; Cheng, P.; Niyato, D.; Chow, M.-Y.; Chen, J. Vulnerability of Machine Learning Approaches Applied in IoT-Based Smart Grid: A Review. IEEE Internet Things J. 2024, 11, 18951–18975. [Google Scholar] [CrossRef]
- Musleh, A.S.; Chen, G.; Dong, Z.Y. A Survey on the Detection Algorithms for False Data Injection Attacks in Smart Grids. IEEE Trans. Smart Grid 2020, 11, 2218–2234. [Google Scholar] [CrossRef]
- Chu, J.; Wei, C.; Li, J.; Lu, X. Short-Term Electrical Load Forecasting Based on Multi-Granularity Time Augmented Learning. Electr. Eng. 2024. [Google Scholar] [CrossRef]
- Jain, A.K. Optimal Planning and Operation of Distributed Energy Resources; Machine Learning Applications in Smart Grid; Singh, S.N., Jain, N., Agarwal, U., Kumawat, M., Eds.; Springer: Singapore, 2023; pp. 193–213. [Google Scholar] [CrossRef]
- Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM Model for Short-Term Individual Household Load Forecasting. IEEE Access 2020, 8, 180544–180557. [Google Scholar] [CrossRef]
- Kim, T.Y.; Cho, S.B. Predicting the Household Power Consumption Using CNN-LSTM Hybrid Networks. In Proceedings of the Intelligent Data Engineering and Automated Learning–IDEAL 2018, Madrid, Spain, 21–23 November 2018; pp. 481–490. [Google Scholar] [CrossRef]
- Chai, T.; Draxler, R.R. Root Mean Square Error (RMSE) or Mean Absolute Error (MAE)? – Arguments Against Avoiding RMSE in the Literature. Geosci. Model Dev. 2014, 7, 1247–1250. [Google Scholar] [CrossRef]
- Chen, W.; Shi, K. Multi-Scale Attention Convolutional Neural Network for Time Series Classification. Neural Netw. 2021, 136, 126–140. [Google Scholar] [CrossRef]
- Deng, Z.; Wang, B.; Xu, Y.; Xu, T.; Liu, C.; Zhu, Z. Multi-Scale Convolutional Neural Network With Time-Cognition for Multi-Step Short-Term Load Forecasting. IEEE Access 2019, 7, 88058–88071. [Google Scholar] [CrossRef]
- Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.-c. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; Volume 28, pp. 802–810. [Google Scholar] [CrossRef]
- Wirsing, K.; Mohammady, S. Wavelet Theory; IntechOpen: Rijeka, Croatia, 2020; Chapter 1. [Google Scholar] [CrossRef]
- Sen, P.; Farajtabar, M.; Ahmed, A.; Zhai, C.; Li, L.; Xue, Y.; Smola, A.; Song, L. Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; Volume 32, pp. 5546–5557. [Google Scholar] [CrossRef]
- Salinas, D.; Bohlke-Schneider, M.; Callot, L.; Medico, R.; Gasthaus, J. High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–12 December 2020; Volume 33, pp. 6827–6837. [Google Scholar] [CrossRef]
- Peng, H.; Wang, W.; Chen, P.; Liu, R. DEFM: Delay-Embedding-Based Forecast Machine for Time Series Forecasting by Spatiotemporal Information Transformation. Chaos 2024, 34, 043112. [Google Scholar] [CrossRef] [PubMed]
- Maroor, J.P.; Sahu, D.N.; Nijhawan, G.; Karthik, A.; Shrivastav, A.K.; Chakravarthi, M.K. Image-Based Time Series Forecasting: A Deep Convolutional Neural Network Approach. In Proceedings of the 2024 4th International Conference on Innovative Practices in Technology and Management (ICIPTM), Uttar Pradesh, India, 21–23 February 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Cui, Z.; Chen, W.; Chen, Y. Multi-Scale Convolutional Neural Networks for Time Series Classification. arXiv 2016, arXiv:1603.06995. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar] [CrossRef]
- Zerveas, G.; Jayaraman, S.; Patel, D.; Bhamidipaty, A.; Eickhoff, C. A Transformer-Based Framework for Multivariate Time Series Representation Learning. arXiv 2020, arXiv:2010.02803. [Google Scholar] [CrossRef]
- Salman, D.; Direkoglu, C.; Kusaf, M.; Fahrioglu, M. Hybrid Deep Learning Models for Time Series Forecasting of Solar Power. Neural Comput. Appl. 2024, 36, 9095–9112. [Google Scholar] [CrossRef]
- Liu, G.; Zhong, K.; Li, H.; Chen, T.; Wang, Y. A State of Art Review on Time Series Forecasting with Machine Learning for Environmental Parameters in Agricultural Greenhouses. Inf. Process. Agric. 2024, 11, 143–162. [Google Scholar] [CrossRef]
- Yang, Y.; Fan, C.; Xiong, H. A Novel General-Purpose Hybrid Model for Time Series Forecasting. Appl. Intell. 2022, 52, 2212–2223. [Google Scholar] [CrossRef] [PubMed]
- Elsworth, S.; Güttel, S. Time Series Forecasting Using LSTM Networks: A Symbolic Approach. arXiv 2020, arXiv:2003.05672. [Google Scholar] [CrossRef]
- Liang, M.; He, Q.; Yu, X.; Wang, H.; Meng, Z.; Jiao, L. A Dual Multi-Head Contextual Attention Network for Hyperspectral Image Classification. Remote Sens. 2022, 14, 3091. [Google Scholar] [CrossRef]
- Yang, Y.; Lu, J. Foreformer: An Enhanced Transformer-Based Framework for Multivariate Time Series Forecasting. Appl. Intell. 2022, 53, 12521–12540. [Google Scholar] [CrossRef]
- Sun, F.-K.; Boning, D.S. FreDo: Frequency Domain-Based Long-Term Time Series Forecasting. arXiv 2022, arXiv:2205.12301. [Google Scholar] [CrossRef]
- Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
- Niu, D.; Yu, M.; Sun, L.; Gao, T.; Wang, K. Short-Term Multi-Energy Load Forecasting for Integrated Energy Systems Based on CNN-BiGRU Optimized by Attention Mechanism. Appl. Energy 2022, 313, 118801. [Google Scholar] [CrossRef]
- Abbasimehr, H.; Paki, R. Improving Time Series Forecasting Using LSTM and Attention Models. J. Ambient Intell. Humaniz. Comput. 2022, 13, 673–691. [Google Scholar] [CrossRef]
- Cascone, L.; Sadiq, S.; Ullah, S.; Mirjalili, S.; Siddiqui, H.U.R.; Umer, M. Predicting Household Electric Power Consumption Using Multi-step Time Series with Convolutional LSTM. Big Data Res. 2023, 31, 100360. [Google Scholar] [CrossRef]
- Semmelmann, L.; Henni, S.; Weinhardt, C. Load Forecasting for Energy Communities: A Novel LSTM-XGBoost Hybrid Model Based on Smart Meter Data. Energy Inform. 2022, 5 (Suppl. S1), 24. [Google Scholar] [CrossRef]
- Pimm, A.J.; Cockerill, T.T.; Taylor, P.G.; Bastiaans, J. The Value of Electricity Storage to Large Enterprises: A Case Study on Lancaster University. Energy 2017, 128, 378–393. [Google Scholar] [CrossRef]
Model | MSE | MAE | NMSE | NRMSE | R2 |
---|---|---|---|---|---|
LSTM | 2.514 | 0.78 | 0.92 | 0.96 | 0.73 |
CNN-LSTM | 2.121 | 0.73 | 0.78 | 0.88 | 0.80 |
CNN-LSTM-Attention | 1.452 | 0.45 | 0.53 | 0.73 | 0.85 |
TMSF | 1.122 | 0.34 | 0.41 | 0.64 | 0.93 |
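The error measures in the table above can be reproduced with standard formulas. Note that NMSE/NRMSE normalizations vary in the literature; the variance- and standard-deviation-based versions below are one common convention, assumed here rather than taken from the paper.

```python
import numpy as np

def metrics(y_true, y_pred):
    """Compute MSE, MAE, NMSE, NRMSE, and R^2.
    NMSE normalizes MSE by the variance of y_true; NRMSE normalizes
    RMSE by its standard deviation (one of several common conventions)."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    var = np.var(y_true)
    nmse = mse / var
    nrmse = np.sqrt(mse) / np.std(y_true)
    r2 = 1.0 - mse / var     # coefficient of determination
    return mse, mae, nmse, nrmse, r2

y_t = np.array([1.0, 2.0, 3.0, 4.0])
y_p = np.array([1.1, 1.9, 3.2, 3.8])
mse, mae, nmse, nrmse, r2 = metrics(y_t, y_p)
print(round(r2, 3))          # 0.98
```

Under this convention NMSE = 1 − R², so the two columns carry redundant information; a range-normalized NRMSE would give different values.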
Embedding Dimension (d) | Time Delay (τ) | MSE | R2 |
---|---|---|---|
2 | 1 | 1.200 | 0.78 |
2 | 2 | 1.150 | 0.80 |
2 | 3 | 1.180 | 0.77 |
3 | 1 | 1.100 | 0.85 |
3 | 2 | 1.050 | 0.88 |
3 | 3 | 1.070 | 0.86 |
4 | 1 | 0.900 | 0.83 |
4 | 2 | 0.980 | 0.88 |
4 | 3 | 0.910 | 0.90 |
5 | 1 | 1.020 | 0.81 |
5 | 2 | 0.990 | 0.90 |
5 | 3 | 1.000 | 0.79 |
Spatial Attention Layer | Transformer Block | Multi-Scale Convolution | MSE | R2 |
---|---|---|---|---|
 | | | 0.66 | 0.312 |
✓ | | | 0.63 | 0.324 |
 | ✓ | | 0.58 | 0.378 |
 | | ✓ | 0.64 | 0.544 |
✓ | ✓ | | 0.81 | 0.675 |
✓ | | ✓ | 0.89 | 0.875 |
 | ✓ | ✓ | 0.72 | 0.865 |
✓ | ✓ | ✓ | 0.90 | 0.912 |
Model | MSE | MAE | NMSE | NRMSE | R2 | GFLOPs | IT |
---|---|---|---|---|---|---|---|
LSTM | 2.324 | 1.50 | 0.85 | 0.92 | 0.79 | 0.0011 | 573.11 s |
CNN-LSTM | 2.021 | 1.30 | 0.75 | 0.86 | 0.82 | 0.0014 | 702.38 s |
CNN-LSTM-Attention | 1.322 | 0.95 | 0.60 | 0.77 | 0.81 | 0.0110 | 898.12 s |
TMSF (ours) | 1.012 | 0.62 | 0.44 | 0.66 | 0.91 | 0.2312 | 1396.32 s |
Dataset | MSE | MAE | R2 |
---|---|---|---|
Residential | 0.77 | 0.65 | 0.89 |
Commercial | 0.81 | 0.66 | 0.88 |
Geographically Diverse | 0.79 | 0.73 | 0.88 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ding, S.; He, D.; Liu, G. Improving Short-Term Load Forecasting with Multi-Scale Convolutional Neural Networks and Transformer-Based Multi-Head Attention Mechanisms. Electronics 2024, 13, 5023. https://doi.org/10.3390/electronics13245023