Carbon Emission Forecasting Using Multi-Scale Temporal Patches
Abstract
1. Introduction
2. Overall Architecture of SST
- Multi-Scale Temporal Patches,
- Global Patterns Expert,
- Local Variations Expert,
- Long-Short Router,
- Forecasting Module.
2.1. Multi-Scale Temporal Patches—MSTP
2.2. Global Patterns Expert
2.3. Local Variations Expert
2.4. Long-Short Router
2.5. Forecasting Module
3. Experiments
3.1. Dataset
3.2. Data Preprocessing
- We assign a stable geographic identifier (ID) to each of the 497 subregions by rounding latitude/longitude coordinates, forming a spatiotemporal index.
- We extract seasonal attributes from weekly timestamps (Season ∈ {Spring, Summer, Autumn, Winter}) and apply one-hot encoding to obtain 4-dimensional orthogonal basis vectors.
- We group the data by ID and sort records by time. For each ID, we compute a 7-step (i.e., 7-week) moving average (MA7) and standard deviation (SD7) of the target in a strictly causal manner (using only historical observations). For initialization, the first six MA7/SD7 values are set to the first available full-window statistic (the 7th value).
- We construct a binary indicator for the COVID-19 period (is_covid) and an additional lockdown-status feature (is_lockdown).
- Using the central coordinates of each region, we generate directional rotation features at multiple azimuth angles (e.g., rot_15_x, rot_15_y, rot_30_x, rot_30_y).
- We compute spherical distances from each region to five key landmarks in Rwanda using the Haversine formula.
- We perform K-means spatial clustering on the 497 regions (K = 12) using the training split only, producing a clustering feature (geo_cluster). We additionally compute the spherical distance from each point to each cluster centroid (cluster_i_dist, one feature per centroid i).
- We remove low-variance features (variance < 0.1) to filter redundant attributes.
- To avoid data leakage, all preprocessing statistics are estimated using only the training split (a 6:2:2 time-based split within each ID). Features with more than 50% missing values in training are discarded. For the remaining features, missing values are filled by within-ID forward fill along the time axis, then imputed using training-only group means and the training global mean. Any residual missing entries are finally filled with zeros.
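Two of the steps above, the causal 7-week statistics and the Haversine distance, can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names, the use of the sample standard deviation, and the Earth radius constant are assumptions, and the trailing window is assumed to include the current week.

```python
import math
from statistics import mean, stdev

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def causal_rolling_stats(values, window=7):
    """Trailing-window mean and sample standard deviation for one ID's series.

    The statistic at index t uses values[t - window + 1 : t + 1], so no later
    observation enters it. Positions before the first full window copy the
    first full-window statistic, following the initialization in the text.
    """
    if len(values) < window:
        raise ValueError("series is shorter than the window")
    ma, sd = [], []
    for t in range(window - 1, len(values)):
        chunk = values[t - window + 1 : t + 1]
        ma.append(mean(chunk))
        sd.append(stdev(chunk))
    pad = window - 1
    return [ma[0]] * pad + ma, [sd[0]] * pad + sd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Toy weekly series for a single ID (hypothetical values).
ma7, sd7 = causal_rolling_stats([1, 2, 3, 4, 5, 6, 7, 8])
```

In practice the rolling function would be applied separately to each ID after sorting by time, and the distance function would be evaluated against each landmark or cluster centroid.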
3.3. Experimental Setup
- Long-term modeling (Mamba branch): a low-resolution configuration with RPTS = 0.125 (i.e., PL = 8, StrL = 8) and an input length IL = 72.
- Short-term modeling (LWT branch): a high-resolution configuration with RPTS = 2 (i.e., PS = 4, StrS = 2) and an input length IS = 36.
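The two configurations above imply different token counts per branch. The sketch below slices an input window into patches and counts them, assuming no end padding (some patching implementations pad the series and therefore produce one extra patch). The helper names are hypothetical.

```python
def patch_starts(input_len, patch_len, stride):
    """Start indices of the patches that fit entirely inside the input window."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    if patch_len > input_len:
        raise ValueError("patch_len exceeds input_len")
    return list(range(0, input_len - patch_len + 1, stride))


def make_patches(series, patch_len, stride):
    """Slice a 1-D sequence into (possibly overlapping) patches."""
    return [series[s : s + patch_len] for s in patch_starts(len(series), patch_len, stride)]


# Long branch: IL = 72, PL = 8, StrL = 8 (non-overlapping patches).
long_patches = make_patches(list(range(72)), patch_len=8, stride=8)
# Short branch: IS = 36, PS = 4, StrS = 2 (adjacent patches overlap by half).
short_patches = make_patches(list(range(36)), patch_len=4, stride=2)

print(len(long_patches), len(short_patches))  # 9 17
```

Under this no-padding assumption the long branch sees 9 coarse tokens covering 72 weeks, while the short branch sees 17 finer tokens covering the most recent 36 weeks.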
3.4. Results and Discussion
3.5. Ablation Studies
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix A.1
| Parameters | SST | PatchTST | DLinear | Autoformer | Informer | Transformer |
|---|---|---|---|---|---|---|
| Epoch | 30 | 30 | 30 | 30 | 30 | 30 |
| Learning rate (Lr) | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| d_model/dff | 256/1024 | 256/1024 | 256/1024 | 256/1024 | 256/1024 | 256/1024 |
| dropout | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| batch_size | 64 | 64 | 64 | 64 | 64 | 64 |
| random_seed | 2021 | 2021 | 2021 | 2021 | 2021 | 2021 |
| w | 7 | null | null | null | null | null |
| patience | 6 | 6 | 6 | 6 | 6 | 6 |
| optimizer | Adam | Adam | Adam | Adam | Adam | Adam |
| schedule | Lr × 0.9^max(0, Epoch−3) | Lr × 0.9^max(0, Epoch−3) | Lr × 0.9^max(0, Epoch−3) | Lr × 0.9^max(0, Epoch−3) | Lr × 0.9^max(0, Epoch−3) | Lr × 0.9^max(0, Epoch−3) |
| e_layers/d_layers | 2/1 | 2/1 | 2/1 | 2/1 | 2/1 | 2/1 |
| heads | 8 | 8 | 8 | 8 | 8 | 8 |
| m_layers | 1 | null | null | null | null | null |
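The schedule row in the table reads as an exponential decay that starts after the third epoch. A minimal sketch of that rule is given below; it assumes 1-based epoch numbering, which the table does not state, and the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=1e-4, decay=0.9, hold_epochs=3):
    """Learning rate for a given epoch: base_lr * decay ** max(0, epoch - hold_epochs).

    Epochs are assumed to be numbered from 1, so the rate stays at base_lr
    for epochs 1 to 3 and is multiplied by `decay` once per epoch afterwards.
    """
    return base_lr * decay ** max(0, epoch - hold_epochs)


# Learning rates over the 30 training epochs listed in the table.
rates = [learning_rate(e) for e in range(1, 31)]
```

With patience = 6, training may stop well before epoch 30, in which case only the first part of this schedule is used.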
Appendix A.2
- IS—short input length: number of historical time steps provided to the short branch (short-branch input length). The short branch models local/short-term dynamics.
- IL—long input length: number of historical time steps provided to the long branch (long-branch input length). The long branch models long-term trends and global state.
- O—prediction horizon or output length: number of future time steps the model predicts.
- pm, sm—Mamba patch length and stride: for the Mamba (long) branch, pm is the number of original time steps grouped into one patch token; sm is the step between adjacent Mamba patches.
- pl, sl—LWT patch length and stride: for the Local Window Transformer (short branch), pl and sl are defined analogously to pm, sm.
- w—LWT local window size: number of patch tokens included in each local self-attention window of the LWT. w is expressed in patch tokens and must be an odd integer (center token + symmetric left/right wings).
- d_model—model embedding dimension: dimension of token embeddings/hidden representations.
- dff—feed-forward width: inner dimension of the position-wise feed-forward network.
- heads—number of attention heads: number of parallel heads in multi-head attention.
- m_layers/e_layers/d_layers—layer counts for different modules: m_layers = number of Mamba layers; e_layers and d_layers = encoder/decoder layer counts for Transformer-style modules.
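The odd-window requirement on w can be illustrated with a band attention mask in which each patch token attends to itself and to (w − 1)/2 neighbours on each side. This is a sketch of the general local-window idea under that assumption, not the model's actual implementation; the function name and the token count are hypothetical.

```python
def local_window_mask(num_tokens, w):
    """Boolean mask where entry [i][j] is True if token i may attend to token j.

    The window is centred on token i, so w must be a positive odd integer:
    one centre token plus (w - 1) // 2 tokens on each side. Windows are
    truncated at the sequence boundaries.
    """
    if w < 1 or w % 2 == 0:
        raise ValueError("w must be a positive odd integer")
    half = w // 2
    return [[abs(i - j) <= half for j in range(num_tokens)] for i in range(num_tokens)]


# Example with w = 7 (the value listed for SST) and a hypothetical 10 tokens.
mask = local_window_mask(num_tokens=10, w=7)
print(sum(mask[5]), sum(mask[0]))  # 7 4
```

An interior token sees the full 7-token window, while a boundary token sees only the 4 positions that exist on its side of the sequence.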







| Dataset | Sampling Interval | Cycle | | SNR | ADF p |
|---|---|---|---|---|---|
| ETTh1 | 1 h | 24 (Daily) | 0.9406 | 15.10 | 0.0116 |
| ETTh1 | 1 h | 168 (Weekly) | 0.8701 | —— | —— |
| Rwanda | 7 days | 13 (Quarter) | 0.0091 | —— | —— |
| Rwanda | 7 days | 26 (Half a year) | 0.0021 | —— | —— |
| Rwanda | 7 days | 53 (Annual) | 0.0129 | −5.33 | 6.177 × 10^−30 |
| Experimental Group | | (PL, StrL) | (PS, StrS) | MSE |
|---|---|---|---|---|
| Low–High (baseline) | 3 | (8, 8) | (4,2) | 0.0090 |
| High–High | 3 | (4, 2) | (4,2) | 0.0092 |
| Low–Low | 3 | (8, 8) | (8,8) | 0.0095 |
| High–Low | 3 | (4, 2) | (8,8) | 0.0105 |
| IL-IS-O | 72-36-6 | | | 72-36-12 | | | 72-36-24 | | | 72-36-48 | | | 72-36-72 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | MSE | MAE | RSE | MSE | MAE | RSE | MSE | MAE | RSE | MSE | MAE | RSE | MSE | MAE | RSE |
| SST | 0.0089 | 0.0312 | 0.2392 | 0.0141 | 0.0446 | 0.3013 | 0.0249 | 0.0616 | 0.4003 | 0.0435 | 0.0880 | 0.5296 | 0.0743 | 0.1260 | 0.6909 |
| PatchTST | 0.0097 | 0.0338 | 0.2974 | 0.0194 | 0.0502 | 0.3469 | 0.0286 | 0.0712 | 0.4286 | 0.0461 | 0.0917 | 0.5637 | 0.0678 | 0.1221 | 0.6754 |
| DLinear | 0.0169 | 0.0679 | 0.3302 | 0.0279 | 0.0952 | 0.4237 | 0.0423 | 0.1229 | 0.5220 | 0.0657 | 0.1696 | 0.6509 | 0.0868 | 0.2080 | 0.7468 |
| Autoformer | 0.0590 | 0.1715 | 0.6170 | 0.0549 | 0.1638 | 0.5951 | 0.1400 | 0.2449 | 0.9502 | 0.1216 | 0.2291 | 0.8855 | 0.2140 | 0.3151 | 1.1743 |
| Informer | 0.0555 | 0.1496 | 0.5986 | 0.0838 | 0.1752 | 0.7353 | 0.3097 | 0.3329 | 1.4131 | 0.1745 | 0.2412 | 1.0606 | 0.6586 | 0.4203 | 2.0598 |
| Transformer | 0.0548 | 0.1212 | 0.5950 | 0.1674 | 0.1704 | 1.0391 | 0.1479 | 0.2091 | 0.9767 | 0.1068 | 0.2011 | 0.8297 | 0.2032 | 0.2803 | 1.1442 |
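The table above reports MSE, MAE, and RSE without restating their definitions. The sketch below uses the definitions that are common in the time-series forecasting literature, with RSE taken as the root relative squared error; that this matches how the reported values were computed is an assumption.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rse(y_true, y_pred):
    """Root relative squared error: error energy relative to the target's spread.

    A value of 1.0 corresponds to always predicting the mean of y_true.
    """
    mean_true = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean_true) ** 2 for t in y_true)
    return math.sqrt(num / den)


# Toy values (hypothetical, not taken from the experiments).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mse(y_true, y_pred), mae(y_true, y_pred))  # 0.25 0.25
```

Under this definition an RSE above 1, as in several Informer and Transformer entries, means the forecast is worse than a constant prediction at the test-set mean.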
| IL-IS-O | 24-12-53 | | | 48-24-53 | | | 72-36-53 | | | 96-48-53 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | MSE | MAE | RSE | MSE | MAE | RSE | MSE | MAE | RSE | MSE | MAE | RSE |
| SST | 0.0544 | 0.1082 | 0.5924 | 0.0478 | 0.0953 | 0.5552 | 0.0466 | 0.0911 | 0.5481 | |||
| PatchTST | 0.0662 | 0.1191 | 0.6532 | 0.0584 | 0.1223 | 0.6327 | 0.0485 | 0.0956 | 0.5592 | 0.0491 | 0.1011 | 0.5627 |
| DLinear | 0.0696 | 0.1719 | 0.6697 | 0.0701 | 0.1751 | 0.6722 | 0.0700 | 0.1778 | 0.6715 | 0.0683 | 0.1742 | 0.6635 |
| Autoformer | 0.1192 | 0.2307 | 0.8766 | 0.1443 | 0.2399 | 0.9644 | 0.1543 | 0.2542 | 0.9973 | 0.1440 | 0.2564 | 0.9635 |
| Informer | 0.3544 | 0.2878 | 1.5114 | 0.1468 | 0.2370 | 0.9726 | 0.1569 | 0.2454 | 1.0057 | 0.3616 | 0.3320 | 1.5265 |
| Transformer | 0.2003 | 0.2167 | 1.1362 | 0.2066 | 0.2638 | 1.1539 | 0.1379 | 0.2251 | 0.9428 | 0.1097 | 0.1933 | 0.8408 |
| Model | Total Number of Parameters | Memory Cost (MB) | Average Time per Epoch (s/epoch) |
|---|---|---|---|
| SST | 2,648,842 | 2685.46 | 69.13 |
| PatchTST | 1,614,088 | 3057.38 | 49.56 |
| DLinear | 876 | 21.08 | 4.45 |
| Autoformer | 2,732,033 | 380.64 | 43.55 |
| Informer | 2,936,065 | 182.02 | 29.15 |
| Transformer | 2,738,689 | 226.32 | 23.84 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Xiong, Y.; Wang, M. Carbon Emission Forecasting Using Multi-Scale Temporal Patches. Appl. Sci. 2026, 16, 2025. https://doi.org/10.3390/app16042025

