Machine Learning-Based Energy Consumption and Carbon Footprint Forecasting in Urban Rail Transit Systems
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Description and Preprocessing
2.2. Machine Learning Methods
2.2.1. Support Vector Regression (SVR)
2.2.2. Extreme Gradient Boosting (XGBoost)
2.2.3. Long Short-Term Memory (LSTM)
2.2.4. Adaptive Neuro-Fuzzy Inference System (ANFIS)
2.2.5. Nonlinear Autoregressive Neural Network (NAR-NN)
2.3. Hyperparameter Tuning
2.4. Performance Metrics
3. Results
4. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ML | Machine Learning |
| SVR | Support Vector Regression |
| XGBoost | Extreme Gradient Boosting |
| LSTM | Long Short-Term Memory |
| ANFIS | Adaptive Neuro-Fuzzy Inference System |
| NAR-NN | Nonlinear Autoregressive Neural Network |
| MAPE | Mean Absolute Percentage Error |
| CO2 | Carbon Dioxide |
| BEMS | Building Energy Management Systems |
| GRU | Gated Recurrent Unit |
| CNN | Convolutional Neural Network |
| IoT | Internet of Things |
| AutoML | Automated Machine Learning |
| TCN | Temporal Convolutional Network |
| SWT | Stationary Wavelet Transform |
| FRBPS | Fuzzy Rule-Based Prediction Systems |
| GAN | Generative Adversarial Network |
| NDC | Nationally Determined Contributions |
| PCA | Principal Component Analysis |
| BSS | Blind Source Separation |
| XAI | eXplainable Artificial Intelligence |
| SHAP | SHapley Additive exPlanations |
| WPP | Wind Power Plant |
| K-NN | K-Nearest Neighbors |
| RBF | Radial Basis Function |
| RMSE | Root Mean Squared Error |
| MAE | Mean Absolute Error |
| kWh | Kilowatt-Hour |
Appendix A
| Model | Parameter | Substation-1 | Substation-2 | Substation-3 | Substation-4 | Substation-5 | Carbon Footprint |
|---|---|---|---|---|---|---|---|
| SVR | C | 1.9762 | 1.9762 | 2.2465 | 1.9762 | 1.9762 | 1.9762 |
| | ε (epsilon) | 0.0312 | 0.0312 | 0.0105 | 0.0312 | 0.0312 | 0.0312 |
| | γ (gamma) | 0.0685 | 0.0685 | 0.6708 | 0.0685 | 0.0685 | 0.0685 |
| | kernel | rbf | rbf | poly | rbf | rbf | rbf |
| XGBoost | n_estimators | 138 | 94 | 94 | 150 | 31 | 91 |
| | max_depth | 4 | 5 | 5 | 2 | 4 | 3 |
| | learning_rate | 0.0726 | 0.2330 | 0.2330 | 0.2925 | 0.2267 | 0.1857 |
| | subsample | 0.9506 | 0.8796 | 0.8796 | 0.9497 | 0.9187 | 0.7423 |
| | colsample_bytree | 0.7962 | 0.7468 | 0.7468 | 0.7637 | 0.9314 | 0.9407 |
| | min_child_weight | 4 | 4 | 4 | 4 | 4 | 4 |
| | gamma | 0.1163 | 0.1232 | 0.1232 | 0.1734 | 0.2434 | 0.4948 |
| | reg_alpha | 0.6318 | 0.8796 | 0.8796 | 0.3738 | 0.2043 | 0.7950 |
| | reg_lambda | 0.7098 | 0.6410 | 0.6410 | 0.5723 | 0.8768 | 0.2788 |
| LSTM | units_1 | 9 | 23 | 23 | 9 | 17 | 17 |
| | units_2 | 14 | 3 | 3 | 14 | 4 | 4 |
| | dropout | 0.3803 | 0.2195 | 0.2195 | 0.3803 | 0.4486 | 0.4486 |
| | learning_rate | 0.005106 | 0.008890 | 0.008890 | 0.005106 | 0.002274 | 0.002274 |
| | batch_size | 8 | 32 | 32 | 8 | 8 | 8 |
| | lookback | 6 | 5 | 5 | 6 | 4 | 4 |
| ANFIS | n_mfs | 2 | 3 | 3 | 3 | 3 | 3 |
| | learning_rate | 0.0201 | 0.0205 | 0.0148 | 0.0893 | 0.0132 | 0.0196 |
| | lookback | 4 | 4 | 2 | 5 | 4 | 4 |
| NAR-NN | hidden_units | 15 | 18 | 10 | 14 | 17 | 17 |
| | n_layers | 1 | 2 | 2 | 1 | 1 | 1 |
| | activation | tanh | relu | tanh | relu | relu | tanh |
| | learning_rate | 0.0352 | 0.0116 | 0.0771 | 0.0232 | 0.0483 | 0.0391 |
| | lookback | 3 | 3 | 4 | 4 | 4 | 2 |
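For concreteness, the sketch below (a minimal illustration, not the authors' original code) shows how the Substation-1 SVR and XGBoost configurations in the table above map onto the scikit-learn and xgboost APIs; any parameter not listed in the table is assumed to keep its library default.

```python
# Minimal sketch: instantiating the tuned Substation-1 models from the table
# above. Not the authors' code; unlisted parameters keep library defaults.
from sklearn.svm import SVR
from xgboost import XGBRegressor

svr_substation1 = SVR(kernel="rbf", C=1.9762, epsilon=0.0312, gamma=0.0685)

xgb_substation1 = XGBRegressor(
    n_estimators=138,
    max_depth=4,
    learning_rate=0.0726,
    subsample=0.9506,
    colsample_bytree=0.7962,
    min_child_weight=4,
    gamma=0.1163,
    reg_alpha=0.6318,
    reg_lambda=0.7098,
)
```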
| Model | Parameter | Search Range | Description |
|---|---|---|---|
| SVR | C | [0.1, 100] | Regularization parameter |
| | ε (epsilon) | [0.01, 0.5] | Epsilon tube width |
| | γ (gamma) | [0.001, 1.0] | Kernel coefficient |
| | kernel | {rbf, poly, sigmoid} | Kernel function type |
| XGBoost | n_estimators | [30, 200] | Number of boosting trees |
| | max_depth | [2, 5] | Maximum tree depth |
| | learning_rate | [0.05, 0.3] | Boosting learning rate |
| | subsample | [0.7, 1.0] | Row sampling ratio |
| | colsample_bytree | [0.7, 1.0] | Column sampling ratio |
| | min_child_weight | [3, 10] | Minimum child weight |
| | gamma | [0.1, 0.5] | Minimum loss reduction |
| | reg_alpha | [0.1, 1] | L1 regularization |
| | reg_lambda | [0.1, 1] | L2 regularization |
| LSTM | units_1 | [8, 32] | First-LSTM-layer neurons |
| | units_2 | [0, 16] | Second-LSTM-layer neurons |
| | dropout | [0.2, 0.5] | Dropout rate |
| | learning_rate | [0.001, 0.01] | Adam optimizer learning rate |
| | batch_size | [8, 16] | Training batch size |
| | lookback | [2, 6] | Input sequence length (months) |
| ANFIS | n_mfs | [2, 4] | Number of membership functions |
| | learning_rate | [0.01, 0.1] | Training learning rate |
| | lookback | [2, 6] | Input sequence length (months) |
| NAR-NN | hidden_units | [5, 20] | Hidden-layer neurons |
| | n_layers | [1, 2] | Number of hidden layers |
| | activation | {relu, tanh, sigmoid} | Activation function |
| | learning_rate | [0.01, 0.1] | Training learning rate |
| | lookback | [2, 6] | Input sequence length (months) |
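As an illustration of how such ranges can be explored, the sketch below draws random SVR candidates from the search space above; the uniform sampling distributions, the fixed seed, and the budget of 50 candidates are assumptions made for this example rather than details taken from the study.

```python
# Minimal sketch (assumed setup): random sampling of SVR candidates from the
# search ranges above. Each candidate would then be fitted, e.g. SVR(**params),
# and scored on a validation split, keeping the lowest-error configuration.
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the draw is reproducible

def sample_svr_params() -> dict:
    """Draw one SVR candidate uniformly from the ranges in the table above."""
    return {
        "C": float(rng.uniform(0.1, 100.0)),
        "epsilon": float(rng.uniform(0.01, 0.5)),
        "gamma": float(rng.uniform(0.001, 1.0)),
        "kernel": str(rng.choice(["rbf", "poly", "sigmoid"])),
    }

candidates = [sample_svr_params() for _ in range(50)]
```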
| Series | Model | RMSE | MAE | MAPE (%) | R² |
|---|---|---|---|---|---|
| Substation-1 | SVR | 21,599.48 | 17,662.95 | 2.33 | 0.9925 |
| | XGBoost | 68,299.08 | 50,895.30 | 7.25 | 0.9254 |
| | LSTM | 155,191.60 ± 11,436.16 [138,025.37, 170,070.82] | 135,673.58 ± 11,586.49 [114,312.54, 149,836.81] | 18.18 ± 1.77 [15.00, 20.33] | 0.6126 ± 0.0567 [0.5372, 0.6952] |
| | ANFIS | 209,438.81 ± 63,972.16 [157,263.01, 310,265.45] | 176,343.84 ± 56,124.21 [133,731.33, 264,501.84] | 24.46 ± 8.00 [17.94, 37.08] | 0.2327 ± 0.4840 [−0.5402, 0.6043] |
| | NAR-NN | 157,985.13 ± 28,459.15 [116,653.59, 202,780.03] | 135,148.81 ± 24,940.76 [98,305.49, 173,593.29] | 17.80 ± 3.98 [11.98, 23.97] | 0.5877 ± 0.1463 [0.3421, 0.7823] |
| Substation-2 | SVR | 16,865.56 | 11,560.38 | 14.30 | 0.9339 |
| | XGBoost | 56,641.67 | 35,641.17 | 44.93 | 0.2543 |
| | LSTM | 79,534.98 ± 875.60 [78,241.21, 80,977.70] | 52,568.16 ± 1237.46 [50,675.46, 54,225.54] | 61.91 ± 2.54 [58.49, 65.29] | −0.4706 ± 0.0324 [−0.5242, −0.4229] |
| | ANFIS | 98,435.09 ± 38,844.81 [73,507.97, 176,563.45] | 78,505.43 ± 46,488.22 [44,804.84, 170,444.80] | 60.08 ± 6.56 [55.63, 73.25] | −1.6030 ± 2.2854 [−6.2463, −0.2560] |
| | NAR-NN | 91,594.20 ± 8872.72 [78,487.76, 105,433.78] | 75,760.94 ± 14,072.37 [50,409.93, 96,917.79] | 59.75 ± 1.59 [57.12, 62.56] | −0.9684 ± 0.3815 [−1.5839, −0.4319] |
| Substation-3 | SVR | 8262.74 | 6178.73 | 1.46 | 0.9835 |
| | XGBoost | 31,913.56 | 22,434.98 | 5.43 | 0.7533 |
| | LSTM | 58,542.59 ± 1122.29 [57,410.18, 60,551.14] | 37,234.43 ± 318.04 [36,550.83, 37,588.60] | 8.66 ± 0.18 [8.36, 8.87] | 0.1694 ± 0.0321 [0.1118, 0.2015] |
| | ANFIS | 65,135.65 ± 6644.51 [58,414.13, 82,150.90] | 46,170.16 ± 7842.64 [35,946.46, 56,964.77] | 10.95 ± 1.97 [8.27, 12.92] | −0.0385 ± 0.2249 [−0.6350, 0.1734] |
| | NAR-NN | 69,564.57 ± 12,615.23 [58,477.52, 102,970.07] | 51,671.38 ± 15,398.50 [40,400.17, 96,979.43] | 12.66 ± 4.59 [9.28, 26.26] | −0.2109 ± 0.4925 [−1.5687, 0.1716] |
| Substation-4 | SVR | 33,346.45 | 22,771.74 | 9.43 | 0.8963 |
| | XGBoost | 73,261.88 | 42,046.45 | 14.92 | 0.4993 |
| | LSTM | 100,030.24 ± 3147.34 [96,440.70, 106,937.19] | 76,155.37 ± 3888.11 [72,626.71, 84,118.12] | 31.31 ± 0.80 [30.43, 32.95] | 0.0655 ± 0.0598 [−0.0669, 0.1323] |
| | ANFIS | 136,129.45 ± 10,809.75 [117,779.85, 154,636.97] | 106,819.06 ± 13,769.74 [82,376.38, 129,388.28] | 39.57 ± 3.52 [33.48, 45.44] | −0.7398 ± 0.2747 [−1.2310, −0.2942] |
| | NAR-NN | 97,974.43 ± 7127.85 [90,328.29, 113,004.48] | 72,900.80 ± 10,556.17 [61,336.33, 92,456.26] | 30.52 ± 2.03 [28.33, 34.31] | 0.0997 ± 0.1344 [−0.1914, 0.2388] |
| Substation-5 | SVR | 19,454.83 | 14,990.49 | 2.67 | 0.9579 |
| | XGBoost | 39,697.80 | 30,124.83 | 5.79 | 0.8249 |
| | LSTM | 114,885.11 ± 4519.64 [105,424.16, 121,603.17] | 88,280.44 ± 5563.59 [75,818.18, 95,184.05] | 16.22 ± 0.91 [14.19, 17.36] | −0.4688 ± 0.1140 [−0.6430, −0.2349] |
| | ANFIS | 118,577.12 ± 22,955.94 [94,469.65, 171,229.28] | 88,801.44 ± 26,098.44 [62,359.06, 144,884.41] | 16.17 ± 4.46 [11.87, 25.74] | −0.6208 ± 0.6756 [−2.2577, 0.0084] |
| | NAR-NN | 126,552.50 ± 25,121.28 [95,403.72, 175,977.97] | 105,205.63 ± 28,547.49 [69,915.27, 161,802.63] | 19.80 ± 4.90 [14.08, 30.23] | −0.8496 ± 0.7394 [−2.4409, −0.0113] |
| Carbon Footprint | SVR | 36,453.30 | 32,957.69 | 3.51 | 0.9420 |
| | XGBoost | 89,872.04 | 63,528.38 | 6.71 | 0.6478 |
| | LSTM | 155,814.77 ± 11,389.00 [136,773.98, 179,909.05] | 135,883.70 ± 10,731.47 [118,432.45, 159,547.63] | 14.45 ± 1.18 [12.51, 17.01] | −0.0644 ± 0.1568 [−0.4115, 0.1842] |
| | ANFIS | 211,620.99 ± 87,646.75 [131,336.01, 376,235.93] | 189,984.87 ± 82,757.54 [114,000.12, 344,417.14] | 20.22 ± 8.96 [11.84, 36.94] | −1.2880 ± 1.9896 [−5.1732, 0.2478] |
| | NAR-NN | 130,322.23 ± 24,819.48 [103,474.85, 169,522.05] | 112,519.97 ± 23,610.53 [86,281.51, 149,907.69] | 11.73 ± 2.68 [8.70, 15.95] | 0.2325 ± 0.2961 [−0.2533, 0.5331] |
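The metrics reported above can be reproduced from hold-out predictions as in the following sketch, a minimal illustration assuming 1-D NumPy arrays `y_true` and `y_pred` (names chosen for the example) with strictly positive targets, as monthly consumption values are.

```python
# Minimal sketch: the four metrics reported in the results table.
# Assumes y_true and y_pred are 1-D NumPy arrays of equal length and that
# y_true is strictly positive (required for a well-defined MAPE).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    mae = float(mean_absolute_error(y_true, y_pred))
    mape = float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)  # percent
    r2 = float(r2_score(y_true, y_pred))
    return {"RMSE": rmse, "MAE": mae, "MAPE (%)": mape, "R2": r2}
```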
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.