Optimizing Public Transport Infrastructure Through AI-Driven Reliability Prediction: A Data-Driven Approach
Highlights
- An XGBoost machine learning framework classifies bus delay severity at stops with high predictive accuracy.
- Meteorological and seasonal variables emerge as dominant predictors of delay severity, reflecting the influence of system-wide operating conditions.
- Delay classification serves as a spatial diagnostic tool for identifying and ranking reliability hotspots along the network.
- These hotspots provide a clear basis for prioritizing targeted infrastructure upgrades at specific bus stops and corridor segments.
Abstract
1. Introduction
2. Literature Review
3. Materials and Methods
3.1. Study Area
3.2. Data Collection
3.3. Feature Engineering and Delay Definition
3.4. Methodology
3.4.1. eXtreme Gradient Boosting—XGBoost
3.4.2. Model Training, Optimization, and Evaluation
4. Results and Discussion
4.1. Model Performance and Classification Accuracy
4.2. Feature Importance and Policy Implications
4.3. Spatial Identification of Delay Hotspots
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AVL | Automatic Vehicle Location |
| APC | Automated Passenger Counter |
| FCNN | Fully Connected Neural Network |
| GIS | Geographic Information Systems |
| GTFS-RT | General Transit Feed Specification Real-Time |
| ITS | Intelligent Transportation Systems |
| LSTM | Long Short-Term Memory |
| MAE | Mean Absolute Error |
| MAPE | Mean Absolute Percentage Error |
| MDARNN | Dual Attention Recurrent Neural Network |
| ML | Machine Learning |
| PCMCI | Peter Clark Momentary Conditional Independence |
| QoS | Quality of Service |
| RMSE | Root Mean Square Error |
| TSP | Transit Signal Priority |
| XGBoost | eXtreme Gradient Boosting |






| Category | Variable Name | Description | Data Type |
|---|---|---|---|
| Target Variable | Delay_Class | Categorical delay severity (0: Minor, 1: Moderate, 2: Severe) | Categorical |
| Temporal Features | Hour_sin | Sine transformation of the exact arrival hour | Continuous |
| | Hour_cos | Cosine transformation of the exact arrival hour | Continuous |
| | Month | Month of the year | Categorical |
| | Is_Weekend | Binary indicator for weekends (1: Weekend, 0: Weekday) | Binary |
| Spatial and Operational | Stop_ID | Unique identification code for each bus stop | Categorical |
| | Stop_Sequence | The sequential rank of the stop along the route | Continuous |
| | Passenger_Load | Total passenger activity (sum of embarking and debarking passengers) | Continuous |
| Meteorological | temp | Hourly temperature forecast (°C) | Continuous |
| | precip | Hourly precipitation forecast (mm) | Continuous |
| | Cloud_cover | Hourly cloud cover forecast (%) | Continuous |

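
The Hour_sin and Hour_cos features in the table above can be illustrated with a minimal sketch of cyclical encoding. The 24-hour period, the exact formula, and the names `encode_hour` and `circular_gap` are assumptions made for illustration; the source only states that sine and cosine transformations of the arrival hour are used.

```python
import math

HOURS_PER_DAY = 24.0  # assumed cycle length; not stated in the source


def encode_hour(hour: float) -> tuple[float, float]:
    """Map an hour of day (fractions allowed) to an (Hour_sin, Hour_cos) pair."""
    angle = 2.0 * math.pi * (hour % HOURS_PER_DAY) / HOURS_PER_DAY
    return math.sin(angle), math.cos(angle)


def circular_gap(hour_a: float, hour_b: float) -> float:
    """Euclidean distance between two encoded hours on the unit circle."""
    sin_a, cos_a = encode_hour(hour_a)
    sin_b, cos_b = encode_hour(hour_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)
```

The point of encoding the hour as a pair is that times on either side of midnight stay close together: under this sketch, 23:30 and 00:30 are nearer to each other than 00:30 and 06:00, which a raw hour value would not capture.
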
| Hyperparameter | Value |
|---|---|
| Evaluation Metric | merror (multiclass classification error rate) |
| Learning Rate (eta) | 0.05 |
| Maximum Tree Depth | 6 |
| Subsample Ratio | 0.8 |
| Column Subsample | 0.8 |
| Maximum Iterations | 1000 |
| Early Stopping Guardrail | 50 rounds |
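
As a hedged illustration, the hyperparameters listed above could be passed to the XGBoost Python package roughly as follows. The objective `multi:softprob`, the mapping of "Column Subsample" to `colsample_bytree`, and the helper names `build_xgb_params`, `multiclass_error`, and `train_delay_classifier` are assumptions for this sketch and are not taken from the source. Categorical inputs such as Stop_ID are assumed to be numerically encoded beforehand.

```python
MAX_ROUNDS = 1000             # "Maximum Iterations" in the table
EARLY_STOPPING_ROUNDS = 50    # "Early Stopping Guardrail" in the table


def build_xgb_params(num_classes: int = 3) -> dict:
    """Collect the tabulated settings into an XGBoost parameter dictionary."""
    return {
        "objective": "multi:softprob",  # assumption: objective is not stated in the source
        "num_class": num_classes,       # three Delay_Class levels: 0, 1, 2
        "eval_metric": "merror",
        "eta": 0.05,
        "max_depth": 6,
        "subsample": 0.8,
        "colsample_bytree": 0.8,        # assumption: per-tree column subsampling
    }


def multiclass_error(y_true, y_pred) -> float:
    """Share of misclassified samples, which is what merror reports."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def train_delay_classifier(X_train, y_train, X_valid, y_valid):
    """Train with early stopping on a validation set (requires xgboost)."""
    import xgboost as xgb  # imported lazily so the helpers above work without it

    dtrain = xgb.DMatrix(X_train, label=y_train)
    dvalid = xgb.DMatrix(X_valid, label=y_valid)
    return xgb.train(
        build_xgb_params(),
        dtrain,
        num_boost_round=MAX_ROUNDS,
        evals=[(dvalid, "validation")],
        early_stopping_rounds=EARLY_STOPPING_ROUNDS,
    )
```

With early stopping, training halts once the validation merror has not improved for 50 consecutive rounds, so the 1000-round limit acts as an upper bound rather than a fixed budget.
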
| Feature | Insight from GAIN | Policy Interpretation | Infrastructure Actions (Hotspots) |
|---|---|---|---|
| Month | Strong seasonal effect | Delays vary across seasons | Seasonal timetable adjustments at critical stops; passenger real-time information systems |
| Temperature | Dominant in the outbound direction | Delays sensitive to temperature | Shaded stops; climate-resilient materials |
| Precipitation/Cloud | Major weather influence | Delays increase in bad weather | Shelters; drainage; protected waiting areas |
| Stop_Sequence | Delay accumulation | Bottlenecks along the route | Bus lanes; TSP; intersection improvements |
| Hour (sin/cos) | Moderate effect | Peak-time variation | Bus priority measures near key stops during peak hours |
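
Gain-based rankings like the one summarized above are often easier to compare when raw gain scores are converted to percentage shares. The sketch below shows one way to do that; the function name `gain_shares` and the numbers in `toy_gain` are hypothetical and are not results from the study. In the XGBoost Python package, raw scores of this kind can be read from a trained booster with `get_score(importance_type="gain")`.

```python
def gain_shares(gain_by_feature: dict[str, float]) -> list[tuple[str, float]]:
    """Convert raw gain scores to percentage shares, largest first."""
    total = sum(gain_by_feature.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    shares = [(name, 100.0 * gain / total) for name, gain in gain_by_feature.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical gain values, for illustration only (not the paper's results).
toy_gain = {"Month": 40.0, "temp": 30.0, "Stop_Sequence": 20.0, "Hour_sin": 10.0}
ranked = gain_shares(toy_gain)
```
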
| Inbound Stop Name (From Lagadas to City Center) | Inbound Severe Delay Ratio (%) | Outbound Stop Name (From City Center to Lagadas) | Outbound Severe Delay Ratio (%) |
|---|---|---|---|
| T.S.LAGKADA | 73.0 | T.S.PANEPISTIMIA | 69.5 |
| PARODOS_TZELILI | 76.5 | KAMARA | 71.5 |
| IKA | 76.0 | PLATEIA_ARISTOTELOUS | 68.2 |
| PLATEIA | 75.7 | ANTIGONIDON | 73.5 |
| KTEL | 78.2 | PLATEIA_DIMOKRATIAS | 74.2 |
| EXODOS_LAGKADA | 76.5 | AGIA_PARASKEVI | 73.3 |
| SUPER_MARKET | 77.7 | TAXIDROMEIO | 78.2 |
| DEI | 76.7 | STAVROUPOLI | 78.4 |
| STROFI_PERIVOLAKIOU | 76.0 | LAGINA | 59.2 |
| STROFI_KAVALARIOU | 77.6 | AGNO | 56.5 |
| STRATOPEDO | 76.6 | STRATOPEDO | 57.7 |
| AGNO | 79.2 | STROFI_KAVALARIOU | 58.5 |
| STAVROUPOLI | 62.2 | STROFI_PERIVOLAKIOU | 61.3 |
| TAXIDROMEIO | 62.3 | DEI | 56.6 |
| AGIA_PARASKEVI | 64.0 | SUPER_MARKET | 60.3 |
| PLATEIA_DIMOKRATIAS | 65.3 | EISODOS_LAGKADA | 60.4 |
| ANTIGONIDON | 65.3 | KTEL | 57.4 |
| PLATEIA_ARISTOTELOUS | 66.6 | PLATEIA | 57.1 |
| KAMARA | 67.4 | VASILEOS_ALEXANDROU | 58.9 |
| T.S.PANEPISTIMIA | 67.6 | GIMNASIO | 61.1 |
| | | PARODOS_TZELILI | 63.1 |
| | | T.S.LAGKADA | 61.8 |
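
A per-stop severe delay ratio of the kind tabulated above can be computed and ranked as in the following sketch. It assumes the ratio is the share of records at a stop labeled Delay_Class 2; the source does not state whether observed or predicted classes are used. The helper names and `toy_records` are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

SEVERE = 2  # Delay_Class code for "Severe"


def severe_delay_ratio(records) -> dict[str, float]:
    """Return {stop_name: percent of records classified as severe}.

    `records` is an iterable of (stop_name, delay_class) pairs.
    """
    totals = Counter()
    severe = Counter()
    for stop, delay_class in records:
        totals[stop] += 1
        if delay_class == SEVERE:
            severe[stop] += 1
    return {stop: 100.0 * severe[stop] / count for stop, count in totals.items()}


def rank_hotspots(ratios: dict[str, float], top_n: int = 3) -> list[tuple[str, float]]:
    """Return the top_n stops ordered by descending severe delay ratio."""
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical records for illustration only.
toy_records = [
    ("AGNO", 2), ("AGNO", 2), ("AGNO", 2), ("AGNO", 0),
    ("KTEL", 2), ("KTEL", 1),
    ("IKA", 0), ("IKA", 1),
]
hotspots = rank_hotspots(severe_delay_ratio(toy_records), top_n=2)
```

Ranking each direction separately, as the table does, keeps direction-specific hotspots from being masked when the same stop performs differently inbound and outbound.
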
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Andreadis, I.M.; Georgiadis, G.; Politis, I. Optimizing Public Transport Infrastructure Through AI-Driven Reliability Prediction: A Data-Driven Approach. Smart Cities 2026, 9, 99. https://doi.org/10.3390/smartcities9060099

