Article

Optimizing Public Transport Infrastructure Through AI-Driven Reliability Prediction: A Data-Driven Approach

by Ioannis Marios Andreadis, Georgios Georgiadis * and Ioannis Politis
School of Civil Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece
* Author to whom correspondence should be addressed.
Smart Cities 2026, 9(6), 99; https://doi.org/10.3390/smartcities9060099 (registering DOI)
Submission received: 22 April 2026 / Revised: 4 June 2026 / Accepted: 9 June 2026 / Published: 11 June 2026

Highlights

What are the main findings?
  • An XGBoost machine learning framework classifies bus delay severity at stops with high predictive accuracy.
  • Meteorological and seasonal variables emerge as dominant predictors of delay severity, reflecting the influence of system-wide operating conditions.
What are the implications of the main findings?
  • Delay classification serves as a spatial diagnostic tool for identifying and ranking reliability hotspots along the network.
  • These hotspots provide a clear basis for prioritizing targeted infrastructure upgrades at specific bus stops and corridor segments.

Abstract

Public transport reliability largely determines the performance of smart urban mobility systems, as it directly affects passenger satisfaction and network efficiency. However, the strategic planning of public transport infrastructure is often carried out without dynamic, data-driven insights into operational performance, instead relying solely on static historical records of network operations. This study develops a data-driven framework based on the XGBoost machine learning algorithm to support the prioritization of infrastructure interventions by predicting delay severity and identifying reliability hotspots along an urban bus route. Delay severity is categorized into three classes (minor, moderate, and severe), using a model that incorporates spatial, temporal, operational, and meteorological variables. The XGBoost framework achieves a high predictive performance, with classification accuracies of 91.5% and 89.7% for the outbound and inbound bus route directions, respectively. Feature importance analysis indicates that seasonal and meteorological variables are critical factors influencing delay severity, highlighting the role of broader external environmental conditions on corridor performance. Furthermore, spatial analysis identifies specific bus stops with high delay probabilities, indicating hotspots where infrastructure upgrades should be prioritized at the stop and corridor levels. This study proposes a decision-support tool that enables targeted infrastructure investments at locations where they are most needed, contributing to more efficient and resilient public transport systems in smart cities.

1. Introduction

Public transport is considered a key factor in the development of sustainable and resilient cities [1]. As urbanization grows (Europe’s level of urbanization is expected to reach approximately 83.7% by 2050 [2]) and the number of private cars increases, urban centers around the world face congestion and traffic collisions [3,4]. In this context, public transport systems offer a vital solution to these problems, contributing significantly to the shaping of modern, smart, and sustainable cities [5,6]. To achieve this goal and reach a high level of ridership, public transport systems need to be sufficiently attractive to passengers. As previous studies confirm, the operational reliability of public transport systems, especially the predictability of waiting and travel times, is a key factor in passenger satisfaction [7,8]. However, turning reliability into a tangible Quality of Service (QoS) improvement requires substantial physical and operational infrastructure interventions. Implementing solutions such as dedicated bus lanes, Transit Signal Priority (TSP) systems, and upgraded passenger facilities requires significant financial resources and complex spatial compromises in dense urban environments. Consequently, to maximize their operational impact, strategic planners and transit authorities must ensure that these infrastructure upgrades are highly targeted, intervening precisely where the network is most vulnerable [9,10].
Despite this critical need, the strategic planning of transit infrastructure often relies only on static scheduling and historical averages. In practice, however, public transport operations are highly dynamic. Service reliability is often affected by several factors, such as roadworks, weather conditions, and public events [11]. Without data-driven insights into actual operational performance, transit authorities struggle to locate network hotspots. Inevitably, this reliance on static planning paradigms often leads to costly, poorly targeted infrastructure investments. Addressing this challenge requires analytical tools that are capable not only of predicting service reliability but also of translating operational performance into actionable planning insights. Such an approach can provide authorities with a more systematic basis for allocating resources and improving performance.
Against this background, the present study develops a multi-class XGBoost framework to classify delay severity and identify spatial hotspots along an urban bus route in Thessaloniki, Greece. The framework integrates spatial, temporal, operational, and meteorological variables and employs feature importance analysis to interpret the factors associated with delay formation. By combining delay classification with hotspot identification, the study aims to provide a data-driven decision-support tool for prioritizing targeted public transport infrastructure interventions.
The remainder of this article is organized as follows: Section 2 reviews relevant literature on machine learning applications in public transport reliability and delay classification. Section 3 details the materials and methods, including data collection, feature engineering, and the configuration of the multi-class XGBoost algorithm. Section 4 presents and discusses the results, evaluating the model’s classification accuracy, analyzing feature importance, spatially ranking transit delay hotspots, and outlining policy implications. Finally, Section 5 presents the study’s conclusions, identifies its limitations, and outlines directions for future research.

2. Literature Review

Recent advances in Intelligent Transportation Systems (ITS) and the widespread availability of General Transit Feed Specification Real-Time (GTFS-RT) data have accelerated the adoption of Machine Learning (ML) techniques in public transport, especially for predicting transit delays, travel times, arrival times, and departure times. Shen et al. [12] developed and tested a model in Foshan, a megacity in China, that integrates an offline Long Short-Term Memory (LSTM) and attention mechanism with an online Kalman filter adjustment to mitigate cumulative error amplification at subsequent stops. Their approach reduced Mean Absolute Error (MAE) by 38.59–79.33% and Root Mean Square Error by 34.89–81.41% during peak periods compared to seven state-of-the-art models. Lei et al. [13] proposed a novel method that groups the travel time into four distinct components, utilizing LSTM alongside a newly proposed Dual Attention Recurrent Neural Network (MDARNN) to predict each segment, and subsequently calibrates the predictions using four real-time traffic impact factors. Tests on four bus routes of the Chongqing city network in China showed that their methodology significantly outperformed traditional combination frameworks, achieving a Mean Absolute Percentage Error (MAPE) as low as 2.91% and keeping the MAE below 1.45 min. Rashvand et al. [14] developed a Fully Connected Neural Network (FCNN) to process high-dimensional transit and meteorological data. Applying their model to a large-scale system of 151 bus routes in Boston, USA, they reported departure-time prediction errors of under 80 s.
In parallel with Deep Learning, tree-based ensemble methods have been rigorously evaluated, often emerging as the optimal predictive framework. Recent comparative studies highlight the predictive superiority of the eXtreme Gradient Boosting (XGBoost) algorithm. Ahmed et al. [15] evaluated multiple supervised algorithms across three large-scale datasets. Their comparative findings indicated that XGBoost and LightGBM consistently yield the highest accuracy and stability when compared to traditional Support Vector Regression and neural networks, with XGBoost achieving the highest R² scores in explaining spatiotemporal travel time variations. Zhu et al. [16] applied an XGBoost-based model to predict travel times between bus stations, using a one-month dataset of bus operations in Guangzhou, China. By comparing it with KNN, LightGBM, and Back Propagation neural networks, they showed that XGBoost significantly outperforms these models, achieving the lowest MAPE of 11.96%.
As demonstrated by the studies mentioned earlier, the recent literature has focused on continuous regression models to identify the optimal algorithm for achieving the most accurate time prediction. However, while predicting exact arrival times in minutes is highly beneficial for the field of passenger information, it provides limited actionable insights for policymakers. Strategic planners and transit authorities require a broader understanding of the network to make informed, effective infrastructure decisions. In this context, shifting the analytical focus from continuous-time estimation to categorical performance evaluation (i.e., classification) provides a more practical framework for decision-making.
A limited but growing number of studies have begun exploring classification as a method for predicting the network’s hotspots. Munadi et al. [17] used tree-based ensemble machine learning models to analyze GPS data from a Bus Rapid Transit system in Bandung, Indonesia. While their methodology incorporated continuous travel time estimation, they also classified traffic congestion into discrete severity levels. Analyzing 20,156 spatial samples, their results showed that ensemble classifiers successfully categorize network conditions, achieving a classification accuracy of 96.8%. Similarly, Balbin et al. [18] proposed a predictive analytics framework that shifts away from exact-minute prediction, using decision-tree-based machine learning classification to categorize bus performance into discrete operational states (e.g., early, on-time, and late). Applying their model to an open big data repository of over 90 million transit logs from Winnipeg, Canada, they demonstrated that classifying delay states yields highly actionable insights to support smart transportation services with robust predictive accuracy. More recently, Aemmer et al. [19] classified transit performance from GTFS-RT feeds into systematic (consistently predictable) and stochastic (random) delay components on a segment-by-segment basis, demonstrating that such a classification can serve as a network-wide screening tool to locate where targeted treatments, such as dedicated bus lanes or transit signal priority, would be most effective. In the same vein, Camillo and Martins [20] framed bus-delay forecasting as a binary classification task (“delayed” versus “on-time”) for the network of Curitiba, Brazil, and notably incorporated the geographic coordinates of each stop as predictive features, allowing the model to implicitly capture route-specific congestion bottlenecks and achieving an F1-score of 0.88.
While these studies demonstrate the value of classification approaches for identifying problematic locations within public transport networks, they provide limited insight into how delays evolve and propagate spatially along a route. To address this issue, a parallel stream of research has focused on understanding how these localized delays propagate and cascade throughout transit corridors. For instance, Park et al. [21] used real-time data to demonstrate that delays initiated at critical core stops propagate downstream and disproportionately degrade on-time performance along a route. While recent literature has begun exploring these mechanisms from a broader network-level perspective, specifically using the PCMCI method to generate causal graphs and combining them with complex network theory to identify key bus stops and sub-communities [22], the focus has largely remained on understanding delay propagation rather than using predictive classification frameworks to identify reliability hotspots.
Consequently, few studies currently employ multi-class prediction to evaluate the network’s operational state and translate these outputs into spatial diagnostic tools for infrastructure planning. Predicting actual bus delays as multi-class severity levels to pinpoint localized spatial bottlenecks dynamically remains a significant research gap. To address this gap, the present study aims to develop a multi-class XGBoost framework to classify delay severity and identify spatial hotspots in the city of Thessaloniki, the second-largest metropolitan area in Greece. To interpret the model’s inner mechanics, this framework utilizes feature importance metrics as an explainable artificial intelligence (XAI) tool, an approach recommended by recent literature to provide interpretable insights specifically for classification models [23].

3. Materials and Methods

3.1. Study Area

In this paper, we use the city of Thessaloniki, the second-largest metropolitan area in Greece, as a case study. Thessaloniki is characterized by a dense urban form and a highly centralized spatial structure. While most economic and social activities take place in the compact city center, a large share of people travel daily to the center, creating intense travel demand within a relatively small area [24]. For many years, the public transport system relied solely on a wide bus network comprising more than 70 fixed routes and approximately 2000 stops, covering distances up to 50 km from the center [7]. To upgrade its public transport system, Thessaloniki recently acquired its first underground metro line, which extends 9.6 km and has 13 stations. However, while this new infrastructure will improve the system’s service quality in the city center, its spatial footprint does not extend to the broader periphery. Consequently, commuters from the surrounding suburban areas continue to rely heavily on peri-urban bus services to access the urban core. In this study, we focus on the main route of Bus Line 83 (Lagadas–Thessaloniki). Line 83 is currently operated by the regional transit provider (KTEL) on behalf of the Organization of Urban Transportation of Thessaloniki (OASTH). It connects the suburban municipality of Lagadas directly to the city center. Spanning approximately 25 km per direction, it serves exactly 20 stops towards the city center and 22 stops towards Lagadas.

3.2. Data Collection

To investigate the schedule reliability of Line 83, empirical transit data were obtained from historical Automatic Vehicle Location (AVL) and Automated Passenger Counter (APC) logs, which contain a rich set of attributes, including operational identifiers (e.g., Route, Trip, and Direction IDs), precise temporal parameters (scheduled and actual measured arrival/departure time at each bus stop), and passenger load variables (number of embarking and debarking passengers at each bus stop). AVL and APC equipment are installed on buses scheduled on Line 83, and the records used in this study cover the entire year of 2024.
To ensure the spatial and operational consistency of the machine learning models, we cleaned the raw transit data. First, the dataset was filtered to retain only the transit logs that strictly corresponded to the main Express Line 83 (removing variations such as 83A, 83B, 83N, 83M), which directly connects the Lagadas suburb with the city center. In addition, meteorological data were incorporated into the analysis to account for the effects of weather conditions on bus delays. Hourly meteorological data, including ambient temperature (°C), precipitation (mm), and cloud cover (%), were retrieved from the Open-Meteo database [25]. It is essential to note that the retrieved data are historical hourly weather forecasts for the specific area. The meteorological data were synchronized with the corresponding bus transit data, establishing an initial baseline of 228,145 operational records.
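The synchronization of hourly weather data with stop-level bus records can be illustrated with a minimal Python sketch (the study itself was carried out in R). It assumes that each record is matched to the weather entry for the hour containing its arrival timestamp and that unmatched records are dropped. The field name `actual_arrival`, the helper names, and all sample values are hypothetical and are not taken from the study's dataset.

```python
from datetime import datetime


def hour_key(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def attach_weather(bus_records, hourly_weather):
    """Join each bus record to the weather entry of the same hour.

    bus_records: list of dicts holding an 'actual_arrival' datetime
                 (hypothetical field name).
    hourly_weather: dict mapping hour-start datetimes to weather dicts.
    Records with no matching hour are dropped (an assumption of this sketch).
    """
    merged = []
    for rec in bus_records:
        weather = hourly_weather.get(hour_key(rec["actual_arrival"]))
        if weather is not None:
            merged.append({**rec, **weather})
    return merged


# Illustrative values only, not data from the study.
weather = {
    datetime(2024, 3, 5, 8): {"temp": 11.4, "precip": 0.0, "cloud_cover": 75},
}
records = [
    {"stop_id": "S1", "actual_arrival": datetime(2024, 3, 5, 8, 17, 42)},
    {"stop_id": "S2", "actual_arrival": datetime(2024, 3, 5, 9, 2, 0)},
]
merged = attach_weather(records, weather)
```

Only the first record is retained in this toy example, because no weather entry exists for the 09:00 hour.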
Subsequently, entries containing missing or blank temporal values due to telemetry dropouts were removed. Furthermore, statistical outliers were filtered by removing records where the bus arrived more than 30 min earlier than scheduled, as well as cases where the arrival delay exceeded 2 h. Finally, to address directional variations in traffic dynamics and commuting patterns, the dataset was partitioned by route direction: inbound (toward Thessaloniki’s city center) and outbound (toward Lagadas). After all data cleaning, filtering, and merging, the final dataset consisted of 219,008 valid operational records across both directions.

3.3. Feature Engineering and Delay Definition

To transform the raw transit logs into a structured dataset suitable for predictive modeling, a feature engineering process was carried out. The first step was to compute the actual bus delay, defined as the difference between the actual measured arrival time and the scheduled arrival time at each bus stop. Both timestamps were first converted into seconds elapsed since midnight, and the resulting difference was then converted into minutes. To account for trips that crossed midnight, a correction of ±1440 min was applied. As noted in Section 3.2, extreme outliers, namely delays of over 120 min or early arrivals of over 30 min, were excluded. The delay was then categorized into three discrete levels of severity: delays of 5 min or less (Class 0), delays of 5 to 15 min (Class 1), and severe delays of 15 min or more (Class 2).
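The delay computation and class assignment described above can be sketched in Python as follows. Two points are assumptions of this sketch rather than details given in the text: the midnight correction is triggered when the raw difference exceeds half a day (720 min) in either direction, and the class boundaries are treated as half-open intervals (below 5 min, 5 to below 15 min, 15 min and above). All function names are hypothetical.

```python
def seconds_since_midnight(hh, mm, ss=0):
    """Convert a clock time to seconds elapsed since midnight."""
    return hh * 3600 + mm * 60 + ss


def delay_minutes(scheduled_s, actual_s):
    """Arrival delay in minutes, corrected by +/-1440 min across midnight.

    The 720 min trigger (half a day) is an assumption of this sketch.
    """
    delay = (actual_s - scheduled_s) / 60.0
    if delay < -720:      # scheduled before midnight, arrived after it
        delay += 1440
    elif delay > 720:     # scheduled after midnight, arrived before it
        delay -= 1440
    return delay


def keep_record(delay):
    """Outlier filter: drop arrivals over 30 min early or over 120 min late."""
    return -30 <= delay <= 120


def severity_class(delay):
    """Map a delay in minutes to Class 0 (minor), 1 (moderate), or 2 (severe).

    Half-open boundaries are assumed here; early arrivals fall into Class 0.
    """
    if delay < 5:
        return 0
    if delay < 15:
        return 1
    return 2


# A bus scheduled at 23:55 that arrives at 00:05 is 10 min late.
late = delay_minutes(seconds_since_midnight(23, 55), seconds_since_midnight(0, 5))
# A bus scheduled at 00:05 that arrives at 23:58 is 7 min early.
early = delay_minutes(seconds_since_midnight(0, 5), seconds_since_midnight(23, 58))
```

Without the correction, the first case would be recorded as an arrival 1430 min early and would be discarded by the outlier filter.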
To accommodate daily and seasonal variations, categorical variables, such as the month of the year, and a binary variable for weekends were introduced. Furthermore, to preserve the inherent periodicity of the daily schedule, temporal variables were projected using trigonometric functions [26,27]. Specifically, the hour of the day ($h$) was mapped onto a two-dimensional circular space through its sine and cosine components, i.e., $\sin(2\pi h/24)$ and $\cos(2\pi h/24)$. This continuous cyclic encoding places late-night and early-morning hours (e.g., 23:00 and 01:00) close to each other, reflecting their actual chronological proximity. As highlighted in recent predictive modeling research, substituting traditional linear or one-hot encoding frameworks with this trigonometric transformation yields superior forecasting accuracy while preserving clear model interpretability [27].
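A short Python sketch shows why the sine and cosine pair preserves the closeness of hours around midnight. The function names are hypothetical, and the Euclidean distance between encoded points is used here only as a way to check the property, not as part of the study's pipeline.

```python
import math


def encode_hour(h):
    """Return the (sin, cos) cyclic encoding of an hour of the day."""
    angle = 2 * math.pi * h / 24
    return math.sin(angle), math.cos(angle)


def encoded_distance(h1, h2):
    """Euclidean distance between two hours in the encoded 2-D space."""
    s1, c1 = encode_hour(h1)
    s2, c2 = encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)


# 23:00 and 01:00 are two hours apart on the clock, exactly like 11:00 and 13:00.
night_gap = encoded_distance(23, 1)
midday_gap = encoded_distance(11, 13)
# 23:00 and 12:00 are eleven hours apart and therefore much farther in the encoding.
far_gap = encoded_distance(23, 12)
```

With a plain integer hour, 23 and 1 would differ by 22 units; in the encoded space their distance equals that of any other pair of hours two hours apart.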
In terms of spatial and operational dynamics, key variables included the specific stop identifier, the sequential rank of the stop along the route, and the total dynamic passenger load, calculated as the sum of embarking and debarking passengers. Ultimately, these variables, combined with the integrated meteorological parameters, formed the dataset utilized to predict the operational state of the transit system (Table 1). Because this framework is intended for strategic network diagnostics rather than real-time passenger information, it relies on historical archives in which all operational and passenger indicators are fully recorded after each event.

3.4. Methodology

3.4.1. eXtreme Gradient Boosting—XGBoost

For the predictive classification of bus delays at stops, this study utilizes the eXtreme Gradient Boosting (XGBoost) algorithm, a highly scalable, computationally efficient tree-boosting method proposed by Chen and Guestrin [28]. It is particularly suitable for transport networks because transit delays involve complex, non-linear relationships, and the algorithm effectively captures such spatial and temporal interactions. Furthermore, it operates significantly faster than many other machine learning methods because it efficiently handles large datasets through parallel processing [29,30]. Finally, XGBoost calculates relative feature importance scores, which provide valuable insights into the prediction results and enable researchers to identify the primary factors associated with bus delays [29,30]. In terms of algorithmic methodology, XGBoost is an advanced implementation of the gradient boosting framework, which constructs a set of decision trees, with each subsequent tree trained to predict and correct the errors of the preceding ones [29,30]. To achieve this, the algorithm seeks to minimize a specific objective function at each step. At iteration $t$, the objective function $L^{(t)}$ is defined as:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
The first term is the loss function $l$. It measures the difference between the actual delay class $y_i$ and the current prediction $\hat{y}_i^{(t-1)}$. The function $f_t(x_i)$ represents the new tree added to improve the prediction.
The second term, $\Omega$, is the regularization penalty. This specific feature separates XGBoost from traditional gradient boosting machines, and it is defined as:
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}$$
$T$ represents the number of leaves, and $w_j$ represents the weight of leaf $j$. The parameters $\gamma$ and $\lambda$ penalize large trees and extreme weights. The purpose of this function is to control the tree’s complexity to prevent the algorithm from overfitting the training data [28,29,30].
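As a small numerical illustration, the regularization penalty, i.e., γ times the number of leaves plus half of λ times the sum of squared leaf weights, can be evaluated for a single tree in Python. The leaf weights and the values of γ and λ below are made up for the example and are not the settings used in this study.

```python
def regularization_penalty(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lam * sum(w_j ** 2) for one tree.

    leaf_weights: the weights w_j of the tree's T leaves.
    gamma: penalty per leaf; lam: L2 penalty on the leaf weights.
    """
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


# Illustrative tree with three leaves (values are not from the study).
penalty = regularization_penalty([0.5, -0.25, 1.0], gamma=1.0, lam=2.0)
# Leaf-count term: 1.0 * 3 = 3.0
# Weight term: 0.5 * 2.0 * (0.25 + 0.0625 + 1.0) = 1.3125
```

Adding a fourth leaf would raise the penalty by at least γ, which is how the term discourages unnecessarily large trees.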

3.4.2. Model Training, Optimization, and Evaluation

All computational modeling, parameter optimization, and evaluation procedures were executed within the R programming environment (v4.3.2) using the xgboost (v1.7.5) package. Prior to modeling, categorical features (i.e., Month and Stop_ID) were transformed using one-hot encoding via sparse model matrices to optimize computational efficiency, while temporal features preserved their trigonometric cyclic structure. To ensure the reproducibility of the data splitting and training processes, a fixed random seed of 123 was set before partitioning the dataset sequentially into a training set comprising 80% of the observations and an independent testing set containing the remaining 20%. To optimize the multi-class classification of transit delays and prevent overfitting, specific XGBoost parameters were configured based on the literature and recent similar studies [28,29,30]. These hyperparameter values were selected within standard operational ranges and validated during preliminary testing, with the optimization process guided by an early-stopping strategy to prevent overfitting.
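The encoding and partitioning steps can be illustrated with a minimal Python sketch. The study used R, and the text does not spell out how the seed interacts with the partitioning, so the seeded random shuffle below is an assumption; a seed of 123 in Python will also not reproduce the partition obtained in R. The helper names are hypothetical.

```python
import random


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 indicator vector."""
    return [1 if value == c else 0 for c in categories]


def train_test_split(rows, train_fraction=0.8, seed=123):
    """Reproducible split into training and testing subsets.

    Assumes a seeded random shuffle; the same seed always yields the same split.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


rows = list(range(100))            # stand-in for 100 stop-level records
train, test = train_test_split(rows)
train_again, _ = train_test_split(rows)
month_vector = one_hot("Mar", ["Jan", "Feb", "Mar"])
```

Calling the function twice with the same seed returns identical subsets, which is the property the fixed seed is meant to provide.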
The primary parameter controlling the model’s complexity is the maximum tree depth (max_depth). An excessively high value allows the model to learn highly specific patterns, often leading to overfitting the training data. Conversely, a very low value results in underfitting, as the model fails to capture the underlying relationships. Based on the dataset’s characteristics, the maximum depth was set to 6, a widely accepted threshold in classification models that provides sufficient complexity while maintaining strong generalization capabilities.
To further enhance the model’s robustness, the learning rate (eta) was defined. This parameter scales the contribution of each newly added tree, effectively shrinking the weights to prevent the algorithm from prematurely converging to suboptimal solutions. A learning rate of 0.05 was selected to ensure a gradual, stable learning process across boosting iterations.
Additionally, data subsampling was used to reduce the risk of overfitting. As recommended by Chen and Guestrin [28], the fraction of training instances (subsample) and the fraction of features (colsample_bytree) randomly evaluated to grow each tree were both set to 0.8. This reduces the correlation among individual trees, producing a more resilient ensemble model. The overall training process was constrained to a maximum of 1000 boosting rounds, with an early-stopping mechanism that automatically terminated training if the multi-class classification error (merror) on the test set did not improve for 50 consecutive rounds. Furthermore, no artificial resampling techniques were applied to address class imbalance, as the dataset reflects the native operational reality of the transit corridor and preliminary testing confirmed that the empirical class distribution did not negatively affect the overall prediction accuracy. Finally, the model’s predictive performance on the unseen test set was evaluated using overall Accuracy as the primary performance metric, supplemented by a Confusion Matrix to assess the classification accuracy across the three delay classes. The complete configuration of the XGBoost hyperparameters is presented in Table 2.
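The configuration described in this subsection can be summarized as a parameter dictionary, together with a plain-Python sketch of the early-stopping rule. The parameter names are standard XGBoost names and the values are those stated in the text, except for the objective, which is not given here and is an assumption. The `early_stop` helper is a hypothetical stand-in that mimics a patience-based stopping rule on a list of per-round error values; it does not call the xgboost package, and the error sequence is made up.

```python
params = {
    "objective": "multi:softprob",  # assumption: the objective is not stated in the text
    "num_class": 3,                 # minor, moderate, severe
    "eval_metric": "merror",        # multi-class classification error
    "max_depth": 6,
    "eta": 0.05,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}
MAX_ROUNDS = 1000
PATIENCE = 50


def early_stop(errors, patience=PATIENCE):
    """Return (best_round, last_round) for a sequence of per-round errors.

    Training halts once `patience` rounds pass without a new minimum.
    """
    best_err = float("inf")
    best_round = -1
    for rnd, err in enumerate(errors):
        if err < best_err:
            best_err, best_round = err, rnd
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, len(errors) - 1


# Illustrative error curve: improves for three rounds, then stays flat.
errors = [0.5, 0.4, 0.3] + [0.35] * 60
best_round, last_round = early_stop(errors)
```

In this toy curve the best error occurs at round index 2, and training halts 50 rounds later, well before the 1000-round limit.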

4. Results and Discussion

4.1. Model Performance and Classification Accuracy

This section presents the classification results of the XGBoost models for both the inbound (Lagadas to city center) and outbound (city center to Lagadas) directions. The outbound model achieved an overall classification accuracy of 91.5%, while the inbound model recorded an overall accuracy of 89.7%. To further evaluate the classification performance across the three delay severity levels (minor, moderate, severe), confusion matrices were generated for both directions (Figure 1 and Figure 2).
Analyzing the confusion matrices reveals a similar classification pattern for both directions. In both the inbound and outbound directions, the algorithm effectively identified minor and severe delays but had difficulty distinguishing the moderate delay class. In the outbound direction, the algorithm struggled at the boundary between minor and moderate delays, incorrectly predicting 682 actual moderate delays as minor and 282 actual minor delays as moderate. The same weakness is observed in the inbound direction, where the algorithm incorrectly classified 695 actual moderate delays as minor and 501 as severe. This difficulty in classifying moderate delays is expected, as the temporal difference between minor (<5 min) and moderate (5–15 min) delays is often very small in practice, making it difficult for the algorithm to separate the adjacent classes clearly.
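For readers who wish to reproduce this type of evaluation, the following Python sketch builds a three-class confusion matrix and derives overall accuracy from it. The convention of rows for actual classes and columns for predicted classes is an assumption of the sketch, and the labels are toy values rather than model output from this study.

```python
def confusion_matrix(actual, predicted, n_classes=3):
    """Count matrix with rows = actual class and columns = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy(matrix):
    """Share of observations on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels: 0 = minor, 1 = moderate, 2 = severe.
actual = [0, 0, 1, 1, 1, 2, 2]
predicted = [0, 1, 0, 1, 1, 2, 2]
cm = confusion_matrix(actual, predicted)
acc = accuracy(cm)
```

In this toy case the off-diagonal cells `cm[1][0]` and `cm[0][1]` hold the moderate-as-minor and minor-as-moderate errors, the same type of confusion discussed above.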

4.2. Feature Importance and Policy Implications

To evaluate the factors associated with a higher number of bus delays, feature importance was calculated separately for each bus route direction (inbound and outbound) using the Gain metric. Gain quantifies the contribution of each feature to the reduction of the model’s loss function across all tree splits. Features with a relative importance of less than 1% were excluded from the analysis. The results are presented in Figure 3 and Figure 4.
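The conversion of raw Gain values into relative importance shares, followed by the 1% cut-off, can be sketched in Python as follows. The raw gain values are invented for illustration, the function name is hypothetical, and the sketch assumes that shares are not renormalized after low-importance features are dropped, which the text does not specify.

```python
def relative_importance(raw_gain, min_share=1.0):
    """Convert raw gain per feature into percentage shares, ranked descending.

    Features whose share is below `min_share` percent are excluded.
    Shares are not renormalized after exclusion (an assumption of this sketch).
    """
    total = sum(raw_gain.values())
    if total <= 0:
        return []
    shares = {name: 100.0 * gain / total for name, gain in raw_gain.items()}
    kept = {name: share for name, share in shares.items() if share >= min_share}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Illustrative raw gains, not values from the study.
raw_gain = {"Month": 50.0, "temp": 30.0, "Stop_Sequence": 19.5, "precip": 0.5}
ranking = relative_importance(raw_gain)
```

Here `precip` holds only 0.5% of the total gain and is therefore removed from the ranking.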
These figures show that seasonal and meteorological variables are the major contributors in both directions, collectively accounting for approximately 65% of the total gain. Notably, the dominant factor differs depending on the direction. In the inbound direction, seasonality (Month) is the most influential feature (44.22%), followed by temperature (temp) (22.21%). Conversely, in the outbound direction, temperature is the most influential feature (41.15%), with seasonality ranking second (23.85%).
Spatial and temporal features also contribute to the model, albeit to a lesser extent. The vehicle’s position along the route (Stop_Sequence) represents the most important secondary factor, contributing 6.51% and 8.12% in the inbound and outbound directions, respectively. Temporal features (Hour_sin and Hour_cos) and cloud cover (cloud_cover) contribute moderately, ranging from 4% to 7% in both directions.
Operational features display limited relative importance. Passenger activity (Passenger_Load) and the weekend indicator (Is_Weekend) rank at the lower end of the distributions, each contributing less than 4% to the total gain.
These feature importance distributions indicate a consistent pattern across both directions. Seasonal and meteorological variables primarily drive the model’s predictive performance. This suggests that variations in delay severity are more influenced by broader external conditions than by local operational factors. In fact, the Month and Temperature variables act more as proxies for broader system conditions, including variations in traffic demand (e.g., school periods, holidays, or seasonal travel patterns) and weather-related changes in roadway performance. More specifically, adverse weather conditions are possibly associated with more cautious driving behavior, reduced operating speeds, and increased stop-and-go traffic, all of which contribute to higher travel time variability. For instance, drivers may increase headways and braking distances during periods of rain or low visibility, leading to slower and less predictable traffic flow. Additionally, seasonal effects captured by the Month variable reflect recurring changes in congestion patterns and network usage, which influence bus operations independently of localized stop characteristics. Within this context, the comparatively lower importance of precipitation (precip) relative to Month and Temperature can be attributed to the pronounced sparsity of the variable, as precipitation values are zero for 90.8% of the observations in the dataset. Under such zero-inflated conditions, tree-based algorithms such as XGBoost tend to split preferentially on continuous, high-variance variables that capture broader macro-level trends. This behavior reduces the standalone feature importance score attributed to precipitation, notwithstanding its considerable operational impact during actual rainfall events.
Furthermore, spatial and temporal features contribute in a secondary but non-negligible manner. The relative importance of Stop_Sequence highlights the role of cumulative effects along the route, indicating that delays tend to propagate and intensify across specific segments of the corridor. Temporal encodings of the hour of the day further enhance the model’s explanatory capacity, reflecting recurring daily patterns in system performance, albeit with a lower impact than environmental factors.
In contrast, operational variables such as passenger activity and day-of-week indicators make only a limited contribution to the model’s loss reduction. This suggests that, within the examined bus route corridor, delays are not primarily caused by demand-related pressures but rather by exogenous conditions and network dynamics.
Taken together, the feature importance results highlight a dual structure of delay formation: a dominant system-wide component linked to environmental and seasonal variability and a secondary spatial component associated with vehicle progression along the route. Table 3 summarizes how the most influential features translate into targeted infrastructure interventions at high-delay locations.
The spatial prioritization of these interventions may be further specified through the spatial analysis of hotspots presented in the following sub-section.

4.3. Spatial Identification of Delay Hotspots

Building on the feature importance analysis, which identified the primary factors associated with delay formation and their policy implications, this section spatially identifies the locations where such interventions on public transport infrastructure should be prioritized. To convert the model predictions into tangible insights, the Severe Delay Ratio was calculated for each stop along Route 83. This indicator was derived as the ratio of predicted severe delays (delays exceeding 15 min) to the total number of predicted arrivals at each stop, multiplied by 100:
\[
\text{Severe Delay Ratio} = \frac{\text{Predicted Severe Delays}}{\text{Total Predicted Arrivals}} \times 100\ (\%)
\]
This indicator enables a normalized comparison of reliability across bus stops, independently of the total number of observations. Using this indicator, hotspots, i.e., critical locations where the network performs comparatively worse, can be recognized. Table 4 presents the Severe Delay Ratio for each stop and enables ranking reliability hotspots along the route.
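The calculation of the indicator can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name and the sample predictions are hypothetical, and the severe class is assumed to be coded as 2, following the Delay_Class coding in Table 1.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

SEVERE = 2  # assumed label for severe delays (Delay_Class coding in Table 1)


def severe_delay_ratio(predictions: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the share (%) of predicted arrivals classified as severe, per stop."""
    total: Counter = Counter()
    severe: Counter = Counter()
    for stop, delay_class in predictions:
        total[stop] += 1
        if delay_class == SEVERE:
            severe[stop] += 1
    return {stop: 100.0 * severe[stop] / total[stop] for stop in total}


# HYPOTHETICAL predictions: (stop name, predicted Delay_Class)
sample = [
    ("KTEL", 2), ("KTEL", 2), ("KTEL", 0), ("KTEL", 2),
    ("AGNO", 2), ("AGNO", 1),
]
ratios = severe_delay_ratio(sample)
print(ratios)
```

Because the ratio is normalized by the number of predicted arrivals at each stop, stops with different numbers of observations remain directly comparable.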
In the inbound direction, severe delays are highly concentrated at the initial stops in the Lagadas area, where the AGNO (79.2%) and KTEL (78.2%) stops are the primary hotspots. This result indicates that delays are already present at the route’s origin (potentially due to delays carried over from previous trips) and are subsequently propagated downstream. Once the vehicle enters the main arterial segment, the percentage of delays decreases noticeably. This improvement is likely due to the vehicle’s ability to operate at higher speeds and the absence of stops or signalized intersections along this segment of the route, resulting in a reduction to 62.2% (STAVROUPOLI). However, upon entering Thessaloniki’s urban center, reliability tends to deteriorate again. At the final stops of the route, the percentage of severe delays increases further to 67.6% due to dense traffic conditions in the city center.
In the outbound direction, the probability of severe delays is concentrated at the beginning of the route, during departure from the city center. The STAVROUPOLI and TAXIDROMEIO stops, located at the exit of the city of Thessaloniki, where delays accumulated at the preceding central stops are carried forward, record the highest percentages at 78.4% and 78.2%, respectively, while the first eight stops all exceed 68%. As the bus moves away from the city center and proceeds toward suburban areas, on-time performance gradually improves, with the ratio of severe delays decreasing to approximately 56–63% for the remainder of the route (Table 4).
To facilitate clearer interpretation of these findings, the severe delay ratio for each stop was spatially visualized in the Geographic Information Systems (GIS) software QGIS 4.0 [31]. The spatial distribution maps for the outbound and inbound routes are shown in Figure 5 and Figure 6, respectively.
This spatial analysis confirms that delay patterns are not uniformly distributed along the route but are instead concentrated at specific locations, particularly at terminal areas and major urban entry points. These findings reinforce the feature importance results, indicating that while delays are influenced by system-wide factors, they manifest spatially in identifiable hotspots. This enables prioritizing targeted infrastructure interventions at specific bus stops and corridor segments, thereby enhancing the effectiveness of planning strategies.
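The hotspot ranking described in this section can be reproduced from the outbound values in Table 4. The sketch below is illustrative: the helper names and the 68% screening threshold are assumptions chosen for the example, not part of the published method.

```python
from typing import List, Tuple

# Outbound Severe Delay Ratios (%) copied from Table 4, in route order.
outbound: List[Tuple[str, float]] = [
    ("T.S.PANEPISTIMIA", 69.5), ("KAMARA", 71.5), ("PLATEIA_ARISTOTELOUS", 68.2),
    ("ANTIGONIDON", 73.5), ("PLATEIA_DIMOKRATIAS", 74.2), ("AGIA_PARASKEVI", 73.3),
    ("TAXIDROMEIO", 78.2), ("STAVROUPOLI", 78.4), ("LAGINA", 59.2), ("AGNO", 56.5),
    ("STRATOPEDO", 57.7), ("STROFI_KAVALARIOU", 58.5), ("STROFI_PERIVOLAKIOU", 61.3),
    ("DEI", 56.6), ("SUPER_MARKET", 60.3), ("EISODOS_LAGKADA", 60.4), ("KTEL", 57.4),
    ("PLATEIA", 57.1), ("VASILEOS_ALEXANDROU", 58.9), ("GIMNASIO", 61.1),
    ("PARODOS_TZELILI", 63.1), ("T.S.LAGKADA", 61.8),
]


def rank_hotspots(stops: List[Tuple[str, float]], top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top_n stops with the highest Severe Delay Ratio."""
    return sorted(stops, key=lambda item: item[1], reverse=True)[:top_n]


def flag_hotspots(stops: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Return, in route order, the stops whose ratio meets the threshold."""
    return [name for name, ratio in stops if ratio >= threshold]


top3 = rank_hotspots(outbound)
flagged = flag_hotspots(outbound, threshold=68.0)  # hypothetical screening threshold
print(top3)
print(flagged)
```

With this threshold the flagged stops are exactly the first eight stops of the outbound route, which corresponds to the urban segment discussed above.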

5. Conclusions

The findings of this study demonstrate the strong potential of supervised machine learning methods for analyzing public transport reliability and supporting decision-making for tailored infrastructure investments. The robust performance of the XGBoost models, achieving accuracies close to 90% for both bus route directions, confirms the effectiveness of predicting and classifying delay severity using real-world operational data. While a limited number of recent studies have begun to integrate meteorological variables into transport reliability models, our research highlights their critical importance.
As the feature importance analysis demonstrates, meteorological and seasonal factors play a dominant role, indicating that delay patterns are largely influenced by external conditions rather than by spatial and operational factors alone. At the same time, the spatial analysis shows that these effects are not uniformly distributed across the route but are instead concentrated at specific locations. Severe delays are most often observed at bus stops at route origins, transition zones, and dense urban segments, indicating hotspots where operational conditions and traffic interactions are more challenging, and where targeted infrastructure interventions can be prioritized for implementation.
From a policy perspective, the proposed framework enables a shift from generic planning approaches to evidence-based interventions. While delays are strongly influenced by exogenous factors, such as weather conditions and seasonality, their impacts materialize at identifiable bus stops and corridor segments. As a result, bus stops and their adjacent infrastructure emerge as the most practical units for intervention. Locations identified as reliability hotspots can be prioritized for infrastructure measures such as dedicated bus lanes, TSP, and intersection improvements, as well as for the implementation of climate-resilient shelters and real-time passenger information systems. Such measures would improve passenger experience during periods when external conditions negatively affect network operations. This integrated approach enables transport authorities to comprehensively address both the systemic and spatial dimensions of reliability.
Nevertheless, the present study has certain limitations that suggest directions for future research. The proposed framework relies on historical operational data and meteorological forecasts, which do not fully capture real-time traffic conditions or unexpected disruptions. Future research should integrate live meteorological inputs and real-time traffic flow data sourced from widely used navigation applications (such as Google Traffic or Waze) to enhance short-term predictive accuracy, improve operational responsiveness, and better capture unexpected network disruptions. In addition, a recognized limitation of this study is its empirical application to a single operational bus line, which may restrict the immediate generalizability of the localized feature weights. Even so, the core scientific contribution lies in the development of a systematic and highly reproducible workflow. Since the framework relies strictly on universally standardized telematics and meteorological data streams, it provides a practical blueprint that can be easily replicated on other routes or in different cities with similar operational characteristics. Extending the analysis from a single corridor to the entire metropolitan network remains a clear direction for future research, in order to support a network-wide investment strategy and improve the overall resilience and performance of public transport systems in smart cities.

Author Contributions

Conceptualization, I.M.A. and G.G.; methodology, I.M.A.; software, I.M.A.; validation, G.G. and I.P.; formal analysis, G.G. and I.P.; investigation, I.M.A.; resources, I.P.; data curation, I.M.A.; writing—original draft preparation, I.M.A.; writing—review and editing, G.G. and I.P.; visualization, I.M.A.; supervision, G.G.; project administration, I.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Part of the data used in this study is publicly available from the website of the Transport Authority of Thessaloniki (OSETH). Additional data were obtained within the framework of the RECREATE project, “smaRt ECosystem foR improvEment of public trAnsporT pErformance” (Project code: ΚΜΡ6-0284565), and are not publicly available due to data usage agreements and privacy restrictions.

Acknowledgments

The authors would like to acknowledge the Transport Authority of Thessaloniki (OSETH) for providing access to data through its online database, as well as the RECREATE project, “smaRt ECosystem foR improvEment of public trAnsporT pErformance” (Project code: ΚΜΡ6-0284565), implemented under the framework of the Action “Investment Plans of Innovation” of the Operational Program “Central Macedonia 2014–2020”, co-funded by the European Regional Development Fund and Greece. During the preparation of this manuscript, the authors used Gemini 3.5 for text formatting. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| AVL | Automatic Vehicle Location |
| APC | Automated Passenger Counter |
| FCNN | Fully Connected Neural Network |
| GIS | Geographic Information Systems |
| GTFS-RT | General Transit Feed Specification Real-Time |
| ITS | Intelligent Transportation Systems |
| LSTM | Long Short-Term Memory |
| MAE | Mean Absolute Error |
| MAPE | Mean Absolute Percentage Error |
| MDARNN | Dual Attention Recurrent Neural Network |
| ML | Machine Learning |
| PCMCI | Peter Clark Momentary Conditional Independence |
| QoS | Quality of Service |
| RMSE | Root Mean Square Error |
| TSP | Transit Signal Priority |
| XGBoost | eXtreme Gradient Boosting |

References

1. United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations: New York, NY, USA, 2015.
2. United Nations, Department of Economic and Social Affairs, Population Division. World Urbanization Prospects: The 2018 Revision; United Nations: New York, NY, USA, 2018; Available online: https://population.un.org/wup/assets/WUP2018-Report.pdf (accessed on 20 April 2026).
3. Le, T.P.L.; Trinh, T.A. Encouraging Public Transport Use to Reduce Traffic Congestion and Air Pollutant: A Case Study of Ho Chi Minh City, Vietnam. Procedia Eng. 2016, 142, 236–243.
4. Thabit, A.S.M.; Kerrache, C.A.; Calafate, C.T. A survey on monitoring and management techniques for road traffic congestion in vehicular networks. ICT Express 2024, 10, 1186–1198.
5. Amirgholy, M.; Shahabi, M.; Gao, H.O. Optimal design of sustainable transit systems in congested urban networks: A macroscopic approach. Transp. Res. Part E Logist. Transp. Rev. 2017, 103, 261–285.
6. Mavlutova, I.; Atstaja, D.; Grasis, J.; Kuzmina, J.; Uvarova, I.; Roga, D. Urban Transportation Concept and Sustainable Urban Mobility in Smart Cities: A Review. Energies 2023, 16, 3585.
7. Georgiadis, G.; Politis, I.; Verani, E.; Kopsacheilis, A.; Sdoukopoulos, A.; Fyrogenis, I. Public transport travel time perception: A comparative study of passenger estimates and actual bus trip durations. Sustain. Futures 2025, 9, 100590.
8. Hensher, D.A.; Stopher, P.; Bullock, P. Service quality—Developing a service quality index in the provision of commercial bus contracts. Transp. Res. Part A Policy Pract. 2003, 37, 499–517.
9. Olstam, J.; Häll, C.H.; Bhattacharyya, K.; Gebrehiwot, R. Traffic impacts of dynamic bus lanes: A simulation experiment of real-world bus operations. Eur. Transp. Res. Rev. 2025, 17, 10.
10. Yannis, G.; Chaziris, A. Transport System and Infrastructure. Transp. Res. Procedia 2022, 60, 6–11.
11. Kaewunruen, S.; Sresakoolchai, J.; Sun, H. Causal analysis of bus travel time reliability in Birmingham, UK. Results Eng. 2021, 12, 100280.
12. Shen, J.; Liu, Q.; Zhang, Y.; Yu, M. A novel model incorporating deep learning and Kalman filter augmentation for route-level bus arrival time prediction with error accumulation mitigation. Expert Syst. Appl. 2025, 281, 127622.
13. Lei, J.; Chen, Y.; Han, Q.; Zeng, L.; He, G. Effective Bus Travel Time Prediction System of Multiple Routes: Introducing PMLNet Based on MDARNN. Appl. Sci. 2025, 15, 8104.
14. Rashvand, N.; Hosseini, S.S.; Azarbayjani, M.; Tabkhi, H. Real-Time Bus Departure Prediction Using Neural Networks for Smart IoT Public Bus Transit. IoT 2024, 5, 650–665.
15. Ahmed, I.; Kumara, I.; Reshadat, V.; Kayes, A.S.M.; van den Heuvel, W.-J.; Tamburri, D.A. Travel Time Prediction and Explanation with Spatio-Temporal Features: A Comparative Study. Electronics 2021, 11, 106.
16. Zhu, L.; Shu, S.; Zou, L. XGBoost-Based Travel Time Prediction between Bus Stations and Analysis of Influencing Factors. Wirel. Commun. Mob. Comput. 2022, 2022, 3504704.
17. Munadi, R.; Ramadan, D.N.; Sussi; Fitriyanti, N.; Nuha, H.H. Ensemble Machine Learning Approach for Traffic Congestion and Travel Time Prediction in Urban Bus Rapid Transit Systems: A Case Study of Trans Metro Bandung. IoT 2026, 7, 22.
18. Balbin, P.P.F.; Barker, J.C.R.; Leung, C.K.; Tran, M.; Wall, R.P.; Cuzzocrea, A. Predictive analytics on open big data for supporting smart transportation services. Procedia Comput. Sci. 2020, 176, 3009–3018.
19. Aemmer, Z.; Ranjbari, A.; MacKenzie, D. Measurement and classification of transit delays using GTFS-RT data. Public Transp. 2022, 14, 263–285.
20. Camillo, F.S.; Martins, M.S.R. Machine Learning for Forecasting Public Transport Delays: A Case Study for Smart Cities Applications. In Proceedings of the 2025 16th IEEE International Conference on Industry Applications (INDUSCON), São Sebastião, Brazil, 14–17 October 2025; pp. 700–705.
21. Park, Y.; Mount, J.; Liu, L.; Xiao, N.; Miller, H.J. Assessing public transit performance using real-time data: Spatiotemporal patterns of bus operation delays in Columbus, Ohio, USA. Int. J. Geogr. Inf. Sci. 2020, 34, 367–392.
22. Zhang, Q.; Wang, W.; She, J.; Ma, Z. Understanding bus network delay propagation: Integration of causal inference and complex network theory. J. Transp. Geogr. 2025, 123, 104098.
23. Saarela, M.; Jauhiainen, S. Comparison of feature importance measures as explanations for classification models. SN Appl. Sci. 2021, 3, 272.
24. Tzanni, O.; Nikolaou, P.; Giannakopoulou, S.; Arvanitis, A.; Basbas, S. Social Dimensions of Spatial Justice in the Use of the Public Transport System in Thessaloniki, Greece. Land 2022, 11, 2032.
25. Open-Meteo. Free Weather Forecast API. Available online: https://open-meteo.com/ (accessed on 20 April 2026).
26. Zoutendijk, M.; Mitici, M. Probabilistic Flight Delay Predictions Using Machine Learning and Applications to the Flight-to-Gate Assignment Problem. Aerospace 2021, 8, 152.
27. Wang, X.; Wang, Z.; Wan, L.; Tian, Y. Prediction of Flight Delays at Beijing Capital International Airport Based on Ensemble Methods. Appl. Sci. 2022, 12, 10621.
28. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
29. Chen, Z.; Fan, W. A Freeway Travel Time Prediction Method Based on an XGBoost Model. Sustainability 2021, 13, 8577.
30. Shi, R.; Xu, X.; Li, J.; Li, Y. Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization. Appl. Soft Comput. 2021, 109, 107538.
31. QGIS.org. QGIS Geographic Information System; QGIS Association, 2024. Available online: https://qgis.org/ (accessed on 20 April 2026).
Figure 1. Confusion Matrix—Outbound (City Center to Lagadas).
Figure 2. Confusion Matrix—Inbound (Lagadas to City Center).
Figure 3. Feature Importance—Outbound (City Center to Lagadas).
Figure 4. Feature Importance—Inbound (Lagadas to City Center).
Figure 5. Severe Delay Ratio Map—Outbound Direction. The blue line represents Route 83 and the colored dots represent bus stops. Colors indicate the severe delay ratio at each stop. Place names displayed on the basemap are presented in Greek and are included solely for geographic reference.
Figure 6. Severe Delay Ratio Map—Inbound Direction. The blue line represents Route 83 and the colored dots represent bus stops. Colors indicate the severe delay ratio at each stop. Place names displayed on the basemap are presented in Greek and are included solely for geographic reference.
Table 1. Summary of the basic information on the variables used for this study.

| Category | Variable Name | Description | Data Type |
|---|---|---|---|
| Target Variable | Delay_Class | Categorical delay severity (0: Minor, 1: Moderate, 2: Severe) | Categorical |
| Temporal Features | Hour_sin | Sine transformation of the exact arrival hour | Continuous |
| Temporal Features | Hour_cos | Cosine transformation of the exact arrival hour | Continuous |
| Temporal Features | Month | Month of the year | Categorical |
| Temporal Features | Is_Weekend | Binary indicator for weekends (1: Weekend, 0: Weekday) | Binary |
| Spatial and Operational | Stop_ID | Unique identification code for each bus stop | Categorical |
| Spatial and Operational | Stop_Sequence | The sequential rank of the stop along the route | Continuous |
| Spatial and Operational | Passenger_Load | Total passenger activity (sum of embarking and debarking passengers) | Continuous |
| Meteorological | temp | Hourly temperature forecast (°C) | Continuous |
| Meteorological | precip | Hourly precipitation forecast (mm) | Continuous |
| Meteorological | Cloud_cover | Hourly cloud cover forecast (%) | Continuous |
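
The Hour_sin and Hour_cos variables in Table 1 correspond to a cyclical encoding of the arrival hour. A minimal sketch of such an encoding is given below, assuming a 24-hour period; the exact formula used in the study is not stated, and the function names are hypothetical.

```python
import math
from typing import Tuple


def encode_hour(hour: float, period: float = 24.0) -> Tuple[float, float]:
    """Map an hour of day onto the unit circle as (sin, cos) components."""
    angle = 2.0 * math.pi * hour / period  # assumes a 24-hour cycle by default
    return math.sin(angle), math.cos(angle)


def circular_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Euclidean distance between two encoded hours."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# 23:30 and 00:30 are one hour apart; 06:00 and 12:00 are six hours apart.
late_vs_early = circular_distance(encode_hour(23.5), encode_hour(0.5))
morning_vs_noon = circular_distance(encode_hour(6.0), encode_hour(12.0))
print(late_vs_early, morning_vs_noon)
```

The point of the two-component encoding is that times just before and just after midnight end up close together, which a single raw hour value (23.5 versus 0.5) would not capture.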
Table 2. Configuration of the XGBoost hyperparameters used in the model.

| Hyperparameter | Value |
|---|---|
| Evaluation Metric | merror |
| Learning Rate (eta) | 0.05 |
| Maximum Tree Depth | 6 |
| Subsample Ratio | 0.8 |
| Column Subsample | 0.8 |
| Maximum Iterations | 1000 |
| Early Stopping Guardrail | 50 rounds |
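
For readers who wish to set up a comparable model, the configuration in Table 2 can be written with XGBoost's native parameter names roughly as follows. This is a sketch under stated assumptions: the objective, the number of classes (taken from the three delay classes), and the mapping of "Column Subsample" to `colsample_bytree` are not confirmed by the source, and `dtrain` and `dvalid` in the comment are hypothetical `DMatrix` objects.

```python
# Booster parameters corresponding to Table 2.
params = {
    "objective": "multi:softprob",  # ASSUMPTION: multiclass objective not stated in Table 2
    "num_class": 3,                 # minor, moderate, severe
    "eval_metric": "merror",        # multiclass classification error rate
    "eta": 0.05,                    # learning rate
    "max_depth": 6,
    "subsample": 0.8,
    "colsample_bytree": 0.8,        # ASSUMPTION: per-tree column subsampling
}

# Training-loop settings corresponding to the last two rows of Table 2.
train_kwargs = {
    "num_boost_round": 1000,
    "early_stopping_rounds": 50,
}

# With the xgboost package installed, these would typically be passed as:
#   booster = xgboost.train(params, dtrain, evals=[(dvalid, "valid")], **train_kwargs)
print(params, train_kwargs)
```

Early stopping requires a validation set in `evals`; training halts once the `merror` value on that set has not improved for 50 consecutive rounds.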
Table 3. Policy implications of key feature importance (GAIN) results for public transport infrastructure interventions.

| Feature | Insight from GAIN | Policy Interpretation | Infrastructure Actions (Hotspots) |
|---|---|---|---|
| Month | Strong seasonal effect | Delays vary across seasons | Seasonal timetable adjustments at critical stops; passenger real-time information systems |
| Temperature | Dominant in outbound | Delays sensitive to temperature | Shaded stops; climate-resilient materials |
| Precipitation/Cloud | Major weather influence | Delays increase in bad weather | Shelters; drainage; protected waiting areas |
| Stop_Sequence | Delay accumulation | Bottlenecks along the route | Bus lanes; TSP; intersection improvements |
| Hour (sin/cos) | Moderate effect | Peak-time variation | Bus priority measures near key stops during peak hours |
Table 4. Severe Delay Ratio of each bus stop.

| Inbound Direction (From Lagadas to City Center): Stop Name | Severe Delay Ratio (%) | Outbound Direction (From City Center to Lagadas): Stop Name | Severe Delay Ratio (%) |
|---|---|---|---|
| T.S.LAGKADA | 73.0 | T.S.PANEPISTIMIA | 69.5 |
| PARODOS_TZELILI | 76.5 | KAMARA | 71.5 |
| IKA | 76.0 | PLATEIA_ARISTOTELOUS | 68.2 |
| PLATEIA | 75.7 | ANTIGONIDON | 73.5 |
| KTEL | 78.2 | PLATEIA_DIMOKRATIAS | 74.2 |
| EXODOS_LAGKADA | 76.5 | AGIA_PARASKEVI | 73.3 |
| SUPER_MARKET | 77.7 | TAXIDROMEIO | 78.2 |
| DEI | 76.7 | STAVROUPOLI | 78.4 |
| STROFI_PERIVOLAKIOU | 76.0 | LAGINA | 59.2 |
| STROFI_KAVALARIOU | 77.6 | AGNO | 56.5 |
| STRATOPEDO | 76.6 | STRATOPEDO | 57.7 |
| AGNO | 79.2 | STROFI_KAVALARIOU | 58.5 |
| STAVROUPOLI | 62.2 | STROFI_PERIVOLAKIOU | 61.3 |
| TAXIDROMEIO | 62.3 | DEI | 56.6 |
| AGIA_PARASKEVI | 64.0 | SUPER_MARKET | 60.3 |
| PLATEIA_DIMOKRATIAS | 65.3 | EISODOS_LAGKADA | 60.4 |
| ANTIGONIDON | 65.3 | KTEL | 57.4 |
| PLATEIA_ARISTOTELOUS | 66.6 | PLATEIA | 57.1 |
| KAMARA | 67.4 | VASILEOS_ALEXANDROU | 58.9 |
| T.S.PANEPISTIMIA | 67.6 | GIMNASIO | 61.1 |
| | | PARODOS_TZELILI | 63.1 |
| | | T.S.LAGKADA | 61.8 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
