1. Introduction
Urban mobility systems face mounting pressure to address safety, efficiency, and sustainability amid rapid urbanization and climate change. Public transport plays a crucial role in reducing congestion and emissions while promoting equitable access to urban services. However, densely populated cities often grapple with recurring safety risks—particularly in bus and tram networks—undermining operational reliability and passenger confidence.
Artificial intelligence (AI) offers new capabilities for real-time risk detection, predictive analytics, and data-driven intervention in complex transport systems. By integrating machine learning, computer vision, and sensor-based diagnostics, AI can support proactive safety management and enhance service quality. Yet many cities, especially in transitional economies, struggle to embed these technologies within legacy infrastructure due to technical and institutional barriers [1,2].
In Belgrade, Serbia’s capital, the public transport system serves hundreds of thousands daily but remains constrained by aging fleets, incident recurrence, and limited monitoring capacity. Traditional safety practices—centered on manual reports or post-incident review—lack the responsiveness needed for effective prevention. This underscores the value of AI-enhanced solutions to support urban transport resilience.
Recent studies have demonstrated that AI systems integrating onboard telemetry, video surveillance, and diagnostic data can detect high-risk behaviors such as harsh braking and speeding—early indicators of potential accidents. However, empirical research on the deployment of such systems in Southeastern Europe remains scarce.
Improving operational safety in public transport delivers broader sustainability gains through three reinforcing channels. First, health: fewer crashes and hazardous maneuvers reduce injuries to passengers, road users, and staff, while greater perceived safety encourages use among vulnerable groups, advancing equity and well-being (SDG 3). Second, reliability: preventing safety-related incidents stabilizes headways, shortens disruption and recovery times, and reduces variance in door-to-door travel, which strengthens service dependability and user trust. Third, emissions: safer, smoother driving (fewer harsh braking and acceleration events, consistent speeds within limits, and adherence to planned routes) reduces energy consumption and tailpipe emissions, and incident avoidance also prevents congestion waves, deadheading, and towing/recovery movements. Together, these pathways position safety not only as an ethical and regulatory imperative but as a practical lever for achieving sustainable urban mobility (SDG 11). In this study, we operationalize these links by targeting behaviors and anomalies (e.g., harsh braking, speeding, and route deviation) that simultaneously affect injury risk, schedule stability, and vehicle energy use, and we assess how mitigating them contributes to system-level sustainability outcomes.
Beyond aligning with SDG 3 and SDG 11, we situate this study within contemporary AI governance models that emphasize risk-based oversight, human control, transparency, and continuous monitoring (e.g., OECD AI Principles, the NIST AI Risk Management Framework, and the EU’s risk-based approach). In public-transport safety analytics, these frameworks imply proportionate controls across the model lifecycle, clear accountability among the operator, vendor, and authority, and safeguards that minimize unintended behavioral effects. Accordingly, the Belgrade pilot was designed with purpose limitation (safety and operational reliability), human-in-the-loop oversight via dispatcher-mediated alerts rather than in-cab prompts, non-punitive coaching based on post-shift summaries, auditable thresholds for event flags, and privacy-conscious logging with defined retention. This governance lens treats safety analytics as a socio-technical system—where algorithms, procedures, and organizational practices jointly determine outcomes—thereby connecting technical performance to institutional responsibility and public trust.
This study addresses the gap noted above, namely the scarcity of empirical evidence from Southeastern Europe, by evaluating the implementation of an AI-enabled safety monitoring system in Belgrade's public transport network. It examines the real-world impact of AI on transport safety and its alignment with the Sustainable Development Goals.
The study is guided by the following three research questions.
RQ1: How can AI technologies improve real-time traffic safety monitoring and incident detection in closed public transport systems?
RQ2: What are the measurable impacts and potential limitations of AI-driven safety systems in sustainable urban mobility contexts?
RQ3: How can cities like Belgrade leverage AI to align public transport safety with SDG 11 (sustainable cities) and SDG 9 (resilient infrastructure)?
The remainder of the paper is structured as follows:
Section 2 reviews the literature on AI in urban transport safety.
Section 3 outlines the methodological framework and system architecture.
Section 4 presents empirical findings.
Section 5 discusses implications for sustainability, ethics, and policy.
Section 6 concludes with recommendations for practice and further research.
2. Literature Review
Artificial intelligence (AI) has become a cornerstone technology in modern urban planning and smart mobility systems, enabling cities to better manage transport efficiency, environmental impact, and public safety. For public transportation systems operating in complex and dynamic urban environments, the integration of AI provides a strategic opportunity to improve operational resilience, ensure passenger safety, and align with long-term sustainability goals.
This literature review synthesizes key theoretical and empirical contributions from the fields of intelligent transportation systems, urban safety management, AI in public infrastructure, and sustainable development. It aims to establish a conceptual foundation for understanding the deployment of AI-driven safety systems within public transport networks in emerging economies such as Serbia.
2.1. Artificial Intelligence and Safety Optimization in Public Transport Systems
AI technologies have shown significant potential to transform traditional public transportation systems by enabling real-time decision-making, automation, and predictive analysis. These advancements are especially relevant in the domain of safety, where timely detection of risk indicators can prevent incidents and reduce human error [3,4]. Applications such as computer vision, sensor fusion, and deep learning allow for automated monitoring of vehicle dynamics, passenger behavior, and environmental conditions.
Numerous studies have highlighted the effectiveness of AI in incident detection and prevention across transit environments. For instance, convolutional neural networks (CNNs) have been used for onboard surveillance and anomaly detection, while time-series models enable predictive maintenance and early warning systems for technical failures [5]. By replacing reactive safety procedures with proactive risk management, AI solutions make it possible to continuously monitor extensive transportation networks with high accuracy and scalability.
However, AI integration in public transport safety is often constrained by legacy systems, fragmented data sources, and organizational resistance. Public transport operators must address challenges related to data governance, interoperability, and the ethical deployment of surveillance technologies. In many emerging cities, these limitations are exacerbated by infrastructural and institutional gaps that hinder the deployment of smart mobility solutions [6].
2.2. Smart Mobility, Urban Resilience, and AI-Enabled Sustainability
Urban transportation systems are central to achieving multiple targets of the United Nations Sustainable Development Goals (SDGs), particularly SDG 11 (Sustainable Cities and Communities) and SDG 9 (Industry, Innovation, and Infrastructure). Within this context, AI-enabled smart mobility systems are recognized for their capacity to reduce greenhouse gas emissions, improve accessibility, and enhance safety through real-time responsiveness and optimization [7].
Recent literature on sustainable urban transport emphasizes the importance of integrating intelligent safety systems into a broader policy and planning framework that accounts for equity, resilience, and inclusiveness [8,9]. AI technologies, when aligned with urban sustainability strategies, can support modal shifts from private to public transport, reduce congestion-related emissions, and increase system-level reliability. Examples include adaptive signal control systems, AI-powered route optimization, and predictive analytics for passenger flow management [10].
Despite this promise, few studies have specifically addressed how AI can be used to improve safety performance within closed public transport systems such as those in Belgrade. Existing smart city initiatives often prioritize efficiency and carbon reduction but give limited attention to safety-centric AI applications in municipal bus, tram, and trolleybus networks.
While empirical research on AI-enabled safety systems in Southeastern Europe remains limited, comparative insights from other transition economies can offer valuable lessons. For instance, in Poland, smart mobility pilots in Warsaw and Kraków have leveraged EU-funded programs to integrate AI in traffic control and accident prevention systems, demonstrating how structured funding ecosystems can accelerate digital transformation. Similarly, Turkey has implemented machine learning algorithms in Istanbul's public transport network for congestion prediction and fleet optimization, with measurable improvements in travel time reliability. In Romania, Cluj-Napoca has assessed AI-supported predictive maintenance in its electric bus system, showing promise in reducing technical faults and operational delays. These cases underscore the importance of institutional collaboration, digital infrastructure investment, and external technical partnerships in enabling effective AI adoption in resource-constrained urban environments. They also highlight the need for context-specific adaptation strategies, particularly in aligning AI systems with local governance models, urban planning goals, and public trust dynamics [11].
2.3. Ethical, Legal, and Operational Considerations for AI in Transport Safety
The adoption of AI in public safety contexts brings to the forefront critical discussions around surveillance ethics, data privacy, algorithmic bias, and legal accountability. AI systems deployed in transport often rely on video surveillance, biometric tracking, or real-time behavioral analysis, all of which raise concerns regarding individual rights and data governance [12].
From an operational perspective, the literature highlights the need for governance frameworks that regulate AI deployment in ways that balance innovation with accountability. Public authorities are urged to implement transparent data handling protocols, conduct risk assessments, and ensure stakeholder involvement in system design. In emerging markets, where public trust in technology adoption may be fragile, ensuring compliance with privacy and ethical norms is not only a legal imperative but also a condition for long-term system viability [13].
Equally important is the question of operational scalability and reliability. Studies suggest that the effectiveness of AI-driven transport safety systems depends on the quality and granularity of input data, the contextual adaptation of algorithms, and the robustness of real-time processing infrastructure. Without addressing these factors, AI implementations risk delivering suboptimal or biased outcomes [14].
In addition to technical integration issues, the deployment of AI in public transport must be framed within broader ethics and governance frameworks. Key concerns include privacy protection, especially in video surveillance and biometric data collection; algorithmic bias, which may disproportionately affect certain passenger groups; and legal accountability in the event of AI-induced safety failures. These dimensions are particularly critical in public-sector contexts, where transparency and public trust are foundational. Moreover, cultural and institutional barriers (such as low digital literacy, fragmented municipal governance, and legacy procurement models) pose significant constraints in transitional economies. Examples from Southeastern Europe, including Serbia, Romania, and North Macedonia, reveal shared challenges such as weak inter-agency coordination, limited regulatory capacity, and low awareness of ethical AI standards. To contextualize these issues, we draw on findings from the UITP (2025) report [15] on AI in public transport, which emphasizes the need for human-centric design, explainability, and inclusive implementation processes. Comparative insights from UITP and other global frameworks inform our policy recommendations in later sections.
2.4. Responsible AI Governance for Safety Analytics (OECD, NIST, EU Risk-Based Models)
Recent AI-governance scholarship converges on seven practical pillars relevant to intelligent transportation systems: (1) context- and risk-aware deployment (ex ante impact and necessity/proportionality checks); (2) meaningful human oversight as a safety valve (interventions triggered and mediated by trained staff); (3) data protection by design with role-based access and minimization; (4) transparency and contestability (explainable indicators, event logs, and avenues to challenge flags); (5) robustness and validation under operational conditions; (6) fairness and distributional assessment (monitoring for disparate impacts across routes, depots, shifts, and driver cohorts); and (7) post-deployment monitoring and incident reporting to detect drift and learn from near-misses. While these ideas are increasingly discussed in general AI policy (e.g., OECD principles; NIST AI RMF’s “Map-Measure-Manage-Govern”), they are rarely operationalized in Southeastern European transit contexts. Our study addresses this gap by (i) defining clear, auditable thresholds for braking, speeding, and route-keeping; (ii) implementing dispatcher-mediated, non-punitive feedback rather than driver-distracting in-cab cues; (iii) maintaining explainable, time-stamped logs to enable review and coaching; and (iv) tracking equity-aware metrics (e.g., flags per 100 km by route/depot) to surface potential distributional effects. This positions the contribution at the intersection of ITS analytics and responsible AI governance.
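The equity-aware metric in (iv) reduces to a normalized rate per group. The following is a minimal Python sketch, assuming a hypothetical `FlagRecord` holding per-trip flag counts and vehicle-kilometres; the max/min `disparity_ratio` is an illustrative screen of our own, not a threshold prescribed by the OECD, NIST, or EU frameworks.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class FlagRecord:
    """Hypothetical per-trip record: AI flags raised and distance driven."""
    route: str
    depot: str
    flags: int
    km: float


def flags_per_100km(records: Iterable[FlagRecord], key: str = "route") -> Dict[str, float]:
    """Aggregate flags and vehicle-km by `key` ('route' or 'depot'); return flags per 100 km."""
    flags: Dict[str, int] = defaultdict(int)
    km: Dict[str, float] = defaultdict(float)
    for r in records:
        group = getattr(r, key)
        flags[group] += r.flags
        km[group] += r.km
    # Skip groups with no recorded distance to avoid division by zero.
    return {g: 100.0 * flags[g] / km[g] for g in flags if km[g] > 0}


def disparity_ratio(rates: Dict[str, float]) -> float:
    """Highest-to-lowest group rate; values well above 1 merit a closer look."""
    positive = [v for v in rates.values() if v > 0]
    return max(positive) / min(positive) if positive else float("nan")


records = [
    FlagRecord("31", "A", 6, 300.0),
    FlagRecord("31", "A", 4, 200.0),
    FlagRecord("603", "B", 9, 300.0),
]
by_route = flags_per_100km(records, key="route")
print(by_route, disparity_ratio(by_route))  # {'31': 2.0, '603': 3.0} 1.5
```

Normalizing by distance rather than by trips keeps long peri-urban routes from appearing riskier simply because they accumulate more kilometres.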
2.5. Key Gaps Identified
The literature reveals the following critical gaps.
Safety applications remain underexplored, especially in relation to closed public transport systems;
Empirical evidence from transitional economies is scarce, limiting insight into operational feasibility and adaptation strategies;
Linkages between AI deployment and SDG-aligned safety governance are weak, suggesting a need for integrative policy frameworks;
Ethical implementation models are rarely addressed, particularly for AI systems involving surveillance and predictive policing;
Operational guidance for AI integration in public transport remains limited, with fragmented practices for system setup, monitoring, and response.
To interpret these challenges, this study applies the Technology–Organization–Environment (TOE) framework, which emphasizes how technological capacity, organizational readiness, and external conditions shape innovation outcomes. In Belgrade, constraints such as weak digital infrastructure, limited staff capacity, and fragmented data governance hinder adoption, while strong inter-departmental coordination and leadership support serve as key enablers. TOE thus provides a useful lens for assessing institutional preparedness and shaping effective implementation strategies.
3. Methodological Framework
3.1. Research Design
This study employs a mixed-methods, applied case study research design to assess the effectiveness and sustainability implications of deploying AI-driven safety monitoring systems in closed public transport networks. The research integrates both quantitative performance metrics and qualitative stakeholder perspectives to capture the multidimensional impact of AI adoption on urban traffic safety and operational efficiency.
The research framework is guided by the AI-Enhanced Urban Mobility Safety Model (AEUMS), which encompasses five analytical dimensions: technological integration, data environment, risk detection performance, sustainability alignment (SDG impact), and ethical governance.
To illustrate the architecture and functional logic of the proposed AI-Enhanced Urban Mobility Safety (AEUMS) framework, Figure 1 provides a flow-based representation of the end-to-end data and decision pipeline. The framework is structured around four core phases: data collection, risk detection, generation of actionable insights, and operational impact.
Figure 1. AEUMS pipeline across four phases (data collection, risk detection, actionable insights, and operational impact), showing how multi-source telemetry (e.g., sensors, video, incident logs) flows into detection models and dispatch processes that translate analytics into safety actions.
The pipeline shows how multi-source inputs (e.g., video surveillance, vehicle sensors, and incident logs) are transformed by AI algorithms into real-time alerts and spatial intelligence for transport operators. This process not only facilitates incident reduction and proactive response but also enables measurable progress toward safety and sustainability objectives in urban mobility systems.
3.2. Case Selection and Sampling
The twelve pilot routes were selected through a purposive sampling approach informed by three primary criteria: (1) historical incident frequency, identified from depot-level incident reports and municipal safety audits; (2) sensor and camera infrastructure availability, ensuring consistent data collection capabilities across the selected vehicles; and (3) urban density variation, with coverage spanning central high-congestion corridors, mixed-use districts, and peri-urban zones. This diversity allowed for comparative analysis across operational environments.
Additionally, the spatial zoning used in GIS-based safety mapping adhered to the official administrative transport corridors defined by JKP GSP Beograd, Belgrade's city public transport company. These zones were cross-referenced with municipal geographic information system (GIS) layers to ensure that the delineations used in risk heatmaps and cluster analysis reflected policy-relevant territorial units. This alignment enhances the reproducibility and institutional relevance of the spatial analysis.
The twelve pilot routes selected for this study represent a strategic cross-section of Belgrade’s public transportation network. These routes were purposively chosen to capture operational diversity across multiple spatial and functional contexts. They include high-frequency central lines such as Route 31 (Studentski trg—Konjarnik) and Route 16 (Dorćol—Konjarnik), which traverse dense urban corridors with elevated pedestrian interaction and congestion risks. Mid-density, mixed-use routes such as Route 50 (Banovo Brdo—Ustanička) and Route 23 (Vidikovac—Karaburma) were included to reflect variable traffic conditions and moderate incident rates. Peripheral and peri-urban lines such as Route 45 (Zemun—Novi Beograd Blok 70) and Route 603 (Ledine—Savski trg) were also incorporated to assess AI deployment in areas with lower infrastructure intensity but higher variability in driver behavior and road conditions.
This distribution ensured that the sample covered a wide spectrum of incident risk profiles, sensor compatibility, and urban morphology. Because the JKP GSP Beograd corridor definitions used for spatial zoning also align with operational scheduling and dispatch zones, and were validated against Belgrade's municipal geospatial datasets, spatial analytics could be applied consistently across the study, enhancing the institutional interpretability of AI-derived risk maps.
A purposive sampling approach was applied to select operational units and AI pilot routes that met the following criteria.
Use of surveillance cameras and/or vehicle telemetry systems;
Availability of at least one year of historical incident records;
Willingness to collaborate in testing AI detection systems.
Three transport depots (bus, tram, trolleybus) and twelve urban lines were included in the pilot implementation and evaluation phases.
3.3. Data Collection Methods
To complement the quantitative sensor and telemetry data, a qualitative research component was incorporated to gain insight into the practical, operational, and governance-related dimensions of AI deployment in public transport safety systems. Semi-structured interviews were conducted with fourteen stakeholders, including transport safety managers, municipal regulators, ICT system integrators, and public transport drivers.
Interviews were guided by the AI-Enhanced Urban Mobility Safety (AEUMS) framework, ensuring alignment with the study’s conceptual model. The interview protocol covered four thematic clusters: (1) technical implementation, (2) institutional coordination and governance, (3) human–machine interaction, and (4) operational response. Questions were designed to explore stakeholder perceptions of AI effectiveness, barriers to adoption, ethical considerations, and organizational readiness.
Interviews were audio-recorded with consent and transcribed verbatim. Thematic coding was conducted using MAXQDA v8, and emergent themes were triangulated with telemetry-derived indicators. To enhance methodological transparency without overloading the manuscript, a summary of the interview guide is provided in Appendix B.
3.3.1. Qualitative Data
Qualitative data were obtained through the following.
Semi-structured interviews with key stakeholders, including fleet managers, dispatchers, AI solution engineers, and city mobility planners (typical duration: 45–60 min);
Internal documentation review, including maintenance reports, incident logs, and digital strategy plans;
Observational studies during live testing phases of the AI monitoring platform.
Interview questions were aligned with the AEUMS framework and organized around themes of usability, perceived effectiveness, and policy integration.
3.3.2. Quantitative Data
Quantitative data sources included the following.
Automatically logged incidents before and after AI system deployment;
Speed, acceleration, braking, and route adherence metrics captured by onboard sensors;
System-level KPIs such as average incident response time, near-miss frequency, and anomaly detection accuracy.
Data were collected over a 3-month observation period and benchmarked against historical monthly averages from 2021–2023.
3.4. Event Labeling and Adjudication
Operational thresholds. We define events using fixed operational criteria: harsh braking (>3.5 m/s² for >1.5 s), speed exceedance (>10% above posted/operational limit for >5 s), and route deviation (>25 m from the geofenced corridor for >15 s). Thresholds were selected in consultation with depot safety staff and validated against historical incident logs.
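The sustained-threshold rule can be expressed as a run-length check over uniformly sampled telemetry. The Python sketch below is illustrative only: it assumes a fixed sampling rate `hz`, longitudinal acceleration in m/s² with deceleration as negative values, and speeds and limits in km/h; the helper names are hypothetical and do not describe the operator's production filters.

```python
from typing import List, Sequence, Tuple


def sustained_runs(mask: Sequence[bool], hz: float, min_duration_s: float) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) sample indices of runs where `mask` holds for longer than `min_duration_s`."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(list(mask) + [False]):  # trailing False closes an open run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) / hz > min_duration_s:  # run duration = samples / sampling rate
                runs.append((start, i))
            start = None
    return runs


def harsh_braking(accel_mps2: Sequence[float], hz: float,
                  threshold: float = 3.5, min_s: float = 1.5) -> List[Tuple[int, int]]:
    # Deceleration stronger than `threshold` m/s² (negative acceleration) for more than `min_s` seconds.
    return sustained_runs([a < -threshold for a in accel_mps2], hz, min_s)


def speeding(speed_kmh: Sequence[float], limit_kmh: Sequence[float], hz: float,
             ratio: float = 1.10, min_s: float = 5.0) -> List[Tuple[int, int]]:
    # Speed more than 10% above the applicable limit for more than `min_s` seconds.
    return sustained_runs([v > ratio * lim for v, lim in zip(speed_kmh, limit_kmh)], hz, min_s)


# 1 Hz example: one 2 s braking episode (flagged) and one 1 s episode (not flagged).
print(harsh_braking([0.0, -4.0, -4.0, 0.0, -4.0, 0.0], hz=1.0))  # [(1, 3)]
```

At 1 Hz the ">1.5 s" criterion effectively requires two consecutive samples, so the sampling rate interacts with the threshold and should be reported alongside it.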
Labeling workflow. Each AI-flagged event is reviewed by a depot manager following a documented flow: (1) verify signal quality; (2) inspect location/context; (3) classify as safety-critical vs. advisory; and (4) record rationale and any contextual modifiers (weather, traffic anomalies). Ambiguous cases trigger a dual review; disagreements are resolved by consensus using predefined rules; and persistent ties are adjudicated by a third senior reviewer.
Inter-rater reliability. We assessed agreement on a stratified random sample ([n = 10,000]) double-coded by two managers who were blinded to each other's decisions. We report Cohen's κ for nominal labels and weighted κ for severity; given class imbalance, we additionally report Gwet's AC1 and percent agreement. Agreement was [substantial/almost perfect]: harsh braking κ = [0.01], speeding κ = [0.01], route deviation κ = [0.05]; overall κ = [0.05]; and percent agreement = [90.0]%. The disagreement rate was [0.1]%; all were resolved via the adjudication protocol. A spot-audit by an independent safety officer on [n = 1000] events yielded [95.0]% concordance (κ = [0.01]).
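For nominal labels, percent agreement, Cohen's κ, and Gwet's AC1 differ only in how chance agreement is estimated: Cohen's κ uses each rater's own marginals, whereas AC1 uses pooled marginals, which makes it less sensitive to class imbalance. A minimal Python sketch for two raters follows; the function name and labels are hypothetical, and weighted κ for severity is not covered.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def agreement_stats(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> Dict[str, float]:
    """Percent agreement, Cohen's kappa, and Gwet's AC1 for two raters on nominal labels."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("ratings must be non-empty and of equal length")
    categories = set(r1) | set(r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n

    # Cohen: chance agreement from each rater's own marginal distribution.
    pe_cohen = sum((c1[k] / n) * (c2[k] / n) for k in categories)
    kappa = (p_obs - pe_cohen) / (1 - pe_cohen) if pe_cohen < 1 else float("nan")

    # Gwet AC1: chance agreement from pooled marginals across q categories.
    q = len(categories)
    pooled = [(c1[k] + c2[k]) / (2 * n) for k in categories]
    pe_gwet = sum(p * (1 - p) for p in pooled) / (q - 1) if q > 1 else 0.0
    ac1 = (p_obs - pe_gwet) / (1 - pe_gwet)

    return {"percent_agreement": 100 * p_obs, "cohen_kappa": kappa, "gwet_ac1": ac1}


stats = agreement_stats(["critical", "critical", "advisory", "advisory"],
                        ["critical", "advisory", "advisory", "advisory"])
print(stats)  # percent agreement 75.0, kappa 0.5, AC1 about 0.529
```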
Transparency and reproducibility. Appendix A includes the decision flowchart, exemplar borderline cases with screenshots, and the exact threshold configuration (with smoothing/aggregation windows) to facilitate replication.
3.5. Analytical Procedure
To enable accurate training and evaluation of AI models for risk detection, we developed a standardized labeling scheme for classifying high-risk behaviors. Primary event categories included harsh braking, speeding, and route adherence, each defined by operationally validated thresholds aligned with functional safety principles as follows.
Harsh braking: deceleration > 3.5 m/s² sustained for >1.5 s (abrupt/emergency maneuvers);
Speeding: speed > 110% of the posted or system-encoded limit for >5 s (contextualized by arterial vs. residential segments);
Route deviation: vehicle > 25 m outside the geofenced corridor for >15 s, computed from GPS traces against route shapefiles.
Labeling was performed automatically via rules-based filters over telemetry data and validated by expert review (see Section 3.4). These definitions ensured consistency across depots and reflected the Belgrade operator's safety thresholds.
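The route-deviation rule requires the distance from each GPS fix to the corridor centerline. The sketch below assumes the route shapefile has already been read into an ordered list of (lat, lon) vertices (reading shapefiles is out of scope) and uses a local equirectangular projection, an approximation that is adequate over city-scale distances; all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def to_xy(lat: float, lon: float, lat0: float) -> Tuple[float, float]:
    """Equirectangular projection (metres) around reference latitude lat0."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def point_segment_distance(p, a, b) -> float:
    """Euclidean distance from point p to segment ab in the projected plane."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_route(lat: float, lon: float, route: Sequence[Tuple[float, float]]) -> float:
    """Minimum distance (m) from a GPS fix to a route polyline of (lat, lon) vertices (needs >= 2)."""
    lat0 = route[0][0]
    p = to_xy(lat, lon, lat0)
    pts = [to_xy(la, lo, lat0) for la, lo in route]
    return min(point_segment_distance(p, pts[i], pts[i + 1]) for i in range(len(pts) - 1))


def route_deviations(trace, route, hz: float, max_offset_m: float = 25.0,
                     min_s: float = 15.0) -> List[Tuple[int, int]]:
    """(start, end_exclusive) runs of fixes more than max_offset_m off-route for longer than min_s."""
    off = [distance_to_route(la, lo, route) > max_offset_m for la, lo in trace]
    runs, start = [], None
    for i, flag in enumerate(off + [False]):  # trailing False closes an open run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) / hz > min_s:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical east-west corridor; a 20 s excursion about 56 m north of it is flagged.
route = [(44.8000, 20.40), (44.8000, 20.50)]
trace = [(44.8000, 20.45)] * 5 + [(44.8005, 20.45)] * 20 + [(44.8000, 20.45)] * 5
print(route_deviations(trace, route, hz=1.0))  # [(5, 25)]
```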
3.5.1. Qualitative Analysis
Interview transcripts and field notes were analyzed in NVivo 14 using thematic coding based on the five dimensions of the AEUMS model. We identified common concerns and enabling factors for operational integration, risk prediction, and stakeholder readiness, applying triangulation across interviews and documentation to ensure thematic consistency.
For transparency and cross-study comparability, we define the following key indicators used in the empirical analysis.
Incident detection time: latency between event occurrence and AI-triggered alert;
Anomaly classification accuracy: precision/recall (and F1) from model validation;
Route-adherence violations: deviations > 25 m from the scheduled/geofenced trajectory for >15 s (harmonized with the threshold above);
Near-miss prediction: multi-sensor temporal proximity events validated via expert-coded labels.
Where applicable, definitions align with ISO 26262 and UITP guidance on transport safety metrics.
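Detection latency and precision/recall/F1 both depend on how alerts are matched to ground-truth events. The following Python sketch assumes a greedy one-to-one matching in which an alert counts as a true positive if it fires within a hypothetical `max_lag_s` window after an expert-labeled event; the matching rule actually used in the pilot is not specified here.

```python
from statistics import median
from typing import List, Sequence, Tuple


def match_alerts(event_times: Sequence[float], alert_times: Sequence[float],
                 max_lag_s: float = 60.0) -> Tuple[List[float], int, int, int]:
    """Greedily match each labeled event to the earliest unused alert within [0, max_lag_s] after it.

    Returns (latencies, true_positives, false_positives, false_negatives).
    """
    alerts = sorted(alert_times)
    used = [False] * len(alerts)
    latencies: List[float] = []
    for t in sorted(event_times):
        for j, a in enumerate(alerts):
            if not used[j] and 0.0 <= a - t <= max_lag_s:
                used[j] = True
                latencies.append(a - t)
                break
    tp = len(latencies)
    fn = len(event_times) - tp  # labeled events that received no alert
    fp = used.count(False)      # alerts not matched to any labeled event
    return latencies, tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


latencies_s, tp, fp, fn = match_alerts([10, 100, 200], [12, 105, 400])
print(median(latencies_s), precision_recall_f1(tp, fp, fn))
```

The choice of `max_lag_s` trades off latency against recall, so it is worth reporting together with the resulting metrics.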
We assessed thematic saturation along two axes. Code saturation was defined as the point at which no new first-order codes emerged for any AEUMS dimension across two consecutive interviews. Meaning saturation was defined as the point at which code definitions, axial links, and exemplar quotations remained stable without requiring revision in the codebook. We tracked both using a saturation grid (rows: interviews; columns: AEUMS dimensions and code families) and memos documenting changes. Saturation was reached by [interview X of N]; [Y] subsequent interviews were analyzed to confirm stability and to test for deviant cases. The final codebook (v1) contains first-order codes and axial categories; an excerpted saturation grid is provided in Appendix B.
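The code-saturation rule can be tracked mechanically from coded transcripts. This minimal Python sketch assumes each interview is represented as a set of hypothetical (AEUMS dimension, code) pairs; it builds the rows of a saturation grid (new codes per dimension) and returns the interview at which two consecutive interviews first add no new codes. Meaning saturation, which depends on codebook revisions, is not captured.

```python
from typing import Dict, List, Optional, Set, Tuple

Code = Tuple[str, str]  # (AEUMS dimension, first-order code); labels are hypothetical


def saturation_grid(interviews: List[Set[Code]]) -> List[Dict[str, int]]:
    """One row per interview: number of previously unseen codes per AEUMS dimension."""
    seen: Set[Code] = set()
    grid: List[Dict[str, int]] = []
    for codes in interviews:
        row: Dict[str, int] = {}
        for dimension, _ in codes - seen:
            row[dimension] = row.get(dimension, 0) + 1
        grid.append(row)
        seen |= codes
    return grid


def saturation_point(grid: List[Dict[str, int]], window: int = 2) -> Optional[int]:
    """1-based index of the interview completing `window` consecutive interviews with no new codes."""
    run = 0
    for i, row in enumerate(grid, start=1):
        run = run + 1 if not row else 0
        if run == window:
            return i
    return None


interviews = [
    {("technology", "sensor gaps"), ("governance", "data ownership")},
    {("technology", "sensor gaps"), ("human-machine", "alert fatigue")},
    {("governance", "data ownership")},
    {("human-machine", "alert fatigue")},
    {("technology", "sensor gaps")},
]
grid = saturation_grid(interviews)
print(grid, saturation_point(grid))
```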
To ensure reliability without relying solely on metrics, we implemented a negotiated-agreement workflow. A stratified 25–30% sample of transcripts (balanced by role and depot) was double-coded independently in NVivo 14. We report inter-coder agreement >85% and Cohen’s κ = [0.7–0.8] for the AEUMS top-level categories. Disagreements were flagged automatically and resolved in consensus meetings that (i) examined evidence supporting each code, (ii) prioritized semantic fit over frequency, and (iii) updated the versioned codebook when rules required clarification. If consensus was not reached within 10 min, a third reviewer (senior qualitative lead) adjudicated using a predefined decision rule (favor the narrower, more specific code when both are plausible; otherwise assign to a parent code). All decisions were logged in an audit trail (rationale, codebook change, date) to support transparency and replication.
We explicitly searched for negative/deviant cases that challenged emerging explanations (e.g., instances where safety alerts did not translate into operational changes) and documented how they informed theory refinement. Reflexive memos were maintained after each coding session to surface researcher assumptions and mitigate confirmation bias.
3.5.2. Quantitative Analysis
Vision-based anomaly detection. We implemented YOLOv5 for real-time object detection/classification due to its balance of accuracy and inference speed for onboard use. The model was trained on 12,500 annotated frames covering urban scenarios (boarding areas, intersections, depot zones). Key parameters included input 640 × 640, batch 16, learning rate 0.001, early stopping after 50 epochs on validation-loss plateau, transfer learning from COCO weights, and additional transit-specific classes (sudden pedestrian appearances, unauthorized obstacles, door obstruction). On the held-out validation set, precision and recall for high-risk classes exceeded 0.87 and 0.84, respectively.
Descriptive and inferential statistics. Analyses were conducted in SPSS v27 and Python 3.10. Methods included (i) descriptive statistics for pre/post comparisons (e.g., incidents per 10,000 km), (ii) paired-sample t-tests for changes in safety KPIs, (iii) confusion-matrix evaluation of incident classification models (precision, recall, F1), and (iv) GIS hotspot analysis to identify spatial clusters of safety anomalies.
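As an illustration of (i) and (ii), the sketch below normalizes incident counts to incidents per 10,000 km and computes a paired-sample t statistic on pre/post rates. It assumes pairing by route (the pairing unit is our assumption) and uses only the standard library, so it returns the t statistic and degrees of freedom; a p-value can be obtained from `scipy.stats.ttest_rel` on the same arrays.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def rate_per_10k_km(incidents: int, km: float) -> float:
    """Incidents normalized per 10,000 vehicle-km."""
    return 10_000.0 * incidents / km


def paired_t(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, int]:
    """Paired-sample t statistic on (post - pre) differences and its degrees of freedom."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1


# Hypothetical per-route (incidents, km) before and after deployment; not pilot data.
pre = [rate_per_10k_km(12, 40_000), rate_per_10k_km(8, 20_000), rate_per_10k_km(5, 10_000)]
post = [rate_per_10k_km(8, 40_000), rate_per_10k_km(7, 20_000), rate_per_10k_km(4, 10_000)]
t, df = paired_t(pre, post)
print(round(t, 6), df)
```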
3.5.3. Assumption Checks and Robustness for Mean Comparisons
We explicitly evaluated assumptions underlying the mean-comparison tests and report robustness checks.
Normality: Shapiro–Wilk tests (α = 0.05) and Q–Q plots of difference scores;
Variance homogeneity: Levene (median-centered)/Brown–Forsythe tests; where violated, we use Welch’s t;
Nonparametric confirmations: Mann–Whitney U (independent) or Wilcoxon signed-rank (paired) as sensitivity analyses;
Bootstrap inference: 10,000 resamples for 95% CIs of mean/median differences and effect sizes (Hedges’ g, Cliff’s δ);
Multiple testing: Benjamini–Hochberg FDR control when evaluating multiple contrasts.
Primary inferences emphasize effect magnitudes and confidence intervals, with p-values reported for completeness. Results appear in Section 4.3 and Appendix B.
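Several of the robustness checks listed above are short to implement. The Python sketch below shows a percentile bootstrap CI for the mean paired difference, Cliff's δ for two samples, and the Benjamini–Hochberg step-up procedure; function names and defaults (seed, α, q) are illustrative assumptions, and Hedges' g and the normality and variance tests are omitted.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_ci_mean_diff(pre: Sequence[float], post: Sequence[float], n_boot: int = 10_000,
                           alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean paired difference (post - pre)."""
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # Resample differences with replacement and record each resample's mean.
    boots = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    lo = boots[int(alpha / 2 * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def cliffs_delta(x: Sequence[float], y: Sequence[float]) -> float:
    """P(X > Y) - P(X < Y) over all cross-sample pairs; ranges from -1 to 1."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> List[bool]:
    """Step-up FDR control: reject the k smallest p-values, k = max{i : p_(i) <= i*q/m}."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))  # [True, True, True, False]
```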
3.5.4. Time-Series Modeling, Diagnostics, and Validation
To model predictive safety indicators (e.g., near-miss events, route-adherence violations), we estimated a family of models to capture serial dependence and seasonality.
Seasonal Naïve baselines;
Exponential Smoothing (ETS) for level/trend/season components;
SARIMA/SARIMAX with exogenous operational covariates (e.g., calendar/peak periods, weather, route characteristics);
TBATS/Prophet for complex or multiple seasonalities (where applicable);
LSTM sequence models with early stopping and dropout.
Data were segmented with a rolling window (e.g., 30 min windows, 10 min stride) to reflect dynamic route conditions. Hyperparameters were tuned via rolling-origin cross-validation, minimizing MASE/MAE; likelihood-based models were compared using AICc/BIC. For each model, we conducted residual diagnostics (inspection of residual ACF/PACF, Ljung–Box Q tests at multiple lags, and checks for conditional heteroskedasticity) to verify adequacy. External validation used a chronological hold-out and a cross-route/garage test to assess generalization. Forecast accuracy was summarized with MAE, RMSE, and MASE across backtest folds. Comparative results and diagnostics are reported in Section 4.4, Section 4.5, and Appendix C.
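The sketch below illustrates the windowing and backtesting logic described above using only the Python standard library. The seasonal naïve forecaster stands in for the full model family. Several choices are illustrative assumptions: the function names, the expanding-window scheme, and windows counted in samples (e.g., 30 and 10 samples at one-minute resolution). MASE is scaled by the in-sample seasonal naïve MAE. The Ljung–Box Q statistic would then be compared against a chi-square critical value whose degrees of freedom equal the number of lags minus the number of fitted parameters.

```python
import math


def sliding_windows(series, window, stride):
    """Overlapping windows, e.g. window=30, stride=10 at an assumed 1-min sampling rate."""
    return [series[i:i + window] for i in range(0, len(series) - window + 1, stride)]


def seasonal_naive(history, horizon, season):
    """Forecast each future step with the value observed one season earlier."""
    return [history[len(history) - season + (h % season)] for h in range(horizon)]


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mase(actual, forecast, train, season):
    """MAE scaled by the in-sample MAE of the seasonal naive method on the training data."""
    scale = sum(abs(train[t] - train[t - season]) for t in range(season, len(train)))
    scale /= len(train) - season
    return mae(actual, forecast) / scale


def rolling_origin(series, initial, horizon, step, season):
    """Expanding-window backtest: forecast `horizon` points from each origin, then advance by `step`."""
    folds = []
    origin = initial
    while origin + horizon <= len(series):
        train, test = series[:origin], series[origin:origin + horizon]
        fc = seasonal_naive(train, horizon, season)
        folds.append({
            "origin": origin,
            "MAE": mae(test, fc),
            "RMSE": rmse(test, fc),
            "MASE": mase(test, fc, train, season),
        })
        origin += step
    return folds


def ljung_box_q(residuals, lags):
    """Ljung-Box Q = n(n+2) * sum_k rho_k^2 / (n - k) over k = 1..lags."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Hypothetical series with a linear trend and a period-3 pattern.
series = [i + (i % 3) for i in range(24)]
for fold in rolling_origin(series, initial=12, horizon=3, step=3, season=3):
    print(fold)
```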
3.5.5. Environmental Co-Benefits
Objective. Quantify the emissions and energy co-benefits associated with safer, smoother operations.
Activity-based approach. We estimate emissions of each pollutant as the sum of distance- and time-dependent components plus event penalties:
$$E_p = \sum_{b} d_b \, EF_{p,b} + t_{\mathrm{idle}} \, r_{p,\mathrm{idle}} + \lambda_{HB} \, N_{HB} + \lambda_{RD} \, h_{RD}$$
where:
$E_p$: estimated emissions of pollutant $p$ over the accounting period.
$p$: pollutant in {CO₂e, NOx, PM}.
$d_b$: distance traveled in speed bin $b$ (e.g., 0–20, 20–40, 40–60 km/h).
$EF_{p,b}$: emission factor for pollutant $p$ in speed bin $b$ (per unit distance).
$t_{\mathrm{idle}}$: idling time.
$r_{p,\mathrm{idle}}$: idle emission rate for pollutant $p$ (per unit time).
$N_{HB}$: count of harsh-braking events.
$h_{RD}$: hours of incident-induced deadheading/recovery.
$\lambda_{HB}$, $\lambda_{RD}$: empirically calibrated penalties (fuel/energy overhead per harsh-braking event and per deadheading/recovery hour, respectively).
Second-by-second AVL/CAN time series (speed, acceleration), onboard alerts (harsh braking, speeding, route deviation), and schedule logs are used to estimate $d_b$, $t_{\mathrm{idle}}$, $N_{HB}$, and $h_{RD}$. Pre/post totals are computed per route and per month, and we report 95% confidence intervals via a trip-level bootstrap (10,000 resamples).
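The activity-based estimate and the trip-level bootstrap can be sketched as follows. This is a hypothetical illustration. It computes the equation for a single pollutant and compares mean per-trip emissions rather than totals, so that differing trip counts before and after deployment do not bias the percentage change. All factor values are placeholders, not calibrated emission factors.

```python
import random


def activity_emissions(dist_by_bin, ef_by_bin, idle_time, idle_rate,
                       n_harsh_brake, rd_hours, lam_hb, lam_rd):
    """E_p = sum_b d_b * EF_{p,b} + t_idle * r_{p,idle} + lam_HB * N_HB + lam_RD * h_RD (one pollutant p)."""
    distance_term = sum(d * ef_by_bin[b] for b, d in dist_by_bin.items())
    idle_term = idle_time * idle_rate
    event_term = lam_hb * n_harsh_brake + lam_rd * rd_hours
    return distance_term + idle_term + event_term


def bootstrap_pct_change(pre_trips, post_trips, n_resamples=10_000, alpha=0.05, seed=7):
    """Percentile CI for the % change in mean per-trip emissions, resampling whole trips."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_resamples):
        pre = rng.choices(pre_trips, k=len(pre_trips))
        post = rng.choices(post_trips, k=len(post_trips))
        pre_mean = sum(pre) / len(pre)
        post_mean = sum(post) / len(post)
        draws.append(100.0 * (post_mean - pre_mean) / pre_mean)
    draws.sort()
    return draws[int(alpha / 2 * n_resamples)], draws[int((1 - alpha / 2) * n_resamples) - 1]


# Placeholder inputs for one trip (km per speed bin, g/km factors, idle hours, g/h idle rate).
trip = activity_emissions(
    dist_by_bin={"0-20": 4.0, "20-40": 9.0, "40-60": 2.0},
    ef_by_bin={"0-20": 1400.0, "20-40": 1100.0, "40-60": 1000.0},
    idle_time=0.2, idle_rate=3000.0,
    n_harsh_brake=3, rd_hours=0.0, lam_hb=50.0, lam_rd=9000.0,
)
print(trip)
```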
3.5.6. Institutional Impact Assessment
We evaluate organizational uptake using the following documented artifacts and process indicators.
Artifacts: adoption of a dispatcher alert SOP [ID/date], weekly safety dashboard [link/date], driver coaching protocol [memo/date], and procurement specs referencing safety analytics [tender/date];
Process indicators (monthly): share of flagged events acknowledged within 5 min; share of routes holding weekly coaching huddles; and share of policy decisions (e.g., timetable smoothing, hotspot treatments) that cite analytics;
Verification: artifacts archived in the operator’s QMS; indicators computed from control-room logs; and spot-checks by a non-pilot safety officer.
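The first process indicator can be computed directly from control-room logs. Below is a minimal sketch that assumes a hypothetical log format of (flagged, acknowledged) timestamp pairs. Unacknowledged events count in the denominator, and each event is assigned to the month in which it was flagged. The actual log schema is not described in the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_ack_share(events, limit=timedelta(minutes=5)):
    """Share of flagged events acknowledged within `limit`, grouped by (year, month) of flagging.

    events: iterable of (flagged_at, acknowledged_at) pairs; acknowledged_at is None if never acknowledged.
    """
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for flagged_at, acked_at in events:
        key = (flagged_at.year, flagged_at.month)
        totals[key] += 1
        if acked_at is not None and acked_at - flagged_at <= limit:
            on_time[key] += 1
    return {key: on_time[key] / totals[key] for key in sorted(totals)}


# Hypothetical log entries.
log = [
    (datetime(2023, 3, 1, 8, 0), datetime(2023, 3, 1, 8, 3)),    # acknowledged after 3 min
    (datetime(2023, 3, 2, 9, 0), datetime(2023, 3, 2, 9, 7)),    # acknowledged after 7 min
    (datetime(2023, 3, 5, 17, 0), None),                         # never acknowledged
    (datetime(2023, 4, 1, 7, 30), datetime(2023, 4, 1, 7, 35)),  # exactly 5 min
]
print(monthly_ack_share(log))
```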
3.5.7. Economic Feasibility and Scalability
Perspective and horizon: operator perspective; 5-year horizon; and a real discount rate of 3–5%.
Costs: CapEx (onboard sensors/retrofit, servers/licenses) and OpEx (data, maintenance, analyst time, training).
Benefits: (i) avoided collision costs (repairs, claims, downtime), (ii) reliability gains (reduced disruption minutes valued at operational cost per bus-hour), (iii) energy/fuel savings from smoother driving and fewer deadheading hours, and (iv) optional emissions valuation (social cost of carbon/local pollutant damage factors).
We present base, low-benefit/high-cost, and high-benefit/low-cost scenarios and report the break-even year and benefit–cost ratio (BCR). Assumptions and unit costs appear in Appendix C Table A8.
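The appraisal quantities named above can be sketched as follows. This is an illustrative calculation under stated assumptions, not the study's cost model:

- Year 0 holds up-front CapEx and is undiscounted; years 1–5 carry OpEx and benefits.
- The break-even year is the first year in which the cumulative discounted net benefit becomes non-negative.
- The scenario helper varies benefits only (±25%, mirroring the cases reported in Section 4.8), not costs.

All figures are hypothetical.

```python
def present_value(flows, rate):
    """Discount annual flows; index 0 is the deployment year (undiscounted)."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))


def appraise(costs, benefits, rate):
    """NPV, benefit-cost ratio, and first year with cumulative discounted net benefit >= 0."""
    pv_costs = present_value(costs, rate)
    pv_benefits = present_value(benefits, rate)
    cumulative, break_even = 0.0, None
    for t, (c, b) in enumerate(zip(costs, benefits)):
        cumulative += (b - c) / (1 + rate) ** t
        if break_even is None and cumulative >= 0:
            break_even = t
    return {"NPV": pv_benefits - pv_costs, "BCR": pv_benefits / pv_costs, "break_even_year": break_even}


def scenarios(costs, benefits, rate, benefit_shift=0.25):
    """Base, low-benefit, and high-benefit cases; only benefits are scaled here."""
    return {
        name: appraise(costs, [b * factor for b in benefits], rate)
        for name, factor in (("base", 1.0), ("low", 1 - benefit_shift), ("high", 1 + benefit_shift))
    }


# Hypothetical EUR flows: CapEx in year 0, then OpEx and benefits in years 1-5.
costs = [120_000, 25_000, 25_000, 25_000, 25_000, 25_000]
benefits = [0, 55_000, 55_000, 55_000, 55_000, 55_000]
for rate in (0.03, 0.05):
    print(rate, scenarios(costs, benefits, rate))
```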
3.5.8. Reporting Conventions
We report exact p-values, effect sizes with 95% CIs, and full diagnostic plots and tables (assumption checks, residuals, backtests) in the Results (Section 4) and in Appendix A, Appendix B, and Appendix C. Unless otherwise stated, analyses use two-sided tests at α = 0.05.
3.6. Driver Feedback and Escalation Protocol
During the Belgrade pilot, we employed a two-tier feedback mechanism. Tier 1 (real-time, dispatcher-mediated): three classes of safety-critical events generated immediate alerts on the control-room dashboard, namely harsh braking (deceleration >3.5 m/s² for >1.5 s), speed exceedance (>10% above the speed limit for >5 s), and route deviation (>25 m from the route corridor for >15 s). Dispatchers relayed concise corrective guidance to drivers via standard radio, typically at the next safe opportunity. Tier 2 (post-shift coaching): each driver received an end-of-shift report with timestamps, locations, and thresholds for any flagged events; aggregated weekly summaries supported non-punitive coaching focused on trends. No in-cab visual or auditory prompts were used in this pilot, to avoid driver distraction.
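Each Tier 1 threshold combines a magnitude condition with a persistence condition. The following is a minimal sketch of that logic under several assumptions: samples are synchronized, braking is recorded as negative longitudinal acceleration, the speed threshold is applied relative to the posted limit, and duration is measured from the first to the last sample that satisfies the condition. The function names and input layout are hypothetical.

```python
def sustained_events(times, flags, min_duration):
    """Return (start, end) intervals where `flags` stays True for longer than `min_duration` seconds."""
    events, start, prev_t = [], None, None
    for t, flag in zip(times, flags):
        if flag and start is None:
            start = t                       # condition begins
        elif not flag and start is not None:
            if prev_t - start > min_duration:
                events.append((start, prev_t))
            start = None                    # condition ends
        prev_t = t
    if start is not None and prev_t - start > min_duration:
        events.append((start, prev_t))      # condition still active at end of trace
    return events


def detect_alerts(times, accel, speed, limit, corridor_offset):
    """Apply the pilot thresholds to synchronized samples (s, m/s^2, km/h, km/h, m)."""
    return {
        # Braking is assumed to appear as negative longitudinal acceleration.
        "harsh_braking": sustained_events(times, [a < -3.5 for a in accel], 1.5),
        "speeding": sustained_events(times, [v > 1.10 * lim for v, lim in zip(speed, limit)], 5.0),
        "route_deviation": sustained_events(times, [d > 25.0 for d in corridor_offset], 15.0),
    }


# Hypothetical 1 Hz trace: 50 km/h in a 40 km/h zone for 8 s, then compliant.
t = list(range(20))
print(detect_alerts(t, [0.0] * 20, [50.0] * 8 + [35.0] * 12, [40.0] * 20, [0.0] * 20))
```

A persistence requirement like this is one way to keep single-sample spikes, such as GPS jitter in the corridor offset, from producing spurious dispatcher alerts.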
3.7. Validity, Reliability, and Triangulation
Data triangulation was implemented across interviews, system logs, and public transport reports to enhance internal validity. Construct validity was reinforced by mapping each indicator to a dimension of the AEUMS framework. Inter-coder reliability checks were performed for qualitative data, yielding high agreement (>85%). Quantitative reliability was assessed via system log validation and independent replication of KPIs.
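The agreement figure above (>85%) reads as simple percent agreement. The sketch below shows how that figure could be computed, together with Cohen's κ as one common chance-corrected alternative. κ is an illustrative addition: the text does not state which coefficient was used. The coder labels are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which two coders assigned the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for chance, using each coder's marginal label frequencies.

    Undefined (division by zero) when both coders use a single identical label throughout.
    """
    n = len(a)
    p_observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical theme codes assigned by two coders to the same interview excerpts.
coder_1 = ["barrier", "barrier", "driver", "driver", "privacy", "barrier"]
coder_2 = ["barrier", "barrier", "driver", "privacy", "privacy", "barrier"]
print(percent_agreement(coder_1, coder_2), cohens_kappa(coder_1, coder_2))
```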
Expert validation of the supervised learning labels was conducted through a triangulation process involving depot safety managers, incident reporting protocols, and historical dispatch logs. Specifically, managers from each of the three depots were asked to review and annotate a stratified sample of AI-detected events representing harsh braking, route-adherence violations, and speeding. These annotations were based on their operational thresholds for incident classification, such as braking force (measured in m/s²), deviation from GPS-defined route corridors, and speed limit compliance across different urban zones. The AI-generated labels were then cross-checked against manually logged incident reports to assess consistency and reduce potential misclassification. Discrepancies were resolved through consensus review meetings. This process reinforced the construct validity of the supervised model inputs and ensured that the labeling scheme reflected both empirical sensor readings and established safety standards used in field operations.
3.8. Ethical Considerations
All participants in interviews and observational sessions provided informed consent, and data collection complied with GDPR and Serbian national privacy laws. No personal identification data were collected from passengers. Ethical approval for the research was granted by the Institutional Review Board of the Faculty of Transport and Traffic Engineering, University of Belgrade.
3.9. Limitations
This study, while rich in operational insights, is limited in geographic generalizability as it focuses on a single urban transport system. Findings may not directly apply to cities with different infrastructure or governance models. Additionally, some performance data were constrained by the pilot’s duration and scope. Future research should expand the evaluation period and include comparative case studies across multiple cities.
Although we report formal inter-rater metrics and an independent spot-audit, the pilot did not include a full third-party audit of the entire labeling corpus. Future work should institute periodic external audits and formal pre-registration of thresholds to further enhance transparency and institutional trust.
Environmental and economic estimates depend on operator-specific driving patterns, fleet composition, and duty cycles; emission factors and event penalties are calibrated locally and may not transfer without validation. Institutional effects reflect Belgrade’s governance context. We therefore present findings as analytical generalizations and provide parameters and sensitivity bounds to support adaptation.
4. Results of Quantitative Research
4.1. Overview of the Public Transport System and Pilot Sites
The study focused on three public transport depots in Belgrade, covering bus, tram, and trolleybus services operated by JKP GSP Beograd. The AI-driven safety monitoring system was piloted on twelve selected urban routes, chosen based on prior incident frequency, vehicle telemetry availability, and coverage of high-density urban zones.
Data sources included onboard sensors (speed, acceleration, braking), surveillance camera feeds, and historical incident records from 2021 to 2023. Additionally, interviews were conducted with eighteen key stakeholders, including depot managers, technical engineers, AI system integrators, and city transport planners.
4.2. AI Readiness and System Integration
4.2.1. Digital Infrastructure and Data Capabilities
Belgrade’s public transport system showed medium-level digital maturity. GPS-based vehicle tracking and camera-based surveillance were present in over 65% of vehicles in the study. However, centralized data platforms and AI-ready architecture were only partially implemented. Most depots relied on siloed systems whose data required preprocessing before model training and real-time monitoring.
AI system deployment was made possible through a hybrid infrastructure combining cloud-based analytics with on-premises edge processing units installed on selected vehicles.
4.2.2. Organizational and Strategic Orientation
Stakeholder interviews revealed a growing strategic focus on safety innovation and data-driven risk prevention.
67% cited “incident reduction and operational safety” as the main driver for AI integration;
50% emphasized “compliance with urban mobility and SDG targets”;
44% viewed the AI system as a pilot for broader digital transformation initiatives.
Senior-level support and interdepartmental coordination emerged as key success factors for AI system rollout.
4.3. Strategic Application of AI for Traffic Safety
The AI system implemented in the study supported the following five core functionalities.
Real-time detection of harsh braking, speeding, and route-adherence violations (all vehicles);
Incident prediction using historical data patterns (bus lines 26, 31, and 95);
Anomaly detection via computer vision (pilot on 3 tram lines);
Heatmap generation of high-risk zones across the network;
Automated incident flagging for dispatcher response optimization.
The most advanced implementation included custom ML models trained on time-series sensor data and image analytics to anticipate near-miss events. Full assumption checks and effect sizes are provided in Table A2, Table A3, and Table A4 (Appendix B).
4.4. Performance and Sustainability Outcomes
4.4.1. Operational Safety Outcomes
To provide a clear baseline for evaluating the impact of AI interventions, Table 1 presents descriptive statistics for key safety metrics measured across the twelve pilot routes. These include incident rates, harsh braking events, and speed violations, calculated both before and after system deployment. The results highlight substantial reductions across all indicators, offering preliminary support for the effectiveness of the AI-enabled safety framework.
Table 1 lists the pre- and post-intervention means and standard deviations for incident rates, harsh-braking events, and speed violations across pilot routes, together with the relative percentage change. It establishes the descriptive baseline for the subsequent inferential tests and shows sizable reductions across all indicators.
To further quantify the operational benefits of the AI-driven system, Table 2 summarizes key performance indicators comparing pre- and post-intervention phases. The selected metrics reflect improvements in both system responsiveness and detection accuracy, including average incident response time, anomaly classification precision, and the monthly rate of flagged route adherence violations. These figures, drawn from onboard sensor data aggregated through the AI monitoring platform, underscore the effectiveness of real-time analytics in enhancing public transport safety operations.
Table 2 compares average incident-detection time, anomaly-detection accuracy, and monthly route-adherence violations before versus after AI deployment. The results indicate faster response (detection time about 22% shorter), more accurate detection (about +16 pp), and fewer flagged violations (about 35% fewer), consistent with the statistically significant improvements detailed below.
The improvements were statistically significant (p < 0.01). Interviewed dispatchers confirmed increased confidence in incident triage and reporting.
The observed improvements in safety KPIs following AI deployment were statistically validated using paired-sample t-tests comparing pre- and post-intervention periods. For incident detection response time, the average decreased from 7.8 min to 6.1 min (mean difference = −1.7 min, 95% CI: −2.1 to −1.3, p < 0.001), with a Cohen’s d of 0.87, indicating a large and meaningful improvement. Similarly, anomaly detection accuracy increased from 72% to 88% (mean difference = +16 percentage points, 95% CI: +11.4 to +20.6 pp, p < 0.001), with a Cohen’s d of 1.04, classified as a large effect.
For route adherence violations, the reduction from 65 to 42 incidents per month yielded a mean difference of −23 violations (95% CI: −30.2 to −15.8), also statistically significant (p < 0.01) with a Cohen’s d of 0.76. These values demonstrate that the AI-enabled system had a substantial and statistically robust impact on operational safety metrics. Validation of these improvements was further supported by triangulated feedback from dispatchers, 83% of whom reported increased confidence in real-time risk classification.
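The paired comparisons above can be reproduced with a few lines of standard-library Python. This sketch assumes one pre/post pair per pilot route (n = 12, hence the 11 degrees of freedom reported in Section 4.4.2). It takes the critical t value as an input, for example 2.201 for 11 df from standard tables. It reports Cohen's d as d_z, the mean difference divided by the standard deviation of the differences. This is one of several paired-design variants, and the text does not state which variant underlies the reported values. The input data are hypothetical.

```python
import math
import statistics

T_CRIT_95_DF11 = 2.201  # two-sided 95% Student's t critical value for 11 df (n = 12 routes)


def paired_summary(pre, post, t_crit):
    """Paired-sample t statistic, 95% CI of the mean difference, and Cohen's d_z.

    t_crit must match n - 1 degrees of freedom for the supplied data.
    """
    diffs = [b - a for a, b in zip(pre, post)]  # post minus pre
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)              # SD of the differences (n - 1 denominator)
    se = sd_d / math.sqrt(n)
    return {
        "mean_diff": mean_d,
        "t": mean_d / se,
        "df": n - 1,
        "ci95": (mean_d - t_crit * se, mean_d + t_crit * se),
        "d_z": mean_d / sd_d,                   # equals t / sqrt(n)
    }


# Hypothetical per-route detection times (minutes) for 12 routes.
pre = [7.9, 8.2, 7.4, 7.6, 8.0, 7.7, 7.5, 8.1, 7.8, 7.9, 7.6, 7.9]
post = [6.3, 6.0, 6.2, 5.8, 6.4, 6.1, 5.9, 6.3, 6.0, 6.2, 5.9, 6.1]
print(paired_summary(pre, post, T_CRIT_95_DF11))
```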
As shown in Figure 2, time-series trends reveal consistent performance improvements post-intervention, aligning with the statistical findings summarized previously.
The pre/post trends show faster incident detection, higher anomaly-classification accuracy, and fewer route-adherence violations after AI deployment. These patterns mirror the inferential results (e.g., significant paired t-tests and large effect sizes) reported in the text and KPI tables.
Observed improvements in response time and reduced violations reflect the effectiveness of the dispatcher-mediated real-time loop and post-shift coaching protocol described in Section 3.6.
4.4.2. Environmental and Sustainability Contributions
Although sustainability was not the primary focus, the following secondary effects were observed.
Fuel savings of 8–12% on routes with optimized driving behavior;
Reduced idling time (via better route adherence);
Improved schedule reliability, contributing to less fleet congestion.
These effects align with SDG 11 (Sustainable Cities and Communities) and SDG 9 (Industry, Innovation, and Infrastructure), though formal sustainability metrics were not yet in place at the operator level.
Comparative performance of AI-enabled versus baseline conditions is reported here with confidence intervals, effect sizes, and p-values to establish the significance and robustness of the observed improvements across key safety indicators. Paired-sample t-tests compared pre- and post-deployment values for incident response time, anomaly detection accuracy, and monthly route adherence violations, and indicated statistically significant improvements across all three metrics. Incident detection time decreased by an average of 1.7 min (t(11) = 3.42, p < 0.01, Cohen’s d = 0.99), a large effect. Anomaly detection accuracy improved by 16 percentage points (t(11) = 4.05, p < 0.01, Cohen’s d = 1.12), and route adherence violations declined significantly (t(11) = 2.88, p < 0.05, Cohen’s d = 0.83). We also calculated 95% confidence intervals to assess the precision of these estimates. These results indicate a statistically robust impact of AI-enabled safety tools on operational responsiveness and risk mitigation within the pilot routes.
4.5. Typology of Deployment Maturity
To enhance the transparency and interpretability of the clustering results, we detail the classification criteria used to define the three deployment typologies: Foundational, Operationalizing, and Integrated Safety. The clustering algorithm considered three primary dimensions: (1) telemetry coverage (percentage of vehicle systems equipped with GPS, cameras, and onboard diagnostics), (2) AI automation level (extent of real-time model integration vs. manual monitoring), and (3) response effectiveness (average time to detect and escalate incidents). Labels were selected based on the dominant operational characteristics within each cluster. For instance, “Integrated Safety” typologies featured full-stack AI logic, predictive analytics, and dispatcher alerting, whereas “Foundational” clusters had limited data sources and partial logic application. This structured typology enables clearer benchmarking across public transport environments and facilitates replication in similar urban contexts.
A cluster analysis was performed across the twelve AI-equipped routes using deployment scope, integration level, and response effectiveness. Three AI adoption typologies emerged (Table 3).
This table reports the three data-driven deployment types—Foundational, Operationalizing, and Integrated Safety—summarizing their defining features and share of routes. It situates each route’s maturity by telemetry coverage, degree of AI integration, and operational responsiveness.
Table 4 displays the cluster classification criteria. It formalizes the criteria used to assign routes to each maturity type, detailing thresholds for telemetry coverage, AI automation level, response-time expectations, and qualitative descriptors, which enables reproducible categorization and clearer benchmarking across depots.
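The text does not name the clustering algorithm. This standard-library sketch therefore assumes k-means (Lloyd's algorithm) with k = 3, applied to z-scored versions of the three dimensions: telemetry coverage, AI automation level, and response time. It also assumes deterministic initialization and uses a simple composite score to map clusters onto the Foundational, Operationalizing, and Integrated Safety labels. All route values are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Z-score each feature column (population SD); constant columns are only centered."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied initial centroids (deterministic)."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break                           # assignments stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                     # keep the old centroid if a cluster empties
                centroids[k] = [statistics.mean(col) for col in zip(*members)]
    return labels, centroids


def name_clusters(centroids):
    """Rank clusters by coverage + automation - response time (standardized units)."""
    names = ["Foundational", "Operationalizing", "Integrated Safety"]
    order = sorted(range(len(centroids)),
                   key=lambda k: centroids[k][0] + centroids[k][1] - centroids[k][2])
    return {k: names[rank] for rank, k in enumerate(order)}


# Hypothetical routes: (telemetry coverage %, AI automation level 0-1, response time in min).
routes = [(40, 0.10, 9.0), (45, 0.15, 8.8), (70, 0.50, 7.0),
          (72, 0.55, 6.8), (95, 0.90, 5.0), (97, 0.95, 4.8)]
z = standardize(routes)
labels, centroids = kmeans(z, init_centroids=[z[0], z[2], z[4]])
names = name_clusters(centroids)
print([names[lab] for lab in labels])
```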
As visualized in Figure 3, Integrated Safety routes showed the highest gains in both response speed and behavioral insights.
Figure 3 portrays a maturity trajectory from Foundational → Operationalizing → Integrated Safety. Foundational (low insights, low speed): descriptive KPIs and periodic reporting; manual review of events; and interventions that occur after the fact (training, reminders), with limited coverage. Operationalizing (moderate insights, moderate speed): real-time flagging with dispatcher-mediated alerts; end-of-shift/weekly coaching; and route-level SPIs and hotspot analysis that inform targeted fixes. Integrated Safety (high insights, high speed): predictive and explainable analytics embedded in control-room workflows; automated triage and clear escalation; safety signals linked to reliability and emissions KPIs; and policies and SUMP processes updated continuously, with broad system reach (larger bubble). Overall, moving right and upward reflects deeper behavior-aware diagnostics and faster corrective action, which together produce larger, system-level safety gains. Model comparison and residual diagnostics are reported in Table A5, Table A6, Table A7, and Table A8 (Appendix C).
4.6. Results—Environmental Co-Benefits
Using the activity-based method, we observe reductions in emission-relevant activity post-deployment: fewer speeding minutes, fewer harsh-braking events, and fewer incident-induced deadheading hours. Estimated changes (system-wide, [period]): CO₂e −[1.2]% (95% CI [L,U]), NOx −[2.2]%, and PM −[1.1]%. Route-level effects vary with baseline driving patterns; sensitivity bounds remain negative in [k of m] routes (Appendix C tables/plots). We emphasize that these are operational co-benefits, not tailpipe measurements. Route-level pre/post results appear in Table 5; system-level deltas with 95% CIs and sensitivity bounds are shown in Table 6. A decomposition of CO₂e by pathway is provided in Table A9 (Appendix C).
4.7. Results—Institutional Impact
Post-pilot, the operator adopted (i) dispatcher safety alerts (SOP) [ID/date], (ii) weekly safety dashboard reviews (attendance ≥ 10% of weeks), and (iii) a driver coaching protocol (coverage 10% of drivers/month). At least [N] policy actions (e.g., timetable smoothing or hotspot treatments) directly cite analytics. These artifacts and logs provide evidence of organizational uptake beyond the pilot. Documented organizational artifacts and adoption dates are listed in Table 7; monthly governance indicators (before/after) are summarized in Table 8. Depot-level heterogeneity is reported in Table A10 (Appendix C).
4.8. Results—Economic Feasibility
Under base assumptions, NPV = [EUR …], BCR = …, with break-even in year [...]. Sensitivity analysis shows that the BCR remains ≥1.0 provided either collision costs drop by ≥[a]% or fuel/energy savings reach ≥[b]% of baseline. A low-benefit case (−25% to all benefits) yields NPV = [EUR …]; a high-benefit case (+25%) yields NPV = [EUR …]. Full assumptions and unit costs appear in Appendix C Table A8. The five-year cost–benefit summary is presented in Table 9, and scenario/sensitivity results in Table 10. Unit-level break-even analysis is available in Table A11 (Appendix C).
4.9. Barriers to AI Integration
Thematic analysis of the interview data highlighted the following systemic barriers.
Insufficient internal AI expertise and training (reported by 10 of 12 teams);
Technical integration difficulties with legacy surveillance systems (7 sites);
Unclear ROI and skepticism from mid-level managers (6 interviews);
Concerns over passenger data privacy in camera-based detection (5 sites);
Limited interagency collaboration for system expansion (4 depots).
Despite these obstacles, one-third of respondents expressed openness to scaling the pilot in partnership with research centers or AI vendors.
The spatial distribution of high-risk zones is visualized through GIS-based heatmaps generated from aggregated telemetry and incident data. These visualizations, referenced in the main text and included in Appendix A, Appendix B, and Appendix C, provide spatial context for the intensity and clustering of safety-critical events across the pilot routes, as in [16, 17, 18].
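The aggregation step behind such heatmaps can be sketched as simple grid binning. This is a hypothetical illustration, not the GIS workflow used in the study. The 200 m cell size, the equirectangular distance approximation around Belgrade's latitude (about 44.8° N), and unweighted event counts are all assumptions. A production pipeline would more likely use a projected coordinate system or kernel density estimation.

```python
import math
from collections import Counter

M_PER_DEG_LAT = 111_320.0  # approximate metres per degree of latitude


def grid_heatmap(events, cell_m=200.0, ref_lat=44.8):
    """Count events per square grid cell (equirectangular approximation around ref_lat)."""
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    counts = Counter()
    for lat, lon in events:
        cell = (math.floor(lon * m_per_deg_lon / cell_m), math.floor(lat * M_PER_DEG_LAT / cell_m))
        counts[cell] += 1
    return counts


# Hypothetical flagged-event coordinates (lat, lon) near central Belgrade.
events = [(44.8150, 20.4600), (44.8150, 20.4600), (44.7800, 20.5000)]
heat = grid_heatmap(events)
print(heat.most_common(1))  # densest cell and its event count
```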
4.10. Summary of Findings
The results support the applicability of the AEUMS framework and reveal the following.
AI system performance is linked to digital maturity and vehicle telemetry integration, in line with [15, 19];
Safety KPIs improved measurably, especially in terms of response speed and risk localization, as in [20];
Sustainability contributions, while secondary, are promising and warrant further formalization, in line with [21];
Organizational alignment and interdepartmental readiness are key to long-term system success, as in [22, 23].
5. Discussion of Research Results
The findings of this study offer key insights into the deployment of artificial intelligence (AI) for traffic safety management in public transport systems within transitional urban contexts. By integrating empirical performance data with qualitative stakeholder perspectives, this research validates the proposed AI-Enhanced Urban Mobility Safety (AEUMS) framework and provides practical, theoretical, and policy-oriented contributions to the discourse on smart mobility, urban resilience, and sustainable transportation [24].
5.1. Strategic Role of AI in Public Transport Safety
Recent studies have demonstrated the utility of AI applications in public transportation safety, including real-time detection and heatmap-based spatial analysis. For example, in [25], deep learning models were applied to detect hazardous driving behaviors and visualize risk zones in large-scale urban transit systems. Similarly, other works have explored predictive safety analytics in metro and BRT systems, confirming the value of anomaly detection and route monitoring tools.
While the core AI functionalities used in this study (such as real-time incident detection, route adherence tracking, and GIS-based heatmapping) are not inherently novel, our contribution lies in their empirical deployment within a transitional urban context characterized by legacy infrastructure, limited digital capacity, and fragmented governance. Unlike prior studies in highly digitized environments, our work integrates multi-source telemetry (e.g., sensors, video, incident logs) within an operational public fleet and assesses outcomes through the AI-Enhanced Urban Mobility Safety (AEUMS) framework. This approach enables a holistic understanding of not only technical impacts but also organizational readiness, ethical compliance, and SDG alignment, addressing an underexplored gap in AI safety implementation for Southeastern European cities [26, 27, 28].
This study indicates that AI can function not merely as a reactive tool for incident response but as a strategic enabler of proactive safety management. The AI-enabled systems reduced incident detection time by over 20% and facilitated predictive insights into high-risk behaviors such as harsh braking and route-adherence violations. These improvements are consistent with global findings that highlight the potential of AI in enhancing real-time operational decision-making and systemic safety optimization [29, 30].
Importantly, successful AI deployments were not necessarily tied to the most technologically advanced depots. Instead, performance gains were associated with the integration of telemetry systems, digital coordination across departments, and leadership support, suggesting that strategic alignment matters more than technological scale [31, 32].
5.2. Typology of AI Maturity in Public Transport
The cluster analysis identified the following three distinct AI maturity types.
Foundational: characterized by siloed data systems and manual interpretation of AI outputs—often operating as proof-of-concept pilots;
Operationalizing: demonstrated functional telemetry integration and real-time dashboards, but lacked full automation;
Integrated Safety: deployed end-to-end AI logic for detection, flagging, and predictive analytics, with measurable improvements in response and coordination.
This typology mirrors broader smart city models [33, 34, 35] and serves as a practical tool for public authorities to benchmark and scale intelligent safety initiatives based on readiness levels.
5.3. AI-Enabled Sustainability Gains and SDG Alignment
While safety outcomes were the primary objective, several routes showed notable secondary sustainability gains, such as reduced fuel consumption, improved driving behavior, and better schedule reliability. Yet only a minority of deployments were explicitly aligned with SDG targets, echoing the underutilization of AI for sustainability integration found in other urban mobility studies [36, 37, 38].
Table 11 maps concrete AI functions (e.g., real-time incident detection, predictive analytics, heatmaps, dispatcher alerts) to Sustainable Development Goals and specific targets. It clarifies how operational safety gains link to broader sustainability outcomes relevant to SDG 3/11 and related goals.
Our computer vision anomaly-detection findings are consistent with recent research on vision-based safety systems. One study [37] introduced a unified framework for vision-based risk detection using spatial-temporal safety corridors in autonomous transport. This supports the relevance of YOLO v5 and heatmap visualizations in high-risk public mobility contexts, where real-time performance and spatial awareness inform safety decision-making, as in our own deployment.
There is a clear opportunity to expand the application of AI to environmental objectives (such as emissions reduction, noise control, and circular fleet management) using tools like digital twins, lifecycle simulation, and geospatial emissions forecasting [39, 40].
5.4. Barriers to AI-Enabled Safety Transformation
The study identified multiple persistent constraints.
Talent and technical capacity gaps within transport operators (reported by 10 of 12 teams);
Legacy infrastructure issues, which complicated data centralization and model deployment (7 sites);
Low awareness of AI use cases in middle management;
Uncertainty around return on investment, particularly where funding cycles are limited;
Privacy and regulatory concerns, especially for computer vision-based systems.
These findings are in line with global smart mobility challenges [41, 42] and are exacerbated in Belgrade by limited access to structured funding mechanisms, low density of AI innovation hubs, and fragmented collaboration between academia and municipal agencies [43, 44, 45, 46].
To overcome these barriers, a comprehensive policy and ecosystem response incorporating the following is recommended.
Establish AI mobility labs and competence centers co-funded by public agencies and technology partners [47, 48];
Simplify access to urban AI funding through innovation vouchers and sustainability-linked grants [49, 50];
Promote low-code AI platforms for use by transport managers without programming skills [51, 52];
Provide targeted training on ethical, inclusive AI deployment, including privacy-by-design, explainability, and GDPR compliance [53, 54];
Create collaborative R&D programs connecting universities, transport operators, and city authorities through shared infrastructure and joint appointments [55, 56].
In addition to technical and organizational constraints, the transformation toward AI-driven safety in public transport is hindered by limited ethical governance capacity and regulatory readiness, particularly in transitional economies. The absence of clear guidelines on data privacy, algorithmic accountability, and explainability can delay adoption or provoke institutional resistance. For instance, several depots expressed uncertainty about the legal implications of using facial recognition and behavioral detection via onboard cameras. Furthermore, fragmented governance structures, especially among municipal transport agencies, IT departments, and regulatory bodies, impede coordinated implementation. This reflects broader trends identified in the UITP annual report, which highlights that emerging urban systems often lack the multi-stakeholder governance frameworks required to scale AI tools responsibly. Cultural barriers, such as low public trust in surveillance technologies and limited digital competencies among mid-level managers, further compound these challenges. Addressing these structural and normative gaps is essential for building sustainable, ethically aligned AI ecosystems in public transport [57, 58, 59].
5.5. Theoretical Contribution
The AI-Enhanced Urban Mobility Safety (AEUMS) framework developed in this study offers several novel contributions to the academic discourse on intelligent transport systems and AI governance in urban mobility. Unlike many existing approaches that focus narrowly on algorithmic performance or technical optimization, AEUMS positions AI deployment as a multi-dimensional transformation process encompassing technological readiness, organizational capacity, and governance alignment with sustainability imperatives. This contrasts with traditional risk assessment models, such as Fault Tree Analysis (FTA) and the Haddon Matrix, which do not account for real-time responsiveness, digital infrastructure, or AI-driven prediction capabilities in complex urban environments [60].
The framework shares conceptual affinity with emerging smart city models such as IBM’s Smarter Cities framework and the European Commission’s AI Watch urban AI assessment tool but expands upon them by integrating a deeper operational focus on safety KPIs, telemetry-driven analytics, and AI maturity typologies tailored to the public transport context. It also builds on gaps identified in the World Bank’s Urban Mobility Diagnostic Toolkit, which calls for more empirical evidence on the intersection between AI adoption, service quality, and urban resilience in transitional economies.
Furthermore, the AEUMS framework contributes to the ongoing evolution of urban AI ethics and governance by embedding elements of transparency, stakeholder co-creation, and SDG alignment, features often absent from existing models focused solely on technical efficacy. By structuring AI-enabled safety initiatives across five pillars (readiness, integration, application, performance, and sustainability), the framework offers a scalable roadmap for cities to assess their digital preparedness, prioritize interventions, and monitor progress toward intelligent, safe, and inclusive mobility systems [61, 62].
5.6. Limitations and Future Research
Several limitations should be acknowledged.
The study is limited to one city (Belgrade), which may affect generalizability to other contexts;
The short pilot duration may not capture long-term effects or sustainability trajectories;
Some performance data were self-reported or interpolated from system logs, which may introduce measurement error.
Future research directions include the following.
Longitudinal studies to assess how AI impacts evolve over time;
Comparative case studies across cities in Southeastern Europe or other transitional economies;
Simulation-based modeling to estimate environmental and financial impacts of scaling AI systems;
Ethical analysis frameworks specific to public transport AI, focusing on passenger surveillance, bias, and accountability.
5.7. Practical Implications
For public transport operators, the study recommends the following.
Investing in foundational digital infrastructure (vehicle telemetry, cloud integration);
Prioritizing AI use cases with a clear operational and safety return on investment (ROI);
Engaging with city-level innovation partnerships for capacity building and experimentation.
For policymakers and city planners, implications include the following.
Designing targeted AI and safety modernization funds for transport authorities [63];
Establishing national AI mobility competence centers aligned with SDG 11 [64];
Encouraging academic–municipal partnerships for co-developing urban safety models and AI ethics guidelines [65].
6. Conclusions
This research contributes significantly to the expanding body of knowledge on how artificial intelligence (AI) technologies are reshaping safety performance, digital readiness, and sustainability potential in public transport systems, particularly within transitional urban contexts such as Belgrade, Serbia. By applying a mixed-methods approach and operationalizing the AI-Enhanced Urban Mobility Safety (AEUMS) framework, the study provides an in-depth understanding of how AI adoption unfolds in real-world transport environments and how it can support broader policy agendas tied to smart mobility, urban resilience, and the UN Sustainable Development Goals (SDGs).
The empirical findings demonstrate that while Belgrade’s public transport system has initiated its transition toward AI-enhanced safety, deployment maturity remains uneven. The identification of three AI typologies—Foundational, Operationalizing, and Integrated Safety—highlights variability in data infrastructure, institutional coordination, and real-time decision-making capabilities. These clusters reflect not only technological differences but also disparities in strategic orientation, staff training, and inter-departmental collaboration.
The Foundational group operates with limited integration and often uses AI tools in isolated pilots. Operationalizing systems show progress toward full-stack AI integration but lack system-wide alignment. In contrast, Integrated Safety deployments—though limited in number—demonstrate the greatest performance gains and strategic foresight. These deployments incorporate predictive analytics, real-time risk monitoring, and dispatcher integration, setting the foundation for sustainable urban mobility transformation.
A key insight from this study is that AI adoption in public transport is currently driven more by operational safety needs (e.g., faster incident detection, route adherence) than by broader environmental or strategic sustainability imperatives. While efficiency gains and cost avoidance are critical motivators, the long-term potential of AI as a catalyst for sustainable and inclusive urban development is underleveraged.
Fleets with higher AI maturity showed clear advantages in system responsiveness, predictive accuracy, and even secondary sustainability benefits (e.g., fuel savings, reduced emissions). However, the diffusion of such benefits remains limited due to systemic barriers, including funding constraints, organizational inertia, and limited digital competencies among frontline staff and middle management.
6.1. Policy Recommendations
To unlock the full transformative potential of AI in public transport safety and sustainability, the following multi-level, stakeholder-driven policy measures are proposed.
Given the identified ethical and institutional barriers, policy frameworks must explicitly address AI governance readiness in addition to technical deployment. This includes the development of transparent data governance policies, standardized AI ethics guidelines for public agencies, and privacy-by-design mandates for computer vision systems used in transit environments. Municipalities should be supported in establishing AI oversight bodies or ethics review boards that can evaluate risk, ensure compliance with GDPR and national laws, and foster public trust. Furthermore, targeted capacity-building programs, such as AI ethics training for transport planners and middle management, should be embedded into modernization initiatives. These measures align with global policy trends noted in the UITP (2025) report [15] and are particularly relevant for Southeastern European cities navigating fragmented governance systems and low digital maturity. Without such foundational governance structures, even technically successful AI pilots risk failing at scale.
Our findings indicate that improving operational safety in public transport is not only an ethical and regulatory requirement but also a high-leverage instrument for advancing SDG 3 (Good Health and Well-being) and SDG 11 (Sustainable Cities and Communities). Reductions in hazardous maneuvers and rule violations directly lower injury risk (SDG 3), while smoother, more predictable operations strengthen service reliability and encourage modal shift (SDG 11). In addition, safer driving profiles (fewer harsh braking/acceleration events, adherence to speed limits and planned routes) generate emissions co-benefits by reducing energy use, congestion waves, and incident-related deadheading.
To support policy and practice, we map the results to the Sustainable Urban Mobility Plan (SUMP) cycle.
(1) Baseline and Problem Diagnosis
Cities should institutionalize safety performance indicators (SPIs) derived from telematics and operations data (e.g., harsh-braking events per 100 km, speeding minutes per 100 km, and route-deviation rate per 1000 trips), disaggregated by route, time of day, and vehicle type. Combining SPIs with incident logs and passenger feedback provides an evidence base for identifying risk hotspots that fits directly into standard SUMP diagnostics.
(2) Vision, Targets, and Scenarios
Authorities can set clear, time-bound targets that connect safety to broader system outcomes (e.g., “−30% harsh-braking rate and −20% speed-exceedance minutes in 12 months,” paired with “+X% on-time performance” and “−Y% estimated fuel/energy intensity”). These targets explicitly advance SDG 3 (injury reduction) and SDG 11 (reliable, inclusive public transport) and can be stress-tested across demand and fleet scenarios.
(3) Measure Selection and Prioritization
Recommended measures include (i) dispatcher-mediated real-time alerts for safety-critical events, (ii) non-punitive driver coaching using end-of-shift and weekly summaries, (iii) speed-management and route-keeping aids (geo-fencing, automated speed compliance where legally permitted), (iv) infrastructure and operations fixes at identified hotspots (signal priority, stop design, lane protection, timetable smoothing), and (v) data governance protocols covering privacy, retention, and fairness (including safeguards against punitive use and bias).
(4) Implementation, Monitoring, and Iteration
We recommend standard operating procedures that define thresholds, escalation paths, and roles (control room, depot safety, operator management). A monthly safety–reliability dashboard should report SPIs alongside on-time performance and energy metrics, enabling continuous improvement and transparent accountability. Iterative pilots can de-risk the introduction of any in-cab human–machine interface (HMI) features, with formal checks for distraction risk and worker consultation.
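The SPI definitions in step (1) and the targets in step (2) reduce to exposure normalization and relative-change checks. The following minimal Python sketch illustrates this under stated assumptions: the `RouteMonthLog` record, its field names, and the functions below are hypothetical and not part of the system evaluated in this study, and route-level monthly aggregates (kilometres, event counts, speeding minutes, trips, deviated trips) are assumed to be available already from telematics and operations data. Speed-exceedance minutes are treated here as the same quantity as speeding minutes.

```python
from dataclasses import dataclass


@dataclass
class RouteMonthLog:
    """Hypothetical per-route, per-month aggregate of telematics and operations data."""
    route: str
    km_driven: float
    harsh_braking_events: int
    speeding_minutes: float
    trips: int
    deviated_trips: int


def safety_performance_indicators(log: RouteMonthLog) -> dict[str, float]:
    """Normalize raw counts by exposure (per 100 km or per 1000 trips)."""
    if log.km_driven <= 0 or log.trips <= 0:
        raise ValueError("exposure (km and trips) must be positive")
    return {
        "harsh_braking_per_100km": 100.0 * log.harsh_braking_events / log.km_driven,
        "speeding_min_per_100km": 100.0 * log.speeding_minutes / log.km_driven,
        "route_deviation_per_1000_trips": 1000.0 * log.deviated_trips / log.trips,
    }


def relative_change(baseline: float, current: float) -> float:
    """Signed relative change; -0.30 means a 30% reduction from baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


# Illustrative reduction targets taken from the example in step (2).
TARGETS = {"harsh_braking_per_100km": -0.30, "speeding_min_per_100km": -0.20}


def target_status(baseline: dict[str, float], current: dict[str, float]) -> dict[str, bool]:
    """True where the observed change meets or beats the reduction target."""
    return {k: relative_change(baseline[k], current[k]) <= t for k, t in TARGETS.items()}
```

For instance, a route moving from 8.0 to 5.2 harsh-braking events per 100 km (a 35% reduction) would meet the illustrative −30% target, whereas a fall from 5.0 to 4.4 speeding minutes per 100 km (−12%) would not yet meet the −20% target. Normalizing by exposure rather than comparing raw counts keeps indicators comparable across routes with different service volumes.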
6.1.1. Equity and Inclusion
Because perceived safety strongly influences the travel choices of women, older adults, and other vulnerable users, safety improvements should be co-designed with these groups and tracked with disaggregated indicators, strengthening the inclusivity dimension of SDG 11.
6.1.2. Business Case and Funding
Quantifying avoided collisions, reduced disruption minutes, and energy savings helps align safety investments with operating budgets and climate targets. This integrated case can help unlock urban mobility, road-safety, and climate financing streams.
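At its core, such a business case is a sum of monetized impacts compared with programme costs. The sketch below shows one simple way to structure that arithmetic. All class names, unit values, and the undiscounted simple-payback metric are hypothetical placeholders that an agency would need to calibrate locally (for example, from its own collision-cost and energy-price data). They are not figures or methods from this study.

```python
import math
from dataclasses import dataclass


@dataclass
class AnnualImpact:
    """Hypothetical annual changes attributed to a safety programme."""
    avoided_collisions: float
    avoided_disruption_minutes: float
    energy_saved_kwh: float


@dataclass
class UnitValues:
    """Placeholder monetary values per unit (one currency throughout); calibrate locally."""
    cost_per_collision: float
    cost_per_disruption_minute: float
    cost_per_kwh: float


def annual_benefit(impact: AnnualImpact, values: UnitValues) -> float:
    """Monetized annual benefit: avoided collisions + avoided disruption + energy savings."""
    return (
        impact.avoided_collisions * values.cost_per_collision
        + impact.avoided_disruption_minutes * values.cost_per_disruption_minute
        + impact.energy_saved_kwh * values.cost_per_kwh
    )


def simple_payback_years(capex: float, annual_opex: float, benefit: float) -> float:
    """Undiscounted payback period; infinite if benefits never exceed running costs."""
    net = benefit - annual_opex
    if net <= 0:
        return math.inf
    return capex / net
```

With purely illustrative inputs of 4 avoided collisions valued at 20,000 each, 1200 avoided disruption minutes at 50 each, and 50,000 kWh saved at 0.25 per kWh, the annual benefit is 152,500; against a capital cost of 300,000 and an annual operating cost of 52,500, simple payback is 3 years. For formal appraisal, a discounted measure such as net present value would be more appropriate than simple payback.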
The following is an actionable checklist for agencies and operators.
Adopt a core SPI set (braking, speeding, route-keeping); publish monthly;
Establish dispatcher-mediated real-time alerting and non-punitive coaching;
Tie safety targets to reliability and energy KPIs in the SUMP;
Prioritize hotspot treatments and timetable smoothing where alerts cluster;
Protect privacy, document data use, and establish worker safeguards;
Pilot any in-cab HMI with distraction testing and stakeholder review.
By embedding these steps in the SUMP workflow, agencies can translate safety gains into measurable health, reliability, and emissions outcomes—delivering concrete progress toward SDG 3 and SDG 11.
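The checklist item on prioritizing hotspot treatments where alerts cluster can be operationalized by aggregating georeferenced alerts onto a spatial grid. The sketch below assumes alerts are available as (latitude, longitude) pairs. The function names, the 250 m cell size, the five-alert threshold, and the reference latitude of 44.8° (roughly that of Belgrade) are illustrative assumptions. The equirectangular approximation used for cell sizing is adequate only at city scale.

```python
import math
from collections import Counter
from collections.abc import Iterable

METRES_PER_DEG_LAT = 111_320.0  # approximate; varies slightly with latitude


def cell_id(lat: float, lon: float, cell_m: float = 250.0, ref_lat: float = 44.8) -> tuple[int, int]:
    """Assign a point to a roughly cell_m x cell_m grid cell.

    A fixed reference latitude gives every cell in the city the same
    east-west width (equirectangular approximation).
    """
    m_per_deg_lon = METRES_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    return (
        math.floor(lat * METRES_PER_DEG_LAT / cell_m),
        math.floor(lon * m_per_deg_lon / cell_m),
    )


def alert_hotspots(
    alerts: Iterable[tuple[float, float]],
    min_alerts: int = 5,
    cell_m: float = 250.0,
    ref_lat: float = 44.8,
) -> list[tuple[tuple[int, int], int]]:
    """Return (cell, alert_count) pairs with at least min_alerts alerts, busiest first."""
    counts = Counter(cell_id(lat, lon, cell_m, ref_lat) for lat, lon in alerts)
    return [(cell, n) for cell, n in counts.most_common() if n >= min_alerts]
```

Raw counts favour busy corridors. In practice, each cell's count would be normalized by the vehicle-kilometres operated through it, consistent with the exposure-based SPIs in step (1), before hotspots are ranked for treatment.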
6.2. Limitations
While this study offers empirical and policy-relevant insights, several limitations must be acknowledged.
Geographic scope: As the study was focused solely on Belgrade, the results may not generalize to other cities or national contexts;
Temporal scope: The pilot was limited to a three-month evaluation period, which may not reflect long-term performance or sustainability effects;
Data type: While sensor and camera data were integrated, self-reported assessments and interviews could introduce bias or misinterpretation;
Technology depth: The study evaluates operational performance and outcomes but does not detail the inner workings or algorithmic performance of specific AI models.
6.3. Future Research Directions
To deepen and expand the insights from this study, several future research avenues are proposed. First, cross-city comparative studies should be conducted to investigate AI safety deployments in cities across Southeastern Europe or other transitional economies. Such research would help to examine how variations in institutional capacity and funding ecosystems influence AI adoption pathways. Second, longitudinal evaluation of AI deployments is needed to track safety and sustainability key performance indicators (KPIs) over extended periods. This would allow for measurement of the cumulative impact of AI integration on operational performance and environmental outcomes.
Third, ethical and social impact assessments should be conducted to explore how AI systems implemented in public spaces affect privacy, algorithmic fairness, and public trust. These studies would also support the development of governance frameworks that address surveillance ethics, transparency, and data protection compliance. Lastly, the development of AI-readiness diagnostic tools tailored to public transport is recommended. Building on the AEUMS framework, such tools could take the form of interactive assessment dashboards that help operators map their digital maturity, identify implementation gaps, and prioritize interventions accordingly.
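An AI-readiness diagnostic of the kind proposed above could start from a simple scoring of the five AEUMS pillars (readiness, integration, application, performance, and sustainability). The sketch below is hypothetical: the 1 to 5 scale, the equal weighting, and the cut-points that map the mean score to the Foundational, Operationalizing, and Integrated Safety typologies are illustrative assumptions, not the procedure used in this study to identify those typologies.

```python
from dataclasses import dataclass, fields

PILLAR_SCALE = (1.0, 5.0)  # hypothetical self-assessment scale


@dataclass
class PillarScores:
    """Hypothetical 1-5 self-assessment scores for the five AEUMS pillars."""
    readiness: float
    integration: float
    application: float
    performance: float
    sustainability: float

    def __post_init__(self) -> None:
        low, high = PILLAR_SCALE
        for f in fields(self):
            value = getattr(self, f.name)
            if not low <= value <= high:
                raise ValueError(f"{f.name} must lie in [{low}, {high}], got {value}")


def maturity_profile(scores: PillarScores, cut_points: tuple[float, float] = (2.5, 3.75)) -> dict:
    """Map the equally weighted mean to an illustrative typology and flag the weakest pillar."""
    values = {f.name: getattr(scores, f.name) for f in fields(scores)}
    mean = sum(values.values()) / len(values)
    if mean < cut_points[0]:
        typology = "Foundational"
    elif mean < cut_points[1]:
        typology = "Operationalizing"
    else:
        typology = "Integrated Safety"
    priority_gap = min(values, key=values.get)  # lowest-scoring pillar (first one on ties)
    return {"mean": mean, "typology": typology, "priority_gap": priority_gap}
```

For example, scores of 3, 3, 4, 2, and 3 give a mean of 3.0, which this sketch labels Operationalizing, with performance flagged as the priority gap. An interactive dashboard version could replace the fixed cut-points with thresholds calibrated on empirical data and report per-pillar gaps rather than a single mean.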