Data Quality in Traffic Management: Framework and Real-World Impacts

Petkani, Viktoria; Tzanis, Dimitris; Mitsakis, Evangelos; Mintsis, Evangelos; Vlahogianni, Eleni I.

doi:10.3390/futuretransp6030124


by Viktoria Petkani ¹,*, Dimitris Tzanis ¹, Evangelos Mitsakis ¹, Evangelos Mintsis ¹ and Eleni I. Vlahogianni ²

¹ Centre for Research and Technology Hellas (CERTH), Hellenic Institute of Transport, 57001 Thessaloniki, Greece
² Department of Transportation Planning and Engineering, School of Civil Engineering, National Technical University of Athens (NTUA), 15773 Athens, Greece
* Author to whom correspondence should be addressed.

Future Transp. 2026, 6(3), 124; https://doi.org/10.3390/futuretransp6030124

Submission received: 15 April 2026 / Revised: 28 May 2026 / Accepted: 5 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Intelligent Transportation Systems and Traffic Management in Urban Networks)


Abstract

Effective traffic management relies on the availability of high-quality traffic data to support real-time decision-making for optimizing traffic flow, enhancing safety, and reducing environmental impacts. This study aims to address the lack of integrated and operational approaches for traffic data quality management by proposing a scalable and adaptable framework for the systematic assessment and enhancement of traffic data. The framework consists of four interconnected layers, including data ingestion, data quality assessment, data imputation and correction, and a real-time alerting mechanism. Its applicability is demonstrated through a real-world case study on traffic signal control plan selection, using sensitivity and simulation-based analyses in SUMO. The results indicate that degraded data quality, particularly due to missing or invalid records, can significantly affect system behavior, leading to suboptimal decisions and reduced traffic performance. These findings highlight the importance of continuous and systematic data quality monitoring as a critical component for reliable and efficient traffic management systems.

Keywords: data quality; traffic management; sensitivity analysis; microsimulation

1. Introduction

In the era of big data, massive volumes of information are generated, collected, and analyzed daily. These data are used across a wide range of Intelligent Transportation System (ITS) applications, including traffic management systems, traveler information services, freight and logistics, connected and autonomous vehicles (CAVs), electronic pricing and parking payment systems, environmental and sustainability applications, as well as vehicle safety systems [1]. However, the increasing volume and heterogeneity of traffic data are also accompanied by various data-related errors and inconsistencies. These issues may originate from the data collection solutions themselves, such as sensor malfunctions or calibration issues, as well as from technological limitations, including communication and transmission problems [2,3]. Consequently, it is not sufficient to simply collect large amounts of traffic data; equal emphasis must also be placed on ensuring and continuously assessing data quality, since the reliability of any ITS application ultimately depends on the quality of its underlying datasets.

More specifically, within the domain of traffic management, high-quality data constitute the foundation for effective descriptive, predictive, and prescriptive analytics [4]. Traffic data support a broad range of applications, such as traffic simulations, traffic forecasting, congestion monitoring, traffic signal optimization, travel time estimation, strategic planning, and real-time traffic management operations. Since these applications heavily rely on data-driven and AI-based methodologies, poor-quality input data may significantly compromise the accuracy and reliability of analytical outcomes and operational decisions, following the well-known principle of “garbage in, garbage out” [5,6].

The concept of data quality in traffic operations is not new. It began to emerge in the United States as early as the 1970s, when the first studies on the reliability of traffic detectors began to appear. Although the term “data quality” was not explicitly used at that time, related expressions such as “detector reliability”, “errors of traffic detectors”, and “flagged values” [7,8,9] were commonly employed, reflecting core aspects of what we now understand as traffic detector data quality.

The term itself gained widespread recognition in 2002, when the Federal Highway Administration (FHWA) commissioned three organizations—Battelle, the Texas Transportation Institute, and Cambridge Systematics—to develop an action plan aimed at supporting stakeholders in addressing traffic data quality issues. This initiative resulted in the publication of three white papers [10,11,12]. Among them, Defining and Measuring Traffic Data Quality by Shawn Turner became a widely cited source and remains a key reference in the field. Turner provided a foundational definition, stating that “data quality is the fitness of data for all purposes that require it”, and identified six core dimensions of traffic data quality: accuracy, completeness, validity, timeliness, coverage, and accessibility. These dimensions have since become the cornerstone of subsequent research in the field, although formal definitions and quantification methods were introduced only in later studies, most notably in the Traffic Data Quality Measurement report published in 2004 [13].

In recent years, research on traffic data quality has expanded significantly, focusing either on further reviewing [14] and refining its theoretical foundations or on introducing more practical approaches through the application of data quality metrics in real-world scenarios and the exploration of improvement techniques [15,16,17]. However, these efforts are often constrained to limited or static historical datasets [18,19,20], rather than addressing the dynamic challenges associated with on-demand and real-time data streams [21]. Although some studies utilize data derived from real-world measurements [22], they rarely integrate their approaches into operational traffic management or decision-making systems. As a result, their ability to demonstrate measurable impacts in real-world environments remains limited, highlighting the need for more comprehensive and system-oriented approaches to traffic data quality management.

In response to the above-identified limitations, this paper makes the following contributions:

The development of an operational and integrated traffic data quality management framework capable of supporting on-demand and real-time traffic management applications.
Bridging the gap between theoretical traffic data quality research and operational traffic management by embedding data quality processes directly into real-world decision-making workflows.
The introduction of an application-aware and multi-level quality assessment methodology, supporting different spatial granularities, temporal resolutions, and task-specific quality thresholds.
The design of a flexible validity assessment methodology capable of supporting statistical, temporal, spatial, multivariate, and AI-based traffic data validation techniques.
A real-world causal evaluation of how degraded traffic data quality influences traffic signal control decisions and leads to incorrect control plan activations.
The quantification of the operational impacts of data quality degradation through SUMO-based microscopic simulation, demonstrating measurable effects on travel time, waiting time, and environmental indicators.
The establishment of a foundation for resource-aware traffic data quality management, highlighting the trade-off between data quality improvement and operational or computational costs in real-time traffic management systems.

The remainder of the paper is structured as follows. Section 2 presents the architecture of the proposed framework, which was developed within the SYNCHROMODE EU-funded research project [23]. Section 3 introduces the real-world case study and presents the sensitivity analysis on traffic signal control decisions under different data quality scenarios. Section 4 extends this analysis through a simulation-based evaluation using SUMO, assessing the operational impacts on traffic performance. Finally, Section 5 and Section 6 conclude the paper by summarizing the key findings and discussing implications for future research and deployment in data-driven traffic operations.

2. Traffic Data Quality Framework

This section provides a comprehensive overview of the proposed traffic data quality framework.

As illustrated in Figure 1, the framework consists of four interconnected components designed to work seamlessly together to assess and maintain data quality in real time, at scale, and across diverse data sources, spatial resolutions, and temporal levels. The core components of the framework include the Data Ingestion Layer, the Data Quality Assessment Layer, the Data Quality Imputation/Correction Layer, and the Alerting System.

2.1. Data Ingestion Layer

The Data Ingestion Layer constitutes the initial phase of the proposed framework and is responsible for collecting and importing data from a wide variety of external sources into the system for subsequent processing and storage. It is designed to handle heterogeneous traffic-related data sources, including loop detectors, Bluetooth sensors, floating car data (FCD), public transport operational data, weather information, and incident reports. These data sources can be accessed and integrated through various ingestion technologies, including relational databases (e.g., PostgreSQL, MySQL), APIs from third-party traffic data services, streaming platforms (e.g., Kafka), as well as flat files and cloud-based storage formats (e.g., CSV, Parquet).

In addition to data acquisition, this layer also performs essential transformation tasks such as parsing, format normalization, basic validation, and schema alignment. These operations ensure that data, regardless of its original structure or format, is brought into a consistent, process-ready state. By combining ingestion and transformation, this layer provides a unified entry point that prepares data for reliable and efficient downstream analysis.

2.2. Data Quality Assessment Layer

The Data Quality Assessment Layer is the core component of the framework, responsible for conducting a comprehensive evaluation of incoming data. It applies the six core data quality dimensions defined in “Traffic Data Quality Measurement: Final Report” [13] and presented in Table 1: accuracy, completeness, validity, timeliness, coverage, and accessibility. Although the layer currently uses these core metrics, it is designed to scale to additional dimensions, which modern traffic management systems need when operating on heterogeneous multi-source data [24]. Metrics are applied according to the type and nature of the data, since not every metric is appropriate or useful for every dataset.

It is worth noting that certain dimensions are more straightforward to compute. For example, completeness can often be determined using a simple mathematical formula, while accuracy can be assessed through measurable differences between observed and ground truth data. However, validity is difficult to quantify, as it involves logical criteria and contextual considerations. Various studies have applied different techniques to detect invalid traffic data, particularly in the context of loop detectors [18,25,26,27,28]. To address these challenges, our framework proposes an advanced approach to assessing validity, categorized into the eight types of checks shown in Table 2.
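As an illustration of the two more easily computed dimensions, the sketch below computes completeness as the share of expected observations that are present, and accuracy as a mean absolute percentage error against a reference series. The exact formulas of Table 1 are not reproduced in this section, so both definitions, as well as the function names `completeness_pct` and `mean_abs_pct_error`, are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def completeness_pct(values: Sequence[Optional[float]], expected_count: int) -> float:
    """Percentage of expected observations that are present (not None).

    Assumed definition: available / expected * 100.
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    available = sum(1 for v in values if v is not None)
    return 100.0 * available / expected_count


def mean_abs_pct_error(observed: Sequence[Optional[float]],
                       reference: Sequence[float]) -> float:
    """Mean absolute percentage error of observed values against a reference.

    Assumed accuracy measure; pairs with a missing observation or a zero
    reference value are skipped because the ratio is undefined for them.
    """
    pairs = [(o, r) for o, r in zip(observed, reference) if o is not None and r != 0]
    if not pairs:
        raise ValueError("no comparable observation/reference pairs")
    return 100.0 * sum(abs(o - r) / abs(r) for o, r in pairs) / len(pairs)


# Hypothetical example values, not taken from the case study.
print(completeness_pct([10.0, None, 12.0, 11.0], expected_count=4))
print(mean_abs_pct_error([110.0, 90.0], [100.0, 100.0]))
```

Validity has no comparable closed form, which is why it is handled through the rule-based checks of Table 2.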

Assessments are performed across multiple spatial and temporal levels. The level of assessment can be defined by the user according to the specific task and the intended use of the data. For example, real-time traffic management applications typically require fine-grained assessments at the detector or link level using short temporal aggregation windows (e.g., 1–5 min), while network performance evaluation or strategic planning may rely on corridor- or network-level indicators aggregated hourly or daily. This adaptive approach enables the framework to support a broad range of traffic management applications.

To ensure that the assessed data quality remains suitable for the intended application, each metric is evaluated against predefined thresholds tailored by the user according to the requirements of the corresponding use case. If a metric does not satisfy its defined threshold, the system can automatically initiate appropriate actions through the integrated alerting mechanism described below.

2.3. Data Quality Imputation and Correction Layer

The Data Quality Imputation and Correction Layer is activated based on the outcomes of the assessment process. When specific quality metrics, such as completeness, fall below the acceptable thresholds defined for a given task or application, this layer intervenes to enhance the reliability and usability of the data.

The framework supports two complementary imputation strategies, which can be used independently or in combination depending on the characteristics of the detected data quality issue.

The first strategy leverages external third-party traffic data sources to fill in missing or unreliable measurements. This approach is primarily applied in cases involving extended missing periods in the data stream. For example, when traffic measurements are collected at one-minute intervals and an entire continuous period (e.g., five hours) is missing, third-party data sources are preferred, as AI-based reconstruction methods tend to exhibit reduced accuracy under such conditions. These external sources may include floating car data, neighboring detectors, Bluetooth sensors, or other traffic-related data sources.

The second strategy employs a Spatiotemporal Graph Convolutional Network (ST-GCN) reconstruction approach, developed according to the methodologies presented in studies [29,30,31,32]. This strategy is primarily applied in cases involving smaller gaps and partial missing observations within the time series, where sufficient historical and neighboring traffic information remains available.

In this approach, the traffic network is represented as a graph structure G = (V, E), where each traffic detector corresponds to a node (V), while the spatial relationships between detectors are represented through graph edges (E). The spatial connectivity between detectors is modeled through an adjacency matrix, which is calculated based on the actual network distances between detectors using Dijkstra’s shortest path algorithm. For each node, historical traffic measurements are collected over consecutive time intervals. The ST-GCN model utilizes both the adjacency matrix and the historical temporal traffic patterns to learn spatial and temporal dependencies across the network. Based on these learned relationships, the model reconstructs missing or corrupted traffic values simultaneously across multiple nodes of the network.
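The construction of the adjacency matrix can be sketched as follows. Shortest network distances between detectors are computed with Dijkstra's algorithm, as described above; the conversion of distances into edge weights is not specified in this section, so the thresholded Gaussian kernel used here, together with the parameters `sigma` and `cutoff` and the toy network, is an assumption borrowed from common practice in graph-based traffic models.

```python
import heapq
import math
from typing import Dict, List


def dijkstra(graph: Dict[str, Dict[str, float]], source: str) -> Dict[str, float]:
    """Shortest network distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def adjacency_matrix(graph: Dict[str, Dict[str, float]], detectors: List[str],
                     sigma: float = 500.0, cutoff: float = 0.1) -> List[List[float]]:
    """Weighted adjacency between detectors from shortest-path distances.

    Assumption: weight = exp(-(d / sigma)^2), set to 0 below `cutoff`.
    """
    n = len(detectors)
    adj = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(detectors):
        dist = dijkstra(graph, src)
        for j, dst in enumerate(detectors):
            d = dist.get(dst, math.inf)
            if i != j and math.isfinite(d):
                w = math.exp(-((d / sigma) ** 2))
                adj[i][j] = w if w >= cutoff else 0.0
    return adj


# Hypothetical three-detector network with link lengths in metres.
network = {"A": {"B": 300.0}, "B": {"A": 300.0, "C": 400.0}, "C": {"B": 400.0}}
A = adjacency_matrix(network, ["A", "B", "C"])
```

With this kernel, nearby detectors receive weights close to 1 and distant ones are pruned, which keeps the matrix sparse for larger networks.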

The ST-GCN model was pretrained offline using historical traffic data. To support real-time deployment and reduce computational overhead, the ST-GCN reconstruction module is activated only when the detected missing patterns are considered suitable for graph-based reconstruction.
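The choice between the two imputation strategies can be illustrated with a simple gap-length rule. The framework's actual suitability criterion is not detailed here, so the 30 min limit (`max_model_gap_s`) and the helper names below are hypothetical; the sketch only shows how runs of missing values might be detected and routed.

```python
from typing import List, Optional, Sequence, Tuple


def missing_runs(values: Sequence[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive missing values."""
    runs, start = [], None
    for i, v in enumerate(values):
        if v is None:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(values) - start))
    return runs


def choose_strategy(run_length: int, interval_s: int = 60,
                    max_model_gap_s: int = 1800) -> str:
    """Route short gaps to the ST-GCN model and long gaps to third-party data.

    max_model_gap_s is an assumed limit, not a value reported in the study.
    """
    return "st_gcn" if run_length * interval_s <= max_model_gap_s else "third_party"


series = [12.0, None, None, 15.0, None]
plan = [(start, choose_strategy(length)) for start, length in missing_runs(series)]
```

Under these assumptions, a five-hour outage at one-minute resolution (300 consecutive missing values) would be routed to third-party data, in line with the example given for the first strategy.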

2.4. Alerting System

The Alerting System operates as a cross-layer monitoring mechanism, overseeing all core components of the data quality pipeline: Ingestion, Assessment, and Imputation/Correction. Its primary function is to ensure that the overall data quality workflow runs reliably and transparently. Within the Ingestion and Imputation layers, the alerting system tracks the operational status of processes, issuing notifications in case of failures, delays, or abnormal behavior during data collection or correction activities.

In the Assessment layer, the alerting functionality is more granular and user-configurable. Users can develop targeted alert rules based on specific data quality metrics, while the corresponding thresholds are explicitly defined by the users themselves according to the requirements and sensitivity of each application or operational task. For example, an alert can be triggered if the completeness of incoming data remains below 70% for more than one consecutive hour. This flexibility allows stakeholders to proactively monitor data health and respond to emerging issues in real time, ensuring that critical data-driven processes remain accurate and timely.
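The example rule above (completeness below 70% for more than one consecutive hour) can be sketched as a check over a time-ordered series of completeness values. The function name, the input layout, and the 6 min sampling used in the example are assumptions for illustration; the framework's own rule engine is not described at this level of detail.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def completeness_alert(samples: Iterable[Tuple[datetime, float]],
                       threshold_pct: float = 70.0,
                       min_duration: timedelta = timedelta(hours=1)) -> bool:
    """True if completeness stays below the threshold for longer than min_duration.

    `samples` must be (timestamp, completeness_pct) pairs sorted by time.
    """
    run_start = None
    for ts, value in samples:
        if value < threshold_pct:
            if run_start is None:
                run_start = ts  # a new below-threshold run begins
            if ts - run_start > min_duration:
                return True
        else:
            run_start = None  # run interrupted by a healthy sample
    return False


# Hypothetical stream: completeness of 60% reported every 6 minutes.
t0 = datetime(2025, 1, 27, 6, 0)
stream = [(t0 + timedelta(minutes=6 * i), 60.0) for i in range(12)]
print(completeness_alert(stream))
```

A single sample at or above the threshold resets the run, so only sustained degradation raises the alert.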

3. Assessing the Impact of Traffic Signal Control

To demonstrate the importance of data quality and showcase the practical value of the proposed framework, we have applied it to a real-world use case in traffic management: the activation of traffic signal control plans along a signalized corridor. As part of this analysis, we construct three distinct scenarios that represent varying levels of data quality. These scenarios are built using the components of the framework, which assess data quality across multiple dimensions, detect specific issues, and implement targeted corrections. By comparing system performance across these scenarios, we demonstrate how discrepancies in data quality can substantially influence decision-making processes.

3.1. Case Study Area

The case study focuses on a segment of K. Karamanli Street, one of the main arterial corridors in Thessaloniki. The study area consists of a combination of primary and secondary intersections, with a total of eight junctions. The primary intersections are signalized and equipped with traffic detectors (installed by YUNEX Traffic), whereas the secondary intersections are signalized but do not include detector infrastructure. The corridor exhibits significant variability in traffic patterns throughout the day, with pronounced peaks during the morning and evening hours.

The area was selected due to the availability of multiple data sources, including loop detectors, Bluetooth sensor data, and third-party floating car data (FCD), making it a suitable environment for assessing traffic data quality. Figure 2 illustrates the selected section of K. Karamanli Street, including the eight intersections, the corresponding loop detectors installed at each intersection, and the examined links associated with the primary intersections, all operated by the Traffic Management Center of the Region of Central Macedonia.

3.2. Signal Plan Activation Logic

To properly interpret the outcomes of the sensitivity analysis, it is important to understand the basic logic behind the signal plan activation system. The Traffic Management Centre has developed five predefined traffic signal control plans, each tailored to a specific traffic state and time period of the day: AM Peak (weekdays), PM Peak (weekdays), Off-Peak (weekdays), Peak (weekends), and Off-Peak (weekends). These plans were derived through a systematic analysis of historical traffic conditions along the corridor and defined in accordance with the operational requirements of the TMC. At fixed evaluation intervals of six minutes, the system processes real-time traffic data, including flow and speed measurements, and selects the most appropriate plan among the available options, based on predefined trigger thresholds.

The plan selection decision is driven by measurements collected at the primary signalized intersections along the main arterial axis, where traffic detectors are installed. Once a plan is selected, it is activated simultaneously across all intersections of the corridor, including both primary and secondary junctions. Consequently, the corridor operates under a unified coordinated signal strategy, whereby the same plan type is activated corridor-wide. For instance, when the AM Peak plan is triggered, all intersections switch to their respective AM Peak signal timings.

It is important to note that each intersection maintains its own pre-optimized phasing and timing parameters within each plan; therefore, what is shared across the corridor is the plan selection decision rather than identical signal timings. This coordinated approach ensures temporal and spatial coherence in signal operations along the corridor, while interaction effects between intersections, such as queue formation and downstream propagation of delays, are explicitly captured within the SUMO microsimulation environment described in Section 4.

Traffic measurements collected from the corresponding corridor links are continuously monitored, as described above, and the threshold conditions are evaluated simultaneously across multiple links. The off-peak plan is applied only when every monitored link satisfies its off-peak condition (volume below and speed above the link-specific thresholds); if at least one link violates it, the peak plan is activated. The symbolic pseudocode presented in Algorithm 1 summarizes the structured evaluation process used to determine whether the peak or off-peak signal plan should be applied.

Algorithm 1 Generalized Traffic Signal Plan Activation Logic

For each evaluation time step (e.g., every 6 min):
   Define set of monitored links: L={L_1,L_2,…,L_n}
   For each link i∈L:
      IF (traffic_volume_i < volume_threshold_i) AND (average_speed_i > speed_threshold_i):
         Set condition_i = TRUE
      ELSE:
         Set condition_i = FALSE
   IF ALL condition_i = TRUE (for all i∈L):
      Apply Traffic Signal Plan → Off-Peak (Plan Code: P_off_i)
   ELSE:
      Apply Traffic Signal Plan → Peak (Plan Code: P_peak_i)
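A minimal executable rendering of Algorithm 1 is sketched below. The `LinkState` container, the plan labels, and the numeric values in the example are hypothetical and do not correspond to the thresholds of Table 3; only the decision rule itself follows the pseudocode.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LinkState:
    """Latest measurements and thresholds for one monitored link (illustrative)."""
    volume: float            # traffic volume in the evaluation interval
    speed: float             # average speed in the evaluation interval
    volume_threshold: float
    speed_threshold: float


def off_peak_condition(link: LinkState) -> bool:
    """condition_i of Algorithm 1: low volume AND high speed on the link."""
    return link.volume < link.volume_threshold and link.speed > link.speed_threshold


def select_plan(links: Sequence[LinkState]) -> str:
    """Off-peak only if every link satisfies its condition, otherwise peak."""
    if not links:
        raise ValueError("at least one monitored link is required")
    return "OFF_PEAK" if all(off_peak_condition(link) for link in links) else "PEAK"


# Hypothetical measurements for two links at one evaluation step.
quiet = [LinkState(300, 45, 400, 35), LinkState(200, 50, 400, 35)]
busy = [LinkState(300, 45, 400, 35), LinkState(500, 30, 400, 35)]
print(select_plan(quiet), select_plan(busy))
```

Because a single failing link is enough to trigger the peak plan, one corrupted flow or speed value on one link can flip the corridor-wide decision, which is the mechanism examined in the sensitivity analysis.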

The exact values and logic used to define these thresholds were established in a previous study, based on the analysis of time-series data for flow, speed, and occupancy across the network links. Nevertheless, representative threshold values are summarized in Table 3 to provide an overview of the operational traffic conditions and the activation logic applied within the corridor. Different activation criteria are applied depending on the day type (typical weekday or weekend) and the corresponding time period (06:00–13:00, 13:00–23:00, and 23:00–06:00).

In Table 3, the exact threshold values used in the present study are presented. As shown, the threshold evaluation process considers only the primary signalized intersections, since the secondary intersections follow the same signal plan activation strategy as the primary intersections, as previously described.

3.3. Data Quality Scenarios

The impact of data quality on traffic signal control decisions was examined using real-world traffic data collected during a typical weekday (27 January 2025) along the K. Karamanli Street corridor. The analysis dataset was obtained from eight loop detectors installed at the primary signalized intersections of the corridor, providing flow, speed, and occupancy measurements at 90 s intervals. The analysis focuses on the morning operational period between 06:00 and 13:00, during which traffic management strategies are typically activated in response to peak demand and fluctuating traffic conditions. This resulted in an expected total of 2240 detector-level observations during the examined period. The detector-level measurements were subsequently aggregated into five corridor-level link indicators used for the traffic signal plan activation process, resulting in an expected total of 1400 link-level observations for the examined time period.
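The expected observation counts follow directly from the sampling interval and the length of the analysis window, as the short check below shows. The helper name `expected_observations` is introduced here for illustration only.

```python
from datetime import datetime


def expected_observations(start: datetime, end: datetime,
                          interval_s: int, n_series: int) -> int:
    """Observations expected from n_series sources sampled every interval_s seconds."""
    n_intervals = int((end - start).total_seconds() // interval_s)
    return n_intervals * n_series


start = datetime(2025, 1, 27, 6, 0)
end = datetime(2025, 1, 27, 13, 0)

# 7 h = 25,200 s, i.e. 280 intervals of 90 s per series.
detector_obs = expected_observations(start, end, 90, 8)  # eight loop detectors
link_obs = expected_observations(start, end, 90, 5)      # five link-level indicators
print(detector_obs, link_obs)  # 2240 1400
```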

We consider three primary data quality scenarios:

Scenario 1—Imperfect Data (Baseline Quality)

This scenario represents a common real-world situation in Thessaloniki, where data completeness is around 90%. Within the available data, approximately 5% of the values are invalid due to detector malfunctions, communication failures, or transmission issues. Consequently, roughly 10% of the expected observations are missing, while an additional 5% are considered unreliable according to the applied validity checks.

Regarding the temporal distribution of data quality issues, invalid records were observed more frequently during the peak traffic period between 08:00 and 11:00, when traffic volumes and detector activity were significantly higher. Missing values also varied across the examined time window; however, no particularly strong temporal fluctuations were identified. Concerning the spatial distribution, both missing and invalid observations were present across all examined links. Nevertheless, the highest concentration of invalid observations at the link level was associated with Link 118, mainly due to extreme and inconsistent values produced by the corresponding loop detectors installed along this section of the corridor.

The traffic signal control system operates using this imperfect dataset, making decisions under conditions that are far from ideal, yet relatively common in real-world traffic management practice.

Scenario 2—Cleaned and Corrected Data (Enhanced Quality—Ground Truth)

Here, we assume that efforts have been made to improve the dataset by applying the imputation and correction mechanisms described in Section 2.3. In this case study, missing and invalid speed measurements were corrected using available third-party Floating Car Data (FCD), while traffic flow reconstruction was performed using the pretrained ST-GCN model.

The pretrained ST-GCN model was trained offline using one year of historical traffic flow measurements collected along the study corridor under normal operating conditions. The model utilizes rolling temporal windows and neighboring detector observations to reconstruct missing or erroneous flow values. The model architecture consists of two spatiotemporal graph convolution blocks followed by a fully connected prediction layer. The dataset was divided into training, validation, and testing subsets using a 70–20–10% split, respectively.

Table 4 summarizes the main training and optimization parameters of the ST-GCN model used in this study. The resulting dataset represents the enhanced-quality version of the original measurements and is used as the ground truth reference for evaluating system performance under different data quality scenarios.

Scenario 3—Noisy Data (Low Quality)

In this case, we intentionally introduce random noise into the baseline dataset, gradually increasing the distortion in flow and speed measurements. Specifically, we applied noise levels of 5%, 10%, and 15% to examine how sensitive the system is to inaccuracies in the input data. The noise was added to the flow and speed values of the detectors using a uniform random distribution to simulate natural variations or sensor measurement errors. By gradually introducing this noise, we assess whether the system continues to make accurate decisions or begins to respond in unexpected ways. This scenario underscores how even minor and less noticeable data issues can significantly influence system behavior.
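One way to realize such a perturbation is sketched below. The text specifies a uniform distribution and the three noise levels, but not the exact noise model; the multiplicative form used here, in which each value is scaled by a factor drawn uniformly from [1 − level, 1 + level], is therefore an assumption, as are the function name and the fixed seed.

```python
import random
from typing import List, Optional, Sequence


def add_uniform_noise(values: Sequence[Optional[float]], level: float,
                      seed: Optional[int] = None) -> List[Optional[float]]:
    """Scale each value by a factor drawn uniformly from [1 - level, 1 + level].

    Missing values (None) are left untouched; results are clipped at zero
    because negative flows or speeds are not physically meaningful.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    rng = random.Random(seed)  # local generator keeps the experiment reproducible
    noisy = []
    for v in values:
        if v is None:
            noisy.append(None)
        else:
            noisy.append(max(0.0, v * (1.0 + rng.uniform(-level, level))))
    return noisy


# Hypothetical flow values perturbed at the three noise levels of Scenario 3.
flows = [100.0, 50.0, None]
scenarios = {level: add_uniform_noise(flows, level, seed=42)
             for level in (0.05, 0.10, 0.15)}
```

Values close to an activation threshold are the ones most likely to change the selected plan under this kind of perturbation.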

3.4. Sensitivity Analysis Results on Data Quality

This section presents the results of the sensitivity analysis conducted to evaluate the impact of varying data quality scenarios on traffic signal control decisions. By comparing system behavior across different conditions, from baseline to cleaned and corrected data, and up to scenarios with increasing levels of artificially induced noise, we quantify how data degradation affects the accuracy and reliability of traffic signal plan activation. Importantly, all scenarios are evaluated in comparison to Scenario 2—Cleaned and Corrected Data (Enhanced Quality—Ground Truth), which serves as the ground truth reference. This scenario represents the ideal classification performance under optimal data conditions (i.e., 100% correct labels) and thus provides the benchmark against which the performance of both the baseline and the noisy data scenarios (5%, 10%, 15% noise) is assessed.

Figure 3 illustrates the traffic signal plan selections (peak vs. off-peak) over time under different data quality scenarios. As noted earlier, the analysis focuses on the morning period from 06:00 to 13:00, which encompasses the typical morning peak. Green segments indicate minutes during which the off-peak plan was activated, while orange segments denote activation of the peak plan.

By comparing each scenario to the ground truth (Scenario 2), we observe that, in most cases, the system correctly selects the appropriate signal plan (true–true for the peak plan and false–false for the off-peak plan). However, there are also instances where incorrect decisions are made, either activating the peak plan during off-peak periods or vice versa. These correspond to two types of errors: (a) cases where the system should have activated the peak plan (true) but instead selected the off-peak plan (false), leading to a true–false condition, and (b) cases where the system should have activated the off-peak plan (false) but incorrectly triggered the peak plan (true), leading to a false–true condition.

As the level of noise increases, these errors become more frequent, particularly during the critical morning peak window between 08:00 and 10:00, indicating that the system is more sensitive to data quality issues during this period.

Figure 4 and Table 5 illustrate the system’s classification accuracy under different data quality conditions. Specifically, Figure 4 presents a bar chart showing a breakdown of classification outcomes, with each bar segmented by classification type: correct classifications (blue), misclassifications where the Peak plan was incorrectly selected instead of Off-peak (yellow), and misclassifications where the Off-peak plan was incorrectly selected instead of Peak (pink). Table 5 (the confusion matrix) complements this visualization by reporting the exact classification durations (in minutes) for each scenario. It provides a detailed breakdown of how many minutes the system correctly activated each signal plan (Peak and Off-peak), the duration of each type of misclassification, as well as the classification accuracy for each scenario, based on Equation (7) below.

\mathrm{Accuracy} = \frac{\text{correct classifications}}{\text{total classifications}} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)

where TP, FP, TN, and FN denote the true positive, false positive, true negative, and false negative classifications, respectively, measured here as durations in minutes.
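The accuracy measure can be computed from two aligned plan sequences, one from the ground truth scenario and one from the scenario under evaluation. In the sketch below the peak plan is treated as the positive class, which is an assumption, and the short sequences are hypothetical rather than values from Table 5.

```python
from typing import Sequence, Tuple


def confusion_counts(reference: Sequence[str], predicted: Sequence[str],
                     positive: str = "PEAK") -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) with `positive` as the positive class."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if pred == positive:
            if ref == positive:
                tp += 1
            else:
                fp += 1  # peak plan selected although off-peak was correct
        else:
            if ref == positive:
                fn += 1  # off-peak plan selected although peak was correct
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of correct classifications among all classifications."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no classifications to evaluate")
    return (tp + tn) / total


# Hypothetical per-minute plan selections.
truth = ["PEAK", "PEAK", "OFF_PEAK", "OFF_PEAK", "OFF_PEAK"]
noisy = ["PEAK", "OFF_PEAK", "OFF_PEAK", "PEAK", "OFF_PEAK"]
counts = confusion_counts(truth, noisy)
print(counts, accuracy(*counts))
```

If each element represents one minute of operation, the four counts correspond directly to durations in minutes.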

As shown in Table 5, neither the rate of increase in classification errors nor the decline in classification accuracy is constant across noise levels. We observe a sharper rise in errors from the baseline scenario to the one with 5% noise (9 mistakes), and from 5% to 10% noise (14 mistakes), compared to the smaller increase from 10% to 15% noise (5 mistakes). This non-linearity suggests the existence of a practical data quality threshold beyond which further degradation yields diminishing additional misclassifications, while the cumulative operational cost remains significant. Viewed from the opposite perspective, this observation highlights the importance of identifying the accuracy level at which further data quality improvement should stop, based on the trade-off between performance improvement and the computational resources required. Section 4 translates these misclassification outcomes into concrete operational impacts, quantifying the performance cost each time the system crosses the activation boundary and triggers the wrong signal plan.

4. SUMO-Based Simulation Environment

To evaluate the operational impact of misclassifications in traffic signal control decisions and their effect on overall corridor performance, a microscopic traffic simulation was developed using the Simulation of Urban MObility (SUMO) [33], an open-source microscopic traffic simulation platform widely used for modeling individual vehicle and pedestrian movements with high spatial and temporal resolution. Its extensibility and ability to support real-time control via the TraCI interface make it particularly suitable for evaluating traffic management strategies under diverse operational and data quality conditions.

4.1. Corridor Modeling and Simulation Setup

During the simulation setup, K. Karamanli Street, a two-way arterial road, was modeled, incorporating detailed geometric and control features to accurately reflect the real-world characteristics of the corridor. Specifically, the SUMO model includes eight signalized intersections, each with multiple incoming and outgoing edges, as well as general-purpose (GP) lanes and dedicated bus lanes in both directions. Signal-controlled junctions were configured with realistic phasing and timing plans, imported from actual field equipment and converted into SUMO-compatible formats. In addition, bus stops were placed according to the actual locations of public transport infrastructure along the corridor. The model supports multiple vehicle types, including passenger cars, taxis, buses, light goods vehicles (LGVs), and heavy goods vehicles (HGVs), with calibrated behavior parameters for car-following, lane-changing, and acceleration/deceleration dynamics. Public transport operations, including schedules and average passenger volumes, were integrated to reflect realistic multimodal traffic conditions. Pedestrian crossings and walking routes at major intersections were also modeled based on origin-destination data to capture realistic pedestrian flows.

To support real-time simulation, the 47 loop detectors operated by the Traffic Management Centre, along with their exact locations, were accurately integrated into the SUMO environment. These detectors provided the input values needed for traffic control decisions under the various simulation scenarios. The demand composition reflects the presence of diverse road users, including private vehicles, taxis, buses, trucks, and pedestrians. The developed model of the K. Karamanli corridor underwent a thorough calibration process, utilizing multiple data sources, such as vehicle speeds, travel times, traffic flows, and densities along the corridor, to fine-tune driver behavior and improve the model’s realism and accuracy.

The calibration of the SUMO model was performed using a two-stage approach, combining flow-based and speed-based validation. Traffic flow calibration was conducted at both the link level, using aggregated traffic count data, and at the individual detector level, using measurements from the 47 loop detectors operated by the Traffic Management Centre. Speed calibration was performed at the link level using aggregated Floating Car Data (FCD) available at one-minute temporal resolution. The goodness-of-fit of the flow calibration was assessed using the GEH statistic, a standard criterion widely adopted in traffic simulation practice. A GEH value below 5 is considered indicative of a good fit between simulated and observed flows. In the present study, more than 92% of the evaluated links and detectors satisfied this criterion, exceeding the commonly accepted threshold of 85% and confirming the adequacy of the calibrated model for the purposes of the simulation-based evaluation.
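
For reference, the GEH statistic compares a simulated hourly flow M with an observed hourly flow C as GEH = sqrt(2(M − C)² / (M + C)). The sketch below shows how the share of sites meeting the GEH < 5 criterion could be computed. The flow pairs are hypothetical values chosen for illustration and are not taken from the study's calibration data.

```python
import math


def geh(simulated, observed):
    """GEH statistic for one site, with both flows in veh/h."""
    if simulated + observed == 0:
        return 0.0  # no traffic in either source: treat as a perfect match
    return math.sqrt(2.0 * (simulated - observed) ** 2 / (simulated + observed))


def share_below(pairs, limit=5.0):
    """Fraction of (simulated, observed) pairs with GEH below the limit."""
    values = [geh(m, c) for m, c in pairs]
    return sum(v < limit for v in values) / len(values)


if __name__ == "__main__":
    # Hypothetical (simulated, observed) hourly flows for four sites.
    sites = [(820, 800), (650, 700), (900, 700), (1000, 1000)]
    for m, c in sites:
        print(f"simulated={m}, observed={c}, GEH={geh(m, c):.2f}")
    print(f"share with GEH < 5: {share_below(sites):.0%}")
```

Because GEH scales the absolute difference by the flow magnitude, the same 200 veh/h gap is penalized more on a low-volume link than on a high-volume one.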

4.2. Data Quality Scenarios in SUMO Environment

The impact of degraded data quality on traffic signal control decisions was assessed through a series of simulation scenarios developed and tested using the calibrated SUMO network of K. Karamanli Street. The base network was designed to accurately replicate traffic operations during the morning peak hour (08:00–09:00), including demand profiles, traffic compositions, and pedestrian flows. Based on this validated configuration, a set of data-driven scenarios was created.

The core logic of the simulation framework involves the dynamic selection of pre-defined traffic signal control plans. These plans are triggered every six (6) minutes, based on aggregated flow and speed values from loop detectors. Under normal conditions, accurate and complete detector data leads to the correct activation of the morning peak-hour traffic signal control plan, which is tailored to the actual traffic load. However, in cases of degraded data quality, the system may incorrectly assess traffic conditions and trigger suboptimal plans, such as those designed for off-peak periods, that do not match the real traffic demand.
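
The selection logic described above can be sketched for a single link as follows. The thresholds are taken from Table 3, but several details are assumptions made only for illustration: that the six-minute aggregate is a simple mean of the readings in the window, and that the "conditions" of Table 3 are satisfied when the mean flow is below the flow threshold and the mean speed is above the speed threshold. How decisions at individual intersections are combined along the corridor is not modeled here.

```python
from dataclasses import dataclass
from statistics import mean

OFF_PEAK = "Plan 3"       # off-peak plan (Table 3)
MORNING_PEAK = "Plan 1"   # morning peak plan (Table 3)


@dataclass(frozen=True)
class Threshold:
    flow_veh_h: float
    speed_km_h: float


# Link ID -> activation thresholds, transcribed from Table 3.
THRESHOLDS = {
    127: Threshold(800, 25),
    136: Threshold(700, 17),
    110: Threshold(700, 15),
    118: Threshold(850, 15),
    123: Threshold(800, 20),
}


def select_plan(link_id, flows_veh_h, speeds_km_h):
    """Choose a plan for one link from the readings of one six-minute window."""
    limit = THRESHOLDS[link_id]
    flow = mean(flows_veh_h)      # ASSUMPTION: aggregate = arithmetic mean
    speed = mean(speeds_km_h)
    # ASSUMPTION: "conditions satisfied" means low flow AND high speed.
    if flow < limit.flow_veh_h and speed > limit.speed_km_h:
        return OFF_PEAK
    return MORNING_PEAK


if __name__ == "__main__":
    # Hypothetical clean peak-hour readings on link 118.
    print(select_plan(118, [900, 920, 880, 910, 890, 900], [12, 11, 13, 12, 12, 12]))
    # Hypothetical corrupted readings for the same traffic state.
    print(select_plan(118, [900, 0, 0, 0, 880, 0], [12, 40, 40, 40, 13, 40]))
```

In the second call, the invalid readings pull the mean flow below 850 veh/h and push the mean speed above 15 km/h, so the sketch returns the off-peak plan even though the underlying demand is unchanged.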

To systematically explore these effects, two complementary groups of scenarios were designed, each consisting of two distinct simulation cases.

Group A—Underestimation of Demand Due to Low Data Quality

This group includes two scenarios designed to evaluate the impact of underestimating traffic demand because of low-quality input data. In both cases, the actual traffic demand corresponds to morning peak-hour conditions. However, due to degraded data quality (e.g., missing, outdated, or noisy loop detector data), the system underestimates traffic volumes and, in one scenario, erroneously activates an off-peak signal control plan. This mismatch can lead to insufficient green times at critical approaches and reduced network efficiency during congestion.

Scenario A1: Peak-hour demand with the correct peak-hour signal plan (correct).

Scenario A2: Peak-hour demand with an incorrect off-peak signal plan, triggered by misinterpreted data (wrong).

Group B—Overestimation of Demand Due to Erroneous Inputs

This group includes two scenarios aimed at assessing the effects of overestimating traffic demand caused by erroneous input data. In both scenarios, traffic demand is reduced by 40% to simulate off-peak conditions. Ideally, the control system should detect this and apply the appropriate off-peak signal plan. However, due to misclassification resulting from inaccurate or misleading detector data, the system mistakenly activates a peak-hour control strategy. This leads to excessively long green times and inefficient signal timing during periods of light traffic.

Scenario B1: Off-peak demand with the correct off-peak signal plan (correct).

Scenario B2: Off-peak demand with an incorrect peak-hour signal plan, activated due to misinterpreted data (wrong).

4.3. Performance Assessment Under Data Quality Variations

This section presents the results of the simulation-based evaluation under different data quality scenarios. The performance of the modeled corridor was assessed using a suite of Key Performance Indicators (KPIs), as shown in Table 6, covering both traffic-related and environmental aspects.

It is worth noting that the environmental performance indicators were estimated using SUMO’s built-in emission model, based on the HBEFA3 framework (Handbook Emission Factors for Road Transport, version 3.1) [34]. This model computes instantaneous vehicle emissions at each simulation time step as a function of the vehicle’s current speed and acceleration, using pre-tabulated emission factors for each vehicle category and Euro emission standard.

In the present study, SUMO’s default HBEFA3 emission classes were assigned according to the vehicle type used in the simulation: passenger cars and taxis were assigned the PC_G_EU4 class (petrol, Euro 4), light goods vehicles the LDV_D_EU4 class (diesel, Euro 4), heavy goods vehicles the HDV_D_EU4 class (diesel, Euro 4), and buses the Bus_D_EU4 class (diesel, Euro 4). Per-vehicle instantaneous emissions of CO₂, NO_x, and PM_x were calculated at each simulation time step and subsequently aggregated spatially and temporally across all vehicles and corridor links to produce the network-level environmental KPIs reported in Table 7.
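
The aggregation step can be illustrated with a short sketch. It assumes that each sample holds one vehicle's emission rate in mg/s and speed in m/s at one simulation step, and that the network-level KPI is the total emitted mass divided by the total distance traveled. Averaging per-vehicle mg/km values instead would be an equally plausible reading of Table 6, so this is one possible interpretation rather than the authors' procedure.

```python
# Emission classes per vehicle type, as listed in the text above.
EMISSION_CLASS = {
    "passenger car": "PC_G_EU4",
    "taxi": "PC_G_EU4",
    "LGV": "LDV_D_EU4",
    "HGV": "HDV_D_EU4",
    "bus": "Bus_D_EU4",
}


def emission_per_km(samples, step_s=1.0):
    """Aggregate per-step samples into a network-level value in mg/km.

    samples: iterable of (emission_rate_mg_s, speed_m_s) tuples, one per
    vehicle per simulation step (ASSUMED units and layout).
    """
    total_mg = 0.0
    total_m = 0.0
    for rate_mg_s, speed_m_s in samples:
        total_mg += rate_mg_s * step_s   # mass emitted during this step
        total_m += speed_m_s * step_s    # distance covered during this step
    if total_m == 0.0:
        return None  # no distance traveled: the ratio is undefined
    return total_mg / (total_m / 1000.0)


if __name__ == "__main__":
    # Hypothetical vehicle: 100 steps at 10 m/s emitting 2000 mg/s of CO2.
    print(emission_per_km([(2000.0, 10.0)] * 100))
```

Under this definition, time spent stationary at a red signal adds emitted mass without adding distance, which is why longer waiting times raise the per-kilometer indicators.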

A comparison of the KPIs in Table 7 (Scenario A2 against A1, and B2 against B1) reveals a consistent deterioration in all performance metrics when an incorrect control plan (A2 or B2) is implemented instead of the appropriate one (A1 or B1). Specifically, activating the incorrect plan in Group A (Scenario A2) results in a 14% increase in average travel time and a 23% increase in average waiting time, while the environmental indicators increase by 17% to 21%. A similar trend is observed in Group B, further highlighting the importance of selecting the correct control plan to ensure optimal system performance.

A closer examination of the results indicates that in Group A (A2 versus A1), demand is high and queues are already forming; therefore, using an off-peak plan worsens the situation further. In Group B (B2 versus B1), traffic demand is low and queues are absent; however, applying a peak-hour plan creates unnecessary delays on the main and side roads. Overall, the results suggest that applying a peak-hour plan during off-peak hours causes greater disruption than using an off-peak plan during periods of high demand.

To further explore the spatial distribution of impacts, the same analysis was conducted at the intersection level. The results indicate that the junction exhibiting the most significant performance deterioration under incorrect signal plan activation is K. Karamanli–25th Martiou (Link 118), consistently across both scenario groups. This finding is aligned with the data quality analysis presented in Section 3, where this junction was associated with the highest concentration of invalid observations along the corridor.

5. Discussion

Traffic management is a critical component of modern transportation systems, aiming to optimize traffic flow, reduce congestion, enhance road safety, and minimize CO₂ emissions. Its success relies heavily on informed decision-making and the implementation of effective strategies that respond dynamically to real-time conditions. At the heart of these processes lies the need for high-quality traffic data, which provides accurate insights into road usage, traffic patterns, and potential disruptions. Without reliable data, traffic management systems are limited in their ability to optimize flows based on the anticipated traffic conditions [35].

Building on this concept, this study proposes a comprehensive, modular, scalable, adaptable, and application-aware framework for traffic data quality assessment and enhancement, consisting of four interconnected layers: the ingestion layer, the data quality assessment layer, the imputation/correction layer, and real-time alerting mechanisms. Existing research has mainly addressed individual components of this pipeline in isolation. Studies [16,18,28] focused primarily on data quality assessment by identifying and flagging invalid values and calculating data quality dimensions, while studies [15,19,36] concentrated on data enhancement and imputation techniques. Although these contributions are valuable, they typically examine isolated aspects of the data quality problem and are often designed for offline or laboratory-oriented environments rather than real-time operational settings.

In contrast, the proposed approach introduces an integrated closed-loop framework in which the different layers continuously interact and influence one another. Specifically, the outputs of the data quality assessment layer directly trigger the activation of imputation/correction mechanisms, as well as the corresponding alerting mechanisms. In this way, the framework moves beyond static or standalone quality assessment practices and enables continuous, real-time, and operational data quality management within live traffic management environments. Furthermore, the proposed framework is generic and extensible, allowing the integration of additional datasets, services, and application-specific functionalities depending on operational requirements.

The practical value of this framework, as well as the importance of data quality in the traffic management domain, is demonstrated through a real-world case study involving traffic signal control plan selection. A detailed sensitivity and simulation analysis revealed how deviations in data quality, especially due to missing values and invalid records, can significantly alter system behavior, leading to suboptimal control decisions and degraded traffic performance. This underscores the need for continuous and systematic monitoring of data quality, rather than relying on periodic or ad hoc validation approaches.

These findings are consistent with previous studies highlighting the sensitivity of traffic management systems, particularly in the case of traffic signal control plans. Previous research has shown that traffic states can still be correctly estimated and appropriate control decisions can be made when only a limited number of detectors are malfunctioning; however, a critical threshold may exist beyond which system performance deteriorates abruptly [37]. Moreover, when missing or invalid data are spatially concentrated at a specific network location, such as an intersection or link, the negative effects are expected to be more pronounced at that specific location and may subsequently propagate to the wider traffic management mechanism [38]. This observation is consistent with the spatial concentration of impacts identified in the present study at the Karamanli–25th Martiou junction.

Beyond the findings of the sensitivity and simulation analyses, which quantitatively demonstrate how low-quality input data can lead the system to make significantly more classification errors and, in turn, measurably degrade corridor performance, this study also lays the foundation for exploring further questions. First, at which level of data accuracy is it optimal to stop improving, considering the trade-off between marginal gains in performance and the computational or operational costs required to achieve higher accuracy? Second, how can traffic data quality management strategies be effectively extended to support multimodal transportation systems, where data sources are heterogeneous, vary in reliability, follow different quality standards across modes, and differ in their impact on decision-making processes? These questions open a new line of investigation toward resource-aware data quality management in real-time traffic control systems.

6. Conclusions

Several limitations related to the proposed framework should be acknowledged. Although the framework was designed as a modular and scalable architecture capable of supporting real-time traffic management applications, several components could be further enhanced. Currently, the quality assessment and alerting thresholds are defined by the user according to the operational task and application requirements. Future extensions could incorporate adaptive and intelligent mechanisms capable of dynamically adjusting or recommending appropriate thresholds based on traffic conditions, operational objectives, or system performance. In addition, further multimodal-oriented quality metrics, advanced validation methodologies, and alternative imputation and correction techniques could be integrated into the framework in order to better support heterogeneous transportation datasets and evolving ITS environments. Furthermore, although the employed AI-based reconstruction techniques achieved satisfactory performance within the scope of the present study, future improvements in AI and machine learning models may further reduce reconstruction errors and improve adaptability under more complex traffic conditions. Due to its modular architecture, the framework can be continuously extended and customized according to the needs of different traffic management authorities and operational contexts.

Several limitations associated with the case study and simulation-based evaluation should also be considered. First, the analysis focuses on a single urban corridor in Thessaloniki and primarily examines weekday morning peak-hour conditions. Therefore, the findings may not be directly generalizable to larger urban networks, different traffic environments, weekends, special events, or extreme weather conditions. In addition, the examined traffic management strategy is based on the selection between only two predefined signal control plans (peak and off-peak), whereas more advanced adaptive traffic control systems may respond differently to degraded traffic data quality.

Moreover, the sensitivity analysis adopted uniformly distributed random noise as a simplified representation of traffic data degradation. However, real-world traffic data quality issues often involve more complex uncertainty patterns, including systematic sensor biases, correlated temporal errors, communication failures, and localized detector malfunctions. Future research could therefore investigate additional uncertainty scenarios and examine whether certain traffic states or signal control strategies are inherently more robust to imperfect traffic data, allowing some data quality degradations to be partially absorbed without significantly affecting operational traffic management decisions.
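
As an illustration of this simplified degradation model, the sketch below perturbs a detector series with uniformly distributed noise. It assumes that an "x% noise" level means each reading is multiplied by a factor drawn uniformly from [1 − x, 1 + x]; the study may have applied the noise differently (for example, to only a subset of records), so the function should be read as one hypothetical interpretation.

```python
import random


def add_uniform_noise(values, level, seed=None):
    """Return a copy of `values` perturbed by multiplicative uniform noise.

    level: maximum relative deviation, e.g. 0.05 for the 5% scenario
    (ASSUMED interpretation of the noise level).
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    noisy = []
    for v in values:
        factor = 1.0 + rng.uniform(-level, level)
        noisy.append(max(0.0, v * factor))  # flows and speeds cannot be negative
    return noisy


if __name__ == "__main__":
    # Hypothetical six-minute flow aggregates (veh/h) close to a threshold.
    flows = [840.0, 860.0, 845.0, 855.0]
    for level in (0.05, 0.10, 0.15):
        print(level, [round(v) for v in add_uniform_noise(flows, level, seed=42)])
```

Readings that sit close to an activation threshold are the ones most likely to change class under such noise, whereas the systematic biases and correlated errors mentioned above would shift whole sequences of readings in the same direction.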

From a policy and practice perspective, the findings of this study carry direct implications for traffic management authorities and infrastructure operators. First, the results underscore the importance of investing in continuous and automated data quality monitoring as an integral component of traffic management centre operations, rather than relying on periodic or manual validation. Second, the identification of a critical data quality threshold, below which misclassification frequency increases sharply, provides a practical basis for defining minimum data quality standards in procurement specifications and service-level agreements for traffic data providers. Third, the spatial concentration of impacts at specific bottleneck intersections suggests that targeted detector maintenance and redundancy strategies at critical locations can yield disproportionately large system-level benefits. Finally, the framework’s modular and scalable design facilitates its integration into existing traffic management platforms, supporting a cost-effective path toward data-driven and resilient urban traffic operations.

Author Contributions

Conceptualization, V.P.; methodology, V.P.; software, V.P. and D.T.; validation, V.P., D.T., E.M. (Evangelos Mintsis), E.I.V. and E.M. (Evangelos Mitsakis); formal analysis, V.P. and D.T.; investigation, V.P.; resources, V.P. and D.T.; data curation, V.P.; writing (original draft preparation), V.P. and D.T.; writing (review and editing), V.P., D.T., E.M. (Evangelos Mintsis) and E.I.V.; visualization, V.P.; supervision, E.I.V. and E.M. (Evangelos Mitsakis); project administration, E.M. (Evangelos Mitsakis); funding acquisition, E.M. (Evangelos Mitsakis). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Union, Horizon Europe research and innovation programme, under grant agreement No 101104171.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author because the traffic data used in this study were obtained from an operational traffic management system and third-party data providers. Due to licensing, privacy, and contractual restrictions, the raw data cannot be made publicly available.

Acknowledgments

This work has been carried out within the framework of the SYNCHROMODE project.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SUMO	Simulation of Urban MObility
SD	Standard Deviation
CV	Coefficient of Variation

References

1. Garg, T.; Kaur, G. A Systematic Review on Intelligent Transport Systems. J. Comput. Cogn. Eng. 2023, 2, 175–188.
2. Teh, H.Y.; Kempa-Liehr, A.W.; Wang, K.I.-K. Sensor data quality: A systematic review. J. Big Data 2020, 7, 11.
3. Lopes, J.; Bento, J.; Huang, E.; Antoniou, C.; Ben-Akiva, M. Traffic and Mobility Data Collection for Real-Time Applications. In Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems (ITSC), Madeira Island, Portugal, 19–22 September 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 216–223.
4. Laña, I.; Sanchez-Medina, J.J.; Vlahogianni, E.I.; Del Ser, J. From Data to Actions in Intelligent Transportation Systems: A Prescription of Functional Requirements for Model Actionability. Sensors 2021, 21, 1121.
5. Vlahogianni, E.I.; Karlaftis, M.G.; Golias, J.C. Short-term traffic forecasting: Where we are and where we’re going. Transp. Res. Part C Emerg. Technol. 2014, 43, 3–19.
6. Mantouka, E.G.; Vakrinou, K.; Christidis, K.N.; Magoutas, B.; Kiousi, A.; Vlahogianni, E.I. Automated Vehicle Traffic: A Review of Operational Challenges, Infrastructure Requirements and Research Directions. Sensors 2026, 26, 1232.
7. Chen, L.; May, A.D. Traffic Detector Errors and Diagnostics. Transp. Res. Rec. J. Transp. Res. Board 1987, 1132, 82–93. Available online: https://onlinepubs.trb.org/Onlinepubs/trr/1987/1132/1132-010.pdf (accessed on 14 April 2026).
8. Jacobson, L.N.; Nihan, N.L.; Bender, J.D. Detecting Erroneous Loop Detector Data in a Freeway Traffic Management System. Transp. Res. Rec. J. Transp. Res. Board 1990, 1287, 151–166. Available online: https://onlinepubs.trb.org/Onlinepubs/trr/1990/1287/1287-016.pdf (accessed on 14 April 2026).
9. Dudek, C.L.; Dutt, A.; Messer, C.J.; Ritch, G.P. A Study of Detector Reliability for a Safety Warning System on the Gulf Freeway; Texas Transportation Institute: Bryan, TX, USA, 1974.
10. Turner, S. Defining and Measuring Traffic Data Quality: White Paper; Federal Highway Administration, U.S. Department of Transportation: Washington, DC, USA, 2002. Available online: https://rosap.ntl.bts.gov/view/dot/4195 (accessed on 14 April 2026).
11. Margiotta, R. State of the Practice for Traffic Data Quality: White Paper; Federal Highway Administration, U.S. Department of Transportation: Washington, DC, USA, 2002. Available online: https://rosap.ntl.bts.gov/view/dot/4193 (accessed on 14 April 2026).
12. Middleton, D.; Gopalakrishna, D.; Raman, M. Advances in Traffic Data Collection and Management: White Paper; Federal Highway Administration, U.S. Department of Transportation: Washington, DC, USA, 2002. Available online: https://rosap.ntl.bts.gov/view/dot/4194 (accessed on 14 April 2026).
13. Margiotta, R.; Turner, S. Traffic Data Quality Measurement: Final Report; Federal Highway Administration, U.S. Department of Transportation: Washington, DC, USA, 2004. Available online: https://rosap.ntl.bts.gov/view/dot/4226 (accessed on 14 April 2026).
14. Carvalho, A.M.; Soares, S.; Montenegro, J.; Conceição, L. Data Quality: Revisiting Dimensions towards New Framework Development. Procedia Comput. Sci. 2025, 253, 247–256.
15. Sun, T.; Zhu, S.; Hao, R.; Sun, B.; Xie, J. Traffic Missing Data Imputation: A Selective Overview of Temporal Theories and Algorithms. Mathematics 2022, 10, 2544.
16. Montenegro, J.; Conceição, L.; Soares, S.; Beernaerts, J.; Hovestad, M.; Carvalho, A.M. A Data Quality Framework on Urban Vehicle Access Regulations. Transp. Res. Procedia 2026, 95, 129–136.
17. Bangad, N.; Jayaram, V.; Krishnappa, M.S.; Banarse, A.R.; Bidkar, D.M.; Nagpal, A.; Parlapalli, V. A Theoretical Framework for AI-Driven Data Quality Monitoring in High-Volume Data Environments. Int. J. Comput. Eng. Technol. 2024, 15, 618–636.
18. Chen, Z.; Qin, X.; Schneider, E.; Cheng, Y.; Parker, S.; Shaon, R.R. Designing a Comprehensive Procedure for Flagging Archived Traffic Data: A Case Study. Transp. Res. Rec. 2019, 2673, 165–175.
19. Gouran, P.; Nadimi-Shahraki, M.H.; Masoud Rahmani, A.; Mirjalili, S. An Effective Imputation Method Using Data Enrichment for Missing Data of Loop Detectors in Intelligent Traffic Control Systems. Remote Sens. 2023, 15, 3374.
20. Purkrábková, Z.; Hrubeš, P. Validity of Speed-Based Congestion Detection in Traffic Data. In Proceedings of the 2025 Smart City Symposium Prague (SCSP); IEEE: Piscataway, NJ, USA, 2025.
21. Purkrábková, Z.; Langr, M.; Hrubeš, P. Detecting Anomalies in Traffic Data Using a Flexible Semi-Parametric Model. Eur. Transp. Res. Rev. 2025, 17, 33.
22. Bachechi, C.; Rollo, F.; Po, L. Detection and Classification of Sensor Anomalies for Simulating Urban Traffic Scenarios. Clust. Comput. 2022, 25, 2793–2817.
23. Mitsakis, E.; Tzanis, D.; Petkani, V.; Dolianitis, A.; Mintsis, E.; Kotsi, A.; Psonis, V. A Data-Driven Decision Support System for Multimodal Network and Traffic Management—SYNCHROMODE. In Conference on Sustainable Urban Mobility; Springer Nature: Cham, Switzerland, 2024; pp. 94–106.
24. Liu, Z.; Chen, H. Short-Term Online Taxi-Hailing Demand Prediction Based on the Multimode Traffic Data in Metro Station Areas. J. Transp. Eng. Part A Syst. 2022, 148, 04022036.
25. Turner, S.; Albert, L.; Gajewski, B.; Eisele, W. Archived Intelligent Transportation System Data Quality: Preliminary Analyses of San Antonio TransGuide Data. Transp. Res. Rec. 2000, 1719, 77–84.
26. Weijermars, W.A.M.; Van Berkum, E.C. Detection of Invalid Loop Detector Data in Urban Areas. Transp. Res. Rec. 2006, 1945, 82–88.
27. Kwon, J.; Chen, C.; Varaiya, P. Statistical Methods for Detecting Spatial Configuration Errors in Traffic Surveillance Sensors. Transp. Res. Rec. 2004, 1870, 148–156.
28. Hamad, K. Quality Control of Archived Intelligent Transportation Systems Data. Int. J. Traffic Transp. Eng. 2015, 5, 238–251.
29. Cui, Z.; Henrickson, K.; Ke, R.; Wang, Y. Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4883–4894.
30. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3848–3858.
31. Yin, X.; Wu, G.; Wei, J.; Shen, Y.; Qi, H.; Yin, B. Deep Learning on Traffic Prediction: Methods, Analysis, and Future Directions. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4927–4943.
32. Karetsos, P.; Petkani, V.; Tzanis, D.; Mintsis, E.; Mitsakis, E. A Scalable Context-Aware STGCN Framework for Real-Time Traffic Forecasting with Residual Correction. Future Transp. 2026, 6, 111.
33. Álvarez López, P.; Behrisch, M.; Bieker-Walz, L.; Erdmann, J.; Flötteröd, Y.-P.; Hilbrich, R.; Lücken, L.; Rummel, J.; Wagner, P.; Wießner, E. Microscopic Traffic Simulation Using SUMO. In Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2575–2582. Available online: https://elib.dlr.de/127994/1/08569938.pdf (accessed on 14 April 2026).
34. HBEFA (Handbook Emission Factors for Road Transport), Version 3.1. 2010. Available online: https://www.hbefa.net (accessed on 25 April 2026).
35. Al-Quhfa, H.; Mothana, A.; Song, J. A Systematic Review on Data-Driven Traffic Management for Sustainable Urban Transport. Int. J. Adv. Netw. Appl. 2026, 17, 6992–7007.
36. Shang, Q.; Tang, Y.; Yin, L. A Hybrid Model for Missing Traffic Flow Data Imputation Based on Clustering and Attention Mechanism Optimizing LSTM and AdaBoost. Sci. Rep. 2024, 14, 26473.
37. Mei, H.; Li, J.; Shi, B.; Wei, H. Reinforcement Learning Approaches for Traffic Signal Control under Missing Data. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23), Macau, China, 19–25 August 2023; pp. 5009–5017.
38. Feng, Y.; Head, K.L.; Khoshmagham, S.; Zamanipour, M. A Real-Time Adaptive Signal Control in a Connected Vehicle Environment. Transp. Res. Part C Emerg. Technol. 2015, 55, 460–473.

Figure 1. Data quality framework.

Figure 2. (a) Location of the 8 intersections along K. Karamanli Street. Intersections are classified as primary (Pr) and secondary (Sec). (b) Spatial distribution of the loop detectors installed at each intersection along the study corridor. (c) Main corridor links considered in the signal plan activation analysis at the primary intersections. These links correspond to the link IDs reported in Table 3 and are used for evaluating the predefined flow and speed thresholds.

Figure 3. Traffic signal control plan activation time.

Figure 4. Classification accuracy per scenario.

Table 1. Data quality assessment layer.

Data Quality Dimensions	Definitions	Formulas
Completeness	The degree to which data values are present in the attributes that require them	$\text{Percentage complete (\%)} = \frac{n_{\text{available values}}}{n_{\text{total expected}}} \times 100$ (1) where $n_{\text{available values}}$ = the number of records or rows with available values present and $n_{\text{total expected}}$ = the total number of records or rows expected
Validity	The degree to which data values satisfy acceptance requirements of the validation criteria or fall within the respective domain of acceptable values	$\text{Percent valid (\%)} = \frac{n_{\text{valid}}}{n_{\text{total}}} \times 100$ (2) where $n_{\text{valid}}$ = the number of records or rows with values meeting validity criteria and $n_{\text{total}}$ = the number of records or rows subjected to validity criteria
Accuracy	The measure or degree of agreement between a data value or set of values and a source assumed to be correct	Mean absolute percent error (MAPE); Signed percent error; Root mean squared error (RMSE)
Timeliness	The degree to which data values or a set of values are provided at the time required or specified	$\text{Percent timely data (\%)} = \frac{n_{\text{on-time}}}{n_{\text{total}}} \times 100$ (3) where $n_{\text{on-time}}$ = the number of data messages or packets received within acceptable time limits and $n_{\text{total}}$ = the total number of data messages or packets received. $\text{Average delay for late data} = \frac{1}{n_{\text{late}}} \sum \left( t_{\text{late}} - t_{\text{expected}} \right)$ (4) where $n_{\text{late}}$ = the number of data messages or packets received outside acceptable time limits, $t_{\text{late}}$ = the actual arrival time of a late data message or packet, and $t_{\text{expected}}$ = the expected arrival time of a late data message or packet
Coverage	The degree to which data values in a sample accurately represent the whole of that which is to be measured	$\text{Network coverage (\%)} = \frac{\text{Number of relevant links covered by data}}{\text{Total number of relevant links}} \times 100$ (5) $\text{Area coverage (\%)} = \frac{\text{Number of relevant areas covered by data}}{\text{Total number of relevant areas}} \times 100$ (6)
Accessibility	The relative ease with which data can be retrieved and manipulated by data consumers to meet their needs	Qualitative: A listing or description of the mechanisms or media in which data can be obtained for use. Quantitative: The average time required for data consumers to perform specified data retrieval or manipulation tasks.
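
The quantitative dimensions in Table 1 can be illustrated with a short Python sketch of Equations (1) to (4). The function names, the use of `None` for a missing record, the validity rule, and the sample values are hypothetical choices for illustration; in particular, the sketch assumes that missing records are not subjected to validity criteria.

```python
from statistics import mean


def percent_complete(values, n_expected):
    """Equation (1): share of expected records that are present."""
    available = sum(v is not None for v in values)
    return 100.0 * available / n_expected


def percent_valid(values, is_valid):
    """Equation (2): share of checked records that pass the validity rule."""
    checked = [v for v in values if v is not None]  # ASSUMPTION: skip missing
    if not checked:
        return None
    return 100.0 * sum(1 for v in checked if is_valid(v)) / len(checked)


def percent_timely(messages, tolerance_s):
    """Equation (3): share of messages arriving within the tolerance.

    messages: list of (arrival_time_s, expected_time_s) pairs.
    """
    on_time = sum(1 for arrival, expected in messages
                  if arrival - expected <= tolerance_s)
    return 100.0 * on_time / len(messages)


def average_delay_late(messages, tolerance_s):
    """Equation (4): mean delay of the messages that arrived late."""
    delays = [arrival - expected for arrival, expected in messages
              if arrival - expected > tolerance_s]
    return mean(delays) if delays else 0.0


if __name__ == "__main__":
    # Hypothetical one-minute speed readings (km/h); None marks a missing record.
    speeds = [10, None, 25, -3, 40, None]
    print(percent_complete(speeds, n_expected=6))
    print(percent_valid(speeds, lambda v: 0 <= v <= 200))
    # Hypothetical (arrival, expected) timestamps in seconds.
    msgs = [(60, 60), (125, 120), (200, 180), (260, 240)]
    print(percent_timely(msgs, tolerance_s=10))
    print(average_delay_late(msgs, tolerance_s=10))
```

Keeping completeness and validity as separate functions mirrors the table: a record can be present yet invalid, and the two cases may call for different correction or alerting actions.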

Table 2. Types of validity checks.

Category	Description
Format checks	Ensure that data adhere to expected formats (e.g., numeric values, timestamp structure)
Basic (univariate) checks	Identify outliers or improbable values in single variables (e.g., negative speeds)
Statistical checks	Use statistical methods (e.g., thresholds, distributions) to flag anomalies
Spatial consistency checks	Compare data across adjacent detectors or lanes to detect inconsistencies
Temporal consistency and pattern checks	Examine time series behavior to detect unexpected fluctuations or gaps.
Conservation-based checks	Ensure that vehicle counts entering and exiting a segment align logically
Multivariate checks based on the fundamental diagram	Validate relationships between flow, density, and speed
Advanced or AI-based validity checks	Leverage machine learning or anomaly detection algorithms to detect complex patterns of invalidity
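
Three of the check categories in Table 2 can be sketched as simple predicates. This is an illustration only: the 130 km/h speed ceiling and the 10% tolerances are hypothetical values chosen for the example, not thresholds taken from the framework.

```python
def univariate_check(speed_kmh: float, flow_vph: float,
                     max_speed_kmh: float = 130.0) -> bool:
    """Basic check: reject negative or implausibly high single values."""
    return 0.0 <= speed_kmh <= max_speed_kmh and flow_vph >= 0.0


def conservation_check(count_in: int, count_out: int,
                       tolerance: float = 0.10) -> bool:
    """Conservation check: vehicles entering and leaving a segment
    should agree within a relative tolerance."""
    largest = max(count_in, count_out)
    if largest == 0:
        return True  # nothing entered, nothing left
    return abs(count_in - count_out) / largest <= tolerance


def fundamental_diagram_check(flow_vph: float, density_vpkm: float,
                              speed_kmh: float, tolerance: float = 0.10) -> bool:
    """Multivariate check: flow should be close to density times speed."""
    expected = density_vpkm * speed_kmh
    if expected == 0.0:
        return flow_vph == 0.0
    return abs(flow_vph - expected) / expected <= tolerance


print(univariate_check(-5.0, 300.0))                  # negative speed is invalid
print(conservation_check(100, 95))                    # 5% imbalance is tolerated
print(fundamental_diagram_check(1500.0, 20.0, 50.0))  # flow far from 20 * 50
```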

Table 3. Signal plan activation thresholds for the examined corridor (06:00–13:00).

Signalized Intersection	Link Id	Flow Threshold (veh/h)	Speed Threshold (km/h)	If Conditions Satisfied	Otherwise
K.Karamani-M.Mpotsari	127	800	25	Off-peak plan (Plan 3)	Morning peak plan (Plan 1)
K.Karamani-A.Papanastsiou	136	700	17	Off-peak plan (Plan 3)	Morning peak plan (Plan 1)
K.Karamani-Voulgari	110	700	15	Off-peak plan (Plan 3)	Morning peak plan (Plan 1)
K.Karamani-25th Martiou	118	850	15	Off-peak plan (Plan 3)	Morning peak plan (Plan 1)
K.Karamani-P.Sindika	123	800	20	Off-peak plan (Plan 3)	Morning peak plan (Plan 1)
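
Table 3 can be read as a per-link rule lookup. The sketch below makes one explicit assumption: the table does not state the direction of the comparisons, so the off-peak condition is taken here to be flow below the flow threshold and speed above the speed threshold. `THRESHOLDS` and `select_plan` are illustrative names.

```python
# Link id -> (flow threshold in veh/h, speed threshold in km/h), from Table 3.
THRESHOLDS = {
    127: (800, 25),
    136: (700, 17),
    110: (700, 15),
    118: (850, 15),
    123: (800, 20),
}


def select_plan(link_id: int, flow_vph: float, speed_kmh: float) -> str:
    """Return the signal plan for one link and one measurement."""
    flow_thr, speed_thr = THRESHOLDS[link_id]
    # ASSUMPTION: off-peak means low flow AND high speed; the comparison
    # directions are not given in the table.
    if flow_vph < flow_thr and speed_kmh > speed_thr:
        return "Plan 3"  # off-peak plan
    return "Plan 1"      # morning peak plan


print(select_plan(127, 600.0, 40.0))  # light traffic on link 127
print(select_plan(127, 900.0, 40.0))  # flow above the 800 veh/h threshold
```

Under this reading, a corrupted flow or speed value near a threshold is enough to flip the selected plan, which is the mechanism the sensitivity analysis examines.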

Table 4. ST-GCN model configuration and evaluation results.

Parameters	Value
Temporal window	30
Hidden dimension	32
Batch size	64
Optimizer	Adam
Loss function	Huber Loss
Learning rate	$1 \times 10^{- 3}$
Early stopping	30 epochs
Evaluation metrics	MAE = 46 veh/h; RMSE = 70 veh/h; MAPE = 30%
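
For reference, the three evaluation metrics in Table 4 follow their standard definitions. The sketch below uses made-up flow values and assumes that zero observations are skipped when computing MAPE, since the percentage error is undefined there; how the study handled that case is not stated.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the unit of the data (veh/h here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the unit of the data."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error (%).
    ASSUMPTION: observations equal to zero are skipped."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


observed = [100.0, 200.0, 400.0]   # hypothetical flows in veh/h
predicted = [110.0, 180.0, 400.0]
print(mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```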

Table 5. Comparison of correct and incorrect activation of each traffic signal control plan and their duration.

Scenario	True Off-Peak (TN)	True Peak (TP)	Off-Peak → Peak (FN)	Peak → Off-Peak (FP)	Accuracy
Baseline	206	165	49	0	0.88
5% noise	213	149	42	16	0.86
10% noise	213	135	42	30	0.83
15% noise	213	130	42	35	0.82
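
The accuracy column of Table 5 can be reproduced from the four counts, assuming the usual definition of accuracy as correct activations divided by all activations. Each row sums to 420. The names below are illustrative.

```python
def accuracy(tn: int, tp: int, fn: int, fp: int) -> float:
    """Share of correct plan activations: (TN + TP) / (TN + TP + FN + FP)."""
    return (tn + tp) / (tn + tp + fn + fp)


# Scenario -> (TN, TP, FN, FP), copied from Table 5.
TABLE_5 = {
    "Baseline": (206, 165, 49, 0),
    "5% noise": (213, 149, 42, 16),
    "10% noise": (213, 135, 42, 30),
    "15% noise": (213, 130, 42, 35),
}

for scenario, counts in TABLE_5.items():
    print(f"{scenario}: total={sum(counts)}, accuracy={accuracy(*counts):.2f}")
```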

Table 6. Key Performance Indicators.

Traffic Performance Indicators	Description
Avg. Travel Time (min)	The time a vehicle needed to complete its route
Avg. Waiting Time (min)	The time during which the vehicle speed was below or equal to 0.1 m/s
Environmental Performance Indicators	Description
CO₂ (mg/km)	Average CO₂ emissions per kilometer traveled
NO_x (mg/km)	Average NO_x emissions per kilometer traveled
PM_x (mg/km)	Average PM_x emissions per kilometer traveled

Table 7. Key Indicators from the Simulation Analysis.

Scenario A2 Compared to A1
	A1 Values			A2 Values			Change (%)
	Mean	±SD	CV (%)	Mean	±SD	CV (%)	Change (%)
Avg. Travel Time (min)	3 min 47 s	3 s	1.35	4 min 19 s	4 s	1.48	↑14
Avg. Waiting Time (min)	1 min 51 s	2 s	2.08	2 min 16 s	2 s	1.97	↑23
CO₂ (mg/km)	87.4100	1.44	1.64	102.1873	1.49	1.51	↑17
NO_x (mg/km)	0.0376	0.00066	1.74	0.0443	0.00068	1.6	↑18
PM_x (mg/km)	0.0018	0.000036	1.95	0.0022	0.000036	1.75	↑21
Scenario B2 compared to B1
	B1 Values			B2 Values			Change (%)
	Mean	±SD	CV (%)	Mean	±SD	CV (%)	Change (%)
Avg. Travel Time (min)	2 min	1 s	0.53	2 min 12 s	1 s	0.58	↑10
Avg. Waiting Time (min)	0 min 36 s	1 s	1.33	0 min 48 s	1 s	1.33	↑33
CO₂ (mg/km)	74.8297	0.43	0.53	92.2172	0.69	0.75	↑23
NO_x (mg/km)	0.0309	0.00002	0.64	0.03824	0.00031	0.82	↑23
PM_x (mg/km)	0.0014	0.000011	0.79	0.00177	0.000017	0.97	↑25

↑ indicates an increase from the first scenario to the second.
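
The last column of Table 7 is the relative increase of the second scenario's mean over the first, and CV is the standard deviation expressed as a percentage of the mean. The sketch below recomputes the travel and waiting time increases from the tabulated means; the helper names are illustrative. Recomputing CV from the rounded table values may differ slightly from the reported figures, so it is shown only on a made-up pair.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'M min S s' table entry to seconds."""
    return 60 * minutes + seconds


def percent_increase(first: float, second: float) -> float:
    """Relative change from the first scenario to the second, in percent."""
    return 100.0 * (second - first) / first


def cv_percent(mean: float, sd: float) -> float:
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return 100.0 * sd / mean


# Mean travel and waiting times from Table 7, in seconds.
a1_travel, a2_travel = to_seconds(3, 47), to_seconds(4, 19)
b1_wait, b2_wait = to_seconds(0, 36), to_seconds(0, 48)

print(round(percent_increase(a1_travel, a2_travel)))  # A2 vs. A1 travel time
print(round(percent_increase(b1_wait, b2_wait)))      # B2 vs. B1 waiting time
print(cv_percent(200.0, 4.0))                         # made-up mean and SD
```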
