1. Introduction
Monitoring and analyzing vehicle speed distribution is crucial for understanding traffic flow dynamics and ensuring road safety, particularly in rural road segments [1]. Traditional methodologies predominantly utilize fixed sensors deployed at predetermined locations to record instantaneous vehicle speeds. This approach facilitates the generation of empirical speed distributions and enables the computation of related statistical measures, thereby offering a discrete snapshot of speed patterns at specific points along the roadway. However, these fixed-point measurements inherently lack continuity, as they can only offer intermittent data points based on the placement and density of sensors [2]. A further limitation of fixed sensors is that they often cannot correlate data from individual vehicles across multiple sections. As a result, traffic behavior between points must be inferred through aggregate data comparisons or interpolations, leaving significant gaps in the understanding of speed distribution along the entire road segment [3]. For instance, methods such as linear, polynomial, or spline interpolations are commonly employed to estimate speed trends between sensor points, but these techniques do not capture the true variability and heterogeneity of speed distributions over continuous distances [4].
The advent of Floating Car Data (FCD) technology has introduced a transformative approach to traffic analysis by providing near-continuous temporal tracking of equipped vehicles. Li et al. [5] presented a computationally simple and robust cross-validation method for reconciling traffic speed measurements from probe and stationary sensors, effectively identifying discrepancies using both simulation models and real-world freeway data. Although FCD samples represent a smaller fraction of the total vehicle population, they offer a high level of detail in spatial and temporal segmentation, enhancing traffic data accuracy [6]. To optimize this accuracy, Altintasi et al. [7] propose a method to assess the traffic Penetration Rate (PR) of commercial FCD by comparing its speed estimation to Ground Truth (GT) data, finding that a PR of 15% significantly improves FCD quality, with a suggested PR of around 5% for commercial FCD. Despite the increasing volume of research utilizing FCD for assessing mean speeds, percentiles, and other statistics along road segments, there is a noticeable gap in the literature regarding the analysis of complete speed distributions. For instance, Budimir et al. [8] explore the use of mobile vehicles to collect real-time traffic flow data through FCD and Probe vehicles, highlighting their efficiency, cost-effectiveness, and extensive coverage for achieving sustainable mobility, supported by technologies like GIS, GNSS, and wireless communication. Ambros et al. [9] investigate the use of FCD for proactively identifying risk locations on rural roads by analyzing GPS-derived speeds and their relationship with accident frequency, highlighting practical feasibility and implications for rural safety monitoring and evaluation. Fabrizi and Ragona [10] present a model for short-term traffic speed forecasting using an FCD system, which enhances coverage without expensive infrastructure and provides real-time traffic speed information across a road network. Zhang et al. [11] developed a method for identifying bottlenecks using FCD, employing speed difference as a primary indicator and speed-at-capacity as a secondary indicator to evaluate bottlenecks by duration, affected distance, delay, and cause.
This paper aims to address this gap by proposing a novel method for evaluating variations in vehicle speed distribution along rural road segments with various geometric and functional characteristics. We utilize a validated FCD sample cross-referenced with data from a fixed control station to apply a non-parametric similarity measure—specifically, the 1-Wasserstein distance—to compare speed histograms at closely spaced intervals along the road. By highlighting areas of homogeneity and heterogeneity in speed distributions, this approach allows for a detailed examination of how physical and geometric features of the road, such as curvature, lane count, intersections, and access points, influence driving behavior. The continuous nature of FCD data, combined with sophisticated similarity measures, enables a more comprehensive understanding of speed distribution heterogeneity. This, in turn, can inform more effective traffic management strategies and road design improvements aimed at enhancing safety and efficiency on rural roads.
2. Materials
The evaluation of operating speed is pivotal in assessing the effects of roadside and geometric features on both collision occurrence and severity. Indeed, roadside features increase collision frequency while reducing speed, whereas geometric features have the opposite effect, and lower operating speeds lead to a reduction in collision frequency [12]. The 85th percentile speed, commonly referred to as the operating speed, is typically used to characterize the distribution of vehicle speeds. This percentile provides an estimate of the speed below which 85% of the traffic is traveling, offering a more robust representation of typical driving behavior than mean speed. Historically, operating speed models have relied on data from inductive loops and magnetic sensors, which capture vehicle speeds at specific locations [13]. These sensors provide detailed temporal data but are limited in their spatial coverage, as they only record speeds at their installation sites. Recent advancements have introduced Floating Car Data (FCD), derived from vehicles equipped with GPS devices. FCD offers comprehensive spatiotemporal insights by tracking vehicle movements across entire road networks, thus enabling more accurate modeling of operational speeds.
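As a minimal illustration of the operating speed defined above, the following Python sketch computes the 85th percentile of a small, made-up speed sample and compares it with the mean. The sample values and the linear-interpolation percentile rule are assumptions for illustration only; the paper does not state which percentile estimator was used.

```python
import math

def percentile(values, q):
    """Return the q-th percentile (0-100) by linear interpolation
    between neighbouring order statistics."""
    if not values:
        raise ValueError("need at least one speed value")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

# Hypothetical spot speeds in km/h (not data from the study).
speeds_kmh = [62, 70, 74, 78, 81, 83, 85, 88, 92, 97, 104]

v85 = percentile(speeds_kmh, 85)               # operating speed
mean_speed = sum(speeds_kmh) / len(speeds_kmh)
print(f"v85 = {v85:.1f} km/h, mean = {mean_speed:.1f} km/h")
```

Because v85 depends on the upper tail of the sample, it reacts to changes in the behavior of faster drivers that the mean can mask.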
Floating Car Data (FCD) refers to speed and location data collected from vehicles that are part of the traffic stream and equipped with GPS tracking systems. These data are valuable because they provide continuous monitoring of vehicle speeds over extended areas, capturing real-world driving behavior across different road segments. FCD is advantageous due to its ability to collect data over both temporal and spatial domains, giving a holistic view of traffic dynamics over a network rather than at discrete points [2]. However, one challenge associated with FCD is ensuring the representativeness of the data, given that it is often collected from a limited subset of the total vehicle population [7].
In contrast, fixed sensors (such as radar units, inductive loops, and ANPR cameras) are deployed at specific locations along the roadway to capture the instantaneous speeds of passing vehicles. These sensors provide high-resolution temporal data for vehicle speeds at particular points, offering insights into traffic flow characteristics at fixed locations. However, the major limitation of fixed sensors is their spatial restriction; they cannot provide continuous data along the road network and are unable to capture speed variations beyond their immediate vicinity [4].
In this research, we have used data from the Automatic Statistical Traffic Detection System managed by ANAS SpA, which operates within the Italian national road network. This system uses a variety of sensors to collect traffic data, which is centralized in the Platform for Monitoring and Analysis (PANAMA). The accuracy and reliability of this data are ensured through rigorous validation procedures. The fixed-point data were collected over three distinct months—August 2018, February 2019, and May 2019—within the Veneto Region. The data included the following variables:
Time Reference: Date and time of data acquisition;
Lane: The specific lane on which the vehicle was traveling;
Direction: The direction of travel;
Speed: The speed of the vehicle recorded in km/h;
Vehicle Class: A code identifying the vehicle type (e.g., cars and trucks).
The FCD was obtained from a commercial provider that aggregates GPS data from over four million vehicles equipped with black boxes and from approximately 1.5 million smartphone applications. The dataset contained nearly one billion data points for the same time periods and region as the fixed sensor data. Key variables included the following:
Identification Code: Unique identifier for each data point;
Longitude and Latitude: Vehicle location coordinates in WGS 84 format;
Direction: Direction of travel expressed as an azimuth angle;
Speed: Vehicle speed at the time of signal emission;
Date and Time: Timestamp of the GPS signal;
Signal Quality: Quality of the GPS signal;
Vehicle ID: Unique identifier for each vehicle;
Vehicle Type: Classification of the vehicle.
Both data sources provided extensive datasets that allowed for a comprehensive analysis of vehicle speed profiles across different road segments.
4. Speed Probability Distribution on Two-Lane Road Segments
Although the assumption of homogeneous conditions simplifies classic traffic flow theory and modeling, real-world traffic flow exhibits significant heterogeneity. Similarity measures effectively evaluate the gradient of similarity between homogeneous and heterogeneous conditions, ranging from maximal to minimal similarity. In traffic analysis, studying how traffic flow characteristics vary along a road segment is crucial, particularly in understanding vehicular speed trends on highways. This knowledge is vital for highway design, performance and safety verification, and regulatory compliance monitoring.
Analyzing vehicle speed data along a highway segment provides insights into the probability distribution of speeds at different sections. By examining a sequence of sections, the speed behaviors form a sequence of random variables. Instantaneous speed samples from vehicles allow the inference of probability distributions, beyond simple sample statistics like centrality, dispersion, and percentiles, including v85 profiles.
This paper section illustrates how the 1-Wasserstein distance can represent variations in speed distribution along a highway segment. Using a limited sample of speeds from Floating Vehicles (FV) or Floating Car Data (FCD), validated by a larger sample from fixed monitoring devices, we can concisely capture these variations.
4.1. Establishing Baseline Data for Analyzing Speed Distribution Similarity
This study focuses on a secondary rural road that runs from Mestre (Venice) to Pesek (San Dorligo della Valle) in the province of Trieste. The section examined crosses the Veneto Region and is classified as a secondary rural road according to Italian standards (DM2001). It is characterized by variable geometric and functional features, such as curvature, access points, intersections, and lane numbers. The analysis centers on two specific segments of SS14: Segment 218, which extends from kilometer marker 12,000 to 22,000 in the descending direction (DESC), and Segment 3191, which extends from kilometer marker 4000 to 14,000 in the ascending direction (ASC). Segment km 4000–8000 features a dual carriageway with two lanes in each direction, with a total width of approximately 15 m. The presence of at-grade intersections that handle significant traffic volumes necessitates the inclusion of specialized lanes, such as left-turn lanes. Segment km 8000–13,470 has a single carriageway with one lane per direction, each lane being 3.5 m wide, and 1 m-wide shoulders on both sides. Segment km 12,000 to 22,000 also has a single carriageway with one lane per direction. However, at kilometer 17 + 750, the presence of an at-grade intersection with acceleration and deceleration lanes facilitates smoother turning maneuvers. Both segments—218 and 3191—are equipped with fixed monitoring stations that provide continuous data collection. For Segment 218, the fixed station is located at kilometer marker 17,085, while for Segment 3191, it is positioned at kilometer marker 9047. Data were collected from these stations during three distinct periods: August 2018, February 2019, and May 2019. The data collection for each of the three periods spanned the entire month. During these months, data were collected continuously without any intentional breaks in the collection process. These time frames were chosen to capture a representative sample of traffic conditions over different seasons.
The Floating Car Data (FCD) database, which contains detailed information on vehicle positions and speeds, was meticulously processed using map-matching techniques. These techniques align the recorded vehicle positions with the road segment’s progression, ensuring spatial accuracy. Additionally, vehicles that did not pass the fixed monitoring stations, or those lacking continuous trajectory and sufficient signal emission frequency (i.e., 1 Hz), were filtered out to maintain data integrity. To determine the lack of a continuous trajectory for a vehicle, specific criteria based on the consistency and completeness of the recorded GPS data have been applied:
Signal Frequency Check: Data points recorded at frequencies lower than 1 Hz could indicate gaps in the trajectory, so the corresponding vehicles were flagged for further inspection;
Temporal and Spatial Continuity: For each vehicle, the sequence of recorded positions (latitude, longitude, and corresponding timestamps) has been examined to ensure that the vehicles followed a logical and continuous progression along the road. Significant gaps in time between successive data points, or abrupt, unrealistic jumps in spatial position (which could not be explained by the vehicle’s speed or the road’s geometry), were used as indicators of a non-continuous trajectory;
Cross-Reference with Road Geometry: If the trajectory data suggested that a vehicle deviated significantly from the expected path without any corresponding road features that could explain such a deviation (e.g., intersections, exits), this was considered a lack of continuity.
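The first two checks in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Fix` record, the function name `is_continuous`, and the thresholds `max_gap_s` and `max_speed_kmh` are hypothetical and are not taken from the study; only the 1 Hz emission frequency comes from the text. The third check (cross-reference with road geometry) would need map-matching offsets and is not covered here.

```python
from dataclasses import dataclass

@dataclass
class Fix:
    t: float   # timestamp, in seconds
    s: float   # map-matched distance along the road axis, in metres

def is_continuous(track, min_rate_hz=1.0, max_gap_s=5.0, max_speed_kmh=200.0):
    """Return True if a vehicle track passes the frequency and
    temporal/spatial continuity checks (illustrative thresholds)."""
    if len(track) < 2:
        return False
    track = sorted(track, key=lambda f: f.t)
    duration = track[-1].t - track[0].t
    if duration <= 0:
        return False
    # Signal frequency check: average emission rate over the whole track.
    if (len(track) - 1) / duration < min_rate_hz:
        return False
    for prev, cur in zip(track, track[1:]):
        dt = cur.t - prev.t
        if dt <= 0 or dt > max_gap_s:
            return False   # duplicated timestamp or temporal gap
        implied_kmh = abs(cur.s - prev.s) / dt * 3.6
        if implied_kmh > max_speed_kmh:
            return False   # spatial jump not explainable by vehicle speed
    return True

# Hypothetical tracks: steady 72 km/h, a 480 m jump, and a 28 s gap.
steady = [Fix(t, 20.0 * t) for t in range(5)]
jump = [Fix(0, 0), Fix(1, 20), Fix(2, 500), Fix(3, 520), Fix(4, 540)]
gap = [Fix(0, 0), Fix(1, 20), Fix(2, 40), Fix(30, 600), Fix(31, 620)]
print(is_continuous(steady), is_continuous(jump), is_continuous(gap))
```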
By establishing this comprehensive baseline data, including detailed descriptions of the road segments, fixed monitoring stations, and the processed FCD, the study provides a solid foundation for analyzing speed distribution similarity along the highway segments. This robust dataset enables a precise and accurate evaluation of traffic patterns and their variability, facilitating a deeper understanding of speed distributions under varying road and traffic conditions.
4.2. Speed Distributions Heterogeneity and Similarity Measure
The distance progression measurements for each Floating Vehicle (FV) and their corresponding instantaneous speeds were extracted from the entire database. These measurements, sampled at irregular intervals due to variable signal emission frequencies and vehicle speeds, resulted in non-uniform data points along the road track. To achieve a seamless speed representation across the monitored road segment, a cubic smoothing spline was applied to each vehicle’s distance progression and speed vectors. During the method selection phase, we considered alternatives like kernel smoothing and moving average. However, these were discarded in favor of the cubic smoothing spline. Kernel smoothing showed excessive variability with different bandwidths, and moving averages did not reliably capture local variations. Additionally, the cubic smoothing spline is a robust method widely used for FCD in the literature [55], offering easily implementable solutions with a good balance between data flexibility and curve smoothness control. This piecewise third-degree polynomial function is implemented using Matlab R2020a’s ‘csaps’ function with a smoothing parameter of $2 \times 10^{-4}$. We tested various smoothing parameters and evaluated the fit quality and smoothness of the resulting curves; after these preliminary analyses and cross-validations, this value provided a good compromise, capturing the data’s underlying trend while filtering out high-frequency noise.
Resampling of speeds was performed in virtual counting sections (VSs) along the highway segment, identified at uniform 10-m intervals from the initial point of the monitored stretch. The 10-m interval was chosen as a compromise between a detailed representation of road variations and efficient data use, aiming for a granularity that allows effective analysis and accurate correlation of speed data with road characteristics. Preliminary analyses showed that a 10-m interval captures significant changes in road geometry, such as curvature, better than coarser intervals. Finer intervals would have led to over-detailing without significant advantages.
During our preliminary analysis, we observed that virtual tail sections with fewer than 10 vehicles resulted in excessively high variability in the data due to the small sample size. Thus, virtual tail sections collecting fewer than 10 speed values were excluded to minimize the influence of outliers and anomalies, ensuring that the data from each virtual section is representative of the behavior of a larger group.
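The resampling and filtering steps described above can be sketched as follows. Note the substitution: linear interpolation is used here as a simple stand-in for the cubic smoothing spline, so the sketch shows only the mechanics of evaluating each vehicle's speed profile every 10 m and discarding sections with fewer than 10 values. The function names and the sample track are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
import math

STEP_M = 10         # spacing of the virtual sections (from the text)
MIN_VEHICLES = 10   # minimum number of speed values per section (from the text)

def resample_track(distances_m, speeds_kmh, step=STEP_M):
    """Evaluate one vehicle's speed profile at multiples of `step` metres.
    Linear interpolation stands in for the smoothed speed curve;
    `distances_m` must be strictly increasing."""
    out = {}
    pos = math.ceil(distances_m[0] / step) * step
    while pos <= distances_m[-1]:
        j = bisect_right(distances_m, pos)
        if j == len(distances_m):          # pos coincides with the last fix
            out[pos] = speeds_kmh[-1]
        else:
            i = j - 1
            w = (pos - distances_m[i]) / (distances_m[j] - distances_m[i])
            out[pos] = speeds_kmh[i] + w * (speeds_kmh[j] - speeds_kmh[i])
        pos += step
    return out

def sections_with_enough_vehicles(tracks, step=STEP_M, min_n=MIN_VEHICLES):
    """Group resampled speeds by virtual section and drop sparse sections."""
    per_section = defaultdict(list)
    for distances_m, speeds_kmh in tracks:
        for pos, v in resample_track(distances_m, speeds_kmh, step).items():
            per_section[pos].append(v)
    return {pos: vs for pos, vs in sorted(per_section.items()) if len(vs) >= min_n}

# Hypothetical example: nine full tracks plus one that ends at 17 m.
full = ([3, 17, 42], [60, 74, 64])
short = ([3, 17], [60, 74])
kept = sections_with_enough_vehicles([full] * 9 + [short])
print(sorted(kept))   # only the section reached by all ten vehicles remains
```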
Figure 3 and Figure 4 display the smoothing splines for VSs with a minimum of 10 FVs in two segments of SS14: segment 218 (km 12,660 to km 20,310, DESC direction) and segment 3191 (km 4000 to km 13,470, ASC direction).
To validate the overall speed distribution across all virtual sections, the distribution obtained from the resampling process was compared to that from all original FV speed measurements. As illustrated by the histograms in Figure 5 and the descriptive statistics in Table 1, the probability distributions from the resampled 10-m VSs were consistent with the original FCD distributions. This confirmed that the smoothing splines effectively maintained the continuity and smoothness of individual vehicle speed tracks without distorting the original data.
The SS14 highway segments 218 and 3191 are equipped with fixed monitoring devices (Control Units, CUs). The device on segment 218 is located at kilometer marker 17,085, while the device on segment 3191 is at kilometer marker 9047. Figure 6 illustrates the speed distributions of vehicles recorded by these devices, which operated continuously during three distinct periods: August 2018, February 2019, and May 2019. Table 2 provides the descriptive statistics of the speeds detected in both travel directions. It is important to note that the Floating Car Data (FCD) for these segments pertains to the same monitoring periods.
5. Results and Discussion
With an extensive dataset of speed values recorded by the Control Units (CUs), the similarity between the probability distributions of these data and those from the Virtual Sections (VSs) can be assessed. Simply observing histograms or comparing descriptive statistics (Figure 5 and Figure 6, Table 1 and Table 2) does not provide a precise answer. Even with numerous statistics on centrality, dispersion, shape, symmetry, and percentiles, determining how closely the CU-sampled distributions match those sampled by mobile devices along the highway remains challenging.
To address this concisely, the 1-Wasserstein distance can be used to measure the similarity between pairs of histograms $P$ and $Q$. For histograms with congruent binning ($n$ bins), the normalized 1-Wasserstein distance $W_1(P, Q)$ ranges from 0 (complete overlap and total similarity) to 1 (maximum dissimilarity).
Using the CU-recorded speed distribution (histogram $Q$ from Figure 6) as the reference, the similarity to any other distribution $P$ can be measured by calculating $W_1(P, Q)$. Table 3 presents the $W_1(P, Q)$ values for different segments and travel directions, with $Q$ representing CU data and $P$ representing mobile sensor data.
Specifically, two choices of $P$ are considered: the probability distribution from the entire resampled dataset in the VSs (Figure 5b,d), and the distribution of the resampled FCD data in the VSs near the CU. Consistent binning with $n = 31$ bins is used, covering speed classes in 5 km/h intervals from 0 to 150 km/h plus an additional interval for speeds over 150 km/h.
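The comparison described above can be sketched in Python. The binning follows the text: 30 classes of 5 km/h from 0 to 150 km/h plus one class above 150 km/h, giving 31 bins. The normalization is an assumption: the distance is computed with unit spacing between adjacent bins and divided by its maximum value, the number of bins minus one, so that it falls between 0 and 1. The exact normalization used in the paper is defined in a section not reproduced here, and the function names are hypothetical.

```python
N_BINS = 31       # 30 classes of 5 km/h from 0 to 150, plus one class above 150
BIN_WIDTH = 5.0   # km/h

def speed_histogram(speeds_kmh):
    """Relative-frequency histogram over the 31 speed classes.
    Classes are treated as [a, b); 150 km/h and above go to the last class."""
    counts = [0] * N_BINS
    for v in speeds_kmh:
        idx = min(int(max(v, 0.0) // BIN_WIDTH), N_BINS - 1)
        counts[idx] += 1
    total = len(speeds_kmh)
    return [c / total for c in counts]

def normalized_w1(p, q):
    """1-Wasserstein distance between two histograms on the same bins,
    computed as the summed absolute difference of their cumulative
    frequencies and divided by its maximum value (number of bins - 1)."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same binning")
    cum_p = cum_q = total = 0.0
    for pi, qi in zip(p, q):
        cum_p += pi
        cum_q += qi
        total += abs(cum_p - cum_q)
    return total / (len(p) - 1)

# Hypothetical samples: a reference (CU-like) and a probe (FCD-like) sample.
reference = [68, 72, 75, 77, 80, 82, 84, 88, 91, 95]
probe = [66, 71, 74, 78, 79, 83, 85, 87, 93, 99]
d = normalized_w1(speed_histogram(probe), speed_histogram(reference))
print(f"normalized W1 = {d:.4f}")
```

Under this convention, shifting an entire distribution by one 5 km/h class yields a distance of 1/30, and placing all mass in the first class of one histogram and in the last class of the other yields 1.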
The primary objective of this study is to analyze the evolution of speed distributions along highway segments, which has not been addressed thus far. Vehicular speed distributions may fluctuate due to varying conditions. To examine this variation, speed values in virtual sections along the two segments were considered, obtained by resampling speed values for each vehicle every 10 m using smoothing splines.
Figure 7 and Figure 8 present boxplots of speed data across these 10-m subsections. Each box represents the interquartile range (IQR) of speeds, with the median speed indicated by a line within the box. Whiskers extend to the furthest points within 1.5 times the IQR from the quartiles, and data beyond these whiskers are marked as outliers.
The blue band in the graphs represents the IQR, the continuous red line indicates the median, the dashed gray band shows the whiskers’ trend, and the pinpointed red values are outliers. The non-uniformity of the speed distribution among the different VSs can be inferred by observing the variability in the box plots as follows:
Box Length: Longer boxes indicate a wider range of speed values, so a variation in the box length suggests a variation in the range of speeds between sections;
Whisker Length: Longer whiskers show that there are more extreme speed values, so a variation in the whisker length between sections indicates a variation in the extreme speed values between them;
Position of the Median Line: A median line not centered within the box implies skewness in the data, indicating that the speed distribution is not symmetrical. Therefore, a variation in the position of the median line relative to the box between sections indicates a variation in the symmetry of the distribution among the sections;
Presence of Outliers: A higher number of outliers suggests more variability and potential anomalies in the speed distribution, so a variation in the outliers, in terms of number and position, indicates a variation in the values identified as anomalies among the different sections.
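The boxplot elements listed above can be computed for a single virtual section as in the following sketch. It uses the 1.5 × IQR whisker rule stated in the text; the quartile estimator (`method="inclusive"` in Python's `statistics.quantiles`) and the sample speeds are assumptions, since the paper does not specify the estimator used for its figures.

```python
import statistics

def boxplot_stats(speeds):
    """Quartiles, whisker ends and outliers for one virtual section,
    with whiskers at the furthest points within 1.5 * IQR of the quartiles."""
    ordered = sorted(speeds)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }

# Hypothetical speeds (km/h) resampled in one 10-m section.
section = [60, 70, 72, 75, 78, 80, 82, 85, 90, 130]
stats = boxplot_stats(section)
print(stats)
```

Comparing these dictionaries across consecutive sections gives the box length, whisker length, median position, and outlier count that the list above interprets qualitatively.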
The boxplot series reveals significant variability across sections: median speeds fluctuate, indicating influences from factors such as road geometry, local traffic density, and differing regulations. Higher median speeds suggest road segments that permit or induce faster driving, while lower medians indicate areas with speed reductions due to geometry, interferences, lower speed limits, calming measures, or higher congestion. Thus, analyzing speed distribution variations requires considering the actual driving experiences of vehicles along the highway segment.
Thus, non-homogeneity in the speed distribution can be inferred by qualitatively observing the variability of the boxes and whiskers in the two boxplots. However, quantifying the dissimilarity in speed behaviors remains challenging.
Figure 9 and Figure 10, which display trends of selected statistical values (including the mean, median, standard deviation, mean ± standard deviation, skewness, and kurtosis), also confirm this non-homogeneity in behavior. In Figure 9, both the median (red line) and mean (blue line) exhibit noticeable drops between 1.3 and $1.4 \times 10^4$ km, between 1.78 and $1.83 \times 10^4$ km, and over $2.0 \times 10^4$ km, reflecting significant changes in central speed values. Additional fluctuations along the x-axis indicate variability in driving behavior. The standard deviation (pink line) shows substantial variations in the same locations, highlighting increased speed dispersion in these sections. The mean ± standard deviation (gray lines) further emphasizes changes in speed distribution around the mean at these key points. Skewness (pink line) is generally close to zero but deviates around the key points and in other virtual sections along the axis, suggesting occasional asymmetry in speed distribution. Kurtosis (blue line) shows significant changes, especially coinciding with skewness variations, indicating variations in the peakedness of the speed distribution with more extreme values present in sections with high values.
Similar considerations can be made by examining the trends in Figure 10. The median (red line) and mean (blue line) display notable fluctuations along the segment, especially around 0.53–$0.74 \times 10^4$ km, signifying significant changes in central speed values. The standard deviation (pink line) exhibits marked increases at these same points, highlighting greater speed dispersion and variability in these sections. The mean ± standard deviation (gray lines) underscores these changes, illustrating how speed values spread out from the mean. Skewness (pink line) tends to stay close to zero but shows deviations, indicating that the speed distribution sometimes shifts asymmetrically. Kurtosis (blue line) displays notable peaks and troughs, especially where skewness changes, signifying variations in the peakedness of the speed distribution.
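For reference, the shape indices discussed above can be computed per virtual section as in this sketch. It uses the moment-based definitions (third and fourth standardized central moments, with kurtosis not reduced by 3); whether the paper's figures use these or bias-corrected variants is not stated, so this choice is an assumption, as are the sample values.

```python
import math

def shape_indices(speeds):
    """Mean, population standard deviation, skewness and kurtosis
    from central moments (kurtosis of a normal distribution is 3)."""
    n = len(speeds)
    mean = sum(speeds) / n
    m2 = sum((v - mean) ** 2 for v in speeds) / n
    m3 = sum((v - mean) ** 3 for v in speeds) / n
    m4 = sum((v - mean) ** 4 for v in speeds) / n
    if m2 == 0:
        raise ValueError("all speeds are identical; shape indices are undefined")
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Hypothetical sections: one symmetric, one with a few slow vehicles.
symmetric = shape_indices([70, 75, 80, 85, 90])
left_tailed = shape_indices([30, 78, 80, 82, 84, 85, 86])
print(symmetric["skewness"], left_tailed["skewness"])
```

A section where a few vehicles slow sharply, for example near an intersection, shows negative skewness and a higher kurtosis than a section with symmetric speeds.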
Although various statistical indices are available to examine different aspects of the experimental distributions, there is no effective synthesis method for determining homogeneity from the perspective of the probabilistic distribution of speeds.
In this context, the normalized 1-Wasserstein distance can be used to measure the similarity of the speed distribution along the two road segments considered as examples.
Figure 11 shows the trend of the series of histograms aggregated with 31 bins (speed classes in 5 km/h intervals from 0 to 150 km/h plus an additional interval for speeds over 150 km/h) that represent the vehicular speed distribution trends in the different VSs identified in the two segments, 218 DESC and 3191 ASC.
Having already confirmed the strong similarity between the CU-recorded speed distribution and the speed distribution from resampled FCD data in the VSs near the CU, the histogram of the latter can be taken as the benchmark for the similarity analysis of the highway segment. Consequently, the normalized 1-Wasserstein distance is computed along the entire length of the highway segment between the speed histogram of each VS, which changes as the VS progresses, and the histogram in the VS close to the CU location.
Figure 12 and Figure 13 describe the segment features in terms of road geometry and the surrounding context. The normalized 1-Wasserstein distance was superimposed on the same graph with the curvature diagram to assess the influence of winding elements. The curvature trend was calculated from the radius of curvature $R(s)$ of the highway axis at a given point $s$ by the formula $\kappa(s) = 1/R(s)$. The graphs also show the positions along the curvilinear abscissa of various elements such as intersections and access to private and public areas [56]. Additionally, each segment includes the localization of the CU as a reference term. The symbols explained in the legend (blue cross symbol for Intersections, blue diamond symbol for Lateral accesses, and green star for Control units) are positioned along the x-axis based on their locations along the kilometric distances.
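The text defines curvature as the reciprocal of the radius of curvature of the highway axis but does not say how the radius was obtained. One common way to estimate it from a digitized axis, shown here purely as an assumption, is to take the circle through three consecutive axis points: its curvature equals four times the triangle area divided by the product of the three side lengths. The function name and coordinates are hypothetical.

```python
import math

def curvature_from_points(p1, p2, p3):
    """Curvature (1/R, in 1/m) of the circle through three planar points
    given in metres; returns 0.0 for collinear points (straight alignment)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a = math.dist(p1, p2)
    b = math.dist(p2, p3)
    c = math.dist(p1, p3)
    if a == 0 or b == 0 or c == 0:
        raise ValueError("points must be distinct")
    twice_area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    return 2.0 * twice_area / (a * b * c)   # equals 4 * area / (a * b * c)

# Three points on a circle of radius 100 m, and three points on a tangent.
k_curve = curvature_from_points((100, 0), (0, 100), (-100, 0))
k_straight = curvature_from_points((0, 0), (10, 0), (20, 0))
print(k_curve, k_straight)
```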
Examining the graph in Figure 12, which pertains to segment 218 DESC, it is evident that the red line (the normalized 1-Wasserstein distance) exhibits significant peaks at points where the curvature is more pronounced, specifically around 1.33 and $2.02 \times 10^4$ km. These peaks indicate that changes in road curvature have a considerable impact on the similarity of speed distributions. Sharp curves tend to cause variations in driving behavior, which in turn affects the similarity between speed distributions in these areas compared to the reference section.
Throughout the segment, lateral access points and intersections are scattered. Although lateral access points do not show a strong correlation with the peaks or troughs in the red line, intersections sometimes cause slight fluctuations. This suggests that intersections introduce minor disruptions to speed consistency, but their impact is not as significant as road curvature. The flatter portions of the red line, between 1.4 and $1.7 \times 10^4$ km, correspond to sections with fewer intersections and lateral accesses, coupled with relatively stable road curvature. In these segments, the speed distribution aligns more closely with the reference section, indicating that less complex road geometry and fewer interruptions contribute to higher similarity in speed distributions.
Turning to the graph in Figure 13, related to segment 3191 ASC, the red line (the normalized 1-Wasserstein distance) shows significant peaks at points of high road curvature and near lateral accesses and intersections. Between 0.5 and $0.8 \times 10^4$ km, where the road is widened, the red line exhibits notable fluctuations, suggesting that changes in road width significantly impact the speed distribution. Similarly, regions with straight road segments and the absence of lateral accesses and intersections present the greatest similarity with the reference section. These areas likely allow for more uniform driving speeds, enhancing the similarity in speed distributions.
Following the analysis of the graphs in Figure 12 and Figure 13, Table 4 provides a numerical summary of the average normalized 1-Wasserstein distance for specific sub-segments. Segment 218 (DESC direction) and segment 3191 (ASC direction) are each evaluated at three distinct test sub-segments, and the average values highlight the variability in speed distribution similarity. These numerical insights provide a quantitative perspective on how various road features influence the similarity of speed distributions, complementing the visual analysis from the graphs.
Overall, the normalized 1-Wasserstein distance tends to peak at points where there are significant changes in road curvature, road widening, lateral accesses, and intersections. This pattern indicates that these factors are crucial in influencing the homogeneity of speed probability distributions along the road segment. The analysis highlights the critical role of road geometry in shaping speed distribution probability. Sharp curves and high curvature areas consistently disrupt speed patterns, suggesting that road design must account for these features to maintain consistent driving behavior.
Intersections, while causing slight fluctuations, have a less significant impact compared to road curvature. Lateral access points also introduce some variability in speed distributions, though their effect is minimal. The widening of the road, as seen in segment 3191 ASC, significantly impacts speed distributions, suggesting that road widening projects need to consider the potential for increased variability in driving behavior, which can affect overall traffic flow and safety. Regions with straight road segments and fewer interruptions (intersections and accesses) exhibit the highest homogeneity in speed distributions. This implies that simpler road designs may contribute to more uniform driving speeds, potentially enhancing traffic efficiency and safety [57].
Future Developments of the Research
As we have seen in
Section 3.4,
can be easily visualized as the area between two CDFs,
and
. Considering
with the Euclidean norm in Equation (4), the
-Wasserstein (
W) distance
can be defined, also known as the Fréchet distance [
49]:
Arroyo and Maté [34] and Balzanella and Irpino [23] provide the explicit form of $d_W$ for two histograms. Irpino and Romano [58] show a particularly useful property of $d_W$: its equivalence with a three-term decomposition based on the differences between statistics of the two distributions $F$ and $G$, namely location, spread, and shape. In fact, the squared value $d_W^2$, a natural extension of the Euclidean distance from point data to distribution data [59], can be decomposed as the sum of the squared difference of the means (i.e., location), the squared difference of the standard deviations (i.e., spread), and a residual term, which can be assumed to represent a shape distance between the two distributions:
$$d_W^2(F, G) = (\mu_F - \mu_G)^2 + (\sigma_F - \sigma_G)^2 + 2 \sigma_F \sigma_G \left( 1 - \rho_{QQ}(F, G) \right) \qquad (7)$$
where $\mu_F$ and $\mu_G$ are the means of $F$ and $G$, $\sigma_F$ and $\sigma_G$ are the standard deviations of $F$ and $G$, and $\rho_{QQ}(F, G)$ is the Pearson correlation of the points in the Quantile-Quantile plot of $F$ and $G$.
Thus, these three terms can be used to assess the similarity or dissimilarity between two distributions $F$ and $G$ in a useful and distinctive manner with respect to location, spread, and shape differences.
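As a minimal sketch of the three-term decomposition, assuming two equally sized samples of spot speeds (the function name and the sample values are illustrative and not taken from the study), the location, spread, and shape terms can be computed from the sorted samples and checked against the squared distance computed directly from matching quantiles. Population (not sample) standard deviations are assumed here, since that choice makes the identity exact.

```python
import math


def w2_squared_decomposition(x, y):
    """Return (location, spread, shape) terms of the squared L2 Wasserstein distance.

    Assumes equally sized samples, so that sorted values are matching quantiles.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equally sized samples")
    xs, ys = sorted(x), sorted(y)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Population standard deviations (divide by n, not n - 1).
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys) / n)
    # Covariance of the Q-Q points; equals rho_QQ * sd_x * sd_y.
    cov_qq = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    location = (mean_x - mean_y) ** 2
    spread = (sd_x - sd_y) ** 2
    # 2 * sd_x * sd_y * (1 - rho_QQ), written without dividing by sd_x * sd_y
    # so that constant samples do not cause a division by zero.
    shape = 2.0 * (sd_x * sd_y - cov_qq)
    return location, spread, shape


def w2_squared_direct(x, y):
    """Squared L2 Wasserstein distance from matching quantiles (equal sizes)."""
    xs, ys = sorted(x), sorted(y)
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


# Hypothetical speeds (km/h) at two sections.
speeds_a = [60.0, 80.0, 100.0, 120.0]
speeds_b = [70.0, 80.0, 90.0, 130.0]
location, spread, shape = w2_squared_decomposition(speeds_a, speeds_b)
print(location, spread, shape, location + spread + shape)
print(w2_squared_direct(speeds_a, speeds_b))
```

In this example the shape term dominates, which suggests that the two hypothetical samples differ mainly in how their quantiles are arranged rather than in mean or standard deviation.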
Moreover, when the two distributions $F$ and $G$ are not explicitly given, as for experimental distributions represented as step functions (commonly seen in similarity/dissimilarity analysis of vehicle speed distributions along a road axis), the measurement of similarity according to the Wasserstein distance can still be carried out. This avoids the need to represent experimental distribution functions as histograms, which involves challenges such as choosing an appropriate origin and number of bins. However, in the $L_2$ case, the three-term decomposition allows us to directly represent the square of the Wasserstein distance using the empirical versions of the corresponding quantities in Equation (7).
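The histogram-free computation described above can be sketched as follows, assuming that each experimental distribution is given only as a raw sample, possibly of a different size. The function name and data are hypothetical. The empirical quantile functions are step functions on $[0, 1]$, so the squared $L_2$ Wasserstein distance can be integrated exactly, piece by piece, between consecutive breakpoints, with no binning choices involved.

```python
import math
from fractions import Fraction


def w2_squared_step(x, y):
    """Squared L2 Wasserstein distance between two empirical step distributions.

    Works for samples of different sizes and does not require histograms.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # Breakpoints of the two empirical quantile functions on [0, 1].
    # Exact fractions avoid near-duplicate floating-point breakpoints.
    cuts = sorted(
        {Fraction(i, n) for i in range(n + 1)}
        | {Fraction(j, m) for j in range(m + 1)}
    )
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2  # both quantile functions are constant on (lo, hi)
        qx = xs[math.floor(mid * n)]  # empirical quantile of x at level mid
        qy = ys[math.floor(mid * m)]  # empirical quantile of y at level mid
        total += float(hi - lo) * (qx - qy) ** 2
    return total


# Hypothetical speeds (km/h): 2 observations at one section, 3 at another.
print(w2_squared_step([80.0, 100.0], [90.0, 90.0, 90.0]))
```

When the two samples have the same size, this reduces to the mean squared difference between sorted values, so the same function can serve both cases.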
These aspects related to the 2-Wasserstein distance and its representation through decomposition are suggested as points for further investigation. They can be explored in future research activities and application tests, aiming to achieve an effective representation of the homogeneity or heterogeneity of speed distribution along rural highway segments.
6. Conclusions
This study analyzes the heterogeneity of speed distributions along secondary rural road segments using Floating Car Data (FCD) and data from fixed control stations. The goal is to provide a useful tool to better understand how the physical and geometric characteristics of roads influence speed behavior, which is crucial for road safety and traffic management. The presence of at-grade intersections that handle significant traffic volumes necessitates the inclusion of specialized lanes, such as left-turn lanes. These features likely contribute to more consistent speeds due to the structured flow of traffic and the availability of dedicated lanes for turning movements. Conversely, a reduced lane width and the absence of specialized lanes can lead to greater variability in speed distribution, as vehicles may need to adjust their speeds more frequently due to the presence of direct access points and narrower lanes.
This study highlights the benefits of using advanced similarity measures to capture the variability and heterogeneity in traffic speed distributions. These measures facilitate a more comprehensive analysis of traffic patterns, which is essential for informed highway design, performance and safety verification, and regulatory compliance.
To achieve this, continuous speed and location data were collected from GPS-equipped vehicles (FCD) and validated against radar control units. The normalized 1-Wasserstein distance, a non-parametric similarity measure, was employed to compare speed distributions in virtual sections placed at 10-m intervals along the road. In each virtual section, vehicle speeds were obtained by resampling the smoothing splines that reconstruct the speed-position profiles of each equipped vehicle. The findings demonstrate that the normalized 1-Wasserstein distance effectively captures speed distribution variability. This allows for a detailed examination of how road features, such as curvature, intersections, and access points, influence speed behavior. By utilizing the normalized 1-Wasserstein distance, the analysis provides a concise and effective metric for evaluating speed distribution similarities across various road segments. This approach offers a more comprehensive analysis than traditional summary statistics or representations.
Overall, the findings underscore the potential of these techniques to enhance traffic management strategies and improve road safety by providing a deeper insight into the dynamics of vehicular speeds across different road environments. The practical implications of this research are significant, as the application of the normalized 1-Wasserstein distance can directly inform road safety measures, traffic regulation policies, and the design of more effective traffic management systems.
Furthermore, the study proposes future exploration of the Wasserstein distance with the Euclidean L2 norm. This approach decomposes the measure into three components of the experimental distributions: location (difference of the means), spread (difference of the standard deviations), and shape (correlation of the quantiles). Continued research using this decomposition could provide a more detailed assessment of dissimilarity, moving beyond simple histogram characterization toward a more refined understanding of the differences in speed distributions.