Article

Coverage-Based Framework for Estimating Total Vehicle Travel Distance Using Point-to-Point Trajectory Data

Department of Highway & Transportation Research, Korea Institute of Civil Engineering and Building Technology, Goyang-Si 10223, Republic of Korea
Appl. Sci. 2025, 15(19), 10325; https://doi.org/10.3390/app151910325
Submission received: 27 August 2025 / Revised: 18 September 2025 / Accepted: 21 September 2025 / Published: 23 September 2025

Abstract

Vehicle kilometers traveled (VKT) is a critical metric in transportation and environmental research. However, conventional VKT estimation approaches frequently fail to capture the complexity of route selection and spatiotemporal dynamics of individual road users. This study presents a framework for accurately estimating the total VKT using high-resolution trajectory data obtained from a commercial navigation system. To address the structural limitations of conventional origin-destination (OD) matrix-based models, such as the modifiable areal unit problem, representative routes were identified based on cumulative travel distance coverage. A novel metric, coverage of estimated travel (CET), was introduced to quantify the explanatory capacity of these routes in approximating total travel distance. Representative routes were selected to maximize CET, and the resulting VKT estimates were validated against national statistical yearbook data. Robustness was further evaluated using mean absolute percentage error, correlation analysis, paired t-tests, and bootstrap-based confidence intervals. The results indicated that as few as five representative routes accounted for over 80% of the total estimated VKT, exhibiting strong agreement with the national statistics after temporal adjustment. These findings demonstrate that trajectory data can serve as a practical alternative to traditional methods, offering higher spatial resolution and enabling dynamic traffic analyses that support transportation policy and environmental planning.

1. Introduction

Vehicle kilometers traveled (VKT) is a vital metric in various transportation and environmental fields, including traffic planning, road maintenance, and greenhouse gas emission estimation. Traditionally, VKT has been estimated either by multiplying observed traffic volumes by average travel distances or from the outputs of four-step transportation demand models. These estimates have been widely adopted for macroscopic statistical analyses at the national and regional scales [1,2]. However, such approaches often fail to capture the complex route choices and spatiotemporal dynamics of individual road users. Consequently, they offer limited support for precise decision-making, particularly in microscopic urban-level analyses and in the implementation of eco-friendly vehicle policies. Accurate VKT estimation is essential for effective transportation planning and environmental management: inaccurate estimates can lead to misallocated resources, misguided policy decisions, unreliable greenhouse gas emission calculations, and inefficient infrastructure investments, all of which impede progress toward sustainable mobility. Reliable VKT estimation methods are therefore needed to support evidence-based decision-making in transportation and environmental policy.
Previous studies have frequently relied on zone-based origin-destination (OD) data, which are highly susceptible to the modifiable areal unit problem (MAUP), whereby analytical outcomes are distorted by the choice of zone size and shape [3,4,5,6]. To mitigate these limitations, extensive research has sought to estimate VKT more accurately using vehicle trajectory data collected from commercial navigation services. High-resolution trajectory data, which represent the actual travel paths of individual vehicles, provide greater accuracy and spatial resolution than conventional average-based estimates and are not subject to the MAUP [7,8,9].
Despite these advantages, the use of trajectory-based data presents several challenges. First, due to the incomplete penetration rate of probe vehicles or global positioning system (GPS)-enabled devices, ensuring adequate representation of the total vehicle traffic volume is difficult. Second, research on methodologies for minimizing information loss during the processing of large-scale trajectory data, as well as for statistically validating the reliability of estimation results, remains insufficient. Most prior studies have primarily concentrated on analyzing traffic volumes or spatiotemporal patterns [10,11], whereas a comprehensive framework for evaluating the reliability of total VKT estimates—particularly one that explicitly accounts for inherent data uncertainties—has not yet been developed.
To address these challenges, the present study introduces a comprehensive framework for estimating total VKT using high-resolution trajectory data. To account for the data uncertainties that previous studies have insufficiently considered, a novel metric termed the coverage of estimated travel (CET) is proposed to quantify how much of the total travel distance a set of representative routes can explain. The statistical reliability of the resulting estimates is then evaluated through bootstrap resampling and sensitivity analysis. By combining CET with these validation procedures, the study extends conventional VKT estimation methods and establishes a methodology that delivers statistically reliable estimates suitable for practical applications, even when trajectory data are limited.

2. Literature Review

2.1. Traditional Zone-to-Zone VKT Estimation

Traditional VKT estimation relies on zone-to-zone OD matrices. This approach predicts total VKT either from the assignment results of large-scale transportation demand models or by multiplying traffic volumes by average travel distances [12,13]. These methods primarily use data collected from fixed sensors, such as loop detectors, and have evolved to estimate OD demand inversely using optimization techniques. Recent research has focused on estimating dynamic OD matrices in real time to capture fluctuating traffic conditions more efficiently [14,15,16,17]. Despite these advancements, traditional models face several limitations. For instance, they often assess the attractiveness of a route solely by its total cost, neglecting driver preferences such as avoiding “local detourness”, that is, detours occurring on subsections of a route [18]. Similarly, Tang et al. [19] used a Gaussian mixture model to show that, although reconstructing route trajectories from OD information can improve traffic demand estimation in congested networks, accurately capturing the evolution of urban traffic demand and road flow remains challenging. OD matrix-based methodologies are intrinsically limited because they consider only trip origins and destinations and therefore fail to reflect actual route choice behavior or spatiotemporal variations within zones. Moreover, as discussed earlier, these methods are subject to the MAUP, in which analytical results can vary significantly with the size and shape of the zones, potentially leading to fundamental distortions.

2.2. High-Resolution Trajectory Data Utilization Trends

Recent studies have investigated route analysis using high-resolution mobility data [20,21,22], aiming to address the limitations of traditional OD matrices and to capture actual travel behavior more accurately. For instance, Toole et al. [23] proposed the “path most traveled” approach, which identifies the most frequently used routes from taxi GPS data, highlighting its potential to reflect real user behavior accurately. Chong [24] employed high-resolution vehicle trajectory data to detect unstable traffic flows on highways in real time and to estimate the influence zones of merging/diverging sections and accident-prone areas. Kim et al. [25] developed a methodology for determining representative routes connecting specific OD pairs from large-scale vehicle trajectory data, providing a foundation for extracting meaningful travel patterns. Liu et al. [26] proposed a unified route representation learning framework for multimodal transportation recommendations; their method addresses a key gap by simultaneously considering spatiotemporal autocorrelations and semantic coherence in route data and by leveraging a hierarchical multitask learning framework to improve recommendation effectiveness.

2.3. Diverse Applications of Trajectory-Based Analysis

Research using trajectory data has extended beyond VKT analysis into several other domains. For example, bi-criterion dynamic user equilibrium models have been developed to evaluate dynamic road pricing strategies by capturing heterogeneous user behaviors, including different values of time and path choices in response to time-varying tolls [27]. High-resolution trajectory data have been used to study route optimization techniques and their logistical and economic benefits [28], as well as to assess road safety in real time by integrating traffic states with collision risk information [29]. In addition, trajectory data have been leveraged to detect anomalous driving patterns, revealing fraudulent routes or adverse traffic events through statistical analysis of behavioral patterns [30]. These data are also critical for environmental applications; for instance, studies have estimated urban traffic emissions by reconstructing spatiotemporal patterns from multi-source datasets such as taxi GPS and license plate recognition records [31]. Similarly, methods have been proposed to estimate vehicular fuel consumption and emissions over large areas, particularly complex signalized corridors, by first reconstructing trajectories for the entire traffic population and then applying emission factor models [32]. Wang et al. [33] proposed a comprehensive data-processing pipeline for trajectory data that enables the calculation of various mobility performance indices, including vehicle delay and space-mean speed, providing robust and scalable tools for evaluating urban traffic networks efficiently from diverse perspectives. Collectively, these studies underscore the broad potential of trajectory data analysis, grounded in the spatiotemporal resolution of travel paths and observed user behavior, to foster innovation in multiple fields beyond transportation.
In this study, the data-cleaning process removed incomplete OD records, trajectories with unrealistic speeds (>150 km/h), and duplicate samples. For map matching, raw GPS points were aligned to the road network using the PostGIS spatial functions ST_MapMatch and ST_DWithin, which ensured that trajectory points were projected onto the nearest network links within a 30 m tolerance. OD aggregation was implemented in PostGIS through SQL queries that grouped trajectories by OD pair at the grid-cell level and summed trip counts. This workflow allowed large-scale trajectory data to be handled efficiently and ensured reproducibility within a standardized spatial database environment.
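The cleaning rules and the 30 m snapping tolerance described above can be illustrated with a small, self-contained Python sketch. It is not the authors' PostGIS workflow: it assumes projected planar coordinates in metres, represents each network link as a single straight segment, and uses a hypothetical record layout (`trip_id`, `origin`, `destination`, `distance_km`, `duration_h`); duplicate samples are assumed to share a `trip_id`.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates in metres (assumption)

MAX_SPEED_KMH = 150.0    # unrealistic-speed cut-off from the text
SNAP_TOLERANCE_M = 30.0  # map-matching tolerance from the text


@dataclass(frozen=True)
class Trip:
    # Hypothetical record layout, for illustration only
    trip_id: str
    origin: Optional[Point]
    destination: Optional[Point]
    distance_km: float
    duration_h: float


def clean(trips: List[Trip]) -> List[Trip]:
    """Drop incomplete OD records, unrealistic speeds, and duplicate samples."""
    seen, kept = set(), []
    for t in trips:
        if t.origin is None or t.destination is None or t.duration_h <= 0:
            continue  # incomplete OD record or unusable duration
        if t.distance_km / t.duration_h > MAX_SPEED_KMH:
            continue  # average speed above the cut-off
        if t.trip_id in seen:
            continue  # duplicate sample (assumed to share trip_id)
        seen.add(t.trip_id)
        kept.append(t)
    return kept


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the foot point stays on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def snap_to_link(point: Point, links: Dict[str, Tuple[Point, Point]],
                 tolerance: float = SNAP_TOLERANCE_M) -> Optional[str]:
    """Return the nearest link id within the tolerance, or None if none qualifies."""
    best_id, best_d = None, tolerance
    for link_id, (a, b) in links.items():
        d = point_segment_distance(point, a, b)
        if d <= best_d:
            best_id, best_d = link_id, d
    return best_id
```

A point farther than 30 m from every link is left unmatched (`None`), which mirrors the tolerance-based behaviour described for `ST_DWithin`; real road links are polylines, so a production version would test every segment of each link.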

2.4. Limitations in Verifying the Reliability of Trajectory-Based Analysis

Despite the increasing use of high-resolution trajectory data, methods for rigorously verifying the reliability of analysis results remain insufficient. Existing studies have proposed various criteria for evaluating the representativeness of outcomes and developed algorithms for efficiently identifying representative routes in large-scale trajectory datasets; these algorithms often apply clustering techniques to construct representative route sets for route choice models [34,35]. For example, probe vehicle-based OD estimation models have been devised to correct prior OD matrices while accounting for heterogeneous probe vehicle penetration rates, and the strengths and limitations of such approaches have been examined [36]. Methods for recovering day-to-day dynamic OD flows from limited observations have been developed to address the data sparsity caused by low penetration rates and to ensure consistency between estimates [37]. Although these efforts represent promising advances, they generally lack rigorous validation of the statistical errors and uncertainties inherent in the data, and they do not provide a comprehensive framework for assessing how well trajectory-based estimates agree with actual field measurements. In particular, the reliability of trajectory-based analyses depends heavily on the probe vehicle penetration rate, yet quantitative studies validating this dependency remain scarce. Overall, the absence of a validation system that systematically assesses potential errors and discrepancies between trajectory-based analyses and real-world data continues to hinder their practical application. In a different domain, recent fault detection studies, specifically the Shrinkage Mamba Relation Network (SMRN) with out-of-distribution (OOD) data augmentation and its zero-faulty-sample variant, combine relation network architectures with synthetic OOD data to achieve robust detection under severe data scarcity.
These approaches highlight the potential of leveraging OOD data augmentation to compensate for missing samples in critical parts of the feature space [38,39].
By analogy, similar strategies could be applied in trajectory-based estimation frameworks. For example, OOD trajectory generation methods—perhaps grounded in network traffic models or generative models—could supplement low-penetration areas to enhance VKT reliability. We propose exploring such data augmentation or relation-informed modeling as part of future methodological advancements.

2.5. Research Gap and Contribution

Based on a synthesis of the existing literature, research utilizing high-resolution trajectory data is actively advancing; however, several important gaps remain. First, a path-based VKT estimation method is required that can overcome the structural limitations inherent in existing OD-based analyses, including MAUP. Second, a universal and quantitative reliability metric is lacking for evaluating how well the VKT estimates derived from limited trajectory samples represent the entire population. Third, methodologies for rigorously assessing and correcting the effects of inherent uncertainties—such as incomplete data penetration and spatiotemporal discrepancies—on the final estimates remain insufficient.
This study aimed to address these limitations by developing a framework for VKT estimation based on high-resolution trajectory data, designed to avoid the MAUP issue. Specifically, this study bridges these research gaps and enhances academic rigor by introducing the CET metric to quantify the explanatory power of the estimates. Subsequently, a comprehensive methodology is established to validate the statistical reliability of the results through bootstrap resampling and sensitivity analysis.
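Bootstrap resampling, one of the reliability checks named above, can be sketched as a percentile confidence interval for a total VKT. This is an illustrative assumption rather than the authors' procedure: it resamples hypothetical per-record VKT values (route distance × trip count) with replacement, and the replicate count, the 95% level, and the function name `bootstrap_total_ci` are arbitrary choices.

```python
import random
from typing import List, Tuple


def bootstrap_total_ci(values: List[float], n_boot: int = 2000,
                       alpha: float = 0.05, seed: int = 42
                       ) -> Tuple[float, Tuple[float, float]]:
    """Point estimate and percentile bootstrap CI for the sum of values."""
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(values)
    totals = []
    for _ in range(n_boot):
        resample = rng.choices(values, k=n)  # draw n records with replacement
        totals.append(sum(resample))
    totals.sort()
    # Percentile method: take the alpha/2 and 1 - alpha/2 order statistics
    lower = totals[int((alpha / 2) * n_boot)]
    upper = totals[int((1 - alpha / 2) * n_boot) - 1]
    return sum(values), (lower, upper)
```

The percentile method is the simplest bootstrap interval; bias-corrected variants exist and may be preferable when the resampled totals are skewed.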

3. Methodology

3.1. Framework Overview

To overcome the limitations of conventional zone-based OD and deterministic assignment methods, this study proposes a novel data-driven framework that leverages high-resolution point-to-point vehicle trajectory data from commercial navigation services. Traditional approaches often fail to capture the dynamic variability and diversity of actual travel routes because of spatial aggregation. In contrast, the proposed framework estimates total VKT by directly analyzing real driving trajectories. Its key component is a metric that quantifies the contribution of a representative route set to the total VKT estimate; the reliability of this metric is evaluated in terms of the accuracy and robustness of the resulting travel demand analysis. The overall framework is illustrated in Figure 1.
The analysis was conducted using high-resolution vehicle trajectory data collected from a commercial navigation platform. A key characteristic of this dataset is that each record contains OD coordinates along with the sequence of links that constitute the corresponding route. Notably, only one unique route was recorded for each OD pair, eliminating the need for complex route-choice modeling. Leveraging this feature, the proposed framework selects representative routes for each OD pair based on key metrics such as trip frequency or travel distance. This approach simplifies the analysis while effectively capturing the dominant travel patterns. To ensure the integrity and accuracy of the analysis, data preprocessing was conducted in the following three steps:
  • Data cleaning: Enhance data quality by removing erroneous or incomplete trajectory records.
  • Network mapping: Map the link sequences onto a digital road network to accurately determine the length of each link.
  • Aggregation: To ensure reproducibility and eliminate redundancy, duplicate OD trajectories, defined as records with identical OD coordinates and the same ordered sequence of traversed road links, were grouped and aggregated. The trip counts of these identical trajectories were summed, and the resulting unique OD pairs were used as input for route ranking and CET estimation.
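A minimal sketch of the network-mapping and aggregation steps, assuming each record is a hypothetical tuple `(origin, destination, link_sequence, trip_count)` and that link lengths (km) are available from the digital road network as a dictionary. Records sharing OD coordinates and the same ordered link sequence are merged and their trip counts summed.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Key = Tuple[Hashable, Hashable, Tuple[str, ...]]


def aggregate_routes(records: Iterable[Tuple[Hashable, Hashable, List[str], int]],
                     link_length_km: Dict[str, float]) -> Dict[Key, Tuple[float, int]]:
    """Merge records with identical OD coordinates and ordered link sequence.

    Returns {(origin, destination, links): (route_km, total_trips)}.
    """
    trips_by_key: Dict[Key, int] = defaultdict(int)
    for origin, destination, links, trips in records:
        # The ordered link sequence is part of the key, so different routes stay separate
        trips_by_key[(origin, destination, tuple(links))] += trips
    routes = {}
    for key, trips in trips_by_key.items():
        # Network mapping: route length is the sum of its link lengths
        route_km = sum(link_length_km[link_id] for link_id in key[2])
        routes[key] = (route_km, trips)
    return routes
```

Using the tuple of link ids as part of the key means two trips between the same coordinates but along different link sequences are kept as distinct routes, which is what the duplicate definition above requires.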
The dataset processed through these steps was used for total VKT estimation and reliability evaluation, as detailed in Section 3.2.

3.2. CET Metric Calculation and Validation

To quantitatively evaluate the representativeness of the selected routes, the CET metric was defined as the proportion of the total VKT that can be explained by estimates based on the representative route set. It is expressed mathematically as follows:
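$$\mathrm{CET}_c = \frac{\sum_{r \in R_s} \left( d_{c,r} \times V_{c,r} \right)}{\sum_{r \in R_{all}} \left( d_{c,r} \times V_{c,r} \right)} \tag{1}$$
$$\mathrm{CET}_{total} = \frac{\sum_{r \in R_s} \left( d_{r} \times V_{r} \right)}{\sum_{r \in R_{all}} \left( d_{r} \times V_{r} \right)} \tag{2}$$
where
$c$: Vehicle type (passenger car, heavy vehicle, or bus)
$R_s$: Set of selected representative routes
$R_{all}$: Set of all OD pairs
$d_{c,r}$: Distance (in km) traveled by vehicle type $c$ on route $r$
$V_{c,r}$: Demand (traffic volume) of vehicle type $c$ on route $r$
$d_r$, $V_r$: Distance (in km) and demand on route $r$, aggregated over all vehicle types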
In Equation (1), the denominator represents the total distance traveled by a given vehicle type across the entire route set, and the numerator represents the distance traveled along the selected representative routes. CET values range from 0 to 1; values closer to 1 indicate that the representative routes explain a larger share of the total VKT for that vehicle type. When vehicle-type information is unavailable, the vehicle-type-aggregated form, hereafter referred to as $\mathrm{CET}_{total}$, is applied. This study used $\mathrm{CET}_{total}$ because the dataset lacks vehicle-type attributes. This formulation assumes homogeneous travel behavior across vehicle types, which may not hold in reality. For example, freight-dominated corridors are likely to have longer trip distances or distinct routing preferences, which would bias the aggregated CET. This limitation highlights the importance of integrating vehicle classification data or sensor-based composition estimates in future implementations of the $\mathrm{CET}_c$ formulation (Equation (1)). Finally, the proposed framework was validated by comparing the total VKT estimated using CET with national statistical VKT data, providing a practical assessment of the reliability of trajectory-based analyses.
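The aggregated $\mathrm{CET}_{total}$ formula translates directly into code. The helper below is a sketch, assuming routes are stored in a hypothetical dictionary that maps a route identifier to `(distance_km, volume)`; the function name `cet` is illustrative.

```python
from typing import Dict, Iterable, Tuple


def cet(selected: Iterable[str], all_routes: Dict[str, Tuple[float, float]]) -> float:
    """CET in the vehicle-type-aggregated form (CET_total).

    all_routes: {route_id: (distance_km, volume)}, i.e. the set R_all
    selected:   route ids forming the representative set R_s
    """
    total = sum(d * v for d, v in all_routes.values())  # denominator over R_all
    if total == 0:
        raise ValueError("total VKT over R_all is zero")
    # Numerator over R_s; set() guards against counting a route twice
    covered = sum(all_routes[r][0] * all_routes[r][1] for r in set(selected))
    return covered / total
```

For $\mathrm{CET}_c$, the same function could be called once per vehicle type, passing that type's distances and volumes.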

3.3. Representative Route Selection Methodology

The selection of representative routes is a critical step in the proposed framework because the quality and coverage of the chosen set directly influence the accuracy of VKT estimation and the resulting CET values. To ensure transparency and reproducibility, the following procedure was applied to extract representative routes from the full set of observed point-to-point trajectories:
(Step 1) Route Aggregation: All point-to-point trajectories obtained from the navigation dataset are aggregated into unique routes. Each route is represented as an ordered sequence of network links with associated attributes, including the total travel distance (km) and cumulative traffic volume.
(Step 2) Initial Ranking: The aggregated routes are ranked in descending order of the total traffic volume. In cases of equal volume, a longer route distance serves as a tie breaker.
(Step 3) Greedy Selection with Overlap Control: A greedy algorithm is applied to iteratively select routes from the ranked list.
  • Initialize the representative route set $R_s = \emptyset$.
  • Select the top-ranked route from the remaining list and add it to $R_s$ if its link overlap ratio with every route already in $R_s$ is less than 70%.
  • After adding each route, recalculate CET for $R_s$.
  • Continue the selection until either the target CET threshold (for example, ≥85%) is reached or the maximum number of routes, $N_{max}$, has been selected.
The 70% overlap threshold was chosen based on preliminary tests to ensure sufficient distinction between routes while preserving CET coverage. Similarly, the k = 2 per-OD-pair constraint (Step 4) was introduced to mitigate spatial bias by limiting the overrepresentation of high-volume OD pairs. Both parameters are tunable and will be subjected to sensitivity analysis in future work to assess their impact on CET performance.
(Step 4) OD-Pair Constraint: To ensure spatial diversity, no more than k representative routes (k = 2 in this study) are allowed for any single OD pair. This constraint prevents overrepresentation of high-demand corridors while maintaining network-wide coverage.
(Step 5) Output for CET Calculation: The final representative route set $R_s$ obtained through this process is used as the input for the CET calculation (Section 3.2) and subsequent VKT estimation.
This approach prioritizes high-demand routes, while preventing redundancy and ensuring coverage across multiple OD pairs. By explicitly controlling route overlap and imposing OD-specific limits, the methodology enhances both the efficiency and geographical representativeness of the selected set, thereby improving the robustness of the CET-based evaluation.
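The selection procedure can be condensed into a short greedy routine. This is a hedged sketch rather than the authors' code: the text does not define the link overlap ratio, so it is assumed here to be the share of the candidate route's links that also appear in an already selected route, and the route record layout (`od`, `links`, `km`, `volume`) is hypothetical. The defaults mirror the values quoted in the steps (70% overlap, k = 2, a CET target of 85%); `n_max = 20` is an arbitrary placeholder.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def overlap_ratio(candidate: Sequence[str], chosen: Sequence[str]) -> float:
    """Assumed definition: share of the candidate's links also used by `chosen`."""
    cand = set(candidate)
    return len(cand & set(chosen)) / len(cand) if cand else 0.0


def select_representative_routes(routes: Dict[str, dict], max_overlap: float = 0.70,
                                 k: int = 2, cet_target: float = 0.85,
                                 n_max: int = 20) -> Tuple[List[str], float]:
    """Greedy selection with overlap control and a per-OD-pair cap.

    routes: {route_id: {"od": (o, d), "links": [...], "km": float, "volume": float}}
    Returns the selected route ids (R_s) and their CET.
    """
    total = sum(r["km"] * r["volume"] for r in routes.values())
    if total == 0:
        return [], 0.0
    # Step 2: descending volume, longer distance breaks ties
    ranked = sorted(routes, key=lambda rid: (-routes[rid]["volume"], -routes[rid]["km"]))
    selected: List[str] = []
    per_od: Counter = Counter()
    covered = 0.0
    for rid in ranked:
        r = routes[rid]
        if per_od[r["od"]] >= k:
            continue  # Step 4: at most k routes per OD pair
        if any(overlap_ratio(r["links"], routes[s]["links"]) >= max_overlap
               for s in selected):
            continue  # Step 3: overlaps an already selected route by 70% or more
        selected.append(rid)
        per_od[r["od"]] += 1
        covered += r["km"] * r["volume"]
        # Step 3: recalculate CET and check both stopping rules
        if covered / total >= cet_target or len(selected) >= n_max:
            break
    return selected, covered / total
```

Because CET is recomputed after every accepted route, the loop stops as soon as either stopping rule fires; routes skipped for overlap or for the OD cap never contribute to the numerator.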

4. Analysis Results

4.1. Data Acquisition and Analysis Environment

Bucheon, Gyeonggi Province, South Korea, was selected as the study area to empirically evaluate the total VKT estimation and reliability assessment framework based on high-resolution vehicle trajectory data. As shown in Figure 2, Bucheon is a major hub situated between Seoul and Incheon. It experiences frequent intra-city and inter-regional travel, making it a critical transportation node. The city's well-developed road network includes expressways, such as the Gyeongin Expressway and the Seoul Ring Expressway, along with primary and secondary arterial roads connecting the entire urban area. This mix of functional road classes generates complex traffic patterns that blend long-distance inter-regional trips with short-distance intra-city movements. Consequently, Bucheon's intricate road network provides an ideal testbed for evaluating the accuracy of VKT estimation and the representativeness of routes.
The data used in this study consisted of driving records collected from a commercial navigation platform between January and March 2025. The dataset includes more than 90,000 point-to-point OD pairs, with each record containing OD coordinates and the sequence of road links traveled between them. Data preprocessing was conducted in three steps: removing erroneous or incomplete trajectories, matching link sequences to the road network to obtain accurate link lengths, and aggregating trajectories for identical OD pairs.
To validate the reliability of the total VKT estimates, a core objective of this study, a macroscopic benchmark was constructed from 2023 national statistical data published by Statistics Korea and the Ministry of Land, Infrastructure and Transport, as summarized in Table 1. Although the difference in collection periods between the statistical data and the trajectory dataset is an inherent limitation, the primary purpose of this study was to verify the methodological effectiveness of the proposed framework and the CET metric, specifically whether these methods can reasonably approximate macroscopic traffic volumes despite spatiotemporal discrepancies. Statistical data from 2020 to 2022 reflect abnormal travel patterns caused by the COVID-19 pandemic, rendering them unsuitable as a comparative baseline. In contrast, by 2023 social activities had largely normalized, and travel behavior closely resembled stable pre-pandemic patterns. Because 2023 also had the shortest time gap to the trajectory data among the available statistics, it was judged to provide a reasonable benchmark for methodological validation.
The empirical analysis was conducted using high-resolution point-to-point vehicle trajectory data collected from a commercial navigation platform. This dataset offers a significant advantage over conventional zone-to-zone travel path data because it enables far more granular VKT estimation. As detailed in Table 2, the data comprise the OD information for each trip along with the corresponding sequence of road links traveled.
As described above, the collected trajectory data were cleaned, mapped to the road network, and aggregated by identical OD pairs to ensure analytical accuracy. To validate the proposed framework, national statistical data from 2023 were used as a macroscopic benchmark. The temporal mismatch between the datasets was addressed through correction factors based on annual traffic growth rates, as detailed in Section 4.4. These preprocessing and alignment steps ensured that the trajectory-based estimates could be reasonably compared with official statistics, supporting the methodological reliability of the proposed CET framework. Table 3 summarizes the data analysis environment and analysis period.
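The exact correction factors are deferred to Section 4.4; one common way to bridge a gap of several years is constant compound growth, sketched below. The growth rate is an input assumption, not a value reported in this article, and the function name is illustrative.

```python
def adjust_for_growth(vkt_base: float, annual_growth_rate: float,
                      base_year: int, target_year: int) -> float:
    """Scale a base-year VKT to the target year assuming constant compound growth."""
    years = target_year - base_year
    # Factor (1 + g) ** years; e.g. g = 0.02 over 2 years gives 1.0404
    return vkt_base * (1.0 + annual_growth_rate) ** years
```

With a hypothetical 2% annual growth rate, a 2023 benchmark would be scaled to 2025 by a factor of 1.0404.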
The dataset collected from a commercial navigation service comprised 230,000 trips. Table 4 lists detailed information, including travel dates, origin and destination addresses and coordinates, and arrival times. The availability of high-resolution point-to-point trajectory data provides a significant advantage: it enables analysis of microscopic route-choice behavior and actual traffic demand at the individual vehicle level, which are difficult to capture with conventional aggregated zone-to-zone OD matrices.
The trajectories were filtered to the study area of Bucheon, yielding approximately 230,000 trips that reached their destinations; these were used in the final analysis. Because these floating car data (FCD) were collected from the voluntary movements of commercial service users, they constitute an empirical dataset that closely reflects real-world traffic flow and route usage frequencies. However, FCD have inherent limitations. Specifically, with an estimated penetration rate of approximately 40% of the total number of registered vehicles during the analysis period, the dataset does not fully represent the entire traffic volume. To address this limitation and ensure the explanatory power of VKT estimates derived from limited trajectory information, the proposed CET (coverage of estimated travel) validation framework, built on representative paths, was applied. Under these conditions, the reliability and practical applicability of the CET metric were empirically evaluated by comparing the total VKT estimates obtained from point-to-point trajectory data with those derived from national statistical data.

4.2. Results of Representative Path-Based VKT Analysis

This section describes the empirical validation of the hypothesis that a limited number of representative routes account for a substantial proportion of the total VKT. Such validation is critical for practical implementation because it establishes a benchmark for achieving high estimation accuracy while minimizing data processing and observation requirements. Analysis of the high-resolution trajectory data from the commercial navigation platform revealed a strong concentration of travel demand on a small set of key routes. Figure 3 plots the CET ratio (%) and cumulative traveled distance (km) as functions of the number of representative routes (N), showing how the explanatory power and data coverage of the selected paths grow with N. Cumulative VKT coverage is the proportion of the total sampled trajectory distance traveled along the selected representative routes. In contrast, CET quantifies the proportion of the statistical total VKT that is explained by estimates derived from the same set of routes. With only five representative paths, cumulative VKT coverage reached approximately 31.7% of the sampled trajectory distance, whereas the corresponding CET was 83.3%. Expanding the set to 20 paths increased cumulative coverage to 58.2% and further improved the CET ratio. Together, these two complementary metrics quantitatively confirm that most travel demand is concentrated along a relatively small number of core routes.
Spatial analysis of the top-ranked representative paths indicated that they predominantly use major arterial roads within Bucheon. Notably, the two highest-ranked paths, shown in Figure 4, traverse the key transportation corridors of Jungdong-daero and Gyeongin-ro and together account for approximately 19.2% of the total VKT.
A notable finding was that the top five representative routes used only 4.2% of all links in the road network. This indicates that a small fraction of the physical road infrastructure accounts for a large portion of overall travel, underscoring the efficiency of the representative path-based analysis method and showing how effectively the CET metric captures the actual concentration of travel. The identified representative path set serves as the basis for validating the reliability of the VKT estimates, as detailed in the following section. With five routes, CET reaches 83.3% while cumulative VKT coverage is only 31.7%; this demonstrates CET's high concentration efficiency but also raises concerns about over-reliance on arterial roads. Such concentration risks overlooking secondary roads and intra-zone trips, potentially introducing MAUP-like biases in micro-level analyses. Future research should test the framework on more diverse networks, including less hierarchical urban structures, to assess whether the concentration pattern persists across different spatial configurations.
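The link-share figure above (the fraction of network links used by the top routes) is a set-union calculation. A sketch, assuming each selected route is given as an iterable of link identifiers and the network as a collection of all link identifiers; the function name is hypothetical.

```python
from typing import Iterable


def link_share(selected_routes: Iterable[Iterable[str]],
               network_link_ids: Iterable[str]) -> float:
    """Fraction of network links traversed by at least one selected route."""
    network = set(network_link_ids)
    used = set()
    for links in selected_routes:
        used.update(links)  # a link shared by several routes is counted once
    return len(used & network) / len(network)
```

Counting the union rather than summing per-route link counts avoids double counting corridors that several representative routes share.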

4.3. CET Metric Calculation Results

This section evaluates the explanatory power of the selected representative route set using the CET metric and compares it with the cumulative VKT coverage described in Section 4.2. Cumulative VKT coverage represents the proportion of the total sampled trajectory distance captured by the representative routes, whereas CET measures the proportion of the statistical total VKT that can be explained by estimates derived from those routes. The CET ratio increases sharply as the number of representative routes (N) increases from 2 to 5, reflecting the route-concentrated nature of travel demand (Figure 5). Above N = 5, the rate of increase tapers off, indicating diminishing returns in CET as N grows. Notably, N = 5 yields a CET of 83.3%, even though the corresponding cumulative VKT coverage of the sampled trajectories remains at 31.7% (Section 4.2). This contrast underscores that CET and cumulative VKT coverage, although related, capture distinct aspects of representativeness, each with its own denominator and interpretation.
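The diminishing-returns pattern can be made operational by inspecting the marginal CET gain of each additional route. The sketch below assumes a hypothetical stopping rule (a minimum gain per added route, here 2 percentage points) that the article does not specify; it returns the N beyond which adding one more route raises CET by less than that amount.

```python
from typing import List, Sequence


def marginal_gains(cet_curve: Sequence[float]) -> List[float]:
    """CET increase from N to N + 1, where cet_curve[i] is the CET for N = i + 1."""
    return [after - before for before, after in zip(cet_curve, cet_curve[1:])]


def diminishing_returns_point(cet_curve: Sequence[float], min_gain: float = 0.02) -> int:
    """Smallest N beyond which one more route adds less than min_gain to CET."""
    for i, gain in enumerate(marginal_gains(cet_curve)):
        if gain < min_gain:
            return i + 1  # gains[i] is the step from N = i + 1 to N = i + 2
    return len(cet_curve)  # no plateau detected within the curve
```

The threshold is a modelling choice: a stricter `min_gain` favours smaller representative sets, while a looser one trades processing effort for coverage.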
These results provide empirical evidence that a relatively small set of representative routes accounts for a substantial share of the total VKT, supporting the efficiency and practical applicability of the representative-route-based VKT estimation approach.
This result suggests that some limitations of traditional assignment models can be mitigated by analyzing observed trajectory data, enabling the total traffic volume to be estimated reliably from a small number of representative routes. The high CET values obtained here serve as key indicators of the potential reliability of the representative path-based estimation method in the subsequent comparison and validation against statistical estimates, presented in Section 4.4.

4.4. Methodology Enhancement and Robustness Analysis

This section presents the analyses used to address the limitations of conventional methodologies and to validate the reliability and robustness of the proposed framework. Two complementary error measures were considered to evaluate estimation performance: the mean absolute percentage error (MAPE), which reflects the average magnitude of percentage errors regardless of sign, and the mean relative error (MRE), which retains the sign and therefore indicates systematic over- or underestimation. Both metrics are reported for completeness; MAPE is adopted as the primary accuracy indicator, and MRE is provided as a supplementary reference.
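The difference between the two measures can be shown with a short sketch. The function names are illustrative, and the definitions follow the descriptions above (percentage errors relative to the reference value, averaged over paired observations).

```python
def _paired(estimates, references):
    est, ref = list(estimates), list(references)
    if len(est) != len(ref) or not est:
        raise ValueError("estimates and references must be non-empty and equally long")
    if any(r == 0 for r in ref):
        raise ValueError("reference values must be non-zero for percentage errors")
    return est, ref


def mape(estimates, references):
    """Mean absolute percentage error (%): magnitude of errors, sign ignored."""
    est, ref = _paired(estimates, references)
    return 100.0 * sum(abs(e - r) / abs(r) for e, r in zip(est, ref)) / len(ref)


def mre(estimates, references):
    """Mean relative error (%): sign kept, so positive values mean overestimation."""
    est, ref = _paired(estimates, references)
    return 100.0 * sum((e - r) / r for e, r in zip(est, ref)) / len(ref)


# Errors of +10% and -10% cancel in MRE but not in MAPE.
print(mape([110, 90], [100, 100]), mre([110, 90], [100, 100]))
```

Because positive and negative errors cancel in MRE, a small MRE can coexist with a larger MAPE, which is why MAPE is the more conservative primary indicator.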

4.4.1. Sensitivity Analysis Based on Probe Data Penetration Rate

Probe data obtained from commercial navigation services inherently underrepresent the total vehicle population because only a fraction of vehicles contribute data (incomplete penetration). A sensitivity analysis was performed to evaluate the effect of the penetration rate on the total VKT estimates. Lower penetration rates widened the margin of error between the trajectory-based VKT estimates and the macroscopic statistical benchmarks (Figure 6), indicating that the penetration rate has a pronounced effect on estimation accuracy.
The assumed 40% probe vehicle penetration rate was treated as spatially uniform across the network, which may not reflect real-world variability. In practice, penetration rates can vary significantly depending on urban density, road type, or time-of-day. To address this, future work may incorporate spatially disaggregated penetration models or utilize multi-source data fusion—such as GPS, loop detector, and vision-based traffic sensor data—to dynamically estimate the actual coverage of probe vehicles. Such enhancements could improve correction accuracy and enhance the framework’s applicability to large-scale, heterogeneous networks.
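As one illustration of a spatially disaggregated correction, the sketch below expands probe-observed VKT separately within strata (for example, road types or zones) instead of applying a single network-wide rate. The strata, probe shares, and observed values are hypothetical, and `expand_by_stratum` is an illustrative helper, not part of the study.

```python
def expand_by_stratum(observed_km, penetration):
    """Expand probe-observed VKT to the full fleet separately for each stratum.

    observed_km: {stratum: VKT observed from probe vehicles}
    penetration: {stratum: estimated probe share of the vehicle population}
    """
    adjusted = {}
    for stratum, km in observed_km.items():
        p = penetration[stratum]
        if not 0.0 < p <= 1.0:
            raise ValueError(f"penetration for {stratum!r} must be in (0, 1]")
        adjusted[stratum] = km / p
    return adjusted


# Hypothetical strata and rates, for illustration only.
observed = {"arterial": 400.0, "local": 60.0}
rates = {"arterial": 0.5, "local": 0.2}

stratified_total = sum(expand_by_stratum(observed, rates).values())
uniform_total = sum(observed.values()) / 0.4      # single network-wide rate
print(stratified_total, uniform_total)
```

The two totals diverge whenever probe shares differ across strata that carry different amounts of travel, which is the kind of bias a spatially uniform penetration assumption could introduce.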

4.4.2. Correction Factor-Based VKT Estimation and Statistical Correction

To address the limitations arising from incomplete data penetration and enhance the representativeness of the estimates, this study developed a total VKT estimation model that incorporates the following correction factor:
$$\text{Adjusted VKT} = \frac{\text{Observed VKT}}{\text{Correction Factor}}$$
A separate adjustment was applied to account for the temporal gap between the 2023 national statistical data and the 2025 trajectory dataset. The annual traffic growth rate of 2.1% reported in Bucheon’s official transportation yearbooks for 2021–2023 corresponds to a cumulative growth factor of 1.021² ≈ 1.042 over two years. This factor was applied uniformly to the 2023 national VKT data to align them with the 2025 baseline. As a result, consistency between the trajectory-based VKT estimates and the national statistics improved, and CET increased from 83.3% to 84.1%.
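Both corrections in this subsection reduce to simple arithmetic. The sketch below assumes that the correction factor in the equation is the probe penetration share (the 40% rate cited in Section 4.4.3) and that traffic growth compounds annually; `adjusted_vkt` and `growth_factor` are illustrative names, and the 1,000 km observed value is hypothetical. The growth part reproduces the 828 to 863 million km adjustment reported in Section 4.5.2.

```python
def adjusted_vkt(observed_vkt_km, correction_factor):
    """Adjusted VKT = Observed VKT / Correction Factor.

    Assumes the correction factor is the probe penetration share (e.g., 0.4).
    """
    if correction_factor <= 0:
        raise ValueError("correction factor must be positive")
    return observed_vkt_km / correction_factor


def growth_factor(annual_rate, years):
    """Cumulative factor for compound annual growth over `years` years."""
    return (1.0 + annual_rate) ** years


factor = growth_factor(0.021, 2)            # 2023 -> 2025 at 2.1% per year
print(round(factor, 4))                     # 1.0424
print(round(828 * factor))                  # 863 (million km)
print(adjusted_vkt(1_000.0, 0.4))           # hypothetical probe VKT expanded at 40%
```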

4.4.3. Statistical Fitness Evaluation and Validation Results

To assess the statistical reliability and predictive validity of the proposed CET-based representative path estimation methodology, several complementary statistical techniques were applied. The analysis examined the agreement between the VKT estimates derived from representative paths and external benchmarks obtained from national statistical yearbooks and Bucheon transportation statistics. A correction factor was applied to account for the low data penetration rate of 40% and to produce a VKT estimate representative of the entire vehicle population. Statistical validation was performed on this adjusted VKT estimate, with the following results:
  • MAPE was calculated to evaluate overall estimation accuracy. The error rate was low (6.3%), indicating stable performance with a small average deviation from the statistical values.
  • The Pearson correlation coefficient (r) was calculated to assess the linear relationship between the trajectory-based and statistical VKT series. The correlation of r = 0.96 reflects close alignment of the daily variation patterns, as illustrated in the scatter plot in Figure 7. The coefficient of determination (R²) was 0.92, indicating that 92% of the variance in the trajectory-based VKT was explained by the national statistical VKT. With p < 0.001, the correlation was highly statistically significant.
  • A paired t-test was used to assess the significance of the mean difference between the estimated and statistical values. The test yielded t = −0.57 with p = 0.57. At a significance level of 0.05, the difference in means was not statistically significant; the test thus provides no evidence of a systematic difference between the representative path-based VKT estimates and the external statistical values.
  • Residual analysis was performed to further evaluate the predictive stability and validity of the model. The prediction errors (residuals) were distributed approximately symmetrically around a mean of zero. The Q–Q plot in Figure 8 indicates that the residuals are consistent with the normality assumption: the red line represents the theoretical quantiles under perfect normality, and the blue dots, which denote the observed residual quantiles, lie close to that line. The histogram of residuals is shown with yellow bars, and the overlaid yellow curve is a kernel density estimate that provides a smooth approximation of the residual distribution. Overall, the prediction errors appeared randomly distributed without systematic bias, supporting the stability and validity of the proposed estimation model.
These results indicate that the representative path-based VKT estimation method, despite the low data penetration rate of 40% in this study, achieved high statistical reliability once the correction factor was applied. A small set of representative paths (N = 5) reached a CET of 83.3% while remaining consistent with macroscopic statistical benchmarks and sustaining a low, stable error. For the same analysis period and sample set, the method yielded an MAPE of 6.3% and an MRE of +1.7%; consistent with Section 4.4, MAPE is used as the primary accuracy metric throughout the paper. Although the MRE of +1.7% is small, its positive sign indicates a mild overestimation tendency, likely arising from the underrepresentation of low-traffic peripheral zones in the trajectory dataset, where representative routes may not fully capture travel diversity. Consequently, the aggregated VKT estimate may slightly exceed the true value in areas with lower vehicle densities. Although the Q–Q plot indicated approximate normality of residuals, low-penetration conditions may still produce outliers caused by rare events such as traffic accidents or unusual congestion. Implausible trajectories (e.g., incomplete OD records or unrealistic speeds) were excluded during preprocessing, but future applications should adopt more robust outlier handling methods, such as anomaly detection or heavy-tailed error modeling, to further enhance the reliability of residual-based validation.
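The preprocessing screen and the more robust outlier handling recommended above can be sketched as follows. Both pieces are illustrative assumptions: the `Trip` record, the 150 km/h speed ceiling, and the use of a median absolute deviation (MAD) screen with the Iglewicz–Hoaglin modified z-score (threshold 3.5) are choices made here, not the study's documented procedure.

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trip:
    origin: Optional[str]        # None marks a missing origin in the OD record
    destination: Optional[str]
    distance_km: float
    duration_h: float


def is_plausible(trip, max_speed_kmh=150.0):
    """Keep trips with a complete OD record and a physically plausible mean speed.

    The 150 km/h ceiling is a hypothetical default, not the study's threshold.
    """
    if trip.origin is None or trip.destination is None:
        return False                                  # incomplete OD record
    if trip.distance_km <= 0 or trip.duration_h <= 0:
        return False                                  # degenerate trip
    return trip.distance_km / trip.duration_h <= max_speed_kmh


def mad_outliers(residuals, threshold=3.5):
    """Indices whose modified z-score (0.6745 * deviation / MAD) exceeds `threshold`.

    Median-based, so a few extreme residuals do not inflate the scale estimate.
    """
    med = statistics.median(residuals)
    mad = statistics.median(abs(x - med) for x in residuals)
    if mad == 0:
        return []                                     # no spread to scale against
    return [i for i, x in enumerate(residuals) if abs(0.6745 * (x - med) / mad) > threshold]


trips = [Trip("A", "B", 12.0, 0.3), Trip(None, "B", 5.0, 0.2), Trip("A", "C", 90.0, 0.25)]
print([is_plausible(t) for t in trips])               # [True, False, False]
print(mad_outliers([1.0, 1.1, 0.9, 1.0, 10.0]))       # [4]
```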

4.5. Uncertainty and Sensitivity Analysis

In this section, we evaluate the statistical uncertainty inherent in the VKT estimates produced by the proposed framework and conduct a sensitivity analysis to examine how variations in the key parameters affect the outcomes. This approach facilitates a quantitative assessment of the reliability and practical applicability of the methodology.

4.5.1. Statistical Reliability Validation of the CET Metric

The bootstrap method was applied to assess the statistical stability of the CET metric, which measures the share of the statistical total VKT explained by the representative paths. A total of 1000 resamples were drawn with replacement from the original dataset, and CET was recalculated for each resample. The resulting 95% confidence interval (CI) for CET ranged from 81.1% to 85.6%, as depicted in Figure 9; the orange curve is a kernel density estimate that provides a smooth approximation of the CET distribution. This narrow interval suggests that the explanatory power of the representative path set (N = 5) does not depend on any specific sample and is statistically robust.
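A percentile bootstrap of the kind described above can be sketched with the standard library. The CET formula itself is defined earlier in the paper, so the statistic is passed in as a function; `share_on_routes` is a hypothetical placeholder ratio, and the trip records are synthetic. Replacing the placeholder with the actual CET computation, and looping over several values of N, would follow the same pattern.

```python
import random


def bootstrap_ci(data, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample `data` with replacement and recompute `statistic`."""
    if not data:
        raise ValueError("data must be non-empty")
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Nearest-rank percentiles of the sorted bootstrap replicates.
    lo = replicates[round((alpha / 2) * (n_resamples - 1))]
    hi = replicates[round((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi


def share_on_routes(trips):
    """Placeholder statistic: share of trip distance on representative routes.

    Each trip is (km_on_representative_routes, total_km). Substitute the paper's
    CET formula here; this ratio only stands in for it.
    """
    return sum(on for on, _ in trips) / sum(total for _, total in trips)


# Hypothetical trip records, generated for illustration only.
gen = random.Random(1)
trips = []
for _ in range(500):
    total = gen.uniform(2.0, 30.0)
    trips.append((total * gen.uniform(0.6, 1.0), total))

print(bootstrap_ci(trips, share_on_routes))
```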
A sensitivity analysis was conducted to assess the effect of the data penetration rate on the accuracy of the VKT estimation. Penetration rate scenarios of 30%, 40%, 50%, and 70% were simulated using the trajectory dataset, and the MAPE was calculated for each scenario. The MAPE decreased substantially as the penetration rate increased, from approximately 12.4% at a 30% penetration rate to 3.1% at 70%. These findings indicate that a higher penetration rate improves the accuracy of representative path-based VKT estimation. The trajectory data in this study had an estimated penetration rate of approximately 40%; the corresponding simulated scenario yielded an MAPE of 7.1%, which is considered sufficiently reliable for practical applications. Figure 10 shows the MAPE as a function of penetration rate. Because Figure 10 assumes uniform coverage across the network, a spatially resolved analysis remains an important direction for future work to validate robustness under uneven penetration conditions.
The bootstrap analysis above was limited to a fixed set of N = 5 routes. Future extensions should apply bootstrap resampling across varying values of N to examine how the CIs evolve as additional routes are included, thereby strengthening the generalizability of the results.
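The penetration-rate scenarios in this subsection can be emulated by thinning a vehicle population at random and expanding the probe totals. The sketch below is a hypothetical simulation on synthetic data, not the study's procedure: each vehicle is independently retained as a probe with probability p (the uniform-coverage assumption behind Figure 10), and the retained distance is divided by p. It reproduces only the qualitative trend (MAPE falls as p rises and is zero at p = 1), not the reported 12.4%, 7.1%, or 3.1% values.

```python
import random


def simulate_mape(daily_vehicle_km, penetration, n_repeats=20, seed=0):
    """Average MAPE (%) of probe-expanded daily VKT at a given penetration rate.

    daily_vehicle_km: list of days, each a list of per-vehicle daily distances (km).
    Each vehicle is kept as a probe with probability `penetration` (uniform across
    the network), and the probe total is expanded by 1 / penetration.
    """
    if not 0.0 < penetration <= 1.0:
        raise ValueError("penetration must be in (0, 1]")
    rng = random.Random(seed)
    errors = []
    for _ in range(n_repeats):
        for day in daily_vehicle_km:
            true_total = sum(day)
            probe_total = sum(km for km in day if rng.random() < penetration)
            estimate = probe_total / penetration
            errors.append(abs(estimate - true_total) / true_total)
    return 100.0 * sum(errors) / len(errors)


# Synthetic population: 30 days x 200 vehicles (not the study's data).
gen = random.Random(42)
days = [[gen.uniform(5.0, 60.0) for _ in range(200)] for _ in range(30)]

for p in (0.3, 0.4, 0.5, 0.7, 1.0):
    print(f"penetration {p:.0%}: MAPE {simulate_mape(days, p):.1f}%")
```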

4.5.2. Correction and Sensitivity Analysis for Spatiotemporal Differences

The trajectory data used in this study correspond to the year 2025, whereas the national statistics used for comparison are from 2023, which requires correction for the temporal discrepancy. An annual average road traffic volume growth rate of 2.1% (as reported in the Bucheon City Statistical Yearbook (2023)) was applied to adjust the 2023 total VKT data to the 2025 baseline, giving a cumulative factor of approximately 1.042 over two years. The total VKT increased from approximately 828 million km before correction to approximately 863 million km after adjustment. With this correction, the CET for the representative path set (N = 5) increased from 83.3% to 84.1%. This change was not statistically significant at the 5% level (p > 0.05), indicating that the temporal difference between the datasets had only a limited impact on the reliability of the representative path-based estimation.
This correction addresses only temporal discrepancies and does not account for spatial changes such as road network expansion or variations in vehicle fleet composition. Therefore, future time-series analyses should incorporate dynamic network adjustments and vehicle-type disaggregation to improve the estimation precision.

4.5.3. Discussion and Implications

This study empirically validated the reliability of a representative path-based VKT estimation method by utilizing high-resolution point-to-point vehicle trajectory data and comparing the results with national statistical benchmarks. The proposed approach addresses the spatial limitations and information loss inherent in conventional zone-to-zone OD-based analyses, and offers a novel framework for producing traffic statistics that capture the detailed characteristics of actual vehicle travel patterns.
  • This study formulated the CET metric to quantitatively evaluate the explanatory power of representative paths by measuring the proportion of actual VKT they account for. The analysis demonstrated that a small set of representative paths (N = 5) accounted for 83.3% of the total VKT, substantially improving data efficiency and practical applicability.
  • The total VKT estimates derived using the CET metric exhibited a low MAPE of 6.3% (MRE of +1.7%) and a strong correlation (r = 0.96) with the temporally adjusted 2023 national statistical values. These results support the accuracy and stability of the representative path-based estimation method and indicate that trajectory data can serve as an effective alternative to traditional statistical approaches, while providing higher spatial resolution and greater potential for time-series analysis.
  • High-resolution, behavior-based travel analysis enables a microscopic understanding of urban mobility patterns, with potential applications in smart traffic management, environmental emission tracking, and targeted policy formulation. In particular, the potential of the method to capture real-time traffic conditions and support policy simulations suggests its value as a tool for urban traffic planning and environmental management.
This study had some limitations. National statistical data represent point-in-time averages and therefore do not fully capture seasonal fluctuations or time-of-day variability. Moreover, the analysis did not incorporate vehicle-type-specific travel characteristics, and generalizability was constrained by the penetration rate of the probe vehicle dataset. Future research should address these limitations by employing more refined temporal unit analyses, implementing vehicle-classification algorithms, and developing integrated data fusion and correction techniques.
In conclusion, this study enhanced the accuracy and practical applicability of traffic volume estimation by introducing representative path analysis in conjunction with the CET metric. The proposed approach is expected to provide a foundation for advancing traffic statistics and management systems, with potential applications across diverse urban contexts and policy scenarios.
A key limitation is that vehicle types were not considered in the current $CET_{total}$ formulation. Although the aggregated $CET_{total}$ demonstrated high explanatory power in this study, the assumption of homogeneous travel behavior across vehicle types may introduce bias, particularly on freight-heavy corridors where average trip lengths, link preferences, and routing patterns differ substantially from those of passenger vehicles. Future research should incorporate vehicle-classified probe or trajectory data, or use proxy information such as inductive loop classifier counts and government traffic composition statistics, to estimate vehicle-specific $CET_c$ values, validate this assumption more rigorously, and improve the accuracy of disaggregated VKT estimation.
To improve temporal robustness, future studies should also consider seasonal validation using quarterly or monthly disaggregated traffic datasets. Such analyses can better account for fluctuations due to holidays, school calendars, or weather that influence travel behavior, thereby improving the reliability of VKT estimation over time.
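If $CET_c$ is read as the share of class-$c$ VKT explained by the representative routes, the aggregate over classes equals the VKT-share-weighted mean of the class values, because the weights cancel the class denominators: $\sum_c (E_c / V_c)(V_c / V) = \sum_c E_c / V$. The sketch below illustrates that assumed aggregation with hypothetical class values; the paper's theoretical CET definition with vehicle-type weights takes precedence.

```python
def aggregate_cet(cet_by_class, vkt_by_class):
    """VKT-share-weighted mean of class-level CET values (assumed aggregation)."""
    total_vkt = sum(vkt_by_class.values())
    if total_vkt <= 0:
        raise ValueError("total VKT must be positive")
    return sum(cet_by_class[c] * vkt / total_vkt for c, vkt in vkt_by_class.items())


# Hypothetical class values, for illustration only.
cet_c = {"passenger": 0.90, "freight": 0.50}
vkt_c = {"passenger": 800.0, "freight": 200.0}
print(aggregate_cet(cet_c, vkt_c))       # weighted by VKT share
print(sum(cet_c.values()) / len(cet_c))  # unweighted mean, for contrast
```

When freight carries a sizeable VKT share and has a lower class-level CET, the weighted and unweighted figures diverge, which is the bias that treating all vehicles alike could hide.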

5. Conclusions and Future Work

This study develops a novel framework for reliable VKT estimation using high-resolution point-to-point vehicle trajectory data derived from commercial navigation services and provides empirical validation of its feasibility. To address the spatial limitations and information loss inherent in traditional zone-based OD analyses, representative paths were extracted from actual driving routes. The reliability of the estimation results was quantitatively assessed using the proposed CET metric.
The proposed framework demonstrated that a small set of representative paths accounts for a substantial proportion of the total VKT. The VKT estimates exhibited a MAPE of 6.3% and an MRE of +1.7%, together with a high correlation with national statistical benchmarks, supporting the statistical robustness of the approach despite limited data penetration. These findings indicate that trajectory data can serve as a viable alternative to conventional methods by providing a more detailed and dynamic representation of urban traffic patterns.
The main achievements and contributions of this study are as follows: First, the empirical analysis of Bucheon revealed that approximately 83.3% of the total VKT could be explained by the top five representative paths. This finding demonstrates that, under real traffic conditions, a small number of core routes account for a substantial share of travel demand, thereby confirming the efficiency of the representative path-based approach for VKT estimation. Second, the VKT estimates derived from representative paths showed high accuracy, with a MAPE of 6.3% (MRE +1.7%), compared with the 2023 national statistics. Third, the proposed CET metric objectively evaluates the explanatory power of representative paths by incorporating the proportion of actual VKT. CET serves as a practical evaluation criterion that complements existing traffic volume-oriented metrics and enhances the assessment of estimation reliability.
Despite these promising results, several limitations remain. The commercial navigation trajectory dataset lacks vehicle type information, which precludes the application of vehicle-type specific weights, as outlined in the theoretical definition of the CET metric. This limitation may restrict the ability to fully capture the VKT differences by vehicle class. Moreover, the validation in this study was based on annual statistical data, and therefore did not sufficiently capture hourly or seasonal fluctuations in traffic volume. Based on these findings, the following directions for future research are proposed.
(1) Refined temporal and vehicle-type analysis: Incorporate seasonal and time-of-day variations along with the distinct travel characteristics of different vehicle types using more granular trajectory data and vehicle classification methods.
(2) Dynamic network integration: Enhance the framework to reflect changes in the road network, such as new road construction or closures, to improve long-term predictive accuracy.
(3) Data fusion and correction techniques: Overcome the limitations of low penetration rates by combining trajectory data with other data sources and applying advanced correction methods.
(4) Enhanced representative path selection: Move beyond a purely traffic volume-centric approach toward a multi-criteria decision-making framework incorporating factors such as travel time and congestion levels.
A further avenue for future research is to explore how the framework can contribute to digital traffic management systems and carbon-neutral policy development. This would involve applying the proposed CET metric and representative path-based methodology across diverse urban contexts and datasets to verify its generalizability, and leveraging vehicle-type-specific VKT estimates to support evidence-based policy formulation.

Funding

This research was supported by the Ministry of Land, Infrastructure, and Transport and the Korea Agency for Infrastructure Technology Advancement (KAIA) in 2025 (G. No. RS-2025-00245781) as part of the third year of Advanced Technology Development for Enhancing Carbon Neutrality Reduction Strategies in the Transportation Sector. The author is grateful for the support.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study included commercial navigation trajectories and national statistical data. Commercial navigation trajectory data are not publicly available due to commercial licensing agreements and proprietary restrictions. However, the national statistical data (for example, vehicle registration and daily travel distance statistics) used for validation in this study are publicly accessible through official government websites and public data portals, such as those provided by Statistics Korea and the Ministry of Land, Infrastructure, and Transport of South Korea. All key findings and analytical results derived from the data are presented in this manuscript.

Acknowledgments

The author would like to express sincere gratitude to the Ministry of Land, Infrastructure, and Transport (MLIT) and the Korea Agency for Infrastructure Technology Advancement (KAIA) for their continued support and contributions that made this research possible.

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
CET: Coverage of estimated travel
CI: Confidence interval
FCD: Floating car data
GPS: Global positioning system
MAUP: Modifiable areal unit problem
MAPE: Mean absolute percentage error
MRE: Mean relative error
OD: Origin destination
ODD: Out-of-distribution
VKT: Vehicle kilometers traveled

References

  1. Korea Transport Institute. Guidelines for Greenhouse Gas Emissions Estimation in the Transport Sector. Korea Transport Database (KTDB). 2020 in Korean. Available online: https://www.ktdb.go.kr/ (accessed on 10 August 2025).
  2. Federal Highway Administration. 2017 National Household Travel Survey: Summary of Travel Trends. 2017; U.S. Department of Transportation: Washington, DC, USA, 2017.
  3. Hillel, B.-G. Traffic assignment by paired alternative segments. Trans. Rec. Part B 2010, 44, 1022–1046. [Google Scholar] [CrossRef]
  4. Kumar, A.; Peeta, S. Entropy weighted average method for the determination of a single representative path flow solution for the static user equilibrium traffic assignment problem. Trans. Rec. Part B 2015, 71, 213–229. [Google Scholar] [CrossRef]
  5. Buzzelli, M. Modifiable areal unit problem. Int. Encycl. Hum. Geogr. 2019, 4, 169–173. [Google Scholar] [CrossRef]
  6. Chen, X.; Ye, X.; Widener, M.J.; Delmelle, E.; Kwan, M.; Shannon, J.; Racine, E.F.; Adams, A.; Liang, L.; Jia, P. A systematic review of the modifiable areal unit problem in community food environmental research. Urban Inform. 2022, 1, 22. [Google Scholar] [CrossRef]
  7. Fan, J.; Fu, C.; Stewart, K.; Zhang, L. Using big GPS trajectory data analytics for vehicle miles traveled estimation. Trans. Rec. Part C 2019, 103, 298–307. [Google Scholar] [CrossRef]
  8. Gurram, S.; Sivaraman, V.; Apple, J.T.; Pinjari, A.R. Agent-based modeling to simulate road travel using big data from smartphone GPS: An application to the continental United States. In Proceedings of the 2019 International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3553–3562. [Google Scholar]
  9. Aslanyan, T.; Jiang, S. Examining passenger vehicle miles traveled and carbon emissions in the Boston metropolitan area. In Urban Informatics and Future Cities; Geertman, S., Pettit, C., Goodspeed, R., Staffans, A., Eds.; Springer Nature: Cham, Switzerland, 2021; pp. 319–340. [Google Scholar] [CrossRef]
  10. Sunderrajan, A.; Viswanathan, V.; Cai, W.; Knoll, A. Traffic state estimation using floating car data. Procedia Comput. Sci. 2016, 80, 2008–2018. [Google Scholar] [CrossRef]
  11. Wang, T.; Huang, S.; Bao, Z.; Culpepper, J.S.; Arablouei, R. Representative routes discovery from massive trajectories. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Walter E. Washington Convention Center, Washington, DC, USA, 14–15 August 2022; pp. 4059–4069. [Google Scholar]
  12. Nanne, J.; Van Der Zijpp, N. Dynamic origin-destination matrix estimation from traffic counts and automated vehicle identification data. Trans. Res. Rec. 1997, 1607, 87–94. [Google Scholar]
  13. McNally, M.G. The Four-Step Model; UC Irvine Institute of Transportation Studies, Center for Activity Systems Analysis, University of California: Irvine, CA, USA, 2008. [Google Scholar]
  14. Osorio, C. Dynamic origin-destination matrix calibration for large-scale network simulators. Trans. Rec. Part C 2019, 98, 186–206. [Google Scholar] [CrossRef]
  15. Yu, H.; Zhu, S.; Yang, J.; Guo, Y.; Tang, T. A Bayesian method for dynamic origin-destination demand estimation synthesizing multiple sources of data. Sensors 2021, 21, 4971. [Google Scholar] [CrossRef]
  16. Englezou, Y.; Timotheou, S.; Panayiotou, C.G. Dynamic origin-destination matrix estimation for networks operating under free-flow conditions using macroscopic flow dynamics. IFAC-PapersOnLine 2024, 58, 213–218. [Google Scholar] [CrossRef]
  17. Ma, W.; Pi, X.; Qian, S. Estimating multi-class dynamic origin-destination demand through a forward-backward algorithm on computational graphs. Trans. Res. Part C 2020, 119, 102747. [Google Scholar] [CrossRef]
  18. Ramussen, T.K.; Duncan, L.C.; Watling, D.P.; Nielsen, O.A. Local detourness: A new phenomenon for modeling route choice and traffic assignment. Trans. Res. Part B 2024, 190, 103052. [Google Scholar] [CrossRef]
  19. Tang, W.; Chen, J.; Sun, C.; Wang, H.; Li, G. Traffic demand estimations considering route trajectory reconstruction in congested networks. Algorithms 2022, 15, 307. [Google Scholar] [CrossRef]
  20. Mazimpaka, J.D.; Timpf, S. Trajectory data mining: A review of methods and applications. J. Spat. Inf. Sci. 2016, 13, 61–99. [Google Scholar] [CrossRef]
  21. Feng, Z.; Zhu, Y. A survey on trajectory data mining: Techniques and applications. IEEE Access 2016, 4, 2056–2067. [Google Scholar] [CrossRef]
  22. Wang, D.; Miwa, T.; Morikawa, T. Big trajectory data mining: A survey of methods, applications, and services. Sensors 2020, 20, 4571. [Google Scholar] [CrossRef] [PubMed]
  23. Toole, J.L.; Colak, S.; Sturt, B.; Alexander, L.P.; Evsukoff, A.; Gonzalez, M.C. The path most traveled: Travel demand estimation using big data resources. Trans. Res. Part B 2015, 58, 162–177. [Google Scholar] [CrossRef]
  24. Chong, K. Spatiotemporal influence analysis through traffic speed pattern analysis using spatial classification. Appl. Sci. 2025, 15, 196. [Google Scholar] [CrossRef]
  25. Kim, H.M.; Nam, D.; Cheon, S. Determination of representative path set from vehicle trajectory samples. J. Comput. Civ. Eng. 2016, 30, 04015052. [Google Scholar] [CrossRef]
  26. Liu, H.; Han, J.; Fu, Y.; Li, Y.; Chen, K.; Xiong, H. Unified route representation learning for multi-modal transportation recommendation with spatiotemporal pretraining. VLDB J. 2022, 32, 325–342. [Google Scholar] [CrossRef]
  27. Lu, C.C.; Mahmassani, H.S.; Zhou, X. A bi-criterion dynamic user equilibrium traffic assignment model and solution algorithm for evaluating dynamic road pricing strategies. Trans. Rec. Part C 2008, 16, 371–389. [Google Scholar] [CrossRef]
  28. Lan, S. Path optimization and logistics economic benefits based on sparsely sampled GPS trajectory data. Mob. Inf. Syst. 2022, 2022, 3350120. [Google Scholar] [CrossRef]
  29. Hu, Y.; Huang, H.; Lee, J.; Yuan, C.; Zou, G. A high-resolution trajectory data driven method for real-time evaluation of traffic safety. Accid. Anal. Prev. 2022, 165, 106503. [Google Scholar] [CrossRef]
  30. Wang, Y.; Qin, K.; Chen, Y.; Zhao, P. Detecting anomalous trajectories and behavior patterns using hierarchical clustering from taxi GPS data. ISPRS Int. J. Geo-Inf. 2018, 7, 25. [Google Scholar] [CrossRef]
  31. Liu, L.; Han, K.; Chen, X.; Ong, G.P. Spatial-temporal inference of urban traffic emissions based on taxi trajectories and multi-source urban data. Trans. Res. Part C 2019, 106, 145–165. [Google Scholar] [CrossRef]
  32. Sun, Z.; Hao, P.; Ban, X.; Yang, D. Trajectory-based vehicle energy/emissions estimation for signalized arterials using mobile sensing data. Trans. Res. Part D 2015, 34, 27–40. [Google Scholar] [CrossRef]
  33. Wang, X.; Jerome, Z.; Zhang, C.; Shen, S.; Kumar, V.V.; Liu, H.X. Trajectory data processing and mobility performance evaluation for urban traffic networks. Trans. Res. Rec. 2022, 2677, 355–370. [Google Scholar] [CrossRef]
  34. Aani, C.; Bhaskar, A.; Haque, M. Bi-level clustering of vehicle trajectories for path choice set and its nested structure identification. Trans. Res. Part C 2022, 144, 103895. [Google Scholar] [CrossRef]
  35. Kim, M.; Kwak, B.L.; Hou, J.; Kim, T. Robust long-term vehicle trajectory prediction using link projection and a situation-aware transformer. Sensors 2024, 24, 2398. [Google Scholar] [CrossRef]
  36. Yang, X.; Lu, Y.; Hao, W. Origin-destination estimation using probe vehicle trajectory and link counts. J. Adv. Transp. 2017, 2017, 4341532. [Google Scholar] [CrossRef]
  37. Cao, Y.; Tang, K.; Sun, J.; Ji, Y. Day-to-day dynamic origin-destination flow estimation using connected vehicle trajectories and automatic vehicle identification data. Trans. Rec. Part C 2021, 129, 103241. [Google Scholar] [CrossRef]
  38. Chen, Z.; Huang, H.-Z.; Deng, Z.; Wu, J. Shrinkage mamba relation network with out-of-distribution data augmentation for rotating machinery fault detection and localization under zero-faulty data. Mech. Syst. Signal Process 2025, 224, 112145. [Google Scholar] [CrossRef]
  39. Chen, Z.; Huang, H.-Z.; Wu, J.; Wang, Y. Zero-faulty sample machinery fault detection via relation network with out-of-distribution data augmentation. Eng. Appl. Artif. Intell. 2025, 141, 109753. [Google Scholar] [CrossRef]
Figure 1. Overall framework of the proposed research methodology.
Figure 1. Overall framework of the proposed research methodology.
Applsci 15 10325 g001
Figure 2. Spatial location of the study area.
Figure 2. Spatial location of the study area.
Applsci 15 10325 g002
Figure 3. Changes in cumulative vehicle kilometers traveled (VKT) and coverage of estimated travel (CET) as a function of the number of representative paths.
Figure 3. Changes in cumulative vehicle kilometers traveled (VKT) and coverage of estimated travel (CET) as a function of the number of representative paths.
Applsci 15 10325 g003
Figure 4. Major roads in Bucheon.
Figure 4. Major roads in Bucheon.
Applsci 15 10325 g004
Figure 5. Variation in CET with the number of representative routes.
Figure 5. Variation in CET with the number of representative routes.
Applsci 15 10325 g005
Figure 6. Variation in observed VKT with sample penetration rate.
Figure 6. Variation in observed VKT with sample penetration rate.
Applsci 15 10325 g006
Figure 7. Relationship between VKT from national statistics and VKT from trajectory data.
Figure 7. Relationship between VKT from national statistics and VKT from trajectory data.
Applsci 15 10325 g007
Figure 8. Q–Q plot of residuals for the normality test.
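The Q–Q plot in Figure 8 pairs ordered residuals with quantiles of the normal distribution. The sketch below computes those pairs, assuming the residuals are standardized by their sample mean and standard deviation and using the (i - 0.5)/n plotting positions; other conventions shift the points slightly. `normal_qq_points` and the residual values are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(residuals: list[float]) -> list[tuple[float, float]]:
    """(theoretical, sample) quantile pairs for a normal Q-Q plot.

    Residuals are standardized with the sample mean and standard deviation;
    theoretical quantiles use plotting positions (i - 0.5) / n, one of
    several common conventions.
    """
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least 3 residuals")
    mu, sd = fmean(residuals), stdev(residuals)
    if sd == 0:
        raise ValueError("residuals have zero variance")
    sample = sorted((r - mu) / sd for r in residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Hypothetical residuals (trajectory VKT minus national-statistics VKT).
points = normal_qq_points([-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 0.7])
for q_theory, q_sample in points:
    print(f"{q_theory:+.3f}  {q_sample:+.3f}")
```

Points close to the line y = x are consistent with normally distributed residuals; a formal test such as `scipy.stats.shapiro` could complement the visual check.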
Figure 9. Bootstrap-based CET distribution and 95% confidence interval.
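Figure 9 reports a bootstrap distribution of CET with a 95% confidence interval. One plausible reconstruction is sketched below, assuming trips are resampled with replacement, routes are re-ranked in each replicate, and a percentile interval is taken. The trip data and the helpers `cet_top_routes` and `bootstrap_cet_ci` are hypothetical; the number of replicates and the seed are arbitrary choices.

```python
import random
from collections import defaultdict


def cet_top_routes(trips: list[tuple[str, float]], n_routes: int) -> float:
    """CET of the n_routes routes with the largest cumulative distance.

    trips: (route_id, distance_km) pairs, one per trip.
    """
    route_vkt = defaultdict(float)
    for route_id, dist in trips:
        route_vkt[route_id] += dist
    top = sorted(route_vkt.values(), reverse=True)[:n_routes]
    return sum(top) / sum(route_vkt.values())


def bootstrap_cet_ci(trips, n_routes, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for CET.

    Trips are resampled with replacement, and the representative routes
    are re-ranked in every replicate.
    """
    if not trips or any(d <= 0 for _, d in trips):
        raise ValueError("need trips with positive distances")
    rng = random.Random(seed)
    reps = sorted(
        cet_top_routes(rng.choices(trips, k=len(trips)), n_routes)
        for _ in range(n_boot)
    )
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx  # symmetric tails
    return cet_top_routes(trips, n_routes), (reps[lo_idx], reps[hi_idx])


# Hypothetical trips: (route ID, distance in km).
trips = ([("R1", 12.0)] * 40 + [("R2", 8.0)] * 30
         + [("R3", 3.0)] * 20 + [("R4", 1.5)] * 10)
est, (lo, hi) = bootstrap_cet_ci(trips, n_routes=2)
print(f"CET = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Re-ranking inside each replicate lets the interval reflect uncertainty in which routes are representative. Because the top-k selection maximizes CET in every replicate, this choice can bias replicates slightly upward relative to a route set fixed from the full sample.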
Figure 10. Variation in mean absolute percentage error (MAPE) with penetration rate.
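Figure 10 tracks MAPE as the penetration rate changes. The sketch below assumes that sample VKT is expanded to the full fleet by dividing by the penetration rate and that MAPE is computed against reference (national-statistics) VKT; the paper's actual expansion procedure may differ. `expand_sample`, `mape_by_rate`, and all values are hypothetical.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    errors = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(errors) / len(actual)


def expand_sample(observed_vkt: float, penetration_rate: float) -> float:
    """Scale sample VKT to the full fleet by 1 / penetration rate (assumed)."""
    if not 0 < penetration_rate <= 1:
        raise ValueError("penetration rate must be in (0, 1]")
    return observed_vkt / penetration_rate


def mape_by_rate(reference: list[float],
                 observed_by_rate: dict[float, list[float]]) -> dict[float, float]:
    """MAPE of expanded trajectory VKT against reference VKT, per rate."""
    return {
        rate: mape(reference, [expand_sample(o, rate) for o in observed])
        for rate, observed in observed_by_rate.items()
    }


# Hypothetical reference VKT and sample observations at two penetration rates.
reference = [100.0, 200.0]
observed = {0.1: [11.0, 18.0], 0.5: [52.0, 98.0]}
print(mape_by_rate(reference, observed))
```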
Table 1. Bucheon city vehicle registration and daily travel distance data (2023).
| Data Source | Statistical Item | Value | Remarks |
|---|---|---|---|
| Statistics Korea (Bucheon) | Number of Registered Vehicles | Approximately 350,000 vehicles | As of May 2023 |
| Ministry of Land, Infrastructure and Transport | Average Daily Driving Distance by Vehicle Type (Nationwide/Capital Area) | Approximately 40 km/day | Average value based on 2023 data |
| This study (Estimate) | Estimated Total Daily Driving Distance | Approximately 14 million km/day | Calculated as 350,000 vehicles × 40 km/day |
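The estimate in the last row of Table 1 is the product of two approximate inputs. A minimal sketch of that arithmetic follows, assuming every registered vehicle is driven the average daily distance; `total_daily_vkt` is a hypothetical helper.

```python
# Table 1 arithmetic: registered vehicles x average daily distance per vehicle.
REGISTERED_VEHICLES = 350_000      # Bucheon, as of May 2023 (approximate)
AVG_DAILY_KM_PER_VEHICLE = 40.0    # 2023 average, km/day (approximate)


def total_daily_vkt(vehicles: int, km_per_vehicle: float) -> float:
    """Total vehicle kilometers traveled per day, assuming every vehicle
    is driven the average daily distance."""
    if vehicles < 0 or km_per_vehicle < 0:
        raise ValueError("inputs must be non-negative")
    return vehicles * km_per_vehicle


estimate = total_daily_vkt(REGISTERED_VEHICLES, AVG_DAILY_KM_PER_VEHICLE)
print(f"{estimate:,.0f} km/day")  # → 14,000,000 km/day
```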
Table 2. Bucheon city vehicle registration and daily travel distance data (based on 2023 data).
| Statistical Item | Data Composition and Characteristics |
|---|---|
| OD data | The dataset consists of over 90,000 point-to-point OD pairs, each including information on origin, destination, travel time, and travel route. |
| Trajectory data | Road link sequences are provided for each OD pair, allowing total travel distance to be estimated by summing the lengths of the constituent links. |
| Link information | Link IDs and their corresponding lengths are defined in the proprietary navigation road network, which comprehensively represents the local road infrastructure of Bucheon. |
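Table 2 notes that total travel distance is obtained by summing the lengths of the links in each trip's link sequence. A minimal standard-library sketch of that join-and-sum step follows, assuming link lengths are stored in meters and each trip represents one vehicle; the link IDs, lengths, and function names are hypothetical. With more than 90,000 OD pairs, the same logic would more likely be expressed as a pandas merge followed by a group-wise sum.

```python
from collections.abc import Iterable, Mapping


def trip_length_m(link_sequence: Iterable[str],
                  link_length_m: Mapping[str, float]) -> float:
    """Sum the lengths (m) of the links a trip traverses.

    A link ID missing from the link table raises KeyError, so gaps between
    the trajectory data and the road network surface instead of being
    silently counted as zero.
    """
    return sum(link_length_m[link_id] for link_id in link_sequence)


def total_vkt_km(trips: Mapping[str, list[str]],
                 link_length_m: Mapping[str, float]) -> float:
    """Total travel distance (km) over all trips, one vehicle per trip."""
    total_m = sum(trip_length_m(seq, link_length_m) for seq in trips.values())
    return total_m / 1000.0


# Hypothetical link table (link ID -> length in m) and trip link sequences.
links = {"L1": 1200.0, "L2": 850.0, "L3": 2300.0, "L4": 400.0}
trips = {
    "trip_a": ["L1", "L2", "L3"],  # 4,350 m
    "trip_b": ["L3", "L4"],        # 2,700 m
}
print(total_vkt_km(trips, links))  # → 7.05
```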
Table 3. Data analysis environment and analysis period.
| Category | Detailed Description |
|---|---|
| Analysis tools | Python 3.11 (Pandas, GeoPandas), QGIS 3.36, PostgreSQL 14 (PostGIS) |
| System environment | Intel Core i9, 64 GB RAM, Windows 11 |
| Analysis period | May to July 2025 |
Table 4. Summary of trajectory data composition.
| Data Field | Description |
|---|---|
| Date of travel | Date on which the trip was made (e.g., 1 January 2025) |
| Origin information | Names of the origin city/province and district |
| Destination information | Destination name and coordinates (X, Y) |
| Destination address | Street and lot number addresses of the destination |
| Arrival times | Arrival times for each destination (up to 3 destinations per trip) |
| Next destination details | Name of the next destination and the number of subsequent arrivals |
| Data period | Year and month of the trip (e.g., January 2025) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, C. Coverage-Based Framework for Estimating Total Vehicle Travel Distance Using Point-to-Point Trajectory Data. Appl. Sci. 2025, 15, 10325. https://doi.org/10.3390/app151910325

AMA Style

Yang C. Coverage-Based Framework for Estimating Total Vehicle Travel Distance Using Point-to-Point Trajectory Data. Applied Sciences. 2025; 15(19):10325. https://doi.org/10.3390/app151910325

Chicago/Turabian Style

Yang, Choongheon. 2025. "Coverage-Based Framework for Estimating Total Vehicle Travel Distance Using Point-to-Point Trajectory Data" Applied Sciences 15, no. 19: 10325. https://doi.org/10.3390/app151910325

APA Style

Yang, C. (2025). Coverage-Based Framework for Estimating Total Vehicle Travel Distance Using Point-to-Point Trajectory Data. Applied Sciences, 15(19), 10325. https://doi.org/10.3390/app151910325

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
