1. Introduction
The accelerated pace of urbanization and the continuous growth of population have rendered road network data increasingly critical for the construction of integrated transportation systems and the enhancement of infrastructure connectivity in metropolitan regions [1]. As fundamental components of geographic information systems (GIS) and intelligent transportation systems (ITS), road networks contribute significantly to improving the quality of urban life and facilitating urban–rural integration [2,3]. Moreover, accurate, high-quality road networks serve as a foundational reference for urban spatial analysis, transportation planning, and location-based services [4,5]. Accordingly, the automatic construction and timely updating of road networks have become central tasks in transportation geographic information science. Nevertheless, trajectory-driven road map updating remains challenging due to noisy and irregular sampling, heterogeneous data modalities (e.g., pedestrian vs. vehicle), and the need to preserve network topology at the city scale.
With the widespread availability of GPS-enabled mobile devices, crowdsourced trajectory data, such as pedestrian or vehicle traces, have emerged as a rich source of spatial–temporal information. Compared to static surveying data and remote-sensing imagery, these trajectories exhibit denser spatial coverage, richer semantic context, and fine-grained temporal characteristics [6,7]. Consequently, an increasing number of studies have sought to extract road networks directly from such data by leveraging their inherent mobility patterns and structural features. Recent advancements have expanded the scope from one-shot geometry extraction to spatial–semantic road map construction using trajectories and geo-tagged content [8], and to topology-aware representations that explicitly embed trajectories in road-network space [9]. Turning raw trajectories into topology-ready networks, however, requires models that capture both localized movement organization and broader spatial context while remaining robust across cities and modalities. In this study, we focus on generating a topology-preserving, vectorization-ready road skeleton from trajectories, rather than attribute enrichment or trajectory–network alignment.
Various methods have been proposed for road network extraction using GNSS trajectory data. For example, Cao and Sun [10] and Yang et al. [11] utilized Delaunay triangulation to derive road centerlines and boundaries, demonstrating strong adaptability to varying trajectory densities [12]. Zhou et al. [13] constructed pedestrian density maps from walking traces and applied discrete Morse theory [14] to extract “ridge lines” as indicative of pedestrian pathways. Guo et al. [15] enhanced the compactness of the extracted network by adjusting trajectory distributions. Yang et al. [16] further proposed a multi-scale fractal analysis combined with connected-component filtering to differentiate between random and goal-directed walking behaviors. Further, Yang et al. [17] developed the human-flow probability field (HFPF) approach and combined it with hydrological modeling to effectively delineate both primary and secondary pedestrian paths. Complementing these density- and field-based strategies, hybrid incremental pipelines that fuse spatial density with temporal continuity have also been proposed to progressively update road networks from spatio-temporal trajectories [18]. In recent studies, Tang et al. [19,20] proposed an outdoor hiking road network construction method that uses trajectory density stratification and kernel density estimation to generate a 2D road network and fuses elevation data for 3D extension. The method employs rasterized density maps, direction-constrained density clustering, and adaptive thresholding to extract pedestrian areas and intersections, building a complete 3D outdoor pedestrian road network framework. Meanwhile, Dal Poz and Morceli [21] focused on smartphone-based GPS trajectories, using kernel density estimation, morphological skeletonization, and Voronoi diagrams to extract roads in mixed urban–rural environments. These density- and field-based approaches can reveal corridor-like structures from global statistics, but they may under-utilize neighborhood interactions and often require additional regularization to ensure topological consistency. Specifically, the classification of a grid cell should be constrained by its neighbors; for instance, a cell with moderate trajectory density is more likely to be part of a road if its adjacent cells exhibit consistent heading directions, forming a coherent linear flow rather than isolated noise.
In addition to global density-based methods, clustering techniques have also been widely used. For example, fuzzy C-means clustering with velocity constraints was used to distinguish adjacent road segments [22], while B-spline curves were applied to smooth road geometries, albeit with high computational costs and sensitivity to local anomalies [23]. Kasemsuppakorn and Karimi [24] extracted azimuth and speed-based key points for simplified road generation via Partitioning Around Medoids (PAM), a method also adopted by Wu et al. [25]. Automatic road network construction from massive GPS trajectory data has also been demonstrated using scalable clustering/inference pipelines [26]. Furthermore, Stanojević et al. [27] proposed a two-phase method that clusters trajectory points for intersection detection and groups segments based on directional and speed similarities. Xie et al. [28] introduced a density-based clustering method that integrates similarity metrics across position, speed, direction, and angle for pedestrian network inference. Recently, Buchin et al. [29] automatically constructed high-precision road network maps from chaotic GPS trajectory data by clustering and optimizing paths using multi-width trajectory clustering. Such clustering approaches have proven effective even for modality-specific analyses, for example, mining spatial patterns and road-type preferences from crowdsourced cycling traces [30]. Clustering- and similarity-based methods capture local movement organization but can be sensitive to noise and parameterization, and typically need explicit mechanisms to enforce connectivity and suppress spurious branches.
Intersection-aware approaches have also drawn significant attention. Karagiorgou and Pfoser [31] identified intersections via abrupt changes in speed and direction and then connected them into a coherent network. Direction-ratio statistics have been exploited explicitly for intersection detection from trajectories [32], and Yuan et al. [33] advanced this line of work by detecting intersections and lane geometry changes, using principal curves for lane centerlines and Gaussian mixture models for topology inference. Shen et al. [34] incorporated cycling behavior to locate intersections and applied shape-aware curve fitting for turn path reconstruction. Zhang et al. [5] proposed a virtual representative point and CFDP clustering framework combined with Delaunay triangulation to improve intersection extraction. Lyu et al. [35] developed the Motion-Aware Map Construction (MAMC) approach by clustering turning points and trajectory segments, while Wang et al. [36] decomposed large interchanges into smaller intersections to simplify road network generation. Furthermore, Jiao et al. [37] employed forward and backward trajectory tracking mechanisms to identify divergence and convergence points in interchanges, effectively addressing the challenge of detecting false intersections in multi-layer complex road networks. While these intersection- and lane-level models improve junction fidelity, their cross-modal and cross-city generalization remains difficult without hierarchical semantics and neighborhood-aware context.
Despite the diversity and technical sophistication of these methods, three limitations persist in current trajectory-based road network extraction techniques. First, many studies emphasize point-level semantics or global density fields, while the hierarchical coupling of intra-cell semantics and inter-cell context remains under-explored. Second, generalization across modalities and cities is often limited when sampling density, noise, and movement behaviors change substantially. Third, topology artifacts (e.g., block-like clusters and spurious short branches) can degrade the connectivity and vectorization-readiness of extracted networks, and reproducible end-to-end evaluation protocols are not always clearly documented.
To address these challenges, this study proposes a novel framework for trajectory-based road network extraction that couples multi-level grid features with supervised learning. Specifically, we transform trajectory data into a structured grid space, enabling the construction of both intra-grid and inter-grid semantic indicators, such as convex-hull density, direction clustering, and neighborhood density differences. A Random Forest classifier is trained to identify key grids that most likely correspond to road segments. Finally, an improved morphological thinning algorithm is applied to extract a topologically coherent single-pixel road network, followed by structural refinement to eliminate noise and discontinuities. In this way, the pipeline combines hierarchical grid semantics, supervised key-grid detection, and topology-aware refinement to provide a scalable and interpretable route from raw trajectories to vectorization-ready networks.
The main contributions of this work are as follows:
(1) We design a hierarchical grid-based representation that couples intra-grid movement structure (dispersion, directionality, and segment heterogeneity) with inter-grid neighborhood continuity (density gradients), enabling trajectory semantics to be modeled beyond point-level features and purely global density fields.
(2) We formulate candidate road-region discovery as a key-grid binary classification problem and train a Random Forest model on pedestrian trajectories, which is then directly transferred to vehicle trajectories in other cities to assess cross-modal and cross-city generalization under heterogeneous sampling and urban layouts.
(3) We develop a topology-oriented reconstruction pipeline that combines morphological closing, four-neighborhood thinning with artifact correction, and Kalman smoothing to produce a single-pixel, vectorization-ready road skeleton, reducing grid-induced staircasing while preserving junction structures for downstream GIS network analysis.
(4) We provide a reproducible end-to-end evaluation setup based on buffer matching against OSM centerlines and report both key-grid detection performance (Precision/Recall/F1) and network extraction quality (correctly extracted length and length-based precision) across multiple zones, facilitating fair comparison and replication.
The remainder of this paper is organized as follows.
Section 2 describes the proposed framework, including trajectory preprocessing, multi-level grid feature construction, supervised key-grid detection, and morphology-based reconstruction with smoothing.
Section 3 reports experimental settings, evaluation protocols, and results on within-city testing and cross-city/cross-modal transfer.
Section 4 discusses the implications, limitations, and potential extensions.
Section 5 concludes the paper.
2. Methods
This section presents a supervised-learning framework for trajectory-based road network extraction. The overall pipeline is summarized in Algorithm 1, which consists of four main stages: (i) trajectory preprocessing and grid indexing (Steps 1–2), (ii) coupled multi-level grid feature construction (Steps 3–5), (iii) key-grid detection via Random Forest classification (Step 6), and (iv) morphology-aware reconstruction for a topology-preserving road skeleton (Steps 7–10). The detailed implementation of each stage is elaborated in the following subsections.
Algorithm 1: Coupled multi-level grid semantics for trajectory-driven road network extraction
Input: Spatial extent; grid size: 200 × 200; DBSCAN parameters: angular threshold and minimum number of points; label file containing grid-type annotations {sequence index, type}
Output: Refined road network skeleton in raster and vector formats
01: Step 1: Preprocess raw trajectories and construct trajectory segments
02:   Remove duplicate points and filter out outliers;
03:   Construct trajectory segments;
04:   Compute direction vectors for consecutive trajectory points.
05: Step 2: Partition the study area into uniform grid cells and index trajectory points by cell
06: Step 3: Compute intra-grid features
07:   For each grid cell
08:     F1: number of trajectory points
09:     If a convex hull can be constructed from the points in the cell
10:       Compute the convex hull and the minimum bounding circle
11:       F2: HMC index
12:       F3: convex-hull area
13:     Otherwise
14:       Set F2 = 0 and F3 = 0
15:     End if
16:     F4: number of directional clusters
17:     F5: trajectory point density
18:     F6: DTW-based heterogeneity index computed from pairwise DTW distances between trajectory groups
19:   End for
20: Step 4: Compute neighborhood features
21:   For each grid cell
22:     ∆den8: density differences between the cell and its eight neighborhood cells
23:     ∆dir4: directional density differences in four principal directions
24:   End for
25: Step 5: Construct 18-dimensional grid-level feature vector
26:   [F1, …, F6, ∆den8, ∆dir4]
27: Step 6: Apply Random Forest classification to identify key and non-key grid cells
28: Step 7: Convert the classified key grids into a binary raster image
29: Step 8: Perform morphology-aware optimization
30:   Remove small connected components with fewer than 100 pixels
31:   Apply morphological closing using a 9-pixel cross-shaped structuring element
32:   Extract the road skeleton using an improved 4-neighborhood thinning algorithm
33: Step 9: Smooth the skeleton using the Kalman filter
34: Step 10: Transform the skeleton back to geographic coordinates and output the final road network
2.1. Trajectory Data Preprocessing and Grid Indexing
Pedestrian trajectories, recorded by GPS-enabled devices, contain timestamped geographic coordinates that reflect human movement paths. However, GPS signals are often affected by multipath effects and signal delays in dense urban or indoor environments. Therefore, it is essential to preprocess raw trajectory data to ensure positional consistency and reduce noise.
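Each GPS trajectory segment is defined as:
$$Tr_i = \{\, p_{i,j} = (x_{i,j},\, y_{i,j},\, t_{i,j}) \mid j = 1, \dots, n_i \,\}, \quad i = 1, \dots, N,$$
where $p_{i,j}$ denotes the $j$-th point of the $i$-th trajectory segment; $x_{i,j}$, $y_{i,j}$, and $t_{i,j}$ denote longitude, latitude, and timestamp, respectively; $n_i$ is the number of points in segment $i$; and $N$ is the total number of trajectory segments.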
The preprocessing stage involves three sequential steps to prepare the raw trajectory data for subsequent analysis. First, redundant points and abnormal trajectory segments (those with unrealistic distance or speed values) are removed through the construction of Euclidean distance and velocity matrices. Second, direction vectors are computed for each trajectory point based on the displacement between temporally adjacent observations, capturing local movement orientation. Finally, the study area is partitioned into uniform square grids $G = \{g_1, \dots, g_K\}$, where $K$ is the total number of grid cells. Each grid cell thus serves as a localized spatial unit for further multi-level feature extraction. The subset of points contained within $g_k$ is denoted as $P_k = \{p_{i,j}^{k}\}$, where $p_{i,j}^{k}$ represents the $j$-th trajectory point of the $i$-th segment located within grid $g_k$; the points are organized according to their respective trajectory segment identifiers.
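The three preprocessing steps can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the speed limit `max_speed_mps`, the haversine distance (swapped in for the Euclidean distance matrix mentioned above), and all function names are assumptions introduced for the example.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres (assumed stand-in for the distance matrix)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def clean_segment(points, max_speed_mps=3.0):
    """points: list of (lon, lat, t) sorted by t. Drops redundant points and speed outliers.

    max_speed_mps is a hypothetical pedestrian limit, not a value from the paper.
    """
    kept = []
    for lon, lat, t in points:
        if kept:
            plon, plat, pt = kept[-1]
            dt = t - pt
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            dist = haversine_m(plon, plat, lon, lat)
            if dist == 0.0 or dist / dt > max_speed_mps:
                continue  # redundant position or unrealistic speed
        kept.append((lon, lat, t))
    return kept

def direction_vectors(points):
    """Unit (east, north) displacement between temporally adjacent points."""
    dirs = []
    for (lon1, lat1, _), (lon2, lat2, _) in zip(points, points[1:]):
        dx = (lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
        dy = lat2 - lat1
        norm = math.hypot(dx, dy)
        dirs.append((dx / norm, dy / norm) if norm > 0 else (0.0, 0.0))
    return dirs

def grid_index(lon, lat, extent, n=200):
    """Map a point to (row, col) in an n x n grid over extent = (lon_min, lat_min, lon_max, lat_max)."""
    lon_min, lat_min, lon_max, lat_max = extent
    col = min(int((lon - lon_min) / (lon_max - lon_min) * n), n - 1)
    row = min(int((lat - lat_min) / (lat_max - lat_min) * n), n - 1)
    return row, col

def bin_points(segments, extent, n=200):
    """Group points by grid cell, then by trajectory segment identifier."""
    cells = defaultdict(lambda: defaultdict(list))
    for seg_id, pts in enumerate(segments):
        for p in pts:
            cells[grid_index(p[0], p[1], extent, n)][seg_id].append(p)
    return cells
```

Keeping the per-cell points keyed by segment identifier mirrors the organization described above and makes the later segment-level comparisons within a cell straightforward.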
2.2. Coupled Multi-Level Grid Feature Construction
Trajectory semantics are inherently hierarchical: points interact within a cell, reflecting local dispersion and directionality, while road continuity emerges from neighborhood context. We therefore constructed a coupled feature set that integrates internal grid features, which describe within-cell geometry and movement organization, with neighborhood features, which characterize between-cell density continuity. For each grid cell, 18 features were computed, including six intra-grid features and twelve neighborhood features. The six intra-grid features include the number of trajectory points, the HMC index, convex-hull area, the number of directional clusters, trajectory point density, and the DTW-based heterogeneity index. The twelve neighborhood features consist of eight neighborhood density-difference features and four directional density-difference features.
2.2.1. Internal Grid Feature Indices
To capture the intrinsic geometric and semantic characteristics of trajectory distribution within each grid cell, a series of internal grid feature indices is computed. These indices reflect both spatial dispersion and directional complexity of movement patterns.
First, the convex-hull geometry is computed from the set of trajectory points in each grid. The hull area $A_{hull}$, hull perimeter $L_{hull}$, and hull-based point density $\rho_{hull}$ (i.e., number of points per unit hull area) are used to quantify spatial compactness and dispersion.
In addition, two standard geometric enclosures are derived: the minimum bounding circle, characterized by its area $A_{circ}$ and circumference $L_{circ}$, and the minimum bounding rectangle, represented by its area $A_{rect}$. These structures help evaluate the shape and alignment regularity of trajectory clusters.
To enhance feature differentiation, two shape ratio indices are constructed, named HMR (Hull-to-Rectangle Ratio) and HMC (Hull-to-Circle Ratio), respectively. The calculation formulas are as follows:
$$HMR = \frac{A_{hull}}{A_{rect}}, \qquad HMC = \frac{A_{hull}}{A_{circ}}.$$
These normalized ratios help assess how tightly the trajectory cluster conforms to different bounding shapes, indicating movement constraints or structural forms.
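The shape descriptors above can be illustrated with a short standard-library sketch. It assumes that both HMR and HMC are area ratios (hull area over rectangle area and over circle area, respectively), that the minimum bounding rectangle is the minimum-area rotated rectangle, and that coordinates are planar; the function names are hypothetical.

```python
import math
import random

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2

def min_rect_area(hull):
    """Minimum-area enclosing rectangle: one side is collinear with some hull edge."""
    best = math.inf
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        ang = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(ang), math.sin(ang)
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        best = min(best, (max(us) - min(us)) * (max(vs) - min(vs)))
    return best

def min_circle_radius(points, seed=0):
    """Incremental (Welzl-style) minimum enclosing circle; returns its radius."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    def inside(c, p):
        return math.hypot(p[0] - c[0], p[1] - c[1]) <= c[2] * (1 + 1e-12) + 1e-12
    def from2(a, b):
        return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)
    def from3(a, b, c):
        d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
        if abs(d) < 1e-18:  # collinear: fall back to the widest pair
            return max((from2(a, b), from2(a, c), from2(b, c)), key=lambda t: t[2])
        sa, sb, sc = (a[0] ** 2 + a[1] ** 2), (b[0] ** 2 + b[1] ** 2), (c[0] ** 2 + c[1] ** 2)
        ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
        uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
        return (ux, uy, math.dist((ux, uy), a))
    c = (pts[0][0], pts[0][1], 0.0)
    for i, p in enumerate(pts):
        if inside(c, p):
            continue
        c = (p[0], p[1], 0.0)
        for j in range(i):
            if inside(c, pts[j]):
                continue
            c = from2(p, pts[j])
            for k in range(j):
                if not inside(c, pts[k]):
                    c = from3(p, pts[j], pts[k])
    return c[2]

def shape_features(points):
    """Hull area, hull-based point density, HMR, and HMC for one grid cell."""
    hull = convex_hull(points)
    if len(hull) < 3:  # fewer than three non-collinear points: no hull
        return {"hull_area": 0.0, "hull_density": 0.0, "HMR": 0.0, "HMC": 0.0}
    area = polygon_area(hull)
    radius = min_circle_radius(hull)
    return {
        "hull_area": area,
        "hull_density": len(points) / area,
        "HMR": area / min_rect_area(hull),
        "HMC": area / (math.pi * radius ** 2),
    }
```

For a square cluster, for instance, this sketch gives HMR = 1 and HMC = 2/π, whereas an elongated, corridor-like cluster keeps a high HMR but yields a much smaller HMC.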
To quantify directional coherence, an improved DBSCAN clustering algorithm is applied to the direction vectors of trajectory points, with angular difference (rather than Euclidean distance) as the distance metric. Specifically, the angular distance between two direction vectors $\mathbf{v}_a$ and $\mathbf{v}_b$ is defined as
$$d_\theta(\mathbf{v}_a, \mathbf{v}_b) = \arccos\!\left(\frac{\mathbf{v}_a \cdot \mathbf{v}_b}{\lVert \mathbf{v}_a \rVert \, \lVert \mathbf{v}_b \rVert}\right).$$
As illustrated in Figure 1, each direction vector is mapped to a point on the unit circle, where clusters (e.g., orange, green, blue) represent movement along distinct dominant directions. A point is considered a core object if it has at least $MinPts$ neighboring vectors with $d_\theta \le \theta_{\varepsilon}$. Following the empirical evidence in existing trajectory-driven road-extraction studies [5,12,25], the angular threshold $\theta_{\varepsilon}$ is chosen to balance tolerance of GPS measurement noise against the ability to distinguish topological road branches. Consequently, the number of direction clusters within the grid serves as a directional dispersion index, reflecting whether movement is well-aligned or chaotic.
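A minimal sketch of density-based clustering with an angular metric is given below. It is a plain DBSCAN rather than the improved variant used in the study, and the default values `eps_deg=15.0` and `min_pts=3` are placeholders chosen for illustration, not the parameters adopted in the paper. Input vectors are assumed to be non-zero.

```python
import math

def angular_distance(u, v):
    """Angle in degrees between two non-zero 2-D direction vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def dbscan_directions(vectors, eps_deg=15.0, min_pts=3):
    """Plain DBSCAN on direction vectors; returns one label per vector (-1 = noise)."""
    n = len(vectors)
    labels = [None] * n

    def neighbors(i):
        # A vector counts as its own neighbor (distance 0), as in standard DBSCAN.
        return [j for j in range(n)
                if angular_distance(vectors[i], vectors[j]) <= eps_deg]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:  # core point: keep expanding
                queue.extend(nbj)
    return labels

def count_direction_clusters(labels):
    """Directional dispersion index: number of clusters, ignoring noise."""
    return len({lab for lab in labels if lab >= 0})
```

Because the arccosine returns the smaller of the two angles between vectors, headings of 359° and 1° fall into the same cluster, which a Euclidean metric on raw azimuth values would not guarantee.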
Finally, to evaluate the trajectory segment similarity, the Dynamic Time Warping (DTW) algorithm is used to calculate pairwise distances between all trajectory segments within a grid. For two ordered segments $A = (a_1, \dots, a_m)$ and $B = (b_1, \dots, b_n)$, DTW identifies an optimal warping path $W = (w_1, \dots, w_L)$ that minimizes the cumulative distance:
$$DTW(A, B) = \min_{W} \sum_{l=1}^{L} d(w_l),$$
where $w_l = (i_l, j_l)$ represents the alignment between point $a_{i_l}$ and $b_{j_l}$, and $d(w_l)$ is the distance between the two aligned points. The path $W$ is subject to boundary conditions $w_1 = (1, 1)$ and $w_L = (m, n)$, as well as monotonicity and continuity constraints.
We then define a grid-level heterogeneity (chaotic) index as the proportion of segment pairs whose DTW distance exceeds a threshold $\tau$, reflecting how consistently trajectories align within the grid (Figure 2).
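The DTW distance and the resulting heterogeneity index can be sketched with the standard dynamic-programming recurrence. The point-to-point cost is assumed here to be the Euclidean distance in planar coordinates, and the threshold passed to `heterogeneity_index` is a free parameter whose value in the study is not reproduced.

```python
import math
from itertools import combinations

def dtw_distance(a, b):
    """Cumulative cost of the optimal warping path between two point sequences."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # boundary condition: the path starts at the first pair of points
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Monotonicity and continuity: only step from (i-1, j), (i, j-1), (i-1, j-1).
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]  # boundary condition: the path ends at the last pair of points

def heterogeneity_index(segments, threshold):
    """Proportion of segment pairs in a cell whose DTW distance exceeds threshold."""
    pairs = list(combinations(segments, 2))
    if not pairs:
        return 0.0
    return sum(dtw_distance(a, b) > threshold for a, b in pairs) / len(pairs)
```

A cell crossed by parallel, similarly shaped segments yields an index near 0, whereas a plaza-like cell with crossing segments yields an index near 1.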
2.2.2. Neighborhood Grid Feature Indices
Given the spatial continuity and connectivity inherent in road infrastructure, the contextual relationships between adjacent spatial units are crucial for accurate road network inference. To capture such relationships, this study introduces two neighborhood-level feature indices based on grid-level trajectory point densities $\rho$, as illustrated in Figure 3.
The Neighborhood Density Difference (NDD) quantifies local density variation between the central grid and each of its eight immediate neighbors (Figure 3a). For every adjacent grid $g_1$ through $g_8$, the difference in trajectory point density relative to the central cell is computed. Missing neighbors (e.g., at boundaries) are assigned a difference of 0. This produces eight neighborhood density-difference features describing local density gradients.
The Directional Density Difference (DDD) further captures anisotropic spatial patterns by focusing on opposing pairs of grids along the same directional axis, namely the vertical, horizontal, and two diagonal directions (Figure 3b). For each axis, the absolute difference in density between the two opposing neighbors (e.g., the cells directly above and below the central cell) is calculated to produce a directional smoothness index. This feature reflects the directional continuity of traffic flows, which is vital for detecting linear road segments and distinguishing between regular road grids and cul-de-sacs or intersections.
Together, these two indices provide a robust description of the spatial context surrounding each grid, enhancing the classifier’s ability to identify key road-related regions based on both local density gradients and directional consistency.
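The twelve neighborhood features can be sketched as follows. The sign convention (neighbor minus center), the ordering of neighbors, and the treatment of a missing cell in a directional pair (difference set to 0) are assumptions made for the example.

```python
# Row/column offsets of the eight neighbors, in row-major order (assumed ordering).
OFFSETS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
# Opposing neighbor pairs along the vertical, horizontal, and two diagonal axes.
AXES_4 = [((-1, 0), (1, 0)), ((0, -1), (0, 1)), ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]

def neighborhood_features(density, r, c):
    """Return 8 NDD values followed by 4 DDD values for cell (r, c).

    density is a 2-D list of per-cell trajectory point densities.
    """
    rows, cols = len(density), len(density[0])

    def get(dr, dc):
        rr, cc = r + dr, c + dc
        return density[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else None

    center = density[r][c]
    ndd = []
    for dr, dc in OFFSETS_8:
        v = get(dr, dc)
        ndd.append(0.0 if v is None else v - center)  # missing neighbor -> 0
    ddd = []
    for first, second in AXES_4:
        va, vb = get(*first), get(*second)
        ddd.append(0.0 if va is None or vb is None else abs(va - vb))
    return ndd + ddd
```

Along a straight road the two opposing neighbors on the road axis have similar densities, so the corresponding DDD value is small, while the NDD values perpendicular to the road are strongly negative.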
2.3. Key-Grid Detection via Supervised Learning
Key-grid detection is formulated as a binary classification task: each grid cell $g_i$ is labeled as a key grid (1) or non-key grid (0) according to the following labeling function:
$$y_i = \begin{cases} 1, & c_i \in R_{OSM}, \\ 0, & \text{otherwise}, \end{cases}$$
where $c_i$ denotes the center point of grid index $i$, and $R_{OSM}$ represents the reference road surface derived from OpenStreetMap (OSM). The coordinates of the grid center point are calculated as:
$$lat_c = lat_{min} + (i_{lat} + 0.5)\,\Delta_{lat}, \qquad lon_c = lon_{min} + (i_{lon} + 0.5)\,\Delta_{lon},$$
where $lat_{min}$ and $lon_{min}$ are the minimum latitude and longitude of the study area, $i_{lat}$ and $i_{lon}$ are the grid indices, and $\Delta_{lat}$ and $\Delta_{lon}$ are the grid resolutions. The 0.5 added to each index translates the coordinate by half a grid step, so that the formula yields the center of the cell rather than its minimum-coordinate corner.
Label generation is executed through a deterministic, semi-automated workflow to ensure spatial consistency. The primary labeling criterion is based on a geometric point-in-polygon (PIP) relationship: a grid cell is automatically labeled as a road unit if its geometric center is located within the buffer zone of the reference road network (derived from OSM). To account for complex environments such as sidewalks and narrow trails, high-resolution remote-sensing imagery is utilized for manual verification. This manual step serves as a secondary quality control to ensure that only grids with a dominant road presence, specifically those where the road surface coverage exceeds 50% of the grid area, are maintained as positive samples.
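The automatic part of the labeling workflow can be sketched as a center-in-buffer test. The example assumes that grid centers and reference centerlines have already been projected to planar coordinates in metres and that the buffer half-width `buffer_m` is supplied by the user; neither the projection nor the buffer width used for labeling is specified here. The manual verification against imagery is outside the scope of the sketch.

```python
import math

def grid_center(i_lat, i_lon, lat_min, lon_min, d_lat, d_lon):
    """Center (lat, lon) of the cell with indices (i_lat, i_lon); 0.5 is the half-step shift."""
    return (lat_min + (i_lat + 0.5) * d_lat, lon_min + (i_lon + 0.5) * d_lon)

def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def label_cell(center_xy, centerlines, buffer_m):
    """Return 1 if the cell center lies within buffer_m of any reference centerline, else 0.

    centerlines: list of polylines, each a list of (x, y) vertices in metres.
    """
    for line in centerlines:
        for a, b in zip(line, line[1:]):
            if point_segment_distance(center_xy, a, b) <= buffer_m:
                return 1
    return 0
```

Testing the distance to the centerline against the buffer half-width is equivalent to a point-in-polygon test against the buffered road geometry, without constructing the buffer polygon explicitly.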
A Random Forest (RF) classifier is employed to perform the classification task. As an ensemble learning method based on decision tree aggregation, Random Forest exhibits high tolerance to noise, effective handling of nonlinear feature interactions, and strong generalization capabilities, particularly for moderate-scale spatial datasets. In this application, each grid cell is represented by an 18-dimensional feature vector, which includes the internal semantic indicators (e.g., shape descriptors, directional clustering, trajectory similarity) and neighborhood-context features (e.g., density gradients). The center point of the grid serves as the spatial reference for classification.
As illustrated in Figure 4, the training dataset is randomly sampled into multiple subsets, each feeding a different decision tree that uses a distinct combination of features. Each tree independently outputs a predicted class label for the given grid cell. The final classification result is determined through majority voting across all trees in the ensemble. This voting-based mechanism reduces the risk of overfitting and increases classification robustness across spatially heterogeneous regions.
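The sampling-and-voting mechanism can be illustrated with a deliberately simplified ensemble in which each tree is a one-level decision stump. This toy is not the Random Forest configuration used in the study (whose trees are deeper and whose hyperparameters are tuned by grid search); `n_trees`, `max_features`, and the helper names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature_ids):
    """Best single-feature threshold rule (by training accuracy) among feature_ids."""
    best = None  # (accuracy, feature, threshold, label_if_above)
    for f in feature_ids:
        for thr in sorted({row[f] for row in X}):
            for above in (0, 1):
                hits = sum((above if row[f] > thr else 1 - above) == t
                           for row, t in zip(X, y))
                acc = hits / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, thr, above)
    return best[1:]

def predict_stump(stump, row):
    f, thr, above = stump
    return above if row[f] > thr else 1 - above

def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Each stump sees a bootstrap sample of the rows and a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Majority vote over the individual tree predictions."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]
```

Because every tree is fitted on a different resample and feature subset, an individual tree may be misled by a few noisy cells, but the majority vote is largely insensitive to such isolated errors.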
The model not only provides reliable classification results but also facilitates feature-importance analysis, enabling interpretation of which semantic indicators contribute most to key-grid identification. This aspect is particularly valuable for understanding which spatial characteristics are most predictive of road presence in trajectory-based representations.
2.4. Morphology-Based Road Network Reconstruction
Upon classification of key grids, a binary raster image is generated by mapping the centers of positively identified grid cells into pixel space, thereby forming a coarse representation of potential road regions. To derive a geometrically clean, topologically connected, and single-pixel-wide road skeleton, we implement an enhanced morphological thinning algorithm grounded in four-neighborhood connectivity.
As shown in Figure 5, several types of abnormal structures frequently arise in the initial binary output, such as densely packed square clusters, elongated blocks, and staircase-like patterns. These formations disrupt topological continuity and visual clarity, and must be addressed to produce a vectorization-ready network.
The proposed refinement procedure targets two key artifact types. First, square and block artifacts, identified by detecting 2 × 2 or larger high-density pixel regions, are simplified through dimensionality reduction. Specifically, such blocks are converted into transitional triangular patterns, enabling further processing through directional rules. Pixels at the periphery of the blocks with low neighborhood connectivity are iteratively removed, yielding thinner and more centralized road representations. Second, to eliminate residual connected triangle artifacts, we analyze four typical configurations as illustrated in Figure 6. These structures consist of three connected pixels forming an angled “L” shape. For each configuration, the algorithm checks a designated set of outer neighborhood pixels. If the sum of pixel values in this surrounding region is below a certain threshold, indicating isolation or redundancy, the triangle is removed from the network. This ensures that only structurally meaningful segments are retained.
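The neighborhood-sum test for three-pixel “L” configurations can be sketched as follows. The exact outer-neighborhood pixel sets of Figure 6 and the threshold are not reproduced in the text, so this example makes two explicit assumptions: the outer neighborhood is the one-pixel ring around the 2 × 2 window that contains the triangle, and `threshold=1` means that a triangle is removed only when that ring is empty.

```python
def remove_isolated_triangles(img, threshold=1):
    """Remove 3-pixel L-shaped configurations whose surrounding ring sums below threshold.

    img: 2-D list of 0/1 pixels. Returns a new image; the input is not modified.
    The ring definition and the threshold are illustrative assumptions.
    """
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]

    def px(r, c):
        # Pixels outside the image are treated as background.
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    for r in range(rows - 1):
        for c in range(cols - 1):
            window = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            foreground = [p for p in window if img[p[0]][p[1]]]
            if len(foreground) != 3:
                continue  # not one of the four L orientations
            ring = sum(px(rr, cc)
                       for rr in range(r - 1, r + 3)
                       for cc in range(c - 1, c + 3)
                       if (rr, cc) not in window)
            if ring < threshold:  # isolated or redundant triangle
                for rr, cc in foreground:
                    out[rr][cc] = 0
    return out
```

An L-shaped corner that is attached to a longer line has foreground pixels in its ring and is therefore kept, so genuine bends in the skeleton are not affected under these assumptions.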
By combining artifact detection (Figure 5) with structure-aware triangle removal (Figure 6), the proposed morphological process significantly improves the geometric fidelity, topological continuity, and visual smoothness of the extracted road network skeleton. This post-processing step enhances the usability of the output in downstream GIS tasks, such as map matching, vectorization, and spatial topology analysis.
3. Results
3.1. Experimental Data and Study Areas
This study utilizes two types of trajectory data, pedestrian and vehicle, covering three cities: Shenzhen, Wuhan, and Changsha.
Pedestrian trajectory data were collected from the Yuehai Campus of Shenzhen University, covering an area of 1.44 km². Characterized by a complex road network, dense vegetation, and numerous hidden footpaths, this area is highly suitable for research on fine-grained pedestrian road extraction. The data were sampled at 1 s intervals and contain longitude, latitude, and timestamp fields, totaling 212 trajectories and 85,802 GPS points.
Vehicle trajectory data were obtained from the urban areas of Wuhan and Changsha. To investigate urban road networks under different topological structures, experimental areas were selected based on road density variations. Wuhan exhibits significant spatial heterogeneity in road distribution; thus, two experimental regions were established: Wuhan experimental zone 1 (30.5377–30.6152° N, 114.2334–114.3320° E) covering 81.3 km², and Wuhan experimental zone 2 (30.4538–30.5368° N, 114.2736–114.3639° E) covering 79.8 km². Changsha features a regular radial-grid network with high density and connectivity. Two regions were likewise established: Changsha experimental zone 1 (28.1764–28.2232° N, 112.9616–113.0160° E) covering 27.7 km², and Changsha experimental zone 2 (28.2181–28.2648° N, 112.9063–112.9607° E) covering 27.6 km². Both vehicle datasets include longitude, latitude, and timestamp attributes.
Auxiliary data include Jilin-1 high-resolution satellite imagery with a fused resolution of 0.75 m, used to extract the hidden campus pedestrian network. OpenStreetMap (OSM) road networks for Wuhan and Changsha were selected as the benchmark for experimental accuracy evaluation.
3.2. Experimental Setup and Feature Visualization
We trained the classifier on pedestrian trajectories collected at the Yuehai Campus of Shenzhen University. Each record contains longitude, latitude, and timestamp. To test spatial and modal transferability, we then applied the trained model to vehicle trajectories in Wuhan and Changsha. High-resolution remote-sensing imagery and OSM road centerlines were used as external references. Road-extraction quality was evaluated using a buffer-matching protocol: an extracted segment was counted as correct if it overlapped a 10 m buffer around the reference road centerlines. For the labeled Shenzhen dataset, we report Precision, Recall, and F1-score for binary key-grid detection. For the larger Wuhan and Changsha transfer regions, where exhaustive grid-level labels are unavailable, we evaluate end-to-end road-extraction quality using the length of correctly extracted roads, the total extracted length, and length-based precision under the OSM-buffer matching protocol.
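The two families of metrics can be sketched as below. The grid-level metrics follow the usual definitions. For the length-based metric, the sketch assumes planar coordinates in metres and treats an extracted segment as correct when its midpoint lies within the buffer of a reference centerline; this midpoint rule is only one possible reading of the overlap criterion, and the function names are hypothetical.

```python
import math

def precision_recall_f1(tp, fp, fn):
    """Grid-level metrics for the key-grid (positive) class from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def length_precision(extracted, reference, buffer_m=10.0):
    """Return (correct length, total extracted length, length-based precision).

    extracted, reference: lists of polylines, each a list of (x, y) vertices in metres.
    A segment is counted as correct if its midpoint is within buffer_m of the
    reference (assumed matching rule).
    """
    correct = total = 0.0
    for poly in extracted:
        for a, b in zip(poly, poly[1:]):
            length = math.dist(a, b)
            mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
            total += length
            if any(point_segment_distance(mid, p, q) <= buffer_m
                   for line in reference for p, q in zip(line, line[1:])):
                correct += length
    return correct, total, (correct / total if total else 0.0)
```

Length-based precision measures how much of the extracted network is supported by the reference, but it does not penalize reference roads that were missed, which is why the correctly extracted length is reported alongside it.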
Figure 7 visualizes representative grid-level features, including the number of trajectory points, the HMC index, convex-hull area, directional-cluster count, point density, and the DTW-based chaotic index. These maps reveal coherent spatial patterns along arterial corridors and at intersections, supporting subsequent supervised detection of key grids.
More specifically, the number of trajectory points per grid ranges from 0 to 96. Higher values concentrate along main campus corridors (e.g., Liyan Road, Lide Road, and Ligong Road), indicating denser pedestrian usage. The HMC index shows higher values in smaller pathways where trajectories are tightly clustered, while convex-hull area becomes larger along major roads where movements span a wider within-grid space. Directional cluster counts are predominantly one to two in most grids, but increase to three or four near intersections, consistent with multi-directional turning behaviors. The DTW-based index tends to be lower along structured corridors with aligned movement, and becomes higher in open plaza areas (e.g., “Shiguang Square”) where movements are less constrained and trajectory segments are more heterogeneous.
To justify the proposed feature set and examine the relative contribution of different semantic dimensions, a Random Forest-based feature-importance analysis was conducted. For clearer presentation, individual neighborhood-related variables were aggregated into their corresponding feature groups. As shown in Figure 8, geometric and movement-structure features, such as hull area, DTW-based heterogeneity, and the HMC index, contribute the most to key-grid detection. Importantly, neighborhood-context features also contribute substantially to the model, indicating that inter-grid density continuity provides complementary information beyond within-grid geometry. These results support the rationale of coupling intra-grid movement semantics with inter-grid contextual features, rather than relying on either feature group alone.
3.3. Key-Grid Detection Performance Under Varying Grid Sizes and Classifiers
We assessed sensitivity to grid resolution by testing 100 × 100, 200 × 200, 300 × 300, 400 × 400, and 600 × 600 grid divisions. Labels were generated using the deterministic point-in-polygon rule described in Section 2.3 and were subsequently quality-checked against high-resolution imagery and campus road references. A Random Forest classifier was trained with an 80/20 train–test split. As summarized in Table 1, the 200 × 200 grid achieved the most balanced result (Precision 83%, Recall 79%, F1 0.81 for key grids).
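As a quick consistency check, the reported F1-score for the 200 × 200 grid follows directly from the reported precision and recall through the harmonic mean:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.83 \times 0.79}{0.83 + 0.79}
    = \frac{1.3114}{1.62}
    \approx 0.81
```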
The effectiveness of the proposed trajectory-based features is sensitive to grid resolution. When the grid cells become very small (e.g., under the 600 × 600 division), the total number of samples increases substantially, but each cell contains significantly fewer trajectory points and segments. This dilutes the semantic information within individual cells: directional consistency and aggregation signals weaken, noise from sparse points, GPS jitter, and partial road coverage is amplified, and key features such as trajectory point density, the HMR and HMC indices, direction clustering results, and neighborhood contrasts lose discriminative power. Conversely, very large cells enhance statistical robustness within each cell but reduce the number of training samples and lower the overall spatial resolution. This creates a clear trade-off between sample quantity, feature quality, and spatial precision. The 200 × 200 grid resolution provided the best balance among these competing effects, achieving the highest classification performance.
Under the same feature set and the recommended 200 × 200 grid setting, we further compared three supervised classifiers for key-grid detection: Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost). Hyperparameters were tuned using grid search on the Shenzhen training data, and performance was evaluated using Precision, Recall, and F1-score. As summarized in
Table 2, RF achieved the best overall performance (F1 = 0.79), slightly outperforming SVM (0.78) and XGBoost (0.75). Therefore, RF was adopted as the default classifier in subsequent transfer and end-to-end extraction experiments.
Because the 80/20 grid-level split may still retain spatial dependence among neighboring cells, the within-Shenzhen experiment is mainly used for grid-scale selection and classifier comparison. Generalization is therefore further assessed through the independent zero-shot transfer experiments in Wuhan and Changsha.
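One common way to reduce the spatial dependence mentioned above is to hold out whole blocks of neighboring cells rather than individual cells. The sketch below is not the procedure used in this study (which relies on the 80/20 grid-level split plus zero-shot transfer); it only illustrates the idea, and `spatial_block_split` and `block_size` are hypothetical names.

```python
import random

def spatial_block_split(cells, block_size, test_fraction=0.2, seed=0):
    """Split grid cells into train/test so that whole blocks stay together.

    cells: iterable of (row, col) grid indices.
    block_size: number of cells per block side (hypothetical parameter).
    """
    blocks = {}
    for row, col in cells:
        key = (row // block_size, col // block_size)
        blocks.setdefault(key, []).append((row, col))
    block_ids = sorted(blocks)
    random.Random(seed).shuffle(block_ids)
    n_test = max(1, round(test_fraction * len(block_ids)))
    test = [c for b in block_ids[:n_test] for c in blocks[b]]
    train = [c for b in block_ids[n_test:] for c in blocks[b]]
    return train, test

# A 200 x 200 division split into 20 x 20-cell blocks (100 blocks in total).
cells = [(r, c) for r in range(200) for c in range(200)]
train, test = spatial_block_split(cells, block_size=20)
print(len(train), len(test))
```

Because no block contributes cells to both subsets, adjacent cells are less likely to appear on both sides of the split.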
3.4. Cross-City and Cross-Modal Transferability of Key-Grid Detection
The identification accuracy of key grids serves as the foundation for constructing road networks. We established an automatic key-grid detection model using pedestrian trajectories from Shenzhen University. Under a grid division of 200 × 200, the training set comprised 5450 samples, and the model achieved a training accuracy of 83% for key grids. The test set contained 1363 samples; under this setting, the model predicted 741 grids as key grids and 622 grids as non-key grids. The spatial distribution of predicted key grids in Shenzhen is shown in
Figure 9.
To verify generalization capability, we transferred the trained model directly to the Wuhan and Changsha regions in a zero-shot manner. The model parameters and decision thresholds were retained entirely from the optimal configuration on the training set, without any fine-tuning or parameter adjustment. The identification results are illustrated in
Figure 10. For Wuhan experimental zone 1, out of 11,189 sample grids, 7568 were identified as key grids and 3621 as non-key grids; for Wuhan experimental zone 2, among 7922 grids, 4983 were classified as key grids and 2939 as non-key grids. For Changsha experimental zone 1, out of 15,091 grids, 9069 were identified as key grids and 6022 as non-key grids; for Changsha experimental zone 2, among 9265 grids, 5232 were recognized as key grids and 4033 as non-key grids.
Overall, the spatial distribution of predicted key grids aligns closely with the apparent road corridors in the imagery overlays, and the model shows consistent behavior across regions with high road network density, sparse road distribution, and heterogeneous layouts (
Figure 10).
3.5. Morphological Post-Processing and Curve Smoothing
Starting from the classified key-grid outputs, we converted the predicted key grids into a binary raster representation. We then applied a morphology-aware refinement pipeline to obtain a topology-preserving, vectorization-ready road skeleton. First, isolated components and salt-and-pepper noise were removed using connected-component filtering, which suppressed spurious pixels unlikely to correspond to road structures. We then performed a morphological closing operation with a 9-pixel cross-shaped structuring element to bridge small gaps and enhance local continuity, followed by four-neighborhood thinning to extract a one-pixel-wide skeleton while preserving the main corridor topology.
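The first two refinement steps can be sketched in plain Python on a toy raster. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the "9-pixel" cross is interpreted here as a centre pixel plus four arms of length 2 (nine pixels in total), the minimum component size is a hypothetical value, and the thinning step is omitted.

```python
from collections import deque

def components(img):
    """Return the 8-connected foreground components as lists of (row, col)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            comp, queue = [], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            found.append(comp)
    return found

def remove_small_components(img, min_size):
    """Keep only components with at least min_size pixels."""
    out = [[0] * len(img[0]) for _ in img]
    for comp in components(img):
        if len(comp) >= min_size:
            for y, x in comp:
                out[y][x] = 1
    return out

def cross_offsets(arm):
    """Cross-shaped structuring element: centre plus four arms of length arm."""
    offsets = [(0, 0)]
    for k in range(1, arm + 1):
        offsets += [(k, 0), (-k, 0), (0, k), (0, -k)]
    return offsets

def dilate(img, offsets):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dy, dx in offsets:
                    if 0 <= r + dy < h and 0 <= c + dx < w:
                        out[r + dy][c + dx] = 1
    return out

def erode(img, offsets):
    h, w = len(img), len(img[0])
    def covered(r, c):
        # Positions outside the image are ignored so borders are not eroded.
        return all(img[r + dy][c + dx] for dy, dx in offsets
                   if 0 <= r + dy < h and 0 <= c + dx < w)
    return [[int(covered(r, c)) for c in range(w)] for r in range(h)]

def closing(img, offsets):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, offsets), offsets)

# Toy raster: a 5-pixel-wide corridor (rows 3-7) with a 2-column gap,
# plus two isolated noise pixels.
raster = [[0] * 20 for _ in range(11)]
for r in range(3, 8):
    for c in range(20):
        if c not in (9, 10):
            raster[r][c] = 1
raster[0][2] = raster[10][15] = 1

cleaned = remove_small_components(raster, min_size=3)
closed = closing(cleaned, cross_offsets(arm=2))  # 1 + 4 * 2 = 9 pixels
print(len(components(raster)), len(components(closed)))
```

In this toy case the noise pixels are discarded and the two corridor halves are joined across the gap, so the component count drops from four to one. A cross-shaped element only bridges gaps in corridors that are wide enough for the vertical arms to be supported, which is why the example uses a 5-pixel-wide band.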
The effect of these operations is illustrated in
Figure 11. Panel (a) shows the trajectory points retained after key-grid screening, which already highlight corridor structures but still contain discontinuities and small isolated fragments. Panel (b) presents the binary image after noise removal, in which isolated artifacts are reduced. Panel (c) shows the result after closing, in which short breaks between adjacent segments are filled, providing more continuous support for thinning and skeletonization. Quantitatively, the morphology stage improved structural continuity: in Wuhan experimental zone 1, against a complex background of approximately 3.2 million pixels, the number of connected components decreased from 144,913 to 4180, a 97.1% reduction. The spurious endpoint ratio, computed on the skeleton graph, decreased from 5.42% to 0.2%, indicating fewer fragmented segments and fewer dangling branches. The large drop in connected components indicates that the initial key-grid output contains many fragmented candidate components, primarily due to sparse trajectory points, GPS drift, and isolated false-positive grids across a large urban raster. Because these components are mostly small and spatially disconnected, they are better interpreted as candidate-level fragmentation than as a failure of the key-grid classifier. The subsequent morphology-aware refinement therefore acts as a spatial-scale filter that removes isolated fragments, bridges short gaps, and preserves the connected structural backbone of the road network.
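The reported reduction in connected components for Wuhan experimental zone 1 follows directly from the two counts given above:

```latex
\frac{144{,}913 - 4180}{144{,}913} = \frac{140{,}733}{144{,}913} \approx 0.971 = 97.1\%
```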
To further regularize geometry and improve visual smoothness, we smoothed the extracted centerlines using Kalman filtering applied to sequences of grid-center coordinates (treated as observations). As shown in
Figure 12, Kalman smoothing mitigated staircase artifacts and small zigzag oscillations caused by grid discretization while maintaining junction configurations. The zoom-in views demonstrate that the fitted centerlines (red) closely follow the refined skeleton (black) while providing a smoother, more consistent geometry. The improvement is also reflected in geometric-roughness indicators: the mean turning-angle measure (a curvature surrogate) improved from 96.6883 to 99.6385, and the number of spurs shorter than 20 m decreased from 25,751 to 2458, which facilitates downstream vectorization and network analysis.
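A minimal sketch of this kind of smoothing is given below, assuming a scalar random-walk state model applied independently to the x and y coordinates of the grid centers. The noise variances `q` and `r`, the helper names, and the turning-angle definition (mean absolute angle in degrees between consecutive segments) are assumptions for illustration; the scale of this measure need not match the values reported above, and the study's actual filter configuration is not specified here.

```python
import math

def kalman_1d(observations, q=0.1, r=1.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model.

    q: process-noise variance, r: observation-noise variance (both assumed).
    """
    x, p = observations[0], p0
    filtered = [x]
    for z in observations[1:]:
        p = p + q              # predict: state unchanged, uncertainty grows
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update with the new observation
        p = (1.0 - k) * p
        filtered.append(x)
    return filtered

def smooth_centerline(points, q=0.1, r=1.0):
    """Filter the x and y coordinate sequences independently."""
    xs = kalman_1d([pt[0] for pt in points], q, r)
    ys = kalman_1d([pt[1] for pt in points], q, r)
    return list(zip(xs, ys))

def mean_turning_angle(points):
    """Mean absolute turning angle (degrees) between consecutive segments."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay, bx, by = x1 - x0, y1 - y0, x2 - x1, y2 - y1
        if (ax == 0 and ay == 0) or (bx == 0 and by == 0):
            continue  # skip degenerate zero-length segments
        cross, dot = ax * by - ay * bx, ax * bx + ay * by
        angles.append(abs(math.degrees(math.atan2(cross, dot))))
    return sum(angles) / len(angles)

# Staircase path of the kind grid discretization produces along a diagonal road.
staircase = [(0.0, 0.0)]
for i in range(20):
    staircase.append((i + 1.0, float(i)))
    staircase.append((i + 1.0, i + 1.0))

smoothed = smooth_centerline(staircase)
print(mean_turning_angle(staircase), mean_turning_angle(smoothed))
```

A forward-only filter of this kind lags behind the observations; a backward smoothing pass or a constant-velocity model would reduce that lag at the cost of a slightly longer implementation.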
3.6. End-to-End Road Extraction and Baseline Comparison
Using the buffer-matching protocol,
Table 3 compares the proposed method with two raster-based reference methods [
7,
38] across four experimental zones. Overall, our method achieves a longer correctly extracted road length in most zones while maintaining competitive length-based precision, with the highest precision reaching 83% in Wuhan Zone 1. This indicates that the proposed multi-level grid semantics and neighborhood-context modeling contribute to both corridor completeness and structural continuity. The two reference approaches [
7,
38] represent raster-based trajectory-to-road-extraction pipelines. Their performance varies across urban scenes, suggesting that fixed rasterization or morphology-based rules can be sensitive to trajectory density, local network complexity, and noise distribution. The advantages of the proposed method are particularly evident in Changsha Zones 1–2, where it extracts substantially longer correct road segments (85.013 km and 60.176 km) than the reference method [
38] (53.488 km and 35.942 km), indicating improved coverage of secondary streets and local connections.
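For readers unfamiliar with buffer matching, the sketch below shows one simple way to compute a correctly extracted length and a length-based precision. It is an assumption-laden illustration rather than the protocol used for Table 3: the buffer width, the 1 m sampling step, the midpoint test, and the function names are all hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def matched_length(extracted, reference, buffer_m, step=1.0):
    """Return (matched length, total length) of the extracted polylines.

    Each extracted segment is cut into pieces of at most `step` metres; a piece
    counts as matched when its midpoint lies within buffer_m of the reference.
    """
    ref_segments = [(line[i], line[i + 1])
                    for line in reference for i in range(len(line) - 1)]
    matched = total = 0.0
    for line in extracted:
        for (x0, y0), (x1, y1) in zip(line, line[1:]):
            seg_len = math.hypot(x1 - x0, y1 - y0)
            n = max(1, math.ceil(seg_len / step))
            piece = seg_len / n
            for k in range(n):
                t = (k + 0.5) / n
                mid = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
                total += piece
                if any(point_segment_distance(mid, a, b) <= buffer_m
                       for a, b in ref_segments):
                    matched += piece
    return matched, total

# Hypothetical reference road and two extracted polylines (coordinates in metres).
reference = [[(0.0, 0.0), (100.0, 0.0)]]
extracted = [
    [(0.0, 3.0), (100.0, 3.0)],    # runs parallel, 3 m from the reference
    [(50.0, 3.0), (50.0, 53.0)],   # spur leading away from the reference
]
matched, total = matched_length(extracted, reference, buffer_m=10.0)
precision = matched / total
print(matched, total, round(precision, 3))
```

In this toy case the parallel line is fully matched, while only the first few metres of the spur fall inside the buffer, so the length-based precision is below 1.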
For Wuhan Zone 2, the reference method [
38] achieves higher precision than the proposed method (81% vs. 68%), whereas our method recovers a longer correctly extracted road length (93.512 km vs. 77.960 km). This reflects a precision–coverage trade-off in heterogeneous urban environments: our method preserves more candidate corridors to improve network completeness, but may introduce a small number of false positives in open or mixed-use areas. Reference method [
7], which relies on fixed morphological rules without explicit neighborhood-context modeling, tends to over-connect noisy trajectory fragments and produces redundant extracted segments, resulting in a relatively low precision of 54% in this zone. Compared with the raster-based references, the proposed method provides a more balanced extraction outcome by integrating intra-grid movement structures with inter-grid contextual continuity. This pattern is consistent with
Figure 13b, where the extracted network exhibits broader spatial coverage along minor streets and at complex intersections, at the cost of a few false positives in open or mixed-use spaces.
Figure 13 provides qualitative overlays of reconstructed road networks on high-resolution imagery. Across the four zones, the proposed method better preserves arterial continuity and junction topology, producing more complete and connected corridor structures (
Figure 13a–d). The improvement is particularly visible in dense grid-like neighborhoods (e.g., Changsha Zone 1) and in areas with varying road densities (e.g., Wuhan Zone 2), where the reference methods tend to miss fine structures or yield fragmented segments. These observations are consistent with the quantitative results in
Table 3.
3.7. Ablation Study of Feature Components
To examine the contribution of neighborhood context, we conducted a component-level ablation study using the same 200 × 200 grid setting and the same evaluation protocol. Four feature configurations were compared: internal features only, neighborhood features only, internal features plus neighborhood density-difference features, and the full feature set.
As shown in
Table 4, the internal-only configuration achieves F1-scores of 0.78 and 0.75 for key and non-key grids, respectively. Adding neighborhood density-difference features improves the F1-scores to 0.80 and 0.77, while the full feature set achieves the best overall performance, with F1-scores of 0.81 and 0.78. These results indicate that neighborhood features provide complementary contextual information and improve classification balance when combined with internal geometric and movement-structure features. In contrast, using neighborhood features alone leads to substantially lower non-key-grid performance, suggesting that neighborhood context is not sufficient as a standalone discriminator but is effective as a contextual supplement to intra-grid features.