Article

Positive-Guided Local Supervision for Robust Road Extraction from Remote Sensing Imagery

by Hao He 1,2,*, Shuyang Wang 1, Lei Huang 1, Xiaohu Fan 1, Yongfei Li 3 and Dongfang Yang 3
1 Hi-Tech Institute, Qingzhou 262500, China
2 Defense Innovation Institute, Academy of Military Sciences, Beijing 100071, China
3 Department of Automatic Control, Rocket Force University of Engineering, Xi’an 710025, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(10), 1589; https://doi.org/10.3390/rs18101589
Submission received: 26 March 2026 / Revised: 2 May 2026 / Accepted: 12 May 2026 / Published: 15 May 2026

Highlights

What are the main findings?
  • We demonstrate that prevailing road extraction datasets universally suffer from severe underlabeling. We further systematically reveal that end-to-end dense prediction networks are inherently vulnerable to such annotation defects, while patch-based methods can naturally alleviate this problem via localized supervision.
  • The proposed Positive-guided Local Supervision (PLS) strategy integrates the noise robustness of patch-based paradigms and the global context learning efficiency of end-to-end frameworks. By isolating model optimization from misleading gradients induced by inaccurate annotations, PLS achieves prominent IoU and F1 improvements over baseline models on the refined DeepGlobe-mini-test and CH4P-mini-test datasets.
What are the implications of the main findings?
  • We develop the CH4P road extraction dataset, comprising 13,498 high-resolution satellite images with authentic public road annotations from real-world maps, together with an extra 150 images equipped with refined road annotations. With realistic underlabeling inherited from public maps, CH4P serves as a challenging benchmark for evaluating model robustness and promotes future research on noisy label learning for dense prediction tasks.
  • PLS effectively boosts the extraction performance of low-grade rural roads and imparts strong noise robustness to mainstream segmentation networks. It supports direct training and practical deployment on real-world underlabeled data without introducing extra computational overhead in both training and inference stages.

Abstract

Road extraction from high-resolution remote sensing imagery is fundamental to numerous practical applications, yet it still faces notable challenges caused by label noise, particularly the underlabeling of rural roads in training datasets. End-to-end dense prediction networks deliver high efficiency and strong global context capture, yet they are highly vulnerable to such label noise. In contrast, patch-based methods achieve better robustness but sacrifice global reasoning ability and computational efficiency. This paper proposes a novel training strategy named Positive-guided Local Supervision (PLS), which integrates the strengths of the two paradigms. PLS preserves the full end-to-end forward pass to leverage global context, while restricting loss computation to local patches centered on reliably annotated road pixels (positive samples) via a standard dense segmentation loss. By isolating the model from misleading gradients generated in underlabeled regions, PLS effectively mitigates the negative impact of underlabeling without compromising computational efficiency or prediction quality. We evaluate PLS on two datasets: the public DeepGlobe benchmark and a newly constructed challenging dataset, China Four Provinces (CH4P). CH4P includes 13,498 high-resolution images of rural China, whose annotations suffer from severe underlabeling inherited from public web maps. Extensive quantitative evaluations on both datasets show that PLS surpasses conventional end-to-end baselines and competitive state-of-the-art methods under both noisy original labels and manually refined annotations. On the refined DeepGlobe-mini-test and CH4P-mini-test subsets, PLS obtains absolute IoU improvements of 0.127 and 0.104 over baseline models, respectively, showing distinct superiority in handling severe real-world underlabeling. Qualitative visualizations and cross-dataset generalization tests further demonstrate that PLS can effectively retrieve road segments omitted in raw annotations, delivers strong robustness against practical label noise, and introduces no extra computational burden in the inference stage.

1. Introduction

Road extraction from high-resolution remote sensing imagery plays a vital role in numerous applications, including autonomous navigation, urban planning, and disaster response [1]. Despite significant progress driven by deep learning, accurate and robust road extraction remains challenging due to the presence of label noise in training data. A particularly pervasive form of noise is underlabeling: many roads, especially low-grade rural roads, are missing from annotations. This issue is exacerbated in real-world datasets sourced from public maps such as OpenStreetMap, where annotation quality is highly variable and often incomplete [1].
Figure 1 illustrates representative examples from two datasets used in this study. In the DeepGlobe dataset (Figure 1a–d), original annotations frequently miss narrow or unpaved roads (highlighted by red boxes). The CH4P dataset (Figure 1e–h), constructed from OSM maps, exhibits even more severe underlabeling, with numerous rural roads completely omitted. Such missing annotations systematically mislead models trained with dense supervision, forcing them to predict these road pixels as background and thereby degrading extraction performance.
Deep learning-based road extraction methods have evolved along two main paradigms: patch-based classification and end-to-end dense segmentation networks. Patch-based methods [2] divide the image into small overlapping patches and classify only the central pixel of each patch. This localized supervision inherently isolates the effect of label noise, because incorrectly annotated pixels within a patch do not contribute to the gradient. However, patch-based methods lack global contextual information, are computationally inefficient due to sliding-window inference, and produce stitching artifacts, limiting their practical applicability for large-scale mapping.
End-to-end dense prediction networks, especially encoder-decoder networks such as U-Net [3] and D-LinkNet [4], process the entire image in a single forward pass, capture long-range dependencies, and deliver high-resolution outputs with high efficiency. These properties have made them the de facto choice for modern road extraction. Nevertheless, end-to-end networks supervise every pixel equally, making them highly vulnerable to underlabeling: missing road pixels generate gradients that push the model to incorrectly predict them as background, creating a systematic bias against extracting those roads.
To overcome this dilemma, we propose to integrate the complementary advantages of the two paradigms. Our key insight is that underlabeled regions are unreliable for model training, whereas pixels explicitly annotated as road (i.e., positive samples) are generally highly credible. This is because annotators actively mark only the pixels they recognize as road in the satellite imagery, while everything else becomes background by default. We therefore introduce Positive-guided Local Supervision (PLS), a training strategy that retains the full end-to-end forward pass to leverage global context, but restricts loss computation to local patches centered on reliably annotated road pixels. As illustrated in Figure 2, the network first extracts dense feature maps for the entire image. From the ground truth, we sample a set of positive pixels and extract square patches around them. The loss (e.g., binary cross-entropy and Dice) is computed only within these patches and then back-propagated. By isolating the model from misleading gradients originating from underlabeled regions, PLS effectively mitigates the impact of missing annotations without sacrificing global reasoning capability or inference efficiency.
The main contributions of this work are fourfold:
  • We systematically analyze the adverse impact of underlabeling in road extraction datasets. We reveal the inherent vulnerability of end-to-end dense supervision to incomplete annotations, and clarify that the localized supervision mechanism of patch-based methods possesses natural robustness against label noise.
  • We propose a novel training strategy named Positive-guided Local Supervision (PLS). By constraining loss calculation within local patches anchored on reliable positive road samples, PLS embeds patch-level noise robustness into standard end-to-end networks. It maintains global context modeling and computational efficiency without introducing extra inference overhead.
  • We build the challenging rural road dataset CH4P (China Four Provinces), which contains 13,498 high-resolution remote sensing images covering four provinces of China. Derived directly from public web map annotations, CH4P faithfully reflects real-world underlabeling scenarios in practical applications.
  • Extensive experiments conducted on DeepGlobe and CH4P datasets verify that PLS achieves superior performance over mainstream baselines and state-of-the-art methods. Comprehensive ablation studies further demonstrate the rationality, robustness and effectiveness of the proposed method.

2. Related Work

Road extraction from high-resolution remote sensing images has been extensively studied, with deep learning becoming the dominant paradigm [1]. In this section, we focus on the two paradigms most relevant to our work, patch-based methods and end-to-end dense prediction networks, and discuss their respective strengths and weaknesses concerning label noise.

2.1. Patch-Based Methods

Early deep learning approaches adopted a patch-based classification strategy [2,5]. These methods divide the input image into small overlapping patches (e.g., 32 × 32 or 64 × 64 pixels) and train a CNN to classify the central pixel of each patch as road or background. The primary advantage of patch-based methods is their inherent robustness to label noise: the loss depends only on the center label, so incorrectly annotated pixels elsewhere in the patch do not affect the gradient. This property makes them well suited for datasets with incomplete annotations, such as those sourced from public maps. However, patch-based methods suffer from several limitations. First, they lack global contextual information because each patch is processed independently, often leading to fragmented road predictions [6]. Second, sliding-window inference is computationally expensive due to massive overlap, making them inefficient for large-scale imagery [7]. Third, the final segmentation map is obtained by stitching patch-level predictions, which introduces artifacts and limits spatial precision [8].
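The cost of sliding-window inference can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes a stride of 1 with border padding (one patch per output pixel), and the function name and the 1000 × 1000 image size are illustrative choices.

```python
def sliding_window_cost(height, width, patch, stride=1):
    """Count patches and pixels read when every patch yields one center label.

    Assumes border padding, so a patch is centered on every stride-th pixel.
    """
    rows = -(-height // stride)  # ceiling division
    cols = -(-width // stride)
    n_patches = rows * cols
    pixels_read = n_patches * patch * patch
    return n_patches, pixels_read


n_patches, pixels_read = sliding_window_cost(1000, 1000, patch=64)
redundancy = pixels_read / (1000 * 1000)  # pixels read per image pixel
print(n_patches, pixels_read, redundancy)
```

Under these assumptions each image pixel is read patch × patch times, whereas a single end-to-end forward pass reads it once.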
Recent breakthroughs in Transformer-based architectures for semantic segmentation have also benefited road extraction. Notably, Vision Transformers (ViT) and variants such as Swin-Unet [9] and SegFormer [10] can be viewed as a form of patch-based method: they divide the input image into regular patches, linearly embed each patch into a token, and process the resulting sequence through self-attention. However, unlike traditional patch-based CNNs that process patches independently, Transformers explicitly model global semantic relationships across all patches via multi-head self-attention, thereby overcoming the limitation of local context [11,12]. This ability to capture long-range dependencies has led to state-of-the-art performance in road extraction, with models such as SegFormer, Swin Transformer, and Mask2Former [13] achieving significant improvements on benchmarks such as DeepGlobe and the Massachusetts Roads Dataset. By integrating global context modeling with the patch-based paradigm, Transformer architectures have gained substantial attention in recent years for remote sensing road extraction [1].

2.2. End-to-End Dense Prediction Networks

The introduction of fully convolutional networks (FCNs) [14] marked a paradigm shift toward end-to-end dense prediction. In particular, encoder-decoder architectures such as U-Net [3], SegNet [15], LinkNet [16], and their variants have become the de facto standard for road extraction. These networks process the entire image in a single forward pass, produce pixel-level outputs, and excel at capturing global context and long-range dependencies [1]. Numerous improvements have been proposed, including multi-scale feature fusion [17,18], attention mechanisms [19,20], and specialized loss functions [21,22,23]. End-to-end methods are computationally efficient and deliver high-resolution outputs without stitching artifacts. However, they supervise every pixel equally, making them highly sensitive to label noise, particularly underlabeling, where road pixels are incorrectly marked as background. This vulnerability is critical because real-world road datasets often suffer from systematic underlabeling of rural or unpaved roads.

2.3. Handling Label Noise in Road Extraction

Several strategies have been proposed to mitigate the impact of label noise in road extraction. One category focuses on designing noise-robust loss functions. Structure-based loss [21] leverages the continuity and smoothness priors of road topology to regularize predictions under imperfect labels. Generalized Cross Entropy (GCE) [24] uses a tunable parameter to interpolate between cross entropy and mean absolute error, making the loss tolerant to mislabeled examples while retaining convergence speed on clean samples. These loss functions, however, operate globally and uniformly over all pixels. A large number of consistently underlabeled pixels can still dominate the gradient and systematically bias the model toward predicting background.
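As a point of reference for the interpolation described above, the following is a minimal NumPy sketch of a per-pixel GCE loss in its commonly stated form, (1 − p_y^q)/q, where p_y is the probability assigned to the annotated class. It is an illustration rather than the implementation evaluated in this paper; the function name and the default q = 0.7 are assumptions.

```python
import numpy as np


def gce_loss(prob, label, q=0.7, eps=1e-7):
    """Per-pixel Generalized Cross Entropy for binary segmentation.

    prob  : predicted road probability per pixel
    label : 0/1 annotation per pixel
    q     : in (0, 1]; q -> 0 approaches cross entropy, q = 1 gives 1 - p_y
    """
    prob = np.clip(prob, eps, 1.0 - eps)
    # Probability the model assigns to the annotated class.
    p_true = np.where(label == 1, prob, 1.0 - prob)
    return (1.0 - p_true ** q) / q


prob = np.array([0.2, 0.9])
label = np.array([1, 0])
print(gce_loss(prob, label, q=1.0))   # MAE-like, bounded by 1
print(gce_loss(prob, label, q=1e-6))  # close to cross entropy
```

Note that the loss is still evaluated at every pixel, so a bounded per-pixel penalty alone does not remove the contribution of consistently underlabeled regions.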
A second category develops road-specific architectures that explicitly or implicitly handle annotation noise. The noise probability model RDNN [25] estimates pixel-wise noise distributions to down-weight unreliable annotations. UGD-DLinkNet [26] integrates Monte Carlo dropout and uncertainty-guided knowledge distillation to reduce the influence of noisy labels by directing supervision toward predictions with higher confidence. RCFSNet [27], although primarily designed to address road occlusions and discontinuity, can also recover some unlabeled roads through its attention-enhanced global context modeling. While effective in their intended settings, these methods still rely on the full set of annotated pixels to drive training. When a substantial portion of low-grade roads is entirely missing from the annotations, the global loss remains contaminated by erroneous background labels for those missing segments, limiting the ability of these methods to recover systematically underlabeled roads.
Weakly supervised and semi-supervised approaches reduce annotation requirements by using incomplete labels such as scribbles [28], road centerlines [29], or open-source maps [30]. Generative adversarial networks (GANs) have also been employed to refine noisy road masks or to adapt across domains [31,32]. These methods target a different problem: they assume a data regime where supervision is deliberately sparse or where labeled source data exist for adaptation. They do not address the case where dense annotations are available but contain systematic omissions in the ground truth.
Rather than modifying the loss function or the model architecture, we spatially restrict loss computation to local patches centered on reliably annotated road pixels. This strategy inherits the noise-isolating advantage of patch-based methods while preserving the full forward pass of end-to-end networks, effectively isolating the model from gradients originating in underlabeled regions, without sacrificing global context or inference efficiency.

3. Mechanistic Analysis of Patch-Based and End-to-End Paradigms for Road Extraction

To lay the groundwork for our proposed method, we delve into the fundamental mechanisms that govern the behavior of the two dominant paradigms in road extraction under label noise. While patch-based and end-to-end methods both aim to produce accurate road maps, they differ fundamentally in task formulation, annotation complexity, loss computation, and gradient dynamics. These differences lead to a stark contrast in their sensitivity to underlabeling—the systematic omission of road pixels in training annotations. In this section, we provide a formal analysis of these mechanisms, which directly motivates the design of our Positive-guided Local Supervision strategy.

3.1. Task Formulation and Annotation Complexity

Patch-based methods treat road extraction as a point-wise classification problem. For each image patch (typically 32 × 32 or 64 × 64 pixels), only the label of the central pixel (road or background) is required. This low-complexity task inherently limits the introduction of label noise: even if peripheral road pixels are missed, the center label remains correct. Consequently, label noise in the training data is limited to the small fraction of patches whose center pixels are mislabeled. Such cases occur infrequently because patch centers are generally much easier to identify accurately.
End-to-end dense prediction networks require pixel-level dense annotations. Every pixel in the image must be labeled as road or background. This task is significantly more demanding: annotators must trace every road segment, including narrow paths, unpaved roads, and roads occluded by vegetation. Under such complexity, missing annotations become inevitable, especially in rural areas where roads are difficult to discern. As a result, the training data for end-to-end networks inherently contain a substantial number of underlabeled pixels. These pixels are actual road regions that are incorrectly annotated as background.

3.2. Loss Decomposition and Gradient Behavior

To understand how underlabeling affects training, we analyze the loss and gradient dynamics. In an end-to-end network, the total loss $\mathcal{L}_{\text{total}}$ is the sum over all pixels:
\[ \mathcal{L}_{\text{total}} = \sum_{p \in R^{+}} \ell(\hat{p}_p, 1) + \sum_{p \in R^{-}} \ell(\hat{p}_p, 0), \]
where $R^{+}$ and $R^{-}$ are the sets of pixels labeled as road and background, respectively, and $\ell$ is a pixel-wise loss function (e.g., binary cross-entropy). Due to underlabeling, $R^{-}$ actually contains two disjoint subsets:
\[ R^{-} = B \cup M, \]
with $B$ being true background pixels and $M$ being true road pixels that are incorrectly labeled as background. Thus,
\[ \mathcal{L}_{\text{total}} = \sum_{p \in R^{+}} \ell(\hat{p}_p, 1) + \sum_{p \in B} \ell(\hat{p}_p, 0) + \sum_{p \in M} \ell(\hat{p}_p, 0), \]
where the first term represents correctly annotated roads, the second term true background, and the third term underlabeled roads that are incorrectly marked as background.
From the network’s perspective, the terms from $M$ are indistinguishable from those of true background: both contribute gradients that push the predicted probability $\hat{p}_p$ toward 0. During early training, the abundant correct samples in $R^{+}$ and $B$ dominate the gradient. As these become well fitted, the loss on $M$ becomes relatively more significant, and the gradients from these mislabeled pixels begin to steer the network toward predicting roads as background. This creates a systematic bias that actively suppresses the very roads that were omitted from the annotations. Moreover, because the loss is summed over all pixels, every missing road contributes to the gradient, and their influence cannot be localized, which leads to a global bias.
In contrast, patch-based methods compute loss only on the center pixel of each sampled patch. Let a patch be centered at pixel $c$. The loss is:
\[ \mathcal{L}_{\text{patch}} = \ell(\hat{p}_c, y_c). \]
Even if the patch contains underlabeled road pixels near its edges, they do not contribute to the gradient. A missing road pixel will affect training only if it happens to be the center of a patch. The probability of such a scenario is extremely low. Thus, localized supervision naturally isolates the effect of underlabeling, preventing the systematic gradient bias that plagues end-to-end training.
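The contrast between the two gradient behaviors can be made concrete with a toy example. The sketch below is illustrative only and is not the authors' code: it assumes binary cross-entropy on a sigmoid output, for which the gradient with respect to the logit is p̂ − y, and uses a hypothetical ten-pixel image in which the membership of each pixel in R+, B, and M is known.

```python
import numpy as np


def bce_grad_wrt_logit(prob, label):
    """For BCE on a sigmoid output, d(loss)/d(logit) = p_hat - y."""
    return prob - label


# Hypothetical 1-D "image": two roads, at pixels 0-2 and 7-8.
true_road = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0], dtype=float)
# The annotation misses the second road, so pixels 7-8 form the set M.
label = true_road.copy()
label[7:9] = 0.0
in_M = (true_road == 1) & (label == 0)

# Assume the network currently detects both roads with probability 0.8.
prob = np.where(true_road == 1, 0.8, 0.1)

# Dense supervision: every pixel contributes a gradient.
dense_grad = bce_grad_wrt_logit(prob, label)

# Patch-based supervision: only the center pixel of the sampled patch contributes.
center = 1
center_mask = np.zeros_like(label)
center_mask[center] = 1.0
patch_grad = dense_grad * center_mask

print("dense gradient on M:", dense_grad[in_M])
print("patch gradient on M:", patch_grad[in_M])
```

A positive gradient means that gradient descent lowers the logit, so under dense supervision the pixels in M are pushed toward background, while under center-pixel supervision they receive no gradient unless one of them is chosen as a patch center.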

3.3. Practical Trade-Offs and the Path to Synergy

The above analysis reveals a fundamental trade-off. End-to-end networks excel at capturing global context and are computationally efficient, but their dense global supervision makes them brittle under underlabeling. Patch-based methods are robust to label noise due to localized supervision, but they sacrifice global reasoning (leading to fragmented predictions), are computationally inefficient (due to overlapping sliding windows), and introduce stitching artifacts that degrade output quality.
This contrast suggests that an ideal solution would retain the global context and efficiency of end-to-end architectures while selectively adopting the localized supervision principle of patch-based methods. Crucially, we do not need to abandon the end-to-end framework. Instead, we can modify the way supervision is applied: perform a full forward pass to leverage global features, but restrict loss computation to local regions that are anchored by reliably annotated road pixels. Therefore, we can shield the model from the harmful gradients of underlabeled areas while preserving its ability to learn from the global context. This insight directly motivates the design of our Positive-guided Local Supervision strategy, which we detail in the next section.

4. Method

Following the insight from Section 3, we propose a training strategy that imbues end-to-end networks with patch-level noise robustness while retaining their global context and computational efficiency. The core idea is to simulate the localized supervision of patch-based methods within an end-to-end framework: we still perform a full forward pass on the entire image to leverage global features, but restrict loss computation to local regions anchored by reliably annotated road pixels. We term this concept Localized Dense Supervision (LDS) and instantiate it through Positive-guided Local Supervision (PLS).

4.1. Localized Dense Supervision via Positive-Guided Local Supervision

The core insight of our method is that label noise in road extraction datasets exhibits non-uniform distribution. By empirically analyzing existing datasets, we confirm that underlabeling constitutes the primary noise type, where numerous low-grade roads in rural regions are commonly omitted from annotation records. Such pixels are implicitly regarded as background due to the absence of manual annotations. Well-annotated arterial roads possess high annotation reliability, while many low-grade rural roads suffer from underlabeling. This asymmetry suggests that positive samples can serve as trustworthy anchors for supervision.
Building on this observation, we design a training procedure that selectively focuses loss computation on regions anchored by these reliable positive pixels. As illustrated in Figure 2f, the network first performs a standard forward pass on the input image, producing a dense prediction map $\hat{P}$. Let $Y$ denote the ground truth label map. Instead of computing the loss between $\hat{P}$ and $Y$ for all pixels, we only utilize reliably annotated road pixels. Specifically, we first extract all pixels where $Y_p = 1$ as positive samples, and randomly select $K$ pixels from these samples as patch centers. For each center, we crop an $S \times S$ square patch from both the prediction map $\hat{P}$ and the ground truth $Y$. Subsequently, we only calculate the standard dense segmentation loss (e.g., binary cross-entropy and Dice loss) on these $K$ patches, as visualized in Figure 2g. Mathematically, we denote $P_i$ as the set of pixels in the $i$-th sampled patch. The total loss is:
\[ \mathcal{L}_{\text{PLS}} = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{|P_i|} \sum_{p \in P_i} \ell(\hat{p}_p, y_p), \]
where $\ell$ is the pixel-wise loss function. The gradients from these patches are back-propagated to update the network, while pixels outside the sampled patches contribute zero gradient.
Algorithm 1 formalizes the procedure. The parameter K controls how many positive-centered patches are used per image, ensuring sufficient coverage of reliable road regions. The patch size S determines the local context incorporated into each supervision signal, balancing precision and recall as discussed in Section 5.3. Gradients from pixels outside the sampled patches are zeroed, preventing the model from being misled by underlabeled areas.
This design directly mimics the behavior of patch-based methods: each sampled patch acts as a “super-pixel” that is densely supervised, but its center is guaranteed to be a trustworthy road pixel. Because the loss is limited to regions centered on reliable positives, the model is isolated from the harmful gradients that would otherwise originate from the vast underlabeled areas, such as the blue dashed box in Figure 2g. Even if some missing road pixels happen to fall inside a sampled patch, the patch still contains many correctly labeled road pixels, and their collective gradient dominates, mitigating the impact of the few noisy pixels.
Algorithm 1: Positive-guided Local Supervision [listing provided as an image in the original article]
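The procedure described in Section 4.1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses binary cross-entropy only (the paper also mentions Dice loss), clips patches at image borders (the border policy is not specified in the text), and all function and variable names are hypothetical.

```python
import numpy as np


def pixel_bce(prob, label, eps=1e-7):
    """Per-pixel binary cross-entropy."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob))


def pls_loss(prob, label, num_patches, patch_size, rng):
    """Mean over K positive-centered patches of the mean per-pixel loss.

    Returns the scalar loss and the list of patch boxes (y0, y1, x0, x1).
    Patches are clipped at the image border (an assumption of this sketch).
    """
    h, w = label.shape
    ys, xs = np.nonzero(label == 1)           # reliably annotated road pixels
    if ys.size == 0:
        return 0.0, []                        # no positives: nothing to supervise
    k = min(num_patches, ys.size)
    chosen = rng.choice(ys.size, size=k, replace=False)
    half = patch_size // 2
    loss_map = pixel_bce(prob, label)
    patch_losses, boxes = [], []
    for cy, cx in zip(ys[chosen], xs[chosen]):
        y0, x0 = max(cy - half, 0), max(cx - half, 0)
        y1 = min(cy - half + patch_size, h)
        x1 = min(cx - half + patch_size, w)
        patch_losses.append(loss_map[y0:y1, x0:x1].mean())
        boxes.append((y0, y1, x0, x1))
    return float(np.mean(patch_losses)), boxes


# Toy scene: an annotated road on row 10 and an unannotated road on row 50.
label = np.zeros((64, 64))
label[10, :] = 1.0
true_road = label.copy()
true_road[50, :] = 1.0
prob = np.where(true_road == 1, 0.9, 0.1)    # a model that finds both roads

rng = np.random.default_rng(0)
dense = float(pixel_bce(prob, label).mean())
pls, boxes = pls_loss(prob, label, num_patches=4, patch_size=16, rng=rng)
print(f"dense loss = {dense:.4f}, PLS loss = {pls:.4f}")
```

In this toy scene the dense loss penalizes the model for detecting the unannotated road on row 50, whereas the sampled patches never reach that row, so the PLS loss contains no such penalty. In an automatic-differentiation framework the same effect follows from computing the loss only on the cropped regions of the prediction map.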

4.2. Why Positive-Guided Local Supervision Works

The rationale and performance of the proposed PLS supervision mechanism rest on a core prerequisite: manually annotated road positive samples are reliable. In manual annotation pipelines, road positive pixels are labeled actively and selectively, while background regions are marked passively. Annotators therefore rarely mark background regions as road, so false positive labels are uncommon in manually annotated data. Furthermore, we manually inspected all training samples in the DeepGlobe dataset and verified the high accuracy of its road positive annotations. This evidence confirms that the reliability premise of positive samples is firmly established, at least for the DeepGlobe dataset.
Beyond reliable positive annotations, the effectiveness of PLS is based on two key technical factors:
  • Selective supervision: By restricting loss computation to local patches anchored exclusively on credible positive road samples, our method effectively excludes a portion of under-labeled regions from gradient backpropagation. These incompletely annotated areas would otherwise mislead the model and force it to falsely classify road regions as background. Masking such regions during training directly mitigates the systematic annotation bias analyzed in Section 3.
  • Preservation of global context: The full-image forward propagation pipeline is completely retained, enabling the network to capture long-range dependencies and comprehensive contextual features from the entire scene. The localized supervision only regulates gradient calculation for back-propagation, without impairing the global reasoning capability of end-to-end architectures. Unlike conventional patch-based methods that process patches independently, the global forward pass of PLS ensures prediction consistency across the image and eliminates stitching artifacts.
In particular, the PLS training strategy introduces no additional computational overhead during inference. All modifications are confined to the training stage. At test time, the model operates identically to a standard end-to-end segmentation network with a single forward pass per image. Meanwhile, since PLS only adjusts the scope of loss calculation without introducing additional network parameters, its training overhead remains negligible. Extensive experiments validate that the training efficiency of PLS is fully comparable to that of the baseline networks.
PLS serves as a conceptual bridge between end-to-end dense prediction and patch-based methods, as illustrated in Figure 2. Traditional patch-based pipelines split the image into numerous small patches, where each patch contributes only a single center pixel to the loss (see Figure 2c,e). We invert this paradigm: the entire image is treated as a unified global patch for full forward pass and global context modeling. Within this global scope, we sample K local sub-patches anchored at reliable road pixels, and each sub-patch provides dense pixel-level supervision for loss optimization. These sub-patches function as the supervisory “anchors” in patch-based methods, but deliver rich dense local supervision instead of sparse point-wise constraints.
This design inherits the noise robustness of patch-based methods by confining loss computation to trustworthy positive-anchored regions. Simultaneously, the global forward pass enables the network to learn holistic scene information, avoiding feature fragmentation and the inefficiency of sliding-window inference. In summary, PLS harmonizes the two paradigms: it inherits the label noise robustness of patch-based methods without sacrificing the prediction performance and computational efficiency of end-to-end networks.

4.3. Theoretical Analysis of Robustness to Positive Annotation Noise

Although we have verified the reliability of positive annotations in the DeepGlobe dataset, positive label noise (background pixels incorrectly labeled as roads) cannot be fully excluded in new datasets, which may arise from outdated public web maps, geographic coordinate shifts, and other real-world factors. Thus, it is essential to theoretically analyze the performance of PLS under such positive annotation noise.
PLS is based on the core assumption that positive samples are reliably annotated. Positive annotation noise inevitably introduces erroneous gradients and thus impairs model optimization. Nevertheless, we argue that PLS still maintains robustness to positive noise, which benefits from the global feature learning capability inherited from the full-image forward pass. The abundant correctly labeled negative samples within training patches can effectively counteract the erroneous gradients caused by sparse false positive annotations. This conclusion holds on the premise that the number of mislabeled positive samples in a local patch is lower than that of correctly annotated negative samples, allowing the model to learn effective feature representations of true negative samples. This condition is almost always satisfied in road extraction tasks, as the number of positive samples is inherently much smaller than that of negative ones. This mechanism implies a critical design principle: the size of local supervision patches should not be excessively small. Overly small patches will discard numerous valid negative samples, weaken the gradient counterbalance effect, and amplify the negative impact of positive annotation noise.
Subsequent experiments on our self-built CH4P dataset further verify this inference. This dataset contains severe positive annotation noise under real mapping scenarios, which provides experimental evidence for the above analysis.

5. Experiments

We conducted extensive experiments to evaluate the proposed PLS strategy. This section first introduces the experimental datasets and elaborates on the implementation settings. We then perform parameter sensitivity analysis on the mini validation sets of the two datasets to explore feasible configurations of patch size S and patch number K. With the optimal parameters determined, we train D-LinkNet and SegFormer as backbone networks on both datasets, and analyze their performance with and without PLS. We also compare against the GCE noise-robust loss function applied to D-LinkNet, and further conduct direct comparisons with state-of-the-art noise-robust road extraction methods including RCFSNet and UGD-DLinkNet. Finally, we carry out cross-dataset generalization experiments, which comprehensively demonstrate the superiority of the proposed PLS strategy.

5.1. Datasets

5.1.1. CH4P (China Four Provinces) Dataset

CH4P is a large-scale dataset constructed to evaluate model robustness facing severe real-world underlabeling scenarios. It contains 13,498 remote sensing images with a fixed resolution of 1000 × 1000 pixels and a ground sampling distance of approximately 0.5 m/pixel. These images are collected from four representative Chinese provinces: Shandong, Shanxi, Gansu, and Guangxi. The selected provinces cover diverse geographical terrains, including coastal plains, loess plateaus, arid inland areas, and karst landforms, which guarantees rich and representative rural road morphological patterns.
The dataset construction pipeline is detailed as follows. We first retrieved road centerline coordinates from OpenStreetMap (OSM) covering the four target provinces. Along the extracted road network, we randomly sampled seed points and took their latitude and longitude as the centers of candidate image tiles. For each seed point, we downloaded a 1000 × 1000 satellite image tile via the Mapbox Static Images API at zoom level 17, corresponding to a spatial resolution of about 0.5 m/pixel.
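The stated resolution of about 0.5 m/pixel at zoom level 17 can be sanity-checked with the standard Web Mercator ground-resolution formula. The sketch below is only illustrative: it assumes that zoom levels refer to 512-pixel tiles (the `tile_size` default is our assumption, not a detail given in the text), and the function name is hypothetical.

```python
import math

# Equatorial circumference of the WGS84 ellipsoid, as used by Web Mercator.
EARTH_CIRCUMFERENCE_M = 40_075_016.686


def ground_resolution(lat_deg: float, zoom: float, tile_size: int = 512) -> float:
    """Approximate metres per pixel of a Web Mercator map.

    tile_size=512 is an assumption about the zoom convention of the map
    service; with 256-pixel tiles the result would be twice as large.
    """
    world_pixels = tile_size * 2 ** zoom  # width of the whole world in pixels
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / world_pixels


# Latitudes spanning the range reported for CH4P (20.92°N to 42.25°N).
for lat in (20.92, 35.0, 42.25):
    print(f"lat {lat:5.2f}: {ground_resolution(lat, 17):.3f} m/pixel")
```

Under this assumption the resolution varies with latitude, so "about 0.5 m/pixel" should be read as a nominal value rather than a constant across the four provinces.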
For each downloaded image tile, we extracted OSM road annotations within the corresponding geographic bounding box. We retained line features tagged as highway, bridge, or tunnel as road vector primitives, and rasterized these vector data into binary segmentation masks consistent with the spatial resolution of remote sensing imagery. Since OSM lacks explicit width information for most minor rural roads, we adopted a fixed default road width of 6 pixels, following the standard setting of the Massachusetts Roads Dataset [2]. The raw annotations directly derived from public map data naturally contain realistic label noise, such as missing road segments, coordinate offsets, and inaccurate road widths.
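The rasterization step described above can be sketched as follows. This is a minimal illustration, not the released pipeline: it assumes the centerline has already been projected from latitude/longitude to pixel coordinates, and the function name and the boundary convention (a pixel belongs to the road if its centre lies within half the width of a segment) are our own choices.

```python
import numpy as np


def rasterize_polyline(points, shape, width_px=6.0):
    """Burn a polyline given as (x, y) pixel coordinates into a binary mask.

    A pixel is marked as road when its centre lies within width_px / 2 of
    any segment of the polyline (assumed convention).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    mask = np.zeros(shape, dtype=bool)
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len2 = dx * dx + dy * dy
        if seg_len2 == 0:
            t = np.zeros_like(xs)  # degenerate segment: treat it as a point
        else:
            # Position of the closest point on the segment, clamped to its ends.
            t = np.clip(((xs - x0) * dx + (ys - y0) * dy) / seg_len2, 0.0, 1.0)
        dist2 = (xs - (x0 + t * dx)) ** 2 + (ys - (y0 + t * dy)) ** 2
        mask |= dist2 <= (width_px / 2.0) ** 2
    return mask.astype(np.uint8)


# Toy centerline with one horizontal and one vertical segment.
mask = rasterize_polyline([(5, 20), (60, 20), (60, 55)], shape=(64, 64), width_px=6)
print("road pixel fraction:", mask.mean())
```

Because the width is fixed, any road that is wider or narrower than the default on the ground yields the width errors discussed in the text.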
The proposed China Four Provinces (CH4P) dataset consists of 13,498 high-resolution remote sensing image-mask pairs. We partitioned the whole dataset into a training set of 11,296 samples and a validation set of 2202 samples, accounting for 83.7% and 16.3% of the total volume, respectively. Geographically, CH4P spans a broad spatial range across China, with longitude ranging from 93.03°E to 122.00°E and latitude from 20.92°N to 42.25°N, ensuring abundant geographic diversity and complex rural road scenarios. In terms of road density statistics, the training set has an average road coverage of 3.94% with a standard deviation of 3.10%, while the validation set has a mean road density of 4.15% and a standard deviation of 3.27%. Both subsets share similar road proportion distributions, with a minimum road density of 0.10% and a maximum of around 25%.
To support reliable quantitative evaluation with clean ground truth, we additionally constructed a refined subset containing 150 manually annotated images sampled from the same geographic distribution as CH4P. The manual refinement mainly focuses on retrieving missing road segments and optimizing road boundary accuracy. We adopted a dual-annotator cross-verification annotation protocol: two annotators independently revised each image, and all annotation inconsistencies were settled through collective discussion and consensus confirmation. From these 150 high-quality refined images, we randomly selected 25 samples to form CH4P-mini-val for hyperparameter tuning, and reserved the remaining 125 samples as CH4P-mini-test for final model evaluation.
To quantitatively analyze the annotation quality of raw OSM labels, we conducted a comparative evaluation against the manually refined annotations on the 150-image subset. Statistically, 18.4% of real road pixels are missing in the original OSM annotations. Among the pixels marked as roads in raw labels, 94.2% are verified as correct by manual refinement, demonstrating that positive road annotations remain highly reliable despite widespread underlabeling. The remaining imperfections arise from coordinate offsets where annotated road centerlines deviate from actual road locations, width biases caused by inappropriately narrow or broad default road settings, and outdated map data. Several typical cases of such raw annotation errors are illustrated in Figure 3.
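The two statistics above can be read as a missing rate (refined road pixels absent from the raw label) and a positive-label accuracy (raw road pixels confirmed by refinement). The sketch below shows one plausible way to compute them; the function name is hypothetical, and whether the reported figures were pooled over all pixels or averaged per image is not stated, so pooling is an assumption here.

```python
import numpy as np


def label_quality(raw, refined):
    """Compare a raw binary road mask against its manually refined version.

    Returns (missing_rate, positive_accuracy), both pooled over all pixels.
    """
    raw = np.asarray(raw, dtype=bool)
    refined = np.asarray(refined, dtype=bool)
    # Share of true road pixels that the raw label fails to mark.
    missing_rate = (refined & ~raw).sum() / refined.sum()
    # Share of raw road pixels that are confirmed as road after refinement.
    positive_accuracy = (raw & refined).sum() / raw.sum()
    return float(missing_rate), float(positive_accuracy)


# Toy example: five true road pixels, one missed and one spurious in the raw label.
refined = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
raw = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
missing, accuracy = label_quality(raw, refined)
print(f"missing rate: {missing:.2f}, positive accuracy: {accuracy:.2f}")
```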
These results verify that the CH4P dataset well reproduces the inherent annotation defects of public map resources. It thus provides a realistic and challenging benchmark to evaluate the noise robustness of road extraction algorithms under practical real-world label corruption.

5.1.2. DeepGlobe Road Extraction Dataset

The DeepGlobe Road Extraction Dataset [33] provides 6226 high-resolution satellite images (1024 × 1024 pixels, 0.5 m/pixel) with pixel-wise road annotations, covering rural and urban areas in Thailand, Indonesia, and India. We randomly split the 6226 annotated images into a training set of 5189 images and a noisy validation set of 1037 images. The dataset also contains an additional 1243 unannotated images originally intended for benchmark evaluation.
To obtain clean evaluation data with near-complete road labels, we randomly selected 100 images from the 1243 unannotated images and manually refined their road annotations to correct primarily missing roads, such as narrow, unpaved, or partially occluded segments. The refinement protocol followed the same dual-annotator cross-verification procedure used for CH4P: two independent annotators corrected each image and all discrepancies were resolved through consensus review to minimize subjective bias.
From these 100 refined images, we randomly split off 20 images as DG-mini-val, which are used exclusively for hyperparameter selection (e.g., patch size S and number of patches K). The remaining 80 images constitute the DG-mini-test set, which is used to evaluate the practical capability of different methods, including our proposed approach, in extracting underlabeled roads.

5.2. Implementation Details

We adopted D-LinkNet34 and Segformer as the backbone networks to validate the effectiveness of the proposed PLS strategy. The training pipeline follows the original implementation of D-LinkNet34 and Segformer, with a batch size of 8 and standard data augmentation strategies, including random flipping, rotation, and color jitter. All models were trained for 100 epochs, with an early stopping mechanism set based on validation loss. For the PLS strategy, we cropped K patches of size S × S per image, each centered on a positively labeled road pixel. The loss function adopted a combination of binary cross-entropy loss and Dice loss, which was computed exclusively within these sampled patches. Parameter sensitivity analysis (Section 5.3) was performed on the DG-mini-val subset and CH4P-mini-val subset, and the optimal parameters obtained were fixed for all subsequent experiments. Notably, the mini-val subsets of the DeepGlobe and CH4P datasets were used solely for hyperparameter tuning, while the corresponding mini-test subsets were reserved exclusively for the final comparative experiments.
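To make the patch-restricted loss concrete, the following NumPy sketch samples K anchors on positive pixels, marks an S × S window around each, and evaluates BCE plus Dice only inside the marked region. It is a simplified illustration under several assumptions that the text does not specify: the windows are merged into a single union mask, they are clipped at the image border, BCE and Dice are summed with equal weight, and all function names are hypothetical. An actual training loop would use the tensor operations of the deep learning framework instead.

```python
import numpy as np


def pls_supervision_mask(label, patch_size, num_patches, rng):
    """Union of num_patches square windows, each centred on a positive pixel."""
    h, w = label.shape
    mask = np.zeros((h, w), dtype=bool)
    ys, xs = np.nonzero(label)
    if len(ys) == 0:  # no positive anchor available: nothing is supervised
        return mask
    # Sample with replacement only when there are fewer positives than patches.
    picks = rng.choice(len(ys), size=num_patches, replace=len(ys) < num_patches)
    half = patch_size // 2
    for cy, cx in zip(ys[picks], xs[picks]):
        y0, x0 = max(cy - half, 0), max(cx - half, 0)
        mask[y0:min(cy + half, h), x0:min(cx + half, w)] = True  # clipped at borders
    return mask


def masked_bce_dice(prob, label, mask, eps=1e-7):
    """BCE + Dice (equal weights, assumed) over pixels where mask is True."""
    p = np.clip(prob[mask], eps, 1.0 - eps)
    y = label[mask].astype(np.float64)
    if p.size == 0:
        return 0.0
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    dice = 1.0 - (2.0 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)
    return float(bce + dice)


rng = np.random.default_rng(0)
label = np.zeros((64, 64), dtype=np.uint8)
label[10, :] = 1                    # annotated road
prob = np.full((64, 64), 0.05)
prob[10, :] = 0.95                  # the model finds the annotated road
prob[50, :] = 0.95                  # and a real road that is missing from the label

mask = pls_supervision_mask(label, patch_size=16, num_patches=4, rng=rng)
loss_pls = masked_bce_dice(prob, label, mask)
loss_full = masked_bce_dice(prob, label, np.ones_like(mask))
print(f"patch-restricted loss: {loss_pls:.3f}, full-image loss: {loss_full:.3f}")
```

In this toy case the unlabeled road at row 50 lies outside every sampled window, so the correct prediction there is not penalized, whereas full-image supervision treats it as a false positive.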
We report standard segmentation metrics: Intersection over Union (IoU), F1-score, Precision, and Recall. IoU measures the overlap between predicted and ground truth road pixels, while F1-score balances precision and recall. Precision reflects the accuracy of positive predictions and recall indicates the fraction of true road pixels captured by the model. All metrics are computed on the original validation splits, and we also report results on the manually refined samples to assess performance under corrected labels.
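For reference, the four metrics can be computed from pixel-level true positives, false positives, and false negatives as in the short sketch below. The function name is hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not one stated in the text.

```python
import numpy as np


def road_metrics(pred, target):
    """Pixel-level IoU, F1, precision and recall for binary road masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"iou": iou, "f1": f1, "precision": precision, "recall": recall}


scores = road_metrics(pred=[1, 1, 1, 0, 0, 0], target=[1, 1, 0, 1, 0, 0])
print(scores)
```

Since IoU = F1 / (2 − F1) for any single set of counts, the two metrics always rank pooled predictions identically; they can diverge only when averaged per image.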

5.3. Analysis of Parameter Sensitivity

In accordance with the implementation workflow of Algorithm 1, it is essential to determine two core hyperparameters: the size of local patches S and the number of sampled patches K. To quantitatively evaluate the hyperparameter sensitivity of the proposed PLS strategy, we trained models with various combinations of S and K on both the DeepGlobe and CH4P datasets, and assessed their inference performance on the manually refined mini-val validation subsets.

5.3.1. Effect of Patch Size S

Table 1 reports IoU, F1, Precision, and Recall for S ∈ {32, 64, 128, 256, 512} with K = 16 fixed, along with the baseline full-image training (S = 1024, K = 1). The choice of powers of two for S is not a restriction but merely a convenience, for instance to allow potential future extensions such as hierarchical or multi-scale patch processing.
On the two manually refined mini-val subsets, the influence of patch size S on model performance exhibits a consistent trend. Precision increases monotonically as S grows, while Recall decreases accordingly with increasing S. These ablation results reveal that the patch size S serves as an effective knob to balance the model’s sensitivity to false negatives (i.e., missing roads). As S increases, the model incorporates richer contextual information around each positive anchor, which boosts Precision by suppressing spurious predictions, yet incurs a decline in Recall. This is because larger patches may introduce background noise that overwhelms weak road signals. Conversely, an excessively small S forces the model to compute loss over a higher proportion of positive samples within a much smaller receptive field, making the model more sensitive to positive annotation noise (i.e., mislabeled road pixels). Therefore, by tuning S, PLS can be adapted to diverse data characteristics and application requirements. For instance, we can prioritize Recall for comprehensive road network mapping in rural areas, or emphasize Precision for urban planning scenarios where false positives incur higher costs. This inherent flexibility renders PLS a versatile framework for robust road extraction under diverse real-world conditions.
Finally, the model achieves the best overall performance at S = 256 on both manually refined mini-val subsets. Accordingly, S = 256 is adopted for the PLS strategy in all subsequent experiments.
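A simple upper bound helps put the tested patch sizes in perspective: K windows of size S × S can cover at most K·S² pixels, so the supervised share of an H × W image is at most min(1, K·S²/(H·W)). The sketch below evaluates this bound for the tested values of S with K = 16 on a 1024 × 1024 image. The function name is hypothetical, and the true coverage is lower whenever windows overlap or are clipped at the border, which is likely because anchors concentrate along roads.

```python
def max_supervised_fraction(patch_size, num_patches, height=1024, width=1024):
    """Upper bound on the image fraction covered by the sampled windows."""
    return min(1.0, num_patches * patch_size ** 2 / (height * width))


for s in (32, 64, 128, 256, 512):
    bound = max_supervised_fraction(s, num_patches=16)
    print(f"S = {s:3d}: at most {bound:.2%} of the image is supervised")
```

At S = 256 and K = 16 the bound already reaches the full image (16 × 256² = 1024²), so the benefit over full-image training at this setting depends on the windows overlapping around roads rather than tiling the image.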

5.3.2. Effect of Number of Patches K

With the patch size fixed at S = 256, we varied the number of sampled patches K within the set {4, 8, 16, 32}, trained the PLS-based model on both the DeepGlobe and CH4P datasets, and evaluated its inference performance on the manually refined mini-val subsets. The corresponding quantitative metrics are summarized in Table 2. Theoretically, the value of K governs the supervision intensity of the locally sampled patches: a larger K corresponds to a higher probability of applying effective supervision in valid road regions within the image. However, the impact of K on model performance does not stem from the alteration of the model’s feature extraction capability, but mainly from the training convergence process. For this reason, its influence on final segmentation performance is less pronounced than that of the patch size S.
Quantitative results show that the maximum performance fluctuation across the reasonable range of K ∈ {4, 8, 16, 32} is less than 6% in IoU for both datasets. On the DeepGlobe dataset, the model achieves the optimal performance at K = 8, while the best performance on the CH4P dataset is obtained at K = 16. In this paper, we select K = 16 as the fixed configuration for all subsequent test experiments. Collectively, compared to the patch size S, the number of patches K has a relatively mild impact on model performance, indicating that PLS is not highly sensitive to the exact choice of K within a reasonable range. Notably, an excessively large K will lead to performance degradation. This is because repeated local sampling within sparse road regions may cause the model to suffer from local overfitting, which further impairs the overall segmentation performance.

5.4. Comparative Experiments

To comprehensively demonstrate the superiority of the proposed PLS strategy, we conducted a series of comparative experiments as follows.
First, we adopted D-LinkNet (a representative encoder–decoder architecture) and Segformer (a representative Transformer-based architecture) as the backbone networks, to verify that the PLS strategy can bring robust performance, especially a significant improvement in the extraction capability of underlabeled roads.
Second, with D-LinkNet as the backbone network, we tested the noise-robust loss function GCE, as well as RLS, a variant of the PLS strategy that samples patches randomly from the whole image regardless of the label of the patch center (random sampling). This group of experiments is designed to verify the superiority of PLS over noise-robust loss functions and other local supervision strategies, and further demonstrate the effectiveness of local supervision based on positive sample sampling.
Finally, we conducted direct comparisons with two recent state-of-the-art road extraction methods that are robust to annotation noise, namely RCFSNet [27] and UGD-DLinkNet [26], to validate the superior performance of the proposed PLS strategy. Both methods were fully trained using the publicly released source codes from their authors.
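The GCE baseline is only named above. For readers unfamiliar with it, the sketch below shows the commonly cited form of the generalized cross-entropy loss, (1 − p^q)/q, where p is the probability assigned to the labeled class. The binary formulation, the function names, and the value q = 0.7 are assumptions for illustration only; the configuration used in the experiments is not specified in the text.

```python
import numpy as np


def binary_ce(prob, label, eps=1e-7):
    """Standard cross-entropy for a binary label, per pixel."""
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    p_true = np.where(np.asarray(label) == 1, prob, 1.0 - prob)
    return -np.log(p_true)


def binary_gce(prob, label, q=0.7, eps=1e-7):
    """Generalized cross-entropy (1 - p_true**q) / q, per pixel.

    q -> 0 recovers cross-entropy and q = 1 gives a linear, MAE-like loss;
    q = 0.7 is an illustrative value, not a setting taken from this paper.
    """
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    p_true = np.where(np.asarray(label) == 1, prob, 1.0 - prob)
    return (1.0 - p_true ** q) / q


# A confident road prediction on a pixel that is (wrongly) labeled as background.
ce_value = float(binary_ce(0.99, 0))
gce_value = float(binary_gce(0.99, 0))
print(f"CE = {ce_value:.3f}, GCE = {gce_value:.3f}, GCE bound = {1 / 0.7:.3f}")
```

GCE caps the penalty of each disagreeing pixel at 1/q but still counts every unlabeled road pixel as an error, which is consistent with the observation that loss-level smoothing alone does not resolve systematic underlabeling.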

5.4.1. Results on DeepGlobe

All models were trained on the DeepGlobe training set, and the quantitative performance of different methods on the DG-mini-test subset is presented in Table 3.
First, with the PLS strategy integrated, our method outperforms the vanilla D-LinkNet backbone by 0.124 in F1 score and 0.123 in IoU, and outperforms the vanilla Segformer backbone by 0.076 in F1 score and 0.082 in IoU. These quantitative results solidly demonstrate the superiority of the proposed PLS strategy. The significant performance gain mainly comes from the sharp rise in recall, which validates our core claim that PLS can effectively improve the model’s ability to extract underlabeled roads.
Compared with the noise-robust GCE loss function, D-LinkNet + PLS outperforms D-LinkNet + GCE by 0.128 in F1 score and 0.150 in IoU. Overall, D-LinkNet + GCE achieves slightly worse performance than the vanilla D-LinkNet backbone, which confirms that the statistical noise smoothing mechanism of GCE is not applicable to scenarios with widespread road underlabeling, and is at least not significantly superior to the standard loss function.
Compared with different local supervision strategies, D-LinkNet + PLS outperforms D-LinkNet + RLS by 0.093 in F1 score and 0.113 in IoU. This result strongly validates the effectiveness of positive-sample-based local sampling, and further justifies our core assumption that “positive annotations are reliable”. Meanwhile, it is worth noting that D-LinkNet + RLS achieves slightly better performance than the vanilla D-LinkNet backbone. This is because random sampling passively discards part of the underlabeling areas during the training process, thus reducing the adverse impact of erroneous gradients from underlabeled samples to a certain extent.
Compared with state-of-the-art (SOTA) noise-robust road extraction methods, both our D-LinkNet + PLS and Segformer + PLS significantly outperform RCFSNet and UGD-DLinkNet, with a minimum lead of 0.063 in F1 score and 0.065 in IoU. Two key conclusions can be drawn from this comparison: First, RCFSNet exhibits notable robustness, which is reflected in its processing of semantic consistency of similar appearance features. This enables it to mine and extract potential road features through global context, making it outperform all other compared methods except the proposed PLS. Second, although UGD-DLinkNet is designed to address the same underlabeling problem as our work, its complex design and restrictive assumptions lead to degraded performance. It tends to down-weight the features with high uncertainty, which results in fewer underlabeled road regions participating in gradient update during training.
Representative visualization examples are presented in Figure 4, which intuitively corroborate the quantitative conclusions summarized in Table 3. Figure 4a,b illustrate the extraction performance of rural unpaved trails in diverse environments. Within the orange bounding boxes, the proposed PLS method correctly extracts most of these rural roads, followed by RCFSNet which retrieves a portion of the target roads, while all other compared methods fail. This performance gap arises because such low-grade roads are partially underlabeled as background in the training set. Figure 4c presents a challenging scenario where roads are occluded by dense tree cover. Similarly, only our proposed PLS method achieves complete road extraction in this case, RCFSNet realizes partial extraction, and all other methods fail to retrieve the occluded road segments. Finally, Figure 4d depicts another common scenario: the extraction of arterial roads and secondary roads in dense road network regions, where some secondary roads are also prone to underlabeling. In this scenario, all methods can successfully extract most of the road segments. Our method shows the capability to retrieve some underlabeled secondary roads, but its superiority is not prominent in this scenario. From the above qualitative analysis, we can draw a clear conclusion: the core advantage of the proposed PLS strategy lies in extracting low-grade rural roads that are far from arterial roads and frequently affected by underlabeling. In contrast, the superiority of our method diminishes in dense urban road regions. This is because the sampling range is wider in dense road regions, and even underlabeled road segments are highly likely to be included in the sampled patches for loss calculation under the PLS framework, and thus cannot be excluded from the gradient optimization process.

5.4.2. Results on CH4P

The CH4P dataset contains even more severe label noise, reflecting real-world map data that have not been manually annotated. On this dataset, we test the robustness of road extraction methods for large-scale applications in real scenarios.
The quantitative results of different methods on the CH4P-mini-test subset are summarized in Table 4.
First, compared with the vanilla backbone networks, D-LinkNet + PLS outperforms the original D-LinkNet by 0.104 in F1 score and 0.104 in IoU; Segformer + PLS outperforms the original Segformer by 0.083 in F1 score and 0.086 in IoU. Consistent with the observations on the DeepGlobe dataset, the performance gain is mainly attributed to a sharp rise in recall, accompanied by a moderate decline in precision, which reflects the inherent trade-off between precision and recall under noisy annotation scenarios. Overall, the PLS-based methods achieve a substantial lead over their corresponding backbone networks in comprehensive evaluation metrics.
Second, compared with the noise-robust loss function, D-LinkNet + PLS outperforms D-LinkNet + GCE by 0.299 in F1 score and 0.258 in IoU. The conclusions here are consistent with the analysis on the DeepGlobe dataset, and thus will not be elaborated further.
Subsequently, compared with the random local supervision strategy, D-LinkNet + PLS significantly outperforms D-LinkNet + RLS in comprehensive metrics, further validating the effectiveness of the positive-sample-based sampling strategy.
Finally, compared with the two SOTA methods, both of our PLS-based variants outperform SOTA robust road extraction methods by a minimum margin of 0.071 and 0.082 in F1 score, demonstrating the superiority of the proposed method. Notably, RCFSNet achieves a lower quantitative score, which is inconsistent with the visualization results presented in subsequent sections. This is because its segmentation outputs contain tiny holes and artifacts in regions with high uncertainty, which reduces its pixel-level evaluation scores, while the structural accuracy of the extracted road network is actually favorable.
Figure 5 presents the road extraction results of representative samples from the four provinces covered by the CH4P dataset. The results demonstrate that our method can effectively extract low- and medium-grade roads, including the rural roads in Figure 5a, the mountain trails in Figure 5b, the dense roads in villages and towns in Figure 5c, and the secondary roads in Figure 5d. This confirms that the proposed PLS strategy also outperforms the compared methods on the CH4P dataset with inherent positive annotation noise, and experimentally validates the correctness of the theoretical analysis in Section 4.3.
Meanwhile, it should be noted that the performance difference between different methods is no longer significant in dense urban road scenarios, and the prominent superiority of PLS is difficult to observe from visualization examples. This phenomenon has been analyzed in Section 5.4.1. We further analyze the performance variation of each method under different road densities in the following subsection to verify this conclusion, and clarify the applicable boundary of the superiority of PLS.

5.4.3. Analysis with Road Density

We statistically analyzed the performance of different methods on samples with varying road densities, with the results shown in Figure 6. In the left analysis plot for the DeepGlobe dataset, the D-LinkNet + PLS variant achieves excellent performance across all road density intervals. It underperforms the vanilla D-LinkNet only in high-density samples with road density ranging from 0.15 to 0.30, while the two PLS-based variants rank in the top two in overall performance for low- and medium-density samples with road density below 0.10. This is because these low- and medium-density regions are precisely where rural low-grade roads occur most frequently, and these roads contribute the main performance gain for the PLS-based methods. In the right analysis plot for the CH4P dataset, we draw consistent conclusions: the PLS strategy exhibits prominent superiority in low- and medium-density road scenarios with a positive sample proportion below 10%. Notably, the two datasets show opposite trends in performance variation with road density. For the DeepGlobe dataset, the performance of all methods improves as road density increases, while for the CH4P dataset, performance declines with rising road density. This discrepancy arises because most samples in the CH4P dataset lack explicit road width information, which makes the model more prone to width estimation errors in dense road scenarios such as urban arterial roads, resulting in lower quantitative scores. In summary, the core superiority of PLS lies in the extraction of rural low-grade roads in low- and medium-density scenarios, while its advantage over baseline methods diminishes in high-density scenarios. This is because positive-sample-guided local sampling may cover most regions of the image in high-density scenarios, and its ability to shield the model from adverse gradients of underlabeled samples degrades accordingly.
On the other hand, as another core contribution of this work, the CH4P dataset is of great value for the training of rural low-grade road extraction tasks, rather than for model training in high-density road scenarios (mainly urban road types).
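A per-density breakdown of this kind can be produced by computing the road density of each test image and averaging a metric within density intervals. The sketch below is a minimal illustration: the interval edges are hypothetical (the text only mentions the 0.10 threshold and the 0.15 to 0.30 range), and averaging per-image IoU within each interval is an assumption about how such a plot could be built.

```python
import numpy as np


def road_density(mask):
    """Fraction of pixels labeled as road in a binary mask."""
    return float(np.asarray(mask, dtype=bool).mean())


def iou_by_density(densities, ious, edges=(0.0, 0.05, 0.10, 0.15, 0.30)):
    """Mean per-image IoU inside each density interval [edges[i], edges[i+1]).

    Samples at or above the last inner edge all fall into the final interval.
    """
    densities = np.asarray(densities, dtype=np.float64)
    ious = np.asarray(ious, dtype=np.float64)
    bins = np.digitize(densities, edges[1:-1])  # interval index of every sample
    result = {}
    for i in range(len(edges) - 1):
        selected = bins == i
        key = (edges[i], edges[i + 1])
        result[key] = float(ious[selected].mean()) if selected.any() else None
    return result


# Toy per-image values, not results from the paper.
table = iou_by_density(
    densities=[0.02, 0.04, 0.07, 0.12, 0.20],
    ious=[0.5, 0.7, 0.6, 0.4, 0.8],
)
for interval, value in table.items():
    print(interval, value)
```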

5.5. Analysis of Cross-Dataset Generalization Ability

To further verify the generalization ability of the proposed PLS method, we perform zero-shot cross-dataset evaluation. All models are trained on the source dataset and directly evaluated on the unseen target dataset without additional fine-tuning. Quantitative results are reported in Table 5 and Table 6, while qualitative visualization illustrations are provided in Figure 7 and Figure 8.
The quantitative results in Table 5 consistently demonstrate the strong generalization capability of the proposed PLS strategy on unseen data, which significantly outperforms its corresponding vanilla backbones and other compared methods. We attribute this robustness to the suppression of overfitting to negative sample features achieved by the positive-sample-guided local sampling of PLS. This confirms that the PLS method can learn robust road extraction capability from datasets with higher annotation noise, enabling it to substantially outperform the compared methods on unseen datasets. Meanwhile, the overall performance of the compared methods warrants further in-depth analysis: models trained on the CH4P dataset yield overall lower quantitative metrics on the DG-mini-test subset. The visualization results in Figure 7a,b reveal that this performance degradation stems from the failure of these models to effectively extract the abundant low-grade rural roads present in the DeepGlobe dataset. In addition, the insufficient annotation of road width in the CH4P dataset also constrains model performance, as shown in Figure 7c,d, where all compared methods fail to achieve accurate road width prediction.
Compared with the cross-dataset evaluation on the DG-mini-test subset, the cross-dataset validation on the CH4P-mini-test subset presents a notable distinct phenomenon. Table 6 shows that for most methods, the road extraction performance achieved via cross-dataset transfer (i.e., models trained on the DeepGlobe dataset and tested on the CH4P-mini-test subset) is even superior to that of models with in-domain training and testing (i.e., models trained and tested on the CH4P dataset). By comparing Table 6 with Table 4, the vast majority of methods achieve better or comparable results in the cross-dataset setting. The representative examples presented in Figure 8 also explicitly support this conclusion: models trained on the DeepGlobe dataset mostly exhibit superior visual road extraction performance on the CH4P-mini-test subset.
This phenomenon can be attributed to two key factors. First, the road annotations in the DeepGlobe dataset include explicit width information, which is consistent with the annotation format of the CH4P-mini-test subset. Thus, training with DeepGlobe data enables the model to learn more accurate road width estimation, leading to improved pixel-level quantitative scores. Second, the DeepGlobe dataset has higher annotation quality with less positive annotation noise, which has been verified in Section 4.2, further contributing to the performance improvement.
We then further analyze the comparison results in Table 6. First, both PLS-based variants achieve consistent performance gains over their corresponding vanilla backbones. Specifically, D-LinkNet + PLS outperforms the vanilla D-LinkNet backbone by 0.009 in F1 score and 0.007 in IoU, while Segformer + PLS outperforms the vanilla Segformer backbone by 0.015 in F1 score and 0.016 in IoU. Meanwhile, the D-LinkNet backbone outperforms Segformer + PLS. This result reveals that cross-dataset transfer can improve generalization performance, because the sample noise distributions differ between the two datasets: the biased samples present in the DeepGlobe dataset rarely appear in the CH4P dataset.
Second, D-LinkNet + PLS still significantly outperforms D-LinkNet + GCE and D-LinkNet + RLS, and these two methods still fail to achieve competitive generalization performance on unseen data.
Finally, both PLS-based variants outperform the two SOTA methods, which further demonstrates the superior generalization capability of the proposed PLS strategy on unseen datasets.
In addition, the visualization examples shown in Figure 8 further confirm that the proposed PLS method maintains stronger low-grade road extraction capability and better overall comprehensive performance in cross-dataset scenarios.
Overall, the most critical conclusion is drawn from the zero-shot cross-dataset generalization experiment with models trained on the CH4P dataset and tested on the DeepGlobe dataset: the proposed PLS strategy possesses stronger noise robustness and learns more robust road representations from real-world noisy data than all compared methods.

5.6. Limitations

While the proposed PLS strategy achieves superior performance in extracting low-grade rural roads, several limitations and application caveats should be noted as follows.
Despite the improved extraction capability for low-grade rural roads, the proposed method still has failure cases. As shown in Figure 9, none of the methods, including the proposed PLS strategy, the vanilla backbones, and the other compared methods, successfully extracts the road segments within the orange bounding boxes. This failure is attributed to two main factors. First, these target road segments are inherently challenging: the ambiguity of their semantic features makes effective identification difficult. Second, the fixed patch size S and the fixed shape of the local supervision patches adopted in this work are overly rigid when dealing with more complex sample distributions. As a result, the positive-guided sampling inevitably includes some underlabeled road segments, meaning that the adverse gradients caused by underlabeling are only mitigated rather than completely eliminated. In future work, we will explore adaptive adjustment of S, K, and the shape of the sampled patches according to road density or the uncertainty of feature extraction, as well as more intelligent positive anchor selection strategies, to further mitigate the adverse impact of underlabeled samples.
In addition, it should be noted that the core superiority of the PLS strategy is concentrated in the extraction of low-grade rural roads. Therefore, this method is not the optimal choice for application scenarios that focus on extracting drivable arterial roads, such as high-precision map construction for autonomous driving.

6. Conclusions

Accurate road extraction from high-resolution remote sensing imagery remains a critical bottleneck for large-scale geospatial applications, primarily limited by the pervasive underlabeling of low-grade rural roads in real-world training datasets—a defect that systematically introduces misleading gradients and undermines the performance of mainstream end-to-end dense segmentation networks. This work first provides a mechanistic decomposition of the gradient dynamics of the two dominant road extraction paradigms under label noise, revealing that the localized supervision of patch-based methods inherently isolates adverse effects from underlabeled regions, while end-to-end architectures suffer from inherent vulnerability to incomplete annotations due to their global equal-weight dense supervision scheme. Building on this theoretical insight, we propose the Positive-guided Local Supervision (PLS) strategy, a lightweight training paradigm that reconciles the complementary strengths of the two frameworks: it retains the full end-to-end forward pass to preserve global context modeling and computational efficiency, while restricting loss computation exclusively to local patches anchored at reliably annotated positive road pixels, effectively shielding the model from the systematic bias induced by missing road annotations.
We validate the effectiveness of PLS through extensive experiments on two datasets: the public DeepGlobe road extraction benchmark, and the China Four Provinces (CH4P) dataset, a large-scale challenging benchmark constructed in this work. Comprising 13,498 high-resolution rural remote sensing images with annotations directly sourced from public web maps, CH4P faithfully reproduces the realistic underlabeling, width inaccuracy, and coordinate offset defects of real-world mapping workflows, filling a key gap in existing benchmarks that lack authentic noisy annotations for rural road extraction research. Quantitative results on manually refined test subsets demonstrate that PLS consistently delivers substantial performance gains over vanilla backbones, state-of-the-art noise-robust loss functions, and specialized road extraction networks, with the most prominent improvements observed in low- and medium-density rural road scenarios. Zero-shot cross-dataset generalization experiments further confirm that PLS imparts stronger noise robustness to mainstream segmentation networks, enabling superior road representation learning from highly noisy real-world data and better transfer performance on unseen datasets, with no additional computational overhead introduced during either training or inference.
This work also explicitly delineates the applicability boundary of the proposed method to guide practical deployment. The core superiority of PLS is concentrated in low-grade rural road extraction tasks, while its performance advantage diminishes in high-density urban road scenarios.
Overall, the proposed PLS strategy provides a practical, scalable, and easily implementable solution for robust road extraction under real-world noisy annotation conditions, with no requirement for architectural modifications or complex inference pipelines. Beyond road extraction, the core principle of positive-guided localized supervision holds strong transfer potential for other dense prediction tasks in remote sensing that suffer from systematic label incompleteness, such as building extraction and fine-grained land cover mapping. Future work will explore adaptive adjustment of patch parameters based on scene characteristics and feature uncertainty, integration with semi-supervised learning frameworks to further reduce annotation reliance, and extension of the PLS paradigm to a wider range of remote sensing image interpretation tasks.

Author Contributions

Conceptualization, H.H.; methodology, H.H.; software, H.H.; data curation, H.H.; validation, S.W. and Y.L.; formal analysis, S.W., L.H. and Y.L.; investigation, S.W., L.H. and Y.L.; resources, L.H. and D.Y.; writing—original draft preparation, H.H.; writing—review and editing, D.Y.; supervision, X.F. and D.Y.; project administration, X.F.; funding acquisition, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Shandong Province under Grant ZR2023QD087 and the China Postdoctoral Science Foundation under Grant 2024M764316.

Data Availability Statement

The code and datasets are available at https://github.com/hehao209/Positive-guided-Local-Supervision (accessed on 2 May 2026). The use of Mapbox satellite imagery for this non-commercial research is permitted under the Mapbox Terms of Service, which allow tracing of Mapbox-hosted satellite maps to produce derivative datasets for non-commercial purposes. To comply with the copyright restrictions of the original imagery providers, we release only the annotation files and tile coordinates, not the raw satellite image tiles. Researchers can obtain the corresponding imagery by querying the Mapbox API with their own access tokens.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, R.; Wu, J.; Lu, W.; Miao, Q.; Zhang, H.; Liu, X.; Lu, Z.; Li, L. A Review of Deep Learning-Based Methods for Road Extraction from High-Resolution Remote Sensing Images. Remote Sens. 2024, 16, 2056. [Google Scholar] [CrossRef]
  2. Mnih, V.; Hinton, G.E. Learning to detect roads in high-resolution aerial images. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Berlin, Germany, 2010; pp. 210–223. [Google Scholar]
  3. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  4. Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: Piscataway, NJ, USA, 2018; pp. 192–1924. [Google Scholar] [CrossRef]
  5. Alshehhi, R.; Marpu, P.R.; Woon, W.L.; Mura, M.D. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2017, 130, 139–149. [Google Scholar] [CrossRef]
  6. Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-of-The-Art Review. Remote Sens. 2020, 12, 1444. [Google Scholar] [CrossRef]
  7. Lian, R.; Wang, W.; Mustafa, N.; Huang, L. Road Extraction Methods in High-Resolution Remote Sensing Images: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5489–5507. [Google Scholar] [CrossRef]
  8. Zhang, X.; Ma, W.; Li, C.; Wu, J.; Tang, X.; Jiao, L. Fully Convolutional Network-Based Ensemble Method for Road Extraction From Aerial Images. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1777–1781. [Google Scholar] [CrossRef]
  9. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Proceedings of the Computer Vision—ECCV 2022 Workshops; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 205–218. [Google Scholar]
  10. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY, USA, 6–14 December 2021. [Google Scholar]
  11. Bolcek, J.; Gibril, M.B.A.; Al-Ruzouq, R.; Shanableh, A.; Jena, R.; Hammouri, N.; Sachit, M.S.; Ghorbanzadeh, O. A comprehensive evaluation of deep vision transformers for road extraction from very-high-resolution satellite data. Sci. Remote Sens. 2025, 11, 100190. [Google Scholar] [CrossRef]
  12. Chen, H.; Yang, L.; Jia, Q.; Xiong, W. RoadFocusNet: Road extraction from remote sensing imagery using focused transformer and focused masked image modeling. Int. J. Digit. Earth 2025, 18, 2549435. [Google Scholar] [CrossRef]
  13. Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention Mask Transformer for Universal Image Segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022. [Google Scholar]
  14. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2015; pp. 3431–3440. [Google Scholar] [CrossRef]
  15. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
  16. Chaurasia, A.; Culurciello, E. LinkNet: Exploiting encoder representations for efficient semantic segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP); IEEE: Piscataway, NJ, USA, 2017; pp. 1–4. [Google Scholar] [CrossRef]
  17. Gao, X.; Sun, X.; Zhang, Y.; Yan, M.; Xu, G.; Sun, H.; Jiao, J.; Fu, K. An End-to-End Neural Network for Road Extraction From Remote Sensing Imagery by Multiple Feature Pyramid Network. IEEE Access 2018, 6, 39401–39414. [Google Scholar] [CrossRef]
  18. Li, Y.; Guo, L.; Rao, J.; Xu, L.; Jin, S. Road Segmentation Based on Hybrid Convolutional Network for High-Resolution Visible Remote Sensing Image. IEEE Geosci. Remote Sens. Lett. 2019, 16, 613–617. [Google Scholar] [CrossRef]
  19. Xu, Y.; Xie, Z.; Feng, Y.; Chen, Z. Road Extraction from High-Resolution Remote Sensing Imagery Using Deep Learning. Remote Sens. 2018, 10, 1461. [Google Scholar] [CrossRef]
  20. Dong, S.; Chen, Z. Block Multi-Dimensional Attention for Road Segmentation in Remote Sensing Imagery. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6504505. [Google Scholar] [CrossRef]
  21. Wei, Y.; Wang, Z.; Xu, M. Road Structure Refined CNN for Road Extraction in Aerial Image. IEEE Geosci. Remote Sens. Lett. 2017, 14, 709–713. [Google Scholar] [CrossRef]
  22. Henry, C.; Azimi, S.M.; Merkle, N. Road Segmentation in SAR Satellite Images with Deep Fully Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1867–1871. [Google Scholar] [CrossRef]
  23. Mosinska, A.; Marquez-Neila, P.; Kozinski, M.; Fua, P. Beyond the Pixel-Wise Loss for Topology-Aware Delineation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2018; pp. 3136–3145. [Google Scholar] [CrossRef]
  24. Zhang, Z.; Sabuncu, M.R. Generalized cross entropy loss for training deep neural networks with noisy labels. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, Red Hook, NY, USA, 3–8 December 2018; pp. 8792–8802. [Google Scholar]
  25. Li, P.; He, X.; Qiao, M.; Cheng, X.; Li, Z.; Luo, H.; Song, D.; Li, D.; Hu, S.; Li, R.; et al. Robust Deep Neural Networks for Road Extraction From Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6182–6197. [Google Scholar] [CrossRef]
  26. Yang, P.; Xiao, H.; Lin, C.; Xie, X. UGD-DLinkNet: An Enhanced Network for Occluded Road Extraction Using Attention Mechanisms and Uncertainty Estimation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 24144–24161. [Google Scholar] [CrossRef]
  27. Yang, Z.; Zhou, D.; Yang, Y.; Zhang, J.; Chen, Z. Road Extraction From Satellite Imagery by Road Context and Full-Stage Feature. IEEE Geosci. Remote Sens. Lett. 2023, 20, 8000405. [Google Scholar] [CrossRef]
  28. Wei, Y.; Ji, S. Scribble-Based Weakly Supervised Deep Learning for Road Surface Extraction From Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5602312. [Google Scholar] [CrossRef]
  29. Wu, S.; Du, C.; Chen, H.; Xu, Y.; Guo, N.; Jing, N. Road Extraction from Very High Resolution Images Using Weakly labeled OpenStreetMap Centerline. ISPRS Int. J.-Geo-Inf. 2019, 8, 478. [Google Scholar] [CrossRef]
  30. Bonafilia, D.; Gill, J.; Basu, S.; Yang, D. Building High Resolution Maps for Humanitarian Aid and Development with Weakly- and Semi-Supervised Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, Long Beach, CA, USA, 16–20 June 2019; Computer Vision Foundation/IEEE: Piscataway, NJ, USA, 2019; pp. 1–9. [Google Scholar]
  31. Abdollahi, A.; Pradhan, B.; Sharma, G.; Maulud, K.N.A.; Alamri, A.M. Improving Road Semantic Segmentation Using Generative Adversarial Network. IEEE Access 2021, 9, 64381–64392. [Google Scholar] [CrossRef]
  32. Zhang, Y.; Xiong, Z.; Zang, Y.; Wang, C.; Li, J.; Li, X. Topology-Aware Road Network Extraction via Multi-Supervised Generative Adversarial Networks. Remote Sens. 2019, 11, 1017. [Google Scholar] [CrossRef]
  33. Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. DeepGlobe 2018: A challenge to parse the earth through satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops; IEEE: Piscataway, NJ, USA, 2018; pp. 172–181. [Google Scholar]
Figure 1. Representative samples illustrating underlabeling. The first row shows satellite images and the second row shows the corresponding road annotations. (a–d) are from the DeepGlobe Road Extraction Dataset, and (e–h) are from the CH4P dataset. Underlabeled roads are highlighted with red boxes.
Figure 2. Overview of the proposed Positive-guided Local Supervision (PLS) strategy. (a,b) illustrate the end-to-end paradigm, where the entire image is processed in a single forward pass. (c,e) illustrate the patch-based paradigm, where each local patch independently classifies its central pixel. In (d), the patch-based idea is integrated into the end-to-end framework: positive pixels (reliably annotated roads) are sampled as centers, and local patches are extracted from both the prediction and the ground truth. This workflow is formulated as the local dense supervision paradigm in (f). In (g), the loss is computed only within these patches, focusing supervision on trustworthy regions while ignoring underlabeled areas (indicated by blue dashed boxes). This design inherits the noise robustness of patch-based methods while preserving the global context and efficiency of end-to-end networks.
Figure 3. Examples of annotation noise in the CH4P dataset. In the first row, raw public annotations from OSM are highlighted in red; in the second row, manually refined annotations are highlighted in green. In addition to underlabeling, these samples contain positive-sample noise such as inaccurate road widths and road coordinate offsets. (a) shows a width error in the original annotations; (b) shows an annotation offset; (c) shows missing annotations and offsets on hardened roads; (d) shows missing annotations, width errors, and offsets on rural roads.
Figure 4. Qualitative comparison of road extraction results on the DG-mini-test subset. (a,b) show the extraction of rural unpaved trails in complex environments; (c) presents a challenging scene with roads occluded by dense tree cover; (d) illustrates the extraction of arterial and secondary roads in dense road networks. Orange bounding boxes highlight easily overlooked road targets, where the proposed PLS achieves superior performance.
Figure 5. Qualitative comparison on the CH4P-mini-test set. (a) Rural road extraction scene; (b) mountain trail extraction scene; (c) dense village road extraction scene; (d) secondary road extraction scene. With all models trained on real-world data, the orange boxes indicate areas where the other methods fail to extract the roads correctly, whereas our method extracts them.
Figure 6. Performance analysis of different road extraction methods on samples with varying road densities. The left plot shows the quantitative performance of all methods on the DeepGlobe dataset across different road density intervals, while the right plot presents the corresponding results on the CH4P dataset. Each subplot adopts a dual-Y-axis design: the left vertical axis represents the value of the quantitative evaluation metrics, and the right vertical axis indicates the number of samples in the corresponding road density interval.
Figure 7. Qualitative cross-dataset generalization results (trained on CH4P, tested on DG-mini-test). (a–d) correspond to the same data samples as those in Figure 4.
Figure 8. Qualitative cross-dataset generalization results (trained on DeepGlobe, tested on CH4P-mini-test). (a–d) correspond to the same data samples as those in Figure 5.
Figure 9. Representative failure cases of low-grade road extraction. (a,b) show failure cases from the DG-mini-test set, while (c,d) present failure cases from the CH4P-mini-test set. The regions in the orange bounding boxes indicate incomplete road extraction.
Table 1. Ablation on patch size S (fixed K = 16). Metrics reported on DG-mini-val and CH4P-mini-val.

| S | DG-Mini-Val IoU | DG-Mini-Val F1 | DG-Mini-Val Precision | DG-Mini-Val Recall | CH4P-Mini-Val IoU | CH4P-Mini-Val F1 | CH4P-Mini-Val Precision | CH4P-Mini-Val Recall |
|---|---|---|---|---|---|---|---|---|
| 32 | 0.187 | 0.301 | 0.192 | 0.896 | 0.171 | 0.279 | 0.190 | 0.779 |
| 64 | 0.366 | 0.522 | 0.394 | 0.861 | 0.330 | 0.475 | 0.377 | 0.751 |
| 128 | 0.569 | 0.719 | 0.652 | 0.817 | 0.376 | 0.522 | 0.432 | 0.748 |
| 256 | **0.679** | **0.803** | 0.842 | 0.776 | **0.499** | **0.634** | 0.593 | 0.746 |
| 512 | 0.629 | 0.765 | 0.924 | 0.665 | 0.422 | 0.576 | 0.638 | 0.560 |
| 1024 | 0.539 | 0.680 | 0.928 | 0.564 | 0.363 | 0.509 | 0.758 | 0.424 |

Note: Bold values indicate the optimal IoU and F1 scores.
Table 2. Ablation on number of patches K (fixed S = 256). Metrics reported on DG-mini-val and CH4P-mini-val.

| K | DG-Mini-Val IoU | DG-Mini-Val F1 | DG-Mini-Val Precision | DG-Mini-Val Recall | CH4P-Mini-Val IoU | CH4P-Mini-Val F1 | CH4P-Mini-Val Precision | CH4P-Mini-Val Recall |
|---|---|---|---|---|---|---|---|---|
| 4 | 0.687 | 0.801 | 0.828 | 0.767 | 0.446 | 0.594 | 0.572 | 0.667 |
| 8 | **0.693** | **0.814** | 0.853 | 0.784 | 0.460 | 0.603 | 0.562 | 0.718 |
| 16 | 0.679 | 0.803 | 0.842 | 0.776 | **0.499** | **0.634** | 0.593 | 0.746 |
| 32 | 0.631 | 0.761 | 0.778 | 0.774 | 0.449 | 0.593 | 0.547 | 0.723 |

Note: Bold values indicate the optimal IoU and F1 scores.
Table 3. Quantitative Performance Comparison of Road Extraction Methods on DeepGlobe Dataset.

| Category | Method | F1 | IoU | Precision | Recall |
|---|---|---|---|---|---|
| Backbones | D-LinkNet | 0.675 | 0.542 | 0.925 | 0.565 |
| Backbones | Segformer | 0.678 | 0.531 | 0.881 | 0.579 |
| Compared Methods | RCFSNet | 0.691 | 0.548 | 0.863 | 0.610 |
| Compared Methods | UGD-DLinkNet | 0.635 | 0.482 | 0.854 | 0.527 |
| Compared Methods | D-LinkNet + GCE | 0.671 | 0.527 | 0.928 | 0.549 |
| Compared Methods | D-LinkNet + RLS | 0.706 | 0.562 | 0.854 | 0.626 |
| Ours | D-LinkNet + PLS | **0.799** | **0.675** | 0.892 | 0.736 |
| Ours | Segformer + PLS | 0.754 | 0.613 | 0.870 | 0.675 |

Note: Bold values indicate the optimal IoU and F1 scores.
Table 4. Quantitative Performance Comparison of Road Extraction Methods on CH4P Dataset.

| Category | Method | F1 | IoU | Precision | Recall |
|---|---|---|---|---|---|
| Backbones | D-LinkNet | 0.617 | 0.469 | 0.822 | 0.539 |
| Backbones | Segformer | 0.598 | 0.434 | 0.790 | 0.498 |
| Compared Methods | RCFSNet | 0.520 | 0.357 | 0.503 | 0.569 |
| Compared Methods | UGD-DLinkNet | 0.610 | 0.438 | 0.517 | 0.742 |
| Compared Methods | D-LinkNet + GCE | 0.422 | 0.315 | 0.757 | 0.361 |
| Compared Methods | D-LinkNet + RLS | 0.603 | 0.444 | 0.756 | 0.525 |
| Ours | D-LinkNet + PLS | **0.721** | **0.573** | 0.727 | 0.746 |
| Ours | Segformer + PLS | 0.681 | 0.520 | 0.695 | 0.679 |

Note: Bold values indicate the optimal IoU and F1 scores.
Table 5. Cross-dataset Generalization Evaluation (trained on CH4P Dataset, tested on DG-mini-test Dataset).

| Category | Method | F1 | IoU | Precision | Recall |
|---|---|---|---|---|---|
| Backbones | D-LinkNet | 0.323 | 0.212 | 0.925 | 0.214 |
| Backbones | Segformer | 0.557 | 0.398 | 0.697 | 0.510 |
| Compared Methods | RCFSNet | 0.349 | 0.244 | 0.613 | 0.309 |
| Compared Methods | UGD-DLinkNet | 0.445 | 0.286 | 0.317 | 0.750 |
| Compared Methods | D-LinkNet + GCE | 0.317 | 0.212 | 0.841 | 0.224 |
| Compared Methods | D-LinkNet + RLS | 0.363 | 0.246 | 0.812 | 0.265 |
| Ours | D-LinkNet + PLS | **0.641** | **0.477** | 0.761 | 0.599 |
| Ours | Segformer + PLS | 0.616 | 0.453 | 0.520 | 0.805 |

Note: Bold values indicate the optimal IoU and F1 scores.
Table 6. Cross-dataset Generalization Evaluation (trained on DeepGlobe dataset, tested on CH4P-mini-test dataset).

| Category | Method | F1 | IoU | Precision | Recall |
|---|---|---|---|---|---|
| Backbones | D-LinkNet | 0.694 | 0.543 | 0.830 | 0.621 |
| Backbones | Segformer | 0.669 | 0.512 | 0.763 | 0.620 |
| Compared Methods | RCFSNet | 0.643 | 0.484 | 0.780 | 0.572 |
| Compared Methods | UGD-DLinkNet | 0.674 | 0.508 | 0.612 | 0.750 |
| Compared Methods | D-LinkNet + GCE | 0.626 | 0.417 | 0.840 | 0.827 |
| Compared Methods | D-LinkNet + RLS | 0.633 | 0.471 | 0.657 | 0.643 |
| Ours | D-LinkNet + PLS | **0.703** | **0.550** | 0.741 | 0.689 |
| Ours | Segformer + PLS | 0.684 | 0.528 | 0.687 | 0.705 |

Note: Bold values indicate the optimal IoU and F1 scores.
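For readers who wish to reproduce the four metrics reported in Tables 1–6, the following minimal Python sketch computes them from binary masks. It assumes the standard pixel-wise definitions accumulated over a single pair of masks; the averaging scheme used for the reported values (per image or over the whole test set) is not stated in this section, so the sketch should not be read as the exact evaluation protocol. The function names `confusion_counts` and `road_metrics` are hypothetical.

```python
def confusion_counts(pred_mask, true_mask):
    """Count true positives, false positives and false negatives over all pixels."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred_mask, true_mask):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1   # predicted road, labelled road
            elif p:
                fp += 1   # predicted road, labelled background
            elif t:
                fn += 1   # missed road pixel
    return tp, fp, fn


def road_metrics(tp, fp, fn):
    """Precision, recall, F1 and IoU from pixel counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    iou = tp / (tp + fp + fn) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy 2 x 3 masks: 3 true positives, 1 false positive, 2 false negatives.
pred_mask = [[1, 1, 0],
             [1, 1, 0]]
true_mask = [[1, 1, 1],
             [0, 1, 1]]
scores = road_metrics(*confusion_counts(pred_mask, true_mask))
```

Precision falls with every false positive and recall with every missed road pixel, so the two usually move in opposite directions. This trade-off is visible in Table 1, where enlarging S from 32 to 1024 raises precision on DG-mini-val from 0.192 to 0.928 while recall falls from 0.896 to 0.564.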
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

He, H.; Wang, S.; Huang, L.; Fan, X.; Li, Y.; Yang, D. Positive-Guided Local Supervision for Robust Road Extraction from Remote Sensing Imagery. Remote Sens. 2026, 18, 1589. https://doi.org/10.3390/rs18101589
