Article

Precise Mapping of Linear Shelterbelt Forests in Agricultural Landscapes: A Deep Learning Benchmarking Study

1 College of Life Science and Technology, Tarim University, Alar 843300, China
2 Xinjiang Production & Construction Corps Key Laboratory of Protection and Utilization of Biological Resources in Tarim Basin, Alar 843300, China
3 Institute of Cold Regions Science and Engineering, Northeast Forestry University, Harbin 150040, China
4 College of Horticulture and Forestry, Tarim University, Alar 843300, China
* Author to whom correspondence should be addressed.
Forests 2026, 17(1), 91; https://doi.org/10.3390/f17010091
Submission received: 29 November 2025 / Revised: 26 December 2025 / Accepted: 7 January 2026 / Published: 9 January 2026

Abstract

Farmland shelterbelts are crucial for safeguarding agricultural ecological security and sustainable development, and their precise extraction is vital for regional ecological monitoring and precision agriculture management. However, because shelterbelts are narrow and linear, are embedded in complex farmland backgrounds, and are spectrally confused with surrounding vegetation, traditional remote sensing methods face significant challenges in accuracy and generalization. In this study, six representative deep learning semantic segmentation models (U-Net, Attention U-Net (AttU_Net), ResU-Net, U2-Net, SwinUNet, and TransUNet) were systematically evaluated for farmland shelterbelt extraction using high-resolution Gaofen-6 imagery. Model performance was assessed through four-fold cross-validation and validation on an independent test set. Convolutional neural network (CNN)-based models generally outperformed Transformer-based architectures; on the independent test set, the best-performing CNN model (U-Net) achieved a Dice Similarity Coefficient (DSC) of 91.45%, while the lowest DSC (88.86%) was obtained by the Transformer-based TransUNet model. Among the evaluated models, U-Net offered a favorable balance of accuracy, stability, and computational efficiency. The trained U-Net was applied to large-scale farmland shelterbelt mapping in the study area (Alar City, Xinjiang), achieving a belt-level visual accuracy of 95.58% based on 385 manually interpreted samples. Qualitative demonstrations in Aksu City and Shaya County illustrated the model's transferability. This study provides empirical guidance for model selection in high-resolution agricultural remote sensing and offers a feasible technical solution for large-scale, precise farmland shelterbelt extraction.

1. Introduction

Farmland shelterbelts are key components of farmland ecosystems and play a crucial role in safeguarding global food security, ecological balance, and sustainable agricultural development [1]. They reduce wind speed, prevent soil and water erosion, reduce transpiration, block dust storms, and regulate local microclimates [2,3], thereby helping to stabilize crop yields and mitigate the impacts of extreme weather events [4,5]. In arid and semi-arid regions, shelterbelts additionally serve vital functions in ecological stabilization, land restoration, and combating desertification [6,7,8,9]. However, against the backdrop of global warming and intensifying anthropogenic disturbance, farmland shelterbelts face degradation and fragmentation [10], which weaken their protective functions and threaten farmland ecological stability and agricultural sustainability [11]. Timely information on the distribution, structure, health status, and change trends of farmland shelterbelts has therefore become a pressing need in ecological conservation and agricultural management [12,13], and large-scale, high-precision extraction of shelterbelts is a critical step toward meeting this need.
Farmland shelterbelts typically exhibit narrow, linear distributions embedded within highly heterogeneous farmland landscapes that undergo pronounced seasonal variations, posing significant challenges to their automatic identification and precise extraction [14,15]. In complex farmland environments with mixed vegetation, existing methods often misclassify shelterbelts because of spectral confusion, limiting extraction accuracy and completeness [16]. Although field surveys yield reliable data, their low efficiency makes them unsuitable for large-scale monitoring. Multi-source remote sensing offers greater efficiency [17], and visual interpretation of remote sensing imagery achieves high accuracy [18], but visual interpretation is likewise constrained by labor costs, hindering rapid large-area monitoring. Current mainstream approaches combine high-resolution imagery with traditional machine learning algorithms (e.g., random forest, support vector machines, and classification and regression trees (CART)) [19,20] and perform well during periods of sparse vegetation (e.g., April–May) [21]; however, their accuracy declines notably in the lush growing season, and their reliance on handcrafted feature engineering further limits their applicability across diverse regional scenarios.
Deep learning, particularly convolutional neural network (CNN)-based semantic segmentation, has significantly advanced the extraction of complex features from remote sensing imagery [22]. Unlike classification-based approaches that assign labels at the image or object level, semantic segmentation enables direct pixel-level prediction, allowing precise delineation of target boundaries and preservation of fine structural details. This characteristic is especially critical for farmland shelterbelts, which are typically narrow, linear features embedded in heterogeneous agricultural landscapes. CNN-based models inherently encode strong inductive biases—such as locality, translation invariance, and hierarchical receptive field expansion—that are well suited for capturing fine-scale linear structures and local boundary cues. Accordingly, a series of improved CNN architectures, including U-Net, Attention U-Net (AttU_Net), ResU-Net, and U2-Net, have been widely applied to remote sensing tasks such as building extraction, road detection, land cover mapping, and vegetation identification, demonstrating strong performance [23,24,25,26].
In recent years, Transformer architectures have also been introduced into remote sensing semantic segmentation [27], with derivative models like SwinUNet and TransUNet excelling in high-resolution land cover classification and feature segmentation [28,29]. However, farmland shelterbelts represent a distinct category of targets characterized by narrow widths, elongated geometries, and strong local boundary cues, with target widths often spanning only a few pixels in high-resolution imagery. Accurate extraction of such features therefore relies heavily on precise local feature modeling and boundary preservation. Under sample-limited conditions, the global context modeling emphasized by Transformer architectures and their reliance on patch-based tokenization may dilute critical local structural information, potentially leading to fragmented or missed detections of narrow linear features.
Although both CNN-based and Transformer-based models have made notable progress in general remote sensing segmentation tasks, challenges remain in high-resolution agricultural remote sensing applications due to scarce labeled samples and complex background interference [30]. For the extraction of farmland shelterbelts in high-interference environments, existing studies lack systematic comparisons of the performance, stability, and applicability of different models. Therefore, this study selects six representative models—U-Net, AttU_Net, ResU-Net, U2-Net, SwinUNet, and TransUNet—for systematic evaluation in sample-limited, complex farmland scenarios. This study aims to provide empirical evidence for high-precision extraction of farmland shelterbelts, thereby supporting practical applications in precision agriculture and ecological monitoring.

2. Materials

2.1. Study Area Overview

The study area is located in Alar City, Xinjiang Uygur Autonomous Region, China (40°22′–40°57′ N, 80°30′–81°58′ E), as shown in Figure 1c. It lies in the oasis–desert transition zone along the northern edge of the Taklamakan Desert and exhibits typical arid-zone environmental characteristics [31]. The area is a key region of the national Three-North Shelterbelt Program and an irrigated agricultural belt. Owing to its geographical position, it performs multiple ecological functions, including windbreak and sand fixation (reducing wind speeds by 30%–50%) and soil and water conservation, and it acts as a vital ecological barrier at the forefront of regional wind-sand hazards [32].
Alar City features a warm-temperate, extremely continental arid desert climate, with annual sunshine duration ranging from 2556.3 to 2991.8 h, precipitation of 40.1 to 82.5 mm, and evaporation rates as high as 1876.6 to 2558.9 mm, resulting in water scarcity and ecological fragility that pose formidable challenges to agricultural irrigation and ecological restoration [33]. The study focuses on the extraction of arboreal farmland shelterbelts, with primary species including Xinjiang poplar (Populus alba var. pyramidalis) and Euphrates poplar (Populus euphratica) [34]; shelterbelt widths range from 4 to 30 m.

2.2. Remote Sensing Data Sources

This study employs panchromatic multispectral (PMS) data from the Gaofen-6 (GF-6) satellite (source: China Centre for Resources Satellite Data and Application) [35]. The GF-6 PMS provides 2 m panchromatic and 8 m multispectral imagery with a swath width of approximately 90 km, covering red, green, blue, and near-infrared (NIR) bands [36]. The high revisit frequency of GF-6 (every 4 days) facilitates the acquisition of cloud-free imagery (cloud cover ≤ 15%) during key phenological periods.
Although the NIR band is widely recognized as important for woody vegetation, this study intentionally restricted model inputs to red, green, and blue (RGB) bands to enhance dataset generality and methodological simplicity. High-resolution RGB imagery is more widely available across sensors and regions and facilitates reliable visual interpretation and pixel-level annotation. Given that farmland shelterbelts are characterized by narrow, linear structures with clear boundary contrasts, their spatial and geometric features can be effectively captured using high-resolution RGB data alone. The multispectral capability of GF-6, including the NIR band, provides opportunities for future extensions.

2.3. Deep Learning Model Architectures

To systematically evaluate the potential of deep learning techniques for extracting farmland shelterbelts in complex farmland environments, this study selects six representative semantic segmentation models (all code was obtained from open GitHub repositories), spanning mainstream architectures from convolutional neural networks (CNNs) to Transformer-based models. The exact source code and key configuration settings of all models are provided in Appendix A.
U-Net employs an encoder–decoder structure with skip connections that fuse multi-level features [37]. AttU_Net incorporates an additive attention mechanism to adaptively focus on shelterbelt regions while suppressing background noise [38]. ResU-Net integrates residual learning units to mitigate degradation in deep networks and enhance discriminative feature extraction [39]. U2-Net adopts a nested U-shaped structure and a deep supervision strategy to capture multi-scale shelterbelt information [24]. SwinUNet uses shifted-window self-attention to exploit contextual information for discriminating spectrally similar ground objects [40]. TransUNet combines the local perception of CNNs with the global modeling capability of Transformers and is designed for collaborative local–global representation in high-interference environments [41]. Together, these models span a range of architectural designs and recent advances, providing a basis for comprehensive performance comparisons under limited-sample conditions.

3. Methods

3.1. Remote Sensing Data Preprocessing

This study implements a systematic preprocessing workflow, performed entirely in ENVI (Environment for Visualizing Images; v5.6; L3Harris Geospatial, Boulder, CO, USA), that sequentially applies radiometric calibration, atmospheric correction, orthorectification, and image fusion to obtain high-precision surface reflectance products, as illustrated in Figure 2.
The Radiometric Calibration tool in ENVI was used to convert raw digital number (DN) values to at-sensor radiance based on the absolute calibration coefficients provided by the China Centre for Resources Satellite Data and Application, ensuring accuracy in subsequent inversions. Atmospheric correction was performed in ENVI using the FLAASH module, with the atmospheric model set to Sub-Arctic Summer and the aerosol model set to Rural. Aerosol retrieval was disabled (None), water vapor retrieval was not performed, and the initial visibility was set to 40 km. After radiometric calibration and atmospheric correction, orthorectification was conducted using the RPC Orthorectification tool in ENVI, supported by the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) digital elevation model at a spatial resolution of 30 arc-seconds. Finally, image fusion was carried out using the Gram–Schmidt pan-sharpening method. The panchromatic image (after radiometric calibration and orthorectification) and the multispectral image (after radiometric calibration, atmospheric correction, and orthorectification) were fused, combining the spatial detail of the 2 m panchromatic data with the spectral information of the 8 m multispectral data to generate multispectral imagery at 2 m spatial resolution. All fusion parameters were kept at their default settings, including bilinear interpolation for resampling.
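Radiometric calibration of this kind is usually a per-band linear transform from DN to radiance. The sketch below assumes the form L = gain × DN + offset; the function name and the coefficient values in the usage line are hypothetical placeholders, not the actual GF-6 coefficients published by the China Centre for Resources Satellite Data and Application. In this study the conversion itself was performed by ENVI's Radiometric Calibration tool.

```python
def dn_to_radiance(dn_values, gain, offset):
    """Convert raw digital numbers (DN) of one band to at-sensor radiance.

    Assumes a linear absolute calibration, L = gain * DN + offset, with the
    band-specific gain and offset taken from the sensor's calibration data.
    """
    return [gain * dn + offset for dn in dn_values]


# Hypothetical coefficients for a single band, for illustration only.
radiance = dn_to_radiance([0, 100, 1023], gain=0.05, offset=0.2)
```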

3.2. Dataset Construction

Dataset construction was designed to address the challenge of extracting farmland shelterbelts under high vegetation interference during the peak growing season. To this end, three high-resolution Gaofen-6 images acquired in June, July, and August were selected, corresponding to the lush growth period of northern crops, when spectral similarities among crops, orchards, and shelterbelts are most pronounced. This setting represents the most challenging discrimination conditions for shelterbelt extraction and thus provides a rigorous testbed for model robustness.
From each image scene, 20 image blocks of 512 × 512 pixels were strategically cropped, yielding a total of 60 blocks. These blocks were treated as high-quality source imagery rather than direct training samples. Instead of purely random sampling, block locations were manually guided to ensure representativeness and diversity of challenging scenarios. This sampling strategy intentionally prioritizes typical yet difficult cases over sheer sample quantity, which is particularly important for evaluating model performance under sample-limited conditions. Specifically, the selected blocks were required to contain representative farmland shelterbelt structures, including complete belts, belt edges, and structurally complex or fragmented segments. In addition, emphasis was placed on areas with strong background interferences, such as adjacent croplands, orchards, irrigation channels, and natural vegetation, to enhance the models’ ability to distinguish shelterbelts from spectrally and morphologically similar objects.
Each 512 × 512 block was further subdivided using a sliding-window strategy with a window size of 256 × 256 pixels and a stride of 256 pixels, generating 240 non-overlapping patch-level samples that constituted the actual inputs for model training and validation. All image patches were subjected to pixel-level annotation following a unified definition of farmland shelterbelts. Annotations were performed independently by two trained annotators.
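The patch count follows directly from the tiling arithmetic: with a 256-pixel window and a 256-pixel stride, each 512 × 512 block yields 2 × 2 non-overlapping windows. A minimal sketch (the function names are illustrative, not taken from the authors' code) that enumerates the window corners and reproduces the total of 240 patches:

```python
def window_offsets(size, window=256, stride=256):
    """Start positions of a sliding window along one axis (windows fully inside)."""
    return list(range(0, size - window + 1, stride))


def tile_block(height, width, window=256, stride=256):
    """Top-left (row, col) corners of every window in a height x width block."""
    return [
        (r, c)
        for r in window_offsets(height, window, stride)
        for c in window_offsets(width, window, stride)
    ]


corners = tile_block(512, 512)  # [(0, 0), (0, 256), (256, 0), (256, 256)]
n_scenes, blocks_per_scene = 3, 20
total_patches = n_scenes * blocks_per_scene * len(corners)  # 3 * 20 * 4 = 240
```

Setting the stride equal to the window size is what makes the patches non-overlapping; a smaller stride (e.g., 128) would produce nine overlapping windows per block instead of four.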
To quantify annotation consistency, inter-rater agreement was evaluated at the pixel level. Specifically, Cohen’s κ coefficient was computed for the binary classification task (shelterbelt vs. non-shelterbelt). Across all 240 patches, the two annotators achieved a Cohen’s κ of 0.9114 (95% CI: [0.9062, 0.9166]), indicating high agreement. In addition, the mean Dice coefficient between the two annotation masks was 0.9118, reflecting strong spatial overlap. Pixel-level disagreements accounted for approximately 0.95% of all annotated pixels. All annotation discrepancies were subsequently resolved through a consensus review conducted by a single senior researcher, whose adjudicated annotations were used as the final ground-truth labels for model training and validation.
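Cohen's κ compares the observed pixel agreement with the agreement expected by chance from each annotator's class proportions. The sketch below computes κ, the Dice coefficient, and the disagreement rate for one pair of flattened binary masks. It is a hypothetical helper: pooling all pixels this way differs slightly from the per-patch mean Dice reported above, and the confidence interval is not reproduced here.

```python
def annotator_agreement(mask_a, mask_b):
    """Pixel-level agreement between two flattened binary masks (1 = shelterbelt)."""
    n = len(mask_a)
    both_pos = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    both_neg = sum(1 for a, b in zip(mask_a, mask_b) if a == 0 and b == 0)
    pos_a, pos_b = sum(mask_a), sum(mask_b)

    p_observed = (both_pos + both_neg) / n
    # Chance agreement: both label a pixel "shelterbelt" or both "background" independently.
    p_chance = (pos_a / n) * (pos_b / n) + ((n - pos_a) / n) * ((n - pos_b) / n)
    kappa = 1.0 if p_chance == 1 else (p_observed - p_chance) / (1 - p_chance)
    dice = 2 * both_pos / (pos_a + pos_b) if (pos_a + pos_b) else 1.0
    return {"kappa": kappa, "dice": dice, "disagreement": 1 - p_observed}


a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
agreement = annotator_agreement(a, b)  # kappa = 0.22 / 0.42 ≈ 0.524, dice ≈ 0.667
```

In this toy example the annotators agree on 80% of pixels, but because 58% agreement is expected by chance alone, κ is only about 0.52; this is why κ is a stricter summary than raw percent agreement.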
To further improve model generalization and mitigate overfitting, online data augmentation was applied during training, including random horizontal or vertical flipping (p = 0.5) and random rotation within ±45° to simulate diverse shelterbelt orientations. All augmentation operations were performed in memory with a fixed global random seed (seed = 42) to ensure experimental reproducibility. Because augmentation was applied on the fly, it did not enlarge the stored dataset but increased the effective diversity of the training samples; in theory, this corresponds to approximately 1200 samples, or about five variants per patch. Given the relatively consistent structural characteristics of linear shelterbelt targets, this patch-based learning strategy combined with data augmentation is expected to provide sufficient training support for robust model learning at the given data scale.
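The essential requirement of online augmentation for segmentation is that an image and its label mask receive exactly the same random transform. The pure-Python sketch below assumes that "horizontal or vertical flipping (p = 0.5)" means a flip is applied with probability 0.5 and its direction is then chosen at random; the helper names are hypothetical. For simplicity it uses nearest-neighbour resampling for both arrays, whereas a production pipeline would usually interpolate the image bilinearly and keep nearest-neighbour only for the mask.

```python
import math
import random


def hflip(grid):
    return [row[::-1] for row in grid]


def vflip(grid):
    return grid[::-1]


def rotate_nearest(grid, angle_deg, fill=0):
    """Rotate a 2-D list about its centre using inverse mapping and nearest neighbour."""
    h, w = len(grid), len(grid[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            # Map each output pixel back to its source location.
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = grid[iy][ix]
    return out


def augment_pair(image, mask, rng, p_flip=0.5, max_angle=45.0):
    """Apply one random flip/rotation, drawn once, identically to image and mask."""
    if rng.random() < p_flip:
        flip = hflip if rng.random() < 0.5 else vflip
        image, mask = flip(image), flip(mask)
    angle = rng.uniform(-max_angle, max_angle)
    return rotate_nearest(image, angle), rotate_nearest(mask, angle)


rng = random.Random(42)  # fixed seed, mirroring seed = 42 in the text
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = [[1 if c == 1 else 0 for c in range(4)] for r in range(4)]
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Drawing the random parameters once and reusing them for both arrays is what keeps the labels aligned with the pixels they describe.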

3.3. Experimental Environment and Parameter Settings

To ensure experimental reproducibility and fair comparisons, training and evaluation are conducted under a unified computing environment and hyperparameter configuration. All experiments are performed on a computer equipped with an Intel Core i5-13600KF CPU and NVIDIA GeForce RTX 4070 Super GPU (12 GB VRAM), using a software environment of Python (v3.9.23), PyTorch (v2.6.0), and CUDA (v12.4; NVIDIA, Santa Clara, CA, USA). On this basis, this study adopts a unified hyperparameter configuration (Table 1), guided by the principle of avoiding model-specific optimizations to ensure that performance differences are attributable to the architectures themselves rather than tuning strategies.
The input image size was fixed at 256 × 256 pixels, and the batch size was set to 16 as a trade-off between GPU memory constraints and training stability. All models were trained for 200 epochs to ensure sufficient convergence under the unified training protocol. The AdamW optimizer was used in all experiments, with an initial learning rate of 0.001 and a weight decay of 0.01. This setting follows standard practice in semantic segmentation, where decoupled weight decay has been shown to provide stable optimization and improved generalization [42,43]. The learning rate followed a cosine annealing schedule whose period matched the total number of training epochs, so that it decayed smoothly to a minimum of 10⁻⁵ [44]. The loss function combined binary cross-entropy and Dice loss to optimize pixel-level classification accuracy while alleviating foreground–background class imbalance.
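With cosine annealing, the learning rate at epoch t follows η_t = η_min + ½(η_max − η_min)(1 + cos(πt/T)). The short sketch below evaluates this closed form with the settings reported here (η_max = 10⁻³, η_min = 10⁻⁵, T = 200 epochs); it assumes the schedule is stepped once per epoch, which the text does not state explicitly.

```python
import math


def cosine_lr(epoch, total_epochs=200, lr_max=1e-3, lr_min=1e-5):
    """Closed-form cosine annealing from lr_max at epoch 0 to lr_min at total_epochs."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))


schedule = [cosine_lr(e) for e in range(201)]
# schedule[0] == 1e-3, schedule[100] ≈ 5.05e-4 (the midpoint), schedule[200] == 1e-5
```

In PyTorch, `torch.optim.AdamW(params, lr=1e-3, weight_decay=0.01)` combined with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=1e-5)` and one `scheduler.step()` per epoch traces the same curve; whether the authors configured it exactly this way is an assumption.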

3.4. Loss Function

The semantic segmentation task for farmland shelterbelts requires distinguishing shelterbelt pixels from background regions at the pixel level. In practice, however, it faces an imbalanced data distribution: background pixels (e.g., crops, soil) dominate the image, while shelterbelt pixels are scarce and form narrow, elongated patterns. Such data readily cause models to overemphasize the background class during training, resulting in insufficient recognition of shelterbelts and ultimately compromising segmentation performance. To address this issue, this study uses a compound loss function (Equation (1)) that combines binary cross-entropy loss (Equation (2)) and Dice loss (Equation (3)), each with a weight of 1 [45,46].
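$$\mathcal{L} = w_b\,\mathcal{L}_{\mathrm{BCE}} + w_d\,\mathcal{L}_{\mathrm{Dice}} \tag{1}$$

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right] \tag{2}$$

$$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_{i=1}^{N} y_i p_i + \epsilon}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} p_i + \epsilon} \tag{3}$$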
Here, $w_b$ and $w_d$ denote the weights of the BCE loss and the Dice loss, respectively; $N$ denotes the total number of pixels; $y_i \in \{0, 1\}$ is the ground-truth label of pixel $i$ (1 for shelterbelt, 0 for background); $p_i \in [0, 1]$ is the model's predicted shelterbelt probability for pixel $i$; and $\epsilon$ is a small smoothing constant that prevents division by zero and keeps the loss numerically stable during training. In all experiments, we set $w_b = w_d = 1$ and $\epsilon = 1 \times 10^{-5}$.
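The combined BCE + Dice loss defined above can be checked numerically with a small pure-Python sketch over flattened pixel lists. The function name is hypothetical, and the probability clamp (`clamp`) used to avoid log(0) is an added assumption that is not part of the equations; ϵ and the unit weights follow the values stated above.

```python
import math


def bce_dice_loss(y_true, y_prob, w_b=1.0, w_d=1.0, eps=1e-5, clamp=1e-7):
    """Weighted sum of mean binary cross-entropy and soft Dice loss over N pixels."""
    n = len(y_true)
    bce = 0.0
    for y, p in zip(y_true, y_prob):
        pc = min(max(p, clamp), 1.0 - clamp)  # keep log() finite for p = 0 or 1
        bce -= y * math.log(pc) + (1 - y) * math.log(1.0 - pc)
    bce /= n
    intersection = sum(y * p for y, p in zip(y_true, y_prob))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(y_true) + sum(y_prob) + eps)
    return w_b * bce + w_d * dice


labels = [1, 0, 0, 1]
good = bce_dice_loss(labels, [1.0, 0.0, 0.0, 1.0])  # close to 0
bad = bce_dice_loss(labels, [0.0, 1.0, 1.0, 0.0])   # large
uncertain = bce_dice_loss([1, 0], [0.5, 0.5])       # log(2) + Dice term of about 0.5
```

The Dice term depends only on the overlap with the foreground, so it keeps a useful gradient signal even when shelterbelt pixels are a small fraction of the tile. In a PyTorch training loop, the BCE term would more commonly be computed with `torch.nn.BCEWithLogitsLoss` on raw logits, which is numerically more stable than applying a sigmoid followed by a logarithm.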

3.5. Evaluation Metrics

Model performance is evaluated using the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), Precision, and Sensitivity (Recall). These metrics are derived from the four fundamental elements of the confusion matrix: true positives (TP: correctly identified shelterbelt pixels), false positives (FP: background pixels misclassified as shelterbelts), true negatives (TN: correctly identified background pixels), and false negatives (FN: missed shelterbelt pixels).
$$\mathrm{DSC} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{4}$$

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{7}$$
Because the research focuses on accurately identifying farmland shelterbelts as the foreground target, overall pixel accuracy, Pixel Accuracy = (TP + TN)/(TP + TN + FP + FN), is not adopted as the primary evaluation metric. In farmland images, background pixels vastly outnumber shelterbelt pixels; thus, even a model with weak shelterbelt recognition can obtain an inflated overall accuracy from correctly classified background, which masks its true target recognition performance and fails to reflect segmentation quality [47,48]. DSC excludes true negatives and equals the harmonic mean of Precision and Sensitivity, so it directly reflects the overlap between predicted and reference shelterbelt pixels; this makes it better suited than pixel accuracy when foreground targets such as shelterbelts occupy a small proportion of the image. Therefore, this study designates DSC as the core evaluation metric for assessing overall consistency between predictions and ground-truth annotations.
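A short worked example makes the pixel-accuracy problem concrete. The sketch below (a hypothetical helper, with made-up pixel counts for a 100 × 100 tile containing 500 shelterbelt pixels) computes the metrics defined above from confusion-matrix counts.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Foreground-oriented metrics from confusion-matrix pixel counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # define 0/0 as 0 for an empty prediction

    return {
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "pixel_accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# A model that predicts "background" everywhere still reaches 95% pixel accuracy.
empty = segmentation_metrics(tp=0, fp=0, fn=500, tn=9500)      # dsc = 0.0, pixel_accuracy = 0.95
# A model that recovers most of the belt.
partial = segmentation_metrics(tp=400, fp=50, fn=100, tn=9450)  # dsc ≈ 0.842, iou ≈ 0.727
```

The empty prediction scores 95% pixel accuracy but a DSC of zero, which is exactly the masking effect described above. For any prediction, IoU and DSC are related monotonically by IoU = DSC / (2 − DSC), so they always rank models identically.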

3.6. Data Partitioning and Validation Strategy

To systematically evaluate model architectures and obtain unbiased performance estimates, this study adopts a three-stage validation strategy with strict data isolation to prevent information leakage (Figure 3).
First, an independent test set was created by stratified random sampling by acquisition month (June, July, and August), so that it retained a proportional representation of different phenological conditions and vegetation interference levels. The remaining samples formed the training/validation pool.
Stage one: Four-fold cross-validation was conducted exclusively within the training/validation pool to quantify each architecture's sensitivity to data partitioning and to assess robustness, not to tune hyperparameters. Cross-validation was performed at the patch level using the 256 × 256 samples. Because the patches were generated with a stride equal to the window size, they did not overlap, and each patch was assigned to a single fold. Therefore, no pixel- or patch-level overlap existed between folds, preventing information leakage during cross-validation.
Stage two: Based on the cross-validation results, a final model was retrained for each architecture using the full training/validation pool, which was randomly split into 80% for training and 20% for internal validation. Training ran for 200 epochs, and the weights from the epoch with the highest validation DSC were selected as the optimal weights for each architecture.
Stage three: The final models were evaluated once, in a blind manner, on the previously isolated independent test set to report their generalization performance. The independent test set was not used in any stage of training, cross-validation, validation, or model selection. This workflow enforces a strict sequence of isolation first, training second, and testing last to block information leakage and selection bias, thereby maximizing the fairness of model comparisons and the reliability of conclusions.
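The partitioning logic can be sketched as follows. The test fraction of 20% per month is an assumption chosen to be consistent with the 48 test patches reported in Section 4.2 (16 per month); the helper names, the month labels, and the unstratified shuffling used to form the folds are likewise illustrative rather than the authors' code.

```python
import random
from collections import defaultdict


def stratified_test_split(patch_ids, months, test_frac=0.2, seed=42):
    """Hold out test_frac of the patches from each acquisition month."""
    by_month = defaultdict(list)
    for pid, month in zip(patch_ids, months):
        by_month[month].append(pid)
    rng = random.Random(seed)
    test, pool = [], []
    for month in sorted(by_month):
        ids = list(by_month[month])
        rng.shuffle(ids)
        k = round(len(ids) * test_frac)
        test.extend(ids[:k])
        pool.extend(ids[k:])
    return test, pool


def assign_folds(pool, k=4, seed=42):
    """Shuffle the pool and deal each patch into exactly one of k folds."""
    ids = list(pool)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


# 240 patches: 20 blocks x 4 patches from each of the June, July, and August scenes.
patch_ids = list(range(240))
months = ["Jun"] * 80 + ["Jul"] * 80 + ["Aug"] * 80
test_ids, pool_ids = stratified_test_split(patch_ids, months)
folds = assign_folds(pool_ids)  # 4 folds of 48 patches each
```

Splitting off the test set before any fold assignment is what guarantees that no test patch can influence cross-validation or model selection.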

4. Results

4.1. Cross-Validation Assessment of Model Generalization and Stability

To systematically evaluate the generalization capability and stability of deep learning models for farmland shelterbelt extraction, four-fold cross-validation was performed on the six models. This approach makes maximal use of limited data and assesses stability through performance fluctuations across data subsets. Quantitative cross-validation results are presented in Table 2. All models performed well, with average DSC values exceeding 87%, indicating the substantial potential of deep learning in this domain. However, clear performance differences emerged among the models.
CNN-based models (U-Net, AttU_Net, ResU-Net, U2-Net) demonstrated overall superior performance compared to Transformer-based architectures (SwinUNet, TransUNet). The top four models in average DSC were all CNN-based, with scores between 90.16% and 91.03%. In contrast, Transformer-based models performed relatively worse, with SwinUNet achieving the lowest average DSC (87.34%). This preliminarily indicates that, for the current task and data scale, the inductive biases of CNNs (e.g., translation invariance, locality) may offer greater advantages than the global attention mechanisms of Transformers.
Among the high-performing CNN models, U2-Net achieved the highest average DSC (91.03%) and Sensitivity (91.51%), indicating strong capability in identifying shelterbelt pixels. U-Net's average DSC (90.93%) was close to that of U2-Net, and U-Net also achieved the highest average Precision (92.06%), meaning that the smallest share of its predicted shelterbelt pixels were false positives. AttU_Net, incorporating attention mechanisms, and ResU-Net, with residual connections, also yielded competitive results, with average DSC values of 90.89% and 90.16%, respectively.
U2-Net and U-Net formed the performance ceiling of this test, with their average DSC values being extremely close (91.03% vs. 90.93%). However, while maintaining top-tier accuracy, U2-Net sacrificed stability, exhibiting the highest standard deviations in DSC and IoU (±1.50%, ±2.59%) among all models.
A fair stability comparison requires evaluating models at comparable performance levels, and Figure 4 provides visual evidence for this. In the high-performance region (DSC > 89.1%), U-Net's data points were more tightly concentrated, indicating that it delivers more consistent outputs while sustaining top-tier performance. In contrast, U2-Net's data points spanned a greater vertical range in this region, corroborating its higher variability.
ResU-Net had the lowest DSC standard deviation (±1.26%), reflecting highly consistent outputs. However, this stability came with an average DSC (90.16%) roughly 0.8–0.9 percentage points lower than those of U-Net and U2-Net. In other words, ResU-Net performs "stably well," whereas U-Net achieves "stably excellent" performance. For accuracy-critical applications, U-Net offers the best balance of stability and accuracy within the top performance tier (U-Net, U2-Net, AttU_Net).

4.2. Independent Test Set Validation of Final Model Performance

Cross-validation revealed the intrinsic characteristics of the models, while a fully independent test set provided final, unbiased performance estimates. This section first confirms that all models converged properly in the final training phase, then reports their performance on the independent test set, and finally assesses the differences with statistical tests. Figure 5 illustrates the loss curves of each model during training.
Although some models (e.g., TransUNet, SwinUNet) exhibited relatively high validation losses, their training losses still decreased, indicating that they continued to learn from the training data. Since this study focuses on the potential performance of model architectures rather than maximal optimization, models without severe overfitting or training failure are deemed valid, and their loss curves serve as a basis for subsequent performance comparisons. Under the unified experimental framework, the convergence and stability of the loss curves suffice to support quantitative comparisons and analyses of architectural potential without model-specific fine-tuning.
Based on confirmed model convergence, performance on the independent test set is shown in Table 3.
The results show that U-Net achieved the best overall performance, with the highest DSC (91.45%) and IoU (84.68%). U2-Net's slight average advantage in cross-validation did not translate into leadership on the independent test set. This is consistent with the stability analysis in Section 4.1: U-Net's lower performance variability appears to carry over into better results on unseen data than those of the "high-potential but highly variable" U2-Net.
Table 4 summarizes the statistical significance of pairwise performance differences among the six models on the final test set (n = 48). Depending on the normality of paired differences assessed by the Shapiro–Wilk test, either a paired-sample t-test or a Wilcoxon signed-rank test was applied.
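The test-selection rule described above can be sketched with SciPy's `scipy.stats.shapiro`, `scipy.stats.ttest_rel`, and `scipy.stats.wilcoxon`. The helper name, the α = 0.05 threshold for the normality check, and the synthetic per-patch scores are assumptions for illustration; the text does not state the normality threshold or whether a multiple-comparison correction was applied.

```python
import random

from scipy import stats


def paired_comparison(scores_a, scores_b, alpha=0.05):
    """Pick a paired t-test or Wilcoxon signed-rank test via Shapiro-Wilk on the differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    if stats.shapiro(diffs).pvalue > alpha:
        # Differences look approximately normal: use the parametric paired test.
        return "paired t-test", stats.ttest_rel(scores_a, scores_b).pvalue
    # Normality rejected: fall back to the non-parametric signed-rank test.
    return "Wilcoxon signed-rank", stats.wilcoxon(scores_a, scores_b).pvalue


# Synthetic per-patch DSC values for 48 test patches (illustration only).
rng = random.Random(0)
model_a = [0.90 + rng.gauss(0, 0.01) for _ in range(48)]
model_b = [a - 0.02 + rng.gauss(0, 0.005) for a in model_a]
test_name, p_value = paired_comparison(model_a, model_b)
```

Testing normality on the paired differences, rather than on each model's raw scores, matches the assumption that the paired t-test actually makes.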
The performance difference between U-Net and U2-Net was not statistically significant (Wilcoxon signed-rank test, p = 0.566), indicating that these two models belong to the same performance tier from a statistical perspective. Similarly, the difference between SwinUNet and TransUNet was not significant (Wilcoxon signed-rank test, p = 0.915).
In contrast, performance differences between CNN-based models and Transformer-based models were consistently and highly significant (p < 0.001 in all corresponding comparisons), revealing a clear performance gap between the two model categories. These results demonstrate that CNN-based architectures outperform Transformer-based models for the farmland shelterbelt extraction task under the evaluated conditions.
Overall, CNN-based models demonstrated superior performance in the farmland shelterbelt extraction task, with U-Net offering the most balanced trade-off between accuracy and stability, making it the most competitive model for this application.

4.3. Analysis of Computational Efficiency and Practical Training Costs

For models addressing practical problems, computational efficiency is as crucial as accuracy. This section evaluates, from a practical deployment perspective, each model’s parameter count, optimal training epochs, training time to peak accuracy, and single-image inference speed (Table 5). Here, the “Best epoch” refers to the epoch at which the trained model achieved the highest DSC on the final independent test set. The model parameters corresponding to this epoch were used for reporting computational efficiency metrics.
The data reveal the direct computational costs of architectural choices. U-Net shows clear efficiency advantages: its parameter count (4.32 M) is the smallest among all models. This lightweight structure translates into the shortest total training time (14.19 min) and the fastest inference speed (6.8 ms/image) among the CNN-based models. U-Net’s simplicity is therefore not a limitation but a significant advantage, enabling rapid experimentation and responsive deployment. In contrast, U2-Net incurs very high computational costs. Its large parameter count (44.01 M) results in the longest total training time (50.77 min). More critically, its single-image inference time (43.4 ms) is 6.4 times that of U-Net. Inference time grows linearly with the number of image tiles processed. This gap therefore widens in absolute terms as the mapped area increases and can become a practical bottleneck for large-area or repeated mapping.
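A back-of-the-envelope calculation makes this scaling concrete. The sketch rests on several assumptions:

- the imagery has 2 m pixels;
- tiles are non-overlapping and 256 × 256 pixels;
- the per-image latencies in Table 5 apply to one such tile;
- the area is a rectangular 6000 km², the lower bound of the Alar mosaic described in Section 4.5.1.

It counts pure model inference only and excludes data loading and mosaicking.

```python
import math

def tile_count(area_km2: float, gsd_m: float = 2.0, tile_px: int = 256) -> int:
    """Non-overlapping tiles needed to cover `area_km2` at `gsd_m` metres per pixel."""
    pixels = area_km2 * 1e6 / (gsd_m ** 2)
    return math.ceil(pixels / tile_px ** 2)

def inference_minutes(n_tiles: int, ms_per_tile: float) -> float:
    """Pure model inference time in minutes (ignores I/O and pre/post-processing)."""
    return n_tiles * ms_per_tile / 1000.0 / 60.0

n = tile_count(6000)  # lower bound: the study area covers "more than 6000 km2"
for name, ms in [("U-Net", 6.8), ("U2-Net", 43.4)]:
    print(f"{name}: {inference_minutes(n, ms):.1f} min for {n} tiles")
```

Under these assumptions, about 22,900 tiles take roughly 2.6 min with U-Net and 16.6 min with U2-Net. The 6.4× ratio stays fixed, so the absolute gap grows linearly with the mapped area and with every re-run of the mapping.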
SwinUNet’s behavior is noteworthy. Despite its high parameter count (41.34 M), it reached its best epoch earliest (epoch 106) and had the shortest total training time (9.23 min) among all models. This may indicate a comparatively efficient optimization path for the Transformer-based SwinUNet, whose shifted-window self-attention aggregates context over wider regions than a single convolution. However, the early peak coincides with a lower final accuracy (DSC: 87.34%). It therefore reflects faster convergence rather than better solutions. The training advantage is further offset by slower inference compared with U-Net.
To assess the trade-off between accuracy and runtime speed, the DSC of each model was plotted against its single-image inference time (Figure 6).
U-Net occupies the optimal region, being the only model to combine top-tier accuracy with very low latency. SwinUNet and TransUNet are more efficient than U2-Net but lack accuracy, whereas U2-Net is highly accurate but computationally expensive. Combined with the preceding findings on accuracy and stability, this efficiency analysis positions U-Net not only as the model with the highest test DSC and stable performance, but also as the lightest model with the fastest inference among the CNN-based architectures. This makes it the most suitable choice for both academic research and large-scale operational mapping of farmland shelterbelts.
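The “optimal region” of an accuracy–latency plot can be made precise with Pareto dominance. A model is dominated if another model is at least as accurate and at least as fast, and strictly better on one of the two. The helper below is an illustrative sketch. The values in the usage line are hypothetical placeholders, not the figures behind Figure 6.

```python
from typing import Dict, List, Tuple

def pareto_front(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the models not dominated in (higher DSC, lower latency).

    `models` maps a model name to (dsc_percent, latency_ms). A model is
    dominated if another model is at least as accurate AND at least as fast,
    and strictly better in at least one of the two.
    """
    front = []
    for name, (dsc, ms) in models.items():
        dominated = any(
            d2 >= dsc and m2 <= ms and (d2 > dsc or m2 < ms)
            for other, (d2, m2) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front
```

`pareto_front({"A": (91.0, 7.0), "B": (91.2, 40.0), "C": (87.0, 10.0)})` returns `["A", "B"]`. C is dominated by A. B stays on the front because no other model is both at least as accurate and faster. Pareto optimality therefore narrows the candidates but does not pick a single model. Choosing between A and B still requires a latency budget or a weighting of accuracy against speed.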

4.4. Qualitative Visualization and Typical Case Comparisons

Qualitative analysis intuitively reveals model behaviors in challenging scenarios, uncovering strengths and weaknesses that quantitative metrics may obscure. Accordingly, Figure 7 presents six representative test cases selected from the independent test set, including three high-DSC samples (a, b, c) and three low-DSC samples (d, e, f), to visually compare the segmentation results of all models against the ground-truth annotations.
These test cases correspond to six typical and challenging scenarios commonly encountered in farmland shelterbelt extraction: (a) interspersed fruit tree rows and shelterbelts, (b) fragmented and discontinuous shelterbelts, (c) sparse vegetation disturbance at desert margins, (d) mixed orchard–crop plantings, (e) agricultural channels adjacent to narrow shelterbelts, and (f) riverbanks densely covered by reeds. Together, these scenarios span a wide range of background complexities and interference conditions, providing a comprehensive basis for qualitative comparison of different model behaviors.
In scenarios (a) and (b), all models performed well, successfully identifying and precisely segmenting the main body of the shelterbelts. Compared to other models, U-Net and U2-Net generated smoother, more precise boundary contours and effectively captured complex structural features and fine details. Transformer-based models also exhibited acceptable performance in these relatively simple scenarios, demonstrating the strong capability of deep learning models in handling spectral interferences from orchards and crops, as well as fragmented shelterbelt distributions.
In scenario (c), all models showed mild over-segmentation, mainly by misidentifying dense understory shrubs as shelterbelts. As a result, the extracted shelterbelts were slightly wider than the ground-truth labels. This error likely stems from spectral similarities among vegetation canopies and insufficient model comprehension of spatial contextual relationships. AttU_Net produced noticeable salt-and-pepper noise in this scenario, possibly reflecting a higher sensitivity of its attention mechanism to fragmented vegetation at desert edges.
The challenging scenarios (d, e, f), which represent the highest extraction difficulty, amplified the limitations of each model. These “hard samples” typically feature blurred object boundaries, strong background interference, or extreme target morphologies.

In scenario (d), AttU_Net, ResU-Net, and TransUNet exhibited land cover confusion, misidentifying orchard-to-farmland transition zones as shelterbelts. This reveals limitations in distinguishing between different tree-dominated vegetation types. Conversely, U-Net and U2-Net, leveraging their stronger local feature extraction, maintained the highest segmentation accuracy in this scenario.

The results for scenarios (e) (agricultural channels adjacent to narrow shelterbelts) and (f) (reed-covered riverbanks) further highlighted deficiencies in morphological discrimination. All models misclassified linear hydrological features (ditches and river channels) as shelterbelts, owing to the high spectral and morphological similarity between vegetation-covered linear objects and true shelterbelts. Notably, in scenario (f), U-Net, ResU-Net, and U2-Net failed to detect the narrow shelterbelts in the upper-left region, indicating that sensitivity to weak signals and small targets needs to be improved. Such errors are likely caused not only by model architecture but also by the scarcity of extreme cases in the training samples. Future data strategies should therefore improve the coverage and annotation of such challenging samples.

4.5. Large-Scale Regional Application and Belt-Level Assessment

4.5.1. Large-Scale Application and Belt-Level Assessment in the Study Area

To evaluate the practical applicability of the best-performing model, U-Net was applied to large-scale farmland shelterbelt mapping in the primary study area, Alar City, Xinjiang. The input data consisted of Gaofen-6 satellite image mosaics covering more than 6000 km². The imagery was divided into 256 × 256 pixel tiles, which were processed by the final trained U-Net model. Urban roadside trees were excluded using an urban mask, and the predicted tiles were seamlessly mosaicked to generate a regional-scale farmland shelterbelt distribution map (Figure 8).
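The tiling and mosaicking workflow can be sketched in NumPy under the following assumptions, none of which are specified in the text:

- `predict_tile` is a hypothetical stand-in for the trained U-Net’s forward pass and returns a probability map;
- tiles do not overlap, and image edges are zero-padded;
- a 0.5 threshold binarizes the output;
- the urban mask is a boolean raster aligned with the image.

```python
import numpy as np

def predict_tiled(image, predict_tile, urban_mask=None, tile=256, threshold=0.5):
    """Tile an (H, W, C) image, predict each tile, and mosaic the results.

    `predict_tile` maps a (tile, tile, C) array to a (tile, tile) probability
    map. Tiles do not overlap; edges are zero-padded and cropped afterwards.
    Pixels where `urban_mask` is True are set to background (0).
    """
    h, w, _ = image.shape
    ph, pw = -h % tile, -w % tile  # padding needed to reach a multiple of `tile`
    padded = np.pad(image, ((0, ph), (0, pw), (0, 0)))
    prob = np.zeros(padded.shape[:2], dtype=np.float32)
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            prob[y:y + tile, x:x + tile] = predict_tile(padded[y:y + tile, x:x + tile])
    belts = prob[:h, :w] >= threshold  # crop the padding, then binarize
    if urban_mask is not None:
        belts &= ~urban_mask.astype(bool)  # drop urban roadside trees
    return belts.astype(np.uint8)
```

Overlapping tiles whose borders are discarded before mosaicking are a common way to suppress seam artifacts. The non-overlapping version is kept here for brevity.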
The resulting map effectively captured the spatial structure, continuity, and connectivity of the farmland shelterbelt network. Visual inspection indicates a high degree of consistency between the extracted shelterbelts and the corresponding satellite imagery. Enlarged local views further demonstrate the model’s capability to delineate individual shelterbelts and their intersections at the regional scale.
At this application scale, validation used an application-oriented, belt-level visual interpretation approach rather than pixel-wise quantitative segmentation metrics. Specifically, 500 random points were generated across the study area using ArcGIS Pro. For each point, the nearest shelterbelt was selected as a validation sample, yielding 385 effective shelterbelt samples. Manual interpretation was performed using high-resolution Google Earth imagery as reference data. The spatial distribution of the validation samples is shown in Figure 9.
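The nearest-belt sampling step can be sketched in plain Python. The sketch assumes that shelterbelts are available as polylines, stored as lists of projected (x, y) vertices keyed by an ID. It also assumes that random points are drawn uniformly within a bounding box rather than within the true study-area outline. All names are hypothetical. The text does not explain how 500 points yielded 385 samples. The sketch keeps each belt only once when several points share the same nearest belt, which is one possible reason but not a confirmed one.

```python
import math
import random

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_belt(p, belts):
    """ID of the belt polyline (>= 2 vertices) closest to point p."""
    return min(
        belts,
        key=lambda bid: min(
            point_segment_distance(p, a, b) for a, b in zip(belts[bid], belts[bid][1:])
        ),
    )

def sample_belts(belts, bbox, n_points=500, seed=0):
    """Draw random points in bbox and keep each point's nearest belt once."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    chosen = []
    for _ in range(n_points):
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        bid = nearest_belt(p, belts)
        if bid not in chosen:  # several points may share the same nearest belt
            chosen.append(bid)
    return chosen
```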
Based on this belt-level assessment, the U-Net model achieved an overall shelterbelt extraction accuracy of 95.58% within Alar City. The observed commission and omission error patterns were generally consistent with those identified in Section 4.4. This accuracy reflects the object-level correctness of shelterbelt detection and delineation and serves to evaluate the feasibility and reliability of large-scale automated mapping. It is not a strict pixel-wise segmentation accuracy.
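As a quick consistency check, 95.58% of 385 samples corresponds to 368 correctly extracted belts (368/385 = 0.9558). The sketch below reproduces that figure. As an illustration that the study does not report, it also adds a 95% Wilson score interval for the underlying proportion. This treats the 385 belts as independent samples, which is an assumption.

```python
import math

def wilson_interval(correct: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

correct, n = 368, 385  # 368 / 385 = 0.9558, i.e., the reported 95.58%
print(f"accuracy = {correct / n:.2%}")
lo, hi = wilson_interval(correct, n)
print(f"95% Wilson interval = [{lo:.1%}, {hi:.1%}]")
```

With these counts the interval spans roughly 93% to 97%. This indicates the precision that a validation sample of this size can support.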

4.5.2. Qualitative Cross-Regional Case Demonstration

To further examine the potential transferability of the trained model beyond the primary study area, qualitative cross-regional demonstrations were conducted in Aksu City and Shaya County (Figure 10). These regions are located in the oasis–desert transition zone along the northern edge of the Tarim Basin and share broadly similar climatic and agro-ecological conditions with Alar City, while exhibiting differences in farmland configuration and shelterbelt morphology.
It should be noted that no quantitative accuracy assessment or sample-based validation was conducted in these two regions. The results are presented solely as qualitative demonstrations to illustrate typical model behavior under cross-regional application scenarios.
The results show that the U-Net model successfully extracted the major structural patterns of farmland shelterbelts in both regions, effectively delineating the macroscopic shelterbelt networks. However, several limitations associated with out-of-distribution data were observed. In Aksu City, urban green belts were occasionally misclassified as farmland shelterbelts, primarily due to the absence of urban green space samples as negative classes in the training data. In Shaya County, extremely narrow shelterbelts (approximately 2 m in width, corresponding to about one pixel in the imagery) were frequently omitted, reflecting insufficient small-target perception caused by the scarcity of such ultra-narrow shelterbelt samples in the training dataset.
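Simple arithmetic shows why belts about one pixel wide are fragile. The sketch assumes 2 m pixels and four 2× downsampling stages, as in the original U-Net design. Whether each configuration in Table A1 uses exactly four stages is an assumption. The second part is a toy illustration of smoothing: averaging over a 2 × 2 window halves the contrast of a one-pixel line. It is not a model of the actual networks, which use learned convolutions and, in U-Net, max pooling.

```python
import numpy as np

def width_in_pixels(width_m: float, gsd_m: float, levels: int = 4):
    """Belt width in pixels at the input and after each 2x downsampling."""
    return [width_m / gsd_m / 2 ** k for k in range(levels + 1)]

def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2 (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

print(width_in_pixels(2.0, 2.0))  # [1.0, 0.5, 0.25, 0.125, 0.0625]

# A one-pixel-wide vertical belt in an 8x8 tile: averaging halves its contrast.
tile = np.zeros((8, 8))
tile[:, 3] = 1.0
print(avg_pool2(tile).max())  # 0.5
```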
These observations highlight that the generalization performance of deep learning-based models is strongly dependent on the representativeness and completeness of training data. Overall, the large-scale application in Alar City, combined with qualitative cross-regional demonstrations, indicates that the proposed method is a feasible and effective solution for regional-scale farmland shelterbelt mapping, while also clarifying the current boundaries of its applicability and providing directions for future work involving more rigorous cross-regional quantitative validation.

5. Discussion

5.1. Comparison with Related Studies

5.1.1. Comparison with Global-Scale Land Use/Land Cover (LULC) Products

Global land cover products such as GlobeLand30 [49], the FROM-GLC series [50], GLC_FCS30/GLC_FCS30D, and the recently released ESA WorldCover provide long-term, multi-temporal land cover information at spatial resolutions ranging from 30 m to 10 m. These datasets play an essential role in large-scale land cover monitoring and ecological assessment, where forest or tree cover is typically represented as a patch-based class. However, they are not specifically designed to capture farmland shelterbelts, which are small-scale ecological elements characterized by narrow widths, elongated geometries, and close interspersion with croplands.
From a spatial resolution perspective, pixels of 30 m—and even 10 m—often exceed the width of farmland shelterbelts, leading to inevitable mixing of shelterbelts with surrounding farmland within single pixels. This results in smoothing effects, blocky representations and systematic boundary displacement, particularly for narrow and strip-like features. Consequently, while global LULC products are effective for macro-scale forest cover mapping, they remain insufficient for boundary-level, high-precision representation and structural analysis of farmland shelterbelt networks.
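The mixing effect can be quantified with a simple geometric bound. A straight belt running parallel to the pixel grid can occupy at most (belt width / pixel size) of a pixel. The belt widths below (2, 6, and 10 m) are hypothetical examples. Only the 2 m case echoes the ultra-narrow belts discussed in Section 4.5.2.

```python
def max_cover_fraction(belt_width_m: float, pixel_m: float) -> float:
    """Largest share of one pixel that a straight belt aligned with the
    pixel grid can occupy (belt width capped at the pixel size)."""
    return min(belt_width_m, pixel_m) / pixel_m

for gsd in (2, 10, 30):
    print(gsd, [round(max_cover_fraction(w, gsd), 2) for w in (2, 6, 10)])
```

For grid-aligned belts, a 2 m belt can fill at most 20% of a 10 m pixel. At 30 m, even a 10 m belt never fills more than one third of a pixel. At 2 m, all three example widths can fill whole pixels.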
In contrast, this study employs 2 m resolution Gaofen-6 fused imagery combined with pixel-level semantic segmentation, enabling discrimination of farmland shelterbelts from surrounding crops, orchards, and natural vegetation at the single-pixel scale. This fine-grained mapping approach allows accurate reconstruction of shelterbelt orientation, width, and connectivity, providing a structured representation tailored to farmland ecological infrastructure. In application contexts emphasizing ecosystem services, agricultural landscape connectivity, and green infrastructure management, such detailed shelterbelt extraction offers clear advantages over global-scale LULC products.

5.1.2. Comparison with Linear Farmland Shelterbelt Extraction Methods

Earlier approaches to farmland shelterbelt extraction predominantly relied on handcrafted features and rule-based strategies. For example, Xing et al. (2016) combined vegetation indices, mathematical morphology, and object-based analysis to extract shelterbelt skeletons [51], but the method remained sensitive to structural variations and intersection complexity. Li et al. (2024) integrated spectral, textural, and vegetation index features with random forest classifiers, improving extraction accuracy while retaining a strong dependence on feature engineering [21]. Zhang et al. (2024) enhanced vegetation indices using phenological information and validated performance across multiple resolutions, yet the method remained sensitive to threshold settings and prior knowledge [52]. Deng et al. (2023) focused on repairing fragmented shelterbelts through belt-oriented post-processing, which improved continuity but still faced difficulties under complex spectral mixing conditions [18].
In contrast, the deep learning framework adopted in this study enables end-to-end learning of multi-level spectral–spatial representations, reducing reliance on manually designed features. CNN-based semantic segmentation models demonstrate stronger semantic consistency and structural integrity when extracting shelterbelts in complex environments such as orchards, dense crop fields, and desert margins. Compared with traditional approaches, these models offer improved automation, robustness, and scalability, providing a more reliable pathway for high-precision and large-area farmland shelterbelt mapping.

5.2. Structural Characteristics and Task Adaptability of Deep Learning Models

The experimental results consistently indicate that CNN-based models outperform Transformer-based architectures in farmland shelterbelt extraction. This outcome directly supports the task-oriented considerations articulated in the Introduction, where farmland shelterbelts were characterized as narrow, elongated features with strong local boundary cues that place high demands on precise local feature preservation.
Across both cross-validation and independent testing, CNN-based models exhibit higher performance stability under sample-limited and high-interference conditions. In contrast, Transformer-based models show reduced sensitivity to fine-scale linear structures, which is reflected in lower segmentation consistency and increased variability. These findings suggest that, for narrow-width linear targets embedded in heterogeneous agricultural landscapes, architectural inductive biases favoring locality-aware feature modeling play a more critical role than global context aggregation.
Similar observations have been reported in recent reviews of remote sensing semantic segmentation, which note that pure Transformer architectures often require larger training datasets and stronger regularization strategies to achieve performance comparable to or exceeding that of CNN-based models in high-resolution tasks with strict spatial structural constraints [49]. The present results provide task-specific empirical evidence for this conclusion in the context of farmland shelterbelt extraction.
Among the evaluated CNN architectures, U-Net achieves the most balanced performance in terms of accuracy, stability, and computational efficiency. Its encoder–decoder structure with skip connections enables effective integration of deep semantic information and shallow boundary details, allowing continuous and coherent delineation of shelterbelts under complex background conditions. Enhanced CNN variants, such as U2-Net, AttU_Net, and ResU-Net, further improve feature representation and connectivity in specific scenarios; however, these gains are often accompanied by increased parameter complexity and computational cost, without consistently outperforming the standard U-Net under medium-scale data conditions.
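The skip-connection merge at one decoder level can be sketched in NumPy. The sketch assumes (C, H, W) feature maps. It uses nearest-neighbour repetition in place of the learned up-convolution of the original U-Net. The convolutions that follow the concatenation are omitted, and the channel counts are hypothetical.

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_merge(deep: np.ndarray, shallow: np.ndarray) -> np.ndarray:
    """U-Net-style decoder merge: upsample the deep, coarse features and
    concatenate them channel-wise with same-resolution encoder features."""
    up = upsample2x(deep)
    if up.shape[1:] != shallow.shape[1:]:
        raise ValueError("upsampled deep features must match the skip features' H and W")
    return np.concatenate([up, shallow], axis=0)

deep = np.random.rand(256, 16, 16)     # semantic information, low resolution
shallow = np.random.rand(128, 32, 32)  # boundary detail, high resolution
print(skip_merge(deep, shallow).shape)  # (384, 32, 32)
```

The concatenated tensor carries the coarse semantic channels and the fine boundary channels side by side. This lets the following convolutions combine them when delineating narrow belts.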

5.3. Current Limitations and Future Research Directions

Although this study systematically evaluated the advantages of deep learning for precise farmland shelterbelt extraction, several limitations remain.

First, the models’ semantic discrimination in complex, mixed-feature environments is still inadequate. This applies particularly to linear features that closely resemble shelterbelts in both spectrum and morphology (e.g., shrub-covered ditches, riverbank reed belts, and roadside trees). These misclassifications likely stem from the inability of optical imagery to fully represent differences in canopy structure, with local convolutional features prone to confusion under similar textures.

Second, small shelterbelt targets (ultra-narrow, fragmented, or sparsely canopied) face a substantial risk of omission under the combined constraints of resolution and network architecture. Their widths often span only 1–2 pixels, so their signal is easily weakened by downsampling and convolutional smoothing and may not be fully recovered through skip connections.

Additionally, although this study employed a combined BCE–Dice loss to mitigate class imbalance, small targets may still be systematically overlooked when the natural class distribution is extremely skewed.

Finally, the cross-regional trials in Aksu City and Shaya County demonstrate some generalization potential. However, geomorphology, vegetation structure, water resources, and management practices vary substantially across Xinjiang’s oasis–desert transition zones. Model robustness across larger spatial scales, ecological zones, crop belts, and phenological periods therefore requires more systematic validation.
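A combined BCE–Dice loss can be sketched as below. The text does not state how the two terms were weighted, so equal weights (`bce_weight=0.5`) are an assumption. This NumPy version only evaluates the loss; training would require an autodiff framework. The Dice term is normalized by the foreground size, so it reacts strongly when a thin belt is missed. The mean BCE, by contrast, is dominated by background pixels and stays small.

```python
import numpy as np

def bce_dice_loss(prob, target, bce_weight=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    prob: predicted foreground probabilities; target: 0/1 mask of the same shape.
    """
    p = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    t = np.asarray(target, dtype=float)
    bce = -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))
    soft_dice = (2 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)
    return bce_weight * bce + (1 - bce_weight) * (1 - soft_dice)
```

Consider a 100-pixel tile with 2 belt pixels and a prediction of 0.01 everywhere. `bce_dice_loss(p, t, bce_weight=1.0)` (BCE only) is about 0.10, whereas `bce_weight=0.0` (Dice only) gives a value close to 0.99.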
To address these challenges, future research requires coordinated progress in four directions: expanding data foundations, deepening the use of multi-source information, enhancing cross-regional generalization, and advancing toward ecological applications. Such progress is pivotal for moving intelligent shelterbelt recognition from “case-level studies” to “nationally scalable applications.”

Multimodal image fusion, structure-aware networks, time-series modeling, and self-supervised or domain-adaptive learning offer promising pathways. They could help models discern three-dimensional structural differences and maintain stable performance across regions.

Furthermore, future shelterbelt remote sensing research should go beyond geometric extraction and be closely coupled with ecological function models. Features such as belt width, continuity, and tree height structure could then be translated into indicators of wind erosion protection, carbon sink capacity, landscape connectivity, and farmland climate regulation. This would enable a shift from “spatial identification” to “ecological process quantification” and “agricultural management decision support.”

Through integrated advances in data, models, and applications, deep learning-driven precise extraction of farmland shelterbelts can demonstrate its scientific value and application potential at larger scales.

6. Conclusions

This study systematically evaluated the comprehensive performance of six mainstream deep learning models in the precise extraction task for farmland shelterbelts. Through four-fold cross-validation, independent test set evaluation, computational efficiency analysis, and multi-scenario qualitative comparisons, the following key conclusions were drawn:
  • Deep learning models show strong performance in farmland shelterbelt extraction. All evaluated models achieved average DSC values exceeding 87% in cross-validation, and the best DSC reached 91.45% on the independent test set. This demonstrates that deep learning techniques can achieve high-precision shelterbelt identification and segmentation from remote sensing imagery, providing a solid technical foundation for automated, large-scale shelterbelt monitoring.
  • Model architecture exerts a decisive influence on performance. In the current task, CNN-based models (e.g., U-Net, U2-Net) outperformed Transformer-based models (e.g., SwinUNet, TransUNet) in extraction accuracy and result stability, and the differences between the two groups were statistically significant. This indicates that, for medium-scale datasets and shelterbelt targets with strong local spatial features, CNN inductive biases (including local connectivity and translation invariance) offer greater advantages than Transformer self-attention mechanisms.
  • The U-Net model achieves the best balance among accuracy, stability, and computational efficiency, making it the most practical solution for this task. In terms of accuracy and stability, U-Net delivered top-tier, stable performance in both cross-validation and the independent test set, surpassing the more variable U2-Net in stability. In terms of computational efficiency, U-Net attained an inference speed of 6.8 ms/image, with training and deployment costs substantially lower than those of other high-performance models. Qualitatively, U-Net, together with U2-Net, produced the smoothest and most precise boundaries in complex scenarios, adapting well to fragmented shelterbelts, orchard interference, and other challenges.
In summary, this study, through establishing a comprehensive and rigorous evaluation framework, empirically validates the effectiveness and superiority of deep learning techniques—particularly CNN architectures represented by U-Net—in high-precision farmland shelterbelt extraction. The findings provide crucial technical support and empirical evidence for precision agriculture planning, ecological environment monitoring, and forest resource management.

Author Contributions

Conceptualization, R.L. (Ruiheng Lyu) and W.Z.; methodology, W.Z. and L.L.; software, W.Z. and L.L.; validation, W.Z., R.L. (Ruiqi Liu) and L.Q.; formal analysis, W.Z. and L.L.; investigation, W.Z., R.L. (Ruiqi Liu) and L.Q.; resources, F.C., L.Y. and R.L. (Ruiheng Lyu); data curation, W.Z., R.L. (Ruiqi Liu) and F.C.; writing—original draft preparation, W.Z. and R.L. (Ruiqi Liu); writing—review and editing, L.L., F.C., L.Y., L.Q. and R.L. (Ruiheng Lyu); visualization, W.Z.; supervision, R.L. (Ruiheng Lyu); project administration, R.L. (Ruiheng Lyu); funding acquisition, R.L. (Ruiheng Lyu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The Plan for Tackling Key Scientific and Technological Problems in Key Areas of the Xinjiang Production & Construction Corps (Grant No. 2021AB022).

Data Availability Statement

The Gaofen-6 PMS data were obtained from the China Centre for Resources Satellite Data and Application (CRESDA), the official national satellite data distribution platform in China. Due to access restrictions, the data portal is not publicly accessible to users outside China. The data are available upon formal application to the data provider, in accordance with its data use regulations. The deep learning model codes are available from the first author (mooode13@163.com) upon reasonable request.

Acknowledgments

The authors would like to express their sincere gratitude to the China Centre for Resources Satellite Data and Application for providing the Gaofen satellite imagery. We also thank our colleagues from the laboratory for their insightful discussions and technical support throughout this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

To ensure full reproducibility, this study adopted publicly available GitHub implementations for all deep learning models. Table A1 summarizes the exact source repositories and key configuration settings used in our experiments. All other architectural components and default parameters strictly follow the original implementations provided in the respective repositories.
Table A1. GitHub repositories and key configuration settings of the deep learning models used in this study.
Model | GitHub Repository | Key Configuration
U-Net | https://github.com/milesial/Pytorch-UNet (accessed on 20 July 2025) | base_c = 32; in_channels = 3; out_channels = 1
AttU_Net | https://github.com/LeeJunHyun/Image_Segmentation (accessed on 20 July 2025) | in_channels = 3; num_classes = 1
ResU-Net | https://github.com/rishikksh20/ResUnet (accessed on 20 July 2025) | in_channels = 3; num_classes = 1
U2-Net | https://github.com/xuebinqin/U-2-Net (accessed on 20 July 2025) | in_channels = 3; out_channels = 1
SwinUNet | https://github.com/HuCaoFighting/Swin-Unet (accessed on 20 July 2025) | in_channels = 3; num_classes = 1
TransUNet | https://github.com/Beckschen/TransUNet (accessed on 20 July 2025) | in_channels = 3; num_classes = 1

References

  1. Lobell, D.B.; Field, C.B. Global Scale Climate–Crop Yield Relationships and the Impacts of Recent Warming. Environ. Res. Lett. 2007, 2, 014002. [Google Scholar] [CrossRef]
  2. Thevs, N.; Strenge, E.; Aliev, K.; Eraaliev, M.; Lang, P.; Baibagysov, A.; Xu, J. Tree Shelterbelts as an Element to Improve Water Resource Management in Central Asia. Water 2017, 9, 842. [Google Scholar] [CrossRef]
  3. Potashkina, Y.N.; Koshelev, A.V. Impact of Field-Protective Forest Belts on the Microclimate of Agroforest Landscape in the Zone of Chestnut Soils of the Volgograd Region. Forests 2022, 13, 1892. [Google Scholar] [CrossRef]
  4. Zheng, X.; Zhu, J.; Xing, Z. Assessment of the Effects of Shelterbelts on Crop Yields at the Regional Scale in Northeast China. Agric. Syst. 2016, 143, 49–60. [Google Scholar] [CrossRef]
  5. Smith, M.M.; Bentrup, G.; Kellerman, T.; MacFarland, K.; Straight, R. Ameyaw, Lord Windbreaks in the United States: A Systematic Review of Producer-Reported Benefits, Challenges, Management Activities and Drivers of Adoption. Agric. Syst. 2021, 187, 103032. [Google Scholar] [CrossRef]
  6. Zhang, J.; Zhang, Y. Quantitative Assessment of the Impact of the Three-North Shelter Forest Program on Vegetation Net Primary Productivity over the Past Two Decades and Its Environmental Benefits in China. Sustainability 2024, 16, 3656. [Google Scholar] [CrossRef]
  7. Aili, A.; Bakayisire, F.; Xu, H.; Waheed, A. Enhancing Agroecological Resilience in Arid Regions: A Review of Shelterbelt Structure and Function. Agriculture 2025, 15, 2004. [Google Scholar] [CrossRef]
  8. Kulik, K.N.; Belyaev, A.I.; Pugacheva, A.M. The Role of Protective Afforestation in Drought and Desertification Control in Agro-Landscapes. Arid Ecosyst. 2023, 13, 1–10. [Google Scholar] [CrossRef]
  9. Adesina, F.A.; Gadiga, B.L. The Role of Shelterbelts in Vegetation Development of Desert Prone Area of Yobe State, Nigeria. J. Geogr. Geol. 2014, 6, 109–121. [Google Scholar] [CrossRef]
  10. Fahrig, L. Effects of Habitat Fragmentation on Biodiversity. Annu. Rev. Ecol. Evol. Syst. 2003, 34, 487–515. [Google Scholar] [CrossRef]
  11. Enescu, C.M.; Mihalache, M.; Ilie, L.; Dinca, L.; Constandache, C.; Murariu, G. Agricultural Benefits of Shelterbelts and Windbreaks: A Bibliometric Analysis. Agriculture 2025, 15, 1204. [Google Scholar] [CrossRef]
  12. Mitchell, A.L.; Rosenqvist, A.; Mora, B. Current Remote Sensing Approaches to Monitoring Forest Degradation in Support of Countries Measurement, Reporting and Verification (MRV) Systems for REDD+. Carbon Balance Manag. 2017, 12, 9. [Google Scholar] [CrossRef]
  13. Oliveira, A.H.M.; Matricardi, E.A.; De Aragão, L.E.O.E.C.; Felix, I.M.; Chaves, J.H.; Magliano, M.M.; Oliveira-Junior, J.M.B.; Vieira, T.A.; Santos, L.E.D.; Reis, L.P.; et al. Assessing Forest Degradation Through Remote Sensing in the Brazilian Amazon: Implications and Perspectives for Sustainable Forest Management. Remote Sens. 2024, 16, 4557. [Google Scholar] [CrossRef]
  14. Aksoy, S.; Akcay, H.G.; Wassenaar, T. Automatic Mapping of Linear Woody Vegetation Features in Agricultural Landscapes Using Very High Resolution Imagery. IEEE Trans. Geosci. Remote Sens. 2010, 48, 511–522. [Google Scholar] [CrossRef]
  15. Zhu, J.J.; Matsuzaki, T.; Gonda, Y. Optical Stratification Porosity as a Measure of Vertical Canopy Structure in a Japanese Coastal Forest. For. Ecol. Manag. 2003, 173, 89–104. [Google Scholar] [CrossRef]
  16. Pirbasti, M.A.; McArdle, G.; Akbari, V. Hedgerows Monitoring in Remote Sensing: A Comprehensive Review. IEEE Access 2024, 12, 156184–156207. [Google Scholar] [CrossRef]
  17. Pippuri, I.; Suvanto, A.; Maltamo, M.; Korhonen, K.T.; Pitkänen, J.; Packalen, P. Classification of Forest Land Attributes Using Multi-Source Remotely Sensed Data. Int. J. Appl. Earth Obs. Geoinf. 2016, 44, 11–22. [Google Scholar] [CrossRef]
  18. Deng, R.; Guo, Q.; Jia, M.; Wu, Y.; Zhou, Q.; Xu, Z. Extraction of Farmland Shelterbelts from Remote Sensing Imagery Based on a Belt-Oriented Method. Front. For. Glob. Change 2023, 6, 1247032. [Google Scholar] [CrossRef]
  19. Breiman, L. (Ed.) Classification and Regression Trees; The Wadsworth statistics/probability series; Wadsworth & Brooks/Cole: Pacific Grove, CA, USA, 1984; ISBN 978-0-534-98054-2. [Google Scholar]
  20. Moisen, G.G.; Frescino, T.S. Comparing Five Modelling Techniques for Predicting Forest Characteristics. Ecol. Model. 2002, 157, 209–225. [Google Scholar] [CrossRef]
  21. Li, Y.; Sun, B.; Gao, Z.; Wang, B.; Yan, Z.; Su, W.; Gao, T.; Yue, W. Farmland Shelterbelt Information Extraction Based on Multispectral Image of the ZY1-02E Satellite. Natl. Remote Sens. Bull. 2024, 28, 624–634. [Google Scholar] [CrossRef]
  22. Yuan, X.; Shi, J.; Gu, L. A Review of Deep Learning Methods for Semantic Segmentation of Remote Sensing Imagery. Expert Syst. Appl. 2021, 169, 114417. [Google Scholar] [CrossRef]
23. Ramos, L.T.; Sappa, A.D. Leveraging U-Net and Selective Feature Extraction for Land Cover Classification Using Remote Sensing Imagery. Sci. Rep. 2025, 15, 784.
24. Fan, X.; Yan, C.; Fan, J.; Wang, N. Improved U-Net Remote Sensing Classification Algorithm Fusing Attention and Multiscale Features. Remote Sens. 2022, 14, 3591.
25. Lu, Y.; Li, H.; Zhang, C.; Zhang, S. Object-Based Semi-Supervised Spatial Attention Residual UNet for Urban High-Resolution Remote Sensing Image Classification. Remote Sens. 2024, 16, 1444.
26. Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection. Pattern Recognit. 2020, 106, 107404.
27. Aleissaee, A.A.; Kumar, A.; Anwer, R.M.; Khan, S.; Cholakkal, H.; Xia, G.-S.; Khan, F.S. Transformers in Remote Sensing: A Survey. Remote Sens. 2023, 15, 1860.
28. Zhang, Y.; Huang, M.; Chen, Y.; Xiao, X.; Li, H. Land Cover Classification in High-Resolution Remote Sensing: Using Swin Transformer Deep Learning with Texture Features. J. Spat. Sci. 2025, 70, 205–229.
29. Yuan, J.; Wang, L.; Cheng, S. STransUNet: A Siamese TransUNet-Based Remote Sensing Image Change Detection Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 9241–9253.
30. Luo, Z.; Yang, W.; Yuan, Y.; Gou, R.; Li, X. Semantic Segmentation of Agricultural Images: A Survey. Inf. Process. Agric. 2024, 11, 172–186.
31. Zhai, J.; Wang, L.; Liu, Y.; Wang, C.; Mao, X. Assessing the Effects of China’s Three-North Shelter Forest Program over 40 Years. Sci. Total Environ. 2023, 857, 159354.
32. Podhrázská, J.; Kučera, J.; Doubrava, D.; Doležal, P. Functions of Windbreaks in the Landscape Ecological Network and Methods of Their Evaluation. Forests 2021, 12, 67.
33. Li, X.; Shi, Z.; Yu, J.; Liang, J. Study on the Change in Vegetation Coverage in Desert Oasis and Its Driving Factors from 1990 to 2020 Based on Google Earth Engine. Appl. Sci. 2023, 13, 5394.
34. Xue, Z.; Wang, X.; Zhou, W. Selection of Optimal Farmland Tree Species in Southern Xinjiang Oasis Based on the Process of Photosynthesis. Pol. J. Environ. Stud. 2025, 34, 5077–5083.
35. Luo, J.; Chu, Q.; Sun, C.; Wang, Y.; Sun, D. Staple Crop Mapping with Chinese Gaofen-1 and Gaofen-6 Satellite Images: A Case Study in Yanshou County, Heilongjiang Province, China. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11 July 2021; IEEE: New York, NY, USA, 2021; pp. 6769–6772.
36. Liang, J.; Zheng, Z.; Zhang, X.; Tang, Y. China Crop Recognition and Evaluation Using Red Edge Features of GF-6 Satellite. Natl. Remote Sens. Bull. 2020, 24, 1168–1179.
37. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. ISBN 978-3-319-24573-7.
38. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
39. Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
40. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2021.
41. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
42. Ni, Y.; Liu, J.; Zhang, H.; Chi, W.; Luan, J. Category-Guided Transformer for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2025, 17, 3054.
43. Zhang, T.; Ji, W.; Li, W.; Qin, C.; Wang, T.; Ren, Y.; Fang, Y.; Han, Z.; Jiao, L. EDWNet: A Novel Encoder–Decoder Architecture Network for Water Body Extraction from Optical Images. Remote Sens. 2024, 16, 4275.
44. Wu, X.; Wang, D.; Ma, C.; Zeng, Y.; Lv, Y.; Huang, X.; Wang, J. Parcel Segmentation Method Combined YOLOV5s and Segment Anything Model Using Remote Sensing Image. Land 2025, 14, 1429.
45. Galdran, A.; Carneiro, G.; Ballester, M.A.G. On the Optimal Combination of Cross-Entropy and Soft Dice Losses for Lesion Segmentation with Out-of-Distribution Robustness. In Diabetic Foot Ulcers Grand Challenge; Yap, M.H., Kendrick, C., Cassidy, B., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2023; Volume 13797, pp. 40–51. ISBN 978-3-031-26353-8.
46. Yeung, M.; Sala, E.; Schönlieb, C.-B.; Rundo, L. Unified Focal Loss: Generalising Dice and Cross Entropy-Based Losses to Handle Class Imbalanced Medical Image Segmentation. Comput. Med. Imaging Graph. 2022, 95, 102026.
47. Müller, D.; Soto-Rey, I.; Kramer, F. Towards a Guideline for Evaluation Metrics in Medical Image Segmentation. BMC Res. Notes 2022, 15, 210.
48. Bressan, P.O.; Junior, J.M.; Martins, J.A.C.; Gonçalves, D.N.; Freitas, D.M.; Osco, L.P.; Silva, J.d.A.; Luo, Z.; Li, J.; Garcia, R.C.; et al. Semantic Segmentation with Labeling Uncertainty and Class Imbalance. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102690.
49. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M.; et al. Global Land Cover Mapping at 30 m Resolution: A POK-Based Operational Approach. ISPRS J. Photogramm. Remote Sens. 2015, 103, 7–27.
50. Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer Resolution Observation and Monitoring of Global Land Cover: First Mapping Results with Landsat TM and ETM+ Data. Int. J. Remote Sens. 2013, 34, 2607–2654.
51. Xing, Z.F.; Li, Y.; Deng, R.X.; Zhu, H.L.; Fu, B.L. Extracting Farmland Shelterbelt Automatically Based on ZY-3 Remote Sensing Images. Sci. Silvae Sin. 2016, 52, 11–20.
52. Zhang, X.; Liu, J.; Meng, L.; Qin, C.; An, Z.; Wang, Y.; Liu, H. Enhanced Blue Band Vegetation Index (The Re-Modified Anthocyanin Reflectance Index (RMARI)) for Accurate Farmland Shelterbelt Extraction. Remote Sens. 2024, 16, 3680.
Figure 1. Geographical location of the study area and photographs of key shelterbelt species. (a) Map of the People’s Republic of China (Map Review No.: GS (2024) 0650). (b) Xinjiang Uygur Autonomous Region. (c) Alar City. (d) Populus euphratica, a dominant tree species used in local shelterbelts. (e) Populus alba var. pyramidalis (Xinjiang poplar), another primary shelterbelt species in the study region.
Figure 2. Preprocessing workflow of the Gaofen-6 (GF-6) Panchromatic Multispectral Sensor (PMS) data.
Figure 3. Schematic diagram of the four-fold cross-validation data partitioning.
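The partitioning in Figure 3 can be sketched as follows, assuming the independent test set is held out before folding and that samples are shuffled once with a fixed seed and dealt round-robin into four folds. The function name `four_fold_splits`, the seed, and the round-robin assignment are illustrative assumptions, not the authors' code.

```python
import random


def four_fold_splits(n_samples, k=4, seed=0):
    """Return k (train_indices, val_indices) pairs; every sample is validated exactly once.

    Hypothetical helper: n_samples counts only the training/validation pool,
    i.e. the independent test set is assumed to have been removed already.
    """
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # round-robin -> fold sizes differ by at most 1
    splits = []
    for i, val in enumerate(folds):
        # training set = union of the other k - 1 folds
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((sorted(train), sorted(val)))
    return splits


for fold_id, (train_idx, val_idx) in enumerate(four_fold_splits(10)):
    print(fold_id, len(train_idx), len(val_idx))
```

Each model is then trained four times, once per split, and the per-fold scores are summarized as mean ± SD, as in Table 2.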
Figure 4. Mean DSC (%) scatter plot. Light-colored dots denote DSC values from individual cross-validation folds, while dark solid dots represent the average DSC over four-fold cross-validation. The pink line connects the mean DSC values across different model architectures; error bars represent the standard error of the mean (SE, n = 4).
Figure 5. Loss curves of different models during training: (a) training loss; (b) validation loss. All models were trained under consistent hyperparameter settings, and the curves reflect their convergence behavior and generalization capacity.
Figure 6. Analysis of model performance (DSC) versus inference efficiency. The closer a model is to the upper-left corner, the better the combination of high accuracy and low latency it achieves.
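The "closer to the upper-left corner" reading of Figure 6 amounts to Pareto dominance: a model is preferable if no other model is at least as accurate and at least as fast. The sketch below makes that explicit. It assumes the figure plots DSC against per-image inference time and uses the test-set DSC values from Table 3 with the inference times from Table 5; the caption does not say which DSC is plotted, so that pairing is an assumption, and `dominates` and `pareto_front` are hypothetical helper names.

```python
# Test-set DSC (%) from Table 3 and inference time (ms/image) from Table 5.
MODELS = {
    "U-Net": (91.45, 6.8),
    "AttU_Net": (91.16, 15.5),
    "ResU-Net": (90.82, 13.3),
    "U2-Net": (91.32, 43.4),
    "SwinUnet": (89.31, 16.1),
    "TransUNet": (88.86, 10.7),
}


def dominates(a, b):
    """True if a is at least as accurate and as fast as b, and strictly better in one respect."""
    (dsc_a, ms_a), (dsc_b, ms_b) = a, b
    return dsc_a >= dsc_b and ms_a <= ms_b and (dsc_a > dsc_b or ms_a < ms_b)


def pareto_front(models):
    """Names of models not dominated by any other model (the 'upper-left' set)."""
    front = []
    for name, score in models.items():
        dominated = any(
            dominates(other_score, score)
            for other_name, other_score in models.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


print(pareto_front(MODELS))  # ['U-Net']
```

With these numbers U-Net is the only non-dominated model, because it has both the highest test DSC and the lowest latency. Removing it from the dictionary would leave AttU_Net, ResU-Net, TransUNet and U2-Net on the front, with SwinUnet dominated by ResU-Net.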
Figure 7. Visual comparison of model performance for shelterbelt extraction in complex scenarios. Green masks represent the manually refined ground truth (GT). Yellow boxes highlight missed detections (false negatives), while red boxes indicate false detections (false positives). Six typical interference scenarios are illustrated: (a) fruit tree rows interspersed with shelterbelts; (b) fragmented and discontinuous shelterbelts; (c) sparse vegetation disturbance at desert margins; (d) mixed orchard-crop plantings; (e) agricultural channels and narrow shelterbelts; (f) a river with dense reeds along its banks.
Figure 8. Farmland shelterbelt extraction results across the entire study area (Alar City). The green overlays represent the extracted farmland shelterbelts at the regional scale. Zoomed-in views of representative local areas are provided, with each enlarged window covering 1280 × 1280 pixels, corresponding to an area of approximately 2560 × 2560 m, to illustrate local spatial patterns, continuity, and boundary details of the shelterbelt network.
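Because the models take 256 × 256 inputs (Table 1), a regional map such as Figure 8 has to be produced tile by tile. The tiling and stitching strategy is not described in this section, so the sketch below is a hypothetical grid: tiles of 256 pixels with a configurable stride (equal to the tile size means no overlap), where the last tile on each axis is shifted back to end flush with the image edge. The 2 m ground sampling distance is only inferred from the caption's 1280 pixels ≈ 2560 m.

```python
def tile_origins(length, tile=256, stride=256):
    """Start offsets of tiles along one axis; the last tile is shifted to end at the edge."""
    if length <= tile:
        return [0]  # a single tile; the image would need padding up to `tile`
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)  # cover the remaining strip at the border
    return origins


def tile_windows(height, width, tile=256, stride=256):
    """(row, col) top-left offsets of every tile covering a height x width scene."""
    return [
        (r, c)
        for r in tile_origins(height, tile, stride)
        for c in tile_origins(width, tile, stride)
    ]


# Ground sampling distance implied by the Figure 8 caption: 2560 m / 1280 px.
GSD_M = 2560 / 1280

windows = tile_windows(1280, 1280)
print(len(windows), 256 * GSD_M)  # 25 512.0
```

One 1280 × 1280 zoom window therefore corresponds to 25 non-overlapping model inputs, each covering about 512 × 512 m on the ground. A stride smaller than 256 would produce overlapping tiles, which is a common way to reduce seams but is not confirmed here.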
Figure 9. Belt-level validation samples in Alar City.
Figure 10. Cross-regional case study of farmland shelterbelt extraction. (a) Satellite image subset of Aksu City (center coordinates: 41°13′34.46″ N, 80°28′23.35″ E). (b) Farmland shelterbelt extraction result for Aksu City using the U-Net model. (c) Satellite image subset of Xayar (Shaya) County (center coordinates: 41°19′21.77″ N, 82°20′46.03″ E). (d) Farmland shelterbelt extraction result for Xayar County using the U-Net model.
Table 1. Hyperparameter settings for the comparative experiments.
| Parameter | Value |
|---|---|
| Input size | 256 × 256 pixels |
| Epochs | 200 |
| Batch size | 16 |
| Optimizer | AdamW (lr = 0.001, β1 = 0.9, β2 = 0.999, ε = 10⁻⁸, weight_decay = 0.01) |
| Scheduler | CosineAnnealingLR (T_max = 200, η_min = 10⁻⁵) |
| Loss function | BCE-Dice loss |
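Table 1 names the scheduler and loss but not their internals. The pure-Python sketch below spells out both under stated assumptions: the closed-form cosine-annealing rule (the formula documented for PyTorch's `CosineAnnealingLR`, assuming the scheduler is stepped once per epoch so that T_max = 200 spans the whole run), and a BCE-Dice loss taken as the unweighted sum of binary cross-entropy and soft Dice loss. The 1:1 weighting and the smoothing constant are assumptions, not values reported in the table.

```python
import math


def cosine_annealing_lr(epoch, base_lr=1e-3, eta_min=1e-5, t_max=200):
    """eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / t_max)) / 2."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


def bce_dice_loss(probs, targets, smooth=1.0, eps=1e-7):
    """Assumed formulation: BCE + (1 - soft Dice) over a flattened mask.

    probs are predicted foreground probabilities in [0, 1]; targets are 0/1 labels.
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and the same length")
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        bce -= t * math.log(p) + (1 - t) * math.log(1 - p)
    bce /= len(probs)
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return bce + (1 - dice)


for epoch in (0, 50, 100, 150, 200):
    print(epoch, cosine_annealing_lr(epoch))
```

The learning rate starts at 10⁻³, passes 5.05 × 10⁻⁴ at epoch 100 and reaches η_min = 10⁻⁵ at epoch 200. If the experiments used PyTorch, as the parameter names weight_decay and T_max suggest, the settings would correspond to `torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01)` followed by `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=1e-5)`.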
Table 2. Performance comparison of different models using four-fold cross-validation. Results are reported as mean ± SD, with the 95% confidence interval (CI) in brackets. The 95% CIs were calculated as mean ± $t_{0.025,3} \times \mathrm{SE}$, where $\mathrm{SE} = \mathrm{SD}/\sqrt{4}$ and $t_{0.025,3} = 3.182$ for $\nu = 3$ degrees of freedom.
| Method | DSC (%) | IoU (%) | Precision (%) | Sensitivity (%) |
|---|---|---|---|---|
| U-Net | 90.93 ± 1.42 [88.67–93.19] | 84.56 ± 2.32 [80.87–88.25] | 92.06 ± 0.67 [90.99–93.13] | 91.21 ± 2.38 [87.42–95.00] |
| AttU_Net | 90.89 ± 1.35 [88.74–93.04] | 84.23 ± 2.14 [80.83–87.63] | 91.64 ± 1.00 [90.05–93.23] | 91.23 ± 1.67 [88.57–93.89] |
| ResU-Net | 90.16 ± 1.26 [88.16–92.16] | 83.31 ± 1.91 [80.27–86.35] | 90.94 ± 1.88 [87.95–93.93] | 90.84 ± 0.89 [89.42–92.26] |
| U2-Net | 91.03 ± 1.50 [88.64–93.42] | 84.51 ± 2.59 [80.39–88.63] | 91.68 ± 0.60 [90.73–92.63] | 91.51 ± 2.64 [87.31–95.71] |
| SwinUnet | 87.34 ± 1.03 [85.70–88.98] | 78.91 ± 1.46 [76.59–81.23] | 87.67 ± 1.21 [85.74–89.60] | 88.76 ± 1.12 [86.98–90.54] |
| TransUNet | 88.46 ± 1.48 [86.11–90.81] | 80.87 ± 2.15 [77.45–84.29] | 88.86 ± 2.08 [85.55–92.17] | 89.98 ± 1.21 [88.05–91.91] |
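The bracketed intervals in Table 2 follow directly from the formula in the caption. The sketch below reproduces it for two DSC rows; `fold_ci` is a hypothetical helper, and the critical value 3.182 is the one stated in the caption for 3 degrees of freedom.

```python
import math

T_CRIT_DF3 = 3.182  # two-sided 95% critical t value for nu = 3 (n = 4 folds), as in Table 2


def fold_ci(mean, sd, n=4, t_crit=T_CRIT_DF3):
    """95% CI of a cross-validation mean: mean +/- t * SD / sqrt(n), rounded to 2 decimals."""
    half_width = t_crit * sd / math.sqrt(n)
    return round(mean - half_width, 2), round(mean + half_width, 2)


print(fold_ci(90.93, 1.42))  # U-Net DSC: (88.67, 93.19)
print(fold_ci(88.46, 1.48))  # TransUNet DSC: (86.11, 90.81)
```

With only four folds the t multiplier is large (3.182 versus about 1.96 for a normal approximation), which is why the intervals are wide relative to the SDs and overlap for all four CNN-based models.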
Table 3. Performance of different models on the final independent test set.
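| Method | DSC (%) | IoU (%) | Precision (%) | Sensitivity (%) |
|---|---|---|---|---|
| U-Net | 91.45 | 84.68 | 91.47 | 91.93 |
| AttU_Net | 91.16 | 84.20 | 90.31 | 92.56 |
| ResU-Net | 90.82 | 83.65 | 89.3 | 92.9 |
| U2-Net | 91.32 | 84.53 | 91.65 | 91.48 |
| SwinUnet | 89.31 | 81.21 | 87.76 | 91.55 |
| TransUNet | 88.86 | 80.56 | 88.89 | 89.41 |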
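The four metrics in Tables 2 and 3 follow from pixel-level confusion counts: DSC = 2TP/(2TP + FP + FN), IoU = TP/(TP + FP + FN), precision = TP/(TP + FP) and sensitivity = TP/(TP + FN). The sketch computes them for one pair of binary masks given as flat 0/1 sequences. Whether the reported values average per-image scores over the 48 test images or pool pixels over the whole set is not stated in this section (the paired tests in Table 4 with n = 48 suggest per-image scores), so the aggregation step is left as an assumption; `segmentation_metrics` is a hypothetical helper name.

```python
import math


def segmentation_metrics(pred, truth):
    """DSC, IoU, precision and sensitivity for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1  # shelterbelt pixel correctly detected
        elif p:
            fp += 1  # background labelled as shelterbelt
        elif t:
            fn += 1  # shelterbelt pixel missed

    def ratio(num, den):
        return num / den if den else math.nan  # undefined, e.g. no positives at all

    return {
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),
        "IoU": ratio(tp, tp + fp + fn),
        "Precision": ratio(tp, tp + fp),
        "Sensitivity": ratio(tp, tp + fn),
    }


print(segmentation_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

For a single image IoU = DSC/(2 - DSC), but this identity does not hold exactly for averaged scores, so the mean IoU values in Table 3 are not an exact transform of the mean DSC values.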
Table 4. Matrix of p-values from pairwise statistical comparisons of the six models on the final test set (n = 48). Depending on whether the paired differences were normally distributed according to the Shapiro–Wilk test, either a paired-sample t-test or a Wilcoxon signed-rank test was applied. Smaller p-values indicate stronger evidence of a performance difference; differences are considered significant at α = 0.05.
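| | U-Net | AttU_Net | ResU-Net | U2-Net | SwinUnet | TransUNet |
|---|---|---|---|---|---|---|
| U-Net | - | 0.479 | 0.077 | 0.566 | <0.001 | <0.001 |
| AttU_Net | | - | 0.085 | 0.578 | <0.001 | <0.001 |
| ResU-Net | | | - | 0.129 | <0.001 | <0.001 |
| U2-Net | | | | - | <0.001 | <0.001 |
| SwinUnet | | | | | - | 0.915 |
| TransUNet | | | | | | - |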
Table 5. Computational efficiency and resource consumption of the evaluated models. Inference time (ms/image) is the average per-image wall-clock time obtained from 48 repeated inferences on the 48-image test set on the same hardware.
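| Method | Parameters (M) | Best epoch | Training time (min) | Inference time (ms/image) |
|---|---|---|---|---|
| U-Net | 4.32 | 123 | 14.19 | 6.8 |
| AttU_Net | 34.88 | 127 | 21.24 | 15.5 |
| ResU-Net | 13.04 | 176 | 29.74 | 13.3 |
| U2-Net | 44.01 | 174 | 50.77 | 43.4 |
| SwinUnet | 41.34 | 106 | 9.23 | 16.1 |
| TransUNet | 66.82 | 119 | 21.01 | 10.7 |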
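A per-image latency like the one in Table 5 is typically measured as total wall-clock time over the test images divided by the image count, after a few untimed warm-up calls. The hardware, framework and warm-up policy are not given in this section, so the sketch below uses a hypothetical `predict(image)` callable and an assumed warm-up of three images.

```python
import time


def mean_inference_ms(predict, images, warmup=3):
    """Average wall-clock milliseconds per image for a hypothetical predict(image) callable."""
    if not images:
        raise ValueError("images must be non-empty")
    for img in images[:warmup]:
        predict(img)  # warm-up runs (lazy initialisation, caches) are not timed
    start = time.perf_counter()
    for img in images:
        predict(img)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(images)


# Dummy model that sums pixel values, run over 48 fake "images".
fake_images = [[1.0] * 256 for _ in range(48)]
print(f"{mean_inference_ms(sum, fake_images):.4f} ms/image")
```

On a GPU, kernels run asynchronously, so the device must be synchronized before the clock is read (in PyTorch, `torch.cuda.synchronize()`); otherwise the measured time mostly reflects kernel launch overhead rather than computation.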
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


MDPI and ACS Style

Zhou, W.; Liu, L.; Liu, R.; Chen, F.; Yang, L.; Qin, L.; Lyu, R. Precise Mapping of Linear Shelterbelt Forests in Agricultural Landscapes: A Deep Learning Benchmarking Study. Forests 2026, 17, 91. https://doi.org/10.3390/f17010091

AMA Style

Zhou W, Liu L, Liu R, Chen F, Yang L, Qin L, Lyu R. Precise Mapping of Linear Shelterbelt Forests in Agricultural Landscapes: A Deep Learning Benchmarking Study. Forests. 2026; 17(1):91. https://doi.org/10.3390/f17010091

Chicago/Turabian Style

Zhou, Wenjie, Lizhi Liu, Ruiqi Liu, Fei Chen, Liyu Yang, Linfeng Qin, and Ruiheng Lyu. 2026. "Precise Mapping of Linear Shelterbelt Forests in Agricultural Landscapes: A Deep Learning Benchmarking Study" Forests 17, no. 1: 91. https://doi.org/10.3390/f17010091

APA Style

Zhou, W., Liu, L., Liu, R., Chen, F., Yang, L., Qin, L., & Lyu, R. (2026). Precise Mapping of Linear Shelterbelt Forests in Agricultural Landscapes: A Deep Learning Benchmarking Study. Forests, 17(1), 91. https://doi.org/10.3390/f17010091

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

