Operational Anomaly Screening in Permanent Basic Farmland Using Optimized Remote Sensing Semantic Segmentation: Implications for Sustainable Land Stewardship

Wang, Jianwen; Wang, Yujie; Cheng, Jiahao; Gao, Caiyun; Rong, Wei; Wang, Nan; Hu, Jian

doi:10.3390/su18094292

by Jianwen Wang ¹, Yujie Wang ², Jiahao Cheng ¹, Caiyun Gao ³, Wei Rong ², Nan Wang ² and Jian Hu ¹,*

¹ College of Economics and Management, Hebei Agricultural University, Baoding 071000, China

² College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071000, China

³ Baoding Trued Land Management Technology Service Co., Ltd, Baoding 071000, China

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(9), 4292; https://doi.org/10.3390/su18094292

Submission received: 5 April 2026 / Revised: 23 April 2026 / Accepted: 23 April 2026 / Published: 26 April 2026

Abstract

Cropland protection enforcement is central to food security and sustainable land management, yet small-scale encroachments within Permanent Basic Farmland (PBF) boundaries frequently evade conventional field surveys and reactive inspection regimes. Existing remote sensing approaches rely mainly on comprehensive land-cover classification or bi-temporal change detection, which often generate alerts beyond the regulatory scope and require annotation efforts that limit county-scale deployment. To address this gap, this study reframes PBF monitoring as a boundary-constrained anomaly screening task, defined as the detection of surface conditions that deviate from expected cultivation norms within legally defined parcels. To operationalise this task, we adapt a DeepLabv3+-based segmentation pipeline by incorporating an auxiliary edge branch and a composite loss to improve sensitivity to minority-class anomalies and preserve fragmented parcel boundaries. The model is trained on the LoveDA dataset and evaluated in Mancheng District, Hebei Province, China, without site-specific fine-tuning. Multi-temporal imagery from 2021 to 2023 is further used as a post hoc consistency check to distinguish persistent anomalies from transient surface conditions, rather than to model temporal dynamics explicitly. Cross-regional zero-shot evaluation further examines model robustness under heterogeneous environmental conditions. Benchmarked against five comparison architectures, the adapted pipeline achieves a Recall of 61.25%, representing a 10.24 percentage-point improvement over DeepLabv3+ and expanding the set of candidate encroachments for field verification. This result should be interpreted in terms of screening sensitivity rather than overall segmentation optimisation. The outputs are intended as preliminary screening leads that support, rather than replace, expert review. 
The principal contribution of this study therefore lies in reframing PBF monitoring as an operational anomaly-screening task aligned with enforcement needs, rather than in proposing a fundamentally new segmentation architecture.

Keywords:

cropland protection; sustainable land management; anomaly screening; remote sensing semantic segmentation

1. Introduction

Food security and sustainable agricultural production are among the most pressing challenges of our time, underpinning several Sustainable Development Goals (SDGs) of the United Nations 2030 Agenda [1]. At their foundation lies cropland protection: by preserving arable land as the primary resource for grain production, cropland safeguards serve the zero hunger goal, particularly in regions undergoing rapid urbanization. In peri-urban transition zones such as Mancheng District, China, cropland monitoring also bears on the sustainable cities and communities goal, as directing urban growth toward compact development patterns is inseparable from limiting encroachment on fertile agricultural land [2,3]. Beyond productivity, PBF constitutes a component of terrestrial ecosystems that provides biodiversity habitat, regulates soil and water, and sequesters carbon, all of which are functions directly relevant to the life on land goal [4,5]. Unauthorized activities within protected boundaries, including illegal hardening, construction, and surface modification, erode these ecosystem functions, contributing to habitat fragmentation, land degradation, and heightened climate vulnerability [6].

China’s cultivated land is subject to compounding pressures: urban sprawl, infrastructure expansion, field abandonment, and the shift away from grain cultivation. Urban growth alone consumed approximately 159,200 km² of cropland between 1992 and 2016—equivalent to half the total urban expansion area over that period—placing China among the most severely affected countries worldwide [2]. The consequences extend beyond physical conversion; declining productivity and non-grain land use erode the long-term stability of national food supply, reinforcing the case for stronger, more integrated protection frameworks [4,5].

In response, China has established legally binding protections through PBF designation and high-standard farmland programs [7], yet translating these mandates into consistent enforcement remains challenging. Cropland governance operates within a state-driven food security framework implemented through the cultivated land “red line” policy, the PBF designation system, and the “Chief Farmland Officer” accountability mechanism, supported by the Ecological Civilization agenda. These instruments transmit strict targets through provincial, municipal, and county administrations. In peri-urban zones such as Mancheng District, however, top-down protection mandates intersect with competing local development interests, creating institutional tensions that shape both violation patterns and resource allocation. The result is uneven implementation: local capacity, fiscal constraints, and performance pressures mediate how monitoring obligations are executed in practice. Unauthorized land-use changes, including illegal construction and surface hardening, frequently alter agricultural conditions before administrative records are updated. Although periodic inspections are formally required [8,9], the geographic scope and inspection frequency far exceed county-level agency capacity, relegating enforcement to a reactive mode [10]. With finite personnel and budgets, inspection efforts are concentrated in administratively accessible or politically prioritized areas, leaving other jurisdictions systematically underserved [11,12]. Encroachment patterns are further influenced by broader contextual factors. Socioeconomic pressures are often associated with higher encroachment risks near urban margins, where land values and development incentives are elevated. Soil conditions and cultivation practices affect surface characteristics captured in remote sensing imagery, while seasonal dynamics introduce temporal variability that complicates single-date interpretation. 
Climatic variability also alters vegetation cover and bare-soil exposure, influencing detection sensitivity and contributing to both false positives and missed detections [13]. Rather than modelling these drivers explicitly, this study focuses on the operational detection stage, identifying spatial anomalies that warrant field verification and thereby supporting more systematic and potentially more equitable allocation of limited inspection capacity [14].

Similar tensions between large-scale agricultural monitoring and resource-constrained enforcement have prompted remote sensing adoption in other national contexts. Self-supervised learning has enabled efficient land-cover mapping across United States agricultural regions without dense labelling requirements [15]. Polygon-matching methods have supported cross-referencing of satellite-derived field boundaries against cadastral records in several European countries, facilitating anomaly detection within protected designations [16]. Multi-temporal semantic segmentation has been applied to crop-type and land-cover mapping in transboundary watersheds, while multi-country benchmark datasets have advanced model generalisation across diverse agricultural landscapes [17,18]. Systematic evaluations of Sentinel-2-based classification have further demonstrated the operational viability of satellite-driven monitoring for sustainable land management in Europe [19].

Conventional monitoring instruments, including field inspections, administrative self-reporting, and periodic land surveys, are ill-suited to these demands. Their labor intensity, cost, and inherent time lags make it difficult to generate the timely, spatially precise intelligence that proactive enforcement requires. In rapidly changing peri-urban landscapes, agencies are caught between two unacceptable options: broadening patrol coverage to a point that is financially unsustainable, or narrowing it in ways that allow early encroachments to go undetected. What is needed is a lightweight and repeatable screening tool capable of systematically flagging suspicious deviations within PBF boundaries, so that limited inspection capacity can be directed where it is most needed [20].

Remote sensing combined with deep learning offers a practical path toward such a tool. Satellite imagery provides temporally rich observations of surface conditions and vegetation status, while deep learning methods have demonstrated strong performance in change detection [21,22,23]. Recent advances in remote sensing monitoring, including semantic change detection, boundary-aware localisation, and cropland-oriented bi-temporal monitoring frameworks, provide an important methodological reference for the present study. At the same time, broader remote sensing analyses of land-use/cover dynamics and land degradation, together with advances in deep segmentation architectures, further underscore the potential of satellite-based approaches for land monitoring [24,25]. However, temporally explicit monitoring approaches generally share a common premise: they rely on co-registered bi-temporal or multi-temporal inputs and are designed to characterise transitions between acquisition dates. The regulatory problem addressed here is distinct. The central question is not what changed between two observations, but whether the current surface condition of a legally protected PBF parcel deviates from expected cultivation practice and therefore warrants verification.

This distinction motivates a shift from change detection to anomaly screening. Change detection is temporally relational, characterising whether and how a location has shifted between observations. Anomaly screening, as framed here, is norm-referenced and regulation-constrained: a parcel is flagged when its observed surface condition is inconsistent with the continued cultivation use expected of protected parcels within a legally defined boundary, irrespective of whether a temporal transition has occurred. In this study, “expected cultivation norms” refer to observable surface conditions consistent with the continued agricultural use of legally protected PBF parcels. For operational purposes, these norms are translated into a binary distinction between compliant cultivated surfaces and anomalous surfaces indicating likely non-agricultural occupation or persistent deviation from cultivation use. Seasonal harvesting may register as change without constituting a violation; conversely, a persistent unauthorized structure may be identifiable from a single observation without a known transition date. The proposed framework is therefore best understood as a current-state, boundary-constrained screening approach for compliance prioritisation, in which labels reflect regulatory consistency rather than generic land-cover categories, complementary to rather than substituting for temporally explicit change-detection methods.

We accordingly reframe PBF monitoring as targeted anomaly screening: the identification of surface conditions within legal PBF boundaries that deviate from expected cultivation norms, including building encroachments, hardened surfaces, persistent bare soil inconsistent with normal cultivation cycles, and compaction. Outputs are not definitive violation determinations; they are prioritised investigation leads intended to direct expert review and field verification [26]. To operationalise this task, we implement an adapted segmentation pipeline that uses official PBF boundary polygons as spatial masks and optical satellite imagery as the observational basis, producing parcel-level binary anomaly maps restricted to protected parcels for direct integration into Geographic Information System (GIS) platforms and inspection workflows. Confining outputs to protected zones reduces false-alarm burden and supports a detection–localisation–verification logic without requiring fine-grained class labels, costly annotation, or heavy computational infrastructure [27]. Standardised, reproducible outputs also reduce the role of subjective judgment in setting inspection priorities, supporting more consistent and equitable enforcement across jurisdictions. The framework is intended as an operational complement to existing governance arrangements rather than a replacement for them.

Against this backdrop, this study adopts a circumscribed methodological scope, eschewing novel architectural innovations in favor of demonstrating the adaptation of established segmentation frameworks to regulation-specific anomaly detection. The research yields three principal contributions: First, it reframes PBF monitoring from comprehensive land-use classification toward deviation-centric anomaly screening, establishing a norm-referenced regulatory paradigm complementary to conventional change detection methodologies. Second, it operationalizes this framework through a boundary-constrained single-date screening pipeline that generates parcel-level binary anomaly maps in contexts where consistent multi-temporal archives are unavailable. In the present study, multi-year imagery is used only for a post hoc consistency check on independently generated single-date predictions, not as input to a temporally explicit model. Third, it demonstrates how task-oriented implementation enhances consistency and operational efficiency within hierarchically structured regulatory systems, with multi-temporal analysis constituting a natural trajectory for strengthening temporal discrimination in future research.

2. Materials and Methods

2.1. Study Area

The study area is located in Mancheng District, Baoding City, Hebei Province, China (Figure 1), spanning 115°10′–115°30′ E and 38°50′–39°10′ N. The landscape comprises a heterogeneous mosaic of cropland interspersed with rural settlements, roads, and facility agriculture, where field parcels are highly fragmented with complex boundaries. Two challenges are commonly encountered in this context: boundary adhesion between adjacent parcels with similar spectral or textural characteristics, and localized encroachment of non-agricultural land uses into cropland areas. These conditions make the study area a representative testbed for evaluating parcel boundary segmentation performance and detecting anomalous encroachment within the PBF.

2.2. Data Sources

Remote sensing data comprised Gaofen-2 (GF-2) imagery, supplemented by basemap imagery from the National Tianditu platform, China’s official government mapping service. GF-2 provides panchromatic and multispectral resolutions of 0.8 m and 3.2 m, respectively, which support the detection of subtle structural changes along field margins. All imagery was clipped in ArcGIS 10.7 and exported as standard image chips for model input. The model was trained exclusively on the publicly available LoveDA dataset [28], which contains 5987 high-resolution images from Nanjing, Changzhou, and Wuhan. Training augmentations included random flips, rotations, scaling, per-channel standardisation, and random cropping. Only three-channel Red, Green, Blue (RGB) images were used, without additional constraints on spatial resolution, cloud cover, or atmospheric correction, in order to better reflect realistic deployment conditions.

The primary evaluation data consisted of multi-temporal GF-2 imagery from Mancheng District acquired in 2021, 2022, and 2023. No Mancheng data were used during training. The 2022 imagery served as the main evaluation set because it corresponded to the PBF boundary designation year, while the 2021 and 2023 images provided temporal context for distinguishing persistent encroachments from transient surface conditions. The PBF boundary dataset also corresponded to the 2022 designation year and was obtained from the National Tianditu platform, the PBF query platform, and official vector-data applications. All three annual image sets were co-registered to the 2022 boundary vectors to ensure spatial consistency. These boundaries served as legal and spatial constraints, and all outputs were restricted to the PBF extent to reduce false alarms and improve compliance relevance.
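The boundary constraint described above can be illustrated with a minimal sketch. It assumes the 2022 PBF polygons have already been rasterised onto the same grid as the prediction; the function name `restrict_to_pbf` and the toy arrays are hypothetical and are not taken from the study's code.

```python
import numpy as np

def restrict_to_pbf(pred_mask: np.ndarray, pbf_mask: np.ndarray) -> np.ndarray:
    """Keep anomaly predictions only where the rasterised PBF mask is set."""
    if pred_mask.shape != pbf_mask.shape:
        raise ValueError("prediction and boundary mask must share the same grid")
    # Pixels outside the protected extent are forced to 0 (no alert).
    return np.where(pbf_mask.astype(bool), pred_mask, 0).astype(np.uint8)

# Toy 3x4 tile: 1 = predicted anomaly in `pred`, 1 = inside a PBF parcel in `pbf`.
pred = np.array([[1, 1, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 1, 1]], dtype=np.uint8)
pbf = np.array([[0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 0]], dtype=np.uint8)

screened = restrict_to_pbf(pred, pbf)
print(screened)
```

In this toy case, the three alerts falling outside the protected extent are discarded and only the three inside it remain, which is the mechanism by which out-of-scope alerts are suppressed.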

The task was formulated as binary semantic segmentation within PBF boundaries, distinguishing compliant cropland from anomalous areas that deviated from expected cultivation practices. Compliant areas were defined as parcels showing normal tillage textures without hardened surfaces or constructed features, whereas anomalous areas included suspected buildings, impervious surfaces, road extensions, and temporary structures. Ground-truth labels were derived from the 2022 Mancheng imagery. Two specialists independently delineated anomalous regions as polygons in ArcGIS, and the annotations were subsequently checked for geometric and topological consistency before being rasterised into pixel-level masks. The 2021 and 2023 images were not independently annotated but were used for post hoc temporal comparison to assess anomaly persistence across years. Accordingly, temporal persistence in this study should be interpreted as an inference based on cross-year consistency of single-date predictions rather than as a fully validated temporal ground-truth assessment. Annotator disagreements were resolved through consensus, and all revisions were logged for traceability. On a 50-tile validation sample selected to cover the main feature types and scene conditions relevant to this task, including compliant cultivated parcels, built structures, hardened surfaces, road extensions, and spectrally ambiguous bare-soil-like areas, inter-annotator agreement reached a mean Intersection over Union (IoU) of 0.85 and a Cohen’s Kappa of 0.82. The sample was designed to include both anomalous and non-anomalous parcels as well as varying boundary complexity, in order to assess annotation consistency under representative labeling conditions. Tiles scoring below 0.70 on either metric were refined before inclusion. For training, the seven original LoveDA classes were remapped to match the network indexing scheme, with invalid pixels excluded from loss computation. 
Full seven-class supervision was retained during training to preserve scene context, whereas evaluation was restricted to the cropland and non-cropland categories relevant to PBF anomaly screening. In addition, the trained model was evaluated on seven regions from the OpenEarthMap dataset, namely Baybay (Philippines), Świętokrzyskie (Poland), Münster and Köln (Germany), Mahe (Seychelles), Lohur (Pakistan), and Lambayeque (Peru), to support cross-regional generalization evaluation [29]. For this evaluation, the original OpenEarthMap multiclass labels were remapped into a binary cropland/non-cropland scheme aligned with the anomaly-screening objective of this study. Pixels annotated as cropland were assigned to the cropland class, whereas all remaining classes were merged into the non-cropland class.
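The binary remapping of the OpenEarthMap labels can be sketched as follows. The class index used for cropland (`CROPLAND_ID = 7`) is a placeholder assumption for illustration; the actual index should be taken from the dataset's own class table.

```python
import numpy as np

CROPLAND_ID = 7  # hypothetical index of the cropland class; check the dataset docs

def to_binary_cropland(labels: np.ndarray, cropland_id: int = CROPLAND_ID) -> np.ndarray:
    """Map a multiclass label raster to 1 = cropland, 0 = non-cropland."""
    # Every class other than cropland is merged into the non-cropland class.
    return (labels == cropland_id).astype(np.uint8)

# Toy multiclass tile with arbitrary class indices.
labels = np.array([[7, 3, 2],
                   [8, 7, 1]], dtype=np.uint8)
binary = to_binary_cropland(labels)
print(binary)
```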

2.3. Cross-Regional Evaluation Design

Cross-regional generalization of the proposed DeepLabPro framework was evaluated through zero-shot inference on seven representative regions from the OpenEarthMap dataset: Baybay (Philippines), Świętokrzyskie (Poland), Münster (Germany), Köln (Germany), Mahe (Seychelles), Lohur (Pakistan), and Lambayeque (Peru). These sites were selected to capture the environmental and land-use diversity relevant to operational cropland monitoring. The regions encompass tropical, temperate, arid, and maritime climates and therefore reflect variations in vegetation dynamics, seasonal patterns, and surface reflectance. Agriculturally, they include both fragmented smallholder systems and large-scale mechanized fields, illustrating differences in parcel geometry, management practices, and land-use heterogeneity. Surface complexity also varies, with mixed cover types, bare-soil exposure, and irregular parcel boundaries that are known to affect segmentation performance.

The evaluation does not explicitly model climatic, soil, or socioeconomic drivers. Instead, it empirically examines generalization performance under conditions in which these factors are implicitly expressed through observable surface patterns. All experiments followed a strict zero-shot protocol without target-domain fine-tuning, mirroring realistic deployment scenarios in which local labeled data are unavailable.

2.4. Methods

This study addresses the detection stage of cropland protection auditing by formulating anomalous land-occupancy screening as a pixel-level binary semantic segmentation task based on high-resolution optical imagery. The analysis is primarily conducted on multi-temporal data from Mancheng District, with cross-regional evaluation introduced as a supplementary assessment to examine model robustness under varying environmental conditions.

To implement this task within PBF boundaries, an edge-augmented adaptation of DeepLabv3+ is employed. The introduction of an auxiliary edge branch and a composite loss function is motivated by the need to address fragmented parcel structures and severe class imbalance, rather than to propose architectural novelty. The resulting framework adopts a unified end-to-end structure with two parallel outputs, a segmentation branch for anomaly detection and an auxiliary branch for boundary delineation, both sharing a common encoder and trained jointly.

PBF boundaries define the auditing scope and constrain both imagery and labels during data preparation. Model generalization is assessed under a zero-shot cross-regional setting without fine-tuning on target-domain data. For temporal analysis, the trained model is applied independently to imagery from 2021, 2022, and 2023 in Mancheng District, and predictions are compared across years to distinguish persistent encroachments from transient surface conditions. This comparison is conducted as a post hoc consistency check based on single-date predictions and does not involve temporal modelling during training. Figure 2 presents the overall architecture of the proposed screening framework.

2.4.1. Task-Oriented Adaptation of the Baseline Architecture

The implementation builds on DeepLabv3+, an encoder–decoder architecture with Atrous Spatial Pyramid Pooling (ASPP) for multi-scale feature extraction. Relative to the baseline DeepLabv3+, the present implementation retains the core encoder–decoder structure and ASPP module while introducing task-oriented adaptations to address challenges specific to high-resolution agricultural scenes, including fragmented parcels and class imbalance. These adaptations include an auxiliary edge branch and a composite loss function, both introduced to improve boundary-sensitive anomaly screening performance rather than to constitute a fundamentally new segmentation architecture. All network parameters, including both the backbone and the newly introduced components, were trained end-to-end after initialization from LoveDA pretraining, and no Mancheng data were used for model selection or target-domain tuning.

The resulting DeepLabv3+-based implementation, referred to here as DeepLabPro for convenience, retains the encoder–decoder paradigm of DeepLabv3+ while modifying three components: the backbone used for feature extraction, the decoder, and an attention-based refinement stage.

Backbone: ResNeXt-101 (32 × 8d) serves as the feature extractor. Its grouped convolutions with increased cardinality improve the representation of subtle cropland textures without a proportional increase in parameters. Feature maps from four backbone stages (L1–L4) are extracted for subsequent decoding and edge-aware processing.

Multi-scale context and cascaded decoding: The ASPP module is retained with dilation rates of 6, 12, and 18. In the decoding stage, the standard single-step fusion is replaced by a cascaded decoder that progressively integrates high-level semantic features (L4) with mid-level features (L2) and low-level spatial details (L1), improving boundary preservation and reducing over-smoothing. At each fusion step, high-level features are upsampled via bilinear interpolation, fused with the corresponding lower-level features through concatenation, and refined by a 3 × 3 convolution–Batch Normalisation–ReLU block for channel reduction and feature consolidation.

Attention refinement: A Convolutional Block Attention Module (CBAM) is embedded at the final decoder fusion stage. Channel attention followed by spatial attention adaptively reweights feature responses, suppressing background noise and highlighting discriminative cues associated with cropland and built-up surfaces.
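To give a sense of the spatial context covered by the ASPP dilation rates listed above, the short calculation below applies the standard relation for a dilated convolution, k_eff = k + (k − 1)(d − 1). The 3 × 3 kernel size is an assumption based on the usual DeepLabv3+ design and is not stated explicitly in the text.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Side length (in pixels) spanned by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# Dilation rates reported for the ASPP module; 3x3 kernels are assumed.
spans = {rate: effective_kernel(3, rate) for rate in (6, 12, 18)}
print(spans)  # {6: 13, 12: 25, 18: 37}
```

Under this assumption, the three branches sample context over windows of 13, 25, and 37 feature-map pixels per side, which is what allows small encroachments and larger parcels to be represented at the same time.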

The proposed framework was implemented from scratch using Python 3.8 and the PyTorch 2.0.1 deep learning library. All models were trained and evaluated on an Ubuntu workstation equipped with an Intel Xeon CPU and a single NVIDIA RTX 5880 Ada GPU (48 GB memory).

2.4.2. Auxiliary Edge Branch Configuration

To alleviate parcel adhesion and improve boundary delineation in densely fragmented agricultural scenes, a Dual Edge Branch is introduced in parallel with the main segmentation stream. Rather than relying on high-level semantic representations, this auxiliary pathway leverages low- and mid-level features from shallow backbone stages (L1 and L2), which preserve fine spatial details relevant to parcel boundaries. These features are fused and processed by a lightweight convolutional head to predict a binary edge map, thereby providing complementary boundary cues that are not always captured by the primary segmentation pathway.

Edge supervision targets are derived from the rasterised anomaly masks by extracting fixed-width boundary contours using morphological gradient operations. This supervision is applied exclusively during training to encourage the learning of continuous and well-defined parcel boundaries and to reduce adhesion between adjacent regions. During inference, the edge branch is not required, and predictions are generated solely from the primary segmentation pathway. The edge branch is therefore best understood as a task-oriented refinement that enhances boundary-sensitive anomaly screening, rather than as a standalone architectural innovation.
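The derivation of edge targets by morphological gradient can be sketched with NumPy alone. This is an illustrative reconstruction, not the authors' code: the square structuring element, the `radius` parameter controlling contour width, and the function names are assumptions.

```python
import numpy as np

def _shifted_stack(mask: np.ndarray, radius: int) -> np.ndarray:
    """Stack all translations of `mask` within a (2r+1) x (2r+1) window."""
    padded = np.pad(mask, radius, mode="edge")  # replicate borders to avoid fake edges
    h, w = mask.shape
    size = 2 * radius + 1
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(size) for dx in range(size)])

def edge_target(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Morphological gradient (dilation minus erosion) of a binary mask."""
    stack = _shifted_stack(mask.astype(np.uint8), radius)
    dilated = stack.max(axis=0)  # 1 if any neighbour in the window is anomalous
    eroded = stack.min(axis=0)   # 1 only if every neighbour is anomalous
    return (dilated - eroded).astype(np.uint8)

# Toy 5x5 mask with a 3x3 anomalous patch in the centre.
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:4, 1:4] = 1
edges = edge_target(mask)
print(edges)
```

The resulting band straddles the patch boundary on both sides, and a larger `radius` would widen the contour used as the supervision target.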

2.4.3. Composite Loss Function

To address severe class imbalance between the minority of abnormal pixels and the majority of background pixels, as well as boundary ambiguity, a composite objective was adopted for optimization. The total loss was defined as

$$L_{total} = 0.3 \cdot L_{CE}^{w} + 0.3 \cdot L_{FT} + 0.4 \cdot L_{Lovasz} + 0.2 \cdot L_{edge} \tag{1}$$

where the four terms correspond to weighted cross-entropy, Focal Tversky, Lovász–Softmax, and edge supervision losses, respectively.

(1) Weighted cross-entropy ($L_{CE}^{w}$): Higher pixel weights were assigned to conflict zones at the cropland–built-up interface, directing the model’s attention toward visually ambiguous boundary pixels.
(2) Focal Tversky loss ($L_{FT}$): This loss combined the hard-sample mining capability of Focal Loss with the recall-oriented optimization of Tversky Loss, reducing missed detections of small and fragmented abnormal patches [30].
(3) Lovász–Softmax loss ($L_{Lovasz}$): This loss optimized a tractable surrogate of the mean Intersection over Union, improving the geometric consistency and overall boundary quality of the segmentation results.
(4) Edge supervision loss ($L_{edge}$): This auxiliary term was applied to the edge branch during training to enhance boundary continuity and improve the delineation of adjacent parcels in fragmented agricultural landscapes. It provides auxiliary boundary guidance without dominating the overall optimization objective.

The coefficients of the composite loss were empirically determined through pilot experiments to balance anomaly detection sensitivity and boundary delineation. No exhaustive sensitivity analysis over alternative coefficient settings was conducted, which is acknowledged as a limitation of the present study.
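How the four terms combine can be illustrated with a small NumPy sketch. Only the coefficients are taken from Equation (1); the Focal Tversky parameters (`alpha`, `beta`, `gamma`), the exponent convention, and the function names are assumptions chosen for illustration, since the text does not report them.

```python
import numpy as np

# Coefficients as printed in Equation (1).
WEIGHTS = {"ce": 0.3, "focal_tversky": 0.3, "lovasz": 0.4, "edge": 0.2}

def focal_tversky(prob, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss on soft predictions; alpha > beta penalises misses more.

    alpha, beta and gamma are illustrative values, not those used in the study.
    """
    prob = np.asarray(prob, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    tp = np.sum(prob * target)
    fn = np.sum((1.0 - prob) * target)
    fp = np.sum(prob * (1.0 - target))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return float((1.0 - tversky) ** gamma)

def total_loss(terms):
    """Weighted sum of already-computed loss terms, following Equation (1)."""
    return sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)

target = np.array([1, 1, 0, 0])
loss_miss = focal_tversky(np.array([1, 0, 0, 0]), target)         # one missed anomaly
loss_false_alarm = focal_tversky(np.array([1, 1, 1, 0]), target)  # one false alarm
print(loss_miss > loss_false_alarm)  # True
```

With `alpha` larger than `beta`, a missed anomalous pixel is penalised more heavily than a false alarm, which matches the recall-oriented intent described above. Note also that the coefficients as printed sum to 1.2 rather than 1, so the objective is a weighted sum rather than a convex combination of its terms.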

During inference, the trained model processes single-date image chips and produces binary anomaly maps restricted to the PBF extent. For temporal consistency analysis, predictions from the 2021, 2022, and 2023 Mancheng acquisitions were overlaid and compared at the parcel level. For operational prioritization, parcels flagged in at least two of the three annual predictions were treated as temporally recurrent screening signals and assigned higher priority for field verification, whereas parcels flagged in only one year were treated as lower-priority candidates because they were more likely to reflect transient conditions or prediction noise. This rule is heuristic and is used only as a post hoc prioritization step; it should not be interpreted as strict temporal validation or as proof of ongoing illegal development.
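The two-of-three prioritisation rule can be expressed as a short sketch. Parcel identifiers and the function name `prioritise` are hypothetical; the threshold of two years follows the rule stated above.

```python
from typing import Dict, Set

def prioritise(flags_by_year: Dict[int, Set[str]], min_years: int = 2) -> Dict[str, str]:
    """Assign 'high' priority to parcels flagged in at least `min_years` years."""
    counts: Dict[str, int] = {}
    for flagged in flags_by_year.values():
        for parcel_id in flagged:
            counts[parcel_id] = counts.get(parcel_id, 0) + 1
    return {pid: ("high" if n >= min_years else "low") for pid, n in counts.items()}

# Hypothetical parcel IDs flagged by the single-date predictions of each year.
flags = {
    2021: {"P001", "P002"},
    2022: {"P001", "P003"},
    2023: {"P001", "P003"},
}
priority = prioritise(flags)
print(sorted(priority.items()))
# [('P001', 'high'), ('P002', 'low'), ('P003', 'high')]
```

Parcels never flagged in any year do not appear in the output, consistent with the rule being a ranking of screening leads rather than a classification of all parcels.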

In addition, during manuscript preparation, the authors used the generative AI tool Gemini 3 (Google) for limited auxiliary tasks, specifically for figure preparation and Chinese–English translation of draft text. The tool was not used for data collection, model development, remote sensing analysis, result generation, or interpretation. All AI-assisted outputs were reviewed, revised, and verified by the authors, who take full responsibility for the scientific content and conclusions of this study.

3. Results

3.1. Quantitative Performance Assessment of the Segmentation Network

DeepLabPro was benchmarked against five semantic segmentation architectures: the classical convolutional designs PSPNet, U-Net, and DeepLabv3+, and the recent representative models SegFormer and HRNet. All models were pre-trained on LoveDA and evaluated on the Mancheng District test set without site-specific fine-tuning, reflecting a zero-shot cross-regional protocol representative of operational settings where target-domain labels are unavailable. Performance was assessed using mIoU, Overall Accuracy (OA), F1-score, Precision, and Recall, which jointly characterise pixel-level accuracy, prediction conservativeness, and class-imbalance handling in fragmented agricultural landscapes [31].
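For reference, the five metrics can be computed from a pair of binary masks as in the sketch below. It assumes the anomaly class is treated as the positive class and that mIoU is the unweighted mean over the two classes; the exact averaging convention used in the study is not stated, so these choices are assumptions.

```python
import numpy as np

def _ratio(num: int, den: int) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def screening_metrics(pred, truth) -> dict:
    """Precision, Recall, F1, OA and two-class mIoU for binary masks."""
    p = np.asarray(pred).astype(bool).ravel()
    t = np.asarray(truth).astype(bool).ravel()
    tp = int(np.sum(p & t))
    fp = int(np.sum(p & ~t))
    fn = int(np.sum(~p & t))
    tn = int(np.sum(~p & ~t))
    iou_anomaly = _ratio(tp, tp + fp + fn)
    iou_compliant = _ratio(tn, tn + fp + fn)
    return {
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "oa": _ratio(tp + tn, tp + tn + fp + fn),
        "miou": (iou_anomaly + iou_compliant) / 2.0,
    }

# Toy example: 3 anomalous pixels in the reference, 2 of them detected, 1 false alarm.
truth = np.array([1, 1, 1, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])
metrics = screening_metrics(pred, truth)
print(metrics)
```

Recall is the share of reference anomaly pixels that are flagged, so it is the quantity most directly tied to missed encroachments in a screening setting.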

Quantitative results are summarised in Table 1. DeepLabPro achieves an mIoU of 42.23%, OA of 67.94%, F1-score of 57.08%, Precision of 67.92%, and the highest Recall of 61.25% among all evaluated models. SegFormer leads on mIoU at 43.21%, F1-score at 59.62%, and Precision at 70.58%, yet its Recall of 56.05% falls below that of DeepLabPro by 5.20 percentage points.

This divergence reflects a recurring trade-off in screening-oriented segmentation: architectures optimised for global overlap metrics tend toward conservative minority-class predictions, improving Precision at the cost of anomaly sensitivity. DeepLabPro sustains competitive Precision while improving Recall from 51.01% in DeepLabv3+ to 61.25%, an increase of 10.24 percentage points. This improvement is specific to Recall and should be interpreted as enhanced screening sensitivity rather than a general gain across all evaluation metrics [32]. In operational terms, these predictions should be interpreted as an expanded shortlist of candidate parcels for follow-up review rather than as confirmed violation sites.

The operational significance of this distinction is pronounced in PBF enforcement. Missed detections expose parcels to irreversible surface hardening or structural occupation, whereas false positives impose only the marginal cost of an additional site visit. Given this asymmetric cost structure, Recall is the more relevant indicator of practical utility, and DeepLabPro’s profile of marginally lower Precision than SegFormer but markedly higher Recall better suits the demands of compliance screening.
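The asymmetric-cost argument can be made concrete with a small calculation. The sketch below is a simplification under stated assumptions: it treats the pixel-level Precision and Recall reported above as if they applied to 100 units of true anomaly, and the cost weights (`cost_fn`, `cost_fp`) are hypothetical values, not figures from the study.

```python
def expected_cost(precision, recall, cost_fn, cost_fp, positives=100.0):
    """Total error cost per `positives` units of true anomaly."""
    tp = positives * recall
    fn = positives - tp                      # missed anomalies
    fp = tp * (1.0 - precision) / precision  # false alarms implied by Precision
    return cost_fn * fn + cost_fp * fp

# Precision and Recall as reported in Section 3.1.
deeplabpro = {"precision": 0.6792, "recall": 0.6125}
segformer = {"precision": 0.7058, "recall": 0.5605}

# Hypothetical cost ratios: equal costs, then a miss costing ten site visits.
for cost_fn, cost_fp in [(1.0, 1.0), (10.0, 1.0)]:
    c_pro = expected_cost(**deeplabpro, cost_fn=cost_fn, cost_fp=cost_fp)
    c_seg = expected_cost(**segformer, cost_fn=cost_fn, cost_fp=cost_fp)
    print(f"c_FN={cost_fn}, c_FP={cost_fp}: "
          f"DeepLabPro={c_pro:.1f}, SegFormer={c_seg:.1f}")
```

Under equal costs the two models are nearly indistinguishable, with SegFormer marginally lower. Once a missed detection is weighted more heavily than a false alarm, the higher-Recall profile yields the lower total cost.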

Differences across the remaining architectures reflect varying limitations in spatial detail recovery and class-imbalance handling. PSPNet’s pyramid pooling over-smooths fine-grained predictions, causing small encroachments to be suppressed; U-Net recovers local geometry through skip connections but lacks the semantic depth needed for reliable anomaly discrimination; DeepLabv3+ strengthens contextual consistency yet remains prone to inter-parcel adhesion in fragmented landscapes; HRNet maintains high-resolution representations throughout but still underperforms on both mIoU and Recall, confirming that resolution preservation alone does not resolve class-imbalance sensitivity. DeepLabPro addresses these shortcomings through an auxiliary edge branch targeting parcel adhesion and a composite loss function weighted toward minority-class pixels, two design choices that together prioritise anomaly localisation over aggregate accuracy. Their individual contributions are examined in the ablation study that follows. It should be noted that the five comparison architectures represent a range of established convolutional and Transformer-based designs rather than an exhaustive survey of available segmentation models; other architectures may yield different performance profiles under the same evaluation protocol.

3.2. Ablation Study: Effectiveness of Key Components

To evaluate the contribution of each component, three ablated variants were tested under the same zero-shot protocol: removal of the Convolutional Block Attention Module (CBAM), removal of the auxiliary edge branch, and replacement of the composite loss with standard cross-entropy. All models were pre-trained on LoveDA and evaluated on the Mancheng District test set without site-specific fine-tuning, with all other settings kept identical. Results are reported from a single training and evaluation run under fixed conditions; while repeated runs or confidence intervals would provide a more robust estimate of variability, the present comparison is intended as a controlled assessment of component-wise effects.

Quantitative results are presented in Table 2. The complete DeepLabPro exhibits the most balanced performance across metrics relevant to the screening task, particularly in terms of anomaly sensitivity and boundary-aware segmentation quality, whereas overall accuracy is not considered the primary indicator of effectiveness in this context. In this setting, the evaluation follows a recall-oriented perspective, where avoiding missed detections is prioritised over maximising overall accuracy.

As shown in Table 2, the largest performance decline occurs when the composite loss is replaced with standard cross-entropy, with mIoU decreasing by 6.10 percentage points, highlighting the importance of explicitly addressing class imbalance. Removing the edge branch leads to a 2.13 percentage point reduction in mIoU and a noticeable drop in OA, underscoring the role of boundary learning in separating adjacent parcels and delineating anomalies. In contrast, removing CBAM results in a slight increase in OA but a reduction in F1-score. This pattern is more appropriately interpreted as a trade-off between conservative background prediction and screening sensitivity, rather than as a clear improvement or deterioration. In this anomaly-screening setting, OA is secondary to the preservation of anomaly-related signals, and the contribution of CBAM lies in supporting a more balanced precision–recall relationship rather than maximising overall accuracy alone.

3.2.1. Enhancement of Structural Integrity in Encroachment Masks

Illegal encroachments in PBF monitoring encompass heterogeneous targets, including roads, hardened surfaces, and non-agricultural facilities. For baseline segmentation models without an edge branch, maintaining spatial continuity of these features remains challenging. Progressive feature downsampling weakens fine structural cues, leading to incomplete delineation of encroachment extents. As a result, predictions often exhibit fragmented morphology and eroded boundaries, with continuous violations incorrectly split into discrete clusters and the occupied area consequently underestimated [33].

This effect is visible in the ablated variants (Figure 3a–c), where anomalies within the cyan-boxed region appear discontinuous, with gaps and irregular margins that deviate from the observed ground features. In contrast, DeepLabPro (Figure 3d), which incorporates the Dual Edge Branch, produces more coherent predictions and improves structural integrity. Edge supervision encourages the retention of boundary information, resulting in more compact masks with improved continuity. Linear features such as roads and areal structures are therefore more frequently delineated as unified objects rather than dispersed fragments, which in turn supports more reliable area estimation for land-use compliance assessment. Nevertheless, narrow linear features and geometrically complex structures remain challenging even for DeepLabPro, and the example in Figure 3 should therefore be interpreted as an illustration of improved structural preservation rather than as evidence of consistently accurate delineation.
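Edge supervision requires a boundary target alongside the segmentation mask. How the edge labels were constructed is not described in this section, so the following is only a minimal sketch under one common assumption: a pixel is marked as an edge if any of its four neighbours carries a different label. The function name `boundary_map` and the toy mask are hypothetical.

```python
def boundary_map(mask):
    """Mark pixels that touch a differently labelled 4-neighbour."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] != mask[r][c]:
                    edges[r][c] = 1
                    break
    return edges

# Toy 5x5 mask with a 3x3 anomaly block in the centre.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edges = boundary_map(mask)
```

This variant marks pixels on both sides of each boundary, giving a two-pixel-wide edge target; interior pixels and background pixels away from the block are left at zero.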

3.2.2. Background Confusion Suppression via Attention Refinement

PBF landscapes in Mancheng exhibit substantial within-class variability and contain numerous visually confounding elements, including plastic mulching, seasonal fallow, and phenology-driven texture changes. These patterns can resemble “non-grain production” signals (e.g., hardened surfaces associated with construction) and therefore contribute to false alarms, especially in periods dominated by bare soil or post-harvest conditions.

CBAM is introduced at the decoder fusion stage to refine feature aggregation by reweighting informative responses and suppressing irrelevant ones [34]. Channel attention emphasizes discriminative spectral cues associated with vegetation and farmland structure, while spatial attention highlights localized anomalies and boundary-consistent patterns. Qualitative inspection and error auditing indicate that CBAM reduces false positives in areas characterized by seasonal bare soil. In cases where the baseline model frequently labels harvested fields as abnormal bare ground, the attention-augmented model more consistently leverages surrounding context and structural continuity to maintain the correct “compliant farmland” classification.

3.3. Qualitative Visualization and Scene Analysis

To complement the quantitative results in Table 1 and the preceding ablation analysis, qualitative inspections were conducted on three representative subregions in Mancheng District under identical zero-shot conditions. Figure 4 presents GF-2 RGB image chips alongside anomaly masks generated by six models, none of which were fine-tuned on local data. The scenes were selected as representative examples of recurring challenges in PBF monitoring, rather than as isolated favourable cases. Specifically, the selection was based on three typical sources of qualitative difficulty in this task: inter-parcel adhesion under weak boundary cues, small anthropogenic structures embedded within cropland, and spectrally ambiguous agricultural surfaces such as seasonal bare soil and plastic mulching.

In Figure 4a, baseline models exhibit clear limitations at the cropland–settlement interface, where weak boundary cues lead to parcel adhesion. PSPNet merges adjacent parcels with similar spectral characteristics, while U-Net produces fragmented boundaries and scattered false positives. DeepLabv3+ improves semantic consistency but at the cost of boundary precision, and SegFormer tends to over-smooth fine parcel structures. Although HRNet preserves spatial resolution, it yields irregular delineation under low-contrast conditions. In contrast, DeepLabPro produces more continuous parcel contours and clearer inter-parcel separation.

Figure 4b represents a typical case involving small anthropogenic structures embedded within cropland. Most baseline models generate fragmented detections and exhibit leakage into adjacent compliant areas. SegFormer recovers building footprints more completely than convolutional baselines, but boundary ambiguity remains. DeepLabPro produces more compact and well-defined masks, improving the delineation of small targets.

Figure 4c represents a spectrally ambiguous surface condition, in which impervious features are difficult to distinguish from surrounding agricultural backgrounds. Under weak vegetation signals, baseline models show elevated false-positive rates, particularly SegFormer and DeepLabv3+. DeepLabPro reduces these background-driven errors more consistently while maintaining stable predictions over compliant cropland.

While Figure 4 provides an overall comparison across models, additional insight can be gained from a finer-scale analysis. Figure 5 therefore presents zoomed-in views of three representative regions of interest from a typical scene, including one failure case and two improvement cases. Figure 5b illustrates a failure case in a narrow and elongated built-up region, where the target is only partially detected, resulting in incomplete delineation. This type of slender structure appears challenging for all methods, although some baseline models show slightly better continuity in this instance, suggesting that elongated objects with weak or fragmented spatial signatures remain difficult to capture reliably. By contrast, Figure 5c,d demonstrate two improvement cases, in which the proposed method preserves the geometric structure of elongated greenhouse targets more effectively than SegFormer and HRNet, producing more coherent predictions and reducing confusion in spectrally complex backgrounds. Overall, these local comparisons indicate improved boundary consistency and spatial coherence under challenging conditions, while also revealing a limitation in handling narrow and elongated artificial structures.

3.4. Temporal Consistency and Persistence Analysis

To explore whether cross-year agreement in model outputs could help distinguish higher-priority anomaly candidates from potentially transient surface conditions, the trained model was independently applied to GF-2 imagery from 2021, 2022, and 2023 over Mancheng District. Figure 6 presents the predicted binary anomaly maps alongside the corresponding RGB image chips, with 2022 serving as the primary evaluation reference aligned with the PBF boundary designation year. It should be noted that this procedure constitutes a post hoc consistency check applied to the outputs of a single-date screening model, rather than an independently validated temporal assessment based on year-specific ground-truth labels.

The three-year sequence appears visually consistent with a gradual rather than abrupt pattern in the highlighted area. In 2021, parcels within the highlighted region display surface conditions broadly consistent with active cultivation, with only marginal anomalous signals at parcel edges. By 2022, the affected extent has expanded, with hardened surfaces and structural elements becoming spatially coherent and spectrally distinct from surrounding compliant cropland. By 2023, the encroachment footprint has consolidated further, exhibiting the morphological regularity characteristic of permanent occupation. This apparent cross-year expansion is compatible with incremental construction or surface hardening, but it may also be influenced by seasonal surface variability or model uncertainty and should therefore be interpreted cautiously.

The cross-year overlay also provides a practical basis for reducing phenology-driven false positives. Detections appearing in only one acquisition, likely reflecting post-harvest bare soil, temporary mulching, or seasonal fallow, are assigned lower verification priority, while anomalies that persist or expand across years are escalated as higher-priority candidates for field investigation. Under this protocol, areas showing repeated anomaly signals between 2021 and 2022 would be elevated for earlier inspection attention than areas flagged only once. This should be understood as prioritization support within an operational workflow, rather than as formal temporal confirmation of unlawful land conversion.
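The prioritization rule described above can be sketched as a per-pixel count of the years in which an anomaly is predicted. This is an illustrative sketch, not the study's implementation: it assumes the yearly masks are co-registered on the same grid, and the three priority levels and the function name `persistence_priority` are hypothetical.

```python
def persistence_priority(masks_by_year):
    """Per-pixel priority: 0 = never flagged, 1 = one year, 2 = two or more years."""
    years = sorted(masks_by_year)
    first = masks_by_year[years[0]]
    h, w = len(first), len(first[0])
    priority = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            hits = sum(masks_by_year[y][r][c] for y in years)
            priority[r][c] = min(hits, 2)  # cap: repeated detections share the top level
    return priority

# Toy binary anomaly masks for the three acquisition years.
masks = {
    2021: [[0, 0, 1], [0, 0, 0]],
    2022: [[0, 1, 1], [0, 0, 0]],
    2023: [[0, 1, 1], [1, 0, 0]],
}
priority = persistence_priority(masks)
```

Pixels flagged in a single year receive the lower level, consistent with treating them as possible transient surface conditions, while repeated detections are escalated.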

3.5. Cross-Regional Generalization Analysis

To assess the generalization capability of the proposed framework, zero-shot inference was conducted on seven regions from the OpenEarthMap dataset under a binary cropland/non-cropland evaluation setting. This remapping was introduced to align the external benchmark with the anomaly-screening formulation adopted in this study.
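The binary remapping amounts to collapsing every non-cropland class into a single background label. The sketch below illustrates the idea; the class index used for agricultural land (`CROPLAND_ID = 7`) is an assumption that should be checked against the dataset's label definition, and the function name is hypothetical.

```python
# Assumed index of the agricultural-land class; verify against the dataset legend.
CROPLAND_ID = 7

def to_binary(label_map, cropland_id=CROPLAND_ID):
    """Map a multiclass label grid to 1 = cropland, 0 = non-cropland."""
    return [[1 if v == cropland_id else 0 for v in row] for row in label_map]

# Toy multiclass label grid.
labels = [[7, 7, 4], [8, 7, 2]]
binary = to_binary(labels)
```

Any unlabelled or no-data value also falls into the non-cropland class under this simple mapping, which is one way that regional label characteristics can influence the resulting metrics.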

Quantitative results are summarized in Table 3. Overall, the model maintained a relatively stable screening capacity across regions, achieving an average Recall of 66.91%, an average Precision of 76.39%, an average F1-score of 69.56%, and an average IoU of 54.94%. High Recall values were observed in Baybay, Świętokrzyskie, and Mahe, suggesting that the framework may retain relatively strong sensitivity under several external regional conditions. However, these values should be interpreted cautiously, because they may reflect not only model robustness under domain shift but also regional differences in class composition, cropland prevalence, and class separability after binary label remapping. Sustained Recall is particularly important for compliance-oriented screening tasks, in which missed detections are operationally more costly than false positives.

Greater variability is observed in IoU and F1-score. Specifically, lower IoU values in Münster, Lohur, and Lambayeque suggest reduced overlap accuracy, which likely reflects a combination of domain shift and differences in regional class structure after binary remapping. These factors may include more heterogeneous non-cropland categories, varying cropland-to-background ratios, and weaker boundary separability between cultivated and non-cultivated surfaces. These regions are characterised by fragmented land-use patterns, arid or semi-arid surface conditions, and increased spectral ambiguity, all of which complicate the separation of compliant cropland from anomalous areas. Precision shows a similar degree of variation. For example, Mahe achieves high recall but relatively low precision, a pattern that may reflect increased background confusion in a complex island environment as well as the broader within-class heterogeneity introduced by collapsing multiple land-cover categories into a single non-cropland class. By contrast, Poland and the Philippines achieve both high precision and recall, which may reflect closer alignment between training data characteristics and target-domain conditions.

Compared with these cross-regional results, the mIoU of 42.23% obtained on the Mancheng dataset appears relatively moderate. This is likely related to the greater complexity of the Mancheng study area, where fragmented parcels, mixed land-use patterns, and strong spectral ambiguity increase boundary uncertainty and make precise spatial overlap more difficult to achieve. In addition, the proposed framework is explicitly designed as a recall-oriented screening tool rather than a model optimised for global overlap metrics, which further helps explain this difference. Overall, the results indicate that the proposed pipeline maintains stable generalization across diverse conditions. Although performance is affected by domain shift and regional heterogeneity, the emphasis on recall supports broader anomaly coverage, in line with the practical requirements of operational cropland monitoring and sustainable land management.

4. Discussion

4.1. Interpretation of Results and Mechanistic Insights

This study reframes PBF monitoring by shifting from comprehensive land-cover classification to targeted anomaly screening. While absolute performance values remain moderate, they are consistent with the inherent difficulty of detecting small, fragmented, and visually ambiguous anomalies under zero-shot cross-regional conditions. This challenge is particularly pronounced because the model is applied to a northern Chinese peri-urban agricultural landscape without exposure to target-domain data during training.

Rather than providing definitive identification of illegal land conversion, the model is best understood as a probabilistic screening instrument within a detection–localisation–verification workflow. The achieved Recall indicates that a substantial proportion of potential encroachments enters the scope of expert review and field verification, representing a structured expansion of detection coverage compared with conventional patrol-based inspection constrained by limited personnel, access, and budgets. It bears noting that undetected anomalies constitute a residual risk that field verification workflows must account for; the framework is intended to complement rather than replace human judgment in enforcement decisions.

Prioritising recall over global overlap metrics reflects the asymmetric cost structure of cropland protection. Missed detections may lead to irreversible land-use conversion, whereas false positives primarily result in additional inspection effort. The comparison with SegFormer illustrates this trade-off: although SegFormer achieves higher mIoU and precision, its lower recall indicates reduced sensitivity to anomalous patterns. In contrast, the proposed pipeline achieves higher recall while maintaining comparable precision, which is more consistent with the requirements of compliance-oriented screening. This behaviour is further reinforced by the Focal Tversky loss, which emphasises minority-class detection [35]. Accordingly, the model is better interpreted as a high-sensitivity early warning screener rather than a segmentation approach optimised solely for benchmark metrics.
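To illustrate how a Focal Tversky term emphasises missed detections, the sketch below computes the loss over flattened anomaly probabilities. It is a minimal illustration rather than the study's implementation: the weights `alpha`, `beta`, and `gamma` are hypothetical defaults, the exponent convention differs between published formulations, and the other terms of the composite loss are not shown.

```python
def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss over flattened anomaly probabilities and 0/1 targets."""
    tp = sum(p * t for p, t in zip(probs, targets))
    fn = sum((1.0 - p) * t for p, t in zip(probs, targets))
    fp = sum(p * (1.0 - t) for p, t in zip(probs, targets))
    # alpha > beta penalises false negatives more heavily than false positives.
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return max(0.0, 1.0 - tversky) ** gamma

targets = [1, 1, 0, 0]
miss = focal_tversky_loss([1.0, 0.0, 0.0, 0.0], targets)         # one false negative
false_alarm = focal_tversky_loss([1.0, 1.0, 1.0, 0.0], targets)  # one false positive
```

With `alpha` greater than `beta`, a single missed anomaly pixel is penalised more than a single false alarm, which pushes training toward higher Recall.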

Results obtained across multiple regions further support this interpretation. Even without target-domain fine-tuning, recall remains relatively stable, indicating consistent sensitivity to anomalous land-use patterns under varying surface conditions. In contrast, variations in IoU and precision reflect the effects of domain shift, including differences in land-use configuration, surface complexity, and spectral ambiguity. Such variation should be understood as a consequence of applying a single model across heterogeneous environments, rather than as evidence of instability. This distinction is important when interpreting zero-shot deployment across regions with different surface conditions and label characteristics. In several cases, performance in external regions exceeds that observed in the Mancheng study area. This difference is likely due to the higher complexity of the local dataset, where fragmented parcels and mixed land-use patterns introduce greater boundary uncertainty, whereas some external regions exhibit clearer class separability and more regular field structures.

The incorporation of multi-temporal observations provides an additional interpretive layer. Anomalies detected consistently across multiple years are more likely to represent persistent encroachments, whereas single-year detections are often associated with transient conditions such as seasonal bare soil or short-term disturbance. This distinction supports prioritisation within inspection workflows. The temporal sequence also suggests that certain encroachments evolve gradually, indicating that repeated annual application may enable earlier intervention. However, this temporal component operates only as a post hoc filtering and prioritization step, as each acquisition is processed independently without explicit modelling of temporal dynamics. Because the 2021 and 2023 images were not annotated separately, cross-year consistency should not be interpreted as strict temporal validation. Repeated detections may indicate a more stable anomaly signal, but seasonal variability, phenological effects, and year-specific prediction errors cannot be fully excluded.

The observed performance can be attributed to task-oriented design choices at both the loss and architectural levels. The composite loss function addresses class imbalance by placing greater emphasis on false-negative errors, while the auxiliary edge branch enhances boundary delineation in fragmented parcel structures and reduces inter-parcel adhesion. These effects are reflected in the qualitative results (Figure 5), where improved boundary continuity is observed in complex scenes, alongside remaining limitations in detecting narrow and elongated structures. Together, these components improve the localisation of small and ambiguous targets. They are best understood as task-specific engineering adaptations within a regulatory context, rather than as general-purpose architectural innovations; the main contribution of this study lies in task formulation, workflow design, and alignment with operational requirements.

4.2. Reconciling Accuracy and Operational Practicality

Despite moderate absolute accuracy, the proposed framework demonstrates practical relevance through its alignment with existing enforcement workflows. Constraining model outputs to officially designated PBF boundaries excludes areas outside the regulatory scope and concentrates attention on legally relevant parcels, potentially narrowing the inspection scope and reducing extraneous verification effort, though this efficiency gain depends on the availability, accuracy, and currency of boundary data.
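The boundary constraint can be sketched as an element-wise intersection between the predicted anomaly mask and a rasterised PBF boundary layer. The example assumes the boundary polygons have already been rasterised onto the image grid; the function name and toy arrays are hypothetical.

```python
def constrain_to_pbf(anomaly_mask, pbf_mask):
    """Keep anomaly pixels only where the PBF boundary raster is 1."""
    return [[a & b for a, b in zip(a_row, b_row)]
            for a_row, b_row in zip(anomaly_mask, pbf_mask)]

# Toy rasters: 1 = predicted anomaly, and 1 = inside a designated PBF parcel.
anomaly = [[1, 1, 0], [0, 1, 1]]
pbf = [[0, 1, 1], [0, 1, 0]]
screened = constrain_to_pbf(anomaly, pbf)
```

Detections outside the designated parcels are discarded before any alert is raised, so errors or outdated geometry in the boundary layer carry over directly into the screening output.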

The observed Recall level supports the framework’s utility as a first-pass screening tool. While not exhaustive, it provides a more structured basis for inspection prioritisation than ad hoc field patrols. This contribution should be understood within a regulatory setting in which formal farmland protection requirements are strong, but local implementation may remain uneven, creating a practical need for tools that support more systematic and less subjective inspection prioritisation.

The practical significance of these metrics becomes clearer when situated within the actual verification workflow. The reported Precision of 67.92% means that roughly one in three flagged parcels may ultimately be false positives. This trade-off reflects the intended design of the framework as a recall-oriented screening tool: it prioritises sensitivity to genuine encroachments over conservative prediction, accepting a moderate false-positive burden to ensure that potentially consequential violations do not escape initial detection.
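The "one in three" figure follows directly from the reported Precision. Because Precision is computed at the pixel level, reading it as a share of flagged parcels is an approximation:

```latex
\[
\text{false-positive share} = 1 - \text{Precision} = 1 - 0.6792 = 0.3208 \approx \tfrac{1}{3}
\]
```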

This design choice aligns with operational constraints in Mancheng District. The jurisdiction’s 14,780 PBF parcels across 13,720 hectares are subject to routine inspection, with suspected violations forwarded to a dedicated verification team of 18 personnel for on-site confirmation. Under this workflow, false positives remain operationally acceptable provided they do not exceed the team’s verification capacity. The framework supports this constraint through two mechanisms: predictions are spatially restricted to officially designated PBF boundaries, ensuring outputs remain directly aligned with regulatory scope, and multi-temporal cross-referencing distinguishes persistent anomalies from transient surface conditions, allowing the verification team to prioritize high-confidence candidates while reviewing lower-confidence detections as capacity permits. Consequently, the observed precision level is compatible with the practical demands of county-level enforcement. Although the exact number of flagged parcels will vary with acquisition dates and local surface conditions, the outputs are intended to serve as a manageable shortlist for follow-up verification rather than as a comprehensive substitute for routine inspection.

Binary anomaly outputs reduce interpretation complexity and facilitate incorporation into GIS-based inspection platforms. The boundary-aware pipeline configuration may support clearer identification of encroachment extents, aiding verification consistency. Compared with architectures emphasising global context aggregation, the adapted approach better preserves local boundary precision, which is advantageous for detecting small and spatially constrained anomalies, while its convolutional architecture offers a practical balance between robustness and computational efficiency more compatible with county-level resource constraints than the Transformer-based baseline evaluated here [36,37,38]. These features should therefore be understood as deployment-oriented advantages rather than as architectural contributions in themselves.

4.3. Limitations, Implications for Sustainable Land Governance, and Future Directions

This study reframes PBF monitoring by shifting from comprehensive land-cover classification to focused binary anomaly screening, generating signals that can support earlier identification of unauthorised structures and surface hardening within protected farmland. Although earlier detection may help mitigate cumulative land degradation, its contribution to broader sustainability objectives, including SDG 2 and SDG 15, remains indirect and contingent on institutional implementation rather than technical performance alone.

The practical value of this approach is closely linked to its institutional context. Screening outputs can strengthen the informational basis for regulatory decisions, but their effectiveness ultimately depends on administrative processes, accountability arrangements, and local implementation capacity. The framework is therefore more appropriately understood as a preliminary screening tool than as a definitive adjudication system. Given moderate model performance and the presence of false positives under complex surface conditions, anomaly maps highlight locations that deviate from expected cultivation patterns but do not provide conclusive evidence of violations. This limitation is particularly relevant in situations involving seasonal bare soil, post-harvest fallow, or temporary agricultural structures, where visual ambiguity is pronounced. Accordingly, anomaly detections should be interpreted as probabilistic signals requiring institutional validation.

Model performance is further shaped by contextual factors not explicitly represented in the data. Socioeconomic pressures, land-use policies, soil conditions, cultivation practices, seasonal dynamics, and climatic variability all influence surface appearance and may obscure the boundary between compliant and anomalous states. Together, these factors show that land sustainability is shaped by interacting human and environmental processes, which cannot be fully explained by surface observations alone. The proposed framework captures observable anomalies but does not account for underlying processes such as economic incentives, policy enforcement variability, or long-term climate pressures, and should therefore be regarded as addressing the monitoring dimension of land governance rather than the full system dynamics.

The multi-temporal cross-referencing strategy partially alleviates these limitations by comparing acquisitions from different years to distinguish persistent anomalies from transient surface conditions. This improves verification prioritisation and reduces phenology-related false positives. However, it remains a post hoc filtering step rather than an integrated temporal modelling approach, as it does not capture intra-annual dynamics or transition probabilities and depends on the availability and consistency of multi-year imagery.

Additional constraints arise from heterogeneity in ground-truth definitions and class distributions across datasets and jurisdictions. This issue is particularly relevant for the OpenEarthMap-based cross-regional evaluation, where the original multiclass annotation scheme was reduced to a binary cropland/non-cropland setting. Although this remapping improves task alignment, it also compresses region-specific land-cover diversity into a single background class, which may affect the comparability of Recall and IoU across regions. Labelling conventions for ambiguous cases, such as temporary facilities or semi-hardened surfaces, vary across annotation contexts, introducing inconsistencies that cannot be fully resolved through quality control. This issue is compounded by class imbalance, as cropland-to-anomaly ratios differ across regions and may reduce sensitivity to minority-class anomalies under shifted distributions. As a result, systematic bias may emerge in cross-regional deployment, particularly where regulatory interpretations of violations differ from training conditions.

Future work should focus on several directions. Incorporating bi-temporal or time-series inputs would improve the discrimination between transient and persistent land-use changes [39,40,41]. Integrating synthetic aperture radar and digital elevation model data may enhance robustness under varying environmental and terrain conditions [42]. Domain adaptation approaches are also needed to support reliable deployment across diverse agricultural systems and annotation conventions [43,44,45]. Beyond these technical advances, closer integration with socioeconomic and policy data will be necessary to better capture the drivers of land system change. Linking anomaly detection outputs with administrative workflows, including inspection records and enforcement feedback, will be essential for assessing whether improvements in technical screening can translate into measurable governance outcomes.

5. Conclusions

This work reframes PBF monitoring as boundary-constrained anomaly screening rather than comprehensive land-cover classification, with its primary contribution lying in task-level reformulation and alignment with operational enforcement needs rather than architectural novelty. The proposed edge-augmented DeepLabv3+ framework improves minority-class sensitivity and boundary delineation within legally defined parcels, achieving a Recall of 61.25% and enabling broader identification of candidate anomalies for field verification, although limitations remain in detecting narrow and elongated artificial structures. Cross-regional evaluation further demonstrates that the framework maintains stable sensitivity to anomalous patterns under heterogeneous environmental conditions without target-domain fine-tuning, supporting its applicability beyond the training domain. In addition, multi-temporal analysis shows that cross-year consistency can help distinguish persistent encroachments from transient agricultural conditions without additional model training.

These findings suggest that PBF monitoring should shift from exhaustive land-cover classification toward recall-oriented anomaly screening that supports inspection prioritisation. The approach is most appropriate in contexts where up-to-date boundary data and annual optical imagery are available, particularly where inspection capacity is limited relative to the monitoring extent, as it can serve as an effective first-pass screening tool. Its applicability is reduced in settings where boundary data are incomplete, image availability is constrained, or precise quantification of violations is required. Future work should focus on improving temporal modelling, strengthening cross-regional robustness, and integrating screening outputs with governance processes in operational settings.

Author Contributions

Conceptualization, J.W. and J.H.; methodology, J.W. and Y.W.; software, J.W. and J.C.; validation, J.W., Y.W. and N.W.; formal analysis, J.W. and J.C.; investigation, J.W., Y.W., W.R. and N.W.; resources, C.G., W.R. and N.W.; data curation, J.W. and J.C.; writing—original draft preparation, J.W.; writing—review and editing, J.W., Y.W., J.C., C.G., W.R., N.W. and J.H.; visualization, J.W. and Y.W.; supervision, J.H.; project administration, J.H. and N.W.; funding acquisition, J.H. and N.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Central Guiding Local Science and Technology Development Fund Project (246Z7402G), the Science Research Project of Hebei Education Department (CXZX2026003, KY2025037), and the Hebei Provincial High-Level Talent Funding Project (C2024106).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available from the corresponding author upon request. They are not publicly available because of privacy considerations and the group’s confidentiality policy.

Acknowledgments

The authors would like to thank all participants for their time and cooperation. The authors also thank Hebei Agricultural University for institutional support and the data collection organizers for their coordination and assistance during data acquisition. During manuscript preparation, the authors used the generative AI tool Gemini 3 (Google) exclusively for figure preparation and minor language editing. The tool was not used for data analysis, model development, result generation, or scientific writing. All content was critically reviewed and verified by the authors.

Conflicts of Interest

Caiyun Gao is employed by Baoding Trued Land Management Technology Service Co., Ltd. The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PBF	Permanent Basic Farmland
GF-2	Gaofen-2
UN	United Nations
RGB	Red, Green, Blue
PSPNet	Pyramid Scene Parsing Network
CE	Cross-Entropy
ASPP	Atrous Spatial Pyramid Pooling
IoU	Intersection over Union
mIoU	Mean Intersection over Union
OA	Overall Accuracy
CBAM	Convolutional Block Attention Module
U-Net	U-shaped Convolutional Network
ResNeXt	Aggregated Residual Network
GIS	Geographic Information System
SDGs	Sustainable Development Goals

References

1. United Nations General Assembly. Transforming Our World: The 2030 Agenda for Sustainable Development. United Nations. 2015. Available online: https://sdgs.un.org/2030agenda (accessed on 13 January 2025).
2. Huang, Q.; Liu, Z.; He, C.; Gou, S.; Bai, Y.; Wang, Y.; Shen, M. The occupation of cropland by global urban expansion from 1992 to 2016 and its implications. Environ. Res. Lett. 2020, 15, 084037.
3. D’Acunto, F.; Marinello, F.; Pezzuolo, A. Rural land degradation assessment through remote sensing: Current technologies, models, and applications. Remote Sens. 2024, 16, 3059.
4. Guo, A.; Yue, W.; Yang, J.; Xue, B.; Xiao, W.; Li, M.; He, T.; Zhang, M.; Jin, X.; Zhou, Q. Cropland abandonment in China: Patterns, drivers, and implications for food security. J. Clean. Prod. 2023, 418, 138154.
5. Long, S.; Cai, E.; Li, L.; Xie, F.; Lai, S.; Hu, H.; Li, Y.; Jing, Y. Structural changes, transfer trajectories, and driving forces of non-grain farmland in the main grain producing areas of central China. J. Environ. Manag. 2025, 393, 127029.
6. Tu, Y.; Chen, B.; Yu, L.; Song, Y.; Wu, S.; Li, M.; Wei, H.; Chen, T.; Lang, W.; Gong, P.; et al. Unraveling the nexus between urban expansion and cropland loss in China. Landsc. Ecol. 2023, 38, 1869–1884.
7. Wang, N.; Hao, J.; Zhang, L.; Duan, W.; Shi, Y.; Zhang, J.; Paruke, W. Basic farmland protection system in China: Changes, conflicts and prospects. Agronomy 2023, 13, 651.
8. Seifollahi-Aghmiuni, S.; Kalantari, Z.; Egidi, G.; Gaburova, L.; Salvati, L. Urbanisation-driven land degradation and socioeconomic challenges in peri-urban areas: Insights from Southern Europe. Ambio 2022, 51, 1446–1458.
9. Su, H.; Liu, F.; Zhang, H.; Ma, X.; Sun, A. Progress and prospects of non-grain production of cultivated land in China. Sustainability 2024, 16, 3517.
10. Liu, Y.; Shen, G.; He, T. Cropping and transformation features of non-grain cropland in mainland China and policy implications. Land 2025, 14, 561.
11. Perich, G.; Turkoglu, M.O.; Graf, L.V.; Wegner, J.D.; Aasen, H.; Walter, A.; Liebisch, F. Pixel-based yield mapping and prediction from Sentinel-2 using spectral indices and neural networks. Field Crops Res. 2023, 292, 108824.
12. Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417.
13. Delaney, B.; Tansey, K.; Whelan, M. Satellite Remote Sensing Techniques and Limitations for Identifying Bare Soil. Remote Sens. 2025, 17, 630.
14. Decuyper, M.; Chávez, R.O.; Lohbeck, M.; Lastra, J.A.; Tsendbazar, N.; Hackländer, J.; Herold, M.; Vågen, T.-G. Continuous monitoring of forest change dynamics with satellite time series. Remote Sens. Environ. 2022, 269, 112829.
15. Ebrahimi, S.; Kumar, S. Semantic segmentation for simultaneous crop and land cover land use classification using multi-temporal Landsat imagery. Remote Sens. Appl. Soc. Environ. 2025, 37, 101505.
16. Hester, D.; Martins, V.S.; Ferreira, L.B.; Lima, T.M.A. Learning with less: Label-efficient land cover classification at very high spatial resolution using self-supervised deep learning. Sci. Remote Sens. 2026, 13, 100397.
17. Jocea, A.F.; Porumb, L.; Necula, L.; Raducanu, D. Sentinel-2 Land Cover Classification: State-of-the-Art Methods and the Reality of Operational Deployment—A Systematic Review. Sustainability 2025, 17, 10324.
18. Naumann, A.; Gedicke, S.; Haunert, J.H. A scalable matching approach for the comparison of agricultural land use maps based on corresponding field polygons. Int. J. Digit. Earth 2026, 19, 2632420.
19. Sykas, D.; Sdraka, M.; Zografakis, D.; Papoutsis, I. A Sentinel-2 multiyear, multicountry benchmark dataset for crop classification and segmentation with deep learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 3323–3339.
20. Guo, L.; Xi, X.; Yang, W.; Liang, L. Monitoring Land Use/Cover Change Using Remotely Sensed Data in Guangzhou of China. Sustainability 2021, 13, 2944.
21. Peng, D.; Liu, X.; Zhang, Y.; Guan, H.; Li, Y.; Bruzzone, L. Deep learning change detection techniques for optical remote sensing imagery: Status, perspectives and challenges. Int. J. Appl. Earth Obs. Geoinf. 2025, 136, 104282.
22. Homer, C.; Dewitz, J.; Jin, S.; Xian, G.; Costello, C.; Danielson, P.; Gass, L.; Funk, M.; Wickham, J.; Stehman, S.; et al. Conterminous United States land cover change patterns 2001–2016 from the 2016 National Land Cover Database. ISPRS J. Photogramm. Remote Sens. 2020, 162, 184–199.
23. Souverijns, N.; Buchhorn, M.; Horion, S.; Fensholt, R.; Verbeeck, H.; Verbesselt, J.; Herold, M.; Tsendbazar, N.-E.; Bernardino, P.N.; Somers, B.; et al. Thirty years of land cover and fraction cover changes over the Sudano-Sahel using Landsat time series. Remote Sens. 2020, 12, 3817.
24. Du, Z.; Yu, L.; Chen, X.; Gao, B.; Yang, J.; Fu, H.; Gong, P. Land use/cover and land degradation across the Eurasian steppe: Dynamics, patterns and driving factors. Sci. Total Environ. 2024, 909, 168593.
25. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; pp. 833–851.
26. Kanjir, U.; Đurić, N.; Veljanovski, T. Sentinel-2 Based Temporal Detection of Agricultural Land Use Anomalies in Support of Common Agricultural Policy Monitoring. ISPRS Int. J. Geo-Inf. 2018, 7, 405.
27. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
28. Wang, J.; Zheng, Z.; Ma, A.; Lu, X.; Zhong, Y. LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks; Vanschoren, J., Yeung, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 1. Available online: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/4e732ced3463d06de0ca9a15b6153677-Abstract-round2.html (accessed on 20 April 2026).
29. Xia, J.; Yokoya, N.; Adriano, B.; Broni-Bediako, C. OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); Zenodo: Geneva, Switzerland, 2022.
30. Abraham, N.; Khan, N.M. A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 683–687.
31. Zheng, Y.; Chen, Z.; Zheng, T.; Tian, C.; Dong, W. PSNet: A universal algorithm for multispectral remote sensing image segmentation. Remote Sens. 2025, 17, 563.
32. Zhou, Z.; Zheng, C.; Liu, X.; Tian, Y.; Chen, X.; Chen, X.; Dong, Z. A dynamic effective class balanced approach for remote sensing imagery semantic segmentation of imbalanced data. Remote Sens. 2023, 15, 1768.
33. He, X.; Zhou, Y.; Liu, B.; Zhao, J.; Yao, R. Remote sensing image semantic segmentation via class-guided structural interaction and boundary perception. Expert Syst. Appl. 2024, 252, 124019.
34. Duan, S.; Zhao, J.; Huang, X.; Zhao, S. Semantic Segmentation of Remote Sensing Data Based on Channel Attention and Feature Information Entropy. Sensors 2024, 24, 1324.
35. Salehi, S.S.M.; Erdogmus, D.; Gholipour, A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In Machine Learning in Medical Imaging; Lecture Notes in Computer Science; Wang, Q., Shi, Y., Suk, H.-I., Suzuki, K., Eds.; Springer: Cham, Switzerland, 2017; Volume 10541, pp. 379–387.
36. Voelsen, M.; Rottensteiner, F.; Heipke, C. Transformer models for land cover classification with satellite image time series. J. Photogramm. Remote Sens. Geoinf. Sci. 2024, 92, 547–568.
37. Boulila, W.; Ghandorh, H.; Masood, S.; Alzahem, A.; Koubaa, A.; Ahmed, F.; Khan, Z.; Ahmad, J. A transformer-based approach empowered by a self-attention technique for semantic segmentation in remote sensing. Heliyon 2024, 10, e29396.
38. Panboonyuen, T.; Charoenphon, C.; Satirapod, C. MeViT: A medium-resolution vision transformer for semantic segmentation on Landsat satellite imagery for agriculture in Thailand. Remote Sens. 2023, 15, 5124.
39. Qi, Z.; Gar-On Yeh, A.; Li, X.; Liu, X. A land clearing index for high-frequency unsupervised monitoring of land development using multi-source optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2022, 187, 393–421.
40. Sun, Z.; Zhong, Y.; Wang, X.; Zhang, L. Identifying cropland non-agriculturalization with high representational consistency from bi-temporal high-resolution remote sensing images: From benchmark datasets to real-world application. ISPRS J. Photogramm. Remote Sens. 2024, 212, 454–474.
41. Wang, H.; Li, X.; Huo, L.; Hu, C. Global and edge enhanced transformer for semantic segmentation of remote sensing. Appl. Intell. 2024, 54, 5658–5673.
42. Cao, Z.; Kooistra, L.; Wang, W.; Guo, L.; Valente, J. Real-Time Object Detection Based on UAV Remote Sensing: A Systematic Literature Review. Drones 2023, 7, 620.
43. Wang, S.; Zuo, Z.; Yan, S.; Zeng, W.; Pang, S. A Novel Global-Local Feature Aggregation Framework for Semantic Segmentation of Large-Format High-Resolution Remote Sensing Images. Appl. Sci. 2024, 14, 6616.
44. Yang, J.; Chen, G.; Huang, J.; Ma, D.; Liu, J.; Zhu, H. GLE-net: Global-local information enhancement for semantic segmentation of remote sensing images. Sci. Rep. 2024, 14, 25282.
45. Saidi, S.; Idbraim, S.; Karmoude, Y.; Masse, A.; Arbelo, M. Deep-Learning for Change Detection Using Multi-Modal Fusion of Remote Sensing Images: A Review. Remote Sens. 2024, 16, 3852.

Figure 1. Geographic location and land-use distribution of the study area.

Figure 2. Overview of the adapted DeepLabv3+-based screening architecture used in this study.

Figure 3. Non-agricultural facility highlighted in the cyan box. Green areas indicate permanent basic farmland, red areas indicate predicted non-agricultural occupancy, and the cyan box marks the facility area used for visual comparison. (a) Baseline (DeepLabv3+), (b) model without the Edge Branch, and (c) model without CBAM: the baseline and ablated variants fail to capture the full spatial extent of the facility, producing fragmented and spatially truncated masks that only partially delineate the structure. (d) Proposed DeepLabPro: by integrating the Dual Edge Branch and CBAM, the model better preserves the structural integrity of the encroachment, producing a more coherent and structurally consistent prediction.

Figure 4. Qualitative comparison of zero-shot anomaly-screening results on three representative GF-2 subregions from Mancheng District. The scenes were selected to reflect recurring challenges in PBF monitoring, including boundary adhesion under weak boundary cues, small anthropogenic structures embedded within cropland, and spectrally ambiguous surfaces. For each example (a–c), columns present the input RGB image chip followed by predictions from DeepLabPro, PSPNet, U-Net, DeepLabv3+, SegFormer, and HRNet, none of which were fine-tuned on local data.

Figure 5. Qualitative comparison with zoomed-in regions of interest (ROIs) in a representative scene. (a) Full image with highlighted ROIs; (b–d) enlarged views of three representative regions. The first row shows the original imagery, while subsequent rows present predictions from DeepLabPro (Ours), SegFormer, and HRNet. ROI-1 (red box) illustrates a typical failure case, where baseline models exhibit fragmented predictions and boundary ambiguity. ROI-2 and ROI-3 (green boxes) demonstrate improved delineation by DeepLabPro, with more coherent masks and clearer boundary preservation.

Figure 6. Temporal comparison of DeepLabPro anomaly screening outputs for a representative Mancheng District subregion in 2021, 2022, and 2023. The highlighted area shows recurrent anomaly signals across annual predictions. For operational prioritization, locations flagged in two or more years are assigned higher verification priority; this consistency rule is heuristic and should not be interpreted as strict temporal validation or proof of ongoing illegal development.
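
The prioritisation rule stated in the Figure 6 caption (locations flagged in two or more years receive higher verification priority) can be illustrated with a minimal sketch. The toy 3 × 4 masks, the function names, and the `min_years` parameter below are hypothetical and are not taken from the study's pipeline; the sketch assumes only that each annual prediction is available as a co-registered binary mask (1 = flagged).

```python
# Toy annual anomaly masks (1 = flagged). Values are hypothetical, not study data.
MASKS = {
    2021: [[0, 1, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]],
    2022: [[0, 1, 0, 0],
           [0, 0, 1, 0],
           [1, 0, 0, 0]],
    2023: [[0, 1, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 1]],
}


def flag_counts(masks):
    """Count, for every pixel, the number of years in which it was flagged."""
    layers = list(masks.values())
    # zip(*layers) groups the same row across years; zip(*rows) groups the same cell.
    return [[sum(cell) for cell in zip(*rows)] for rows in zip(*layers)]


def priority_map(counts, min_years=2):
    """Heuristic labels: 'high' if flagged in >= min_years, 'low' if flagged less often."""
    def label(n):
        if n >= min_years:
            return "high"
        return "low" if n > 0 else "none"
    return [[label(n) for n in row] for row in counts]


counts = flag_counts(MASKS)
priority = priority_map(counts)
for row in priority:
    print(row)
```

As in the caption, a "high" label in this sketch only ranks a location for field verification; it is a heuristic and does not establish that an encroachment is ongoing.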

Table 1. Quantitative comparison of segmentation performance on the Mancheng PBF dataset.

Model Architecture	mIoU (%)	OA (%)	F1-Score (%)	Precision (%)	Recall (%)
PSPNet	29.65	60.57	42.10	69.40	44.87
U-Net	40.71	66.23	56.27	63.61	56.56
DeepLabv3+	41.24	67.54	54.81	69.00	51.01
SegFormer	43.21	68.15	59.62	70.58	56.05
HRNet	38.61	65.43	52.44	69.42	49.78
DeepLabPro (Ours)	42.23	67.94	57.08	67.92	61.25
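
The Recall differences implied by Table 1 can be reproduced with a short worked calculation. The sketch below copies the mIoU and Recall columns from the table; the variable and function names are illustrative only.

```python
# (mIoU %, Recall %) per model, copied from Table 1.
TABLE_1 = {
    "PSPNet": (29.65, 44.87),
    "U-Net": (40.71, 56.56),
    "DeepLabv3+": (41.24, 51.01),
    "SegFormer": (43.21, 56.05),
    "HRNet": (38.61, 49.78),
    "DeepLabPro": (42.23, 61.25),
}
OURS = "DeepLabPro"


def recall_gains(table, ours=OURS):
    """Recall difference (percentage points) between `ours` and every other model."""
    ours_recall = table[ours][1]
    return {m: round(ours_recall - r, 2) for m, (_, r) in table.items() if m != ours}


gains = recall_gains(TABLE_1)
best_recall = max(TABLE_1, key=lambda m: TABLE_1[m][1])
best_miou = max(TABLE_1, key=lambda m: TABLE_1[m][0])

for model, gain in sorted(gains.items(), key=lambda kv: kv[1]):
    print(f"{model:<11} +{gain:.2f} pp")
print("highest Recall:", best_recall)
print("highest mIoU:", best_miou)
```

The calculation recovers the 10.24 percentage-point gain over DeepLabv3+ and shows that the highest Recall and the highest mIoU belong to different models, which is consistent with reading the result as screening sensitivity rather than overall segmentation optimisation.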

Table 2. Ablation study of the proposed Dual-Stream DeepLabPro on the Mancheng dataset.

Model Variant	OA (%)	F1-Score (%)	mIoU (%)	Δ mIoU (Percentage Points)
DeepLabPro	67.94	57.08	42.23	-
Model without CBAM	68.29	56.66	42.10	−0.13
Model without Edge Branch	63.59	54.99	40.10	−2.13
Model with standard cross-entropy loss	58.95	50.26	36.13	−6.10

Table 3. Cross-regional generalization performance.

Region	Precision (%)	F1-Score (%)	IoU (%)	Recall (%)
Baybay (Philippines)	89.58	89.00	80.18	88.43
Świętokrzyskie (Poland)	90.89	88.25	78.97	85.76
Münster (Germany)	67.85	58.14	40.98	50.86
Mahe (Seychelles)	53.55	66.75	50.10	88.59
Lohur (Pakistan)	73.65	55.80	38.70	44.92
Lambayeque (Peru)	81.39	59.09	41.94	46.39
Köln (Germany)	77.81	69.86	53.68	63.39
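
As a reading aid for Table 3, the sketch below checks that the reported F1-Score and IoU values follow from Precision and Recall under the standard single-class identities F1 = 2PR/(P + R) and IoU = F1/(2 − F1). That the table reports single-class (binary) pixel metrics is an assumption made here, not something stated with the table; the 0.05-point tolerance is likewise an arbitrary allowance for rounding.

```python
# (Precision, F1, IoU, Recall) in percent, copied from Table 3.
TABLE_3 = {
    "Baybay": (89.58, 89.00, 80.18, 88.43),
    "Świętokrzyskie": (90.89, 88.25, 78.97, 85.76),
    "Münster": (67.85, 58.14, 40.98, 50.86),
    "Mahe": (53.55, 66.75, 50.10, 88.59),
    "Lohur": (73.65, 55.80, 38.70, 44.92),
    "Lambayeque": (81.39, 59.09, 41.94, 46.39),
    "Köln": (77.81, 69.86, 53.68, 63.39),
}


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


def iou_from_f1(f1):
    """IoU = F1 / (2 - F1) for fractions; rewritten here for percentages."""
    return 100 * f1 / (200 - f1)


f1_dev = {}
iou_dev = {}
for region, (p, f1, iou, r) in TABLE_3.items():
    f1_dev[region] = abs(f1_from_pr(p, r) - f1)
    iou_dev[region] = abs(iou_from_f1(f1) - iou)
    print(f"{region:<15} F1 dev {f1_dev[region]:.3f}  IoU dev {iou_dev[region]:.3f}")

TOLERANCE = 0.05  # percentage points; assumed rounding allowance
consistent = all(d < TOLERANCE for d in list(f1_dev.values()) + list(iou_dev.values()))
print("all rows consistent:", consistent)
```

Under that assumption, the large Precision and Recall differences between regions (for example Mahe versus Lohur) reflect different error balances rather than reporting inconsistencies.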

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

