Article

DCFENet: A Dual-Branch Collaborative Feature Enhancement Network for Farmland Boundary Detection

1 College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
2 School of Software, Quanzhou University of Information Engineering, Quanzhou 362000, China
* Authors to whom correspondence should be addressed.
Agronomy 2026, 16(10), 964; https://doi.org/10.3390/agronomy16100964
Submission received: 24 March 2026 / Revised: 4 May 2026 / Accepted: 6 May 2026 / Published: 12 May 2026
(This article belongs to the Special Issue Remote Sensing and GIS in Sustainable and Precision Agriculture)

Abstract

Farmland resources are fundamental to human survival and play a vital role in ensuring global food security. However, farmland boundary detection remains a significant technical challenge due to the low proportion of boundary pixels, multi-scale variations, and weak boundary continuity. To address these issues, this study proposes DCFENet, a dual-branch collaborative feature enhancement network. Specifically, a multi-scale feature fusion attention module, TA-ASPP (Task-Aware Atrous Spatial Pyramid Pooling), is designed, which effectively enhances the network’s perception of farmland boundary features by integrating multi-scale dilated convolutions with skeleton-aware attention. In addition, a dual-branch decoding structure is proposed to enhance boundary localization and global topology modeling through boundary-aware gating and cross-branch feature fusion, thereby improving boundary continuity. Furthermore, a collaborative constraint mechanism is proposed for dual-branch decoding, which supervises the two decoders using boundary loss and skeleton loss, thereby enhancing structural consistency and topology preservation. Experimental results demonstrate that DCFENet achieves precision, recall, and boundary IoU of 74.5%, 68.1%, and 77.4%, respectively, representing improvements of 26.8%, 36.3%, and 13.2% over ResNet18_UNet. It also outperforms mainstream methods such as UNet, EdgeNAT, and EDTER. In terms of computational efficiency, DCFENet contains 26.43 M parameters and 37.43 G FLOPs, with a memory usage of 1.03 GB and an inference speed of 97.97 FPS, achieving a good balance between accuracy and efficiency. The results demonstrate the efficiency and accuracy of DCFENet in extracting farmland boundaries from high-resolution remote sensing images, providing technical support for farmland management and the advancement of precision and digital agriculture.

1. Introduction

Farmland resources, as fundamental resources for human survival, play a vital role in ensuring global food security and maintaining ecological balance [1,2,3]. However, rapid population growth and accelerating urbanization are imposing unprecedented pressure on farmland resources. According to the Food and Agriculture Organization (FAO) of the United Nations, approximately 5 to 10 million hectares of high-quality farmland are permanently lost each year due to urban expansion, industrial development, and environmental degradation [4]. Therefore, accurately monitoring farmland boundary changes is essential for farmland resource management, food security, agricultural modernization, and sustainable rural development [5,6,7].
Early farmland boundary monitoring relied primarily on manual field surveys and GPS measurements [8], which are inefficient and often constrained by terrain. With the development of satellite sensor technologies, remote sensing has been widely applied in agricultural monitoring, land-use change analysis, and natural resource management [9]. Satellite imagery provides large-scale, multi-temporal surface information and has become an important tool for farmland boundary detection [10]. However, due to the complex and variable nature of farmland boundary environments, accurately extracting boundary information from remote sensing imagery remains a challenging task [11,12].
With the development of convolutional neural networks (CNNs), deep learning has become the mainstream technical paradigm for semantic segmentation and instance segmentation. Compared to traditional boundary extraction methods based on manually designed features, CNNs are capable of automatically learning discriminative deep semantic features and demonstrate superior boundary representation and segmentation capabilities in complex scenes [13,14]. Recent studies have shown that deep learning provides strong technical support for accurate boundary segmentation, especially in tasks requiring precise boundary representation and structural consistency [15,16].
Although deep learning-based methods have made some progress in the task of farmland boundary segmentation, they still face many challenges. Firstly, traditional methods often emphasize pixel-level overlap metrics during training, potentially at the expense of boundary continuity and structural integrity, leading to discontinuities in the predicted boundaries. Secondly, in complex farmland scenes that include multi-scale features such as ridges and ditches, a single-branch network struggles to fully capture crucial structural information, resulting in blurred boundaries and loss of details, thereby affecting overall segmentation performance. Moreover, due to the complex shape of farmland boundaries and their susceptibility to background interference, boundary detection accuracy and structural consistency modeling are insufficient, making it difficult to achieve stable boundary detection.
To address the aforementioned issues, a dual-branch collaborative feature enhancement network, DCFENet, is proposed. This network uses ResNet18 as the encoder backbone to extract feature information from farmland scenes. To enhance the network’s multi-scale feature extraction capability, a multi-scale feature fusion attention module, TA-ASPP, is designed to improve the perception of boundaries at different scales. Concurrently, a dual-branch feature decoding structure is constructed to enhance the precise detection and continuous modeling of farmland boundaries. Based on this dual-branch structure, boundary and skeleton loss functions are designed for the two decoders, enabling synergistic optimization of segmentation results across both boundary localization and structural representation. This approach aims to improve the accuracy, continuity, and stability of farmland boundary segmentation. The main contributions of this work are as follows:
(1)
A multi-scale feature fusion attention module, TA-ASPP, is proposed. By fusing features of different scales, it enhances the network’s adaptability to complex farmland image scenes, enabling more precise boundary localization.
(2)
A dual-branch decoding structure is proposed. By constructing a boundary decoder and a skeleton decoder, the network’s abilities for boundary localization and boundary skeleton topology modeling are enhanced, respectively, effectively strengthening the continuity modeling of fractured boundaries.
(3)
A constraint mechanism tailored for the dual-branch structures is proposed. Boundary loss and skeleton loss are designed to collaboratively optimize the dual-branch networks, enabling simultaneous enhancement of boundary localization and structural integrity.
(4)
A high-resolution farmland boundary dataset is constructed, providing a data basis for research on farmland boundary detection and for validating the effectiveness of models.
The remainder of this paper is organized as follows: Section 2 reviews related work on farmland boundary extraction and segmentation. Section 3 describes the dataset construction process, the proposed DCFENet architecture, the loss functions, and the experimental settings. Section 4 presents the experimental results, including ablation studies, comparative experiments, efficiency analysis, and quantitative and qualitative evaluations. Section 5 discusses the experimental findings, summarizes the main conclusions, and outlines the limitations and future research directions.

2. Related Work

2.1. Traditional Farmland Boundary Extraction Methods

Traditional farmland boundary detection methods primarily rely on edge detection and region segmentation techniques. Edge detection methods commonly utilize classical gradient or multi-scale-based operators, including Sobel, Canny, and Scharr, to capture grayscale variations and spatial texture variations in remote sensing imagery. By computing the gradient magnitude of local pixel neighborhoods, these methods highlight regions with significant intensity or texture variations, thereby identifying potential farmland boundaries [17,18]. These methods are computationally efficient and straightforward to implement, making them suitable for initial boundary detection [12]. Nevertheless, in complex or noisy environments, or when plot edges are blurred, the performance of these methods often deteriorates, leading to fragmented or spurious boundaries and limiting their suitability for accurate farmland boundary delineation [19]. Region segmentation methods typically employ multi-resolution segmentation (MRS), graph-based segmentation, and watershed algorithms to delineate plot boundaries. By analyzing spectral similarity and spatial proximity among pixels, these methods progressively aggregate similar pixels into plot objects with high internal consistency [20,21,22]. Compared with gradient-based edge detection methods, region segmentation techniques are more effective at maintaining plot contours and boundary continuity, thus facilitating the precise delineation of complete farmland boundaries. However, such region segmentation methods are prone to over- or under-segmentation in cases of high internal spectral variation or when adjacent plots display similar spectral characteristics. Additionally, these methods typically rely on scene-specific thresholds, which limits their transferability and general applicability [23]. Some studies integrate edge and region features within hybrid segmentation frameworks, but still require complex post-processing to preserve boundary continuity, constraining their applicability in automated scenarios [24].

2.2. Deep Learning Methods for Boundary Segmentation

In recent years, deep learning algorithms such as U-Net have been widely applied to land parcel segmentation and boundary detection tasks in the field of agricultural remote sensing. Zhang et al. [25] proposed a U-Net-based farmland boundary detection method that combines low-level and deep semantic features and incorporates a boundary connection strategy to produce complete field boundaries. Experimental results show that it achieves an overall accuracy of 89.28%, significantly outperforming the conventional U-Net and OBIA methods. Wu et al. [26] proposed a UAV-based multispectral and deep learning method for cropland parcel extraction and mapping. It utilizes multi-temporal multispectral UAV data from representative southern regions, employs an improved U-Net for automatic boundary extraction, and integrates spectral, texture, and topographic features to enhance the model’s discriminative capability. Experimental results indicate an average overall accuracy of 96.9%, a kappa coefficient of 0.895, and a maximum IoU of 0.957, demonstrating its ability to delineate parcel boundaries under complex backgrounds and fragmented cropland conditions. To address the challenges of small and fuzzy farmland parcels, Persello et al. [27] proposed a boundary detection method that combines SegNet-based contour detection with a grouping strategy. The method first detects sparse field contours using SegNet and then employs an oriented watershed transform along with hierarchical segmentation information to produce the final segmentation. Lu et al. [28] proposed AttMobile-DeepLabV3+, which integrates MobileNetV2 and CBAM into DeepLabV3+ to enhance boundary feature extraction.
To overcome the limitations of conventional CNNs in modeling long-range dependencies, several studies have introduced transformer-based architectures into farmland boundary extraction. Li et al. [29] proposed BAFormer, a UNetFormer-based network for cropland boundary extraction. Through the FAM module, the network adaptively enhances high-frequency boundary features and utilizes DWLK-MLP to integrate large receptive field boundary positional information, effectively improving the network’s overall feature extraction. Experimental results show that BAFormer-T achieves an F1 score of 91.3% and an mIoU of 84.1% on the Vaihingen dataset. Huang et al. [30] proposed DEDANet, a transformer-based deep segmentation model for mountainous cropland extraction. This model enhances high-frequency boundary features via a detail enhancement module and adaptively models pixel-wise spatial relationships using a distance-decay transformer. Experimental evaluation indicates that DEDANet achieves an overall accuracy of 95.76%, an F1 score of 95.15%, and an IoU of 90.79%. The model significantly enhances boundary recognition in challenging terrain. With the development of generative models, diffusion models have also been applied to improve feature representation granularity and robustness. Shi et al. [31] proposed LULC-SegNet, a semantic segmentation network that incorporates denoising diffusion features for land use and land cover (LULC) imagery. Within this network, semantic features extracted by a DDPM decoder are integrated into a CNN segmentation framework and fused via clustering and spatial attention, resulting in enhanced boundary clarity and contour continuity and achieving an average IoU of 80.25%.

2.3. Multi-Task and Pixel-Wise Instance Segmentation Methods

Besides pure CNN- or Transformer-based approaches, some studies have explored multi-task learning and boundary-aware modeling to improve farmland boundary extraction. Liu et al. [32] proposed a cultivated parcel vectorization framework based on a boundary-parcel multi-task learning model, which combines region, boundary, and distance tasks to improve region separability and boundary connectivity. Ren et al. [33] proposed CAFM-Net, which integrates CNN and transformer branches to enhance global–local feature interaction and boundary delineation. Wang et al. [34] proposed DAENet for the extraction of croplands from high-resolution satellite imagery. This method combines channel attention, multi-scale spatial attention, and boundary supervision mechanisms to effectively reduce the occurrence of disjointed patches and boundary failures in complex scenes. It achieved 96.36% and 98.11% for IoU and F1 score, respectively, outperforming other mainstream semantic segmentation models.
Despite these advances, existing methods still have limitations in preserving boundary continuity and representing fine structural details in complex farmland scenes. Therefore, improving both boundary accuracy and structural consistency remains an important research issue in farmland boundary extraction.

3. Materials and Methods

3.1. Dataset Construction

3.1.1. Study Area

The remote sensing data used in this study were acquired from Zhejiang Province, China, as shown in Figure 1. The study area covered five prefecture-level cities (Hangzhou, Huzhou, Zhoushan, Jiaxing, and Shaoxing) with a geographic range of (28°51′ N–31°11′ N, 119°14′ E–121°57′ E). Those regions are characterized by a subtropical monsoon climate with four distinct seasons, an annual mean temperature of 15–18 °C, and total annual precipitation ranging from 1100 to 1600 mm. The terrain is dominated by plains and hills and constitutes an important grain production region in the Yangtze River Delta. Farmland types are diverse, including paddy fields, drylands, and protected-environment agriculture. The boundaries exhibit complex forms, ranging from narrow traditional field ridges to standardized farmland roads and broad machinery-accessible paths. This landscape provides a representative scenario for validating multi-scale boundary detection algorithms.

3.1.2. Dataset Preparation

In this study, high-resolution remote sensing imagery of the study area was acquired by the GaoFen-2 (GF-2) satellite. GF-2 is the first domestically developed civilian optical remote sensing satellite in China, which provides sub-meter spatial resolution with a 0.8 m panchromatic camera. The imagery used in this study was obtained through authorized commercial channels. To ensure image quality, the raw data were processed with radiometric calibration, atmospheric correction, and geometric refinement using ground control points to ensure radiometric consistency and spatial accuracy for analysis. The final planimetric positioning error was controlled within two pixels. The processed images were converted to three-band true color with 8-bit pixel depth and a spatial resolution of approximately 0.955 m. Pixel-level manual annotations were performed exclusively for cropland boundaries, with non-cropland areas left unmarked, ensuring that all annotations are strictly limited to the cropland boundary detection task. The annotation process was carried out by three professional remote sensing researchers using ArcGIS 10.8. A total of 782 image samples with a resolution of 1000 × 1000 pixels were collected and split into a training set (680 images) and a test set (102 images). All images were split without any spatial overlap to ensure a fair evaluation of model generalization. Examples of the images and their corresponding annotations are shown in Figure 2.
Data augmentation operations are executed strictly in a fixed sequence. This approach enhances the model’s generalization capability and robustness in complex remote sensing scenarios, which are characterized by diverse field shapes, orientations, and background clutter, while also effectively mitigating overfitting during training. First, random cropping is applied with a fixed size of 512 × 512 pixels, ensuring that each cropped sub-image contains boundary pixels. This strengthens the model’s ability to capture local boundary features and adapt to boundaries at varying positions. Subsequently, random horizontal flipping and random vertical flipping are applied, each with a probability of 0.5. This randomization alters the spatial orientation of images, enhancing the model’s robustness to variations in boundary direction. Finally, random rotation is applied, with rotation angles selected from 0°, 90°, 180°, or 270°. This further diversifies the spatial pose of images, strengthening the model’s adaptability to boundaries at different rotational angles. These geometric transformations are applied in unison to both the input images and their corresponding boundary labels to maintain annotation accuracy. Examples of these augmentations applied to the images and corresponding labels are shown in Figure 3.
During the training phase, each original image is sampled 4 times within a single training epoch, with each sampled image undergoing distinct data augmentation processing to generate varied augmented outputs. This approach effectively expands the training sample size without requiring additional original images, thereby providing the model with richer and more diverse training data. This facilitates enhanced model generalization ability and stability when handling complex remote sensing scenarios.
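For concreteness, a minimal sketch of this paired augmentation pipeline is given below, assuming PyTorch tensors for the image and its boundary label; the retry cap used to guarantee boundary-containing crops is an illustrative assumption rather than a detail reported above.

```python
import random
import torchvision.transforms.functional as TF

def paired_augment(image, label, crop=512):
    """Apply identical geometric augmentations to an image [3, H, W]
    and its binary boundary label [1, H, W], in the fixed order:
    crop -> flips -> 90-degree rotation."""
    # Random 512x512 crop, retried until the patch contains boundary pixels
    # (the retry cap of 10 is an assumption; the paper only requires that
    # every crop contain boundary pixels).
    for _ in range(10):
        top = random.randint(0, image.shape[1] - crop)
        left = random.randint(0, image.shape[2] - crop)
        lab = TF.crop(label, top, left, crop, crop)
        if lab.sum() > 0:
            break
    img = TF.crop(image, top, left, crop, crop)

    # Random horizontal / vertical flips, each with probability 0.5.
    if random.random() < 0.5:
        img, lab = TF.hflip(img), TF.hflip(lab)
    if random.random() < 0.5:
        img, lab = TF.vflip(img), TF.vflip(lab)

    # Random rotation by a multiple of 90 degrees.
    k = random.choice([0, 1, 2, 3])
    if k:
        img, lab = img.rot90(k, dims=(1, 2)), lab.rot90(k, dims=(1, 2))
    return img, lab
```

In this setup, drawing each original image four times per epoch simply means calling paired_augment four times on the same source tile, each call producing a different augmented view.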

3.2. The Proposed DCFENet

3.2.1. Overall Architecture

To address the challenges of low boundary pixel proportion, large-scale variation, and weak boundary continuity in remote sensing images, a dual-branch collaborative feature enhancement network, DCFENet, is proposed for farmland boundary detection, as shown in Figure 4. DCFENet adopts ResNet18 as the encoder to extract hierarchical features from low-level details to high-level semantics. Between the encoder and decoders, a topology-aware multi-scale feature fusion module, TA-ASPP, is introduced to capture contextual information at multiple scales and enhance structurally critical regions under the guidance of boundary skeleton features.
For decoding, DCFENet employs two parallel branches: the boundary decoder for pixel-level boundary localization and the skeleton decoder for boundary skeleton topology modeling. The skeleton decoder progressively restores spatial resolution and provides intermediate skeleton features for cross-branch fusion, while the boundary decoder integrates gated skip features from the encoder and residual features from the skeleton decoder to refine boundary prediction. To supervise the two branches, boundary loss and skeleton loss are jointly optimized through weighted summation, enabling the network to improve both boundary accuracy and structural coherence during training.
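To make the data flow concrete, the following is a minimal PyTorch sketch of this forward pass. The TA-ASPP and decoder internals are stubs detailed in the next subsections, and feeding the shallowest encoder feature to TA-ASPP as the skeleton guidance is an assumption, since the exact source of the shallow skeleton feature is described above only at the feature level.

```python
import torch.nn as nn
from torchvision.models import resnet18

class DCFENetSketch(nn.Module):
    """Schematic forward pass of DCFENet: a shared ResNet18 encoder,
    the TA-ASPP fusion module, and parallel skeleton/boundary decoders.
    The three sub-modules are passed in as stubs; this is an illustrative
    skeleton, not the authors' released code."""

    def __init__(self, ta_aspp, skeleton_decoder, boundary_decoder):
        super().__init__()
        r = resnet18(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)   # X1: H/2
        self.stages = nn.ModuleList([
            nn.Sequential(r.maxpool, r.layer1),             # X2: H/4
            r.layer2,                                       # X3: H/8
            r.layer3,                                       # X4: H/16
            r.layer4,                                       # X5: H/32
        ])
        self.ta_aspp = ta_aspp
        self.skel_dec = skeleton_decoder
        self.bnd_dec = boundary_decoder

    def forward(self, img):
        feats = [self.stem(img)]
        for stage in self.stages:
            feats.append(stage(feats[-1]))                  # X1..X5
        # Using the shallowest encoder feature as F_skel is an assumption.
        x5 = self.ta_aspp(feats[-1], feats[0])
        skel_map, s_feats = self.skel_dec(x5, feats[:4])    # skeleton branch
        bnd_map = self.bnd_dec(x5, feats[:4], s_feats)      # boundary branch
        return bnd_map, skel_map
```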

3.2.2. TA-ASPP

In the task of farmland boundary detection, accurately capturing multi-scale contextual information and key structural features is crucial for enhancing model performance. Although the ASPP has certain advantages in multi-scale information extraction, its ability to perceive structural information is still limited in complex farmland environments. Inspired by MPF-Net [35] and DeepIndices [36], a topology-aware multi-scale feature fusion module, TA-ASPP, is introduced, as illustrated in Figure 5.
TA-ASPP first performs multi-scale feature extraction on the deep features $X_5$ from the encoder using four parallel convolutional branches with dilation rates of {1, 6, 12, 18}. The outputs from the four branches are concatenated along the channel dimension, forming a complete multi-scale feature representation that encompasses details from local features to global semantic information. Subsequently, the concatenated features undergo dimensionality reduction via 1 × 1 convolutions, integrating multi-scale information into a unified feature representation while enhancing expressive ability through nonlinear transformations.
While acquiring multi-scale features, the TA-ASPP module introduces a skeleton-aware attention mechanism that utilizes shallow boundary skeleton features $F_{skel}$ to guide and enhance these multi-scale features. This mechanism first maps the number of channels in the shallow boundary skeleton feature $F_{skel}$ to the same dimension as the multi-scale features via a 1 × 1 convolution. It then applies a sigmoid activation to generate an attention weight map with values ranging from 0 to 1. Subsequently, the attention weight map is upsampled to the same spatial resolution as the multi-scale features, achieving spatial alignment.
In terms of feature fusion methods, TA-ASPP employs a residual modulation strategy to enhance the structural integrity of multi-scale features. The calculation formula is as follows:
$$F_{out} = F_{base} \odot (1 + \tau \cdot A_{up})$$

where $\odot$ denotes element-wise multiplication, $A_{up} \in [0, 1]$ represents the boundary skeleton attention weight, and $F_{base}$ represents the multi-scale features. $\tau$ is the modulation coefficient, set to 0.2 in this study to provide mild structural guidance without overwhelming the multi-scale contextual representation.
When the attention weight approaches 1, the corresponding skeleton-indicated region receives moderate feature enhancement; when the attention weight approaches 0, the corresponding non-skeleton region retains its features largely unchanged. It enables the network to better adapt to the multi-scale morphological features of cultivated land by introducing the synergistic interaction between multi-scale dilated convolution and skeleton attention mechanisms, thereby providing more advantageous feature representations for accurate boundary detection of cultivated land.
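A minimal sketch of TA-ASPP under these definitions is shown below. The dilation rates {1, 6, 12, 18} and the modulation coefficient τ = 0.2 follow the description above, while the branch widths and normalization choices are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAASPP(nn.Module):
    """Sketch of TA-ASPP: four parallel dilated branches, 1x1 fusion,
    and residual modulation by a skeleton-aware attention map."""

    def __init__(self, in_ch, skel_ch, out_ch, tau=0.2):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d)
            for d in (1, 6, 12, 18)])
        self.fuse = nn.Sequential(
            nn.Conv2d(4 * out_ch, out_ch, 1),
            nn.BatchNorm2d(out_ch), nn.ReLU())
        self.attn = nn.Conv2d(skel_ch, out_ch, 1)  # maps F_skel to attention logits
        self.tau = tau

    def forward(self, x5, f_skel):
        # Multi-scale context: concatenate the four dilated responses, then reduce.
        f_base = self.fuse(torch.cat([b(x5) for b in self.branches], dim=1))
        # Skeleton-aware attention: 1x1 conv + sigmoid, resampled to x5's size.
        a = torch.sigmoid(self.attn(f_skel))
        a_up = F.interpolate(a, size=f_base.shape[-2:], mode='bilinear',
                             align_corners=False)
        # Residual modulation: F_out = F_base * (1 + tau * A_up).
        return f_base * (1 + self.tau * a_up)
```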

3.2.3. Dual-Branch Structure

To simultaneously improve boundary localization accuracy and structural consistency, DCFENet adopts a shared-encoder dual-decoder architecture: based on the features extracted by the shared encoder, the boundary decoder performs pixel-level boundary prediction, while the skeleton decoder learns the global structural topology. Meanwhile, cross-branch feature interaction is introduced during decoding; unlike the standard single-task propagation mechanisms of FPN and PANet, this yields more efficient and task-oriented gradient flow.
The Skeleton Decoder
The skeleton decoder primarily learns farmland global skeleton structural features through a four-stage upsampling pyramid architecture, as illustrated in Figure 6. The skeleton decoder takes the high-level semantic features $x_5 \in \mathbb{R}^{\frac{H}{32} \times \frac{W}{32} \times C_5}$, enhanced by TA-ASPP, as input. It sequentially upsamples the features at each stage through a top-down process, and then fuses them with the features $X_4, X_3, X_2, X_1$ from the corresponding encoder layers via skip connections. This achieves cross-level information exchange and feature supplementation. The feature fusion process in the i-th stage can be expressed as follows:

$$\tilde{x}_i = U(x_{i+1})$$

$$s_i = F(\tilde{x}_i, x_i)$$

where $U(\cdot)$ denotes upsampling, which employs bilinear interpolation in this paper, and $F(\cdot)$ represents feature fusion.
It progressively restores spatial resolution and outputs four hierarchical levels of boundary skeleton features {s1, s2, s3, s4}. Here, s4, s3 and s2 are intermediate features, while s1 represents the output-layer feature. After upsampling and sigmoid activation, s1 generates a boundary probability map matching the original image’s resolution, reconstructing the farmland skeletal structure. This approach progressively incorporates low-level spatial detail while preserving high-level semantic information, enabling the refined reconstruction of cultivated land boundaries. To ensure semantic consistency between the two branches, skeleton decoding shares information with the boundary decoder while performing boundary reconstruction. Specifically, the structural features s1 and s2 are output as intermediate features to the boundary decoder for cross-branch fusion. This process guides the boundary decoder to learn more global boundary feature information by incorporating the overall structural feature information.
To improve robustness to fragmented boundaries during training, different strategies are used in training and inference, as shown in Figure 6. During training, a 3 × 3 max-pooling operation is applied to the skeleton prediction to slightly enlarge the boundary response, improving tolerance to local discontinuities. During inference, the original prediction is used directly to preserve boundary precision.
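This asymmetric behavior can be expressed compactly as follows (a sketch; the 3 × 3 max-pooling with stride 1 is applied only in training mode):

```python
import torch
import torch.nn.functional as F

def skeleton_head(s1_logits, training):
    """Train/inference asymmetry described above: a 3x3 max-pool mildly
    dilates the skeleton response during training to tolerate small breaks;
    at inference the raw prediction is kept to preserve boundary precision."""
    prob = torch.sigmoid(s1_logits)
    if training:
        prob = F.max_pool2d(prob, kernel_size=3, stride=1, padding=1)
    return prob
```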
The Boundary Decoder
The boundary decoder performs precise prediction of cultivated land boundaries, employing a four-stage upsampling structure symmetrical to the skeleton decoder. Its module structure is shown in Figure 7.
Starting from the TA-ASPP-enhanced feature $x_5 \in \mathbb{R}^{\frac{H}{32} \times \frac{W}{32} \times C_5}$, the decoder progressively restores spatial resolution. At each stage, the upsampled feature is fused with encoder features modulated by a structure-aware gate, so that boundary-related responses are enhanced and background interference is suppressed.
In the first two high-level decoding stages, the structural features s1 and s2 from the skeleton decoder’s output are introduced, and a residual-weighted fusion strategy is employed to facilitate cross-branch feature interaction. Let the boundary feature of stage k be $b_k$; then, the fusion process can be represented as follows:

$$b_k = [U(b_{k+1}), G(x_k^g)] + \alpha \cdot C_k(s_k), \quad k = 1, 2$$

where $[\cdot, \cdot]$ denotes feature concatenation, $G(\cdot)$ represents the structure-aware gate, $C_k(\cdot)$ indicates the channel transformation operation applied to skeleton features, and $\alpha$ is a learnable fusion weight parameter used to adjust the contribution of cross-branch information.
The structure of the structure-aware gate $G(\cdot)$ is shown in Figure 7, which is used to enhance the target skip feature from the encoder. Specifically, it first downsamples the boundary skeleton feature $F_{skel}$ to the same spatial resolution as the target skip feature; then, it maps the channel dimension through a 1 × 1 convolution and generates a soft attention weight map via sigmoid activation. Finally, the weight map is multiplied element-wise with the original skip feature to enhance features in regions relevant to the boundary skeleton while suppressing those in irrelevant regions, yielding the gated skip connection feature $x_k^g$. The computational process can be expressed as follows:

$$x_k^g = x_k \odot \sigma\left(W_k^{1 \times 1} \cdot U(F_{skel})\right), \quad k = 1, 2, 3, 4$$

where $W_k^{1 \times 1}$ denotes a 1 × 1 convolution, $U(\cdot)$ aligns the spatial resolution, and $\sigma$ represents the sigmoid activation function.
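A minimal sketch of this gate, directly following the equation above, could look as follows (channel widths are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAwareGate(nn.Module):
    """Sketch of the structure-aware gate G(.): resample F_skel to the skip
    feature's resolution, map channels with a 1x1 conv, squash to [0, 1]
    with a sigmoid, and modulate the skip feature element-wise."""

    def __init__(self, skel_ch, skip_ch):
        super().__init__()
        self.proj = nn.Conv2d(skel_ch, skip_ch, 1)

    def forward(self, x_skip, f_skel):
        f = F.interpolate(f_skel, size=x_skip.shape[-2:], mode='bilinear',
                          align_corners=False)
        gate = torch.sigmoid(self.proj(f))   # soft attention weights in [0, 1]
        return x_skip * gate                 # x_k^g = x_k (.) sigma(W(U(F_skel)))
```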
Through gated skip fusion and cross-branch structural guidance, the boundary decoder is able to recover fine boundary details while preserving geometric consistency. Overall, the two decoders form a collaborative framework in which the skeleton decoder provides topology-aware structural priors and the boundary decoder focuses on precise boundary localization.

3.2.4. Bibranch Loss

Skeleton Loss
In the task of cropland boundary detection, existing methods often face two major challenges: first, the imbalance between boundary pixels and background pixels; second, the difficulty of ensuring the topological connectivity of boundaries. To address these challenges, we adopt a skeleton loss that combines focal loss and clDice. This design is motivated by the specific requirements of the skeleton detection subtask. It improves structural integrity and boundary continuity by working jointly with boundary loss in the optimization process.
To alleviate the severe class imbalance problem, focal loss is adopted and defined as follows:
$$L_{Focal} = -\alpha (1 - p)^{\gamma} \log(p)$$
where α is a balancing factor that equalizes the contribution of positive and negative samples to the loss; p represents the probability that the model predicts a sample as positive; and γ is a tuning parameter used to adjust the weight difference between difficult and easy samples.
Focal loss enables the model to focus more on hard samples while reducing the excessive influence of easy negatives on the loss value. This allows the model to learn the features of boundary pixels more effectively during training.
Unlike evaluation metrics that focus solely on pixel-level overlap, clDice emphasizes the topological connectivity of the constrained boundary skeleton. It measures the topological consistency of predictions by comparing whether the predicted boundary skeleton falls within the ground truth boundary label (GT) and whether the ground truth boundary skeleton is covered by the prediction. Its formula is as follows:
$$clDice = \frac{2 \times T_{prec} \times T_{sens}}{T_{prec} + T_{sens}}$$

$$T_{prec} = \frac{|S(P) \cap G|}{|S(P)|}$$

$$T_{sens} = \frac{|S(G) \cap P|}{|S(G)|}$$

where $G$ and $P$ represent the ground truth and the model prediction, respectively. $S(P)$ denotes the boundary skeleton pixels extracted from the prediction, while $S(G)$ refers to the boundary skeleton pixels in the ground truth. $T_{prec}$ is a topological metric related to precision, and $T_{sens}$ is a topological metric related to sensitivity.
clDice effectively ensures that the predicted boundary skeleton maintains structural coherence and consistency with the true boundary skeleton. Skeleton loss employs a weighted fusion of focal loss and clDice, as defined by the following formula:
$$L_{skeleton} = L_{Focal} + (1 - clDice)$$
Under this combination approach, skeleton loss not only alleviates training difficulties caused by sparse boundary skeleton pixels but also enhances the topological connectivity of the network’s predicted boundary skeleton.
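A sketch of this combined loss is given below. The soft skeletonization follows the differentiable clDice formulation of Shit et al.; the focal-loss hyperparameters and the iteration count are common defaults rather than values reported in this paper.

```python
import torch
import torch.nn.functional as F

def _soft_erode(x):
    return -F.max_pool2d(-x, 3, stride=1, padding=1)

def _soft_open(x):
    return F.max_pool2d(_soft_erode(x), 3, stride=1, padding=1)

def soft_skeleton(x, n_iter=10):
    """Differentiable skeletonization via iterative soft morphology
    (after the clDice paper; the iteration count is an assumption)."""
    skel = F.relu(x - _soft_open(x))
    for _ in range(n_iter):
        x = _soft_erode(x)
        delta = F.relu(x - _soft_open(x))
        skel = skel + F.relu(delta - skel * delta)
    return skel

def skeleton_loss(logits, gt, alpha=0.25, gamma=2.0, eps=1e-6):
    """L_skeleton = L_focal + (1 - clDice) for [B, 1, H, W] tensors."""
    p = torch.sigmoid(logits)
    # Binary focal loss: down-weights easy pixels, balances pos/neg samples.
    pt = torch.where(gt > 0.5, p, 1 - p)
    a_t = torch.where(gt > 0.5, torch.full_like(p, alpha),
                      torch.full_like(p, 1 - alpha))
    focal = (-a_t * (1 - pt) ** gamma * torch.log(pt.clamp(min=eps))).mean()
    # clDice: skeleton-level topological precision and sensitivity.
    sp, sg = soft_skeleton(p), soft_skeleton(gt)
    t_prec = (sp * gt).sum() / (sp.sum() + eps)
    t_sens = (sg * p).sum() / (sg.sum() + eps)
    cl_dice = 2 * t_prec * t_sens / (t_prec + t_sens + eps)
    return focal + (1 - cl_dice)
```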
Boundary Branch Supervision Loss
Traditional loss functions often struggle to balance pixel-level accuracy, overall regional coverage, and the geometric rationality of boundaries. Consequently, the segmentation results tend to have blurred boundaries, local discontinuities, or overly coarse predictions. To address these issues, boundary branch supervision loss is adopted. This loss function comprises three components: BCE (binary cross-entropy) loss, dice loss, and thin penalty. BCE and dice loss are used to improve pixel-level classification accuracy and regional coverage, while thin penalty constrains excessive boundary thickness.
BCE loss measures the degree of discrepancy between a model’s predicted probability distribution and the ground truth distribution. In the current task, BCE loss performs per-pixel classification judgments on remote sensing imagery, ensuring that the boundary segmentation task achieves a fundamental level of accuracy. Additionally, it stabilizes the model’s prediction probabilities, and can be expressed as follows:
$$L_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(y_i) + (1 - y_i) \log\left(1 - p(y_i)\right) \right]$$

where $y_i$ is the binary label (0 or 1) of the i-th pixel, $p(y_i)$ is the predicted probability that the pixel belongs to the boundary class, and $N$ denotes the total number of pixels.
Dice loss is calculated based on the dice coefficient and can be formulated as follows:
$$L_{Dice} = 1 - \frac{2|P \cap G|}{|P| + |G|}$$

where $P$ represents the boundary region predicted by the model, while $G$ denotes the true boundary region.
This loss function aims to optimize the intersection over union (IoU) metric, thereby enhancing the model’s recall in boundary regions and enabling it to identify as many true boundary pixels as possible. In addition, it can mitigate the issue of class imbalance to some extent, prompting the model to focus more on overall regional overlap rather than the classification accuracy of individual pixels. This guides the model to prioritize comprehensive coverage, thereby enhancing the overall performance of boundary detection.
The thin penalty constrains the mean probability of pixels in the predicted boundary region. Since true boundary pixels account for only a very small proportion of the image, the average prediction probability should remain low. By penalizing excessively large predicted regions, the model is prevented from misclassifying too many non-boundary pixels as boundaries, thereby implicitly constraining boundary width without requiring explicit connected component analysis. Consequently, the thin penalty refines the predicted boundaries and improves model stability. Its mathematical expression is as follows:
$$L_{thin} = \frac{1}{HW} \sum_{i,j} p_{ij}$$

where $H$ and $W$ denote the height and width of the feature map, respectively, while $p_{ij}$ represents the prediction probability at pixel position $(i, j)$.
The boundary loss is a weighted combination of the three components described above, and can be formulated as follows:
$$L_{boundary} = L_{BCE} + L_{Dice} + 0.5 \times L_{thin}$$
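A minimal sketch of this combined boundary loss, computed over [B, 1, H, W] tensors:

```python
import torch
import torch.nn.functional as F

def boundary_loss(logits, gt, thin_weight=0.5, eps=1e-6):
    """L_boundary = L_BCE + L_Dice + 0.5 * L_thin, as defined above."""
    p = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, gt)
    # Dice loss: 1 - 2|P ∩ G| / (|P| + |G|), computed over the whole batch.
    dice = 1 - (2 * (p * gt).sum() + eps) / (p.sum() + gt.sum() + eps)
    # Thin penalty: the mean predicted probability stays small only when
    # predictions are confined to genuinely thin boundary regions.
    thin = p.mean()
    return bce + dice + thin_weight * thin
```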
The final farmland boundary prediction results are generated by merging features from the boundary branch and the skeleton branch. To enhance the detection accuracy of the model in boundary regions, this paper introduces multiple loss functions for joint optimization when calculating the loss between predicted results and ground truth. Specifically, to measure the overlap between predicted segmentation regions and actual regions, and to mitigate the scarcity of foreground pixels relative to background pixels in remote sensing imagery, the model employs BCE loss, dice loss, and a false-positive penalty. The false-positive penalty, which penalizes false-positive predictions, is defined as follows:
$$L_{FP} = \frac{1}{N} \sum_{i=1}^{N} \max(0, p_i - g_i)$$

where $N = H \times W$ denotes the total number of pixels, and $p_i$ and $g_i$ represent the predicted probability and ground truth of the i-th pixel, respectively.
Based on this, the overall loss function of DCFENet is defined as follows:
$$L_{total} = \lambda_b L_{boundary} + \lambda_s L_{skeleton} + \lambda_{dice} L_{Dice} + \lambda_{bce} L_{BCE} + \lambda_{fp} L_{FP}$$

where $L_{boundary}$ and $L_{skeleton}$ represent the boundary branch loss and skeleton branch loss, respectively; $L_{Dice}$ and $L_{BCE}$ denote dice loss and binary cross-entropy loss, respectively; and $L_{FP}$ is the false-positive penalty. In the experiments, the weights were set as $\lambda_b = 0.5$, $\lambda_{bce} = 0.3$, and $\lambda_{fp} = 0.2$. Skeleton loss and dice loss employed a dynamic weighting strategy: during the first 100 epochs of training, their weights $\lambda_s$ and $\lambda_{dice}$ increased linearly with training progress, then remained stable. This approach ensured the stability of the primary task learning while progressively strengthening the constraints imposed by boundary structure information on model optimization.
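The overall objective with this schedule can be sketched as follows. The final ramp targets for λ_s and λ_dice are assumptions, as only the fixed weights and the linear warm-up over the first 100 epochs are specified above.

```python
import torch

def fp_penalty(p, g):
    # L_FP = (1/N) * sum(max(0, p_i - g_i)): penalizes confident false positives.
    return torch.relu(p - g).mean()

def total_loss(l_boundary, l_skeleton, l_dice, l_bce, l_fp, epoch,
               ramp_epochs=100, lam_b=0.5, lam_bce=0.3, lam_fp=0.2,
               lam_s_max=1.0, lam_dice_max=1.0):
    """L_total with dynamic weighting: lambda_s and lambda_dice ramp
    linearly over the first `ramp_epochs` epochs, then hold steady."""
    ramp = min(epoch / ramp_epochs, 1.0)
    return (lam_b * l_boundary
            + ramp * lam_s_max * l_skeleton
            + ramp * lam_dice_max * l_dice
            + lam_bce * l_bce
            + lam_fp * l_fp)
```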

3.3. Experimental Environment

The experiments were conducted under the following configuration. The hardware consisted of an AMD Ryzen 9 5900HX CPU, an NVIDIA GeForce RTX 3070 GPU, and 24 GB of RAM. The software environment includes Windows 11 (64-bit), the deep learning framework PyTorch 2.0.0, the parallel computing platform CUDA 12.1, and Python 3.11. During model training, the Adam optimizer was used to update parameters, with an initial learning rate of 0.01, a batch size of 16, and 200 training epochs.

3.4. Evaluation Metrics

To systematically evaluate the model’s performance in the binary classification task of farmland boundary detection, this study selected six complementary evaluation metrics to analyze model performance from different dimensions. These metrics include precision, recall, F1 score, boundary IoU, OIS, and ODS.
Precision measures the proportion of predicted boundary pixels that are correctly classified, indicating the accuracy of boundary predictions. Recall measures the proportion of true boundary pixels that are correctly detected by the model. The F1 score is a harmonic mean of precision and recall, providing more stable performance evaluation under class imbalance. Boundary IoU evaluates the spatial overlap between predicted and ground-truth boundaries, providing an interpretable measure of boundary alignment. OIS calculates the F1 score by selecting the optimal threshold for each image and averaging the results, reflecting the model’s best performance on individual images and its robustness. ODS calculates the F1 score using a single optimal threshold across the entire dataset, reducing sensitivity to threshold selection and providing an objective assessment of overall performance. The calculations of these metrics are as follows:
$$Precision\ (P) = \frac{TP}{TP + FP}$$

$$Recall\ (R) = \frac{TP}{TP + FN}$$

$$F1\ Score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

$$Boundary\ IoU\ (BIoU) = \frac{TP}{TP + FP + FN}$$

$$OIS = \frac{1}{N} \sum_{i=1}^{N} \max_{T} F1_i(T)$$

$$ODS = \max_{T} \frac{1}{N} \sum_{i=1}^{N} F1_i(T)$$

where $TP$ denotes the number of pixels correctly predicted as boundaries, $FP$ denotes the number of background pixels incorrectly predicted as boundaries, and $FN$ denotes the number of true boundary pixels that were missed. $T$ and $N$ denote the candidate threshold and the number of images in the dataset, respectively.
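A pixel-level sketch of the OIS/ODS computation is given below. Note that boundary benchmarks often add a small spatial matching tolerance when counting TP/FP/FN, which is omitted here for brevity; the threshold grid is an assumption.

```python
import numpy as np

def _f1(prob, gt, t, eps=1e-9):
    """Pixel-level F1 of the thresholded probability map against a boolean mask."""
    pred = prob >= t
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    p = tp / (tp + fp + eps)
    r = tp / (tp + fn + eps)
    return 2 * p * r / (p + r + eps)

def ods_ois(probs, gts, thresholds=np.linspace(0.01, 0.99, 99)):
    """ODS picks one optimal threshold for the whole dataset; OIS picks the
    optimal threshold per image and averages. probs/gts are lists of HxW
    arrays (probabilities in [0, 1] and boolean masks)."""
    f1 = np.array([[_f1(p, g, t) for t in thresholds]
                   for p, g in zip(probs, gts)])   # [num_images, num_thresholds]
    ods = f1.mean(axis=0).max()   # best fixed threshold across the dataset
    ois = f1.max(axis=1).mean()   # per-image best thresholds, averaged
    return ods, ois
```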

4. Results

4.1. Ablation Study

To validate the individual contributions of the proposed components, systematic ablation studies were conducted on the farmland boundary detection dataset, with quantitative results presented in Table 1.
The analysis begins with the baseline model (Row 1), which employs a standard ResNet18_UNet architecture. This configuration exhibits suboptimal performance, yielding a precision of 47.7%, a recall of only 31.8%, an F1 score of 36.8%, and a BIoU of 64.2%. These metrics indicate that the standard UNet lacks the capacity for targeted modeling of cultivated land boundaries, resulting in severe boundary discontinuities and a high rate of missed detections.
Building upon this foundation, the second row evaluates the impact of incorporating the TA-ASPP module. This addition leads to substantial performance gains, with precision, recall, and F1 score increasing by 12.1%, 18.2%, and 18.1%, respectively. Despite a slight decrease in BIoU, the marked improvement across key metrics confirms that the TA-ASPP module significantly enhances the network’s perception of boundary features via effective multi-scale context modeling and boundary skeleton-aware attention mechanisms.
The third row introduces the dual-branch decoder structure to the baseline. This modification results in improvements of 5.6%, 21.3%, and 14.4% in precision, recall, and F1 score, respectively. Compared to the integration of TA-ASPP, this configuration achieves a further increase in recall at the expense of precision. This suggests that while the dual-branch architecture is effective in capturing a greater number of boundary pixels, the absence of a dedicated feature enhancement module inevitably introduces a degree of false positives.
The fourth row further refines this architecture by incorporating the dual-branch loss function. Under this joint supervision strategy, precision improves significantly to 64.6%, the F1 score reaches 59.8%, and the OIS achieves 64.9%. This result demonstrates that co-optimizing the shared encoder with both boundary loss and skeleton loss effectively enhances boundary localization accuracy and structural coherence.
Finally, the fifth row presents the complete DCFENet framework, which synergistically integrates the TA-ASPP module, the dual-branch structure, and the dual-branch loss. This comprehensive configuration yields the optimal performance across all metrics, achieving a precision of 74.5%, recall of 68.1%, F1 score of 68.1%, BIoU of 77.4%, ODS of 68.1%, and OIS of 70.4%. The significant improvements over the baseline conclusively demonstrate that the proposed modules function cooperatively to enhance the model’s capability in segmenting complex farmland edge features.

4.2. Multi-Scale Comparative Experiment

To validate the effectiveness of multi-scale attention mechanisms in farmland boundary detection, this section conducts comparative experiments by replacing different attention modules. These experiments are built upon the existing dual-branch decoder architecture and the dual-branch loss. The experimental results are presented in Table 2.
As detailed in Table 2, the baseline model (Dual-ResNet18_UNet), which corresponds to DCFENet without the multi-scale attention module, yields a precision of 64.6% but a comparatively low recall of 49.3% and a BIoU of 61.3%.
The incorporation of the traditional ASPP module (Row 2) significantly enhances recall to 64.3% (+15.0%) and BIoU to 62.9%. However, this improvement comes at the cost of a substantial decline in precision to 51.9%. This trade-off implies that while traditional ASPP expands the receptive field to detect more boundary pixels, the lack of selective attention introduces significant noise and false positives.
In contrast, the incorporation of the EMA module (Row 3) attempts to balance these competing metrics. It recovers some precision (56.6%) while maintaining a robust recall (62.7%), resulting in a moderate F1 score of 61.5%. This demonstrates that EMA, through cross-spatial feature correlation learning, reduces false positives while maintaining high recall, thereby enhancing boundary localization accuracy.
Finally, the incorporation of the proposed TA-ASPP module (Row 4) achieves optimal performance across all metrics, significantly outperforming both the baseline and other attention mechanisms. It reaches a precision of 74.5%, recall of 68.1%, and a high BIoU of 77.4%. Compared to the traditional ASPP, TA-ASPP improves precision by 22.6% and BIoU by 14.5%. Compared to the EMA module, all metrics also showed significant improvements. This outcome fully validates the core advantage of the TA-ASPP module. By integrating multi-scale context modeling with skeleton-aware attention mechanisms, the module enhances boundary feature perception. Meanwhile, skeleton information is effectively utilized for guidance.

4.3. Comparison of Different Models

4.3.1. Performance Comparison of Different Models

To evaluate the superiority of DCFENet in the farmland boundary detection task, this section conducts comparative experiments with several representative edge detection and semantic segmentation models, including UNet [37], EdgeNAT [38], AttentionUNet [39], EDTER [40], BSiNet [41], ResNet18_UNet, ResNet18_UNet (mean), and ResNet18_UNet (sum). Among these, UNet and AttentionUNet represent classic encoder–decoder architectures. EdgeNAT and EDTER are transformer-based models. BSiNet is a multi-task encoder–decoder model. ResNet18_UNet is a semantic segmentation network using ResNet18 as its encoder. ResNet18_UNet (mean) and ResNet18_UNet (sum) incorporate the focal loss function into this model, calculating losses using the mean and the sum, respectively. To ensure a fair comparison and improve the generalization of all models, a consistent data augmentation strategy was applied to the training data of all comparative methods. All comparative experiments were conducted under the same experimental environment and evaluation metrics to ensure fairness and comparability of results.
As shown in Table 3, UNet achieves relatively low performance, with precision and recall of only 29.5% and 40.8%, respectively. This indicates that Vanilla U-Net exhibits limited capability in the farmland boundary detection task. In contrast, ResNet18_UNet, which employs ResNet18 as its encoder, shows improvements in precision and boundary IoU, reaching 47.7% and 64.2%, respectively. However, its recall remains low at 31.8%, resulting in a still-low F1 score. This indicates the model’s shortcomings in boundary detection completeness. EdgeNAT and EDTER also exhibit limited overall performance on this dataset. Attention UNet demonstrates relatively low metrics across the board. This indicates that its attention mechanism fails to fully leverage its advantages in complex farmland boundary scenarios. BSiNet achieves relatively better performance, with precision, recall, and F1 score reaching 66.2%, 64.3%, and 64.1%, respectively, indicating that its multi-task design is effective for farmland boundary extraction.
After introducing focal loss, the performance of ResNet18_UNet improved significantly. ResNet18_UNet (mean) achieved a precision of 61.6%, a recall of 36.9%, an F1 score of 44.7%, and a BIoU of 73.5%. Further adopting a summation method for loss calculation, ResNet18_UNet (sum) achieved superior performance across all metrics. These results demonstrate that focal loss effectively mitigates the class imbalance issue in farmland boundary detection by directing the model’s attention toward challenging boundary pixels, thereby enhancing overall detection performance.
In contrast, the proposed DCFENet achieves the best results across all evaluation metrics, with precision, recall, and F1 score reaching 74.5%, 68.1%, and 68.1%, respectively. Boundary IoU reaches 77.4%, while ODS and OIS achieve 68.1% and 70.4%, respectively. The experimental results demonstrate that DCFENet effectively adapted to complex cultivated land scenarios and diverse remote sensing image conditions. It provides an effective solution for high-precision cultivated land boundary extraction.

4.3.2. Model Computational Efficiency Analysis

To evaluate the superiority of DCFENet in the farmland boundary detection task, this paper compares the proposed model with mainstream segmentation networks across four dimensions: parameter size, computational complexity, inference speed, and memory consumption. The results are presented in Table 4.
As shown in Table 4, DCFENet achieves an effective balance between model complexity and detection accuracy. In terms of parameter scale, DCFENet has 26.43 million parameters, representing reductions of 14.7%, 16.1%, and 45.3% compared to UNet, EdgeNAT, and EDTER, respectively. In computational complexity, DCFENet achieves 37.43 GFLOPs, which represents reductions of 82.9%, 68.4%, and 51.0% compared with UNet, EDTER, and EdgeNAT, respectively, highlighting its superior computational efficiency.
In terms of inference speed, DCFENet achieves 97.97 FPS, significantly outperforming UNet, EdgeNAT, and EDTER, meeting the demands for efficient processing. In terms of memory consumption, DCFENet utilizes 1.03 GB of GPU memory, substantially lower than UNet, EdgeNAT, and EDTER. It is comparable to the lightweight AttentionUNet at 1.04 GB, demonstrating excellent resource efficiency. Compared to EdgeNAT with similar parameter scale, DCFENet not only has fewer parameters but also reduces FLOPs and memory usage by 51.0% and 58.8%, respectively, while boosting inference speed by approximately 4.2 times. Compared to ResNet18_UNet, although DCFENet exhibits slight increases in parameters and FLOPs, its detection performance significantly improves.
The experimental results demonstrate that DCFENet maintains high computational efficiency while ensuring high boundary detection accuracy, providing an efficient technical solution for practical applications in farmland boundary extraction for agricultural management and precision agriculture.

4.3.3. Cross-Validation Analysis

To further evaluate the generalization ability and stability of DCFENet, five-fold cross-validation was conducted. The results of U-Net, ResNet18UNet, and DCFENet are shown in Table 5. Among the three models, DCFENet achieves the best performance across all evaluation metrics, with precision, recall, F1 score, BIoU, ODS, and OIS reaching 74.37 ± 0.51, 68.22 ± 1.73, 70.64 ± 0.92, 75.38 ± 0.81, 71.15 ± 0.89, and 68.36 ± 0.92, respectively. In addition, DCFENet exhibits relatively small standard deviations across different folds, indicating that it is less sensitive to data partitioning and shows stronger robustness and generalization capability. These results further confirm the effectiveness and reliability of the proposed method for farmland boundary detection.

4.4. Experimental Results Analysis

To more intuitively demonstrate the performance advantages of DCFENet in the farmland boundary detection task, this section provides a comparative analysis of its innovative features and visual results with multiple models. First, we conduct a comparative experiment between DCFENet and two classical encoder–decoder architectures, UNet and ResNet18UNet. The results are illustrated in Figure 8.
As shown in Figure 8, UNet produces coarse and incomplete farmland boundaries. The extracted boundaries exhibit noticeable discontinuities, and many farmlands are frequently misclassified as non-agricultural areas. ResNet18_UNet improves boundary continuity and overall accuracy to some extent and is able to capture field shape features. However, it still suffers from significant misclassification in complex scenarios; for example, large farmland areas in Figure 8a are not fully detected, while local non-farmland regions in Figure 8b are incorrectly classified as farmland. In addition, its ability to distinguish subtle differences between adjacent fields is limited, often leading to boundary adhesion, indicating that its feature representation capability remains insufficient. In contrast, DCFENet achieves more accurate discrimination between farmland and non-farmland regions. As shown in Figure 8a, it successfully identifies farmland and produces more continuous and clearer boundaries. Moreover, both false positives and false negatives are significantly reduced across different regions in Figure 8, demonstrating superior accuracy and stability.
In addition, we compared DCFENet with two classical transformer-based architectures, EDTER and EdgeNAT, as shown in Figure 9. Transformer-based models generally achieve better boundary integrity, successfully capturing most field outlines with good continuity. However, they exhibit limitations in extracting irregularly shaped farmland boundaries and are prone to pixel-level misclassification. As illustrated in Figure 9a,b, both EDTER and EdgeNAT incorrectly classify some non-farmland regions as farmland. In contrast, DCFENet achieves more accurate farmland detection.
The visual results in Figure 8 and Figure 9 show that DCFENet produces outputs closest to the ground truth, with clearer, more continuous, and detail-preserving boundaries. DCFENet accurately extracts farmland boundaries at multiple scales and effectively reduces issues such as boundary discontinuities, blurring, and missed detections. It demonstrates strong robustness and is better suited to meet the practical demands for high-precision farmland boundary extraction.

5. Discussion

5.1. Model Advantages

This study addresses challenges in farmland boundary detection—including low boundary pixel coverage, significant scale variations, and weak boundary continuity—by proposing a dual-branch collaborative feature enhancement network, DCFENet. First, to tackle the pronounced scale differences in farmland boundaries within remote sensing imagery, TA-ASPP, a multi-scale feature fusion attention module, is designed. This module effectively enhances the network’s perception of farmland boundary features by integrating multi-scale dilated convolutions with skeleton-aware attention. Second, to strengthen the network’s boundary localization and spatial structural topology modeling capabilities, a dual-branch decoder architecture is designed. This architecture incorporates boundary-aware gate and cross-branch feature fusion. Additionally, a collaborative constraint mechanism tailored for dual-branch decoding is proposed. By designing boundary loss and skeleton loss to supervise the two decoders, respectively while jointly optimizing the shared encoder, it achieves simultaneous enhancement of boundary localization and structural integrity.
Compared to current representative edge detection and semantic segmentation models, DCFENet demonstrates significant advantages in the task of cultivated land boundary detection. The traditional UNet, constrained by its fixed loss design, tends to exhibit blurred or broken boundaries in scenarios with sparse boundary pixels and severe class imbalance. Its precision and BIoU are only 29.5% and 37.4%, respectively, significantly lower than those of DCFENet. Compared to transformer-based architectures such as EDTER and EdgeNAT, DCFENet enhances multi-scale feature extraction through TA-ASPP and synchronously strengthens boundary localization and structural integrity via a dual-branch decoder. This achieves improvements of 31.5% in precision and 62.4% in BIoU over EDTER, and of 33.9% and 50.8% over EdgeNAT. Compared to ResNet18_UNet, DCFENet maintains a high inference speed (97.97 FPS) with a moderate parameter count (26.43 M) while demonstrating superior boundary localization consistency and contour integrity, achieving a more robust precision–recall tradeoff.
DCFENet not only outperforms mainstream segmentation models, achieving a 74.5% precision and 77.4% boundary IoU, but also demonstrates significant advantages over recent state-of-the-art methods in boundary awareness and remote sensing segmentation, as evidenced by its superior performance metrics. Qin et al. [42] proposed U2-Net, a two-level nested U-structure for salient object detection (SOD) that leverages rich hierarchical features to capture fine details. However, its heavy reliance on deep supervision across multiple scales increases computational complexity and often performs poorly when handling the topological connectivity of sparse, elongated farmland boundaries. Xu et al. [43] proposed EdgeViT, which attempts to balance accuracy and efficiency by combining CNNs with vision transformers (ViTs). Although it yields significant results in natural scenes, the self-attention mechanism in EdgeViT may introduce noise in high-resolution remote sensing images with complex textures, leading to fragmented predictions in low-contrast boundary regions.
Building on its performance improvements, DCFENet also demonstrates strong potential in real-world applications, including integration with decision support systems, GIS platforms, and remote sensing-based operational workflows. It can enable automated farmland monitoring, precision input management, and informed agricultural decision-making, thereby enhancing the practical utility of the approach in precision agriculture [44].

5.2. Limitations and Future Work

Despite its outstanding performance in farmland boundary detection, DCFENet still exhibits certain limitations. First, the model relies on a large volume of high-resolution remote sensing imagery for training, demanding substantial data quantities and high annotation quality. The sparse pixel distribution and complex morphology of farmland boundaries result in high annotation costs. Second, despite optimizations in multi-scale feature extraction and cross-spatial attention, the relatively complex network architecture and large number of parameters place computational demands on resources and inference efficiency, limiting deployment in resource-constrained environments. Third, the model’s generalization capability across different regions, crop types, or sensor imagery remains to be validated. In particular, this study is mainly conducted on remote sensing data acquired under relatively uniform climatic conditions, without fully considering diverse weather conditions (e.g., fog and cloud cover) or multi-temporal data acquisition. As a result, the robustness and transferability of the model under more complex environmental conditions remain to be further investigated.
To address these limitations, future research will proceed in several directions. First, semi-supervised or weakly supervised learning may be adopted to reduce reliance on large volumes of high-quality labeled data, complemented by data augmentation and synthetic image generation to enrich the training set; in particular, Perlin noise-based augmentation will be investigated to simulate complex imaging variations and improve training diversity. Second, lightweight network designs and knowledge distillation will be explored to reduce model parameters and computational overhead while maintaining accuracy, making DCFENet more suitable for mobile and edge computing deployments. Finally, cross-region and cross-sensor adaptation strategies, including transfer learning and adaptive domain alignment, will be investigated to enhance the model's robustness and stability across different cropland types and remote sensing conditions. Future work will also construct a multimodal dataset integrating multispectral and RGB information and explore cross-modal feature fusion mechanisms to achieve more accurate and robust farmland boundary prediction.
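As a concrete illustration of the Perlin-style augmentation mentioned above, the sketch below modulates image brightness with a smooth low-frequency noise field to mimic uneven illumination or thin haze. For brevity it uses interpolated value noise as a cheap stand-in for true gradient (Perlin) noise, and the grid size and strength parameters are arbitrary illustrative choices.

```python
import numpy as np
from scipy.ndimage import zoom

def smooth_noise(shape, grid: int = 8, seed=None) -> np.ndarray:
    """Low-frequency value noise (a cheap stand-in for Perlin noise):
    a coarse random grid upsampled with cubic interpolation."""
    rng = np.random.default_rng(seed)
    coarse = rng.standard_normal((grid, grid))
    field = zoom(coarse, (shape[0] / grid, shape[1] / grid), order=3)
    field = field[: shape[0], : shape[1]]
    return field / (np.abs(field).max() + 1e-8)  # normalize to [-1, 1]

def augment_brightness(image: np.ndarray, strength: float = 0.15,
                       seed=None) -> np.ndarray:
    """Modulate brightness with a smooth noise field to simulate uneven
    illumination; boundary labels are left untouched."""
    field = smooth_noise(image.shape[:2], seed=seed)
    out = image.astype(np.float32) * (1.0 + strength * field[..., None])
    return np.clip(out, 0, 255).astype(image.dtype)

if __name__ == "__main__":
    img = np.full((256, 256, 3), 128, dtype=np.uint8)
    aug = augment_brightness(img, strength=0.2, seed=0)
    print(aug.min(), aug.max())  # brightness varies smoothly around 128
```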

6. Conclusions

To achieve precise and stable delineation of cultivated land boundaries, this paper proposes DCFENet, a dual-branch collaborative feature enhancement network. The method captures boundary information at different scales through the multi-scale feature fusion attention module TA-ASPP and employs a dual-branch decoding structure to improve the continuity of fragmented boundaries. A dual-branch loss function is also designed to jointly optimize precise boundary localization and boundary skeleton modeling. Experimental results demonstrate that DCFENet significantly outperforms mainstream methods including UNet, EdgeNAT, and EDTER. Owing to its compact design and high inference speed, DCFENet shows strong potential in practical applications, providing robust technical support for farmland management and facilitating management zone delineation, variable-rate input application, machinery planning, spatiotemporal field monitoring, and agricultural policy planning. Although the proposed model demonstrates clear advantages, it does not yet fully account for cross-regional adaptability, which limits its application in broader precision agriculture scenarios. Future work will focus on improving the model's generalization ability and robustness across different geographical environments and imaging conditions.
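As a rough illustration of how such a dual-branch loss can be assembled, the sketch below supervises the boundary branch with a BCE-plus-Dice term and the skeleton branch with a Dice term. The loss mix and branch weights are assumptions for illustration; the paper's exact boundary and skeleton loss formulations may differ. Skeleton targets could, for example, be derived by skeletonizing the boundary labels (e.g., with skimage.morphology.skeletonize).

```python
import torch
import torch.nn.functional as F

def dice_loss(logits: torch.Tensor, target: torch.Tensor,
              eps: float = 1.0) -> torch.Tensor:
    """Soft Dice loss; robust to the extreme class imbalance of thin
    boundary and skeleton maps. Expects (B, 1, H, W) tensors."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (denom + eps)).mean()

def dual_branch_loss(boundary_logits, skeleton_logits,
                     boundary_gt, skeleton_gt,
                     w_boundary: float = 1.0,
                     w_skeleton: float = 0.5) -> torch.Tensor:
    """Joint supervision of the two decoder branches. The BCE+Dice mix
    and the branch weights are illustrative, not the paper's values."""
    l_boundary = (F.binary_cross_entropy_with_logits(boundary_logits, boundary_gt)
                  + dice_loss(boundary_logits, boundary_gt))
    l_skeleton = dice_loss(skeleton_logits, skeleton_gt)
    return w_boundary * l_boundary + w_skeleton * l_skeleton
```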

Author Contributions

M.L.: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing—Original Draft. B.H.: Conceptualization, Methodology, Software, Investigation, Data Curation. P.W.: Conceptualization, Methodology, Data Curation, Supervision, Project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors express their appreciation to the editor and anonymous reviewers for their insightful recommendations, which significantly contributed to enhancing the initial manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Potapov, P.; Turubanova, S.; Hansen, M.C.; Tyukavina, A.; Zalles, V.; Khan, A.; Cortez, J. Global maps of cropland extent and change show accelerated cropland expansion in the twenty-first century. Nat. Food 2022, 3, 19–28. [Google Scholar] [CrossRef]
  2. Lambin, E.F.; Meyfroidt, P. Global land use change, economic globalization, and the looming land scarcity. Proc. Natl. Acad. Sci. USA 2011, 108, 3465–3472. [Google Scholar] [CrossRef] [PubMed]
  3. Li, Q.; Huang, Y.; Sun, J.; Chen, S.; Zou, J. Balancing Urban Expansion and Food Security: A Spatiotemporal Assessment of Cropland Loss and Productivity Compensation in the Yangtze River Delta, China. Land 2025, 14, 1476. [Google Scholar] [CrossRef]
  4. Li, L. The State of the World’s Land and Water Resources for Food and Agriculture (SOLAW): Systems at Breaking Point; FAO: Rome, Italy, 2021. [Google Scholar]
  5. Wang, M.; Wang, J.; Cui, Y.; Liu, J.; Chen, L. Agricultural field boundary delineation with satellite image segmentation for high-resolution crop mapping: A case study of rice paddy. Agronomy 2022, 12, 2342. [Google Scholar] [CrossRef]
  6. Wang, S.; Waldner, F.; Lobell, D.B. Unlocking large-scale crop field delineation in smallholder farming systems with transfer learning and weak supervision. Remote Sens. 2022, 14, 5738. [Google Scholar] [CrossRef]
  7. Zhang, J.; Yang, X.; Dai, J.; Wang, X.; Fang, Z.; Liu, X.; Wang, Z. Automated remote sensing monitoring of cropland non-agricultural and non-grain conversion at parcel scale in complex environments through multi-source data fusion. Geo-Spat. Inf. Sci. 2025, 29, 168–192. [Google Scholar] [CrossRef]
  8. De Bruin, S.; Heuvelink, G.B.M.; Brown, J.D. Propagation of positional measurement errors to agricultural field boundaries and associated costs. Comput. Electron. Agric. 2008, 63, 245–256. [Google Scholar] [CrossRef]
  9. Omia, E.; Bae, H.; Park, E.; Kim, M.S.; Baek, I.; Kabenge, I.; Cho, B.K. Remote sensing in field crop monitoring: A comprehensive review of sensor systems, data analyses and recent advances. Remote Sens. 2023, 15, 354. [Google Scholar] [CrossRef]
  10. Xu, Y.; Xue, X.; Sun, Z.; Gu, W.; Cui, L.; Jin, Y.; Lan, Y. Deriving agricultural field boundaries for crop management from satellite images using semantic feature pyramid network. Remote Sens. 2023, 15, 2937. [Google Scholar] [CrossRef]
  11. Wang, X.; Shu, L.; Han, R.; Yang, F.; Gordon, T.; Wang, X.; Xu, H. A survey of farmland boundary extraction technology based on remote sensing images. Electronics 2023, 12, 1156. [Google Scholar] [CrossRef]
  12. Waldner, F.; Diakogiannis, F.I. Deep learning on edge: Extracting field boundaries from satellite images with a convolutional neural network. Remote Sens. Environ. 2020, 245, 111741. [Google Scholar] [CrossRef]
  13. Neupane, B.; Teerayut, H.; Jagannath, A. Deep learning-based semantic segmentation of urban features in satellite images: A review and meta-analysis. Remote Sens. 2021, 13, 808. [Google Scholar] [CrossRef]
  14. Lu, R.; Zhang, Y.; Huang, Q.; Zeng, P.; Shi, Z.; Ye, S. A refined edge-aware convolutional neural networks for agricultural parcel delineation. Int. J. Appl. Earth Obs. Geoinf. 2024, 133, 104084. [Google Scholar] [CrossRef]
  15. Chen, H.; Qi, X.; Yu, L.; Heng, P.A. DCAN: Deep contour-aware networks for accurate gland segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2487–2496. [Google Scholar]
  16. Bai, M.; Urtasun, R. Deep watershed transform for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5221–5229. [Google Scholar]
  17. Watkins, B.; Van Niekerk, A. A comparison of object-based image analysis approaches for field boundary delineation using multi-temporal Sentinel-2 imagery. Comput. Electron. Agric. 2019, 158, 294–302. [Google Scholar] [CrossRef]
  18. Hong, R.; Park, J.; Jang, S.; Shin, H.; Kim, H.; Song, I. Development of a parcel-level land boundary extraction algorithm for aerial imagery of regularly arranged agricultural areas. Remote Sens. 2021, 13, 1167. [Google Scholar] [CrossRef]
  19. Wagner, M.P.; Oppelt, N. Extracting agricultural fields from remote sensing imagery using graph-based growing contours. Remote Sens. 2020, 12, 1205. [Google Scholar] [CrossRef]
  20. Xu, L.; Ming, D.; Zhou, W.; Bao, H.; Chen, Y.; Ling, X. Farmland extraction from high spatial resolution remote sensing images based on stratified scale pre-estimation. Remote Sens. 2019, 11, 108. [Google Scholar] [CrossRef]
  21. Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient graph-based image segmentation. Int. J. Comput. Vis. 2004, 59, 167–181. [Google Scholar] [CrossRef]
  22. Xue, Y.; Zhao, J.; Zhang, M. A watershed-segmentation-based improved algorithm for extracting cultivated land boundaries. Remote Sens. 2021, 13, 939. [Google Scholar] [CrossRef]
  23. Tetteh, G.O.; Schwieder, M.; Erasmi, S.; Conrad, C.; Gocht, A. Comparison of an optimised multiresolution segmentation approach with deep neural networks for delineating agricultural fields from sentinel-2 images. PFG-J. Photogramm. Remote Sens. Geoinf. Sci. 2023, 91, 295–312. [Google Scholar] [CrossRef]
  24. Shunying, W.; Ya’nan, Z.; Xianzeng, Y.; Li, F.; Tianjun, W.; Jiancheng, L. BSNet: Boundary-semantic-fusion network for farmland parcel mapping in high-resolution satellite images. Comput. Electron. Agric. 2023, 206, 107683. [Google Scholar] [CrossRef]
  25. Zhang, H.; Liu, M.; Wang, Y.; Shang, J.; Liu, X.; Li, B.; Li, Q. Automated delineation of agricultural field boundaries from Sentinel-2 images using recurrent residual U-Net. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102557. [Google Scholar] [CrossRef]
  26. Wu, S.; Su, Y.; Lu, X.; Xu, H.; Kang, S.; Zhang, B.; Liu, L. Extraction and mapping of cropland parcels in typical regions of southern China using unmanned aerial vehicle multispectral images and deep learning. Drones 2023, 7, 285. [Google Scholar] [CrossRef]
  27. Persello, C.; Tolpekin, V.A.; Bergado, J.R.; De By, R.A. Delineation of agricultural fields in smallholder farms from satellite images using fully convolutional networks and combinatorial grouping. Remote Sens. Environ. 2019, 231, 111253. [Google Scholar] [CrossRef] [PubMed]
  28. Lu, H.; Wang, H.; Ma, Z.; Ren, Y.; Fu, W.; Shan, Y.; Meng, Z. Farmland boundary extraction based on the AttMobile-DeeplabV3+ network and least squares fitting of straight lines. Front. Plant Sci. 2023, 14, 1228590. [Google Scholar] [CrossRef]
  29. Li, Z.; Wang, Y.; Tian, F.; Zhang, J.; Chen, Y.; Li, K. BAFormer: A Novel Boundary-Aware Compensation UNet-like Transformer for High-Resolution Cropland Extraction. Remote Sens. 2024, 16, 2526. [Google Scholar] [CrossRef]
  30. Huang, L.; Zhang, Z.; Yu, Y.; Tang, B.H. DEDANet: Mountainous Cropland Extraction From Remote Sensing Imagery with Detail Enhancement and Distance Attenuation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 17565–17579. [Google Scholar] [CrossRef]
  31. Shi, Z.; Fan, J.; Du, Y.; Zhou, Y.; Zhang, Y. LULC-SegNet: Enhancing land use and land cover semantic segmentation with denoising diffusion feature fusion. Remote Sens. 2024, 16, 4573. [Google Scholar] [CrossRef]
  32. Liu, X.; Zhang, J.; Duan, Y.; Zhou, J. CPVF: Vectorization of agricultural cultivation field parcels via a boundary–parcel multi-task learning network in ultra-high-resolution remote sensing images. ISPRS J. Photogramm. Remote Sens. 2025, 226, 267–299. [Google Scholar] [CrossRef]
  33. Ren, J.; Jing, Y.; Zheng, X.; Li, S.; Li, K.; Mu, G. Cropland Extraction Based on PlanetScope Images and a Newly Developed CAFM-Net Model. Remote Sens. 2026, 18, 646. [Google Scholar] [CrossRef]
  34. Wang, Y.; Yang, M.; Zhang, T.; Hu, S.; Zhuang, Q. DAENet: A Deep Attention-Enhanced Network for Cropland Extraction in Complex Terrain from High-Resolution Satellite Imagery. Agriculture 2025, 15, 1318. [Google Scholar] [CrossRef]
  35. Chen, H.; Wang, Q.; Xie, K.; Lei, L.; Wu, X. MPF-Net: Multi-projection filtering network for few-shot object detection. Appl. Intell. 2024, 54, 7777–7792. [Google Scholar] [CrossRef]
  36. Vayssade, J.A.; Paoli, J.N.; Gée, C.; Jones, G. DeepIndices: Remote sensing indices based on approximation of functions through deep-learning, application to uncalibrated vegetation images. Remote Sens. 2021, 13, 2261. [Google Scholar] [CrossRef]
  37. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  38. Jie, J.; Guo, Y.; Wu, G.; Wu, J.; Hua, B. EdgeNAT: Transformer for efficient edge detection. arXiv 2024, arXiv:2408.10527. [Google Scholar] [CrossRef]
  39. Zhu, Z.; Yan, Y.; Xu, R.; Zi, Y.; Wang, J. Attention-Unet: A deep learning approach for fast and accurate segmentation in medical imaging. J. Comput. Sci. Softw. Appl. 2022, 2, 24–31. [Google Scholar]
  40. Pu, M.; Huang, Y.; Liu, Y.; Guan, Q.; Ling, H. Edter: Edge detection with transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1402–1412. [Google Scholar]
  41. Long, J.; Li, M.; Wang, X.; Stein, A. Delineation of agricultural fields using multi-task BsiNet from high-resolution satellite images. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102871. [Google Scholar] [CrossRef]
  42. Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognit. 2020, 106, 107404. [Google Scholar] [CrossRef]
  43. Pan, J.; Bulat, A.; Tan, F.; Zhu, X.; Dudziak, L.; Li, H.; Tzimiropoulos, G.; Martinez, B. EdgeViTs: Competing light-weight CNNs on mobile devices with vision transformers. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 294–311. [Google Scholar]
  44. Mulla, D.J. Twenty five years of remote sensing in precision agriculture: Key advances and remaining knowledge gaps. Biosyst. Eng. 2013, 114, 358–371. [Google Scholar] [CrossRef]
Figure 1. Study area in typical agricultural regions of Zhejiang Province, China.
Figure 2. Examples of original remote sensing images and their corresponding binary boundary labels (white pixels represent boundaries and black pixels represent background).
Figure 3. Examples of augmented images and their corresponding labels.
Figure 4. Architecture of DCFENet.
Figure 5. Architecture of TA-ASPP.
Figure 6. Architecture of the skeleton decoder.
Figure 7. Architecture of the boundary decoder.
Figure 8. Comparison of the encoder–decoder-based models.
Figure 9. Comparison of the transformer-based models.
Table 1. Ablation study results.

| Baseline | TA-ASPP | Dual-Branch | Dual-Branch Loss | Precision | Recall | F1 Score | BIoU | ODS | OIS |
|---|---|---|---|---|---|---|---|---|---|
| √ | × | × | × | 47.7 | 31.8 | 36.8 | 64.2 | 36.8 | 39.5 |
| √ | √ | × | × | 59.8 | 50 | 54.9 | 62.8 | 54.9 | 58.3 |
| √ | × | √ | × | 53.3 | 53.1 | 51.2 | 56.9 | 51.2 | 55.9 |
| √ | × | √ | √ | 64.6 | 49.3 | 59.8 | 61.3 | 59.8 | 64.9 |
| √ | √ | √ | √ | 74.5 | 68.1 | 68.1 | 77.4 | 68.1 | 70.4 |

Note: "√" indicates that the corresponding module is included, while "×" indicates that it is not included; TA-ASPP: task-aware atrous spatial pyramid pooling; dual-branch: dual-branch decoder structure; dual-branch loss: joint supervision with boundary loss and skeleton loss.
Table 2. Multi-scale comparative experimental results.

| Baseline | Precision | Recall | F1 Score | BIoU | ODS | OIS |
|---|---|---|---|---|---|---|
| DUALResNet18_UNet | 64.6 | 49.3 | 59.8 | 61.3 | 59.8 | 64.9 |
| +ASPP | 51.9 | 64.3 | 60.7 | 62.9 | 60.7 | 63.4 |
| +EMA | 56.6 | 62.7 | 61.5 | 64.1 | 61.5 | 64.6 |
| +TA-ASPP | 74.5 | 68.1 | 68.1 | 77.4 | 68.1 | 70.4 |

Note: "+" indicates that the corresponding module is added to the baseline model; ASPP: atrous spatial pyramid pooling; EMA: efficient multi-scale attention; TA-ASPP: task-aware atrous spatial pyramid pooling.
Table 3. Performance comparison of DCFENet and benchmark models.

| Model | Precision | Recall | F1 Score | BIoU | ODS | OIS |
|---|---|---|---|---|---|---|
| UNet | 29.5 | 40.8 | 31.7 | 37.4 | 31.7 | 34.4 |
| EdgeNAT | 40.6 | 38 | 35.3 | 26.6 | 37.4 | 38.2 |
| AttentionUnet | 25.1 | 16.3 | 18.7 | 45.2 | 18.7 | 19.6 |
| EDTER | 43 | 47.1 | 35.7 | 15 | 37.8 | 38.7 |
| BSiNet | 66.2 | 64.3 | 64.1 | 48.2 | 66.5 | 68.2 |
| ResNet18_UNet | 47.7 | 31.8 | 36.8 | 64.2 | 36.8 | 39.5 |
| ResNet18_UNet (mean) | 61.6 | 36.9 | 44.7 | 73.5 | 44.7 | 46.3 |
| ResNet18_UNet (sum) | 72.9 | 44.1 | 54 | 77.5 | 54 | 56.5 |
| DCFENet (ours) | 74.5 | 68.1 | 68.1 | 77.4 | 68.1 | 70.4 |
Table 4. Model computational efficiency comparison.

| Model | Parameters (M) | FLOPs (G) | Inference Speed (FPS) | Memory (GB) |
|---|---|---|---|---|
| UNet | 31.00 | 218.97 | 51.75 | 2.93 |
| EdgeNAT | 31.50 | 76.40 | 23.60 | 2.50 |
| AttentionUnet | 10.74 | 10.69 | 138.72 | 1.04 |
| EDTER | 48.30 | 118.60 | 12.30 | 3.40 |
| ResNet18_UNet | 15.78 | 28.37 | 166.94 | 0.68 |
| DCFENet (ours) | 26.43 | 37.43 | 97.97 | 1.03 |
Table 5. Comparison of different models under five-fold cross-validation.

| Model | Precision | Recall | F1 Score | BIoU | ODS | OIS |
|---|---|---|---|---|---|---|
| U-Net | 24.53 ± 3.47 | 42.04 ± 2.75 | 29.81 ± 2.75 | 34.6 ± 1.67 | 31.71 ± 2.86 | 29.92 ± 3.12 |
| ResNet18_UNet | 45.08 ± 2.50 | 29.74 ± 2.22 | 34.89 ± 2.16 | 64.90 ± 0.94 | 35.42 ± 3.07 | 34.24 ± 3.50 |
| DCFENet | 74.37 ± 0.51 | 68.22 ± 1.73 | 70.64 ± 0.92 | 75.38 ± 0.81 | 71.15 ± 0.89 | 68.36 ± 0.92 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
