1. Introduction
Farmland is an essential resource for agricultural development and food production. The quantity and quality of agricultural parcels are decisive factors in the sustainable development of agriculture and have a direct impact on the development of the national economy [
1]. Against the background of accelerating global urbanization and population growth, the expansion of urban construction land has led to a reduction in agricultural land, making it particularly important for agricultural managers to obtain accurate agricultural information [
2]. In the field of agricultural remote sensing, agricultural information extraction is a key research topic [
3,
Extracting agricultural parcel information is not only a long-term requirement for ensuring agricultural development, but also an urgent task for maintaining stable cultivated land reserves [
5]. An agricultural parcel generally refers to agricultural land with clear boundaries on which a crop is planted continuously. It is the basic unit for agricultural management, supervision, and statistics, and plays an important role in agricultural management and land use planning [
6]. The extraction of agricultural parcels can be applied in various fields such as agricultural yield forecasting [
7], farmland area investigation [
8], land resource evaluation [
9], land use management [
10], and the sustainable development of resources and the environment [
11,
12].
Previously, agricultural parcel delineation mainly depended on field surveys to obtain spatial distribution data and draw parcel distribution maps [
13]. However, this method is labor- and time-intensive and inefficient, making it difficult to extract agricultural parcels over large areas [
14]. With the continuous development of remote sensing technology, images with high spatial resolution, high spectral resolution, and multiple temporal acquisitions have become available, and the extraction of agricultural parcels from remote sensing images has been widely studied [
15,
16]. Remote sensing offers real-time acquisition, rapid monitoring, and wide coverage, so automatically delineating agricultural parcels from remote sensing images has become a hot research topic [
17,
18].
Traditional methods for extracting agricultural parcels from remote sensing images classify parcel edges or regions at the pixel level. For example, Sobel or Canny operators are used to extract parcel edges from the image, and the extracted edges serve as field boundaries separating farmland from non-farmland [
19]. Alternatively, clustering and threshold segmentation algorithms can group pixels with similar characteristics into the same category to extract agricultural parcels from the image [
20]. Although these methods are simple and easy to implement, they consider only shallow features and cannot exploit the high-level semantic features of remote sensing images [
21]. Against complex backgrounds, their extraction accuracy is poor and it is difficult to extract complete agricultural parcels.
In recent years, with the development of deep learning, the method of extracting land cover categories using deep learning has been widely used, especially for the extraction of single-category features [
22,
23,
24,
25]. Deep learning-based semantic segmentation has been used by many scholars to extract agricultural parcels from remote sensing images. Compared with traditional machine learning and threshold segmentation, deep learning networks can extract high-dimensional semantic information and have stronger adaptability to complex and changeable terrain [
26,
27]. Initially, some scholars used convolutional neural networks such as U-Net, FCN, and SegNet for the semantic segmentation of agricultural parcels in remote sensing images [
28,
29,
30]. Convolutional neural networks extract agricultural parcels by using convolution kernels to capture the color, texture, and other features of the parcels. However, they are limited by the receptive field of the convolution kernel: it is difficult for them to effectively integrate image context information, and their extraction ability is weak over large areas. In this regard, some scholars have proposed the DeepLabv3 model [
30], which uses atrous (dilated) convolution to enlarge the receptive field and capture features over a wider range. Other scholars have used the Swin Transformer as the backbone network, exploiting the Transformer's self-attention mechanism to integrate image context [
31]. However, in terms of integrating global features and local details, DeepLabv3 struggles to capture the global features of an image, while a Swin Transformer backbone is weak at extracting local detail features. Therefore, integrating global features and local details remains a challenge in agricultural parcel extraction from remote sensing images.
When semantic segmentation is used to extract agricultural parcels, parcel edge segmentation is often poor: as feature layers are downsampled during feature extraction, edge detail features are often lost [
32]. To solve this problem, some scholars have proposed dedicated edge detection algorithms [
24], such as the RCF and HED [
33,
34] network models, which extract parcel edge pixels by capturing features at multiple scales. Delineating agricultural parcels by extracting their edges is essentially the semantic segmentation of edge pixels. Therefore, some scholars have proposed multitask learning network models that extract both the region and the edge of agricultural parcels, improving model accuracy by integrating multiple related tasks. For example, ResUNet-a is a multitask network model that combines three tasks, namely extracting the parcel boundary, the parcel extent, and the distance to the boundary; the addition of related tasks improves the shape and boundary of the extraction results [
35]. The SEANet multitask model additionally takes into account the uncertainty between different tasks on the basis of region, edge, and distance tasks, and sets adaptive multitask loss weights to balance the losses between tasks [
36]. Jiang et al. have proposed the lightweight multitask network BsiNet, which completes the same tasks with fewer parameters than ResUNet-a and SEANet, giving faster computation and higher extraction efficiency [
37]. Although the above methods can improve the accuracy of agricultural parcel edge extraction, they all operate at the pixel level. This introduces problems such as unclosed parcel edges, adjacent parcels sticking together, holes inside parcels, fragmentation, and salt-and-pepper noise. Pixel-based extraction cannot capture the topological relationships between agricultural parcels, so the extracted parcels are difficult to vectorize directly and require post-processing before mapping.
In the remote sensing community, contour segmentation is often used to extract targets. Jiao et al. have proposed a PolyR-CNN contour segmentation network model to extract polygonal buildings in remote sensing images [
38]. Xu et al. proposed integrating Mask R-CNN and DeepSnake to improve the recognition of cultivated land [
39]. Semantic segmentation extracts agricultural parcels from remote sensing images by first generating raster masks and then using a vectorization scheme to convert these masks into vector polygons of agricultural parcels, as shown in
Figure 1. In contrast, contour segmentation generates a set of contour points from remote sensing image features, which are used to represent the vector polygon of the target. This is an object-based extraction method, as shown in
Figure 1. The parcel boundaries extracted by contour segmentation are closed and can provide topological information between parcels; therefore, vector contour extraction corresponds more closely to the actual parcels than raster pixel extraction. However, existing contour extraction models, such as PolarMask [
40] and Point Set [
41], are unable to extract complete parcels over large areas and do not fit contour boundaries well; therefore, contour segmentation has rarely been used to extract agricultural parcels from remote sensing images.
To address the issues of holes, fragmentation, and unclosed boundaries in raster masks produced by the semantic segmentation of agricultural parcels, this paper proposes a vector contour prediction network based on a hybrid backbone and a multiscale edge feature extraction module (HEVNet) for agricultural parcel extraction from remote sensing images. HEVNet addresses the poor extraction performance for agricultural parcels of different scales, refines parcel edges, and improves the geometric accuracy of agricultural parcel delineation. The main contributions of this study are summarized as follows:
A hybrid backbone combining ResNet-50 and the Swin Transformer is used to extract the texture features of agricultural parcels while taking context information into account. It breaks through the receptive field limitation of traditional convolutional networks and enhances the network's ability to recognize agricultural parcels of different sizes.
A multiscale edge feature extraction module is added to predict parcel edges from multiscale features and compute the corresponding losses, thereby supervising and constraining edge prediction and improving the accuracy of contour prediction.
A vector contour prediction module based on contour instance segmentation extracts agricultural parcels in an object-based manner and directly generates the vector polygons of the parcels, solving the problems of internal holes and unclosed edges.
2. Materials and Methods
2.1. Research Region and Data
In this paper, we selected three public datasets: a cultivated land dataset in the Netherlands, a cultivated land dataset in Denmark, and a cultivated land dataset from the 2021 iFLYTEK challenge. In addition, we selected Hengyang in Hunan Province as a study area and manually delineated the cultivated land in remote sensing images of Hengyang to build a fourth dataset. Data mapping and processing were performed using ArcGIS 10.2. An overview of the four study areas is shown in
Figure 2.
Netherlands data: The remote sensing image of the Netherlands is a Sentinel-2 multispectral image with 10 m resolution downloaded from Google Earth Engine (GEE). The image was acquired in May 2020, includes the RGB and NIR bands, and has less than 10% cloud cover. Agricultural parcel vector labels were obtained from the registered crop parcel dataset provided by Publieke Dienstverlening Op de Kaart (PDOK). This dataset includes farmland, grassland, wasteland, natural areas, and other parcels; here, we extract only farmland as the label of our dataset. The study area is located in the northeast of the Netherlands and covers about 15,000 square kilometers. Dutch agriculture is highly mechanized over large areas, and the parcels are densely distributed, relatively regular in shape, and have obvious boundaries or distinct planting patterns.
Denmark data: The Denmark dataset is from the European Union Land Parcel Identification System (LPIS) and includes two Sentinel-2 true-color composite images acquired on 8 May 2016 and the manually created polygon vector files of agricultural parcels in the region. The remote sensing images have a resolution of 10 m and contain the three RGB bands. The study area is located in eastern Denmark and covers more than 23,000 square kilometers. Agricultural parcels in Denmark are relatively densely distributed, but their areas vary and their shapes are diverse, with many irregular parcels.
iFLYTEK data: The iFLYTEK dataset is from the 2021 iFLYTEK Challenge on cultivated land extraction from high-resolution remote sensing images. The image data consist of 31 high-spatial-resolution images acquired by the Jilin-1 satellite, with spatial resolutions between 0.75 m and 1.1 m. Each image contains the RGB and NIR bands. The parcels in this dataset are unevenly distributed across the images, but their shapes are regular and their structure is simple, without obvious fragmentation, and the boundaries between adjacent parcels are not obvious.
Hengyang data: We collected remote sensing images acquired by the GF-2 satellite over Hengyang in 2023, capturing the diversified farmland landscape of the region. The spatial resolutions of the panchromatic band and the multispectral bands (RGB and NIR) provided by the GF-2 satellite are 0.8 m and 3.2 m, respectively. The downloaded images were radiometrically calibrated and atmospherically corrected, and pan-sharpening was used to fuse the multispectral and panchromatic bands, raising the spatial resolution to 1 m. In the processed high-resolution images, 1802 agricultural parcels in the study area were manually delineated as sample labels. Under the influence of topography and climate, farmland in Hengyang is severely fragmented; the agricultural parcels are scattered, vary in size, have irregular shapes, are distributed in a disorderly way, and have fuzzy boundaries.
2.2. Data Preprocessing
Two regions were randomly selected from each of the Netherlands and Denmark datasets as the training set and test set of the respective images. The first 26 images of the iFLYTEK dataset were used as the training set and the last 5 images as the test set. Most of the western part of the Hengyang dataset was used as the training set, and a small eastern part was used as the test set. To make the experimental data usable by the model, the images and labels were sliced: each remote sensing image was cut into 512 × 512 slices with a sliding window, the overlap between adjacent slices was set to 25% based on experience, and slices in which farmland covered less than 20% of the area were filtered out. For the Hengyang data, because the sample size was small, data augmentation was applied, and the sample size was enlarged by horizontal and vertical flipping, mixing, and color jitter. The relevant information on the resulting data is shown in
Table 1. The numbers of training-validation sample slices obtained from the Netherlands, Denmark, iFLYTEK, and Hengyang datasets were 2287, 3658, 3744, and 1152, respectively, and the numbers of test samples were 1428, 1356, 566, and 365. The training-validation samples were then divided into training and validation samples at a ratio of 4:1. At the same time, the vector parcel annotations were converted into two label formats, raster masks and COCO format, for model inference and accuracy evaluation. Finally, we obtained four mutually independent datasets, and the model was trained, validated, and tested on each dataset separately.
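As a concrete illustration of this slicing step, the following Python sketch (assuming NumPy arrays and a binary farmland mask; the function and parameter names are hypothetical, not the exact preprocessing code) tiles an image/label pair with 25% overlap and discards tiles containing less than 20% farmland:

```python
import numpy as np

def tile_image(image, mask, tile=512, overlap=0.25, min_field_ratio=0.20):
    """Cut an image/label pair into overlapping tiles and drop tiles with
    too little farmland. `image` is (H, W, C); `mask` is (H, W) with
    1 = farmland and 0 = background (assumed encoding)."""
    stride = int(tile * (1 - overlap))          # 25% overlap -> stride of 384
    tiles = []
    h, w = mask.shape
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            img_t = image[y:y + tile, x:x + tile]
            msk_t = mask[y:y + tile, x:x + tile]
            # keep only tiles where farmland covers at least 20% of the pixels
            if msk_t.mean() >= min_field_ratio:
                tiles.append((img_t, msk_t))
    return tiles
```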
2.3. Methods
In this paper, we propose a vector contour prediction network based on a hybrid backbone and multiscale edge feature extraction module (HEVNet) to extract agricultural vector parcels from remote sensing images. The network model is composed of the following components: a hybrid backbone, a multiscale edge feature extraction module, and a vector contour prediction module. The overall structure is shown in
Figure 3. The hybrid backbone module combines two feature extraction models, ResNet-50 and the Swin Transformer, to extract multiscale features from remote sensing images and generate a feature pyramid. The multiscale edge feature extraction module predicts the edge pixels of agricultural parcels from the extracted features and returns edge-enhanced information to the backbone. The vector contour prediction module uses instance segmentation, treating each parcel as a polygon, and generates the vector polygon of the parcel through the regression and classification of contour point positions, thereby solving the problems of internal holes and unclosed parcel boundaries that arose in previous pixel-based predictions.
2.3.1. Hybrid Backbone
In the past, convolutional neural networks were usually used for feature extraction from remote sensing images. Convolutional neural networks perform well at extracting local image features, but due to the limited receptive field of the convolution kernel, it is difficult for them to recognize large-scale targets and obtain global information. In this regard, some scholars proposed the Transformer model, with the attention mechanism at its core, and developed the Swin Transformer, an image feature extraction module analogous to convolutional feature extraction. The Swin Transformer extracts features efficiently through window-based attention, breaks through the receptive field limitation of convolutional layers, and uses self-attention to better capture global context information. However, when dealing with fine-grained features, it may ignore local details, so small local features can be overwhelmed by global information.
Given that convolutional networks are strong at extracting local information and the Swin Transformer is strong at extracting global semantic information, we use a feature extraction module that combines the convolutional network ResNet-50 and the Swin Transformer, and propose a feature fusion module that fuses the features extracted at different scales by the two networks in both the channel and spatial dimensions. As shown in
Figure 4, the two feature extraction modules extract features at different scales from the remote sensing image; the features extracted at each level are imported into the feature fusion module, and the fused features are used as the input to the next-level feature extraction modules. The fused feature maps at different scales form the feature pyramid of the remote sensing image.
As shown in
Figure 4, the feature fusion module draws on the CBAM attention mechanism and fuses image features from both the spatial and channel perspectives [
42]. First, feature maps with different numbers of channels are concatenated and the number of channels is adjusted through a 1 × 1 convolution. Then, the channels and spatial positions are weighted by means of average pooling, maximum pooling, global average pooling, and global maximum pooling, that is, spatial attention and channel attention. Finally, the fused feature layer is output through a 1 × 1 convolution layer.
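For clarity, a minimal PyTorch sketch of such a CBAM-style fusion block is given below; the module name, channel sizes, and reduction ratio are illustrative assumptions rather than the exact implementation:

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Fuse a ResNet feature map and a Swin feature map of the same spatial
    size with CBAM-style channel and spatial attention (illustrative sketch)."""
    def __init__(self, c_resnet, c_swin, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c_resnet + c_swin, c_out, kernel_size=1)
        # channel attention: global avg/max pooling -> shared MLP
        self.mlp = nn.Sequential(
            nn.Conv2d(c_out, c_out // 16, 1), nn.ReLU(),
            nn.Conv2d(c_out // 16, c_out, 1))
        # spatial attention: conv over [avg, max] maps taken along the channel axis
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.out = nn.Conv2d(c_out, c_out, kernel_size=1)

    def forward(self, f_resnet, f_swin):
        x = self.reduce(torch.cat([f_resnet, f_swin], dim=1))
        # channel attention weights
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention weights
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        x = x * sa
        return self.out(x)
```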
2.3.2. Multiscale Edge Feature Extraction Module
The continuous downsampling of image features causes a loss of edge detail information. Therefore, we propose an edge prediction module to predict the edge pixels of agricultural parcels and to constrain the edge features extracted by the hybrid backbone. As shown in
Figure 5, the edge decoder is connected to the four different-scale features extracted by the hybrid backbone. The decoder produces probability predictions of agricultural parcel edges at different scales and returns the extracted edge features to the backbone. The predictions at each scale are then superimposed and fused through a convolutional network to obtain the final fused edge prediction map. Finally, the loss of each scale's edge prediction and the loss of the fused edge prediction are calculated.
Edge decoder: To enhance the model's ability to perceive edge information in the feature map, an edge perception module is added to integrate the gradient information in the feature map. As shown in
Figure 6, Sobel filters in four different directions are used for filtering. The edges of agricultural parcels usually appear in regions with obvious differences in pixel values along fixed directions, and Sobel filters can effectively capture edge features in different directions in remote sensing images. The filtered feature maps are concatenated and fused through a 3 × 3 convolution. The fused edge features are first returned to the backbone network; the fused edge feature map is then reduced to one channel through a 1 × 1 convolution, and finally the resolution is restored to the original image size using bilinear interpolation to obtain the prediction of agricultural parcel edges.
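A simplified PyTorch sketch of this Sobel-based edge decoder is shown below; the kernel definitions, layer sizes, and class name are illustrative assumptions, and the exact implementation may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 3x3 Sobel-style kernels in four directions (0, 45, 90, and 135 degrees)
_KERNELS = torch.tensor([
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],     # horizontal gradient
    [[ 0, 1, 2], [-1, 0, 1], [-2, -1, 0]],    # 45-degree diagonal
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],     # vertical gradient
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],     # 135-degree diagonal
], dtype=torch.float32)

class EdgeDecoder(nn.Module):
    """Predict a one-channel edge map from a backbone feature map
    (illustrative sketch of the Sobel-based edge decoder)."""
    def __init__(self, in_ch):
        super().__init__()
        # depthwise Sobel filtering: each direction applied to every channel
        weight = _KERNELS.repeat(in_ch, 1, 1).unsqueeze(1)   # (4*in_ch, 1, 3, 3)
        self.register_buffer("sobel", weight)
        self.in_ch = in_ch
        self.fuse = nn.Conv2d(4 * in_ch, in_ch, kernel_size=3, padding=1)
        self.head = nn.Conv2d(in_ch, 1, kernel_size=1)

    def forward(self, feat, out_size):
        g = F.conv2d(feat, self.sobel, padding=1, groups=self.in_ch)
        enhanced = self.fuse(g)        # edge-enhanced features fed back to the backbone
        edge = self.head(enhanced)     # reduce to one channel
        edge = F.interpolate(edge, size=out_size, mode="bilinear",
                             align_corners=False)
        return enhanced, edge
```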
Loss function: Referring to the HED edge detection loss function, we design the edge detection loss function in this paper [
34]. For the prediction of agricultural parcel edges, the edge loss function is divided into the loss of the lateral output layers and the loss of the fusion layer. The loss of the lateral output layer is as follows:

$$\ell_{side}^{(m)} = -\beta \sum_{j \in Y_{+}} \log \hat{y}_{j} - (1-\beta) \sum_{j \in Y_{-}} \log\left(1-\hat{y}_{j}\right), \quad m = 1, \ldots, 4,$$

where $\ell_{side}^{(m)}$ represents the loss of each of the four lateral output layers, $y_{j}$ and $\hat{y}_{j}$ represent the sample label value and the predicted value of pixel $j$, respectively, and $\beta$ is the weight parameter. Because the numbers of edge and non-edge pixels in the edge detection task are extremely unbalanced, a weight parameter is introduced to balance the two categories:

$$\beta = \frac{\left| Y_{-} \right|}{\left| Y_{+} \right| + \left| Y_{-} \right|}, \qquad 1-\beta = \frac{\left| Y_{+} \right|}{\left| Y_{+} \right| + \left| Y_{-} \right|},$$

where $\left| Y_{+} \right|$ and $\left| Y_{-} \right|$ represent the numbers of edge and non-edge pixels, respectively.
The loss of the fusion layer is calculated as follows:

$$L_{fuse} = \mathrm{CE}\left(Y, \hat{Y}_{fuse}\right),$$

where $\mathrm{CE}$ represents the cross-entropy loss function and $\hat{Y}_{fuse}$ is the prediction obtained by fusing all side output layers through the activation function σ as follows:

$$\hat{Y}_{fuse} = \sigma\left(\sum_{m=1}^{4} A_{side}^{(m)}\right),$$

where $A_{side}^{(m)}$ represents the result of each lateral output layer.
The final edge prediction loss function is the sum of the losses of the fusion layer and the lateral output layers as follows:

$$L_{edge} = \sum_{m=1}^{4} \ell_{side}^{(m)} + L_{fuse}.$$
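The following PyTorch sketch illustrates how such a class-balanced edge loss could be computed; the function names are hypothetical and the code is a simplified illustration of the scheme described above, not the exact training code:

```python
import torch
import torch.nn.functional as F

def balanced_edge_loss(pred, target):
    """HED-style class-balanced binary cross-entropy for one side output.
    `pred` holds logits and `target` holds 0/1 edge labels (illustrative)."""
    n_pos = target.sum()
    n_neg = target.numel() - n_pos
    beta = n_neg / (n_pos + n_neg)                      # weight for edge pixels
    weight = torch.where(target > 0.5, beta, 1.0 - beta)
    return F.binary_cross_entropy_with_logits(pred, target, weight=weight)

def edge_supervision_loss(side_preds, fused_pred, target):
    """Sum of the four lateral-output losses plus the fusion-layer
    cross-entropy loss, as described above."""
    l_side = sum(balanced_edge_loss(p, target) for p in side_preds)
    l_fuse = F.binary_cross_entropy_with_logits(fused_pred, target)
    return l_side + l_fuse
```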
2.3.3. Vector Contour Prediction Module
In the past, the extraction of farmland parcels was essentially a pixel-level classification of remote sensing images, so the extracted parcels inevitably suffered from incomplete edges and holes. In this regard, we propose a method that extracts parcels in vector form, treating each farmland parcel as an instance, representing the contour of the target parcel as a polygon formed by a set of feature-point coordinates, and extracting agricultural parcels in an end-to-end manner. The proposed contour prediction module follows the Faster R-CNN object detection model and predicts the vector contour of the target object on the basis of the generated detection box.
As shown in
Figure 3, the contour prediction structure is similar to the Faster R-CNN structure [
43]. Before vector contour prediction, object detection is performed to generate a detection box, and the parcel contour is then predicted within the detection box. Features are extracted from the remote sensing image by the hybrid backbone and enhanced by the multiscale edge feature extraction module. The extracted feature map is then passed to the RPN (region proposal network) module to generate candidate boxes. According to the obtained candidate boxes, the features of the corresponding regions are cropped from the feature map and imported into the ROI Align module, which extracts the regional features within each candidate box and converts regional features of different scales to the same size. The resulting regional features are passed through a fully connected network to the detection box prediction module and the contour prediction module. The detection box prediction module generates the category and regression parameters of the detection box, and the contour prediction module generates the confidence and regression parameters of the contour points.
The polygon representing the agricultural parcel is formed by a series of contour points. The prediction of agricultural parcel contour is essentially a task of identifying and regressing contour points. As shown in
Figure 7, according to the target parcel bounding box obtained by object detection, points are generated by uniform sampling along the boundary of the bounding box as the initial polygon contour points. Based on the features within the bounding box extracted by ROI Align, the offset of each initial contour point is computed through convolutional and fully connected layers, and the positions of the contour points are adjusted. Each contour point is shifted toward its closest real contour point, and all contour points are regressed according to the offset between the initial point and the real point; redundant contour points are then removed by judging the confidence of each contour point, finally yielding a set of contour points.
In this experiment, 64 contour points are preset for each polygon, and 64 points are also sampled from the annotation as the corresponding ground truth. Based on the obtained confidence of each contour point, it is determined whether the point is a valid contour point, and the vector polygons of agricultural parcels in the remote sensing image are obtained by connecting the valid contour points.
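To make the initialization step concrete, the sketch below uniformly samples 64 initial contour points along the perimeter of a detection box; this is a simplified NumPy illustration with hypothetical names, while the offset regression and confidence scoring are learned inside the network:

```python
import numpy as np

def init_contour_points(box, n_points=64):
    """Uniformly sample initial contour points along the perimeter of a
    detection box (x1, y1, x2, y2); illustrative sketch."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    perimeter = 2 * (w + h)
    dists = np.linspace(0.0, perimeter, n_points, endpoint=False)
    pts = []
    for d in dists:
        if d < w:                        # top edge
            pts.append((x1 + d, y1))
        elif d < w + h:                  # right edge
            pts.append((x2, y1 + d - w))
        elif d < 2 * w + h:              # bottom edge
            pts.append((x2 - (d - w - h), y2))
        else:                            # left edge
            pts.append((x1, y2 - (d - 2 * w - h)))
    return np.array(pts)

# The contour head then predicts, for each initial point, an (dx, dy) offset
# toward its nearest ground-truth contour point plus a validity score;
# points below a confidence threshold are dropped and the remaining points
# are connected into the final vector polygon.
```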
Loss function: For contour prediction, the loss function is divided into three parts: the region proposal network (RPN) loss, the detection box loss, and the contour point loss. Each part is further divided into a classification loss and a regression loss. The classification losses of the RPN and the box determine whether each anchor box contains a target, while the point classification loss determines whether each point is a valid contour point. For the classification and regression losses, the traditional Faster R-CNN loss functions are used [
43]. The classification loss is calculated as

$$L_{cls} = -\frac{1}{N_{cls}} \sum_{i} \left[ p_{i}^{*} \log p_{i} + \left(1 - p_{i}^{*}\right) \log\left(1 - p_{i}\right) \right],$$

where $p_{i}^{*}$ is the real label of the target box (1 for foreground and 0 for background), $p_{i}$ is the foreground probability predicted by the model, and $N_{cls}$ is the number of anchor boxes participating in the training.
Regression loss is used to adjust the coordinates of the bounding box (or contour point) so that they move closer to the true values. It is calculated as follows:

$$L_{reg} = \frac{1}{N_{reg}} \sum_{i} p_{i}^{*}\, \mathrm{smooth}_{L_{1}}\!\left(t_{i} - t_{i}^{*}\right),$$

where $p_{i}^{*}$ is the indicator function, which is 1 when the anchor box is a positive sample, $t_{i}$ represents the predicted bounding box (contour point) coordinates, $t_{i}^{*}$ represents the real bounding box (contour point) coordinates, and $\mathrm{smooth}_{L_{1}}$ is used to calculate the offset between the predicted value and the real value.
The loss of the vector contour prediction module is as follows:

$$L_{contour} = \lambda_{1} L_{cls}^{rpn} + \lambda_{2} L_{reg}^{rpn} + \lambda_{3} L_{cls}^{box} + \lambda_{4} L_{reg}^{box} + \lambda_{5} L_{cls}^{point} + \lambda_{6} L_{reg}^{point}.$$

The classification and regression losses of the RPN, box, and point branches are all included in the contour prediction loss, and λ denotes the weight of each loss term. Referring to the loss weight settings used for instance segmentation in previous studies [
38,
39,
43], and considering that this study focuses more on predicting contour points, we set the weights $\lambda_{1}$ to $\lambda_{6}$ to 2, 2, 3, 3, 6, and 6, respectively, in this experiment.
The overall loss of the model combines the loss of the multiscale edge feature extraction module and the loss of the vector contour module in proportions of 0.3 and 0.7, respectively:

$$L = 0.3\, L_{edge} + 0.7\, L_{contour}.$$
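Putting the pieces together, the overall training loss could be assembled as in the following sketch; the ordering of the contour loss terms and the helper names are assumptions based on the description above:

```python
def total_loss(edge_losses, contour_losses):
    """Combine the multiscale edge loss and the weighted contour losses
    in the 0.3 / 0.7 proportion described above (illustrative sketch)."""
    # contour_losses ordered as: rpn_cls, rpn_reg, box_cls, box_reg, pt_cls, pt_reg
    lam = [2, 2, 3, 3, 6, 6]
    l_edge = sum(edge_losses)                      # four side losses + fusion loss
    l_contour = sum(w * l for w, l in zip(lam, contour_losses))
    return 0.3 * l_edge + 0.7 * l_contour
```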
2.4. Accuracy Evaluation
To evaluate the accuracy of the experimental results in a way that both allows comparison with traditional methods and reflects the characteristics of vector extraction, we use two kinds of evaluation indices: traditional semantic segmentation indices and contour object-based indices.
The semantic segmentation evaluation indicators include precision (P), recall (R), overall accuracy (ACC), intersection over union (IoU), and F1 score (F1). They are calculated as follows:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad ACC = \frac{TP + TN}{TP + TN + FP + FN},$$
$$IoU = \frac{TP}{TP + FP + FN}, \quad F1 = \frac{2 \times P \times R}{P + R},$$

where TP and TN are the numbers of pixels correctly identified as agricultural parcels and non-agricultural parcels, respectively, while FN and FP are the numbers of pixels incorrectly identified as non-agricultural parcels and agricultural parcels, respectively.
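These pixel-level metrics can be computed directly from binary rasters, as in the following NumPy sketch (assuming 1 denotes farmland and 0 denotes background; the function name is illustrative):

```python
import numpy as np

def segmentation_metrics(pred, label):
    """Compute P, R, ACC, IoU, and F1 from binary prediction/label rasters
    (1 = parcel, 0 = background); illustrative sketch."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.sum(pred & label)
    tn = np.sum(~pred & ~label)
    fp = np.sum(pred & ~label)
    fn = np.sum(~pred & label)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    acc = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * p * r / (p + r)
    return dict(P=p, R=r, ACC=acc, IoU=iou, F1=f1)
```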
Object evaluation refers to taking the parcel as a single object to evaluate the geometric accuracy of the extracted results [
38,
44]. It includes over-classification error, under-classification error, and polygon similarity score (PoLiS).
The over-classification error and under-classification error [
44] are used to indicate excessive and insufficient segmentation of a single parcel object, as shown in
Figure 8. The smaller the error value, the better the segmentation effect.
The calculation formulas are as follows:

$$E_{over} = 1 - \frac{\left| T \cap P \right|}{\left| T \right|}, \qquad E_{under} = 1 - \frac{\left| T \cap P \right|}{\left| P \right|},$$

where $T$ and $P$, respectively, represent the real (reference) polygon and the predicted polygon of a single parcel. The over- and under-classification errors are obtained as the ratio of the overlapping area to the area of the reference and predicted polygons, respectively.
The polygon similarity score (PoLiS) is mainly used to measure the similarity between the extracted polygon contour and the real contour [
38]. The smaller the value, the closer the extraction result is to the real contour. It is calculated as follows:

$$\mathrm{PoLiS}(A, B) = \frac{1}{2\left|A\right|} \sum_{a_{j} \in A} \min_{b \in \partial B} \left\| a_{j} - b \right\| + \frac{1}{2\left|B\right|} \sum_{b_{k} \in B} \min_{a \in \partial A} \left\| b_{k} - a \right\|,$$

where $A$ represents the predicted vertex coordinate set and $B$ represents the real vertex coordinate set; the formula measures the average distance between the predicted vertices and the nearest points on the real contour, and vice versa.
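A simple way to approximate PoLiS for two polygons given as vertex arrays is sketched below using Shapely; this is an illustrative implementation under the above definition, not the exact evaluation code used here:

```python
import numpy as np
from shapely.geometry import Polygon, Point

def polis(pred_vertices, true_vertices):
    """Polygon similarity (PoLiS) between a predicted and a reference polygon,
    each given as an (N, 2) array of vertices; illustrative sketch."""
    pred_poly, true_poly = Polygon(pred_vertices), Polygon(true_vertices)
    # average distance from each predicted vertex to the reference boundary
    d_pred = np.mean([true_poly.exterior.distance(Point(p)) for p in pred_vertices])
    # average distance from each reference vertex to the predicted boundary
    d_true = np.mean([pred_poly.exterior.distance(Point(p)) for p in true_vertices])
    return 0.5 * (d_pred + d_true)
```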
4. Discussion
This study's key innovation is moving beyond the previous raster-based semantic segmentation of agricultural parcels and using object-based vector contour prediction to extract them. At the same time, a hybrid backbone combining ResNet and the Swin Transformer is proposed, which fuses the features extracted by the two networks at different scales using spatial and channel attention mechanisms, so that the model can not only capture global context information but also take local details into account. Additionally, a multiscale edge feature extraction module is proposed to address the fuzzy edges of agricultural parcels. By extracting and constraining the edges and shapes of agricultural parcels at different scales, the accuracy of parcel edge extraction is improved.
In terms of agricultural parcel extraction, our model has distinct advantages compared with previous raster-based extraction methods [
35,
36,
45]. Previous methods aimed to improve the accuracy of agricultural parcel extraction by adding attention mechanisms [
45], edge refinement [
36], and multitask extraction [
35]. However, due to the limitations of raster pixels, it was difficult for them to extract complete agricultural parcels. Our model extracts agricultural parcels as objects, which is more effective than raster-based extraction. Compared with raster-based approaches, our model achieves higher accuracy in extracting agricultural parcels: the extracted parcels are more complete, their boundaries are clearer, isolated fragments are reduced, and the hole phenomenon in the middle of large parcels is largely resolved. However, our model has a lower recall rate, which may result in missed detections of small agricultural parcels; its recall is not as good as that of previous raster-based extraction methods.
Although the HEVNet model presented in this paper performs well in extracting complete agricultural parcels, it still has some limitations. First, because the vector contour prediction module generates vector polygons from contour points, it can only predict closed agricultural parcels. For farmland containing houses or ponds, the parcels themselves have holes, and the model cannot predict their inner contours (see
Figure 10). Generating vector polygons that contain holes to represent such parcel shapes is a direction in which this model could be improved.
Secondly, the contour points generated by our model represent the polygon of an agricultural parcel. The number of contour points is usually set quite high to cover most farmland plots, but this may cause contour point redundancy, which can distort the predicted parcel polygon [
38] (see
Figure 11). How to reduce redundant contour points for simply shaped parcels while still capturing complex-shaped parcels remains to be optimized.
In addition, we find that the HEVNet model fits the shapes of fragmented agricultural parcels poorly and can mistakenly divide one parcel into multiple parcels (see
Figure 12). This may be because the edge features of fragmented agricultural parcels are complex and difficult for the multiscale edge feature extraction module to learn at a deeper level, so similar texture features can be misjudged as parcel boundaries [
26]. In the future, the fitting of agricultural parcel shapes could be improved by optimizing the edge feature extraction module [
36] and by using multitask collaborative networks [
35].
In general, although our model has some problems in extracting certain specially shaped parcels, it achieves remarkable results in solving problems such as holes inside parcels and blurred edges compared with previous raster-based extraction models. In the future, we will conduct further research to optimize edge extraction for small agricultural parcels and to address contour extraction for specially shaped parcels, which can then be applied to extract fragmented agricultural parcels in special terrains.
5. Conclusions
This paper proposes HEVNet, a model based on a hybrid backbone network and multiscale edge feature extraction, for extracting the vector polygons of agricultural parcels from remote sensing images. In this model, we propose a hybrid backbone module and a multiscale edge prediction module to assist in agricultural parcel contour prediction. By integrating features from the ResNet-50 and Swin Transformer modules, the hybrid backbone fully extracts the texture features of agricultural parcels, overcomes the limitations of traditional convolutional receptive fields, and improves the extraction of parcels of varying sizes. The multiscale edge feature extraction module identifies parcel edges on features of different scales and uses a multiscale edge loss function to constrain edge feature extraction, reducing the loss of edge information during downsampling and yielding parcels with clearer edges. For mapping, we directly generate contour points to produce the agricultural parcel vector map, instead of first performing raster extraction and then converting the raster to vectors. We conducted experiments using datasets from Denmark, the Netherlands, iFLYTEK, and Hengyang, China. Our model achieved IoU values of 67.92%, 81.35%, 78.02%, and 66.35% on the four datasets, and scores of 72.64%, 89.79%, 90.43%, and 71.49%, respectively. These evaluation metrics are mostly higher than those of the previous raster models (ResUNet-a, BsiNet, SEANet, and DBBANet) and the PolyR-CNN model. The extraction performance on regular parcels is clearly better than that on irregular parcels. Visually, the agricultural parcels extracted by our model are more complete and their boundaries are more accurate and clear. Compared with raster-based extraction, vector-based extraction focuses more on the geometric information of agricultural parcels, effectively addressing issues such as holes inside parcels, unclosed edges, and blurred boundaries that were present in previous extraction methods.