Article

Effective Cultivated Land Extraction in Complex Terrain Using High-Resolution Imagery and Deep Learning Method

1 College of Geographical Sciences, Faculty of Geographical Science and Engineering, Henan University, Zhengzhou 450046, China
2 Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions, Ministry of Education, Henan University, Kaifeng 475004, China
3 Henan Technology Innovation Center of Spatial-Temporal Big Data, Henan University, Zhengzhou 450046, China
4 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
5 Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
6 Information Technology Group, Wageningen University & Research, 6708 PB Wageningen, The Netherlands
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(5), 931; https://doi.org/10.3390/rs17050931
Submission received: 19 January 2025 / Revised: 24 February 2025 / Accepted: 4 March 2025 / Published: 6 March 2025
(This article belongs to the Special Issue Advances in Remote Sensing for Crop Monitoring and Food Security)

Abstract

The accurate extraction of cultivated land information is crucial for optimizing regional farmland layouts and enhancing food supply. To address the low accuracy of existing cultivated land products and the poor applicability of cultivated land extraction methods in fragmented, small-parcel agricultural landscapes and complex terrain, this study develops an advanced cultivated land extraction model for the western part of Henan Province, China, utilizing Gaofen-2 (GF-2) imagery and an improved U-Net architecture to achieve 1 m resolution regional mapping in complex terrain. We obtained optimal input data for the U-Net model by fusing spectral features and vegetation index features from remote sensing images. We evaluated and validated the effectiveness of the proposed method from multiple perspectives and conducted cultivated land change detection and an agricultural landscape fragmentation assessment in the study area. The experimental results show that the proposed method achieved an F1 score of 89.55% for the entire study area, with F1 scores ranging from 83.84% to 90.44% in the hilly and transitional zones. Compared to models that rely solely on spectral features, the feature selection-based model demonstrates superior performance in hilly and adjacent mountainous regions, with an improvement of 4.5% in Intersection over Union (IoU). The cultivated land mapping results show that 83.84% of the cultivated land parcels are smaller than 0.64 hectares. From 2017 to 2022, the overall cultivated land area decreased by 15.26 km2, with the most significant reduction occurring in the adjacent hilly areas, where land parcels are small and fragmented. This trend highlights the urgent need for effective land management strategies to address fragmentation and prevent further loss of cultivated land in these areas. We anticipate that the findings can contribute to precision agriculture management and agricultural modernization in complex terrains worldwide.

1. Introduction

Amidst the relentless growth of the global population, cultivated land resources are confronting unprecedented challenges. The FAO has revealed that approximately 2 billion hectares of land have been degraded by human activities, affecting 34% of the world's agricultural land. This stark fact underscores the urgency of effective land resource management to ensure food security and foster sustainable agricultural development. Cultivated land in China's mountainous and hilly regions comprises about 33% of the nation's total cultivated land and supports over 55% of the population (770 million people) [1]. These areas face difficult farming conditions and high rates of land abandonment, yet hold considerable potential for enhancing agricultural productivity [2]. It is therefore imperative to study cultivated land extraction and spatial pattern information in mountainous and hilly regions.
Traditional methods for cultivated land extraction fall into two categories. The first focuses on boundary information extraction, such as edge detection [3], region growing and splitting [4], and multi-scale segmentation algorithms [5]. While edge detection methods can effectively reflect the contours of land parcels, they are susceptible to noise interference, which can produce false edges [6]. Moreover, their parameterization is arbitrary and context-dependent, and a single parameterization can result in incomplete boundary extraction, making it difficult to close target edges. Region growing and splitting algorithms rely heavily on seed point selection and growth order, resulting in significant uncertainty in segmentation results. These methods are prone to over-segmentation in parcels with high internal variability and under-segmentation in small parcels [7], which is especially detrimental to the accurate extraction of cultivated land in fragmented agricultural landscapes. Multi-scale segmentation methods can segment images at different levels, yielding better results, but the complexity and variety of land cover types within an image increase the computational burden. In smallholder farming systems, where agricultural landscapes are heterogeneous and fragmented, determining the optimal segmentation scale is difficult, limiting the application of such methods [8]. The second category comprises pixel-based extraction methods, primarily Random Forest [9], SVM [10], and other machine learning algorithms [11], whose minimum recognition unit is the pixel. In remote sensing classification, these methods are easily influenced by noise, producing salt-and-pepper effects in the classification results [12]. In addition, these approaches frequently rely on satellite imagery with a spatial resolution of 10 m or coarser and are primarily applied in regions or countries characterized by intensive agriculture and large-scale farmlands [13]. They remain highly uncertain in regions characterized by severe cultivated land fragmentation, small parcel sizes, and irregular shapes [14]; improving fine-scale extraction in such complex agricultural landscapes may require data sources with higher spatial resolution and methods that consider more characteristics of agricultural land. According to the global distribution map of farmland plot sizes, small plots (with an area of <0.64 ha) account for as much as 40% of global agricultural land, and these plots are concentrated primarily in China, India, and Africa [15]. It is therefore crucial to develop advanced cultivated land extraction methods for such regions.
Currently, the rapid progress in high-resolution satellite imagery and deep learning algorithms provides a solid data and technological foundation for deriving accurate agricultural land products [16,17]. Deep learning, leveraging the sophisticated architecture of convolutional neural networks (CNNs), exhibits a superior capacity for feature extraction, efficiently encapsulating the contextual nuances inherent in input data [18]. Deep learning has also made significant progress in high-resolution cropland extraction [19]. For instance, Garcia-Pedrero et al. [20] utilized image data with a spatial resolution of 0.25 m and employed U-Net to extract agricultural plots in Spain; their findings showed that the powerful feature learning capabilities of deep learning, aided by high-resolution data, can extract richer semantic information. Cai et al. [8] developed a cultivated land plot extraction model (CEUNet) based on the U-Net framework by integrating GaoFen-1/2/6 satellite imagery; this model achieved an overall accuracy (OA) of 92.92% in plain areas. Chen et al. [21] utilized GF-2 satellite imagery and the U-Net model to achieve farmland plot extraction in a plain test area severely affected by soil salinization, with an OA of 97%. However, as the test areas in these studies are predominantly plains, the effectiveness of these deep learning methods for extracting cultivated land in hilly or mountainous regions remains unclear, highlighting the need for a more advanced model that accounts for diverse geomorphological conditions.
Previous studies have shown that traditional machine learning methods combined with features such as vegetation indices (VIs), texture features, and spectral features have become widely adopted strategies in remote sensing classification and segmentation [22,23,24,25]. These approaches provide a solid theoretical foundation and valuable scientific reference for the integration of deep learning in this field. VIs help distill complex remote sensing data into intuitive metrics that reflect vegetation growth conditions. For instance, Yee-Rendon et al. [26] introduced the Normalized Green–Blue Vegetation Index (NGBVI) and Normalized Red–Blue Vegetation Index (NRBVI) into deep learning models for plant disease monitoring, achieving a Top-1 test accuracy averaging more than 98%. Similarly, Ulku et al. [27] combined various vegetation indices in a three-channel input, enhancing tree segmentation accuracy compared to using only high-resolution visible or near-infrared data. However, few studies have attempted to integrate derived features into deep learning approaches for farmland identification, particularly with high-resolution imagery. This is primarily because current deep learning-based farmland classifications using high-resolution imagery predominantly extract features from the Red, Green, and Blue bands, overlooking the potential advantages of the near-infrared band.
To address the above problems, we attempted to use high-resolution images and deep learning methods for cultivated land extraction in hilly and mountainous regions by integrating spectral characteristics and vegetation index features. We selected an agricultural production region located in the middle and lower reaches of the Yellow River Basin in China as the study area. This region, characterized by fragmented farmland and a complex cropping structure, is representative of smallholder farming systems. The research focuses on four key objectives: (1) determining the optimal spectral and vegetation index features of remote sensing images as inputs to the proposed U-Net architecture for cultivated land extraction in complex terrains; (2) producing a high-precision cultivated land sample dataset for fragmented agricultural landscapes; (3) creating a cultivated land map of complex terrains based on the optimal deep learning model; and (4) analyzing the changes in cultivated land from 2017 to 2022 and the state of fragmentation to support precision agriculture management and agricultural modernization. The overall workflow of this study is shown in Figure 1, which includes data and preprocessing, feature selection, optimal model construction, and evaluation of the results and mapping.

2. Materials and Methods

2.1. Research Region

The research region is located in the middle and lower reaches of the Yellow River Basin in China, a key agricultural production area, with geographic coordinates ranging from 112°26′15″E to 113°00′00″E and from 34°27′30″N to 34°50′00″N. It includes 18 townships, with a total area of around 946 km2. The region lies at the junction of the Loess Plateau and the Funiu Mountains, with its central part characterized by Yellow River alluvial landforms; it predominantly consists of plain, hilly, and mountainous landforms (Figure 2). The cultivated land in the study area is characterized by dispersed plots with small field sizes, irregular shapes, and complex crop planting structures, making it a typical smallholder farming system (Figure S1).

2.2. Data and Preprocessing

2.2.1. Gaofen-2 (GF-2) High-Resolution Satellite Data

GF-2 satellite imagery serves as the primary data source for this study (Table 1). The GF-2 data were sourced from https://www.gscloud.cn/, accessed on 1 February 2025. A total of 35 usable scenes were selected, covering the years 2017, 2019, 2020, 2021, and 2022 and acquired in February, March, and July through December, with cloud cover below 5% for all scenes. Information on the GF-2 imagery and its coverage frequency in the study area is presented in Table S1 and Figure S2, respectively. Preprocessing was conducted primarily with ENVI 5.5 and ArcMap 10.8, with geometric error controlled within 0.5 pixels per scene [28]. It should be emphasized that the data ultimately employed were the fused multispectral and panchromatic bands, with an achieved spatial resolution of 1 m.
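The fusion itself was performed with ENVI's tools; purely to illustrate the principle, the following is a minimal Brovey-style pan-sharpening sketch, assuming the four multispectral bands have already been resampled to the 1 m panchromatic grid.

```python
import numpy as np

def brovey_fuse(ms, pan, eps=1e-6):
    """Brovey-style fusion sketch: ms is a (4, H, W) multispectral stack
    resampled to the panchromatic grid, pan is the (H, W) panchromatic band.
    Each band is rescaled by the ratio of pan to a synthetic intensity image."""
    intensity = ms.mean(axis=0) + eps
    return ms * (pan / intensity)
```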

2.2.2. GF-2 Diversified Farmland Sample Dataset (GF-2 DFSD)

To ensure temporal diversity and spatial balance in the deep learning training samples, this study selected samples that uniformly cover the different landforms within the study area, incorporating imagery from multiple years and months (Table 2). We used visual interpretation to prepare 3 km × 3 km sample plots based on the 1 m resolution GF-2 fused data (Figure 3). From these 3 km × 3 km sample plots, sample slices of different patch sizes were produced for training the deep learning models, as sketched below. The sample dataset used in this study will be made publicly available online concurrently with the publication of the article.
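The exact slicing scheme (overlap and edge handling) is not detailed here, so the following is a minimal non-overlapping sketch: a 3 km × 3 km plot at 1 m resolution is a 3000 × 3000 pixel array, cut into fixed-size image–mask pairs.

```python
import numpy as np

def tile_patches(image, mask, patch=224):
    """Cut a sample plot (C, H, W) and its label mask (H, W) into
    non-overlapping patch x patch pairs, dropping edge remainders."""
    _, h, w = image.shape
    pairs = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            pairs.append((image[:, i:i + patch, j:j + patch],
                          mask[i:i + patch, j:j + patch]))
    return pairs

# Example: a 4-band 3000 x 3000 plot yields 13 x 13 = 169 patches of 224 x 224.
plot = np.zeros((4, 3000, 3000), dtype=np.float32)
labels = np.zeros((3000, 3000), dtype=np.uint8)
patches = tile_patches(plot, labels)
```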

2.2.3. Test Site Locations

Figure 4 shows the test site locations. Site selection followed the principle of avoiding training sample areas to ensure a more accurate assessment of the model's performance in extracting cultivated land information. We used visual interpretation to create reference samples for accuracy assessment.

2.3. Improved U-Net Architecture

This study developed an improved U-Net architecture as the foundational network model (Figure 5). In our model, the encoding module of U-Net is replaced with a pre-trained ResNet34 model [29], and the network can accommodate inputs with any number of bands and common bit depths (8-bit, 16-bit, and 32-bit). The left half of the architecture is the contracting path, and the right half is the expanding path. The rectified linear unit (ReLU) activation function is employed to mitigate the vanishing gradient problem and enhance training efficiency [30]. The cross-entropy loss function applied here follows the approach described by [31].
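Since training was implemented with the Fastai library (Section 2.5), a construction along the following lines is plausible; this is a minimal sketch, not the authors' exact code. The directory layout, batch size, and class names are hypothetical, and the default RGB loading shown here would need a custom data block for the multi-band inputs used in the study.

```python
from fastai.vision.all import *

# Hypothetical layout: image tiles in `imgs/`, label masks in `masks/`
# with matching file names.
path = Path('gf2_dfsd')
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path / 'imgs'),
    label_func=lambda f: path / 'masks' / f.name,
    codes=['uncultivated', 'cultivated'],
    item_tfms=Resize(224))

# unet_learner builds a DynamicUnet decoder on top of a pretrained ResNet34
# encoder, matching the "encoder replaced with ResNet34" design above.
learn = unet_learner(dls, resnet34, loss_func=CrossEntropyLossFlat(axis=1))
```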

2.4. Feature Selection

2.4.1. Vegetation Indices

The GF-2 imagery contains only four bands (Red, Green, Blue, and near-infrared). Based on previous studies, this study collected vegetation indices useful for cultivated land extraction (Table 3). The Normalized Difference Vegetation Index (NDVI) is effective for assessing vegetation growth across different stages [33], while the Normalized Difference Water Index (NDWI) is used to extract water bodies [34]. The Ratio Vegetation Index (RVI) is sensitive to high-density vegetation [35], and the Difference Vegetation Index (DVI) is better suited to early-stage vegetation with low coverage [36]. The Green NDVI (GNDVI) monitors nitrogen content in vegetation at high coverage [37], and the Optimized Soil-Adjusted Vegetation Index (OSAVI) is responsive to changes in canopy coverage [38].
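As a reference alongside Table 3, these indices can be computed from the four GF-2 bands as below. This sketch uses the standard published formulas; the exact definitions adopted in the study are those listed in Table 3.

```python
import numpy as np

def vegetation_indices(blue, green, red, nir, eps=1e-6):
    """Standard index formulas computed from reflectance arrays."""
    return {
        'NDVI':  (nir - red) / (nir + red + eps),
        'NDWI':  (green - nir) / (green + nir + eps),
        'RVI':   nir / (red + eps),
        'DVI':   nir - red,
        'GNDVI': (nir - green) / (nir + green + eps),
        'OSAVI': (nir - red) / (nir + red + 0.16),   # soil-adjustment term 0.16
    }
```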
The mean and standard deviation of each feature for cultivated and uncultivated land are provided in Figure 6. The discrepancies between the two classes across the six derived vegetation indices offer opportunities to enhance the accuracy of cultivated land classification, further supporting the selection of these indices.

2.4.2. Pearson Correlation Analysis (PCA)

Correlation analysis helps reduce data redundancy. Figure 7a presents the PCA results among the 10 features. The correlation between the spectral features and the vegetation indices is relatively low (r < 0.6). Among the four spectral features, the Red, Green, and Blue bands exhibited strong mutual correlations, while their correlation with the NIR band was relatively low. The correlations of RVI and DVI with the other features were also relatively low.
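Computing such a pairwise Pearson matrix is straightforward; the sketch below uses hypothetical per-pixel feature samples and the |r| ≥ 0.6 cut-off implied above to flag strongly correlated pairs.

```python
import numpy as np
import pandas as pd

# Hypothetical per-pixel samples; in the study these are the 10 candidate features.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.random((1000, 4)), columns=['Red', 'NIR', 'NDVI', 'RVI'])

corr = features.corr(method='pearson')                 # pairwise Pearson r
redundant = [(a, b) for a in corr.columns for b in corr.columns
             if a < b and abs(corr.loc[a, b]) >= 0.6]  # strongly correlated pairs
```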

2.4.3. Permutation Feature Importance (PFI)

The PFI method offers a practical approach for calculating feature importance in neural network models [39]. Based on the existing sample data, an LSTM neural network was trained for prediction [40], and the PFI method was then applied to rank the importance of each feature. Figure 7b shows the PFI results, expressed as Mean Squared Error (MSE). To reduce randomness, the final importance score for each feature is the mean obtained after randomly shuffling that feature ten times. As shown in Figure 7b, the three index features OSAVI, GNDVI, and NDWI have the lowest importance.
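The core of PFI is model-agnostic: shuffle one feature column at a time and record how much the prediction error grows. A minimal sketch, assuming a fitted regression model with a predict method and tabular inputs:

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE after shuffling each feature column of X (n, p),
    averaged over n_repeats shuffles; larger values mean higher importance."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((model.predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # break the feature-target link
            scores[j] += np.mean((model.predict(Xp) - y) ** 2) - base_mse
    return scores / n_repeats
```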

2.4.4. Feature Combination Scheme

For deep learning models, each additional feature significantly increases computational complexity. Therefore, this study applied a combination of correlation analysis and permutation feature importance to reduce the dimensionality of the feature set. The final feature set was determined based on the principle that "the stronger the correlation, the greater the information redundancy; the lower the feature importance score, the weaker the classification ability" [41]. This suggests that certain features, namely NDWI, GNDVI, and OSAVI, are candidates for removal. However, OSAVI, being sensitive to changes in vegetation canopy coverage, is valuable for extracting cultivated land information from multi-temporal or long-term series data and was therefore retained. Despite the high correlations among the Red, Green, and Blue spectral bands, their PFI scores surpass those of the other indices, warranting their retention as well. The optimal feature subset for cultivated land extraction in this study therefore includes Blue, Green, Red, NIR, NDVI, RVI, DVI, and OSAVI. Based on this analysis, two comparative schemes were developed (Table 4): (1) the first, termed U-Net_WFS, uses the original spectral band combination without feature selection, i.e., the four original spectral bands; (2) the second, termed U-Net_FS, employs the selection-based combination, incorporating the four spectral bands and the four retained vegetation indices, as assembled in the sketch below.
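A minimal sketch of how the two input schemes can be assembled into channel stacks, reusing the vegetation_indices function from Section 2.4.1; the random arrays are placeholders for 1 m GF-2 reflectance tiles.

```python
import numpy as np

# Dummy reflectance tiles standing in for a GF-2 patch's four bands.
rng = np.random.default_rng(0)
blue, green, red, nir = (rng.random((224, 224)) for _ in range(4))
vi = vegetation_indices(blue, green, red, nir)  # sketch from Section 2.4.1

x_wfs = np.stack([blue, green, red, nir])                         # U-Net_WFS: 4 channels
x_fs = np.stack([blue, green, red, nir,
                 vi['NDVI'], vi['RVI'], vi['DVI'], vi['OSAVI']])  # U-Net_FS: 8 channels
```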

2.5. Training Parameter Settings

All training and inference were carried out in Jupyter Notebook 5.7.10 using the third-party Python library Fastai [42]. The deep learning models were executed on a Windows 10 system with an NVIDIA GeForce RTX 2080 Ti GPU and 128 GB of RAM. Based on the optimal learning rate (lr) search, the lr employed in this research was between 1.1 × 10⁻⁵ and 1.1 × 10⁻⁴. Sample augmentation expands the original samples to strengthen the model's generalization ability. The inclusion of multi-temporal samples in this study already increased sample diversity; on this basis, online sample augmentation was performed using two methods, rotation and linear stretching, to further enrich the samples and enhance the model's robustness. The ratio of training samples to validation samples during model training was 8:2.
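Continuing the Fastai sketch from Section 2.3, a plausible training setup is shown below; the epoch count and augmentation arguments are illustrative assumptions, and linear stretching would be a custom radiometric transform not shown here.

```python
# Continuing the `learn` object from the Section 2.3 sketch.
learn.lr_find()                                    # plot loss vs. lr to pick a range
learn.fit_one_cycle(60,                            # epoch count is an assumption
                    lr_max=slice(1.1e-5, 1.1e-4))  # lr range reported above

# Rotation augmentation can be attached via fastai's batch transforms, e.g.
# batch_tfms=aug_transforms(max_rotate=90.0) when building the DataLoaders.
```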

2.6. Evaluation Metrics

In this study, we assessed the cultivated land classification model using the F1 score, mean Intersection over Union (mIoU), Precision, and Recall [28]. In addition, we utilized the IoU, F1 score, producer's accuracy (PA), user's accuracy (UA), OA, and kappa coefficient (Kappa) to evaluate the classification results [12].
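All of these metrics derive from the pixel-wise confusion matrix; a minimal sketch for the binary cultivated/uncultivated case follows. For the cultivated class, Precision corresponds to UA and Recall to PA, and mIoU averages the per-class IoUs.

```python
import numpy as np

def evaluate(pred, ref):
    """Binary cultivated-land metrics from a pixel-wise confusion matrix;
    pred and ref are arrays of 0 (uncultivated) and 1 (cultivated)."""
    tp = np.sum((pred == 1) & (ref == 1))
    fp = np.sum((pred == 1) & (ref == 0))
    fn = np.sum((pred == 0) & (ref == 1))
    tn = np.sum((pred == 0) & (ref == 0))
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)                       # UA of the cultivated class
    recall = tp / (tp + fn)                          # PA of the cultivated class
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    miou = (iou + tn / (tn + fp + fn)) / 2           # mean over both classes
    oa = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (oa - pe) / (1 - pe)                     # chance-corrected agreement
    return dict(precision=precision, recall=recall, f1=f1,
                iou=iou, miou=miou, oa=oa, kappa=kappa)
```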

3. Results

3.1. Visual Evaluation of Cultivated Land Mapping Results

We employed the trained feature selection model to identify cultivated land within the study area from 2017 to 2022. Figure 8 illustrates the detailed cultivated land extraction results for the study area in 2022. The mapping results demonstrate effective cultivated land extraction in both flat and hilly regions. Specifically, the model successfully captured the spatial details of cultivated land, including fine features such as field pathways, which were particularly evident in flat areas. However, the enlarged views indicate instances of very small plots being missed, particularly near roads; this omission may be attributed to the model's limited sensitivity to small-scale features. Additionally, the cultivated land within the research region exhibits fragmentation, particularly in the southern hilly areas closer to the mountainous regions. This fragmentation is likely influenced by the complex terrain and human activities, such as land use changes and agricultural practices.

3.2. Quantitative Evaluation of Cultivated Land Mapping Results

3.2.1. Classification Results Evaluation Under Different Landforms

Table 5 and Figure S3 present the cultivated land extraction results for years and regions not included in the training samples, with the imagery acquired on 9 December 2017. The site locations are shown in Figure 4. The results show that U-Net_FS outperforms U-Net_WFS across the geomorphological areas, especially in complex terrains. Both models achieve the highest cultivated land extraction accuracy in the plain areas (sites B and C), with decreasing accuracy in the transition zone (site D), the northern hilly area (site A), and the southern hilly areas (sites E and F). At site C, U-Net_FS's IoU is 31.03% higher than at site F, with OA and Kappa differences of 9.74% and 22.34%, respectively. The performance gap between U-Net_FS and U-Net_WFS is minimal in plains but more significant in hilly regions, particularly at sites E and F, where U-Net_FS outperforms U-Net_WFS in IoU, F1, OA, and Kappa by 3–8%. The feature selection strategy enhances accuracy in irregular, fragmented cultivated lands influenced by other vegetation types, improving small parcel extraction in complex terrains. Overall, the results suggest that the feature selection strategy adapts better to the spectral and spatial heterogeneity of hilly regions, which is less pronounced in plain areas. These findings underscore the importance of tailoring feature selection strategies to specific geomorphological conditions to optimize cultivated land extraction accuracy.

3.2.2. Classification Results Evaluation Under Different Temporal Phases

Table 6 and Figure S4 present the visual results of cultivated land extraction and evaluation metrics across different temporal phases, with the inference region located at site G. The results indicate that both U-Net_FS and U-Net_WFS models possess strong temporal transferability and high capability in extracting cultivated land information, with PA and UA exceeding 90%. Specifically, the cultivated land IoU of U-Net_FS ranges from 89.15% to 93.36%, and OA ranges from 91.05% to 94.31%. For U-Net_WFS, the IoU ranges from 88.74% to 93.58%, and OA from 90.74% to 94.51%. The high consistency in performance across different temporal phases demonstrates the robustness of both models in handling temporal variations. These findings suggest that the inclusion of multi-temporal samples in model training enhances sample diversity and improves the model’s temporal transfer performance.

3.3. Comparisons with Other Methods

The comparison methods in this study include PSPNet [43], DeepLabV3 [44], the original U-Net [32], and our method. All models were trained under the same configuration to ensure a fair comparison (Table 7). We evaluated these methods on the entire validation set. The results show that our method achieves the best performance in high-resolution cropland extraction, followed by the original U-Net, DeepLabV3, and PSPNet. Our method holds an advantage of 0.69% to 4.27% in mIoU, and only U-Net and our method achieved an mIoU above 80%. This further underscores the rationale for choosing U-Net as the base model.

3.4. Analysis of Cultivated Land Area and Changes

We mapped cultivated land area changes in the research region from 2017 to 2022 (Figure 9) and summarized the changes by sub-administrative region (Table 8). The results clearly reflect the increases and decreases in cultivated land across the regions. The spatial distribution shows that changes in cultivated land are most pronounced in the southwest of the study area (LCT, ZGT), which lies in the transition zone between plains and hills, where the cultivated land distribution is relatively dispersed. Statistically, the overall cultivated land area decreased by 15.26 km2, with LCT accounting for 27.20% of the decrease. This decline is primarily attributed to urban expansion, which has converted farmland into built-up land. The rapid urbanization in LCT highlights the pressure on agricultural land in regions undergoing economic development. In contrast, the plain region (GLT) experienced minimal change, with cultivated land remaining the dominant land use type and its distribution more concentrated. The stability of cultivated land in GLT underscores the importance of flat terrain and established agricultural practices in maintaining land use consistency.

3.5. Analysis of Cultivated Land Fragmentation Patterns Based on Field Size

Based on the cultivated land extraction results, we applied a deep learning road extraction method [45] and a mean shift multi-scale segmentation algorithm [46] to achieve a more precise segmentation of cultivated land parcels in 2020, with each parcel (crop field) containing only a single crop type. Following the global field size classification standard, fields in the study area are categorized into three levels: very small (<0.64 hm2), small (0.64–2.56 hm2), and medium (2.56–16 hm2) [15], as illustrated in the sketch below. As shown in Figure 10 and Table 9, 83.84% of fields are smaller than 0.64 hm2, with the smallest plots primarily in the northern and southern hilly regions and small plots dominating the central plain. This spatial pattern reflects the impact of topography on field size distribution, as hilly regions favor smaller plots due to terrain constraints. The high fragmentation of cultivated land, especially in the transitional areas between hills and mountains, may present challenges for smallholder farming, such as suboptimal planting decisions and a higher risk of land abandonment. These challenges underscore the importance of targeted policies to improve land use efficiency and support farmer livelihoods in fragmented agricultural landscapes.
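A minimal sketch of the field-size binning, assuming parcel areas are available in hectares (1 hm2 = 1 ha); the example areas are hypothetical, and fields above 16 hm2 are assumed absent, as in the study area.

```python
import numpy as np

areas_ha = np.array([0.12, 0.48, 0.90, 3.20, 12.0])   # hypothetical parcel areas
levels = np.select(
    [areas_ha < 0.64, areas_ha < 2.56],
    ['very small', 'small'], default='medium')         # thresholds from [15]
share_very_small = np.mean(areas_ha < 0.64)            # 83.84% in the study area
```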

4. Discussion

4.1. Ablation Study of the Proposed Deep Learning Method

4.1.1. The Performance of Models Under Different Patch Sizes

As shown in Table 10, model performance differs significantly across patch sizes. The U-Net_FS model generally outperforms the U-Net_WFS model, and the training duration tends to increase as the patch size decreases. A patch size of 128 × 128 pixels achieved the highest scores in our study, followed by 224 × 224 pixels. However, although the model achieves high mIoU and F1 scores with 128 × 128 pixels, the validation loss becomes unstable after 60 training epochs (Figure 11). A patch size of 224 × 224 pixels balances training time and stability, yielding the best overall trade-off. With this size, the mIoU of the U-Net_FS and U-Net_WFS models is 11.46% and 3.52% higher, respectively, than with 512 × 512 pixels, underscoring the impact of patch size on performance. Thus, 224 × 224 pixels is identified as optimal.

4.1.2. The Performance of Models Under Different Sample Sizes

In the present research, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% of the total 19,503 paired sample patches, with patch size at 224 × 224 pixels, were randomly selected as training samples to participate in model training (Table 11). The results show that when the proportion of training samples is less than 50%, the model’s mIoU and F1 scores are unstable. When the proportion exceeds 70%, the mIoU and F1 scores stabilize at a certain level, with mIoU above 77% and F1 scores above 88%. These findings suggest that while increasing the proportion of training samples can improve model performance up to a certain point, there are diminishing returns beyond 70%. Additionally, the quality and diversity of the training samples play a crucial role in model accuracy, and time efficiency is an important consideration in practical applications.

4.1.3. The Performance of Models Under Multi-Temporal Samples

In this study, all multi-temporal samples listed in Table 2 were used for training to evaluate U-Net_FS and U-Net_WFS. The total number of patches was 19,503, with the optimal size of 224 × 224 pixels determined earlier.
The accuracy and validation loss of the U-Net_FS and U-Net_WFS models differ only slightly, but overall, U-Net_FS performs better than U-Net_WFS. Both models achieve a recall above 0.93, F1 scores above 89%, and mIoU values exceeding 80%. Compared to the single-temporal results (Table 10), the multi-temporal U-Net_FS model shows a 1.74% improvement in mIoU, while U-Net_WFS exhibits a 1.98% increase (Table 12). This improvement results from the combined effect of the optimal patch size and multi-temporal samples.

4.2. Comprehensive Evaluation of Cultivated Land Products

4.2.1. High Spatial Generalization Capability

To assess the spatial generalization capability of our method, we selected four test areas (A, B, C, D) covering both plain and hilly regions (Figure 12). The method effectively removes uncultivated land types such as ponds, residential areas, and forests from high-resolution imagery, revealing richer details of cultivated land. Even in hilly areas, where image features are more complex, it captures detailed cultivated land specifics (Figure 12i). Field roads between cultivated land plots were excluded as uncultivated land, resulting in better spatial independence between plots. The results not only demonstrate the strong out-of-sample generalization capability of our method but also underscore the effectiveness of integrating high-resolution imagery with deep learning techniques for capturing intricate details of cultivated land.

4.2.2. Comparison of Publicly Available Cultivated Land Product Details

We compared our cultivated land product with existing products (Figure 13, Table 13) and found that it aligns well spatially while offering more detailed internal features, more accurate area extraction, and clearer boundaries. In contrast, products such as ESA_WorldCover [47], Dynamic World [48], and GLC_FCS [49], derived from 10–30 m imagery, struggle to capture details like field roads and highways, resulting in misclassification. Although the SinoLC-1 [50,51] product uses a 1 m resolution deep learning algorithm and captures some road details, it lacks the internal richness of our extraction. This is because it was trained on coarser 10 m land cover data, which limits its ability to capture finer details, underscoring the importance of detailed training labels for improving model performance.
Additionally, the product comparison indicates that using imagery with a spatial resolution of 10 m or coarser for cropland extraction may lead to an overestimation of cropland area. In developing countries, many field roads are narrower than 8 m and are often misclassified as cropland pixels unless masked with actual road networks, which in turn introduces new challenges related to geometric alignment between different products.

4.3. Model’s Sensitivity Analysis

This study further validates the reliability of the feature selection results by analyzing the sensitivity of the target model to individual features. The reference baseline model is trained using all available features; a larger drop in mIoU or F1 score after removing a feature indicates that the model is more sensitive to that feature (see the sketch below). As shown in Figure 14, the sensitivity of the U-Net model to each feature, ranked from highest to lowest, is: NDVI > RVI > Red > Green > OSAVI > NIR > NDWI > GNDVI > DVI > Blue. This ranking suggests that OSAVI has a greater impact on the U-Net model's classification and segmentation performance than NDWI and GNDVI, so retaining OSAVI in the feature selection process is reasonable. Conversely, since NDWI and GNDVI have minimal impact on the U-Net model's performance and rank low, their removal aligns with the earlier analysis.
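A minimal leave-one-feature-out sketch of this sensitivity analysis; train_and_eval is a hypothetical wrapper that trains the U-Net on the given feature subset and returns its validation mIoU.

```python
# `train_and_eval` is a hypothetical wrapper: it trains the U-Net on the
# listed features and returns the validation mIoU.
FEATURES = ['Blue', 'Green', 'Red', 'NIR', 'NDVI', 'NDWI',
            'RVI', 'DVI', 'GNDVI', 'OSAVI']

base_miou = train_and_eval(FEATURES)                  # baseline: all features
sensitivity = {f: base_miou - train_and_eval([g for g in FEATURES if g != f])
               for f in FEATURES}                     # larger drop = more sensitive
```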

4.4. Causes of Cultivated Land Fragmentation and Decrease

We calculated and tabulated data for each sub-administrative region within the study area, including elevation (EL), slope (SL), population size (PS), aging rate (AR), defined as the proportion of individuals aged 60 and above relative to the total population, and the total decrease in cultivated land area from 2017 to 2022 (DA). Based on the results presented in Section 3.5, the patch density (PD) of the cultivated land, calculated as the ratio of the total number of fields to the total cultivated area, was used as an indicator of cultivated land fragmentation (Table S2).
The correlation analysis results (Figure 15) indicate that elevation and slope are the dominant factors in the fragmentation of cultivated land. However, there is no direct evidence linking fragmentation to aging. According to the results in Section 3.4, the reduction in cultivated land mainly results from urban expansion. In addition, field surveys and consultations with farmers also indicate that the reduction in cultivated land is due to the long-term idleness of farmland caused by land transfers during the pandemic lockdown, as well as the implementation of the Grain-to-Green Program.

4.5. Limitations and Future Work

This study effectively used a deep learning model with high-resolution imagery to extract cultivated land information in complex geomorphic areas. Although incorporating vegetation features proved beneficial, some limitations remain, mainly in the following aspects: (1) Model performance is influenced by sample size, patch size, and sample diversity [52]. A small number of samples may yield high indicator scores, but model performance remains unstable; it is essential to identify the critical point beyond which increasing the sample size has minimal impact on performance, which can be achieved by dynamically setting the ratio of training to validation sample sizes. Patch size has a greater impact on model performance, so careful consideration of tile size is necessary in deep learning classification studies; patch sizes of 256 × 256 and 224 × 224 are currently common. Sample diversity appears to have a less obvious effect on model performance, but the final classification results confirm its advantage. (2) Although the cultivated lands in the study area are small, fragmented, and diverse in form, they follow a typical winter wheat–summer maize rotation within a complex landscape. The applicability of these methods to rice-dominated regions is uncertain, as rice fields are easily confused with water bodies during the early growth and irrigation stages, requiring targeted classification schemes or methods for cultivated land extraction [53,54]. (3) The accuracy of our model still requires improvement, especially in the transition zones from hilly to mountainous areas. This is mainly because field roads cause some cultivated land parcels to merge, especially where vegetation conceals narrow roads, highlighting the need for further algorithmic refinement in boundary extraction [55]. To address this, a multi-task deep learning approach incorporating an edge extraction model could be implemented to better delineate field boundaries in future work [56]. Additionally, previous studies have shown that texture and geometric features perform well in high-resolution image classification with traditional machine learning methods such as Random Forest; incorporating these features into deep learning models may help address the lower extraction accuracy in complex terrains [12,57].
This study develops a deep learning-based method for the fine-scale extraction of cultivated land in fragmented agricultural landscapes, achieving 1 m spatial resolution cultivated land mapping, and demonstrates its application. The findings offer agricultural management departments practical means to perform more detailed cultivated land change detection and to track the evolution of fragmentation patterns. They also provide valuable methodological and data support for government and agricultural management departments in optimizing farmland layouts, as well as scientific references for researchers developing fine-scale cultivated land products with high temporal and spatial resolution at large scales.

5. Conclusions

In this study, a cultivated land extraction model developed using Gaofen-2 (GF-2) images and an improved U-Net architecture achieved 1 m resolution mapping in complex terrain areas. The input data, combining spectral and vegetation index features from remote sensing images, demonstrate strong performance across various regions. The experimental results show that the proposed method achieved an F1 score of 89.55% in the study area, with high OA across different terrain zones and years. Compared to models that rely solely on spectral features, the feature selection-based model demonstrates superior performance in hilly and adjacent mountainous regions. Our mapping reveals significant fragmentation of the cultivated land, with most plots being small, and a decrease in total cultivated land area from 2017 to 2022, particularly in hilly regions. This highlights the urgent need for effective land management strategies to address cultivated land loss. Using imagery with a spatial resolution of 10 m or coarser for cropland extraction may lead to an overestimation of cropland area, particularly in fragmented, small-plot areas. Moreover, existing deep learning-based remote sensing public datasets focus primarily on large, regular farmland areas; the cultivated land sample dataset (GF-2 DFSD) created in this study enriches these public datasets by including fragmented and irregular farmland data. We hope our research contributes to precision agriculture management and agricultural modernization in complex terrains regionally and globally, which is vital for safeguarding food security and fostering sustainable development.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17050931/s1.

Author Contributions

Z.L.: Conceptualization, methodology, data curation, formal analysis, writing—original draft, writing—review and editing. J.G.: Formal analysis, writing—original draft, writing—review and editing. C.L.: Data curation, formal analysis. L.W.: Methodology, software. D.G.: Data curation, formal analysis. Y.B.: Conceptualization, writing—review and editing. F.Q.: Conceptualization, methodology, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was sponsored by the High-Resolution Satellite Project of the State Administration of Science, Technology, and Industry for National Defense of the PRC (80Y50G19-9001-22/23); the National Science and Technology Platform Construction Project (2005DKA32300); the Major Research Projects of the Ministry of Education (16JJD770019); the National Natural Science Foundation of China (42401377, U21A2014).

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Acknowledgments

We acknowledge the data support from the National Earth System Science Data Sharing Infrastructure, National Science and Technology Infrastructure of China (Data Center of Lower Yellow River Regions).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Y.; Yang, C.; Zhang, Y.; Xue, Y. Mountainous Areas: Alleviating the Shortage of Cultivated Land Caused by Changing Dietary Structure in China. Land 2023, 12, 1464. [Google Scholar] [CrossRef]
  2. Qiu, B.; Liu, B.; Tang, Z.; Dong, J.; Xu, W.; Liang, J.; Chen, N.; Chen, J.; Wang, L.; Zhang, C.; et al. National-scale 10-m maps of cropland use intensity in China during 2018–2023. Sci. Data 2024, 11, 691. [Google Scholar] [CrossRef]
  3. Marshall, M.; Crommelinck, S.; Kohli, D.; Perger, C.; Yang, M.Y.; Ghosh, A.; Fritz, S.; de Bie, K.; Nelson, A. Crowd-Driven and Automated Mapping of Field Boundaries in Highly Fragmented Agricultural Landscapes of Ethiopia with Very High Spatial Resolution Imagery. Remote Sens. 2019, 11, 2082. [Google Scholar] [CrossRef]
  4. Espindola, G.M.; Camara, G.; Reis, I.A.; Bins, L.S.; Monteiro, A.M. Parameter selection for region-growing image segmentation algorithms using spatial autocorrelation. Int. J. Remote Sens. 2006, 27, 3035–3040. [Google Scholar] [CrossRef]
  5. Karydas, C.G. Optimization of multi-scale segmentation of satellite imagery using fractal geometry. Int. J. Remote Sens. 2020, 41, 2905–2933. [Google Scholar] [CrossRef]
  6. Waldner, F.; Diakogiannis, F.I. Deep learning on edge: Extracting field boundaries from satellite images with a convolutional neural network. Remote Sens. Environ. 2020, 245, 111741. [Google Scholar] [CrossRef]
  7. Xue, Y.; Zhao, J.; Zhang, M.J.R.S. A Watershed-Segmentation-Based Improved Algorithm for Extracting Cultivated Land Boundaries. Remote Sens. 2021, 13, 939. [Google Scholar] [CrossRef]
  8. Cai, Z.; He, Z.; Wang, W.; Yang, J.; Wei, H.; Wang, C.; Xu, B. Mapping cropland at metric resolution using the spatiotemporal information from multi-source GF satellite data. Natl. Remote Sens. Bull. 2022, 26, 1368–1382. [Google Scholar] [CrossRef]
  9. Belgiu, M.; Dragut, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
  10. Sheykhmousa, M.; Mahdianpari, M.; Ghanbari, H.; Mohammadimanesh, F.; Ghamisi, P.; Homayouni, S. Support Vector Machine Versus Random Forest for Remote Sensing Image Classification: A Meta-Analysis and Systematic Review. IEEE J. Sel. Topics. Appl. Earth Observ. Remote Sens. 2020, 13, 6308–6325. [Google Scholar] [CrossRef]
  11. Bhagwat, R.U.; Shankar, B.U. A novel multilabel classification of remote sensing images using XGBoost. In Proceedings of the 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), Bombay, India, 29–31 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5. [Google Scholar]
  12. Wang, L.; Wang, J.; Liu, Z.; Zhu, J.; Qin, F. Evaluation of a deep-learning model for multispectral remote sensing of land use and crop classification. Crop J. 2022, 10, 1435–1451. [Google Scholar] [CrossRef]
  13. Watkins, B.; van Niekerk, A. A comparison of object-based image analysis approaches for field boundary delineation using multi-temporal Sentinel-2 imagery. Comput. Electron. Agric. 2019, 158, 294–302. [Google Scholar] [CrossRef]
  14. Ye, S.; Ren, S.; Song, C.; Du, Z.; Wang, K.; Du, B.; Cheng, F.; Zhu, D. Spatial pattern of cultivated land fragmentation in mainland China: Characteristics, dominant factors, and countermeasures. Land Use Policy 2024, 139, 107070. [Google Scholar] [CrossRef]
  15. Lesiv, M.; Laso Bayas, J.C.; See, L.; Duerauer, M.; Dahlia, D.; Durando, N.; Hazarika, R.; Sahariah, P.K.; Vakolyuk, M.Y.; Blyshchyk, V. Estimating the global distribution of field size using crowdsourcing. Glob. Change Biol. 2019, 25, 174–186. [Google Scholar] [CrossRef]
  16. Attri, I.; Awasthi, L.K.; Sharma, T.P.; Rathee, P. A review of deep learning techniques used in agriculture. Ecol. Inform. 2023, 77, 102217. [Google Scholar] [CrossRef]
  17. Wu, Y.; Peng, Z.; Hu, Y.; Wang, R.; Xu, T. A dual-branch network for crop-type mapping of scattered small agricultural fields in time series remote sensing images. Remote Sens. Environ. 2025, 316, 114497. [Google Scholar] [CrossRef]
  18. Guo, J.; Xu, Q.; Zeng, Y.; Liu, Z.; Zhu, X.X. Nationwide urban tree canopy mapping and coverage assessment in Brazil from high-resolution remote sensing images using deep learning. ISPRS J. Photogramm. Remote Sens. 2023, 198, 1–15. [Google Scholar] [CrossRef]
  19. Xu, F.; Yao, X.; Zhang, K.; Yang, H.; Feng, Q.; Li, Y.; Yan, S.; Gao, B.; Li, S.; Yang, J.; et al. Deep learning in cropland field identification: A review. Comput. Electron. Agric. 2024, 222, 109042. [Google Scholar] [CrossRef]
  20. Garcia-Pedrero, A.; Lillo-Saavedra, M.; Rodriguez-Esparragon, D.; Gonzalo-Martin, C. Deep Learning for Automatic Outlining Agricultural Parcels: Exploiting the Land Parcel Identification System. IEEE Access 2019, 7, 158223–158236. [Google Scholar] [CrossRef]
  21. Chen, W.; Liu, G. A novel method for identifying crops in parcels constrained by environmental factors through the integration of a Gaofen-2 high-resolution remote sensing image and Sentinel-2 time series. IEEE J. Sel. Topics. Appl. Earth Observ. Remote Sens. 2023, 17, 450–463. [Google Scholar] [CrossRef]
  22. Bolfe, E.L.; Parreiras, T.C.; da Silva, L.A.P.; Sano, E.E.; Bettiol, G.M.; Victoria, D.D.; Sanches, I.D.; Vicente, L.E. Mapping Agricultural Intensification in the Brazilian Savanna: A Machine Learning Approach Using Harmonized Data from Landsat Sentinel-2. ISPRS Int. J. Geo-Inf. 2023, 12, 263. [Google Scholar] [CrossRef]
  23. Macarringue, L.S.; Bolfe, E.L.; Duverger, S.G.; Sano, E.E.; Caldas, M.M.; Ferreira, M.C.; Zullo Junior, J.; Matias, L.F. Land Use and Land Cover Classification in the Northern Region of Mozambique Based on Landsat Time Series and Machine Learning. ISPRS Int. J. Geo-Inf. 2023, 12, 342. [Google Scholar] [CrossRef]
  24. Sun, X.; Zhang, P.; Wang, Z.; Yijia, W. Potential of multi-seasonal vegetation indices to predict rice yield from UAV multispectral observations. Precis. Agric. 2024, 25, 1235–1261. [Google Scholar] [CrossRef]
  25. Xiao, T.; She, B.; Zhao, J.; Huang, L.; Ruan, C.; Huang, W. Identification of soybean planting areas using Sentinel-1/2 remote sensing data: A combined approach of reduced redundancy feature optimization and ensemble learning. Eur. J. Agron. 2025, 164, 127480. [Google Scholar] [CrossRef]
  26. Yee-Rendon, A.; Torres-Pacheco, I.; Trujillo-Lopez, A.S.; Romero-Bringas, K.P.; Millan-Almaraz, J.R. Analysis of New RGB Vegetation Indices for PHYVV and TMV Identification in Jalapeno Pepper (Capsicum annuum) Leaves Using CNNs-Based Model. Plants 2021, 10, 1977. [Google Scholar] [CrossRef]
  27. Ulku, I.; Akagündüz, E.; Ghamisi, P. Deep Semantic Segmentation of Trees Using Multispectral Images. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2022, 15, 7589–7604. [Google Scholar] [CrossRef]
  28. Liu, Z.; Li, N.; Wang, L.; Zhu, J.; Qin, F. A multi-angle comprehensive solution based on deep learning to extract cultivated land information from high-resolution remote sensing images. Ecol. Indic. 2022, 141, 108961. [Google Scholar] [CrossRef]
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  30. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
  31. Pan, Z.; Xu, J.; Guo, Y.; Hu, Y.; Wang, G. Deep Learning Segmentation and Classification for Urban Village Using a Worldview Satellite Image Based on U-Net. Remote Sens. 2020, 12, 1574. [Google Scholar] [CrossRef]
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the MICCAI, Munich, Germany, 5–9 October 2015; Volume 9351, pp. 234–241. [Google Scholar]
  33. Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring vegetation systems in the Great Plains with ERTS. NASA Spec. Publ. 1974, 351, 309. [Google Scholar]
  34. McFeeters, S.K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 1996, 17, 1425–1432. [Google Scholar] [CrossRef]
  35. Gonenc, A.; Ozerdem, M.S.; Acar, E. Comparison of NDVI and RVI Vegetation Indices Using Satellite Images. In Proceedings of the 2019 8th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), Istanbul, Turkey, 16–19 July 2019; pp. 1–4. [Google Scholar]
  36. Garcia-Ruiz, F.; Sankaran, S.; Maja, J.M.; Lee, W.S.; Rasmussen, J.; Ehsani, R. Comparison of two aerial imaging platforms for identification of Huanglongbing-infected citrus trees. Comput. Electron. Agric. 2013, 91, 106–115. [Google Scholar] [CrossRef]
  37. Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298. [Google Scholar] [CrossRef]
  38. Baret, S.F. Optimization of soil-adjusted vegetation indices. Remote Sens. Environ. 1996, 55, 95–107. [Google Scholar]
  39. Fisher, A.; Rudin, C.; Dominici, F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. J. Mach. Learn. Res. 2019, 20, 1–81. [Google Scholar]
  40. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810. [Google Scholar]
  41. Zhang, P.; Hu, S. Fine crop classification by remote sensing in complex planting areas based on field parcel. TCSAE 2019, 35, 125–134. [Google Scholar]
  42. Howard, J.; Gugger, S. Fastai: A Layered API for Deep Learning. Information 2020, 11, 108. [Google Scholar] [CrossRef]
  43. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2016; IEEE: Piscataway, NJ, USA, 2017; pp. 6230–6239. [Google Scholar]
  44. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
  45. Batra, A.; Singh, S.; Pang, G.; Basu, S.; Jawahar, C.V.; Paluri, M. Improved Road Connectivity by Joint Learning of Orientation and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10377–10385. [Google Scholar]
  46. Shen, Z.; Luo, J.; Wu, W.; Hu, X. Agricultural and forestry land boundary precise segmentation from remote sensing images by parallel mean shift algorithm. J. Image Graph. 2011, 16, 1689–1695. [Google Scholar]
  47. Zanaga, D.; Van De Kerchove, R.; De Keersmaecker, W.; Souverijns, N.; Brockmann, C.; Quast, R.; Wevers, J.; Grosu, A.; Paccini, A.; Vergnaud, S.; et al. ESA WorldCover 10 m 2020, v100; Zenodo: Cambridge, UK, 2021. [CrossRef]
  48. Brown, C.F.; Brumby, S.P.; Guzder-Williams, B.; Birch, T.; Hyde, S.B.; Mazzariello, J.; Czerwinski, W.; Pasquarella, V.J.; Haertel, R.; Ilyushchenko, S.; et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci. Data 2022, 9, 251. [Google Scholar] [CrossRef]
  49. Zhang, X.; Liu, L.; Chen, X.; Gao, Y.; Xie, S.; Mi, J. GLC_FCS30: Global land-cover product with fine classification system at 30 m using time-series Landsat imagery. Earth Syst. Sci. Data 2021, 13, 2753–2776. [Google Scholar] [CrossRef]
  50. Li, Z.; Zhang, H.; Lu, F.; Xue, R.; Yang, G.; Zhang, L. Breaking the resolution barrier: A low-to-high network for large-scale high-resolution land-cover mapping using low-resolution labels. ISPRS J. Photogramm. Remote Sens. 2022, 192, 244–267. [Google Scholar] [CrossRef]
  51. Li, Z.; He, W.; Cheng, M.; Hu, J.; Yang, G.; Zhang, H. SinoLC-1: The first 1 m resolution national-scale land-cover map of China created with the deep learning framework and open-access data. Earth Syst. Sci. Data Discuss. 2023, 15, 4749–4780. [Google Scholar] [CrossRef]
  52. Li, Z.; Chen, B.; Wu, S.; Su, M.; Chen, J.M.; Xu, B. Deep learning for urban land use category classification: A review and experimental assessment. Remote Sens. Environ. 2024, 311, 114290. [Google Scholar] [CrossRef]
  53. Li, H.; Huang, J.; Zhang, C.; Ning, X.; Zhang, S.; Atkinson, P.M. An efficient and generalisable approach for mapping paddy rice fields based on their unique spectra during the transplanting period leveraging the CIE colour space. Remote Sens. Environ. 2024, 313, 114381. [Google Scholar] [CrossRef]
  54. Zhao, Z.; Dong, J.; Zhang, G.; Yang, J.; Liu, R.; Wu, B.; Xiao, X. Improved phenology-based rice mapping algorithm by integrating optical and radar data. Remote Sens. Environ. 2024, 315, 114460. [Google Scholar] [CrossRef]
  55. Zhao, H.; Wu, B.; Zhang, M.; Long, J.; Tian, F.; Xie, Y.; Zeng, H.; Zheng, Z.; Ma, Z.; Wang, M.; et al. A large-scale VHR parcel dataset and a novel hierarchical semantic boundary-guided network for agricultural parcel delineation. ISPRS J. Photogramm. Remote Sens. 2025, 221, 1–19. [Google Scholar] [CrossRef]
  56. Chen, X.; Sun, Q.; Guo, W.; Qiu, C.; Yu, A. GA-Net: A geometry prior assisted neural network for road extraction. Int. J. Appl. Earth Obs. Geoinf. 2022, 114, 103004. [Google Scholar] [CrossRef]
  57. Zhang, J.; Wu, T.; Luo, J.; Hu, X.; Wang, L.; Li, M.; Lu, X.; Li, Z. Toward Agricultural Cultivation Parcels Extraction in the Complex Mountainous Areas Using Prior Information and Deep Learning. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4402414. [Google Scholar] [CrossRef]
Figure 1. The overall workflow of this study.
Figure 2. Geographic location and topography of the research region. Abbreviations for township names: Dianzhuang Town (DZT), Huaixin Street (HXS), Shouyangshan Street (SYSS), Zhai Town (ZT), Yuetang Town (YTT), Guxian Town (GXT), Goushi Town (GST), Fudian Town (FDT), Gaolong Town (GLT), Shanhua Town (SHT), Mangling Town (MLT), Dakou Town (DKT), Shangcheng Street (SCS), Yiluo Street (YLS), Koudian Town (KDT), Pangcun Town (PCT), Licun Town (LCT), and Zhuge Town (ZGT).
Figure 3. Sample examples in the study area. (a,c) show the fused standard false-color composite images from GF-2, acquired on 16 December 2020, while (b,d) show the corresponding samples.
Figure 4. Location distribution of the test sites. (A–G) are the corresponding GF-2 standard false-color composite images. Site A is in the northern hills, Sites B and C are in the plains, Site D is in the transition zone, and Sites E, F, and G are in the southern hills and adjacent plains. Employing 30 m resolution SRTM DEM data, the research delineated the study area's landforms into three types based on elevation: plains (0–200 m), hills (200–500 m), and mountains (over 500 m).
Figure 5. Improved U-Net architecture employed in the study (built upon Ronneberger et al. [32]).
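As a concrete reference point, the block below is a minimal PyTorch sketch of the plain U-Net encoder-decoder that Figure 5 builds upon (after Ronneberger et al. [32]). The depth, channel widths, and batch normalization are illustrative assumptions, and the specific improvements of the architecture in Figure 5 are not reproduced here; the 8-channel input matches the U-Net_FS scheme of Table 4 and the 128 × 128 patch size favored in Table 10.

```python
# Minimal U-Net sketch (after Ronneberger et al. [32]). Channel widths
# and depth are illustrative assumptions, not the paper's exact
# "improved" architecture.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 conv + BN + ReLU blocks: the basic U-Net building unit.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_channels=8, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_channels, 64)
        self.enc2 = double_conv(64, 128)
        self.enc3 = double_conv(128, 256)
        self.bottleneck = double_conv(256, 512)
        self.pool = nn.MaxPool2d(2)
        self.up3 = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.dec3 = double_conv(512, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = double_conv(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = double_conv(128, 64)
        self.head = nn.Conv2d(64, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                      # skip connection 1
        e2 = self.enc2(self.pool(e1))          # skip connection 2
        e3 = self.enc3(self.pool(e2))          # skip connection 3
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                   # per-pixel class logits

# Example: one 8-channel (4 spectral + 4 index) 128x128 patch.
logits = UNet()(torch.randn(1, 8, 128, 128))   # -> (1, 2, 128, 128)
```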
Figure 6. Mean and standard deviation of each feature.
Figure 7. Pearson correlation analysis (PCA) and permutation feature importance (PFI) results for each feature. (a) shows the Pearson correlation coefficients between features; the correlations among these features are significant at the 0.001 level. (b) shows the permutation feature importance score for each feature; all PFI scores are uniformly multiplied by 1000.
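The permutation feature importance behind Figure 7b follows a generic recipe: shuffle one feature at a time and measure how much accuracy drops. Below is a minimal NumPy sketch of that recipe, assuming a pixel-level feature matrix X, label vector y, and a predict_fn callable (all hypothetical names); the authors' exact evaluation protocol may differ. As the caption notes, the plotted scores are additionally scaled by 1000.

```python
# Generic permutation feature importance (PFI): the importance of
# feature j is the mean drop in accuracy after shuffling column j.
import numpy as np

def permutation_importance(predict_fn, X, y, n_repeats=5, seed=0):
    # X: (n_pixels, n_features); y: (n_pixels,) ground-truth labels.
    rng = np.random.default_rng(seed)
    baseline = (predict_fn(X) == y).mean()          # unperturbed accuracy
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])    # break only feature j
            drops.append(baseline - (predict_fn(Xp) == y).mean())
        scores[j] = np.mean(drops)                  # larger = more important
    return scores
```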
Figure 8. Cultivated land map spatial details in the study area for 2022. (a) represents the cultivated land extraction results for the entire study area; (b,c) show cultivated land extraction results for the northern hilly region; (d) represents results for the central plain; (e,f) show results for the southern hilly region. (g–k) display zoomed-in views within the blue boxes from (b–f), overlaid with GF-2 standard false-color imagery.
Figure 9. Cultivated land area changes from 2017 to 2022.
Figure 10. Map and statistics of field size in the study area for 2020. (a) shows the field size distribution map for the entire study area; (b), (c), and (d) represent the magnified views of the fields in the northern hilly area, central plain area, and southern hilly area of the study region, respectively.
Figure 11. Training process of U-Net_FS and U-Net_WFS models with different numbers of sample patches. (a–f) represent the changes in model accuracy and validation loss for the U-Net_FS and U-Net_WFS models at patch sizes of 512 × 512 pixels, 448 × 448 pixels, 256 × 256 pixels, 224 × 224 pixels, 128 × 128 pixels, and 112 × 112 pixels, respectively.
Figure 12. Cultivated land extraction results in the test areas. Test areas A, B, and C are located in plain regions, while area D represents a hilly region. Green represents cultivated land. (a) shows the schematic map of the test area location; (b,d,f,h) are standard false-color images of GF-2 for test areas A, B, C, and D, respectively; (c,e,g,i) are the corresponding cultivated land extraction results for these images.
Figure 13. A visual comparison between the results of the cultivated land extraction method proposed in this study and publicly released cultivated land products. Green represents cultivated land, and (g–x) denote the zoomed-in views within the blue boxes from (a–f).
Figure 14. The classification sensitivity of the U-Net model to each feature.
Figure 15. Correlation between cultivated land fragmentation and potential influencing factors. The result is significant at the 0.001 level.
Table 1. GF-2 satellite sensor parameters.

Payload | Band Number | Wavelength Range (µm) | Band Name | Spatial Resolution (m) | Revisit Interval (Day)
Panchromatic and multispectral camera | 1 | 0.45~0.90 | Panchromatic (Pan) | 1 | 5
 | 2 | 0.45~0.52 | Blue | 4 | 
 | 3 | 0.52~0.59 | Green | 4 | 
 | 4 | 0.63~0.69 | Red | 4 | 
 | 5 | 0.77~0.89 | Near-Infrared (NIR) | 4 | 
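Table 1 also explains the "fused" composites mentioned in Figure 3: the 4 m multispectral bands are pan-sharpened to the 1 m panchromatic grid. The paper's actual fusion algorithm is not restated here, so the snippet below is only a simple Brovey-style sketch of the idea under assumed inputs.

```python
# Brovey-style pan-sharpening sketch: upsample the 4 m multispectral
# bands by 4x to the 1 m pan grid, then rescale by the pan/intensity
# ratio. Illustrative only; the study's actual fusion method may differ.
import numpy as np
from scipy.ndimage import zoom

def brovey_pansharpen(ms, pan, eps=1e-6):
    # ms: (4, H, W) blue/green/red/NIR at 4 m; pan: (4H, 4W) at 1 m.
    ms_up = np.stack([zoom(band, 4, order=1) for band in ms])  # bilinear x4
    intensity = ms_up.mean(axis=0)                  # crude intensity proxy
    return ms_up * pan / (intensity + eps)          # (4, 4H, 4W) at 1 m
```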
Table 2. The area of the quadrats involved in model training from 2017 to 2022 (km²).

Year | Month | Cultivated Land | Uncultivated Land | Total Area | Total Quadrat Area of Each Year
2017 | 2 | 97.74 | 68.11 | 165.85 | 165.85
2019 | 7 | 33.77 | 23.47 | 57.24 | 57.24
2020 | 2 | 8.44 | 5.77 | 14.21 | 449.68
2020 | 8 | 31.31 | 13.69 | 45.00 | 
2020 | 9 | 8.04 | 4.72 | 12.75 | 
2020 | 10 | 53.76 | 27.22 | 80.99 | 
2020 | 11 | 35.07 | 20.32 | 55.39 | 
2020 | 12 | 131.16 | 110.18 | 241.34 | 
2021 | 11 | 35.07 | 23.84 | 58.91 | 58.91
2022 | 3 | 93.41 | 66.31 | 159.72 | 159.72
Table 3. Vegetation indices.

ID | Vegetation Index | Expression
1 | NDVI | (NIR − Red)/(NIR + Red)
2 | NDWI | (Green − NIR)/(Green + NIR)
3 | RVI | NIR/Red
4 | DVI | NIR − Red
5 | GNDVI | (NIR − Green)/(NIR + Green)
6 | OSAVI | (NIR − Red)/(NIR + Red + 0.16)
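The expressions above translate directly into per-pixel array arithmetic. A NumPy sketch, given same-shape float arrays for the green, red, and NIR bands (the small eps guard against division by zero is our addition, not part of the table):

```python
# Vegetation indices of Table 3 as per-pixel NumPy operations.
import numpy as np

def vegetation_indices(green, red, nir, eps=1e-6):
    return {
        "NDVI":  (nir - red) / (nir + red + eps),
        "NDWI":  (green - nir) / (green + nir + eps),
        "RVI":   nir / (red + eps),
        "DVI":   nir - red,
        "GNDVI": (nir - green) / (nir + green + eps),
        "OSAVI": (nir - red) / (nir + red + 0.16),
    }
```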
Table 4. Design scheme of the cultivated land information extraction model.

Scheme | Band Composition | Description
U-Net_WFS | Blue, Green, Red, NIR | Spectral Features
U-Net_FS | Blue, Green, Red, NIR, NDVI, RVI, DVI, OSAVI | Spectral Features + Vegetation Features
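The U-Net_FS scheme therefore feeds the network an 8-channel array: the four GF-2 bands plus NDVI, RVI, DVI, and OSAVI. A sketch of that assembly, reusing the vegetation_indices helper above; the per-band standardization is an assumed preprocessing choice, not a step stated in this table:

```python
# Build the 8-channel U-Net_FS input of Table 4 from the four GF-2
# bands. Per-band standardisation is an illustrative assumption.
import numpy as np

def build_fs_input(blue, green, red, nir):
    idx = vegetation_indices(green, red, nir)       # helper defined above
    stack = np.stack([blue, green, red, nir,
                      idx["NDVI"], idx["RVI"], idx["DVI"], idx["OSAVI"]])
    mean = stack.mean(axis=(1, 2), keepdims=True)
    std = stack.std(axis=(1, 2), keepdims=True) + 1e-6
    return (stack - mean) / std                     # (8, H, W) model input
```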
Table 5. Assessment results of cultivated land extraction under various landforms. IoU, PA, and UA are reported per class; F1, OA, and Kappa are single overall values per scheme.

Site | Physiognomy | Metric | U-Net_WFS (Cultivated) | U-Net_WFS (Uncultivated) | U-Net_FS (Cultivated) | U-Net_FS (Uncultivated)
A | Northern Hilly Area | IoU | 0.8632 | 0.7363 | 0.8712 | 0.7582
 |  | PA | 0.9380 | 0.8272 | 0.9538 | 0.8234
 |  | UA | 0.9153 | 0.8702 | 0.9095 | 0.9055
 |  | F1 | 0.8877 |  | 0.8980 | 
 |  | OA | 0.9010 |  | 0.9082 | 
 |  | Kappa | 0.7748 |  | 0.7939 | 
B | Plain | IoU | 0.9095 | 0.8867 | 0.9151 | 0.8912
 |  | PA | 0.9665 | 0.9231 | 0.9596 | 0.9375
 |  | UA | 0.9392 | 0.9573 | 0.9518 | 0.9474
 |  | F1 | 0.9465 |  | 0.9491 | 
 |  | OA | 0.9470 |  | 0.9499 | 
 |  | Kappa | 0.8926 |  | 0.8981 | 
C | Plain | IoU | 0.9416 | 0.8600 | 0.9439 | 0.8653
 |  | PA | 0.9862 | 0.8881 | 0.9875 | 0.8909
 |  | UA | 0.9542 | 0.9646 | 0.9553 | 0.9678
 |  | F1 | 0.9481 |  | 0.9503 | 
 |  | OA | 0.9570 |  | 0.9588 | 
 |  | Kappa | 0.8947 |  | 0.8990 | 
D | Transition Zone between Plains and Hilly Areas | IoU | 0.8876 | 0.8072 | 0.9024 | 0.8271
 |  | PA | 0.9722 | 0.8440 | 0.9701 | 0.8700
 |  | UA | 0.9107 | 0.9488 | 0.9282 | 0.9437
 |  | F1 | 0.9188 |  | 0.9279 | 
 |  | OA | 0.9236 |  | 0.9335 | 
 |  | Kappa | 0.8341 |  | 0.8542 | 
E | Southern Hilly Area | IoU | 0.8021 | 0.7589 | 0.8401 | 0.8092
 |  | PA | 0.8606 | 0.9015 | 0.8934 | 0.9192
 |  | UA | 0.9218 | 0.8274 | 0.9338 | 0.8711
 |  | F1 | 0.8779 |  | 0.9044 | 
 |  | OA | 0.8781 |  | 0.9047 | 
 |  | Kappa | 0.7535 |  | 0.8078 | 
F | Southern Hilly Area | IoU | 0.5885 | 0.7447 | 0.6336 | 0.8176
 |  | PA | 0.6681 | 0.9097 | 0.8083 | 0.8837
 |  | UA | 0.8317 | 0.8042 | 0.7456 | 0.9162
 |  | F1 | 0.8032 |  | 0.8384 | 
 |  | OA | 0.8130 |  | 0.8613 | 
 |  | Kappa | 0.5974 |  | 0.6756 | 
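All entries in Tables 5 and 6 can be derived from a per-site confusion matrix: IoU, PA (producer's accuracy, i.e., per-class recall), and UA (user's accuracy, i.e., per-class precision) are class-wise, while OA and Kappa are overall. The single F1 values are consistent with averaging the two class-wise F1 scores (e.g., Site A, U-Net_WFS: class-wise F1 of about 0.9265 and 0.8482 average to about 0.8877). A minimal sketch with cultivated land as the positive class:

```python
# Accuracy metrics of Tables 5-7 from a binary confusion matrix
# (tp, fp, fn, tn count pixels; cultivated land is the positive class).
def metrics(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    pa = tp / (tp + fn)                    # producer's accuracy (recall)
    ua = tp / (tp + fp)                    # user's accuracy (precision)
    iou = tp / (tp + fp + fn)              # per-class IoU
    f1 = 2 * pa * ua / (pa + ua)           # class-wise F1
    oa = (tp + tn) / n                     # overall accuracy
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)           # chance-corrected agreement
    return {"IoU": iou, "PA": pa, "UA": ua,
            "F1": f1, "OA": oa, "Kappa": kappa}
```

Swapping the roles of the two classes gives the uncultivated-land columns, and the mIoU reported in Tables 7, 10, and 12 is then the mean of the two class-wise IoU values.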
Table 6. Evaluation of cultivated land extraction results under different temporal phases. IoU, PA, and UA are reported per class; F1, OA, and Kappa are single overall values per scheme.

Image Date | Metric | U-Net_WFS (Cultivated) | U-Net_WFS (Uncultivated) | U-Net_FS (Cultivated) | U-Net_FS (Uncultivated)
20170211 | IoU | 0.9358 | 0.7249 | 0.9336 | 0.7159
 | PA | 0.9768 | 0.8012 | 0.9753 | 0.7963
 | UA | 0.9571 | 0.8839 | 0.9561 | 0.8764
 | F1 | 0.9045 |  | 0.9008 | 
 | OA | 0.9451 |  | 0.9431 | 
 | Kappa | 0.8074 |  | 0.8002 | 
20171209 | IoU | 0.9041 | 0.6468 | 0.9022 | 0.6345
 | PA | 0.9482 | 0.7907 | 0.9434 | 0.7955
 | UA | 0.9511 | 0.7804 | 0.9539 | 0.7582
 | F1 | 0.8676 |  | 0.8627 | 
 | OA | 0.9184 |  | 0.9164 | 
 | Kappa | 0.7351 |  | 0.7250 | 
20190713 | IoU | 0.9077 | 0.6652 | 0.9099 | 0.6732
 | PA | 0.9676 | 0.7478 | 0.9695 | 0.7513
 | UA | 0.9362 | 0.8577 | 0.9367 | 0.8663
 | F1 | 0.8769 |  | 0.8804 | 
 | OA | 0.9220 |  | 0.9240 | 
 | Kappa | 0.7509 |  | 0.7578 | 
20201216 | IoU | 0.8874 | 0.6564 | 0.8915 | 0.6610
 | PA | 0.9629 | 0.7330 | 0.9601 | 0.7476
 | UA | 0.9189 | 0.8628 | 0.9258 | 0.8509
 | F1 | 0.8688 |  | 0.8708 | 
 | OA | 0.9074 |  | 0.9105 | 
 | Kappa | 0.7335 |  | 0.7389 | 
20211112 | IoU | 0.9316 | 0.7709 | 0.9295 | 0.7683
 | PA | 0.9724 | 0.8458 | 0.9752 | 0.8330
 | UA | 0.9569 | 0.8970 | 0.9520 | 0.9082
 | F1 | 0.9179 |  | 0.9169 | 
 | OA | 0.9444 |  | 0.9429 | 
 | Kappa | 0.8353 |  | 0.8325 | 
20220310 | IoU | 0.9251 | 0.7435 | 0.9290 | 0.7565
 | PA | 0.9704 | 0.8232 | 0.9734 | 0.8285
 | UA | 0.9520 | 0.8849 | 0.9531 | 0.8969
 | F1 | 0.9075 |  | 0.9128 | 
 | OA | 0.9385 |  | 0.9418 | 
 | Kappa | 0.8141 |  | 0.8246 | 
Table 7. Evaluation results of different methods for cultivated land extraction from high-resolution imagery.

Model | mIoU | Recall | Precision | F1
PSPNet | 76.61% | 92.84% | 81.60% | 86.86%
DeepLabV3 | 79.43% | 91.67% | 85.83% | 88.65%
U-Net | 80.19% | 93.94% | 84.73% | 89.10%
Our method | 80.88% | 94.65% | 84.96% | 89.55%
Table 8. Cultivated land area changes at the township scale in the study area from 2017 to 2022 (km²).

Township Name | Total Land Area of the Township | Cultivated Land Area in 2017 | Increase Area | Decrease Area | Net Change Area
DZT | 38.47 | 22.64 | 0.78 | −0.87 | −0.09
HXS | 14.46 | 2.30 | 0.33 | −0.80 | −0.47
SYSS | 53.58 | 25.77 | 2.03 | −2.83 | −0.81
ZT | 31.81 | 19.51 | 0.76 | −1.51 | −0.75
YTT | 29.31 | 12.63 | 0.70 | −1.58 | −0.88
GXT | 43.85 | 22.82 | 1.16 | −2.26 | −1.10
GST | 80.30 | 49.97 | 1.16 | −2.48 | −1.32
FDT | 127.09 | 27.48 | 1.12 | −2.42 | −1.30
GLT | 37.59 | 23.92 | 0.76 | −0.76 | 0.00
SHT | 68.20 | 35.29 | 1.71 | −3.03 | −1.32
MLT | 58.43 | 33.38 | 1.61 | −1.20 | 0.41
DKT | 88.42 | 37.86 | 1.41 | −1.92 | −0.51
SCS | 20.79 | 3.33 | 0.73 | −0.81 | −0.08
YLS | 11.85 | 4.34 | 0.15 | −0.88 | −0.73
KDT | 63.97 | 24.08 | 1.19 | −2.06 | −0.87
PCT | 32.88 | 17.88 | 1.35 | −0.86 | 0.49
LCT | 83.89 | 33.35 | 3.14 | −7.29 | −4.15
ZGT | 60.73 | 22.64 | 2.46 | −4.23 | −1.77
Total area | 945.62 | 419.19 | 22.55 | −37.81 | −15.26
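Read this way, the last column is (up to rounding in the source data) simply the signed sum of the increase and decrease columns. A minimal pandas sketch of that check, using three townships from the table:

```python
# Net change per township = increase + (negative) decrease, cross-checked
# against Table 8 for three townships. Values are km2 from the table.
import pandas as pd

df = pd.DataFrame(
    {"increase": [0.78, 0.33, 0.76], "decrease": [-0.87, -0.80, -1.51]},
    index=["DZT", "HXS", "ZT"],
)
df["net_change"] = (df["increase"] + df["decrease"]).round(2)
print(df)  # net_change: DZT -0.09, HXS -0.47, ZT -0.75 (matches Table 8)
```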
Table 9. Field size statistics of the study area.

Field Size | Field Count | Proportion of Total Fields
Very small (<0.64 ha) | 80,935 | 83.84%
Small | 14,535 | 15.06%
Medium | 1,061 | 1.10%
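A sketch of the corresponding binning follows. The 0.64 ha upper bound for the "very small" class matches the mapping results reported for this study; the boundary between "small" and "medium" is a hypothetical placeholder (3 ha), since the exact threshold is not restated in this table.

```python
# Bin parcel areas (hectares) into the Table 9 classes. The 0.64 ha
# edge follows the reported results; the 3.0 ha "small"/"medium" edge
# is a hypothetical placeholder, not the paper's stated threshold.
import numpy as np

def field_size_class(area_ha, small_max=3.0):
    edges = [0.64, small_max]                        # class boundaries (ha)
    labels = np.array(["very small", "small", "medium"])
    return labels[np.digitize(area_ha, edges)]

print(field_size_class(np.array([0.2, 1.5, 8.0])))
# -> ['very small' 'small' 'medium']
```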
Table 10. The model accuracy for U-Net_WFS and U-Net_FS under different patch sizes.

Patch Size (Pixels) | mIoU (U-Net_WFS) | mIoU (U-Net_FS) | F1 (U-Net_WFS) | F1 (U-Net_FS) | Mean Epoch Time (s, U-Net_WFS) | Mean Epoch Time (s, U-Net_FS)
512 × 512 | 0.7369 | 0.6768 | 0.8434 | 0.7990 | 201 | 232
448 × 448 | 0.6963 | 0.7391 | 0.8106 | 0.8463 | 187 | 219
256 × 256 | 0.7673 | 0.7768 | 0.8767 | 0.8841 | 212 | 243
224 × 224 | 0.7821 | 0.7914 | 0.8846 | 0.8892 | 218 | 261
128 × 128 | 0.8099 | 0.8047 | 0.9051 | 0.9020 | 324 | 360
112 × 112 | 0.7804 | 0.7882 | 0.8880 | 0.8929 | 400 | 411
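Behind Table 10 is a straightforward preprocessing step: cutting each scene into square patches of the chosen size. The sketch below tiles a (C, H, W) array without overlap; whether the authors used overlapping tiles or edge padding is not stated here, so treat the stride and edge handling as assumptions.

```python
# Cut a (C, H, W) scene into non-overlapping square training patches,
# as varied in Table 10. Edge remainders that do not fill a full patch
# are dropped in this sketch.
import numpy as np

def tile(scene, patch=128):
    c, h, w = scene.shape
    rows, cols = h // patch, w // patch
    return np.array([
        scene[:, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
        for i in range(rows) for j in range(cols)
    ])                                            # (rows*cols, C, patch, patch)

patches = tile(np.zeros((8, 1024, 1024)), patch=128)   # -> (64, 8, 128, 128)
```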
Table 11. The model accuracy under different training sample sizes.

Patch Number | Percentage of Patches | mIoU | F1 | Mean Epoch Time (s)
1950 | 10% | 0.6511 | 0.8309 | 290
3901 | 20% | 0.6981 | 0.8457 | 360
5851 | 30% | 0.7735 | 0.8843 | 435
7801 | 40% | 0.7597 | 0.8742 | 439
9752 | 50% | 0.7699 | 0.8791 | 540
11,702 | 60% | 0.7612 | 0.8794 | 620
13,652 | 70% | 0.7670 | 0.8836 | 630
15,602 | 80% | 0.7733 | 0.8837 | 732
17,553 | 90% | 0.7783 | 0.8867 | 787
Table 12. The model accuracy of U-Net_WFS and U-Net_FS under multi-temporal samples.

Model | mIoU | Precision | Recall | F1 | Total Duration of Model Training (h)
U-Net_FS | 80.88% | 84.96% | 94.65% | 89.55% | 26.5
U-Net_WFS | 80.19% | 84.73% | 93.94% | 89.10% | 20.3
Table 13. Information on publicly released cultivated land products used in this study.

Name | Year | Spatial Resolution (m) | Data Source | OA
ESA_WorldCover | 2020 | 10 | Sentinel-1/2 | 74.4%
Dynamic World | 2020 | 10 | Sentinel-2 | /
GLC_FCS | 2020 | 30 | Landsat TM, ETM+ and OLI | 77.34%
SinoLC-1 | / | 1 | Google | 73.61%