Geographical Zoning-Based Classification of Agricultural Land Use in Hilly and Mountainous Areas Using High-Resolution Remote Sensing Images

Zhang, Junyao; Yang, Xiaomei; Wang, Zhihua; Liu, Xiaoliang; Wu, Haiyan; Cai, Xiaoqiong; Fu, Shifeng

doi:10.3390/rs18081259


by Junyao Zhang ^1,2,†, Xiaomei Yang ^2,3,†, Zhihua Wang ^2,3,†, Xiaoliang Liu ^4,5, Haiyan Wu ^1, Xiaoqiong Cai ^1 and Shifeng Fu ^1,*

^1 Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, China

^2 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

^3 University of Chinese Academy of Sciences, Beijing 100049, China

^4 Jiangsu Province Land and Resources Research Center, Nanjing 210017, China

^5 Key Laboratory of Coastal Zone Exploitation and Protection, Ministry of Natural Resources, Nanjing 210017, China

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Remote Sens. 2026, 18(8), 1259; https://doi.org/10.3390/rs18081259

Submission received: 22 February 2026 / Revised: 14 April 2026 / Accepted: 17 April 2026 / Published: 21 April 2026

(This article belongs to the Special Issue Mapping Essential Elements of Agricultural Land Using Remote Sensing (2nd Edition))


Highlights

What are the main findings?

A spatiotemporal feature-driven geographical zoning method was established by systematically integrating vegetation phenology, topographic characteristics, and human activities.
The proposed zoning-based method significantly improved agricultural land use classification, increasing overall accuracy by 6% and IoU by 12.58%.

What are the implications of the main findings?

Decoupling the complex global classification task into homogeneous local subzones with optimized segmentation scales provides an effective mechanism to resolve severe spectral confusion while preserving high-resolution boundary details.
The findings suggest that incorporating geographical knowledge as prior constraints can complement data-driven approaches for accurate agricultural land use classification in geographically complex areas.

Abstract

Accurately mapping agricultural land use in fragmented hilly and mountainous areas is crucial for resource management but is severely challenged by spatial heterogeneity. While high-resolution (HR) images excel at delineating fine parcel boundaries, their limited spectral and temporal information often leads to spectral confusion among diverse agricultural types. To address this limitation, this study proposes a novel spatiotemporal feature-driven geographical zoning method integrating vegetation phenology, topography, and human activity. This zoning strategy decouples the complex global classification task into relatively simple local problems, providing explicit geoscientific constraints for subsequent classification. The proposed method was validated by classifying plain open-field croplands, sloping croplands, terraces, and greenhouses in the hilly and mountainous areas of Beijing using 2 m resolution satellite images. Compared to traditional global classification methods, the proposed zoning-based method increased the overall accuracy from 84.81% to 90.81%, the Kappa coefficient from 0.74 to 0.85, and the Intersection over Union (IoU) from 77.85% to 90.85%. The advantages of geographic zoning were particularly evident in mitigating spatial heterogeneity and enhancing boundary precision. These findings indicate that integrating dynamic geographical zoning as a priori knowledge successfully bridges the gap between HR spatial details and environmental contexts, offering a robust solution for mapping fragmented agricultural landscapes.

Keywords:

geographical zoning; agricultural land use; high resolution; segmentation boundary; spectral heterogeneity; Google Earth Engine; Gaofen-6; Gaofen-2; Sentinel-2

1. Introduction

Agricultural land in hilly and mountainous areas constitutes a crucial component of China’s agricultural resources. Data from the Second National Land Survey indicate that cultivated land with a slope greater than 6° accounts for 27% of the total national agricultural area [1]. However, extensive studies have shown that the abandonment of agricultural land is particularly severe in these complex terrains [2]. Statistics indicate that from 2014 to 2015, the average abandonment rate in mountainous counties (where mountainous areas exceed 80% of the county area) reached 14.32%, with an upward trend [3,4]. Given the dual pressures of severe land abandonment and increasing food demand driven by population growth and dietary shifts, the land area required for global food production is approaching a critical threshold [5]. Under such land scarcity, maximizing the utilization efficiency of existing farmlands has become imperative. However, achieving this goal requires a comprehensive understanding of the current spatial distribution and specific categories of agricultural land use, which remains insufficiently characterized in hilly and mountainous regions due to the complexity of the terrain and the diversity of cultivation practices. Accurately mapping this information therefore serves as the fundamental prerequisite for optimizing resource allocation and ensuring food security.

Agricultural land use patterns in hilly and mountainous areas are shaped by a combination of undulating topography, soil variations, and human cultivation practices, resulting in spatial heterogeneity and parcel fragmentation. This topographic complexity significantly increases the difficulty of agricultural land-use classification. Traditional medium-to-low spatial resolution imagery often struggles in these environments, primarily due to the severe mixed-pixel effect that blurs parcel boundaries [6,7]. In contrast, high-resolution (HR) remote sensing images, with their superior spatial details, can clearly delineate the geometric boundaries and internal textures of small parcels, providing an opportunity for the precise extraction of fragmented agricultural land use in hilly and mountainous areas [8,9,10].

Despite these distinct advantages in spatial representation, HR images present clear limitations in agricultural classification, primarily due to the lack of temporal information and restricted spectral bands. Constrained by narrow swath widths and long revisit cycles, HR sensors often fail to capture the continuous phenological dynamics. In hilly regions characterized by distinct seasonal variations and complex terrain, this absence of time-series data exacerbates spectral confusion [11]. A typical example is the misclassification between snow-covered terraces and plastic greenhouses; without phenological background information, their similar high reflectance and smooth textures make them easily confused. Furthermore, HR sensors typically offer fewer spectral bands than medium-resolution data, making it difficult to capture the subtle spectral differences required to distinguish specific agricultural land use types. To overcome these limitations, existing studies have attempted to combine HR images with medium-resolution time-series data using spatiotemporal fusion algorithms or by constructing feature indices [12,13,14]. However, these data-driven fusion strategies focus on the statistical reconstruction of spectral values while neglecting the intrinsic logical connection between land use distribution and geographical environment. Consequently, relying solely on increasing feature dimensions to improve accuracy often encounters bottlenecks [15]. Moreover, the requisite resampling operations (upscaling or downscaling) to match disparate spatial resolutions not only incur high computational costs but also introduce cumulative errors, thereby limiting their effectiveness in complex agricultural landscapes.

The inherent spatial correlation between agricultural land-use and geographical environments provides a more robust a priori knowledge base for optimizing classification. For instance, the spatial distribution of different agricultural land use types is closely related to topographic factors such as slope and elevation [16]. Geographical zoning, as a method to quantify this spatial heterogeneity, integrates the spatial patterns of multidimensional geographical elements to divide a study area into sub-regions that are internally homogeneous but externally heterogeneous. This strategy effectively decouples a complex global classification task into several relatively simple local problems, providing prior geoscientific constraints for classifiers [17]. Previous studies in ecological spatial analysis and local classifier training have demonstrated the effectiveness of geographical zoning in reducing data complexity and enhancing classification performance [18,19,20,21,22,23]. However, most existing geographical zoning studies rely on static, coarse-resolution datasets, such as climatic zoning or large-scale topographic zoning. Their spatial resolution and temporal dynamics are often insufficient to match the classification requirements of HR images. Moreover, systematic research on the mechanisms by which geographical zoning affects classification accuracy remains relatively scarce [24,25].

To address the aforementioned challenges, this study proposes an innovative spatiotemporal feature-driven geographical zoning framework. This framework overcomes the limitations of traditional static zoning by integrating spatiotemporal features including vegetation phenology, topography, and human activity. It transforms the implicit geospatial association rules between agricultural land use and the geographical environment into explicit spatial zoning constraints. By introducing this dynamic zoning strategy, this study aims to mitigate the misclassifications caused by spectral confusion and the lack of temporal information in HR images, while preserving their advantages in delineating fragmented parcel boundaries. Furthermore, this study will systematically evaluate the application efficacy of this geographical zoning and investigate the mechanisms driving its accuracy improvements. Specifically, this study will answer the following two key research questions: (1) How can a geographical zoning model tailored for agricultural land use classification in hilly and mountainous areas be constructed? (2) Through what mechanisms does geographical zoning achieve the improvement of classification accuracy?

2. Materials and Methods

2.1. Study Area and Data

The study area is located in the northwest of Beijing, China (115.7–116.6°E, 40.1–40.6°N), at the intersection of Shunyi, Yanqing, Changping, and Huairou districts. It is also near the boundary between the second and third topographic tiers of China. The study area encompasses a total area of 2780 km², with an elevation ranging up to 2200 m, as shown in Figure 1. The general topography of Beijing features higher elevations in the northwest and lower elevations in the southeast, with mountainous regions accounting for 61% of the total land area. The climate is classified as warm temperate semi-humid and semi-arid monsoon, characterized by hot, rainy summers, cold and dry winters, and relatively short spring and autumn seasons. According to the Major Data Bulletin of the Third National Land Survey of Beijing [26], the total cropland in Beijing in 2019 amounted to 93,547.90 ha. Daxing, Shunyi, Yanqing, and Fangshan were the four districts with the largest areas of cropland, together accounting for 61.71% of the total cropland. Previous studies have shown that from 1996 to 2004, the total area of cropland in the mountainous regions of Beijing decreased by 65,994.39 ha [27]. Research on the cropland in Yanqing indicates that the cropland patches are highly fragmented and irregularly shaped [28].

The image data used in this study include: (1) Sentinel-2 (S2) imagery with a 10-m resolution and less than 10% cloud cover, from 1 January to 31 December 2020, which was used for the median monthly NDVI (Normalized Difference Vegetation Index) time-series composite [29]. The image screening and NDVI time-series composition were performed using the Google Earth Engine (GEE) platform (https://earthengine.google.com/); (2) Gaofen-6 (GF-6) imagery with a 2 m resolution from December 2020, which was used as input for the classification process to extract different agricultural land use types; (3) Gaofen-2 (GF-2) imagery with a 0.8 m resolution from January to June 2020, which was used to assist in sample selection and validation. All remote sensing images underwent geometric, radiometric, and atmospheric corrections, as detailed in Table 1.

Other data include: (1) ALOS PALSAR DEM (Digital Elevation Model) data with a 12.5 m resolution, used to extract elevation, slope, and relief features as input for geographical zoning; and (2) ESA (European Space Agency) World Cover data for 2020 with a 10 m resolution [31], used to extract thematic information on built-up areas within the study area as input for the zoning process. The DEM was resampled to 10 m using bicubic interpolation [30]. This method was chosen because it produces smoother and more continuous surfaces than nearest-neighbor and bilinear approaches, thereby better preserving the accuracy of the derived topographic features (slope and relief) that serve as critical inputs for the geographical zoning process. In selecting the land use data, we referred to recent studies evaluating the performance of 10 m land use products for various thematic extractions [32,33]. Based on these studies, we chose the ESA World Cover data for its overall performance, particularly in the delineation of built-up areas.

2.2. Classification System and Interpretation Keys

To ensure clear definitions and standardized recognition of agricultural land use types, we established a classification system and corresponding interpretation keys based on field surveys and remote sensing imagery.

The classification system in this study is based on the national standards from China’s Third National Land Survey, which provides an authoritative framework for land use categorization. It adopts a hierarchical structure: the primary level distinguishes between cropland and non-cropland (including forests, grasslands, buildings, and water bodies), while the cropland category is further subdivided into four secondary types: plain open-field cropland, greenhouses, slope cropland, and terraces. This differs from conventional land use classification systems, which primarily focus on large-scale land cover categories, such as cropland vs. non-cropland, or on crop type identification. In contrast, our classification emphasizes the morphological characteristics and management practices influenced by topography, a crucial factor for agricultural land use in hilly and mountainous regions, as further discussed in Section 4.1.

Based on the classification system, the interpretation keys describe the specific visual features of each cropland type, as shown in Figure 2. Plain open-field cropland is mainly distributed in flat areas, characterized by relatively concentrated, regular block-shaped patches with clear and uniform texture, and nearby human activity areas such as buildings. Greenhouses are generally located in areas with a spatial distribution similar to that of plain open-field cropland. They exhibit regularly arranged strip-like patterns that can be classified into two types based on the presence of plastic coverings: covered greenhouses appear as densely packed light-colored strips, while uncovered greenhouses, often left fallow without crops, are represented by densely arranged brown strips. Slope cropland is mainly located at the boundary between plains and mountainous areas, characterized by irregular boundaries and non-uniform planting textures. Terraces are mainly distributed in areas with significant elevation variations, and are visually characterized by step-like structures arranged in successive horizontal tiers along the slope.

The field survey was conducted in 2022 along the route shown in Figure 3. We collected 26 representative sample points, including 9 for plain open-field cropland, 3 for greenhouses, 7 for slope cropland, 3 for terraces, and 4 for non-cropland. For each point, we recorded coordinates, elevation, land use type, and field photographs. This information was compiled into documents (as shown in Appendix A) to support the subsequent classification process. The primary purpose of the field survey was to establish the interpretation keys by documenting the morphological and structural characteristics of each agricultural land use type. Since these characteristics reflect inherent landscape features that remain stable over short time periods, the two-year gap between the field survey and the 2020 classification target does not compromise the validity of the interpretation keys. Locations where land use changes were observed between 2020 and 2022 on GF-2 imagery were excluded from the key compilation.

2.3. Sample Collection

The sample collection followed a stratified strategy to ensure both spatial uniformity and statistical representativeness [34,35]. First, to guarantee uniform spatial coverage, a hexagonal grid system consisting of 523 equal-sized cells (each approximately 5.85 km²) was established to cover the entire study area. The hexagonal grid was selected over square grids or random sampling due to its superior spatial isotropy (equidistance to all neighboring cells) and lowest perimeter-to-area ratio among tessellating polygons, which effectively minimizes edge effects and reduces spatial autocorrelation during the sampling process. A minimum of two sample points were allocated to each grid cell. The specific number of samples for each category was determined based on the proportional area of land cover types, guided by spatial distribution patterns identified during the preliminary field surveys. Subsequently, the ground truth labels for these samples were identified through visual interpretation of GF-2 HR images, strictly applying the interpretation keys and feature recognition rules established in Section 2.2. Notably, the 26 field survey points (Section 2.2) were used exclusively for establishing the interpretation keys and were not included in the training or validation datasets. Finally, the collected samples were separated into two non-overlapping subsets through random partitioning: a training set containing 650 samples used for model construction, and a validation set containing 1600 samples reserved for accuracy assessment. The spatial distribution and detailed quantity of these samples are illustrated in Figure 4.
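The geometric argument for the hexagonal grid can be checked with a short calculation. The sketch below is illustrative only and is not the authors' code: it assumes regular hexagons, takes the 5.85 km² cell area stated above, derives the implied side length, and compares the perimeter of a hexagon with those of a square and an equilateral triangle of equal area. All function and variable names are hypothetical.

```python
import math

CELL_AREA_KM2 = 5.85  # cell area reported in the text


def hexagon_side(area):
    """Side length of a regular hexagon with the given area (A = 3*sqrt(3)/2 * s^2)."""
    return math.sqrt(2.0 * area / (3.0 * math.sqrt(3.0)))


def perimeters_for_equal_area(area):
    """Perimeters of the three regular polygons that tile the plane, at equal area."""
    hex_p = 6.0 * hexagon_side(area)
    square_p = 4.0 * math.sqrt(area)
    tri_side = math.sqrt(4.0 * area / math.sqrt(3.0))  # A = sqrt(3)/4 * a^2
    tri_p = 3.0 * tri_side
    return {"hexagon": hex_p, "square": square_p, "triangle": tri_p}


side_km = hexagon_side(CELL_AREA_KM2)
perims = perimeters_for_equal_area(CELL_AREA_KM2)
print(f"hexagon side: {side_km:.2f} km")
for name, p in perims.items():
    print(f"{name}: perimeter {p:.2f} km, perimeter/area {p / CELL_AREA_KM2:.2f} per km")
```

Under these assumptions the cell side is about 1.5 km, and the hexagon has the shortest perimeter of the three shapes, which is consistent with the rationale given for the grid choice.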

2.4. Building a Spatiotemporal Feature-Driven Hierarchical Geographical Zoning Method for Classification

The geographical zoning process integrates time-series NDVI, land use datasets, and DEM through clustering algorithms, morphological processing, and spatial overlay analysis, establishing a two-tier structure. The first level delineates the spatial extent of cropland agglomerations, while the second level further segments these areas into distinct zones dominated by specific agricultural land uses. The complete workflow is illustrated in Figure 5.

2.4.1. First-Level Zoning: Identifying Cropland Agglomerations

The distinction between cropland and non-cropland can be effectively captured through time-series analysis of the NDVI [36,37]. Unlike natural vegetation, croplands are characterized by managed planting and harvesting cycles, resulting in pronounced periodic fluctuations in their NDVI profiles. Specifically, croplands exhibit sharp increases during the growing season followed by rapid declines after harvest, whereas non-croplands typically show more gradual or irregular phenological changes.

To leverage these temporal features, we constructed a 12-month NDVI time series using 10-m resolution Sentinel-2 imagery for the year 2020 via the GEE platform. The formula for calculating NDVI is:

\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \quad (1)

where NIR and RED represent the near-infrared and red bands of the image. To mitigate noise from clouds and atmospheric conditions, monthly median composites were first generated, followed by smoothing using the Savitzky–Golay (SG) filter [38]. Subsequently, to differentiate cropland patterns, a time-series clustering approach based on Dynamic Time Warping (DTW) was applied [39]. DTW is robust for phenological analysis as it accounts for temporal shifts and dynamic growth variations better than standard Euclidean distance. Prior to clustering, water bodies and built-up areas were masked using ESA World Cover data to eliminate non-vegetative interference. The number of clusters was set to two. As shown in Figure 6a, Class A exhibits the high seasonal variability typical of cultivation and was identified as the phenology-based candidate zone.
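The per-pixel processing chain described above (NDVI, smoothing, DTW comparison) can be illustrated with a minimal pure-Python sketch. It is not the authors' GEE workflow. The Savitzky-Golay window (5 points) and polynomial order (2) are assumptions, since the text does not report them, and the edge handling, the absolute-difference local cost, and the sample NDVI values are likewise made up for illustration.

```python
def ndvi(nir, red):
    """Equation (1); returns 0.0 where NIR + RED is zero (an assumed convention)."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0


# 5-point quadratic Savitzky-Golay kernel (window and order are assumptions).
SG_KERNEL = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def sg_smooth(series):
    """Smooth interior points; the two values at each end are left unchanged."""
    out = list(series)
    for i in range(2, len(series) - 2):
        out[i] = sum(c * series[i - 2 + k] for k, c in enumerate(SG_KERNEL))
    return out


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Made-up monthly NDVI profiles: a crop-like curve and the same curve one month later.
crop = [0.2, 0.2, 0.25, 0.4, 0.6, 0.8, 0.75, 0.5, 0.3, 0.25, 0.2, 0.2]
late_crop = [crop[0]] + crop[:-1]
pointwise = sum(abs(x - y) for x, y in zip(crop, late_crop))
warped = dtw_distance(crop, late_crop)
print(f"pointwise distance: {pointwise:.2f}, DTW distance: {warped:.2f}")
```

The two profiles differ only by a one-month shift, so DTW aligns them at a much lower cost than a month-by-month comparison. This is the property that makes DTW suitable for grouping pixels whose phenology is similar but offset in time.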

In addition to spectral features, cropland distribution is strongly correlated spatially with human settlements and economic activities [40]. To quantify this relationship, we calculated the Euclidean distance between the cropland samples (collected in Section 2.3) and built-up areas identified in the ESA World Cover dataset. A cumulative frequency analysis revealed that 95% of the cropland samples were located within a distance of approximately 1180 m from built-up areas (Figure 6b). Based on this threshold, a buffer zone was established around all built-up regions to define the distance-based constraint zone.

Finally, the first-level zoning was established by spatially intersecting the two aforementioned results: the phenology-based candidate zone and the distance-based constraint zone. This intersection represents the cropland agglomeration, delineating areas that possess both the phenological characteristics of crops and the spatial logic of human activity. Areas outside this intersection were classified as non-cropland zones and excluded from further subdivision.
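The first-level rule can be sketched as two small steps: derive the distance threshold from the cumulative frequency of sample distances, then intersect the phenology mask with the distance constraint. The code below is a simplified illustration on nested lists, not the authors' GIS workflow. The function names, the nearest-rank percentile rule, and the toy values are assumptions.

```python
def percentile_threshold(distances, percent=95):
    """Smallest observed distance that covers at least `percent` % of the samples."""
    ordered = sorted(distances)
    k = (percent * len(ordered) + 99) // 100  # integer ceiling of percent * n / 100
    return ordered[max(k, 1) - 1]


def first_level_zone(phenology_mask, distance_to_builtup, threshold):
    """Per-pixel intersection: crop-like phenology AND within the buffer distance."""
    return [
        [bool(p) and d <= threshold for p, d in zip(p_row, d_row)]
        for p_row, d_row in zip(phenology_mask, distance_to_builtup)
    ]


# Toy data: 20 hypothetical sample distances (m) and a 2 x 2 raster.
sample_distances = list(range(100, 2100, 100))
threshold = percentile_threshold(sample_distances)
phenology = [[1, 1], [0, 1]]
distance = [[500, 2500], [300, 1900]]
zone = first_level_zone(phenology, distance, threshold)
print(threshold, zone)
```

A pixel is kept only if it passes both tests, so a crop-like pixel far from any built-up area is excluded, as is a nearby pixel without crop-like phenology.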

2.4.2. Second-Level Zoning: Delineating Specific Agricultural Land Use Zones

Topography significantly influences planting patterns and management practices, thereby fundamentally shaping agricultural land use types [41]. Elevation, in particular, affects climatic conditions, which in turn impact crop growth. In low-elevation, flat areas, accessibility and convenient transportation facilitate efficient planning and the establishment of irrigation networks, resulting in regularly shaped cropland. Conversely, as elevation and slope increase, agricultural layout is constrained. In steeply sloped regions, soil erosion becomes a critical issue, prompting the adoption of terracing to minimize erosion and improve cultivation efficiency. In areas with greater topographic relief, farming strategies must be adjusted based on terrain characteristics to make effective use of water and soil resources.

Given the significant differences in topographic characteristics associated with various agricultural land use types, this study further delineated the concentrated distribution zones for plain open-field cropland, greenhouses, terraces, and slope cropland. Elevation, slope, and relief were derived from the DEM for the study area and used to analyze the distribution frequency of agricultural land use types. The results of this analysis, presented in Figure 7 as an intermediate product of the zoning workflow, provided the quantitative basis for establishing the zoning rules. To establish specific zoning rules, the 95% cumulative frequency interval was calculated independently for each topographic factor (elevation, slope, and relief) based on their sample distributions. The spatial intersection of these independent topographic thresholds was then used to delineate the concentrated zones for each agricultural land use type. Based on the numerical outputs derived from the 95% cumulative frequency, the concentrated zone for plain open-field cropland was defined as “(DEM < 650 m) and (Relief < 7 m) and (Slope < 9°)”, for slope cropland as “(DEM < 800 m) and (Relief < 17 m) and (5° < Slope < 23°)”, for terraces as “(600 m ≤ DEM < 1000 m) and (5 m < Relief < 21 m) and (6° < Slope < 21°)”, and for greenhouses as “(DEM < 650 m) and (Relief < 7 m) and (Slope < 9°)”. The results showed that the concentrated zones for plain open-field cropland and greenhouses overlapped entirely, due to their similar topographic conditions, and the two categories were therefore merged into a single zone. Their differentiation was instead achieved in the subsequent classification stage through spectral, textural, and geometric features derived from the high-resolution imagery.
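The zoning rules quoted above translate directly into boolean predicates. The following sketch applies them to a single location; the thresholds are those stated in the text, while the function name and zone labels are hypothetical. Plain open-field cropland and greenhouses share one rule, so they are represented by a single merged label.

```python
def candidate_zones(elev_m, relief_m, slope_deg):
    """Return every second-level zone whose topographic rule the location satisfies."""
    zones = set()
    if elev_m < 650 and relief_m < 7 and slope_deg < 9:
        zones.add("plain_cropland_greenhouse")
    if elev_m < 800 and relief_m < 17 and 5 < slope_deg < 23:
        zones.add("slope_cropland")
    if 600 <= elev_m < 1000 and 5 < relief_m < 21 and 6 < slope_deg < 21:
        zones.add("terrace")
    return zones


print(candidate_zones(300, 3, 2))     # flat lowland
print(candidate_zones(700, 10, 12))   # mid-elevation slope
print(candidate_zones(1500, 30, 35))  # steep high mountain
```

Because the rules are derived independently per land use type, a location can satisfy more than one of them (the second example matches both the slope cropland and terrace rules), which is why an overlap-resolution step is needed afterwards.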

The regions identified through these rules were refined using morphological opening and closing operations to eliminate fragmented patches and fill small gaps. After iterative testing, the radii for the opening and closing operations were set to 7 and 5 pixels, respectively, informed by the methodology adopted in our previous work [42]. Subsequently, the zones for different types were spatially integrated using a union operation. To resolve spatial overlaps where a single patch satisfied multiple topographic rules, attributes were assigned based on the majority voting principle: each patch was attributed to the land use type with the highest number of training samples within it. Patches with identical attributes were then merged to ensure consistency. The final outcome of this zoning procedure was the delineation of distinct second-level zones, including the plain open-field cropland and greenhouses zone, the slope cropland zone, and the terrace zone, as illustrated in Figure 8, which provided the spatial framework for the subsequent stratified segmentation and classification.
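The majority voting step for overlapping patches can be sketched as follows. This is an illustrative reading of the rule described above, not the authors' implementation. The function name and labels are hypothetical, and the handling of patches without samples and of ties is not specified in the text, so the choices below are assumptions.

```python
from collections import Counter


def assign_patch_label(sample_labels, candidate_labels):
    """Pick the candidate zone type with the most training samples inside the patch."""
    counts = Counter(label for label in sample_labels if label in candidate_labels)
    if not counts:
        return None  # assumption: patches without samples are left unassigned
    # Ties fall to the label encountered first (an assumption, not from the text).
    return counts.most_common(1)[0][0]


overlap_candidates = {"slope_cropland", "terrace"}
samples_in_patch = ["terrace", "slope_cropland", "terrace"]
print(assign_patch_label(samples_in_patch, overlap_candidates))
```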

2.5. Multiresolution Segmentation of Remote Sensing Image

High-quality image segmentation is a prerequisite for object-oriented classification. In this study, the Multiresolution Segmentation (MRS) algorithm within the eCognition software V10.1 was employed. To avoid the subjectivity of trial-and-error in selecting the segmentation scale, we adopted the Estimation of Scale Parameter (ESP) method. This method determines the optimal scale by calculating the Rate of Change of Local Variance (ROC-LV), which indicates the degree of homogeneity within image objects [43]. The scale corresponding to the peak ROC-LV value represents the point where the internal homogeneity of objects is maximized relative to their heterogeneity. The ROC-LV is calculated as follows:

\mathrm{ROC} = \left[ \frac{L - L_{n-1}}{L_{n-1}} \right] \times 100 \quad (2)

where L represents the local variance at the current scale, and L_{n-1} is the local variance value at the preceding lower scale.
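As a worked illustration of Equation (2), the sketch below computes ROC-LV from a sequence of local variance values and reports the scales at which it peaks. The local variance values and candidate scales are made up for the example, and selecting strict local maxima is one possible reading of "peak"; neither comes from the study or from the ESP tool itself.

```python
def roc_lv(local_variances):
    """ROC-LV (percent) for every scale after the first, following Equation (2)."""
    return [
        (cur - prev) / prev * 100.0
        for prev, cur in zip(local_variances, local_variances[1:])
    ]


def peak_scales(scales, local_variances):
    """Scales whose ROC-LV is strictly higher than at both neighbouring scales."""
    roc = roc_lv(local_variances)
    roc_scales = scales[1:]  # the first scale has no predecessor
    return [
        roc_scales[i]
        for i in range(1, len(roc) - 1)
        if roc[i] > roc[i - 1] and roc[i] > roc[i + 1]
    ]


# Hypothetical values, for illustration only.
scales = [100, 110, 120, 130, 140, 150]
lv = [10.0, 10.5, 10.8, 11.9, 12.1, 12.2]
print([round(r, 2) for r in roc_lv(lv)])
print(peak_scales(scales, lv))
```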

To investigate the impact of zoning on segmentation quality, segmentation was conducted in two stages:

Stage 1: Global segmentation. The entire study area was treated as a single entity. Based on the ROC-LV calculation, the optimal global scale parameter was determined to be 170. The shape and compactness parameters were set to 0.2 and 0.6, respectively, based on prior segmentation experience [44] and verified through testing across all subzones. Previous studies have indicated that segmentation scale is the most sensitive parameter in MRS, with shape values in the range of 0.1–0.3 generally yielding the most stable segmentation accuracy [45], while compactness has a relatively minor influence and can be set as a constant within a reasonable range [46,47].

Stage 2: Stratified segmentation. Based on the second-level zoning results (Section 2.4.2), optimal scales were calculated independently for each subzone. For the non-cropland zone, the scale was optimized to 200. For the plain open-field cropland and greenhouse zone, the optimal scale was 125. For the slope cropland zone, it was set to 135, and for the terrace zone, it was set to 205 (Figure 9).

2.6. Classification Strategies and Accuracy Assessment

To evaluate the impact of the proposed geographic zoning on the classification accuracy, we designed three comparative schemes:

Classification without zoning using unified scale (CUUS): The entire study area was treated as a single uniform region. A global segmentation scale (170) was applied, and a single classification model was trained for the whole area.
Classification with zoning using unified scale (CZUS): The study area was divided into subzones based on the zoning results. The segmentation was still performed using the global parameter (170). The key difference was that separate classification models were trained for each subzone.
Classification with zoning using different scales (CZDS): The study area was divided into subzones based on the zoning results, and each subzone was processed using its specific optimal segmentation scale (as determined in Stage 2 of Section 2.5) and a specifically trained classification model.
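The difference between the three schemes comes down to two switches: whether models are trained per subzone and whether the segmentation scale is chosen per subzone. The sketch below encodes this as a small lookup using the scale values reported in Section 2.5. The dictionary layout, zone labels, and model names are hypothetical and only serve to make the comparison explicit.

```python
GLOBAL_SCALE = 170  # global optimum from Stage 1

ZONE_SCALES = {  # per-zone optima from Stage 2
    "non_cropland": 200,
    "plain_cropland_greenhouse": 125,
    "slope_cropland": 135,
    "terrace": 205,
}

SCHEMES = {
    "CUUS": {"zoned_models": False, "zoned_scales": False},
    "CZUS": {"zoned_models": True, "zoned_scales": False},
    "CZDS": {"zoned_models": True, "zoned_scales": True},
}


def plan(scheme, zone):
    """Return (segmentation scale, model identifier) for a zone under a scheme."""
    cfg = SCHEMES[scheme]
    scale = ZONE_SCALES[zone] if cfg["zoned_scales"] else GLOBAL_SCALE
    model = f"rf_{zone}" if cfg["zoned_models"] else "rf_global"
    return scale, model


for name in SCHEMES:
    print(name, plan(name, "terrace"))
```

Comparing CUUS with CZUS isolates the effect of zone-specific models, while comparing CZUS with CZDS isolates the effect of zone-specific segmentation scales.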

For all schemes, the Random Forest (RF) algorithm was selected as the classifier due to its proven stability and performance in agricultural extraction tasks [48]. Previous studies have demonstrated that RF is far less tunable compared to other machine learning algorithms such as support vector machines and generally provides reliable results with default parameter settings [49]. Probst et al. [50] demonstrated through a benchmark study on 38 datasets that the average improvement in AUC (Area Under the ROC Curve) from hyperparameter tuning was only approximately 0.01 compared to default values. In the RF algorithm, classification is performed by generating multiple decision trees and aggregating their outputs. The n_estimators parameter determines the number of decision trees in the forest. Generally, increasing n_estimators improves model stability and prediction accuracy; however, this improvement exhibits diminishing returns as more trees are added [51,52]. Meanwhile, larger n_estimators values lead to increased training time and memory consumption, necessitating a balance between computational cost and model performance. To determine the optimal n_estimators, we evaluated the out-of-bag (OOB) error [53] across a range of values from 100 to 1000 with an increment of 100. The OOB error stabilized when n_estimators reached approximately 500, beyond which no significant improvement was observed. Therefore, n_estimators was set to 500, with other parameters kept at default values. The feature space constructed for classification included: (1) Spectral features (Red, Green, Blue, Near-Infrared, and Brightness); (2) Vegetation indices (NDVI); (3) Textural features (GLCM Contrast and Entropy); (4) Geometric features (Shape Index); and (5) Topographic factors (Elevation, Slope, and Relief).
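The n_estimators selection described above amounts to finding the point beyond which the OOB error no longer improves meaningfully. The sketch below shows one way to formalize such a rule. The tolerance value, the function name, and the error values are all made up to exercise the function; they are not the study's measurements, and the text does not state the stopping criterion that was actually used.

```python
def first_stable_setting(settings, oob_errors, tolerance=0.002):
    """First setting after which no later setting lowers the OOB error by more than tolerance."""
    if not settings or len(settings) != len(oob_errors):
        raise ValueError("settings and oob_errors must be non-empty and of equal length")
    for i, (setting, err) in enumerate(zip(settings, oob_errors)):
        if all(err - later <= tolerance for later in oob_errors[i + 1:]):
            return setting
    return settings[-1]  # defensive fallback; the last setting always passes the test above


# Hypothetical OOB errors for n_estimators = 100, 200, ..., 1000 (illustration only).
n_trees = list(range(100, 1100, 100))
errors = [0.120, 0.112, 0.106, 0.102, 0.098, 0.0975, 0.098, 0.0972, 0.0978, 0.0974]
print(first_stable_setting(n_trees, errors))
```

In practice the error values could be obtained from a forest fitted with out-of-bag scoring enabled, for example via the `oob_score_` attribute of scikit-learn's `RandomForestClassifier` when it is constructed with `oob_score=True`.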

Accuracy validation was conducted from two complementary perspectives: classification accuracy and boundary precision. For classification accuracy, confusion matrices were constructed based on the validation samples (Section 2.3) to systematically identify misclassification between categories. From these matrices, Overall Accuracy (OA) and the Kappa coefficient were derived to quantify overall performance, while Producer’s Accuracy (PA) and User’s Accuracy (UA) were derived to quantify category-specific omission and commission errors, respectively [54]. For boundary precision, five typical regions were selected across the study area, covering all five classes: non-cropland, plain open-field cropland, slope cropland, terraces, and greenhouses. These regions were selected to represent the characteristic landscape patterns and boundary complexity of each type. Within each region, the ground truth boundaries were manually digitized from GF-2 imagery (0.8 m resolution) based on the classification system and interpretation keys established in Section 2.2. The spatial correspondence between the classification results and these reference boundaries was quantified using the IoU metric [55]. The IoU calculation formula is as follows:

\mathrm{IoU} = \frac{A \cap B}{A \cup B} \quad (3)

where A is the classification result and B is the ground truth.
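As a minimal sketch of Equation (3), the IoU can be computed on rasterized binary masks by counting pixels. The rasterization of the digitized boundaries, the function name iou, and the small example masks are assumptions made for illustration and are not taken from the study.

```python
def iou(pred_mask, ref_mask):
    """IoU between two equally sized binary masks given as lists of rows of 0/1."""
    inter = union = 0
    for pred_row, ref_row in zip(pred_mask, ref_mask):
        for p, r in zip(pred_row, ref_row):
            inter += 1 if (p and r) else 0   # pixel in both A and B
            union += 1 if (p or r) else 0    # pixel in A or B
    # IoU is undefined when both masks are empty.
    return inter / union if union else float("nan")

# Hypothetical 3 x 4 masks: classification result (A) and ground truth (B).
pred = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
ref = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(iou(pred, ref))  # 5 shared pixels / 6 pixels in the union
```

For polygon data, the same ratio would be taken over the areas of the geometric intersection and union rather than over pixel counts.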

3. Results

3.1. Accuracy Assessment Based on Confusion Matrix

The confusion matrix results for the three classification schemes are shown in Figure 10a–c. In terms of PA, the CZDS scheme consistently achieved the highest performance across all land cover types. For terraces, slope cropland, plain open-field cropland, and non-cropland, accuracy improved progressively from CUUS to CZUS, and finally to CZDS. On average, CZDS increased PA by 3.28% compared to CZUS and 4.85% compared to CUUS. However, a notable exception was observed for greenhouses, where the CZUS scheme recorded the lowest accuracy, indicating severe omission errors. This decline is likely due to the internal spectral heterogeneity of greenhouses (e.g., plastic-covered vs. uncovered) combined with a reduced sample size in specific subzones after zoning. While the distinct linear structure of greenhouses helps reduce commission errors, insufficient training data within a single subzone can hamper the model’s ability to learn diverse spectral features, leading to omissions.

In terms of UA, CZDS similarly outperformed the other schemes. For greenhouses, terraces, and slope croplands, accuracy increased stepwise from CUUS to CZDS. On average, CZDS improved UA by 3.39% compared to CZUS and 11.41% compared to CUUS. Notably, the zoning-based schemes (CZUS and CZDS) demonstrated significant reductions in commission errors for complex topographies like terraces and slope croplands. Conversely, for plain open-field cropland and non-cropland, CZUS exhibited the lowest UA, indicating a higher rate of commission errors. This phenomenon can be partially attributed to the high internal heterogeneity of these categories, which encompass multiple distinct subclasses. In HR images, the detailed spectral and textural variations within these subclasses are far more pronounced than in medium-resolution data, thereby increasing the complexity of classification and the likelihood of misclassification [56].

In terms of OA and Kappa coefficient, a consistent upward trend was observed across the three schemes. The OA improved from 84.81% (CUUS) to 88.75% (CZUS) and 90.81% (CZDS), and the Kappa coefficient increased from 0.74 (CUUS) to 0.81 (CZUS) and 0.85 (CZDS).
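For reference, the four indices used in this section follow directly from a confusion matrix. The sketch below uses a hypothetical three-class matrix, not the matrices in Figure 10, and assumes that rows hold the reference classes and columns hold the mapped classes. The function name accuracy_metrics is illustrative.

```python
def accuracy_metrics(cm):
    """OA, Kappa, PA and UA from a square confusion matrix.

    cm[i][j] = number of validation samples of reference class i mapped to class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                       # reference totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # mapped totals
    diag = [cm[i][i] for i in range(k)]

    oa = sum(diag) / total                                         # overall accuracy
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    pa = [d / r for d, r in zip(diag, row_tot)]                    # 1 - omission error
    ua = [d / c for d, c in zip(diag, col_tot)]                    # 1 - commission error
    return oa, kappa, pa, ua

# Hypothetical counts for three classes.
cm = [
    [45, 3, 2],
    [6, 40, 4],
    [1, 5, 44],
]
oa, kappa, pa, ua = accuracy_metrics(cm)
print(round(oa, 4), round(kappa, 4))
print([round(v, 4) for v in pa])
print([round(v, 4) for v in ua])
```

A class with no reference samples or no mapped samples would produce a division by zero in PA or UA, so such classes need separate handling in practice.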

3.2. Accuracy Assessment Based on IoU

The IoU assessment (Figure 10d) highlights the geometric quality of the classification results. The average IoU increased from 77.85% (CUUS) to 83.05% (CZUS) and 90.85% (CZDS), an overall improvement of 13.0 percentage points over the non-zoning scheme, demonstrating that adaptive segmentation scales better capture the morphology of different agricultural land use types.

A detailed visual comparison against manual interpretation (ground truth) is shown in Figure 11. The non-zoning scheme (CUUS) frequently exhibited misclassifications, such as confusing slope cropland or plain open-field cropland with non-cropland (Figure 11a,b), and significantly overestimating terrace areas (Figure 11d). While CZUS improved upon this, it still struggled with distinguishing non-cropland from slope cropland in certain areas (Figure 11c) and showed higher omission rates for greenhouses (Figure 11e). In contrast, CZDS demonstrated superior performance, minimizing both omission and commission errors while preserving the detailed boundaries of fragmented patches.

3.3. Classification Result

The comparative analysis confirms that zoning-based strategies significantly enhance classification accuracy compared to non-zoning approaches. This improvement is particularly evident in complex terrains, such as slope croplands and terraces, where zoning effectively mitigates both commission and omission errors. Moreover, optimizing segmentation scales specific to each subzone further refines the geometric precision of object boundaries. Among the three evaluated schemes, the CZDS scheme demonstrates superior performance by achieving an effective balance between thematic classification accuracy and geometric boundary precision. Consequently, the final agricultural land use map for the study area was generated based on the CZDS scheme, as illustrated in Figure 12.

4. Discussion

4.1. The Necessity of Agricultural Land Use Classification in Hilly and Mountainous Areas

Cropland, as a fundamental resource for global food security, has been a primary focus of remote sensing monitoring. Over the years, considerable progress has been made in cropland extraction, resulting in the availability of various datasets. Notable examples include the 30 m resolution global map of cropland extent and change [57], the 30 m annual cropland dataset of China from 1986 to 2021 [58], the 10 m resolution dataset depicting annual changes in maize cropland in China from 2017 to 2021 [59], and the 10 m resolution national scale map of cropland use intensity in China during 2018–2023 [60]. However, these existing classification systems predominantly focus on crop types (e.g., soybean, maize) [61] or broad cropping patterns [62]. Research on classifying specific agricultural land use types characterized by distinct landscape morphologies remains limited. This gap is particularly significant in hilly and mountainous regions, where complex topography necessitates a distinction between these land use types to address their critical ecological and economic implications.

From an ecological perspective, different agricultural land use types impose distinct environmental footprints and conservation needs. Terraces serve as effective conservation measures against soil erosion by transforming steep slopes into level platforms to enhance stability and cultivation conditions [63]. However, socio-economic factors like urbanization and labor migration have caused terrace abandonment in certain regions, which reverses the ecological benefits and may exacerbate soil erosion and slope instability. In contrast, greenhouses have expanded rapidly, particularly in China; the area of agricultural plastic greenhouses grew by 42.4% during 2000–2020 [64]. While greenhouses optimize microclimates to protect crops from extreme weather and pests, they also introduce environmental challenges: extensive plastic coverings can alter pollination cycles and reduce biodiversity, and the accumulation of plastic residues often degrades soil structure. Therefore, accurate mapping of these agricultural land use types is essential for understanding specific ecological impacts and guiding conservation efforts.

From an economic perspective, agricultural land use types vary substantially in production efficiency, input costs, and income stability. Plain open-field cropland, featuring flat terrain and high mechanization, supports large-scale cultivation with high efficiency and low per-unit costs. However, this type depends heavily on natural rainfall and climatic conditions, leading to limited growing seasons and substantial seasonal income variability [65]. In contrast, greenhouse cultivation offers high economic potential by extending growing seasons, enabling year-round production, and reducing dependency on seasonal and climatic variations. Greenhouses are particularly suitable for high-value crops such as fruits, vegetables, and flowers. However, greenhouse systems require substantial financial investment for construction and maintenance and are heavily reliant on energy for heating, lighting, and ventilation, making them vulnerable to energy price fluctuations [66]. Slope cropland, constrained by steep terrain and limited mechanization, is typically used for small-scale agriculture. This type relies on natural rainfall, making it highly susceptible to soil erosion and climate variability, resulting in low production stability [67]. Therefore, understanding the spatial distribution of different agricultural land use types is crucial for conducting comprehensive economic analyses, optimizing resource allocation, and improving land use efficiency.

In conclusion, developing a specialized classification for diverse agricultural land use types is essential. Such an approach bridges the gap between general monitoring and precision agriculture, enhances ecological conservation efforts, and facilitates comprehensive economic analysis. To fulfill these objectives, the classification results must achieve an accuracy level that can reliably support practical applications and scientific research.

With the support of geographic zoning, this study achieved an overall classification accuracy of 90.81% and a Kappa coefficient of 0.85, indicating its potential to meet these requirements from the following perspectives. First, reliable crop type mapping serves as a fundamental prerequisite for monitoring planting area, crop growth, and yield estimation [68]. The accuracy level attained in this study is expected to meet the requirements for regional agricultural resource surveys and land use inventories. Second, accurate classification of diverse agricultural land use types contributes to crop phenology analysis and growth monitoring in heterogeneous environments, where data quality and temporal resolution directly affect classification reliability [69]. Third, accurate delineation of agricultural field boundaries from high-resolution imagery has been recognized as crucial for parcel-level cropland monitoring and precision agriculture management [70]. The high boundary precision achieved through our zoning-optimized segmentation strategy suggests that the proposed method can provide reliable parcel-level spatial references, potentially supporting area estimation, agricultural economic analysis, and resource allocation optimization at the regional scale.

4.2. Pathways of Geographical Zoning in Improving Classification Accuracy: Feature Heterogeneity and Boundary Precision

The concept of geographical zoning has a long history, with its roots tracing back to the early 19th century when Alexander von Humboldt combined climate and vegetation distribution to explore the laws of geographic zonation. This pioneering work led to the creation of the first global isothermal map and marked a significant milestone in geographical zoning research [71]. Over time, the application of geographical zoning has expanded across various fields, such as climatic zoning [72], ecological zoning [73], and agricultural zoning [74]. Despite this extensive theoretical foundation, the potential of geographical zoning in remote sensing classification remains underexplored [75].

In remote sensing, the most common strategy for handling large-scale or HR remote sensing imagery is image tiling. This method divides the study area into equal-sized rectangular tiles or segments based on administrative boundaries to enhance computational efficiency [76,77]. For example, Zhang et al. [78] divided their study area into 159 km grid cells to classify land cover in North America, demonstrating that the average classification confidence achieved by training a separate classifier for each grid cell is higher than that of the global classifier. Campos et al. [79] divided their study area into 5 km grid cells to classify small farms in Spain, demonstrating that tiling could address spatial variability and improve classification accuracy by approximately 12%. However, the image tiling method overlooks the spatiotemporal heterogeneity of geographical environmental factors. This mechanical division poses challenges for classification in areas with complex environments or diverse land cover types.

With the growing recognition of spatial heterogeneity and multi-scale variability in remote sensing, geographical zoning has gained attention as a more optimized alternative to image tiling [80]. By leveraging the attributes and characteristics of geographical, ecological, and human activity features, geographical zoning provides a comprehensive representation of regional environmental differences. This approach reduces the complexity of the environment and land cover within subzones, thereby creating the potential for improved classification accuracy [81]. For instance, Jin et al. [82] applied eco-geographical zoning using environmental data and clustering methods for 30 m national-scale land cover classification. Their findings indicated that this approach effectively reduced the regional complexity of land cover, although no classification validation experiments within the zones were reported. Similarly, Jiang et al. [83] classified forests in the Pacific Northwest of the United States using WWF-provided ecological zoning. Their results demonstrated that zoning effectively reduced spectral heterogeneity and led to a 15% improvement in classification accuracy.

However, reliance on pre-existing zoning datasets introduces significant limitations in the context of HR remote sensing applications. First, there is a scale mismatch because most existing zoning datasets are designed for large-scale mapping scenarios, such as global or national analyses. These datasets typically define zones that encompass extensive areas, which often do not align with the finer spatial resolution or specific objectives of HR studies. Second, the loss of spatial detail is a critical issue. The original data used to create large-scale zoning datasets often has a resolution at the kilometer scale, which is insufficient for capturing detailed geographic and spectral variations required for precision agriculture. Third, traditional zoning methods typically overlook temporal dynamics. These methods produce results that only reflect spatial differences at a single time point and fail to account for dynamic phenological changes over time, which are essential for distinguishing agricultural land use types [84].

In this study, we addressed these challenges by integrating multi-source data to develop a spatiotemporal feature-driven geographical zoning method. This approach improves classification accuracy through two primary pathways: reducing feature heterogeneity and enhancing boundary precision.

Feature heterogeneity: Geographical zoning enhances classification by creating subzones with more concentrated distributions of land use types. As shown in Figure 13, each subzone after zoning represents distinct conditions of topography, human activity, and vegetation phenology, resulting in spectral and other classification feature values (e.g., texture, topography) that are more distinctive. The increased inter-regional feature differences reduce confusion among similar types, such as the spectral overlap between “same objects with different spectra” or “different objects with similar spectra” [85]. Moreover, zoning allows the classification models to be locally optimized within each subzone, enabling them to focus on regional feature patterns, thereby reducing classification errors associated with the global model.
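The benefit of locally optimized models can be illustrated with a deliberately simple, synthetic sketch. A one-dimensional threshold rule stands in for the RF classifier, the zone names and feature values are hypothetical, and accuracy is measured on the training data only. The class boundary lies at a different feature value in each zone, so a single global rule has to compromise, whereas zone-specific rules do not.

```python
def fit_threshold(xs, ys):
    """Return the cut t that minimises errors of the rule 'class 1 if x > t'."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(lab) for x, lab in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def accuracy(t, xs, ys):
    """Share of samples that the rule 'class 1 if x > t' labels correctly."""
    return sum((x > t) == bool(lab) for x, lab in zip(xs, ys)) / len(xs)

# Hypothetical feature values; the true class boundary differs between zones.
values = list(range(5, 100, 10))
zones = {
    "zone_a": (values, [int(x > 30) for x in values]),
    "zone_b": (values, [int(x > 70) for x in values]),
}

# Global model: one rule fitted to the pooled samples of both zones.
all_x = [x for xs, _ in zones.values() for x in xs]
all_y = [lab for _, ys in zones.values() for lab in ys]
global_t = fit_threshold(all_x, all_y)
global_acc = accuracy(global_t, all_x, all_y)

# Zoned models: one rule fitted and applied within each zone.
zone_acc = {
    name: accuracy(fit_threshold(xs, ys), xs, ys)
    for name, (xs, ys) in zones.items()
}
print(global_t, global_acc)  # 25 0.8
print(zone_acc)              # {'zone_a': 1.0, 'zone_b': 1.0}
```

In this toy setting the pooled rule misclassifies the samples lying between the two zone-specific boundaries, which mirrors the "different objects with similar spectra" confusion that zoning is intended to reduce.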

Boundary precision: By optimizing segmentation scales within each subzone, the segmentation results more accurately reflect the actual distribution of cropland types within the region. As shown in Figure 14, the segmentation results of the zoning model delineate cropland boundaries more precisely than those of the global model. Segmentation boundaries define the spatial extent for calculating classification characteristics of the target patches. When segmentation scales are too large, as shown in Figure 14b,c,e, patches may include multiple land-cover types, leading to mixed feature calculations and subsequent misclassification. Conversely, overly small segmentation scales split a single contiguous land use type into an excessive number of patches, as illustrated in Figure 14a,d, which increases computational cost.
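The scale trade-off described above can be made concrete with a toy one-dimensional example. Fixed-length windows are used here as a crude stand-in for segmentation at a given scale. This is an assumption for illustration only and does not reproduce the multiresolution segmentation used in the study. The label sequence and the function name segment_stats are hypothetical.

```python
def segment_stats(labels, scale):
    """Split a label sequence into windows of length `scale`.

    Returns (number of segments, number of segments mixing several classes).
    """
    segments = [labels[i:i + scale] for i in range(0, len(labels), scale)]
    mixed = sum(1 for seg in segments if len(set(seg)) > 1)
    return len(segments), mixed

# Hypothetical transect crossing three land use types (6, 4 and 8 pixels wide).
labels = list("AAAAAABBBBCCCCCCCC")

for scale in (1, 2, 3, 6, 9):
    n_segments, n_mixed = segment_stats(labels, scale)
    print(scale, n_segments, n_mixed)
# 1 18 0
# 2 9 0
# 3 6 1
# 6 3 1
# 9 2 2
```

Small windows yield pure but numerous segments, while large windows yield few segments that increasingly straddle class boundaries, so the scale that suits the 4-pixel class is not the one that suits the 8-pixel class.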

4.3. Limitations and Future Work

The proposed method has several limitations that warrant further investigation.

First, the geographical zoning strategy relies on optical time-series data to capture vegetation phenology. While this approach proved highly effective in the relatively dry and clear-sky conditions of northern mountainous regions like Beijing, its transferability to southern regions requires caution. Southern hilly areas often experience persistent cloud cover and rain during critical crop growth stages, causing severe data gaps in optical imagery. To ensure all-weather phenological monitoring and enhance model transferability across climatically diverse regions, future research should integrate optical data with Synthetic Aperture Radar (SAR) time-series imagery [86].

Second, as a hierarchical approach, errors in the first-level zoning may propagate to the subsequent classification stage, particularly in transitional areas along zoning boundaries. Although the dual-condition intersection strategy and the statistically derived thresholds adopted in this study have effectively minimized such risks, future research should quantitatively assess the sensitivity of the final classification accuracy to potential zoning errors, for example, by systematically introducing perturbations to the zoning boundaries and evaluating the resulting changes in classification performance.

Third, this study adopted the RF classifier based on its proven robustness with limited training samples in agricultural classification tasks. Future studies will explore the integration of Deep Learning (DL) architectures with the geographical zoning framework through two potential strategies [87]. The first involves embedding the geographical zoning information as prior knowledge into DL networks, for example, by encoding the zoning results as additional input feature channels or incorporating them into spatial attention modules to guide the network to learn zone-specific feature patterns. The second involves replacing the current RF classifier within each subzone with DL-based semantic segmentation models (e.g., U-Net or DeepLab), which could directly learn spatial features from high-resolution imagery under zone-specific constraints. A key challenge in both strategies is that DL models typically require large volumes of labeled training data, and the sample availability within individual subzones after geographical zoning may be insufficient for effective model training. Addressing this issue through strategies such as transfer learning or data augmentation represents an important future direction. Additionally, systematically comparing the performance of different classification algorithms within the geographical zoning framework will help assess the generalizability of the proposed zoning strategy across different classifiers.

5. Conclusions

This study developed a spatiotemporal feature-driven geographical zoning framework that integrates vegetation phenology, topographic characteristics, and human activities to improve agricultural land use classification in the hilly and mountainous areas of Beijing. By decoupling the complex global classification task into relatively homogeneous local problems, the zoning framework significantly enhanced classification performance. The overall accuracy improved from 84.81% to 90.81%, the Kappa coefficient increased from 0.74 to 0.85, and the average IoU rose from 77.85% to 90.85%. The improvement was achieved through two complementary pathways: reducing feature heterogeneity by constraining classifiers within geographically homogeneous subzones and enhancing boundary precision through zone-specific optimization of segmentation scales. These findings suggest that incorporating geographical knowledge as prior constraints into the classification workflow can complement data-driven approaches, offering a promising perspective for agricultural mapping in geographically complex areas.

Author Contributions

Conceptualization, J.Z., X.Y. and Z.W.; methodology, J.Z.; software, J.Z.; validation, X.L., H.W. and X.C.; formal analysis, J.Z., X.Y. and Z.W.; investigation, J.Z., X.Y., Z.W. and X.L.; resources, X.Y.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z. and S.F.; visualization, J.Z. and X.L.; supervision, X.Y. and S.F.; project administration, X.Y. and S.F.; funding acquisition, S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2021YFB3900501, the National Natural Science Foundation of China, grant number 42371473, and the National Natural Science Foundation of China, grant number 42376227.

Data Availability Statement

The datasets produced in this study are available from the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AUC	Area Under the ROC Curve
CUUS	Classification without zoning using unified scale
CZDS	Classification with zoning using different scale
CZUS	Classification with zoning using unified scale
DEM	Digital Elevation Model
DL	Deep Learning
DTW	Dynamic Time Warping
ESA	European Space Agency
ESP	Estimation of Scale Parameter
GF-2	Gaofen-2
GF-6	Gaofen-6
GEE	Google Earth Engine
HR	High Resolution
IoU	Intersection over Union
MRS	Multiresolution Segmentation
NDVI	Normalized Difference Vegetation Index
OA	Overall Accuracy
OOB	Out-of-Bag
PA	Producer’s Accuracy
RF	Random Forest
ROC	Rate of Change
ROC-LV	Rate of Change of Local Variance
SAR	Synthetic Aperture Radar
S2	Sentinel-2
SG	Savitzky–Golay
UA	User’s Accuracy