Identification of Abandoned Cropland and Global–Local Driving Mechanism Analysis via Multi-Source Remote Sensing Data and Multi-Objective Optimization

Gui, Side; Li, Jiaming; Chen, Guoping; Zhao, Junsan; Tang, Bohui; Li, Lei

doi:10.3390/rs17173086

Open Access Article

by Side Gui ^1,2, Jiaming Li ^2,3,*, Guoping Chen ^1,2, Junsan Zhao ^1,2, Bohui Tang ^1 and Lei Li ^2,3

^1 Faculty of Land Resource Engineering, Kunming University of Science and Technology, Kunming 650093, China

^2 Yunnan Key Laboratory of Intelligent Monitoring and Spatiotemporal Big Data Governance of Natural Resources, Kunming 650051, China

^3 Yunnan Institute of Geology and Mineral Surveying and Mapping Co., Ltd., Kunming 650051, China

^* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(17), 3086; https://doi.org/10.3390/rs17173086

Submission received: 7 July 2025 / Revised: 29 August 2025 / Accepted: 2 September 2025 / Published: 4 September 2025

(This article belongs to the Section Remote Sensing Image Processing)

Abstract

The issue of abandoned cropland poses a significant threat to national food security and the sustainable use of land resources, highlighting the urgent need for an efficient and interpretable remote sensing identification framework. This study integrates three authoritative land cover datasets: the European Space Agency WorldCover (ESA), the Environmental Systems Research Institute Land Cover (ESRI), and the China Resource and Environment Data Cloud Platform (CRLC). Multi-source remote sensing features were extracted using the Google Earth Engine platform, and high-quality training samples were constructed by randomly selecting sample points in ArcGIS from areas where the three datasets agree. Recursive feature elimination with cross-validation (RFECV) was employed to eliminate redundant variables, thereby optimizing the feature structure without compromising classification accuracy. In terms of model construction, a multi-objective optimization strategy combining the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and eXtreme Gradient Boosting (XGBoost) is proposed. By incorporating a pruning mechanism, computational efficiency is significantly improved, accelerating identification by up to 75% while maintaining model accuracy (overall accuracy (OA): 0.9817; Kappa: 0.9633; F1-score: 0.9817; recall: 0.9866). For result interpretation, the SHapley Additive exPlanations (SHAP) method is used to evaluate global feature importance, revealing that variables such as SAVG, B3_p25, Road, DEM, and Population contribute most significantly to the identification of abandoned cropland. Meanwhile, the Local Interpretable Model-Agnostic Explanations (LIME) method is applied to conduct local interpretability analysis on typical samples. The results show that, while some samples share consistent dominant features with the global results, others exhibit stronger local influences from features such as slope and SAVG. The combination of SHAP and LIME for global–local interpretability provides insight into the heterogeneous drivers of cropland abandonment and enhances the transparency of the classification model. This study presents a practical, scalable framework for the rapid identification and management of abandoned cropland, balancing precision, interpretability, and efficiency.

Keywords:

abandoned cropland identification; pruning algorithm; NSGA-II-XGBoost; SHAP; LIME; multi-source data

1. Introduction

Cropland is the most fundamental natural resource for agriculture and plays a vital role in ensuring national food security, maintaining ecological balance, and promoting social stability [1,2]. For China—a nation characterized by a large population and limited arable land—cropland protection has long been a core national strategy [3,4,5]. In recent decades, China has implemented a series of policies aiming to safeguard farmland, such as the “1.8 billion mu red line”, the “balance between farmland occupation and compensation”, and the “food security strategy through land and technology” policies, in an effort to stabilize total cropland area and improve land quality. However, with the rapid advancement of industrialization and urbanization, large numbers of rural laborers have migrated to cities, leading to severe labor shortages in agriculture. Combined with the low comparative benefits of farming and poor returns from land use, these factors have contributed to a growing problem of cropland abandonment in many regions [6,7,8].

Abandoned cropland refers to land that is suitable for cultivation but has not been utilized effectively for a certain period [9]. It typically manifests as long-term fallow, idle, or degraded land. In recent years, cropland abandonment has evolved from scattered individual behavior to a widespread regional phenomenon, particularly prominent in central and western regions as well as remote mountainous areas. According to statistics from the Ministry of Natural Resources and the Ministry of Agriculture and Rural Affairs, the proportion of abandoned cropland in some areas has exceeded 10%. The causes are multifaceted, including labor outmigration, low agricultural profitability, institutional land constraints, and underdeveloped infrastructure. As such, cropland abandonment has emerged as a serious threat to national food security and the sustainable development of agriculture. Yunnan Province, as one of China’s major agricultural regions, is characterized by complex topography and diverse ethnic groups, with highly heterogeneous land resources. Binchuan County, located in eastern Dali Prefecture, Yunnan Province, is a typical mountainous agricultural county with a long-standing role in grain and cash crop production. However, since 2020, some towns in Binchuan have experienced significant trends of cropland degradation and long-term idleness. The intensification of abandonment has become a critical constraint on agricultural productivity. Therefore, systematically investigating the spatial distribution, temporal evolution, and driving mechanisms of abandoned cropland in Binchuan County from 2020 to 2022 is of great significance for developing region-specific governance strategies, ensuring national food security, and advancing rural revitalization.

Early identification of abandoned land relied primarily on field surveys, farmer questionnaires, and local statistical data. These methods suffer from outdated information and limited spatial precision. In recent years, with the advancement of remote sensing (RS), geographic information systems (GIS), and artificial intelligence, remote-sensing-based approaches have become mainstream. In terms of data sources, researchers commonly employ multi-temporal satellite data such as MODIS, Landsat, and Sentinel, along with vegetation indices such as the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI), to monitor abandonment dynamics [10,11]. For instance, Yin et al. (2018) used Landsat time series data and change detection techniques to identify abandoned land [12]. In terms of data integration, both domestic and international scholars have begun exploring the use of fused land cover datasets (e.g., ESA, ESRI, Globeland30) to enhance classification accuracy and coverage [13]. At the methodological level, traditional supervised classification algorithms (e.g., decision trees, support vector machines, and random forests (RF)) have been widely applied in abandonment identification, but they often struggle with generalization when processing large-scale, high-dimensional, and heterogeneous datasets. In recent years, ensemble machine learning methods such as XGBoost and RF have gained popularity due to their high accuracy and robustness [14,15]. For example, Long et al. (2021) developed an abandonment classification model using RF [16]. In addition, some studies have explored non-parametric methods such as k-nearest neighbors (KNN) and deep learning techniques such as long short-term memory (LSTM) networks to dynamically identify cropland abandonment using multi-temporal imagery [17]. For multi-objective model optimization, evolutionary algorithms such as the Non-dominated Sorting Genetic Algorithm II (NSGA-II) have been widely adopted in land use classification to balance accuracy, computational complexity, and efficiency.
In driving mechanism analysis, existing studies mostly employ methods such as statistical regression and Gradient-Boosting Decision Tree models to identify the causes of cropland abandonment [18,19].

With the increasing adoption of “black-box models” in land use research, such as advanced machine learning algorithms and ensemble frameworks, concerns regarding model interpretability have become more pronounced. This is especially critical when attempting to reveal the underlying mechanisms that drive cropland abandonment or other land use transitions. To address this challenge, interpretability tools like SHAP have been increasingly applied to quantify and visualize the way in which input variables contribute to model outputs, thereby offering valuable insights into complex, high-dimensional decision processes [20].

Given the increasingly severe cropland abandonment issue in the mountainous regions of southwestern China—and the limitations of existing studies in terms of spatial accuracy, modeling efficiency, and result interpretability—this study focuses on Binchuan County as a representative case to develop an integrated method for identifying abandoned cropland and analyzing its underlying causes. By combining multi-source remote sensing data and geospatial platforms, and integrating spatial analysis, machine learning algorithms, and model interpretability tools, this study aims to achieve a unified framework of “high-accuracy identification, efficient modeling, and in-depth mechanism analysis”. The main innovations of this study include the following:

Constructing a training sample system that fuses multiple land cover datasets to enhance reliability and spatiotemporal representativeness.
Proposing an integrated modeling framework that combines XGBoost with the NSGA-II evolutionary optimization algorithm. This design leverages the predictive performance of XGBoost and the multi-objective optimization capacity of NSGA-II, while incorporating a pruning algorithm to improve computational efficiency.
Combining SHAP and LIME to conduct global and local multi-scale interpretation of abandonment drivers, thereby improving model transparency and its value for policy guidance.

This study is expected to provide scientific support for the governance of abandoned cropland in mountainous counties and serve as a technical reference for implementing China’s cropland protection and rural revitalization strategies.

2. Study Area and Data

2.1. Overview of the Study Area

Binchuan County is located in the central–western part of Yunnan Province, China, and is administratively under the jurisdiction of Dali Bai Autonomous Prefecture. Geographically, it spans from 100°12′E to 100°42′E and from 25°57′N to 26°27′N (Figure 1), covering a total area of approximately 2650 km². The terrain is dominated by mountainous and hilly landscapes, characterized by significant elevation variation and a typical vertical agricultural structure. The region features a subtropical plateau monsoon climate, with an average annual temperature of about 18 °C and annual precipitation ranging from 800 to 1100 mm, exhibiting distinct wet and dry seasons. The diverse natural geography and rich agricultural resources provide a solid foundation for land use research and cropland dynamics analysis. In recent years, however, due to declining agricultural profitability, climate variability, and other factors, certain areas of Binchuan County have experienced increasing levels of cropland abandonment. This has led to reduced land use efficiency and posed challenges to agricultural sustainability. Therefore, selecting Binchuan County as the study area is not only representative of typical mountainous agricultural counties but also carries important implications for practical policymaking and regional scalability.

2.2. Data Sources and Preprocessing

This study utilizes a combination of datasets, including remote sensing imagery, land use data, topographic data, socioeconomic indicators, and meteorological variables. The detailed sources are as follows:

Remote sensing imagery: Remote sensing data were primarily acquired through the Google Earth Engine (GEE) platform. This includes Sentinel-1 synthetic aperture radar (SAR) imagery and Sentinel-2 multispectral data. Sentinel-1 backscatter coefficients and Sentinel-2 surface reflectance were preprocessed within GEE by applying cloud masking, noise filtering, and temporal compositing, which ensured data consistency and minimized atmospheric effects.
Land use data: Three authoritative land cover datasets were integrated: ESA WorldCover [21], ESRI Land Cover [22], and the China Resource and Environment Data Cloud Platform (CRLC) [23]. To harmonize differences among products, all land cover datasets were reclassified into a consistent category system and combined through overlay analysis, producing a unified land use dataset suitable for national-scale cropland studies.
Topographic data: A 30 m resolution Digital Elevation Model (DEM) [24] was used to derive key topographic factors such as slope and aspect. The slope and aspect layers were generated using ArcGIS spatial analysis tools, and subsequently resampled to align with the spatial resolution of Sentinel imagery.
Socioeconomic data: Variables such as population density, distance to roads, and distance to rivers were included to reflect the intensity of human activity and infrastructure distribution. These indicators help reveal the socioeconomic driving forces behind land use change. These datasets, originally at coarser resolutions, were resampled to 10 m to ensure spatial consistency with remote sensing and topographic variables.
Meteorological data: Climate variables, including annual precipitation [25] and average temperature [26], were used to characterize the natural environment and assess its impact on land use patterns and evolution. The gridded climate data were resampled to a 30 m resolution to ensure comparability across all input layers.

The sources and basic descriptions of each dataset are summarized in Table 1.

2.3. Sample Point Construction

The construction of sample points plays a critical role in determining the robustness and generalization ability of cropland identification models. To enhance both the accuracy and representativeness of training samples, this study adhered to the principle of remote sensing data consistency and developed a training dataset with high confidence and structural balance. Specifically, we performed a spatial overlay analysis using three authoritative land cover datasets—ESA WorldCover, ESRI Land Cover, and the China Resource and Environment Data Cloud Platform (CRLC)—to identify areas with consistent classification across sources. Regions jointly classified as “cropland” by all three datasets were defined as high-confidence cropland sample zones, while those consistently labeled as “non-cropland” were designated as candidate non-cropland sample zones. Field verification of actual cropland (Figure 2a) and abandoned land parcels (Figure 2b) was then conducted to validate the sample labels. This strategy helped mitigate classification bias caused by reliance on a single data source and enhanced label consistency and reliability. To balance class representation and model learning capacity, a total of 4000 sample points were randomly selected from these high-confidence areas, including 2000 cropland and 2000 non-cropland samples (Figure 1). This sampling method not only ensured label reliability but also maintained balanced class distribution and land cover structure, thus providing a solid foundation for model training.
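The consistency-based sampling described above can be illustrated with a minimal sketch. It assumes the three products have already been reclassified to a shared legend and flattened to equal-length lists; the class code `CROPLAND = 1`, the toy arrays, and the function names are hypothetical and are not taken from the study.

```python
import random

CROPLAND = 1  # hypothetical harmonized class code for "cropland"


def consensus_zones(esa, esri, crlc):
    """Return pixel indices on which all three maps agree.

    A pixel is high-confidence cropland if every product labels it cropland,
    and candidate non-cropland if no product labels it cropland.
    """
    cropland, non_cropland = [], []
    for i, labels in enumerate(zip(esa, esri, crlc)):
        if all(v == CROPLAND for v in labels):
            cropland.append(i)
        elif all(v != CROPLAND for v in labels):
            non_cropland.append(i)
    return cropland, non_cropland


def balanced_sample(cropland, non_cropland, n_per_class, seed=42):
    """Draw the same number of random points from each consensus zone."""
    rng = random.Random(seed)
    return rng.sample(cropland, n_per_class), rng.sample(non_cropland, n_per_class)


# Toy flattened rasters (eight pixels); real inputs would be co-registered grids.
esa = [1, 1, 0, 0, 1, 0, 1, 0]
esri = [1, 0, 0, 0, 1, 0, 1, 1]
crlc = [1, 1, 0, 2, 1, 0, 1, 0]

crop_idx, non_idx = consensus_zones(esa, esri, crlc)
crop_pts, non_pts = balanced_sample(crop_idx, non_idx, n_per_class=2)
```

Pixels on which the products disagree (indices 1 and 7 in the toy data) fall into neither zone, which is how the overlay limits label noise from any single product.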

3. Methodology

3.1. Abandoned Cropland Definition and Overall Framework

In this study, abandoned cropland is defined as land that was previously cultivated but has remained uncultivated for at least two consecutive years (Figure 2c) and has lost its agricultural functionality; this is typically characterized by the prevalence of weeds, stones, and surface degradation that make farming impractical (Figure 2b). The overall research framework is illustrated in Figure 3 and includes several key steps: high-confidence sample selection, multi-dimensional feature construction, variable reduction, model optimization, and result interpretation. First, cropland candidate areas were selected by overlaying the three authoritative land cover datasets—ESA WorldCover, ESRI Land Cover, and CRLC—to ensure high label confidence and consistency. Based on these samples, a comprehensive high-dimensional feature library was constructed. It includes multi-source remote-sensing-derived spectral and grayscale statistics as well as topographic variables (e.g., slope, elevation), road accessibility, and rural population density, thereby capturing both biophysical and anthropogenic factors related to cropland status and abandonment risk. To reduce the impact of redundant variables on model efficiency and generalizability, the Recursive Feature Elimination with Cross-Validation (RFECV) method was applied to dynamically filter the input variables and retain only the most discriminative ones [27]. For model construction, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) was used to perform multi-objective optimization of both the XGBoost classifier’s hyperparameters and classification thresholds. This approach ensures a balance between classification accuracy and model complexity through global optimization [28,29]. To improve model transparency and decision-making interpretability, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations) were employed. 
SHAP was used to analyze global feature importance, while LIME provided local interpretability for specific samples. This dual-scale interpretability revealed both the dominant factors driving abandonment and their regional variations. In summary, this study established a comprehensive technical framework encompassing sample construction, feature reduction, parameter optimization, classification, and interpretation. This framework offers methodological support for the accurate identification of abandoned cropland and in-depth analysis of its driving mechanisms.

3.2. Feature Construction

To comprehensively analyze the driving mechanisms and spatial distribution patterns of cropland abandonment, a multi-dimensional feature set was constructed that integrates vegetation indices, topographic features, land use changes, meteorological conditions, and socioeconomic attributes.

Sentinel-1 SAR data, including VV and VH polarization bands and derived indices such as the Polarization Ratio (PR) and Normalized Difference Polarization Index (NDPI), were incorporated. SAR imagery is unaffected by cloud cover and illumination conditions, providing complementary structural and moisture information that enhances cropland detection in complex terrains. For remote sensing features, we utilized Sentinel-2 surface reflectance data, including its 10 spectral bands (B2–B8A, B11, B12). Based on these bands, a set of vegetation indices (e.g., NDVI, EVI, SAVI) was derived to capture vegetation greenness and productivity. In addition, GLCM texture metrics were calculated from the B8 (near-infrared) band to represent the spatial patterns and heterogeneity of vegetation cover. To further characterize local spectral distributions, a percentile-based approach was applied within a fixed window centered on each sample point, where the proportion of pixels falling within different threshold intervals was extracted. For example, features such as B2_p5, B2_p25, B2_p50, B2_p75, and B2_p95 were generated to represent the 5th, 25th, 50th, 75th, and 95th percentile reflectance levels of the blue band, respectively [30,31]. Vegetation indices are closely related to cropland abandonment because low or unstable vegetation vigor often signals land degradation, reduced productivity, or discontinuous cultivation, which increase the likelihood of abandonment. Topographic variables include elevation, slope, and aspect, which reflect the physical constraints of terrain on cultivation suitability. Steep slopes and high elevations increase cultivation costs, hinder mechanization, and exacerbate soil erosion, thereby raising the risk of abandonment. In contrast, gentle slopes and favorable aspects facilitate farming and are generally less associated with abandonment. 
Meteorological factors such as annual precipitation and average temperature indicate climatic conditions for crop growth. Regions with insufficient or highly variable rainfall or with low temperatures may experience lower crop yields and higher production risks, contributing to farmers’ decisions to cease cultivation. Socioeconomic indicators such as distance to roads and rural population density reflect the intensity of human activities and infrastructure development levels. Poor infrastructure and remote locations increase transportation costs and reduce market access, discouraging continuous farming. Likewise, out-migration and declining rural populations reduce available agricultural labor, further promoting cropland abandonment.
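As an illustration of the spectral features in this section, the sketch below computes NDVI, SAVI, and EVI from reflectance values and derives percentile features named in the `B2_p5` style. The index formulas are the standard published definitions; the soil adjustment factor `L = 0.5`, the linear-interpolation percentile rule, and the toy reflectance values are assumptions, since the study does not state these details here.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def savi(nir, red, L=0.5):
    """Soil-Adjusted Vegetation Index; L = 0.5 is a commonly used default."""
    return (1 + L) * (nir - red) / (nir + red + L)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the usual coefficients (2.5, 6, 7.5, 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_features(band_name, values, qs=(5, 25, 50, 75, 95)):
    """Build a feature dictionary such as {'B2_p5': ..., 'B2_p25': ...}."""
    return {f"{band_name}_p{q}": percentile(values, q) for q in qs}


# Hypothetical blue-band reflectances collected for one sample point.
b2_values = [0.04, 0.05, 0.06, 0.07, 0.08]
b2_features = percentile_features("B2", b2_values)
```

The same `percentile_features` helper can be applied to any band or index, which is one way a feature such as `B3_p25` could arise.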

3.3. Feature Selection via RFECV

To improve model training efficiency and generalizability, the Recursive Feature Elimination (RFE) method, a wrapper-based approach, was employed, along with its cross-validated version, RFECV. RFE repeatedly fits a model, ranks the features by their importance to that model, and removes the least important ones. RFECV further incorporates a cross-validation mechanism to evaluate model performance across different feature subset sizes, ultimately identifying the optimal feature subset. This method balances computational efficiency with robustness of results, making it particularly suitable for remote sensing classification tasks under complex environmental conditions. Figure 4 shows the RFECV feature-selection curve used to determine the optimal number of features.
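The elimination-plus-validation loop can be sketched in a few lines. This is a deliberately simplified stand-in and not the study's configuration: it uses a nearest-centroid classifier and the gap between class means as the importance score, ranks features on the full data rather than inside each fold, and runs on synthetic data. All function names are hypothetical.

```python
def class_means(X, y, feats):
    """Per-class mean vector restricted to the selected feature indices."""
    means = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return means


def predict(means, x, feats):
    """Assign x to the class whose centroid is closest."""
    def dist(label):
        return sum((x[f] - means[label][k]) ** 2 for k, f in enumerate(feats))
    return min(means, key=dist)


def cv_accuracy(X, y, feats, k=5):
    """Mean accuracy over k interleaved folds."""
    scores = []
    for fold in range(k):
        test = [i for i in range(len(X)) if i % k == fold]
        train = [i for i in range(len(X)) if i % k != fold]
        means = class_means([X[i] for i in train], [y[i] for i in train], feats)
        hits = sum(predict(means, X[i], feats) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / k


def importance(X, y, f):
    """Stand-in importance: absolute gap between the two class means (labels 0/1)."""
    m = class_means(X, y, [f])
    return abs(m[1][0] - m[0][0])


def rfecv(X, y, k=5):
    """Drop the weakest feature each round and keep the best-scoring subset."""
    feats = list(range(len(X[0])))
    history = []
    while feats:
        history.append((list(feats), cv_accuracy(X, y, feats, k)))
        feats.remove(min(feats, key=lambda f: importance(X, y, f)))
    # Highest score wins; ties go to the smaller subset.
    return max(history, key=lambda h: (h[1], -len(h[0])))


# Synthetic data: feature 0 separates the classes, features 1 and 2 do not.
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([5.0 * label + 0.1 * (i % 3), ((i // 2) % 5) / 5, ((i // 2) % 4) / 4])
    y.append(label)

best_feats, best_score = rfecv(X, y)
```

In practice a library implementation with a tree-based estimator would normally be used; the sketch only shows why the cross-validated score curve, rather than the ranking alone, determines how many features are retained.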

3.4. Cropland Classification Model and Algorithmic Foundations

To achieve high-precision and high-efficiency mapping of cropland spatial distribution, we constructed an integrated modeling framework combining pruning, NSGA-II, and XGBoost, incorporating a multi-objective evolutionary optimization strategy. This approach offers strong stability and scalability while balancing classification accuracy and computational complexity.

XGBoost (eXtreme Gradient Boosting) is an ensemble learning algorithm based on gradient boosting, which is widely applied in remote sensing classification tasks due to its excellent training efficiency and generalization capability [32,33,34]. In this study, XGBoost takes the optimized feature subset as input and outputs the predicted probability of each sample belonging to the cropland class, implementing binary classification (cropland labeled as 0).

To further enhance performance while controlling model complexity, we introduced NSGA-II as an external optimizer. NSGA-II is well-suited for solving multi-objective nonlinear optimization problems. Its core mechanisms include fast non-dominated sorting, crowding distance calculation, and elite preservation, which help maintain diversity and convergence within the solution space [35,36]. The model construction process includes the following steps:

Population Initialization: Construct a diverse population composed of different combinations of feature subsets and hyperparameters.
Fitness Evaluation: For each individual, train an XGBoost model and evaluate it using dual objectives—classification accuracy and model complexity.
Evolutionary Operations: Apply tournament selection and use single-point crossover and mutation to generate offspring.
Sorting and Updating: Identify Pareto-optimal solutions via non-dominated sorting and use crowding distance to preserve population diversity.
Convergence Criteria: The process terminates once convergence is reached or the maximum number of generations is exceeded.
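The sorting and diversity steps listed above can be sketched as follows. This is a plain quadratic-time version of non-dominated sorting rather than the bookkeeping-based fast variant, and both objectives are assumed to be minimized (for example, classification error and a complexity proxy such as tree count). The objective values are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs):
    """Split solution indices into successive Pareto fronts."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs, front):
    """Crowding distance per solution; boundary solutions get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[0])):
        order = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[order[0]][m], objs[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(order, order[1:], order[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist


# Hypothetical (error, number of trees) pairs for five candidate models.
objs = [(0.02, 300), (0.03, 120), (0.05, 60), (0.04, 200), (0.03, 350)]
fronts = non_dominated_sort(objs)
distances = crowding_distance(objs, fronts[0])
```

Candidates 3 and 4 are each dominated by a cheaper or more accurate model, so they fall into the second front; within a front, larger crowding distance is preferred so that the retained solutions stay spread along the accuracy-complexity trade-off.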

To prevent overfitting in high-dimensional feature spaces and reduce computational burden, we incorporated a pruning algorithm. If no improvement is observed over several generations, the evolutionary process is terminated early to improve resource efficiency.
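A minimal sketch of this stopping rule is given below, assuming that "no improvement" means the best score has not risen by more than a small tolerance for a fixed number of generations. The `patience` and `tol` parameters, the function name, and the score sequence are hypothetical; the study does not report the exact criterion in this section.

```python
def run_with_early_stop(evaluate_generation, max_generations=100, patience=5, tol=1e-4):
    """Run generations until the best score stalls for `patience` generations.

    `evaluate_generation(gen)` is assumed to return the best fitness found in
    generation `gen` (higher is better).
    """
    best, stale, history = float("-inf"), 0, []
    for gen in range(max_generations):
        score = evaluate_generation(gen)
        history.append(score)
        if score > best + tol:
            best, stale = score, 0  # progress: reset the stall counter
        else:
            stale += 1
            if stale >= patience:
                break  # terminate the search early
    return best, len(history)


# Hypothetical best-accuracy trace that plateaus after the fourth generation.
scores = [0.90, 0.93, 0.95, 0.96, 0.96, 0.96, 0.96, 0.96, 0.96, 0.96]
best, generations_run = run_with_early_stop(
    lambda gen: scores[gen], max_generations=len(scores), patience=3
)
```

With these inputs the loop stops after seven of the ten generations, which is the mechanism by which early termination saves runtime without changing the best solution found.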

3.5. Interpretability Framework and Methods

After constructing a high-performance model for abandoned cropland identification, enhancing model interpretability is essential to increase its credibility and support policy adaptation. To explore how each input variable contributes to classification, we established a dual-perspective interpretability framework that incorporates both global and local explanations using two widely recognized tools: SHAP and LIME.

SHAP (SHapley Additive exPlanations): Based on game theory, SHAP computes each feature’s average marginal contribution across all possible feature combinations, ensuring a fair and consistent importance attribution. In this study, SHAP was used to rank the global importance of variables across all samples, providing insights into the dominant drivers and their relationship to the model’s decision process [37,38].
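To make the "average marginal contribution across all possible feature combinations" concrete, the sketch below computes exact Shapley values by enumerating every coalition. This brute-force approach is feasible only for a handful of features and is not how SHAP is typically computed for tree ensembles; the toy model, the zero baseline, and all names are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values; value_fn maps a frozenset of feature indices to a payoff."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Probability weight of a coalition of this size preceding feature i.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(x):
    """Hypothetical model with one additive term and one interaction term."""
    return 2.0 * x[0] + x[1] * x[2]


sample = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]


def value_fn(coalition):
    """Model output when only features in the coalition take the sample's values."""
    mixed = [sample[j] if j in coalition else baseline[j] for j in range(len(sample))]
    return toy_model(mixed)


phi = shapley_values(value_fn, n_features=3)
```

The additive feature receives its full effect, while the interaction term is split equally between the two interacting features; the attributions sum to the difference between the prediction for the sample and for the baseline.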

LIME (Local Interpretable Model-Agnostic Explanations): LIME focuses on local interpretability by perturbing the neighborhood of a specific sample and fitting a local linear model to approximate the behavior of the complex model around that instance. It reveals the locally dominant features that most influence the prediction result. This method addresses the limitations of global explanation methods in capturing sample-specific behavior [39,40].
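The perturb-and-fit idea can be sketched as a weighted linear regression around one sample. This is a simplified illustration and not the LIME library's procedure: it uses Gaussian perturbations, an exponential proximity kernel, and ordinary weighted least squares without feature selection or regularization. The black-box function and all parameter values are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def local_surrogate(predict, x0, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model to `predict` around x0.

    Returns (intercept, coefficients); the coefficients are the local explanation.
    """
    rng = random.Random(seed)
    d = len(x0)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in x0]  # perturbed neighbor
        dist2 = sum((a - b) ** 2 for a, b in zip(z, x0))
        rows.append([1.0] + z)  # leading 1.0 is the intercept column
        targets.append(predict(z))
        weights.append(math.exp(-dist2 / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(d + 1)]
         for i in range(d + 1)]
    b = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights))
         for i in range(d + 1)]
    beta = solve(A, b)
    return beta[0], beta[1:]


def black_box(z):
    """Hypothetical nonlinear classifier score standing in for the trained model."""
    return 1.0 / (1.0 + math.exp(-(3.0 * z[0] - 2.0 * z[1])))


intercept, coefs = local_surrogate(black_box, [0.5, 0.2])
```

The signs and magnitudes of `coefs` describe how the prediction responds to each feature near this particular sample, which may differ from the global ranking obtained from SHAP.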

The two approaches are complementary: SHAP provides macrolevel insights into discriminative variables, while LIME allows for microlevel interpretation and case-by-case analysis. Together, they contribute to a transparent, interpretable framework capable of deconstructing the “black-box” mechanisms of the model. Experimental procedures and results are detailed in Section 4.4.

4. Results and Analysis

4.1. Feature Optimization and Model Performance Improvement

To improve both classification accuracy and computational efficiency, we applied a two-pronged optimization strategy involving feature selection and multi-objective parameter tuning. At the feature level, the Recursive Feature Elimination with Cross-Validation (RFECV) algorithm was employed to extract 60 effective variables from the original feature set. These include key dimensions such as socioeconomic indicators, meteorological variables, and remote sensing imagery. Features such as population density, annual precipitation, and temperature seasonality exhibited strong and stable discriminative power across different regions. In contrast, features like aspect, which contributed little information and exhibited low variance, were automatically removed, suggesting limited marginal gain. In terms of model optimization, we compared three strategies: baseline XGBoost, NSGA-II-optimized XGBoost, and the joint optimization model that combined feature pruning with NSGA-II (Figure 5a). Results showed that the NSGA-II-optimized model outperformed the baseline in terms of accuracy, Kappa coefficient, recall, and F1-score, although it required substantial computational resources (runtime: 3270 s). The joint optimization model cut the runtime to 802 s (a reduction of roughly 75%) with only a marginal decline in F1-score (to 98.09%), while recall increased to 99.00%, demonstrating superior capability in comprehensively identifying abandoned cropland (Figure 5b).

Overall, the joint optimization strategy that integrates feature compression and parameter space exploration not only improves model performance but also effectively controls model complexity and computational cost. This provides a feasible pathway for scalable deployment in large-scale remote sensing applications.

A further spatial comparison between the 2020 cropland classification results generated by the three models and the national land survey dataset (Figure 6) reveals the following: Although the baseline XGBoost model achieves acceptable overall accuracy metrics, it demonstrates a relatively high misclassification rate in geomorphological edge zones and remote areas with limited accessibility. This leads to noticeable spatial fragmentation in the classification outputs.

In contrast, the pruning–NSGA-II-optimized model produces more spatially continuous results and exhibits significantly better boundary alignment with the land survey data, indicating stronger spatial adaptability. This improvement is primarily attributed to the dual optimization of hyperparameters and feature space enabled by the multi-objective evolutionary strategy, which substantially enhances the model’s capacity to accommodate interactions among high-dimensional features. Furthermore, it effectively suppresses overfitting in heterogeneous regions and boundary areas.

4.2. Generalization Capability and Independent Validation

Under five-fold cross-validation, the model exhibited stable performance on the training dataset (Figure 7). When applied to an independent test set, the optimized model achieved an overall accuracy of 0.9817, a Kappa coefficient of 0.9633, a recall of 0.9866, and an F1-score of 0.9817, demonstrating excellent generalization capability.
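For reference, the four reported metrics can all be derived from a binary confusion matrix, as in the sketch below. The counts used here are hypothetical and are not the study's confusion matrix, which is not given in this section.

```python
def binary_metrics(tp, fp, fn, tn):
    """Overall accuracy, Cohen's kappa, recall, and F1 from confusion-matrix counts."""
    n = tp + fp + fn + tn
    oa = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"OA": oa, "Kappa": kappa, "Recall": recall, "F1": f1}


# Hypothetical counts for a balanced 200-sample test set.
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
```

Recall is the metric most directly tied to omission errors: each missed positive sample (a false negative) lowers it, whereas false positives affect precision and therefore F1.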

The slight variation in the performance can be attributed to minor differences between the training and testing samples, which reflect the natural variability encountered in real-world applications. The proposed optimization strategy not only improves performance during training but also enhances the model’s discriminative capacity for unseen samples, particularly in reducing omission errors. Therefore, the pruning–NSGA-II-optimized model was ultimately selected as the final solution, as it successfully balances accuracy, generalizability, and practical deployability, offering a reliable methodological foundation for efficient abandoned cropland identification in large-scale applications.

4.3. Spatial Distribution of Abandoned Cropland and Validation in Representative Areas

Using the optimal model configured with pruning and NSGA-II multi-objective optimization, abandoned cropland mapping and spatial statistical analysis were conducted (Figure 8). Results indicate that the overall abandonment rate in the study area is 1.48%. Although the total extent is relatively limited, the distribution exhibits significant spatial heterogeneity and fragmentation, predominantly occurring in marginal areas characterized by steep slopes, poor accessibility, and low population density. This reflects strong constraints from topography and socioeconomic suitability for cultivation.

To further validate model performance, comparative assessments were conducted in representative high-risk zones for abandonment, including hillside remnants, village fringes, and terraced platforms at higher elevations. The model not only accurately replicated abandoned patches identified in the national land survey data but also captured additional scattered micro-abandonment plots. These typically occurred along parcel edges, unfavorable slope aspects, or in areas with limited soil moisture or long distances to roads.

From a remote sensing diagnostic perspective, these small-scale abandoned patches exhibited low spectral brightness, particularly in the green band, and were associated with poor road and population accessibility. This demonstrates the model’s high sensitivity in monitoring cropland status at fine spatial scales, especially for “micro-abandonment” conditions. The model thereby complements conventional surveys by improving granular spatial detection and provides data support for targeted land restoration planning.

The spatial distribution of abandonment further reflects differentiated driving mechanisms. On gentle slopes, abandonment is often associated with urban expansion and land conversion pressures, whereas steep slopes are constrained by high transportation costs and low cultivation profitability. Accessibility amplifies these effects: parcels near roads and village centers are generally maintained, while remote peripheral plots are more frequently abandoned, reflecting both infrastructural and demographic constraints. These patterns highlight that abandonment is jointly shaped by topography and socioeconomic accessibility rather than a single factor.

Apparent discrepancies with the Third National Land Survey (“san diao”) in remote mountainous areas are partly attributable to survey limitations: because of cost constraints, such regions are updated less frequently, so the survey likely underestimates actual abandonment. By contrast, our model identified more abandoned plots in these zones, suggesting that true abandonment may be higher than reported in official statistics. Nevertheless, rugged terrain with mixed land cover remains a challenge, and some misclassification persists due to spectral confusion.

From a policy perspective, micro-abandonment along parcel edges or terrace margins could be addressed through small-scale land remediation or technical support, whereas abandonment in inaccessible mountainous areas may require broader strategies such as labor transfer, livelihood diversification, and land consolidation programs. Moreover, the spatial patterns identified here suggest that abandonment risk is not static but may evolve with changes in infrastructure, population dynamics, and regional development strategies.

4.4. Interpretation of Driving Factors

4.4.1. Global Interpretation: SHAP Analysis

To understand the underlying drivers of abandonment, we employed SHAP (SHapley Additive exPlanations) to evaluate the global importance of each input variable (Figure 9). The top five features in terms of contribution were SAVG, B3_p25, Road, DEM, and Population, collectively reflecting a combination of spectral, topographic, accessibility, and socioeconomic drivers.

SAVG (mean image gray level): As a statistical brightness index derived from multi-band remote sensing imagery, SAVG captures the average spectral reflectance of surface features. It was identified as the most influential feature, indicating its strong role in distinguishing abandoned cropland. Abandoned plots often exhibit irregular brightness, coarser textures, and weakened reflectance, resulting in lower average gray values. SAVG effectively integrates multi-band gray-level signals and is particularly sensitive to degraded vegetation and long-term abandonment.

B3_p25 (25th percentile of green band reflectance): This feature represents low-end vegetation vigor and photosynthetic activity, corresponding to the weakest periods of vegetation growth. In areas with long-term abandonment, characterized by degraded crops or weed invasion, low green-band values become an important signal. This feature enables the model to detect spectral degradation trends during land use decline, which is especially useful for identifying seasonal or latent abandonment.

Road (distance to nearest road): Accessibility significantly affects land use decisions. A strong positive correlation was observed between distance from roads and likelihood of abandonment, indicating that infrastructure availability is a major constraint on sustained cultivation. This supports broader patterns observed in Southwestern China, where mountainous and remote areas show higher abandonment rates due to limited road networks.

DEM (elevation): Elevation serves as a key determinant of cropland utilization, particularly in boundary and marginal zones. Higher elevations are generally associated with reduced mechanization potential, inferior soil conditions, and increased cultivation costs, leading to decreased land use efficiency. DEM remained among the top features, highlighting its stable role in identifying vulnerable areas.

Population (population density): This feature captures human-driven dynamics not directly observable in remote sensing data. Areas with low population density were more prone to abandonment, consistent with trends of labor shortages, rural depopulation, and outmigration of agricultural laborers. As such, population density acts as a key socioeconomic proxy, reinforcing the role of human factors in shaping land use transitions.
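
A global ranking of this kind is commonly obtained by averaging the absolute SHAP value of each feature over all samples. The sketch below illustrates the idea with exact Shapley values computed by brute-force enumeration on a toy scoring function. It is an assumption-laden illustration, not the procedure used in this study: `toy_model`, its weights, the sample values, and the all-zero baseline are hypothetical, and enumeration of all coalitions is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    m = len(x)

    def value(coalition):
        mixed = [x[k] if k in coalition else baseline[k] for k in range(m)]
        return model(mixed)

    phi = []
    for i in range(m):
        others = [k for k in range(m) if k != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def global_importance(model, samples, baseline, names):
    """Rank features by mean absolute Shapley value across samples."""
    sums = [0.0] * len(names)
    for x in samples:
        for k, p in enumerate(shapley_values(model, x, baseline)):
            sums[k] += abs(p)
    means = [s / len(samples) for s in sums]
    return sorted(zip(names, means), key=lambda item: item[1], reverse=True)


# Toy stand-in for a classifier score; NOT the paper's fitted model.
def toy_model(v):
    savg, road, dem = v
    return -0.8 * savg + 0.5 * road + 0.1 * dem


names = ["SAVG", "Road", "DEM"]
samples = [[1.0, 2.0, 1.0], [2.0, 1.0, 3.0], [0.5, 0.5, 2.0]]
baseline = [0.0, 0.0, 0.0]
ranking = global_importance(toy_model, samples, baseline, names)
```

Note that the ranking uses absolute values, so it reflects the magnitude of a feature's influence and not its direction; the sign of the individual values indicates whether a feature pushes a prediction toward or away from abandonment.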

4.4.2. Interaction Effects: Variable Synergy Mechanisms

The SHAP interaction analysis (Figure 10) revealed significant nonlinear interactions among features, indicating the existence of synergistic decision mechanisms in the classification process.

The identification of abandoned cropland is not dominated by a single variable, but rather emerges from the collaborative influence of multiple factors. Specifically, the interactions of SAVG with B3_p25, Road, and Population suggest that the probability of abandonment increases markedly where image brightness is low, roads are distant, or population is sparse. Conversely, in well-connected or densely populated areas, the spectral contribution of SAVG diminishes.

A strong interaction was also observed between B3_p25 and DEM, particularly in mid-to-high elevation zones, where low reflectance in the green band exacerbates abandonment risks. In contrast, features like slope and precipitation primarily exert influence independently, with weaker synergistic effects.

These interactions substantially enhance the model’s ability to detect non-typical abandonment patterns, improving flexibility and generalization under complex environmental conditions. The synergistic mechanism aligns well with the diverse and heterogeneous spatial distribution characteristics of cropland abandonment.
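
To make the notion of a pairwise interaction concrete, the following sketch computes the Shapley interaction index between two features for a toy function containing one product term. It is a brute-force illustration under stated assumptions: `toy_model`, the input values, and the zero baseline are hypothetical, and the study's interaction values were obtained from the fitted model, not from this code.

```python
from itertools import combinations
from math import factorial


def shapley_interaction(model, x, baseline, i, j):
    """Shapley interaction index between features i and j (i != j)."""
    m = len(x)

    def value(coalition):
        mixed = [x[k] if k in coalition else baseline[k] for k in range(m)]
        return model(mixed)

    others = [k for k in range(m) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        weight = (factorial(size) * factorial(m - size - 2)
                  / (2 * factorial(m - 1)))
        for subset in combinations(others, size):
            s = set(subset)
            # Joint effect of i and j beyond the sum of their separate effects.
            delta = (value(s | {i, j}) - value(s | {i})
                     - value(s | {j}) + value(s))
            total += weight * delta
    return total


# Toy score with a single product term; hypothetical, not the XGBoost model.
def toy_model(v):
    savg, road, slope = v
    return savg * road + slope


x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]
savg_road = shapley_interaction(toy_model, x, baseline, 0, 1)
savg_slope = shapley_interaction(toy_model, x, baseline, 0, 2)
```

In this toy case the product term is credited entirely to the SAVG–Road pair, split equally between the two orderings, while the additive slope term yields a zero interaction, which mirrors the distinction drawn above between synergistic and independently acting features.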

4.4.3. Local Interpretation: LIME-Based Sample Response Analysis

To further examine the model’s interpretability at the individual sample level, we adopted the Local Interpretable Model-Agnostic Explanations (LIME) method. By generating perturbed samples and fitting local surrogate models, LIME captures the dynamic importance and contribution directions of dominant variables in each prediction, revealing the heterogeneity of global variable influence at the local scale.
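
The perturb-and-fit principle can be sketched as follows. This is a simplified illustration and not the LIME library or the configuration used in this study: the Gaussian perturbation scale, the exponential proximity kernel, the function names, and the linear `toy_model` are all assumptions chosen so that the expected surrogate coefficients are known in advance.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system a @ w = b by Gaussian elimination."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - rest) / aug[r][r]
    return w


def lime_like_explain(model, x, n_samples=500, scale=0.5, width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around sample x.

    Returns [intercept, coef_1, ..., coef_d] of the local linear model.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Perturb the sample and query the black-box model.
        z = [xi + rng.gauss(0.0, scale) for xi in x]
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        rows.append([1.0] + z)
        targets.append(model(z))
        # Nearby perturbations receive larger weights.
        weights.append(math.exp(-dist2 / width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y.
    p = len(x) + 1
    xtwx = [[sum(w * r[a] * r[b] for r, w in zip(rows, weights))
             for b in range(p)] for a in range(p)]
    xtwy = [sum(w * r[a] * y for r, y, w in zip(rows, targets, weights))
            for a in range(p)]
    return solve(xtwx, xtwy)


# Linear toy black box, so the surrogate should recover its slopes.
def toy_model(v):
    savg, road = v
    return 2.0 * savg - 3.0 * road


coefs = lime_like_explain(toy_model, [1.0, 0.5])
```

For a nonlinear classifier, the fitted coefficients depend on where the sample lies in feature space, which is why the same variable can receive different weights, or even opposite signs, in different samples.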

Based on LIME analyses of seven representative samples (Figure 11), the following insights were obtained:

(1): SAVG-dominated samples: In most cases, SAVG (mean image gray level) remains the primary or top-ranked explanatory variable, demonstrating strong, stable dominance across predictions. This finding is consistent with the global feature importance ranking from SHAP. These samples are typically located in areas with low reflectance and coarse textures, where LIME assigns significant positive contributions to SAVG. This confirms SAVG’s fundamental role in distinguishing abandonment-prone land, particularly when spectral characteristics of degradation are apparent.
(2): Socioeconomic-dominated samples: In some samples, the influence of SAVG decreases, while Road (distance to road) and Population (population density) emerge as the leading predictors. These cases often involve regions with low spectral brightness but high accessibility or population density. The model downweights the spectral features and instead emphasizes socioeconomic indicators, reflecting its adaptive capacity to contextual information. This adjustment helps prevent misclassification in atypical abandonment scenarios.
(3): Contribution direction heterogeneity: Notably, the same feature may exhibit opposite contribution directions across samples. For instance, a given variable may increase the predicted probability of abandonment in some samples, while suppressing it in others. This highlights that abandonment classification is a typical multi-feature collaborative decision process, in which the model dynamically constructs differentiated logic based on sample context.

This illustrates LIME’s unique advantage in uncovering the “black-box” decision processes of machine learning models at the microlevel, complementing SHAP’s global explanations. In summary, the LIME analysis confirms that, while SHAP provides robust global rankings and average impact estimates, the model exhibits highly heterogeneous local responses to input features. The direction and weight of each feature’s contribution may vary substantially across different contexts. This inherent “global trend vs. local variability” structure not only reflects the complexity of cropland abandonment detection but also underscores the adaptive flexibility of the pruning–NSGA-II–XGBoost model, particularly when dealing with boundary zones, noisy areas, and low-typicality samples. The combined use of LIME and SHAP allows for a comprehensive interpretability framework, spanning macrolevel patterns and microlevel decision logic, thereby enhancing model transparency, risk controllability, and policy support capabilities.

5. Discussion and Outlook

This study developed an abandoned cropland identification framework that integrates multi-source remote sensing data, feature pruning, NSGA-II-based parameter optimization, and the XGBoost classifier, further enhanced by a dual-path interpretability system utilizing SHAP and LIME [41,42]. In addition to Binchuan County, the proposed framework was validated in a second experimental area, Xundian County. The results showed excellent performance in cropland identification, indicating the framework’s potential for broader applications in cropland and land-use-type mapping. Notably, feature importance differed markedly between Xundian and Binchuan, underscoring the regional variability of key driving factors. Specifically, the top five features contributing to cropland identification in Xundian County were B4_p5, VH_p5, slope, EVI_p5, and B2_p5. Furthermore, synthetic aperture radar (SAR) features, particularly VV and VH, demonstrated strong predictive power: 9 of the 20 most important variables were derived from these bands. This highlights the significant role of radar-based information in mountainous and plateau regions and the need to tailor feature selection strategies to local conditions.

The approach offers notable advantages in terms of accuracy, efficiency, and model interpretability, and its outputs also hold practical value for land management and rural development. The high-resolution maps of abandoned cropland can help local governments and land resource agencies identify priority areas for restoration or conversion, reducing poorly targeted investment. Feature importance analysis can further guide targeted interventions (such as improving road access, irrigation, or soil management) in areas where abandonment is driven by terrain, infrastructure, or socioeconomic factors. Moreover, integrating these results with existing rural revitalization strategies can support evidence-based decision-making, enabling policymakers to allocate funds and resources more efficiently, design incentive programs for farmers, and monitor the effectiveness of land use policies over time.

5.1. Methodological Advantages

(1): High-performance and efficient identification framework: By applying NSGA-II to optimize XGBoost’s hyperparameter space and combining it with recursive feature elimination and pruning, the proposed model achieves high recall and strong generalization on the test set, while significantly reducing model complexity and improving both deployment efficiency and computational adaptability.
(2): Enhanced robustness through multi-source feature integration: By fusing remote sensing spectral features, terrain attributes, and socioeconomic indicators, the model constructs a semantically rich and spatially scalable feature set, improving its adaptability across diverse geomorphological types and land management regimes while maintaining classification stability [43,44,45].
(3): Simplified structure with flexible deployment: Through effective feature compression and parameter control, the final model achieves an optimal balance between computational load and recognition performance while retaining key variables. This structure is well-suited for integration into regional-scale cropland monitoring systems, supporting dynamic updates and long-term operation.
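
The balance between recognition performance and computational load described above rests on Pareto dominance, the ordering principle at the core of NSGA-II. The sketch below shows only the non-dominated sorting step; crowding distance, selection, crossover, and mutation are omitted. The two objectives (classification error and training time) and the four candidate settings are assumptions made for illustration and are not results from this study.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    error: float       # 1 - accuracy, to be minimized
    train_time: float  # seconds, to be minimized


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.error <= b.error and a.train_time <= b.train_time
    better = a.error < b.error or a.train_time < b.train_time
    return no_worse and better


def non_dominated_sort(pop):
    """Split a population into successive Pareto fronts (front 0 is best)."""
    remaining = list(pop)
    fronts = []
    while remaining:
        front = [c for c in remaining
                 if not any(dominates(o, c) for o in remaining)]
        fronts.append(front)
        remaining = [c for c in remaining if c not in front]
    return fronts


# Made-up objective values for four hypothetical hyperparameter settings.
population = [
    Candidate("A", error=0.020, train_time=40.0),
    Candidate("B", error=0.018, train_time=90.0),
    Candidate("C", error=0.025, train_time=95.0),
    Candidate("D", error=0.030, train_time=20.0),
]
fronts = non_dominated_sort(population)
```

Candidates on the first front represent different trade-offs, none of which can be improved on one objective without worsening the other; choosing a final model from this front is a separate decision that depends on the deployment requirements.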

5.2. Interpretability Architecture and Strengths

(1): SHAP for global variable attribution: SHAP analysis reveals the ranking of features in terms of importance in the classification process, demonstrating that variables such as mean image gray level (SAVG), low-percentile green-band reflectance (B3_p25), elevation (DEM), and road accessibility consistently play a significant role in identifying abandoned land. These insights contribute to building a reusable knowledge-based rule system.
(2): LIME for pixel-level heterogeneity analysis: Focusing on local neighborhoods at the pixel scale, LIME reveals marginal contributions and directionality of variables for individual samples. It enhances understanding of classification logic in complex boundary zones or rural–urban fringes, complementing SHAP’s global-level insights.
(3): Improved model transparency and policy usability: The dual-path SHAP–LIME interpretability framework makes the variable–response–spatial object relationships explicit, transitioning the model from a “black box” to a “gray box.” This significantly enhances the model’s auditability and credibility in practical scenarios such as land supervision, cropland consolidation zoning, and policy design.
(4): Verification of classification stability and cross-region consistency: SHAP and LIME consistently show similar variable response trends across multiple samples and regions, indicating strong stability and potential transferability of the model across environmental and geographic units.

5.3. Limitations and Challenges

(1): Spatial mismatch in socioeconomic data: Some socioeconomic features (e.g., population density, road accessibility) are derived from statistical yearbooks or grid-based interpolation with coarser resolution than remote sensing pixels. This mismatch can induce classification errors, especially at the village scale or in mountainous fringe areas, weakening interpretability at fine spatial scales. Although the exact magnitude of this error was not directly quantified in this study due to the lack of pixel-level socioeconomic ground-truth data, existing studies have demonstrated that improving and harmonizing data resolution can effectively reduce such uncertainties and enhance classification accuracy. This underscores the importance of incorporating higher-resolution socioeconomic datasets or employing more detailed survey data for cross-validation in future research [46,47].
(2): Lack of causal inference mechanisms: Current interpretability tools (e.g., SHAP, LIME) reveal associations rather than causal relationships [48]. Under correlated or noisy features, variable importance may be biased by confounding, reverse causality, or scale effects, meaning that signals such as VV/VH or NDVI could partly reflect consequences rather than true drivers. Differences in feature rankings across regions (e.g., Xundian vs. Binchuan) are therefore expected and should be interpreted cautiously. Future studies could integrate causal frameworks—such as temporal lags, panel data designs, matching/weighting, or instrumental-variable approaches—to identify more robust cause–effect pathways and provide stronger evidence for policy interventions.
(3): Limited performance in transitional boundary zones: In zones with blurred boundaries (e.g., cropland vs. forest or built-up land), similar remote sensing features lead to confused or missed classifications. This issue is particularly prominent in fragmented plots or areas with unclear land ownership. Higher-resolution imagery and auxiliary geospatial datasets (e.g., cadastral maps) are needed to enhance spatial boundary delineation [49,50,51], and future research could additionally explore multi-scale image fusion techniques to improve classification accuracy in transitional zones.
(4): Uncertainty introduced by multi-source data integration: Although the integration of optical, radar, and topographic data significantly improves model accuracy and robustness, it may also introduce uncertainties due to differences in spatial resolution, sensor characteristics, and acquisition times. Such discrepancies can create noise or bias in feature extraction and classification. Nevertheless, previous studies indicate that when properly harmonized, multi-source approaches generally outperform single-source models. Future work should further explore methods to quantify and reduce these uncertainties, such as advanced data fusion techniques and cross-regional validation.

5.4. Future Research Directions

(1): Building high-resolution socioeconomic data integration mechanisms: Incorporating frequent, fine-grained data such as land transaction records, rural e-commerce activity, or agricultural machinery trajectories can help reconstruct micro-scale socioeconomic maps, overcoming the coarse granularity and lag of traditional statistics and enhancing local-scale applicability and timeliness [52].
(2): Introducing causal inference and graph-based modeling: Future work may explore Structural Equation Modeling (SEM), Causal Forests, or Graph Neural Networks (GNNs) to model causal relationships and spatial dependencies among variables, thereby enabling mechanism reasoning and decision forecasting, and enriching the theoretical depth of abandonment modeling.
(3): Advancing transfer learning and regional adaptation mechanisms: Considering the variability in cropping systems, climate zones, and landforms, future models should incorporate domain adaptation, regional tuning, and model migration techniques to enable scaling from local to national levels, enhancing the model’s scalability and application potential.

6. Conclusions

This study addresses the challenges of improving accuracy and efficiency in cropland abandonment detection by proposing a machine learning framework that integrates multi-source remote sensing, recursive feature compression, and multi-objective evolutionary optimization. By combining NSGA-II with XGBoost, the framework enables both feature space reduction and hyperparameter tuning, significantly improving computational efficiency and engineering applicability while maintaining high classification performance. The experimental results demonstrate that the pruning–NSGA-II-optimized model achieves excellent recall and generalization on independent test sets, making it well-suited for practical monitoring scenarios demanding low omission rates and high coverage. The dual-layer interpretability system, built upon SHAP and LIME, provides strong support for transparent model logic, enabling reliable use in policy design and land supervision.

In summary, this study presents innovations in remote sensing data utilization, model optimization, and interpretability, offering a scalable, integrable, and explainable technical solution for large-scale cropland abandonment monitoring. The results can serve as intelligent tools for natural resource management agencies and provide a robust data-model foundation for agricultural revitalization and cropland protection policymaking.

Author Contributions

Conceptualization, S.G. and J.L.; methodology, S.G. and G.C.; validation, S.G. and G.C.; formal analysis, J.Z. and B.T.; investigation, G.C.; writing—original draft preparation, S.G. and G.C.; writing—review and editing, G.C. and J.Z.; supervision, J.Z. and B.T.; project administration, G.C. and J.Z.; funding acquisition, L.L. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Open Fund Program of Yunnan Key Laboratory of Intelligent Monitoring and Spatiotemporal Big Data Governance of Natural Resources (202449CE340023) and Major Science and Technology Project and Key Research and Development Program of Yunnan Province (202403ZC380001).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that this study received funding from Yunnan Institute of Geology and Mineral Surveying and Mapping Co., Ltd. The funder had the following involvement with the study: participation in the conceptualization of the research by its employee, J.Z.

References

1. Zhong, L.; Zhou, Y.; Gan, Y.; Lou, W. Land Resources, Green Energy, and Food Production: Perspective of Middle-Income Economies. Land Degrad. Dev. 2025, 36, 3786–3800.
2. Long, H. Land Consolidation: An Indispensable Way of Spatial Restructuring in Rural China. J. Geogr. Sci. 2014, 24, 211–225.
3. Zheng, W.; Li, S.; Ke, X.; Li, X.; Zhang, B. The Impacts of Cropland Balance Policy on Habitat Quality in China: A Multiscale Administrative Perspective. J. Environ. Manag. 2022, 323, 116182.
4. Liang, X.; Jin, X.; Han, B.; Sun, R.; Xu, W.; Li, H.; He, J.; Li, J. China’s Food Security Situation and Key Questions in the New Era: A Perspective of Farmland Protection. J. Geogr. Sci. 2022, 32, 1001–1019.
5. Liang, X.; Jin, X.; Dou, Y.; Zhang, X.; Li, H.; Wang, S.; Meng, F.; Tan, S.; Zhou, Y. Mapping Sustainability-Oriented China’s Cropland Use Stability. Comput. Electron. Agric. 2024, 219, 108823.
6. Jaquet, S.; Schwilch, G.; Hartung-Hofmann, F.; Adhikari, A.; Sudmeier-Rieux, K.; Shrestha, G.; Liniger, H.P.; Kohler, T. Does Outmigration Lead to Land Degradation? Labour Shortage and Land Management in a Western Nepal Watershed. Appl. Geogr. 2015, 62, 157–170.
7. Li, S.; Li, X.; Sun, L.; Cao, G.; Fischer, G.; Tramberend, S. An Estimation of the Extent of Cropland Abandonment in Mountainous Regions of China. Land Degrad. Dev. 2018, 29, 1327–1342.
8. Chaudhary, S.; Wang, Y.; Dixit, A.M.; Khanal, N.R.; Xu, P.; Fu, B.; Yan, K.; Liu, Q.; Lu, Y.; Li, M. A Synopsis of Farmland Abandonment and Its Driving Factors in Nepal. Land 2020, 9, 84.
9. Chen, H.; Tan, Y.; Xiao, W.; He, T.; Xu, S.; Meng, F.; Li, X.; Xiong, W. Assessment of Continuity and Efficiency of Complemented Cropland Use in China for the Past 20 Years: A Perspective of Cropland Abandonment. J. Clean. Prod. 2023, 388, 135987.
10. Mu, P.; Tian, F. Spatiotemporal Fusion of Multi-Temporal MODIS and Landsat-8/9 Imagery for Enhanced Daily 30 m NDVI Reconstruction: A Case Study of the Shiyang River Basin Cropland (2022). Remote Sens. 2025, 17, 1510.
11. Wang, X.; He, Y.; Zha, Y.; Chen, H.; Wang, Y.; Wu, X.; Ning, J.; Feng, A.; Han, S.; Luo, S. Mapping Winter Fallow Arable Lands in Southern China by Using a Multi-Temporal Overlapped Area Minimization Threshold Method. GISci. Remote Sens. 2024, 61, 2333587.
12. Yin, H.; Prishchepov, A.V.; Kuemmerle, T.; Bleyhl, B.; Buchner, J.; Radeloff, V.C. Mapping agricultural land abandonment from spatial and temporal segmentation of Landsat time series. Remote Sens. Environ. 2018, 210, 12–24.
13. Samadzadegan, F.; Toosi, A.; Dadrass Javan, F. A Critical Review on Multi-Sensor and Multi-Platform Remote Sensing Data Fusion Approaches: Current Status and Prospects. Int. J. Remote Sens. 2025, 46, 1327–1402.
14. Lu, D.; Su, K.; Wang, Z.; Hou, M.; Li, X.; Lin, A.; Yang, Q. Patterns and Drivers of Terrace Abandonment in China: Monitoring Based on Multi-Source Remote Sensing Data. Land Use Policy 2025, 148, 107388.
15. Wei, Z.; Gu, X.; Sun, Q.; Hu, X.; Gao, Y. Analysis of the Spatial and Temporal Pattern of Changes in Abandoned Farmland Based on Long Time Series of Remote Sensing Data. Remote Sens. 2021, 13, 2549.
16. Long, Y.; Sun, J.; Wellens, J.; Colinet, G.; Wu, W.; Meersmans, J. Mapping the Spatiotemporal Dynamics of Cropland Abandonment and Recultivation across the Yangtze River Basin. Remote Sens. 2024, 16, 1052.
17. Yang, Y.; Wu, Z.; Xiao, W.; Zhou, Y.; Huang, Q.; Wu, T.; Luo, J.; Wang, H. Abandoned Land Mapping Based on Spatiotemporal Features from PolSAR Data via Deep Learning Methods. Remote Sens. 2023, 15, 3942.
18. Wang, L.; Xiong, Q.; Tong, Z.; An, R.; Liu, Y.; Zhang, S. Exploring the Non-Linear Relations between the Cropland Expansion and Driving Factors in China. Reg. Environ. Change 2024, 24, 138.
19. Chen, H.; Tan, Y.; Xiao, W.; Xu, S.; Xia, H.; Ding, G.; Xia, H.; Prishchepov, A.V. Spatiotemporal Variation in Determinants of Cropland Abandonment across Yangtze River Economic Belt, China. Catena 2024, 245, 108326.
20. Zanaga, D.; Van De Kerchove, R.; Daems, D.; De Keersmaecker, W.; Brockmann, C.; Kirches, G.; Wevers, J.; Cartus, O.; Santoro, M.; Fritz, S.; et al. ESA WorldCover 10 m 2021 v200; European Space Agency: Paris, France, 2022.
21. Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global land use/land cover with Sentinel-2 and deep learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021.
22. Liu, Y.; Zhong, Y.; Ma, A.; Zhao, J.; Zhang, L. Cross-resolution national-scale land-cover mapping based on noisy label learning: A case study of China. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103265.
23. Neal, J.; Hawker, L. FABDEM V1-2; University of Bristol: Bristol, UK, 2023.
24. Hawker, L.; Uhe, P.; Paulo, L.; Sosa, J.; Savage, J.; Sampson, C.; Neal, J. A 30m global map of elevation with forests and buildings removed. Environ. Res. Lett. 2022, 17, 024016.
25. Peng, S. 1-km Monthly Precipitation Dataset for China (1901–2024); National Tibetan Plateau/Third Pole Environment Data Center: Beijing, China, 2020.
26. Peng, S. 1-km Monthly Mean Temperature Dataset for China (1901–2024); National Tibetan Plateau/Third Pole Environment Data Center: Beijing, China, 2019.
27. Ma, W.; Zhang, X.; Shen, Y.; Xie, J.; Zuo, G.; Zhang, X.; Jin, T. Incorporating Recursive Feature Elimination and Decomposed Ensemble Modeling for Monthly Runoff Prediction. Water 2024, 16, 3102.
28. Xiang, W.; Zhu, G.; Hou, Y.; Mei, Z.; Wan, L.; Zhang, L.; Yang, G.; Zu, J. Development and Validation of an Interpretable Machine Learning Model for Predicting Tic Disorders and Severity in Children Based on Electroencephalogram Data. IEEE Trans. Neural Syst. Rehabil. Eng. 2025, 33, 2416–2427.
29. Arif, M.; Rehman, F.U.; Sekanina, L.; Malik, A.S. A Comprehensive Survey of Evolutionary Algorithms and Metaheuristics in Brain EEG-Based Applications. J. Neural Eng. 2024, 21, 051002.
30. Song, H.; Zhou, H.; Wang, H.; Ma, Y.; Zhang, Q.; Li, S. Retrieval of Tree Height Percentiles over Rugged Mountain Areas via Target Response Waveform of Satellite Lidar. Remote Sens. 2024, 16, 425.
31. Liu, Z.; Chen, G.; Tang, B.; Wen, Q.; Tan, R.; Huang, Y. Regional Scale Terrace Mapping in Fragmented Mountainous Areas Using Multi-Source Remote Sensing Data and Sample Purification Strategy. Sci. Total Environ. 2024, 925, 171366.
32. Zhen, J.; Mao, D.; Shen, Z.; Zhao, D.; Xu, Y.; Wang, J.; Jia, M.; Wang, Z.; Ren, C. Performance of Xgboost Ensemble Learning Algorithm for Mangrove Species Classification with Multisource Spaceborne Remote Sensing Data. J. Remote Sens. 2024, 4, 0146.
33. Shao, Z.; Ahmad, M.N.; Javed, A. Comparison of Random Forest and Xgboost Classifiers Using Integrated Optical and Sar Features for Mapping Urban Impervious Surface. Remote Sens. 2024, 16, 665.
34. Yang, Y.; Meng, Z.; Zu, J.; Cai, W.; Wang, J.; Su, H.; Yang, J. Fine-Scale Mangrove Species Classification Based on Uav Multispectral and Hyperspectral Remote Sensing Using Machine Learning. Remote Sens. 2024, 16, 3093.
35. Ren, S.; Zhu, J.; Cheng, Z.; Wang, X.; Tong, Z.; Qiu, L. Multi-Objective Optimization of Fan-Shaped Film Hole for Balancing Heat Transfer and Thermal Stress Based on ANN and NSGA-II. Int. Commun. Heat Mass Transf. 2025, 165, 109045.
36. Mosa, M.A. Optimizing Text Classification Accuracy: A Hybrid Strategy Incorporating Enhanced NSGA-II and XGBoost Techniques for Feature Selection. Prog. Artif. Intell. 2025, 14, 275–299.
37. Wang, H.; Liang, Q.; Hancock, J.T.; Khoshgoftaar, T.M. Feature Selection Strategies: A Comparative Analysis of SHAP-Value and Importance-Based Methods. J. Big Data 2024, 11, 44.
38. Liu, Y.; Liu, Z.; Luo, X.; Zhao, H. Diagnosis of Parkinson’s Disease Based on SHAP Value Feature Selection. Biocybern. Biomed. Eng. 2022, 42, 856–869.
39. Hung, Y.-H.; Lee, C.-Y. BMB-LIME: LIME with Modeling Local Nonlinearity and Uncertainty in Explainability. Knowl.-Based Syst. 2024, 294, 111732.
40. Meng, H.; Wagner, C.; Triguero, I. SEGAL Time Series Classification—Stable Explanations Using a Generative Model and an Adaptive Weighting Method for LIME. Neural Netw. 2024, 176, 106345.
41. Donmez, T.B.; Kutlu, M.; Mansour, M.; Yildiz, M.Z. Explainable AI in Action: A Comparative Analysis of Hypertension Risk Factors Using SHAP and LIME. Neural Comput. Appl. 2025, 37, 4053–4074.
42. Alabi, R.O.; Elmusrati, M.; Leivo, I.; Almangush, A.; Mäkitie, A.A. Machine Learning Explainability in Nasopharyngeal Cancer Survival Using LIME and SHAP. Sci. Rep. 2023, 13, 8984.
43. Li, Y.; Chang, C.; Wang, Z.; Li, T.; Li, J.; Zhao, G. Identification of Cultivated Land Quality Grade Using Fused Multi-Source Data and Multi-Temporal Crop Remote Sensing Information. Remote Sens. 2022, 14, 2109.
44. Zhang, X.; Qin, C.; Ma, S.; Liu, J.; Wang, Y.; Liu, H.; An, Z.; Ma, Y. Study on the Extraction of Topsoil-Loss Areas of Cultivated Land Based on Multi-Source Remote Sensing Data. Remote Sens. 2025, 17, 547.
45. Yang, L.; Sun, Q.; Gui, R.; Hu, J. Monitoring of Cropland Non-Agriculturalization Based on Google Earth Engine and Multi-Source Data. Appl. Sci. 2025, 15, 1474.
46. Jia, X.; Hao, Z.; Shi, L.; Wang, Z.; Chen, S.; Du, Y.; Ling, F. Super-resolution cropland mapping with Sentinel-2 images based on a self-training learning network. Remote Sens. Lett. 2024, 15, 1143–1152.
47. Li, R.; Gao, X.; Shi, F.; Zhang, H. Scale effect of land cover classification from multi-resolution satellite remote sensing data. Sensors 2023, 23, 6136.
48. Carloni, G.; Berti, A.; Colantonio, S. The Role of Causality in Explainable Artificial Intelligence. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2025, 15, e70015.
49. Xu, X.; Li, D.; Liu, H.; Zhao, G.; Cui, B.; Yi, Y.; Yang, W.; Du, J. Comparative Validation and Misclassification Diagnosis of 30-Meter Land Cover Datasets in China. Remote Sens. 2024, 16, 4330.
50. Cui, Y.; Liu, R.; Li, Z.; Zhang, C.; Song, X.-P.; Yang, J.; Yu, L.; Chen, M.; Dong, J. Decoding the Inconsistency of Six Cropland Maps in China. Crop J. 2024, 12, 281–294.
51. Zhang, F.; Wang, X.; Xin, L.; Li, X. Assessing the Accuracy and Consistency of Cropland Datasets and Their Influencing Factors on the Tibetan Plateau. Remote Sens. 2025, 17, 1866.
52. Wu, N.; Yan, J.; Liang, D.; Sun, Z.; Ranjan, R.; Li, J. High-Resolution Mapping of GDP Using Multi-Scale Feature Fusion by Integrating Remote Sensing and POI Data. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103812.

Figure 1. Location of the study area and sampling points of cropland.

Figure 2. Field survey and abandoned cropland judgement (green: cropland; orange: non-cropland; red: abandoned cropland; white: non-abandoned cropland). (a) Field survey of cropland; (b) field survey of abandoned cropland; (c) process for the judgement of abandoned cropland.

Figure 3. Overall flow chart.

Figure 4. RFECV feature-selection curve.

Figure 5. Comparison of results from different models: (a) accuracy; (b) training time.

Figure 6. Cropland distribution maps for 2020 based on (a) the Third National Land Survey; (b) XGBoost; (c) pruning–NSGA-II-XGBoost; (d) NSGA-II-XGBoost.

Figure 7. Evolution of metrics per generation.

Figure 8. Model-derived abandoned cropland identification and validation with typical plots.

Figure 9. SHAP values of the features.

Figure 10. Feature SHAP interaction value. Note: The horizontal axis shows the model prediction values; the top horizontal and left vertical axes indicate feature names; the vertical bar on the right represents the SHAP value distribution, reflecting each feature’s contribution to the prediction.

Figure 11. LIME contribution values of the features.

Table 1. Data types and sources.

Data Types	Data Name	Resolution	Year	Data Sources
Sentinel-2	Blue (B2)	10 m	2020–2022	https://developers.google.cn/earth-engine/datasets/catalog/sentinel-2?hl=zh-cn (accessed on 4 July 2025).
	Green (B3)	10 m
	Red (B4)	10 m
	Red edge 1 (B5)	20 m
	Red edge 2 (B6)	20 m
	Red edge 3 (B7)	20 m
	Near Infrared (B8)	10 m
	Narrow Near Infrared (B8A)	20 m
	Short-Wave Infrared 2 (B11)	20 m
	Short-Wave Infrared 3 (B12)	20 m
Sentinel-1	VV, VH	10 m		Sentinel-1 SAR GRD: C-band Synthetic Aperture Radar Ground Range Detected, log scaling (Earth Engine Data Catalog, Google for Developers) (accessed on 6 July 2025).
LULC	ESA	10 m		https://developers.google.com/earth-engine/datasets/catalog/ESA_WorldCover_v100 (accessed on 4 July 2025).
	ESRI	10 m		https://www.arcgis.com/home/item.html?id=cfcb7609de5f478eb7666240902d4d3d (accessed on 8 July 2025).
	CRLC	10 m		https://github.com/LiuGalaxy/CRLC?tab=readme-ov-file (accessed on 11 July 2025).
Topographic data	DEM	30 m	2022	https://doi.org/10.5523/bris.s5hqmjcdj8yo2ibzi9b4ew3sn (accessed on 6 July 2025).
Socioeconomic data	Road, River	1 km	2020–2022	Third National Land Survey
Socioeconomic data	Population	1 km		ORNL LandScan Viewer—Oak Ridge National Laboratory (accessed on 12 July 2025).
Meteorological data	Temperature	1 km		https://doi.org/10.5281/zenodo.3114194 (accessed on 1 July 2025).
Meteorological data	Precipitation	1 km		https://doi.org/10.5281/zenodo.3114194 (accessed on 1 July 2025).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
