1. Introduction
The accelerated urbanization and industrialization in developing countries such as China have intensified the process of cropland non-agriculturalization (CNA) [1,2,3,4,5,6,7]. CNA refers to the transformation of cropland into non-agricultural land uses driven by economic expansion, population growth, policy orientation, and land-use restructuring, and is characterized by strong spatial heterogeneity and multi-factor coupling. As a core issue in global urban and industrial expansion, CNA has become the focus of interdisciplinary research across geography, land science, ecology, and regional planning. Previous studies have widely acknowledged that CNA emerges from the interaction between macro-level development strategies and micro-level land conversion behaviors, exhibiting strong spatial clustering and diffusion characteristics.
In rapidly growing metropolitan regions, fiscal dependence on land development, low agricultural returns, and urban spillover jointly accelerate CNA. Central cities often exert a siphoning effect, attracting population and capital while displacing surrounding cropland. Typical patterns have been observed in Jakarta’s Jabotabek region [8], Guangzhou [9], and Zhejiang Province [10]. In contrast, developed countries, although they experienced large-scale cropland conversion during industrialization, now rely more on legal regulation and market-based instruments such as zoning control, tax incentives, and farmland protection policies to moderate land-use transitions. Nevertheless, CNA pressures continue to persist globally due to ongoing economic growth, population concentration, and complex policy implementation environments. These processes exert profound impacts on food security, ecosystem services, and rural livelihoods [11,12,13]. In China, cropland conversion from 1990 to 2020 reached annual averages of 1520.60 km² to urban land, 1464.60 km² to rural residential land, and 987.44 km² to other construction land [14]. Despite the implementation of cropland balance and permanent farmland protection policies, both total and per capita cropland areas have continued to decline. Beyond food security, CNA is closely associated with soil degradation, biodiversity loss, and landscape fragmentation, while recent studies have revealed a strong spatial correlation between cropland conversion and carbon emissions, further amplifying its ecological significance [15,16].
To address these challenges, extensive efforts have applied remote sensing, GIS, machine learning, and spatiotemporal modeling techniques to monitor CNA patterns and identify driving mechanisms. In the domain of remote sensing, multi-source optical and SAR data integrated via platforms such as Google Earth Engine have substantially improved cropland mapping and change detection [17,18,19]. High-frequency vegetation index-based methods, such as seasonal NDVI differencing, have demonstrated strong performance in detecting abandoned cropland in mountainous and fragmented landscapes [20]. At the algorithmic level, traditional GIS-based spatial analyses, including kernel density estimation and hot spot detection, have been used to characterize broad spatial patterns of CNA. Spatial econometric models further quantify both direct and spillover effects of socioeconomic and policy drivers [1,2,21]. In recent years, machine learning and deep learning methods have demonstrated superior performance in extracting high-dimensional features and improving predictive accuracy. The XGBoost-SHAP framework reveals the dominant role of socioeconomic variables in driving CNA [22], while semantic segmentation models based on Vision Transformers and ChangeFormers significantly enhance abandoned cropland detection [23]. Dynamic convolution (FADConv) and frequency-based attention mechanisms have achieved a balance between accuracy and computational efficiency in mapping CNA using high-resolution imagery [24]. PSO-optimized XGBoost models have also been applied to cropland degrainization susceptibility mapping [25]. More recently, integrated “Remote Sensing–GIS–Machine Learning” frameworks have been developed to combine multi-scale indicators and simulate future scenarios [26,27,28].
Despite these methodological advances, most existing approaches remain constrained by several structural limitations. NDVI-based methods are sensitive to vegetation sparsity and image quality [20], deep learning models require large volumes of annotated data and often lack transferability [23,29,30], and spatial econometric models struggle to capture nonlinear and multiscale relationships [1,21]. More importantly, four critical challenges persist: limited interpretability, insufficient temporal resolution for regulatory intervention, lack of regulatory integration, and the unresolved trade-off between accuracy and operational cost. These constraints reduce the practical value of many CNA models for land-use governance.
Graph-based modeling provides a natural framework to address these challenges by explicitly encoding spatial relationships and enabling joint modeling of structure and attributes. Graphs have been widely adopted in complex relational systems [30,31,32] and micro-scale scientific domains [33,34]. Graph neural networks (GNNs) extend this paradigm by enabling end-to-end representation learning on graph-structured data [35,36,37,38]. Among them, Kipf et al. [39] proposed Graph Convolutional Networks (GCNs), which applied spectral convolutions to graph data. GraphSAGE [40] supports scalable neighborhood aggregation, while Graph Attention Networks (GAT) introduce adaptive weighting mechanisms to capture heterogeneous spatial influences [41,42]. These properties make GNNs particularly suitable for modeling CNA, which inherently involves spatial proximity, cross-parcel interaction, and multi-source drivers. Spatiotemporal GNNs further enhance the representation of dynamic geographic systems [43,44,45,46,47,48,49]. Nevertheless, the application of GNNs to CNA susceptibility prediction remains limited.
To address the above limitations, this study proposes a unified CNA susceptibility prediction framework that integrates multi-source temporal remote sensing data with a spatially structured Graph Attention Network. Each node represents a block-scale land parcel, and edges are defined jointly by spatial adjacency and attribute similarity, embedding the causal chain from geographic constraints to construction behavior and spectral response. Multi-head attention enables adaptive aggregation of heterogeneous neighborhood features, supporting annual-scale, low-cost, and interpretable prediction. This framework enables meso-scale assessment and provides a practical tool for proactive farmland protection and refined land-use supervision.
3. Methods
To predict CNA susceptibility, this study proposes a three-stage hybrid methodological framework that integrates domain-specific knowledge and graph-based deep learning (Figure 2). The approach comprises the following components. First, during the data preparation stage, CNA susceptibility levels are quantified and a spatial graph structure is constructed. In the susceptibility quantification phase, a Comprehensive Intensity of Cropland Non-agriculturalization index is developed based on socioeconomic, natural, and policy-related indicators. This index is calculated at the node level and subsequently categorized into four levels of CNA susceptibility, namely none, low, medium, and high, via a combination of expert visual interpretation and the natural breaks classification method. Concurrently, the spatial graph structure is constructed by treating block-level units as nodes and defining edges based on spatial adjacency and attribute similarity. The resulting heterogeneous graph is labeled with the corresponding susceptibility level. Second, during the model development stage, the data are divided into training and testing sets at a 7:3 ratio. To address the issue of class imbalance, a GraphSMOTE-based over-sampling strategy is applied to enhance minority class representation during training. GAT serves as the backbone model, implemented using the PyTorch framework. Finally, during the evaluation and interpretation stage, model performance is assessed using Micro-F1 and Area Under the Curve (AUC) metrics. In addition to standard accuracy evaluation, a relaxed criterion is introduced, where predictions within ±1 susceptibility level of the ground truth are also considered acceptable.
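The relaxed criterion can be illustrated with a minimal sketch. The function name `relaxed_accuracy` and the toy labels are illustrative assumptions, not taken from the study; levels are encoded 0 to 3, matching the four susceptibility classes, and the sketch reports a relaxed accuracy rather than the relaxed F1 quoted elsewhere in the paper.

```python
def relaxed_accuracy(y_true, y_pred, tolerance=1):
    """Share of predictions that fall within `tolerance` levels of the true level."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy labels (hypothetical): 0 = none, 1 = low, 2 = medium, 3 = high.
y_true = [0, 1, 2, 3, 3, 0]
y_pred = [0, 2, 2, 1, 3, 1]

strict = relaxed_accuracy(y_true, y_pred, tolerance=0)   # exact matches only
relaxed = relaxed_accuracy(y_true, y_pred, tolerance=1)  # off-by-one accepted
```

With `tolerance=0` the function reduces to ordinary accuracy, so the two values can be compared directly to see how many errors are near misses between adjacent levels.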
3.1. Evaluation of Cropland Non-Agriculturalization Susceptibility
Most existing studies measure CNA intensity using conversion rates. However, this study posits that CNA is a multifaceted process not solely characterized by quantitative changes, but also shaped by cropland quality, ecological integrity, economic benefits, and restoration feasibility. Accordingly, the concept of Comprehensive Intensity of Cropland Non-agriculturalization is introduced, and an evaluation matrix is constructed to capture the multidimensional impacts of CNA across economic, ecological, and social dimensions (Table 2).
Land cover categories are reclassified into six types (cropland, water, forest, grassland, construction, and unused land), which results in five specific cropland conversion trajectories. The comprehensive intensity is evaluated across four dimensions: cropland quality, ecosystem impact, economic benefit, and difficulty of reclamation. The evaluation matrix is based on a 7-point Likert scale (0: no change, ±1: negligible decline/improvement, ±2: moderate impact, ±3: significant impact) and follows the environmental index evaluation method, where +3 indicates strongly adverse impacts and −3 denotes strong positive effects, consistent with ecological compensation frameworks. Taking the cropland-construction conversion type as an example, the scores for cropland quality, ecosystem impact, and reclamation difficulty are all +3, indicating severe negative effects: significant quality loss, ecological degradation, and high reclamation difficulty. Its economic benefit score of −3 suggests considerable economic gains that partly offset these adverse impacts. The scoring basis reflects expert judgment calibrated with empirical knowledge of land-use transition processes. The weights were determined on the basis of prior studies on land use and cropland, which prioritize long-term food security and ecological sustainability over short-term economic gains and emphasize the fundamental importance of cropland quality and ecosystem integrity in assessing CNA impacts [18,21,25,28]. The scoring criteria for each transition type and evaluation dimension are summarized in Table A1.
The composite intensity score for each spatial unit is calculated by aggregating the scores across all land conversion types and their respective dimensions, weighted by the importance of each dimension. Specifically, land use data from two adjacent years are compared to compute the annual cropland non-agriculturalization rate for each type of conversion within each unit. The final score for each unit is derived by multiplying the normalized CNA rate by the weighted impact score from the evaluation matrix. This process allows for a comprehensive and time-sensitive quantification of CNA intensity. In this calculation, i denotes the type of cropland non-agriculturalization; j denotes the evaluation dimension in the scoring matrix; a composite coefficient is assigned to each CNA type; the cropland non-agriculturalization rate of type i is measured from year t − 1 to year t; the result is the comprehensive intensity of cropland non-agriculturalization for the current spatial unit; and ∗ indicates that a value has been normalized. The constant term (+1) in Formula (3) plays a critical role in maintaining both numerical stability and physical interpretability. First, it guarantees that any cropland loss contributes a baseline value to the total intensity, preventing neutralized transitions from being ignored. Second, by transforming the multiplier into the range [1, 2], the qualitative severity becomes an enhancement factor, allowing the framework to distinguish scale-driven from impact-driven cropland non-agriculturalization. To enhance the interpretability of the proposed formulation, a numerical example is provided in Appendix B.
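The following sketch shows one way the calculation described above could be arranged. It is an assumed reading, not the study's formula: the dimension weights, the rescaling of the weighted Likert score from [−3, 3] to [0, 1], the second row of the score matrix, and all function names are hypothetical. Only the cropland-construction scores (+3, +3, −3, +3) and the idea of a multiplier in the range [1, 2] come from the text.

```python
# Dimension order: cropland quality, ecosystem impact, economic benefit,
# reclamation difficulty. HYPOTHETICAL weights that sum to 1.
WEIGHTS = (0.3, 0.3, 0.2, 0.2)


def composite_coefficient(scores, weights=WEIGHTS):
    """Weighted Likert score, rescaled from [-3, 3] to [0, 1] (assumed normalization)."""
    raw = sum(w * s for w, s in zip(weights, scores))
    return (raw + 3.0) / 6.0


def comprehensive_intensity(norm_rates, score_matrix):
    """Sum over conversion types of normalized rate * (1 + coefficient).

    `norm_rates` maps a conversion type to a rate already normalized to [0, 1].
    The (1 + coefficient) multiplier lies in [1, 2], as described in the text.
    """
    return sum(
        rate * (1.0 + composite_coefficient(score_matrix[kind]))
        for kind, rate in norm_rates.items()
    )


score_matrix = {
    "cropland->construction": (3, 3, -3, 3),  # scores quoted in the text
    "cropland->forest": (1, -2, 0, 1),        # invented for illustration
}
unit_rates = {"cropland->construction": 0.5, "cropland->forest": 0.1}
intensity = comprehensive_intensity(unit_rates, score_matrix)
```

Under these assumed weights, a conversion with zero weighted impact still contributes its full normalized rate, which is the baseline behaviour the constant term is meant to provide.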
After the comprehensive intensity is calculated for each spatial unit, the natural breaks method is used to categorize it into four susceptibility levels: none (0), low (1), medium (2), and high (3). These preliminary levels are further refined through visual interpretation using historical imagery from Google Earth, to correct potential misclassifications in land cover data. Units with low intensity but rapid cropland transition processes are manually upgraded to higher susceptibility classes. Figure 3 outlines the visual interpretation rules corresponding to different susceptibility levels. For example, low-susceptibility areas exhibit surface hardening or bare land, while medium-susceptibility areas contain scattered factories or low-rise buildings. High-susceptibility zones are characterized by extensive non-agricultural built-up areas replacing cropland. To minimize subjectivity, two independent analysts refined the preliminary levels derived from the intensity results using historical imagery from Google Earth. Inter-observer agreement reached 85%, indicating substantial consistency.
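As an illustration of the natural breaks step, the sketch below implements a small Fisher-Jenks style dynamic program that chooses class boundaries minimizing the within-class squared deviation. It is a generic textbook version written for clarity, not the tool used in the study; the function names and the sample scores are assumptions.

```python
import bisect


def natural_breaks(values, k):
    """Return the upper bound of each of k classes (exact search, O(k * n^2))."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Prefix sums give the squared deviation of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for xs[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j], cut[c][j] = cand, i
    # Walk the stored cut points backwards to recover the class upper bounds.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(xs[j - 1])
        j = cut[c][j]
    return bounds[::-1]


def to_level(value, bounds):
    """Map a score to a class index 0..k-1 using the class upper bounds."""
    return min(bisect.bisect_left(bounds, value), len(bounds) - 1)


scores = [0.0, 0.01, 0.02, 0.30, 0.32, 0.60, 0.63, 0.95, 1.00]  # hypothetical
bounds = natural_breaks(scores, 4)
levels = [to_level(s, bounds) for s in scores]
```

The resulting levels are only the preliminary classes; the manual refinement by visual interpretation described above is a separate step that the sketch does not attempt to model.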
3.2. Construction of the Cropland Non-Agriculturalization Feature System
Currently, there is no consensus regarding the determinants of cropland non-agriculturalization susceptibility. Based on the intrinsic mechanisms of CNA, this study systematically constructs a multi-dimensional feature system that integrates natural attributes, landscape structure, human activity, and spatiotemporal dynamics. Features relevant to CNA susceptibility are derived from four primary dimensions: imagery, land cover, topography, and socioeconomics. The rationale behind each feature type and its explanatory power in the CNA process is discussed in detail. As shown in Figure 4, the proposed four-dimensional feature system balances natural constraints and anthropogenic disturbances, incorporates both static background and dynamic signals, and provides a robust data foundation for subsequent modeling and mechanism interpretation.
3.2.1. Imagery
Remote sensing imagery is the most direct data source for monitoring surface cover. Spectral and texture information reflect, respectively, the physical attributes and spatial configuration of land features. Therefore, both spectral and texture characteristics are used as imagery features. On the one hand, different land cover types exhibit significant spectral separability in visible–NIR wavelengths; for example, healthy vegetation reflects strongly in the near-infrared band, whereas construction land shows markedly lower reflectance. Thus, mean band values are selected as spectral features to distinguish cropland from built-up areas based on inherent radiometric responses without prior knowledge. On the other hand, cropland non-agriculturalization is often accompanied by either homogenization (e.g., contiguous factory roofs) or fragmentation (e.g., scattered sheds) of land surface patterns. Gray-Level Co-occurrence Matrix (GLCM) metrics are used to quantify these structural changes [50]. Four indicators, namely contrast (texture sharpness), entropy (complexity), angular second moment (uniformity), and inverse difference moment (local homogeneity), are computed in multiple directions to capture texture variation caused by human activities, compensating for the limited spatial sensitivity of spectral features.
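The four GLCM indicators can be made concrete with a small pure-Python sketch. It uses the common textbook definitions of contrast, entropy, angular second moment, and inverse difference moment on a symmetric, normalized co-occurrence matrix with a one-pixel offset; the quantization, offsets, and function names are assumptions and need not match the study's processing chain.

```python
import math

# One-pixel offsets (dx, dy): horizontal, vertical, and the two diagonals.
OFFSETS = [(1, 0), (0, 1), (1, 1), (1, -1)]


def glcm(image, dx, dy, levels):
    """Symmetric, normalized co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count the pair in both directions
    total = sum(map(sum, counts))  # assumes the image is at least 2 x 2
    return [[v / total for v in row] for row in counts]


def texture_metrics(p):
    """Contrast, entropy, ASM and IDM of a normalized co-occurrence matrix."""
    contrast = entropy = asm = idm = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            asm += v * v
            idm += v / (1 + (i - j) ** 2)
            if v > 0:
                entropy -= v * math.log(v)
    return {"contrast": contrast, "entropy": entropy, "asm": asm, "idm": idm}


def mean_texture(image, levels):
    """Average each metric over the four directions."""
    per_dir = [texture_metrics(glcm(image, dx, dy, levels)) for dx, dy in OFFSETS]
    return {k: sum(d[k] for d in per_dir) / len(per_dir) for k in per_dir[0]}


uniform = [[1, 1], [1, 1]]  # homogeneous patch, e.g. a contiguous roof
checker = [[0, 1], [1, 0]]  # alternating patch, e.g. fragmented cover
uniform_tex = mean_texture(uniform, levels=2)
checker_tex = mean_texture(checker, levels=2)
```

The homogeneous patch gives zero contrast and maximal ASM, while the alternating patch raises contrast and entropy, which mirrors the homogenization versus fragmentation distinction drawn in the text.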
3.2.2. Land Cover
Land cover data not only represent land classification but also imply human intentions and spatial interaction patterns. This study combines static semantic features and dynamic NDVI time-series features to construct land cover indicators. Traditional categorical encoding is insufficient to represent functional associations. To address this, an analogy with natural language is adopted, namely “region as document”, “land type as word”, and “land sequence as sentence”, to build a land category corpus. Using Word2Vec, each land category is embedded into a high-dimensional semantic space to capture potential transition logics embedded in spatial contexts [51]. Weighted averaging and Principal Component Analysis (PCA) are applied to derive semantic features of each region, preserving contextual meaning while unifying data across multiple spatial scales. Furthermore, CNA frequently exhibits seasonal differences in surface exposure, manifested as distinct NDVI curves across spring tillage, summer growth, autumn harvest, and winter abandonment. Quarterly NDVI statistics are therefore used to identify permanently abandoned, seasonally fallowed, or transitional cropland, thereby mitigating misclassification risks from static land cover data.
3.2.3. Topography
Topography indirectly influences CNA through development cost and land suitability. Plains, hills, and mountainous areas exhibit substantial variation in slope, aspect, and elevation. Flat areas offer low development costs and are more susceptible to urban or industrial encroachment. South-facing terraced cropland, although highly efficient for farming, may be repurposed for tourism because of its scenic value. Hilly and mountainous areas often have fragmented, marginal cropland that may be abandoned or passively converted under policy interventions. Thus, slope, aspect, elevation range, and coefficient of variation are selected to characterize topographic constraints on land use change.
3.2.4. Socioeconomy
Socioeconomic factors are direct drivers of CNA, and their spatial heterogeneity governs the probability of land-use transformation. Transport accessibility is one of the most critical factors influencing CNA and is a leading indicator of urban expansion. Road network density and bus stop counts jointly reflect agricultural logistics efficiency and the accessibility of construction land, thus influencing the location of urban expansion and non-agricultural projects. Cropland-to-construction is the most common CNA pathway. Higher building density and expanded built-up area indicate stronger land development intensity and increased risk of surrounding cropland loss. Economic development and industrial upgrading, especially the expansion of secondary industries, are tightly linked to the conversion of cropland to non-agricultural uses. High levels of economic activity further exacerbate land use competition. Population totals and densities influence CNA through housing, employment, and infrastructure demands, and spatial population distribution helps locate urbanization hotspots. Therefore, four indicators, consisting of transport accessibility, building density, economic vitality, and population pressure, are selected to represent the socioeconomic dimension of CNA drivers.
3.3. Graph Model Construction
3.3.1. Node and Edge Construction
Geographical entities, characterized by fixed spatial relationships and dynamically changing attribute features, are inherently well-suited for graph-based modeling. A graph structure consists of nodes and edges, in which nodes represent the smallest spatial unit of analysis. In geographical research, the spatial scale determines the extent and granularity of the study area. Within the context of cropland non-agriculturalization (CNA), macro-scale studies often focus on urban or provincial patterns, benefiting from higher data accessibility, whereas micro-scale studies emphasize spatial distribution and transitions at the grid-cell level, which are advantageous for understanding the underlying drivers and mechanisms.
This study adopts a meso-scale representation based on street-block units to construct the spatial graph structure for CNA prediction. The rationale is threefold: (1) As a meso-scale spatial unit, street blocks can capture complex interactions across adjacent units, which reflect spatial spillover effects. For instance, extensive non-agricultural development within a given block may accelerate CNA processes in its neighboring blocks, a pattern that can be effectively captured through graph edge connections. (2) Compared to raster-based representations, block-level graph structures are better suited for integrating multi-temporal and multi-resolution heterogeneous datasets, and for expressing the complex relationships inherent in CNA processes. (3) Street blocks also serve as fundamental administrative units in urban planning and management. Thus, predictions made at this scale are more aligned with practical land-use decision-making, enabling early identification and prioritization of high-risk areas for policy intervention.
To ensure consistent delineation of street-block units across heterogeneous urban-rural contexts, the road network extracted from OpenStreetMap was first refined through manual correction using Sentinel-2 and Google Earth high-resolution imagery. In sparsely roaded rural areas, visually identifiable field paths, irrigation ditches, and linear settlement boundaries were used to supplement missing road segments, thereby forming enclosed polygons. Blocks smaller than 40 ha were merged and those larger than 4000 ha were subdivided to reduce size heterogeneity. As shown in Figure 5, 1290 blocks were identified, with an average area of 795.8 ha. However, the resulting coefficient of variation (CV) of 1.04 indicates substantial heterogeneity in block size, reflecting pronounced spatial variability across the urban–rural transition zones. To mitigate the influence of uneven block size, all density-related features (e.g., building, road, and population densities) were normalized by block area. This normalization reduces first-order scale bias, so that feature values are comparable across heterogeneous spatial units. Since block size variation may still affect spatial connectivity and intra-block heterogeneity, a sensitivity analysis (±20% block area adjustment) was conducted to verify the robustness of the model results. The results of the block size sensitivity test are presented in Table A2. When block areas were adjusted by ±20%, the variations in model AUC and F1-score were within 1.5%, indicating that the model is robust to moderate changes in block size. All delineations were checked by two independent analysts, and discrepancies below 5% in block boundaries confirmed the internal consistency of the segmentation.
3.3.2. Feature Calculation
Following the data sources outlined in Section 2.2 and the feature framework developed in Section 3.2, a complete computational pipeline was established to translate conceptual definitions into quantifiable indicators for each node.
Imagery Features
Spectral features were calculated by computing the arithmetic mean of surface reflectance values for each node in four Sentinel-2 bands: Blue (B2/0.490 μm), Green (B3/0.560 μm), Red (B4/0.665 μm), and Near-Infrared (B8/0.842 μm), resulting in a four-dimensional spectral feature vector. Texture features were derived using the gray-level co-occurrence matrix (GLCM) method, calculated across four directions, specifically the horizontal, vertical, and two diagonal axes. From these, four indices were extracted: contrast, entropy, angular second moment (ASM), and inverse difference moment (IDM), capturing image sharpness, complexity, uniformity, and local consistency, respectively.
Land Cover Features
As shown in Figure 6, for static semantic representation of land cover, each region was treated as a “document”, each pixel’s land cover type as a “word”, and sequences generated by random walks as “sentences”. For instance, with a walk length of five, a sample sentence could be “water–cropland–water–water–forest”. These sequences formed a land cover corpus, used to generate high-dimensional semantic embeddings via the Word2Vec Skip-Gram model. The weighted average of embeddings in each node was reduced to 10 principal components via PCA to form semantic land cover features. Additionally, dynamic NDVI time-series features were derived by computing quarterly averages from monthly NDVI data to capture seasonal phenological patterns and differentiate fallow, transitional, or intensively cultivated areas.
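The corpus construction and the quarterly NDVI aggregation can be sketched as follows. This is an illustration under stated assumptions: the walk is restricted to the four edge-adjacent pixels, quarters are taken as calendar quarters, and the function names and toy grid are hypothetical. Training the Skip-Gram embeddings and the PCA step are not shown.

```python
import random


def random_walk_sentences(grid, walks, length, seed=0):
    """Generate land-cover 'sentences' by 4-neighbour random walks on a class grid."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    sentences = []
    for _ in range(walks):
        r, c = rng.randrange(rows), rng.randrange(cols)
        words = [grid[r][c]]
        while len(words) < length:
            steps = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            if not steps:  # a 1 x 1 grid has nowhere to walk
                break
            r, c = rng.choice(steps)
            words.append(grid[r][c])
        sentences.append(words)
    return sentences


def quarterly_means(monthly_ndvi):
    """Average 12 monthly NDVI values into 4 quarter means (calendar quarters assumed)."""
    if len(monthly_ndvi) != 12:
        raise ValueError("expected 12 monthly values")
    return [sum(monthly_ndvi[q:q + 3]) / 3 for q in range(0, 12, 3)]


# Toy land-cover raster for one block (hypothetical).
grid = [
    ["water", "cropland", "cropland"],
    ["water", "cropland", "forest"],
    ["construction", "construction", "forest"],
]
corpus = random_walk_sentences(grid, walks=5, length=5)
ndvi_q = quarterly_means([0.1] * 3 + [0.4] * 3 + [0.7] * 3 + [0.2] * 3)
```

Each inner list in `corpus` plays the role of one sentence, so the whole list could be passed to any Skip-Gram implementation that accepts tokenized sentences.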
Topographic Features
Topographic variables were derived from DEM data. Using ArcGIS zonal statistics, each node’s slope, aspect, and elevation range were computed. Elevation variability was quantified using the coefficient of variation of elevation, resulting in seven topographic features.
Socioeconomic Features
Socioeconomic indicators were extracted using ArcGIS 10.8 from both remote sensing and crowdsourced vector data. Accessibility metrics, including road density and bus stop counts, were computed from OpenStreetMap and bus data. Building density was calculated from rooftop area datasets. Economic vitality was characterized using GDP, secondary industry output and growth, and nighttime light indices (TNLI and ANLI). Population pressure was measured by aggregating WorldPop pixel-level data within each node to obtain total population and dividing by node area to calculate population density.
3.4. Cropland Non-Agriculturalization Susceptibility Prediction Model
The proposed Cropland Non-Agriculturalization Graph Attention Network (GS-GAT) comprises two core components: a GraphSMOTE-based sample balancing module and a Graph Attention Network (GAT) module. As illustrated in Figure 7, the geographical graph containing susceptibility prediction features and class labels is first augmented using the GraphSMOTE strategy to improve minority class representation. The resulting augmented graph is then fed into the GAT model. Through a multi-head attention mechanism driven by attribute similarity, node features are aggregated within the convolution layers. Following multi-layer feature updates guided by both classification and structural reconstruction losses, the model outputs predictions of CNA susceptibility levels for the subsequent year.
3.4.1. Minority Node Augmentation Based on GraphSMOTE
In CNA susceptibility prediction, certain classes are underrepresented in the dataset, which introduces bias toward majority classes, leading to reduced prediction accuracy. To address this issue, this study incorporates a sample balancing strategy based on GraphSMOTE, a graph-based extension of the classical SMOTE algorithm proposed by Zhao et al. [52]. GraphSMOTE is specifically designed to tackle class imbalance in graph-structured data by oversampling minority class nodes while preserving the topological structure and attribute distribution of the original graph. As shown in Figure 8, the algorithm operates by leveraging the neighborhood relationships of minority nodes. Using feature vectors of original minority nodes and their nearby neighbors, it generates synthetic samples through linear interpolation, and also creates new edges connecting synthetic and real nodes to maintain the graph’s connectivity. The interpolation process accounts for both attribute similarity and graph topology, ensuring the generated nodes are realistic and topologically consistent. To prevent label leakage, augmentation is only applied during training.
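The interpolation idea can be sketched in a few lines. This is a deliberately simplified stand-in, not GraphSMOTE itself: it interpolates in the raw feature space and links each synthetic node to its two parent nodes, whereas the published method works on learned embeddings and predicts edges with a trained edge generator. All names and the toy data are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def oversample_minority(features, labels, minority, n_new, seed=0):
    """Create n_new synthetic minority nodes by SMOTE-style linear interpolation."""
    rng = random.Random(seed)
    idx = [i for i, y in enumerate(labels) if y == minority]
    if len(idx) < 2:
        raise ValueError("need at least two minority nodes to interpolate")
    synthetic, new_edges = [], []
    for k in range(n_new):
        i = rng.choice(idx)
        # Nearest node of the same class in feature space.
        j = min((m for m in idx if m != i),
                key=lambda m: sq_dist(features[i], features[m]))
        delta = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            [a + delta * (b - a) for a, b in zip(features[i], features[j])]
        )
        new_id = len(features) + k
        # Simplification: connect the synthetic node to both parent nodes.
        new_edges += [(new_id, i), (new_id, j)]
    return synthetic, new_edges


# Toy training graph (hypothetical): class 1 is the minority class.
train_features = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [0.0, 2.0]]
train_labels = [1, 1, 0, 1]
synth, edges = oversample_minority(train_features, train_labels, minority=1, n_new=3)
```

Because the function only ever receives training nodes, the test set is left untouched, which is consistent with applying augmentation during training only.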
3.4.2. GCNN with Graph Attention Mechanism
A three-layer Graph Attention Network (GAT) is employed to model spatial feature aggregation and susceptibility classification. Originally proposed by Veličković et al. [41], GAT addresses the limitations of traditional GNNs by introducing an attention mechanism that allows for the adaptive assignment of weights to neighboring nodes based on their features. This flexibility enables the model to better capture both local and global graph structure.
Aligned with Tobler’s First Law of Geography and the spatiotemporal nature of CNA, an attribute-driven attention mechanism is adopted. At each layer, node embeddings are updated by computing the similarity between a center node and its neighbors in terms of attribute features. Higher similarity yields greater attention weights, and the final embedding is a weighted sum of neighbor features. The attention-based update involves the following quantities: the integrated attribute features of node i and node j, obtained through feature concatenation; the normalized attention weight coefficient; the updated node features; and n, the index of the current graph convolutional layer.
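For orientation, the sketch below implements one single-head attention update in the standard form given by Veličković et al. [41]: features are linearly projected, each pair of projected vectors is concatenated and scored with a LeakyReLU, the scores are normalized with a softmax over the neighborhood, and the new feature is the attention-weighted sum. Whether the study's attribute-driven variant uses exactly this scoring function is not stated, so the weight matrix `W`, the attention vector `a`, and all names here are assumptions; the output nonlinearity and multi-head concatenation are omitted.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_update(h, neighbors, W, a):
    """One single-head GAT update.

    h:         list of node feature vectors
    neighbors: neighbors[i] lists the nodes attended by node i (including i)
    W:         projection matrix; a: attention vector of length 2 * output dim
    Returns the updated features and the attention weights per node.
    """
    z = [matvec(W, x) for x in h]
    out, attn = [], []
    for i, nbrs in enumerate(neighbors):
        # Score each neighbor from the concatenated projected features.
        scores = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j])))
                  for j in nbrs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alpha = [e / total for e in exps]
        attn.append(dict(zip(nbrs, alpha)))
        out.append([sum(al * z[j][d] for al, j in zip(alpha, nbrs))
                    for d in range(len(z[i]))])
    return out, attn


# Toy graph (hypothetical): three blocks in a chain, self-loops included.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [1, 0, 2], [2, 1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
h_new, attention = gat_update(h, neighbors, identity, a=[0.0, 0.0, 0.0, 0.0])
```

With a zero attention vector every neighbor receives the same weight, so the update reduces to neighborhood averaging; a trained vector would instead concentrate weight on the most relevant neighbors.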
During model training, both classification loss and structural reconstruction loss are jointly optimized. After applying the GraphSMOTE strategy to augment minority-class nodes within the geographical graph, which comprises CNA susceptibility features and corresponding labels, several temporally distinct augmented graphs are simultaneously fed into the GAT model for parallel training. The model computes the error between the node-level outputs from each GAT layer and the one-hot encoded ground truth labels, and uses backpropagation to iteratively update the trainable parameters. This process yields susceptibility predictions across all time periods. In each training iteration, the model takes the node features and edge indices of the current graph batch as input and performs forward propagation, producing two outputs: unnormalized node-level logits and raw edge-level logits. The classification loss is computed using categorical cross-entropy between the node logits and the ground truth labels. Simultaneously, the structural reconstruction loss is calculated using binary cross-entropy between the predicted edge logits and the ground-truth adjacency matrix. These two loss components are combined using a weighted sum to form the total loss:
$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{cls}} + \lambda\,\mathcal{L}_{\mathrm{rec}}$$
Here, $\mathcal{L}_{\mathrm{cls}}$ is the classification loss, $\mathcal{L}_{\mathrm{rec}}$ is the structural reconstruction loss, and $\lambda$ is a tunable hyperparameter that balances the contribution of structural consistency to the overall loss.
This formulation ensures that classification loss primarily drives the node-level prediction task, while the structure-aware reconstruction loss acts as a regularization term. By preserving the original graph topology in the learned embeddings, this design mitigates overfitting, enhances generalization, and partially offsets the bias introduced by class imbalance during training.
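A minimal sketch of the joint objective is given below, assuming the total loss takes the form of the classification loss plus a weight `lam` times the reconstruction loss. Both terms are averaged (over nodes and over all node pairs, respectively), which is an assumption, as are the function names and the toy inputs.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one node from unnormalized logits (log-sum-exp form)."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum - logits[label]


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy from a raw edge logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def joint_loss(node_logits, labels, edge_logits, adjacency, lam):
    """Classification loss plus lam times structural reconstruction loss."""
    cls = sum(softmax_cross_entropy(l, y)
              for l, y in zip(node_logits, labels)) / len(labels)
    pairs = [(edge_logits[i][j], adjacency[i][j])
             for i in range(len(adjacency)) for j in range(len(adjacency))]
    rec = sum(bce_with_logits(x, t) for x, t in pairs) / len(pairs)
    return cls + lam * rec, cls, rec


# Toy inputs (hypothetical): two nodes, four susceptibility classes.
node_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
labels = [0, 3]
edge_logits = [[0.0, 0.0], [0.0, 0.0]]
adjacency = [[0, 1], [1, 0]]
total, cls, rec = joint_loss(node_logits, labels, edge_logits, adjacency, lam=0.5)
```

Setting `lam` to zero recovers plain node classification, which makes it easy to check how much the structural term contributes during a hyperparameter search.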
3.5. Model Training and Evaluation Settings
The model implementation was conducted using the PyTorch 1.7.0+cu101 deep learning framework and the Python 3.8 programming language. The experimental environment consisted of an Intel Core i7-10700 CPU and an NVIDIA GeForce GTX 1660 SUPER GPU. Based on the feature dataset established in Section 3.2, the entire dataset was divided into a training set and a test set at a ratio of 7:3. Each temporal period contained 896 training samples and 394 test samples. Model hyperparameters, including the number of epochs, learning rate, number of layers, number of attention heads, and the weighting coefficient of the joint loss, were optimized via grid search. The hyperparameters and cross-validation results are shown in Table A3. The training batch size was set to 4. The model was optimized using the Adam optimizer, the cross-entropy loss function, and a cosine learning rate scheduler.
Model evaluation serves as a critical mechanism for quantifying the performance of classification algorithms. In this study, two widely recognized metrics, namely the Receiver Operating Characteristic Area Under the Curve (ROC-AUC) and the Micro-F1 score, are adopted to evaluate the classification performance of the proposed CNA susceptibility prediction model from multiple perspectives.
ROC-AUC is one of the most commonly used metrics for multi-class classification problems. It evaluates the model’s ability to distinguish between positive and negative instances, offering a comprehensive measure of discriminative performance. This metric is particularly well-suited for imbalanced classification tasks such as CNA susceptibility prediction. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) across various classification thresholds. The AUC value, defined as the area under the ROC curve, ranges from 0 to 1, with higher AUC values indicating stronger classification performance. In this study, the One-vs-One (OvO) strategy is adopted to compute multi-class ROC-AUC values.
The Micro-F1 score is the harmonic mean of precision and recall computed from prediction counts pooled over all classes, so that all classes are treated collectively. Because every instance carries equal weight, the metric summarizes holistic model performance across all instances and reflects the model’s practical classification capability. In a single-label multi-class setting it coincides with overall accuracy, so frequent classes contribute proportionally more than rare ones.
These performance metrics are derived from four fundamental statistics. True Positives (TP): the number of instances correctly predicted as class i. False Positives (FP): the number of instances from other classes incorrectly predicted as class i. True Negatives (TN): the number of instances not in class i correctly predicted as such. False Negatives (FN): the number of instances in class i incorrectly predicted as belonging to another class. In the context of CNA susceptibility prediction, these metrics help quantify the model’s ability to identify high-risk regions while minimizing misclassification across different susceptibility levels. For multi-class classification tasks, confusion matrices are also employed to visualize prediction accuracy and misclassification tendencies across all classes.
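The four statistics above can be derived per class from paired true and predicted labels, and then pooled to obtain Micro-F1. The following is a minimal sketch under the assumption of integer-coded susceptibility levels; the function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_counts(y_true, y_pred, classes):
    """Return {class: (TP, FP, TN, FN)} using a one-vs-rest view of each class."""
    pairs = Counter(zip(y_true, y_pred))  # cells of the confusion matrix
    total = len(y_true)
    counts = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        tn = total - tp - fp - fn
        counts[c] = (tp, fp, tn, fn)
    return counts


def micro_f1(counts):
    """Pool TP, FP, and FN over all classes, then take the harmonic mean."""
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[3] for v in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 0 = low, 1 = moderate, 2 = high susceptibility.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 0]
counts = per_class_counts(y_true, y_pred, classes=[0, 1, 2])
score = micro_f1(counts)
```

In this toy case five of eight samples are classified correctly, and the pooled Micro-F1 equals that fraction, as expected for single-label multi-class data.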
The formulas used to compute these evaluation metrics are as follows:
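In standard notation, and as an assumed formulation consistent with the definitions in this section, the metrics can be written as below. The symbol T for the number of thresholds is a placeholder introduced here, with thresholds ordered so that the FPR is non-decreasing; the trapezoidal form of the pairwise AUC is likewise an assumption.

```latex
\begin{aligned}
\mathrm{TPR} &= \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN},\\
P_{\mathrm{micro}} &= \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} (TP_i + FP_i)}, \qquad
R_{\mathrm{micro}} = \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} (TP_i + FN_i)},\\
\text{Micro-F1} &= \frac{2\, P_{\mathrm{micro}}\, R_{\mathrm{micro}}}{P_{\mathrm{micro}} + R_{\mathrm{micro}}},\\
\mathrm{AUC}(i,j) &= \sum_{t=1}^{T-1} \frac{\left(\mathrm{FPR}_{t+1} - \mathrm{FPR}_{t}\right)\left(\mathrm{TPR}_{t+1} + \mathrm{TPR}_{t}\right)}{2},\\
\mathrm{AUC}_{\mathrm{OvO}} &= \frac{2}{C(C-1)} \sum_{i<j} \mathrm{AUC}(i,j).
\end{aligned}
```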
Here, C represents the number of classes, (i, j) denotes a pair of classes, and it refers to the number of thresholds used to calculate the AUC value for the class pair (i, j).
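To make the One-vs-One averaging concrete, the sketch below computes a multi-class AUC as the unweighted mean over class pairs, in the style of Hand and Till. It assumes that class labels 0..C-1 index the columns of the predicted probability matrix, and it uses the rank-based (pairwise comparison) form of the binary AUC rather than an explicit threshold sweep. All names and the toy data are hypothetical.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ovo_auc(y_true, proba, classes):
    """Mean over class pairs (i, j) of the average of AUC(i|j) and AUC(j|i)."""
    pair_scores = []
    for i, j in combinations(classes, 2):
        rows_i = [p for t, p in zip(y_true, proba) if t == i]
        rows_j = [p for t, p in zip(y_true, proba) if t == j]
        # Class i as positive, scored with column i; then the reverse.
        auc_ij = binary_auc([r[i] for r in rows_i], [r[i] for r in rows_j])
        auc_ji = binary_auc([r[j] for r in rows_j], [r[j] for r in rows_i])
        pair_scores.append(0.5 * (auc_ij + auc_ji))
    return sum(pair_scores) / len(pair_scores)


# Toy probabilities (hypothetical) for three susceptibility levels.
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.4, 0.4, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
]
auc = ovo_auc(y_true, proba, classes=[0, 1, 2])
```

Only samples belonging to the two classes of a pair enter each pairwise comparison, which is why the OvO average is less sensitive to class prevalence than a pooled score.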
5. Discussion
The prediction of CNA susceptibility represents a critical intersection of geographical information science, environmental economics, and sustainable land management. While existing monitoring methods have predominantly focused on retrospective change detection [
53] and high-frequency pixel-level analysis [
54], this study shifts the focus from detecting change to anticipating it. By conceptualizing CNA as a structural outcome arising from the complex interplay of multi-source features within street-block units, the approach moves beyond simple spectral change detection toward a systematic understanding of the latent propensity for land-use transition. The experimental results, with an average AUC of 85.6% and a relaxed F1 score of up to 91%, indicate that the GS-GAT framework effectively captures the functional structure of the urban-rural transition zone.
5.1. Meso-Scale Representation and Multi-Resolution Data Fusion
Previous research concerning spatial patterns of land-use change is frequently limited by a scale mismatch between data resolution and the operational units utilized in land governance. Although traditional pixel-based models provide indispensable value in the identification of specific land-cover transitions, their efficacy is often reduced when aggregated to areal units where the synergistic effects of various facilities and socioeconomic drivers must be analyzed [
55]. Within the intricate structural composition of Wuhan’s non-central districts, CNA is rarely initiated by an isolated factory or road segment; instead, it is determined by a foundational risk landscape established by dominant functional attributes. Frequent land-use transitions are shaped by the interaction of multiple, often competing forces, such as industrial restructuring, demographic mobility, and policy bargaining.
A primary challenge in multi-source geographic modeling is the integration of datasets with highly disparate spatial resolutions. This study fuses multi-source, multi-resolution data, ranging from 2.5-m rooftop data to 1-km NDVI products, through a meso-scale aggregation strategy centered on street-block units. These units function as information containers that encapsulate the statistical properties of multiple features, which avoids the resampling errors and artificial patterns that typically arise from pixel-level alignment. Normalizing density-related features, such as building and population counts, by the respective block area further alleviates scale-induced bias. The block size sensitivity analysis shows that fluctuations in AUC and F1-score remain below 1.5% even when block sizes vary by ±20%, indicating that the GS-GAT framework is robust to the resolution disparities of the input data. This meso-scale representation also enables a transition from micro-facility counts to macro-functional dimensions. Because uniform grid decomposition fragments semantically coherent urban units and breaks structural continuity, raster-based frameworks are inherently limited in representing cross-unit spatial interactions [
56]. In contrast, the block-level graph representation method adopted in this study facilitates the explicit modeling of inter-block dependencies, thereby providing an effective mechanism for capturing spatial spillover effects.
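The block-level aggregation and area normalization described above can be sketched as follows. The Block fields, feature names, and toy values are hypothetical and serve only to show how counts from fine-resolution layers and pixel values from coarse-resolution layers could be summarized in the same container.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Block:
    """A street-block unit acting as a container for multi-source inputs."""
    block_id: str
    area_km2: float
    building_count: int   # e.g. derived from fine-resolution rooftop data
    population: float
    ndvi_pixels: list     # values of coarse NDVI pixels overlapping the block


def block_features(block: Block) -> dict:
    """Summarize inputs per block; densities are normalized by block area."""
    if block.area_km2 <= 0:
        raise ValueError("block area must be positive")
    return {
        "building_density": block.building_count / block.area_km2,
        "population_density": block.population / block.area_km2,
        "ndvi_mean": mean(block.ndvi_pixels),
    }


# Two hypothetical blocks of different size but identical intensity of use.
small = Block("a", area_km2=0.5, building_count=10, population=200.0, ndvi_pixels=[0.2, 0.4])
large = Block("b", area_km2=2.0, building_count=40, population=800.0, ndvi_pixels=[0.6])
features_small = block_features(small)
features_large = block_features(large)
```

After normalization the two blocks receive the same density values despite a fourfold difference in area, which is the sense in which scale-induced bias is reduced.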
5.2. Geographic Principles and Semantic Landscape Embedding
From a theoretical perspective, the superior performance of the GS-GAT model is rooted in its deep coupling with the fundamental principles of geography. While traditional machine learning models are limited by treating spatial units as independent samples [
25], the attribute-driven attention mechanism explicitly encodes Tobler’s First Law of Geography. At each layer, node embeddings are updated by computing the semantic similarity between a center block and its neighbors, so that the model captures not only the static attributes of a parcel but also the dynamic pressures exerted by its geographical context.
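A similarity-weighted neighborhood update of this kind can be sketched as follows. This is a simplified illustration, not the GS-GAT layer itself: it uses a plain dot product as the similarity score in place of a learned attention function, omits the linear projection and multi-head structure, and all node names and feature vectors are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_update(features, neighbors, center):
    """Update one block embedding as an attention-weighted mean of itself and its neighbors."""
    nodes = [center] + list(neighbors[center])  # include a self-loop
    scores = [dot(features[center], features[j]) for j in nodes]
    weights = softmax(scores)
    dim = len(features[center])
    updated = [
        sum(w * features[j][k] for w, j in zip(weights, nodes))
        for k in range(dim)
    ]
    return updated, dict(zip(nodes, weights))


# Hypothetical block embeddings: B resembles A, C does not.
features = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
neighbors = {"A": ["B", "C"]}
updated_a, weights_a = attention_update(features, neighbors, "A")
```

The neighbor whose attributes resemble the center block receives the larger weight, so similar nearby blocks influence the updated embedding more than dissimilar ones.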
Furthermore, the “region as document” metaphor used for land cover semantic embedding is a significant methodological innovation. Treating pixels as words and spatial sequences as sentences for Word2Vec embedding moves the representation beyond simple categorical encoding and captures the transition logic embedded in the spatial configuration of land features. For instance, a block containing a sequence such as cropland-water-construction might signal a higher latent tendency for further development than a purely agricultural cropland-cropland sequence. The resulting high-dimensional semantic space preserves the contextual meaning of the landscape, making it possible to distinguish stable agricultural zones from unstable transitional zones characterized by fragmented land patterns and spectral heterogeneity.
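The “pixels as words, sequences as sentences” idea can be sketched as the preprocessing step that precedes embedding. The example assumes row-wise scanning of a block's land cover raster and a hypothetical class-code table; the scanning order used in the study may differ. The resulting token sentences are the kind of input a Word2Vec implementation expects, and the adjacent-pair counts are only an added illustration of the context such a model would see.

```python
from collections import Counter

# Hypothetical land cover codes, for illustration only.
LAND_COVER_NAMES = {1: "cropland", 2: "water", 3: "construction"}


def block_to_sentences(raster):
    """Turn each raster row of a block into a sentence of land cover tokens."""
    return [[LAND_COVER_NAMES[code] for code in row] for row in raster]


def adjacent_pairs(sentences):
    """Count neighboring token pairs, the local context a skip-gram style model learns from."""
    pairs = Counter()
    for sentence in sentences:
        for left, right in zip(sentence, sentence[1:]):
            pairs[(left, right)] += 1
    return pairs


# A toy 3 x 3 block: stable cropland row, mixed transitional row, built-up row.
raster = [
    [1, 1, 1],
    [1, 2, 3],
    [3, 3, 3],
]
sentences = block_to_sentences(raster)
pairs = adjacent_pairs(sentences)
```

Here the middle row yields the cropland-water-construction sequence mentioned above, while the first row yields only cropland-cropland contexts.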
5.3. Assessment of Model Robustness and Mitigation of Data Imbalance
The feature dimension ablation experiments show that overall performance remains relatively stable under reduced feature inputs. Specifically, the decrease in Micro-F1 never exceeded 5%, which indicates a high degree of feature robustness in the GS-GAT model: it maintains strong performance in the presence of missing or degraded features by making effective use of the remaining information. This robustness suggests better generalization and stability, since the model learns general patterns relevant to CNA susceptibility without over-relying on any specific feature subset. The ablation results also indirectly reflect the benefits of the attention mechanism embedded in GS-GAT. By assigning attention weights based on content similarity and node associations, the model dynamically shifts focus toward the most informative features under varying input conditions, which helps keep predictions stable. Additionally, the GraphSMOTE strategy effectively mitigated the bias inherent in the original, highly imbalanced dataset. Synthesizing minority-class nodes while preserving topological consistency yielded a test F1 of 0.77. This result supports the view that addressing class imbalance is necessary when modeling rare but high-impact geographic events.
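The minority-node synthesis step can be illustrated with a SMOTE-style interpolation between a minority sample and its nearest minority neighbor. This sketch covers only the feature interpolation; GraphSMOTE additionally operates in a learned embedding space and predicts edges for the synthetic nodes, both of which are omitted here. The function names and toy vectors are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def synthesize_minority(minority, n_new, rng):
    """Create n_new vectors by interpolating a seed toward its nearest minority neighbor."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(minority)
        others = [v for v in minority if v is not seed]
        nearest = min(others, key=lambda v: squared_distance(seed, v))
        delta = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([s + delta * (n - s) for s, n in zip(seed, nearest)])
    return synthetic


# Hypothetical embeddings of high-susceptibility (minority) blocks.
minority_nodes = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2]]
new_nodes = synthesize_minority(minority_nodes, n_new=5, rng=random.Random(0))
```

Each synthetic vector lies on the segment between two existing minority samples, so the augmented class stays within the region already occupied by real minority nodes.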
5.4. Strategic Utility for Proactive Cropland Governance
A key strength of the proposed GS-GAT framework is its ability to facilitate proactive cropland protection by transforming land monitoring from post-event damage assessment to anticipatory risk management. Conventional regulatory and observation systems largely rely on post hoc detection of land use and land cover change, such as satellite-based identification of unauthorized cropland conversion, which is inherently limited by temporal latency and omission errors, particularly in remote, fragmented, or topographically complex landscapes. In contrast, the GS-GAT model achieved a reduction in missed detection rates of 18% to 25% relative to conventional machine learning baselines, including support vector machines and XGBoost, with particularly strong performance in heterogeneous and mixed-use land systems. By identifying high-risk parcels prior to irreversible land conversion, inspection efforts can be prioritized, regulatory resources can be allocated more efficiently, and early intervention can be implemented to mitigate long-term land degradation.
Proactive monitoring and risk prediction frameworks have been increasingly recognized as essential components of sustainable land system governance and food security assurance. Recent studies have demonstrated that rapid cropland non-agriculturalization monitoring based on multi-source remote sensing data can support large-scale protection initiatives by enabling timely detection of conversion trajectories that are often overlooked by conventional methods [
17]. Furthermore, systematic near real-time crop type and land cover mapping from satellite observations has become a foundational element of adaptive agricultural management, enhancing the characterization of spatial and temporal dynamics across heterogeneous agroecosystems [
57].
Beyond regulatory enforcement, proactive cropland protection plays a critical role in strengthening agricultural system resilience and long-term food security. Cropland loss through abandonment or conversion to non-agricultural land uses has been widely associated with increased food supply vulnerability and ecosystem degradation, which has led to growing calls for integrated monitoring and early warning systems within both scientific and policy communities [
58]. Within this context, predictive models that integrate geospatial, temporal, and environmental drivers provide actionable decision support for land managers and policymakers, thereby enabling evidence-based land use planning and sustainable agricultural development.
Finally, the adoption of a relaxed correctness metric in the evaluation process is intended to reflect the operational tolerances of land management and regulatory practice. Although the classification of moderate-risk parcels as high risk may be considered an overestimation from a purely statistical perspective, such conservative labeling is consistent with precautionary land protection strategies and contributes to a reduced probability of cropland loss. Early identification of vulnerable areas enables the implementation of preemptive management measures that help preserve soil productivity, landscape stability, and associated ecosystem services.
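One way such a relaxed criterion could be expressed is sketched below. The exact definition used in the evaluation is not restated in this section, so the rule here is an assumption: a prediction is accepted when it matches the true susceptibility level or overestimates it by at most one level, while underestimation is still counted as an error. The function names, level coding, and toy labels are hypothetical.

```python
def relaxed_correct(true_level: int, pred_level: int, tolerance: int = 1) -> bool:
    """Accept exact matches and overestimates of at most `tolerance` levels (assumed rule)."""
    return 0 <= pred_level - true_level <= tolerance


def accuracy(y_true, y_pred, tolerance: int = 0) -> float:
    """Share of accepted predictions; tolerance=0 gives the strict criterion."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(relaxed_correct(t, p, tolerance) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy labels (hypothetical): 0 = low, 1 = moderate, 2 = high susceptibility.
y_true = [0, 1, 1, 2, 2]
y_pred = [0, 2, 0, 2, 1]
strict_score = accuracy(y_true, y_pred, tolerance=0)
relaxed_score = accuracy(y_true, y_pred, tolerance=1)
```

Under this assumed rule, the moderate-risk parcel labeled as high risk is accepted, whereas parcels whose risk is underestimated remain errors, mirroring the precautionary reasoning described above.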
5.5. Prospective Research Avenues
Building on the present framework, several promising directions can be pursued to further advance CNA susceptibility modeling. Future research may refine the representation of transitional categories by incorporating more discriminative semantic, temporal, and regulatory indicators, thereby enhancing the resolution of intermediate risk patterns. In addition, the integration of explicit policy and institutional constraints, such as redline boundaries and permanent basic farmland protection zones, would enable closer alignment between model outputs and real-world land governance processes. The analytical depth of the framework can also be strengthened through model interpretability techniques, including SHAP-based feature attribution and attention weight visualization, which would clarify how neighborhood interactions and functional contexts shape local conversion risks. Such tools would not only support scientific interpretation but also provide actionable insights for land managers. More broadly, continued development of macro-functional representations will facilitate a conceptual transition from micro-facility counting toward systematic functional environments, enabling a more comprehensive understanding of how urban–rural opportunity structures condition future cropland trajectories.
6. Conclusions
This study proposes GS-GAT, a novel method for predicting cropland non-agriculturalization (CNA) susceptibility, integrating multi-source remote sensing and socioeconomic data with a graph attention network architecture. By constructing a street-block-level heterogeneous graph structure, the model fuses four types of features, including imagery, land cover, topography, and socioeconomics, while incorporating the GraphSMOTE strategy to address class imbalance. The framework enables early prediction of CNA susceptibility levels for the following year. By leveraging four consecutive annual datasets from 2018 to 2022, cross-year parallel training and validation were conducted to systematically evaluate model performance across fragmented and mixed-use farmland scenarios, thereby offering a new technical pathway for cropland protection and refined land-use supervision in urban fringe zones.
Experimental results from non-central districts in Wuhan demonstrate the feasibility and effectiveness of the proposed method. The GS-GAT model achieved an average AUC of 85.6% and an average F1 score of 82.6% across the four test years. Under the relaxed prediction criterion, the AUC and F1 scores increased to 93% and 91%, respectively. The ablation study demonstrated that removing the GraphSMOTE strategy significantly decreased the number of minority-class samples and led to a sharp drop in AUC from 80.53% to 60.4%, confirming the necessity of minority augmentation strategies in high-heterogeneity regions. Comparative experiments with traditional models such as SVM and XGBoost further validated the superiority of GS-GAT, particularly in identifying fragmented and mixed-use farmland. GS-GAT reduced the missed detection rate by 18%–25%, enabling early identification of high-risk plots, supporting targeted on-site inspection and policy design for cropland protection, and reducing monitoring and management costs.
Nevertheless, the current feature system does not yet incorporate soil physicochemical properties or detailed land policy variables, limiting the model’s ability to interpret complex driving mechanisms. Although the attention weights in the GAT reflect the importance of neighborhood features, they lack explicit mapping to policy constraints such as redlines and permanent basic farmland boundaries. Future work will incorporate additional observational and regulatory constraint data, improve interpretability through SHAP values and attention weight visualization, and perform long-term validation in more representative regions. These efforts aim to assess the model’s generalizability and stability across varying climatic and socioeconomic conditions, ultimately providing stronger support for cropland protection and sustainable development.