Mitigating Urban-Centric Bias to Address the Rural Eligibility Discovery Lag

Jiang, Guiyan; Zhang, Donghui

doi:10.3390/land15040535

Open Access Article

by Guiyan Jiang ¹,* and Donghui Zhang ²

¹ Key Laboratory of Land Resources Survey and Planning of Qinghai Province, School of Politics and Public Administration, Qinghai Minzu University, Xining 810007, China

² Institute of Remote Sensing Satellite, China Academy of Space Technology, Beijing 100094, China

* Author to whom correspondence should be addressed.

Land 2026, 15(4), 535; https://doi.org/10.3390/land15040535

Submission received: 29 January 2026 / Revised: 5 March 2026 / Accepted: 17 March 2026 / Published: 25 March 2026

(This article belongs to the Special Issue Rethinking Urban–Rural Dynamics Through the Lens of Social Geography)

Abstract

Urban sustainability depends on rural hinterlands, yet national-scale evaluation and AI screening often rely on urban-centric proxies, which can under-recognize remote villages where the evidence base is sparse. Using China’s national honored-village programme (N = 24,450) as a case, we examine how recognition patterns change when data availability and observability are unequal across regions, with a focus on the Qinghai–Tibetan Plateau (QTP), where 923 honored villages account for only 3.78% of the national total. We interpret urban-centric proxy reliance as the tendency for recognition patterns to correlate with urban-linked observability signals (e.g., nighttime lights). In this study, discovery lag refers to situations where villages exhibit characteristics similar to historically recognized villages but remain unrecognized under the current honor regime due to uneven data availability and observability. Methodologically, we build a scene-aware predictive framework that integrates multi-source geospatial indicators and explicitly handles extreme imbalance and environmental heterogeneity to estimate recognition likelihood under the current honor regime, treating national honor lists as administratively produced recognition outcomes rather than objective measures of village value. The model highlights four high-probability nomination belts on the QTP and reveals a pronounced DEM–NTL decoupling: the median NTL of currently honored QTP villages is 0, suggesting that NTL-based urban proxies can fail in high-altitude, data-scarce contexts. Overall, the observed under-representation is consistent with uneven observability and institutional constraints within the current honor system, and the proposed framework provides a scalable diagnostic and screening tool for identifying villages with high predicted recognition likelihood and supporting more evidence-aware rural data collection.

Keywords:

Qinghai–Tibetan Plateau villages; honor-oriented village recognition; scene-aware multi-label learning; TabPFN ensemble; visibility and discoverability mechanism; multi-source remote sensing and statistical data; spatial extrapolation

1. Introduction

The fusion of Artificial Intelligence (AI) and multi-source remote sensing data is profoundly transforming how we analyze and manage cities, giving rise to new paradigms like “intelligent urban governance” and “resilient city design” [1,2,3]. However, the resilience and sustainability of the “urban future” cannot be achieved within the urban boundary alone [4,5]. It is highly dependent on the ecological services, cultural landscapes, and tourism identities provided by its vast rural hinterland [6,7,8]. Therefore, the governance of “smart cities” must include the capacity to intelligently and equitably interpret these critical rural support systems.

At present, however, most recognition practices are designed around information-rich, well-connected lowland regions (i.e., urbanized or peri-urban areas) [9]. This reliance on data and accessibility creates a systematic “urban-centric” or “lowland bias” [10]. When AI models are trained on such unevenly distributed data, they may misrepresent remote, data-scarce regions whose observable indicators differ from those of well-documented lowland areas, widening the gap between observation and intelligent interpretation [11].

Similar proxy-based approaches that combine satellite observations with machine learning have been used worldwide to infer development-related outcomes under data scarcity; our work extends this stream by focusing on administratively mediated rural recognition as the outcome of interest.

China’s national honor system for villages serves as an excellent case study of this bias [12,13]. In this paper, “honored villages” refers to rural settlements that receive formal, list-based recognition through China’s national or ministerial designation programmes, which confer public titles across themes such as heritage conservation, rural tourism, ecological quality, governance performance, or ethnic cultural distinctiveness. The system has become a formal recognition-and-documentation mechanism, yet in our consolidated national inventory more than four-fifths of the records are located east of the 105° E meridian, showing a strong spatial skew [14,15]. Importantly, the “honor village system” is not a single list but a portfolio of designation schemes issued by different authorities and released in batches over time, and inclusion typically depends on nomination dossiers and review procedures initiated or supported by local governments. In other words, national-level recognition is primarily produced through criteria- and dossier-based administrative evaluation rather than a publicly specified AI model. Consequently, recognition reflects not only village attributes but also the administrative capacity to document and present them; accordingly, we treat honor labels as administratively mediated recognition outcomes under a given system, rather than as a direct measure of intrinsic village “value”. National honor lists can also be shaped by policy cycles, nomination capacity, and regional governance priorities, and recognition may feed back into future visibility and nomination resources. Therefore, under-representation may reflect a combination of discovery/visibility frictions (“discovery lag”) and institutional constraints, rather than a single cause.

The Qinghai–Tibetan Plateau (QTP) is an extreme case of a low-observability setting within this national recognition framework [16]. It is ecologically and culturally outstanding, and its villages clearly possess nomination-quality attributes [17]. Nevertheless, only 3.78% of all honored villages in our national inventory (923 of 24,450) fall within the QTP. This is the “discovery lag” we target: village characteristics associated with recognition do not necessarily translate into visibility within a lowland-oriented (urban), information-intensive national system.

A less examined question in existing evaluations is: when the evidence base itself is unequal, what controls the transition from locally observable village characteristics to national-level recognition outcomes? [18]. Here, an “unequal evidence base” means that villages differ systematically in how observable they are to national evaluators and to data infrastructures—through transport connectivity, administrative reporting capacity, digital footprints, and the availability of standardized indicators. Under such asymmetry, a village may be locally valuable yet remain weakly documented or difficult to compare, which lowers its probability of being recognized even before any substantive evaluation takes place. This matters because recognition is not merely symbolic: it can reallocate attention, resources, and subsequent opportunities, thereby reinforcing spatial inequalities if observability is uneven. This therefore becomes a challenge of using AI to diagnose recognition patterns under conditions of data scarcity and spatial non-stationarity [19].

This study, therefore, takes a different stance. We utilize a progressive, scene-aware AI ensemble framework that fuses multi-source remote sensing (ERA5, MODIS, VIIRS, Sentinel-5P) and statistical data. The core of this framework is a learner based on TabPFN (a foundation model for tabular data) [20], which is trained to separate two effects that are normally entangled: (i) the ecological and geographic gatekeepers that make a village eligible, and (ii) the visibility and accessibility signals that make it discoverable.

Our objective is to expose the drivers of being found and being acknowledged in a system whose training data comes mainly from somewhere else (lowlands/cities). We argue that the plateau is constrained, not inferior; current under-representation may reflect the design and reach of the honor apparatus as well as uneven observability of village characteristics rather than differences in intrinsic village value [21]. This research demonstrates how AI and multi-source data fusion can be used to diagnose recognition patterns under uneven observability, thereby providing a data-driven basis for identifying villages with high model-estimated recognition likelihood and informing future verification and data collection efforts related to urban–rural sustainable development [22].

2. Materials and Methods

2.1. Study Area

This study focuses on the Qinghai–Tibetan Plateau (QTP) and its surrounding transition belts, which together constitute one of China’s most distinctive high-altitude human–environment regions [23]. For the purpose of village-level analysis, the study area is defined administratively by county boundaries within the plateau extent, and environmentally by elevation, relief, and climatic gradients that jointly constrain settlement form and accessibility. The QTP is characterized by complex topography (large altitudinal range, extensive valley–basin systems), a cold and arid to semi-arid climate, and a sparse but highly clustered rural settlement pattern [24]. Within this area, we identified all honor-designated villages included in the national inventories and extracted their locations as the target subset for subsequent modelling [25] (Figure 1).

2.2. Datasets

2.2.1. Rural Honor Village Datasets

Our data strategy is based on two considerations: the sample breadth of a national-scale dataset and the sample sparsity within our Qinghai-Tibetan Plateau (QTP) study area.

This study constructs a nationwide dataset of honored villages (N = 24,450). At a macro-scale, this dataset exhibits significant heterogeneity and severe imbalance. National heatmaps clearly show that the vast majority of honored villages are concentrated in eastern and southern China; specifically, approximately 84% of samples are located east of the 105° E meridian, while the expansive western regions are relatively sparse (Figure 2).

The QTP study area stands in sharp contrast. Although the plateau possesses a large base of natural villages, particularly in river valleys, and is a significant enrichment zone for “China’s Traditional Villages,” the absolute number of its villages included in national-level honor lists remains small.

This mismatch between locally distinctive characteristics and uneven assessment coverage, termed the ‘discovery lag’, constitutes the core challenge of this study. Positive samples (honored villages) local to the QTP are extremely scarce, making it impossible to build a robust assessment model using only local data. Furthermore, the dataset faces extreme class imbalance; for example, the ‘National Famous Historical and Cultural Towns and Villages’ label has a national prevalence of only 4.49% and is even rarer in the QTP.

Therefore, to identify candidate villages on the QTP that may align with historically observed recognition patterns but have not yet appeared in the published lists, we must leverage the statistical power of the national-level dataset (N = 24,450). By learning the global patterns of honored villages from a broader geographical and environmental context, we can build a generalizable model capable of overcoming local sample sparsity and applying to the QTP’s specific environmental constraints.

The foundation of our modelling task is a structured representation of official recognition outcomes across multiple national honor lists. We selected seven key national honor lists as observed recognition outcome labels for this study. These lists are treated as administratively issued outcomes reflecting institutional criteria and documentation-based evaluation, rather than as direct measures of intrinsic village value. This multi-label formulation enables us to model recognition patterns across designation schemes and to estimate recognition likelihood under the current honor regime. Table 1 summarizes the official details, release batches, and issuing authorities for these seven foundational datasets.

A focused analysis of the Qinghai-Tibetan Plateau (QTP) samples reveals a unique profile. Based on the screening criteria (elevation > 1500 m and within specific geographic coordinates), this region contains 923 honored villages. This figure accounts for only 3.78% of the total national sample (N = 24,450), confirming the absolute scarcity of recognized villages in the study area (Figure 3).

The QTP samples diverge significantly from the national average in key features. Elevation (DEM) is the primary differentiator: the QTP samples have an average altitude of 2665 m, far exceeding the national average (632.52 m). Nearly half (48.4%) of these villages are situated in marginal river valleys (1500–2500 m), while 50% are in high-altitude agricultural and pastoral zones (2500–4500 m).

Economic indicators reflect this disparity. The region’s average Nighttime Lights (NTL) value (1.72) is lower than the national average (2.09), and the median is 0.0, indicating that at least half of the villages have no detectable NTL signal.

The most critical divergence is in the composition of honor labels. The QTP is not a simple microcosm of the national distribution; it possesses a distinct “honor identity.” Within the plateau, the ‘List of Chinese Traditional Villages’ (type3) is the most prevalent label (approx. 412 villages, 44.6%). This is the region’s most distinguishing feature, accounting for nearly half of all its honors. Other labels are less common; for instance, ‘Chinese Ethnic Minority Characteristic Villages’ (type4) accounts for only approximately 90 villages (9.8%).

This distribution indicates that while the QTP comprises only 3.78% of the national honor samples, its share of ‘Chinese Ethnic Minority Characteristic Villages’ (approx. 3.7%, or 90/2410) is proportional to its overall sample size, rather than being a specific area of concentration. The true anomaly in the QTP’s honor identity is its significant reliance on the ‘Traditional Villages’ designation.

In contrast, other labels like ‘National Famous Historical and Cultural Towns and Villages’ (type1) are extremely rare (approx. 24 villages).
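The shares quoted in this subsection can be re-derived from the counts given in the text. The snippet below is only an arithmetic check; the variable names are ours, and the two label counts are the approximate values reported above.

```python
# Counts quoted in the text; the two label counts are reported as approximate.
N_NATIONAL = 24_450      # all honored villages in the national inventory
N_QTP = 923              # honored villages inside the QTP study area
QTP_TYPE3 = 412          # approx. 'Traditional Villages' on the QTP
QTP_TYPE4 = 90           # approx. 'Ethnic Minority Characteristic Villages' on the QTP
NATIONAL_TYPE4 = 2_410   # national total for the type4 label


def pct(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed in percent."""
    return 100.0 * part / whole


shares = {
    "qtp_of_national": pct(N_QTP, N_NATIONAL),
    "type3_within_qtp": pct(QTP_TYPE3, N_QTP),
    "type4_within_qtp": pct(QTP_TYPE4, N_QTP),
    "qtp_of_national_type4": pct(QTP_TYPE4, NATIONAL_TYPE4),
}
for name, value in shares.items():
    print(f"{name}: {value:.2f}%")
```

The last two entries make the proportionality argument concrete: the QTP's share of the national type4 label is within a tenth of a percentage point of its share of all honored villages.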

2.2.2. External Environmental Covariates

To assess the complex drivers of rural honor designation, a multi-faceted set of predictors is required. The 19 external factors detailed in Table 2 were selected to construct a holistic geo-environmental and socio-economic fingerprint for each village, moving beyond simple location.

This framework is built on four key pillars:

(1): Static Topography and Position (DEM, Lat/Lon): These factors establish the fundamental physical constraints and geographic context, which are critical for differentiating the extreme and high-altitude environment of the QTP from other regions.
(2): Dynamic Climate (ERA5): A comprehensive suite of reanalysis variables (e.g., temperature, precipitation, radiation) was included to model the “hydrothermal conditions” that govern agricultural suitability (relevant to “One Village, One Product” honors) and human habitability [36,37,38].
(3): Surface Ecology (MODIS): Vegetation and land surface temperature indices (e.g., EVI, NDVI, LST) serve as direct measures of ecological quality and land-atmosphere exchange, forming a key baseline for “National Forest Villages” and “Rural Tourism” honors [39,40].
(4): Socio-Economic Proxies (VIIRS/Sentinel-5P): Nighttime lights (NTL) and atmospheric NO₂ are used as proxies for human activity intensity and infrastructure-linked observability (e.g., visibility, service presence, and accessibility), which can shape the evidence available for national recognition; we therefore avoid interpreting NTL as a direct indicator of intrinsic village value. These factors help differentiate remote, traditionally preserved settlements (like “Traditional Villages”) from more developed and accessible centers [40,41,42,43].

This multi-source, multi-domain approach provides the rich, high-dimensional feature space necessary for the model to resolve the profound spatial heterogeneity of the dataset. It is this combination of factors that enables the identification of distinct “Eco-Social Scenes” and the learning of their unique local patterns (see Table 2).

2.3. Progressive Scene-Aware Ensemble Prediction

The predictive task is defined by the data’s inherent challenges. We face extreme class imbalance (e.g., 4.49% prevalence for ‘National Historical and Cultural Villages’) and profound spatial heterogeneity, where the environmental drivers for an honor type are not nationally uniform. A naive, ‘one-size-fits-all’ model is ill-equipped for this task; it would either fail to learn from rare instances or be confounded by contradictory regional patterns. Our methodology is therefore designed as a progressive, scene-aware ensemble, constructed in stages to systematically deconstruct and resolve these compounding challenges. Because the labels used in this study are drawn from officially issued honor lists, all probabilities reported below are interpreted as regime-conditioned recognition likelihood—i.e., similarity to historically recognized villages under the same administrative system—rather than as independent measures of intrinsic village merit.

2.3.1. Imbalance-Constrained Sampling and Bagging

A primary challenge in this study is the identification of rare honor types, particularly within unique geo-ecological contexts like the Tibetan Plateau. Such samples are inherently scarce, and their environmental drivers are distinct. To construct a model capable of learning these specific patterns, we must leverage the statistical power of a national-level dataset (N = 24,450).

We selected TabPFN as our base learner, prized for its strong inductive bias on tabular data and its ability to run without dataset-specific hyperparameter tuning, which is ideal for scenarios with complex interactions and a risk of overfitting. However, applying TabPFN directly to our national dataset introduces two significant technical constraints that must be mitigated [44].

First, a Capacity Constraint: TabPFN’s pre-trained priors are optimized for datasets where $N < 10{,}000$. Our K-fold training splits (19,560 samples) substantially exceed this optimal limit. Second, Information Sparsity: The multi-label data exhibits an extreme class imbalance. For instance, the ‘National Historical and Cultural Villages’ label has a prevalence of only 4.49% (1103 instances). A naive random subsampling, adopted merely to satisfy the capacity constraint, would catastrophically dilute these already scarce positive instances, rendering their underlying patterns unlearnable [45,46].

To simultaneously address both constraints, we devised an information-preserving sampling strategy. The core principle of this strategy is the full retention of all positive instances. For any given label $y_{j}$, its training set $D_{train}$ is partitioned into positive ($D_{train}^{+}$) and negative ($D_{train}^{-}$) sets. The procedure is as follows [47]:

(1): Retain the complete set of $N_{pos} = |D_{train}^{+}|$ positive samples.
(2): Calculate the negative sample quota $N_{neg\_quota}$ based on the model’s maximum capacity $N_{\max} = 9000$:

$N_{neg\_quota} = \max(0, N_{\max} - N_{pos}) \qquad (1)$

(3): Randomly draw, without replacement, $N_{neg\_sampled} = \min(N_{neg\_quota}, |D_{train}^{-}|)$ negative samples to form the set $D_{neg\_sampled}$.

The resulting training subset, $D_{sampled} = D_{train}^{+} \cup D_{neg\_sampled}$, thus preserves the maximum possible signal for rare classes while adhering to the model’s capacity limit.
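The three sampling steps can be sketched in a few lines of NumPy. This is an illustrative reading of the procedure, not the authors' code: the function name `sample_indices` and the toy label vector (19,560 rows with 882 positives, roughly the prevalence quoted above) are our assumptions.

```python
import numpy as np

N_MAX = 9000  # capacity limit N_max stated in the text


def sample_indices(y: np.ndarray, n_max: int = N_MAX, seed: int = 0) -> np.ndarray:
    """Keep every positive row and fill the remaining capacity with random negatives."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)                     # step (1): all positives
    neg_idx = np.flatnonzero(y == 0)
    n_neg_quota = max(0, n_max - pos_idx.size)           # step (2): Eq. (1)
    n_neg_sampled = min(n_neg_quota, neg_idx.size)       # step (3): cap at available negatives
    neg_sampled = rng.choice(neg_idx, size=n_neg_sampled, replace=False)
    return np.sort(np.concatenate([pos_idx, neg_sampled]))


# Hypothetical training fold: 19,560 rows, 882 positives (about 4.5%).
y_toy = np.zeros(19_560, dtype=int)
y_toy[:882] = 1
idx = sample_indices(y_toy)
print(idx.size, int(y_toy[idx].sum()))  # prints: 9000 882
```

In this toy fold the positive share rises from about 4.5% to 9.8% of the sampled subset, simply because no positive row is discarded.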

This sampling process, however, introduces stochasticity through the random selection of negative instances. To stabilize the model and mitigate this sampling-induced variance, we embed this procedure within a Bagging (Bootstrap Aggregating) framework. We train an ensemble of $M = 3$ independent models, $f^{(m)}$, where each is trained on a subset $D_{sampled}^{(m)}$ generated with a unique random seed. The final probability for a validation instance $x_{val}$ is the arithmetic mean of the ensemble’s predictions:

$P_{final}(x_{val}) = \frac{1}{M} \sum_{m=1}^{M} f^{(m)}(x_{val}) = \frac{1}{3} \sum_{m=1}^{3} P(y = 1 \mid x_{val}, D_{sampled}^{(m)}) \qquad (2)$

This combined approach produces a robust and stabilized learning component (which we term ISB-PFN). It effectively manages severe class imbalance under a strict capacity constraint. The resulting module provides the stable, foundational probability estimates that are essential for the construction of the subsequent global and local pattern models (Figure 4).
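A minimal sketch of the seed-wise averaging in Eq. (2) follows. To keep it self-contained, the base learner is a deliberately simple stand-in (`centroid_scorer`, a hypothetical nearest-centroid scorer), not TabPFN; the synthetic data and the reduced capacity `n_max=500` are likewise assumptions for illustration.

```python
import numpy as np


def centroid_scorer(X_tr, y_tr, X_va):
    """Stand-in base learner (NOT TabPFN): sigmoid of the distance gap to class centroids."""
    mu_pos = X_tr[y_tr == 1].mean(axis=0)
    mu_neg = X_tr[y_tr == 0].mean(axis=0)
    gap = np.linalg.norm(X_va - mu_neg, axis=1) - np.linalg.norm(X_va - mu_pos, axis=1)
    return 1.0 / (1.0 + np.exp(-gap))


def bagged_proba(X_tr, y_tr, X_va, fit_predict, n_max=9000, m=3):
    """Average m models, each fit on all positives plus a differently seeded negative draw."""
    pos = np.flatnonzero(y_tr == 1)
    neg = np.flatnonzero(y_tr == 0)
    n_neg = min(max(0, n_max - pos.size), neg.size)
    preds = []
    for seed in range(m):                                # one random seed per ensemble member
        rng = np.random.default_rng(seed)
        idx = np.concatenate([pos, rng.choice(neg, size=n_neg, replace=False)])
        preds.append(fit_predict(X_tr[idx], y_tr[idx], X_va))
    return np.mean(preds, axis=0)                        # arithmetic mean, as in Eq. (2)


# Synthetic data: every 20th row is a positive, shifted away from the negatives.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 4))
y = np.zeros(2000, dtype=int)
y[::20] = 1
X[y == 1] += 2.0
X_tr, y_tr, X_va, y_va = X[:1800], y[:1800], X[1800:], y[1800:]

p_final = bagged_proba(X_tr, y_tr, X_va, centroid_scorer, n_max=500, m=3)
print(p_final.shape)
```

Because the positives are identical across ensemble members, the only source of disagreement between the averaged models is the negative draw, which is the variance the bagging step is meant to dampen.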

2.3.2. Global TabPFN-Based Multi-Label Learner

Having established a robust component (the ISB-PFN learner) capable of handling our dataset’s imbalance and capacity constraints, we first deploy it to capture the global patterns of honor-type association. The objective of this stage is not to create a final predictive model, but to generate a foundational set, $P_{global}$, of unbiased, baseline probability estimates for the entire dataset [48].

To achieve this, we implement the ISB-PFN learner within a Multilabel Stratified K-Fold (MSKF) cross-validation framework (where K = 5). This stratified approach ensures that the rare class prevalences are preserved across all folds (Figure 5).

The process is executed independently for each of the $L = 7$ honor-type labels [49]. For a given label $y_{l}$, the dataset $D$ is split into $K = 5$ disjoint folds:

$D = \bigcup_{k=1}^{K} F_{k} \quad \text{and} \quad F_{i} \cap F_{j} = \emptyset \ \text{for } i \neq j \qquad (3)$

For each fold $k$, a model $f_{l,k}$ is trained using the ISB-PFN procedure (Section 2.3.1) on the training set $D_{train,k} = D \setminus F_{k}$:

$f_{l,k} \leftarrow \text{ISB-PFN}(X_{train,k}, y_{train,k}) \qquad (4)$

This model is then used to predict probabilities $\vec{p}_{oof,k}$ exclusively for the samples in the held-out fold $D_{val,k} = F_{k}$:

$\vec{p}_{oof,k} = f_{l,k}(X_{val,k}) \qquad (5)$

This Out-of-Fold (OOF) prediction mechanism ensures that no sample’s probability is estimated by a model that was trained on it, thereby preventing information leakage and data contamination. By iterating this process across all K folds, we assemble a complete, unbiased probability vector $\vec{p}_{global,l}$ for all N samples by concatenating the fold-level predictions:

$\vec{p}_{global,l} = \bigcup_{k=1}^{K} \vec{p}_{oof,k}, \quad \vec{p}_{global,l} \in \mathbb{R}^{N} \qquad (6)$

This entire K-fold procedure is repeated for all L = 7 labels. The resulting vectors are concatenated horizontally to form the final Global OOF Probability Matrix, $P_{global}$:

$P_{global} = [\vec{p}_{global,1} \mid \vec{p}_{global,2} \mid \dots \mid \vec{p}_{global,L}] \qquad (7)$

$P_{global} \in \mathbb{R}^{N \times L} \quad (N = 24{,}450,\ L = 7) \qquad (8)$

This $P_{global}$ matrix represents the model’s “global consensus” on the likelihood of each village receiving each honor, based on nationwide patterns. It serves as the primary set of foundational meta-features for the final fusion learner.
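The out-of-fold assembly of Eqs. (3) to (8) can be sketched as below. Two simplifications are ours and should be read as assumptions: the folds come from a plain shuffled split rather than the multilabel stratified split used in the study, and the learner is a trivial stand-in (`prevalence_scorer`) instead of ISB-PFN.

```python
import numpy as np


def kfold_indices(n: int, k: int = 5, seed: int = 0):
    """Plain shuffled K-fold split (the study uses a multilabel stratified variant)."""
    perm = np.random.default_rng(seed).permutation(n)
    return np.array_split(perm, k)


def prevalence_scorer(X_tr, y_tr, X_va):
    """Stand-in learner (NOT ISB-PFN): predicts the training prevalence for every row."""
    return np.full(X_va.shape[0], y_tr.mean())


def oof_matrix(X, Y, fit_predict, k: int = 5, seed: int = 0):
    """Build an N x L matrix in which each entry comes from a model that never saw that row."""
    n, n_labels = Y.shape
    P = np.full((n, n_labels), np.nan)
    folds = kfold_indices(n, k, seed)
    for l in range(n_labels):                    # one independent pass per label
        for val_idx in folds:
            train_mask = np.ones(n, dtype=bool)
            train_mask[val_idx] = False          # D_train,k = D \ F_k
            P[val_idx, l] = fit_predict(X[train_mask], Y[train_mask, l], X[val_idx])
    return P


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
Y_demo = rng.integers(0, 2, size=(50, 7))
P_global_demo = oof_matrix(X_demo, Y_demo, prevalence_scorer)
print(P_global_demo.shape)  # prints: (50, 7)
```

Initializing the matrix with NaN makes it easy to confirm afterwards that every row was filled exactly once by a held-out prediction.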

2.3.3. Scene-Cluster Conditioned Learners

To address the spatial heterogeneity described above, we first partitioned the national dataset (N = 24,450) into K = 6 distinct “Eco-Social Scenes”. This partitioning was performed using K-Means clustering on the primary environmental covariates (DEM, NTL, NDVI, and T2M), with K = 6 selected via the elbow method and silhouette analysis. Here, NTL is not treated as a normative proxy of village “value” or “development”. Instead, it is interpreted as an indicator of observability and human-activity visibility, which affects the evidence available for national recognition. Its role in the clustering is to separate high-visibility from low-visibility regimes, which directly relates to our focus on evidence inequality and prevents the scene-aware learners from collapsing to a single lowland/urban proxy logic. This scene partition is used only to define eco-social contexts for scene-aware training and interpretation; the recognition model itself is trained on the complete 19-variable predictor set reported in Table 2. Each village was assigned exactly one scene label (c ∈ {1, …, 6}).
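As a rough illustration of the scene partition, the sketch below runs a basic Lloyd-style K-Means on four standardized covariates. The synthetic values, the z-score standardization, and the hand-written `kmeans` helper are all our assumptions; the text does not state which implementation or preprocessing was used.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int = 6, n_iter: int = 50, seed: int = 0):
    """Basic Lloyd iterations; returns one cluster label per row and the centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if members.size:                     # keep the old centre if a cluster empties
                centers[c] = members.mean(axis=0)
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers


# Synthetic stand-ins for the four clustering covariates (DEM, NTL, NDVI, T2M).
rng = np.random.default_rng(7)
covariates = np.column_stack([
    rng.uniform(0, 5000, 600),      # DEM, metres
    rng.exponential(2.0, 600),      # NTL, arbitrary radiance units
    rng.uniform(0, 1, 600),         # NDVI
    rng.normal(10, 8, 600),         # T2M, degrees Celsius
])
Z = (covariates - covariates.mean(axis=0)) / covariates.std(axis=0)  # assumed z-scoring

scene, centers = kmeans(Z, k=6)
print(np.bincount(scene, minlength=6))
```

Labels here run from 0 to 5, whereas the text indexes scenes from 1 to 6. Some form of scaling matters in practice because elevation in metres would otherwise dominate the Euclidean distance.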

We then move from a single “generalist” model to a panel of “specialist” models, where each specialist is conditioned on one of these specific Eco-Social Scenes. The global learner provides a robust baseline, but its assumption of spatial homogeneity introduces significant bias. A single global model cannot resolve the contradictory local patterns, for instance, between a high-altitude arid region and a low-altitude humid region [50,51].

The central hypothesis is that by training learners on these K = 6 more homogeneous clusters, we can capture local, high-variance patterns previously obscured in the global model. The objective of this stage is to generate a second set of meta-features, $P_{local}$, representing these specialized local views [52].

This is achieved by iterating through each of the K = 6 scene-clusters. For a given cluster $c \in \{1, \dots, K\}$:

(1): We isolate the data subset $D_{c}$ containing only the samples belonging to that cluster.
(2): A new, internal Multilabel Stratified K-Fold cross-validation is performed within this subset $D_{c}$.
(3): The ISB-PFN learner is trained on the internal training folds. The capacity limit is adjusted to accommodate the smaller, more homogeneous cluster data.
(4): Out-of-Fold (OOF) probabilities are generated for all samples within $D_{c}$, yielding an independent, locally trained probability set $P_{c}$:

$P_{c} \in \mathbb{R}^{|D_{c}| \times L} \qquad (9)$

where $|D_{c}|$ is the number of samples in cluster c and L = 7 is the number of labels.

This process yields K = 6 independent, locally trained OOF probability sets. To prepare them for the final fusion, we construct a single Local OOF Probability Matrix, $P_{local}$. This matrix is structured as a sparse matrix with dimensions $N \times (K \times L)$:

$P_{local} \in \mathbb{R}^{N \times (K \cdot L)} \quad (N = 24{,}450,\ K = 6,\ L = 7) \qquad (10)$

The local OOF results $P_{c}$ are then “stitched” into this larger matrix. For a given sample i belonging to cluster $c_{i}$, its corresponding row vector $\vec{p}_{local,i}$ in $P_{local}$ is formally defined as a block vector:

$\vec{p}_{local,i} = [\vec{0}_{L} \mid \dots \mid \vec{p}_{c_{i}} \mid \dots \mid \vec{0}_{L}] \qquad (11)$

where the probability vector $\vec{p}_{c_{i}}$ (the i-th row of $P_{c_{i}}$) is placed in the $c_{i}$-th block, and all other $K - 1$ blocks are zero vectors ($\vec{0}_{L}$) of length $L$.

This sparse $P_{local}$ matrix is a high-dimensional representation of specialized, conditioned knowledge. It precisely encodes which “eco-social scene” generated each probability estimate. This matrix, alongside the dense $P_{global}$ matrix, provides the final meta-learner with a comprehensive set of features, capturing both the general national trend and the specific, localized patterns (Figure 6).
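The block layout of Eq. (11) is easy to misread, so a small worked example may help. The helper `stitch_local` and the toy inputs (three scenes, two labels, five villages) are hypothetical; scenes are indexed from 0 here rather than from 1 as in the text.

```python
import numpy as np


def stitch_local(scene: np.ndarray, P_by_scene: list, n_scenes: int, n_labels: int) -> np.ndarray:
    """Place each scene's OOF block into its own column range; all other blocks stay zero.

    scene         : (N,) integer scene index per village, in [0, n_scenes)
    P_by_scene[c] : (|D_c|, n_labels) OOF probabilities, ordered like the rows where scene == c
    """
    n = scene.shape[0]
    P_local = np.zeros((n, n_scenes * n_labels))
    for c in range(n_scenes):
        rows = np.flatnonzero(scene == c)
        P_local[rows, c * n_labels:(c + 1) * n_labels] = P_by_scene[c]
    return P_local


# Toy example: 5 villages, 3 scenes, 2 labels.
scene_demo = np.array([0, 2, 1, 2, 0])
P_by_scene_demo = [
    np.full((2, 2), 0.1),   # scene 0 has two villages
    np.full((1, 2), 0.2),   # scene 1 has one village
    np.full((2, 2), 0.3),   # scene 2 has two villages
]
P_local_demo = stitch_local(scene_demo, P_by_scene_demo, n_scenes=3, n_labels=2)
print(P_local_demo[1])  # village 1 belongs to scene 2, so only the last block is filled
```

With K = 6 and L = 7 the same layout gives 42 columns, of which exactly 7 can be non-zero in any row.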

2.3.4. Meta-Level Probability Fusion

The progressive architecture yields two distinct sets of unbiased probability features. The first, $P_{global}$, represents a stable, low-variance “generalist” consensus derived from nationwide patterns, which may carry high bias. The second, $P_{local}$, represents a high-variance, low-bias “specialist” consensus, capturing fine-grained, scene-specific patterns [53,54].

The final stage of our architecture, Meta-Level Fusion, is designed to resolve this fundamental bias-variance trade-off. It employs a meta-learner that arbitrates between the generalist and specialist perspectives to produce a final, synthesized prediction.

First, we construct the meta-feature matrix, $X_{meta}$, by horizontally concatenating the global and local probability matrices:

$X_{meta} = [P_{global} \mid P_{local}] \qquad (12)$

where $X_{meta} \in \mathbb{R}^{N \times (L + K \cdot L)}$. Given L = 7 labels and K = 6 clusters, this results in an $N \times (7 + 42)$, or $N \times 49$, dimensional feature space.

Second, a final meta-learner is trained on this $X_{meta}$ matrix. We selected Logistic Regression for this task. This choice is deliberate for several reasons:

Robustness: As the meta-features are already probabilities, a linear model is less prone to overfitting this high-dimensional space than complex, non-linear models.

Imbalance Handling: It provides the class_weight = ‘balanced’ parameter, which automatically adjusts weights inversely proportional to class frequencies, mitigating the persistent label imbalance at this final decision stage.

Efficiency: It is computationally lightweight, allowing for rapid and robust cross-validation.

This meta-learner, $f_{meta,l}$, is trained independently for each label l. To ensure a robust and unbiased performance evaluation, the meta-model itself is trained and evaluated within its own K = 5 cross-validation framework.

$P_{final,l} = f_{LogisticR,l}(X_{meta}, \text{class\_weight} = \text{‘balanced’}) \qquad (13)$

The resulting $P_{final} \in \mathbb{R}^{N \times L}$ matrix represents the model’s final, synthesized probability estimates. This matrix, which resolves the bias-variance trade-off by integrating both generalist and specialist knowledge, serves as the final output of the predictive framework.
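To make the fusion inputs concrete, the sketch below builds a placeholder $X_{meta}$ of the stated shape and computes the per-class weights implied by the `balanced` heuristic, which scikit-learn documents as `n_samples / (n_classes * np.bincount(y))`. The zero-filled matrices and the label vector with 1103 positives are illustrative assumptions, and no meta-learner is actually fitted here.

```python
import numpy as np

N, L, K = 24_450, 7, 6

# Zero-filled placeholders standing in for the real OOF matrices.
P_global = np.zeros((N, L))
P_local = np.zeros((N, K * L))
X_meta = np.hstack([P_global, P_local])          # Eq. (12): N x (L + K*L)


def balanced_class_weights(y: np.ndarray) -> np.ndarray:
    """Weights of the 'balanced' heuristic: n_samples / (n_classes * class_count)."""
    counts = np.bincount(y, minlength=2)
    return y.size / (counts.size * counts)


# Hypothetical rare label with the count quoted in Section 2.3.1 (1103 positives).
y_rare = np.zeros(N, dtype=int)
y_rare[:1103] = 1
w_neg, w_pos = balanced_class_weights(y_rare)
print(X_meta.shape, round(float(w_neg), 3), round(float(w_pos), 3))
```

For a label this rare, each positive sample is weighted roughly twenty times more heavily than each negative one in the loss, which is how the final stage counteracts the residual imbalance.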

2.4. Feature Attribution Analysis

The core prediction pipeline (2.3) is a multi-stage, stacking-style learner optimized for predictive accuracy. This design, where the final meta-learner ingests out-of-fold (OOF) probabilities rather than the original 19 geo-environmental predictors, is effective for handling imbalance and scene heterogeneity. However, this complexity weakens direct interpretability, as the final decision surface is formed in a transformed probability space.

To obtain policy-relevant explanations (that is, which terrain, climate, and location factors are consistently associated with honor recognition), we establish an auxiliary, interpretation-oriented model. This model is trained on the same sample set and the same 19 inputs, but uses a tree-based learner (XGBoost) whose feature attributions can be computed directly, producing global and scene-specific importance. This auxiliary model achieved reasonable predictive performance on its own (a 5-fold OOF Macro-F1 of 0.618), suggesting that its learned feature–label relationships are informative enough to support attribution analysis. The explanations reported in Section 3 should thus be read as an attribution to the underlying geospatial factors, not as the internal feature attributions of the stacked TabPFN pipeline.

We employ the XGBoost (Extreme Gradient Boosting) framework for this task. XGBoost is a powerful algorithm that builds an ensemble of decision trees to form a final, highly accurate model. The model’s prediction is the sum of the outputs from all individual trees:

{\hat{y}}_{i} = F (x_{i}) = \sum_{k = 1}^{K} f_{k} (x_{i})

(14)

Each new tree is trained to optimize an objective function that balances the model’s predictive error (Loss) and its complexity (Regularization), which prevents overfitting and improves generalization:

Obj = \sum_{i} l (y_{i}, {\hat{y}}_{i}) + \sum_{k} \Omega (f_{k})

(15)
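Equations (14) and (15) can be made concrete with a minimal sketch. It assumes a squared-error loss for readability (a classifier would normally use a logistic loss) and the penalty form Ω(f) = γT + ½λ Σ w_j², which is the standard XGBoost regularizer but is not spelled out in the text above. The two stumps, their leaf values, and the parameter values are hypothetical.

```python
def predict(trees, x):
    """Equation (14): the prediction is the sum of all tree outputs."""
    return sum(tree(x) for tree in trees)


def squared_loss(y, y_hat):
    """Stand-in loss l(y, y_hat); assumed here for simplicity."""
    return (y - y_hat) ** 2


def objective(trees, leaf_weights, X, y, gamma=0.0, lam=1.0):
    """Equation (15): training loss plus a complexity penalty per tree.

    Assumed penalty: Omega(f) = gamma * T + 0.5 * lam * sum(w_j ** 2),
    where T is the number of leaves and w_j are the leaf values.
    """
    loss = sum(squared_loss(yi, predict(trees, xi)) for xi, yi in zip(X, y))
    penalty = sum(
        gamma * len(w) + 0.5 * lam * sum(v * v for v in w) for w in leaf_weights
    )
    return loss + penalty


# Two hypothetical depth-1 trees (stumps) on a single feature.
def stump_a(x):
    return 0.5 if x > 1.0 else -0.5


def stump_b(x):
    return 0.2 if x > 2.0 else -0.1


trees = [stump_a, stump_b]
leaf_weights = [[0.5, -0.5], [0.2, -0.1]]  # leaf values of each stump

y_hat = predict(trees, 3.0)
obj = objective(trees, leaf_weights, X=[0.0, 3.0], y=[0.0, 1.0], gamma=0.1, lam=1.0)
```

Raising `gamma` or `lam` increases the penalty term without changing the fit, which is the sense in which the objective trades predictive error against complexity.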

This structure provides robust, built-in mechanisms for quantifying feature importance (e.g., Gain or SHAP values). We leverage this capability to perform attribution analysis in two critical contexts, mirroring the scene-aware logic of the primary model:

Global Attribution: An XGBoost model (or a set of single-label models) is trained on the entire national dataset using the 19 predictors. This reveals the average importance of factors like NTL or DEM across the national context.

Scene-Specific Attribution: Separate XGBoost models are trained within each of the “Eco-Social Scene” clusters. This allows for a direct comparative analysis to test the hypothesis of spatial heterogeneity. For instance, it quantifies how the importance of DEM in the QTP cluster (Cluster 1) diverges from its importance in a low-altitude, economically developed cluster.
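The two attribution contexts above share one loop structure: score every predictor once on the pooled sample and once within each scene. The sketch below shows that structure only. It swaps in absolute Pearson correlation as a stand-in importance score, which is not the XGBoost gain or SHAP measure used in the study, and the rows, scene names, and values are invented for illustration.

```python
from collections import defaultdict


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; a stand-in score, not XGBoost gain."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature or label carries no signal
    return abs(sxy) / (sxx * syy) ** 0.5


def importance_table(rows, features, label, score=abs_correlation):
    """Score each feature against one label on the given rows."""
    ys = [r[label] for r in rows]
    return {f: score([r[f] for r in rows], ys) for f in features}


def global_and_scene_importance(rows, features, label, scene_key="scene"):
    """Return (pooled importance, per-scene importance)."""
    by_scene = defaultdict(list)
    for r in rows:
        by_scene[r[scene_key]].append(r)
    global_imp = importance_table(rows, features, label)
    scene_imp = {s: importance_table(rs, features, label) for s, rs in by_scene.items()}
    return global_imp, scene_imp


# Invented toy rows (not study data): NTL separates lowland positives,
# while NTL is constant at 0 in the plateau scene.
rows = [
    {"scene": "lowland", "NTL": 1.0, "DEM": 100.0, "honored": 0},
    {"scene": "lowland", "NTL": 5.0, "DEM": 120.0, "honored": 1},
    {"scene": "lowland", "NTL": 2.0, "DEM": 110.0, "honored": 0},
    {"scene": "lowland", "NTL": 6.0, "DEM": 105.0, "honored": 1},
    {"scene": "plateau", "NTL": 0.0, "DEM": 3200.0, "honored": 0},
    {"scene": "plateau", "NTL": 0.0, "DEM": 2600.0, "honored": 1},
    {"scene": "plateau", "NTL": 0.0, "DEM": 3500.0, "honored": 0},
    {"scene": "plateau", "NTL": 0.0, "DEM": 2700.0, "honored": 1},
]
global_imp, scene_imp = global_and_scene_importance(rows, ["NTL", "DEM"], "honored")
```

In a real run, the `score` argument would be replaced by a function that fits a model per subset and returns its importance values; the comparison across `scene_imp` entries is what tests for spatial heterogeneity.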

2.5. Computing Environment

All experiments were implemented in Python 3.11 using PyTorch 2.2. Training and inference were performed on a workstation running Windows 11, equipped with an Intel Core i9-14900 CPU, 64 GB RAM, and an NVIDIA GeForce RTX 4090 GPU (24 GB VRAM).

3. Results

3.1. Model Performance

All models were trained and evaluated on the national village inventory (N = 24,450) across all seven honor labels. A 5-fold multilabel stratified cross-validation framework was employed to ensure low-prevalence labels were represented in every fold. The finalized, GPU-batched pipeline, consisting of the global learner, scene-specific learners, and meta-fusion, completes a full training and inference cycle in approximately 10 min, confirming its computational efficiency.

Under this evaluation setting, the proposed scene-aware, stacked model achieved a macro-F1 of 0.641 ± 0.005, with a macro-precision of 0.701 and macro-recall of 0.593. The micro-F1 score was 0.728 (micro-precision 0.731, micro-recall 0.725), indicating high stability at the sample level despite the severe label imbalance. The average PR-AUC across all labels was ca. 0.67, reflecting the model’s success in balancing recall for rare classes while maintaining robust precision.
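To make the macro/micro distinction concrete, the following sketch computes both averages from per-label counts and checks that the reported micro-precision and micro-recall are consistent with the reported micro-F1. The per-label counts in the second example are hypothetical and serve only to show why the two averages diverge under imbalance.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_micro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per label."""
    per_label = []
    for tp, fp, fn in counts:
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        per_label.append(f1(p, r))
    macro = sum(per_label) / len(per_label)  # every label counts equally

    tp_all = sum(c[0] for c in counts)
    fp_all = sum(c[1] for c in counts)
    fn_all = sum(c[2] for c in counts)
    p_micro = tp_all / (tp_all + fp_all) if tp_all + fp_all else 0.0
    r_micro = tp_all / (tp_all + fn_all) if tp_all + fn_all else 0.0
    micro = f1(p_micro, r_micro)  # every decision counts equally
    return macro, micro


# Consistency check on the reported micro-precision and micro-recall.
micro_check = round(f1(0.731, 0.725), 3)

# Hypothetical counts: one large, well-predicted label and one rare, weak label.
macro, micro = macro_micro_f1([(800, 100, 100), (10, 10, 30)])
```

Macro-F1 is the mean of per-label F1 scores, so it need not equal the harmonic mean of macro-precision and macro-recall; the micro-F1, by contrast, follows directly from the pooled precision and recall.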

Per-label performance is detailed in Table 3. The results reveal two key findings. First, the scene-aware model outperformed the global-only variant on every label, including the rarest lists (type1, type4), which demonstrates the efficacy of conditioning on eco-social scenes. Second, the high performance on the largest label (type3, F1 ≈ 0.84) indicates that the pipeline does not suffer from premature saturation and can effectively utilize all available data.

Collectively, the macro-F1 of 0.641 represents a robust result for this national-scale, seven-label, highly imbalanced rural-honor prediction task.

To complement the theoretical motivation for selecting TabPFN, we provide a direct empirical comparison against conventional tabular baselines under a unified setting (same dataset, identical predictors, and the same 5-fold OOF protocol). Table 4 reports Random Forest and XGBoost baselines alongside the global-only TabPFN learner and our scene-aware stacked pipeline, thereby contextualizing the added value of the proposed architecture relative to standard tabular learners.

3.2. Ablation Experiments

To disentangle the contribution of each component, we factorized the full pipeline into four orthogonal modules:

A—Global learner (TabPFN-based, trained on all villages, with stratified 5-fold CV); B—Scene-level learners (six ecological–socioeconomic scene specialists); C—Meta-fusion layer (stacked learner on top of global + scene OOF probabilities); D—Label-wise adaptive thresholds (per-honor cutoffs instead of a fixed 0.50).

The full model corresponds to A + B + C + D and is the one reported in Section 3.1 (macro-F1 = 0.641 ± 0.005, micro-F1 = 0.728). All ablations use the same folds, features, and runtime budget; hence drops ≥ 0.009 can be interpreted as substantive, because the inter-fold variability stays around 0.005 (see Table 5).

In addition to the component ablations, we performed sensitivity checks to verify that the gains from scene-aware stacking are not artifacts of an arbitrary partition or a single elevation gradient. When varying the number of clusters under K-Means (K = 4–8), performance remained stable (macro-F1 = 0.636–0.640; micro-F1 = 0.723–0.727), close to the full configuration (macro-F1 = 0.641; micro-F1 = 0.728). Alternative clustering methods yielded comparable results (GMM: macro-F1 = 0.640, micro-F1 = 0.727; agglomerative Ward: macro-F1 = 0.638, micro-F1 = 0.726), indicating robustness to the choice of clustering algorithm. In contrast, a size-matched random partition substantially reduced performance (macro-F1 = 0.624; micro-F1 = 0.713), approaching the no-scene variant (ID 2, macro-F1 = 0.622), which suggests that the improvement is not produced by arbitrary segmentation. A DEM-only partition recovered only part of the gain (macro-F1 = 0.633; micro-F1 = 0.721) but remained below the full multi-variable scenes, supporting that the scenes capture joint DEM–NTL–NDVI–T2M regimes rather than elevation alone. These results (Table 5, IDs 11–18) confirm that the scene-aware gains reflect meaningful eco-social stratification rather than a clustering artifact.
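One plausible way to build the size-matched random partition used as a control is to shuffle the learned scene labels across villages, which keeps every cluster size fixed while breaking any link to the predictors. How the control was actually constructed is not described here, so this is an assumption; the function name and the toy labels are hypothetical.

```python
import random
from collections import Counter


def size_matched_random_partition(scene_labels, seed=0):
    """Shuffle scene labels across villages; cluster sizes stay identical."""
    shuffled = list(scene_labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy assignment of 10 villages to three scenes (illustrative only).
original = [0] * 5 + [1] * 3 + [2] * 2
randomized = size_matched_random_partition(original, seed=42)
sizes_preserved = Counter(randomized) == Counter(original)
```

Because only the assignment changes, any performance gap between the learned scenes and this control can be attributed to the content of the partition rather than to the number or size of the groups.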

This ablation is consistent with the substantive nature of the task: we are not classifying generic remote-sensing scenes, but predicting seven administratively issued, thematically heterogeneous national village honors from a highly imbalanced, spatially non-stationary, and policy-influenced national inventory. Under such conditions, no single “global” decision surface can fit all provinces, eco-zones, and honor lists.

(i): Scene specialization (B) matches spatial–ecological non-stationarity.

When we remove scene-level learners (ID 2), macro-F1 drops from 0.641 → 0.622. This ~0.02 loss is exactly where we expect the national model to struggle: provinces with atypical altitude–climate profiles (QTP, Hengduan), minority-concentrated counties, and low-density northern grassland belts. These areas contribute relatively few positives for several lists, so a global learner trained on dense eastern/central China tends to underfit them. Scene learners re-center the feature–label relation within each eco-social context, which is why the biggest per-label drops appear on type1 and type4—two lists that in practice are tied to historical/cultural or ethnic resources that are spatially clustered, not uniformly scattered across China.

(ii): Meta-fusion (C) reconciles “national” vs. “local” determinants.

A characteristic of this dataset is that some honors are driven by national-level suitability signals (accessibility, tourism potential, development level), while others depend on local or group-specific attributes (ethnic, forestry, heritage preservation). Without a trainable fusion (ID 3 and ID 5), the model must assume that all views are equally informative for all lists, so performance stays in the 0.622–0.630 band. With meta-fusion, the model can upweight the global view for, say, National Rural Governance Demonstration Villages, and upweight specific scenes for Ethnic Minority Characteristic Villages—a pattern that mirrors the actual policy logic of how these lists are issued.

(iii): Label-wise thresholds (D) compensate administrative imbalance.

The seven honor lists have different issuing years, different batch sizes, and different regional coverage, so their positive rates in our inventory differ by an order of magnitude. A fixed 0.50 cutoff implicitly assumes equal base rates and causes under-detection on honors that are historically issued in small batches or only to selected provinces. Removing D (ID 4) drops macro-F1 to 0.628, and the loss is concentrated exactly on those administratively “thin” lists. In other words, D is the piece that aligns the ML outputs with the administrative sparsity of the target.
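The effect of label-wise thresholds can be sketched with a simple grid search that picks, for each honor list, the cutoff that maximizes F1 on out-of-fold probabilities. The selection criterion, the grid, and the toy probabilities are assumptions for illustration; the study's actual thresholding rule may differ.

```python
def f1_at_threshold(y_true, probs, t):
    """F1 of the rule 'predict positive when prob >= t'."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= t and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= t and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < t and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs, grid=None):
    """Return the grid value with the highest F1 (first one on ties)."""
    grid = grid or [i / 100 for i in range(5, 96, 5)]
    return max(grid, key=lambda t: f1_at_threshold(y_true, probs, t))


def labelwise_thresholds(Y, P):
    """Y, P: dicts mapping label name -> list of truths / OOF probabilities."""
    return {label: best_threshold(Y[label], P[label]) for label in Y}


# Toy rare label: 2 positives in 10, with low but well-ranked probabilities.
y_rare = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
p_rare = [0.05, 0.10, 0.08, 0.12, 0.02, 0.07, 0.11, 0.30, 0.09, 0.42]

t_rare = best_threshold(y_rare, p_rare)
f1_fixed = f1_at_threshold(y_rare, p_rare, 0.50)
f1_tuned = f1_at_threshold(y_rare, p_rare, t_rare)
```

In this toy case the fixed 0.50 cutoff selects nothing, while a lower label-specific cutoff recovers both positives, which mirrors the under-detection described for administratively thin lists.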

(iv): Protocol sensitivity shows why Section 3.3 (QTP mapping) needs the full stack.

ID 10 shows that even with all modules on, abandoning multilabel-stratified CV brings macro-F1 down to ~0.63, because rare honors lose representation in some folds. This matters for the later spatial experiments: when we project to the QTP, an area with fewer labeled villages and stronger eco-climatic departures, the model is effectively extrapolating from exactly those rare, scene-dependent positives. The full A + B + C + D configuration is therefore not only the best on national OOF scores, but also the safest for the out-of-distribution, high-altitude inference that follows.

3.3. Spatial Identification of Potential Villages in the QTP

The scene-aware model was extrapolated from the national honor inventory (N = 24,450) to the full set of unlabeled natural villages on the Qinghai–Tibetan Plateau (QTP; 58,372 records). All predictions were generated under the same seven-label, multilabel setting as the training stage, so that “potential” is interpreted as a high model-estimated recognition likelihood under the current honor regime, meaning that the village conforms to the spatial–environmental–socioeconomic profile observed among historically recognized villages rather than representing an intrinsic measure of village value.

At the plateau scale, the model reproduces the multi-honor structure observed nationally. Predicted candidates are not concentrated in one or two “easy” lists; villages resembling the List of Chinese Traditional Villages, National Forest Villages, and National Key Rural Tourism Villages dominate the eastern and south-eastern margins where transport density, thermal conditions, and existing heritage/tourism nodes co-occur, while villages resembling Chinese Ethnic Minority Characteristic Villages appear more frequently in high-altitude and ethnically enriched belts. This indicates that the model did not collapse to a development-only logic when facing sparse, high-altitude inputs.

The spatialization (Figure 7) reveals four coherent concentration zones. First, the eastern fringe of the plateau (Gannan, Songpan, Aba, Garzê) emerges as the largest continuous belt of candidate villages with high predicted recognition likelihood; here elevation is still high but accessibility and human activity are sufficiently strong to support tourism- and forest-oriented honors. Second, the Yarlung Tsangpo–Lhasa–Shannan–Nyingchi corridor shows a dense string of candidates that mirrors the existing recognized villages, confirming that the model can extend known nomination corridors rather than inventing new ones arbitrarily. Third, the Xining–Haidong–Huangnan interface forms a transition cluster where lowland-influenced features meet plateau constraints, producing mixed honor profiles. Fourth, the Hengduan–Yunnan transition zone in the south-east of the QTP yields many ethnic–ecological candidates, consistent with its cultural composition.

Equally important is the large remainder of villages for which the model assigns no honor at the current thresholds. These settlements are typically located in very high, cold, or infrastructure-poor sectors of the plateau, where key drivers learned from the national inventory (lighting, accessibility, proximity to existing cultural/tourism centers) are absent. Their presence in the map should be interpreted narrowly as low model-estimated recognition likelihood under the current honour regime, conditioned on the available predictors and the adopted thresholds. Such low scores do not allow any inference about intrinsic village “value”, because the model cannot observe unmeasured cultural, institutional, or community-capital factors and therefore only reflects statistical patterns embedded in historically recognized cases within the current feature space. Practically, these areas remain priority targets for follow-up verification and data enrichment (e.g., local tourism projects, intangible-heritage records, protected-area status) before a second-round screening.

Overall, this experiment shows that, despite the plateau providing only 923 confirmed honors, a nationally trained, scene-aware model can still produce a geographically plausible shortlist of high-likelihood candidate villages: dense where the plateau most resembles the rest of China, selective where the plateau departs from it, and explicit about areas where current predictors are not yet informative. This shortlist is what later sections turn into survey and nomination priorities.

We therefore interpret these patterns as candidate belts of model-estimated recognition likelihood under the current honour regime, intended to guide prioritization rather than to serve as definitive confirmation of intrinsic merit or eligibility. Independent validation (e.g., expert assessment, field surveys, or linkage to future/historical nomination cycles) remains necessary to substantiate these hypotheses.

3.4. Global and Local Attribution of Driving Factors

The feature attribution analysis revealed a clear global hierarchy of predictors for honor-label recognition. Vegetation status, represented by NDVI, was the single strongest driver, with an average importance about three times higher than any other variable, which confirms that current honor designations in China still follow an ecology-first logic. Geographic context formed the second tier of drivers. Longitude, elevation and night-time light each contributed meaningful explanatory power and together captured regional policy preferences, topographic suitability and development level. Climate-related variables such as temperature, pressure, solar radiation, wind speed and diurnal temperature range contributed only marginally and mainly acted as contextual refinements rather than decisive triggers.

The distributional form of the importance values is more informative than the ranking alone. NDVI and longitude both showed wide, multimodal distributions, which means that their contribution is not uniform across labels. NDVI was very strong for ecology- and heritage-oriented labels such as traditional villages and forest villages, but only moderate for historic towns and governance-related honors. Longitude behaved in the opposite way. Most labels were only weakly affected by east–west position, but ethnic and minority-featured villages showed a marked peak, indicating that their spatial concentration in western and southwestern China is being learned by the model rather than hard coded. Elevation displayed an intermediate pattern. It was useful for tourism-type labels that favor scenic or mountain environments, but less important for labels that depend on administrative recognition. This pattern supports the use of label-specific or scene-specific heads in future versions of the model, because no single feature set explains all honor types equally well (Figure 8).

Local attribution, interpreted here as the spatial and label-conditioned expression of the above factors, helps to explain the heterogeneous prediction surface reported in Section 3.3. In humid lowlands and in the eastern piedmont, ecological indicators already sit in a favorable range, so small variations in NDVI or EVI do not change the predicted label set very much; in these places night light and administrative proximity become more influential because they signal whether a village can satisfy the service and visibility requirements of national lists. In transition belts and in plateau margins, the situation is different. There the model relies more heavily on elevation and longitude to decide whether an otherwise green village is a plausible candidate for tourism- or ethnicity-related honors. In high, cold or infrastructure-poor scenes, the same variables still appear in the attribution profile, but their absolute importance remains low, which is consistent with the view that these areas are under-described rather than intrinsically unsuitable. In other words, the global ranking identifies vegetation as the main entrance condition, while the local attribution shows that geography and development indicators decide which specific honor a village is likely to obtain once the ecological threshold has been met.

4. Discussion

4.1. Spatial Non-Stationarity: A Challenge to “Urban-Centric” AI Models

The spatial non-stationarity revealed by the results poses a direct challenge to the assumptions of spatial homogeneity often implicit in “Urban Future” research.

The descriptive analysis in Section 2.2 already showed that honored villages in China are far from uniformly distributed: about 84% of records fall east of 105° E, while the entire western interior contributes only ~16%. This study now fixes the national master list at 24,450 honored villages and the Qinghai–Tibetan Plateau (QTP) subset at 923 villages, which means the plateau accounts for only 3.78% of all known honors [55].

Under such a severe skew, any model trained on the raw national pool is, by design, a model of the east and the south (i.e., an urbanized, lowland model), and will tend to under-represent high-elevation, low-NTL, culturally specific settlements in the plateau (see Table 6).

Table 6 and the data reports further lead to three key points:

(1): The plateau is not a scaled-down China; it is a different honor regime. Nationally, the two lists that pull the numbers up are the ‘List of Chinese Traditional Villages’ and ‘National Forest Villages’. On the QTP, the picture diverges. The ‘List of Chinese Traditional Villages’ becomes even more dominant, accounting for 44.6% of all plateau honors. The most significant divergence is the collapse of the ‘National Forest Villages’ category, which drops from the second most common label nationally (29.15%) to a minor role on the plateau. In practical terms, a global (urban-centric) classifier may under-represent plateau contexts because it implicitly expects a “traditional + forest” signal. It therefore struggles to capture the QTP’s distinct recognition structure, which relies more heavily on the “Traditional” label and a different set of secondary drivers.
(2): The plateau’s feature space is “bent” (DEM–NTL decoupling). This finding has profound methodological implications for “Urban Remote Sensing.” Nighttime Lights (NTL) is one of the most fundamental proxies in “Urban Future” and AI models. In the national dataset (dominated by urban and peri-urban lowlands), NTL is often a good proxy for “development” or “tourism readiness.” But on the plateau, average elevation jumps to ~2665 m, and simultaneously, the median NTL for honored villages is 0. This weakens the monotonic relationship often learned by models trained primarily on lowland or urban-dominated datasets. This is exactly what the model indicates: a plateau-specific scene learner is required to interpret low-light environments appropriately, distinguishing villages that appear “dark” because they are remote from those that simply lack the observable features commonly associated with nationally recognized villages. If an AI model trained in cities is directly extrapolated to these rural hinterlands that support the city, it may systematically under-represent remote “dark” settlements because the proxy signals commonly associated with recognition in the national dataset are absent [56].
(3): Definition sensitivity means we must borrow national statistical power. In a data-scarce and definition-sensitive subdomain like the QTP, the only stable approach is exactly what we did in Section 2.3: learn the broad, policy-generated patterns on 24,450 records, then let a scene-aware, meta-fused head re-weight them for the high-altitude, culturally rich context. Therefore, scene learners, meta-fusion, and label-wise thresholds are not arbitrary engineering choices, but the minimal set of mechanisms needed to make national patterns derived from the “urban-centric” sample usable on the QTP [57].

4.2. Interpreting Scene-Aware Identification of Candidate Villages on the Tibetan Plateau

Scene-aware identification suggests that the Tibetan Plateau should not be treated as a residual part of the national model but rather as a region where villages can be interpreted within several internally consistent plateau-specific scenes. As shown in Figure 7, the four concentrated areas identified in Section 3.3 (the eastern piedmont, the Yarlung Tsangpo–Lhasa corridor, the Xining–Haidong transition sector, and the Hengduan–Yunnan margin) can be interpreted as areas where the current feature set appears sufficient to capture patterns associated with national honor labels. Areas that remain cold, very high or poorly served by infrastructure may reflect insufficient descriptive information rather than low recognition likelihood. This distinction is important for interpretation because it avoids prematurely excluding villages that may resemble historically recognized cases but remain weakly described in the available feature space [58]. Under this interpretation, under-recognition refers to a potential coverage gap within the same institutional standards—arising from uneven observability, nomination capacity, or documentation reach—rather than a claim that the honor system fails to identify universally valuable villages.

A second implication concerns how well the national honor logic can be transferred. Figure 9 shows that labels with a clear heritage, ecological or landscape basis are reproduced on the plateau with relatively small distortion, while labels that depend on local industry, administrative performance or tourism operation are less stable. This suggests that the model structure is sound, but parts of the plateau require complementary information from provincial and county sources before further verification or nomination consideration. Field verification may therefore focus first on villages that appear in high-density scenes and receive more than one compatible label [59].

The co-occurrence pattern in Figure 10 provides an empirical reference for such prioritization. The plateau still shows a central role of the traditional-village type and the strongest companionship between traditional and forest-related honors. Where Figure 10 shows these stable combinations, the predicted villages may be considered for further verification and documentation. Where only a single governance, tourism or product-oriented label appears in a very sparse or high-altitude scene, the prediction should be kept but reviewed together with local socio-economic data. In practice, we treat these local records as complementary evidence for post-screening verification, whereas the core model is built on nationally consistent indicators to ensure comparability across regions.

On this basis, the results can be interpreted through a three-step prioritization logic. First, consider the scene-dense belts visible in Figure 7 as the primary candidate pool for verification. Second, keep the spatially isolated but co-occurrence-supported villages from Figure 10 and enrich them with local statistics or cultural inventories. Third, mark the scene-poor parts of the plateau as areas where the feature space needs to be extended with plateau-specific indicators such as altitude-adapted livelihoods, religious or monastic landscapes and small-scale ecotourism. In this way the national model is used as scaffolding, while the plateau context can be examined using additional locally relevant indicators in future analyses.

4.3. Interpreting Attributions: From “Urban Proxies” to “Scene-Aware” Intelligence

Our analysis questions the normative interpretation of NTL as a direct proxy for value or development. In the attribution and scene-aware analysis, NTL is discussed as an urban-centric visibility proxy (i.e., evidence/observability signal), and its failure on the plateau is precisely what motivates scene-aware adaptation rather than value-based ranking. The attribution patterns reveal a clear hierarchy: the model learns an “ecology-first, geography-second” logic, while climate and meteorological variables serve mainly as background context rather than as direct triggers.

At the national scale, the drivers fall into three tiers:

(1): Tier 1 (Dominant): Vegetation indices (NDVI) are the strongest driver, with an average contribution about three times higher than any other feature [60].
(2): Tier 2 (Contextual): Longitude, elevation, and Nighttime Lights (NTL) form the second tier. Together, they explain the spatial preferences, topographic suitability, and development level of specific honor lists.
(3): Tier 3 (Weak): Pure climate variables (temperature, radiation, pressure) remain in a weak tier.

However, the violin shapes (Figure 8) make it clear that this hierarchy is not uniform across honor types. For instance, vegetation is very strong for “Traditional” and “Forest” honors, while longitude is disproportionately strong for “Ethnic” honors.

These differences across honor types matter for the “Urban Future” and “Urban-Rural AI” agendas. The need to separate label-specific and scene-specific effects becomes stronger when we look at local attribution, especially on the plateau.

In Section 4.1, we identified a limitation of models that rely heavily on urban-centric proxies (such as NTL): these proxies may become less informative in rural hinterland contexts (such as the QTP).

This attribution analysis provides the solution and the evidence. In eastern, well-connected regions (i.e., urban or peri-urban), ecological indicators are generally good, and honor discrimination comes from “development” or “administrative proximity.” In those scenes, NTL is an effective and powerful “urban proxy” [61].

But on the plateau, where ecological indicators may be just as good, NTL collapses to zero. If we insist on using the national (i.e., urban) “development” proxy (NTL), the model will continue to understate governance and product-oriented honors on the plateau. To achieve the true urban-rural sustainable development that the “Urban Future” depends on, models trained primarily on urban-oriented proxies may require adaptation when applied to rural hinterland contexts.

From a methodological perspective, our local attribution results highlight an important challenge for rural–plateau settings: predictive signals can be non-stationary across eco-social contexts, and widely used urban-centric proxies (e.g., NTL) may lose explanatory power in high-altitude, low-visibility regimes. Rather than treating this as a QTP-specific exception, we view it as evidence that scene-aware modelling benefits from incorporating context-appropriate, substitutable proxies. In practice, this points to a concrete data agenda—where available and reproducible—for enriching the evidence space with scene-relevant socio-economic indicators (e.g., county-level tourism programs, heritage registrations, locally grounded industry or service proxies) to complement or replace failed visibility proxies in plateau regions. Importantly, we emphasize that this interpretation is hypothesis-generating: whether the same proxy-substitution strategy generalizes to other geographies and honour systems requires dedicated validation. Nonetheless, the attribution analysis (Figure 8) provides a practical diagnostic for identifying proxy dependence and a principled direction for iterative feature improvement in future integrated urban–rural modelling efforts.

4.4. Practical Implications, Uncertainty Control, and Future Extensions

The results for the Tibetan Plateau must be interpreted as ranked recommendations rather than definitive inclusion, because the plateau subset contains only 923 honored villages, which is 3.78% of the 24,450 records in the national inventory, and its size changes markedly when elevation or spatial masks are adjusted. This definition sensitivity alone introduces a structural source of uncertainty: a village may move from the training set to the prediction set (or the reverse) purely because the plateau boundary is drawn more narrowly or more broadly.

Uncertainty also arises from label composition. The rest-of-China sample is dominated by the List of Chinese Traditional Villages and National Forest Villages. In contrast, the plateau sample is characterized by an even more extreme dominance of the ‘List of Chinese Traditional Villages’ (44.6%), while other categories, including ‘Chinese Ethnic Minority Characteristic Villages’ (9.8%), play a proportional or minor role. In other words, the model is asked to transfer a pattern learned from majority lowland, development-friendly honors to a minority highland, culture- and ethnicity-oriented context. This transfer is plausible because we trained on the full 24,450 records, but it is inevitably less certain than within-scene predictions in eastern and southern China.

For practice, this means scores on the plateau should be delivered with an explicit tiering. Villages that obtain scene-consistent, high probabilities and whose drivers are mainly actionable (accessibility, tourism connection, services) should be placed in a first tier for field survey and dossier preparation. Villages that score high but whose drivers are mainly non-actionable (elevation, river-valley setting, ethnic composition) should be placed in a second tier, to be validated thematically rather than administratively. Villages in sparsely sampled plateau corridors, where the model has few analogues, should be kept in a third tier even if the absolute score is only moderate, because part of the score may be reflecting data sparsity rather than lack of value. This three-level reading prevents the model from being used as a hard cutoff tool in precisely the region where its statistical support is thinnest.
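The three-level reading can be expressed as a small triage rule. The sketch below is one possible encoding: the cutoffs `high` and `moderate`, the boolean inputs, and the choice to let sparse sampling take precedence over the other two tiers are all assumptions made for illustration, since the text does not fix numerical values or an order of precedence.

```python
def assign_tier(score, drivers_actionable, sparsely_sampled, high=0.7, moderate=0.4):
    """Return 1, 2, or 3 for a plateau village, or None if it is not shortlisted.

    score              -- model-estimated recognition likelihood (0..1)
    drivers_actionable -- True if the main drivers are accessibility,
                          tourism connection, or services
    sparsely_sampled   -- True if the village lies in a corridor with few
                          training analogues
    """
    # Assumed precedence: thin statistical support overrides the score.
    if sparsely_sampled and score >= moderate:
        return 3  # hold for data enrichment, do not use as a hard cutoff
    if score >= high:
        return 1 if drivers_actionable else 2
    return None


tiers = [
    assign_tier(0.90, drivers_actionable=True, sparsely_sampled=False),
    assign_tier(0.90, drivers_actionable=False, sparsely_sampled=False),
    assign_tier(0.50, drivers_actionable=True, sparsely_sampled=True),
    assign_tier(0.30, drivers_actionable=True, sparsely_sampled=False),
]
```

Keeping the cutoffs as parameters makes it straightforward to set them per scene rather than nationally.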

A second implication is that calibration should be performed per scene, not at the national level. If a single probability threshold were adopted for the whole country, high-altitude, low-NTL villages would be systematically under-selected, because the national distribution is dominated by low-elevation, lit settlements. Calibrating thresholds within plateau scenes, as we did in the label-wise component, is a direct way to control spatially induced uncertainty and to make plateau recommendations comparable to those in the rest of China.

Future work can further strengthen the framework by making uncertainty more explicit in the modelling and reporting. Two practical extensions are worth exploring. First, incorporating issuing-year and issuing-authority metadata (where available) may help account for temporal and administrative shifts across the 2003–2023 release span, thereby reducing label noise and separating regime changes from spatial signals. Second, adding a spatially aware uncertainty summary—for example, mapping foldwise variability or prediction dispersion back to geographic space—could provide a transparent indicator of where the model is less confident, potentially due to sparse local evidence or lower data density [62]. Together, these extensions would improve interpretability and support more cautious, evidence-aware prioritization, while remaining consistent with the regime-conditioned scope of this study [62].
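A spatially aware uncertainty summary of the kind proposed above could start from per-village dispersion across fold models. The sketch assumes that each of the K fold-specific models scores every unlabeled village, so that each village has K probabilities for a given label; the village identifiers and probabilities are invented for illustration.

```python
from statistics import mean, pstdev


def foldwise_uncertainty(fold_probs):
    """Summarize per-village mean and spread across fold models.

    fold_probs: dict mapping village id -> list of probabilities,
    one per fold-specific model.
    """
    return {
        village: {"mean": mean(p), "spread": pstdev(p)}
        for village, p in fold_probs.items()
    }


# Invented example: two villages scored by five fold models.
summary = foldwise_uncertainty(
    {
        "village_a": [0.80, 0.82, 0.79, 0.81, 0.80],  # stable across folds
        "village_b": [0.20, 0.75, 0.40, 0.90, 0.55],  # unstable across folds
    }
)
```

Joining the `spread` values to village coordinates would give the map of prediction dispersion described in the text, with high-spread areas flagged for cautious interpretation.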

5. Conclusions

This research demonstrates an AI framework for diagnosing recognition patterns under uneven observability when evaluating the rural hinterlands supporting the “Urban Future.” We show that a scene-aware AI ensemble framework, trained on national (lowland) data, can be transferred to a data-scarce, high-altitude region—the Qinghai–Tibetan Plateau (QTP)—to identify villages with high model-estimated recognition likelihood within the national honor system.

Our core contribution lies in revealing that spatial non-stationarity is not noise, but a structure that must be modeled. The framework’s diagnosis of the “DEM–NTL decoupling” phenomenon, in which the median Nighttime Light (NTL) of recognized QTP villages is 0, highlights the limitations of “Urban AI” models that equate urban proxies (such as NTL) with rural recognition patterns. We demonstrate that through “scene-aware” learning (i.e., prioritizing local drivers over global proxies that fail under plateau conditions), the model can identify four geographically coherent “nomination belts.”

Finally, this work clarifies the path forward for “intelligent urban-rural governance.” It suggests that future progress will depend less on more complex models and more on better, scene-specific variables to replace “lowland proxies” such as NTL that fail under plateau conditions. Making uncertainty explicit and adhering to scene-aware AI design are what allow national systems to be used to identify villages with high model-estimated recognition likelihood that may remain underrepresented within the current recognition system but are important for urban–rural sustainable development.

Author Contributions

G.J.: Conceptualization, Methodology, Formal Analysis, Investigation, Data Curation, Writing—Original Draft, and Visualization; D.Z.: Supervision, Project Administration, Funding Acquisition, and Writing—Review and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by the Qinghai Provincial Social Science Planning Key Self-funded Project (24CZ011).

Data Availability Statement

Original honor lists and environmental datasets are publicly available from official sources. Derived datasets and model outputs are available from the corresponding author upon reasonable request.

Acknowledgments

The authors express their gratitude to the reviewers and the editor for their professional comments and suggestions. Generative AI was not used to generate scientific content; language polishing assistance was limited to style and clarity.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Zhu, Y.; Chai, S.; Chen, J.; Phau, I. How Was Rural Tourism Developed in China? Examining the Impact of China’s Evolving Rural Tourism Policies. Environ. Dev. Sustain. 2024, 26, 28945–28969.
2. Zhu, K. Rural Area and Agricultural Region Revitalisation Modelling. Land Degrad. Dev. 2023, 34, 42–51.
3. Zhou, Z.; Chen, M.; Liu, Z. Research on the Experience Design and Tourist Satisfaction of Rural Tourism in China: A Bibliometric Analysis. In Design, User Experience, and Usability; Marcus, A., Rosenzweig, E., Soares, M., Eds.; Springer: Cham, Switzerland, 2024; Volume 14716, pp. 401–417.
4. Zhu, X.; Li, T.; Feng, T. On the Synergy in the Sustainable Development of Cultural Landscape in Traditional Villages under the Measure of Balanced Development Index: Case Study of the Zhejiang Province. Sustainability 2022, 14, 11367.
5. Zhou, Z.; Zheng, X. A Cultural Route Perspective on Rural Revitalization of Traditional Villages: A Case Study from Chishui, China. Sustainability 2022, 14, 2468.
6. Zhang, K.; Sun, X.; Jin, Y.; Liu, J.; Wang, R.; Zhang, S. Development Models Matter to the Mutual Growth of Ecosystem Services and Household Incomes in Developing Rural Neighborhoods. Ecol. Indic. 2020, 115, 106363.
7. Xu, X.; Xu, Y.; Wu, Q. Empowering Agricultural Landscapes: A Game-Based Simulation for Establishing Tourism Attractions Property Rights in Farmland. J. Clean. Prod. 2025, 494, 144864.
8. Zhong, Z.; Zhang, Y.; Zhang, J.; Su, M. How Information and Communication Technologies Contribute to Rural Tourism Resilience: Evidence from China. Electron. Commer. Res. 2024, 25, 4559–4594.
9. Yuan, Y.; Zhao, W.; Li, H.; Mu, H. Analyzing the Driving Mechanism of Rural Transition from the Perspective of Rural-Urban Continuum: A Case Study of Suzhou, China. Land 2022, 11, 1146.
10. Zhao, M.; Zhou, J.; Mu, J. SWOT Research on the Development of Rural Tourism E-Commerce System under the Background of Big Data Era. Mob. Inf. Syst. 2021, 2021, 8112747.
11. Yu, H. Application of the Artificial Intelligence System Based on Graphics and Vision in Ethnic Tourism of Subtropical Grasslands. Heliyon 2024, 10, e31442.
12. Zheng, X.; Wu, J.; Deng, H. Spatial Distribution and Land Use of Traditional Villages in Southwest China. Sustainability 2021, 13, 6326.
13. Zhang, X.; Yang, H.; An, Y. Spatial Morphology and Geographic Adaptability of Traditional Villages in the Hehuang Region, China. Buildings 2025, 15, 244.
14. Zhao, M.; Ran, S.; Wang, M.; Wang, D. Spatial Differentiation and Mechanism of Effects of Preventing Return to Poverty Driven by Rural Tourism in the Western Minority Region of China: A Case Study of Naxi Ethnic Township, Markam County, Tibet Autonomous Region. Appl. Spat. Anal. Policy 2025, 18, 134.
15. Zhang, H.; Li, F.; Zhang, J.; Liang, H.; Huangfu, Y. Spatial Structure and Influencing Factors of Agricultural Civilization Heritage in the Yellow River Basin. Sci. Rep. 2025, 15, 29836.
16. Zhang, S.; Wu, T.; Guo, L. The Positive Influence of Tibetan Buddhist Monasteries on Forest Conservation. Herit. Sci. 2024, 12, 413.
17. Zhang, H.; Chen, W.; Liu, W.; Liu, Z.; Liu, H. The Evolution of Settlement System and the Paths of Rural Revitalization in Alpine Pastoral Areas of the Qinghai-Tibet Plateau: A Case Study of Nagqu County. Ecol. Indic. 2023, 150, 110274.
18. Yang, Y.; Zhang, L.; Xi, G.; Zhong, C.; Shu, T. The Impact of Digital Technology on the Economic Behavior of Traditional Minority Villages in China. J. Enterp. Communities-People Places Glob. Econ. 2025, 19, 586–609.
19. Zhu, Y.; Li, E.; Su, Z.; Liu, W.; Samat, A.; Liu, Y. A Few-Shot Semi-Supervised Learning Method for Remote Sensing Image Scene Classification. Photogramm. Eng. Remote Sens. 2024, 90, 121–126.
20. Ruiz-Villafranca, S.; Roldan-Gomez, J.; Gomez, J.M.C.; Carrillo-Mondejar, J.; Martinez, J.L. A TabPFN-Based Intrusion Detection System for the Industrial Internet of Things. J. Supercomput. 2024, 80, 20080–20117.
21. Qi, J.; Lu, Y.; Han, F.; Ma, X.; Yang, Z. Spatial Distribution Characteristics of the Rural Tourism Villages in the Qinghai-Tibetan Plateau and Its Influencing Factors. Int. J. Environ. Res. Public Health 2022, 19, 9330.
22. Zhang, X.; Zhu, H.; Li, X.; Hou, B.; Zhao, W.; Yi, X.; Ma, W.; Jiao, L. Recurrent Progressive Fusion-Based Learning for Multi-Source Remote Sensing Image Classification. Pattern Recognit. 2026, 171, 112284.
23. Zhu, X.; Wu, T.; Zhao, L.; Yang, C.; Zhang, H.; Xie, C.; Li, R.; Wang, W.; Hu, G.; Ni, J.; et al. Exploring the Contribution of Precipitation to Water within the Active Layer during the Thawing Period in the Permafrost Regions of Central Qinghai-Tibet Plateau by Stable Isotopic Tracing. Sci. Total Environ. 2019, 661, 630–644.
24. Tian, W.; Peng, X.; Frauenfeld, O.; Weisai, L.; Wei, G.; Chen, G.; Huang, Y. A New Inventory and Future Projections of Thermokarst Lakes in the Permafrost Regions of the Qilian Mountains, Northeastern Qinghai-Tibet Plateau, China. Geomorphology 2024, 462, 109348.
25. Zhang, C.; Xiong, W.; Shao, T.; Zhang, Y.; Zhang, Z.; Zhao, F. Analyses of the Spatial Morphology of Traditional Yunnan Villages Utilizing Unmanned Aerial Vehicle Remote Sensing. Land 2023, 12, 2011.
26. Wang, Y. Analysis of the Impact Mechanism of Multimedia Information Fusion on the Heritage Conservation and Inheritance of Historical and Cultural Villages and Towns. Rev. Adhes. Adhes. 2023, 11, 443–459.
27. Wang, Q.; Bing, H.; Wang, S.; Xu, Q. Study on the Spatial Distribution Characteristics and Influencing Factors of Famous Historical and Cultural Towns or Villages in Hubei Province, China. Sustainability 2022, 14, 13735.
28. Zou, Q.; Sun, J.; Luo, J.; Cui, J.; Kong, X. Spatial Patterns of Key Villages and Towns of Rural Tourism in China and Their Influencing Factors. Sustainability 2023, 15, 13330.
29. Yang, X.; Xu, H. Producing an Ideal Village: Imagined Rurality, Tourism and Rural Gentrification in China. J. Rural Stud. 2022, 96, 1–10.
30. Zhou, Z.; Deng, J.; Wang, P.; Zhou, C.; Xu, Y.; Jiang, W.; Ma, K. Physical Environment Study of Traditional Village Patterns in Jinxi County, Jiangxi Province Based on CFD Simulation. Processes 2022, 10, 2453.
31. Zhang, Y.; Li, J.; Wang, J.; Xin, A. Spatial Correlation between Traditional Villages and Religious Cultural Heritage in the Hehuang Region, Northwest China. J. Asian Archit. Build. Eng. 2025, 24, 3190–3202.
32. Zheng, G.; Jiang, D.; Luan, Y.; Yao, Y. GIS-Based Spatial Differentiation of Ethnic Minority Villages in Guizhou Province, China. J. Mt. Sci. 2022, 19, 987–1000.
33. Xu, J. Community Participation in Ethnic Minority Cultural Heritage Management in China: A Case Study of Xianrendong Ethnic Cultural and Ecological Village. Pap. Inst. Archaeol. 2007, 18, 148–160.
34. Weng, F.; Li, X.; Xie, Y.; Xu, Z.; Ding, F.; Ding, Z.; Zheng, Y. Study on Multidimensional Perception of National Forest Village Landscape Based on Digital Footprint Support-Anhui Xidi Village as an Example. Forests 2023, 14, 2345.
35. Wang, Z.; Qiao, J.; Wang, G.; Zhu, Q.; Wang, W.; Feng, Y. Spatiotemporal Patterns, Regional Differences, and Formation Mechanisms of Demonstration Villages and Towns in China. J. Rural Stud. 2025, 117, 103644.
36. Zhu, W.; Zhang, Z.; Zhao, S.; Guo, X.; Das, P.; Feng, S.; Liu, B. Vegetation Greenness Trend in Dry Seasons and Its Responses to Temperature and Precipitation in Mara River Basin, Africa. ISPRS Int. J. Geo-Inf. 2022, 11, 426.
37. Zhou, W.; Qin, K.; He, Q.; Wang, L.; Luo, J.; Xie, W. Comparison and Optimization of Ground-Level NO₂ Concentration Estimation in China Based on TROPOMI and OMI. Acta Opt. Sin. 2024, 44, 0601010.
38. Zhang, W.; Zhang, S.; Zheng, N.; Ding, N.; Liu, X. A New Integrated Method of GNSS and MODIS Measurements for Tropospheric Water Vapor Tomography. GPS Solut. 2021, 25, 79.
39. Zheng, H.; Lin, H.; Zhou, W.; Bao, H.; Zhu, X.; Jin, Z.; Song, Y.; Wang, Y.; Liu, W.; Tang, Y. Revegetation Has Increased Ecosystem Water-Use Efficiency during 2000-2014 in the Chinese Loess Plateau: Evidence from Satellite Data. Ecol. Indic. 2019, 102, 507–518.
40. Zheng, B.; Myint, S.; Thenkabail, P.; Aggarwal, R. A Support Vector Machine to Identify Irrigated Crop Types Using Time-Series Landsat NDVI Data. Int. J. Appl. Earth Obs. Geoinf. 2015, 34, 103–112.
41. Zhang, Z.; Liu, Y. Spatial Expansion and Correlation of Urban Agglomeration in the Yellow River Basin Based on Multi-Source Nighttime Light Data. Sustainability 2022, 14, 9359.
42. Yin, X.; Jiang, B.; Chen, Y.; Zhao, Y.; Zhang, X.; Yao, Y.; Zhao, X.; Jia, K. Estimating Land Surface All-Wave Daily Net Radiation From VIIRS Top-of-Atmosphere Data. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
43. Sudhakar, V.C.; Reddy, U.G. Impacts of Cement Industry Air Pollutants on the Environment and Satellite Data Applications for Air Quality Monitoring and Management. Environ. Monit. Assess. 2023, 195, 840.
44. Duan, Y.; Liu, X.; Jatowt, A.; Yu, H.; Lynden, S.; Kim, K.; Matono, A. SORAG: Synthetic Data Over-Sampling Strategy on Multi-Label Graphs. Remote Sens. 2022, 14, 4479.
45. Zhao, J.; Liu, N. Semi-Supervised Classification Based Mixed Sampling for Imbalanced Data. Open Phys. 2019, 17, 975–983.
46. Zhao, H.; Zhao, C.; Zhang, X.; Liu, N.; Zhu, H.; Liu, Q.; Xiong, H. An Ensemble Learning Approach with Gradient Resampling for Class-Imbalance Problems. Inf. J. Comput. 2023, 35, 747–763.
47. Zhang, C.; Wang, G.; Zhang, J.; Guo, G.; Ying, Q. IRUSRT: A Novel Imbalanced Learning Technique by Combining Inverse Random Under Sampling and Random Tree. Commun. Stat. Simul. Comput. 2014, 43, 2714–2731.
48. Zhu, X.; Su, P.; Yu, J.; Pei, J.; Teng, Z.; Li, Y.; Liu, Y. A Prediction Model for Hazard Levels of Shallow Natural Gas in Tunnel Based on K-Means Clustering and Tabular Prior-Data Fitted Network. Results Eng. 2025, 27, 106873.
49. Zhang, Q.; Chen, Y.; Jin, L.; Chen, S. Intelligent Identification and Reasoning of Causal Relationships in Texts on Power Production Accidents. Adv. Eng. Inform. 2026, 69, 103977.
50. Zhu, Z.; Zhao, S.; Li, Q.; Shi, Z.; Wu, Y.; Wang, L. Spatiotemporal Evolution and Prediction of Land Use Change and Carbon Storage in Ionic Rare Earth Mining Areas Based on the YOLOv11-SegFormer-InVEST-PLUS Integrated Model. Ecol. Indic. 2025, 178, 113983.
51. Zheng, Z.; Du, S.; Wang, Y.; Wang, Q. Mining the Regularity of Landscape-Structure Heterogeneity to Improve Urban Land-Cover Mapping. Remote Sens. Environ. 2018, 214, 14–32.
52. Zhao, Y.; Huo, A.; Zhao, Z.; Liu, Q.; Zhao, X.; Huang, Y.; An, J. Novel Method for Evaluating Wetland Ecological Environment Quality Based on Coupled Remote Sensing Ecological Index and Landscape Pattern Indices: Case Study of Dianchi Lake Wetlands, China. Sustainability 2024, 16, 9979.
53. Zhu, X.; Hu, J.; Xiao, T.; Huang, S.; Wen, Y.; Shang, D. An Interpretable Stacking Ensemble Learning Framework Based on Multi-Dimensional Data for Real-Time Prediction of Drug Concentration: The Example of Olanzapine. Front. Pharmacol. 2022, 13, 975855.
54. Zhu, L.; Wang, L.; Yang, Z.; Xu, P.; Yang, S. PPSNO: A Feature-Rich SNO Sites Predictor by Stacking Ensemble Strategy from Protein Sequence-Derived Information. Interdiscip. Sci. Comput. Life Sci. 2024, 16, 192–217.
55. Zheng, M.; Luan, H.; Liu, G.; Sha, J.; Duan, Z.; Wang, L. Ground-Based Hyperspectral Retrieval of Soil Arsenic Concentration in Pingtan Island, China. Remote Sens. 2023, 15, 4349.
56. Fox, J.; Magoulick, D. Predicting Hydrologic Disturbance of Streams Using Species Occurrence Data. Sci. Total Environ. 2019, 686, 254–263.
57. Arun, P.V.; Karnieli, A. Reinforced Deep Learning Approach for Analyzing Spaceborne-Derived Crop Phenology. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103984.
58. Zhang, X.; Fan, X.; Wang, G.; Chen, P.; Tang, X.; Jiao, L. MFGNet: Multibranch Feature Generation Networks for Few-Shot Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5609613.
59. Xie, B.; Deng, Y.; Shao, Z.; Xu, Q.; Li, Y. Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 13427–13440.
60. Zou, C.; Chen, D.; Chang, Z.; Fan, J.; Zheng, J.; Zhao, H.; Wang, Z.; Li, H. Early Identification of Cotton Fields Based on Gf-6 Images in Arid and Semiarid Regions (China). Remote Sens. 2023, 15, 5326.
61. Hackländer, J.; Parente, L.; Ho, Y.; Hengl, T.; Simoes, R.; Consoli, D.; Sahin, M.; Tian, X.; Jung, M.; Herold, M.; et al. Land Potential Assessment and Trend-Analysis Using 2000–2021 FAPAR Monthly Time-Series at 250 m Spatial Resolution. PeerJ 2024, 12, 16972.
62. Qin, A.; Luo, B.; Li, Q.; Zou, C.; Zhao, Y.; Song, T.; Gao, C. Local-Descriptors-Based Rectification Network for Few-Shot Remote Sensing Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 9566–9581.

Figure 1. The study area and spatial distribution of natural villages. The top-left panel shows the geographical location of the QTP within China. The bottom-left panel displays the topography and administrative divisions of the QTP. The large panel on the right illustrates the spatial distribution of natural villages across the plateau, where the light blue dots indicate specific village locations.

Figure 2. Spatial Distribution and Density Patterns of Honored Village Samples: China vs. Qinghai-Tibetan Plateau (QTP).

Figure 3. Spatial Distribution Patterns of Honored Villages and Natural Villages in the Qinghai-Tibetan Plateau (QTP).

Figure 4. The ISB-PFN (Imbalance-Constrained Sampling and Bagging) Learner Workflow. The 4.49%/95.51% ratio shown in panel (A) is an illustrative example based on the rarest label (‘National Historical and Cultural Villages’); this information-preserving sampling process is applied independently for each of the seven honor labels.

Figure 5. Workflow for Generating Global Out-of-Fold (OOF) Probability Features.

Figure 6. Conceptual structure of the sparse local probability matrix.

Figure 7. Spatial identification of predicted high-potential villages in the Qinghai–Tibetan Plateau. (Left): Kernel density of predicted honor occurrences; (Right): point-based distribution of candidate villages.

Figure 8. Distribution of Feature Importances across Driving Factors (Violin Plot).

Figure 9. Scene-Aware Distribution of Predicted Honor-Label Frequencies across the Tibetan Plateau.

Figure 10. Scene-Constrained Co-Occurrence Network of Predicted Honor Labels on the Tibetan Plateau.

Table 1. Summary of Information on China’s National Rural Honor Lists.

Category	Batch/Year	Date (Documented/Released)	Issuing Authority
National Famous Historical and Cultural Towns and Villages [26,27]	Batch 1	8 October 2003/1 December 2003	Ministry of Construction, National Cultural Heritage Administration
	Batch 2	16 September 2005/14 November 2005
	Batch 3	31 May 2007/13 June 2007
	Batch 4	14 October 2008/15 October 2008	Ministry of Housing and Urban-Rural Development, National Cultural Heritage Administration
	Batch 5	22 July 2010
	Batch 6	19 February 2014/7 March 2014
	Batch 7	21 January 2019/30 January 2019
National Key Rural Tourism Villages [28,29]	Batch 1	23 July 2019	Ministry of Culture and Tourism, National Development and Reform Commission
	Batch 2	26 August 2020
	Batch 3	25 August 2021
	Batch 4	15 November 2022/7 December 2022	Ministry of Culture and Tourism
List of Chinese Traditional Villages [30,31]	Batch 1	17 December 2012 (released)	Ministry of Housing and Urban-Rural Development, Ministry of Culture, Ministry of Finance
	Batch 2	26 August 2013 (released)
	Batch 3	17 November 2014 (released)	Ministry of Housing and Urban-Rural Development, Ministry of Culture, National Cultural Heritage Administration, Ministry of Finance, Ministry of Land and Resources, Ministry of Agriculture, National Tourism Administration
	Batch 4	9 December 2016 (released)
	Batch 5	6 June 2019 (released)	Ministry of Housing and Urban-Rural Development, Ministry of Culture and Tourism, National Cultural Heritage Administration, Ministry of Finance, Ministry of Natural Resources, Ministry of Agriculture and Rural Affairs
	Batch 6	19 March 2023 (released)
Chinese Ethnic Minority Characteristic Villages [32,33]	Batch 1	23 September 2014/24 March 2017	National Ethnic Affairs Commission
	Batch 2	3 March 2017/24 March 2017
	Batch 3	31 December 2019/14 January 2020
National Forest Villages [34]	Batch 1	25 December 2019/18 January 2020	National Forestry and Grassland Administration
National Forest Villages [34]	Batch 2	31 December 2019/24 January 2020	National Forestry and Grassland Administration
National Model Villages and Towns for Rural Governance	Batch 1	24 December 2019/31 December 2019	Office of the Central Rural Work Leading Group, Ministry of Agriculture and Rural Affairs, Central Propaganda Department, Ministry of Civil Affairs, Ministry of Justice
	Batch 2	23 September 2021/29 October 2021	Office of the Central Rural Work Leading Group, Ministry of Agriculture and Rural Affairs, Central Propaganda Department, Ministry of Civil Affairs, Ministry of Justice, National Administration for Rural Revitalization
	Batch 3	2 November 2023/17 November 2023	Ministry of Agriculture and Rural Affairs, Central Propaganda Department, Ministry of Justice
National “One Village, One Product” Model Villages and Towns [35]	Batch 1	2011 (released)	Ministry of Agriculture
	Batch 2	2012 (released)
	Batch 3	2013 (released)
	Batch 4	2014 (released)
	Batch 5	2015 (released)
	Batch 6	22 July 2016
	Batch 7	18 July 2017
	Batch 8	3 July 2018/20 July 2018	Ministry of Agriculture and Rural Affairs
	Batch 9	24 September 2019/9 January 2020
	Batch 10	20 November 2020/1 December 2020
	Batch 11	10 November 2021
	Batch 12	6 March 2023/7 March 2023

Table 2. Description of External Geo-Environmental Predictors Used for Village-Honor Inference.

Abbrev.	Full Name/Source	Description/Units	Core Relevance
LON	Geographic Longitude	Village longitude (° E)	Spatial reference for sampling rasters; enables eco-social scene assignment.
LAT	Geographic Latitude	Village latitude (° N)	Spatial reference for sampling rasters; enables eco-social scene assignment.
DEM	Digital Elevation Model	Surface elevation (m a.s.l.)	Captures relief/altitude constraints of Tibetan Plateau villages; proxy for climate, accessibility, and policy priority.
T2M	ERA5 2 m Air Temperature	Near-surface air temperature at 2 m (°C)	Controls thermal environment, crop/climate suitability, and comfort; differentiates high-cold vs. warm-humid villages.
TD2M	ERA5 2 m Dewpoint Temperature	Dewpoint temperature at 2 m (°C)	Indicates near-surface moisture availability; helps separate arid/high-elevation from humid/monsoon regimes.
TSK	ERA5 Skin Temperature	Surface/skin temperature (°C)	Describes land–atmosphere coupling and surface energy; complements T2M for plateau areas with large radiation load.
SSR	ERA5 Surface Solar Radiation Downwards	Downwelling shortwave radiation at surface (W m⁻² or daily MJ m⁻²)	Proxy for solar energy, vegetation productivity, and tourism comfort; useful for ranking sunny high-elevation villages.
SP	ERA5 Surface Pressure	Surface pressure (Pa)	Encodes elevation and atmospheric thickness; helps the model distinguish very high-level settlements.
TP	ERA5 Total Precipitation	Accumulated total precipitation (mm)	Hydrological/water-resource constraint for rural development; relates to agro-forestry honors.
U10	ERA5 10 m U-Component of Wind	Zonal wind at 10 m (m s⁻¹)	Describes prevailing west–east ventilation; linked to exposure, pollutant dispersion, and comfort.
V10	ERA5 10 m V-Component of Wind	Meridional wind at 10 m (m s⁻¹)	Describes north–south ventilation; together with U10 forms local wind regime.
WS10	ERA5 10 m Wind Speed	Wind speed at 10 m (m s⁻¹), derived from U10/V10	Captures wind-related comfort/risk; useful for separating open, high-wind plateau sites from sheltered basins.
EVI	MODIS Enhanced Vegetation Index	Annual or multi-year mean EVI (–)	Measures vegetation vigor and ecological quality; key driver for “eco/forest/beautiful countryside” type honors.
NDVI	MODIS Normalized Difference Vegetation Index	Annual or multi-year mean NDVI (–)	Baseline greenness indicator; complements EVI for detecting cultivated vs. natural vegetation around villages.
LST_DAY	MODIS Land Surface Temperature (Daytime)	Daytime LST (°C)	Characterizes daytime thermal stress and urban–rural surface contrast.
LST_NIGHT	MODIS Land Surface Temperature (Nighttime)	Nighttime LST (°C)	Characterizes nocturnal cooling; useful for arid/high-altitude comfort analysis.
DTR	MODIS Diurnal Temperature Range	LST_DAY − LST_NIGHT (°C)	Reflects continentality and surface energy balance; helps identify high-elevation sunny but cold-night villages.
NTL	VIIRS Nighttime Lights (Annual Composite)	Radiance/digital number (nW cm⁻² sr⁻¹)	Proxy for human activity, accessibility, and development level; strongly correlated with “famous/key tourism” lists. Interpreted as an observability/visibility signal rather than a normative value proxy.
NO2	Sentinel-5P Tropospheric NO₂ Column	Tropospheric NO₂ (mol m⁻²)	Indicates anthropogenic emission intensity; helps separate remote, clean plateau villages from peri-urban/industrial ones.
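Two predictors in Table 2 are derived from other columns: WS10 from U10/V10, and DTR as LST_DAY − LST_NIGHT. The Python sketch below shows one plausible way to compute them, assuming WS10 is the vector magnitude of the two wind components. The function names are hypothetical, and the source does not state whether wind speed was computed before or after temporal averaging.

```python
import numpy as np

def wind_speed(u10, v10):
    """10 m wind speed (m/s) as the magnitude of the (U10, V10) vector."""
    return np.hypot(u10, v10)

def diurnal_temperature_range(lst_day, lst_night):
    """DTR (deg C) as daytime minus nighttime land surface temperature."""
    return np.asarray(lst_day, dtype=float) - np.asarray(lst_night, dtype=float)

# Toy values for two villages (hypothetical).
u10 = np.array([3.0, -1.5])
v10 = np.array([4.0, 2.0])
lst_day = np.array([12.0, 25.0])
lst_night = np.array([-8.0, 18.0])

ws10 = wind_speed(u10, v10)
dtr = diurnal_temperature_range(lst_day, lst_night)
print(ws10, dtr)
```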

Table 3. Per-label F1 scores of the proposed model (scene-aware stacked version).

Label	F1 (5-Fold OOF)
National Famous Historical and Cultural Towns/Villages	0.592
National Key Rural Tourism Villages	0.653
Chinese Traditional Villages	0.842
Ethnic Minority Characteristic Villages	0.614
National Forest Villages	0.673
National Rural Governance Demonstration Villages	0.539
One Village One Product Demonstration Villages/Towns	0.563
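As a quick consistency check, macro-F1 can be read as the unweighted mean of the seven per-label F1 scores in Table 3. The short Python sketch below does this arithmetic. The result, about 0.639, sits within the reported 0.641 ± 0.005; the source does not say whether the reported figure averages per-fold scores, so the small gap is left uninterpreted here.

```python
# Per-label F1 scores copied from Table 3.
per_label_f1 = {
    "National Famous Historical and Cultural Towns/Villages": 0.592,
    "National Key Rural Tourism Villages": 0.653,
    "Chinese Traditional Villages": 0.842,
    "Ethnic Minority Characteristic Villages": 0.614,
    "National Forest Villages": 0.673,
    "National Rural Governance Demonstration Villages": 0.539,
    "One Village One Product Demonstration Villages/Towns": 0.563,
}

# Macro-F1: every label counts equally, regardless of how many villages hold it.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))
```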

Table 4. Comparison with conventional tabular baselines under a unified setting.

Method	Scene-Aware Stacking	Macro-F1 (5-Fold OOF)	Δ Macro-F1 vs. Ours
Random Forest (baseline)	No	0.586 ± 0.004	−0.055
XGBoost (GBDT baseline)	No	0.602 ± 0.003	−0.039
TabPFN (global-only baseline)	No	0.611 ± 0.006	−0.030
Ours (TabPFN + scene-aware stacking)	Yes	0.641 ± 0.005	0.000

Table 5. Ablation combinations and robustness checks on the national village inventory (5-fold OOF). The “✓” symbol indicates the inclusion of the corresponding component.

ID	A	B	C	D	Macro-F1	Macro-Precision	Macro-Recall	Micro-F1
1—Full	✓	✓	✓	✓	0.641 ± 0.005	0.701	0.593	0.728
2—B (global + meta + thresholds, no scenes)	✓		✓	✓	0.622 ± 0.006	0.689	0.571	0.712
3—C (global + scenes + thresholds, no meta)	✓	✓		✓	0.630 ± 0.005	0.692	0.589	0.717
4—D (global + scenes + meta, fixed 0.50)	✓	✓	✓		0.628 ± 0.006	0.700	0.566	0.719
5—A + B only (no meta, no adaptive)	✓	✓			0.622 ± 0.006	0.683	0.579	0.711
6—A + D only (global + adaptive)	✓			✓	0.612 ± 0.006	0.674	0.563	0.705
7—A + C + D, no B (meta on global view only)	✓		✓	✓	0.614 ± 0.006	0.678	0.566	0.707
8—A + C only (global + meta, fixed 0.50)	✓		✓		0.605 ± 0.007	0.671	0.549	0.700
9—A only (global, fixed 0.50)	✓				0.603 ± 0.006	0.668	0.547	0.698
10—Full, non-stratified 5-fold	✓	✓	✓	✓	0.630 ± 0.009	0.696	0.582	0.718
11—Full, K-Means scenes (K = 4)	✓	✓	✓	✓	0.636 ± 0.006	0.696	0.586	0.723
12—Full, K-Means scenes (K = 5)	✓	✓	✓	✓	0.639 ± 0.005	0.699	0.589	0.726
13—Full, K-Means scenes (K = 7)	✓	✓	✓	✓	0.640 ± 0.006	0.700	0.590	0.727
14—Full, K-Means scenes (K = 8)	✓	✓	✓	✓	0.637 ± 0.006	0.697	0.588	0.725
15—Full, GMM scenes (K = 6)	✓	✓	✓	✓	0.640 ± 0.006	0.700	0.591	0.727
16—Full, Agglomerative (Ward) scenes (K = 6)	✓	✓	✓	✓	0.638 ± 0.006	0.698	0.589	0.726
17—Full, Random partition (K = 6, size-matched)	✓	✓	✓	✓	0.624 ± 0.006	0.690	0.573	0.713
18—Full, DEM-only partition (K = 6)	✓	✓	✓	✓	0.633 ± 0.006	0.694	0.583	0.721

Table 6. Spatial non-stationarity and definition sensitivity between national China and the QTP.

Item	National China	QTP (Plateau, 78–105° E, 26–40° N, >1500 m)	Notes
Total honored villages	24,450	923	corrected counts
Share of national total	100%	3.78%	923/24,450
Spatial concentration	≈84% east of 105° E	–	national pattern is eastern/southern
Most prevalent lists	List of Chinese Traditional Villages; National Forest Villages	List of Chinese Traditional Villages (~44.6%); Others minor, e.g., Ethnic Minority Villages (9.8%)	QTP label mix ≠ national mix
Very sparse lists	National Famous Historical and Cultural Towns and Villages; some governance/tourism batches	National Famous Historical and Cultural Towns and Villages extremely rare (~24 villages)	administratively/regionally biased issuance
Elevation (mean)	≈630 m	≈2650–2700 m	strong DEM shift
Nighttime Lights (median)	>0	0	many QTP honors are “dark”
Looser QTP masks	–	1636/2333	shows definition sensitivity

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jiang, G.; Zhang, D. Mitigating Urban-Centric Bias to Address the Rural Eligibility Discovery Lag. Land 2026, 15, 535. https://doi.org/10.3390/land15040535

