Article

An Adaptive Transformer-Based Language-Model Framework for Assessing Urban Expansion

1 School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
2 State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China
3 School of Urban Design, Wuhan University, Wuhan 430072, China
4 School of Electronic Information, Wuhan University, Wuhan 430079, China
* Authors to whom correspondence should be addressed.
Land 2026, 15(3), 514; https://doi.org/10.3390/land15030514
Submission received: 14 January 2026 / Revised: 12 March 2026 / Accepted: 12 March 2026 / Published: 23 March 2026

Abstract

Urban expansion is a key driver of land-use change and environmental pressure in rapidly urbanizing regions. Existing assessments of urban expansion often rely on predefined indicator systems and fixed weighting schemes, which limits their adaptability to evolving research priorities and regional contexts. This study develops an adaptive framework for urban expansion assessment by integrating a transformer-based language model with multi-source spatial data. A BERT-based semantic extraction process is used to identify relevant indicators and derive their relative weights from the scientific literature, enabling the construction of a literature-driven Urban Expansion Index (UEI). The framework is applied to the Central Plains Mega-city Region (CPMR), China, to examine spatial patterns and temporal dynamics of urban expansion between 2010 and 2020. Results show that UEI is primarily driven by land-use expansion indicators, while socioeconomic, infrastructure, and environmental indicators jointly reflect the multidimensional nature of expansion processes. Spatial patterns reveal a persistent concentration of high expansion intensity in core cities, alongside heterogeneous environmental responses and gradual outward growth. Changes in UEI display weaker spatial coherence than static levels, indicating differentiated local expansion dynamics. Local spatial autocorrelation analysis further identifies shifting clusters of urban expansion intensity, suggesting a reorganization of expansion centers within the agglomeration over time. By linking transformer-based indicator extraction with spatial analysis, this study advances urban expansion assessment beyond outcome-oriented mapping toward a more adaptive and knowledge-informed approach. The proposed framework is transferable to other mega-city regions and provides a useful tool for supporting territorial spatial planning and sustainable urban development.

1. Introduction

With the continuous advancement of globalization and industrialization, urbanization is accelerating worldwide, and it is projected that by 2050, 68% of the global population (6.6 billion people) will reside in urban areas [1,2]. The concentration of population and industries in cities has driven rapid urban spatial expansion, which not only alters land use patterns but also profoundly affects ecological environments, social structures, and regional development patterns. On the one hand, urban expansion has played a positive role in promoting economic growth, enhancing employment opportunities, and improving infrastructure [3,4]. On the other hand, excessive or unplanned expansion has led to a series of pressing issues, including decreased land use efficiency, over-occupation of farmland and ecological space, intensified urban heat island effects, traffic congestion, aggravated environmental pollution, and increasing social inequality [5,6]. These challenges not only constrain the sustainable development of cities themselves but also pose significant risks to regional and even global ecological security and climate change governance [7].
Existing methods for assessing urban expansion primarily rely on remote sensing and geographic information technologies. For example, Yu and Zhou (2017) [8] quantified the spatiotemporal patterns of urban expansion in the Beijing–Tianjin–Hebei (BTH) region, the Yangtze River Delta (YRD), and the Pearl River Delta (PRD) from 2000 to 2010 based on remote sensing data. Forget et al. (2021) [9] mapped 20 years of urban expansion across 45 cities in Sub-Saharan Africa using multi-sensor satellite imagery (Landsat, Sentinel-1, Envisat, ERS) and volunteered geographic information (OpenStreetMap). However, such approaches face two main limitations. First, traditional approaches tend to focus on the "outcomes" of urban expansion, such as changes in built-up land area and spatial pattern evolution, while providing limited insight into the underlying "drivers" and intrinsic mechanisms. Second, urban expansion is a multidimensional, multiscale, and dynamically evolving process involving economic growth, population mobility, environmental constraints, policy interventions, and social demands. Traditional analytical frameworks often struggle to fully capture the interactions among these factors and their evolving weights over time. Although the volume of scholarly literature on urban expansion has grown substantially in recent years, covering a rich diversity of theoretical perspectives and empirical evidence, systematically integrating this vast knowledge base to construct a more adaptive and comprehensive assessment framework remains an unresolved challenge [10].
In recent years, the rapid development of semantic models has provided new opportunities to address the aforementioned challenges [11]. Transformer-based models, such as BERT (Bidirectional Encoder Representations from Transformers), enable deep semantic understanding and information extraction from large-scale corpora, offering a technical means to automatically identify urban expansion-related indicators from literature and to extract multidimensional drivers along with their dynamic weights [12,13]. This development is consistent with the broader use of language models in urban planning and city science, where recent studies have highlighted their potential for planning support, knowledge synthesis, and decision-oriented analysis [14,15]. Beyond planning workflows, city-oriented human-behaviour research has also begun to explore generative AI and language models as enabling tools for scalable, privacy-aware urban analytics, including perspectives linking fine-grained mobility data to experienced inequalities and discussing language-model-based mobility generation as an emerging direction [16]. Meanwhile, systematic reviews in geospatial science have emphasized both the opportunities, such as semantic extraction and geospatial knowledge support, and the risks, including bias and reliability, associated with integrating language models into spatial research pipelines, reinforcing the need for transparent and reproducible designs when such models are used to support spatial inference [17]. Building on this frontier, this study proposes an adaptive urban expansion assessment framework that integrates literature-driven meta-analysis with BERT-based semantic modeling. The framework first systematically collects and screens relevant studies to extract indicators and their associated weights in a data-driven manner, thereby constructing a dynamic and comprehensive evaluation system that reflects prevailing research emphases. 
It then integrates multi-source spatial and socioeconomic data into a panel database and applies spatial analysis to reveal the spatiotemporal patterns and evolution of urban expansion. The framework is applied to the CPMR, a representative inland urban cluster in China facing rapid expansion alongside agricultural and ecological constraints, to demonstrate its ability to capture heterogeneous urbanization dynamics. By combining semantic knowledge extraction with spatial analysis, this study provides a replicable and scalable approach for examining the multidimensional mechanisms of urban expansion and for supporting sustainable urban and regional planning. The proposed framework uses a transformer-based semantic harmonization step to identify and standardize indicator concepts directly from the literature. This means that the language-model component is not used to generate narrative explanations or predict urban growth directly, but to translate heterogeneous scholarly expressions into an adaptive and updatable indicator system. In this sense, the framework functions as an evidence-synthesis layer between urban expansion theory and measurable spatial variables, allowing the resulting index to evolve with the literature while remaining transparent and reproducible.

2. Method and Data

2.1. Research Region

The CPMR is one of the most important national urban clusters in China and a core region in the country’s new-type urbanization strategy. Located at the geographic heart of China (Figure 1), the CPMR links eastern coastal regions with the central and western hinterlands, forming a key corridor for population mobility, industrial development, and regional logistics [18]. The agglomeration spans 30 cities across Henan, Shanxi, Hebei, Shandong, and Anhui provinces, covering more than 280,000 km² and hosting over 160 million residents, making it one of China’s most densely populated and fastest-growing urban regions [19,20].
Rapid economic restructuring, improvements in transportation networks, and strong demographic inflows have driven significant urban expansion across the CPMR during the past two decades. Recent studies show that major cities in the region—such as Zhengzhou, Luoyang, and Xuzhou—have experienced continuous outward growth, with built-up areas expanding at annual rates often exceeding the national average for inland cities [21,22]. While this expansion has supported economic vitality, it has also intensified the loss of arable land, altered ecosystem patterns, and increased pressure on environmental resources. As the CPMR lies within one of China’s key agricultural production belts, balancing urban expansion with farmland protection and ecological security remains an urgent planning challenge [7].

2.2. Transformer-Based Language-Model Framework

The adaptive evaluation framework for urban expansion (Figure 2a) constructs a dynamic indicator system by integrating literature-derived knowledge with multi-source spatial data. The process begins with a topic search in the Web of Science database, where core concepts and synonyms related to “urban expansion” are defined through preliminary background research. Because the topic search and the transformer-based language-model framework rely on English-language tokenization and an English stopword dictionary, non-English publications are automatically excluded during the text extraction stage. This removes the need for manual language filtering and ensures that the literature corpus remains linguistically consistent. The full set of retrieved items is then compiled, and duplicate records are identified and removed through a post-processing step, allowing the framework to retain only unique, relevant studies.
To ensure transparency and reproducibility, Figure 3 summarizes the complete transformer-based language-model workflow, from corpus construction to indicator extraction and index formation. In this study, the language-model component is implemented as a BERT transformer encoder for phrase-level semantic representation. Specifically, a 12-layer Transformer encoder with multi-head self-attention is used to map each candidate indicator phrase into a contextualized embedding vector, which is subsequently used for semantic similarity matching and clustering. The role of this component is not to generate text or infer causal relationships, but to standardize heterogeneous terminology in the literature and enable consistent and auditable extraction of indicator concepts.
Text preprocessing and phrase detection. After deduplication, we retained the Title and Abstract fields of each record for semantic processing. We applied (i) lowercasing, (ii) punctuation and non-informative symbol removal, and (iii) stopword filtering using a standard English stopword dictionary. Because many urban-expansion indicators appear as multi-word expressions (e.g., “impervious surface”, “built-up area proportion”), we conducted phrase detection to extract candidate indicator terms. Concretely, we extracted noun phrases and compound expressions using rule-based part-of-speech patterns (e.g., adjective–noun and noun–noun compounds) and retained phrases that occurred above a minimum frequency threshold in the corpus to reduce noise. The resulting candidate list constitutes the input vocabulary for the BERT encoder stage.
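The study extracts candidates with rule-based part-of-speech patterns and a frequency threshold, but it does not publish the patterns or the threshold. The sketch below is a rough, dependency-free stand-in that swaps in a different technique: stopword-delimited n-grams, in the spirit of RAKE. These approximate noun-phrase candidates by never letting a phrase span a stopword. The `STOPWORDS` subset, `max_len=3`, and `min_freq=2` are illustrative assumptions, not the study's settings.

```python
import re
from collections import Counter

# Illustrative stopword subset (assumption); the study used a full English list.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is",
             "are", "was", "were", "for", "with", "by", "as", "at", "from",
             "this", "that", "we", "our", "its", "be", "been"}


def preprocess(text):
    """Lowercase, replace punctuation/symbols with spaces, split into tokens.

    Hyphens are kept so that terms such as "built-up" stay one token.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s-]", " ", text)
    return text.split()


def candidate_phrases(text, max_len=3):
    """Yield 1..max_len word n-grams that never cross a stopword."""
    runs, current = [], []
    for tok in preprocess(text):
        if tok in STOPWORDS:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    for run in runs:
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                yield " ".join(run[i:i + n])


def build_vocabulary(docs, min_freq=2, max_len=3):
    """Keep candidate phrases whose corpus frequency reaches min_freq."""
    counts = Counter()
    for doc in docs:
        counts.update(candidate_phrases(doc, max_len))
    return {p: c for p, c in counts.items() if c >= min_freq}
```

Generic unigrams such as "surface" also pass a frequency threshold. In a fuller pipeline, they would be filtered out at the clustering and harmonization stage described below.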
BERT encoding and semantic vector construction. Each candidate phrase was encoded by BERT to obtain a fixed-length semantic vector. For each phrase, we tokenized the text using WordPiece tokenization and fed tokens through the Transformer encoder to obtain contextualized token representations. We then computed the phrase embedding by pooling token-level representations (mean pooling over the last hidden layer) to produce a single semantic vector for that phrase. This embedding-based representation allows semantically similar indicators that are expressed differently across studies (e.g., “urban land expansion” vs. “built-up land growth”) to be identified through similarity in the embedding space.
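Mean pooling over the last hidden layer can be sketched as follows. In a Hugging Face `transformers` setup, the inputs would typically be the model output's `last_hidden_state` for one phrase and the tokenizer's `attention_mask`. Here, small hand-written vectors stand in for real BERT outputs (BERT-base uses 768 dimensions), so the sketch runs without downloading a model. The study does not state whether `[CLS]` and `[SEP]` are included in the average; this sketch assumes they are, because it averages every token the mask marks as real.

```python
def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors where attention_mask == 1 (padding excluded).

    last_hidden_state: seq_len x hidden_dim list of token vectors
    attention_mask: seq_len list of 0/1 flags (0 marks padding)
    Returns one hidden_dim vector representing the whole phrase.
    """
    dim = len(last_hidden_state[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(last_hidden_state, attention_mask):
        if keep:
            for j in range(dim):
                total[j] += vec[j]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


# Toy example: the third token is padding and is ignored.
phrase_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
```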
Semantic clustering and indicator harmonization. We quantified semantic proximity between phrase embeddings using cosine similarity. Candidate phrases were then grouped via semantic clustering (based on the cosine similarity matrix) to form concept clusters corresponding to indicator families. Within each cluster, we consolidated synonymous or near-synonymous terms into a single standardized indicator label, while removing overly generic phrases that lacked operational meaning. To minimize subjective bias, the clustering output was further checked through an expert review step to confirm that merged phrases referred to the same measurable construct (e.g., “road density” vs. “road network intensity”) and to avoid incorrect merges across distinct concepts. This procedure yields a compact, standardized indicator dictionary that is consistent across the literature corpus and is directly usable for downstream quantification.
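The study names cosine similarity but not the clustering algorithm. One simple option, assumed here for illustration, is single-linkage grouping at a fixed similarity threshold, implemented with union-find: any two phrases whose cosine similarity meets the threshold end up in the same cluster. The `threshold=0.85` value and the toy three-dimensional embeddings are hypothetical. Real vectors would come from the mean-pooling step described above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_phrases(embeddings, threshold=0.85):
    """Single-linkage clusters of phrases with cosine similarity >= threshold.

    embeddings: dict mapping phrase -> embedding vector
    Returns a sorted list of sorted phrase clusters.
    """
    phrases = list(embeddings)
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent links to the cluster root, halving paths as we go
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(phrases):
        for b in phrases[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for p in phrases:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


toy = {
    "urban land expansion": [0.9, 0.1, 0.0],
    "built-up land growth": [0.85, 0.15, 0.05],
    "road density": [0.0, 0.2, 0.95],
    "road network intensity": [0.05, 0.25, 0.9],
}
clusters = cluster_phrases(toy)
```

Single linkage can chain loosely related phrases together through intermediate terms. This is one reason the expert review of merged clusters described above remains useful.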
Linking literature indicators to measurable variables and producing outputs. The standardized indicator dictionary is then matched to measurable variables from multi-source datasets (remote sensing, statistical records, and literature-derived datasets) to build the panel database used in the UEI calculation. Importantly, not all indicators are derived from remote sensing imagery; rather, remote sensing contributes primarily to land-use/land-cover and surface-environment indicators, while socioeconomic and infrastructure indicators are integrated from official statistics and curated databases.
Once titles and abstracts are collected, the curated corpus is processed using a transformer-based language model, which extracts potential expansion-related indicators and assigns preliminary weights based on contextual importance within the literature. This automated extraction captures a wide range of demographic, economic, environmental, land-use, and policy drivers, and reduces selection bias associated with manually curated indicator lists. Indicator terms are then cleaned, harmonized, and normalized to merge synonyms, eliminate repeated concepts, and standardize terminology across themes and periods.
Rationale for using BERT rather than traditional topic models (e.g., LDA). Our objective is to extract indicator-level concepts (often expressed as multi-word phrases) and to harmonize semantically equivalent expressions across studies (e.g., "built-up area", "impervious surface", "urban land"). LDA is effective for discovering broad latent topics, but it relies on a bag-of-words representation and the assumption of word exchangeability, which makes it sensitive to vocabulary variation and less suitable for phrase-level concept alignment. In practice, LDA often disperses semantically similar indicator phrases across multiple topics and requires additional manual interpretation to map topics to a consistent indicator list. In contrast, BERT provides contextual embeddings that preserve meaning under different wording, supporting semantic clustering and synonym merging and thereby reducing indicator fragmentation. This property is important for building a stable and interpretable indicator dictionary that can be updated as the literature evolves.
The second component of the framework links the cleaned indicator system to three categories of data inputs: open-source spatial data, statistical records, and literature-derived datasets. These combined sources provide comprehensive coverage of built-up area changes, socioeconomic conditions, infrastructure development, and environmental constraints across the study region. All variables are standardized to ensure comparability across spatial units and indicator categories.
In the final step, normalized indicators are weighted according to their literature-based relevance, forming a composite UEI for each spatial unit. The UEI is then mapped to visualize spatial patterns and intensities of urban expansion, enabling the identification of hotspots, transition zones, and cross-city differences within the mega-city region. Through this integrated process, the framework offers an adaptive, transparent, and scalable approach for evaluating urban expansion across large and complex regions.
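The sketch below illustrates this final step. It assumes each indicator is rescaled to [0, 1] by min-max normalization and that the weights already sum to 1; the study does not state its exact normalization. The indicator names echo Table 1, but the values are invented for illustration. The sketch also treats every indicator as positively oriented, which is an assumption.

```python
def min_max(values):
    """Rescale a list of values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def compute_uei(indicators, weights):
    """Composite index: weighted sum of min-max normalized indicators.

    indicators: dict name -> list of raw values, one per spatial unit
    weights: dict name -> normalized weight (assumed to sum to 1)
    Returns one UEI value per spatial unit.
    """
    n_units = len(next(iter(indicators.values())))
    uei = [0.0] * n_units
    for name, raw in indicators.items():
        norm = min_max(raw)
        for i in range(n_units):
            uei[i] += weights[name] * norm[i]
    return uei


# Three hypothetical cities, two hypothetical indicators.
uei = compute_uei({"GAIA": [10, 30, 50], "GDP": [200, 100, 300]},
                  {"GAIA": 0.6, "GDP": 0.4})
```

Because every normalized indicator lies in [0, 1] and the weights sum to 1, the resulting UEI also lies in [0, 1]. This keeps values comparable across cities and years only if the same normalization bounds are applied to both years, a detail the text leaves open.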

2.3. Topic Search Selection and Data Collection

We developed a standardized procedure to identify keywords directly related to “urban expansion” and to construct a Topic Search (TS) formula that captures a comprehensive body of literature for further analysis. The procedure consists of three steps: (1) identifying core concepts and their synonymous expressions; (2) refining the keyword list through background reading; and (3) iterative testing and adjustment. In the first step, all relevant terms associated with urban spatial growth were collected, including variations used in geography, remote sensing, planning, and urban studies. The final set of TS keywords reflects the breadth of terminology commonly used to describe expansion processes. The Topic Search was defined as:
  • TS = (“urban expansion”) OR (“urban growth”) OR (“urban sprawl*”) OR (“built-up expansion*”) OR (“urban land use change*”) OR (“urban development patterns*”).
The Web of Science (WoS) database was selected as the primary source for literature acquisition due to its comprehensive coverage of multidisciplinary urban research [23,24]. Using the WoS API, we retrieved full metadata records in October 2025, including titles, abstracts, keywords, publication information, author affiliations, and cited references. All records were downloaded in JSON format to support consistent data handling during the subsequent cleaning and extraction stages.
Because both the TS query and the BERT model rely on English-language tokenization and stopword rules, non-English records are automatically filtered out during metadata processing. This ensures linguistic consistency without the need for manual language removal. After metadata extraction, the full collection of retrieved items was compiled and deduplicated to eliminate repeated entries. This produced a refined literature dataset that provides a reliable foundation for indicator identification and weight extraction. The resulting corpus forms the basis for constructing the indicator system used in this study. It supports the automated extraction of urban expansion drivers and ensures that the dataset reflects the diversity of research perspectives across disciplines and time periods.
During the extraction of potential indicators, two main challenges were encountered. First, high-frequency functional words—such as articles, prepositions, and auxiliary verbs (e.g., “a,” “the,” “is,” “to”)—carry limited semantic meaning but may dominate token counts, thereby introducing noise into frequency-based statistics. Second, many urban expansion indicators are expressed as compound phrases rather than single words, including multi-word expressions that describe land-use processes, spatial forms, or development mechanisms. Simple tokenization based on whitespace would fragment these expressions and reduce their interpretability. To address these issues, the framework applies English stopword filtering and subword-aware processing, enabling the extraction of semantically meaningful indicator phrases while excluding low-information tokens from the potential indicator pool.
After compiling the literature corpus, titles and abstracts were processed using a transformer-based language model to identify terms and phrases that received the highest level of attention across studies. Candidate indicators were identified by encoding detected phrases into BERT embeddings and ranking them by the cumulative literature weight (Equation (1)), followed by semantic clustering in the embedding space to merge near-synonymous expressions. For each article, the relative importance of a keyword or phrase was quantified using a weighting scheme that considers both its frequency and its contextual prominence within the text. Specifically, keywords appearing in titles were assigned greater importance than those appearing only in abstracts, as titles typically reflect the core focus of a study, whereas abstracts may include broader background or discussion elements [24].
The weight of each keyword within a single article was calculated as:
$$ W = N_k \times \frac{E_t}{N_t} + N_k \times \frac{E_a}{N_a} \tag{1} $$
where $k$ denotes the keyword, $t$ denotes the title, and $a$ denotes the abstract. $W$ represents the keyword weight, $N_k$ is the number of words in the keyword, $E_t$ and $E_a$ are the occurrence frequencies of the keyword in the title and abstract, respectively, and $N_t$ and $N_a$ are the total numbers of words in the title and abstract.
By iterating this calculation across all articles, cumulative weights were obtained for each candidate indicator. This approach serves two purposes. First, it captures the relative relevance of indicators by integrating their distribution across the literature. Second, it accounts for keyword length, recognizing that longer phrases often convey more specific and informative meanings in the context of urban expansion. As a result, the weighting process emphasizes indicators that are both frequently discussed and semantically rich.
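The per-article weight and its accumulation across the corpus can be sketched as follows. The sketch assumes Equation (1) reads $W = N_k \times (E_t/N_t + E_a/N_a)$, so that one occurrence in a short title outweighs one occurrence in a longer abstract. Whitespace tokenization and exact phrase matching are simplifying assumptions; the study's tokenizer and matching rules may differ.

```python
from collections import defaultdict


def count_phrase(tokens, phrase_tokens):
    """Count (possibly overlapping) occurrences of a phrase in a token list."""
    n = len(phrase_tokens)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tokens[i:i + n] == phrase_tokens)


def article_weight(phrase, title, abstract):
    """Per-article weight W = N_k * (E_t / N_t + E_a / N_a)."""
    k = phrase.lower().split()
    t = title.lower().split()
    a = abstract.lower().split()
    n_k = len(k)                                   # words in the keyword
    title_term = count_phrase(t, k) / len(t) if t else 0.0
    abstract_term = count_phrase(a, k) / len(a) if a else 0.0
    return n_k * (title_term + abstract_term)


def cumulative_weights(phrases, articles):
    """Sum per-article weights over all (title, abstract) pairs."""
    totals = defaultdict(float)
    for title, abstract in articles:
        for p in phrases:
            totals[p] += article_weight(p, title, abstract)
    return dict(totals)


corpus = [
    ("Urban sprawl in Zhengzhou",
     "We measure urban sprawl and road density with urban sprawl metrics"),
    ("Road density", "Road density matters"),
]
weights = cumulative_weights(["urban sprawl", "road density"], corpus)
```

For "urban sprawl" in the first article, $N_k = 2$, $E_t = 1$, $N_t = 4$, $E_a = 2$, and $N_a = 11$, so $W = 2 \times (1/4 + 2/11) \approx 0.864$.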
Following the initial weighting, all candidate indicators were reviewed to address semantic overlap. Indicators with equivalent meanings but different syntactic forms—such as singular and plural variants or closely related expressions—were merged into unified indicators. This harmonization step reduced redundancy and improved conceptual clarity within the indicator system. The final set of indicators and their normalized weights formed the basis for constructing the adaptive urban expansion assessment framework used in subsequent analysis.

2.4. UEI Calculation

The weight values derived from the transformer-based keyword analysis constitute a central component of the urban expansion assessment framework. These weights are based on the assumption that the magnitude of an indicator’s weight is proportional to its relevance in explaining urban expansion processes as reflected in the literature. Indicators that are more frequently discussed and contextually emphasized across studies are therefore considered more influential in characterizing expansion dynamics.
To ensure comparability across indicators, all weights were normalized to a common scale prior to index construction. This normalization step prevents individual indicators with large raw weights from dominating the composite index solely due to scale differences. The normalized weights were then used as the final coefficients in the calculation of the UEI.
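The text says only that weights are brought to a common scale. If that normalization is a simple sum-to-one rescaling (an assumption), it amounts to dividing each cumulative weight by their total. The raw values below are hypothetical.

```python
def normalize_weights(raw):
    """Rescale non-negative cumulative weights so they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return {name: value / total for name, value in raw.items()}


normalized = normalize_weights({"LULCP": 6.0, "GAIA": 3.0, "GDP": 1.0})
```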
After normalization, each indicator value was multiplied by its corresponding weight, and the weighted indicators were aggregated to generate the UEI for each spatial unit. This procedure allows the UEI to capture the combined effects of multiple expansion-related dimensions, including land-use change, demographic pressure, infrastructure development, and environmental constraints. By grounding the weighting scheme in a literature-driven and data-based process, the framework limits subjective judgment in the weighting while remaining adaptive to changing research priorities and planning contexts.
After calculating UEI for each spatial unit, we mapped UEI values to visualize the spatial pattern and intensity of urban expansion across the mega-city region. Mapping serves two purposes: (i) to provide an intuitive representation of cross-city differences in expansion intensity, and (ii) to support subsequent spatial diagnostics that assess whether high (or low) UEI values tend to cluster geographically rather than appearing randomly distributed. Because urban expansion is shaped by spatial spillovers, regional coordination, and shared infrastructure and policy contexts, we treat spatial pattern analysis as a necessary complement to index construction.
To evaluate spatial dependence, we defined a spatial weights matrix W (not to be confused with the keyword weight in Equation (1)) describing neighborhood relationships among spatial units. Consistent with common practices in regional and urban studies, W can be specified using contiguity-based adjacency (e.g., first-order contiguity) or distance-based neighborhoods (e.g., k-nearest neighbors) depending on the geometry and connectivity of the study units. The UEI vector and the weights matrix together form the basis for the spatial autocorrelation statistics reported in Section 3.
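The distance-based option can be illustrated with a row-standardized k-nearest-neighbour matrix built from city centroids, as sketched below. The default `k=4` and the coordinates are hypothetical. The coordinates are assumed to be in a projected coordinate system so that Euclidean distance is meaningful; contiguity-based weights would instead require the city boundary polygons.

```python
import math


def knn_weights(coords, k=4):
    """Row-standardized k-nearest-neighbour spatial weights matrix.

    coords: list of (x, y) city centroids in a projected CRS (e.g., metres).
    Returns an n x n list of lists; each row has k entries equal to 1/k.
    """
    n = len(coords)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of units")
    w = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Distances to every other unit; ties are broken by unit index
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            w[i][j] = 1.0 / k
    return w


example_w = knn_weights([(0, 0), (1, 0), (0, 1), (10, 10)], k=2)
```

Row standardization makes the spatial lag of a unit the plain average of its neighbours' values. Note that k-nearest-neighbour relations are not necessarily symmetric: unit A may count B as a neighbour without the reverse holding.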
We applied Local Moran’s I to identify local clusters and spatial outliers of UEI. This statistic classifies each spatial unit into regimes such as high–high (HH) clusters, low–low (LL) clusters, and potential outliers (high–low or low–high), enabling the identification of spatially coherent expansion “cores” and transition zones. In parallel, we used the Getis–Ord Gi* statistic to detect statistically significant hotspots and cold spots of UEI intensity, which provides a complementary perspective that emphasizes the spatial concentration of high (or low) UEI values. Using both approaches helps reduce method dependence and supports a more robust interpretation when their spatial signals are consistent. We note that spatial autocorrelation statistics can be sensitive when the number of spatial units is limited and when neighborhood definitions vary; therefore, we use Local Moran’s I and Getis–Ord Gi* primarily as exploratory diagnostics to describe clustering patterns of UEI and interpret the resulting regimes cautiously in the context of the study area’s sample size.
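The two statistics can be sketched in plain Python using their usual forms. Local Moran's I is computed as $I_i = (z_i/m_2)\sum_j w_{ij} z_j$, with $z_i = y_i - \bar{y}$ and $m_2 = \sum_i z_i^2/n$. Getis–Ord Gi* uses its standardized form, with each unit included in its own neighbourhood. This sketch returns only the statistics and the HH/LL/HL/LH quadrant labels. It does not compute pseudo p-values, which in practice come from conditional permutation (for example, as implemented in PySAL's `esda` package). Binarizing the weights for Gi* is a simplifying assumption.

```python
import math


def local_moran(y, w):
    """Local Moran's I per unit plus its HH/LL/HL/LH quadrant label.

    y: list of UEI values; w: n x n spatial weights (e.g., row-standardized).
    """
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]                  # deviations from the mean
    m2 = sum(v * v for v in z) / n             # second moment of deviations
    if m2 == 0:
        raise ValueError("constant values: Local Moran's I is undefined")
    lag = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    local_i = [z[i] * lag[i] / m2 for i in range(n)]
    # Label: sign of own deviation, then sign of the neighbours' spatial lag
    quadrants = [("H" if z[i] > 0 else "L") + ("H" if lag[i] > 0 else "L")
                 for i in range(n)]
    return local_i, quadrants


def getis_ord_gi_star(x, w):
    """Standardized Gi* z-scores with binary weights; w_ii is set to 1."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    scores = []
    for i in range(n):
        wi = [1.0 if (j == i or w[i][j] > 0) else 0.0 for j in range(n)]
        sum_w = sum(wi)
        sum_w2 = sum(v * v for v in wi)
        numerator = sum(wi[j] * x[j] for j in range(n)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores
```

With only about 30 cities, a handful of units can dominate these statistics. This is consistent with the cautious, exploratory reading described above.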

2.5. Data

To operationalize the literature-derived indicators, multi-source spatial and statistical datasets were collected and integrated at the city level. Urban physical expansion was characterized using the Global Artificial Impervious Area (GAIA) V10 dataset, which provides annual global impervious surface maps at a 30 m spatial resolution for the period 1985–2018 (https://data-starcloud.pcl.ac.cn/iearthdata/, accessed on 1 September 2025). GAIA was used to derive the proportion of impervious surface and its temporal change as a core indicator of built-up expansion.
Urban land-use structure and land-use change were represented using the 30 m annual land cover dataset of China (1990–2020), which provides consistent land-use classifications and change information at the national scale (Earth System Science Data). This dataset was used to calculate land-use and land-cover change intensity and to derive green space indicators. Socioeconomic development was quantified using gridded gross domestic product (GDP) and population density datasets at a 1 km spatial resolution, provided by the Chinese Academy of Sciences Resource and Environment Science Data Center (RESDC). The GDP dataset represents per capita economic intensity, while the population dataset captures demographic concentration and urbanization pressure.
Environmental response indicators were derived from multiple sources. Carbon emissions were obtained from the Emissions Database for Global Atmospheric Research (EDGAR), which provides gridded emission estimates at a 1 km resolution (https://edgar.jrc.ec.europa.eu, accessed on 1 September 2025). Urban thermal conditions were characterized using land surface temperature (LST) data derived from MODIS (MOD11A2), focusing on summer months to capture peak thermal conditions, with spatial resolution resampled to match other datasets. In addition, human thermal comfort was represented using the HiTIC-Monthly dataset, which provides a 1 km resolution human thermal index for China from 2003 to 2020 [25]. Urban infrastructure and public service capacity were represented using statistical indicators obtained from the China Urban Construction Statistical Yearbook, including gas supply coverage, water supply coverage, road density, and per capita green space. These indicators were compiled at the city level and aligned with spatial datasets through administrative boundaries. All raster-based datasets were spatially aggregated to city boundaries using area-weighted statistics, while statistical indicators were directly assigned to corresponding cities. To ensure comparability across indicators with different units and scales, all variables were normalized prior to integration into the UEI.
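Area-weighted aggregation from grid cells to city polygons reduces to a weighted mean once each cell's overlap area with each city is known. The sketch below assumes those overlaps were precomputed in a GIS, for example by intersecting the cell grid with the administrative boundaries. It treats `None` as a nodata cell. The city names come from the study area, but the cell values and areas are invented.

```python
from collections import defaultdict


def area_weighted_mean(overlaps):
    """Area-weighted mean of raster cell values per city.

    overlaps: iterable of (city, cell_value, overlap_area) tuples, where
    overlap_area is the part of the grid cell inside the city boundary.
    Cells with value None (nodata) are skipped.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for city, value, area in overlaps:
        if value is None:
            continue
        num[city] += value * area
        den[city] += area
    return {city: num[city] / den[city] for city in num if den[city] > 0}


# Hypothetical LST cells (degrees C) with overlap areas in km^2.
city_lst = area_weighted_mean([
    ("Zhengzhou", 30.0, 1.0),
    ("Zhengzhou", 34.0, 0.5),   # cell only half inside the boundary
    ("Luoyang", 28.0, 1.0),
    ("Luoyang", None, 1.0),     # nodata cell, ignored
])
```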

3. Results

3.1. Framework and Weight Distribution of UEI

The composition and weight distribution of the UEI are reported in Table 1. The indicator system was constructed following the adaptive evaluation framework illustrated in Figure 2, which integrates literature-driven indicator identification with multi-source spatial data.
First, a topic search (TS) strategy was developed to retrieve urban expansion-related studies from the Web of Science (WoS) Core Collection. Core concepts and their synonyms were identified through background review, iteratively refined, and tested to ensure comprehensive coverage of urban expansion processes. Based on this TS strategy, a total of 48,187 peer-reviewed publications were retrieved. Duplicate records and non-English publications were removed, and titles and abstracts were collected as the input corpus for subsequent analysis. Second, the curated literature corpus was processed using a transformer-based language model to identify potential indicators and to estimate their relative importance based on semantic relevance. At this stage, indicators listed in Table 1 were not predefined search terms, but rather the result of automated semantic extraction and clustering. "Gas" represents a city-level proxy for gas-supply infrastructure capacity and was obtained from official statistical and infrastructure datasets rather than extracted from remote sensing imagery; because gas pipelines are underground, remote sensing cannot observe this indicator. The gas indicator was standardized at the city level and integrated with the other socioeconomic and infrastructure variables in the panel database. Indicator cleaning and harmonization were then conducted to merge synonymous concepts and remove redundancy. The resulting indicator groups represent the dominant dimensions of urban expansion discussed in the literature, while the descriptions in Table 1 provide concise interpretations of these groups.
Following indicator identification, normalized weights were assigned based on their relative prominence in the literature. Indicators related to spatial expansion, including land-use and land-cover change intensity (LULCP) and built-up land proportion (GAIA), account for a substantial share of the total weight. This reflects their direct correspondence with observable changes in urban form captured by remote sensing data and their central role in urban expansion studies.
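The weighting step can be sketched under one assumption: that prominence is measured by a non-negative score per harmonized indicator, for example the number of documents in which it was detected. Normalized weights then follow from dividing each score by the total, w_k = s_k / Σ_j s_j. The counts below are hypothetical and are not the values reported in Table 1.

```python
def prominence_weights(scores):
    """Normalize non-negative prominence scores so the weights sum to 1."""
    if any(s < 0 for s in scores.values()):
        raise ValueError("prominence scores must be non-negative")
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one indicator needs a positive score")
    return {name: s / total for name, s in scores.items()}


# Hypothetical document counts per harmonized indicator (not the values in Table 1).
counts = {"LULCP": 400, "GAIA": 300, "GDP": 150, "Population": 100, "LST": 50}
weights = prominence_weights(counts)
print(weights)  # {'LULCP': 0.4, 'GAIA': 0.3, 'GDP': 0.15, 'Population': 0.1, 'LST': 0.05}
```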
Socioeconomic indicators, such as gross domestic product (GDP) and total population size, capture demand-driven forces underlying urban growth. Their moderate and stable weights indicate consistent recognition across studies. Infrastructure-related indicators, including road density and public service capacity (water and gas supply), represent the physical support systems associated with expanding built-up areas. Environmental response indicators, including land surface temperature, carbon emissions, and green space coverage, were incorporated to reflect environmental feedbacks accompanying urban expansion. Although these variables do not directly quantify spatial growth, their inclusion and assigned weights indicate that environmental impacts are widely treated as integral components of expansion processes.

3.2. Spatial Patterns of Individual Indicators Across the CPMR in 2010 and 2020

Figure 4 shows the spatial distribution of the UEI across the CPMR in 2010 and 2020, together with its change over the 2010–2020 period. Indicator values were first integrated at the city level from multi-source datasets. For raster-based indicators, including built-up area, GAIA, LST, carbon emissions, and land-use-related variables, grid-level values were aggregated to administrative city boundaries using area-weighted statistics. Socioeconomic and infrastructure indicators obtained from statistical records were assigned directly to the corresponding cities. All indicators were then normalized to ensure comparability and mapped to visualize relative spatial differences across the agglomeration.
In 2010, UEI values exhibit clear spatial differentiation. Higher levels are concentrated in a limited number of central cities, while most peripheral cities display relatively low expansion intensity. This pattern indicates that urban expansion at the beginning of the study period was strongly centered on core urban areas. By 2020, the area of high UEI values expands and becomes more continuous across the region. Several cities surrounding the original core show marked increases in expansion intensity, suggesting a gradual outward diffusion of urban growth from central cities toward adjacent areas. Core cities continue to exhibit the highest UEI levels, but the contrast between core and peripheral areas is smaller than in 2010.
The UEI change map highlights substantial spatial heterogeneity in expansion dynamics. Cities along major development corridors and near core urban centers experience stronger increases in UEI, while more remote areas show relatively modest changes. This uneven pattern indicates that urban expansion within the agglomeration proceeds at different rates, reflecting spatially differentiated development trajectories rather than uniform regional growth.
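Given city-level normalized indicators and literature-derived weights, the UEI maps and the 2010–2020 change map can be reproduced in outline. This sketch rests on several assumptions:

- The UEI is a linear weighted sum, UEI_i = Σ_k w_k x_ik.
- Each indicator is normalized jointly over both years, so that the change ΔUEI_i = UEI_i(2020) − UEI_i(2010) is measured on a common scale.
- Every indicator is treated as positively oriented. An indicator meant to lower the index, such as green space, would need its direction reversed.
- The city names and values are hypothetical.

```python
def pooled_min_max(panel, indicator):
    """Min-max normalize one indicator over all (city, year) records jointly."""
    vals = [record[indicator] for record in panel.values()]
    lo, hi = min(vals), max(vals)
    span = hi - lo
    return {key: (rec[indicator] - lo) / span if span else 0.0
            for key, rec in panel.items()}


def compute_uei(panel, weights):
    """Weighted linear sum of pooled-normalized indicators per (city, year)."""
    scores = {key: 0.0 for key in panel}
    for indicator, w in weights.items():
        for key, x in pooled_min_max(panel, indicator).items():
            scores[key] += w * x
    return scores


def uei_change(scores, start=2010, end=2020):
    """UEI difference between two years for every city present in both."""
    cities = sorted({city for city, _ in scores})
    return {c: scores[(c, end)] - scores[(c, start)]
            for c in cities if (c, start) in scores and (c, end) in scores}


# Hypothetical two-indicator panel for two cities.
panel = {
    ("CityA", 2010): {"GAIA": 0.20, "GDP": 100.0},
    ("CityA", 2020): {"GAIA": 0.40, "GDP": 300.0},
    ("CityB", 2010): {"GAIA": 0.10, "GDP": 50.0},
    ("CityB", 2020): {"GAIA": 0.15, "GDP": 100.0},
}
weights = {"GAIA": 0.6, "GDP": 0.4}
scores = compute_uei(panel, weights)
change = uei_change(scores)
print({c: round(d, 2) for c, d in change.items()})  # {'CityA': 0.72, 'CityB': 0.18}
```

Pooling both years before normalizing keeps a city's 2010 and 2020 scores on the same scale. Normalizing each year separately would instead measure only changes in relative rank.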
The spatial distributions of the individual UEI indicators for 2010 and 2020 are shown in Figure 5 and Figure 6. In both years, indicators related to built-up area and GAIA show clear spatial concentration in the core cities of the CPMR, with higher values consistently observed in the central and eastern parts of the region. Peripheral cities generally display lower built-up land proportions, indicating a more limited degree of urban development in both periods.
Economic and demographic indicators exhibit similar spatial structures in both years. GDP and population size are strongly clustered around major urban centers in 2010, and this pattern persists in 2020. Absolute values increase in many cities, but the relative spatial hierarchy changes little, with core cities maintaining their dominant positions. This stability suggests that economic activity and population concentration remain spatially anchored despite continued regional growth.
Environmental indicators display more heterogeneous patterns. In 2010, land surface temperature (LST) shows elevated values in several central and southern cities, with lower values distributed across less urbanized areas. By 2020, high LST values become more spatially continuous across the agglomeration, and high-value zones extend beyond the core cities. Carbon emissions show a similar tendency. Higher values are concentrated in industrial and densely populated areas in both years, though the spatial extent of high emissions expands in 2020.
Indicators related to green space coverage exhibit an inverse spatial pattern relative to built-up area. In both years, cities with higher levels of urban expansion generally display lower green space values, while peripheral areas maintain comparatively higher green coverage. This contrast remains evident in 2020, although the spatial difference between core and peripheral cities becomes less pronounced in some subregions. Infrastructure-related indicators, including gas supply, water provision, and road density, present relatively consistent spatial patterns between 2010 and 2020. Higher values are primarily observed in central cities, while lower values persist in outer areas. Compared with land-use and environmental indicators, infrastructure indicators show less spatial variability and fewer abrupt changes across the study period.

3.3. Spatial Changes of Indicators Across the CPMR Between 2010 and 2020

Figure 7 illustrates the changes in each indicator between 2010 and 2020. Built-up area and GAIA increase in most cities, with the strongest increases concentrated in the central and southeastern parts of the agglomeration. Several peripheral cities also exhibit moderate growth, indicating a gradual outward expansion from the urban core.
Economic and population-related changes present more localized patterns. GDP growth is pronounced in selected cities rather than uniformly distributed, reflecting uneven economic expansion across the region. Population change shows a similar structure: notable increases are concentrated in a limited number of cities, while several areas experience relatively small changes.
Thermal and emission indicators show shared spatial trends. Increases in land surface temperature are observed across a large part of the agglomeration, with particularly strong changes in central and southern cities. Carbon emissions also increase in many of the same areas, indicating a spatial overlap between thermal intensification and emission growth. In contrast, green space change displays a more mixed pattern, with increases in some cities, decreases in others, and no single dominant spatial trend.
Infrastructure-related changes show relatively modest variation. Gas and water supply indicators increase in many cities, but the magnitude of change is generally small and spatially scattered. Road density change appears more localized, with noticeable increases in selected cities rather than across the entire region. Across indicators, several common change patterns can be identified. Land-use, economic, and thermal indicators tend to show stronger and more spatially continuous growth, while infrastructure and service-related indicators change more gradually. Environmental indicators, particularly LST and carbon emissions, display spatial patterns that partially align with areas of intensified urban expansion.

3.4. Inter-Indicator Correlation Structure and Temporal Dynamics Across the CPMR

Figure 8a–c summarizes the correlation structure among UEI indicators for 2010, for 2020, and for their changes between 2010 and 2020. Together, the three panels show how relationships among land-use, socioeconomic, environmental, and infrastructure indicators evolve over time. In both years, stable positive correlations are observed among built-up area, GAIA, GDP, and population. These indicators consistently exhibit moderate to high correlation coefficients, indicating that spatial expansion, economic activity, and demographic concentration are closely coupled across cities in the mega-city region. The persistence of these relationships in both years suggests that the core land–economy–population linkage remained structurally stable despite changes in absolute values and spatial distributions.
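The three correlation matrices can be sketched as pairwise coefficients over cities. Each matrix is computed once for each year and once for the per-city differences. Pearson's r is assumed here because the text does not name the coefficient; a rank-based coefficient such as Spearman's would be a reasonable alternative for skewed indicators. The indicator values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(table):
    """table maps indicator -> list of city values in a fixed city order."""
    names = list(table)
    return {(a, b): pearson(table[a], table[b]) for a in names for b in names}


def change_table(start, end):
    """Per-city differences between two years for every shared indicator."""
    return {k: [e - s for s, e in zip(start[k], end[k])] for k in start if k in end}


# Hypothetical normalized values for four cities (same city order in every list).
t2010 = {"GAIA": [0.1, 0.2, 0.3, 0.4], "GDP": [0.10, 0.25, 0.30, 0.45],
         "Green": [0.5, 0.4, 0.35, 0.2]}
t2020 = {"GAIA": [0.2, 0.3, 0.5, 0.6], "GDP": [0.20, 0.30, 0.55, 0.70],
         "Green": [0.5, 0.3, 0.40, 0.1]}

r2010 = correlation_matrix(t2010)
r_change = correlation_matrix(change_table(t2010, t2020))
print(round(r2010[("GAIA", "GDP")], 3), round(r2010[("GAIA", "Green")], 3))
```

The change matrix is computed on per-city differences rather than by subtracting the 2010 matrix from the 2020 matrix. It therefore asks whether cities that changed more in one indicator also changed more in another.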
Environmental indicators display more heterogeneous correlation patterns. Land surface temperature and carbon emissions generally show positive correlations with built-up area and economic indicators in both periods, reflecting the thermal and emission responses associated with intensified urban expansion. However, the strength of these correlations varies between 2010 and 2020, indicating that environmental responses to urban growth are not uniform across time or space. Green space indicators tend to show weak or negative correlations with expansion- and economy-related variables, highlighting the contrasting spatial behavior between ecological elements and built-up growth. Compared with land-use and socioeconomic indicators, environmental variables exhibit greater dispersion in correlation values, suggesting differentiated environmental sensitivities among cities.
Infrastructure-related indicators, including road density, gas supply, and water provision, present moderate correlations with both land-use and socioeconomic indicators. These correlations are generally positive but weaker than those observed among expansion and economic variables, implying that infrastructure development responds to urban growth while following more gradual or city-specific adjustment pathways. This pattern remains consistent across both years, indicating a relatively stable but secondary role of infrastructure indicators within the overall expansion system.
The correlation matrix of indicator changes (Figure 8c) reveals additional dynamics that are not evident from static-year correlations. Changes in built-up area and GAIA are positively correlated with changes in GDP and population, suggesting that cities experiencing faster spatial expansion also tend to undergo stronger economic and demographic growth. In contrast, changes in environmental indicators exhibit mixed relationships with expansion-related changes. For example, increases in land surface temperature tend to be positively associated with expansion-related changes across cities, though not uniformly. Green space changes show both positive and negative correlations with expansion variables, reflecting divergent local development and environmental management trajectories. Infrastructure-related changes generally show weaker and more scattered correlations, indicating that short-term adjustments in infrastructure provision are less synchronized with rapid expansion processes.

3.5. Shifting Centers of Urban Expansion Within the Agglomeration

Local spatial autocorrelation analysis reveals that UEI exhibits statistically significant but spatially limited clustering across the study area. In 2010, High–High clusters identified using Local Moran's I are mainly concentrated in the central part of the mega-city region, where cities with relatively high UEI values are adjacent to cities with similarly high values. In contrast, Low–Low clusters are primarily distributed in the western and southwestern cities, indicating spatially contiguous areas with relatively low UEI levels. For most cities, UEI does not exhibit statistically significant local spatial dependence, suggesting that spatial clustering was not widespread at this stage (Figure 9a). The Getis–Ord Gi* statistic yields a consistent spatial pattern for the same year, with hot spots concentrated in the central area and cold spots appearing mainly in the western cities (Figure 9d). The correspondence between the two methods suggests that the detected clusters are robust to the choice of local spatial statistic and reflect stable spatial structures rather than methodological artifacts.
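For readers less familiar with the statistic, the sketch below computes Local Moran's I with row-standardized weights and labels each city by quadrant. The statistic is I_i = (z_i / m_2) Σ_j w_ij z_j, where z_i is the deviation of city i from the regional mean and m_2 = Σ_i z_i² / n. Significance is approximated with a simple two-sided conditional permutation test. The six-city chain, its neighbor lists, the permutation count and the random seed are hypothetical; the study's actual spatial weights matrix and inference settings are not given in this section.

```python
import random


def local_morans_i(values, neighbors, permutations=999, seed=0):
    """Local Moran's I with row-standardized binary weights.

    values: one UEI value per city.
    neighbors: dict mapping each city index to a list of neighbor indices
               (assumed to come from a contiguity or distance rule).
    Returns (I_i, quadrant, pseudo_p) for every city.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    rng = random.Random(seed)
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        if not nbrs:
            results.append((0.0, "isolated", 1.0))
            continue
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # row-standardized spatial lag
        local_i = z[i] * lag / m2
        if z[i] > 0 and lag > 0:
            quadrant = "High-High"
        elif z[i] < 0 and lag < 0:
            quadrant = "Low-Low"
        elif z[i] > 0 and lag < 0:
            quadrant = "High-Low"
        elif z[i] < 0 and lag > 0:
            quadrant = "Low-High"
        else:
            quadrant = "None"
        # Conditional permutation: keep z[i] fixed and draw pseudo-neighbors
        # at random from the other cities.
        others = z[:i] + z[i + 1:]
        extreme = 0
        for _ in range(permutations):
            sample = rng.sample(others, len(nbrs))
            simulated = z[i] * (sum(sample) / len(nbrs)) / m2
            if abs(simulated) >= abs(local_i):
                extreme += 1
        results.append((local_i, quadrant, (extreme + 1) / (permutations + 1)))
    return results


# Hypothetical UEI values for six cities arranged in a chain 0-1-2-3-4-5.
uei = [10.0, 9.0, 8.0, 2.0, 1.0, 1.0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
for city, (i_val, quad, p) in enumerate(local_morans_i(uei, chain)):
    print(city, round(i_val, 3), quad, round(p, 3))
```

In practice, the pseudo p-values would usually be adjusted for multiple testing or read with caution, because one test is run per city.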
By 2020, the spatial configuration of UEI clustering changes. High–High clusters identified by Local Moran's I and hot spots identified by Getis–Ord Gi* are mainly located in the northern and northeastern cities, indicating a shift in the spatial concentration of high UEI values over time (Figure 9b,e). Low–Low clusters remain limited in number and spatial extent, with only minor changes in location compared with 2010. The proportion of cities with statistically significant local statistics increases slightly, but most cities continue to show weak local spatial autocorrelation, indicating that UEI clustering remains localized rather than region-wide.
The spatial pattern of UEI change between 2010 and 2020 differs from that of static UEI levels. High–High clusters of UEI change are mainly distributed in the western and central cities. These clusters mark spatially concentrated areas where neighboring cities experienced similarly large UEI increases over the study period (Figure 9c). Low–Low clusters of change are sparse and spatially fragmented. Compared with static-year clustering, the change-based results show weaker spatial continuity and greater heterogeneity, indicating that short-term UEI dynamics are less spatially synchronized across the mega-city region. The Getis–Ord Gi* results for UEI change show a consistent pattern (Figure 9f).
Although Local Moran's I and Getis–Ord Gi* yield highly consistent spatial patterns, the two statistics emphasize different aspects of local spatial structure. Local Moran's I measures the similarity or dissimilarity between each city and its immediate neighbors. Gi* instead highlights the concentration of high or low values within a local neighborhood that includes the focal city itself. The observed correspondence therefore indicates that UEI clustering in the study area is pronounced enough to be detected by both neighborhood-based and hotspot-based approaches, and that the two methods are complementary rather than redundant.
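The hotspot side of the comparison can be sketched in the same spirit. The function below computes the Getis–Ord Gi* z-score with binary weights that include the focal city:

Gi*_i = (Σ_j w_ij x_j − x̄ Σ_j w_ij) / (S √[(n Σ_j w_ij² − (Σ_j w_ij)²)/(n − 1)])

Here x̄ and S are the regional mean and standard deviation. Cities with |z| ≥ 1.96 are flagged as hot or cold spots at roughly the 5% level. The six-city chain, the values and the binary weighting are hypothetical; the study's weights specification is not given in this section.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores with binary weights that include each focal city itself."""
    n = len(values)
    mean = sum(values) / n
    # max() guards against tiny negative variances caused by rounding.
    s = math.sqrt(max(sum(v * v for v in values) / n - mean ** 2, 0.0))
    z_scores = []
    for i in range(n):
        members = set(neighbors[i]) | {i}  # Gi* counts the focal city in its own neighborhood
        w_sum = len(members)               # sum of w_ij for 0/1 weights
        w_sq_sum = len(members)            # sum of w_ij^2 equals the count for 0/1 weights
        numerator = sum(values[j] for j in members) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


def classify(z, critical=1.96):
    """Label a Gi* z-score as a hot spot, cold spot, or not significant."""
    if z >= critical:
        return "hot spot"
    if z <= -critical:
        return "cold spot"
    return "not significant"


# Hypothetical UEI values for six cities arranged in a chain 0-1-2-3-4-5.
uei = [10.0, 9.0, 8.0, 2.0, 1.0, 1.0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
z = getis_ord_gi_star(uei, chain)
print([classify(v) for v in z])
```

Gi* pools the focal value with its neighbors. As a result, a city at the edge of a high-value run (city 0 here, with a single neighbor) can fall short of significance even when the adjacent city inside the run is flagged as a hot spot.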

4. Discussion

4.1. Positioning the Transformer-Based Language-Model Framework Within Existing Urban Expansion Assessments

Urban expansion has traditionally been quantified using remote sensing-derived indicators such as built-up area, impervious surface, land-use conversion rates, and spatial metrics of urban form [26,27,28]. These approaches provide robust measurements of observable expansion outcomes. However, they usually rely on predefined indicator systems and relatively fixed weighting schemes, which limits their ability to reflect evolving research priorities, cross-disciplinary knowledge, or region-specific development contexts. In many index-based assessment frameworks, indicator weights are assigned through expert judgment, equal weighting, or variance-based statistical methods such as principal component analysis (PCA) [8,28]. These approaches are widely used and effective, but they may underrepresent indicators that are increasingly emphasized in the broader urban expansion literature and were not incorporated into the initial analytical design.
Recent studies have introduced machine learning into urban expansion research primarily for land-cover classification, urban boundary extraction, and growth simulation [29,30,31]. These applications have substantially improved the automation and predictive capacity of urban analysis, but they still focus mainly on identifying, classifying, or forecasting spatial outcomes. By contrast, the present study applies a transformer-based language model at an earlier stage of the analytical chain, namely in the construction of the indicator system itself. In this sense, the proposed approach is not intended to replace remote sensing or spatial analysis, but to complement them by improving how indicators are discovered, standardized, and weighted before spatial evaluation is conducted.
This distinction is also important when comparing the proposed framework with conventional text-mining approaches. Methods based on keyword frequency or topic modeling, such as latent Dirichlet allocation (LDA), are effective for identifying broad thematic structures in the literature, but they are often less capable of capturing contextual similarity across semantically related expressions. Transformer-based models such as BERT provide stronger semantic representation and contextual encoding. This makes them better suited to harmonizing heterogeneous indicator expressions drawn from different disciplinary traditions [12,32]. By embedding this capability into the construction of the UEI, the proposed framework moves beyond outcome-oriented urban expansion mapping. It introduces a literature-adaptive mechanism that links the evolving knowledge base of urban expansion research with quantitative spatial assessment. This is the main methodological difference between the present study and existing fixed-indicator or purely outcome-driven assessment frameworks.

4.2. Spatial Persistence and Path Dependence of Urban Expansion

The spatial patterns observed in the CPMR indicate a strong persistence of urban expansion intensity in core cities between 2010 and 2020. This finding is consistent with extensive empirical evidence showing that urban expansion tends to follow path-dependent trajectories, where early advantages in accessibility, infrastructure, and economic agglomeration continue to shape long-term growth [27,33,34]. Similar spatial persistence has been documented in China’s Yangtze River Delta, Pearl River Delta, and Beijing–Tianjin–Hebei regions, where expansion reinforces existing urban hierarchies rather than redistributing growth evenly across cities [35,36].
At the global scale, large comparative studies also demonstrate that urban expansion concentrates around established urban centers, even as peripheral growth accelerates [37,38]. The stability of spatial hierarchies revealed by UEI therefore reflects a broader structural characteristic of urban systems, in which expansion is shaped by cumulative causation rather than by short-term fluctuations alone. The CPMR results align with these findings, suggesting that urban expansion dynamics in inland China may increasingly resemble those observed in mature coastal agglomerations.
What is methodologically important here is that the persistence of core cities is revealed by a literature-adaptive composite index rather than by a single morphological measure alone. If urban expansion were assessed only through built-up area growth, the observed hierarchy could be interpreted simply as a scale effect of already large cities. In the UEI, however, the persistence of core cities remains visible after combining land-use, socioeconomic, infrastructure, and environmental dimensions through literature-derived weights. This suggests that path dependence in the CPMR is not limited to land conversion itself; rather, it reflects a coupled urbanization process in which physical expansion, agglomeration economies, service provision, and environmental pressure reinforce one another. This is a substantive difference from single-indicator assessments and helps explain why some cities remain central to the regional expansion system even when their areal growth alone is not the most extreme.

4.3. Environmental Feedbacks and Differentiated Expansion Responses

The heterogeneous spatial behavior of environmental indicators observed in this study reflects the complex and uneven environmental responses to urban expansion. The spatial overlap between increased built-up intensity, land surface temperature, and carbon emissions is consistent with a large body of literature linking urban expansion to urban heat island effects and emission growth [19,39,40]. These relationships have been observed across diverse climatic and development contexts, indicating a robust coupling between expansion intensity and environmental pressure.
In contrast, the mixed spatial patterns of green space change highlight divergent local development pathways. Some cities experience ecological space loss under expansion pressure, while others maintain or even increase green coverage through planning interventions or land-use regulation [40,41]. This divergence has been emphasized in recent studies that argue against treating environmental outcomes as uniform consequences of urban growth [42].
Many remote-sensing-based expansion studies examine environmental variables only as downstream consequences after expansion has been mapped. The UEI instead incorporates them directly into the index. This design captures differentiated environmental responses and avoids the simplification inherent in expansion-only assessments. It also allows the analysis to ask not only where urban expansion occurs, but also where expansion is accompanied by stronger thermal and carbon responses or by weaker ecological buffering. At the same time, this integration should be interpreted cautiously, because environmental indicators often operate on different temporal lags and spatial scales from land-use and socioeconomic indicators. Future versions of the framework could improve this component by introducing indicator-specific temporal windows and cross-modal fusion strategies, so that land, socioeconomic, and environmental signals are not assumed to respond synchronously.

4.4. Spatial Clustering and Reorganization of Expansion Intensity

The local spatial autocorrelation results reveal both persistent clustering and spatial reorganization of urban expansion intensity within the CPMR. The relocation of High–High clusters between 2010 and 2020 suggests a shift in expansion focus, likely associated with changes in infrastructure investment, industrial layout, and regional development strategies. Similar spatial reorganization of expansion hotspots has been reported in studies examining transport-oriented development and industrial relocation in rapidly urbanizing regions [43,44].
The weaker and more fragmented clustering patterns observed for UEI change align with findings that short-term urban growth rates are more sensitive to local policy interventions, land supply constraints, and economic restructuring than long-term urban form [3,45]. The combined use of Local Moran’s I and Getis–Ord Gi* follows best practices in spatial analysis and has been widely applied in urban studies to distinguish stable spatial structures from transient growth dynamics [46,47,48,49]. The strong agreement between Local Moran’s I and Getis–Ord Gi* results reflects both methodological robustness and the underlying spatial structure of the CPMR. In regions where urban expansion is strongly organized around a limited number of dominant core cities, spatial gradients tend to be clear and continuous. Under such conditions, both neighborhood-based clustering (Local Moran’s I) and hotspot detection (Gi*) are likely to identify similar spatial patterns.
However, this correspondence should not be interpreted as a universal feature of all mega-city regions. In polycentric or highly fragmented urban systems, where expansion occurs through multiple competing centers or discontinuous development corridors, Local Moran's I and Gi* may reveal more divergent spatial patterns. The consistency observed in this study therefore reflects the relatively centralized development structure of the CPMR rather than a general property of urban expansion across all city clusters. Within the CPMR itself, the agreement between the two statistics still strengthens confidence in the identified clustering patterns.

4.5. Architecture-Level Limitations and Future Improvement of the Transformer-Based Language-Model Component

By integrating transformer-based indicator extraction with spatial analysis, this study contributes to ongoing efforts to develop adaptive, data-driven urban assessment frameworks [50,51]. The framework is particularly suited to large mega-city regions where development processes are multidimensional and rapidly evolving. At the same time, the current language-model component remains a relatively lightweight semantic extraction pipeline, and several limitations warrant further attention.
First, the present framework mainly encodes candidate indicator expressions as short phrases and groups them according to semantic similarity. This strategy improves the standardization of heterogeneous indicator expressions, but it preserves only part of the original document context. As a result, an indicator emphasized as a central explanatory factor in one study may receive the same semantic treatment during extraction as an indicator mentioned only as a secondary contextual condition in another. Second, the current implementation relies on a general BERT-family encoder rather than a model specifically adapted to the vocabulary of urban science, remote sensing, and planning. This may limit semantic precision when identifying specialized terms, compound indicators, and cross-disciplinary concepts. Third, the weighting structure derived from the literature primarily reflects the relative prominence of topics in the existing research corpus, rather than their causal importance, empirical robustness, or policy urgency. The resulting weights should therefore be interpreted as a structured representation of current knowledge emphasis, not as a fixed or universally optimal description of urban expansion mechanisms.
A further limitation concerns the composition of the literature corpus itself. The reliance on English-language studies may bias indicator extraction and weighting toward themes more frequently emphasized in international journals, while underrepresenting region-specific planning practices and policy experiences documented in local-language publications. Similar limitations have been noted in global urban synthesis research [36]. This issue is especially important in rapidly urbanizing regions, where locally grounded planning discourse may differ from internationally dominant research narratives. At the same time, this dependence on accumulated knowledge also constitutes a major strength of the framework. Because indicator identification and weighting are derived from the literature corpus rather than from predefined expert rules alone, the framework is adaptive by design and can be updated as new studies emerge and research priorities evolve. Periodic re-running of the literature search, semantic extraction, and indicator harmonization process would allow the weighting structure to adjust dynamically, thereby keeping the UEI aligned with changing scientific understanding, planning concerns, and development contexts. In this sense, the framework supports longitudinal updating rather than static evaluation.
These limitations also point to several concrete directions for future improvement. A first step would be to conduct domain-adaptive pretraining or fine-tuning on corpora from urban planning, geography, and remote sensing so that the encoder can better capture specialized terminology and domain-specific semantic relationships. A second step would be to move beyond phrase-level encoding toward a hierarchical architecture that integrates phrase-, sentence-, and document-level representations, allowing the model to distinguish more effectively between core indicators and secondary contextual references. A third step would be to incorporate multilingual or cross-lingual literature so that the extracted indicator system better reflects both international and local knowledge bases. A fourth step would be to add retrieval-based evidence tracing, so that each extracted indicator can be linked back to representative sentences, abstracts, or source documents, thereby improving transparency and interpretability. Future research could also combine this framework with expert-in-the-loop validation, dynamic updating as new publications emerge, and scenario-based simulations of urban growth, further enhancing its capacity to support spatial planning and sustainability-oriented decision-making.
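The fourth direction, retrieval-based evidence tracing, could start as simply as ranking source sentences by embedding similarity to each extracted indicator. The sketch below is hypothetical. The document IDs, sentences and three-dimensional vectors are invented for illustration. In practice, both indicators and sentences would be embedded with the same encoder.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def trace_evidence(indicator_vec, sentences, k=2):
    """Return the k (score, doc_id, text) triples most similar to an indicator.

    sentences: iterable of (doc_id, text, vector) triples.
    """
    scored = ((cosine(indicator_vec, vec), doc_id, text)
              for doc_id, text, vec in sentences)
    return heapq.nlargest(k, scored, key=lambda item: item[0])


# Hypothetical embeddings for one indicator and three candidate evidence sentences.
lst_indicator = [0.1, 0.9, 0.1]
corpus = [
    ("doc1", "LST rose fastest in newly built districts.", [0.15, 0.85, 0.10]),
    ("doc2", "GDP growth drove land conversion.", [0.60, 0.10, 0.70]),
    ("doc3", "Green space moderated surface heat.", [0.30, 0.70, 0.20]),
]
for score, doc_id, text in trace_evidence(lst_indicator, corpus):
    print(f"{score:.3f} {doc_id}: {text}")
```

Keeping the document ID next to each retrieved sentence is what would let each weight in the index be traced back to specific sources for expert review.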

4.6. Implications for Transferability and Planning Practice

The practical value of the framework lies in its ability to update the indicator system as the literature and planning agenda evolve. Rather than locking urban expansion assessment into a fixed set of variables, the approach allows new concerns—such as carbon pressure, thermal stress, infrastructure equity, or ecological restoration—to enter the evaluation system when they become more prominent in scientific and policy discourse. This is especially useful for large urban agglomerations where development pressures are heterogeneous and where a purely land-based metric may understate environmental and service-related stresses.
Nevertheless, transferability should be treated as conditional rather than automatic. The current application uses a single study region, city-level aggregation, and a specific combination of remote sensing and statistical datasets. In other regions, the indicator dictionary, spatial scale, neighborhood definition, and data availability may all affect the resulting index structure and spatial patterns. The framework should therefore be understood as a replicable procedure rather than as a universally fixed indicator set. Comparative applications across multiple urban agglomerations would be valuable for testing the robustness of the literature-derived weighting logic under different development regimes.

5. Conclusions

This study developed an adaptive framework for urban expansion assessment by combining a transformer-based language model with multi-source spatial data. Rather than predefining indicators and weights entirely through expert judgment or fixed statistical schemes, the framework uses semantic extraction and harmonization of the urban-expansion literature to build a literature-adaptive Urban Expansion Index (UEI). In this sense, the methodological contribution of the study is not simply the introduction of a language model into an urban application, but the construction of a bridge between evolving scientific knowledge and measurable spatial variables.
Applied to the Central Plains Mega-city Region (CPMR), the UEI reveals that urban expansion between 2010 and 2020 remained concentrated in core cities, while surrounding areas exhibited uneven, corridor-like growth trajectories. The results also show that environmental responses, especially land surface temperature and carbon emissions, do not change uniformly across the region, indicating differentiated local consequences of urban expansion. Local Moran's I and Getis–Ord Gi* further suggest that expansion intensity is spatially clustered and that the location of high-intensity clusters shifted over time. This points to a reorganization of regional growth centers rather than a simple outward enlargement.
At the same time, the present language-model pipeline is limited by its English-language corpus, phrase-level semantic representation, and reliance on literature prominence as a proxy for indicator importance. Future work should therefore move toward domain-adapted and multilingual encoders, document-level relevance modeling, retrieval-based evidence tracing, and explicit uncertainty assessment. With these improvements, the framework could support more robust comparative urban studies and more adaptive spatial planning in rapidly changing metropolitan regions.

Author Contributions

Conceptualization, F.W. and J.G.; methodology, F.W., Z.Z. and R.W.; software, Z.Z. and R.W.; validation, F.W., R.W., D.S. and B.N.; formal analysis, F.W. and Z.Z.; investigation, R.W. and B.N.; resources, J.G. and X.L.; data curation, D.S. and R.W.; writing—original draft preparation, F.W.; writing—review and editing, F.W., J.G. and X.L.; visualization, Z.Z. and D.S.; supervision, J.G. and X.L.; project administration, J.G.; funding acquisition, J.G. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 42271354.

Data Availability Statement

All data and code will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhang, X.Q. The Trends, Promises and Challenges of Urbanisation in the World. Habitat Int. 2016, 54, 241–252.
2. Sun, L.; Chen, J.; Li, Q.; Huang, D. Dramatic Uneven Urbanization of Large Cities throughout the World in Recent Decades. Nat. Commun. 2020, 11, 5366.
3. Mahtta, R.; Fragkias, M.; Güneralp, B.; Mahendra, A.; Reba, M.; Wentz, E.A.; Seto, K.C. Urban Land Expansion: The Role of Population and Economic Growth for 300+ Cities. npj Urban Sustain. 2022, 2, 5.
4. Zhang, Y.; Mao, W.; Zhang, B. Distortion of Government Behaviour under Target Constraints: Economic Growth Target and Urban Sprawl in China. Cities 2022, 131, 104009.
5. Wei, Y.D.; Ewing, R. Urban Expansion, Sprawl and Inequality. Landsc. Urban Plan. 2018, 177, 259–265.
6. Li, Y.; Jia, N.; Zhang, Z.; Cheng, J.; Song, W.; Liu, L.; Bao, S.; Zheng, L.; Chen, R. Mapping Global Urban Inequality under Climate Change and Its Interaction with Sustainable Development. GISci. Remote Sens. 2025, 62, 2513104.
7. Fernández, J.E.; Angel, M. Ecological City-States in an Era of Environmental Disaster: Security, Climate Change and Biodiversity. Sustainability 2020, 12, 5532.
8. Yu, W.; Zhou, W. The Spatiotemporal Pattern of Urban Expansion in China: A Comparison Study of Three Urban Megaregions. Remote Sens. 2017, 9, 45.
9. Forget, Y.; Shimoni, M.; Gilbert, M.; Linard, C. Mapping 20 Years of Urban Expansion in 45 Urban Areas of Sub-Saharan Africa. Remote Sens. 2021, 13, 525.
10. Xie, H.; Zhang, Y.; Duan, K. Evolutionary Overview of Urban Expansion Based on Bibliometric Analysis in Web of Science from 1990 to 2019. Habitat Int. 2020, 95, 102100.
11. Rillig, M.C.; Ågerstrand, M.; Bi, M.; Gould, K.A.; Sauerland, U. Risks and Benefits of Large Language Models for the Environment. Environ. Sci. Technol. 2023, 57, 3464–3466.
12. Areshey, A.; Mathkour, H. Transfer Learning for Sentiment Classification Using Bidirectional Encoder Representations from Transformers (BERT) Model. Sensors 2023, 23, 5232.
13. Gardazi, N.M.; Daud, A.; Malik, M.K.; Bukhari, A.; Alsahfi, T.; Alshemaimri, B. BERT Applications in Natural Language Processing: A Review. Artif. Intell. Rev. 2025, 58, 166.
14. Fu, X.; Li, C.; Quan, S.J.; Yigitcanlar, T.; Wasserman, D. Large language models in urban planning. Nat. Cities 2025, 2, 585–592.
15. Zheng, Y.; Xu, F.; Lin, Y.; Santi, P.; Ratti, C.; Wang, Q.R.; Li, Y. Urban planning in the era of large language models. Nat. Comput. Sci. 2025, 5, 727–736.
16. Xu, F.; Wang, Q.; Moro, E.; Chen, L.; Miranda, A.S.; González, M.C.; Tizzoni, M.; Song, C.; Ratti, C.; Bettencourt, L.; et al. Using human mobility data to quantify experienced urban inequalities. Nat. Hum. Behav. 2025, 9, 654–664.
17. Wang, S.; Hu, T.; Xiao, H.; Li, Y.; Zhang, C.; Ning, H.; Zhu, R.; Li, Z.; Ye, X. GPT, large language models (LLMs) and generative artificial intelligence (GAI) models in geospatial science: A systematic review. Int. J. Digit. Earth 2024, 17, 2353122.
18. Luo, H.; Li, L.; Lei, Y.; Wu, S.; Yan, D.; Fu, X.; Luo, X.; Wu, L. Decoupling Analysis between Economic Growth and Resources Environment in Central Plains Urban Agglomeration. Sci. Total Environ. 2021, 752, 142284.
19. Peng, L.; Zhang, L.; Li, X.; Wang, Z.; Wang, H.; Jiao, L. Spatial Expansion Effects on Urban Ecosystem Services Supply-Demand Mismatching in Guanzhong Plain Urban Agglomeration of China. J. Geogr. Sci. 2022, 32, 806–828.
20. Wang, Z.; Wang, L.; Zhao, B.; Pei, Q. Analysis of Spatiotemporal Interaction Characteristics and Decoupling Effects of Urban Expansion in the Central Plains Urban Agglomeration. Land 2023, 12, 772.
21. Zhao, S.; Da, L.; Tang, Z.; Fang, H.; Song, K.; Fang, J. Ecological consequences of rapid urban expansion: Shanghai, China. Front. Ecol. Environ. 2006, 4, 341–346.
22. Jiang, Z.; Wu, H.; Xu, Z.; Shen, F.; Jia, N.; Huang, J.; Lin, A. Optimizing Land Use Spatial Patterns to Balance Urban Development and Resource-Environmental Constraints: A Case Study of China’s Central Plains Urban Agglomeration. J. Environ. Manag. 2025, 380, 125173.
23. Maia, S.C.; De Benedicto, G.C.; Do Prado, J.W.; Robb, D.A.; De Almeida Bispo, O.N.; De Brito, M.J. Mapping the Literature on Credit Unions: A Bibliometric Investigation Grounded in Scopus and Web of Science. Scientometrics 2019, 120, 929–960.
24. Pranckutė, R. Web of Science (WoS) and Scopus: The titans of bibliographic information in today’s academic world. Publications 2021, 9, 12.
25. Zhang, H.; Luo, M.; Zhao, Y.; Lin, L.; Ge, E.; Yang, Y.; Ning, G.; Cong, J.; Zeng, Z.; Gui, K.; et al. HiTIC-Monthly: A monthly high spatial resolution (1 km) human thermal index collection over China during 2003–2020. Earth Syst. Sci. Data 2023, 15, 359–381.
26. Brkic, S.; Vucenovic, M.; Djokic, Z. Title, Abstract, Key Words and References in Biomedical Articles. Arch. Oncol. 2003, 11, 207–209.
27. Seto, K.C.; Güneralp, B.; Hutyra, L.R. Global Forecasts of Urban Expansion to 2030 and Direct Impacts on Biodiversity and Carbon Pools. Proc. Natl. Acad. Sci. USA 2012, 109, 16083–16088.
28. Trinder, J.; Liu, Q. Assessing environmental impacts of urban growth using remote sensing. Geo-Spat. Inf. Sci. 2020, 23, 20–39.
29. Ding, Q.; Shao, Z.; Huang, X.; Altan, O.; Hu, B. Time-series land cover mapping and urban expansion analysis using OpenStreetMap data and remote sensing big data: A case study of Guangdong-Hong Kong-Macao Greater Bay Area, China. Int. J. Appl. Earth Obs. Geoinf. 2022, 113, 103001.
30. Doe, B.; Amoako, C.; Adamtey, R. Spatial expansion and patterns of land use/land cover changes around Accra, Ghana–Emerging insights from Awutu Senya East Municipal Area. Land Use Policy 2022, 112, 105796.
31. Yang, J.; Tang, W.; Gong, J.; Shi, R.; Zheng, M.; Dai, Y. Simulating urban expansion using cellular automata model with spatiotemporally explicit representation of urban demand. Landsc. Urban Plan. 2023, 231, 104640.
32. Bhopale, A.P.; Tiwari, A. Transformer based contextual text representation framework for intelligent information retrieval. Expert Syst. Appl. 2024, 238, 121629.
33. Wang, Y.; Yin, S.; Fang, X.; Chen, W. Interaction of economic agglomeration, energy conservation and emission reduction: Evidence from three major urban agglomerations in China. Energy 2022, 241, 122519.
34. Acheampong, R.A.; Asabere, S.B. Urban expansion and differential accessibility by car and public transport in the Greater Kumasi city-region, Ghana—A geospatial modelling approach. J. Transp. Geogr. 2022, 98, 103257.
35. Fang, C.; Yu, D. Urban Agglomeration: An Evolving Concept of an Emerging Phenomenon. Landsc. Urban Plan. 2017, 162, 126–136.
36. Zhong, S.; Huang, X.; Mao, X. Deciphering China’s urban-rural income gap: A multi-level analysis, 2006–2020. Cities 2026, 168, 106488.
37. Güneralp, B.; Ahasan, R. Urban land-change futures: Current understanding, challenges, and implications. npj Urban Sustain. 2025, 6, 7.
38. Angel, S.; Blei, A. Atlas of Urban Expansion—2016 Edition; Lincoln Institute of Land Policy: Cambridge, MA, USA, 2016; ISBN 978-0-9981758-0-5.
39. Imhoff, M.L.; Zhang, P.; Wolfe, R.E.; Bounoua, L. Remote sensing of the urban heat island effect across biomes in the continental USA. Remote Sens. Environ. 2010, 114, 504–513.
40. Parker, D.E. Urban heat island effects on estimates of observed climate change. Wiley Interdiscip. Rev. Clim. Change 2010, 1, 123–133.
41. Haase, D.; Haase, A.; Rink, D. Conceptualizing the nexus between urban shrinkage and ecosystem services. Landsc. Urban Plan. 2014, 132, 159–169.
42. Kabisch, N.; Korn, H.; Stadler, J.; Bonn, A. Nature-Based Solutions to Climate Change Adaptation in Urban Areas: Linkages Between Science, Policy and Practice; Springer Nature: Berlin/Heidelberg, Germany, 2017.
43. Frantzeskaki, N. Seven lessons for planning nature-based solutions in cities. Environ. Sci. Policy 2019, 93, 101–111.
44. Xu, T.; Umair, M.; Cheng, W.; Hakimova, Y.; Mang, G. Evaluating Eco-Efficiency as a metric for sustainable urban Growth: A comparative study of provincial capital cities in China. Ecol. Indic. 2024, 169, 112959.
45. Tang, X.; Xu, J.; Wang, R.; Li, J.V.; Jiang, L.; Li, C.Z. Drivers of Cross-Boundary Land Use and Cover Change in a Megacity Region: Evidence from the Guangdong–Hong Kong–Macao Greater Bay Area. Sustainability 2026, 18, 470.
46. Wang, J.; Fleischmann, M.; Venerandi, A.; Romice, O.; Kuffer, M.; Porta, S. EO+ Morphometrics: Understanding cities through urban morphology at large scale. Landsc. Urban Plan. 2023, 233, 104691.
47. Anselin, L. Local indicators of spatial association—LISA. Geogr. Anal. 1995, 27, 93–115.
48. Ord, J.K.; Getis, A. Local spatial autocorrelation statistics: Distributional issues and an application. Geogr. Anal. 1995, 27, 286–306.
49. Rey, S.J.; Anselin, L. PySAL: A Python library of spatial analytical methods. In Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications; Springer: Berlin/Heidelberg, Germany, 2009; pp. 175–193.
50. Batty, M. Inventing Future Cities; MIT Press: Cambridge, MA, USA, 2018.
51. Kandt, J.; Batty, M. Smart cities, big data and urban policy: Towards urban analytics for the long run. Cities 2021, 109, 102992.
Figure 1. Geographic location and spatial composition of the CPMR. (a) National context of the CPMR within China, with the red star marking the study region; (b) satellite view of the regional extent; (c) administrative boundaries of the midstream city cluster, with examples of typical urban expansion patterns in (d) the Heze urban area, (e) the Zhengzhou urban area, and (f) the Xinyang urban area. The map highlights the central strategic position of the CPMR and the diverse spatial forms of urban development across the agglomeration.
Figure 2. Adaptive evaluation framework for constructing the UEI. (a) Workflow comprising topic search, automated language filtering, transformer-based indicator extraction, indicator cleaning, and weight normalization; (b) data retrieval from open-source, statistical, and literature-based sources. In the topic search (TS) query, the asterisk (*) is a truncation wildcard that can stand for any string of characters.
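Figure 2 ends with a weight-normalization step, and the conclusion notes that literature prominence serves as a proxy for indicator importance, but the exact scoring rule is not restated here. A minimal sketch, assuming an indicator's raw score is its document frequency (the number of papers whose title or abstract mentions it after synonym harmonization) and that scores are divided by their total; the function name `prominence_weights` and the toy corpus are hypothetical, not the authors' code or data.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def prominence_weights(docs: Iterable[Set[str]]) -> Dict[str, float]:
    """Hypothetical literature-prominence weights (not the authors' code).

    Each item in ``docs`` is the set of harmonized indicator names extracted
    from one paper's title and abstract. An indicator's raw score is its
    document frequency (papers mentioning it at least once); scores are then
    divided by their total so that the weights sum to 1.
    """
    counts: Counter = Counter()
    for indicators in docs:
        counts.update(set(indicators))  # count each indicator once per paper
    total = sum(counts.values())
    if total == 0:
        return {}  # empty corpus: no evidence, so no weights
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Toy corpus of three papers (illustrative, not the study's corpus).
    corpus = [
        {"GAIA", "GDP", "Population"},
        {"GAIA", "LST"},
        {"GAIA", "GDP"},
    ]
    weights = prominence_weights(corpus)
    for name in sorted(weights, key=weights.get, reverse=True):
        print(f"{name}: {weights[name]:.3f}")
```

Counting each indicator once per paper keeps a single abstract that repeats a phrase from dominating the weights; that is a design assumption of this sketch, and raw mention counts would be an equally simple alternative.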
Figure 3. Workflow of the BERT-enabled literature-to-indicator pipeline and urban expansion assessment. The process includes (i) literature corpus construction from Web of Science (topic search, screening, and deduplication), (ii) text preprocessing and phrase detection on titles and abstracts, (iii) BERT encoder-based generation of contextual semantic embeddings for candidate indicator phrases, (iv) semantic clustering using cosine similarity to group synonymous terms, (v) indicator extraction and refinement with expert validation to produce a standardized indicator dictionary, and (vi) integration with multi-source spatial/socioeconomic datasets to quantify indicators and construct the UEI and associated maps.
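Step (iv) in Figure 3 groups synonymous indicator phrases by the cosine similarity of their BERT embeddings. The sketch below illustrates only that grouping step under stated assumptions: the three-dimensional vectors are hand-written stand-ins for BERT outputs, the 0.9 threshold is hypothetical, and greedy assignment to a cluster's first member is one simple clustering choice, not necessarily the one used in the study.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_synonyms(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[List[str]]:
    """Hypothetical greedy grouping of indicator phrases.

    Phrases are visited in insertion order; each joins the first cluster whose
    seed (first member) is at least ``threshold`` cosine-similar to it,
    otherwise it seeds a new cluster.
    """
    clusters: List[List[str]] = []
    seeds: List[Sequence[float]] = []
    for phrase, vec in embeddings.items():
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                clusters[idx].append(phrase)
                break
        else:  # no sufficiently similar seed was found
            clusters.append([phrase])
            seeds.append(vec)
    return clusters


if __name__ == "__main__":
    # 3-d toy vectors standing in for BERT phrase embeddings (illustrative only).
    toy = {
        "built-up area": [0.9, 0.1, 0.0],
        "impervious surface": [0.85, 0.15, 0.05],
        "land surface temperature": [0.1, 0.9, 0.2],
        "surface heat": [0.15, 0.88, 0.25],
        "road density": [0.0, 0.2, 0.95],
    }
    print(group_synonyms(toy))
    # [['built-up area', 'impervious surface'],
    #  ['land surface temperature', 'surface heat'], ['road density']]
```

Comparing each phrase with a fixed seed makes the result depend on visiting order; comparing against a running centroid, or using agglomerative clustering on the full similarity matrix, are common alternatives.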
Figure 4. Spatial patterns of UEI in 2010, 2020, and the 2010–2020 change across cities in the CPMR. Panels (a–c) correspond to UEI in 2010, UEI in 2020, and the change in UEI over the study period.
Figure 5. Spatial distribution of individual UEI indicators in 2010. Panels (a–k) show the spatial patterns of Build Area, GAIA, GDP, POP, Green, Carbon, LST, TV, Gas, Water, and Road across cities in the CPMR. All indicators are normalized to allow comparison of relative spatial differences among cities.
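The Figure 5 caption states that all indicators are normalized for cross-city comparison, but the normalization formula is not given alongside it. Min–max scaling is a common choice; the sketch below assumes it, uses hypothetical city names with illustrative values (not study data), and maps a constant indicator to 0 rather than dividing by zero.

```python
from typing import Dict


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale one indicator across cities to the [0, 1] range (assumed method).

    A constant indicator (max == min) carries no between-city contrast,
    so every city is mapped to 0.0 instead of dividing by zero.
    """
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {city: 0.0 for city in values}
    return {city: (v - lo) / span for city, v in values.items()}


if __name__ == "__main__":
    # Hypothetical cities and illustrative GDP values (not study data).
    gdp = {"city_A": 120.0, "city_B": 40.0, "city_C": 30.0}
    print(min_max(gdp))  # city_A -> 1.0, city_C -> 0.0, city_B -> about 0.111
```

Because each indicator is scaled against its own minimum and maximum, the normalized maps show relative position among cities in one year rather than absolute magnitudes.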
Figure 6. Spatial distribution of individual UEI indicators in 2020. Panels (a–k) show the spatial patterns of Build Area, GAIA, GDP, POP, Green, Carbon, LST, TV, Gas, Water, and Road across cities in the CPMR.
Figure 7. Spatial changes of individual UEI indicators between 2010 and 2020. Panels (a–k) show the spatial differences in these indicators across cities in the CPMR. Positive and negative values indicate increases and decreases in indicator levels over the study period.
Figure 8. Correlation structure of UEI indicators and their temporal changes. Panel (a) shows the inter-indicator correlation matrix for 2010, panel (b) shows the inter-indicator correlation matrix for 2020, and panel (c) shows the correlation matrix of indicator changes between 2010 and 2020. Colors represent the strength and direction of correlations, with positive values indicating positive associations and negative values indicating inverse relationships.
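Figure 8 contrasts correlations among indicator levels in each year with correlations among their 2010–2020 changes. A minimal sketch, assuming Pearson correlation computed across cities and change defined as the 2020 value minus the 2010 value (the caption does not name the coefficient); the helper names and the toy inputs for four hypothetical cities are illustrative only.

```python
import math
from typing import Dict, List

Table = Dict[str, List[float]]  # indicator name -> values, one per city (same order)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(table: Table) -> Dict[str, Dict[str, float]]:
    """Correlation of every indicator with every other indicator."""
    names = list(table)
    return {a: {b: pearson(table[a], table[b]) for b in names} for a in names}


def changes(t2010: Table, t2020: Table) -> Table:
    """Per-city change of each indicator (2020 value minus 2010 value)."""
    return {k: [b - a for a, b in zip(t2010[k], t2020[k])] for k in t2010}


if __name__ == "__main__":
    # Illustrative values for four hypothetical cities (not study data).
    t2010 = {"GDP": [1.0, 2.0, 3.0, 4.0], "POP": [2.0, 4.0, 6.0, 8.0],
             "Green": [4.0, 3.0, 2.0, 1.0]}
    t2020 = {"GDP": [2.0, 4.0, 7.0, 9.0], "POP": [2.5, 4.5, 7.0, 9.5],
             "Green": [3.0, 3.5, 1.0, 1.5]}
    print(corr_matrix(t2010)["GDP"])  # {'GDP': 1.0, 'POP': 1.0, 'Green': -1.0}
    print(corr_matrix(changes(t2010, t2020))["GDP"])
```

Correlating changes rather than levels removes the persistent size effect of large core cities, which is one reason panel (c) can look weaker than panels (a) and (b).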
Figure 9. Local spatial clustering patterns of UEI and its temporal change. Panels (a–c) present Local Moran’s I cluster maps of UEI for 2010, 2020, and the change between 2010 and 2020, respectively. Panels (d–f) show corresponding Getis–Ord Gi* hotspot maps for the same periods. Cluster types include High–High, Low–Low, High–Low, and Low–High for Local Moran’s I, while hot spots and cold spots indicate statistically significant positive and negative spatial concentrations identified by Gi*. Statistical significance is assessed at the 95% confidence level.
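The cluster labels in Figure 9 follow from the local statistics of Anselin (1995) and Ord and Getis (1995). The sketch below computes local Moran's I with row-standardized binary contiguity weights and Gi* with binary weights that include the focal unit, on a hypothetical five-unit chain with illustrative values. It assumes every unit has at least one neighbor and omits the significance testing behind the 95% threshold; libraries such as PySAL typically obtain pseudo p-values by conditional permutation, and a quadrant label is only reported as a cluster where the statistic is significant.

```python
import math
from typing import Dict, List, Tuple

Neighbors = Dict[int, List[int]]  # unit index -> indices of contiguous units


def local_morans_i(x: List[float], nbrs: Neighbors) -> List[Tuple[float, str]]:
    """Local Moran's I with row-standardized binary weights.

    Returns (I_i, quadrant) per unit; the quadrant compares the unit's
    deviation from the mean with the average deviation of its neighbors.
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(v * v for v in z) / n  # second moment of the deviations
    result = []
    for i in range(n):
        lag = sum(z[j] for j in nbrs[i]) / len(nbrs[i])  # spatial lag of z
        quadrant = ("High" if z[i] > 0 else "Low") + "-" + ("High" if lag > 0 else "Low")
        result.append((z[i] * lag / m2, quadrant))
    return result


def getis_ord_gi_star(x: List[float], nbrs: Neighbors) -> List[float]:
    """Gi* z-scores with binary weights that include unit i itself."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    scores = []
    for i in range(n):
        members = set(nbrs[i]) | {i}  # the "star" variant counts the focal unit
        w_sum = len(members)  # binary weights, so sum(w) == sum(w**2) == count
        num = sum(x[j] for j in members) - mean * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(num / den)
    return scores


if __name__ == "__main__":
    # Hypothetical chain of five units 0-1-2-3-4 with illustrative values.
    values = [10.0, 9.0, 8.0, 2.0, 1.0]
    nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    for (i_val, quad), gi in zip(local_morans_i(values, nbrs), getis_ord_gi_star(values, nbrs)):
        print(f"I={i_val:+.3f} {quad:<9} Gi*={gi:+.3f}")
```

Positive local I marks a unit that resembles its neighbors (High–High or Low–Low), negative local I marks a spatial outlier, and the sign of Gi* separates hot spots from cold spots.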
Table 1. Indicator composition and weight distribution of UEI.
| Indicator Category | Indicator (Abbreviation) | Description | Normalized Weight |
| --- | --- | --- | --- |
| Spatial expansion | LULCP | Land-use and land-cover change intensity | 0.375 |
| Spatial expansion | GAIA | Built-up land proportion | 0.341 |
| Socioeconomic | GDP | Gross domestic product | 0.172 |
| Socioeconomic | Population | Total population size | 0.172 |
| Environmental response | Carbon | Carbon emissions | 0.172 |
| Infrastructure | LST | Land surface temperature | 0.061 |
| Infrastructure | TVP | Transportation infrastructure proxy | 0.022 |
| Infrastructure | Water | Water supply capacity | 0.028 |
| Infrastructure | Gas | Gas supply capacity | 0.028 |
| Infrastructure | Road | Road density | 0.028 |
| Environmental response | Green | Green space coverage | 0.028 |
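Table 1 gives the indicator weights, but the aggregation formula is not repeated alongside it. Assuming the UEI is a linear weighted sum of indicators already normalized to [0, 1], one city's index could be computed as sketched below. As printed, the Table 1 weights sum to 1.427 rather than 1, so the sketch divides each weight by that total first; whether and how the authors rescaled is not stated here, and every indicator is assumed to enter with a positive sign. The function names and the example city are hypothetical.

```python
from typing import Dict

# Normalized weights exactly as printed in Table 1 (keys follow its abbreviations).
TABLE1_WEIGHTS: Dict[str, float] = {
    "LULCP": 0.375, "GAIA": 0.341, "GDP": 0.172, "Population": 0.172,
    "Carbon": 0.172, "LST": 0.061, "TVP": 0.022, "Water": 0.028,
    "Gas": 0.028, "Road": 0.028, "Green": 0.028,
}


def rescale(weights: Dict[str, float]) -> Dict[str, float]:
    """Divide each weight by the total so the weights sum to 1."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def uei(indicators: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical linear UEI for one city.

    ``indicators`` maps each abbreviation to a value already normalized to
    [0, 1]; every indicator is assumed to raise the index (positive sign).
    """
    w = rescale(weights)
    missing = set(w) - set(indicators)
    if missing:
        raise KeyError(f"missing indicators: {sorted(missing)}")
    return sum(w[k] * indicators[k] for k in w)


if __name__ == "__main__":
    print(round(sum(TABLE1_WEIGHTS.values()), 3))  # 1.427, as printed in Table 1
    # Illustrative normalized values for one hypothetical city (not study data).
    city = {k: 0.5 for k in TABLE1_WEIGHTS}
    city.update({"LULCP": 0.9, "GAIA": 0.8})
    print(round(uei(city, TABLE1_WEIGHTS), 3))
```

With this rescaling the UEI stays on the same [0, 1] scale as its inputs, and the two spatial-expansion indicators together carry roughly half of the total weight, consistent with the finding that UEI is driven mainly by land-use expansion.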
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wan, F.; Zhang, Z.; Wang, R.; Shu, D.; Ning, B.; Gong, J.; Li, X. An Adaptive Transformer-Based Language-Model Framework for Assessing Urban Expansion. Land 2026, 15, 514. https://doi.org/10.3390/land15030514

AMA Style

Wan F, Zhang Z, Wang R, Shu D, Ning B, Gong J, Li X. An Adaptive Transformer-Based Language-Model Framework for Assessing Urban Expansion. Land. 2026; 15(3):514. https://doi.org/10.3390/land15030514

Chicago/Turabian Style

Wan, Fang, Zhan Zhang, Ru Wang, Daoyu Shu, Beile Ning, Jianya Gong, and Xi Li. 2026. "An Adaptive Transformer-Based Language-Model Framework for Assessing Urban Expansion" Land 15, no. 3: 514. https://doi.org/10.3390/land15030514

APA Style

Wan, F., Zhang, Z., Wang, R., Shu, D., Ning, B., Gong, J., & Li, X. (2026). An Adaptive Transformer-Based Language-Model Framework for Assessing Urban Expansion. Land, 15(3), 514. https://doi.org/10.3390/land15030514

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
