1. Introduction
Urbanization has been one of the most significant drivers of socioeconomic transformation in China over the last three decades [1, 2]. As functional hubs, urban areas centralize innovation, essential services, and specialized facilities, thereby facilitating the flow of capital and labor across the urban–rural continuum [3]. This process is characterized not only by the spatial expansion of built-up land but also by the intensified concentration of population, economic activities, and infrastructure. Given that these socioeconomic functions are physically manifested through human activities and built environments [4], the accurate delineation of urban boundaries is not merely a cartographic exercise but a prerequisite for evaluating the efficiency of urban space in accommodating and organizing such activities. However, most existing urban boundary delineation frameworks have been developed and validated for large coastal metropolises [5, 6]. While recent studies have extended urban mapping to inland cities [7, 8, 9, 10], the spatial characterization of small and medium-sized inland cities remains comparatively limited. These cities often face challenges such as fragmented urban morphology and complex geomorphic settings. Despite these constraints, they play a crucial role in regional connectivity and spatial organization, serving as important nodes in broader socioeconomic and infrastructural networks [11]. Their heterogeneous built environments and fragile ecosystems make them particularly vulnerable to unplanned sprawl, which risks diluting urban functional intensity and undermining the Sustainable Development Goals (SDGs) related to land use efficiency [12, 13, 14, 15, 16]. Consequently, there is an urgent need for robust and reproducible monitoring approaches capable of capturing urban expansion patterns and supporting sustainable spatial planning in inland contexts.
Traditional approaches to mapping urban extent, which rely on field surveys or census statistics, are often labor-intensive, costly, and temporally inconsistent [17, 18]. Satellite remote sensing, particularly nighttime light (NTL) imagery, provides a practical alternative for such monitoring due to its strong correlation with human activities [19, 20, 21]. Products from the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) and the National Polar-Orbiting Partnership-Visible Infrared Imaging Radiometer Suite (NPP/VIIRS) have enabled numerous studies on global and regional urbanization [7, 22, 23, 24]. Despite these advantages, the utility of NTL for fine-scale boundary delineation is hindered by coarse spatial resolution and persistent blooming effects [10, 25, 26], particularly in inland cities where low-density development at the urban fringe becomes indistinguishable from background noise.
To mitigate the limitations of NTL-based mapping, recent efforts have combined NTL with auxiliary spectral or thermal indicators such as the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI), the Normalized Difference Built-up Index (NDBI), and land surface temperature (LST) to characterize urban surfaces and suppress background noise [27, 28, 29]. However, the efficacy of these individual auxiliary variables is often constrained by spectral confusion and environmental noise. Specifically, the spectral similarity between impervious surfaces and barren land often leads NDVI to conflate bare soil with built-up areas [30]. Similarly, NDBI exhibits pronounced instability in peri-urban transitional zones, where heterogeneous mixtures of impervious surfaces, exposed soil, and sparse vegetation produce overlapping spectral signatures [31, 32]. Although NDWI is generally effective for water body detection, its performance is often compromised in urban environments by topographic shading and building-induced shadows [33]. Furthermore, the inclusion of LST introduces temporal uncertainty, as its values are highly sensitive to seasonal shifts and atmospheric variability [34, 35].
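For reference, the spectral indices discussed above are all normalized band differences. The sketch below uses their commonly cited definitions; the function names, band names, and reflectance values are illustrative assumptions and are not data from this study.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def spectral_indices(green, red, nir, swir1):
    """Compute the four normalized-difference indices from surface reflectance."""
    return {
        "NDVI": normalized_difference(nir, red),      # vegetation
        "NDWI": normalized_difference(green, nir),    # water (green vs. NIR)
        "MNDWI": normalized_difference(green, swir1), # water (green vs. SWIR)
        "NDBI": normalized_difference(swir1, nir),    # built-up
    }

# Assumed reflectances for three pixels: vegetation, water, built-up.
green = np.array([0.08, 0.06, 0.15])
red = np.array([0.05, 0.04, 0.18])
nir = np.array([0.45, 0.03, 0.22])
swir1 = np.array([0.20, 0.01, 0.28])

indices = spectral_indices(green, red, nir, swir1)
for name, values in indices.items():
    print(name, np.round(values, 3))
```

With these assumed values, NDVI is high only for the vegetation pixel, both water indices are positive only for the water pixel, and NDBI is positive only for the built-up pixel; real peri-urban pixels are rarely this well separated, which is the difficulty described above.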
In response to these localized misclassifications, composite indices such as the vegetation–temperature light index (VTLI) and the normalized urban area composite index (NUACI) have been developed to leverage multiple data sources, effectively reducing saturation effects and enhancing intra-urban contrast [36, 37]. Despite their improvements, these frameworks predominantly rely on simplified pairwise combinations, which lack the multidimensional robustness required for complex inland environments. In such regions, the high heterogeneity inherent in mixed industrial–residential patterns and fragmented construction zones creates conflated signal signatures, which may limit the effectiveness of low-dimensional indices [38, 39, 40]. Additionally, model complexity and interpretability present notable challenges. Some indices, exemplified by the human settlement index (HSI) [41], employ intricate formulations that hinder transparent interpretation and reduce their practicality for large-scale or operational applications [19].
Beyond physical indicators, socioeconomic datasets such as point of interest (POI) data and road networks have been introduced to capture functional patterns of human activity [25, 42, 43]. While valuable in data-rich metropolitan regions, these datasets often suffer from spatial bias, inconsistent coverage, and substantial preprocessing requirements, limiting their applicability in data-constrained inland cities [44, 45]. Taken together, these challenges (the limited diversity of spectral variables, the structural opacity of existing indices, and the accessibility barriers of ancillary data) show that current NTL-based methods struggle to account for the pronounced land-surface heterogeneity of small and medium-sized inland cities. Consequently, urban boundary delineation remains inconsistent or unreliable in geographically diverse contexts.
To bridge these gaps, this study introduces VWBTNUI, a composite index designed to suppress NTL blooming effects and better characterize urban features by integrating NDVI, the Modified NDWI (MNDWI) [33], NDBI, and LST with NTL data. Developed using globally accessible datasets, VWBTNUI prioritizes simplicity, scalability, and applicability to underrepresented inland cities. The specific objectives of this study are to: (1) formulate the VWBTNUI architecture by optimizing the integration of multi-source physical indicators; (2) validate the index’s robustness across three representative small- and medium-sized cities in western China; and (3) provide methodological insights for advancing urban boundary extraction in support of sustainable development and spatial planning in data-limited environments.
4. Discussion
4.1. Interpreting the Performance of Existing Indices and VWBTNUI
Both qualitative and quantitative evaluations indicate that VWBTNUI provides the most reliable delineation of urban extent across the three inland cities. This advantage is particularly pronounced when the extraction results are considered within the distinctive developmental context of inland urban environments. Unlike coastal cities, where expansion is relatively contiguous, inland cities are often constrained by rugged topography and tend to develop through fragmented parcels. This “roads-first, construction-later” development trajectory frequently produces extensive illuminated corridors connecting these parcels before the built-up surfaces are fully consolidated. Consequently, radiance-only indices are prone to systematic overestimation caused by light spillover [72, 73, 74]. A representative case was observed in BZ, where illumination along the corridor connecting the urban core to Enyang Airport resulted in an overestimation of nearly 5 km², corresponding to a localized extraction error as high as 132.69%.
Pronounced spectral fragmentation within heterogeneous inland mosaics weakens the effectiveness of reflectance-based indices. In this study, BANUI exhibits blurred spatial patterns comparable to those of VNL, reflecting its limited discriminative capacity under mixed land-cover conditions. Statistical evidence supports this observation: NDBI values across the study sites have consistently low means and small standard deviations (SDs) (ZY: mean = 0.019, SD = 0.072; BZ: mean = 0.013, SD = 0.067; LZ: mean = 0.009, SD = 0.079). Such limited variability reduces separability among built-up surfaces, cropland, bare soil, and construction sites, thereby explaining why NDBI-based methods, although effective in large metropolitan areas [75], perform poorly in fragmented inland settings where SWIR–NIR contrast is substantially reduced.
Environmental interference from water bodies and urban shadows remains a persistent source of misclassification. Vegetation-adjusted indices improve urban–rural contrast, but they are sensitive to low-NDVI features. This sensitivity arises from the (1 − NDVI) term, which mathematically amplifies signals over water surfaces where NDVI values are typically minimal. Consequently, both VANUI and VTLI showed elevated responses along river corridors. While integration of water constraints in VWANUI and VWBTNUI alleviates this inflation, conventional NDWI remains vulnerable to shadow-driven confusion in compact inland districts. This limitation is evident in the VWANUI results as localized fragmentation within the dense urban core of BZ, where narrow streets and variable building heights generate extensive shaded areas that spectrally resemble water surfaces [76]. Furthermore, the contribution of thermal information was conditionally beneficial rather than universally effective. Although LST substantially enhanced urban separation in LZ (SD = 2.66), its discriminative effectiveness was reduced in ZY and BZ, where thermal variability is weaker (LST SDs of 2.37 and 1.89, respectively). This finding aligns with previous studies suggesting that LST-based differentiation becomes effective only when temperature gradients are sufficiently pronounced [65].
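The amplification over water can be illustrated numerically. The sketch below applies the commonly cited VANUI form, (1 − NDVI) × normalized NTL, to three pixels that receive the same radiance; the function name and all pixel values are assumptions chosen for illustration, not measurements from the study areas.

```python
import numpy as np

def vanui(ntl_norm, ndvi):
    """VANUI-style adjustment: (1 - NDVI) multiplied by normalized NTL."""
    ndvi = np.asarray(ndvi, dtype=float)
    ntl_norm = np.asarray(ntl_norm, dtype=float)
    return (1.0 - ndvi) * ntl_norm

# Assumed case: one built-up pixel and two neighbours lit only by blooming,
# so all three share the same normalized radiance.
labels = ["built-up", "cropland", "river"]
ntl_norm = np.array([0.40, 0.40, 0.40])
ndvi = np.array([0.15, 0.70, -0.25])

scores = vanui(ntl_norm, ndvi)
for label, score in zip(labels, scores):
    print(f"{label:9s} {score:.2f}")
```

Because the river pixel has a negative NDVI, its multiplier (1.25) exceeds 1, so it scores higher than the built-up pixel even though it contains no urban surface; this is why an explicit water constraint is needed.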
The limitations observed in these indices demonstrate that each individual indicator captures only a partial and context-dependent subset of urban surface characteristics, and that their effects are neither independent nor uniformly beneficial [25, 36]. In contrast, VWBTNUI outperforms competing approaches because its comprehensive structure enforces concurrence across complementary dimensions: vegetation suppression, water masking, building-specific reflectance, and thermal distinctiveness. This multi-source agreement mechanism functions through a series of targeted cross-checks: (1) illumination-driven spillover is suppressed where NDBI, NDVI, or LST contradict urban signatures; (2) water-related and shadow issues are corrected by the MNDWI, which leverages the strong attenuation of SWIR radiation by water to enhance separability; and (3) low-radiance industrial areas are recovered when reflectance or thermal cues indicate impervious surfaces. Notably, at low-radiance industrial fringes, elevated surface temperatures or distinctive roofing reflectance patterns [77] enable VWBTNUI to detect built-up areas that other methods fail to capture, as illustrated by ZY (Figure 6).
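The cross-check logic can be sketched with a toy multiplicative fusion. The function below is a hypothetical stand-in and is not the VWBTNUI formula, which is defined in the methods section rather than here; the term definitions, the name `fused_index`, and every pixel value are assumptions made only to show how disagreement in any one dimension pulls a lit pixel down.

```python
import numpy as np

def fused_index(ntl_norm, ndvi, mndwi, ndbi, lst_norm):
    """Hypothetical multiplicative fusion (NOT the published VWBTNUI formula)."""
    ntl_norm, ndvi, mndwi, ndbi, lst_norm = (
        np.asarray(x, dtype=float) for x in (ntl_norm, ndvi, mndwi, ndbi, lst_norm)
    )
    veg_term = np.clip(1.0 - ndvi, 0.0, 1.0)     # small over dense vegetation
    water_term = np.clip(1.0 - mndwi, 0.0, 1.0)  # small over open water
    built_term = (ndbi + 1.0) / 2.0              # NDBI rescaled from [-1, 1] to [0, 1]
    return ntl_norm * veg_term * water_term * built_term * lst_norm

# Assumed pixels: lit road corridor over cropland, river under blooming,
# dimly lit industrial fringe, and urban core.
labels = ["corridor", "river", "industrial", "core"]
ntl_norm = np.array([0.50, 0.50, 0.20, 0.90])
ndvi = np.array([0.60, -0.20, 0.10, 0.10])
mndwi = np.array([-0.40, 0.60, -0.30, -0.30])
ndbi = np.array([-0.20, -0.50, 0.15, 0.10])
lst_norm = np.array([0.30, 0.20, 0.80, 0.70])

scores = fused_index(ntl_norm, ndvi, mndwi, ndbi, lst_norm)
for label, score in zip(labels, scores):
    print(f"{label:10s} {score:.4f}")
```

In this toy setting the dim industrial pixel outranks the brighter corridor and river pixels, because its reflectance and thermal terms agree with an urban signature while theirs do not.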
To verify the technical robustness of this integration strategy, we conducted an input-level sensitivity analysis by introducing ±5% random perturbations to each input layer (NDVI, MNDWI, NDBI, NLST, and VNL) across 50 Monte Carlo simulations. As reported in Table A2, the resulting coefficients of variation (CVs) for extracted urban areas remained below 0.5% across all cities and input dimensions, indicating that VWBTNUI is highly stable with respect to small input fluctuations. Furthermore, acknowledging the potential impact of the resolution gap between sensors, we assessed the sensitivity of VWBTNUI to spatial misregistration of the VIIRS data. The raster was systematically displaced by one-half pixel (±250 m) in the four cardinal directions at its native 500 m resolution prior to resampling and index computation. The resulting urban area estimates showed only minor variation, with absolute relative differences below 2% across all shift scenarios and study areas (Table A3), confirming that the index is resilient to sub-pixel geolocation uncertainties. Beyond input uncertainty and spatial misregistration, the robustness of VWBTNUI with respect to validation sample selection was evaluated using a bootstrap resampling procedure (500 iterations per city). As summarized in Table A4, the small SDs (all below 5%) for OA, F1 score, and the Kappa coefficient confirm that the binary classification performance is statistically stable and not driven by a particular configuration of validation samples.
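A minimal sketch of two of these checks, the Monte Carlo perturbation and the bootstrap, is given below. It assumes uniform multiplicative noise, a fixed threshold, and a toy two-layer index; the helper names (`area_cv`, `binary_metrics`, `bootstrap_sd`) and the synthetic data are hypothetical and do not reproduce the study's actual procedure or results.

```python
import numpy as np

def area_cv(index_fn, layers, perturbed, threshold, n_runs=50, pct=0.05, seed=0):
    """CV of the extracted pixel count when one layer gets +/-pct uniform noise."""
    rng = np.random.default_rng(seed)
    areas = np.empty(n_runs)
    for i in range(n_runs):
        noisy = dict(layers)
        noise = rng.uniform(-pct, pct, size=layers[perturbed].shape)
        noisy[perturbed] = layers[perturbed] * (1.0 + noise)
        areas[i] = np.count_nonzero(index_fn(**noisy) > threshold)
    return areas.std(ddof=1) / areas.mean()  # undefined if no pixel is extracted

def binary_metrics(y_true, y_pred):
    """Overall accuracy, F1 score, and Cohen's kappa for 0/1 labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else np.nan
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (oa - pe) / (1 - pe) if pe < 1 else np.nan
    return np.array([oa, f1, kappa])

def bootstrap_sd(y_true, y_pred, n_iter=500, seed=0):
    """SD of OA, F1, and kappa over bootstrap resamples of the validation set."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    draws = rng.integers(0, y_true.size, size=(n_iter, y_true.size))
    stats = np.array([binary_metrics(y_true[idx], y_pred[idx]) for idx in draws])
    return np.nanstd(stats, axis=0, ddof=1)

# Synthetic demonstration data (assumed, not from the study).
rng = np.random.default_rng(42)
layers = {"ntl": rng.random((50, 50)), "ndvi": rng.random((50, 50))}

def toy_index(ntl, ndvi):
    return ntl * (1.0 - ndvi)

cv = area_cv(toy_index, layers, "ntl", threshold=0.3)
y_true = rng.integers(0, 2, size=300)
y_pred = np.where(rng.random(300) < 0.9, y_true, 1 - y_true)
sd = bootstrap_sd(y_true, y_pred)
print("area CV:", cv)
print("SD of OA, F1, kappa:", sd)
```

Perturbing one layer at a time, as here, isolates each input's contribution to the variability of the extracted area; perturbing all layers jointly would instead measure their combined effect.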
While the index demonstrates high statistical stability, its operational efficacy is ultimately modulated by the spatial configuration of the urban environment. Inter-city performance disparities clearly indicate that urban morphology is a critical determinant of extraction fidelity. In ZY, the highly fragmented urban fabric—characterized by the intermixing of cropland, bare soil, and unbuilt plots—poses the greatest challenge for accurate boundary delineation. In contrast, BZ exhibits topographically constrained and spatially concentrated expansion, which reduces surface heterogeneity and facilitates higher extraction accuracy. LZ represents an intermediate case, where hydrological complexity combined with the spatial decoupling of peripheral industrial zones from the residential core introduces additional spectral ambiguities, limiting overall performance.
Despite pronounced variability in urban environmental conditions, VWBTNUI consistently maintains superior delineation performance across all cities. By jointly integrating complementary constraints on vegetation, water, building and thermal characteristics with NTL data, this index demonstrates that multi-source concurrence is essential for NTL-based urban mapping [75, 78]. This is particularly relevant in inland environments characterized by fragmented morphologies and diverse surface compositions.
4.2. Practical Implications for Inland, Data-Scarce Regions and Sustainable Development
Beyond methodological performance, VWBTNUI offers substantial practical value for urban governance and sustainable development, particularly in small and medium-sized cities where accurate spatial information is often lacking. Reliable urban delineation generated by VWBTNUI provides a robust empirical basis for delineating urban growth boundaries, enforcing zoning regulations, and assessing urban compactness. These governance functions are closely aligned with the efficiency and containment objectives embedded in SDG 11.3, which require consistent measurements of land use dynamics. By accurately identifying built-up footprints within heterogeneous transitional zones, VWBTNUI reduces the risk of planning misjudgments that arise when coarse-resolution datasets fail to capture rapid urban expansion.
In addition, the proposed framework relies exclusively on open-access global datasets, including NTL and Landsat imagery, and key data preprocessing and computation were performed on the cloud-based GEE platform, making implementation financially and logistically viable for planning agencies with limited technical capacity [63]. Benchmarking tests indicate that end-to-end processing for a city-scale area (e.g., Yanjiang District of ZY, ~1600 km²) requires less than 4 min, while a province-scale analysis (Sichuan Province, ~486,000 km²) can be completed in approximately 41 min. The unsupervised workflow further minimizes the need for training samples, manual annotation, and computationally intensive deep-learning pipelines [5]. This low-barrier implementation is particularly valuable in regions where institutions often operate under resource constraints.
Ultimately, the stable performance of VWBTNUI across heterogeneous inland environments provides a consistent and comparable basis for cross-city urban analysis and regional modeling. By offering a reproducible and transparent mapping framework, VWBTNUI supports evidence-based policymaking and enhances accountability in urban planning processes. Collectively, these attributes contribute to advancing progress toward SDG 11 in underrepresented inland regions, effectively translating methodological rigor into practical governance and sustainability outcomes.
4.3. Limitations and Future Research
Despite the promising performance of VWBTNUI, several limitations should be considered. First, the current analysis is based on a single-year snapshot (2020), which restricts direct evaluation of the index’s temporal stability and transferability. Second, the focus on three cities within a single province limits the representation of broader geomorphic and socio-economic gradients, which may constrain the generalizability of the findings to markedly different urban environments. Third, in low-brightness areas, the lack of anthropogenic lighting and the VIIRS detection threshold (0.3 nW cm⁻² sr⁻¹) limit the index’s ability to capture emerging or low-density urban areas. Although spectral and thermal indicators provide complementary information, underestimation may still occur in sparsely developed zones. Finally, the multiplicative fusion strategy may be less responsive under extreme conditions, potentially allowing uncertainties from individual input layers to propagate.
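The propagation concern in the last point can be made concrete with a first-order sketch. It assumes, purely for illustration, that the index is a plain product of k positive layers whose errors are independent; the symbols I, x_i, and σ_i are introduced here and the actual VWBTNUI formulation may differ.

```latex
I = \prod_{i=1}^{k} x_i
\quad\Longrightarrow\quad
\ln I = \sum_{i=1}^{k} \ln x_i
\quad\Longrightarrow\quad
\frac{\delta I}{I} \approx \sum_{i=1}^{k} \frac{\delta x_i}{x_i},
\qquad
\left(\frac{\sigma_I}{I}\right)^{2} \approx \sum_{i=1}^{k} \left(\frac{\sigma_i}{x_i}\right)^{2}.
```

Under these assumptions, five layers that each carry a 5% relative uncertainty would yield a per-pixel relative uncertainty of roughly √5 × 5% ≈ 11% in the index value before thresholding, since relative errors accumulate across the factors of a product.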
To address these limitations, subsequent work will extend the proposed framework to multi-year time series to systematically assess the temporal stability and transferability of VWBTNUI. Expanding the analysis to a more diverse set of cities will allow a rigorous evaluation of the method’s applicability beyond regionally constrained settings. Additionally, higher-resolution NTL observations from emerging platforms such as Luojia-01 and SDGSAT-1 [79, 80] will be explored to improve the detection of small or dimly lit urban features unresolved by VIIRS. Methodologically, continued refinements of the fusion strategy will incorporate adaptive weighting and data-driven thresholding to improve the balance between multi-dimensional signal contributions and noise suppression.
5. Conclusions
This study introduced VWBTNUI, a multi-dimensional NTL-based index designed to address blooming and illumination heterogeneity in urban extent mapping. By integrating vegetation, water, building, and thermal information with NTL data, VWBTNUI provides a physically consistent representation of built-up surfaces and enhances classification robustness in heterogeneous inland environments.
Empirical evaluation across three inland cities in Sichuan Province demonstrated that VWBTNUI consistently outperformed VNL and existing composite indices, achieving OA values above 0.88, F1 scores over 0.80, Kappa coefficients exceeding 0.72, and RE below 10%. While BANUI, VANUI, VTLI, and VWANUI address specific distortions, they remain sensitive to bare soil, vegetation variability, weak thermal contrast, or water-related confusion, and their performance is further influenced by urban morphological complexity. In contrast, VWBTNUI’s multi-constraint design enforces concurrence among complementary urban signatures, effectively suppressing these anomalies and producing coherent urban patterns with greater adaptability across fragmented landscapes. These results highlight that reliable inland urban extraction depends on coordinated multi-dimensional integration rather than isolated indicators.
By leveraging globally accessible datasets and efficient pixel-level operations, VWBTNUI offers an easily implementable framework for urban monitoring, planning, and environmental assessment. However, to address current limitations—specifically in detecting dimly lit settlements and the constraints of limited sample sizes and timeframes—future research should integrate higher-resolution data, adaptive fusion strategies, and broader spatiotemporal validation. Overall, this study demonstrates that multi-dimensional concurrence constitutes a fundamental methodological principle for accurate NTL-based urban delineation in inland environments.