1. Introduction
Urban expansion mapping is a fundamental requirement for sustainable land management, infrastructure planning, environmental monitoring, and heritage conservation. Its importance becomes even greater in rapidly transforming dryland regions, where urban growth often occurs in landscapes dominated by bright bare soils, sparse vegetation, and highly reflective surfaces that complicate remote-sensing interpretation. In such contexts, inaccurate delineation of built-up land may distort estimates of growth, misinform planning priorities, and weaken the spatial evidence needed for managing sensitive heritage and development zones.
These challenges are particularly relevant in Saudi Arabia, where major development initiatives under Saudi Vision 2030 are accelerating urban transformation across historically and environmentally significant areas. Diriyah represents a highly important case in this regard. As the historic core of the first Saudi state and a UNESCO-recognized heritage landscape, Diriyah is simultaneously a site of cultural preservation, tourism investment, and rapid urban redevelopment. Monitoring its urban expansion therefore requires methods that are not only technically robust but also spatially reliable in a dryland setting where built-up surfaces are easily confused with exposed soil and construction-related land cover.
Remote sensing provides powerful tools for monitoring urban change across large areas and multiple time periods, but the performance of commonly used methods varies considerably by environmental context. Spectral indices such as the Normalized Difference Built-up Index (NDBI) are widely used because of their simplicity and operational efficiency, yet they are known to perform inconsistently in arid and semi-arid environments where bare land exhibits reflectance properties similar to urban materials. Unsupervised classification offers greater flexibility by allowing class boundaries to emerge from the data, but it may still confuse fragmented built-up patches with bright non-urban surfaces when spectral separability is weak. In contrast, global built-up products such as the Global Human Settlement Layer (GHSL) provide standardized and temporally consistent datasets derived from broader training and multi-source classification frameworks, potentially offering more stable performance in difficult mapping environments.
Despite extensive work on urban mapping, an important gap remains in the comparative evaluation of these approaches within dryland heritage districts undergoing rapid transformation. Much of the literature focuses on single-method applications, broad metropolitan analyses, or humid and mixed-land-cover settings where spectral confusion is less severe. Fewer studies have systematically compared global built-up products, spectral-index methods, and unsupervised classification in a heritage-sensitive desert environment while also examining how their outputs diverge spatially and temporally. This gap is important because method choice directly affects estimates of urban growth and, by extension, the planning decisions derived from them.
Accordingly, this study comparatively evaluates three widely used urban expansion mapping approaches in Diriyah for 2015, 2020, and 2025: GHSL, NDBI, and unsupervised k-means classification. In addition to comparing their mapped urban extents and accuracy metrics, the study introduces a simple Hybrid Built-up Detection Model (HBDM) as an integrative diagnostic layer that synthesizes the outputs of the three methods into a continuous urban intensity surface. Rather than replacing formal accuracy assessment, the hybrid model is intended to support interpretation by highlighting persistent built-up cores, transitional zones, and areas of methodological disagreement. Through this framework, the study contributes empirical guidance for dryland urban mapping and offers planning-relevant insights for heritage conservation and sustainable urban development in Saudi Arabia and comparable desert cities.
2. Literature Review
2.1. Challenges in Dryland Urban Mapping
Urban expansion mapping is a long-established application of remote sensing, but its reliability remains strongly conditioned by environmental context. In dryland cities, built-up detection is particularly difficult because impervious surfaces often share similar spectral responses with bright bare soils, exposed rock, construction areas, and sparsely vegetated land. As a result, urban boundaries may be exaggerated, transitional zones may be misclassified, and temporal growth trends may be distorted if methods are transferred from humid or mixed-land-cover settings without local calibration. Previous studies from Saudi Arabia and other arid environments have shown that these conditions reduce separability between built-up and non-built classes and increase the risk of both commission error and spatial overestimation [1,2,3,4]. More recent work has reinforced the same concern in dry climates, showing that impervious-surface extraction remains sensitive to the spectral similarity between urban materials and surrounding barren land, especially when using medium-resolution optical imagery alone [5,6].
2.2. Spectral Indices for Urban Detection
To address the challenge of built-up extraction, a wide range of spectral indices has been developed. The Normalized Difference Built-up Index (NDBI) remains one of the most widely used methods because it offers a simple and operational way to emphasize built-up surfaces using near-infrared and shortwave-infrared reflectance [7]. Subsequent work expanded this family of methods through thematic combinations designed to suppress vegetation and water signals and improve urban delineation [8,9]. In dry and semi-arid environments, however, index-based methods remain vulnerable to soil background effects, surface brightness, and threshold instability, which often require local tuning and cross-checking with reference imagery [10]. Recent evaluations in dry climates have confirmed that single-index approaches are rarely sufficient on their own, and that combinations of indices or additional contextual constraints may improve the distinction between built-up land and bare surfaces [5,6]. These findings are important for the present study because they suggest that NDBI is useful as a comparative baseline, but not necessarily as a stand-alone solution in heritage-sensitive desert settings.
2.3. Unsupervised and Machine Learning Approaches
Beyond threshold-based indices, unsupervised and machine learning methods provide greater flexibility because class boundaries are inferred from the data rather than fixed a priori. K-means clustering and related unsupervised approaches are often attractive for exploratory mapping because they can reveal dominant spectral groupings without requiring labelled training samples. Yet in dryland environments their performance may still be constrained by weak spectral contrast and mixed clusters that combine bare soil, construction surfaces, and impervious materials into the same class [11,12,13]. More recent studies indicate that improved performance can be achieved when optical imagery is complemented by radar, topographic, thermal, or textural information, or when deep learning architectures are used for impervious-surface extraction [14,15,16,17]. At the same time, these more advanced approaches often require larger training datasets, more complex feature engineering, or computationally intensive workflows. This makes them valuable reference points for future work, while preserving the relevance of simpler comparative designs when the research objective is to evaluate the behavior of commonly used approaches under a difficult dryland mapping scenario.
2.4. Global Built-Up Products, Hybrid Approaches, and the Research Gap
Global built-up products offer a different pathway by providing standardized, multi-temporal representations of settlement patterns derived from large-scale supervised frameworks. Among these, the Global Human Settlement Layer (GHSL) has become one of the most widely used datasets for tracking built-up dynamics and supporting regional and global urban analysis [18,19]. Evaluations suggest that GHSL performs particularly well in delineating established urban cores and providing temporally consistent baselines, although it may underrepresent fragmented, low-density, or newly emerging development at the urban fringe [20]. Recent advances in high-resolution impervious-surface mapping and hybrid urban extraction also point toward the benefits of integrating multiple data sources rather than relying on a single method alone. Examples include fused optical-SAR workflows, texture-enhanced impervious-surface mapping, and new 10 m products that aim to improve spatial completeness and thematic accuracy [14,15,16,17]. Despite this progress, a gap remains in the comparative evaluation of GHSL, spectral-index mapping, and unsupervised clustering within dryland heritage districts undergoing rapid transformation. Much of the literature focuses on single-method applications, high-resolution AI workflows, or broader metropolitan settings. Fewer studies have directly examined how these different methodological families diverge in a high-reflectance desert landscape where method choice can materially alter estimates of urban growth and planning interpretation. This gap is especially relevant in Diriyah, where accurate mapping is needed not only for measuring expansion but also for supporting heritage protection, landscape management, and sustainable planning under Saudi Vision 2030.
3. Data and Methodology
3.1. Study Area and Research Design
Diriyah is located northwest of Riyadh in central Saudi Arabia (Figure 1) and represents one of the country’s most historically and culturally significant urban landscapes. It includes At-Turaif, a UNESCO World Heritage Site and the historic core of the first Saudi state, while also forming part of a rapidly transforming district shaped by tourism development, residential expansion, infrastructure investment, and heritage-led redevelopment under Saudi Vision 2030. This dual character makes Diriyah an especially suitable case for evaluating urban expansion mapping methods in a dryland heritage setting, where accurate delineation of built-up growth is important not only for land monitoring, but also for spatial planning and conservation-oriented decision-making.
The study was designed as a comparative, multi-method assessment of urban expansion across three benchmark years: 2015, 2020, and 2025. These years were selected to capture pre-expansion conditions, an intermediate stage of development, and the most recent phase of accelerated transformation in Diriyah. Three widely used yet methodologically distinct approaches were evaluated: (1) the Global Human Settlement Layer (GHSL), which represents a standardized global built-up product; (2) the Normalized Difference Built-up Index (NDBI), which represents a spectral-index-based method; and (3) unsupervised k-means clustering, which represents a simple data-driven classification approach. Taken together, these methods provide a practical framework for examining how method choice influences estimates of built-up extent in a high-reflectance dryland environment.
Figure 2 summarizes the methodological workflow of the study, including the main data inputs, the three mapping streams, the validation procedure, and the derivation of the Hybrid Built-up Detection Model (HBDM).
3.2. Data Sources and Method Selection
The analysis uses multi-temporal satellite imagery and GHSL built-up surfaces to map urban change in Diriyah for 2015, 2020, and 2025. Cloud-free imagery with consistent seasonal timing and comparable spatial resolution was selected to minimize temporal noise and improve inter-year comparability (Table 1). Official GHSL built-up layers were used for 2015 and 2020 and clipped to the Diriyah boundary. For 2025, because no official GHSL layer was available for that date, a GHSL-based extension surface was derived by combining the 2020 GHSL layer with visually interpreted new built-up patches identified from high-resolution imagery acquired between 2018 and 2024. This derived layer is not treated as an official GHSL product; instead, it is used as an indicative extension to approximate recent built-up expansion while preserving the conservative logic of the GHSL baseline.
The three selected methods were chosen because they represent different and commonly used families of urban mapping approaches, each with distinct strengths and limitations in dryland settings. GHSL was included because it provides a globally calibrated and temporally consistent representation of built-up land, making it a useful baseline for comparison. NDBI was selected because it is one of the most widely applied spectral indices for urban detection and offers a transparent, low-complexity way of highlighting built-up surfaces. Unsupervised k-means clustering was included because it allows land-surface patterns to emerge directly from the spectral data without requiring labelled training samples, thereby serving as an exploratory classification baseline. Comparing these three approaches makes it possible to assess not only differences in mapped urban extent, but also the trade-offs between standardized global products, index-based extraction, and unsupervised classification under the same environmental conditions [21,22,23].
3.3. NDBI-Based Built-Up Mapping
For the NDBI-based approach, the Normalized Difference Built-up Index was calculated for each study year using the standard formulation:
$$\mathrm{NDBI} = \frac{\mathrm{SWIR} - \mathrm{NIR}}{\mathrm{SWIR} + \mathrm{NIR}}$$
where SWIR is the shortwave infrared reflectance and NIR is the near-infrared reflectance.
Initial thresholds were guided by previous applications in dryland and semi-arid environments and then refined iteratively through visual comparison with high-resolution imagery in order to reduce confusion with bright bare soils and active construction areas. To further improve separation from non-urban surfaces, pixels exceeding the final NDBI threshold and falling below a simple NDVI threshold were classified as built-up. This produced binary built-up masks and associated area estimates for each year. The procedure balances methodological transparency with the need for local calibration in spectrally complex dryland landscapes.
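The thresholding rule described above can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the function names, the two threshold defaults, and the 30 m pixel size are hypothetical placeholders, since the text does not report the final thresholds or the sensor resolution used.

```python
import numpy as np

def ndbi_built_up_mask(swir, nir, red, ndbi_threshold=0.1, ndvi_threshold=0.2):
    """Return (ndbi, mask) for reflectance arrays of identical shape.

    Both threshold defaults are hypothetical; in practice they would be
    tuned against high-resolution reference imagery.
    """
    swir, nir, red = (np.asarray(b, dtype=float) for b in (swir, nir, red))
    eps = 1e-12  # avoids division by zero on empty (all-zero) pixels
    ndbi = (swir - nir) / (swir + nir + eps)
    ndvi = (nir - red) / (nir + red + eps)
    # Built-up = high NDBI and low NDVI, as described in the text.
    mask = (ndbi > ndbi_threshold) & (ndvi < ndvi_threshold)
    return ndbi, mask

def built_up_area_km2(mask, pixel_size_m=30.0):
    """Convert a boolean mask to area; the 30 m pixel size is an assumption."""
    return float(np.count_nonzero(mask)) * pixel_size_m ** 2 / 1e6

# Tiny synthetic 2 x 2 scene (values invented for illustration only).
swir = [[0.40, 0.20], [0.35, 0.30]]
nir = [[0.30, 0.45], [0.30, 0.28]]
red = [[0.25, 0.05], [0.28, 0.26]]
ndbi, mask = ndbi_built_up_mask(swir, nir, red)
area = built_up_area_km2(mask)
```

Only the top-left pixel passes both tests in this toy scene. The bottom row has slightly positive NDBI but stays below the placeholder threshold, which is the kind of borderline bright-soil case that makes local calibration necessary.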
3.4. Unsupervised k-Means Classification
Unsupervised k-means clustering was applied to the multispectral imagery for each year and treated here as a simple exploratory machine learning approach. After testing several cluster configurations, k = 5 was selected as a practical compromise between capturing spectral variability and maintaining interpretability. This number allowed the imagery to be partitioned into a manageable set of land-surface groups while preserving sufficient flexibility to distinguish built-up areas from vegetation, bare land, and transitional surfaces.
The cluster most closely corresponding to built-up land was identified through examination of cluster centroids and visual comparison with high-resolution imagery. The same general spectral and contextual logic was applied across all three years to maintain inter-temporal consistency. Although this approach does not eliminate spectral ambiguity, it provides a useful data-driven baseline for assessing how far unsupervised grouping alone can distinguish urban surfaces in a dryland setting characterized by high surface brightness [24,25].
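As a rough illustration of the clustering step, the sketch below implements plain Lloyd-style k-means on pixel spectra with NumPy. It is a sketch under stated assumptions, not the workflow used in the study: the deterministic brightness-based initialization, the function names, and the synthetic two-group image are all hypothetical. In practice a library implementation (for example scikit-learn's `KMeans`) would normally be preferred, and the built-up cluster would still be chosen by the analyst.

```python
import numpy as np

def _assign(pixels, centroids):
    # Squared Euclidean distance from every pixel to every centroid.
    dist = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(pixels, k=5, n_iter=100):
    """Cluster an (n_pixels, n_bands) array; returns (labels, centroids)."""
    pixels = np.asarray(pixels, dtype=float)
    # Deterministic start (an assumption of this sketch): seed centroids with
    # pixels evenly spaced along the brightness ranking.
    order = np.argsort(pixels.mean(axis=1), kind="stable")
    centroids = pixels[order[np.linspace(0, len(pixels) - 1, k).astype(int)]]
    for _ in range(n_iter):
        labels = _assign(pixels, centroids)
        updated = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return _assign(pixels, centroids), centroids

# Synthetic 3 x 4 image with 3 bands: six dark and six bright pixels.
dark = np.repeat((0.10 + 0.01 * np.arange(6))[:, None], 3, axis=1)
bright = np.repeat((0.80 + 0.01 * np.arange(6))[:, None], 3, axis=1)
image = np.vstack([dark, bright]).reshape(3, 4, 3)  # rows, cols, bands

labels, centroids = kmeans(image.reshape(-1, 3), k=2)
built_up_cluster = 1  # picked by the analyst after inspecting the centroids
cluster_mask = labels.reshape(3, 4) == built_up_cluster
```

The distance computation materializes an n_pixels x k x n_bands array, so a full scene would need chunking or a library routine. The sketch also shows why the method struggles in drylands: the algorithm only separates spectra, so bright soil and rooftops that sit close together in band space fall into the same cluster regardless of k.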
3.5. Accuracy Assessment and Reference Points
To evaluate classification performance quantitatively, 150 stratified random reference points were generated across the study area and visually interpreted from high-resolution imagery for 2015, 2020, and 2025 as either built-up or non-built. Stratification was used to ensure that the validation sample represented both urbanized and non-urbanized surfaces rather than being dominated by the more extensive non-built background. This is especially important in dryland environments, where class imbalance can artificially inflate apparent accuracy when most validation points fall within the dominant bare-land class.
The sample size of 150 points was considered appropriate for this study because the analysis focuses on a relatively compact case-study area, compares broad built-up versus non-built classes rather than a large multi-class scheme, and is intended to provide a robust comparative assessment of method behavior rather than a cadastral inventory. Each method’s output at the reference locations was compared against the visually interpreted labels using confusion matrices.
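One way to draw such a stratified sample is sketched below. The text does not state how the 150 points were allocated between strata or which layer defined the strata, so the equal 75/75 split, the synthetic stratum raster, and the function name are assumptions made purely for illustration.

```python
import numpy as np

def stratified_points(strata, n_per_stratum, seed=0):
    """Draw random (row, col, stratum) samples from a 2-D integer raster.

    `n_per_stratum` maps each stratum value to the number of points to draw.
    """
    rng = np.random.default_rng(seed)
    strata = np.asarray(strata)
    points = []
    for value, n in n_per_stratum.items():
        rows, cols = np.nonzero(strata == value)
        if n > rows.size:
            raise ValueError(f"stratum {value} has only {rows.size} pixels")
        # Sample pixel indices without replacement so no point repeats.
        picked = rng.choice(rows.size, size=n, replace=False)
        points.extend((int(rows[i]), int(cols[i]), int(value)) for i in picked)
    return points

# Synthetic 100 x 100 raster: 10% built-up (1), 90% non-built (0).
strata = np.zeros((100, 100), dtype=int)
strata[:10, :] = 1

# Hypothetical equal allocation, so the minority class is not swamped.
points = stratified_points(strata, {1: 75, 0: 75})
```

With simple random sampling over this raster, only about 15 of 150 points would be expected to fall on built-up pixels, which illustrates the class-imbalance concern raised above.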
Four standard accuracy metrics were calculated: overall accuracy (OA), user’s accuracy (UA), producer’s accuracy (PA), and the Kappa coefficient. These were computed as follows:
$$\mathrm{OA} = \frac{\sum_{i} x_{ii}}{N}, \qquad \mathrm{UA}_{i} = \frac{x_{ii}}{x_{i+}}, \qquad \mathrm{PA}_{i} = \frac{x_{ii}}{x_{+i}}$$
$$\kappa = \frac{N \sum_{i} x_{ii} - \sum_{i} x_{i+}\, x_{+i}}{N^{2} - \sum_{i} x_{i+}\, x_{+i}}$$
where $x_{ii}$ is the number of correctly classified observations for class $i$, $x_{i+}$ is the total number of observations assigned to class $i$, $x_{+i}$ is the total number of reference observations in class $i$, and $N$ is the total number of validation points. Together, these metrics provide a more complete assessment of performance than area comparison alone by capturing both overall agreement and class-specific omission and commission error [26].
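For readers who want to trace the calculation, the following sketch computes the four metrics from a confusion matrix using the standard definitions. The matrix is invented for illustration (150 points, two classes) and does not reproduce the study's results; rows are taken as mapped classes and columns as reference classes.

```python
import numpy as np

def accuracy_metrics(cm):
    """cm[i, j] = number of points mapped as class i with reference class j."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)            # correctly classified points per class
    row_tot = cm.sum(axis=1)      # points assigned to each class by the map
    col_tot = cm.sum(axis=0)      # reference points in each class
    oa = diag.sum() / n
    ua = diag / row_tot           # user's accuracy (1 - commission error)
    pa = diag / col_tot           # producer's accuracy (1 - omission error)
    chance = (row_tot * col_tot).sum()
    kappa = (n * diag.sum() - chance) / (n ** 2 - chance)
    return oa, ua, pa, kappa

# Hypothetical matrix: class 0 = built-up, class 1 = non-built.
cm = [[60, 10],
      [15, 65]]
oa, ua, pa, kappa = accuracy_metrics(cm)
print(f"OA={oa:.3f}  UA(built)={ua[0]:.3f}  PA(built)={pa[0]:.3f}  kappa={kappa:.3f}")
```

In this invented example, user's accuracy for the built-up class (60/70) exceeds producer's accuracy (60/75). A method dominated by commission error would show the opposite ordering.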
A map of the reference points used in the accuracy assessment is shown in Figure 3.
3.6. Hybrid Built-Up Detection Model (HBDM)
In addition to the three primary methods, a Hybrid Built-up Detection Model (HBDM) was constructed as an integrative diagnostic layer rather than as a separate classifier. Its purpose is not to replace the formal accuracy assessment or to claim a new ground-truth-validated classification product. Instead, it combines the outputs of GHSL, NDBI, and unsupervised k-means clustering into a continuous urban intensity surface that highlights zones of convergence and disagreement among methods.
To ensure pixelwise comparability, all input layers were aligned to a common spatial grid. The GHSL and NDBI layers were treated as continuous surfaces and normalized to a common range between 0 and 1 using min–max scaling:
$$X_{\mathrm{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
where $X$ is the original pixel value, $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the layer, and normalized values were clipped to the [0, 1] range. The built-up result derived from the unsupervised classification was converted into a binary built-up mask, where a value of 1 indicates built-up and 0 indicates non-built. In the present study, Cluster 0 was identified as the built-up class based on inspection of cluster characteristics and visual comparison with high-resolution imagery.
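The HBDM was then computed as a weighted linear combination of the normalized GHSL and NDBI layers and the binary clustering mask:
$$\mathrm{HBDM} = w_{1}\,\mathrm{GHSL}_{\mathrm{norm}} + w_{2}\,\mathrm{NDBI}_{\mathrm{norm}} + w_{3}\,\mathrm{KM}_{\mathrm{mask}}$$
where $w_{1}$, $w_{2}$, and $w_{3}$ denote the weights assigned to the GHSL, NDBI, and clustering layers, respectively.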
The weighting structure was designed to reflect the comparative reliability observed in the study. GHSL was assigned the highest weight because it showed the strongest overall accuracy and the clearest temporal consistency, while NDBI and the clustering layer were assigned lower supporting weights because both were more affected by commission error under dryland conditions. In this sense, the hybrid layer is accuracy-informed rather than arbitrarily weighted.
The resulting HBDM surface is interpreted as a relative urban-intensity layer for visual comparison and diagnostic interpretation, especially for distinguishing persistent urban cores, transition zones, and areas of greater methodological uncertainty. It is therefore used as a supportive interpretive product rather than as an independent classification to be evaluated in the accuracy assessment.
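The construction of the hybrid layer can be sketched in a few lines of NumPy. The weight values are not given in the text, so the `(0.5, 0.25, 0.25)` default below is a purely hypothetical choice that only respects the stated ordering (GHSL highest). The requirement that weights sum to 1 is likewise an assumption of this sketch, made so that the output stays within [0, 1].

```python
import numpy as np

def min_max(layer):
    """Scale a layer to [0, 1]; a constant layer maps to all zeros."""
    layer = np.asarray(layer, dtype=float)
    lo, hi = np.nanmin(layer), np.nanmax(layer)
    if hi == lo:
        return np.zeros_like(layer)
    return np.clip((layer - lo) / (hi - lo), 0.0, 1.0)

def hbdm(ghsl, ndbi, cluster_mask, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of normalized GHSL, normalized NDBI, and a 0/1 mask.

    The default weights are hypothetical placeholders, not the study's values.
    """
    w_ghsl, w_ndbi, w_km = weights
    if abs(w_ghsl + w_ndbi + w_km - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1 in this sketch")
    return (w_ghsl * min_max(ghsl)
            + w_ndbi * min_max(ndbi)
            + w_km * np.asarray(cluster_mask, dtype=float))

# Toy 2 x 2 inputs already aligned to a common grid (values invented).
ghsl = [[0, 50], [100, 0]]        # e.g. built-up surface per pixel
ndbi = [[-0.2, 0.0], [0.2, 0.2]]
km = [[0, 1], [1, 1]]             # binary built-up cluster mask
surface = hbdm(ghsl, ndbi, km)
```

In the toy output, the bottom-left pixel scores 1.0 because all three inputs agree. The bottom-right pixel scores 0.5 because NDBI and the cluster mask flag it while GHSL does not, which is the kind of disagreement the diagnostic layer is meant to expose.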
4. Results
The three evaluated methods produced substantially different estimates of built-up area and contrasting spatial representations of urban growth in Diriyah. Table 2 summarizes the built-up area mapped by GHSL, NDBI, and unsupervised k-means clustering for 2015, 2020, and 2025, while Figure 4, Figure 5, Figure 6 and Figure 7 present the corresponding spatial outputs. Taken together, the results reveal clear methodological divergence not only in the amount of built-up land identified by each approach, but also in their ability to represent a plausible trajectory of urban growth in a dryland heritage landscape.
4.1. Comparative Built-Up Area Estimates and Temporal Change
Built-up area estimates varied markedly across methods. GHSL mapped 2.80 km² in 2015, 4.94 km² in 2020, and 5.31 km² in 2025, indicating a progressive and spatially plausible increase in urban extent over time. In contrast, NDBI mapped 36.28 km², 35.75 km², and 22.67 km² for the same years, while unsupervised clustering mapped 35.70 km², 32.91 km², and 32.05 km², respectively. The NDBI and clustering values are far larger than the GHSL estimates and do not reflect a realistic growth trajectory for Diriyah.
The contrast becomes even clearer when temporal change is considered explicitly. As shown in Table 2, GHSL indicates an increase of 2.14 km² between 2015 and 2020 and a further 0.37 km² between 2020 and 2025, corresponding to a total gain of 2.51 km² across the study period. By comparison, NDBI shows a slight decline of 0.53 km² between 2015 and 2020 and a much larger decline of 13.08 km² between 2020 and 2025, yielding a total change of −13.61 km². Unsupervised clustering also produces a negative trajectory, with −2.79 km² from 2015 to 2020 and −0.86 km² from 2020 to 2025, for a total change of −3.65 km².
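The change figures follow directly from the mapped areas reported in Section 4.1. The short script below reproduces that arithmetic; the variable and function names are illustrative only.

```python
# Built-up area (km^2) for 2015, 2020, 2025, as reported in the text.
areas_km2 = {
    "GHSL": (2.80, 4.94, 5.31),
    "NDBI": (36.28, 35.75, 22.67),
    "k-means": (35.70, 32.91, 32.05),
}

def area_changes(series):
    """Return (2015-2020, 2020-2025, 2015-2025) changes rounded to 0.01 km^2."""
    a2015, a2020, a2025 = series
    return (round(a2020 - a2015, 2),
            round(a2025 - a2020, 2),
            round(a2025 - a2015, 2))

changes = {method: area_changes(series) for method, series in areas_km2.items()}
for method, (d1, d2, total) in changes.items():
    print(f"{method:8s} 2015-20: {d1:+.2f}  2020-25: {d2:+.2f}  total: {total:+.2f}")
```

Only the GHSL series yields positive changes in both intervals. The other two methods yield net losses of built-up area.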
Given the documented transformation of Diriyah during this period, these negative trajectories should not be interpreted as evidence of actual urban contraction. Rather, they reflect methodological artefacts arising from over-classification and temporal instability. This comparison demonstrates that method choice affects not only the magnitude of mapped built-up area, but also the apparent direction and pace of change itself.
4.2. GHSL Results
Among the three approaches, GHSL provides the most spatially coherent and temporally credible representation of urban growth in Diriyah. Across all three years, the mapped built-up footprint remains relatively compact and expands gradually around the established urban core. In 2015, built-up pixels are concentrated mainly in the historic district and adjacent settled areas. By 2020, the mapped extent expands around the existing core, consistent with the early phases of recent development. In 2025, the GHSL-based footprint shows additional infill and limited outward growth toward newly developed zones.
Overall, GHSL portrays Diriyah as a relatively compact urban landscape undergoing steady expansion rather than diffuse sprawl. This pattern is consistent with the known development trajectory of the district and aligns more closely with the broader transformation associated with heritage-led redevelopment and Vision 2030.
4.3. NDBI Results
The NDBI-derived outputs differ sharply from the GHSL results and indicate substantial overestimation of built-up extent in all years. In 2015 and 2020, the NDBI maps classify extensive bright surfaces surrounding Diriyah as built-up, including rocky terrain, bare wadi floors, and highly reflective barren land. As a result, the mapped urban extent is inflated roughly 13-fold in 2015 (36.28 km² versus 2.80 km²) and roughly 7-fold in 2020 (35.75 km² versus 4.94 km²) relative to GHSL.
The limited change between 2015 and 2020 suggests that NDBI had already saturated much of the scene with built-up labels at the beginning of the analysis period. Although the 2025 NDBI result appears somewhat more constrained, with greater concentration near the urban core, it still substantially overestimates the true built-up footprint. These results indicate that NDBI is highly sensitive to bright-soil conditions and therefore struggles to distinguish real urban growth from non-urban high-reflectance surfaces in Diriyah.
In practical terms, NDBI captures built-up features, but under these dryland conditions it also introduces widespread commission error. This makes it unsuitable, on its own, for reliable estimation of urban extent in the study area.
4.4. Unsupervised k-Means Clustering Results
The unsupervised k-means clustering approach also produces built-up area estimates that are much larger than those obtained from GHSL and broadly comparable to those derived from NDBI. In 2015, the selected built-up cluster extends across large parts of the study area, including many bright bare-soil surfaces surrounding Diriyah. In 2020 and 2025, the mapped built-up cluster becomes somewhat more compact, with some peripheral areas reassigned to other spectral classes, but it still covers a much broader area than the actual urban footprint.
The slight decline in mapped built-up area between 2015 and 2025 is counterintuitive and should not be interpreted as true urban shrinkage. Instead, it reflects year-to-year redistribution of spectral classes within a method that remains sensitive to weak separability between impervious surfaces and highly reflective desert soils. Compared with NDBI, clustering produces somewhat more contiguous spatial patterns, but it remains affected by the same underlying problem of spectral confusion.
In practical terms, the clustering approach identifies many actual urban pixels, yet it also misclassifies extensive non-urban surfaces as built-up. This limits its usefulness as a standalone basis for reliable built-up statistics in Diriyah.
4.5. Accuracy Assessment
The quantitative accuracy assessment based on 150 stratified random reference points confirms the superiority of GHSL over the other two methods. GHSL achieved the highest overall accuracy (0.88) and Kappa coefficient (0.83), together with strong user’s accuracy (0.91) and producer’s accuracy (0.86) for the built-up class. In contrast, NDBI produced substantially weaker results, with an overall accuracy of 0.53, Kappa of 0.41, user’s accuracy of 0.49, and producer’s accuracy of 0.71. Unsupervised clustering performed slightly better than NDBI but remained clearly below GHSL, with an overall accuracy of 0.61, Kappa of 0.50, user’s accuracy of 0.57, and producer’s accuracy of 0.79 (Table 3).
These differences are too large to be treated as minor variation among otherwise similar methods. Instead, they indicate clear differences in the ability of the three approaches to represent built-up land under dryland conditions. The contrast between user’s and producer’s accuracy is also revealing. Both NDBI and clustering show higher producer’s accuracy than user’s accuracy, meaning that they capture many true built-up pixels but do so at the cost of substantial commission error. This pattern is consistent with the visual results, where non-urban bright surfaces are frequently misclassified as built-up.
GHSL, in contrast, shows a more balanced performance across all metrics and substantially fewer false positives. Quantitatively, GHSL exceeds NDBI by 0.35 in overall accuracy and 0.42 in Kappa, and exceeds unsupervised clustering by 0.27 in overall accuracy and 0.33 in Kappa. These margins confirm that GHSL provides the most reliable built-up mapping approach among the methods tested in Diriyah.
4.6. HBDM Results and Interpretive Role
The Hybrid Built-up Detection Model (HBDM) was used as a continuous urban intensity layer to synthesize the three mapping outputs and to visualize zones of convergence and uncertainty rather than to produce a separate binary classification. The HBDM maps highlight a compact but gradually expanding urban core around historic Diriyah. In 2015, high values are concentrated mainly in the historic core and adjacent settled areas. By 2020 and 2025, these zones extend outward toward newly developed corridors and project areas.
Compared with GHSL, HBDM provides a smoother gradation between clearly built-up areas and transitional zones. Compared with NDBI and clustering, it reduces the visual dominance of bright-soil overestimation by anchoring the interpretation more strongly to the more reliable GHSL baseline. In this sense, HBDM is most useful as a diagnostic layer that helps distinguish persistent urban cores from uncertain edges and areas of methodological disagreement.
Its value therefore lies in interpretation rather than validation. HBDM does not replace the formal accuracy assessment, but it adds an integrative perspective that is particularly useful in a dryland environment where the boundary between built-up, transitional, and reflective non-urban surfaces is often difficult to represent with a single method alone.
In the HBDM maps, higher values indicate stronger agreement among methods and more persistent built-up intensity, while intermediate values highlight transition zones and areas of greater methodological uncertainty.
5. Discussion
This study shows that estimates of urban expansion in Diriyah vary substantially according to the mapping approach used, and that these differences are large enough to influence both spatial interpretation and planning relevance. Although GHSL, NDBI, and unsupervised k-means clustering were all applied to identify built-up land, they did not produce minor variations around a shared urban pattern. Instead, they generated markedly different representations of the scale, configuration, and temporal direction of urban growth. This finding confirms that method selection in dryland urban mapping is not a routine technical step, but a substantive analytical decision with direct consequences for how urban change is understood and communicated.
5.1. Why GHSL Provides the Most Reliable Baseline in Diriyah
Among the evaluated methods, GHSL provides the most defensible baseline for monitoring urban expansion in Diriyah. Its outputs are spatially compact, temporally coherent, and strongly supported by the accuracy assessment. In a heritage-sensitive district such as Diriyah, a conservative tendency is methodologically preferable to widespread commission error, because overestimating built-up land can distort the perceived scale, location, and pace of transformation. The GHSL results indicate that Diriyah has undergone clear but spatially concentrated growth, with expansion occurring mainly through infill and contiguous outward development rather than diffuse, landscape-wide sprawl.
This interpretation is important in planning terms. In rapidly changing desert environments, a method that slightly underestimates fringe growth may still be more useful than one that systematically exaggerates urban extent across large non-urban areas. In this case, GHSL offers a more credible balance between sensitivity and reliability, making it particularly suitable as a monitoring baseline for the urban core and its immediate expansion zones.
5.2. Why NDBI and Unsupervised Clustering Overestimate Built-Up Extent
The weaker performance of NDBI and unsupervised clustering is closely related to the environmental characteristics of Diriyah. In high-reflectance dryland settings, bare soils, rocky surfaces, disturbed ground, and construction-related land cover may exhibit spectral responses similar to impervious materials. Under such conditions, methods that rely primarily on spectral contrast are especially vulnerable to commission error. NDBI is particularly sensitive to scene brightness and can classify extensive non-urban surfaces as built-up when threshold separation is weak. Unsupervised clustering reduces some of the dispersion visible in NDBI by producing broader and more contiguous spatial regions, but it remains governed by the same limitation of weak spectral separability.
A key implication is that both methods tend to reach a form of early saturation. Once large portions of the scene are already classified as built-up, the ability of the method to capture additional real growth declines sharply. This helps explain why NDBI and clustering produced unrealistic or even negative temporal trajectories despite the well-documented transformation of Diriyah over the study period. The problem is therefore not simply one of overestimation in individual years, but of reduced temporal sensitivity in environments where built-up and non-built surfaces are spectrally difficult to separate.
5.3. The Role and Limits of HBDM
A key contribution of this study is the introduction of the Hybrid Built-up Detection Model (HBDM) as an interpretive support layer. Its value lies not in replacing accuracy-tested classification, but in integrating the outputs of three methodological families into a continuous urban intensity surface. Used in this way, HBDM helps identify where the methods converge on plausible built-up areas and where they diverge because of uncertainty, mixed land cover, or likely misclassification. This is particularly useful in a dryland heritage setting, where the transition between established urban fabric, active development, exposed soil, and intermediate land surfaces is often difficult to represent using a single binary method.
At the same time, HBDM should not be interpreted as a substitute for formal validation. Because it was not assessed as an independent classifier, its role in this study remains explicitly diagnostic and supportive. This distinction is methodologically important. It preserves the integrity of the accuracy assessment while allowing the hybrid layer to contribute additional spatial insight, particularly in highlighting persistent urban cores, transitional zones, and areas of methodological disagreement.
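To make the idea of a continuous intensity surface concrete, the sketch below combines three binary built-up masks into a weighted score between 0 and 1, together with a simple agreement count. It is not the HBDM formulation used in this study: the function name `hybrid_intensity`, the weights, and the small example masks are all hypothetical and serve only to show how convergence and disagreement between methods could be expressed in a single layer.

```python
import numpy as np

def hybrid_intensity(ghsl, ndbi_mask, cluster_mask, weights=(0.5, 0.25, 0.25)):
    """Weighted mean of three binary built-up masks, scaled to [0, 1].

    The default weights are hypothetical and not taken from the study.
    """
    layers = np.stack(
        [np.asarray(m, dtype=float) for m in (ghsl, ndbi_mask, cluster_mask)]
    )
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the output stays within [0, 1]
    return np.tensordot(w, layers, axes=1)

# Hypothetical 2x2 masks (1 = built-up, 0 = not built-up).
ghsl = np.array([[1, 0], [0, 0]])
ndbi_mask = np.array([[1, 1], [1, 0]])
cluster_mask = np.array([[1, 1], [0, 0]])

intensity = hybrid_intensity(ghsl, ndbi_mask, cluster_mask)
agreement = ghsl + ndbi_mask + cluster_mask  # number of methods flagging each pixel

print(intensity)
print(agreement)
```

In this toy example, a pixel flagged by all three methods receives the maximum score, whereas pixels flagged only by the spectrally driven methods receive intermediate values. Intermediate values of this kind correspond to the zones of methodological disagreement that a diagnostic layer is meant to expose, not to validated class labels.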
5.4. Planning, Sustainability, and Heritage Implications
The findings have direct implications for urban monitoring and heritage-sensitive planning in Diriyah and similar desert cities. If built-up land is substantially overestimated, development pressure on heritage assets, landscape corridors, tourism areas, and infrastructure networks may also be overstated. Conversely, if emerging fringe development is detected too coarsely or too late, early signals of spatial pressure may be overlooked. In this context, the results suggest that a conservative but reliable baseline such as GHSL is preferable for core monitoring, especially when complemented by high-resolution visual checks and local contextual interpretation.
More broadly, the study supports a planning approach in which methodological caution is treated as part of sustainable land management. In places undergoing rapid transformation under Saudi Vision 2030, urban monitoring is not merely a technical exercise; it is part of a wider system of heritage protection, development control, and evidence-based spatial decision-making. A workflow that combines a stable global built-up product with locally informed interpretive tools may therefore offer a more useful planning basis than reliance on index-based or unsupervised outputs alone.
5.5. Limitations and Future Research
Several limitations should be acknowledged. First, the analysis evaluates three practical and widely used approaches rather than the full range of available supervised, object-based, or deep learning methods. This was intentional, as the study was designed to compare accessible and operationally distinct approaches under the same dryland conditions. Future research could extend this comparison by incorporating supervised classifiers such as Random Forest or other multi-feature approaches.
Second, the 2025 GHSL-based layer is a derived extension rather than an official GHSL release, and should therefore be interpreted as a conservative approximation of recent built-up expansion. Third, HBDM was used as a diagnostic layer and was not validated as a separate classifier. Future research could test alternative weighting strategies, undertake sensitivity analysis, and explore multi-source hybrid models that incorporate thermal, radar, texture, or topographic information. Further work could also investigate whether morphology-sensitive measures, including fragmentation or urban fabric typologies, improve the interpretation of growth patterns in rapidly transforming heritage districts.
6. Conclusions
This study compared three remote-sensing approaches (GHSL, NDBI, and unsupervised k-means clustering) for mapping urban expansion in Diriyah, Saudi Arabia, between 2015 and 2025. The results show that method choice strongly influences both the magnitude and the apparent direction of urban change in dryland environments. GHSL produced the most spatially coherent and temporally plausible estimates, whereas NDBI and unsupervised clustering substantially overestimated built-up extent because of spectral confusion with bright bare soils and related high-reflectance surfaces.
The accuracy assessment confirms that GHSL provides the most reliable baseline among the tested approaches, with clearly higher overall accuracy and Kappa values than the other methods. Although NDBI and clustering identified many true built-up pixels, their large commission errors reduce their suitability for standalone use in a heritage-sensitive desert landscape. HBDM added value as an integrative diagnostic layer by highlighting persistent urban cores, uncertain edges, and areas of methodological disagreement, but it should be interpreted as a support surface rather than as a replacement for formal classification and validation.
Overall, the methodological lessons drawn from Diriyah extend beyond the study area itself. In arid and semi-arid cities undergoing rapid transformation, conservative global built-up products may provide a stronger monitoring baseline than index-based or unsupervised methods when surface reflectance conditions are highly ambiguous. When combined with local visual interpretation and planning context, such approaches can support more credible urban monitoring, more cautious land-management decisions, and more informed heritage-sensitive planning under large-scale development agendas such as Saudi Vision 2030.