Impact of Impervious Surface Expansion on Urban Thermal Environment Across Tropical Southeast Asian Megacities: Reliable Assessment Through Foundation Model Embeddings

Moukomla, Sitthisak; Meeprom, Phurith; Intarat, Kritchayan

doi:10.3390/earth7030076


by Sitthisak Moukomla ¹,²,*, Phurith Meeprom ³ and Kritchayan Intarat ¹,⁴

¹ Department of Geography, Faculty of Liberal Arts, Thammasat University, Pathumthani 12121, Thailand

² Research Unit in Geospatial Research and Analytics for Climate and Environment (GRACE Lab), Faculty of Liberal Arts, Thammasat University, Pathumthani 12121, Thailand

³ Department of Geoinformatics, Faculty of Humanities and Social Sciences, Burapha University, Chonburi 20130, Thailand

⁴ Research Unit in Geospatial Applications (Capybara Geo Lab), Faculty of Liberal Arts, Thammasat University, Pathumthani 12121, Thailand

* Author to whom correspondence should be addressed.

Earth 2026, 7(3), 76; https://doi.org/10.3390/earth7030076

Submission received: 3 April 2026 / Revised: 5 May 2026 / Accepted: 7 May 2026 / Published: 8 May 2026

(This article belongs to the Special Issue Climate-Sensitive Urban Design for Heatwave Mitigation)


Abstract

Rapid urbanization in tropical Southeast Asia is transforming pervious land into impervious surfaces, intensifying the surface urban heat island (SUHI) effect and increasing the need for consistent urban thermal monitoring. This study assesses how impervious surface area (ISA) expansion relates to the urban thermal environment across five tropical megacities (Bangkok, Jakarta, Manila, Kuala Lumpur, and Ho Chi Minh City). AlphaEarth geospatial foundation model embeddings were used to reduce observation gaps caused by persistent cloud-cover, while MODIS land surface temperature (LST) was used to quantify the thermal response. We compared AlphaEarth classification against conventional Sentinel-2/NDVI approaches and an additional fairer annual Sentinel-2 full-band-plus-index Random Forest baseline, quantified ISA expansion for 2017–2024, and related ISA fraction to dry-season LST at 1 km resolution. Repeated random-holdout tests based on Google Earth Engine samples showed AlphaEarth mean IoU = 0.866 (95% CI: 0.857–0.875), compared with 0.758 (0.749–0.767) for the annual Sentinel-2 full-band-plus-index baseline and 0.686 (0.674–0.698) for the best single-date 5-index baseline. Spatial-block holdout tests gave similar but slightly lower values (AlphaEarth IoU = 0.859; annual Sentinel-2 baseline = 0.747; best single-date baseline = 0.673). Ho Chi Minh City experienced the fastest ISA expansion (+11.0 percentage points; slope = 1.48 pp yr⁻¹, 95% CI: 1.06–1.91), whereas Bangkok reached the highest ISA fraction (65.1%). ISA fraction and LST were consistently and positively associated across cities and years (Pearson r = 0.748–0.900), and mean SUHI intensity during 2017–2024 ranged from 4.01 °C in Bangkok to 8.51 °C in Manila. 
These results indicate that foundation model embeddings can support cloud-resilient mapping of impervious surface change and thereby improve assessment of tropical urban thermal environments, while also highlighting the need for independent ground-truth validation.

Keywords:

impervious surface expansion; urban thermal environment; Southeast Asian megacities; foundation model embeddings; AlphaEarth

1. Introduction

Southeast Asia is the fastest-urbanizing region on Earth, with an urban population expected to exceed 400 million by 2030. This expansion replaces vegetated land with impervious materials such as concrete, asphalt, and rooftops at an unprecedented rate, which directly alters urban thermal landscapes via the urban heat island (UHI) effect, increases surface runoff and flood risk, and reduces the potential for carbon sequestration. Despite these environmental impacts, accurate monitoring of impervious surface change across tropical Southeast Asian megacities is fundamentally limited by persistent cloud-cover, which obscures 50–70% of optical satellite observations in any given year. This persistent cloud contamination renders traditional single-date remote sensing approaches operationally unreliable for consistent urban environmental assessment in the region.

Impervious surface mapping tracks urban development and land cover changes over time [1]. Such data support urban planning by providing precise land cover assessments that inform sustainable development and infrastructure management [2]. Yet conventional methods frequently achieve only 60–80% classification accuracy, a level insufficient for operational decision-making [3]. In addition, climate adaptation plans rely on regular access to robust data on key urban issues such as urban heat islands [4], flood risk assessments [5], and green space distribution [6], which is virtually impossible with intermittent and irregular satellite coverage. Monitoring informal settlements [7], which is important for supporting equitable urban development, becomes particularly difficult when cloud-induced gaps create systematic observation biases in already underserved areas [8,9]. Urban thermal environments are further shaped by local climate zones and green space configurations, while suburban climate vulnerabilities and extreme urban heat events underscore the urgency of continuous monitoring. Since the Landsat era, impervious surface mapping has relied on single-date satellite images and has required cloud-free imagery within specific time windows [10,11].

In tropical regions, this requirement is particularly challenging to satisfy [12]. Common strategies to mitigate cloud-related data loss include: (i) waiting for cloud-free windows, which creates temporal gaps in monitoring; (ii) manually interpreting partially clouded images, which is labor-intensive and error-prone; and (iii) discarding contaminated scenes entirely, which reduces temporal coverage [13]. Existing operational systems typically apply cloud thresholds of <10–20% to ensure image quality, but this creates substantial temporal gaps [14,15,16]. Multi-date compositing can minimize cloud contamination but cannot eliminate it entirely [17].

Traditional classification algorithms adapted to tropical contexts suffer from inadequate calibration and incorrectly assume minimal vegetation within impervious surfaces [18,19], an assumption that is inaccurate for cities with tree-lined streets, gardens, and urban parks [20]. Through large-scale pre-training on Earth observation data, foundation models have significantly advanced remote sensing by enabling transferable semantic understanding of land surface properties. Unlike pixel-wise methods, foundation models can generalize to new sites and conditions by learning transferable representations from large-scale data [21]. AlphaEarth, pre-trained on satellite imagery, produces 64-dimensional embeddings from annual multi-temporal Sentinel-2 composites. These embeddings incorporate cloud masking, temporal gap filling, and multi-temporal compositing as built-in pre-processing steps [22,23], providing complete annual coverage regardless of cloud conditions [24]. However, the operational performance of such foundation model embeddings for specific classification tasks in tropical regions has not been systematically evaluated [25]. Key questions remain: How do these embeddings compare quantitatively against traditional classification approaches? Does performance vary with local cloud conditions? Can models trained on one set of cities transfer to unseen cities? And what is the actual operational feasibility of each approach in data-limited tropical contexts?

We evaluate how impervious surface expansion influences the urban thermal environment in five tropical Southeast Asian megacities, using geospatial foundation model embeddings as a supporting tool for cloud-resilient ISA estimation. The study investigates four research questions: (1) How much did ISA expand across the five cities during 2017–2024? (2) How strongly is ISA fraction associated with MODIS-derived dry-season LST and SUHI intensity? (3) How sensitive are ISA estimates to classification methodology under tropical cloud conditions? (4) Can AlphaEarth embeddings provide sufficiently consistent ISA estimates for urban thermal assessment when conventional single-date approaches are data-limited? The main contributions are: (1) quantification of city-specific ISA expansion and associated SUHI patterns, (2) evidence that methodological disagreement in ISA mapping can propagate into urban environmental interpretation, (3) a systematic comparison showing that multi-temporal foundation model embeddings improve operational completeness in cloud-persistent tropical regions, and (4) explicit discussion of reference-data uncertainty, shared Sentinel-2 lineage, and the need for independent validation.

2. Materials and Methods

2.1. Study Area

We selected five Southeast Asian megacities with varying urban patterns, development stages, and climatic conditions (Table 1, Figure 1). These cities span more than 20 degrees of latitude (6.2° S–14.6° N), have populations from 1.8 to 13.9 million, and experience cloud-cover between 50 and 71%, providing a wide range of conditions for evaluating model performance and cloud resilience.

Bangkok, Thailand (13.7° N, 100.5° E) is a megacity of 10.5 million people characterized by mixed formal–informal land uses, vast canal networks, and swift peri-urban growth. The city is distinguished by low-rise buildings with extensive vegetation coverage and intricate impervious–pervious mixing. Annual cloud-cover is 50–51%, despite the monsoon season (May–October) [26,27].

Jakarta, Indonesia (6.2° S, 106.8° E) is a typical densely populated coastal megacity with a population of 10.6 million and mature high-density development, with little room for further horizontal expansion. The cityscape is dominated by fragmented structures and rising vertical towers. Located near the equator, Jakarta sees 54–63% cloud-cover with little seasonal variation [28].

Manila, Philippines (14.6° N, 121.0° E) is a very crowded capital region of 13.9 million residents that combines developed areas with vast informal settlements. Manila has the greatest population density among the study cities (22,000 per km²) and is affected by typhoons and monsoons, with annual cloud-cover ranging from 53 to 59% [29].

Kuala Lumpur, Malaysia (3.1° N, 101.7° E) is a planned city of 1.8 million with engineered growth features and demarcated urban–forest interfaces. The city epitomizes planned tropical development but also has the highest cloud-cover of all study cities (67–71%) as a result of year-round afternoon convective storms [30].

Ho Chi Minh City, Vietnam (10.8° N, 106.7° E) has a population of 9.0 million and is undergoing a transformation from agricultural frontier into a developed center. It has the fastest growth rate of the cities in our study, with extensive impervious–pervious mixing and 55–61% annual cloud-cover [31].

2.2. Administrative Boundaries

We used FAO GAUL (Global Administrative Unit Layers) 2015 Level 1 vector boundaries, which consist of standardized WHO/UN-based international administrative units. These official demarcations support reproducibility and allow future comparison with other cities in multi-city studies. City boundaries were selected in Google Earth Engine [32] by filtering on the ADM1_NAME property: Bangkok (“Bangkok”), Jakarta (“Dki Jakarta”), Manila (“National Capital region (NCR)”), Kuala Lumpur (“Kuala Lumpur”), and Ho Chi Minh City (“Ho Chi Minh City”). All boundaries were obtained from Google Earth Engine’s public data catalog through the feature collection path ‘FAO/GAUL/2015/level1’, so the delineation of the study area is completely reproducible.

2.3. Satellite Data and Cloud Statistics

2.3.1. Sentinel-2 for Cloud Analysis and Single-Date Methods

Cloud statistics and single-date methods were based on the Sentinel-2 Level-2A (Bottom of Atmosphere) harmonized collection (COPERNICUS/S2_SR_HARMONIZED), using 2020 and 2023 for cloud measurements and only 2023 for the ablation. For each city–year pair, we recorded: (1) the total number of available images, (2) the number of images meeting the operational threshold (<20% cloud-cover), (3) the mean cloud-cover percentage over all available images, (4) the minimum cloud-cover percentage, and (5) a random selection of n elements from the image list, drawn with replacement. Sentinel-2’s scene-level cloud metadata (CLOUDY_PIXEL_PERCENTAGE) was used to quantify the availability of operational data under standard single-date approaches.

2.3.2. AlphaEarth Embeddings

AlphaEarth Satellite Embedding collection (GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL) was used as the foundation model input, providing 64-dimensional annual embeddings at 10 m resolution. Each image contains bands A00–A63. According to the Google Earth Engine data catalog, the collection provides annual embeddings for 2017–2024; therefore, ISA change analysis in this manuscript is limited to 2017–2024, while MODIS LST analysis can extend to 2025, where thermal observations are available. These embeddings encode semantic knowledge of surface properties through multi-temporal and multi-source synthesis. Cloud handling is a built-in feature of the embedding generation pipeline, not a contribution of the present study.

The exact Google Earth Engine asset paths used in this study are: AlphaEarth—GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL (64-band annual embedding image, A00–A63, 10 m); Sentinel-2—COPERNICUS/S2_SR_HARMONIZED (Level-2A surface reflectance) filtered with ee.Filter.lt(‘CLOUDY_PIXEL_PERCENTAGE’, 80); Dynamic World—GOOGLE/DYNAMICWORLD/V1 (“built” probability band); MODIS LST—MODIS/061/MOD11A2 (8-day composite, day-time band LST_Day_1km); JRC GHSL—JRC/GHSL/P2023A/GHS_BUILT_S/2018; city boundaries—FAO/GAUL/2015/level1. The “middle-index” Sentinel-2 selection used for Method 3 is implemented as: imgList = collection.toList(collection.size()); img = ee.Image(imgList.get(imgList.size().divide(2).toInt())). All Random Forest classifications use n_trees = 100, max_depth = 15, min_samples_leaf = 5, and a fixed seed of 42 for the random sampler. All analysis code is provided as Supplementary Material with this submission: Code S1 (Earth Engine notebook for the four-method ablation, ISA statistics, and LST × ISA temporal analysis), Code S2 (Earth Engine sampling for the repeated-holdout robustness and mixed-pixel LST analyses), and Code S3 (local statistical analyses including Fisher-z and Theil–Sen trend CIs, Monte-Carlo SUHI uncertainty propagation, Wilson OA CIs, and stratified Pearson). The Earth Engine project used throughout is ee-pythoncolab-418913. All quantitative claims in this manuscript can be reproduced from the data tables (Tables S1–S11, Data S1) provided in the same Supplementary Material package. The complete code and data are also openly available on GitHub at https://github.com/SitthisakMoukomla/tropical-urban-AlphaEarth (accessed on 6 May 2026) and archived on Zenodo (DOI: 10.5281/zenodo.19945781, https://doi.org/10.5281/zenodo.19945781).

The embeddings are generated by the AlphaEarth Foundations processing pipeline using multi-temporal Earth observation inputs and learned 64-dimensional feature representations. Unlike spectral bands, embedding dimensions are not directly interpretable physical measurements; they are learned axes that summarize temporal trajectories and surface context. Consequently, comparisons with low-dimensional single-date spectral-index baselines should be interpreted as comparisons between operational workflows, not as a controlled test of dimensionality alone.

2.3.3. Dynamic World as Reference Dataset

Reference labels for impervious surface extent were derived from Dynamic World V1 (GOOGLE/DYNAMICWORLD/V1), a global land cover product that delivers near-real-time classification at 10 m resolution and updates daily [33]. We adopted the standard 0.5 threshold on the Dynamic World ‘built’ probability layer to separate urban from non-urban pixels:

y = \begin{cases} 1 & \text{if } P(\text{built}) > 0.5 \\ 0 & \text{otherwise} \end{cases}

(1)

where y represents the reference label and P(built) is the Dynamic World ‘built’ (impervious) probability score.

Dynamic World is itself a deep learning-derived product, not field-verified ground-truth. We therefore use Dynamic World as a reference product for inter-product comparison rather than as an absolute validation dataset. Because both Dynamic World and AlphaEarth draw on Sentinel-2 information, their agreement may be inflated by shared data lineage. IoU, F1, and accuracy values are reported as concordant with Dynamic World labels to reduce over-interpretation. We also include an independent spatial consistency check against JRC GHSL Built-up Surface and discuss remaining validation uncertainty in the limitations.

3. Methods

3.1. Cloud-Cover Analysis

We quantified cloud-cover characteristics using Sentinel-2 metadata for each city from 2020 and 2023 to detail operational challenges. Mean cloud-cover was calculated as:

C_{mean} = \frac{1}{N} \sum_{i = 1}^{N} c_{i}

(2)

where C_mean is mean cloud-cover percentage, N is total images available, and c_i is cloud percentage of image i (from CLOUDY_PIXEL_PERCENTAGE property).

Operational data availability under traditional single-date approaches was calculated as:

A = \frac{N_{<20\%}}{N} \times 100\%

(3)

where N_{<20%} represents the number of images with cloud-cover below the operational threshold (20%), following standard practice in operational remote sensing.
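As an illustration of Equations (2) and (3), the following minimal Python sketch computes both statistics from a list of scene-level cloud percentages. The function name `cloud_statistics` and the example values are hypothetical and are not taken from the study's supplementary code.

```python
from statistics import mean


def cloud_statistics(cloud_percentages, threshold=20.0):
    """Return (mean cloud-cover %, usable-image rate %) for one city-year.

    cloud_percentages: one CLOUDY_PIXEL_PERCENTAGE value per image.
    threshold: operational cloud threshold in percent (20% in the text).
    """
    if not cloud_percentages:
        raise ValueError("at least one image is required")
    c_mean = mean(cloud_percentages)                            # Eq. (2)
    n_usable = sum(1 for c in cloud_percentages if c < threshold)
    availability = 100.0 * n_usable / len(cloud_percentages)    # Eq. (3)
    return c_mean, availability


# Hypothetical cloud percentages for five images (illustration only).
example = [5.0, 15.0, 40.0, 60.0, 80.0]
c_mean, availability = cloud_statistics(example)
print(f"mean cloud-cover = {c_mean:.1f}%, usable images = {availability:.1f}%")
```

The strict inequality (`c < threshold`) mirrors the "<20%" wording of the operational threshold.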

3.2. Sample Extraction Protocol

For each city–year combination, we extracted samples using the Google Earth Engine sample() function with consistent parameters: (1) 1000–2000 records per city–year for the original ablation study and main analysis; (2) city boundaries defined by FAO GAUL Level 1 administrative units; (3) a spatial scale of 10 m to match native Sentinel-2 and AlphaEarth resolution; (4) non-stratified uniform sampling to preserve natural class distributions; (5) a fixed random seed of 42 for reproducibility in the original point estimates; and (6) an optimized tile scale of 4 for memory management. To reduce dependence on a single fixed seed, we added a repeated-split robustness analysis using Google Earth Engine feature samples and local Random Forest training. For each city and method, 2500 GEE sample points were extracted where possible, and 20 repeated stratified random holdouts plus 20 spatial-block holdouts (approximately 0.05° blocks) were evaluated. This additional analysis reports mean performance, standard deviation, and 95% confidence intervals rather than relying on a single fixed-seed estimate.

3.3. Classification Approaches: Ablation Study Design

3.3.1. Method 1: AlphaEarth Multi-Temporal Foundation Model (Proposed)

We used AlphaEarth 64-dimensional embeddings derived from annual Sentinel-2 composites as input features. The full 64-D embedding space (A00–A63) encodes impervious surface semantics learned through multi-temporal synthesis, where each dimension corresponds to features learned during pre-training on diverse global landscapes. For classification, we employed Random Forest with 100 trees and a maximum depth of 20, following Breiman (2001) [34]. We selected this architecture for its computational efficiency and interpretability through feature importance analysis. Data were split into 70% training and 30% testing using stratified partitioning, with all 64 embedding dimensions as features and binary labels (impervious = 1, pervious = 0). Temporal integration was handled inherently through the annual compositing process used to produce the embeddings, eliminating the need for manual cloud masking. This represents the proposed foundation model approach combining semantic and temporal information.

To clarify how each year is classified from the 64-D embedding space, the procedure is summarized in five steps. (1) For each target year y in {2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024} the AlphaEarth annual embedding image for year y is loaded from GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL with bands A00–A63 at 10 m.

(2) Training pixels are sampled from the same year-y Dynamic World built-probability layer using a fixed seed of 42; the original ablation study used 1000–2000 records per city–year.

(3) A Random Forest classifier (n_trees = 100, max_depth = 15, min_samples_leaf = 5) is trained on the raw 64-D feature vector per pixel. The classifier operates on the unprojected embedding because each AlphaEarth dimension carries an independent pre-trained semantic signal whose collective effect is the cloud-robust multi-temporal context that the foundation model provides.

(4) The trained classifier is applied to the full 64-D embedding image of year y, producing a binary 10 m impervious-surface map for that city–year pair.

(5) Validation pixels (a separate stratified hold-out) are scored with IoU, F1 and Overall Accuracy. The same procedure is repeated independently for every year, which is what enables the year-by-year ISA estimates reported in Section 4.6. The robustness of this protocol is further evaluated through repeated random and spatial-block holdouts in Section 3.3.5.

3.3.2. Method 2: Best Single-Date Sentinel-2 (Traditional Best-Case)

This method uses the single least-cloudy Sentinel-2 image from the target year (2023), selected by sorting images by CLOUDY_PIXEL_PERCENTAGE and taking the first result. Five spectral indices were computed from this best available image: Normalized Difference Vegetation Index (NDVI), Normalized Difference Built-up Index (NDBI), Normalized Difference Water Index (NDWI), Built-up Area Index (BAI), and Brightness:

NDVI = \frac{B8 - B4}{B8 + B4}

(4)

NDBI = \frac{B11 - B8}{B11 + B8}

(5)

NDWI = \frac{B3 - B8}{B3 + B8}

(6)

BAI = \frac{B4 - B12}{B4 + B12}

(7)

Brightness = \frac{B2 + B3 + B4}{3}

(8)

where B2–B4 are visible bands (Blue, Green, Red), B8 is Near-Infrared (NIR), B11 is Shortwave Infrared 1 (SWIR1), and B12 is SWIR2.
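The five indices in Equations (4)–(8) can be sketched for a single pixel as follows. This is a minimal illustration: the function name `spectral_indices`, the zero-denominator guard, and the reflectance values are assumptions made here, not part of the study's Earth Engine code.

```python
def spectral_indices(b2, b3, b4, b8, b11, b12):
    """Compute the five features of Eqs. (4)-(8) from surface reflectance."""

    def normalized_difference(a, b):
        total = a + b
        # Guard against a zero denominator (assumption: return 0.0 there).
        return (a - b) / total if total != 0 else 0.0

    return {
        "NDVI": normalized_difference(b8, b4),     # Eq. (4)
        "NDBI": normalized_difference(b11, b8),    # Eq. (5)
        "NDWI": normalized_difference(b3, b8),     # Eq. (6)
        "BAI": normalized_difference(b4, b12),     # Eq. (7)
        "Brightness": (b2 + b3 + b4) / 3.0,        # Eq. (8)
    }


# Hypothetical reflectances for one vegetated pixel (illustration only).
features = spectral_indices(b2=0.1, b3=0.1, b4=0.1, b8=0.3, b11=0.2, b12=0.1)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```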

Classification used a Random Forest with 50 trees and a maximum depth of 15, trained on the five indices. The training protocol was consistent with Method 1 (stratified 70/30 split, binary classification). Because this method relies on the single clearest image without any temporal processing, it corresponds to the traditional “wait for a clear day” strategy and tests the upper limit of single-date performance.

3.3.3. Method 3: Random Single-Date Sentinel-2 (Operational Reality)

This method uses a reproducible pseudo-random Sentinel-2 image from the target year (2023), selected after converting the image collection to a list and taking the middle-index image after date sorting. This choice was used only to create a repeatable single-date operational baseline; it should not be interpreted as a truly random draw from all possible acquisitions. Features and classifier parameters were identical to Method 2 (five spectral indices, Random Forest with 50 trees, maximum depth of 15). This approach simulates a constrained operational workflow where analysts may not have time to identify optimal cloud-free acquisition dates.
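The two single-date selection rules (least-cloudy image for Method 2, middle-index image after date sorting for Method 3) can be mimicked outside Earth Engine with plain Python lists. The sketch below is an analogue only: the dictionary keys `date` and `cloud` and the example records are hypothetical stand-ins for Sentinel-2 image metadata.

```python
def select_best(images):
    """Method 2 analogue: the image with the lowest cloud percentage."""
    return min(images, key=lambda im: im["cloud"])


def select_middle(images):
    """Method 3 analogue: sort by date, then take the middle index.

    Integer division mirrors size.divide(2).toInt() in the text.
    """
    ordered = sorted(images, key=lambda im: im["date"])
    return ordered[len(ordered) // 2]


# Hypothetical image metadata for one city-year (illustration only).
images = [
    {"date": "2023-03-10", "cloud": 35.0},
    {"date": "2023-01-05", "cloud": 12.0},
    {"date": "2023-07-21", "cloud": 88.0},
    {"date": "2023-11-02", "cloud": 47.0},
    {"date": "2023-05-15", "cloud": 63.0},
]
print("best:", select_best(images)["date"])
print("middle:", select_middle(images)["date"])
```

ISO-formatted date strings sort chronologically, so no date parsing is needed in this sketch.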

3.3.4. Method 4: NDVI Threshold (Traditional Baseline)

A median composite of all Sentinel-2 images from the target year (2023) was used, as this is considered conventional best practice for computing vegetation indices. Classification was performed by direct thresholding, without machine learning:

Urban = \begin{cases} 1 & \text{if } NDVI < 0.2 \\ 0 & \text{otherwise} \end{cases}

(9)

The 0.2 threshold is commonly used as a simple operational baseline for distinguishing vegetated from built-up surfaces in index-based workflows. Median compositing of the annual image stack before computing NDVI minimizes cloudy pixels while maintaining vegetation characteristics. Because this approach does not rely on training data or machine learning, it serves as the baseline against which performance gains are measured.

3.3.5. Additional Fairer Sentinel-2 Annual Composite Baseline and Robustness Testing

To address the dimensionality concern that AlphaEarth uses 64 embedding dimensions while the single-date baselines use only five spectral indices, we added an annual Sentinel-2 Random Forest baseline using ten reflectance bands (B2, B3, B4, B5, B6, B7, B8, B8A, B11, and B12) plus five spectral indices (NDVI, NDBI, NDWI, BAI, and Brightness). This 15-feature baseline uses an annual median Sentinel-2 composite and therefore represents a stronger optical benchmark than the single-date 5-index workflow, although it still lacks the learned semantic representation of AlphaEarth embeddings.

For each of the three workflows above (AlphaEarth 64-D, annual Sentinel-2 15-feature, and best single-date 5-index), we drew 2500 stratified-random labeled pixels per city using the corresponding-year Dynamic World built layer as reference and ran two complementary held-out evaluations: (i) twenty repeated random stratified holdouts with a 70/30 split (varying both the GEE sample seed and the local train seed); and (ii) twenty spatial-block holdouts using 0.05° (~5 km) blocks to mitigate spatial autocorrelation between training and test pixels. The IoU mean and 95% confidence interval reported in Section 4.4 are computed across the 20 replicates of each split type. This protocol provides an honest sample-based robustness check; it is not a wall-to-wall production benchmark, and the GEE sample timing reported in Section 4.4.3 should be interpreted accordingly.
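One way to form spatial-block holdouts of the kind described above is to assign each sample point to a 0.05° grid cell and hold out whole cells. The sketch below is a hedged illustration, not the authors' Code S2: the function names, the 30% share of held-out blocks, and the synthetic coordinates are assumptions.

```python
import math
import random


def block_id(lon, lat, size_deg=0.05):
    """Index of the size_deg x size_deg grid cell containing a point."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def spatial_block_split(points, test_fraction=0.3, size_deg=0.05, seed=0):
    """Split point indices so that no block contributes to both sets.

    points: list of (lon, lat) tuples. Returns (train_idx, test_idx).
    test_fraction applies to blocks, not points (assumption).
    """
    blocks = sorted({block_id(lon, lat, size_deg) for lon, lat in points})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    n_test = max(1, round(test_fraction * len(blocks)))
    test_blocks = set(blocks[:n_test])
    train_idx, test_idx = [], []
    for i, (lon, lat) in enumerate(points):
        target = test_idx if block_id(lon, lat, size_deg) in test_blocks else train_idx
        target.append(i)
    return train_idx, test_idx


# Synthetic points on a regular grid (illustration only).
points = [(100.5 + 0.01 * i, 13.7 + 0.01 * j) for i in range(20) for j in range(20)]
for replicate in range(3):  # the text uses 20 replicates
    train_idx, test_idx = spatial_block_split(points, seed=replicate)
    print(replicate, len(train_idx), len(test_idx))
```

Varying `seed` across replicates changes which blocks are held out, which is the property the repeated spatial-block evaluation relies on.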

3.4. Performance Metrics

Model performance was evaluated using three complementary metrics.

Intersection over Union (IoU) served as the primary metric:

IoU = \frac{TP}{TP + FP + FN}

(10)

where TP = true positives (correctly classified impervious surfaces), FP = false positives (pervious surfaces misclassified as impervious), and FN = false negatives (impervious surfaces misclassified as pervious). IoU is stricter than accuracy, penalizing both commission and omission errors equally.

The F1 score provided the harmonic mean of precision and recall:

F_{1} = 2 \times \frac{precision \times recall}{precision + recall} = \frac{2 \times TP}{2 \times TP + FP + FN}

(11)

Overall Accuracy was calculated as:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(12)

where TN = true negatives (correctly classified pervious surfaces).

We selected IoU as the primary metric because of its sensitivity to both error types and its direct interpretability for impervious surface extent applications. In this study, IoU > 0.85 was treated as indicating operational quality.
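Equations (10)–(12) can be checked on a small example with the following minimal sketch. The helper names and the label vectors are hypothetical; returning 0.0 for an empty denominator is a convention chosen here, not one stated in the text.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels (1 = impervious)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def iou(tp, fp, fn):
    denom = tp + fp + fn
    return tp / denom if denom else 0.0           # Eq. (10)


def f1_score(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0       # Eq. (11)


def accuracy(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0    # Eq. (12)


# Hypothetical reference and predicted labels (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"IoU = {iou(tp, fp, fn):.3f}")
print(f"F1 = {f1_score(tp, fp, fn):.3f}")
print(f"Accuracy = {accuracy(tp, fp, fn, tn):.3f}")
```

In this example IoU (0.600) is lower than both F1 and accuracy (0.750), which illustrates why IoU is the stricter of the three metrics.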

Uncertainty propagation framework. We follow the Olofsson et al. (2014) [35] good-practice framework for accuracy assessment and area estimation in remote sensing of land change. Three components of uncertainty are propagated. (i) Reference-data uncertainty is bounded by the Wilson 95% confidence interval on overall agreement against the JRC GHSL Built-up Surface product (Section 4.8). (ii) Classifier uncertainty is propagated via 20 repeated random and 20 spatial-block holdouts (Section 3.3.5) with Fisher–z 95% confidence intervals on Pearson statistics. (iii) Land-surface-temperature retrieval uncertainty is propagated via a 1000-iteration Monte-Carlo simulation that perturbs each annual MOD11A2 LST observation with N(0, 1 K) noise consistent with the documented retrieval RMSE (Section 4.7). This framework is consistent with recent calls for more reliable, complete and equitable global urban land-use efficiency assessments [36].
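For reference, a Wilson score interval of the kind used for component (i) can be computed as in the sketch below. It uses the standard Wilson formula with only the Python standard library; the function name and the example counts are hypothetical and do not reproduce the study's Code S3.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (e.g., agreement)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 1.96 for 95%
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical agreement count: 850 of 1000 sample points (illustration only).
low, high = wilson_interval(850, 1000)
print(f"overall agreement 0.850, 95% CI: {low:.3f} to {high:.3f}")
```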

3.5. Cloud-Performance Correlation Analysis

We computed Pearson correlation between mean cloud-cover and IoU for each city–year observation to assess whether foundation model performance depends on cloud-cover:

r = \frac{\sum_{i = 1}^{n} (x_{i} - \overline{x}) (y_{i} - \overline{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \overline{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \overline{y})}^{2}}}

(13)

where x_{i} represents cloud-cover percentage and y_{i} represents IoU for observation i, with \overline{x} and \overline{y} being the respective means.

We evaluated statistical significance using p-values testing the null hypothesis H₀: r = 0 (no linear correlation) at a significance level of α = 0.05. A weak correlation (|r| < 0.3) combined with p > 0.05 would suggest no detectable dependence of performance on cloud conditions, thereby supporting operational viability in persistently cloudy environments.
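Equation (13) and an accompanying significance check can be sketched with the standard library alone. Note the assumption: instead of an exact t-based p-value, this sketch uses the Fisher z transformation to obtain an approximate confidence interval and a normal-approximation p-value, which may differ slightly from the test used in the study. All names and data values are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient, Eq. (13)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    return sxy / (sqrt(sxx) * sqrt(syy))


def fisher_z_summary(r, n, confidence=0.95):
    """Approximate CI and two-sided p-value for H0: r = 0 via Fisher z."""
    if n <= 3 or abs(r) >= 1:
        raise ValueError("requires n > 3 and |r| < 1")
    z, se = atanh(r), 1.0 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    ci = (tanh(z - crit * se), tanh(z + crit * se))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z) / se))
    return ci, p_value


# Hypothetical cloud-cover (%) and IoU pairs (illustration only).
cloud = [50.9, 55.0, 58.5, 61.0, 64.2, 68.9]
iou_values = [0.87, 0.86, 0.88, 0.85, 0.87, 0.86]
r = pearson_r(cloud, iou_values)
(ci_low, ci_high), p = fisher_z_summary(r, len(cloud))
print(f"r = {r:.3f}, 95% CI: {ci_low:.3f} to {ci_high:.3f}, approx. p = {p:.3f}")
```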

3.6. Cross-City Transferability

We assessed cross-city generalization using Leave-One-City-Out (LOCO) cross-validation, which tests the premise that models trained on a diverse set of cities transfer to unseen cities. For each city c ∈ {Bangkok, Jakarta, Manila, Kuala Lumpur, Ho Chi Minh City}: (1) a Random Forest was trained on AlphaEarth embeddings from all cities other than c; (2) the model was tested on city c (all years combined); and (3) IoU_c and F1_c were calculated. The transferability metric was calculated as:

{IoU}_{transfer} = \frac{1}{5} \sum_{c = 1}^{5} {IoU}_{c}

(14)

Performance drop was defined as:

\Delta = {IoU}_{overall} - {IoU}_{transfer}

(15)

A small performance drop (Δ < 5%) indicates strong generalization, enabling deployment to new cities without local training data, which is critical for data-scarce regions.
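The LOCO bookkeeping behind Equations (14) and (15) can be sketched as follows. The per-city IoU values and the overall IoU are placeholders for illustration only, and the classifier training step is deliberately omitted.

```python
from statistics import mean

CITIES = ["Bangkok", "Jakarta", "Manila", "Kuala Lumpur", "Ho Chi Minh City"]


def loco_folds(cities):
    """Yield (training cities, held-out city) for each LOCO fold."""
    for held_out in cities:
        yield [c for c in cities if c != held_out], held_out


def transfer_summary(iou_by_city, iou_overall):
    """Return (IoU_transfer, performance drop) per Eqs. (14) and (15)."""
    iou_transfer = mean(iou_by_city.values())
    return iou_transfer, iou_overall - iou_transfer


for train_cities, held_out in loco_folds(CITIES):
    print(f"train on {len(train_cities)} cities, test on {held_out}")

# Placeholder IoU values, NOT results from the study.
placeholder_iou = dict(zip(CITIES, [0.84, 0.82, 0.80, 0.83, 0.81]))
iou_transfer, drop = transfer_summary(placeholder_iou, iou_overall=0.86)
print(f"IoU_transfer = {iou_transfer:.3f}, drop = {drop:.3f}")
```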

3.7. Statistical Comparison Across Methods

Ablation study methods were compared using paired comparisons (mean IoU difference, standard deviation, and range) and effect size, measured with Cohen’s d:

d = \frac{\mu_{1} - \mu_{2}}{\sqrt{\frac{\sigma_{1}^{2} + \sigma_{2}^{2}}{2}}}

(16)

where μ and σ represent the mean and standard deviation of IoU for each method. Cohen’s d > 0.8 indicates a large effect size (substantial practical difference).

Improvement percentage was calculated as:

Improvement = \frac{{IoU}_{method} - {IoU}_{baseline}}{{IoU}_{baseline}} \times 100\%

(17)

using the NDVI Threshold method as the baseline (established practice).

Success rate was defined as the percentage of cities where sufficient samples were obtained (n ≥ 100) to enable model training and evaluation, quantifying operational reliability beyond accuracy.
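A minimal sketch of Equations (16) and (17), together with the success-rate definition, is given below. The function names and all numeric inputs are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d with the pooled standard deviation of Eq. (16)."""
    pooled = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)
    if pooled == 0:
        raise ValueError("pooled standard deviation is zero")
    return (mean_1 - mean_2) / pooled


def improvement_percent(iou_method, iou_baseline):
    """Relative improvement over the baseline, Eq. (17)."""
    if iou_baseline == 0:
        raise ValueError("baseline IoU must be non-zero")
    return 100.0 * (iou_method - iou_baseline) / iou_baseline


def success_rate(sample_counts, minimum=100):
    """Percentage of cities with at least `minimum` usable samples."""
    ok = sum(1 for n in sample_counts if n >= minimum)
    return 100.0 * ok / len(sample_counts)


# Hypothetical inputs (illustration only).
print(f"d = {cohens_d(0.9, 0.1, 0.8, 0.1):.2f}")
print(f"improvement = {improvement_percent(0.9, 0.6):.1f}%")
print(f"success rate = {success_rate([1500, 40, 2000, 900, 1200]):.0f}%")
```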

3.8. Impervious Surface Area Estimation

To quantify the practical consequence of methodological choice on impervious surface area (ISA) estimation, we computed pixel-level impervious fraction for each classification method across all five cities. For each 60 × 60 km study area, the mean classification value (binary: 0 = pervious, 1 = impervious) was calculated at 10 m resolution using Google Earth Engine’s reduceRegion function, yielding the impervious fraction which was then converted to area (km²). This analysis was performed for all four classification approaches (AlphaEarth, Best Single-Date S2, Random Single-Date S2, and NDVI Threshold) to assess the range of ISA estimates produced by methods of varying reliability. The divergence between methods provides a direct measure of how methodological choice propagates into environmental policy-relevant quantities.

3.9. Land Surface Temperature–Impervious Surface Correlation

To assess the environmental relevance of ISA expansion, we analyzed the relationship between AlphaEarth-derived ISA fraction and land surface temperature (LST) as a proxy for the surface urban heat island (SUHI). ISA mapping was conducted for 2017–2024, consistent with AlphaEarth annual embedding availability, while MODIS Terra MOD11A2 Version 6.1 8-day LST observations were summarized for dry seasons through 2025 where available. LST_Day_1km values were scaled to Kelvin using the 0.02 scale factor and then converted to °C. For each city and year, dry-season composites (December of the preceding year through April) were constructed to capture peak thermal contrast under comparatively cloud-minimized conditions. Quality control filtering used the QC_Day band, retaining pixels with acceptable LST and emissivity quality. The 10 m AlphaEarth binary ISA layer was aggregated to 1 km impervious fraction using a mean reducer to match MODIS resolution. Pearson correlation coefficients quantified the LST–ISA relationship, and SUHI intensity was defined as the LST difference between urban pixels (ISA fraction ≥ 0.50) and rural pixels (ISA fraction ≤ 0.10).
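The LST conversion, the 1 km aggregation, and the SUHI definition can be illustrated with a few lines of Python. This sketch operates on plain lists rather than Earth Engine images; the function names and all pixel values are hypothetical.

```python
from statistics import mean


def lst_dn_to_celsius(dn, scale=0.02):
    """Convert a MOD11A2 LST_Day_1km digital number to degrees Celsius."""
    return dn * scale - 273.15


def isa_fraction(binary_pixels):
    """Mean of 10 m binary ISA pixels (1 = impervious) within one 1 km cell."""
    return mean(binary_pixels)


def suhi_intensity(isa_fractions, lst_celsius, urban_min=0.50, rural_max=0.10):
    """Mean LST of urban cells minus mean LST of rural cells."""
    pairs = list(zip(isa_fractions, lst_celsius))
    urban = [t for f, t in pairs if f >= urban_min]
    rural = [t for f, t in pairs if f <= rural_max]
    if not urban or not rural:
        raise ValueError("need at least one urban and one rural cell")
    return mean(urban) - mean(rural)


# Hypothetical 1 km cells (illustration only).
fractions = [0.9, 0.6, 0.05, 0.0, 0.3]
lst_c = [36.0, 34.0, 30.0, 28.0, 32.0]
print(f"15000 DN = {lst_dn_to_celsius(15000):.2f} degC")
print(f"ISA fraction of [1, 1, 0, 0] = {isa_fraction([1, 1, 0, 0]):.2f}")
print(f"SUHI intensity = {suhi_intensity(fractions, lst_c):.2f} degC")
```

Cells with intermediate ISA fraction (between 0.10 and 0.50) enter neither group, which follows from the two thresholds in the definition.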

3.10. Independent Spatial Consistency Check Using JRC GHSL

Because both the AlphaEarth classifier and its training labels (Dynamic World) share a common Sentinel-2 data lineage, we conducted an independent cross-dataset spatial consistency check using the JRC Global Human Settlement Layer Built-up Surface product (GHS-BUILT-S, P2023A release) [37] at 10 m resolution. This dataset, derived from Sentinel-2 imagery through an independent processing chain and classification algorithm developed by the European Commission Joint Research Centre, provides building footprint surface area (m²) per grid cell for the 2018 epoch. The GHSL built-up surface values were divided by 100 to obtain building footprint fraction (0–1 range) and aggregated to 1 km resolution using a mean reducer to match the MODIS LST analysis grid. For each city, AlphaEarth ISA fraction (also aggregated to 1 km for the 2018 epoch) was compared against GHSL building fraction using approximately 1000 stratified random sample points. Pearson correlation coefficient, root mean square error (RMSE), and mean bias were computed. It is important to note that GHSL measures building footprint fraction, which is a subset of total impervious surface area (ISA includes roads, parking lots, and other non-building sealed surfaces). Therefore, a systematic positive bias (AlphaEarth ISA > GHSL building fraction) is expected, and the analysis focuses on spatial pattern consistency rather than absolute value agreement.
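The comparison metrics can be illustrated with a short sketch. It assumes that a 10 m grid cell covers 100 m², so dividing built-up surface (m²) by 100 gives a 0–1 fraction, and that bias is computed as AlphaEarth minus GHSL. The function names and the four sample values are hypothetical and stand in for the roughly 1000 stratified points per city.

```python
import math

def ghsl_fraction(built_m2, cell_area_m2=100.0):
    """Convert built-up surface per 10 m cell (m^2) to a 0-1 fraction."""
    return built_m2 / cell_area_m2

def rmse(a, b):
    """Root mean square error between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def mean_bias(a, b):
    """Mean of (a - b); positive when a is systematically larger."""
    return sum(x - y for x, y in zip(a, b)) / len(a)

# Hypothetical 1 km samples (illustrative values, not study data)
alphaearth_isa = [0.10, 0.45, 0.70, 0.90]
ghsl_built_m2 = [5.0, 20.0, 40.0, 60.0]  # mean m^2 per 10 m cell within each 1 km cell
ghsl = [ghsl_fraction(v) for v in ghsl_built_m2]

bias = mean_bias(alphaearth_isa, ghsl)
error = rmse(alphaearth_isa, ghsl)
```

A positive `bias` in this sketch reflects the expected offset between total sealed surface and building footprints, so it should not be read as classification error on its own.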

4. Results

4.1. Cloud-Cover Statistics: Quantifying the Tropical Challenge

The cloud-cover analysis revealed severe operational limitations for traditional single-date methods (Table 2). Mean cloud-cover across the five cities was 58.5% (range: 50.9–68.9%), with Kuala Lumpur experiencing the worst conditions (68.9% mean, up to 71.1% in 2023).

Traditional single-date approaches with a <20% cloud threshold would retain only 17.6% of available images on average (range: 6.0–27.4%), an operational failure rate of roughly 82%. Kuala Lumpur’s usable rate of 6.0% implies that a traditional monitoring approach works only about 6% of the time (roughly 22 days per year), making it operationally unviable for most continuous impervious surface mapping applications. Cloud-cover also varied by up to ±10 percentage points between years (2020–2023), and no city achieved a consistent usable-image rate across all years. Even Bangkok, which had the lowest mean cloud-cover (50.9%), had a 72.6% failure rate, which is too high for dependable operational monitoring.

4.2. Foundation Model Performance

4.2.1. Overall Classification Accuracy

The foundation model showed strong and generalizable detection of impervious surfaces across varied urban forms. AlphaEarth embeddings with Random Forest classification were evaluated on 25,178 samples drawn across three years (2018, 2020, and 2023) and five cities, reaching 92.59% IoU and 94.67% Overall Accuracy (Table 3). The high recall (97.33%) for the impervious class indicates that a wide range of surface types was detected, from dense high-rise areas in Jakarta to large low-rise suburbs in Bangkok and informal settlements in Manila. The class balance (31.6% pervious, 68.4% impervious) represents the surface proportions across the five megacities and was preserved through our non-stratified random sampling strategy.
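For readers less familiar with these metrics, the sketch below shows how IoU, Overall Accuracy, and recall for the impervious class follow from binary confusion counts. The counts are hypothetical and are not the study's confusion matrix. The sketch also assumes that IoU is computed for the impervious class as TP / (TP + FP + FN), which may differ from the averaging convention used in Table 3.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return IoU, overall accuracy, and recall for the positive class."""
    iou = tp / (tp + fp + fn)                    # intersection over union
    overall_accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)                      # share of true impervious found
    return iou, overall_accuracy, recall

# Hypothetical counts (impervious = positive class), not study data
iou, oa, recall = binary_metrics(tp=900, fp=50, fn=30, tn=520)
```

Because true negatives do not enter the IoU, it is always at most as large as Overall Accuracy, which is consistent with the ordering of the two figures reported above.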

4.2.2. City-Level Performance Despite Variable Cloud-Cover

Kuala Lumpur, the cloudiest city (68.9% mean cloud-cover, with only 6.0% of images usable under traditional methods), had an IoU of 88.2%, which is above the 85% operational quality threshold. The foundation model therefore performs well precisely where traditional single-date methods fail for 94% of acquisitions. Performance varied considerably across cities (83.1–93.7% IoU, a range of 10.6 percentage points), likely because of differences in urban morphology rather than cloud conditions. The lower accuracy for Ho Chi Minh City (83.1% IoU) is attributable to the city’s rapid peripheral expansion and mixed urban–agricultural transitional land uses, not to its cloud-cover (58.1%, similar to the multi-city mean). City-level results, along with the cloud statistics in Table 4, show no systematic loss of accuracy despite the wide variation in cloud-cover levels.

4.3. Cloud-Performance Independence: Statistical Evidence

4.3.1. Correlation Analysis

The correlation analysis found no significant association between cloud-cover and classification accuracy across the five cities in the two study years (2020 and 2023) (Pearson’s r = −0.069, p = 0.851; Figure 2). The coefficient is close to zero, indicating a negligible linear relationship; a negative sign at this magnitude is not meaningful and is consistent with random variation rather than a systematic trend. Because p = 0.851 far exceeds all standard significance thresholds (α = 0.05, 0.01, or 0.001), we cannot reject the null hypothesis of no correlation. With so few city–year observations this result does not prove independence, but it is consistent with foundation model performance being insensitive to local cloud conditions, which is a prerequisite for operational deployment in the tropics.
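The reported p-value can be checked with the standard t-test for a Pearson coefficient, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below uses only the Python standard library and integrates the Student-t density numerically. The sample size n = 10 is an assumption inferred from five cities and two years, not a figure stated in the text, and the function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Student-t probability density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def pearson_two_sided_p(r, n, steps=2000):
    """Two-sided p-value for H0: rho = 0 (requires |r| < 1 and n > 2)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule on [0, t]; steps must be even
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3   # probability of |T| <= t
    return 1 - central_mass

# r from the text; n = 10 is an assumption (5 cities x 2 years)
p_value = pearson_two_sided_p(-0.069, 10)
```

Under that assumed sample size the sketch gives a p-value of about 0.85, in line with the reported 0.851; the small difference is plausibly due to rounding of r.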

4.3.2. Fixed-Seed Point Estimates Across Years

Cloud-cover varied substantially between 2020 and 2023 (Jakarta: −9.1 percentage points; Manila: +5.7 percentage points), whereas the fixed-seed classification point estimates were identical within each city (Table 5). We interpret this result conservatively: it demonstrates reproducibility of the sampling and classification protocol under a fixed seed, but it should not be treated as independent evidence of temporal model stability. A stronger temporal-stability claim would require repeated random seeds, spatial-block resampling, and confidence intervals around each city–year estimate.

4.4. Ablation Study: Quantitative Method Comparison

The additional repeated-holdout robustness analysis confirmed that AlphaEarth remained the strongest workflow even against the fairer annual Sentinel-2 composite baseline. Across five cities, AlphaEarth achieved mean IoU = 0.866 (95% CI: 0.857–0.875) under repeated random holdouts and 0.859 (0.848–0.869) under spatial-block holdouts. The annual Sentinel-2 full-band-plus-index baseline achieved mean IoU = 0.758 (0.749–0.767) under repeated random holdouts and 0.747 (0.733–0.762) under spatial-block holdouts. The best single-date 5-index baseline achieved mean IoU = 0.686 (0.674–0.698) under repeated random holdouts and 0.673 (0.654–0.691) under spatial-block holdouts. These results reduce, but do not eliminate, the dimensionality concern; AlphaEarth still benefits from learned multi-temporal embeddings, but its advantage persists after comparison with a substantially stronger Sentinel-2 baseline (Table 6).

4.4.1. Leave-One-City-Out Transferability Results

NDVI Threshold performance on complex impervious–pervious gradients was poor: Bangkok yielded only 46.23% IoU and Ho Chi Minh City only 37.68% IoU, both well below the 85% operational threshold. NDVI Threshold-based methods are therefore operationally impractical for the heterogeneous surface compositions commonly found in tropical cities. Method-level performance by city is given in Table 7, which shows systematic failure patterns associated with particular surface properties and morphologically complex environments.

4.4.2. Data Acquisition Failures as Primary Evidence

Single-date methods also exhibited operational failures in tropical regions (last column in Table 7). Both Bangkok and Jakarta reported a relatively high number of sample failures despite low cloud-cover at the time of acquisition. This discrepancy indicates that cloud-percentage metadata often underestimates atmospheric obscuration and is therefore an unreliable indicator of data quality. Consequently, selection strategies based solely on the “best available image” and/or cloud-cover metadata may be inadequate to guarantee accurate and complete data in such environments. Improving the robustness of remote sensing analyses in tropical environments requires more comprehensive cloud detection algorithms together with complementary validation procedures.

4.4.3. Operational Feasibility Assessment

As an operational screening summary, the Best Single-Date Sentinel-2 method achieved an availability-weighted IoU of only 8.9% (Mean IoU × Coverage × Success Rate = 84.2% × 17.6% × 60%). This index is dimensionless and should not be interpreted as a physical environmental metric; it is used only to summarize the combined effect of accuracy, image availability, and workflow success. No traditional method achieved both high accuracy and consistent operability across all study cities; only AlphaEarth satisfied both criteria simultaneously. Table 8 quantifies the trade-off between temporal coverage and classification accuracy. The computational benchmark from the sampled robustness analysis showed that local Random Forest train/evaluate time was similar across methods (approximately 0.22–0.28 s per run), while GEE sampling time differed by workflow. Mean GEE sampling time was 20.6 s for AlphaEarth, 16.4 s for the best single-date 5-index baseline, and 69.8 s for the annual Sentinel-2 full-band-plus-index baseline. These values are sample-based benchmarking metrics rather than full-raster production runtimes, but they show that the fairer annual Sentinel-2 baseline required substantially more GEE sampling time than AlphaEarth in this implementation.
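The availability-weighted index is a simple product of three rates, and the arithmetic for the Best Single-Date Sentinel-2 method can be reproduced as follows. The function name is illustrative; the three inputs are the values quoted above.

```python
def availability_weighted_iou(mean_iou, coverage, success_rate):
    """Dimensionless screening index: accuracy x image availability x workflow success."""
    return mean_iou * coverage * success_rate

# Values quoted in the text for the Best Single-Date Sentinel-2 method
best_s2_index = availability_weighted_iou(0.842, 0.176, 0.60)
best_s2_percent = round(best_s2_index * 100, 1)
```

Because the three factors multiply, a method with high accuracy can still score very low when usable imagery is rare, which is the situation this index is meant to summarize.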

4.5. Cross-City Transferability

An average transferability IoU of 89.2% represents a 3.4 percentage-point decline from the overall within-sample performance (92.6% to 89.2%), suggesting useful cross-city generalization. Ho Chi Minh City showed the largest drop (−9.5%), which is consistent with its rapid peripheral expansion, ribbon development along transport corridors, and mixed urban–agricultural transition zones. These landscapes create 10 m and 1 km mixed pixels that are more difficult for both the classifier and the Dynamic World reference product. The LOCO results therefore support broad transferability but also indicate that fast-expanding peri-urban landscapes require additional local validation (Table 9).

Figure 3 illustrates AlphaEarth-derived ISA fraction (10 m classification aggregated to 250 m) for 2017 and 2024 (left columns), with ISA percentage annotated. The right column shows ISA change (2024 minus 2017 fraction), where red indicates new impervious surface expansion and blue indicates apparent ISA loss. Bangkok and Ho Chi Minh City exhibit the strongest peri-urban expansion (+7.4 and +11.0 percentage points, respectively), while Manila’s dense core shows limited expansion potential (+2.7 percentage points). Minor blue pixels likely reflect single-epoch classification inconsistency rather than actual demolition. Scale bar: 10 km. Study area: 60 × 60 km per city.

4.6. Impervious Surface Area Estimates: Method Disagreement

The four classification methods yielded widely varying ISA estimates across the five cities, demonstrating the sensitivity of impervious surface mapping to methodological decisions. Bangkok had the greatest overall ISA of any city in this study: AlphaEarth estimated 56.2% (1959 km²) and Best S2 (RF) produced a similar proportion at 58.7% (2046 km²). By contrast, the Random S2 (RF) baseline produced only 13.4% (466 km²), indicating that single-date optical classification without careful consideration of scene content can substantially underestimate impervious coverage in cloud-prone tropical environments. In Jakarta, the four estimates fell within a comparatively narrow band: AlphaEarth (49.6%), Best S2 (49.0%), Random S2 (59.9%), and NDVI Threshold (60.0%). Manila showed the greatest method disagreement (58.0 pp): NDVI Threshold grossly overestimated ISA at 77.8% compared with AlphaEarth (28.8%) and Best S2 (19.8%), likely because of the extent and variability of bare soil and informal settlements, which simple vegetation indices characterize poorly. Kuala Lumpur and Ho Chi Minh City had moderate levels of disagreement (16.5 and 17.0 percentage points, respectively). Across all cities, AlphaEarth and Best S2 (RF) had the closest agreement (mean absolute difference: 7.8 percentage points), suggesting that classification from foundation model embeddings alone, with “one-shot” transfer to unseen areas, can approach the performance of carefully supervised classification without manual training sample collection for each scene.

4.7. Urban Heat Island Evidence: LST–Impervious Surface Relationship

The MODIS-derived dry-season LST exhibited a strong positive spatial association with AlphaEarth ISA fraction across all five cities and available study years. Pearson correlation coefficients ranged from r = 0.748 (Ho Chi Minh City, 2023) to r = 0.900 (Manila, 2021), with all values statistically significant. Fisher-z confidence intervals were added for all city–year correlations, supporting the robustness of the ISA-LST association while still requiring cautious interpretation because correlation does not prove causality.

Surface urban heat island (SUHI) intensity, defined as the LST difference between urban pixels (ISA ≥ 50%) and rural pixels (ISA ≤ 10%) following the thresholds of Imhoff et al. (2010) [38], varied markedly among the five cities.

Two operational definitions of “urban” are used in this study: the AlphaEarth-classified ISA thresholds (≥50% urban/≤10% rural) for the city-level SUHI metric reported below; and the Dynamic World built-fraction proxy for the 1 km mixed-pixel sensitivity analysis (Section 4.7 paragraph on mixed-pixel sensitivity) which uses ≥70%/≤30% strata to avoid the memory cost of full-raster AlphaEarth classification. The two definitions are consistent in spirit (both contrast pure-impervious vs. pure-pervious 1 km pixels) but the strict thresholds differ between analyses; we report city-level SUHI under Imhoff thresholds and stratified Pearson under DW thresholds for transparency.

Manila had the most extreme SUHI effect at 8.92 °C in 2023 (LST_urban = 35.6 °C, LST_rural = 26.6 °C), followed by Kuala Lumpur (7.99 °C), Jakarta (6.69 °C), Bangkok (4.37 °C), and Ho Chi Minh City (4.00 °C). Manila’s very high SUHI values can plausibly be ascribed to its high population density (~22,000 inhabitants/km²) and to steep urban–rural gradients created by the surrounding agricultural lowlands and mountainous regions. By contrast, Bangkok showed a comparatively moderate SUHI (3.1–4.5 °C) even though its ISA was the highest (57.7–65.1%), which may be attributable to the moderating role of the Chao Phraya River system and the city’s extensive canal network.

Temporal analysis of AlphaEarth-derived ISA during 2017–2024 showed that Ho Chi Minh City had the most rapid impervious area growth, increasing from 36.7% in 2017 to 47.7% in 2024 (+11.0 percentage points; linear slope = 1.48 pp yr⁻¹, 95% CI: 1.06–1.91; Theil-Sen slope = 1.43 pp yr⁻¹). Bangkok increased by +7.4 pp (slope = 1.27 pp yr⁻¹, 95% CI: 0.85–1.69), Manila by +4.7 pp (0.52 pp yr⁻¹, 95% CI: 0.32–0.73), Jakarta by +4.1 pp (0.63 pp yr⁻¹, 95% CI: 0.41–0.85), and Kuala Lumpur by +3.3 pp. Kuala Lumpur’s linear slope confidence interval included zero (−0.34 to 1.92 pp yr⁻¹), so its trend is reported conservatively despite a positive Theil-Sen slope (a city-level summary of ISA expansion and SUHI intensity is provided in Table 10).
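The two slope estimators mentioned above can be sketched with the standard library. The annual series below is hypothetical: only the 2017 and 2024 endpoints (36.7% and 47.7%) come from the text, and the intermediate values are invented for illustration, so the resulting slopes will not match the reported ones. The critical value 2.447 is the two-sided 95% Student-t quantile for 6 degrees of freedom (n = 8 years).

```python
import math
from itertools import combinations
from statistics import median

T_CRIT_95 = {6: 2.447}  # two-sided 95% Student-t critical value, df = n - 2

def ols_slope_ci(x, y):
    """Ordinary least-squares slope with a 95% confidence interval."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)      # standard error of the slope
    half_width = T_CRIT_95[n - 2] * se
    return slope, slope - half_width, slope + half_width

def theil_sen_slope(x, y):
    """Median of all pairwise slopes; robust to single outlying years."""
    pairs = combinations(range(len(x)), 2)
    return median((y[j] - y[i]) / (x[j] - x[i]) for i, j in pairs)

years = list(range(2017, 2025))
# Hypothetical ISA series (%); only the first and last values are from the text
isa_pct = [36.7, 38.5, 40.0, 41.0, 43.2, 44.5, 46.0, 47.7]

slope, ci_low, ci_high = ols_slope_ci(years, isa_pct)
robust_slope = theil_sen_slope(years, isa_pct)
```

Reporting both estimators is useful with only eight annual values: when a single anomalous year widens the least-squares interval until it includes zero, as described for Kuala Lumpur, the Theil-Sen slope can remain positive.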

The spatial distribution of this expansion is shown in the change maps (Figure 3) where new ISA concentrates along radial transport corridors in Ho Chi Minh City, whereas Bangkok exhibits more dispersed peri-urban expansion. Mean SUHI intensity during 2017–2024 was highest in Manila (8.51 °C, 95% CI: 8.10–8.92) and Kuala Lumpur (8.07 °C, 95% CI: 7.69–8.44), followed by Jakarta (6.42 °C, 95% CI: 5.87–6.96), Ho Chi Minh City (4.17 °C, 95% CI: 3.88–4.46), and Bangkok (4.01 °C, 95% CI: 3.68–4.34). LST values for 2025 should be treated as provisional because the 2025 dry-season composite was incomplete at the time of analysis.

The CIs above account for inter-annual sampling variability only. To incorporate the documented MOD11A2 retrieval uncertainty (approximately ±1 K RMSE), we ran a 1000-iteration Monte-Carlo simulation that perturbs each annual LST_urban and LST_rural with independent N(0, 1 K) noise (treated as systematic per annual city-mean—a conservative upper bound that does not assume averaging across city pixels). The resulting combined 95% confidence intervals for the 2017–2024 city-mean SUHI are: Bangkok 4.01 °C [3.03, 4.99], Jakarta 6.42 °C [5.33, 7.50], Manila 8.51 °C [7.47, 9.55], Kuala Lumpur 8.07 °C [7.02, 9.12], and Ho Chi Minh City 4.17 °C [3.19, 5.14]. The LST retrieval term widens the SUHI CI by approximately 140–305%, indicating that retrieval uncertainty—not inter-annual sampling—dominates the uncertainty budget for city-mean SUHI. Per-city Monte-Carlo statistics are reported in Supplementary File SUHI_LST_MonteCarlo_CI.csv. The reported city ranking (Manila > Kuala Lumpur > Jakarta > Ho Chi Minh City ≈ Bangkok) is preserved under this conservative bound. The Monte-Carlo procedure provides the LST component of the end-to-end uncertainty budget recommended by Olofsson et al. (2014) [35] for remote-sensing land-change assessment.
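A minimal sketch of the perturbation step is given below. It assumes that each annual LST_urban and LST_rural value receives an independent N(0, 1 K) draw, that the SUHI is averaged over the eight years in each iteration, and that the interval is taken from the 2.5th and 97.5th percentiles of the simulated means. The annual values, the seed, and the function name are hypothetical, and the sketch covers the retrieval term only, not its combination with inter-annual variability.

```python
import random

def monte_carlo_suhi_ci(lst_urban, lst_rural, sigma=1.0, iterations=1000, seed=42):
    """95% percentile interval of the multi-year mean SUHI under LST noise."""
    rng = random.Random(seed)
    n_years = len(lst_urban)
    means = []
    for _ in range(iterations):
        # Perturb urban and rural LST independently for every year
        suhi = [
            (u + rng.gauss(0.0, sigma)) - (r + rng.gauss(0.0, sigma))
            for u, r in zip(lst_urban, lst_rural)
        ]
        means.append(sum(suhi) / n_years)
    means.sort()
    lower = means[int(0.025 * iterations)]
    upper = means[int(0.975 * iterations) - 1]
    return lower, upper

# Hypothetical annual dry-season means in degrees C (8 years, not study data)
urban = [33.9, 34.1, 34.0, 34.3, 34.2, 34.4, 34.1, 34.0]
rural = [30.0, 30.1, 30.2, 30.1, 30.3, 30.4, 30.2, 30.1]

observed_suhi = sum(u - r for u, r in zip(urban, rural)) / len(urban)
ci_lower, ci_upper = monte_carlo_suhi_ci(urban, rural)
```

Under these assumptions the noise on the eight-year mean has a standard deviation of √(2/8) = 0.5 K, so the half-width should be close to 1.96 × 0.5 ≈ 0.98 °C. That is consistent with the roughly ±1 °C intervals reported above for Bangkok and Ho Chi Minh City.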

The combined CIs above treat the entire MOD11A2 retrieval RMSE (~1 K) as a systematic error per annual city-mean and perturb LST_urban and LST_rural independently. This is a conservative upper bound. The MOD11A2 quality assurance literature decomposes the retrieval error into a systematic component (~0.5 K, dominated by emissivity, view-angle, and atmospheric effects shared across nearby pixels) and a random component (~0.7 K, averaging down with the number of city-pixel observations). Under a less conservative split—σ_systematic = 0.5 K, σ_random = 0.7/√n with n ≈ 700–1000 city pixels—the random term contributes ≤0.03 K to the city-mean SE, and the resulting combined CIs would be substantially narrower than those reported. Furthermore, if systematic biases are spatially correlated between adjacent urban and rural pixels (cov(LST_urban, LST_rural) > 0), the SUHI = LST_urban − LST_rural variance is smaller than the independent-error Monte-Carlo assumes. We retain the conservative reporting in the main text and provide this sensitivity caveat for transparency; the qualitative city ranking and the SUHI decline trend reported in Section 5.5 are robust to the less conservative assumption.

A mixed-pixel sensitivity analysis was added for 2024 using MODIS LST points and Dynamic World built fraction in 500 m buffers as an ISA proxy. High-ISA pixels were warmer than low-ISA pixels by 4.00 °C in Bangkok, 6.09 °C in Jakarta, 8.15 °C in Kuala Lumpur, and 8.53 °C in Manila. Ho Chi Minh City had too few high-ISA pure pixels under the >70% threshold for a stable high-minus-low contrast, but mixed pixels averaged 33.78 °C compared with 31.62 °C in low-ISA pixels. This sensitivity analysis supports the ISA-LST relationship while explicitly showing how mixed pixels behave between rural and dense urban thermal conditions.

Pooling across the five cities, the LST–ISA Pearson r is 0.649 (95% CI: 0.600–0.693, n = 599) for pure-pervious pixels (ISA < 30%), 0.511 (95% CI: 0.461–0.558, n = 883) for mixed pixels (30% ≤ ISA ≤ 70%), and −0.139 (95% CI: −0.323 to +0.055, n = 104; not significant, p = 0.16) for pure-impervious pixels (ISA > 70%). The non-significant high-ISA stratum is consistent with surface-energy-balance saturation once impervious cover dominates the 1 km pixel. Per-city values are: Bangkok r = 0.59/0.62/0.42 (low/mixed/high; n = 72/248/18), Jakarta 0.56/0.61/−0.09 (n = 64/177/42), Manila 0.66/0.62/0.18 (n = 122/106/35), Kuala Lumpur 0.65/0.57/0.32 (n = 207/116/9), and Ho Chi Minh City 0.35/0.46/(too few high-ISA samples for r) (n = 134/236/0). Per-stratum statistics are tabulated in Stratified_LST_ISA_Pearson_2024.csv (Supplementary Data).
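The pooled confidence intervals are reproducible with the Fisher z-transformation, z = atanh(r), with standard error 1/sqrt(n − 3). The sketch below assumes a normal critical value of 1.96. The function name is illustrative, and the inputs are the pooled r and n values quoted above.

```python
import math

def fisher_z_ci(r, n, z_crit=1.96):
    """95% confidence interval for a Pearson r via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Pooled strata from the text: (r, n)
low_isa_ci = fisher_z_ci(0.649, 599)     # pure-pervious pixels
high_isa_ci = fisher_z_ci(-0.139, 104)   # pure-impervious pixels
```

The interval for the high-ISA stratum spans zero, which is the same conclusion as the reported p = 0.16.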

Ho Chi Minh City’s ribbon-style development along radial transport corridors [31] produces an urban morphology in which impervious surfaces are spatially distributed rather than concentrated; even though the city-mean ISA fraction reached 47.7% in 2024 (the largest 2017–2024 increase in the cohort, +11.0 pp), few 1 km MODIS pixels exceeded the 70% built fraction threshold required to populate the high-ISA stratum. Manila, Jakarta, and Kuala Lumpur, which have more compact urban cores, retain enough such pixels for the high-ISA contrast to be computed (n = 35, 42, and 9 pixels, respectively). The absence of a high-ISA stratum for Ho Chi Minh City is therefore a substantive geographic finding about urban form, not merely a sampling artifact.

4.8. Independent Spatial Consistency: AlphaEarth vs. JRC GHSL

Cross-dataset comparison between AlphaEarth ISA fraction and JRC GHSL building footprint fraction at 1 km resolution (2018 epoch) demonstrated strong spatial consistency across all five cities (Table 11). Pearson correlation coefficients ranged from r = 0.866 (95% CI: 0.849–0.881) in Ho Chi Minh City to r = 0.936 (95% CI: 0.928–0.943) in Jakarta. Overall agreement with GHSL-derived built-up labels ranged from 59.2% in Bangkok (95% CI: 56.1–62.2) to 84.5% in Manila (95% CI: 82.1–86.6). This check partly addresses shared-lineage concerns with Dynamic World, but it remains a spatial-consistency comparison rather than independent ground-truth validation.

As expected, the AlphaEarth ISA fraction was systematically larger than the GHSL building fraction, with a mean bias ranging from +0.203 (Manila) to +0.388 (Bangkok). This positive bias is physically consistent because ISA includes all sealed surfaces (buildings, roads, parking lots, and sidewalks), whereas GHSL includes only building footprints. The largest bias was observed for Bangkok (+0.388), consistent with its extensive road network and sprawling urban morphology, in which non-building impervious surfaces contribute a large share of total ISA. Manila had the smallest bias (+0.203) and the greatest overall agreement (OA = 84.5%), consistent with its very high-density built environment, in which buildings occupy a larger share of urban space relative to road and parking infrastructure.

The consistency across both built-up surface types supports the spatial plausibility of the AlphaEarth ISA classifications for urban environmental analysis. However, this comparison confirms spatial pattern consistency, not absolute ISA accuracy. The 10 m to 1 km aggregation may inflate correlations by smoothing sub-pixel heterogeneity and by emphasizing broad urban–rural gradients. Future validation should therefore use independent non-Sentinel-2 data such as aerial photography, LiDAR-derived building footprints, or field-surveyed reference samples.

5. Discussion

5.1. Advantages of Multi-Temporal Foundation Model Embeddings

The ablation study reveals substantial performance differences across the four classification approaches. Multi-temporal foundation model embeddings achieved 90.4% IoU compared to 77.7% for random single-date Sentinel-2 classification—a 12.7 percentage point improvement against the 5-index single-date baseline (10.8 pp against the fairer 15-feature annual Sentinel-2 baseline of Section 3.3.5) attributable to the richer temporal and spectral information encoded in annual composites. This advantage is expected, as AlphaEarth embeddings incorporate cloud masking, temporal gap filling, and multi-temporal compositing as part of their pre-processing pipeline, effectively integrating information across all available observations within a year. The comparison with NDVI thresholding (62.4% IoU) further demonstrates that high-dimensional semantic embeddings capture impervious surface characteristics far more effectively than single spectral indices. NDVI Thresholds fail in tropical cities where urban vegetation produces elevated values (0.3–0.5) that exceed typical urban classification thresholds. These results collectively indicate that foundation model embeddings provide both data completeness through temporal integration and improved classification capacity through learned semantic representations.

5.2. Quantifying Operational Impact

AlphaEarth achieves a 44.9% relative improvement in IoU over conventional NDVI-based classification, reducing the mean error rate (1 − IoU) from 37.6% (NDVI IoU = 62.4%) to 9.6% (AlphaEarth IoU = 90.4%), a 74.5% reduction. This improvement has direct operational consequences: better classification accuracy diminishes the need for expensive manual verification and correction workflows, which can lower operational costs, and allows more targeted planning of infrastructure works and urban renewal. Municipal authorities can also allocate resources with greater confidence, because the margin of error narrows from roughly ±20% under traditional methods to roughly ±5%. The high accuracy of AlphaEarth-based classification therefore offers both strategic and economic benefits for sustainable urban development and impervious surface management.
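The two percentages quoted above follow from the reported IoU values, as the short sketch below shows. The function names are illustrative, and the error rate is taken to be 1 − IoU, which is an IoU-based proxy and not a pixel-level misclassification rate.

```python
def relative_iou_gain(new_iou, base_iou):
    """Relative improvement of new_iou over base_iou."""
    return (new_iou - base_iou) / base_iou

def error_reduction(new_iou, base_iou):
    """Relative reduction in (1 - IoU) when moving from base to new."""
    base_error = 1.0 - base_iou
    new_error = 1.0 - new_iou
    return (base_error - new_error) / base_error

# IoU values from the text: AlphaEarth 90.4%, NDVI Threshold 62.4%
gain = relative_iou_gain(0.904, 0.624)
reduction = error_reduction(0.904, 0.624)
```

The absolute gap of 28 percentage points therefore appears as about 45% when normalized by the baseline IoU and about 74% when normalized by the baseline error, so the three figures describe the same difference from different reference points.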

5.3. Extreme Failures Reveal Fundamental Unsuitability

Threshold-based classification performs poorly for tropical impervious surface detection (NDVI IoU: 46.2% in Bangkok and 37.7% in Ho Chi Minh City). We identify three major failure modes. First, abundant urban vegetation (street trees, gardens, parks, and green roofs) produces NDVI values (0.3–0.5) that far exceed the usual threshold of <0.2 for classifying an area as impervious (50), leading to systematic over-estimation of pervious land within built-up areas. Second, heterogeneous land use at fast-growing urban edges produces mixed pixels that compromise classification accuracy. In these transition zones, rice paddies intermixed with smallholder farms and new developments create spectral confusion that thresholding algorithms are unable to disentangle. Third, agricultural land cover and the seasonal greening of urban areas change over time, so a static NDVI threshold cannot accommodate them. These failure modes collectively highlight the need for more advanced classification methods that use additional spectral, spatial, and temporal data to characterize impervious surfaces accurately in tropical settings.
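The first failure mode can be made concrete with a small sketch of NDVI thresholding. NDVI is computed as (NIR − Red)/(NIR + Red), and the 0.2 cut-off is the threshold mentioned above. The reflectance values are hypothetical and chosen only to show how a paved street under tree canopy ends up in the pervious class.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from surface reflectances."""
    return (nir - red) / (nir + red)

def classify_by_ndvi(value, threshold=0.2):
    """Label a pixel impervious when NDVI falls below the threshold."""
    return "impervious" if value < threshold else "pervious"

# Hypothetical reflectances (NIR, Red), not measured values
bare_concrete = ndvi(0.25, 0.22)       # low NDVI, little vegetation signal
tree_lined_street = ndvi(0.35, 0.15)   # canopy raises NDVI over a sealed surface

concrete_label = classify_by_ndvi(bare_concrete)
street_label = classify_by_ndvi(tree_lined_street)
```

In this example the tree-lined street has an NDVI of 0.4 and is labeled pervious even though the ground beneath the canopy is sealed, which is the over-estimation of pervious land described in the text.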

5.4. Sample Availability as Operational Constraint

Cloud coverage is not the only cause of data loss in remote sensing; other environmental and instrumental factors also contribute. Based on our analysis of all available Sentinel-2 images, data availability in all five study cities was reduced well beyond what cloud-cover metadata alone would imply; in some locations these additional factors accounted for almost 30% of lost data. Bangkok illustrates this multiplicative effect: although some images have 0% cloud-cover according to their metadata, the overall data acquisition failure rate was 98.5% (Table 2), demonstrating that cloud-percentage metadata alone is insufficient for assessing the quality of remotely sensed data products. Multi-temporal compositing approaches tend to alleviate these compounded limitations by exploiting temporal redundancy across multiple acquisitions to improve data completeness under adverse environmental and instrumental conditions.

5.5. Discussion: Thermal Findings and Methodological Implications

Three findings frame the urban–thermal interpretation of the present results. First, the surface urban heat island intensity ranks Manila > Kuala Lumpur > Jakarta > Ho Chi Minh City ≈ Bangkok over 2017–2024 (city-mean SUHI 8.51/8.07/6.42/4.17/4.01 °C with inter-annual 95% CIs widening to 7.47–9.55/7.02–9.12/5.33–7.50/3.19–5.14/3.03–4.99 °C once MOD11A2 retrieval uncertainty is propagated). Second—and counter-intuitively—the SUHI declined in all five cities between 2017 and 2024 (Bangkok −0.78 °C, Jakarta −2.03 °C, Manila −1.05 °C, Kuala Lumpur −0.75 °C, Ho Chi Minh City −1.12 °C) despite uniform ISA growth. Because absolute LST_urban remained elevated across the period, the SUHI decline reflects rural baseline warming outpacing urban warming rather than urban cooling—a pattern reported elsewhere for mature tropical megacities and consistent with regional dry-season expansion and agricultural intensification in surrounding peri-urban areas. Third, the spatial scale at which ISA expansion translates into surface-temperature change is shaped by pixel composition: the stratified Pearson correlation between ISA fraction and 1 km LST is strong and significant for pure-pervious and mixed pixels (r = 0.65 and 0.51 respectively, pooled across cities) but saturates and becomes statistically indistinguishable from zero for pure-impervious pixels (r = −0.14, p = 0.16). For tropical climate adaptation, this means the largest marginal heat impact is associated with the conversion of pervious to mixed pixels—i.e., the urbanizing fringe—rather than the densification of an already-impervious core. Cooling interventions targeted at peri-urban transition zones are therefore likely to yield the largest near-term reduction in tropical urban heat exposure.

Beyond these thermal findings, the methodological results have three important implications for tropical remote sensing practice. First, multi-temporal methods clearly outperformed single-date analysis (90.4% vs. 77.7% IoU for the Random Sentinel-2 baseline), underscoring the value of reducing atmospheric noise through composite datasets. Future benchmarks should favor temporally integrated evaluation datasets over single-date scenes. Second, foundation models such as AlphaEarth signify a paradigm shift in remote sensing infrastructure. By encoding complex surface property information in semantic embeddings, these models exceed traditional spectral indices by about 28 percentage points in IoU (90.4% vs. 62.4% for NDVI).

AlphaEarth’s availability through platforms such as Google Earth Engine facilitates scalable adoption, supporting the transition from spectral-only approaches toward deep learning-based methods with contextual information. Third, the 19.5% standard deviation in NDVI accuracy across cities demonstrates that no single threshold can serve heterogeneous tropical landscapes. Only high-dimensional embedding approaches can adequately capture the complexity of tropical impervious surfaces. These three shifts—temporal compositing, semantic embeddings, and high-dimensional feature representations—collectively advance the operational capacity of tropical remote sensing toward greater robustness, scalability, and accuracy.

5.6. Limitations and Future Work

The MOD11A2 land-surface-temperature product carries a documented retrieval uncertainty of approximately ±1 K. The inter-annual 95% confidence intervals reported for SUHI (n = 8 years) do not include this retrieval component; it enters only through the conservative Monte-Carlo perturbation reported in Section 4.7. The retrieval uncertainty was not formally propagated through the ISA–LST correlation statistics, and a less conservative treatment that separates systematic from random retrieval error remains a clearly bounded next step.

Several limitations remain. First, Dynamic World is a deep learning-derived reference product rather than ground-truth, and its shared Sentinel-2 lineage with AlphaEarth may inflate agreement metrics. Second, although repeated random holdouts and spatial-block holdouts were added, they remain sample-based robustness checks rather than exhaustive wall-to-wall validation. Third, the mixed-pixel sensitivity analysis used Dynamic World built fraction as an ISA proxy to avoid GEE memory limits associated with full-raster AlphaEarth aggregation; therefore, it should be interpreted as a sensitivity check, not a replacement for a full AlphaEarth-derived mixed-pixel analysis. Fourth, AlphaEarth embeddings contain 64 learned dimensions, whereas the new annual Sentinel-2 baseline contains 15 optical features; the comparison is therefore stronger than the original five-index baseline but still not a pure dimensionality-controlled experiment. Finally, 10 m mapping is suitable for city-scale monitoring but remains too coarse for parcel- or building-scale thermal planning.

We further note that full propagation of reference-data uncertainty through area estimation, as recommended by Zhong et al. (2025) [36], would require an independent validation sample and is identified as next-step work.

To partially address the shared-lineage concern, we conducted a cross-dataset spatial consistency check against the JRC GHSL Built-up Surface product (Section 4.8), which showed strong correlation (r = 0.866–0.936), with 95% confidence intervals reported in Table 11. Nevertheless, GHSL measures building footprints only, excluding roads and parking surfaces that constitute a substantial fraction of total ISA. Future work should incorporate independent non-Sentinel-2 reference sources such as aerial photography, LiDAR-derived building footprints, or field-surveyed reference samples, and should export full pixel-level LST–ISA samples to support a fully AlphaEarth-based mixed-pixel uncertainty analysis.

6. Conclusions

This study assessed impervious surface expansion and its relationship with the urban thermal environment across five tropical Southeast Asian megacities. AlphaEarth foundation model embeddings provided cloud-resilient ISA estimates for 2017–2024, enabling comparison with MODIS-derived dry-season LST and SUHI intensity. The robustness analyses showed that AlphaEarth retained higher performance under repeated random holdouts (mean IoU = 0.866, 95% CI: 0.857–0.875) and spatial-block holdouts (0.859, 95% CI: 0.848–0.869) than both a fairer annual Sentinel-2 full-band-plus-index baseline and a best single-date 5-index baseline. ISA expanded most rapidly in Ho Chi Minh City (+11.0 pp; slope = 1.48 pp yr⁻¹, 95% CI: 1.06–1.91), while mean SUHI intensity during 2017–2024 was highest in Manila (8.51 °C, 95% CI: 8.10–8.92) and Kuala Lumpur (8.07 °C, 95% CI: 7.69–8.44). Mixed-pixel sensitivity analysis further showed that high-ISA pixels were 4.00–8.53 °C warmer than low-ISA pixels in cities with sufficient high-ISA samples. AlphaEarth achieved high concordance with Dynamic World and strong spatial consistency with JRC GHSL, but these results should be read as product agreement rather than absolute accuracy. Future assessments should add independent ground-truth validation and full-raster uncertainty propagation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/earth7030076/s1, Code S1: Generate_Classification_Maps.ipynb (Earth Engine notebook for the four-method ablation, ISA statistics, and LST × ISA temporal analysis); Code S2: GEE_robustness_holdouts.py (Earth Engine sampling for repeated-holdout robustness and mixed-pixel LST analyses); Code S3: local_statistical_analyses.py (Fisher-z CIs, Theil–Sen and Kendall trend tests, t-distribution SUHI CIs, Monte-Carlo MOD11A2 propagation, Wilson CIs on GHSL agreement, stratified Pearson per pixel-purity stratum); Table S1: robustness holdout summary; Table S2: robustness holdout per-run metrics; Table S3: temporal trends with 95% CI; Table S4: SUHI summary 2017–2024; Table S5: SUHI Monte-Carlo CI; Table S6: stratified LST–ISA Pearson 2024; Table S7: GHSL validation with 95% CI; Table S8: LST–ISA correlations with 95% CI; Table S9: mixed-pixel LST summary 2024; Table S10: mixed-pixel LST samples 2024; Table S11: ISA method disagreement; Data S1: GEE feature samples raw (n = 34,214 labelled pixel-level samples used to train the holdout RFs).

Author Contributions

Conceptualization, S.M.; methodology, K.I.; software, K.I.; validation, S.M., P.M. and K.I.; formal analysis, S.M.; resources, P.M.; data curation, S.M.; writing—original draft preparation, S.M.; writing—review and editing, S.M., P.M. and K.I.; visualization, S.M.; supervision, S.M.; funding acquisition, S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Faculty of Liberal Arts, Thammasat University.

Data Availability Statement

All data tables (Tables S1–S11, Data S1) and analysis code (Code S1–S3) underlying the quantitative claims in this paper are provided as Supplementary Materials. The complete code and data are openly available on GitHub at https://github.com/SitthisakMoukomla/tropical-urban-AlphaEarth (release v1.0-mdpi-r1; accessed on 6 May 2026) and archived on Zenodo with persistent DOI: 10.5281/zenodo.19945781 (https://doi.org/10.5281/zenodo.19945781). Code was developed and tested with Python 3.11.5, scikit-learn 1.4.0, scipy 1.11.4, statsmodels 0.14.0, and earthengine-api 0.1.395 (Earth Engine Python API; Google LLC, Mountain View, CA, USA).

Acknowledgments

The authors acknowledge the support of the Department of Geography, Faculty of Liberal Arts, Thammasat University, and the Research Unit in Geospatial Research and Analytics for Climate and Environment (GRACE Lab) for providing technical and computational support. Satellite data and processing infrastructure were accessed through Google Earth Engine. During the preparation of this manuscript, the authors used Gemini 2.5 Pro for language refinement and structural editing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cai, Y.; Li, B.; Liu, X.; Jiang, X.; Zhu, Y.; Luo, S.; Qin, Y.; Xie, S.; Ye, J.; Shen, H.; et al. Annual 10-m High-Resolution Cropland Maps for Southeast Asia Since 2019 Using AlphaEarth Embeddings; Zenodo: Genève, Switzerland, 2025.
2. Gong, P.; Li, X.; Zhang, W. 40-Year (1978–2017) Human Settlement Changes in China Reflected by Impervious Surfaces from Satellite Remote Sensing. Sci. Bull. 2019, 64, 756–763.
3. Wu, W.; Shao, Z.; Teng, J.; Huang, X.; Zhao, X.; Guo, S. Urban Impervious Surface Extraction Using Seasonal Time Series SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 6033–6045.
4. Puttanapong, N.; Nuengjumnong, N.; SaeJung, J.J.; Moukomla, S. A 36-Year Geospatial Analysis of Urbanization Dynamics and Surface Urban Heat Island Effect: Case Study of the Bangkok Metropolitan Region. Geogr. Sustain. 2025, 6, 100322.
5. Ahmad, M.N.; Shao, Z.; Javed, A. Mapping Impervious Surface Area Increase and Urban Pluvial Flooding Using Sentinel Application Platform (SNAP) and Remote Sensing Data. Environ. Sci. Pollut. Res. Int. 2023, 30, 125741–125758.
6. Kuang, W.; Hou, Y.; Dou, Y.; Lu, D.; Yang, S. Mapping Global Urban Impervious Surface and Green Space Fractions Using Google Earth Engine. Remote Sens. 2021, 13, 4187.
7. Li, X.; Gong, P. An “Exclusion-Inclusion” Framework for Extracting Human Settlements in Rapidly Developing Regions of China from Landsat Images. Remote Sens. Environ. 2016, 186, 286–296.
8. Zhang, F.; Zhang, X.; Chen, W.; Yang, B.; Chen, Z.; Tang, H.; Wang, Z.; Bi, P.; Yang, L.; Li, G.; et al. Cloud-Free Land Surface Temperature Reconstructions Based on MODIS Measurements and Numerical Simulations for Characterizing Surface Urban Heat Islands. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 6882–6898.
9. Liang, J.; Xie, Y.; Sha, Z.; Zhou, A. Modeling Urban Growth Sustainability in the Cloud by Augmenting Google Earth Engine (GEE). Comput. Environ. Urban Syst. 2020, 84, 101542.
10. Zhang, X.; Liu, L.; Zhao, T.; Gao, Y.; Chen, X.; Mi, J. GISD30: Global 30 m Impervious-Surface Dynamic Dataset from 1985 to 2020 Using Time-Series Landsat Imagery on the Google Earth Engine Platform. Earth Syst. Sci. Data 2022, 14, 1831–1856.
11. Chen, J.; Chen, S.; Yang, C.; He, L.; Hou, M.; Shi, T. A Comparative Study of Impervious Surface Extraction Using Sentinel-2 Imagery. Eur. J. Remote Sens. 2020, 53, 274–292.
12. Lasko, K.; O’Neill, F.D. Automated Method for Artificial Impervious Surface Area Mapping in Temperate, Tropical, and Arid Environments Using Hyperlocal Training Data with Sentinel-2 Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 298–314.
13. Zhang, J.; Du, J.; Fang, S.; Sheng, Z.; Zhang, Y.; Sun, B.; Mao, J.; Li, L. Dynamic Changes, Spatiotemporal Differences, and Ecological Effects of Impervious Surfaces in the Yellow River Basin, 1986–2020. Remote Sens. 2023, 15, 268.
14. Attarchi, S. Extracting Impervious Surfaces from Full Polarimetric SAR Images in Different Urban Areas. Int. J. Remote Sens. 2020, 41, 4644–4663.
15. Liang, X.; Lin, Y.; Zhang, H. Mapping Urban Impervious Surface with an Unsupervised Approach Using Interferometric Coherence of SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2734–2744.
16. Feng, S.; Fan, F. Impervious Surface Extraction Based on Different Methods from Multiple Spatial Resolution Images: A Comprehensive Comparison. Int. J. Digit. Earth 2021, 14, 1148–1174.
17. Chen, R.; Li, X.; Zhang, Y.; Zhou, P.; Wang, Y.; Shi, L.; Jiang, L.; Ling, F.; Du, Y. Spatiotemporal Continuous Impervious Surface Mapping by Fusion of Landsat Time Series Data and Google Earth Imagery. Remote Sens. 2021, 13, 2409.
18. Shi, Z.; Li, X.; Hu, T.; Yuan, B.; Yin, P.; Jiang, D. Modeling the Intensity of Surface Urban Heat Island Based on the Impervious Surface Area. Urban Clim. 2023, 49, 101529.
19. Gong, P.; Li, X.; Wang, J.; Bai, Y.; Chen, B.; Hu, T.; Liu, X.; Xu, B.; Yang, J.; Zhang, W.; et al. Annual Maps of Global Artificial Impervious Area (GAIA) between 1985 and 2018. Remote Sens. Environ. 2020, 236, 111510.
20. Xu, H. Analysis of Impervious Surface and Its Impact on Urban Heat Environment Using the Normalized Difference Impervious Surface Index (NDISI). Photogramm. Eng. Remote Sens. 2010, 76, 557–565.
21. Wang, W.; Guo, H.; He, S.; Qi, F.; Samat, A.; Wang, D.; Li, J. AI-Driven Precision Mapping of Tea Plantations Using AlphaEarth Foundations: A Scalable Solution for Smart Agricultural Monitoring. Agriculture 2026, 16, 412.
22. Pascual, A.; Guerra-Hernández, J. Integration of Google’s Alpha Earth Foundations into Biomass Estimation Combined with GEDI Spaceborne Lidar and Field Inventory Data. For. Ecol. Manag. 2026, 606, 123550.
23. Zhu, X.X.; Xiong, Z.; Wang, Y.; Stewart, A.J.; Heidler, K.; Wang, Y.; Yuan, Z.; Dujardin, T.; Xu, Q.; Shi, Y. On the Foundations of Earth Foundation Models. Commun. Earth Environ. 2026, 7, 103.
24. Alvarez, C.I.; Ulloa Vaca, C.A.; Echeverria Llumipanta, N.A. Machine Learning for Urban Air Quality Prediction Using Google AlphaEarth Foundations Satellite Embeddings: A Case Study of Quito, Ecuador. Remote Sens. 2025, 17, 3472.
25. Brown, C.F.; Kazmierski, M.R.; Pasquarella, V.J.; Rucklidge, W.J.; Samsikova, M.; Zhang, C.; Shelhamer, E.; Lahera, E.; Wiles, O.; Ilyushchenko, S.; et al. AlphaEarth Foundations: An Embedding Field Model for Accurate and Efficient Global Mapping from Sparse Label Data. arXiv 2025, arXiv:2507.22291.
26. Arifwidodo, S.D.; Tanaka, T. The Characteristics of Urban Heat Island in Bangkok, Thailand. Procedia Soc. Behav. Sci. 2015, 195, 423–428.
27. Iamtrakul, P.; Padon, A.; Chayphong, S. Quantifying the Impact of Urban Growth on Urban Surface Heat Islands in the Bangkok Metropolitan Region, Thailand. Atmosphere 2024, 15, 100.
28. Nucifera, F.; Riasasi, W.; Yamamoto, Y.; Ichii, K. Utilization of Remote Sensing Data for Thermal Comfort Estimation in the Coastal Urban of Jakarta. In Proceedings of the 2021 4th International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, 30–31 August 2021; pp. 29–34.
29. Santillan, J.R.; Heipke, C. Assessing Patterns and Trends in Urbanization and Land Use Efficiency Across the Philippines: A Comprehensive Analysis Using Global Earth Observation Data and SDG 11.3.1 Indicators. PFG – J. Photogramm. Remote Sens. Geoinf. Sci. 2024, 92, 569–592.
30. Xu, L.; Sun, G.; Zhang, A.; Han, Z.; Li, Z.; Zhao, Y. Automatic Mapping of High-Resolution Impervious Surfaces Driven by Hierarchical Adaptive Features. Sci. Remote Sens. 2025, 12, 100300.
31. Son, N.T.; Chen, C.F.; Chen, C.R.; Thanh, B.X.; Vuong, T.H. Assessment of Urbanization and Urban Heat Islands in Ho Chi Minh City, Vietnam Using Landsat Data. Sustain. Cities Soc. 2017, 30, 150–161.
32. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-Scale Geospatial Analysis for Everyone. Remote Sens. Environ. 2017, 202, 18–27.
33. Brown, C.F.; Brumby, S.P.; Guzder-Williams, B.; Birch, T.; Hyde, S.B.; Mazzariello, J.; Czerwinski, W.; Pasquarella, V.J.; Haertel, R.; Ilyushchenko, S. Dynamic World, Near Real-Time Global 10 m Land Use Land Cover Mapping. Sci. Data 2022, 9, 251.
34. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
35. Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good Practices for Estimating Area and Assessing Accuracy of Land Change. Remote Sens. Environ. 2014, 148, 42–57.
36. Zhong, C.; Peng, L.; Yu, J.; Swan, I.; Li, H. Toward More Reliable, Complete, and Equitable Global Urban Land Use Efficiency Assessments. Commun. Earth Environ. 2025, 6, 1055.
37. Pesaresi, M.; Schiavina, M.; Politis, P.; Freire, S.; Krasnodębska, K.; Uhl, J.H.; Carioli, A.; Corbane, C.; Dijkstra, L.; Florio, P. Advances on the Global Human Settlement Layer by Joint Assessment of Earth Observation and Population Survey Data. Int. J. Digit. Earth 2024, 17, 2390454.
38. Imhoff, M.L.; Zhang, P.; Wolfe, R.E.; Bounoua, L. Remote Sensing of the Urban Heat Island Effect across Biomes in the Continental USA. Remote Sens. Environ. 2010, 114, 504–513.

Figure 1. Location of the five study cities in Southeast Asia: Bangkok (Thailand), Jakarta (Indonesia), Manila (Philippines), Kuala Lumpur (Malaysia), and Ho Chi Minh City (Vietnam). The inset shows the global location of the study region. Base map from Natural Earth.

Figure 2. Relationship between mean cloud-cover and AlphaEarth classification performance in the five Southeast Asian cities (2020 and 2023). The weak relationship (r = −0.069, p = 0.851) indicates no detectable cloud-performance dependence within this fixed-seed protocol.

Figure 3. Impervious surface change during 2017–2024 across five tropical megacities.

Table 1. Study city characteristics.

City	Population (Million)	Area (km²)	Density (1000/km²)	Cloud-Cover (%)	Urban Pattern
Bangkok	10.5	1569	6.7	50–51	Sprawling, mixed
Jakarta	10.6	664	16.0	54–63	Dense, vertical
Manila	13.9	619	22.5	53–59	Very dense, informal
Kuala Lumpur	1.8	243	7.4	67–71	Planned, controlled
Ho Chi Minh City	9.0	2095	4.3	55–61	Expanding, mixed

Table 2. Cloud-cover statistics and operational data availability (2020–2023).

City	Mean Cloud (%)	Usable Images (%)	Best Image (%)	Total Images
Kuala Lumpur	68.9	6.0	6.3	296
Jakarta	58.8	19.1	0.7	572
Ho Chi Minh City	58.1	18.6	0.0	872
Manila	55.8	16.7	1.6	291
Bangkok	50.9	27.4	0.0	442
Average	58.5	17.6	1.7	2473

Table 3. Overall AlphaEarth performance (all cities, 2018–2023).

Metric	Pervious Class	Impervious Class	Overall
Precision	0.9389	0.9500	—
Recall	0.8888	0.9733	—
F1 Score	0.9132	0.9615	0.9467
IoU	—	—	0.9259
Accuracy	—	—	0.9467
Support (samples)	2384	5170	7554

Table 4. AlphaEarth cross-city performance vs. cloud-cover.

City	Cloud (%)	Usable (%)	IoU (%)	F1 (%)	Samples
Manila	55.8	16.7	93.7	96.8	5992
Jakarta	58.8	19.1	93.1	96.4	6000
Kuala Lumpur	68.9	6.0	88.2	93.8	6000
Bangkok	50.9	27.4	87.9	93.5	3392
Ho Chi Minh City	58.1	18.6	83.1	90.8	3794
Mean	58.5	17.6	89.2	94.3	25,178
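For a single class, F1 and IoU are linked by the identity IoU = F1 / (2 − F1), because F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The short check below is a sketch that assumes the IoU and F1 columns of Table 4 refer to the same class. It compares the tabulated pairs against the identity. The function name is illustrative.

```python
def iou_from_f1(f1):
    """Single-class IoU implied by an F1 score (both expressed as fractions)."""
    return f1 / (2.0 - f1)


# (city, F1 %, IoU %) as listed in Table 4.
table4 = [
    ("Manila", 96.8, 93.7),
    ("Jakarta", 96.4, 93.1),
    ("Kuala Lumpur", 93.8, 88.2),
    ("Bangkok", 93.5, 87.9),
    ("Ho Chi Minh City", 90.8, 83.1),
]

for city, f1_pct, iou_pct in table4:
    implied = 100.0 * iou_from_f1(f1_pct / 100.0)
    print(f"{city}: tabulated IoU {iou_pct:.1f}%, implied by F1 {implied:.1f}%")
```

Each implied value differs from the tabulated IoU by less than 0.2 percentage points, a gap that rounding of the one-decimal F1 and IoU values can account for.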

Table 5. Cloud variation and fixed-seed classification point estimates for 2020 and 2023.

City	2020 Cloud (%)	2023 Cloud (%)	ΔCloud	2020 IoU (%)	2023 IoU (%)
Jakarta	63.3	54.2	−9.1	93.1	93.1
Kuala Lumpur	66.7	71.1	+4.4	88.2	88.2
Manila	52.9	58.6	+5.7	93.7	93.7
Ho Chi Minh City	55.3	60.9	+5.6	83.1	83.1
Bangkok	50.8	51.0	+0.2	87.9	87.9

Table 6. Ablation study results: method performance and operational reliability.

Method	Mean IoU	Std IoU	Mean F1	Mean Acc	Success Rate
AlphaEarth (multi-temporal)	0.904	0.054	0.949	0.939	100% (5/5)
Best Single S2 (RF)	0.842	0.076	0.913	0.882	60% (3/5)
Random Single S2 (RF)	0.777	0.078	0.873	0.821	80% (4/5)
NDVI Threshold	0.624	0.195	0.753	0.737	100% (5/5)

Table 7. IoU by method and city—operational reality (2023).

City	AlphaEarth	Best S2	Random S2	NDVI
Jakarta	94.05%	90.22%	Failed ^†	84.14%
Manila	93.89%	89.96%	89.12%	72.07%
Bangkok	92.42%	Failed ^†	71.98%	46.23%
Kuala Lumpur	90.46%	82.17%	75.61%	71.62%
Ho Chi Minh City	81.00%	74.29%	73.97%	37.68%
Mean	90.36%	84.16%	77.67%	62.35%
Success	5/5	3/5	4/5	5/5

^† Failed = Insufficient samples obtained (<100) preventing model training.

Table 8. Operational feasibility assessment.

Method	Mean IoU	Coverage	Success Rate	Availability-Weighted IoU ^†	Viable?
AlphaEarth	90.4%	100%	100%	90.4%	Yes
NDVI Threshold	62.4%	100%	100%	62.4%	No ^‡
Random S2	77.7%	100%	80%	62.2%	Marginal
Best S2	84.2%	17.6%	60%	8.9%	No

^† Availability-weighted IoU is a heuristic operational screening index, not a physical metric. ^‡ Although data are always available, accuracy is insufficient for planning applications.
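The availability-weighted IoU values in Table 8 are reproduced by multiplying mean IoU by coverage and by success rate. The sketch below assumes that this product is how the index is defined. The function name is illustrative.

```python
def availability_weighted_iou(mean_iou_pct, coverage, success_rate):
    """Heuristic screening index: mean IoU (%) scaled by two availability fractions."""
    return mean_iou_pct * coverage * success_rate


# (mean IoU %, coverage fraction, success-rate fraction) from Table 8.
methods = {
    "AlphaEarth": (90.4, 1.00, 1.00),
    "NDVI Threshold": (62.4, 1.00, 1.00),
    "Random S2": (77.7, 1.00, 0.80),
    "Best S2": (84.2, 0.176, 0.60),
}

for name, (iou, coverage, success) in methods.items():
    print(f"{name}: {availability_weighted_iou(iou, coverage, success):.1f}%")
```

Because the index multiplies two availability terms, a method with high accuracy but sparse usable imagery (Best S2) is penalized far more than one with lower accuracy and full coverage.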

Table 9. Cross-city transferability (Leave-One-City-Out validation).

Test City	Training Cities	IoU (%)	F1 (%)	Performance Drop
Manila	B, J, K, H	93.7	96.8	−0.3%
Jakarta	B, M, K, H	93.1	96.4	+0.5%
Kuala Lumpur	B, J, M, H	88.2	93.8	−4.3%
Bangkok	J, M, K, H	87.9	93.5	−4.7%
Ho Chi Minh City	B, J, M, K	83.1	90.8	−9.5%
Average	—	89.2	94.3	−3.4%

Note: B = Bangkok, J = Jakarta, M = Manila, K = Kuala Lumpur, H = Ho Chi Minh City. Overall (all cities training): 92.6% IoU.
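The leave-one-city-out protocol in Table 9 can be sketched as a fold generator: each city is held out in turn while the other four supply the training samples. The data structure and names below are hypothetical placeholders, and the classifier itself is omitted.

```python
def leave_one_city_out(samples_by_city):
    """Yield (test_city, train_rows, test_rows) with one city held out per fold."""
    for test_city, test_rows in samples_by_city.items():
        train_rows = [
            row
            for city, rows in samples_by_city.items()
            if city != test_city
            for row in rows
        ]
        yield test_city, train_rows, test_rows


# Placeholder rows of the form (city_code, sample_id). Real rows would carry
# the embedding features and a pervious/impervious label.
samples_by_city = {code: [(code, i) for i in range(3)] for code in "BJMKH"}

folds = list(leave_one_city_out(samples_by_city))
for test_city, train_rows, test_rows in folds:
    train_cities = sorted({row[0] for row in train_rows})
    print(f"test={test_city} train={','.join(train_cities)} n_train={len(train_rows)}")
```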

Table 10. Summary of ISA expansion and mean SUHI intensity (2017–2024) for the five study cities.

City	ISA Change, 2017–2024	ISA Slope (pp yr⁻¹; 95% CI)	Mean SUHI, 2017–2024 (°C; 95% CI)
Bangkok	+7.4 pp	1.27 (0.85–1.69)	4.01 (3.68–4.34)
Jakarta	+4.1 pp	0.63 (0.41–0.85)	6.42 (5.87–6.96)
Manila	+4.7 pp	0.52 (0.32–0.73)	8.51 (8.10–8.92)
Kuala Lumpur	+3.3 pp	0.79 (−0.34 to 1.92)	8.07 (7.69–8.44)
Ho Chi Minh City	+11.0 pp	1.48 (1.06–1.91)	4.17 (3.88–4.46)
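The Supplementary Materials list Theil–Sen and Kendall trend tests among the local analyses. The text here does not state which estimator produced the slopes in Table 10, so the following is only a generic sketch of the two statistics on an annual ISA series. It uses the standard library only, and the input series is synthetic, not the study data.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(years, values):
    """Median of all pairwise slopes (units of values per year)."""
    slopes = [
        (v2 - v1) / (y2 - y1)
        for (y1, v1), (y2, v2) in combinations(zip(years, values), 2)
        if y2 != y1
    ]
    return median(slopes)


def mann_kendall_s(values):
    """Mann-Kendall S: count of increasing pairs minus decreasing pairs."""
    return sum((b > a) - (b < a) for a, b in combinations(values, 2))


# Synthetic ISA fractions (%) rising by exactly 1.5 pp per year.
years = list(range(2017, 2025))
isa_pct = [50.0 + 1.5 * (year - 2017) for year in years]

print(f"Theil-Sen slope: {theil_sen_slope(years, isa_pct):.2f} pp per year")
print(f"Mann-Kendall S: {mann_kendall_s(isa_pct)} of a possible 28")
```

With only eight annual values there are 28 pairwise slopes, so the median-based estimate is robust to a single anomalous year but has wide uncertainty, as the interval for Kuala Lumpur in Table 10 illustrates.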

Table 11. Spatial consistency between AlphaEarth ISA fraction and JRC GHSL building footprint fraction at 1 km resolution.

City	Pearson r (95% CI)	Overall Agreement (95% CI)	RMSE
Jakarta	0.936 (0.928–0.943)	73.3% (70.4–76.0)	0.429
Manila	0.930 (0.921–0.938)	84.5% (82.1–86.6)	0.338
Kuala Lumpur	0.908 (0.897–0.918)	72.3% (69.5–75.0)	0.381
Bangkok	0.907 (0.895–0.917)	59.2% (56.1–62.2)	0.472
Ho Chi Minh City	0.866 (0.849–0.881)	75.9% (73.1–78.4)	0.376
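Intervals of the kind reported in Table 11 are commonly obtained with a Fisher z-transform for Pearson r and a Wilson score interval for an agreement proportion; both are named in the description of Code S3. The sketch below implements the two textbook formulas. The sample size used in the example is a hypothetical placeholder, because the number of 1 km cells per city is not given in this section.

```python
import math

Z95 = 1.959964  # two-sided 95% standard-normal critical value


def fisher_z_ci(r, n, z_crit=Z95):
    """Confidence interval for Pearson r via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def wilson_ci(successes, n, z_crit=Z95):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z_crit ** 2 / n
    centre = (p + z_crit ** 2 / (2 * n)) / denom
    half = z_crit * math.sqrt(p * (1 - p) / n + z_crit ** 2 / (4 * n * n)) / denom
    return centre - half, centre + half


n_cells = 1000  # hypothetical number of 1 km cells, for illustration only
r_lo, r_hi = fisher_z_ci(0.90, n_cells)
a_lo, a_hi = wilson_ci(successes=750, n=n_cells)
print(f"r = 0.90, 95% CI = ({r_lo:.3f}, {r_hi:.3f})")
print(f"agreement = 75.0%, 95% CI = ({100 * a_lo:.1f}%, {100 * a_hi:.1f}%)")
```

Both intervals assume independent samples; spatial autocorrelation between neighbouring 1 km cells would make the true intervals wider than these formulas suggest.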

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

