Technical Note

A Machine Learning Algorithm to Convert Geostationary Satellite LST to Air Temperature Using In Situ Measurements: A Case Study in Rome and High-Resolution Spatio-Temporal UHI Analysis

Institute of Atmospheric Sciences and Climate (ISAC), Italian National Research Council (CNR), 00133 Rome, Italy
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(3), 468; https://doi.org/10.3390/rs17030468
Submission received: 5 December 2024 / Revised: 10 January 2025 / Accepted: 23 January 2025 / Published: 29 January 2025

Abstract

Air temperature ($T_a$) measurements are crucial for characterizing phenomena like the urban heat island (UHI), which can create critical conditions in cities during summer. This study aims to develop a machine learning-based model, namely gradient boosting, to estimate $T_a$ from geostationary satellite land surface temperature (LST) data and to apply these estimates to investigate UHI dynamics. Using Rome, Italy, as a case study, the model was trained with $T_a$ data from 15 weather stations, taking multi-temporal LST values (instantaneous and lagged up to 4 h) and additional predictors. The model achieved an overall RMSE of 0.9 °C. The resulting $T_a$ fields, with a 3 km spatial and hourly temporal resolution, enabled a detailed analysis of UHI intensity and dynamics during the summers of 2019–2020, significantly enhancing the spatial and temporal detail compared to previous studies based solely on in situ data. The results also revealed a slightly higher nocturnal UHI intensity than previously reported, attributed to the inclusion of rural areas with near-zero imperviousness, which is made possible by the complete mapping of $T_a$ across the domain.

1. Introduction

The urban heat island (UHI) phenomenon, caused by urbanization, results in a warmer microclimate in cities compared to surrounding rural areas by significantly altering the thermal, radiative, and circulation properties of an area, thus disrupting its energy balance [1]. The built environment absorbs, retains, and produces more heat than the natural landscape it replaces. The UHI impacts health, the economy, the environment, energy consumption, air quality, and bioclimatic stress [2,3]. There are two main UHI types: surface (SUHI), observed via remote sensing through land surface temperature (LST) data, and atmospheric (AUHI), measured by air temperature sensors. The main difference between an AUHI and an SUHI is that an SUHI generally shows higher intensity, with distinctly positive values even during the daytime [1,4]. In the context of urban climate change, the AUHI, especially within the urban canopy layer (UHICL, hereafter referred to as UHI), is particularly significant [2], as it is the phenomenon that directly impacts human health and activities. As urban populations grow, projected to reach 5.2 billion by 2030 [5], and heatwaves intensify due to global warming [6], the interaction between heatwaves and UHIs has garnered increasing attention. Extreme urban heat poses significant public health risks [2]. A recent study for the city of Rome (Italy) demonstrated how a dense network of in situ meteorological stations can be used to enhance the analysis of the UHI spatiotemporal dynamics, through a methodology that relates temperature to imperviousness; the latter, measured by satellite, is a good proxy for the level of urbanization. The results show a significant correlation between the two parameters during nighttime hours, when UHI intensity peaks [1].
This methodology has allowed for a step forward in UHI analysis, as it not only provides an overall estimate of UHI intensity but also enables the characterization of its intensity at the neighborhood scale, at least in those areas where temperature sensors are present.
It is evident that the denser the network of meteorological stations used, the more the imperviousness methodology allows for a detailed investigation of the intensity and spatiotemporal dynamics of the phenomenon. However, in general, in situ data are sparsely distributed and do not uniformly cover the city and the surrounding countryside. Additionally, installing and maintaining numerous stations is costly.
The present study aims to extend and improve the previous analysis of the UHI in Rome during the summer months by incorporating not only in situ data, but also the land surface temperature (LST) data measured by geostationary satellites. Geostationary orbit is chosen because it is the only one that allows for continuous temporal measurement of LST, with excellent temporal resolution (up to 15 min). In contrast, sun-synchronous orbit satellites, despite providing LST with greater spatial resolution, have poor temporal resolution (typically on the order of a day), which is not adequate for studying UHI time dynamics. However, LST is a different parameter from the air temperature ($T_a$) measured by in situ weather stations and defines another type of UHI, namely the surface UHI (SUHI).
Using LST data, the characterization of the UHI spatiotemporal dynamics can be significantly improved in terms of spatial coverage and accuracy, but a conversion from LST to $T_a$ is needed first.
The first study addressing the problem was conducted by Chen et al. in 1983 [7], employing a simple linear regression model to establish the relationship between LST measured by geostationary GOES satellites (resolution 8 km, 1 h) and $T_a$ at 1.5 m during four winter periods (1978–1981) in Florida, USA. They reported an average correlation coefficient of 0.87 and a sample standard deviation from the regression of 1.57 °C. Bechtel et al. [8] conducted a similar study for the city of Hamburg, Germany, using a multiple linear regression model. In this case, the LST measured by MSG-SEVIRI was introduced using a multitemporal approach, including time-lagged values. The results demonstrated that this methodology, applied to six $T_a$ measurement stations, reduced the RMSE to values between 1.5 and 1.8 °C. A dynamic multiple linear regression model was also used by Good in a 2015 study [9] to estimate daily maximum and minimum $T_a$ values across Europe for the years 2012–2013. That study utilized LST measurements from MSG-SEVIRI, achieving RMSE values between 2.3 and 2.7 °C. Pichierri et al. [10] used the brightness temperature measured by the MODIS instrument aboard sun-synchronous satellites to estimate $T_a$ in Milan, Italy, for the years 2007–2010. They developed regression algorithms tailored to specific case studies (e.g., urban and rural environments), achieving RMSE values between 1.2 and 1.8 °C.
Before the advent of machine learning, however, the literature did not rely solely on regression models. For instance, two studies from 2011 [11,12] employed the Temperature Vegetation Index (TVX) method, in which it is assumed that NDVI has a linear correlation with LST, and $T_a$ is also linearly correlated with the full-cover NDVI.
The first study to use a machine learning approach on LST measured by a geostationary satellite was conducted by Agathangelidis et al. in 2016 [13]. They estimated $T_a$ in Athens, Greece, during the summers of 2014–2015 using a neural network. The LST data, provided by MSG-SEVIRI, were first downscaled to a 1 km resolution. The study demonstrated that the neural network outperformed a polynomial regression model and that the multitemporal approach for LST significantly improved both models. With the neural network, an RMSE between 1.0 and 1.2 °C was achieved.
Machine learning underpins the most recent studies addressing the conversion of LST into $T_a$. However, these studies often rely on LST data from sun-synchronous satellites, which lack the temporal resolution required for the objectives of the present study. For example, Rochelle Schneider dos Santos [14] used MODIS LST to estimate daily maximum $T_a$ in London, UK, during the summers of 2006–2017, employing a gradient boosting algorithm and achieving an RMSE of 2 °C. Carrión et al. [15] used this algorithm to reconstruct hourly $T_a$ based on an LST dataset containing four measurements per day, collected over 13 U.S. states during the period 2003–2019. The dataset was derived from various satellites equipped with MODIS, and the reconstruction achieved an RMSE of 1.6 °C. Similarly, Duy-Phien et al. [16], using MODIS data and gradient boosting, reconstructed monthly average air temperatures in Taiwan with an RMSE of 0.6 °C. Finally, Wang et al. [17] applied a random forest algorithm to MODIS data to calculate $T_a$ in the Jingjinji area of China for the years 2018–2019. They estimated both the daily average and instantaneous values at the time of the satellite overpass, achieving RMSE values of 1.3 °C and between 1.9 and 2.5 °C, respectively.
The main goal of the present study is to convert LST measured by EUMETSAT MSG satellites into $T_a$ over Rome, with the specific aim of enabling a more accurate investigation of the urban heat island (UHI). This conversion is achieved in areas where in situ measurements are not available, using a gradient boosting algorithm, which is well-suited to regression problems and offers the advantage of shorter computational times compared to methods like random forests. By transforming the LST grid into $T_a$, a homogeneous dataset is generated across the entire area of interest, addressing the issue of sparse in situ measurements. This approach enables a consistent characterization of the UHI across the city, allowing detailed analysis at the neighborhood level. We focus on the two summers of 2019 and 2020, as this period aligns with the previous study by Cecilia et al. [18], enabling direct comparison of the results. The dataset used includes various independent variables (predictors) and the dependent variable ($T_a$), with hourly mean temporal resolution; the key predictor is the LST. The study area is divided into cells, some of which contain meteorological stations providing $T_a$ data, while others do not. Our objective is to estimate $T_a$ for the cells without stations, using the following predictors: elevation, imperviousness density, tree cover density, land cover, grassland, the NDVI, LST, and lagged LST values from 1 to 4 h backward. Additionally, we include the cell ID, latitude, longitude, hour, and month. The lagged LST values account for any delayed effects of LST on air temperature. Because LST cannot be measured in cloudy conditions, we focus only on clear-sky periods to ensure data consistency. Additionally, we apply a synoptic filter to exclude days with unstable weather, concentrating on anticyclonic days, which are ideal for characterizing the UHI effect.

2. Materials and Methods

2.1. Study Area

The study area consists of the city of Rome and its surrounding countryside. More precisely, the chosen spatial domain is shown in Figure 1, with its perimeter indicated by the white line. The district of Rome, with 2.83 million inhabitants, is the most populous in Italy and the third in the European Union [19]; its surface of 1287.36 km² is the largest in Italy and in the European Union. It is located in the central-western part of the Italian Peninsula and is characterized by a complex orography [20], bordering the anti-Apennine group of the Monti della Tolfa and Monti Sabatini to the northwest, the Lazio sub-Apennines to the east, the Alban Hills to the southeast and the Tyrrhenian Sea to the west; it is also crossed by the narrow Tiber Valley. The area on which this study focuses is that of the city of Rome, enclosed by the A90 Grande Raccordo Anulare (GRA) motorway (highlighted by the green line in Figure 1), with an elevation between 13 and 120 m a.s.l. The city has a great variety of buildings and an alternation of green and built areas, which gives rise to a multiplicity of very localized microclimates. Rome has a Mediterranean climate (namely, Csa in the Köppen climate classification [21]) [22] with mild winters and warm to hot summers: maximum temperatures during summer in the urban center can easily exceed 32 °C; in this season, the short distance of about 25 km from the Tyrrhenian Sea allows the mitigating effect of the sea breeze to be always present in the daytime, at least for sunny and stable days. The effect of the sea breeze, blowing from the WSW, is more significant in the western districts of Rome, resulting in a temperature difference of up to 3 °C between the eastern and western areas of the city.

2.2. Conceptualization

The dataset structure used to build the gradient boosting algorithm comprises a series of measurements $\underline{x} = (x_1, \ldots, x_M)$ of independent variables (predictors), with the dependent variable being $y = T_a$, and with a temporal resolution of one hour. The domain $D$ is divided into $N_c$ cells (pixels) $c_i$ according to the LST measurement pixels, $D = \{c_i\}$, $i = 1, \ldots, N_c$. We categorize these cells into two classes: those containing at least one weather station, forming the subset $D_{ws}$, and those without any station, forming the subset $D_0$, with $D_{ws} \cup D_0 = D$. This division is illustrated in Figure 1a. Both subsets have LST measurements, whereas only $D_{ws}$ also includes $T_a$ data. The objective is to estimate $T_a$ for every cell in $D_0$ at each time point during the study period. This approach aims to create a measurement grid identical to that of LST but populated with $T_a$ data. Therefore, the algorithm can be viewed as a black box that inputs an LST measurement grid with a 3 km resolution at a given time $t$ and the $T_a$ measurements from the in situ network at the same time $t$, and outputs the same grid but with air temperature data, using additional independent variables (predictors) for assistance. The chosen predictors are elevation, imperviousness density, tree cover density, land cover, grassland, NDVI, LST, LST-1, LST-2, LST-3, LST-4, cell, latitude, longitude, hour, and month. Here, LST-X denotes the LST lagged by X hours. These variables enable the model to use not only LST but also the terrain morphology, land properties and usage, as well as spatial and temporal information.
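As an illustration of how the lagged predictors LST-1 to LST-4 could be assembled for a single cell, the following minimal sketch builds one predictor row per hour. The function name `lagged_lst_rows`, the use of `None` for cloud-masked hours, and the choice to skip any row with a missing value are assumptions of this sketch, not details taken from the study.

```python
from typing import Optional

def lagged_lst_rows(lst_hourly: list[Optional[float]], max_lag: int = 4) -> list[dict]:
    """Build one predictor row per hour with LST and LST-1 .. LST-max_lag.

    lst_hourly holds consecutive hourly LST values for a single cell;
    None marks a cloud-masked (missing) hour. Rows with any missing
    value in the window are skipped (an assumption of this sketch).
    """
    rows = []
    for t in range(max_lag, len(lst_hourly)):
        window = lst_hourly[t - max_lag : t + 1]  # oldest value first, current value last
        if any(v is None for v in window):
            continue
        row = {"t": t, "LST": window[-1]}
        for lag in range(1, max_lag + 1):
            row[f"LST-{lag}"] = window[-1 - lag]
        rows.append(row)
    return rows

# Hypothetical hourly series for one cell (°C), with one cloud-masked hour.
series = [20.0, 21.0, 23.0, None, 27.0, 28.0, 29.0, 30.0, 31.0]
rows = lagged_lst_rows(series)
```

With this series, only the last hour has a complete 4 h history, so a single row is produced; the masked hour removes every row whose window overlaps it.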

2.3. Input Data

This study used the following input data.
  • LST: land surface temperature, measured by the Spinning Enhanced Visible and InfraRed Imager (SEVIRI) instrument mounted on the Meteosat Second Generation (MSG) satellites launched by the European Space Agency (ESA). Instead of using the raw SEVIRI data, the operational LST product, code MLST [LSA-001], produced by the EUMETSAT Satellite Application Facility on Land Surface Analysis (LSA SAF) was used (https://landsaf.ipma.pt/en/data/products/land-surface-temperature-and-emissivity/, last accessed on 14 March 2024), in which the pixels containing clouds are masked. The overall accuracy of the LST product is about 2 K [23], and its spatial and temporal resolutions are 3 km and 15 min, respectively.
  • DTM: digital terrain model, i.e., elevation a.s.l. in meters. Data extracted from the EU-DEM product [24], with a spatial resolution of 25 m and an accuracy of ±7 m RMSE. The year of measurement is 2010.
  • Imperviousness Density: density of impervious surfaces. It represents the percentage of an area covered by artificially sealed surfaces, such as roads or buildings. It ranges from 0% to 100%, with a resolution of 1%, and is provided with a spatial resolution of 10 m. The year of measurement is 2018.
  • Tree Cover Density: represents the percentage presence of trees. Like imperviousness, it ranges from 0% to 100% with a step of 1%, and is provided with a spatial resolution of 10 m. The year of measurement is 2018.
  • Corine Land Cover: represents an inventory of land cover and land use, with 44 thematic classes ranging from extensive forest areas to single vineyards. Therefore, it is a discrete dataset based on categories, with a spatial resolution of 100 m, and the year of measurement is 2018.
  • Grassland: A binary pan-European product that provides a presence/absence mask of grasslands. Therefore, it can only take two values, 0 or 1, and is provided with a spatial resolution of 10 m, and the year of measurement is 2018.
  • NDVI: The normalized difference vegetation index (NDVI) quantifies the presence and health of vegetation cover. The data used here were recorded by the MODIS (Moderate Resolution Imaging Spectroradiometer) instrument aboard the Terra satellite, and are part of the MODIS/Terra Vegetation Indices Monthly L3 Global 1km SIN Grid V006 dataset [25], with spatial and temporal resolutions of 1 km and 1 month, respectively. The value range varies from −0.2 to 1.0, with a resolution of 0.0001. When the value is 0, there is no vegetation, while increasingly positive values indicate increasing vegetation density, peaking at 1.0 for very dense and healthy vegetation. Negative values indicate the presence of water bodies.
  • Air temperature: This data is provided by in situ measurements from 15 weather stations, indicated by the orange triangles in Figure 1. These stations are part of the ASTI-Network, established within the framework of the LIFE-ASTI project. The network includes stations from the Meteo Lazio amateur network [26], as well as additional stations installed specifically during the project to cover areas lacking data.
The Imperviousness Density, Tree Cover Density, Corine Land Cover, and Grassland variables are sourced from the Copernicus Land Monitoring Service [27]. Some of these data are shown in Figure 2.
All data were mapped onto the LST grid, with a spatial resolution of 3 km per pixel, as the LST grid has the largest pixel size among the datasets. Each grid cell includes the spatial and temporal predictors, along with the target variable $T_a$, calculated as the average of in situ measurements within the cell. The temporal resolution was standardized to one hour, with the addition of LST lag values at 1, 2, 3, and 4 h. The final dataset was prepared for input into the machine learning model, aiming to estimate $T_a$ in grid cells without direct measurements. The LST dataset inherently provides data only for clear-sky conditions, as cloud-covered pixels are assigned as missing values. This limitation reduced the dataset to 72% of its original size but does not pose an issue for this study, which focuses on summer months dominated by anticyclonic conditions with clear skies. Furthermore, such conditions are ideal for urban heat island (UHI) development. Additionally, a synoptic filter was applied manually by consulting meteorological charts, retaining only days with stable anticyclonic conditions.
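The temporal and spatial harmonization described above can be sketched as follows. This is only an illustrative example: the helper names `hourly_mean` and `cell_mean_ta`, the sample values, and the rule of averaging whatever clear-sky 15 min slots remain in an hour are assumptions, since the text does not specify how partially masked hours were treated.

```python
from collections import defaultdict
from statistics import mean

def hourly_mean(samples):
    """Average (hour, value) pairs per hour; None marks a cloud-masked 15 min slot."""
    buckets = defaultdict(list)
    for hour, value in samples:
        if value is not None:
            buckets[hour].append(value)
    # Hours in which every slot is masked do not appear in the result.
    return {hour: mean(values) for hour, values in buckets.items()}

def cell_mean_ta(station_ta):
    """Target T_a of a cell: mean over the stations located inside it."""
    return mean(station_ta)

# Hypothetical 15 min LST samples (°C) for one cell, tagged with their hour.
lst_15min = [(0, 20.0), (0, 21.0), (0, None), (0, 22.0),
             (1, None), (1, None), (1, None), (1, None),
             (2, 24.0), (2, 26.0), (2, 25.0), (2, 25.0)]
lst_hourly = hourly_mean(lst_15min)

# Hypothetical simultaneous readings (°C) from two stations in the same cell.
ta_cell = cell_mean_ta([27.5, 28.5])
```

In this example hour 1 is fully cloud-masked and is therefore absent from `lst_hourly`, mirroring the way missing LST values reduce the size of the final dataset.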

2.4. Setting the Gradient-Boosting Algorithm

The gradient-boosting algorithm was trained using a dataset representing the spatial domain $D$, with predictors and target values available only for $D_{ws} \subset D$.
The learning sample $L$ was constructed by extracting rows containing the target variable $T_a$. This dataset was split into 80% for training ($L_1$) and 20% for testing ($L_2$) using a random selection method. Model performance was evaluated on $L_2$ using RMSE as the error metric, both as an overall score and through a spatiotemporal analysis to identify error patterns. Predictor importance was assessed based on their frequency of use in regression tree splits, and correlations between predictors were analyzed.
Finally, the trained model was applied to estimate air temperature in cells lacking measurements ($c_i \in D_0$). The resulting dataset, now complete with $T_a$ estimates, was exported and further processed to generate statistical analyses and interpolated maps supporting the study of the urban heat island (UHI).
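The workflow above (random 80/20 split, boosted regression trees, RMSE on the held-out part) can be illustrated with a small, dependency-free sketch. It is not the implementation used in the study: it boosts depth-1 trees (stumps) under squared loss, uses synthetic data with only two predictors, and all function names, coefficients, and hyperparameter values are assumptions chosen for illustration.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that both sides of the split are non-empty.
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def predict_stump(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm

def fit_gbm(X, y, n_estimators=40, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_gbm(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)

def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

# Synthetic stand-in for the learning sample L (not the study's data).
random.seed(0)
X, y = [], []
for _ in range(100):
    lst = random.uniform(20.0, 45.0)          # current LST, °C
    lst_1 = lst - random.uniform(-2.0, 3.0)   # LST one hour earlier, °C
    X.append([lst, lst_1])
    y.append(5.0 + 0.6 * lst_1 + random.gauss(0.0, 0.5))

# Random 80/20 split into L1 (training) and L2 (testing).
idx = list(range(len(X)))
random.shuffle(idx)
train_idx, test_idx = idx[:80], idx[80:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

model = fit_gbm(X_train, y_train)
test_rmse = rmse(y_test, [predict_gbm(model, row) for row in X_test])
baseline_rmse = rmse(y_test, [sum(y_train) / len(y_train)] * len(y_test))
print(f"test RMSE = {test_rmse:.2f} °C, mean-only baseline = {baseline_rmse:.2f} °C")
```

Comparing the test RMSE with a mean-only baseline is a simple sanity check that the boosted model has learned something from the predictors. A production setup would normally rely on an established gradient-boosting library with deeper trees rather than this hand-written version.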

2.5. Error Estimation and Hyperparameters

RMSE was chosen as the error metric. Since it is the most frequently used metric in the literature, this choice allows for direct comparison with most similar studies. One of the main advantages of using RMSE is that it is expressed in the same unit of measurement as the target variable, thus providing a more direct and intuitive estimate of the error. Moreover, the main hyperparameters, namely the number of estimators, learning rate, max depth, minimum child weight, subsample, and colsample by tree, were optimized using a cross-validation procedure.
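For reference, the standard definition of RMSE over a test set of $n$ samples can be written as below, where $\hat{T}_{a,k}$ denotes the model estimate and $T_{a,k}$ the corresponding in situ measurement. The index $k$ and the hat notation are introduced here for illustration and do not appear in the original text.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( \hat{T}_{a,k} - T_{a,k} \right)^{2}}
```

Because the squared differences are averaged before the square root is taken, the result is expressed in °C, like $T_a$ itself.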

2.6. Calculating UHI Intensity

For estimating the intensity of the UHI and SUHI, we used the imperviousness method [18,28], which assumes a linear relationship between temperature and imperviousness. In this way, the intensity of the UHI can be obtained from the temperature–imperviousness slope, corresponding to the difference between the temperature in areas with 100% imperviousness (densely urbanized) and 0% (rural). In practice, to estimate UHI intensity, a linear fit is performed at each time instant between the temperatures in each pixel and the corresponding imperviousness values, and the UHI intensity is derived from the resulting line slope.
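The slope-based estimate described above can be sketched for a single time instant as follows. The function name `uhi_intensity`, the sample values, and the use of an ordinary least-squares fit are assumptions of this sketch; imperviousness is taken in percent, so the intensity is the fitted slope multiplied by 100.

```python
def uhi_intensity(temps, imperviousness):
    """Temperature difference between 100% and 0% imperviousness from a least-squares line."""
    n = len(temps)
    mean_x = sum(imperviousness) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in imperviousness)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imperviousness, temps))
    slope = sxy / sxx  # °C per percentage point of imperviousness
    return slope * 100.0

# Hypothetical pixel values at one time instant.
imp = [0.0, 25.0, 50.0, 75.0, 100.0]   # imperviousness, %
ta = [20.0, 21.0, 22.0, 23.0, 24.0]    # air temperature, °C
intensity = uhi_intensity(ta, imp)
```

Repeating the fit for every hour yields a diurnal cycle of intensity; applying the same function to LST instead of air temperature gives the corresponding SUHI estimate.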

3. Results

3.1. Error Analysis

The gradient-boosting algorithm yields an overall RMSE value of 0.9 °C, calculated on the test set by comparing the $T_a$ values predicted by the model with the actual in situ measurements. The diurnal cycle of the error is shown in Figure 3a. There is a noticeable temporal dependency of RMSE, with lower values during the nighttime. The extremes are 0.7 °C at 07:00 and 1.2 °C at 14:00. All times are expressed in Central European Time (CET, UTC+1). The model performs better during the nighttime, which is not surprising considering that, during the daytime, the energy exchanges between the surface and the air are more complex [1]. The spatial distribution of RMSE is shown in Figure 3b, where the reported values are the mean for each cell over the entire time period. There is no evident spatial dependency.

3.2. Predictors Importance and Correlations

The predictor importance analysis, shown in Figure 4a, highlights that the most significant variable by far is the LST lagged by 1 h, consistent with the timescale of boundary layer processes previously discussed. The correlation matrix in Figure 4b does not reveal any unexpected or problematic correlations among the predictors, ensuring the robustness of the model.

3.3. Daily Maximum and Minimum Temperatures

The estimated air temperature data from the model across the entire domain and study period are summarized in the present analysis of daily extremes. The daily maximum and minimum values, averaged over the entire study period, are presented through the interpolated two-dimensional maps in Figure 5, and the scatter plots against imperviousness in Figure 6. A significant correlation of $R = 0.72$ is observed between estimated minimum values and imperviousness, which decreases when considering maximum values ($R = 0.38$), as reported in previous studies [18,28]. The spatial distribution of the minimum temperatures highlights the urban heat island phenomenon, with higher values in more urbanized areas. Compared to the previous study, which was based solely on in situ measurements, there is a noticeable improvement in spatial data coverage, especially in the city’s peripheral areas. Specifically, the rural areas to the east of the city are slightly cooler than those to the west. The cooling effect of green areas within the city is also visible, particularly in the southeast area where the Appia Antica park is located, and in the north, which is partially covered by green areas such as Veio park (see Figure 1b). The thermal pattern in the more urbanized areas is similar to that obtained solely from in situ data. For convenience, we have also included the maps obtained with only in situ data for a direct comparison. An evident urban heat island phenomenon is also visible in the minimum LST data, highlighting the presence of a nocturnal SUHI. This is evident from the scatter plot in Figure 6a, showing a significant correlation between daily minimum LST and imperviousness ($R = 0.83$).
From the daily maximum temperature map, a gradient from west to east is observed, due to the sea breeze. The same phenomenon is also visible when LST is considered, while the map based only on in situ data shows a more pronounced gradient along the southwest–northeast axis, with higher maximum values. This can be explained by the sparse data in peripheral areas, where the interpolation algorithm extrapolates the gradient observed within the city; integrating LST data has allowed for a more accurate temperature estimation in these areas, resulting in less pronounced temperature increases when moving east of the city.
Finally, to better understand the model’s performance, the fractional bias (FB) was evaluated between the maximum and minimum temperature grids obtained using only in situ data and those from the model output, resulting in values of −0.08 for maximum and −0.06 for minimum temperatures. This indicates that the model is unbiased and does not suffer from any distortion.
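The text does not state which definition of fractional bias was used. A common form normalizes the difference of the two means by their average, as in the sketch below; the function name `fractional_bias`, the sign convention (model minus reference), and the sample values are assumptions made here for illustration only.

```python
def fractional_bias(model_values, reference_values):
    """Fractional bias: difference of the means divided by the average of the means.

    Sign convention assumed here: negative when the model mean is lower
    than the reference mean. Other sources use the opposite sign.
    """
    mean_model = sum(model_values) / len(model_values)
    mean_ref = sum(reference_values) / len(reference_values)
    return 2.0 * (mean_model - mean_ref) / (mean_model + mean_ref)

# Hypothetical grid values (°C): model output vs. interpolation of in situ data.
fb = fractional_bias([19.0, 21.0], [20.0, 22.0])
```

Under this definition a value of zero means that the two grids have identical means, and the metric is bounded between −2 and 2 for positive-valued fields.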

3.4. UHI and SUHI Intensity

On the basis of modeled air temperature data, both the UHI and SUHI intensity were estimated using the imperviousness method [18,28].

3.4.1. UHI

The resulting UHI diurnal cycle is presented in Figure 7a, with extreme values ranging from 4.1 °C at 01:00 to 0.3 °C at 10:00 (UTC+1). This behavior is very similar to what is observed when using only in situ data, although it shows a significantly higher intensity during nighttime. Such a discrepancy can be attributed to the greater number of points used for the linear regression, encompassing a broader range of imperviousness percentages. More specifically, temperatures in rural areas surrounding the city, characterized by an IMP close to 0%, were lower compared to those measured by the single rural station in the Appia Antica Park, formerly used as a rural reference, leading to a higher nighttime intensity.

3.4.2. SUHI

The diurnal cycle of the SUHI over the entire study period is shown in Figure 7b, with extremes reaching 7.0 °C at 22:00 and −1.2 °C at 10:00 (UTC+1). Compared to the UHI, the SUHI presents higher intensities, as shown in Figure 8a.
A slight time lag is observed in the SUHI compared to the UHI, which is further illustrated in the gradient plot of Figure 8b, where the sign change from positive to negative for the SUHI occurs two hours earlier than for the UHI, corresponding to the time of maximum value. This is not surprising, considering that the effects of surface heating and cooling propagate to the air with a time delay. During the central part of the day, the SUHI exhibits positive values between 2 and 3 °C, exceeding those of the UHI except in the early morning hours, when the SUHI shows slightly negative values, indicating the faster warming of rural areas compared to urban areas after sunrise. This phenomenon can be explained by the three-dimensional nature of urban areas, where shaded spots within urban canyons slow down warming compared to generally flat, unshaded rural areas, especially when the sun is low on the horizon. Overall, the diurnal cycle of the SUHI is in agreement with the theoretical cycle [1], which predicts consistently positive values with a nighttime peak.

4. Discussion

The gradient-boosting model, applied here for the first time to convert geostationary LST into $T_a$ at an hourly temporal resolution for Rome and its surrounding countryside, delivered highly promising results with an overall RMSE of 0.9 °C. As highlighted in the introduction, few studies in the literature achieve such high temporal resolution in this type of conversion, and those that do report higher RMSE values, with the best performance being 1.1 °C in the case of Athens [13]. Therefore, the results obtained here represent a significant improvement over prior research, positioning this study as the most accurate application in the literature to date. The model’s performance also exhibited a temporal dependency, with lower RMSE values observed during the night (0.7 °C) and higher values during the day (up to 1.2 °C).
Among the predictors, the LST lagged by one hour was identified as the most significant, indicating that soil heating and cooling effects propagate to the air with a delay of approximately 1 h, which aligns with the expected timescale for boundary layer phenomena.
Additionally, the model was used to estimate air temperature in areas lacking in situ measurements within the study domain, providing the data necessary to compute UHI intensity over the summers of 2019–2020. The diurnal cycle of UHI intensity observed in this study showed a trend comparable to that of a previous UHI study based solely on in situ measurements [18]. However, this study reported a higher nocturnal UHI intensity (maximum 4.1 °C compared to 3.4 °C in the previous study), which can be explained by the inclusion of temperature estimates for rural areas with near-zero imperviousness, which were not available in the previous study.
Furthermore, the surface heat island (SUHI) was analyzed using LST data, revealing a daily trend that mirrored theoretical expectations. The peak SUHI intensity occurred at night, with values ranging from −1.2 °C to 7.0 °C. Notably, the nocturnal SUHI intensity was significantly higher than the UHI intensity, and a slight temporal lag between the SUHI and the UHI was observed, reflecting the delay in the propagation of heating and cooling effects from the ground to the air.

5. Conclusions

This study demonstrates the significant potential of the gradient-boosting model for predicting $T_a$ at high temporal resolution in urban and rural areas, using geostationary LST data. The overall RMSE of 0.9 °C achieved is the lowest reported in the literature to date, marking a breakthrough in the use of machine learning for LST-based air temperature estimation. The model’s ability to capture spatio-temporal patterns in temperature dynamics makes it a valuable tool for studying urban heat islands and other temperature-related phenomena. Further improvements to the model could be achieved by integrating additional in situ measurements in rural areas with minimal imperviousness or by downscaling LST measurements, as suggested in Sismanidis et al. [29], which achieved a spatial resolution of 1 km.
The methodology developed in this study is easily transferable to other cities, providing a powerful tool for the design of urban adaptation and mitigation strategies, and for assessing their potential impacts on local climate dynamics.

Author Contributions

Conceptualization, A.C. and G.C.; methodology, A.C. and G.C.; software, A.C.; validation, A.C.; formal analysis, A.C.; investigation, A.C.; resources, A.C. and I.P.; data curation, A.C.; writing—original draft preparation, A.C.; writing—review and editing, G.C. and I.P.; visualization, A.C.; supervision, G.C. and S.A.; project administration, S.A.; funding acquisition, S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used for this study are publicly available at the following sources: (1) ASTI-Network in situ air temperature: https://doi.org/10.5281/zenodo.14148647 (accessed on 22 January 2025); (2) LST: https://lsa-saf.eumetsat.int/en/data/products/land-surface-temperature-and-emissivity/ (accessed on 22 January 2025); (3) DTM, imperviousness density, tree cover density, Corine Land Cover, and grassland: https://land.copernicus.eu (accessed on 22 January 2025); (4) NDVI: https://lpdaac.usgs.gov/products/mod13a3v061/ (accessed on 22 January 2025).

Acknowledgments

This study was funded and supported by (1) the EU LIFE-ASTI project “Implementation of a forecasting system for urban heat island effect for the development of urban adaptation strategy” (LIFE17 CCA/GR/000108); (2) the LIFE21-GIE-EL-LIFE project “A System for Integrated Environmental Information in Urban Areas (SIRIUS)”; (3) the PRIN 2022 project “Urban Heat and Pollution Islands Interaction in Rome and Possible Mitigation Strategies (RESTART)” funded by the Italian Ministry of University and Research (Prot. 2022KZ2AJE); (4) the Institute of Atmospheric Sciences and Climate (ISAC) of the National Research Council (CNR), Via Fosso del Cavaliere 100, 00133 Rome, Italy; (5) the Meteo Lazio™ amateur weather network (https://www.meteoregionelazio.it (accessed on 22 January 2025)).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UHI      Canopy Layer Urban Heat Island
SUHI     Surface Urban Heat Island
LST      Land Surface Temperature
IMP      Imperviousness
NDVI     Normalized Difference Vegetation Index
RMSE     Root of mean squared error
a.s.l.   above sea level

References

  1. Oke, T.R.; Mills, G.; Christen, A.; Voogt, J.A. Urban Climates; Cambridge University Press: Cambridge, UK, 2017.
  2. Marinaccio, A.; Scortichini, M.; Gariazzo, C.; Leva, A.; Bonafede, M.; De’ Donato, F.K.; Stafoggia, M. Nationwide epidemiological study for estimating the effect of extreme outdoor temperature on occupational injuries in Italy. Environ. Int. 2019, 133, 105176.
  3. Ciardini, V.; Argentini, S.; Sozzi, R.; Caporaso, L.; Petenko, I.; Bolignano, A.; Morelli, M.; Melas, D.A. Interconnections of the urban heat island with the spatial and temporal micrometeorological variability in Rome. Urban Clim. 2019, 29, 100493.
  4. Zhou, D.; Xiao, J.; Bonafoni, S.; Berger, C.; Deilami, K.; Zhou, Y.; Frolking, S.; Yao, R.; Qiao, Z.; Sobrino, J.A. Satellite remote sensing of surface urban heat islands: Progress, challenges, and perspectives. Remote Sens. 2019, 11, 48.
  5. United Nations, Department of Economic and Social Affairs, Population Division. World Urbanization Prospects: The 2018 Revision, Custom Data Acquired via Website. 2018. Available online: https://population.un.org/wup/assets/WUP2018-Report.pdf (accessed on 22 January 2025).
  6. Meehl, G.; Tebaldi, C. More Intense, More Frequent, and Longer Lasting Heat Waves in the 21st Century. Science 2004, 305, 994–997.
  7. Chen, E.; Allen, L., Jr.; Bartholic, J.; Gerber, J. Comparison of winter-nocturnal geostationary satellite infrared-surface temperature with shelter-height temperature in Florida. Remote Sens. Environ. 1983, 13, 313–327.
  8. Bechtel, B.; Wiesner, S.; Zaksek, K. Estimation of Dense Time Series of Urban Air Temperatures from Multitemporal Geostationary Satellite Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4129–4137.
  9. Good, E. Daily minimum and maximum surface air temperatures from geostationary satellite data. J. Geophys. Res. Atmos. 2015, 120, 2306–2324.
  10. Pichierri, M.; Bonafoni, S.; Biondi, R. Satellite air temperature estimation for monitoring the canopy layer heat island of Milan. Remote Sens. Environ. 2012, 127, 130–138.
  11. Nieto, H.; Sandholt, I.; Aguado, I.; Chuvieco, E.; Stisen, S. Air Temperature Estimation with MSG-SEVIRI Data: Calibration and Validation of the TVX Algorithm for the Iberian Peninsula. Remote Sens. Environ. 2011, 115, 107–116.
  12. Wloczyk, C.; Borg, E.; Richter, R.; Miegel, K. Estimation of instantaneous air temperature above vegetation and soil surfaces from Landsat 7 ETM+ data in northern Germany. Int. J. Remote Sens. 2011, 32, 9119–9136.
  13. Agathangelidis, I.; Cartalis, C.; Santamouris, M. Estimation of Air Temperatures for the Urban Agglomeration of Athens with the Use of Satellite Data. Geoinform. Geostat. Overv. 2016, 4, 2.
  14. dos Santos, R.S. Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data. Int. J. Appl. Earth Obs. Geoinf. 2020, 88, 102066.
  15. Carrión, D.; Arfer, K.B.; Rush, J.; Dorman, M.; Rowland, S.T.; Kioumourtzoglou, M.A.; Kloog, I.; Just, A.C. A 1-km hourly air-temperature model for 13 northeastern U.S. states using remotely sensed and ground-based measurements. Environ. Res. 2021, 200, 111477.
  16. Tran, D.P.; Liou, Y.A. Creating a spatially continuous air temperature dataset for Taiwan using thermal remote-sensing data and machine learning algorithms. Ecol. Indic. 2024, 158, 111469.
  17. Wang, C.; Bi, X.; Luan, Q.; Li, Z. Estimation of Daily and Instantaneous Near-Surface Air Temperature from MODIS Data Using Machine Learning Methods in the Jingjinji Area of China. Remote Sens. 2022, 14, 1916.
  18. Cecilia, A.; Casasanta, G.; Petenko, I.; Conidi, A.; Argentini, S. Measuring the urban heat island of Rome through a dense weather station network and remote sensing imperviousness data. Urban Clim. 2022, 47, 101355.
  19. Italian National Institute of Statistics (ISTAT). 2020. Available online: https://www.istat.it (accessed on 22 January 2025).
  20. Sozzi, R.; Casasanta, G.; Cecilia, A.; Argentini, S.; Ciardini, V.; Finardi, S.; Petenko, I. Surface and aerodynamic parameters estimation for urban and rural areas. Atmosphere 2020, 11, 147.
  21. Köppen, W. Das geographische System der Klimate. Handb. Klimatol. 1936, 1. Available online: https://koeppen-geiger.vu-wien.ac.at/pdf/Koppen_1936.pdf (accessed on 22 January 2025).
  22. Kottek, M.; Grieser, J.; Beck, C.; Rudolf, B.; Rubel, F. World Map of the Köppen-Geiger Climate Classification Updated. Meteorol. Z. 2006, 15, 259–263.
  23. Trigo, I.F.; Monteiro, I.T.; Olesen, F.; Kabsch, E. An assessment of remotely sensed land surface temperature. J. Geophys. Res. 2008, 113, D17108.
  24. Copernicus EU. EU-DEM (accessed on 15 March 2024).
  25. Didan, K. MODIS/Terra Vegetation Indices Monthly L3 Global 1km SIN Grid V061. 2021. Available online: https://doi.org/10.5067/MODIS/MOD13A3.061 (accessed on 15 March 2024).
  26. Meteo Lazio, a Private Meteorological Center for Lazio Region, Which Mainly Deals with Monitoring Through Its Own Network of Meteorological Stations. Available online: https://www.meteoregionelazio.it/ (accessed on 22 January 2025).
  27. Copernicus EU. Land Monitoring Service (accessed on 15 March 2024).
  28. Schatz, J.; Kucharik, C.J. Urban Climate Effects on Extreme Temperatures in Madison, Wisconsin, USA. Environ. Res. Lett. 2015, 10, 094024.
  29. Sismanidis, P.; Keramitsoglou, I.; Kiranoudis, C. Diurnal analysis of surface Urban Heat Island using spatially enhanced satellite derived LST data. In Proceedings of the 2015 Joint Urban Remote Sensing Event (JURSE), Lausanne, Switzerland, 30 March–1 April 2015.
Figure 1. (a) Spatial work domain, highlighting the edges of the LST measurement cells (pixels). Cells containing in situ meteorological stations are shaded light blue, with the stations indicated by orange triangles. (b) Zoom on the domain to better show the urbanized areas and the parks, the latter highlighted in green with their names in white. The blue lines indicate the main roads, while the light blue lines represent the main watercourses. Finally, the green line indicates the A90 Grande Raccordo Anulare motorway, within which we conventionally define the city of Rome.
Figure 2. Some of the dataset predictors, resampled to LST resolution. (a) Altitude a.s.l. (b) Imperviousness. (c) Land Cover. (d) Tree Cover. (e) Grassland. (f) NDVI (July 2019).
Figure 3. (a) Diurnal cycle of RMSE; (b) spatial distribution of RMSE, averaged over the entire time period.
Figure 4. (a) Importance of predictors in percentage. (b) Correlation matrix between predictors. The abbreviations used are as follows: cell, cell ID in the domain; lon, longitude; lat, latitude; lst_C, synchronous LST; dtm, elevation; imp, imperviousness; lst_DX, LST with a lag of X hours backwards.
Figure 5. Spatial distribution of daily extrema of LST and air temperature; for air temperature, maps based only on in situ measurements are shown alongside maps based on model estimates. (a) Minimum T a (in situ data). (b) Minimum T a (model output). (c) Minimum LST. (d) Maximum T a (in situ data). (e) Maximum T a (model output). (f) Maximum LST.
Figure 6. Scatter plots of daily extremes of LST and air temperature (model output data) against IMP (imperviousness), with the respective Pearson correlation coefficients R. (a) Minimum LST. (b) Minimum temperature. (c) Maximum LST. (d) Maximum temperature.
Figure 7. (a) Diurnal cycle of UHI intensity estimated using air temperature data; (b) diurnal cycle of SUHI intensity estimated using LST and imperviousness data, along with the diurnal cycle of the correlation coefficient between LST and imperviousness.
Figure 8. (a) Comparison of the diurnal cycle of UHI intensity, estimated using air temperature data output from the machine learning model, with that of SUHI intensity, measured using LST data; (b) diurnal cycle of the SUHI and UHI gradients.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cecilia, A.; Casasanta, G.; Petenko, I.; Argentini, S. A Machine Learning Algorithm to Convert Geostationary Satellite LST to Air Temperature Using In Situ Measurements: A Case Study in Rome and High-Resolution Spatio-Temporal UHI Analysis. Remote Sens. 2025, 17, 468. https://doi.org/10.3390/rs17030468

