Remotely Sensed Derived Land Surface Temperature (LST) as a Proxy for Air Temperature and Thermal Comfort at a Small Geographical Scale

: Urban Heat Islands (UHIs) and Urban Cool Islands (UCIs) can be measured by means of in situ measurements and interpolation methods, which often require densely distributed networks of sensors and can be time-consuming, expensive and in many cases infeasible. The use of satellite data to estimate Land Surface Temperature (LST) and spectral indices such as the Normalized Difference Vegetation Index (NDVI) has emerged in the last decade as a promising technique to map Surface Urban Heat Islands (SUHIs), primarily at large geographical scales. Furthermore, thermal comfort, the subjective perception and experience of humans of micro-climates, is also an important component of UHIs. It remains unanswered whether LST can be used to predict thermal comfort. The objective of this study is to evaluate the accuracy of remotely sensed data, including a derived LST, at a small geographical scale, in the case study of King Abdulaziz University (KAU) campus (Jeddah, Saudi Arabia) and four surrounding neighborhoods. We evaluate the potential use of LST estimates as proxy for air temperature (T air ) and thermal comfort. We estimate LST based on Landsat-8 measurements, T air and other climatological parameters by means of in situ measurements and subjective thermal comfort by means of a Physiological Equivalent Temperature (PET) model. We ﬁnd a signiﬁcant correlation (r = 0.45, p < 0.001) between LST and mean T air and the compatibility of LST and T air as equivalent measures using Bland-Altman analysis. We evaluate several models with LST, NDVI, and Normalized Difference Built-up Index (NDBI) as data inputs to proxy T air and ﬁnd that they achieve error rates across metrics that are two orders of magnitude below that of a comparison with LST and T air alone. We also ﬁnd that, using only remotely sensed data, including LST, NDVI, and NDBI, random forest classiﬁers can detect sites with “very hot” classiﬁcation of thermal comfort nearly as effectively as estimates using in situ data, with one such model attaining an F1 score of 0.65. This study demonstrates the potential use of remotely sensed measurements to infer the Physiological Equivalent Temperature (PET) and subjective thermal comfort at small geographical scales as well as the impacts of land cover and land use characteristics on UHI and UCI. Such insights are fundamental for sustainable urban planning and would contribute enormously to urban planning that considers people’s well-being and comfort.


Introduction
Urban Heat Island (UHI) occurs when the atmospheric temperature in urban areas is higher compared to surrounding rural settings [1]. It results, in part, from heat that the impacts of urban functional zones and spatial heterogeneity on LST in Beijing, China; Temporal changes in LST as a result of urban redevelopments in Lyon, France where investigated by [46]; and the impacts of different types of LULC patterns on SUHI variations in Bangkok, Thailand, where evaluated by [47]. Similarly, the impacts of the composition of different types of land cover on SUHI have been examined in the cases of Austin and San Antonio, Texas [48] and Portland, Oregon [49]. Recently, [50] estimated patterns of LST over King Abdulaziz University campus (Jeddah, Saudi Arabia), demonstrating the potential of Landsat-derived LST estimates to map long-term changes of the UHI phenomenon at a geographical scale of a university campus.

Thermal Comfort
While UHI and UCI are typically defined according to observed variations in air temperature, both phenomena have immense impacts on people's well-being and thermal comfort, which are predominantly associated with the UHI phenomenon [51]. Thermal comfort refers to the subjective sensation or satisfaction with the thermal environment [52,53]. According to this definition, any deviation from a thermal comfort range could result in a discomfort sensation [54]. Especially during periods of heat stress in warm-weather cities, the UHI effect may have a debilitating effect on the health and activity of people in the urban area [55] and their thermal comfort [56]. Thermal comfort is not only a factor of air temperature, but also a factor of other climatological conditions such as wind speed and air humidity [57,58], solar radiation [59] and duration of direct sun [60].
In addition, thermal comfort is impacted by LULC characteristics. Greenery areas, for example, urban parks, trees and gardens, or water bodies [61], can increase the livability of urban areas and people's comfort from both physiological and psychological perspectives [62][63][64].
While thermal comfort can be measured using direct interviews or questionnaires, several numeric indices have been developed to estimate the "real feel" or the sensation of the temperature, including the temperature-humidity index (THI) (originally developed by [65]), the wet bulb-globe temperature index (WBGT) [66], the physiological equivalent temperature (PET) [67] and the universal thermal climatic index (UTCI) [68]. Of these, PET is especially advantageous because it is a universal index useful in both hot and colder climates, irrespective of clothing and metabolic activity; it is calculated based on thermophysiological variables, which capture humans' sensation of climate without relying on subjective measures, and is measured in • C, which make it easy to interpolate [69].

Study Objectives
In parallel to the advancements in remotely sensed derived LST estimations, several approaches have been proposed to validate the accuracy of LST estimations. Validation with in situ measurements is probably the most commonly used method to assess the accuracy of LST [29,70]. Only a limited number of studies evaluate the potential of LST as a proxy for thermal comfort. Existing studies look, for example, at the potential of LST for estimating Temperature Humidity Index (THI) [71] or for heat stress mapping [72].
In this study, we evaluate the relationship between remotely sensed derived LST, in situ measurements and thermal comfort in the case study of King Abdulaziz University (KAU) campus (Jeddah, Saudi Arabia). We estimate LST using Landsat-8 observations and compare LST's derived spatial patterns against in situ climatological measurements and thermal comfort. The objectives of this study are as follows: (1) to validate the methodology presented in [50] to map the extent and spatial patterns of LST in the case study of KAU campus and the surrounding residential neighborhoods; (2) to evaluate the relationship between LST and climatological in situ measurements; (3) to evaluate the association between T air , LST and PET (as an indicator of thermal comfort); (4) to evaluate the relation of T air , LST and PET to LULC; (5) to demonstrate the impact of different types of land cover and land use on micro-climate variations. We now turn to describe the methodology adopted in this study to assess the UHI using remotely sensed measurements and in situ measurements (Section 2). In Section 3 we describe the results, followed by a discussion and conclusion (Sections 4 and 5, respectively).

Study Area
The analysis was performed on King Abdulaziz University (KAU) campus, Jeddah, Saudi Arabia, and four surrounding neighborhoods: Al Sulaymaniyah from the north, Abruq Ar Rughamah from the east, Al Jami'ah from the south, and from the west Al Fayha'a ( Figure 1).
of KAU campus and the surrounding residential neighborhoods; (2) to evaluate the relationship between LST and climatological in situ measurements; (3) to evaluate the association between Tair, LST and PET (as an indicator of thermal comfort); (4) to evaluate the relation of Tair, LST and PET to LULC; (5) to demonstrate the impact of different types of land cover and land use on micro-climate variations.
We now turn to describe the methodology adopted in this study to assess the UHI using remotely sensed measurements and in situ measurements (Section 2). In Section 3 we describe the results, followed by a discussion and conclusion (Sections 4 and 5, respectively).

Study Area
The analysis was performed on King Abdulaziz University (KAU) campus, Jeddah, Saudi Arabia, and four surrounding neighborhoods: Al Sulaymaniyah from the north, Abruq Ar Rughamah from the east, Al Jami`ah from the south, and from the west Al Fayha'a ( Figure 1).

Figure 1.
The location of the measurement sites on the King Abdulaziz University (KAU) campus, Jeddah, Saudi Arabia, and four surrounding neighborhoods: Al Sulaymaniyah, Abruq Ar Rughamah, Al Jami`ahand and Al Fayha'a.

Estimating Land Surface Temperature (LST)
Remotely sensed observations collected by the Landsat constellation of satellites are ideal for LST estimations thanks to the thermal infrared radiometers carried by the Landsat 5, 7 and 8 satellites, which make their data suitable for LST estimations, including for characterization of UHI. In this study, we derived LST estimations from Band 10 (Thermal infrared (TIR1), resampled from 100 m to 30 m, 10.60-11.19 µ m) of Landsat-8 observations  Remotely sensed observations collected by the Landsat constellation of satellites are ideal for LST estimations thanks to the thermal infrared radiometers carried by the Landsat 5, 7 and 8 satellites, which make their data suitable for LST estimations, including for characterization of UHI. In this study, we derived LST estimations from Band 10 (Thermal infrared (TIR1), resampled from 100 m to 30 m, 10.60-11.19 µm) of Landsat-8 observations (a detailed description of the methodology is provided in Appendix A). The in situ measurements were carried out between 3-9 September 2020; however, during this period there were no overpasses of Landsat-8 over the Area of Interest (AOI). We thus relied on four Landsat-8 overpasses that collected imagery closest to the time the in situ measurements were collected: 12 August, 28 August, 13 September, and 19 September 2020.
For analyzing the correspondence between LST and LULC, we calculated the median LST for these four Landsat-8 overpasses (12 August, 28 August, 13 September, 29 September Land 2021, 10, 410 5 of 24 2020). For a direct comparison with in situ and PET measurements (collected from 3 through 9 September), temporal precision was more important, given weather dynamics, so we calculated the mean of the Landsat-8 overpasses immediately preceding (28 August 2020) and succeeding (13 September 2020) the dates of in situ collection.

Climatological In Situ Measurements
We collected climatological measurements using a WTH600-E-KIT Wireless Weather Station Kit (http://www.extech.com/products/WTH600-E-KIT (accessed on 9 April 2021)), a versatile and mobile climate measuring toolkit ideal for taking climatological measurements at ground level (1.5 m off the ground). Collected measurements include temperature (in Celsius, with an accuracy of +/− 0.5 • C), relative humidity (the ratio of water vapor in the air versus the quantity of water vapor the air could theoretically hold at the given temperature, measured in %RH, with an accuracy of +/− 3 %RH), air pressure (measured in hectopascals with an accuracy of +/− 4 hPa), wind speed and direction (measured in km/h and cardinal directions, respectively).
Climatological measurements were collected during a seven-day period from 3 September to 9 September 2020, four times a day: morning ( We collected measurements in 20 locations (12 locations on campus and 8 outside of campus that were publicly accessible (i.e., by car or by foot) (Figure 1), with each measurement approximately 10 min in length. These locations represent a diverse range of land cover and land use types, including green spaces, densely built areas, bare land, paved areas, etc. (photos of these locations are provided in Figure 2). In each location we collected measurements at five measurement sites (a center location and four satellite locations, each 100 m to the north, south, east and west), totaling 100 measurement sites. Measurements were collected during the day under sunny and shaded conditions (e.g., under trees, fences, and buildings).
In total, 3405 measurements were collected; however, a few observations contained outlier values, including some possible data entry errors, which were cleaned and filtered. Observations that had a relative humidity measurement outside 0-100% or an air temperature measurement 5 standard deviations above or below the mean were omitted from analysis. After filtering, 3393 measurements remained for use in analysis.

Thermal Indices
Like [73], we calculate the Physiological Equivalent Temperature (PET) index based on the Radiation on the Human Body (RayMan) model [74]. RayMan is a freely available radiation and human bioclimate model (available for download at: https://www.urbanclimate. net/rayman/ (accessed on 9 April 2021)). PET allows for understanding the effect of complex outdoor weather factors by estimating an equivalent "indoor" temperature that simulates the same thermal effect given environmental variables (we provide the PET formula in Appendix B).
The RayMan model calculates the amount of heat flow that would be produced given certain outdoor conditions, including levels of radiation from direct and reflected sunlight that affect mean radiant temperature, a key influence on the human heat-balance equation [74]. The model then determines what air temperature would be required in the reference indoor environment to achieve the same heat flow, and the resulting temperature is the PET. The RayMan model simulates short-and long-wave radiation flux densities from the three dimensional surroundings [75]. According to [74], the radiation fluxes are most important for the human energy balance of the human body and the resulting thermal index PET, especially during summer. The model considers all simple and complex environments with their radiation properties (albedo and emissivity). It estimates the radiation fluxes and the effects of clouds on short-wave radiation fluxes [76].
PET. The RayMan model simulates short-and long-wave radiation flux densities from the three dimensional surroundings [75]. According to [74], the radiation fluxes are most important for the human energy balance of the human body and the resulting thermal index PET, especially during summer. The model considers all simple and complex environments with their radiation properties (albedo and emissivity). It estimates the radiation fluxes and the effects of clouds on short-wave radiation fluxes [76]. We calculated PET with the RayMan model using the data collected in situ as inputs, including date, time of day, air temperature, relative humidity, wind speed and estimated cloud coverage. We set the location to be centered on Jeddah (21° N, 39° E) and at sea level. For the biometric data, we assumed an average height (1.67 m), weight (75.3 kg) and age (31 years) (relying on https://www.who.int/ncds/surveillance/steps/2005_SaudiAra-bia_STEPS_Report_EN.pdf (accessed on 9 April, 2021)). We assumed a light layer of clothing and light activity (akin to walking).
We also calculated a "heat index" based on [77,78]: We calculated PET with the RayMan model using the data collected in situ as inputs, including date, time of day, air temperature, relative humidity, wind speed and estimated cloud coverage. We set the location to be centered on Jeddah (21 • N, 39 • E) and at sea level. For the biometric data, we assumed an average height (1.67 m), weight (75.3 kg) and age (31 years) (relying on https://www.who.int/ncds/surveillance/steps/2005_SaudiArabia_ STEPS_Report_EN.pdf (accessed on 9 April 2021)). We assumed a light layer of clothing and light activity (akin to walking).
We also calculated a "heat index" based on [77,78]: where T = ambient dry bulb temperature ( • F) and R = relative humidity (integer percentage). All measurements of field surveys were recorded into a mobile application survey platform, Fulcrum (https://www.fulcrumapp.com (accessed on 9 April 2021)) (provided by Fulcrum for evaluation for this study). Fulcrum was downloaded onto both OS and Android devices and synched with a web application.

Spectral Indices
We also assessed the relation between LST, in situ measurements, PET and LULC, namely green vegetation and built-up land cover. While several spectral indices have been proposed in the literature for capturing green vegetation and built-up land cover, it was not the objective of this study to compare between the different spectral indices. We thus relied on two of the most used spectral indices: The Normalized Difference Vegetation Index (NDVI) [79] to capture vegetation and the Normalized Difference Built-up Index (NDBI) [80] to capture built-up land cover.
NDVI expresses the relation between red visible light (which is typically absorbed by a plant's chlorophyll) and near-infrared wavelength (which is scattered by the leaf's mesophyll structure). It is computed as where NIR is the near-infrared wavelength and RED is the red wavelength. The values of NDVI range between (−1) and (+1). NDBI expresses the relation between the medium-infrared and the near-infrared wavelengths. It is computed as where MIR is the medium-infrared and NIR is the near-infrared wavelength. The index assumes a higher reflectance of built-up areas in the medium-infrared wavelength range than in the near-infrared.
To examine the relation between spatial patterns and characteristics of T air and remotely sensed data (LST, NDVI and NDBI) we overlaid the locations of the in situ measurement sites with the corresponding Landsat pixels. We used the Spearman Rank measure to calculate the correlations between these measures (given its robustness with non-normally distributed data [81]) and the Bland-Altman method [82] to assess the agreement between them. For the Bland-Altman analysis, we set a threshold so that 95% of the data within +/− two standard deviations of the mean difference between both methods would indicate a viable correspondence. For this analysis, we only included observations collected from 12 p.m. to 1 p.m. local time (n = 869), which are the closest in time to Landsat's overpass (i.e., around 10:50 a.m. local time); however, we used all available in situ observations after the filtering described in Section 2.2.2 for general comparative analysis (n = 3393).
To capture the nonlinear interactions of the remotely sensed and in situ covariates, we employed multiple linear regression (Ordinary Least Squares) and a few simple nonlinear models, including logistic regression, Bayesian ridge regression [83,84], random forest regression [85], and gradient boosted machines (XGB) [86], across combinations of LST, NDVI, NDBI against T air and PET (see Appendix C).
For validation metrics, we used the median error (MdE) for accuracy, the median absolute deviation for precision (MdAD) and the root mean squared error (equivalently, root mean squared deviation, RMSD) for uncertainty [87]. We ran 5-fold cross-validation and evaluated the metrics and correlation on only the validation fold of each run.
Finally, we investigated the ability of LST and these spectral indices to detect "very hot" locations (namely UHI) and compared these patterns against the in situ measurements. We assigned levels of thermal comfort to measurement sites based on median PET value. We followed the scale presented with the RayMan model and presented in [88] (Table 1). The median PET temperatures during the data collection fell in the Warm, Hot, and Very Hot scale range ( Figure 3). We therefore aimed to isolate "very hot locations" (i.e., where the median PET temperature was above 41 • C) in order to create two relatively comparable sample sets: "very hot" (n = 57) and the rest of the observations (n = 43).
Finally, we investigated the ability of LST and these spectral indices to detect "very hot" locations (namely UHI) and compared these patterns against the in situ measurements. We assigned levels of thermal comfort to measurement sites based on median PET value. We followed the scale presented with the RayMan model and presented in [88] ( Table 1). The median PET temperatures during the data collection fell in the Warm, Hot, and Very Hot scale range ( Figure 3). We therefore aimed to isolate "very hot locations" (i.e., where the median PET temperature was above 41 °C ) in order to create two relatively comparable sample sets: "very hot" (n = 57) and the rest of the observations (n = 43).  Table 1, there are 57 locations with median PET in the "very hot" range (above 41 °C), 35 locations with median PET in the "hot" range (above 35 °C and up to and including 41 °C ), 8 locations with median PET in the "warm" range (above 29 °C and up to and including 35 °C ), and no locations with median PET below the 29 °C threshold.
With these two classes identified, we evaluated a set of logistic regression and random forest classifiers to evaluate the potential to infer these classes based on a range of data inputs collected either from the in situ data or from Landsat-8. We used a harmonic mean (i.e., F1 score) as the evaluation metric:  Table 1, there are 57 locations with median PET in the "very hot" range (above 41 • C), 35 locations with median PET in the "hot" range (above 35 • C and up to and including 41 • C), 8 locations with median PET in the "warm" range (above 29 • C and up to and including 35 • C), and no locations with median PET below the 29 • C threshold.
With these two classes identified, we evaluated a set of logistic regression and random forest classifiers to evaluate the potential to infer these classes based on a range of data inputs collected either from the in situ data or from Landsat-8. We used a harmonic mean (i.e., F1 score) as the evaluation metric: where TP are True Positives, FP are False Positives and FN are False Negatives. As with the LST and T air comparison, we conducted 5-fold stratified cross-validation and only evaluated performance on the test fold of each iteration.

Spatial Patterns of LST Estimations
The analysis revealed variations in the spatial patterns of LST across the study area, which we relate to the distribution and characteristics of the LULC, including green vegetation, bare land, paved surfaces, built-up structures and building materials. LST varied between the four Landsat-8 overpasses; the highest LST in the study area (calculated as the average LST value of all pixels within the study area) was measured on 13 September and the lowest on 29 September (48.4 • C and 43 • C, respectively).
For comparison between LST, NDVI and NDBI, we calculated the median LST value of each pixel in the four Landsat-8 scenes to create a single composite scene. As expected, Land 2021, 10, 410 9 of 24 green spaces were characterized by lower LST compared to paved built surfaces and bare land ( Figure 4 presents the distribution of NDVI and NDBI (bottom left and right, respectively) compared to LST (top)). For example, the LST measured over the bare land area east of the staff housing ( Figure 4, area A) was 48.1 • C, while the lowest LST was measured around the green roundabout for the administrative entrance (around 41 • C, Figure 4, area C9), an area characterized by natural grass, mature trees, palms, shrubs, groundcover and grass.  A comparison of the four Landsat overpass LST values across the 100 measurement sites shows that the lowest mean temperature was measured on 29 September (42.33 °C), while the highest mean temperature was measured on 13 September (47.82 °C ) ( Table 2 provides the average derived LST from each Landsat overpass).  Outside of campus, the highest LST (48.8 • C) was measured over Al Jamea Plaza shopping center south of campus ( Figure 4, area B), which we attribute to the construction materials of the shopping center (namely rough textured concrete, stone, glass and metal). The association between LST and LULC was also reflected in the correlation we found between LST, NDVI and NDBI. Namely, as LST increased, NDVI decreased and NDBI increased (r= −0.45 and r = 0.6, respectively, p < 0.001 for both).
A comparison of the four Landsat overpass LST values across the 100 measurement sites shows that the lowest mean temperature was measured on 29 September (42.33 • C), Land 2021, 10, 410 10 of 24 while the highest mean temperature was measured on 13 September (47.82 • C) ( Table 2 provides the average derived LST from each Landsat overpass). Of the four Landsat overpasses, the overpasses on 28 August and 13 September over the AOI are the closest to the time the in situ measurements were collected. To evaluate the similarity in the climatological conditions between the time of the two Landsat overpasses (28 August and 13 September) and the time of the in situ measurements (3 September to 9 September), we obtained climatological data from the General Authority of Meteorology & Environmental Protection, Saudi Arabia, for additional validation. These measurements were collected at the Jeddah KAIA meteorological station (21 • 42 37.0 N 39 • 11 12.0 E)the closest official station to the research area. We obtained recordings of maximum temperature, maximum humidity and maximum wind speed during the research period. As shown in Table 3, the maximum temperature, humidity and wind speed recorded between 3 September and 9 September were relatively similar to the average recordings on 28 August and 13 September (the time of the two Landsat overpasses). The average maximum temperature during the week the in situ measurements were collected was 37.9 • C (SD = 1.13 • C) compared to an average of 39.6 • C on 28 August and 13 September. The relative humidity was also similar (around 75%). These findings confirm that the climatological conditions during the time of the two Landsat overpasses did not differ much from those observed during the week the in situ measurements were collected. In this study, we thus relied on the average LST measurements collected on 28 August and 13 September.

Spatial patterns of LST and Air Temperature (T air )
Analysis of the air temperature (T air ) measured across the 100 measurement sites reveals similar patterns to those observed with the LST measurements, which we relate to the characteristics of the LULC and micro-conditions of the measurement sites. Figure 5 presents the distributions of T air for all observations (n = 3393). Several trends are apparent, including the diurnal variation of the mean temperature, which ranges from 32.75 • C in the morning and 32.46 • C in the evening to 34.66 • C in the afternoon and 38.64 • C at midday. As expected, T air was lower in sites characterized by green vegetation and higher in areas covered with built-up or paved surfaces and bare land. This is indicated by a significant correlation between the average T air , NDVI and NDBI (r = −0.29, p = 0.003 and r = 0.36, p < 0.001, respectively). are apparent, including the diurnal variation of the mean temperature, which ranges from 32.75 °C in the morning and 32.46 °C in the evening to 34.66 °C in the afternoon and 38.64 °C at midday. As expected, Tair was lower in sites characterized by green vegetation and higher in areas covered with built-up or paved surfaces and bare land. This is indicated by a significant correlation between the average Tair, NDVI and NDBI (r = −0.29, p = 0.003 and r = 0.36, p < 0.001, respectively).  One of the key objectives of this study was to examine the potential utilization of remotely sensed derived LST to estimate T air and thermal comfort. We did not attempt to conduct an exact calibration of the two measures; rather, we wanted to understand their relative compatibility to detect spatial patterns of temperature variations. We used the Bland-Altman method to compare the mean and the difference of T air and LST. This method allowed us to identify acceptable parameters and any potential biases; it is often used in the literature for assessing agreement when simple correlation is not adequate [89]. Figure 6 presents a comparison (scatterplot) of the mean and the difference between T air and LST across 869 measurements collected between 12 p.m. and 2 p.m. local time (closest to the Landsat-8 overpass window time that data were collected). The results suggest that the distribution of the difference is normal (per Shapiro-Wilk test, p = 0.719) and that 96% of the observations are within 1.96 standard deviations of the mean difference (i.e., above the 95% threshold). There is a bias evident such that the overall LST temperatures were on average 6.54 • C above that of T air , whereas if the differences were due to stochastic error alone, we would expect the average to be zero. This bias tends to increase as the temperature increases. While we did not find a statistically significant correlation between LST and in situ data collected between 12 p.m. and 2 p.m., there was a significant correlation (r = 0.45, p < 0.001) between LST and the mean T air for each measurement site (n = 100) across all observations (n = 3393) (Figure 7).
We also evaluated the correlation between the LST derived from each of the two Landsat overpasses, with the mean T air measured for each measurement day (i.e., 3 September to 9 September). The results show a significant correlation between the LST we derived from each Landsat overpass individually (23 August and 13 September) and the T air that was measured during four of the measurement days (5 September, 7 September, 8 September and 9 September). For LST derived based on the 28 August Landsat overpass, this correlation ranges between r = 0.27, p = 0.006 and r = 0.35, p < 0.001 (when compared against the T air measured on 8 September and 7 September, respectively); for LST derived from the Landsat overpass on 13 September, this correlation ranges between r = 0.29, p = 0.002 and r = 0.36, p < 0.001 (when compared against the T air measured on 7 September, and 9 September, respectively). When looking at different times of the day T air was measured, we found the strongest correlation between LST derived from the Landsat overpass on 28 August and the T air measured on 8 September in the evening (between 7 p.m. and 9 p.m., r = 0.49, p < 0.001). We explain this finding by the fact that LST is likely higher than T air during the daytime and relatively lower or comparable to T air during the later times of the day, when the near surface atmosphere is more stable. , with 96% of the data within these limits. The mean difference (blue dashed) is below 0, indicating a bias such that LST is consistently higher, a difference that appears to increase as the temperature increases.  , with 96% of the data within these limits. The mean difference (blue dashed) is below 0, indicating a bias such that LST is consistently higher, a difference that appears to increase as the temperature increases.
Land 2021, 10, x FOR PEER REVIEW 13 of 27 Figure 6. The Bland-Altman plot shows the upper (red dotted) and lower (red dashed) thresholds of 1.96 standard deviations from the mean difference in Tair (measurements taken across all 100 sites between 12 p.m. and 1 p.m. local time, n = 869) and LST (28 August and 13 September mean composite), with 96% of the data within these limits. The mean difference (blue dashed) is below 0, indicating a bias such that LST is consistently higher, a difference that appears to increase as the temperature increases.  As explained above, in this study we sought to understand the relative compatibility between LST, T air and PET in detecting spatial patterns of temperature variations. As illustrated in Figure 8 (a visual assessment of the correspondence between the distribution of T air , max per measurement site against LST estimations), there was a general visual correspondence between LST and T air patterns. For example, the green space in the roundabout leading to campus administrative buildings (Figure 8, C9) and the green space near the women's campus (Figure 8, C10), which are characterized by the lowest LST (41 • C and 42.9 • C, respectively), are also among the coolest locations as indicated by T air (T air,max is on average 43.1 • C and 42.5 • C in these locations, respectively).
Land 2021, 10, x FOR PEER REVIEW 14 of 27 As explained above, in this study we sought to understand the relative compatibility between LST, Tair and PET in detecting spatial patterns of temperature variations. As illustrated in Figure 8 (a visual assessment of the correspondence between the distribution of Tair,max per measurement site against LST estimations), there was a general visual correspondence between LST and Tair patterns. For example, the green space in the roundabout leading to campus administrative buildings (Figure 8, C9) and the green space near the women's campus (Figure 8, C10), which are characterized by the lowest LST (41 °C and 42.9 °C , respectively), are also among the coolest locations as indicated by Tair (Tair,max is on average 43.1 °C and 42.5 °C in these locations, respectively). Next, we evaluated the overall relationships between LST and Tair as an indication of spatial characteristics contributing to UHI or UCI. Because the objective was to examine spatial (rather than temporal) similarities, we calculated the mean value of the in situ measurements collected throughout the entire day (n = 3393) and evaluated their relationship with LST and other potential nonlinear interactions among several variables. Determining optimal modeling techniques was not the primary aim of this analysis, so model evaluation and tuning were not explored in-depth; however, as noted, these techniques offer insight into the relationship of remotely sensed data and in situ measurements, including subtle nonlinear relationships. The results therefore show "adjusted" LST values as per the various models and input data, which are compared to Tair. To serve as a baseline assessment, a comparison between LST (non-adjusted) and Tair values yielded the following metrics (all units are in degrees Celsius): The results of these experiments (Table 4) suggest that, overall, the four evaluated methods performed relatively similarly across data inputs and methods, indicating a consistent association between LST and Tair. However, the more complex models (Random Forest and XGBoost) predicted an adjusted LST value closer to the actual Tair, particularly with the addition of NDVI and NDBI as inputs. Namely, LST combined with NDBI and Next, we evaluated the overall relationships between LST and T air as an indication of spatial characteristics contributing to UHI or UCI. Because the objective was to examine spatial (rather than temporal) similarities, we calculated the mean value of the in situ measurements collected throughout the entire day (n = 3393) and evaluated their relationship with LST and other potential nonlinear interactions among several variables. Determining optimal modeling techniques was not the primary aim of this analysis, so model evaluation and tuning were not explored in-depth; however, as noted, these techniques offer insight into the relationship of remotely sensed data and in situ measurements, including subtle nonlinear relationships. The results therefore show "adjusted" LST values as per the various models and input data, which are compared to T air . To serve as a baseline assessment, a comparison between LST (non-adjusted) and T air values yielded the following metrics (all units are in degrees Celsius): The results of these experiments (Table 4) suggest that, overall, the four evaluated methods performed relatively similarly across data inputs and methods, indicating a consistent association between LST and T air . However, the more complex models (Random Forest and XGBoost) predicted an adjusted LST value closer to the actual T air , particularly with the addition of NDVI and NDBI as inputs. Namely, LST combined with NDBI and NDVI inputs, adjusted with a nonlinear model, was a relatively strong predictor of T air (LST  Additionally, the analysis across all data and methods to determine an "adjusted" LST performed much better across the error metrics (i.e., the adjusted LST values were closer to T air ) than the baseline of LST and T air alone, indicating that using models that capture unobserved dynamics or latent relationships across spatial and temporal dimensions and covariates provide an effective way to proxy T air , as long as proper guards against over-fitting are observed (i.e., results hold up on out-of-sample data).

Inferring Thermal Comfort with In Situ and Landsat-8 Data
Next, we evaluated the association between T air and the Physiological Equivalent Temperature (PET) as a heat-balance model of the human body. As expected, we found a strong and statistically significant correlation between PET and T air (r = 0.95, p < 0.001). This relationship held when looking at the mean PET and T air across all observations for each location (r = 0.73; p < 0.001), as is clearly illustrated in Figure 9a. There was also a significant correlation between PET and heat index for each location (r = 0.75, p < 0.001) (Figure 9b). This indicates that the relationship between PET and in situ measurements held over time as a general trend and lends insight into spatial characteristics of these locations. We also found that mean PET correlated with LST (the mean of 28 August and 13 September Landsat-8 overpasses), albeit at a weaker and lower statistical threshold (r = 0.23, p = 0.02) (Figure 9c). We also evaluated the association between the LST derived from the two Landsat overpasses and the daily PET measurements. The results show a significant correlation between LST and the PET measurements on 5 September, 7 September and 9 September. The correlation ranged between r = 0.22 and r = 0.29 (between LST derived from Landsat's 28 August overpass and PET measurements from 7 September and 9 September, respectively, p = 0.03 for 7 September, and p = 0.004 for 9 September) and between r = 0.20 and r = 0.32 (between LST derived from Landsat's 13 September overpass and PET measurements from 5 September and 9 September, respectively, p = 0.04 for 5 September and p = 0.001 for 9 September). overpass and PET measurements from 5 September and 9 September, respectively, p = 0.04 for 5 September and p = 0.001 for 9 September).

Figure 9.
A scatterplot of (a) mean PET per location vs. mean in situ air temperature per location (n = 100), where there was a significant correlation (r = 0.73, p < 0.001); (b) mean PET per location vs. mean Heat Index (r = 0.75, p < 0.001); (c) mean PET per location and LST (28 August and 13 September mean), which was weakly correlated and not significant to the same degree, but still showed a relationship (r = 0.23, p = 0.02).
However, as described above, the main objective of this study was not to merely compare the absolute measurement values, but also to demonstrate the potential use of remotely sensed derived LST measures to identify "hot spots" of "uncomfortable areas", at least according to PET. For this aim, a series of logistic regression and Random Forest models were used to infer whether a location could be categorized as "very hot" (i.e., a median PET above 41 °C ). Both models were fit to in situ air temperature alone, the heat index alone, to all the inputs to the PET model (i.e., air temperature, relative humidity, wind speed, cloud coverage) and to all these inputs combined (i.e., air temperature, relative humidity, wind speed, cloud coverage, and heat index) ( Figure 10). When the same models were used to infer "very hot" locations using only combinations of remotely sensed data (LST, NDBI, NDVI), they performed surprisingly well. In fact, a Random Forest model with NDBI outperformed one of the models using in situ data.
As expected, the models using in situ data performed well given that they were based on the same input data as those used in the RayMan model to develop the PET score. The best model (i.e., a Random Forest model with air temperature, relative humidity, average wind speed, cloud coverage and heat index as inputs) attained an F1 score of 0.76116. However, the top-performing model using only remote sensing data (Random Forest with NDBI) attained an F1 score of 0.66429, which was competitive with the in situ based models and even outperformed one model that used in situ data. Other variations of Random Forest using only remote sensing data performed comparably as well. Figure 9. A scatterplot of (a) mean PET per location vs. mean in situ air temperature per location (n = 100), where there was a significant correlation (r = 0.73, p < 0.001); (b) mean PET per location vs. mean Heat Index (r = 0.75, p < 0.001); (c) mean PET per location and LST (28 August and 13 September mean), which was weakly correlated and not significant to the same degree, but still showed a relationship (r = 0.23, p = 0.02).
However, as described above, the main objective of this study was not to merely compare the absolute measurement values, but also to demonstrate the potential use of remotely sensed derived LST measures to identify "hot spots" of "uncomfortable areas", at least according to PET. For this aim, a series of logistic regression and Random Forest models were used to infer whether a location could be categorized as "very hot" (i.e., a median PET above 41 • C). Both models were fit to in situ air temperature alone, the heat index alone, to all the inputs to the PET model (i.e., air temperature, relative humidity, wind speed, cloud coverage) and to all these inputs combined (i.e., air temperature, relative humidity, wind speed, cloud coverage, and heat index) ( Figure 10). When the same models were used to infer "very hot" locations using only combinations of remotely sensed data (LST, NDBI, NDVI), they performed surprisingly well. In fact, a Random Forest model with NDBI outperformed one of the models using in situ data.
Land 2021, 10, x FOR PEER REVIEW 17 of 27 Figure 10. The model and data trials used to classify which of the locations (n = 100) were "very hot" with median PET over 41 °C . As expected, the models using in situ data (blue) did better, with the top-performing model (Random Forest (RF) with air temperature, relative humidity, average wind speed, cloud coverage, and heat index) attaining an F1 score of 0.76116. However, the top-performing model using only remote sensing data (RF with NDBI) attained an F1 score of 0.66429, which was competitive with the in situ based models and even outperformed a linear regression (LR) model using in situ data.

Discussion
Many studies have investigated the characteristics of Urban Heat Island (UHI) and the various factors that impact its spatial patterns and intensity. With the increased availability of remotely sensed satellite and airborne measurements, there is a growing interest Figure 10. The model and data trials used to classify which of the locations (n = 100) were "very hot" with median PET over 41 • C. As expected, the models using in situ data (blue) did better, with the top-performing model (Random Forest (RF) with air temperature, relative humidity, average wind speed, cloud coverage, and heat index) attaining an F1 score of 0.76116. However, the top-performing model using only remote sensing data (RF with NDBI) attained an F1 score of 0.66429, which was competitive with the in situ based models and even outperformed a linear regression (LR) model using in situ data.
As expected, the models using in situ data performed well given that they were based on the same input data as those used in the RayMan model to develop the PET score. The best model (i.e., a Random Forest model with air temperature, relative humidity, average wind speed, cloud coverage and heat index as inputs) attained an F1 score of 0.76116. However, the top-performing model using only remote sensing data (Random Forest with NDBI) attained an F1 score of 0.66429, which was competitive with the in situ based models and even outperformed one model that used in situ data. Other variations of Random Forest using only remote sensing data performed comparably as well.

Discussion
Many studies have investigated the characteristics of Urban Heat Island (UHI) and the various factors that impact its spatial patterns and intensity. With the increased availability of remotely sensed satellite and airborne measurements, there is a growing interest in developing methods that utilize remotely sensed observations to estimate the characteristics and extent of UHI. Moreover, both UHI and UCI have immense impacts on people's wellbeing and must be mapped and evaluated continuously to ensure sustainable development. Because in situ measurements to detect and map the distribution of UHI and UCI are not always available and are hard to collect, other sources of data and methodologies, such as using satellite data to estimate LST, have been increasingly adopted. The objective of this study was to evaluate the potential use of remotely sensed derived LST as proxy for spatial patterns of T air and thermal comfort (PET).
We show that patterns of UHI and UCI detected by satellite measurements can be used to identify UHI and UCI. We found a clear relationship between the built environment, climatological measures, measures of thermal comfort (PET) and, importantly, an indication that remotely sensed data, such as LST and various spectral indices, can characterize thermal comfort at micro-scales at a performance level comparable to that of in situ data.
We found a significant positive correlation between T air and LST (r = 0.45, p < 0.001). Despite the temporal inconsistency between the collection of the LST data and the in situ measurements, we were still able to find an agreement between these two measures via the Bland-Alman method. We also detected some systemic bias, such that the overall LST temperatures were on average 6.54 • C above that of T air. This bias tended to increase as the temperature increased. According to [90], T air is often lower than LST, especially when vegetation cover is low to moderate; over non-vegetated surfaces the difference may even exceed 20 • C.
We also evaluated the relationship between LST and T air using several regression models. The Random Forest model performed best; however, all models fared far better than just the baseline comparison between LST and T air , attaining error rates two orders of magnitude lower, indicating their potential to capture latent dynamics in the data that can be exploited to approximate T air . Nonparametric models such as Random Forests are capable of combining the benefits of interpretability and flexibility [91]. Here, we show that Random Forest Regression can be used to infer T air with LST, NDBI and NDVI, which together can predict up to 41% of the variation of T air . These findings support other studies of more complex models, such as Random Forest, to infer T air from LST [92,93].
Furthermore, we evaluated the potential use of LST to infer thermal comfort, i.e., the subjective perception of the thermal conditions by humans. We estimated thermal comfort using the Physiological Equivalent Temperature (PET) as a heat-balance model of the human body. Similar to [94], our results support the validity of PET as a measure of thermal comfort in local spatial and temporal (intra-day) scales and highlight its value in investigating the effects of UHI and UCI in micro-urban spaces. While the correlation between LST and PET was relatively low (r = 0.23, p = 0.02), a Random Forest regression was able to infer "very hot" locations using only combinations of remotely sensed data (LST, NDBI, NDVI). By classifying in situ measurement sites according to their median PET values and identifying "very hot" sites, we demonstrated that both logistic regression and Random Forest classifiers were able to infer levels of thermal comfort using a few input variables collected in situ, with the top model attaining an F1 score of 0.761157. This further affirms that PET might be useful in quantifying UHI and UCI effects using some basic measurements and a model, such as the RayMan model, for PET. What was perhaps more interesting was our finding that Random Forest classifiers using only remote sensing data (LST, NDBI, NDVI) were able to achieve comparable results (e.g., a Random Forest classifier using only NDBI attained an F1 score of 0.664286), demonstrating the potential to investigate predictions of thermal comfort at micro-scales using satellite data. Given the cost and inability to scale ground-based instrumentation, this is an encouraging sign, although more data and studies are needed.
Finally, we evaluated the relationship between in situ measurements, remotely sensed data and LULC characteristics. The results show variations in spatial patterns of LST that were closely related to the characteristics of the LULC, including the spatial distribution of green vegetated spaces, bare land, paved surfaces, built-up structures, and building materials. The spatial resolution of Landsat-8 (30 m) was high enough to distinguish between the temperature of sites separated by only a few hundred meters and characterized by different types of land use and land cover (LULC) (e.g., green spaces and parking lots). Similar to [95], we found that green spaces were characterized by lower LST compared to paved built surfaces and bare land (the difference between the maximum and minimum LST across the 100 measurement sites was 5.5 • C). This trend was indicated by a high and significant correlation between LST and two spectral indices representing green vegetation (NDVI) and built-up land cover (NDBI) (r = −0.45 and r = 0.6, respectively, p < 0.001 for both). These results align with previous studies that show a positive correlation between LST and NDBI and a negative correlation with NDVI [96][97][98]. To illustrate, [99] found a negative correlation between LST and NDVI (up to r = −0.63) and a positive correlation with NDBI (up to r = 0.67).
As expected, and similar to [100], we found that T air was lower in sites located in areas covered with green vegetation and higher in areas covered with built-up or paved surfaces and bare land. However, we found a lower correlation between the average T air , NDVI and NDBI (r = −0.29, p = 0.003 and r = 0.36, p < 0.001, respectively). We explain this relatively low correlation by the fact that the in situ temperature measurements represent a single location and are subject to various micro-climate conditions that are more significant than the overall vegetation cover around the location.
We note a few limitations to this study. First, in situ measurements were collected over a period of one week. Although the climate of Jeddah, Saudi Arabia, does not tend to vary much between seasons, a future study should also look at seasonal variations in predicting T air . Second, the validation of in situ data and remotely sensed data is best done with observations that are precisely aligned. Third, this study was conducted in Saudi Arabia, which is an arid region. A future study should evaluate the potential of LST to predict T air and thermal comfort in diverse types of climatological regions, including semi-arid and tropical. Fourth, in this study we used constant emissivity parameters to estimate LST. However, these parameters may differ by region and be subject to the land characteristics. While this was not the aim of the current study, a future study could investigate methods to improve the accuracy of T air predictions by identifying the optimal emissivity parameters.

Conclusions
Rapid urbanization processes together with temperature increases due to climate change are expected to result in an intensification of the Urban Heat Island (UHI) phenomenon by approximately 1 • C per decade, impacting the environment, ecology, biodiversity and people's well-being. Monitoring the extent and characteristics of the UHI is fundamental for the maintenance of sustainable urban systems and allows human settlements to be more inclusive, safe, resilient, and sustainable.
Traditional methods to measure UHI include the use of in situ measurements of climatological parameters, which are often collected ad hoc and are expensive and sparse. The increasing availability of new sources of medium spatial resolution remotely sensed satellite measurements in the last decade has triggered the development of new methods to estimate the Land Surface Temperature (LST) as a proxy for UHI. However, the majority of these studies evaluate the potential of these measurements to estimate the temperature of the air (T air ) at large geographical scales, due in part to the potential limitations due to the spatial resolution of the data. Furthermore, most studies utilize satellite measurements to estimate LST rather than the subjective thermal comfort, or sensation, of the thermal environment.
Three main conclusions arise from this study: • There is a significant correlation between LST, T air and PET, which are all also related to LULC characteristics (including those indicated by NDVI and NDBI spectral indices).

•
There is a correspondence between spatial patterns of UHI and UCI, as detected by means of remotely sensed and in situ observations. • Logistic regression and Random Forest models can be used to infer if a location is categorized as "very hot" (per PET), even when just considering LST, NDBI and NDVI.
Urban planning, mitigation and intervention efforts can help mitigate the UHI effect. This can be done, for example, by increasing green vegetation in urban areas or by planting trees along roadsides in dry areas (e.g., [101]). However, as noted by [102], there is no one mitigation solution that fits all cases, and any mitigation must consider the specific characteristics of the specific geographic area.  Acknowledgments: The authors would like to thank Brad Bottoms and Jenny Mannix for their help in the analysis and data collection. The research team would like to thank Fulcrum (https: //www.fulcrumapp.com/ (accessed on 9 April 2021)) for providing their software and tools used for data collection.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
We adopt a single-channel method to estimate LST based on Band 10 (Thermal infrared (TIR1), resampled from 100 m to 30 m, 10.60-11.19 µm) of Landsat-8 observations (path: 170, row: 045). The single-channel method is advantageous for LST estimations due to its simplicity [103,104] and accurate LST retrieval [105] [106]. In situ measurements were carried out between 3 September and 9 September 2020. However, during this period there were no overpasses of Landsat-8 over the Area of Interest (AOI). We thus relied on four Landsat-8 overpasses over the area on 12 August, 28 August, 13 September and 19 September 2020 ( Figure A1). All scenes had less than 5% cloud coverage. We used Landsat 8 Collection 1 Tier 1 DN, representing scaled, calibrated at-sensor radiance values (available in Google Earth Engine (GEE)). Tier 1 includes Level-1 Precision Terrain (L1TP) processed data with well-defined radiometry, inter-calibrated across different Landsat sensors. The full methodology to convert the raw reflectance value in Band 10 (B10) of each scene into a TOA spectral radiance and then to LST is described in this appendix. Similar to [50], we converted the raw reflectance value in Band 10 (B10) of each scene into a TOA spectral radiance (radiance measured by the sensor). We extracted the radiance "multiplicative" and the radiance "add" rescaling factors from the metadata of each scene using this expression: TOA (L) = (ML * DN) + AL.
where TOA (L) is the Top of Atmosphere radiance of B10, ML is the multiplicative rescaling factor of B10 in a given image, DN is the reflectance value of B10 and AL is the additive rescaling factor of B10 in a given image. We determined LST using the following expression: where ψ1, ψ2, and ψ3 can be derived as a function of total atmospheric water vapor content (w) using the following matrix approximation: Where w is the water vapor and coefficients c ij are derived from [105], as follows: We retrieved water vapor content information from the National Centers for Environmental Prediction (NCEP, formerly NMC) and the National Center for Atmospheric Research (NCAR) (NCEP/NCAR) (also available in GEE). We retrieved surface water vapor information for the corresponding date of each Landsat scene (four observations per day) and calculated the average water vapor over the four observations per day. We calculated emissivity (ε) according to the following expression [105]: where Fractional Vegetation Cover (FVC) is calculated as FVC = (NDV I − NDV I min ) (NDV I max + NDV I min ) 2 . (A6) where ε is the surface emissivity of vegetated (ε veg ) and soil (ε nonveg ) surfaces. Like [107], we assumed soil and vegetation emissivities of 0.97 and 0.99, respectively. We converted LST values form Kelvin to degrees Celsius. (A6) where is the surface emissivity of vegetated ( ) and soil ( ) surfaces. Like [107], we assumed soil and vegetation emissivities of 0.97 and 0.99, respectively. We converted LST values form Kelvin to degrees Celsius. Figure A1. A visual comparison between the measured LST from the four satellite overpasses on 12 August, 28 August, 13 September and 29 September 2020.

Appendix B
PET is based on the heat-balance equation for the human body [67]: where M is the metabolic rate (internal energy production by oxidation of food), W is the physical work output, R is the net radiation of the body, C is the convective heat flow, ED is the latent heat flow to evaporate water into water vapor diffusing through the skin (imperceptible perspiration), ERe is the sum of heat flows for heating and humidifying the inspired air, ESw is the heat flow due to evaporation of sweat, and S is the storage heat flow Figure A1. A visual comparison between the measured LST from the four satellite overpasses on 12 August, 28 August, 13 September and 29 September 2020.