The proposed methodology comprises five processing steps: (1) data calibration, (2) extraction of contributing factors, (3) estimation of temperature and emissivity, (4) validation of temperatures, and (5) modeling of the SUHI phenomenon. These are described in the following sections.
2.3.2. Definition and Extraction of Contributing Factors
Spectral indices such as NDBI, NDVI, and NDWI are used to examine the underlying properties of SUHI formation. Analytical expressions of these indices can be found in Zha et al. [44], Tucker [45], and Gao [46]. Moreover, the tasseled cap (TC) components (brightness, greenness, and wetness) are also computed [47]. The rationale for the selection of these biophysical indices is as follows.
Energy exchange between latent and sensible heat is related to NDBI, since it detects impervious surfaces that reduce humidity and increase the average temperature of the environment [48].
Temperature and vegetation maintain a spatially dependent relationship [49]. Vegetation reduces surface irradiation and increases humidity through physiological processes that allow energy exchange, thereby producing a cooling effect. The NDVI is used here as an index of this photosynthetic activity.
The presence of water bodies has a cooling effect on urban temperature [50]. In this scheme, the NDWI quantifies the water content in the vegetation, which suggests a significant effect in reducing SUHI. Likewise, rivers play an important role as thermal regulators of urban climate, increasing the cooling potential through evaporation and facilitating airflow. Given that the urban center is the main point for the development of socioeconomic activities, two additional variables were considered to describe proximity, i.e., the proximity map of the water bodies (PW) and the proximity map of the city center (PUC). A greater distance would imply a lower thermal intensity [51]. The proximity indices are computed by means of a Euclidean distance using the inverse distance weighting (IDW) operator in ArcGIS® (https://esri.com/, accessed on 20 October 2021).
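As a rough illustration of the factors described above, the three spectral indices can be computed per pixel from surface reflectance, and a proximity map can be built from Euclidean distances. The sketch below is plain Python and rests on several assumptions: the function and argument names (nir, red, swir1, mask, cell_size) are hypothetical, the NDWI follows the NIR/SWIR form of Gao, and the brute-force distance search only stands in for the ArcGIS operator used in the study.

```python
import math


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator vanishes."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Vegetation index: high for photosynthetically active cover."""
    return normalized_difference(nir, red)


def ndbi(swir1, nir):
    """Built-up index: high for impervious surfaces."""
    return normalized_difference(swir1, nir)


def ndwi(nir, swir1):
    """Water index (NIR/SWIR form): sensitive to vegetation water content."""
    return normalized_difference(nir, swir1)


def proximity_map(mask, cell_size=30.0):
    """Euclidean distance from every cell to the nearest True cell of `mask`.

    `mask` is a list of rows of booleans (e.g. water or city-center cells);
    `cell_size` is the pixel size in metres (30 m assumed, as for Landsat).
    Brute force, so only suitable for small grids.
    """
    targets = [(r, c) for r, row in enumerate(mask)
               for c, flag in enumerate(row) if flag]
    if not targets:
        raise ValueError("mask contains no target cells")
    return [[cell_size * min(math.hypot(r - tr, c - tc) for tr, tc in targets)
             for c in range(len(row))]
            for r, row in enumerate(mask)]
```

For real scenes, a raster library with a dedicated distance transform would be far more efficient than the nested search shown here.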
The above indices constitute the contributing factors of our proposed SUHI model. To compute the emissivity values required to retrieve LST from Landsat thermal bands, a novel method is proposed that extracts the Fcover biophysical variable, although this information can also be derived indirectly from NDVI, leaf area index (LAI), or other biophysical variables [52,53,54]. Bacour et al. [55] proposed a robust procedure based on neural network training of the PROSAIL model (PROSPECT leaf optical properties model and SAIL canopy bidirectional reflectance model). This Fcover variable is implemented in ESA's Sentinel Application Platform (SNAP; https://step.esa.int/main/toolboxes/snap/, accessed on 20 October 2021) and requires S2-MSI images. Detailed descriptions of this scheme are available in Weiss and Baret [56]. The Fcover variable provides the emissivity values necessary to compute LST with the L8OLI/TIRS thermal band 10. Compared to traditional methods based on NDVI, this new approach for extracting the emissivity is suitable for thermal radiation models. Because it requires temporally coincident S2-MSI and L8OLI/TIRS images, this method is only applicable from 2015 onward.
2.3.3. Estimation of Land Surface Temperature and Emissivity
Land surface temperatures are retrieved from L5TM, L7ETM+, and L8OLI/TIRS. For L8OLI/TIRS, only band 10 is used, since band 11 has large uncertainties, as reported by the USGS [57]. The consistency of the Landsat 5, 7, and 8 thermal instruments in retrieving LST was compared by Sekertekin and Bonafoni [58] and validated with in situ LST measurements. The RMSE values were 2.39 °C, 2.57 °C, and 2.73 °C, respectively, resulting in an average difference of 0.2 °C between the sensors. These uncertainty values are adequate for the purpose of this study. Our model to retrieve LST is presented as a flow chart in Figure 3. Temperatures are derived using the radiative components implemented by Barsi et al. [59] for single-channel algorithms. This method simulates the attenuation effects of the atmosphere that disturb the TIR signal.
Radiance and transmissivity values are available at https://atmcorr.gsfc.nasa.gov/ (accessed on 20 October 2021). The data are a compendium of atmospheric transmissivity values, along with upwelling and downwelling radiances, for a given geographical location. These radiative values can be used in atmospheric correction models, e.g., Equation (1), which also takes into account the correction of spectral emissivity.
$$L_{TOA} = \tau \varepsilon L_T + L_u + \tau (1 - \varepsilon) L_d \qquad (1)$$
In this equation, $L_{TOA}$ is the spectral radiance at the top of the atmosphere (registered by the sensor), $\tau$ is the atmospheric transmittance, $\varepsilon$ is the spectral emissivity, $L_T$ is the spectral radiance of a black-body target of kinetic temperature T, and $L_u$ and $L_d$ are the upwelling atmospheric path radiance and the downwelling or sky radiance, respectively.
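To make the atmospheric correction step concrete, the sketch below solves the single-channel relation for the black-body radiance of the target. It assumes the common form L_TOA = τ ε L_T + L_u + τ (1 − ε) L_d, with transmittance and path radiances taken from the atmospheric correction calculator; the function names and the numbers in the usage note are illustrative only.

```python
def toa_radiance(l_t, tau, emissivity, l_up, l_down):
    """Forward model: radiance at the sensor for a given target radiance."""
    return tau * emissivity * l_t + l_up + tau * (1.0 - emissivity) * l_down


def surface_leaving_radiance(l_toa, tau, emissivity, l_up, l_down):
    """Invert the forward model to recover the black-body radiance L_T.

    All radiances share the same units (e.g. W m-2 sr-1 um-1).
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("transmittance must lie in (0, 1]")
    if not 0.0 < emissivity <= 1.0:
        raise ValueError("emissivity must lie in (0, 1]")
    reflected_sky = tau * (1.0 - emissivity) * l_down
    return (l_toa - l_up - reflected_sky) / (tau * emissivity)
```

For instance, with hypothetical values tau = 0.85, emissivity = 0.97, l_up = 1.2, and l_down = 2.0, calling surface_leaving_radiance on the output of toa_radiance returns the original target radiance, which is a convenient consistency check before the temperature conversion.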
Implementing Equation (1) requires adequate emissivity values for a suitable estimation of LST. Since different land covers emit thermal radiation differently, spectral emissivity corrections are necessary [60]. In this work, three emissivity models are tested to accurately estimate LST. First, field-measured LSE (land surface emissivity) values are obtained from different authors and are listed in Table 1. Second, the emissivity data of the ASTER-GEDv3 product [61] are considered. Third, the Fcover model of Valor and Caselles [40] is applied; this model allows the calculation of the emissivity in the Landsat 8 thermal band, considering the Fcover index and the minimum and maximum values of the emissivity in the corresponding spectral band. Finally, the three LST models are compared and validated. In this study, land use features are categorized into seven classes: water bodies, cropland, forest, low vegetation, bare soil, urban/densely built, and suburban/medium built. This scheme follows the land cover classes proposed by Park et al. [62]. Since impervious surfaces exhibit a large spectral variation [63], two classes are used to represent artificial surfaces: urban/dense and suburban/medium. These classes are distinguished mainly by their impact on emissivity.
Table 1.
Reference values for the LSE model with L8OLI/TIRS band 10.
Land Cover | Emissivity | Reference |
---|---|---|
Waterbodies | 0.992 | FROM-GLC cited by [64] |
Cropland | 0.971 | FROM-GLC cited by [64] |
Forest | 0.995 | FROM-GLC cited by [64] |
Low vegetation | 0.986 | Tan et al. [65] |
Soil | 0.972 | Tan et al. [65] |
Urban/densely built | 0.973 | FROM-GLC cited by [64] |
Suburban/medium built | 0.971 | Tan et al. [65] |
The second emissivity dataset in this work is the ASTER Global Emissivity Dataset v3 (ASTER-GEDv3) [61] (https://emissivity.jpl.nasa.gov/aster-ged, accessed on 20 October 2021). This dataset was developed by the NASA Jet Propulsion Laboratory (JPL) using an algorithm based on temperature and emissivity separation along with an atmospheric correction model. More details can be found in Hulley and Hook [66].
The third emissivity model requires knowledge of the Fcover variable [40]. This method provides the emissivity of a heterogeneous surface as follows:
$$\varepsilon = \varepsilon_v F_{cover} + \varepsilon_s (1 - F_{cover}) + 4 \langle d\varepsilon \rangle F_{cover} (1 - F_{cover}) \qquad (2)$$
In this equation, $\varepsilon_v$ and $\varepsilon_s$ are the reference vegetation and bare soil emissivities, respectively, and $\langle d\varepsilon \rangle$ is the cavity effect associated with the indirect radiance emitted due to internal reflections between the interfaces. Here, Fcover is obtained from S2-MSI Level 2A products (see Section 2.3.2). The procedure to retrieve the Fcover variable differs from the NDVI methods [67,68], and is presented as a novel alternative for thermal modeling with Landsat data. In tropical areas, vegetation dynamics do not exhibit abrupt changes throughout the year, which implies that Fcover lacks significant seasonal variations. For the L5TM and L7ETM+ thermal instruments, the emissivity model of Equation (3) [69] is used. In this last method, Fcover is obtained from the NDVI.
$$\varepsilon = \varepsilon_s (1 - F_{cover}) + \varepsilon_v F_{cover} \qquad (3)$$
In this equation, $\varepsilon_s$ and $\varepsilon_v$ are the reference emissivity values for nonvegetated and vegetated areas, being 0.97 and 0.99, respectively [70]. In this work, the Fcover variable is recovered using the NDVI, as it effectively reflects the conditions of vegetation cover [42]. It is estimated by Equation (4).
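$$F_{cover} = \left( \frac{NDVI - NDVI_s}{NDVI_v - NDVI_s} \right)^2 \qquad (4)$$
In this equation, $NDVI_s$ is the NDVI value of pure soil, and $NDVI_v$ is the NDVI of pure vegetation obtained from the NDVI image.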
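The two Fcover-based emissivity models discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: the cavity-term model is written in the usual Valor and Caselles form, the NDVI scaling is squared following Carlson and Ripley, values outside the soil and vegetation end members are clamped, and all function names, as well as the NDVI thresholds and the cavity value used in the usage note, are hypothetical. Only the defaults 0.97 and 0.99 come from the text.

```python
def emissivity_valor_caselles(fcover, eps_v, eps_s, d_eps):
    """Emissivity of a mixed pixel including the cavity term <d_eps>."""
    if not 0.0 <= fcover <= 1.0:
        raise ValueError("fcover must lie in [0, 1]")
    mixing = eps_v * fcover + eps_s * (1.0 - fcover)
    cavity = 4.0 * d_eps * fcover * (1.0 - fcover)
    return mixing + cavity


def fcover_from_ndvi(ndvi, ndvi_soil, ndvi_veg):
    """Fractional cover from NDVI scaled between soil and vegetation values."""
    scaled = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    scaled = min(1.0, max(0.0, scaled))  # clamp pixels outside the end members
    return scaled ** 2


def emissivity_from_ndvi(ndvi, ndvi_soil, ndvi_veg, eps_s=0.97, eps_v=0.99):
    """Linear mixing of nonvegetated and vegetated reference emissivities."""
    fcover = fcover_from_ndvi(ndvi, ndvi_soil, ndvi_veg)
    return eps_s * (1.0 - fcover) + eps_v * fcover
```

With hypothetical end members ndvi_soil = 0.2 and ndvi_veg = 0.5, a pixel at NDVI 0.2 receives the nonvegetated emissivity and a pixel at NDVI 0.5 the vegetated one; the cavity term of the first model vanishes at both extremes and peaks at Fcover = 0.5.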
This method is based on the Carlson and Ripley [69] model. Finally, the conversion from LTOA to LST is carried out using the sensor calibration constants and the inversion of the Planck equation [71].
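The final conversion step can be illustrated with the sensor-specific form of the inverted Planck equation, T = K2 / ln(K1 / L + 1). The sketch below assumes the commonly published thermal calibration constants for each instrument; in practice these should be read from the scene metadata. The dictionary keys and function names are hypothetical.

```python
import math

# Thermal calibration constants: K1 in W m-2 sr-1 um-1, K2 in kelvin.
# Assumed from commonly published values; verify against scene metadata.
THERMAL_CONSTANTS = {
    "L5TM_B6": (607.76, 1260.56),
    "L7ETM_B6": (666.09, 1282.71),
    "L8TIRS_B10": (774.8853, 1321.0789),
}


def radiance_to_temperature(radiance, sensor):
    """Invert the Planck relation: temperature in kelvin from band radiance."""
    k1, k2 = THERMAL_CONSTANTS[sensor]
    if radiance <= 0.0:
        raise ValueError("radiance must be positive")
    return k2 / math.log(k1 / radiance + 1.0)


def temperature_to_radiance(t_kelvin, sensor):
    """Forward relation, useful for checking the inversion."""
    k1, k2 = THERMAL_CONSTANTS[sensor]
    return k1 / (math.exp(k2 / t_kelvin) - 1.0)


def kelvin_to_celsius(t_kelvin):
    return t_kelvin - 273.15
```

Applying radiance_to_temperature to the atmospherically and emissivity-corrected radiance yields LST, whereas applying it to the uncorrected sensor radiance would only give the brightness temperature at the top of the atmosphere.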
2.3.5. Modelling the SUHI Phenomenon
According to Rasul et al. [72], SUHI modeling consists of identifying the spatial variation over time of thermal features in urban areas. Here, by combining thermal images from remote sensing with sparse field measurements, our SUHI model employs PCA to analyze space-time data. PCA is a multivariate statistical technique that retains most of the variance of a dataset while reducing its dimensionality [73]. In this way, PCA can retrieve the main spatial patterns of variability in a time series. The application of PCA provides a generalization of the changes that characterize the variability patterns in a time series of images [18].
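As a minimal sketch of how PCA extracts a dominant pattern from an image time series, the example below treats each pixel as an observation and each acquisition date as a variable, and finds the first principal component by power iteration on the covariance matrix. The data layout, the function name, and the use of power iteration are assumptions made for illustration; a statistical package would normally be used, and the study does not state which PCA implementation was applied.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, explained_variance_ratio, scores) for the first PC.

    rows: list of pixels, each a list with one value per acquisition date.
    Power iteration is used; it assumes the starting vector is not
    orthogonal to the leading eigenvector.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance between dates.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data have no variance along the iterate")
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(k))
                     for a in range(k))
    total_variance = sum(cov[a][a] for a in range(k))
    # Projecting each centered pixel on the loadings gives the spatial pattern.
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
    return v, eigenvalue / total_variance, scores
```

The scores, mapped back to pixel positions, form the spatial pattern of the first mode, while the loadings describe how strongly each date expresses that pattern.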
Then, the impacts of the eight factors considered in this study are assessed using an MLR approach. The MLR technique is a parametric model that fits the relationship between the explanatory variables, that is, the contributing factors, and the response variable, here LST. The inclusion or elimination of predictors depends on the significance of these variables within the model, which is defined by a hypothesis test on the coefficients associated with each predictor. When using MLR techniques, it is important to examine the key assumptions regarding autocorrelation, normality of residuals, and multicollinearity. These factors determine the reliability of the model [74].
Finally, outliers can also alter the modelling approach, causing problems with regression assumptions [77,78], and these must be controlled or removed from the dataset. Here, our MLR analysis yields an equation capable of describing the thermal intensity as a function of the contributing factors. To verify the relative importance of each individual predictor of the LST model, a normalization procedure is first performed to standardize the coefficients. We use the deviation from the mean values, divided by the standard deviation of the response variable (LST). This allows us to derive the standardized coefficients [79]. Subsequently, the contribution of each variable to LST is obtained by weighting the absolute value of its standardized coefficient. The resulting weights are then used in the subsequent Machine Learning procedure that derives the multitemporal intensity of the SUHI model. This provides a technical basis for analyzing the factors that influence the thermal environment, which is of great significance for rational urban planning and sustainability.
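The weighting step described above can be sketched as follows. The example assumes the usual definition of a standardized coefficient (raw slope times the standard deviation of the predictor, divided by the standard deviation of the response) and takes each factor's contribution as its share of the summed absolute standardized coefficients. The function names and the toy numbers in the usage note are hypothetical, and the exact normalization used in the study may differ.

```python
import math


def std_dev(values):
    """Sample standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def standardized_coefficients(raw_coefs, predictors, response):
    """Scale raw MLR slopes so that predictors become comparable.

    raw_coefs:  {factor name: fitted slope}
    predictors: {factor name: list of observed values}
    response:   list of observed LST values
    """
    sd_y = std_dev(response)
    return {name: slope * std_dev(predictors[name]) / sd_y
            for name, slope in raw_coefs.items()}


def relative_contributions(std_coefs):
    """Share of each factor in the summed absolute standardized coefficients."""
    total = sum(abs(b) for b in std_coefs.values())
    return {name: abs(b) / total for name, b in std_coefs.items()}
```

With toy slopes of 2.0 for NDBI and -0.5 for NDVI, predictor standard deviations of 1 and 2, and a response standard deviation of 1, the standardized coefficients are 2.0 and -1.0, so the two factors receive weights of 2/3 and 1/3.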
The methodological workflow in Figure 5 shows the spatiotemporal model followed to characterize the impact of environmental factors on the thermal changes. First, the multitemporal factors, such as LST, spectral indices, and other variables, are derived from the Landsat 2001–2020 dataset. Then, the PCA technique is applied to extract the main patterns of variability. Subsequently, all the variables involved are included in the MLR scheme to model the possible dependences on LST. The MLR is implemented with the software RStudio (https://rstudio.com/, accessed on 20 October 2021).
Finally, the SUHI phenomenon is segmented into different zones depending on the thermal intensity. Thermal value ranges follow the categories of Wang et al. [36], which consider the average temperature of the land surface and its standard deviation (SD). Segmentation provides a definitive SUHI product that categorizes the urban environment according to specific conditions. Here we test two different Machine Learning methods for classification: Support Vector Machine (SVM) and Naïve Bayes Machine Learning (NBML). Both methods have proven robust in previous research for the characterization of various types of geospatial data [80,81]. The SVM method defines a separating hyperplane in a higher-dimensional space that optimally classifies the data. This method is particularly useful for solving nonlinear relations [82], and is available as open-source software in Orfeo ToolBox (OTB) at https://www.orfeo-toolbox.org/ (accessed on 20 October 2021). The NBML technique is based on the Bayes theorem for conditional probability and assumes independence between predictors, variables, or features [83].
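The segmentation by mean temperature and standard deviation mentioned above can be illustrated with a small helper that assigns each LST value to a class according to its distance from the scene mean, expressed in SD units. The class boundaries below are purely hypothetical placeholders chosen to produce seven classes; the actual ranges are those of Wang et al. and are not reproduced here. The function name is also an assumption.

```python
import bisect

# Hypothetical boundaries in units of SD from the mean (six cuts, seven classes).
# Replace with the ranges actually adopted for the thermal categories.
SD_BREAKS = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]


def thermal_category(lst, mean, sd, breaks=SD_BREAKS):
    """Return the class index (0 = coldest) of one LST value."""
    if sd <= 0.0:
        raise ValueError("standard deviation must be positive")
    z = (lst - mean) / sd
    # bisect_right counts how many boundaries lie at or below z.
    return bisect.bisect_right(breaks, z)
```

With a scene mean of 30 °C and an SD of 2 °C, a pixel at 30 °C falls in the central class (index 3), while pixels at 24 °C and 36 °C fall in the coldest and hottest classes, respectively.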
NBML is often referred to as the maximum a posteriori decision rule [84], and its code can be easily written in any programming language. NBML assigns the most likely class to a given observation by estimating the probability density of the training classes [85]. An observation is assigned to a class when the posterior probability reaches its maximum value, according to the following expression:
$$c_{MAP} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)^{w_i} \qquad (6)$$
In this equation, $c_{MAP}$ is the maximum a posteriori estimate for the observation $x = (x_1, \ldots, x_n)$ over the classes labeled as $c$, $P(c)$ is the prior probability for class $c$, $P(x_i \mid c)$ represents the conditional probability distribution of $x_i$ given $c$, and $w_i$ is a particular weight applied to each factor. Usually, the independence assumption is not fulfilled, and weighting the features involved in the assignment process can help to satisfy the required assumptions [86]. Here, each feature or factor is affected by a particular weight $w_i$, which can be formally defined by:
In this equation, $w_i$ denotes the weight value of the ith attribute, with values restricted to the range [0, 1]. In this work, attributes are the contributing factors involved in the SUHI phenomenon, while the classes $c$ are the seven temperature categories defined by Wang et al. [36]. These are described in Table 2.
The prior $P(c)$ and conditional probabilities $P(x_i \mid c)$ are determined through a training process. Then, Equation (6) becomes:
$$c_{MAP} = \arg\max_{c} \; \hat{P}(c) \prod_{i=1}^{n} \hat{P}(x_i \mid c)^{w_i} \qquad (7)$$
In this equation, $\hat{P}(c)$ and $\hat{P}(x_i \mid c)$ are estimates of the probability density functions (PDFs). These are derived from the frequency of their respective arguments in the training sample. Here, $\hat{P}(c)$ can also be estimated from a preliminary outcome of an SVM process.
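A compact sketch of a feature-weighted naive Bayes classifier may help to fix ideas. It assumes the decision rule takes the common weighted form, class = argmax P(c) ∏ P(x_i | c)^w_i, evaluated in log space, with probabilities estimated from frequencies in the training sample. Two further assumptions are made only for the example: the factors are binned into discrete levels, and Laplace smoothing (parameter alpha) is added to avoid zero probabilities. All names are hypothetical.

```python
import math
from collections import Counter


def train(samples, labels, alpha=1.0):
    """Estimate class priors and smoothed conditional frequencies.

    samples: list of tuples of discrete feature levels (one per factor)
    labels:  list of class labels, same length as samples
    """
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / len(labels) for c in class_counts}
    n_features = len(samples[0])
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    levels = [set() for _ in range(n_features)]
    for x, c in zip(samples, labels):
        for i, value in enumerate(x):
            counts[c][i][value] += 1
            levels[i].add(value)

    def conditional(i, value, c):
        # Laplace-smoothed estimate of P(x_i = value | c).
        return ((counts[c][i][value] + alpha)
                / (class_counts[c] + alpha * len(levels[i])))

    return priors, conditional


def classify(x, priors, conditional, weights):
    """Return the class maximizing log P(c) + sum_i w_i * log P(x_i | c)."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for i, (value, w) in enumerate(zip(x, weights)):
            score += w * math.log(conditional(i, value, c))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Setting a weight to zero removes the corresponding factor from the decision, while a weight of one recovers the ordinary naive Bayes contribution, which is why weights in the range [0, 1] act as a relevance scale for each factor.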
Equation (7) allows us to weight each environmental factor to generate the final SUHI product. The resulting map is generated according to the architecture shown in Figure 6, which is based on the NB decision rule. This approach categorizes the urban environment according to a specific condition and assigns a specific type of action to each temperature category. This analytical procedure yields a map that delimits the areas of different thermal intensities. The resulting areas are based on the spatiotemporal trends of the contributing factors, facilitating the management and application of measures to mitigate or adapt to the SUHI phenomenon.