Google Earth Engine Open-Source Code for Land Surface Temperature Estimation from the Landsat Series

: Land Surface Temperature (LST) is increasingly important for various studies assessing land surface conditions, e.g., studies of urban climate, evapotranspiration, and vegetation stress. The Landsat series of satellites have the potential to provide LST estimates at a high spatial resolution, which is particularly appropriate for local or small-scale studies. Numerous studies have proposed LST retrieval algorithms for the Landsat series, and some datasets are available online. However, those datasets generally require the users to be able to handle large volumes of data. Google Earth Engine (GEE) is an online platform created to allow remote sensing users to easily perform big data analyses without increasing the demand for local computing resources. However, high spatial resolution LST datasets are currently not available in GEE. Here we provide a code repository that allows computing LSTs from Landsat 4, 5, 7, and 8 within GEE. The code may be used freely by users for computing Landsat LST as part of any analysis within GEE. data available temporal resolution and degrees spatial resolution.


Introduction
Land Surface Temperature (LST) is an important component of the Earth's energy budget, closely linked to the partitioning between sensible and latent heat fluxes. As such, high-resolution satellite-derived LST is increasingly used in various applications related to the assessment of land surface conditions, including mapping the urban extent and the intensity of urban micro-climates, estimating high-resolution evapotranspiration for the management of water resources and assessing vegetation stress [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15].
The Landsat series of satellites have the potential to provide LST estimates at a high spatial resolution that are particularly appropriate for local and small-scale studies. Numerous LST algorithms for the Landsat series have been proposed [16][17][18][19][20][21][22][23][24][25][26][27][28][29][30]. While most algorithms are simple to implement, they require users to provide the necessary input data and calibration coefficients, which are generally not readily available. Some datasets are available online (e.g., [25,26]); however, they generally require users to be able to handle large volumes of data. Google Earth Engine (GEE; [31]) is an online platform created to allow remote sensing users to easily perform big data analyses without the need for computation resources. All Landsat Level-1 and 2 data are directly available to GEE, including  11 April 2013 to present Note: 1 low gain band (B6_VCID_1); 2 resampled to 30 m.
The SR data for each Landsat are also available in GEE, as provided by the USGS. The SR data from Landsat 8 are generated using the Land Surface Reflectance Code (LaSRC) algorithm, where atmospheric correction is performed using a radiative transfer model, auxiliary atmospheric data from MODIS, and makes use of the coastal aerosol band for aerosol inversion tests [44]. For Landsat 4-to-7, SRs are derived with the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm, which calculates the radiative transfer for atmospheric data from MODIS and NCEP.
Cloud coverage information, including cloud shadowing, can be retrieved from the quality assessment band (BQA), which is also made available by the USGS through GEE.

Atmospheric Data
Information on the atmospheric water vapor content is required to better account for atmospheric contributions in the TIR observations. Total Column Water Vapor (TCWV) values from NCEP/NCAR reanalysis data are available on GEE [36]. Currently, the NCEP/NCAR reanalysis is the only global TCWV dataset available on GEE that covers the full period of operations of the Landsat series. The TCWV data are available globally from 1948 to the present, with a six-hourly temporal resolution and 2.5 degrees spatial resolution. TCWV data from the two closest analyses (available at 00, 06, 12, and 18 UTC) are linearly interpolated to each Landsat image acquisition time. Only the NCEP/NCAR grid-boxes containing the respective Landsat pixels are considered.

Surface Emissivity
The LST retrieval algorithm used here requires prescribed values of surface emissivity. As in the operational Landsat LST product [25], we derive the surface emissivity for our LST retrievals from the ASTER GEDv3 dataset developed by JPL [37]. This dataset includes the emissivity for the five ASTER bands in the TIR region and is derived with a Temperature-Emissivity Separation (TES) algorithm from all clear-sky ASTER images acquired between 2000 and 2008. The emissivity data have a reported accuracy of~0.01 and are provided with a spatial resolution of 100 × 100 m 2 [45]. The emissivity data are adjusted to match Landsat's thermal bands using the coefficients provided by [25].
The ASTER GEDv3 dataset corresponds to the average of all retrievals performed over an eight-year period; therefore, emissivity variations over time, e.g., due to annual and inter-annual variations in vegetation density, need to be accounted for. Following [25], we apply a vegetation adjustment using Landsat derived NDVI and mean ASTER GEDv3 NDVI. The NDVI is commonly used to derive the fraction of vegetation cover (FVC) [25,26,37,46]. Here we use the relationship proposed by Carlson and Ripley [46]: where NDVI bare and NDVI veg are the NDVI values of completely bare and fully vegetated pixels, respectively. Following previous studies (e.g., [17,[47][48][49][50]), these two threshold values are set to NDVI bare = 0.2 and NDVI veg = 0.86. Although some studies use NDVI veg = 0.5, Jiménez-Muñoz et al. [50] showed that for high-resolution images, the values between 0.8 and 0.9 are more appropriate. Pixels with NDVI below NDVI bare are considered completely bare, while pixels with NDVI above NDVI veg are considered fully vegetated. Emissivity values over vegetated areas at any given time, may then be derived using the Vegetation-Cover method [51,52], which is defined as: where ε b,veg and ε b,bare are the emissivity of vegetation and bare ground for a given spectral band b.
The emissivity of vegetated surfaces typically shows relatively small variations in the TIR region and, therefore, this value is prescribed to ε veg = 0.99 [53]. For each pixel, the effective emissivity (ε b ) for each Landsat band is derived as follows [25]: (1) ASTER FVC is derived from NDVI using Equation (1); (2) The bare ground emissivity (ε b,bare ) for each ASTER band is derived from the original ASTER emissivity (ε b ) and the corresponding ASTER FVC, using Equation (2) with the prescribed value of ε b,veg = 0.99; (3) The bare ground emissivity for each Landsat TIR band (ε b,bare ) is derived from the ASTER bare ground emissivity using the spectral adjustments provided in Table II  For practical purposes, the possibility to directly use the ASTER emissivity corrected for the Landsat TIR band is also available in the code. Users may then choose whether they wish to apply the NDVI-based correction or use the instantaneous ASTER emissivity values.

SMW Algorithm
LST is computed with the SMW algorithm developed and used by CM-SAF to derive LST data records from the MFG and MSG series of satellites. The approach is based on an empirical relationship between TOA brightness temperatures in a single TIR channel and LST and utilizes simple linear regression [35,[54][55][56]. The model consists of a linearization of the radiative transfer equation that maintains an explicit dependence on surface emissivity: where Tb is the TOA brightness temperature in the TIR channel, and ε is the surface emissivity for the same channel. The algorithm coefficients A i , B i ; and C i are determined from linear regressions of radiative transfer simulations performed for 10 classes of TCWV (I = 1, . . . ,10), ranging from 0 to 6 cm in steps of 0.6 cm, with values of TCWV above 6 cm being assigned to the last class. Unlike for the analogous algorithms developed by CM-SAF for MFG and MSG, here the coefficients are not stratified by satellite view angle, because the Landsat sensors have a considerably narrower the near-nadir field of view (FOV).

Algorithm Calibration
The calibration database used to obtain the coefficients of Equation (3) has been derived using a dataset of air temperature, water vapor, and ozone profiles compiled by Borbas et al. [57] and its procedure follows Martins et al. [57]. The Borbas dataset comprises over 15,000 profiles and includes ancillary variables such as surface pressure, spectral emissivity, skin temperature, and elevation. To achieve a uniform coverage of surface and atmospheric conditions in the calibration exercise, a sub-set of profiles is selected as follows: (1) We define classes of LST ranging from 200 K to 330 K in steps of 5 K, and classes of TCWV from 0 to 6 cm in steps of 0.3 cm. TCWV values above 6 cm are assigned to the last class. (2) The Borbas database is iterated to randomly attribute a single clear-sky profile to each TCWV/LST class. At each new iteration, profile selection is limited to those with a great-circle distance to already selected profiles greater than 15 degrees. This guarantees a more extensive geographical coverage of the calibration database. (3) For each of the selected profiles, surface conditions are varied to ensure a wide range of conditions are included in the database: following [57], LST is set with respect to air temperature (Tair), namely to the difference between LST and air temperature (LST-Tair) ranging from -15 K to +15 K in steps of 5 K. Surface emissivity values are varied between 0.9 and 0.99 in steps of 0.01.
The selection procedure leads to a set of 257 profiles. For each profile, seven values of LST and 10 values of emissivity are prescribed, which yields a total of 17,990 cases for the calibration exercise, with sample sizes varying between 1260 and 2450 for each TCWV class considered for the coefficient derivation (Section 2.2.1). The obtained coefficients are validated using the full Borbas dataset with the provided LST and emissivity values.
For both, the calibration and validation databases, the respective brightness temperatures that would be observed by the Landsat sensors are computed in the Radiative Transfer for the TIROS Operational Vertical Sounder (RTTOV) version 12 developed by the Numerical Weather Prediction Satellite Application Facility (NWP-SAF) [58].

Processing Chain
Landsat LST values are produced using the algorithm described in Section 2.2.1, with the coefficients determined with the calibration procedure described in Section 2.2.2 and the datasets Remote Sens. 2020, 12, 1471 6 of 21 presented in Section 2.1. The production chain was fully coded in JavaScript using the Code Editor Platform. The code is available from the GEE or Git repositories, respectively: • https://code.earthengine.google.com/?accept_repo=users/sofiaermida/landsat_smw_lst • https://earthengine.googlesource.com/users/sofiaermida/landsat_smw_lst Figure 1 illustrates the processing chain for generating Landsat LSTs. The users first provide the date range, the Landsat satellite (4, 5, 7, or 8), the region of interest to process, and an NDVI flag indicating if they wish to apply the NDVI-based correction to emissivity. Based on this information, the main module, Landsat_LST, loads the respective collections of TOA brightness temperatures (BT) and Surface Reflectance (SR). A cloud mask is applied to both using the quality information bands (module cloudmask). For each TOA BT image, the two closest TCWV NCEP analysis times are selected and interpolated to the Landsat observation time (NCEP_TPW). The SR data are, in turn, used to compute NDVI (compute_NDVI), which is then converted to FVC values as described in Section 2.1.3 (compute_FVC). These FVC values are then used together with previously computed ASTER emissivity values for bare ground (ASTER_bare_emiss) to obtain the corresponding Landsat emissivity (compute_emissivity). Finally, the SMW algorithm is applied to the TOA TB of the Landsat TIR band (SMWalgorithm); the algorithm coefficients are mapped onto the Landsat image based on TWVC from NCEP and taking the classes defined in Section 2.2.1. The code and the LUT containing the coefficients are available to the users in the indicated repository (module SMW_coeffcients). In the same repository, we provide two scripts with examples demonstrating the usage of the modules for spatial and temporal analysis.

In situ Data
The quality of the LST data retrieved with the algorithm described above is evaluated by comparing the retrievals with in-situ LST from validation stations. In this study, we use six stations In the same repository, we provide two scripts with examples demonstrating the usage of the modules for spatial and temporal analysis.

In situ Data
The quality of the LST data retrieved with the algorithm described above is evaluated by comparing the retrievals with in-situ LST from validation stations. In this study, we use six stations from the SURFRAD network, two stations from the BSRN network, and three stations from the KIT network. Table 2 shows the location, land cover type, and period of observation of each station. The SURFRAD network was established in 1993 with the purpose of providing continuous long-term measurements of the surface radiation budget over the United States [38], and their data have been widely used in validation and quality assessment exercises of satellite LST [25,[59][60][61][62]. The BSRN is a global network of surface radiation stations managed by the Data and Assessments Panel from the Global Energy and Water Cycle Experiment (GEWEX) under the framework of the World Climate Research Program (WCRP) [39]. From the full BSRN network, we selected two stations that had the necessary variables and were located in sufficiently homogeneous areas. Both the SURFRAD and BSRN stations provide broadband hemispherical upwelling and downwelling infrared radiance measurements performed by pyrgeometers with an uncertainty of about 5 Wm −2 . The KIT stations are specifically dedicated to LST validation [40]. At these stations, Heitronics KT15.85 IIP radiometers (KT15; full-view angle of 8.3 • ) measure upwelling and downwelling radiances in the 9.6-11.5 µm spectral region, with an accuracy of 0.3 K in equivalent brightness temperature [40]. At the EVO and KAL sites, upwelling radiance is measured by two radiometers, one facing the ground and another a nearby tree. These two measurements allow the upscaling of the upwelling radiances to the satellite FOV [40,63,64]. Station measurements are resampled from the one-minute frequency to satellite observation time by averaging observations performed within ±3 min from the Landsat image acquisition time.

In situ LST Derivation
The pyrogeometers used at SURFRAD and BSRN stations provide broadband radiances in the infrared wavelength range of 4-50 µm. In the case of broadband measurements, the radiative transfer equation may be approximated by the Stefan-Boltzmann law: where L u and L d are the measured upward and downward longwave radiances, respectively, ε BB is the broadband emissivity of the surface, and σ SB is the Stefan-Boltzmann constant. Following Malakar et al. [25], the broadband emissivity of each site is derived from the ASTER GEDv3 product: where ε Ai , j = 10 − 14 denote the ASTER narrowband emissivities of bands 10 to 14, respectively.
To account for vegetation dynamics at the site, each ASTER band emissivity is first corrected using the actual Landsat FVC at the time of observation (see Section 2.1.3). The respective broadband value for each Landsat pixel is then obtained with Equation (5).
For the KIT sites, LST is derived from Planck's law using a simplified version of the radiative transfer equation [40]: where B λ (LST) represents Planck's function for wavelength λ, and ε λ is the respective surface emissivity. L λ,d is the downwelling radiance measured by the upward-facing radiometer. For GBB, the upwelling radiance, L λ,u , is directly obtained from the ground-facing radiometer. For EVO and KAL, L λ,u is a simple weighted average of the radiances measured by the ground and tree-facing radiometers, where the fractions of trees/shrubs and ground are used as weights [40,63,64]. To find adequate fractions of trees/shrubs for the Landsat's FOV, we use high-resolution flight imagery available on GEE. A perimeter of 30 m is defined around the stations, and trees/shrubs are identified manually, yielding values of 45% and 37% for EVO and KAL, respectively. For the GBB station, the emissivity of the site was determined by Goettsche et al. [65], yielding a value of 0.94 for the KT15 radiometer band. For EVO and KAL, in order to account for vegetation dynamics, we use the emissivity values derived for the Landsats, given that the TIR bands are spectrally close to that of the KT15 radiometer. Due to the narrow band of the KT15 radiometers, a central wavelength of 10.55 µm may be considered without significant loss of accuracy [40], and LST is obtained by inverting Plank's function.

Statistical Metrics
The obtained Landsat LST is assessed using robust statistics, as recommended by the Committee on Earth Observation Satellites Working Group (CEOS) on Calibration and Validation-Land Product Validation (LPV) Subgroup for the validation of Land Surface Temperature Products [66]. The so-called accuracy is given by the median error, which is the robust equivalent to the bias and computed as: where LST satellite and LST insitu are the satellite and in situ LSTs, respectively. The robust equivalent to the error standard deviation is the so-called precision, given by the median of absolute residuals: Remote Sens. 2020, 12, 1471 9 of 21 As an estimate of the total uncertainty of the product, we also compute the root mean squared error (RMSE): where N is the sample size.

Algorithm Calibration
Algorithm coefficients (A, B, and C of Equation (3)) are obtained by the ordinary least square fitting against the calibration database described in Section 2.2.2. The database is subdivided in classes of TCWV from 0 to 6 cm in steps of 0.6 cm, yielding a total of 10 sets of (A, B, C) coefficients. Due to the different spectral response functions of the sensors, the calibration must be performed for each Landsat separately. Figure 2 shows the coefficients obtained for the SMW algorithm and Figure 3 shows the LST error distribution of Equation (3) when applied to the validation database. The coefficients determined for the four Landsat sensors are very similar (Figure 2), which is to be expected given their nearly identical TIR bands. As generally seen in LST retrieval algorithms, the retrieval uncertainty increases with TCWV ( Figure 3), which is due to the increased atmospheric contribution of the TOA BT. This contribution also impacts the 95% confidence intervals of the coefficients, i.e., they become wider for higher values of TCWV ( Figure 2). For Landsat 4 and 7, there is a slight increase of bias with TCWV, with the LST being generally overestimated and biases reaching 0.9 K and 1.4 K, respectively, for very moist atmospheres ( Figure 3). For Landsat 5, the increase of the bias with TCWV is particularly high, reaching 2.5 K for TCWV values above 5.4 cm. For Landsat 8, the bias is negligible, and its error dispersion is also lower.
Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 21 coefficients determined for the four Landsat sensors are very similar (Figure 2), which is to be expected given their nearly identical TIR bands. As generally seen in LST retrieval algorithms, the retrieval uncertainty increases with TCWV ( Figure 3), which is due to the increased atmospheric contribution of the TOA BT. This contribution also impacts the 95% confidence intervals of the coefficients, i.e., they become wider for higher values of TCWV ( Figure 2). For Landsat 4 and 7, there is a slight increase of bias with TCWV, with the LST being generally overestimated and biases reaching 0.9 K and 1.4 K, respectively, for very moist atmospheres ( Figure 3). For Landsat 5, the increase of the bias with TCWV is particularly high, reaching 2.5 K for TCWV values above 5.4 cm. For Landsat 8, the bias is negligible, and its error dispersion is also lower.  . Boxes indicate the inter-quartile distance, red lines indicate the median, and whiskers indicate the 5% and 95% percentiles. Figure 4 shows an example of some of the layers generated by the LST production chain for each coefficients determined for the four Landsat sensors are very similar (Figure 2), which is to be expected given their nearly identical TIR bands. As generally seen in LST retrieval algorithms, the retrieval uncertainty increases with TCWV ( Figure 3), which is due to the increased atmospheric contribution of the TOA BT. This contribution also impacts the 95% confidence intervals of the coefficients, i.e., they become wider for higher values of TCWV ( Figure 2). For Landsat 4 and 7, there is a slight increase of bias with TCWV, with the LST being generally overestimated and biases reaching 0.9 K and 1.4 K, respectively, for very moist atmospheres (Figure 3). For Landsat 5, the increase of the bias with TCWV is particularly high, reaching 2.5 K for TCWV values above 5.4 cm. For Landsat 8, the bias is negligible, and its error dispersion is also lower.   Figure 4 shows an example of some of the layers generated by the LST production chain for each Landsat image; the data were acquired on the 17 th of May 2018 in a region in central Portugal close to  Figure 4 shows an example of some of the layers generated by the LST production chain for each Landsat image; the data were acquired on the 17 th of May 2018 in a region in central Portugal close to Coimbra city, located at approximately 40.2 • N and 8.4 • W. The region is characterized by extensive forest areas to the west of Coimbra and crop fields in the plains around the banks of the River Mondego. The crop fields are clearly visible in the RGB composite (Figure 4a) in brown tones and correspond mainly to maize, which is generally sown in April and harvested in August or September [67]. In the western part of the Mondego River (close to the river mouth), there are also extensive rice fields. Rice is sown around April/May, and the harvesting takes place around September/October [67]. Close to the seashore, there are also large forest areas, but a larger portion was burned by wildfires in October 2017. The scar of the burned area is also visible in the RGB composite ( Figure 4a)  needed to compute LST is derived from the ASTER GEDv3 dataset. Figure 4b shows the corresponding emissivity map of ASTER band 14, which approximates the Landsat TIR band best. ASTER NDVI values are used to compute the respective FVC, which is shown in Figure 4c. FVC and emissivity values are quite homogeneous over the selected area. FVC is generally quite low, with values between 0.3 and 0.5 for most of the region. The resulting bare ground emissivity map shows values around 0.96-0.97, which matches those typically found in spectral libraries for the 11 m region [53]. Figure 4d shows the respective bare ground emissivity for band 14 derived with equation (2) from the original emissivity values and the FVC. The FVC values for the Landsat image are also derived from the respective NDVI (Figure 4e). The instantaneous FVC map for Landsat shows considerably more variability than for ASTER. The cropland and burned areas show a clear pattern in the FVC map. Furthermore, the rice fields in the westernmost part of the river had very low NDVI values typically associated with water and, therefore, were masked out. The resulting TIR emissivity values for Landsat are displayed in Figure 4f. Areas with low vegetation density have values close to 0.97, while the emissivity values of densely vegetated areas reach values close to 0.99, both in agreement with typical values in spectral libraries [53]. This example emphasizes the importance of using vegetation cover to derive dynamic emissivity values. Figure 4g shows Landsat TOA brightness temperature values, and Figure 4h the corresponding LST values. For the daytime Landsat scene analyzed here, LST is higher over croplands and burned areas than over forests.

Validation with in situ LST
The quality of derived Landsat LST is assessed with in situ LST derived from twelve validation stations. Since in situ data were unavailable for Landsat-4, comparisons were only performed for Landsat 5, 7, and 8. We found a considerable number of Landsat LST values, much lower than the in situ estimates, which is probably due to cloud contamination. Besides producing outliers and yielding a bias, cloud contamination also affects NDVI and, therefore, emissivity estimation. Indeed, NDVI values over clouds are generally considerably lower, which results in lower FVC values and, consequently, lower emissivity; however, the dominant effect is the impact on BT. To assess the quality of the satellite LST with a minimum of cloud contamination, we applied a "3 -Hampel identifier" to remove potential outliers [68,69]. Figure 5 shows scatterplots between Landsat LST and in situ LST for each of the twelve sites. For each Landsat, the 3 limit is indicated by dashed lines, and the provided statistics are derived after the outliers' removal. Table 3 presents the corresponding statistics for each site (black figures-original data, grey figures-outliers removed).
The accuracy/bias ( ) is generally better than or close to 1 K, except for FPK, TBL, and GOB stations, with values varying between -1.2 and 1.3 K. The bias is highest at the GOB station but it is very low at GBB, which is located in the same area. Apart from FPK, TBL, and GOB, the RMSE values range between 1.4 K and 2.5 K. The lower RMSE values are found in KAL e DRA, and the best precision ( ) values are found in DRA, CAB, and KAL, which are all located in very homogeneous areas. There is no evident pattern when comparing between Landsats; Figure 3 suggests that Landsat 8 might perform slightly better for extremely moist atmospheres (essentially when total column water vapor reaches values above 5 cm), which, outside the tropics, is rare under clear skies. Without considering FPK, TBL, and GOB stations, the overall accuracy of the Landsat 5, 7, and 8 is 0.5 K, 0.0(3) K and 0.3 K, respectively, which meet the threshold accuracy proposed by Guillevic et al. [66]. The overall precision (RMSE) is 1.3 (2.0) K, 1.1 (2.0) K, and 1.0 (1.9) K for Landsat 5, 7, and 8, respectively. The first step in the LST production is to match TCVW estimates from NCEP reanalysis data to the Landsat observation time. However, due to the coarse resolution of the model, TCWV values are practically constant over the selected area and, therefore, are not shown. The TIR surface emissivity needed to compute LST is derived from the ASTER GEDv3 dataset. Figure 4b shows the corresponding emissivity map of ASTER band 14, which approximates the Landsat TIR band best. ASTER NDVI values are used to compute the respective FVC, which is shown in Figure 4c. FVC and emissivity values are quite homogeneous over the selected area. FVC is generally quite low, with values between 0.3 and 0.5 for most of the region. The resulting bare ground emissivity map shows values around 0.96-0.97, which matches those typically found in spectral libraries for the 11µm region [53]. Figure 4d shows the respective bare ground emissivity for band 14 derived with Equation (2) from the original emissivity values and the FVC. The FVC values for the Landsat image are also derived from the respective NDVI (Figure 4e). The instantaneous FVC map for Landsat shows considerably more variability than for ASTER. The cropland and burned areas show a clear pattern in the FVC map. Furthermore, the rice fields in the westernmost part of the river had very low NDVI values typically associated with water and, therefore, were masked out. The resulting TIR emissivity values for Landsat are displayed in Figure 4f. Areas with low vegetation density have values close to 0.97, while the emissivity values of densely vegetated areas reach values close to 0.99, both in agreement with typical values in spectral libraries [53]. This example emphasizes the importance of using vegetation cover to derive dynamic emissivity values. Figure 4g shows Landsat TOA brightness temperature values, and Figure 4h the corresponding LST values. For the daytime Landsat scene analyzed here, LST is higher over croplands and burned areas than over forests.

Validation with in situ LST
The quality of derived Landsat LST is assessed with in situ LST derived from twelve validation stations. Since in situ data were unavailable for Landsat-4, comparisons were only performed for Landsat 5, 7, and 8. We found a considerable number of Landsat LST values, much lower than the in situ estimates, which is probably due to cloud contamination. Besides producing outliers and yielding a bias, cloud contamination also affects NDVI and, therefore, emissivity estimation. Indeed, NDVI values over clouds are generally considerably lower, which results in lower FVC values and, consequently, lower emissivity; however, the dominant effect is the impact on BT. To assess the quality of the satellite LST with a minimum of cloud contamination, we applied a "3σ-Hampel identifier" to remove potential outliers [68,69]. Figure 5 shows scatterplots between Landsat LST and in situ LST for each of the twelve sites. For each Landsat, the 3σ limit is indicated by dashed lines, and the provided statistics are derived after the outliers' removal. Table 3 presents the corresponding statistics for each  site (black figures-    The accuracy/bias (µ) is generally better than or close to 1 K, except for FPK, TBL, and GOB stations, with values varying between -1.2 and 1.3 K. The bias is highest at the GOB station but it is very low at GBB, which is located in the same area. Apart from FPK, TBL, and GOB, the RMSE values range between 1.4 K and 2.5 K. The lower RMSE values are found in KAL e DRA, and the best precision (σ) values are found in DRA, CAB, and KAL, which are all located in very homogeneous areas. There is no evident pattern when comparing between Landsats; Figure 3 suggests that Landsat 8 might perform slightly better for extremely moist atmospheres (essentially when total column water vapor reaches values above 5 cm), which, outside the tropics, is rare under clear skies. Without considering FPK, TBL, and GOB stations, the overall accuracy of the Landsat 5, 7, and 8 is 0.5 K, 0.0(3) K and 0.3 K, respectively, which meet the threshold accuracy proposed by Guillevic et al. [66]. The overall precision (RMSE) is 1.3 (2.0) K, 1.1 (2.0) K, and 1.0 (1.9) K for Landsat 5, 7, and 8, respectively.

Discussion
This work presents a methodology to derive LST from the Landsat series with GEE. We implemented the single-channel empirical algorithm proposed by Duguay-Tetzlaff et al. [35] because it combines simplicity with reasonable accuracy. Furthermore, the CM-SAF used this algorithm to generate a climate data record of LST for the Meteosat series of satellites and, therefore, has been extensively analyzed. A similar algorithm (without explicit dependence on emissivity) is used by the Copernicus Global Land Service to generate LST from the Geostationary Operational Environmental Satellites (GOES) and the Multi-Function Transport Satellite (MTSAT) [55]; a detailed quality assessment of this product is performed continuously and available at https://land.copernicus.eu/global/products/lst. The algorithm uncertainty (Figure 3) obtained here are similar to those obtained by Duguay-Tetzlaff et al. [35] and Freitas et al. [55]. Landsat 5 shows the largest errors, followed by Landsat 7, 4, and 8-these discrepancies are related to the different spectral characteristics of the sensors. While the TIR bands of Landsat 4, 5, and 7 have similar widths, their spectral response functions differ slightly. Of the three satellites, the TIR signal measured by Landsat 5 has the highest weight around 12 µm, followed by Landsat 7 and 4. Since water vapor absorption/emission in the 12 µm spectral region is higher than in the 10-11 µm region, a larger contribution from the 12 µm region decreases the algorithm's retrieval performance. The TIR band of Landsat 8 is considerably narrower than for the other Landsats, and excludes the 12 µm region, which allows retrieving LST with better accuracy. Algorithms using both TIR bands of Landsat 8 (such as split-window algorithms) could provide better estimates of LST; unfortunately, a calibration problem of the TIRS sensor identified by USGS introduces large calibration uncertainty into the 12 µm band [70]. We confirmed this with independent Landsat 8 LST retrievals with a split-window algorithm, which showed a substantial degradation of retrieval quality over the GBB station (not shown).
The quality of the LST retrievals was further assessed in comparisons with in situ LST estimates from 12 stations obtained from the SURFRAD, BSRN, and KIT networks. The KIT network includes three stations dedicated to LST retrieval located on sites that are homogeneous at the satellite's pixel scale. At these stations, radiance measurements are performed by radiometers with spectral and directional characteristics similar to those typical of satellite sensors. SURFRAD and BSRN stations provide hemispherical measurements of broadband infrared radiative flux. Despite providing high-quality observations, several authors pointed out several limitations of the SURFRAD measurements for validating satellite LST, most importantly, a lack of spatial representativeness [59,62]. The BSRN sites, CAB and GOB, are located in highly homogeneous areas and have been previously used for validating satellite-derived LST (e.g., [70][71][72]).
In order to study and better understand the impact of land cover heterogeneity on the LST comparisons, we obtained Google high-resolution images for each site ( Figure 6). Within GEE, circles with 30 m and 100 m radius were superimposed on the images. The 30 m circle is used to represent the resampled spatial resolution of the Landsat's TIR bands, while the 100 m circle approximates their original resolution. The 30 m circles correspond to the geometries used in GEE to extract the Landsat LST time-series used in the validation exercise. It should be noted that these geometries are not meant to represent the actual shape of the Landsat pixels. In fact, the actual areas of the circles (with widths of 60 m and 200 m) are larger than both the resampled and original pixel areas. Since the actual position of Landsat pixels varies between overpasses and to account for geolocation uncertainties, we've selected a slightly larger area. For some stations, these areas are slightly distanced from the station location in order to encompass more homogeneous surfaces. For instance, at BND shifting the circle slightly to the northeast ensures that most Landsat signal is coming from the grass patch where the station is located (Figure 6a).

SURFRAD Stations
Statistics at BND, SXF, PSU, and GWN suggest good quality of the LST retrievals and agree well with those obtained by Malakar et al. [25]. Some overestimation at higher LST values is noticeable (abovẽ 305 K; Figure 5). The algorithm validation exercise suggests that the SMW slightly overestimates high LST values for Landsat 5 and 7 and slightly underestimates high LSTs for Landsat 8 (not shown). However, the same bias is not observed at DRA, GBB, or KAL stations where even higher LST values are observed. This suggests a likely underestimation of surface emissivity during the warm season when LSTs are higher. Although the same method is used to obtain Landsat and in situ emissivity, the impact of emissivity uncertainties on LST estimates is different in each case. Empirical LST retrieval algorithms, such as the SMW are generally more sensitive to emissivity errors than methods that derive in situ LST directly with Planck's or Stefan-Boltzmann law. Some authors found large biases when validating satellite-derived LST with in situ LST from DRA station (e.g., [25,59,62,73]). In contrast, we found good agreement between Landsat LST and DRA station LST, with bias values within ±0.5 K. DRA station is located in a desert area with sparse shrub cover. This seems to indicate that the station is more representative of Landsat's spatial scale than satellite sensors with a lower spatial resolution, e.g., MODIS. However, for Landsat, Malakar et al. [25] obtained a large positive bias over DRA. Looking at Figure 13 in Malakar et al. [25], we found that the area we selected for the time-series retrieval is slightly north of the respective area used by Malakar et al. Figure 7 shows LST values from Landsat-8 at DRA averaged for the years 2017 to 2019. The LST values extracted from our selected area (black circle) are on average 1 K lower than the corresponding LST extracted slightly further south (purple circle), which might explain the lower bias.
At the FPK station, the biases are slightly larger (1.3 K, 1.9 K, and 1.8 K for Landsat 5, 7, and 8, respectively) than at most SURFRAD stations ( Table 3). The bias seems to be mainly caused by LST observations above 300 K. As mentioned above, this overestimation at high LSTs could be related to an underestimation of emissivity during the warm season. We also obtain large biases at TBL station (2.3 K, 2.1 K, and 1.6 K for Landsat 5, 7, and 8, respectively). Analyses of the average LST retrieved from 2017 to 2019 with Landsat-8 (Figure 7), indicate that there are strong LST variations (almost 10 K) around the station, which are likely due to differences in soil and vegetation. The area selected to extract the Landsat LSTs (black circle) generally has slightly lower LST values than the area directly around the station (purple circle). However, the bias values are still quite high (Table 3), and the high spatial heterogeneity at this small scale makes it difficult to select a single representative area for the station.

BSRN Stations
The bias obtained at GOB station is quite high (2.3 K and 1.9 K for Landsat 7 and 8, respectively), while at GBB, which is located in the same area, the bias is much lower (0.1 K, -0.1 K, and 1.1 K for Landsat 5, 7, and 8, respectively). At GOB, the downwelling radiance sensor is located near the Gobabeb Research and Training Center, while the upwelling sensor is located approximately 7 km northeast. Since the upwelling radiance is most relevant for LST derivation, we've extracted the LST times-series for the Landsat at the location of the upwelling sensor. Nevertheless, the distance between the two sensors may introduce additional uncertainty into the estimation of the in situ LST.

KIT Stations
Landsat LST shows good agreement with the KIT stations. While EVO and KAL are very homogeneous in terms of land cover, they are heterogeneous when considering the different surface elements. Therefore, a proper upscaling of their in situ LSTs is required [40,63,64]. Given Landsat's high spatial resolution, we assume that the temperature contrasts between trees/shrubs and sunlit ground are most relevant and neglect possible shadowing effects. This assumption may lead to a slight increase in the in situ LST uncertainty. Also, the upscaling methodology is sensitive to the prescribed fraction of trees/shrubs, particularly at EVO, where LST contrasts between tree crowns and sunlit ground are the highest. Ermida et al. [64] reported that a 5% error in the fraction of tree crown cover could lead to an additional LST uncertainty up to 1.4 K in summer.
The GBB site has been carefully characterized in order to provide very accurate LST estimations for validation exercises. The instrument characteristics are similar to those generally seen in TIR satellite sensors and therefore is likely to provide LST estimates closer to the satellite estimates. Also, the emissivity of the site was obtained experimentally, increasing the accuracy of the in situ LST estimates.
When comparing the three Landsats, we were not able to identify a clear pattern of one sensor outperforming the other. Although we found a slightly poorer algorithm performance for Landsat-7, the general statistics do not indicate a lower LST retrieval quality for this sensor. This may be due to the higher spatial resolution of Landsat 7, which partially compensates the lower radiometric accuracy by reducing problems associated with site heterogeneity. However, when considering only the most homogeneous and stable stations GBB, CAB, and GOB, Landsat-7 LSTs show higher RMSE values that are suspected to be related to limitations in the atmospheric correction. Moreover, the algorithm performance is only slightly reduced for extremely moist atmospheres (Figure 3; TCWV above 5 cm), which rarely occur under clear skies outside the tropics.
It was not possible to validate Landsat-4 LST retrievals since in situ data are unavailable for its period of operation. However, given the nearly identical instrument characteristics of Landsat 4 and 5, we expect the quality of the Landsat-4 LST retrievals to closely resemble that of Landsat-5.

Conclusions
High-resolution LST estimates are increasingly used in small-scale studies. The Landsat series of satellites is particularly appropriate for such studies as it provides imagery at a high spatial resolution that is suitable for a broad range of applications. Although various algorithms for deriving Landsat LST have been proposed, such data are not easily available and generally lack consistency for the different Landsat satellites. Furthermore, the existing Landsat LST datasets require users to store the involved data locally, and the data processing and analysis may be computationally demanding. Making use of the computational resources of Google's servers, the new GEE online platform allows remote sensing data users to easily analyze large amounts of data. This study provides a GEE repository with all the code necessary to derive LST from Landsat. Furthermore, users are not required to download any data: simply by accessing the code, they can perform all their analyses within GEE. Users can also edit the code to better meet their needs, and improvements may be implemented over time. The GEE repository is publicly available at: https://code.earthengine.google.com/?accept_repo=users/sofiaermida/landsat_smw_lst The LST values are estimated using the SMW algorithm developed by CM-SAF. For each Landsat, the algorithm coefficients were derived using the same calibration database, thereby ensuring consistency between the satellites. All inputs to the algorithm are obtained from the GEE catalog, i.e., the water vapor content from NCEP/NCAR reanalysis data and emissivity from the ASTER GEDv3 dataset with an NDVI-based correction for vegetation dynamics. A validation exercise with in situ LST obtained from 12 stations indicated an overall accuracy of 0.5 K, -0.1 K, and 0.2 K and overall RMSE of 2.0 K, 2.1 K, and 2.1 K for Landsat 5, 7, and 8, respectively.
As the code is available publicly, future improvements to this methodology may be implemented based on user feedback.

Conclusions
High-resolution LST estimates are increasingly used in small-scale studies. The Landsat series of satellites is particularly appropriate for such studies as it provides imagery at a high spatial resolution that is suitable for a broad range of applications. Although various algorithms for deriving Landsat LST have been proposed, such data are not easily available and generally lack consistency for the different Landsat satellites. Furthermore, the existing Landsat LST datasets require users to store the involved data locally, and the data processing and analysis may be computationally demanding. Making use of the computational resources of Google's servers, the new GEE online platform allows remote sensing data users to easily analyze large amounts of data. This study provides a GEE repository with all the code necessary to derive LST from Landsat. Furthermore, users are not required to download any data: simply by accessing the code, they can perform all their analyses within GEE. Users can also edit the code to better meet their needs, and improvements may be implemented over time. The GEE repository is publicly available at: https://code.earthengine.google.com/?accept_repo=users/sofiaermida/landsat_smw_lst. The LST values are estimated using the SMW algorithm developed by CM-SAF. For each Landsat, the algorithm coefficients were derived using the same calibration database, thereby ensuring consistency between the satellites. All inputs to the algorithm are obtained from the GEE catalog, i.e., the water vapor content from NCEP/NCAR reanalysis data and emissivity from the ASTER GEDv3 dataset with an NDVI-based correction for vegetation dynamics. A validation exercise with in situ LST obtained from 12 stations indicated an overall accuracy of 0.5 K, -0.1 K, and 0.2 K and overall RMSE of 2.0 K, 2.1 K, and 2.1 K for Landsat 5, 7, and 8, respectively.
As the code is available publicly, future improvements to this methodology may be implemented based on user feedback.