Land Surface Temperature Retrieval from Landsat 5, 7, and 8 over Rural Areas: Assessment of Di ﬀ erent Retrieval Algorithms and Emissivity Models and Toolbox Implementation

: Land Surface Temperature (LST) is an important parameter for many scientiﬁc disciplines since it a ﬀ ects the interaction between the land and the atmosphere. Many LST retrieval algorithms based on remotely sensed images have been introduced so far, where the Land Surface Emissivity (LSE) is one of the main factors a ﬀ ecting the accuracy of the LST estimation. The aim of this study is to evaluate the performance of LST retrieval methods using di ﬀ erent LSE models and data of old and current Landsat missions. Mono Window Algorithm (MWA), Radiative Transfer Equation (RTE) method, Single Channel Algorithm (SCA) and Split Window Algorithm (SWA) were assessed as LST retrieval methods processing data of Landsat missions (Landsat 5, 7 and 8) over rural pixels. Considering the LSE models introduced in the literature, di ﬀ erent Normalized Di ﬀ erence Vegetation Index (NDVI)-based LSE models were investigated in this study. Speciﬁcally, three LSE models were considered for the LST estimation from Landsat 5 Thematic Mapper (TM) and seven Enhanced Thematic Mapper Plus (ETM + ), and six for Landsat 8. For the accurate evaluation of the estimated LST, in-situ LST data were obtained from the Surface Radiation Budget Network (SURFRAD) stations. In total, forty-ﬁve daytime Landsat images; ﬁfteen images for each Landsat mission, acquired in the Spring-Summer-Autumn period in the mid-latitude region in the Northern Hemisphere were acquired over ﬁve SURFRAD rural sites. After determining the best LSE model for the study case, ﬁrstly, the LST retrieval accuracy was evaluated considering the sensor type: when using Landsat 5 TM, 7 ETM + , and 8 Operational Land Imager (OLI), and Thermal Infrared Sensor (TIRS) data separately, RTE, MWA, and MWA presented the best results, respectively. Then, the performance was evaluated independently of the sensor types. In this case, all LST methods provided satisfying results, with MWA having a slightly better accuracy with a Root Mean Square Error (RMSE) equals to 2.39 K and a lower bias error. In addition, the spatio-temporal and seasonal analyses indicated that RTE and SCA presented similar results regardless of the season, while MWA di ﬀ ered from RTE and SCA for all seasons, especially in summer. To e ﬃ ciently perform this work, an ArcGIS toolbox, including all the methods and models analyzed here, was implemented and provided as a user facility for the LST retrieval from Landsat data.

and the same number of images for the three Landsat missions. Landsat data can be downloaded through the USGS 'Earth Explorer' website free of charge.
Landsat 5 TM and Landsat 7 ETM+ have six reflective bands (visible, near-infrared, and short-wavelength infrared, 30-m spatial resolution) and one band in the TIR region (Band 6). The thermal band has a native spatial resolution of 120-m and 60-m for TM and ETM+, respectively, but it is delivered by USGS at 30-m after cubic convolution resampling. The Landsat 8 OLI sensor has nine reflective bands with 30-m spatial resolution, and Landsat 8 TIRS sensor has two bands in the TIR region (Band 10 and Band 11). These thermal bands have a 100-m native spatial resolution but resampled and published at 30 m by USGS.
Appendix A reports the information of Landsat data utilized in the study (forty-five images from 2000 to 2019), as well as meteorological data (near-surface temperature and relative humidity) and NDVI value information corresponding to the acquisition times of Landsat data.

SURFRAD Data and Validation Sites
SURFRAD network was established by NOAA Office in 1993 in order to support climate-related researches over the US. SURFRAD stations have been measuring accurate, continuous, and long-term in-situ surface radiation budget [63]. The system became operational in 1995 with four stations, and, currently, eight SURFRAD stations are operating in different climatological regions of the US. Upwelling and downwelling components of solar and infrared radiation are the primary measurements. Besides, ancillary observations include direct and diffuse solar radiation, ultraviolet-B radiation, and meteorological parameters. Since SURFRAD stations provide unique in-situ LST information over rural sites, many researchers have used these data to validate satellite-based LST retrievals [15,46,52,53,56,[64][65][66][67]. In this study, five SURFRAD stations were considered as ground-based stations, and Table 1 reports information regarding the SURFRAD experimental sites. We also analyzed the LST retrieval at the SURFRAD station of Table Mountain, Boulder, Colorado (TBL). However, the LST differences between the satellite and the ground were high due to its elevation and the heterogeneity of the land cover as also assessed in other studies [15,59,68,69]. Thus, TBL station was not considered in the analyses. SXF (Sioux Falls, South Dakota) and SGP (ARM Southern Great Plains Facility, Oklahoma) station data were not processed in this study. The native spatial resolution of the Landsat thermal channels spans from 60 to 120 m, even if pixels are resampled at 30 m by USGS. The SURFRAD pyrgeometer used to measure the upwelling radiation is deployed at a 10-m high tower, producing an effective diameter of the field-of-view of about 40 m at the surface, i.e., roughly of the same order of the Landsat pixel size. Therefore, the Landsat pixel covering the SURFRAD instrument was selected for the comparison test.

LST Retrieval Methods
As previously pointed out, the following four LST retrieval methods will be considered: Mono Window Algorithm (MWA) [43], Single Channel Algorithm (SCA) [44], Radiative Transfer Remote Sens. 2020, 12, 294 5 of 32 Equation (RTE) method and Split Window Algorithm (SWA) [45,46]. While the first three methods can be applied to Landsat 5 TM, 7 ETM+ and 8 OLI/TIRS data, the SWA is applicable only to Landsat 8 OLI/TIRS data, since it requires at least two TIR bands. The essential differences between these methods are in the mathematical formulation and the input parameters [70]. In addition to the emissivity and the atmospheric transmissivity common to all methods, MWA needs near-surface air temperature for the effective mean atmospheric temperature computation unlike other methods. Conversely, RTE and SCA require the upwelling and downwelling atmospheric radiances for LST retrieval. The sensitivity of the input parameters on LST retrieval methods is reported in Appendix D.

Mono Window Algorithm
Mono Window Algorithm (MWA) was developed by Qin et al. [43] for Landsat TM data. The method requires three main parameters, i.e., emissivity, atmospheric transmittance, and effective mean atmospheric temperature. LST values from MWA can be estimated as: where T s is the LST in Kelvin, T is the at-sensor brightness temperature in Kelvin, T a is the effective mean atmospheric temperature in Kelvin, τ is the atmospheric transmittance, ε represents LSE, a and b are the algorithm constants, C and D are the algorithm parameters calculated using LSE and transmittance. A detailed description of the computations of the T, T a and τ parameters adopted in this work, are reported in Appendix B. The different LSE models tested in this work will be described in Section 4.

Single-Channel Algorithm
Jiménez-Muñoz et al. [44] introduced a revision of the Single-Channel Algorithm (SCA) to retrieve LST from Landsat TIR data. Considering SCA, LST (T s ) can be computed using the following general equation: where ε is the LSE, L sen is the at-sensor radiance of thermal band, ψ 1 , ψ 2 , and ψ 3 are atmospheric functions, and γ, δ are two parameters given by: where b γ = c 2 /λ i with c 2 = 14,387.7 µm·K and λ i is the effective band wavelength for band i, which is defined as: where f i (λ) is the spectral response function for the corresponding band. λ 1, i and λ 2, i are the lower and upper boundary of f i (λ), respectively. The value of b γ is equal to 1256 K and 1277 K for Band 6 of Landsat 5 and Landsat 7, respectively; for Band 10 and Band 11 of Landsat 8, it is equal to 1320 K and 1199 K, respectively. Atmospheric functions ψ 1 , ψ 2 , and ψ 3 are defined as: Remote Sens. 2020, 12, 294 6 of 32 where L ↑ λ (W·m −2 ·sr −1 ·µm −1 ) is upwelling or atmospheric path radiance, L ↓ λ (W·m −2 ·sr −1 ·µm −1 ) is downwelling or sky radiance. In this study, the atmospheric parameters τ, L ↑ λ and L ↓ λ used for the ψ 1 , ψ 2 , and ψ 3 computation are reported in Appendix B.

Radiative Transfer Equation Method
A straightforward method to retrieve LST from a single TIR band is the inversion of the radiative transfer equation (RTE) according to the following expressions: where L sen λ (W·m −2 ·sr −1 ·µm −1 ) is at-sensor registered radiance of the related thermal band, B λ (W·m −2 ·sr −1 ·µm −1 ) is the blackbody radiance. Blackbody radiance (B λ ) at a temperature of T s can be obtained by inverting the Equation (7): and, finally, T s can be obtained by inverting Planck's law as: where K 1 and K 2 are calibration constants for Landsat data reported in Appendix B.

Split-Window Algorithm
In previous studies, various Split Window Algorithms (SWAs) have been introduced for different sensors [4,[71][72][73][74] and detailed information of SWAs is reported in [7]. Among the different SWAs in the literature, in this study we considered SWA developed by Mao et al. [45] with coefficients re-parameterized by Yu et al. [46], corresponding to the Landsat 8 TIRS' spectral response curve. The USGS recommended not to use Band 11 of Landsat 8 for LST retrieval due to the large calibration uncertainty [75]. However, some researchers claimed that they obtained satisfactory results via SWA [46,76]. Thus, we also analyze and present SWA results in this study. According to SWA, LST (T s ) can be calculated using the following equations: B 1 = C 10 C 11 A 10 −C 10 A 11 (12) A 10 = ε 10 τ 10 (13) where ε 10 and ε 11 represent LSE for Band 10 and 11, respectively, τ 10 and τ 11 the atmospheric transmittance for Band 10 and 11, respectively, calculated as reported in Appendix B. L 10 and L 11 can be computed from Table 2 within a specific brightness temperature range for Band 10 (T 10 ) and Band 11 (T 11 ), respectively. In Table 2, "a" is the slope and B(K) is the intercept of linear regression. For example, if the brightness temperature of B 10 ranges between 20 and 50 • C, L 10 can be calculated by 0.4464 * T 10 − 66.61.

Land Surface Emissivity (LSE) Models
Surface emissivity stands for the surface ability that transforms heat energy into radiant energy [36]. LSE (ε) is one of the key parameters to retrieve accurate LST from remotely sensed imagery. Semi-Empirical Methods (SEMs), Physically-Based Methods (PBMs), and multi-channel Temperature/Emissivity Separation (TES) methods are three distinctive categories for LSE retrieval from space [7]. PBMs and multi-channel TES methods are not operational for Landsat data to obtain LSE due to the limitations presented in many studies, such as the requirement of more than two TIR bands or nighttime images [7,46,[76][77][78][79]. SEMs contain the Classification Based Emissivity Method (CBEM) [74,80] and the NDVI Based Emissivity Method (NBEM) [81,82], which are suitable for LSE estimation from Landsat data. The CBEM generates an LSE image from a classified image by applying an emissivity value for each class. However, CBEM is not practical since it requires a good knowledge of the study area and emissivity measurements on the surfaces representative of the different classes [70]. NDVI-based methods are operative and the most commonly utilized LSE retrieval methods since they are easy to apply and presenting satisfying results [36,58,83]. Li et al. [7] presented a detailed study showing the advantages, disadvantages, and limitations of different LSE models for LST retrieval from satellite data. Considering the study of Li et al. [7] and other researches, a state-of-the-art table showing different LSE categories and models, as well as the correspondent satellite data used is reported (Table 3).  As presented in Table 3, there are six NDVI-based models introduced for Landsat data, specifically three for Landsat TM and three for Landsat 8 OLI/TIRS. Therefore, we investigated the effect of these six LSE models on the accuracy of LST retrieval methods. Details about LSE models are presented in the following sub-sections. The sensitivity of the LSE on LST retrieval methods is reported in Appendix D.

LSE Model of Van de Griend and Owe
This model was applied to LST retrieval methods of all Landsat series (Landsat 5 TM, 7 ETM+, and 8 OLI/TIRS). Van de Griend and Owe [83] proposed a logarithmic approach for an LSE retrieval model based on NDVI ranging from 0.157 to 0.727. NDVI is obtained using the Near-Infrared (NIR) and Red (R) bands-the calculation steps of NDVI for Landsat 5, 7, and 8, are presented in Appendix C. The proposed model is given by:

LSE Model of Valor and Caselles
This model was applied to LST retrieval methods of all Landsat series (Landsat 5 TM, 7 ETM+, and 8 OLI/TIRS). Valor and Caselles [82] proposed a theoretical model that relates the emissivity to the NDVI of a given surface by: ε v and ε s represent the emissivity of vegetation and soil, respectively. dε is a term accounting for the cavity effect, which depends on the surface geometry. Pv (also referred to as fractional vegetation cover, FVC) is the proportion of vegetation calculated as [103]: (19) where NDVI max = 0.5 and NDVI min = 0.2 in a global situation [70]. As Valor and Caselles [82] suggested, ε v and ε s as 0.985 and 0.960, respectively, for unknown emissivity and vegetation structures, we also regarded these emissivity values in the calculation. Besides, they calculated the mean value for dε term as 0.015, and we utilized this value in LSE retrieval with this model. The final version of the LSE model can be given by:

NDVI Threshold (NDVI THM )-Based LSE Models
Sobrino et al. [58], Skoković et al. [60], Yu et al. [46], and Li and Jiang [76] estimated LSE from NDVI threshold (NDVI THM ) values considering three different cases as presented in Equation (21). In the first case (NDVI < 0.2), the pixel is considered as bare soil, and the emissivity is obtained from the reflectance values in the red region. In the second case (0.2 ≤ NDVI ≤ 0.5), the pixel is composed of a mixture of bare soil and vegetation, and in the third case (NDVI > 0.5), the pixels with NDVI values higher than 0.5 are considered as fully vegetated areas.
In Equation (21), ρ R is the reflectance value of the red band, a i and b i are estimated from an empirical relationship between the red band reflectance and Moderate Resolution Imaging Spectroradiometer (MODIS) emissivity library. ε v and ε s are the soil and vegetation emissivity, respectively. dε is the cavity effect due to surface roughness as in the previous model (dε = 0 for flat surfaces). F is a geometrical shape factor assumed as the mean value of 0.55 [70]. Table 4 presents the expressions of NDVI THM for all models mentioned above.

LST Computation Using Ground-Based SURFRAD Data
As stated in Section 2.2, image-based LST results were validated using the data of five ground-based SURFRAD stations. Since these stations do not provide LST measurements directly, LST is calculated from the upwelling and downwelling longwave radiation measurements using the following Equation (22) with regard to the Stefan-Boltzmann law: where F ↑ λ and F ↓ λ represent upwelling and dowelling thermal infrared (3-50 µm) irradiance in W/m 2 , respectively, measured during satellite passages. σ is the Stefan-Boltzmann constant (5.670367 × 10 −8 W·m −2 ·K −4 ), and ε b represents the broadband longwave surface emissivity, which is not measured by the station instruments. In previous studies on SURFRAD stations [53,56,65], the broadband emissivity was computed as reported in [104,105] by regression from narrowband emissivity of MODIS thermal bands, which are available through the MODIS monthly emissivity data set. The results in [104,105] proved that the longwave broadband emissivity for the SURFRAD sites could be considered 0.97, as also assumed in [64] and [53].
Therefore, in this study, the broadband emissivity was assumed 0.97. This assumption impacts only the SURFRAD LST estimation, not the satellite-derived estimation. Heidinger et al. [66] investigated the impact of changing the assumed broadband emissivity from 0.97 to 0.98 on the SURFRAD LST observation. The results indicated that a 0.01 error in broadband emissivity produces a SURFRAD LST error that rarely exceeds 0.25 K. In addition, Wang and Liang [105] proved that the sensitivity of the SURFRAD LST to broadband emissivity ranged from 0.1K/0.01 to 0.35K/0.01, which means the accuracy of LST varies between 0.1 K to 0.4 K when the broadband emissivity error is about ±0.01. While this error is not negligible, it does not appear to be a dominant source of uncertainty in the SURFRAD-based performance metrics considering the magnitude of the other uncertainties [66].
Satellite-based LST and SURFRAD-based LST were compared using statistical criteria, namely, the Root Mean Square Error (RMSE). RMSE, in Equation (23), is a widely used statistical metric evaluating the efficiency of the models.
where T SAT and T SURF are the satellite-based LST and SURFRAD-based LST, respectively, and n represents the pixel count.

Results
In order to compare the results of LST retrieval methods, fifteen images of each Landsat mission (Landsat 5, 7, and 8), a total of forty-five images, were utilized in this study. MWA, RTE, and SCA were performed for all satellite data, whilst SWA was only utilized with Landsat 8 data due to the requirement of two TIR bands. The values of the atmospheric and model parameters used in LST retrieval methods for the 45 images are reported in Appendix E. Furthermore, the effects of different LSE models on the accuracy of the LST retrieval methods were investigated.

Results of LST Algorithms and LSE Models Derived from Landsat 5 TM
In Table 5, the accuracy of LST retrieval methods for Landsat 5 TM data is presented based on the different LSE models. Considering the LSE models, Sobrino et al.'s model provided the best results for all three LST retrieval methods. Valor and Caselles' model was the second LSE model presenting satisfying results for LST estimation, whilst Van De Griend and Owe's LSE model provided very high RMSE. The SURFRAD test assessed whether the RTE method was a bit better than MWA and SCA, with a lower RMSE value of 2.35 K.

Comparison of LST Retrieval Algorithms Considering All Landsat Missions
In Sections 6.1-6.3, LST results were analyzed by dividing the sensor types of Landsat missions. In this section, the accuracies of the LST retrieval algorithms with respect to the ground-based LST data were evaluated considering the best LSE Model (Sobrino et al.'s) and all Landsat missions. This comparison can be significant for users who will conduct time-series analyses of LST over rural areas using the data of all Landsat missions. Since SWA was only used with Landsat 8 data, but it is not the best retrieval method, it was not considered in this section for the comparison purposes. In Figure 1a-c, MWA-based, RTE-based, and SCA-based LST results derived from Landsat 5, 7, and 8 data were compared with SURFRAD LST, respectively. The RMSE was 2.39 K for MWA, 2.57 K for RTE, and 2.73 K for SCA. The average biases (ground LST-satellite LST) for MWA, RTE, and SCA are −0.72 K, −1.63 K, and −1.81 K, respectively. Moreover, the error standard deviations (STD) are 2.28 K, 1.98 K, and 2.05 K for MWA, RTE, and SCA, respectively. Even though MWA has a slightly greater error STD, overall, the RMSE for RTE and SCA is higher due to the greater biases. This in-situ test over different rural areas showed that MWA, RTE, and SCA can provide good results with Sobrino et al.'s LSE model, and MWA presented slightly better performance. Figure 1 and biases show that although all methods tend to overestimate LST slightly, the MWA overestimation is lower. The different results, especially the bias, can be ascribed to the accuracy of the input parameters: as reported previously, RTA and SCA have the same input parameters, whilst MWA uses T a instead of L ↑ λ and L ↓ λ . We must also consider the different formulation of the methods: since SCA is derived from a mathematical approximation of RTE [41], it is expected they provide similar results.
The different results, especially the bias, can be ascribed to the accuracy of the input parameters: as from a mathematical approximation of RTE [41], it is expected they provide similar results.

Analysis of Spatio-Temporal and Seasonal LST Variations Between LST Retrieval Methods
As stated in previous sections, Sobrino et al.'s LSE model provided the best performance on all LST retrieval methods. In this section, we focus on investigating the spatio-temporal and seasonal relationship between the LST retrieval methods using the Sobrino et al.'s LSE model. This analysis does not represent an evaluation of the best LST retrieval method, which was assessed in previous sections by a test with SURFRAD data, but it suggests us if there is similarity or not between the three methods (MWA, RTE, and SCA) with the same LSE model under changing emissivity values due to seasonal variations. In this analysis, we consider the LST retrieval methods in a rural area of 6 km x 6 km (40000 pixels) centered on the SURFRAD station of each Landsat image. Therefore, a total of 45

Analysis of Spatio-Temporal and Seasonal LST Variations Between LST Retrieval Methods
As stated in previous sections, Sobrino et al.'s LSE model provided the best performance on all LST retrieval methods. In this section, we focus on investigating the spatio-temporal and seasonal relationship between the LST retrieval methods using the Sobrino et al.'s LSE model. This analysis does not represent an evaluation of the best LST retrieval method, which was assessed in previous sections by a test with SURFRAD data, but it suggests us if there is similarity or not between the three methods (MWA, RTE, and SCA) with the same LSE model under changing emissivity values due to seasonal variations. In this analysis, we consider the LST retrieval methods in a rural area of 6 km × 6 km (40,000 pixels) centered on the SURFRAD station of each Landsat image. Therefore, a total of 45 × example of Landsat 8 LST image over the 6  6 km 2 rural area, acquired on 27 April 2018 and covering the BND station, for the three methods used in this analysis. To analyze the spatio-temporal and seasonal LST variations among the LST retrieval methods, the 45 Landsat images were categorized into three seasons: spring (18 images), summer (13), and autumn (14), and Root Mean Square (RMS) differences were calculated for each image. Winter images were not available due to cloudy conditions (see Table A1). This is not an accuracy test as the one performed by the SURFRAD LST ground measurements, but it is an image-based analysis to highlight the differences among the three retrieval methods. Therefore, the RMS difference is used instead of the RMSE. In the 6x6 km 2 selected areas, the minimum LST values from satellite data for spring, summer, and autumn are 280.29 K, 281.70 K, and 284.67 K, respectively; the maximum are 321.88 K, 330.67 K, and 317.95 K, respectively. Figure 3 shows the box-plot graph presenting the seasonal RMS differences between the LST retrieval methods. The box-plot is used to display To analyze the spatio-temporal and seasonal LST variations among the LST retrieval methods, the 45 Landsat images were categorized into three seasons: spring (18 images), summer (13), and autumn (14), and Root Mean Square (RMS) differences were calculated for each image. Winter images were not available due to cloudy conditions (see Table A1). This is not an accuracy test as the one performed by the SURFRAD LST ground measurements, but it is an image-based analysis to highlight the differences among the three retrieval methods. Therefore, the RMS difference is used instead of the RMSE. In the 6 × 6 km 2 selected areas, the minimum LST values from satellite data for spring, summer, and autumn are 280.29 K, 281.70 K, and 284.67 K, respectively; the maximum are 321.88 K, 330.67 K, and 317.95 K, respectively. Figure 3 shows the box-plot graph presenting the seasonal RMS differences between the LST retrieval methods. The box-plot is used to display distributional characteristics of data [106]. The box-plot information, reported in Figure 3 by numbers, is the minimum (1) (the lowest data point excluding any outliers), first quartile (2), median (3), third quartile (4) and maximum (5) (the largest data point excluding any outliers). The cross "x" in the boxes refers to the mean value of the data set, and the points outside the minimum, and maximum values are assumed as outliers.
Concerning Figure 3, blue box-plots represent the RMS differences between MWA and RTE-based LST values across the seasons. Red and orange box-plots refer to the RMS differences between RTE and SCA-based LST values, and SCA and MWA-based LST values, respectively. distributional characteristics of data [106]. The box-plot information, reported in Figure 3 by numbers, is the minimum (1) (the lowest data point excluding any outliers), first quartile (2), median (3), third quartile (4) and maximum (5) (the largest data point excluding any outliers). The cross "x" in the boxes refers to the mean value of the data set, and the points outside the minimum, and maximum values are assumed as outliers. Concerning Figure 3, blue box-plots represent the RMS differences between MWA and RTE-based LST values across the seasons. Red and orange box-plots refer to the RMS differences between RTE and SCA-based LST values, and SCA and MWA-based LST values, respectively. Figure 3. Seasonal RMS differences between the LST retrieval methods. Box-plot information: minimum (1), first quartile (2), median (3), third quartile (4), and maximum (5). The cross "x" in the boxes is the mean value of the data set. Figure 3 highlights that RTE and SCA have a high level of agreement with each other regardless of the season. MWA is slightly different from RTE and SCA, and in the summer this difference is more evident due to the higher LST dynamics. Since SCA is derived based on RTE [41], and they have the same input parameters, their similarities can also be seen in Figure 3. MWA, on the other hand, uses Ta instead of L λ ↑ and L λ ↓ considered by RTE and SCA. Although the median values of all boxplots and seasons are close to zero (0.11-0.35 K for spring, 0.18-0.94 K for summer, and 0.12-0.43 K for autumn), MWA provides clearly different LST values than RTE and SCA in some summer images. In addition, the mean RMS differences (the cross "x" in boxes) (0.11-0.59 K for spring, 0.24-1.91 K for summer, and 0.12-0.64 K for autumn) reveals the higher variations between MWA and the other two methods in summer. Besides, there are two evident outliers over the maximum value in the summer and autumn for MWA-RTE and SCA-MWA.

Automated LST Extraction Toolbox for Landsat Missions
Different LST retrieval methods and LSE models are not available in packaged RS or GIS software. To overcome these difficulties, some researchers have developed plugins for different software such as ENVI [107], QGIS [108], ArcGIS [109], C++ [110], Python [111], Visual Basic [112] and ERDAS Imagine [113] to extract LST automatically. ERDAS Imagine and ArcGIS software present a visual programming interface which is of vital importance for users without specific knowledge of classical textual programming languages. The ModelBuilder is the visual programming language of ArcGIS software that enables connecting different steps of algorithms to automate the whole process.
GIS models and many remote sensing algorithms require a series of serial tasks called geoprocessing workflows. Thanks to the geospatial models, all steps of an algorithm can be connected to each other to automate the whole process. Although ESRI's ArcGIS is a commercial GIS software, many people and governmental institutions around the world utilize this software since it presents Figure 3. Seasonal RMS differences between the LST retrieval methods. Box-plot information: minimum (1), first quartile (2), median (3), third quartile (4), and maximum (5). The cross "x" in the boxes is the mean value of the data set. Figure 3 highlights that RTE and SCA have a high level of agreement with each other regardless of the season. MWA is slightly different from RTE and SCA, and in the summer this difference is more evident due to the higher LST dynamics. Since SCA is derived based on RTE [41], and they have the same input parameters, their similarities can also be seen in Figure 3. MWA, on the other hand, uses T a instead of L ↑ λ and L ↓ λ considered by RTE and SCA. Although the median values of all box-plots and seasons are close to zero (0.11-0.35 K for spring, 0.18-0.94 K for summer, and 0.12-0.43 K for autumn), MWA provides clearly different LST values than RTE and SCA in some summer images. In addition, the mean RMS differences (the cross "x" in boxes) (0.11-0.59 K for spring, 0.24-1.91 K for summer, and 0.12-0.64 K for autumn) reveals the higher variations between MWA and the other two methods in summer. Besides, there are two evident outliers over the maximum value in the summer and autumn for MWA-RTE and SCA-MWA.

Automated LST Extraction Toolbox for Landsat Missions
Different LST retrieval methods and LSE models are not available in packaged RS or GIS software. To overcome these difficulties, some researchers have developed plugins for different software such as ENVI [107], QGIS [108], ArcGIS [109], C++ [110], Python [111], Visual Basic [112] and ERDAS Imagine [113] to extract LST automatically. ERDAS Imagine and ArcGIS software present a visual programming interface which is of vital importance for users without specific knowledge of classical textual programming languages. The ModelBuilder is the visual programming language of ArcGIS software that enables connecting different steps of algorithms to automate the whole process.
GIS models and many remote sensing algorithms require a series of serial tasks called geo-processing workflows. Thanks to the geospatial models, all steps of an algorithm can be connected to each other to automate the whole process. Although ESRI's ArcGIS is a commercial GIS software, many people and governmental institutions around the world utilize this software since it presents a substantial context for GIS users. Besides, it offers a powerful geospatial model builder (the ModelBuilder) that allows automation and documentation of an algorithm or method. Please see Appendix F for more information about the LST toolbox created based on this study (Supplementary Materials).

Discussion
As stated in Section 4, CBEM and NDVI-based LSE models are two appropriate models for Landsat data; CBEM can be used if emissivity measurements are obtained from in-situ campaigns, but it is not practical if the land cover information is not known accurately. Thus, we evaluated the impact of different NDVI-based LSE models, introduced in previous works, on LST retrieval methods over rural areas. It should be noted that the parameters and coefficients in NDVI-based LSE models were used as proposed in the original articles, and they can also be adapted to any study area with field campaigns. Valor and Caselles [82] stated that the error of their methodology ranges from 0.5% (due to the experimental limitations of the field methods) to 2% (considering the case in which there is no information about the study area). Van de Griend and Owe [83] did not conduct any sensitivity/error analysis but indicated that the correlation coefficient between NDVI and thermal emissivity was 0.941 at a 0.01 level of significance. One of the limitations of NDVI-based LSE models is its ineffectiveness in estimating LSE values for water and urban environments [114,115]. In this study, we just investigated the validation of daytime LST images. For nighttime LST evaluation, the LSE image can be estimated using CBEM derived from in-situ measurements or daytime NDVI-based LSE acquired on close dates. Overall, the proposed LST retrieval methods and LSE models can be implemented for regions other than the US, as well as for nighttime, non-rural, and winter data in clear sky conditions.
Considering the LST validation, error sources come from both satellite-based LST and ground-based LST. Satellite-based LST retrieval is still a challenging process due to the great variability of Earth surfaces and the necessary a priori knowledge about several parameters such as the atmosphere, the LSE, the meteorological conditions and the sensor specifications (spectral responses, signal to noise ratio, spectral resolution, spatial resolution, and viewing angle) [7,32,48,[116][117][118]. Moreover, LST retrieval methods for satellite data are generally proposed considering different conditions and assumptions. Therefore, there is no universal method that always provides accurate LSTs from all satellite TIR data, and it is not easy to say which algorithm is superior to others [7]. The accuracy of the radiometric measurements and emissivity is the primary uncertainty for ground-based LST retrievals [119][120][121][122][123]. Sobrino and Skoković [119,122] presented an example of an error budget for The Global Change Unit (GCU) sites at the University of Valencia, and Table 8 indicates the impact of the parameter uncertainty ranges on ground-based LST. The limitations of this study and future investigations can be reported as follows: (1) Thermal bands have a native spatial resolution of 120 m, 60 m and 100 m for Landsat 5 TM, 7 ETM+, and 8 TIRS, respectively, but they are delivered by USGS at 30-m after cubic convolution resampling. Therefore, we also considered 30 m resampled TIR images which may lead to an error source when conducting pixel-scale validation. Different downscaling methods for TIR or LST data can be utilized as future work to examine the LST accuracy. (2) Although previous work proved that the accuracy of SURFRAD LST varies between 0.1 K to 0.4 K when the broadband emissivity error is about ±0.01, the use of fixed broadband emissivity (0.97) in this study may influence the ground-based LST calculation, but not dominantly. Broadband emissivity estimation models can be implemented in the future to show the optimal model for ground-based LST retrieval. (3) The LST toolbox, presented as a user facility in this study, does not include error analysis. Thus, users should carry out accuracy analysis after obtaining LST images. Besides, it can be improved and generated in an open-source environment as future work.

Conclusions
In this study, three LST retrieval algorithms (RTE, SCA, and MWA) were evaluated using Landsat 5 TM, 7 ETM+, and 8 OLI/TIRS data, and additionally, SWA were assessed for Landsat 8 OLI/TIRS data. Since LSE is one of the most important factors affecting the accuracy of LST retrieval methods, the effects of different NDVI-based LSE models on satellite-based LST accuracy were also investigated. Forty-five images acquired in the Spring-Summer-Autumn period over rural areas in the mid-latitude region in the Northern Hemisphere were obtained over five SURFRAD stations and simultaneous in-situ LST data were utilized for accuracy analyses. To get rid of step-by-step calculation for all LST methods and LSE models as well as for time consuming in processing of the images, an enhanced toolbox was generated for automated LST extraction. This toolbox can be utilized by all GIS users to obtain LST in an easy and practical way.
Three NDVI-based LSE models, namely, Sobrino et al.'s, Valor & Caselles' and Van De Griend & Owe's LSE model, were considered for Landsat 5 TM and 7 ETM+ data to investigate their effects on LST methods. In addition to these three LSE models, three more NDVI-based LSE models (Skoković et al.'s, Yu et al.'s, and Li and Jiang's LSE models) were added to the analyses of Landsat 8 based LST methods. That is, the effects of six LSE models on the performance of LST methods from Landsat 8 data were investigated. To sum up, this study only considered NDVI-based LSE models for the evaluation of LST retrieval methods. Two different approaches were considered: 1) Sensor types of Landsat missions (Landsat 5, 7, and 8) were evaluated separately. 2) LST retrieval methods were compared with each other independently of sensor type, i.e., considering all Landsat missions together. In the toolbox, users can decide which LST method and LSE model they can utilize if they are dealing with the use of Landsat data. Furthermore, if they have their own LSE image, the toolbox makes it possible to use any external LSE image.
The obtained results showed that Sobrino et al.'s LSE model provided the best performance to extract LST for all Landsat missions and LST methods. Although all LST retrieval methods with Sobrino et al.'s LSE model presented satisfying and close results when using Landsat 5 TM data, RTE offered the best accuracy (2.35 K RMSE). The same would apply to Landsat 7 ETM+ data, even if MWA presented the best results (2.24 K RMSE). Again, Sobrino et al.'s LSE model provided higher accuracy for all LST retrieval methods from Landsat 8 data, with MWA as the best method (2.52 K RMSE). Considering all Landsat missions, MWA offered slightly better accuracy than RTE and SCA. Concerning the analyses above, it is hard to say which method is globally the best one, since the accuracy of the input parameters largely affects the performance of the methods. In addition, the spatio-temporal and seasonal comparison among LST retrieval methods revealed that RTE and SCA have a high level of agreement with each other regardless of the season. Instead, MWA presented different results than RTE and SCA, especially in summer.
The results indicated that LSE models have a great impact on the accuracy of satellite-based LST retrieval methods. This study also revealed that Sobrino et al.'s LSE model was the most appropriate model for all Landsat missions and LST retrieval methods over rural areas. Moreover, validation of LST retrieval methods with different LSE models over urban areas is a further challenge to be faced that deserves future investigations. Acknowledgments: The authors thank USGS for providing Landsat 8 satellite imagery free of charge. In addition, the authors thank NOAA for providing in-situ LST measurements from SURFRAD stations publicly (ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/).

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Table A1. Information about Landsat data used in the study (45 images from 2000 to 2019). The last column shows the SURFRAD station included within the satellite scene. SURFRAD codes can be found in: https://www.esrl.noaa.gov/gmd/grad/surfrad/sitepage.html or in Table A1.

. Brightness Temperature (T) Retrieval
The brightness temperature is the temperature of a blackbody that would emit an identical amount of radiation at a definite wavelength [125] and it can be calculated by inverting the Planck function. Considering satellite data, Thermal Infrared (TIR) pixel values are firstly converted into radiance from Digital Number (DN) values. Radiances for TIR band of Landsat 5 TM and 7 ETM+ are obtained using Equation (A1) [126]. Radiance values for Landsat 8 TIRs can be retrieved from Equation (A2) [127].
where L λ is Top of Atmosphere (TOA) spectral radiance (Watts/(m 2 ·srad·µm)), Q CAL is the quantized calibrated pixel value in DN, L MINλ (Watts/(m 2 ·srad·µm)) is the spectral radiance scaled to QCAL MIN , L MAXλ (Watts/(m 2 ·srad·µm)) is the spectral radiance scaled to QCAL MAX , QCAL MIN is the minimum quantized calibrated pixel value in DN and QCAL MAX is the maximum quantized calibrated pixel where L λ is the TOA spectral radiance (Watts/(m 2 ·srad·µm)), M L is the band-specific multiplicative rescaling factor from the metadata, A L is the band-specific additive rescaling factor from the metadata, Q CAL is the quantized and calibrated standard product pixel values (DN). All of these variables can be retrieved from the metadata file of Landsat 8 data. After radiance conversion, brightness temperature image can be generated by Equation (A3) for all Landsat missions [126,127].
where T refers to the effective at-satellite brightness temperature in Kelvin, K 1 (Watts/(m 2 ·srad·µm)) and K 2 (Kelvin) are the calibration constants and L λ is the spectral radiance. The values of the constants (K 1 and K 2 ) were presented in Table A2 since they change from sensor to sensor [126,127].  Table A3 reveals the practical equations to calculate effective mean atmospheric temperature (T a ) by means of near-surface temperature (T o ),essential for MWA [43]. In this work, mid-latitude summer region was considered for the calculation.
We used a mid-latitude summer region model for T a in this work; however, the USA 1976 Standard atmosphere is also suitable for our test sites. Thus, we also investigated the difference in LST when using mid-latitude summer and USA 1976 Standard models with simulations, and we obtained almost 1 K difference in LST in the analyses. National Aeronautics and Space Administration (NASA) provides an atmospheric correction tool, known as the Atmospheric Correction Parameter Calculator (ACPC) that calculates atmospheric transmissivity or transmittance, upwelling, and downwelling radiance. These atmospheric parameters were computed taking the National Centers for Environmental Prediction modeled atmospheric profiles as inputs to the MODTRAN radiative transfer code for a given site and date [128,129]. In this study, atmospheric transmittance, upwelling and downwelling radiance values were calculated using the ACPC for MWA, RTE, and SCA. Alternatively, a radiative transfer code can be used to estimate atmospheric transmittance, upwelling and downwelling radiance.
Considering Landsat 8 data, ACPC presents parameters just for Band 10. Thus, for SWA, atmospheric transmittance values of Band 10 and 11 (τ 10 and τ 11 ) were calculated using water vapor as presented in Table A4 [46].
In addition, we investigated how τ 10 and τ 11 vary when we use USA 1976 Standard atmosphere instead of mid-latitude summer model in SWA. The difference in τ 10 and τ 11 varies from −0.001 to 0.004 and −0.004 to 0.035, respectively, for the water vapor range of Table A4. As reported in Table A5 of Appendix D, a 1% uncertainty for transmissivity in SWA results in ±0.29 K variation in LST (Table A5). Table A4. The relationship between atmospheric transmittance (τ 10/11 ) and water vapor content (w).

Model Water Vapor Range Equation
Mid-latitude Summer Region 0.2-3.0 g/cm 2 τ 10 = −0.0164w 2 −0.04203w+0.9715 Water vapor content (w) can be either accessed from meteorological stations or calculated from Relative Humidity (RH) and near-surface temperature (T o ) using the following equation [130]: where w i (g/cm 2 ) is the water vapor content, To is the near-surface temperature in Kelvin, and RH (%) refers to the relative humidity.
where ρ NIR is the reflectance band in the NIR region and ρ R refers to the reflectance band in the R region.

Appendix D
Since the input parameters used in the retrieval methods inevitably have errors, affecting the LST accuracy, some papers reported sensitivity analyses of the input parameters on LST methods [131][132][133]. In this appendix a sensitivity analysis of each retrieval method to a specific input parameter is carried out, with the other input parameters fixed (Table A5). The input parameters are: effective mean atmospheric temperature, LSE, atmospheric transmittance, upwelling radiance and downwelling radiance of the atmosphere.
We assumed the near surface air temperature to be 295 K, thus the effective mean atmospheric temperature is calculated as 289.24 K. The atmospheric transmittance was assumed to be 0.77 and upwelling and downwelling radiances were assumed as 1.74 W·m −2 ·sr −1 ·µm −1 and 2.82 W·m −2 ·sr −1 ·µm −1 . We assumed the brightness temperature range from 285 K to 300 K, since the variation in the brightness temperature also affect the results. The LSE value was fixed as 0.97. Considering SWA, we observed the average difference between τ 10 and τ 11 as 0.05. Thus, we assumed τ 10 and τ 11 to be 0.82 and 0.77, respectively. A fixed value of 1.5 K for T 10 -T 11 as in [131]. Table A5 dhows that LSE is the most important parameter influencing the results of MWA and SWA compared to the other inputs. The sensitivity of L ↑ λ to the results of RTE and SCA is higher than L ↓ λ .

Appendix E
Required atmospheric and model parameters were derived for each satellite image and presented in Table A6. In this table, atmospheric parameters include upwelling radiance (L ↑ λ ), downwelling radiance (L ↓ λ ), atmospheric transmittance (τ), and mean atmospheric temperature (T a ); model parameters include Earth-sun distance (d), solar zenith angle (θ sz ) for Landsat 5 and 7, and sun elevation angle (θ se ) for Landsat 8. L ↑ λ , L ↓ λ and τ were calculated using NASA's ACPC (see Appendix B.3), and T a was obtained from Table A3 for mid-latitude summer region. d and θ se are obtained from metadata file of the Landsat data, and θ sz is equal to 90 o − θ se . Earth-sun distance "d" is not necessary for Landsat 8 data to obtain spectral reflectance. Thus, this column in the table is empty for Landsat 8. In addition to Table A6, Table A7 presents transmittance values for Band 10 and Band 11 of Landsat 8 TIRS required  for LST retrieval using SWA, and they were calculated from Table A4.

Appendix F
In this study, a total of 49 individual models were generated in the ModelBuilder for automated LST retrieval using different LST retrieval algorithms and LSE models (Supplementary Materials). Apart from SWA, MWA, RTE, and SCA were modeled for Landsat 5 TM, Landsat 7 ETM+ and Landsat 8 OLI/TIRS data. Since SWA requires more than one thermal band, it can be only implemented for Landsat 8 TIRS data. Figure A1 illustrates the toolbox in ArcGIS catalog window showing the LST retrieval models for the related Landsat missions. The toolbox consists of three main parts with reference to the three Landsat missions, and each mission was categorized considering different LSE models for LST retrieval methods. Furthermore, if the users have their own LSE image generated by a model different from those studied here, they can also use this toolbox by selecting the external LSE model for each Landsat mission.
In addition to Figure A1, the interface of the MWA method using Sobrino et al.'s LSE model and Landsat 5 TM data is presented in Figure A2 as an example. As shown in Figure A2, the users only select the required inputs and the destination folder for the LST image. Thus, this geospatial toolbox makes the processing of Landsat images much easier than step-by-step calculation. Researchers, who would like to use this toolbox, can get in touch with the authors without any hesitation. In addition to Figure F1, the interface of the MWA method using Sobrino et al.'s LSE model and Landsat 5 TM data is presented in Figure F2 as an example. As shown in Figure F2, the users only select the required inputs and the destination folder for the LST image. Thus, this geospatial toolbox makes the processing of Landsat images much easier than step-by-step calculation. Researchers, who would like to use this toolbox, can get in touch with the authors without any hesitation.