Comparison of Multiple Linear Regression , Cubist Regression , and Random Forest Algorithms to Estimate Daily Air Surface Temperature from Dynamic Combinations of MODIS LST Data

Recently, several methods have been introduced and applied to estimate daily air surface temperature (Ta) using MODIS land surface temperature data (MODIS LST). Among these methods, the most common used method is statistical modeling, and the most applied algorithms are linear/multiple linear regression models (LM). There are only a handful of studies using machine learning algorithm models such as random forest (RF) or cubist regression (CB). In particular, there is no study comparing different combinations of four MODIS LST datasets with or without auxiliary data using different algorithms such as multiple linear regression, random forest, and cubist regression for daily Ta-max, Ta-min, and Ta-mean estimation. Our study examines the mentioned combinations of four MODIS-LST datasets and shows that different combinations and differently applied algorithms produce various Ta estimation accuracies. Additional analysis of daily data from three climate stations in the mountain area of North West of Vietnam for the period of five years (2009 to 2013) with four MODIS LST datasets (AQUA daytime, AQUA nighttime, TERRA daytime, and TERRA nighttime) and two additional auxiliary datasets (elevation and Julian day) shows that CB and LM should be applied if MODIS LST data is used solely. If MODIS LST is used together with auxiliary data, especially in mountainous areas, CB or RF is highly recommended. This study proved that the very high accuracy of Ta estimation (R2 > 0.93/0.80/0.89 and RMSE ~1.5/2.0/1.6 ◦C of Ta-max, Ta-min, and Ta-mean, respectively) could be achieved with a simple combination of four LST data, elevation, and Julian day data using a suitable algorithm.


Introduction
Air surface temperature (T a ) with high spatial and temporal resolution plays an important role in various applications, such as crop growth monitoring and simulations [1], hydrological, ecological, and environmental studies [2][3][4], weather forecasting [5,6], and climate change [7,8].It is used as a key input variable and directly affects the accuracy of these applications.Traditionally, T a is usually measured by weather stations (often at 2 m above the ground) and usually limited in spatial coverage.Especially in mountainous areas of Vietnam, weather station coverage is extremely sparse.Meanwhile, satellite data available at various spatial and temporal resolutions, such as Landsat, the Advanced Very High Resolution Radiometer (AVHRR), Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), and especially Moderate-resolution Imaging Spectroradiometer (MODIS), which was launched in the early 2000s, have marked a significant increase in the quality and quantity of thermal data.The advantage of MODIS is that it can provide Land Surface Temperature (LST) data directly.However, there is a difference between T a and LST because of the complex surface energy budget and multiple related variables between them.
Looking at the current literature, there are plentiful T a estimation studies; however, studies using machine learning techniques such as cubist regression (CB) or random forests (RF) are very rare (as far as we know, only [18,[24][25][26].However, all of these studies used MODIS LST integrating auxiliary data and estimated only T a-max or T a-mean .Furthermore, their conclusions are also different.Meyer et al. [26] stated that RF algorithms show the weakest results among linear regression, generalized boosted regression models (GBM), and Cubist regression.In contrast, Xu et al. [25] concluded that RF outperforms the linear regression.Zhang et al. [18] divided their data record into two groups (group S1 contains all four MODIS LST under good quality and group S2 had at least one LST with poor quality).The results based on the two datasets are different: in group S1, RF shows the best results in almost all combinations, but in group S2 the best algorithm is the Cubist regression.As a final result, the best algorithm for daily T a-max , T a-min , and T a-mean estimation remains unknown.
Regarding MODIS LST data (v005), LST data are not available for a location (pixel) if cloudiness is present inside the pixel [27].Due to the differences in satellite overpass times, the valid observation data at a specific location (pixel) varies between LST ad , LST an , LST td , and LST tn .Therefore, it is important to compare the dynamic combination of one to four LST data that are available at different times and locations as well as the most suitable algorithm to apply for T a estimation.Furthermore, a rising question using LST MODIS solely is the kind of relationship (linear or nonlinear) between LST and T a , especially in mountainous areas.
Therefore, in this research, we investigate all 15 (i.e., 2 4 − 1) possible dynamic combinations of four LST with or without auxiliary data for daily T a estimation using three different algorithms: multiple linear regression (LM), cubist regression (CB), and random forests models (RF).Finally, the accuracies of these T a -estimated are evaluated by comparison with T a -measured data, which are collected from weather stations.Root mean square error (RMSE) and coefficient of determination (R 2 ) were used as the model evaluation scores.

Study Area and Weather Station Data
The study area is located in northwest Vietnam inside two large provinces: Lai Chau and Dien Bien.It covers an area of 18,600 km 2 (Figure 1).The study area presents a rural and mountainous region in northwest Vietnam with a sparse distribution of weather stations.There are only four weather stations (Figure 1) within these two provinces.However, due to the lack of data measurement, we chose only three stations, Sin Ho, Dien Bien, and Lai Chau, for this study (Table 1).In each station, the T a data were recorded hourly.T a-max and T a-min are the highest (maximum) and lowest (minimum) air surface temperatures that occur on a diurnal cycle (24 h cycle), respectively; T a-mean was calculated by averaging all 24 hourly measurements in a day.Generally, T a-max occurs after solar noon from one to two hours, and T a-min usually occurs shortly before dawn.In this study, we collected daily T a-max , T a-min , and T a-mean from 2009 to 2013 from the Vietnam Institute of Meteorology, Hydrology, and the Environment (IMHEN).
Based on the MODIS land cover type product (MCD12Q1 data of 2010), the major land cover type in this area is forest, covering approximately 64% (Figure 1).
after solar noon from one to two hours, and Ta-min usually occurs shortly before dawn.In this study, we collected daily Ta-max, Ta-min, and Ta-mean from 2009 to 2013 from the Vietnam Institute of Meteorology, Hydrology, and the Environment (IMHEN).
Based on the MODIS land cover type product (MCD12Q1 data of 2010), the major land cover type in this area is forest, covering approximately 64% (Figure 1).

MODIS LST
All MODIS LST data used in this study were acquired from the U.S. Geological Survey (USGS) website [28].
We used two MODIS LST products (v005, h27v06), MOD11A1 and MYD11A1 from TERRA and AQUA satellites, respectively.The MODIS LST consists of daytime and nighttime data at a spatial resolution of 1 km.Thus, in total there are four LST datasets: AQUA daytime (LSTad), AQUA nighttime (LSTan), TERRA daytime (LSTtd), and TERRA nighttime (LSTtn).
In the literature, there are some studies that use eight-day LST averages to estimate Ta [13,14,29].It should be considered that eight-day-average LST is calculated by averaging all valid data under clear sky conditions, the number of participant data points varying from one to eight days depending on availability.Meanwhile, eight-day-average Ta is calculated by averaging the data under changing sky conditions.Therefore, if we compare eight-day-average LST and eight-day-average Ta, the sampling may introduce uncertainty [22].Taking this difference into consideration, in this study we decided to use daily LST under clear sky conditions instead of eight-day-average LST data.

MODIS LST
All MODIS LST data used in this study were acquired from the U.S. Geological Survey (USGS) website [28].
We used two MODIS LST products (v005, h27v06), MOD11A1 and MYD11A1 from TERRA and AQUA satellites, respectively.The MODIS LST consists of daytime and nighttime data at a spatial resolution of 1 km.Thus, in total there are four LST datasets: AQUA daytime (LST ad ), AQUA nighttime (LST an ), TERRA daytime (LST td ), and TERRA nighttime (LST tn ).
In the literature, there are some studies that use eight-day LST averages to estimate T a [13,14,29].It should be considered that eight-day-average LST is calculated by averaging all valid data under clear sky conditions, the number of participant data points varying from one to eight days depending on availability.Meanwhile, eight-day-average T a is calculated by averaging the data under changing sky conditions.Therefore, if we compare eight-day-average LST and eight-day-average T a , the sampling may introduce uncertainty [22].Taking this difference into consideration, in this study we decided to use daily LST under clear sky conditions instead of eight-day-average LST data.

MODIS Land Cover
The MODIS Land Cover Type Product (MCD12Q1) is downloaded from the Land Processes Distributed Active Archive Center [30].In order to use this product easily in the community, four main classification schemes were provided, including IGBP (International Geosphere-Biosphere Programme), UMD (University of Maryland), LAI/fPAR (Leaf Area Index/fraction of Photosynthetically Active Radiation), and NPP (Net Primary Productivity) [30,31].For our study, we use the primary land cover scheme, which is provided by the IGPB land cover classification.Based on this scheme, our study has 13 types of land cover classes.However, in order to make it easy to use and distinguish between each class, consistent with the land cover of the study area we combined and reduced the classes to six types (Figure 1).As is shown in Figure 1, the majority of land cover in the study area is forest and crop land.
In addition, based on the results of our previous study [17], we take two more variables into account for T a estimation in northern Vietnam: station elevation (el) and Julian day data.Elevations of stations were obtained from the Vietnam Institute of Meteorology, Hydrology and Environment (IMHEN).The Julian day (jd) was extracted from the NASA server [32].

Calculating LST of Weather-Station-Location
LST data under clear sky conditions at weather stations are retrieved by the following steps:

•
All these LST data (DN value) were converted to Celsius temperature using the following equation: where • C is the Celsius temperature and 0.02 is the scale factor of the MODIS LST product.

•
Removing outlier data: MODIS LST products are not available for a location (pixel) if clouds are present [27].However, there are some pixels that are lightly covered or contaminated by clouds.These pixels are not removed because the contamination is very small and cannot be detected by the cloud-removing mask algorithm [33,34].To avoid this kind of data, we studied and developed a similar method that was used in [35].This approach includes two steps: First, we simply filter and remove all unrealistic LST data that had values greater than 100

• Dynamic Combination of MODIS LST data
To estimate daily T a , we used all possible combinations of four LST data (LST ad , LST an , LST td , and LST tn ).These 15-combinations are shown in Table 2.
Due to the cloud cover effect, the number of valid observations from each station and each combination (C01-C15) are various (Table 2).Due to some missing T a-max , T a-min , and T a-mean observations, the number of valid observations in Table 2 differs from that in Figure 2.
In order to investigate the difference between dynamic combinations, as well as the performance of different algorithms, we used two datasets: Dataset A, MODIS LST data only; and Dataset B, MODIS LST together with elevation (ele) and Julian day (jd) data.

•
Algorithms used Linear/Multiple Linear Regression Model (LM) is a model that represents the relationship between one response variable and one predictor variable (Simple Linear Regression) or more than one predictor variable (Multiple Linear Regression) by using parameters entered linearly and estimated by the least squares method.So far, LM is one of the most popular statistical models for T a estimation using MODIS LST [14,17,22,25,36,37].Although it was found that the correlation between LST and T a is high, this relationship may not actually be linear [18].Therefore, our current knowledge might be incomplete if we do not try machine learning algorithms.Machine learning algorithms promise a better estimation of T a using MODIS LST because they can handle non-linearity and highly correlated predictor variables [26,38,39].Furthermore, based on the conceptual designs of machine learning algorithms, they are able to deal with data that have a different relationship between predictor and response variables under different conditions such as season, elevation, and land cover characteristic [26].
Random Forests (RF), which was proposed by Breiman in 2001 [40], is a nonparametric and ensemble technique.Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.It is different from traditional statistical methods that contain a parametric model for prediction.In RF, it contains many decision trees, where each tree is built from a random subset of training data with a random subset of predictor variables.The final predicted values are produced by the aggregation of the results of all the individual trees that make up the forest [25].
Cubist regression (CB) is a rule-based regression technique that was developed based on a combination of the ideas of Quinlan [41][42][43].CB does not retrieve one final model like RF, but a set of rules associated with sets of multi-variate models.Then, a specific set of predictor variables will choose an actual prediction model based on the rule that best fits the predictors [44].Cubist is a commercial, proprietary product and has the least algorithmic documentation in comparison to linear regression and random forest [45].However, it is currently a popular and widely used regression and classification method because it was ported into R by Kuhn et al. [46] in 2013.Most recently, it was used in T a estimation research and showed very good results in the research of Meyer et al. [26] and Zhang et al. [18].
Therefore, in this study, to estimate T a and assess the accuracy of estimation, three different methods were employed: linear regression (LM), cubist regression (CB) and random forests (RF).All methods are performed in the R statistical software.

•
Assessment Criteria To assess the performance of models, we used and compared the values of the two most popular criteria: the coefficient of determination (R 2 ) and the root mean square error (RMSE) that were calculated from the measured and estimated T a values from three algorithms: LM, CB, and RF.

• Comparison
Being one of the most popular validation methods, cross-validation was used in order to compare different combinations and different algorithms.
In order to implement the cross-validation, the dataset is divided into k groups (k-fold) of approximately the same size.Then, k − 1 groups of the dataset are used as the training set, and the left-out group is used for validation.When the number of groups (k) equals the number of observations (n), it is called "leave-one-out cross-validation".
Due to the high number of observations, we used 10-fold cross-validation (k = 10) and repeated it twice for cross-validation.

The Relationship between T a and LST MODIS
In order to evaluate the MODIS LST data for T a estimation, we first test the relationship between T a and LST MODIS of all three weather stations.
Figure 2 shows the scatter plot of T a and LST of daytime and nighttime.It was found that: The coefficients of determination were high, ranging from 0.43 to 0.72, and the correlation between LST nighttime and T a-min were higher than LST daytime and T a-max at Dien Bien and Sin Ho stations.However, at Lai Chau station, the correlation between LST daytime versus T a-max was slightly higher than nighttime versus T a-min .This indicates that the relationship between MODIS LST and T a of this study area is complex.
The relationships between T a-min versus LST an and LST tn were quite similar at all three stations.However, the relationships between T a-max versus LST daytime were different; at Lai Chau station most T a-max values are higher than the LST ad and LST td values (most of the points lie above the red line).Meanwhile, at Dien Bien station, T a-max is quite similar to LST ad but T a-max was higher than LST td .At Sin Ho station, there is not much difference between T a versus LST but there are a lot of data points lying outside the "±5 lines".
Due to the cloud effect, the number of valid observations changes from daytime to nighttime; during the daytime the AQUA sensor gives more observations than TERRA.However, at night, TERRA gives more observations than AQUA.

Different Combinations of MODIS LST for Ta Estimation
As shown in Figure 2, the linear correlation between Ta and LST are strong for both Terra LST and Aqua LST of daytime and nighttime.Furthermore, in Section 1 we also showed that there are plenty of studies using MODIS LST data for Ta estimation using the LM method.
Therefore, in order to estimate the effect of different MODIS LST data combinations, we applied all three methods (linear regression, cubist regression, and random forest models) to the 15 combinations as shown in Table 2.
In the LM method, the equations of 15 combinations (C01-C15) are shown in Appendixes A and B. However, regarding CB and RF, which are nonparametric methods, equations cannot be provided as for the LM method.

Combinations Using One LST Variable
Figure 3a,b show the coefficient of determination (R 2 ) and root mean square error (RMSE) of combinations C01-C04 using three algorithms (LM, CB, and RF) with Dataset A and Dataset B, respectively.It can be clearly seen that there is a large difference between Figure 3a (using LST solely) and Figure 3b (using LST with elevation and Julian day data).At Figure 3a, LM and CB show similar results and higher accuracy than the RF algorithm in all four combinations (C01-C04).In contrast, Figure 3b shows similar results for CB and RF in all four combinations and slightly higher values than with the LM algorithm.It is suggested that when one LST is used with an auxiliary data for Ta estimation, RF and CB performance are better than LM.
Both Figure 3a,b show that the accuracy of C02 and C04 is much higher than for C01 and C03 (higher value of R 2 and lower value of RMSE).It can be stated that nighttime LST was better than daytime for deriving daily Ta.This result is consistent with [17,47].Regarding the two datasets used, in all combinations (C01-C04) the accuracies of Ta estimation using Dataset B are much higher than when using Dataset A.

Different Combinations of MODIS LST for T a Estimation
As shown in Figure 2, the linear correlation between T a and LST are strong for both Terra LST and Aqua LST of daytime and nighttime.Furthermore, in Section 1 we also showed that there are plenty of studies using MODIS LST data for T a estimation using the LM method.
Therefore, in order to estimate the effect of different MODIS LST data combinations, we applied all three methods (linear regression, cubist regression, and random forest models) to the 15 combinations as shown in Table 2.
In the LM method, the equations of 15 combinations (C01-C15) are shown in Appendixes A and B. However, regarding CB and RF, which are nonparametric methods, equations cannot be provided as for the LM method.

Combinations Using One LST Variable
Figure 3a,b show the coefficient of determination (R 2 ) and root mean square error (RMSE) of combinations C01-C04 using three algorithms (LM, CB, and RF) with Dataset A and Dataset B, respectively.It can be clearly seen that there is a large difference between Figure 3a (using LST solely) and Figure 3b (using LST with elevation and Julian day data).At Figure 3a, LM and CB show similar results and higher accuracy than the RF algorithm in all four combinations (C01-C04).In contrast, Figure 3b shows similar results for CB and RF in all four combinations and slightly higher values than with the LM algorithm.It is suggested that when one LST is used with an auxiliary data for T a estimation, RF and CB performance are better than LM.
Both Figure 3a,b show that the accuracy of C02 and C04 is much higher than for C01 and C03 (higher value of R 2 and lower value of RMSE).It can be stated that nighttime LST was better than daytime for deriving daily T a .This result is consistent with [17,47].Regarding the two datasets used, in all combinations (C01-C04) the accuracies of T a estimation using Dataset B are much higher than when using Dataset A.  For T a-min and T a-mean estimation, Figure 3a shows that the combinations using LST nighttime (C02 and C04) have significantly higher accuracy than the combinations using LST daytime (C01 and C03).However, these differences are not clearly shown in Figure 3b (except for in the LM results).
Regarding Dataset A, AQUA daytime (C01) shows better results for T a-max estimation than TERRA daytime (C03).However, at night AQUA and TERRA show similar results for T a estimation.The results of both daytime and nighttime of TERRA and AQUA are consistent and similar in T a estimation (Figure 3b).

Combinations Using Two-LST Variables
In this case, we used all possible combinations with LST to estimate T a .As shown in Table 2, we applied six possible combinations of LST for T a estimation.
In general, Figure 4a,b show that both results of T a estimation using Dataset A and B are higher than the one-LST-combination (Figure 3a,b).Figure 4a shows that the difference between the three algorithms is not as large as in the results shown in Figure 3a (except for C07).
In these combinations (C05-C10), CB and LM show similar and slightly higher accuracies than RF for Dataset A. The contrast is also evident in Dataset B: the CB and RF results are similar and slightly higher than LM.Especially in C07, the results of LM are much lower than those of CB and RF (Figure 4b).The results of all T a estimations using Dataset B are still higher than using Dataset A. For Ta-min and Ta-mean estimation, Figure 3a shows that the combinations using LST nighttime (C02 and C04) have significantly higher accuracy than the combinations using LST daytime (C01 and C03).However, these differences are not clearly shown in Figure 3b (except for in the LM results).
Regarding Dataset A, AQUA daytime (C01) shows better results for Ta-max estimation than TERRA daytime (C03).However, at night AQUA and TERRA show similar results for Ta estimation.The results of both daytime and nighttime of TERRA and AQUA are consistent and similar in Ta estimation (Figure 3b).

Combinations Using Two-LST Variables
In this case, we used all possible combinations with LST to estimate Ta.As shown in Table 2, we applied six possible combinations of LST for Ta estimation.
In general, Figure 4a,b show that both results of Ta estimation using Dataset A and B are higher than the one-LST-combination (Figure 3a,b).Figure 4a shows that the difference between the three algorithms is not as large as in the results shown in Figure 3a (except for C07).
In these combinations (C05-C10), CB and LM show similar and slightly higher accuracies than RF for Dataset A. The contrast is also evident in Dataset B: the CB and RF results are similar and slightly higher than LM.Especially in C07, the results of LM are much lower than those of CB and RF (Figure 4b).The results of all Ta estimations using Dataset B are still higher than using Dataset A.
(a) Looking at the first two rows of Figure 4a,b (C05, combined LSTad + LSTan; and C06, combined LSTtd + LSTtn), there are similar results for Ta estimations between them.It is indicated that the overpass times of AQUA and TERRA do not significantly affect the result of Ta estimation when we combine daytime and nighttime LST.This is true for all three methods (LM, CB, and RF).These results are also consistent with previous studies [15,17,47], which used LM as the statistical model for Ta estimation.
The most interesting finding of two-LST-combined is the combination of C07.The results of Dataset A (panel row 3, Figure 4a) show the lowest accuracy in comparison to five other two-LST-combined (R 2 approximately 0.6, 0.5 and 0.35; RMSE approximately 3.5, 3.2, and 3.7 °C for Ta-max, Ta-mean, and Ta-min, respectively).In addition, among the three algorithms, RF shows the lowest results with lower R 2 and higher RMSE.In contrast, the results of Dataset B are absolutely different (Figure 4b, panel row 3).The results of C07 (using Dataset B) are similar to the five other Looking at the first two rows of Figure 4a,b (C05, combined LST ad + LST an ; and C06, combined LST td + LST tn ), there are similar results for T a estimations between them.It is indicated that the overpass times of AQUA and TERRA do not significantly affect the result of T a estimation when we combine daytime and nighttime LST.This is true for all three methods (LM, CB, and RF).These results are also consistent with previous studies [15,17,47], which used LM as the statistical model for T a estimation.
The most interesting finding of two-LST-combined is the combination of C07.The results of Dataset A (panel row 3, Figure 4a) show the lowest accuracy in comparison to five other two-LST-combined (R 2 approximately 0.6, 0.5 and 0.35; RMSE approximately 3.5, 3.2, and 3.7 • C for T a-max , T a-mean , and T a-min , respectively).In addition, among the three algorithms, RF shows the lowest results with lower R 2 and higher RMSE.In contrast, the results of Dataset B are absolutely different (Figure 4b, panel row 3).The results of C07 (using Dataset B) are similar to the five other two-LST-combined (R 2 approximately 0.88, 0.80, and 0.73; RMSE approximately 1.8, 1.9, and 2.5 • C for T a-max , T a-mean , and T a-min , respectively, except for the results of LM) and much higher than using Dataset A. Among the three algorithms, the lowest result for T a estimation is LM (Figure 4b).Meanwhile, CB and RF show higher results, especially for T a-min and T a-mean estimation.It should be noted that C07 is the combination of TERRA and AQUA daytime LST, which is the most complicated in the relationship between T a and LST in comparison to the rest of the combinations.The difference between the results of Datasets A and B indicates that elevation and Julian day (i.e., season) also affect the relationship between LST and T a .This is consistent with the results from [15,23,48,49].The high accuracy of T a estimation using the RF and CB algorithms in Figure 4b also indicates that RF and CB can account for the complicated relationship between predictor and response variables under different conditions, especially in mountainous area.This finding is consistent with the studies by Zhang et al. [18] and Xu et al. [25].

Combinations Using Three-LST Variables
In general, Figure 5a,b show that all three-combined LST result in a very high accuracy of T a estimation and the differences in accuracy between the three different algorithms are not significant (p-value > 0.05).However, the results of T a estimation using Dataset B are much higher than using Dataset A. In both datasets, the results of T a-max and T a-mean are always better than T a-min (except C12 and C14 of Dataset A).This can be explained by the fact that, because of two LST nighttime variables (LST tn and LST an ) in C12 and C14, the accuracy of T a-min estimation could be increased.However, in Dataset B, by introducing the two variables elevation and Julian day, the accuracy of all T a-max , T a-min , and T a-mean estimations has increased (T a-max and T a-mean is increased more significantly than T a-min when elevation and Julian day data were introduced).
Remote Sens. 2017, 9, 398 11 of 23 two-LST-combined (R 2 approximately 0.88, 0.80, and 0.73; RMSE approximately 1.8, 1.9, and 2.5 °C for Ta-max, Ta-mean, and Ta-min, respectively, except for the results of LM) and much higher than using Dataset A. Among the three algorithms, the lowest result for Ta estimation is LM (Figure 4b).Meanwhile, CB and RF show higher results, especially for Ta-min and Ta-mean estimation.It should be noted that C07 is the combination of TERRA and AQUA daytime LST, which is the most complicated in the relationship between Ta and LST in comparison to the rest of the combinations.
The difference between the results of Datasets A and B indicates that elevation and Julian day (i.e., season) also affect the relationship between LST and Ta.This is consistent with the results from [15,23,48,49].The high accuracy of Ta estimation using the RF and CB algorithms in Figure 4b also indicates that RF and CB can account for the complicated relationship between predictor and response variables under different conditions, especially in mountainous area.This finding is consistent with the studies by Zhang et al. [18] and Xu et al. [25].

Combinations Using Three-LST Variables
In general, Figure 5a,b show that all three-combined LST result in a very high accuracy of Ta estimation and the differences in accuracy between the three different algorithms are not significant (p-value > 0.05).However, the results of Ta estimation using Dataset B are much higher than using Dataset A. In both datasets, the results of Ta-max and Ta-mean are always better than Ta-min (except C12 and C14 of Dataset A).This can be explained by the fact that, because of two LST nighttime variables (LSTtn and LSTan) in C12 and C14, the accuracy of Ta-min estimation could be increased.However, in Dataset B, by introducing the two variables elevation and Julian day, the accuracy of all Ta-max, Ta-min, and Ta-mean estimations has increased (Ta-max and Ta-mean is increased more significantly than Ta-min when elevation and Julian day data were introduced). (a)

Combinations Using Four-LST Variables
The first result clearly seen from Figure 6 is that all three algorithms show a similar accuracy of Ta estimation in both Dataset A and B. However, the results of Dataset B (R 2 approximately 0.93, 0.89 and 0.8, RMSE approximately 1.5, 1.6, and around 2.0 °C for Ta-max, Ta-mean, and Ta-min, respectively) are much higher than the results of Dataset A (R 2 approximately 0.84, 0.88, and 0.75; RMSE roughly 2.2, 1.7, and 2.2 °C for Ta-max, Ta-mean, and Ta-min, respectively).
In addition, the statistical results also indicate that the difference between the three algorithms is not significant (p-value > 0.05) in either Dataset A or B.

Combinations Using Four-LST Variables
The first result clearly seen from Figure 6 is that all three algorithms show a similar accuracy of T a estimation in both Dataset A and B. However, the results of Dataset B (R 2 approximately 0.93, 0.89 and 0.8, RMSE approximately 1.5, 1.6, and around 2.0 • C for T a-max , T a-mean , and T a-min , respectively) are much higher than the results of Dataset A (R 2 approximately 0.84, 0.88, and 0.75; RMSE roughly 2.2, 1.7, and 2.2 • C for T a-max , T a-mean , and T a-min , respectively).
In addition, the statistical results also indicate that the difference between the three algorithms is not significant (p-value > 0.05) in either Dataset A or B.

Model Calibration and Validation
In several previous studies [23,36], one of the most common validation methods is that the sample data is randomly divided into a calibration and a validation dataset (e.g., 70% and 30% respectively).Calibration data were used for training data and validation data were used to assess the model performance.However, there is a drawback with this random choice: If we use a local dataset to train the model (i.e., a dataset that does not represent all dataset characteristics), then we apply a fitted model to the validation data.This could be misleading in the accuracy assessment.Especially in machine learning algorithms like CB or RF, this could lead to overfitting problems (e.g., the accuracy of the training part is very high; however, the model cannot be applied successfully to the validation dataset).
In this paper, we studied this problem in Ta estimation using MODIS LST.First, we randomly divide the data of all 15 combinations into two datasets: calibration and validation (70% and 30%, respectively).Next, we fitted the model using a calibration dataset, and then we applied the fitted model to the validation dataset and the entire dataset.Finally, we assessed the accuracies of validation data, full data, and cross-validation.
These processes are applied to both Dataset A and Dataset B. In Figure 7, the LM algorithm shows consistent results between the validation data, the total data, and the cross-validations of both Dataset A and B. The results of Ta estimation using Dataset B (right-hand panel) are slightly higher than with Dataset A (left-hand panel).It could be suggested that when LST data alone were used (without auxiliary data), the accuracy of Ta estimation could be affected by a change in season or the elevation of the weather station.This is consistent with previous studies [17,36].In the CB method (Figure 8), the results of validation, full data, and cross-validation are also consistent with each other.However, in both algorithms LM and CB, the results of Dataset A and Dataset B showed a significant difference, especially the combinations 1, 3, and 7 (C01, C03, and C07), where there is only LST daytime data.It is suggested that if LST nighttime is not available then the accuracy of Ta estimation could be improved by adding auxiliary data.Comparing Figures 7 and 8, it can be clearly seen that CB produces better results for Ta estimation than LM.

Model Calibration and Validation
In several previous studies [23,36], one of the most common validation methods is that the sample data is randomly divided into a calibration and a validation dataset (e.g., 70% and 30% respectively).Calibration data were used for training data and validation data were used to assess the model performance.However, there is a drawback with this random choice: If we use a local dataset to train the model (i.e., a dataset that does not represent all dataset characteristics), then we apply a fitted model to the validation data.This could be misleading in the accuracy assessment.Especially in machine learning algorithms like CB or RF, this could lead to overfitting problems (e.g., the accuracy of the training part is very high; however, the model cannot be applied successfully to the validation dataset).
In this paper, we studied this problem in T a estimation using MODIS LST.First, we randomly divide the data of all 15 combinations into two datasets: calibration and validation (70% and 30%, respectively).Next, we fitted the model using a calibration dataset, and then we applied the fitted model to the validation dataset and the entire dataset.Finally, we assessed the accuracies of validation data, full data, and cross-validation.
These processes are applied to both Dataset A and Dataset B. In Figure 7, the LM algorithm shows consistent results between the validation data, the total data, and the cross-validations of both Dataset A and B. The results of T a estimation using Dataset B (right-hand panel) are slightly higher than with Dataset A (left-hand panel).It could be suggested that when LST data alone were used (without auxiliary data), the accuracy of T a estimation could be affected by a change in season or the elevation of the weather station.This is consistent with previous studies [17,36].In the CB method (Figure 8), the results of validation, full data, and cross-validation are also consistent with each other.However, in both algorithms LM and CB, the results of Dataset A and Dataset B showed a significant difference, especially the combinations 1, 3, and 7 (C01, C03, and C07), where there is only LST daytime data.It is suggested that if LST nighttime is not available then the accuracy of T a estimation could be improved by adding auxiliary data.Comparing Figures 7 and 8, it can be clearly seen that CB produces better results for T a estimation than LM.Unlike LM and CB, the results of RF algorithm (Figure 9) are not consistent when applied to the validation data, full data, and cross-validation using Dataset A or Dataset B. As is shown in Figure 9, the results of cross-validation and the results using the validation data are similar and lower than when using the full data.It is suggested that the RF algorithm could be overfitting the Ta estimation using MODIS LST.It is also clearly seen that the results of Ta estimation using Dataset B are much higher than Dataset A, especially the combinations C01, C03, and C07.Again, the results of RF confirm that auxiliary data (i.e., elevation and Julian day) together with the RF algorithm can increase the accuracy of Ta estimation, especially in the case of missing LST nighttime data (i.e., combinations C01, C03, and C07).Unlike LM and CB, the results of RF algorithm (Figure 9) are not consistent when applied to the validation data, full data, and cross-validation using Dataset A or Dataset B. As is shown in Figure 9, the results of cross-validation and the results using the validation data are similar and lower than when using the full data.It is suggested that the RF algorithm could be overfitting the T a estimation using MODIS LST.It is also clearly seen that the results of T a estimation using Dataset B are much higher than Dataset A, especially the combinations C01, C03, and C07.Again, the results of RF confirm that auxiliary data (i.e., elevation and Julian day) together with the RF algorithm can increase the accuracy of T a estimation, especially in the case of missing LST nighttime data (i.e., combinations C01, C03, and C07).

Effects of Different Combinations and Statistical Model Applications
Figure 10 shows a comparison between the 15 combined LST datasets when applied to three different algorithms (LM, CB, and RF), based on the criteria of R 2 and RMSE.
Regarding Dataset A, in all combinations (C01-C15) for all Ta-max, Ta-min, and Ta-mean estimations, the results of the LM and CB algorithms are similar and higher than RF.However, from C10 to C15, the differences between the three algorithms are not clear.The results of combinations C01, C03, and C07 are much lower than the rest of the combinations in all three algorithms.

Effects of Different Combinations and Statistical Model Applications
Figure 10 shows a comparison between the 15 combined LST datasets when applied to three different algorithms (LM, CB, and RF), based on the criteria of R 2 and RMSE.
Regarding Dataset A, in all combinations (C01-C15) for all T a-max , T a-min , and T a-mean estimations, the results of the LM and CB algorithms are similar and higher than RF.However, from C10 to C15, the differences between the three algorithms are not clear.The results of combinations C01, C03, and C07 are much lower than the rest of the combinations in all three algorithms.
Considering Dataset B, the results are very different to those of Dataset A. Especially, in combinations C01, C03, and C07, the results of CB and RF are similar and much higher than LM.This can be explained by the fact that during the daytime, solar radiation affects the thermal infrared signal, and the relationship between T a and LST becomes more complicated.That is why simple models like C01, C03, and C07 (of Dataset A) cannot handle this relationship well.The results of all combinations (C01 to C15) were quite similar when the CB and RF algorithms were applied.Considering Dataset B, the results are very different to those of Dataset A. Especially, in combinations C01, C03, and C07, the results of CB and RF are similar and much higher than LM.This can be explained by the fact that during the daytime, solar radiation affects the thermal infrared signal, and the relationship between Ta and LST becomes more complicated.That is why simple models like C01, C03, and C07 (of Dataset A) cannot handle this relationship well.The results of all combinations (C01 to C15) were quite similar when the CB and RF algorithms were applied.It can be clearly seen that, in all combinations (C01-C15) of Dataset B, the cubist regression always shows the highest accuracy of Ta estimation (slightly higher than RF and much higher than It can be clearly seen that, in all combinations (C01-C15) of Dataset B, the cubist regression always shows the highest accuracy of T a estimation (slightly higher than RF and much higher than LM).This is consistent with the studies of [18,25].It should be remembered that Xu et al. [25] used MODIS LST and many other auxiliary variables like NDVI, longitude, latitude, etc.In this case, it could be explained by the complex terrain of the study area.It is suggested that the differences in topography, land surface properties, solar radiation, and many other factors could affect the relationships between T a and LST [14,[50][51][52].Therefore, a linear regression model, considered as a single global model, could not handle the complicated relationship between T a and the abovementioned variables under different conditions [25].In contrast, CB and RF can account for the nonlinear and complicated relationship between the predictor and response variables under different conditions.That is why, in this mountainous study area, the cubist regression and random forest algorithms always show better results than LM in all 15 combinations (Figure 10, right panel).
However, from combination number C02 and C04 to 15 (except number 7-C07), which have at least one nighttime LST term in the combination, the performances of all three methods are good (high correlation and low errors).
Another point is that in Dataset A, the different combinations of LST had a similar effect on all three algorithms.However, in Dataset B, the different combinations of LST had a similar effect on RF and CB but a significantly different effect on the LM algorithm.The largest difference was found in T a-min estimation, follow by T a-mean and T a-max estimation.

Conclusions
This study proved that the very high accuracy of T a estimation (R 2 > 0.93/0.80/0.89and RMSE ~1.5/2.0/1.6 • C of T a-max , T a-min , and T a-mean , respectively) could be achieved with a simple combination of four LST data, elevation, and Julian day data using a suitable algorithm.
Using Dataset B (MODIS LST, elevation, and Julian day) with RF or CB algorithms would give a stable and high accuracy in all combinations (C01-C15).With the LM algorithm, the more LST terms (especially LST nighttime) are presented the higher the accuracy that can be achieved.
The impact of the different combinations is larger in Dataset A than in Dataset B. However, in Dataset B, this impact was also large when using the LM algorithm.
LST nighttime data of both AQUA and TERRA play an important role in daily T a estimation, guaranteeing higher accuracy.Depending on LST data availability, it could be used in any combination from C02, C04, and C05 to C15 (except C07 and C09) to achieve the highest results solely with MODIS LST using any of the three mentioned algorithms.However, when MODIS LST and auxiliary (elevation and Julian day) are available, any combination (C01-C15) can be applied with the CB or RF algorithm.
Among T a-max , T a-min , and T a-mean , using Dataset A, T a-mean was estimated with the highest accuracy, followed by T a-min and T a-max .However, the difference between T a-max and T a-min was not significant.Considering Dataset B, T a-max was estimated with the highest accuracy, followed by T a-mean and T a-min .This means that the highest improvement for T a-max is made by introducing elevation and Julian day data, followed by T a-mean and T a-min .However, the difference between T a-max and T a-mean was not significant.

Figure 1 .
Figure 1.Location of the weather stations and range of elevation (a) and land cover (b) from MODIS MCD12Q1 data in 2010 of the study area.

Figure 1 .
Figure 1.Location of the weather stations and range of elevation (a) and land cover (b) from MODIS MCD12Q1 data in 2010 of the study area.

Figure 2 .
Figure 2. The relationship between LST (x-axis) and Ta-max (first and third columns), Ta-min (second and last columns) of all meteorological stations from 2009 to 2013.The dashed line indicates that the difference between Ta and LST is over ±5 °C (±5 line).The red line indicates the 1:1 line.

Figure 2 .
Figure 2. The relationship between LST (x-axis) and T a-max (first and third columns), T a-min (second and last columns) of all meteorological stations from 2009 to 2013.The dashed line indicates that the difference between T a and LST is over ±5 • C (±5 line).The red line indicates the 1:1 line.

Figure 3 .
Figure 3. (a) Cross-validation results for one-LST-combination (C01-C04) using Dataset A, and multiple comparisons of the three algorithms.The x-axis shows the value of R 2 and RMSE (°C), the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE.(b) Cross-validation results for one-LST-combination (C01-C04) using Dataset B, and multiple comparisons of the three algorithms.The x-axis shows the values of R 2 and RMSE (°C); the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE.

Figure 3 .
Figure 3. (a) Cross-validation results for one-LST-combination (C01-C04) using Dataset A, and multiple comparisons of the three algorithms.The x-axis shows the value of R 2 and RMSE ( • C), the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE; (b) Cross-validation results for one-LST-combination (C01-C04) using Dataset B, and multiple comparisons of the three algorithms.The x-axis shows the values of R 2 and RMSE ( • C); the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE.

Figure 4 .
Figure 4. (a) Cross-validation results for two-LST-combinations (C05-C10) using Dataset A and multiple comparisons of the three algorithms.The x-axis shows the value of R 2 and RMSE ( • C); the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE; (b) Cross-validation results for two-LST-combinations (C05-C10) using Dataset B and multiple comparisons of the three algorithms.The x-axis shows the values of R 2 and RMSE ( • C); the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE.

Figure 5 .
Figure 5. (a) Cross-validation results for three-LST-combinations (C11-C14) using Dataset A and multiple comparisons of the three algorithms.The x-axis shows the values of R 2 and RMSE ( • C); the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE; (b) Cross-validation results for three-LST-combinations (C11-C14) using Dataset B and multiple comparisons of the three algorithms.The x-axis shows the value of R 2 and RMSE ( • C); the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE.

Figure 6 .
Figure 6.Cross-validation results for four-LST-combinations (C15) using Dataset A (upper rows) and B (lower rows) and multiple comparisons of the three algorithms.The x-axis shows the values of R 2 and RMSE (°C); the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE.

Figure 6 .
Figure 6.Cross-validation results for four-LST-combinations (C15) using Dataset A (upper rows) and B (lower rows) and multiple comparisons of the three algorithms.The x-axis shows the values of R 2 and RMSE ( • C); the y-axis shows the model types.The box and whiskers plots show the distributions of R 2 and RMSE.

Figure 7 .
Figure 7.Comparison of accuracy (R 2 and RMSE) when applying the LM algorithm to the validation dataset (_val), the full dataset (_all), and a cross-validation (_cv) of all combinations.The x-axis shows the combination number.The y-axis shows the values of RMSE (°C) and R 2 .

Figure 7 .
Figure 7.Comparison of accuracy (R 2 and RMSE) when applying the LM algorithm to the validation dataset (_val), the full dataset (_all), and a cross-validation (_cv) of all combinations.The x-axis shows the combination number.The y-axis shows the values of RMSE ( • C) and R 2 .

Figure 8 .
Figure 8.Comparison of accuracy (R 2 and RMSE) when applying the CB algorithm to the validation dataset (_val), the full dataset (_all), and a cross-validation (_cv) of all combinations.The x-axis shows the combination number.The y-axis shows the values of RMSE (°C) and R 2 .

Figure 8 .
Figure 8.Comparison of accuracy (R 2 and RMSE) when applying the CB algorithm to the validation dataset (_val), the full dataset (_all), and a cross-validation (_cv) of all combinations.The x-axis shows the combination number.The y-axis shows the values of RMSE ( • C) and R 2 .

Figure 9 .
Figure 9.Comparison of accuracy (R 2 and RMSE) when applying the RF algorithm to the validation dataset (_val), the full dataset (_all), and a cross-validation (_cv) of all combinations.The x-axis shows the combination number.The y-axis shows the values of RMSE (°C) and R 2 .

Figure 9 .
Figure 9.Comparison of accuracy (R 2 and RMSE) when applying the RF algorithm to the validation dataset (_val), the full dataset (_all), and a cross-validation (_cv) of all combinations.The x-axis shows the combination number.The y-axis shows the values of RMSE ( • C) and R 2 .

Figure 10 .
Figure 10.Different performance of the algorithms LM (red), CB (green), and RF (blue) through 15 combinations of Dataset A and Dataset B. The x-axis shows the combination number.The y-axis shows the values of RMSE (°C) and R 2 .

Figure 10 .
Figure 10.Different performance of the algorithms LM (red), CB (green), and RF (blue) through 15 combinations of Dataset A and Dataset B. The x-axis shows the combination number.The y-axis shows the values of RMSE ( • C) and R 2 .

Table 1 .
Geographical description and land cover type of weather stations used in this study.

Table 1 .
Geographical description and land cover type of weather stations used in this study.

Table 2 .
All possible combinations of four LST data and the valid number of observations.

Table A1 .
Parameters of LM models for T a estimation using Dataset A. 0 is the intercept of each model (combination), a 1 -a 4 are parameters of LST variables with the same order as shown in Table2. a

Table A2 .
Parameters of LM models for T a estimation using Dataset B. 0 is the intercept of each model (combination), a 1 -a 4 are parameters of LST variables with the same order as shown in Table2. a