Estimation of Daily and Instantaneous Near-Surface Air Temperature from MODIS Data Using Machine Learning Methods in the Jingjinji Area of China

to this work. Abstract: Meteorologically observed air temperature ( T a ) is limited due to low density and uneven distribution that leads to uncertain accuracy. Therefore, remote sensing data have been widely used to estimate near-surface T a on various temporal scales due to their spatially continuous characteristics. However, few studies have focused on instantaneous T a when satellites overpass. This study aims to produce both daily and instantaneous T a datasets at 1 km resolution for the Jingjinji area, China during 2018–2019, using machine learning methods based on remote sensing data, dense meteorological observation station data, and auxiliary data (such as elevation and normalized difference vegetation index). Newly released Moderate Resolution Imaging Spectroradiometer (MODIS) Collection 6 surface Downward Shortwave Radiation (DSR) was introduced to improve the accuracy of T a estimation. Five machine learning algorithms were implemented and compared so that the optimal one could be selected. The random forest (RF) algorithm outperformed the others (such as decision tree, feedforward neural network, generalized linear model) and RF obtained the highest accuracy in model validation with a daily root mean square error (RMSE) of 1.29 ◦ C, mean absolute error (MAE) of 0.94 ◦ C, daytime instantaneous RMSE of 1.88 ◦ C, MAE of 1.35 ◦ C, nighttime instantaneous RMSE of 2.47 ◦ C, and MAE of 1.83 ◦ C. The corresponding R 2 was 0.99 for daily average, 0.98 for daytime instantaneous, and 0.95 for nighttime instantaneous. Analysis showed that land surface temperature (LST) was the most important factor contributing to model accuracy, followed by solar declination and DSR, which implied that DSR should be prioritized when estimating T a . Particularly, these results outperformed most models presented in previous studies. These ﬁndings suggested that RF could be used to estimate daily instantaneous T a at unprecedented accuracy and temporal scale with proper training and very dense station data. The estimated dataset could be very useful for local climate and ecology studies, as well as for nature resources exploration.


Introduction
Near-surface temperature (T a ) refers to the temperature two meters above the ground, which is an important parameter in many fields such as the studies of the environment [1][2][3], ecology [4], hydrology [5,6], and meteorology [7][8][9][10]. In the context of global warming over the past few decades, accurate estimation of T a is very important for climate change assessment and global collaborative response and also provides the scientific basis for China's carbon peak and neutrality goals to address climate change. The T a is usually obtained from 1.
Statistical methods such as linear regression models are commonly used to explore the relationship between T a and other variables [35][36][37][38][39]. Cresswell et al. [40] estimated instantaneous T a through a multiple regression model; the model used Solar Zenith Angle (SZA) as the only auxiliary variable and achieved an accuracy of mean deviation less than 3 • C for over 70% of the cases. Chen et al. [27] retrieved monthly average temperature (RMSE between 1.29 and 1.45 • C) and eight-day average temperature (RMSE between 0.8 and 1.29 • C) for China in 2010 using a model based on remote sensing data and a geographically weighted regression (GWR) algorithm; the elevation was the only secondary auxiliary variable; the results show that the GWR method performs better than the multiple linear regression method and the regression Kriging method. 2.
The temperature-vegetation index (TVX) method is based on the characteristics of plant canopy temperature that is close to the T a ; this method can be used to calculate the T a by the relationship between a vegetation index and LST and has also been widely used [31,41,42]. The TVX method was tested in many areas of the world; the resulting RMSE was between 1-3 • C [41][42][43][44][45]. Due to the principle of the method, the TVX method is more suitable for areas with more vegetation coverage. The TVX method shows significant uncertainties while applied to the area with sparse vegetation [43]. 3.
The energy balance method based on the surface heat flux balance equation, incoming net radiation flux, and anthropogenic heat fluxes equals the sum of outgoing land surface heat flux (sensible and latent heat flux) [46][47][48]. Zaksek et al. [49] carried out an estimation of T a in Slovenia and Germany using the energy balance method, having the root mean square deviation (RMSD) of the results at 2 • C. The method can well describe the physical mechanism of the near-surface energy balance process [50]. The main drawback of the method is that many environmental data (usually in hourly intervals) were needed to force the model and not all data were easy to obtain, especially in a large scale [48]. 4.
Machine learning (ML) methods (such as neural networks, decision trees, support vector machine) are based on nonlinear machine learning algorithms. ML methods greatly improve the computational efficiency and simplify the exploration process of Remote Sens. 2022, 14,1916 3 of 22 nonlinear and highly interactive relationships compared with the traditional statistical method, the TVX method, and the energy balance method [51][52][53].
Numerous studies tested the different ML methods in satellite-based T a estimating in recent years. Noi et al. [41] compared the accuracy of multiple regression model, decision tree model, and random forest model in estimating near-surface air temperature in the mountainous area of northwest Vietnam from 2009 to 2013. The results showed that the decision tree model and the linear regression model performed better than the random forest model when only LST data were used without auxiliary variables. However, when two easily accessible variables (altitude and Julian day) were introduced into the model as auxiliary variables, the decision tree model and the random forest model performed significantly better than the linear regression model, which indicates that the ML method is more suitable for a multi-auxiliary variable model. Yoo et al. [54] used the random forest model to estimate the daily maximum and minimum temperatures in Los Angeles and Seoul and introduced seven auxiliary variables into the model: elevation, solar radiation, normalized difference vegetation index, latitude, longitude, aspect, and the percentage of impervious area. The simulated R 2 ranged from 0.72-0.85 and RMSE ranged from 1.1-4.7 • C. Zhou et al. [55] proposed a two-stage RF based machine learning hybrid model to estimate intra-daily T a of Israel during 2004-2016. First, missing LST pixels were estimated and a gap-free LST dataset was obtained, then the RF model was employed to estimate T a with different auxiliary variables (six auxiliary variables in stage one and seven auxiliary variables in stage two), which reached R 2 0.96, MAE 1.12 • C, and RMSE 1.58 • C. Ruiz-Álvarez et al. [56] compared support vector machines, random forests, multiple linear regression, and Kriging interpolation in estimating the T a in the DHS region of southeastern Spain and the results showed that RF-based methods are more accurate and their performance improved when spatial components are included. Xu et al. [57] compared the accuracy of ten different machine learning methods in simulating the monthly average T a of the Qinghai-Tibet Plateau with a 1 km resolution and the result showed that the Cubist model performs better than the other models (RMSE, 1.0 • C; MAE, 0.73 • C). Hrisko et al. [58] simulated the T a of urban areas in the United States using a regression neural network based on GEOS-16 satellites with a simulation result RMSE of 2.6 • C. Li et al. [59] simulated the monthly average T a of China from 2001 to 2015 using the RF model; the simulation results had an MAE between 1.15 and 1.44 • C and an RMSE between 1.57 and 1.99 • C.
The following points can be summarized from previous studies: (1) estimating T a based on remote sensing data is one of the most feasible methods at present, especially in a large scale, (2) multi-auxiliary variables can greatly improve the T a estimation accuracy, (3) the machine learning methods with multi-auxiliary variables were suitable for remote sensing based estimation of T a [52,53,56,57]. Instantaneous T a is very important for meteorological processes and weather forecasts, such as numerical weather simulation, for its better near-real-time feature [56]; however, this study found that most previous studies focused on the estimation of average T a over a period of time from daily average T a to monthly average T a in a large-scale area, while instantaneous T a was rarely estimated [53,60,61]. Furthermore, instantaneous T a is more difficult to estimate due to considerable changes in one day [62,63]. Thus, this study aimed to further improve estimation accuracy by introducing the Downward Shortwave Radiation (DSR) product into the model, by using high dense in-situ observation data as training inputs in a large study area over different land cover types, and finally by estimating 1 km resolution instantaneous T a while satellites overpassed for the purpose of assessing the ability of machine learning methods. about 113.07 million and the GDP was 8.458 trillion yuan, making it the most important economic core area in northern China. Its terrain is complex and the altitudinal gradient is nearly 3000 m from the northwest mountainous area to the southeast plains area. The land use and vegetation cover are diverse, with a high degree of industrial and agricultural development, including farmland, cities and towns, forests and grasslands, and lakes and wetlands. The rapid urbanization in the Jingjinji area will have an impact on urban heat island and other phenomena. Due to its important status as an economic, cultural, and political center, the Jingjinji area is a hot area of climate change and urbanization research, therefore it is necessary to simulate temperature data. This area has two climatic zones. The northwest belongs to the temperate zone continental climate and the southeast belongs to the temperate zone monsoon climate, which means the T a estimation is very challenging because of the complexity of the geography. Therefore, factors such as topography, vegetation, and climatic types must be taken into account in the T a estimation.

Study Area
Jingjinji area, located in north China between 36°05′N-42°40′N and 113°27′E-119°50′E, covers three administrative regions including Beijing, Tianjin, and Hebei Province, with a total area of about 218,000 km 2 [64] (Figure 1). By 2019, the permanent population was about 113.07 million and the GDP was 8.458 trillion yuan, making it the most important economic core area in northern China. Its terrain is complex and the altitudinal gradient is nearly 3000 m from the northwest mountainous area to the southeast plains area. The land use and vegetation cover are diverse, with a high degree of industrial and agricultural development, including farmland, cities and towns, forests and grasslands, and lakes and wetlands. The rapid urbanization in the Jingjinji area will have an impact on urban heat island and other phenomena. Due to its important status as an economic, cultural, and political center, the Jingjinji area is a hot area of climate change and urbanization research, therefore it is necessary to simulate temperature data. This area has two climatic zones. The northwest belongs to the temperate zone continental climate and the southeast belongs to the temperate zone monsoon climate, which means the Ta estimation is very challenging because of the complexity of the geography. Therefore, factors such as topography, vegetation, and climatic types must be taken into account in the Ta estimation.

Ground-Based Weather Data
The T a observational data were acquired from the hourly observational data of 1527 weather stations in the Jingjinji region from 2018 to 2019. Figure 1 shows the spatial distribution of the stations. The data quality has been preliminarily controlled according to QX/T 458-2018 meteorological observational data interchange specifications.

Remotely Sensed Data
The remote sensing data used in this paper are shown in Table 1. Digital elevation model (DEM) data with a 1 km resolution were produced after secondary processing of the Shuttle Topography Radar Mission's digital elevation product with a resolution of 90 m. The LST data were derived from MODIS MOD11A1 and MYD11A1 products with a resolution of 1 km released by NASA in 2018-2019. The MODIS remote sensing data come from the infrared radiation sensors carried by Terra and Aqua satellites, which scan the study area twice each day as follows: the transit time of Terra is about 11:00 and 21:00 (Beijing time, also used below), while the transit time of Aqua is about 02:30 and 13:30. Figure 2 show the percentages of available data for each satellite overpass. The mountains in the north and the west have more available data. Data percentage ranged from 50~60%, because high-altitude mountainous areas are less affected by cloud, smog, fog, haze, etc. Overall, the average percentage of valid LST data for the whole study area is about 55%.

Ground-Based Weather Data
The Ta observational data were acquired from the hourly observational data of 1527 weather stations in the Jingjinji region from 2018 to 2019. Figure 1 shows the spatial distribution of the stations. The data quality has been preliminarily controlled according to QX/T 458-2018 meteorological observational data interchange specifications.

Remotely Sensed Data
The remote sensing data used in this paper are shown in Table 1. Digital elevation model (DEM) data with a 1 km resolution were produced after secondary processing of the Shuttle Topography Radar Mission's digital elevation product with a resolution of 90 m. The LST data were derived from MODIS MOD11A1 and MYD11A1 products with a resolution of 1 km released by NASA in 2018-2019. The MODIS remote sensing data come from the infrared radiation sensors carried by Terra and Aqua satellites, which scan the study area twice each day as follows: the transit time of Terra is about 11:00 and 21:00 (Beijing time, also used below), while the transit time of Aqua is about 02:30 and 13:30. Figure 2 show the percentages of available data for each satellite overpass. The mountains in the north and the west have more available data. Data percentage ranged from 50~60%, because high-altitude mountainous areas are less affected by cloud, smog, fog, haze, etc. Overall, the average percentage of valid LST data for the whole study area is about 55%. In this paper, three other types of remote sensing data were used as auxiliary input variables in the model, including Downward Shortwave Radiation (DSR), Normalized Difference Vegetation Index (NDVI), and Land Cover (LC) ( Table 1). The DSR data were derived from MODIS daily radiation products, numbered MCD18A1 with a spatial reso-  In this paper, three other types of remote sensing data were used as auxiliary input variables in the model, including Downward Shortwave Radiation (DSR), Normalized Difference Vegetation Index (NDVI), and Land Cover (LC) ( Table 1). The DSR data were derived from MODIS daily radiation products, numbered MCD18A1 with a spatial resolution of 5 km, according to NASA's description on its website; the reliability of DSR products has been improved by fixing several errors in the algorithm since 2018. Land Cover data were derived from MODIS annual land cover type data; the product number is MCD12Q1 and has a spatial resolution of 0.5 km. This dataset contains five land cover classification systems. In this paper, the global vegetation classification scheme of the International Geosphere-Biosphere Program was adopted, which divided land cover into 17 types, such as grassland, forest, and water body. Figure 3 presents the framework of model training and validation used in the present study. As shown in the figure, seven variables (latitude, elevation, declination, normalized difference vegetation index, land cover, downward shortwave radiation, land surface temperature) were selected as model inputs in this study. LST, DSR, NDVI, LC, and digital elevation model data were introduced in Section 2.3. In addition, latitude (LAT) and declination of the sun were selected as auxiliary variables. LAT largely determines the climatic and environmental characteristics of an area, while the declination of the sun is closely related to the day length and seasonal changes of a region. These two factors affect the energy balance process near the ground. Solar declination is rarely considered as a variable in the model of previous studies. This paper ranked the importance of each parameter in the model validation stage in order to verify the importance of different parameters. Table 2 shows the input variables for different scenarios. In this paper, two instantaneous T a estimation models were established, one for day and one for night. In the estimation of daily mean T a , the daily mean LST was obtained by summing the LST data of four times a day. If data were missing due to cloud cover or other reasons at a certain time, the daily mean LST of this point was considered as missing.

Models
As both complicated geographic environment and human factors such as impervious land surfaces have influence, T a changes dramatically along with time and space. Five typical machine learning models were employed to tackle the spatiotemporal variability and factors with complex effects on the relationship with T a , including a feedforward neural network (FNN) [65], decision tree (DT) [66], random forest (RF) [67], generalized linear model (GLM) [68], and support vector machine (SVM) [69]. We adjusted these models until the lowest level of error was procured using 10-fold cross-validation to obtain best fitting estimation as these models all have different parameter or algorithm combinations. In addition, the final model used for estimating T a was selected by comparing the performance of the five models (FNN, DT, RF, GLM, and SVM).

Feedforward Neural Network
Feedforward neural network (FNN) is a kind of artificial neural network that has a simple structure and a wide application. It is a kind of static nonlinear mapping, good at complex nonlinear processing [70]. Most feedforward networks are learning networks and their classification ability and pattern recognition ability are generally stronger than those of feedback networks. FNN adopts a unidirectional multilayer structure; each layer contains several neurons and each neuron can receive the signal of the previous neuron and produce the output to the next layer. The zero layer is called the input layer, the last layer is called the output layer, and the other intermediate layers are called the hidden layers; a hidden layer can be one layer or multiple layers [71].

Decision Tree
Decision tree (DT) is a simple but widely used classifier. In machine learning, DT is a prediction model that represents a mapping relationship between object attributes and object values [72]. A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents a test output, and each leaf node represents a category. By training data to build a decision tree, the unknown data can be efficiently classified. The decision tree model is readable, descriptive, helpful for manual analysis, and efficient. The decision tree only needs to be constructed once, used repeatedly, and the maximum calculation times of each prediction does not exceed the depth of the decision tree [73].

Random Forest
Random forest (RF), proposed by Breiman in 2001, is a classifier. RF uses multiple trees to train and predict samples. The output categories of RF are determined by the output mode of individual trees [74,75]. RF runs efficiently on large data bases with high accuracy, which can handle thousands of input variables without variable deletion. RF can estimate which variables are important in classification and generate an internal unbiased estimate of generalization error as forest construction progresses. In addition, RF can also effectively estimate missing data and maintain accuracy when the proportion of missing data is large.

Generalized Linear Model
The generalized linear model is based on a linear model; the relationship between the mathematical expected value of a response variable and the predictive variables of a linear combination is established by means of a joint function [76]. It is characterized by the natural measurement of data without forcing changes and data can have nonlinear and unsteady variance structures. It is a development of a linear model in studying non-normal distribution of response value and simple and direct linear transformation of a nonlinear model [77].

Variable Importance Analysis
Mean decrease accuracy (IncMSE) [78] was used to calculate the importance of different variables in each model. Each model was recalculated to calculate the mean square error (MSE) increment of the new result after a random increase of ±25% deviation for a certain variable, assuming that other conditions remained unchanged. The average value was obtained after 30 repetitions. If the increase in the MSE value was larger, the importance of the parameter in the model was higher; otherwise, the importance was lower.

Model Training and Validation
In this study, data fusion correction and spatial matching were carried out based on meteorological and remote sensing data from 2018 to 2019. Ultimately, 166,008 daily sample data points and 992,705 instantaneous sample data points were collected. In this study, the widely used 10-fold cross-validation (10-CV) procedure [79] was selected for model validation, where all data samples were divided into ten subsets randomly; nine of them were used as the training data and the remaining as the testing data, and the holdout method is repeated 10 times. We computed an average accuracy score of all the accuracy scores that were calculated in each 10 iterations. The validation of the model was evaluated by mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R 2 ). The statistical measures were defined and used as follows: where n is the sample size, x i is the simulated value, y i is the measured value, andȳ is the average value of the measured value.

Comparison of Results of Different Models
We extracted 166,008 samples for the daily model, 456,422 samples for the daytime instantaneous model, and 536,283 samples for the nighttime instantaneous model. Table 3 shows the comparison of simulation results produced by the different models. The results show that the decision tree and RF algorithm performed better than other methods in the model fitting stage; the RMSE of the simulation results of daily mean air temperature, daytime instantaneous T a , and night instantaneous air temperature ranged from 0.71 • C to 1.42 • C. The RF algorithm is obviously better than other algorithms, while in the model validation stage the RMSEs were 1.29 • C, 1.88 • C, and 2.47 • C, respectively. Therefore, the RF model can be considered as the optimal model in the retrieval of T a in Jingjinji.  According to the results, RF produced the best simulation results for the daily average temperature in the model validation stage with the RMSE, MAE, and R 2 of 1.29 • C, 0.94 • C, and 0.99, respectively. Meanwhile, the RMSE, MAE, and R 2 of daytime instantaneous T a simulation were 1.88 • C, 1.35 • C, and 0.98, respectively. The estimation results of instantaneous T a at night were relatively poor with an RMSE, MAE, and R 2 of 2.47 • C, 1.83 • C, and 0.95, respectively (Table 3). Table 4 shows the IncMSE values and the weight ratios of each parameter in different scenarios. The results show that in the simulation of daily mean T a , the top four variables of IncMSE were LST, LAT, DECLINATION, and DSR. For daytime instantaneous T a simulation, the top four variables of IncMSE were LST, DECLINATION, DSR, and LAT. For the simulation of nighttime instantaneous T a , the top four variables of IncMSE were LST, declination, LAT, and LC. In general, the variable LST was much more important than other variables; IncMSE accounted for about 40-60%, followed by DECLINATION, of which the proportion of IncMSE was about 12-28%. In the simulation of daytime instantaneous T a , the importance of the variable DSR was second only to DECLINATION, and the proportion of IncMSE reached 14.93%, while in the simulation of daily average T a , the proportion of IncMSE was only 6.63%, which is lower than the variable LAT and DECLINATION.  Figure 4 presents the histogram of the distribution validation residuals of daily average T a , daytime instantaneous T a, and nighttime instantaneous T a simulated by the RF model. The figure shows the overall model error has a normal distribution. The error of less than ±1 • C accounts for 64.95%, 50.74%, and 38.09%, respectively, in the simulation of daily average T a , instantaneous T a during the day, and instantaneous T a at night, while the error of less than ±2 • C accounts for 89.18%, 77.86%, and 65.54%, respectively, in the simulation of these three types of T a .  Figure 4 presents the histogram of the distribution validation residuals of daily average Ta, daytime instantaneous Ta, and nighttime instantaneous Ta simulated by the RF model. The figure shows the overall model error has a normal distribution. The error of less than ±1 °C accounts for 64.95%, 50.74%, and 38.09%, respectively, in the simulation of daily average Ta, instantaneous Ta during the day, and instantaneous Ta at night, while the error of less than ±2 °C accounts for 89.18%, 77.86%, and 65.54%, respectively, in the simulation of these three types of Ta.  Figure 5 shows a diagram of the distribution of the spatial error in simulation results for daily mean Ta, daytime instantaneous Ta, and nighttime instantaneous Ta under the RF model. As can be seen from Figure 5, when simulating daily average Ta most station errors are between 0 and 2 °C. In the daytime instantaneous Ta simulation, compared with the average daytime Ta simulation, the number of stations with errors of >2 °C increased significantly. In addition, in the simulation of transient Ta at night, the station with error of >2 °C is further increased. According to the spatial distribution characteristics, the sites with errors >2 °C were mainly distributed in the western and northern high-elevation mountainous areas. Table 5 shows the comparison of the simulation results in areas of different types of terrain (plains and mountains) and land cover (urban and rural). It can be seen that the simulation accuracy in the plains was higher than in the mountainous areas under various scenarios; RMSE decreased by 0.62 °C, MAE decreased by 0.49 °C, and R 2 increased by 0.01 on average. The simulation accuracy in urban areas was higher than in rural areas; RMSE decreased by 0.24 °C, MAE decreased by 0.16 °C, and R 2 increased by 0.01 on average.  Figure 5 shows a diagram of the distribution of the spatial error in simulation results for daily mean T a , daytime instantaneous T a , and nighttime instantaneous T a under the RF model. As can be seen from Figure 5, when simulating daily average T a most station errors are between 0 and 2 • C. In the daytime instantaneous T a simulation, compared with the average daytime T a simulation, the number of stations with errors of >2 • C increased significantly. In addition, in the simulation of transient T a at night, the station with error of >2 • C is further increased. According to the spatial distribution characteristics, the sites with errors >2 • C were mainly distributed in the western and northern high-elevation mountainous areas. Table 5 shows the comparison of the simulation results in areas of different types of terrain (plains and mountains) and land cover (urban and rural). It can be seen that the simulation accuracy in the plains was higher than in the mountainous areas under various scenarios; RMSE decreased by 0.62 • C, MAE decreased by 0.49 • C, and R 2 increased by 0.01 on average. The simulation accuracy in urban areas was higher than in rural areas; RMSE decreased by 0.24 • C, MAE decreased by 0.16 • C, and R 2 increased by 0.01 on average.    Note: MD-mean deviation of observed T a ; SD-standard deviation of observed T a . Table 6 compares the evaluation results of daily average T a , daytime instantaneous T a , and nighttime instantaneous T a at different seasons simulated by the RF model. Figure 6 presents a scatter diagram of simulated and measured T a values at different seasons. As can be seen from Table 6, in T a daily average simulation, the error difference in four seasons is generally small. Among them, summer has the best simulation effect, followed by autumn, spring, and winter. The maximum and minimum values of RMSE and MAE in four seasons were 0.32 • C and 0.25 • C, respectively. In the simulation of daytime instantaneous T a , the simulation worked best in winter, followed by autumn and summer. The simulation results in spring are worse than in other seasons. The simulation of instantaneous T a worked best at night, followed by autumn. Table 6 shows that in summer the models have low MAE but also low R 2 in daily average and nighttime scenes. LST is closer to temperature in summer than in other seasons, which probably caused the small MAE in summer. Due to less cloud cover, there are more clear days in other seasons than in summer and a large number of relatively consistent data enhance the correlation of valid data, which may be the main reason for the high R 2 in other seasons. Note: MD-mean deviation of observed T a ; SD-standard deviation of observed T a .

Spatial Distribution of T a
Thirty-first May 2018 was one of the clearest days in the study period. It was selected for showing the model's ability to recreate spatial T a distribution maps where ground weather stations do not exist. Figure 7a shows satellite retrieval daily mean T a of Jingjinji region on the selected day. The result shows that the distribution of high and low T a is very close to the distribution of elevation, which is consistent with the fact that temperature decreases with height in the troposphere.
The Beijing region was selected for showing the correlation of estimated temperature and the distribution of elevation. Figure 7b contains significant details of the estimated daily mean temperature, which were in line with the distribution of elevation (Figure 7c).

Spatial Distribution of Ta
Thirty-first May 2018 was one of the clearest days in the study period. It was selected for showing the model's ability to recreate spatial Ta distribution maps where ground weather stations do not exist. Figure 7a shows satellite retrieval daily mean Ta of Jingjinji region on the selected day. The result shows that the distribution of high and low Ta is very close to the distribution of elevation, which is consistent with the fact that temperature decreases with height in the troposphere.
The Beijing region was selected for showing the correlation of estimated temperature and the distribution of elevation. Figure 7b contains significant details of the estimated daily mean temperature, which were in line with the distribution of elevation (Figure 7c).

The Performance of RF Model
This study indicated the RF algorithm is obviously better than other algorithms estimate daily and instantaneous air temperature from MODIS data over this study ar with the highest accuracy in model validation (Table 3). These results are consistent w other research [52,56,80,81]. In addition, RF produced the best simulation results for da mean temperature, whereas the results for instantaneous Ta at night were relatively po This mainly occurred because the daily average LST was calculated based on four MOD LST data from two sensors at two local overpass times (daytime and night-time). LST higher than Ta in the daytime and lower at night, therefore the difference between L and Ta is greatly reduced after averaging. The poorest simulation results were found night, mainly because the process of energy balance near the ground at night is quite d ferent from that in the daytime. Phan et al. [61] also found the correlation between L daytime versus Ta was slightly higher than nighttime versus Ta, which indicated that relationship between MODIS LST and Ta was complex. In this study, DSR was added an input variable in the daytime simulation; however, no such variable exists in nighttime model. Ruiz-Álvarez et al. [56] indicated that the most important variables RF were satellite land surface temperature, cdayt, and radiation, which could explain results of this study with lower accuracy observed at night due to the lack of radiat variables compared with the daytime. In addition, this study did not take into account advection process, which is also a potential factor affecting the instantaneous Ta at nig Specifically, there is no energy at night and the temperature depends on the cooling spe of the air and the ground. The ground and the nearby air are cooled by long wave rad tion. On the cloudy night, this long wave radiation is absorbed by the clouds and clouds also transmit long wave radiation upward and downward, with some of the dow

The Performance of RF Model
This study indicated the RF algorithm is obviously better than other algorithms to estimate daily and instantaneous air temperature from MODIS data over this study area, with the highest accuracy in model validation (Table 3). These results are consistent with other research [52,56,80,81]. In addition, RF produced the best simulation results for daily mean temperature, whereas the results for instantaneous T a at night were relatively poor. This mainly occurred because the daily average LST was calculated based on four MODIS LST data from two sensors at two local overpass times (daytime and night-time). LST is higher than T a in the daytime and lower at night, therefore the difference between LST and T a is greatly reduced after averaging. The poorest simulation results were found at night, mainly because the process of energy balance near the ground at night is quite different from that in the daytime. Phan et al. [61] also found the correlation between LST daytime versus T a was slightly higher than nighttime versus T a , which indicated that the relationship between MODIS LST and T a was complex. In this study, DSR was added as an input variable in the daytime simulation; however, no such variable exists in the nighttime model. Ruiz-Álvarez et al. [56] indicated that the most important variables in RF were satellite land surface temperature, cdayt, and radiation, which could explain the results of this study with lower accuracy observed at night due to the lack of radiation variables compared with the daytime. In addition, this study did not take into account the advection process, which is also a potential factor affecting the instantaneous T a at night. Specifically, there is no energy at night and the temperature depends on the cooling speed of the air and the ground. The ground and the nearby air are cooled by long wave radiation. On the cloudy night, this long wave radiation is absorbed by the clouds and the clouds also transmit long wave radiation upward and downward, with some of the downward radiation returning to the ground to compensate for heat lost at the ground [82,83]. Therefore, factors such as ground long wave radiation and clouds are affecting the simulation of night temperature and these factors are difficult to quantify using ready-made remote sensing products. Due to the difficulty of data acquisition, these factors were not selected in this study, which led to the phenomenon of low accuracy of night temperature. Therefore, due to the lack of potential influencing factors such as solar radiation and advection processes in the model, the simulation accuracy of instantaneous T a at night was relatively poor [4,37].
The experimental results of RF showed that the simulation accuracy in the plains area was higher than in the mountainous area ( Figure 5). Previous studies also found that the higher the altitude the greater the uncertainty of the model [37], which is mainly caused by the higher elevation, complex terrain, and the process of energy balance near the ground in mountainous areas being more complex [84]. Previous studies have shown that the relationships between daily mean air temperature and LST may change seasonally [85][86][87]. According to the simulation results of different seasons, it was found that the best results among the estimations of daily average T a and night instantaneous T a were obtained in summer. However, many previous studies have shown that the estimation accuracy is poorest in summer [37,57,88,89], and this study indicated that this phenomenon can be changed by selecting appropriate variables on the scale of daily average T a and instantaneous T a simulation.

Comparison with Recent Studies
As mentioned in the introduction, many studies have been carried out on the estimation of T a from remote sensing data. Some studies used machine learning methods that this study follows. Many of them focused on the estimation of monthly average T a [53,59,80], while other studies focused on estimating the daily average T a ; few studies have focused on the instantaneous T a . To the best of our knowledge, two such studies [55,56] used a similar method to estimate instantaneous T a at the satellite pass time. The authors of [55] proposed a two-stage random forest based approach to estimating intra-daily instantaneous T a across Israel for 2004-2016 and obtained an excellent resulting RMSE of 1.58 • C. The authors of [56] compared four different methods and reached the conclusion that RF performed better than other methods (Support Vector Machines, Multiple Linear Regression, Ordinary Kriging) with the resulting RMSE of 3.01 • C. When compared with studies of daily average validation, the results of this paper are better than most other studies. Table 7 shows the details of comparison with recent studies. Compared with other studies, firstly, we added elevation, which most studies also added [39,55,56,60,90], so this model can be applied to the simulation of temperature under different terrain conditions. Second, we had more stations (1527) than other studies [39,55,56,[89][90][91], but fewer than the study of Li et al. [60] (10,141). However, the station density in this study was higher than other studies, therefore the results and process of this current study are still facing uncertainties. It is most likely due to the high density of the site, resulting in high accuracy of the training model. Third, objectively speaking, our model is more suitable for clear-sky days. Cloud cover is a major challenge when modelling air temperature using satellite data. Inevitably, researchers would encounter data missing from satellite-based remote sensing products due to cloud impact or data quality. Since we used MODIS LST products, they had the same defect when it comes to zone under clouds or pixels contaminated by smog, fog, haze, etc. However, none of the currently available satellite-based LST products are spatially continuous due to the presence of clouds, restricting the application of LST and derived T a based on LST. In the future, we will produce long term T a products based on gap-free LST products released by other platforms or scholars to make our model applicable to all days not only clear-sky days.

The Importance of Model Variables
LST can be directly retrieved from remotely sensed radiance data, which is considered as one of the most important and useful data sources for T a retrieval over a region or large area [92][93][94]. In fact, various studies have used LST data for T a estimation with high accuracy [37,38,54,80,95]. According to the ranking results of models contributing variable importance, we found that LST was the most important variable affecting simulation results, which was consistent with previous studies [52].
DECLINATION was the second most important variable in the estimation of daytime and nighttime instantaneous T a and DSR was the third important variable in the estimation of daytime instantaneous T a only after DECLINATION. Previous studies have rarely used declination as a model input variable. The present study shows that declination is an important parameter for the retrieval of T a , which is closely related to the change of seasons and the variation of day length; therefore, declination plays an important role in the retrieval of T a .
Yang et al. [96] indicated that the complexity in land cover, elevation, and solar radiation at daytime could have resulted in low accuracy of T a estimation, because at night-time there is no solar radiation effect [89,96,97]. During the daytime, the effects of solar radiation will result in a more complex interaction between T a and LST, which is why the performance of the models with the nighttime LST variable was better than the models without LST nighttime in the study of Phan et al. [61]. In this study, the importance of DSR in the estimation of daytime instantaneous T a ranks third, which showed that DSR played an important role in the estimation of daytime scenarios and was also one of the reasons why the accuracy of estimation of daytime instantaneous T a was higher than that of nighttime. As a result, declination and solar radiation are highly recommended to use to improve the accuracy of T a estimation in the future study.

Limitations and Future Perspectives
In the estimation of instantaneous T a , the retrieval accuracy in daytime was obviously better than that at night, which may occur because DSR was added as an input variable of the model in daytime simulation, but there is no such variable in nighttime simulation. In the future, more input variables such as advection processes, wind speed, wind direction, etc., which remote sensors cannot retrieve, should be taken into account and model parameters can be adjusted to further improve the accuracy of nighttime instantaneous air T a simulation [98,99].
In addition, the results and processes of our current study are still facing uncertainties. Compared with other studies, this study has a high density of meteorological stations, which may have greatly improved the accuracy of the model. In the future, we will attempt to verify the performance of the model in the case of low-density meteorological stations.

Conclusions
In this paper, satellite remote sensing data and observational ground-based weather station temperature data in the Jingjinji region during 2018-2019 were used to establish five machine learning models for T a estimation; the accuracy of the model products was verified and compared. The results showed that RF provided the optimal model with the lowest RMSEs (day average 1.29 • C, daytime instantaneous 1.88 • C, nighttime instantaneous 2.47 • C). In addition, as for the instantaneous T a estimation, the retrieval accuracy in daytime was obviously better than that at night, the plains areas were obviously better than those in mountainous areas, and the summer simulation results were the best among the T a estimation of daily average and night instantaneous. This study showed that LST was the most important factor contributing to model accuracy, followed by solar declination and DSR, which implied that declination and DSR should be prioritized when estimating T a . However, it must be emphasized that there are still several limitations in this study, such as the nighttime instantaneous T a estimation, which was relatively low due to different surface energy balance processes that occur at night.
In conclusion, based on the support of high-density meteorological station and remote sensing data, a large-scale spatial continuous daily average and instantaneous T a estimation can be carried out by selecting appropriate variables to establish an RF model. On this basis, the daily average T a , daytime instantaneous T a , and nighttime instantaneous T a datasets with a 1 km resolution in the Jingjinji region from 2018 to 2019 were established in this paper, which can provide spatial continuous T a data and are of reference value for the boundary layer of related research studies in the Jingjinji region.