1. Introduction
Near surface air temperature (NSAT) is an important indicator of the surface atmospheric environment and the urban thermal environment, and it is also a key parameter of land surface energy balance models [1,2,3]. Because various land surface processes, such as photosynthesis and respiration, are affected by NSAT, studies of land surface modelling, global environmental change, the urban heat island effect, etc. require a continuous spatial distribution of NSAT with suitable temporal and spatial resolution [4,5]. At present, NSAT data are primarily captured by meteorological ground stations. With the successful launch of various multispectral satellite sensors, including MODIS, Landsat, Sentinel, etc., satellite remote sensing technology has become one of the most efficient ways to acquire NSAT data with high temporal and/or spatial resolution. Five methods have been widely employed to retrieve NSAT from multispectral satellite images in recent decades: the single-factor method, the multi-factor method, the temperature vegetation index (TVX) method, the surface energy balance method, and the machine learning method.
1.1. The Single-Factor Method
The single-factor method analyzes the correlation between NSAT and Land Surface Temperature (LST) based on a linear regression model [4]. In the experiments conducted by Basist et al., the correlation coefficients (R²) between the LST extracted from satellite data and the NSAT measured by on-site monitors exceeded 0.95 [6]. Fu et al. used the LST provided by MODIS to retrieve NSAT in the northern Qinghai-Tibet Plateau based on a linear regression model, and they found that the daily minimum NSAT (NSAT_min) and the nighttime average NSAT (NSAT_avg) can be retrieved in cases where the input LST data are accurate [7]. Zhang et al. reported that the NSAT results retrieved from the nighttime LST provided by MODIS were generally better than those retrieved from the daytime LST [8]. Although the single-factor method has been widely employed for NSAT retrieval, its accuracy and stability are highly dependent on the time, location, and amount of the observational data.
1.2. The Multi-Factor Method
As an upgrade to the single-factor method, the multi-factor method has been developed to improve the retrieval accuracy and enhance usability. Based on this method, Cresswell et al. used the temperature derived from Meteosat and the Solar Zenith Angle (SZA) to retrieve NSAT, with an accuracy of 3 °C for over 70% of the data [9]. Zhao et al. found that a linear regression model with monthly average LST, elevation, and topographic factors outperformed three statistical interpolation methods: spline, inverse distance weighting, and kriging [10]. Xu et al. developed a statistical model to retrieve NSAT using LST, elevation, Normalized Difference Vegetation Index (NDVI), and albedo as inputs [11]. In general, the accuracy of the multi-factor method is better than that of the single-factor method, but its accuracy and stability are still limited by the time, location, and amount of the observational data.
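As a minimal sketch of the multi-factor approach, the following fits a multiple linear regression with two predictors (LST and elevation) by solving the normal equations. The predictors, the sample values, and the generating relation are assumptions made only for illustration; the cited studies use their own predictor sets and data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multi_factor(features, target):
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free samples: (LST in °C, elevation in km).
samples = [(10.0, 0.4), (15.0, 0.5), (20.0, 1.2),
           (25.0, 0.4), (30.0, 2.0), (12.0, 1.5)]
# Hypothetical generating relation, chosen only for illustration.
nsat = [2.0 + 0.8 * lst - 6.0 * elev for lst, elev in samples]

coefficients = fit_multi_factor(samples, nsat)
print([round(c, 3) for c in coefficients])
```

Further predictors such as NDVI or albedo would be added as extra columns in each sample tuple without changing the fitting code.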
1.3. The Temperature Vegetation Index Method
Like the single- and multi-factor methods, the temperature vegetation index (TVX) method can also be employed for NSAT retrieval, based on the negative correlation between the atmosphere temperature and spectral vegetation index [12]. With the observed LST and the NDVI, for instance, Prihodko and Goward implemented the TVX method to retrieve the NSAT; the R² reached 0.93 and the Mean Error (ME) was 2.92 °C [13]. Nieto et al. compared the accuracy obtained with the previous maximum NDVI (NDVI_max) and with the current NDVI_max, which was calibrated for each vegetation type using the observed NSAT [14]. Zhu et al. improved the TVX method by lowering the threshold of the negative correlation coefficient of NDVI, with the RMSE changing from 7.45 °C to 3.79 °C and the MAE from 6.21 °C to 3.03 °C, and with R² equal to 0.83 [15]. However, the TVX method is unsuitable in areas with sparse vegetation or bare soil.
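The TVX idea, as it is commonly described, is to regress LST on NDVI within a small pixel window and extrapolate the fitted line to NDVI_max, where the canopy temperature is assumed to approximate the air temperature. The sketch below follows that description under stated assumptions: the window values, the NDVI_max of 0.9, and the function names are hypothetical and are not taken from the cited papers.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tvx_air_temperature(ndvi_window, lst_window, ndvi_max):
    """Extrapolate the LST-NDVI line to ndvi_max; None if the slope is not negative."""
    slope, intercept = fit_line(ndvi_window, lst_window)
    if slope >= 0:
        # The method relies on a negative LST-NDVI relation in the window.
        return None
    return slope * ndvi_max + intercept


# Hypothetical pixel window: NDVI values and the matching LST values (°C).
ndvi = [0.2, 0.4, 0.6]
lst = [34.0, 30.0, 26.0]

estimate = tvx_air_temperature(ndvi, lst, ndvi_max=0.9)
print(f"Estimated NSAT: {estimate:.1f} °C")
```

Windows dominated by bare soil or sparse vegetation tend to show a weak or non-negative slope, and the sketch returns `None` for them, which mirrors the limitation stated above.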
1.4. The Surface Energy Balance Method
Pape et al. established a surface energy balance model to simulate the LST and NSAT changes at a high temporal resolution in an alpine region, based on the measured NSAT and satellite remote sensing data [16]. Analogously, Hou et al. proposed an Energy Balance Bowen Ratio model to retrieve the NSAT from Landsat 5 TM images in Beijing, where the ME of the retrieval result was equal to 2.21 °C [17]. The surface energy balance method has been implemented in various regions and has yielded satisfactory results in cases where sufficient observation data were available.
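For orientation, one standard textbook formulation of the surface energy balance with a Bowen ratio is sketched below. The exact models of Pape et al. and Hou et al. are not reproduced here, and the symbols are assumptions introduced for this sketch: $R_n$ is the net radiation, $H$ the sensible heat flux, $LE$ the latent heat flux, $G$ the ground heat flux, $\beta$ the Bowen ratio, $T_s$ the surface temperature, $T_a$ the air temperature, $r_a$ the aerodynamic resistance, $\rho$ the air density, and $c_p$ the specific heat of air.

```latex
\begin{align}
R_n &= H + LE + G, \\
\beta &= \frac{H}{LE}
  \quad\Longrightarrow\quad
  H = \frac{\beta}{1+\beta}\,(R_n - G), \qquad
  LE = \frac{1}{1+\beta}\,(R_n - G), \\
H &= \rho\, c_p\, \frac{T_s - T_a}{r_a}
  \quad\Longrightarrow\quad
  T_a = T_s - \frac{H\, r_a}{\rho\, c_p}.
\end{align}
```

In this form, the air temperature follows from the satellite-derived surface temperature once the fluxes and the aerodynamic resistance are known, which is why the method needs a sufficient amount of observation data.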
1.5. Machine Learning Methods
Since the 2010s, a growing number of researchers have employed machine learning methods for NSAT retrieval. For instance, Xu et al. employed the Random Forest (RF) model to retrieve the maximum NSAT (NSAT_max) from MODIS data in British Columbia and found that the RF model had higher accuracy than the linear regression method [18]. Similarly, Ho et al. found that the NSAT retrieval results based on the RF model were generally better than those based on the ordinary least squares regression and support vector machine models [19]. Based on the M5 model, Emamifar et al. retrieved the NSAT in Khuzestan province in southwestern Iran, with the LST calculated from MODIS, solar radiation, and Julian day as inputs. The results showed that the RMSE was 2.3 °C and the R² reached 0.96 [20]. In the MBR-LST model proposed by Zhang et al., a back-propagation neural network was employed to retrieve the LST from the band reflectance of Landsat 8 OLI/TIRS images, NDVI, elevation, latitude, meteorological parameters, and time parameters, and it performed significantly better than the traditional Radiative Transfer Equation (RTE) method [21].
The work reviewed above indicates that satellite remote sensing technology, together with suitable machine learning methods, has become an efficient way to retrieve NSAT in continuous space. In the literature published so far, however, few studies have comprehensively evaluated or systematically compared the NSAT retrieval performance of different machine learning models. Hence, the three machine learning models most commonly used in the environmental field, Support Vector Regression (SVR), Multilayer Perceptron Neural Network (MLBPN), and Random Forest (RF), are employed in this research for NSAT retrieval from multispectral satellite images, namely MODIS daytime and nighttime data, Landsat 8 data, and Sentinel-2 data. By comparing the NSAT retrieval results generated by the different machine learning models, this research investigates the ‘optimal’ NSAT retrieval model together with the ‘best’ satellite data as inputs.
5. Discussion
For further analysis, the residuals of the retrieved NSAT values were calculated and compared in Figure 11. Figure 11a shows the histograms of the NSAT-retrieval residuals obtained from the MODIS_TD data with the different models. With the SVR-based model, 44.78%, 78.73%, and 94.40% of the residuals are smaller than 1 °C, 2 °C, and 3 °C, respectively. With the MLBPN-based model, the corresponding proportions are 49.25%, 80.60%, and 94.03%. With the RF-based model, the three proportions increase to 52.42%, 83.65%, and 95.17%, i.e., the RF-based model shows the best NSAT-retrieval performance when the MODIS_TD data are used.
Figure 11b shows the histograms of the NSAT-retrieval residuals using the MODIS_TN data. With the SVR-based model, 66.11%, 86.92%, and 96.65% of the residuals are smaller than 1 °C, 2 °C, and 3 °C, respectively. With the MLBPN-based model, the corresponding proportions are 61.08%, 88.26%, and 98.32%. With the RF-based model, more residuals fall within the range of 0 °C to 3 °C: 60.73%, 91.94%, and 98.99% of the residuals are smaller than 1 °C, 2 °C, and 3 °C, respectively. Similar to the experiments using the MODIS_TD data, the RF-based model shows the best retrieval performance compared with the SVR- and MLBPN-based models.
Figure 11c shows the histograms of the NSAT-retrieval residuals using the Landsat 8 data. With the SVR-based model, merely 35.09%, 54.38%, and 70.17% of the residuals are smaller than 1 °C, 2 °C, and 3 °C, respectively. With the MLBPN-based model, the corresponding proportions increase to 40.35%, 70.18%, and 89.48%. The RF-based model performs much better than the other two models, with 42.11%, 77.20%, and 96.50% of the residuals smaller than 1 °C, 2 °C, and 3 °C, respectively.
Figure 11d shows the histograms of the NSAT-retrieval residuals using the Sentinel-2 data. With the SVR-based model, 30.63%, 53.15%, and 64.84% of the residuals are smaller than 1 °C, 2 °C, and 3 °C, respectively. With the MLBPN-based model, the corresponding proportions are 27.93%, 47.75%, and 66.67%. With the RF-based model, the proportions increase to 40.18%, 59.83%, and 78.58%.
Based on the above analysis of the NSAT-retrieval residuals, it can be concluded that the RF-based model performs better overall than the other two models.
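The proportions quoted for each panel of Figure 11 are shares of residuals whose absolute value falls below 1 °C, 2 °C, and 3 °C. A minimal sketch of that calculation follows; the residual values are invented for illustration, and the function name `share_within` is hypothetical.

```python
def share_within(residuals, thresholds=(1.0, 2.0, 3.0)):
    """Percentage of residuals whose absolute value is below each threshold (°C)."""
    n = len(residuals)
    return {
        t: 100.0 * sum(abs(r) < t for r in residuals) / n
        for t in thresholds
    }


# Hypothetical residuals (retrieved NSAT minus observed NSAT, in °C).
residuals = [0.2, -0.7, 1.4, -1.9, 2.5, 3.6, -0.1, 0.9, 1.1, -2.2]

shares = share_within(residuals)
for threshold, percent in shares.items():
    print(f"|residual| < {threshold:.0f} °C: {percent:.2f}%")
```

The shares are cumulative, so each value is at least as large as the one for the preceding threshold.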
Table 8 presents a further analysis of the NSAT-retrieval residuals produced by the RF-based model at different CMA-NOAA meteorological ground stations and in different seasons. With the MODIS_TD data, (a) the retrieval results in autumn are better than those in the other seasons and (b) the retrieval results at the XIANYANG meteorological ground station appear better than those at the other stations. With the MODIS_TN data, however, spring becomes the ‘best’ season and HUASHAN becomes the ‘best’ station for NSAT retrieval. With the Landsat 8 data, the ‘best’ season is again autumn, while the ‘best’ station changes to FENGXIANG. With the Sentinel-2 data, the retrieval results at the JINGHE station and in winter reach the highest accuracy. Therefore, no clear spatial or seasonal pattern was found in the NSAT-retrieval residuals, although the NSAT-retrieval performance of the RF-based model is stable and satisfactory.
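A breakdown by station or by season can be sketched as a simple grouped aggregation. The statistic used in Table 8 is not restated here, so the mean absolute residual below is an assumption, as are the record layout, the residual values, and the function name `mae_by_group`; only the station names come from the text.

```python
from collections import defaultdict


def mae_by_group(records, key):
    """Mean absolute residual (°C) for each distinct value of records[i][key]."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        sums[record[key]] += abs(record["residual"])
        counts[record[key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical residuals; the values are illustrative only.
records = [
    {"station": "XIANYANG", "season": "autumn", "residual": 0.5},
    {"station": "XIANYANG", "season": "spring", "residual": -1.5},
    {"station": "HUASHAN", "season": "autumn", "residual": 1.0},
    {"station": "HUASHAN", "season": "spring", "residual": -2.0},
]

by_station = mae_by_group(records, "station")
by_season = mae_by_group(records, "season")
print(by_station)
print(by_season)
```

Comparing the two dictionaries across the four satellite datasets is one way to check whether the same station or season is consistently the most accurate.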
6. Conclusions
In this study, three machine learning models based on SVR, MLBPN, and RF were established to retrieve the NSAT from various multispectral satellite images, namely MODIS daytime and nighttime data from 2010 to 2021, Landsat 8 data from 2013 to 2021, and Sentinel-2 data from 2018 to 2021. In addition to the satellite images, geospatial parameters and time parameters were also considered to enhance the accuracy of the NSAT retrieval.
The experiments demonstrated that the RF-based model has better NSAT retrieval performance than the SVR- and MLBPN-based models with respect to both accuracy and stability. With the MODIS_TD data, the RF-based model yielded a retrieval result with R², RMSE, MAE, and ME equal to 0.9697, 1.48 °C, 1.17 °C, and 0.05 °C, respectively, where the RMSE and MAE are smaller than those of the SVR- and MLBPN-based models. With the MODIS_TN data, the RF-based model yielded a retrieval result with R², RMSE, MAE, and ME equal to 0.9820, 1.21 °C, 0.95 °C, and −0.01 °C, respectively, where the R² is larger and the RMSE, MAE, and ME are smaller than those of the SVR- and MLBPN-based models. With the Landsat 8 data, the RF-based model yielded a retrieval result with R², RMSE, MAE, and ME equal to 0.9763, 1.54 °C, 1.27 °C, and 0.05 °C, respectively, where the R² is larger and the RMSE and MAE are smaller than those of the other two models. With the Sentinel-2 data, the NSAT result retrieved by the RF-based model has the largest R² and the smallest RMSE and MAE. Moreover, several samples were randomly selected for a more detailed qualitative analysis, and the results also indicated that the proposed RF-based model had the best NSAT retrieval performance.
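The four accuracy measures quoted above can be computed as in the following minimal sketch. Two conventions are assumed here rather than taken from the paper: the error is defined as retrieved minus observed, and R² is computed as the coefficient of determination (one minus the ratio of residual to total sum of squares) rather than as a squared correlation. The sample values and the function name `retrieval_metrics` are hypothetical.

```python
from math import sqrt


def retrieval_metrics(observed, retrieved):
    """Return R², RMSE, MAE, and ME for paired observed and retrieved values."""
    n = len(observed)
    errors = [r - o for o, r in zip(observed, retrieved)]  # assumed sign convention
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "ME": sum(errors) / n,
    }


# Hypothetical station observations and model retrievals (°C).
observed = [10.0, 12.0, 14.0, 16.0]
retrieved = [11.0, 11.0, 15.0, 15.0]

metrics = retrieval_metrics(observed, retrieved)
print({name: round(value, 4) for name, value in metrics.items()})
```

An ME close to zero, as in this toy case, indicates only that positive and negative errors cancel, so it is read alongside the RMSE and MAE rather than on its own.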
Meanwhile, as shown in Table 2 and Table 7, the NSAT results retrieved from the MODIS data are generally better than those retrieved from the Landsat 8 and Sentinel-2 data. With the MODIS_TN data, for instance, the R² reaches 0.9820 and the RMSE and MAE are 1.21 °C and 0.95 °C, respectively. In addition, the residuals of the RF-based model do not show a significant spatial or seasonal pattern, which indicates that this model is stable and can thereby be widely utilized.
To sum up, the proposed RF-based model, together with the MODIS data, yields the best NSAT retrieval results, which provides a reference for practical applications relevant to NSAT retrieval. Given its high accuracy and stability, the RF-based model has the potential to be widely utilized in applications concerning climate research and environmental studies. For example, the retrieved NSAT results can not only help to select the optimal locations for new meteorological ground stations, but can also provide the data required for correlation analysis between the NSAT and topographic changes, population distribution, land use, climatic conditions, etc.
Only four qualified CMA-NOAA stations are located in the study area, which restricted the amount of sample data available for model training and validation. In further research, data from more CMA-NOAA stations could be included to enhance the accuracy and/or the generality of the established NSAT retrieval models. In addition, although most of the clouds were removed through the quality control products during data preprocessing, the remaining clouds still affected the retrieval accuracy. In the future, different algorithms need to be investigated to overcome the influence of clouds.