A Novel Fully Coupled Physical–Statistical–Deep Learning Method for Retrieving Near-Surface Air Temperature from Multisource Data

: Retrieval of near-surface air temperature (NSAT) from remote sensing data is often ill-posed because of insufﬁcient observational information. Many factors inﬂuence the NSAT, which can lead to the instability of the accuracy of traditional algorithms. To overcome this problem, in this study, a fully coupled framework was developed to robustly retrieve NSAT from thermal remote sensing data, integrating physical, statistical, and deep learning methods (PS-DL). Based on physical derivation, the optimal combinations of remote sensing bands were chosen for building the inversion equations to retrieve NSAT, and deep learning was used to optimize the calculations. Multisource data (physical model simulations, remote sensing data, and assimilation products) were used to establish the training and test databases. The NSAT retrieval accuracy was enhanced using the land surface temperature (LST) and land surface emissivity (LSE) as prior knowledge. The highest mean absolute error (MAE) and root-mean-square error (RMSE) of the retrieved NSAT data were 0.78 K and 0.89 K, respectively. In a cross-validation against the China Meteorological Forcing Dataset (CMFD), the MAE and RMSE were 1.00 K and 1.29 K, respectively. The actual inversion MAE and RMSE for the optimal band combination were 1.21 K and 1.33 K, respectively. The proposed method effectively overcomes the limitations of traditional methods as the inversion accuracy is enhanced by adding the information of atmospheric water vapor and more bands, and the applicability (portability) of the algorithm is enhanced using LST and LSE as prior knowledge. This model can become a general inversion paradigm for geophysical parameter retrieval, which is of milestone signiﬁcance because of its accuracy and the ability to allow deep learning for physical interpretation.


Introduction
Near-surface air temperature (NSAT) typically refers to the atmospheric temperature approximately 2 m above the ground and is a key entity in many geoscientific and climatological studies such as those on global climate change, hydrology, atmosphere, ecology, agricultural production, urban heat island effect, and air pollution [1][2][3][4][5][6][7][8]. Consequently, the NSAT needs to be rapidly obtained with a high precision and spatial resolution in a large area. NSAT data are typically obtained using two methods: traditional methods based on in situ meteorological data, and remote sensing-based approaches.
Traditional methods can be mainly categorized as physical methods based on the energy balance [9] or empirical methods based on the relationships of the related variables [10,11]. Physical methods require many parameters (e.g., aerodynamic resistance, roughness, and soil physical properties) for characterizing the status of soil and vegetation, which are often difficult to obtain [9]. In contrast, in empirical methods, the NSAT is obtained by interpolating meteorological variables via the geographic information system (GIS). This method is simple and effective, but its accuracy depends on the number and distribution characteristics of the meteorological stations. If the number of meteorological stations is insufficient and/or the stations are unevenly distributed (especially in mountainous areas), it may be difficult to accurately estimate the NSAT [12].
Remote sensing approaches can be divided into statistical, energy balance, and machine learning methods. Statistical methods include simple direct and indirect methods [3,[13][14][15][16]. These methods were widely applied in early research as they require few input parameters and can be easily implemented. Simple direct statistical methods include univariate regression models that construct linear relationships between the NSAT and land surface temperature (LST) and multivariate regression models that consider other factors (e.g., solar zenith angle, latitude, and longitude) [13,17,18]. Indirect statistical methods try to relate NSAT to vegetation indices [3,15], assuming that the surface temperature of the canopy is in equilibrium with the air temperature within the canopy. The negative correlation between the LST and normalized difference vegetation index (NDVI) is used to obtain the NSAT. Such methods may be mainly suitable for woodland or farmland regions with high vegetation cover [19,20]. Notably, the portability of such methods is not high, and their accuracy must be enhanced through calibration by in situ NSAT data from meteorological stations in different regions. The energy balance methods [9,21] link NSAT and other surface environmental parameters by building a physical model. Although such methods involve clear physical meanings and exhibit a strong universality, they require many parameters that are difficult to obtain with remote sensing technologies [22,23]. Machine learning methods [12,24], specifically, neural network (NN) methods, can also be used to retrieve NSAT. Although the accuracy of such methods is high, it relies significantly on the training data. An NN trained in one region can often not be directly applied to another region because the physical relationship between the input and output parameters is ambiguous, and the training data are representative of local regions [12,25].
The difficulty in retrieving NSAT with thermal infrared remote sensing is that the brightness temperature (BT) at a satellite is mainly associated with the LST, and little information regarding the NSAT is available. The NSAT is influenced by not only LST variations but also other factors such as the land surface emissivity (LSE, surface type). Most of the existing remote sensing retrieval methods do not consider the effects of these parameters. To overcome the shortcomings of the abovementioned methods, in this study, a robust NSAT retrieval method is developed by coupling the physical, statistical, and deep learning approaches (PS-DL) with LST and LSE. The proposed hybrid method for estimating NSAT from thermal infrared remote sensing data fully utilizes the optimal computing power of DL while ensuring that the model is physically meaningful with a wide applicability (portability).

Data
We used the widely used MODTRAN radiative transfer model to simulate the solutions of the physical method [26,27]. The assimilation products, observation data of the ground meteorological stations, and corresponding synchronous observation BT at the satellite were used to obtain the solutions of the statistical method. These data were used to prepare the high-precision training and testing datasets required by DL.

Solutions of Physical Method
MODTRAN is a commonly used atmospheric radiative transfer model that can realistically simulate the radiative transfer process by setting different parameters such as the surface type, atmospheric conditions, and observation angle. To obtain a representative solution, we performed simulations for different bands of the satellite sensor. At the kilometer resolution scale of satellite images, 17 common surface types were selected as the real surface (Figure 1), including vegetation, soil and water bodies. The atmospheric mode was set as the default MODTRAN mode. The land surface temperature, NSAT, atmospheric water vapor content (WVC), and observation angle, which influence the satellite BT, were set as 280-325 K, 270-315 K, 0.2-4.0 g/cm 2 , and 0 • -65 • , respectively. Radiative transfer simulations were performed to determine every parameter in the forward process for a single thermal infrared band, including the solution to the radiative transfer equation. In this manner, we obtained representative solutions by setting different thermal infrared bands in different conditions, which formed the simulation database of physical method solutions.
Remote Sens. 2022, 14, x FOR PEER REVIEW 3 of 23 ground meteorological stations, and corresponding synchronous observation BT at the satellite were used to obtain the solutions of the statistical method. These data were used to prepare the high-precision training and testing datasets required by DL.

Solutions of Physical Method
MODTRAN is a commonly used atmospheric radiative transfer model that can realistically simulate the radiative transfer process by setting different parameters such as the surface type, atmospheric conditions, and observation angle. To obtain a representative solution, we performed simulations for different bands of the satellite sensor. At the kilometer resolution scale of satellite images, 17 common surface types were selected as the real surface (Figure 1), including vegetation, soil and water bodies. The atmospheric mode was set as the default MODTRAN mode. The land surface temperature, NSAT, atmospheric water vapor content (WVC), and observation angle, which influence the satellite BT, were set as 280-325 K, 270-315 K, 0.2-4.0 g/cm 2 , and 0°-65°, respectively. Radiative transfer simulations were performed to determine every parameter in the forward process for a single thermal infrared band, including the solution to the radiative transfer equation. In this manner, we obtained representative solutions by setting different thermal infrared bands in different conditions, which formed the simulation database of physical method solutions.

Solutions to Statistical Methods
Statistical methods differ from physical methods because the solutions can be obtained through regression computations of high-precision data obtained synchronously from the ground and satellite. For consistency, the statistical method uses the same bands and number of bands as those in the physical method. The remote sensing data used in this study correspond to MODIS, a sensor onboard both the Terra and Aqua satellites that

Solutions to Statistical Methods
Statistical methods differ from physical methods because the solutions can be obtained through regression computations of high-precision data obtained synchronously from the ground and satellite. For consistency, the statistical method uses the same bands and number of bands as those in the physical method. The remote sensing data used in this study correspond to MODIS, a sensor onboard both the Terra and Aqua satellites that orbit the Earth twice a day. The sensor has 36 bands and is characterized by global coverage, a high resolution, dynamic measurements, and high accuracy of alignment, which can facilitate the verification of the applicability of the PS-DL method for different combinations of bands. In addition, the MODIS LST and LSE products are mature and well validated [28] and can thus serve as sources for synchronous high-precision largescale LST and LSE data. MODIS data are available on the LAADS DAAC website: https: //ladsweb.modaps.eosdis.nasa.gov/search/ (accessed on 20 December 2021).
ERA5 is the fifth-generation product of the atmospheric reanalysis global climate data launched by the European Centre for Medium-Range Weather Forecasts (ECMWF), which provides many meteorological elements such as the air temperature at the reference height of 2 m. Since the release of the ERA5 reanalysis dataset, many researchers have tested their accuracy [29,30], and many analyses have shown that the accuracy of ERA5 is better than that of ERA-Interim higher spatial and temporal resolution and is beneficial to the accurate description of the regional atmosphere [31,32]. The ERA5 data can be obtained from Copernicus Climate Data Store (https://cds.climate.copernicus.eu/cdsapp#!/search? type=dataset&text=ERA5; (accessed on 1 December 2020)).
The China Meteorological Forcing Dataset (CMFD), developed by the Institute of Tibetan Plateau Research, Chinese Academy of Sciences [33][34][35], contains various meteorological elements, such as atmospheric, land, and oceanic parameters. The temporal and spatial resolutions are 3 h and 0.1 • , respectively. The dataset was established by fusing remote sensing products, reanalysis datasets, and in situ observations from meteorological stations and interpolating them. Many researchers have highlighted that the accuracy of CMFD is high in the Chinese region, superior to that of the Global Land Data Assimilation System (GLDAS) data, and it thus satisfies the application requirements [29,30,33]. CMFD data can be extracted from the China National Qinghai-Tibet Plateau Science Data Center archive (http://data.tpdc.ac.cn/zh-hans/data/8028b944-daaa-4511-8769-965612652c49/; (accessed on 18 November 2021)).
The ground measurement data were derived from the China National Meteorological Information Center (CNMIC) dataset (http://www.nmic.cn/site/index.html; (access on 1 November 2020)), which includes the hourly air temperature, LST, and weather condition records. In situ NSAT data from 2002 to 2020 were used to build training and testing datasets and validate the accuracy of the PS-DL method. Table 1 lists the datasets used in this study.

Data Processing
The accuracy of remote sensing-based NSAT inversion largely depends on the quality of training and testing data. High-quality training data must be used to allow a DL-NN to extract information and perform accurate computations and to transform the feature extraction process into automatic feature learning and application-dependent feature exploration [36,37]. Considering this aspect, to ensure the accuracy and representativeness of the training and testing databases, we used multiple data sources to obtain solutions of the physical and statistical methods. The daily MODIS data of the Aqua and Terra satellites have approximately 50% missing values owing to the influence of weather phenomena such as clouds and rainfall [38,39]. To ensure the data quality, abnormal data were removed according to the MODIS LST product control file, and the data were resampled using the nearest neighbor method to achieve a grid cell consistency of 0.1 • × 0.1 • .

Research Concept and Methodology
Physical methods yield accurate results but require many parameters. Moreover, the equation solving process is complicated and cannot be easily realized using conventional methods. Statistical methods are simple to implement and widely used in practice. However, these methods exhibit a low generality (portability). Most physical methods cannot describe and represent all situations and need to be supplemented by statistical methods. In general, if a system of equations is solvable, then a fully connected DL-NN can approximate the curve of any complex equation solution. Furthermore, DL can simultaneously solve statistical methods if the input parameters (dimensions) and output parameters of the statistical methods are consistent with those of the physical methods. Therefore, we used DL to couple physical and statistical methods to exploit the advantages of the different methods and overcome the shortcomings of traditional methods.
The framework of the PS-DL method is shown in Figure 2.
Step 1 (dashed red rectangle) involves the physical method based on the radiative transfer energy balance equation. A physical forward model system is constructed, and the classical MODTRAN model is implemented to simulate the radiative transfer processes to obtain the solution of the physical equation system. Step 2 (dashed yellow rectangle) involves the statistical method, in which high-precision data from the assimilation model, in situ meteorological observations, and satellite BT data are used to generate a high-precision statistical database. In Step 3, the solutions of the physical method are combined with the solutions of highprecision statistical methods to build training and testing databases for DL. In order to improve the retrieval accuracy of near-surface air temperature, we first retrieve LST and LSE, and then use LST and LSE as prior knowledge to further retrieve NSAT, and these improve the information of the NSAT signal and enhance the retrieval accuracy. In Step 4, DL is used to optimize and solve the physical and statistical methods to achieve the required accuracy through repeated training and testing.
Step 5 involves the validation and application of the preceding operations.

Physical Method
The physical method for NSAT is established based on the thermal radiance of the ground and its transfer from the ground and near-surface air to the remote sensor. The radiation transfer process is shown in Figure 3. The theoretical basis for the remote sensing of NSAT is that the near-surface air exchanges energy with the ground surface and atmospheric profile owing to the temperature difference and transmits the radiation through the atmosphere to the sensor. The type of surface influences the surface radiation, resulting in different intensities of energy exchange with the near-surface air. Therefore, the inversion equation must consider the effect of the surface type (LSE). In other words, the LST influences the NSAT, which in turn affects the atmospheric profile temperature. Surface and near-surface energy is absorbed by the atmosphere, especially, by the water vapor, as it travels through the atmosphere to the sensor [40]. The spectral distribution of radiation emitted from the ground and near the surface depends on the wavelength. The simplified radiation energy balance equation is presented as Equation (1).
where B i (T i ) is the thermal radiance received by the sensor in band i; T i is the satellite BT; ε i is the LSE for band i; τ i is the atmospheric transmittance of band i; B i (T s ) is the radiance emitted by the surface; T s is the LST; R ↓ i T 0 , T ↓ a and R ↑ i T 0 , T ↑ a represent the downwelling and upwelling atmospheric radiation of band i, respectively; T 0 is the NSAT; and T ↓ a , and T ↑ a denote the downwelling and upwelling effective mean temperature of the atmosphere, respectively.

Physical Method
The physical method for NSAT is established based on the thermal radiance of the ground and its transfer from the ground and near-surface air to the remote sensor. The radiation transfer process is shown in Figure 3. The theoretical basis for the remote sensing of NSAT is that the near-surface air exchanges energy with the ground surface and atmospheric profile owing to the temperature difference and transmits the radiation through the atmosphere to the sensor. The type of surface influences the surface radiation, resulting in different intensities of energy exchange with the near-surface air. Therefore, the inversion equation must consider the effect of the surface type (LSE). In other words, the LST influences the NSAT, which in turn affects the atmospheric profile temperature. Surface and near-surface energy is absorbed by the atmosphere, especially, by the water vapor, as it travels through the atmosphere to the sensor [40]. The spectral distribution of radiation emitted from the ground and near the surface depends on the wavelength. The simplified radiation energy balance equation is presented as Equation (1).
where ( ) is the thermal radiance received by the sensor in band i; is the satellite BT; is the LSE for band i; is the atmospheric transmittance of band i; ( ) is the radiance emitted by the surface; is the LST; ↓ , ↓ and ↑ , ↑ represent the The atmospheric upwelling radiation R ↑ i T 0 , T ↑ a can be expressed as in Equation (2) [41,42]: where T h is the atmospheric temperature at elevation h, H is the sensor height, and τ i (h, H) is the atmospheric upwelling transmittance between elevations h and H. Equation (3) can be obtained by solving Equation (2) using the mean value theorem [41,42].
The atmospheric downwelling radiation R ↓ i T 0 , T ↓ a can be considered an integral of the atmospheric radiation from a hemispherical direction and can be expressed as in Equation (4) [41,42].
where θ is the direction angle of the atmospheric downwelling radiation, ∞ represents the elevation of the top of the Earth's atmosphere (km), and τ i (θ , h, 0) represents the atmospheric downwelling transmittance from elevation z to the surface. According to Franc and Cracknell (1994) [41], in clear sky conditions, the upwelling and downwelling transmittances can be considered equal for each thin layer of the atmosphere, i.e., ∂τ i (θ , h, 0) = ∂τ i (h, H). Equation (5) can be obtained using the mean value theorem of integrals: Therefore, the atmospheric downwelling radiation can be defined as in Equation (6).
The substitution of Equations (3) and (6) into Equation (1) yields Equation (7) Here, T ↑ a and T ↓ a represent the average atmospheric temperatures. Qin et al. (2001) analyzed these two variables and noted no substantial difference in the solution when the two variables were combined to generate one variable.
The key contribution of atmospheric radiation pertains to the bottom layer of the atmosphere [43]. According to the derivation analysis, a nearly linear relationship (Equation (9)) exists between the NSAT and average temperature of the atmosphere in the given conditions [42]. According to the reciprocity theory, a similar relationship exists between the NSAT and satellite BT (Equation (10)) [44].
where A 1 and A 2 are coefficients, B 1 and B 2 are constants, and T i is the satellite BT for band i. The coefficients in Equations (9) and (10) vary across regions and seasons. Therefore, one variable (T 0 ) can be substituted for the other (T a ) to decrease the unknowns in Equation (8). This analysis shows that although a certain constraint relationship exists be- tween the different temperature variables in the energy balance equation, this relationship cannot be strictly determined, which introduces uncertainties in the calculation process of traditional methods. The LSE represents the magnitude of radiant energy absorbed and emitted by the ground surface, which is related to the surface type, surface roughness, and water content.
The measured values vary with the wavelength and viewing angle. In general, the spectral curve is unique for each surface type, and thus, if the surface type of a pixel is known, the LSE for each band can be obtained. Therefore, it is reasonable to assume that the LSE for all bands can be reduced to an unknown parameter with respect to the surface type, as shown in Equation (11).
The atmospheric transmittance (τ i ) in the energy balance equation (Equation (8)) is affected by the atmospheric water vapor and other gases. In general, the WVC undergoes significant fluctuations, whereas the content of other gases (O) remains relatively stable. Therefore, only the atmospheric WVC needs to be determined to obtain the transmittance at different wavelengths. As shown in Equation (12), the transmittance of different wavelength bands can be summarized as a function of the atmospheric WVC.
Equation (10) contains four unknowns (LST, NSAT, atmospheric WVC, and surface type). If the physical method is used to invert the NSAT, at least four thermal infrared bands are needed to construct four radiative transfer equations. According to the simulation analysis, the thermal radiation energy emitted by the surface accounts for 75% of the energy received by the sensor when the atmospheric transmittance is higher than 0.65. Consequently, the NSAT inversion algorithm that directly uses the thermal infrared band is not adequately accurate, and thermal infrared remote sensing is more suitable for retrieving the surface temperature. To enhance the accuracy and versatility of the NSAT inversion algorithm, we first invert the LST and LSE and then use the LST and LSE as prior knowledge to further invert the NSAT. Equation (8) indicates that if the atmospheric WVC is known, the NSAT can be retrieved using the three thermal infrared bands. Additionally, if the surface temperature and atmospheric WVC are known, the NSAT can be retrieved using the two thermal infrared bands. The MODTRAN model can obtain as many solutions of equations as possible by setting the variation ranges of parameters such as the surface temperature, air temperature, and atmospheric state under clear sky conditions. The solutions of physical and statistical methods constitute the training and test data for DL, and DL optimization can then be performed to solve the inversion equations.

Statistical Method
The statistical method for retrieving the NSAT based on thermal infrared remote sensing mainly involves the direct statistical regression of the NSAT and multiple thermal infrared bands [13]. Another statistical algorithm directly uses the NSAT and surface temperature retrieved from thermal infrared remote sensing to perform statistical regression [14,22]. In these methods, the inversion accuracy of a single surface type is reliable over a short period. However, these methods are not portable or versatile over a long period because the influence of other factors such as the LSE on the NSAT is not considered. To eliminate the shortcomings of traditional methods, we use the LST and LSE as prior knowledge. The consideration of these parameters can help eliminate the instability of thermal infrared remote sensing-based NSAT inversion, enhance the computational accuracy, and improve the generality of the algorithm. According to the derivation of the physical method, if the LST and LSE are not used as prior knowledge, the statistical method needs Remote Sens. 2022, 14, 5812 9 of 23 at least four thermal infrared bands to achieve a high accuracy. Because the signal between the different bands is nonlinear, the statistical method can be expressed as in Equation (13).
where T i represents the satellite BT of the thermal infrared band i (i ≥ 4), and fi stands for fuzzy statistical function. Here we do not need specific calculation, just need to directly use multisource data to obtain the solutions of the statistical method. For consistency, the location and number of bands used in the statistical and physical methods are identical in this study. Most physical methods do not describe all situations, and statistical methods can thus be an effective complement. From a big data perspective, if the data collected through statistical methods can effectively represent all spatial solutions, then inversion can be achieved through DL. To enhance the inversion accuracy and generality of the algorithm, we collect high-precision statistical sample data from multiple sources.

DL
In recent years, DL has been widely used in many fields and received considerable attention from the remote sensing community. However, DL techniques are typically regarded as a black box, and the physical mechanism remains to be clarified [23,[45][46][47]. As discussed in Section 3.1, DL can be used to couple physical and statistical algorithms. Moreover, Sections 3.2 and 3.3 describe the sufficient conditions for physical and statistical methods to optimize calculations with DL techniques, which can render the use of DL physically meaningful. Many researchers have presented the principles and technical details associated with the use of DL in the inversion of geophysical parameters and the surface temperature [48,49]. As shown in Figure 4, a fully connected DL-NN consists of an input layer, an output layer, and multiple hidden layers, and the number of neurons in each layer depends on the setting of the initial parameters. The weight and bias of a single neuron are (w·X + σ), which is activated by the nonlinear function sigmoid function. The result of the activation is the input of the next neuron or output of the entire network. The input of the neuron can be the actual input X of the entire network or the output of the previous neuron.
ote Sens. 2022, 14, x FOR PEER REVIEW In the training process, the Kalman filter algorithm is used to en gence speed of the learning phase and separation ability for highly n The initial neural network weights are set to be small random numbe man filtering process is a recursive mean square estimation process, a for each update is calculated based on the previous estimation results a Consequently, the weights connected to each output node can be upda To rapidly obtain the required root-mean-square error (RMSE), the DL iterations, and the results obtained by the NN are highly stable [12,44]   In the training process, the Kalman filter algorithm is used to enhance the convergence speed of the learning phase and separation ability for highly nonlinear problems. The initial neural network weights are set to be small random numbers (−1, 1). The Kalman filtering process is a recursive mean square estimation process, and the NN weight for each update is calculated based on the previous estimation results and new input data. Consequently, the weights connected to each output node can be updated independently.
To rapidly obtain the required root-mean-square error (RMSE), the DL requires only a few iterations, and the results obtained by the NN are highly stable [12,44].

Model Construction
MODIS consists of multiple mid-infrared and thermal infrared bands. MODIS bands 20, 22, and 23 range from 3.5 to 7.2 µm, and bands 29, 30, 31, 32, and 33 range from 8 to 13.5 µm. The mid-infrared band (3.5-4.2 µm) is affected by solar radiation, and its use is therefore mainly suitable for night retrieval, whereas the thermal infrared bands (8-13.5 µm) are suitable for both day and night retrieval. According to the position of the central wavelength and characteristics of the MODIS bands, we constructed three combinatorial modes (  (3) combinations suitable for day and night retrievals: LST, LSE, and BT in thermal infrared bands 29, 31, 32, and 33. Through the trial and error, we set the number of hidden nodes and hidden layers from small to large, and obtained the relatively optimal results. Some of the results are shown in Tables 3-5. The first row of the table represents the number of hidden nodes, and the second row represents the accuracy, and the first column represents the number of hidden layers.

Theoretical Accuracy Validation and Analysis
Simulation data and DL are used to optimize the computational solution process for the physical method. The MODIS sensor provides the water vapor inversion calculation bands (2, 5, 17, 18, and 19). First, we calculate the atmospheric water vapor content [50] and retrieve LST and LSE using the corresponding combination model, which are consistent with those in Table 2 and can be referred to in References [48,49]. Then, we take LST and LSE as prior knowledge, and use the corresponding model combination to continue to retrieve NSAT. The combination 1-3 of inversion of LST and LSE corresponds to model 1-3 in Table 2, and the difference is that the surface temperature and emissivity are unknown. Combination 1 is suitable for day retrieval, and Table 3 summarizes the LST retrieval errors. The LST is retrieved most accurately when the numbers of hidden layers and hidden nodes are 9 and 600, respectively, corresponding to a mean absolute error (MAE) and RMSE of 0.59 K and 0.93 K, respectively. The corresponding MAEs of LSE 29 , LSE 31 , and LSE 32 are 0.007, 0.005, and 0.006, and the RMSEs are 0.009, 0.006, and 0.004, respectively. The average errors of the emissivity inversions are lower than 0.01.
Combination 2 is suitable for night retrieval because MODIS bands 20, 22, and 23 have an atmospheric window wavelength of 3-5 µm, which is suitable for night use. Because band 33 is affected by the carbon dioxide level, we select the combination of bands 20, 22, 23, 29, 31, and 32 to decrease the amount of data. Combination 3 is suitable for both day and night. Table 5 summarizes the retrieval errors for the LST. The LST is retrieved most accurately when the numbers of hidden layers and hidden nodes are 5 and 800, respectively, with the MAE and RMSE being 0.87 K and 1.02 K, respectively. The corresponding MAEs of LSE 29 , LSE 31 , LSE 32 , and LSE 33 are 0.008, 0.007, 0.005, and 0.004, and the RMSEs are 0.010, 0.007, 0.006, and 0.006, respectively. The inversion errors of LSE 29 are considerably larger than those of LSE 31 and LSE 32 . Compared to Combination 1, the overall accuracy of the retrieval result is slightly decreased. A comparison of the differences in the input parameters of the two combinations indicates that the addition of water vapor information can enhance the accuracy of the LST retrieval results. Figure 5 shows the distribution of LSE inversion errors when the three Combinations are the most accurate. The retrieval accuracies of the different combinations are stable, consistent with those of the LST. The LSE retrieval accuracy of Combination 2 is the highest, followed by Combinations 1 and 3. The average retrieval error of the LSE for all three Combinations is less than 0.01. The LSE inversion accuracy of bands 20, 22, and 23 is inferior to that of bands 29, 31, and 32. The main reason is that LSE20, LSE22, and LSE23 have a larger fluctuation range than in the other bands, which deteriorates the retrieval accuracy. The retrieval accuracy of LSE29 is slightly lower than those of LSE31 and LSE32, and the retrieval errors of LSE31 and LSE32 are approximately equal to and lower than 0.006, respectively. The National Aeronautics and Space Administration (NASA) defines MODIS products as having a high pixel quality when the LSE error is smaller than 0.01. To ensure the accuracy of NSAT inversion, we select only LSE31 and LSE32 as the prior knowledge.
Remote Sens. 2022, 14, x FOR PEER REVIEW 12 of 23 0.020, 0.024, 0.025, 0.007, 0.003, and 0.003, respectively. The inversion errors of LSE20, LSE22, and LSE23 are considerably larger than those of LSE29, LSE31, and LSE32. Combination 3 is suitable for both day and night. Table 5 summarizes the retrieval errors for the LST. The LST is retrieved most accurately when the numbers of hidden layers and hidden nodes are 5 and 800, respectively, with the MAE and RMSE being 0.87 K and 1.02 K, respectively. The corresponding MAEs of LSE29, LSE31, LSE32, and LSE33 are 0.008, 0.007, 0.005, and 0.004, and the RMSEs are 0.010, 0.007, 0.006, and 0.006, respectively. The inversion errors of LSE29 are considerably larger than those of LSE31 and LSE32. Compared to Combination 1, the overall accuracy of the retrieval result is slightly decreased. A comparison of the differences in the input parameters of the two combinations indicates that the addition of water vapor information can enhance the accuracy of the LST retrieval results. Figure 5 shows the distribution of LSE inversion errors when the three Combinations are the most accurate. The retrieval accuracies of the different combinations are stable, consistent with those of the LST. The LSE retrieval accuracy of Combination 2 is the highest, followed by Combinations 1 and 3. The average retrieval error of the LSE for all three Combinations is less than 0.01. The LSE inversion accuracy of bands 20, 22, and 23 is inferior to that of bands 29, 31, and 32. The main reason is that LSE20, LSE22, and LSE23 have a larger fluctuation range than in the other bands, which deteriorates the retrieval accuracy. The retrieval accuracy of LSE29 is slightly lower than those of LSE31 and LSE32, and the retrieval errors of LSE31 and LSE32 are approximately equal to and lower than 0.006, respectively. The National Aeronautics and Space Administration (NASA) defines MODIS products as having a high pixel quality when the LSE error is smaller than 0.01. To ensure the accuracy of NSAT inversion, we select only LSE31 and LSE32 as the prior knowledge. The information obtained by the sensor is mainly associated with the LST. We solve Equation (8) for the NSAT considering the LSE and LST as prior knowledge. The process is similar to the inversion calculation of the LST and LSE described above, but the two parameters are simultaneously used as the input information. Table 6 summarizes the retrieval errors of the NSAT for different models. The information obtained by the sensor is mainly associated with the LST. We solve Equation (8) for the NSAT considering the LSE and LST as prior knowledge. The process is similar to the inversion calculation of the LST and LSE described above, but the two parameters are simultaneously used as the input information. Table 6 summarizes the retrieval errors of the NSAT for different models.  Figure 6 shows the distribution of the NSAT inversion errors when the three models are the most accurate. The NSAT retrieval accuracies of the models are high. Model 2 (20,22,23,29,31,32, LSE, LST) achieves the highest retrieval accuracy (MAE and RMSE of 0.78 K and 0.89 K, respectively), followed by Models 1 (29, 31, 32, WVC, LSE, LST) and 3 (29, 31, 32, 33, LSE, LST), which have MAE and RMSE values of 0.89 K and 1.13 K, respectively, and 1.01 K and 1.26 K, respectively. A comparison of Models 1 and 3 shows that the NSAT retrieval accuracy can be improved by replacing band 33 with the water vapor band information. The retrieval accuracy is different for different band combinations, and the optimal combination must be selected considering the band settings of thermal infrared instruments in specific applications.

Practical Validation and Analysis
The simulated data mainly represent the solutions of physical methods. To make the inversion more representative, we supplement the high-precision solutions of the corresponding statistical methods. Sections 2.2 and 3.3 describe the methods to obtain the solutions of the statistical method. The inversion calculation and accuracy after the simulation data are supplemented with statistical data are similar to those of the analysis presented in Section 4.1 and are thus not repeated. Overall, the previous sections describe the NSAT inversion method in which the LST and LSE are used as prior knowledge and DL is coupled with physical and statistical methods.
The study area in this analysis is the North China Plain region (Figure 7), which is one of the three major plains in China and ranges from 32°-40° N and 114°-121° E. The area is a typical alluvial plain with a flat terrain that facilitates transportation, and it contains many rivers and lakes. The North China Plain is a developed agricultural region in China and experiences a temperate monsoon climate. Many ground observations regarding this area are available, which is conducive to analyzing the inversion results.

Practical Validation and Analysis
The simulated data mainly represent the solutions of physical methods. To make the inversion more representative, we supplement the high-precision solutions of the corresponding statistical methods. Sections 2.2 and 3.3 describe the methods to obtain the solutions of the statistical method. The inversion calculation and accuracy after the simulation data are supplemented with statistical data are similar to those of the analysis presented in Section 4.1 and are thus not repeated. Overall, the previous sections describe the NSAT inversion method in which the LST and LSE are used as prior knowledge and DL is coupled with physical and statistical methods.
The study area in this analysis is the North China Plain region (Figure 7), which is one of the three major plains in China and ranges from 32 • -40 • N and 114 • -121 • E. The area is a typical alluvial plain with a flat terrain that facilitates transportation, and it contains many rivers and lakes. The North China Plain is a developed agricultural region in China and experiences a temperate monsoon climate. Many ground observations regarding this area are available, which is conducive to analyzing the inversion results.
Two MODIS images are used to demonstrate the PS-DL inversion application. The dates of the Terra/MODIS and Aquia/MODIS images are 14 October 2014 and 1 August 2018 (nighttime), respectively. Mid-infrared data (bands 20, 22, and 23), thermal infrared data (bands 29, 31, 32, and 33), and WVC data (obtained by bands 2, 5, 17, 18, and 19 retrieval) are used as the input parameters for the PS-DL. The inversion process of the surface temperature and emissivity has been described by  and Wang et al. (2021). In this study, the NSAT inversion results are presented. Figures 8 and 9 show that the overall distribution of the NSAT retrieved by the PS-DL method is similar to the distribution pertaining to the ERA5-Land products. In addition, Figures 10 and 11 show that the differences between the ERA5-Land products and NSAT retrieved by the PS-DL method range from −2 K to 2 K. As shown in Figure 8, the differences in the NSAT retrieval results and ERA5-Land products are significant in certain regions that are mainly agricultural production areas. These differences can be explained by the following aspects: Straw burning occurs after the autumn harvest in the North China Plain in October, and the LSE varies, which influences the NSAT retrieval results. The MAE and RMSE for Models 1 and 3 are 1.18 K and 1.52 K, respectively, and 1.61 K and 1.90 K, respectively. A comparison of the two inversion modes shows that replacing the band 33 information with water vapor information allows the DL to capture more information and improve the retrieval accuracy. As shown in Figures 10 and 11, the NSAT retrieved by Model 2 is the closest to the ERA5-Land products, with the MAE and RMSE being 0.89 K and 1.18 K, respectively. The NSAT retrieval accuracy of Model 3 is slightly worse (MAE and RMSE of 1.33 K and 1.65 K, respectively). The use of more band information helps increase the inversion accuracy. In practical applications, the appropriate combination must be selected according to the specific thermal infrared sensor.
is coupled with physical and statistical methods.
The study area in this analysis is the North China Plain region (Figure 7), which is one of the three major plains in China and ranges from 32°-40° N and 114°-121° E. The area is a typical alluvial plain with a flat terrain that facilitates transportation, and it contains many rivers and lakes. The North China Plain is a developed agricultural region in China and experiences a temperate monsoon climate. Many ground observations regarding this area are available, which is conducive to analyzing the inversion results.   29, 31, 32, and 33), and WVC data (obtained by bands 2, 5, 17, 18, and 19 retrieval) are used as the input parameters for the PS-DL. The inversion process of the surface temperature and emissivity has been described by  and Wang et al. (2021). In this study, the NSAT inversion results are presented. Figures 8 and 9 show that the overall distribution of the NSAT retrieved by the PS-DL method is similar to the distribution pertaining to the ERA5-Land products. In addition, Figures 10 and 11 show that the differences between the ERA5-Land products and NSAT retrieved by the PS-DL method range from −2 K to 2 K. As shown in Figure 8, the differences in the NSAT retrieval results and ERA5-Land products are significant in certain regions that are mainly agricultural production areas. These differences can be explained by the following aspects: Straw burning occurs after the autumn harvest in the North China Plain in October, and the LSE varies, which influences the NSAT retrieval results. The MAE and RMSE for Models 1 and 3 are 1.18 K and 1.52 K, respectively, and 1.61 K and 1.90 K, respectively. A comparison of the two inversion modes shows that replacing the band 33 information with water vapor information allows the DL to capture more information and improve the retrieval accuracy. As shown in Figures 10 and 11, the NSAT retrieved by Model 2 is the closest to the ERA5-Land products, with the MAE and RMSE being 0.89 K and 1.18 K, respectively. The NSAT retrieval accuracy of Model 3 is slightly worse (MAE and RMSE of 1.33 K and 1.65 K, respectively). The use of more band information helps increase the inversion accuracy. In practical applications, the appropriate combination must be selected according to the specific thermal infrared sensor.      To further evaluate the accuracy of the PS-DL method, the retrieved NSAT is crossvalidated against the China Meteorological Forcing Dataset (CMFD). The temporal resolution of this dataset is 3 h, and we evaluate the model accuracy by interpolating the CMFD products with the MODIS imaging time as the base. Figures 12-15 show the comparison of the inversion results. In the daytime and nighttime scenarios, Model 1 (MAE and RMSE of 1.46 K and 1.91 K, respectively) and Model 2 (MAE and RMSE of 1.00 K and 1.29 K, respectively) exhibit the highest retrieval accuracies, respectively. Most of the retrieval errors are concentrated between −2 K and 2 K. The cross-validation accuracy with the CMFD product is slightly lower than that of the ERA5-Land product, attributable to the temporal resolution (3 h) for CMFD products. In certain regions, the use of linear interpolation may lead to increased errors. The cross-validation results demonstrate the potential of the PS-DL method in retrieving the NSAT. The inversion error at the junction of water and land is large, attributable to the presence of mixed pixels. Additionally, the energy exchange at the junction of water and land is different from that in the other locations. Therefore, the training data for these regions must be supplemented.
Moreover, we verify the PS-DL performance against ground measured data. The ground verification data are based on ground meteorological observation points in a flat terrain with a single surface type. In situ data acquired in clear sky conditions are compared with the NSAT inversions from Models 1-3. As shown in Figure 16, the MAE and RMSE are, respectively, 1.44 K and 1.63 K for Model 1 and 1.21 K and 1.33 K for Model 2. In the case of Model 3, the inversion accuracy for the nighttime is higher than that for the daytime.   In the daytime and nighttime scena and RMSE of 1.46 K and 1.91 K, respectively) and Model 2 (MAE and 1.29 K, respectively) exhibit the highest retrieval accuracies, respectiv trieval errors are concentrated between −2 K and 2 K. The cross-valid the CMFD product is slightly lower than that of the ERA5-Land pro the temporal resolution (3 h) for CMFD products. In certain regions, terpolation may lead to increased errors. The cross-validation results tential of the PS-DL method in retrieving the NSAT. The inversion err water and land is large, attributable to the presence of mixed pixel energy exchange at the junction of water and land is different from th tions. Therefore, the training data for these regions must be suppleme        Moreover, we verify the PS-DL performance against ground measured data. The ground verification data are based on ground meteorological observation points in a flat terrain with a single surface type. In situ data acquired in clear sky conditions are compared with the NSAT inversions from Models 1-3. As shown in Figure 16, the MAE and RMSE are, respectively, 1.44 K and 1.63 K for Model 1 and 1.21 K and 1.33 K for Model 2.
In the case of Model 3, the inversion accuracy for the nighttime is higher than that for the daytime.

Discussion
The NSAT is affected by not only the surface radiation but also the airflow and other factors. Most of the existing methods consider the effect of only the surface temperature or atmospheric profile temperature, which limits their accuracy and portability. In this study, to accurately invert the NSAT, we establish the physical method by considering

Discussion
The NSAT is affected by not only the surface radiation but also the airflow and other factors. Most of the existing methods consider the effect of only the surface temperature or atmospheric profile temperature, which limits their accuracy and portability. In this study, to accurately invert the NSAT, we establish the physical method by considering not only the influence of the LST and LSE but also the relationship between the atmospheric profile temperature (average effective temperature of the atmosphere), satellite BT, and NSAT. Our statistical approach is based on the derivation of the physical method. The main difference is that the NSAT and satellite BT data obtained directly from multiple sources are used as a supplement to the physical method. DL is incorporated to combine physical and statistical methods and exploit the advantages of the three methods.
The inversion of the NSAT using the PS-DL approach yields satisfactory results when the LST and LSE are used as prior knowledge. To demonstrate the importance of using these parameters as prior knowledge, we analyze the inversion accuracy of the NSAT without using the LST and LSE. Five groups of comparative analyses are performed, and the results are shown in Table 7. In Test 3, in which there are only four bands of information, the MAE and RMSE are 1.61 K and 1.90, respectively. The accuracy is enhanced when band 33 is replaced with the water vapor information (Test 1), with the MAE and RMSE being 1.32 K and 1.65 K, respectively. In Test 2, more band information is added, and the accuracy is further enhanced, with the MAE and RMSE being 0.98 K and 1.21 K, respectively. This finding highlights that in given conditions, the accuracy can be increased by increasing the number of bands. A comparison of the values presented in Tables 6 and 7 indicates that the NSAT inversion accuracy can be increased by using LST and LSE as prior knowledge. The accuracy of Test 4 (bands 29, 31, 32, LST, LSE), with the MAE and RMSE being 1.11 K and 1.40 K, respectively, is lower than that of Model 1. This finding demonstrates the importance of using the water vapor information for NSAT retrieval. A comparison of Models 2 and 3 shows that increasing the band information can enhance the inversion accuracy. Notably, in practical applications, several satellites may not have adequate thermal bands to complete the inversion. Therefore, we designed Test 5 (bands 31, 32, LST, and LSE), which achieves satisfactory MAE and RMSE values of 1.24 K and 1.56 K, respectively. This analysis demonstrates that the optimal combination must be selected considering the sensor band settings to enhance the NSAT inversion accuracy.

Conclusions
Considering the advantages and disadvantages of traditional methods, we develop a novel fully coupled framework to robustly invert the NSAT. The proposed PS-DL framework inherits the advantages of physical, statistical, and DL methods. Through the iterative optimization of physical and statistical methods, DL effectively solves the ill-conditioned problem of NSAT inversion and enhances the NSAT inversion accuracy. In this framework, LST and LSE are retrieved first, and then LST and LSE are used as prior knowledge to further retrieve NSAT. This model will become a paradigm for retrieving near-surface air temperature from thermal infrared remote sensing, and the DL for NSAT inversion is not only physically meaningful but also interpretable, which is a milestone in the history of near-surface air temperature retrieval. Because it realizes the coordinated development, mutual promotion, and integration of different methods, the proposed framework demonstrates significant application potential in the inversion of geophysical parameters.
To enhance the accuracy and the portability of the algorithm, the LST and LSE are used as prior knowledge. The best band combinations for NSAT retrieval from MODIS data are bands 20,22,23,29,31, and 32 with LST and LSE because the influence of the solar radiation is eliminated (nighttime conditions), and the relationship between the ground and atmosphere gradually reaches a state of equilibrium. Consequently, in these conditions, the NSAT experiences low interference and is stable. The inversion accuracy for the combination of bands suitable for day inversion can be enhanced by adding the atmospheric WVC information. The inversion results of different models show that the PS-DL method exhibits a high inversion accuracy in all periods, which satisfies practical application requirements. To further enhance the inversion accuracy, the observation angles can be divided into different intervals to build training and test databases to retrain the DL-NN. The division can also be performed according to different seasons and regions.