Estimation of the All-Wave All-Sky Land Surface Daily Net Radiation at Mid-Low Latitudes from MODIS Data Based on ERA5 Constraints

: The surface all-wave net radiation ( R n ) plays an important role in the energy and water cycles, and most studies of R n estimations have been conducted using satellite data. As one of the most commonly used satellite data sets, Moderate Resolution Imaging Spectroradiometer (MODIS) data have not been widely used for radiation calculations at mid-low latitudes because of its very low revisit frequency. To improve the daily R n estimation at mid-low latitudes with MODIS data, four models, including three models built with random forest (RF) and different temporal expansion models and one model built with the look-up-table (LUT) method, are used based on comprehensive in situ radiation measurements collected from 340 globally distributed sites, MODIS top-of-atmosphere (TOA) data, and the ﬁfth generation of European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) data from 2000 to 2017. After validation against the in situ measurements, it was found that the RF model based on the constraint of the daily R n from ERA5 (an RF-based model with ERA5) performed the best among the four proposed models, with an overall validated root-mean-square error (RMSE) of 21.83 Wm − 2 , R 2 of 0.89, and a bias of 0.2 Wm − 2 . It also had the best accuracy compared to four existing products (Global LAnd Surface Satellite Data (GLASS), Clouds and the Earth’s Radiant Energy System Edition 4A (CERES4A), ERA5, and FLUXCOM_RS) across various land cover types and different elevation zones. Further analyses illustrated the effectiveness of the model by introducing the daily R n from ERA5 into a “black box” RF-based model for R n estimation at the daily scale, which is used as a physical constraint when the available satellite observations are too limited to provide sufﬁcient information (i.e., when the overpass time is less than twice per day) or the sky is overcast. Overall, the newly-proposed RF-based model with ERA5 in this study shows satisfactory performance and has strong potential to be used for long-term accurate daily R n global mapping at ﬁner spatial resolutions (e.g., 1 km) at mid-low latitudes.


Introduction
The land surface all-wave net radiation (R n ) represents the radiative energy balance at the land surface [1]; it drives the main biophysical processes, such as evapotranspiration and photosynthesis [2,3], and is one of the most essential physical parameters describing Remote Sens. 2022, 14, 33 2 of 27 interactions between the land and atmosphere. Moreover, R n is one of the most important inputs for various land surface process models and parameter calculations (e.g., evapotranspiration <ET>) [4]. Mathematically, R n is the sum of the net shortwave and net longwave radiation [1,5]: R n = R ns +R nl (1) where R ns , R nl , R ↓ s , R ↑ s , R ↓ l , and R ↑ l are the land surface net shortwave radiation (Wm −2 , where downward is defined as positive), net longwave radiation (Wm −2 ), downward shortwave radiation (Wm −2 ), upward shortwave radiation (Wm −2 ), downward longwave radiation (Wm −2 ), and upward longwave radiation (Wm −2 ), respectively, and α is the land surface broadband albedo.
R n can be obtained from direct measurements or from the four radiative components shown in Equation (2) measured at ground stations, which are usually taken as "ground truth" values [6,7]. However, because of the limited length of observing periods and limited spatial distributions due to expensive maintenance costs, applications of ground R n measurements are very limited. An alternative way to obtain R n is from various products, including reanalysis [8] and remotely sensed products [9]. Reanalysis products (e.g., ECMWF Interim Reanalysis from the European Centre for Medium-Range Weather Forecasts [8]) are derived by merging available observations with an atmospheric model [10], and despite their longer temporal spans, they are usually at coarse resolutions and are thought to be less accurate than products obtained from remotely sensed products [11], where the poorer accuracies are possibly caused by inaccurate cloud information [11][12][13]. Remotely sensed products are generated based on satellite data; however, most existing satellite products have coarse spatial resolutions (e.g., CERES4A [9]). Even the latest product with the finest spatial resolution, Global LAnd Surface Satellite Data (GLASS) [14,15], is still 0.05 • , and thus their applications are still limited, especially at regional scales. Therefore, more R n products with high accuracy and finer spatial resolutions are still required where satellites are the most appropriate data source.
Thus far, many methods have been successfully developed for retrieving surface radiation from satellite data [5,15,16], which either take high-level remotely sensed products (e.g., atmospheric, surface, and cloud parameters) or top of the atmosphere (TOA) satellite observations (e.g., surface reflectance/radiance) as input. By considering error propagation and accumulation, the method of estimating radiation directly from TOA satellite observations is more preferable [15,17]. These kinds of methods can be roughly divided into three classes: radiative transfer models (RTMs) [9,18], hybrid models, and empirical/statistical models. RTMs have clear physical mechanisms and have been used to generate popular radiative products such as CERES4A. However, RTMs are complicated and they have high requirements in terms of, both the quality and quantity of input.
A hybrid model is established by combining simulations from RTMs (i.e., MOD-TRAN5) with statistical regression methods, making them easy to implement in practical applications; they also have strong physical mechanisms. For example, Wang et al. [19] developed a hybrid model to estimate the clear-sky instantaneous R n (R n_ins ) directly from combined visible and shortwave infrared (VSWIR) and thermal infrared (TIR) remote sensing data. Their results demonstrated that the new model outperformed traditional component-based models in R n_ins estimation, with a root-mean-square error (RMSE) of 70.6 Wm −2 at seven sites observed by the Surface Radiation Budget Network (SURFRAD). However, a successful and robust hybrid model highly depends on the comprehensive training samples simulated from the RTM by pre-setting various combinations of surface and atmospheric conditions. Indeed, several scenarios are very difficult to simulate, such as cases with a very large solar zenith angle [19] or cases where cloud structure is taken into account [19]. To improve this, an empirical model was recently successfully developed at high latitudes by Chen et al. [15] in which comprehensive R n ground measurements were collected to replace simulated values of R n in the hybrid model in order to estimate the daily R n (R n_daily ) from MODIS TOA observations. Then, by considering the influence of the daytime length on R n_daily , the length ratio of a daytime (LRD)-based model was built with an artificial neural network (ANN). The model validation results indicated the effectiveness of this new model, which had a similar spatial distribution to that of the CERES4A product but with more details (1 km) and higher accuracy, with an overall RMSE of 23.66 Wm −2 and 15.04 Wm −2 when using instantaneous MODIS TOA observations at daytime and nighttime, respectively. However, this model was only built for high latitudes, and more efforts are needed to make it globally applicable. Compared to other satellites, MODIS, a polar-orbiting satellite with 36 bands between 0.405 and 14.385 µm, including 20 visible and shortwave infrared bands (at 250 m and 500 m) and 16 thermal infrared bands (at 1 km), has been preferred for radiation estimation in previous studies, such as for R n_ins [19], R n_daily [15], instantaneous R ns [17], daily R ns [19], etc. Hence, it is feasible to develop a similar empirical model also taking MODIS TOA data as input for R n_daily estimation at mid-low latitudes.
However, one of the most essential issues to be addressed is how to obtain R n_daily at mid-low latitudes from limited MODIS instantaneous observations. It is known that the revisit frequency of MODIS at high latitudes is~6 to 20 times per day [15], which is much higher than that at mid-low latitudes, which may only be observed 2-4 times per day for most regions and only once for some tropical regions, which means that the information for characterizing variations in R n throughout a single day at mid-low latitudes is very likely inadequate. To address this issue, most previous studies have developed a temporal expansion model, which can include the theoretical sinusoidal model [5,20], empirical models (i.e., quadratic polynomial regression method [21]), and the look-up-table (LUT) method [19]. However, most of these models have to be used with the assumption that the atmospheric and surface conditions remain unchanged throughout the day [19], which is not in accord with reality [22]. Another solution is to add some necessary information from other data sources when making the estimations, such as various parameters from remotely sensed or reanalysis products. For example, Zhang et al. [23] employed the aerosol optical depth (AOD) and cloud optical depth (COD) products from MODIS in a cost function to constrain their retrieving procedure of atmospheric parameters and obtain more accurate instantaneous R ↓ s estimations. Another approach is to simply apply the same parameter estimates from a physical-based model as constraints, such as using estimations from empirical models built with machine-learning methods, which are thought to be "black box" calculation processes lacking any physical basis and which can easily produce output with large uncertainties [24]. For instance, Xu et al. [21] applied the daily solar radiation from Modern-Era Retrospective Analysis for Research and Applications (MERRA) to correct the corresponding values, which were systematically underestimated when using MODIS's atmospheric products.
Hence, inspired by these studies, R n_daily estimates from the newly released long-term fifth generation of European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) reanalysis product [25] are introduced as a physical constraint in a newly proposed empirical model for R n_daily estimation at mid-low latitudes from MODIS TOA data in this study. ERA5 is among the newest generation of reanalysis products, providing high-quality reanalysis of global atmospheric, oceanic, and land-surface fields at hourly time steps with a spatial resolution of 25 km [26]. ERA5 was generated based on the Integrated Forecasting System (IFS), and benefits from a decade of developments in model physics, core dynamics, and data assimilation with the 12 hourly 4DVar using information from a substantial volume of improved observations [27]. Specifically, the radiation components in ERA5 were generated from RTMs which simulate the attenuation caused by various atmosphere and surface conditions [28]; they perform better and with a significantly lower bias than products from the previous versions, ERA-Interim and MERRA version2 (MERRA2), which were validated against ground-based measurements, especially for inland regions [27]. The major purpose of this study is to propose a new empirical model to estimate R n_daily at mid-low latitudes directly from MODIS TOA observations by introducing R n_daily from ERA5 as a constraint. For comparison, three other R n_daily estimation models, one without ERA5 R n_daily , one with a traditional temporal expansion model, and another based on the LUT method, are also established at mid-low latitudes. Afterwards, all four proposed models are evaluated with the ground measurements and compared with other products providing R n_daily .
The organization of the remainder of this paper is as follows. Section 2 introduces the data used in this study and their pre-processing. Section 3 presents the development of the R n_daily estimation models. The results of the model validation and comparison along with analysis are given in Section 4. Sections 5 and 6 provide discussion and conclusions, respectively.

Data and Pre-Processing
In this study, in situ radiation measurements collected from more than 300 stations at mid-low latitudes together with MODIS products and ERA5 reanalysis data were used for model development. For comparison, R n_daily from four newly released products, namely GLASS [14,15], FLUXCOM [29], CERES4A [9], and ERA5 [25], were used. More details are given below.

Ground Measurements
As stated above, in situ radiation measurements from 340 global distributed sites at mid-low latitudes (60 • S-60 • N) in thirteen measuring networks were collected for the period 2000-2017 (see Table 1). Some sites provided R n measurements directly, while most sites provided four radiative components (R ↓ s , R ↑ s , R ↓ l , and R ↑ l ). Since R ↓ s was needed for model evaluation, this study only used samples from the sites that could provide R n and R ↓ s simultaneously. According to the study of Jia et al. [11], the uncertainty of the measured R n in most networks at mid-low latitudes is relatively small and can be neglected.  Figure 1 shows the geographical distribution of all sites with various land cover types defined by the International Geosphere-Biosphere Programme (IGBP), and also presents the climate zones defined by the Köppen-Geiger climate classification [39] to which these sites belong. The elevations of the sites range from −7 m to 4698 m above sea level. Therefore, the comprehensive in situ measurements collected in this study, spanning areas within 60 • S to 60 • N, represent a variety of climatic conditions, geographic environments, and ecosystem conditions. More information can be found in [14]. Figure 1 shows the geographical distribution of all sites with various land cover types defined by the International Geosphere-Biosphere Programme (IGBP), and also presents the climate zones defined by the Köppen-Geiger climate classification [39] to which these sites belong. The elevations of the sites range from −7 m to 4698 m above sea level. Therefore, the comprehensive in situ measurements collected in this study, spanning areas within 60°S to 60°N, represent a variety of climatic conditions, geographic environments, and ecosystem conditions. More information can be found in [14].

Radiation Measurements
In order to ensure the quality of the in situ measurements, all the and ↓ measurements were manually checked and further selected, with only measurements labelled as high quality by their releasers being used. In this study, measurements with both instantaneous and daily scales as well as daily ↓ measurements are used. Because the observed frequencies are different for various sites, the scale of "instantaneous" is defined as half an hour, following the study of Wang et al. [19], which means that the average of all available from fifteen minutes before and after MODIS passes over the pixel of the corresponding site (whose sampling frequency is less than 30 min) is taken as the in situ _ value. The _ and ↓ values are calculated only if at least one observation is available in each half hour in one day, as performed in the study of Jiang et al. [14]. After matching with the MODIS observations and manually checking and removing any records missing band observations, 164,654 samples at instantaneous scales and 564,967 samples at daily scales were collected, which were then divided into training and validation datasets. To ensure similar distributions of the characteristics of the two datasets, all instantaneous and daily samples for each site were first sorted by their observing time, then a sample was taken from every five samples as validation and the remaining four were used as training samples; this helps to ensure the same physical range of for training and validation samples. This procedure was repeated sequentially for each site until all sites were processed. Therefore, 80% of the samples were used for modeling (131,563 instantaneous samples and 452,098 daily samples), and the other 20% for independent validation (33,091 instantaneous samples and 112,869 daily samples).

Radiation Measurements
In order to ensure the quality of the in situ measurements, all the R n and R ↓ s measurements were manually checked and further selected, with only measurements labelled as high quality by their releasers being used. In this study, R n measurements with both instantaneous and daily scales as well as daily R ↓ s measurements are used. Because the observed frequencies are different for various sites, the scale of "instantaneous" is defined as half an hour, following the study of Wang et al. [19], which means that the average of all available R n from fifteen minutes before and after MODIS passes over the pixel of the corresponding site (whose sampling frequency is less than 30 min) is taken as the in situ R n_ins value. The R n_daily and R ↓ s values are calculated only if at least one observation is available in each half hour in one day, as performed in the study of Jiang et al. [14]. After matching with the MODIS observations and manually checking and removing any records missing band observations, 164,654 samples at instantaneous scales and 564,967 samples at daily scales were collected, which were then divided into training and validation datasets. To ensure similar distributions of the R n characteristics of the two datasets, all instantaneous and daily samples for each site were first sorted by their observing time, then a sample was taken from every five samples as validation and the remaining four were used as training samples; this helps to ensure the same physical range of R n for training and validation samples. This procedure was repeated sequentially for each site until all sites were processed. Therefore, 80% of the samples were used for modeling (131,563 instantaneous samples and 452,098 daily samples), and the other 20% for independent validation (33,091 instantaneous samples and 112,869 daily samples).
Furthermore, considering the discrepancy in the spatial resolution of MODIS (1 km) and the ERA5 reanalysis data (25 km), the spatial homogeneity of all sites within a 25 × 25 km 2 region was examined from three aspects: variations in the land cover type, topography, and vegetation coverage; the results are shown in Figure 2. For land cover (Figure 2a), the proportion of land cover which is the same as the site within a 25 × 25 km 2 grid was identified according to the MODIS Land Cover Climate Modeling Grid (CMG) (MCD12C1) product, which has a resolution of 5 km (https://lpdaac.usgs.gov/products/ mcd12c1v006/, accessed on 30 July 2021), and calculated for each of the 340 sites; then, the proportions of all sites were divided into five ranges (0-20%, 20-40%, 40-60%, 60-80%, and 80-100%). It can be seen in Figure 2a that more than 65% of the samples have a consistency of land cover around the sites of more than 60%. Figure 2b shows the variations in elevation (represented by the standard deviation) within a 25 × 25 km 2 region of each site; the elevations were provided by the National Aeronautics and Space Administration (NASA) Shuttle Radar Topographic Mission (SRTM), with a spatial resolution of 90 m. Most sites (>75%) had a relatively flat surface, with an elevation standard deviation of less than 20 m. The vegetation coverage of each site is represented by the normalized difference vegetation index (NDVI), obtained from GLASS at a 5 km spatial resolution [40]. Similarly, the variation in NDVI within the 25 × 25 km 2 region of each site was calculated; the results are shown in Figure 2c. The vegetation coverage within this region is relatively uniform for more than 85% of the sites, which have an NDVI standard deviation less than 0.1. Overall, most sites in this study were homogeneous, with a 25 km spatial size.
Furthermore, considering the discrepancy in the spatial resolution of MODIS (1 k and the ERA5 reanalysis data (25 km), the spatial homogeneity of all sites within a 25 × km 2 region was examined from three aspects: variations in the land cover type, topog phy, and vegetation coverage; the results are shown in Figure 2. For land cover (Figu 2a), the proportion of land cover which is the same as the site within a 25 × 25 km 2 gr was identified according to the MODIS Land Cover Climate Modeling Grid (CM (MCD12C1) product, which has a resolution of 5 km (https://lpdaac.usgs.gov/pro ucts/mcd12c1v006/, accessed on 30 July 2021), and calculated for each of the 340 sites; the the proportions of all sites were divided into five ranges (0-20%, 20-40%, 40-60%, 60-80 and 80-100%). It can be seen in Figure 2a that more than 65% of the samples have a co sistency of land cover around the sites of more than 60%. Figure 2b shows the variatio in elevation (represented by the standard deviation) within a 25 × 25 km 2 region of ea site; the elevations were provided by the National Aeronautics and Space Administrati (NASA) Shuttle Radar Topographic Mission (SRTM), with a spatial resolution of 90 Most sites (>75%) had a relatively flat surface, with an elevation standard deviation of le than 20 m. The vegetation coverage of each site is represented by the normalized diff ence vegetation index (NDVI), obtained from GLASS at a 5 km spatial resolution [4 Similarly, the variation in NDVI within the 25 × 25 km 2 region of each site was calculate the results are shown in Figure 2c. The vegetation coverage within this region is relative uniform for more than 85% of the sites, which have an NDVI standard deviation less th 0.1. Overall, most sites in this study were homogeneous, with a 25 km spatial size.

Clearness Index Calculation
To evaluate model performance under various sky conditions, the daily clearness dex (CI), which is usually used for representing atmosphere transmittance [41], was a plied in this study. The daily CI is defined as the ratio of the daily ↓ to extraterrestr radiation ( ): CI =

Clearness Index Calculation
To evaluate model performance under various sky conditions, the daily clearness index (CI), which is usually used for representing atmosphere transmittance [41], was applied in this study. The daily CI is defined as the ratio of the daily R ↓ s to extraterrestrial radiation (R se ): In this study, the in situ daily R ↓ s measurements are used for the CI calculations, and R se (Wm −2 ) is calculated using Equation (5) from [42]: (w s sin(ϕ) sin(δ) + cos(δ) sin(w s )) where G sc is the solar constant (0.0820 MJm −2 ·min −1 ), d r is the inverse relative distance from the Earth to the Sun, ω s is the sunset hour angle (rad), ϕ is the latitude (rad), δ is the solar declination (rad), and doy is the day of the year. In this study, the daily CI at each site is calculated. Additionally, d r and the latitude (ϕ, • ) of each site are used for model development.

MODIS Products for Modeling
Theoretically, the two MODIS sensors on the Terra and Aqua satellites funded by NASA can provide approximately four overpasses per day for most places globally [5]; the four times are 10:30 am, 1:30 pm, 10:30 pm, and 1:30 am at local time. To provide a better illustration, the average overpass frequency of MODIS during the daytime for a randomly selected day (the 240th day of 2005) at a spatial resolution of 5 km was calculated and is shown in Figure 3; it should be noted that this is the average frequency in a 5 × 5 km grid. It can be seen that the overpass frequency during the daytime for most regions at mid-low latitudes is only 1-2 times a day, which is much less than at high latitudes. In this study, both the MOD and MYD series products provided by Terra and Aqua, respectively, were used, similar to the study of Wang et al. [19], including the TOA reflectance and radiance data from MOD/MYD021KM, the corresponding geolocation and view geometry information from MOD/MYD03, and the cloud mask from MOD/MYD35 from 2000 to 2017. Table 2 lists all of the MODIS products used. All MODIS products were extracted according to the locations of the 340 sites and converted from coordinated universal time (UTC) to local time (LT) to match the in situ measurements. In addition, the Z-score normalization (mean = 0, standard deviation = 3) method was applied to the MODIS TOA observations to exclude any abnormal values before modeling. In this study, the in situ daily ↓ measurements are used for the CI calculations, and (Wm −2 ) is calculated using Equation (5)  = arccos (− tan( ) tan ) where is the solar constant (0.0820 MJm −2 ·min −1 ), is the inverse relative distance from the Earth to the Sun, is the sunset hour angle (rad), φ is the latitude (rad), δ is the solar declination (rad), and is the day of the year. In this study, the daily CI at each site is calculated. Additionally, and the latitude (φ, °) of each site are used for model development.

MODIS Products for Modeling
Theoretically, the two MODIS sensors on the Terra and Aqua satellites funded by NASA can provide approximately four overpasses per day for most places globally [5]; the four times are 10:30 am, 1:30 pm, 10:30 pm, and 1:30 am at local time. To provide a better illustration, the average overpass frequency of MODIS during the daytime for a randomly selected day (the 240th day of 2005) at a spatial resolution of 5 km was calculated and is shown in Figure 3; it should be noted that this is the average frequency in a 5 × 5 km grid. It can be seen that the overpass frequency during the daytime for most regions at mid-low latitudes is only 1-2 times a day, which is much less than at high latitudes. In this study, both the MOD and MYD series products provided by Terra and Aqua, respectively, were used, similar to the study of Wang et al. [19], including the TOA reflectance and radiance data from MOD/MYD021KM, the corresponding geolocation and view geometry information from MOD/MYD03, and the cloud mask from MOD/MYD35 from 2000 to 2017. Table 2 lists all of the MODIS products used. All MODIS products were extracted according to the locations of the 340 sites and converted from coordinated universal time (UTC) to local time (LT) to match the in situ measurements. In addition, the Zscore normalization (mean = 0, standard deviation = 3) method was applied to the MODIS TOA observations to exclude any abnormal values before modeling.

GLASS
The GLASS R n_daily product is generated by combining the results from two algorithms. In regions at mid-low latitudes, R n_daily is mainly estimated based on the daily R ↓ s and other ancillary information (i.e., meteorological factors) under various underlying surface types and daytime lengths [14]. Jiang et al. [43] evaluated the GLASS daytime R n (R n_daytime ) product generated with a similar algorithm and concluded it had better performance compared with CERES. Specifically, GLASS had an RMSE of 51.24 Wm −2 for global sites validation, compared to 54.96 Wm −2 for CERES. For regions at high latitudes, R n is estimated based on the LRD model applied to MODIS TOA data [15]. After weighted fusion, a long-term seamless GLASS R n_daily product with a spatial resolution of 0.05 • from 2000 to 2018 was produced. Guo et al. [44] applied the GLASS R n_daily for ET estimation, which had better accuracy than R n_daily obtained from MERRA2. In this study, the GLASS R n_daily values were extracted according to the location and observation time of each validation sample for comparison. Specifically, we extracted the GLASS product values at the pixel with the center closest to the longitude and latitude of the sites, and corresponding to the observation time of the sites.

CERES4A
Because of its high accuracy and reliable quality, the CERES4A series of radiation products have been widely used as a reference in previous studies [6,45,46]. The surface radiative fluxes of CERES4A are generated using Fu-Liou's radiative transfer model in tandem with CERES's TOA measurements and high-quality meteorological data sets from different reanalysis projects [9,47,48]. CERES4A provides the all-sky R n_daily by adding the four radiative components from its Level 3 products at 1 • spatial resolution from 2000 to present. The CERES4A R n_daily values were extracted according to the location and observation time of each validation sample for comparison, with a specific extraction process consistent with GLASS

FLUXCOM_RS
FLUXCOM is a remote sensing product [29] consisting of two data sets, FLUXCOM_RS and FLUXCOM_RS+METEO, generated using ground measurements, remotely sensed data, and reanalysis data with a machine learning algorithm from 2001 to 2010 and 2001 to 2015, respectively. For FLUXCOM_RS, the fluxes are estimated exclusively from MODIS satellite data; thus, its spatial resolution is 0.0833 • and temporal resolution is eight days. For FLXUCOM_RS+METEO, the additional meteorological information from the reanalysis products is added as input; thus, its spatial resolution is 0.5 • and its temporal resolution is daily. According to Jung et al. [29], the spatial patterns of the R n_daily values from the two FLUXCOM datasets both agree well with the one from CERES4A, yielding an R 2 of 0.96, and they both show similar large-scale variations. However, the energy fluxes from FLUX-COM_RS perform better in heterogeneous regions than those from FLUXCOM_RS+METEO because of the former's higher spatial resolution. Two radiative components, R ns and R nl , at the eight-daily scale are provided by FLUXCOM_RS; hence, R n_daily from FLUXCOM_RS was calculated using R ns and R nl and then extracted according to the site locations and observation time for comparison with a specific extraction process consistent with GLASS.

ERA5 Reanalysis Products
ERA5 is the updated version of the previous generation ERA-interim [8], and is the fifth generation of European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis products. It has a higher time resolution of 1 h, a finer spatial resolution of 31 km (0.28125 • × 0.28125 • ), and a higher number of vertical levels of 137. The performance of various parameters from ERA5 is acknowledged and has been widely used [49,50]. The R n_daily values from ERA5 are calculated by adding the four radiative components, which have an hourly resolution, and aggregated into daily means. Similarly, the R n_daily from ERA5 were also extracted with an extraction process consistent with GLASS. Additional meteorological factors, including the 2 m air temperature (T, • C), surface pressure (SP, Pa), total cloud cover (CC, %), total column cloud liquid water (TCL, gm −2 ), relative humidity (RH, %), and total column water vapor (TCW, gm −2 ) were also extracted at each site at a spatial resolution of 25 km and used as ancillary information in this study. In addition, the CI of ERA5 (CI ERA5 ) and the ratio of R n_ins and R n_daily ) (defined according to the study of Carmona et al. [51]), which is used to express the linear statistical relationship between R n_ins and R n_daily in a certain region, were calculated and applied in this study.

Methodology
A flowchart of this study is shown in Figure 4. First, the optimal combinations of MODIS bands most appropriate for modeling were determined. Then, four empirical models for R n_daily estimation at mid-low latitudes were established, according to the statistical relationships between the selected MODIS TOA data and the in situ R n_daily and R n_ins values, using the random forest (RF) and LUT methods. Afterwards, the performances of the four new models were evaluated and further analyses conducted with the best one; the results from this model were then compared with other products and preliminarily used for mapping. then extracted according to the site locations and observation time for comparison with a specific extraction process consistent with GLASS.

ERA5 Reanalysis Products
ERA5 is the updated version of the previous generation ERA-interim [8], and is the fifth generation of European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis products. It has a higher time resolution of 1 h, a finer spatial resolution of 31 km (0.28125° × 0.28125°), and a higher number of vertical levels of 137. The performance of various parameters from ERA5 is acknowledged and has been widely used [49,50]. The _ values from ERA5 are calculated by adding the four radiative components, which have an hourly resolution, and aggregated into daily means. Similarly, the _ from ERA5 were also extracted with an extraction process consistent with GLASS. Additional meteorological factors, including the 2 m air temperature (T, °C), surface pressure (SP, Pa), total cloud cover (CC, %), total column cloud liquid water (TCL, gm −2 ), relative humidity (RH, %), and total column water vapor (TCW, gm −2 ) were also extracted at each site at a spatial resolution of 25 km and used as ancillary information in this study. In addition, the CI of ERA5 ( ) and the ratio of

Methodology
A flowchart of this study is shown in Figure 4. First, the optimal combinations of MODIS bands most appropriate for modeling were determined. Then, four empirical models for _ estimation at mid-low latitudes were established, according to the statistical relationships between the selected MODIS TOA data and the in situ _ and _ values, using the random forest (RF) and LUT methods. Afterwards, the performances of the four new models were evaluated and further analyses conducted with the best one; the results from this model were then compared with other products and preliminarily used for mapping.

R n_daily Estimation Model Development
Because of the abundant bands and different overpass times of the MODIS observations, the most appropriate combinations of the TOA bands and their overpass time should be determined first for modeling. To do this, correlations between the in situ R n_ins and R n_daily values under various conditions were analyzed. Afterwards, by taking the optimal combinations of MODIS TOA data as major input, three RF-based models, namely an RF-based instantaneous model (RF-based ins model), an RF-based model, and an RF-based model with ERA5, and a LUT-based model were built. Specifically, the RF-based ins model was established by regressing the MODIS TOA data to the in situ R n_ins values directly with RF, then the estimated R n_ins values were converted into R n_daily values using a temporal expansion method. The RF-based model was established by regressing the MODIS TOA data to the in situ R n_daily values directly with RF; the RF-based model with ERA5 was similar to the RF-based model but added the R n_daily of ERA5 as input. The LUT-based model contained a group of conditional models built by regressing the MODIS TOA data to the in situ R n_daily values under different cases defined by combining the various observation geometric information and sky conditions obtained from the MOD/MYD03 and MOD/MYD35 products. Table 3 shows the detailed information of the four models. Note that an observation from MODIS yields an estimated R n_daily values; thus, the final obtained R n_daily values are the average of all estimated R n_daily in a single day, the values of which are determined by the MODIS observation frequency. To compare the four models objectively, the intersection of all independent validation samples for the four models (No. of samples = 23,826) were used. In this study, all models were implemented on a PC with the Microsoft Windows 10 operating system, an Intel Core 3.20 GHz processor, and 32 GB of memory. According to previous studies [15], when LRD is greater than around 0.3, instantaneous daytime MODIS observations provide a better estimation of R n_daily at high latitudes; however, it is uncertain whether such a relationship holds at mid-low latitudes. Hence, the determination coefficient (R 2 , see Section 3.2 for specific computational definitions) between the in situ R n_ins and its corresponding R n_daily at each hour in each day as well as their variations with different factors including latitude, month, and the land cover type were examined using all collected sites measurements, not only those that successfully matched with MODIS; the results are shown in Figure 5. In Figure 5a, all sites and their correlation results are divided into six groups using 10 • latitude intervals from 0 • to 60 • N. Additionally, these correlation results are divided into twelve months periods and sixteen land cover types, as shown in Figure 5b,c. The results indicate that strong correlations (warm colors) between the in situ R n_ins and R n_daily values mainly appear from 10:00 to 15:00 (local time) for nearly all land cover types at mid-low latitudes throughout the year. Hence, the MODIS TOA observations during the daytime are used for R n_daily estimation at mid-low latitudes, with most samples between 10:00 and 14:00.
Meanwhile, according to previous studies, surface shortwave radiation is closely related to the MODIS TOA reflectance from bands 1-7 [17,19,52], while the longwave radiation is mostly represented by the MODIS TOA radiance from bands 27-36 [53][54][55]; water vapor, which has a significant impact on solar radiation, is related to band 19 [19], and bands 21, 24, and 25 are affected by both solar illumination and the Earth's infrared emission [19]. Thus, MODIS bands 1-5, 7, 19, 21, 24, 25, and 27-36 were selected for modeling in this study. Note that MODIS band 6 was not used due to its poor quality [56]. Afterwards, the _ estimation model was established using all training samples with the following mathematical expression: where is _ for the RF-based ins model and _ for the RF-based model, is the TOA reflectance from MODIS bands 1-5, 7, and 19, is the TOA Figure 5. The R 2 between the in situ R n_ins and R n_daily for each of the 24 h in one day at the mid-low latitudes in the Northern hemisphere for (a) latitudes from 0 to 60 • (10 • interval), (b) twelve months from January to December, and (c) sixteen land cover types.
Meanwhile, according to previous studies, surface shortwave radiation is closely related to the MODIS TOA reflectance from bands 1-7 [17,19,52], while the longwave radiation is mostly represented by the MODIS TOA radiance from bands 27-36 [53][54][55]; water vapor, which has a significant impact on solar radiation, is related to band 19 [19], and bands 21, 24, and 25 are affected by both solar illumination and the Earth's infrared emission [19]. Thus, MODIS bands 1-5, 7, 19, 21, 24, 25, and 27-36 were selected for modeling in this study. Note that MODIS band 6 was not used due to its poor quality [56].
Afterwards, the R n_daily estimation model was established using all training samples with the following mathematical expression: where R n is R n_ins for the RF-based ins model and R n_daily for the RF-based model, TOA re f is the TOA reflectance from MODIS bands 1-5, 7, and 19, TOA rad is the TOA radiance from MODIS bands 21, 24, 25, and 27-36, VAA, SAA, SZA, VZA, and the height are from MOD/MYD03 (Table 2), RAA is the relative azimuth angle calculated by the absolute values of VAA minus SAA, and d r is calculated according to Equation (6).

Modeling with Random Forest
RF, proposed by Breiman et al. [57], has been one of the most popular machine learning methods developed thus far. RF is based on the construction of a multitude of regression or classification trees. Each tree is grown using randomly selected features and samples extracted from the whole training set. The samples are then successively split to produce two branches until the terminal nodes of the tree return the priors provided by the user. Specifically, for classification, the results of RF turn out to be the voting results for all decision trees; for regression, the final RF values can be determined by averaging over the ensemble regression values. In addition, the under or over-fitting issues which appear very often in machine learning can be solved by tuning some hyperparameters in RF (i.e., the number of trees in the forest, the maximum depth of each tree, minimum split sample number, and minimum sample leaves) [50]. RF has become a popular technique in remote sensing-related studies such as parameter estimation [17,50,52], parameter prediction [58], classification [59], variable importance determination [22], etc. The performance of RF is determined by the setting of the hyperparameters, such as "n-estimators", "max depth", "max features", "min samples split", "min sample leaves", etc. By fixing other hyperparameters, a certain hyperparameter was traversed within a certain range in order to observe the corresponding change in accuracy of the RF model; four hyperparameters were found to be critical in this study, each varying within a certain range (Table 4). Hence, the optimal RF model was determined by the smallest RMSE value of validation accuracy using different combinations of these hyperparameters. In this study, the RF method was implemented in Python using the RandomForestRegressor module in the Scikit-learn toolbox, using a Microsoft Windows 10 system on the Intel Core 3.2 GHz PC with 32 GB memory [60]. 1.

RF-based ins model
It is more common to estimate the instantaneous surface radiation from instantaneous satellite observations [5,20,51,61]. Then, the daytime or daily value is usually calculated from a temporal expansion model, such as the sinusoidal curve proposed by Bisit et al. [5], as Equation (11) shows: where N is the empirical coefficient and t overpass , t rise , and t set are the MODIS overpass, sunrise, and sunset times, respectively. In order to determine the optimal N value, a group of experiments with an interval of 0.1 from 1 to 2 were compared, and it was determined that when N was defined as 1.7, the error of the estimated R n_daytime was the smallest. The sunrise and sunset times were calculated following the method of Doggett et al. [62]. Afterwards, the estimated R n_daytime from Equation (11) was converted into R n_daily with a statistical model (e.g., a linear or quadratic polynomial regression) built based on the ground measurements. Through experiments, the optimal combination of hyperparameters for the RF-based ins model was found to be 50 for "n-estimators", 9 for "max depth", 5 for "min samples split" and 8 for "min samples leaf". As mentioned above, the sinusoidal model was applied by assuming that atmospheric conditions are stable throughout each day. Thus, the model was developed based on the assumption that the surface radiation is primarily determined by SZA and then follows a sinusoidal curve throughout the daytime [63]. However, no significant correlation was found between the instantaneous and daily atmospheric conditions on the same day by Wang et al. [22], and cloudy skies are more likely to appear than clear skies, especially in the afternoon [20,64]. Thus, the performance of sinusoidal models will likely deteriorate with mixed cloudy or overcast skies.

RF-based model
To avoid error accumulation occurring during temporal expansion, the RF-based model was built by taking R n_daily as the output according to Equation (10), which means that one R n_daily estimate was calculated from the MODIS TOA observations, and the final R n_daily values in a single day are the average of all available estimates during that day. Through experiments, the optimal combination of hyperparameters for the RF-based model was determined to be 50 for "n-estimators", 10 for "max depth", 3 for "min samples split" and 7 for "min samples leaf".

3.
RF-based model with ERA5 To reduce the uncertainties of the RF-based model, especially when the available MODIS TOA observations are very limited (only 1-2 revisits each day) during the daytime, the surface R n_daily from ERA5 was introduced into the RF-based model as a physical constraint with the mathematic expression R n_daily = f TOA re f , TOA rad , SZA, VZA, RAA, height, d r , ϕ, R n_ERA5 (12) where R n_ERA5 is the R n_daily from ERA5. Through experiments, the optimal combination of hyperparameters for the RF-based model with ERA5 was 50 for "n-estimators", 12 for "max depth", 4 for "min samples split" and 8 for "min samples leaf".

Look-Up Table (LUT-Based) Model
For comparison, R n_daily was also estimated from MODIS TOA observations based on the typical LUT method using ground measurements instead of simulations from RTM. As was done in the study of Wang et al. [19], the viewing geometry (SZA, VZA, and RAA) and sky conditions (determined by the cloud mask) during each MODIS overpass time were taken as the key factors for case division and each of them divided into different bins, as shown in Table 5. Then, a group of conditional models determined by the combination of the above four factors was built by regressing the MODIS TOA data to the R n_daily as R n_daily = f TOA re f , TOA rad , height, d r , ϕ The cloud mask data obtained from MOD/MYD35 indicates the sky condition when the two MODIS sensors overpass, with "clear" defined as when the clear-sky confidence levels are "confident clear" or "probably clear" and "cloudy" defined as when the clear-sky confidence levels are "cloudy" or "uncertain clear" [23,52].
In this study, each conditional model was built with the multilayer perceptron (MLP) method [65], which is one of the basic building blocks of an ANN. MLP has been widely used in radiation estimation because of its superior performance in regression [66,67]. A three-layer network consisting of an input layer, a hidden layer with the rectified linear unit (ReLU) transfer function [68,69], and an output layer with a linear function was constructed for each conditional model. The most optimal model was determined by the minimum validated RMSE after setting with different numbers of neurons in the hidden layer.  Figure 6 presents the proportion of all samples classified by each bin of the four key factors (Table 5). It can be seen that most of the matched observations have clouds in Figure 6a, and the SZA of most samples is concentrated at 20 • to 70 • in Figure 6b.  Theoretically, 504 conditional models (7 SZA bins × 6 VZA bins × 6 RAA bins × 2 sky conditions) should be built; however, only 312 models were established (153 for clear-sky and 159 for cloudy-sky) in this work because a minimum of 80 training samples was required. Figure 6 presents the proportion of all samples classified by each bin of the four key factors (Table 5). It can be seen that most of the matched observations have clouds in Figure 6a, and the SZA of most samples is concentrated at 20° to 70° in Figure 6b.

Model Performance Evaluation
In this study, four statistical indices were used to evaluate model performance: the determination coefficient (R 2 ), RMSE, bias, and relative root mean square error (rRMSE), given respectively as: where is the estimated from the proposed model, is the corresponding measured , and represents all site measurements of .

Model Performance Evaluation
In this study, four statistical indices were used to evaluate model performance: the determination coefficient (R 2 ), RMSE, bias, and relative root mean square error (rRMSE), given respectively as: where e i is the estimated R n from the proposed model, o i is the corresponding measured R n , and X represents all site measurements of R n .

Proposed Model Performance Evaluation
Among the four proposed models, the RF-based ins model was the only one that cannot derive R n_daily directly, and instead yielding R n_ins . Figure 7 shows the training and validation accuracy of the RF-based ins model, which yields RMSEs of 78.06 and 81.74 Wm −2 , biases of −0.08 and −0.34 Wm −2 , and R 2 values of 0.84 and 0.83, respectively. Wang et al. [19] applied a hybrid model to estimate clear-sky R n_ins , and the direct validation accuracy of SURFARD yielded an RMSE of 70.6 Wm −2 (number of samples = 1522), while a similar validation was also conducted on the all-sky R n_ins estimates from the RF-based ins model and yielded a smaller RMSE of 67.7 Wm −2 (number of samples = 11,268). Therefore, the results here and in the previous study demonstrate the limitations of using simulated values. cannot derive _ directly, and instead yielding _ . Figure 7 shows the training and validation accuracy of the RF-based ins model, which yields RMSEs of 78.06 and 81.74 Wm −2 , biases of −0.08 and −0.34 Wm −2 , and R 2 values of 0.84 and 0.83, respectively. Wang et al. [19] applied a hybrid model to estimate clear-sky _ , and the direct validation accuracy of SURFARD yielded an RMSE of 70.6 Wm −2 (number of samples = 1522), while a similar validation was also conducted on the all-sky _ estimates from the RF-based ins model and yielded a smaller RMSE of 67.7 Wm −2 (number of samples = 11,268). Therefore, the results here and in the previous study demonstrate the limitations of using simulated values.
Next, the estimated _ values were aggregated into _ according to Equation (11). To obtain _ from _ , the relationship between the two parameters was examined first based on the in situ measurements, the results of which are given in Figure 7c. After multiple experiments, a linear regression model (Equation (18)) was built and then applied to convert For a better comparison, the _ estimates from the RF-based ins model were compared with those obtained from the other three models (i.e., the RF-based model, RFbased model with ERA5, and the LUT-based model). Figure 8 presents the overall results of the training and validation accuracies of the four models; the three plots in each row from top to bottom are for the RF-based ins model, RF-based model, RF-based model with ERA5, and LUT-based model, respectively. Specifically, the four plots in the left column (Figure 8a,d,g,j)) are the training results of the four models (note that Figure 8a shows the training results of Equation (18)). The four plots in the middle column (Figure 8b,e,h,k) show the independent validation accuracies of the four models with their corresponding independent validation samples (see Table 3). The four plots in the right column ( Figure  8c,f,i,l) present the validation accuracies of the four models against the intersection validation samples for intercomparison. Next, the estimated R n_ins values were aggregated into R n_daytime according to Equation (11). To obtain R n_daily from R n_daytime , the relationship between the two parameters was examined first based on the in situ measurements, the results of which are given in Figure 7c. After multiple experiments, a linear regression model (Equation (18)) was built and then applied to convert R n_daytime to R n_daily : For a better comparison, the R n_daily estimates from the RF-based ins model were compared with those obtained from the other three models (i.e., the RF-based model, RF-based model with ERA5, and the LUT-based model). Figure 8 presents the overall results of the training and validation accuracies of the four models; the three plots in each row from top to bottom are for the RF-based ins model, RF-based model, RF-based model with ERA5, and LUT-based model, respectively. Specifically, the four plots in the left column (Figure 8a,d,g,j)) are the training results of the four models (note that Figure 8a shows the training results of Equation (18)). The four plots in the middle column (Figure 8b,e,h,k) show the independent validation accuracies of the four models with their corresponding independent validation samples (see Table 3). The four plots in the right column (Figure 8c,f,i,l) present the validation accuracies of the four models against the intersection validation samples for intercomparison.
Generally speaking, the performance of the three models deriving R n_daily directly (i.e., the RF-based model, RF-based model with ERA5, and LUT-based model) was similar, and better than that of the RF-based ins model in terms of training, validation, and intercomparison validation results without overfitting. Specifically, the RF-based model with ERA5 performed the best among the four models, with both the independent and intercomparison validation accuracy (Figure 8h Relatively, the performance of the LUT-based model was only slightly worse than that of the RF-based model in terms of the independent and intercomparison validation accuracies, with R 2 values of 0.87 and 0.90, RMSEs of 23.71 and 20.28 Wm −2 , and biases of 0.09 and 0.56 Wm −2 (Figure 8k,l), respectively, which might have resulted from the insufficient number of samples collected under some conditions for modeling. The performance of the RF-based ins model was the worst among the four proposed models. It is speculated that most samples of the cloudy sky used in this study deteriorated the performance of the sinusoidal model used to convert R n_ins to R n_daytime , which further resulted in errors propagating from R n_daytime to R n_daily . Generally speaking, the performance of the three models deriving _ directly (i.e., the RF-based model, RF-based model with ERA5, and LUT-based model) was similar, Based on these results, it can be concluded that (1) the application of the RF-based model is likely limited by the limited number of overpass times of polar-orbiting satellites at mid-low latitudes, even though performance in the respect was acceptable in this study; (2) the RF-based model with ERA5 can effectively improve estimation accuracy based on the physical constraints imposed by R n_daily from ERA5; (3) the sinusoidal curve model performs undesirably when sky conditions are not clear; and (4) the LUT-based method might perform uncertainly in some cases even though it had comparable performance here. Because of its good performance, further analysis with the RF-based model with ERA5 was conducted, and is discussed in Section 4.2.

Further Analysis with RF-Based Model with ERA5
A more comprehensive evaluation was conducted with the RF-based model with ERA5 by comparing its R n_daily values with those obtained from other products (GLASS, CERES4A, ERA5 and FLUXCOM_RS) under various conditions.

Comparison with Other Products at the Site Scale
First, the spatial distribution of the independent validation accuracy in the R n_daily estimations from the RF-based model with ERA5 at all sites is shown in Figure 9. It can be seen that the uncertainties for the R n_daily estimations over the regions between 30 • and 60 • latitude (30 • N/S-60 • N/S) are smaller than those near the equator (30 • S-30 • N), with RMSE values mostly between 15 to 25 W·m −2 (Figure 9a). Meanwhile, the distribution of rRMSE (Figure 9b), which could eliminate the influence of the sample size on the RMSE, is generally uniform globally, with values mostly between 0.15 and 0.35. In addition, the distribution of the bias values (Figure 9c) is erratic, although mostly within ±10 W·m −2 , and no significant overestimation or underestimation can be found over all regions at mid-low latitudes. These results demonstrate the robustness of the RF-based model with ERA5, as shown by the relatively uniform distribution of random errors and systematic errors over the globe.
Meanwhile, the R n_daily values from the four products (GLASS, CERES4A, ERA5, and FLUXCOM_RS) were also validated against the ground measurements at mid-low latitudes. Because of the discrepancies in temporal resolution (8-day for FLUXCOM_RS) and span, the number of validation samples used for product intercomparison was 8380. Figure 10b-e present the validation results for each of the four products, and Figure 10a shows them for the RF-based model with ERA5. Based on Figure 10, the R n_daily estimates from the RF-based model with ERA5 (Figure 10a) have the best validation accuracies, with the smallest RMSE of 22.97 Wm −2 and the largest R 2 of 0.88, while those of GLASS (Figure 10b) are second best, with an RMSE of 26.04 Wm −2 . The performance of the other three products (CERES4A, ERA5, and FLUXCOM_RS) is slightly worse although they are close to each other, with RMSE values of~30 Wm −2 . Moreover, it is encouraging to see that the R n_daily values estimated by the RF-based model with ERA5 are more accurate than the ones from ERA5, as demonstrated by the former having a~7 Wm −2 smaller RMSE. In addition, overestimations of R n_daily by CERES4A were also found in this study, in accordance with the results of previous studies [6,11]. rRMSE (Figure 9b), which could eliminate the influence of the sample size on the RMSE, is generally uniform globally, with values mostly between 0.15 and 0.35. In addition, the distribution of the bias values (Figure 9c) is erratic, although mostly within ±10 W·m −2 , and no significant overestimation or underestimation can be found over all regions at midlow latitudes. These results demonstrate the robustness of the RF-based model with ERA5, as shown by the relatively uniform distribution of random errors and systematic errors over the globe.   Furthermore, the performances of the five _ estimates for nine land cover types (croplands <CRO>, grasslands <GRA>, deciduous broadleaf forests <DBF>, mixed forests <MF>, open shrublands <OSH>, evergreen broadleaf forests <EBF>, evergreen needleleaf forests <ENF>, woody savannas <WSA> and closed shrublands <CSH>) and six ranges of elevation (<200 m, 200-400 m, 400-600 m, 600-1000 m, 1000-1500 m, and >1500 m) were compared against their respective validation samples; the results represented by RMSEs are shown in Figure 11. The RF-based model with ERA5 performed the best across the nine land cover types and six elevation zones, approaching the smallest RMSE values; the GLASS product again performed second best. In particular, almost all products performed poorly for the EBF and DBF land cover classes (Figure 11a), in keeping with the results of previous studies [14,17]. However, the results from the newly proposed model are satisfactory and are significantly improved compared to that of ERA5, with the RMSE reduced by ~5 W·m −2 . It has been pointed out that surface elevation is one of the most crucial factors in surface radiation estimation because of its impact on controlling atmospheric mass [70]. The performance of the RF-based model with ERA5 appears to be stable and remains optimal at all elevation groups (Figure 11b), with the smallest RMSE; it is much better than even that of ERA5, especially at high elevation zones (>1500 m). Therefore, all the above results demonstrate the effectiveness of the newly proposed model (RFbased model with ERA5) in both robustness and in accuracy. Furthermore, the performances of the five R n_daily estimates for nine land cover types (croplands <CRO>, grasslands <GRA>, deciduous broadleaf forests <DBF>, mixed forests <MF>, open shrublands <OSH>, evergreen broadleaf forests <EBF>, evergreen needleleaf forests <ENF>, woody savannas <WSA> and closed shrublands <CSH>) and six ranges of elevation (<200 m, 200-400 m, 400-600 m, 600-1000 m, 1000-1500 m, and >1500 m) were compared against their respective validation samples; the results represented by RMSEs are shown in Figure 11. The RF-based model with ERA5 performed the best across the nine land cover types and six elevation zones, approaching the smallest RMSE values; the GLASS product again performed second best. In particular, almost all products performed poorly for the EBF and DBF land cover classes (Figure 11a), in keeping with the results of previous studies [14,17]. However, the results from the newly proposed model are satisfactory and are significantly improved compared to that of ERA5, with the RMSE reduced bỹ 5 W·m −2 . It has been pointed out that surface elevation is one of the most crucial factors in surface radiation estimation because of its impact on controlling atmospheric mass [70]. The performance of the RF-based model with ERA5 appears to be stable and remains optimal at all elevation groups (Figure 11b), with the smallest RMSE; it is much better than even that of ERA5, especially at high elevation zones (>1500 m). Therefore, all the above results demonstrate the effectiveness of the newly proposed model (RF-based model with ERA5) in both robustness and in accuracy.   Figure 12. The two R n_daily estimates both captured the variations in the in situ R n_daily (black dots) very well; however, those from the RF-based model with ERA5 (red line) were better than those of GLASS (blue line), with smaller RMSE values for all three sites. The smaller RMSEs may possibly be due to the R n_daily values from RF-based model with ERA5 being closer to the in situ measurements, especially at very high values. However, the two data sets both had a tendency to overestimate R n_daily , with low values at the three sites.
In summary, the estimated R n_daily from the newly proposed RF-based model with ERA5 has superior performance to the other four products in terms of direct validation accuracy, robustness, and temporal variation characterization. Figure 13a shows a further application of the RF-based model with ERA5 to real MODIS images and ERA5 R n_daily data to map R n_daily at mid-low latitudes on the 220th day of 2017. The spatial distribution in R n_daily on this day was roughly commensurate with the high values appearing in the Northern Hemisphere during summer and the low values in the Southern Hemisphere during winter. For further illustration, magnified images over a smaller region (24 • -36 • N, 75 • -93 • E) from the RF-based model with ERA5, GLASS, ERA5, and CERES4A are presented in Figure 13b-e. The result from the RF-based model with ERA5 is comparable to those of the others, and more similar to that of ERA5, although with more details thanks to its higher spatial resolution. It seems that the distribution of R n_daily from GLASS (Figure 13c) is unsmooth in this region, and has higher values compared to the RF-based model with ERA5 (Figure 13b). A significant difference in the four images appears in the northern part in the region (33 • -36 • N, 81 • -87 • E), where GLASS has lower values (blue color) than the other three, possibly caused by the presence of clouds. More verification should be conducted in the future. _ values from the RF-based model with ERA5 and GLASS at three SURFRAD sites (SF_DRA <36.63°N, 116.02°W, BSV>, SF_FPK <48.31°N, 105.1°W, GRA>, and SF_SXF <43.73°N, 96.62°W, OSH>) were taken as examples, as shown in Figure 12. The two _ estimates both captured the variations in the in situ _ (black dots) very well; however, those from the RF-based model with ERA5 (red line) were better than those of GLASS (blue line), with smaller RMSE values for all three sites. The smaller RMSEs may possibly be due to the _ values from RF-based model with ERA5 being closer to the in situ measurements, especially at very high values. However, the two data sets both had a tendency to overestimate _ , with low values at the three sites.  Overall, the newly proposed model (RF-based model with ERA5) has strong potential to be used for high-resolution global R n_daily mapping at mid-low latitudes owing to its stable performance, easily-collected input, and high production efficiency.  Figure 13b-e. The result from the RF-based model with ERA5 is comparable to those of the others, and more similar to that of ERA5, although with more details thanks to its higher spatial resolution. It seems that the distribution of _ from GLASS (Figure 13c) is unsmooth in this region, and has higher values compared to the RF-based model with ERA5 (Figure 13b). A significant difference in the four images appears in the northern part in the region (33°-36°N, 81°-87°E), where GLASS has lower values (blue color) than the other three, possibly caused by the presence of clouds. More verification should be conducted in the future. Overall, the newly proposed model (RF-based model with ERA5) has strong potential to be used for high-resolution global _ mapping at mid-low latitudes owing to its stable performance, easily-collected input, and high production efficiency.

Discussion of the RF-Based Model with ERA5
More discussion about the development of the RF-based model with ERA5 is provided in this section. Based on all the validation results after incorporating the R n_daily from ERA5 as input, (which could supplement situations when insufficient information from instantaneous satellite observations is lacking), the superior performance of the RF-based model was fully demonstrated. However, it is also common to take various parameters, especially meteorological variables from other data sources, as ancillary information in modeling, as employed in previous studies [14,71]. Hence, another four experiments were conducted with the RF-based model by introducing different combinations of climatic and atmospheric parameters from ERA5. Then, the estimated R n_daily values were validated and compared with the ones from the RF-based model and the RF-based model with ERA5. Explanations of the variables in Table 6 were introduced in Section 2.3, with most selected according to previous studies. The RF-based model is labelled "Original" for convenience, and detailed information about all the experiments and validation results is given in Table 6. Meanwhile, to evaluate the improved accuracy of the RF-based model with ERA5 compared to ERA5, we also evaluated the accuracy of ERA5 using the same samples in Table 6, generating an RMSE of 29.1 Wm −2 and a bias of −1.32 Wm −2 . The RF-based model with ERA5 (the last row in Table 6) performed the best among all the models, with the smallest RMSE of 21.83 Wm −2 . It is speculated that the radiative components simulated in ERA5 by the RTM (in which radiative attenuation is determined via complex atmospheric and surface parameters as well as various assimilated meteorological conditions) contains much more information than the single or combined parameters. Moreover, when taking the ERA5 R n_daily as a constraint in the RF-based model, the RF method can adaptively adjust the weights between the spectral information provided by MODIS and the features of the ERA5 R n_daily product at different overpass times to obtain the most optimal estimations by minimizing their differences to the ground measurements.
In addition, as the most important input, the influence of the MODIS TOA observations on the performance of the RF-based model with ERA5 was analyzed in terms of the overpass time, sensor, revisit frequency, and sky condition. Based on the independent validation samples, the accuracy of the estimated R n_daily from the RF-based model with ERA5 under various cases were compared. Figure 14 presents the performance of the RF-based model with ERA5 as a function of overpass time and latitude. The overpass time is divided into five bins from 9:00 am to 14:00 pm in one-hour intervals, and the latitude from 0 • -60 • N is also divided into six using 10 • intervals. Note that the R n_daily discussed here is the direct output of the new model. The validation rRMSEs are mainly smaller than 0. Moreover, there is only a slight difference between the results when using MODIS TOA data from only the Terra or Aqua satellite, as shown in Figure 15. It can be seen that the new model yields better performance when the MODIS TOA data in the morning are added (Figure 15a), which accords with the results of the study of Wang et al. [19], likely due to the influence of clouds [64]. However, the influence of the amount of available MODIS TOA data on the performance of the RF-based model with ERA5 is significant. Table 7 gives the validation accuracy of the final _ when using one to three MODIS TOA data points per day as input for the RF-based model and RF-based model with ERA5. For the RF-based model, its estimation accuracy gets better when the amount of MODIS TOA observations increases during the daytime, which again illustrates the importance of including information about variations in the daily condition for _ estimation. Meanwhile, for the RF-based model with ERA5, its estimation accuracy improves the most when only one MODIS TOA observation per day is used as input, where the RMSE value decreases from 24.17 Wm −2 to 22.41 Wm −2 relative to the RF-based model; however, the improvements are smaller with an increase in MODIS TOA data. Therefore, it is recommended to introduce the _ values from ERA5 into any RF-based model when the number of MODIS TOA observations is fewer than two per day.
At last, the daily sky condition represented by CI was used to examine the effects of  Moreover, there is only a slight difference between the results when using MODIS TOA data from only the Terra or Aqua satellite, as shown in Figure 15. It can be seen that the new model yields better performance when the MODIS TOA data in the morning are added (Figure 15a), which accords with the results of the study of Wang et al. [19], likely due to the influence of clouds [64]. Moreover, there is only a slight difference between the results when using MODIS TOA data from only the Terra or Aqua satellite, as shown in Figure 15. It can be seen that the new model yields better performance when the MODIS TOA data in the morning are added (Figure 15a), which accords with the results of the study of Wang et al. [19], likely due to the influence of clouds [64]. However, the influence of the amount of available MODIS TOA data on the performance of the RF-based model with ERA5 is significant. Table 7 gives the validation accuracy of the final _ when using one to three MODIS TOA data points per day as input for the RF-based model and RF-based model with ERA5. For the RF-based model, its estimation accuracy gets better when the amount of MODIS TOA observations increases during the daytime, which again illustrates the importance of including information about variations in the daily condition for _ estimation. Meanwhile, for the RF-based model with ERA5, its estimation accuracy improves the most when only one MODIS TOA observation per day is used as input, where the RMSE value decreases from 24.17 Wm −2 to 22.41 Wm −2 relative to the RF-based model; however, the improvements are smaller with an increase in MODIS TOA data. Therefore, it is recommended to introduce the _ values from ERA5 into any RF-based model when the number of MODIS TOA observations is fewer than two per day.
At last, the daily sky condition represented by CI was used to examine the effects of  However, the influence of the amount of available MODIS TOA data on the performance of the RF-based model with ERA5 is significant. Table 7 gives the validation accuracy of the final R n_daily when using one to three MODIS TOA data points per day as input for the RF-based model and RF-based model with ERA5. For the RF-based model, its estimation accuracy gets better when the amount of MODIS TOA observations increases during the daytime, which again illustrates the importance of including information about variations in the daily condition for R n_daily estimation. Meanwhile, for the RF-based model with ERA5, its estimation accuracy improves the most when only one MODIS TOA observation per day is used as input, where the RMSE value decreases from 24.17 Wm −2 to 22.41 Wm −2 relative to the RF-based model; however, the improvements are smaller with an increase in MODIS TOA data. Therefore, it is recommended to introduce the R n_daily values from ERA5 into any RF-based model when the number of MODIS TOA observations is fewer than two per day.
At last, the daily sky condition represented by CI was used to examine the effects of R n_daily from ERA5 in the newly-proposed model. The CI is divided into seven bins (<0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5, 0.5-0.6, 0.6-0.7, and >0.7), and the validation accuracy of the final R n_daily estimation in each bin is shown in Figure 16. The sky conditions are defined as overcast (CI < 0.3), clear (CI > 0.7), and cloudy (CI ∈ [0.3, 0.7]). Furthermore, the differences in the rRMSEs of the RF-based model with or without ERA5 as a function of the number of MODIS TOA observations used as input for the different CI bins are also examined in Figure 17. The two results demonstrate that the introduction of R n_daily from ERA5 into the RF-based model is more effective under overcast skies (CI < 0.3) and when only one MODIS TOA observation is available as input.    Overall, it is very effective to take the ERA5 as a physical constraint in the RFbased model for _ estimation at mid-low latitudes, and it is most effective when the available number of the MODIS TOA observations in a single day is fewer than two and the daily sky condition is mainly overcast.

Conclusions
is a key parameter representing the available radiation budget at the land surface. As it also regulates most biological and physical processes, an accurate estimation of is of great significance for various applications. To improve the estimation accuracy using a limited amount of polar-orbiting satellite data at mid-low latitudes, we proposed to estimate _ at mid-low latitudes directly from MODIS TOA data along with comprehensive in situ measurements collected from globally distributed 340 sites from 2000- the RF-based model is more effective under overcast skies (CI < 0.3) and when only one MODIS TOA observation is available as input.   Overall, it is very effective to take the ERA5 as a physical constraint in the RFbased model for _ estimation at mid-low latitudes, and it is most effective when the available number of the MODIS TOA observations in a single day is fewer than two and the daily sky condition is mainly overcast.

Conclusions
is a key parameter representing the available radiation budget at the land surface. As it also regulates most biological and physical processes, an accurate estimation of is of great significance for various applications. To improve the estimation accuracy using a limited amount of polar-orbiting satellite data at mid-low latitudes, we proposed to estimate _ at mid-low latitudes directly from MODIS TOA data along with comprehensive in situ measurements collected from globally distributed 340 sites from 2000- Overall, it is very effective to take the ERA5 R n as a physical constraint in the RFbased model for R n_daily estimation at mid-low latitudes, and it is most effective when the available number of the MODIS TOA observations in a single day is fewer than two and the daily sky condition is mainly overcast.

Conclusions
R n is a key parameter representing the available radiation budget at the land surface. As it also regulates most biological and physical processes, an accurate estimation of R n is of great significance for various applications. To improve the R n estimation accuracy using a limited amount of polar-orbiting satellite data at mid-low latitudes, we proposed to estimate R n_daily at mid-low latitudes directly from MODIS TOA data along with comprehensive in situ measurements collected from globally distributed 340 sites from 2000-2017, three RF-based models (named the RF-based ins model, RF-based model, and RF-based model with ERA5), and one LUT based model (named the LUT-based model),. Specifically, the RF-based ins model only calculated R n_ins , which was then converted into R n_daily using a temporal expansion model (i.e., the theoretical sinusoidal curve and linear regression model), while the RF-based model and LUT-based model delivered R n_daily directly. The RF-based model with ERA5 was developed by introducing R n_daily from ERA5 as a physical constraint into the RF-based model. After validation against ground measurements, the RF-based model with ERA5 performed the best among the four proposed models, yielding an overall validated R 2 of 0.89, RMSE of 21.83 Wm −2 , and bias of 0.2 Wm −2 .
Further evaluation was conducted on the R n_daily estimated from the RF-based model with ERA5 by comparing them with corresponding values from four existing products (GLASS, CERES4, ERA5, and FLUXCOM_RS) under various conditions. The results demonstrate the superior performance of the newly proposed model in terms of high accuracy (R 2 = 0.88, RMSE = 22.97 Wm −2 , and bias = 1.2 Wm −2 ) and robustness across various land cover types and elevation zones. Moreover, this newly proposed model has great potential for mapping long term global R n_daily at mid-low latitudes at finer resolution, which could provide more detail and be more useful for practical applications.
Based on this study, all the results illustrate the effectiveness of introducing a physical constraint (here, R n_daily from ERA5) into machine-learning RF-based models in order to reduce the uncertainty in R n_daily estimations when the available satellite observations are insufficient to characterize the variations in R n at the daily resolution, especially when the MODIS overpass frequency is less than twice a day or for overcast days. Compared to the traditional temporal expansion model, which is applied with the assumption that the atmospheric condition remains unchanged for an entire day, this proposed model is more reasonable as it does not make any such assumptions. Therefore, the proposed model has strong potential to be used for regional studies at a 1 km spatial resolution at mid-low latitudes.
However, several issues have not been fully considered with respect to this proposed RF-based model with ERA5, even though it achieved a satisfactory result at mid-low latitudes. For instance, spatially adjacent effects such as horizontal photon transport due to cloud heterogeneity and rapid changes of cloud on R n_ins [72], as well as the effect of topography, which has different influences on direct radiation, diffuse radiation, and irradiance obstructed and reflected by nearby terrain, were not considered. Efforts to address these issues are currently underway.