Estimation of Long-Term Surface Downward Longwave Radiation over the Global Land from 2000 to 2018

: It is of great importance for climate change studies to construct a worldwide, long-term surface downward longwave radiation ( L d , 4–100 µ m) dataset. Although a number of global L d datasets are available, their low accuracies and coarse spatial resolutions limit their applications. This study generated a daily L d dataset with a 5-km spatial resolution over the global land surface from 2000 to 2018 using atmospheric parameters, which include 2-m air temperature (Ta), relative humidity (RH) at 1000 hPa, total column water vapor (TCWV), surface downward shortwave radiation ( S d ), and elevation, based on the gradient boosting regression tree (GBRT) method. The generated L d dataset was evaluated using ground measurements collected from AmeriFlux, AsiaFlux, baseline surface radiation network (BSRN), surface radiation budget network (SURFRAD), and FLUXNET networks. The validation results showed that the root mean square error (RMSE), mean bias error (MBE), and correlation coefﬁcient (R) values of the generated daily L d dataset were 17.78 W m − 2 , 0.99 W m − 2 , and 0.96 ( p < 0.01). Comparisons with other global land surface radiation products indicated that the generated L d dataset performed better than the clouds and earth’s radiant energy system synoptic (CERES-SYN) edition 4.1 dataset and ERA5 reanalysis product at the selected sites. In addition, the analysis of the spatiotemporal characteristics for the generated L d dataset showed an increasing trend of 1.8 W m − 2 per decade ( p < 0.01) from 2003 to 2018, which was closely related to Ta and water vapor pressure. In general, the generated L d dataset has a higher spatial resolution and accuracy, which can contribute to perfect the existing radiation products.


Introduction
The surface downward longwave radiation (L d , 4-100 µm) is an indispensable component needed to study the Earth's surface radiation budget and energy balance [1]. Currently, there are four main ways of obtaining L d : ground measurement data, reanalysis retrieval methods, general circulation model (GCM) simulations and satellite products. However, L d is not always treated as a conventional observation as other common meteorological parameters are, such as air temperature (Ta), relative humidity (RH), etc. Moreover, its observation stations are sparsely distributed and even entirely absent in certain areas due to a high cost, a difficult calibration process, and a required quality control step [2][3][4][5]. In addition, there are uncertainties and biases in GCM simulations [6][7][8], reanalysis retrievals [9,10], and satellite products [11]. Therefore, establishing a more accurate long-term global L d dataset is not only useful for improving the knowledge of the surface radiation balance but is also helpful for perfecting the existing L d products. associated with their distribution and variability. The ready availability of S d data makes the L d estimation model more readily applicable under cloudy conditions.
In addition, S d and L d both show a strong dependence on altitude. Zeng et al. [38] evaluated the global land surface satellite (GLASS) L d product using the ground observations collected from 141 stations in six networks at different surface elevations. The RMSE values are 22.09, 23.31, 26.94, and 26.99 W m −2 at elevations of <500, 500-1000, 1000-3000, and >3000 m, respectively. The bias values are −3.19, −4.73, −2.26, and 15.34 W m −2 at the four elevation intervals, respectively. The validation results showed that the performance of L d degraded as the surface elevation increased. This may be due to special environmental conditions present at high altitudes with lower air pressures, smaller water vapor densities, and fewer clouds, leading to a greater uncertainty in S d and L d data at high elevations [37,[39][40][41][42][43]. In addition, some studies have also quantitatively measured the effect of elevation on L d and attempted to correct its deviation [37,[39][40][41][42]. Yang et al. [42] reported that the MBE of GEWEX-SRB V2.5 L d can be reduced by 7-10 W m −2 after an altitudinal correction of 2.8 W m −2 per hundred meters in the Tibet Plateau. It can be concluded that the influence of elevation cannot be ignored in addition to the abovementioned influencing factors including temperature profiles, water vapor, and clouds. Although the importance of elevation has been verified by previous studies, few studies have taken elevation as an important variable to predict L d . This paper used elevation as the input variable of the model, hoping to reduce the errors caused by elevation.
Based on the above summary, it is clear that L d is closely related to Ta, RH, water vapor, S d , and elevation. Therefore, this study utilized the gradient boosting regression tree (GBRT) method with the daily mean Ta of 2 m, RH at 1000 hPa, total column water vapor, S d , and elevation to estimate daily L d over global land surface from 2000 to 2018. In contrast to prior methods, this machine learning method can automatically establish the relationship between the input data and target variable, and has a strong predictive ability [44,45], which has been widely employed to retrieve radiation [46][47][48][49]. Yang et al. [46] applied the GBRT method to estimate daily S d with a spatial resolution of 5 km in China using ground observations and satellite retrievals with good results. The RMSE and R between the ground measurements and daily L d estimates were 27.71 W m −2 and 0.91, respectively, under cloudy conditions; these values were 42.97 W m −2 and 0.80, respectively, under clear conditions. To date, few studies have used this method to predict L d over the globe based on ground observations. We demonstrated that it can be reasonably and reliably used for L d estimation by building the relationship between L d observations and its influencing factors based on the GBRT method [49,50]. Therefore, the objective of this study is to use the GBRT model to generate a 5-km L d dataset over the global land surface with a daily time scale from 2000 to 2018.
The structure of this paper is as follows: Section 2 introduces the data used, including the ground measurements, ERA5 reanalysis data, GLASS S d , global multi-resolution terrain elevation data 2010, and existing L d products. The detailed model construction process is displayed and described in Section 3. Section 4 provides the evaluation results and analyzes the spatiotemporal distribution of L d . Finally, the discussion and conclusion are presented in Sections 5 and 6, respectively.

Ground Measurements
The ground measurements of surface downward longwave radiation (L d ) used in this study from 2000 to 2018 were collected from the AmeriFlux network (175 sites), AsiaFlux network (26 sites), baseline surface radiation network (BSRN, 57 sites), surface radiation budget network (SURFRAD, 7 sites), and FLUXNET (84 sites). The observation sites were randomly divided into 90% (314 sites) and 10% (35 sites) datasets, as shown in Figure 1. After removing the outliers, the L d observations collected at 314 sites were used as target variables to build and train the model. The remaining L d observations collected at 35 sites were used to evaluate the generated global land daily L d . The spatial distribution of the Remote Sens. 2021, 13, 1848 4 of 30 observation sites used to build the model and validate it is shown in Figure 1. The detailed information of ground sites is listed in the Appendix A Table A1. variables to build and train the model. The remaining Ld observations collected at 35 sites were used to evaluate the generated global land daily Ld. The spatial distribution of the observation sites used to build the model and validate it is shown in Figure 1. The detailed information of ground sites is listed in the Appendix A Table A1.
Critical quality control procedures were implemented to calculate the daily Ld because the selected networks only provided instantaneous Ld values, except for FLUXNET. The daily mean Ld was integrated from the instantaneous values if the portion of missing instantaneous values was less than 20% in one day. The monthly mean values used for validation were obtained by averaging the effective daily values if the missing daily data reached less than 10 days in one month.  [51,52] is a joint regional network that provides continuous measurements of various ecological parameters at five temporal resolutions, including carbon dioxide, water, meteorological data, and radiation data. The FLUXNET2015 dataset contains 1532 site-years of data from 1996 to 2014, of which daily Ld observations are used to build and evaluate Ld estimates over global land surface in this study. The AmeriFlux network [53][54][55] includes 151 sites with more than 100 active sites as of 2012, providing half-hourly or hourly Ld data spanning from 1996 to present. Flux tower sites of the AsiaFlux network [56,57] are spread across various representative climate zones (from humid to arid climates) and land cover types (forest, grass, cropland, and urban area), of which Ld observations have half-hourly or hourly temporal resolutions from 1998 to 2018.
To reduce systematic measurement errors, the data QA/QC checks proposed by Pastorello et al. [58], including single-variable, multi-variable, and specialized checks, are implemented at each site within the three networks. Single-variable checks are aimed at exploring the consistency of one variable in the long and short time series trends. Multivariable checks focus on the relationship among correlation variables to ascertain discrepant periods. Specialized checks look at common issues in eddy covariance (EC) and meteorological data, such as timestamp shifts or sensor deterioration patterns. The last step for data QA/QC is automatic checks that use specific variable de-spiking routines adapted from Papale et al. [59] to set a range for each variable. Critical quality control procedures were implemented to calculate the daily L d because the selected networks only provided instantaneous L d values, except for FLUXNET. The daily mean L d was integrated from the instantaneous values if the portion of missing instantaneous values was less than 20% in one day. The monthly mean values used for validation were obtained by averaging the effective daily values if the missing daily data reached less than 10 days in one month.
2.1.1. AmeriFlux, AsiaFlux, and FLUXNET Data FLUXNET [51,52] is a joint regional network that provides continuous measurements of various ecological parameters at five temporal resolutions, including carbon dioxide, water, meteorological data, and radiation data. The FLUXNET2015 dataset contains 1532 site-years of data from 1996 to 2014, of which daily L d observations are used to build and evaluate L d estimates over global land surface in this study. The AmeriFlux network [53][54][55] includes 151 sites with more than 100 active sites as of 2012, providing half-hourly or hourly L d data spanning from 1996 to present. Flux tower sites of the Asi-aFlux network [56,57] are spread across various representative climate zones (from humid to arid climates) and land cover types (forest, grass, cropland, and urban area), of which L d observations have half-hourly or hourly temporal resolutions from 1998 to 2018.
To reduce systematic measurement errors, the data QA/QC checks proposed by Pastorello et al. [58], including single-variable, multi-variable, and specialized checks, are implemented at each site within the three networks. Single-variable checks are aimed at exploring the consistency of one variable in the long and short time series trends. Multi-variable checks focus on the relationship among correlation variables to ascertain discrepant periods. Specialized checks look at common issues in eddy covariance (EC) and meteorological data, such as timestamp shifts or sensor deterioration patterns. The last step for data QA/QC is automatic checks that use specific variable de-spiking routines adapted from Papale et al. [59] to set a range for each variable. The baseline surface radiation network (BSRN) was initiated by the world climate research program (WCPR) and aimed to provide accurate observations for validation of satellite radiometry and climate models [60]. The BSRN project has established more than 60 stations globally since January 1992 spanning latitudes ranging from 80 • N to 90 • S, providing continuous meteorological and radiation data on a minute time scale. By improving its calibration process, the difference between L d observations from different pyrgeometers only reached 10 W m −2 in 1995 [61]. Only 6.5% of the L d data are missing, which indicates that the pyrgeometers within the BSRN maintain high standards [62]. Moreover, the missing data have less influence on L d because L d has a small diurnal cycle. Overall, the BSRN L d observations are relatively accurate and reliable.

SURFRAD Data
The surface radiation budget network (SURFRAD) has provided meteorological and radiation data used for evaluating satellite products and researching climate changes in the United States since 1995. Currently, it is composed of seven stations representing diverse climates with elevations ranging from 98 to 1689 m. It provides long-term and continuous surface radiation measurements with 3 min and 1 min time intervals before and after 2009, respectively. The L d measured by SURFRAD, with an uncertainty of ±9 W m −2 , covers a wavelength spanning from 4 to 50 µm [63]. The time period of L d measurements used ranges from 2000 to 2018 in this study.

ERA5 Reanalysis Dataset
ERA5 [64], produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), is the fifth generation reanalysis dataset and a successor of ERA-Interim. It provides complete and consistent hourly temperature, relative humidity, and radiation datasets, in addition to many other atmospheric parameter datasets, with a 25-km spatial resolution from 1979 to near real time. Compared with ERA-Interim [65], ERA5 applied the updated integrated forecast system (IFS) "Cy41r2" 4D-var and produced many new parameters, such as a 100-m wind vector [66]. Many studies have also compared the accuracy of ERA5 and used it to analyze climate change. For example, Wang et al. [66] found that the warm bias of ERA5 2-m air temperature (Ta) is smaller in the warm season and larger in the cold season in relation to the buoy observations over Arctic sea ice. Zhen et al. [67] indicated that the mean relative humidity (RH) of ERA5 displayed a sharp decreasing jump for China during the early 2000s. In this study, the parameters of the ERA5 hourly reanalysis dataset, including the 2-m Ta ( • C), the RH at 1000 hPa (%), and the total column water vapor (TCWV, kg m −2 ) from 2000 to 2018, were consolidated into a daily temporal resolution as input data to construct global land L d (W m −2 ) dataset based on the GBRT method.

GLASS Surface Downward Shortwave Radiation Product
The global land surface satellite (GLASS) daily surface downward shortwave radiation (S d , W m −2 ) product with a 5-km spatial resolution from 2000 to 2018 was produced from the moderate resolution imaging spectroradiometer top-of-atmosphere (TOA) spectral reflectance on the basis of a direct estimation method [68,69]. First, the TOA reflectance was retrieved using atmospheric radiation transfer simulations under different solar or view geometries. Then, surface shortwave net radiation (S n ) was estimated from the TOA reflectance on the basis of a linear regression relationship between them under different atmospheric conditions and surface properties. Finally, the GLASS daily S d was produced using daily S n estimates and surface broadband albedo values. The GLASS daily S d values obtained an overall RMSE and bias of 32.84 and 3.72 W m −2 , respectively, compared to the ground observations at 525 sites from 2003 to 2005 [68].

Global Multi-Resolution Terrain Elevation Data 2010
The 2010 Global Multi-resolution Terrain Elevation dataset (GMTED2010DEM) [70] is a global continent-wide elevation dataset generated by the U.S. Geological Survey (USGS) and the National Geospatial-Intelligence Agency (NGA). This product contains three spatial resolutions (approximately 250, 500, and 1000 m) aimed at providing generic products for different applications. Carabajal et al. [71] indicated that the GMTED2010DEM products exhibited a great improvement relative to previous elevation data at comparable resolutions. Compared to the global set of the ice, cloud, and land elevation satellite (ICESat) geodetic ground control points, it obtained a positive bias of approximately 3 m. In this study, GMTED2010DEM data with a spatial resolution of approximately 250 m were resampled to a 5-km resolution as input data for estimating L d to match the generated L d dataset.

Exiting Surface Downward Longwave Radiation Datasets
The L d products used for validation and comparison with the generated L d dataset contain the clouds and earth's radiant energy system synoptic (CERES-SYN) edition 4.1 and ERA5 reanalysis datasets. The CERES-SYN product with a 100-km spatial resolution, generated on the basis of the Langley Fu-Liou radiation transfer model [72], provides flux estimates at the TOA and surface, as well as four atmospheric pressure levels (70,200, 500, and 850 hPa) from 2000 to 2020. Compared with CERES-SYN Edition 3A, the L d of Edition 4A has been improved due to the improvement of nighttime retrieved cloud properties [73,74]. The ERA5 hourly L d with a 25-km spatial resolution from 1979 to near real time used the more complicated method proposed by Morcrette [9] to replace the old L d parametrization [75]. Silber et al. [10] demonstrated that ERA5 underestimated L d compared with the ground measurements collected from the ARM West Antarctic radiation experiment (AWARE) campaign at McMurdo Station and the West Antarctic Ice Sheet (WAIS) divide. In this study, the daily ERA5 L d dataset consolidated from the hourly dataset and the CERES-SYN product were compared and used to evaluate the generated L d dataset from 2000 to 2018.

Gradient Boosting Regression Tree
The gradient boosting regression tree (GBRT) is an ensemble approach that enhances the accuracy of the model by aggregating multiple weak forms of regression and decision trees first proposed by Friedman [76]. The GBRT method is capable of predicting and solving overfitting problems [77]. The core idea of this model is to select the appropriate decision tree function based on the current model and fitting function in order to minimize the loss function. The model produces a strong predictive model by constructing an M amount of different weak classifiers through multiple iterations in order to obtain an accurate prediction rule. Each iteration is to improve the previous results by reducing the residuals of the previous model and establish a new combined model in the gradient direction of the reduced residual [46].
is the training dataset, where x represents the predictor variables, y represents the target variable, and N is the number of the training dataset. The GBRT model constructs M different individual decision trees, expressed as {h(x, α i )} M i=1 , which can be used to calculate the approximation function of the target variable f (x) as follows: where β m and α m are the weight and classifier parameter of each decision tree, respectively. A loss function L(y, f (x)) is introduced to describe the accuracy of the model. Each tree partitions the input space into J regions R 1m , R 2m , · · · , R jm and each R jm corresponds to Remote Sens. 2021, 13, 1848 7 of 30 a predicted value γ jm . The general process of the GBRT method is shown in Appendix A, Algorithm A1. More details about the GBRT method can be found in Hastie et al. [78] and Ridgeway [79].
The accuracy of the GBRT model which is implemented in the scikit-learn toolbox is mainly affected by its n-estimator, learning rate, max-depth, and subsample parameters. The n-estimator parameter is the maximum number of iterations completed by a weak learner. Larger n-estimators are more likely to lead to overfitting due to a poorer prediction ability with an increasing model complexity. The learning rate parameter is the weight reduction factor of each weak learner, which is usually used together with the n-estimator parameter to determine the fitting effect of the algorithm. The max-depth parameter is the maximum depth of each regression tree, which limits the number of nodes in the tree. The subsample parameter is the proportion of samples used for fitting the base decision tree. Selecting a subsample less than 1 can reduce overfitting but increase the deviation of sample fitting. In this study, the root mean square error (RMSE), mean bias error (MBE), and correlation coefficient (R) between the L d observations and estimates are used to evaluate the accuracy of the model.

Model Construction
The daily 2-m air temperature (Ta), relative humidity (RH) at 1000 hPa, total column water vapor (TCWV), surface downward shortwave radiation (S d ), and elevation datasets are selected as predictor variables to estimate the daily surface downward longwave radiation (L d ). The target variable is the daily L d observations collected at AmeriFlux, AsiaFlux, BSRN, FLUXNET, and SURFRAD from 2000 to 2018. First, the predictor variables were extracted from global datasets corresponding to the ground stations. Then, the dataset of 314 sites was divided into two portions at random: 80% for the training dataset and the remaining 20% for the test dataset. To select the optimal model, 5-fold cross-validation was applied during the training process. The main steps are as follows: (1) Calculating daily L d observations. The daily mean L d was integrated from the instantaneous values if the missing instantaneous values were less than 20% in one day because the AmeriFlux, AsiaFlux, BSRN, and SURFRAD networks only provide instantaneous L d values; (2) Data preprocessing. After resampling to a 5-km resolution, the ERA5 Ta, ERA5 RH, ERA5 TCWV, GLASS S d , and GMTED2010DEM elevation datasets were extracted according to the latitude, longitude, and time corresponding to the ground stations; (3) Training the GBRT model. By circulating within the range of each parameter displayed in Table 1, the GBRT model where the n-estimator parameter is set to 50, the learning rate is set to 0.1, the max-depth is set to 6, and the subsample parameter of 0.8 was selected as the optimal model to estimate global land L d , achieving the lowest RMSE and MBE values on the test dataset; (4) Implementing the model. The global land L d was produced on the basis of the trained model using the daily ERA5 Ta, ERA5 RH, ERA5 TCWV, GLASS S d , and GMTED2010DEM elevation datasets; (5) Evaluation of the generated global land L d dataset. Daily L d values collected at 35 observation sites were used to validate the generated global land L d dataset and compare it with the existing L d datasets. The main flowchart in this study is shown in Figure 2.  In order to investigate the impact of the predictor variables used in the GBRT model on the Ld estimation, the feature importance measures provided by the GBRT method was conducted. As shown in Table 2, the importance of the predictor variables of the GBRT model was in the order of the total column water vapor (TCWV), 2-m air temperature (Ta), relative humidity at 1000hPa (RH), surface downward shortwave radiation (Sd), and elevation. The Ld estimates are shown to be more sensitive to the TCWV and Ta than to most of other variables, thus highlighting the importance of taking TCWV and Ta as inputs.  In order to investigate the impact of the predictor variables used in the GBRT model on the L d estimation, the feature importance measures provided by the GBRT method was conducted. As shown in Table 2, the importance of the predictor variables of the GBRT model was in the order of the total column water vapor (TCWV), 2-m air temperature (Ta), relative humidity at 1000hPa (RH), surface downward shortwave radiation (S d ), and elevation. The L d estimates are shown to be more sensitive to the TCWV and Ta than to most of other variables, thus highlighting the importance of taking TCWV and Ta as inputs. Table 2. Importance rankings of all predictor variables for L d estimation.

Performance of the Model
After confirming the optimal parameters, 80% and 20% of the extracted dataset collected at 314 stations were used as the training and test datasets, respectively, to train the GBRT model and evaluate the L d estimates. Figure 3 displays the evaluation results of daily L d estimates for the training and test datasets against the ground measurements collected at the AmeriFlux, AsiaFlux, BSRN, FLUXNET, and SURFRAD networks from 2000 to 2018. For the training dataset, the root mean square error (RMSE), mean bias error (MBE), and correlation coefficient (R) are 16.73 W m −2 , 0 W m −2 , and 0.96 (p < 0.01), respectively, between the ground observations and L d estimates on the basis of the GBRT model from 2000 to 2018. Those values are 16.75 W m −2 , 0.05 W m −2 , and 0.96 (p < 0.01) for the test dataset, respectively, which shows a tendency to slightly overestimate L d . As a whole, the performance of the GBRT model on the test dataset is satisfactory and reliable with an MBE close to zero.
to 2018. For the training dataset, the root mean square error (RMSE), mean bias error (MBE), and correlation coefficient (R) are 16.73 W m −2 , 0 W m −2 , and 0.96 (p < 0.01), respectively, between the ground observations and Ld estimates on the basis of the GBRT model from 2000 to 2018. Those values are 16.75 W m −2 , 0.05 W m −2 , and 0.96 (p < 0.01) for the test dataset, respectively, which shows a tendency to slightly overestimate Ld. As a whole, the performance of the GBRT model on the test dataset is satisfactory and reliable with an MBE close to zero.

Comparison with Existing Ld Products
To better evaluate the accuracy of the generated Ld dataset, the valuation result against the 35 sites from 2000 to 2018 was compared with the CERES-SYN and ERA5 products. The generated Ld and ERA5 products were resampled to a 100-km resolution using the nearest neighbor interpolation method to match the CERES-SYN product. As shown in Figure 6, the RMSE and MBE are 17.94 and 0.25 W m −2 , 18.81 and 1.76 W m −2 , 18.52 and −2.09 W m −2 , respectively, for the daily generated, CERES-SYN, and ERA5 Ld datasets. CERES-SYN and ERA5 Ld show overestimated and underestimated tends on the daily time scale, respectively. Relatively speaking, the overestimated trend of the generated Ld dataset with an MBE of 0.25 W m −2 is slight. In addition, the RMSE of the ERA5 daily Ld dataset is less than that of the CERES-SYN product, and it can be concluded that the ERA5 Ld product over land is more accurate than that of the CERES-SYN on the daily time scale. This is consistent with the conclusion of Tang et al. [11] that the ERA5 Ld product over land surface has a higher accuracy on average than the CERES-SYN on the hourly, daily,

Comparison with Existing L d Products
To better evaluate the accuracy of the generated L d dataset, the valuation result against the 35 sites from 2000 to 2018 was compared with the CERES-SYN and ERA5 products. The generated L d and ERA5 products were resampled to a 100-km resolution using the nearest neighbor interpolation method to match the CERES-SYN product. As shown in Figure 6, the RMSE and MBE are 17.94 and 0.25 W m −2 , 18.81 and 1.76 W m −2 , 18.52 and −2.09 W m −2 , respectively, for the daily generated, CERES-SYN, and ERA5 L d datasets. CERES-SYN and ERA5 L d show overestimated and underestimated tends on the daily time scale, respectively. Relatively speaking, the overestimated trend of the generated L d dataset with an MBE of 0.25 W m −2 is slight. In addition, the RMSE of the ERA5 daily L d dataset is less than that of the CERES-SYN product, and it can be concluded that the ERA5 L d product over land is more accurate than that of the CERES-SYN on the daily time scale. This is consistent with the conclusion of Tang et al. [11] that the ERA5 L d product over land surface has a higher accuracy on average than the CERES-SYN on the hourly, daily, and monthly time scales but has a worse accuracy than the CERES-SYN dataset over ocean surface. On the monthly time scale, the RMSE and MBE are 11. 75 Figure 7, there are 28, 28, and 25 sites with RMSEs less than 25 W m −2 for the daily generated, CERES-SYN, and ERA5 L d datasets. Only 3, 3, and 2 out of 35 sites had RMSEs greater than 30 W m −2 for the three daily L d datasets, respectively. These three daily L d datasets have 23, 19, and 24 sites with MBEs between −10 and 10 W m −2 , respectively. However, the daily CERES-SYN L d product obtained 10 sites with MBEs greater than 10 W m −2 , compared with 6 for the generated L d dataset and 3 for the ERA5 L d retrieval.
ERA5 Ld datasets. As shown in Figure 7, there are 28, 28, and 25 sites with RMSEs less than 25 W m −2 for the daily generated, CERES-SYN, and ERA5 Ld datasets. Only 3, 3, and 2 out of 35 sites had RMSEs greater than 30 W m −2 for the three daily Ld datasets, respectively. These three daily Ld datasets have 23, 19, and 24 sites with MBEs between −10 and 10 W m −2 , respectively. However, the daily CERES-SYN Ld product obtained 10 sites with MBEs greater than 10 W m −2 , compared with 6 for the generated Ld dataset and 3 for the ERA5 Ld retrieval.   Figures 8 and 9, respectively. The highest multiyear seasonal mean Ld value is 333.21 W m −2 in Northern hemisphere summer (June, July, and August), followed by 311.94 W m −2 in Northern hemisphere autumn (September, October, and November), and the lowest value is 286.09 W m −2 in Northern hemisphere winter (December, January, and February). The seasonal variation in Ld is closely related to the annual solar zenith cycle and the maximum sunshine dura-   Figures 8 and 9, respectively. The highest multiyear seasonal mean L d value is 333.21 W m −2 in Northern hemisphere summer (June, July, and August), followed by 311.94 W m −2 in Northern hemisphere autumn (September, October, and November), and the lowest value is 286.09 W m −2 in Northern hemisphere winter (December, January, and February). The seasonal variation in L d is closely related to the annual solar zenith cycle and the maximum sunshine duration. After the winter solstice, the direct sun point moves northward from the Tropic of Capricorn, causing changes in the global heat distribution, which increases the overall L d value in the Northern hemisphere. Overall, the multiyear annual mean value of the generated L d dataset is 308.76 W m −2 , which is greater than the ERA5 value of 306.92 W m −2 and less than the CERES-SYN value of 313.83 W m −2 from 2003 to 2018. The spatial distribution of L d not only shows significant latitudinal dependencies in which the mean L d value decreases with increasing latitude but also relates to the surface elevation and regional climate. The mean L d values estimated over the Andes and Tibetan Plateau are comparatively and obviously low due to their high elevation with a low cloud coverage, thin air and readily lost heat. The mean L d values of Antarctica and Greenland are always lowest owing to the perennial snow cover and frigid climate. Apparently, the generated L d value is lower than the CERES-SYN value and higher than the ERA5 value. The lowest and highest differences between the generated L d and the CERES-SYN product are −81.44 and 60.56 W m −2 , respectively; and the values between the generated L d and the ERA5 product are −46.17 and 58.83 W m −2 , respectively. The generated L d value is significantly lower than the CERES-SYN in the Tibetan Plateau, Andes Mountains, and Antarctica, and is significantly higher than it in a small area of the northern Amazon Rainforest and eastern Indonesia. Compared with the CERES-SYN dataset, the difference between the generated L d dataset and the ERA5 product is evenly distributed with no obvious high and low values over the global land surface. The multiyear annual mean Ld values of the generated dataset, ERA5 retrieval, and CERES-SYN product are consistent with the evaluation results against the ground measurements that ERA5 and CERES-SYN tend to underestimate and overestimate Ld value, respectively. However, there is still much debate about the specific multiyear annual mean Ld value over the global land surface. The uncertainty of the global land mean Ld estimation is difficult to quantify, and different periods may influence the estimated values. Ma et al. [80]

Time Series and Long-Term Trend
To study the temporal variations of the generated Ld dataset, we monthly and annual mean Ld from 2003 to 2018, as shown in Figure 1 whether the interannual changes in the generated Ld dataset were reliabl with the ERA5 and CERES-SYN Ld datasets. Overall, ERA5 has a relativ and CERES-SYN shows a larger value for the multiyear monthly mean L monthly mean Ld of the ERA5, generated dataset, and CERES-SYN are al ary, with values of 280.02, 284.22, and 284.49 W m −2 , respectively; they a July, with values of 336.12, 336.24, 343.94 W m −2 , respectively. The mu mean Ld values of the three datasets all increase from January to July an July to December in connection with the revolution of the earth around t in more total solar radiation in Northern hemisphere summer than in sphere winter. Compared with ERA5 and CERES-SYN, the boxed part (Figure 10a

Time Series and Long-Term Trend
To study the temporal variations of the generated L d dataset, we calculated the monthly and annual mean L d from 2003 to 2018, as shown in Figure 10. We analyzed whether the interannual changes in the generated L d dataset were reliable by comparing with the ERA5 and CERES-SYN L d datasets. Overall, ERA5 has a relatively lower value, and CERES-SYN shows a larger value for the multiyear monthly mean L d . The multiyear monthly mean L d of the ERA5, generated dataset, and CERES-SYN are all lowest in January, with values of 280.02, 284.22, and 284.49 W m −2 , respectively; they are all largest in July, with values of 336.12, 336.24, 343.94 W m −2 , respectively. The multiyear monthly mean L d values of the three datasets all increase from January to July and decrease from July to December in connection with the revolution of the earth around the sun resulting in more total solar radiation in Northern hemisphere summer than in Northern hemisphere winter. Compared with ERA5 and CERES-SYN, the boxed part of the box-plot (Figure 10a)

Relationships between the Long-Term Ld and the Key Factors
Previous studies indicated that the accuracy of Ld estimation mainly depends on the reliability of the air temperature, precipitable water vapor, cloud, and elevation data retrieved from the reanalysis and satellite products. In view of the small variations in elevation and cloud cover over the long time series, the trend of Ld estimation is mainly influenced by air temperature and water vapor pressure. Therefore, we calculated the anomalies of the 2-m air temperature (Ta, °C) and water vapor pressure (e, hPa) datasets based on the ERA5 hourly products from 2003 to 2018. e can be calculated with Ta using the following equations based on the Ta and relative humidity at 1000 hPa (RH) derived from the ERA5 products.  [80] concluded that the trend of the annual mean L d over the global land surface for 44 CMIP5 GCMs varied from 0.69 to 2.86 W m −2 per decade (p < 0.01) during the time period of 1970-2005, and its median value is 1.86 W m −2 per decade. Therefore, it is reliable for the generated and ERA5 L d datasets, with trends of 1.8 (p < 0.01) and 1.9 (p < 0.01) W m −2 per decade over the global land surface, respectively.

Relationships between the Long-Term L d and the Key Factors
Previous studies indicated that the accuracy of L d estimation mainly depends on the reliability of the air temperature, precipitable water vapor, cloud, and elevation data retrieved from the reanalysis and satellite products. In view of the small variations in elevation and cloud cover over the long time series, the trend of L d estimation is mainly influenced by air temperature and water vapor pressure. Therefore, we calculated the anomalies of the 2-m air temperature (Ta, • C) and water vapor pressure (e, hPa) datasets based on the ERA5 hourly products from 2003 to 2018. e can be calculated with Ta using the following equations based on the Ta and relative humidity at 1000 hPa (RH) derived from the ERA5 products.
where e and e s are the water vapor pressure and saturation water vapor pressure, respectively. Figure 11 presents the temporal variation in the annual mean anomalies for the generated L d estimation, Ta  Their distribution characteristics are similar to that of the generated L d dataset that its spatial distribution not only shows notable latitudinal dependencies as the annual mean values decrease with increasing latitudes but it also relates to the surface elevation and regional climate. The annual mean Ta and e values on the Andes and Tibetan Plateau are comparatively and obviously low due to their high elevations. The annual mean Ta values of Antarctica and Greenland are always the lowest due to their perennial snow coverage and frigid climates. The annual mean e values are relatively low, less than 21.72 hPa at middle to high latitudes. The spatial distribution of R between the generated L d estimation and Ta and e from 2003 to 2018 is also drawn, as shown in Figure 13. Only significant pixels where p values are less than 0.05 appeared. The R values ranged from 0.50 and −0.67 to 1 for Ta and e, respectively. There was a positive correlation between the generated L d and Ta in the region where the R passed the significant test. For the e, there are few pixels with R value less than 0, which even cannot be shown up on the map. Except for values less than 0, the minimum value of R between the generated L d and e is also 0.50. The R values between annual mean L d estimates and Ta and e failed the significant test mainly occurred in the Andes Mountains, Brazilian Plateau, Tibet Plateau, Australia, Southern Africa, and southern North America. This may be due to the influences of clouds, elevation controls, and carbon dioxide emissions [29,30,41,82], that play a dominant role in these regions. The possible reasons need to be further explored.

Shortcomings of the GBRT Model
The gradient boosting regression tree (GBRT) method has advantages in forecasting and solving overfitting problems [76]. The evaluation results demonstrated that the generated L d dataset based on the GBRT method performed better at selected stations than the ERA5 and CERES-SYN products on daily and monthly time scales. However, there are still some disadvantages in the machine learning methods for radiation estimation represented in the GBRT method [45,77,83]. Alizamir et al. [77] utilized six different machine learning models to estimate solar radiation from two stations of two different locations, and found that the six models all tend to overestimate L d for low values of it and to underestimate L d for high values of it. Fan et al. [83] also exposed similar problem in using support vector machine and extreme gradient boosting methods to predict daily solar radiation in China. With reference to Figures 3, 4 and 6, it is evident that the GBRT method for L d estimation also overestimate L d at low values and underestimate L d at high values. Since the departures of slope from 1 and intercept from 0 for fitting linear regression equations can measure the degree of deviation, the linear regression equations were fitted between the ground measurements of 35 stations and the generated, ERA5, and CERES-SYN L d datasets on daily and monthly time scales, as listed in Table 3. Compared to those two L d products, the fitted linear regression equations of the generated L d has a smaller slope and a greater intercept on both daily and monthly time scales, which indicates that the fitted line deviates more from the 1:1 line and that the GBRT method underestimates L d for high values and overestimates it for low values. Other machine learning methods, including support vector regressions, multivariate adaptive regression splines, and artificial neural networks used to estimate L d , also show the same problem [49]. On the other hand, the GBRT method makes predictions by learning rules from many sample data, so it has higher requirements for the accuracy and quantity of its training datasets. However, the ground measurements of L d used as the target variable exhibit missing values and deviations, although the obviously incorrect data have been removed, which may be limit the accuracy of the GBRT method. Finally, the learning and training process of the GBRT method is a black box whose processes are not known and may not be effective [45]. Table 3. The fitted linear regression equations for the generated, ERA5, and CERES-SYN L d datasets on both daily and monthly time scales. Where the x and y represent the ground measurements of L d and the L d estimates, respectively.

Time Scale Dataset Fitted Linear Regression Equation
Daily time scale

Accuracy and Completeness of Input Datasets and Ground Measurements
Because L d was estimated based on the relationships between its ground observations and input variables, including the 2-m air temperature (Ta), relative humidity (RH) at 1000 hPa, total column water vapor (TCWV), surface downward shortwave radiation (S d ), and elevation, the accuracy and completeness of the input datasets and ground measurements are vital. However, ground observations exist measurement errors and problem of spatial representativeness, which are potential sources of errors in L d estimation. A larger part of measurements errors is caused by systematic deviations and calibration process differences. Ohmura et al. [60] demonstrated the accuracy of L d observations in the baseline surface radiation network improved from 30 W m −2 in 1999 to 10 W m −2 in 1995 due to improvement of the calibration process. Currently, although pyrgeometers for L d measurement are regularly maintained and calibrated, there is still a lack of recognized world reference calibration standard [60,84]. The different calibration methods of different observation networks can lead to inconsistencies of L d measurements at the close positions, which also brings uncertainties of ground measurements [81]. The spatial representativeness plays an important role in the surface radiation retrieval and validation [85][86][87][88]. Jiang et al. [87] indicated the accuracy of S d retrieval can be improved, that maximum improvement of root mean square error is up to 9%, after considering the scale information. In this study, we only compared the accuracies of the generated, ERA5, and CERES-SYN L d datasets at 100-km spatial resolution but did not examine the representativeness of surface observation points, which maybe lead to the uncertainty of compared result.
On the other hand, the completeness of input datasets limits the continuity of the generated L d dataset. For example, the generated daily L d dataset was discontinuous before 2003 due to the data missing of the global land surface satellite (GLASS) daily S d product which was produced by using moderate resolution imaging spectroradiometer (MODIS) top of atmosphere (TOA) reflectance data. Compared with the previous methods of S d retrieval, however, GLASS S d had a higher spatial resolution of 5 km and was directly estimated using TOA reflectance without the need for cloud and aerosol data, which contributes to a better ability to demonstrate temporal variations in S d over a long time period. Moreover, the ERA5 and elevation datasets were resampled to a 5-km spatial resolution matching with S d , which can also introduce uncertainty into the data. In summary, the L d estimates will be more accurate if the accuracy and completeness of the input datasets and ground measurements are improved.

Conclusions
It is of great importance for studying the Earth's surface radiation budget and energy balance to construct a long-term surface downward longwave radiation (L d , 4-100 µm) dataset worldwide. This study generated a daily L d dataset with a 5-km spatial resolution over the global land surface utilizing the gradient boosting regression tree (GBRT) method with 2-m air temperature (Ta), relative humidity (RH) at 1000 hPa, total column water vapor (TCWV), surface downward shortwave radiation (S d ), and elevation datasets from 2000 to 2018. The L d observations of 349 stations collected at the AmeriFlux, AsiaFlux, baseline surface radiation network (BSRN), surface radiation budget network (SURFRAD), and FLUXNET networks were randomly divided into 90% (314 sites), as the target variable to build the model, and 10% (35 sites), as the evaluation dataset to independently validate the L d estimates. First, the predictor variables were extracted from the global datasets according to the latitude, longitude, and time corresponding to the ground stations. Then, the dataset of 314 sites was further divided into two portions at random to train the GBRT model: 80% for the training dataset and the remaining 20% for the test dataset. Then, the daily L d observations collected at 35 stations were used to validate the generated global land L d dataset.
The evaluation results showed that the root mean square error (RMSE), mean bias error (MBE), and correlation coefficient (R) values on the daily time scale were 17.78 W m −2 , 0.99 W m −2 , and 0.96 (p < 0.01), respectively, between the L d estimates with a 5-km spatial resolution and the ground measurements. On the monthly time scale, those values are 11.53 W m −2 , 0.68 W m −2 , and 0.98 (p < 0.01), respectively. At a 100-km spatial resolution, the performance of the generated L d dataset is better than that of ERA5 and CERES-SYN. On the daily time scale, the RMSE and MBE are 17.94 and 0.25 W m −2 , 18.81 and 1.76 W m −2 , 18.52 and −2.09 W m −2 , respectively, for the generated, CERES-SYN, and ERA5 L d datasets.
The multiyear seasonal and annual mean values of the generated L d dataset from 2003 to 2018 were calculated due to the absence of daily L d from 2000 to 2002. In terms of their temporal variation, the multiyear monthly mean L d values of the three datasets increase from January to July and decrease from July to December in connection with the revolution of the earth around the sun resulting in more total solar radiation in Northern hemisphere summer than in Northern hemisphere winter. Overall, the temporal variation and trend of the generated L d dataset are more consistent with the ERA5 product that the annual mean L d values display a gradual increasing trend from 2003 to 2018. The spatial distribution of L d not only shows a notable latitudinal dependency in which the mean L d value decreases with increasing latitudes but also relates to the surface elevation and regional climate. In addition, L d is positively affected by the 2-m air temperature and water vapor pressure with R values of 0.96 (p < 0.01) and 0.97 (p < 0.01), respectively.
Overall, the generated L d dataset has a higher spatial resolution and accuracy, contributing to knowledge of the surface radiation budget and energy balance of the Earth.

Acknowledgments:
We sincerely thank the institutions and researchers who provided the data used in this study and made them available to the public. We also sincerely thank the anonymous reviews for their constructive suggestions which are greatly helpful for improving the article.

Conflicts of Interest:
The authors declare no conflict of interest.