Evaluation of Reanalysis Datasets for Solar Radiation with In Situ Observations at a Location over the Gobi Region of Xinjiang, China

Abstract: Solar radiation is the most important source of energy on Earth. The Gobi area in the eastern Xinjiang region, due to its geographic location and climate characteristics, has abundant solar energy resources. To provide detailed scientific data supporting solar energy development in this area, we used ground-based data to evaluate the applicability of four satellite and reanalysis data sources: the Clouds and the Earth's Radiant Energy System (CERES), the European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5), the Modern-Era Retrospective Analysis for Research and Applications version 2 (MERRA2), and the Japanese 55-year Reanalysis (JRA-55). Our results indicated that the CERES data underestimated short-wave radiation and overestimated long-wave radiation. The correlation coefficients (r) between the ERA5 dataset and the observed net long-wave and short-wave radiation were 0.92 and 0.91, respectively, and those between the MERRA2 dataset and the observed net long-wave and short-wave radiation were both 0.88. The JRA-55 dataset overestimated the long-wave radiation flux and underestimated the short-wave radiation flux. The clearness index (k_t) of all datasets was poor during autumn and winter; the ERA5 estimates were cloudy when the actual condition was sunny, while the JRA-55 estimates were sunny when the actual condition was cloudy. Overall, the radiation flux in the ERA5 dataset had the highest applicability in the Gobi region.


Introduction
Solar radiation is the main energy source for all atmospheric physical processes on Earth and it influences the climate and weather [1]. Accurate estimation of surface solar radiation is essential for studying solar energy resources, hydrological processes, and climate change [2]. In early studies, scholars used solar radiation data directly from surface observation stations to conduct experiments and studies. Zha et al. [3] analyzed the spatial-temporal variations in surface solar radiation in China from 1957 to 1992 by using the solar radiation data from 58 surface stations. The results showed that since 1972, the total amount of surface solar radiation in China has decreased due to an increase in aerosol concentration. Zhou et al. [4] analyzed the abundance, utilization value, and stability of is 6.1 °C, the extreme maximum temperature is 40.6 °C, the minimum temperature is −35.1 °C, and the annual average precipitation is 50.9 mm [21].

Surface Observation Data
To study the land-atmosphere interaction in the Black Gobi area, the Institute of Desert Meteorology, China Meteorological Administration established a land-atmosphere interaction observation station in Hongliuhe Gobi in May 2016 (hereafter HLH). The HLH station (41°32′ N, 94°43.8′ E, at 1579 m above sea level) includes a 35 m gradient tower observation system, an eddy covariance system, and a solar radiation observation system. The solar radiation observation system includes a pyranometer (SR20, Hukseflux, The Netherlands) for measuring downward short-wave solar radiation, upward short-wave solar radiation, and scattered solar radiation, pyrgeometers (IR20, Hukseflux, The Netherlands) for measuring downward and upward infrared radiation, and pyrheliometers (DR02, Hukseflux, The Netherlands) for measuring direct solar radiation. It is also equipped with sun trackers (STR-22G, EKO, Japan), a UV radiometer (UVS-AB-T, Kipp&Zonen, The Netherlands), a net radiometer (NR01, Hukseflux, The Netherlands), and sunshine duration sensors (CSD3, Kipp&Zonen, The Netherlands). The data from these sensors are recorded continuously by a data logger (CR6, Campbell, CA, USA) at 1 Hz and averaged at 10 s, 1 min, 20 min, 1 h, and 1 d intervals. Evident outliers in the measured data were excluded, as were records in which the total solar radiation was less than 20 W·m−2, together with the corresponding reflected short-wave radiation. In the calculation of the average value, short-wave radiation is the daytime average, while long-wave radiation and net radiation are whole-day averages. The observation data used in this study are for the year 2018, but more than 2/3 of the long-wave radiation data from 19 May to 22 June 2018 were missing; hence this period was excluded.

Satellite Data
The CERES satellite data were obtained from the Atmospheric Science Data Center of the National Aeronautics and Space Administration Langley Research Center (http://ceres.larc.nasa.gov/ (accessed on 16 October 2021)) with a temporal resolution of 1 h and a spatial resolution of 1°. This dataset provided the monthly, 3 h, and 1 h averaged direct and scattered radiation, ultraviolet, short-wave and long-wave fluxes, albedo products, aerosol optical thickness, cloud optical thickness, ozone, and other auxiliary parameters. The detailed instructions can be found on the official website.

Reanalysis Dataset
The ERA5 is the fifth-generation atmospheric reanalysis of the global climate from the ECMWF (https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5 (accessed on 12 September 2021)), covering the hourly meteorological data of the atmosphere, land, and ocean around the world from 1979 to the present day, with a spatial resolution of 0.25° × 0.25°. It uses a more advanced four-dimensional variational (4D-Var) assimilation method to provide higher-resolution data [22]. In this study, the net long-wave and short-wave radiation flux data of the ERA5 in 2018 were used.
The MERRA2 provides global atmospheric reanalysis data from 1980, with a temporal resolution of 1 h and a spatial resolution of 0.625° × 0.5° (https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/ (accessed on 12 September 2021)). It uses an upgraded version of the Goddard Earth Observing System Model, Version 5 data assimilation system to replace the original MERRA reanalysis data and uses bilinear interpolation to spatially interpolate abundant global meteorological variables [23].
The JRA-55 is a new-generation reanalysis dataset provided by the Japan Meteorological Agency based on the JRA-25 (https://jra.kishou.go.jp/JRA-55/index_en.html (accessed on 12 September 2021)). It is the first comprehensive reanalysis dataset to apply the 4D-Var assimilation method since the ECMWF 45-year reanalysis. The JRA-55 includes global reanalysis data for half a century since 1958 with a spatial resolution of 1.25°. The temporal resolutions of the prediction dataset and the observation dataset are 3 h and 6 h, respectively. The detection data used include conventional data and remote sensing data. The upward and downward long-wave and short-wave radiation flux data used in this study were taken from the prediction dataset with a temporal resolution of 3 h. Detailed information can be found in the user manual of the JRA-55 product.

Data Processing
Before the calculation, evident outliers such as short-wave radiation at night, albedo higher than 0.85, garbled or missing data, and decimal point drift were removed. Net radiometer data were then used to interpolate and fill long continuous gaps to ensure data continuity. The daily average observation data were calculated from the surface observation radiation data with an interval of 30 min at the HLH station in 2018. The daily average data of the CERES, ERA5, and MERRA2 were calculated from the hourly data, and the zero values were removed from the short-wave radiation. When evaluating the accuracy of the JRA-55 radiation data, the 3 h average of the surface data was calculated to match its temporal resolution (3 h). Due to instrument malfunction, the long-wave radiation data for a total of 35 d from 19 May to 22 June were garbled, and thus, they were excluded. To test the reliability of the satellite and reanalysis data, we calculated the correlation coefficients (r) between each dataset and the observation data. We also conducted a significance test at a significance level of 0.01. The mean bias (MB) was used to measure the difference between the estimated and observed data, and the root mean square error (RMSE) represented the degree of dispersion between estimation and observation. The relative mean bias (rMB) was used to quantify the difference between estimation and observation relative to the observed magnitude, and to reflect the credibility of the estimated data with better accuracy [24].
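The evaluation statistics described above can be illustrated with a short Python sketch. It is not the authors' code, and several choices are assumptions: the function name `evaluation_stats` is hypothetical, the sign convention MB = mean(F_o − F_e) is inferred from the later statement that a negative MB indicates overestimation by the dataset, and rMB is assumed to be MB divided by the mean observation, expressed in percent.

```python
import math


def evaluation_stats(observed, estimated):
    """Return r, MB, RMSE and rMB (%) for paired observed/estimated values.

    Assumed conventions (not confirmed by the source text):
      MB  = mean(F_o - F_e), so a negative MB means the dataset overestimates;
      rMB = 100 * MB / mean(F_o).
    """
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two paired series with at least two values")

    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n

    # Pointwise differences, observation minus estimate.
    diffs = [o - e for o, e in zip(observed, estimated)]
    mb = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    rmb = 100.0 * mb / mean_o

    # Pearson correlation coefficient.
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_o == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant series")
    r = cov / math.sqrt(var_o * var_e)

    return {"r": r, "MB": mb, "RMSE": rmse, "rMB": rmb}
```

Under these assumptions, a dataset that is on average higher than the observations yields negative MB and rMB, which matches how the signs are interpreted in the CERES evaluation.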
To match the data under different cloud conditions and cloud coverage, we used the cloud area fraction (CAF) parameter in the CERES product to classify the satellite data. CAF < 20% was set as sunny days, 30% < CAF < 50% was partly cloudy, 60% < CAF < 80% was cloudy, and CAF > 80% was overcast [25].
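The cloud-cover classes above can be expressed as a small lookup. The following Python sketch uses a hypothetical function name, `classify_cloud_cover`, and follows the quoted thresholds literally; because the quoted ranges leave gaps (for example, 20% to 30%), values in a gap are returned as unclassified rather than being assigned to a neighbouring class, which is an assumption about how such values would be treated.

```python
def classify_cloud_cover(caf_percent):
    """Map a cloud area fraction (in percent) to the classes quoted in the text.

    Returns None when the value falls in a gap between the quoted ranges
    (assumed handling; the text does not say how such values are treated).
    """
    if not 0.0 <= caf_percent <= 100.0:
        raise ValueError("cloud area fraction must be between 0 and 100")
    if caf_percent < 20.0:
        return "sunny"
    if 30.0 < caf_percent < 50.0:
        return "partly cloudy"
    if 60.0 < caf_percent < 80.0:
        return "cloudy"
    if caf_percent > 80.0:
        return "overcast"
    return None
```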
In these statistics, F_o is the surface observed data and F_e is the estimated data; USW is the surface upward short-wave radiation, and DSW is the surface downward short-wave radiation. Similarly, the surface upward long-wave radiation and downward long-wave radiation are abbreviated as ULW and DLW, respectively. The clearness index (k_t) is an important parameter used to evaluate how accurately a dataset represents the effect of cloud fraction on solar radiation. The k_t is the ratio of the total solar radiation S (W·m−2) received by the Earth's surface at a certain solar altitude angle to the total solar radiation S_e (W·m−2) received above the atmosphere on a plane parallel to the Earth's surface [26]. The clearness index can be used to measure cloud attenuation and the effects of surface albedo [27]. Cloud fraction and aerosol are important factors in predicting solar radiation, and thus, k_t is often used to assess the performance of the model [28]. Based on the values of k_t, we defined clear sky days (k_t > 0.7) and cloudy days (0.3 < k_t < 0.7).
In the expression for S_e, S_sc is the solar constant (1370 W·m−2), t_d is the day number, and β is the solar altitude angle. This index reflects the impact of cloudiness variation on solar radiation: values close to zero indicate cloudy conditions and little solar radiation, while values close to one indicate clear sky conditions with no clouds and strong solar radiation. Because of their simplicity and low computational cost, bias correction methods are often used in climate research as global climate model simulation datasets increase [29,30]. With the deepening of research [31-33], different methods are widely being used in postprocessing climate predictions [34,35]. Among them, linear adaptive bias correction is a well-known method that adjusts the model radiation data according to the measured data of stations. The method eliminates bias by using the intercept (c) and slope (m) of the best-fitting equation (y_d = m·x_g + c) between the measured radiation and the radiation from the reanalysis datasets. The zero-bias correction time series was calculated using Equation (8), where y_cd, y_d, and x_g represent the corrected radiation of the reanalysis datasets, the radiation of the datasets, and the observed radiation, respectively.
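The clearness index defined earlier in this section can be sketched in Python as follows. The paper's own expression for S_e is not reproduced in this text, so the sketch assumes a commonly used form, S_e = S_sc · [1 + 0.033 · cos(2π · t_d / 365)] · sin β; the function names are hypothetical, and the cloudy range is read as 0.3 < k_t < 0.7.

```python
import math

SOLAR_CONSTANT = 1370.0  # W·m−2, the value of S_sc quoted in the text


def extraterrestrial_horizontal(day_number, solar_altitude_deg):
    """S_e on a horizontal plane above the atmosphere (assumed common form)."""
    # Correction for the varying Earth-Sun distance over the year.
    eccentricity = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_number / 365.0)
    return SOLAR_CONSTANT * eccentricity * math.sin(math.radians(solar_altitude_deg))


def clearness_index(surface_total, day_number, solar_altitude_deg):
    """k_t = S / S_e for one record; only defined when the sun is above the horizon."""
    s_e = extraterrestrial_horizontal(day_number, solar_altitude_deg)
    if s_e <= 0.0:
        raise ValueError("solar altitude must be above the horizon")
    return surface_total / s_e


def sky_class(k_t):
    """Classify a record using the k_t thresholds quoted in the text."""
    if k_t > 0.7:
        return "clear"
    if 0.3 < k_t < 0.7:
        return "cloudy"
    return None  # outside both quoted ranges
```

For example, a surface value of 500 W·m−2 at a solar altitude of 45° on day 172 would be passed as `clearness_index(500.0, 172, 45.0)`, and the result can then be given to `sky_class`.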

Conventional Analysis of Radiation
The annual variations in the monthly average data for each radiation component in 2018 are shown in Figure 2. The annual variations follow an inverted "U" shape. The ranking of the overall radiation intensities was DSW > ULW > DLW > USW. The DSW was the largest in May and the smallest in December, with a peak value of 1104.30 W·m−2. The distribution range of USW was relatively small, and it was the highest in December, with the maximum value occurring on 9 February (484.96 W·m−2). This is because the surface is covered with snow during winter, when the surface albedo is high. The annual variation in the monthly averaged ULW was similar to that of DSW. According to the existing data, they were the highest in July and the lowest in December, as both are closely related to the surface temperature. DLW is mainly related to the atmospheric state. According to the weather phenomenon records at the HLH station, the cloud cover is relatively large in summer (June, July, and August) in this area; therefore, in summer, the DLW is high and the DSW is low.

Applicability Evaluation of the CERES Satellite Data
In this study, we calculated the daily average values of the CERES satellite data and the ground-based observed radiation component data and quantitatively analyzed the accuracy of the satellite data at the HLH. The results are shown in Figure 3 and Table 1. The correlation coefficient of the USW between the satellite data and the observation was only 0.48, and the correlation coefficients of the other radiation components were all above 0.97. The MBs of the ULW and DLW were −4.78 W·m−2 and −9.81 W·m−2, respectively, and the rMBs were −1.32% and −3.71%, respectively. According to Equation (2), this result shows an overestimation of long-wave radiation by the satellite data. The MB and rMB of the DSW were 35.34 W·m−2 and 8.51%, respectively, which shows an underestimation by the satellite data. The estimation accuracy of the satellite data for long-wave radiation is greater than that for short-wave radiation.

The estimation accuracy of the satellite data for the USW was inferior to that of the other radiation components. To determine the reason for this, the surface-reflected radiation is presented separately in Figure 4a. The ground-based observed surface-reflected radiation was generally greater than the CERES satellite-estimated data. After cross-checking the data, we found that there was snow cover on the surface. Therefore, the data with surface snow cover were excluded and the remaining data were linearly fitted. The results are shown in Figure 4b. The correlation coefficient between the surface observed and satellite-retrieved surface-reflected radiation increased to 0.82, and the deviation and RMSE were reduced.

Applicability Evaluation of ERA5 and MERRA2 Reanalysis Datasets
In the radiation balance, DSW is the main source of surface energy. The net short-wave radiation (NSW) is defined as the difference between DSW and USW [36]. Similarly, the difference between ULW and DLW is the effective radiation (net long-wave radiation, NLW), which is an important form of energy exchange between the Earth and the atmosphere and an important part of the surface radiation balance [37,38]. To evaluate the accuracy of the reanalysis data, we used the observed radiation components to calculate the daily average values of NSW and NLW and evaluated the applicability of the ERA5 and MERRA2 data at the HLH. Figure 5 shows that the two reanalysis datasets are both close to the ground-based observation data. Overall, the observation data represented by the black curve are larger than the two datasets, showing the underestimation of NLW and NSW in the reanalysis data. However, the estimated NSW in December was notably higher than the observation. This was because the snow cover of the underlying surface in this area was not sufficiently considered (as in the case of the CERES algorithm). The comparison and analysis of the correlation and deviation between the ground-based observations and the daily average values of NLW and NSW of ERA5 and MERRA2 are shown in Figure 6. Compared with the MERRA2, the ERA5 has a better correlation with the observation. The correlation coefficients of NLW and NSW between the ERA5 and the observations were 0.92 and 0.91, respectively. The correlation coefficients for MERRA2 were both 0.88. According to Table 2, the MBs of the two reanalysis datasets relative to the observations are comparable. The rMB of NSW is within 5%, and the rMB of NLW is less than 10%. In the figure, the dispersion of the ERA5 data about the fitted line is lower. Therefore, compared with the MERRA2, the applicability of the ERA5 NLW and NSW is better at HLH.
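The definitions of NSW and NLW given at the start of this section amount to two subtractions, shown here as a minimal Python sketch. The function name `net_radiation` is hypothetical; the sign convention follows the text, so NLW is positive when the surface emits more long-wave radiation than it receives.

```python
def net_radiation(dsw, usw, dlw, ulw):
    """Return (NSW, NLW) in the same units as the inputs (e.g. W·m−2).

    NSW = DSW - USW  (net short-wave radiation absorbed by the surface)
    NLW = ULW - DLW  (effective radiation, as defined in the text)
    """
    nsw = dsw - usw
    nlw = ulw - dlw
    return nsw, nlw
```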


Applicability Evaluation of the JRA-55 Reanalysis Dataset
To evaluate the accuracy of the JRA-55 at HLH, we calculated the 3 h average of the ground-based observation data and compared it with the JRA-55. As shown in Figure 7, the correlation coefficient of USW between the JRA-55 and the observation was 0.84, and the correlation coefficients of the other radiation components were all greater than 0.93. Upon calculating the MB (Table 3), the biases of ULW and DLW between the JRA-55 and the observations were 20.98 W·m−2 and 10.42 W·m−2, respectively, and the biases of the USW and DSW were −66.76 W·m−2 and −26.03 W·m−2, respectively. Overall, the JRA-55 underestimated the long-wave radiation and overestimated the short-wave radiation. In the JRA-55, except for the USW, the rMB of the radiation components was less than 10%. However, according to the scatter plot and the RMSE, the short-wave radiation had a high dispersion about the fitted line, and the RMSEs were 98.47 W·m−2 and 93.04 W·m−2.
Therefore, the radiation flux estimated by ERA5 has the best applicability at HLH on an annual scale.

Accuracy Evaluation of Reanalysis Datasets under Different Cloud Covers
The reflection and scattering of clouds, scattering and absorption of aerosols, absorption of water vapor, scattering of atmospheric molecules, and absorption of gases weaken solar radiation [39]. The contribution of cloud cover to the deviation of estimated solar radiation is more notable than that of aerosol and water vapor content [40]. Based on the CERES CAF data, the cloud cover was classified into four categories: sunny (CAF < 20%), partly cloudy (30% < CAF < 50%), cloudy (60% < CAF < 80%), and overcast (CAF > 80%). Then, the deviations of the ERA5, MERRA2, and JRA-55 radiation data from the observations under different cloud covers were analyzed on a daily scale.
Table 4 shows that the bias of NSW between the ERA5 and the observation under different cloud cover conditions ranged from 0.76% to 6.06%, and the bias of NLW ranged from −4.22% to −16.88%. The bias of NSW between the MERRA2 and the observation under different cloud cover conditions ranged from 1.21% to 3.67%, and the bias of NLW ranged from −3.80% to −9.69%. Moreover, the bias of NLW was the smallest under partly cloudy conditions and the largest under overcast conditions. The bias of NSW was the smallest under sunny conditions and the largest under partly cloudy conditions. To match the temporal resolution of the JRA-55 dataset, a 3 h average was performed on the CERES cloud cover data, and then the biases between the JRA-55 and the surface observation radiation under different cloud cover conditions were evaluated. Table 4 shows that the bias of the estimated DSW in the JRA-55 dataset under different cloud cover conditions ranged from −2.94% to −11.01%; it was the smallest under sunny conditions and the largest under overcast conditions. The bias of the estimated DLW ranged from 3.11% to 4.88%; it was the smallest under sunny conditions and the largest under overcast conditions. The bias of the estimated ULW ranged from 4.51% to 6.41%; it was the smallest under cloudy conditions and the largest under overcast conditions. The bias of the estimated USW was the largest, ranging from −61.89% to −71.96%.

Comparison of Clearness Index
The predictions of the JRA-55, ERA5, CERES, and MERRA2 were all poor during autumn and winter, and the main reason was the cloud score. The cloud score was quantified as k_t. Table 5 shows the k_t and statistical parameters for the comparison of the observed and reanalysis datasets. The positive values of RMSE, MB, and rMB represent an overestimation of k_t by the dataset and vice versa. The prediction error of k_t for the JRA-55 data was the smallest, indicating better consistency between the predicted and observed values during spring (r = 0.75, rMB = 0.6%, MB = 3.98 W·m−2, RMSE = 26.33 W·m−2) and summer (r = 0.85, rMB = 4.2%, MB = 2.52 W·m−2, RMSE = 17.36 W·m−2). The r of k_t for all reanalysis datasets was poor during autumn and winter. However, the error in the ERA5 dataset was small. This may be because the ERA5 dataset accurately considers the cloud cover in the calculation of the radiation transmission model.
The regression plots for k_t of the reanalysis data are shown in Figure 8. For clear sky, the values of k_t were underestimated with more scatter in all datasets, with the least for ERA5. For partly cloudy days, the k_t values were overestimated with less scatter, with the least for JRA-55.

Bias Correction by Linear Adaptation
Deviation may occur in the reanalysis dataset, resulting in an overall overestimation or underestimation of the radiation value, which makes the prediction unreliable. Therefore, it is important to correct the deviation. Linear adaptation of Equation (8) was used as a correction method to optimize the reanalysis dataset.
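The linear adaptation described in the Data Processing section can be sketched in Python. Equation (8) is not reproduced in this text, so the correction step here is an assumption: the dataset values y_d are regressed on the observations x_g to obtain the slope m and intercept c, and the fitted line is then inverted, y_cd = (y_d − c) / m. The function names `fit_line` and `linear_bias_correction` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = m * x + c; returns (m, c)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def linear_bias_correction(observed, dataset):
    """Return bias-corrected dataset values y_cd.

    Assumed form of the correction (the source equation is not available):
    fit y_d = m * x_g + c, then invert it as y_cd = (y_d - c) / m.
    """
    m, c = fit_line(observed, dataset)
    if m == 0:
        raise ValueError("fitted slope is zero; correction is undefined")
    return [(y - c) / m for y in dataset]
```

With this form, a dataset that differs from the observations only by a constant scale and offset is mapped back onto the observations exactly; in practice the residual scatter remains, so r is unchanged while MB is driven towards zero.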

The statistical parameters (r, RMSE, MB, and rMB) of the net long-wave and short-wave radiation of the reanalysis datasets at the HLH station after bias correction are listed in Table 6. The corrected ERA5 data perform best for net short-wave radiation

Discussion
Reanalysis datasets (JRA-55, ERA5, and MERRA2) and satellite data from CERES were evaluated with in situ observations over the Gobi region of Xinjiang, China. We evaluated these datasets for each radiation component.
We found that the annual variation of the radiation components in the Gobi area presents an inverted U-shaped distribution, and the overall radiation intensity ranking was DSW > ULW > DLW > USW, which was consistent with the results of Niu et al. [41]. In addition, CERES satellite data showed an overestimation of long-wave radiation, and a similar overestimation of long-wave radiation has been reported by Romano et al. [42]. However, CERES data underestimate the downward short-wave radiation; the average deviation and relative average deviation of the predicted downward short-wave radiation were 35.34 W·m−2 and 8.51%, respectively. The estimation accuracy of CERES for long-wave radiation was superior to that for short-wave radiation. One reason for the low satellite estimation accuracy for USW is surface snow in winter, which causes albedo and USW to increase. It was speculated that the CERES algorithm does not include a scheme for the underlying snow conditions [43,44], resulting in a serious underestimation of USW in this region.
The radiation datasets of the ERA5 and MERRA2 were similar to the ground-based measured data; the correlation coefficients were all 0.88 or above. The net short-wave radiation in winter was overestimated, and it was speculated that this was due to snow cover (similarly to the CERES algorithm); in both cases, snow cover was not considered. Upon comparing the applicability of the net long-wave and short-wave radiation data of ERA5 and MERRA2 at HLH, we found that ERA5 has better applicability, which is consistent with the verification results of Chen et al. [45]. However, the results of Yi [46] showed that, over some arid areas, the MERRA2 (with a small positive bias: <1 MJ·m−2·day−1) presents better statistics of daily short-wave radiation (SWrad) than other satellite datasets. Wang [47] found that the MERRA2 has the smallest standard deviation of differences (σ_d) of upward short-wave radiation (σ_d = 1.00) and downward long-wave radiation (σ_d = 1.50) in the Tibetan Plateau. You [48] reported that the surface radiation of the Tibetan Plateau is affected by elevation data and land cover. Therefore, the strong heterogeneity of the topography of the Gobi and the plateau leads to the difference in surface radiation between the two.
In addition, JRA-55 overestimates the long-wave radiation and underestimates the short-wave radiation. The deviations of the upward and downward long-wave radiation were 20.98 W·m−2 and 10.42 W·m−2, respectively, and the deviations of the upward and downward short-wave radiation were −66.76 W·m−2 and −26.03 W·m−2, respectively. Furthermore, Peng [49] analyzed the error of JRA-55 solar radiation data on a global scale, and the result showed an overestimation (22.61 W·m−2) on an annual scale, which was consistent with the results from this study. The deviations of the ERA5, MERRA2, and JRA-55 reanalysis data under different cloud cover conditions on a daily scale show that, for ERA5 and MERRA2, the deviation of NLW is the smallest under low-cloud conditions (rMB of ERA5: −5.42%, rMB of MERRA2: −3.80%) and the largest under cloudy conditions (rMB of ERA5: −16.88%, rMB of MERRA2: −9.69%), whereas the deviation of NSW is the smallest under cloudy conditions (rMB of ERA5: 0.76%, rMB of MERRA2: 2.04%) and the largest under less-cloudy conditions (rMB of ERA5: 6.06%, rMB of MERRA2: 3.67%).
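The per-class deviations quoted above can be illustrated with a minimal Python sketch. It assumes the conventional definitions MB = mean(model − observation) and rMB = 100 × MB / mean(observation); the definitions used in this study are given in its methods section, which is not reproduced here, so the sign convention, the function names, the class labels, and the sample values below are all assumptions made for illustration.

```python
import numpy as np

def mean_bias(model, obs):
    """Mean bias (model minus observation), in the units of the inputs."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    return float(np.mean(model - obs))

def relative_mean_bias(model, obs):
    """Mean bias expressed as a percentage of the observed mean."""
    obs = np.asarray(obs, float)
    return 100.0 * mean_bias(model, obs) / float(np.mean(obs))

def bias_by_class(model, obs, classes):
    """Return {class_label: (MB, rMB)} for each cloud-cover class."""
    model = np.asarray(model, float)
    obs = np.asarray(obs, float)
    classes = np.asarray(classes)
    out = {}
    for label in np.unique(classes):
        sel = classes == label  # boolean mask selecting the days in this class
        out[str(label)] = (mean_bias(model[sel], obs[sel]),
                           relative_mean_bias(model[sel], obs[sel]))
    return out

# Made-up daily values (W m^-2) and hypothetical class labels, illustration only.
obs     = [200.0, 220.0, 150.0, 160.0, 80.0, 100.0]
model   = [210.0, 230.0, 165.0, 175.0, 81.0, 101.0]
classes = ["clear", "clear", "partly", "partly", "overcast", "overcast"]

stats = bias_by_class(model, obs, classes)
for label, (mb, rmb) in stats.items():
    print(f"{label:9s} MB = {mb:6.2f} W m^-2, rMB = {rmb:5.2f} %")
```

Under these definitions, the sign of rMB is opposite to that of MB whenever the observed mean is negative, which can occur for net long-wave radiation when it is defined as downward minus upward flux. This should be kept in mind when comparing rMB values for NLW and NSW.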
The k t values of all datasets were poor during autumn and winter. JRA-55 had the best prediction effect, although the sunny or cloudy conditions indicated by the reanalysis data may be contrary to the actual situation, which is consistent with the results of Alexandre [50] and Zia [51]. Finally, this study optimized each dataset with a linear-fit deviation correction and showed that the order of increasing overall accuracy of reanalysis dataset prediction was JRA-55 (r = 0.94, RMSE = 33. . The above research shows that the radiative transfer models of all reanalysis datasets need to be improved according to the cloud cover and aerosol conditions of different regions to better estimate radiation [52]. In general, the radiant flux estimated by the ERA5 reanalysis dataset has the best applicability in the Gobi area on an annual scale. The results of a study of Pakistan's Baluchistan area (average sunshine duration of 8-8.5 h), a region that experiences strong sunlight, also showed that the ERA5 radiation dataset has the best prediction results there [53]. Another study found that ERA5 radiation shows moderate errors, with a mean absolute deviation of the global monthly average irradiance of 6.8 W·m−2, while the errors of other satellite-based products are considerably larger at high-latitude locations [54].
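The linear deviation correction described above can be sketched as an ordinary least-squares fit of the observations against the reanalysis values. The following Python example is only an illustration under that assumption: the exact fitting procedure used in this study is not specified in this section, and the function names and sample values are hypothetical.

```python
import numpy as np

def fit_linear_correction(model, obs):
    """Least-squares fit obs ~ a * model + b; returns (a, b)."""
    a, b = np.polyfit(np.asarray(model, float), np.asarray(obs, float), deg=1)
    return float(a), float(b)

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def pearson_r(pred, obs):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(pred, obs)[0, 1])

# Made-up daily radiation values (W m^-2), illustration only.
obs   = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
model = np.array([ 80.0, 118.0, 165.0, 198.0, 244.0])

a, b = fit_linear_correction(model, obs)
corrected = a * model + b  # bias-corrected reanalysis values

rmse_before, rmse_after = rmse(model, obs), rmse(corrected, obs)
r_before, r_after = pearson_r(model, obs), pearson_r(corrected, obs)

print(f"slope = {a:.3f}, intercept = {b:.2f}")
print(f"RMSE before = {rmse_before:.2f}, after = {rmse_after:.2f}")
print(f"r before = {r_before:.4f}, after = {r_after:.4f}")
```

A linear correction with a positive slope leaves r unchanged and removes the mean bias, so any gain in accuracy appears in the RMSE rather than in r. Fitting and evaluating on the same sample, as done here for brevity, gives an optimistic estimate of the corrected error.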

Conclusions
The annual variations of the various radiation components at the HLH in 2018 were distributed in an inverted "U" shape. The overall radiation intensities were ranked in the order DSW > ULW > DLW > USW. The USW was the highest during the snow cover period in winter. Due to the influence of cloud cover, the DSW was the highest in May while the DLW and ULW were the highest in July.
The CERES dataset underestimated the surface short-wave radiation and overestimated the surface long-wave radiation. The estimation accuracy for the DSW, DLW, and ULW was high, with correlation coefficients with the surface observation data above 0.97, whereas the estimation accuracy for the USW was low, with a correlation coefficient of only 0.48. Based on the albedo data (0.51-0.63), it was speculated that this was because the snow conditions of the underlying surface were not adequately considered in the algorithm.
The ERA5 and MERRA2 reanalysis datasets showed an overestimation of the NLW and NSW in the HLH region. Of the two, the ERA5 dataset performed slightly better, and its correlation coefficients with the surface observation data for NSW and NLW were 0.91 and 0.92, respectively. In addition, the two reanalysis datasets performed differently under different cloud cover conditions. For ERA5, the rMB of NSW was the smallest under overcast conditions and the largest under partly cloudy conditions (6.06%), while the rMB of NLW was the smallest under sunny conditions and the largest under overcast conditions (−16.88%). For MERRA2, the rMB of NSW was the smallest under sunny conditions and the largest under partly cloudy conditions (3.67%), while the rMB of NLW was the smallest under partly cloudy conditions and the largest under overcast conditions (−9.69%).
The correlation coefficients of DSW, DLW, and ULW between the JRA-55 reanalysis dataset and the observations were all above 0.93, and the correlation coefficient of the USW was 0.84. JRA-55 underestimated short-wave radiation and overestimated long-wave radiation. In the JRA-55 dataset, the MB of DSW was the smallest under sunny conditions and the largest under overcast conditions (11.01%). The MB of DLW behaved similarly to that of DSW and was the largest under overcast conditions (4.88%). The MB of ULW was large; it was the smallest under cloudy conditions and the largest under overcast conditions (6.41%). The MB of USW was the largest, ranging from −61.89% to −71.96% under different cloud cover conditions.
The JRA-55 dataset gave the best k t predictions, with better agreement between the predicted and observed values during spring (r = 0.91 and rMB = 0.6) and summer (r = 0.91 and rMB = 0.6).
Overall, the ERA5 dataset had the best estimation accuracy relative to the surface observations among all the reanalysis datasets and the best applicability in the Gobi area. Future research will be directed at improving the radiative transfer models of the reanalysis datasets according to the cloud cover and aerosol conditions in different regions in order to estimate the radiation more accurately.