Evaluation of Direct Horizontal Irradiance in China Using a Physically-Based Model and Machine Learning Methods

Accurate estimation of direct horizontal irradiance (DHI) is a prerequisite for the design and location of concentrated solar power thermal systems. Previous studies have shown that DHI observation stations are too sparsely distributed to meet requirements, as a result of the high construction and maintenance costs of observation platforms. Satellite retrieval and reanalysis have been widely used for estimating DHI, but their accuracy needs to be further improved. In addition, numerous modelling techniques have been used for this purpose worldwide. In this study, we apply five machine learning methods: back propagation neural networks (BP), general regression neural networks (GRNN), genetic algorithm (Genetic), M5 model tree (M5Tree) and multivariate adaptive regression splines (MARS); and a physically based model, Yang’s hybrid model (YHM). Daily meteorological variables, including air temperature (T), relative humidity (RH), surface pressure (SP) and sunshine duration (SD), were obtained from 839 China Meteorological Administration (CMA) stations in different climatic zones across China and were used as data inputs for the six models. DHI observations at 16 CMA radiation stations were used to validate their accuracy. The results indicate that M5Tree outperformed BP, GRNN, Genetic, MARS and YHM, with the lowest daily root mean square error (RMSE) of 1.989 MJ m−2 day−1 and the highest correlation coefficient (R = 0.956). Monthly and annual mean DHI during 1960–2016 were then calculated from daily meteorological data using the M5Tree model to reveal the spatiotemporal variation of DHI across China.
The results indicated a significantly decreasing trend at a rate of −0.019 MJ m−2 during 1960–2016. The monthly and annual DHI values of the Tibetan Plateau are the highest, whereas the lowest values occur in the southeastern part of the Yunnan−Guizhou Plateau, the Sichuan Basin and most of the southern Yangtze River Basin. The possible causes of the spatiotemporal variation of DHI across China were investigated by discussing cloud and aerosol loading.


Introduction
Solar energy is regarded as a clean, renewable, sustainable and environmentally friendly energy source for life on Earth [1]. With the rapid economic development of China, large amounts of fossil fuel have been burned in recent decades, which has caused serious environmental pollution [2]. Moreover, the concentration of atmospheric greenhouse gases will continue to increase unless the global consumption of fossil fuel drops sharply [3]. To address the problem of air pollution, solar energy applications that convert solar energy into heat and electricity have been developed rapidly in China in recent years. For example, since 2016, China has had the largest solar power generation capability, with an installed capacity of 77.42 GW [4]. Therefore, an accurate estimation of solar radiation is crucial for site resource analysis, system design and plant operation [5]. Solar radiation consists of two parts, direct radiation and diffuse radiation, which determine areas with potential for solar power generation and thus have a profound influence on the solar photovoltaic (PV) industry [6]. There are two main types of solar energy systems: concentrated solar power thermal systems (CSP), which use direct radiation, and photovoltaic (PV) systems, which use both the direct and diffuse components (global radiation). Currently, solar radiation is obtained mainly using four methods: solar radiation retrieval from satellite observations, reanalysis data, simulations based on general circulation models, and direct measurements at the surface [7]. Several surface-radiation measurement networks have been established worldwide; for example, the Baseline Surface Radiation Network (BSRN), which provides measurements of global radiation of high accuracy and high temporal resolution [8], and the Global Energy Balance Archive (GEBA), which compiles monthly global radiation data from more than 2500 stations worldwide [9]. However, because of their sparse and heterogeneous distribution, these networks are insufficient for deriving estimates of global radiation (including direct radiation) from surface observations alone. Remote sensing provides an alternative method for retrieving spatiotemporally continuous solar radiation values; for example, the Global Energy and Water Cycle Experiment-Surface Radiation Budget (GEWEX-SRB) provides solar radiation products at a 3 h temporal resolution and 1° spatial resolution [10,11], as does the International Satellite Cloud Climatology Project-Flux Data (ISCCP-FD) at 3 h intervals and 2.5° spatial resolution [12]. However, these products are limited by their relatively short historical record, and their accuracy needs further improvement. Reanalysis datasets are another feasible means of producing global radiation products and have a relatively high temporal resolution. For example, the National Center for Environmental Prediction−National Centre for Atmospheric Research (NCEP−NCAR) reanalysis is a global reanalysis from 1948 to near-present, with a temporal resolution of 6 h and spatial resolution of 1.9° [13,14]; in addition, the Japan Meteorological Agency (JMA) conducted the second Japanese global atmospheric reanalysis, called the Japanese 55-year reanalysis or JRA-55, which covers the period from 1958 to 2013 and has a 3 h temporal resolution and spatial resolution of 0.56° [15]. Unfortunately, in comparison with ground measurements, there is a larger positive bias in current reanalysis products [16]. Furthermore, all three of the above-mentioned sources mainly provide global radiation estimates, and there is a lack of products for direct radiation. However, direct radiation has great potential for CSP electricity production across China; it is therefore of vital importance to develop models to estimate and map direct radiation in order to maximize the efficiency of solar power plants that use CSP.
Many methods have been developed to estimate solar radiation [17][18][19][20][21], and they can be divided into three principal categories: empirical parameterization models, physical models and data-driven models [22]. Empirical parameterization schemes are designed to indirectly estimate solar radiation from routine meteorological variables (mainly temperature, sunshine duration and cloud data) [23][24][25][26][27][28][29]. For example, Yao [30] proposed a new high-quality sunshine duration model to estimate daily global solar radiation, and the results were shown to be more accurate than those of the Lewis model. In addition, Hassan [31] proposed 20 different ambient temperature-based models for estimating the monthly average daily global solar radiation; the performance of 12 of the new models was superior to that of the three models selected from the literature. However, because no physical principles were considered explicitly in these estimation schemes, the calibration parameters often vary between sites, which limits the general applicability of the models and may lead to large uncertainties in solar radiation estimates at uncalibrated sites. Physical models provide an effective means of estimating solar radiation with high accuracy; for example, a clear-sky broadband radiative transfer model was developed by Yang [32,33], which was shown to be one of the best broadband models in terms of accuracy and robustness. In addition, Qin [7] developed an efficient physically-based parameterization scheme to derive surface solar irradiance in China and the USA; the results showed that the model can effectively retrieve surface solar irradiance, with a root mean square error (RMSE) of 35 W m−2 on a daily basis. In comparison, data-driven models are more concise in that they do not explicitly incorporate physical principles and do not require prior assumptions about the underlying data. For example, Deo and Sahin [22] forecast solar radiation using an artificial neural network (ANN) model in Queensland, and Celik [34] optimized the performance of an ANN model to provide an efficient estimation of solar radiation for the eastern Mediterranean region of Turkey.
However, most studies estimating solar radiation have focused on global radiation, and few on estimating direct radiation [35]. Nevertheless, there has been some success in estimating direct radiation worldwide using various methods. For example, Bertrand [36] evaluated decomposition models of varying degrees of complexity to estimate direct solar irradiance from Meteosat Second Generation images over Belgium, and Rosales [37] proposed an analytical and numerical model to simulate the interactions of direct solar radiation. Gueymard [38,39] compared four models (CPCR2, MLWT2, REST and Yang) and concluded that the newly developed MLWT2 model provides the best performance in all direct solar radiation estimation tests. Mellit et al. [5] developed an adaptive model for predicting hourly DHI, and good agreement between measured and predicted data was obtained. However, there are very few published studies about the estimation of direct radiation in China. Chen et al. [40] developed twenty models based on the satellite MOD08-M3 atmospheric product and proposed the best site-specific models for DHI in China, and Tang [35] made the first attempt to construct direct solar radiation data sets for China based on a physical model. It nevertheless remains necessary to apply and compare different modeling techniques for the simulation of direct radiation. In addition, there have been very few studies focusing on the analysis and comparison of the patterns of variation within different climatic zones.
From the above, it can be concluded that there is an increasing need for accurate and reliable daily direct radiation data for solar energy applications using various methods within different climatic zones in China. Therefore, we conducted a comparative study of the spatiotemporal performance of six selected direct radiation models in different climate zones of China. The model with the highest degree of accuracy was used to demonstrate the temporal variation of annual and monthly mean direct radiation, and the spatial distribution of annual mean sum of potential CSP electricity production in different climatic zones across China. Our study is the first assessment of various direct radiation models in different ecosystems in China.

Observation Data
Figure 1 shows the location of 16 DHI stations and 839 CMA stations across China. Daily meteorological measurements of 839 CMA stations were used to estimate DHI values. The meteorological elements of these measurements were sunshine duration (SD), relative humidity (RH), air temperature (T) and surface pressure (P). All

Temperature and Humidity Zones
Figure 2 shows the respective distribution of temperature and humidity across China. The temperature and humidity data were obtained from the website: http://www.resdc.cn [41].

Back Propagation Neural Network
BP is a supervised learning algorithm proposed by D. E. Rumelhart in 1986 [42]. The BP neural network is a three-layer structure comprising an input layer, an intermediate (hidden) layer and an output layer; adjacent layers are interconnected, but the nodes within each layer are not connected to one another. In this study, the basic structure of the BP neural network used to establish the DHI calculation model is 6-10-1. The input variables are air temperature (T), air pressure (PS), day number (D), relative humidity (RH), sunshine duration (SD) and altitude (A); the hidden layer has 10 nodes; and the single output variable of the model is the daily DHI.
For the whole study period, 70% of the data were used to train the BP model, and the remaining 30% were used to evaluate it. The DHI values were calculated using the following equation:

$$M_i = Z\left(\sum_{i=1}^{n} w_i(t)\, x_i(t) + b\right)$$

where $M_i$ is the estimated DHI, $Z(\cdot)$ is the hidden transfer function, $w_i(t)$ is the weight coefficient, $x_i(t)$ are the input variables, $n$ is the number of inputs and $b$ is the neuronal bias.
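The weighted-sum-plus-bias computation described above can be sketched for a 6-10-1 network as follows. This is only an illustration: the tanh transfer function, the linear output node, the random weights and the normalised input values are all assumptions, not the trained model or the settings used in this study.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a network with one hidden layer (6-10-1 here)."""
    # Hidden layer: transfer function applied to weighted sum plus bias.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: assumed linear combination of the hidden activations.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


random.seed(0)
N_IN, N_HID = 6, 10
# Arbitrary illustrative parameters (a trained model would supply these).
w_hidden = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
b_hidden = [random.uniform(-1, 1) for _ in range(N_HID)]
w_out = [random.uniform(-1, 1) for _ in range(N_HID)]
b_out = random.uniform(-1, 1)

# Hypothetical, already-normalised inputs in the order T, PS, D, RH, SD, A.
x = [0.2, -0.1, 0.5, 0.3, 0.8, -0.4]
dhi_scaled = forward(x, w_hidden, b_hidden, w_out, b_out)
```

In practice the output would be rescaled back to MJ m−2 day−1, and the weights would come from training on the 70% calibration subset.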

General Regression Neural Network (GRNN)
GRNN is a type of radial basis function (RBF) neural network [43]. It has a strong nonlinear mapping ability and flexibility, and the network structure has a high degree of fault tolerance and robustness. GRNN consists of four layers: the input layer, the pattern layer, the summation layer and the output layer. Qin et al. [44] and Wang et al. [17] provide detailed information on the steps required for estimating DHI from meteorological data using the GRNN model.
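The four-layer structure can be illustrated with a small sketch of the standard GRNN prediction rule, in which the output is a Gaussian-kernel-weighted average of the training targets. The smoothing parameter `sigma` and the toy data are assumptions for illustration; they are not values from this study.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Predict one value with a general regression neural network."""
    # Pattern layer: one Gaussian kernel per training sample.
    kernels = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * sigma ** 2))
        for xi in train_x
    ]
    # Summation layer: kernel-weighted sum of targets and sum of kernels.
    numerator = sum(k * y for k, y in zip(kernels, train_y))
    denominator = sum(kernels)
    # Output layer: ratio of the two sums.
    return numerator / denominator


# Toy one-dimensional example: a query midway between two samples.
pred = grnn_predict([0.5], [[0.0], [1.0]], [2.0, 4.0])
```

Because the only free parameter is the kernel width, a GRNN needs no iterative weight training, at the cost of storing every training sample.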

Genetic Algorithm (Genetic)
Genetic is a metaheuristic proposed by Holland in 1969 [45]; it belongs to the class of evolutionary algorithms summarized by DeJong et al. [46] and Goldberg et al. [47]. It is an adaptive artificial intelligence technique that simulates the natural process of biological evolution in a self-organizing manner in order to solve extreme-value problems. We used Genetic to optimize the weights and thresholds of the BP neural network in order to improve the accuracy of the model for predicting DHI values. Based on the 6-10-1 structure of the BP neural network described above, when the genetic algorithm initializes the random population, the number of weights is 70 (6 × 10 + 10 × 1) and the number of thresholds is 11 (10 + 1). The coding length of Genetic for estimating the daily DHI is therefore 81 (70 + 11). A detailed explanation of Genetic used to estimate DHI is given in Wang et al. [17].
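The coding-length arithmetic above can be checked with a few lines of code. The function name is hypothetical and the sketch covers only the bookkeeping of how many parameters one chromosome must encode, not the genetic algorithm itself.

```python
def ga_chromosome_length(n_in, n_hidden, n_out):
    """Count the parameters a chromosome must encode for one hidden layer."""
    # Connection weights: input-to-hidden plus hidden-to-output.
    n_weights = n_in * n_hidden + n_hidden * n_out
    # Thresholds (biases): one per hidden node and one per output node.
    n_thresholds = n_hidden + n_out
    return n_weights, n_thresholds, n_weights + n_thresholds


n_weights, n_thresholds, length = ga_chromosome_length(6, 10, 1)
print(n_weights, n_thresholds, length)  # 70 11 81
```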

M5 Model Tree (M5Tree)
M5Tree, first developed by Quinlan [49], is based on a binary decision tree with linear regression functions at the terminal (leaf) nodes [48]. The model uses the tree structure to establish a relationship between the dependent and independent variables. Evaluating with M5Tree generally consists of three steps [50,51]: (1) divide the data into subsets to create a decision tree; (2) generate a model tree based on the decision tree; and (3) construct a linear regression model. A schematic diagram of M5Tree is given in Wang et al. [52]. The standard deviation reduction (SDR) can be expressed by the following formula:

$$SDR = SD(E) - \sum_{i} \frac{|E_i|}{|E|}\, SD(E_i)$$

where $E$ is the set of examples reaching the node, $E_i$ is the subset of examples assigned to the $i$-th outcome of the candidate split, and $SD$ is the standard deviation.

Energies 2019, 12, 150
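As a rough illustration of how the standard deviation reduction ranks candidate splits, the sketch below computes SDR for two ways of partitioning the same node. The toy values are invented, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; an implementation could equally use the sample form.

```python
import statistics


def sdr(parent, subsets):
    """Standard deviation reduction achieved by splitting `parent`."""
    n = len(parent)
    # Size-weighted standard deviation of the child subsets.
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets)
    return statistics.pstdev(parent) - weighted


node = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical target values
good_split = sdr(node, [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]])
poor_split = sdr(node, [[1.0, 11.0, 3.0], [10.0, 2.0, 12.0]])
```

The split that separates low from high values yields the larger reduction, so it is the one the tree-growing step would prefer.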

Multivariate Adaptive Regression Splines (MARS)
MARS is a non-parametric regression technique [53]. It builds a predictive model from a set of independent (predictor) variables and generally operates without assuming a functional relationship between the dependent and independent variables [54]. The general MARS model equation for estimating DHI values is expressed as follows:

$$V = \alpha + \sum_{m=1}^{M} \beta_m\, h_m(X)$$

where $V$ is the estimated DHI value, which is a function of the input parameters $X$ (RH, T, PS, SD, A and day number); $\alpha$ is the intercept parameter; $\beta_m$ is the weight of the $m$-th term; $h_m(X)$ is the basis function; and $M$ is the number of basis functions.
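A minimal sketch of evaluating a MARS-type model is given below, assuming the usual hinge form of the basis functions, max(0, x − c) and max(0, c − x). The knot, coefficients and input value are invented for illustration and are not fitted values from this study.

```python
def hinge(x, knot, direction=1):
    """Hinge basis: max(0, x - knot) if direction is +1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum of beta * hinge(x[j]) over all terms.

    Each term is (beta, variable_index, knot, direction).
    """
    return intercept + sum(
        beta * hinge(x[j], knot, d) for beta, j, knot, d in terms
    )


# Hypothetical model with a single input (e.g. sunshine duration in hours)
# and a mirrored pair of hinges around a knot at 6 h.
terms = [(1.2, 0, 6.0, +1), (-0.8, 0, 6.0, -1)]
above = mars_predict([8.0], 5.0, terms)  # 5.0 + 1.2 * 2.0
below = mars_predict([4.0], 5.0, terms)  # 5.0 - 0.8 * 2.0
```

The mirrored pair lets the fitted slope differ on either side of the knot, which is how the method captures nonlinearity without a prescribed functional form.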

Yang's Hybrid Model (YHM)
The physical parameterization scheme of Yang's hybrid model [55] accounts for five damping processes that reduce the transmittance of solar radiation through the atmosphere (aerosol extinction, ozone absorption, Rayleigh scattering, gas absorption and water vapor absorption). It can be expressed as follows:

$$H_{all} = \tau_c\, H_{clr}$$

where $H_{all}$ (MJ m−2 day−1) is the daily DHI for all-sky conditions and $H_{clr}$ is the daily DHI for clear-sky conditions. $\tau_c$ is a cloud transmittance parameter that corrects for the effect of cloud on daily DHI; it is a function of SD and the maximum possible sunshine duration (N) [35].

Statistical Measures of Model Accuracy
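We used regression analysis between the measured and estimated values of DHI to validate the accuracy of the DHI models. The mean absolute bias error (MAE), mean bias error (MBE), root mean square error (RMSE), coefficient of determination (R²) and correlation coefficient (R) were used for the evaluation. They were calculated as follows:

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|V_{est,i} - V_{obs,i}\right|$$

$$MBE = \frac{1}{N}\sum_{i=1}^{N}\left(V_{est,i} - V_{obs,i}\right)$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(V_{est,i} - V_{obs,i}\right)^2}$$

$$R = \frac{\sum_{i=1}^{N}\left(V_{est,i} - \overline{V}_{est}\right)\left(V_{obs,i} - \overline{V}_{obs}\right)}{\sqrt{\sum_{i=1}^{N}\left(V_{est,i} - \overline{V}_{est}\right)^2 \sum_{i=1}^{N}\left(V_{obs,i} - \overline{V}_{obs}\right)^2}}$$

$$R^2 = (R)^2$$

Here, $N$ represents the number of samples collected; $V_{est}$ and $V_{obs}$ represent the estimated and observed DHI, respectively; $\overline{V}_{est}$ represents the mean value of the estimated DHI; and $\overline{V}_{obs}$ represents the mean value of the observed DHI.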
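The evaluation measures named above can be computed as in the following sketch. Two points are assumptions: the bias is taken as estimated minus observed, and R² is taken as the square of the correlation coefficient, which is consistent with the values reported later (for example, R = 0.949 and R² = 0.901 for BP). The sample values are invented.

```python
import math


def metrics(est, obs):
    """Return MAE, MBE, RMSE, R and R2 for paired estimates and observations."""
    n = len(obs)
    diffs = [e - o for e, o in zip(est, obs)]  # estimated minus observed
    mae = sum(abs(d) for d in diffs) / n
    mbe = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_e = sum(est) / n
    mean_o = sum(obs) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(est, obs))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    r = cov / math.sqrt(var_e * var_o)  # Pearson correlation coefficient
    return {"MAE": mae, "MBE": mbe, "RMSE": rmse, "R": r, "R2": r * r}


# Hypothetical daily DHI values in MJ m-2 day-1.
obs = [10.0, 12.0, 8.0, 15.0]
est = [11.0, 11.0, 9.0, 14.0]
scores = metrics(est, obs)
```

In this toy case the errors alternate in sign, so MBE is zero even though MAE and RMSE are not, which is why the three measures are reported together.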

Data Quality Control
The DHI data from CMA used for model evaluation in this study cover the 57-year interval 1960-2016. Three different classes (first class, second class and third class) of radiometer were installed for measuring solar radiation in China both before and after 1990 (Table 1). However, only first-class stations (16 stations) have measured DHI from 1960 to the present. Observed DHI data may suffer from inhomogeneity because of sensitivity drift and instrument replacement [55]; therefore, daily DHI observations at these stations should be checked to ensure data quality. Detailed rules for assessing these data are given in Tang et al. [56]. After quality control, the remaining data were used for model estimation and accuracy verification. In addition, quality control of all the other daily meteorological data was performed by CMA, and the detailed procedures are given at http://data.cma.gov.cn [57].

Results and Discussion
To ensure the accuracy and effectiveness of the experimental results, 10,752 samples (the compilations of the estimated results of BP, GRNN, Genetic, M5Tree, MARS and YHM at 16 CMA stations) were taken to verify the accuracy of the models during 1960-2016.

Validation of Estimated DHI
Correlation analysis was used to analyze the differences between the observed and estimated daily DHI for BP, GRNN, Genetic, M5Tree, MARS and YHM at 16 CMA stations. The results are shown in Figure 3, and they indicate that the DHI values simulated by BP, GRNN, Genetic, M5Tree, MARS and YHM are all positively correlated with the DHI measurements; the correlation coefficients are 0.949, 0.918, 0.939, 0.956, 0.908 and 0.923, respectively. For the 16 CMA stations, M5Tree yielded slightly better estimates (RMSE = 1.989, MAE = 1.923, R² = 0.915) than BP (RMSE = 2.122, MAE = 1.440, R² = 0.901), GRNN (RMSE = 2.784, MAE = 1.788, R² = 0.843), Genetic (RMSE = 2.390, MAE = 1.618, R² = 0.882) and MARS (RMSE = 3.090, MAE = 2.250, R² = 0.825). YHM, which is based on the theoretical principles of atmospheric physical transmission processes, also showed a relatively high degree of accuracy (RMSE = 3.692, MAE = 2.463, R² = 0.853); however, it performed the worst among the six models because of the effects of cloud, aerosol extinction and water vapor. In summary, M5Tree was superior to the other DHI models in stations with smaller seasonal variations, which suggests that this model can be used to reconstruct the historical datasets of DHI at CMA stations across mainland China with a high degree of accuracy and robustness.

Analysis of Spatial-Temporal Variations of DHI Values across China
Daily meteorological data from 839 CMA stations across China were used as input to the M5Tree model to simulate the daily average DHI values for 1960-2016. The monthly and annual mean DHI values were calculated, and software tools in ArcGIS were used to illustrate the temporal trend of annual average DHI. [44]. In 1990-2007, the annual mean DHI decreased slowly, at a rate of 0.027 MJ m−2 day−1 per decade, whereas it increased at a rate of 0.084 MJ m−2 day−1 per decade during 2008-2016; this is because the aerosol radiative forcing effect was decreasing as a consequence of the further implementation of environmental protection policies [44]. The trend of DHI in this study matches previous findings for Athens [60], the Mediterranean Basin [61], the Indian monsoon region [62], and China and Japan [63], all of which reveal the "global dimming/brightening" effect. The differences among these results illustrate that the factors affecting DHI trends vary with climate, topography, humidity, cloud and aerosol effects.
The spatial pattern of monthly DHI in China is shown in Figure 10. Here, spring is March-May, summer is June-August, autumn is September-November, and winter is December-February. The sunshine duration in summer is longer and the daily average solar elevation angle is higher than in other seasons, because China is in the middle and low latitudes of the Northern Hemisphere. Therefore, the DHI value in summer (13.185 MJ m−2 day−1) over mainland China is significantly higher than in spring (11.046 MJ m−2 day−1), autumn (8.609 MJ m−2 day−1) and winter (7.218 MJ m−2 day−1).
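Per-decade rates such as those quoted above can be derived from an annual series in several ways, and the text does not state which was used. The sketch below assumes an ordinary least-squares fit of annual mean DHI against year, with the slope scaled to ten years. The series is synthetic, constructed only so that its slope equals the 0.084 MJ m−2 day−1 per decade quoted for 2008-2016; it is not the study's data.

```python
def decadal_trend(years, values):
    """Least-squares slope of values against years, expressed per decade."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10.0 * sxy / sxx


years = list(range(2008, 2017))
# Synthetic annual means rising by 0.0084 MJ m-2 day-1 each year.
values = [10.0 + 0.0084 * (y - 2008) for y in years]
trend = decadal_trend(years, values)
```

A significance test on the slope would be a separate step and is not sketched here.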
The monthly and annual DHI values of the Qinghai-Tibetan Plateau are the highest, because the high altitude results in the lowest attenuation of the solar rays. For the Qinghai-Tibetan Plateau, the annual mean DHI during 1960-2016 is 13.43 MJ m−2 day−1, and the values range from 8.522 MJ m−2 day−1 (January) to 17.946 MJ m−2 day−1 (May). The annual DHI value of the Mongolian Plateau is 11.49 MJ m−2 day−1, which is because this arid region has less precipitation and a higher frequency of sunny days than elsewhere, and thus the attenuation of solar radiation by the thin atmosphere is weak [35,64].
The Sichuan Basin lies within the warm and humid climate zone, and there is a high incidence of cloud and fog throughout the year. In addition to the basin terrain, aerosol particles and clouds result in the substantial attenuation of atmospheric radiation; therefore, the Sichuan Basin has low values of direct radiation in China, and its annual mean DHI is 6.478 MJ m−2 day−1. Because of its high latitude and short sunshine duration, northeast China is characterized by low values in winter; for example, the annual DHI value of the Greater Khingan Range is 7.282 MJ m−2 day−1. In the desert regions of northwestern China, although desert aerosols can weaken solar radiation, the perennially arid climate means that the overall attenuation of solar radiation is weak, so the annual average solar radiation at the surface is higher.
Figure 11 shows the monthly average DHI values in different climatic zones. Because cloud cover is the dominant factor affecting surface solar radiation [35], the DHI values are generally higher in semi-arid and arid zones than in humid zones, which have a more frequent cloud occurrence. The ranges of DHI for HIC, HIIC, HID and IID are 8.

Conclusions
Solar radiation stations are extremely sparsely distributed across China and are therefore unable to meet the requirements for the optimum location of solar power thermal plants, especially for planning CSP systems. We therefore conducted a systematic study of six different models (a physically based model and five machine learning methods) to estimate daily DHI. The DHI estimates from the six models are comparable to the observations at the 16 CMA DHI stations; however, the M5Tree method for daily DHI estimation exhibits the best performance, with RMSE and MAE of about 1.989 and 1.923 MJ m−2 day−1, respectively, for different climatic conditions across China.
We extended the method to compile a dataset from 839 CMA meteorological stations using the M5Tree model and then investigated the spatiotemporal distribution of DHI across China. There was a significant decreasing trend for DHI, at a rate of −0.19 MJ m−2 per decade, during 1960-2016. The annual mean DHI was generally higher in the Qinghai−Tibetan Plateau (13.43 MJ m−2 day−1) and the Mongolian Plateau, and lower in the Yunnan−Guizhou Plateau, the Sichuan Basin (6.478 MJ m−2 day−1) and most of the southern Yangtze River Basin.
The DHI datasets analyzed here for China from 1960 to 2016 have potential applications in fields such as global and regional climatology, agricultural production, and especially CSP thermal systems throughout China. Our study also provides a refined methodology for estimating DHI. Finally, we suggest that these models should be further tested and verified in other climatic zones and terrain types worldwide. This study is the first assessment of various DHI models in different ecosystems, and it derives a long-term, high-density dataset of daily DHI in China, which contributes to many fields, such as terrestrial ecosystem processes, solar energy technologies, and especially CSP thermal systems. The effects of water vapor, aerosols and cloud on the temporal variations of DHI will be quantitatively analyzed in future work. Although we performed variable selection and parameter tuning in preparatory work, both could be further improved. In particular, the mean impact value (MIV) method proposed by Dombi et al. [65] will be used for variable selection, and the Bayesian optimization method will be used to tune the hyperparameters of the AI models.

Figure 1. Distribution of the 16 direct horizontal irradiance (DHI) stations and 839 China Meteorological Administration (CMA) stations across China used as data sources (GKR for the Greater Khingan Range; IMP for the Inner Mongolia High Plain; TP for the Tibet Plateau; SB for the Sichuan Basin; YRB for the Yangtze River Basin).

Figures 4-6 illustrate spatial changes of the mean values of the statistical indicators (RMSE, MBE and MAE), representing the accuracy of the six different DHI models. It is evident that all models perform better in the eastern regions than in the western regions of China. In addition, the M5Tree model is overwhelmingly superior to the other DHI models, because of its strong self-learning ability. The values of RMSE, MAE and R² for M5Tree are 1.839 MJ m−2 day−1, 1.194 MJ m−2 day−1 and 0.899, respectively. The SHY station in Hainan province has the largest MAE (3.028 MJ m−2 day−1), MBE (−2.371 MJ m−2 day−1) and RMSE (4.258 MJ m−2 day−1) for M5Tree. The lowest MAE and RMSE are 0.380 and 0.781 MJ m−2 day−1, respectively, and are for WJ station in southern Sichuan. YHM was not as accurate as the other DHI models, because of its vulnerability to cloud and terrain effects; for this model, RMSE, MAE and R² are 3.554 MJ m−2 day−1, 2.468 MJ m−2 day−1 and 0.837, respectively. The highest RMSE and MAE for YHM (5.830 and 4.589 MJ m−2 day−1, respectively) are at KAS station in northwestern Xinjiang, possibly because of the high atmospheric dust loading. The lowest RMSE and MAE (2.066 and 1.302 MJ m−2 day−1, respectively) are for MH station in northeastern Heilongjiang, because of the low radiative damping processes in the region.

Figure 4. Spatial variation of the root mean square error (RMSE) for the six DHI models.

Figure 5. Spatial variation of the mean bias errors (MBE) for the six DHI models.

Figure 6. Spatial variation of the mean absolute bias error (MAE) for the six DHI models.

Figure 7 shows the monthly changes of RMSE, MBE and MAE for BP, GRNN, Genetic, M5Tree, MARS and YHM. The models perform better in winter than in summer, because radiative damping processes are relatively strong in summer owing to frequent cloud occurrence and wet weather. The mean MAE and RMSE are highest in May (2.318 and 3.255 MJ m−2 day−1, respectively) and lowest in January (1.105 and 1.690 MJ m−2 day−1, respectively). In addition, the RMSE and MAE values show an increasing trend from winter to summer. The lowest seasonal mean MAE and RMSE (1.215 and 1.736 MJ m−2 day−1, respectively) are observed in winter, while the highest (2.277 and 3.061 MJ m−2 day−1, respectively) are observed in summer. M5Tree performed better than BP, GRNN, Genetic, MARS and YHM in most months of the year. The largest RMSE for M5Tree is 2.438 MJ m−2 day−1 in July, while the smallest is 1.323 MJ m−2 day−1 in January; the largest MAE for M5Tree is 1.526 MJ m−2 day−1 in July, and the lowest is 0.724 MJ m−2 day−1 in January. Although the RMSE values differ among YHM, BP, GRNN, Genetic, M5Tree and MARS, all of the models show a similar pattern of significant seasonal change because of the strong radiative damping processes in summer. The highest MAE and RMSE for YHM (3.676 and 5.041 MJ m−2 day−1, respectively) are observed in May, and the lowest (0.842 and 1.258 MJ m−2 day−1, respectively) are observed in December.
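The monthly and seasonal error statistics discussed above amount to grouping daily errors by calendar month or by season before averaging. A minimal sketch of that grouping is given below; the helper `rmse_by_group` and the sample records are hypothetical, and the seasons follow the definition used in this paper (spring is March-May, summer is June-August, autumn is September-November, winter is December-February).

```python
import math
from collections import defaultdict

# Season definition as used in this paper.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def rmse_by_group(records, key):
    """Group (month, observed, estimated) records with `key` and return RMSE per group."""
    squared_errors = defaultdict(list)
    for month, observed, estimated in records:
        squared_errors[key(month)].append((estimated - observed) ** 2)
    return {group: math.sqrt(sum(v) / len(v)) for group, v in squared_errors.items()}


# Invented daily records (month, observed DHI, estimated DHI), for illustration only.
records = [
    (1, 6.0, 6.5),
    (1, 5.0, 4.5),
    (5, 10.0, 12.0),
    (7, 12.0, 15.0),
    (7, 14.0, 10.0),
]

monthly_rmse = rmse_by_group(records, key=lambda month: month)
seasonal_rmse = rmse_by_group(records, key=SEASON_OF_MONTH.get)
print(monthly_rmse)
print(seasonal_rmse)
```

The same grouping applies to MAE and MBE by swapping the per-group aggregation; RMSE is used here only to keep the sketch short.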


Figure 8 illustrates the temporal trend of the annual mean DHI during 1960-2016 across mainland China. The highest value occurs in 1963 (9.402 MJ m−2 day−1) and the lowest in 2015 (7.954 MJ m−2 day−1). The DHI values decrease at a rate of −0.190 MJ m−2 day−1 per decade during 1960-2016. The decrease is especially evident after 1980, when the annual mean DHI values declined gradually because of the enhancement of the aerosol radiative forcing effect that accompanied population growth and rapid economic development.
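A decadal rate of this kind can be obtained by fitting a straight line to the annual means and multiplying the per-year slope by 10. The sketch below assumes an ordinary least-squares fit, which this section does not confirm as the method used; the function `linear_trend` is hypothetical and the series is synthetic, constructed so that its slope is exactly −0.019 MJ m−2 day−1 per year.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope and intercept of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = list(range(1960, 2017))  # 1960-2016 inclusive, 57 years

# Synthetic annual mean DHI (MJ m-2 day-1): a perfect line, NOT the observed series.
annual_dhi = [9.4 - 0.019 * (year - 1960) for year in years]

slope_per_year, intercept = linear_trend(years, annual_dhi)
slope_per_decade = 10 * slope_per_year
print(f"{slope_per_year:.3f} per year, {slope_per_decade:.3f} per decade")
```

Read this way, a rate of −0.019 MJ m−2 day−1 per year and a rate of −0.190 MJ m−2 day−1 per decade describe the same trend.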


Figure 9 indicates the spatial pattern of the annual average DHI values over mainland China. The values increase gradually from southeast China to northwest China, ranging from 3.481 to 17.495 MJ m−2 day−1 with an average of ~9.836 MJ m−2 day−1. The pattern of annual mean DHI is spatially heterogeneous across the mainland; for example, the maximum DHI value (17.195 MJ m−2 day−1) occurs on the Tibetan Plateau. Northwest China and the Mongolian Plateau also exhibit high values (12.542 MJ m−2 day−1), whereas the lowest values (from 3.481 to 7.460 MJ m−2 day−1) occur in the southeastern part of the Yunnan-Guizhou Plateau, the Sichuan Basin and most of the southern Yangtze River Basin.

The spatial pattern of monthly DHI in China is shown in Figure 10. Here, spring is March-May, summer is June-August, autumn is September-November, and winter is December-February. The sunshine duration in summer is longer and the daily average solar elevation angle is higher than in other seasons, because China is in the middle and low latitudes of the Northern Hemisphere. Therefore, the DHI value in summer (13.185 MJ m−2 day−1) over mainland China is significantly higher than in spring (11.046 MJ m−2 day−1), autumn (8.609 MJ m−2 day−1) and winter (7.218 MJ m−2 day−1). The monthly and annual DHI values of the Qinghai-Tibetan Plateau are the highest, because the attenuation of the solar rays is lowest at its high altitude. For the Qinghai-Tibetan Plateau, the annual mean DHI during 1960-2016 is 13.43 MJ m−2 day−1, and the monthly values range from 8.522 MJ m−2 day−1 (January) to 17.946 MJ m−2 day−1 (May). The annual DHI value of the Mongolian Plateau is 11.49 MJ m−2 day−1; this relatively high value arises because this arid region has less precipitation and a higher frequency of sunny days than elsewhere, and thus the attenuation of solar radiation by the thin atmosphere is weak [35,64]. The Sichuan Basin lies within the warm and humid climate zone, and there is a high incidence of cloud and fog throughout the year. In addition to the basin terrain, aerosol particles and clouds result in substantial attenuation of atmospheric radiation; therefore, the Sichuan Basin has some of the lowest values of direct radiation in China, and its annual mean DHI is 6.478 MJ m−2 day−1. The monthly mean DHI values for the Sichuan Basin from January to December are 5.45, 7.09, 7.608, 7.256, 10.24, 9.976, 7.35, 5.91, 4.754, 3.952, 3.697 and 4.35 MJ m−2 day−1, respectively. The middle and lower reaches of the Yangtze River (MLYR) and the Chiang-nan Hilly Region (CHR) have consistently low values, because the monsoon in spring and summer reduces solar radiation receipt during the rainy season. The ranges of the monthly mean DHI values in Hanzhong Basin, the MLYR and the CHR are 5.08-9.62 MJ m−2 day−1, 5.31-9.34 MJ m−2 day−1 and 3.47-13.65 MJ m−2 day−1, respectively. Because of its high latitude and short sunshine duration, northeast China is characterized by low values in winter; for example, the annual DHI value of the Greater Khingan Range is 7.282 MJ m−2 day−1. In the desert regions of northwestern China, desert aerosols can attenuate solar radiation; however, because of the perennially arid climate, the overall attenuation is weak, and the annual average solar radiation at the surface is therefore relatively high.
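As a consistency check on the Sichuan Basin figures, the twelve monthly means quoted above can be averaged and compared with the reported annual mean of 6.478 MJ m−2 day−1. The sketch below uses only those quoted values; the day-weighted variant assumes a non-leap year, and the variable names are illustrative rather than taken from the paper.

```python
# Monthly mean DHI for the Sichuan Basin, January to December (MJ m-2 day-1),
# as quoted in the text.
SICHUAN_MONTHLY_DHI = [5.45, 7.09, 7.608, 7.256, 10.24, 9.976,
                       7.35, 5.91, 4.754, 3.952, 3.697, 4.35]

# Assumption: non-leap year, used only for the day-weighted mean.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

REPORTED_ANNUAL_MEAN = 6.478

simple_mean = sum(SICHUAN_MONTHLY_DHI) / len(SICHUAN_MONTHLY_DHI)
weighted_mean = (
    sum(v * d for v, d in zip(SICHUAN_MONTHLY_DHI, DAYS_IN_MONTH)) / sum(DAYS_IN_MONTH)
)

# Seasons as defined in the text (month numbers, 1 = January).
SEASONS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
    "winter": (12, 1, 2),
}
seasonal_means = {
    name: sum(SICHUAN_MONTHLY_DHI[m - 1] for m in months) / len(months)
    for name, months in SEASONS.items()
}

peak_month = max(range(1, 13), key=lambda m: SICHUAN_MONTHLY_DHI[m - 1])

print(f"simple mean:   {simple_mean:.3f}")
print(f"weighted mean: {weighted_mean:.3f}")
print(f"peak month:    {peak_month}")
print(seasonal_means)
```

Both averages fall within a few hundredths of the reported annual mean, and the monthly maximum falls in May. For this basin, the quoted values give a higher spring mean than summer mean, in contrast to the national pattern.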

Figure 10. Spatial and temporal changes of DHI (MJ m−2 day−1) across mainland China.

Figure 11. Monthly mean DHI (MJ m−2 day−1) values in different climatic zones. The zones are defined based on both temperature and humidity: (a) temperature zones, (b) humidity zones, (c) climatic zones.