Low-Redundancy Feature Selection for Short-Term Solar Irradiance Prediction Using Conditional Mutual Information and Gaussian Process Regression

Solar irradiance is influenced by many meteorological features, resulting in a complex structure that makes its prediction inefficient and inaccurate. Existing prediction methods focus on analyzing the correlation between features and irradiance to reduce model complexity, but they do not account for redundancy within the feature subset. In order to reduce the information redundancy in the feature set and improve prediction accuracy, a novel feature selection method for short-term irradiance prediction based on Conditional Mutual Information (CMI) and Gaussian Process Regression (GPR) is proposed. Firstly, the CMI values of the features are calculated to evaluate both the correlation with irradiance and the redundant information among features in the candidate subsets. Secondly, GPR, which offers stable prediction performance and adaptively determined hyperparameters, is used as the predictor, and the optimal feature subset and GPR covariance function are selected by Sequential Forward Selection (SFS). Finally, the optimal predictor is determined by the minimum prediction error and is used to forecast the solar irradiance. The experimental results show that CMI-GPRAEK achieves the highest prediction accuracy with a low-dimensional optimal feature set of only 14 features, and its MAPE is 4.33% lower than that of the predictor without feature selection, although both use an optimal kernel function. CMI-GPRAEK thus yields a less complicated predictor with less redundancy between the features in the model.


Introduction
Solar energy is the cleanest and richest renewable energy in the world. However, photovoltaic power generation is influenced by the randomness and volatility of solar irradiation. In order to reduce the negative effect on grid stability when photovoltaic generation is connected to the power grid, the photovoltaic power needs to be predicted accurately [1]. Solar irradiance is the most important factor affecting the output of photovoltaic power, so highly accurate solar irradiation predictions can effectively improve the prediction accuracy of photovoltaic output and help the dispatching department of the electrical grid to arrange the scheduling plan and operation mode for the power grid [2,3].
Conventional irradiation prediction models can be divided into three types: statistical models [4], physical models [5] and intelligent algorithm models [6]. Statistical models are established by analyzing the relationship between irradiation data at each time step; they are simple and efficient. However, their prediction accuracy is low and the parameters of higher-order models are difficult to determine. Physical models are based on numerical weather forecasts. Because of the large number of factors that affect the accuracy of a solar irradiation predictor, the input of a physical model has a very high dimension and such models are complicated to operate. Intelligent algorithm models construct nonlinear prediction models; they have good nonlinear fitting ability, take full account of the impact of external conditions on irradiation, and produce more accurate predictions.
At present, the commonly used intelligent algorithms in short-term solar irradiation prediction include the BP artificial neural network (BPNN) [7], the RBF neural network (RBFNN) [8], the extreme learning machine (ELM) [9] and the support vector machine (SVM) [10]. BPNN has good self-organization and adaptive processing ability and can solve the nonlinear fitting problem in irradiation prediction, but it is prone to local optimal solutions. RBFNN does not have the local minimum problem but places high demands on the feature set; when the data are not sufficient, it has low prediction accuracy. ELM has randomly generated initial weights, which can lead to over-fitting or instability. SVM transforms the prediction problem into a quadratic programming problem from the perspective of risk minimization and obtains the global optimum [11]; however, the selection of the kernel function and the optimization of parameters are complex and the prediction result is unstable.
Solar irradiation is influenced by various natural environmental factors [12], such as air pressure, precipitation, humidity and temperature [13,14]. Hence an irradiation prediction model based on an intelligent algorithm is more complicated than traditional load forecasting [15]. Furthermore, because the meteorological environment differs between regions, a unified irradiation prediction model cannot meet the needs of irradiation prediction in all places. Therefore, the historical data of each specific region should be analyzed separately and optimal prediction models with different feature subsets should be designed for different areas [16,17].
In order to reduce the complexity of the predictor, feature selection is used to reduce the dimension of the feature set [18,19]. The feature selection method commonly used for irradiation prediction is the Filter method [20,21]. In the Filter method, once the feature importance values are obtained, the optimal feature subset can be determined by SFS or Sequential Backward Selection (SBS). Methods for measuring the correlation of features include the Pearson Correlation Coefficient (PCC) [22,23], Mutual Information (MI) [24,25] and so forth. Although PCC and MI can analyze the correlation between features and solar irradiation, they cannot analyze the information redundancy between features in the subset. The redundant information among highly correlated meteorological features leads to high complexity and low prediction accuracy. On the basis of MI, Conditional Mutual Information (CMI) also measures the redundancy of features in the process of calculating feature importance [26,27]. Therefore, using CMI to rank feature importance can further reduce the influence of informational redundancy in the feature subset while preserving the strong correlation between the selected features and the solar radiation [28].
In order to obtain reliable feature selection results, the predictor should have few parameters and stable prediction accuracy during the feature selection process. GPR is a machine learning method based on Bayesian theory and statistical theory. It performs well on high-dimensional, small and nonlinear data sets. GPR has few parameters to optimize, strong generalization ability and stable, accurate forecast results [29]. Thus, it can be used efficiently to predict solar irradiation [30].
In order to reduce the information redundancy in the feature set and improve the prediction accuracy, a feature selection method based on CMI and GPR for solar irradiation prediction is proposed. Firstly, CMI is adopted as the feature importance analysis method and used to calculate the importance of each feature. Secondly, the SFS method based on CMI and GPR with 10 different covariance functions is used to choose the optimal feature subset for solar irradiation forecasting, with prediction accuracy as the evaluation index. Finally, the optimal predictor with the highest forecast accuracy is constructed from the optimal feature subset and the best covariance function. Commonly used methods serve as contrast tests to prove the superiority of the proposed method. In order to prove its advantage and feasibility, measured solar irradiation data from the Solar Radiation Research Laboratory (SRRL) [31], Oak Ridge National Laboratory (ORNL) [32] and the Natural Energy Laboratory of Hawaii Authority (LELH) [33] are used in the numerical experiments; each data set contains the same feature types, so feature sets with the same structure are built.

Solar Irradiance Forecasting Using CMI and GPR
The new method is mainly composed of two components: CMI and GPR. CMI is used to build the optimal feature subset, and the optimal predictor is based on GPR. The combination of the optimal feature subset and the optimal predictor makes up the optimal prediction method.
To build the optimal feature subset, the importance values of the features are first calculated by CMI. Then the features are ranked in descending order of CMI value. Finally, feature selection is carried out by combining GPR with the SFS method according to this ranking, and the optimal feature subset is the one with the minimum prediction error.
To build the GPR predictor, ten GPR models with different covariance functions are constructed. The best covariance function is selected in the experiment and the optimal predictor is constructed from it. The optimal prediction method is then determined by the optimal feature subset and the optimal predictor. The methodology of the proposed method is shown in the red box of Figure 1.

When the optimal prediction method is determined, the solar irradiation prediction experiment is carried out using this method, as shown in the blue box of Figure 1.
The details about CMI and GPR are explained in the following parts.

Conditional Mutual Information
CMI makes each newly selected feature in the subset strongly correlated with the solar irradiance while containing the least redundant information relative to the features already selected.
Suppose that X and Y are two random variables and p(x, y) is the joint probability distribution of X and Y. The mutual information between X and Y is expressed as:

I(X; Y) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]

When the irradiation value is X, the candidate feature is Y and the already-selected feature is Z, the CMI between X and Y given Z is expressed as:

I(X; Y | Z) = Σ_{x,y,z} p(x, y, z) log [ p(x, y | z) / (p(x | z) p(y | z)) ]

I(X; Y|Z) refers to the information shared between X and Y given that Z is the selected feature. If Y and Z contain the same amount of information about X, the values of I(X; Z) and I(X; Y, Z) are equal and the value of I(X; Y|Z) is zero. If Y contains information about X that is not related to Z, the value of the CMI is nonzero. The lower the correlation of Y with Z, the larger the shared information I(X; Y|Z) and therefore the larger the CMI value. CMI thus takes full account of the information redundancy between candidate features and selected features: the candidate feature with the largest CMI value is both strongly correlated with the target and minimally redundant with the selected features. Therefore, it can effectively reduce the redundant information in the optimal feature subset for short-term solar irradiation prediction.
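As a concrete illustration (not part of the original method description), the discrete form of I(X; Y|Z) above can be estimated from samples by binning the variables first; the function name, the bin count and the histogram-based estimator are illustrative choices:

```python
import numpy as np

def conditional_mutual_information(x, y, z, bins=8):
    """Histogram-based estimate of I(X; Y | Z) for 1-D samples.

    x: target (irradiance), y: candidate feature, z: already-selected feature.
    Continuous values are discretized into equal-width bins first.
    """
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    zd = np.digitize(z, np.histogram_bin_edges(z, bins)[1:-1])

    cmi = 0.0
    for zv in np.unique(zd):
        mask = zd == zv
        pz = mask.mean()                       # p(Z = zv)
        # joint distribution of (X, Y) conditioned on Z = zv
        pxy, _, _ = np.histogram2d(xd[mask], yd[mask], bins=bins)
        pxy /= mask.sum()
        px = pxy.sum(axis=1, keepdims=True)    # p(x | z)
        py = pxy.sum(axis=0, keepdims=True)    # p(y | z)
        with np.errstate(divide="ignore", invalid="ignore"):
            term = pxy * np.log(pxy / (px * py))
        cmi += pz * np.nansum(term)            # p(z) * I(X; Y | Z = zv)
    return cmi
```

A feature that merely repeats information already carried by the selected feature Z scores near zero, which is exactly the redundancy-suppressing behavior described above.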

Gauss Process Regression
In the process of feature selection, different feature subsets have different dimensions and feature types, so it is difficult to ensure that a predictor performs well on all of them with the same parameters. Therefore, a predictor with few parameters and stable prediction accuracy ensures the stability and credibility of the feature selection.
A Gaussian Process (GP) is a set of random variables, any finite number of which have a joint Gaussian distribution. It is completely determined by its mean function and covariance function:

m(x) = E[f(x)],   k(x, x') = E[(f(x) − m(x))(f(x') − m(x'))]

where x, x' ∈ R^d are arbitrary input vectors, m(x) is the expectation of f(x) and k(x, x') is the covariance of f(x) and f(x'). The GP can therefore be written as f(x) ~ GP(m(x), k(x, x')). For irradiation prediction, the following model is used:

y = f(x) + ε

where x is the input feature vector, f is the function value and y is the observed value with noise. Assuming the noise is ε ~ N(0, σ_n²), the prior distribution of y can be obtained:

y ~ N(0, K(X, X) + σ_n² I_n)

The joint prior distribution of the observed values y and the predicted value f* is:

[y; f*] ~ N( 0, [ K(X, X) + σ_n² I_n,  K(X, x*);  K(x*, X),  K(x*, x*) ] )

where K(X, X) = K_n = (k_ij) is an n × n symmetric positive-definite covariance matrix whose entries k_ij = k(x_i, x_j) measure the correlation between feature vectors x_i and x_j; K(X, x*) = K(x*, X)^T is the n × 1 covariance matrix between the test point x* and the training inputs X; K(x*, x*) is the covariance of the test point x*; and I_n is the n-dimensional identity matrix. The posterior distribution of the predicted value f* can then be calculated as:

f̄* = K(x*, X) [K(X, X) + σ_n² I_n]^(−1) y   (9)

cov(f*) = K(x*, x*) − K(x*, X) [K(X, X) + σ_n² I_n]^(−1) K(X, x*)   (10)

where μ* = f̄* and σ²_{f*} = cov(f*) are the mean value and variance of the predicted value f* corresponding to the test point x*.
GPR can use different covariance functions. The covariance function determines how the response at one point x_i is affected by responses at other points x_j, i ≠ j, i = 1, 2, ..., n. The most commonly used covariance function is the Squared Exponential kernel:

k(x_i, x_j) = σ_f² exp( −(x_i − x_j)^T (x_i − x_j) / (2σ_l²) )

In this equation, σ_l is the length scale of the feature data and σ_f is the signal standard deviation. The length scale roughly defines how far apart the input values x_i can be for the response values to become uncorrelated. Both σ_l and σ_f must be greater than 0.
Once σ_l and σ_f are determined, the predicted value f̄* and variance σ²_{f*} can be obtained using Equations (9) and (10).
In order to obtain an accurate point prediction, GPR needs a loss function L(y, y*), which specifies the loss incurred by predicting the value y* when the true value is y; for example, the loss function could be the absolute deviation between the prediction and the truth. The goal is to make the point prediction y* that incurs the smallest loss. The best prediction, in the sense that it minimizes the expected loss, is:

y*_opt | x* = argmin_{y*} ∫ L(y, y*) p(y | x*, D) dy

where x* is the input corresponding to y* and D is the training data set. When the predictive distribution is Gaussian, the mean and the median coincide; indeed, for any symmetric loss function and symmetric predictive distribution, the best prediction y* is the mean of the predictive distribution.
The GPR model is established by applying the principle of probability distributions, transforming the prior distribution into the posterior distribution within the Bayesian framework. GPR has few parameters to set: they are obtained automatically through the training process, avoiding a complex parametric optimization, and few factors affect its prediction stability [34]. Therefore, it is well suited to feature selection for solar irradiation forecasting.
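The posterior mean and variance of Equations (9) and (10) with the Squared Exponential kernel can be sketched in a few lines of NumPy. The function names, the zero prior mean and the fixed hyperparameters are illustrative assumptions; the paper's GPR determines its hyperparameters adaptively during training:

```python
import numpy as np

def se_kernel(A, B, sigma_f=1.0, sigma_l=1.0):
    """Squared Exponential covariance k(x, x') = sigma_f^2 exp(-r^2 / (2 sigma_l^2))."""
    r2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-r2 / (2 * sigma_l**2))

def gpr_predict(X, y, X_star, sigma_n=0.1, sigma_f=1.0, sigma_l=1.0):
    """Posterior mean and variance of f* at test inputs X_star (zero prior mean)."""
    K = se_kernel(X, X, sigma_f, sigma_l) + sigma_n**2 * np.eye(len(X))
    K_s = se_kernel(X, X_star, sigma_f, sigma_l)      # K(X, x*)
    K_ss = se_kernel(X_star, X_star, sigma_f, sigma_l)
    alpha = np.linalg.solve(K, y)                     # [K + sigma_n^2 I]^-1 y
    mu = K_s.T @ alpha                                # Equation (9)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)      # Equation (10)
    return mu, np.diag(cov)
```

The returned variance is what a symmetric-loss decision rule ignores when it reports the posterior mean as the point prediction.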

Feature Importance and Selection Analysis
In this section, Part 3.1 describes the construction of the feature set. An intuitive analysis of the feature sets is carried out in Part 3.2. In Part 3.3, feature importance is further evaluated using CMI, MI and PCC. In order to select the best feature subset, sequential forward feature selection is described in Parts 3.4 and 3.5. In Part 3.6, ten covariance functions of GPR are compared to select the best one. In Part 3.7, the feature selection experiments with contrast predictors are carried out and the optimal feature subsets and predictors are determined.

The Construction of the Data Set
In order to construct the original data set for the experiment and verify the effectiveness of the method, the measured data collected from SRRL, ORNL and LELH were used respectively. Each data set contains 7 types of meteorological information. In order to accurately identify important features and redundant features in the feature sets, 10 neighboring historical features are included for each type of information. At the same time, the Angstrom-Prescott linear regression equation

S / S_0 = a + b (n / N)

is usually used to relate the solar irradiation to the time features [35]. In this equation, S_0 is the extraterrestrial irradiation on a horizontal surface (Wh/m²), S is the annual horizontal global solar irradiation (kWh/m²), a and b are empirical coefficients, n is the actual sunshine duration in a day (hours) and N is the monthly average maximum bright sunshine duration in a day (hours). The empirical coefficients a and b depend on S and n. Considering that the solar irradiation is related to the diurnal and annual variations, the time features are added to the original feature set as date (day) and moment (hour). Therefore, the original feature set is made up of the following features: feature 1 is day and feature 2 is hour; features 3 to 12 are historical irradiation (S_{t-i}); features 13 to 22 are historical temperature (T_{t-i}); features 23 to 32 are historical relative humidity (H_{t-i}); features 33 to 42 are historical wind direction (Wd_{t-i}); features 43 to 52 are historical wind speed (Ws_{t-i}); features 53 to 62 are historical air pressure (P_{t-i}); and features 63 to 72 are historical precipitation (R_{t-i}), i = 1, 2, ..., 10. Among them, t is the time to be predicted and i is the sampling point. Because this work is part of PV output prediction, and according to the requirements of the national grid for short-term and ultra-short-term PV output forecasting, the sampling interval is 15 minutes. The measured data collected from SRRL, ORNL and LELH have the same feature types and data set structure.
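The 72-feature original set described above can be assembled from a table of 15-minute measurements roughly as follows; the column names and the pandas layout are assumptions for illustration, not the paper's actual preprocessing code:

```python
import pandas as pd

def build_feature_set(df):
    """Assemble the 72-feature original set: day and hour, plus 10 lags
    (15-minute steps) of each of the 7 meteorological series.

    `df` is assumed to have a DatetimeIndex and the columns named below.
    """
    series = ["irradiance", "temperature", "humidity",
              "wind_dir", "wind_speed", "pressure", "precipitation"]
    out = pd.DataFrame(index=df.index)
    out["day"] = df.index.dayofyear          # feature 1: date
    out["hour"] = df.index.hour              # feature 2: moment
    for col in series:
        for i in range(1, 11):               # lags t-1 ... t-10
            out[f"{col}_t-{i}"] = df[col].shift(i)
    return out.dropna()                      # drop rows without a full lag history
```

The resulting frame has exactly 2 + 7 × 10 = 72 columns, matching the dimension of the original feature set.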

Analysis of Original Feature Set
When using the data from SRRL, the relationship between the meteorological features and solar irradiation can be analyzed using Figure 2.

The data shown in Figure 2a are randomly selected from 18 February to 24 February 2015. It can be seen from Figure 2a that the values of solar irradiation, pressure, relative humidity and temperature have obvious daily periodicity, while wind speed, wind direction and precipitation show obvious randomness. In order to show the trend of changes more clearly, the fifth day (in the area of the red box) is selected from Figure 2a for detailed analysis and displayed in Figure 2b, which shows the measured data from 7 to 18 o'clock. From 7 to 12 o'clock, pressure and relative humidity decline while temperature and irradiation increase. From 12 to 18 o'clock, pressure and relative humidity decrease slightly, temperature continues to rise and irradiation begins to decline, while the changes of wind speed and wind direction show no significant correlation with the solar irradiation.

Feature Importance Analysis
To analyze the importance of features, three different evaluation methods are used: PCC, MI and CMI. In Figure 3, the feature importance values are calculated by CMI, MI and PCC respectively. The measured data are collected from the whole year of 2015 in the SRRL data set, so the importance values contain the information of the whole year. In order to show the superiority of CMI in analyzing feature importance and redundancy, the top 12 features are marked in red in Figure 3. As shown in Figure 3, different importance criteria lead to obviously different rankings of feature importance. Compared with PCC and MI, the top 12 features of CMI cover more feature types (4 types). The top 12 rankings by feature importance are shown in Table 1, where the results of the different measured data sets and calculation methods can be compared.

Using the data of SRRL as an example to evaluate the ability of the different methods to handle feature correlation and redundancy: as shown in Figure 3 and Table 1, S_{t-1} is the most important feature under all 3 methods in the SRRL data set. S_{t-2} is the feature most similar to S_{t-1}; it is ranked immediately after S_{t-1} by PCC and MI, but it falls below twelfth in the CMI ranking. Because the feature sets selected by PCC and MI include many features of the same type that are close in time, they contain a large amount of redundant information. The feature set selected by CMI includes more feature types and fewer features within the same type. This clearly shows that CMI can evaluate the redundancy of information between features.
In addition, the features common to the PCC, MI and CMI top-12 rankings include S_{t-1}, S_{t-3}, S_{t-4}, S_{t-5}, S_{t-6}, T_{t-1} and so forth. The top 12 features of PCC include 3 types of features: historical solar irradiation, temperature and relative humidity. The top 12 features obtained by MI mainly include 3 types: historical irradiation, temperature and time. But the top 12 features obtained by the CMI method include four types: historical irradiation, temperature, wind speed and time. Therefore, CMI fully considers the redundancy between features of the same type when calculating feature importance.
Using the same method, the measured data of ORNL and LELH are also used to analyze feature importance. From Table 1, it can be seen that the CMI ranking usually contains more feature types among the top 12 than MI and PCC for the ORNL and LELH data. This proves that CMI can effectively analyze the informational redundancy between features on different data sets as well.
At the same time, Table 1 shows that data sets collected from different locations produce different rankings. Therefore, feature selection needs to be carried out anew whenever a different data set is used.

Data Description and Evaluation Indicators of Feature Selection
The measured data of SRRL, ORNL and LELH in 2015 are selected as the training and validation sets. Because the values of solar irradiation greater than zero are mainly concentrated between 7:00 and 19:00, the irradiation values to be predicted are located in this time domain [36].
In this experiment, the validation set is made up of four weeks of data randomly selected from spring, summer, autumn and winter of 2015 respectively; the remaining data constitute the training set. The experiment uses data with a time interval of 15 minutes. In order to achieve the prediction goal of 1 hour ahead, a rolling forecast with four continuous steps is performed; the construction of the original feature set and the prediction goal are shown in Figure 4. In Figure 4, the original feature set for feature selection is made up of features 1 to 72; the features marked with t are the original input features at 7 o'clock and the features marked with t' are the original input features at 19 o'clock. Therefore, S(t) to S(t+3) are the 4 predicted values at 7 o'clock and S(t') to S(t'+3) are the 4 predicted values at 19 o'clock. As shown in Figure 4, the input dimension of the predictor is 72, which increases the complexity of the training process and reduces prediction efficiency. At the same time, there is a lot of redundant information in the original feature set, which keeps the prediction accuracy at a low level. Feature selection can reduce the dimension of the input features and remove these negative effects.
The SFS feature selection is carried out using MAPE as the measure:

MAPE = (1/m) Σ_{t=1}^{m} | (X̂_t − X_t) / X_t | × 100%

where X_t is the real value, X̂_t is the predicted value and m is the number of predicted (or real) values.
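The MAPE criterion can be written directly as:

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, the SFS selection criterion (in %)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))
```

Note that MAPE divides by the real value, which is why only daytime points (irradiation greater than zero) are evaluated.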

The Method of Feature Selection Based on CMI and GPR
The SFS process based on CMI and GPR is as follows: (1) Construct the original feature set; (2) Use CMI to calculate feature importance; (3) Carry out the SFS method according to the ranking of feature importance: GPR predictors are constructed with different feature subsets and the MAPE obtained from each GPR predictor is used as the index to determine the optimal feature subset; (4) Finally, the prediction model constructed with the optimal feature subset is used as the final model to predict the solar irradiation.
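Steps (1) to (4) can be sketched as a single forward pass over the CMI-ranked features; here `evaluate` stands in for training a GPR on a subset and returning its validation MAPE, and the function name is illustrative:

```python
def sequential_forward_selection(ranked_features, evaluate):
    """SFS over a CMI-ranked feature list.

    ranked_features: feature indices sorted by descending CMI importance.
    evaluate(subset): trains a predictor on the subset, returns validation MAPE.
    Returns the subset with the minimum error seen during the forward pass.
    """
    subset, best_subset, best_err = [], [], float("inf")
    for feat in ranked_features:
        subset.append(feat)               # grow the subset one feature at a time
        err = evaluate(list(subset))
        if err < best_err:                # keep the smallest-error prefix
            best_err, best_subset = err, list(subset)
    return best_subset, best_err
```

Because features enter in CMI order, the first features added are those that are both strongly correlated with irradiance and least redundant with one another.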


The Method of Feature Selection Based on CMI and GPR
The SFS process based on CMI and GPR is as follows: (1) construct the original feature set; (2) use CMI to calculate the importance of each feature; (3) carry out SFS according to the ranking of feature importance: GPR predictors are constructed with different feature subsets, and the MAPE obtained from each GPR predictor is used as the index to determine the optimal feature subset; (4) finally, the prediction model constructed with the optimal feature subset is used as the final model to predict solar irradiation.
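The steps above can be sketched as follows. The predictor is passed in as a callable (a GPR in the paper; here an ordinary-least-squares stand-in keeps the sketch dependency-free), and `ranked_idx` is assumed to be the CMI importance ranking produced in step (2); all names are illustrative.

```python
import numpy as np

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def sfs_by_importance(ranked_idx, fit_predict, X_tr, y_tr, X_val, y_val):
    """Sequential Forward Selection over a precomputed importance ranking.

    `fit_predict(Xt, yt, Xv)` trains a predictor (GPR in the paper) and
    returns predictions on Xv.  Features are added one at a time in
    descending importance; the subset with minimum validation MAPE wins.
    """
    errors = []
    for k in range(1, len(ranked_idx) + 1):
        cols = ranked_idx[:k]
        errors.append(mape(y_val, fit_predict(X_tr[:, cols], y_tr, X_val[:, cols])))
    best_k = int(np.argmin(errors)) + 1
    return ranked_idx[:best_k], errors

# Stand-in predictor: ordinary least squares with an intercept term.
def ols(Xt, yt, Xv):
    A = np.c_[Xt, np.ones(len(Xt))]
    w = np.linalg.lstsq(A, yt, rcond=None)[0]
    return np.c_[Xv, np.ones(len(Xv))] @ w
```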

Covariance Function Selection and Optimal Predictor Build of GPR
To select the best covariance function for GPR, a covariance function selection experiment is carried out. The expressions of the 10 covariance functions of GPR are shown in Table 2 [30].
Table 2. 10 kinds of covariance functions of GPR.

GPR Covariance Function Mathematical Expression Function Number
In Table 2, for functions 1 to 5: σ_l is the feature length scale, whose value does not change once the input is determined, and σ_f is the signal standard deviation; θ = [log σ_l, log σ_f], and r = sqrt((x_i − x_j)^T (x_i − x_j)) is the Euclidean distance between x_i and x_j. Functions 6 to 10 are the automatic relevance determination (ARD) covariance functions, where d is the number of features entered into the predictor, m = 1, ..., d, and σ_m is the feature length scale of the m-th feature, whose value changes from feature to feature; σ_f is again the signal standard deviation, and α is a parameter larger than zero whose value is determined by σ_m or σ_l.
The 10 covariance functions are used to construct 10 GPR models, respectively. The training set is used to train the 10 GPR models and the feature selection process is carried out on the validation set for each model. The validation set is constructed from data randomly selected from the four seasons of 2015, and the training set is made up of the remaining data of 2015.
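A minimal numeric sketch of GPR prediction with an ARD Exponential kernel (one length scale σ_m per feature) is shown below. Hyperparameters are fixed here for clarity, whereas the paper's GPR optimizes them adaptively during training; all names and values are illustrative.

```python
import numpy as np

def ard_exponential_kernel(A, B, sigma_m, sigma_f=1.0):
    """k(x, x') = sigma_f^2 * exp(-r), r = sqrt(sum_m (x_m - x'_m)^2 / sigma_m^2):
    an exponential kernel with one length scale per feature (ARD)."""
    D = (A[:, None, :] - B[None, :, :]) / sigma_m  # pairwise scaled differences
    r = np.sqrt((D ** 2).sum(axis=-1))
    return sigma_f ** 2 * np.exp(-r)

def gpr_predict(X_tr, y_tr, X_te, sigma_m, sigma_f=1.0, noise=1e-3):
    """Posterior mean of GP regression with fixed hyperparameters."""
    K = ard_exponential_kernel(X_tr, X_tr, sigma_m, sigma_f)
    K_star = ard_exponential_kernel(X_te, X_tr, sigma_m, sigma_f)
    alpha = np.linalg.solve(K + noise ** 2 * np.eye(len(X_tr)), y_tr)
    return K_star @ alpha

# Toy check: with low noise, training points are reproduced closely.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (20, 2))
y = X[:, 0] + X[:, 1]
y_hat = gpr_predict(X, y, X, sigma_m=np.ones(2))
```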
Figure 5 shows the process of feature selection for GPR combined with the 10 different covariance functions using the SRRL data set. Each GPR model is combined with PCC, MI and CMI to obtain the error statistics (measured by MAPE); the three methods of PCC-GPR, MI-GPR and CMI-GPR correspond to Figure 5a-c, respectively.
It can be seen from Figure 5 that the prediction error decreases with the increase of the feature dimension at first. When the feature dimension increases from 11 to 40, the error decreases only slightly and the predictors with different covariance functions reach different minimum MAPE values, which shows that prediction accuracy cannot be effectively improved by adding redundant features. The black circles in Figure 5 mark the minimum MAPE values. In this range, the error values are mainly distributed between 10% and 20%. When the feature dimension increases beyond 41, all of the error values are greater than the minimum error.
Table 3 shows the dimension of the optimal feature subset and the minimum MAPE values obtained in the covariance function selection experiment using the three different data sets of SRRL, ORNL and LELH. Using the SRRL data, the minimum MAPE of PCC-GPR is 9.825%, with the ARD Exponential Kernel as covariance function and a feature set dimension of 15. The minimum MAPE of MI-GPR is 9.860%, with the ARD Exponential Kernel and a feature dimension of 36. The minimum MAPE of CMI-GPR is 8.707%, with the ARD Exponential Kernel and a feature dimension of 14. The minimum MAPE of CMI-GPR is thus 1.153% lower than that of MI-GPR and 2.118% lower than that of PCC-GPR.
The dimension of the optimal feature subset of CMI-GPR is the smallest: 22 less than that of MI-GPR and 1 less than that of PCC-GPR. Although the feature set dimension of PCC-GPR is only 1 more than that of CMI-GPR, its MAPE is 2.118% larger.
In summary, the ARD Exponential Kernel shows the best prediction performance, so it is selected as the best covariance function of GPR. In the same way, the covariance function selection experiment is performed using the other two locations, ORNL and LELH.
In comparison, CMI-GPR with the ARD Exponential Kernel (abbreviated as CMI-GPR AEK) is the optimal predictor when the SRRL and ORNL data sets are used; the optimal predictor with its optimal feature subset has better prediction accuracy. When the LELH data set is used, the optimal predictor is CMI-GPR built with the ARD Rational Quadratic kernel (abbreviated as CMI-GPR ARQ) with its optimal feature subset.

The Comparison Experiment of Feature Selection
CMI, MI and PCC are combined with BPNN and SVR, respectively, as contrast experiments for the proposed method, and the results of feature selection are analyzed in this part. In order to show the differences between predictors, the same training and validation sets as for the GPR models are used in the comparison experiment.
In the contrast experiment, the numbers of input-layer and hidden-layer nodes of the BPNN are set according to Kolmogorov theory: the number of input-layer nodes is n1 and the number of hidden-layer nodes is n2, with n2 = n1 + 1. In the process of feature selection, n1 and n2 are adjusted as the dimension of the input features changes [37-39].
The RBF kernel is selected as SVR's kernel function. The cross-validation method is used to determine the parameters of SVR, such as the penalty factor c and the variance coefficient g [40], and the best combination of parameters is obtained; the search range of c and g is set to [−10, 10] [41,42]. The parameters of GPR, such as the feature length scale σ_l, the signal standard deviation σ_f and the per-feature length scales σ_m, can be acquired automatically during training according to the dimension and length of the features [34], so parameter optimization is simplified.
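The c-g search can be sketched as a standard grid search over powers of two in [−10, 10]. This is an illustrative sketch using scikit-learn rather than the authors' toolchain; the grid step, fold count and synthetic data are assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Candidate values 2^k for k in [-10, 10]; cross-validation picks the
# (C, gamma) pair with the best validation score.
param_grid = {
    "C": (2.0 ** np.arange(-10, 11, 2)).tolist(),
    "gamma": (2.0 ** np.arange(-10, 11, 2)).tolist(),
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_absolute_percentage_error")

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (80, 3))
y = 2.0 + np.sin(4 * X[:, 0]) + 0.05 * rng.standard_normal(80)
search.fit(X, y)
best_c, best_g = search.best_params_["C"], search.best_params_["gamma"]
```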
Figure 6 shows the process of feature selection based on GPR AEK, SVR and BPNN combined with CMI, MI and PCC, respectively, using the SRRL data. Figure 6a shows the process of feature selection by GPR AEK combined with CMI, MI and PCC. When the first 10 features are added to the predictor, the prediction error is reduced significantly; as new features are added, the error continues to decrease and then reaches the minimum MAPE. The minimum MAPE is 9.025% with a feature subset dimension of 14. In terms of MAPE, CMI-GPR AEK obtains the minimum value (9.025%), while the minimum MAPE of MI-GPR AEK is 9.154% and that of PCC-GPR AEK is 9.193%; therefore, CMI-GPR AEK has the highest accuracy.
Paying attention to the types and dimension of the feature subsets, the following can be observed: the dimension of the optimal feature subset is 14 for CMI-GPR AEK, 17 for MI-GPR AEK and 24 for PCC-GPR AEK. Among the three optimal feature subsets, CMI-GPR AEK contains four types of features: historical irradiation, temperature, wind speed and time. MI-GPR AEK contains four types of features: historical irradiation, moment, temperature and relative humidity. PCC-GPR AEK contains three types of features: historical irradiation, temperature and relative humidity.
Combining the above analysis, the following conclusions can be obtained: the optimal feature subset of CMI-GPR AEK contains more types of features and has the lowest dimension, while the predictor of CMI-GPR AEK has higher accuracy than MI-GPR AEK and PCC-GPR AEK. Therefore, the subsets of MI-GPR AEK and PCC-GPR AEK contain some redundant and invalid information, which results in lower accuracy and a higher feature subset dimension.
Figure 6b shows the feature selection processes of CMI, MI and PCC combined with SVR. In terms of error, the minimum MAPE is 10.150%, obtained by CMI-SVR; in terms of feature dimension, CMI-SVR is also the lowest, at 13.
As shown in Figure 6c, the feature selection results of BPNN with CMI, MI and PCC are obtained. CMI-BPNN gives the best results, with a minimum MAPE of 10.652% and a feature subset dimension of 16.
Based on the above analysis, CMI-GPR AEK, CMI-SVR and CMI-BPNN are the three best predictors after feature selection using the SRRL data set.
In order to compare feature selection across data sets, the experimental results of ORNL and LELH are sorted out. It can be seen from Table 4 that CMI-GPR AEK and CMI-GPR ARQ are the optimal predictors when using the ORNL and LELH data sets, respectively. The errors and feature subset dimensions for the different data sets are shown in Table 4. The results show that CMI-GPR AEK and CMI-GPR ARQ have the highest accuracy and a low optimal feature subset dimension for the different data sets.

Prediction Experiment of Actual Measured Irradiation Data
In this section, the experiment of solar irradiation prediction is carried out. The optimal feature subsets and optimal predictors are those constructed in the feature selection experiments of Section 3. Comparative prediction methods with different feature selection methods and a basic method with an established feature set are used to testify to the validity of the proposed method; the established feature set is built with reference to [43].

Data Description and the Construction of Predictors with Optimal Subsets
In order to demonstrate the universal adaptability and effectiveness of the proposed method at different times and under different weather conditions and seasons, the measured data of SRRL, ORNL and LELH in 2016 are used for testing. The test sets are selected randomly from spring, summer, autumn and winter. In the experiment, two covariance functions (ARD Exponential Kernel and ARD Rational Quadratic) are used in GPR to build the optimal GPR predictors (GPR AEK and GPR ARQ), while SVR and BPNN serve as contrast predictors.
The construction of optimal features subsets is shown in Tables 3 and 4. For example, the optimal subsets of CMI-GPR AEK are composed of the first 14 features in the ranking of feature importance when the SRRL data set is used.

Evaluation Indicators
As evaluation indicators, Mean Absolute Error (MAE), Relative Mean Absolute Error (rMAE), Root Mean Square Error (RMSE) and Relative Root Mean Square Error (rRMSE) are used in addition to MAPE [23]. The error formulas are as follows:

$\mathrm{MAE} = \frac{1}{m}\sum_{t=1}^{m}\left|\hat{X}_t - X_t\right|$, $\quad \mathrm{rMAE} = \frac{\mathrm{MAE}}{\frac{1}{m}\sum_{t=1}^{m} X_t} \times 100\%$

$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{t=1}^{m}\left(\hat{X}_t - X_t\right)^2}$, $\quad \mathrm{rRMSE} = \frac{\mathrm{RMSE}}{\frac{1}{m}\sum_{t=1}^{m} X_t} \times 100\%$

where $X_t$ is the real value, $\hat{X}_t$ is the predictive value and m is the number of predicted values; the relative errors are normalized by the mean of the measured values.
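These indicators can be computed together as follows (a sketch; the relative errors here are normalized by the mean of the measured values, and the function name is illustrative):

```python
import numpy as np

def irradiance_errors(actual, predicted):
    """Return MAE, rMAE, RMSE and rRMSE; relative errors are in percent,
    normalized by the mean of the measured (actual) values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    e = predicted - actual
    mae = np.abs(e).mean()
    rmse = np.sqrt((e ** 2).mean())
    mean_x = actual.mean()
    return {"MAE": mae, "rMAE": 100.0 * mae / mean_x,
            "RMSE": rmse, "rRMSE": 100.0 * rmse / mean_x}

errs = irradiance_errors([100, 100, 100, 100], [110, 90, 110, 90])
```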

Prediction Experiment
In order to compare the accuracy of the different predictors with their optimal feature subsets, prediction experiments are carried out in spring, summer, autumn and winter using the data collected from the three locations. When the SRRL data are used, four optimal prediction methods are compared, namely CMI-GPR AEK, CMI-GPR ARQ, CMI-SVR and CMI-BPNN. The predicted results are shown in Figure 7 and the error statistics in Figure 8. Figure 7 shows the results of solar irradiation forecasting in the four seasons by the four optimal predictors with their optimal feature subsets, and Figure 8 shows the corresponding error distributions. As Figure 8 shows, a method has higher predictive accuracy when the absolute value of the error is closer to 0. The errors of the different prediction methods (not all of them optimal) under the different feature sets are shown in Table 5.
Figure 7a shows the results of a randomly selected weekly irradiation prediction experiment in spring. As shown in Figure 7a, it is usually sunny in spring and the predictors have high predictive accuracy. Figure 8a-d shows the error distributions between the real and predictive values of solar irradiation in spring for CMI-GPR AEK, CMI-GPR ARQ, CMI-SVR and CMI-BPNN, respectively. As Figure 8 shows, the two most precise prediction methods are CMI-GPR AEK and CMI-GPR ARQ: Figure 8a shows that the errors are concentrated between −50 W/m2 and 50 W/m2, while Figure 8c,d shows that the predictive error distributions of CMI-SVR and CMI-BPNN are relatively dispersed.
As Table 5 shows, the MAPE of CMI-GPR AEK is the lowest. For the other error indicators, the rRMSE of CMI-GPR AEK decreases by about 4.4% and the rMAE by about 3.564% compared with CMI-SVR. CMI-GPR AEK also shows better predictive accuracy than CMI-GPR ARQ and CMI-BPNN; overall, it has the highest predictive accuracy.
Figure 7b shows the results of a randomly selected weekly irradiation prediction experiment from summer. In Figure 7b, solar irradiation fluctuates strongly in summer, especially on the 2nd, 6th and 7th days. Figure 8e-h shows the error distributions of CMI-GPR AEK, CMI-GPR ARQ, CMI-SVR and CMI-BPNN in summer, respectively. The number of error values between −100 W/m2 and 100 W/m2 is largest for CMI-GPR AEK; therefore, CMI-GPR AEK has the highest prediction accuracy. As Table 5 shows, the error values in summer increase significantly compared with spring, and the CMI-GPR AEK predictor has the smallest error; the other error indicators are given in Table 5.
Figure 7c shows the results of a randomly selected weekly irradiation prediction experiment from autumn. As shown in Figure 7c, the overall trend in autumn is very unstable and the predictive accuracy on the 2nd, 3rd and 5th days is significantly reduced. Figure 8i-l shows the error distributions of CMI-GPR AEK, CMI-GPR ARQ, CMI-SVR and CMI-BPNN in autumn, respectively. The predictor CMI-GPR AEK is the best: compared with it, the error of CMI-GPR ARQ increases by 6.958% in MAPE, 7.205% in rRMSE and 2.161% in rMAE. The predictive errors of CMI-SVR and CMI-BPNN are obviously worse than those of CMI-GPR AEK and CMI-GPR ARQ.
Figure 7d shows the results of a randomly selected weekly irradiation prediction experiment from winter. As shown in Figure 7d, the real values fluctuate obviously on the 2nd and 4th days of winter while they are less volatile on the other five days. Figure 8m-p shows the error distributions of CMI-GPR AEK, CMI-GPR ARQ, CMI-SVR and CMI-BPNN in winter, and CMI-GPR AEK has the best error distribution. As Table 5 shows, the MAPE of CMI-GPR AEK is 12.472%, while the MAPE values of CMI-GPR ARQ, CMI-SVR and CMI-BPNN are 16.628%, 16.942% and 16.992%, respectively.
Analyzing the prediction results over the whole year, CMI-GPR AEK has the highest predictive accuracy: its MAPE is 5.365%, which is about 3.299% lower than CMI-GPR ARQ, 7.647% lower than CMI-SVR and 7.871% lower than CMI-BPNN. As shown in Table 5, CMI-GPR AEK also performs best on the other error indicators.
To verify the effectiveness of the proposed method, the statistical errors of the suboptimal prediction methods are also shown in Table 5. From Table 5 it can be seen that the methods using PCC and MI generally have higher errors than those using CMI. For example, in spring the RMSE of CMI-GPR AEK is 25.917 W/m2 lower than that of MI-GPR AEK and 20.547 W/m2 lower than that of PCC-GPR AEK; in summer, the MAPE values of MI-GPR AEK and PCC-GPR AEK are 3.846% and 5.764% higher than that of CMI-GPR AEK.
To testify to the adaptability of the proposed method, verification experiments of solar irradiation are also carried out using the data of ORNL and LELH. In the verification experiments, GPR AEK shows higher accuracy than SVR, BPNN and GPR ARQ. To further compare the influence of different feature subsets on solar irradiation prediction, the error statistics of GPR AEK combined with CMI, MI and PCC (named CMI-GPR AEK, MI-GPR AEK and PCC-GPR AEK, respectively) and of GPR AEK with the constructed feature set (named GPR AEK) are shown in Table 6. As shown in Table 6, using the data of ORNL and LELH, CMI-GPR AEK shows lower errors than the other prediction methods; for example, with the ORNL data, the rRMSE of CMI-GPR AEK is 3.302% lower than GPR AEK, 2.221% lower than MI-GPR AEK and 2.983% lower than PCC-GPR AEK. Therefore, CMI-GPR AEK is the prediction method with the highest predictive accuracy.
Considering the irradiation prediction experiments on the three different data sets comprehensively, CMI-GPR AEK has the highest predictive accuracy and is therefore the best prediction method.

Conclusions
In order to determine the optimal feature subset for solar irradiation prediction and construct the optimal predictor, a new feature selection method for irradiation prediction based on CMI and GPR is proposed. This method avoids the negative effects of redundancy between features in the feature subset and improves the forecast accuracy through GPR with the ARD Exponential Kernel.
The following results are obtained: (1) When the importance of features is analyzed by CMI, an optimal feature subset with low information redundancy and strong correlation between the selected features is constructed; therefore, the influence of redundancy on irradiation prediction can be reduced.
(2) In the solar irradiation forecasting experiments, GPR shows higher prediction accuracy. It can determine its parameters automatically and avoid a complex parameter optimization process, which is an advantage for feature selection.
(3) The predictive ability of GPR with different covariance functions has been analyzed and, according to the experimental results, the ARD Exponential Kernel is chosen to construct the predictor. CMI-GPR AEK is the best prediction model, with a low feature dimension and the highest prediction accuracy. The dimension of the optimal feature set of CMI-GPR AEK is 14, which is 5 lower than PCC-GPR AEK and 11 lower than MI-GPR AEK. The MAPE of solar irradiation forecasting by CMI-GPR AEK is 3.299% lower than CMI-GPR ARQ, 7.647% lower than CMI-SVR and 7.871% lower than CMI-BPNN.

Figure 1. The flowchart of the proposed method.

Figure 2. The relationship between different meteorological features and solar irradiation. (a) The original data for a week in September 2015; (b) enlarged view of the data in the red box, i.e., the measured data from 7 to 18 o'clock.

Figure 3. The importance of features using different importance analysis methods.

Figure 4. The description of the original feature set and the prediction goal.

Figure 5. Feature selection process with different GPR covariance functions.

Figure 6. The process of feature selection with different predictors and different importance analysis methods.

Figure 7. The result of solar irradiation forecasting with different predictors using the optimal feature subset.

Figure 8. The histogram of the error by different predictors in the four seasons using the optimal feature subset.


Table 1. The importance ranking of features of different data sets with different importance analysis methods.

Table 3. Feature selection results with GPR using different covariance functions in different areas.

Table 4. Feature selection results in different areas.

Table 5. Error statistics of experimental results by SRRL.

Table 6. Error statistics of experimental results by ORNL and LELH.