1. Introduction
Daily total solar radiation is one of the most important parameters in meteorology, solar energy conversion, and renewable energy applications, particularly for sizing stand-alone photovoltaic (PV) systems. Knowledge of the amount of solar radiation falling on the surface of the earth is of prime importance to engineers and scientists involved in the design of solar energy systems. In particular, many design methods for thermal and photovoltaic systems require information about the daily radiation on a horizontal surface in order to predict the energy production of the system, and accurate prediction is essential for the stable operation of the power grid as well as the formulation of scheduling plans [1].
Many researchers have studied daily solar radiation (DSR) prediction. DSR prediction methods are usually divided into three categories: conventional physical models, mathematical statistical models, and machine learning. Conventional physical models predict solar radiation through physical analysis, data fitting, and the construction and calculation of complex mathematical models in the absence of measured meteorological data on total solar radiation intensity. Many authors have also succeeded with various clear-day models, including the simple half-sine model [2,3] and the Collares-Pereira and Rabl model [4]. However, physical models do not reflect the strong randomness of the solar radiation sequence; once the meteorological environment changes, their calculation accuracy is greatly reduced.
Traditional mathematical statistics models include regression analysis [5], time series analysis [6,7], gray theory [8], fuzzy theory [9,10], and the Kalman filter [11]. Trapero et al. (2015) [5] first applied a dynamic harmonic regression (DHR) model to forecast short-term direct and diffuse solar radiation in Spain. Huang et al. (2013) [6] used an autoregressive model with a dynamic system framework to predict solar radiation from meteorological factors; within this framework, the accuracy is 30% higher than that of a general neural network or stochastic model. Through the integration of the Fourier transform and a neural network, Fidan et al. (2014) [8] predicted hourly solar radiation in Izmir, Turkey. Olcan and Mahmudov (2016) [10] proposed a modified fuzzy time series (FTS) approach using eight different radiation mixing models; the results show that, compared with other fuzzy models and traditional time series methods, the proposed hybrid model-8 exhibits better performance. Akarslan et al. (2014) [11] first utilized a multi-dimensional linear predictive filtering model to predict solar radiation, and validated the two-dimensional linear predictive filtering model against traditional statistical forecasting methods through empirical analysis.
With the development of big data mining, machine learning technology has attracted widespread attention. For instance, the artificial neural network (ANN) [12,13,14,15,16,17] and the support vector machine (SVM) [18,19,20] have been widely applied to solar radiation prediction. Amrouche and Le Pivert (2014) [12] took advantage of spatial modeling and ANNs to predict daily total solar radiation at four locations in the United States, and the empirical results indicate that the proposed model achieves the expected accuracy. Benmouiza and Cheknane (2013) [13] used K-means to classify the input data, then used nonlinear autoregressive neural networks to model each category, and finally predicted the solar radiation of the test data through the corresponding model. Adel and Massi (2010) [15] used an ANN for solar irradiance prediction; a comparison between the forecasted energy and the energy produced by the grid-connected photovoltaic (GCPV) plant installed on the rooftop of the municipality of Trieste showed the advantages of the model. From the above, the conclusion can be drawn that ANN cannot perform well when the data set is insufficient. Ekici, B.B. [18] developed an intelligent model based on the least squares support vector machine (LSSVM) to forecast solar radiation for the next day. The number of days from 1 January, the daily average temperature, the daily maximum temperature, the sunshine duration, and the radiation of the previous day were used as inputs to predict the daily solar radiation. The results indicated that LSSVM is a superb approach for evaluating the amount of solar radiation at a specific location, with an accuracy of 99.294%. Sun et al. [19] put forward a decomposition-clustering-ensemble (DCE) learning method for solar radiation prediction. In the proposed DCE learning method, (1) Ensemble Empirical Mode Decomposition (EEMD) is used to decompose the original solar radiation data into several intrinsic mode functions (IMFs) and a residual component, (2) least squares support vector regression (LSSVR) is utilized to predict the IMFs and the residual component, and (3) the K-means method is used to cluster the prediction results of all components. An empirical analysis of solar radiation data from Beijing shows that, compared with other benchmark models, the Normalized Root Mean Square Error (NRMSE) and Mean Absolute Percentage Error (MAPE) generated by the DCE learning method are smaller, at 2.96% and 2.83%, respectively. For one-day-ahead forecasting, Meenal and Selvakumar [20] assessed the accuracy of SVM, ANN, and empirical solar radiation models with different combinations of input parameters, including month, latitude, longitude, bright sunshine hours, day length, relative humidity, and maximum and minimum temperature. Daily solar radiation forecasting models for different cities in India were evaluated using statistical measures. The results indicated that, compared with the ANN and empirical models, the SVM model with the most influential input parameters is superior. However, the choice of kernel function and kernel parameters greatly affects the accuracy of fitting and generalization.
The Extreme Learning Machine (ELM), originally put forward by Huang in 2004 [21], has a faster convergence speed and requires less human intervention than traditional neural networks, and it also avoids issues inherent in gradient-based learning, such as the choice of stopping criteria, learning rate, and number of learning epochs. In view of this, extreme learning machines are widely used in different forecasting areas, such as load forecasting [22,23], wind speed forecasting [24,25], electricity price forecasting [26], and carbon emission forecasting [27]. However, the randomly assigned input weight matrix and hidden layer biases of ELM are likely to impair its generalization ability. Therefore, an optimization algorithm is highly desirable to obtain the optimal input layer weights and hidden layer biases.
The Bat Algorithm (BA) [28] is a new meta-heuristic method that dynamically controls the switching between local search and global search and achieves better convergence. Owing to its superb local and global search performance compared with existing algorithms such as the genetic algorithm (GA) [29] and particle swarm optimization (PSO) [30], BA has been widely used in various optimization problems [31,32,33]. Liu et al. [31] proposed a novel hybrid bat algorithm for complex continuous optimization problems. Gupta et al. [32] proposed an optimized binary bat algorithm for the classification of different types of leukocytes. Lili et al. [33] applied the bat algorithm to minimize the total generator cost of a thermal power plant, and their experimental results showed that the bat algorithm saves approximately 1.23% compared to the actual cost.
Therefore, this paper uses BA to optimize the input weights and hidden biases of the extreme learning machine, combining BA's strong global and local search capabilities with ELM's fast learning speed and overcoming the inherent instability of ELM.
Considering the inherent complexity of solar radiation, which is influenced by many parameters, data preprocessing is expected beforehand [34,35]. The wavelet transform (WT) is the most commonly used data preprocessing method for decomposing time series and eliminating stochastic volatility. Tan et al. [36] succeeded in using wavelet decomposition to decompose the electricity price sequence into an approximate series and detail series, so that each sub-series could be forecasted separately by an appropriate time series model. The results show that WT can capture the complex features of non-stationarity, nonlinearity, and high volatility.
Based on the above studies, it can be found that the appropriate selection of influencing factors has a significant impact on the prediction of solar radiation. Nevertheless, most studies only emphasize the effects of these factors on solar radiation, ignoring the interrelationships between them. In fact, the information contained in the data overlaps, so computational efficiency is greatly reduced by the complexity of the network. Principal component analysis (PCA) simplifies the network structure and significantly improves operational efficiency and prediction accuracy by reducing the dimensionality of the pre-selected influencing factors while retaining most of their information. Sun et al. [37] used PCA to extract features from the original data and reduce the dimensionality of the LSSVM inputs to predict daily PM2.5 concentrations; experimental studies show that this method is superior to the single LSSVM model. Therefore, PCA is utilized in this paper with the intention of reducing the dimensionality of the data and improving prediction accuracy.
At present, most empirical studies on solar radiation prediction use data from a single region or country; few make predictions for different longitudes and latitudes simultaneously. In order to verify the validity and applicability of the proposed model, four solar radiation time series, from Beijing (latitude 40° N, longitude 116° E), New York (latitude 40° N, longitude 73° W), Melbourne (latitude 37° S, longitude 145° E), and São Paulo (latitude 23° S, longitude 46° W), are studied in this paper.
In summary, after the WT decomposition, the solar radiation series is split into an approximate series and a detailed series. The detailed series is then discarded, the partial autocorrelation function (PACF) is applied to the approximate series to determine its lag phases, and PCA is applied to the resulting historical radiation variables and the meteorological indicators to determine the input variables of the prediction model. Finally, a BA-optimized Extreme Learning Machine (ELM) is applied to obtain the predicted daily solar radiation. In order to verify the validity and superiority of the proposed model, four different sites were simulated in this paper. The main contributions of this article are as follows.
The factors affecting solar radiation considered in this paper include both meteorological indicators and historical solar radiation data;
ELM, a new type of neural network, is applied to solar radiation prediction, avoiding the shortcomings of slow learning, large training-sample requirements, and over-fitting found in previous studies;
The BA-optimized ELM application further improves the robustness and prediction accuracy of the model;
Implementation of WT greatly reduces the difficulty of solar radiation prediction;
This paper focuses on the correlation between influencing factors and uses PCA to reduce the dimensionality to improve computational efficiency and prediction accuracy;
This may be the first paper to study solar radiation prediction methods that can be applied to different parts of the world at the same time.
The structure of this paper is as follows:
Section 2 briefly introduces WT, PACF, PCA, BA, ELM, and BA-ELM, and the new hybrid prediction technique (PCA-WT-BA-ELM) is then discussed in detail.
Section 3 provides empirical analysis, which includes data collection, input selection, parameter settings, prediction results, and error analysis for four cities.
Section 4 shows the general conclusions based on the experimental results.
2. Methodology
2.1. Wavelet Transform
Wavelet decomposition and reconstruction are based on multi-resolution analysis, first proposed by Mallat [38], and constitute one of the most useful tools in signal analysis. Observation data usually consist of two parts: the true value and the error (i.e., noise). The true value (i.e., the useful signal) in the observed data differs from random noise in its time-frequency characteristics. The useful signal has salient local features in the time and frequency domains and generally appears as a low-frequency signal, whereas random noise has global characteristics in both domains and appears as a high-frequency signal in the frequency domain. According to these different time-frequency characteristics, multi-resolution analysis by wavelet transform can effectively separate the components of different frequencies and eliminate the random noise. Finally, through the inverse operation of the wavelet transform, the denoising of the original observation data is realized by wavelet reconstruction [39].
The wavelet transform is defined as the integral of the signal multiplied by scaled, shifted versions of a basic wavelet function, a real-valued function whose Fourier transform fulfills the admissibility criterion:

$$W_f(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} f(t)\, \Psi\!\left(\frac{t-b}{a}\right) dt \quad (1)$$

where $a$ is the scaling parameter and $b$ is the time-positioning or shifting parameter; both $a$ and $b$ can be continuous or discrete variables. $t$ represents time, $f(t)$ represents the original signal, $\Psi(\cdot)$ represents the mother wavelet function, and $W_f(a,b)$ is the result of the wavelet transform.
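To make the decomposition concrete, the following is a minimal Python sketch of multi-level wavelet decomposition and reconstruction using the PyWavelets library; the Daubechies-4 mother wavelet, the three-level decomposition, and the synthetic series are illustrative assumptions rather than choices prescribed by this paper.

```python
import numpy as np
import pywt

# Synthetic daily solar radiation series (stand-in for measured data).
rng = np.random.default_rng(0)
t = np.arange(365)
radiation = 20 + 10 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 2, t.size)

# Multi-level discrete wavelet decomposition: coeffs[0] is the
# approximation (low-frequency component), the rest are detail components.
coeffs = pywt.wavedec(radiation, "db4", level=3)  # illustrative choices

# Discard the high-frequency details (set them to zero) and reconstruct,
# which yields the denoised approximate series used later as model input.
coeffs_approx_only = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
approx_series = pywt.waverec(coeffs_approx_only, "db4")[: radiation.size]
```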
2.2. Bat Algorithm
The bat algorithm is inspired by the echolocation behavior of micro-bats, through which bats are able to detect prey and evade obstacles. It has the advantages of parallelism, fast convergence, distributed operation, and few parameters to adjust [33].
In the d-dimensional search space during the global search, bat $i$ has position $x_i^t$ and velocity $v_i^t$ at time $t$, which are updated according to Equations (2) and (3), respectively:

$$x_i^t = x_i^{t-1} + v_i^t \quad (2)$$

$$v_i^t = v_i^{t-1} + (x_i^{t-1} - x_*) F_i \quad (3)$$

where $x_*$ is the current global optimal solution and $F_i$ is the sonic wave frequency, given by Equation (4):

$$F_i = F_{min} + (F_{max} - F_{min})\, \beta \quad (4)$$

where $\beta$ is a random number within [0, 1] and $F_{max}$ and $F_{min}$ are the maximum and minimum sonic wave frequencies of bat $i$. In flight, each initial bat is randomly allocated a frequency drawn uniformly from $[F_{min}, F_{max}]$.
If a solution is selected from the current global optimal solutions in the local search, each bat generates a new candidate solution by a random walk according to Equation (5):

$$x_{new} = x_0 + \mu \bar{A}^t \quad (5)$$

where $x_0$ is a solution randomly selected from the current best solutions, $\bar{A}^t$ represents the mean loudness of the current bat population, and $\mu$ is a d-dimensional random vector within [−1, 1].
The impulse volume (loudness) $A_i$ and the impulse emission rate $R_i$ control the balance between exploration and exploitation. When a bat closes in on its prey, the volume $A_i$ declines while the emission rate $R_i$ ascends. The updates of $A_i$ and $R_i$ are expressed as Equations (6) and (7), respectively:

$$A_i^{t+1} = \gamma A_i^t \quad (6)$$

$$R_i^{t+1} = R_i^0 \left[1 - \exp(-\theta t)\right] \quad (7)$$

where $\gamma$ and $\theta$ are the attenuation coefficient of the volume and the enhancement factor of the search frequency, respectively; $\gamma$ lies within [0, 1] and $\theta > 0$.
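As a concrete illustration, here is a minimal Python sketch of the update rules in Equations (2)-(7); the population size, frequency range, and the constants γ and θ are illustrative assumptions, and the fitness function is a toy sphere function rather than anything from this paper.

```python
import numpy as np

def bat_algorithm(fitness, dim, n_bats=20, iters=100, f_min=0.0, f_max=2.0,
                  gamma=0.9, theta=0.9, lower=-1.0, upper=1.0, seed=0):
    """Minimal bat algorithm that minimizes `fitness` over [lower, upper]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, (n_bats, dim))    # bat positions
    v = np.zeros((n_bats, dim))                     # bat velocities
    A = np.ones(n_bats)                             # loudness A_i
    R0 = rng.uniform(0.0, 1.0, n_bats)              # initial emission rates
    R = R0.copy()
    fit = np.array([fitness(row) for row in x])
    best = x[fit.argmin()].copy()

    for step in range(1, iters + 1):
        for i in range(n_bats):
            F = f_min + (f_max - f_min) * rng.random()            # Eq. (4)
            v[i] = v[i] + (x[i] - best) * F                       # Eq. (3)
            cand = x[i] + v[i]                                    # Eq. (2)
            if rng.random() > R[i]:                               # local search
                cand = best + rng.uniform(-1, 1, dim) * A.mean()  # Eq. (5)
            cand = np.clip(cand, lower, upper)
            f_cand = fitness(cand)
            if f_cand < fit[i] and rng.random() < A[i]:           # accept solution
                x[i], fit[i] = cand, f_cand
                A[i] *= gamma                                     # Eq. (6)
                R[i] = R0[i] * (1 - np.exp(-theta * step))        # Eq. (7)
        best = x[fit.argmin()].copy()
    return best, float(fit.min())

# Toy usage: minimize the sphere function in five dimensions.
best_x, best_f = bat_algorithm(lambda z: float(np.sum(z ** 2)), dim=5)
```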
It has already been proven (Yang, 2012) [28] that the bat algorithm is potentially more powerful than PSO, GA, and Harmony Search, because BA combines the major advantages of these algorithms. Thanks to its parallelism, quick convergence, distributed operation, and few parameters to adjust, BA has been utilized in various areas.
2.3. Extreme Learning Machine
ELM is a novel algorithm based on the single hidden layer feedforward neural network (SLFN). Most traditional neural networks, being based on the gradient descent method, adjust their weights and biases through multiple iterations, which makes training slow and prone to falling into local optima. On account of their high sensitivity to the learning rate, their performance is also restricted [21].
To improve on the SLFN, ELM, as shown in Figure 1, randomly assigns the weights of the input layer and the thresholds of the hidden layer. Since no iteration is required, network learning is greatly accelerated. Once the number of hidden nodes is set, ELM can use the Moore-Penrose (MP) [40] generalized inverse matrix to calculate the output weights, which transforms the training procedure into solving a least squares problem. Moreover, ELM is more accurate than other neural networks [41]. The calculation steps of the standard ELM can be explained as follows:
The ELM consists of an input layer, a hidden layer, and an output layer, where $n$ is the number of input layer neurons, corresponding to the $n$ input variables $x_1, \dots, x_n$; $L$ is the number of hidden layer neurons; and $m$ is the number of output layer neurons, corresponding to the $m$ output variables $y_1, \dots, y_m$.
The connection weight matrix between the input layer and the hidden layer is $\omega = (\omega_{ij})_{L \times n}$, and the connection weight matrix between the hidden layer and the output layer is $\beta = (\beta_{jk})_{L \times m}$.
Let the input matrix of the training set with $Q$ samples be $X = (x_{ij})_{n \times Q}$ and the output matrix be $Y = (y_{ij})_{m \times Q}$. The hidden layer neuron thresholds are $b = (b_1, \dots, b_L)^T$, the hidden layer activation function is $g(x)$, and the expected output of the network is $T$. Therefore, ELM can be illustrated as

$$t_j = \sum_{i=1}^{L} \beta_i\, g(\omega_i \cdot x_j + b_i), \quad j = 1, 2, \dots, Q$$

where $\omega_i$ and $b_i$ are the input weights and bias of the $i$-th hidden neuron and $t_j$ is the network output for sample $x_j$.
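To tie the formulation above to something executable, here is a minimal NumPy sketch of standard ELM training: the input weights ω and hidden biases b are drawn at random, the output weights β are obtained in closed form via the Moore-Penrose pseudo-inverse, and prediction is a single forward pass. The sigmoid activation, layer sizes, and toy data are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """X: (Q, n) training inputs, T: (Q, m) targets; returns (W, b, beta)."""
    W = rng.uniform(-1, 1, (n_hidden, X.shape[1]))  # random input weights
    b = rng.uniform(-1, 1, n_hidden)                # random hidden biases
    H = sigmoid(X @ W.T + b)                        # (Q, L) hidden layer outputs
    beta = np.linalg.pinv(H) @ T                    # Moore-Penrose least squares
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W.T + b) @ beta

# Toy usage: fit a noisy sine curve with 30 hidden neurons.
rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
T = np.sin(X) + rng.normal(0, 0.05, X.shape)
W, b, beta = elm_train(X, T, n_hidden=30, rng=rng)
pred = elm_predict(X, W, b, beta)
```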
2.4. The Proposed Model
Although ELM performs as expected in most cases, its weaknesses affect its accuracy. During learning, non-optimal or unnecessary weight values and thresholds may reduce ELM's performance, leading to erratic results. Additionally, in some practical applications, ELM demands a large number of hidden layer nodes to achieve the expected results, which adds complexity and makes it prone to over-fitting.
In order to solve the above problems and obtain a stable network, an extreme learning machine based on the bat algorithm is proposed to guarantee that the input weights and bias thresholds are reasonably selected. The proposed model takes full advantage of BA's global search capability and ELM's rapid convergence rate and overcomes the inherent problems of ELM. Consequently, BA-ELM generalizes better, approximates functions better, and produces more stable simulation results.
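A hedged sketch of how this coupling could be coded: each BA candidate is a flattened vector holding ELM's input weights and hidden biases, and its fitness is the training RMSE of the ELM that those parameters induce. It reuses the `bat_algorithm`, `sigmoid`, and toy data (`X`, `T`) from the sketches above; the vector encoding and the RMSE objective are our assumptions, not details specified by the paper.

```python
import numpy as np

def make_fitness(X, T, n_hidden):
    """Build a BA fitness: flattened (W, b) -> ELM training RMSE."""
    n = X.shape[1]

    def fitness(theta):
        W = theta[: n_hidden * n].reshape(n_hidden, n)  # input weights
        b = theta[n_hidden * n :]                       # hidden biases
        H = sigmoid(X @ W.T + b)
        beta = np.linalg.pinv(H) @ T                    # closed-form output weights
        return float(np.sqrt(np.mean((H @ beta - T) ** 2)))

    return fitness

# Search space: n_hidden * n input weights plus n_hidden biases.
n_hidden = 30
dim = n_hidden * X.shape[1] + n_hidden
best_theta, best_rmse = bat_algorithm(make_fitness(X, T, n_hidden), dim=dim)
```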
Figure 2 shows the whole flowchart of daily solar radiation forecasting, which is divided into four parts.
Part 1 is designed for input selection, as sketched below. First, the meteorological indicators are chosen according to the Pearson correlation coefficient test, and the original historical solar radiation sequence is decomposed into two parts: an approximate series and a detailed series. The detailed series is abandoned, and PACF is applied to analyze the intrinsic relationships within the approximate series so as to determine the lag phases of the historical radiation. PCA is then used to reduce the dimensionality of the influencing factors, which comprise the meteorological indicators selected by the Pearson coefficient test and the historical radiation indicators selected by WT and PACF. The results serve as the inputs of BA-ELM.
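A minimal sketch of this input-selection step, assuming statsmodels for the PACF and scikit-learn for PCA; the lag cutoff, the 95% confidence bound, the retained-variance threshold, and the synthetic stand-in data are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(0)
approx_series = rng.normal(size=365).cumsum()  # stand-in for the WT approximation
meteo = rng.normal(size=(365, 4))              # stand-in Pearson-selected indicators

def select_lags(series, max_lag=20):
    """Keep lags whose partial autocorrelation leaves the 95% confidence band."""
    values = pacf(series, nlags=max_lag)
    bound = 1.96 / np.sqrt(len(series))
    return [k for k in range(1, max_lag + 1) if abs(values[k]) > bound]

def build_inputs(series, meteo, lags, var_ratio=0.95):
    """Stack lagged radiation with meteorological columns, then apply PCA."""
    max_lag = max(lags)
    lagged = np.column_stack(
        [series[max_lag - k : len(series) - k] for k in lags]  # x_{t-k} columns
    )
    features = np.hstack([lagged, meteo[max_lag:]])
    return PCA(n_components=var_ratio).fit_transform(features)  # keep 95% variance

lags = select_lags(approx_series)
inputs = build_inputs(approx_series, meteo, lags)  # rows align with targets at t >= max_lag
```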
Part 2 is the bat optimization algorithm (BA). BA is utilized to optimize the input layer weights and hidden layer biases of ELM so that the expected network can be achieved.
Part 3 is the training process of the extreme learning machine (ELM). The training set data are derived from Part 1, and the parameters of ELM are optimized by Part 2, so that the ELM model is obtained with a smaller training error.
Part 4 is the testing process of ELM. The test set data are derived from Part 1, and the trained model is provided by Part 3 to obtain the test set prediction values.