1. Introduction
With the continued consumption of coal, oil, natural gas and other resources, energy depletion and environmental pollution are becoming more serious [1]. Solar energy is green, clean and renewable, which has attracted widespread attention, and the urgent demand for environmental protection has promoted the rapid growth of solar energy systems worldwide [2,3,4]. Photovoltaic (PV) power generation is an efficient way to utilize solar energy, and the proportion of PV power generation is increasing in line with reductions in cost and improvements in technology [5]. PV output power is fluctuating and uncertain [6,7,8]; these characteristics complicate the optimal dispatching of the power system and are not conducive to the stability of renewable energy power systems [9,10,11]. These adverse impacts limit the grid connection rate of PV power generation and hinder the application of clean energy. Accurate prediction of PV power therefore supports the safe operation of renewable energy power systems and benefits the application of clean energy [12,13,14].
Most studies of short-term PV power prediction are based on the statistical analysis of historical data [15]. PV prediction models include linear models, nonlinear models and combination models. A linear model infers trends from historical PV data and then predicts the PV output power. Reikard [16] used autoregressive models to predict PV power and achieved good results. Li et al. [17] argued that the linear model was simple to implement but had poor flexibility and low prediction accuracy. A combination model combines the results of several different prediction models into one final result. Liu et al. [18] proposed a variable weight combination prediction model that integrates three different neural networks, and the prediction result of the combined model is more accurate. Combined prediction can integrate the advantages of each prediction method in specific conditions, but the prediction model and prediction process are more complicated.
The nonlinear models include the Artificial Neural Network (ANN), the Extreme Learning Machine (ELM) and the Support Vector Machine (SVM). These models are trained on historical PV data; the trained model is then applied to new input data for prediction. The advantages of the nonlinear models are relative simplicity and high accuracy. Zhu et al. [19] proposed an adaptive Back Propagation (BP) neural network model, which adapts to changes in time and external environment by updating the training data. Priyadarshi et al. [20] proposed an ANN method that applies calculated astronomical variables to produce both deterministic and probabilistic predictions of the PV output power. Li et al. [21] proposed a deep belief network model and predicted PV power generation in different seasons; their experiments show that the model achieves better prediction accuracy. Ni et al. [22] considered model uncertainty and noise and proposed an optimal prediction interval method based on the ELM for PV power generation prediction. J. Wang et al. [23] proposed an ELM model that updates its data automatically to predict PV power, and the model improves the prediction accuracy through continuous learning. Ni et al. [24] combined the upper and lower bound estimation method with the ELM to form a PV power generation interval prediction model, and used a heuristic algorithm to optimize the model, which improved its performance. Neural network and ELM models have good prediction accuracy but require a large amount of training data; their prediction accuracy decreases when the amount of training data is small.
SVM requires less training data, and many studies have applied SVM to predict PV power. The method proposed by Huang et al. [25] first classified the historical PV power generation data and then used support vector regression (SVR) to learn the classified data. Compared with plain SVR, the prediction accuracy of the method is enhanced. Bae et al. [26] proposed an SVM regression model that considers multiple meteorological factors: the meteorological data are first clustered, and the SVM is then trained on the clustered data for prediction. These SVM models improve prediction accuracy, but their parameters are not optimized, so the performance of the models is not fully exploited.
The regression performance of SVM is very sensitive to its parameters. Chen et al. [27] argued that SVM parameters had an important impact on regression performance; they focused on maximizing the prediction ability of SVM and demonstrated that its parameters should be optimized. Many studies have optimized the parameters of prediction models by using intelligent algorithms and achieved better results [28,29]. Eseye et al. [30] proposed a model consisting of a wavelet transform, an optimization algorithm and SVM. The parameters of SVM were optimized by the optimization algorithm, and the wavelet was applied to process bad data. The prediction accuracy was improved compared with seven other prediction strategies. The prediction model proposed by Lin and Pai [31] consisted of least squares support vector regression and seasonal decomposition. A genetic algorithm was applied to optimize the model and achieved better accuracy. Shang and Wei [32] proposed a PV power generation prediction method consisting of feature selection and support vector regression. The best candidate input is selected by feature selection and the SVM is optimized by a heuristic algorithm, which enhanced the prediction accuracy of the method. Lin et al. [33] improved the moth-flame optimization algorithm by using a mutation operator to increase population diversity. The improved algorithm was applied to optimize the PV prediction model, and experiments show that the optimized model achieves sufficient accuracy. VanDeventer et al. [34] proposed a genetic algorithm optimizing the SVM (GASVM) model for PV power prediction at a residential level; the root mean square error (RMSE) and mean absolute percentage error (MAPE) of GASVM are lower than those of the traditional SVM.
Prior studies have achieved good prediction accuracy in a single weather condition, but they do not consider different weather conditions and cannot guarantee that the proposed prediction models perform well across them. There is no unified method of PV prediction in different weather conditions. In research on prediction model optimization, the selection and improvement of the algorithm have not been studied in depth. Therefore, an improved whale optimization algorithm (IMWOA) optimizing an SVM model (IMWOA-SVM) for PV power prediction is proposed in this study. SVM is efficient and needs less training data, so it is more suitable as the prediction model when training data are scarce. SVM prediction accuracy improves when its parameters are optimized by an intelligent algorithm. The advantages of the whale optimization algorithm (WOA) include less code, fast execution and high optimization accuracy, which make it suitable for SVM optimization [35]. However, the WOA is prone to falling into a local optimal solution, which degrades the optimization result [36,37,38]. The objective of this study is to propose a high-precision PV power prediction method, symmetrically enhance the adaptability of the prediction method to different weather, reduce the amount of training data required by prediction methods, reduce the adverse effects of PV power fluctuations on the grid and promote the application of clean energy. Therefore, the WOA is improved by a variety of methods in this study, and the parameters of SVM are optimized by the IMWOA. To enhance the prediction stability of IMWOA-SVM in different weather conditions, wavelet threshold denoising is applied to preprocess the input data. The experimental results show that the IMWOA has stable optimization ability, and the prediction model based on the symmetry concept can achieve ideal prediction accuracy in different weather conditions. This study helps to enhance the stability of renewable energy power systems and is of great significance to sustainable development.
The work and innovation of this study include: (1) The IMWOA-SVM model is proposed. The proposed model requires less training data, can symmetrically adapt to various weather conditions, and has better prediction accuracy. (2) Wavelet soft threshold denoising is applied to preprocess the input data. The interference signal from the input data is reduced, and the prediction model has better prediction accuracy in various complex conditions. (3) IMWOA is proposed by combining WOA with tent chaos initialization, mutation disturbance and the differential evolution algorithm (DE), which significantly enhances the ability of IMWOA to search for the optimal solution and escape from the local optimal solution. (4) IMWOA is used to optimize SVM photovoltaic prediction model. IMWOA has better global search ability and improves the comprehensive performance of the PV prediction model.
This study is structured as follows. Section 2 presents the prediction model and its improvement. Section 3 describes the input data preprocessing and experimental arrangement. Section 4 discusses the experimental results. Lastly, conclusions are given in Section 5.
3. Input Data Preprocessing and Experimental Arrangement
Many meteorological factors affect PV power, each to a different degree. In this section, the Pearson coefficient is first applied to analyze the relationship between different meteorological factors and PV power, and the meteorological factors strongly related to the PV output power are selected as the prediction input data. Then, wavelet denoising is used to remove noise from the input data, and the input data are normalized. Finally, the experimental arrangement is described.
3.1. Selection of Input Data
PV output power is closely related to meteorological factors including light intensity, ambient temperature, relative humidity, etc. Figure 3 shows the variation curve of the PV output power from 8:00 to 18:00 in sunny and cloudy weather conditions. Sunny and cloudy weather are typical weather types with representative characteristics, and the PV output power differs greatly between them. In sunny weather, the meteorological factors change slowly and the PV output power follows a regular parabola. In cloudy weather, the meteorological factors fluctuate more, so the PV output power is also more volatile and lower overall. Therefore, the external meteorological factors determine the PV power, and they can be applied to accurately predict it. Ignoring key meteorological parameters increases the prediction deviation. However, considering too many meteorological factors greatly increases the workload of prediction, and including irrelevant factors also reduces the prediction accuracy. So, accurately measuring the correlation between the meteorological factors and PV power and selecting the appropriate factors as the input data of the prediction model is important for prediction accuracy. The Pearson correlation coefficient is selected to measure the correlation between meteorological factors and PV output power. The expression of the Pearson coefficient is shown in (22).
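$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} \qquad (22) $$

where $x$ and $y$ are the correlated variables, $\bar{x}$ and $\bar{y}$ are their mean values, and $n$ is the total number of data points. In this problem, $x$ is the weather factor, $y$ is the output power of photovoltaic power generation, and $r$ is the correlation coefficient.
Table 4 shows the meaning of the Pearson coefficient. It indicates positive correlation when $r$ is greater than zero, no linear correlation when $r$ is equal to zero, and negative correlation when $r$ is less than zero.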
The correlation coefficients between PV power and the meteorological factors including light intensity, diffuse intensity, ambient temperature, wind speed and humidity are calculated. The test data are from a PV power station in Australia. Table 5 shows the calculation results of the correlation coefficient.
The correlation coefficient between wind speed and PV power remains low: the six-month average is 0.321, indicating a weak correlation. Moreover, it fluctuates greatly between months, from a maximum of 0.580 to a minimum of −0.053, so the correlation is extremely unstable and wind speed is not suitable as input data. The correlation coefficient between light intensity and PV power remains high: the six-month average is 0.993, with a maximum of 0.996 and a minimum of 0.989, indicating an extremely strong and stable correlation. So, it is appropriate to select the light intensity to predict the PV power. By the same analysis, ambient temperature and humidity show a stable moderate correlation with PV power, while diffuse radiation shows a stable weak correlation. In this study, the light intensity, ambient temperature and humidity are selected as input data, balancing the accuracy and complexity of the model.
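The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names, the cut-off `min_abs_r = 0.4` and the sample readings are all assumptions introduced here, and the study additionally weighs the month-to-month stability of the coefficient, which this sketch does not model.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def select_inputs(factors, power, min_abs_r=0.4):
    """Keep the factors whose |r| with PV power reaches min_abs_r.

    The cut-off value is a hypothetical choice for this sketch,
    not a threshold stated in the study.
    """
    return [name for name, values in factors.items()
            if abs(pearson(values, power)) >= min_abs_r]


# Illustrative, made-up readings (not data from the study).
power = [0.5, 1.8, 3.1, 3.9, 3.0, 1.6]
factors = {
    "light_intensity": [120, 410, 700, 880, 690, 370],
    "wind_speed": [2.1, 3.5, 1.9, 2.8, 3.9, 2.2],
}
selected = select_inputs(factors, power)
```

With these made-up readings, light intensity tracks the power curve closely and is kept, while wind speed is only weakly correlated and is dropped, which mirrors the qualitative outcome reported for the real data.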
3.2. Denoising and Normalization of Input Data
Light intensity, ambient temperature and relative humidity should be continuous and slowly changing signals, but, when affected by measurement conditions and other factors, these signals contain a lot of noise, which causes the measurement waveform to show the characteristics of fluctuation. The accuracy of the prediction model will be adversely affected if these data with noise are used for the training of the prediction model. So, it is necessary to take measures to reduce the noise in input data.
There are many data denoising methods, and many studies have examined them [48]; among these, wavelet threshold denoising is widely used [49]. The noisy signal is decomposed by a wavelet transform. In the decomposed signal, the wavelet coefficients of the signal are larger and those of the noise are smaller. Each wavelet coefficient is compared with a threshold value: a coefficient whose magnitude is above the threshold is considered signal and is retained, whereas a coefficient whose magnitude is below the threshold is considered noise and is removed. Setting an appropriate threshold achieves a good denoising effect. Figure 4 shows the flow chart of wavelet threshold denoising, and the steps are shown below.
- (1) Decompose the original signal and obtain the wavelet coefficients.
- (2) Set threshold and threshold function.
- (3) Denoise the wavelet coefficients by threshold to filter the noise information in the signal.
- (4) Reconstruct the processed wavelet coefficients to obtain the denoised signal.
The thresholding step has a decisive impact on the denoising effect. It involves the selection of the threshold and of the threshold function. The selection of the threshold is complex and depends on many aspects, so the default setting is generally used. There are two types of threshold functions. The first is the hard threshold function, which sets wavelet coefficients whose magnitude is below the threshold to zero and keeps coefficients above the threshold unchanged. The hard threshold function retains more of the original information, but it is discontinuous at the threshold, so the continuity of the denoised signal is poor. The second is the soft threshold function, which shrinks coefficients above the threshold toward zero by the threshold value and sets coefficients below the threshold to zero. The soft threshold function makes the denoised signal smoother and more continuous. Practice has shown that a prediction model trained on signals processed by the soft threshold function has higher prediction accuracy. Therefore, this study applies the soft threshold function to preprocess the prediction input data. To illustrate the denoising effect of the soft threshold function, daytime temperature, humidity and light intensity data are extracted and processed by wavelet soft threshold denoising. Figure 5 shows the result of wavelet soft threshold denoising.
The first panel is the temperature curve, the second is the humidity curve, and the third is the light intensity curve. The red line in Figure 5 represents the data denoised by wavelet soft threshold and the blue line is the original data. Since the original temperature and humidity data contain much noise, the denoised signal differs visibly from the original signal. However, the light intensity signal changes gently and contains less noise, so it shows no obvious change after wavelet denoising. The input signal without denoising contains high-frequency noise, and its waveform contains many turning points and fluctuates greatly. After wavelet soft threshold denoising, the waveform is smoother and many abrupt points are eliminated, giving a closer approximation of the real data, which is beneficial for improving prediction accuracy. In this study, the training and testing input data are processed by wavelet soft threshold denoising to improve the prediction accuracy.
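The four denoising steps and the two threshold functions can be illustrated with a small, dependency-free Python sketch. Several choices here are assumptions made only for illustration, since the study does not state them: the Haar wavelet (chosen because it is the shortest to write out), the decomposition depth `levels`, and the universal threshold with a median-based noise estimate standing in for the "default setting". The function names are likewise hypothetical.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Step 1: multi-level Haar transform.

    Returns (approximation, details) with details ordered coarse to fine.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Step 4: inverse Haar transform, processing details coarse to fine."""
    current = list(approx)
    for detail in details:
        rebuilt = []
        for s, d in zip(current, detail):
            rebuilt.append((s + d) / SQRT2)
            rebuilt.append((s - d) / SQRT2)
        current = rebuilt
    return current


def hard_threshold(c, t):
    """Keep coefficients with |c| > t unchanged, zero the rest."""
    return c if abs(c) > t else 0.0


def soft_threshold(c, t):
    """Zero coefficients with |c| <= t, shrink the rest toward zero by t."""
    if abs(c) <= t:
        return 0.0
    return c - t if c > 0 else c + t


def denoise(signal, levels=2, threshold_fn=soft_threshold):
    """Steps 1-4 with an assumed universal threshold sigma * sqrt(2 ln n)."""
    approx, details = haar_decompose(signal, levels)
    # Step 2: estimate the noise level from the finest detail coefficients.
    sigma = statistics.median(abs(d) for d in details[-1]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(signal)))
    # Step 3: threshold the detail coefficients only.
    shrunk = [[threshold_fn(d, t) for d in level] for level in details]
    return haar_reconstruct(approx, shrunk)


# Illustrative, made-up temperature readings (not data from the study).
noisy_temperature = [21.0, 21.9, 21.2, 22.6, 22.1, 23.4, 22.8, 24.0]
smoothed = denoise(noisy_temperature, levels=2)
```

Passing `threshold_fn=hard_threshold` switches to the hard rule, which makes the two behaviours easy to compare on the same series. A smoother wavelet basis than Haar would normally be preferred for slowly varying meteorological signals.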
As shown in Figure 5, different meteorological factors have different units, and the variation ranges of the signals also differ greatly. So, they cannot be used directly for model training and prediction, and they need to be normalized. The normalization formula is shown in Formula (23).

$$ x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (23) $$

where $x^{*}$ is the normalized data, $x$ is the non-normalized data, $x_{\max}$ is the maximum value of the data, and $x_{\min}$ is the minimum value of the data.
In this study, the temperature, humidity, light intensity and PV power data are normalized.
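A minimal Python sketch of min-max normalization and its inverse is given below, assuming the usual mapping of each series onto the interval [0, 1]. The function names and the sample values are hypothetical. Reusing the minimum and maximum of the training data when scaling the test data is also an assumption of this sketch, as the study does not say how the bounds are chosen.

```python
def fit_min_max(values):
    """Return (minimum, maximum) of a series, rejecting constant series."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return lo, hi


def normalize(values, lo, hi):
    """Map each value to (x - min) / (max - min)."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(values, lo, hi):
    """Inverse mapping, used to turn predictions back into physical units."""
    return [v * (hi - lo) + lo for v in values]


# Illustrative, made-up ambient temperatures in degrees Celsius.
train_temperature = [10.0, 20.0, 30.0]
lo, hi = fit_min_max(train_temperature)
scaled = normalize(train_temperature, lo, hi)
restored = denormalize(scaled, lo, hi)
```

The same bounds that scale the PV power series must be kept so that the predicted power can later be converted back to physical units.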
3.3. Experimental Arrangement
The IMWOA-SVM prediction model was tested in sunny and cloudy weather conditions based on data from the Desert Knowledge Australia (DKA) Solar Center in Australia, and the prediction results were compared with five SVM models and with ELM and BP neural networks. Two days of data from randomly selected typical weather are used for training, and one randomly selected day of data is used for testing. The prediction period is from 8:00 to 18:00 in the daytime. Figure 6 shows the prediction flow chart of IMWOA-SVM, and the steps of using IMWOA-SVM to predict PV power are as follows.
- (1) Select training data and testing data in sunny and cloudy weather, respectively.
- (2) Preprocess training input data and testing input data by wavelet soft threshold denoising.
- (3) Normalize training and testing data.
- (4) Initialize the parameters of the IMWOA-SVM photovoltaic output power prediction model.
- (5) Train the prediction model by training data. Apply the IMWOA to optimize the SVM. Test the prediction model by the test data.
- (6) Obtain the optimal prediction model of PV power. Predict the PV power.
- (7) Apply inverse normalization to the predicted output power and output the experimental results.
In the process of model optimization, the mean square error (MSE) between the predicted power and the actual power is used as the objective function of IMWOA optimization. The definition of MSE is shown in Formula (24).

$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2} \qquad (24) $$

where $\hat{y}_i$ is the predicted PV power, $y_i$ is the actual PV power, and $n$ is the number of sample points.
Besides MSE, some other indexes are applied to evaluate the prediction results more comprehensively, including mean absolute error (MAE), root mean squared error (RMSE), R-square (R2) and mean absolute percentage error (MAPE). The definitions of these evaluation indexes are shown in Formula (25).
It should be noted that the larger R2 is and the smaller the MAE, RMSE and MAPE are, the better the prediction results track the actual output power.
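For reference, the five indexes can be computed as in the following Python sketch. It assumes the standard textbook definitions, with R2 taken as one minus the ratio of the residual sum of squares to the total sum of squares and MAPE expressed in percent; the exact forms used in the study may differ. The function name and the sample values are hypothetical.

```python
import math


def evaluate(predicted, actual):
    """Return MSE, MAE, RMSE, R2 and MAPE (in percent) for two series."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE divides by the actual value, so zero power readings
        # (for example at dawn or dusk) must be handled by the caller.
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant actual series")
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
        "MAPE": 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual)),
    }


# Illustrative, made-up power values in kW (not results from the study).
metrics = evaluate(predicted=[2.0, 4.0, 6.0], actual=[1.0, 4.0, 7.0])
```

Because the MSE entry is the quantity minimized by the optimizer, the same function can serve both as the objective during parameter search and as the final report on the test day.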
5. Conclusions
PV output power is uncertain, which is not conducive to the stability and security of the power system. In different weather, PV power has significantly different characteristics, which increases the difficulty of power prediction. To further symmetrically improve the prediction accuracy of PV power in different weather conditions and promote the use of clean energy, an improved whale algorithm optimizing SVM model is proposed in this study. The advantages of this model are that it can reduce the demand for input data, adapt to changes in weather conditions, and achieve ideal prediction accuracy in complex weather conditions compared with similar prediction models. The research contents include the selection of input data, the preprocessing of data, the improvement of the optimization algorithm and the optimization of the SVM prediction model. The following conclusions are obtained.
- (1) The PV power is determined by some meteorological factors, and it has significantly different characteristics in different weather conditions. Through the correlation analysis, it is found that PV power has the strongest correlation with the meteorological factors including light intensity, ambient temperature and humidity. Furthermore, these meteorological factors can be used to accurately predict PV power.
- (2) The wavelet soft threshold denoising can be applied for the pretreatment of PV input data. It can effectively eliminate the noise contained in input data and improve the coherence of the input data, which is beneficial to remove the adverse impact of noise and enhance the stability of the prediction model in complex weather conditions.
- (3) The BP neural network and ELM have large demand for training data. When the training data are not sufficient, the ideal prediction accuracy cannot be achieved. SVM has less demand for training data, and can achieve ideal prediction accuracy when there is less training data, which is suitable for PV output power prediction models with less training data.
- (4) The optimization performance of WOA can be effectively improved through combination with the hybrid improved method. By combining the original WOA with tent chaos initialization, mutation disturbance of the optimal individual and DE algorithm, the comprehensive performance of the IMWOA is significantly enhanced.
- (5) The IMWOA-SVM photovoltaic output power prediction model applies wavelet denoising to process the predicted input data, and applies the hybrid improved whale algorithm to optimize the SVM, which significantly improves comprehensive prediction performance in different weather conditions.
- (6) The proposed IMWOA-SVM photovoltaic output power prediction model can symmetrically achieve the accurate prediction for PV power in different weather conditions, especially in complex weather conditions. It can provide the operation and scheduling department with reliable reference, help to improve the utilization rate of renewable energy power generation and maintain the security of renewable energy power systems. It is of great significance to the application of clean energy.
The input data selection method, input data preprocessing method, algorithm selection and improvement method and model optimization method proposed in this study constitute a complete prediction method of renewable energy generation output power. It can be applied not only to predict the PV power, but also to predict other similar renewable energy power. It provides a reference for the optimization and improvement of prediction models of other similar renewable energy and is expected to develop into a general method to predict the output power of renewable energy power.
This study has limitations. In the stage of selecting input data, the linear correlation between wind speed, temperature, humidity, light intensity, diffuse intensity and PV power is simply analyzed, but the complex nonlinear relationship behind the data is not deeply considered, which may cause some important data to be ignored. Future studies should enhance the optimization of the prediction method and the selection of input data to improve the performance of the IMWOA-SVM. In addition, this study is based on the SVM model; more types of models should be introduced into the prediction field to further improve the prediction accuracy in future research.