An Integrated Framework Based on an Improved Gaussian Process Regression and Decomposition Technique for Hourly Solar Radiation Forecasting

Sun, Na; Zhang, Nan; Zhang, Shuai; Peng, Tian; Jiang, Wei; Ji, Jie; Hao, Xiangmiao

doi:10.3390/su142215298



by Na Sun ¹,*, Nan Zhang ¹,*, Shuai Zhang ², Tian Peng ¹, Wei Jiang ¹, Jie Ji ¹ and Xiangmiao Hao ³

¹ Jiangsu Permanent Magnet Motor Engineering Research Center, Faculty of Automation, Huaiyin Institute of Technology, Huai’an 223003, China
² Key Laboratory of Thermo-Fluid Science and Engineering, Ministry of Education, School of Energy & Power Engineering, Xi’an Jiaotong University, Xi’an 710049, China
³ Research and Development Department (R&D), Xi’an ShuFeng Technological Information, Ltd., Xi’an 710061, China
* Authors to whom correspondence should be addressed.

Sustainability 2022, 14(22), 15298; https://doi.org/10.3390/su142215298

Submission received: 17 October 2022 / Revised: 12 November 2022 / Accepted: 15 November 2022 / Published: 17 November 2022


Abstract

Accurate forecasting of solar radiation is essential for the stable operation and rational management of photovoltaic (PV) power plants. This study proposes a hybrid framework (CBP) for solar radiation forecasting based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), an enhanced Gaussian process regression with a newly designed physical-based combined kernel function (PGPR), and the backtracking search optimization algorithm (BSA). In the CEEMDAN-BSA-PGPR (CBP) model, (1) CEEMDAN decomposes the raw solar radiation series into several sub-modes; (2) the partial autocorrelation function (PACF) is used to select appropriate input variables; (3) a PGPR model, with hyperparameters optimized by BSA, is constructed to predict each sub-mode; and (4) the final forecast is obtained by combining the forecasted sub-modes. Four hourly solar radiation datasets of Australia are used for a comprehensive analysis, and several models from the literature are established for multi-step-ahead prediction to demonstrate the superiority of the CBP model. Comprehensive comparisons with nine other models demonstrate the efficacy of the CBP model and the benefit of combining CEEMDAN with BSA. The CBP model produces more accurate results than the compared models for all datasets and prediction horizons. Moreover, the CBP model is easy to set up and provides additional decision-making information in the form of forecasting uncertainty.

Keywords:

solar radiation forecasting; CEEMDAN; machine learning; Gaussian process regression; combined kernel function

1. Introduction

1.1. Background and Motivation

The current global energy crisis has added new urgency to accelerating clean energy (wind and solar PV) transitions and has once again emphasized the vital role of renewable energy. Clean power generation such as wind and solar PV has great potential to reduce the power sector's dependence on fossil fuels, including coal, oil, and natural gas. According to the Renewables 2021 Global Status Report, new global renewable power capacity broke a record, reaching nearly 315 GW. Solar PV and wind together accounted for nearly 90% of all new renewable power additions: solar PV represented more than half of the additions (around 175 GW) and wind power another 102 GW [1]. Solar electricity is becoming one of the most attractive clean energy sources because of its tremendous potential. However, solar energy exhibits undesirable characteristics such as intermittency, fluctuation, and stochasticity due to the effects of global climate change and human intervention [2], which makes integrating PV sources into the power grid more difficult [3]. Accurate and dependable forecasts of solar radiation are therefore extremely important for choosing the most cost-effective grid operation strategy and energy storage capacity [4].

1.2. Literature Review

Recently, a variety of methods have been successfully applied to solar energy forecasting. They can be broadly classified into three main approaches: physical models (PMs), traditional time series (TS) models, and machine learning (ML) models [2]. PMs rely heavily on Numerical Weather Prediction (NWP). Running on high-performance computers and starting from the observed state of the atmosphere, NWP solves the thermo-hydrodynamic equations that govern weather evolution to predict future solar radiation intensity. Its calculation requires multiple data sources, such as climate data and satellite cloud observations, and its prediction capability depends on the stability of weather conditions [5]. These procedures are therefore complex and time-consuming. Traditional TS models rely mainly on historical measured data to predict future solar radiation; examples include the autoregressive (AR) model [6], its variants ARMA [7,8] and ARIMA [9], and the Grey model. Classical TS models were commonly used in early solar radiation forecasting, but their performance is relatively poor because their parameters are fixed and cannot adapt to changing time-series behavior. Many researchers have tried to improve them by introducing other techniques. Ding et al. [10] proposed an improved Grey model with parameters updated in real time for photovoltaic power forecasting. However, the Grey model cannot effectively handle large amounts of stochastic information, which degrades its prediction performance.

In contrast, ML models are more commonly used today, thanks to advances in high-performance computing, big data mining techniques, and artificial intelligence theory. The most widely used are artificial neural networks (ANN) [11], random forests (RF), support vector machines (SVM) [12], extreme learning machines (ELM) [13], and hybrids of these methods with heuristic intelligent optimization algorithms [14]. Yagli et al. [15] investigated the forecasting abilities of 68 off-the-shelf ML methods for solar irradiance forecasting at seven locations with different meteorological characteristics. They found that no single model fits all situations, although some options are preferable for each sky and climate condition. Moreira et al. [16] created an ANN ensemble to forecast medium-term photovoltaic power. Their versatile method improved forecasting results by allowing various experimental elements, such as the base forecasting module and the desired forecast horizon, to be modified; they also suggested using feature selection to eliminate redundant information. Sun et al. [17] proposed a progressive back-propagation neural network (BP) model for forecasting PV module temperature; the step-by-step BP model produced more precise predictions than the direct one. Benali et al. [18] compared two well-known ML methods, ANN and RF, in hourly solar radiation forecasting and found that RF outperformed ANN. Cervone et al. [19] presented a new model integrating ANN with the Analog Ensemble (AnEn); the combined model outperformed each method used individually.

More recently, hybrid models that incorporate decomposition techniques and/or optimization algorithms have shown better forecasting performance, and their use has grown rapidly. Zhang et al. [14] comprehensively compared a single BP network, a BP network whose parameters were optimized by particle swarm optimization (PSO), and several widely used statistical models for daily solar radiation prediction. They found that PSO-BP was more effective than all benchmark models, confirming that the compound model surpasses the standalone one. Feng et al. [20] developed a hybrid model in which PSO is used to find suitable ELM parameters; the resulting PSO-ELM forecast daily solar radiation better than five other ML models. These studies indicate that hybrid models integrating several complementary techniques are promising.

In real operation, the uncertainty of forecasting results is needed as additional information for decision-making, yet most existing studies ignore it. GPR is an ML approach that shares the strength of traditional ML methods in dealing with nonlinear problems and also inherits the flexible inductive reasoning ability of Bayesian methods. The nonlinear fitting and uncertainty quantification capabilities of GPR have been successfully demonstrated in a variety of applications, including wind energy forecasting [21] and monthly streamflow forecasting [22]. Some researchers have further enhanced GPR by using combined kernel functions [23] or by introducing optimization algorithms such as PSO and Bayesian optimization [24]. To the best of our knowledge, the advantages of GPR have not been fully investigated in the solar energy domain.

Because of the influence of environmental and meteorological variables such as irradiance and cloudiness, observed solar radiation series exhibit strong stochasticity, complex nonlinearity, and non-stationarity. A historical solar radiation series is composed of several components, including trend, periodicity, and randomness. To improve prediction precision, it is vital to employ suitable decomposition techniques that extract the hidden information in the observed series. Widely used decomposition approaches include variational mode decomposition (VMD) [25], the discrete wavelet transform (DWT) [26], empirical mode decomposition (EMD) [27], and their variants, such as the empirical wavelet transform (EWT) [28], ensemble EMD (EEMD) [29], and complementary EEMD (CEEMD) [5]. For example, Zhang and Wei [30] used the wavelet transform (WT) to process original daily solar radiation data and principal component analysis (PCA) to reduce the input dimension, and then built an ELM optimized by the bat algorithm (BA) to make predictions. To address the mode mixing of EMD and the high computational cost of EEMD, and to make the reconstructed series free of residual noise, a further improved version of EMD was proposed [31]: complete EEMD with adaptive noise (CEEMDAN), which is suitable for decomposing a non-stationary signal into several more stationary sub-modes. Therefore, this study adopts CEEMDAN to decompose the original solar series into several simpler sub-series to improve forecasting accuracy.

1.3. Contribution and Organization

The above review of existing studies reveals several deficiencies. Standard GPR uses a single kernel function, which is inadequate for accurately forecasting real solar radiation series because their periodicity, trend, and randomness are intertwined. In addition, when forecasting models are used in real operation, the uncertainty of their results is usually not provided.

Therefore, to provide more accurate forecasts, a compound framework, CEEMDAN-BSA-PGPR (CBP), is proposed by integrating CEEMDAN, BSA, and a GPR model with a newly designed physical-based kernel function. CBP is applied to one- to three-hour-ahead solar radiation forecasting. The main contributions of this study are as follows:

(1): To reduce the volatility of the original solar data, the non-stationary solar radiation series is decomposed by CEEMDAN into several more stationary sub-series at different scales. These components are easier to forecast than the original series, which helps improve forecasting accuracy.
(2): Based on the inherent features of the original solar radiation series, namely periodicity, trend, and randomness, a new physical-information-based kernel function is designed by combining several commonly used kernel functions, and a GPR model with this kernel (PGPR) is constructed. PGPR directly provides both deterministic and probabilistic solar radiation forecasts, which better meets the needs of practical decision-making.
(3): To overcome the initialization dependence and local-optimum trapping of the conjugate gradient (CG) algorithm commonly used to search for GPR hyperparameters, BSA is used to search for the optimal hyperparameters of PGPR.

The remainder of the paper is organized as follows: Section 2 presents the theoretical foundations of this study, including CEEMDAN, PGPR, BSA, and the CBP model; Section 3 describes a comprehensive case study; Section 4 presents the experimental results of all models on the four datasets for all horizons; Section 5 analyzes and verifies the positive effects of CEEMDAN and BSA; and Section 6 summarizes the main findings. The nomenclature of abbreviations is given in Appendix A.

2. Methodology

2.1. Solar Radiation Prediction Based on ML

Solar radiation forecasting based on ML can be stated as:

s(t + L) = φ\left(s(t - d_{1} + 1), \mathrm{Other}(t - d_{2} + 1)\right)

(1)

where $s(t + L)$ is the forecasted solar radiation at the future time $t + L$ and $L$ is the forecasting horizon; $s(t - d_{1} + 1)$ denotes the measured solar radiation values from time $t - d_{1} + 1$ up to $t$ (the $d_{1}$ most recent observations); $\mathrm{Other}(t - d_{2} + 1)$ denotes other variables strongly related to the solar radiation at time $t + L$, such as precipitation, surface pressure, and temperature, over the $d_{2}$ most recent time steps; and $φ(\cdot)$ is the forecasting function, which can be implemented by any of the three types of methods mentioned in the Introduction.
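For the univariate case, in which only lagged solar radiation is used as input, Equation (1) amounts to pairing each window of past observations with the value $L$ steps ahead. The following Python sketch (NumPy assumed; the helper name `make_lagged_dataset` is hypothetical) builds such pairs from a contiguous window of $d_{1}$ lags and omits the exogenous $\mathrm{Other}$ terms; in practice the lags could instead be chosen individually, for example by PACF.

```python
import numpy as np


def make_lagged_dataset(series, d1, horizon):
    """Pair each window of d1 past values with the value `horizon` steps ahead.

    Row k holds s[k], ..., s[k + d1 - 1], so its "current time" is
    t = k + d1 - 1 and its target is s[t + horizon], as in Eq. (1).
    Exogenous "Other" variables are omitted in this sketch.
    """
    s = np.asarray(series, dtype=float)
    n_samples = len(s) - d1 - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested lags and horizon")
    X = np.stack([s[k:k + d1] for k in range(n_samples)])
    y = s[d1 - 1 + horizon:d1 - 1 + horizon + n_samples]
    return X, y


X, y = make_lagged_dataset(np.arange(10), d1=3, horizon=2)
print(X[0], y[0])  # [0. 1. 2.] 4.0
```

With `horizon=1`, 2, or 3 the same function yields the one- to three-hour-ahead training pairs used in this study's setting.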

ML models can substitute for PMs when modeling data are lacking, the physical mechanisms are poorly understood, or long-term weather forecasts are inaccurate. In addition, ML models are simple to construct and can easily be extended with optimization algorithms or series decomposition methods to create new hybrid models that yield highly accurate forecasts.

2.2. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise

CEEMDAN [31] retains the advantage of EEMD in effectively alleviating the mode-mixing problem of EMD by repeatedly adding white noise to the original series. At the same time, it requires fewer computational resources and improves decomposition efficiency. The main steps of CEEMDAN are as follows:

(1): Create $N$ noise-added series $S(t) + ε_{0} n^{i}(t)$ $(i = 1, 2, \dots, N)$ by adding white noise realizations $n^{i}(t)$ to the original series $S(t)$, where $ε_{0}$ is the noise amplitude;
(2): Decompose each of the $N$ series by EMD to obtain the first components $\mathrm{IMF}_{1}^{i}(t)$ $(i = 1, 2, \dots, N)$. The final first component $\mathrm{IMF}_{1}(t)$ is the mean of the $\mathrm{IMF}_{1}^{i}(t)$, and the corresponding first residual is $r_{1}(t) = S(t) - \mathrm{IMF}_{1}(t)$;
(3): Decompose the noise-added residual series $r_{1}(t) + ε_{1} E_{1}(n^{i}(t))$ by EMD, where $E_{1}(n^{i}(t))$ is the first EMD mode of $n^{i}(t)$; the first mode of each realization gives $\mathrm{IMF}_{2}^{i}(t)$ $(i = 1, 2, \dots, N)$, and the final second component $\mathrm{IMF}_{2}(t)$ is their average;
(4): Subsequent decomposition steps are identical to Steps 2–3: after the $k$-th residual $r_{k}(t)$ is calculated, the $(k + 1)$-th mode is obtained by decomposing $r_{k}(t) + ε_{k} E_{k}(n^{i}(t))$. The process continues until the residual becomes monotonic. The residual at stage $k$ can be expressed as:

$r_{k}(t) = r_{k - 1}(t) - \mathrm{IMF}_{k}(t) = r_{k - 1}(t) - \frac{1}{N} \sum_{i = 1}^{N} \mathrm{IMF}_{k}^{i}(t)$

(2)

The following can be used to compute the final residual R(t):

R (t) = S (t) - \sum_{k = 1}^{K} I M F_{k} (t)

(3)

Therefore, the original solar data can be written as:

S (t) = R (t) + \sum_{k = 1}^{K} I M F_{k} (t)

(4)
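The residual bookkeeping of Equations (2)–(4) can be checked numerically. The sketch below assumes that the $N$ trial modes $\mathrm{IMF}_{k}^{i}(t)$ of each stage are already available (producing them requires EMD of the noise-added residuals, which is not shown); the function name is hypothetical, and random arrays stand in for real trial modes, which is enough to show that Equation (4) holds by construction.

```python
import numpy as np


def ceemdan_bookkeeping(S, trial_imfs_per_stage):
    """Average the N trial modes of each stage and update r_k = r_{k-1} - IMF_k.

    trial_imfs_per_stage: list with one (N, T) array per stage k, holding
    IMF_k^i(t) for the N noise realizations (assumed to be given).
    """
    r = np.asarray(S, dtype=float)      # r_0(t) = S(t)
    imfs = []
    for trials in trial_imfs_per_stage:
        imf_k = np.mean(trials, axis=0)  # IMF_k(t) = (1/N) sum_i IMF_k^i(t)
        r = r - imf_k                    # Eq. (2)
        imfs.append(imf_k)
    return np.array(imfs), r             # r is the final residual R(t), Eq. (3)


rng = np.random.default_rng(0)
T, N, K = 200, 20, 3
S = np.sin(np.linspace(0.0, 20.0, T)) + 0.1 * rng.standard_normal(T)
# Random stand-ins for trial modes: used only to check the algebra.
trials = [rng.standard_normal((N, T)) for _ in range(K)]
imfs, R = ceemdan_bookkeeping(S, trials)
print(np.allclose(S, R + imfs.sum(axis=0)))  # True, i.e. Eq. (4)
```

Because each residual is defined by subtraction, the reconstruction in Equation (4) is exact whatever the quality of the individual modes; the decomposition quality only affects how forecastable each component is.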

2.3. Gaussian Process Regression with Physical-Based Kernel Function (PGPR)

2.3.1. Basic Principles of Gaussian Process Regression

GPR is a well-known non-parametric Bayesian ML method [32] that integrates Bayesian inference with statistical learning theory and therefore combines the advantages of both. These characteristics make GPR well suited to complex, high-dimensional nonlinear problems, and it has been widely applied in many fields [32].

The training set of GPR is denoted $S_{n} = \{(X_{i}, y_{i})\}_{i = 1}^{n}$, with $X_{i} \in R^{m}$ and $y_{i} \in R$, where $X$ contains the $n$ input observations, each with $m$ features, and $y$ contains the corresponding target outputs. The values $g(X)$ of the stochastic process at these inputs follow an $n$-dimensional joint Gaussian distribution; by definition, $g$ is therefore a Gaussian process, which is uniquely determined by its mean function $E(X)$ and covariance function $K(X, X)$:

g(X) \sim GP\left(E(X), K(X, X)\right)

(5)

Then, by the properties of the GP, the training outputs $y$ of $S_{n}$ and the output $y_{\mathrm{test}}$ of the testing set $S_{n}^{\mathrm{test}} = \{(x_{\mathrm{test}}, y_{\mathrm{test}}) \mid x_{\mathrm{test}} \in R^{m}, y_{\mathrm{test}} \in R\}$ follow the joint multivariate Gaussian distribution:

\begin{bmatrix} y \\ y_{\mathrm{test}} \end{bmatrix} \sim N\left(0, \begin{bmatrix} K(X, X) + σ_{n}^{2} I_{n} & K(X, x_{\mathrm{test}}) \\ K(x_{\mathrm{test}}, X) & k(x_{\mathrm{test}}, x_{\mathrm{test}}) \end{bmatrix}\right)

(6)

where $K(x_{\mathrm{test}}, X) = K(X, x_{\mathrm{test}})^{T}$, $k(x_{\mathrm{test}}, x_{\mathrm{test}})$ is the covariance of $x_{\mathrm{test}}$ with itself, $σ_{n}^{2}$ is the variance of the observation noise, and $I_{n}$ is the $n \times n$ identity matrix.

Then, by Bayes' rule, the posterior distribution is:

\begin{array}{l} y_{\mathrm{test}} \mid X, y, x_{\mathrm{test}} \sim N\left(E(y_{\mathrm{test}}), \mathrm{cov}(y_{\mathrm{test}})\right) \\ E(y_{\mathrm{test}}) = K(x_{\mathrm{test}}, X) \left[K(X, X) + σ_{n}^{2} I_{n}\right]^{-1} y \\ \mathrm{cov}(y_{\mathrm{test}}) = k(x_{\mathrm{test}}, x_{\mathrm{test}}) - K(x_{\mathrm{test}}, X) \left[K(X, X) + σ_{n}^{2} I_{n}\right]^{-1} K(X, x_{\mathrm{test}}) \end{array}

(7)

where $E(y_{\mathrm{test}})$ is the expected (predicted) value and $\mathrm{cov}(y_{\mathrm{test}})$ is the posterior variance of $y_{\mathrm{test}}$, which is used to measure the forecasting uncertainty.
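Equation (7) can be evaluated with a Cholesky factorization instead of an explicit matrix inverse. The following is a minimal NumPy sketch for a zero-mean GP; the SE kernel is used only as an example covariance (written with the squared distance, its standard form), and the function names are illustrative, not taken from the authors' code.

```python
import numpy as np


def se_kernel(A, B, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    sq_dist = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length**2)


def gp_posterior(X, y, X_test, kernel, noise_var):
    """Posterior mean and variance of Eq. (7) for a zero-mean GP."""
    C = kernel(X, X) + noise_var * np.eye(len(X))          # K(X, X) + sigma_n^2 I_n
    L = np.linalg.cholesky(C)                               # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))     # C^{-1} y
    K_s = kernel(X_test, X)                                 # K(x_test, X)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)                           # L^{-1} K(X, x_test)
    var = np.diag(kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mean, var


X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X).ravel()
mean, var = gp_posterior(X, y, np.array([[2.5], [50.0]]), se_kernel, noise_var=1e-4)
# Near the data the mean tracks sin(2.5) with a small variance; far away
# (x = 50) the prediction reverts to the prior: mean 0, variance sigma_f^2 = 1.
```

The variance returned here follows Equation (7), i.e. it excludes the observation noise $σ_{n}^{2}$; adding $σ_{n}^{2}$ would give the variance of a new noisy observation.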

2.3.2. Covariance Functions and Their Hyperparameters Determination

As stated above, a GP is determined by its mean function $E(X)$ and covariance function $K(X, X)$. After centering the data, the GP can be taken without loss of generality to have $E(X) = 0$. Determining the covariance function $K(X, X)$ is therefore the most important part of building a GPR model. Many covariance functions are available for GPR; five commonly used single kernels are presented below:

(1): Squared-Exponential (SE) kernel:

$k_{SE} = σ_{f}^{2} \exp\left(-\frac{r^{2}}{2 l^{2}}\right)$

(8)
(2): Matérn (Ma) 5/2 kernel:

$k_{Ma} = σ_{f}^{2} \left(1 + \frac{\sqrt{5} r}{l} + \frac{5 r^{2}}{3 l^{2}}\right) \exp\left(-\frac{\sqrt{5} r}{l}\right)$

(9)
(3): Rational Quadratic (RQ) kernel:

$k_{RQ} = σ_{f}^{2} \left(1 + \frac{r^{2}}{2 α l^{2}}\right)^{-α}$

(10)
(4): Linear kernel:

$k_{Lin} = σ_{f}^{2} (x - c)(x' - c)$

(11)
(5): Random noise (RN) kernel:

$k_{RN} = σ_{f}^{2} δ_{xx'}$

(12)

where $r = \|x - x'\|$ is the distance between the inputs $x$ and $x'$; $l$ is the characteristic length scale (relevance parameter); $σ_{f}^{2}$ is the signal variance, which controls the amplitude of the covariance; $α > 0$ is the shape parameter of the RQ kernel; $c$ is the offset of the linear kernel; and $δ_{xx'}$ is the Kronecker delta, equal to 1 if $x = x'$ and 0 otherwise. The set $θ = \{l, σ_{f}^{2}\}$ (plus $α$ or $c$ where applicable) is called the hyperparameter set of the kernel function.
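The five kernels above can be written directly as functions of the distance $r$ (or, for the linear and RN kernels, of the inputs themselves). This is a minimal NumPy sketch for scalar inputs with hypothetical function names; the SE kernel uses $r^{2}$, its standard form.

```python
import numpy as np


def k_se(r, l, sf2):
    """Squared-exponential kernel."""
    return sf2 * np.exp(-r**2 / (2.0 * l**2))


def k_matern52(r, l, sf2):
    """Matern 5/2 kernel."""
    a = np.sqrt(5.0) * r / l
    return sf2 * (1.0 + a + 5.0 * r**2 / (3.0 * l**2)) * np.exp(-a)


def k_rq(r, l, sf2, alpha):
    """Rational quadratic kernel."""
    return sf2 * (1.0 + r**2 / (2.0 * alpha * l**2)) ** (-alpha)


def k_linear(x, x2, sf2, c):
    """Linear kernel with offset c."""
    return sf2 * (x - c) * (x2 - c)


def k_rn(x, x2, sf2):
    """Random-noise kernel: sf2 times the Kronecker delta."""
    return sf2 * np.where(x == x2, 1.0, 0.0)


# Sanity check: the RQ kernel tends to the SE kernel as alpha grows.
print(abs(k_rq(1.0, 1.0, 1.0, 1e6) - k_se(1.0, 1.0, 1.0)) < 1e-5)  # True
```

All stationary kernels here equal $σ_{f}^{2}$ at $r = 0$ and decay with distance at different rates, which is why the choice among them matters for series with mixed smoothness.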

The hyperparameters $θ$ strongly affect the training process and forecasting results. They are usually estimated iteratively by maximum likelihood, i.e., by minimizing the negative log marginal likelihood $L(θ)$ of the training data:

L(θ) = \frac{1}{2} y^{T} C^{-1} y + \frac{1}{2} \log |C| + \frac{n}{2} \log 2π, \quad C = K_{n} + σ_{n}^{2} I_{n}

(13)

where $K_{n} = K(X, X)$. The partial derivative with respect to each hyperparameter $θ_{i}$ is:

\frac{\partial L(θ)}{\partial θ_{i}} = \frac{1}{2} \mathrm{tr}\left(\left(C^{-1} - (C^{-1} y)(C^{-1} y)^{T}\right) \frac{\partial C}{\partial θ_{i}}\right)

(14)

Finally, the optimal $θ$ is found by minimizing Equation (13), using the gradient in Equation (14), with the conjugate gradient (CG) method. With the optimal $θ$, the posterior mean $E(y_{\mathrm{test}})$ and variance $\mathrm{cov}(y_{\mathrm{test}})$ are obtained from Equation (7).
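The likelihood and its gradient can be checked against each other with a finite-difference test. The NumPy sketch below (function names hypothetical) evaluates the negative log marginal likelihood of Equation (13) via a Cholesky factor and compares the analytic gradient with respect to the noise variance, for which $\partial C / \partial σ_{n}^{2} = I_{n}$, against a central difference; it assumes the gradient is written for the negative log likelihood that is minimized.

```python
import numpy as np


def neg_log_marginal_likelihood(C, y):
    """0.5 y^T C^-1 y + 0.5 log|C| + 0.5 n log(2 pi)."""
    L = np.linalg.cholesky(C)                            # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # C^-1 y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))           # log|C| from the Cholesky factor
    return 0.5 * y @ alpha + 0.5 * log_det + 0.5 * len(y) * np.log(2.0 * np.pi)


def nll_gradient(C, y, dC):
    """Gradient of the NLL: 0.5 tr((C^-1 - a a^T) dC), with a = C^-1 y."""
    C_inv = np.linalg.inv(C)
    a = C_inv @ y
    return 0.5 * np.trace((C_inv - np.outer(a, a)) @ dC)


rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 15)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)  # SE kernel with l = sigma_f = 1
y = np.sin(x) + 0.1 * rng.standard_normal(15)
I = np.eye(15)
s2, h = 0.05, 1e-6                                 # noise variance and step size

analytic = nll_gradient(K + s2 * I, y, I)          # dC/d(sigma_n^2) = I
numeric = (neg_log_marginal_likelihood(K + (s2 + h) * I, y)
           - neg_log_marginal_likelihood(K + (s2 - h) * I, y)) / (2.0 * h)
print(abs(analytic - numeric) < 1e-4)  # True
```

A gradient-based optimizer such as CG would call these two functions repeatedly; a population-based search such as BSA only needs the likelihood value itself.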

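Assuming a Gaussian predictive distribution (the sigma rule), the prediction interval at confidence level $1 - α$ is:

\left(E(y_{\mathrm{test}}) - z_{α/2} \sqrt{\mathrm{cov}(y_{\mathrm{test}})}, \; E(y_{\mathrm{test}}) + z_{α/2} \sqrt{\mathrm{cov}(y_{\mathrm{test}})}\right)

(15)

where $z_{α/2}$ is the upper $α/2$ quantile of the standard normal distribution (e.g., 1.96 for a 95% interval).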
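A Gaussian prediction interval can be computed from the posterior mean and the square root of the posterior variance given by Equation (7). This standard-library Python sketch (hypothetical function name; the example numbers are arbitrary, not results from this study) uses `statistics.NormalDist` for the normal quantile $z_{α/2}$.

```python
import math
from statistics import NormalDist


def prediction_interval(mean, variance, confidence=0.95):
    """Gaussian interval mean +/- z_{alpha/2} * sqrt(variance)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile, about 1.96 for 95%
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


low, high = prediction_interval(500.0, 400.0)  # mean in W/m^2, variance in (W/m^2)^2
print(round(low, 1), round(high, 1))  # 460.8 539.2
```

Taking the square root keeps the interval in the same units as the forecast.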

2.3.3. Physical-Based Kernel Function

A physical-based kernel function built from several common kernels is designed for GPR to model solar radiation. Solar radiation data are generally periodic, so a periodic kernel is adopted. To capture changes in the amplitude of the daily cycle, the SE kernel is multiplied by the periodic kernel. In addition, a linear kernel and an RN kernel are added to describe the linear trend and the noise in solar radiation. The new physical-based kernel function is therefore defined as:

k_{CK} = θ_{1}^{2} \exp\left(-\frac{r^{2}}{2 θ_{2}^{2}} - \frac{2 \sin^{2}(π r / θ_{4})}{θ_{3}^{2}}\right) + θ_{5}^{2} (x - c)(x' - c) + θ_{6}^{2} δ_{xx'}

(16)

where $θ_{1}$, $θ_{5}$, and $θ_{6}$ are amplitude parameters, $θ_{2}$ and $θ_{3}$ are length scales, and $θ_{4}$ is the period. Because the standard CG method depends strongly on its initial values and easily falls into local optima, BSA is adopted to find the optimal hyperparameters $θ_{i}$ of the combined covariance function. The principle of BSA is introduced in the next subsection.
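The combined kernel of Equation (16) can be sketched for scalar inputs as follows, assuming the standard squared forms $r^{2}$ and $\sin^{2}(\cdot)$ for the SE and periodic factors. The function names and the hyperparameter values are hypothetical and serve only to show that the resulting covariance matrix is symmetric and positive semi-definite; in the CBP model these values would be searched by BSA.

```python
import numpy as np


def k_physical(x, x2, theta, c=0.0):
    """Combined kernel for scalar inputs (sketch of Eq. (16)).

    theta = (t1, t2, t3, t4, t5, t6): t1 amplitude and t2 length scale of the
    SE factor, t3 length scale and t4 period of the periodic factor,
    t5 linear-kernel amplitude, t6 noise amplitude.
    """
    t1, t2, t3, t4, t5, t6 = theta
    r = abs(x - x2)
    # SE times periodic (a locally periodic kernel)
    locally_periodic = t1**2 * np.exp(-r**2 / (2.0 * t2**2)
                                      - 2.0 * np.sin(np.pi * r / t4) ** 2 / t3**2)
    linear = t5**2 * (x - c) * (x2 - c)
    noise = t6**2 * (1.0 if x == x2 else 0.0)  # Kronecker delta term
    return locally_periodic + linear + noise


def gram(xs, theta, c=0.0):
    """Covariance matrix K(X, X) built from k_physical."""
    return np.array([[k_physical(a, b, theta, c) for b in xs] for a in xs])


theta = (1.0, 48.0, 1.0, 24.0, 0.01, 0.1)  # hypothetical values (e.g., a 24-step period)
K = gram(np.arange(0.0, 72.0, 6.0), theta)
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() > -1e-9)  # True True
```

Sums and products of valid kernels are valid kernels, so the combination stays a proper covariance function; the noise term also keeps the matrix well conditioned.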

2.4. Backtracking Search Optimization Algorithm

BSA [33] is an evolutionary algorithm for solving complex numerical optimization problems. It uses the well-known selection, mutation, and crossover operators of genetic algorithms (GAs) in a new structure. In addition, BSA incorporates novel strategies, such as a memory that stores a population from a randomly chosen previous generation. According to Civicioglu [33], BSA involves five key steps, briefly introduced below:

(a) Initialization

In this stage, the individuals of two populations are randomly initialized in the search space from a uniform distribution $U$:

P_{e}^{i, j} = U(\mathrm{low}_{j}, \mathrm{up}_{j}), \quad P_{\mathrm{old}}^{i, j} = U(\mathrm{low}_{j}, \mathrm{up}_{j}), \quad i = 1, 2, \dots, N_{\mathrm{pop}}; \; j = 1, 2, \dots, D

(17)

where $P_{\mathrm{old}}$ is the historical population, $P_{e}$ is the evolution population, $N_{\mathrm{pop}}$ and $D$ are the population size and the number of variables, respectively, and $\mathrm{low}_{j}$ and $\mathrm{up}_{j}$ are the predefined lower and upper bounds of the $j$-th variable to be optimized.

(b) Selection-I

At the beginning of each iteration, $P_{\mathrm{old}}$ may be redefined using the following "if-then" rule and is then shuffled:

\begin{array}{l} \text{if } R_{1} < R_{2} \text{ then } P_{\mathrm{old}} := P_{e}, \quad R_{1}, R_{2} \sim U(0, 1) \\ P_{\mathrm{old}} := \mathrm{permuting}(P_{\mathrm{old}}) \end{array}

(18)

where $R_{1}$ and $R_{2}$ are two random numbers drawn uniformly from [0, 1] that decide whether $P_{\mathrm{old}}$ is replaced by $P_{e}$ in the current generation; $:=$ is the update operator; and $\mathrm{permuting}(\cdot)$ is a function that randomly changes the order of the individuals in $P_{\mathrm{old}}$.

(c) Mutation

The initial form of the trial population $P_{t}$ is generated by:

\begin{array}{l} P_{t} = P_{e} + F \cdot (P_{\mathrm{old}} - P_{e}) \\ F = 3 \cdot \mathrm{rndn}, \quad \mathrm{rndn} \sim N(0, 1) \end{array}

(19)

where $F$ is a random number that controls the amplitude of the search-direction matrix $(P_{\mathrm{old}} - P_{e})$. Because $P_{\mathrm{old}}$ is used in the mutation process, BSA can exploit experience from previous generations.

(d) Crossover

In this stage, the final form of $P_{t}$ is obtained. The crossover operator is expressed as:

P_{t}^{i, j} = \begin{cases} P_{e}^{i, j}, & \text{if } \mathrm{map}_{i, j} = 1 \\ P_{t}^{i, j}, & \text{otherwise} \end{cases}

(20)

where $\mathrm{map}$ is a binary $N_{\mathrm{pop}} \times D$ matrix that determines which elements of $P_{t}$ are inherited from $P_{e}$ and which keep their mutated values.

(e) Selection-II

A greedy selection strategy is used to form the population of the next generation: each individual of $P_{t}$ that has a better (lower) fitness value than the corresponding individual of $P_{e}$ replaces it:

P_{e}^{i} = \begin{cases} P_{t}^{i}, & \text{if } \mathrm{fitness}(P_{t}^{i}) < \mathrm{fitness}(P_{e}^{i}) \\ P_{e}^{i}, & \text{otherwise} \end{cases}

(21)
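Steps (a)–(e) can be assembled into a compact minimizer. The NumPy sketch below is an illustrative reimplementation, not the authors' code: the function name, the mix-rate parameter `mix_rate` used to build $\mathrm{map}$ (following the two map-generation strategies described for BSA), and the rule that regenerates out-of-bound elements uniformly are assumptions. Any scalar objective can serve as the fitness; for PGPR one natural choice would be Equation (13), although the fitness used in this study is not restated here.

```python
import numpy as np


def bsa_minimize(fitness, low, up, n_pop=30, n_iter=300, mix_rate=1.0, seed=0):
    """Minimize `fitness` over the box [low, up] with a BSA-style search."""
    rng = np.random.default_rng(seed)
    low, up = np.asarray(low, dtype=float), np.asarray(up, dtype=float)
    dim = low.size

    # (a) Initialization: evolution population P_e and historical population P_old
    pop = rng.uniform(low, up, size=(n_pop, dim))
    old = rng.uniform(low, up, size=(n_pop, dim))
    fit = np.array([fitness(p) for p in pop])

    for _ in range(n_iter):
        # (b) Selection-I: with probability 1/2 refresh P_old from P_e, then shuffle it
        if rng.random() < rng.random():
            old = pop.copy()
        old = old[rng.permutation(n_pop)]

        # (c) Mutation: P_t = P_e + F (P_old - P_e) with F = 3 * N(0, 1)
        F = 3.0 * rng.standard_normal()
        mutant = pop + F * (old - pop)

        # (d) Crossover: keep[i, j] == True plays the role of map_ij = 1 (take P_e)
        keep = np.ones((n_pop, dim), dtype=bool)
        if rng.random() < rng.random():
            for i in range(n_pop):  # mutate a random subset of dimensions
                k = max(1, int(np.ceil(mix_rate * rng.random() * dim)))
                keep[i, rng.permutation(dim)[:k]] = False
        else:
            for i in range(n_pop):  # mutate exactly one random dimension
                keep[i, rng.integers(dim)] = False
        trial = np.where(keep, pop, mutant)

        # Boundary control (assumed): regenerate out-of-range elements uniformly
        out = (trial < low) | (trial > up)
        trial = np.where(out, rng.uniform(low, up, size=(n_pop, dim)), trial)

        # (e) Selection-II: greedy replacement, individual by individual
        trial_fit = np.array([fitness(t) for t in trial])
        better = trial_fit < fit
        pop[better] = trial[better]
        fit[better] = trial_fit[better]

    best = int(np.argmin(fit))
    return pop[best], fit[best]


best_x, best_f = bsa_minimize(lambda v: float(np.sum(v ** 2)), [-5.0] * 3, [5.0] * 3)
```

Because Selection-II is greedy, the best fitness never worsens from one generation to the next, while the randomly refreshed $P_{\mathrm{old}}$ and the random scale $F$ keep the search from depending on a single starting point.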

2.5. Flowchart of CEEMDAN-BSA-PGPR (CBP) Model

The implementation of the developed CBP model is given in Figure 1 and summarized as follows:

Step 1: Use CEEMDAN to decompose the nonlinear and non-stationary solar radiation series $S(t)$ into several intrinsic mode functions (IMFs), $\mathrm{IMF}_{k}$ $(k = 1, 2, \dots, K)$, and a residue $R(t)$;
Step 2: For each component ($\mathrm{IMF}_{k}$ or $R(t)$), apply the PACF to determine suitable input variables;
Step 3: Based on the selected inputs, establish a PGPR model for each component ($\mathrm{IMF}_{k}$ or $R(t)$) and use BSA to obtain its optimal hyperparameters;
Step 4: Use the tuned models separately to perform multi-hour-ahead forecasting on the testing dataset for every IMF and the residue;
Step 5: Obtain the final forecast of the original solar radiation by summing the predictions of all IMFs and the residue.
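The decompose-forecast-aggregate logic of Steps 3–5 can be illustrated with a deliberately simple stand-in. In the NumPy sketch below, each component is forecast one step ahead by a least-squares autoregressive model instead of a BSA-tuned PGPR, and two synthetic sinusoids play the role of already-decomposed components; all names are hypothetical. Because each sinusoid obeys an exact second-order recursion, the summed component forecasts reproduce the held-out value.

```python
import numpy as np


def fit_ar_least_squares(series, p):
    """Stand-in component model: AR(p) fitted by least squares (NOT PGPR)."""
    X = np.stack([series[k:k + p] for k in range(len(series) - p)])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast_one_step(series, coef):
    """Apply the AR coefficients to the last p observed values."""
    p = len(coef)
    return float(series[-p:] @ coef)


def decompose_forecast_aggregate(components, p=2):
    """Steps 3-5: one model per component, then sum the component forecasts."""
    total = 0.0
    for comp in components:  # each IMF_k and the residue R(t)
        coef = fit_ar_least_squares(comp, p)
        total += forecast_one_step(comp, coef)
    return total


t = np.arange(300, dtype=float)
components = [np.sin(2.0 * np.pi * t / 24.0), 0.3 * np.sin(2.0 * np.pi * t / 6.0)]
history = [c[:-1] for c in components]  # hold out the last time step
pred = decompose_forecast_aggregate(history)
actual = sum(c[-1] for c in components)
print(abs(pred - actual) < 1e-8)  # True
```

Each component here is captured by a two-lag model, whereas their sum would need more lags; this is a toy version of the motivation for forecasting simpler sub-series separately.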

3. Case Studies

3.1. Data Collection

Hourly solar radiation data from 2017 to 2018 for the Plataforma Solar de Almeria (PSA), located at 37.094416° N, 2.359850° W, 490.7 m in Spain, are used. The data are obtained from the SolarGIS database (https://solargis.com/cn/products/time-series-and-tmy-data/useful-resources, accessed on 15 October 2022), which is derived from MTSAT and HIMAWARI satellite monitoring data. Four datasets representing different seasons are used for the experimental analysis. Figure 2 presents the hourly global horizontal irradiance (GHI) in W/m² (4663 h in total), where Datasets 1–4 represent Spring, Summer, Autumn, and Winter, respectively; the training and testing sets are shown as blue and red curves. Zero-valued nighttime records have been removed. Table 1 summarizes the statistics of the four datasets.

3.2. Parameter Setting

To comprehensively analyze the efficacy of the proposed CBP model, many commonly used models are constructed for comparison: BP, GRNN, RBF, ELM, SVR, GPR-SE, PGPR, BSA-PGPR, and CEEMDAN-PGPR (CEN-PGPR). The number of hidden-layer neurons of BP and ELM is searched by grid search (GS) over [1, 100] with a step of 1 [34]. The SVR parameters, the kernel parameter λ and the error penalty ρ, are also determined by GS [34]. Following Reference [21], the parameters of GRNN and RBF are determined by five-fold cross-validation (CV) within (0, 1) and [0.5, 1.5], respectively. Besides these five standalone models, a GPR with an SE kernel function (GPR-SE) is established to verify the effectiveness of applying the physical-based kernel function to the GPR model.

Considering the inherent non-stationarity and nonlinearity of the original solar radiation series, the proposed CBP model first uses CEEMDAN to decompose the data into several IMFs and a residue. The CEEMDAN parameters are as follows: the number of realizations N is 20, the white noise amplitude is 0.2, and the maximum number of iterations is 500 [35]. Figure 3 depicts the 11 sub-series obtained for the Spring dataset.

3.3. Determination of Input Variables

Input feature selection is a vital aspect of ML modelling. In general, if the input features are insufficient, the forecasting results may be unsatisfactory [5]; conversely, too many inputs impose an additional computational burden. Antecedent (lagged) solar radiation values are used as the input vector for modelling and forecasting, so the lags that significantly influence the target must be determined. The widely used PACF method is adopted; its principle can be found in Reference [36]. PACF values for 24 lags, with a 95% confidence interval (CI), are calculated for all datasets; the correlograms for the Spring dataset are presented in Figure 4.
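Lag selection by PACF can be sketched as follows. The NumPy code computes the sample PACF with the Durbin–Levinson recursion and keeps the lags whose PACF lies outside the approximate 95% band $\pm 1.96/\sqrt{n}$; the function names and this exact selection rule are assumptions, since the precise criterion applied to each dataset is not restated here. Library routines, such as `pacf` in `statsmodels.tsa.stattools`, offer the same computation.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations rho(0), ..., rho(nlags)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([x[:len(x) - k] @ x[k:] / denom for k in range(nlags + 1)])


def sample_pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, nlags)
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    phi = np.zeros(nlags + 1)  # phi[1..k]: coefficients of the AR(k) fit
    for k in range(1, nlags + 1):
        num = rho[k] - phi[1:k] @ rho[k - 1:0:-1]
        den = 1.0 - phi[1:k] @ rho[1:k]
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, j = 1..k-1
        phi[1:k] = phi[1:k] - phi_kk * phi[k - 1:0:-1]
        phi[k] = phi_kk
        pacf[k] = phi_kk
    return pacf


def select_lags(x, nlags=24, z=1.96):
    """Lags whose PACF lies outside the approximate 95% band +/- z / sqrt(n)."""
    pacf = sample_pacf(x, nlags)
    bound = z / np.sqrt(len(x))
    return [k for k in range(1, nlags + 1) if abs(pacf[k]) > bound]


# Synthetic AR(1) series: its PACF should cut off after lag 1.
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, 5000):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
pacf = sample_pacf(x, 24)
lags = select_lags(x)
```

For a real series, the selected lags would then define which past values form each input row of the component models.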

3.4. Performance Criteria

To comprehensively evaluate the forecasting performance of the proposed model, four widely used criteria are utilized: Pearson's correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) [22]. They are calculated as follows:

R = \frac{\sum_{i = 1}^{N} (S_{\mathrm{obs}, i} - \bar{S}_{\mathrm{obs}})(S_{\mathrm{fore}, i} - \bar{S}_{\mathrm{fore}})}{\sqrt{\sum_{i = 1}^{N} (S_{\mathrm{obs}, i} - \bar{S}_{\mathrm{obs}})^{2}} \sqrt{\sum_{i = 1}^{N} (S_{\mathrm{fore}, i} - \bar{S}_{\mathrm{fore}})^{2}}}, \quad -1 \le R \le 1

(22)

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (S_{\mathrm{obs}, i} - S_{\mathrm{fore}, i})^{2}}

(23)

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} \left|S_{\mathrm{obs}, i} - S_{\mathrm{fore}, i}\right|

(24)

\mathrm{MAPE} = \frac{1}{N} \sum_{i = 1}^{N} \left|\frac{S_{\mathrm{obs}, i} - S_{\mathrm{fore}, i}}{S_{\mathrm{obs}, i}}\right| \times 100\%

(25)

where $S_{\mathrm{obs}, i}$ and $S_{\mathrm{fore}, i}$ are the $i$-th measured and forecasted solar radiation values, respectively, $\bar{S}_{\mathrm{obs}}$ and $\bar{S}_{\mathrm{fore}}$ are the means of the observed and forecasted solar radiation data, respectively, and $N$ is the total number of samples.

The percentage improvement of each of the four metrics of model 2 relative to model 1 is calculated as [25]:

P_{\mathrm{index}} = \frac{\mathrm{index}_{1} - \mathrm{index}_{2}}{\mathrm{index}_{1}} \times 100\%

(26)
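The evaluation criteria and the improvement index translate directly into code. The NumPy sketch below (function names hypothetical) returns MAPE and the improvement index in percent and assumes that no observed value is zero, which holds here because zero-valued nighttime records were removed.

```python
import numpy as np


def evaluate(obs, fore):
    """R, RMSE, MAE and MAPE (in %) between observed and forecasted values."""
    obs, fore = np.asarray(obs, dtype=float), np.asarray(fore, dtype=float)
    err = obs - fore
    do, df = obs - obs.mean(), fore - fore.mean()
    return {
        "R": float(do @ df / np.sqrt((do @ do) * (df @ df))),
        "RMSE": float(np.sqrt(np.mean(err**2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / obs))),  # obs must be nonzero
    }


def improvement(index_1, index_2):
    """Relative change of model 2 with respect to model 1, in percent."""
    return 100.0 * (index_1 - index_2) / index_1


m = evaluate([1, 2, 3, 4], [1, 2, 3, 5])
print(m["RMSE"], m["MAE"], m["MAPE"])  # 0.5 0.25 6.25
# Rounded Spring RMSE values of PGPR (63 W/m^2) and CBP (26 W/m^2) from Section 4.1:
print(round(improvement(63, 26), 1))  # 58.7
```

For error metrics (RMSE, MAE, MAPE) a positive value means model 2 is better; for R, where higher is better, the sign is reversed.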

4. Results

The 10 models established in this study, BP, GRNN, RBF, ELM, SVR, GPR-SE, PGPR, BSA-PGPR, CEEMDAN-PGPR (CEN-PGPR), and CBP, are applied to the four datasets for one- to three-hour-ahead solar radiation forecasting and evaluated with the four indices. Other analysis tools, such as histograms, scatter plots, radar charts, and area charts, are also used to present model performance more visually. In this section, the developed CBP model is compared with the benchmark models, comprising seven standalone models (BP, GRNN, RBF, ELM, SVR, GPR-SE, and PGPR) and two hybrid models (BSA-PGPR and CEN-PGPR).

4.1. Results of 1-Hour Ahead Forecasting

Table 2 presents the four metrics of the proposed CBP model and the nine comparison models on the testing sets of all datasets, and Figure 5 visualizes the index values of the 10 models, where M1 to M10 denote BP, GRNN, RBF, ELM, SVR, GPR-SE, PGPR, BSA-PGPR, CEN-PGPR, and CBP, respectively.

According to Table 2 and Figure 5, forecasting performance is poorest on the Winter dataset in almost all cases, followed by the Summer and Spring datasets, and is best on the Autumn dataset. This may be because the values in the Winter testing set increase gradually relative to the training set (Figure 2). In addition, the Summer testing set shows pronounced volatility that differs from that of the training data, which makes prediction more difficult (Figure 2).

Based on a thorough analysis of Table 2 and Figure 5, several conclusions can be drawn. (a) Among the seven standalone models, the PGPR model yields the highest R, the lowest RMSE, and (jointly with GPR-SE) the lowest MAE for the Winter dataset. For the other three datasets, its indices are close to those of the best standalone model (BP or GPR-SE). This indicates that the PGPR model enhances interpretability without sacrificing much precision. In contrast, the GRNN generally performs the worst. For the Spring dataset, the R, RMSE, MAE, and MAPE of the PGPR model are 0.9831, 63 W/m², 34 W/m², and 2.32%, respectively, whereas those of the GRNN are 0.9735, 85 W/m², 66 W/m², and 3.14%. This result demonstrates the effective nonlinear fitting ability of the PGPR model. (b) Among all hybrid and standalone models, the proposed CBP model yields the lowest RMSE, MAE, and MAPE values and the highest R value. For the Spring dataset, the R, RMSE, MAE, and MAPE of the CBP model are 0.9973, 26 W/m², 17 W/m², and 0.95%, respectively, indicating excellent forecasting performance. (c) For all datasets, the BSA-PGPR achieves better R, RMSE, and MAE than the PGPR. Its MAPE is better only for the Spring and Winter datasets. All four metrics of the CBP are better than those of the CEN-PGPR for all four datasets. These two comparisons (PGPR vs. BSA-PGPR and CEN-PGPR vs. CBP) show that the BSA is a useful tool for searching the optimal hyperparameters of the PGPR model. (d) The comparison of the CBP and BSA-PGPR models indicates that CEEMDAN is an effective means of improving the accuracy of solar radiation forecasting.

To present the forecasting performance intuitively and further verify the efficiency of the CBP model, the standalone SVR, the proposed standalone PGPR, and the hybrid BSA-PGPR models are selected for further comparison with the CBP. The one-hour-ahead forecasting results of these four models for the four datasets are shown in Figure 6, Figure 7, Figure 8, and Figure 9. In each figure, the upper panel shows the observed and forecasted curves, and the lower panel shows the forecasting errors between the observed and predicted solar radiation. The forecasting curves of the CBP model are clearly closer to the observed curves than those of the SVR, PGPR, and BSA-PGPR. In addition, the forecasting errors of the CBP fluctuate around zero within a narrow range and are markedly smaller than those of the comparison models, especially under extreme conditions. For the Spring dataset, for example, the residuals of the CBP remain relatively small between the 350th and 375th hours (marked with a blue ellipse in Figure 6). Over the same period, the PGPR clearly underestimates the observed values and the BSA-PGPR clearly overestimates them. The scatter plots in Figure 10 show that the forecasts of the CBP model are uniformly distributed around the 45° line. Across solar radiation values of different amplitudes, their spread is narrower than that of the other models, which graphically illustrates the superiority of the CBP model.

Figure 11 depicts the prediction intervals (PIs) at two confidence levels obtained by the proposed CBP model. For all datasets, only a small number of samples with extreme fluctuations fall outside the 90% PIs. Moreover, the PIs at both confidence levels are sufficiently narrow, indicating that the CBP model can produce applicable PIs.
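GPR yields a Gaussian predictive distribution at each time step, so a central PI at confidence level 1 − α is typically μ ± z_{1−α/2}σ. The sketch below assumes the CBP output for each hour can be summarized by such a mean μ and standard deviation σ. How the sub-mode predictive variances are aggregated is not described in this section. The function names are hypothetical. The sketch also computes the empirical coverage, i.e., the share of observations inside the PI. This is one way to quantify how many samples fall outside the 90% PIs.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def prediction_interval(
    mu: Sequence[float], sigma: Sequence[float], confidence: float
) -> Tuple[List[float], List[float]]:
    """Central PI of a Gaussian predictive distribution N(mu, sigma^2) at each time step."""
    # Two-sided standard normal quantile, about 1.645 for a 90% interval
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lower = [m - z * s for m, s in zip(mu, sigma)]
    upper = [m + z * s for m, s in zip(mu, sigma)]
    return lower, upper


def coverage(obs: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of observations that fall inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(obs, lower, upper) if lo <= y <= hi)
    return inside / len(obs)


# Illustrative toy values in W/m² (not data from the paper)
lo90, hi90 = prediction_interval([100.0, 200.0, 300.0], [10.0, 10.0, 10.0], 0.90)
print(coverage([105.0, 230.0, 295.0], lo90, hi90))  # 0.666..., since 230 falls outside
```

To judge the "sufficiently narrow" property alongside coverage, the mean interval width can be reported in the same units.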

4.2. Results of Multi-Hour Ahead Forecasting

In practical applications, multi-hour-ahead forecasts of solar radiation are also required because they greatly assist photovoltaic grid connection and power dispatching. Therefore, the CBP model is also used for two- and three-hour-ahead forecasting, and its results are compared with those of the other nine benchmark models. The two- and three-hour-ahead forecasting results are given in Table 3 and Table 4, respectively. The corresponding heatmaps of R, vertical histograms of RMSE, radar plots of MAE, and area graphs of MAPE are presented in Figure 12 and Figure 13, respectively. These results support the same main conclusions as in Section 4.1 and confirm the effectiveness and superiority of the proposed CBP model for multi-hour-ahead forecasting.
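This section does not state whether the multi-hour-ahead forecasts are produced directly or recursively. Purely as an illustration, the sketch below assumes the direct strategy. Under that strategy, a separate model is trained for each horizon h on input/target pairs whose target lies h hours after the most recent input. In the CBP, this would be applied to each CEEMDAN sub-mode, with the lags selected by PACF. The function `make_direct_samples` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_direct_samples(
    series: Sequence[float], lags: Sequence[int], horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs for direct `horizon`-step-ahead forecasting.

    The target is series[t]; the inputs are the values at t - horizon - (lag - 1)
    for each lag in `lags` (lag 1 = the most recent value available at the
    forecast origin t - horizon).
    """
    if horizon < 1 or not lags or min(lags) < 1:
        raise ValueError("horizon and all lags must be >= 1")
    first_t = horizon + max(lags) - 1  # earliest target with a complete input vector
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(first_t, len(series)):
        X.append([series[t - horizon - (lag - 1)] for lag in lags])
        y.append(series[t])
    return X, y


# Toy series 0, 1, ..., 9 with lags {1, 2} and a three-hour horizon
X, y = make_direct_samples(list(range(10)), lags=[1, 2], horizon=3)
print(X[0], y[0])  # [1, 0] 4
```

The direct strategy avoids feeding forecast errors back as inputs. The price is one trained model per horizon, and the number of usable samples shrinks as the horizon grows.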

5. Discussion

Building on the preceding comparison of the one- to three-hour-ahead forecasting results, this section further discusses two factors: how smoothing the raw data with CEEMDAN and how searching the optimal hyperparameters with BSA affect forecasting performance. Table 5 lists the improvements in R, RMSE, MAE, and MAPE of the models with the decomposition technique relative to those without it. All improvement percentages are positive, indicating that CEEMDAN has a significant positive impact on solar radiation forecasting. Compared with the BSA-PGPR model, the P_RMSE values of the proposed CBP model for the four datasets are 52.49%, 65.00%, 49.32%, and 47.93%, respectively, for one-hour-ahead forecasting. For three-hour-ahead forecasting, they are 37.04%, 62.96%, 44.63%, and 45.00%, respectively. By embedding CEEMDAN, the CBP therefore achieves markedly higher prediction accuracy than the BSA-PGPR, with an average RMSE reduction of approximately 50% (51.84% when averaged over the four datasets and three horizons in Table 5).
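The average RMSE reduction quoted above can be reproduced from Table 5. The sketch below copies the P_RMSE values of CBP vs. BSA-PGPR (the dictionary name is hypothetical) and averages them per horizon and over all twelve dataset/horizon combinations.

```python
# P_RMSE (%) of CBP vs. BSA-PGPR, copied from Table 5.
# Keys: datasets ("Autumn" is the Autumn/Fall row); values: 1-, 2-, 3-hour-ahead.
P_RMSE_CBP_VS_BSA = {
    "Spring": [52.49, 47.82, 37.04],
    "Summer": [65.00, 66.38, 62.96],
    "Autumn": [49.32, 53.54, 44.63],
    "Winter": [47.93, 49.94, 45.00],
}

# Mean over the four datasets for each forecasting horizon
horizon_means = [
    sum(row[h] for row in P_RMSE_CBP_VS_BSA.values()) / len(P_RMSE_CBP_VS_BSA)
    for h in range(3)
]
# Mean over all 12 dataset/horizon combinations
all_values = [v for row in P_RMSE_CBP_VS_BSA.values() for v in row]
overall_mean = sum(all_values) / len(all_values)

print([round(m, 1) for m in horizon_means])  # [53.7, 54.4, 47.4]
print(round(overall_mean, 2))                # 51.84
```

The per-horizon means show that the benefit of decomposition remains large at three hours ahead, though somewhat smaller than at one and two hours.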

Table 6 lists the four improvement percentages (P_R, P_RMSE, P_MAE, and P_MAPE) of the models with the optimization algorithm relative to those without it. The BSA-PGPR performs slightly worse than the PGPR in a handful of cases. These are three-hour-ahead forecasting for Spring (P_R and P_MAPE) and one-hour-ahead forecasting for Summer and Autumn (P_MAPE). In all other cases, the BSA-PGPR performs better. The R, RMSE, MAE, and MAPE of the CBP are superior to those of the CEN-PGPR in every case. This suggests that the positive effect of BSA is amplified when the decomposition technique is introduced, and it highlights the advantage of combining CEEMDAN and BSA.
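The cases in which BSA does not help can be located mechanically by scanning Table 6 for negative entries. The sketch below copies the values from Table 6 and lists every negative improvement for both comparisons. The dictionary and function names are hypothetical.

```python
from typing import Dict, List, Tuple

METRICS = ("P_R", "P_RMSE", "P_MAE", "P_MAPE")

# Table 6 values (%); per dataset, one row per metric, columns = 1-, 2-, 3-hour-ahead.
# "Autumn" is the Autumn/Fall row of the table.
BSA_PGPR_VS_PGPR = {
    "Spring": [[0.51, 0.29, -0.02], [13.58, 5.54, 3.17], [8.02, 4.02, 7.45], [5.93, 9.89, -2.68]],
    "Summer": [[0.35, 1.06, 2.62], [3.81, 5.06, 8.56], [7.62, 10.04, 9.77], [-9.43, 10.40, 3.81]],
    "Autumn": [[1.13, 5.87, 15.15], [8.60, 13.15, 19.81], [10.33, 13.29, 22.64], [-8.02, 1.67, 18.82]],
    "Winter": [[0.10, 0.74, 0.88], [1.04, 1.84, 0.92], [2.13, 1.86, 1.78], [6.94, 3.66, 11.07]],
}
CBP_VS_CEN_PGPR = {
    "Spring": [[0.07, 0.12, 0.08], [11.8, 7.05, 1.95], [2.63, 6.22, 2.65], [6.95, 3.86, 26.2]],
    "Summer": [[0.09, 0.64, 1.28], [11.66, 36.52, 33.59], [11.71, 41.78, 39.45], [5.1, 2.65, 21.31]],
    "Autumn": [[0.29, 0.66, 1.3], [5.09, 7.95, 9.63], [4.3, 2.24, 4.95], [3.61, 0.11, 8.51]],
    "Winter": [[0.21, 0.33, 0.83], [3.01, 1.73, 4.35], [3.39, 1.34, 3.18], [4.36, 18.63, 42.6]],
}


def negative_cases(table: Dict[str, List[List[float]]]) -> List[Tuple[str, str, int]]:
    """Return (dataset, metric, horizon in hours) for every negative improvement."""
    return [
        (dataset, METRICS[i], h + 1)
        for dataset, rows in table.items()
        for i, row in enumerate(rows)
        for h, value in enumerate(row)
        if value < 0
    ]


print(negative_cases(BSA_PGPR_VS_PGPR))
# [('Spring', 'P_R', 3), ('Spring', 'P_MAPE', 3), ('Summer', 'P_MAPE', 1), ('Autumn', 'P_MAPE', 1)]
print(negative_cases(CBP_VS_CEN_PGPR))  # []
```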

6. Conclusions

This study develops a compound framework based on an improved GPR and CEEMDAN (called CBP) for multi-step hourly solar radiation prediction and verifies its forecasting ability. In the improved GPR, a newly designed physical-based kernel is introduced to enhance forecasting performance, and BSA is applied to optimize the hyperparameters of the physical-based GPR (PGPR). To verify the superiority of the proposed model, nine comparison models are established, namely BP, GRNN, RBF, ELM, SVR, GPR-SE, PGPR, BSA-PGPR, and CEN-PGPR. The main findings of this study are summarized as follows:

(1): The proposed CBP model provided the best results, with the highest R and the lowest RMSE, MAE, and MAPE, which attests to the excellent forecasting ability of the CBP relative to the other comparative models;
(2): The forecasting results of the PGPR model are close to or better than those of the GPR-SE model, indicating that the newly designed kernel function enhances the interpretability of the GPR model while maintaining accuracy;
(3): The proposed PGPR model outperformed the widely used standalone GRNN, RBF, ELM, and SVR models in most cases and was competitive with the BP model, which supports its use as the base predictor of the CBP model;
(4): Using BSA to search for appropriate PGPR hyperparameters, together with CEEMDAN to extract detailed information from the raw series, enhances the forecasting ability of the CBP. Using BSA alone to optimize the PGPR hyperparameters does not always achieve ideal forecasting performance.

In summary, the proposed CBP model is a viable option for improving solar radiation forecasting. In the future, the prediction ability of the CBP can be further validated with solar radiation data at other time scales and from other locations. The CBP model will also be applied to other kinds of data, such as wind energy data.

Author Contributions

Conceptualization, N.S., T.P. and N.Z.; methodology, N.S.; validation, T.P. and S.Z.; writing—original draft preparation, N.S.; writing—review and editing, J.J., W.J., X.H. and T.P.; funding acquisition, N.S. and N.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of the Jiangsu Higher Education Institution of China (Nos. 20KJD480003, 19KJB480007, and 19KJB470012), the Natural Science Foundation of Jiangsu Province (No. BK20201069), the National Natural Science Foundation of China (Nos. 91547208 and 51909010), and the Jiangsu Innovative and Entrepreneurial Talents Project (JSSCBS(2020)31035).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Special thanks are given to the handling editor and two anonymous reviewers for their constructive comments, and to our colleague Muhammad Shahzad Nazir for English grammar editing, all of which helped us greatly improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Nomenclature

AnEn	Analog Ensemble
ANN	Artificial neural network
AR	Autoregressive
ARIMA	Auto-regressive integrated moving average
ARMA	Autoregressive moving average
BP	Back propagation neural network
BSA	Backtracking search optimization algorithm
CEEMD	Complementary EEMD
CEEMDAN	Complete EEMD with adaptive noise
DWT	Discrete wavelet transform
EEMD	Ensemble EMD
ELM	Extreme learning machine
EMD	Empirical mode decomposition
EWT	Empirical wavelet transform
GA	Genetic algorithms
GPR	Gaussian process regression
GRNN	Generalized regression neural network
IMF	Intrinsic mode function
MAE	Mean absolute error
MAPE	Mean absolute percentage error
ML	Machine learning
NWP	Numerical weather prediction
PACF	Partial autocorrelation coefficient function
PGPR	Gaussian process regression with physical-based kernel function
PSO	Particle swarm optimization
R	Pearson’s correlation coefficient
RBF	Radial basis function neural network
RF	Random forests
RMSE	Root mean square error
SVM/SVR	Support vector machine/Support vector regression
VMD	Variational mode decomposition

References

1. Murdock, H.E.; Gibb, D.; André, T.; Sawin, J.L.; Brown, A.; Ranalder, L.; Collier, U.; Dent, C.; Epp, B.; Hareesh Kumar, C.; et al. Renewables 2021-Global Status Report; UN Environment Programme: Paris, France, 2021.
2. Liu, Y.; Qin, H.; Zhang, Z.; Pei, S.; Wang, C.; Yu, X.; Jiang, Z.; Zhou, J. Ensemble spatiotemporal forecasting of solar irradiation using variational Bayesian convolutional gate recurrent unit network. Appl. Energy 2019, 253, 113596.
3. Sun, S.; Wang, S.; Zhang, G.; Zheng, J. A decomposition-clustering-ensemble learning approach for solar radiation forecasting. Sol. Energy 2018, 163, 189–199.
4. Dong, J.; Olama, M.M.; Kuruganti, T.; Melin, A.M.; Djouadi, S.M.; Zhang, Y.; Xue, Y. Novel stochastic methods to predict short-term solar radiation and photovoltaic power. Renew. Energy 2020, 145, 333–346.
5. Niu, D.; Wang, K.; Sun, L.; Wu, J.; Xu, X. Short-term photovoltaic power generation forecasting based on random forest feature selection and CEEMD: A case study. Appl. Soft Comput. 2020, 93, 106389.
6. Huang, J.; Korolkiewicz, M.; Agrawal, M.; Boland, J. Forecasting solar radiation on an hourly time scale using a Coupled AutoRegressive and Dynamical System (CARDS) model. Sol. Energy 2013, 87, 136–149.
7. Sun, H.; Yan, D.; Zhao, N.; Zhou, J. Empirical investigation on modeling solar radiation series with ARMA–GARCH models. Energy Convers. Manag. 2015, 92, 385–395.
8. David, M.; Ramahatana, F.; Trombe, P.-J.; Lauret, P. Probabilistic forecasting of the solar irradiance with recursive ARMA and GARCH models. Sol. Energy 2016, 133, 55–72.
9. Alsharif, M.H.; Younes, M.K.; Kim, J. Time series ARIMA model for prediction of daily and monthly average global solar radiation: The case study of Seoul, South Korea. Symmetry 2019, 11, 240.
10. Ding, S.; Li, R.; Tao, Z. A novel adaptive discrete grey model with time-varying parameters for long-term photovoltaic power generation forecasting. Energy Convers. Manag. 2021, 227, 113644.
11. Reddy, K.S.S.; Ranjan, M. Solar resource estimation using artificial neural networks and comparison with other correlation models. Energy Convers. Manag. 2003, 44, 2519–2530.
12. Zeng, J.; Qiao, W. Short-term solar power prediction using a support vector machine. Renew. Energy 2013, 52, 118–127.
13. Al-Dahidi, S.; Ayadi, O.; Adeeb, J.; Alrbai, M.; Qawasmeh, B.R. Extreme learning machines for solar photovoltaic power predictions. Energies 2018, 11, 2725.
14. Zhang, Y.; Cui, N.; Feng, Y.; Gong, D.; Hu, X. Comparison of BP, PSO-BP and statistical models for predicting daily global solar radiation in arid Northwest China. Comput. Electron. Agric. 2019, 164, 104905.
15. Yagli, G.M.; Yang, D.; Srinivasan, D. Automatic hourly solar forecasting using machine learning models. Renew. Sustain. Energy Rev. 2019, 105, 487–498.
16. Moreira, M.O.; Balestrassi, P.P.; Paiva, A.P.; Ribeiro, P.F.; Bonatto, B.D. Design of experiments using artificial neural network ensemble for photovoltaic generation forecasting. Renew. Sustain. Energy Rev. 2021, 135, 110450.
17. Yujing, S.; Fei, W.; Zhao, Z.; Zengqiang, M.; Chun, L.; Bo, W.; Jing, L. Research on Short-Term Module Temperature Prediction Model Based on BP Neural Network for Photovoltaic Power Forecasting. In Proceedings of the 2015 IEEE Power & Energy Society General Meeting, Denver, CO, USA, 26–30 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–5.
18. Benali, L.; Notton, G.; Fouilloy, A.; Voyant, C.; Dizene, R. Solar radiation forecasting using artificial neural network and random forest methods: Application to normal beam, horizontal diffuse and global components. Renew. Energy 2019, 132, 871–884.
19. Cervone, G.; Clemente-Harding, L.; Alessandrini, S.; Delle Monache, L. Short-term photovoltaic power forecasting using Artificial Neural Networks and an Analog Ensemble. Renew. Energy 2017, 108, 274–286.
20. Feng, Y.; Hao, W.; Li, H.; Cui, N.; Gong, D.; Gao, L. Machine learning models to quantify and map daily global solar radiation and photovoltaic power. Renew. Sustain. Energy Rev. 2020, 118, 109393.
21. Sun, N.; Zhang, S.; Peng, T.; Zhou, J.; Sun, X. A Composite Uncertainty Forecasting Model for Unstable Time Series: Application of Wind Speed and Streamflow Forecasting. IEEE Access 2020, 8, 209251–209266.
22. Sun, N.; Zhang, S.; Peng, T.; Zhang, N.; Zhou, J.; Zhang, H. Multi-Variables-Driven Model Based on Random Forest and Gaussian Process Regression for Monthly Streamflow Forecasting. Water 2022, 14, 1828.
23. Zhu, S.; Luo, X.; Xu, Z.; Ye, L. Seasonal streamflow forecasts using mixture-kernel GPR and advanced methods of input variable selection. Hydrol. Res. 2018, 50, 200–214.
24. Alali, Y.; Harrou, F.; Sun, Y. Optimized Gaussian Process Regression by Bayesian Optimization to Forecast COVID-19 Spread in India and Brazil: A Comparative Study. In Proceedings of the 2021 International Conference on ICT for Smart Society (ICISS), Bandung, Indonesia, 2–4 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
25. Sun, N.; Zhou, J.; Chen, L.; Jia, B.; Tayyab, M.; Peng, T. An adaptive dynamic short-term wind speed forecasting model using secondary decomposition and an improved regularized extreme learning machine. Energy 2018, 165, 939–957.
26. Ghimire, S.; Deo, R.C.; Raj, N.; Mi, J. Wavelet-based 3-phase hybrid SVR model trained with satellite-derived predictors, particle swarm optimization and maximum overlap discrete wavelet transform for solar radiation prediction. Renew. Sustain. Energy Rev. 2019, 113, 109247.
27. Li, F.-F.; Wang, S.-Y.; Wei, J.-H. Long term rolling prediction model for solar radiation combining empirical mode decomposition (EMD) and artificial neural network (ANN) techniques. J. Renew. Sustain. Energy 2018, 10, 013704.
28. Zhang, J.; Zhang, Q.; Li, G.; Wu, J.; Wang, C.; Li, Z. Hybrid Model for Renewable Energy and Load Forecasting Based on Data Mining and EWT. J. Electr. Eng. Technol. 2022, 17, 1517–1532.
29. Jiang, Y.; Zheng, L.; Ding, X. Ultra-short-term prediction of photovoltaic output based on an LSTM-ARMA combined model driven by EEMD. J. Renew. Sustain. Energy 2021, 13, 046103.
30. Zhang, X.; Wei, Z. A Hybrid Model Based on Principal Component Analysis, Wavelet Transform, and Extreme Learning Machine Optimized by Bat Algorithm for Daily Solar Radiation Forecasting. Sustainability 2019, 11, 4138.
31. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 4144–4147.
32. Schulz, E.; Speekenbrink, M.; Krause, A. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. J. Math. Psychol. 2018, 85, 1–16.
33. Civicioglu, P. Backtracking Search Optimization Algorithm for numerical optimization problems. Appl. Math. Comput. 2013, 219, 8121–8144.
34. Zhang, C.; Peng, T.; Nazir, M.S. A novel hybrid approach based on variational heteroscedastic Gaussian process regression for multi-step ahead wind speed forecasting. Int. J. Electr. Power Energy Syst. 2022, 136, 107717.
35. Peng, T.; Zhang, C.; Zhou, J.; Nazir, M.S. An integrated framework of Bi-directional long-short term memory (BiLSTM) based on sine cosine algorithm for hourly solar radiation forecasting. Energy 2021, 221, 119887.
36. Cao, E.; Bao, T.; Gu, C.; Li, H.; Liu, Y.; Hu, S. A Novel Hybrid Decomposition—Ensemble Prediction Model for Dam Deformation. Appl. Sci. 2020, 10, 5700.

Figure 1. Implementation of the proposed CBP model.

Figure 2. Historical solar radiation.

Figure 3. Subseries obtained by CEEMDAN for Spring dataset.

Figure 4. Correlogram with 95% CI for the raw data and its decomposed subseries (for Spring).

Figure 5. R, RMSE, MAE, and MAPE (left to right) for one-hour-ahead forecasting.

Figure 6. One-hour-ahead forecasting results of the SVR, PGPR, BSA-PGPR, and CBP for Spring.

Figure 7. One-hour-ahead forecasting results of the SVR, PGPR, BSA-PGPR, and CBP for Summer.

Figure 8. One-hour-ahead forecasting results of the SVR, PGPR, BSA-PGPR, and CBP for Autumn.

Figure 9. One-hour-ahead forecasting results of the SVR, PGPR, BSA-PGPR, and CBP for Winter.

Figure 10. Scatter plots of the SVR, PGPR, BSA-PGPR, and CBP for one-hour-ahead forecasting.

Figure 11. Interval prediction results of the CBP model for one-hour-ahead forecasting.

Figure 12. R, RMSE, MAE and MAPE (left to right) for two-hour-ahead forecasting.

Figure 13. R, RMSE, MAE and MAPE (left to right) for three-hour-ahead forecasting.

Table 1. Statistics for all datasets.

Datasets	Ave. (W/m²)	Std. (W/m²)	Cv.	Cs.	Max. (W/m²)	Min. (W/m²)	Sample Size	Data Period
Spring	526.4	340.0	0.65	−0.13	1029	1	1327	2017/3/20~2017/6/20
Summer	483.0	326.7	0.68	−0.03	1033	1	1347	2017/6/21~2017/9/21
Autumn	337.4	224.3	0.66	0.08	804	1	998	2017/9/22~2017/12/20
Winter	300.2	216.9	0.72	0.36	817	1	991	2017/12/21~2018/3/19

Table 2. Evaluation indices for one-hour-ahead forecasting.

Datasets	Metrics	BP	GRNN	RBF	ELM	SVR	GPR-SE	PGPR	BSA-PGPR	CEN-PGPR	CBP
Spring	R	0.9879	0.9735	0.9747	0.9764	0.9814	0.9880	0.9831	0.9881	0.9965	0.9973
	RMSE (W/m²)	54	85	81	79	67	54	63	54	29	26
	MAE (W/m²)	31	66	60	59	42	31	34	31	17	17
	MAPE (%)	1.42	3.14	3.05	2.93	1.97	2.15	2.32	2.18	1.02	0.95
Summer	R	0.9790	0.8785	0.9464	0.9518	0.9759	0.8822	0.9728	0.9762	0.9960	0.9969
	RMSE (W/m²)	64	177	109	103	69	159	73	71	28	25
	MAE (W/m²)	41	115	77	75	44	92	49	46	19	17
	MAPE (%)	1.43	5.30	3.66	4.11	1.25	5.59	1.52	1.66	0.83	0.79
Autumn	R	0.9420	0.8491	0.8633	0.9027	0.9272	0.8922	0.9300	0.9405	0.9829	0.9857
	RMSE (W/m²)	60	111	92	81	71	84	67	62	33	31
	MAE (W/m²)	43	92	68	64	54	59	49	44	24	23
	MAPE (%)	0.52	1.00	1.08	0.71	0.60	0.85	0.60	0.64	0.31	0.30
Winter	R	0.9202	0.8663	0.8532	0.8486	0.8928	0.9235	0.9246	0.9255	0.9796	0.9817
	RMSE (W/m²)	100	135	144	134	113	97	96	95	51	50
	MAE (W/m²)	77	108	118	109	85	73	73	71	39	37
	MAPE (%)	3.41	4.18	7.47	5.93	3.42	4.35	4.01	3.73	2.00	1.92

Table 3. Evaluation indices for two-hour-ahead forecasting.

Datasets	Metrics	BP	GRNN	RBF	ELM	SVR	GPR-SE	PGPR	BSA-PGPR	CEN-PGPR	CBP
Spring	R	0.9718	0.9478	0.9579	0.9486	0.9436	0.9750	0.9723	0.9751	0.9915	0.9927
	RMSE (W/m²)	86	117	105	114	116	84	86	81	46	43
	MAE (W/m²)	57	95	83	92	84	59	56	54	31	29
	MAPE (%)	3.70	3.44	3.69	6.06	6.34	3.56	3.62	3.26	1.90	1.83
Summer	R	0.9480	0.8111	0.8940	0.8998	0.9292	0.8488	0.9336	0.9435	0.9864	0.9927
	RMSE (W/m²)	109	219	150	149	127	183	116	110	51	37
	MAE (W/m²)	73	141	107	111	91	122	83	74	38	27
	MAPE (%)	3.76	6.04	7.04	5.59	3.71	7.43	4.04	3.62	1.39	1.36
Autumn	R	0.8367	0.7049	0.7769	0.7868	0.7458	0.8015	0.7988	0.8457	0.9633	0.9697
	RMSE (W/m²)	98	152	116	119	134	110	112	97	49	45
	MAE (W/m²)	73	115	86	96	107	82	84	73	36	35
	MAPE (%)	0.96	1.44	1.29	1.12	1.14	1.27	1.07	1.05	0.52	0.52
Winter	R	0.8254	0.7612	0.7267	0.7245	0.7369	0.8299	0.8346	0.8408	0.9622	0.9654
	RMSE (W/m²)	145	170	176	177	170	142	139	137	70	68
	MAE (W/m²)	116	141	147	146	139	115	112	110	53	53
	MAPE (%)	5.03	4.91	8.50	5.97	7.55	5.59	5.06	4.88	3.10	2.61

Table 4. Evaluation indices for three-hour-ahead forecasting.

Datasets	Metrics	BP	GRNN	RBF	ELM	SVR	GPR-SE	PGPR	BSA-PGPR	CEN-PGPR	CBP
Spring	R	0.9497	0.9242	0.9383	0.9173	0.9040	0.9571	0.9587	0.9585	0.9805	0.9813
	RMSE (W/m²)	112	139	132	139	147	112	110	106	68	67
	MAE (W/m²)	78	116	108	117	117	85	84	78	48	47
	MAPE (%)	2.27	3.58	4.33	5.89	6.31	2.97	3.00	3.08	2.46	1.95
Summer	R	0.9046	0.7885	0.8486	0.8460	0.8589	0.8152	0.8806	0.9037	0.9719	0.9845
	RMSE (W/m²)	144	236	184	182	179	207	160	147	73	54
	MAE (W/m²)	100	158	135	141	138	145	117	105	56	40
	MAPE (%)	6.13	7.29	8.49	7.54	7.52	8.07	6.77	6.52	2.30	1.90
Autumn	R	0.7693	0.6666	0.7617	0.6781	0.5326	0.7781	0.6881	0.7924	0.9314	0.9437
	RMSE (W/m²)	115	152	116	141	175	113	137	110	67	61
	MAE (W/m²)	88	119	91	118	148	85	107	83	50	48
	MAPE (%)	0.88	1.40	1.21	1.37	1.67	1.19	1.37	1.12	0.76	0.70
Winter	R	0.7159	0.6372	0.6747	0.6224	0.5339	0.7258	0.7341	0.7406	0.9268	0.9346
	RMSE (W/m²)	177	198	191	202	219	173	171	169	97	93
	MAE (W/m²)	144	168	163	167	180	144	141	139	79	76
	MAPE (%)	6.08	6.61	8.13	6.03	9.08	6.11	5.27	4.69	5.03	3.53

Table 5. Improvements of models with decomposition technique relative to that without decomposition technique.

Datasets	Improvements (%)	CEN-PGPR vs. PGPR			CBP vs. BSA-PGPR
		1-hour	2-hour	3-hour	1-hour	2-hour	3-hour
Spring	P_R	1.37	1.97	2.27	0.93	1.80	2.37
	P_RMSE	54.09	47.24	37.84	52.49	47.82	37.04
	P_MAE	47.85	44.63	42.30	44.75	45.69	39.26
	P_MAPE	56.06	47.44	17.87	56.32	43.84	36.61
Summer	P_R	2.39	5.66	10.37	2.13	5.22	8.94
	P_RMSE	62.41	56.42	54.76	65.00	66.38	62.96
	P_MAE	61.17	54.03	52.31	62.37	63.96	62.10
	P_MAPE	45.72	65.58	66.00	52.80	62.57	70.86
Autumn	P_R	5.69	20.60	35.35	4.81	14.67	19.10
	P_RMSE	51.32	56.45	51.32	49.32	53.54	44.63
	P_MAE	51.85	56.81	53.36	48.52	51.28	42.56
	P_MAPE	47.99	51.74	44.65	53.53	50.87	37.16
Winter	P_R	5.96	15.30	26.25	6.08	14.82	26.20
	P_RMSE	46.91	50.00	43.13	47.93	49.94	45.00
	P_MAE	46.67	52.41	44.30	47.30	52.15	45.04
	P_MAPE	50.16	38.81	4.59	48.68	46.46	24.76

Table 6. Improvements of models with BSA optimization relative to those without BSA optimization.

Datasets	Improvements (%)	BSA-PGPR vs. PGPR			CBP vs. CEN-PGPR
		1-hour	2-hour	3-hour	1-hour	2-hour	3-hour
Spring	P_R	0.51	0.29	−0.02	0.07	0.12	0.08
	P_RMSE	13.58	5.54	3.17	11.8	7.05	1.95
	P_MAE	8.02	4.02	7.45	2.63	6.22	2.65
	P_MAPE	5.93	9.89	−2.68	6.95	3.86	26.2
Summer	P_R	0.35	1.06	2.62	0.09	0.64	1.28
	P_RMSE	3.81	5.06	8.56	11.66	36.52	33.59
	P_MAE	7.62	10.04	9.77	11.71	41.78	39.45
	P_MAPE	−9.43	10.40	3.81	5.1	2.65	21.31
Autumn	P_R	1.13	5.87	15.15	0.29	0.66	1.3
	P_RMSE	8.60	13.15	19.81	5.09	7.95	9.63
	P_MAE	10.33	13.29	22.64	4.3	2.24	4.95
	P_MAPE	−8.02	1.67	18.82	3.61	0.11	8.51
Winter	P_R	0.10	0.74	0.88	0.21	0.33	0.83
	P_RMSE	1.04	1.84	0.92	3.01	1.73	4.35
	P_MAE	2.13	1.86	1.78	3.39	1.34	3.18
	P_MAPE	6.94	3.66	11.07	4.36	18.63	42.6

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sun, N.; Zhang, N.; Zhang, S.; Peng, T.; Jiang, W.; Ji, J.; Hao, X. An Integrated Framework Based on an Improved Gaussian Process Regression and Decomposition Technique for Hourly Solar Radiation Forecasting. Sustainability 2022, 14, 15298. https://doi.org/10.3390/su142215298
