Evaluating the Potential of Gaussian Process Regression for Solar Radiation Forecasting: A Case Study

: The proliferation of solar power systems could cause instability within existing power grids due to the variable nature of solar power. A well-deﬁned statistical model is important for managing the supply-and-demand dynamics of a power system that contains a signiﬁcant variable renewable energy component. It is furthermore important to consider the inherent uncertainty in the data when modeling such a complex power system. Gaussian process regression has the potential to address both of these concerns: the probabilistic modeling of solar radiation data could assist in managing the variability of solar power, as well as provide a mechanism to deal with uncertainty. In this paper, solar radiation data was obtained from the Southern African Universities Radiometric Network and used to train a Gaussian process regression model which was developed especially for this purpose. Attention was given to constructing an appropriate Gaussian process kernel. It was found that a carefully constructed kernel allowed for the successful interpolation of global horizontal irradiance data, with a root-mean-squared error of 82.2W/m 2 . Gaps in the data, due to possible meter failure, were also bridged by the Gaussian process with a root-mean-squared error of 94.1 W/m 2 and accompanying conﬁdence intervals. A root-mean-squared error of 151.1 W/m 2 was found when forecasting the global horizontal irradiance with a forecasting horizon of ﬁve days. These results, achieved in modeling solar radiation data using Gaussian process regression, could open new avenues in the development of probabilistic renewable energy management systems. Such systems could aid smart grid operators and support energy trading platforms, by allowing for better-informed decisions that incorporate the inherent uncertainty of stochastic power systems.


Introduction
The variability of renewable energy resources, such as solar and wind, poses a challenge to the stability of the electricity grid. One way of mitigating this challenge is to use grid-scale energy storage systems such as lithium-ion, lead-acid and redox flow batteries as well as molten salt storage [1]. Molten salt energy storage is commonly used in conjunction with concentrated solar power (CSP) plants to store heat and allow these plants to deliver energy in the evening (when the sun has set and energy demand is typically high) [2]. Pumped hydro storage is also used to store large amounts of energy and can be used when the network needs sudden support during times of peak energy consumption [3]. In recent years, hydrogen has garnered a lot of attention as a storage medium for renewable energy due to diverse production sources [4]. In such a storage system, hydrogen (preferably produced from renewable energy sources and accordingly called 'green hydrogen') is converted to electrical energy and water via a fuel cell [4].
An energy system containing a large component of variable renewable energy generation as well as energy storage, requires a robust energy management system to ensure that energy is dispatched when and where needed, in the most cost-effective way, while taking into account the variability of renewable energy resources such as solar and wind [5]. Such a management system could possibly gain from the probabilistic modeling and forecasting of renewable energy resource behavior. Gaussian process regression could form part of this solution by allowing energy management entities to make better-informed decisions. The Energy [R]evolution 2010 scenario, developed by Teske et al. [6], requires renewable energy generation to constitute 40% of the global primary energy supply by the year 2030 in order to reach the climate goals set out in the Paris Agreement. Furthermore, according to the European Commission's Energy Roadmap 2050, the share of renewable energy sources in the gross final energy consumption in the EU could top 55% by the middle of the century. As of 2017 12.1% of electricity produced worldwide were from renewable sources [7]. Finding solutions to the variability problem are therefore important. According to Bumpus and Comello [8] an average annual investment in renewable energy of US$1 trillion is required until the year 2050 in order to reach the goals of the Paris Agreement. Much of the current investment in renewable energy technology is underpinned by subsidies, which cannot be taken for granted going forward [9]. An important component of a climate change mitigation strategy should therefore be energy efficiency [9]. The proliferation of energy system data, combined with improved computational power, enable better management of complex and stochastic renewable energy systems, resulting in improved energy efficiency [10]. Gaussian process regression might therefore play a role in improving the efficiency of renewable energy systems.
The success of data analysis in grid management strategies is dependant on the quality of the energy system data [11]. There is inherent uncertainty in energy system data which affects its quality [12]. It is therefore important to define the uncertainty in such a way that its impact on data quality can be mitigated. The probabilistic nature of Gaussian process models allows for uncertainty to be well-defined [13]. Finally, determining the parameters that govern an energy system that has a high sensitivity to complex exogenous parameters, such as an energy system with a significant renewable energy component, can be expensive and time-consuming. Gaussian process regression allows for the parameters to be learned by a machine without having in-depth knowledge of the underlying energy system operation. This leads to more robust predictions of energy production [14] and improved confidence in energy system models [13].
State-of-the-art forecasting techniques for power supply and demand include numerical weather prediction (NWP), autoregression (AR), moving average (MA), autoregression moving average (ARMA), Kalman filter, artificial neural network (ANN), support vector regression (SVR) and genetic algorithm [15]. An energy forecasting method for buildings using Gaussian process regression has been introduced by Prakash et al. [16]. They have shown that Gaussian process regression produces more accurate forecasts when compared to other state-of-the-art forecasting algorithms. Gaussian process quantile regression (GPQR) is proposed by Yang et al. [11] for the formulation and prediction of the power load in a smart microgrid. The prediction performance of Gaussian process regression has been quantified by Wågberg et al. [17]. Kamath et al. [18] fitted a potential energy surface for formaldehyde using both neural networks and Gaussian process regression and then compared the errors. The vibrational spectra computed from the fitted potential energy surface were also compared to a reference vibrational spectrum for formaldehyde, in order to assess the accuracy of both methods. The Gaussian process regression was done using the Python scikit learn and different kernel functions were tested [18]. The study by Kamath et al. [18] highlights some of the advantages of Gaussian process regression over neural networks. The fitting error was found to be smaller for Gaussian process regression, as given by the root-mean-squared error, while the neural network required more data to achieve a similar error. A smaller number of points on the potential energy surface is needed for quantum dynamics calculations with Gaussian processes than is the case for neural networks, while Gaussian process regression is easier to use than neural networks. Overfitting is not as much of a concern with Gaussian process regression as it is with neural networks [18]. Gaussian Processes for Machine Learning [19] provides a unified account of the application of Gaussian process models in machine learning.
Tolba et al. [20] used Online Gaussian Process Regression and Online Sparse Gaussian Process Regression to forecast global horizontal irradiance (GHI) over time horizons ranging between 30 min and 48 h. By carefully considering kernel selection, Tolba et al. [20] have furthermore compared the performance of different kernels (simple as well as compound kernels) for application in GHI data modelling. A thorough investigation into the effect of different kernels on the performance of the model was done. Tolba et al. [20] have found that quasi-quadratic kernels are particularly useful for modelling and forecasting GHI data and suggest that a periodic component should be present in a kernel for modelling the global structure of GHI data, while a random component should be used for modelling local variation due to atmospheric disturbances. This result of Tolba et al. [20] has been confirmed in the current paper.
Prediction by making use of Gaussian processes originated with Kolmogorov (1941) and Wiener (1949). It has also found use in the field of geostatistics where it was first used by the statistician and mining engineer Danie G. Krige for the valuation of new gold mines using a limited number of boreholes [21]. It was later realized that Gaussian process regression could be used for prediction within a more general, multivariate setting [19]. It is in this multidimensional framework that Gaussian process regression will be applied in this paper.
Rasmussen and Williams [19] give important insight into the relation between Gaussian process regression and artificial neural networks. Lee et al. [22] and Matthews et al. [23] studied the similarity between infinitely wide neural networks and Gaussian processes. Novak et al. [24] expanded this neural network equivalent Gaussian process for multi-layered convolution neural networks. Gaussian process regression is more convenient to handle and interpret than neural networks [19]. Furthermore, according to Rasmussen and Williams [19], a model with a finite-dimensional parameter vector (such as artificial neural networks) will not be universally consistent. This limitation is not present with non-parametric models such as Gaussian processes. Semi-parametric models, such neural networks with a dynamic set of hidden units, is an intermediate solution to the problem of universal consistency. Another factor to consider when comparing neural networks to Gaussian process regression is the occurrence of local optima when optimizing the model: with artificial neural networks local optima can occur, while this is not the case with Gaussian process regression since its posterior is convex [19]. This paper is based on a thesis titled "Evaluating the Potential of Gaussian Process Regression for Data-driven Renewable Energy Management" (the full thesis is available at http://hdl.handle.net/ 10019.1/107207) [25]. It was found that Gaussian process regression can successfully interpolate and predict solar radiation data, on the condition that an appropriate kernel is constructed. The application of Gaussian process regression to solar radiation data could be valuable for the successful integration of variable renewable energy resources into the grid.

Gaussian Process Regression
For this paper, Gaussian process regression was done by considering inference directly in function space (an equivalent way of achieving the same result would be to consider Gaussian process regression in weight-space). Imagine a function f (x) to be an infinitely long vector, with each entry representing an instance of f (x) at an input x. By considering the instances in the vector to be properties of a stochastic process, the properties of the function f (x) can be inferred by the Gaussian process based on only a finite number of points. In order to do this, the mean and covariance of a given training set are determined as functions of the position of a data point within the data set. The covariance is encoded by making use of an appropriate kernel that employs hyperparameters to describe the relationship between data points in the set. When interpolation and prediction is done, the learnt mean and covariance are used to specify a distribution of possible outputs-p(y * |x * , X, y)-for a specific input, x * , given the training set (X, y). This process is illustrated in Figure 1 [19]. x 1 to x c and y 1 to y c form the training set, while (x * , y * ) forms the test set. The Gaussian field represents the Gaussian distribution over functions.
Due to the probabilistic nature of the Gaussian process, measurement uncertainty in the training data set is well defined. However, interpolation errors could also be modelling-related. A large condition number for the covariance matrix could cause an error in the matrix computations. Scikit-learn's GaussianProcessRegressor [26], which was used for this paper, takes care of this problem by adding value to each entry in the diagonal of the covariance matrix, thus ensuring that the matrix is positive definite.
Three kernels are of importance in this paper.

Periodic Kernel
The periodic kernel (Per) is a stationary covariance function given by with p the period and l the characteristic length scale. The periodic kernel can be used to model functions having a repetitive pattern [27]. Figure 2 illustrates the prior and posterior of a periodic kernel and was obtained by adapting code provided by [26].

RBF Kernel
The radial basis function (RBF) kernel is given by with l being the characteristic length scale, which can be tuned to specify the precise shape of the covariance function, and σ 2 f being a constant noise function. The RBF kernel is infinitely differentiable and is therefore handy when modeling the characteristic of smoothness of a function [26]. The RBF kernel could be used to represent a local variation within a dataset [28]. It is the most widely-used kernel [19]. Figure 3 illustrates a prior and posterior constructed by using the RBF kernel and was obtained by adapting code provided by [26].

Rational Quadratic Kernel
The rational quadratic function (RQ) is given by where l is the characteristic length scale. The hyperparameter α provides the scale mixture, which allows the rational quadratic function to act like an infinite sum of RBF kernels with different length scales. Figure 4 illustrates a prior and posterior obtained using the rational quadratic function as kernel. The figure was created by adapting code provided by [26].

Methodology
Weather data for the week 1 February 2015 to 8 February 2015 was acquired from the Stellenbosch weather station of the Southern African Universities Radiometric Network (Sauran) [29].

Illustrating the Possible Effect of Interval Deficiency on Weather Data
The weather data is averaged over 1 h periods for this study. This diminishes the effect of short term variability in atmospheric conditions. As Tolba et al. [20] point out, the effect of short-term variability is an important consideration when constructing a kernel. To illustrate the effect of sampling interval on data characteristics, wind speed data from the Stellenbosch weather station was sampled at different intervals, ranging from 1 h to 12 h. A Gaussian process regression was done for each of the wind speed data sets. This is illustrated in Figure 5-note how short-term variability is lost as the sampling interval increases, as well as how the confidence bounds of the Gaussian process regression become wider. For each sampling frequency, the Weibull shape and scale parameters were calculated from the Gaussian process interpolation of the wind speed and compared to reference values. The reference values for the Weibull shape and scale parameters were calculated by using wind speed data averaged every 10 min, in accordance with the IEC 61400-12-1:2005(E) standard of the International Electrotechnical Commission. The degree to which the wind speed data interpolation deviates from the real behavior of the wind resource, as the interval deficiency in the data becomes larger, is measured by the percentage deviation of the Weibull parameters from the reference values. The results are given in Table 1.  Table 1. The impact of the interval deficiency in wind speed data on the Weibull parameters, compared to wind speed data average over 10 min periods.  Table 2 lists the weather metrics available from the Sauran Stellenbosch weather station [29]. Figure 6 includes the plots of the weather metrics, averaged over 1 h periods for the week 1 February 2015 to 8 February 2015, that were used to train the Gaussian process regression algorithm [25]. The Gaussian process algorithm will therefore use these metrics to construct a covariance matrix that represents the degree of correlation between the various weather metrics. The following metrics are expected to be strongly correlated: GHI, DNI, DHI, UVA, UVB and air temperature. The other metrics may exhibit weak correlation-whether this is the case will be determined by the Gaussian process regression algorithm. The stochastic behavior of the parameters calls for Gaussian process regression.  The weather data for the week 1 February 2015 to 8 February 2015 was used to train a Gaussian process regression algorithm. A standard Gaussian process regression was employed, with a multi-in-single-out structure. This means that the Gaussian process regressor was trained on a multi-dimensional array of input data, but that interpolation and prediction was only done for a single output, namely GHI. Alternative structures for a Gaussian process regression model include online Gaussian process regression (OGPR) and online sparse Gaussian process regression (OSGPR), both of which have been investigated within the context of GHI forecasting by Tolba et al. [20].

Gaussian Process Regression on GHI Data
A compound kernel, Per × RQ, was utilized to construct the covariance matrix. The choice of kernel was arrived at after testing different kernel configurations, such as Per, RQ, Per × RQ and RBF × (RQ + Per). The use of the Per × RQ kernel was also informed by the characteristics of the respective kernels, as described in Sections 2.1-2.3. The algorithm was coded in Jupyter Notebook using the Python libraries NumPy, pandas, scikit-learn and Matplotlib and is illustrated schematically in Figure 7. All code was implemented on an Intel Core i7-5600U 2.60 GHz CPU with two cores and four logical processors.
In order to test the ability of the algorithm to bridge gaps in the data, meter failure was simulated by removing entries within the data set from and including 2 February 2015 09:00 to 2 February 2015 16:00, as well as from and including 3 February 2015 10:00 to 3 February 2015 12:00. The resulting training data set consisted of 137 hourly data points between t 0 = 0 hours and t end = 177 hours. Each data point was thirteen-dimensional, [1 × 13], since it contained, at each time step, readings for GHI, DNI, DHI, DHI_shadowband, UVA, UVB, air_temp, RH, WS, WD, WD_SD and BP (see Table 2).
The resulting Gaussian process model was used to interpolate global horizontal irradiance (GHI), as this parameter is generally used to calculate the available solar power at a given site on the earth's surface. The root-mean-squared (RMS) error was used to quantify the goodness-of-fit for the interpolation. The RMS error was also employed by [20].
An attempt was also made to predict the behaviour of solar radiation data using the Per × RQ kernel. A Gaussian process model was trained on the data set that spanned the period from and including 1 February 2015 00:00 to 10 February 2015 05:00, while the test set comprised the rest of the GHI data up to and including 14 February 2015 23:00, concluding a two-week training and test set. No meter failure was simulated with the forecasting.  Figure 8 shows the interpolation of GHI-data after training the multi-in-single-out Gaussian process regression model on the one-hourly averaged weather data set (with no meter failure), using a Per × RQ kernel. The inlay shows how the predictions align with measured values. The root-mean-squared error for the interpolation with meter failure, when compared to the measured GHI-values, was found to be 82.2 W/m 2 . Figure 8. Gaussian process regression of one-hourly averaged global horizontal irradiance (GHI)-data, where no meter failure is present, with a Per × RQ kernel. Inlay: the points constituting the Gaussian process regression, together with observations of GHI, sampled every minute. Figure 9 shows the result of interpolating GHI-data where meter failure is present. Four different kernels were employed for this interpolation: Per, RQ, Per × RQ and RBF × (RQ + Per), while the Gaussian process regression model was once again trained on one-hourly averaged weather data. The root-mean-squared errors for each of the different kernels are given in Table 3. Table 3. The root-mean-square error of Gaussian process regression on one-hourly averaged data for different kernels used.   Figure 10 shows the result of forecasting GHI-data using the Per × RQ kernel. The root-mean-squared error over the forecast period was found to be 151.1 W/m 2 . Figure 10. Gaussian process forecast using a periodic kernel times a rational quadratic kernel.

Discussion
The process of interpolation of GHI data through Gaussian process regression is not trivial and clear answers on the reasons for certain kernels performing better than others are not apparent.
It was found that a compound kernel consisting of a periodic kernel component and a rational quadratic kernel component provided the best interpolation and prediction results for solar radiation data. This seems to confirm work done by Tolba et al. [5,20], who found that quasiperiodic-kernel-based GPR is particularly well-suited for modeling GHI-data. The kernel also managed to bridge gaps in the data.
Caution should be employed when interpreting the root-mean-squared error. It is used to evaluate the quality of fit of the Gaussian process regression over the whole time period and the effect of meter failure during a relatively small period of time therefore influences the root-mean-squared error disproportionately. The root-mean-squared error could, however, still be used to rank the different kernels, as was done in Table 3. The goodness-of-fit could also be evaluated by interpreting the 95 % confidence intervals on the regression points. This approach does not give an average error and allows for a localized evaluation of the goodness-of-fit. Note how the confidence intervals expand where there are gaps in the data (Figure 9).
The periodic kernel is used to model periodic structures in data, which can be transformed to quasiperiodic structures by multiplication with a non-periodic kernel [19,20]. Furthermore, the RQ kernel incorporates different length scales, accounting for the irregular variations in the GHI-data. Tolba et al. [20] propose an interpretation for the aptness of a quasiperiodic kernel for the modelilng of GHI data: 'The proposed interpretation is that the structure of GHI is better modelled through an omnipresent periodic component representing its global structure and a random latent component explaining rapid variations due to atmospheric disturbances.'

Conclusions
This paper aimed to illustrate the potential of Gaussian process regression for the interpolation and forecasting of GHI data. Reasonably good interpolation and prediction of solar radiation data were achieved by employing a multi-in-single-out Gaussian process regression with a Per × RQ kernel to a one-hourly averaged weather data set. The effect of sampling interval on the effectiveness of the Gaussian process regression model to capture the characteristics of a weather data set, was also briefly investigated. In this regard, it can be concluded that the sampling interval has a significant effect on the ability of the Gaussian process regression model to accurately capture the structure of a wind speed data set, as measured by the Weibull parameters. This gives rise to the question of whether a Gaussian process regression model of GHI data will be similarly affected by interval deficiency.
The reasonably successful forecasting of one-hourly averaged solar radiation data using Gaussian process regression points to the possibility of effectively integrating multi-in-single-out Gaussian process regression into a renewable energy management system. It is recommended that future work focuses on finding a rules-based method for the construction of kernels for GHI data modeling, as well as applying Gaussian process regression to larger datasets. The impact of sampling interval in Gaussian process regression models of GHI data could also make for interesting future research.

Funding:
The financial assistance of the Manufacturing, Engineering and Related Services Seta (merSETA) towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to merSETA.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: