1. Introduction
In the past few years, the growing penetration of renewable energy sources, in particular solar photovoltaics, into power grids has brought new challenges for grid operators [1,2,3,4,5]. As electricity is not easy to store, grid operators have to balance supply and demand at all times. However, due to the intermittent nature of the solar resource, the deployment of photovoltaic (PV) power generation makes this balance harder to ensure with standard tools [6]. Indeed, the proliferation of PV power generation in power distribution grids creates constraints, in particular voltage constraints, mainly observed on the medium-voltage power distribution grid and on the low-voltage one. Consequently, the power distribution grid must evolve, and smart management tools are needed to alleviate these constraints. In particular, new tools that improve grid observability and management are needed to accompany the grid’s evolution. In this context, the development of predictive management strategies is a stepping stone to more efficient real-time monitoring and optimization of grid operation [6,7,8]. Tools that accurately forecast PV power generation at several time horizons are therefore needed to ensure the stability and reliability of the power grid. Towards this objective, in the context of the Smart Occitania project, the PROMES-CNRS laboratory and ENEDIS (the French distribution grid operator) have been developing tools to improve the observability and regulation of the low-voltage distribution grid in the presence of PV power generation in Occitania (southern France). Intrahour, intraday and day-ahead PV power generation forecasts allow grid operators to make decisions related to real-time grid regulation, unit commitment, and control of electricity demand [9]. To reach the goals of the Smart Occitania project, this work focuses on short-term forecast horizons, ranging from 10 min to 4 h. Indeed, the predictive management of power grids, which usually have fast dynamics, requires very short forecast horizons.
The generated PV power can be deduced from forecasts of global horizontal irradiance (GHI), which is the total amount of shortwave radiation received from above by a horizontal surface on Earth. Hence, accurate forecasts of GHI at various time horizons are required for the efficient development of grid-connected PV power systems. Various solar irradiance forecasting methods have been developed in the scientific literature (see, e.g., [9,10,11,12,13,14,15]). Which model is appropriate depends on the available input data and on the forecast horizon. For the intrahour range, GHI forecasts derived from cloud information provided by ground-based sky imagers, which offer higher spatial and temporal resolution, are more accurate than satellite-based forecasts [10,16,17]. For short-term horizons ranging from a few minutes to 6 h, statistical models based on on-site GHI measurements are appropriate [10,11,12,13]. Satellite images, which provide information about cloud motion that can be extrapolated to the next few hours, yield good forecasts for time horizons up to 6 h ahead [10,18]. However, the spatial extent of the monitored cloud scenes and the corresponding cloud velocities limit the forecast horizons. Numerical weather prediction (NWP) models deliver more precise forecasts for time horizons from about 6 h onwards [9,10,19]. There are also combined or hybrid methods that integrate different kinds of input data and/or approaches to build a high-performance forecasting model [20,21,22].
One drawback of statistical models is that they cannot account for dynamic phenomena such as the motion and formation of clouds, which create sudden changes in the GHI signal [10]. To capture these effects, one might wish to use models that describe cloud motion and derive irradiance from images provided by satellites or sky imagers. However, these models are complex and exhibit an inherent uncertainty related to limits in spatial and temporal resolution, uncertainty in input parameters, and simplifying assumptions within the models [10]. As a result, for short-term horizons, statistical models are nonetheless used to provide forecasts, even though they may not anticipate dynamic phenomena. A simple way of using statistical approaches to forecast solar irradiance is to develop models that deliver forecasts based only on endogenous data (i.e., GHI measurements only). Statistical models can also be fed with exogenous input data such as NWP forecasts [20] or other data (direct horizontal irradiance, direct normal irradiance, dew point, temperature, humidity, wind direction) [23]. However, statistical models using endogenous data are prevalent in the solar irradiance forecasting literature dealing with real-time operational contexts, because they can provide accurate forecasts with limited investment and computational effort.
As the current paper deals with short-term GHI forecasting based on historical GHI data, the focus is put on statistical models. These models include classical time-series approaches, such as the persistence model and autoregressive models, and artificial intelligence-based techniques, such as regression trees, k-nearest neighbours (kNN), artificial neural networks (ANNs), support vector regression (SVR) and Gaussian process regression (GPR) [12,13,24,25,26,27,28]. Note that, as most classical time-series models require stationarity, solar irradiance data, which are non-stationary, can be preprocessed using the clearness index [11] or the clear-sky index [29]. Machine learning-based techniques, however, can model non-stationary signals and are capable of capturing both the periodic component and the stochastic part of the GHI time series. Additionally, it is argued in [30] that artificial intelligence-based techniques without any specific preprocessing of data outperform classical approaches with preprocessed data. Finally, developing models that do not rely on clear-sky models or other preprocessing steps implies that all errors come solely from the forecasting method. As a result, in this paper, GHI data are used without any specific preprocessing step.
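For reference, the two preprocessing indices mentioned above are commonly defined as shown below. The notation is assumed here for illustration and is not taken from the present paper: $I$ is the measured GHI, $I_0$ the extraterrestrial normal irradiance, $\theta_z$ the solar zenith angle, and $I_{\mathrm{cs}}$ the GHI given by a clear-sky model. The models developed in this paper use neither index.

```latex
k_t(t) = \frac{I(t)}{I_0(t)\,\cos\theta_z(t)},
\qquad
k_c(t) = \frac{I(t)}{I_{\mathrm{cs}}(t)}
```

Here $k_t$ is the clearness index and $k_c$ the clear-sky index. Both divide out most of the deterministic daily and seasonal pattern, which is why they are used to bring irradiance data closer to stationarity.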
Machine learning methods have been increasingly used in recent years. In [31], the authors applied deep recurrent neural networks to solar irradiance forecasting and showed that these networks outperform SVR models and feedforward neural networks. In [23], long short-term memory (LSTM) neural networks were used for multi-step-ahead forecasting of GHI. In [32], the authors developed a hybrid model combining an autoregressive model and an ANN model for forecasting hourly solar radiation in the Mediterranean area. One-hour-ahead solar irradiance was predicted using support vector machines (SVMs) in [33]. In [34], a deep convolutional neural network (CNN) model was developed for hourly GHI forecasting based only on sky images, without numerical measurements or extra feature engineering. A hybrid model combining a CNN with an LSTM neural network for forecasting half-hourly solar radiation was proposed in [35]; compared with other deep learning models, this hybrid model outperformed its counterparts. The potential of GPR for GHI forecasting has been investigated in [24,25]. In [24], the authors carried out a comparative study of online GPR and online sparse GPR models based on simple kernels or on combined kernels defined as sums or products of simple kernels. The results showed the superiority of GPR models based on quasiperiodic kernels over the classic persistence model as well as over GPR models based on simple kernels.
Given their proven performance, popularity and potential for providing accurate forecasts, machine learning methods such as ANN, SVR and GPR are well suited to GHI forecasting. The present paper focuses on the development and comparison of intrahour and intraday machine learning-based GHI forecasting models. Even though several such comparative analyses exist in the literature (Table 1 offers a comparison between the work presented in this paper and recent comparative studies), several questions remain to be answered.
Most studies use databases with a time step of at least 1 h, which makes intrahour forecasting impossible and significantly simplifies GHI dynamics, as illustrated in Figure 1.
The methods themselves are not always optimized and used to their fullest extent. For example, when using GPR, the default kernel (i.e., the squared exponential) is usually chosen [13,36]; however, a kernel tailored to the application at hand has a significant influence on the results [24].
Regarding input data, some studies use only endogenous data while others use additional data, which prevents fair comparisons between methods.
Some authors forecast GHI directly, others the clear-sky index (using clear-sky models as pre- and postprocessing steps) or even PV power generation.
As a result, despite the extent of research on GHI forecasting in the scientific literature, a thorough comparative study of machine learning-based methods for intrahour GHI forecasting that uses endogenous data only, without any preprocessing step, does not exist to the best of the authors’ knowledge.
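The loss of GHI dynamics caused by an hourly time step, mentioned in the list above, can be illustrated with a minimal sketch. It assumes a 10 min sampling period and uses purely synthetic data (a half-sine daylight curve multiplied by a random cloud factor); the function names and the synthetic profile are illustrative assumptions, not the paper's database or method.

```python
import math
import random

SAMPLES_PER_HOUR = 6  # assumption: 10 min time step


def to_hourly(ghi_10min):
    """Average consecutive blocks of six 10-min samples into hourly means."""
    n_hours = len(ghi_10min) // SAMPLES_PER_HOUR
    return [
        sum(ghi_10min[h * SAMPLES_PER_HOUR:(h + 1) * SAMPLES_PER_HOUR]) / SAMPLES_PER_HOUR
        for h in range(n_hours)
    ]


def within_hour_range(ghi_10min):
    """Max-min spread inside each hour: the fluctuation an hourly series cannot show."""
    n_hours = len(ghi_10min) // SAMPLES_PER_HOUR
    spreads = []
    for h in range(n_hours):
        block = ghi_10min[h * SAMPLES_PER_HOUR:(h + 1) * SAMPLES_PER_HOUR]
        spreads.append(max(block) - min(block))
    return spreads


def synthetic_day(seed=0):
    """Synthetic 10-min GHI (W/m^2) for one day; NOT measured data."""
    rng = random.Random(seed)
    series = []
    for i in range(24 * SAMPLES_PER_HOUR):
        hour = i / SAMPLES_PER_HOUR
        # Half-sine between 06:00 and 18:00, zero at night.
        clear = 1000.0 * max(0.0, math.sin(math.pi * (hour - 6.0) / 12.0))
        # Random attenuation standing in for passing clouds.
        series.append(clear * rng.uniform(0.3, 1.0))
    return series


day = synthetic_day()
hourly = to_hourly(day)             # 24 values: one mean per hour
spread = within_hour_range(day)     # 24 values: variability hidden by the mean
```

Each entry of `spread` measures how much the signal moved within an hour that the hourly series summarizes with a single number, which is the simplification the first item of the list refers to.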
The main contributions of the present paper are threefold.
The models are developed using a two-year GHI database with a 10 min time step. As can be seen in Table 1, the time step in previous machine learning studies is usually 1 h. However, such a time step leads to a significant simplification of GHI dynamics: as can be seen in Figure 1, GHI data sampled with a 10 min time step exhibit more fluctuations and are thus more difficult to forecast.
Rather than developing a specific model for each forecast horizon, as done in some studies in the literature [13,28], we chose multi-horizon forecasting models. Developing a specific model for each forecast horizon can be computationally demanding when many horizons are considered, and a multi-horizon forecasting model is more practical when the algorithms are run in situ to produce real-time forecasts at various horizons, especially when intrahour forecasts are needed. Therefore, in this paper, the models are developed for multi-step-ahead GHI forecasting and, once the training phase is over, are used to forecast GHI for all horizons.
Many authors choose classical performance criteria (nRMSE, MAE, MBE, MAPE) to evaluate their models. In the present paper, two criteria are used in addition to the nRMSE: the DMAE, which accounts for temporal distortion error and absolute magnitude error simultaneously, and the CWC, which assesses the quality of prediction intervals. These criteria provide more detailed and comprehensive information about the models’ performance and allow an in-depth analysis of their forecasts.
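The multi-horizon setting and the nRMSE criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the number of lagged inputs, the function names, and the normalization of the RMSE by the mean observation are all assumed, and the classical persistence model stands in for any forecaster (the paper's reference is a scaled persistence model, described in its Section 3).

```python
import math


def make_multi_horizon_samples(ghi, n_lags, n_horizons):
    """Pair the n_lags most recent values with the next n_horizons values."""
    inputs, targets = [], []
    for t in range(n_lags, len(ghi) - n_horizons + 1):
        inputs.append(ghi[t - n_lags:t])       # endogenous inputs only
        targets.append(ghi[t:t + n_horizons])  # one target per horizon
    return inputs, targets


def persistence_forecast(past_values, n_horizons):
    """Classical persistence: repeat the last measurement for every horizon."""
    return [past_values[-1]] * n_horizons


def nrmse(observed, forecast):
    """RMSE divided by the mean observation (one common convention, assumed here)."""
    mse = sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))


def nrmse_per_horizon(inputs, targets, n_horizons):
    """Evaluate a single multi-horizon forecaster separately at each horizon."""
    forecasts = [persistence_forecast(x, n_horizons) for x in inputs]
    return [
        nrmse([y[h] for y in targets], [f[h] for f in forecasts])
        for h in range(n_horizons)
    ]


# Toy series (W/m^2), for illustration only.
ghi = [100.0, 200.0, 300.0, 400.0, 500.0]
X, Y = make_multi_horizon_samples(ghi, n_lags=2, n_horizons=2)
scores = nrmse_per_horizon(X, Y, n_horizons=2)
```

A single call produces forecasts for every horizon at once, so the per-horizon scores in `scores` come from one model rather than from one model per horizon.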
In this paper, all models are trained on the same data and tested on the same dataset, to allow a fair comparison and a thorough analysis of the results. When using GPR, kernels with automatic relevance determination (ARD), such as the squared exponential kernel with ARD and the rational quadratic kernel with ARD, are chosen to account for the relevance of each input dimension in the modelling of the underlying function. Since the underlying function in GHI forecasting has a multi-dimensional input variable, ARD kernels, which implicitly determine the relevance of each input dimension, are a good option [37]. Nonetheless, the flexibility of ARD kernels means that they can be relatively slow to learn and, as a result, authors generally choose simple kernels with an isotropic correlation length parameter [13]. The ANN models developed in this paper are based on multilayer perceptron (MLP) and LSTM neural networks, which are widely used for time series forecasting.
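To make the role of ARD concrete, the following minimal sketch implements the standard textbook forms of the two kernels named above, with one length scale per input dimension. The parameter names (`length_scales`, `signal_std`, `alpha`) are assumptions made for illustration; the paper's own notation and hyperparameter settings are not reproduced here.

```python
import math


def scaled_sq_dist(x, x_prime, length_scales):
    """Squared distance with each input dimension divided by its own length scale."""
    return sum(((a - b) / l) ** 2 for a, b, l in zip(x, x_prime, length_scales))


def se_ard(x, x_prime, length_scales, signal_std=1.0):
    """Squared exponential kernel with ARD."""
    r2 = scaled_sq_dist(x, x_prime, length_scales)
    return signal_std ** 2 * math.exp(-0.5 * r2)


def rq_ard(x, x_prime, length_scales, alpha=1.0, signal_std=1.0):
    """Rational quadratic kernel with ARD; alpha is the scale-mixture parameter."""
    r2 = scaled_sq_dist(x, x_prime, length_scales)
    return signal_std ** 2 * (1.0 + r2 / (2.0 * alpha)) ** (-alpha)


# A very large length scale makes the second input dimension almost irrelevant:
# the two points differ only in that dimension, yet the kernel stays close to 1.
k_irrelevant = se_ard([0.0, 0.0], [0.0, 5.0], length_scales=[1.0, 1e9])
# With an isotropic length scale the same difference lowers the covariance.
k_isotropic = se_ard([0.0, 0.0], [0.0, 5.0], length_scales=[1.0, 1.0])
```

Learning one length scale per lagged input is what lets the model down-weight uninformative inputs, at the cost of more hyperparameters to optimize than with an isotropic kernel.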
The rest of the paper is organized as follows. Section 2 describes the data used to develop and validate the models included in the comparative study. Section 3 describes the scaled persistence model and the machine learning methods (GPR, SVR and ANN) used to forecast GHI, and presents the models’ structure. The forecasting results, as well as the criteria used to assess the models’ performance (i.e., forecasting accuracy), are presented and discussed in Section 4. The paper ends with a conclusion.