Review

Advances in Short-Term Solar Forecasting: A Review and Benchmark of Machine Learning Methods and Relevant Data Sources

by Franko Pandžić * and Tomislav Capuder
University of Zagreb Faculty of Electrical Engineering and Computing, Unska ulica 3, 10000 Zagreb, Croatia
* Author to whom correspondence should be addressed.
Energies 2024, 17(1), 97; https://doi.org/10.3390/en17010097
Submission received: 17 October 2023 / Revised: 18 December 2023 / Accepted: 22 December 2023 / Published: 23 December 2023
(This article belongs to the Section A2: Solar Energy and Photovoltaic Systems)

Abstract

Solar forecasting is becoming increasingly important due to the exponential growth in total global solar capacity each year. More photovoltaic (PV) penetration in the grid poses problems for grid stability due to the inherent intermittent and variable nature of PV power production. Therefore, forecasting of solar quantities becomes increasingly important to grid operators and market participants. This review presents the most recent relevant studies focusing on short-term forecasting of solar irradiance and PV power production. Recent research has increasingly turned to machine learning to address this challenge. The paper provides a discussion about building a solar forecasting model, including evaluation measures and machine learning method selection through analysed literature. Given that machine learning is data-driven, the focus of this review has been placed on data sources referenced in the literature. Open-access data sources have been compiled and explored. The main contribution of this paper is the establishment of a benchmark for assessing the performance of solar forecasting models. This benchmark utilizes the mentioned open-source datasets, offering a standardized platform for future research. It serves the crucial purpose of streamlining investigations and facilitating direct comparisons among different forecasting methodologies in the field of solar forecasting.

1. Introduction

Solar energy has emerged as a vital and sustainable source of energy, contributing significantly to the global transition toward renewable energy sources. According to the International Energy Agency, solar energy accounted for 12.8% of the world’s total installed power capacity in 2022. Their projections indicate that by 2027 this share will exceed 20%, surpassing all other energy sources [1]. However, the inherently intermittent and variable nature of solar radiation poses challenges to grid integration, and subsequently, grid energy management. Photovoltaic (PV) power production forecasting becomes increasingly vital for both short-term and long-term horizons. PV production depends on the amount of solar radiation reaching the PV cells on panels. As solar irradiance is affected by current weather conditions such as cloud cover, temperature and wind, it becomes difficult to plan and manage the energy within the system [2]. As the share of solar energy rapidly increases, so does the need for reliable and accurate PV production forecasting. It is crucial for several reasons, such as grid integration, system stability, energy management, market participation and research and development [3]. Recent advancements in computing power have led to a significant increase in the popularity of machine learning (ML) algorithms for PV forecasting. This trend has resulted in a surge in the publication of research papers on this topic, with the number of such papers rising exponentially, as highlighted in a 2023 study [2] by Alcañiz et al. Researchers have been increasingly leveraging the capabilities of ML to enhance the accuracy and efficiency of PV forecasting models, reflecting the growing significance of this field in the renewable energy sector. This paper serves as a comprehensive review of the field of solar irradiance and PV power forecasting with a focus on data sources. The research questions we focus on are as follows:
  • The available open-source data enabling the comparison and benchmarking of different forecasting methods. Today, there is a significant body of work describing different ML methods and explaining the benefits of applying them to specific case studies. However, in most papers the dataset is unavailable, its size and quality are unknown, and the published results cannot be reproduced. We find that such an approach hinders future development, as each researcher/developer needs to self-test all available methods to learn about their advantages and disadvantages. Our goal is to list the openly available data and to assist in creating an open-source community where transparency of newly developed tools/solutions is key to quality research.
  • The relevant metrics for benchmarking the effectiveness of an ML method, as well as the range of values reported for those metrics in previously published papers. Our goal is to provide a framework for future researchers to use adequate metrics and to understand the quality of their proposed method.
  • New sources of data, previously underutilized or not utilized at all, that could improve existing or new ML methods. Here again, we focus on open datasets that are transparent and available to everyone, and as such can serve as a unified benchmark for the proposed method.
Review papers [2,3,4,5,6,7] helped shape the outline of this paper. These review papers focus on summarizing state-of-the-art (SOTA) papers regarding PV forecasting using ML, outlining their highlights and the techniques used to tackle the problem. In this context, this research builds upon the aforementioned studies while going into greater depth in exploring the data sources used: solar irradiance, historical PV power production and satellite images. Given the emphasis on data-driven forecasting methods in this research, it is important to highlight and explore available open-source data.
Outline This review encompasses papers that employ machine learning techniques for PV power or solar irradiance forecasting. While numerous papers, such as [8,9,10,11,12], concentrate on long-term forecasts, this review places greater emphasis on the short-term horizon (up to 7 days in advance [2]). Out of the total number of identified papers relevant to this topic, we have selected papers published in the past 5 years (2018 onwards), focusing on those most cited in the literature. This includes journal and conference papers, along with book sections.
Contributions This paper contributes to the field through the creation of a comprehensive benchmark for open-source datasets, providing a valuable resource accessible to the research community for rigorous and standardized benchmarking. Furthermore, an exploration of the training and testing periods within the analysed literature is included, further enriching the understanding of temporal aspects in the context of benchmarking practices for solar forecasting. Additionally, we provide a comprehensive discussion with suggestions on building a solar forecasting model, including the selection of machine learning methods and evaluation measures. Together, these contributions aim to enhance the methodological robustness and comparability of research in the domain of solar forecasting.
Contents Section 2, Research Area Overview, highlights the key features of the analysed papers. Section 3, Machine Learning Methods, provides brief explanations of the ML methods used, while Section 4, Open-Source Data, offers a more comprehensive summary of the data utilized in this field. Finally, Section 5, Conclusions, summarizes the paper and recommends future work.

2. Research Area Overview

The focus is placed on literature regarding ML forecasting of the short-term PV power output of a solar power plant. As the PV power output of a plant is entirely dependent on the amount of solar irradiance reaching the solar panels, the focus was extended to literature regarding solar irradiance forecasting as well. The true amount of solar radiation reaching the panels depends on the local weather conditions. Consequently, many papers focused on forecasting local cloud coverage to aid irradiance or PV power forecasting (up to 6 h in advance) [13,14,15,16]. This is carried out using either satellite images [13] or images collected with a sky imager at the location [17]. The images are utilized to predict cloud movement for ultra-short-term horizons [13], as local weather forecasts are inaccurate beyond that time span.
Information on future local solar irradiance is most often obtained through Numerical Weather Prediction (NWP) [18], e.g., the Weather Research and Forecasting (WRF) model [19,20]. It often includes forecasts of multiple weather variables such as temperature, humidity, wind speed, etc., 72 h in advance, extracted for a specific location.
The next key highlights have been extracted from analysed literature and are shown in Table 1:
  • Goal—goal of the paper: PV power or solar irradiance forecasting;
  • Horizon (Step)—forecast horizon with the granularity of the forecast (step);
  • Test size—it is important to highlight this feature, as a larger test set provides a more statistically significant sample of the data, indicates robustness, reduces risk of overfitting and gives more credibility to solutions tested on larger datasets;
  • Error term—measure of performance used to compare methods;
  • Method—ML method employed in the paper;
  • Location—geographical location of the PV power plant(s); all locations referred to in the analysed literature are shown on the world map in Figure 1.
Table 1 encompasses the key features of the analysed literature. The terms from the table are explained in the subsection Performance Measures and Section 3, Machine Learning Methods.

Performance Measures

In order to evaluate predictive models, a number of performance measures have been reported in the literature. Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are the most common performance measures used in the analysed papers. Mean Absolute Deviation (MAD) [48], also referred to as Mean Relative Error (MRE) in [33,46], represents a percentage deviation from the total (maximum) capacity of the PV plant. In contrast, Mean Absolute Percentage Error (MAPE) represents the percentage error relative to the actual production at a given moment. Forecast skill (FS), as employed in [25,31] and explained in [63], quantifies the percentage deviation from persistence models. A positive percentage denotes improvement over persistence models, while a negative percentage indicates the opposite. Envelope Weighted Mean Absolute Error (eMAE) measures the absolute error between predicted values and the nearest points on the envelope (upper and lower bounds of historical data) curves [52]. If a prediction falls within the envelope, the error is zero; otherwise, it quantifies the vertical distance from the prediction to the envelope. Index of Agreement (IA) [37] denotes the closeness (agreement) between the forecasted and the actual values. IA ranges from 0 to 1, with 1 indicating perfect alignment between forecasted and actual values. It relates the squared differences to the variability of the observed values to provide a comprehensive measure of model performance.
Probabilistic models are evaluated with the following measures. The Continuous Ranked Probability Score (CRPS) [34] quantifies the discrepancy between a forecast probability distribution and the observed outcomes. It assesses the accuracy of probabilistic forecasts, with a lower CRPS value indicating better forecast performance. Prediction Interval Average Width (PIAW) is also commonly used for probabilistic forecasts and indicates the sharpness of the prediction interval. In other words, it represents the ability of a forecast to accurately represent the variability or uncertainty in the data (prediction intervals are neither too wide nor too narrow). Prediction Interval Coverage Probability (PICP), utilized in [16,51], denotes how often the forecasted value falls within the prediction intervals.
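As an illustration, the minimal sketch below computes PICP and PIAW from interval bounds and an empirical CRPS from an ensemble forecast (CRPS = E|X − y| − 0.5·E|X − X'|); the array shapes and toy numbers are assumptions and are not taken from any of the reviewed papers.

```python
import numpy as np

def picp(y_true, lower, upper):
    """Prediction Interval Coverage Probability: share of observations inside the interval."""
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

def piaw(lower, upper):
    """Prediction Interval Average Width: average interval width (sharpness)."""
    return float(np.mean(upper - lower))

def crps_ensemble(y_true, ensemble):
    """Empirical CRPS of an ensemble forecast, averaged over time steps.
    ensemble has shape (n_steps, n_members)."""
    y = np.asarray(y_true, dtype=float)[:, None]
    ens = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(ens - y), axis=1)                                       # E|X - y|
    term2 = 0.5 * np.mean(np.abs(ens[:, :, None] - ens[:, None, :]), axis=(1, 2))  # 0.5 E|X - X'|
    return float(np.mean(term1 - term2))

# toy probabilistic PV power forecast (kW) for four time steps
y_obs = np.array([10.0, 45.0, 80.0, 60.0])
lower = np.array([5.0, 35.0, 70.0, 40.0])
upper = np.array([15.0, 55.0, 95.0, 75.0])
members = np.column_stack([lower, (lower + upper) / 2, upper])   # crude 3-member ensemble
print(picp(y_obs, lower, upper), piaw(lower, upper), crps_ensemble(y_obs, members))
```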
The aforementioned performance measures of the analysed papers are shown in more detail in Table 2.
When selecting error measures to evaluate forecasting models, the authors provide the following remarks. It is important not to evaluate forecasting performance on a single measure, as this can be misleading. To give a simple example, a forecaster always outputting the average value of the training set would have a very low mean bias error; evaluating only on bias error in this example would therefore be deceiving. Multiple error measures must be utilized to accurately evaluate a forecasting model, preferably measures which focus on different aspects of the forecast deviation from the target values. Chicco et al. [64] argue that R² is more informative than MAE, MAPE, RMSE, MSE and SMAPE (Symmetric MAPE) for evaluating regression models. They report comparisons in several use cases and suggest the use of R² as a standard metric for regression analyses. Hodson in [65] debates the choice of RMSE or MAE for model evaluation. He shows that RMSE is optimal for normal (Gaussian) errors, while MAE is optimal for Laplacian errors. In the case of solar forecasting, the Laplacian distribution is often more suitable for modelling forecasting errors than the normal distribution. This is especially true in scenarios with heavier-tailed errors or sharp peaks, likely due to outliers or sudden environmental changes impacting PV generation. Nevertheless, there is no drawback in evaluating models on both measures. Furthermore, Koutsandreas et al. in [66] provide a comprehensive discussion about the selection of accuracy measures for forecasting models. Optimizing a forecasting model based on business case requirements, such as minimizing MSE, might seem intuitive. However, Koutsandreas et al. [66] show that mismatching the model’s loss function and evaluation measure has a minor effect on accuracy. Robust and accurate models are expected to perform well according to any evaluation measure. Additionally, the authors recommend the use of the forecast skill (FS) measure when building a project concerning solar forecasting. It requires comparison of the preferred model against simple, baseline models. A low FS value indicates that the preferred model is not significantly outperforming the baseline models and needs further improvement. To summarize, multiple error measures need to be used for solar model evaluation. Among these, we suggest using RMSE, MAE, R² and forecast skill as standard metrics for deterministic forecasting, at the minimum. For probabilistic models, we recommend using CRPS, PICP and PIAW.
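A minimal sketch of the recommended deterministic metrics (RMSE, MAE, R² and forecast skill against a persistence baseline); the toy irradiance values are assumptions used only to make the snippet runnable.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def forecast_skill(y_true, y_pred, y_baseline):
    """FS = 1 - RMSE(model) / RMSE(baseline); positive values beat the persistence baseline."""
    return float(1.0 - rmse(y_true, y_pred) / rmse(y_true, y_baseline))

# toy hourly irradiance example; persistence reuses the value observed 24 h earlier
y_true = np.array([120.0, 340.0, 510.0, 480.0])
y_pred = np.array([130.0, 330.0, 495.0, 470.0])
y_pers = np.array([100.0, 300.0, 560.0, 450.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred), forecast_skill(y_true, y_pred, y_pers))
```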

3. Machine Learning Methods

In this section, we provide brief descriptions of the various machine learning methods used in the analysed literature. For clarity, we do not go into great detail on each architecture, as they are discussed thoroughly elsewhere in the literature. To conclude the section, the authors provide their suggestions regarding the selection of machine learning architectures for solar forecasting.
Autoregressive (AR) models are a group of statistical models used in time-series analysis and forecasting. AR models describe a variable’s behaviour over time by relating it to its past values. They assume that the current value of a variable is a (non)linear function of its previous values. AR models can be very successful in solar forecasting, as forecasts can often be modelled on prior measurements. For example, during summer we can infer that the irradiance at noon tomorrow is likely to resemble the irradiance at noon today. Upgraded versions of AR models use exogenous variables (X) as inputs, denoted ARX. The most widespread AR models also use a moving average (MA) for forecasting, such as the autoregressive moving average (ARMA) and its extensions ARIMA (I-Integrated), SARIMA (S-Seasonal) and SARIMAX [28,54]. Autoregressive Neural Networks (NAR(X)) have also been used in [53,56].
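A minimal SARIMA sketch using statsmodels, assuming a synthetic hourly irradiance series with a 24 h seasonal cycle; the model orders are illustrative rather than tuned values from the reviewed papers.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# synthetic hourly clear-sky-like irradiance over two weeks (W/m^2)
idx = pd.date_range("2023-06-01", periods=24 * 14, freq="h")
y = pd.Series(np.clip(np.sin((idx.hour - 6) / 12 * np.pi), 0, None) * 800, index=idx)

# SARIMA(1,0,1)(1,1,1)_24: the seasonal part captures the daily cycle
fitted = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 1, 1, 24)).fit(disp=False)
day_ahead = fitted.forecast(steps=24)   # next 24 hourly values
print(day_ahead.round(1).head())
```

Exogenous inputs such as NWP temperature or cloud cover forecasts can be supplied through the exog argument of SARIMAX, turning the sketch into a SARIMAX model.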

3.1. Classical Machine Learning

Linear regression (LR) is a simple ML method which assumes a linear dependency between input and output variables. Linear regression is denoted as y = wx + b and is considered a simple and interpretable ML solution. This approach remains relevant in the field of solar forecasting due to the inherent linear dependence between PV power plant production (y in the notation) and solar irradiance (x in the notation). In other words, the relationship between these variables can largely be explained by a linear dependence, and thus LR remains a powerful tool in solar forecasting. This approach is simple to optimize and is also interpretable, as we can explain how the inputs constructed the output of the model via the weights associated with each input. To prevent overfitting, different types of regularization are applied, yielding L0, L1 (Lasso) and L2 (Ridge) linear regression. The authors recommend employing this method when building a solar forecaster, as it is highly effective and interpretable despite its simplicity.
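A short scikit-learn sketch of plain, L1- and L2-regularized linear regression on hypothetical irradiance/temperature features; the feature values and regularization strengths are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge

# hypothetical features: [irradiance in W/m^2, module temperature in degC]
X = np.array([[100, 20], [400, 25], [700, 30], [950, 35]], dtype=float)
y = np.array([0.08, 0.35, 0.60, 0.78])   # normalized PV power output

for name, model in [("LR", LinearRegression()),
                    ("Lasso (L1)", Lasso(alpha=0.01)),
                    ("Ridge (L2)", Ridge(alpha=1.0))]:
    model.fit(X, y)
    print(f"{name:10s} weights={model.coef_.round(4)} bias={model.intercept_:.4f}")
```

The printed weights make the model directly interpretable: each coefficient states how much the forecast changes per unit change of the corresponding input.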
A Random Forest (RF [22,35,45]) is an ensemble ML method that combines multiple decision trees (DT) to improve generalization by aggregating individual DT predictions. By aggregating multiple simpler models, RF intrinsically reduces overfitting and improves generalization. This architecture can easily handle multiple features for solar forecasting such as temperature, wind speed, etc. Similar to LR, RF can be considered a more interpretable solution than most as it provides feature importance scores. Moreover, RF can handle missing data effectively. In practical scenarios, meteorological data may have missing values due to sensor malfunctions or outages. This is where an RF-based solar forecaster is extremely valuable and can provide accurate forecasts, even with incomplete data.
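A scikit-learn Random Forest sketch on synthetic weather features, included mainly to show the feature importance scores mentioned above; the data-generating formula is an assumption for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# synthetic features: irradiance, ambient temperature, wind speed, hour of day
X = rng.uniform([0, -5, 0, 0], [1000, 40, 15, 23], size=(500, 4))
y = 0.001 * X[:, 0] * (1 - 0.004 * (X[:, 1] - 25)) + rng.normal(0, 0.02, 500)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, y)
importances = dict(zip(["irradiance", "temperature", "wind", "hour"], rf.feature_importances_.round(3)))
print(importances)   # irradiance should dominate in this synthetic setup
```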
A Support Vector Machine (SVM) [13,17] is a classification ML algorithm that finds the optimal hyperplane maximizing the margin between the nearest data points of different classes. The Support Vector Regressor (SVR) [60] uses the same principle of support vectors. Like SVM, SVR aims to find an optimal hyperplane, but in the context of regression, the goal is to find a hyperplane that best fits the data points while minimizing margin violations (i.e., how much the predicted values deviate from the actual values). Similar to RF, SVR is robust to outliers and can handle the non-linearity and high dimensionality present in solar forecasting data. Furthermore, SVR can be tuned to handle non-uniform data distributions, as the distribution of solar parameters is often non-uniform due to sudden changes in weather conditions. Moreover, SVR employs a hyperparameter that defines a margin of tolerance for errors. This can be particularly useful in scenarios where a certain degree of error is acceptable, i.e., it allows for flexibility in the model’s forecasts. The major drawback of SVRs in solar forecasting is their complexity and lack of explainability, which can be a deal-breaker for many operators and grid participants utilizing solar forecasts.
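A minimal SVR sketch; the epsilon parameter below is the tolerance margin discussed above, while the kernel, C value and noise level are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.uniform(0, 1000, size=(300, 1))            # irradiance only, for simplicity
y = 0.0009 * X[:, 0] + rng.normal(0, 0.02, 300)    # noisy, roughly linear PV output

# errors smaller than epsilon are ignored by the loss; feature scaling is essential for SVR
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
svr.fit(X, y)
print(svr.predict([[650.0]]))
```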
A Gradient Boosting (GB) regression model is built sequentially, starting from a simple, “weak” prediction model (e.g., a decision tree). The initial model is improved (boosted) in each iteration based on the loss function gradient, with the goal of minimizing said loss function. Similar to RF and SVR, GB can handle complex, high-dimensional data and is robust to outliers due to the ensemble nature of the method. This method can also handle missing data and provide accurate forecasts, even when data are incomplete. GB can adapt to varying data distributions, making it suitable for forecasting scenarios where the distribution of solar irradiance may suddenly change during the day, as is often the case. It also allows for the use of various loss functions, providing flexibility in optimizing the solar forecaster for specific business requirements. Its effectiveness on complex data, along with its robustness, has spawned new iterations such as the Light Gradient Boosting Model (LGBM) [39], the Adaptive Boosting model (AdaBoost) [41] and the Categorical Boosting model (CatBoost) [39]. In addition to LR, the authors recognize the significance of GB methods for building a solar forecaster.
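A gradient boosting sketch using scikit-learn’s histogram-based implementation, which also accepts missing feature values natively, echoing the point about incomplete sensor data; the synthetic data and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform([0, -5, 0], [1000, 40, 15], size=(500, 3))   # irradiance, temperature, wind
y = 0.001 * X[:, 0] + rng.normal(0, 0.02, 500)
X[rng.random(X.shape) < 0.05] = np.nan                       # simulate sensor outages

gb = HistGradientBoostingRegressor(loss="absolute_error", max_iter=300)
gb.fit(X, y)                                                 # NaNs are handled without imputation
print(gb.predict([[800.0, np.nan, 3.0]]))
```

Swapping the loss (e.g., absolute versus squared error) is one way to align the forecaster with the business requirement mentioned above.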

3.2. Neural Networks (Deep Learning)

Feedforward Neural Networks (FNNs) comprise fully interconnected nodes, including an input layer, one or more hidden layers and an output layer. Information flows unidirectionally from one node (neuron) to the nodes in the next layer, i.e., the node “feeds” information forward. The FNN, also called a Multilayer Perceptron (MLP) [55], is sometimes used synonymously with the term Deep Neural Network (DNN), as any model with at least one hidden layer is called deep, so an FNN with a hidden layer is a DNN as well. A Backpropagation Neural Network (BPNN), used in [17,48,49], is also a type of FNN that is specifically optimized using the backpropagation algorithm. FNNs can effectively capture non-linear relationships in data. However, they possess a limited capability to capture temporal dependencies, which is important in solar forecasting. Furthermore, FNNs often lack explainability, and the authors suggest other, more appropriate methods for tackling solar forecasting.
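A small MLP (feedforward network) sketch using scikit-learn; the layer sizes and synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = rng.uniform([0, -5], [1000, 40], size=(400, 2))            # irradiance, temperature
y = 0.001 * X[:, 0] * (1 - 0.004 * (X[:, 1] - 25))

mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0))
mlp.fit(X, y)
print(mlp.predict([[750.0, 28.0]]))
```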
A Convolutional Neural Network (CNN), also referred to as ConvNet [58] in text, is a specialized deep learning architecture for processing grid-like data, such as images and spatial or tabular data. Unlike traditional NNs, CNNs excel at capturing localized patterns and spatial hierarchies within the input data. They are composed of convolutional layers that apply trainable filters to the input, producing feature maps that highlight relevant features. They typically produce outputs using fully connected (dense) layers after relevant feature extraction has been performed by convolutional and pooling layers, shown in Figure 2. CNNs can be highly effective for solar forecasting purposes. For example, Simeunović et al. in [43] use a Graph CNN for forecasting the production of multiple PV power plants, where each plant represents a node of the graph. This type of CNN can effectively model spatial and temporal dependencies between different PV power plants, identifying common patterns across different locations through these dependencies. The authors suggest this method when forecasting production for multiple PV power plants that have spatial dependencies between them.
Recurrent Neural Networks (RNNs) are designed for processing sequential data with the ability to capture dependencies over time. Long Short-Term Memory (LSTM) networks are the most prominent RNN-type architectures in all analysed papers. LSTM excels at capturing and modelling dependencies in sequential data using specialized memory cells, which can capture long-range dependencies. Their suitability for time-series (sequential data) analysis has led to the widespread adoption of LSTM networks in recent research papers focused on PV forecasting.
Figure 3 displays an LSTM cell, which is made up of three distinct gates. The forget gate controls which information from the previous cell state should be forgotten, while the input gate determines which new information should be added to the current cell state. The output gate regulates which information should be exposed to the next cell, filtering less important information out. Networks based on bidirectional LSTM cells are also used in papers such as [28]. Bidirectional cells extend the original LSTM cells with the ability to pass information both forward and backward through the sequence. The Gated Recurrent Unit (GRU) [35] is closely related to the LSTM and can be seen as a simplified variant of it. One of the disadvantages is the vulnerability to vanishing/exploding gradients when training RNN models. They are also often computationally expensive to train. However, due to their suitability for sequential and time-series data, the authors recognize RNNs as powerful tools in solar forecasting when constructed and trained correctly.
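A minimal PyTorch LSTM forecaster that maps a window of past feature vectors to the next PV power value; the layer sizes, window length and feature count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Maps a window of past features (e.g., power, irradiance, temperature) to the next value."""
    def __init__(self, n_features: int, hidden_size: int = 64, n_layers: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers=n_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):               # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)           # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1])    # forecast from the last time step: (batch, 1)

model = LSTMForecaster(n_features=4)
window = torch.randn(8, 24, 4)          # 8 samples, 24 past hours, 4 features
print(model(window).shape)              # torch.Size([8, 1])
```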
Finally, it is worth mentioning Residual Networks, or ResNets for short [14,30]. ResNets first introduced skip (residual) connections between layers [69]. Their goal is to mitigate the vanishing gradient problem, which allows very deep NNs to be trained more effectively. First introduced for CNNs, the term ResNet is now used for any NN that uses skip connections. These skip connections add an identity path alongside the learned transformation, so gradients can flow through deep networks without vanishing or exploding. The authors advise constructing a solar forecaster using an RNN-type architecture with the addition of skip connections to mitigate the problem of vanishing/exploding gradients.
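A sketch of a residual (skip-connection) block in PyTorch, assuming fully connected layers for simplicity; the same identity-path idea applies to convolutional or recurrent blocks.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two fully connected layers with a skip (identity) connection around them."""
    def __init__(self, width: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width),
        )

    def forward(self, x):
        # the identity path x + ... keeps gradients flowing even if self.body saturates
        return torch.relu(x + self.body(x))

block = ResidualBlock(width=32)
print(block(torch.randn(8, 32)).shape)   # torch.Size([8, 32])
```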
Most of the ML methods used are constructed using some of the aforementioned methods, either individually or in a hybrid mode, combining multiple methods such as Conv-LSTM [40] or NARX [53].
Given NNs’ capability to model complex dependencies, it might seem intuitive to assume that they are always the optimal choice for solar forecasting, and forecasting tasks in general. However, this assumption relies on the availability of substantial computational power and extensive, high-quality datasets. Real-world data are often noisy and scarce, and in such settings NNs fail to outperform simpler methods. Grinsztajn et al. in [70] provide a comprehensive comparison of tree-based models such as Random Forest (RF) and Gradient Boosting (GB) against state-of-the-art deep networks on various medium-sized tabular datasets. They clearly state the superiority of the simpler, tree-based models over the more complex deep networks. Munawar et al. in [71] confirm this hypothesis for short-term solar forecasting, with XGB outperforming NNs. On the other hand, Markovics et al. in [39] show that a Multilayer Perceptron (MLP) performs at least as well as the top classical methods across various evaluation metrics. Another aspect of building a solar forecasting model and understanding business needs is the interpretability of the forecaster output. One of the problems with complex NNs is their black-box nature. It is often necessary to reduce the complexity of the model (which may affect performance) in order to increase the understanding of the forecaster results. We recommend reading [72,73,74] in order to explore methods of increasing the interpretability of a forecaster. Papers that emphasize the interpretability of solar forecasting models include [61,62,75]. Taking all of this into consideration, we recommend the following methodology when building a solar forecasting model: start the process with fundamental baseline models and progressively advance from simpler models, such as linear regression (LR), to more intricate ones, including gradient boosting (GB) models, culminating in recurrent neural networks (RNNs) with skip connections (ResNet-style). Evaluate models on the already-mentioned minimum set of evaluation metrics: RMSE, MAE, FS and R². When building a probabilistic forecaster, evaluate on CRPS, PICP and PIAW, at the minimum. Model architecture selection is an important part of building an accurate and robust solar forecasting model. Nonetheless, we argue that it is not the most important part; the quality and timely accessibility of data matter most when building any forecasting model. With sound data, any of the above-mentioned ML methods should provide at least similar results. The final, minor improvements can be made through method selection, proper training and hyperparameter optimization. This is why we focus on the sources of data for solar forecasting in Section 4.
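A compact sketch of the recommended workflow: fit progressively more complex models on a chronological split and compare them using RMSE, MAE, R² and forecast skill against a persistence baseline. The synthetic data, split point and model choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(4)
X = rng.uniform(0, 1000, size=(1000, 3))                     # e.g., irradiance forecast, temp, wind
y = 0.001 * X[:, 0] + rng.normal(0, 0.05, 1000)              # synthetic PV output
X_tr, X_te, y_tr, y_te = X[:800], X[800:], y[:800], y[800:]  # chronological split, never shuffled
y_persistence = np.roll(y_te, 1)                             # naive baseline: previous observation
rmse_base = mean_squared_error(y_te, y_persistence) ** 0.5

for name, model in [("LR", LinearRegression()),
                    ("GB", HistGradientBoostingRegressor())]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name}: RMSE={rmse:.3f} MAE={mean_absolute_error(y_te, pred):.3f} "
          f"R2={r2_score(y_te, pred):.3f} FS={1 - rmse / rmse_base:.2%}")
```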

4. Open-Source Data

ML methods rely heavily on the data they are trained on, making the quality and accessibility of data the pivotal factor in the success of the methods solving complex tasks. Many authors gathered data directly from the power plant owner or from the grid operator. These data are most often closed-source. To counter this and to encourage further research and benchmarking, we provide a compilation of open-access data relevant to PV forecasting. Table 3 summarizes the data explored in this review. It states the Source/Name of the database or data provider, the Data type, a Link to the data or the website containing it (all accessed on 18 September 2023) and the literature citing the use of the database in the References column.
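To make benchmarking on such open datasets reproducible, a typical preparation step looks like the pandas sketch below; the file name, column names and split dates are hypothetical placeholders to be adapted to the chosen dataset and to the test period of the paper being compared against.

```python
import pandas as pd

# hypothetical export of an open measurement archive (adapt name/columns to the real files)
df = pd.read_csv("srrl_measurements.csv", parse_dates=["timestamp"], index_col="timestamp")

# resample minutely irradiance to the forecast step used in the benchmarked paper
ghi_hourly = df["ghi_w_m2"].resample("1h").mean()

# chronological split so the test period matches the one reported in the benchmark table
train = ghi_hourly.loc[:"2019-12-31"]
test = ghi_hourly.loc["2020-01-01":]
print(train.shape, test.shape)
```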

4.1. Sources

The National Renewable Energy Laboratory (NREL), specifically its Solar Radiation Research Laboratory (SRRL), provides the largest publicly available dataset with both total sky images (TSI) and numerical measurements. Continuous solar measurements have been gathered by the SRRL at NREL’s South Table Mountain Campus in Golden, Colorado (USA), starting from 1981. Alongside numerical measurements at minutely resolution, the SRRL dataset includes two sets of TSIs, taken by a Yankee Total Sky Imager (TSI-800) and an EKO All Sky Imager, with a 10 min resolution [27].
The HelioClim-3 database was created using a Meteosat First Generation (MFG) satellite of the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT, Darmstadt, Germany) [24,76]. The satellite images obtained from MFG EUMETSAT are processed to yield values of incoming surface solar irradiance (SSI). Data collection for the database began in 2004, and the database is updated daily.
EUMETSAT, named in the Source/Name column of Table 3, provides various products (types) of satellite images. In [20], cloud mask (clm) satellite images were explored to investigate their usefulness for PV power forecasting. Another example of a EUMETSAT product is the microphysics (mphys) image product, which contains information distinguishing between ice clouds, water clouds and cirrus clouds. Some products are available at 5 min and some at 15 min temporal resolution, with 1 to 3 km spatial resolution [84].
Girasol, a sky imaging and global solar irradiance dataset, contains solar cycle recordings from 244 days across 3 years (over 110 GB in size). The cameras sample images at 15 s intervals, with logs recorded in periods when the Sun’s elevation angle is higher than 15°. The pyranometer data are sampled 4 to 6 times per second, while the weather station data are sampled every 10 min [78]. Further details can be found via the link provided in the Link column.
The European Centre for Medium-Range Weather Forecasts (ECMWF) provides forecasts of multiple weather variables, including various solar radiation parameters such as direct solar radiation and top net solar radiation, with a resolution of 0.1° × 0.1° latitude/longitude [85].
The Aerosol Robotic Network (AERONET) provides aerosol optical depth data. It is a measure of the attenuation of solar radiation due to the scattering and absorption of sunlight by aerosol particles in the Earth’s atmosphere [49,81], which can also be utilized to improve PV forecasting.
The Copernicus Climate Change Service (C3S) contains multiple useful datasets regarding solar irradiance. For example, the global dataset ERA5-Land hourly data from 1950 to present contains hourly variables, such as surface net solar radiation and surface solar radiation downwards, from 1950 until 2021. Other possibly useful datasets include the Surface radiation budget from 1982 to present, derived from satellite observations. C3S also provides subregion extraction through latitudinal and longitudinal values [83,86].
The Surface Radiation Budget Network (SURFRAD) was established in 1993 with the focus on supporting climate research with accurate, continuous and long-term measurements of the surface radiation budget across the USA. SURFRAD consists of a seven-station radiation monitoring network that has collected minutely solar radiation data since 2009 [29,87].
The Global Forecast System (GFS) is a weather forecast model that generates data for dozens of atmospheric and land-soil variables. GFS is a global model with a base horizontal resolution of 28 km between grid points. The forecast horizon is up to 16 days in advance, with a temporal resolution of 3 h [47,88].
A Sky Images and Photovoltaic Power Generation Dataset (SKIPP’D) contains three years (2017–2019) of quality-controlled downsampled minutely sky images and PV power generation data [31].
The Photovoltaic Geographical Information System SARAH-2 dataset (PVGIS-SARAH-2) is derived from the second version of the SARAH (Surface Solar Radiation Data Set - Heliosat) solar radiation data provided by the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF). The solar radiation data come in GeoTIFF [89] format and consist of the average irradiance over a specific time period, measured in W/m² [82].
Some other possibly useful sources are the Goddard Earth Sciences Data and Information Services Center (GES-DISC) https://disc.gsfc.nasa.gov/ (last accessed on 18 September 2023), Meteotest https://meteotest.ch/en/ (last accessed on 18 September 2023) [43,47], the Moderate Resolution Imaging Spectroradiometer (MODIS) of NASA’s Terra and Aqua satellites https://modis.gsfc.nasa.gov/data/dataprod/mod01.php (last accessed on 18 September 2023) [11] and the Finnish Meteorological Institute’s open-source datasets https://en.ilmatieteenlaitos.fi/open-data-sets-available (last accessed on 18 September 2023) [90].

4.2. Benchmark

Table 4 shows the benchmarked results of the existing literature on open-access data. The idea is to enable novel research to be compared against existing results on identical, readily available datasets. Other datasets mentioned in Table 3 are left out due to the closed-source nature of their PV plant production data.
The column Period (Training) shows the temporal scale of the dataset used in the referenced paper. The dates in parentheses refer to the training part of the dataset.
The error measures RMSE, MAE and MBE are expressed in W/m² (irradiance forecasting) or MW (PV power forecasting). The error measures MAPE, FS, MRE, nRMSE and nMAE are expressed as % deviation. We identified three sources available for benchmarking: NREL, SKIPP’D and SURFRAD. NREL is the most referenced dataset in the analysed literature. Nevertheless, none of the papers can be directly compared to each other, as they have no overlapping test periods and use different forecast horizons or forecast steps for the NREL dataset. Despite that, a similarity becomes evident when comparing their common error measures. To be more specific, the RMSEs for [57], [27], [15] and [16] are 80.02, 81.03, 80.48 and 98.94 W/m² for irradiance forecasting, respectively. Out of these results, we would emphasize those obtained by Dolatabadi et al. [57] and Feng et al. [27], as their solutions yield robust results over longer testing periods of one year and two years, respectively. When benchmarking against these papers, future authors must keep in mind the forecast horizon and the granularity of the forecast, as they differ for each paper encompassed in Table 4. We encourage future authors to use identical error measures to accurately benchmark models against existing ones. Regarding forecast skill (FS), new models must be compared against the same baseline models as in the referenced papers.
In [31] by Nie et al., the model was evaluated on only 4% of data (∼20 days) with a dataset containing almost three years of data (SKIPP’D). A more statistically robust approach would involve extending the testing period to cover a longer duration.
Finally, Yang et al. in [29] evaluated and compared different clear sky models for the SURFRAD dataset without specifying the testing period. We anticipate improvements to be made over the clear sky models by using a machine learning approach on the SURFRAD dataset.

5. Conclusions

In this paper, an overview of the state of the art of short-term PV forecasting using machine learning is provided. This review focused on the most recent relevant studies in the last 5 years with the highest number of citations. Key features of each relevant paper have been compiled and summarized in Table 1, including the goal of the paper, forecast horizon, ML methods employed, evaluation techniques, location of the power plants and finally the test size, as we have not seen this included in other similar review literature. A larger test size signifies a more reliable assessment of the method tested. Furthermore, the methods employed and evaluation measures used in the reviewed literature have been discussed. The use of deep learning in this field is on the upswing, as evident from analysed literature. Many of the papers describe the use of a hybrid deep learning model combining feedforward, convolutional or recurrent layers. Nevertheless, classical machine learning architectures and autoregressive methods continue to be relevant alongside more complex methods. We provide a discussion and offer recommendations on the methodology of building a solar forecasting model including evaluation measure and machine learning method selection. Machine learning methods are data-driven, and many data sources have been reported in the literature. Consequently, open-access data sources have been compiled and explored. These open-access data sources and papers utilizing them serve as a benchmark against which new research can be evaluated. Table 4 benchmarks the existing analysed literature on readily available datasets; we also provide suggestions on the steps to ensure proper comparison of new research against the benchmark.
Future work For future research directions, we encourage new authors to challenge the newly established benchmark with their innovative solutions. This serves to broaden and deepen the knowledge within the field of solar forecasting. Moreover, probabilistic solar forecasting is less represented in literature. We believe that this is a promising research direction as it can provide new solutions to the grid, such as probabilistic reserve markets. Additionally, a less-explored direction is the aspect of explainability when it comes to solar forecasters built on ML. This is important, as grid and plant operators often require full transparency in their decision-making process, which makes it worth pursuing.

Author Contributions

Conceptualization: F.P. and T.C.; writing—original draft preparation: F.P.; writing—review and editing: F.P. and T.C.; methodology: F.P.; supervision: T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data available in manuscript, Section 4.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

ANN – Artificial Neural Network
SVM – Support Vector Machine
BPNN – Backpropagation Neural Network
MLR – Multiple Linear Regression
(Bi)LSTM – (Bidirectional) Long Short-Term Memory
CNN – Convolutional Neural Network
ConvNet – Convolutional Neural Network
SVR – Support Vector Regression
DT – Decision Tree
MLP – Multilayer Perceptron
RF – Random Forest
AR – Autoregressive Model
DNN – Deep Neural Network
FNN – Feedforward Neural Network
DXNN – Direct Explainable Neural Network
GB(D)T – Gradient Boosting (Decision) Trees
(L)GBM – (Light) Gradient Boosting Machine
XGB – Extreme Gradient Boosting
NGBoost – Natural Gradient Boosting
ReLU – Rectified Linear Unit
Dense – Fully connected feedforward layer
(S)AR(I)MA – (Seasonal) Autoregressive (Integrated) Moving Average
ResNet – Residual Neural Network
GRU – Gated Recurrent Unit
QR – Quantile Regression
ANFIS – Adaptive Neuro-Fuzzy Inference System
GP(R) – Gaussian Process (Regression)
KNN – K-Nearest Neighbors
LR – Linear Regression
AdaBoost – Adaptive Boosting
ETR – Extra Trees Regressor or Extremely Randomized Trees
EDLSTM – Encoder-Decoder Long Short-Term Memory
STCNN – Space-Time Convolutional Neural Network
NLSTM – Multi-Task Multi-Channel Nested Long Short-Term Memory
SSA – Salp-Swarm Algorithm
PSO – Particle Swarm Optimization
ENN – Elman Neural Network
NAR(X) – Autoregressive Neural Network (with Exogenous Inputs)
PHANN – Physical Hybrid Artificial Neural Network
GCLSTM – Graph Convolutional Long Short-Term Memory
STAR – Spatio-Temporal Autoregressive Model
GCTrafo – Graph Convolutional Transformer
MR-ESN – Multiple Reservoirs Echo State Network
ELM – Extreme Learning Machine
DRL – Deep Reinforcement Learning
KS test – Kolmogorov–Smirnov test
IAE – Individual Absolute Error
SSIM – Structural Similarity
ρ – Coefficient of correlation
PIAW – Prediction Interval Average Width
PICP – Prediction Interval Coverage Probability
CRPS – Continuous Ranked Probability Score
CWC – Coverage Width Calculation
TSM-GAT – Temporal-Spatial Multi-Windows Graph Attention Network
Lasso – Least Absolute Shrinkage and Selection Operator
ST-Lasso – Spatio-Temporal model with Least Absolute Shrinkage and Selection Operator

References

  1. International Energy Agency. Share of Cumulative Power Capacity by Technology. 2022. Available online: https://www.iea.org/data-and-statistics/charts/share-of-cumulative-power-capacity-by-technology-2010-2027 (accessed on 7 September 2023).
  2. Alcañiz, A.; Grzebyk, D.; Ziar, H.; Isabella, O. Trends and gaps in photovoltaic power forecasting with machine learning. Energy Rep. 2023, 9, 447–471. [Google Scholar] [CrossRef]
  3. Liu, C.; Li, M.; Yu, Y.; Wu, Z.; Gong, H.; Cheng, F. A review of multitemporal and multispatial scales photovoltaic forecasting methods. IEEE Access 2022, 10, 35073–35093. [Google Scholar] [CrossRef]
  4. Gupta, P.; Singh, R. PV power forecasting based on data-driven models: A review. Int. J. Sustain. Eng. 2021, 14, 1733–1755. [Google Scholar] [CrossRef]
  5. Mellit, A.; Massi Pavan, A.; Ogliari, E.; Leva, S.; Lughi, V. Advanced methods for photovoltaic output power forecasting: A review. Appl. Sci. 2020, 10, 487. [Google Scholar] [CrossRef]
  6. Benavides Cesar, L.; Amaro e Silva, R.; Manso Callejo, M.Á.; Cira, C.I. Review on spatio-temporal solar forecasting methods driven by in situ measurements or their combination with satellite and numerical weather prediction (NWP) estimates. Energies 2022, 15, 4341. [Google Scholar] [CrossRef]
  7. Mohamad Radzi, P.N.L.; Akhter, M.N.; Mekhilef, S.; Mohamed Shah, N. Review on the Application of Photovoltaic Forecasting Using Machine Learning for Very Short-to Long-Term Forecasting. Sustainability 2023, 15, 2942. [Google Scholar] [CrossRef]
  8. Jung, Y.; Jung, J.; Kim, B.; Han, S. Long short-term memory recurrent neural network for modeling temporal patterns in long-term power forecasting for solar PV facilities: Case study of South Korea. J. Clean. Prod. 2020, 250, 119476. [Google Scholar] [CrossRef]
  9. Haider, S.A.; Sajid, M.; Sajid, H.; Uddin, E.; Ayaz, Y. Deep learning and statistical methods for short-and long-term solar irradiance forecasting for Islamabad. Renew. Energy 2022, 198, 51–60. [Google Scholar] [CrossRef]
  10. Ofori-Ntow Jnr, E.; Ziggah, Y.Y.; Rodrigues, M.J.; Relvas, S. A New Long-Term Photovoltaic Power Forecasting Model Based on Stacking Generalization Methodology. Nat. Resour. Res. 2022, 31, 1265–1287. [Google Scholar] [CrossRef]
  11. Ghimire, S.; Deo, R.C.; Downs, N.J.; Raj, N. Self-adaptive differential evolutionary extreme learning machines for long-term solar radiation prediction with remotely-sensed MODIS satellite and Reanalysis atmospheric products in solar-rich cities. Remote. Sens. Environ. 2018, 212, 176–198. [Google Scholar] [CrossRef]
  12. Han, S.; Qiao, Y.; Yan, J.; Liu, Y.; Li, L.; Wang, Z. Mid-to-long term wind and photovoltaic power generation prediction based on copula function and long short term memory network. Appl. Energy 2019, 239, 181–191. [Google Scholar] [CrossRef]
  13. Wang, F.; Lu, X.; Mei, S.; Su, Y.; Zhen, Z.; Zou, Z.; Zhang, X.; Yin, R.; Duić, N.; Shafie-khah, M.; et al. A satellite image data based ultra-short-term solar PV power forecasting method considering cloud information from neighboring plant. Energy 2022, 238, 121946. [Google Scholar] [CrossRef]
  14. Pothineni, D.; Oswald, M.R.; Poland, J.; Pollefeys, M. Kloudnet: Deep learning for sky image analysis and irradiance forecasting. In Proceedings of the Pattern Recognition: 40th German Conference, GCPR 2018, Stuttgart, Germany, 9–12 October 2018; Proceedings 40. Springer: Berlin/Heidelberg, Germany, 2019; pp. 535–551. [Google Scholar]
  15. Zhen, Z.; Liu, J.; Zhang, Z.; Wang, F.; Chai, H.; Yu, Y.; Lu, X.; Wang, T.; Lin, Y. Deep learning based surface irradiance mapping model for solar PV power forecasting using sky image. IEEE Trans. Ind. Appl. 2020, 56, 3385–3396. [Google Scholar] [CrossRef]
  16. Wang, F.; Zhang, Z.; Chai, H.; Yu, Y.; Lu, X.; Wang, T.; Lin, Y. Deep learning based irradiance mapping model for solar PV power forecasting using sky image. In Proceedings of the 2019 IEEE Industry Applications Society Annual Meeting, Baltimore, MD, USA, 29 September 2019–3 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–9. [Google Scholar]
  17. Wang, F.; Xuan, Z.; Zhen, Z.; Li, Y.; Li, K.; Zhao, L.; Shafie-khah, M.; Catalão, J.P. A minutely solar irradiance forecasting method based on real-time sky image-irradiance mapping model. Energy Convers. Manag. 2020, 220, 113075. [Google Scholar] [CrossRef]
  18. Bauer, P.; Thorpe, A.; Brunet, G. The quiet revolution of numerical weather prediction. Nature 2015, 525, 47–55. [Google Scholar] [CrossRef]
  19. Skamarock, W.C.; Klemp, J.B.; Dudhia, J.; Gill, D.O.; Barker, D.M.; Duda, M.G.; Huang, X.Y.; Wang, W.; Powers, J.G. A description of the advanced research WRF version 3. NCAR Tech. Note 2008, 475, 113. [Google Scholar]
  20. Pandžić, F.; Sudić, I.; Capuder, T. Cloud Effects on Photovoltaic Power Forecasting: Initial Analysis of a Single Power Plant Based on Satellite Images and Weather Forecasts. In Proceedings of the 8th International Conference on Advances on Clean Energy Research, Barcelona, Spain, 28–30 April 2023. [Google Scholar]
  21. AlShafeey, M.; Csáki, C. Evaluating neural network and linear regression photovoltaic power forecasting models based on different input methods. Energy Rep. 2021, 7, 7601–7614. [Google Scholar] [CrossRef]
  22. Huang, C.J.; Kuo, P.H. Multiple-input deep convolutional neural network model for short-term photovoltaic power forecasting. IEEE Access 2019, 7, 74822–74834. [Google Scholar] [CrossRef]
  23. Si, Z.; Yang, M.; Yu, Y.; Ding, T. Photovoltaic power forecast based on satellite images considering effects of solar position. Appl. Energy 2021, 302, 117514. [Google Scholar] [CrossRef]
  24. Agoua, X.G.; Girard, R.; Kariniotakis, G. Photovoltaic power forecasting: Assessment of the impact of multiple sources of spatio-temporal data on forecast accuracy. Energies 2021, 14, 1432. [Google Scholar] [CrossRef]
  25. Venugopal, V.; Sun, Y.; Brandt, A.R. Short-term solar PV forecasting using computer vision: The search for optimal CNN architectures for incorporating sky images and PV generation history. J. Renew. Sustain. Energy 2019, 11, 066102. [Google Scholar] [CrossRef]
  26. Lago, J.; De Brabandere, K.; De Ridder, F.; De Schutter, B. Short-term forecasting of solar irradiance without local telemetry: A generalized model using satellite data. Sol. Energy 2018, 173, 566–577. [Google Scholar] [CrossRef]
  27. Feng, C.; Zhang, J. SolarNet: A sky image-based deep convolutional neural network for intra-hour solar forecasting. Sol. Energy 2020, 204, 71–78. [Google Scholar] [CrossRef]
  28. Sharadga, H.; Hajimirza, S.; Balog, R.S. Time series forecasting of solar power generation for large-scale photovoltaic plants. Renew. Energy 2020, 150, 797–807. [Google Scholar] [CrossRef]
  29. Yang, D. Choice of clear-sky model in solar forecasting. J. Renew. Sustain. Energy 2020, 12, 026101. [Google Scholar] [CrossRef]
  30. Wen, H.; Du, Y.; Chen, X.; Lim, E.; Wen, H.; Jiang, L.; Xiang, W. Deep learning based multistep solar forecasting for PV ramp-rate control using sky images. IEEE Trans. Ind. Inform. 2020, 17, 1397–1406. [Google Scholar] [CrossRef]
  31. Nie, Y.; Li, X.; Scott, A.; Sun, Y.; Venugopal, V.; Brandt, A. SKIPP’D: A SKy Images and Photovoltaic Power Generation Dataset for short-term solar forecasting. Sol. Energy 2023, 255, 171–179. [Google Scholar] [CrossRef]
  32. Kuo, W.C.; Chen, C.H.; Chen, S.Y.; Wang, C.C. Deep learning neural networks for short-term PV Power Forecasting via Sky Image method. Energies 2022, 15, 4779. [Google Scholar] [CrossRef]
  33. Hossain, M.S.; Mahmood, H. Short-term photovoltaic power forecasting using an LSTM neural network and synthetic weather forecast. IEEE Access 2020, 8, 172524–172533. [Google Scholar] [CrossRef]
  34. Mayer, M.J.; Yang, D. Probabilistic photovoltaic power forecasting using a calibrated ensemble of model chains. Renew. Sustain. Energy Rev. 2022, 168, 112821. [Google Scholar] [CrossRef]
  35. Dai, Y.; Wang, Y.; Leng, M.; Yang, X.; Zhou, Q. LOWESS smoothing and Random Forest based GRU model: A short-term photovoltaic power generation forecasting method. Energy 2022, 256, 124661. [Google Scholar] [CrossRef]
  36. Fu, Y.; Chai, H.; Zhen, Z.; Wang, F.; Xu, X.; Li, K.; Shafie-Khah, M.; Dehghanian, P.; Catalão, J.P. Sky image prediction model based on convolutional auto-encoder for minutely solar PV power forecasting. IEEE Trans. Ind. Appl. 2021, 57, 3272–3281. [Google Scholar] [CrossRef]
  37. Brester, C.; Kallio-Myers, V.; Lindfors, A.V.; Kolehmainen, M.; Niska, H. Evaluating neural network models in site-specific solar PV forecasting using numerical weather prediction data and weather observations. Renew. Energy 2023, 207, 266–274. [Google Scholar] [CrossRef]
  38. Mayer, M.J. Benefits of physical and machine learning hybridization for photovoltaic power forecasting. Renew. Sustain. Energy Rev. 2022, 168, 112772. [Google Scholar] [CrossRef]
  39. Markovics, D.; Mayer, M.J. Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renew. Sustain. Energy Rev. 2022, 161, 112364. [Google Scholar] [CrossRef]
  40. Yu, D.; Lee, S.; Lee, S.; Choi, W.; Liu, L. Forecasting photovoltaic power generation using satellite images. Energies 2020, 13, 6603. [Google Scholar] [CrossRef]
  41. Abdellatif, A.; Mubarak, H.; Ahmad, S.; Ahmed, T.; Shafiullah, G.M.; Hammoudeh, A.; Abdellatef, H.; Rahman, M.M.; Gheni, H.M. Forecasting photovoltaic power generation with a stacking ensemble model. Sustainability 2022, 14, 11083. [Google Scholar] [CrossRef]
  42. Harrou, F.; Kadri, F.; Sun, Y. Forecasting of photovoltaic solar power production using LSTM approach. In Advanced Statistical Modeling, Forecasting, and Fault Detection in Renewable Energy Systems; Intech Open: London, UK, 2020; pp. 3–18. [Google Scholar]
  43. Simeunović, J.; Schubnel, B.; Alet, P.J.; Carrillo, R.E.; Frossard, P. Interpretable temporal-spatial graph attention network for multi-site PV power forecasting. Appl. Energy 2022, 327, 120127. [Google Scholar] [CrossRef]
  44. Jeong, J.; Kim, H. Multi-site photovoltaic forecasting exploiting space-time convolutional neural network. Energies 2019, 12, 4490. [Google Scholar] [CrossRef]
  45. Guo, X.; Mo, Y.; Yan, K. Short-term photovoltaic power forecasting based on historical information and deep learning methods. Sensors 2022, 22, 9630. [Google Scholar] [CrossRef]
  46. Aprillia, H.; Yang, H.T.; Huang, C.M. Short-term photovoltaic power forecasting using a convolutional neural network–salp swarm algorithm. Energies 2020, 13, 1879. [Google Scholar] [CrossRef]
  47. Simeunović, J.; Schubnel, B.; Alet, P.J.; Carrillo, R.E. Spatio-temporal graph neural networks for multi-site PV power forecasting. IEEE Trans. Sustain. Energy 2021, 13, 1210–1220. [Google Scholar] [CrossRef]
  48. Gao, M.; Li, J.; Hong, F.; Long, D. Day-ahead power forecasting in a large-scale photovoltaic plant based on weather classification using LSTM. Energy 2019, 187, 115838. [Google Scholar] [CrossRef]
  49. Liu, W.; Liu, C.; Lin, Y.; Ma, L.; Xiong, F.; Li, J. Ultra-short-term forecast of photovoltaic output power under fog and haze weather. Energies 2018, 11, 528. [Google Scholar] [CrossRef]
  50. Yao, X.; Wang, Z.; Zhang, H. A novel photovoltaic power forecasting model based on echo state network. Neurocomputing 2019, 325, 182–189. [Google Scholar] [CrossRef]
  51. Han, Y.; Wang, N.; Ma, M.; Zhou, H.; Dai, S.; Zhu, H. A PV power interval forecasting based on seasonal model and nonparametric estimation algorithm. Sol. Energy 2019, 184, 515–526. [Google Scholar] [CrossRef]
  52. Dolara, A.; Grimaccia, F.; Leva, S.; Mussetta, M.; Ogliari, E. Comparison of training approaches for photovoltaic forecasts by means of machine learning. Appl. Sci. 2018, 8, 228. [Google Scholar] [CrossRef]
  53. Louzazni, M.; Mosalam, H.; Khouya, A.; Amechnoue, K. A non-linear auto-regressive exogenous method to forecast the photovoltaic power output. Sustain. Energy Technol. Assess. 2020, 38, 100670. [Google Scholar] [CrossRef]
  54. Lee, D.; Kim, K. Recurrent neural network-based hourly prediction of photovoltaic power output using meteorological information. Energies 2019, 12, 215. [Google Scholar] [CrossRef]
  55. Son, J.; Park, Y.; Lee, J.; Kim, H. Sensorless PV power forecasting in grid-connected buildings through deep learning. Sensors 2018, 18, 2529. [Google Scholar] [CrossRef]
  56. Mellit, A.; Pavan, A.M.; Lughi, V. Deep learning neural networks for short-term photovoltaic power forecasting. Renew. Energy 2021, 172, 276–288. [Google Scholar] [CrossRef]
  57. Dolatabadi, A.; Abdeltawab, H.; Mohamed, Y.A.R.I. Deep reinforcement learning-based self-scheduling strategy for a CAES-PV system using accurate sky images-based forecasting. IEEE Trans. Power Syst. 2022, 38, 1608–1618. [Google Scholar] [CrossRef]
  58. Pérez, E.; Pérez, J.; Segarra-Tamarit, J.; Beltran, H. A deep learning model for intra-day forecasting of solar irradiance using satellite-based estimations in the vicinity of a PV power plant. Sol. Energy 2021, 218, 652–660. [Google Scholar] [CrossRef]
  59. Zhen, Z.; Pang, S.; Wang, F.; Li, K.; Li, Z.; Ren, H.; Shafie-khah, M.; Catalao, J.P. Pattern classification and PSO optimal weights based sky images cloud motion speed calculation method for solar PV power forecasting. IEEE Trans. Ind. Appl. 2019, 55, 3331–3342. [Google Scholar] [CrossRef]
  60. Akhter, M.N.; Mekhilef, S.; Mokhlis, H.; Almohaimeed, Z.M.; Muhammad, M.A.; Khairuddin, A.S.M.; Akram, R.; Hussain, M.M. An hour-ahead PV power forecasting method based on an RNN-LSTM model for three different PV plants. Energies 2022, 15, 2243. [Google Scholar] [CrossRef]
  61. Wang, H.; Cai, R.; Zhou, B.; Aziz, S.; Qin, B.; Voropai, N.; Gan, L.; Barakhtenko, E. Solar irradiance forecasting based on direct explainable neural network. Energy Convers. Manag. 2020, 226, 113487. [Google Scholar] [CrossRef]
  62. Mitrentsis, G.; Lens, H. An interpretable probabilistic model for short-term solar power forecasting using natural gradient boosting. Appl. Energy 2022, 309, 118473. [Google Scholar] [CrossRef]
  63. Sun, Y.; Venugopal, V.; Brandt, A.R. Short-term solar power forecast with deep learning: Exploring optimal input and output configuration. Sol. Energy 2019, 188, 730–741. [Google Scholar] [CrossRef]
  64. Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623. [Google Scholar] [CrossRef]
  65. Hodson, T.O. Root-mean-square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. 2022, 15, 5481–5487. [Google Scholar] [CrossRef]
  66. Koutsandreas, D.; Spiliotis, E.; Petropoulos, F.; Assimakopoulos, V. On the selection of forecasting accuracy measures. J. Oper. Res. Soc. 2022, 73, 937–954. [Google Scholar] [CrossRef]
  67. Arya, D.; Maeda, H.; Ghosh, S.K.; Toshniwal, D.; Mraz, A.; Kashiyama, T.; Sekimoto, Y. Transfer learning-based road damage detection for multiple countries. arXiv 2020, arXiv:2008.13101. [Google Scholar]
  68. Madiniyeti, J.; Chao, Y.; Li, T.; Qi, H.; Wang, F. Concrete Dam Deformation Prediction Model Research Based on SSA–LSTM. Appl. Sci. 2023, 13, 7375. [Google Scholar] [CrossRef]
  69. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  70. Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Adv. Neural Inf. Process. Syst. 2022, 35, 507–520. [Google Scholar]
  71. Munawar, U.; Wang, Z. A framework of using machine learning approaches for short-term solar power forecasting. J. Electr. Eng. Technol. 2020, 15, 561–569. [Google Scholar] [CrossRef]
  72. Bhatt, U.; Xiang, A.; Sharma, S.; Weller, A.; Taly, A.; Jia, Y.; Ghosh, J.; Puri, R.; Moura, J.M.; Eckersley, P. Explainable machine learning in deployment. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 648–657. [Google Scholar]
  73. Burkart, N.; Huber, M.F. A survey on the explainability of supervised machine learning. J. Artif. Intell. Res. 2021, 70, 245–317. [Google Scholar] [CrossRef]
  74. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A review of machine learning interpretability methods. Entropy 2020, 23, 18. [Google Scholar] [CrossRef] [PubMed]
  75. Lu, Y.; Murzakhanov, I.; Chatzivasileiadis, S. Neural network interpretability for forecasting of aggregated renewable generation. In Proceedings of the 2021 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Aachen, Germany, 25–28 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 282–288. [Google Scholar]
  76. Blanc, P.; Gschwind, B.; Lefèvre, M.; Wald, L. The HelioClim project: Surface solar irradiance data for climate applications. Remote Sens. 2011, 3, 343–361. [Google Scholar] [CrossRef]
  77. Vernay, C.; Blanc, P.; Pitaval, S. Characterizing measurements campaigns for an innovative calibration approach of the global horizontal irradiation estimated by HelioClim-3. Renew. Energy 2013, 57, 339–347. [Google Scholar] [CrossRef]
  78. Terrén-Serrano, G.; Bashir, A.; Estrada, T.; Martínez-Ramón, M. Girasol, a sky imaging and global solar irradiance dataset. Data Brief 2021, 35, 106914. [Google Scholar] [CrossRef]
  79. Terrén-Serrano, G. Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted from Ground-Based Infrared Sky Images. Ph.D. Thesis, The University of New Mexico, Albuquerque, NM, USA, 2022. [Google Scholar]
  80. Augustine, J.A.; DeLuisi, J.J.; Long, C.N. SURFRAD–A national surface radiation budget network for atmospheric research. Bull. Am. Meteorol. Soc. 2000, 81, 2341–2358. [Google Scholar] [CrossRef]
  81. Holben, B.N.; Eck, T.F.; Slutsker, I.; Tanré, D.; Buis, J.; Setzer, A.; Vermote, E.; Reagan, J.A.; Kaufman, Y.; Nakajima, T.; et al. AERONET—A federated instrument network and data archive for aerosol characterization. Remote Sens. Environ. 1998, 66, 1–16. [Google Scholar] [CrossRef]
  82. Amillo, A.; Taylor, N.; Fernandez, A.; Dunlop, E.; Mavrogiorgios, P.; Fahl, F.; Arcaro, G.; Pinedo, I. Adapting PVGIS to trends in climate, technology and user needs. In Proceedings of the 38th European Photovoltaic Solar Energy Conference and Exhibition, Online, 6–10 September 2021; pp. 6–10. [Google Scholar]
  83. Hersbach, H. The ERA5 Atmospheric Reanalysis. In Proceedings of the AGU Fall Meeting Abstracts, San Francisco, CA, USA, 12–16 December 2016; Volume 2016, p. NG33D–01. [Google Scholar]
  84. European Space Agency and European Organisation for the Exploitation of Meteorological Satellites. Meteosat Second Generation. Available online: https://www.eumetsat.int/meteosat-second-generation (accessed on 21 September 2023).
  85. European Centre for Medium-Range Weather Forecasts. Atmospheric Model High Resolution 10-Day Forecast (Set I-HRES). Available online: https://www.ecmwf.int/en/forecasts/datasets/set-i (accessed on 21 September 2023).
  86. European Centre for Medium-Range Weather Forecasts. Copernicus Climate Change Service-C3S. Available online: https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset (accessed on 21 September 2023).
  87. Surface Radiation Budget Network. Surface Radiation Budget Network. Available online: https://gml.noaa.gov/grad/surfrad/overview.html (accessed on 21 September 2023).
  88. National Centers for Environmental Prediction (NCEP). Global Forecast System (GFS). Available online: https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast (accessed on 21 September 2023).
  89. Mahammad, S.S.; Ramakrishnan, R. GeoTIFF-A standard image file format for GIS applications. Map India Image Process. Interpret. 2003, 2023, 28–31. [Google Scholar]
  90. Finnish Meteorological Institute. Open Data Sets. Available online: https://en.ilmatieteenlaitos.fi/open-data-sets-available (accessed on 21 September 2023).
Figure 1. Geographical locations of the PV power plants referenced in the analysed literature, covering various climate types: Europe (16 references), USA (15 references) and China (8 references). Created using Google My Maps (https://www.google.com/mymaps) on 12 September 2023.
Figure 2. Typical CNN architecture. Adapted from [67].
Figure 3. LSTM cell architecture. Adapted from [68].
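For readers without access to the figure, the gate computations depicted in Figure 3 follow the widely used LSTM formulation, where $\sigma$ is the logistic sigmoid, $\odot$ denotes element-wise multiplication, $x_t$ is the input at time step $t$, and $h_t$ and $c_t$ are the hidden and cell states:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$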
Table 1. Research area overview.
| Reference | Goal | Horizon (Step) | Test Size | Error Term | Methods | Location |
| --- | --- | --- | --- | --- | --- | --- |
| [21] | PV power forecasting | 24 h (15 min) | 5 months | MAE, MSE, R², RMSE, Error | MLR, ANN | Hungary |
| [22] | PV power forecasting | 24 h (1 h) | not specified | RMSE, MAE | CNN, SVM, DT, MLP, LSTM, RF | Taiwan |
| [23] | PV power forecasting | 15, 30, 60 min (15 min) | 4 months | nMAE, nRMSE, 1 − ρ | Conv-LSTM, CNN | China |
| [24] | PV power forecasting | 15 min, 1 h, 3 h, 6 h (15 min) | not specified | nRMSE, nMAE | ST-Lasso, AR | France |
| [25] | PV power forecasting | 15 min | 20 days | RMSE, FS | CNN, AR | USA |
| [26] | Irradiance forecasting | 6 h (1 h) | 1 year | rRMSE, MBE, FS | DNN (FNN), AR, GBT | Netherlands |
| [27] | Irradiance forecasting | 1 h (10 min) | 2 years | RMSE, MBE, FS | ANN, GBM, RF, Conv ReLU + Dense | USA |
| [28] | PV power forecasting | 1, 2, 3 h (1 h) | 23 days | RMSE, MSE, ρ | BiLSTM, ARMA, ARIMA, SARIMA | China |
| [29] | Irradiance forecasting | 24 h (15 min) | 3 years | MSE, FS | Clear sky models | USA |
| [15] | Irradiance forecasting | 15 min | 36 days | RMSE, MAE, ρ | CNN-LSTM, CNN-ANN | USA |
| [30] | Irradiance forecasting | 10 min (1 min) | 1 month | RMSE, MAE, nRMSE, nMAE, FS | ResNet | USA |
| [31] | PV power forecasting | 15 min (1 min) | 20 days | RMSE, MAE, FS | CNN + Dense | USA |
| [32] | PV power forecasting | 1 h (1 min) | 1 week | RMSE, MAE, MAPE | LSTM, ANN, GRU | Taiwan (*) |
| [33] | PV power forecasting | 6, 12, 24 h (1 h) | 100 days | RMSE, MAE, MAPE, MRE, MBE | LSTM, ANN, GRU | USA |
| [16] | Irradiance forecasting | 15 min | 70 days | RMSE, MAE, ρ, nPIAW, PICP, CWC | CNN-QR, LSTM-QR, ANN | USA |
| [17] | Irradiance forecasting | 10 min (1 min) | 30 days | MAPE, RMSE, MBE | SVM, BPNN, ARIMA | USA |
| [14] | Irradiance forecasting | 5, 10 min | 5 months | Accuracy | ResNet | Italy, Switzerland |
| [34] | PV power forecasting | 24 h (15 min) | 1 year | CRPS, MARE, PIAW | QR | Hungary |
| [35] | PV power forecasting | 24 h | 31 days | RMSE, MSE, MAE, MAPE, nRMSE, R² | GRU, RF | China |
| [36] | PV power forecasting | 5 min (0.5 min) | 1 day | SSIM, MSE | Conv-AutoEncoder | USA |
| [37] | PV power forecasting | 24 h | 5 weeks | MAE, RMSE, nMAE, nRMSE, IA, Error | MLP | Finland |
| [38] | PV power forecasting | 24 h (15 min) | 1 year | nRMSE, nMAE, nMBE | ANN + Physical Model | Hungary |
| [39] | PV power forecasting | 24 h (15 min) | 2 years | nRMSE, nMAE, nMBE, FS, ρ | LR, SVM, CatBoost, MLP, RF, LGBM, XGB | Hungary |
| [40] | PV power forecasting | 5 h (1 h) | 4 months | MAPE, MAE, RMSE, nMAE | LSTM, CNN, FNN | South Korea |
| [41] | PV power forecasting | 24 h | 10 months | RMSE, MSE, MAE, ρ | XGB, AdaBoost, RF, ETR | Kuala Lumpur |
| [42] | PV power forecasting | 24 h (15 min) | 1.5 months | RMSE, MAPE, MAE, ρ | LSTM | Saudi Arabia (*) |
| [43] | PV power forecasting | 6 h (15 min) | 4 months, 1 year | nMAE, nRMSE | GCLSTM, GCTrafo, TSM-GAT, STCNN | Switzerland, USA |
| [44] | PV power forecasting | 2, 6 h (1 h) | 2.5 months | nRMSE, MAPE, Error | AR, FNN, LSTM, STCNN | USA |
| [45] | PV power forecasting | 7.5, 15 min (0.05 min) | 4 days | MAE, RMSE, adj. R², Accuracy | DT, RF, SVR, MLP, LSTM, BiLSTM, NLSTM | China |
| [46] | PV power forecasting | 24 h (1 h) | 3.5 months | MAPE, MRE | SVM-SSA, CNN-SSA, LSTM-SSA | Taiwan |
| [47] | PV power forecasting | 6 h (15 min) | 1 year | nMAE, nRMSE | STAR, GCLSTM, GCTrafo, STCNN, SVR, EDLSTM | Switzerland |
| [48] | PV power forecasting | 16 h (15 min) | not specified | RMSE, MAD | LSTM, Wavelet NN, SVM, BPNN | China |
| [49] | PV power forecasting | 72 h (1 h) | 2 months | MAE, MSE | BPNN | China |
| [50] | PV power forecasting | 10 h (1 h) | 500 samples | MAPE, Error | MR-ESN | USA |
| [51] | Irradiance forecasting | not specified | not specified | MAE, RMSE, PICP, PINAW, Accuracy | ELM | China |
| [20] | PV power forecasting | 1 h (1 h) | 3.5 months | MSE | Ridge | Croatia |
| [52] | PV power forecasting | 24 h (1 h) | 22 days | MAPE, nMAE, wMAE, eMAE, nRMSE, ρ, Error | PHANN | Italy |
| [53] | PV power forecasting | 24, 72 h (1 h) | 1 month | IAE, RMSE, MSE, R² | NARX | Egypt |
| [54] | PV power forecasting | 14 h (1 h) | 4 months | MAE, RMSE | ANN, DNN, LSTM, ARIMA, SARIMA | South Korea |
| [55] | PV power forecasting | 24 h | 4 months | MAE, RMSE, MAPE, Error, ρ | MLP (DNN) | South Korea |
| [56] | PV power forecasting | 1, 5, 30, 60 min (1 min) | 1 month | MAE, RMSE, MAPE, ρ | LSTM, ENN, NAR | Italy |
| [57] | Irradiance forecasting | 17 h (1 h) | 1 year | MAE, MAPE, RMSE | DRL + CNN-BiLSTM | USA |
| [13] | PV power forecasting | 4 h (15 min) | 30 days | nMAE, nRMSE | SVM, GBDT, ARMA | USA |
| [58] | Irradiance forecasting | 6 h (15 min) | 6 months | MAE, rMAE, RMSE, rRMSE | ConvNet + Dense | France |
| [59] | PV power forecasting | 1 min | 200 samples | ρ | K-means, PSO | China |
| [60] | PV power forecasting | 1 h | 1 year | RMSE, R² | LSTM, SVR, ANN, ANFIS, GPR | Kuala Lumpur |
| [61] | Irradiance forecasting | 1 min | not specified | RMSE, MAE, R² | SVR, BPNN, XGB, DXNN | France |
| [62] | PV power forecasting | 36 h (15 min) | 1 month | RMSE, MAE, MBE, CRPS, PIAW, PICP | NGBoost, GP | Germany |
*—Location of author affiliation
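LSTM-based models are among the most frequently used methods in Table 1. The snippet below is a minimal Python/Keras sketch of such a forecaster for illustration only; the look-back window, horizon, feature count and layer size are assumptions and are not taken from any referenced study.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

LOOKBACK = 24     # past 15-min steps fed to the model (assumption)
HORIZON = 4       # number of 15-min steps forecast ahead (assumption)
N_FEATURES = 3    # e.g. PV power, irradiance, temperature (assumption)

# Direct multi-step LSTM forecaster: one recurrent layer followed by a dense output
model = tf.keras.Sequential([
    layers.Input(shape=(LOOKBACK, N_FEATURES)),
    layers.LSTM(64),
    layers.Dense(HORIZON),
])
model.compile(optimizer="adam", loss="mse")

# Dummy arrays standing in for a normalized PV time series, used only to show shapes
X = np.random.rand(256, LOOKBACK, N_FEATURES).astype("float32")
y = np.random.rand(256, HORIZON).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1]).shape)  # (1, HORIZON)
```

A direct multi-step output layer is only one possible design; recursive or sequence-to-sequence variants also appear in the analysed studies.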
Table 2. Performance measures employed in analysed research studies.
| Measure of Performance | Equation |
| --- | --- |
| Root Mean Squared Error (RMSE) * | $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$ |
| Mean Squared Error (MSE) | $\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$ |
| Mean Absolute Error (MAE) * | $\frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert$ |
| Mean Absolute Percentage Error (MAPE) | $\frac{1}{n}\sum_{i=1}^{n}\frac{\lvert y_i-\hat{y}_i\rvert}{\lvert y_i\rvert}\times 100\%$ |
| Mean Absolute Deviation (MAD) | $\frac{1}{n}\sum_{i=1}^{n}\frac{\lvert y_i-\hat{y}_i\rvert}{\lvert PV_{\mathrm{capacity}}\rvert}\times 100\%$ |
| Mean Bias Error (MBE) * | $\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)$ |
| Index of Agreement (IA) | $1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}\left(\lvert\hat{y}_i-\bar{y}\rvert+\lvert y_i-\bar{y}\rvert\right)^2}$ |
| Correlation Coefficient (ρ) | $\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$ |
| Coefficient of Determination (R²) | $1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$ |
| Continuous Ranked Probability Score (CRPS) | $\frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{\infty}\left[F_{y_i}(x)-\mathbf{1}(x\geq y_i)\right]^2\,dx$ |
| Prediction Interval Average Width (PIAW) * | $\frac{1}{n}\sum_{i=1}^{n}(U_i-L_i)$ |
| Prediction Interval Coverage Probability (PICP) | $\frac{1}{n}\sum_{i=1}^{n}k_i$ |
| Coverage Width Calculation (CWC) | $\begin{cases}PIAW, & PICP\geq u\\ PIAW\left(1+PICP\,e^{-g(PICP-u)}\right), & PICP<u\end{cases}$ |
| Forecast Skill (FS) | $1-\frac{RMSE_{\mathrm{model}}}{RMSE_{\mathrm{persistence}}}$ |
| Envelope Weighted Mean Absolute Error (eMAE) | $\frac{\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert}{\sum_{i=1}^{n}\max(y_i,\hat{y}_i)}\times 100\%$ |
* Also employed in normalized (or relative) form, scaled by maximum plant capacity.
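To make the definitions in Table 2 concrete, the following minimal Python/NumPy sketch implements several of the deterministic measures and the persistence-referenced forecast skill. The sample arrays are hypothetical irradiance values used only to illustrate the calling convention.

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Squared Error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean Absolute Error."""
    return np.mean(np.abs(y - y_hat))

def mbe(y, y_hat):
    """Mean Bias Error."""
    return np.mean(y - y_hat)

def mape(y, y_hat):
    """Mean Absolute Percentage Error (undefined where y == 0, e.g. night-time PV output)."""
    return np.mean(np.abs(y - y_hat) / np.abs(y)) * 100.0

def forecast_skill(y, y_hat, y_persistence):
    """Forecast Skill: 1 - RMSE(model) / RMSE(persistence reference)."""
    return 1.0 - rmse(y, y_hat) / rmse(y, y_persistence)

# Hypothetical measured irradiance, model forecast and persistence forecast (W/m^2)
y = np.array([420.0, 610.0, 705.0, 660.0, 480.0])
y_hat = np.array([400.0, 630.0, 690.0, 650.0, 500.0])
y_pers = np.array([380.0, 420.0, 610.0, 705.0, 660.0])

print(f"RMSE = {rmse(y, y_hat):.2f} W/m^2, MAE = {mae(y, y_hat):.2f} W/m^2")
print(f"FS   = {forecast_skill(y, y_hat, y_pers):.2f}")
```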
Table 3. Data sources.
Table 4. Benchmark.
| Source/Ref. | Period (Training) | Testing | Goal | Horizon (Step) | Best Method | Error |
| --- | --- | --- | --- | --- | --- | --- |
| NREL/[57] | 1 January 2015–31 December 2018 | 1 January 2020–31 December 2020 | Irradiance forecasting | 17 h (1 h) | DRL + CNN-BiLSTM | RMSE = 80.02 W/m², MAE = 51.95 W/m², MAPE = 7.64% |
| NREL/[27] | 1 January 2012–31 December 2014 | 1 January 2016–31 December 2017 | Irradiance forecasting | 1 h (10 min) | SolarNet | FS = 34.02, RMSE = 81.03 W/m², MBE = −0.44 W/m² |
| NREL/[15] | 1 July 2017–19 April 2018 | 25 May 2018–30 June 2018 | Irradiance forecasting | 15 min | CNN-LSTM, CNN-ANN | ρ = 0.97, RMSE = 80.48 W/m², MAE = 51.89 W/m² |
| NREL/[33] | 1 January 2016–22 September 2018 | 22 September 2018–31 December 2018 | PV power forecasting | 24 h (1 h) | LSTM-NN | MAE = 0.36 MW, RMSE = 0.71 MW, MAPE = 22.31%, MRE = 1.44%, MBE = 0.01 MW |
| NREL/[16] | 1 July 2017–30 June 2018 | 5-fold CV, ∼70 days | Irradiance forecasting | 15 min | CNN-QR, LSTM-QR | MAE = 68.84 W/m², RMSE = 98.94 W/m², nPIAW = 0.09%, PICP = 0.92%, CWC = 0.16, ρ = 0.96 |
| NREL/[43] | 1 January 2006–31 August 2006 | 1 September 2006–31 December 2006 | PV power forecasting | 6 h (15 min) | TSM-GAT | nMAE = 14.78%, nRMSE = 10.37% |
| SKIPP’D/[31] | 1 March 2017–1 December 2019 | ∼20 days (4% of data) | PV power forecasting | 15 min (1 min) | ConvNet | FS = 16.44, RMSE = 0.0024 MW, MAE = 0.0015 MW |
| SURFRAD/[29] | 1 January 2015–31 December 2018 | not specified | Irradiance forecasting | 24 h (15 min) | Ineichen–Perez clear sky model | FS = 14.3, RMSE = 120 W/m² |
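As a guide to reproducing the kind of comparison summarised in Table 4, the sketch below builds a persistence baseline for a 15 min resolution irradiance series, selects a testing period by date, and reports the forecast skill of a candidate model against persistence. The file name, column names, the pre-computed model forecast column and the chosen dates are illustrative assumptions only, not part of the referenced datasets.

```python
import numpy as np
import pandas as pd

def rmse(y, y_hat):
    """Root Mean Squared Error over aligned series."""
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

# Hypothetical CSV with a 'timestamp' column, measured irradiance 'ghi' (W/m^2)
# and a model forecast 'ghi_pred' produced for the same timestamps.
df = pd.read_csv("irradiance_15min.csv", parse_dates=["timestamp"], index_col="timestamp")

# Persistence baseline: the forecast equals the previously observed value (one 15 min step back).
df["ghi_persistence"] = df["ghi"].shift(1)

# Date-based split mirroring the 'Period (Training)' / 'Testing' columns of Table 4 (dates illustrative).
test = df.loc["2020-01-01":"2020-12-31"].dropna()

fs = 1.0 - rmse(test["ghi"], test["ghi_pred"]) / rmse(test["ghi"], test["ghi_persistence"])
print(f"RMSE (model): {rmse(test['ghi'], test['ghi_pred']):.2f} W/m^2")
print(f"Forecast skill vs. persistence: {fs:.3f}")
```

Reporting forecast skill alongside absolute errors makes results comparable across sites and periods, since the persistence reference absorbs much of the site-specific variability.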