# Comparison of Data-Driven Techniques for Nowcasting Applied to an Industrial-Scale Photovoltaic Plant


## Abstract


## 1. Introduction

## 2. Data Sources and Preprocessing

#### 2.1. Data Preprocessing

#### 2.2. Feature Selection

## 3. Methods

- Three-fold cross-validation and hyperparameter search on the ${N}_{\mathrm{tr}}$ training samples: the training parameters yielding the best cross-validation score were selected. This procedure favors the algorithm with the best generalization performance, i.e., the one most likely to perform well on new, previously unseen input data.
- Evaluation of the algorithm's performance on the test set, using common error metrics, after reconstructing the actual power from the predicted stochastic component.
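The two steps above can be sketched with scikit-learn, the implementation used for most of the compared algorithms; the toy data, the choice of Random Forest, and the grid values here are illustrative only, not the paper's actual configuration:

```python
# Hedged sketch of the tuning/evaluation loop: 3-fold CV grid search on
# the training set, then test-set scoring with a common error metric.
# All data and parameter values are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(200, 10)), rng.normal(size=200)
X_te, y_te = rng.normal(size=(50, 10)), rng.normal(size=50)

# Parameters with the best 3-fold cross-validation score are retained.
grid = {"n_estimators": [50, 100], "min_samples_leaf": [4, 10]}
search = GridSearchCV(RandomForestRegressor(random_state=0), grid, cv=3)
search.fit(X_tr, y_tr)

# Test-set evaluation of the selected model (RMSE shown here).
y_hat = search.best_estimator_.predict(X_te)
rmse = np.sqrt(np.mean((y_te - y_hat) ** 2))
```

`GridSearchCV` refits the best parameter combination on the whole training set automatically, so `best_estimator_` is ready for test-set prediction.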

#### Error Metrics
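A minimal numpy sketch of the per-horizon metrics ${\mathrm{RMSE}}_{j}$, ${\mathrm{MAE}}_{j}$, and ${\mathrm{MBE}}_{j}$ reported in Section 4, assuming their standard definitions; the array shapes and values are illustrative (two horizons instead of the paper's twelve):

```python
# Per-horizon error metrics for multi-output forecasts shaped
# (N samples, horizons); values here are toy numbers.
import numpy as np

y = np.array([[1.0, 2.0], [3.0, 4.0]])       # actual, two horizons
y_hat = np.array([[1.5, 2.0], [2.5, 4.5]])   # predicted

err = y_hat - y
rmse_j = np.sqrt(np.mean(err ** 2, axis=0))  # one value per horizon j
mae_j = np.mean(np.abs(err), axis=0)
mbe_j = np.mean(err, axis=0)                 # sign reveals over/under-prediction
```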

## 4. Results

#### 4.1. Cross-Validation Performance

#### 4.2. Test Set Performance

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A

| Random Forest | | Lasso | |
|---|---|---|---|
| n_estimators | 300, 400 | alpha | [0, 0.2] |
| max_depth | 10, 20, 100, auto | selection | random, cyclic |
| min_samples_split | 2, 10, 50 | warm_start | true, false |
| min_samples_leaf | 4, 10, 50 | | |

| MLP | | KNN | |
|---|---|---|---|
| activation | relu, tanh, logistic | n_neighbors | 10, 20, 200, 300 |
| alpha | [0, 1] | selection | random, cyclic |
| learning_rate | adaptive, constant | | |
| hidden_layer_size | 5, 55, 155 | | |
| n_hidden_layers | 2, 3 | | |
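The Random Forest and Lasso grids above can be transcribed as scikit-learn parameter dictionaries. This is a hedged transcription: `auto` for `max_depth` is rendered as `None`, and the continuous `alpha` interval [0, 0.2] is shown sampled at illustrative points.

```python
# Appendix grids as scikit-learn param_grid dictionaries (hedged
# transcription; see lead-in for the two interpretive choices).
rf_grid = {
    "n_estimators": [300, 400],
    "max_depth": [10, 20, 100, None],
    "min_samples_split": [2, 10, 50],
    "min_samples_leaf": [4, 10, 50],
}
lasso_grid = {
    "alpha": [0.0, 0.1, 0.2],          # searched over the interval [0, 0.2]
    "selection": ["random", "cyclic"],
    "warm_start": [True, False],
}
```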

## References

1. IEA. Renewables 2018—Market Analysis and Forecast from 2018 to 2023. 2018. Available online: https://www.iea.org/renewables2018/ (accessed on 13 September 2019).
2. Antonanzas, J.; Osorio, N.; Escobar, R.; Urraca, R.; Martinez-de-Pison, F.; Antonanzas-Torres, F. Review of Photovoltaic Power Forecasting. Sol. Energy **2016**, 136, 78–111.
3. Cros, S.; Buessler, E.; Huet, L.; Sébastien, N.; Schmutz, N. The Benefits of Intraday Solar Irradiance Forecasting to Adjust the Day-Ahead Scheduled PV Power. In Proceedings of the Solar Integration Workshop 2015, Brussels, Belgium, 19–20 October 2015.
4. Wan, C.; Zhao, J.; Song, Y.; Xu, Z.; Lin, J.; Hu, Z. Photovoltaic and Solar Power Forecasting for Smart Grid Energy Management. CSEE J. Power Energy Syst. **2015**, 1, 38–46.
5. Chow, S.K.; Lee, E.W.; Li, D.H. Short-Term Prediction of Photovoltaic Energy Generation by Intelligent Approach. Energy Build. **2012**, 55, 660–667.
6. Chow, C.W.; Belongie, S.; Kleissl, J. Cloud Motion and Stability Estimation for Intra-Hour Solar Forecasting. Sol. Energy **2015**, 115, 645–655.
7. Lorenzo, A.T.; Morzfeld, M.; Holmgren, W.F.; Cronin, A.D. Optimal Interpolation of Satellite and Ground Data for Irradiance Nowcasting at City Scales. Sol. Energy **2017**, 144, 466–474.
8. Li, Y.Z.; He, L.; Nie, R.Q. Short-Term Forecast of Power Generation for Grid-Connected Photovoltaic System Based on Advanced Grey-Markov Chain. In Proceedings of the 2009 International Conference on Energy and Environment Technology, Guilin, China, 16–18 October 2009; pp. 275–278.
9. Ogliari, E.; Dolara, A.; Manzolini, G.; Leva, S. Physical and Hybrid Methods Comparison for the Day Ahead PV Output Power Forecast. Renew. Energy **2017**, 113, 11–21.
10. Inman, R.H.; Pedro, H.T.; Coimbra, C.F. Solar Forecasting Methods for Renewable Energy Integration. Prog. Energy Combust. Sci. **2013**, 39, 535–576.
11. Dolara, A.; Leva, S.; Manzolini, G. Comparison of Different Physical Models for PV Power Output Prediction. Sol. Energy **2015**, 119, 83–99.
12. Tina, G.M.; Marletta, G.; Sardella, S. Multi-Layer Thermal Models of PV Modules for Monitoring Applications. In Proceedings of the 2012 38th IEEE Photovoltaic Specialists Conference, Austin, TX, USA, 3–8 June 2012; pp. 002947–002952.
13. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
14. Pedro, H.T.; Coimbra, C.F. Assessment of Forecasting Techniques for Solar Power Production with No Exogenous Inputs. Sol. Energy **2012**, 86, 2017–2028.
15. Meyers, B.; Hoffimann, J. Short Time Horizon Solar Power Forecasting. Neural Netw. **2017**, 3, 2340.
16. Ineichen, P.; Perez, R. A New Airmass Independent Formulation for the Linke Turbidity Coefficient. Sol. Energy **2002**, 73, 151–157.
17. Holmgren, W.F.; Hansen, C.W.; Mikofski, M.A. Pvlib Python: A Python Package for Modeling Solar Energy Systems. J. Open Source Softw. **2018**, 3, 884.
18. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R, corrected at 8th printing ed.; Springer Texts in Statistics; Springer: New York, NY, USA, 2017.
19. Mellit, A.; Pavan, A.M. A 24-h Forecast of Solar Irradiance Using Artificial Neural Network: Application for Performance Prediction of a Grid-Connected PV Plant at Trieste, Italy. Sol. Energy **2010**, 84, 807–821.
20. Zamo, M.; Mestre, O.; Arbogast, P.; Pannekoucke, O. A Benchmark of Statistical Regression Methods for Short-Term Forecasting of Photovoltaic Electricity Production, Part I: Deterministic Forecast of Hourly Production. Sol. Energy **2014**, 105, 792–803.
21. Husein, M.; Chung, I.Y. Day-Ahead Solar Irradiance Forecasting for Microgrids Using a Long Short-Term Memory Recurrent Neural Network: A Deep Learning Approach. Energies **2019**, 12, 1856.
22. Santosa, F.; Symes, W.W. Linear Inversion of Band-Limited Reflection Seismograms. SIAM J. Sci. Stat. Comput. **1986**, 7, 1307–1330.
23. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2017.
24. Ho, T.K. Random Decision Forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282.
25. Altman, N.S. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. Am. Stat. **1992**, 46, 175–185.
26. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. **1997**, 9, 1735–1780.
27. Chollet, F. Keras. 2015. Available online: https://keras.io/ (accessed on 27 November 2019).
28. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: tensorflow.org (accessed on 27 November 2019).
29. Bao, W.; Yue, J.; Rao, Y. A Deep Learning Framework for Financial Time Series Using Stacked Autoencoders and Long-Short Term Memory. PLoS ONE **2017**, 12, e0180944.
30. Tang, Y.; Xu, J.; Matsumoto, K.; Ono, C. Sequence-to-Sequence Model with Attention for Time Series Classification. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–15 December 2016; pp. 503–510.
31. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv **2014**, arXiv:1412.6980.
32. Yang, D. A Guideline to Solar Forecasting Research Practice: Reproducible, Operational, Probabilistic or Physically-Based, Ensemble, and Skill (ROPES). J. Renew. Sustain. Energy **2019**, 11, 022701.
33. Rosenblatt, M. Remarks on Some Nonparametric Estimates of a Density Function. Ann. Math. Stat. **1956**, 27, 832–837.
34. Raza, M.Q.; Nadarajah, M.; Ekanayake, C. On Recent Advances in PV Output Power Forecast. Sol. Energy **2016**, 136, 125–144.
35. Zhang, J.; Verschae, R.; Nobuhara, S.; Lalonde, J.F. Deep Photovoltaic Nowcasting. Sol. Energy **2018**, 176, 267–276.
36. Chu, Y.; Pedro, H.T.; Kaur, A.; Kleissl, J.; Coimbra, C.F. Net Load Forecasts for Solar-Integrated Operational Grid Feeders. Sol. Energy **2017**, 158, 236–246.
37. Elsinga, B.; van Sark, W.G. Short-Term Peer-to-Peer Solar Forecasting in a Network of Photovoltaic Systems. Appl. Energy **2017**, 206, 1464–1483.
38. Oneto, L.; Laureri, F.; Robba, M.; Delfino, F.; Anguita, D. Data-Driven Photovoltaic Power Production Nowcasting and Forecasting for Polygeneration Microgrids. IEEE Syst. J. **2018**, 12, 2842–2853.
39. Pedro, H.T.; Coimbra, C.F.; David, M.; Lauret, P. Assessment of Machine Learning Techniques for Deterministic and Probabilistic Intra-Hour Solar Forecasts. Renew. Energy **2018**, 123, 191–203.
40. Chu, Y.; Coimbra, C.F. Short-Term Probabilistic Forecasts for Direct Normal Irradiance. Renew. Energy **2017**, 101, 526–536.

**Figure 2.** Separation of the total irradiance (${I}_{GHI}$) into its stochastic (${I}_{\mathrm{stoc}}$) and deterministic (${I}_{CS}$) contributions, as shown in Equation (1).
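A minimal numpy sketch of this decomposition; the clear-sky profile below is a toy sinusoid standing in for the Ineichen clear-sky model [16] actually used, and all values are synthetic:

```python
# Split a "measured" GHI series into a deterministic clear-sky part and a
# stochastic residual, mirroring Equation (1). Toy data throughout.
import numpy as np

t = np.linspace(0, 1, 145)                    # one daylight period, 5 min steps
I_cs = np.maximum(0.0, np.sin(np.pi * t))     # toy clear-sky profile [kW/m^2]
rng = np.random.default_rng(0)
I_ghi = I_cs * (1 - 0.3 * rng.random(t.size)) # synthetic GHI with cloud dips

I_stoc = I_ghi - I_cs                         # stochastic contribution
```

Only the stochastic residual is fed to the regressors; the deterministic part is added back when reconstructing the power forecast.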

**Figure 3.** Structure of the input and output of the regression problem. For each of the F physical features, B time steps in the past (a total of $BF$ predictors) are fed as input to the regression algorithm. The output simultaneously predicts the 12 time steps corresponding to a forecast interval of 1 h at 5 min time resolution. The picture shows one sample (observation), indexed $i=1,\cdots ,N$, where N is the number of samples considered (for instance, $N={N}_{\mathrm{tr}}$ in the training phase).
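The windowing described in the caption can be sketched as follows, with toy array sizes (B = 4, F = 2, and 3 output horizons instead of the paper's 12):

```python
# Build the (N, B*F) predictor matrix and (N, horizons) target matrix
# from a multivariate time series, as in Figure 3. Sizes are toy values.
import numpy as np

B, F, H = 4, 2, 3                        # lookback steps, features, horizons
series = np.arange(40.0).reshape(20, F)  # toy multivariate time series
target = np.arange(20.0)                 # toy stochastic component to predict

N = 20 - B - H + 1                       # number of usable samples
X = np.stack([series[i:i + B].ravel() for i in range(N)])  # shape (N, B*F)
Y = np.stack([target[i + B:i + B + H] for i in range(N)])  # shape (N, H)
```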

**Figure 5.** Sketch of the regression input specific to the long short-term memory (LSTM) network. Each sample is in this case composed of F sequences of B consecutive, chronologically ordered time steps. The variables ${x}_{m}^{\left(k\right)}$ are indexed so that $k=1,\cdots ,F$ spans the input features while m is the time index.
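In contrast to the flattened $BF$-vector of Figure 3, each LSTM sample is kept as a $(B, F)$ array, which is the (samples, time steps, features) layout Keras recurrent layers expect; a numpy sketch with toy sizes:

```python
# Windowing for the LSTM: each sample is a (B, F) block of consecutive
# time steps rather than a flat BF-vector. Toy sizes throughout.
import numpy as np

B, F = 4, 2
series = np.arange(40.0).reshape(20, F)  # toy multivariate time series
N = 20 - B
X_seq = np.stack([series[i:i + B] for i in range(N)])  # shape (N, B, F)
```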

**Figure 6.** Comparison of the mean square error on the training and test sets as the LSTM is trained. Overfitting can occur if training is extended for too long (**left**). A suitable early stopping strategy helps to prevent this occurrence (**right**).
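The early stopping rule can be illustrated framework-free (the paper trains with Keras; the `patience` value and loss sequence below are illustrative): training halts once the validation error has not improved for a fixed number of epochs.

```python
# Generic early-stopping criterion: stop when the validation loss has
# not improved for `patience` consecutive epochs.
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop, or None."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return None

# Validation loss that starts rising after epoch 3 (overfitting onset):
losses = [1.0, 0.8, 0.7, 0.65, 0.7, 0.8, 0.9, 1.0]
```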

**Figure 7.** Plots of ${\mathrm{RMSE}}_{j}$ with $j=1,\cdots ,12$, corresponding to forecast horizons from 5 to 60 min ahead, for $\mathsf{\Delta}{T}_{B}=2$ h (**left**) and $\mathsf{\Delta}{T}_{B}=8$ h (**right**).

**Figure 8.** Plots of ${\mathrm{MAE}}_{j}$ with $j=1,\cdots ,12$, corresponding to forecast horizons from 5 to 60 min ahead, for $\mathsf{\Delta}{T}_{B}=2$ h (**left**) and $\mathsf{\Delta}{T}_{B}=8$ h (**right**).

**Figure 9.** Plots of ${\mathrm{MBE}}_{j}$ with $j=1,\cdots ,12$, corresponding to forecast horizons from 5 to 60 min ahead, for $\mathsf{\Delta}{T}_{B}=2$ h (**left**) and $\mathsf{\Delta}{T}_{B}=8$ h (**right**). To preserve the readability of the plot, the legend is not reported; it follows the same convention as in Figure 7.

**Figure 10.** Plots of ${SS}_{j}$ with $j=1,\cdots ,12$, corresponding to forecast horizons from 5 to 60 min ahead, for $\mathsf{\Delta}{T}_{B}=2$ h (**left**) and $\mathsf{\Delta}{T}_{B}=8$ h (**right**). A dashed line at $SS=0.3$ is inserted to guide the eye.

**Figure 11.** Distributions of the residuals of the actual and predicted average power over the hour, namely ${P}_{1h}-{\widehat{P}}_{1h}$. Continuous probability density functions (PDFs) were obtained from the histograms via kernel density estimation [33] to enhance the readability of the plots. For the same reason, only two pairs of algorithms were selected for comparison. Dashed lines represent the MBE, which is the average value of the PDF. $\mathsf{\Delta}{T}_{B}=8$ h was set in all four cases.
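A hedged sketch of kernel density estimation with a Gaussian kernel, as in [33]; the residual values and the bandwidth below are illustrative:

```python
# Smooth a set of residuals into a continuous PDF with a Gaussian-kernel
# density estimate. Sample values and bandwidth are toy choices.
import numpy as np

def gaussian_kde(samples, grid, bandwidth=1.0):
    """Evaluate a Gaussian-kernel density estimate on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

residuals = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])  # toy P_1h - P_1h_hat values
grid = np.linspace(-6, 6, 241)
pdf = gaussian_kde(residuals, grid)
```

The mean of the residuals (the MBE) can then be marked on the resulting curve, as in the dashed lines of Figure 11.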

**Figure 12.** Scatter plots comparing actual (P) and predicted ($\widehat{P}$) power for different forecast horizons and algorithms. Dashed lines shifted by $\pm \mathrm{RMSE}$ from the identity line give a visual indication of the spread of the data. $\mathsf{\Delta}{T}_{B}=2$ h was selected for both algorithms.

**Figure 13.** Comparison of predictive performance on three selected days and three forecast horizons. The black line (measured) is the actual power, whereas the red line (predicted) is the power predicted 5, 30, or 60 min earlier, respectively, from left to right.

| Variable | Description | Units |
|---|---|---|
| P | Output power of the PV plant | $[\mathrm{kW}]$ |
| ${T}_{p}$ | Panel temperature | ${[}^{\circ}\mathrm{C}]$ |
| ${I}_{GHI}$ | Global horizontal irradiance | $[\mathrm{kW}/{\mathrm{m}}^{2}]$ |
| T | Ambient temperature | ${[}^{\circ}\mathrm{C}]$ |
| H | Relative humidity | $[\%]$ |
| p | Atmospheric pressure | $[\mathrm{mbar}]$ |
| ${W}_{d}$ | Wind direction | $[\mathrm{deg}]$ |
| ${W}_{s}$ | Wind speed | $[\mathrm{m}/\mathrm{s}]$ |

| | ${I}_{GHI}$ | ${T}_{p}$ | H | T | ${W}_{s}$ | p | ${W}_{d}$ |
|---|---|---|---|---|---|---|---|
| Correlation with P | 0.97 | 0.80 | −0.69 | 0.58 | 0.28 | 0.028 | 0.016 |

| Selected Physical Variables | | | | |
|---|---|---|---|---|
| ${I}_{\mathrm{stoc}}$ | ${T}_{p}$ | T | ${P}_{CS}$ | ${P}^{s}$ |

| Algorithm | Tuning | Implementation |
|---|---|---|
| Linear regression | CV and training parameter search | scikit-learn [13] |
| Lasso | CV and training parameter search | scikit-learn |
| Random Forest | CV and training parameter search | scikit-learn |
| MLP | CV and training parameter search | scikit-learn |
| KNN | CV and training parameter search | scikit-learn |
| LSTM | Manual tuning | keras [27]/TensorFlow [28] |

| Parameter | Definition | Values |
|---|---|---|
| N | Number of samples | 103,740 |
| ${N}_{\mathrm{tr}}$ | Number of training samples | 82,992 |
| ${N}_{\mathrm{te}}$ | Number of test samples | 20,748 |
| F | Number of features | 5 |

| Lookback Time $\mathsf{\Delta}{T}_{B}$ | Lookback Time Steps B |
|---|---|
| 2 h | 24 |
| 4 h | 48 |
| 8 h | 96 |

**Table 7.** Summary of cross-validation results. For each setting of $\mathsf{\Delta}{T}_{B}$, the maximum, average, and standard deviation of the cross-validation score (R${}^{2}$) are reported.

| Cross-Validation Score | Maximum | | | Average | | | Standard Deviation | | |
|---|---|---|---|---|---|---|---|---|---|
| Lookback Time | 2 h | 4 h | 8 h | 2 h | 4 h | 8 h | 2 h | 4 h | 8 h |
| Linear regression | 0.78 | 0.78 | 0.78 | 0.78 | 0.78 | 0.78 | 0.0019 | 0.0023 | 0.0008 |
| Lasso | 0.77 | 0.77 | 0.77 | 0.07 | 0.07 | 0.07 | 0.2300 | 0.2301 | 0.2301 |
| Random Forest | 0.79 | 0.78 | 0.74 | 0.78 | 0.77 | 0.74 | 0.0031 | 0.0022 | 0.0027 |
| MLP | 0.79 | 0.79 | 0.78 | 0.70 | 0.70 | 0.69 | 0.1705 | 0.1560 | 0.1402 |
| KNN | 0.64 | 0.53 | 0.40 | 0.60 | 0.48 | 0.35 | 0.0403 | 0.0509 | 0.0588 |

**Table 8.** Summary of the hourly average power output predictive performance, quantified by the metrics ${\mathrm{RMSE}}_{1\mathrm{h}}$, ${\mathrm{MAE}}_{1\mathrm{h}}$, ${\mathrm{MBE}}_{1\mathrm{h}}$, and ${\mathrm{R}}_{1\mathrm{h}}^{2}$ defined in Section 3.

| | ${\mathrm{RMSE}}_{1\mathrm{h}}$ [kW] | | | ${\mathrm{MAE}}_{1\mathrm{h}}$ [kW] | | | ${\mathrm{MBE}}_{1\mathrm{h}}$ [kW] | | | ${\mathrm{R}}_{1\mathrm{h}}^{2}$ | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lookback Time | 2 h | 4 h | 8 h | 2 h | 4 h | 8 h | 2 h | 4 h | 8 h | 2 h | 4 h | 8 h |
| Linear regression | 50.8 | 50.7 | 50.7 | 27.9 | 27.7 | 27.6 | 5.40 | 5.12 | 4.37 | 0.968 | 0.969 | 0.968 |
| Lasso | 48.4 | 48.5 | 48.5 | 25.6 | 25.6 | 25.7 | 0.20 | −0.01 | −0.02 | 0.971 | 0.971 | 0.971 |
| Random Forest | 47.0 | 47.8 | 47.9 | 22.3 | 22.8 | 23.1 | −2.77 | −3.79 | −3.24 | 0.973 | 0.972 | 0.972 |
| MLP | 46.8 | 48.8 | 48.5 | 23.0 | 25.5 | 26.5 | −2.44 | −1.10 | −4.66 | 0.973 | 0.971 | 0.971 |
| KNN | 58.5 | 61.2 | 63.7 | 29.1 | 30.7 | 31.9 | −9.62 | −10.88 | −12.94 | 0.958 | 0.954 | 0.950 |
| LSTM | 47.4 | 47.8 | 47.7 | 24.2 | 24.0 | 24.1 | −2.03 | −0.11 | −0.60 | 0.972 | 0.972 | 0.972 |
| LSTM2 | 47.8 | 47.4 | 46.9 | 23.9 | 24.6 | 24.2 | −3.84 | −5.32 | −0.40 | 0.972 | 0.972 | 0.973 |

**Table 9.** Predictive performance during the day in terms of the indicators $\mathrm{RMSE}$, $\mathrm{MAE}$, and ${\mathrm{R}}^{2}$, for the three representative days reported in Figure 13.

| | $\mathrm{RMSE}$ [kW] | | | $\mathrm{MAE}$ [kW] | | | ${\mathrm{R}}^{2}$ | | |
|---|---|---|---|---|---|---|---|---|---|
| ${f}_{h}$ [min] | 5 | 30 | 60 | 5 | 30 | 60 | 5 | 30 | 60 |
| Day 1 | 22.0 | 34.9 | 38.8 | 10.3 | 18.6 | 22.5 | 0.995 | 0.988 | 0.985 |
| Day 2 | 40.5 | 59.7 | 71.1 | 21.3 | 33.9 | 40.9 | 0.924 | 0.836 | 0.767 |
| Day 3 | 60.5 | 85.1 | 104.1 | 29.0 | 44.1 | 55.2 | 0.882 | 0.766 | 0.649 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sala, S.; Amendola, A.; Leva, S.; Mussetta, M.; Niccolai, A.; Ogliari, E.
Comparison of Data-Driven Techniques for Nowcasting Applied to an Industrial-Scale Photovoltaic Plant. *Energies* **2019**, *12*, 4520.
https://doi.org/10.3390/en12234520
