In photovoltaic (PV) systems, energy yield is essential information for stakeholders (grid operators, maintenance operators, financial units, etc.). The amount of energy a PV system produces in a given period depends on the weather conditions, including snow and dust, on the actual efficiency of the PV modules and inverters, and on balance-of-system losses. Energy yield can be estimated with empirical models, provided that accurate input data are available. However, most PV systems lack on-site high-class instruments for measuring irradiance and other weather conditions. For this reason, reanalysis-based and satellite-based data are currently of significant interest in the PV community; combining these data with irradiance decomposition and transposition models makes it possible to determine the actual plane-of-array operating conditions. In this paper, we propose an efficient and accurate approach to modelling PV output energy that combines a new data filtering procedure with a fast machine learning algorithm, the Light Gradient Boosting Machine (LightGBM). The applicability of the procedure is demonstrated at three levels of irradiance data accuracy (low, medium, and high), depending on the data source or modelling used. A new filtering algorithm is proposed to exclude data that are erroneous because of system failures or unrepresentative conditions (e.g., shading, partial snow coverage, reflections, or soiling). The cleaned data are then used to train three empirical models and three machine learning approaches, with emphasis on the advantages of LightGBM. The experiments are carried out on a 17 kW rooftop PV system installed in Ljubljana, Slovenia, in a temperate climate zone.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.