Photovoltaic Energy Production Forecasting in a Short Term Horizon: Comparison between Analytical and Machine Learning Models

Etxegarai, Garazi; Zapirain, Irati; Camblong, Haritza; Ugartemendia, Juanjo; Hernandez, Juan; Curea, Octavian

doi:10.3390/app122312171


by Garazi Etxegarai ^1,2,*, Irati Zapirain ^1,2, Haritza Camblong ^1, Juanjo Ugartemendia ^3, Juan Hernandez ^1 and Octavian Curea ^2

^1 Department of Systems Engineering and Control, Faculty of Engineering of Gipuzkoa, University of the Basque Country (UPV/EHU), Plaza de Europa 1, 20018 Donostia-San Sebastian, Spain

^2 Estia Institute of Technology, University of Bordeaux, F-64210 Bidart, France

^3 Department of Electrical Engineering, Faculty of Engineering of Gipuzkoa, University of the Basque Country (UPV/EHU), Plaza de Europa 1, 20018 Donostia-San Sebastian, Spain

^* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(23), 12171; https://doi.org/10.3390/app122312171

Submission received: 14 October 2022 / Revised: 10 November 2022 / Accepted: 16 November 2022 / Published: 28 November 2022



Featured Application

A potential application of this work is to use the PV generation prediction model within an EMS, with the aim of increasing the self-consumption ratio and reducing energy consumption as far as possible.

Abstract

The existing trend towards increased penetration of renewable energies in the traditional grid, together with the intermittent nature of the weather conditions on which these energy sources depend, makes the development of tools for forecasting renewable energy production more necessary than ever. Likewise, the prediction of the energy generated in these renewable production plants is key to the implementation of efficient Energy Management Systems (EMS) in buildings. These systems aim both to increase the energy efficiency of the building itself and to encourage self-consumption or, where appropriate, collective self-consumption (CSC). This paper presents a comparison between four different models, the first being an analytical model and the remaining three being machine learning (ML) based models. All of them forecast the photovoltaic (PV) production curve for the next day. In order to validate these models, a case study of a PV system installed on the roof of a university building located in Bidart (France) is proposed. The model that most accurately forecasts the PV production during July 2021 is the support vector regression (SVR), which has a mean R² of 0.934 for July, being 0.97 on sunny days and 0.85 on cloudy ones. This is an improvement of 5.14%, 4.07%, and 4.18% over the nonlinear autoregressive with exogenous inputs (NARX), feedforward neural network (FFNN), and analytical models, respectively.

Keywords:

PV production forecasting; artificial intelligence; machine learning; feedforward neural network; support vector regression; nonlinear autoregressive exogenous; OpenModelica; analytical model

1. Introduction

The energy transition is one of the great challenges of our society at the beginning of this millennium. In addition, the latest international socio-political events have exacerbated the energy crisis, evidencing the need to accelerate this transition. The electrical grid, as it has been known so far, is changing. It is evolving from a centralized to a more decentralized layout [1], where priority is given to the consumption of energy coming from renewable energy sources. It has been shown that restructuring the electrical grid into local micro-grids makes it possible to integrate a greater amount of energy from renewable sources [2,3]. In particular, given its great potential for installation on building roofs, photovoltaic (PV) energy is becoming the most widespread one [4]. In this context, due to the increasing price of electricity and government subsidies for financing PV installations, these facilities are paying for themselves more quickly than ever, especially in the frame of self-consumption. Self-consumption or collective self-consumption (CSC), as its name suggests, is related to the consumption of local electricity production. In the Spanish state, according to data recorded by the Spanish Photovoltaic Union (UNEF), 1203 MW of new PV capacity was installed in 2021 in self-consumption facilities. This figure represents an increase of 101.8% compared to 2020, when 596 MW were commissioned [5].

This research study has been developed in the framework of the EKATE project, an Interreg POCTEFA program. EKATE is a project for PV electricity management and CSC in the France-Spain cross-border area, using Blockchain and Internet of Things (IoT) technologies [6]. One of the pilot projects being developed within the framework of EKATE takes place in the Izarbel technology park in Bidart (France), involving the buildings of ESTIA Technology Institute. This Izarbel pilot project aims to implement innovative energy management in buildings in a CSC operation. In order to maximize the self-consumption rate and, as far as possible, energy efficiency, two types of energy management systems (EMS) are being designed and developed to be applied in the ESTIA2 building. To achieve the aforementioned objectives, these EMSs act on flexible loads (FL), which is known as demand side management (DSM) or demand response (DR). For that, two types of FL have been considered: the Heating, Ventilation, and Air Conditioning (HVAC) system and the energy consumption behavior of the building users. Regarding the two EMSs designed:

(1) The first one is based on simple logic rules that act on the ON/OFF status of the internal HVAC units and/or on the temperature setpoints of the system and also influences the ESTIA2 user’s behavior in real-time according to the instantaneous surplus of PV energy.

(2) The second one is an intelligent EMS based on predictive models of ESTIA2 consumption and ESTIA1 PV production, a thermal model of ESTIA2 and HVAC, and an optimization algorithm.

This work complements a previously developed one, as shown in [7], where three prediction models based on artificial intelligence (AI) are developed to predict the consumption of the ESTIA2 building.

The prediction of energy generation from renewable sources is not a new challenge. Different strategies can be found in the literature. Some works choose to predict the meteorological variables that influence energy production, such as solar radiation, temperature, or the clear sky index [8,9]. Afterward, in some cases, they use equations to calculate the corresponding energy generation, whilst other works propose to predict the PV generation directly [10,11].

Another possible criterion for classifying prediction techniques is the type of model used. They are generally divided into three types: physical or analytical methods, recurrent methods, and AI-based methods. A physical method describes the atmospheric dynamics and physical states by a set of mathematical equations. This kind of model was the first one used for the forecasting of meteorological variables and PV generation. They are more trustworthy in long-term forecasts, when weather conditions are more stable, as they do not behave well in the face of sudden changes. A drawback of physical techniques is that a thorough knowledge of physics is essential. Among the best-known methods are numerical weather prediction (NWP) [12] and sky imagery [13]. As for recurrent methods, they have been the most widely used for time series forecasting for several years. Nevertheless, recurrent models can have problems dealing with non-linearity and seasonality [14]. Finally, AI methods, contrary to recurrent methods, are able to handle non-linear problems; thanks to this ability and to learning algorithms, AI methods can provide precise predictions and react to sudden meteorological changes. The researchers of [15] provide an extensive review of AI-based solar energy prediction. They conclude that the most frequently applied techniques are artificial neural networks (ANN), followed by support vector machine (SVM) techniques.

Finally, PV generation predictors are also classified by time horizon, that is, the length of time over which the forecast is made. In the literature, they are usually divided into three categories [16]: (1) Short-term forecasting covers horizons between a few minutes and a week. Its function is to schedule energy transfer, demand response, and economic dispatch of load [17]. (2) Medium-term forecasting typically covers between one month and one year and is used for upcoming energy planning. (3) Long-term forecasting usually covers horizons longer than one year and is typically used to plan power plants to meet future needs and cost efficiency [18].

This article presents four short-term PV generation prediction models. One of them is an analytical model, and the other three are AI-based models. The overall objective of this paper is to design and compare different prediction models, with the goal of obtaining an accurate model that can be integrated into an EMS. A detailed explanation of how the models have been developed is given. The results are analyzed in two ways. On the one hand, the general behavior of the four models during one month is studied and compared to real data. On the other hand, the behavior of the models is analyzed for different types of days, differentiating between sunny and cloudy days.

The remaining sections of the current article are organized as follows: Section 2 describes the case study for which the four PV energy generation forecasters have been implemented. Afterward, Section 3 explains the theory behind the techniques used to build the models. The development of the four models is described in Section 4. Section 5 presents the results obtained. Finally, Section 6 discusses the obtained conclusions.

2. Case Study

This section describes the real case study in which the models are implemented. The first sub-section describes the characteristics of the site and the buildings that take part in the CSC. The second sub-section presents the features and the source of the data used in the development of the four models.

2.1. Izarbel Pilot Project Description

As previously mentioned, this pilot project has been carried out at the Izarbel technology park in Bidart, France. The PV energy CSC demonstrator is composed of three buildings, which are managed by the ESTIA Institute of Technology. Figure 1 shows the buildings participating in the CSC.

Within the framework of the EKATE project, a total of 286 kWp of PV panels will be installed in these three buildings. Initially, the circular part of the ESTIA1 building will be equipped with an installation of 117.17 kWp. The rest of the PV installation will be carried out at a later stage in the ESTIA2 and ESTIA4 buildings. It should be noted that the 117.17 kWp installation has not yet been completed. Therefore, data from a 2004 PV installation of 5.6 kWp capacity have been used to develop the PV generation prediction models. The 5.6 kWp installation is not part of the CSC and is located on the circular part of the roof of the ESTIA1 building, with a southeast orientation and a slope of 20%.

2.2. Used Data Set

PV production data are collected via a Linky smart meter and are stored on a server in the cloud. These data started to be registered in April 2021 and are recorded every 30 min, i.e., 48 data points per day. Figure 2 shows the PV production for 15 days in June. The production pattern is as expected: null generation at night and a bell-shaped production during the day, with the maximum generation occurring at noon. Figure 2 shows two types of days: sunny days, where the production is smoother and follows the characteristic bell shape, and cloudier days, where the production is more irregular and saw-toothed. The July PV production data have been forecasted by all four models.

Some meteorological data related to PV production have also been used in the development of the four models: irradiation, temperature, wind speed, and wind direction. These data have been obtained from the Météo France weather station. The closest station to the PV installation is the one located at Biarritz airport, approximately 3 km from the ESTIA1 building. The downloaded historical measured data have been recorded every hour, i.e., 24 data points per day.

Since data quality, and thus data pre-processing, is one of the most important matters when developing data-based models, all the data have been analyzed, and various filters have been applied to the collected database. Firstly, all data (PV production and meteorological data) have been analyzed to detect possible outliers and repair them by interpolation. Duplicate data have been removed, and missing data points have been repaired by interpolation. Concerning solar irradiance, as mentioned above, the station that collects these data is located at an airport, so powerful light sources used near the station can alter the measured values of solar irradiance, especially at night when the lights are in operation. In order to remedy this potential problem, the sunrise and sunset times have been checked to determine a night time slot in which all solar irradiance values are replaced by zero. For the month of July, this slot has been set between 22:00 and 06:30. Moreover, it is worth mentioning that the same filter is applied in the post-processing of the data once the prediction is made.
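As an illustration only, the night-time filter and the repair of missing points by interpolation could be sketched as follows. The function names, the use of `None` to mark missing samples, and the choice to treat both ends of the 22:00 to 06:30 slot as inclusive are assumptions made for this sketch; they are not taken from the authors' implementation.

```python
from datetime import time

NIGHT_START = time(22, 0)  # July night slot start, as stated in the text
NIGHT_END = time(6, 30)    # July night slot end, as stated in the text


def is_night(t):
    """True if t lies in the night slot, which wraps around midnight.
    Assumption: both slot boundaries are treated as night."""
    return t >= NIGHT_START or t <= NIGHT_END


def zero_night_irradiance(samples):
    """samples: list of (datetime.time, irradiance) pairs.
    Returns a copy where night-time irradiance is forced to zero."""
    return [(t, 0.0 if is_night(t) else g) for t, g in samples]


def fill_gaps(values):
    """Linearly interpolate None entries lying between two known values.
    Assumption: leading/trailing gaps are left as None (no extrapolation)."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            out[i] = out[a] + frac * (out[b] - out[a])
    return out
```

The same `zero_night_irradiance`-style filter could be applied to the predicted series, mirroring the post-processing step described above.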

Furthermore, since wind reduces the temperature of the PV cell, it has been proposed to use two different vectors representing the wind. On the one hand, the wind coming from all directions will be used. On the other hand, it has been proposed to consider only the wind that directly hits the PV panels. Figure 3 shows how the panels are oriented to the southeast, at 145° with respect to the north. As mentioned above, the roof on which the panels are located is slanted, so the wind coming from the back of the panels does not affect them. Therefore, by means of a weighting system, the wind coming from between 55° and 235° has been multiplied by 1, and the wind coming from the rest of the angles has been multiplied by 0. The direction-weighted wind speed has been used in the AI models.
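The weighting described above amounts to a binary mask on the wind direction. A minimal sketch follows; the function names are hypothetical, and it is assumed that the direction is given in degrees clockwise from north as the direction the wind comes from, and that the 55° and 235° bounds are inclusive.

```python
def direction_weight(direction_deg, low=55.0, high=235.0):
    """Weight 1 for wind coming from the sector [low, high], 0 otherwise.
    The sector 55..235 degrees is the panel azimuth (145) plus/minus 90."""
    d = direction_deg % 360.0  # wrap angles such as 415 back into [0, 360)
    return 1.0 if low <= d <= high else 0.0


def weighted_wind_speed(speed, direction_deg):
    """Direction-weighted wind speed: the speed is kept only when the
    wind hits the front of the panels."""
    return speed * direction_weight(direction_deg)
```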

Finally, all data have been set to the same sampling time. It has been decided to make the predictions every 30 min. Therefore, the hourly recorded meteorological data have been interpolated to obtain data every 30 min. Furthermore, all data have been normalized to the range between 0 and 1. Normalization helps in the training period of the models. Indeed, if the ranges of the variables were very different, a small learning rate would have to be used, increasing the training time. On the other hand, if the range of all values is the same, the model can use an appropriate learning rate for all data and therefore reduce the training time.
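A minimal sketch of these two steps is given below, assuming linear interpolation between consecutive hourly values and min-max scaling. The text does not specify the interpolation or normalization formulas, so both choices, as well as the function names, are assumptions.

```python
def upsample_hourly_to_30min(hourly):
    """Insert the midpoint between consecutive hourly values
    (linear interpolation at the half hour)."""
    if not hourly:
        return []
    out = []
    for a, b in zip(hourly, hourly[1:]):
        out.extend([a, (a + b) / 2.0])
    out.append(hourly[-1])
    return out


def min_max_normalize(values):
    """Scale values linearly so that min -> 0 and max -> 1.
    Assumption: a constant series is mapped to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Note that n hourly values yield 2n - 1 half-hourly values with this scheme, so obtaining all 48 points of a day requires the first hourly value of the following day.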

3. PV Generation Forecaster Models

This section presents the theoretical foundations of the models that have been developed to predict PV generation. The first described model is an analytical model developed in the OpenModelica software. The other three models are AI techniques, namely a feedforward neural network (FFNN), a nonlinear autoregressive with exogenous inputs (NARX) neural network, and a support vector regression (SVR).

3.1. Analytical Model

Analytical models are mathematical models that can be applied to address various working conditions, thanks to some assumptions that are made about the way a process evolves. The strength of the analytical model is that it provides a generic way of obtaining results for various conditions using a mathematical formulation. One of the disadvantages of analytical models is that they are often very difficult to obtain mathematically. Therefore, the accuracy of the model will depend on the validity of the assumptions made during the mathematical formulation. Analytical models can be further classified as static or dynamic. A static model represents properties of a system that are independent of time or true for any point in time. A dynamic model is an analytical model that represents the time-varying state of the system, such as its acceleration, velocity, and position as a function of time [19].

The Modelica language has been chosen to implement the PV generation model. Modelica aims at acausal (non-causal) modeling of systems involving various physical domains by expressing them in the form of ordinary differential and algebraic equations [20]. The model has been developed with the open-source software OpenModelica.

3.2. Artificial Intelligence Models

3.2.1. Feed Forward Neural Network

An FFNN is the simplest model of an ANN. It therefore also illustrates the definition of an ANN, whose main characteristic is the neuron. As can be seen in Figure 4, an FFNN is composed of three layers: an input layer, a hidden layer, and an output layer. These layers can have different numbers of neurons. The numbers of neurons in the input and output layers are the same as the numbers of input and output variables, respectively. In contrast, the number of neurons in the hidden layer has to be adjusted depending on the complexity of the problem to be solved in order to achieve the most accurate prediction possible.

These neurons are connected by adjustable weights. The initial weights, together with the biases, are modified during the learning process to minimize the cost function of the network. The mathematical expression of a cell is as follows [21]:

$$u_{k} = \sum_{j = 1}^{n} w_{k j} \cdot x_{j} + b_{k}, \tag{1}$$

$$y_{k} = \varphi(u_{k}), \tag{2}$$

where $x_{j}$ is the input vector of the cell, $w_{k j}$ is the weights matrix, $b_{k}$ is the bias vector, $u_{k}$ is the output before the activation function, $\varphi$ is the activation function, and $y_{k}$ is the output of the cell. $k$ is the number of the cell of the hidden layer, and $n$ is the number of inputs. More information about FFNNs can be found in [22].
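Equations (1) and (2) can be illustrated with a small forward pass. This is only a sketch: the sigmoid hidden activation, the linear output layer, and all function names are assumptions chosen for the example, not details of the network trained in this work.

```python
import math


def sigmoid(u):
    """Logistic activation, used here as an example of the function phi."""
    return 1.0 / (1.0 + math.exp(-u))


def layer_forward(x, W, b, phi):
    """Apply Eq. (1) then Eq. (2) to every cell k of a layer:
    u_k = sum_j w_kj * x_j + b_k, and y_k = phi(u_k)."""
    return [
        phi(sum(w_kj * x_j for w_kj, x_j in zip(row, x)) + b_k)
        for row, b_k in zip(W, b)
    ]


def ffnn_forward(x, W_hidden, b_hidden, W_out, b_out):
    """One hidden layer (sigmoid) followed by a linear output layer."""
    h = layer_forward(x, W_hidden, b_hidden, sigmoid)
    return layer_forward(h, W_out, b_out, lambda u: u)
```

For instance, `ffnn_forward([1.0, 2.0], [[0.5, -0.25]], [0.0], [[2.0]], [1.0])` evaluates a network with two inputs, one hidden neuron, and one output.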

3.2.2. Non-Linear Autoregressive with Exogenous Input Neural Network

NARX neural networks are considered a type of recurrent neural network (RNN) that have been widely used in the literature for time series prediction due to, on the one hand, their easy implementation and, on the other hand, their fast-training procedures. As can be seen in Figure 5, the predictions carried out by dynamic neural networks, such as NARX, are driven by the historical input-output pairs, as well as by the previous states of the network, that is to say, by the input and feedback delays.

In Figure 5, X and Y represent the input and output vectors, respectively, $W_{i}^{h1}$ is the weight matrix, $b_{hi}$ is the bias, and finally, the TDL block represents the tapped-delay lines, that is to say, the number of time delay steps applied to the input and the feedback (output).

NARX neural networks are based on a Multi-Layer Perceptron (MLP) structure, which consists of an input layer, hidden layer, and output layer that are connected by adjustable weights, and the neurons that compose the hidden and output layer are associated with bias values [24]. The weights and biases are adjusted during the training process of the network, aiming to achieve their optimal values and make the best approach between the input and output of the network.

In each layer, each neuron computes the scalar product of the input vector $x_{j}$ and the weight matrix $w_{i j}$. The activation function ($\varphi$) is then applied, obtaining the following equation at the output of each neuron:

$$y_{i} = \varphi\left(\sum_{j = 1}^{n} x_{j} \cdot w_{i j}\right) \tag{3}$$

The activation function that is chosen for each layer may change depending on the application in which the neural network is used. Usually, the activation function applied in the input and hidden layer is the sigmoid, and the one used in the output layer is the linear function.

The following equation shows the input-output relationship using a NARX:

$$\hat{y}^{t + 1} = f\left(x_{1}^{t}, x_{1}^{t - 1}, x_{1}^{t - 2}, \dots, x_{p}^{t - D_{x_{p}}}, y^{t}, y^{t - 1}, y^{t - 2}, \dots, y^{t - D_{y}}\right), \tag{4}$$

where $\hat{y}^{t + 1}$ is the future value of the target variable, $p$ is the total number of exogenous inputs, $D_{x_{p}}$ is the time lag of each exogenous input $x_{p}$, and $D_{y}$ is the time lag of the historical targeted values $(y^{t}, y^{t - 1}, y^{t - 2}, \dots, y^{t - D_{y}})$.

The number of time delay steps of the output, $D_{y}$, is what gives recurrence to the NARX, in contrast to the structures of other RNNs, in which the recurrence is given by the internal state of the network [23]. Additionally, $f$ is the non-linear mapping function performed by the MLP. The MLP is a powerful structure, very appropriate for learning any kind of nonlinear mapping [25].
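To make the argument list of Equation (4) concrete, the following sketch assembles the lagged input vector for one time index. The function name and the data layout (one Python list per exogenous series, with index t meaning time step t) are assumptions made for illustration.

```python
def narx_regressor(x_series, y_series, t, d_x, d_y):
    """Build the arguments of f in Eq. (4) at time index t.

    x_series: list of exogenous series, one list per input x_p
    y_series: series of past target values
    d_x:      list with the time lag D_x_p of each exogenous input
    d_y:      time lag D_y of the target
    """
    if t < max(max(d_x), d_y):
        raise ValueError("not enough history for the requested lags")
    features = []
    for x_p, lag in zip(x_series, d_x):
        # x_p^t, x_p^(t-1), ..., x_p^(t - D_x_p)
        features.extend(x_p[t - k] for k in range(lag + 1))
    # y^t, y^(t-1), ..., y^(t - D_y)
    features.extend(y_series[t - k] for k in range(d_y + 1))
    return features
```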

First, the network is trained in an open-loop (series-parallel) architecture (see Figure 6a), which is possible whenever the historical data of the targeted output are available. The network can be trained using different types of training algorithms, with the Levenberg-Marquardt backpropagation algorithm being the most widely used. After the training is concluded, the loop is closed (see Figure 6b), and this time the estimations fed back from the output are introduced to the model. In this second phase, also called simulation or operational mode, the step-ahead forecasting is carried out.
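The closed-loop phase can be sketched as a rollout in which each prediction is pushed back into the delay line of the output. Here `f` stands for any already trained one-step predictor; its signature (current exogenous input plus a list of lagged outputs, most recent first) is a simplification assumed for this example.

```python
def closed_loop_forecast(f, x_future, y_history, d_y, steps):
    """Roll a one-step model forward, feeding back its own estimates.

    f:         hypothetical trained predictor, called as f(x, y_lags)
    x_future:  exogenous input for each forecast step
    y_history: measured outputs available before the forecast starts
    d_y:       output time lag, so d_y + 1 past outputs are kept
    """
    y_buf = list(y_history[-(d_y + 1):])  # oldest first
    preds = []
    for s in range(steps):
        y_lags = y_buf[::-1]              # y^t, y^(t-1), ...
        y_next = f(x_future[s], y_lags)
        preds.append(y_next)
        y_buf = y_buf[1:] + [y_next]      # estimate replaces measurement
    return preds
```

In the open-loop phase, by contrast, `y_buf` would be refreshed with measured values at every step rather than with `y_next`.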

3.2.3. Support Vector Regression

SVM is a supervised machine learning method for function estimation [26]. SVM is mostly used in classification problems but is suitable for regression tasks as well. A version of SVM for regression is referred to as SVR.

Suppose that we are given training data $\{(x_{1}, y_{1}), \dots, (x_{l}, y_{l})\}$, where $l$ is the number of samples in the training set, $x_{i} \in \mathbb{R}^{n}$ is an input vector, and $y_{i} \in \mathbb{R}$ is the corresponding target value. In SVR, the basic idea is to map the input $x$ into a high dimensional feature space $F$ via a non-linear mapping function and do linear regression in this space. Thus, the linear regression in a high dimensional (feature) space corresponds to non-linear regression in a low dimensional input space [27]. SVR approximates the regression function as follows:

$$f(x) = \langle \omega, \Phi(x) \rangle + b, \tag{5}$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product in $F$, $\omega$ is a vector of weight coefficients, $\Phi(x)$ is the non-linear mapping function, and $b$ denotes the bias constant. The common formulation of SVR is Vapnik’s $\varepsilon$-SVR.

In $\varepsilon$-SVR the goal is to find the $f(x)$ that has at most $\varepsilon$ deviation from the actual targets $y_{i}$ for all the training data. We can write this problem as a convex optimization problem where the coefficients $\omega$ and $b$ can be obtained by the following formula:

$$\text{minimise: } \frac{1}{2} \| \omega \|^{2} + C \sum_{i = 1}^{l} (\xi_{i} + \xi_{i}^{*}), \quad \text{subject to: } \begin{cases} y_{i} - \langle \omega, \Phi(x_{i}) \rangle - b \leq \varepsilon + \xi_{i} \\ \langle \omega, \Phi(x_{i}) \rangle + b - y_{i} \leq \varepsilon + \xi_{i}^{*} \\ \xi_{i}, \xi_{i}^{*} \geq 0 \end{cases} \tag{6}$$

where $\xi_{i}$ and $\xi_{i}^{*}$ are slack variables, the constant $C$ determines the amount up to which deviations larger than $\varepsilon$ are tolerated, and $\varepsilon$ is the margin of tolerance. This optimization problem (6) is a quadratic programming problem, and in most cases, it can be solved more easily in its dual formulation [28]. In the case of $\varepsilon$-SVR, the support vectors are the training samples that lie on the $\varepsilon$-tube bounding the decision surface, as illustrated in Figure 7.
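The role of the slack variables in problem (6) can be illustrated numerically. For a fixed regression function, the smallest slacks that satisfy the constraints are the parts of the residual that exceed the tolerance, which the sketch below computes. This only evaluates the objective for given predictions; it does not solve the quadratic program, and the function names are hypothetical.

```python
def slack_variables(y, f_x, eps):
    """Smallest (xi, xi*) satisfying the constraints of Eq. (6) for one
    sample with target y and model output f_x = <w, Phi(x)> + b."""
    xi = max(0.0, y - f_x - eps)       # from y - f(x) <= eps + xi
    xi_star = max(0.0, f_x - y - eps)  # from f(x) - y <= eps + xi*
    return xi, xi_star


def svr_objective(w_norm_sq, targets, predictions, eps, C):
    """Value of 0.5 * ||w||^2 + C * sum(xi_i + xi_i*) for given outputs."""
    penalty = sum(
        sum(slack_variables(y, f_x, eps))
        for y, f_x in zip(targets, predictions)
    )
    return 0.5 * w_norm_sq + C * penalty
```

Samples whose residual stays within the tolerance contribute nothing to the penalty term, which is why only a subset of the training samples ends up shaping the solution.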

As noted in the previous definitions, the algorithm only depends on dot products between vectors $x_{i}$. Because of this, it is enough to know $k(x, x') = \langle \Phi(x), \Phi(x') \rangle$ rather than $\Phi$ explicitly. $k(x, x')$ is known as the kernel function of the SVR model. The radial basis function (RBF) kernel is used in this study, expressed as:

$$k(x, x') = e^{- \gamma \| x - x' \|^{2}}, \tag{7}$$

where $\gamma$ defines the influence of the support vectors selected by the model.
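Equation (7) is straightforward to evaluate directly. The short sketch below, with a hypothetical function name, shows how the kernel value decays with the squared distance between two input vectors and how $\gamma$ controls that decay.

```python
import math


def rbf_kernel(x, x_prime, gamma):
    """RBF kernel of Eq. (7): exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)
```

Identical vectors give a kernel value of 1, and a larger `gamma` makes the value drop faster as the vectors move apart, so each support vector influences a narrower region of the input space.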

To summarize the models, the following lines present the advantages and disadvantages of each of them.

Starting with the analytical model, as mentioned above, one of the main disadvantages of the model is that it is difficult to obtain the mathematical definition that describes the system to be modeled. For the same reason, it is necessary to have a high level of knowledge of the physical functioning of the system to be simulated. On the other hand, a high level of knowledge makes it more likely that all the aspects involved in the system will be considered and, therefore, that a generic model providing good results under different conditions will be obtained.

As for the ML-type models, the main characteristic that differentiates them from the analytical model is that it is not necessary to have a detailed knowledge of the operation of the system to be modeled.

One of the major advantages of SVR is that it can obtain good results with a small dataset [30], so it is not necessary to have a large training set. As for disadvantages, SVRs have a high dependency on hyper-parameters, and the selection of these parameters determines the prediction performance of the model. Therefore, the model must be regularly adjusted to fit the characteristics of the input data to maintain a good generalization. Consequently, it requires a lot of adjustment time.

Next, the FFNN and NARX models (both ANNs) have the advantage that they are very simple models, so they are very easy to implement. In addition, the NARX model learns easily when the system has very large non-linearities, and it is especially efficient at predicting time-series systems. However, the vanishing and exploding gradient problems appear in the vast majority of RNNs, and NARX is no exception. These problems can be clearly seen when the information of past inputs must be recovered. Because of the vanishing gradient problem, the weights are updated less and less, which limits the memory capacity. Nevertheless, several solutions have already been applied to avoid this problem [23].

Finally, since the FFNN is the simplest ANN model, its design and implementation are easy, as very few hyper-parameters need to be adjusted. For this reason, it is easy to obtain a general model that provides good results with little adjustment time. In contrast, it presents some difficulties when dealing with problems with large non-linearities and complex systems.

3.3. Error Metrics

In order to understand and evaluate the operation of any model, it is necessary to establish certain metrics to calculate the error of the model’s results. In this work, the developed models are assessed using three different metrics, which are widely used in the literature for accuracy evaluation purposes.

So, the models designed to forecast the day-ahead PV production curve are assessed by calculation of MAE (Mean Absolute Error), RMSE (Root Mean Square Error), and R² (coefficient of determination).

All three metrics show great potential for comparing the operation of different models, which is of vital importance in this work.

The equations for each of them are shown below [31].

$$MAE = \frac{1}{N} \sum_{j = 1}^{N} |y_{j} - t_{j}|, \tag{8}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{j = 1}^{N} (y_{j} - t_{j})^{2}}, \tag{9}$$

$$R^{2} = \left[\frac{\sum_{j = 1}^{N} (y_{j} - \bar{y}) \cdot (t_{j} - \bar{t})}{\sqrt{\sum_{j = 1}^{N} (y_{j} - \bar{y})^{2}} \cdot \sqrt{\sum_{j = 1}^{N} (t_{j} - \bar{t})^{2}}}\right]^{2}, \tag{10}$$

where $y_{j}$ and $t_{j}$ are the measured and predicted values, respectively, in this case of the PV production, and $\bar{y}$ and $\bar{t}$ are the mean values of both. $N$ represents the number of test samples used.

With the MAE, the average forecast error of the model results is evaluated, with all errors weighted uniformly. The RMSE measures the general accuracy of the model. Large deviations are the errors that are most desirable to identify, and the RMSE, which squares each error before averaging, is especially sensitive to them. While both of these metrics can range in value from 0 to infinity, the coefficient of determination, R², takes values between 0 and 1, so that the assessment of model prediction accuracy becomes more intuitive. R² reflects the goodness of fit of a model to the variable it seeks to explain.
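The three metrics can be computed directly from Equations (8)–(10), as in the sketch below. Note that Equation (10) defines R² as the squared correlation coefficient between measured and predicted values, which in general differs from the alternative definition 1 − SS_res/SS_tot. The function names are hypothetical, and the code assumes non-empty, equal-length, non-constant series.

```python
import math


def mae(y, t):
    """Mean absolute error, Eq. (8)."""
    return sum(abs(a - b) for a, b in zip(y, t)) / len(y)


def rmse(y, t):
    """Root mean square error, Eq. (9)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, t)) / len(y))


def r2(y, t):
    """Coefficient of determination as defined in Eq. (10):
    the squared correlation between measured y and predicted t.
    Assumption: neither series is constant (non-zero denominators)."""
    mean_y = sum(y) / len(y)
    mean_t = sum(t) / len(t)
    cov = sum((a - mean_y) * (b - mean_t) for a, b in zip(y, t))
    dev_y = math.sqrt(sum((a - mean_y) ** 2 for a in y))
    dev_t = math.sqrt(sum((b - mean_t) ** 2 for b in t))
    return (cov / (dev_y * dev_t)) ** 2
```

One consequence of this definition is that a prediction that is a scaled copy of the measurement still reaches R² = 1, so MAE and RMSE are needed alongside it to reveal amplitude errors.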

4. Forecasters Development

4.1. Development of the Analytical Model

This section presents the development of the analytical model. As mentioned above, the open-source software OpenModelica has been used to implement it. The libraries used were PVSystems and PhotoVoltaics [32]. Figure 8 shows the ESTIA1 building PV installation as a block diagram. The model has the classical components of a PV system: a DC/DC converter, the PV panels, an MPPT tracker, and a voltage source representing the grid side.

In order to characterize the PV cells, the most common equivalent circuit has been used, which is the single diode model, shown in Figure 9. The cells are gathered together to create PV modules, which are then connected in series or in parallel to form a PV installation. From the equivalent circuit, the following mathematical equations are obtained, which characterize the PV cell:

$$i = I_{ph} - I_{d} - I_{r}, \tag{11}$$

$$I_{ph} = \frac{(I_{sc} + K_{i} \cdot (T - T_{n})) \cdot G}{G_{n}}, \tag{12}$$

$$I_{d} = I_{0} \cdot \left\{\exp\left(\frac{v - R_{s} \cdot i}{\alpha \cdot V_{t}}\right) - 1\right\}, \tag{13}$$

$$I_{r} = \frac{v - R_{s} \cdot i}{R_{sh}}, \tag{14}$$

$$V_{t} = \frac{N_{s} \cdot k \cdot T}{q}, \tag{15}$$

$$I_{0} = \frac{I_{sc} + K_{i} \cdot (T - T_{n})}{\exp\left(\frac{V_{oc} + K_{v} \cdot (T - T_{n})}{\alpha \cdot V_{t}}\right) - 1}, \tag{16}$$

where $i$ and $v$ are the current and voltage, $I_{ph}$ is the cell photo-generated current, $I_{d}$ corresponds to the diode current, $I_{r}$ is the resistance current, $V_{t}$ is the thermal voltage, $I_{sc}$ and $V_{oc}$ are the short circuit current and open circuit voltage, respectively, $N_{s}$ is the number of cells connected in series, $k$ is the Boltzmann constant, $q$ is the electronic charge, $K_{i}$ and $K_{v}$ are the $I_{sc}$ and $V_{oc}$ temperature coefficients, respectively, $\alpha$ is the diode ideality constant, and finally, $R_{s}$ and $R_{sh}$ are the equivalent series resistance and equivalent parallel resistance, respectively.
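As a worked illustration of Equations (12), (15), and (16), the sketch below evaluates the photo-generated current, the thermal voltage, and the diode saturation current. The function names are hypothetical, and the numerical values in the usage note are invented example inputs, not the datasheet values of the ESTIA1 panels. Temperatures are assumed to be in kelvin, and G and G_n are assumed to share the same irradiance unit.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant k, J/K
Q_ELECTRON = 1.602176634e-19  # electronic charge q, C


def thermal_voltage(n_s, temp_k):
    """Eq. (15): V_t = N_s * k * T / q."""
    return n_s * K_BOLTZMANN * temp_k / Q_ELECTRON


def photo_current(i_sc, k_i, temp_k, t_n, g, g_n):
    """Eq. (12): I_ph = (I_sc + K_i * (T - T_n)) * G / G_n."""
    return (i_sc + k_i * (temp_k - t_n)) * g / g_n


def saturation_current(i_sc, v_oc, k_i, k_v, temp_k, t_n, alpha, n_s):
    """Eq. (16): diode saturation current I_0."""
    v_t = thermal_voltage(n_s, temp_k)
    numerator = i_sc + k_i * (temp_k - t_n)
    exponent = (v_oc + k_v * (temp_k - t_n)) / (alpha * v_t)
    return numerator / (math.exp(exponent) - 1.0)
```

For example, with the invented values `i_sc=8.0`, `v_oc=37.0`, `n_s=60`, `alpha=1.3`, and `temp_k = t_n = 298.15`, `photo_current` at half the reference irradiance returns half of `i_sc`, and `saturation_current` is many orders of magnitude smaller than `i_sc`.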

Most of the parameters that appear in Equations (11)–(16) are provided by the PV panel supplier in the datasheet; however, some of them are rarely provided. This is the case for $I_{ph}$, $I_{0}$, $R_{s}$, $R_{sh}$, and $\alpha$. In order to obtain a model as accurate as possible, the steps proposed in [33] have been followed, in which a simple method to extract the parameters of the single diode model of a PV system is developed.

Finally, this model uses the measured historical data of solar irradiance, ambient temperature, and wind speed to obtain the results.

4.2. Development of the Artificial Intelligence Models

For the development of the AI-based models, the methodology summarised in Figure 10 has been followed. It is divided into two main stages. Firstly, in step 1, once the data have been properly pre-processed, as explained in Section 2 (which is represented by step 0), different tests are carried out to determine which data best characterize the output and, therefore, which ones will be used as inputs for the networks. Secondly, once the input vector is defined, the hyper-parameters of each model are adjusted in order to obtain a network that best fits the problem to be solved.

4.2.1. Data Analysis and Input Parameter Selection

Given that these models learn from the data provided to them, it is very important to use input data that characterize the output, i.e., PV production. It is also important that these data are of good quality. Therefore, using the four different types of data available (solar irradiance, temperature, wind speed, and direction-weighted wind speed), different combinations have been tested to see which of them best characterizes PV production.

These tests have been carried out on the three AI models: FFNN, NARX, and SVR. It is worth noting that the FFNN and SVR models are non-recurrent. Hence, as concluded in previous work [34], adding a time vector to these models as an extra input significantly improves their performance. Therefore, in all the tests listed in Table 1, the FFNN and SVR models have a time vector as an additional input.
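As a minimal sketch of what such a time vector could look like, the snippet below pairs each 30 min irradiance sample with its time of day expressed as a fraction of 24 h. The encoding, the function name `add_time_vector`, and the sample values are assumptions made for illustration; the text does not specify how the time vector is built.

```python
def add_time_vector(irradiance, steps_per_day=48):
    """Pair each irradiance sample with a normalised time-of-day value.

    Assumes equally spaced samples starting at midnight (hypothetical layout).
    """
    features = []
    for i, g in enumerate(irradiance):
        time_of_day = (i % steps_per_day) / steps_per_day  # 0.0 at 00:00, 0.5 at 12:00
        features.append((time_of_day, g))
    return features


# One made-up day of irradiance values (W/m^2), 48 half-hourly samples.
day = [0.0] * 14 + [100.0 * k for k in range(1, 11)] + [100.0 * k for k in range(10, 0, -1)] + [0.0] * 14
X = add_time_vector(day + day)  # two identical days
print(X[24])  # sample at 12:00 on the first day: (0.5, 1000.0)
```

With this layout, a non-recurrent model receives the position within the day explicitly, which is information a recurrent model such as NARX would otherwise obtain from its feedback.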

The results in Table 1 show the same behavior in all three models: the best results are obtained when only solar irradiance is used. Thus, solar irradiance alone will be used as the input vector in the AI models.

4.2.2. Models Hyper-Parameters Adjustment

Once the inputs of the models have been selected, the next step is to adjust the hyper-parameters that each model has. The hyper-parameters are variables that describe the models and determine how the training process will be. They are adjusted depending on the characteristics of the problem to be solved to avoid creating an oversized model. A good adjustment of these hyper-parameters prevents problems such as overfitting and helps to achieve better behavior. Finally, although some of the hyper-parameters are common, each model has different hyper-parameters since each one has its own design and functionality.

The same training conditions have been used for all three AI models. As concluded in previous work [34], a Time Window (TW) of 10 days is appropriate. It should also be noted that, for each day of July, each of the three AI models trains a new model on the previous 10 days to predict the PV production of that day.
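The day-by-day retraining scheme can be sketched as a schedule of training windows. The snippet below only generates the day indices; the function name `rolling_windows` and the use of day-of-year indices are assumptions for illustration, and the actual training call is omitted.

```python
def rolling_windows(first_day, last_day, window=10):
    """Yield (training_days, target_day) pairs for a day-by-day retraining scheme."""
    for target in range(first_day, last_day + 1):
        training_days = list(range(target - window, target))
        yield training_days, target


# Days are indexed by day of year (assumption): in 2021, 1 July is day 182 and 31 July is day 212.
schedule = list(rolling_windows(182, 212, window=10))
first_train, first_target = schedule[0]
print(len(schedule))                  # 31, one model per day of July
print(first_train[0], first_target)   # 172 182, training starts 10 days before the target
```

Each target day is predicted by a model that has seen only the 10 days immediately before it, so the first forecasts of July rely on data from the end of June.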

Hyper-Parameters Adjustment of FFNN

For the FFNN model, which is one of the simplest ANNs, the hyper-parameter to be adjusted is the number of neurons in the hidden layer. In order to determine which number best fits the problem, tests have been carried out with values from 2 to 20 neurons, increasing the number gradually. The results in Table 2 show that the highest R² is obtained with 5 neurons. As for the rest of the metrics, the lowest MAE and RMSE are also achieved with 5 neurons. In addition to looking for the highest R², the performance has also been analyzed throughout the training process, ensuring that the network does not overfit and verifying that the validation errors are only slightly higher than the training errors.
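The selection can be checked directly against the values reported in Table 2. The short sketch below copies the table rows and looks up the best configuration under each metric; the variable names are illustrative only.

```python
# Rows of Table 2: (neurons, MAE, RMSE, R2)
table_2 = [
    (2, 143.60, 253.23, 0.8849),
    (4, 131.62, 237.86, 0.8917),
    (5, 130.07, 235.69, 0.8931),
    (6, 132.53, 239.13, 0.8911),
    (8, 131.75, 238.03, 0.8923),
    (10, 131.91, 238.61, 0.8910),
    (15, 132.04, 241.99, 0.8872),
    (20, 133.02, 243.59, 0.8853),
]

best_r2 = max(table_2, key=lambda row: row[3])[0]    # highest R2
best_mae = min(table_2, key=lambda row: row[1])[0]   # lowest MAE
best_rmse = min(table_2, key=lambda row: row[2])[0]  # lowest RMSE
print(best_r2, best_mae, best_rmse)  # 5 5 5
```

All three metrics point to the same configuration, which is why no trade-off between metrics is needed for the FFNN.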

Hyper-Parameters Adjustment of NARX

The hyper-parameters that have been adjusted in order to improve as much as possible the accuracy of the forecasting performed by the NARX model are: (i) the number of neurons in the single hidden layer and (ii) the input and feedback delays.

Regarding the number of neurons in the hidden layer, it has been observed that increasing the number of neurons not only failed to improve the accuracy of the NARX prediction but also led to overfitting. In order to select the optimum number of neurons, the daily prediction for the month of July has been carried out with 1, 2, 3, 4, 5, and 10 neurons. The best results were obtained with 1 and 3 neurons, which gave a significant improvement in the R² metric.

In addition, for the adjustment of both delays, the predictions have been carried out with different combinations of input and feedback delays, as can be seen in Table 3. The number of input delays has been limited in order to avoid modelling dynamics between irradiance (input) and PV production (output), given the physical relationship between these two variables. Input delays of 2 and above are therefore not considered.

Many more simulations have been performed than those shown in Table 3. However, the results gathered in Table 3 are the ones that allow useful conclusions to be drawn regarding the optimal values of the hyper-parameters of the NARX model.

Therefore, looking at the results obtained, it can be concluded that the model that best forecasts the PV production is obtained with three neurons in the hidden layer, a delay of one time step in the input, and a delay of two time steps in the feedback.
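To make the role of the two delays concrete, the following sketch builds the regressors that a NARX-style one-step predictor would receive. The interpretation of the delays used here (the regressor at step t contains the current and delayed inputs plus the delayed outputs), the function name `narx_regressors`, and the sample values are assumptions; the implementation used in the paper may define the delays differently.

```python
def narx_regressors(u, y, input_delay=1, feedback_delay=2):
    """Build (regressor, target) pairs for a NARX-style one-step predictor.

    Interpretation assumed here: the regressor at step t holds
    u[t], ..., u[t - input_delay] and y[t - 1], ..., y[t - feedback_delay].
    """
    start = max(input_delay, feedback_delay)
    pairs = []
    for t in range(start, len(y)):
        inputs = [u[t - k] for k in range(input_delay + 1)]
        feedback = [y[t - k] for k in range(1, feedback_delay + 1)]
        pairs.append((inputs + feedback, y[t]))
    return pairs


u = [0.0, 50.0, 120.0, 200.0, 260.0]  # made-up irradiance samples
y = [0.0, 20.0, 55.0, 95.0, 130.0]    # made-up PV production samples
pairs = narx_regressors(u, y)
print(pairs[0])  # ([120.0, 50.0, 20.0, 0.0], 55.0)
```

In closed-loop (parallel) operation the feedback entries would be the model's own previous predictions rather than measured values.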

Hyper-Parameters Adjustment of SVR

ε-SVR with an RBF kernel has three hyper-parameters that can be adjusted: $C$, $ε$, and $γ$. $C$ is the regularization parameter, $ε$ determines the margin of tolerance around the decision function within which errors are not penalized, and $γ$ determines how far the influence of a single training example reaches. If $C$ has a large value, the decision function fits the training points more closely, and if $γ$ is too large, the influence of the support vectors becomes more localized. In both cases, overfitting can occur, and as a result, a trained model with poor generalization capabilities will be obtained.
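The effect of $γ$ and $ε$ can be illustrated with the standard definitions of the RBF kernel and the ε-insensitive loss. The snippet below is a sketch written in plain Python, not the SVR implementation used in the paper; the function names and the sample points are assumptions.

```python
import math


def rbf_kernel(x1, x2, gamma):
    """RBF kernel k(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


x_a, x_b = [0.0, 0.0], [1.0, 1.0]
for gamma in (0.003, 1.0, 10.0):
    # The similarity between the same two points shrinks as gamma grows.
    print(gamma, rbf_kernel(x_a, x_b, gamma))

print(epsilon_insensitive_loss(1.00, 1.05, epsilon=0.1))  # inside the tolerance margin: 0.0
print(epsilon_insensitive_loss(1.00, 1.30, epsilon=0.1))  # outside the margin: about 0.2
```

A small $γ$ keeps distant training points similar to each other, giving a smoother function, whereas a large $γ$ makes each support vector matter only in its immediate neighbourhood.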

The selection of each hyper-parameter was made through Bayesian optimization, an automatic search algorithm based on Gaussian processes. Bayesian optimization has become a successful tool for hyper-parameter tuning that can achieve high forecast accuracy with few samples [35]. Cross-validation (CV) is used as the validation method during the optimization. CV is a data re-sampling method used to assess the generalization ability of predictive models and to prevent overfitting [36]. Because time series data are used to train the model, the CV was applied on a rolling basis.
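One common way to apply CV on a rolling basis is to validate each fold on the day that follows its training days, so that the model is never evaluated on data older than its training set. The sketch below uses an expanding training window inside a 10-day period; the fold layout, the function name `rolling_cv_folds`, and the minimum of 5 training days are assumptions, since the exact scheme is not detailed in the text.

```python
def rolling_cv_folds(n_days, min_train_days=5):
    """Yield (train_days, validation_day) index pairs in chronological order.

    Each fold validates on the day that follows its training days, so no
    future data leak into training. min_train_days is an assumed value.
    """
    for val_day in range(min_train_days, n_days):
        yield list(range(val_day)), val_day


folds = list(rolling_cv_folds(10))
for train_days, val_day in folds:
    print(train_days, "->", val_day)
```

Unlike ordinary k-fold CV, the folds are not shuffled, which preserves the temporal order of the series.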

The Bayesian optimization was performed with a 10-day TW for training, using CV to measure the performance of each hyper-parameter combination. The model hyper-parameters ($C$, $ε$, $γ$) were sampled using a log-uniform distribution, as specified in Table 4. The results of the hyper-parameter optimization are shown in Table 5.
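Sampling from a log-uniform distribution means that the logarithm of the value is uniformly distributed, so every order of magnitude in the range is equally likely. The sketch below draws one candidate from the bounds in Table 4 using only the standard library. It illustrates the search space only: the Gaussian-process step that Bayesian optimization uses to choose the next candidate is not shown, and the names `SEARCH_SPACE` and `sample_log_uniform` are hypothetical.

```python
import math
import random

# Lower and upper bounds from Table 4.
SEARCH_SPACE = {
    "C": (1e-1, 1e6),
    "epsilon": (1e-3, 1.0),
    "gamma": (1e-6, 10.0),
}


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
candidate = {
    name: sample_log_uniform(low, high, rng)
    for name, (low, high) in SEARCH_SPACE.items()
}
print(candidate)
```

A plain uniform distribution over $C$ would place almost all samples above $10^{5}$; the log-uniform choice gives small and large values the same chance of being explored.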

5. Numeric Results and Discussion

The following section presents the results obtained with the four models developed for the prediction of PV generation. The results are presented in different ways; first, the averages of the error metrics for the month of July are shown. Then, the results are analyzed in a graphical form. Afterward, the difference in the behavior of the models depending on whether the day is sunny or cloudy is shown. Finally, the percentage errors of the models are calculated.

Table 6 shows the average of the metrics for the entire month of July. In terms of the R² metric, the SVR model clearly stands out from the others, with a value of 0.93. Even so, the R² results obtained with the other three models are also good. Another notable result is the MAE and RMSE values obtained by the analytical model, which are roughly 1.5 to 1.9 times those obtained with the rest of the models.
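The gap between the analytical model and the AI models can be quantified from the values in Table 6. The sketch below computes, for each AI model, the ratio of the analytical model's MAE and RMSE to its own; the variable names are illustrative.

```python
# (MAE, RMSE) per model, copied from Table 6.
table_6 = {
    "FFNN": (125.66, 231.20),
    "NARX": (155.93, 267.51),
    "SVR": (141.81, 252.46),
    "MODELICA": (234.46, 406.51),
}

ref_mae, ref_rmse = table_6["MODELICA"]
ratios = {
    model: (round(ref_mae / mae, 2), round(ref_rmse / rmse, 2))
    for model, (mae, rmse) in table_6.items()
    if model != "MODELICA"
}
for model, (mae_ratio, rmse_ratio) in ratios.items():
    print(model, mae_ratio, rmse_ratio)
```

The ratios are largest with respect to the FFNN, which has the lowest MAE and RMSE in Table 6 even though the SVR has the highest R².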

Figures 11–14 show the predictions of PV generation during a week in July. In general, all four models are able to detect the PV generation pattern, differentiating between night and midday hours. As mentioned, all four models perform well, and this is also reflected in the figures. However, the models behave differently in some respects. The FFNN, NARX, and SVR models are able to reach the maximum generation peaks. Nevertheless, the FFNN model has some difficulties on 22 and 23 July, which are cloudier days. In this respect, the SVR and NARX models show a greater ability to cope with sudden peaks. The NARX model particularly stands out on cloudy days, showing good behavior when dealing with constant changes. It should be mentioned, however, that at the beginning of the generation day the NARX model forecasts a small amount of production when the actual production is lower. Given how well the NARX model performs on cloudy days, its lower R² value compared to the other models may be due to the fact that most of the days in July are sunny. Finally, the analytical model has difficulties in reaching the maximum generation peaks on each day, especially on 23 and 25 July. In addition, the simulation introduces a small lag in energy production. These reasons may explain the higher MAE and RMSE metrics obtained by the analytical model compared to the other models.

Continuing with the analysis of the behavior of the models according to the type of day, Table 7 shows the R² values for five sunny days and five cloudy days. On sunny days the R² is around 0.95, whereas on cloudy days, regardless of the model, the R² values are around 0.8. Here too, the SVR model stands out for obtaining the best results, most clearly on cloudy days.

Finally, in order to analyze in more detail what happens on each day of July, the error made in the prediction of each day has been calculated. The averages presented in Table 6 may hide this detail, since a one-off error on a single day can distort the monthly average. For that reason, Figure 15 shows, in pie chart form, the distribution of the daily relative error by ranges. Summarising Figure 15, the SVR model again stands out from the others: in 58% of the cases, its error is below 4%. This occurs in 52% of the cases for the FFNN model, in 48% for the NARX model, and in only 26% for the analytical model. As for the maximum errors, it is the analytical model that obtains the highest errors, 32%, compared to 19%, 16%, and 13% for the NARX, FFNN, and SVR models, respectively. Table 8 summarises the mean errors for each model.
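Because the distribution is computed over the daily forecasts of a 31-day month, the reported shares can be translated into approximate day counts. The sketch below does this under the assumption that each "case" is one day of July; this reading is inferred from the text, not stated in it.

```python
DAYS_IN_JULY = 31

# Share of cases (%) with a relative error below 4%, as reported in the text.
share_below_4_percent = {"SVR": 58, "FFNN": 52, "NARX": 48, "Analytical": 26}

day_counts = {}
for model, percent in share_below_4_percent.items():
    days = round(percent / 100 * DAYS_IN_JULY)
    day_counts[model] = days
    # Converting back shows that the day count reproduces the reported share.
    print(model, days, round(100 * days / DAYS_IN_JULY))
```

Under this assumption, the SVR model stays below a 4% error on about 18 of the 31 days, against about 8 days for the analytical model.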

6. Conclusions

This paper presents the development of four models to predict PV production: an analytical model developed in the open-source software OpenModelica and three AI models (an FFNN, a NARX, and an SVR). All four models are designed to predict the production of the next 24 h, with a time interval of 30 min. Predicting PV production is not an easy task due to its high dependence on solar irradiance. This work uses historical measured data to carry out the prediction.

Firstly, one of the conclusions to be highlighted is that, in the framework considered, the highest prediction accuracy is obtained with the SVR model. This model obtains an average R² of 0.934 for the July forecast. Thus, in this case study, the SVR model performs 4.07% better than the FFNN model, 5.14% better than the NARX model, and 4.18% better than the analytical model.
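These percentages can be reproduced from the mean R² values in Table 6 if the improvement is taken as the R² difference relative to the SVR value. This formula is inferred from the reported numbers and is not stated explicitly in the text; it yields the figures given in the abstract.

```python
# Mean R2 for July, copied from Table 6.
r2 = {"FFNN": 0.896, "NARX": 0.886, "SVR": 0.934, "MODELICA": 0.895}


def r2_gain_of_svr(model):
    """R2 improvement of the SVR over `model`, as a percentage of the SVR value."""
    return 100 * (r2["SVR"] - r2[model]) / r2["SVR"]


for model in ("FFNN", "NARX", "MODELICA"):
    print(f"{model}: {r2_gain_of_svr(model):.2f}%")
```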

Regardless of the technique used, it is concluded that forecasting on sunny days performs better than forecasting on cloudy days. This is related to the fact that PV production on sunny days is more constant, giving the characteristic bell shape of an ideal PV system production. In contrast, the forecast for cloudy days needs to cope with sudden changes.

The relative error made on each day of July has been calculated in order to analyze in more detail the behavior of each model on every day of the predicted month. The best performance is obtained with the SVR model, whose error is below 4% in 58% of the cases. In terms of the mean relative error (Table 8), the SVR model performs 13.76%, 23.06%, and 83.3% better than the FFNN, NARX, and analytical models, respectively.
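The three comparison figures match the mean relative errors of Table 8 when each model's excess error is expressed as a percentage of the SVR error. As before, the formula is inferred from the reported numbers; the function name is hypothetical.

```python
# Mean relative error (%) per model, copied from Table 8.
mean_error = {"FFNN": 5.87, "NARX": 6.35, "SVR": 5.16, "MODELICA": 9.46}


def excess_error_vs_svr(model):
    """Percentage by which a model's mean relative error exceeds that of the SVR."""
    return 100 * (mean_error[model] - mean_error["SVR"]) / mean_error["SVR"]


for model in ("FFNN", "NARX", "MODELICA"):
    print(f"{model}: {excess_error_vs_svr(model):.2f}%")
```

Note that here a lower value is better, so the sign of the difference is reversed with respect to the R² comparison.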

Another conclusion has been drawn from the development of the different models. In terms of the development process, the analytical model requires a higher level of knowledge about how a PV installation works. Furthermore, when implementing changes in the PV installation, such as an increase in installed capacity or the degradation of the PV panels, it is more complex to implement them in the analytical model. These changes have to be manually included so that they can represent the reality of the PV installation as accurately as possible. In contrast, for AI-based models, it would only be necessary to re-train the models with the new data.

One of the changes applied in this work, compared to previous work [34], is the use of a filter that sets the solar irradiance values between sunset and sunrise to 0. Applying this step in the pre-processing and post-processing of the data has helped to improve the behavior of the four models during the night. It should also be noted that good pre-processing of the data is as important as the design of the model itself.
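A night filter of this kind can be sketched as a mask over the half-hourly samples of each day. The function name `zero_at_night`, the fixed sunrise and sunset steps, and the sample values are assumptions for illustration; in practice the sunrise and sunset times would change from day to day.

```python
def zero_at_night(values, sunrise_step, sunset_step, steps_per_day=48):
    """Set samples outside [sunrise_step, sunset_step) of each day to 0."""
    filtered = []
    for i, value in enumerate(values):
        step = i % steps_per_day
        is_daylight = sunrise_step <= step < sunset_step
        filtered.append(value if is_daylight else 0.0)
    return filtered


# One made-up day of half-hourly values with small spurious readings at night.
raw = [3.0] * 48
raw[24] = 800.0
# Steps 13 and 44 correspond to 06:30 and 22:00 (assumed local times).
clean = zero_at_night(raw, sunrise_step=13, sunset_step=44)
print(clean[0], clean[24], clean[47])  # 0.0 800.0 0.0
```

The same function can be applied to the model output in post-processing, so that small non-zero forecasts during the night are removed.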

Regarding future work, a new comparison of the different models will be carried out using predicted meteorological data instead of historical measured data. It is expected that the quality of the predicted data, for instance, solar irradiance, will have much more influence on the results than, for example, the type of model used. Therefore, meteorological forecast data provided by different meteorological agencies will be considered, and their effect on the prediction will be analyzed. Finally, the best prediction model, fed with the best meteorological forecast data, will be implemented in the Izarbel EMS in order to operate in real time.

Author Contributions

Conceptualization, G.E., I.Z. and H.C.; Data curation, G.E., I.Z., J.H. and O.C.; Formal analysis, G.E., I.Z., H.C., J.U., J.H. and O.C.; Funding acquisition, H.C.; Investigation, G.E., I.Z., H.C. and J.H.; Methodology, G.E., I.Z., H.C. and J.H.; Project administration, H.C.; Resources, G.E., I.Z., H.C., J.U., J.H. and O.C.; Software, G.E., I.Z., J.U. and J.H.; Supervision, H.C. and O.C.; Validation, H.C., J.U. and O.C.; Visualization, G.E., I.Z. and H.C.; Writing—original draft, G.E., I.Z. and J.H.; Writing—review & editing, H.C., J.U. and O.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research study, carried out within the framework of the EKATE project, has been supported by the FEDER Interreg POCTEFA program.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available on request due to restrictions. The data presented in this study are available on request from the corresponding author.

Acknowledgments

The historical meteorological data used in this work have been provided by Météo France.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Van Nuffel, L.; Mihov, M. National Strategies for Renewables: Energy Efficiency, Building Renovation and Self-Consumption: Workshop Proceedings. European Parliament, Directorate-General for Internal Policies of the Union. 2018. Available online: https://data.europa.eu/doi/10.2861/402958 (accessed on 11 June 2022).
2. Matthieu, S.; Dhaker, A.; Kahina, H.O.; Antoine, L.; Benoît, R. Distributed optimization of energy profiles to improve photovoltaic self-consumption on a local energy community. Simul. Model. Pract. Theory 2021, 108, 102242.
3. Heydar, C.; Salah, B.; Ghasem, D. Day-ahead scheduling problem of smart micro-grid with high penetration of wind energy and demand side management strategies. Sustain. Energy Technol. Assess. 2020, 40, 100747.
4. Hannes, K.; Stefan, L.; Sebastian, E.; Martin, H. Assessing the Potential of Rooftop Photovoltaics by Processing High-Resolution Irradiation Data, as Applied to Giessen, Germany. Energies 2022, 15, 6991.
5. Spanish Photovoltaic Union. The Photovoltaic Self-Consumption Installed in Spain Grew by More Than 100% in 2021. Available online: https://www.unef.es/es/comunicacion/comunicacion-post/el-autoconsumo-fotovoltaico-instalado-en-espana-crecio-mas-del-100-en-2021 (accessed on 24 August 2022).
6. EKATE. Electricity Management in Collective Photovoltaic Self-Consumption in the France/Spain Cross-Border Area, with “Blockchain” and “Internet of Things” (IoT) Technologies. Available online: https://www.ekate.eu/es/bienvenida/ (accessed on 24 August 2022).
7. Irati, Z.; Garazi, E.; Juan, H.; Zina, B.; Naiara, A.; Haritza, C. Short-term electricity consumption forecasting with NARX, LSTM, and SVR for a single building: Small data set approach. Energy Sources Part A Recovery Util. Environ. Eff. 2022, 44, 6898–6908.
8. Zibo, D.; Dazhi, Y.; Thomas, R.; Wilfred, M.W. Satellite image analysis and a hybrid ESSS/ANN model to forecast solar irradiance in the tropics. Energy Convers. Manag. 2014, 79, 66–73.
9. Bixuan, G.; Xiaoqiao, H.; Junsheng, S.; Yonghang, T.; Jun, Z. Hourly forecasting of solar irradiance based on CEEMDAN and multi-strategy CNN-LSTM neural networks. Renew. Energy 2020, 162, 1665–1683.
10. Souhaila, C.; Mohamed, M. Principal Component Analysis and Machine Learning Approaches for Photovoltaic Power Prediction: A Comparative Study. Appl. Sci. 2021, 11, 7943.
11. Su-Chang, L.; Jun-Ho, H.; Seok-Hoon, H.; Chul-Young, P.; Jong-Chan, K. Solar Power Forecasting Using CNN-LSTM Hybrid Model. Energies 2022, 15, 8233.
12. Tiwari, S.; Sabzehgar, R.; Rasoli, M. Short term solar irradiance forecast using numerical weather prediction (NWP) with gradient boost regression. In Proceedings of the 9th IEEE International Symposium on Power Electronics for Distributed Generation Systems, Charlotte, NC, USA, 25–28 June 2018.
13. Caldas, M.; Alonso-Suárez, R. Very short-term solar irradiance forecast using all-sky imaging and real-time irradiance measurements. Renew. Energy 2019, 143, 1643–1658.
14. Aylin, D.A.; Bahar, D.; Birol, K. Prediction of Photovoltaic Panel Power Outputs Using Time Series and Artificial Neural Network Method. J. Tekirdag Agric. Fac. 2021, 18, 457–469.
15. Huaizhi, W.; Yangyang, L.; Bin, Z.; Canbing, L.; Guangzhong, C.; Nikolai, V.; Evgeny, B. Taxonomy research of artificial intelligence for deterministic solar power forecasting. Energy Convers. Manag. 2020, 214, 112909.
16. Héctor Felipe, M.R.; Miguel Ángel, G.R.; Valentín, C.P.; Victor, A.G.; Alberto, R.P.; Ranganai, T.M.; Luis, H.C. Applications of Artificial Intelligence to Photovoltaic Systems: A Review. Appl. Sci. 2022, 12, 10056.
17. Wen-Chi, K.; Chiun-Hsun, C.; Sih-Yu, C.; Chi-Chuan, W. Deep Learning Neural Networks for Short-Term PV Power Forecasting via Sky Image Method. Energies 2022, 15, 4779.
18. Elias, R.; Tassos, S. Prediction of a Grid-Connected Photovoltaic Park’s Output with Artificial Neural Networks Trained by Actual Performance Data. Appl. Sci. 2022, 12, 6458.
19. Sanford, F.; Alan, M.; Rick, S. A Practical Guide to SysML. The Systems Modeling Language, 3rd ed.; Morgan Kaufmann: Burlington, MA, USA, 2015.
20. Sven Erik, M.; Hilding, E.; Martin, O. Physical system modeling with Modelica. Control Eng. Pract. 1998, 6, 501–510.
21. Fermín, R.; Michael, G.; Luis, F.; Ainhoa, G. Very short-term temperature forecaster using MLP and N-nearest stations for calculating key control parameters in solar photovoltaic generation. Sustain. Energy Technol. Assess. 2021, 45, 101085.
22. Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Pearson Education: Upper Saddle River, NJ, USA, 2009.
23. Filippo, M.B.; Enrico, M.; Michael, C.K.; Antonello, R.; Robert, J. An Overview and Comparative Analysis of Recurrent Neural Networks for Short Term Load Forecasting, 1st ed.; Springer: Cham, Switzerland, 2018; pp. 1–41.
24. Mohamad, K.; McGough, A.S.; Zoya, P.; Mehdi, P.; Sara, W. Machine Learning, Deep Learning and Statistical Analysis for forecasting building energy consumption—A systematic review. Eng. Appl. Artif. Intell. 2022, 115, 105287.
25. Zina, B.; Octavian, C.; Ahmed, R.; Haritza, C.; Najiba, M.B. A Nonlinear Autoregressive Exogenous (NARX) Neural Network Model for the Prediction of the Daily Direct Solar Radiation. Energies 2018, 11, 620.
26. Vladimir, V.; Steven, E.G.; Alex, S. Support vector method for function approximation, regression estimation and signal processing. In Advances in Neural Information Processing Systems (NIPS’96); MIT Press: Cambridge, MA, USA, 1996; pp. 281–287.
27. Müller, K.R.; Smola, A.J.; Rätsch, G.; Schölkopf, B.; Kohlmorgen, J.; Vapnik, V. Predicting time series with support vector machines. In International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 1997.
28. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
29. Katarina, G.; L’Heureux, A.; Miriam, A.M.C.; Luke, S. Energy Forecasting for Event Venues: Big Data and Prediction Accuracy. Energy Build. 2016, 112, 222–233.
30. Yu-Sheng, K.; Kazumitsu, N.; Chi-Yo, H. Predicting Primary Energy Consumption Using Hybrid ARIMA and GA-SVR Based on EEMD Decomposition. Mathematics 2020, 8, 1722.
31. Ahmed, R.; Sreeram, V.; Mishra, Y.; Arif, M.D. A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization. Renew. Sustain. Energy Rev. 2020, 124, 109792.
32. Modelica Association. Modelica Libraries. Available online: https://modelica.org/libraries (accessed on 8 November 2022).
33. Ahmed, A. El Tayyan. A simple method to extract the parameters of the single-diode model of a PV system. Turk. J. Phys. 2013, 37, 121–131.
34. Garazi, E.; Irati, Z.; Haritza, C.; Juan, H.; Juan José, U.; Octavian, C. Photovoltaic power forecast for the next 24 h with an analytical model and a FFNN model. In Proceedings of the 4th IEEE International Conference on Electrical Sciences and Technologies in Maghrib, Tunis, Tunisia, 26–28 October 2022; accepted.
35. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40.
36. Trevor, H.; Robert, T.; Jerome, F. The Elements of Statistical Learning. Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.

Figure 1. Buildings of ESTIA Institute of Technology participating in the CSC of the Izarbel pilot project.

Figure 2. ESTIA1 PV generation for two weeks in June.

Figure 3. ESTIA1 PV panels orientation.

Figure 4. Structure of an FFNN.

Figure 5. A general NARX diagram used for forecasting purposes in operational mode [23].

Figure 6. (a) Open (series-parallel) mode of NARX; (b) Closed (parallel) mode of NARX.

Figure 7. Non-linear SVR, adapted from [29].

Figure 8. Block diagram representation of ESTIA1 building PV installation.

Figure 9. PV cell single diode model equivalent circuit.

Figure 10. Methodology for developing AI-based models.

Figure 11. Forecast of PV generation for a week of July with FFNN model.

Figure 12. Forecast of PV generation for a week of July with NARX model.

Figure 13. Forecast of PV generation for a week of July with SVR model.

Figure 14. Forecast of PV generation for a week of July with OpenModelica model.

Figure 15. (a) Percentage distribution error for the month of July by the FFNN model; (b) Percentage distribution error for the month of July by the NARX model; (c) Percentage distribution error for the month of July by the SVR model; (d) Percentage distribution error for the month of July by the analytical model.

Table 1. R² results of the input combinations tested.

Input Combinations	FFNN Model R²	NARX Model R²	SVR Model R²
Solar Irradiance	0.893	0.828	0.913
Solar Irradiance + Temperature	0.881	0.789	0.904
Solar Irradiance + Wind speed	0.879	0.794	0.906
Solar Irradiance + Wind speed_dq ¹	0.870	0.776	0.905

¹ Wind speed_dq refers to direction-weighted wind speed.

Table 2. Results of the tests for the hyper-parameter adjustment of the FFNN model.

N° of Neurons	MAE	RMSE	R²
2	143.60	253.23	0.8849
4	131.62	237.86	0.8917
5	130.07	235.69	0.8931
6	132.53	239.13	0.8911
8	131.75	238.03	0.8923
10	131.91	238.61	0.8910
15	132.04	241.99	0.8872
20	133.02	243.59	0.8853

Table 3. Error results obtained with the NARX model for different combinations of the number of neurons and delays.

N° of Neurons	Input Delay	Feedback Delay	MAE	RMSE	R²
1	0	1	186.253	306.610	0.850
	0	2	178.890	300.114	0.850
	0	3	170.447	285.746	0.863
	0	4	175.752	296.368	0.848
	1	1	178.735	297.446	0.856
	1	2	182.403	303.188	0.849
	1	3	178.704	300.948	0.843
	1	4	175.490	295.415	0.850
3	0	1	203.762	338.190	0.854
	0	2	233.423	375.512	0.812
	0	3	209.042	351.111	0.779
	0	4	178.290	305.830	0.842
	1	1	160.204	274.975	0.877
	1	2	148.465	255.714	0.881
	1	3	151.524	261.359	0.877
	1	4	190.125	327.298	0.810

Table 4. Specifications of the hyper-parameter search space.

Hyper-Parameters	Lower Bounds	Upper Bounds	#Samples
$C$	$10^{- 1}$	$10^{6}$	50
$ε$	$10^{- 3}$	$1$	50
$γ$	$10^{- 6}$	$10$	50

Table 5. Results of the hyper-parameter tuning based on Bayesian optimization.

$C$	$ε$	$γ$
166	0.002	0.003

Table 6. Results of the error metrics obtained for the prediction of PV generation during the month of July for all models.

Model	MAE	RMSE	R²
FFNN	125.66	231.20	0.896
NARX	155.93	267.51	0.886
SVR	141.81	252.46	0.934
MODELICA	234.46	406.51	0.895

Table 7. Results of the four models for five sunny days and five cloudy days.

Type of Day	Date	FFNN R²	NARX R²	SVR R²	MODELICA R²
Sunny days	5 July 2021	0.963	0.943	0.962	0.897
	9 July 2021	0.980	0.927	0.968	0.881
	10 July 2021	0.980	0.971	0.978	0.915
	19 July 2021	0.981	0.973	0.982	0.894
	26 July 2021	0.971	0.953	0.977	0.906
Cloudy days	6 July 2021	0.780	0.779	0.871	0.758
	7 July 2021	0.676	0.670	0.813	0.726
	8 July 2021	0.662	0.670	0.802	0.728
	13 July 2021	0.824	0.839	0.902	0.834
	6 July 2021	0.780	0.779	0.871	0.758

Table 8. Mean relative error of four models.

Model	Error (%)
FFNN	5.87
NARX	6.35
SVR	5.16
MODELICA	9.46

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
