Photovoltaic Power Forecast Using Deep Learning Techniques with Hyperparameters Based on Bayesian Optimization: A Case Study in the Galapagos Islands

Abstract: Hydropower systems are the basis of electric power generation in Ecuador. However, some isolated areas in the Amazon and the Galapagos Islands are not connected to the National Interconnected System. Therefore, isolated generation systems based on renewable energy sources (RES) emerge as a solution to increase electricity coverage in these areas. An extraordinary case occurs in the Galapagos Islands due to their biodiversity in flora and fauna, where the primary energy source comes from fossil fuels despite their significant solar resources. Therefore, the use of RES, especially photovoltaic (PV) and wind power, is essential to cover the required load demand without negatively affecting the islands' biodiversity. In this regard, the design and installation planning of PV systems require precise knowledge of the amount of energy available at a given location, where power forecasting plays a fundamental role. Therefore, this paper presents the design and comparison of different deep learning techniques: Long Short-Term Memory (LSTM), LSTM Projected, Bidirectional LSTM, Gated Recurrent Units, Convolutional Neural Networks, and hybrid models to forecast photovoltaic power generation in the Galapagos Islands of Ecuador. The proposed approach uses hyperparameters tuned by a Bayesian optimization algorithm to reduce the forecast error and training time. The results demonstrate the accurate performance of all the methods, achieving low-error short-term predictions, an excellent correlation of over 99%, and a reduced training time.


Introduction
The production of electrical energy constitutes a fundamental factor in society's current economic, industrial, and technological development. However, the growing energy demand has negatively affected the environment since fossil fuels are the primary energy source. The consequences of this scenario include natural catastrophes with a profound impact on humanity [1]. Therefore, using alternative, environmentally friendly energy sources is essential. Photovoltaic solar energy (PV) is one of the most attractive and fastest-developing renewable energy sources (RES) due to the abundance of places with a high rate of solar radiation and the reduction in the cost of photovoltaic installations, which require low maintenance. Furthermore, the solar irradiance received by photovoltaic panels is transformed into electrical energy through the photovoltaic effect [2].
However, the stochastic nature of renewable resources (e.g., irradiance, temperature, wind speed, etc.) significantly affects the generation system's stability, limiting PV penetration [3]. This uncertainty in photovoltaic systems prevents their possible integration into existing power systems without proper planning. In this regard, the forecast of photovoltaic energy allows the regulation, planning, storage, and management of the generated power [4]. However, the projection of PV power depends on the accuracy of the PV cell mathematical model [5][6][7], the solar panel's parameters [8,9], the type of PV module [10], the availability of weather data [11], and the forecast method, among others.
In the literature, several methods are used for forecasting PV power. These methods are classified as: (i) statistical methods, which use historical time series data to learn patterns and forecast data [12]; (ii) machine learning (ML), which follows the process of data preparation, algorithm training, model generation, and finally making the prediction [13], or deep learning (DL), in which the prediction models have solid predictive capacity since they learn the complex characteristics and relationships that are usually hidden in the data set [14]; (iii) physical models, which constitute mathematical equations that describe physical and dynamic states based on numerical climate prediction and satellite images [15]; and (iv) hybrids, which are based on a combination of different methods; however, their structure makes them more complex and requires more training time [16]. DL models have demonstrated the ability to extract the nonlinear features of time series and process large amounts of data. Therefore, some DL models have recently been proposed for PV power forecasting. Models based on Recurrent Neural Networks (RNN), such as Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Gated Recurrent Unit (GRU), have demonstrated good accuracy in forecasting PV power [14].
Furthermore, models based on Convolutional Neural Networks (CNN) are well suited to handle time series data with seasonal characteristics, such as PV power [17]. Finally, hybrid models combining RNN and CNN have demonstrated better performance than typical DL model structures [18]. However, not all the previously mentioned works have implemented an algorithm to obtain optimized hyperparameters for training. As a result, the models have not been used efficiently, which implies an unnecessary computational cost.
Hyperparameters strongly influence the performance of DL models. These values cannot be learned from the training data set but must be set beforehand. Therefore, it is essential to select a suitable set of values. Bayesian optimization (BO) is an efficient method for searching for hyperparameters. It uses the historical information of the search iterations to focus on the range of greatest interest based on a probabilistic model whose objective function is the evaluation metric. In ref. [19], the hyperparameters of an LSTM are optimized to construct a forecasting model for PV plants with a lack of historical data. The applied methodology shows high prediction accuracy with the provided data.
Moreover, the authors in ref. [20] develop a generative Bayesian model for large data sets to validate the error as a function of training set size and accelerate the optimization. As a result, the method obtained the hyperparameters required for the DL models faster than the traditional Bayesian method. In addition, the authors in ref. [21] propose a model to forecast PV power based on an ANN trained in a Bayesian framework. This methodology allows for obtaining the confidence intervals and estimating the error bars of the prediction model. The optimal number of nodes in the ANN is obtained through the K-fold cross-validation mechanism. As a result, the Multi-Layer Perceptron (MLP) Neural Network shows a low error percentage on test data.
Furthermore, the study presented in ref. [22] develops a hybrid model based on singular spectrum analysis (SSA) and BiLSTM with the BO algorithm. The SSA decomposes the PV series into several sub-signals, whereas the BO algorithm adjusts the DL network architecture hyperparameters, improving the DL models' forecasting accuracy.
Consequently, the hybrid model demonstrated superior performance compared to traditional DL-based models. Furthermore, reference [23] proposes a PV power prediction model based on sparse Bayesian regression (SBR). The artificial bee colony (ABC) is used to optimize the hyperparameters instead of the maximum likelihood method. Both methodologies demonstrated that they were capable of predicting the PV module's maximum power for various weather conditions at the study location, even though the ABC model achieved higher prediction accuracy.
The study in ref. [24] proposes an attention-based encoder-decoder network based on BO with a GRU to accurately forecast short-term power loads. The BO method achieves optimal predictions by determining the model's hyperparameters. In addition, the methodology can extract features and learning capabilities from time series data. With the optimized hyperparameters, the method exhibited high accuracy, fast response, and low computational consumption. The authors in ref. [25] also present a spatial-temporal analysis of the correlation among distributed PV system generation data. A data-driven inference model, built on a Bayesian network, is developed for a very short-term PV generation forecast. The proposed methodology demonstrated that the models could be trained with only one data set for similar data sets and present low prediction errors when evaluated on another data set. The tuning of the hyperparameters required for efficient training of the models can be completed by various methods, such as genetic algorithms (GA), cross-validation, etc. Furthermore, the authors in ref. [26] conclude that BO achieves higher precision and a higher search speed. The Bayesian method needs the data set partitioned for training and validation purposes. Finally, the study in ref. [27] shows that the Hold-out method, which involves the random division of the data set (training and validation), is adaptable to regression models.
The prediction of photovoltaic generation has also been studied in the context of Ecuador. For example, the authors in ref. [28] estimate the PV power generation based on a physical method. The numerical climate prediction model is established at a university campus in Ecuador. The collected data based on environmental conditions (i.e., solar radiation and ambient temperature) is used to develop a mathematical model to obtain the forecasted PV power. In addition, the authors in ref. [29] use data mining techniques to predict PV power in a rural location in Ecuador. Univariate and multivariate analyses of the variables with the highest incidence in PV power generation are made to obtain the input for the learning machine. This study uses a decision tree algorithm through random forests to get a faster prediction.
Furthermore, the authors in ref. [30] develop a short-term active power forecasting model based on an ANN. The network design considered variable selection to utilize the best input variables for the model and a suitable number of layers, neurons, and learning algorithms. The number of neurons was set with the geometric pyramid rule, so no optimization was used. In the previous study [4], two models based on DL were presented. The LSTM and GRU models were developed to predict PV energy in the Galapagos Islands in Ecuador. The proposed methodology obtains two accurate models. However, the hyperparameters used to train the models were not obtained through an optimization algorithm. Accordingly, as mentioned above, the models developed to forecast PV power in Ecuador have not been trained with optimized hyperparameters, which implies that the models have not been exploited efficiently.
Although solar radiation levels in Ecuador are high enough for the extensive implementation of photovoltaic solar plants (a photovoltaic project is considered viable above 3.8 kWh/m² [31]), this resource has been scarcely exploited in the country. Additionally, the methods developed to forecast PV power in Ecuador have not applied a technique for obtaining optimized hyperparameters to train their models, making it challenging to take full advantage of them. Therefore, this study shows the potential of Ecuador's solar resources by developing methods based on DL techniques with hyperparameters optimized through BO to forecast PV power. This work expands on the previous study [4], where the models based on DL did not include BO. It also presents the similarity of the data between two islands at the location under study. This data similarity allows training a model with one data set and using the trained model to forecast PV power on a different island. With this, the computational cost is reduced, and developing models that can be applied on the islands becomes a more flexible task. Furthermore, the results show the advantages of applying hyperparameter optimization to the algorithms. In addition, the accuracy of the models' predictions does not limit their use to isolated areas; they can be applied in any location that requires photovoltaic generation forecasts, which allows for maximizing the use of the abundant solar resource in the country [32].
The rest of this paper is organized as follows. Section 2 presents the current situation of renewable energies in Ecuador and the need to migrate to clean primary energy sources in the Galapagos Islands to take advantage of the abundant solar resources and minimize the environmental impact on the delicate ecosystem of the islands. Section 3 presents a summary of the prediction models LSTMP, BiLSTM, CNN, and hybrid, as well as the algorithm to obtain the optimized hyperparameters to train the models. Next, Section 4 describes the data collection and the methodology for implementing the prediction models. Furthermore, Section 5 evaluates the achieved results. Finally, Section 6 summarizes the conclusions of this work.

Renewable Energies in Ecuador
The Agency for the Regulation and Control of Energy and Non-Renewable Natural Resources, in its 2021 report, mentions that the electricity service coverage in Ecuador was 97.29% [33]. In addition, the nominal power at the national level was 8734.41 MW, where 5308.27 MW (60.77%) corresponded to RES plants, with hydraulic energy as the predominant source, and 3426.14 MW (39.23%) to non-renewable energy plants such as thermal energy. However, geographical conditions and the high costs associated with expanding the electrical network have hampered the country's coverage increase, which is the main reason for having coverage of 93.12% in rural areas [34]. Therefore, isolated systems are feasible alternatives to meet the requirement for access to electricity.
On the other hand, due to their extensive biodiversity, UNESCO declared the Galapagos Islands a World Heritage Site in 1978 [35]. Therefore, the Ecuadorian government implemented the Zero Fossil Fuels in the Galapagos initiative to use non-conventional renewable energy (NCRE) projects instead of fossil fuel-based power generation sources and avoid species habitat degradation [36].
Galapagos generation plants are considered isolated systems because they are not connected to the National Interconnected System (NIS). The electrical service coverage in the Galapagos Islands is 99.46% [33]. In addition, according to the accountability report of ELECGALAPAGOS [37], electricity generation in the Galapagos Islands is distributed as presented in Figure 1. As can be seen, thermal generation is the most significant on the islands, in contrast to renewable energies that do not cover even 10% of the total generation, which harms the environment. Hence, it is evident that other energy sources must be used more significantly to reduce thermal energy consumption.
Furthermore, due to the exceptional location of the Galapagos Islands, their global horizontal irradiance (GHI) is the highest in Ecuador. As shown in Figure 2, the GHI reaches values between 4.8 and 6.3 kWh/m² per day, which suggests the high feasibility of implementing solar systems in the diversification of the islands' energy matrix [31]. Additionally, ref. [34] concludes that the installation of autonomous systems in Ecuador does not represent an economic benefit for the user. However, this statement does not apply to sectors without a connection to the NIS, such as the Galapagos Islands.

Deep Learning Techniques Applied to Forecast Photovoltaic Power
DL techniques have been widely used in the prediction of renewable energy. These have proven efficient in extracting and analyzing the nonlinear and non-stationary characteristics of time series data. Several DL variants have been developed in recent years that, unlike traditional RNN-based models, solve the vanishing gradient problem. These variants include Long Short-Term Memory Networks (LSTM), LSTM Projected (LSTMP), Bidirectional LSTM (BiLSTM), and Gated Recurrent Unit (GRU). Additionally, models based on Convolutional Neural Networks (CNN) and hybrid models have been proposed [14]. Next, the DL techniques that automatically learn the characteristics necessary to forecast PV power are briefly described.

Long-Short-Term Memory Projected
In an LSTM, a large number of computations take place in the various gates that make up the neural network. Therefore, increasing the number of memory cells increases the memory cost. Nevertheless, preserving a low number of memory cells to avoid this cost degrades the network's performance. This trade-off motivates the architecture called LSTMP, which improves the model's accuracy while effectively reducing the computational load. A projected layer is a type of deep learning layer that enables compression by reducing the number of stored learnable parameters. By reducing the number of learnable parameters below the number of hidden units of the LSTM layer, it maintains the output size of the layer and, in turn, the sizes of subsequent layers, which can result in better accuracy of the prediction layer. The LSTMP model is shown in Figure 3, where the forget gate (ft) decides if the information is retained or discarded before adding it as input to the cell; the input gate (it) controls the flow of input activations to the cell memory; the candidate memory cell (C̃t) creates a vector of new candidate values to add to the cell; the output gate (ot) controls the output flow of activations from the cell to the rest of the network; and pt is the recurrent projected activation unit [38]. In addition, the LSTMP functions are shown in ref. [39].
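The projection mechanism can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the key point is that the recurrent weights R act on the low-dimensional projected state pt instead of the full hidden state, which shrinks the recurrent weight matrices; the gate names (ft, it, C̃t, ot) match Figure 3, and all dimensions here are arbitrary examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstmp_step(x_t, p_prev, c_prev, W, R, b, W_proj):
    """One step of a projected LSTM (LSTMP) cell.

    The recurrent weights R act on the projected state p_prev (size
    n_proj) instead of the full hidden state (size n_hidden), which is
    what reduces the number of learnable parameters.
    """
    n_h = c_prev.shape[0]
    z = W @ x_t + R @ p_prev + b       # stacked pre-activations of all gates
    i = sigmoid(z[:n_h])               # input gate i_t
    f = sigmoid(z[n_h:2 * n_h])        # forget gate f_t
    g = np.tanh(z[2 * n_h:3 * n_h])    # candidate memory cell
    o = sigmoid(z[3 * n_h:])           # output gate o_t
    c_t = f * c_prev + i * g           # new cell state
    h_t = o * np.tanh(c_t)             # hidden state
    p_t = W_proj @ h_t                 # projected recurrent output p_t
    return p_t, c_t

# Toy dimensions: 2 input features, 16 hidden units, projection size 4.
rng = np.random.default_rng(0)
n_in, n_h, n_p = 2, 16, 4
W = rng.standard_normal((4 * n_h, n_in)) * 0.1
R = rng.standard_normal((4 * n_h, n_p)) * 0.1   # acts on the 4-dim projected state
b = np.zeros(4 * n_h)
W_proj = rng.standard_normal((n_p, n_h)) * 0.1
p, c = np.zeros(n_p), np.zeros(n_h)
for x in rng.random((24, n_in)):       # run over a 24-step sequence
    p, c = lstmp_step(x, p, c, W, R, b, W_proj)
print(p.shape, c.shape)  # (4,) (16,)
```

With these toy sizes, the recurrent parameters total 4·16·4 + 4·16 = 320 weights, versus 4·16·16 = 1024 for an unprojected LSTM with the same number of hidden units.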

Bidirectional Long-Short-Term Memory
The bidirectional LSTM is an extension of the LSTM model in which two LSTMs are applied to the input data. First, an LSTM is applied to the input sequence (forward layer). Second, the reversed form of the input data is fed into the LSTM model (backward layer). Applying the LSTM twice improves the learning of long-term dependencies and consequently enhances the model's accuracy. Likewise, a longer training time is required [40]. The BiLSTM model is shown in Figure 4.
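The bidirectional wiring itself is simple to sketch. In the toy below, written as an assumption-laden illustration rather than the paper's model, a plain tanh recurrence stands in for the LSTM cell; the forward and backward passes are then concatenated per time step exactly as the text describes.

```python
import numpy as np

def simple_rnn(xs, W, R, b):
    """Minimal recurrent pass; a tanh recurrence stands in for the LSTM cell."""
    h = np.zeros(R.shape[0])
    out = []
    for x in xs:
        h = np.tanh(W @ x + R @ h + b)
        out.append(h)
    return np.array(out)

def bidirectional(xs, params_fwd, params_bwd):
    """Forward layer reads the sequence as-is; the backward layer reads it
    reversed; the per-step outputs of both are concatenated."""
    fwd = simple_rnn(xs, *params_fwd)
    bwd = simple_rnn(xs[::-1], *params_bwd)[::-1]   # re-align after reversal
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
T, n_in, H = 24, 2, 8                  # 24 steps, 2 features, 8 hidden units (toy)
xs = rng.random((T, n_in))
make = lambda: (rng.standard_normal((H, n_in)),
                rng.standard_normal((H, H)) * 0.1, np.zeros(H))
out = bidirectional(xs, make(), make())
print(out.shape)  # (24, 16): each step carries forward + backward features
```

Running two passes is also why the text notes the longer training time: the work per layer roughly doubles.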

Convolutional Neural Network
A CNN comprises two layers: convolutional and max pooling. Each layer combines the input with a weight matrix of a pre-specified size, called a filter, that defines the number of nodes that share weights to compute a feature map. These layers operate by sliding the weight matrix over the input and computing the dot product between the input and the weight matrix. The CNN model is shown in Figure 5, and the CNN functions are presented in detail in ref. [18]. The max pooling layer computes the maximum value of the selected pool of adjacent neurons from the convolutional layer. Combining a convolutional and a max pooling layer ensures that the output of the max pooling layer is invariant to shifts in the input data, which is a beneficial property for processing temporal series data. Finally, the Rectified Linear Unit (ReLU) activation function is used to compute the final prediction [41].
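The three operations described above reduce to a few lines for a univariate series. This is a hand-rolled sketch, not the trained network of this paper; the filter values are arbitrary examples.

```python
import numpy as np

def conv1d(x, w, b=0.0):
    """'Valid' 1-D convolution: slide filter w over x and take dot products."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])

def max_pool1d(x, size):
    """Non-overlapping max pooling over windows of the given size."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([1.0, 2.0, 3.0, 4.0])
fm = conv1d(x, np.array([1.0, 0.0, -1.0]))            # a simple difference filter
print(fm)                                             # [-2. -2.]
print(max_pool1d(np.array([1.0, 5.0, 2.0, 4.0]), 2))  # [5. 4.]
print(relu(fm))                                       # [0. 0.]
```

The pooling step is what provides the shift invariance mentioned above: small displacements of a feature within a pooling window leave the window's maximum unchanged.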

Hybrid
The hybrid architecture uses the CNN layers described above for feature extraction on the input data and combines them with an RNN-based model to support sequence prediction. In this work, the RNN model corresponds to the LSTM [4]. Specifically, the CNN extracts the features from the inputs, and the LSTM architecture uses them to produce the forecast output [42]. The hybrid model is shown in Figure 6.
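The CNN-then-RNN data flow can be sketched end to end. The following toy pipeline is an assumption-heavy illustration, not the paper's architecture: a tanh recurrence again stands in for the LSTM stage, the filter and weights are random, and the linear head is a placeholder for the prediction layer.

```python
import numpy as np

def conv1d(x, w):
    """'Valid' 1-D convolution of series x with filter w."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def rnn_last(xs, W, R, b):
    """Return the final hidden state of a tanh recurrence (LSTM stand-in)."""
    h = np.zeros(R.shape[0])
    for x in xs:
        h = np.tanh(W @ x + R @ h + b)
    return h

rng = np.random.default_rng(1)
series = rng.random(24)                                # 24 hourly PV samples (toy)
feat = np.maximum(conv1d(series, rng.standard_normal(3)), 0.0)  # CNN stage + ReLU
H = 4
h = rnn_last(feat[:, None],                            # features fed step by step
             rng.standard_normal((H, 1)),
             rng.standard_normal((H, H)) * 0.1,
             np.zeros(H))
y_hat = rng.standard_normal(H) @ h                     # linear head: next-step power
print(feat.shape, h.shape)  # (22,) (4,)
```

The design choice mirrors the text: the convolutional stage summarizes local temporal patterns, and the recurrent stage models how those summaries evolve across the sequence.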

Hyperparameters Optimization
Before DL network training, it is necessary to initialize the model's hyperparameters to ensure the prediction model's performance. However, selecting hyperparameters based on experience alone is not practical since it involves trial-and-error attempts that require substantial time and computational cost and does not ensure the maximum performance of the models [24]. Optimization refers to finding the maximum or minimum of the objective function over a given set of parameter combinations. Therefore, each prediction method requires hyperparameters for its training stage. BO is an algorithm based on a probabilistic model that finds the minimum of the loss function of the DL model. The algorithm proceeds as follows:

• Randomly sample hyperparameter values.
• Observe the performance of the model.
• Fit a Gaussian process based on the previous observations.
• Calculate the mean of this Gaussian process as an approximation of the loss function.
• Use an acquisition function to select the next point of the hyperparameter space to explore.
• Fit the model, observe the output (model performance), and iterate over the same process until reaching the maximum number of iterations.
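The loop above can be sketched end to end with a Gaussian-process surrogate. The following toy assumes a single hyperparameter on [0, 1] and a synthetic stand-in for the objective (in practice, the objective is the trained DL model's validation loss); a lower-confidence-bound rule plays the role of the acquisition function.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Gaussian-process posterior mean and variance at x_new."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_new)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    var = 1.0 - np.sum(K_s * (K_inv @ K_s), axis=0)
    return mu, np.maximum(var, 1e-12)

def loss(lr):
    """Synthetic stand-in for the validation loss of a trained model."""
    return (lr - 0.35) ** 2 + 0.05 * np.sin(20 * lr)

rng = np.random.default_rng(0)
x_obs = rng.uniform(0, 1, 3)               # step 1: random initial hyperparameters
y_obs = loss(x_obs)                        # step 2: observe model performance
grid = np.linspace(0, 1, 200)

for _ in range(15):                        # step 6: iterate to the budget
    mu, var = gp_posterior(x_obs, y_obs, grid)   # steps 3-4: fit GP, posterior mean
    lcb = mu - 2.0 * np.sqrt(var)          # step 5: acquisition (lower conf. bound)
    x_next = grid[np.argmin(lcb)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, loss(x_next))

best = x_obs[np.argmin(y_obs)]
```

Because each observation shrinks the posterior variance around the evaluated point, later proposals concentrate in promising regions instead of sampling blindly, which is exactly how BO reduces the number of expensive training runs.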
The algorithm uses the information from the previous executions to inform the following ones, which allows for narrowing the search for hyperparameters throughout the process [43]. Bayesian learning offers advantages over the conventional maximum likelihood approach to training. In the maximum likelihood approach, a single weight vector that minimizes the error function is found. In contrast, the Bayesian scheme considers a probability distribution over the weights, described by a prior distribution p(W) that is modified once the data D are observed. The process is expressed by Bayes' theorem (1) [21], as follows: p(W|D) = p(D|W)p(W)/p(D) (1). The expressions for the prior p(W) and the likelihood p(D|W) are needed to evaluate the posterior distribution. The prior over the weights should reflect the knowledge, if any, about the mapping to be built. The model is described in ref. [21]. The procedure reasonably estimates the probability mass attached to the posterior. The development of the acquisition functions is described in ref. [44].
The data required for implementing the algorithm is divided according to the Hold-out method, which splits the data set into training, validation, and testing partitions. When using the Hold-out method, it is essential to give each partition a representative part of the entire data set; otherwise, the model would perform poorly on previously unseen data. Hold-out is an ML technique used to avoid overfitting or underfitting in the developed models [45].
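A minimal sketch of the Hold-out split described above; the 80/10/10 fractions and the fixed seed are illustrative assumptions (the paper's own splits are given later in the Methodology):

```python
import numpy as np

def hold_out_split(data, train_frac=0.8, val_frac=0.1, seed=42):
    # Shuffle once, then cut into training / validation / test partitions.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(train_frac * len(data))
    n_val = int(val_frac * len(data))
    train = data[idx[:n_train]]
    val = data[idx[n_train:n_train + n_val]]
    test = data[idx[n_train + n_val:]]
    return train, val, test
```

Shuffling before cutting is what keeps each partition representative of the whole data set.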

Methodology
The methodology followed in this work to predict photovoltaic energy is shown in the flowchart in Figure 7. As shown in Figure 7, the PV model obtains the generated power after collecting the weather conditions (solar irradiance and temperature) at a given location. Furthermore, data normalization is performed to get the initial data set. After that, the data set is divided into two: one for training and the other for validation. Here, the data enters different deep-learning models to perform the training, optimization, model selection, and verification processes. Next, the various stages involved in the process are developed.

Data Collection
As mentioned in the section on Renewable Energies in Ecuador, the Galapagos Islands constitute a feasible location for implementing isolated systems based on NCRE. Therefore, the sites on Santa Cruz Island (0°41'42" S, 90°19'44.399" W), corresponding to Data Set 1, and San Cristobal Island (0°54'0" S, 89°30'21.6" W), corresponding to Data Set 2, are taken as a case study (see Figure 8). Additionally, these islands are the most populated. This study uses the data provided by the Photovoltaic Geographical Information System (PVGIS) web portal [46], which comprises hourly irradiance and temperature values from 2018 to 2020. These data serve as parameters for obtaining the PV-generated power using the model presented in (2) and (3).

Photovoltaic Model
The output power of a PV module (P_PV), considering the incident irradiance, temperature, and construction material, is expressed as follows [4,47]:

P_PV = P_STC (G(β,α)/G_STC) [1 + γ(T_C − T_STC)]  (2)

T_C = T_a + G(β,α)(NOCT − 20)/800  (3)

where P_STC is the PV module output power under standard test conditions (STC), G(β,α) is the incident irradiance on the plane of the panels, G_STC is the incident irradiance under STC, γ is the power temperature coefficient, T_STC is the temperature under STC (°C), T_C is the cell temperature (°C), T_a is the ambient temperature (°C), and NOCT is the Nominal Operating Cell Temperature [4].

On the one hand, Figure 9a shows the PV power using Data Set 1 (location 1), where seasonal variation is observed from 2018 to 2020. On the other hand, due to the geographical proximity of both islands (Santa Cruz and San Cristobal), there are similar weather conditions. Consequently, the obtained PV power for location 2 reaches close values, which can be verified when analyzing the correlation graph (see Figure 9b) of the PV power achieved in location 1 and location 2. Therefore, this study considers Data Set 1 (location 1) for getting the optimized hyperparameters.
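The PV model in (2) and (3) can be sketched directly in Python. The default parameter values (P_STC = 250 W, γ = −0.0045 /°C, NOCT = 45 °C) are illustrative assumptions, not the parameters of the modules considered in this study:

```python
def pv_power(irradiance, temp_ambient, p_stc=250.0, g_stc=1000.0,
             gamma=-0.0045, t_stc=25.0, noct=45.0):
    # Eq. (3): cell temperature from ambient temperature (degC) and
    # plane-of-array irradiance (W/m^2), using the NOCT model.
    t_cell = temp_ambient + irradiance * (noct - 20.0) / 800.0
    # Eq. (2): module output power, derated by the temperature coefficient.
    return p_stc * (irradiance / g_stc) * (1.0 + gamma * (t_cell - t_stc))
```

Applying this function to the hourly PVGIS irradiance and temperature series yields the PV power series used as the forecasting target.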

Data Normalization
The data set is divided into 80% for training (PV_Train) and 20% for validation (PV_Test). The input data consists of PV power output ranging from 0 to the rated output power. When handling high-value data with the RNN-based models, a gradient explosion can negatively affect the model's performance, reducing the learning efficiency. Thus, the input data is normalized through min-max normalization within the interval [0, 1] to solve this problem. Later, the data is used to build the DL models [14]. The original data normalization is defined as follows:

PV_NORM,i = (PV_Train,i − PV_min)/(PV_max − PV_min)  (4)

where PV_NORM,i is the normalized PV power value at sample i, PV_Train,i is the PV power value in the data set at sample i, PV_min is the smallest PV power value in the data set, and PV_max is the largest PV power value in the data set.
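Equation (4) and its inverse (needed to report forecasts back in watts) can be sketched as:

```python
import numpy as np

def min_max_normalize(pv):
    # Eq. (4): scale the PV power series into [0, 1].
    pv = np.asarray(pv, dtype=float)
    pv_min, pv_max = pv.min(), pv.max()
    return (pv - pv_min) / (pv_max - pv_min), pv_min, pv_max

def denormalize(pv_norm, pv_min, pv_max):
    # Invert Eq. (4) to recover power in the original units.
    return np.asarray(pv_norm) * (pv_max - pv_min) + pv_min
```

Keeping PV_min and PV_max from the training data is what allows the network's normalized predictions to be mapped back to physical power values.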

Training
The LSTM, LSTMP, BiLSTM, GRU, CNN, and Hybrid models are trained in a supervised manner, and the BO is used to obtain the training hyperparameters. Using the Hold-out method, PV_Train is divided into 80% for training, PV_Train_t, and 20% for validation, PV_Train_v. The selection of samples is carried out randomly, as shown in Figure 10a. On the other hand, Figure 10b shows the evolution of the optimal hyperparameters for the hybrid method, where the values with the lowest loss are chosen. The Adam optimizer is used in all processes to minimize the training loss function. Table 1 shows the values of the optimal hyperparameters obtained for the six methods. For the LSTMP method, Var.1 and Var.2 correspond to the output and input projector sizes, respectively. For the CNN and Hybrid methods, Var.1 and Var.2 correspond to the filter size and the number of filters, respectively. It is worth noting that the hybrid method requires fewer values for feature extraction than the CNN method, which does not include the RNN layer.

Performance Indicators
Performance indicators are used to quantify the results of the different methods. As in many works, the Root Mean Square Error (RMSE) (5), Mean Absolute Error (MAE) (6), Mean Absolute Percentage Error (MAPE) (7), and correlation coefficient (Cor. Coef.) (8) are used. These indicators have been defined in ref. [4]; however, they are recalled here for the sake of completeness.
RMSE = √((1/n) ∑_{i=1}^{n} (y_i − ŷ_i)²)  (5)

MAE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i|  (6)

MAPE = (100/n) ∑_{i=1}^{n} |(y_i − ŷ_i)/y_i|  (7)

Cor. Coef. = (1/n) ∑_{i=1}^{n} (y_i − μ_y)(ŷ_i − μ_ŷ)/(σ_y σ_ŷ)  (8)

where n is the number of samples, y_i is the real power value at sample i, ŷ_i is the predicted power value at sample i, μ_y and σ_y are the real power's mean and standard deviation, respectively, and μ_ŷ and σ_ŷ are the predicted power's mean and standard deviation, respectively [4].
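The four indicators can be computed in a few lines. Skipping zero-power samples in the MAPE (to avoid division by zero during night hours) is a common practical choice assumed here, not a detail specified in the paper:

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    # Eqs. (5)-(8) computed over paired real/predicted power samples.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))                     # Eq. (5)
    mae = np.mean(np.abs(err))                            # Eq. (6)
    nz = y_true != 0                                      # skip zero-power samples
    mape = 100.0 * np.mean(np.abs(err[nz] / y_true[nz]))  # Eq. (7)
    corr = np.corrcoef(y_true, y_pred)[0, 1]              # Eq. (8)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "Cor.Coef.": corr}
```

A perfect forecast yields RMSE = MAE = MAPE = 0 and a correlation coefficient of 1.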

Simulation Results
The presented DL techniques applied to forecast PV power are simulated in Matlab® using a computer with an Intel® Core™ i7-3770 processor and 8 GB of RAM. Figure 11 shows the models' training metrics for Data Set 1.

As seen in Figure 11, the error value decreases according to the optimized number of iterations. Additionally, the loss function value does not show fluctuations after the optimized number of epochs, which implies that the parameters are already well-learned for prediction. Moreover, the CNN model, shown in Figure 11e, presents a smoother training curve than the other models since the filter size is applied to the input data, in contrast to the RNN-based models that deal with the temporal changes of the PV power. The methods reached a loss value close to 0, implying a precise model and minimizing the forecast error.
In addition, Figure 12 shows the actual and predicted PV power achieved in one week using Data Set 1 through the LSTM, LSTM Projected, BiLSTM, GRU, CNN, and Hybrid methods. It is worth noting that all methods present an excellent approximation between the actual and forecasted PV power.
Furthermore, Figure 13 shows the correlation graph between the real and predicted PV power for all methods for the entire validation period using Data Set 1.
As shown in Figure 13, the results imply a significant linear dependence between the variables. Similar results are obtained with the LSTM, LSTMP, and GRU methods. However, the results achieved by the BiLSTM, CNN, and Hybrid methods show improved performance since the predicted values are closer to the actual ones. These results demonstrate that all methods with optimized hyperparameters are efficient for the forecast task.
Performance indicators are further used to assess the quantitative advantages of each method. As evidenced in Table 2, all the analyzed methods manage to make an efficient short-term prediction of the PV power. The results presented in Table 2 show that the RMSE value is small, indicating a high prediction precision. Furthermore, the MAE and MAPE metrics indicate a low difference between the predicted and actual values. On the other hand, the obtained Cor. Coef. value is close to 1, which means a significant linear dependence between the real and predicted values.
Conversely, Figure 14 shows the correlation graph considering Data Set 2. In the same way as with the results obtained with Data Set 1, the models present the same accuracy in the forecast. Again, the BiLSTM, CNN, and Hybrid methods show improved performance.
Moreover, the performance indicators for Data Set 2 are presented in Table 3. It can be noticed from Table 3 that the obtained error values are low and similar to those obtained with Data Set 1. Furthermore, Bayesian optimization demonstrated that it was capable of getting optimized hyperparameters for the six DL models, which can be used in the forecasting task in the isolated areas used as case studies. Due to the low error obtained in the forecast for both data sets, the methodology applied in this work demonstrated that it was capable of producing precise DL models.
Additionally, Figure 15 compares the obtained RMSE values of all models for both data sets. It can be seen in Figure 15 that the LSTM model gets a lower value than BiLSTM, indicating that the characteristics of the temporal series do not need to be learned in the backward direction. Both models better captured the features of the temporal changes, in contrast with GRU, CNN, and Hybrid, which get higher RMSE values in the forecast.
Moreover, Table 4 compares the training time of all models to evaluate the computational performance, whereas Figure 16 shows a visual representation of these values. As can be seen in Table 4 and Figure 16, the LSTMP, GRU, CNN, and Hybrid models obtain the lowest training times. In contrast, the LSTM and BiLSTM models present higher training times. Specifically, the BiLSTM model got the highest training time since this method uses backward and feedforward layers (two LSTM cells) to learn. However, since it obtains higher RMSE values, the double-layer feature is unnecessary for the data type used in this work.
It is worth noting that, due to the LSTM model achieving the lowest RMSE value in the forecast and training time, the model developed with the optimized hyperparameters can adequately learn the temporal changes of the photovoltaic power in both locations under study, in contrast with the other models.
Furthermore, Table 5 compares the LSTM and GRU methods' results and their hyperparameters. As seen in Table 5, the BO reduced the number of epochs for both methods, which implies a reduction in computational cost. Furthermore, for the LSTM method, the number of hidden units (NHU) is decreased, resulting in reduced network complexity. However, the NHU is increased for the GRU method, which implies that the model requires more hidden units to learn the PV power's temporal changes. Finally, it is worth noting that the methods with optimized hyperparameters achieve lower error values and higher correlation coefficients. Therefore, the BO allows improved DL methods and more accurate models with reduced hyperparameter values and computational costs.


Conclusions
This work presents six short-term PV power forecasting models based on deep learning techniques. The study has been developed for two geographically close, remote areas (Santa Cruz and San Cristobal Islands) in the Galapagos Islands of Ecuador. The proposed methodology has proven efficient for the LSTM, LSTMP, BiLSTM, GRU, CNN, and Hybrid training tasks, based on the prediction accuracy obtained and the decreased computational cost. Nevertheless, the procedure could be used to forecast PV power in other areas that meet this prerequisite. Furthermore, the performance indicators for all techniques have shown very low RMSE values and excellent correlations of over 99% between the actual PV power and the resulting forecast values in both data sets.
In addition, Bayesian optimization has helped to obtain the necessary hyperparameters for model training, reducing the error in the forecast task and the training time.Finally, simulation results have demonstrated that LSTM, LSTMP, BiLSTM, GRU, CNN, and Hybrid methods are efficient in the short-term forecasting of PV power.Nevertheless, LSTM has improved performance in learning temporal changes.Therefore, the results can be used as a reference for implementing remote solar PV systems in the Galapagos Islands.

Figure 1. Electric energy generation distribution in the Galapagos Islands.


Figure 6. Hybrid deep learning network model.


Figure 9. (a) Photovoltaic power of Santa Cruz Island from 2018 to 2020. (b) Correlation of the photovoltaic power between Santa Cruz and San Cristobal islands.


Figure 10. (a) Hold-out of the data set. (b) Bayesian optimization of Hybrid hyperparameters.

Figure 15. RMSE comparison for the studied models.


Figure 16. Training time for DL models.

Table 2. Performance metrics results from Data Set 1.


Table 3. Performance metrics results from Data Set 2.

Table 4. Training time (min) per method for Data Set 1 and Data Set 2.