Hybrid Intelligent Modelling in Renewable Energy Sources-Based Microgrid. A Variable Estimation of the Hydrogen Subsystem Oriented to the Energy Management Strategy

This work deals with the prediction of variables for a hydrogen energy storage system integrated into a microgrid. Because this kind of system behaves nonlinearly, traditional techniques are not accurate enough to generate good models of the system under study. A hybrid intelligent system, based on clustering and regression techniques, has therefore been developed and implemented to predict the power, the hydrogen level and the hydrogen system degradation. The hybrid intelligent model was created and validated over a dataset from a lab-size microgrid. The achieved results show better performance than other well-known classical regression methods, allowing the hydrogen consumption/generation to be predicted with a mean absolute error of 0.63% on the test dataset with respect to the maximum power of the system.


Introduction
In recent years, climate change and the environment have become topics of great interest for different sectors. One of the main reasons is the occurrence of recent natural disasters, which are linked to these issues [1]. It is urgent to make decisions to mitigate these unwanted events. Any action is important; however, to stop the deterioration of the environment, governments need to become aware of the issue and take forceful action to address it [2]. The use of renewable sources plays a fundamental role in the new global energy model [3].
Despite the benefits of renewable energy sources, the dependence of renewable production technologies on environmental conditions means that a net zero power balance cannot be guaranteed at all times. In this sense, the hybridization of renewable energy sources and the use of energy storage systems are feasible solutions [4,5]. There are many energy storage solutions, from traditional examples such as water pumping [6] to the most modern ones, including hydrogen and new battery approaches [7]. However, all the different alternatives have limitations, depending on the final use [8]. Battery banks are usually designed to absorb the transients in the power balance, so they can be considered a short-term
Several methods exist for modelling these systems with the aim of improving on the solutions presented in the scientific literature; however, their performance can differ considerably depending on the method implemented in each case. Some of the most commonly used techniques involve Multiple Regression Analyses [34]. Nevertheless, these methods, or similar ones with small changes, present some limitations [35]. One of the reasons for their poor performance is the nonlinearities intrinsic to the system. Many alternatives for accomplishing the modelling process are based on intelligent techniques [36]. These techniques solve the nonlinearity problems in many cases [37]. Despite this, the problem may persist, depending on the nature of the nonlinearities. When this occurs, one possible solution that gives very satisfactory results is to break the problem down into areas with similar and/or linear behaviour [38,39].
Taking into account the above explanation, this paper deals with the prediction of the variables in the hydrogen subsystem inside a renewable microgrid based on the EMS concept and development.
With regard to the implementation of the EMS, there are four important variables to predict: the power from/to the hydrogen subsystem, the degradation of the electrolyzer and the fuel cell, and the hydrogen level in the storage tank. This research uses one hybrid intelligent model to predict each of these variables. The model proposed by the authors includes all electrical and physical variables of the microgrid, which enables the identification of the interrelationship between the different variables of the microgrid and the main parameters of the hydrogen-based system, i.e., operating power, hydrogen tank level and degradation of fuel cell and electrolyzer. The modelling methodology employed allows the entire operating range of the system to be modelled, which increases the quality of the model with respect to linearized solutions around a single working point. Table 1 summarizes the main characteristics of the scientific literature. In order to highlight the novelty of this work, the last row presents the main contributions of the authors' proposal.
Research gap and contribution of the paper:
• Previous scientific works present a microgrid model as a compendium of individual models of each subsystem that makes up the microgrid. This restricts the quality of the whole microgrid model by not considering the cross interactions between elements.
• These contributions are based on simplifications of the model and linearization around a single working point based on physical parameters that are not easily measurable.
Based on the above:
• The model proposed by the authors includes all electrical and physical variables of the whole microgrid, which allows identifying the interrelationship between the different variables.
• All the variables involved in the model are easily measurable at a low cost.
• The developed model allows the entire operating range of the system to be modelled.

Materials and Methods
In this section, firstly, the installation under study is presented. Then, the model approach used is explained, including the data processing, and the algorithms.
The microgrid used in this work is located in the southwest region of Spain, at the facilities of the "Control and Robotics (TEP-192)" research group in the "La Rábida" Campus of the University of Huelva. The microgrid combines several types of renewable energy sources. In the first instance, there is a 16.2 kW photovoltaic plant, consisting of three 5 kW arrays of different technologies (monocrystalline, polycrystalline and thin film) and a two-axis solar tracker of 1.2 kW (photovoltaic system). The wind production comes from three wind turbines, two with a horizontal axis and one with a vertical axis, with a total installed power of 31.8 kW (wind turbine system). The microgrid also includes a hybrid energy storage system (supercapacitor, batteries and hydrogen) scaled according to the power of the network and based on the expected response time (energy storage system). The complete scheme of the microgrid with all the systems and their interconnection is shown in Figure 1 [40]. The dataset used by the authors to obtain the hybrid model corresponds to experimental tests carried out in the real microgrid presented in [41].
The architecture of the microgrid is characterized by the use of a mixed topology, which uses DC and AC buses to ensure the interconnection between equipment and the main electrical grid [42]. All of the generation and consumption subsystems are connected to the internal DC bus, supported by the supercapacitor and battery bank, allowing a two-way energy flow between the main electrical grid of the campus and the microgrid. The facility also has programmable power sources and loads, and three energy storage systems: a supercapacitor and battery bank, and a hydrogen loop made up of an electrolyzer (hydrogen production), fuel cell stacks (hydrogen consumption), as well as hydrogen storage in the form of pressurized gas or metal hydride compound. The interconnection between each of the microgrid subsystems and the different power buses is carried out using commercial and customized power electronics devices specially designed for the application [23].
The microgrid used in this work has two electrolyzers of different technologies, alkaline and polymer electrolyte membrane (PEM), both with similar production capacities and output pressures, 2 Nm³/h at 30 bar (Figure 2a,b, respectively). The electrolyzers incorporate all the components necessary for their operation, according to the manufacturer's conditions and the established power setpoint. The hydrogen generated can be stored as pressurized gas, in a hydrogen storage tank of 1 Nm³ (Figure 2c), or as a solid compound in the form of metal hydrides in two bottles of 1.5 Nm³ and two bottles of 5 Nm³ (Figure 2d). To carry out the inverse conversion, i.e., to generate electricity from the stored hydrogen, the microgrid includes a modular fuel cell system comprising four PEFC stacks of 3.4 kW (Figure 2e). The correct operation of the fuel cells, as well as the regulation of the working point, is carried out by means of control electronics and power electronics converters designed by the authors.
Based on the above description, the main parameters of the subsystems that make up the microgrid are presented in Table 2.
Hybrid intelligent models have been used to divide the whole dataset into different clusters. Figure 4 shows the internal schema of this type of model. In this research, the inputs of the model were the values of the different variables at a specific instant (k), and the model predicted the value of the output at the next instant (k+1). The variables were represented as follows: battery voltage (V_Bat), battery state of charge (SOC_Bat), hydrogen level (HL_H2), battery power (P_Bat), hydrogen subsystem power (P_H2), power exchanged with the electrical grid (P_Grid), power from renewable origin (P_Ren), power demanded by the load (P_Load), battery degradation (D_Bat), electrolyzer degradation (D_Eles), and fuel cell degradation (D_FC). Note that the model was geared towards EMS implementation, not local controllers.
In this research four models were created, each predicting one output. All four were hybrid models, so the whole dataset was divided and the output was calculated by local models. Each local model was trained with only a fraction of the dataset. The procedure to create the hybrid model was carried out in four steps:
1. Clustering phase. First, the dataset was divided into clusters; since the optimal separation is usually not known, several hybrid topologies were created by dividing the dataset several times (creating two clusters, three clusters ...).
2. Regression phase. Several regression models were created for each cluster created in the previous phase. These models are known as local models, and there were as many as the number of regression techniques (and a different configuration for each algorithm).
3. Best local model selection. To select the best local model, it is necessary to compare the prediction error of the different models. In this research, the mean squared error (MSE) calculated with K-fold cross validation was selected. This validation procedure is represented in Figure 5, and the error value calculated was more realistic than with hold-out validation.
4. Best hybrid topology selection. Once the best local models were chosen, each hybrid topology (with two clusters, three clusters ...) was tested to calculate the prediction error of the whole model (not only of the local models).
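Once a hybrid topology has been built this way, a prediction could be obtained by routing each sample to its cluster and evaluating only that cluster's local model. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the two-dimensional samples and the toy linear functions standing in for the trained ANNs are all hypothetical.

```python
import math

def nearest_cluster(sample, centroids):
    """Return the index of the centroid closest to the sample (Euclidean)."""
    distances = [math.dist(sample, c) for c in centroids]
    return distances.index(min(distances))

def hybrid_predict(sample, centroids, local_models):
    """Route the sample to its cluster and evaluate that cluster's local model."""
    k = nearest_cluster(sample, centroids)
    return local_models[k](sample)

# Hypothetical two-cluster topology with made-up centroids.
centroids = [(0.2, 0.2), (0.8, 0.8)]
local_models = [
    lambda x: 0.5 * x[0] + 0.1,  # stand-in for the ANN trained on cluster 0
    lambda x: 2.0 * x[1] - 0.6,  # stand-in for the ANN trained on cluster 1
]

# The sample lies closer to the second centroid, so the second model is used.
prediction = hybrid_predict((0.9, 0.7), centroids, local_models)
```

Because each local model only ever sees samples from its own region, it can stay small while the overall model still covers the whole operating range.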

Data Processing
The dataset used in this research was extracted from several experimental tests made in the microgrid described above. First of all, only the samples with a working hydrogen subsystem were considered. Then, 5% of the data were randomly isolated to test the final model. The rest of the data was used to train the models and to choose the best hybrid topology.
The training data were normalized to be fitted in the 0-1 range. Each variable was normalized independently. Several clusters were created with the normalized data and, before training the local models, another 5% of the data from each cluster were separated from the training process to validate the performance of the different hybrid topologies.
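The split and the per-variable scaling described above can be sketched as follows. This is an illustration under assumptions: the helper names, the fixed random seed and the toy data are hypothetical, and the scaling limits are taken from the training rows only, which is one reasonable reading of the text.

```python
import random

def split_holdout(samples, fraction=0.05, seed=0):
    """Randomly isolate a fraction of the samples; return (kept, held_out)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_held = max(1, round(fraction * len(samples)))
    held = set(indices[:n_held])
    kept = [s for i, s in enumerate(samples) if i not in held]
    held_out = [s for i, s in enumerate(samples) if i in held]
    return kept, held_out

def fit_min_max(rows):
    """Per-variable minimum and maximum, computed on the given rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def normalize(rows, mins, maxs):
    """Scale each variable independently to the 0-1 range."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Toy data: 100 rows of two made-up variables (not the microgrid dataset).
data = [[float(i), 50.0 - 0.5 * i] for i in range(100)]
train, test = split_holdout(data, fraction=0.05)
mins, maxs = fit_min_max(train)
train_norm = normalize(train, mins, maxs)
```

The same `split_holdout` helper could be applied again inside each cluster to set aside the additional 5% used to validate the hybrid topologies.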

K-Means Algorithm
K-Means is one of the best-known clustering algorithms. In the present paper it was used to divide the dataset into clusters in order to create the local models. The algorithm assigned each sample to the cluster whose centroid was closest. The distance measure most typically used is the Euclidean distance [37,43,44]. The algorithm only needs to know how many clusters to create (the parameter K, defined by the user) and the dataset.
After the final centroids were defined, the algorithm took a very short time to assign new samples to their clusters. The training phase to achieve the final centroids involved the following procedure:
• The initial centroids were chosen randomly from the dataset.
• Each sample was assigned to a cluster (defined by its centroid) depending on how far the centroids were from the sample.
• Once each sample was assigned to a cluster, the center of the samples in each cluster was defined as the new centroid.
The last two steps had to be repeated until the centroids remained unchanged between two consecutive iterations. It was necessary to store the centroids in order to use the K-Means algorithm with new samples.
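The training procedure above can be sketched in a few lines. This is a plain illustration of standard K-Means with the Euclidean distance; the function names, the iteration cap and the toy samples are assumptions made for the example.

```python
import math
import random

def k_means(samples, k, seed=0, max_iter=100):
    """Return k centroids found by iterating assignment and update steps."""
    rng = random.Random(seed)
    # Initial centroids are chosen randomly from the dataset.
    centroids = [tuple(s) for s in rng.sample(samples, k)]
    for _ in range(max_iter):
        # Assignment step: each sample goes to its closest centroid.
        clusters = [[] for _ in range(k)]
        for s in samples:
            d = [math.dist(s, c) for c in centroids]
            clusters[d.index(min(d))].append(s)
        # Update step: the mean of each cluster becomes the new centroid.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:  # unchanged twice in a row: stop
            break
        centroids = new_centroids
    return centroids

def assign(sample, centroids):
    """Assign a new sample to the cluster with the closest stored centroid."""
    d = [math.dist(sample, c) for c in centroids]
    return d.index(min(d))

# Toy samples forming two well-separated groups (made-up values).
samples = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
           (5.0, 5.0), (5.1, 5.1), (5.2, 5.0)]
centroids = k_means(samples, k=2)
```

Only `centroids` has to be stored afterwards; `assign` then places a new sample with a single distance computation per cluster, which is why assignment is fast once training has finished.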

Artificial Neural Networks
This research used artificial neural networks (ANNs) to create the regression models. ANNs can be used to build regression or classification models. The algorithm is inspired by the biological neuron model and uses this basic component to create the model. A neuron can have several inputs, and each one has an associated weight that adapts the reaction of the neuron to that input.
The neuron has an activation function that uses the sum of all the weighted inputs (and the bias signal) to calculate the neuron output. The typical activation functions are step, linear, log-sigmoid and tan-sigmoid. The multilayer perceptron is a basic feed-forward ANN structure that organizes the neurons in layers. All the neurons in a layer share the same inputs, and their outputs feed the next layer. The inputs of the model are connected to the input layer, there are one or more hidden (internal) layers, and the output layer is connected to the output of the model [45][46][47].
ANNs are commonly used because they generalize well to data that were not present in the training set.
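The behaviour of a single neuron with the four activation functions named above can be illustrated as follows. The function name, the weights and the inputs are made up for the example; tan-sigmoid is written as the hyperbolic tangent, to which it is mathematically equal.

```python
import math

ACTIVATIONS = {
    "step": lambda z: 1.0 if z >= 0 else 0.0,
    "linear": lambda z: z,
    "log-sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tan-sigmoid": math.tanh,
}

def neuron(inputs, weights, bias, activation="tan-sigmoid"):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return ACTIVATIONS[activation](z)

# Made-up inputs and weights; the weighted sum here is zero.
x = [0.5, -1.0]
w = [0.4, 0.2]
outputs = {name: neuron(x, w, bias=0.0, activation=name) for name in ACTIVATIONS}
```

With a weighted sum of zero, the log-sigmoid returns 0.5 while the linear and tan-sigmoid functions return 0, which shows how the choice of activation shifts the output range of the neuron.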

Results
To cover all the results of this research, this section has been divided into several parts. Following the procedure used to create the hybrid model, the clustering results are presented first. Then, the regression results are shown and used to select the best regression configuration for each cluster; finally, the validation results are used to choose the optimal hybrid topology. A final test is also included to measure the error with real values.

Clustering Results
The K-Means algorithm was trained with random centroids as the initial condition; moreover, to ensure the optimal division, the training was repeated 20 times for each configuration. This technique was used to divide the dataset several times, creating from 2 to 10 clusters, but the procedure discarded a division when any group had fewer than 15 samples. Table 3 shows the number of samples in each created cluster. Although the procedure tried to create 10 hybrid topologies, only five different ones are shown; the global model, which had a single cluster with all the available samples, was also used.
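The discard rule can be expressed as a simple check on the cluster sizes of each candidate division. The sketch below is illustrative only: the function name and the label lists are hypothetical, and only the threshold of 15 samples comes from the text.

```python
from collections import Counter

MIN_SAMPLES = 15  # minimum cluster size stated in the text

def keep_topology(labels, k, min_samples=MIN_SAMPLES):
    """True if every one of the k clusters has at least min_samples samples."""
    counts = Counter(labels)
    return all(counts.get(c, 0) >= min_samples for c in range(k))

# Hypothetical cluster labels for two candidate divisions of 100 samples.
candidates = {
    2: [0] * 60 + [1] * 40,
    3: [0] * 50 + [1] * 40 + [2] * 10,  # third cluster is too small
}
kept = [k for k, labels in candidates.items() if keep_topology(labels, k)]
```

In this toy case only the two-cluster division survives, because the three-cluster division contains a group of 10 samples.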

Regression Results
An ANN was chosen as the regression algorithm. The internal configuration of all the models was basically the same: 11 neurons in the input layer, one hidden layer and one neuron in the output layer. The number of neurons in the hidden layer varied from 1 to 15. The activation function of the hidden neurons was set to tan-sigmoid, while the neuron in the output layer had a linear activation function.
There were several regression models: 15 different configurations for each of the four predicted outputs, with one hybrid model created per output. Each model had a single output, instead of one model with four outputs; this enabled a different hybrid configuration for each signal. Table 4, as an example, shows the mean squared error (MSE) achieved by the ANN model with eight neurons in the hidden layer when calculating the level of stored hydrogen. Table 5 shows the mean absolute error (MAE) for the degradation of the fuel cell module when ANNs with 13 neurons were used.
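The structure described above (11 inputs, one tan-sigmoid hidden layer, one linear output) can be sketched as a forward pass. This is an illustration, not the trained model: the weights are random, the function names are hypothetical, and the 11 input neurons are treated as pass-through inputs, which is an assumption.

```python
import math
import random

N_INPUTS = 11  # number of model inputs given in the text

def init_mlp(n_hidden, seed=0):
    """Random (untrained) weights for an 11-n_hidden-1 network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(N_INPUTS)]
               for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }

def forward(params, x):
    """Tan-sigmoid hidden layer followed by a single linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"]

def n_parameters(n_hidden):
    """Weights and biases: hidden layer (11*h + h) plus output layer (h + 1)."""
    return N_INPUTS * n_hidden + n_hidden + n_hidden + 1

params = init_mlp(n_hidden=8)
y = forward(params, [0.5] * N_INPUTS)
sizes = {h: n_parameters(h) for h in (1, 8, 15)}
```

Under these assumptions the smallest configuration has 14 adjustable parameters and the largest has 196, which gives an idea of how model capacity grows across the 15 configurations.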
All the models created were trained by updating the weight and bias values according to Levenberg-Marquardt optimization. The values shown in Tables 4 and 5 were calculated using 10-fold cross validation to ensure a more realistic error measurement than with hold-out.
Cl-1 | 2.3984 · 10⁻⁵ | 0.0011 | 6.9041 · 10⁻⁷ | 8.0724 · 10⁻⁷ | 4.0524 · 10⁻⁴ | 7.4945 · 10⁻⁴
Cl-2 | 3.1544 · 10⁻⁷ | 6.6584 · 10⁻⁵ | 2.6084 · 10⁻⁷ | 2.8948 · 10⁻⁷ | 1.3023 · 10⁻⁶
Cl-3 | 1.7973 · 10⁻⁵ | 1.9729 · 10⁻⁶ | 3.2842 · 10⁻⁵ | 9.6112 · 10⁻⁵
Cl-4 | 7.4043 · 10⁻⁴ | 3.6455 · 10⁻⁷ | 7.6656 · 10⁻⁹
Cl-5 | 1.9621 · 10⁻⁵ | 1.5924 · 10⁻⁴
Cl-6 | 1.5676 · 10⁻⁸
Table 5. Mean absolute error (MAE) using an ANN model with 13 neurons in the hidden layer to predict the degradation of the fuel cell module.
The best regression model for each cluster was selected depending on the value of the MSE obtained in the previous step. Table 6 shows the lowest MSE for each cluster to predict the degradation of the electrolyzer, and Table 7 shows the lowest values needed to calculate the power from/to the hydrogen subsystem. Since the lowest MSE was obtained with different models, Tables 8 and 9 show the configuration of each local model that produced the MSEs shown in Tables 6 and 7. The MSE is the error measure most typically used to compare the performance of regression techniques.
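Selecting a configuration by its cross-validated MSE can be sketched as follows. The sketch is deliberately simplified and makes several assumptions: the folds are interleaved rather than random, the data are synthetic, and two trivial stand-in models (a constant and a least-squares line) take the place of the 15 ANN configurations trained with Levenberg-Marquardt.

```python
def k_fold_mse(xs, ys, fit, n_folds=10):
    """Mean squared error over all held-out samples of an n_folds split."""
    n = len(xs)
    squared_errors = []
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))  # interleaved fold
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)  # train on the other folds only
        squared_errors += [(model(xs[i]) - ys[i]) ** 2 for i in sorted(test_idx)]
    return sum(squared_errors) / len(squared_errors)

def fit_mean(xs, ys):
    """Stand-in model 1: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Stand-in model 2: least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

# Synthetic one-dimensional data (not the microgrid dataset).
xs = [i / 49 for i in range(50)]
ys = [2.0 * x + 0.5 for x in xs]

scores = {"mean": k_fold_mse(xs, ys, fit_mean),
          "line": k_fold_mse(xs, ys, fit_line)}
best = min(scores, key=scores.get)
```

Every sample is used for testing exactly once, so the resulting MSE depends less on one particular split than a single hold-out estimate would.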

It is necessary to highlight that the previous models were trained using K-fold to calculate the presented errors and, with this validation technique, several models were created for the same training data. Then, once the best algorithm was chosen for each cluster, and also for the global model, a new regression model was trained with the selected configuration, using all the training data available for each local model.
Table 8. Model configuration to predict the degradation of the electrolyzer.
Global | Hybrid Model (Local Models)
Cl-5 | ANN10 | ANN1
Cl-6 | ANN1
Table 9. Model configuration to predict the power of the hydrogen subsystem.
Global | Hybrid Model (Local Models)

Validation Results
A different dataset was used to validate the model and to choose the best hybrid topology. This dataset was isolated from the beginning of the training process, and none of these data were used in the previous models' creation procedure. To select this validation dataset, it was necessary to take into account that it should have data from all the clusters.
This validation dataset has been used to test all the possible hybrid model topologies, and also the global model. Tables 10-13 show the validation results for the model that predicts the electrical power generated/consumed by the hydrogen subsystem, the degradation of the fuel cell module and electrolyzer, and the level of stored hydrogen. These tables include the MSE, MAE and normalized mean squared error (NMSE), and the best hybrid topology for each predicted variable was chosen as the one with the lowest approximation error.
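The three error measures can be computed as shown below. The MSE and MAE follow their usual definitions; for the NMSE several conventions exist, and the sketch assumes the MSE divided by the variance of the measured values, which may differ from the definition used by the authors. The sample values are made up.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def nmse(y_true, y_pred):
    """MSE divided by the variance of y_true (assumed normalization)."""
    mean_t = sum(y_true) / len(y_true)
    variance = sum((t - mean_t) ** 2 for t in y_true) / len(y_true)
    return mse(y_true, y_pred) / variance

# Made-up measured and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
errors = {"MSE": mse(y_true, y_pred),
          "MAE": mae(y_true, y_pred),
          "NMSE": nmse(y_true, y_pred)}
```

With this normalization, an NMSE close to 0 indicates a good fit, while a value of 1 corresponds to a model that does no better than predicting the mean of the measured values.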
The best hybrid configuration, as shown in Tables 10-13, is a hybrid model that divides the dataset into two different clusters. The final configuration for the four created models used two different artificial neural networks to predict the power of the hydrogen subsystem, the level of the stored hydrogen and the degradation (of the electrolyzer and also of the fuel cell module).
Additionally, the proposed model was validated with experimental results and also compared with a reference model taken from the literature (Figures 6-8). The reference model presented in [48][49][50] has been considered for the hydrogen generation/consumption ratio and [51] for electrolyzer and fuel cell degradation. Several references were needed for the comparison because the authors' proposal reflects the behaviour of a whole hydrogen system made up of an electrolyzer, hydrogen storage and a fuel cell, while the references found in the literature address only one subsystem separately.
To check the performance of each model, test data were used. In this case, the data were chosen randomly, regardless of the cluster. Table 14 shows the error measurements of the four predicted variables. Since the models were created with normalized data, the table shows the errors with the original values and those calculated with the normalized values (in parentheses).
Level of stored hydrogen: 1.9153 · 10⁻⁶ (2.6449 · 10⁻⁸); 6.1233 · 10⁻⁴ Nm³ − 0.0012% (7.1957
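Errors computed on normalized values and errors in original units are related through the range used for scaling. Assuming the 0-1 min-max normalization described in the data processing, an absolute error scales with the range and a squared error with the square of the range. The sketch below checks this on made-up numbers; the function names and values are hypothetical.

```python
def to_original_mae(mae_norm, v_min, v_max):
    """Convert an MAE on 0-1 normalized values back to original units."""
    return mae_norm * (v_max - v_min)

def to_original_mse(mse_norm, v_min, v_max):
    """Convert an MSE on 0-1 normalized values back to original units."""
    return mse_norm * (v_max - v_min) ** 2

def scale(value, v_min, v_max):
    """Min-max normalization of a single value."""
    return (value - v_min) / (v_max - v_min)

# Made-up variable range and values for the check.
v_min, v_max = 2.0, 10.0
y_true = [2.0, 6.0, 10.0]
y_pred = [3.0, 6.0, 8.0]

err = [p - t for t, p in zip(y_true, y_pred)]
err_norm = [scale(p, v_min, v_max) - scale(t, v_min, v_max)
            for t, p in zip(y_true, y_pred)]

mse_orig = sum(e * e for e in err) / len(err)
mae_orig = sum(abs(e) for e in err) / len(err)
mse_norm = sum(e * e for e in err_norm) / len(err_norm)
mae_norm = sum(abs(e) for e in err_norm) / len(err_norm)
```

Here `to_original_mse(mse_norm, v_min, v_max)` reproduces `mse_orig` and `to_original_mae(mae_norm, v_min, v_max)` reproduces `mae_orig`, so either form of the error can be reported once the scaling range is known.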

Conclusions and Future Research
This paper develops a hybrid intelligent model to predict the hydrogen subsystem behaviour considering the remaining electrical and physical variables that intervene in the microgrid performance. This allows identifying the interrelationship between the different variables of the whole microgrid and the parameters of the hydrogen-based subsystem, i.e., operating power, hydrogen tank level and degradation of fuel cell and electrolyzer. Four different models were developed in order to predict the four output variables: the hydrogen subsystem power (consumed by the electrolyzer or supplied by the fuel cell module), the level of stored hydrogen, and the degradation of the electrolyzer and the fuel cell module.
The accuracy of the model has been validated; all the models have been created with a hybrid topology that divides the model into two different local models. The final models were validated with a dataset isolated from the one used for training, and they obtained a maximum MSE of 3.8177 · 10⁻⁴ in the prediction of the degradation of the electrolyzer, and a minimum MSE of 1.9160 · 10⁻⁸, obtained in the model that predicts the hydrogen level in the tank.
The models were tested using a different dataset and they obtained an MSE of 619.9549 to predict the hydrogen subsystem power, 1.9153 · 10⁻⁶ to predict the level of stored hydrogen, 8.7600 · 10⁻⁸ to predict the degradation of the electrolyzer, and 1.8484 · 10⁻¹⁵ to predict the degradation of the fuel cell module. These errors were calculated with the original (non-normalized) values. The results show that this type of model can be used in these systems. To obtain a general model that could be applied to different specific systems, it would be necessary to enlarge the dataset so that all types of working points are covered in training.
Finally, the response of the proposed model was compared to those used as reference models [48][49][50][51]. For this purpose, all the reference models were simulated using the same input profile, making use of the power setpoint for electrolyzer and fuel cell obtained from the experimental data. As an output, the hydrogen generation/consumption ratios in electrolyzer and fuel cell (references [48][49][50]) and their respective degradation ratios (reference [51]) have been obtained and used for the validation process.
The created model covers both operations of the hydrogen subsystem: storage and consumption. Previous work [48] develops a model only for the electrolyzer and achieves an RMSE of 0.0957. On the other hand, studies such as [49], which works with artificial intelligence techniques, or [50], which uses classical modelling techniques to model the fuel cell behaviour, obtain an RMSE of 0.058 and an MSE of 0.37. The proposal developed by the authors models the behaviour of a whole hydrogen subsystem made up of the electrolyzer, hydrogen storage and fuel cell. Based on the obtained results, the authors' proposal is validated with an RMSE of 0.0603 and an MSE of 2.2919 · 10⁻⁵.
As future work, this procedure could be extended to the other subsystems of the microgrid. Moreover, it would be necessary to enlarge the dataset to include more working points with different powers and different hydrogen levels in the tank.

Conflicts of Interest:
The authors declare no conflict of interest.