Fuel Cell Hybrid Model for Predicting Hydrogen Inflow through Energy Demand

Hydrogen-based energy storage and generation is an increasingly used technology, especially in renewable systems, because the devices involved are non-polluting. Fuel cells are complex nonlinear systems, so a good model is required to establish efficient control strategies. This paper presents a hybrid model to predict the variation of the H2 flow of a hydrogen fuel cell. The model combines clustering techniques to obtain multiple Artificial Neural Network models, whose results are merged by Polynomial Regression algorithms to obtain a more accurate estimate. The proposed model uses the power generated by the fuel cell, the hydrogen inlet flow, and the desired power variation to predict the variation of the hydrogen flow needed for the stack to reach the desired working point. The proposed algorithm has been tested on a real proton exchange membrane fuel cell, and the results show that the model is highly precise, so it can be very useful for improving the efficiency of the fuel cell system.


Introduction
Climate change and the problems derived from pollution have caused society to look for new sources of energy, especially clean ones. Hybrid energy topologies, in which classical power plants and energy storage are combined, are among the most promising technologies. Hydrogen is one possible technology for storage systems: it can be produced with electrolyzers, stored, and then used in fuel cells to produce electrical power [1].
The main challenge of storage systems is, in general terms, improving efficiency. From a practical point of view, however, this efficiency is frequently measured in economic terms. Before that profitability objective can be reached, careful development is required. In recent years, many different proposals for energy storage systems have been made. For instance, in [2], an optimal nonlinear controller based on model predictive control (MPC) is proposed for a flywheel energy storage system, taking into account the constraints on the system states and actuators. The authors in [3] described a system for storing energy deep underwater in concrete spheres, which can also act as moorings for floating wind turbines. In [4], a deterministic and an interval unit commitment formulation are proposed for the co-optimization of controllable generation and PHES (pumped hydro energy storage), including a representation of the hydraulic constraints of the PHES.
Fuel cell systems are still being researched, and their performance could increase in the next few years. They are a high-reliability choice for steady applications such as electric vehicles or space applications and, moreover, a clean energy source [5]. Internally, a fuel cell is a combination of small individual cells that are connected to create a stack. In the stack, an electrochemical reaction produces electrical power when hydrogen (H2) is combined with oxygen (O2) in a specific environment. A control system ensures that the whole operation is performed safely, using several subsystems (cooling, gas conditioning, inlet system, etc.) [6].
In other clean technologies, such as photovoltaic or wind generation, power production depends on the availability of the primary energy (sun or wind). Fuel cells do not depend on any energy source other than H2, which allows them to be installed wherever necessary. A Proton Exchange Membrane Fuel Cell (PEMFC) is one of the most efficient technologies, as it has high energy density and low volume and weight compared with other fuel cells. It operates at a low temperature (less than 100 °C), which shortens the warm-up time at start-up. It covers a large power range and can therefore be adapted to many applications [5]. High-power fuel cells could be connected to the electrical network in power stations [7][8][9][10][11], while smaller systems could be used in mobile stations [12]. Apart from the output power, the energy stored (the power that can be produced over time) depends only on the amount of H2 available.
The output of the fuel cell is considered a non-regulated power, as it is produced by an electrochemical reaction. A system is needed to control the H2 and O2 inlet flows and to ensure the safety of the whole system. Moreover, the electrical output values can vary with external factors such as temperature and pressure [13][14][15][16]. To increase the efficiency of the fuel cell, it is very important to have a model that predicts the dynamic behavior of the system [1][17][18][19][20][21]. Previous research, such as [22], studied the variation of voltage and current from start-up to the steady-state working point. In [23], for example, it is shown that a change in the voltage affects the current; these variations must be taken into account when the fuel cell changes its power.
To avoid these variations in the output, a power system is attached after the fuel cell to set the voltage required by the application. The main output signal for controlling the fuel cell is therefore the amount of electrical power produced. The model presented in this article predicts the variation of the H2 inlet flow needed to bring the actual electrical power at the output of the fuel cell to the desired value. The model is based on input-output data techniques [22][24][25][26][27][28] that help the control system to be more accurate and efficient [29][30][31][32][33][34][35].
This paper is organized as follows: after the Introduction, the case study presents the physical system used in the research. The following section is the model approach, where the hybrid intelligent model and the algorithms used are described. The results section explains the configuration of the hybrid model and the performance values achieved with the prediction model. Finally, conclusions and future works are presented.

Case Study
Figure 1 shows a PEMFC diagram; in this type of cell, the electrolyte is in contact with the anode and the cathode. When there is an inlet flow of hydrogen through the anode and of oxygen through the cathode, ions (electrical charges) appear in the electrolyte [59]. At the anode, electrons are produced that flow through an external circuit to the cathode. The ions and the electrons are combined at the cathode to produce pure water as a residue of the whole reaction. Under normal conditions, a single fuel cell can generate 1.2 V. To create high-power systems, several single cells can be connected to each other, in series or in parallel, to form a stack.

Test Bench
To collect the data for the present research, a test bench in a laboratory has been used. The stack is a PEMFC FCgen-1020AVS from Ballard (Majsmarken, Hobro, Denmark) [60], and it is formed by 80 BAM4G polymeric single cells [61]. The stack has a porous carbon cloth anode and cathode, with catalyst based on platinum [62]. The whole stack has graphite plates between cells, and aluminum end plates, all of them joined by compression.
The maximum output power of the stack is 3.4 kW, with typical values for voltage and current of 45.33 V and 75 A. The stack has its own refrigeration system, which cools the stack by air. The pressure of the hydrogen inlet is around 1.36 bar. The stack also has an oxidant subsystem built according to the manufacturer's instructions [63]; the cooling system follows these instructions too. Figure 2 shows the diagram of the Balance of Plant (BoP) system, where all the subsystems are represented. In Figure 3, the real laboratory equipment is shown [64]. To perform different tests, a programmable electronic load was used (Amrel PLA5K-120-1200, San Diego, CA, USA). The monitoring system stores all the important values of the stack, such as temperature, voltage, current, and hydrogen flow, and it is described in [65,66]. Several previous works demonstrate the relevance of these values; for example, the authors in [67] studied the effects of the stack temperature on the cell operation. In [63], the purge process, in which hydrogen needs to be vented to the atmosphere periodically, is studied.
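As a quick consistency check on the stack figures quoted above, the typical operating point reproduces the rated power, and dividing the stack voltage by the 80 cells gives the average voltage per cell under load. This is a back-of-the-envelope calculation that assumes the cells are connected in series and share the voltage equally:

```latex
P = V \cdot I = 45.33\,\mathrm{V} \times 75\,\mathrm{A} \approx 3400\,\mathrm{W} = 3.4\,\mathrm{kW},
\qquad
V_{\mathrm{cell}} \approx \frac{45.33\,\mathrm{V}}{80} \approx 0.57\,\mathrm{V}
```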

Power System
As the fuel cell produces a non-regulated power, it is necessary to include a power system at the output of the cell to adapt the voltage and power values to those required by the specific application (Figure 4). The power system should provide a stable voltage; the control system would stabilize the variations in the fuel cell produced by changes in the output power. The main value in the system is therefore the output power, as the power system controls the voltage and current values.

Model Approach
This research is based on the basic model shown in Figure 5a, where the main variables are presented. The model uses the power generated by the fuel cell, the hydrogen inlet flow, and the desired power variation to predict the necessary variation of the hydrogen flow. As explained above, this research considers the amount of power generated by the fuel cell, not the values of current and voltage. With power converters, for example, the voltage at the output of the fuel cell can be set to the right value for a specific application.
Figure 5b shows the timing of the inputs and the output of the model. The output of the model is a future prediction; in this case, the model obtains the output with a horizon of five instants into the future. These five instants should be sufficient to adapt the hydrogen inlet control system of the fuel cell to the desired new output power.
This research uses a hybrid model. The hybrid intelligent system layout is shown in Figure 6, which shows the different local models created for the different clusters. Figure 7 shows the procedure to create the hybrid model. First, the clustering phase assigns each training sample to a cluster; the number of clusters is not known in advance, and it is selected in the last phase, after checking the results of all possible hybrid topologies. For each cluster, different configurations of regression techniques were trained. K-Fold cross validation is used to select the best local model, the one that produces the lowest prediction error (Figure 8).
A more realistic error measurement is achieved by using K-Fold instead of Hold-Out validation. With K-Fold, all the data are used as testing data, but at different times; in the case of 10-Fold cross validation, 10 models are created, each one with 90% of the data for training. Each of these 10 models uses the other 10% of the data for testing; at the end of the training process, the "Error log" (Figure 8) includes all the available data. Hold-Out validation would divide the data into training and testing sets in ratios like 60-40 or 70-30, and only one model would be created. The error would be calculated only with this testing data.
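The K-Fold procedure described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names (`k_fold_indices`, `cross_val_mse`) and the `fit`/`predict` callables are hypothetical and do not come from the original work, and the toy "model" simply predicts the training mean.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled, disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_mse(fit, predict, X, y, k=10, seed=0):
    """MSE over the full error log: every sample is tested exactly once."""
    error_log = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        error_log += [y[i] - predict(model, X[i]) for i in test_idx]
    return sum(e * e for e in error_log) / len(error_log)

# Toy usage (hypothetical model): always predict the mean of the training targets.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model

X = [[float(i)] for i in range(100)]
y = [2.0] * 100
folds = k_fold_indices(len(X), k=10)
cv_mse = cross_val_mse(fit_mean, predict_mean, X, y, k=10)
```

With 10 folds, each model is fitted on 90% of the samples and tested on the remaining 10%, so the error log ends up covering the whole dataset, which is the property that distinguishes K-Fold from a single Hold-Out split.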
The Mean Squared Error (MSE), a commonly used error measurement, is used to select the best local regression model, that is, to choose the best configuration of the regression techniques for each cluster, as the second and third steps in Figure 7 show. After that, it is necessary to select the best hybrid topology, that is, the number of local models in the final hybrid model. To choose this topology, a new testing dataset, isolated at the beginning from the training procedure described above, is applied to all the different hybrid models (with different topologies). Each hybrid model produces a different error depending on its internal number of clusters, and this type of hybrid testing ensures that the best topology is chosen.

K-Means Algorithm
One of the best-known clustering algorithms is the K-Means algorithm. This algorithm creates as many centroids as clusters and, at the end of the training phase, the centroids are placed in the center of their clusters. The data belonging to each cluster have similar characteristics [68,69].
To train the algorithm, in an initial phase, the centroids are randomly chosen from all the data. During the training, these centroids change their positions until each one is the center of its cluster. An iterative phase assigns every data sample to a cluster, taking into account its distances to the centroids. After all the samples have their "new" cluster, the centroids are recalculated (as the center of each cluster). The training finishes when the centroids have not moved since the previous iteration [70].
Once the algorithm is trained, a new sample only needs to compare its distance to each centroid to be assigned a cluster. The usual distance is the Euclidean one, so new data are assigned to their cluster very quickly [71]. The training phase could finish in a local minimum; to avoid this situation, it is usual to train several times with random initialization and choose the furthest centroids.
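The assignment and update steps described above can be written as a minimal standard-library sketch. It is not the implementation used in this research: the function names and the toy data are hypothetical, and the repeated random restarts mentioned in the text are omitted for brevity.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def k_means(data, k, seed=0, max_iter=100):
    """Plain K-Means: random initial centroids, then assign/update until stable."""
    centroids = [tuple(p) for p in random.Random(seed).sample(data, k)]
    for _ in range(max_iter):
        # Assignment step: every sample goes to its nearest centroid.
        labels = [nearest(p, centroids) for p in data]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # centroids did not move: training finished
            break
        centroids = new_centroids
    return centroids, labels

# Toy data (hypothetical): two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids, labels = k_means(data, k=2)
new_sample_cluster = nearest((4.9, 5.2), centroids)
```

Once the centroids are fixed, classifying a new sample costs only one distance computation per centroid, which is why cluster assignment at prediction time is fast.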

Artificial Neural Networks
An Artificial Neural Network (ANN) is an algorithm used in regression (and in classification) that is based on small units called neurons. The neurons are connected to each other by links, and each one calculates its output using an activation function. The input to this activation function is the weighted sum of the neuron's input links [72].
The ANN algorithm has the ability to generalize from experience: even for cases it was not trained on, the obtained results should be good [73]. This intelligent algorithm creates its own internal representation of a problem by adapting the links between the neurons [74].
The activation function is used to define the state of a neuron [75]. This state is defined in a normalized range for each ANN, normally [0, 1] or [−1, 1]. Depending on the inputs and the selected activation function, the neuron can be inactive (0 or −1) or active (1). In some cases, the neuron is neither inactive nor active; instead, its output represents an intermediate state within the range.
The internal configuration of the ANN is known as its topology. An ANN could be divided into different layers: input, hidden, and output layer, whose neurons have the same inputs and outputs. The topology defines the links between the neurons and layers, and also the activation functions [76].
The best-known ANN architecture is the Multi-Layer Perceptron, which has feedforward connections from the input layer to the output layer (through the hidden layers). Most of the neurons use a Tan-Sigmoid or Log-Sigmoid activation function; however, when the ANN is trained for regression, the output neurons can use a linear function instead.
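The forward pass of such a network, with one hidden layer of Tan-Sigmoid neurons and a linear output neuron, can be sketched as follows. The Tan-Sigmoid function is mathematically the hyperbolic tangent, so `math.tanh` is used. The weights and biases below are hand-picked, hypothetical values for illustration only; they are not trained parameters from this research.

```python
import math

def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-hidden-layer perceptron: tanh hidden neurons, linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    # Linear output: weighted sum of the hidden activations plus a bias.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical parameters: 3 inputs, 2 hidden neurons, 1 output.
hidden_weights = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [1.0, -1.0]
output_bias = 0.05

y_zero = mlp_forward([0.0, 0.0, 0.0], hidden_weights, hidden_biases,
                     output_weights, output_bias)
```

The linear output neuron lets the prediction take any real value, whereas the hidden activations stay within the normalized range [−1, 1].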

Polynomial Regression
The Polynomial Regression algorithm is a well-established technique used to obtain a regression function that is linear in its coefficients. It is based on a set of basis functions that are summed [77][78][79][80][81]. The number of inputs and the degree of the polynomial determine the number of these basis functions.
Different examples of the polynomial are shown in equations (1) and (2), both of them for two inputs, but with different polynomial degrees:
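For two inputs x_1 and x_2, first-degree and second-degree polynomials take the following standard forms. The coefficient names a_i are an assumed notation and may differ from the notation used in this work:

```latex
F(x) = a_0 + a_1 x_1 + a_2 x_2

F(x) = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + a_4 x_1^2 + a_5 x_2^2
```

The first form has three basis functions and the second has six, which illustrates how the degree affects their number for a fixed number of inputs.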

Data Processing
The BoP described above collects the data used in this research. The available data comprise 736,339 samples from six different experiments. However, as the model uses different time instants for the inputs and outputs, some samples cannot be used. The number of valid samples for the research was 736,309.
These data were divided into three different sets: one for training (441,785 samples), one to select the hybrid topology (147,262 samples), and one to test the final hybrid model (147,262 samples).
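The sample counts above can be cross-checked with a short sketch. Two points here are assumptions and not statements from the source: first, that each input/target pair is built as shown in the hypothetical `build_samples` function (current power, current H2 flow, power variation over the horizon, and H2 flow variation as target); second, that the 30 unused samples correspond to the five-instant horizon being lost at the end of each of the six experiments. The 60/20/20 split is likewise inferred from the reported set sizes.

```python
HORIZON = 5  # prediction horizon in instants, as stated in the Model Approach section

def build_samples(power, h2_flow, horizon=HORIZON):
    """Turn one experiment's time series into (inputs, target) pairs.

    Assumed layout (hypothetical): inputs are the current power, the current
    H2 flow and the power variation over the horizon; the target is the
    H2 flow variation over the same horizon.
    """
    samples = []
    for k in range(len(power) - horizon):
        inputs = (power[k], h2_flow[k], power[k + horizon] - power[k])
        target = h2_flow[k + horizon] - h2_flow[k]
        samples.append((inputs, target))
    return samples

# Assumption: each experiment loses `HORIZON` samples at its end.
raw_samples, experiments = 736_339, 6
valid_samples = raw_samples - experiments * HORIZON   # 736,309

# A 60/20/20 split reproduces the reported set sizes.
n_train = round(0.6 * valid_samples)                  # 441,785
n_select = round(0.2 * valid_samples)                 # 147,262
n_test = valid_samples - n_train - n_select           # 147,262

# Toy usage on a 7-sample series: only 7 - 5 = 2 pairs can be formed.
toy = build_samples([1, 2, 3, 4, 5, 6, 7], [10, 20, 30, 40, 50, 60, 70])
```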

Results
The results of this research could be divided into four different parts: the clustering, the modelling, the hybrid topology selection, and the test.

Clustering Results
As explained above, the K-Means algorithm is used for clustering. With this technique, the training data were divided to create nine different hybrid systems, each one with a different number of clusters (from 2 to 10). Moreover, the global model (without cluster division) is taken into account.
To ensure good results in the creation of the clusters, the K-Means algorithm was run with random initialization of the centroids, and the training was repeated 20 times. These repetitions help to avoid finishing the training in a local minimum. Table 1 shows the number of samples for each cluster.

Modeling Results
To train the regression algorithms (ANN and Polynomial), K-Fold cross validation on the training data was used. As described above, this validation divides the data K times, and the final error achieved is a more realistic measure of the behavior of the system.
The ANNs were configured with only one hidden layer, so the different architectures differ in the number of neurons in this hidden layer. They have three inputs and one output, with tan-sigmoid as the activation function for all the neurons except the output one, which has a linear function. The number of hidden neurons varied from 1 to 15.
Each ANN configuration was trained with the Levenberg-Marquardt optimization algorithm. Moreover, the criterion used to finish the training phase was gradient descent based on the MSE (mean squared error).
For the polynomial regression algorithm, the configuration is either first or second degree, with the same number of inputs and outputs as described above.
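To give a sense of the size of these polynomial models, the number of basis functions can be counted with a standard combinatorial formula. The sketch assumes a full polynomial, that is, one that includes every cross product up to the chosen degree; the function name is hypothetical.

```python
from math import comb

def n_basis_functions(n_inputs, degree):
    """Number of monomials of total degree <= degree in n_inputs variables."""
    return comb(n_inputs + degree, degree)

terms_deg1 = n_basis_functions(3, 1)   # 4: constant + 3 linear terms
terms_deg2 = n_basis_functions(3, 2)   # 10: adds 3 squares and 3 cross products
```

Under this assumption, a three-input model has 4 coefficients at first degree and 10 at second degree, far fewer than an ANN with many hidden neurons.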
To select the best algorithm and its configuration for each cluster, the MSE is used as the performance measure. Table 2 shows the MSE for each cluster, and Table 3 shows the best algorithm for each case.
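The per-cluster selection amounts to taking, for each cluster, the configuration with the lowest cross-validated MSE. In the sketch below, the function name and the MSE values are hypothetical and serve only as an illustration; they are not taken from Table 2.

```python
def select_best_models(mse_table):
    """For each cluster, pick the configuration with the lowest MSE."""
    return {
        cluster: min(candidates, key=candidates.get)
        for cluster, candidates in mse_table.items()
    }

# Hypothetical MSE values, for illustration only (not taken from Table 2).
mse_table = {
    1: {"Poly1": 2.0e-5, "Poly2": 2.4e-5, "ANN05": 3.1e-5},
    2: {"Poly1": 4.0e-5, "Poly2": 3.5e-5, "ANN13": 1.9e-5},
}
best = select_best_models(mse_table)
```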

Hybrid Topology Selection
To select the hybrid topology, that is, the number of local models (or the global one), a new data set isolated from the training phase described in the previous subsection is used. The models used these new data to calculate the output, and a new error log was obtained for each hybrid topology. Table 4 shows the MSE obtained for each model; the configuration with eight clusters is chosen, as it has the lowest MSE value. According to Table 3, the best hybrid model internally uses different algorithms: Polynomial (first and second degree) and Artificial Neural Networks (with 11, 13, 14, and 15 neurons in the hidden layer). All these models are different: although the same model configuration appears several times in the table, each model is specific to its cluster. Poly1^1 differs from Poly1^8 (the superscripts denote the clusters); likewise, ANN13^2 differs from ANN13^7, and ANN15^4, ANN15^5, ANN15^6, and ANN15^9 all differ from one another.
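The way a hybrid model routes a sample to a local model, and the way the topology is then chosen on the isolated selection set, can be sketched as follows. The class and function names, the centroids, and the toy local models are all hypothetical; they illustrate the mechanism and are not the models from Tables 3 and 4.

```python
import math

class HybridModel:
    """Routes each sample to the local model of its nearest centroid."""

    def __init__(self, centroids, local_models):
        self.centroids = centroids          # one centroid per cluster
        self.local_models = local_models    # one callable per cluster

    def cluster_of(self, x):
        return min(range(len(self.centroids)),
                   key=lambda j: math.dist(x, self.centroids[j]))

    def predict(self, x):
        return self.local_models[self.cluster_of(x)](x)

def hybrid_mse(model, X, y):
    return sum((t - model.predict(x)) ** 2 for x, t in zip(X, y)) / len(y)

def select_topology(candidates, X_sel, y_sel):
    """candidates maps a number of clusters to a trained HybridModel."""
    return min(candidates, key=lambda k: hybrid_mse(candidates[k], X_sel, y_sel))

# Toy data (hypothetical): y = |x|, so two local linear models beat one global constant.
global_model = HybridModel([(0.0,)], [lambda x: 1.0])
two_clusters = HybridModel([(-1.0,), (1.0,)], [lambda x: -x[0], lambda x: x[0]])
X_sel = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
y_sel = [2.0, 1.0, 1.0, 2.0]
best_k = select_topology({1: global_model, 2: two_clusters}, X_sel, y_sel)
```

Because the selection set is never used to fit the local models, the comparison between topologies is not biased towards the configuration with the most clusters.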

Test Results
After the hybrid configuration was chosen, the final hybrid model was tested with the final data set isolated from the beginning. Table 5 shows different error measurements for this configuration: MSE, NMSE (Normalized MSE), MAE (Mean Absolute Error), and MAPE (Mean Absolute Percentage Error). Figure 9 shows the fuel cell inlet flow. The green line was calculated with the actual flow and the simulated variation from the model presented in Table 5, a configuration with eight local models. The blue line is the real hydrogen inlet flow. Another simulation was made using random samples from another test. The graphical results (Figure 10) show that the error between the real and simulated data is usually less than 0.1.
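The four error measurements can be computed as in the sketch below. The definition of NMSE varies between authors; here it is assumed to be the MSE divided by the variance of the real values, which may differ from the normalization used for Table 5. The sample values are hypothetical.

```python
def error_metrics(y_true, y_pred):
    """MSE, NMSE, MAE and MAPE of a prediction (MAPE needs nonzero real values)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    variance = sum((t - mean_true) ** 2 for t in y_true) / n
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "NMSE": mse / variance,   # assumed normalization: variance of the real values
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }

# Hypothetical real and predicted values, for illustration only.
metrics = error_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

MAE is expressed in the same units as the predicted variable, while MSE is in squared units, so the two values are not directly comparable in magnitude.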

Conclusions and Future Works
In this research, a new model is developed to predict the variation in the inlet flow rate needed to change the output power of a fuel cell. The model predicts this variation with very good results, which allows the hydrogen inlet control system to be adapted to the desired working point.
As the fuel cell system is nonlinear, a hybrid model is chosen, combining clustering techniques with ANN and Polynomial regression algorithms. The results show that the best configuration is achieved with eight local models. These models are different topologies of ANN (with 11 to 15 neurons in the hidden layer) and Polynomial regression (first and second degree). With the test data used to check the performance of the final model, the error values obtained were 1.2 × 10^−5 for MSE and 5.1196 × 10^−4 for MAE.
As future work, combining the variation predicted by the present model with the control system of the fuel cell is under consideration. It is possible to create an adaptive-predictive control system with the model from this paper; this new control should adapt the fuel cell faster and better to the desired changes in the load power.