Application of Support Vector Machine to Obtain the Dynamic Model of Proton-Exchange Membrane Fuel Cell

An accurate model of a proton-exchange membrane fuel cell (PEMFC) is important for understanding this fuel cell’s dynamic process and behavior. Among different large-scale energy storage systems, fuel cell technology does not have geographical requirements. To provide an effective operation estimation of PEMFC, this paper proposes a support vector machine (SVM) based model. The advantages of the SVM, such as the ability to model nonlinear systems and provide accurate estimations when nonlinearities and noise appear in the system, are the main motivations to use the SVM method. This model can capture the static and dynamic voltage–current characteristics of the PEMFC system in the three operating regions. The validity of the proposed SVM model has been verified by comparing the estimated voltage with the real measurements from the Ballard Nexa® 1.2 kW fuel cell (FC) power module. The obtained results have shown high accuracy between the proposed model and the experimental operation of the PEMFC. A statistical study is developed to evaluate the effectiveness and superiority of the proposed SVM model compared with the diffusive global (DG) model and the evolution strategy (ES)-based model.


Introduction
The faster development and integration of renewable energies and electromobility technologies in the distribution grid have encouraged the necessity to find alternatives for power sources. One of the ways of decarbonization is the use of hydrogen. The hydrogen fuel cell consists of an electrochemical device that directly and continuously converts hydrogen fuel to electrical energy directly and constantly. This device is essential to tackle climate change and air pollution and switch to green energy. Within the different types of FCs, the PEMFC is considered the most exciting power source for many applications, such as transportation, mobile devices, heat, and power supplies for buildings and industry [1]. Its many advantages include low operating temperature, high power density, fast startup, quick response, and zero emissions. Due to its benefits and applications, industry and academia have conducted research and exploration to optimize the performance of the PEMFC. The PEMFC is a nonlinear, multi-variable and very complex system [2]. Due to its complicated process, a control system with good performance is required to regulate the voltage. Therefore, many researchers have focused their interest on the modeling study of PEMFCs.
The models developed not only describe the physicochemical phenomena in the fuel cells but also generate data-driven models based on the functional relationship between the current and the voltage of fuel cells [3]. The process of a fuel cell is usually characterized by the current-voltage curve, known as the polarization curve. There are three main types of polarization: activation polarization, ohmic (or resistance) polarization, and concentration polarization. The deviation of the cell potential from the ideal behavior is a direct result of the sum of these factors over the entire load range. Fuel cells have the best performance in the ohmic polarization region [4].
The FC model comparison based on artificial intelligence and electrical circuit models is shown in Table 1. The models presented in [1,3,[5][6][7][8][9][10][11][12][13] are not viable for analyzing the complete polarization curve of an FC, owing to only considering a linear steady-state response based only on the PEMFC ohmic polarization curve, ignoring the voltage-current dynamic response. On the other hand, most models contemplate an analytical approach based on equations that describe the physical system of the FC [3,9,10,[14][15][16][17][18]. Therefore, these approaches need several variables to evaluate the model, such as the operating absolute temperature of the fuel cell (T f c ), the operating current of the fuel cell (i f c ), the partial pressures of hydrogen and oxygen at the input channels of the fuel cell stack (P H2 and P O2 ), and the resistance of the membrane surface (R m ). However, due to implementation expenses, using multiple variables for its evaluation increases the development costs and the requirements of the high-processing device. Based on the above, data-driven models (or black box models) emerge as a promising strategy for modeling multivariate, nonlinear, and complex systems such as the PEMFC. Among the data-based models is the SVM. The main advantages of the SVM are its stability, robustness, generality, flexibility in determining the model structures, training capacity with limited data, and good predictive performance [13,19,20]. Therefore, this paper proposes a PEMFC model based on SVM, considering the current as the only input variable to the model, and the voltage as the output. The SVM is used to estimate the non-linear V-I dynamic characteristics of a PEMFC system in all regions.
Several papers related to the PEMFC model using an SVM have been found in the state-of-the-art review. In [13], the performance prediction of the PEMFC system of a commercially available electrical bicycle was investigated using an SVM and an artificial neural network. The results showed that the prediction performance of a PEMFC using an SVM is more effective than an artificial neural network. In [3], the authors proposed a hybrid model by combining a support vector machine (SVM) model with an empirical equation. The SVM model was used to predict PEM fuel cell voltages showing high accuracy even when the operating variables vary. In [21], a comparative study of random forest and support vector regression algorithms is presented to estimate the voltage of a solid oxide fuel cell. The results showed that the support vector regression is better than the random forest algorithm. However, the main drawback of the support vector regression algorithm is the small number of samples used in the validation phase. In [14], a Hammerstein model of a PEMFC stack is built using an SVM. The use of an SVM avoids complicated differential equations to describe the dynamic characteristics of a PEMFC stack, and it is appropriate to model the PEMFC compared to traditional methodologies. Finally, in [2] a review of the applications and contributions of machine learning algorithms in optimizing the PEMFC performance is presented. According to the review, machine learning algorithms such as SVMs have shown higher performance in constructing the data-driven model of the PEMFC. In brief, according to the review of the state of the art, most of the models based on machine learning do not consider the dynamic characteristic of the polarization curve, likewise, its validation is not performed with real FC data, and they have a high cost in their implementation. The proposed model overcomes these disadvantages with high estimation accuracy. The main contributions of this paper are the following:

•
The proposed model has been able to model the static and dynamic characteristics of the polarization curve in all operating regions. Two different FC current profiles have been used, with the aim to evaluate the generalization of the proposed model to predict the fuel cell voltage under different operating conditions.

•
We provide a method to model the PEMFC based on SVM, considering a significant reduction in the number of samples used in the training phase, compared with the number of samples used in the training phase of the models proposed in [18,22]. In the same way, the number of samples used in the validation phase is much higher than the number of samples used in the validation phase of the model proposed in [21]. • Real measurements were used in the training and validation phase of the SVM model. The data correspond to a commercial Nexa fuel cell power module, with a rated power up to 1.2 kW. • The proposed model is accurate. The results showed a high similarity between the voltage predictions obtained by the SVM model and the actual data, obtaining root mean squared errors (RMSEs) of less than 1%. Likewise, the root mean squared error obtained with the proposed model is 62% lower than the evolution strategy [18] and the diffusive model [22]. Therefore, the obtained results prove the effectiveness of the proposed FC model compared with other models.
This paper is organized as follows: Section 2 presents a short description of the concepts related with the SVM. The description of the proposed model is presented in Section 3. The validation of the proposed model is present in Section 4, and finally, conclusions are given in Section 5.

A Multi-Output Support Vector Machine
This method was proposed in [23]. The authors use a multi-output support vector regressor (M-SVR) to refer to their method. Let x ∈ R d be the input vector and y ∈ R Q , the output vector. The relationship between x and y is assumed to follow where W = w 1 , . . . , w Q , b = b 1 , . . . , b Q , with a vector w j and a constant b j for every output (j = 1, . . . , Q). The function φ(·) refers to a non-linear transformation to a higherdimensional space H, where H d.
First, given a dataset {x i , y i } n i=1 that is used to build the mapping from x to y, the purpose is to find the set of parameters {W, b} that minimize the functional where C is a regularization constant, and L(u i ) is known as the Vapnik − insensitive loss-function, with u i = e i e i , and As a loss function L(·), the authors of [23] use a differentiable quadratic loss with respect to a user-defined constant, , that takes into account all outputs to obtain each individual regressor w j and b j The solution to the above-mentioned problem can also be expressed in terms of the vector of coefficients β j for each output, which relates to the original vectors w j through . The problem to be solved can then be written as (see Ref. [23]) where K is a kernel matrix with entries k(x i , x i ) computed from a so-called kernel function k(x, x ). The kernel function generalizes the inner product between φ(x) and φ(x ). (In the sense that the kernel is not necessarily a Mercer kernel, this is one that can be written as an inner product. See [24].) Additionally, in expression (4), a = [a 1 , · · · , a n ] , and D a is a diagonal matrix with entries {a i } n i=1 . The terms {a i } n i=1 are computed using [23] where u i was defined before. An iteratively reweighted least squares (IRLS) procedure is proposed in [23] to solve (4), in order to calculate the parameters β = [β 1 , . . . , β Q ] and b.
Once we calculate the matrix β and the vector b, the predictionŷ for a new input vector x can be computed asŷ where k is a vector with entries given by {k( The kernel function k(x, x ) that is used in this paper is the radial basis function (RBF) kernel, given as where the parameter σ 2 is usually known as the bandwidth [24].

Proposed Model of the PEMFC Based on SVM
The proposed method is divided into two parts called the training phase and the validation phase. In the training phase, the matrix β and the vector b of the SVM are calculated, taking as input the real data of current and voltage from the Nexa® FC power module. These data are called training samples. A conceptual illustration of the SVM training phase is shown in Figure 1. With reference to Figure 1, the minimization block performs the calculation of β and b by the minimization of the error function e v between the measured v and the estimated v voltage using the set of the training samples. This minimization is formulated as an iterative procedure given by where β p and b p are the values at the iteration p. The value of η p is computed using a backtracking algorithm, in which, if J p+1 > J p , then η p is multiplied by a positive constant less that one, and the regressors β p+1 and b p+1 are computed again, until a decrease is achieved in J p+1 . The algorithm stops when J p+1 − J p < tolerance. In this paper, the tolerance is set to 10 −9 [23]. It is worth noting that the training phase is performed offline and it can take several seconds. In the validation phase, the estimation of the FC voltage is performed from the validation measurements. In an online estimation, a sample of voltage is estimated for each current sample. Connecting with the description given for the SVM in Section 2, d = 1 and Q = 1 (i.e., x and y are scalars). Taking into account the above, (6) can be re-written as where β is a column vector with N t rows, N t being the number of samples considered in the training phase, and b being a constant value. According to (9), the estimated voltage samples v are calculated given the vector β, the scalar b (both calculated in the training phase), and the measured current sample i. The kernel vector k i is computed using (7) using the current vector i t = [i 1 , . . . i k , . . . i N ] whose entries are the training samples of current, and a sample of real current i. In brief, for each sample of the current i, one sample of voltage is estimated, in this way, for N samples of the currents is formed a voltage vector whose entries are the estimated samples of voltage

Dataset Measured from the PEMFC
In this work, real data taken from the Nexa® fuel cell have been used, which produces unregulated DC power, up to 1.2 kW, from a hydrogen and air supply. LabVIEW software, which supplies a graphical user interface to the operational status and performance of the Nexa module [25], is provided by the Nexa power module. The experimental Nexa PEMFC data acquisition configuration can be observed in Figure 2, employed for training and verification of the SVM model. To accomplish rapid acquisition, data saving on its onboard hard drive, and long capture time, the LeCroy WaveSurfer 64Xs-A oscilloscope has been employed. The oscilloscope instantly acquires and keeps the data corresponding to the fuel cell current and voltage signals. Furthermore, a virtual instrument is designed through LabVIEW to create the current profiles via the DC electronic load control, utilizing its GPIB communication port as a constant current load. Thus, the maximum sampling limitation of the Nexa software is bypassed, performing sampling periods up to 20 µs.
Two FC load current profiles have been considered with the aim of testing the dynamic of the SVM model. Each profile contains the training and validation sets with different measured data. The first one was used in [22] to train and validate the diffusive global (DG) model. The second one was used in [18] to evaluate a PEMFC model based on an equivalent circuit, whose parameters were identified using an evolution strategy (ES) algorithm, and also used in [22] to evaluate the performance of the DG model. Figure 3 shows the instantaneous measurements of the current and voltage of the FC used to select the training samples of the SVM model. The data were sampled with a fixed step time of 200 ms for one hour. A total of 179, 988 current and voltage samples were obtained. The number of samples used for training has been substantially reduced to 9000 (5% of the initial data), taking samples every 2000 ms and considering only 50% of the rest of the data, to reduce the size of the vector β of the SVM model given by (9), which is appropriate to reduce the computational cost in the validation phase. A total of 50% of the samples were randomly selected.

Training Phase of the SVC Model with the First FC Load Current Profile
The hyperparameters of the SVM were set to the following values: C = 200, = 0.001, and σ = 1.5. These values were adjusted according to numerical experiences in the training phase of the SVM and showed a good predictive performance of the model. The root mean square error (RMSE) index defined by (10) has been used to quantify the performance of the SVM model.
where e k is the error between the measured and the estimated FC output voltages and N is the number of samples of the discrete signal v = [ v 1 , . . . , v k , . . . , v N ] . The RMSE obtained for the training samples is 0.3 V. The authors consider that this value confirms a satisfactory predictive performance in SVM training. Figure 4 shows the relative error between the measured and the estimated voltages, arranged in ascending order. As can be seen, the SVM model presents a good predictive performance due to 99% of the data presenting relative errors of less than 3%.

Validation of the SVM Model with the First FC Load Current Profile
The validation of the proposed model is carried out by the measurements shown in Figure 5, which is different from the one used during the training phase. This current profile is much more demanding than the one used for the validation stage because it has current step changes that are higher in magnitude. The validation data have a fixed time step of 200 ms and have been measured during a 20 min time window. Estimated voltage samples are calculated using Equation (9), where β is 9000 × 1 vector and b = 32.2729 V. The kernel vector k i is computed using (7), with x being the vector of training samples of the FC load current and x the validation sample from the FC load current. In a real implementation, each estimated voltage sample is computed from each real current sample. Therefore, the entries of the vector k i must be updated for each sample. The dimension of k i is 9000 × 1 .
The results obtained with the SVM model are compared with the results of the diffusive model presented in [22], using the following indices: R-square, relative error (RE), mean absolute error (MAE), and standard deviation (SD). The mathematical definitions of these indices are presented in Appendix A. The statistical analysis in Figure 6 shows that the proposed SVM model has a low error and a high R-square value compared with the diffusive global model.    Figure 8 shows the measurements used to select the set of training measurements for the SVC model, which correspond to the second FC load current profile. The data were sampled within a period of 200 ms in [18], which is 10 times higher than the sampling in the profile in Figure 3. However, to reduce the size of the vector β in (9), the data used to train the SVM were re-sampled within a period of 2000 ms, and then 50 % of the data were taken randomly to train the SVM. The total number of samples used to train the SVM model was 2957. According to numerical experience, the values of the hyperparameters of the SVM were adjusted to C = 5000, = 0.001, and σ = 4. The predictive performance of the SVM model was evaluated in terms of the RMSE index. For the training measurements, the RMSE obtained is 0.38 V. The authors consider that the predictive performance of the SVM model is satisfactory according to the obtained RMSE value. The relative error was calculated for all the samples used in the training. According to the obtained results, 97% of the relative errors calculated (i.e., 2868 values) are less than 3%. Figure 9 shows the results of the validation test. Notice the wide range of values considered for the FC load current in the upper part of Figure 9. The estimated samples of the FC voltage are calculated with (9), taking as input the vector β (2927 × 1 vector), the constant b = 23.17 V, and the kernel vector k x (x, x ) (2927 × 1 vector) calculated with (7) using the training samples of the input vector x and the validate sample x of the FC load current. The results were compared with the DG model proposed in [22] and the equivalent circuit of the PEMFC proposed in [22]. The results are shown at the bottom of Figure 9. The RMSE value was calculated for all the models. The RMSE of the DG model and the parameters adjusted by ES were 2.3273 V and 2.3116 V, respectively. Meanwhile, the RMSE for the SVM approach is 0.87 V. This value is equivalent to 37% of the RMSE obtained with the other approach. The statistical results of the proposed SVM model are detailed in Figure 10. These results are compared with those from the diffusive global model and those from the ES approach. The SVM model has the lowest error value, as it accurately represents the static and dynamic current-voltage relation in a PEMFC.

Validation of SVM Model with the Second FC Load Current Profile
Finally, Figure 11 shows the dynamic V-I characteristic of the PEMFC using the second profile of the FC load current. The horizontal axis corresponds to the validation samples of the current, and the vertical axis corresponds to the experimental and estimated samples of the voltage. Notice that the SVM model is able to capture the dynamic characteristic of the PEMFC. The maximum difference between the experimental and estimated voltage is 3 V and occurs when the current is 16.5 A.

Conclusions
In this paper, an accurate and efficient support vector machine-based model has been developed for proton-exchange membrane fuel cells. The important contribution of the proposed modeling approach is the considerable reduction of the samples used in the training phase of the SVM model, compared with other recent PEMFC modeling mechanisms, such as the diffusive global model and evolution strategy-based model. This reduction in the number of samples required in the training phase reduces the dimension of the model vectors and, therefore, reduces the computation time in the validation phase. Two FC load current profiles have been considered to evaluate the predictive performance of the proposed PEMFC model based on an SVM. The real measurements were taken from the Nexa® power module with a DC power up to 1.2 kW. The performance of the proposed model was carried out by computing the RMSE between the estimated voltage samples and the measured ones. The RMSE obtained for the first profile was 1 V, and for the second profile was 0.87 V. These results are lower than those obtained with the diffusive model, which are 1.05 V for the first profile and 2.32 V for the second profile, and the evolution strategy model, which is 2.31 V for the second profile. Notice that, for the second profile, the error obtained with the proposed model is 62% lower than with the other models. According to the performance indices, the predictive performance of the proposed SVM model is better than the DG and ES models.
Once the methodology has been validated, it would be possible to use the SVM model to predict the voltage of the PEMFC. Results show that the proposed SVM model is able to capture the dynamic characteristic of the PEMFC, validating the use of the SVM model in further real applications related to its control systems. In future work, it is recommendable to speed up the convergence in the training process by optimizing the amount of data used.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A
Equations (A1)-(A4) are the mathematical definitions of the indices R-square, RE, MAE, and SD, respectively, used to compare the result of the proposed method with other PEMFC models proposed in the literature.
where v k represent the estimated voltage, v k is the measured voltage, e k is the difference between v k and v k , v kmean is the mean value of the measured voltage, N is the number of samples, and k represent the kth sample.