A Multi-Stage Fault Diagnosis Method for Proton Exchange Membrane Fuel Cell Based on Support Vector Machine with Binary Tree

Abstract: The reliability and durability of proton exchange membrane (PEM) fuel cells are vital factors restricting their applications. Therefore, establishing an online fault diagnosis system is of great significance. In this paper, a multi-stage fault diagnosis method for the PEM fuel cell is proposed. First, electrochemical impedance spectroscopy tests are conducted under various fault conditions. Specifically, common recoverable faults, such as flooding, membrane drying, and air starvation, are included, and different fault degrees, from minor through moderate to severe, are covered. Based on this, an equivalent circuit model (ECM) is selected to fit the impedance spectra with the hybrid genetic particle swarm optimization algorithm, and fault features are then determined by analyzing each model parameter under different fault conditions. Furthermore, a multi-stage fault diagnosis model is constructed with the support vector machine with the binary tree, in which fault features obtained from the ECM are used as the characteristic inputs to realize online fault classification (including fault type and fault degree). The results show that the accuracy of the basic fault test and the subdivided fault test can reach 100% and 98.3%, respectively, which indicates that the proposed diagnosis method can effectively identify flooding, drying, and air starvation of PEM fuel cells.


Introduction
The energy shortage and environmental pollution caused by fossil fuels are becoming increasingly severe, so new and clean alternative energy sources are gradually becoming a key research direction in many countries [1]. The proton exchange membrane (PEM) fuel cell has become one of the ideal power sources for new energy transportation owing to its high efficiency, fast start-up, environmental friendliness, and low operating temperature [2]. However, the promotion and large-scale commercial application of PEM fuel cells are restricted by their service life and reliability [3]. Under vehicular conditions, improper management of internal states can lead to adverse phenomena, such as reactant starvation, membrane drying, and flooding, which ultimately affect output performance and service life. Material and structural optimization and design, as well as manufacturing methods for internal components, including the membrane [4], catalyst layer [5], gas diffusion layer [6], and bipolar plate [7], can fundamentally alleviate these failures. Advanced system control and management that keep the fuel cell working under the right conditions are also important. Here, an online fault diagnosis system can detect a fault at an early stage so that the operating conditions can be adjusted in time to prevent further deterioration, which is of great significance for improving the reliability and lifetime of PEM fuel cells [8].
An online fault diagnosis system comprises two major elements: the information acquisition module and the diagnosis module. Techniques for extracting critical information fall into two main categories: physicochemical tests, such as pressure drop measurement, neutron imaging, and magnetic resonance imaging; and electrochemical methods, such as the polarization curve, current pulse injection, and electrochemical impedance spectroscopy (EIS) [9]. The gas pressure drop between the channel inlet and outlet is an effective signal for determining gas transfer resistance, which is closely associated with flooding, but it does not appear to provide drying information [10]. Neutron imaging and magnetic resonance imaging can realize internal in situ measurement but are not suitable for on-board applications because of measurement limitations and cost [11]. The polarization curve, or the voltage at a specific current density, is the most direct indicator of output performance and is easy to obtain with a cell voltage monitor, but it cannot distinguish the fault type, since all failures ultimately appear as a voltage drop [12]. Current pulse injection mainly reflects ohmic resistance, which is primarily related to membrane water content, so flooding or starvation cannot be accurately located [13]. In contrast, EIS, which analyzes the internal dynamics of PEM fuel cells at different time scales in the frequency domain, has been widely applied in performance assessment [14,15]. Legros et al. [16] found through EIS measurements in fault experiments that flooding mainly affected the mass transfer impedance and the cathode Warburg impedance, which further proved the feasibility of using EIS to diagnose flooding. Similarly, Debenjak et al. [17] measured the EIS of an 80-cell PEM fuel cell stack and found that the impedance at 30 Hz, 100 Hz, and 300 Hz showed more significant differences under flooding and drying, so they concluded that the impedance at these frequencies could be used for fault diagnosis. Given that faster impedance acquisition with quick calculation techniques has already been proposed for low-cost online application [18], EIS-based feature acquisition is applied in this paper.
Fault diagnosis is an essential prerequisite for fault-tolerant control and fault elimination, and Gao et al. [19] presented a comprehensive review of real-time fault diagnosis methods from model-based and signal-based perspectives. For PEM fuel cells, fault diagnosis methods can mainly be divided into model-based, data-driven, and hybrid methods [20–22]. The model-based approach requires a mechanism model or empirical model that can predict the system performance, and it detects typical faults by evaluating the residual between the predefined model and the measured signal. The mechanism model shows satisfactory accuracy, but it is not suitable for online applications because of its high computational cost. In contrast, the empirical model, which has a simpler expression and fewer parameters, such as the equivalent circuit model (ECM), is more widely used in fault diagnosis of PEM fuel cells. For example, Fouquet et al. [23] improved the traditional Randles ECM by replacing the double-layer capacitor with a constant phase element (CPE) and provided a qualitative explanation for the variation of the model parameters under flooding and drying. Rubio et al. [24] proposed two kinds of ECM and established the correlation between the ECM parameters and the internal states of the PEM fuel cell to diagnose flooding and drying faults. The data-driven diagnosis method, in turn, treats the PEM fuel cell as a black box and detects faults with artificial intelligence, statistical, and signal processing methods based on the analysis of a large amount of historical data [25,26]. Li et al. [27] applied Fisher discriminant analysis to extract characteristic parameters from voltage and used the support vector machine to classify fuel cell faults, which achieved good results both offline and online. Benouioua et al. [28] analyzed the singularities of the output voltage signal of a PEM fuel cell via the wavelet transform and then classified the flooding fault accurately with the k-nearest neighbor method. Riascos et al. [29] used Bayesian network classification for PEM fuel cell fault diagnosis. Note that the data-driven diagnosis method is essentially an analysis of data, and its accuracy depends on the training data, which means that high precision requires a data set of sufficient scale. However, acquiring and storing large amounts of data is often not easy: it requires many prior experiments and places high demands on hardware resources.
Recently, several researchers have used EIS for information acquisition in combination with a data-driven approach to diagnose fuel cell faults. Zhang et al. [30] proposed a diagnosis method based on fuzzy clustering that extracts graphic features from the EIS of a PEM fuel cell as indicators to diagnose different degrees of flooding and drying. Lu et al. [31] designed an online fault diagnosis system via online EIS calculation, in which the ECM parameters identified by the least-squares method were input into a machine-learning model to complete the diagnosis. Their system diagnosed the flooding and drying faults of the PEM fuel cell with an accuracy of 90.9%. The model-based and data-driven methods can also be combined into a hybrid method that draws on the characteristics of both [22]. Recently, Djeziri et al. [32] proposed a hybrid method that combines a prior physical model and a data-driven updated kernel for fuel cell failure diagnostics, where the updated kernel is enabled when the estimation error between the predicted and measured stack voltage surpasses a predefined threshold. Similarly, Pan et al. [33] combined a model-based adaptive Kalman filter and a data-driven NARX neural network to realize fuel cell failure diagnostics.
From another perspective, fault diagnosis of the PEM fuel cell based on external signals is inseparable from sensor measurement, and the accuracy of the measured data is the premise of subsequent diagnostic applications. In actual fuel cell vehicles, sensors may suffer from significant measurement errors or complete failure. In general, the underlying software of the fuel cell system control unit can determine whether a complete failure has occurred through analog detection. As for measurement errors, an example is given by Won et al. [34], where an air flow meter fault caused by reduced measurement sensitivity was detected by an artificial neural network classifier and a residual-based diagnosis model. In comparison, this paper assumes that all sensors, including the impedance measurement equipment, work normally, and the research focuses on failures of the PEM fuel cell itself.
Although the above methods have contributed significantly to the study of PEM fuel cell fault diagnosis, some challenges remain. First, for fault feature extraction based on EIS with a predefined ECM, least-squares or direct software fitting is usually used because of its fast convergence, but the initial values of the ECM components must be set manually in advance, which is not conducive to online diagnosis. Moreover, a vehicular fuel cell system operating under dynamic conditions faces a wide variety of situations with different fault types and degrees. Even if the fault type is correctly identified, incorrect detection of the fault degree may lead the controller to take drastic measures, which may aggravate the fault or trigger other faults. Therefore, fault degree detection is also essential. Pahon et al. [35] gave a case of the detection and identification of air stoichiometry faults of different degrees, where the fault diagnosis tool was built on the wavelet transform. Zheng et al. [36] also proposed a data-driven fault detection tool based on reservoir computing to study faults under four degrading operating conditions. However, multi-degree fault diagnosis of membrane drying, flooding, and air starvation is often overlooked. Considering these research gaps, there is a strong incentive to design a multi-stage online impedance-based fault diagnosis method to improve the robustness of fuel cell management. In this study, first, a comprehensive fuel cell fault experimental procedure is carried out, covering flooding, membrane drying, and air starvation at minor, moderate, and severe degrees, and a fuel cell failure data set is established accordingly.
Second, an improved Randles ECM is introduced to fit the EIS with the hybrid genetic particle swarm optimization algorithm, in which the initial values of the ECM components are replaced by parameter ranges, avoiding the need for accurate initial parameter selection. Then, a support vector machine with the binary tree (BT-SVM) is introduced, where part of the fitted ECM parameters are selected as characteristic inputs to classify the fault type and then to distinguish the fault degree on the basis of the fault type.

Experimental Platform
The PEM fuel cell tested here is a single cell with a commercial membrane electrode assembly (MEA) with an active area of 50 cm², produced by Shanghai Fuel Cell Vehicle Powertrain Co., Ltd. The flow field within the anode/cathode graphite bipolar plates is a three-channel serpentine flow field, and the coolant flow field is a parallel straight structure. The general connection structure of the experimental test system is shown in Figure 1. The fuel cell test bench (G60, Greenlight, Vancouver, BC, Canada) is used to monitor the external state and control the operating parameters, such as gas stoichiometry, inlet humidity, pressure, and cell temperature. The electrochemical workstation (Reference 3000AE with 30k Booster, Gamry Instruments, Warminster, PA, USA) is used to measure the EIS, and the booster amplifies the disturbance signal of the workstation (the original equipment is not suitable for EIS testing at high current density). Because a minor voltage disturbance can lead to a considerable change in current, the EIS experiment is conducted in galvanostatic mode, sweeping frequencies from 5 kHz to 0.1 Hz with 10 points per decade (when the lowest frequency is reduced to 0.01 Hz and 0.001 Hz, the measurement time increases to about 1 h and 10 h, respectively, which is not conducive to stable measurement). Furthermore, to ensure that the system approximately meets the linearity condition during impedance measurement, the amplitude of the disturbance signal should be as small as possible. On the other hand, considering noise interference and measurable precision, the amplitude should not be too small. The amplitude of the current disturbance is therefore set to 8% of the DC load as a trade-off between system linearity and signal-to-noise ratio. Moreover, before each EIS measurement under the corresponding fault test conditions, the PEM fuel cell should be held until the cell voltage no longer changes noticeably, to ensure measurement accuracy.
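The sweep settings described above can be sketched in a few lines. This is a minimal illustration, not the workstation's own configuration: the helper names `log_sweep` and `perturbation_amplitude` are hypothetical, the grid is spaced uniformly on a log scale so that both end frequencies are hit exactly, and the DC load in the last line is an arbitrary placeholder rather than a value from the paper.

```python
import math

def log_sweep(f_start_hz, f_stop_hz, points_per_decade):
    """Return log-spaced frequencies from f_start_hz down to f_stop_hz (inclusive)."""
    decades = math.log10(f_start_hz / f_stop_hz)
    n_points = round(decades * points_per_decade) + 1
    ratio = f_stop_hz / f_start_hz
    return [f_start_hz * ratio ** (i / (n_points - 1)) for i in range(n_points)]

def perturbation_amplitude(dc_load_a, fraction=0.08):
    """AC current amplitude as a fraction of the DC load (8% in the text)."""
    return fraction * dc_load_a

# 5 kHz down to 0.1 Hz at about 10 points per decade
freqs = log_sweep(5000.0, 0.1, 10)

# Hypothetical DC load in amperes, used only to show the 8% rule
amplitude_a = perturbation_amplitude(50.0)
```

With these settings the sweep spans roughly 4.7 decades, so the grid stays below 50 frequency points per spectrum, which helps keep each measurement short.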

Experiment Procedures
This paper aims to propose a multi-stage fault diagnosis method and validate its performance under flooding, membrane drying, and starvation conditions. First, a standard operating condition based on typical vehicle working conditions is established prior to fault impedance testing; the details are given in Table 1. Based on previous papers, Table 2 presents the primary causes of flooding, drying, and starvation in terms of external operating conditions and internal effects [31,37]. Changing the operating conditions is a convenient and direct way to create the intended faults, because it directly influences the internal transfer and reaction processes. Hence, the fault conditions are obtained by adjusting the standard operating condition. According to the polarization curves and the sensitivity analysis of polarization loss against different operating conditions in our previous work [38], cell temperature has a significant influence on the ohmic resistance and proton transfer resistance. A higher cell temperature can lead to a lower cell voltage at a smaller current density because it easily causes lower water activity. In contrast, the PEM fuel cell performs poorly at a lower cell temperature since water vapor condenses easily. Moreover, convection on the cathode side helps to discharge liquid water; if the air stoichiometry is reduced, the convection effect is weakened and the fuel cell becomes extremely prone to flooding. Higher inlet gas humidity and higher current density are also used in the flooding experiment. Membrane drying is essentially the opposite of flooding, so it can be induced by adjusting the operating conditions in the reverse direction of the flooding experiments.
For the vehicular fuel cell system, air starvation is more likely to occur than hydrogen starvation during dynamic load changes, since the air mixture is more viscous and the response hysteresis of the air compressor is more severe than that of the proportional valve [39,40]. Hence, an air starvation fault experiment is designed here, with air stoichiometry as the main regulating parameter. The basic fault experiments designed on the basis of the standard operating condition are shown in Table 3. The experimental steps of the basic fault tests are as follows:
(1) Adjust the test bench so that the fuel cell runs stably for 20 min under the standard operating condition;
(2) Set the minor flooding condition according to Table 3, run steadily for 30 min, and then measure the EIS 20 times, stabilizing for 10 min between consecutive EIS measurements;
(3) Change the condition to moderate and then severe flooding, and repeat step 2 to complete the EIS measurements of moderate and severe flooding;
(4) Repeat steps 1, 2, and 3 to complete the basic fault EIS measurements of minor, moderate, and severe degrees of drying and air starvation.
Table 3. Fuel cell basic fault operating conditions adjusted based on the standard operating condition.

In addition, four subdivided fault degrees are defined on the basis of the basic fault degrees (1, 4, and 7) tested above. The subdivided fault degrees are labeled 2, 3, 5, and 6: degree 2 corresponds to the minor level, degrees 3 and 5 to the moderate level, and degree 6 to the severe level. The specific subdivided fault conditions are listed in Table 4, and each test sequence is tested 5 times.
Table 4. Fuel cell subdivided fault operating conditions adjusted based on the standard operating condition.

So far, a total of 200 basic fault experiments and 60 subdivided fault experiments have been conducted. First, 150 measured samples extracted from the basic fault experiment data were selected as the training data set for the diagnosis model. The remaining samples from the basic fault data set and the 60 samples in the subdivided fault data set were used to test the model accuracy. The data usage is shown in Figure 2.
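The bookkeeping behind this data usage can be sketched as follows. Two points here are assumptions rather than statements from the paper: the basic degrees 1, 4, and 7 are taken to mean minor, moderate, and severe (inferred from the test order), and the 150 training samples are drawn at random with a fixed seed, since the selection rule is not described. The names `DEGREE_LEVEL` and `split_basic` are hypothetical.

```python
import random

# Fault degree -> fault level. Degrees 2, 3, 5, 6 follow the text; the levels
# assigned to the basic degrees 1, 4, 7 are an assumption.
DEGREE_LEVEL = {1: "minor", 2: "minor", 3: "moderate", 4: "moderate",
                5: "moderate", 6: "severe", 7: "severe"}

N_BASIC, N_TRAIN, N_SUBDIVIDED = 200, 150, 60

def split_basic(n_basic=N_BASIC, n_train=N_TRAIN, seed=0):
    """Randomly split basic-fault sample indices into training and test sets."""
    indices = list(range(n_basic))
    random.Random(seed).shuffle(indices)  # assumed: random selection
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, basic_test_idx = split_basic()
# All subdivided-fault samples are held out for testing only
subdivided_test_idx = list(range(N_SUBDIVIDED))
```

Under this split the model is trained on 150 samples and evaluated on 50 basic-fault plus 60 subdivided-fault samples, so the subdivided degrees are never seen during training.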

Equivalent Circuit Model
The equivalent circuit model, which is composed of several simple equivalent components abstracted from the complex internal processes, is an effective means to analyze the impedance spectrum of the PEM fuel cell in real time and to calculate each polarization loss quantitatively. A typical EIS of the PEM fuel cell and its relation to the internal dynamics are given in Figure 3a; anode polarization processes are not included because they have a negligible effect on the overall performance loss of the PEM fuel cell [41]. The ohmic impedance, equal to the intercept of the spectrum with the real axis, mainly refers to the loss of protons passing through the membrane and electrons passing through the electrode material. From high frequency to low frequency, the impedance is dominated in turn by proton transfer within the cathode ionomer, charge transfer attributed to oxygen reduction, and oxygen transfer in the cathode. The high-frequency inductive phenomenon is mainly related to equipment disturbance [42]. In addition, the ohmic loss and the cathode proton transfer loss show the same variation trend, so the fault characteristic involving proton transport loss in the cathode catalyst layer is ignored to reduce the computation of the ECM. The traditional Randles model with a Warburg element is shown in Figure 3b, where $R_m$ is the ohmic resistance, $R_{ct}$ is the charge transfer resistance, and $C_{dl}$ stands for the double-layer capacitor attributed to the capacitance effect between the anode and cathode. The Warburg element $Z_w$ reflects the oxygen mass transfer, and its impedance is given as:
$$Z_w = R_w \frac{\tanh\left[(j\omega T_w)^{P_w}\right]}{(j\omega T_w)^{P_w}} \tag{1}$$
where $R_w$ represents the mass transfer resistance; $T_w$ is the time constant; $P_w$ is the exponent; $\omega$ is the angular frequency; and $j$ is the imaginary unit.
Given that a capacitor with distributed parameters rather than lumped parameters can better describe the real non-uniform structure of the porous electrode, the constant phase element (CPE) can be used to replace the standard double-layer capacitor, and its impedance $Z_{CPE}$ is expressed as:
$$Z_{CPE} = \frac{1}{T_{dl}(j\omega)^{P_{dl}}} \tag{2}$$
where $T_{dl}$ is the time constant and $P_{dl}$ is an exponent reflecting the electrode distribution parameters; when $P_{dl} = 1$, the CPE is a standard capacitor. Based on this, the total impedance of the Randles model with Warburg element and CPE (see Figure 3c) can be calculated as follows:
$$Z = R_m + \frac{1}{T_{dl}(j\omega)^{P_{dl}} + \dfrac{1}{R_{ct} + Z_w}} \tag{3}$$
in which $R_m$, $R_{ct}$, $R_w$, $T_w$, $P_w$, $T_{dl}$, and $P_{dl}$ are the parameters to be identified. The second-order RQ (R-CPE) model, which evolved from the Randles model with CPE and is shown in Figure 3d, is also frequently used for impedance analysis.
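A numerical sketch of the Randles model with Warburg element and CPE may help make the roles of the seven parameters concrete. It assumes the commonly used element forms $Z_w = R_w \tanh[(j\omega T_w)^{P_w}] / (j\omega T_w)^{P_w}$ and $Z_{CPE} = 1 / [T_{dl}(j\omega)^{P_{dl}}]$, with the CPE in parallel with the series branch $R_{ct} + Z_w$. The function names and the parameter values are illustrative only and are not fitted results from the paper.

```python
import cmath
import math

def warburg(omega, r_w, t_w, p_w):
    """Finite-length Warburg element (assumed form, see lead-in)."""
    s = (1j * omega * t_w) ** p_w
    return r_w * cmath.tanh(s) / s

def cpe(omega, t_dl, p_dl):
    """Constant phase element; equals an ideal capacitor when p_dl == 1."""
    return 1.0 / (t_dl * (1j * omega) ** p_dl)

def randles_cpe_warburg(freq_hz, r_m, r_ct, r_w, t_w, p_w, t_dl, p_dl):
    """Total impedance: R_m in series with [CPE parallel to (R_ct + Z_w)]."""
    omega = 2.0 * math.pi * freq_hz
    z_faradaic = r_ct + warburg(omega, r_w, t_w, p_w)
    admittance = 1.0 / cpe(omega, t_dl, p_dl) + 1.0 / z_faradaic
    return r_m + 1.0 / admittance

# Illustrative parameter values only (resistances in mOhm*cm^2)
params = dict(r_m=60.0, r_ct=200.0, r_w=100.0, t_w=0.5, p_w=0.5,
              t_dl=5e-5, p_dl=0.9)
z_high = randles_cpe_warburg(5000.0, **params)  # close to R_m
z_low = randles_cpe_warburg(0.1, **params)      # close to R_m + R_ct + R_w
```

At the high-frequency end the CPE short-circuits the faradaic branch and the real part approaches $R_m$, while at the low-frequency end the real part approaches the sum of the three resistances, which matches the qualitative reading of the spectrum given above.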
To select an appropriate ECM for fault feature identification and extraction, the impedance measured under the standard operating condition given in Table 1 is used to test the fitting quality of the three ECMs mentioned above with Zview software. The chi-square value is often used to evaluate the fitting accuracy of an ECM; it represents the deviation between the measured impedance $Z_0$ and the impedance $Z_e$ estimated by the ECM. The fitting results are shown in Figure 4 and

Hybrid Genetic Particle Swarm Optimization Algorithm
To avoid the need to determine initial parameters, intelligent optimization algorithms are more appropriate for online identification. The genetic algorithm (GA) is an evolutionary algorithm that imitates the evolution of a population and mainly includes coding, fitness calculation, selection, crossover, and mutation steps [43]. The algorithm first encodes the target parameters and initializes individuals to construct the initial population, then calculates the fitness of the corresponding chromosomal individuals for the target problem, and on this basis selects the individuals with high fitness for genetic operations such as crossover and mutation. Each generation produces a better population, which is repeatedly subjected to fitness calculation and genetic operations until the target population is obtained. The particle swarm optimization (PSO) algorithm is a global search algorithm inspired by the foraging behavior of a flock of birds. It uses a velocity and position model and finds the optimal solution by continuously iterating the particle velocities and positions [44]. PSO assumes a particle population $X$ in an $n$-dimensional search space, where the position vector of the $k$th particle at moment $t$ is $x_{kn}^{t}$, $k = 1, 2, \cdots, n$. The position of each particle is a potential feasible solution in the search space, and the velocity vector of the particle at moment $t$ is $V_{kn}^{t}$. If the best position found so far by the $k$th particle is $P_{kn}$ and the best position of the whole population is $P_{gn}$, the iterative formulas of particle velocity and position can be expressed as:
$$V_{kn}^{t+1} = \omega V_{kn}^{t} + c_1 r_1 \left(P_{kn} - x_{kn}^{t}\right) + c_2 r_2 \left(P_{gn} - x_{kn}^{t}\right) \tag{5}$$
$$x_{kn}^{t+1} = x_{kn}^{t} + V_{kn}^{t+1} \tag{6}$$
where $\omega$ is the inertia factor; $c_1$ and $c_2$ are learning factors; and $r_1$ and $r_2$ are random numbers in [0, 1]. A comprehensive review of parameter identification techniques for the PEM fuel cell is given by Priya et al. [45], where more detailed information about GA and PSO can be found.
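The standard PSO velocity and position update can be written compactly as below. This is a generic sketch of the textbook update, not the authors' code; the function name `pso_step` and the default values of the inertia and learning factors are illustrative assumptions.

```python
import random

def pso_step(positions, velocities, personal_best, global_best,
             inertia=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one velocity and position update to every particle, in place.

    positions, velocities, personal_best: lists of per-particle coordinate lists.
    global_best: coordinate list of the best position found by the swarm.
    """
    for k in range(len(positions)):
        for d in range(len(positions[k])):
            r1, r2 = rng.random(), rng.random()
            # Inertia term + pull toward own best + pull toward swarm best
            velocities[k][d] = (inertia * velocities[k][d]
                                + c1 * r1 * (personal_best[k][d] - positions[k][d])
                                + c2 * r2 * (global_best[d] - positions[k][d]))
            positions[k][d] += velocities[k][d]

# One particle in one dimension, just to show the call
pos, vel = [[0.0]], [[1.0]]
pso_step(pos, vel, personal_best=[[2.0]], global_best=[4.0])
```

Each coordinate draws its own random numbers, so particles explore different directions even when they share the same personal and global bests.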
Both GA and PSO are bionic algorithms, and their evolution processes are similar. In GA, information is shared among individuals as the population goes through genetic operations, so the whole population evolves uniformly toward the optimal solution, which gives GA high precision. However, the computational cost of GA is considerable because of chromosome encoding and decoding, which results in a long convergence time. In contrast, PSO has fewer parameters and a more straightforward structure, which makes it easy to implement and quick to converge. Still, it easily falls into a local optimum and reaches a poor result because of insufficient information interaction between particles. Therefore, we combine the advantages of GA and PSO to construct a hybrid genetic particle swarm optimization algorithm (HGAPSO) for parameter identification of the ECM of PEM fuel cells. The general framework of the HGAPSO is shown in Figure 5. Selection and mutation are additionally applied to the particles at each iteration to reduce the risk of being trapped in local optima [46].
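One way to combine PSO with GA-style selection and mutation is sketched below. It is a hedged illustration of the general idea, not the framework of Figure 5: the specific selection rule (the worse half of the swarm restarts from copies of the better half), the mutation rule (resampling one coordinate inside its range), and all default values except the 200-iteration cap and the 1e-5 stopping threshold are assumptions. The function name `hgapso` and the test function `sphere` are hypothetical.

```python
import random

def hgapso(fitness, bounds, n_particles=30, n_iter=200, tol=1e-5,
           inertia=0.7, c1=1.5, c2=1.5, p_mut=0.1, seed=0):
    """Minimise `fitness` within `bounds` using PSO plus selection and mutation."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value, d):
        lo, hi = bounds[d]
        return min(max(value, lo), hi)

    # Particles start inside the parameter ranges, so no initial guess is needed
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]

    for _ in range(n_iter):
        # PSO update, clipped to the parameter ranges
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] = clip(pos[k][d] + vel[k][d], d)
        # Selection (assumed rule): worse half restarts from the better half
        order = sorted(range(n_particles), key=lambda k: fitness(pos[k]))
        half = n_particles // 2
        for good, bad in zip(order[:half], order[half:]):
            pos[bad], vel[bad] = pos[good][:], vel[good][:]
        # Mutation (assumed rule): resample one coordinate inside its range
        for k in range(n_particles):
            if rng.random() < p_mut:
                d = rng.randrange(dim)
                pos[k][d] = rng.uniform(*bounds[d])
        # Update personal and global bests
        for k in range(n_particles):
            f = fitness(pos[k])
            if f < pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
        history.append(gbest_fit)
        if gbest_fit < tol:
            break
    return gbest, gbest_fit, history

def sphere(p):
    """Toy objective with its minimum at the origin."""
    return sum(x * x for x in p)

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_fit, history = hgapso(sphere, bounds)
```

Because the global best is stored separately from the particles, selection and mutation can disturb the swarm without ever losing the best solution found so far.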
where is the inertia factor; 1 and 2 are learning factor; and 1 and 2 are numbers within in [0,1]. A comprehensive review of parameter identification te for the PEM fuel cell is given by Priya et al [45], and more detailed information a and PSO can be found in it.
Both GA and PSO belong to the bionic algorithm, and the evolution proces is similar. The information is shared among individuals in GA with the populati through genetics operations so that the whole population can evolve uniforml the optimal solution, making GA obtain high precision. However, the computati is considerable because of its chromosomes encoding and decoding, which res long convergence time. In contrast, the PSO has fewer parameters with straightforward structure, making it easy to implement and converge quickly. easy to fall into the local optimum and reach a poor result due to the in information interaction between particles. By this, we combine the advantages o PSO to construct a hybrid genetic particle swarm optimization algorithm (HGA parameter identification of ECM in the application of PEM fuel cells. The framework of the HGAPSO is shown in Figure 5. The selection and mutation ar added on particles at each iteration to reduce the uncertainties of being trapped optima [46].

Parameters Identification
In general, the sum of squared errors between the measured values and the calculated values is adopted as the fitness function of the algorithm, which can be expressed as:

$$F = \sum_{j=1}^{n}\left[\left(Z'_j - z'_j\right)^2 + \left(Z''_j - z''_j\right)^2\right] \tag{7}$$

where $n$ is the total number of frequency points of the EIS; $Z'_j$ and $Z''_j$ are the real and imaginary parts of the impedance measured by the electrochemical workstation at the $j$-th frequency point, respectively; and $z'_j$ and $z''_j$ are the real and imaginary parts of the impedance fitted by the ECM.
The goal of the HGAPSO algorithm is to minimize the fitness function. However, the fitness function in Equation (7) does not distinguish between the impedance at high and low frequencies, which leads to obvious fitting errors at high frequencies. To improve the fitting accuracy, a frequency weighting factor $W_j$, defined as the inverse of the impedance modulus, is introduced, as expressed in Equation (8):

$$W_j = \frac{1}{\sqrt{Z'^{2}_j + Z''^{2}_j}} \tag{8}$$

$W_j$ is adjusted adaptively according to the frequency, thus reducing the volatility of the fitness function.
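A minimal sketch of the weighted fitness computation is given below. It assumes that the weighting factor is the inverse modulus of the measured impedance at each frequency point and that impedances are stored as Python complex numbers; the function name `weighted_fitness` is hypothetical.

```python
def weighted_fitness(z_measured, z_fitted):
    """Weighted sum of squared errors between measured and fitted impedance.

    Each frequency point is weighted by 1/|Z_j| of the measured impedance
    (an assumption), so that small high-frequency impedances are not
    swamped by large low-frequency ones.
    """
    if len(z_measured) != len(z_fitted):
        raise ValueError("both spectra must cover the same frequency points")
    total = 0.0
    for zm, zf in zip(z_measured, z_fitted):
        w_j = 1.0 / abs(zm)                      # inverse impedance modulus
        total += w_j * ((zm.real - zf.real) ** 2 + (zm.imag - zf.imag) ** 2)
    return total


# One frequency point: |3+4j| = 5, squared error = 1, weighted error = 0.2.
example = weighted_fitness([3 + 4j], [3 + 3j])
print(example)
```

A perfect fit returns zero, and the optimizer searches the ECM parameter ranges for the parameter set that drives this value as low as possible.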
According to Equations (7) and (8), the final fitness function of HGAPSO can be expressed as:

$$F = \sum_{j=1}^{n} W_j\left[\left(Z'_j - z'_j\right)^2 + \left(Z''_j - z''_j\right)^2\right] \tag{9}$$

The parameter ranges, listed in Table 6, are set according to experimental data and cover the extreme conditions of fuel cells, which ensures the stability of the algorithm and makes online application possible.

Table 6. Parameter range of critical components of the ECM.

Symbol      Unit                          Range
$R_m$       mΩ·cm²                        10~200
$R_{ct}$    mΩ·cm²                        10~1500
$R_w$       mΩ·cm²                        10~1500
$T_w$       s^{P_w}/(mΩ·cm²)              0.1~1
$P_w$       -                             0.5~1
$T_{dl}$    s^{P_dl}/(mΩ·cm²)             1 × 10⁻⁵~1 × 10⁻⁴
$P_{dl}$    -                             0.5~1

Four conditions (normal, flooding, drying, and air starvation) were used to test whether the HGAPSO algorithm maintains high identification accuracy under extreme conditions. The maximum number of iterations of the HGAPSO is set to 200, and the termination condition is that the minimum value of the fitness function is less than 1 × 10⁻⁵. Figure 6 presents the convergence curves of the fitness function of HGAPSO under the four working conditions. The fitness curves are relatively smooth during convergence, and in all four conditions the algorithm converges to the same fitness value within a small number of iterations for 10 consecutive identifications, indicating that the HGAPSO algorithm has good stability and convergence speed in parameter identification. Figure 7 shows the EIS fitting results for the four conditions obtained with the HGAPSO algorithm and with Zview; the relative deviation between the two is less than 3%, which also indicates that the HGAPSO algorithm has sufficient accuracy.

Support Vector Machine Algorithm with Binary Tree
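Support vector machine (SVM) is a machine learning method based on statistical learning theory. As shown in Figure 8, the basic SVM is designed to solve the binary classification problem, and its principle is to find an optimal hyperplane $L$ in the data set [47] such that the separation distance between the two classes of data and $L$ is the largest. That is, the sample boundaries $l_1$ and $l_2$, which are parallel to $L$, have the largest interval, and the sample points lying on these maximum-interval boundaries are the support vectors. Let the sample data set be $D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^m,\ y_i \in \{-1, 1\}\}_{i=1}^{n}$, where $x_i$ is the sample vector and $y_i$ is the sample label, such that the classification function is:

$$f(x) = w^{T}x + b \tag{10}$$

Then the classification interval is:

$$\gamma_i = y_i\left(w^{T}x_i + b\right) \tag{11}$$

Normalize Equation (11) and one can obtain:

$$\gamma_i = \frac{y_i\left(w^{T}x_i + b\right)}{\|w\|} \tag{12}$$

where $w$ is the normal vector to the optimal plane and $b$ is the bias of the hyperplane to the origin. To maximize the classification interval $\gamma_i$, $\|w\|$ should be minimized, which is equivalent to minimizing $\|w\|^2/2$. Therefore, the objective function minimization problem can be transformed into a quadratic programming problem with constraints [48]:

$$\min_{w,\,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i\left(w^{T}x_i + b\right) \ge 1,\quad i = 1, \dots, n \tag{13}$$

The optimal Lagrange multipliers $\alpha_i^*$ associated with these constraints can be obtained by solving the following convex optimization problem:

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\, x_i^{T}x_j \quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C \tag{14}$$

in which $C$ is a penalty factor that constrains the error degree of the algorithm: by bounding each multiplier, it allows a limited number of samples to violate the constraint in Equation (13). When the sample data of the system are not linearly separable, a nonlinear mapping, i.e., a kernel function, needs to be introduced to map the sample data into a higher-dimensional space and transform the task into a linear problem for further processing. The kernel function chosen in this study is a Gaussian kernel function, as shown in the following equation:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \tag{15}$$

where $\sigma$ is the kernel width. Then the final classification decision function can be shown in the following Equation (16):

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n}\alpha_i^* y_i K(x_i, x) + b^*\right) \tag{16}$$

where $b^*$ is the corresponding optimal bias. Nevertheless, the traditional SVM algorithm is a binary classifier and cannot handle the classification of multiple fault types.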
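As an illustration of how a trained binary SVM with a Gaussian kernel evaluates a new sample, the sketch below sums the kernel responses of the support vectors, weighted by their multipliers and labels, and takes the sign. It assumes the common form of the Gaussian kernel with a width parameter `sigma`; the support vectors, multipliers, and bias in the example are made-up toy values, not trained results.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 sigma^2)); sigma is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, bias, sigma=1.0):
    """Return +1 or -1 from the kernel expansion over the support vectors."""
    score = bias
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        score += alpha * y * gaussian_kernel(sv, x, sigma)
    return 1 if score >= 0 else -1


# Toy model: one support vector per class (hypothetical values).
SUPPORT_VECTORS = [[0.0, 0.0], [2.0, 2.0]]
LABELS = [-1, 1]
ALPHAS = [1.0, 1.0]
BIAS = 0.0

near_positive = svm_decision([1.9, 2.1], SUPPORT_VECTORS, LABELS, ALPHAS, BIAS)
near_negative = svm_decision([0.1, -0.1], SUPPORT_VECTORS, LABELS, ALPHAS, BIAS)
print(near_positive, near_negative)
```

Only the support vectors enter the sum, which keeps the online evaluation cheap once the offline training has been completed.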
Therefore, it is necessary to improve the traditional SVM algorithm by extending its binary classification capability to multi-classification. Currently, there are two primary forms of SVM-based multi-classification: one-to-one and one-to-many. In detail, one-to-one designs a classifier for every pair of classes in the data set, so N(N − 1)/2 classifiers must be trained for an N-class problem. In contrast, only N classifiers need to be trained in one-to-many, which is more efficient for problems with a larger number of classes. Recognizing that the faults of the PEM fuel cell comprise three types, the one-to-many SVM is applied to transform the fault diagnosis of fuel cells into a multi-classification problem in the form of a binary tree.

The whole structure of the designed fuel cell fault diagnosis tool is shown in Figure 9. Specifically, the backbone of the fault classifier consists of a pre-classifier and three fault-type classifiers. The pre-classifier is placed first to determine whether the PEM fuel cell has a fault, and the diagnosis terminates if there is none, which improves the diagnosis efficiency and is consistent with the fact that the PEM fuel cell is in the normal state most of the time during operation. The fault-type classifiers are arranged in the order of flooding, drying, and air starvation, according to the probability of fault occurrence in empirical data statistics [49]. Such a sequence saves computing resources and improves computing efficiency to a certain extent, which is conducive to the online application of the diagnostic algorithm. When the PEM fuel cell is diagnosed with a specific fault (flooding, drying, or air starvation), the sample proceeds to the three fault degree classifiers, which further analyze the fault degree, thus improving the diagnostic accuracy and providing a more accurate guide for controller regulation.
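The cascade described above can be sketched as a short decision routine. This is only an illustration of the control flow: the threshold rules below are hypothetical stand-ins for the trained SVM nodes, the feature dictionary holds made-up scores rather than ECM parameters, and the use of one degree classifier per fault type is an assumption about how the fault degree stage is organized.

```python
def diagnose(features, is_faulty, type_classifiers, degree_classifiers):
    """Walk the binary tree: pre-classifier, then fault type, then fault degree."""
    if not is_faulty(features):
        return ("normal", None)            # terminate early in the common case
    for fault_type, is_this_type in type_classifiers:   # ordered by likelihood
        if is_this_type(features):
            return (fault_type, degree_classifiers[fault_type](features))
    return ("unknown", None)               # no fault-type node accepted the sample


def make_threshold(key, limit):
    """Hypothetical binary node: fires when a score exceeds a limit."""
    return lambda f: f[key] > limit


def make_degree(key, moderate, severe):
    """Hypothetical degree node: maps a score to minor/moderate/severe."""
    def classify(f):
        if f[key] > severe:
            return "severe"
        if f[key] > moderate:
            return "moderate"
        return "minor"
    return classify


FAULTS = ("flooding", "drying", "air_starvation")   # order used in the text
IS_FAULTY = lambda f: max(f.values()) > 0.2
TYPE_CLASSIFIERS = [(name, make_threshold(name, 0.2)) for name in FAULTS]
DEGREE_CLASSIFIERS = {name: make_degree(name, 0.5, 0.8) for name in FAULTS}

sample = {"flooding": 0.6, "drying": 0.0, "air_starvation": 0.0}
print(diagnose(sample, IS_FAULTY, TYPE_CLASSIFIERS, DEGREE_CLASSIFIERS))
```

Placing the pre-classifier first means that a healthy cell costs a single classifier evaluation, and ordering the fault-type nodes by likelihood keeps the average number of evaluations low.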
Figure 11 shows the ECM parameters under different operating conditions identified by the HGAPSO algorithm. As can be seen, as the fault degree increases, the curves of $R_m$, $R_w$, $T_w$, and $P_{dl}$ under the three fault conditions show no obvious intersection, indicating high differentiation and a good capability to distinguish fault types. $R_{ct}$ also increases with the growth of fault degree after the interpolation of low fault degree. On the other hand, the curves of $P_w$ and $T_{dl}$ cross over each other as the fault degree grows, and there is no linear relationship between different fault degrees. $P_w$ and $T_{dl}$ are therefore considered unsuitable for fault diagnosis, and $R_m$, $R_w$, $R_{ct}$, $T_w$, and $P_{dl}$ are selected as fault feature inputs.

Fault Diagnosis Results
Once the fault features were extracted and filtered, as mentioned in Section 2.3, 150 samples from the basic fault experiments were selected as training data to complete the offline training of the diagnostic model, and the remaining 50 samples of basic fault data and 60 samples from the subdivided fault experiments were used as online test data for the diagnostic model. The diagnostic results are shown in Figure 12, where N represents the normal status, and F, D, and S stand for the fuel cell faults of flooding, drying, and air starvation, respectively. Additionally, the subscripts "min", "mod", and "sev" represent the minor, moderate, and severe fault levels, respectively. The number marked indicates the severity of the fault (five samples in a group). As shown in Figure 12a, the diagnostic model recognizes the fault type and degree with 100% accuracy in the basic fault experiment, because both training and test data are extracted from impedance parameters under the basic fault conditions. Moreover, among the 60 samples in the subdivided fault experiment, the diagnostic model misdiagnosed only 1 sample, giving an overall diagnostic accuracy of 98.3%. The misdiagnosis occurred in the 47th sample, in which the actual fault is moderate air starvation but was diagnosed as minor. The fault type was diagnosed correctly, confirming that the model has a high diagnostic ability for the fault type. The error in the fault degree may be caused by a slight measurement error of the EIS during the experiment. In summary, the fault diagnosis method proposed in this paper can not only accurately diagnose the flooding, drying, and air starvation fault types of the fuel cell based on the EIS, but also differentiates well between fault degrees, which can effectively support the controller in regulating the internal status of the fuel cell, so as to improve the reliability and service life of the PEM fuel cell.
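The reported accuracies follow directly from the sample counts stated above, as the short check below shows. The helper name `accuracy_percent` is hypothetical; the counts are the ones given in the text (50 basic test samples with no errors, 60 subdivided test samples with one misdiagnosis).

```python
def accuracy_percent(n_correct, n_total):
    """Share of correctly diagnosed samples, in percent."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


basic_accuracy = accuracy_percent(50, 50)            # basic fault test set
subdivided_accuracy = accuracy_percent(60 - 1, 60)   # one misdiagnosed sample
print(f"{basic_accuracy:.1f}% {subdivided_accuracy:.1f}%")
```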

Algorithm Related Parameters Analysis
It is well known that the amount of training data has a significant influence on the accuracy and learning ability of diagnostic models. Therefore, in addition to 150 samples, 50 samples and 100 samples were each used for model training, and the same test set was used to evaluate the accuracy, as shown in Figure 13. It can be seen that a smaller number of training samples can also achieve 100% accuracy in basic fault diagnosis. For the subdivided fault conditions, the detection accuracy after training with 50 samples and 100 samples is 90% and 96.7%, respectively, meaning that increasing the number of samples improves the detection accuracy within a certain range. However, training with 200 samples gave the same detection result as training with 150 samples (not plotted here), which may be related to the fact that the training samples are all basic fault data.
Another critical setting in support vector machines is the choice of kernel function. Here, three kernel functions, namely linear, polynomial, and Gaussian, are compared. The linear kernel is usually used for linearly separable cases and is suitable when the number of features is comparable to the number of samples. Both polynomial and Gaussian kernels can map a sample to a higher-dimensional feature space, but higher-order polynomials have more parameters. In theory, if specific prior knowledge of the data is available, the kernel function can be chosen based on the data distribution. If there is no prior information, cross-validation is usually used to try different kernels, and the kernel with the lowest error is selected.
Figure 14 shows the diagnosis results based on different kernels. All configurations achieve a satisfactory diagnostic result under basic fault conditions, but for subdivided faults the Gaussian kernel performs best, while the accuracies of the linear and polynomial kernels are 70% and 96.7%, respectively.

Conclusions and Future Research Orientations
In this paper, a multi-stage fault diagnosis method based on BT-SVM was proposed to diagnose the fault type and fault degree of the PEM fuel cell. Firstly, a fault data set covering different fault types and fault degrees was established. Then, a Randles model with a Warburg element and CPE was selected to extract fault features from the measured EIS. To realize online parameter identification of the ECM, a parameter identification method based on HGAPSO was proposed, in which the initial values of the ECM components are replaced by parameter ranges, avoiding the need to select initial parameters. The relative error between the parameters identified by the HGAPSO algorithm and by the software Zview is less than 3% under the four working conditions of normal, flooding, drying, and air starvation. Based on this, a diagnosis model built on BT-SVM was constructed and trained with 150 samples from the basic fault experiment data set, where five identified ECM parameters were selected as the features input into the diagnostic model. The remaining 50 samples from the basic fault data set and 60 samples from the subdivided fault data set were used to complete the online fault diagnosis. The results show that the diagnostic model performs satisfactorily for both basic and subdivided faults, with accuracies of 100% and 98.3%, respectively, indicating that the proposed diagnosis method can help improve the reliability and durability of the fuel cell system. Furthermore, the influence of the number of training samples and of the kernel function configuration on the multi-stage fault diagnosis method was analyzed.
Several improvements remain for future work:
(1) Online impedance acquisition is the premise of impedance-based diagnosis. However, current laboratory impedance measurements rely primarily on expensive test equipment, which is difficult to use in a real vehicle environment. At present, signal processing methods based on the wavelet transform can quickly calculate an impedance spectrum covering a wide frequency range online, but they require an additional square wave excitation source, and existing controller resources can hardly meet their computational requirements. Hence, it is necessary to develop a more efficient and faster online impedance calculation method to meet the needs of real-time fault diagnosis.
(2) In this paper, although varying degrees of failure were produced by changing the external operating conditions, the complex working conditions of a vehicle fuel cell system are still not fully represented. In the future, it will be necessary to simulate fuel cell failure scenarios in extreme environments, such as high temperature, high humidity, cold, and high altitude, using an environmental chamber, and then to validate the diagnostic algorithm.
(3) At present, the experimental object is a single cell (MEA level), so it is necessary to carry out fault experiments and diagnosis for a high-power fuel cell stack. Meanwhile, in addition to the overall performance of the stack, failures of individual cells in the stack also need to be considered.