Data-Driven Fault Diagnosis for Automotive PEMFC Systems Based on the Steady-State Identification

Abstract: Data-driven diagnosis methods for faults of proton exchange membrane fuel cell (PEMFC) systems can diagnose faults through the state variable data collected during the operation of the PEMFC system. However, the state variable data collected from the PEMFC system during the stack switching between different operating points can easily cause false alarms, such that the practical value of the diagnosis system is reduced. To overcome this problem, a fault diagnosis method for PEMFC systems based on steady-state identification is proposed in this paper. The support vector data description (SVDD) and relevance vector machine (RVM) optimized by the artificial bee colony (ABC) are used for the steady-state identification and fault diagnosis, respectively. The density-based spatial clustering of applications with noise (DBSCAN) and linear least squares fitting (LLSF) are used to identify the abnormal data in datasets and estimate change rates of the system state variables, respectively. The proposed method can automatically identify the state variable data collected from the PEMFC system during the stack switching between different operating points, so that the diagnosis accuracy can be improved and false alarms can be reduced. The proposed method has a certain practical value and can provide a reference for further study.

The model-based method compares outputs of the system model with those of the actual system, and the system faults can then be diagnosed by analyzing the residuals between the two outputs [15]. Escobet et al., 2009 [3] calculated the relative fault sensitivities with the residuals between model outputs under the normal operating conditions and fault conditions. The fault diagnosis and isolation of the PEMFC system were realized by calculating Euclidean distances between the observed and the theoretical relative fault sensitivities. Lira et al., 2010 [5] established a linear parameter variable dynamic model of the fuel cell system. The faults of the fuel cell system were diagnosed by the residuals between outputs of the model and the actual system. Rosich et al., 2014 [6] proposed a diagnosis method for faults of the PEMFC system. The residuals between results calculated by the calculation formulas and by redundant formulas of different variables were used to diagnose related faults. Lee et al., 2020 [7] determined the faulty PEMFC subsystem and the faulty components in the subsystem through the residuals between the predicted values and the measured values of the system state, so that fault diagnosis and isolation for the PEMFC system could be realized. Polverino et al., 2017 [8] designed a set of residual generators based on causal analysis, and investigated the maximum fault isolability when the minimum number of sensors was used. In addition, aiming at the faults of the solid oxide fuel cell (SOFC) system, Polverino et al., 2017 [9] realized the fault diagnosis using the complete model and the isolated sub-models. Vijay et al., 2019 [10] designed two nonlinear adaptive observers for an SOFC system; the faults can be diagnosed by monitoring the residuals between the measurement and the observer outputs.
Besides model-based methods, data-driven fault diagnosis methods have also attracted increasing attention. The data-driven methods can realize the PEMFC system fault diagnosis by extracting fault features from the state variable data of the system collected under the normal operating conditions and fault conditions [15]. Although the data-driven method is simple, it has proved to be effective in diagnosis [16]. For the flooding of the PEMFC stack, Zhou et al., 2020 [1] used the orthogonal linear discriminant analysis and relevance vector machine (RVM) for dimension reduction and pattern recognition, respectively. Finally, Zhou et al., 2020 [1] proposed an online adaptive diagnosis strategy, so that the accuracy of diagnosis can be further improved. Mao et al., 2020 [4] proposed a data-driven diagnosis method based on sensor selection technology. It can identify the state variables of the system that are sensitive to the performance change of the PEMFC system caused by the faults, so that faults can be diagnosed accurately. Lin et al., 2020 [11] compared the performances of different dimensionality reduction algorithms and different pattern recognition algorithms for the diagnosis of PEMFC system faults. For PEMFC stack faults, Li et al., 2019 [12] used Fisher discriminant analysis and incremental spherical shaped multi-class support vector machine (SVM) for the data dimension reduction and pattern recognition, and designed a specific integrated circuit for online fault diagnosis. Zhang et al., 2020 [13] proposed a fault diagnosis method of the PEMFC system for a hybrid tram. The simulated annealing genetic algorithm fuzzy C-means (FCM) clustering, synthetic minority over-sampling technique and a deep belief network were used for removing the invalid data in samples, processing the unbalanced data, and pattern recognition, respectively. Han et al., 2020 [14] proposed a method for the diagnosis of the automotive PEMFC system faults. The possibilistic fuzzy C-means algorithm was used for removing invalid data in data samples, and the SVM optimized by artificial bee colony (ABC) was used to diagnose faults.
It can be seen from [1,3–14] that the model-based fault diagnosis method can be used to diagnose and isolate the different faults for fuel cell systems. However, because of the complex structure of PEMFC systems, it is difficult to obtain the parameters of some key components or materials. Therefore, accurately establishing a physical model of PEMFC systems is difficult [16–18]. The data-driven fault diagnosis methods do not need the system model in the diagnosis process, and can realize the fault diagnosis for PEMFC systems only through the system state variable data. However, the accuracy of the data-driven fault diagnosis method largely depends on the training data used for algorithm training. Although most of the automotive PEMFC stacks only switch between different operating points during operation, the driving state of the vehicle changes frequently when the vehicle is driving (especially on urban roads), and the PEMFC stack switches between different operating points according to different driving states of the vehicle. Therefore, the actual operating conditions of the PEMFC stack are diverse, and it is difficult to include the system state variable data under all operating conditions in the training data set. If there is a big difference between the actual system state variable data and the data in the training data set, false alarms can easily occur.
To solve the above problems, a data-driven fault diagnosis method based on the steady-state identification is proposed for PEMFC systems in this paper. The system state variable data collected when the stack operates stably at a certain operating point are used for the algorithm training and fault diagnosis in the proposed method. Because the operating points of the PEMFC stack are usually determined, when the stack operates stably at a certain operating point, the data of system state variables (in this study, the variables that reflect the operation state of the PEMFC system, such as stack output voltage, stack output current, gas pressure in the supply manifold, etc.) are stable within a certain range. Therefore, using the system state variable data collected when the stack operates stably at a certain operating point for the diagnosis of PEMFC system faults can improve the diagnosis accuracy and reduce false fault alarms.
In the proposed method, the support vector data description (SVDD) is used to identify the steady state of the system. The SVDD is a one-class classification algorithm. It can be trained using only the change rates of the system state variables in unit time when the stack operates stably at a certain operating point. This avoids the problem that the training data set cannot contain the change rates of system state variables in unit time under every possible change of the system operation state, so that the accuracy of the steady-state identification can be improved. In the process of steady-state identification, it is difficult to quantify the change rates of each system state variable in unit time because the sensor output signal contains measurement noise. Therefore, linear least squares fitting (LLSF) is used to fit the data of each system state variable in unit time, and the change rates of each system state variable are estimated by the slopes of the fitting lines. In order to remove the invalid data in the data samples, the density-based spatial clustering of applications with noise (DBSCAN) is used to identify and remove the invalid data by identifying the outliers in the data samples. Because the calculation capability of the vehicle embedded system is lower than that of an ordinary computer, the time taken to diagnose a fault will be too long if a pattern recognition algorithm with a large amount of calculation is adopted. Therefore, the RVM, which requires less calculation, is used to diagnose the faults in the proposed method. In the RVM training process, the ABC is used to optimize the parameters of the RVM, which further improves the diagnosis accuracy and reduces the occurrence of false fault alarms.

Relevance Vector Machine
The feasibility of the online fault diagnosis for the fuel cell stack by the SVM in an embedded system was shown in [19]. The RVM proposed by Tipping et al., 2001 [20] is an improved algorithm of the SVM. It is a supervised learning pattern recognition algorithm based on Bayesian principles. The RVM can be used to find a decision boundary that separates the training data with different labels, so that the fault diagnosis can be realized by identifying which side of the decision boundary the test data are on. Compared with the SVM, the RVM has lower computational complexity and superior diagnostic capability [1]. In this paper, the RVM is briefly described. The interested reader is referred to [20,21] for the complete description of the RVM.
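Given input training data samples $\{\mathbf{x}_n\}_{n=1}^{N}$ and target values $\{t_n\}_{n=1}^{N}$, the output $y(\mathbf{x};\mathbf{w})$ of the RVM model can be expressed as [1,20]:

$$y(\mathbf{x};\mathbf{w}) = \sum_{i=1}^{N} w_i K(\mathbf{x},\mathbf{x}_i) + w_0 \tag{1}$$

where $\mathbf{w} = [w_0, w_1, w_2, \ldots, w_N]^{T}$ is the weight vector, $\mathbf{x}$ is the input vector, and $K(\mathbf{x},\mathbf{x}_i)$ is the kernel function. By considering the classification performance of the RVM and its computational complexity, the radial basis function (RBF) is chosen as the kernel function of the RVM because it has low calculation complexity and only one parameter (kernel width $r$). The formula of the kernel function is:

$$K(\mathbf{x},\mathbf{x}_i) = \exp\left(-\frac{\lVert \mathbf{x}-\mathbf{x}_i \rVert^{2}}{r^{2}}\right) \tag{2}$$

For binary classification, the linear model is generalized by applying the logistic sigmoid function $\sigma(y) = 1/(1 + e^{-y})$ to $y(\mathbf{x};\mathbf{w})$ and adopting the Bernoulli distribution for $P(t\mid\mathbf{x})$. The likelihood function $P(\mathbf{t}\mid\mathbf{w})$ is:

$$P(\mathbf{t}\mid\mathbf{w}) = \prod_{n=1}^{N} \sigma\{y(\mathbf{x}_n;\mathbf{w})\}^{t_n}\,\bigl[1-\sigma\{y(\mathbf{x}_n;\mathbf{w})\}\bigr]^{1-t_n} \tag{3}$$

where $\mathbf{t} = [t_1, \ldots, t_N]^{T}$, $t_n \in \{0, 1\}$. However, $\mathbf{w}$ cannot be obtained analytically. Therefore, an approximation procedure based on Laplace's method [22] is used. Since $p(\mathbf{w}\mid\mathbf{t},\boldsymbol{\alpha}) \propto P(\mathbf{t}\mid\mathbf{w})\,p(\mathbf{w}\mid\boldsymbol{\alpha})$ ('$\propto$' denotes direct proportion), for a constant $\boldsymbol{\alpha}$, the most probable (maximum a posteriori) weights $\mathbf{w}_{MP}$ can be obtained by maximizing $P(\mathbf{t}\mid\mathbf{w})\,p(\mathbf{w}\mid\boldsymbol{\alpha})$, or equivalently its logarithm:

$$\log\{P(\mathbf{t}\mid\mathbf{w})\,p(\mathbf{w}\mid\boldsymbol{\alpha})\} = \sum_{n=1}^{N}\bigl[t_n \log y_n + (1-t_n)\log(1-y_n)\bigr] - \frac{1}{2}\mathbf{w}^{T}\mathbf{A}\mathbf{w} \tag{4}$$

where $\boldsymbol{\alpha}$ is a vector of $N + 1$ hyperparameters, $y_n = \sigma\{y(\mathbf{x}_n;\mathbf{w})\}$ and $\mathbf{A} = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$. Equation (4) is differentiated twice to give:

$$\nabla_{\mathbf{w}}\nabla_{\mathbf{w}} \log p(\mathbf{w}\mid\mathbf{t},\boldsymbol{\alpha})\big|_{\mathbf{w}_{MP}} = -(\boldsymbol{\Phi}^{T}\mathbf{B}\boldsymbol{\Phi} + \mathbf{A}) \tag{5}$$

where $\boldsymbol{\Phi}$ is the $N \times (N+1)$ design matrix whose $n$th row is $[1, K(\mathbf{x}_n,\mathbf{x}_1), \ldots, K(\mathbf{x}_n,\mathbf{x}_N)]$, and $\mathbf{B} = \mathrm{diag}(\beta_1, \ldots, \beta_N)$ with $\beta_n = \sigma\{y(\mathbf{x}_n)\}[1-\sigma\{y(\mathbf{x}_n)\}]$. The hyperparameters in matrix $\mathbf{A}$ are updated using:

$$\alpha_i^{new} = \frac{1 - \alpha_i \Sigma_{ii}}{w_{MP,i}^{2}} \tag{6}$$

where $\Sigma_{ii}$ is the $i$th diagonal element of the posterior covariance matrix $\boldsymbol{\Sigma} = (\boldsymbol{\Phi}^{T}\mathbf{B}\boldsymbol{\Phi} + \mathbf{A})^{-1}$.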

Support Vector Data Description
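SVDD is developed based on SVM theory for one-class classification. Its goal is to find the smallest hypersphere or domain that contains all or almost all target samples [23–25]. In this paper, the SVDD is briefly described. A complete description of the SVDD can be found in [26,27]. Given the training sample set $D = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m\}$, let $\mathbf{a}$ denote the center of the sphere and $R$ denote the radius of the sphere. The minimum hypersphere is given by:

$$\min_{R,\mathbf{a},\xi_i}\; R^{2} + C\sum_{i=1}^{m}\xi_i \quad \text{s.t.}\quad \lVert \mathbf{x}_i - \mathbf{a}\rVert^{2} \le R^{2} + \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, 2, \ldots, m \tag{7}$$

where $C$ and $\xi_i$ are the penalty factor and slack variables, respectively. In [26,27], the Lagrange multiplier method was used to solve the constrained optimization problem described in Equation (7), so that the dual problem of Equation (7) can be expressed as:

$$\max_{\alpha_i,\gamma_i}\;\min_{R,\mathbf{a},\xi_i}\; L(R,\mathbf{a},\alpha_i,\gamma_i,\xi_i) = R^{2} + C\sum_{i=1}^{m}\xi_i - \sum_{i=1}^{m}\alpha_i\bigl(R^{2} + \xi_i - \lVert \mathbf{x}_i - \mathbf{a}\rVert^{2}\bigr) - \sum_{i=1}^{m}\gamma_i\xi_i \tag{8}$$

where $\alpha_i \ge 0$ and $\gamma_i \ge 0$ are Lagrange multipliers, and $L(R,\mathbf{a},\alpha_i,\gamma_i,\xi_i)$ is the Lagrangian function with $R$, $\mathbf{a}$, $\alpha_i$, $\gamma_i$, and $\xi_i$ as independent variables. Similar to the RVM, the radial basis kernel function is introduced into the SVDD:

$$\max_{\alpha_i}\; \sum_{i=1}^{m}\alpha_i K(\mathbf{x}_i,\mathbf{x}_i) - \sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) \quad \text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{m}\alpha_i = 1 \tag{9}$$

where $K(\mathbf{x}_i,\mathbf{x}_j)$ is the kernel function. By solving Equation (9), formulas of the sphere center and sphere radius can be obtained. The solution process can be found in [25].
For the test data sample $\mathbf{z}$, when the distance from $\mathbf{z}$ to the center $\mathbf{a}$ of the sphere is not greater than the radius $R$ of the sphere (satisfying Equation (10)), it can be determined that the sample $\mathbf{z}$ is in the sphere:

$$K(\mathbf{z},\mathbf{z}) - 2\sum_{i=1}^{m}\alpha_i K(\mathbf{z},\mathbf{x}_i) + \sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) \le R^{2} \tag{10}$$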
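The membership test referred to as Equation (10) can be illustrated with a short sketch. It assumes that the multipliers and the squared radius have already been obtained by training, and that the Gaussian RBF takes the form exp(-||a - b||^2 / width^2); the function names, the kernel normalization and the toy values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(a, b, width):
    """Gaussian RBF kernel. The exp(-||a-b||^2 / width^2) form is an assumption."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-np.dot(diff, diff) / width**2)

def squared_distance_to_center(z, support_vectors, alphas, width):
    """Kernel-space squared distance from z to the SVDD sphere center."""
    k_zz = rbf_kernel(z, z, width)
    k_zx = np.array([rbf_kernel(z, x, width) for x in support_vectors])
    gram = np.array([[rbf_kernel(xi, xj, width) for xj in support_vectors]
                     for xi in support_vectors])
    return k_zz - 2.0 * alphas @ k_zx + alphas @ gram @ alphas

def is_inside(z, support_vectors, alphas, width, radius_sq):
    """True when z lies inside (or on) the hypersphere."""
    return squared_distance_to_center(z, support_vectors, alphas, width) <= radius_sq

# Toy example with two hypothetical support vectors and equal multipliers.
support_vectors = np.array([[0.0, 0.0], [1.0, 0.0]])
alphas = np.array([0.5, 0.5])
width = 1.0
# The squared radius is the distance from any boundary support vector to the center.
radius_sq = squared_distance_to_center(support_vectors[0], support_vectors, alphas, width)

print(is_inside([0.5, 0.0], support_vectors, alphas, width, radius_sq))  # between the vectors
print(is_inside([5.0, 0.0], support_vectors, alphas, width, radius_sq))  # far away
```

In the steady-state identification, z would be the normalized vector of change rates for one unit-time window, so a sample outside the sphere is treated as unsteady.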

Artificial Bee Colony Algorithm
The penalty factor and kernel width of the SVDD and the kernel width of the RVM have a great influence on the accuracy and generalization ability of the SVDD and RVM, respectively. Therefore, a reasonable selection of these parameters can improve the performance of the SVDD and RVM. The ABC proposed by Karaboga et al., 2008 [28] is a swarm intelligence algorithm with the advantages of simplicity, strong robustness and easy implementation. In the ABC, all bees adopt the greedy selection mechanism, which makes it faster than competing algorithms [29]. Therefore, the penalty factor of the SVDD and the kernel widths of the SVDD and RVM are optimized by the ABC in this study. The calculation procedure of the ABC can be described briefly as follows.
(1) Set the population size $N$, the dimension of the search space $D$, the maximum number of iterations without an update (limit), and the search range $[x_{min}, x_{max}]$.
(2) According to Equation (11), generate the initial solutions by random search in the search range:

$$x_{id} = x_{d}^{min} + r\,(x_{d}^{max} - x_{d}^{min}) \tag{11}$$

where $d = 1, 2, 3, \ldots, D$; $i = 1, 2, 3, \ldots, N$; $r$ is a random number in [0,1]; $x_{id}$ represents the $d$th dimension of the $i$th food source; $x_{d}^{min}$ is the lower bound of the $d$th dimension of the food source; and $x_{d}^{max}$ is the upper bound of the $d$th dimension of the food source.
(3) The employed bees update food sources by the neighborhood search. The formula of the neighborhood search is as follows:

$$v_{id} = x_{id} + \varphi_{id}\,(x_{id} - x_{kd}) \tag{12}$$

where $v_{id}$ represents the $d$th dimension of the $i$th new food source, $\varphi_{id}$ is a random number in [−1,1], and $k \ne i$ is the index of a randomly selected food source.
(4) The new food source generated by the neighborhood search replaces the previous optimal food source and becomes the new optimal food source if its fitness is better than that of the previous optimal food source.
(5) According to the probability $p$, the onlooker bee selects the food source to search in its neighborhood. The calculation formula of the probability $p$ is [30]:

$$p_i = \frac{fit_i}{\sum_{j=1}^{N} fit_j} \tag{13}$$

where $fit_i$ is the fitness value of the $i$th food source.
(6) If a food source cannot be updated after limit iterations, the scout bee randomly selects a new food source and then continues the neighborhood search.
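The six steps above can be sketched as a compact minimizer. This is an illustrative sketch of the generic ABC procedure, not the authors' code: the function name `abc_minimize`, the default settings, the fitness mapping 1/(1 + cost) (valid only for non-negative costs) and the sphere test function are all assumptions. In the paper's setting, the cost would be derived from the diagnosis accuracy obtained with a candidate penalty factor and kernel width.

```python
import numpy as np

def abc_minimize(cost, lower, upper, n_sources=20, limit=40, n_iter=200, seed=0):
    """Minimal artificial bee colony sketch for a non-negative cost function."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Step 2: random initial food sources inside the search range.
    sources = lower + rng.random((n_sources, dim)) * (upper - lower)
    costs = np.array([cost(s) for s in sources])
    trials = np.zeros(n_sources, dtype=int)  # consecutive failed updates per source
    best_idx = int(np.argmin(costs))
    best, best_cost = sources[best_idx].copy(), costs[best_idx]

    def try_neighbor(i):
        # Step 3: perturb one random dimension relative to another source k != i.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.integers(dim)
        candidate = sources[i].copy()
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i, d] - sources[k, d])
        candidate = np.clip(candidate, lower, upper)
        c = cost(candidate)
        # Step 4: greedy selection keeps the better of old and new source.
        if c < costs[i]:
            sources[i], costs[i], trials[i] = candidate, c, 0
        else:
            trials[i] += 1

    for _ in range(n_iter):
        for i in range(n_sources):                 # employed bees
            try_neighbor(i)
        fit = 1.0 / (1.0 + costs)                  # assumed fitness mapping
        prob = fit / fit.sum()                     # Step 5: selection probability
        for i in rng.choice(n_sources, size=n_sources, p=prob):  # onlooker bees
            try_neighbor(i)
        # Memorize the best source before any source is abandoned.
        i_best = int(np.argmin(costs))
        if costs[i_best] < best_cost:
            best, best_cost = sources[i_best].copy(), costs[i_best]
        # Step 6: the scout bee replaces a source that exceeded the limit.
        worn = int(np.argmax(trials))
        if trials[worn] >= limit:
            sources[worn] = lower + rng.random(dim) * (upper - lower)
            costs[worn] = cost(sources[worn])
            trials[worn] = 0
    return best, best_cost

# Usage on a toy two-parameter problem (sphere function, minimum at the origin).
best, best_cost = abc_minimize(lambda x: float(np.sum(x**2)), [-5.0, -5.0], [5.0, 5.0])
print(best, best_cost)
```

Keeping a separate memory of the best source is a common design choice, since the scout step may otherwise discard it.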

Density-Based Spatial Clustering of Applications with Noise
DBSCAN is an unsupervised clustering algorithm based on the data density. It can recognize clusters of arbitrary shape and is insensitive to the input order of data. In addition, it avoids two problems that arise when the FCM is used to remove invalid data: the data labels are unknown, and the number of clusters is difficult to determine. The specific calculation steps of the DBSCAN are as follows [31,32]:
(1) Select any data point p from the data set. If p does not belong to any cluster and is not marked as an outlier, the number of data points in its neighborhood with a radius of Eps is checked. If the number is equal to or greater than minPts, a new cluster D is established, and all points in the neighborhood are added to the candidate set M. Otherwise, p is marked as an outlier.
(2) Select a data point q in the candidate set M that has not been processed (it does not belong to any cluster and is not an outlier), add q to the cluster D, and check the number of data points in its neighborhood with the radius of Eps. If the number is equal to or greater than minPts, the data points in the neighborhood of q with the radius of Eps are added to M.
(3) Repeat (2) and continue to check the unprocessed data in the candidate set M until there are no unprocessed data in M.
(4) Repeat (1) to (3) until all the data in the data set belong to clusters or are marked as outliers.
In the DBSCAN, the distance between different data points is calculated as the Euclidean distance:

$$d(\mathbf{x}_i,\mathbf{x}_j) = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2 \tag{14}$$

where $d$ is the distance between the different data points, $\mathbf{x}$ is a data point, $i, j = 1, 2, 3, \ldots$, and $i \ne j$.
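The clustering steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions rather than the authors' code: the Euclidean distance is assumed, a point counts itself as a member of its own neighborhood, and (as in the standard algorithm) a point first marked as an outlier is reassigned when it later falls within the neighborhood of a dense point. The toy data stand in for estimated change rates.

```python
import numpy as np

def dbscan(data, eps, min_pts):
    """Return one label per point: a cluster index >= 0, or -1 for an outlier."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    # Pairwise Euclidean distances and Eps-neighborhoods (each point includes itself).
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]

    UNVISITED, OUTLIER = -2, -1
    labels = np.full(n, UNVISITED)
    cluster = -1
    for p in range(n):                            # step (1)
        if labels[p] != UNVISITED:
            continue
        if len(neighbors[p]) < min_pts:
            labels[p] = OUTLIER
            continue
        cluster += 1
        labels[p] = cluster
        candidates = list(neighbors[p])           # candidate set M
        while candidates:                         # steps (2) and (3)
            q = candidates.pop()
            if labels[q] == OUTLIER:
                labels[q] = cluster               # border point reached from a dense point
            if labels[q] != UNVISITED:
                continue
            labels[q] = cluster
            if len(neighbors[q]) >= min_pts:
                candidates.extend(neighbors[q])
    return labels                                 # step (4) is the outer loop

# Toy change-rate samples: five steady-state values and one start-up transient.
rates = np.array([[0.0], [0.1], [-0.1], [0.05], [-0.05], [10.0]])
labels = dbscan(rates, eps=0.3, min_pts=3)
print(labels)
valid = rates[labels != -1]   # invalid data (outliers) removed
```

No cluster count or label is needed as input, which is the property the text contrasts with the FCM.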

Fault Diagnosis for PEMFC Systems Based on Steady-State Identification
The fault diagnosis process of the PEMFC system based on the steady-state identification can be divided into two stages: off-line training and online diagnosis. The detailed processes of off-line training and online diagnosis are shown in Figure 1. As shown in Figure 1, the process of the off-line training is as follows: (1) the state variable data collected during the steady-state operation of the PEMFC system are extracted from all the data; (2) the DBSCAN is used to remove the invalid data (outliers) in the data samples, so that the invalid data do not affect the subsequent data processing results; (3) the average value of each state variable in unit time is calculated, and the LLSF is used to fit the system state variable data in unit time so that the change rate of each state variable in unit time can be estimated by the slope of the fitting line; (4) the change rates and average values of each state variable per unit time are normalized to prevent larger values from dominating smaller ones in the training sample set and to decrease the calculation load [18] (the normalization formula can be found in [13]); (5) the SVDD and RVM are trained by the normalized change rates and average values of each state variable per unit time, and the parameters of the SVDD and RVM are optimized by the ABC. In the optimization process, the accuracy of the fault diagnosis method is taken as the fitness function, so that the accuracy of the steady-state identification and fault diagnosis can be improved.
In the process of the online diagnosis, sensors are first used to collect the state variable data of the PEMFC system in unit time, and the DBSCAN is used to remove the invalid data. Then, the LLSF is used to fit the data of system state variables in unit time. After the fitting results are normalized, the trained SVDD is used to judge whether the system is in the steady state. If the system is in the steady state, the average value of each state variable in the unit time is calculated and normalized, and the fault diagnosis is then performed by the trained RVM. If the system is in the unsteady state, the fault diagnosis is not performed, and sensors are used to collect the data of system state variables in unit time again.
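The online stage described above amounts to a gate in front of the classifier, which the following sketch illustrates. It makes several assumptions: the trained SVDD and RVM are represented by hypothetical callables (`is_steady` and `classify`), min-max scaling stands in for the normalization formula of [13], and the DBSCAN outlier-removal step is omitted for brevity.

```python
import numpy as np

def window_features(window, sampling_period=0.01):
    """window: array of shape (n_samples, n_variables) for one unit-time window.
    Returns the least-squares slope and the mean of every state variable."""
    window = np.asarray(window, dtype=float)
    t = np.arange(window.shape[0]) * sampling_period
    slopes = np.polyfit(t, window, deg=1)[0]   # polyfit fits each column separately
    return slopes, window.mean(axis=0)

def min_max_normalize(values, lower, upper):
    """Assumed normalization: map [lower, upper] onto [0, 1]."""
    return (np.asarray(values, dtype=float) - lower) / (upper - lower)

def diagnose_window(window, is_steady, classify, slope_bounds, mean_bounds):
    """Return a fault label, or None when the window is judged unsteady."""
    slopes, means = window_features(window)
    if not is_steady(min_max_normalize(slopes, *slope_bounds)):
        return None                            # skip diagnosis, collect the next window
    return classify(min_max_normalize(means, *mean_bounds))

# Hypothetical stand-ins for the trained SVDD and RVM.
is_steady_stub = lambda s: bool(np.all(np.abs(s - 0.5) < 0.05))
classify_stub = lambda m: "F0"

steady = np.tile([1.5e5, 60.0], (50, 1))                         # constant signals
ramp = steady + np.outer(np.arange(50) * 0.01, [50.0, 0.0])      # first variable drifts
slope_bounds = (-100.0, 100.0)
mean_bounds = (np.array([1.0e5, 0.0]), np.array([2.0e5, 100.0]))

print(diagnose_window(steady, is_steady_stub, classify_stub, slope_bounds, mean_bounds))
print(diagnose_window(ramp, is_steady_stub, classify_stub, slope_bounds, mean_bounds))
```

Returning None for an unsteady window mirrors the text: no diagnosis is issued, and the next window is collected instead.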
The data-driven fault diagnosis method needs the data of the system state variables to train the pattern recognition algorithm. These data include the state variable data of the system under normal operation and various faults. However, simulating various faults on the actual PEMFC system test bench may lead to permanent damage [33]. A more dangerous scenario is that safety accidents such as explosion or combustion may arise once the air mixes with the hydrogen in the stack when the proton exchange membrane in the stack perforates or ruptures during the fault simulation.
Therefore, the PEMFC system model proposed by Pukrushpan et al., 2004 [34] is used to verify the diagnosis effect of the proposed method in this paper. This model was developed based on the PEMFC system of the Ford P2000 vehicle and is widely used in the verification of diagnosis methods for faults of PEMFC systems [3,5,6,14,35]. The detailed principles and relevant formulas of the PEMFC system model are shown in [34]. By changing the model parameters, simulation of the air compressor fault and air supply manifold fault of the PEMFC system can be realized. In this paper, F0, F1, and F2 are used to describe three different modes of the PEMFC system. F0 represents normal operation of the PEMFC system. F1 represents the state of the PEMFC system when the system has a fault of increasing friction in the compressor motor. F1 can be simulated by changing the compressor constant (k_v) in the PEMFC system model, and the changing magnitudes are set as +6%, +8%, +10%, +12% and +14%, respectively, in this paper. F2 represents the state of the PEMFC system when the system has a fault of air leak in the air supply manifold. F2 can be simulated by changing the supply manifold outlet orifice constant (k_sm,out) in the PEMFC system model, and the changing magnitudes are set as −6%, −8%, −10%, −12% and −14%, respectively, in this paper [3,6,14,35].
To verify the diagnosis effect of the method proposed in this paper, F0-F2 were each simulated under the dynamic condition (Conditions_1 [34], see Figure 2 and Table 1 for details), and the system state variable data obtained under the different modes were collected. However, the output signal of each sensor contains measurement noise in the actual PEMFC system. Therefore, in order to simulate the output signals of actual sensors, Gaussian noises with different variances were added to the data of different system state variables according to the sensor type. In this paper, the standard deviations of the Gaussian noises used to simulate the measurement noises of different sensors are 4.5 × 10⁻⁴ kg/s for the flow sensor [6], 10 rad/s for the angular velocity sensor [6], 300 Pa for pressure sensors [6], 0.07 K for the temperature sensor [6] and 0.001 V for the voltage sensor [8]. Based on the sensor configuration of the actual automotive PEMFC system, eight state variables were selected from the PEMFC system model as the input of the proposed fault diagnosis method. They include the stack output current, gas temperature at the air compressor outlet, air output flow rate of the air compressor, compressor speed, stack output voltage, gas pressure at the air compressor outlet, gas pressure in the hydrogen supply manifold and air pressure at the inlet of the stack. These state variables can be measured by different types of sensors (the errors are generally less than ±1% of the full scale). To verify the diagnostic accuracy of the proposed method under different fault severities, the change magnitudes of parameters affected by faults were set to 6%, 8%, 10%, 12% and 14%, respectively, during the acquisition of state variable data.
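The noise injection described above can be sketched as follows. The standard deviations are the ones quoted in the text; the dictionary keys, the function name and the constant 1.5 × 10⁵ Pa test signal are illustrative assumptions.

```python
import numpy as np

# Standard deviations quoted in the text; the key names are illustrative.
NOISE_STD = {
    "flow_kg_s": 4.5e-4,
    "speed_rad_s": 10.0,
    "pressure_Pa": 300.0,
    "temperature_K": 0.07,
    "voltage_V": 0.001,
}

def add_sensor_noise(signal, sensor_type, rng):
    """Add zero-mean Gaussian measurement noise for the given sensor type."""
    signal = np.asarray(signal, dtype=float)
    return signal + rng.normal(0.0, NOISE_STD[sensor_type], size=signal.shape)

rng = np.random.default_rng(42)
clean = np.full(10_000, 1.5e5)                  # hypothetical constant pressure in Pa
noisy = add_sensor_noise(clean, "pressure_Pa", rng)
print(noisy.mean(), noisy.std())
```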
For F0, F1 and F2, the results obtained by simulating the PEMFC system model under steady-state conditions, with a change magnitude of 10% for the parameters affected by faults, are taken as the training sample sets. The details of the training sample sets are shown in Table 2. For F0, F1 and F2, the simulation results of the PEMFC system model under Conditions_1, with change magnitudes of 6%, 8%, 10%, 12% and 14% for the parameters affected by faults, are taken as the testing sample sets in order to test the diagnostic performance of the proposed method under different fault severities. Since eight state variables were selected for the fault diagnosis, each data sample in the training and test sample sets is eight-dimensional. In the process of data collection and processing, a long unit time or a high sensor sampling frequency increases the number of data points in unit time used for the LLSF, so that the influence of sensor output signal noise on the estimation of the change rates of system state variables can be reduced. At the same time, however, a long unit time degrades the real-time performance of the fault diagnosis, and a high sensor sampling frequency increases the amount of data the embedded system must process. Therefore, it is necessary to set the unit time and the sampling period of the sensor reasonably in the process of data collection and processing. In this study, the unit time is set to 0.5 s (namely, the length of the data acquisition time window for a single diagnosis is 0.5 s), and the data sampling period is 0.01 s. Therefore, the number of data points per unit time used for the LLSF is 50.
1 Because the output current of the stack is the input of the model, it is not subject to steady-state identification.
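The trade-off between window length and noise sensitivity can be made concrete with the standard result for the variance of a least-squares slope. For n equally spaced samples with sampling period Δt and independent noise of standard deviation σ, the slope estimate has standard deviation σ / (Δt · sqrt(n(n² − 1)/12)). The sketch below evaluates this expression and checks it by simulation; independent Gaussian noise and the helper names are assumptions made for illustration.

```python
import numpy as np

def slope_std(noise_std, n_samples, sampling_period):
    """Theoretical standard deviation of a least-squares slope estimate
    for equally spaced samples with independent noise."""
    spread = sampling_period * np.sqrt(n_samples * (n_samples**2 - 1) / 12.0)
    return noise_std / spread

def empirical_slope_std(noise_std, n_samples, sampling_period, n_trials=2000, seed=0):
    """Monte Carlo check: fit lines to pure noise and measure the slope scatter."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) * sampling_period
    noise = rng.normal(0.0, noise_std, size=(n_samples, n_trials))
    slopes = np.polyfit(t, noise, deg=1)[0]    # one slope per trial (column)
    return slopes.std()

# Settings from the text: 0.5 s window, 0.01 s sampling period, 300 Pa pressure noise.
theory = slope_std(300.0, 50, 0.01)
simulated = empirical_slope_std(300.0, 50, 0.01)
print(theory, simulated)

# Doubling the window to 1 s (100 samples) reduces the slope scatter markedly.
print(slope_std(300.0, 100, 0.01))
```

With the settings used in this study, the expression gives a scatter of about 294 Pa/s for the pressure change rate. Because the scatter falls roughly with the 3/2 power of the window length, a longer window is effective against noise but costly in response time.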
The fault simulation method and the change magnitudes of parameters affected by faults used in this paper follow [14], and the diagnosis method for PEMFC system faults based on the possibilistic fuzzy C-means clustering artificial bee colony support vector machine (PFCM-ABC-SVM) proposed in [14] is an improvement of the classical method that uses the SVM for fault diagnosis. Therefore, the PFCM-ABC-SVM diagnosis method was chosen for comparison with the diagnosis method proposed in this paper in terms of diagnosis accuracy. To demonstrate the steady-state identification process of the proposed method, the data of the air pressure at the inlet of the stack were randomly selected as an example from the seven diagnostic variables other than the stack output current (the stack output current is the model input). Figure 3 shows the original data of air pressure at the inlet of the stack obtained by simulation, the pressure change rates per unit time estimated by the LLSF, the identification results of invalid data and the steady-state identification results. Under the steady-state operation conditions, both the original simulation data of air pressure at the inlet of the stack and the change rates of this pressure per unit time estimated by the LLSF are shown in Figure 3a. Because Gaussian noise is added to the output signal of the pressure sensor, it is difficult to directly calculate the change rate of air pressure in unit time (0.5 s). Therefore, the LLSF is used to fit the gas pressure data in unit time, and the change rate of gas pressure in unit time is estimated by the slope of the fitting line. The estimated results show that the change rate of gas pressure within 0.5 s of the start of simulation is significantly higher than that during the steady-state operation of the system. This is because the air pressure at the inlet of the stack converges rapidly from the initial value to the steady value in a period of time after the start of simulation. In this period, the system is in an unstable state. Since the SVDD needs to be trained by the change rate data of the state variables collected during the steady-state operation of the system, the pressure change rate data within 0.5 s of the start of simulation need to be recognized and removed. Figure 3b,c show the recognition results of invalid data in the sample set using the FCM and DBSCAN, respectively. As can be seen from Figure 3b, the FCM not only identifies the pressure change rate data within 0.5 s of the start of simulation as invalid data but also mistakenly identifies many steady-state operation data in the sample as invalid data, which leads to the loss of valid data. As shown in Figure 3c, compared with the FCM, the DBSCAN can accurately identify the invalid data in the sample through the data density, which ensures the reliability of data for SVDD training. As shown in Figure 3d, the steady-state identification method of the PEMFC system based on the SVDD can accurately identify the data collected during the steady-state operation of the system, which provides a guarantee for the accurate diagnosis of PEMFC system faults.

Diagnostic Accuracy under Different Change Magnitudes of Parameters Affected by Faults
When the change magnitude of parameters affected by faults is 10% (equal to the change magnitude set in the training data sets), the diagnostic accuracy (the percentage of test samples that are correctly diagnosed) of different fault diagnosis methods under the different modes (F0, F1 and F2) is as shown in Figure 4. The proposed method can automatically identify the state variable data of the system collected during the stack switching between different operating points. These data are liable to cause diagnostic errors. Therefore, when the change magnitude of parameters affected by faults is 10%, the proposed method has a higher diagnostic accuracy than the PFCM-ABC-SVM, and its diagnostic accuracy can reach 100%. To test the diagnostic accuracy of the proposed method under different fault severities, the proposed method was tested with the model simulation results under different change magnitudes of parameters affected by faults (6%, 8%, 10%, 12% and 14%). The diagnostic accuracy is shown in Figure 5. Under different change magnitudes of parameters affected by faults, the proposed method has a higher diagnostic accuracy than the PFCM-ABC-SVM. When the change magnitudes of parameters affected by faults are 8%, 10%, 12% and 14%, the diagnostic accuracy of the method proposed in this paper is 100%. When the change magnitude of parameters affected by faults is 6%, the diagnostic accuracy of the method proposed in this paper decreases slightly, to 98.1%. The reason is that the changes of system state variables caused by faults are small when the change magnitude of parameters affected by faults is small, so the system state variables under fault conditions are closer to those under the normal condition. Therefore, the diagnostic accuracy decreases. For the PFCM-ABC-SVM, the diagnostic accuracy is the highest when the change magnitude of parameters affected by faults set in the test set is equal to that in the training set. However, when the change magnitudes of parameters affected by faults set in the test set are not equal to those in the training set, the diagnostic accuracy of the PFCM-ABC-SVM decreases.

The Diagnostic Accuracy under Different Operating Conditions
To verify the diagnostic performance of the proposed method under conditions different from those set in the training data set, the method was tested using Conditions_2, which is symmetrical to Conditions_1. Conditions_2 was chosen for two reasons. (1) The difference between Conditions_2 and Conditions_1 is large. Conditions_1 is a process in which the stack output current increases gradually, which simulates vehicle acceleration to a certain extent, whereas Conditions_2 is a process in which the stack output current decreases gradually, which simulates vehicle deceleration to a certain extent. (2) Because the operating points of the stack in Conditions_2 are the same as those in Conditions_1, the PEMFC model proposed by Pukrushpan et al., 2004 [34] can run correctly and stably. The values of the stack output current under Conditions_2 are shown in Figure 6 and Table 3. When the change magnitude of the parameters affected by faults is 10%, the diagnostic accuracy of the different fault diagnosis methods under Conditions_2 is shown in Figure 7.
As shown in Figure 7, when the operating conditions differ from those set in the training data set, the diagnostic accuracy of the PFCM-ABC-SVM decreases slightly. To analyze the reasons for this decline, a diagnostic variable was randomly selected from the seven diagnostic variables other than the stack output current to visualize the diagnostic results (the air pressure at the inlet of the stack was again taken as the example). For mode F0, the distribution of the data samples that are diagnosed correctly and incorrectly is shown in Figure 8. The data shown in Figure 8 are the air pressure values at the inlet of the stack collected during operation under Conditions_2 with no system fault. The incorrectly diagnosed data in Figure 8 are the data that cause false alarms of the PFCM-ABC-SVM. As shown in Figure 8, the data samples that are diagnosed incorrectly by the PFCM-ABC-SVM are mainly the system state variable data collected while the stack switches between different operating points. In Conditions_2, these switching data differ considerably from the data in the training data set, and the diagnostic accuracy of the PFCM-ABC-SVM is reduced as a result.
The method proposed in this paper can automatically identify and remove the system state variable data collected while the stack switches between different operating points. Therefore, when the actual operating condition of the PEMFC system differs from that set in the training data set, the proposed method can still maintain a high diagnostic accuracy and reduce the occurrence of false fault alarms. However, because the proposed method cannot diagnose a fault in time if the fault occurs while the stack is switching between operating points, it is only suitable for diagnosing component faults that cannot cause significant harm to the vehicle or the PEMFC system in a short time.
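The gating behaviour described above, in which data collected during operating-point switching are excluded before diagnosis, can be illustrated with a minimal sketch. It is not the authors' implementation. A simple threshold on the absolute change rate stands in for the SVDD boundary, and `classify` is a placeholder for the trained RVM. The function name, the `rate_limit` parameter and the `None` marker for skipped samples are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def diagnose_steady_samples(
    samples: Sequence[Sequence[float]],
    change_rates: Sequence[Sequence[float]],
    rate_limit: float,
    classify: Callable[[Sequence[float]], str],
) -> List[Optional[str]]:
    """Classify only the samples judged to be steady.

    Assumption: a sample is 'steady' when every change rate is within
    +/- rate_limit. The paper uses an SVDD model for this decision; the
    threshold here is only an illustrative stand-in.
    Transient samples receive None, i.e. no diagnosis is issued for them.
    """
    if len(samples) != len(change_rates):
        raise ValueError("samples and change_rates must have the same length")

    labels: List[Optional[str]] = []
    for x, rates in zip(samples, change_rates):
        is_steady = all(abs(r) <= rate_limit for r in rates)
        labels.append(classify(x) if is_steady else None)
    return labels


if __name__ == "__main__":
    # Hypothetical data: the second sample is taken during a transition.
    samples = [[1.0], [2.0], [3.0]]
    change_rates = [[0.0], [5.0], [0.1]]
    print(diagnose_steady_samples(samples, change_rates, 0.5, lambda x: "F0"))
```

The skipped entries make the limitation stated above explicit: no diagnosis is produced for samples flagged as transient, so a fault that occurs during switching is detected only once the system returns to steady state.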

Conclusions
In this study, a fault diagnosis method based on steady-state identification is proposed for the PEMFC system. Its diagnostic performance was verified using two faults simulated with the PEMFC system model: increased friction in the compressor motor and an air leak in the air supply manifold. The proposed method can automatically identify the system state variable data collected while the stack switches between different operating points, which are prone to causing diagnostic errors. Therefore, the accuracy of diagnosis can be improved and the occurrence of false fault alarms of the diagnosis system can be reduced. In the proposed method, the SVDD and RVM are used for the steady-state identification and the fault diagnosis, respectively. This avoids the problem that the training data set does not contain the change rates of the system state variables per unit time under all possible changes of the system operating state, and it allows the diagnostic program to run in the vehicle embedded system. Therefore, the proposed method has a certain practical application value. In the training process of the SVDD and RVM, the ABC is used to optimize their parameters, so that the accuracy of the steady-state identification and fault diagnosis can be further improved. To avoid the problems of unknown labels and an uncertain number of clusters in the system state variable data, the DBSCAN was introduced to remove the invalid data from the data set. In the simulation, to resemble the real situation more closely, Gaussian noise with different variances was added to the output signals of different types of sensors to simulate the measurement noise of the sensor output signals. Because the sensor output signals contain noise, it is difficult to quantify the change rates of the system state variable data per unit time. Therefore, the LLSF was introduced to estimate the change rates of the system state variable data.
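As a minimal sketch of how a change rate can be estimated from noisy samples with a linear least squares fit, the code below fits a straight line to each window of consecutive samples and takes its slope as the change rate. The window length, the function names and the example data are assumptions made for illustration; the source does not specify how the fitting windows are chosen.

```python
from typing import List, Sequence


def llsf_slope(t: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the least-squares straight line fitted to the points (t, y)."""
    n = len(t)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired samples")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0.0:
        raise ValueError("time stamps must not all be equal")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return sxy / sxx


def sliding_change_rates(
    t: Sequence[float], y: Sequence[float], window: int
) -> List[float]:
    """Change-rate estimate for every full window of consecutive samples.

    Assumption: a fixed-length sliding window (hypothetical choice).
    """
    if window < 2:
        raise ValueError("window must contain at least two samples")
    return [
        llsf_slope(t[i:i + window], y[i:i + window])
        for i in range(len(y) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical signal rising by 2 units per time step.
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.0, 3.0, 5.0, 7.0, 9.0]
    print(sliding_change_rates(t, y, window=3))
```

Fitting over a window instead of differencing adjacent samples averages out part of the measurement noise, at the cost of a delayed response to genuine changes.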
Provided that the test conditions are the same as those set in the data acquisition process of the training sample set, the simulation results show that the diagnostic accuracy of the proposed method is 100% in the different modes when the change magnitudes of the parameters affected by faults in the test sample set are equal to or higher than those in the training sample set. Furthermore, the proposed method also has high diagnostic accuracy when the change magnitudes of the parameters affected by faults in the test sample set are slightly lower than those in the training sample set, or when the operating conditions differ from those set in the data acquisition process of the training sample set. It can be seen that the fault diagnosis method proposed in this paper can effectively diagnose the faults of the PEMFC system and reduce false alarms of the diagnosis system.
However, it is worth noting that PEMFC system faults cannot be diagnosed by the proposed method while the stack is switching between different operating points. Therefore, the method is not suitable for diagnosing PEMFC system faults that may cause significant harm to the vehicle or the PEMFC system in a short time (such as hydrogen leakage). In the near future, we will study fault diagnosis for the PEMFC system under unsteady operation, and improve the proposed method using the posterior probability of the diagnosis results and a pattern recognition algorithm based on incremental learning, so that its practical application value can be further improved.

Figure 1. Flowchart of the diagnosis method based on the steady-state identification.

Figure 3. The steady-state identification of the PEMFC system (taking the data of the air pressure at the inlet of the stack as an example): (a) the original simulation data of the air pressure at the inlet of the stack under steady-state operating conditions and the corresponding change rates per unit time, estimated by the linear least squares fitting (LLSF); (b) the results of the invalid data identification using the fuzzy C-means (FCM) clustering; (c) the results of the invalid data identification using the density-based spatial clustering of applications with noise (DBSCAN); (d) the steady-state identification results.

Figure 4. The diagnostic accuracy of the proposed method and the possibilistic fuzzy C-means clustering artificial bee colony support vector machine (PFCM-ABC-SVM) under different modes (modes: F0, F1 and F2; operating conditions: Conditions_1; the change magnitude of parameters affected by faults: 10%).

Figure 5. The average diagnostic accuracy of the proposed method and the PFCM-ABC-SVM under different change magnitudes of parameters affected by faults (operating conditions: Conditions_1; the change magnitudes of parameters affected by faults: 6%, 8%, 10%, 12% and 14%).

Figure 8. The distribution of data samples that are diagnosed correctly and incorrectly by the PFCM-ABC-SVM.

Table 1. Output current values of the stack at different times under Conditions_1.

Table 2. The details of the training sample sets for the relevance vector machine (RVM) and the support vector data description (SVDD).

Table 3. Output current values of the stack at different times under Conditions_2.
Figure 7. The diagnostic accuracy of the proposed method and the PFCM-ABC-SVM under different modes (modes: F0, F1 and F2; operating conditions: Conditions_2; the change magnitude of parameters affected by faults: 10%).