Life Prediction Based on D-S ELM for PEMFC

: The proton exchange membrane fuel cell (PEMFC) is an extremely clean and e ﬃ cient power generation device. However, its limited lifespan has restricted the large-scale commercial development of PEMFCs. Life prediction is a promising solution for the further life extension of PEMFCs. In this paper, D-S ELM(DWT-SaDE ELM), deﬁne as, an enhanced extreme learning machine (ELM) optimized by discrete wavelet transform (DWT) and self-adaptive di ﬀ erential evolutionary algorithm (SaDE), is proposed to predict the remaining useful life (RUL) of PEMFCs. In D-S ELM, DWT is employed to extract available features from multi-input data with stochastic noise. Then, SaDE explores the optimal parameter conﬁguration for the ELM neural network. Moreover, the inﬂuence of training data sizes on the prediction results is discussed. Simulations show that D-S ELM has obvious advantages in prediction accuracy. Furthermore, the superiority of D-S ELM in small sample applicability, prediction speed and robustness make it more suitable for the online prediction of PEMFCs.


Introduction
Energy issue is the hottest topic in the 21th century. Hydrogen energy is attracting more and more attention due to its high efficiency, cleanliness and sustainability. As a prospective power supply device, the proton exchange membrane fuel cell (PEMFC) has been widely applied in many fields such as substation, cars, buses, locomotives, trams and portable devices [1][2][3]. However, the performance of each single cell will be affected by catalyst poisoning, membrane drying, water flooding, current spillover and other problems, resulting in temporary or permanent voltage degradation. In that case, the remaining useful life of fuel cells is greatly shortened, which greatly hinders the large-scale application of fuel cell technology in the industrial field. Prognostics and health management (PHM) [4], especially the life prediction of PEMFC, is a challenging issue to be addressed.
At present, life prediction methods can be mainly divided into three categories including model-driven methods, data-driven methods and hybrid-driven methods.
Model-driven methods are realized by an analysis model based on non-linear physical phenomena. Due to the lack of an adaptive aging model and the disturbance caused by the recovery phenomenon, Marine Jouin et al. [5] adopted a particle filter (PF) algorithm to predict power degradation based on an empirical model. Taejin Kim et al. [6] proposed a method based on an equivalent circuit model (ECM) and the training electrochemical impedance spectroscopy (EIS) data to predict the remaining useful life of PEMFC. Elodie Lechartier et al. [7] proposed a behavioral model of PEMFC consisting of the static model based on the Butler-Volmer equation and the dynamic model based on the electrical equivalent, respectively, to describe the physical phenomena of PEMFC. Xinfeng Zhang et al. [8] proposed an   [20]. In this paper, RUL prediction is considered under a constant current condition, in this case, FC1 data set which is observed under constant current stable environment J = 0.7 A/cm is chosen as raw experimental data.

Physical Parameters Control Range
To accurately predict the RUL of PEMFC, the selection of indicators characterizing the aging process is the primary problem. The curves of candidate indicators in the FC1 data set are shown in Figure 2. We can see that the candidate indicators (e.g., outlet hydrogen temperature, outlet water temperature and outlet hydrogen pressure) cannot reflect the aging phenomena of PEMFC from the curves of these indicators. Compared with the above indicators, the output voltage has a distinct downward trend, which can be used as an aging indicator.  In this paper, RUL prediction is considered under a constant current condition, in this case, FC1 data set which is observed under constant current stable environment J = 0.7A/cm 2 is chosen as raw experimental data.
To accurately predict the RUL of PEMFC, the selection of indicators characterizing the aging process is the primary problem. The curves of candidate indicators in the FC1 data set are shown in Figure 2. We can see that the candidate indicators (e.g., outlet hydrogen temperature, outlet water temperature and outlet hydrogen pressure) cannot reflect the aging phenomena of PEMFC from the curves of these indicators. Compared with the above indicators, the output voltage has a distinct downward trend, which can be used as an aging indicator. The characteristics of experimental data can be summarized as follows: (1) The size of monitoring data is large. For better analysis, data preprocessing is applied.
(2) The data set only includes the operation parameters (e.g., inlet hydrogen pressure) and the output parameters (e.g., voltage), but not the internal aging parameters (e.g., catalyst active surface area) of PEMFC. Considering the difficulty of establishing an accurate physical model, the data-driven method is a promising solution. (3) The data set has the characteristic of non-linear and non-Gauss disturbance, further data tracking in time series contributes to ensuring the accuracy of prediction.

Data Preprocessing
In this paper, the data preprocessing method includes data reduction and normalization. The purpose of data reduction is to simplify the original huge data without loss of data properties. The specific operation is to extract data from the original data with a step size of 100. To unify multiple physical quantities, data normalization is applied. The operation parameters and output parameters are unified in the range of [0,1].

Discrete Wavelet Transform
Discrete wavelet transform (DWT) can filter out noise and retain useful information of the time series [21]. Basic wavelet translation and scaling operations give the wavelet transform good local characteristics in both time and frequency domains, which is an important feature of the wavelet transform. Wavelet translation can obtain the time-domain information of time series and wavelet scaling can obtain the frequency-domain information of time series. The main process of DWT is decomposition and reconstruction. The decomposition process is to first determine a basic wavelet, decompose the time series into N-layer sub-wavelets and then get the approximate coefficients and detail coefficients. Figure 3 gives a brief introduction of decomposition hierarchically.  The characteristics of experimental data can be summarized as follows: (1) The size of monitoring data is large. For better analysis, data preprocessing is applied.
(2) The data set only includes the operation parameters (e.g., inlet hydrogen pressure) and the output parameters (e.g., voltage), but not the internal aging parameters (e.g., catalyst active surface area) of PEMFC. Considering the difficulty of establishing an accurate physical model, the data-driven method is a promising solution. (3) The data set has the characteristic of non-linear and non-Gauss disturbance, further data tracking in time series contributes to ensuring the accuracy of prediction.

Data Preprocessing
In this paper, the data preprocessing method includes data reduction and normalization. The purpose of data reduction is to simplify the original huge data without loss of data properties. The specific operation is to extract data from the original data with a step size of 100. To unify multiple physical quantities, data normalization is applied. The operation parameters and output parameters are unified in the range of [0,1].

Discrete Wavelet Transform
Discrete wavelet transform (DWT) can filter out noise and retain useful information of the time series [21]. Basic wavelet translation and scaling operations give the wavelet transform good local characteristics in both time and frequency domains, which is an important feature of the wavelet transform. Wavelet translation can obtain the time-domain information of time series and wavelet scaling can obtain the frequency-domain information of time series. The main process of DWT is decomposition and reconstruction. The decomposition process is to first determine a basic wavelet, decompose the time series into N-layer sub-wavelets and then get the approximate coefficients and detail coefficients. Figure 3 gives a brief introduction of decomposition hierarchically.
The reconstruction process reconstructs the original time series by the approximate coefficients and detail coefficients. For further understanding, Figure 4 shows the decomposition and reconstruction process of DWT, meanwhile, the approximate coefficients and detail coefficients of the original signal can be obtained. After decomposition, the original signal, whose time domain length is L, is divided into approximate coefficients and detail coefficients whose length is L/2. In this paper, the experimental data provided by FCLAB is decomposed by using one level wavelet transform. The basic wavelet is "db3" wavelet.  For a given discrete input signal x[n], a low-pass filter l[n], a high-pass filter h[n] and a down-sampling filter ↓ Q (set Q = 2), x α,L [n] and x α,H [n] are the approximate coefficients and the detail coefficients in α th layer. (1) The reconstruction process reconstructs the original time series by the approximate coefficients and detail coefficients. For further understanding, Figure 4 shows the decomposition and reconstruction process of DWT, meanwhile, the approximate coefficients and detail coefficients of the original signal can be obtained. After decomposition, the original signal, whose time domain length is L, is divided into approximate coefficients and detail coefficients whose length is L/2. In this paper, the experimental data provided by FCLAB is decomposed by using one level wavelet transform. The basic wavelet is "db3" wavelet.
The reconstruction process reconstructs the original time series by the approximate coefficients and detail coefficients. For further understanding, Figure 4 shows the decomposition and reconstruction process of DWT, meanwhile, the approximate coefficients and detail coefficients of the original signal can be obtained. After decomposition, the original signal, whose time domain length is L, is divided into approximate coefficients and detail coefficients whose length is L/2. In this paper, the experimental data provided by FCLAB is decomposed by using one level wavelet transform. The basic wavelet is "db3" wavelet.

Self-Adaptive Differential Evolutionary Algorithm
Since the output weights of ELM [22] are calculated based on the random input weights w i and stochastic hidden layer bias b i , a set of non-optimal or unnecessary input weights and hidden layer bias exist. In some applications, ELM may need more hidden neurons than the traditional algorithm [23]. Therefore, the algorithm is expected to get more compact and respond faster to the neural network in practical applications.
Self-adaptive differential evolutionary algorithm [24] (SaDE) is a methodology available to optimize the neural network of ELM. According to the above analysis, the optimized objectives are the two randomly selected parameters including the input weight w i and hidden layer bias b i . First, the randomly selected parameters are used as the initial population. Then the fitness of each individual in the population is calculated. In this paper, the root mean square error (RMSE) was chosen as the fitness function. When the fitness calculation of all individuals was completed, the process of mutation, crossover and selection was carried out and the individuals with better fitness will be retained for the next generation.

D-S ELM
This paper proposes D-S ELM, which is optimized by discrete wavelet transform (DWT) and a self-adaptive differential evolutionary algorithm (SaDE). The framework of D-S ELM is shown in Figure 5.

Self-Adaptive Differential Evolutionary Algorithm
Since the output weights of ELM [22] are calculated based on the random input weights and stochastic hidden layer bias , a set of non-optimal or unnecessary input weights and hidden layer bias exist. In some applications, ELM may need more hidden neurons than the traditional algorithm [23]. Therefore, the algorithm is expected to get more compact and respond faster to the neural network in practical applications.
Self-adaptive differential evolutionary algorithm [24] (SaDE) is a methodology available to optimize the neural network of ELM. According to the above analysis, the optimized objectives are the two randomly selected parameters including the input weight and hidden layer bias . First, the randomly selected parameters are used as the initial population. Then the fitness of each individual in the population is calculated. In this paper, the root mean square error (RMSE) was chosen as the fitness function. When the fitness calculation of all individuals was completed, the process of mutation, crossover and selection was carried out and the individuals with better fitness will be retained for the next generation.

D-S ELM
This paper proposes D-S ELM, which is optimized by discrete wavelet transform (DWT) and a self-adaptive differential evolutionary algorithm (SaDE). The framework of D-S ELM is shown in Figure 5. The main concept and steps of D-S ELM are summarized as follows.
Step 1. Discrete wavelet transform Use discrete wavelet transform (DWT) to transform the training data and test data.
Step 2. ELM initialization Set the number of hidden layer nodes and the activation function ( ). Original ELM can be approximated to zero-error samples, which is expressed as the following equations.
Equation (3) can be written succinctly. The main concept and steps of D-S ELM are summarized as follows.
Step 1. Discrete wavelet transform Use discrete wavelet transform (DWT) to transform the training data and test data.
Step 2. ELM initialization Set the number of hidden layer nodes N and the activation function g(x). Original ELM can be approximated to N zero-error samples, which is expressed as the following equations.
In Equation (3), w i and b i (i = 1, · · · , N) is the input weight and the bias of hidden layer. β is the output weight, and x j and p j ( j = 1, · · · , N) is the training input and training output. In Equation (4), H is the output matrix of hidden layer and P is the training output matrix.
Step 3. SaDE initialization Set and the maximum number of iterations G max . The population θ k,G with NP individuals is initialized.
Step 4. Calculations of hidden layer output matrix H.
According to Equation (3) and Equation (4), the i th column of matrix H is the i th hidden neuron output vector response to training input x 1 , x 2 , · · · , x N .
Step 5. Calculations of output weights β and RMSE β k,G = H k,G + P Step 6. Mutation v i,G is a new generated mutant population, where r i 1 , r i 2 , r i 3 are randomly generated in the population. Parameter F is an adaptive scaling factor which is usually chosen between [0,2]. In this paper, F is set as a value that changes with the number of iterations.
Step 7. Crossover Crossover process can increase the diversity of the population. CR is the crossover probability which is usually chosen between [0,1].
Step 8. Selection The individuals with lower fitness value will be retained for the next generation. Repeat step 4 to step 8 until G = G max .
Step 9. Calculations of test results Input the test data, calculate the predicted results according to β.

Selection of Input and Output Layer Data
From the analysis in Section 2, only the output stack voltage of PEMFC has an obvious downward trend compared with other indicators in the original data set, which is suitable for predicting the remaining useful life of PEMFC. Therefore, the output voltage was chosen as the data of output layer network. The number of input network layers is determined by the number of factors which affect the prediction results. After data analysis, outlet water temperature, air pressure and hydrogen pressure were used as data for the input layer network. These three parameters directly affect the performance of PEMFC and the RUL of the fuel cell. Compared with BP and deep learning algorithms, in this paper, the proposed algorithm used fewer historical data to predict the current state of the PEMFC system, and can also reflect the real-time changes of the state of the PEMFC system over time.

Selection of the Number of Hidden Layer Nodes
The choice of the number of hidden layer nodes is a sensitive issue [25], which has a great influence on the prediction performance of a neural network. In order to make the D-S ELM network have a better generalization performance, the number of hidden layer nodes determined should be as few as possible to make the network more compact. When determining the number of hidden layer nodes, the following preconditions should be satisfied: The number of hidden layer nodes should be less than that of the training samples, otherwise the network will have no practical value and low generalization ability. Similarly, the number of input layer neurons should be less than that of output layer neurons. In this paper, the number of hidden layer nodes is determined to be 3, which has best performance in simulations.

Selection of Activation Function
Activation function plays an important role in the neural network because of the ability of nonlinearity capture, which can introduce non-linear factors to the neuron model. In that case, the neural network can approximate non-linear function arbitrarily. The input data is non-linearly transformed by activation function, and then is transmitted to the next neuron as input layer data. Here enumerate some common activation functions, such as sigmoid function, tanh function and hardlim function. In this paper, sigmoid function was used as the activation function. The sigmoid function is generally used as the threshold function of artificial neural network, which maps variables to [0,1].

Simulation Results Based on Two Algorithms
After the parameters of ELM network are set up, the number of input training data should also be determined to obtain the best prediction results. As mentioned earlier, one of the advantages of D-S ELM over traditional algorithms is that it needs fewer training data. The influence of training data size on prediction accuracy is studied and discussed in Section 4.4.3. Therefore, to study the prediction accuracy improvements of the proposed algorithms, the prediction curves and comparison results given below are in the case of training data size n = 10.
One of the simulation results is shown in Figure 6. The error curves of two prediction methods are shown in Figure 7. As can be seen from the Figure 6, the predicted voltage curve based on ELM fluctuates greatly, which limits the on-line prediction of RUL. On the basis of ELM, D-S ELM is more stable and closer to the expected output and the improvement of prediction accuracy can also be seen from Figure 7.

Research on Robustness of D-S ELM
In addition, to study the robustness of the algorithm, the accuracy of life-end-point prediction was considered. Because data-driven algorithms have the characteristic of randomness, each simulation is different, which leads to the inaccuracy of prediction. In that case, use the two algorithms mentioned above to make 100 simulation predictions and obtain the end-of-life probability distribution.
As a promising energy converter, PEMFC is expected to ensure its power supply capability for a certain time in practical applications. Due to the aging phenomenon, the voltage drops and the power output is insufficient, which leads to the issue of RUL prediction. The data set used in this paper was under constant current J = 0.7A/cm 2 , and voltage drop was used as an indicator of life end. In order to verify the accuracy and universality of the proposed algorithm, and according to the definition of 2014 IEEE Fuel Cell Data Challenge Competition [20], this paper considered a variety of voltage losses as the life end point-3.5%, 4%, 4.5% of initial voltage.
As can be seen from Figure 8, when the total output voltage reaches the failure threshold, the fuel cell will reach the end-of-life stage. The initial voltage of PEMFC test bench was 3.359 V, and the voltage failure threshold was set to 3.2418 V, 3.225 V, 3.2082 V, respectively. The actual end-of-life point was 627 h, 809 h, 811 h.

Comparison of Predicted Results
For a better comparison of final predicted results and further study, the following experiments are proposed on the basis of 4% voltage loss. After simulations, the predicted results were compared with the actual voltage curve to find the earliest point which reach the failure threshold. The probability distribution of end-of-life point is shown in Figure 8.
Failure threshold is set to 3.5% of initial voltage The actual end-of-life point is 627 h. Set (627 h ± 2 h) as the confidence interval. The probability of the predicted life end point based on ELM in the confidence interval is 39%, while the probability of D-S ELM in the confidence interval reaches 73%. As can be seen from Figure 8a, the fluctuation of ELM between 600 hours and 640 hours is large, leading to the predicted point ahead of the actual point, and the prediction time point is scattered.
Failure threshold is set to 4% of initial voltage The actual end-of-life point is 809 h. Set (809 h ± 2 h) as the confidence interval. The probability of the predicted life end point based on ELM in the confidence interval is 30%, the probability of D-S ELM in the confidence interval reaches 99% This phenomenon happens because the set failure threshold is very close to the first trough of the output voltage curve, at a position of about 660 h. When the set failure threshold is confusing, the instability of ELM will lead to inaccurate prediction results.
Failure threshold is set to 4.5% of initial voltage The actual end-of-life point is 811 h. Set (811 h ± 2 h) as the confidence interval. The probability of the predicted life end point based on ELM in the confidence interval is 67%, and the probability of D-S ELM in the confidence interval reaches 100%. As for the voltage threshold set in Figure 8c, the predicted time points of ELM and D-S ELM are concentrated due to the significant voltage drop, and the accuracy is improved accordingly.
The accuracy of the algorithm is different for different voltage failure thresholds. However, according to three probability distribution figures, D-S ELM closely matched the expected voltage curve, and the prediction results of the proposed D-S ELM were more accurate than those of ELM. According to the above analysis, compared with ELM, the predicted end-of-life point of D-S ELM is more robust.

Comparison of Predicted Results
For a better comparison of final predicted results and further study, the following experiments are proposed on the basis of 4% voltage loss.
To measure the accuracy of prediction results, RMSE and ME are defined as follows.
Two algorithms are compared according to the predicted end-of-life point, deviation, root mean square error, mean error and predicted time of the algorithm. The comparison results are shown in Table 2 As can been seen from the Table 2, the proposed D-S ELM has the better accuracy, with the prediction results achieving a RMSE of 0.0036 and a ME of 0.0020. Although the application of self-adaptive differential evolutionary algorithm increased the prediction time, the training time and testing time is still much smaller than that of traditional algorithms, which is enough for online prediction. Online RUL prediction requires high accuracy and fast speed. For that reason, it requires algorithms to use as little experimental data as possible to get results quickly and accurately. In the actual operation of the PEMFC system, the changes of environmental factors or human factors will lead to real-time voltage fluctuation. The data far from the current time cannot represent the current state of the system and has a negative impact on the mathematical model of the neural network. This phenomenon has also been confirmed in Reference [17].
In this paper, to get the most accurate predictive output, constantly changing data is used to update the mathematical model of the neural network in ELM and D-S ELM. The data sizes are set from 3 to 150 to find the appropriate training data size for PEMFC. The comparisons of ME, RMSE and the time deviation of two algorithms are shown in Figure 9. In order to see the influence of data sizes on prediction results more intuitively, five groups of predicted voltage output curves of D-S ELM and prediction result comparisons are shown in Figure 10.
As the data size increases from 3 to 150, the prediction error and time deviation of both ELM and D-S ELM increases. The ME of D-S ELM increases from 0.0020 to 0.00956, the RMSE increases from 0.0036 to 0.0121, and the time deviation increases from 1.84 h to 88.79 h. When the size of the training sample is set between [5,10], the time deviation and prediction error is smallest. As can been seen from Figure 10, with the data size increasing, the predicted voltage gradually deviates from the expected voltage. When the training data size is set to 150, only a roughly decline curve can be obtained, and the RUL deviation reaches 88.79 h. This phenomenon also confirms the assumption mentioned earlier. Compared with traditional BP algorithm, D-S ELM has little requirement for the number of data samples, which is more suitable for a small sample for online RUL prediction.
testing time is still much smaller than that of traditional algorithms, which is enough for online prediction. Online RUL prediction requires high accuracy and fast speed. For that reason, it requires algorithms to use as little experimental data as possible to get results quickly and accurately. In the actual operation of the PEMFC system, the changes of environmental factors or human factors will lead to real-time voltage fluctuation. The data far from the current time cannot represent the current state of the system and has a negative impact on the mathematical model of the neural network. This phenomenon has also been confirmed in Reference [17].
In this paper, to get the most accurate predictive output, constantly changing data is used to update the mathematical model of the neural network in ELM and D-S ELM. The data sizes are set from 3 to 150 to find the appropriate training data size for PEMFC. The comparisons of ME, RMSE and the time deviation of two algorithms are shown in Figure 9. In order to see the influence of data sizes on prediction results more intuitively, five groups of predicted voltage output curves of D-S ELM and prediction result comparisons are shown in Figure 10.
As the data size increases from 3 to 150, the prediction error and time deviation of both ELM and D-S ELM increases. The ME of D-S ELM increases from 0.0020 to 0.00956, the RMSE increases from 0.0036 to 0.0121, and the time deviation increases from 1.84 h to 88.79 h. When the size of the training sample is set between [5,10], the time deviation and prediction error is smallest. As can been seen from Figure 10, with the data size increasing, the predicted voltage gradually deviates from the expected voltage. When the training data size is set to 150, only a roughly decline curve can be obtained, and the RUL deviation reaches 88.79 h. This phenomenon also confirms the assumption mentioned earlier. Compared with traditional BP algorithm, D-S ELM has little requirement for the number of data samples, which is more suitable for a small sample for online RUL prediction.

Conclusions
This paper proposes D-S ELM to predict the remaining useful life of PEMFC. In the proposed algorithm, the discrete wavelet transform and self-adaptive differential evolutionary algorithm are applied to optimize ELM. The proposed D-S ELM is compared with the original ELM, using the data set provided by FCLAB. The results show that D-S ELM has best performance.
The main advantages of the proposed algorithm can be summarized as follows: (1) The proposed D-S ELM algorithm has strong robustness. In order to study the uncertainties of data-driven algorithm, 100 simulations provide the RUL probability distribution of different algorithms. D-S ELM has best stability and smallest RUL deviation. (2) The proposed algorithm has little requirement for the size of training samples which is suitable for small sample prediction. (3) D-S ELM overcomes the shortcomings of a traditional neural network, such as a large amount of calculation and slow convergence speed. (4) D-S ELM improves the prediction accuracy of ELM, with the predicted voltage achieving an RMSE of 0.0036 and a ME of 0.0020.
The above research provides a basis for data-driven life prediction of PEMFC. However, the data set used in this research is under constant current conditions, while in practical applications, PEMFC usually works under variable loading conditions. Therefore, the next step of this study is to realize the RUL prediction of PEMFC with a variable load, which is a huge challenge but has a certain practical significance for prolonging the life of PEMFC and improving its reliability.