The Degradation Prediction of Proton Exchange Membrane Fuel Cell Performance Based on a Transformer Model

Abstract: Proton exchange membrane fuel cells have attracted widespread attention due to their cleanliness and high energy density, but performance degradation during operation greatly limits their commercialization. Reliable prediction of fuel cell performance degradation is therefore of great significance. The recovery of the reversible voltage loss that occurs during fuel cell operation poses great difficulties for model training and prediction: models may easily and erroneously learn the combined trend of reversible voltage loss recovery and performance degradation. To address this issue, this paper employs the Transformer model to predict the performance degradation of fuel cells. By utilizing the self-attention structure and masking mechanism of the Transformer model, the signal for the recovery of the reversible voltage loss is adopted as an input to the model, so that information before the voltage recovery does not interfere with subsequent predictions. Experimental results show that the model has the highest prediction accuracy at various prediction starting points. Moreover, it can predict the accelerated performance degradation of fuel cells, which has positive implications for health management.


Introduction
As efficient and clean energy conversion devices, proton exchange membrane fuel cells (PEMFCs) play an increasingly important role in the development of modern energy technology [1][2][3][4]. Compared with traditional internal combustion engines and secondary batteries, fuel cells can directly convert chemical energy into electrical energy, with higher energy conversion efficiency and lower pollution emissions [5][6][7]. This gives fuel cells broad application prospects in fields such as transportation, distributed power generation, and portable power sources.
Although fuel cell technology has made significant progress in the past few decades, its commercialization still faces many challenges, one of which is the short lifespan of fuel cells [8][9][10][11][12]. Specifically, fuel cells experience a gradual decline in performance during long-term operation. This performance degradation is caused by a variety of complex factors, including the degradation of electrode materials, loss of catalyst, mechanical damage to the membrane electrode assembly, and fluctuations in operating conditions [13,14]. Performance degradation not only affects the efficiency and output power of fuel cells, but also shortens their service life and increases maintenance and replacement costs [15]. Therefore, the accurate prediction of fuel cell performance degradation has important theoretical and practical significance [16].
At present, there are three classes of methods for predicting fuel cell performance degradation: model-based methods, data-driven methods, and hybrid methods. Model-based methods generally establish physical, semi-empirical, or empirical models, which are combined with statistical and filtering methods for prediction [17]. M. Chandesris studied the degradation of the catalyst layer of proton exchange membrane fuel cells under various operating conditions, focusing on the effects of voltage, pressure, and humidity on the degradation rate. Using electrochemical techniques, post-analysis, and fluoride release measurements, a semi-empirical model was proposed to quantitatively describe the impact of these factors on membrane degradation [18]. Mingyang Ou [19] proposed a prediction method based on a semi-empirical model to predict the performance degradation and remaining lifespan of proton exchange membrane fuel cells under automotive operating conditions. By introducing degradation models of the electrochemical surface area and equivalent resistance, a voltage prediction model was established and its accuracy was verified through experiments. Mathieu Bressel [20] proposed a lifespan prediction method for PEMFCs based on an extended Kalman filter to estimate the health state and remaining lifespan. Simulation and experimental verification showed that this method can quickly and accurately predict the degradation trend and remaining service life of fuel cells under various operating conditions. Since fuel cell degradation is a multi-physics, multi-scale, multi-component, and multi-factor process, establishing an accurate degradation model is a complex and challenging task.
The data-driven approach sidesteps the complex degradation mechanism of fuel cells and uses machine learning methods to establish a black-box model that completes the prediction based on fuel cell aging experimental data. At present, machine learning, especially deep learning, is increasingly applied in the field of fuel cells. Common methods for the lifespan prediction of PEMFCs include the long short-term memory (LSTM) neural network, gated recurrent unit (GRU) neural network, convolutional neural network (CNN), and echo state network (ESN). Chu Wang [21] generated navigation sequences through the ARIMAX model to improve the long-term prediction performance of LSTM; the resulting method outperformed the original LSTM, ESN, and nonlinear autoregressive exogenous input (NARX) models. Jian Zuo [22] combined LSTM and GRU with the attention mechanism to improve the accuracy of voltage degradation prediction under dynamic load cycles. The results show that the accuracy of this method is superior to that of the original LSTM and GRU methods. Khaled Benaggoune [23] adopted a dilated CNN to predict the short-term and long-term performance degradation of fuel cells. The results showed that it was superior to LSTM in terms of accuracy and computing speed. Rania Mezzi [24] proposed a method based on ESN and a Markov chain for predictions under dynamic load conditions. The results indicated that the remaining service life of fuel cells can be accurately predicted, and that the method achieves good prediction results even without prior knowledge of the load configuration. Zhongliang Li [25] extracted health indicators under dynamic working conditions through a linear parameter-varying model and employed an ensemble ESN model to improve the adaptability of ESN. The results demonstrated that this method can effectively predict the health status of fuel cells under different operating modes. Data-driven approaches generally have high requirements for data quality and require a large amount of data.
The hybrid method combines the above two approaches to leverage the advantages of both. Daming Zhou [26] combined an empirical voltage degradation model with a neural network to predict the long-term aging trend of fuel cells while accurately describing the local nonlinear characteristics of the degradation voltage in short and medium time ranges. The final results demonstrated that this method performed well in predicting both the long- and short-term voltage degradation trends of the fuel cells. Yujie Cheng [27] adopted least squares support vector machines for preliminary prediction and then used a regularized particle filter for processing. This method not only completed the prediction, but also provided the probability distribution of the uncertainty. However, the hybrid method increases the complexity of the algorithm and places higher requirements on the fusion strategy.
During the operation of fuel cells, polarization curve testing, shutdown, and other operations lead to the recovery of the reversible voltage loss, so that the voltage exhibits a segmented (staged) decrease. This poses great difficulties for the training and prediction of the model, which is prone to learning the superimposed trend of reversible voltage loss recovery and performance degradation without reflecting the staged decline.
To address the above issues, this paper adopts the Transformer model for predicting the performance degradation of fuel cells. By utilizing the self-attention structure and masking mechanism of the Transformer model, the signal for the recovery of the reversible voltage loss is adopted as an input to the model; in this way, the information before the voltage recovery does not interfere with the subsequent prediction. As a result, the prediction performance of the model is significantly improved. The main contributions of this paper are:

• Taking the moment at which the recovery of the reversible voltage loss takes place as the boundary point, the fuel cell aging data are segmented and filtered to retain important information while filtering out noise effectively.
• The occurrence of the recovery of the reversible voltage loss is used as an input to the model, and the information before the recovery is masked, thereby improving the model performance.
• The Transformer model is used to predict the performance degradation of fuel cells, and the effect of the masking degree on the model performance is studied.
The remainder of this paper is organized as follows: Section 2 introduces the method for predicting the performance degradation of fuel cells. Section 3 presents the experimental results and discussion, and Section 4 concludes the paper.

Method

Fuel Cell Aging Experiment
The fuel cell aging test data used in this paper were sourced from the IEEE 2014 Data Challenge [28]. The testing system used is shown in Figure 1, and both fuel cell stacks used in the test are composed of five single cells. Each cell has an active area of 100 cm². The PEMFC stack is assembled with commercial membranes, gas diffusion layers, and machined flow distribution plates. The nominal current density of the cells is 0.7 A/cm², and the maximum current density is 1 A/cm². The first fuel cell was operated under static conditions, with the current density maintained at 0.7 A/cm² and other conditions kept constant. In detail, the inlet and outlet flow rates of H₂ were 4.8 and 2.1 L/min, and the inlet and outlet flow rates of air were 23 and 21 L/min. Therefore, the main factor leading to voltage decay was the aging of the fuel cell, and the voltage can be used as a health indicator. The fuel cell voltage throughout the entire experiment is shown in Figure 2. Polarization curve tests and electrochemical impedance spectroscopy tests were conducted on the fuel cell at 49, 186, 349, 516, 659, 824, and 992 h during the aging experiment. It can be seen that the recovery of the reversible voltage loss occurred at the corresponding moments of the voltage-time graph, and the magnitude of the recovery increased as the experiment progressed. Therefore, the voltage of the fuel cell exhibited a phased decline over time. In addition, the first-order difference of the voltage is shown in Figure 2 as well. The first-order difference of the voltage is the difference between the present voltage value and the voltage value at the previous time step, u(t) − u(t − 1), and it was employed to indicate the change in voltage. It can be observed that the differential value at the moments when the reversible voltage loss recovers is much larger than that at other moments. Therefore, by examining the difference, it is possible to verify whether the recovery of the reversible voltage loss has occurred.
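The difference-based check can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure used in the experiments: the function names, the threshold of 0.01 V, and the toy voltage values are all hypothetical.

```python
from typing import List


def first_difference(u: List[float]) -> List[float]:
    """Return u(t) - u(t-1); the first sample has no predecessor, so it gets 0."""
    return [0.0] + [u[t] - u[t - 1] for t in range(1, len(u))]


def detect_recovery(u: List[float], threshold: float) -> List[int]:
    """Return a 0/1 signal that is 1 where the voltage rises by more than `threshold`."""
    return [1 if d > threshold else 0 for d in first_difference(u)]


# Toy hourly stack voltage in volts (hypothetical values): slow decay with one jump.
voltage = [3.300, 3.298, 3.296, 3.294, 3.315, 3.313, 3.311]

# The threshold is an assumed value chosen well above the normal hourly change.
recovery_signal = detect_recovery(voltage, threshold=0.01)
print(recovery_signal)  # [0, 0, 0, 0, 1, 0, 0]
```

In practice the threshold would have to be chosen from the data, for example by inspecting the distribution of the first-order difference.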

Data Preprocessing
Since the performance degradation of fuel cells is a long-term process, the original voltage data were resampled at intervals of 1 h. Filtering is also required to remove noise from the signal. However, if global filtering is performed directly, the signal near the moments when the reversible voltage loss recovers will be improperly altered. Therefore, this paper uses the recovery times as boundary points to perform segmented filtering on the signal. The Savitzky-Golay method was adopted for the filtering process, with a sliding window length of 20 h and a cubic polynomial as the fitting function. The results are shown in Figure 3.
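A minimal pure-Python sketch of segmented Savitzky-Golay-style smoothing is given below. It is an assumption-laden illustration rather than the implementation used here: the helper names are hypothetical, a symmetric window of 21 samples (half-width 10) stands in for the 20 h window, and near segment edges the window is simply clipped, which differs from the edge handling of common library implementations.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def smooth_segment(y: Sequence[float], half_window: int = 10, order: int = 3) -> List[float]:
    """Fit a local polynomial around each sample and keep its value at that sample."""
    n = len(y)
    if n <= order:  # too few points for the fit; return the segment unchanged
        return list(y)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        xs = [(j - i) / half_window for j in range(lo, hi)]  # scaled for conditioning
        ys = y[lo:hi]
        normal = [[sum(x ** (p + q) for x in xs) for q in range(order + 1)]
                  for p in range(order + 1)]
        rhs = [sum((x ** p) * v for x, v in zip(xs, ys)) for p in range(order + 1)]
        smoothed.append(solve_linear(normal, rhs)[0])  # polynomial value at x = 0
    return smoothed


def segmented_filter(y: Sequence[float], recovery_idx: Sequence[int],
                     half_window: int = 10, order: int = 3) -> List[float]:
    """Smooth each span between recovery indices independently."""
    edges = [0] + sorted(recovery_idx) + [len(y)]
    out: List[float] = []
    for start, stop in zip(edges[:-1], edges[1:]):
        out.extend(smooth_segment(y[start:stop], half_window, order))
    return out


# Toy series (hypothetical): two linear decays separated by a recovery at index 30.
raw = [3.30 - 0.001 * t for t in range(30)] + [3.32 - 0.001 * t for t in range(30)]
smooth = segmented_filter(raw, recovery_idx=[30])
```

Because each segment is fitted on its own, the jump at index 30 is preserved instead of being smeared across the boundary, which is the motivation for segmenting before filtering.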

Transformer
The Transformer model [29] was proposed by Google in 2017 and is built around a self-attention mechanism. This mechanism enables the model to perform parallel computation efficiently and to capture long-distance dependencies effectively, and it has achieved great success in fields such as NLP and time series prediction. Since the experimental data are limited, while the Transformer model requires a relatively large amount of data, only the encoder part of the Transformer is used.
The structure of the Transformer encoder is shown in Figure 4c, and its core structure, the multi-head self-attention mechanism, is shown in Figure 4b. The concatenated multiple self-attention calculation results are shown in Figure 4a. The self-attention calculation formula is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ correspond to the query matrix, key matrix, and value matrix, respectively, all of which are derived from linear transformations of the input, and $d_k$ is the dimension of $K$. In NLP problems, padding is required to unify the length because the original input lengths are inconsistent. To prevent the model from considering the padding information, the weight matrix $QK^{T}/\sqrt{d_k}$ needs to be masked when passing through the attention mechanism. In this paper, because the information before and after the recovery moment of the reversible voltage loss is only weakly correlated, the earlier information can interfere with the prediction, and thus it is also necessary to mask the information before recovery. Therefore, this paper takes whether the voltage recovery occurs at each moment as an input signal. The signal value is set to 1 at the time of occurrence; otherwise, it is set to 0.
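A hedged pure-Python sketch of masked scaled dot-product attention driven by a 0/1 recovery signal follows. Several points are assumptions made only for illustration: keys that precede the last recovery in the input window are masked for every query (by analogy with a padding mask), the projections that produce Q, K, and V are taken as the identity, and all function names are hypothetical.

```python
import math
from typing import List


def key_mask_from_recovery(recovery: List[int]) -> List[bool]:
    """True for key positions that lie before the last recovery in the window."""
    last = max((t for t, r in enumerate(recovery) if r == 1), default=0)
    return [j < last for j in range(len(recovery))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(q: List[List[float]], k: List[List[float]],
                     v: List[List[float]], key_mask: List[bool]) -> List[List[float]]:
    """softmax(Q K^T / sqrt(d_k)) V with the scores of masked keys set to -inf."""
    d_k = len(k[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        scores = [-math.inf if masked else s for s, masked in zip(scores, key_mask)]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# Toy window of five time steps with a recovery at index 2 (hypothetical values).
recovery = [0, 0, 1, 0, 0]
x = [[1.0], [2.0], [3.0], [4.0], [5.0]]
mask = key_mask_from_recovery(recovery)
attended = masked_attention(x, x, x, mask)
```

Since positions 0 and 1 are masked, every output row is a weighted average of the values at positions 2 to 4 only.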
The mask is calculated from the recovery signal of the reversible voltage loss, and the process is shown in Figure 5. To investigate the impact of retaining previous information on model performance, the weight matrix at the masked positions is computed with a degree of preservation $s$: when $s \to 0$, the information is completely masked, and when $s = 1$, no masking is performed. Since the self-attention mechanism is insensitive to the position information of the input sequence, it is necessary to encode each position of the input sequence and add the encoding to the original information. The encoding formula is as follows:

$$PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right)$$

$$PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right)$$

where $pos$ represents the position in the sequence, $i$ represents the dimensional index, and $d_{model}$ represents the dimension of the input after embedding.
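The two ideas in this paragraph can be sketched in pure Python. The way the degree of preservation s enters the attention weights below is an assumption chosen only because it reproduces the two stated limits (adding ln s to the masked scores gives no change at s = 1 and complete masking as s → 0); it should not be read as the formula used in this paper. The positional encoding follows the standard sinusoidal form of [29]. All function names are hypothetical.

```python
import math
from typing import List


def soft_masked_weights(scores: List[float], key_mask: List[bool], s: float) -> List[float]:
    """Softmax over scores after adding ln(s) to masked positions (assumed form)."""
    log_s = -math.inf if s <= 0.0 else math.log(s)
    biased = [x + (log_s if masked else 0.0) for x, masked in zip(scores, key_mask)]
    m = max(biased)
    exps = [math.exp(x - m) for x in biased]
    total = sum(exps)
    return [e / total for e in exps]


def positional_encoding(length: int, d_model: int) -> List[List[float]]:
    """Sinusoidal encoding: sine on even dimensions, cosine on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(length)]
    for pos in range(length):
        for even in range(0, d_model, 2):  # `even` plays the role of 2i
            angle = pos / (10000 ** (even / d_model))
            pe[pos][even] = math.sin(angle)
            if even + 1 < d_model:
                pe[pos][even + 1] = math.cos(angle)
    return pe


# Equal scores, first two keys masked (hypothetical toy input).
scores = [0.0, 0.0, 0.0, 0.0]
key_mask = [True, True, False, False]
w_full = soft_masked_weights(scores, key_mask, s=1.0)   # no masking
w_half = soft_masked_weights(scores, key_mask, s=0.5)   # masked keys count half
w_none = soft_masked_weights(scores, key_mask, s=0.0)   # complete masking
pe = positional_encoding(length=3, d_model=4)
```

Under this assumed form, a masked key with the same raw score as an unmasked key receives s times its attention weight before normalization, so s acts as a continuous dial between a hard mask and no mask.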

Model
The final model designed in this paper is shown in Figure 6. The voltage is adopted as the input to the encoder after embedding and position encoding. The voltage recovery information corresponding to the input voltage is used to generate a mask, which is passed into the multi-head attention structure inside the encoder. Since the recovery of the reversible voltage loss is caused by a human intervention operation, it should be employed as an input to the model. Meanwhile, to enable the linear layer to learn the voltage recovery value, the voltage data are directly passed through the linear layer. The output of the Transformer encoder is also added to the voltage recovery through the linear layer to obtain the final output.

Evaluation Indicators for Prediction Results
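To evaluate the accuracy of the predicted performance degradation, this paper selects the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) as indicators. They are calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$

where $N$ is the length of the predicted voltage sequence, $\hat{y}_i$ is the predicted voltage, and $y_i$ is the true voltage value.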
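For reference, the three indicators can be computed with a few lines of Python using their standard definitions. Returning MAPE as a fraction rather than a percentage is an assumption made here, and the sample voltages are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (not multiplied by 100)."""
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted stack voltages in volts.
y_true = [3.30, 3.28, 3.26]
y_pred = [3.31, 3.28, 3.25]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

Because the stack voltage is of the order of a few volts, the MAPE expressed as a fraction is numerically smaller than the MAE expressed in volts.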

Results and Discussion
The performance degradation of fuel cells is a complex process involving the degradation of multiple components such as catalysts, membrane electrode assemblies, gas diffusion layers, and bipolar plates. Catalyst particles may agglomerate and dissolve during operation, leading to a decrease in surface area and catalytic activity. Proton exchange membranes are susceptible to chemical degradation and mechanical stress during long-term use, resulting in microcracks and decreased mechanical properties. Moreover, operating conditions such as relative humidity, temperature, and start-stop cycles also have a significant impact on fuel cell performance. These factors interact with each other and cause fuel cells to gradually degrade during operation. Meanwhile, during the startup and shutdown process of the fuel cell, a brief voltage recovery phenomenon may occur. This is because the startup and shutdown process can help remove accumulated substances on the electrode surface or redistribute reactants, temporarily improving fuel cell performance. After long-term operation, the catalyst surface may be covered with contaminants or oxidized, which reduces its catalytic activity. Through polarization curve testing or pulse voltage operation, these contaminants can be removed or the catalyst surface state can be modified, so that the catalyst is reactivated and the voltage recovers.
Since the voltage of the fuel cell is constantly declining, directly selecting the voltage as the target would result in the value ranges of the training set and the test set being completely separated, increasing the difficulty of model prediction. Therefore, this paper chooses the difference between the predicted value and the voltage at the last moment of the input as the output target. It should be noted that the voltage of the fuel cell undergoes a severe recovery after 1000 h, which has no predictive significance. Therefore, the data before 1000 h are selected as the research object. The model is first trained on the training set, and the performance degradation over the entire test set is then predicted according to a multi-step rolling prediction strategy.
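The target construction and the rolling strategy can be sketched as follows. This is a simplified illustration under stated assumptions: the window length, the horizon, and the stand-in predictor `naive_trend` are hypothetical, the recovery signal is omitted, and a trained model would take the place of the predictor.

```python
from typing import Callable, List, Sequence, Tuple


def make_samples(v: Sequence[float], input_len: int,
                 horizon: int) -> List[Tuple[List[float], List[float]]]:
    """Build (window, target) pairs; targets are offsets from the last input voltage."""
    samples = []
    for start in range(len(v) - input_len - horizon + 1):
        window = list(v[start:start + input_len])
        future = v[start + input_len:start + input_len + horizon]
        samples.append((window, [f - window[-1] for f in future]))
    return samples


def rolling_forecast(history: Sequence[float], steps: int, input_len: int,
                     predict_offsets: Callable[[List[float]], List[float]]) -> List[float]:
    """Feed predictions back as inputs until `steps` future values are available."""
    series = list(history)
    n0 = len(series)
    while len(series) - n0 < steps:
        window = series[-input_len:]
        offsets = predict_offsets(window)  # predicted differences to window[-1]
        series.extend(window[-1] + d for d in offsets)
    return series[n0:n0 + steps]


def naive_trend(window: List[float]) -> List[float]:
    """Hypothetical stand-in for a trained model: extrapolate the last slope two steps."""
    slope = window[-1] - window[-2]
    return [slope * (k + 1) for k in range(2)]


# Toy history (hypothetical): linear decay of 1 mV per hour.
history = [3.3 - 0.001 * t for t in range(10)]
pairs = make_samples(history, input_len=4, horizon=2)
forecast = rolling_forecast(history, steps=5, input_len=4, predict_offsets=naive_trend)
```

Predicting offsets keeps the targets in a similar range for the training and test periods even though the absolute voltage keeps falling, and the absolute voltage is recovered by adding the offset back to the last input value.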

The Influence of Masking Degree s on Model Prediction Results
To investigate the extent to which the information before the recovery moment of the reversible voltage loss should be retained, the retention degree s is set to 0, 0.3, 0.6, and 1 in turn. Meanwhile, 520 h, 620 h, and 720 h are selected as the boundary points; the period before each boundary point is used as the training set, and the period after it is used as the test set. The prediction results of the performance degradation are shown in Figure 7, and the errors between the prediction results and the true values are shown in Table 1. From Figure 7, it can be observed that since the model is provided with the times at which the recovery of the reversible voltage loss occurs within the prediction range, each model is able to predict the staged characteristics of the voltage and the recovery phenomenon at the corresponding time. Nevertheless, when the model input includes voltage recovery, this may bring various degrees of interference to the model prediction, since different masking degrees are set for the data before recovery. It can be found that when the earlier data are completely masked, i.e., s → 0, the model does not attend to the information before the voltage recovery, and the prediction results of performance degradation are closest to the true values. Meanwhile, from the error results of each prediction in the table, it can be found that when predicting from 520 h, the prediction errors of s → 0 are the smallest, with an RMSE, MAE, and MAPE of 0.006, 0.004, and 0.001, respectively. The prediction result with s = 0.6 has the largest errors. When predicting from 620 h onwards, the errors of the model are reduced due to the larger training set. The prediction errors of s → 0 are still the smallest, and the errors of s = 0.3 are the largest in this case. When predicting from 720 h onwards, the prediction errors are the smallest for s → 0, and the errors are again the largest for s = 0.6. Hence, due to the uncertainty of the interference, the degree of retention corresponding to the maximum error changes from case to case, but complete masking always yields the minimum errors and the best prediction of the performance degradation.
This verifies the significance of completely masking the data before voltage recovery, and indicates that the polarization curve test operation redistributes the internal state of the fuel cell and changes the surface state of the catalyst. The voltage information before the operation is therefore not beneficial for subsequent predictions.

Comparison of Predictive Performance for Different Models
To verify the accuracy of the model proposed in this paper, an ESN model and an LSTM model under the Seq2Seq framework were used to make predictions with the same training and test sets. As before, 520 h, 620 h, and 720 h were selected as the boundary points. The data before the boundary points were used as the training set, and the data after the boundary points were used as the test set. The prediction results of performance degradation are shown in Figure 8, and the errors between the prediction results and the true values are shown in Table 2.
When the prediction starts from 520 h, the RMSE, MAE, and MAPE of the proposed model are the smallest among the three models. In Figure 8a, ESN fails to predict the correct decay trend due to interference from the voltage recovery near the prediction starting point. Although its prediction results contain fluctuation characteristics similar to voltage recovery, the predicted recovery times are not accurate because the voltage recovery time is unknown to the model. LSTM can predict the correct degradation trend, but this trend is the superposition of the aging and the voltage recovery of the fuel cell, and it does not reflect the accelerated performance degradation of the fuel cell in the later stage. The prediction results of the model proposed in this paper are almost consistent with the real data, demonstrating excellent performance.
When predicting from 620 h, the RMSE, MAE, and MAPE of the LSTM model are 0.009, 0.006, and 0.002, respectively. The RMSE, MAE, and MAPE of the proposed model are 0.007, 0.005, and 0.001, respectively. The errors of the two are very close, and both are lower than the errors of ESN. To further compare the prediction results, it is necessary to analyze the information in the figure. In Figure 8b, both ESN and LSTM predict that the fuel cell performance declines continuously, with LSTM being closer to the actual result. The model proposed in this paper successfully predicts the staged decline of the fuel cell voltage. Moreover, the proposed model can also predict the accelerated degradation of fuel cell performance, and thus it has the best performance.

When predicting from 720 h onwards, the prediction errors of the three models generally decrease as the training data are enriched. The RMSE, MAE, and MAPE of the LSTM model are 0.009, 0.006, and 0.002, respectively. The RMSE, MAE, and MAPE of the proposed model are 0.008, 0.007, and 0.002, respectively. These errors remain lower than the prediction errors of ESN. In Figure 8c, the fuel cell voltage deteriorates sharply around 800 h. The accurate prediction of this situation is of great significance for the health management of fuel cells [30]. Only the model proposed in this paper predicts this sharp decline accurately. In summary, based on the reported error values in Table 2, the model proposed in this paper demonstrates the best prediction results; it avoids interference from information before voltage recovery and accurately predicts the subsequent voltage recovery phenomena. It precisely captures the segmented degradation of fuel cell performance and the trend of accelerated degradation in the later stage, providing reliable support for early intervention in fuel cell failures.

Conclusions
This paper proposes a prediction method for fuel cell performance degradation based on the Transformer model. Through segmented filtering and the masking mechanism, the interference of the recovery of the reversible voltage loss with model prediction is effectively resolved. Experimental results show that this method can significantly improve the prediction accuracy and successfully predict the staged decline of fuel cell performance. The model performance under different masking levels verifies the effectiveness of completely masking the data before voltage recovery. In addition, compared with the traditional ESN and LSTM models, the Transformer model proposed in this paper shows better prediction results for fuel cell performance degradation and captures the variation trend of PEMFC performance degradation more accurately, especially in dangerous situations of severe performance degradation. In summary, the method proposed in this paper has important application value in fuel cell health management and performance prediction.

Figure 2. Fuel cell voltage versus time and first-order difference of the voltage versus time.

Figure 5. Mask process: 0 and 1 represent the absence or occurrence of the recovery of the reversible voltage loss at time T, and F represents whether masking is needed in that area.

Figure 6. The proposed degradation prediction model of fuel cell performance.

Figure 7. Performance degradation prediction results for different degrees of masking s. (a) Starting from 520 h for prediction. (b) Starting from 620 h for prediction. (c) Starting from 720 h for prediction.

Figure 8. Prediction results of performance degradation for different models. (a) Starting from 520 h for prediction. (b) Starting from 620 h for prediction. (c) Starting from 720 h for prediction.

Table 1. Errors of the prediction results for different prediction starting points and different degrees of masking s.

Table 2. Errors of the prediction results for different prediction starting points and different models.