1. Introduction
With the extensive production and consumption of fossil fuels, the global climate problem has become increasingly serious [
1]. Renewable energies such as wind, solar, and wave energy are favored by scientists and are widely regarded as effective substitutes for fossil fuels. However, their inherent intermittency and volatility limit their stability and large-scale deployment in the energy supply. In this context, hydrogen, as an important clean energy carrier, shows great potential in the energy transition and is considered one of the important routes to deep decarbonization. Using these renewable energies to produce hydrogen by water electrolysis is not only an important hydrogen production method but also an effective way to handle their intermittency and volatility, thereby further promoting the utilization and development of clean energies [
2]. Therefore, hydrogen energy has undeniable strategic significance in addressing the global climate problem and promoting sustainable energy development. In the field of hydrogen fuel cells, proton exchange membrane fuel cells (PEMFCs) and solid oxide fuel cells (SOFCs) are the most widely used. Compared with SOFCs, PEMFCs have significant advantages, especially in terms of efficiency, durability, and stability [
3]. In terms of efficiency, PEMFCs operate at much lower temperatures (60–80 °C) than SOFCs (700–1000 °C), which reduces the energy consumed by thermal management and enables rapid start-up. In terms of durability and stability, PEMFCs resist carbon corrosion under dynamic operating conditions, whereas SOFCs suffer from problems such as nickel agglomeration and anode degradation in high-temperature environments. Moreover, PEMFCs show stronger tolerance to frequent load cycling, a key requirement for automotive applications. In terms of service life, PEMFCs can achieve a voltage retention rate of over 90% within 5000 h, better than the benchmark performance of SOFCs under comparable conditions. Owing to these favorable characteristics, PEMFCs are widely used in fields such as fuel cell vehicles, railway locomotives, and ships.
However, the limited lifespan of PEMFCs poses a significant barrier to their widespread application. The key to extending their lifespan lies in the accurate assessment of their remaining useful life (RUL) [
4,
5]. Current research on lifespan prediction methods for PEMFCs can be primarily categorized into three approaches: physics-based methods, data-driven methods, and hybrid prediction methods [
6].
Physics-based lifespan prediction for PEMFCs mainly relies on internal mechanism models, empirical models, semi-mechanistic models, or semi-empirical models [
7]. The advantages of these methods include low data requirements, high accuracy, and strong universality. Bressel et al. [
8] proposed a degradation-based extended Kalman filter observer that estimates the RUL by deriving the internal degradation state of the fuel cell. However, this method relies on a single degradation model for life estimation, resulting in lower model robustness. Based on the voltage polarization loss model established in the literature [
9], Jouin et al. [
10] developed an accurate semi-mechanistic model for the power degradation of proton exchange membrane fuel cells. By integrating experimental data with the mechanistic model, they quantified the dynamic evolution of key performance indicators (such as the voltage decay rate and the rise in ohmic impedance) during aging. However, the model neglects the concentration polarization loss caused by the anode concentration gradient and the hydrogen loss caused by cross-leakage through the proton exchange membrane, a simplification that may reduce its prediction accuracy under dynamic variable-load conditions. Zhang et al. [
11] developed a typical semi-empirical, semi-mechanistic model to analyze design parameters based on an electrochemical impedance spectroscopy (EIS) equivalent circuit model. Using experimental data and expert knowledge, they established a fuel cell power self-recovery model. This multi-model method ensures a low computational workload and improved prediction accuracy. In general, model-based methods fully consider the influence of internal aging factors and external state variables on the output performance of PEMFCs and have the advantages of low data dependence and strong universality. However, they require an accurate mechanism model, which demands an in-depth exploration of the fuel cell aging mechanism and significantly increases the complexity of the modeling process.
Data-driven methods do not require the construction of internal degradation models. Instead, they use historical data to build suitable behavior models for fault diagnosis and lifespan prediction. The advantages of this approach include its ability to handle various nonlinear relationships, flexibility in model construction, and high prediction accuracy [
12]. However, such methods still face the “black-box” problem regarding PEMFC aging mechanisms and state changes, meaning they cannot reveal the relationships between internal parameters or provide useful insights for subsequent development and maintenance. Common types of data-driven methods in current research include statistical modeling, machine learning, deep learning, and hybrid learning [
13]. Data-driven methods are based on statistical learning or machine learning (ML) algorithms to extract degradation pattern representations from massive fuel cell aging data, and then construct data-driven prediction models to achieve quantitative prediction of aging trends [
14,
15]. Literature [
16] introduces a series of machine learning methods, including the linear regression model (LR), support vector regression model (SVR), decision tree regression (DT), and a multi-layer perceptron model (MLP). It also conducts a comparative analysis of their advantages and disadvantages in predicting capacitance, offering a new approach for the application of machine learning in materials. Deep learning, as an important branch of ML, has a strong learning ability and adaptability, and it has broad application prospects in the field of fuel cell degradation prediction [
17]. Morando et al. [
18] used the echo state network (ESN) combined with prior knowledge from fault-tolerance (FT) techniques to estimate the remaining useful life. Li et al. [
19] designed an RUL prediction method based on GRU, which generally performs better than conventional methods. He et al. [
20] extracted health indicators from the degradation voltage and used the long short-term memory network (LSTM) to predict future health indicators. Yi et al. [
21] proposed an improved model of matrix long short-term memory (M-LSTM), which can enhance the global modeling ability of LSTM in complex nonlinear feature learning and long sequence data processing. Zhou et al. [
22] applied a convolutional neural network–bidirectional gated recurrent unit with attention mechanism (CNN-BiGRU-AM) to fuel cell fault prediction; the method performs well in remaining useful life prediction, and its short-term prediction performance is significantly better than that of traditional machine learning models. Although these existing methods achieve good prediction accuracy, their results depend on large amounts of high-quality data and effective prediction methods.
Since both physics-based and data-driven methods have their advantages and limitations, hybrid model prediction approaches combine the interpretability of internal parameters in physics-based models with the flexibility of data-driven methods to address unclear physical mechanisms. This hybridization allows for the treatment of ambiguous aspects of the system, making subsequent predictions more feasible. By integrating the strengths of both approaches, hybrid models improve the overall prediction accuracy. In existing research, three primary hybridization strategies have been identified: model-data hybrid-driven, data-data hybrid-driven, and multi-data-multi-model hybrid-driven modes. Mao et al. [
23] developed a sensitivity- and noise-resistant model to select the optimal combinations of sensor measurements. These combinations were then used as inputs to an adaptive neuro-fuzzy inference system (ANFIS) to estimate voltage degradation in PEMFCs and thus investigate their performance decline. The data-data hybrid mode first fits the data with one model and then feeds this additional dimension of information into another data-driven model for prediction, thereby improving prediction accuracy and robustness. Zhu et al. [
24] proposed a B-GRU hybrid model by combining Bayesian theory and self-attention mechanisms. They first employed a random forest model to extract key feature parameters and then introduced these features into the B-GRU model, yielding favorable accuracy. Li et al. [
25] extracted aging features from the polarization curve of the fuel cell using a degradation empirical model, while simultaneously identifying aging features from electrochemical impedance spectroscopy (EIS) data through a backpropagation neural network. By performing similarity analysis on these aging features, they improved the results and reduced random errors. The fused aging features were then incorporated into the model to accurately predict the aging state of PEMFCs. From a review of existing studies, it is clear that hybrid-driven approaches offer superior prediction accuracy compared with purely model-based or data-driven methods. However, this comes with the trade-offs of more complex models and longer computation times.
To address the challenges of high noise and instability in fuel cell data caused by complex operating conditions and electromagnetic interference, this study proposes an EMD-TCN hybrid framework that integrates empirical mode decomposition (EMD) and a temporal convolutional network (TCN). The framework achieves high-precision lifetime prediction through three key innovations: (1) EMD separates high-frequency noise (e.g., transient load fluctuations) from low-frequency degradation trends (e.g., catalyst aging) to achieve adaptive multi-scale signal modeling; (2) group normalization (GN) calibrates multi-channel intrinsic mode function (IMF) features, suppresses mode mixing, and enhances the interpretability of degradation mechanisms (voltage attenuation, flow channel blockage); (3) the optimized group convolution in the TCN improves computational efficiency while retaining the long-term dependency modeling provided by dilated convolution. Mainstream models such as RNN, LSTM, and TCN are selected as benchmark models for prediction. These models can effectively capture the time dependence of the fuel cell degradation process (such as the long-term trend of voltage attenuation and the short-term fluctuations of start–stop events) while mitigating the vanishing-gradient problem, thereby maintaining robustness over long time-series; they are classic models widely used in the field. Taking them as benchmarks provides a technical reference for assessing the performance of the new method (EMD-TCN) proposed in this paper. At the same time, their computational efficiency (low single-step complexity) and mature toolchain support ensure engineering feasibility.
The structure of the remaining sections of this paper is as follows:
Section 2 discusses the characteristics of steady-state and dynamic cycling conditions and performs filtering and smoothing of several strongly correlated feature parameters for denoising and reconstruction.
Section 3 presents prediction models based on RNN, LSTM, TCN, and the EMD-based TCN model. In
Section 4, a comparative analysis and summary of the predictions from different models are provided.
Section 5 concludes the paper and offers perspectives for future research.
3. Lifetime Prediction Model Design
3.1. Voltage Prediction Algorithm Process
The voltage prediction experiment is shown in
Figure 4 and consists of three parts: data preprocessing, model training, and model testing. The dataset used is the IEEE PHM Data Challenge durability testing public dataset mentioned earlier.
In the data preprocessing phase, the raw voltage data are first processed using DWT or EMD. The data are then divided into training and test sets according to preset ratios. In the model training phase, the deep learning models are constructed and initialized, the training set is fed into them, and after each training run, the results are quantitatively evaluated; the model structure and hyperparameters are adjusted to obtain the best prediction model. In the model testing phase, the test set is fed into the optimal prediction model to compute the voltage prediction results, and the efficiency and accuracy of the models are compared across the different prediction methods.
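As an illustration of the split step, the following is a minimal sketch assuming NumPy and a sliding-window formulation of one-step-ahead prediction; the function names and the synthetic series are ours, not the paper's:

```python
import numpy as np

def split_by_ratio(series, train_ratio):
    """Chronological split: the first train_ratio fraction of the series trains the model."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

def make_windows(series, seq_len):
    """Slice a 1-D series into (input window, next value) pairs for one-step prediction."""
    X = np.stack([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]
    return X, y

# Synthetic stand-in for a denoised voltage trajectory (the real data come from the
# IEEE PHM Data Challenge dataset)
voltage = np.linspace(3.35, 3.20, 2000) + 0.002 * np.random.randn(2000)
train, test = split_by_ratio(voltage, 0.8)         # e.g., an 80/20 split
X_train, y_train = make_windows(train, seq_len=16)
```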
Under the steady-state conditions in this study, the research team established a comprehensive evaluation index system. It combines the root mean square error (RMSE), the coefficient of determination (R²), and the difference between the predicted and actual times at which the voltage reaches the decline threshold into a joint analysis index. In this way, it evaluates how closely the model fits the voltage trajectory (RMSE), measures the model's overall ability to explain voltage changes (R²), and locally captures the error at the critical point in the late stage of voltage attenuation. The calculation formulas for RMSE and R² are as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i-y_i\right)^2}$$
In the equation, $\hat{y}_i$ is the i-th predicted value, $y_i$ is the i-th true value, and $n$ is the number of samples.
$$R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}$$
In the equation, $\bar{y}$ is the mean of the true values, and the other symbols are as defined above.
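For reference, both metrics can be computed directly; the following is a minimal NumPy sketch (the function names are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted voltages."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```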
3.2. Environment Configuration
This training experiment uses the Python programming language, with the software environment set to VS Code and the optimizer set to Adam. The experimental environment configuration is shown in
Table 3. To facilitate the comparison of prediction results across models, the number of iterations, hidden layer size, and sequence length are kept consistent. At the same time, to ensure the reliability of the prediction results and minimize random errors, each model undergoes three prediction runs. The RMSE, R², and other evaluation metrics are calculated by averaging the results of the three runs.
3.3. Lifetime Prediction Model Based on Recurrent Neural Networks (RNNs)
Recurrent neural networks (RNNs) are capable of capturing sequential information and transmitting it through one or more cycles. The basic structure of an RNN includes one or more recurrent units, which repeat over time steps. At each time step, the RNN receives an input vector and a hidden state and outputs an output vector and an updated hidden state. The hidden state is updated at each time step based on the current input and the hidden state from the previous time step. Through cyclic connections, the RNN can propagate prior information to subsequent time steps, allowing it to retain and transfer important contextual information in the sequence. The basic structure of the RNN is shown in
Figure 5.
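Formally, this update can be written as follows, where $\phi$ denotes the activation function:
$$h_t=\phi\left(W_{xh}x_t+W_{hh}h_{t-1}+b_h\right),\qquad y_t=W_{hy}h_t+b_y$$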
In the model development stage, to suppress the overfitting tendency in the model prediction process, the dropout regularization technique [
31] is introduced. This technique dynamically optimizes the network structure by randomly masking a specific proportion (such as 40%) of the neuron nodes and their associated weights during training. The random inactivation applied in each training cycle implicitly ensembles multiple sparse sub-networks, avoiding repeated optimization of a fixed network architecture and effectively improving the model's generalization to new samples. Its working mechanism is shown in
Figure 6.
To ensure that the learning rate η adapts to each parameter while maintaining stability, the Adam optimizer was chosen. The other configurations are as follows: 150 neurons in the hidden layer, a dropout probability of 0.4, the ReLU activation function, and, to balance training speed and generalization ability, a batch size of 64. The number of iterations was determined experimentally using early stopping (training halts when performance on the validation set no longer improves), which yielded 2000 iterations; parameters such as the batch size were set according to the computer configuration. The prediction results are shown in
Figure 7, and the model evaluation metrics are presented in
Table 4.
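As a concrete illustration, a minimal PyTorch sketch of this configuration might look as follows; PyTorch and all names here are our assumptions, since the paper specifies only the hyperparameters:

```python
import torch
import torch.nn as nn

class RNNPredictor(nn.Module):
    """One-step voltage predictor with the configuration stated above."""
    def __init__(self, hidden_size=150, dropout=0.4):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size,
                          nonlinearity='relu', batch_first=True)
        self.dropout = nn.Dropout(dropout)    # randomly masks 40% of hidden units
        self.fc = nn.Linear(hidden_size, 1)   # maps last hidden state to a voltage

    def forward(self, x):                     # x: (batch, seq_len, 1)
        out, _ = self.rnn(x)
        return self.fc(self.dropout(out[:, -1, :]))

model = RNNPredictor()
optimizer = torch.optim.Adam(model.parameters())  # adaptive per-parameter step sizes
# training loop (not shown): batches of 64 windows, early stopping on validation loss
```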
The analysis shows that with 80% of the data used for training, the RMSE of the RNN model decreased by 52.7% compared to 40% and by 14.4% compared to 60%, while the R² values increased by 64.3% and 45.5%, respectively. This indicates that a larger training dataset allows the model to extract more information and produce more accurate trend predictions. As seen in Figure 7b,c, the training results exhibit slight gradient explosion, with the predictions gradually deviating from the true values over time. This reflects inherent limitations of traditional RNN models, which are not easily parallelizable and are prone to vanishing or exploding gradients.
3.4. LSTM-Based Life Prediction Model
LSTM (long short-term memory) is an enhanced type of recurrent neural network (RNN) that incorporates a gating mechanism to handle complex data with strong temporal dependencies. This innovation addresses the weakness of traditional RNNs in dealing with long-term sequential dependencies. Through its gating mechanism, LSTM can effectively control the flow of information, enabling it to better capture and transmit long-term dependencies [
32,
33]. This makes LSTM highly effective in tasks such as language modeling, machine translation, and text generation in sequence data. The gating mechanism of LSTM includes the forget gate, input gate, and output gate, which autonomously learn and adjust the weights of input data to better control the flow of information and preserve important contextual information. The structure of its unit is shown in
Figure 8 [
34].
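For completeness, the standard gate equations are given below, with $\sigma$ the sigmoid function, $\odot$ element-wise multiplication, and $[h_{t-1}, x_t]$ the concatenation of the previous hidden state and the current input:
$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$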
On the basis of the LSTM algorithm architecture, to prevent overfitting in predictions, the dropout technique is also introduced. According to the conclusions of Bin et al. [
35], a learning rate of η = 0.01 and a dropout probability of 0.4 were selected to train the network. The LSTM hidden layer has 150 neurons, the number of iterations is 2000, and the learning rate decays every 50 epochs with a decay rate of 0.2. The final prediction results are shown in
Figure 9, and the model evaluation metrics are presented in
Table 5.
Analysis indicates that when the training set accounts for 80% of the data, the RMSE of the LSTM model decreases by 16.4% compared to 40%, while it slightly increases by 1.9% compared to 60%. The R2 value shows no significant difference, suggesting that the fitting capability of the LSTM model is less affected by the proportion of the training set. In the prediction phase, the voltage error gradually increases over time, which can be attributed to the lack of model-based correction of the prediction results, leading to the accumulation of errors.
3.5. Life Prediction Model Based on Temporal Convolutional Network (TCN)
The temporal convolutional network (TCN) is a neural network architecture capable of effectively processing time-series data for predictive modeling. The TCN model is composed of multiple stacked 1D convolutional layers. The 1D convolution operation efficiently captures both global and local information from time-series data, offering higher parallelism and computational efficiency [
36,
37]. The structural unit of the TCN is shown in
Figure 10.
The input and output of the convolutional layer are one-dimensional feature vector sequences of time-series data, where the features at each time step are obtained by performing convolution operations on local regions of the input sequence. An activation function is then applied to introduce nonlinearity. Typically, a global average pooling layer or a global max pooling layer is added at the end to extract the overall features of the time-series data. Subsequently, the extracted features are mapped to the final output through a fully connected layer.
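The causal, dilated convolution at the heart of the TCN can be sketched as follows; this is a minimal PyTorch illustration assuming left-padding to prevent future leakage, and the class name is ours:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Dilated 1-D convolution that sees only past time steps (causal)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                     # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))      # pad the past only, never the future
        return self.conv(x)                   # output keeps the input's time length
```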
Considering the length of the training dataset, the number of neurons in the TCN encoder input layer is set to 32, the number of neurons in the hidden layer is set to 64, and the size of the 1D convolutional kernel is set to 3. The number of neurons in the feedforward neural network is set to 128. The final prediction results are shown in
Figure 11, and the model evaluation metrics are presented in
Table 6.
Analysis shows that when the training set accounts for 80% of the data, the RMSE of the TCN model decreased by 60.8% compared to 40%, and by 35.1% compared to 60%. The R2 values increased by 97.3% and 71.3%, respectively. In applications such as predicting fuel cell lifespan and other long-term time-series forecasting, the TCN model processes training data faster, offers more adjustable hyperparameters, and exhibits stronger adaptability and generalization ability.
3.6. Lifespan Prediction Model Based on EMD-TCN
To further improve the prediction accuracy of long-term time-series data for the PEMFC system, a novel data processing scheme based on empirical mode decomposition (EMD) is proposed. This approach demonstrates stronger adaptability when processing nonlinear and non-stationary signals. Therefore, an EMD-TCN fuel cell lifespan prediction model is introduced, with improvements made to the normalization layer of the TCN.
In the basic structure of the TCN model, the unit residual block consists of two identical internal convolutional units and a residual connection. The logical sequence processed by a single convolutional unit is as follows: causal convolution layer, normalization layer, activation function, and dropout layer. There are several types of normalization methods, including weight normalization (WN), group normalization (GN), batch normalization (BN), and instance normalization (IN) [
38,
39]. Compared to BN and IN, GN has lower computational complexity, and it more accurately shares statistical information within groups with fewer channels, thereby reducing the risk of overfitting and enhancing the model's generalization ability. Therefore, in this experiment, the group normalization (GN) method is employed to improve the TCN, with the commonly used rectified linear unit (ReLU) as the activation function. The prediction process of the PEMFC lifespan based on the EMD-TCN (GN) model is shown in
Figure 12.
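A minimal sketch of the residual block in this stated order, assuming PyTorch and reusing the CausalConv1d sketch above (the dropout rate is illustrative, as the paper does not state it):

```python
import torch.nn as nn

class ResidualBlockGN(nn.Module):
    """Residual block in the stated order: causal conv -> GN -> ReLU -> dropout,
    applied twice, plus a skip connection."""
    def __init__(self, in_ch, out_ch, dilation, groups=4, kernel_size=3, p=0.2):
        super().__init__()
        def conv_unit(cin):
            return nn.Sequential(
                CausalConv1d(cin, out_ch, kernel_size, dilation),
                nn.GroupNorm(groups, out_ch),  # statistics shared within channel groups
                nn.ReLU(),
                nn.Dropout(p),                 # dropout rate p is illustrative
            )
        self.net = nn.Sequential(conv_unit(in_ch), conv_unit(out_ch))
        # 1x1 convolution aligns channel counts on the skip path when needed
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.net(x) + self.skip(x)
```

Swapping nn.GroupNorm for nn.BatchNorm1d, or wrapping the convolutions with weight normalization, would yield the TCN (BN) and TCN (WN) control variants used later.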
The specific steps for the prediction are as follows:
Empirical mode decomposition: The original voltage data are decomposed into three intrinsic mode function (IMF) components and one residual component using three-layer empirical mode decomposition (EMD). The number of decomposition layers was determined from previous experiments: too few layers lower the prediction accuracy, while too many increase the computational complexity and introduce unnecessary noise or spurious components.
Feature extraction: Seven statistical features are calculated for each of the four components, resulting in a 28-dimensional feature dataset.
TCN model: The features from the training set are normalized using min-max normalization. The normalized results are then used as model inputs, with the remaining lifetime percentage P (percentage of remaining performance) as the training label. The Adam optimizer is employed for model training. A random 20% of the data are used as a validation set to enhance the model's generalization capability.
Test set validation: The trained optimal model is used to predict the remaining lifetime percentage based on the test set.
Following these steps, the 2000 preprocessed voltage time-series points are divided into a training set and a test set at a 6:4 ratio. The training set is used as the input for the EMD decomposition, ultimately producing three IMF components and one residual component (res.).
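Steps 1–3 above can be sketched as follows, assuming the PyEMD package (EMD-signal); note that the paper does not enumerate the seven statistics, so the ones below are illustrative placeholders:

```python
import numpy as np
from PyEMD import EMD   # pip install EMD-signal

def decompose(voltage, max_imf=3):
    """Three-layer EMD: returns 3 IMF components plus the residual (4 components)."""
    emd = EMD()
    emd.emd(np.asarray(voltage, dtype=float), max_imf=max_imf)
    imfs, residue = emd.get_imfs_and_residue()
    return list(imfs) + [residue]

def statistical_features(components):
    """Seven statistics per component -> 4 x 7 = 28-dimensional feature vector."""
    feats = []
    for c in components:
        feats += [c.mean(), c.std(), c.min(), c.max(), np.median(c),
                  np.sqrt(np.mean(c ** 2)),    # RMS
                  c.max() - c.min()]           # peak-to-peak
    return np.array(feats)

def minmax(x):
    """Min-max normalization to [0, 1], as applied to the training features."""
    return (x - x.min()) / (x.max() - x.min())
```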
The TCN (GN) model has neuron counts of 64, 32, 16, 8, and 1 in the residual connections from the first to the last layer; that is, the number of neurons is halved layer by layer (64 → 32 → 16 → 8 → 1). The first layer, with 64 neurons, captures the high-frequency details of the original voltage signal (such as millisecond-level load fluctuations) and retains rich information through its wide channel count; halving the number of neurons layer by layer compresses the spatial dimension (reducing the risk of overfitting) and enhances key features (highlighting low-frequency patterns such as the attenuation trend); finally, the single output neuron produces the voltage prediction required by the task. The dilation rates are 1, 2, 4, 8, and 16, respectively, growing exponentially (1 → 2 → 4 → 8 → 16): the bottom layer, with a dilation rate of 1 (equivalent to conventional convolution), captures the influence of local fluctuations (such as start–stop impacts), while the exponential growth makes the receptive field expand geometrically, matching the overall attenuation period. The convolution kernel size is 3, the activation function is ReLU, and the batch size is 32; compared with a smaller batch, training is more stable and convergence is significantly faster, while compared with a batch size of 64, the risk of overfitting is smaller. A step size that is too short prevents the TCN model from learning sufficient information from the data, while one that is too long increases the computational load and reduces efficiency. Based on literature [
40,
41], the step size is set to 16. To evaluate the applicability of the TCN (GN) model in predicting the lifespan of PEMFC, TCN (WN) and TCN (BN) are used as control groups for experimental comparison. The prediction results are shown in
Figure 13.
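For concreteness, the architecture just described can be assembled as in the following sketch, reusing the earlier block sketches; treating the final one-neuron layer as a plain causal convolution is our reading, since GroupNorm requires the channel count to be divisible by the number of groups:

```python
import torch.nn as nn

def build_tcn_gn(groups=4):
    """Stack matching the stated design: channels 64 -> 32 -> 16 -> 8,
    dilations 1, 2, 4, 8, then a dilation-16 layer with a single output neuron."""
    channels, dilations = [64, 32, 16, 8], [1, 2, 4, 8]
    layers, in_ch = [], 1                      # univariate voltage input
    for out_ch, d in zip(channels, dilations):
        layers.append(ResidualBlockGN(in_ch, out_ch, dilation=d, groups=groups))
        in_ch = out_ch
    # final single-neuron layer kept as a plain causal convolution (see note above)
    layers.append(CausalConv1d(in_ch, 1, kernel_size=3, dilation=16))
    return nn.Sequential(*layers)

tcn_gn4 = build_tcn_gn(groups=4)   # GN-2 / GN-8 variants: groups=2 or groups=8
```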
As can be seen from the figure, under different training conditions, the TCN (GN) model slightly outperforms TCN (WN) and TCN (BN) in terms of overall prediction accuracy across the entire prediction interval, yielding a more ideal fitting effect. Through quantitative analysis of model evaluation metrics, as shown in
Table 7, the RMSE of TCN (GN) is consistently smaller than that of the other two models, and its R² value is superior to both, further demonstrating that TCN (GN) can accurately share statistical information within groups with fewer channels, has a lower risk of overfitting, and possesses stronger generalization capability. This suggests that TCN (GN) is better able to meet the accuracy requirements of lifespan prediction.
In further research, the internal structural parameters of the TCN model were explored in depth, particularly the choice of normalization layer. Building on the group normalization in the TCN, suitable grouping of the convolutions can increase the network's nonlinear expressive ability, reduce the number of parameters, and improve computational accuracy. Depending on the dataset length, the grouping parameters were initially set to 2, 4, and 8, referred to as TCN (GN-2), TCN (GN-4), and TCN (GN-8), respectively. The optimal grouping parameter was determined through experiments and cross-validation.
The data after modal decomposition were imported into each TCN model, and the RMSE and R² were calculated, with the prediction results shown in
Table 8 and
Figure 14. Among the output voltage prediction results of the EMD-improved TCN (GN-X) models, EMD-TCN (GN-4) outperforms TCN (GN-2) and TCN (GN-8) at training data proportions of 40%, 60%, and 80%. This contrasts with the prediction results reported in literature [
35], which used a similar method and demonstrated the best prediction performance when the grouping parameter was 2 (i.e., the TCN (GN-2) model), a conclusion inconsistent with the findings of this paper. Analysis shows that the voltage data used in this paper are an order of magnitude smaller than the bearing lifespan data in that study. With smaller datasets, smaller grouping parameters produce higher variance: fewer values enter each mean and variance estimate, so the normalization is less effective than with larger grouping parameters. Therefore, when using the group-normalized TCN model, appropriate grouping parameters should be selected according to the specific application scenario and dataset characteristics in order to achieve the best normalization effect and computational efficiency.
4. Results and Discussion
To evaluate the performance of the proposed EMD-TCN voltage prediction model, a comparative analysis is conducted across various models, including the traditional RNN and LSTM, TCN (WN) and TCN (BN) without grouping, and the grouped TCN (GN-X). The datasets and experimental settings are consistent across all comparison experiments. The prediction results, shown in
Figure 15, indicate through qualitative analysis that the TCN (GN-4) model with EMD modal decomposition achieves the highest prediction accuracy, significantly outperforming the traditional RNN and LSTM models. Because the IMF components obtained by EMD decomposition enhance the identifiability of features, the approach reduces the dependence on data in practical applications; combined with the parallel computing characteristics of the dilated-convolution-based TCN, it achieves a very high computing speed under GPU acceleration. The accuracy differences among TCN (GN-8), TCN (GN-4), and TCN (GN-2) are not significant, warranting further quantitative analysis [
42].
Since the dataset did not reach the 90% lifespan failure threshold by the end of testing [
43], a voltage lifespan threshold of 96% of the standard voltage value, i.e., 3.228 V, was adopted. This threshold avoids the early high-noise interference associated with the 95% threshold and makes full use of the steady-state data in the 96–91% attenuation interval to reduce extrapolation uncertainty. In
Figure 15a, the actual times when the voltage first and second reached the threshold were 807 h and 872 h, respectively. The EMD-TCN (GN-4) model predicted these times as 809 h and 876 h, respectively. In
Figure 15b, the EMD-TCN (GN-4) model predicted the threshold-reaching times as 810 h and 886 h. Both prediction times were the closest to the actual values among all models, demonstrating that the model can accurately predict the steady-state lifespan of PEMFC.
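The threshold-crossing times compared above can be extracted from a voltage trajectory as in this sketch (an illustrative helper; it reports each time the voltage falls to the threshold, i.e., the first, second, and subsequent crossings):

```python
import numpy as np

V_THRESHOLD = 3.228   # 96% of the standard voltage value (V)

def crossing_hours(voltage, hours, threshold=V_THRESHOLD):
    """Times at which the voltage enters the sub-threshold region (first, second, ...)."""
    v = np.asarray(voltage)
    below = v <= threshold
    onsets = np.flatnonzero(below[1:] & ~below[:-1]) + 1   # falling crossings only
    return np.asarray(hours)[onsets]

# life-prediction error at the threshold:
# predicted crossing time (from the model trajectory) - actual crossing time
```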
The prediction results of multiple models are presented in
Table 9 and
Figure 16.
Analysis of the above figure and table shows the following:
(1) When the training set accounts for 40%, 60%, and 80% of the data, the RMSE of the EMD-based temporal convolutional network TCN (GN-4) prediction model is the smallest in each comparison group, indicating the smallest deviation between predicted and actual values and thus the best prediction performance. Its R² value is also the highest in each group, meaning the model explains the variance of the voltage data most effectively and achieves the highest goodness-of-fit. Moreover, as the data volume increases, the RMSE decreases and R² stabilizes, indicating no risk of overfitting with growing data volume. The dilated convolution structure of the TCN captures long-term time dependencies; combined with EMD's ability to decompose non-stationary signals, the model extracts the core features of voltage attenuation more accurately when sufficient data are available.
(2) Comparative analysis shows that the EMD-based TCN model significantly outperforms the traditional RNN and LSTM in prediction performance. When the training set accounts for 40% of the data, the RMSE of the TCN (GN-4) model reaches 0.0055 V, a 76.96% reduction compared with the RNN baseline (0.0239 V). At the same time, its R² increases to 0.879, a 217.3% improvement over the RNN (0.277); the prediction accuracy and generalization ability are thus significantly improved. This confirms the powerful ability of EMD to extract key information and its excellent adaptability to nonlinear, non-stationary signals.
(3) A horizontal comparison of the prediction results of EMD-TCN (GN-4) under different training set ratios reveals that the fluctuation of the results is extremely small. This indicates that the TCN (GN) model has excellent learning ability, enabling it to effectively learn data features from a small dataset and make accurate predictions. There are no problems such as overfitting when the data volume increases, showing strong robustness to the data volume.
(4) Research on normalization methods shows that the group-normalized TCN model shares statistical information more accurately, reducing the risk of overfitting, and significantly outperforms batch normalization, weight normalization, and traditional neural network models in generalization ability. Under different training set sizes, both evaluation indicators of group normalization, R² and RMSE, perform well. Further research on the grouping parameter reveals that model accuracy is highest when the grouping parameter is 4. The main reason is that when the data volume is small, a smaller grouping parameter results in a larger variance: fewer values enter each mean and variance estimate, so the normalization effect is not as good as with a larger grouping parameter. However, an excessively large grouping parameter causes gradient competition among too many parameter groups during back-propagation, leading to training instability and thus reducing prediction accuracy.
In recent years, models based on the Transformer architecture or the attention mechanism have also received widespread attention. By comparing the latest research results, this study shows that the proposed EMD-TCN (GN) model has significant advantages in prediction accuracy and engineering applicability. The RMSE of the RCLMA model (integrating residual convolution blocks, LSTM units, and multi-head self-attention layers) developed by Sun et al. [
44] on the same test dataset is 0.01785, significantly higher than the 0.00447 achieved in this study. Although the transfer learning–Transformer hybrid model proposed by Tang et al. [
45] offers better cross-scenario transfer (only parameter fine-tuning is needed to adapt to different prediction scenarios), its RMSE of 0.00598 is still 34.3% higher than that of this model. In general, the EMD-TCN (GN) model extracts features from each IMF component and the residual of the empirical mode decomposition, expanding the data dimensions so that the model can learn feature information more effectively. Its real-time state update mechanism accurately captures the reversible voltage recovery effect, overcoming the over-simplification of the life prediction models listed in the introduction. Compared with Transformer- or attention-based models, it is also more computationally efficient, saving computing power and easing embedded deployment, whereas the performance of the Transformer depends heavily on high-quality historical data: noisy or missing data may degrade its prediction accuracy and require a more complex preprocessing pipeline.
To further explore the prediction accuracy and generalization of the EMD-TCN model in different prediction scenarios, the team introduced a new dataset. The data come from 96 cumulative CLTC driving-cycle tests on the laboratory's 60 kW fuel cell test platform (as shown in
Figure 17). Using all of the data for life prediction would be prone to gradient explosion and would impose a huge computational load, so its feasibility is low. Therefore, the high-speed interval data, which have the greatest impact on PEMFC performance degradation, were selected for prediction, yielding 20,784 data points. On this basis, an equidistant window was applied: one group of high-speed interval voltage data was taken for every four cycles, giving 24 groups in total. The voltage values were combined into a continuous dataset in test order, finally producing 5196 original voltage time-series points. In preprocessing, the EMD method decomposes the voltage data into five IMF components and one residual (res.). Compared with the three IMF components under the steady-state condition, more components arise mainly because the raw data quality under the dynamic condition is poor: highly complex noise and outliers force more decomposition iterations, and the results are less ideal, which may affect subsequent prediction. Since the computational load of the dynamic voltage dataset is similar to that of the steady-state dataset, the improved TCN (GN-4) model was selected for prediction in line with the previous conclusion. To verify the applicability of the EMD-TCN model to dynamic conditions, the prediction results of the RNN and LSTM models are compared. R² and RMSE are selected as the main evaluation indicators, as they accurately reflect the overall goodness-of-fit and the average deviation between predicted and actual values. The experimental environment configuration and training set proportions are consistent with those used previously. The final prediction results are shown in
Table 10.
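The equidistant windowing described above can be sketched as follows, assuming the per-cycle high-speed voltage segments are available as a list of arrays (an assumed data layout, not the paper's):

```python
import numpy as np

def select_high_speed_windows(cycle_segments, every=4):
    """Equidistant windowing: keep the high-speed voltage segment of one cycle
    out of every `every` CLTC cycles (96 cycles -> 24 retained groups), then
    concatenate the retained segments in test order."""
    kept = [seg for i, seg in enumerate(cycle_segments) if i % every == 0]
    return np.concatenate(kept)   # ~5196 points in this study's dynamic dataset
```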
Analysis shows that the EMD-TCN model achieves the highest R² value of 0.712 and the lowest RMSE of 4.88 × 10⁻³ in the dynamic predictions. These metrics are slightly worse than the steady-state prediction results under an 80% training set proportion (R² = 0.877, RMSE = 4.47 × 10⁻³). This indicates that the discrepancies between predicted and actual values are relatively small under both dynamic and steady-state operating conditions, demonstrating strong prediction accuracy and robust model performance. However, the reduced R² value suggests higher variability in the predictions, implying that the model is sensitive to noise and outliers in dynamic scenarios. To address this, enhanced data preprocessing, such as incorporating unsupervised CGAN models for data reconstruction, should be prioritized in dynamic experiments to mitigate the impact of dataset biases and improve prediction stability.
5. Conclusions and Future Outlook
This paper proposes a PEMFC long-term lifespan prediction scheme based on EMD-TCN. The model consists of two identical internal convolution units and a residual connection structure. The logical sequence processed by a single convolution unit is as follows: causal convolution layer, normalization layer, activation function, and dropout layer. An in-depth study of the normalization layer was conducted. Compared to batch normalization and instance normalization, group normalization (GN) has lower computational complexity. Additionally, GN more accurately shares statistical information with fewer channels within a group, thereby reducing the risk of overfitting and improving the model’s generalization ability. To validate the model, the FCI dataset was first processed, then sliced and input into the model for prediction. RMSE and R2 were selected as evaluation metrics for prediction validation. The main conclusions are as follows:
The prediction results of RNN, LSTM, and TCN models were analyzed, and the prediction performance was found to be unsatisfactory, with the maximum R2 value only reaching 0.663. Therefore, an EMD-TCN prediction model was proposed, and its normalization layer was further analyzed. The RMSE of the TCN with group normalization (GN) was lower than that of TCN with weight normalization and batch normalization, and the R2 value was superior to those of the other two methods. This further demonstrates that TCN (GN) can accurately share statistical information within groups with fewer channels, reducing the risk of overfitting, and improving the model’s generalization ability. Moreover, further research on TCN group normalization features showed that appropriate grouped convolutions can increase the network’s nonlinear expression ability, reduce the number of parameters, and improve computational accuracy. The results indicated that for training set proportions of 40%, 60%, and 80%, the EMD-based TCN (GN-4) prediction model had the smallest RMSE value among the comparison groups, suggesting that the model’s predicted values were closest to the actual values and achieved the best prediction performance. Its R2 value was also the highest in the group, meaning the model explained the most variance in voltage data and had the best fit. For lifetime prediction, using a voltage threshold of 3.228 V (96% of the standard voltage value), the real values reached the threshold at 807 h and 872 h for the first and second times, respectively. The EMD-TCN (GN-4) model predicted the threshold reach times at 810 h and 886 h, respectively. This demonstrates that the model can accurately predict the lifetime of PEMFCs.
Compared to traditional RNN, LSTM, and TCN models, the proposed EMD-TCN model offers higher prediction accuracy. The in-depth study of its normalization layer reduces computational load, lowers overfitting risks, and enhances the model’s generalization ability. This research is significant for improving the accuracy, reliability, and stability of PEMFC lifetime prediction.
Although our study shows that EMD-TCN outperforms conventional models, it primarily demonstrates its precision in steady-state output conditions. In the future, we plan to apply it to different devices and more complex scenarios to improve computational speed and prediction accuracy, enabling more accurate lifetime predictions and faster fault diagnosis, thereby promoting the large-scale application of fuel cells.