5.1. Signal Decomposition Comparison
To evaluate the relative merits of the selected signal processing techniques, EMD, EEMD, and CEEMDAN were applied for a comparative decomposition analysis of battery B0005.
Figure 5 shows the multi-scale decomposition outcomes.
The primary limitation of conventional EMD appears in Figure 5a as mode mixing: IMF2 shows spectral aliasing with the adjacent IMF1. Furthermore, the residual term retains frequency content characteristic of IMF3, indicating incomplete separation that undermines the physical significance of the intrinsic mode functions and compromises decomposition fidelity. With EEMD (Figure 5b), white-noise-assisted ensemble averaging effectively reduces mode mixing. However, this approach leaves persistent noise contamination in the higher-order IMFs because the added noise is not fully canceled during averaging. In contrast, CEEMDAN (Figure 5c), through its adaptive noise regulation framework, removes both the mode mixing and the residual noise artifacts visible in the other two decompositions.
To further illustrate the effectiveness of CEEMDAN decomposition, this study computed the Pearson correlation coefficients between the residuals obtained from EMD, EEMD, and CEEMDAN decompositions and the original data, along with the OI between each IMF. Pearson correlation is used as an auxiliary indicator to evaluate trend preservation: by computing the correlation between the residual component and the original capacity series, we quantify whether the residual retains the dominant long-term degradation tendency after decomposition. A higher Pearson value suggests that the decomposition produces a residual that is more consistent with the global trend of the original signal, while separating short-term fluctuations (including noise and regeneration-related variations) into IMFs. In contrast, the OI is adopted to evaluate the mode separability among IMFs: lower OI values indicate weaker inter-mode coupling and less information redundancy, implying that the decomposed IMFs are more independent and thus more suitable for subsequent component-wise modeling and reconstruction.
Validation was performed using the NASA B0005 dataset and the CALCE CS235 dataset, with the results presented in Table 4. On the B0005 dataset, the Pearson correlation coefficients between the decomposition residuals and the original data differ only minimally; all three methods accurately capture the trend of battery capacity degradation. EEMD yielded the lowest OI among its IMFs, followed by CEEMDAN. On the CALCE CS235 dataset, CEEMDAN outperformed the other two methods in both the Pearson correlation coefficient and OI. These results demonstrate the effectiveness and robustness of the CEEMDAN decomposition.
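The two indicators can be illustrated with a minimal sketch. It assumes that OI follows the overall index of orthogonality commonly used for EMD-type decompositions (the summed cross products of distinct components divided by the energy of the reconstructed signal); the exact OI definition behind Table 4 may differ. The function names and the toy signal are hypothetical and are not taken from the B0005 or CS235 data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def orthogonality_index(components):
    """Assumed OI: sum over t and j != k of c_j(t) * c_k(t), divided by the
    energy of the signal rebuilt as the sum of all components.
    Values near 0 indicate nearly orthogonal (well-separated) modes."""
    n = len(components[0])
    signal = [sum(c[t] for c in components) for t in range(n)]
    energy = sum(v * v for v in signal)
    cross = 0.0
    for j, cj in enumerate(components):
        for k, ck in enumerate(components):
            if j != k:
                cross += sum(a * b for a, b in zip(cj, ck))
    return cross / energy

# Toy decomposition (not real battery data): one oscillatory "IMF"
# plus a slowly decaying "residual" that plays the role of the trend.
t = range(64)
imf = [0.01 * math.sin(2 * math.pi * k / 8) for k in t]
residual = [2.0 - 0.005 * k for k in t]
original = [a + b for a, b in zip(imf, residual)]

r = pearson(residual, original)
oi = orthogonality_index([imf, residual])
print(f"Pearson(residual, original) = {r:.4f}, OI = {oi:.2e}")
```

In this toy case the residual carries almost all of the trend, so the Pearson value is close to 1, and the oscillation is nearly orthogonal to the trend, so the OI is close to 0.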
5.2. Augmented Data Validation
To verify that the HyT-GAN-generated samples are statistically consistent with the real CEEMDAN-decomposed components, we conducted a quantitative validation on the NASA B0005 dataset by comparing the first- and second-order statistics and the temporal dependency structure between the real and augmented sequences for each IMF. Specifically, we report the mean and standard deviation of real versus augmented data, the normalized mean shift, the standard deviation ratio, and the mean absolute ACF difference over a fixed lag window.
As shown in Table 5, the augmented data exhibit small mean shifts across all IMFs, with the normalized mean shift ranging from 0.031 to 0.114, indicating that the generator does not introduce substantial bias relative to the natural variability of the real components. Meanwhile, the dispersion level is well preserved, with the standard deviation ratio close to 1 (from 0.990 to 1.154), suggesting that HyT-GAN maintains comparable fluctuation intensity and avoids mode collapse. In addition, the temporal correlation structure is largely retained: the mean ACF difference remains low (from 0.043 to 0.231), especially for the low-frequency component (IMF4), implying that the generated sequences preserve the key autocorrelation patterns of the real degradation-related signals. Overall, these statistical results indicate that HyT-GAN produces realistic augmented samples that are consistent with the original IMF distributions and temporal dependencies, providing reliable additional training data for early-stage RUL prediction.
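The three consistency statistics can be sketched as follows. The definitions are assumptions made for illustration: the mean shift is normalized by the standard deviation of the real component, the dispersion measure is the ratio of augmented to real standard deviation, and the ACF difference is averaged over lags 1 to `max_lag`. The lag window of 10 and the toy sequences are hypothetical; the exact definitions behind Table 5 may differ.

```python
import math
import statistics

def autocorr(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    n = len(x)
    mean = statistics.fmean(x)
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(1, max_lag + 1)
    ]

def augmentation_stats(real, aug, max_lag=10):
    """Compare one real IMF with its augmented counterpart (assumed definitions)."""
    mu_r, mu_a = statistics.fmean(real), statistics.fmean(aug)
    sd_r, sd_a = statistics.pstdev(real), statistics.pstdev(aug)
    acf_r, acf_a = autocorr(real, max_lag), autocorr(aug, max_lag)
    return {
        # |mean difference| in units of the real component's spread
        "mean_shift": abs(mu_a - mu_r) / sd_r,
        # 1.0 means identical fluctuation intensity
        "std_ratio": sd_a / sd_r,
        # average absolute gap between the two autocorrelation curves
        "acf_diff": statistics.fmean(
            [abs(a - b) for a, b in zip(acf_r, acf_a)]
        ),
    }

# Toy data: the "augmented" series is the real one with 10% larger amplitude.
real = [math.sin(0.3 * t) for t in range(100)]
mean_real = statistics.fmean(real)
aug = [mean_real + 1.1 * (v - mean_real) for v in real]

stats = augmentation_stats(real, aug)
print(stats)
```

Because the toy augmented series only rescales the fluctuations, the standard deviation ratio is about 1.1 while the mean shift and ACF difference stay near zero; a generator that collapsed to a narrow range would instead push the ratio well below 1.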
5.3. Hyperparameter Optimization
In the proposed hybrid method, the CEEMDAN-decomposed components exhibit heterogeneous temporal characteristics. The high-frequency IMFs (e.g., IMF1–2) are dominated by rapid fluctuations and noise-sensitive local variations, whereas lower-frequency components (e.g., IMF3–4) mainly reflect smoother long-term degradation dynamics with stronger temporal dependence. Consequently, different IMF components require different model capacities and training configurations in the CNN-BiGRU predictor; a single static hyperparameter setting is suboptimal and may lead to unstable performance in early-stage RUL prediction. Therefore, we employ the DBO algorithm to optimize the key hyperparameters of the CNN-BiGRU model for each IMF separately, including the number of Conv1D filters, the number of BiGRU hidden units, the batch size, and the dropout rate. The optimized results in Table 6 show clear IMF-wise variability in these hyperparameters, confirming that adaptive hyperparameter selection is necessary to accommodate the diverse frequency contents and noise levels across decomposed components and to improve early-stage prediction robustness.
To verify the effectiveness of the DBO algorithm, 50% of the historical data was used for training, and the DBO-optimized CNN-BiGRU model was compared on different datasets with models using static hyperparameter configurations. Table 7 presents the static configurations: the Baseline, Extreme Config, CNN Filters Focus, Random Search, Overfitting-Oriented, and Self-Adjusted groups. The seventh group is tuned by the DBO algorithm, with the number of iterations set to 8 and the population size set to 10. The resulting prediction errors are shown in Figure 6, which indicates that the DBO-optimized CNN-BiGRU model achieves higher prediction accuracy than the fixed hyperparameter combinations.
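The structure of such a per-IMF search can be sketched as follows. This is not the DBO algorithm: the update rule is a generic greedy population search used as a stand-in, and only the population size (10) and iteration count (8) come from the text. The search bounds in `BOUNDS`, the function names, and the toy objective (a placeholder for the validation error of a trained CNN-BiGRU) are all hypothetical.

```python
import random

# Hypothetical search ranges; the text names these four hyperparameters
# but does not state their bounds.
BOUNDS = {
    "filters": (16, 128),
    "hidden_units": (16, 128),
    "batch_size": (8, 64),
    "dropout": (0.0, 0.5),
}

def decode(position):
    """Map a position in [0, 1]^4 to a concrete hyperparameter setting."""
    cfg = {}
    for p, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        p = min(1.0, max(0.0, p))
        value = lo + p * (hi - lo)
        # dropout stays continuous; the other three are integers
        cfg[name] = value if name == "dropout" else int(round(value))
    return cfg

def population_search(objective, pop_size=10, iterations=8, step=0.2, seed=0):
    """Generic stand-in for DBO: each member moves toward the current best
    with random noise, and a move is kept only if it lowers the objective."""
    rng = random.Random(seed)
    dim = len(BOUNDS)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(decode(p)) for p in pop]
    for _ in range(iterations):
        best = pop[scores.index(min(scores))]
        for i, p in enumerate(pop):
            cand = [
                min(1.0, max(0.0, x + rng.random() * (b - x) + rng.gauss(0, step)))
                for x, b in zip(p, best)
            ]
            s = objective(decode(cand))
            if s < scores[i]:
                pop[i], scores[i] = cand, s
    i_best = scores.index(min(scores))
    return decode(pop[i_best]), scores[i_best]

def toy_objective(cfg):
    """Placeholder for 'train CNN-BiGRU on one IMF, return validation RMSE'."""
    return (
        ((cfg["filters"] - 64) / 112) ** 2
        + ((cfg["hidden_units"] - 32) / 112) ** 2
        + (cfg["dropout"] - 0.2) ** 2
    )

best_cfg, best_score = population_search(toy_objective)
print(best_cfg, best_score)
```

In a per-IMF setting, one such search would be run for each decomposed component with its own objective, which is what produces IMF-specific settings of the kind reported in Table 6.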
5.4. Comparative Analysis of RUL Prediction Results
In practical battery management systems, RUL estimation is driven by sensor-acquired time series signals (e.g., voltage, current, temperature, and capacity). These measurements are typically affected by noise, environmental disturbances, sensor drift, and operational variability, leading to degradation trajectories that are highly nonlinear and non-stationary, especially in early-life stages. Under such conditions, shallow machine learning models often rely on manual feature engineering and implicit stationarity assumptions, which limits their ability to capture multi-scale temporal dependencies and long-range degradation patterns.
In contrast, deep models can learn hierarchical temporal representations directly from raw sequences, enabling more effective extraction of degradation signatures in the presence of noise and non-stationarity. Moreover, our framework is specifically designed to address early-stage data scarcity and regeneration-induced fluctuations by combining CEEMDAN-based multi-scale decomposition and HyT-GAN augmentation before forecasting. This design improves robustness and generalization in small-sample settings, where shallow models are typically more sensitive to data insufficiency and distribution shifts. Finally, while deep models can be more computationally demanding during training, training can be performed offline, and online inference can be executed efficiently; thus, the accuracy–cost trade-off is favorable for sensor-driven battery health monitoring applications.
In this paper, representative shallow regression models and commonly used sequence models for battery RUL prediction were evaluated on the NASA dataset (B0005). As shown in Figure 7, under the early-stage setting with only 30% historical data, most baseline methods exhibit unstable forecasts and often fail to capture the correct degradation trend, highlighting the strong nonlinearity and non-stationarity of sensor-acquired degradation trajectories and the difficulty of small-sample learning. Even when the training portion increases to 50%, several methods still struggle to produce a reliable capacity degradation trend, indicating that early RUL prediction remains challenging for mainstream approaches.
Table 8 summarizes the quantitative results. Among the baselines, GRU and LSTM achieve the best performance under 30% and 50% training data, respectively. However, the proposed hybrid framework consistently outperforms these models, and its accuracy with only 30% historical data already exceeds that of the best baseline trained with 50% data. This superiority supports our motivation for using a higher-capacity deep framework with decomposition and augmentation modules to improve robustness and generalization in early-stage, small-sample scenarios.
For the commonly used setting of 50% historical data, the RUL prediction of the proposed hybrid model is illustrated in Figure 8. In the figure, the dashed boxes indicate the capacity regeneration phenomenon. On both the NASA and CALCE datasets, the predicted trajectories closely follow the true degradation trends while capturing the capacity rebounds induced by electrochemical noise and cyclic regeneration. This indicates that the proposed hybrid model achieves accurate predictions across different datasets and demonstrates the effectiveness of the CEEMDAN decomposition in handling capacity regeneration.
Table 9 presents the predictive results of the hybrid model across various datasets. As evidenced in the table, the proposed method achieves low errors on all evaluation metrics, with all RUL prediction errors confined within two cycles. This shows that the model combines high prediction accuracy with strong cross-dataset generalizability.
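How an RUL error in cycles can be derived from a predicted capacity trajectory is sketched below. The sketch assumes that end of life is the first cycle at which capacity falls to or below a failure threshold, and that the RUL error is the absolute difference between the predicted and true remaining cycles counted from the prediction starting point. The threshold of 1.4 Ah, the starting cycle, and the linear toy trajectories are hypothetical.

```python
def cycles_to_threshold(capacity, threshold):
    """Index of the first cycle whose capacity is at or below the threshold,
    or None if the trajectory never reaches it."""
    for cycle, value in enumerate(capacity):
        if value <= threshold:
            return cycle
    return None

def rul_absolute_error(true_cap, pred_cap, start_cycle, threshold):
    """Return (true RUL, predicted RUL, absolute error), all in cycles,
    or None if either trajectory never reaches the threshold."""
    eol_true = cycles_to_threshold(true_cap, threshold)
    eol_pred = cycles_to_threshold(pred_cap, threshold)
    if eol_true is None or eol_pred is None:
        return None
    rul_true = eol_true - start_cycle
    rul_pred = eol_pred - start_cycle
    return rul_true, rul_pred, abs(rul_pred - rul_true)

# Toy trajectories in Ah (not real data): the prediction fades slightly faster.
true_cap = [2.0 - 0.0101 * t for t in range(100)]
pred_cap = [2.0 - 0.0103 * t for t in range(100)]

result = rul_absolute_error(true_cap, pred_cap, start_cycle=20, threshold=1.4)
print(result)
```

Returning `None` when a trajectory never crosses the threshold keeps a forecast that fails to reach end of life from being silently scored as accurate.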
To validate the early-stage prognostic capability under data scarcity constraints, the historical data used for training were reduced to 20% of the overall capacity data. As shown in Figure 9, the method accurately captures the trend of capacity decline using only 20% of the historical capacity data. It also reflects this trend accurately in the presence of significant capacity regeneration, a capability not offered by other current methods that use historical capacity data as the sole input. As shown in Table 10, the average RMSE values are 0.0212 (NASA) and 0.0136 (CALCE), with R² values exceeding 0.985 across all test cases except B0006. Remarkably, these metrics rival the performance of comparative methods requiring 50–70% of the data for training. Furthermore, the Absolute Error distributions across all experimental configurations remain within a 3% tolerance, empirically confirming the framework’s competence in data-constrained prognostic scenarios.
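For reference, the capacity-level metrics can be computed as in the following sketch, which uses the standard definitions of RMSE, MAE, and the coefficient of determination R². The function names and the toy values are hypothetical and are not taken from Table 10.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Toy capacities in Ah with a constant 0.02 Ah over-prediction.
y_true = [1.9, 1.8, 1.7, 1.6]
y_pred = [v + 0.02 for v in y_true]

print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Because R² is measured against the variance of the true series, the same RMSE gives a lower R² on a cell whose capacity varies little over the evaluated cycles, so the two metrics are best read together.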
To further assess cross-scenario generalization beyond the NASA and CALCE benchmarks, we additionally evaluated the proposed framework on the Oxford Battery Degradation Dataset 1. In this study, Cells 1, 3, and 7 were selected, and we adopted the same early-stage protocol by using only the first 20% of the historical capacity trajectory as the prediction starting point. The result is shown in Figure 10. Despite the differences in cell form factor, nominal capacity, temperature, and dynamic load profile compared with the constant-current cycling conditions in NASA/CALCE, the proposed CEEMDAN–HyT-GAN–CNN-BiGRU framework continues to produce consistent degradation trend tracking on these Oxford cells, indicating that the method is not restricted to a single dataset or testing protocol and exhibits promising cross-scenario applicability.
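The early-stage protocol can be sketched as a simple split of the capacity series at a given fraction. The sliding-window construction of training samples shown here is an assumption (a common choice for sequence models) rather than a detail stated in the text, and the window length of 5 and the placeholder series are hypothetical.

```python
def early_stage_split(series, train_fraction):
    """Split a capacity series into the early 'history' and the future part.
    The split index is truncated toward zero."""
    split = int(len(series) * train_fraction)
    return series[:split], series[split:]

def sliding_windows(series, window):
    """Build (input window, next value) training pairs from a series."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]

# Placeholder series of 100 cycles; real capacity values would go here.
series = list(range(100))
history, future = early_stage_split(series, 0.2)
samples = sliding_windows(history, window=5)
print(len(history), len(future), len(samples))
```

With only 20 cycles of history and a window of 5, just 15 training pairs are available, which illustrates why the small-sample setting is difficult without augmentation.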
To assess the efficacy of each strategy in the proposed framework, four ablation configurations were evaluated on the NASA B0005 battery: (1) CNN-BiGRU, (2) CEEMDAN–CNN-BiGRU, (3) EMD–HyT-GAN–DBO–CNN-BiGRU, and (4) CEEMDAN–HyT-GAN–DBO–CNN-BiGRU. As reported in Table 11, the standalone CNN-BiGRU achieves acceptable performance when trained with 50% historical capacity data; however, when the available history is reduced to 20%, its prediction accuracy degrades sharply and the AE exhibits large fluctuations (e.g., AE = 24). This behavior indicates that the baseline CNN-BiGRU is highly sensitive to data scarcity, leading to unstable forecasts and larger prediction variance under limited samples. After introducing CEEMDAN, the prediction becomes more stable because decomposition separates multi-scale trend and fluctuation components, enabling the model to better capture capacity regeneration patterns. When CEEMDAN is replaced by EMD, the performance decreases, since EMD is less effective at separating high-frequency noise from low-frequency degradation trends, which degrades the quality of the decomposed components. Finally, incorporating HyT-GAN augmentation further improves robustness in the 20% setting by increasing the sample diversity and stabilizing training, resulting in consistently lower errors and demonstrating the necessity of the proposed components for reliable early-stage RUL prediction. The ablation experiment results are shown in Figure 11.
The above experiments demonstrate that the early-life prediction method proposed in this paper achieves high accuracy even when using significantly less data (20%) than traditional methods. To further investigate the performance of the hybrid model under minimal samples, we conducted predictions using only 8% of the historical data on the NASA lithium-ion battery B0005 dataset. With merely 8% of the data, the model achieves RMSE = 0.0209 and MAE = 0.0158. In Figure 12, the RUL prediction results of the proposed hybrid method are compared with several recent studies that use the NASA dataset. These studies employ various hybrid methods for battery RUL prediction, including CEEMDAN–Transformer–DNN [18], CEEMDAN–CNN–BiLSTM [44], EEMD–LSTM–IWOA–SVR [45], ARIMA–LSTM [46], LSTM–GSA [47], CNN–LSTM–ASAN [48], and DCLA [49]. The proportion of training data used in these studies ranges from 48% to 60%. Compared to the prediction methods shown in Figure 12, the proposed hybrid method yields the smallest RMSEs. Under small-sample training conditions, its accuracy loss relative to the current literature on the NASA Ames PCoE battery dataset remains within an acceptable range, while the amount of training data required is significantly reduced. This highlights the model’s excellent data efficiency and its capability to extract critical aging information from very early cycles.