1. Introduction
With the rapid development of the global economy, population growth, and the acceleration of industrialization, the global demand for natural resources has increased significantly [1,2]. To cope with these pressures, lithium-ion batteries, as an efficient and environmentally friendly energy storage technology, play a key role in promoting the energy transition and realizing sustainable development. With their excellent performance characteristics, such as high energy density, long cycle life, and low self-discharge rate [3,4,5], Li-ion batteries are widely used in electric transportation, mobile devices, distributed energy storage, and industrial manufacturing. However, over repeated charging and discharging cycles, batteries inevitably age, leading to a gradual decline in their state of health (SOH), which directly affects the range, power performance, and energy recovery efficiency of electric vehicles [6,7]. Accurately assessing the SOH of lithium-ion batteries has therefore become a key issue for improving battery performance and extending service life. In the field of SOH estimation for Li-ion batteries, traditional methods rely mainly on simple analysis of battery data, physical models, or basic feature extraction. For example, the open-circuit voltage (OCV) method [8], the internal resistance method [9], and the capacity degradation method [10] are widely used to determine the aging degree of a battery, while the polarization voltage method [11] and the charge retention method [12,13] are used to assess the reaction degree of the active material inside the battery and its charge retention ability. However, these traditional methods show clear limitations in the face of data complexity and nonlinear characteristics, making it difficult to meet the demands of modern battery management systems for high accuracy and real-time performance.
In recent years, owing to the rapid development of big data and high-throughput computing technologies, data-driven methods have been widely applied to battery SOH estimation and have demonstrated great application potential. By utilizing large-scale historical data, machine learning methods can build efficient and flexible prediction models. These methods not only reduce human intervention but also show remarkable performance in dealing with complex patterns and nonlinear relationships. For example, B. Gou et al. [14] proposed a hybrid ensemble method combining nonlinear autoregression (NAR) with bootstrap-resampling-based uncertainty management, which not only improves the accuracy of SOH estimation but is also suitable for online application. L. Cai et al. [15] used the non-dominated sorting genetic algorithm II (NSGA-II) to optimize support vector regression (SVR) and current pulse test features to establish an efficient SOH estimator. J. Qiao et al. [16] proposed the chaotic firefly–particle filtering (CF-PF) method, which combines particle filtering (PF) and extended Kalman filtering (EKF) to realize joint estimation and bidirectional correction of SOC and SOH, and dynamically adjusts parameters through a novel battery migration model to significantly improve estimation accuracy. T. Xu et al. [17] proposed a hybrid model that combines electrochemical impedance spectra with incremental capacity (IC) curve features, captures local temporal and global decay trends with edited nearest neighbor (ENN) and support vector regression (SVR), respectively, and fuses the results through an extreme learning machine (ELM) to achieve fast and accurate battery capacity estimation. K. McCarthy et al. [18] investigated the SOH correlations of equivalent circuit elements and impedance spectra of lithium-ion batteries and used curve fitting and Pearson correlation matrix analysis to identify impedance variables that are highly correlated with the battery state, providing a key basis for impedance-based state modeling in real-time battery management systems. Although these methods exhibit high accuracy in battery SOH estimation, certain limitations remain. Some of the methods do not adapt well to diverse battery materials, complex dynamic working conditions, or cross-scenario applications. In addition, the high computational complexity of some algorithms makes it difficult to meet real-time requirements, especially in practical engineering scenarios such as battery management systems that demand fast response and resource efficiency, which challenges their generalizability and practical deployability.
Deep learning methods can handle complex pattern-recognition tasks with strong nonlinear modeling capabilities by automatically learning features from large-scale data. Transfer learning (TL), in turn, improves model performance when data are insufficient by transferring knowledge learned from one task to a related task, thereby reducing training time and the dependence on large-scale datasets. J. Zhao et al. [19,20] used the Transformer model to enhance the recognition of both short-term and long-term patterns in time series data and achieved real-time SOH estimation via TL with reduced computational cost and enhanced model adaptability. Y. Ma et al. [21] combined TL with a hybrid deep belief network (DBN)-LSTM model to improve the reliability of SOH estimation through the fusion of multiple health metrics. G. Ma et al. [22] proposed a TL method for battery SOH estimation that combines a convolutional neural network (CNN) with an improved maximum mean discrepancy domain adaptation method, effectively addressing SOH estimation under different usage conditions; the experimental results validate the superiority and generalization of the method. Y. Yang et al. [23] proposed a joint state of charge (SOC) and SOH estimation model combining deep learning and TL; by accounting for distributional differences in the training dataset, spatial characteristics, and cross-time-scale characteristics, it achieved accurate and robust estimation of SOC and SOH over the whole battery life cycle under dynamic operating conditions. H. Zhao et al. [24] further improved the consistency and performance of SOH estimation by integrating multiple attention mechanisms with deep learning methods. X. Shu et al. [25] combined the long short-term memory (LSTM) network with TL to construct a mean model and a difference model, achieving highly accurate SOH prediction with the error kept within 3% despite a significantly reduced amount of training data, effectively lowering the computational cost. S. Shen et al. [26] proposed the deep convolutional neural network-ensemble transfer learning (DCNN-ETL) approach, which combines TL with ensemble learning and significantly improves the accuracy and robustness of lithium-ion battery capacity estimation under limited training data.
Hybrid models can improve prediction accuracy and enhance robustness by combining the advantages of multiple models or architectures. J. Zhao et al. [27] proposed a hybrid fusion model combining a CNN with a self-attention mechanism, which used TL to transfer aging knowledge from LFP batteries to ternary batteries (NMC and NCA) across multiple battery materials and operating conditions, achieving low error and high diagnostic accuracy and verifying its effectiveness in battery health assessment. L. Ren et al. [28] used an improved CNN combined with an LSTM network, enhanced the data dimensionality through an autoencoder, smoothed the prediction results with a filter, and verified the method's superiority on a real dataset. Z. Lv et al. [29] proposed a hybrid fusion network model based on resource-efficient artificial intelligence that reduces computational resource consumption while still maintaining highly accurate SOH estimation. Y. Dai et al. [30] further improved the LSTM network by combining CNN-LSTM with the gated recurrent unit (GRU) method, significantly improving SOH estimation accuracy. C. Qian et al. [31] proposed a CNN-SAM-LSTM model that improves estimation accuracy by integrating the coupling information of SOH while solving the imbalance problem of multi-state-parameter learning by optimizing the loss function. C. Jia et al. [32] proposed a hybrid prediction model combining a bidirectional gated recurrent unit (Bi-GRU) with a Transformer, in which indirect health indicators (HIs) characterizing lithium-ion battery degradation are fed into the Bi-GRU to learn hidden states and extract time series features.
Although these methods show high accuracy and flexibility in battery SOH estimation, and deep learning models in particular demonstrate significant advantages in handling complex nonlinear relationships and time series features, several pressing issues remain. These methods place high demands on the quality and diversity of the dataset, especially in TL, where differences in data distribution between the source and target tasks directly affect the transfer performance. Hybrid models, on the other hand, significantly improve prediction accuracy by combining the advantages of multiple models or algorithms, but their structures are usually more complex and their computational overhead higher, making it difficult to meet real-time requirements. In addition, most methods model a single task or specific conditions, so model generalization still faces challenges in cross-material and multi-task scenarios. Therefore, in this study, we propose a deep learning-based framework for battery SOH estimation that addresses the challenges of existing methods in data quality, computational efficiency, and cross-scenario generalization by combining inception depthwise convolution (IDC) [33], a channel reduction attention (CRA) [34] mechanism, and TL techniques (Figure 1). Specifically, our approach innovates in the following three aspects. First, to address the high computational complexity of traditional convolution operations, we introduce IDC, a module that significantly reduces computational overhead through a multi-branch design and depthwise separable convolution while extracting time series features related to battery health from multiple scales and directions. Second, to efficiently capture global features in the battery SOH estimation task, this study employs an improved CRA mechanism, which significantly reduces the computational complexity and memory footprint of self-attention by compressing the channel dimensions prior to the attention computation while preserving the key global information. Finally, to enhance the model's generalization ability in new scenarios, a staged training strategy is devised. During the pre-training stage, the model learns general battery aging features from large-scale source domain data. In the fine-tuning stage, the model preserves its general feature extraction ability by freezing the weight parameters of the IDC module while adjusting the weights of the CRA module to adapt to the feature distribution of the new data. This strategy not only shortens the training time in new scenarios but also substantially improves the model's effectiveness in cross-task settings.
3. Methodology
To achieve efficient battery state-of-health estimation, the hybrid deep learning model proposed in this paper is built on IDC and CRA and optimized with a staged training strategy. As shown in Figure 3, the model architecture consists of three core modules. The IDC module is responsible for multi-scale spatiotemporal feature extraction, while the CRA module effectively captures global information. Additionally, TL enhances the model's cross-domain generalization capability through a two-stage process of pre-training and fine-tuning. The hyperparameter configuration of the model is given in Table 2: a higher initial learning rate is used in the pre-training phase to accelerate convergence, and a lower learning rate is used in the fine-tuning phase, combined with an early stopping mechanism to prevent overfitting. This synergistic design enables the model to efficiently process large-scale time series data while adapting to the feature distributions of different battery materials. By combining the optimized parameters in Table 2 with the modular structure shown in Figure 3, the model balances computational efficiency and generalization capability while maintaining accuracy.
3.1. Inception Depthwise Convolution
In deep learning models for battery SOH estimation, the efficiency and accuracy of feature extraction are key factors in ensuring model performance. CNNs are widely used in various tasks due to their excellent performance in image processing. The traditional convolution operation extracts local features by performing a sliding-window convolution on the input feature map. However, this operation suffers from high computational complexity and a large number of parameters, especially when dealing with high-dimensional features. To improve the computational efficiency of the model while enhancing its feature extraction capability, this study employs the inception depthwise convolution approach. This module combines the multi-branch design of the inception architecture with the depthwise separable convolution (DSC) technique, aiming to efficiently extract the time series features related to the battery health state while maintaining high performance with limited computational resources.

The IDC module takes a four-dimensional tensor as its input, defined as $X \in \mathbb{R}^{B \times C \times H \times W}$, where $B$ represents the batch size, $C$ indicates the number of input channels, and $H$ and $W$, respectively, signify the height and width of the feature map. The input tensor is partitioned into four segments along the channel dimension:

$$X_{id},\; X_{hw},\; X_{w},\; X_{h} = \operatorname{Split}(X)$$

Here, the split operation determines the number of channels per branch by setting the ratio $r_g$, defined as:

$$g = r_g \cdot C$$

where $g$ represents the number of channels within a branch, $r_g$ is the scaling factor for the number of channels in the branch, and $C$ stands for the number of channels of the original input. Each segmented input then undergoes processing via a different convolutional branch: a square depthwise kernel of size $k \times k$ together with horizontal and vertical band-shaped depthwise kernels is applied using depthwise separable convolution. For each branch, the depthwise convolution convolves the input channels independently, and information between channels is then fused by pointwise convolution, yielding a new feature map $X'_i$. The computation of these branches can be expressed as:

$$X'_{hw} = \operatorname{DWConv}_{k \times k}(X_{hw}), \qquad X'_{w} = \operatorname{DWConv}_{1 \times k'}(X_{w}), \qquad X'_{h} = \operatorname{DWConv}_{k' \times 1}(X_{h})$$

where $k$ is the size of the square convolution kernel, usually defaulted to $k = 3$, and $k'$ is the length of the band-shaped kernels. After all branches are convolved, the final output tensor is obtained by concatenating the outputs of the convolution branches along the channel dimension:

$$X' = \operatorname{Concat}(X_{id},\; X'_{hw},\; X'_{w},\; X'_{h})$$

This concatenation fuses features from the different convolutional branches, which enhances the expressive power of the model and helps it capture important features of battery capacity degradation from multiple scales and directions. In addition, DSC significantly reduces the computational effort by decomposing the traditional convolution operation into two steps: depthwise convolution and pointwise convolution. Compared with standard convolution, this markedly reduces the computational complexity of each channel. For a convolution kernel of size $k \times k$, the computational cost of the depthwise convolution is:

$$\Omega_{dw} = H \times W \times C \times k^{2}$$

while the computational cost of the pointwise convolution is:

$$\Omega_{pw} = H \times W \times C_{in} \times C_{out}$$

This computationally economical design is particularly important in the battery SOH estimation task, where large amounts of time series data must be processed efficiently. Through multiple parallel convolutional branches and depthwise separable convolution, the IDC module extracts rich features from the battery data while maintaining low computational overhead, and the concatenation of branch outputs enhances the model's expressive power in battery health state estimation.
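To make the branch structure concrete, the following is a minimal PyTorch sketch of an IDC block in the InceptionNeXt style of [33]; the class name, the default branch ratio, and the 11-wide band kernels are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn

class InceptionDWConv2d(nn.Module):
    """Inception-style depthwise conv: split channels into four groups,
    apply a cheap depthwise convolution per group, then re-concatenate."""
    def __init__(self, channels, square_kernel=3, band_kernel=11, branch_ratio=0.125):
        super().__init__()
        g = max(1, int(channels * branch_ratio))   # channels per conv branch
        self.split_sizes = (channels - 3 * g, g, g, g)
        self.dwconv_hw = nn.Conv2d(g, g, square_kernel,
                                   padding=square_kernel // 2, groups=g)
        self.dwconv_w = nn.Conv2d(g, g, (1, band_kernel),
                                  padding=(0, band_kernel // 2), groups=g)
        self.dwconv_h = nn.Conv2d(g, g, (band_kernel, 1),
                                  padding=(band_kernel // 2, 0), groups=g)

    def forward(self, x):                          # x: (B, C, H, W)
        x_id, x_hw, x_w, x_h = torch.split(x, self.split_sizes, dim=1)
        return torch.cat((x_id,                    # identity branch
                          self.dwconv_hw(x_hw),    # square depthwise conv
                          self.dwconv_w(x_w),      # horizontal band conv
                          self.dwconv_h(x_h)), dim=1)
```

A pointwise 1 × 1 convolution placed after this block (as in the adjust_channels layer of Section 3.3) then completes the depthwise separable pattern by mixing information across channels.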
3.2. Channel Reduction Attention
In the battery SOH estimation task, accurately extracting the key features of battery performance is crucial for predicting SOH, and the computational complexity and efficiency of the model are key factors affecting performance and training speed. In order to optimize the computational efficiency of the battery SOH prediction model and improve its global feature capture capability, this study employs CRA. The CRA module improves on the traditional self-attention mechanism by performing channel compression through average pooling before the attention operation on the query (Q) and key (K) matrices. With this channel reduction operation, the computational overhead is significantly reduced while global feature information is still effectively captured. Specifically, the input feature map $X \in \mathbb{R}^{B \times N \times C}$ is processed into multiple heads, and the $Q_i$, $K_i$, and $V_i$ of each head are computed by projection matrices, respectively:

$$Q_i = X W_i^{Q}, \qquad K_i = X W_i^{K}, \qquad V_i = X W_i^{V}$$

where $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the projection parameters used for the $Q$, $K$, and $V$ matrices, and $W^{O}$ is the output projection matrix. Then, the attention weight of each head is calculated, defined as:

$$\operatorname{head}_i = \operatorname{Softmax}\!\left(\frac{\operatorname{AvgPool}(Q_i)\,\operatorname{AvgPool}(K_i)^{\top}}{\sqrt{d_r}}\right) V_i$$

where $\operatorname{AvgPool}(\cdot)$ compresses the channel dimension of $Q_i$ and $K_i$ to a reduced size $d_r$; the head outputs are finally concatenated and projected by $W^{O}$. The CRA module significantly reduces the computational complexity of the self-attention mechanism by reducing the channel dimensions of the query and key matrices. Compared with traditional self-attention, CRA makes each matrix multiplication more efficient by compressing the number of channels, which greatly reduces memory occupation and computation. With this design, CRA is able to efficiently capture global information while avoiding unnecessary computational burden, which is especially important for the battery SOH estimation task, where large-scale datasets must be processed.
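A minimal sketch of how such a channel-reduced attention head can be written in PyTorch is shown below; the number of heads, the reduced channel size d_r, and the use of adaptive average pooling are assumptions based on the description above rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelReductionAttention(nn.Module):
    """Multi-head self-attention with Q/K channels compressed by average pooling."""
    def __init__(self, dim, num_heads=4, reduced_dim=8):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.reduced = reduced_dim                 # compressed Q/K channel size d_r
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)        # the W^O projection

    def forward(self, x):                          # x: (B, N, C) token sequence
        B, N, C = x.shape
        h, d, r = self.num_heads, self.head_dim, self.reduced
        q = self.q_proj(x).view(B, N, h, d).transpose(1, 2)   # (B, h, N, d)
        k = self.k_proj(x).view(B, N, h, d).transpose(1, 2)
        v = self.v_proj(x).view(B, N, h, d).transpose(1, 2)
        # Compress the per-head channel dimension of Q and K by average pooling
        q = F.adaptive_avg_pool1d(q.reshape(B * h, N, d), r).view(B, h, N, r)
        k = F.adaptive_avg_pool1d(k.reshape(B * h, N, d), r).view(B, h, N, r)
        attn = (q @ k.transpose(-2, -1)) / r ** 0.5           # (B, h, N, N) scores
        out = attn.softmax(dim=-1) @ v                        # (B, h, N, d)
        return self.out_proj(out.transpose(1, 2).reshape(B, N, C))
```

Because the Q-K product now operates on d_r channels instead of the full head dimension, the cost of each score computation drops proportionally while V retains the full feature content.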
3.3. Hyperparameter Interpretation
To aid understanding of our model, we explain some of its key hyperparameters. In this study, we carefully configure key parameters to optimize the model’s performance in predicting battery SOH. To ensure stable gradient updates while maintaining computational efficiency, we set the batch size to 256, which effectively reduces variance in gradient updates, leading to smoother convergence and improved generalization. Given the complexity of battery aging modeling, this setting provides a balance between computational resource utilization and prediction accuracy. The model undergoes an extensive pre-training phase of 10,000 epochs, allowing it to deeply extract key electrochemical aging features from the NCM battery dataset. To prevent overfitting, an early stopping mechanism with a patience value of 1200 is implemented, terminating training when no improvement in validation loss is observed over 1200 consecutive epochs. This ensures that the model fully learns the degradation patterns from the source dataset without unnecessary overfitting, which could otherwise hinder its generalization ability. Additionally, we enhance information integration through feature concatenation, where both the original feature representations and their differences are combined after extraction through the IDC module. This strategy enriches the input space, enabling the model to learn both absolute values and relative variations of battery characteristics. By capturing dynamic changes within charge-discharge cycles, this approach improves the model’s ability to recognize battery aging trends and enhances predictive accuracy.
To further optimize feature extraction, the input data consists of four fundamental channels—voltage, current, charge capacity, and temperature—each crucial to understanding lithium-ion battery degradation. These features are initially processed through the IDC module and then transformed via an adjust_channels layer, expanding the feature dimension from 4 to 64 channels. This transformation enhances feature aggregation within the CRA module, strengthening the model’s ability to capture long-term temporal dependencies and optimize SOH prediction accuracy. Within the IDC module, we employ depthwise convolution to perform multi-scale spatiotemporal feature extraction. Specifically, square depthwise convolution (3 × 3) enhances local spatial feature modeling, horizontal depthwise convolution (1 × 11) captures long-range dependencies along the temporal dimension, and vertical depthwise convolution (11 × 1) facilitates interactions between different input features. This design significantly reduces computational complexity while preserving detailed spatial representations. Unlike standard convolution, which applies shared filters across all input channels, depthwise convolution processes each channel independently, improving efficiency and fine-grained feature extraction. Following this, a 1 × 1 pointwise convolution is applied in the adjust_channels layer, expanding the feature dimension to 64 channels and facilitating multi-scale feature fusion. This step strengthens the model’s global attention mechanism within the CRA module, ensuring a refined representation of extracted features and ultimately improving the accuracy of SOH prediction.
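Putting the pieces together, the feature path described above can be sketched as follows, reusing the InceptionDWConv2d class from the sketch in Section 3.1; with four input channels, a branch ratio of 0.25 gives each branch exactly one channel, and while the layer count and the adjust_channels name follow the text, the rest of the wiring is our assumption.

```python
import torch.nn as nn

class FeatureBackbone(nn.Module):
    """Three IDC + MaxPool stages over the 4-channel input (voltage, current,
    charge capacity, temperature), then a 1x1 pointwise conv expanding the
    features from 4 to 64 channels for the CRA module."""
    def __init__(self):
        super().__init__()
        self.idc_layers = nn.ModuleList(
            [InceptionDWConv2d(4, branch_ratio=0.25) for _ in range(3)])
        self.pools = nn.ModuleList([nn.MaxPool2d((2, 1)) for _ in range(3)])
        self.adjust_channels = nn.Conv2d(4, 64, kernel_size=1)

    def forward(self, x):                    # x: (B, 4, T, W) raw cycle signals
        for idc, pool in zip(self.idc_layers, self.pools):
            x = pool(idc(x))                 # multi-scale features + downsampling
        return self.adjust_channels(x)       # pointwise fusion to 64 channels
```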
3.4. Training Strategies and Transfer Learning
The model training in this study uses a standard supervised learning approach, aiming to improve the accuracy and generalization of battery SOH estimation through efficient training strategies and optimization methods. Multiple strategies are combined to ensure stability and efficiency during training. To measure the difference between the model predictions and the true values, the mean squared error is used as the loss function. The loss is optimized with the Adam optimizer [37], which combines the advantages of momentum and adaptive learning rates to efficiently handle high-dimensional data and accelerate convergence during training. To enhance stability, we also introduce weight decay and gradient clipping to prevent overfitting and keep gradients well-behaved throughout training. Additionally, early stopping is used to avoid overfitting, halting training when the validation performance ceases to improve over several epochs and retaining the best model weights. The learning rate is adjusted across training stages: a higher learning rate is used in the pre-training phase to accelerate initial convergence, while in the fine-tuning phase the learning rate is lowered to tune the model parameters more finely, ensuring stability and adaptability on new data.
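The training loop implied by these choices can be sketched as follows; the learning rate, epoch budget, and patience values are taken from Section 4.2, while the weight decay coefficient, the clipping norm, and the helper names (model, train_loader, val_loader, evaluate) are placeholders introduced for illustration.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=8e-4,
                             weight_decay=1e-4)        # weight decay value assumed
criterion = torch.nn.MSELoss()

best_val, patience, wait = float("inf"), 1200, 0
for epoch in range(10_000):                            # pre-training budget
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(),
                                       max_norm=1.0)   # gradient clipping (norm assumed)
        optimizer.step()
    val_loss = evaluate(model, val_loader)             # assumed helper: mean val MSE
    if val_loss < best_val:                            # early stopping bookkeeping
        best_val, wait = val_loss, 0
        torch.save(model.state_dict(), "best.pt")      # retain best weights
    else:
        wait += 1
        if wait >= patience:
            break
```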
To improve the generalization ability of the model in battery SOH estimation, this study employs TL. TL enables the model to utilize prior knowledge gained from a large amount of source data so that it can adapt to new scenarios without starting from scratch. Specifically, in the fine-tuning phase, the model fine-tunes the weights of selected layers using a small amount of new data, which may include features and scenarios never encountered before. In the fine-tuning strategy adopted here, the IDC module is effective at capturing generic battery aging features, so its weight parameters are frozen to preserve the generic electrochemical pattern recognition capabilities obtained from the source domain data. The CRA module, which plays a key role in capturing global information and trends, has its weights updated to adapt to the features of the new data. This fine-tuning strategy improves the model's predictive accuracy and robustness on new features while retaining its general capabilities. In this way, by briefly retraining the pre-trained model on new data, the proposed model can be effectively adapted to battery SOH estimation in new scenarios.
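As a concrete illustration, the staged strategy reduces to a few lines in PyTorch; the attribute name model.idc and the exact module split are assumptions based on the description above, while the two learning rates follow the values stated in Section 4.2.

```python
import torch

# Stage 2: fine-tuning on target-domain (NMC) data.
# Freeze the IDC feature extractor to preserve the generic aging features.
for p in model.idc.parameters():        # `model.idc`: pre-trained IDC module (assumed name)
    p.requires_grad = False

# Re-build the optimizer over the remaining trainable weights
# (CRA, adjust_channels, dense heads) with the lower fine-tuning LR.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=6e-4)   # vs. 8e-4 during pre-training
```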
4. Results and Discussion
4.1. Model Performance on NCM Cells
In data analysis and machine learning, assessing a model's performance is of utmost importance, as such an evaluation serves a dual purpose. On the one hand, it checks the model's accuracy, ensuring that it faithfully represents the patterns and relationships present in the data; on the other hand, it assesses the model's generalization capacity, guaranteeing consistent results on new, unseen data. Performance evaluation also helps detect overfitting or underfitting and guides the adjustment and optimization of the model so that the best model and parameters can be selected. By quantifying model performance, the evaluation results provide a reliable basis for decisions and ensure that the selected model meets actual needs and goals, while continuous performance monitoring can identify and respond to potential problems in a timely manner, ensuring long-term effectiveness in practical applications. Therefore, to better evaluate the accuracy and generalization ability of the proposed model, we introduce three evaluation metrics: root mean square error (RMSE), coefficient of determination (R²), and mean absolute percentage error (MAPE). RMSE measures the absolute size of the prediction error, with smaller values indicating higher accuracy; R² reflects the proportion of data variation explained by the model, with values closer to 1 indicating stronger explanatory power; MAPE expresses the prediction error as a relative percentage, with smaller values indicating higher prediction accuracy. The formulas for these metrics are:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

where $n$ is the sample size, $y_i$ is the $i$-th actual value, $\hat{y}_i$ is the $i$-th predicted value, and $\bar{y}$ is the mean of the actual values. Based on these evaluation metrics, a comprehensive regression evaluation of the proposed IDC-CRA hybrid model is conducted to verify its accuracy and robustness in estimating the SOH of NCM batteries. Across a series of experimental evaluations and feature analyses, the results demonstrate that the IDC-CRA model exhibits significant advantages on different datasets, confirming its broad applicability and superior performance across battery materials.
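For reproducibility, the three metrics can be computed directly from paired arrays of actual and predicted capacities; the following NumPy sketch uses our own function name and assumes no actual value is zero (so the MAPE denominator is safe).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (RMSE, R^2, MAPE in %) for 1-D arrays of actuals and forecasts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mape = 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
    return rmse, r2, mape
```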
Figure 4 further validates the strong performance of the IDC-CRA model. In the capacity estimation scatter plots (a and c), the predicted values after TL fit the ideal diagonal closely, showing a significant improvement in prediction accuracy: the model estimates battery capacity with only a very small deviation between predictions and actual values. The error distribution plots (b and d) reveal that the error range of the model is significantly narrowed across all samples, with the median close to zero, indicating that the IDC-CRA model maintains a high degree of stability and robustness during prediction. This error profile shows that the model delivers consistent predictions under different operating and data conditions, reducing extreme errors and enhancing reliability.
As shown in Table 3 and Figure 5, the IDC-CRA model shows significant improvement in capacity estimation and generalization performance compared to before TL is applied. Specifically, the RMSE of the model is reduced from 1.13% to 0.522%, the R² is improved from 0.882 to 0.98, and the MAPE is reduced from 1.06% to 0.431%. These changes indicate that the fitting ability of the IDC-CRA model is significantly enhanced, capturing the changing characteristics of battery capacity more accurately. At the same time, the model's adaptability to complex data is markedly improved, maintaining high prediction performance across different battery materials and variable operating conditions. These results show that the proposed IDC-CRA hybrid model captures the characteristics of battery capacity change more accurately and exhibits excellent stability and robustness, and its strong performance across battery materials validates its great potential in battery SOH estimation. Through the introduction of TL, the hybrid model effectively transfers prior knowledge from the source domain NCM battery data to the target domain NMC battery data, successfully overcoming the data distribution differences caused by different battery materials and thus significantly improving generalization and prediction accuracy. In addition, the running efficiency and resource utilization of the model are optimized: through TL and careful design, the IDC-CRA model maintains high computational efficiency and resource utilization under complex working conditions and diverse data. This not only improves the response speed of the model in real applications but also reduces the demand for computational resources, making it more suitable for deployment in resource-constrained environments. In conclusion, this study validates, via regression evaluation, the high accuracy and robustness of the IDC-CRA hybrid model in estimating the SOH of NCM batteries. The experimental findings indicate that the model's outstanding performance across datasets not only notably enhances the accuracy and stability of capacity estimation but also strengthens generalization and computational efficiency. These merits give the IDC-CRA model extensive application potential in real-world battery management systems: it can effectively prolong battery lifespan, boost battery performance, and cut maintenance costs. With this innovative model, this study significantly improves battery capacity estimation under complex working conditions and diverse data, laying a solid foundation for application to more battery chemistries and practical operation scenarios and promoting the intelligent and efficient development of battery management systems.
4.2. Model Performance on NMC Cells
In this study, a two-stage TL strategy, consisting of a pre-training phase and a fine-tuning phase, is adopted to enhance the model’s adaptability to different battery chemistries and operating conditions while improving the accuracy and robustness of SOH estimation. During the pre-training phase, the model is trained on a large-scale source dataset (NCM battery data) to learn the general electrochemical characteristics of battery aging. The proposed model utilizes IDC for multi-scale spatiotemporal feature extraction and incorporates CRA to enhance global feature representation. The input data consists of four channels (voltage, current, charge capacity, and temperature), which are processed through three layers of IDC to extract critical features. Each IDC layer is followed by a MaxPooling layer (MaxPool2d (2,1)), which reduces computational complexity while preserving essential temporal dependencies. After pooling, the feature maps retain four channels, which are then adjusted to 64 channels via a 1×1 convolution layer (adjust_channels) to ensure compatibility with subsequent attention mechanisms and improve feature fusion.
During training, the Adam optimizer is employed, with an initial learning rate of 8 × 10⁻⁴. A relatively high learning rate facilitates rapid feature acquisition in the early training phase, while an early stopping strategy (patience = 1200) prevents overfitting. The batch size is set to 256, which helps stabilize gradient updates, reduce variance, and improve training efficiency. The model undergoes 10,000 training epochs to fully capture battery degradation patterns. SOH prediction is formulated as a 10-dimensional output (dense_soh layer), where each dimension represents the estimated SOH at a specific charge/discharge cycle. The L2 loss function is applied to ensure stable gradient updates, with a weighting factor α = [0.1] × 10, which balances contributions across different time steps, preventing any single time step from dominating the learning process.
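A minimal sketch of this weighted multi-step L2 loss is given below; the uniform weights α = [0.1] × 10 follow the text, while the function name and tensor shapes are illustrative assumptions.

```python
import torch

alpha = torch.tensor([0.1] * 10)                   # one weight per predicted cycle

def weighted_l2_loss(pred, target):
    """pred, target: (batch, 10) SOH estimates for 10 charge/discharge cycles."""
    per_step = ((pred - target) ** 2).mean(dim=0)  # per-time-step mean squared error
    return (alpha.to(pred.device) * per_step).sum()
```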
To accommodate potential discrepancies in the target dataset (NMC battery data) arising from different battery chemistries or operating conditions, a selective parameter update strategy is adopted in the fine-tuning phase. Specifically, the IDC module is frozen (requires_grad = False) to retain the fundamental battery degradation features learned from the source dataset, ensuring that core characteristics remain unchanged. Meanwhile, the adjust_channels layer, CRA module, and dense layers are fine-tuned to adapt to the distributional differences of the target dataset, thereby enhancing the model’s generalization to new battery chemistries. To prevent instability on small-scale datasets, the learning rate is adjusted to 6 × 10⁻⁴, slightly lower than the 8 × 10⁻⁴ used in pre-training, ensuring smooth convergence. The model is trained for 351 epochs, with early stopping (patience = 1000) to halt training once no further performance improvement is observed, thus preventing overfitting. Furthermore, batch normalization is implemented to normalize data distributions and improve generalization. Additionally, an adaptive loss weighting mechanism dynamically adjusts the contribution of different loss components based on dataset characteristics. To further mitigate overfitting, a dropout layer (0.2) is introduced after dense_2, enhancing model robustness during testing.
To validate the generalization ability of our proposed hybrid model across different battery materials, this study adopts a transfer learning strategy, which effectively transfers prior knowledge from the source domain to the target domain. In this section, we specifically apply the feature patterns and rules gleaned from the NCM battery data in the source domain to the NMC battery data in the target domain. By doing so, the model not only makes full use of the abundant information within the NCM battery data but also addresses the data distribution disparities in the target domain (NMC battery data) caused by the different battery materials. In implementing the TL process, we first trained the initial hybrid model on the source domain NCM battery data to ensure that the model accurately captures and learns the key features and patterns in that data. Then, the pre-trained model was applied to the target domain NMC battery data through fine-tuning. During the fine-tuning process, we retained the feature extraction capabilities learned in the source domain while appropriately tuning and optimizing the model for the data distribution in the target domain. This process ensures that the model can quickly adapt to and effectively handle different battery material properties in the new domain.
To evaluate the effect of TL, we experimentally compare the prediction performance and generalization ability of models with and without TL. The experimental results show that the hybrid model trained with TL significantly outperforms the model without TL on NMC battery data. Specifically, Figure 6 compares the estimated and actual capacities of the NMC battery samples along with the error distributions. In the fitting plots (a and c), the fitted points of estimated versus actual capacity for all samples are densely distributed and, overall, close to the ideal diagonal, especially in the capacity range of 2.6 Ah to 2.9 Ah. As can be seen from the error distribution plots (b and d), the error ranges of all samples are significantly narrowed and the median is close to zero, indicating a significant increase in the robustness of the model. Notably, samples #03, #06, #10, and #12 have the smallest errors, which validates the model's ability to adapt to complex samples. As shown in Table 3, comparing model performance before and after TL makes clear that the introduction of TL significantly improves the model's predictive ability. Specifically, TL reduces the root mean square error from 3.63% to 0.283%, indicating a substantial reduction in prediction error; R² improves from −0.149 to 0.992, indicating that the model's fit between target variables and predicted values approaches the ideal; and the MAPE falls from 3.19% to 0.22%, indicating that prediction errors across different samples are significantly reduced. These results show that TL not only effectively resolves the prediction bias caused by differences in data distribution across batteries but also greatly enhances the generalization ability and prediction accuracy of the model in complex data environments.
Figure 7 visualizes the difference in capacity estimation performance of NMC battery samples with and without TL in the form of a histogram. Before the introduction of TL, the model exhibits a large bias in capacity estimation, reflecting the significant impact of the difference in data distribution of different batteries on the prediction results. The reason for the poor performance of the model’s results on the NMC dataset in the absence of TL is analyzed. Firstly, due to the significant differences in the electrochemical characteristics of NCM and NMC batteries, their degradation patterns, charge/discharge behavior, and SOH evolution trends are not entirely identical. Consequently, when the model is trained solely on the NMC dataset without TL, its learning is constrained by the limited dataset size, making it difficult to fully capture the degradation features of the target domain. This results in higher prediction errors and reduced generalization capability. Secondly, the NMC dataset contains fewer samples compared to the NCM dataset, which further limits the model’s ability to comprehensively learn the SOH degradation patterns. With insufficient training data, the model is prone to overfitting, exhibiting strong performance on the training set but suffering from high prediction errors on the test set, indicating weakened generalization. Furthermore, the pre-trained model has already learned essential battery degradation patterns from the NCM dataset, utilizing IDC for multi-scale feature extraction and CRA for global feature integration, thereby enhancing the robustness of SOH representation. However, when TL is not applied, the model must learn SOH features from scratch on the NMC dataset. Given the limited size of the target domain dataset, the model’s feature extraction capability is constrained, leading to reduced prediction accuracy, increased errors, and overall performance degradation.
Taken together, these factors contribute to the model's poor performance on the NMC dataset without TL, as evidenced by a significantly higher RMSE, a lower R² value, and larger prediction errors. This further reinforces the importance of TL in SOH estimation across different battery chemistries. To ensure the accuracy of our experimental results, we conducted a comprehensive review of the model's predictions, including data integrity checks and error analysis. The results in Table 3 align with our expectations and further confirm the effectiveness of TL, demonstrating that TL substantially enhances the model's predictive performance in the target domain (NMC dataset) by compensating for the generalization limitations caused by the smaller dataset size. With the TL strategy, the model effectively captures the distribution characteristics among different battery samples, resulting in a significant reduction of the prediction error. In addition, the TL process makes full use of the rich information in the source dataset and can still accurately extract key features and optimize model performance when target data samples are scarce. Especially in the face of significant distributional differences between samples, the fine-tuning process enables more detailed adjustments to the target domain data, enhancing the robustness of the model. This approach not only significantly reduces the dependence on large-scale labeled data in the target domain but also improves the model's ability to predict battery SOH under diverse operating conditions. TL thus improves the accuracy and stability of the model and optimizes its ability to capture inter-sample differences in the high-dimensional feature space, especially on complex samples.
4.3. Comparison with Other Baseline Models
In evaluating the effectiveness of the proposed fusion modeling technique for battery health assessment, a comprehensive comparison with baseline models demonstrates significant improvements in accuracy, generalization ability, and computational efficiency, thereby validating its practical application value. In this section, comparative experiments with three baseline models—CNN, LSTM, and Transformer—are conducted to thoroughly assess the performance of the fusion model in battery capacity estimation. Specifically, these baseline models represent different types of deep learning architectures. CNN excels at extracting local features of voltage, current, and temperature, enabling accurate battery state determination, but it has limitations in handling long-term dependencies. LSTM is capable of capturing long-term dependencies in time series, improving prediction accuracy and stability; however, it is inefficient in processing large-scale, high-dimensional data and struggles with global feature modeling. In contrast, the Transformer model captures complex dependencies and global features through its global attention mechanism, exhibiting strong modeling capabilities yet requiring extensive data and computational resources, which may pose challenges in practical applications.
Table 4 provides a comprehensive evaluation of model performance using three key metrics: RMSE, R², and MAPE. The results indicate that the proposed model outperforms traditional baseline models across all metrics. For instance, the RMSE of the proposed model is only 0.522%, significantly lower than the corresponding values of CNN (6.36%), LSTM (0.57%), and Transformer (1.67%), demonstrating its superior accuracy in reducing prediction errors. The R² value reaches 0.98, which is close to 1, indicating that the model can explain 98% of the variation in the target variable, whereas the R² values of the baseline models are 0.885, 0.835, and 0.801, respectively. This highlights the proposed model’s excellent goodness-of-fit. Additionally, the MAPE of the proposed model is only 0.431%, further validating its ability to maintain low relative error during the prediction process. In contrast, the MAPE values for the baseline models are 5.9%, 1.2%, and 1.51%, demonstrating the superior reliability and consistency of the proposed model’s predictions.
Furthermore, to comprehensively assess the model’s operational efficiency and resource requirements, we evaluate the inference time and floating-point operations (FLOPs) during the testing phase. The results show that the inference time of the proposed model is only 0.018 s, with 4.67 million FLOPs, demonstrating exceptional computational efficiency. Notably, compared to the computationally demanding Transformer model, the proposed model not only maintains, or even enhances, predictive performance but also significantly reduces computational resource consumption and greatly improves operational efficiency. This advantage enables the proposed model to process large-scale data more rapidly in real-world applications, making it highly suitable for battery health assessment tasks that require high real-time performance.
The outstanding performance of the proposed model in capacity estimation and error distribution can be observed more intuitively through the visual analysis presented in Figure 8a: the fitted points of the proposed model closely align with the ideal diagonal, indicating remarkable accuracy in predicting battery capacity and an extremely high level of agreement between predicted and actual values. The error distribution plot in Figure 8b further reveals that the error range of the proposed model is smaller, the median is closer to zero, and the number of outliers is significantly reduced. This means the model not only maintains a low prediction error overall but also predicts battery capacity accurately in most cases while reducing extreme errors, highlighting its stability and reliability in practical applications. These experimental results fully demonstrate the significant advantages of the proposed model in accuracy, generalization ability, and computational efficiency. Specifically, through the introduction of TL, the proposed model effectively transfers prior knowledge from the source domain NCM battery data to the target domain NMC battery data, successfully overcoming the challenges posed by differences in data distribution and significantly improving the model's adaptability and generalization across battery materials. In addition, the optimized design enables the model to achieve comprehensive performance improvements while maintaining high operational efficiency and resource utilization in complex battery capacity estimation tasks. These advantages give the proposed model a wide range of potential applications in real battery management systems, where it can effectively extend battery life, improve battery performance, and reduce maintenance costs. In summary, the comparison experiments with the CNN, LSTM, and Transformer baselines show that the proposed fusion model performs well in the battery capacity estimation task, verifying its superiority and wide applicability across battery materials. The application of TL not only improves the prediction performance and generalization ability of the model but also optimizes computational efficiency, fully demonstrating its practical value in battery health state assessment. These results provide a solid theoretical foundation and practical support for extending advanced modeling techniques to more battery materials and application scenarios, promoting the intelligent and efficient development of battery management systems.
To further validate the comprehensiveness and robustness of our proposed model, we systematically compare its performance against conventional machine learning techniques, specifically Gaussian process regression (GPR) and linear regression (LR), for SOH prediction in lithium-ion batteries. The experimental results in Table 4 clearly demonstrate the superior predictive capability of our model, with notable advancements across all key evaluation metrics. In terms of prediction accuracy, our model achieves an RMSE of just 0.522%, representing a 64.5% and 65.7% reduction compared to GPR (1.47%) and LR (1.52%), respectively. This substantial improvement highlights the model’s ability to provide more precise SOH estimations. Additionally, the coefficient of determination (R²) reaches 0.98, significantly outperforming GPR (0.814) by 16.6% and LR (0.675) by 45.2%, indicating its exceptional capability in capturing the nonlinear degradation trends of lithium-ion batteries. Furthermore, our model achieves a MAPE of just 0.431%, which is 63.5% and 55.1% lower than GPR (1.18%) and LR (0.959%), reinforcing its superior predictive accuracy and robust generalization ability across various operating conditions.
Beyond predictive performance, computational efficiency is a critical factor in practical applications. While traditional machine learning models such as GPR and LR are generally known for their computational efficiency, the proposed model achieves an optimal balance between predictive accuracy and computational cost. Experimental results show that the inference time of the proposed model is 0.018 s, which, although slightly higher than GPR (0.0078 s) and LR (0.0023 s), remains well within the acceptable range for real-time BMS, ensuring its feasibility for deployment. In terms of computational complexity, GPR requires 25.6 trillion FLOPs, while LR demands only 10.3 trillion FLOPs; the proposed model operates at 46.7 trillion FLOPs, striking a balance between a reasonable computational demand and substantially higher predictive accuracy. The underlying reasons for this disparity lie in the distinct computational characteristics of these models. GPR exhibits higher complexity due to its reliance on computing and storing kernel matrices, which imposes a significant computational burden as the dataset grows; despite its relatively short inference time, this high computational demand constrains its scalability to large datasets. Conversely, LR is computationally the most efficient (O(N)), involving only matrix multiplications and linear transformations and yielding exceptionally fast inference. However, its inherent linearity limits its ability to capture the nonlinear degradation patterns of lithium-ion batteries, restricting its predictive accuracy.
In contrast, the proposed model utilizes a carefully optimized computational architecture to effectively reduce computational overhead. This approach improves accuracy beyond traditional machine learning models while avoiding the excessive computational burden associated with deep learning architectures such as Transformers. The comparative results clearly show that the proposed model remains competitive with traditional machine learning approaches in inference speed and computational efficiency while significantly outperforming them in prediction accuracy. These results demonstrate that the proposed model not only accurately captures the complex electrochemical aging mechanisms of Li-ion batteries but also provides robust SOH estimation under different operating conditions, achieving an optimal balance between computational feasibility and prediction performance.
4.4. Ablation Experiments
To better demonstrate the role of the various parts of the model, we performed ablation experiments. The results (Table 5) demonstrate that IDC and CRA play a crucial role in improving the accuracy and stability of SOH estimation while also verifying the effectiveness of the TL strategy in enhancing the model’s generalization and adaptability across different datasets. Firstly, IDC effectively extracts the local dynamic characteristics of battery health evolution through its multi-scale convolutional kernel structure, enhancing the model’s sensitivity to SOH variations. When IDC was removed, the model’s RMSE increased to 1.72%, R² dropped to 0.789, and MAPE rose to 1.6% in the absence of TL. Although TL mitigated some of the performance degradation (reducing RMSE to 0.852%, increasing R² to 0.942, and lowering MAPE to 0.691%), there was still a noticeable gap compared to the full model. This further confirms the indispensable role of IDC in enhancing feature representation and improving the model’s adaptability to complex temporal data.
Secondly, CRA significantly contributes to suppressing redundant information and enhancing the model’s focus on critical global features, thereby improving the stability and generalization capability of SOH estimation. The ablation study shows that removing CRA had an even greater impact on model performance, with RMSE increasing to 1.94%, R² dropping to 0.526, and MAPE rising to 1.85% in the absence of TL. While TL partially compensated for the loss of CRA (reducing RMSE to 0.753%, increasing R² to 0.955, and lowering MAPE to 0.613%), the overall predictive accuracy remained inferior to the full model. This further validates that CRA plays a vital role in optimizing feature representation and enhancing the model’s robustness and predictive performance.
Additionally, the staged TL strategy proves to be critical in improving the model’s adaptability and generalization across different datasets. When transferring the model from the NCM dataset to the NMC dataset, the model exhibited limited adaptation capability without the transfer strategy, as evidenced by an RMSE of 1.13%, R² of 0.882, and MAPE of 1.06%. However, when the transfer strategy was applied, RMSE significantly decreased to 0.522%, R² improved to 0.98, and MAPE dropped to 0.431%, indicating that TL effectively leverages prior knowledge from the source dataset, allowing the model to rapidly adapt to different battery chemistries, reduce dependency on large labeled datasets, and enhance prediction accuracy. Overall, IDC, CRA, and the staged TL strategy work synergistically to improve the model’s feature learning capability, cross-dataset generalization, and predictive stability, ensuring superior SOH estimation performance under diverse operating conditions.
5. Conclusions
In this study, we propose a hybrid deep learning model that integrates IDC-CRA and TL to improve the accuracy, efficiency, and generalization ability of lithium-ion battery SOH estimation. The model utilizes IDC and CRA for effective feature extraction and comprehensive information capture, while a staged training strategy significantly enhances its adaptability to different battery chemistries and operational conditions. Experimental validation on two distinct battery datasets, NCM and NMC, confirms the model’s superior predictive accuracy and robustness across various scenarios. For the NCM dataset, TL reduces the RMSE from 1.13% to 0.522%, improves the R² from 0.882 to 0.98, and lowers the MAPE from 1.06% to 0.431%. Similarly, for the NMC dataset, TL further enhances performance, reducing RMSE from 3.63% to 0.283%, increasing R² from −0.149 to 0.992, and decreasing MAPE from 3.19% to 0.22%. This enables rapid cross-material transfer from 54 NCM cells to 16 NMC cells, demonstrating the model’s strong adaptability and high-precision predictive capability across different battery chemistries and diverse operating conditions. Moreover, its stable performance under dynamic charging and discharging protocols and varied temperature ranges highlights its potential for real-time deployment. To further validate the contribution of each component in the proposed model, we conducted an ablation study. The results demonstrate that removing either IDC or CRA leads to a notable degradation in model performance, underscoring the importance of these modules. Additionally, we compared the proposed model not only with deep learning baselines such as CNN, LSTM, and Transformer but also with traditional machine learning approaches, including GPR and LR. The results indicate that the IDC-CRA model consistently surpasses its counterparts in prediction accuracy, generalization capability, and computational efficiency. These findings highlight the IDC-CRA model’s potential for real-world applications in BMS, where precise and efficient SOH estimation is crucial for ensuring battery reliability, extending lifespan, and optimizing performance.
With the increasing demand for efficient, low-latency SOH estimation in electric vehicles (EVs), energy storage systems (ESSs), and intelligent BMS, the scalability and computational efficiency of SOH estimation models have become critical considerations. Traditional deep learning models often rely on cloud computing platforms for high-performance computation, which may introduce communication delays and increase dependency on network infrastructure. In contrast, the proposed model, integrating the IDC and CRA mechanisms, demonstrates significant computational efficiency, making it suitable for deployment in edge computing and embedded system environments.
For edge computing applications, the proposed model leverages IDC-based multi-scale feature extraction, which significantly reduces the number of trainable parameters and computational complexity compared to CNN architectures. This reduction enhances its feasibility for deployment on local BMS devices or edge servers, enabling real-time SOH estimation with minimal latency. Additionally, the CRA module’s channel reduction mechanism further alleviates computational overhead, allowing the model to operate efficiently under constrained power and processing conditions. Moreover, TL enhances the model’s adaptability to different battery chemistries and operating conditions, making it suitable for large-scale distributed BMS deployments and real-time health monitoring in heterogeneous battery systems.
For embedded system applications, the proposed model can be optimized using quantization-aware training and model pruning techniques, enabling efficient deployment on ARM-based processors, field-programmable gate arrays (FPGAs), and microcontroller units (MCUs). These optimizations minimize computational cost and memory usage while preserving predictive accuracy. Compared to LSTM- or Transformer-based architectures, the proposed model demonstrates lower memory consumption and reduced computational latency, making it well-suited for onboard SOH estimation in EVs, stationary energy storage systems, and industrial IoT applications. By enabling localized SOH estimation, the model reduces dependence on cloud-based computing, thereby enhancing system reliability and data security in battery-powered intelligent systems. Moreover, the integration of edge computing not only enhances system responsiveness but also decreases reliance on high-bandwidth networks, effectively lowering operational costs. Processing data directly on edge devices reduces data transmission, safeguards user privacy, and strengthens overall system security and reliability.