1. Introduction
In new energy vehicles, the battery management system (BMS) is responsible for monitoring battery operating conditions and supporting safe and efficient energy utilization. As one of the key indicators used to characterize battery status, the state of health (SOH) reflects the aging level and remaining service capability of a battery [1]. Owing to their high energy density, long cycle life, low self-discharge, and relatively limited environmental burden, lithium-ion batteries have been widely applied in electric vehicles and energy storage systems. However, during repeated charge–discharge cycling, internal aging reactions gradually accumulate, leading to capacity loss and continuous SOH decline [2,3]. Therefore, accurate SOH estimation is essential for battery safety assessment, operation strategy optimization, and service-life management.
The SOH evolution of lithium-ion batteries is affected by electrochemical aging, operating conditions, and historical usage, showing nonlinear and strongly coupled characteristics. As a result, it is difficult to describe the actual battery health state using only one measurable parameter [4]. SOH estimation techniques can be broadly grouped into experimental testing methods [5], mechanism-based approaches, and data-driven approaches. In contrast to mechanism-based methods, data-driven approaches infer battery health directly from measured data, thereby avoiding the need to establish an accurate electrochemical model. Instead, they extract health features (HFs) from measurable operating data, such as voltage, current, temperature, and capacity, and establish the mapping relationship between the HFs and SOH through machine learning or deep learning models [6,7,8].
For example, Reference [9] adopted random forest for battery SOH estimation, and Reference [10] used extreme gradient boosting with an accuracy correction strategy to improve SOH prediction performance. Reference [11] introduced a feature selection strategy for identifying key variables from complex data. Reference [12] constructed HFs from capacity degradation characteristics, entropy information, and correlation-related indicators, and then applied manifold learning to reduce feature dimensionality. Although such approaches can improve SOH estimation performance, their prediction accuracy remains strongly influenced by the representativeness of the input HFs. If the extracted HFs fail to fully describe battery degradation or include redundant information, the accuracy and generalization performance of the model may decline.
In addition to conventional charge–discharge data-based methods, fast electrochemical data, such as electrochemical impedance spectroscopy (EIS), have also been used for SOH estimation in recent years [13]. EIS-based methods can obtain electrochemical diagnostic information within a relatively short measurement time and are useful for rapid battery health assessment. However, their practical implementation usually requires additional impedance measurement equipment and specific test conditions [14]. In contrast, the method in this study extracts HFs from electrical signals during the charge–discharge process, including voltage, current, and time-related information, which can be obtained from conventional cycling data. It should be noted that the total time from measurement to analysis is difficult to compare quantitatively among different methods because it is affected by the type and amount of measured data, measurement protocol, feature extraction process, equipment conditions, and computing platform. Therefore, this study focuses on SOH prediction based on charge–discharge electrical signal features, while EIS-related fast electrochemical features will be further considered in future work.
With the development of deep learning, recurrent neural networks and their variants have been increasingly applied to battery state estimation and lifetime prediction. Reference [15] introduced a stacked BiLSTM network for remaining useful life prediction of supercapacitors, which provides useful insight into the modeling of degradation sequences. Reference [16] combined BiLSTM with a self-attention mechanism for SOH estimation, while Reference [17] optimized a BiLSTM model using the sparrow search algorithm. Reference [18] constructed a long short-term memory (LSTM)-based capacity prediction model to learn battery degradation trends from time-series data. References [19,20] combined filtering algorithms, temperature-related models, and deep learning methods to estimate SOH under different operating conditions. Reference [21] further investigated SOH estimation from a multi-time-scale perspective using Kalman filtering. These studies show that deep learning models can effectively capture temporal degradation patterns and have good potential in lithium-ion battery SOH prediction.
To improve the representation of complex aging information, multi-feature fusion, convolutional neural networks (CNNs), and attention mechanisms have been introduced in recent studies [22,23]. These methods enhance prediction performance by extracting local features, modeling temporal dependencies, and learning feature interactions. Although existing data-driven SOH estimation methods have achieved promising results, several limitations remain. First, some studies use only a single type of HF or a limited number of degradation indicators, which may not be sufficient to characterize the complex aging behavior of lithium-ion batteries. Second, when multiple HFs are directly used as model inputs, the differences in their degradation relevance and contribution are often not fully considered. Redundant or weakly informative features may affect model training and reduce prediction stability. Third, the hyperparameters of deep learning models are commonly selected by experience, which may lead to unstable prediction performance and limit the reproducibility of the model.
To address these issues, this paper proposes an SOH prediction framework based on multidimensional HF weighted fusion and MRFO-CNN-BiLSTM. The main contributions of this study are summarized as follows. First, 11 HFs are extracted from charge–discharge duration, charging area, IC characteristics, voltage change rate, and time-ratio information so that battery aging can be described from multiple observable dimensions. Second, a Pearson correlation-based weighting strategy is introduced to adjust the contribution of different HFs while retaining all extracted features, thereby avoiding the direct removal of low-correlation but potentially complementary degradation information. Third, a CNN-BiLSTM model is constructed to learn local coupling relationships and bidirectional temporal degradation dependencies from the weighted fused HF matrix. Finally, MRFO is used to optimize key hyperparameters of the CNN-BiLSTM model, reducing the dependence on empirical parameter selection.
2. Multidimensional HF Selection and Weighted Fusion
In this study, four cells from the CALCE lithium-ion battery aging dataset [24], namely CS2-35, CS2-36, CS2-37, and CS2-38, are selected for feature construction and model verification. The main battery specifications and charge–discharge conditions are summarized in Table 1. For each cycle, charging was performed in constant-current/constant-voltage mode, followed by constant-current discharging. These controlled cycling data provide voltage, current, capacity, and cycle information for subsequent HF extraction.
The SOH of each cell is calculated according to the ratio between the current available capacity and the rated capacity, as expressed in Equation (1):

$$\mathrm{SOH} = \frac{Q_C}{Q_R} \tag{1}$$

where $Q_C$ and $Q_R$ denote the current capacity and rated capacity of the battery, respectively. The SOH evolution curves of the four cells are shown in Figure 1. Overall, the SOH values decrease as the cycle number increases, while local fluctuations and different degradation rates can also be observed among different cells. These characteristics indicate that the battery aging process is not strictly monotonic, which increases the difficulty of SOH prediction under cross-cell conditions.
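As a minimal sketch of the capacity-ratio definition above, the following Python snippet computes SOH per cycle. The function name, the capacity values, and the 1.1 Ah rating are illustrative assumptions, not values taken from Table 1.

```python
def soh_from_capacity(capacities_ah, rated_capacity_ah):
    """Return per-cycle SOH as measured capacity divided by rated capacity."""
    if rated_capacity_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return [q / rated_capacity_ah for q in capacities_ah]


# Hypothetical per-cycle capacities (Ah); the small rise at the fourth cycle
# mimics the local, non-monotonic fluctuations mentioned in the text.
capacities = [1.10, 1.08, 1.05, 1.06, 0.99]
soh = soh_from_capacity(capacities, rated_capacity_ah=1.1)
```

Multiplying the result by 100 gives SOH in percent if that convention is preferred.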
2.1. Multidimensional HF Selection
The construction of the input HFs directly determines how much degradation information can be provided to the prediction model. Instead of relying on a single external response, this paper describes battery aging from several observable changes in the charge–discharge process. Considering the variations in charge–discharge duration, current integration area, incremental capacity behavior, voltage variation rate, and time proportion during cycling, 11 HF indicators are extracted from multiple dimensions. These HF indicators correspond to different aging-related responses, including changes in charging time, discharge behavior, constant-voltage charging duration, incremental capacity (IC) curve evolution, and voltage dynamic characteristics. The detailed definitions of the extracted HFs are listed in Table 2.
The selected HFs are related to battery aging from different physical perspectives. HF1–HF4 describe the time response of the battery during specific charging and discharging voltage intervals. As the battery ages, loss of active lithium, impedance growth, and polarization variation change the time required for the terminal voltage to pass through a given voltage range. HF5–HF7 are current–time integral features during different charging stages. Since the integral of current over time is directly related to the amount of charge transferred during charging, these area-related features can reflect the change in available capacity and charge acceptance ability during degradation. HF8 and HF9 are extracted from the IC curve. The IC peak and its corresponding voltage are sensitive to changes in electrochemical reaction processes, active material loss, and internal polarization, and therefore can characterize the evolution of battery aging from the perspective of voltage-capacity response. HF10 represents the minimum voltage change rate during charging, which reflects the dynamic voltage response and polarization behavior of the battery. HF11 describes the proportion of the charging time before reaching 4.2 V in the total charging time, reflecting the redistribution of charging duration between the constant-current and constant-voltage stages during aging. Therefore, the 11 HFs provide complementary information for SOH prediction from the perspectives of time response, charge accumulation, IC curve evolution, voltage dynamics, and charging-stage proportion.
To illustrate the variation patterns of the extracted HFs, the CS2-38 cell is taken as an example, and the evolution curves of SOH and the 11 HFs with cycle number are plotted in Figure 2. Different HFs show different changing trends during cycling, indicating that they describe battery degradation from different perspectives. Therefore, the extracted multidimensional HFs can provide complementary information for subsequent SOH estimation.
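To make the feature families above more concrete, the sketch below computes three representative quantities from one charging curve: the time spent inside a voltage window, the current–time integral, and the share of charging time before 4.2 V is reached. The helper names, the 3.8–4.0 V window, and the synthetic curve are assumptions for illustration only; the exact definitions used in the paper are those of Table 2.

```python
def first_index_at_or_above(values, threshold):
    """Index of the first sample whose value reaches the threshold."""
    for k, v in enumerate(values):
        if v >= threshold:
            return k
    raise ValueError("threshold never reached")


def time_in_voltage_window(t, v, v_lo, v_hi):
    """Time for a rising charging voltage to go from v_lo to v_hi."""
    return t[first_index_at_or_above(v, v_hi)] - t[first_index_at_or_above(v, v_lo)]


def current_time_integral(t, cur):
    """Trapezoidal integral of current over time (charge transferred)."""
    return sum(0.5 * (cur[k] + cur[k + 1]) * (t[k + 1] - t[k])
               for k in range(len(t) - 1))


def cc_time_ratio(t, v, v_cut=4.2):
    """Share of the total charging time spent before reaching v_cut."""
    return (t[first_index_at_or_above(v, v_cut)] - t[0]) / (t[-1] - t[0])


# Synthetic CC-CV charging curve (hypothetical, time in arbitrary units):
# voltage ramps linearly to 4.2 V, then current decays at constant voltage.
t = [float(k) for k in range(101)]
v = [min(4200, 3600 + 10 * k) / 1000 for k in range(101)]
cur = [0.55 if k <= 60 else 0.55 * 0.9 ** (k - 60) for k in range(101)]

hf_time = time_in_voltage_window(t, v, 3.8, 4.0)
hf_area = current_time_integral(t, cur)
hf_ratio = cc_time_ratio(t, v)
```

Repeating this per cycle would yield one row of HF values per cycle, which is the form assumed by the correlation and weighting steps.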
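To further describe the relationship between each HF and SOH, the Pearson correlation coefficient is used for quantitative analysis, as given in Equation (2):

$$r_i = \frac{\sum_{k=1}^{n}\left(x_{i,k}-\bar{x}_i\right)\left(y_k-\bar{y}\right)}{\sqrt{\sum_{k=1}^{n}\left(x_{i,k}-\bar{x}_i\right)^2}\,\sqrt{\sum_{k=1}^{n}\left(y_k-\bar{y}\right)^2}} \tag{2}$$

where $n$ denotes the number of samples; $x_{i,k}$ represents the value of the $i$-th HF in the $k$-th sample; $\bar{x}_i$ is the average value of the $i$-th HF; $y_k$ is the corresponding SOH value of the $k$-th sample; $\bar{y}$ is the average SOH value; and $r_i$ denotes the Pearson correlation coefficient between the $i$-th HF and SOH.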
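The Pearson coefficient defined above can be computed directly from its definition. The snippet below is a plain-Python sketch; the function name and the small HF and SOH series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical data: one HF that falls with SOH and one that rises as SOH falls.
soh = [1.00, 0.97, 0.95, 0.92, 0.90]
hf_a = [100.0, 96.0, 95.0, 91.0, 88.0]
hf_b = [0.10, 0.13, 0.15, 0.18, 0.20]

r_a = pearson(hf_a, soh)  # strongly positive
r_b = pearson(hf_b, soh)  # strongly negative
```

Both signs carry degradation information, which is why the weighting step later works with the magnitude of the coefficient.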
2.2. Weighted Fusion of Multidimensional HF
The extracted HFs have different physical meanings and numerical ranges. If they are directly used as model inputs, the training process may be affected by scale differences among features. Moreover, different HFs contribute unequally to SOH prediction. Some HFs are strongly correlated with SOH and reflect the main degradation trend, while others may provide local or auxiliary aging information. Therefore, this paper does not simply discard low-correlation HFs. Instead, all 11 HFs are retained, and their contributions are adjusted through correlation-based weighting.
Before feature weighting, each HF is normalized to a unified numerical interval. This operation reduces the influence of feature magnitude differences and makes the subsequent weighting process more comparable. The normalized value of the $i$-th HF in the $k$-th sample is calculated by Equation (3):

$$\tilde{x}_{i,k} = \frac{x_{i,k}-x_i^{\min}}{x_i^{\max}-x_i^{\min}} \tag{3}$$

where $x_i^{\min}$ and $x_i^{\max}$ denote the minimum and maximum values of the $i$-th HF among all samples, respectively.
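A minimal sketch of this per-feature min–max normalization is given below. The function names are hypothetical, and mapping a constant feature to zero is an assumption made here to avoid division by zero; the paper does not discuss that case.

```python
def min_max_normalize(column):
    """Scale one HF column to [0, 1] using its own minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def normalize_features(samples):
    """Normalize a sample-by-feature table column by column."""
    columns = [min_max_normalize(list(col)) for col in zip(*samples)]
    return [list(row) for row in zip(*columns)]


# Three samples with two hypothetical HFs on very different scales.
normalized = normalize_features([[1, 10], [2, 30], [3, 20]])
```

In a cross-cell setting, one design choice is to take the minimum and maximum from the training cell only and reuse them on test cells; the sketch above follows the text and uses all supplied samples.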
After normalization, the correlation strength between each HF and SOH is used as the basis for assigning feature importance. Considering that both positive and negative correlations may reflect battery degradation characteristics, the magnitude of the Pearson correlation coefficient is used for weight assignment. In this way, features with stronger correlations are assigned larger weights, while weakly correlated features are retained with smaller contributions. The weight of the $i$-th HF is calculated as follows:

$$w_i = \frac{|r_i|}{\sum_{j=1}^{m}|r_j|}$$

where $m$ is the total number of HF indicators, and $m = 11$ in this paper; $w_i$ is the weighting coefficient corresponding to the $i$-th HF, satisfying:

$$\sum_{i=1}^{m} w_i = 1$$
To show the correlation distribution more intuitively, the Pearson correlation coefficients among the different HFs and SOH are calculated and visualized in Figure 3. The heatmap shows that the correlations between different HFs and SOH are not uniform. Some HFs are highly correlated with SOH, while several HFs show relatively weak correlations. Meanwhile, correlations also exist among different HFs, indicating that the extracted multidimensional HFs contain both redundant and complementary information. The weights of different HFs are determined according to their correlation strength with SOH, and the results are presented in Table 3.
Table 3 shows that the importance of the HFs is not identical. HF5 obtains the largest weight, indicating that the current–time integral over the whole charging process has a close relationship with SOH. HF1, HF3, HF6, and HF8 also have relatively large weights, suggesting that charging time, staged charging area, and IC peak information are sensitive to battery aging. In contrast, HF4, HF7, and HF10 have smaller weights, but they still describe degradation-related changes from the constant-voltage charging stage, the later charging area, and the voltage response rate.
Therefore, the weight distribution does not mean that low-weight HF are useless. These HFs may not dominate the global degradation trend, but they can still provide supplementary information for local fluctuations or nonlinear aging behavior. For this reason, this paper retains all 11 HF and adjusts their contributions through weighting, rather than removing some features only according to their correlation values.
Although Pearson correlation mainly measures the linear relationship between each individual HF and SOH, it can provide a quantitative basis for evaluating the SOH-related degradation relevance of different HFs. In this study, the absolute Pearson correlation coefficient is used to adjust the relative input contribution of each HF before model training. HFs with stronger SOH-related degradation trends are assigned larger weights so that the subsequent CNN-BiLSTM model can focus more on the main degradation information during training. At the same time, low-correlation HFs are not directly removed because they may still contain supplementary information related to local fluctuations or aging differences among cells. Therefore, all 11 HFs are retained in the weighted fusion process. The weighted fused feature matrix provides more effective inputs for the subsequent CNN-BiLSTM model, thereby improving the model training effect and SOH prediction performance.
Furthermore, by combining the normalized HF indicators with their corresponding weights, a weighted fusion feature vector can be constructed. For the $k$-th sample, the weighted fused feature representation is expressed as

$$z_k = \left[w_1\tilde{x}_{1,k},\; w_2\tilde{x}_{2,k},\; \ldots,\; w_m\tilde{x}_{m,k}\right]$$

where $\tilde{x}_{i,k}$ is the normalized value of the $i$-th HF in the $k$-th sample and $w_i$ is its weight. Then, all samples can be represented as the weighted fused feature matrix:

$$Z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} \in \mathbb{R}^{n \times m}$$

where $n$ is the number of samples; each row of matrix $Z$ represents the weighted fused representation of one sample in the multidimensional HF space. After normalization and correlation-based weighting, the weighted fused feature matrix $Z$ is obtained. This matrix comprehensively characterizes the differences in the contributions of different HF indicators to battery SOH and is used as the input of the subsequent CNN-BiLSTM prediction model to establish the mapping relationship between the multidimensional HFs and SOH.
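The weighting and fusion steps described in this section can be sketched as follows. The function names and the three correlation values are hypothetical (the paper uses 11 HFs with the weights reported in Table 3); the inputs are assumed to be already normalized.

```python
def correlation_weights(r_values):
    """Weights proportional to |r|, normalized so that they sum to 1."""
    magnitudes = [abs(r) for r in r_values]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("all correlations are zero; weights are undefined")
    return [m / total for m in magnitudes]


def weighted_fusion(normalized_samples, weights):
    """Scale every normalized HF by its weight; one output row per sample."""
    fused = []
    for row in normalized_samples:
        if len(row) != len(weights):
            raise ValueError("each sample needs one value per weight")
        fused.append([w * x for w, x in zip(weights, row)])
    return fused


# Hypothetical correlations of three HFs with SOH (the sign is discarded).
r = [0.9, -0.6, 0.3]
w = correlation_weights(r)

# Two samples of already-normalized HFs.
Z = weighted_fusion([[1.0, 0.0, 0.5], [0.2, 1.0, 1.0]], w)
```

No feature is dropped: a weakly correlated HF still appears in every row of the fused matrix, only with a smaller amplitude.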
3. SOH Prediction Model Based on MRFO-CNN-BiLSTM
Based on the weighted fused feature matrix, an MRFO-CNN-BiLSTM model is developed for lithium-ion battery SOH prediction. In the proposed framework, the CNN part is responsible for extracting local coupling patterns from the fused HF sequence, while the BiLSTM part further learns the degradation dependencies along the cycling process. MRFO is used as an outer-loop optimizer to search for suitable hyperparameters of the CNN-BiLSTM model. The overall framework is shown in Figure 4.
3.1. CNN-BiLSTM Network Structure
The CNN-BiLSTM network used in this paper is designed to process the weighted fused HF sequence. The input layer receives the fused feature matrix $Z$, and the front part of the network contains two one-dimensional convolutional layers. These convolutional layers extract local relationships among adjacent HF components and convert the original input sequence into higher-level feature representations. A max-pooling layer is then used to compress redundant information and reduce the temporal dimension.
After convolutional feature extraction, the obtained feature sequence is sent to two BiLSTM layers. Different from a unidirectional LSTM, a BiLSTM can learn degradation information from both forward and backward directions, which is beneficial for capturing the evolution pattern of battery SOH. Finally, the fully connected and regression layers transform the learned feature representation into the estimated SOH value.
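The convolution and pooling stage at the front of the network can be illustrated with a small plain-Python sketch. It uses a single channel, hand-picked kernels, and ReLU as the activation, all of which are assumptions for illustration; in the actual model the kernels are learned and the layer sizes come from the hyperparameter search.

```python
def conv1d_relu(seq, kernel, bias=0.0):
    """Valid (unpadded) 1-D sliding-window convolution followed by ReLU.

    As in most deep learning libraries, the kernel is not flipped.
    """
    k = len(kernel)
    out = []
    for start in range(len(seq) - k + 1):
        s = bias + sum(kernel[j] * seq[start + j] for j in range(k))
        out.append(max(0.0, s))
    return out


def max_pool1d(seq, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]       # hypothetical input sequence
layer1 = conv1d_relu(x, [-1.0, 1.0])       # local differences
layer2 = conv1d_relu(layer1, [0.5, 0.5])   # local averages of those differences
pooled = max_pool1d(layer2, size=2)        # shorter sequence passed onward
```

The pooled sequence is what a recurrent layer would consume next; each convolution shortens the sequence by the kernel length minus one, and pooling halves it.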
Let the weighted fused feature representation of the $k$-th sample be $z_k$. Then, the prediction process of the model can be expressed as

$$\hat{y}_k = f(z_k; \theta)$$

where $\hat{y}_k$ is the SOH prediction value output by the model; $f(\cdot)$ is the nonlinear mapping function represented by MRFO-CNN-BiLSTM; and $\theta$ denotes the set of network parameters. For the convolutional layer, its output can be expressed as

$$F = \sigma(W * X + b)$$

where $X$ and $F$ denote the layer input and output; $W$ denotes the convolution kernel weight; $b$ denotes the bias term; $*$ represents the convolution operation; and $\sigma(\cdot)$ represents the activation function. Through convolution operations, the model can extract more discriminative local degradation features from the input HFs, providing a basis for subsequent temporal modeling. Furthermore, the BiLSTM unit regulates information retention, update, and output through its gating mechanism, and the related equations are expressed as follows:

$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C [h_{t-1}, x_t] + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$

where $f_t$, $i_t$, and $o_t$ represent the three gating variables of the LSTM unit, while $C_t$ and $h_t$ denote the cell state and hidden output, respectively; $x_t$ is the input at step $t$, and the $W$ and $b$ terms are the weights and biases of the corresponding gates. The symbol $\odot$ refers to the Hadamard product. Compared with unidirectional LSTM, BiLSTM can simultaneously utilize historical and future contextual information, which is more conducive to extracting key temporal features during SOH evolution.
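The gating mechanism and the bidirectional pass can be illustrated with a scalar toy version in plain Python. This is a sketch in the standard LSTM form, not the trained network: the state is one-dimensional, and the parameter dictionary and its values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with scalar input and state.

    p maps a gate name ("f", "i", "o", "c") to (w_x, w_h, b).
    """
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("f", sigmoid)          # forget gate
    i = gate("i", sigmoid)          # input gate
    o = gate("o", sigmoid)          # output gate
    c_tilde = gate("c", math.tanh)  # candidate cell state
    c = f * c_prev + i * c_tilde    # new cell state
    h = o * math.tanh(c)            # new hidden output
    return h, c


def run_lstm(seq, p):
    """Scan a sequence from zero initial state, collecting hidden outputs."""
    h, c, outputs = 0.0, 0.0, []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def run_bilstm(seq, p_forward, p_backward):
    """Pair each step's forward output with its backward output."""
    forward = run_lstm(seq, p_forward)
    backward = run_lstm(seq[::-1], p_backward)[::-1]
    return list(zip(forward, backward))


# Hypothetical parameters shared by both directions for simplicity.
params = {"f": (0.5, 0.1, 0.0), "i": (0.4, 0.2, 0.0),
          "o": (0.3, 0.1, 0.0), "c": (0.8, 0.3, 0.0)}
sequence = [0.1, 0.5, 0.9]
outputs = run_bilstm(sequence, params, params)
```

At each step the forward value summarizes the cycles seen so far, while the backward value summarizes the cycles that follow, so the first output pair already depends on the whole input window.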
3.2. MRFO Hyperparameter Optimization Mechanism
Several hyperparameters influence the CNN-BiLSTM model, including convolutional kernel settings, the size of the BiLSTM hidden layer, learning rate, and dropout ratio. Manual parameter selection may lead to unstable prediction results and may not fully exploit the model’s capability. Therefore, MRFO is introduced in this paper to optimize these key hyperparameters.
In the proposed optimization process, each individual in the MRFO population represents a candidate hyperparameter combination. For each candidate hyperparameter configuration, the CNN-BiLSTM model is trained and evaluated, and the validation RMSE is taken as the optimization objective. The manta ray population is then evolved through the chain foraging, cyclone foraging, and somersault foraging strategies. This search procedure is repeated until the preset iteration limit is reached or the objective value becomes stable. The overall optimization workflow is illustrated in Figure 5.
Let the hyperparameter vector to be optimized be defined as

$$P = \left[N_{c1},\; N_{c2},\; N_h,\; \eta,\; p_d\right]$$

where $N_{c1}$ and $N_{c2}$ are the numbers of convolution kernels in the two convolutional layers, respectively; $N_h$ is the number of BiLSTM hidden units; $\eta$ is the initial learning rate; and $p_d$ is the dropout rate. During MRFO, the RMSE calculated on the validation set is used to evaluate each candidate solution. The hyperparameter search is performed by repeatedly adjusting the positions of population members until a better parameter combination is obtained. The optimization objective can be expressed as

$$\min_{P}\; \mathrm{RMSE} = \sqrt{\frac{1}{N_{val}}\sum_{i=1}^{N_{val}}\left(y_i-\hat{y}_i\right)^2}$$

where $y_i$ and $\hat{y}_i$ correspond to the measured and predicted SOH values of the $i$-th sample, respectively, while $N_{val}$ is the size of the validation dataset.
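The sketch below shows one way a candidate position could be decoded into the five hyperparameters and scored by validation RMSE. The search ranges, the key names, and the unit-interval encoding are assumptions made for illustration; the ranges actually used are those listed in Table 4.

```python
import math

# Hypothetical search ranges (not taken from the paper).
BOUNDS = {
    "conv1_filters": (8, 64),
    "conv2_filters": (8, 64),
    "bilstm_units": (16, 128),
    "learning_rate": (1e-4, 1e-2),
    "dropout": (0.0, 0.5),
}
INTEGER_KEYS = {"conv1_filters", "conv2_filters", "bilstm_units"}


def decode(position):
    """Map a position in [0, 1]^5 to a hyperparameter dictionary."""
    params = {}
    for u, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        u = min(1.0, max(0.0, u))  # keep the candidate inside the range
        value = lo + u * (hi - lo)
        params[name] = int(round(value)) if name in INTEGER_KEYS else value
    return params


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted SOH."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


low = decode([0.0] * 5)
high = decode([1.0] * 5)
score = rmse([1.0, 0.9], [1.0, 0.7])
```

In the full procedure, the fitness of a candidate would be the `rmse` of a CNN-BiLSTM trained with `decode(position)` and evaluated on the validation set.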
During the iterative process of MRFO, the population size, maximum number of iterations, and individual positions are first initialized. Then, the model error corresponding to each candidate hyperparameter combination is calculated and used as the fitness value. Next, the individual positions are updated according to the chain foraging, cyclone foraging, and somersault foraging mechanisms: chain foraging enhances information transfer among individuals in the population, cyclone foraging strengthens the search around the global optimal region, and somersault foraging performs fine optimization around the current best solution. The optimization terminates when either the maximum iteration number is reached or the fitness value converges, after which the optimal hyperparameter set is selected.
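The iteration described above can be sketched in plain Python. This is a simplified version of the commonly published MRFO update rules, not the exact implementation used in the paper: the function name, the per-individual random factors, the default population size and iteration count, and the toy sphere objective are all assumptions. In the paper's setting, the objective would instead train a CNN-BiLSTM and return its validation RMSE.

```python
import math
import random


def mrfo_minimize(objective, dim, lower, upper, pop_size=10, max_iter=30, seed=0):
    """Simplified manta ray foraging optimization for box-bounded minimization."""
    rng = random.Random(seed)

    def clip(v):
        return min(upper, max(lower, v))

    def keep_best(pop, fit, best, best_fit):
        j = min(range(len(pop)), key=fit.__getitem__)
        return (pop[j][:], fit[j]) if fit[j] < best_fit else (best, best_fit)

    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    j0 = min(range(pop_size), key=fit.__getitem__)
    best, best_fit = pop[j0][:], fit[j0]
    history = [best_fit]

    for t in range(1, max_iter + 1):
        new_pop = []
        for i, x in enumerate(pop):
            if rng.random() < 0.5:
                # Cyclone foraging: spiral around a reference position that is
                # random early on (exploration) and the best later (exploitation).
                r1 = rng.random()
                beta = (2.0 * math.exp(r1 * (max_iter - t + 1) / max_iter)
                        * math.sin(2.0 * math.pi * r1))
                if t / max_iter < rng.random():
                    ref = [rng.uniform(lower, upper) for _ in range(dim)]
                else:
                    ref = best
                front = ref if i == 0 else pop[i - 1]
                cand = [ref[d] + rng.random() * (front[d] - x[d])
                        + beta * (ref[d] - x[d]) for d in range(dim)]
            else:
                # Chain foraging: follow the individual in front and the best.
                r = 1.0 - rng.random()  # in (0, 1], avoids log(0)
                alpha = 2.0 * r * math.sqrt(abs(math.log(r)))
                front = best if i == 0 else pop[i - 1]
                cand = [x[d] + rng.random() * (front[d] - x[d])
                        + alpha * (best[d] - x[d]) for d in range(dim)]
            new_pop.append([clip(v) for v in cand])
        pop = new_pop
        fit = [objective(x) for x in pop]
        best, best_fit = keep_best(pop, fit, best, best_fit)

        # Somersault foraging: flip each individual around the best position.
        for i, x in enumerate(pop):
            r2, r3 = rng.random(), rng.random()
            pop[i] = [clip(x[d] + 2.0 * (r2 * best[d] - r3 * x[d]))
                      for d in range(dim)]
        fit = [objective(x) for x in pop]
        best, best_fit = keep_best(pop, fit, best, best_fit)
        history.append(best_fit)

    return best, best_fit, history


def sphere(x):
    """Toy objective standing in for the validation RMSE."""
    return sum(v * v for v in x)


best, best_fit, history = mrfo_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
```

Because the best position is only replaced by a strictly better one, `history` is non-increasing, which gives a simple way to check the convergence-based stopping criterion mentioned in the text.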
The weighted fused feature matrix $Z$, together with the corresponding SOH labels, is partitioned into training, validation, and test datasets before model training. The training data are used for parameter updating, the validation data guide the selection of hyperparameter combinations during MRFO, and the test data are employed only to evaluate the final prediction performance. After the optimal hyperparameters are obtained, the CNN-BiLSTM model is reconstructed with these settings and trained using the Adam optimizer. Dropout is introduced to reduce overfitting. Finally, the optimized model is used to predict SOH on the test dataset.
To improve the reproducibility of the proposed method, the model input settings, dataset partition strategy, and MRFO hyperparameter optimization settings are summarized in Table 4. In the main experiment, CS2-38 is used for training and validation, while CS2-35, CS2-36, and CS2-37 are used as independent test cells. In addition, leave-one-cell-out cross-validation is introduced to evaluate the cross-cell generalization ability of the proposed method under different training-test partitions.
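The two partition schemes described above can be written down explicitly. The sketch below only enumerates which cells fall on each side; the function names are hypothetical, and how the training cells are further divided into training and validation subsets is not shown here because it is specified in Table 4.

```python
CELLS = ["CS2-35", "CS2-36", "CS2-37", "CS2-38"]


def main_split():
    """Main experiment: one cell for training/validation, three for testing."""
    return {"train_val": ["CS2-38"], "test": ["CS2-35", "CS2-36", "CS2-37"]}


def leave_one_cell_out(cells):
    """Yield one fold per cell, holding that cell out as the test set."""
    for held_out in cells:
        yield {
            "train_val": [c for c in cells if c != held_out],
            "test": [held_out],
        }


folds = list(leave_one_cell_out(CELLS))
```

Keeping whole cells on one side of each split is what makes the evaluation cross-cell: no cycle from a test cell is seen during training or hyperparameter selection.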
5. Conclusions
Because a single feature cannot fully characterize the SOH evolution of lithium-ion batteries during aging, this paper proposes an SOH prediction method based on multidimensional HF weighted fusion and MRFO-CNN-BiLSTM and verifies it on the CALCE dataset. The main conclusions are as follows.
(1) The 11 HFs extracted from multiple dimensions, including charge–discharge duration, charging area, IC, voltage change rate, and time ratio, can characterize battery degradation information from different perspectives. The weighted fusion method based on Pearson correlation coefficients can highlight the contributions of highly correlated HFs while retaining the supplementary degradation information contained in low-correlation HFs, providing effective inputs for SOH prediction.
(2) By combining CNN for local feature extraction and BiLSTM for forward–backward temporal dependency modeling, the MRFO-CNN-BiLSTM model can effectively characterize battery degradation patterns. With MRFO-assisted hyperparameter tuning, the model obtains $R^2$ values higher than 0.95 on the CS2-35, CS2-36, and CS2-37 test batteries.
(3) The ablation experiment on HF fusion strategies shows that Weighted 11-HF generally outperforms unweighted 11-HF, Weighted Top-3 HF, Weighted Top-5 HF, and Weighted Top-7 HF. The weighted fusion of all 11 HFs can make fuller use of multidimensional degradation information and improve the accuracy and stability of SOH prediction.
It should be noted that the present method mainly relies on complete or nearly complete charge–discharge cycle data. Therefore, it is more suitable for standard laboratory cycle data or application scenarios where complete charging information can be obtained. Future work will further investigate SOH prediction under partial charging segments, different temperatures, different charge–discharge rates, different battery types, and practical BMS operating conditions. In addition, fast electrochemical features such as EIS-related indicators, nonlinear feature weighting strategies, attention mechanisms, and uncertainty estimation methods will be considered to further improve the adaptability, prediction accuracy, and safety-oriented reliability of the proposed framework.