State of Health Estimation for Lithium-Ion Batteries Based on Conformer-KAN

Wang, Yuchen; Wang, Jingyu

doi:10.3390/a19030203

Open AccessArticle

State of Health Estimation for Lithium-Ion Batteries Based on Conformer-KAN

by

Yuchen Wang

^* and

Jingyu Wang

College of Electronic Information and Automation, Tianjin University of Science and Technology, Tianjin 300222, China

^*

Author to whom correspondence should be addressed.

Algorithms 2026, 19(3), 203; https://doi.org/10.3390/a19030203

Submission received: 31 January 2026 / Revised: 25 February 2026 / Accepted: 3 March 2026 / Published: 9 March 2026

(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

Download

Browse Figures

Versions Notes

Abstract

The state of health (SOH) estimation of lithium-ion batteries faces significant challenges under complex operating conditions due to transient disturbances and distribution shifts. This paper proposes a deep learning framework named Conformer-KAN, which integrates a convolution-augmented Transformer (Conformer) with a Kolmogorov–Arnold Network (KAN). The method first constructs a unified input representation by fusing multi-view features including voltage, current, temperature, and incremental capacity. It then employs a Conformer encoder that combines gated local convolution units (GLCU) and multi-head self-attention (MHSA) to achieve joint modeling of local and global features. In addition, learnable spline-based activation functions are introduced within the KAN structure to enhance the model’s capacity for capturing complex nonlinear degradation behaviors. Cross-battery and cross-condition evaluations conducted on two public datasets demonstrate that the proposed method achieves root mean square errors (RMSE) of 0.006 ± 0.001 and 0.003 ± 0.001, and coefficients of determination (R²) of 0.987 ± 0.003 and 0.994 ± 0.002, respectively. These results show that Conformer-KAN significantly outperforms existing mainstream approaches in both robustness and generalization performance.

Keywords:

lithium-ion battery; state of health estimation; conformer-KAN; multi-view features; deep learning

1. Introduction

The state of health (SOH) of lithium-ion batteries serves as a fundamental indicator for energy optimization and safety warning in battery management systems (BMS) [1]. However, battery aging processes exhibit strong nonlinearity and time-varying characteristics. In practical operation, dynamic fluctuations in operating conditions, ambient temperature, and charge–discharge rates frequently induce distribution shifts in monitoring data, which leads to a pronounced degradation in the generalization performance of models trained under single operating conditions when deployed across different scenarios [2]. Consequently, enhancing the feature extraction capability and robustness of data-driven models under complex operating conditions remains a critical challenge in SOH estimation.

While data-driven approaches have made significant progress in SOH estimation, their evolution reflects a sequence of architectural attempts to balance local sensitivity, temporal dependency modeling, and cross-condition robustness. Early studies relied on handcrafted degradation indicators derived from voltage, differential voltage, and incremental capacity (IC) curves, which were then mapped to SOH using conventional regression models such as Gaussian Process Regression (GPR) [3], semi-supervised transfer component analysis [4], or interval prediction frameworks [5]. Although these feature-based approaches offered interpretability and relatively low computational complexity, their reliance on fixed feature representations made them sensitive to noise and inadequate for capturing the full nonlinear evolution of battery degradation under varying operating conditions.

To bypass the limitations of manual feature engineering, the field shifted toward deep learning, prominently utilizing Long Short-Term Memory (LSTM) architectures. LSTM-based models leveraged gated memory mechanisms to learn temporal degradation dynamics directly from sequential measurements [6,7,8]. Compared with static regression models, LSTMs improved adaptability to nonlinear aging processes. Subsequent enhancements incorporated attention mechanisms and convolutional front-ends to improve feature selectivity and local pattern extraction [9,10,11]. However, despite their ability to model short- and medium-range temporal dependencies, recurrent architectures remain constrained by sequential information flow, limited parallelization, and gradient attenuation over long sequences. These structural characteristics reduce their efficiency in modeling long-range dependencies across the entire battery life cycle and limit their responsiveness to abrupt regeneration phenomena.

Transformer-based architectures were subsequently introduced to address these long-range dependency bottlenecks. By employing multi-head self-attention, Transformer models capture global correlations across multivariate time-series more effectively than recurrent networks [12,13]. Their parallel computation paradigm improves scalability and enables holistic degradation modeling. Nevertheless, pure Transformer models inherently lack convolutional inductive bias, which diminishes sensitivity to localized degradation behaviors. Moreover, their reliance on fixed-activation Multi-Layer Perceptron (MLP) blocks in feed-forward layers may restrict functional approximation capacity when modeling highly nonlinear, multi-stage aging trajectories.

To reconcile local and global modeling capabilities, hybrid convolutional neural network (CNN) and Transformer frameworks were proposed. Gu et al. and Zhang et al. integrated convolutional modules with attention-based encoders to enhance representational robustness and multi-scale degradation perception [14,15]. Although these architectures improve local–global interaction, their structural fusion is often empirically designed rather than systematically optimized, and most still employ conventional MLP-based regression heads. As a result, nonlinear approximation efficiency remains fundamentally bounded by fixed activation functions.

Beyond architectural design, transfer-learning strategies have been explored to mitigate distributional shifts across batteries and operating conditions. Wu et al. proposed a transfer-stacking framework to enhance cross-battery generalization, while related works in multivariate time-series modeling have emphasized representation alignment techniques [16,17,18]. Although these approaches improve inductive transfer under limited labeled data, they primarily address feature alignment rather than enhancing the intrinsic representational capacity of the backbone model. Consequently, cross-condition robustness often depends on additional fine-tuning or complex domain adaptation procedures.

Collectively, despite continuous architectural advancements from feature-based regression to recurrent, attention-based, hybrid, and transfer-learning paradigms, existing methods still struggle to simultaneously achieve strong local feature sensitivity, effective long-range dependency modeling, high-capacity nonlinear function approximation, and robust cross-condition generalization within a unified framework. To address these limitations, this study proposes an end-to-end multi-view deep learning framework termed Conformer-KAN, which integrates a convolution-augmented Transformer (Conformer) with a Kolmogorov–Arnold Network (KAN) for SOH estimation. By incorporating convolutional inductive bias, attention-based global modeling, and spline-based adaptive nonlinear mapping into a single architecture, the proposed framework aims to improve both representational capacity and generalization robustness under complex operating conditions. The main contributions of this study are summarized as follows:

(1): Multi-view feature representation: A unified multi-view input layer integrates physical signals and mechanism-informed descriptors with temporal alignment, improving robustness to noise and distribution shifts across operating conditions.
(2): Conformer encoder structure: A Macaron-style Conformer combines multi-head self-attention and gated local convolution to jointly model localized degradation signatures and long-range temporal dependencies.
(3): KAN-based prediction module: A Kolmogorov–Arnold Network with learnable B-spline edge functions replaces conventional MLP heads, enhancing nonlinear function approximation for complex SOH degradation trajectories.

The remainder of this paper is organized as follows. Section 2 introduces the proposed Conformer-KAN framework, including the multi-view feature representation, the convolution-augmented Conformer encoder, the KAN-based nonlinear modeling module, and the multi-dimensional attention prediction layer. Section 3 presents the experimental validation, covering dataset construction, implementation settings, ablation studies, comparative evaluations with baseline models, and generalization analysis. Finally, Section 4 summarizes the key contributions of this study and provides concluding remarks.

2. Conformer-KAN Method

2.1. Overall Network Architecture

To address the strong nonlinearity of battery aging, localized regeneration signatures, and distribution shifts across operating conditions, we propose an end-to-end SOH estimator termed Conformer-KAN (Figure 1). Given a multivariate charging-stage sequence, the method follows a four-stage pipeline: (i) multi-view feature construction and temporal alignment, (ii) shallow convolutional embedding to obtain a unified latent representation, (iii) a Macaron-style Conformer encoder that alternates global self-attention and gated local convolution for joint local–global temporal modeling, and (iv) an attention-guided regression head in which a Kolmogorov–Arnold Network (KAN) provides adaptive spline-based nonlinear approximation.

Formally, for each cycle (or charging segment) we construct an input tensor

X \in R^{L \times F}

, where

L

is the resampled sequence length and

F

is the number of input channels (multi-view features). The shallow embedding maps

X

to

H_{0} \in R^{L \times d_{model}}

. The Conformer encoder produces

H_{N} \in R^{L \times d_{model}}

, which is subsequently fused by temporal/channel attention to obtain

h \in R^{d_{model}}

. Finally, KAN regresses

\hat{y} \in R^{o u t}

as the predicted SOH.

2.2. Multi-View Feature Representation

A multi-view feature representation is constructed to integrate directly measured physical signals with mechanism-informed degradation descriptors, thereby improving robustness under measurement noise and cross-condition distribution shifts. All features are extracted from the constant-current (CC) charging stage to maintain consistency of operating conditions across cycles. Two complementary views are considered: (i) a physical view consisting of terminal voltage

V (t)

, charging current

I (t)

, and surface temperature

T (t)

; and (ii) a mechanism-informed view given by the incremental capacity (IC) curve, computed from capacity–voltage pairs

(Q_{t}, V_{t})

as

I C (V) = \frac{d Q}{d V} \approx \frac{Q_{t + Δ t} - Q_{t}}{V_{t + Δ t} - V_{t}}

(1)

Because numerical differentiation is highly sensitive to measurement noise, the discrete

Q (V)

sequence is smoothed before finite-difference approximation. Specifically, Gaussian smoothing is applied using a kernel size of 11 and standard deviation

σ = 2

. These smoothing parameters are fixed across all experiments and are not estimated from test data. After smoothing, the IC curve is obtained by numerical differentiation of the filtered

Q (V)

sequence.

To construct a unified input representation, the mechanism-informed IC feature is temporally aligned with the raw physical signals. Although

I C (V)

is initially defined in the voltage domain, the CC charging stage provides a monotonic voltage trajectory

V (t)

, enabling each IC sample to be mapped to its corresponding time index. This yields a time-aligned sequence

I C (t)

, which is synchronized with

V (t)

,

I (t)

, and

T (t)

. Consequently, all four modalities are treated as parallel time-series channels. Each channel is then resampled to a fixed length

L

by linear interpolation, producing a normalized multi-view tensor

X \in ℝ^{L \times F}

(2)

where

F = 4

corresponds to voltage, current, temperature, and IC features.

To ensure reproducibility and prevent data leakage, dataset splitting is performed at the battery level before normalization. Min–max normalization statistics are computed using the training split only and subsequently applied to the validation and test splits. The resulting tensor

X

is projected by a one-dimensional shallow convolution module (SConv) into the latent representation

H_{0} \in ℝ^{L \times d_{model}}

(3)

using kernel size 5 and stride 1, followed by normalization and dropout. This operation performs initial cross-channel fusion and noise suppression, and the resulting representation serves as the input to the Conformer encoder.

2.3. Conformer Encoder

In this study, the Conformer is adopted as a battery-oriented temporal encoder within the proposed SOH estimation framework. Its use is coupled with multi-view feature construction, temporal alignment in the charging stage, and a local–global temporal encoding design to accommodate degradation evolution and regeneration-related fluctuations. Under this formulation, the Conformer serves as a structured backbone for jointly modeling localized aging signatures and long-range dependencies in charging trajectories. Compared with CNN–LSTM pipelines, the encoder enables parallel computation while retaining convolutional inductive bias; compared with pure Transformer architectures, it improves local sensitivity through gated depthwise convolution.

The encoder consists of

N

identical Conformer blocks. Each block transforms the representation according to

H_{l - 1} \to H_{l}, H_{l - 1}, H_{l} \in ℝ^{L \times d_{model}}, l = 1, 2, \dots, N

(4)

Through iterative stacking, the feature representation is progressively refined while preserving the temporal dimension

L

. The final encoder output is denoted as

H_{N} \in R^{L \times d_{model}}

.

2.3.1. Gated Local Convolution

To compensate for the lack of local inductive bias in self-attention mechanisms, a Gated Local Convolution Unit (GLCU) is introduced. The GLCU adopts a bottleneck structure composed of pointwise convolution, gated activation, and depthwise convolution, aiming to enhance sensitivity to localized voltage fluctuations and temperature inflection points induced by capacity regeneration [19].

Given an input feature tensor

H \in R^{L \times d_{m o d e l}}

, where

L

denotes the temporal length and

d

_model represents the feature channel dimension, the computation is formulated as follows:

\begin{array}{l} \tilde{H} = LN (H) \\ H_{pw} = GLU (Conv (\tilde{H})) \\ H_{dw} = Swish (BN (DWConv (H_{pw}))) \\ H_{out} = Dropout (Conv (H_{dw})) \end{array}

(5)

where, LN(·) and BN(·) denote layer normalization and batch normalization, respectively. Conv(·) indicates one-dimensional pointwise convolution, DWConv(·) represents one-dimensional depthwise convolution, GLU(·) denotes the gated linear unit, and Swish(·) is the Swish activation function.

2.3.2. Conformer Encoder Structure

As shown in Figure 2, the Conformer encoder adopts a Macaron-style architecture. Each encoding block consists of two half-step Feed-Forward Networks (FFNs), one Multi-Head Self-Attention (MHSA) module, and one GLCU, arranged in an alternating manner.

The feature propagation at the l-th layer is defined as follows. First, a half-step FFN introduces nonlinear mapping:

H_{l}^{(1)} = H_{l - 1} + \frac{1}{2} FFN (LN (H_{l - 1}))

(6)

Next, MHSA captures long-range degradation trends:

H_{l}^{(2)} = H_{l}^{(1)} + MHSA (LN (H_{l}^{(1)}))

(7)

Then, GLCU enhances local temporal perception

H_{l}^{(3)} = H_{l}^{(2)} + GLCU (LN (H_{l}^{(2)}))

(8)

Finally, another half-step FFN reconstructs the output features:

H_{l} = H_{l}^{(3)} + \frac{1}{2} FFN (LN (H_{l}^{(3)}))

(9)

Through hierarchical stacking of these Conformer blocks, the encoder progressively integrates multi-scale local fluctuations and long-range degradation dynamics, producing the final representation

H_{N}

, which is subsequently fed into the multi-dimensional attention fusion module.

2.4. Multi-Dimensional Attention Fusion

To further enhance representation quality before regression, we introduce a multi-dimensional attention fusion module that adaptively re-weights encoder features along both temporal and channel dimensions. This design aims to (i) identify degradation-relevant temporal segments within the charging trajectory and (ii) suppress redundant or noise-sensitive feature channels prior to nonlinear mapping.

Denote

H_{N} = [h_{1}, h_{2}, \dots, h_{L}]^{⊤},

with

h_{t} \in R^{d_{model}}

representing the feature vector at time step

t

. Temporal attention (TA) evaluates the relative importance of each time step by computing attention weights across the temporal dimension. The attention score for time step

t

is defined as

s_{t} = w_{t}^{⊤} h_{t}

(10)

where

w_{t} \in R^{d_{model}}

is a learnable parameter vector. The normalized attention weight is obtained via softmax:

α_{t} = \frac{\exp (s_{t})}{\sum_{k = 1}^{L} \exp (s_{k})}

(11)

The temporally aggregated representation is then computed as

h_{time} = \sum_{t = 1}^{L} α_{t} h_{t}, h_{time} \in ℝ^{d_{model}}

(12)

Given the temporally fused representation

h_{time}

, channel attention (CA) further models inter-channel dependencies to refine feature discriminability. Channel-wise weights are computed as

β = σ (MLP (h_{time}))

(13)

where

β \in R^{d_{model}}

,

σ (\cdot)

denotes the sigmoid activation function, and MLP represents a lightweight two-layer feed-forward network. The final fused feature vector is obtained via element-wise modulation:

h = β ⊙ h_{time}, h \in ℝ^{d_{model}}

(14)

where

⊙

denotes element-wise multiplication.

2.5. KAN-Based Modeller

To overcome the efficiency bottleneck of conventional MLPs in function approximation, a KAN is introduced as the nonlinear modeling core, as illustrated in Figure 3. Unlike fixed activation functions, KAN parameterizes activations along network edges using learnable spline functions, enabling adaptive fitting of complex SOH evolution trajectories.

According to the Kolmogorov–Arnold representation theorem, a multivariate function can be expressed as a finite superposition of univariate functions [20]. Let

h \in R^{d_{in}}

denote the fused representation from the attention module. A KAN layer maps

h

to

y \in R^{d_{out}}

as

y_{i} = \sum_{j = 1}^{d_{in}} ϕ_{i, j} (h_{j}) + b_{i}, i = 1, \dots, d_{out}

(15)

where

ϕ_{i, j} (\cdot)

denotes the learnable edge function from input

j

to output

i

,

b_{i}

is the bias term. Each edge function is parameterized as a base activation combined with a B-spline expansion:

ϕ_{i, j} (x) = a_{i, j} SiLU (x) + \sum_{k = 1}^{G} c_{i, j, k} B_{k} (x)

(16)

where

B_{k} (\cdot)

is the

k

-th B-spline basis of order

p

defined over a knot grid,

G

is the number of spline bases (grid resolution), and

{a_{i, j}, c_{i, j, k}}

are learnable parameters. This design enables the regression head to adaptively approximate multi-stage and highly nonlinear degradation trajectories that are difficult to fit with fixed-activation MLPs.

Beyond generic nonlinear approximation, the use of KAN is motivated by the shape characteristics of battery SOH trajectories under heterogeneous operating conditions. In practice, SOH evolution may exhibit piecewise-varying slopes, local regeneration-like deviations, and condition-dependent nonlinear transitions. Compared with fixed-activation MLP heads, the spline-parameterized edge functions in KAN provide more flexible local function-shape adaptation, which helps the regression head better fit such irregular degradation patterns while maintaining stable performance under cross-condition shifts.

2.6. SOH Prediction and Optimization

For SOH estimation, the final KAN regression head produces a scalar prediction:

\hat{y} = {KAN}_{head} (h)

(17)

The entire network is trained end-to-end using the mean squared error (MSE) loss:

L_{MSE} = \frac{1}{B} \sum_{n = 1}^{B} {({\hat{y}}_{n} - y_{n})}^{2}

(18)

where

B

is the mini-batch,

y_{n}

is the ground-truth SOH,

{\hat{y}}_{n}

is the predicted SOH.

For reproducibility, all architectural and training hyperparameters are explicitly specified. The resampled sequence length is fixed to

L = 128

, and the multi-view input dimension is

F = 4

, corresponding to voltage, current, temperature, and IC features. The shallow convolutional embedding maps the input to an embedding dimension

d_{model} = 128

using a one-dimensional convolution with kernel size 5, stride 1, and dropout rate 0.1. The Conformer encoder consists of

N = 4

stacked blocks, each with embedding dimension

d_{model} = 128

, feed-forward hidden dimension

d_{ff} = 256

, and

h = 4

attention heads. The depthwise convolution within the GLCU adopts a kernel size of 15, and dropout with rate 0.1 is applied in the FFN, MHSA, and convolutional modules. The multi-dimensional attention fusion layer preserves the embedding dimension and generates temporal and channel attention weights through a two-layer projection with hidden dimension 64. In the KAN module, the input dimension is

d_{in} = 128

and the output dimension is 1 for SOH regression. Each edge function is parameterized using cubic B-spline bases of order

p = 3

with grid resolution

G = 8

, and the knot grid is uniformly initialized over the normalized input range

[- 1, 1]

.

All dataset splits are conducted at the battery level prior to normalization, and normalization statistics are computed exclusively from the training set and then applied to validation and test sets to avoid data leakage. Hyperparameters are selected using only the training and validation splits and remain fixed across all cross-condition experiments to ensure fair comparison. The complete training procedure of the proposed method is summarized in Algorithm 1.

Algorithm 1 Training procedure of the Conformer-KAN framework for SOH estimation

Input: Battery dataset

D = {X_{n}, y_{n}^{SOH}}

; maximum number of training epochs E; batch size B
Output: Trained model parameters

Θ^{*}

01.: Initialize model parameters $Θ$ randomly
02.: Data preprocessing: resample all sequences in $D$ to a fixed length L, normalize features to [0, 1], and compute IC curves
03.: While $e < E$ do
04.: Shuffle dataset $D$ and divide it into mini-batches of size B
05.: For each mini-batch $(X_{b}, y_{b}^{SOH}) \in D$ do
06.: Step 1: Multi-view feature embedding
07.: $H_{0} \leftarrow SConv (X_{b})$
08.: Step 2: Conformer-KAN encoding
09.: For $l = 1$ to N do
10.: $\begin{array}{l} H_{l}^{(1)} \leftarrow H_{l - 1} + \frac{1}{2} FFN (LN (H_{l - 1})) \\ H_{l}^{(2)} \leftarrow H_{l}^{(1)} + MHSA (LN (H_{l}^{(1)})) \\ H_{l}^{(3)} \leftarrow H_{l}^{(2)} + GLCU (LN (H_{l}^{(2)})) \\ H_{l} \leftarrow H_{l}^{(3)} + \frac{1}{2} FFN (LN (H_{l}^{(3)})) \end{array}$
11.: End For
12.: Step 3: Multi-dimensional attention fusion
13.: $h_{time} \leftarrow TA (H_{N})$
14.: $h \leftarrow CA (h_{time})$
15.: Step 4: SOH prediction and optimization
16.: ${\hat{y}}^{SOH} \leftarrow {KAN}_{head} (h)$
17.: Compute loss: $L = MSE ({\hat{y}}_{n}^{SOH}, y_{n}^{SOH})$
18.: Update parameters: $Θ \leftarrow Θ - η \nabla_{Θ} L_{MSE}$
19.: End For
20.: $e \leftarrow e + 1$
21.: End While
22.: Return trained parameters $Θ^{*}$

3. Results and Analysis

3.1. Dataset Construction

To comprehensively evaluate the prediction accuracy and generalization capability of the proposed Conformer-KAN model across different material systems, battery formats, and operating conditions, two widely adopted benchmark datasets are employed: the lithium-ion battery dataset released by the National Aeronautics and Space Administration (NASA) and the battery aging dataset provided by the University of Oxford. A detailed comparison of key dataset parameters is summarized in Table 1 [21].

NASA dataset: Four cylindrical lithium-ion batteries, labeled B0005, B0006, B0007, and B0018, are selected. The cathode material is lithium nickel cobalt aluminum oxide, with a rated capacity of 2.0 Ah. All experiments are conducted at 24 °C under a constant-current–constant-voltage (CC–CV) charging protocol with a charging current of 1.5 A, followed by 2 A constant-current discharge. The corresponding SOH trajectories exhibit pronounced nonlinearity and capacity regeneration phenomena, making this dataset particularly suitable for evaluating a model’s sensitivity to localized transient fluctuations, as illustrated in Figure 4.

Oxford dataset: This dataset consists of eight prismatic pouch lithium-ion batteries (Cell 1–Cell 8) with lithium cobalt oxide cathodes and a rated capacity of 0.74 Ah. The aging experiments are performed at an elevated temperature of 40 °C using a 1 C constant-current cycling protocol. Owing to its distinct physical form factor and higher noise level, this dataset is designed to assess the cross-condition transferability and generalization performance of SOH estimation models.

The Oxford dataset is used as a cross-domain validation benchmark to examine the robustness of SOH estimation under heterogeneous battery conditions. Relative to the NASA dataset, it exhibits multiple sources of distributional shift, including differences in cell format (prismatic pouch versus cylindrical), cathode material system, ambient/cycling temperature, and charge–discharge protocol. These discrepancies affect both degradation trajectories and signal characteristics, thereby providing a more stringent setting for evaluating cross-condition generalization than single-dataset testing alone.

In this study, SOH is adopted as the aging indicator and is defined as the ratio between the current maximum available capacity

C_{current}

and the nominal initial capacity

C_{new}

, expressed as:

SOH = \frac{C_{current}}{C_{new}} \times 100 %

(19)

To construct high signal-to-noise ratio inputs, only voltage, current, and temperature measurements recorded during the constant-current charging stage are extracted for analysis. The datasets are partitioned as follows:

NASA dataset: A hold-out strategy is employed, where B0006, B0007, and B0018 are used for training and B0005 is reserved for testing, aiming to evaluate baseline prediction accuracy.

Oxford dataset: Cell 6 is selected as the target-domain test battery, while the remaining cells are used for training, in order to assess cross-domain generalization capability.

For hyperparameter selection and early stopping, a validation split is constructed from the source-domain training batteries only, and the held-out test batteries are used exclusively for final evaluation. The current study adopts a fixed hold-out protocol for controlled comparison; more extensive cross-battery repeated validation will be explored in future work.

3.2. Experimental Setup

All models in this study are implemented using Python 3.8 and the PyTorch 1.10 deep learning framework. The experiments are conducted on a workstation equipped with an Intel Core i9-12900K CPU and an NVIDIA GeForce RTX 3090 GPU.

Model training is performed using the Adam optimizer, with an initial learning rate set to 0.001, which is dynamically adjusted using a cosine annealing schedule. Network weights are initialized using a normal distribution to facilitate faster convergence. The batch size is set to B = 32, and the maximum number of training epochs is set to E = 200.

To quantitatively evaluate the performance of SOH estimation, three widely used metrics are adopted: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²) [22]. Lower RMSE and MAE values, together with an R² value closer to unity, indicate higher estimation accuracy. The corresponding definitions are given as follows:

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}

(20)

MAE = \frac{1}{N} \sum_{i = 1}^{N} |y_{i} - {\hat{y}}_{i}|

(21)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}

(22)

where,

y_{i}

and

{\hat{y}}_{i}

denote the ground-truth and predicted SOH values of the

i

-th sample, respectively;

\bar{y}

represents the mean of the ground-truth SOH values; and

N

is the total number of samples.

To ensure fair comparison, all baseline methods are trained under the same data partition protocol, preprocessing pipeline, optimizer family, and early-stopping criterion whenever applicable. Hyperparameters are selected using the validation split only, and the best checkpoint is chosen according to validation RMSE. In addition to single-run performance, repeated experiments with different random seeds are conducted for the main methods, and the mean ± standard deviation are reported for the primary metrics. Model complexity and computational cost (including parameter count and inference time) are also reported to provide a balanced comparison between accuracy and deployment efficiency.

3.3. Ablation Study

To systematically investigate the contributions of the core components in the proposed Conformer-KAN framework, including the multi-view input strategy, the Conformer encoder, and the KAN-based modeling module, a comprehensive ablation study is conducted. Four representative variant models are constructed for comparison:

Transformer-MLP: A baseline model consisting of a standard Transformer encoder followed by an MLP-based prediction head, without local convolution enhancement or the KAN structure.

Conformer-MLP: A variant in which the standard Transformer is replaced by a Conformer encoder, aiming to evaluate the benefit of joint local–global feature modeling for temporal representation.

Transformer-KAN: A model that retains the standard Transformer encoder while replacing the MLP with a KAN module, designed to assess the nonlinear approximation capability of KAN.

Conformer-KAN: The complete framework proposed in this study, integrating both the Conformer encoder and the KAN-based modeling module.

The quantitative evaluation results summarized in Table 2, together with the full life cycle SOH prediction curves illustrated in Figure 5, show consistent performance differences among the variant models. By comparing Transformer-MLP with Conformer-MLP, it is observed that introducing the Conformer architecture substantially improves temporal representation quality, with the RMSE decreasing from 0.019 ± 0.003 to 0.011 ± 0.002 (approximately 42.1% reduction) and the

R^{2}

increasing from 0.886 ± 0.015 to 0.939 ± 0.011. This trend is consistent with the role of the gated local convolution unit in complementing self-attention with stronger local inductive bias, thereby improving sensitivity to localized fluctuations and transition regions in battery degradation trajectories. The corresponding increase in parameter count (from 1.25 M to 1.58 M) and inference time (from 2.5 ms/sample to 3.1 ms/sample) remains moderate.

In terms of nonlinear modeling capability, the comparison between Transformer-MLP and Transformer-KAN shows that replacing the MLP head with KAN further improves prediction accuracy, with the RMSE reduced to 0.009 ± 0.002, the MAE reduced to 0.86 ± 0.07 × 10⁻², and the

R^{2}

increased to 0.975 ± 0.006. This performance gain is consistent with the adaptive function-shape modeling capability of KAN, whose learnable B-spline-based edge activations provide greater flexibility than fixed-activation MLP layers when fitting stage-wise nonlinear SOH trajectories. Notably, Transformer-KAN uses fewer parameters (1.10 M) than Transformer-MLP (1.25 M), but incurs higher inference latency (6.8 ms/sample vs. 2.5 ms/sample), indicating an accuracy–efficiency trade-off associated with spline-based function evaluation.

Ultimately, the proposed Conformer-KAN model achieves the best overall prediction performance by combining the local–global temporal modeling strength of the Conformer with the adaptive nonlinear approximation capability of KAN. Specifically, it achieves the lowest MAE (0.53 ± 0.04 × 10⁻²) and RMSE (0.006 ± 0.001), together with the highest

R^{2}

(0.987 ± 0.003) among all variants. The prediction curves further show that Conformer-KAN provides the closest trajectory alignment to the ground-truth SOH over the full life cycle, including regions with localized capacity drop and recovery-like fluctuations. In terms of computational characteristics, Conformer-KAN maintains a parameter count of 1.42 M (lower than Conformer-MLP at 1.58 M) while exhibiting the highest inference time (7.5 ms/sample), which reflects the additional computational overhead introduced by the KAN module. Overall, the ablation results provide empirical support for the complementary benefits of Conformer-based temporal encoding and KAN-based nonlinear regression under the present hold-out evaluation setting.

3.4. Comparative Experiments

To verify the superiority of the proposed Conformer-KAN method in lithium-ion battery SOH estimation, three representative data-driven approaches are selected as benchmark models for comparison. The configurations of the baseline models are summarized as follows:

AST-LSTM: An attention-based spatio-temporal long short-term memory network that integrates attention mechanisms with LSTM architecture, aiming to evaluate its capability in capturing long-sequence dependencies [6].

CNN-LSTM: A hybrid architecture that cascades CNN and LSTM networks, where CNNs are employed for local feature extraction and LSTMs are responsible for temporal dependency modeling [10].

CNN-Trans: A hybrid CNN–Transformer model that leverages the global self-attention mechanism of Transformers, representing the current mainstream approach in SOH estimation [14].

The SOH prediction trajectories produced by different models are illustrated in Figure 6. As observed, the prediction curve generated by Conformer-KAN shows the closest alignment with the ground-truth SOH degradation trajectory over the full life cycle. In the early aging stage and in regions with localized capacity drops or recovery-like fluctuations, both AST-LSTM and CNN-LSTM exhibit noticeable phase lag and amplitude deviation, suggesting that recurrent-dominant architectures have limited responsiveness to abrupt nonlinear variations. Although CNN-Trans benefits from global self-attention and improves overall trend fitting compared with LSTM-based methods, it still shows local oscillatory behavior in fluctuation-sensitive regions, indicating that global dependency modeling alone may be insufficient for robust local feature tracking under noisy conditions.

In contrast, Conformer-KAN achieves more stable local trajectory tracking while preserving key degradation characteristics, including localized recovery-like deviations. This behavior is consistent with the complementary roles of the Conformer encoder and KAN regression head: the Conformer enhances joint local–global temporal representation through gated local convolution and self-attention, while KAN provides adaptive nonlinear mapping for stage-wise SOH evolution. As a result, the proposed model exhibits improved robustness under complex degradation patterns without relying solely on recurrent temporal processing.

The quantitative comparison presented in Table 3 further supports these observations. Conformer-KAN achieves the best overall performance across all evaluation metrics, with MAE = 0.53 ± 0.04 × 10⁻², RMSE = 0.006 ± 0.001, and R² = 0.987 ± 0.003. Compared with AST-LSTM (RMSE = 0.017 ± 0.002), the RMSE is reduced by approximately 64.7%; compared with CNN-Trans (RMSE = 0.010 ± 0.001), the RMSE is reduced by approximately 40.0%. These results indicate that the proposed Conformer-KAN framework provides stronger predictive accuracy for high-noise and strongly nonlinear SOH trajectories under the present evaluation setting.

From the perspective of computational characteristics, Conformer-KAN has a parameter count of 1.42 M, which is comparable to CNN-Trans (1.36 M) and higher than AST-LSTM (0.92 M) and CNN-LSTM (1.18 M). However, its inference time (7.5 ms/sample) is higher than the MLP-based baselines (e.g., 3.8 ms/sample for CNN-Trans), reflecting the additional computational overhead introduced by the spline-based function evaluation in the KAN module. This result suggests a clear accuracy–efficiency trade-off: Conformer-KAN offers the best prediction performance at the cost of increased inference latency, while still maintaining a moderate model size.

Overall, the comparative experiments provide consistent empirical evidence that Conformer-KAN improves SOH estimation accuracy and robustness relative to representative recurrent and hybrid baselines on the NASA B0005 dataset, while also clarifying the associated computational trade-offs.

3.5. Generalization Performance Validation

To further evaluate the generalization capability of the proposed model across different battery packaging formats, material systems, and operating conditions, cross-domain validation experiments are conducted using the Oxford battery aging dataset.

The quantitative prediction results on Oxford Cell 6 are summarized in Table 4. Under this cross-domain setting, Conformer-KAN achieves the best overall performance among the compared methods, with MAE = 0.31 ± 0.03 × 10⁻², RMSE = 0.003 ± 0.001, and R² = 0.994 ± 0.002. Relative to CNN-Trans (RMSE = 0.005 ± 0.001), the RMSE is reduced by approximately 40.0%; relative to CNN-LSTM (RMSE = 0.006 ± 0.001), the reduction is approximately 50.0%. These results suggest that the proposed framework maintains stronger predictive accuracy and stability than representative recurrent and hybrid baselines under distributional shifts between datasets.

The full life cycle SOH prediction curves for Oxford Cell 6 are illustrated in Figure 7. Compared with the NASA B0005 case, the Oxford trajectory is relatively smoother in its global degradation trend but contains dense local fluctuations, making it a complementary test case for evaluating cross-condition robustness. In this setting, AST-LSTM and CNN-LSTM show larger local deviations, while CNN-Trans improves overall trend fitting but still exhibits oscillatory behavior in fluctuation-sensitive regions. By contrast, Conformer-KAN provides closer trajectory alignment to the ground-truth SOH curve across the full life cycle, including local variation regions. This behavior is consistent with the proposed architectural design, in which the Conformer encoder enhances joint local–global temporal representation and the KAN head improves adaptive nonlinear regression for heterogeneous degradation patterns.

From the perspective of computational characteristics, the parameter count and inference time of each model remain consistent with the results reported in Table 3 because the model architectures are unchanged across datasets. Conformer-KAN has a parameter count of 1.42 M and an inference time of 7.5 ms/sample, which is higher in latency than the MLP-based baselines but remains within a moderate model-size range. Therefore, the Oxford results further support an accuracy–efficiency trade-off consistent with previous experiments: the proposed model achieves the strongest predictive performance under cross-domain conditions at the cost of increased inference time due to spline-based KAN computation.

Overall, the cross-domain validation on Oxford Cell 6 provides additional empirical evidence that Conformer-KAN improves SOH estimation accuracy and robustness under heterogeneous operating conditions. At the same time, these results should be interpreted within the scope of the current experimental protocol (single target battery under a hold-out setting), and broader validation on more diverse datasets and deployment scenarios remains an important direction for future work.

4. Conclusions

To address the challenges of strong nonlinearity, transient disturbances, and cross-condition distribution shifts in lithium-ion battery SOH estimation, this study presented a deep learning framework termed Conformer-KAN, which combines a convolution-augmented Conformer encoder with a KAN-based regression module. Under the experimental settings considered in this work, the main findings can be summarized as follows.

(1): Joint local–global temporal representation improves SOH feature modeling. By combining gated local convolution with multi-head self-attention, the Conformer encoder enhances sensitivity to localized degradation fluctuations while preserving the ability to model long-range temporal dependencies. The ablation and comparative results consistently indicate that this local–global encoding strategy provides more accurate trajectory tracking than recurrent-dominant and standard Transformer-based baselines, particularly in fluctuation-sensitive regions.
(2): KAN-based regression improves adaptive nonlinear fitting for complex degradation trajectories. Replacing a conventional MLP head with a spline-parameterized KAN module improves the flexibility of nonlinear function approximation for SOH mapping. The ablation results show that the KAN head contributes additional error reduction, especially when modeling stage-wise and highly nonlinear degradation behavior, supporting its role as an effective nonlinear regressor in the proposed framework.
(3): The proposed framework shows improved predictive accuracy under the current in-domain and cross-domain evaluation settings. On the NASA B0005 test battery and the Oxford Cell 6 cross-domain test battery, Conformer-KAN achieves the best overall performance among the compared methods (e.g., RMSE values of 0.006 and 0.003, respectively, with high $R^{2}$ values). These results provide empirical evidence that the proposed design can improve robustness under heterogeneous battery conditions and distribution shifts within the scope of the datasets and protocols used in this study.

Overall, the results suggest that Conformer-KAN is a promising approach for SOH estimation under complex operating conditions, particularly when both local degradation sensitivity and nonlinear regression capacity are required. However, several limitations should be noted. First, the current evaluation is based on a limited number of public datasets and primarily uses hold-out testing protocols; broader validation with additional chemistries, operating profiles, and repeated cross-validation settings is still needed. Second, although the proposed model achieves the best predictive accuracy, the KAN module introduces additional inference latency relative to MLP-based baselines, which should be considered in resource-constrained BMS deployment scenarios. Third, the present experiments evaluate offline prediction performance, while practical deployment may involve sensor mismatch, missing data, online adaptation, and long-term domain drift.

Future work will therefore focus on (i) broader multi-dataset and multi-condition validation with stronger statistical testing, (ii) lightweight and hardware-aware optimization of the Conformer-KAN architecture for embedded deployment, and (iii) online updating and domain adaptation strategies to improve robustness under real-world operating variability.

Author Contributions

Conceptualization, Y.W.; methodology, Y.W.; software, Y.W.; validation, Y.W. and J.W.; formal analysis, Y.W.; investigation, Y.W.; resources, Y.W.; data curation, Y.W. and J.W.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W.; visualization, Y.W.; supervision, Y.W.; project administration, Y.W.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Tang, K.; Luo, B.; Chen, D.; Wang, C.; Chen, L.; Li, F.; Cao, Y.; Wang, C. The State of Health Estimation of Lithium-Ion Batteries: A Review of Health Indicators, Estimation Methods, Development Trends and Challenges. World Electr. Veh. J. 2025, 16, 429. [Google Scholar] [CrossRef]
Zhang, J.; Li, K. State-of-Health Estimation for Lithium-Ion Batteries in Hybrid Electric Vehicles—A Review. Energies 2024, 17, 5753. [Google Scholar] [CrossRef]
Peng, Y.; Hou, Y.; Song, Y.; Pang, J.; Liu, D. Lithium-Ion Battery Prognostics with Hybrid Gaussian Process Function Regression. Energies 2018, 11, 1420. [Google Scholar] [CrossRef]
Li, Y.; Sheng, H.; Cheng, Y.; Stroe, D.-I.; Teodorescu, R. State-of-health estimation of lithium-ion batteries based on semi-supervised transfer component analysis. Appl. Energy 2020, 277, 115504. [Google Scholar] [CrossRef]
Song, Z.; Zhang, H.; Jia, J. Data-Driven State of Health Interval Prediction for Lithium-Ion Batteries. Electronics 2024, 13, 3991. [Google Scholar] [CrossRef]
Li, P.; Zhang, Z.; Xiong, Q.; Ding, B.; Hou, J.; Luo, D.; Rong, Y.; Li, S. State-of-health estimation and remaining useful life prediction for the lithium-ion battery based on a variant long short term memory neural network. J. Power Sources 2020, 459, 228069. [Google Scholar] [CrossRef]
Ma, Y.; Shan, C.; Gao, J.; Chen, H. A novel method for state of health estimation of lithium-ion batteries based on improved LSTM and health indicators extraction. Energy 2022, 251, 123973. [Google Scholar] [CrossRef]
Peng, S.; Wang, Y.; Tang, A.; Jiang, Y.; Kan, J.; Pecht, M. State of health estimation joint improved grey wolf optimization algorithm and LSTM using partial discharging health features for lithium-ion batteries. Energy 2025, 315, 134293. [Google Scholar] [CrossRef]
Lin, M.; Wu, J.; Meng, J.; Wang, W.; Wu, J. State of health estimation with attentional long short-term memory network for lithium-ion batteries. Energy 2023, 268, 126706. [Google Scholar] [CrossRef]
Xu, H.; Wu, L.; Xiong, S.; Li, W.; Garg, A.; Gao, L. An improved CNN-LSTM model-based state-of-health estimation approach for lithium-ion batteries. Energy 2023, 276, 127585. [Google Scholar] [CrossRef]
Qi, J.; Zhu, Y.; He, X.; Wang, J.; Chen, Z.; Li, J.; Li, H. State of Health Estimation for Lithium-Ion Batteries Based on Gray Wolf-Optimized CNN-BiLSTM-Attention Hybrid Model. Energy Fuels 2025, 39, 20682–20698. [Google Scholar] [CrossRef]
Chen, L.; Xie, S.; Lopes, A.M.; Bao, X. A vision transformer-based deep neural network for state of health estimation of lithium-ion batteries. Int. J. Electr. Power Energy Syst. 2023, 152, 109233. [Google Scholar] [CrossRef]
Zhao, J.; Feng, X.; Wang, J.; Lian, Y.; Ouyang, M.; Burke, A.F. Battery fault diagnosis and failure prognosis for electric vehicles using spatio-temporal transformer networks. Appl. Energy 2023, 352, 121949. [Google Scholar] [CrossRef]
Gu, X.; See, K.W.; Li, P.; Shan, K.; Wang, Y.; Zhao, L.; Lim, K.C.; Zhang, N. A novel state-of-health estimation for the lithium-ion battery using a convolutional neural network and transformer model. Energy 2023, 262, 125501. [Google Scholar] [CrossRef]
Zhang, H.; Wang, J. Research on State of Health Assessment of Lithium-Ion Batteries Using Actual Measurement Data Based on Hybrid LSTM–Transformer Model. Symmetry 2026, 18, 169. [Google Scholar] [CrossRef]
Wu, J.; Cui, X.; Meng, J.; Peng, J.; Lin, M. Data-driven transfer-stacking-based state of health estimation for lithium-ion batteries. IEEE Trans. Ind. Electron. 2023, 71, 604–614. [Google Scholar] [CrossRef]
Li, W.; Shang, Z.; Zhang, J.; Gao, M.; Qian, S. A novel unsupervised anomaly detection method for rotating machinery based on memory augmented temporal convolutional autoencoder. Eng. Appl. Artif. Intell. 2023, 123, 106312. [Google Scholar] [CrossRef]
Xie, T.; Xu, Q.; Jiang, C. Anomaly detection for multivariate times series through the multi-scale convolutional recurrent variational autoencoder. Expert Syst. Appl. 2023, 231, 120725. [Google Scholar] [CrossRef]
Jiangyan, Z.; Ma, J.; Wu, J. A regularized constrained two-stream convolution augmented transformer for aircraft engine remaining useful life prediction. Eng. Appl. Artif. Intell. 2024, 133, 108161. [Google Scholar] [CrossRef]
Schmidt-Hieber, J. The Kolmogorov–Arnold representation theorem revisited. Neural Netw. 2021, 137, 119–126. [Google Scholar] [CrossRef]
Tian, J.; Xiong, R.; Shen, W. State-of-health estimation based on differential temperature for lithium ion batteries. IEEE Trans. Power Electron. 2020, 35, 10363–10373. [Google Scholar] [CrossRef]
Al-Smadi, M.K.; Abu Qahouq, J.A. State of Health Estimation for Lithium-Ion Batteries Based on Transition Frequency’s Impedance and Other Impedance Features with Correlation Analysis. Batteries 2025, 11, 133. [Google Scholar] [CrossRef]

Figure 1. Overall architecture of the Conformer-KAN framework. The proposed method consists of multi-view feature inputs, shallow convolutional embedding, a Conformer encoder, a multi-dimensional attention module, and a KAN-based modeling head. The Conformer jointly models local temporal patterns and global dependencies through gated local convolution and self-attention mechanisms, while KAN enhances nonlinear function approximation for complex degradation mapping.

Figure 2. Structure of the Conformer encoder. Through alternating stacking of local convolution and self-attention modules, the encoder simultaneously captures transient local fluctuations and long-term degradation trends within the same hierarchical level.

Figure 3. Architecture of the KAN module. Compared with fixed activation functions, KAN provides enhanced function space exploration capability and improves representation accuracy for nonlinear degradation features.

Figure 4. SOH degradation trajectories of the NASA battery dataset. The curves depict the evolution of SOH with cycling number for four batteries (B0005, B0006, B0007, and B0018), revealing pronounced nonlinearity and capacity recovery behavior.

Figure 5. Comparison of SOH prediction results on the NASA B0005 dataset using different model variants. (a) Transformer-MLP, exhibiting pronounced fluctuations and insufficient local sensitivity; (b) Conformer-MLP, where convolutional enhancement improves local tracking performance; (c) Transformer-KAN, demonstrating enhanced nonlinear approximation capability enabled by KAN; and (d) Conformer-KAN, which integrates both advantages and achieves the highest prediction accuracy with the lowest uncertainty.

Figure 6. Comparison of SOH prediction results on the NASA B0005 battery. (a) AST-LSTM, exhibiting evident prediction lag and difficulty in capturing rapid capacity fluctuations; (b) CNN-LSTM, showing improved local feature extraction but still suffering from considerable overall deviation; (c) CNN-Trans, achieving better global trend fitting but presenting local oscillations and higher uncertainty; and (d) Conformer-KAN, which accurately tracks nonlinear degradation trends with the highest fitting accuracy and stability.

Figure 7. Comparison of SOH generalization prediction results on the Oxford Cell 6 battery. (a) AST-LSTM, showing large prediction deviations and limited adaptability to cross-domain data distributions; (b) CNN-LSTM, capable of tracking the overall trend but exhibiting noticeable fluctuations; (c) CNN-Trans, achieving improved overall fitting but still presenting deviations in certain aging stages; and (d) Conformer-KAN, which maintains high prediction accuracy and close agreement with ground-truth values in cross-dataset testing, demonstrating the strongest generalization capability.

Table 1. Comparison of key parameters of the experimental datasets.

Parameter	NASA	Oxford
Battery IDs	B0005–B0007, B0018	Cell 1–Cell 8
Battery format	Cylindrical	Prismatic pouch
Cathode material	Li(NiCoAl)O₂	LiCoO₂
Rated capacity	2.0 Ah	0.74 Ah
Ambient temperature	24 °C (room temperature)	40 °C (high temperature)
Charging strategy	0.75 C CC–CV	1 C CC–CV
Validation objective	Prediction accuracy	Generalization performance

Table 2. Quantitative ablation results on the NASA B0005 test battery (mean ± std over repeated runs), together with model complexity and inference cost.

Model	MAE (×10⁻²)	RMSE	R²	Params (M)	Inference Time (ms/Sample)
Transformer-MLP	4.80 ± 0.25	0.019 ± 0.003	0.886 ± 0.015	1.25	2.5
Conformer-MLP	0.91 ± 0.08	0.011 ± 0.002	0.939 ± 0.011	1.58	3.1
Transformer-KAN	0.86 ± 0.07	0.009 ± 0.002	0.975 ± 0.006	1.10	6.8
Conformer-KAN	0.53 ± 0.04	0.006 ± 0.001	0.987 ± 0.003	1.42	7.5

Table 3. Quantitative comparison results on the NASA B0005 test battery (mean ± std over repeated runs), together with model complexity and inference cost.

Model	MAE (×10⁻²)	RMSE	R²	Params (M)	Inference Time (ms/Sample)
AST-LSTM	2.80 ± 0.18	0.017 ± 0.002	0.898 ± 0.014	0.92	2.2
CNN-LSTM	1.98 ± 0.14	0.013 ± 0.002	0.913 ± 0.012	1.18	2.9
CNN-Trans	1.45 ± 0.10	0.010 ± 0.001	0.970 ± 0.007	1.36	3.8
Conformer-KAN	0.53 ± 0.04	0.006 ± 0.001	0.987 ± 0.003	1.42	7.5

Table 4. Quantitative evaluation of generalization performance on the Oxford Cell 6 dataset (suggested placeholder values for manuscript drafting; replace with measured results).

Model	MAE (×10⁻²)	RMSE	R²	Params (M)	Inference Time (ms/Sample)
AST-LSTM	2.53 ± 0.16	0.009 ± 0.001	0.946 ± 0.010	0.92	2.2
CNN-LSTM	0.79 ± 0.06	0.006 ± 0.001	0.972 ± 0.006	1.18	2.9
CNN-Trans	0.53 ± 0.05	0.005 ± 0.001	0.984 ± 0.004	1.36	3.8
Conformer-KAN	0.31 ± 0.03	0.003 ± 0.001	0.994 ± 0.002	1.42	7.5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, Y.; Wang, J. State of Health Estimation for Lithium-Ion Batteries Based on Conformer-KAN. Algorithms 2026, 19, 203. https://doi.org/10.3390/a19030203

AMA Style

Wang Y, Wang J. State of Health Estimation for Lithium-Ion Batteries Based on Conformer-KAN. Algorithms. 2026; 19(3):203. https://doi.org/10.3390/a19030203

Chicago/Turabian Style

Wang, Yuchen, and Jingyu Wang. 2026. "State of Health Estimation for Lithium-Ion Batteries Based on Conformer-KAN" Algorithms 19, no. 3: 203. https://doi.org/10.3390/a19030203

APA Style

Wang, Y., & Wang, J. (2026). State of Health Estimation for Lithium-Ion Batteries Based on Conformer-KAN. Algorithms, 19(3), 203. https://doi.org/10.3390/a19030203

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

State of Health Estimation for Lithium-Ion Batteries Based on Conformer-KAN

Abstract

1. Introduction

2. Conformer-KAN Method

2.1. Overall Network Architecture

2.2. Multi-View Feature Representation

2.3. Conformer Encoder

2.3.1. Gated Local Convolution

2.3.2. Conformer Encoder Structure

2.4. Multi-Dimensional Attention Fusion

2.5. KAN-Based Modeller

2.6. SOH Prediction and Optimization

3. Results and Analysis

3.1. Dataset Construction

3.2. Experimental Setup

3.3. Ablation Study

3.4. Comparative Experiments

3.5. Generalization Performance Validation

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI