Article

A Physics-Informed Recurrent Neural Network with Fractional-Order Kinetics for Robust Lithium-Ion Battery State of Charge Estimation

State Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an 710049, China
*
Author to whom correspondence should be addressed.
Symmetry 2026, 18(4), 652; https://doi.org/10.3390/sym18040652
Submission received: 13 February 2026 / Revised: 15 March 2026 / Accepted: 18 March 2026 / Published: 14 April 2026
(This article belongs to the Special Issue Symmetry or Asymmetry in Artificial Intelligence)

Abstract

Accurate State of Charge (SOC) estimation is critical for the safety and efficiency of Battery Management Systems (BMS). While data-driven methods have shown promise, they often exhibit limited generalization capability due to the lack of physical constraints. Incorporating physical priors that reflect the symmetry of battery dynamics, for example through Physics-Informed Neural Networks (PINNs), can mitigate this issue. However, PINNs typically rely on integer-order equivalent-circuit differential equations, which fail to accurately describe the complex electrochemical relaxation processes. To bridge this gap, we propose a novel Fractional Differential Physics-Informed Neural Network (FDE-PINN) framework. Unlike traditional approaches, this method embeds a Fractional-Order Equivalent Circuit Model (FO-ECM) into a Gated Recurrent Unit (GRU) architecture to explicitly capture the anomalous diffusion and long-memory effects inherent in battery polarization. Specifically, the network is trained by minimizing a composite loss function that integrates the data-fitting error with residuals from fractional-order governing equations, including Coulomb counting and fractional voltage dynamics. Extensive experiments on the Panasonic 18650PF dataset and the CALCE A123 dataset verify the method's superiority. Results demonstrate that the proposed FDE-GRU model achieves an average MSE of 14.29 × 10⁻⁴ (with an MAE of 2.43% and RMSE of 3.23%) on the NCA chemistry and 26.24 × 10⁻⁴ (with an MAE of 3.75% and RMSE of 5.09%) on the LiFePO4 chemistry, significantly outperforming traditional methods and reducing the estimation error by 35.6% and 26.2% relative to the standard GRU, respectively.

1. Introduction

Global efforts towards carbon neutrality have accelerated the electrification of the transportation sector and the deployment of renewable energy grid systems [1,2,3]. As the core power source for Electric Vehicles (EVs) and stationary energy storage, Lithium-ion batteries (LIBs) are pivotal due to their high energy density and long cycle life. In the operation of Battery Management Systems (BMSs), accurate estimation of the SOC is of paramount importance [4,5,6]. Accurate SOC estimation is critical for battery safety (preventing over-charging and over-discharging), optimal energy utilization, and reliable range prediction in electric vehicles [7,8,9].
Since SOC involves internal electrochemical reactions and cannot be directly measured, it must be inferred from observable variables such as terminal voltage, current, and temperature. This estimation task is complicated by the battery’s highly nonlinear dynamics, hysteresis effects, and the time-varying nature of its internal parameters due to aging and environmental conditions [10]. Consequently, developing a robust and high-precision SOC estimation algorithm remains a significant challenge.
Existing SOC estimation approaches are generally categorized into model-based methods and data-driven methods. Model-based methods typically employ Equivalent Circuit Models (ECMs) or Electrochemical Models (EMs) combined with adaptive filters, such as the Kalman Filter (KF) and its variants. Researchers often leverage the intrinsic electrochemical symmetry within the battery, such as the analogous solid-phase diffusion processes at both electrodes or the symmetry in thermodynamic behavior, to construct simplified yet effective ECMs or to establish governing equations for EMs. Conventional ECMs utilize integer-order differential equations (IDEs) to describe battery dynamics. However, electrochemical studies suggest that the diffusion processes and the memory effect inside LIBs are better characterized by fractional-order calculus [11,12,13]. Consequently, Fractional-Order Models (FOMs) have been introduced to capture solid-phase diffusion and double-layer capacitance behaviors more accurately than integer-order models [14,15]. However, while these methods offer enhanced physical interpretability, their practical implementation is severely bottlenecked by model accuracy requirements and model complexity. Traditional model-based solvers, such as the Fractional Extended Kalman Filter (F-EKF), demand the continuous evaluation of non-local memory terms at every time step. This imposes significant computational and memory burdens, making online parameter identification and real-time execution on resource-constrained BMSs highly challenging [16,17].
In contrast, data-driven methods, particularly Deep Learning (DL) algorithms, have gained prominence due to their powerful feature extraction capabilities [18,19]. Architectures such as Multi-Layer Perceptrons (MLP), Long Short-Term Memory (LSTM) networks, and Transformers have been extensively applied to map voltage, current, and temperature sequences directly to SOC. Among these, the Gated Recurrent Unit (GRU) is favored for its simplified structure and efficiency in handling time-series data compared to LSTM [20,21]. However, pure data-driven models operate as black boxes. They lack physical constraints, often leading to physically inconsistent predictions when facing unseen data or noisy inputs [22,23]. Furthermore, they require massive amounts of high-quality labeled data, which is expensive to obtain.
To bridge the gap between physics-based interpretability and data-driven flexibility, Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm. PINNs embed physical laws (typically represented by differential equations) into the loss function of the neural network, guiding the training process to satisfy both data patterns and physical constraints [24]. While recent studies have successfully applied PINNs to battery state estimation, the vast majority of these frameworks are rigidly constrained by standard integer-order circuit equations [25,26]. Because these integer-order PINNs approximate system dynamics through exponential decay, they may not optimally capture the complex, anomalous diffusion and power-law relaxation (Mittag-Leffler) behaviors characteristic of real-world porous electrodes. Consequently, relying solely on standard IDEs can constrain the estimation precision and physical fidelity, particularly under highly dynamic or extreme thermal conditions.
Addressing these limitations, this paper proposes a novel Physics-Informed Neural Network framework named FDE-GRU, which seamlessly integrates Fractional-Order Differential Equations (FDEs) into a GRU-based architecture for robust SOC estimation. A major challenge in combining FDEs with deep learning is the non-local property of fractional derivatives, which traditionally disrupts the parallel efficiency of backpropagation. To overcome this, our framework employs advanced tensor vectorization techniques to parallelize the Grünwald–Letnikov (G-L) discrete approximation across batch dimensions. This algorithmic innovation eliminates the computational bottleneck of fractional calculus, making the end-to-end training and inference of the fractional-order dynamic system highly efficient and feasible for real-world deployment. Instead of relying solely on data mapping or complex mathematical solving, we embed the fractional-order circuit dynamics, specifically the fractional derivative of voltage and the state equation, directly into the network’s loss function as a regularization term. This approach leverages the GRU’s ability to handle nonlinear time-series data while ensuring the results adhere to the fractional-order physical laws of the battery.
The main contributions of this work are summarized as follows:
  • We propose a hybrid architecture that combines a GRU with a fractional-order physical model, which embeds fractional-order calculus constraints into a PINN for battery SOC estimation.
  • We formulate a composite loss function that integrates the data fitting error with the residuals of the fractional-order governing equations, specifically the Coulomb counting (mass conservation) and the fractional voltage dynamics equations. This constraint enforces physical consistency and improves the model’s generalization ability under noisy conditions.
  • We introduce an efficient, vectorized computational strategy to overcome the severe performance bottlenecks typically associated with fractional-order derivatives. By parallelizing the G-L discrete approximation using advanced tensor operations, we drastically reduce computational overhead, making the end-to-end training and real-time inference of the FDE-GRU framework highly feasible.
  • The proposed FDE-GRU is extensively evaluated and compared against traditional methods, including MLP, standard GRU, LSTM, and Transformer models. Experimental results demonstrate that FDE-GRU achieves superior accuracy and robustness.
The remainder of this paper is organized as follows. Section 2 briefly reviews the preliminaries of Fractional Calculus and the GRU network. Section 3 details the proposed FDE-GRU framework and the physics-informed training strategy. Section 4 presents the experimental setup and discusses the results. Finally, Section 5 concludes the paper.

2. Preliminaries

To establish a robust and interpretable data-driven framework for battery SOC estimation, it is essential to integrate fundamental physical principles with advanced sequential learning techniques. This section provides the necessary theoretical background, which is structured into three interconnected parts. First, we introduce the fractional calculus, which generalizes conventional derivatives to model complex memory-dependent dynamics. Second, we derive the Fractional-Order Equivalent Circuit Model (FO-ECM), a physics-based model that accurately captures the non-ideal relaxation behavior of lithium-ion batteries using fractional differential equations. Finally, we describe the GRU network, a powerful data-driven model for processing time-series data, which will serve as the computational backbone of our proposed framework.

2.1. Fractional Calculus and Discrete Approximation

To accurately characterize the aforementioned non-ideal dynamics, fractional calculus generalizes the differentiation order from an integer $n$ to an arbitrary real number $\alpha$. Among the various definitions, the G-L definition is particularly suitable for time-series processing in digital systems and recurrent neural networks due to its discrete recursive nature [27,28].
For a continuous function $f(t)$, the fractional derivative of order $\alpha$ ($0 < \alpha < 1$) at time $t$ is defined as the limit of a difference quotient [29]:

$${}^{\mathrm{GL}}D_t^{\alpha} f(t) = \lim_{h \to 0} \frac{1}{h^{\alpha}} \sum_{j=0}^{\infty} (-1)^{j} \binom{\alpha}{j} f(t - jh), \qquad (1)$$

where ${}^{\mathrm{GL}}D_t^{\alpha}$ is the G-L fractional derivative of order $\alpha$ with respect to $t$, $h$ is the sampling time step, and $\binom{\alpha}{j}$ are the generalized binomial coefficients.
In practical BMS applications with discrete sampling, the infinite sum is truncated to a memory window of size $k$. The discrete approximation is formulated as:

$${}^{\mathrm{GL}}D_t^{\alpha} f(t) \approx \frac{1}{h^{\alpha}} \sum_{j=0}^{k} w_j^{(\alpha)} f(t - jh). \qquad (2)$$
Here, the weighting coefficients $w_j^{(\alpha)}$ can be computed recursively to reduce computational complexity:

$$w_0^{(\alpha)} = 1, \qquad w_j^{(\alpha)} = \left(1 - \frac{\alpha + 1}{j}\right) w_{j-1}^{(\alpha)}, \qquad j = 1, 2, \ldots, k. \qquad (3)$$
Equation (2) explicitly shows that the current state depends on the weighted history of previous states, mathematically enforcing the memory effect inherent in battery dynamics.
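To make the recursion concrete, the weights of Equation (3) and the truncated sum of Equation (2) can be sketched in a few lines of plain Python (a minimal, library-free illustration; the paper's actual implementation relies on vectorized PyTorch operations instead):

```python
def gl_weights(alpha: float, k: int) -> list:
    """G-L weights via the recursion w_0 = 1, w_j = (1 - (alpha+1)/j) * w_{j-1}."""
    w = [1.0]
    for j in range(1, k + 1):
        w.append((1.0 - (alpha + 1.0) / j) * w[-1])
    return w

def gl_derivative(f: list, alpha: float, h: float, k: int) -> list:
    """Truncated G-L fractional derivative of a uniformly sampled sequence f."""
    w = gl_weights(alpha, k)
    return [sum(w[j] * f[n - j] for j in range(min(n, k) + 1)) / h**alpha
            for n in range(len(f))]
```

A convenient sanity check: for $\alpha = 1$ the weights collapse to $[1, -1, 0, \ldots]$, so the operator recovers the ordinary backward difference.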

2.2. FO-ECM

To address the limitations of ideal integer-order components in describing porous electrode behavior, the FO-ECM structure is presented. This model is a generalization of the standard first-order RC circuit. The fractional-order equivalent circuit is depicted in Figure 1, where $U_{OCV}$ denotes the open-circuit voltage, $R_0$ the ohmic internal resistance, $C_P$ a Constant Phase Element (CPE) in parallel with a polarization resistance $R_1$, and $U_1$ the polarization voltage across the CPE loop. The current flowing through the circuit is $I$, and $V$ represents the measurable terminal voltage of the battery.
The impedance of a CPE is defined in the frequency domain as $Z_{CPE} = \frac{1}{C_p (j\omega)^{\alpha}}$, where $C_p$ is the generalized capacitance and $\alpha$ ($0 < \alpha < 1$) is the dispersion coefficient reflecting the deviation from ideal capacitive behavior. It is important to note that the dispersion coefficient $\alpha$ in the CPE impedance corresponds precisely to the fractional order $\alpha$ of the G-L derivative ${}^{\mathrm{GL}}D_t^{\alpha}$ defined in Equation (1) [30]. This establishes a direct mathematical connection between the non-ideal capacitive behavior in the frequency domain and the memory-effect dynamics described by the fractional derivative in the time domain. Applying the Coulomb counting principle and Kirchhoff's laws while leveraging this symmetry, we obtain a system of Fractional Differential Equations (FDEs) that governs the electrical behavior of the battery [31]:
$$\frac{d\,SOC}{dt} = -\frac{\delta}{Q} I(t), \qquad (4)$$

$$D_t^{\alpha} U_1(t) = -\frac{1}{R_1 C_p} U_1(t) + \frac{1}{C_p} I(t), \qquad (5)$$

$$V(t) = U_{OCV}(SOC) + U_1(t) - I(t) R_0, \qquad (6)$$

where $\delta$ is the Coulombic efficiency, $Q$ is the battery capacity, $V(t)$ denotes the terminal voltage, $I(t)$ is the load current (positive for discharge), and $U_1(t)$ represents the polarization voltage across the CPE loop. Equations (4)–(6) serve as the core physical constraints in our framework.
The advantage of this fractional-order formulation over standard integer-order models lies in the representation of relaxation processes. Standard models assume that polarization decays following a pure exponential law ($e^{-t/\tau}$), which often leads to modeling errors [32,33]. In contrast, electrochemical impedance spectroscopy (EIS) studies reveal that lithium-ion diffusion in porous electrodes exhibits anomalous diffusion and long-memory characteristics, especially at low temperatures or in highly aged cells [30,34]. The fractional-order model inherently describes these complex dynamics through a slower, power-law-like relaxation (Mittag-Leffler function behavior), thereby providing a superior physical basis for state estimation [35].
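To illustrate how Equations (4)–(6) evolve in discrete time, the following sketch steps the FO-ECM forward with an explicit G-L update. Two simplifications are assumptions of this sketch, not the paper's method: the right-hand side of Equation (5) is evaluated at the previous step (explicit scheme), and the linear OCV curve is a toy placeholder for the fitted OCV mapping:

```python
def gl_weights(alpha, k):
    """Recursive G-L weights (Equation (3))."""
    w = [1.0]
    for j in range(1, k + 1):
        w.append((1.0 - (alpha + 1.0) / j) * w[-1])
    return w

def simulate_fo_ecm(current, h, alpha, R0, R1, Cp, Q,
                    delta=1.0, soc0=1.0, k=10):
    """Explicit time-stepping of the FO-ECM; current > 0 means discharge."""
    w = gl_weights(alpha, k)
    soc, u1, v = [soc0], [0.0], []
    for i_n in current:
        # Equation (4): Coulomb counting
        soc.append(soc[-1] - delta * i_n * h / Q)
        # Equation (5): solve the truncated G-L relation for the newest U1 sample,
        # with the right-hand side frozen at the previous state (explicit scheme)
        rhs = -u1[-1] / (R1 * Cp) + i_n / Cp
        hist = sum(w[j] * u1[-j] for j in range(1, min(len(u1), k) + 1))
        u1.append(h**alpha * rhs - hist)
        # Equation (6) with a hypothetical linear OCV curve (illustration only)
        u_ocv = 3.0 + 1.2 * soc[-1]
        v.append(u_ocv + u1[-1] - i_n * R0)
    return soc[1:], u1[1:], v
```

With zero current the polarization voltage stays at zero and the terminal voltage tracks the OCV, as expected; under a constant discharge current the SOC decreases linearly per Equation (4).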

2.3. GRU Network

The GRU, introduced by Cho et al., is a variant of Recurrent Neural Networks (RNNs) designed to solve the vanishing gradient problem inherent in standard RNNs [36]. Compared to the LSTM network, the GRU has a simplified architecture with fewer parameters (two gates vs. three gates), making it more computationally efficient for real-time BMS applications while maintaining comparable performance in capturing long-term temporal dependencies.
In the context of SOC estimation, the GRU network processes a time-series input sequence $X = \{x_1, x_2, \ldots, x_L\}$, where $L$ is the sequence length. As implemented in our baseline model, the input vector at each time step $t$ consists of three observable variables:

$$x_t = [V(t), I(t), T(t)], \qquad (7)$$

where $V(t)$, $I(t)$, and $T(t)$ represent the terminal voltage, load current, and battery temperature, respectively.
The internal mechanism of a GRU cell at time step t is governed by the following transition equations:
$$\begin{aligned} z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\ r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\ \tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\ h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t. \end{aligned} \qquad (8)$$

Here, $z_t$ is the update gate, which determines how much past information is passed along to the future; $r_t$ is the reset gate, which controls how much past information to forget. $\sigma$ denotes the sigmoid activation function, $\odot$ represents the element-wise Hadamard product, and $h_t$ is the hidden state output at time $t$. In these equations, $W_z$, $W_r$, $W_h$ and $U_z$, $U_r$, $U_h$ represent the learnable weight matrices for the input and recurrent connections, respectively, while $b_z$, $b_r$, $b_h$ are the corresponding bias vectors. These parameters are optimized during the training process to capture the complex temporal dynamics of battery behavior.
For the SOC estimation task, we employ a many-to-one architecture. The GRU layer processes the entire input sequence, and the hidden state of the final time step, $h_L$, is treated as the high-level feature representation of the current battery state. This final state is then fed into a fully connected (linear) layer to predict the scalar SOC:

$$\widehat{SOC}_t = W_{fc} h_L + b_{fc}, \qquad (9)$$

where $h_L \in \mathbb{R}^{d_h}$ is the final hidden state of the GRU layer, representing a high-dimensional feature vector that encapsulates the temporal dynamics learned from the entire input sequence. The parameters $W_{fc} \in \mathbb{R}^{1 \times d_h}$ and $b_{fc} \in \mathbb{R}$ are the learnable weight matrix and bias term of the fully connected layer, respectively. This linear transformation converts the rich, high-dimensional representation $h_L$ into a single scalar SOC estimate. This architecture serves as the robust data-driven backbone for our proposed physics-informed framework.
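A scalar toy version of the gating equations in Equation (8) and the many-to-one head in Equation (9) can be written as follows. All dimensions are collapsed to 1 for readability, and the weight dictionary `p` holds illustrative placeholders; the actual model uses 32-dimensional hidden states trained in PyTorch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, p):
    """One GRU step (Equation (8)) with scalar input and hidden state."""
    z = sigmoid(p["Wz"] * x + p["Uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["Wr"] * x + p["Ur"] * h_prev + p["br"])   # reset gate
    h_tilde = math.tanh(p["Wh"] * x + p["Uh"] * (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_tilde                 # convex blend

def soc_estimate(sequence, p, w_fc, b_fc, h0=0.0):
    """Many-to-one: run the cell over the sequence, map the final state (Eq. (9))."""
    h = h0
    for x in sequence:
        h = gru_cell(x, h, p)
    return w_fc * h + b_fc
```

The final line of `gru_cell` makes the memory behavior explicit: the new state is a convex combination of the old state and the candidate, so with $z_t \to 0$ the cell simply carries its history forward.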

3. Methodology

3.1. Problem Formulation

The core challenge in SOC estimation is to develop a robust mapping from measurable battery variables to the internal state, while ensuring physical consistency and generalizability. Formally, given a time-series of measured terminal voltage V ( t ) , load current I ( t ) , and battery temperature T ( t ) , the objective is to estimate the SOC at each time step t:
$$\widehat{SOC}(t) = \mathcal{F}\big(V(t-k:t), I(t-k:t), T(t-k:t); \Theta\big), \qquad (10)$$

where $\mathcal{F}$ denotes the estimation model, $k$ is the historical window length, and $\Theta$ represents the model parameters. Traditional data-driven approaches learn $\mathcal{F}$ directly from labeled data, but often produce physically implausible predictions. In contrast, our approach integrates the governing FDEs of the battery dynamics as constraints during training, ensuring that the learned mapping $\mathcal{F}$ respects the underlying electrochemical physics.

3.2. Proposed FDE-GRU Architecture

3.2.1. Overview of the FDE-GRU Framework

The proposed framework seamlessly integrates a data-driven recurrent neural network with an FO-ECM. As illustrated in Figure 2, the architecture consists of two parallel paths: the estimation path and the physics-informed regularization path. The estimation path takes the measured terminal voltage $V$, load current $I$, and temperature $T$ as inputs, and processes them through a GRU network to extract temporal features and output the estimated SOC ($\widehat{SOC}$). The physics-informed path incorporates the FO-ECM, which includes the ohmic resistance $R_0$, polarization resistance $R_1$, and fractional capacitance $C_P$, to compute the open-circuit voltage $U_{OCV}$ and the polarization voltage $U_1$. The fractional derivative $D_t^{\alpha} U_1$ of the polarization voltage is then calculated to enforce the underlying fractional-order dynamics. The total loss $\mathcal{L}_{total}$ is a composite of the data-fitting loss $\mathcal{L}_{data}$, the mass conservation loss $\mathcal{L}_{mass}$ (based on Coulomb counting), and the fractional polarization loss $\mathcal{L}_{frac}$ (based on the FDE), ensuring that the estimated SOC is both data-consistent and physics-compliant.

3.2.2. Backbone Network: GRU

The backbone of the estimation pathway is a standard GRU network. While recent high-capacity architectures such as Transformers excel at sequence modeling, their quadratic computational complexity and large memory footprint pose significant barriers to real-time inference on resource-constrained BMS microcontrollers [37,38]. The GRU is chosen as the backbone because its recurrent hidden state naturally captures the time-dependent memory effects of battery relaxation, while its simplified gating mechanism (compared to LSTM) ensures high computational efficiency and a lightweight parameter space, making edge deployment feasible. For each time step $t$, the GRU receives an input vector $x_t = [V(t), I(t), T(t)]$ and updates its hidden state $h_t$ according to the gating mechanisms defined in Equation (8). After processing a sequence of length $L$, the final hidden state $h_L$ is passed through a fully connected layer to yield the SOC estimate $\widehat{SOC}(t)$, as described by Equation (9). This many-to-one architecture ensures that the prediction at each time step is informed by a context window of recent measurements.

3.2.3. Physical Parameters

To explicitly embed the battery's electrical characteristics into the learning process, we parameterize key physical variables as trainable tensors within the neural network. Specifically, the ohmic resistance $R_0$, the polarization resistance $R_1$, and the generalized capacitance $C_p$ of the CPE are defined as learnable parameters, initialized from typical values for lithium-ion cells. These parameters are optimized jointly with the GRU's weights during training, allowing the model to adapt to the specific battery under study. The fractional order $\alpha$, which characterizes the dispersion of the CPE and the memory effects of the diffusion process, is treated as a fixed hyperparameter determined by the battery chemistry. This separation preserves the fundamental fractional-order nature of the dynamics while allowing the electrical parameters to be fine-tuned from data.

3.2.4. Physics-Informed Loss Formulation

To ensure the estimated SOC adheres to electrochemical laws, we formulate physical residuals based on the constraint equations derived in Section 2.2. These residuals quantify the discrepancy between the left-hand side (LHS) and right-hand side (RHS) of each constraint, and minimizing them enforces physical consistency.
1. Mass Conservation Residual ($\mathcal{L}_{mass}$): Based on the Coulomb counting principle, which inherently reflects the symmetry in charge conservation, the rate of change of SOC is proportional to the load current. In our implementation, according to Equation (4), we define the residual as:

$$\mathcal{L}_{mass} = \left\| \frac{\delta}{Q} I(t) + \frac{d\,\widehat{SOC}}{dt} \right\|_2^2, \qquad (11)$$

where $\widehat{SOC}$ is estimated by the GRU network (Equation (9)). In the discrete domain, $d\,\widehat{SOC}/dt$ is approximated by the finite difference between consecutive time steps.
2. Fractional Polarization Residual ($\mathcal{L}_{frac}$): The polarization voltage $U_1$ is first derived from Kirchhoff's voltage law according to Equation (6):

$$U_1(t) = V(t) - U_{OCV}(\widehat{SOC}_t) + I(t) R_0, \qquad (12)$$

where $U_{OCV}$ is modeled by a mapping layer. Substituting Equation (12) into Equation (5) yields the residual term:

$$\mathcal{L}_{frac} = \left\| D_t^{\alpha} U_1(t) + \frac{1}{R_1 C_p} U_1(t) - \frac{1}{C_p} I(t) \right\|_2^2. \qquad (13)$$
To efficiently compute the fractional derivative $D_t^{\alpha} U_1(t)$, we utilize the vmap (vectorizing map) operator from the functorch library. This allows the G-L convolution to be executed in parallel across the batch dimension without explicit loops, significantly accelerating the training process. The discrete G-L approximation in Equation (2) requires truncating the infinite memory window to a finite length $k$. We set $k = 10$ in all experiments, meaning that the fractional derivative at each time step depends on the current state and the past 10 historical states. The choice of $k = 10$ is motivated by the mathematical properties of the G-L approximation and the underlying computational architecture. Theoretically, the G-L weighting coefficients $w_j^{(\alpha)}$ decay as the index $j$ increases; given the sampling rate of the dataset, the coefficients for $j > 10$ become small enough that older historical states contribute negligibly to the current polarization dynamics. Furthermore, this truncation aligns with our parallelization strategy: because the computational graph of the vmap operator scales linearly with $k$, setting $k = 10$ captures the dominant fractional memory effects without a disproportionate expansion of the memory footprint and processing overhead. Thus, $k = 10$ is a sound structural design choice, ensuring reliable physical constraints while respecting the limits of resource-constrained BMS hardware.
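Written out for a sampled sequence, the two residuals take the following form. This is a plain-Python sketch (variable names are ours, not the paper's code) using a finite-difference approximation of $d\,\widehat{SOC}/dt$ and the truncated G-L operator:

```python
def gl_weights(alpha, k):
    """Recursive G-L weights (Equation (3))."""
    w = [1.0]
    for j in range(1, k + 1):
        w.append((1.0 - (alpha + 1.0) / j) * w[-1])
    return w

def physics_residuals(soc_hat, u1, current, h, alpha, R1, Cp, Q,
                      delta=1.0, k=10):
    """Discrete versions of the mass-conservation and fractional residuals."""
    w = gl_weights(alpha, k)
    # Mass conservation: finite-difference dSOC/dt checked against Coulomb counting
    l_mass = sum(((delta / Q) * current[n]
                  + (soc_hat[n] - soc_hat[n - 1]) / h) ** 2
                 for n in range(1, len(soc_hat)))
    # Fractional polarization: truncated G-L derivative checked against the RC dynamics
    l_frac = 0.0
    for n in range(len(u1)):
        d_u1 = sum(w[j] * u1[n - j] for j in range(min(n, k) + 1)) / h**alpha
        l_frac += (d_u1 + u1[n] / (R1 * Cp) - current[n] / Cp) ** 2
    return l_mass, l_frac
```

Both residuals vanish exactly for a trajectory that satisfies the discretized governing equations, which is what minimizing them during training pushes the network toward.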

3.2.5. Implementation of Fractional Calculus

Efficient computation of fractional derivatives is crucial for practical training of the FDE-GRU. We implement the discrete G-L derivative (Equation (2)) using a one-dimensional convolution (Conv1d) operation, which leverages optimized parallel computation on modern hardware. The G-L weights $w_j^{(\alpha)}$ are precomputed recursively using Equation (3) and stored as the kernel of a bias-free Conv1d layer. Given a sequence of polarization voltages $U_1(t)$ of length $N$, the fractional derivative $D_t^{\alpha} U_1(t)$ is obtained by convolving this sequence with the G-L kernel. This approach efficiently captures the long-range dependencies inherent in fractional calculus.
To further accelerate training, we employ the vmap (vectorizing map) operator from the PyTorch functorch library. vmap automatically vectorizes the convolution operation, enabling parallel computation of fractional derivatives for all samples in a batch without explicit Python loops. This yields a significant reduction in computation time, making physics-informed training feasible for large datasets. The memory window length $k$ of the G-L approximation is set to 10, balancing accuracy and computational efficiency. In our experiments, this configuration closely approximates the ideal fractional derivative while maintaining tractable training times. Notably, while the physical constraints increase training time modestly compared to a standard GRU (approximately 25% slower), the inference speed of the trained FDE-GRU is virtually identical to that of the standard GRU, ensuring real-time deployability on embedded BMS hardware.
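Conceptually, the Conv1d formulation treats the precomputed weights as a fixed causal kernel. The batched computation can be mimicked in plain Python as below; the outer per-sample loop is exactly what vmap (or a grouped Conv1d) replaces with a single parallel call. This is a stand-in sketch, not the PyTorch implementation itself:

```python
def gl_kernel(alpha, k):
    """Precompute the G-L weights once; reused as a fixed convolution kernel."""
    w = [1.0]
    for j in range(1, k + 1):
        w.append((1.0 - (alpha + 1.0) / j) * w[-1])
    return w

def gl_conv(seq, kernel, h, alpha):
    """Causal 1-D convolution of one sequence with the G-L kernel."""
    k = len(kernel) - 1
    scale = h ** (-alpha)
    return [scale * sum(kernel[j] * seq[n - j] for j in range(min(n, k) + 1))
            for n in range(len(seq))]

def gl_conv_batch(batch, alpha, h, k=10):
    """Batched fractional derivative; this Python loop is what vmap vectorizes."""
    kernel = gl_kernel(alpha, k)
    return [gl_conv(seq, kernel, h, alpha) for seq in batch]
```

Because the kernel depends only on $\alpha$ and $k$, it is computed once up front; per step the operator is a short dot product, which is why the trained model's inference cost is close to that of a plain GRU.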

3.3. Optimization Algorithm

The training process is formulated as a multi-objective optimization problem. The total loss function $\mathcal{L}_{total}$ is the sum of the data-fitting error (MSE) and the physical residuals:

$$\mathcal{L}_{total} = \mathrm{MSE}(\widehat{SOC}, SOC_{true}) + \mathcal{L}_{mass} + \mathcal{L}_{frac}, \qquad (14)$$

where $\mathrm{MSE}(\widehat{SOC}, SOC_{true}) = \frac{1}{N} \sum_{i=1}^{N} \big(\widehat{SOC}_i - SOC_{true,i}\big)^2$.
The objective of training is to minimize $\mathcal{L}_{total}$ with respect to the GRU network parameters and the trainable physical parameters ($R_0$, $R_1$, $C_p$). The detailed training procedure is summarized in Algorithm 1.
Algorithm 1 Physics-Informed Training Strategy for FDE-GRU
Require: Dataset $\mathcal{D} = \{(X_t, y_t)\}$, where $X_t = \{[V(t), I(t), T(t)]\}$ is a sequence of voltage, current, and temperature measurements of length $L$, and $y_t = \{SOC_{true,t}\}$ are the corresponding true SOC values; initial network weights $\theta = \{W_z, W_r, W_h, U_z, U_r, U_h, b_z, b_r, b_h\}$; physical parameters $\phi = \{R_0, R_1, C_p\}$
Require: Hyperparameters: learning rate $\eta$; fractional order $\alpha$; batch size $B$
 1: Initialize network parameters $\theta$ and physical parameters $\phi$
 2: for epoch = 1 to $N_{epochs}$ do
 3:   for each batch $(X_t, y_t)$ in $\mathcal{D}$ do
 4:     Forward Pass:
 5:       $\widehat{SOC} \leftarrow \mathrm{GRU}_{\theta}(X_t)$
 6:     Physics Computation:
 7:       Calculate $U_{OCV}$ via the mapping layer
 8:       Derive polarization voltage $U_1 \leftarrow V - U_{OCV} + I \cdot R_0$
 9:       Compute $D_t^{\alpha} U_1$ using the vectorized G-L solver (vmap)
10:     Loss Calculation:
11:       $\mathcal{L}_{data} \leftarrow \|\widehat{SOC} - y_t\|^2$
12:       $\mathcal{L}_{mass} \leftarrow \|\frac{\delta}{Q} I + \Delta\widehat{SOC}\|^2$
13:       $\mathcal{L}_{frac} \leftarrow \|D_t^{\alpha} U_1 + \frac{1}{R_1 C_p} U_1 - \frac{1}{C_p} I\|^2$
14:       $\mathcal{L}_{total} \leftarrow \mathcal{L}_{data} + \mathcal{L}_{mass} + \mathcal{L}_{frac}$
15:     Backpropagation:
16:       Compute gradients $\nabla_{\theta,\phi}\, \mathcal{L}_{total}$
17:       Update $\theta, \phi$ using the Adam optimizer
18:   end for
19: end for
20: return Optimized model $\theta, \phi$

Adam Optimization Algorithm

The Adam (Adaptive Moment Estimation) optimizer [39] is employed to minimize the composite loss function L t o t a l . Unlike standard stochastic gradient descent, Adam computes adaptive learning rates for each parameter by estimating the first and second moments of the gradients. This approach is particularly suitable for training PINNs with potentially ill-conditioned loss landscapes due to the coupling between data-driven and physics-based terms [40,41].
Given the model parameters $\theta$ (including GRU weights and biases) and physical parameters $\phi = \{R_0, R_1, C_p\}$, let $\Theta = \{\theta, \phi\}$ denote the complete set of trainable parameters. At each training iteration $t$, the algorithm proceeds as follows:
(1)
Gradient Computation: The total gradient $g_t$ with respect to all trainable parameters is computed via backpropagation:

$$g_t = \nabla_{\Theta} \mathcal{L}_{total} = \left[ \frac{\partial \mathcal{L}_{total}}{\partial \theta}, \frac{\partial \mathcal{L}_{total}}{\partial R_0}, \frac{\partial \mathcal{L}_{total}}{\partial R_1}, \frac{\partial \mathcal{L}_{total}}{\partial C_p} \right].$$
This gradient incorporates contributions from both the data-fitting loss and the physics-based residuals, ensuring that parameter updates satisfy both empirical observations and physical laws.
(2)
Moment Estimation: Adam maintains exponentially decaying moving averages of past gradients (first moment, $m_t$) and squared gradients (second moment, $v_t$):

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t,$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2,$$

where $\beta_1 = 0.9$ and $\beta_2 = 0.999$ are decay rates controlling the exponential decay of these moving averages. The first moment $m_t$ can be interpreted as an estimate of the gradient's mean, while the second moment $v_t$ estimates the uncentered variance.
(3)
Bias Correction: Since $m_t$ and $v_t$ are initialized as zero vectors, they are biased toward zero during the initial training steps. To counteract this bias, corrected estimates are computed:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}.$$
(4)
Parameter Update: Finally, the parameters are updated using the following rule:

$$\Theta_{t+1} = \Theta_t - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$

where $\eta$ is the learning rate (selected from $\{1 \times 10^{-3},\ 5 \times 10^{-4},\ 2.5 \times 10^{-4},\ 2 \times 10^{-4},\ 1 \times 10^{-4}\}$ via grid search), and $\epsilon = 10^{-8}$ is a small constant for numerical stability.
The adaptive nature of Adam is particularly beneficial for the FDE-GRU framework for two reasons. First, the physics-based loss terms ($\mathcal{L}_{mass}$ and $\mathcal{L}_{frac}$) may exhibit different gradient scales compared to the data-fitting term, requiring per-parameter learning-rate adaptation. Second, the trainable physical parameters ($R_0$, $R_1$, $C_p$) typically have different magnitudes and sensitivities compared to the neural network weights, making adaptive optimization crucial for stable convergence.
During training, we observe that the Adam optimizer effectively balances the competing objectives of minimizing data prediction error while satisfying fractional-order physical constraints. The exponential moving averages provide momentum-like behavior for navigating flat regions of the loss landscape while dampening oscillations in steep directions, leading to more stable convergence compared to non-adaptive optimizers.
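The four steps above amount to the following per-iteration update, shown here for a flat list of scalar parameters. This is a generic Adam sketch using the stated $\beta_1$, $\beta_2$, $\epsilon$ values, not the authors' training code:

```python
import math

def adam_step(theta, grad, m, v, t, eta=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam iteration: moment update, bias correction, parameter update."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * g        # first moment (mean estimate)
        vi = beta2 * vi + (1.0 - beta2) * g * g    # second moment (uncentered variance)
        m_hat = mi / (1.0 - beta1 ** t)            # bias correction
        v_hat = vi / (1.0 - beta2 ** t)
        new_theta.append(th - eta * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

Note that on the very first step ($t = 1$) the bias-corrected ratio $\hat{m}_t / \sqrt{\hat{v}_t}$ reduces to the sign of the gradient, so the initial step size is roughly $\eta$ regardless of raw gradient magnitude. This scale invariance is precisely what lets the same optimizer handle both the GRU weights and the much differently scaled physical parameters $R_0$, $R_1$, $C_p$.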

4. Experimental Setup

4.1. Datasets and Preprocessing

All experiments in this study are conducted using two publicly available battery datasets: the Panasonic 18650PF dataset provided by Dr. Phillip Kollmeyer from the University of Wisconsin-Madison [42] and the CALCE A123 battery dataset [43]. The Panasonic dataset includes tests over a wide range of temperatures, from room temperature down to sub-zero conditions (25 °C, 10 °C, 0 °C, −10 °C, and −20 °C), utilizing dynamic load profiles scaled for an electric vehicle battery pack, such as the US06, HWFET, UDDS, and LA92 drive cycles. The CALCE dataset encompasses tests across seven distinct temperature conditions (0 °C, 10 °C, 20 °C, 25 °C, 30 °C, 40 °C, and 50 °C) under the DST, FUDS, and US06 dynamic load profiles.
To construct the model training and testing sets, specific partition strategies were applied based on the structures of the respective datasets. For the Panasonic 18650PF dataset at each test temperature, the standard test sequence includes nine drive cycles in the following order: Cycle 1, Cycle 2, Cycle 3, Cycle 4, US06, HWFET, UDDS, LA92, and Neural Network (NN), where Cycles 1–4 comprise a random mix of the US06, HWFET, UDDS, LA92, and NN profiles. The data from Cycle 1, Cycle 2, Cycle 3, and Cycle 4 were assigned to the training set. The subsequent five standardized drive cycles (US06, HWFET, UDDS, LA92, and NN) were allocated to the test set, enabling the evaluation of model generalization on unseen, standard driving profiles.
For the CALCE A123 dataset, a cross-profile evaluation strategy is adopted for each temperature condition. Specifically, the data from two dynamic drive cycles (e.g., DST and FUDS) are assigned to the training set, while the remaining cycle (e.g., US06) is reserved for the test set. This approach rigorously assesses the model’s adaptability to completely unseen driving conditions under identical thermal states.
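This cross-profile partition can be sketched as follows; the drive-cycle names follow the dataset, while the placeholder arrays and loading convention are hypothetical:

```python
def cross_profile_split(cycles, test_profile):
    """Train on all drive cycles except one; hold that one out for testing."""
    train = {name: data for name, data in cycles.items() if name != test_profile}
    test = {test_profile: cycles[test_profile]}
    return train, test

# Placeholder lists stand in for the measured (V, I, T, SOC) sequences.
cycles = {"DST": [1, 2], "FUDS": [3, 4], "US06": [5, 6]}
train, test = cross_profile_split(cycles, "US06")
```

Repeating the split with each cycle held out in turn gives the full cross-profile evaluation at a given temperature.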

4.2. Model Implementation and Training Protocol

The proposed model is implemented using the PyTorch deep learning framework. All experiments are conducted on a cloud server equipped with an NVIDIA GeForce RTX 5060 Ti GPU (16 GB VRAM), an AMD EPYC 7601 32-core CPU (8 vCPUs allocated), and 62 GB of RAM, using PyTorch 2.9.1 with Python 3.11. Key implementation parameters are listed below:
  • Data Preprocessing: The input features comprise the raw measurements of Voltage, Current, and Temperature. The sequence length is set via a sliding window approach and fixed at 20 for all experiments.
  • Network Configuration: To ensure a fair and robust comparison, all deep learning baselines and the data-driven backbone of our proposed physics-informed model are configured with comparable architectural scales, primarily centralized around a hidden state dimension of 32 and a linear output layer. Specifically, the GRU and LSTM models utilize a single recurrent layer with 32 hidden units, while the Bi-LSTM employs 32 units for each direction. The MLP contains two 32-unit hidden layers with ReLU activation. The CNN-LSTM prepends a 1D convolutional layer (32 filters, kernel size of 3, padding of 1) to a 32-unit LSTM. The Transformer uses 1 encoder layer, an embedding dimension of 128, 4 attention heads, and a feed-forward dimension of 512. For our proposed model, the 32-unit data-driven backbone is subsequently constrained by trainable physical parameters (e.g., R 0 , R 1 , C p ) to solve the FDEs.
  • Physics Parameters: Specifically for our proposed physics-informed model, the network incorporates both trainable and fixed physics-based parameters. The trainable parameters are the FO-ECM components R 0 , R 1 , and C p initialized as learnable variables with a starting value of 1.0. In contrast, the battery capacity Q is fixed at 2.9 Ah for the Panasonic dataset and 1.1 Ah for the CALCE dataset, and the scaling factor δ is set to 1.0. Additionally, the discrete fractional-order derivative uses a window length of 10 to balance computational efficiency with model performance.
  • Training Settings: All methods are evaluated under a common set of learning rates and batch sizes to ensure consistent comparison. Specifically, batch sizes of 128 and 256 are combined with learning rates of 1 × 10⁻³, 5 × 10⁻⁴, 2.5 × 10⁻⁴, 2 × 10⁻⁴, and 1 × 10⁻⁴ to explore the optimal trade-off between model convergence and generalization. Each model is trained for 100 epochs using the Adam optimizer [39]. A fixed random seed ensures reproducibility. It is worth noting that a fixed 100-epoch strategy was employed primarily to provide a strictly uniform computational budget across all evaluated architectures for a fair comparative baseline. While the physics-based loss terms in FDE-GRU act as strong regularizers to mitigate overfitting, future studies will adopt dynamic early stopping to further eliminate unnecessary computational overhead.
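The sliding-window preprocessing above can be sketched as follows; mapping each 20-step window of [Voltage, Current, Temperature] to the SOC at the window's final step is a common convention assumed here, not a detail stated in the paper:

```python
import numpy as np

def make_windows(features, soc, seq_len=20):
    """Slide a fixed-length window over the (V, I, T) series; label each
    window with the SOC at its final step (assumed labeling convention)."""
    X, y = [], []
    for i in range(len(features) - seq_len + 1):
        X.append(features[i:i + seq_len])
        y.append(soc[i + seq_len - 1])
    return np.array(X), np.array(y)

feats = np.random.rand(100, 3)       # 100 steps of Voltage, Current, Temperature
soc = np.linspace(1.0, 0.8, 100)     # synthetic reference SOC for illustration
X, y = make_windows(feats, soc)      # windows of shape (seq_len, 3)
```

The resulting tensors feed directly into a recurrent backbone with batch-first input of shape (batch, 20, 3).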

4.3. Testing and Validation Evaluation

To rigorously assess the robustness of the proposed FDE-GRU framework alongside the baseline models, a comprehensive cross-profile testing and validation protocol is established. During the evaluation phase, the network parameters are strictly frozen. The models are then validated on the completely unseen standard drive cycles assigned to the test set (e.g., US06, HWFET, UDDS, LA92, and NN for the Panasonic dataset, and the reserved test cycle for the CALCE dataset). This evaluation strategy ensures that the validation process rigorously tests the models’ adaptability to novel, highly dynamic load conditions that were not encountered during the training phase.
The performance of the SOC estimation is quantitatively evaluated using three standard statistical metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). These metrics provide a comprehensive assessment of the estimation accuracy. Specifically, MSE and RMSE are particularly sensitive to large localized errors (such as overshoots during aggressive current pulses), while MAE reflects the average absolute deviation over the entire driving cycle. The metrics are mathematically defined as follows:
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2,
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|,
\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 },
where y_i represents the true reference SOC value, \hat{y}_i denotes the predicted SOC value at the i-th time step, and N is the total number of samples in the validation sequence. To facilitate a clear presentation of the results across varying conditions, unless otherwise specified, the reported MSE values in the subsequent sections are scaled by 10⁴ (i.e., a presented value of x corresponds to an actual MSE of x × 10⁻⁴), while MAE and RMSE are reported as percentages (%).
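The three metrics, in the reporting convention just described (MSE scaled by 10⁴, MAE and RMSE as percentages), can be computed as:

```python
import numpy as np

def soc_metrics(y_true, y_pred):
    """Return (MSE x 1e4, MAE %, RMSE %) following the paper's reporting scale."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    return mse * 1e4, mae * 100.0, rmse * 100.0

y_true = np.array([0.50, 0.52, 0.54])   # toy SOC fractions
y_pred = np.array([0.51, 0.52, 0.52])
mse, mae, rmse = soc_metrics(y_true, y_pred)
```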

5. Results and Discussion

In this section, we comprehensively evaluate the proposed FDE-GRU framework. The assessment is conducted in four dimensions: (1) overall accuracy comparison against traditional methods; (2) an ablation study on the impact of the fractional order α to validate the necessity of fractional calculus, followed by a dedicated comparison between the optimal fractional-order ( α = 0.25 ) and integer-order ( α = 1.0 ) models; (3) robustness analysis across different temperature conditions; and (4) adaptability to highly dynamic drive cycles.

5.1. Overall Performance Comparison

This subsection evaluates the overall performance of all models on both the Panasonic 18650PF and CALCE A123 datasets. For the Panasonic dataset, the assessment is conducted across all temperature conditions (−20 °C, −10 °C, 0 °C, 10 °C, and 25 °C) and all drive cycles (US06, HWFET, UDDS, LA92, and NN). For the CALCE A123 dataset, which utilizes a Lithium Iron Phosphate (LiFePO4) chemistry, the evaluation covers temperatures from 0 °C to 50 °C and drive cycles including DST, US06, and FUDS. The performance is aggregated by computing the MSE across the entire test set for each respective dataset. This provides a comprehensive assessment of each model’s average accuracy under diverse operating conditions.
Table 1 presents the aggregate performance across all tested temperatures and drive cycles for the Panasonic dataset. The proposed FDE-GRU (α = 0.25) achieves the lowest average MSE of 14.29 × 10⁻⁴, demonstrating superior performance over all baselines.
Similarly, Table 2 presents the overall performance on the CALCE A123 dataset. The proposed FDE-GRU framework consistently outperforms all pure data-driven baselines on this LiFePO4 battery chemistry. Specifically, FDE-GRU achieves the lowest overall average MSE of 26.24 × 10⁻⁴ and an RMSE of 5.09%, reducing the estimation error by 26.2% compared to the standard GRU (35.55 × 10⁻⁴).
Compared to pure data-driven deep learning models including GRU, MLP, LSTM, BiLSTM, CNN-LSTM, and Transformer, the FDE-GRU demonstrates significant superiority. Specifically, for the Panasonic 18650PF battery dataset, it reduces the MSE by 35.6% compared to the standard GRU (22.19 × 10⁻⁴) and by 48.7% compared to the Transformer model (27.88 × 10⁻⁴). Similarly, for the CALCE A123 dataset, it reduces the MSE by 26.2% compared to the GRU (35.55 × 10⁻⁴) and by 47.6% compared to the Transformer (50.08 × 10⁻⁴). It is worth noting that complex architectures like Transformers and BiLSTMs did not achieve notable gains over the standard GRU in this task. This may be attributed to factors such as the limited dataset size or a mismatch between these high-capacity architectures and the relatively short sequence lengths used here. Without the regularization effect of physics constraints, increased model capacity appears prone to overfitting, an issue that the FDE-GRU effectively mitigates via its lightweight, physics-guided recurrent structure, achieving better accuracy with higher computational efficiency.
To provide visual insights into the model performance under specific conditions, Figure 3 and Figure 4 compare the SOC estimation results of four representative models (FDE-GRU, LSTM, GRU, and Transformer) across three representative temperatures (0 °C, 25 °C, and −20 °C) and four dynamic drive cycles (HWFET, LA92, UDDS, and US06) in the Panasonic dataset. These cycles were selected to represent different driving patterns: HWFET for highway conditions, LA92 for aggressive urban driving, UDDS for urban driving with frequent stops, and US06 for high-speed and high-acceleration scenarios. These figures provide preliminary visual evidence that the proposed FDE-GRU maintains closer tracking to the reference SOC compared to the GRU, LSTM, and Transformer models under all displayed conditions, suggesting superior robustness across varying temperatures and load profiles. The figures present the SOC estimation results, consisting of two components: (a) a comparison between the predicted and reference SOC values over time, and (b) the corresponding absolute error profile. To enhance clarity in visualizing the error trend, the data points in the absolute error plot (b) are presented at a subsampled interval of 100. A detailed quantitative analysis of the estimation errors under these specific scenarios is provided in the following subsection.

5.2. Impact of Fractional Order Kinetics: An Ablation Study

This subsection investigates the impact of the fractional order parameter α on SOC estimation accuracy. The evaluation is conducted across all temperature conditions and all drive cycles in the Panasonic 18650PF battery dataset to ensure a comprehensive assessment. For each value of α ∈ {0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9, 1.0}, the corresponding FDE-GRU model is tested on the complete test set comprising data from all five temperatures and all five drive cycles. The average MSE across all these conditions is computed to quantify the effect of varying α.
A core hypothesis of this work is that lithium-ion battery dynamics, particularly solid-phase diffusion and double-layer polarization, are inherently fractional-order processes. To verify this, we conducted an ablation study by varying the derivative order α in the physics-informed loss function L_frac. This ablation study utilizes the same dataset, data split, and model implementation details as described in Section 4.1 and Section 4.2, with the only variation being the value of the fractional order α in the physics-informed loss. We tested the values of α listed above, where α = 1.0 represents the standard Integer-Order Equivalent Circuit Model (typically used in existing PINNs).
The comparison results of estimation accuracy are summarized in Table 3. Additionally, Figure 5 visualizes both the average MSE and the single-training time across different α values.
The results yield two critical insights:
(1)
Physics-Informed vs. Data-Driven: Even with integer-order constraints (α = 1.0), the physics-informed model (MSE 18.34 × 10⁻⁴) significantly outperforms the pure data-driven GRU (MSE 22.19 × 10⁻⁴). This confirms that embedding Kirchhoff’s laws and mass conservation explicitly regularizes the network, preventing unphysical predictions.
(2)
Fractional vs. Integer: Notably, the model with α = 0.25 achieves the best performance among all tested configurations, yielding an improvement of approximately 22.1% over the integer-order counterpart ( α = 1.0 ). The overall trend suggests that models with α values in the lower to medium range (0.1–0.6) generally outperform those with α values closer to 1.0. This observation supports the electrochemical hypothesis that battery relaxation dynamics may be better described by power-law decay (Mittag-Leffler function) rather than pure exponential decay. The optimal α value of 0.25 in our experiments appears to capture the balance between modeling the long-memory effects characteristic of solid-phase lithium diffusion and maintaining numerical stability. However, we note that the specific optimal value may vary with battery chemistry and operating conditions, and further investigation is needed to establish a definitive physical interpretation.
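The long-memory claim can be made concrete by inspecting the G-L weights w_j^(α) themselves, computed with the standard recursion w_0 = 1, w_j = w_{j−1}(1 − (α + 1)/j). For α = 1 the weights vanish after one step (the derivative degenerates to a plain first difference), whereas for α = 0.25 they retain a slowly decaying power-law tail:

```python
def gl_weights(alpha, k):
    """Grunwald-Letnikov binomial weights via w_0 = 1, w_j = w_{j-1}*(1-(alpha+1)/j)."""
    w = [1.0]
    for j in range(1, k + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / j))
    return w

w_int = gl_weights(1.0, 5)    # integer order: memory of exactly one step
w_frac = gl_weights(0.25, 5)  # fractional order: slowly decaying tail
```

The nonzero tail of w_frac is what lets the fractional constraint penalize deviations from power-law relaxation over the whole memory window, rather than only between consecutive samples.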

5.3. Robustness Across Different Operating Temperatures

This subsection examines the robustness of all models under varying thermal conditions across both datasets. For the Panasonic dataset, the models are evaluated at each specific temperature (−20 °C, −10 °C, 0 °C, 10 °C, and 25 °C) on the corresponding test data, which include all five drive cycles (US06, HWFET, UDDS, LA92, and NN). The MSE values are averaged across all drive cycles at each temperature, allowing for a clear analysis of performance degradation as temperature decreases.
Evaluating model performance under varying thermal conditions is crucial, as temperature fluctuations significantly alter internal electrochemical impedances. Low temperatures, in particular, exacerbate the anomalous diffusion and nonlinear polarization effects, serving as a rigorous testbed for the model’s physical capture ability. Table 4 details the performance degradation as temperature drops from 25 °C to −20 °C.
Under varying thermal conditions, the performance of battery state estimation models diverges significantly due to temperature-dependent electrochemical dynamics. At benign room temperature (25 °C), where kinetics are fast and dynamics are nearly linear, most models perform well with relatively small performance gaps, with the FDE-GRU achieving a superior MSE of 2.52 × 10⁻⁴.
As temperature decreases, the degradation in model accuracy becomes increasingly pronounced, yet the FDE-GRU (α = 0.25) consistently exhibits superior robustness across all low-temperature regimes. At 0 °C, where electrolyte viscosity begins to rise and diffusion processes slow, the FDE-GRU maintains an MSE of 11.14 × 10⁻⁴, substantially outperforming the standard GRU (34.73 × 10⁻⁴), LSTM (34.66 × 10⁻⁴), and Transformer (30.67 × 10⁻⁴). This advantage persists at −10 °C, with the FDE-GRU achieving 18.68 × 10⁻⁴ compared to 23.20 × 10⁻⁴ for the GRU and 27.46 × 10⁻⁴ for the LSTM. Under extreme cold (−20 °C), where increased internal resistance and anomalous diffusion severely challenge purely data-driven models, the FDE-GRU again records the lowest error (33.73 × 10⁻⁴), significantly below the GRU (38.08 × 10⁻⁴) and MLP (59.85 × 10⁻⁴).
Crucially, the improvement of FDE-GRU (α = 0.25) over the integer-order FDE-GRU (α = 1.0) (MSE 41.10 × 10⁻⁴ at −20 °C) underscores the advantage of fractional calculus under low-temperature conditions. At reduced temperatures, many physical systems exhibit increased viscosity and slowed dynamics, often leading to non-Fickian diffusion [44]. The non-local kernel of the fractional derivative is mathematically suited to model such memory-dependent and viscoelastic-like transport behavior, enabling the neural network to generalize more effectively in these unseen, challenging regimes [45].
Furthermore, Table 5 details the temperature robustness on the CALCE A123 dataset across a wide thermal spectrum ranging from 0 °C to 50 °C. The FDE-GRU consistently secures the lowest MSE across all evaluated temperatures. The advantage remains significant at lower temperatures (e.g., 0 °C), demonstrating that the fractional-order physical constraints effectively regulate the neural network to capture the intensified memory effects and non-ideal electrochemical diffusion regardless of the specific battery chemistry.

5.4. Adaptability to Dynamic Drive Cycles

This subsection assesses the adaptability of all models to different driving patterns represented by various drive cycles. For the Panasonic dataset, the models are evaluated on the test data for each specific drive cycle (HWFET, LA92, US06, UDDS, and NN) across all temperature conditions. The MSE values are averaged across all temperatures for each drive cycle, enabling a comparative analysis of model performance under different dynamic load profiles.
We further evaluate the models under distinct drive cycles representing different driving behaviors: UDDS (Urban), HWFET (Highway), and US06 (Aggressive). The US06 cycle poses the greatest challenge due to its rapid high-current charge/discharge pulses.
As shown in Table 6, under the aggressive US06 cycle, the FDE-GRU achieves an MSE of 35.72 × 10⁻⁴, which is dramatically lower than the Transformer (71.84 × 10⁻⁴) and MLP (71.73 × 10⁻⁴). Large current rates typically induce sharp polarization voltages [46]. Purely data-driven models often overshoot or undershoot SOC during these spikes. The FDE-GRU is constrained by the fractional-order dynamics of the polarization voltage (governed by Equation (5) in Section 2.2):
D_t^{\alpha} U_1(t) = -\frac{1}{R_1 C_p} U_1(t) + \frac{1}{C_p} I(t).
This constraint enforces a specific trajectory for voltage recovery after current pulses. Consequently, even when the input current changes abruptly (as in US06), the network’s predicted SOC and internal states remain physically bounded, preventing the drift often seen in Transformers and LSTMs.
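A minimal sketch of how such a constraint can be discretized with the short-memory G-L approximation (window length k = 10, as in Section 4.2); the parameter values and the exact residual form used in the paper are assumptions here:

```python
def frac_residual(u1_hist, i_now, alpha=0.25, h=1.0, R1=1.0, Cp=1.0, k=10):
    """Residual of D_t^alpha U1 = -U1/(R1*Cp) + I/Cp under the short-memory
    G-L discretization; parameter values are illustrative placeholders."""
    w = [1.0]
    for j in range(1, k + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / j))   # G-L binomial weights
    recent_first = u1_hist[::-1][:k + 1]              # newest sample first
    d_alpha = sum(wj * uj for wj, uj in zip(w, recent_first)) / h ** alpha
    return d_alpha + u1_hist[-1] / (R1 * Cp) - i_now / Cp

# Sanity check at alpha = 1: reduces to (U1[n] - U1[n-1])/h + U1[n]/(R1*Cp) - I/Cp
r = frac_residual([0.0] * 11 + [0.5], i_now=1.0, alpha=1.0)
```

Squaring this residual over a batch gives a loss term of the L_frac type that penalizes physically implausible polarization trajectories.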
This robust adaptability extends to the CALCE A123 dataset, as shown in Table 7. Evaluating the models under DST, US06, and FUDS dynamic drive cycles reveals that the FDE-GRU maintains robust SOC tracking despite the complex load profiles. Importantly, it successfully overcomes the unique estimation challenges presented by the LiFePO4 chemistry, most notably its notoriously flat Open-Circuit Voltage (OCV) plateau.

5.5. Further Comparison of FDE-GRU ( α = 0.25 ) and FDE-GRU ( α = 1.0 )

To further elucidate the advantages of fractional-order dynamics over integer-order approximations, this subsection provides a detailed comparative analysis between FDE-GRU ( α = 0.25 ) and FDE-GRU ( α = 1.0 ) across various operating conditions in the Panasonic dataset. Figure 6 presents the absolute error distributions for representative drive cycles and temperatures, where the error distributions are visualized after removing the top 1% of extreme values to better highlight the central trends and typical performance differences between the two models.
As shown in Figure 6, while both models exhibit comparable accuracy under certain conditions, FDE-GRU ( α = 0.25 ) demonstrates a clear and consistent advantage in several scenarios, particularly during the HWFET cycle across different temperatures. Under this highway driving profile characterized by sustained moderate-current loads, the fractional-order model shows not only lower median error but also tighter error distributions, indicating superior robustness. This pattern is consistent with the aggregated performance metrics presented in Table 4 and Table 6, where FDE-GRU ( α = 0.25 ) achieves the lowest average MSE across all test conditions.
The observed performance superiority of the fractional-order model can be understood through the electrochemical characteristics of lithium-ion batteries. The HWFET cycle, with its relatively stable and prolonged current demands, emphasizes the battery’s long-term relaxation behavior and polarization dynamics. Fractional-order calculus, with its inherent memory kernel (Equation (2)), provides a mathematical framework that more naturally describes the power-law decay and long-memory effects observed in battery relaxation processes [30,35]. In contrast, the integer-order model ( α = 1.0 ) assumes exponential relaxation, which may inadequately capture these complex dynamics, potentially leading to the gradual error accumulation observed during sustained operation.
Furthermore, the enhanced performance of FDE-GRU ( α = 0.25 ) at lower temperatures (as evidenced in Figure 6 and Table 4) aligns with established electrochemical principles. Reduced temperatures are known to amplify non-ideal battery behavior, making diffusion processes more sub-diffusive and increasing the significance of memory effects [44]. The fractional-order derivative, with its parameter α < 1 , offers greater flexibility to model these temperature-dependent deviations from ideal behavior compared to the integer-order approximation.
While both models incorporate physical constraints, the fractional-order formulation appears to provide more accurate regularization for the neural network, particularly in conditions where long-term dynamics and temperature-dependent effects are prominent. This suggests that the FDE serves as a more faithful representation of the underlying electrochemical processes governing lithium-ion battery behavior.

5.6. Error Distribution Analysis

To assess the reliability and robustness of the models, we visualize the error distribution using violin plots (refer to Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11). Each violin plot is constructed from the MAE and RMSE values obtained by training and testing each model five times independently under each specific temperature and drive cycle in the Panasonic dataset. These plots collectively highlight the consistency and spread of estimation errors, enabling a direct comparison of model performance across the diverse scenarios studied.
Taking the 0 °C US06 case (Figure 7d) as an example, the FDE-GRU distribution is characterized by a high concentration around zero error and, more importantly, much shorter tails compared to BiLSTM and CNN-LSTM. Long tails in error distribution imply occasional large deviations, which are dangerous for BMS safety (potentially causing over-discharge). The compactness of the FDE-GRU’s error distribution is attributed to the dual-constraint mechanism: the Mass Conservation Law ( L m a s s ) locks the low-frequency SOC trend, while the Fractional Voltage Law ( L f r a c ) corrects the high-frequency polarization dynamics.
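The mass-conservation side of this dual constraint can be sketched with a simple Coulomb-counting discretization (discharge-positive sign convention per the nomenclature; the paper's exact discretization may differ):

```python
import numpy as np

def mass_loss(soc_pred, current, Q_ah=2.9, delta=1.0, h=1.0):
    """Coulomb-counting residual: successive SOC predictions must be
    consistent with the integrated current. Discretization is assumed."""
    q_as = Q_ah * 3600.0                                   # Ah -> ampere-seconds
    resid = (soc_pred[1:] - soc_pred[:-1]) + delta * h * current[1:] / q_as
    return np.mean(resid ** 2)

# A trajectory that exactly obeys Coulomb counting yields ~zero loss.
soc = np.array([1.0, 0.999, 0.998])
cur = np.array([3.6, 3.6, 3.6])   # 3.6 A on a 1.0 Ah cell -> dSOC = -0.001 per second
loss = mass_loss(soc, cur, Q_ah=1.0)
```

Any low-frequency drift in the predicted SOC trend inflates this term, which is why the mass constraint suppresses the long-tail errors discussed above.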

5.7. Computational Complexity Analysis

To evaluate the practical deployability of the proposed FDE-GRU, we analyze its computational cost in terms of training time and inference latency. All models were trained on the same hardware configuration described in Section 4. Training times were recorded as typical values for a single training run of 100 epochs with a batch size of 256. Inference latency was evaluated on a CPU with a batch size of one to simulate real-time BMS execution. After an initial warm-up phase, the average processing time per sample was calculated over 2000 independent inference runs. Notably, for the FDE-GRU, the physics-informed pathway (including the vectorized G-L fractional derivative calculation) is only utilized to compute the physical loss during the backpropagation training phase. During inference, this pathway is inherently bypassed, and the model operates strictly using the standard GRU forward pass.
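The latency protocol just described can be reproduced with a simple wall-clock harness; the callable stand-in below is a placeholder for the trained model, not the actual implementation:

```python
import time

def measure_latency(model_fn, sample, n_warmup=100, n_runs=2000):
    """Average per-sample wall-clock latency in milliseconds, after a
    warm-up phase, following the Section 5.7 protocol (batch size of one)."""
    for _ in range(n_warmup):
        model_fn(sample)                 # warm-up: caches, JIT, allocator
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model_fn(sample)
    return (time.perf_counter() - t0) / n_runs * 1e3

# Stand-in for a trained model: any callable taking one input window.
ms = measure_latency(lambda x: sum(x), [0.1] * 60)
```

Using a monotonic high-resolution clock and averaging over many runs keeps the measurement robust to scheduler jitter on a shared CPU.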
Table 8 summarizes the computational results. Among the recurrent architectures, the standard GRU is highly time-efficient during training (10 min). The proposed FDE-GRU requires a typical training time of 16 min, representing a moderate increase of 6 min. This overhead is entirely attributed to the additional fractional-order derivative computation inside the training loop. Nevertheless, this one-time offline training cost is highly acceptable. More importantly, the inference latency of the FDE-GRU (0.57 ms) is virtually identical to that of the standard GRU (0.56 ms), as the fractional-order constraints are excluded post-deployment. While simpler architectures like the MLP offer lower latency (0.05 ms), the proposed FDE-GRU achieves an excellent balance between dynamic modeling capability and computational cost. Furthermore, compared to other advanced architectures such as the Transformer, BiLSTM, and CNN-LSTM (1.54–2.44 ms), the FDE-GRU maintains a highly streamlined profile, making it exceptionally advantageous for real-time execution on resource-constrained embedded BMS hardware.
These results verify that the proposed physics-informed training strategy imposes only a modest increase in offline computational overhead while preserving the lightweight, high-efficiency characteristics of a standard GRU during online inference, making it highly suitable for real-world deployment.

6. Conclusions

In this paper, we proposed a novel Fractional Differential Physics-Informed Neural Network (FDE-GRU) to address the challenges of SOC estimation under extreme temperatures and dynamic loads. By embedding the G-L fractional derivative directly into the recurrent neural network’s loss function and leveraging the inherent symmetry in charge conservation to construct physical constraints, we successfully bridged the gap between data-driven flexibility and electrochemical physical consistency. The proposed FDE-GRU demonstrates superior estimation accuracy, achieving an average MSE of 14.29 × 10⁻⁴ on the Panasonic dataset and significantly outperforming standard GRUs (22.19 × 10⁻⁴) and complex Transformers (27.88 × 10⁻⁴), which suggests that physics-informed regularization may offer advantages over simply increasing model complexity. Crucially, this superiority generalizes to the CALCE dataset with its notoriously flat LiFePO4 OCV plateau, where the FDE-GRU secures an average MSE of 26.24 × 10⁻⁴. Through a rigorous ablation study, we validated the fractional dynamics by identifying an optimal fractional order of α = 0.25, surpassing the integer-order PINN (α = 1.0) by approximately 22.1%, thereby providing strong empirical evidence that battery relaxation processes follow power-law decays and anomalous diffusion rather than the ideal exponential decay assumed by traditional integer-order models. Future work will focus on two directions: (1) extending the fractional-order constraint to multi-cell battery packs to account for cell inconsistencies; and (2) deploying the simplified algorithm onto low-power embedded microcontrollers to validate its real-time performance in onboard hardware.

Author Contributions

Conceptualization, L.D.; methodology, L.D. and L.K.; investigation, L.K.; data curation, L.K.; formal analysis, L.K.; visualization, L.K.; writing—original draft preparation, L.K.; writing—review and editing, L.K. and L.D.; supervision, L.D.; project administration, L.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62401453), Fundamental Research Funds for Xi’an Jiaotong University (No. xzy012023134), and China Postdoctoral Science Foundation (No. 2023M732792).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Symbol   Description
V(t)   Terminal voltage of the battery (V)
I(t)   Load current, positive for discharge (A)
T(t)   Battery ambient temperature (°C)
U_OCV   Open-circuit voltage (V)
U_1   Polarization voltage across the RC/CPE loop (V)
R_0   Ohmic internal resistance (Ω)
R_1   Polarization resistance (Ω)
C_p   Generalized capacitance of the CPE (F·s^(α−1))
α   Fractional order of the derivative (0 < α < 1)
D_t^α   Grünwald–Letnikov fractional derivative operator
Q   Nominal capacity of the battery (Ah)
δ   Coulombic efficiency
h   Sampling time step (s)
w_j^(α)   Weighting coefficients for the fractional derivative
k   Memory window length for fractional calculation
h_t   Hidden state vector of the GRU at time t
L_total   Composite loss function
L_mass   Mass conservation residual loss (based on Coulomb counting)
L_frac   Fractional polarization residual loss (based on FDE)
Θ, ϕ   Network and physical parameters to be optimized

Abbreviations

The following abbreviations are used in this manuscript:
Adam   Adaptive Moment Estimation
Bi-LSTM   Bidirectional Long Short-Term Memory
BMS   Battery Management System
CNN-LSTM   Convolutional Neural Network–Long Short-Term Memory
CPE   Constant Phase Element
DL   Deep Learning
ECM   Equivalent Circuit Model
EIS   Electrochemical Impedance Spectroscopy
EM   Electrochemical Model
EV   Electric Vehicle
FDE   Fractional-Order Differential Equation
F-EKF   Fractional Extended Kalman Filter
FO-ECM   Fractional-Order Equivalent Circuit Model
FOM   Fractional-Order Model
G-L   Grünwald–Letnikov
GRU   Gated Recurrent Unit
IDE   Integer-Order Differential Equation
KF   Kalman Filter
LHS   Left-Hand Side
LIB   Lithium-Ion Battery
LiFePO4   Lithium Iron Phosphate
LSTM   Long Short-Term Memory
MAE   Mean Absolute Error
MLP   Multi-Layer Perceptron
MSE   Mean Squared Error
NCA   Nickel Cobalt Aluminum
NN   Neural Network
OCV   Open-Circuit Voltage
PINN   Physics-Informed Neural Network
RHS   Right-Hand Side
RMSE   Root Mean Squared Error
RNN   Recurrent Neural Network
SOC   State of Charge

References

  1. Guo, W.; Feng, T.; Li, W.; Hua, L.; Meng, Z.; Li, K. Comparative life cycle assessment of sodium-ion and lithium iron phosphate batteries in the context of carbon neutrality. J. Energy Storage 2023, 72, 108589. [Google Scholar] [CrossRef]
  2. Lai, X.; Chen, Q.; Gu, H.; Han, X.B.; Zheng, Y.J. Life cycle assessment of lithium-ion batteries for carbon-peaking and carbon-neutrality: Framework, methods, and progress. J. Mech. Eng. 2022, 58, 3–18. [Google Scholar]
  3. Zhang, Y.; Yang, F.; Zang, C.; Zhou, Z.; Zhao, Y.; Wan, H. Development Prospect of Energy Storage Technology and Application Under the Goal of Carbon Peaking and Carbon Neutrality. In Proceedings of the 2022 5th International Conference on Energy, Electrical and Power Engineering (CEEPE); IEEE: Piscataway, NJ, USA, 2022; pp. 1049–1054. [Google Scholar]
  4. Ali, M.U.; Zafar, A.; Nengroo, S.H.; Hussain, S.; Junaid Alvi, M.; Kim, H.J. Towards a smarter battery management system for electric vehicle applications: A critical review of lithium-ion battery state of charge estimation. Energies 2019, 12, 446. [Google Scholar] [CrossRef]
  5. Shete, S.; Jog, P.; Kumawat, R.; Palwalia, D. Battery management system for soc estimation of lithium-ion battery in electric vehicle: A review. In Proceedings of the 2021 6th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE); IEEE: Piscataway, NJ, USA, 2021; Volume 6, pp. 1–4. [Google Scholar]
  6. Mukherjee, S.; Chowdhury, K. State of charge estimation techniques for battery management system used in electric vehicles: A review. Energy Syst. 2025, 16, 1521–1564. [Google Scholar] [CrossRef]
  7. Chen, L.; Zeng, S.; Li, J.; Li, K.; Ma, R.; Liu, J.; Wu, W. Safety assessment of overcharged batteries and a novel passive warning method based on relaxation expansion force. J. Energy Chem. 2025, 105, 595–607. [Google Scholar] [CrossRef]
  8. Ji, C.; Zhang, S.; Wang, B.; Sun, J.; Zhang, Z.; Liu, Y. Study on thermal safety of the overcharged lithium-ion battery. Fire Technol. 2023, 59, 1089–1114. [Google Scholar] [CrossRef]
  9. Di Liberto, E.; Borchiellini, R.; Fruhwirt, D.; Papurello, D. A Review of Safety Measures in Battery Electric Buses. Fire 2025, 8, 159. [Google Scholar] [CrossRef]
  10. Movahedi, H.; Tian, N.; Fang, H.; Rajamani, R. Hysteresis compensation and nonlinear observer design for state-of-charge estimation using a nonlinear double-capacitor li-ion battery model. IEEE/ASME Trans. Mechatron. 2021, 27, 594–604. [Google Scholar] [CrossRef]
  11. Hasan, R. Fractional Modelling of Rechargeable Batteries. Ph.D. Thesis, The University of Waikato, Hamilton, New Zealand, 2021. [Google Scholar]
  12. Vandeputte, F.; Hallemans, N.; Lataire, J. Parametric estimation of arbitrary fractional order models for battery impedances. IFAC-PapersOnLine 2024, 58, 97–102. [Google Scholar] [CrossRef]
  13. Guo, D.; Yang, G.; Feng, X.; Han, X.; Lu, L.; Ouyang, M. Physics-based fractional-order model with simplified solid phase diffusion of lithium-ion battery. J. Energy Storage 2020, 30, 101404. [Google Scholar] [CrossRef]
  14. Mu, H.; Xiong, R.; Zheng, H.; Chang, Y.; Chen, Z. A novel fractional order model based state-of-charge estimation method for lithium-ion battery. Appl. Energy 2017, 207, 384–393. [Google Scholar] [CrossRef]
  15. Xiong, R.; Tian, J.; Shen, W.; Sun, F. A novel fractional order model for state of charge estimation in lithium ion batteries. IEEE Trans. Veh. Technol. 2018, 68, 4130–4139. [Google Scholar] [CrossRef]
  16. Zou, C.; Zhang, L.; Hu, X.; Wang, Z.; Wik, T.; Pecht, M. A review of fractional-order techniques applied to lithium-ion batteries, lead-acid batteries, and supercapacitors. J. Power Sources 2018, 390, 286–296. [Google Scholar] [CrossRef]
  17. Wang, Y.; Tian, J.; Sun, Z.; Wang, L.; Xu, R.; Li, M.; Chen, Z. A comprehensive review of battery modeling and state estimation approaches for advanced battery management systems. Renew. Sustain. Energy Rev. 2020, 131, 110015. [Google Scholar] [CrossRef]
  18. Liu, Y.; He, Y.; Bian, H.; Guo, W.; Zhang, X. A review of lithium-ion battery state of charge estimation based on deep learning: Directions for improvement and future trends. J. Energy Storage 2022, 52, 104664. [Google Scholar] [CrossRef]
  19. Zhang, D.; Zhong, C.; Xu, P.; Tian, Y. Deep learning in the state of charge estimation for li-ion batteries of electric vehicles: A review. Machines 2022, 10, 912. [Google Scholar] [CrossRef]
  20. Naguib, M.; Kollmeyer, P.J.; Emadi, A. State of charge estimation of lithium-ion batteries: Comparison of GRU, LSTM, and temporal convolutional deep neural networks. In Proceedings of the 2023 IEEE Transportation Electrification Conference & Expo (ITEC); IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  21. Gole, A.C.; Aher, P.K.; Patil, S.L. SOC Estimation of a Li-ion Battery using Deep Learning Method: A comparative Study of LSTM and GRU Architecture. In Proceedings of the 2023 11th National Power Electronics Conference (NPEC); IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  22. Zhu, Y.; Cheng, J.; Liu, Z.; Zou, X.; Wang, Z.; Cheng, Q.; Xu, H.; Wang, Y.; Tao, F. Remaining useful life prediction approach based on data model fusion: An application in rolling bearings. IEEE Sens. J. 2024, 24, 42230–42244. [Google Scholar] [CrossRef]
  23. Liu, Z.; Cai, L.; Zhang, J.; He, Y.; Ren, Z.; Ding, C. Predicting Airplane Cabin Temperature Using a Physics-Informed Neural Network Based on a Priori Monotonicity. Aerospace 2025, 12, 988. [Google Scholar] [CrossRef]
  24. Cuomo, S.; Di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific machine learning through physics–informed neural networks: Where we are and what’s next. J. Sci. Comput. 2022, 92, 88. [Google Scholar] [CrossRef]
  25. Dang, L.; Yang, J.; Liu, M.Q.; Chen, B. Differential Equation-Informed Neural Networks for State-of-Charge Estimation. IEEE Trans. Instrum. Meas. 2024, 73, 1000315. [Google Scholar] [CrossRef]
  26. Singh, S.; Ebongue, Y.E.; Rezaei, S.; Birke, K.P. Hybrid modeling of lithium-ion battery: Physics-informed neural network for battery state estimation. Batteries 2023, 9, 301. [Google Scholar] [CrossRef]
  27. Puchalski, B. Neural approximators for variable-order fractional calculus operators (VO-FC). IEEE Access 2022, 10, 7989–8004. [Google Scholar] [CrossRef]
  28. Matusiak, M. Optimization for software implementation of fractional calculus numerical methods in an embedded system. Entropy 2020, 22, 566. [Google Scholar] [CrossRef]
  29. Podlubny, I. Fractional Differential Equations: An Introduction to Fractional Derivatives, Fractional Differential Equations, to Methods of Their Solution and Some of Their Applications; Elsevier: Amsterdam, The Netherlands, 1998; Volume 198. [Google Scholar]
  30. Sibatov, R.T.; Svetukhin, V.V.; Kitsyuk, E.P.; Pavlov, A.A. Fractional differential generalization of the single particle model of a lithium-ion cell. Electronics 2019, 8, 650. [Google Scholar] [CrossRef]
  31. Wang, J.; Zhang, L.; Xu, D.; Zhang, P.; Zhang, G. A Simplified Fractional Order Equivalent Circuit Model and Adaptive Online Parameter Identification Method for Lithium-Ion Batteries. Math. Probl. Eng. 2019, 2019, 6019236. [Google Scholar] [CrossRef]
  32. Bard, A.J.; Faulkner, L.R.; White, H.S. Electrochemical Methods: Fundamentals and Applications, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2022. [Google Scholar]
  33. Westerhoff, U.; Kurbach, K.; Lienesch, F.; Kurrat, M. Analysis of lithium-ion battery models based on electrochemical impedance spectroscopy. Energy Technol. 2016, 4, 1620–1630. [Google Scholar] [CrossRef]
  34. Laakso, E.; Efimova, S.; Colalongo, M.; Kauranen, P.; Lahtinen, K.; Napolitano, E.; Ruiz, V.; Moškon, J.; Gaberšček, M.; Park, J.; et al. Aging mechanisms of NMC811/Si-Graphite Li-ion batteries. J. Power Sources 2024, 599, 234159. [Google Scholar] [CrossRef]
  35. Mainardi, F. Fractional Calculus and Waves in Linear Viscoelasticity: An Introduction to Mathematical Models; World Scientific: Singapore, 2022. [Google Scholar]
  36. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  37. Tay, Y.; Dehghani, M.; Bahri, D.; Metzler, D. Efficient transformers: A survey. ACM Comput. Surv. 2022, 55, 1–28. [Google Scholar] [CrossRef]
  38. Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125. [Google Scholar]
39. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  40. Wang, S.; Teng, Y.; Perdikaris, P. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM J. Sci. Comput. 2021, 43, A3055–A3081. [Google Scholar] [CrossRef]
  41. Rathore, P.; Lei, W.; Frangella, Z.; Lu, L.; Udell, M. Challenges in training pinns: A loss landscape perspective. arXiv 2024, arXiv:2402.01868. [Google Scholar] [CrossRef]
  42. Kollmeyer, P. Panasonic 18650PF Li-ion Battery Data. Mendeley Data 2018. [Google Scholar] [CrossRef]
  43. Center for Advanced Life Cycle Engineering (CALCE). Battery Data. 2024. Available online: https://calce.umd.edu/battery-data (accessed on 30 January 2025).
  44. Lu, X.; Li, H.; Chen, N. Analysis of the properties of fractional heat conduction in porous electrodes of lithium-ion batteries. Entropy 2021, 23, 195. [Google Scholar] [CrossRef]
  45. Hristov, J. Non-local kinetics: Revisiting and updates emphasizing fractional calculus applications. Symmetry 2023, 15, 632. [Google Scholar] [CrossRef]
46. Jiang, Y.; Zhang, C.; Zhang, W.; Shi, W.; Liu, Q. Modeling charge polarization voltage for large lithium-ion batteries in electric vehicles. J. Ind. Eng. Manag. 2013, 6, 686–697. [Google Scholar] [CrossRef]
Figure 1. FO-ECM: the fractional-order equivalent circuit model.
Figure 2. Architecture of proposed methods.
Figure 3. Comparison of FDE-GRU, LSTM, GRU, and Transformer models for SOC estimation across different temperature and drive cycles (Panasonic 18650PF).
Figure 4. Comparison of FDE-GRU, LSTM, GRU, and Transformer models for SOC estimation error across different temperature and drive cycles (Panasonic 18650PF).
Figure 5. Impact of fractional order α on SOC estimation accuracy (Average MSE ×10⁻⁴) and single-training time (min).
Figure 6. Comparison of FDE-GRU ( α = 0.25 ) and FDE-GRU ( α = 1.0 ) models for SOC estimation across representative drive cycles and temperatures.
Figure 7. Violin plot comparisons of different test cycles under 0 °C.
Figure 8. Violin plot comparisons of different test cycles under 10 °C.
Figure 9. Violin plot comparisons of different test cycles under −10 °C.
Figure 10. Violin plot comparisons of different test cycles under −20 °C.
Figure 11. Violin plot comparisons of different test cycles under 25 °C.
Table 1. Average MSE (×10⁻⁴), MAE (%), and RMSE (%) of all models under various drive cycles across all temperatures (Panasonic 18650PF).

| Metric | FDE-GRU (α = 0.25) | FDE-GRU (α = 1.0) | GRU | MLP | LSTM | CNN-LSTM | Bi-LSTM | Transformer |
|---|---|---|---|---|---|---|---|---|
| MSE (×10⁻⁴) | **14.29** | 18.34 | 22.19 | 29.29 | 24.53 | 26.86 | 23.39 | 27.88 |
| MAE (%) | **2.43** | 2.79 | 2.97 | 3.62 | 3.25 | 3.34 | 3.17 | 3.63 |
| RMSE (%) | **3.23** | 3.70 | 4.05 | 4.79 | 4.31 | 4.48 | 4.28 | 4.72 |

Note: Bold values indicate the best performance.
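For clarity on units, the error metrics in Table 1 can be computed as follows. This is a minimal sketch, not the authors' code: it assumes SOC is normalized to [0, 1], so MSE is naturally on a ×10⁻⁴ scale while MAE and RMSE are reported as percentages (the helper name `soc_metrics` is ours):

```python
import numpy as np

def soc_metrics(soc_true, soc_pred):
    """Return (MSE * 1e4, MAE in %, RMSE in %) for SOC on the [0, 1] scale."""
    err = np.asarray(soc_pred, dtype=float) - np.asarray(soc_true, dtype=float)
    mse = np.mean(err ** 2)          # dimensionless, typically O(1e-3)
    mae = np.mean(np.abs(err))       # dimensionless mean absolute error
    rmse = np.sqrt(mse)              # same units as SOC
    return mse * 1e4, mae * 100.0, rmse * 100.0

# A constant 3% over-estimation gives MSE*1e4 = 9, MAE = 3%, RMSE = 3%.
m, a, r = soc_metrics(np.full(100, 0.50), np.full(100, 0.53))
```

Note that, because the tables average these metrics over several drive cycles and temperatures, the reported average RMSE is not simply the square root of the reported average MSE.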
Table 2. Average MSE (×10⁻⁴), MAE (%), and RMSE (%) of all models under various drive cycles across all temperatures (CALCE A123).

| Metric | FDE-GRU | BiLSTM | LSTM | GRU | CNN-LSTM | Transformer |
|---|---|---|---|---|---|---|
| MSE (×10⁻⁴) | **26.24** | 32.63 | 34.01 | 35.55 | 41.60 | 50.08 |
| RMSE (%) | **5.09** | 5.63 | 5.75 | 5.86 | 6.39 | 6.87 |
| MAE (%) | **3.75** | 4.18 | 4.31 | 4.39 | 4.55 | 5.07 |

Note: Bold values indicate the best performance.
Table 3. Impact of fractional order α on SOC estimation accuracy (Average MSE ×10⁻⁴).

| Model Configuration | Average MSE |
|---|---|
| FDE-GRU (α = 0.25) | **14.29** |
| FDE-GRU (α = 0.6) | 15.50 |
| FDE-GRU (α = 0.3) | 16.15 |
| FDE-GRU (α = 0.2) | 15.75 |
| FDE-GRU (α = 0.5) | 16.72 |
| FDE-GRU (α = 0.1) | 16.66 |
| FDE-GRU (α = 0.4) | 16.58 |
| FDE-GRU (α = 0.7) | 16.91 |
| FDE-GRU (α = 0.9) | 17.23 |
| FDE-GRU (α = 0.8) | 17.82 |
| FDE-GRU (α = 1.0) | 18.34 |
| FDE-GRU (α = 0.75) | 18.36 |
| Standard GRU | 22.19 |

Note: Bold values indicate the best performance.
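The sweep over α in Table 3 amounts to varying the order of the fractional derivative in the governing FO-ECM equations. As an illustration only (not the authors' implementation), the Grünwald–Letnikov definition commonly used to discretize fractional-order battery dynamics can be sketched as follows; the full-history formulation and the function name are our assumptions, and a real BMS would typically truncate to a short memory window:

```python
import numpy as np

def gl_fractional_derivative(x, alpha, h=1.0):
    """Grünwald–Letnikov approximation of the order-alpha derivative of a
    sampled signal x with step h, using the full available history."""
    n = len(x)
    # Binomial weights w_j = (-1)^j * C(alpha, j), built recursively.
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    # D^alpha x[k] ≈ h^(-alpha) * sum_{j=0}^{k} w_j * x[k-j]
    d = np.array([np.dot(w[:k + 1], x[k::-1]) for k in range(n)])
    return d / h ** alpha

# Sanity check: for alpha = 1 the weights collapse to (1, -1, 0, ...),
# i.e. a plain backward first difference.
x = np.array([0.0, 1.0, 4.0, 9.0])
print(gl_fractional_derivative(x, alpha=1.0))  # -> [0. 1. 3. 5.]
```

For 0 < α < 1 the weights decay slowly rather than vanishing after one step, which is exactly the long-memory behavior the FDE-PINN loss is designed to exploit.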
Table 4. Average MSE (×10⁻⁴) of all models under various drive cycles across different temperatures.

| T/°C | FDE-GRU (α = 0.25) | FDE-GRU (α = 1.0) | GRU | MLP | LSTM | CNN-LSTM | Bi-LSTM | Transformer |
|---|---|---|---|---|---|---|---|---|
| −20 | **33.73** | 41.10 | 38.08 | 59.85 | 37.23 | 40.25 | 37.53 | 44.30 |
| −10 | **18.68** | 21.14 | 23.20 | 32.06 | 27.46 | 30.74 | 26.63 | 30.79 |
| 0 | **11.14** | 18.59 | 34.73 | 30.69 | 34.66 | 37.77 | 32.71 | 30.67 |
| 10 | **5.37** | 7.37 | 9.15 | 16.46 | 16.77 | 20.69 | 14.37 | 24.06 |
| 25 | **2.52** | 3.52 | 5.78 | 7.39 | 6.53 | 4.84 | 5.69 | 9.55 |

Note: Bold values indicate the best performance.
Table 5. Average MSE (×10⁻⁴) of all models across different temperatures on the CALCE A123 dataset.

| T/°C | FDE-GRU | BiLSTM | LSTM | GRU | CNN-LSTM | Transformer |
|---|---|---|---|---|---|---|
| 0 | **32.84** | 46.15 | 48.48 | 44.13 | 46.26 | 53.55 |
| 10 | **31.17** | 33.70 | 34.98 | 36.06 | 39.78 | 48.03 |
| 20 | **22.99** | 42.62 | 38.52 | 45.69 | 43.22 | 65.85 |
| 25 | **24.79** | 30.14 | 31.83 | 33.17 | 41.04 | 60.78 |
| 30 | **24.06** | 24.54 | 28.95 | 31.75 | 40.40 | 52.53 |
| 40 | **24.72** | 27.12 | 27.23 | 30.77 | 39.79 | 29.44 |
| 50 | **23.14** | 24.13 | 28.11 | 27.29 | 40.74 | 40.39 |

Note: Bold values indicate the best performance.
Table 6. Average MSE (×10⁻⁴) of all models under different drive cycles across all temperatures.

| Condition | FDE-GRU (α = 0.25) | FDE-GRU (α = 1.0) | GRU | MLP | LSTM | CNN-LSTM | Bi-LSTM | Transformer |
|---|---|---|---|---|---|---|---|---|
| HWFET | **7.22** | 11.19 | 10.55 | 12.86 | 10.48 | 10.52 | 11.20 | 11.31 |
| LA92 | **7.67** | 8.81 | 8.58 | 17.16 | 9.61 | 10.11 | 9.87 | 12.53 |
| US06 | **35.72** | 46.68 | 56.12 | 71.73 | 65.65 | 72.60 | 57.05 | 71.84 |
| UDDS | **4.51** | 6.01 | 7.83 | 11.93 | 8.75 | 11.29 | 10.49 | 14.43 |
| NN | **16.30** | 19.02 | 27.85 | 32.78 | 28.17 | 29.76 | 28.32 | 29.28 |

Note: Bold values indicate the best performance.
Table 7. Average MSE (×10⁻⁴) of all models under different drive cycles on the CALCE A123 dataset.

| Condition | FDE-GRU | BiLSTM | LSTM | GRU | CNN-LSTM | Transformer |
|---|---|---|---|---|---|---|
| DST | **29.22** | 41.97 | 43.47 | 48.54 | 53.56 | 71.13 |
| US06 | **20.81** | 26.00 | 25.07 | 24.85 | 31.64 | 32.42 |
| FUDS | **28.70** | 29.92 | 33.51 | 33.26 | 39.61 | 46.70 |

Note: Bold values indicate the best performance.
Table 8. Comparison of training time and inference latency. Training time is reported as a typical value for a single training run of 100 epochs on the full training set; inference latency is the average per-sample time on CPU with a batch size of 1.

| Model | Training Time (min) | Inference Latency (ms) |
|---|---|---|
| FDE-GRU | 16 | 0.57 |
| GRU | 10 | 0.56 |
| MLP | 7 | 0.05 |
| LSTM | 11 | 1.27 |
| CNN-LSTM | 12 | 1.54 |
| Transformer | 16 | 1.80 |
| BiLSTM | 13 | 2.44 |
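The latency protocol described in the Table 8 caption (per-sample timing on CPU with batch size 1) can be reproduced with a simple harness such as the sketch below; `predict` stands for any single-sample inference callable, and the function name `mean_latency_ms` is ours:

```python
import time

def mean_latency_ms(predict, sample, n_warmup=10, n_runs=1000):
    """Average per-call latency (ms) of predict(sample), i.e. batch size 1."""
    for _ in range(n_warmup):          # warm-up: exclude one-off setup costs
        predict(sample)
    t0 = time.perf_counter()           # monotonic, high-resolution clock
    for _ in range(n_runs):
        predict(sample)
    return (time.perf_counter() - t0) / n_runs * 1e3

# Usage with any model wrapper; a trivial stand-in shown here:
latency = mean_latency_ms(lambda x: [v * 2 for v in x], [0.1, 0.2, 0.3])
```

Averaging over many runs after a warm-up phase keeps one-time allocation and cache effects out of the reported per-sample figure.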

Share and Cite

MDPI and ACS Style

Ke, L.; Dang, L. A Physics-Informed Recurrent Neural Network with Fractional-Order Kinetics for Robust Lithium-Ion Battery State of Charge Estimation. Symmetry 2026, 18, 652. https://doi.org/10.3390/sym18040652
