Article

U-H-Mamba: An Uncertainty-Aware Hierarchical State-Space Model for Lithium-Ion Battery Remaining Useful Life Prediction Using Hybrid Laboratory and Real-World Datasets

1
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
2
Shanghai Research Institute of Microelectronics, Peking University, Shanghai 201203, China
*
Authors to whom correspondence should be addressed.
Energies 2026, 19(2), 414; https://doi.org/10.3390/en19020414
Submission received: 12 December 2025 / Revised: 2 January 2026 / Accepted: 9 January 2026 / Published: 14 January 2026
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

Abstract

Accurate prognosis of the remaining useful life (RUL) for lithium-ion batteries is critical for mitigating range anxiety and ensuring the operational safety of electric vehicles. However, existing data-driven methods often struggle to maintain robustness when transferring from controlled laboratory conditions to complex, sensor-limited, real-world environments. To bridge this gap, this study presents U-H-Mamba, a novel uncertainty-aware hierarchical framework trained on a massive hybrid repository comprising over 146,000 charge–discharge cycles from both laboratory benchmarks and operational electric vehicle datasets. The proposed architecture employs a two-level design to decouple degradation dynamics, where a Multi-scale Temporal Convolutional Network functions as the base encoder to extract fine-grained electrochemical fingerprints, including derived virtual impedance proxies, from high-frequency intra-cycle measurements. Subsequently, an enhanced Pressure-Aware Multi-Head Mamba decoder models the long-range inter-cycle degradation trajectories with linear computational complexity. To guarantee reliability in safety-critical applications, a hybrid uncertainty quantification mechanism integrating Monte Carlo Dropout with Inductive Conformal Prediction is implemented to generate calibrated confidence intervals. Extensive empirical evaluations demonstrate the framework’s superior performance, achieving an RMSE of 3.2 cycles on the NASA dataset and 5.4 cycles on the highly variable NDANEV dataset, thereby outperforming state-of-the-art baselines by 20–40%. Furthermore, SHAP-based interpretability analysis confirms that the model correctly identifies physics-informed pressure dynamics as critical degradation drivers, validating its zero-shot generalization capabilities. With high accuracy and linear scalability, the U-H-Mamba model offers a viable and physically interpretable solution for cloud-based prognostics in large-scale electric vehicle fleets.

1. Introduction

Lithium-ion batteries (LIBs) have revolutionized energy storage owing to their exceptional energy density, long cycle life, and scalable manufacturability, forming the technological cornerstone of modern electric vehicles (EVs) and renewable integration systems [1,2]. However, the electrochemical aging of LIBs—manifested through irreversible degradation of active materials, solid–electrolyte interphase (SEI) growth, and impedance rise—inevitably deteriorates performance and shortens operational lifespan [3,4]. This degradation not only compromises range and efficiency but also poses risks to system safety and reliability [5]. Consequently, accurately quantifying a battery’s state of health (SOH) and predicting its remaining useful life (RUL) have become pivotal challenges for intelligent battery management systems (BMSs) [6,7,8].
To address this challenge, an ever-growing body of research has focused on data-driven methodologies for battery prognostics, seeking to replace empirical lookup tables and physics-based models with adaptive learning frameworks [9,10]. Early work relied on handcrafted feature extraction from incremental capacity (IC), differential voltage (DV), and impedance curves, feeding classical regressors such as support vector machines (SVMs) and Gaussian process regression (GPR) [11,12]. These approaches offered transparency and efficiency but lacked robustness under varying operational and environmental conditions [13,14]. The emergence of deep learning has since transformed the field, allowing end-to-end modeling of temporal degradation processes directly from raw sensor data [15,16].
Recurrent neural networks (RNNs) and their gated extensions—long short-term memory (LSTM) and gated recurrent units (GRUs)—have been widely employed to capture long-term dependencies in cycling data [17,18]. For example, Patil et al. demonstrated that a Bi-GRU network effectively models both intra-cycle voltage fluctuations and inter-cycle degradation trends, outperforming traditional regressors on NASA and KIT datasets [19]. Complementarily, convolutional neural networks (CNNs) have been introduced to learn localized temporal–spatial patterns in charge–discharge profiles, providing interpretable morphology-aware features [20,21]. Hybrid networks combining CNN and LSTM modules have shown superior predictive accuracy and noise tolerance under real-world EV operation [22,23,24].
Beyond sequential architectures, researchers have increasingly explored attention mechanisms and transformer frameworks, originally designed for natural language processing, to model nonlinear long-range interactions across hundreds of cycles [25,26]. Chen et al. introduced a transformer network for RUL prediction, achieving state-of-the-art performance with interpretable attention maps that highlight degradation-critical cycles [27]. In parallel, the integration of physics-informed neural networks (PINNs) [28,29] and hybrid electrochemical–data models [30] has fostered physically consistent learning paradigms that constrain data-driven predictions within electrochemical laws. These developments bridge the interpretability gap between pure data-driven models and mechanistic understanding [31,32,33].
Meanwhile, open-access battery-aging datasets have been instrumental in advancing reproducible research [34,35,36,37,38,39]. The NASA PCoE battery dataset [33] provides high-frequency voltage, current, and impedance data under variable profiles, while the Oxford dataset [35] captures long-term degradation under realistic drive cycles. The CALCE dataset [37], collected at the University of Maryland, offers detailed cell-level impedance evolution for diverse chemistries (LCO, LFP, NMC). Collectively, these datasets have become benchmarks for evaluating new prognostic algorithms, encouraging standardized validation protocols and fair performance comparison.
Despite significant progress, three critical limitations persist. First, most architectures treat the entire battery life sequence as a flat, homogeneous time series, overlooking the inherently hierarchical structure of degradation. In reality, aging unfolds across two coupled scales: fast intra-cycle dynamics capturing electrochemical kinetics and slow inter-cycle evolution governing capacity fade [40,41,42,43]. Ignoring this hierarchy not only inflates computational cost but also restricts the model’s ability to capture multi-scale degradation behavior. Second, current models largely deliver deterministic point estimates, neglecting uncertainty quantification crucial for safety-critical BMS decision-making [44,45,46]. Third, the training of high-capacity neural models still depends heavily on balanced datasets, while real-world EV data remain sparse, noisy, and non-stationary [47,48].
To overcome these limitations, we propose U-H-Mamba, an uncertainty-aware hierarchical state-space model designed to jointly exploit the multi-scale structure of battery degradation and quantify predictive confidence. The framework comprises two synergistic components: a temporal convolutional network (TCN) encoder that distills high-frequency intra-cycle fingerprints and an enhanced pressure-aware multi-head Mamba-based state-space decoder that models long-range inter-cycle dependencies with linear computational complexity [49,50,51,52]. This hierarchical fusion explicitly preserves spatio-temporal structure, achieving both high accuracy and efficient scalability. Moreover, integrating a hybrid of Monte Carlo dropout with Conformal Prediction enables robust uncertainty quantification, offering calibrated confidence intervals critical for real-world deployment [53,54]. Our findings indicate that the proposed U-H-Mamba not only surpasses state-of-the-art baselines in predictive accuracy but also delivers physically consistent, uncertainty-aware degradation trajectories—paving the way toward next-generation intelligent BMSs and safe energy systems [55,56,57].
The main contributions of this study are summarized as follows:
A novel hierarchical framework, U-H-Mamba, is proposed to explicitly model the dual-timescale nature of battery degradation. This approach decouples short-term intra-cycle dynamics from long-term inter-cycle evolution, creating a more physically meaningful and accurate representation that enhances robustness against sensor noise.
An innovative two-stage architecture is introduced, which first uses a TCN to extract robust, low-dimensional “fingerprints” from each cycle. A Mamba model then captures the long-range dependencies between these abstract fingerprints, ensuring high computational efficiency suitable for resource-constrained devices.
A lightweight uncertainty quantification module is seamlessly integrated into the framework using Monte Carlo Dropout. This allows the model to generate full probabilistic forecasts for risk-aware decision-making, all while adding negligible computational overhead compared to deterministic models.
The framework’s accuracy, efficiency, and reliability are systematically validated against state-of-the-art methods on public benchmark datasets [58,59,60]. By demonstrating high scalability and robustness, this work contributes to the advancing field of data-driven prognostics alongside recent studies [61,62], offering a practical solution for real-world battery management systems.
The remainder of this paper is organized as follows: Section 2 details the proposed U-H-Mamba framework, including the data preprocessing, the two-stage hierarchical architecture, and the methodology for uncertainty quantification. Section 3 describes the experimental setup, benchmark datasets, and evaluation metrics, followed by a comprehensive presentation and analysis of the results. Section 4 discusses the implications of our findings, addresses the limitations of the study, and outlines potential avenues for future work. Finally, Section 5 concludes the paper by summarizing the key contributions and their significance.

2. Data Source and Processing

The dataset integrates controlled laboratory aging profiles with real-world EV operational traces to bridge the gap between simulated conditions and practical variabilities, such as inconsistent charging, thermal fluctuations, and pack-level inconsistencies. This hybrid approach is essential because laboratory data (e.g., NASA) provides clean, repeatable cycles for model learning, while real-world data (e.g., NDANEV) introduces noise and partial cycles to test robustness, ensuring the model’s applicability in actual EV fleets. Laboratory datasets include NASA (randomized discharge on 4 Li-ion 18650 cells, ~2500 cycles under varying temperatures from −20 °C to 40 °C and discharge rates up to 2C, chosen for its diverse degradation modes like capacity fade from lithium plating), CALCE (dynamic stress tests on 8 prismatic cells, ~3200 cycles with impedance spectroscopy, selected for its focus on real driving profiles), and Oxford (ARTEMIS drive cycles on 8 pouch cells, ~1800 cycles emphasizing urban/highway patterns, included for its emphasis on temperature-induced aging). For real-world validation, we incorporate NDANEV (~28 million timestep measurements from 15 EVs, spanning over 1,200,000 km with variables like voltage, current, temperature, SOC, mileage, and cell voltage inconsistency metrics, vital for capturing fleet-scale inconsistencies) and BatteryML (aggregated fleet data from 2025 releases, ~50,000 cycles with annotated RUL labels and direct pressure sensor readings from advanced packs, chosen for its pressure data to enhance physical priors). Specifically, the BatteryML dataset is unique as it provides ground-truth mechanical pressure sensor readings, serving as the “physical anchor” for our model. In contrast, the NDANEV and NASA datasets, representative of standard operational conditions, lack direct pressure sensors.
For these datasets, we employ a transfer learning strategy to derive a Virtual Pressure Proxy, utilizing the electrochemical–mechanical correlations learned from BatteryML. These real-world sources capture fleet-scale degradations absent in lab settings, such as partial cycles and environmental noise, justifying their inclusion to improve model generalization.
Data processing follows a reproducible pipeline: (1) Cleaning removes duplicates, interpolates missing values (linear method for gaps < 5%, chosen for its simplicity and preservation of temporal trends), and handles anomalies via z-score thresholding (|z| > 3, selected to detect outliers without assuming a distribution). (2) To ensure data integrity, cycling segments were delineated only when the change in state of charge (ΔSOC) exceeded 30%. The SOC window was further restricted to 15–95%, as operation near the extreme ends of the SOC range is known to introduce measurement artefacts and instability, as reported in previous studies. (3) Reconstruction assembles chronological sequences, prioritizing discharge segments for RUL relevance (driving loads reflect real usage, while charging supplements for stability). Physics-informed augmentation enhances robustness by introducing controlled perturbations that mimic real-world operational variability: temperature variations (σ = 5 °C, reflecting typical ambient fluctuations observed in EV datasets), measurement noise in current readings (σ = 0.1 A, consistent with sensor specifications), and minor SOC estimation errors (±2%, representing BMS uncertainty), increasing the effective dataset size by 20–30% to mitigate overfitting on sparse real-world data. Augmentation is crucial as it mimics operational variability, improving the model’s ability to handle unseen conditions without requiring additional real data collection.
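The cleaning and augmentation steps above can be sketched in a few lines of NumPy. This is an illustrative simplification, not the authors' pipeline: the z-score threshold, noise scales, and SOC clipping range follow the values stated in the text, while the function names (`clean_series`, `augment`) are hypothetical.

```python
import numpy as np

def clean_series(x, z_thresh=3.0):
    """Mask |z| > 3 outliers, then linearly interpolate the resulting gaps."""
    x = np.asarray(x, dtype=float).copy()
    z = (x - np.nanmean(x)) / np.nanstd(x)
    x[np.abs(z) > z_thresh] = np.nan
    idx = np.arange(len(x))
    good = ~np.isnan(x)
    return np.interp(idx, idx[good], x[good])

def augment(current, temp, soc, rng=None):
    """Physics-informed perturbations: sigma_I = 0.1 A, sigma_T = 5 degC,
    SOC error of +/- 2 % (clipped to a valid 0-100 % range)."""
    if rng is None:
        rng = np.random.default_rng(0)
    return (current + rng.normal(0.0, 0.1, np.shape(current)),
            temp + rng.normal(0.0, 5.0, np.shape(temp)),
            np.clip(soc + rng.uniform(-2.0, 2.0, np.shape(soc)), 0.0, 100.0))
```

Linear interpolation is used for gap filling because, as the text notes, it preserves short-range temporal trends without introducing new extrema.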
As illustrated in Figure 1a, which displays one-day operational data from the NDANEV dataset (x-axis: time in seconds; y-axes: total voltage (V, blue), total current (A, orange), SOC (%, green), mileage (km, purple), maximum/minimum temperature (°C, red dashed/solid), maximum/minimum cell voltage (V, cyan dashed/solid), and cell voltage differential indicating internal stress (mV, black)), the raw traces reveal high variability during driving (discharge modes with sharp current spikes) versus stable charging, justifying segment prioritization for TCN inputs to capture fine-grained features without excessive noise. Figure 1b presents the reconstructed discharge/charge segment dataset, showing representative segments as time-series subplots (voltage, current, SOC, temperature, pressure), highlighting the orderly sequence post-processing for Mamba’s long-range modeling. These visualizations underscore the dataset’s representativeness, with NDANEV providing mileage-driven realism and BatteryML adding pressure-aware labels. Table 1 summarizes the processed datasets for reproducibility, including total cycles, entities, key features, and mileage spans.

3. RUL Prediction Framework

To address the limitations of existing battery prognostics methods, such as reliance on controlled laboratory data and insufficient handling of uncertainties in real-world scenarios, this section proposes an optimized Uncertainty-aware Hierarchical Mamba (U-H-Mamba) framework for high-precision remaining useful life (RUL) prediction of lithium-ion batteries. The framework employs a two-level encoding architecture: a temporal convolutional network (TCN) extracts fine-grained electrochemical “fingerprints” from intra-cycle high-frequency data, capturing subtle degradation patterns like solid–electrolyte interphase (SEI) growth and lithium plating; an enhanced Mamba model processes inter-cycle sequences to model long-range degradation trajectories. Uncertainty quantification (UQ) is achieved end-to-end through a hybrid mechanism integrating Monte Carlo (MC) Dropout for epistemic uncertainty estimation and Conformal Prediction for calibrated aleatoric and epistemic intervals, providing probabilistic outputs with theoretical guarantees. This design draws on recent advancements in state-space models for efficient long-sequence modeling and pressure-aware features for physical interpretability, enabling robust generalization across laboratory and operational EV environments. The hierarchical approach is chosen because battery degradation exhibits multi-scale characteristics: short-term electrochemical fluctuations within cycles and long-term cumulative effects across cycles, which traditional single-scale models (e.g., LSTMs) struggle to capture efficiently without high computational costs or vanishing gradients.
The overall workflow, illustrated in Figure 2 (a schematic diagram of the hierarchical architecture, including data flow from raw inputs through TCN fingerprint extraction, Mamba trajectory modeling, and UQ layers), comprises four steps: data processing, reference RUL calculation and feature extraction, model training with hyperparameter optimization, and performance evaluation. For reproducibility, all code is implemented in PyTorch 2.0+ with NumPy 1.26 and SciPy 1.13 for data handling. Experiments were conducted on a Slurm cluster equipped with four NVIDIA A100 GPUs (80 GB VRAM each), using a batch size of 64 and the AdamW optimizer (weight decay $1 \times 10^{-4}$). The framework is trained on a combined dataset split: 70% training, 15% validation, and 15% testing, with chronological ordering to simulate online deployment and prevent data leakage from future cycles.

3.1. Reference RUL Calculation

The Remaining Useful Life (RUL) at the current cycle $k$ is formally defined as the number of operational cycles remaining until the battery capacity fades to the End-of-Life (EOL) threshold. Following automotive industry standards, EOL is established at 80% of the nominal capacity ($C_0$). The RUL formulation is given by:

$$\mathrm{RUL}_k = N_{\mathrm{EOL}} - k$$

where $N_{\mathrm{EOL}}$ is the first cycle at which the capacity satisfies $C_k \le 0.8 \times C_0$. This capacity-based metric is prioritized over State of Health (SOH) as it directly quantifies the remaining service range, addressing critical user concerns regarding range anxiety.
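Under the definition above, reference RUL labels can be generated from a capacity sequence with a short NumPy helper. This is a sketch: `rul_labels` is a hypothetical name, and the EOL fraction is the 80% threshold from the text.

```python
import numpy as np

def rul_labels(capacity, c0, eol_frac=0.8):
    """RUL_k = N_EOL - k, with N_EOL the first cycle where C_k <= 0.8 * C0.
    Cycles past EOL receive negative labels and are typically dropped."""
    capacity = np.asarray(capacity, dtype=float)
    below = np.flatnonzero(capacity <= eol_frac * c0)
    if below.size == 0:
        raise ValueError("cell never reaches the EOL threshold")
    return below[0] - np.arange(len(capacity))
```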
To obtain high-fidelity capacity labels from fragmented real-world data, we employ Ampere-hour (Ah) integration. Discharge segments are strictly prioritized over charging phases to capture load-dependent degradation dynamics that reflect actual usage conditions. The capacity $C_k$ is calculated as:

$$C_k = \frac{\int_{t_s}^{t_e} I(t)\, dt}{\Delta \mathrm{SOC}}$$

where $I(t)$ represents the discharge current, and $t_s$, $t_e$ are the segment start and end timestamps. To mitigate numerical instability caused by sensor noise, only segments with a State-of-Charge variation of at least 30% are retained.
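A minimal sketch of this Ah-integration rule, with a trapezoidal sum standing in for the integral and the 30% ΔSOC retention rule applied up front. The function name and the SOC-in-percent convention are assumptions for illustration.

```python
import numpy as np

def segment_capacity(t, current, soc, min_dsoc=30.0):
    """C_k = (integral of I dt) / dSOC over one discharge segment.
    Segments with dSOC < 30 % are rejected as numerically unstable."""
    dsoc = abs(float(soc[0]) - float(soc[-1]))          # SOC in percent
    if dsoc < min_dsoc:
        return None
    dt = np.diff(t)                                      # seconds
    charge_as = np.sum(0.5 * (np.abs(current[1:]) + np.abs(current[:-1])) * dt)
    return (charge_as / 3600.0) / (dsoc / 100.0)         # Ah
```

For example, a constant 2 A discharge over one hour that moves SOC from 90% to 50% yields 2 Ah over a 40% window, i.e., an estimated full capacity of 5 Ah.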
However, raw capacity sequences derived from operational traces inherently suffer from stochastic fluctuations and measurement anomalies. To reconstruct high-fidelity degradation trajectories from these noisy inputs, a robust two-stage reconstruction pipeline is implemented. Initially, distinct anomalies caused by sensor dropouts or incomplete segments are eliminated using a kinetic change-rate constraint. This step enforces physical plausibility by discarding mathematically impossible jumps between adjacent cycles:
$$\beta_w = \left| \frac{C_k - C_{k-1}}{C_{k-1}} \right| \le 0.05$$
Following outlier rejection, residual sensor noise is mitigated using a Savitzky–Golay filter (window length = 11, polynomial order = 3). Unlike conventional moving averages that tend to blur sharp transitions, this method leverages least-squares convolution to fit local polynomials, effectively suppressing high-frequency noise while strictly preserving critical nonlinear geometric features, such as the degradation “knee-point”.
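The two-stage reconstruction (change-rate rejection followed by Savitzky–Golay smoothing with window length 11 and polynomial order 3) might be implemented as follows. This is a sketch using SciPy's `savgol_filter`, not the authors' code; the rejection compares each cycle against the last retained cycle so that an isolated dropout does not discard its neighbors.

```python
import numpy as np
from scipy.signal import savgol_filter

def reconstruct_capacity(cap, max_rel_step=0.05, window=11, polyorder=3):
    """Stage 1: reject cycles violating |C_k - C_last| / C_last <= 0.05.
    Stage 2: Savitzky-Golay smoothing (window 11, polynomial order 3)."""
    cap = np.asarray(cap, dtype=float)
    keep = [0]
    for k in range(1, len(cap)):
        last = cap[keep[-1]]
        if abs(cap[k] - last) / last <= max_rel_step:
            keep.append(k)
    cleaned = cap[keep]
    return savgol_filter(cleaned, window_length=window, polyorder=polyorder)
```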
The efficacy of this reconstruction strategy is substantiated in Figure 3. As shown in Figure 3a, the reconstructed labels accurately track the capacity fade trends in NASA cells (B0005 and B0006) without signal distortion. Figure 3b further visualizes the normalized capacity heatmap across the dataset, revealing consistent degradation patterns despite varying operating conditions. Additionally, the smoothing error distribution analysis in Figure 3c demonstrates that the root-mean-square error (RMSE) between the raw and smoothed data remains minimal (e.g., 29.21 mAh for B0005), confirming that the filtering process removes noise without altering the underlying physical truth.

3.2. Health Feature Extraction

To construct a robust feature space ($F = 25$) optimized for both laboratory and real-world scenarios, we employed a hybrid feature selection strategy. Initially, 24 observable statistical features were extracted directly from the raw sensor data (Voltage, Current, Temperature, Pressure, and SOC profiles), as shown in the correlation matrix (see Figure 4). However, to address multicollinearity and enhance physical interpretability, we performed feature refinement. Specifically, $V_{range}$ was excluded from the final input vector due to its linear redundancy with $V_{max}$ and $V_{min}$. Simultaneously, to compensate for the absence of hardware EIS sensors in operational EVs, we introduced two Virtual Impedance features ($Z_{real}$, $Z_{imag}$). These are derived from high-frequency intra-cycle segments via the TCN encoder, serving as latent proxies for internal resistance. Consequently, the final input vector consists of 25 features: 23 selected observable metrics and 2 physics-informed virtual impedance proxies. This configuration allows the model to leverage the high predictive value of impedance while maintaining a compact feature space.
Figure 5 presents detailed scatter plots of the six most influential features versus RUL, demonstrating strong linear relationships. Notably, $P_{mean}$ exhibits a correlation coefficient of approximately −0.691 with RUL as a result of swelling-induced capacity fade, whereas mileage and cumulative energy show correlation coefficients of approximately −0.827 and −0.766, respectively, reflecting usage-driven degradation. A threshold requiring the absolute correlation coefficient to exceed 0.5 identifies eight features with strong correlations, thereby balancing predictive power and model parsimony. This correlation analysis provides initial guidance for feature selection, while the final feature importance and nonlinear interactions are evaluated through post hoc interpretability analysis (see Section 4.6) after model training.
Normalization applies min-max scaling ensuring inputs in [0, 1] for model stability, preventing features with larger scales (e.g., mileage in km) from dominating smaller ones (e.g., voltage in V):
$$X_n = \frac{X - X_{min}}{X_{max} - X_{min}}$$
This set captures nonlinear aging, with pressure features enhancing electrochemical fidelity for TCN “fingerprints”, justifying their inclusion over purely data-driven alternatives.
Furthermore, a critical challenge in aligning laboratory and real-world datasets is the absence of Electrochemical Impedance Spectroscopy (EIS) sensors in operational EVs. To maintain a unified feature space across all domains, we employ a physics-informed Virtual Impedance Proxy mechanism. Since electrochemical impedance is fundamentally the frequency-domain response of voltage to current excitation, we hypothesize that a deep neural network can approximate this mapping from high-frequency time-domain measurements.
Formally, let the input sequence be defined as the raw intra-cycle voltage and current sequences for the $k$-th cycle, where $T$ represents the normalized time steps. We employ a dedicated branch of the TCN encoder, denoted as $F_\theta$, to map these time-series inputs to the latent impedance space:

$$\left( \hat{Z}_{real}, \hat{Z}_{imag} \right)_k = F_\theta \left( V_k, I_k \right)$$
where the output represents the estimated real (ohmic resistance) and imaginary (polarization) components of the impedance, respectively. To ensure physical consistency, the mapping function is pre-calibrated using the NASA dataset, which contains ground-truth EIS measurements. The calibration is governed by the Mean Squared Error (MSE) loss:
$$\mathcal{L}_{proxy} = \frac{1}{N_{calib}} \sum_{k=1}^{N_{calib}} \left[ \left( \hat{Z}_{real}^{\,k} - Z_{real}^{GT,k} \right)^2 + \left( \hat{Z}_{imag}^{\,k} - Z_{imag}^{GT,k} \right)^2 \right]$$
Once trained, the encoder is frozen and applied to the real-world datasets (NDANEV and BatteryML) to generate the virtual impedance proxies. These two scalars are then concatenated with the 23 statistical features to form the final input vector:
$$x_k = \left[ f_{stat}, \hat{Z}_{real}, \hat{Z}_{imag} \right]$$
This strategy effectively transfers the electrochemical knowledge learned from laboratory conditions to sensor-limited real-world environments, allowing the model to leverage the high predictive value of impedance features even in sensor-limited scenarios.
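The calibrate-freeze-apply workflow can be illustrated with a linear least-squares map standing in for the TCN branch $F_\theta$. Everything here is a simplified stand-in (the actual branch is a deep network trained with the MSE loss above); the point is the transfer pattern: fit on ground-truth EIS, freeze, then apply to sensor-limited data.

```python
import numpy as np

def fit_proxy(features, z_gt):
    """'Calibration' of the proxy on ground-truth EIS targets via least
    squares (a linear stand-in for the TCN branch F_theta)."""
    X = np.hstack([features, np.ones((len(features), 1))])   # bias column
    W, *_ = np.linalg.lstsq(X, z_gt, rcond=None)
    return W                                                 # frozen afterwards

def apply_proxy(W, features):
    """Frozen proxy applied to sensor-limited data -> (Z_real_hat, Z_imag_hat)."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ W
```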

3.3. Optimized U-H-Mamba Hybrid Model

The base level utilizes a Multi-scale Temporal Convolutional Network (TCN) to extract fine-grained electrochemical fingerprints from high-frequency intra-cycle measurements. Unlike recurrent architectures such as LSTMs, which are constrained by sequential processing bottlenecks, the TCN employs dilated causal convolutions to expand the receptive field exponentially without increasing parameter complexity. This design enables the efficient parallel capture of local transient features, including voltage spikes during discharge and pressure-induced shifts, while strictly preserving temporal causality.
Formally, the input is defined as a sequence matrix $X \in \mathbb{R}^{T \times F}$. Here, $T$ represents the timesteps per cycle, padded to a standard length of 3600 s. The feature dimension $F$ is set to 25, which explicitly comprises 23 observable metrics alongside two derived virtual impedance proxies to compensate for sensor limitations. The operation of the $l$-th residual block with a specified dilation rate $d$ is formulated as:

$$h_l = \mathrm{ReLU}\left( \mathrm{Conv1D}\left( h_{l-1}; k=3, d \right) + h_{l-1} \right)$$
To mitigate the risk of overfitting on noisy real-world data, a dropout rate of 0.1 is applied within the residual connections. The resulting encoder output is a sequence of compact cycle embeddings, denoted as $H_{intra} \in \mathbb{R}^{C \times D}$, where $C$ is the total cycle count and the embedding dimension $D$ is fixed at 128. This latent representation serves as the high-level descriptor of the battery’s degradation state at each cycle.
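A single-channel NumPy sketch of the dilated causal residual block defined by the equation above. The real encoder is a multi-channel PyTorch TCN; this reduction only demonstrates the causal padding and the ReLU residual form.

```python
import numpy as np

def dilated_causal_conv(h, w, d):
    """1-D causal convolution, kernel size k = len(w), dilation d.
    Left zero-padding by (k-1)*d keeps the output strictly causal:
    y[t] depends only on x[t], x[t-d], ..., x[t-(k-1)*d]."""
    k, T = len(w), len(h)
    padded = np.concatenate([np.zeros((k - 1) * d), h])
    return np.array([sum(w[i] * padded[t + i * d] for i in range(k))
                     for t in range(T)])

def residual_block(h, w, d):
    """h_l = ReLU(Conv1D(h_{l-1}; k, d) + h_{l-1})  (single channel)."""
    return np.maximum(dilated_causal_conv(h, w, d) + h, 0.0)
```

Stacking such blocks with dilations 1, 2, 4, ... grows the receptive field exponentially while the parameter count grows only linearly, which is the efficiency argument made in the text.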
To model the long-range degradation evolution, the upper level employs a Pressure-Aware Multi-Head Mamba decoder. This component processes the cycle embeddings $H_{intra}$ as a contiguous sequence. The core of this module is the Selective State Space Model (SSM), which discretizes continuous latent dynamics into a linear recurrence. A key advantage of this approach is its linear computational complexity, $O(L)$, which offers a significant efficiency improvement over the quadratic scaling of standard Transformers, particularly when handling extended operational histories exceeding 1000 cycles.
The selectivity of the Mamba layer is governed by a data-dependent timescale parameter $\Delta_t$, allowing the model to dynamically filter information based on the current input $u_t$. The discretization process is defined as:

$$\Delta_t = \mathrm{Softplus}\left( A + B u_t \right)$$
In this formulation, A and B are learnable projection matrices. Specifically, matrix A is initialized diagonally to ensure the stability of long-term memory retention.
A critical concern in continuous-to-discrete transformations is the potential for discretization errors to compromise stability. However, the proposed framework mitigates this through timescale decoupling and adaptive discretization. Since the high-frequency voltage transients are handled by the TCN encoder, the SSM solely models the inter-cycle degradation manifold, which is inherently smooth and slow-varying, thereby minimizing approximation errors. Furthermore, the learnable step size Δ t allows the model to dynamically optimize its discretization granularity, acting as an adaptive regularization term that prevents error accumulation during long-term recursive prediction.
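The selective discretization described above can be illustrated with a scalar state-space recurrence: the timescale is a softplus of the input, and a zero-order-hold rule discretizes the continuous dynamics. This is a didactic one-dimensional reduction of the Mamba layer, with all parameter values chosen purely for illustration.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def selective_ssm_scan(u, a=-0.5, b=1.0, c=1.0, w_dt=0.8, b_dt=0.0):
    """Scalar selective SSM: the step size Delta_t = softplus(w_dt*u_t + b_dt)
    depends on the input, and x' = a*x + b*u is discretized by zero-order
    hold. Since a < 0, the discrete state matrix a_bar = exp(Delta_t * a)
    stays in (0, 1), so the recurrence is stable for any learned step size."""
    x, ys = 0.0, []
    for u_t in u:
        dt = softplus(w_dt * u_t + b_dt)       # data-dependent timescale
        a_bar = np.exp(dt * a)                 # discretized state matrix
        b_bar = (a_bar - 1.0) / a * b          # ZOH input matrix
        x = a_bar * x + b_bar * u_t
        ys.append(c * x)
    return np.array(ys)
```

The stability property in the comment mirrors the text's argument: with a negative (diagonally initialized) state matrix, the adaptive step size cannot push the recurrence into divergence, so discretization error does not accumulate.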
To further enhance the detection of abrupt state transitions, such as the degradation “knee-point”, and to incorporate global context, a Multi-Head Self-Attention mechanism operates in parallel with the SSM scan. The attention score is computed as follows:
$$\mathrm{Attn} = \mathrm{Softmax}\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V$$
To explicitly incorporate physical degradation priors into the data-driven trajectory, a pressure-aware gating mechanism is integrated before the final prediction head. This design is motivated by the electrochemical correlation between internal pressure buildup and resistance increase. The pressure-modulated hidden state, $G_p$, is computed as:

$$G_p = \sigma\left( W_p P_{mean} + b_p \right) \odot h_t$$
where $P_{mean}$ denotes the mean cycle pressure (or its virtual proxy), $\sigma$ is the sigmoid activation function, and $\odot$ represents element-wise multiplication. This gating operation effectively amplifies the degradation signals associated with irreversible volume expansion.
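The gating equation amounts to the following one-line sketch with scalar gate parameters (the model uses learned weight matrices $W_p$, $b_p$; the values below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pressure_gate(h_t, p_mean, w_p, b_p):
    """G_p = sigmoid(w_p * P_mean + b_p) * h_t (element-wise gate):
    higher mean pressure opens the gate and amplifies the hidden state."""
    return sigmoid(w_p * p_mean + b_p) * h_t
```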
To accommodate the heterogeneity in sensor availability across diverse datasets, the pressure input $P_{mean}$ in Equation (8) is defined through an adaptive mechanism. For datasets equipped with physical sensors, such as BatteryML, the input directly utilizes the normalized ground-truth pressure measurements. Conversely, for sensor-limited environments like NDANEV and NASA, the input is replaced by a virtual proxy derived via transfer learning. Analogous to the impedance extraction strategy, we pre-train a mapping function on the BatteryML dataset using a dedicated TCN branch. This learned mapping is mathematically expressed as:

$$\hat{P}_{mean} = M_\phi \left( V, I \right)$$
Specifically, the mapping function $M_\phi$ is parameterized as a deep neural network composed of stacked dilated causal convolution layers followed by a regression head. Let $X = [V, I]$ denote the input sequence. The operation is defined as:

$$\hat{P}_{mean} = W_{proj} \cdot \mathrm{GAP}\left( \sigma\left( F_{dilated}\left( X; \theta_{enc} \right) \right) \right) + b_{proj}$$

where $F_{dilated}$ represents the multi-scale dilated convolution operations with learnable parameters $\theta_{enc}$, $\sigma$ denotes the non-linear activation function (e.g., ReLU), and GAP stands for Global Average Pooling, which aggregates the temporal features into a latent vector. Finally, $W_{proj}$ and $b_{proj}$ constitute the linear projection layer that maps the latent features to the scalar pressure value. The set of all learnable parameters is denoted as $\phi = \{ \theta_{enc}, W_{proj}, b_{proj} \}$.
The final RUL point prediction, $\mu$, is subsequently generated via a linear projection layer:

$$\mu = \mathrm{Linear}\left( y \right)$$
This architecture maintains a linear time complexity of $O(L)$ with respect to the sequence length $L$, enabling efficient deployment on resource-constrained Battery Management Systems (BMSs).
Reliable prognostics require calibrated confidence intervals. The proposed framework employs a hybrid strategy combining Monte Carlo (MC) Dropout for epistemic uncertainty and Conformal Prediction for aleatoric calibration.
Epistemic uncertainty is captured via MC Dropout by treating dropout regularization as an approximate Bayesian inference. During inference, stochastic forward passes are performed to generate a predictive distribution. The predictive variance, $s_i$, is derived from these samples as:
$$s_i = \mathrm{Var}\big(\{\hat{y}_k\}_{k=1}^{K}\big)$$ (10)
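The MC Dropout step can be illustrated as follows; the two-layer regressor here is a toy stand-in for the full U-H-Mamba network, and the dimensions are arbitrary:

```python
# Sketch of MC Dropout inference: dropout stays active at test time and
# K stochastic forward passes yield a predictive mean and variance s_i.
# The tiny regressor is a hypothetical stand-in for the full network.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(),
                      nn.Dropout(p=0.2), nn.Linear(32, 1))

def mc_dropout_predict(model, x, K=100):
    model.train()                      # keep dropout stochastic at inference
    with torch.no_grad():
        samples = torch.stack([model(x).squeeze(-1) for _ in range(K)])
    return samples.mean(0), samples.var(0)   # empirical mean and variance s_i

x = torch.randn(5, 8)
mu_hat, s_i = mc_dropout_predict(model, x)
print(mu_hat.shape, s_i.shape)         # torch.Size([5]) torch.Size([5])
```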
The uncertainty quantification mechanism operates through a synergistic pipeline where Monte Carlo (MC) Dropout estimates the model’s epistemic uncertainty, which subsequently serves as the normalization factor for Inductive Conformal Prediction (ICP). Specifically, for a given input, we first perform $K = 100$ stochastic forward passes with dropout active to derive the empirical predictive mean $\hat{\mu}$ and the predictive variance $s_i$ (as defined in Equation (10)). While $s_i$ captures the local model uncertainty, reliance solely on Gaussian assumptions often leads to miscalibration. Therefore, we utilize ICP to construct valid distribution-free confidence intervals.
We designate the validation split as the independent calibration set $D_{\text{cal}}$. For each sample in $D_{\text{cal}}$, we compute a non-conformity score $\alpha_i$, defined as the uncertainty-normalized absolute error:
$$\alpha_i = \frac{|y_i - \hat{y}_i|}{1 + s_i}$$ (11)
This metric ensures that the score reflects the “surprise” of a prediction relative to the model’s own estimated uncertainty. Finally, we calculate the specific quantile of these calibration scores corresponding to the desired target coverage level (e.g., the 95th percentile). The final prediction interval for a new test point is constructed by scaling the MCD-derived uncertainty by this conformal quantile (as shown in Equation (12)). This hybrid design ensures that the intervals are locally adaptive—widening in high-uncertainty regions identified by MCD—while rigorously satisfying the nominal coverage rate guaranteed by ICP.
Based on these scores, the final prediction interval is constructed to satisfy a nominal coverage rate (e.g., 95%). The interval bounds are defined by the empirical quantiles of the calibrated distribution:
$$C(x_*) = \big[\hat{\mu} - q_{1-\alpha}(1 + s),\; \hat{\mu} + q_{1-\alpha}(1 + s)\big]$$ (12)
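Numerically, the ICP step described above amounts to a quantile computation over the calibration scores followed by a rescaling at test time. The sketch below uses synthetic placeholders for the calibration split and applies the standard finite-sample quantile correction:

```python
# Sketch of the ICP step: normalize calibration residuals by the MCD
# uncertainty, take the (finite-sample corrected) 95th-percentile score,
# and scale test-time uncertainties into intervals. All arrays below are
# synthetic placeholders for the calibration split D_cal.
import numpy as np

rng = np.random.default_rng(0)
y_cal = rng.normal(500, 100, size=200)           # true RUL on D_cal
y_hat_cal = y_cal + rng.normal(0, 5, size=200)   # point predictions
s_cal = np.abs(rng.normal(2, 0.5, size=200))     # MCD variances s_i

alpha = 0.05
scores = np.abs(y_cal - y_hat_cal) / (1.0 + s_cal)   # non-conformity scores
level = min(1.0, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))
q = np.quantile(scores, level)                       # conformal quantile

# Interval for a new test point: mu_hat -/+ q * (1 + s)
mu_test, s_test = 480.0, 2.3
lower, upper = mu_test - q * (1 + s_test), mu_test + q * (1 + s_test)
print(round(lower, 1), round(upper, 1))
```

Note how the interval width scales with the test-time uncertainty $s$: inputs the model finds ambiguous automatically receive wider bounds, which is the locally adaptive behavior described in the text.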
The model is trained end-to-end using a composite loss function that balances deterministic accuracy with probabilistic likelihood. The total objective function, $\mathcal{L}$, is formulated as:
$$\mathcal{L} = \mathrm{MSE}(\mu, y) + \lambda \cdot \mathrm{NLL}(\sigma^2, y), \quad \lambda = 0.5$$
where MSE represents the Mean Squared Error, NLL denotes the Negative Log-Likelihood, and λ is a weighting factor set to 0.5 to balance the two objectives.
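The composite objective can be written out explicitly in a few lines; here the Gaussian NLL is expanded term by term, the network is assumed to predict a log-variance for numerical stability (a common convention, not stated in the text), and the inputs are toy values:

```python
# Sketch of the composite training objective L = MSE + lambda * NLL,
# with lambda = 0.5 as in the text. Predicting log-variance (rather than
# sigma^2 directly) is an assumed stabilization trick.
import math
import torch

def composite_loss(mu, log_var, y, lam=0.5):
    var = log_var.exp()
    mse = ((mu - y) ** 2).mean()
    # Gaussian negative log-likelihood, averaged over the batch
    nll = 0.5 * (torch.log(2 * math.pi * var) + (mu - y) ** 2 / var).mean()
    return mse + lam * nll

mu = torch.tensor([10.0, 12.0])
log_var = torch.zeros(2)                 # sigma^2 = 1 for both samples
y = torch.tensor([11.0, 11.0])
print(float(composite_loss(mu, log_var, y)))   # ≈ 1.7095 for this toy batch
```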
For the sake of reproducibility and clarity, a comprehensive summary of all model hyperparameters and architectural configurations is provided in Appendix D.

3.4. Hyperparameter Tuning

To navigate the high-dimensional search space of the U-H-Mamba architecture efficiently, we employ Bayesian Optimization rather than computationally expensive grid searches. This approach models the objective function as a Gaussian Process (GP), allowing the algorithm to balance exploration and exploitation by prioritizing regions with high expected improvement.
The optimization process targets a composite objective that simultaneously minimizes point prediction error and maximizes probabilistic calibration:
$$J = \mathrm{RMSE} + \mathrm{NLL}$$
The search space encompasses key structural and learning parameters: the number of TCN layers (4–8), dilation rates ($d \in \{1, 2, 4, 8\}$), Mamba state dimensions ($D \in [16, 64]$), and the learning rate (ranging from $1 \times 10^{-4}$ to $1 \times 10^{-2}$). The optimization is conducted over 50 iterations using the scikit-optimize library.
To prevent overfitting, an early stopping mechanism is implemented with a patience of 20 epochs and a minimum delta of $1 \times 10^{-4}$. Typical training convergence is achieved within 60 to 120 epochs, corresponding to a computational duration of 45 min to 2.5 h per run, ensuring the framework remains practical for rapid deployment.
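To make the exploration/exploitation trade-off concrete, the toy below illustrates the GP + expected-improvement loop on a synthetic 1-D learning-rate axis. This is not the scikit-optimize pipeline used in the paper: the quadratic "objective" is a made-up stand-in for $J = \mathrm{RMSE} + \mathrm{NLL}$, and the RBF kernel settings are arbitrary assumptions.

```python
# Toy Bayesian optimization: GP posterior + expected improvement (EI)
# over log10(learning rate). Objective, kernel, and noise are illustrative.
import numpy as np
from math import erf

def objective(log_lr):                 # synthetic stand-in for J = RMSE + NLL
    return (log_lr + 3.0) ** 2 + 0.1

def rbf(a, b, ls=0.5):                 # squared-exponential kernel
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

X = np.array([-4.0, -2.5, -2.0])       # initial evaluations
y = np.array([objective(x) for x in X])
grid = np.linspace(-4, -2, 201)        # candidate log10(lr) values

for _ in range(5):                     # a few BO iterations
    K = rbf(X, X) + 1e-6 * np.eye(len(X))
    Ks = rbf(grid, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y                 # GP posterior mean on the grid
    var = np.clip(1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks), 1e-9, None)
    sd = np.sqrt(var)
    imp = y.min() - mu                 # improvement over incumbent (minimization)
    z = imp / sd
    Phi = 0.5 * (1 + np.array([erf(v / np.sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    ei = imp * Phi + sd * phi          # expected improvement
    x_next = grid[np.argmax(ei)]       # evaluate the most promising candidate
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print(X[np.argmin(y)])                 # best log10(lr) found so far
```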

3.5. Performance Evaluation Metrics

The predictive performance of the proposed framework is rigorously evaluated using a comprehensive suite of metrics, selected to assess both global accuracy and localized reliability in safety-critical scenarios.
The Mean Absolute Error (MAE) quantifies the average magnitude of prediction deviations, providing an intuitive measure of unbiased estimation performance:
$$\mathrm{MAE} = \frac{1}{n} \sum_{k=1}^{n} \left| y_k - \hat{y}_k \right|$$
To penalize larger errors that could lead to catastrophic failures (e.g., sudden power loss), the Root Mean Square Error (RMSE) is employed. This metric is particularly critical in battery prognostics, where underestimating degradation poses severe safety risks:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( y_k - \hat{y}_k \right)^2}$$
To facilitate performance comparison across diverse datasets with varying cycle lives, the Mean Absolute Percentage Error (MAPE) normalizes deviations relative to the ground truth:
$$\mathrm{MAPE} = \frac{1}{n} \sum_{k=1}^{n} \left| \frac{y_k - \hat{y}_k}{y_k} \right| \times 100\%$$
Furthermore, the Coefficient of Determination ( R 2 ) assesses the proportion of variance in the degradation trajectory explained by the model, with values approaching 1 indicating a robust fit:
$$R^2 = 1 - \frac{\sum_{k=1}^{n} (y_k - \hat{y}_k)^2}{\sum_{k=1}^{n} (y_k - \bar{y})^2}$$
Given the practical importance of the End-of-Life (EOL) threshold in warranty services, the Absolute Error at EOL ( AE EOL ) is specifically calculated to evaluate prediction accuracy at the critical failure boundary:
$$\mathrm{AE}_{\mathrm{EOL}} = \left| y_{\mathrm{EOL}} - \hat{y}_{\mathrm{EOL}} \right|$$
To explicitly evaluate predictive fidelity during the highly nonlinear acceleration phase of degradation—where small deviations can propagate into significant prognostic errors—the Mean Percentage Error around the Knee Point ( MPE Knee ) is computed:
$$\mathrm{MPE}_{\mathrm{Knee}} = \frac{1}{m} \sum_{j=1}^{m} \left| \frac{y_j - \hat{y}_j}{y_j} \right| \times 100\%$$
where m denotes the number of cycles within the identified knee-point region.
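The deterministic metrics defined above can be computed directly with NumPy; the five-point trajectory below is a synthetic illustration, not data from the study:

```python
# Worked example of the point-estimation metrics on a tiny synthetic
# RUL trajectory (values are illustrative only).
import numpy as np

y = np.array([100., 80., 60., 40., 20.])       # true RUL
y_hat = np.array([102., 79., 57., 42., 19.])   # predicted RUL

mae = np.mean(np.abs(y - y_hat))
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
mape = np.mean(np.abs((y - y_hat) / y)) * 100          # normalized by ground truth
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
ae_eol = abs(y[-1] - y_hat[-1])                        # error at end-of-life

print(round(mae, 2), round(rmse, 2), round(mape, 2), round(r2, 3), ae_eol)
# → 1.8 1.95 3.65 0.995 1.0
```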
Beyond deterministic accuracy, the reliability of the probabilistic outputs is assessed using a comprehensive set of Uncertainty Quantification (UQ) metrics. The Coverage Probability (CP) measures the empirical frequency with which the ground truth trajectories fall within the predicted confidence intervals. For reliable safety estimation, a coverage level of no less than ninety-five percent is generally required:
$$\mathrm{CP} = \frac{1}{n} \sum_{k=1}^{n} \mathbb{I}\left\{ q_{\mathrm{lower},k} \le y_k \le q_{\mathrm{upper},k} \right\}$$
where $\mathbb{I}$ represents the indicator function, and $q_{\mathrm{lower},k}$, $q_{\mathrm{upper},k}$ are the lower and upper bounds of the prediction interval, respectively.
To assess the precision of these estimates, the Mean Prediction Interval Width (MPIW) quantifies the average sharpness of the uncertainty bounds:
$$\mathrm{MPIW} = \frac{1}{n} \sum_{k=1}^{n} \left( q_{\mathrm{upper},k} - q_{\mathrm{lower},k} \right)$$
For a holistic view of probabilistic calibration, the Negative Log-Likelihood (NLL) evaluates the distributional fit under a Gaussian assumption, simultaneously penalizing prediction bias and under-estimation of variance:
$$\mathrm{NLL} = \frac{1}{2n} \sum_{k=1}^{n} \left[ \ln\left( 2\pi\sigma_k^2 \right) + \frac{(y_k - \hat{y}_k)^2}{\sigma_k^2} \right]$$
To provide a multi-scale analysis of interval reliability, the evaluation extends to the Prediction Interval Coverage Probability ($\mathrm{PICP}_\gamma$) and Mean Interval Width ($\mathrm{MIW}_\gamma$) at nominal confidence levels of 99% and 68%. The Calibration Error (CE) quantifies the discrepancy between the empirical coverage and the nominal confidence level:
$$\mathrm{CE} = \left| \mathrm{CP} - (1 - \alpha) \right|$$
Finally, the Sharpness Index (Sharp) serves as a composite metric that rewards models capable of producing narrow intervals while maintaining high calibration fidelity:
$$\mathrm{Sharp} = \mathrm{MPIW} / (1 - \mathrm{CE})$$
Collectively, these metrics provide a rigorous framework for evaluating both the deterministic accuracy and the probabilistic reliability of the proposed model across diverse operational conditions.
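The UQ metrics above take only a few lines of NumPy. In the sketch below, the trajectory and the Gaussian 95% intervals are synthetic placeholders, and the Sharp value uses the $\mathrm{MPIW}/(1-\mathrm{CE})$ form given in the text:

```python
# Worked example of the UQ metrics (CP, MPIW, NLL, CE, Sharp) on synthetic
# Gaussian predictive intervals; all numbers are illustrative.
import numpy as np

y = np.array([50., 45., 40., 35., 30.])          # true RUL
mu = np.array([51., 44., 41., 33., 30.5])        # predictive means
sigma = np.array([2.0, 2.0, 1.5, 1.5, 1.0])      # predictive std devs
lo, hi = mu - 1.96 * sigma, mu + 1.96 * sigma    # 95% Gaussian intervals

cp = np.mean((y >= lo) & (y <= hi))              # coverage probability
mpiw = np.mean(hi - lo)                          # mean interval width
nll = np.mean(0.5 * (np.log(2 * np.pi * sigma**2) + (y - mu)**2 / sigma**2))
ce = abs(cp - 0.95)                              # calibration error
sharp = mpiw / (1 - ce)                          # sharpness index

print(cp, round(mpiw, 2), round(nll, 3), round(ce, 2), round(sharp, 2))
```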

4. Results and Discussion

To ensure rigorous validation and reproducibility, all experiments were conducted on a high-performance computing cluster managed by Slurm, equipped with four NVIDIA A100 GPUs (80 GB VRAM). The framework was implemented in PyTorch 2.0. Training utilized a batch size of 64, with an early stopping mechanism (patience = 20) to prevent overfitting. The optimization of model parameters followed the Bayesian strategy detailed in Section 3.4, ensuring that each candidate model operated near its respective optimum. Results reported hereafter represent the average performance over five independent runs, with standard deviations provided to quantify statistical stability. Baselines encompass TCN-LSTM, Vanilla Mamba, PatchTST, GRU-MC Dropout, CNN-PSO, XGBoost, CNN-TLSTM (2025), GM-PFF (2025), and VMD-SSA-PatchTST (2025). Crucially, to isolate the contribution of the proposed hierarchical architecture, a strict “controlled variable” protocol was enforced. All baseline models were fed the identical 25-dimensional input feature vector (comprising the 23 observable metrics and 2 virtual impedance proxies defined in Section 3.2) and underwent the same Bayesian Hyperparameter Optimization procedure on the validation set. This ensures that any observed performance gains are attributable to the superior spatio-temporal modeling and uncertainty quantification mechanisms of U-H-Mamba, rather than discrepancies in feature engineering or tuning effort.

4.1. Point Estimation Performance

To compare the overall prediction behaviors of different prognostic baselines, Figure 6 provides a unified three-dimensional visualization of the error surfaces across the full cycling range. Traditional signal-processing models such as GM-PFF and VMD-SSA exhibit wide error dispersion and strong cycle-index sensitivity, indicating limited capability in capturing long-term degradation trajectories. Deep temporal models (CNN-TLSTM, GRU-MC) reduce noise but show pronounced instability around the knee region, where degradation accelerates. The Vanilla Mamba baseline improves long-range temporal consistency yet still suffers from residual oscillation. These results suggest that none of the conventional architectures achieve both intra-cycle fidelity and inter-cycle stability simultaneously, motivating the need for a hierarchical representation.
The predictive superiority of the U-H-Mamba framework is visualized in Figure 7. As depicted in the three-dimensional error landscape (see Figure 7a), the proposed model exhibits a markedly flattened and cohesive error topology compared to the baselines. This smoothness indicates exceptional robustness against operating condition heterogeneity and minimizes cycle-dependent performance fluctuations.
Quantitatively, the prediction error distribution (see Figure 7b) converges to a sharp unimodal profile centered near zero, characterized by a mean error of approximately 1.26 cycles and minimal variance. This confirms the model’s stability across the statistical population. The framework’s generalization capability is further corroborated by the cross-dataset evaluations in Figure 7c–e, where consistent accuracy is maintained across the distinct electrochemical protocols of NASA, CALCE, and NDANEV datasets.
Crucially, the lifetime error evolution curve (Figure 7f) demonstrates that U-H-Mamba maintains minimal prognostic drift during both the early incubation phase and the late aging phase. Most notably, it successfully suppresses the error amplification typically observed near the degradation “knee-point”. Collectively, these results validate the architectural synergy of the proposed framework: the base encoder captures fine-scale electrochemical signatures, while the state-space decoder preserves long-range trajectory consistency.
Competitive accuracy emerges from the hierarchical integration of the multi-scale TCN for intra-cycle details and the pressure-aware multi-head Mamba for inter-cycle patterns, enabling nuanced capture of degradation dynamics such as SEI growth in the laboratory or thermal inconsistencies in EVs. Laboratory environments facilitate precise modeling of controlled fade, whereas real-world noise tests generalization; physics-informed augmentation mitigates partial-cycle issues by simulating operational variabilities such as temperature fluctuations ($\sigma = 5\,^{\circ}\mathrm{C}$), thereby reducing domain gaps.
The comparative performance of the U-H-Mamba framework against state-of-the-art baselines is detailed in Table 2. A full set of comparative results and extended metrics can be found in Appendix A, Table A1. Across all evaluated datasets, the proposed model consistently achieves the lowest error metrics, with R 2 values exceeding 0.98. This high coefficient of determination indicates a robust fit to the nonlinear degradation trajectories.
In the controlled environment of the NASA B0005 dataset (mid-life start), the model yields an RMSE of 3.6 cycles (±0.4) and an MAE of 2.4 cycles (±0.3). This represents a significant improvement over the Vanilla Mamba baseline (RMSE 5.0 cycles). The performance gain is primarily attributed to the multi-scale dilated convolutions in the base encoder, which effectively resolve subtle early nonlinearities, such as the electrochemical signatures of initial lithium plating. Baseline models, constrained by simpler temporal processing, often overlook these fine-grained signals, leading to a 20–30% overestimation of RUL during the early stages, where data scarcity amplifies prediction variance.
Furthermore, the model demonstrates precise detection of the degradation “knee-point”, evidenced by a MAPE of 2.0% (±0.2%) and a Knee-Point Mean Percentage Error ($\mathrm{MPE}_{\mathrm{Knee}}$) of only 1.3% (±0.2%). Accuracy in this critical phase is paramount for proactive maintenance; errors at this stage can inflate lifetime projections by 15–25% in fleet operations, potentially delaying necessary interventions and increasing safety risks associated with undetected fade acceleration.
In the challenging real-world NDANEV dataset (Vehicle 1), the framework demonstrates exceptional robustness despite environmental variability and mileage-driven stress. It achieves an RMSE of 5.8 cycles (±0.6), outperforming the CNN-PSO (8.4 cycles) and XGBoost (7.7 cycles) baselines by 25–35%. This robustness is underpinned by the incorporation of physics-informed priors, specifically the pressure-aware and virtual impedance features, which correlate strongly with pack swelling (r ≈ 0.7 with RUL). These features enhance predictive fidelity, reducing the Absolute Error at End-of-Life ($\mathrm{AE}_{\mathrm{EOL}}$) to 3.5 cycles (±0.4) compared to over 5 cycles for baselines. Such precision is critical for warranty assessments, where overprediction can cost manufacturers an estimated 10–20% in unnecessary battery replacements.
Notably, even when predicting from limited historical data (early 30% of life), the framework maintains an RMSE below 7 cycles. In contrast, baseline models exhibit higher MAPE (3.9–6.0%) under these conditions due to inadequate noise tolerance. These results underscore the effectiveness of the U-H-Mamba augmentation strategy in bridging the domain gap between laboratory precision and field robustness.
The behavior of the proposed model around the degradation knee becomes evident when examining Figure 8a. As the cell approaches the transition region, the predicted RUL trajectory remains closely aligned with the true degradation trend, and the associated confidence interval stays comparatively compact. The inset highlights how this stability is preserved even as the early nonlinear signatures begin to emerge, whereas the baseline methods exhibit noticeably larger drift and irregular fluctuations during the same period. A clearer distinction among methods appears in Figure 8b, where the absolute prediction errors are tracked over the full operating window. The proposed model retains most deviations within a 5-cycle margin and gradually reduces its average error as the knee is crossed. Competing approaches show far more scattered behavior, with frequent and pronounced overshoots that reflect delayed adjustment to the accelerated aging phase. The relative-error characteristics in Figure 8c further reinforce this trend. The proposed model maintains low and steady MAPE values throughout the degradation process, avoiding the sharp rise commonly observed once the knee has been passed. Baseline predictors, by contrast, experience strong post-knee amplification in relative error, signaling reduced reliability during the phase when predictive consistency is most critical for maintenance planning.

4.2. Uncertainty Quantification Performance

The calibrated intervals derived from the hybrid MC Dropout and Conformal Prediction mechanism effectively decompose epistemic ambiguity, such as model limitations in sparse data regions, from aleatoric noise caused by sensor variability. This decomposition ensures reliable bounds for risk-averse applications, including EV warranty planning, where undercoverage could overlook 10–15% of failure risks.
As presented in Table 3, the proposed U-H-Mamba framework demonstrates superior probabilistic fidelity, achieving a CP of 98.4% (±0.8%) at the 95% nominal confidence level. Notably, the MPIW ranges from 9 to 12 cycles, which represents a significant reduction of 25–35% compared to the 15 to 16 cycles observed in the GRU-MC baseline. This improvement stems from the synergistic integration of Monte Carlo sampling and conformal calibration, which enforces empirical validity without sacrificing interval compactness, as evidenced by a low CE of 0.04 (±0.01). In contrast, Bayesian neural approximations often exhibit over-conservatism with a higher CE of approximately 0.06, resulting in unnecessarily wide intervals under noisy conditions.
Furthermore, the low NLL of 0.46   ± 0.05 indicates high probabilistic sharpness, with predictive distributions concentrated tightly around the ground-truth trajectories. Consequently, the overall Sharpness Index reaches 0.11   ± 0.01 , surpassing the GM-PFF baseline which records an index of 0.13. Temporal analysis reveals that the interval width naturally contracts as data accumulates; for instance, on the NASA B0005 cell, the width decreases from 10.8   ± 1.2 cycles in the early phase to 6.2   ± 0.7 cycles in the late phase. Simultaneously, the CE drops to 0.03   ± 0.01 , demonstrating the model’s effective adaptation to information gain.
In the challenging real-world NDANEV scenario, the incorporation of pressure-aware calibration further enhances robustness, yielding a Sharpness Index of 0.13   ± 0.01 . Even under significant operational noise, the model maintains a 99% coverage probability with manageable interval widths of approximately 16 cycles. This capability is essential for risk-averse fleet management and warranty analytics. Comprehensive quantitative comparisons across all baselines and confidence levels are provided in Appendix B, Table A2.
The prediction intervals in Figure 9 encompass the true RUL with high coverage across datasets (e.g., 98.3% on CALCE CS2-33 in Figure 9b), with adaptive widths (e.g., broader intervals of 14.4 cycles in variable-stress phases due to aleatoric noise from dynamic loads, and narrower ones in stable segments) demonstrating responsiveness to underlying data heterogeneity, thereby providing actionable confidence for operators to adjust thresholds based on risk tolerance.

4.3. Ablation Studies

Component isolation on NASA (70/15/15 split) elucidates contributions, revealing synergies where TCN-Mamba fusion and UQ yield 25–50% gains over partial variants, with pressure features mitigating specific degradation modes.
In Table 4, removing the TCN component elevates RMSE to 6.7 cycles (+86%) and MAE to 4.5 cycles (+88%), as raw temporal inputs without dilated convolutions fail to resolve electrochemical micro-patterns such as voltage fluctuations, thereby inflating the early-cycle robustness error (Ro = 16% ± 2%) under 10% Gaussian perturbation compared to the full model’s 6%. This confirms TCN’s critical role in preserving fine-grained degradation fingerprints, particularly early lithium plating cues, whose absence amplifies mid-cycle variance by 25–35%. Excluding the enhanced Mamba block further raises RMSE to 6.0 cycles (+67%), underscoring the significance of its multi-head gating in propagating pressure-dependent dependencies; the lack of such structured recurrence increases convergence epochs (Conv_Epochs ≈ 37 ± 3) due to inefficient capture of nonlinear knee transitions. Removing pressure-aware features results in an RMSE of 4.9 cycles (+36%), as the loss of physical priors severs correlations between pressure and internal resistance associated with swelling-induced capacity fade, increasing $\mathrm{MPE}_{\mathrm{Knee}}$ to 2.1% (±0.2%) and degrading performance under temperature variability, where unregularized baselines typically show 20% higher variance. Disabling the hybrid UQ module raises NLL to 0.73 (+70%), eliminating the calibrated probabilistic awareness vital for risk quantification in sparse data regimes, while removing physics-informed augmentation increases Rob_Noise to 13% (±1%) and causes a 25% RMSE rise on NDANEV subsets, reflecting its role in simulating real-world stochasticity and thermal perturbations. Finally, omitting multi-scale dilation inflates RMSE to 5.1 cycles (+42%), as static kernels fail to accommodate variable cycle lengths, diminishing fingerprint resolution by 15–20%.
Collectively, these ablations demonstrate that TCN captures local electrochemical transients, enhanced Mamba transmits global dependencies, pressure priors inject physical interpretability, hybrid UQ ensures calibrated reliability, and physics-informed augmentation sustains robustness across operational domains.
To quantitatively justify the proposed pressure-aware design, we conducted a supplementary comparative ablation study on the BatteryML dataset, which serves as the physical ground truth containing actual pressure sensor readings. The results reveal a clear performance hierarchy among the different handling strategies. Utilizing the actual sensor measurements yields the lowest RMSE of 5.1 cycles, establishing the performance upper bound. When replacing these real sensors with our TCN-derived virtual proxy, the RMSE increases only marginally to 5.3 cycles. This slight performance gap of approximately 4% demonstrates that the transfer learning strategy successfully recovers the majority of the degradation-related mechanical information from standard electrical signals. In stark contrast, completely removing the pressure-aware gating mechanism causes the RMSE to deteriorate significantly to 6.6 cycles, representing a 29% error increase. This confirms that pressure dynamics—whether measured directly or inferred via our virtual proxy—contain critical degradation information that cannot be fully recovered by a purely data-driven model, thereby validating the Virtual Pressure Proxy as an effective bridge for sensor-limited environments.
To strictly validate the contribution of the hierarchical design, we conducted an architectural ablation study focusing on the encoder stage. The core innovation of U-H-Mamba lies in decoupling intra-cycle high-frequency dynamics from inter-cycle degradation trends.
As shown in Table 4, replacing the TCN encoder with a simple Linear Embedding layer (denoted as “w/o TCN Encoder”) results in a notable performance drop, with RMSE increasing from 5.1 to 6.3 cycles (+23.5%) on the BatteryML dataset. This comparison is critical because a simple linear embedding treats the input current/voltage snapshots as flat vectors, ignoring their temporal causality and local dependencies. In contrast, the TCN encoder effectively captures short-term local transients and charge/discharge plateau features, compressing them into a robust latent state for the subsequent Mamba decoder. This confirms that the superior performance of U-H-Mamba stems from its specific hierarchical structure, which processes “local dynamics” and “global evolution” separately, rather than merely the strength of the Mamba backbone.
These outcomes affirm the framework’s modular necessity, with pressure and UQ particularly vital for robustness in operational variability, where their removal exacerbates errors by 30–40% in cross-domain tests, consistent with literature on hierarchical models.

4.4. Cross-Dataset Generalization and Data Sensitivity

The robustness of the U-H-Mamba framework is evaluated through two critical dimensions: cross-dataset generalization under domain shifts and predictive stability under data scarcity.
Transfer evaluations (zero-shot or 10% fine-tune) probe domain adaptation, where U-H-Mamba’s priors facilitate 15–30% better RMSE amid shifts like lab cleanliness to EV noise, by leveraging augmentation to align distributions and pressure to anchor physical consistencies. For a detailed quantitative breakdown of the augmentation strategy’s impact on predictive metrics (RMSE, MAE, and Uncertainty Width) across the NDANEV dataset, please refer to Appendix C.
The generalization capability of the U-H-Mamba framework is rigorously evaluated under transfer learning scenarios, as summarized in Table 5. In the most challenging zero-shot transfer setting, moving directly from controlled laboratory conditions (NASA) to highly dynamic real-world environments (NDANEV) without target domain training, the model achieves an RMSE of 6.4 (±0.7) cycles. This performance substantially outperforms the TCN-LSTM baseline (9.2 cycles), reducing the generalization error by approximately 22%. Despite the significant distributional shift, the model maintains a high Coverage Probability, with CP reaching 97.0% (±1.1%). This resilience is primarily enabled by the Conformal Recalibration mechanism, which dynamically adjusts uncertainty bounds to compensate for sensor noise and operational discrepancies. In contrast, uncalibrated baselines typically exhibit 25–35% higher MAPE due to their inability to handle uncorrected domain shifts.
The framework’s adaptability is further demonstrated through fine-tuning with minimal data (only 10% of the target set). This procedure further reduces the RMSE to 5.2 (±0.5) cycles. This rapid convergence indicates that the core degradation representations learned by the model, specifically the pressure-gating dynamics and virtual impedance proxies, are physically robust. Minimal adaptation is sufficient to realign these features with EV-specific behaviors, such as mileage-induced stress accumulation and thermal variation.
On average, the cross-dataset RMSE stabilizes at 5.4 (±0.6) cycles, demonstrating the model’s efficacy in bridging both controlled-to-controlled (e.g., CALCE → Oxford, RMSE 4.7 ± 0.5) and controlled-to-operational transfers. These results confirm the framework’s versatility in heterogeneous domains, effectively mitigating the 20–40% error inflation commonly reported in the literature.
Beyond domain shifts, we analyzed the model’s robustness to data availability by explicitly evaluating how uncertainty evolves with reduced training cycles. As presented in Table 6, we conducted a sensitivity analysis by incrementally reducing the training set size from 100% to 20%. The results reveal a vital characteristic of the uncertainty quantification module: as the training data decreases, the Mean Prediction Interval Width (MPIW) expands significantly (from 4.10 to 12.50). This inverse correlation confirms that the model correctly identifies and quantifies the increased epistemic uncertainty arising from information scarcity. Crucially, despite the wider uncertainty bounds, the point prediction accuracy remains robust; even with only 20% of the training data, the RMSE (2.45 cycles) remains within an acceptable range for early-stage prognostics. This “honest” uncertainty estimation—where the model becomes less confident but not catastrophically inaccurate under data constraints—validates the reliability of U-H-Mamba for industrial applications where run-to-failure data is often limited.
Consequently, U-H-Mamba offers a scalable solution for large-scale fleet applications, significantly reducing calibration efforts and retraining overhead.
The remarkable zero-shot performance (RMSE 6.4 cycles) on the NDANEV dataset merits physical interpretation. While voltage and current profiles vary drastically between constant-current laboratory tests and dynamic EV driving cycles, the internal pressure evolution driven by electrode swelling exhibits higher domain invariance. Our ablation studies suggest that the U-H-Mamba model implicitly learns to prioritize these “physically stable” features (Pressure and Cumulative Energy) via the attention mechanism when domain shifts occur. By anchoring the prediction on pressure-aware metrics—which correlate directly with the irreversible loss of active lithium regardless of the discharge profile—the model bypasses the overfitting risks associated with relying solely on superficial voltage transients. This confirms that incorporating physics-informed priors (Pressure) is key to bridging the gap between lab and field data.

4.5. Computational Efficiency

Efficiency metrics on NASA (1000 cycles) demonstrate U-H-Mamba’s deployability, with linear scaling supporting edge BMS integration amid constraints like limited GPU resources in vehicles.
From Table 7, training proceeds at 47 s/epoch (±5) and inference at 0.09 s/sample (±0.01), with 1.3 M parameters and 0.55 GFLOPs—18–25% leaner than PatchTST’s 2.6 M and 1.3 GFLOPs—enabling real-time RUL on resource-limited hardware (GPU memory 6.2 GB), where baselines’ quadratic costs (e.g., PatchTST at 125 s/epoch) prohibit long-sequence processing. Throughput of 11 samples/s (±1) for 1000 sequences surpasses GRU-MC’s 9 (±1), owing to Mamba’s structured kernels avoiding attention bottlenecks, which in the literature enable 2× speedups for battery time-series but here extend to 2.5× with hierarchical optimizations, facilitating scalable fleet monitoring without compromising accuracy.
Beyond these metrics, the architecture is specifically optimized for hardware-constrained edge environments. The linear complexity ($O(N)$) of the Mamba decoder drastically reduces memory bandwidth requirements compared to quadratic Transformer models ($O(N^2)$), making it compatible with mid-range microcontrollers (e.g., ARM Cortex-M7) or low-power FPGAs common in modern BMSs. Furthermore, the hierarchical design serves as an effective data compression engine: the TCN encoder condenses high-frequency raw sampling points into compact latent vectors, effectively shortening the sequence length processed by the backend model. Given that battery aging is a slow-dynamic process, the observed inference latency (<100 ms) comfortably meets the real-time constraints of on-board prognostics—typically required only once per charge cycle—without interfering with high-frequency safety protection loops.

4.6. Interpretability Analysis

SHAP attributions dissect predictions, attributing outputs to inputs via game-theoretic values, fostering trust by linking features to physical mechanisms like pressure-driven fade and enabling root-cause diagnosis beyond black-box outputs.
The global SHAP summary in Figure 10 identifies E_cum as the most influential feature (mean |SHAP| ≈ 0.23), followed by Mileage (≈0.20), both reflecting cumulative usage intensity and long-horizon degradation exposure. These two features together explain more than one-third of total attribution, consistent with their strong Pearson correlations (r ≈ −0.76 and −0.83), and highlight that throughput-driven degradation dominates long-term RUL decline. Impedance-related features (Impedance_real and Impedance_imag) form the next group of contributors (mean |SHAP| ≈ 0.12–0.15), indicating that resistance growth and polarization progressively constrain available capacity. ΔV_cell also shows substantial attribution (≈0.12), reinforcing that imbalance amplification strongly affects mid-life fade. Pressure and temperature indicators (P_mean, P_max, P_min) appear in the second tier (≈0.09–0.12), with negative SHAP values for pressure and positive values for temperature, consistent with swelling-induced overpotential rise and thermally accelerated side reactions.
Local explanations (see Figure 11) further reveal that combinations of high impedance, elevated ΔV_cell, and moderate pressure contribute most significantly to cycle-level RUL reductions, whereas voltage stability or low current variability mitigates these effects. Together, the SHAP analyses confirm that U-H-Mamba leverages both long-range (E_cum, Mileage) and electrochemical state variables (Impedance, ΔV_cell, P_mean, T_max), explaining its superior ability to generalize across datasets and operating regimes.
The dynamic evolution of SHAP values shown in Figure 11d–f reveals a profound connection between the model’s attention mechanisms and the underlying electrochemical degradation stages. In the early degradation stage (e.g., Cycles 1–30), statistical features such as cumulative energy (E_cum) and mean voltage (V_mean) dominate the model’s decision-making process. From an electrochemical perspective, this phase corresponds to the stabilization of the Solid–Electrolyte Interphase (SEI) film. The formation and thickening of the SEI layer primarily consume the cyclable lithium inventory, a process highly correlated with voltage throughput and cumulative energy rather than internal mechanical changes. Consequently, the model correctly prioritizes these electrical signals to track the initial linear capacity fade.
However, as the battery approaches the degradation knee-point (e.g., Cycles 55–85), a distinct shift in feature attribution is observed. The importance of pressure (P_mean) and the virtual impedance proxies (Z_virtual) surges, surpassing simple voltage statistics. Physically, this aligns with the onset of lithium plating and electrode swelling. The accumulation of “dead lithium” on the anode surface causes structural deformation and increases internal pressure, while particle cracking leads to a sharp rise in internal resistance. The model’s ability to autonomously shift its focus to these mechanical and impedance-based features confirms that it has learned to identify the non-linear precursors of rapid failure.
Crucially, regarding the effectiveness of the proposed proxies, the analysis highlights that the virtual impedance proxies consistently rank within the top-3 most important features during the late aging stages, exhibiting a contribution magnitude comparable to the ground-truth pressure features in the BatteryML subset. This validates that the TCN-derived proxies are not merely redundant noise but serve as effective substitutes for physical sensors, successfully capturing the latent impedance growth associated with battery dry-out and contact loss in sensor-limited environments.
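As an illustration of how the global rankings in Figure 10 are obtained, the sketch below computes mean absolute SHAP values per feature and sorts them. The attribution matrix and the three-feature subset are toy values chosen for demonstration, not the study's actual outputs.

```python
import numpy as np

def global_importance(shap_values, feature_names):
    """Rank features by mean |SHAP| across samples, as in a global summary plot."""
    mean_abs = np.abs(np.asarray(shap_values)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]  # descending importance
    return [(feature_names[i], float(mean_abs[i])) for i in order]

# Toy attribution matrix: 4 samples x 3 features (values are illustrative).
names = ["E_cum", "Mileage", "Impedance_real"]
vals = np.array([[ 0.30, -0.20,  0.10],
                 [-0.25,  0.22, -0.08],
                 [ 0.20, -0.18,  0.12],
                 [-0.18,  0.20, -0.10]])
print(global_importance(vals, names)[0][0])  # top-ranked feature: E_cum
```

Note that signs are discarded for the global ranking; the signed values are what distinguish degradation-accelerating contributions (e.g., negative SHAP for pressure) in the local explanations of Figure 11.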

4.7. Limitations and Future Work

Despite the robust performance of the U-H-Mamba framework on standard benchmarks, we acknowledge specific limitations regarding environmental coverage and chemical generalization. The current validation relies primarily on accelerated cycling datasets at standard or elevated temperatures, which minimize resting periods; consequently, the model may overestimate RUL in real-world scenarios by neglecting time-dependent calendar aging during parking and distinct low-temperature degradation mechanisms (e.g., lithium plating). Future work will aim to address these gaps by adopting a hybrid approach that integrates physics-based calendar aging laws (e.g., Arrhenius equation) with data-driven cycle prognostics, and by validating the model on winter operational profiles. Furthermore, extending the framework to LFP chemistries requires additional adaptation to handle their characteristically flat OCV plateaus, potentially through the incorporation of differential voltage analysis (DVA) features to enhance sensitivity.
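A minimal sketch of the proposed hybrid correction is given below, assuming a generic square-root-of-time calendar-aging law with Arrhenius temperature scaling. All rate constants (k_ref, Ea, fade_per_cycle) are hypothetical placeholders for illustration, not fitted values from this study.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)

def calendar_fade(t_days, temp_K, k_ref=4e-4, T_ref=298.15, Ea=5.0e4, z=0.5):
    # Square-root-of-time calendar fade with Arrhenius acceleration relative
    # to a reference temperature. Constants here are illustrative only.
    accel = math.exp((Ea / R) * (1.0 / T_ref - 1.0 / temp_K))
    return k_ref * accel * (t_days ** z)

def corrected_rul(cyclic_rul_cycles, rest_days, temp_K, fade_per_cycle=2e-4):
    # Convert parked-time calendar fade into an equivalent number of "lost"
    # cycles and subtract it from the purely cyclic RUL estimate.
    lost = calendar_fade(rest_days, temp_K) / fade_per_cycle
    return max(0.0, cyclic_rul_cycles - lost)

# Example: 300-cycle cyclic RUL, 100 days parked at 25 °C.
print(corrected_rul(300.0, 100.0, 298.15))
```

This kind of correction only addresses the calendar-aging blind spot; the low-temperature lithium-plating regime would still require dedicated winter validation data.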

5. Conclusions

This study addressed the critical challenge of precise and reliable Remaining Useful Life (RUL) prediction for lithium-ion batteries by proposing U-H-Mamba, a novel uncertainty-aware hierarchical framework. By embedding a Multi-scale Temporal Convolutional Network (TCN) as the base encoder and an enhanced Pressure-Aware Mamba as the state-space decoder, the model successfully decouples short-term electrochemical dynamics from long-term degradation trajectories. A key innovation of this work is the construction of a physics-informed feature space (F = 25), which integrates observable metrics with derived virtual impedance proxies. This design explicitly bridges the information gap between sensor-rich laboratory environments and sensor-limited real-world electric vehicles. Rigorous comparative experiments against nine state-of-the-art baselines demonstrated the framework’s superior performance. Under a strict protocol using identical input features and Bayesian hyperparameter optimization, U-H-Mamba achieved the lowest RMSE across diverse datasets. Crucially, the model exhibited exceptional zero-shot generalization capabilities (RMSE of 6.4 cycles when transferring from laboratory to real-world data), attributed to the conformal recalibration mechanism and the pressure-gating module that captures domain-invariant physical degradation signatures. Interpretability analysis via SHAP values further validated the architectural design, confirming that the model prioritizes physically meaningful features, such as cumulative energy and virtual impedance, rather than relying on spurious correlations. Beyond deterministic accuracy, the hybrid uncertainty quantification strategy provides calibrated probability distributions, offering BMS controllers essential risk assessment capabilities.
Furthermore, the framework’s linear computational complexity and compact footprint (1.3 M parameters) render it highly suitable for diverse deployment scenarios, ranging from real-time inference on edge BMS hardware to scalable fleet analytics in the cloud. Future work will focus on extending this physics-informed state-space modeling approach to pack-level thermal runaway prediction and solid-state battery prognostics.

Author Contributions

Conceptualization, X.L. and Y.C.; Methodology, Z.W., X.L. and H.Z.; Software, Z.W.; Validation, Y.C.; Formal analysis, Z.W., X.L. and Y.C.; Investigation, Z.W., X.L. and H.Z.; Resources, X.L. and Y.C.; Data curation, Z.W. and W.N.; Writing—original draft, Z.W., X.L., H.Z. and W.N.; Writing—review and editing, X.L. and H.Z.; Visualization, Z.W., X.L. and W.N.; Supervision, X.L. and Y.C.; Project administration, X.L. and Y.C.; Funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Shanghai Normal University (SK202123).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets supporting the findings of this study are provided in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Point estimation performance comparison across datasets.

| Dataset | Subset/Cell/Vehicle | Prediction Start | Model | RMSE (cycles) | MAE (cycles) | MAPE (%) | R² | AE_EOL (cycles) | MPE_Knee (%) |
|---|---|---|---|---|---|---|---|---|---|
| NASA | B0005 | Early (30%) | U-H-Mamba | 5.2 ± 0.6 | 3.4 ± 0.4 | 2.8 ± 0.3 | 0.985 ± 0.004 | 3.1 ± 0.4 | 1.9 ± 0.2 |
| NASA | B0005 | Early (30%) | Vanilla Mamba | 6.8 ± 0.7 | 4.6 ± 0.5 | 3.8 ± 0.4 | 0.972 ± 0.005 | 4.3 ± 0.5 | 2.6 ± 0.3 |
| NASA | B0005 | Early (30%) | PatchTST | 7.2 ± 0.8 | 4.9 ± 0.6 | 4.1 ± 0.4 | 0.968 ± 0.006 | 4.6 ± 0.5 | 2.9 ± 0.3 |
| NASA | B0005 | Early (30%) | TCN-LSTM | 8.1 ± 0.8 | 5.5 ± 0.6 | 4.6 ± 0.5 | 0.962 ± 0.007 | 5.2 ± 0.5 | 3.3 ± 0.4 |
| NASA | B0005 | Early (30%) | GRU-MC Dropout | 7.0 ± 0.7 | 4.7 ± 0.5 | 3.9 ± 0.4 | 0.970 ± 0.006 | 4.4 ± 0.4 | 2.7 ± 0.3 |
| NASA | B0005 | Early (30%) | CNN-PSO | 8.5 ± 0.9 | 5.8 ± 0.6 | 4.8 ± 0.5 | 0.958 ± 0.008 | 5.4 ± 0.5 | 3.5 ± 0.4 |
| NASA | B0005 | Early (30%) | XGBoost | 7.8 ± 0.8 | 5.3 ± 0.6 | 4.4 ± 0.5 | 0.960 ± 0.007 | 5.0 ± 0.5 | 3.1 ± 0.3 |
| NASA | B0005 | Early (30%) | CNN-TLSTM | 6.5 ± 0.7 | 4.4 ± 0.5 | 3.6 ± 0.4 | 0.975 ± 0.005 | 4.1 ± 0.4 | 2.4 ± 0.3 |
| NASA | B0005 | Early (30%) | GM-PFF | 6.7 ± 0.7 | 4.5 ± 0.5 | 3.7 ± 0.4 | 0.973 ± 0.005 | 4.2 ± 0.4 | 2.5 ± 0.3 |
| NASA | B0005 | Early (30%) | VMD-SSA-PatchTST | 6.2 ± 0.6 | 4.2 ± 0.5 | 3.5 ± 0.4 | 0.978 ± 0.004 | 3.9 ± 0.4 | 2.3 ± 0.3 |
| NASA | B0005 | Mid (50%) | U-H-Mamba | 3.6 ± 0.4 | 2.4 ± 0.3 | 2.0 ± 0.2 | 0.993 ± 0.002 | 2.1 ± 0.3 | 1.3 ± 0.2 |
| NASA | B0005 | Mid (50%) | Vanilla Mamba | 5.0 ± 0.5 | 3.4 ± 0.4 | 2.8 ± 0.3 | 0.980 ± 0.004 | 3.1 ± 0.3 | 2.0 ± 0.2 |
| NASA | B0005 | Mid (50%) | PatchTST | 5.3 ± 0.6 | 3.6 ± 0.4 | 3.0 ± 0.3 | 0.977 ± 0.005 | 3.3 ± 0.4 | 2.1 ± 0.3 |
| NASA | B0005 | Mid (50%) | TCN-LSTM | 6.5 ± 0.7 | 4.4 ± 0.5 | 3.6 ± 0.4 | 0.968 ± 0.006 | 4.0 ± 0.4 | 2.6 ± 0.3 |
| NASA | B0005 | Mid (50%) | GRU-MC Dropout | 5.7 ± 0.6 | 3.9 ± 0.5 | 3.2 ± 0.4 | 0.974 ± 0.005 | 3.6 ± 0.4 | 2.3 ± 0.3 |
| NASA | B0005 | Mid (50%) | CNN-PSO | 7.2 ± 0.8 | 4.9 ± 0.6 | 4.0 ± 0.4 | 0.962 ± 0.007 | 4.3 ± 0.5 | 2.8 ± 0.3 |
| NASA | B0005 | Mid (50%) | XGBoost | 6.4 ± 0.7 | 4.3 ± 0.5 | 3.5 ± 0.4 | 0.966 ± 0.006 | 4.0 ± 0.4 | 2.5 ± 0.3 |
| NASA | B0005 | Mid (50%) | CNN-TLSTM | 4.8 ± 0.5 | 3.2 ± 0.4 | 2.6 ± 0.3 | 0.982 ± 0.004 | 2.9 ± 0.3 | 1.8 ± 0.2 |
| NASA | B0005 | Mid (50%) | GM-PFF | 5.0 ± 0.5 | 3.4 ± 0.4 | 2.8 ± 0.3 | 0.980 ± 0.004 | 3.1 ± 0.3 | 1.9 ± 0.2 |
| NASA | B0005 | Mid (50%) | VMD-SSA-PatchTST | 4.5 ± 0.5 | 3.0 ± 0.4 | 2.5 ± 0.3 | 0.985 ± 0.003 | 2.7 ± 0.3 | 1.7 ± 0.2 |
| NASA | B0005 | Late (70%) | U-H-Mamba | 2.2 ± 0.3 | 1.5 ± 0.2 | 1.2 ± 0.1 | 0.997 ± 0.001 | 1.3 ± 0.2 | 0.9 ± 0.1 |
| NASA | B0005 | Late (70%) | Vanilla Mamba | 3.2 ± 0.4 | 2.2 ± 0.3 | 1.8 ± 0.2 | 0.991 ± 0.002 | 2.0 ± 0.2 | 1.3 ± 0.1 |
| NASA | B0005 | Late (70%) | PatchTST | 3.4 ± 0.4 | 2.3 ± 0.3 | 1.9 ± 0.2 | 0.990 ± 0.002 | 2.1 ± 0.2 | 1.4 ± 0.1 |
| NASA | B0005 | Late (70%) | TCN-LSTM | 4.2 ± 0.5 | 2.9 ± 0.4 | 2.4 ± 0.3 | 0.984 ± 0.003 | 2.6 ± 0.3 | 1.7 ± 0.2 |
| NASA | B0005 | Late (70%) | GRU-MC Dropout | 3.7 ± 0.4 | 2.5 ± 0.3 | 2.1 ± 0.2 | 0.987 ± 0.003 | 2.3 ± 0.2 | 1.5 ± 0.2 |
| NASA | B0005 | Late (70%) | CNN-PSO | 4.7 ± 0.5 | 3.2 ± 0.4 | 2.6 ± 0.3 | 0.981 ± 0.004 | 2.9 ± 0.3 | 1.9 ± 0.2 |
| NASA | B0005 | Late (70%) | XGBoost | 4.4 ± 0.5 | 3.0 ± 0.4 | 2.5 ± 0.3 | 0.983 ± 0.004 | 2.7 ± 0.3 | 1.8 ± 0.2 |
| NASA | B0005 | Late (70%) | CNN-TLSTM | 3.0 ± 0.3 | 2.0 ± 0.2 | 1.7 ± 0.2 | 0.993 ± 0.002 | 1.8 ± 0.2 | 1.2 ± 0.1 |
| NASA | B0005 | Late (70%) | GM-PFF | 3.2 ± 0.4 | 2.2 ± 0.3 | 1.8 ± 0.2 | 0.991 ± 0.002 | 2.0 ± 0.2 | 1.3 ± 0.1 |
| NASA | B0005 | Late (70%) | VMD-SSA-PatchTST | 2.7 ± 0.3 | 1.8 ± 0.2 | 1.5 ± 0.2 | 0.994 ± 0.002 | 1.6 ± 0.2 | 1.1 ± 0.1 |
| NASA | B0006 | Overall | U-H-Mamba | 3.8 ± 0.4 | 2.5 ± 0.3 | 2.1 ± 0.2 | 0.993 ± 0.002 | 2.3 ± 0.3 | 1.5 ± 0.2 |
| NASA | B0006 | Overall | Vanilla Mamba | 5.2 ± 0.5 | 3.5 ± 0.4 | 2.9 ± 0.3 | 0.979 ± 0.004 | 3.3 ± 0.4 | 2.1 ± 0.3 |
| NASA | B0006 | Overall | PatchTST | 5.5 ± 0.6 | 3.7 ± 0.4 | 3.1 ± 0.3 | 0.976 ± 0.005 | 3.5 ± 0.4 | 2.2 ± 0.3 |
| NASA | B0006 | Overall | TCN-LSTM | 6.7 ± 0.7 | 4.5 ± 0.5 | 3.7 ± 0.4 | 0.967 ± 0.006 | 4.2 ± 0.5 | 2.7 ± 0.3 |
| NASA | B0006 | Overall | GRU-MC Dropout | 5.9 ± 0.6 | 4.0 ± 0.5 | 3.3 ± 0.4 | 0.973 ± 0.005 | 3.7 ± 0.4 | 2.4 ± 0.3 |
| NASA | B0006 | Overall | CNN-PSO | 7.4 ± 0.8 | 5.0 ± 0.6 | 4.1 ± 0.4 | 0.961 ± 0.007 | 4.5 ± 0.5 | 2.9 ± 0.3 |
| NASA | B0006 | Overall | XGBoost | 6.6 ± 0.7 | 4.4 ± 0.5 | 3.6 ± 0.4 | 0.965 ± 0.006 | 4.1 ± 0.4 | 2.6 ± 0.3 |
| NASA | Overall | Overall | U-H-Mamba | 3.7 ± 0.4 | 2.4 ± 0.3 | 2.0 ± 0.2 | 0.993 ± 0.002 | 2.2 ± 0.3 | 1.4 ± 0.2 |
| CALCE | CS2-33 | Early (30%) | U-H-Mamba | 5.5 ± 0.6 | 3.7 ± 0.5 | 3.1 ± 0.4 | 0.983 ± 0.004 | 3.4 ± 0.4 | 2.2 ± 0.3 |
| CALCE | CS2-33 | Early (30%) | GRU-MC Dropout | 7.3 ± 0.8 | 5.0 ± 0.6 | 4.1 ± 0.5 | 0.963 ± 0.007 | 4.7 ± 0.5 | 3.0 ± 0.3 |
| CALCE | CS2-33 | Early (30%) | PatchTST | 7.8 ± 0.8 | 5.3 ± 0.6 | 4.4 ± 0.5 | 0.958 ± 0.008 | 5.0 ± 0.5 | 3.2 ± 0.3 |
| CALCE | CS2-33 | Mid (50%) | U-H-Mamba | 4.3 ± 0.5 | 2.9 ± 0.4 | 2.4 ± 0.3 | 0.991 ± 0.003 | 2.6 ± 0.3 | 1.7 ± 0.2 |
| CALCE | CS2-33 | Mid (50%) | GRU-MC Dropout | 6.1 ± 0.6 | 4.1 ± 0.5 | 3.4 ± 0.4 | 0.974 ± 0.005 | 3.8 ± 0.4 | 2.5 ± 0.3 |
| CALCE | CS2-33 | Mid (50%) | PatchTST | 6.4 ± 0.7 | 4.4 ± 0.5 | 3.6 ± 0.4 | 0.971 ± 0.006 | 4.1 ± 0.4 | 2.6 ± 0.3 |
| CALCE | CS2-33 | Late (70%) | U-H-Mamba | 3.0 ± 0.3 | 2.0 ± 0.3 | 1.7 ± 0.2 | 0.995 ± 0.001 | 1.8 ± 0.2 | 1.2 ± 0.1 |
| CALCE | CS2-33 | Late (70%) | GRU-MC Dropout | 4.2 ± 0.5 | 2.9 ± 0.4 | 2.4 ± 0.3 | 0.984 ± 0.003 | 2.6 ± 0.3 | 1.7 ± 0.2 |
| CALCE | CS2-33 | Late (70%) | PatchTST | 4.4 ± 0.5 | 3.0 ± 0.4 | 2.5 ± 0.3 | 0.983 ± 0.004 | 2.7 ± 0.3 | 1.8 ± 0.2 |
| CALCE | Overall | Overall | U-H-Mamba | 4.2 ± 0.5 | 2.8 ± 0.4 | 2.3 ± 0.3 | 0.992 ± 0.003 | 2.6 ± 0.3 | 1.7 ± 0.2 |
| Oxford | Cell 1 | Overall | U-H-Mamba | 4.0 ± 0.4 | 2.7 ± 0.3 | 2.2 ± 0.2 | 0.992 ± 0.002 | 2.4 ± 0.3 | 1.6 ± 0.2 |
| Oxford | Cell 1 | Overall | TCN-LSTM | 6.0 ± 0.6 | 4.1 ± 0.5 | 3.4 ± 0.4 | 0.975 ± 0.005 | 3.8 ± 0.4 | 2.5 ± 0.3 |
| Oxford | Overall | Overall | U-H-Mamba | 4.1 ± 0.4 | 2.8 ± 0.3 | 2.3 ± 0.2 | 0.992 ± 0.002 | 2.5 ± 0.3 | 1.6 ± 0.2 |
| NDANEV | Vehicle 1 | Early (30%) | U-H-Mamba | 7.0 ± 0.7 | 4.7 ± 0.5 | 3.9 ± 0.4 | 0.977 ± 0.005 | 4.4 ± 0.4 | 2.8 ± 0.3 |
| NDANEV | Vehicle 1 | Early (30%) | CNN-PSO | 10.8 ± 1.0 | 7.2 ± 0.8 | 6.0 ± 0.6 | 0.948 ± 0.008 | 6.7 ± 0.7 | 4.2 ± 0.4 |
| NDANEV | Vehicle 1 | Early (30%) | XGBoost | 9.8 ± 0.9 | 6.5 ± 0.7 | 5.4 ± 0.5 | 0.953 ± 0.008 | 6.0 ± 0.6 | 3.8 ± 0.4 |
| NDANEV | Vehicle 1 | Mid (50%) | U-H-Mamba | 5.6 ± 0.6 | 3.8 ± 0.5 | 3.1 ± 0.4 | 0.984 ± 0.004 | 3.5 ± 0.4 | 2.3 ± 0.3 |
| NDANEV | Vehicle 1 | Mid (50%) | CNN-PSO | 8.4 ± 0.8 | 5.6 ± 0.6 | 4.6 ± 0.5 | 0.959 ± 0.007 | 5.1 ± 0.5 | 3.3 ± 0.3 |
| NDANEV | Vehicle 1 | Mid (50%) | XGBoost | 7.7 ± 0.7 | 5.1 ± 0.6 | 4.3 ± 0.5 | 0.964 ± 0.006 | 4.7 ± 0.5 | 3.1 ± 0.3 |
| NDANEV | Vehicle 1 | Late (70%) | U-H-Mamba | 4.0 ± 0.4 | 2.7 ± 0.3 | 2.2 ± 0.2 | 0.991 ± 0.003 | 2.4 ± 0.3 | 1.6 ± 0.2 |
| NDANEV | Vehicle 1 | Late (70%) | CNN-PSO | 6.2 ± 0.6 | 4.1 ± 0.5 | 3.4 ± 0.4 | 0.974 ± 0.005 | 3.8 ± 0.4 | 2.5 ± 0.3 |
| NDANEV | Vehicle 1 | Late (70%) | XGBoost | 5.7 ± 0.6 | 3.8 ± 0.4 | 3.2 ± 0.3 | 0.977 ± 0.005 | 3.5 ± 0.4 | 2.3 ± 0.3 |
| NDANEV | Overall | Overall | U-H-Mamba | 5.8 ± 0.6 | 3.8 ± 0.5 | 3.2 ± 0.4 | 0.983 ± 0.004 | 3.5 ± 0.4 | 2.3 ± 0.3 |
| BatteryML | Fleet Avg. | Overall | U-H-Mamba | 5.1 ± 0.5 | 3.4 ± 0.4 | 2.8 ± 0.3 | 0.987 ± 0.003 | 3.1 ± 0.3 | 2.0 ± 0.2 |
| BatteryML | Fleet Avg. | Overall | Vanilla Mamba | 6.6 ± 0.7 | 4.5 ± 0.5 | 3.7 ± 0.4 | 0.971 ± 0.005 | 4.2 ± 0.4 | 2.7 ± 0.3 |
| BatteryML | Overall | Overall | U-H-Mamba | 5.2 ± 0.5 | 3.4 ± 0.4 | 2.9 ± 0.3 | 0.986 ± 0.003 | 3.2 ± 0.3 | 2.1 ± 0.2 |

Appendix B

Table A2. Uncertainty quantification (UQ) performance comparison.

| Subset/Cell/Vehicle | Prediction Start | Model | CP (95%) (%) | MPIW (cycles) | NLL | PICP (%) | MIW (cycles) | PICP_68 (%) | MIW_68 (cycles) | CE | Sharpness |
|---|---|---|---|---|---|---|---|---|---|---|---|
| B0005 | Early (30%) | U-H-Mamba | 97.2 ± 1.0 | 10.8 ± 1.2 | 0.56 ± 0.06 | 98.8 ± 0.5 | 15.5 ± 1.8 | 69.8 ± 2.0 | 7.0 ± 0.8 | 0.05 ± 0.01 | 0.13 ± 0.02 |
| B0005 | Early (30%) | MC-CP Hybrid | 95.8 ± 1.2 | 12.5 ± 1.4 | 0.61 ± 0.06 | 97.9 ± 0.8 | 17.8 ± 2.0 | 67.2 ± 2.2 | 8.2 ± 0.9 | 0.07 ± 0.01 | 0.16 ± 0.02 |
| B0005 | Early (30%) | GRU-MC | 92.8 ± 1.5 | 16.2 ± 1.8 | 0.71 ± 0.07 | 96.3 ± 1.0 | 22.2 ± 2.4 | 64.8 ± 2.5 | 10.7 ± 1.2 | 0.09 ± 0.01 | 0.19 ± 0.03 |
| B0005 | Early (30%) | Conformal Only | 94.9 ± 1.3 | 13.7 ± 1.5 | 0.66 ± 0.07 | 97.6 ± 0.9 | 19.2 ± 2.1 | 66.8 ± 2.3 | 9.0 ± 1.0 | 0.08 ± 0.01 | 0.17 ± 0.02 |
| B0005 | Early (30%) | BNN | 94.5 ± 1.4 | 14.2 ± 1.6 | 0.63 ± 0.06 | 97.3 ± 1.0 | 20.2 ± 2.2 | 66.0 ± 2.4 | 9.4 ± 1.1 | 0.08 ± 0.01 | 0.18 ± 0.02 |
| B0005 | Early (30%) | GM-PFF | 96.5 ± 1.1 | 11.5 ± 1.3 | 0.58 ± 0.06 | 98.2 ± 0.7 | 16.5 ± 1.9 | 68.5 ± 2.1 | 7.6 ± 0.8 | 0.06 ± 0.01 | 0.14 ± 0.02 |
| B0005 | Mid (50%) | U-H-Mamba | 98.4 ± 0.8 | 8.4 ± 1.0 | 0.43 ± 0.04 | 99.6 ± 0.2 | 12.7 ± 1.5 | 69.2 ± 1.8 | 5.6 ± 0.7 | 0.04 ± 0.01 | 0.11 ± 0.01 |
| B0005 | Mid (50%) | MC-CP Hybrid | 97.2 ± 1.0 | 10.7 ± 1.2 | 0.56 ± 0.06 | 98.8 ± 0.5 | 15.4 ± 1.8 | 67.7 ± 2.0 | 7.1 ± 0.8 | 0.06 ± 0.01 | 0.14 ± 0.02 |
| B0005 | Mid (50%) | GRU-MC | 94.0 ± 1.5 | 15.0 ± 1.8 | 0.69 ± 0.07 | 97.3 ± 1.0 | 20.5 ± 2.2 | 65.7 ± 2.2 | 9.9 ± 1.1 | 0.08 ± 0.01 | 0.17 ± 0.02 |
| B0005 | Mid (50%) | Conformal Only | 95.8 ± 1.2 | 12.5 ± 1.4 | 0.61 ± 0.06 | 98.0 ± 0.8 | 17.8 ± 2.0 | 67.2 ± 2.1 | 8.3 ± 0.9 | 0.07 ± 0.01 | 0.15 ± 0.02 |
| B0005 | Mid (50%) | BNN | 95.5 ± 1.1 | 11.2 ± 1.3 | 0.53 ± 0.05 | 98.3 ± 0.7 | 16.2 ± 1.9 | 67.5 ± 2.0 | 7.4 ± 0.8 | 0.06 ± 0.01 | 0.14 ± 0.02 |
| B0005 | Mid (50%) | GM-PFF | 96.8 ± 1.0 | 9.5 ± 1.1 | 0.50 ± 0.05 | 98.6 ± 0.6 | 14.0 ± 1.6 | 68.2 ± 1.9 | 6.2 ± 0.7 | 0.05 ± 0.01 | 0.12 ± 0.01 |
| B0005 | Late (70%) | U-H-Mamba | 99.0 ± 0.6 | 6.2 ± 0.7 | 0.36 ± 0.03 | 99.9 ± 0.1 | 9.2 ± 1.0 | 69.7 ± 1.5 | 4.1 ± 0.5 | 0.03 ± 0.01 | 0.09 ± 0.01 |
| B0005 | Late (70%) | MC-CP Hybrid | 97.8 ± 0.8 | 8.2 ± 0.9 | 0.46 ± 0.04 | 99.3 ± 0.3 | 12.2 ± 1.4 | 68.7 ± 1.8 | 5.4 ± 0.6 | 0.05 ± 0.01 | 0.12 ± 0.01 |
| B0005 | Late (70%) | GRU-MC | 95.2 ± 1.2 | 12.2 ± 1.4 | 0.59 ± 0.06 | 97.8 ± 0.8 | 16.7 ± 1.9 | 66.7 ± 2.0 | 8.1 ± 0.9 | 0.07 ± 0.01 | 0.15 ± 0.02 |
| B0005 | Late (70%) | Conformal Only | 96.8 ± 1.0 | 9.7 ± 1.1 | 0.51 ± 0.05 | 98.8 ± 0.5 | 13.7 ± 1.5 | 68.0 ± 1.9 | 6.3 ± 0.7 | 0.06 ± 0.01 | 0.13 ± 0.01 |
| B0005 | Late (70%) | BNN | 96.3 ± 1.1 | 9.2 ± 1.0 | 0.49 ± 0.05 | 98.6 ± 0.6 | 13.2 ± 1.5 | 67.8 ± 2.0 | 6.1 ± 0.7 | 0.05 ± 0.01 | 0.12 ± 0.01 |
| B0005 | Late (70%) | GM-PFF | 97.2 ± 0.9 | 7.5 ± 0.8 | 0.43 ± 0.04 | 99.0 ± 0.4 | 11.0 ± 1.2 | 68.8 ± 1.7 | 5.0 ± 0.6 | 0.04 ± 0.01 | 0.11 ± 0.01 |
| Overall | Overall | U-H-Mamba | 98.4 ± 0.8 | 9.8 ± 1.1 | 0.46 ± 0.05 | 99.6 ± 0.2 | 14.2 ± 1.6 | 69.2 ± 1.9 | 6.5 ± 0.7 | 0.04 ± 0.01 | 0.11 ± 0.01 |
| CS2-33 | Overall | U-H-Mamba | 98.3 ± 0.9 | 9.6 ± 1.1 | 0.46 ± 0.05 | 99.5 ± 0.3 | 14.0 ± 1.6 | 69.0 ± 1.9 | 6.3 ± 0.7 | 0.04 ± 0.01 | 0.11 ± 0.01 |
| CS2-33 | Overall | GRU-MC | 93.6 ± 1.6 | 16.4 ± 1.9 | 0.73 ± 0.08 | 96.8 ± 1.2 | 22.3 ± 2.4 | 65.3 ± 2.3 | 10.9 ± 1.2 | 0.09 ± 0.01 | 0.19 ± 0.02 |
| CS2-33 | Overall | GM-PFF | 96.5 ± 1.0 | 10.5 ± 1.2 | 0.52 ± 0.05 | 98.2 ± 0.7 | 15.5 ± 1.8 | 67.5 ± 2.0 | 7.0 ± 0.8 | 0.06 ± 0.01 | 0.13 ± 0.02 |
| Cell 1 | Overall | U-H-Mamba | 98.4 ± 0.8 | 9.0 ± 1.0 | 0.44 ± 0.04 | 99.6 ± 0.2 | 13.2 ± 1.5 | 69.6 ± 1.7 | 6.0 ± 0.6 | 0.04 ± 0.01 | 0.10 ± 0.01 |
| Vehicle 1 | Overall | U-H-Mamba | 98.3 ± 0.9 | 11.6 ± 1.3 | 0.49 ± 0.05 | 99.7 ± 0.1 | 16.4 ± 1.8 | 68.8 ± 2.0 | 7.7 ± 0.8 | 0.05 ± 0.01 | 0.13 ± 0.01 |
| Vehicle 1 | Overall | Conformal Only | 95.8 ± 1.2 | 13.4 ± 1.5 | 0.63 ± 0.06 | 97.8 ± 0.9 | 18.7 ± 2.1 | 66.8 ± 2.1 | 8.9 ± 1.0 | 0.07 ± 0.01 | 0.15 ± 0.02 |
| Vehicle 1 | Overall | BNN | 95.3 ± 1.3 | 13.0 ± 1.4 | 0.59 ± 0.06 | 98.1 ± 0.8 | 18.1 ± 2.0 | 66.3 ± 2.2 | 8.6 ± 0.9 | 0.06 ± 0.01 | 0.14 ± 0.02 |
| Fleet Avg. | Overall | U-H-Mamba | 98.6 ± 0.7 | 10.4 ± 1.2 | 0.47 ± 0.05 | 99.6 ± 0.2 | 15.0 ± 1.7 | 69.3 ± 1.8 | 6.9 ± 0.8 | 0.04 ± 0.01 | 0.12 ± 0.01 |
| Fleet Avg. | Overall | MC-CP Hybrid | 97.0 ± 1.0 | 12.2 ± 1.4 | 0.58 ± 0.06 | 98.6 ± 0.5 | 17.2 ± 1.9 | 67.8 ± 2.0 | 8.1 ± 0.9 | 0.06 ± 0.01 | 0.14 ± 0.02 |
| All | All | U-H-Mamba | 98.4 ± 0.8 | 9.8 ± 1.1 | 0.46 ± 0.05 | 99.6 ± 0.2 | 14.2 ± 1.6 | 69.1 ± 1.9 | 6.6 ± 0.7 | 0.04 ± 0.01 | 0.11 ± 0.01 |
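For reference, the coverage and interval-width metrics reported above (PICP and MPIW/MIW) follow the standard definitions sketched below; the toy intervals are illustrative only and do not correspond to any row of the table.

```python
import numpy as np

def picp_mpiw(y_true, lower, upper):
    """Prediction Interval Coverage Probability and Mean Prediction Interval
    Width, as commonly defined for UQ evaluation."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    covered = (y_true >= lower) & (y_true <= upper)
    return float(covered.mean()), float((upper - lower).mean())

# Toy example: 4 true RUL values with illustrative 95% intervals.
y  = [100, 95, 90, 85]
lo = [92, 90, 85, 84]
hi = [104, 99, 95, 88]
picp, mpiw = picp_mpiw(y, lo, hi)
print(picp, mpiw)  # all points covered; mean width 8.75 cycles
```

A well-calibrated model keeps PICP at or above the nominal coverage while minimizing MPIW, which is the trade-off the Sharpness and CE columns summarize.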

Appendix C

Table A3 presents a comprehensive comparison of predictive accuracy and uncertainty metrics. The results demonstrate that the augmentation strategy significantly improves robustness. Specifically, the Root Mean Square Error (RMSE) is reduced from 6.92 to 5.81 cycles (16.0% improvement), and the Mean Absolute Error (MAE) decreases by 22.4%. Furthermore, the uncertainty quantification metric, Mean Prediction Interval Width (MPIW), is narrowed by 18.5%, indicating that the model yields not only more accurate but also more confident predictions when exposed to realistic noise patterns.
Table A3. Impact of physics-informed augmentation on NDANEV dataset performance.

| Performance Metric | Description | w/o Augmentation (Baseline) | w/ Augmentation (Proposed) | Improvement |
|---|---|---|---|---|
| RMSE (cycles) | Root Mean Square Error | 6.92 ± 0.41 | 5.81 ± 0.35 | +16.04% |
| MAE (cycles) | Mean Absolute Error | 5.45 ± 0.38 | 4.23 ± 0.29 | +22.39% |
| MAPE (%) | Mean Abs. Percentage Error | 4.82% | 3.65% | +24.27% |
| MPIW | Mean Prediction Interval Width | 12.45 | 10.14 | +18.55% |
| PICP (%) | Prediction Interval Coverage Prob. | 88.5% | 94.2% | +6.44% |

Appendix D

Table A4. Detailed configuration of the U-H-Mamba architecture and training settings.

| Category | Parameter | Symbol | Value |
|---|---|---|---|
| Input Data | Sequence Length (padded) | T | 3600 |
| Input Data | Input Features | F | 25 (23 physical + 2 virtual) |
| TCN Encoder | Kernel Size | k | 3 |
| TCN Encoder | Dilation Factors | d | 1, 2, 4, 8 |
| TCN Encoder | Embedding Dimension | D | 128 |
| TCN Encoder | Dropout Rate | p_drop | 0.1 |
| TCN Encoder | Activation Function | – | ReLU |
| Mamba Decoder | State Dimension | N | 16 |
| Mamba Decoder | Expansion Factor | E | 2 |
| Mamba Decoder | Convolution Kernel | K_conv | 4 |
| Mamba Decoder | Discretization Step | Δ | Learnable (data-dependent) |
| Uncertainty | MC Dropout Samples | K | 100 |
| Uncertainty | Target Coverage | 1 − α | 95% |
| Training | Optimizer | – | AdamW |
| Training | Learning Rate | η | 1 × 10⁻³ (initial) |
| Training | Batch Size | B | 64 |
| Training | Loss Weighting | λ | 0.5 |
| Training | Early Stopping Patience | – | 20 epochs |
| Training | Max Epochs | – | 120 |

Figure 1. Visualization of the data preprocessing pipeline. (a) Overview of the continuous one-day operational profile collected from a real-world electric vehicle, illustrating the temporal variations in total voltage, current, state of charge (SOC), mileage, and temperature; (b) detailed view of representative segments (e.g., discharge/charge cycles) extracted for model input, showing the synchronized evolution of cell-level metrics (cell voltage, cell temperature) alongside pack-level dynamics.
Figure 2. Overview of the U-H-Mamba framework for battery RUL prediction.
Figure 3. Battery degradation analysis. (a) Capacity vs. cycles with knee-point detection for NASA B0005 and B0006; (b) normalized capacity heatmap across all batteries (diamond and square markers denote the knee point and the end-of-life (EOL) cycle, respectively); (c) Savitzky–Golay (S-G) smoothing error distribution and RMSE for four battery cells.
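The knee-point detection in Figure 3a can be illustrated with a common geometric heuristic: after smoothing, the knee is taken as the cycle farthest from the chord joining the first and last points of the capacity curve. This is a minimal sketch, not the paper's exact procedure; the function name `detect_knee` and the synthetic fade curve are assumptions for demonstration.

```python
import numpy as np

def detect_knee(cycles, capacity):
    """Illustrative knee-point locator: return the cycle with maximum
    deviation from the chord joining the first and last points of the
    capacity curve (a 'kneedle'-style heuristic)."""
    x = np.asarray(cycles, dtype=float)
    y = np.asarray(capacity, dtype=float)
    # Normalize both axes so the chord becomes the line yn = xn.
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y[0]) / (y[-1] - y[0])
    # The knee maximizes the vertical deviation from the chord.
    return int(x[np.argmax(np.abs(xn - yn))])

# Synthetic two-phase fade: slow linear loss, then accelerated exponential decay.
cycles = np.arange(1, 169)
capacity = 1.85 - 0.001 * cycles - 0.25 * np.exp((cycles - 168) / 20.0)
knee = detect_knee(cycles, capacity)
```

On this synthetic curve the detected knee falls near the onset of the accelerated-degradation phase, mirroring the markers in Figure 3b.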
Figure 4. Correlation matrix of the 24 candidate observable health features extracted from raw sensor data. Note: The 25th feature (Virtual Impedance) is a latent variable derived by the model and is therefore not shown in these raw data statistics.
Figure 5. Scatter plots of the top six features versus RUL with linear regression fits (r values indicate Pearson correlation coefficients, with p < 0.001 for all; point colors indicate RUL magnitude using a color gradient).
Figure 6. Three-dimensional prediction error landscapes of competing RUL prognostic models. (a) GM-PFF; (b) CNN-PSO; (c) PatchTST; (d) VMD-SSA; (e) CNN-TLSTM; (f) GRU-MC; (g) Vanilla Mamba; (h) XGBoost; (i) TCN-LSTM. The star marks the knee point (the onset of accelerated degradation) on the capacity trajectory.
Figure 7. Comprehensive performance visualization of the proposed U-H-Mamba framework. (a) 3D error landscape; (b) error distribution histogram; (c) NASA dataset performance; (d) CALCE dataset performance; (e) NDANEV dataset performance; (f) lifetime error evolution curve. The star marks the knee point (the onset of accelerated degradation) on the capacity trajectory.
Figure 8. RUL prediction and uncertainty characterization for NASA B0006. (a) Predicted RUL with 95% confidence intervals and magnified pre-knee region; (b) Absolute prediction error over cycles; (c) MAPE evolution highlighting post-knee robustness.
Figure 9. Uncertainty quantification performance of the U-H-Mamba model on selected battery degradation datasets. (a) NASA dataset; (b) CALCE CS2-33 dataset; (c) Oxford Cell 1 dataset; (d) NDANEV Vehicle 1 dataset.
Figure 10. Global SHAP summary plot illustrating feature contributions to RUL prediction.
Figure 11. Comprehensive SHAP-based interpretability analysis for the U-H-Mamba model. (a) Waterfall plot of a representative sample showing how major features contribute to the final prediction; (b) SHAP dependence plot of P_mean colored by T_max; (c) SHAP dependence plot of T_max colored by P_mean; (d) early-stage (Cycle 1–30) feature importance ranking; (e) knee-point region (Cycle 55–85) importance ranking; and (f) late-stage (Cycle 110+) importance ranking.
Table 1. The processed datasets for reproducibility, including total cycles, entities, key features, and mileage spans.

| Dataset | Cycles (Total) | Vehicles/Cells | Key Features | Mileage Span (km) |
| --- | --- | --- | --- | --- |
| NASA | 2500 | 4 cells | Voltage, Current, Temp, Impedance | N/A |
| CALCE | 3200 | 8 cells | SOC, Discharge Profiles, EIS | N/A |
| Oxford | 1800 | 8 cells | Drive Cycles, Temperature | N/A |
| NDANEV | 88,444 | 15 EVs | Mileage, Extreme Temps, Voltage, SOC | 1,200,000+ |
| BatteryML | 50,000 | Fleet (aggregated) | RUL Labels, Pressure, Current | 800,000+ |
Table 2. U-H-Mamba performance across datasets and degradation phases.

| Dataset/Phase | RMSE (Cycles) | MAE (Cycles) | MAPE (%) | R² | AE_EOL (Cycles) | MPE_knee (%) |
| --- | --- | --- | --- | --- | --- | --- |
| NASA B0005—Early (30%) | 5.2 ± 0.6 | 3.4 ± 0.4 | 2.8 ± 0.3 | 0.985 ± 0.004 | 3.1 ± 0.4 | 1.9 ± 0.2 |
| NASA B0005—Mid (50%) | 3.6 ± 0.4 | 2.4 ± 0.3 | 2.0 ± 0.2 | 0.993 ± 0.002 | 2.1 ± 0.3 | 1.3 ± 0.2 |
| NASA B0005—Late (70%) | 2.2 ± 0.3 | 1.5 ± 0.2 | 1.2 ± 0.1 | 0.997 ± 0.001 | 1.3 ± 0.2 | 0.9 ± 0.1 |
| NASA B0006 (Overall) | 3.8 ± 0.4 | 2.5 ± 0.3 | 2.1 ± 0.2 | 0.993 ± 0.002 | 2.3 ± 0.3 | 1.5 ± 0.2 |
| NASA (Average) | 3.7 ± 0.4 | 2.5 ± 0.3 | 2.0 ± 0.2 | 0.993 ± 0.002 | 2.2 ± 0.3 | 1.4 ± 0.2 |
| CALCE CS2-33 (Overall) | 4.2 ± 0.5 | 2.8 ± 0.4 | 2.3 ± 0.3 | 0.992 ± 0.003 | 2.6 ± 0.3 | 1.7 ± 0.2 |
| Oxford Cell 1 (Overall) | 4.1 ± 0.4 | 2.8 ± 0.3 | 2.2 ± 0.2 | 0.992 ± 0.002 | 2.5 ± 0.3 | 1.6 ± 0.2 |
| NDANEV Vehicle 1—Early (30%) | 7.0 ± 0.7 | 4.7 ± 0.5 | 3.9 ± 0.4 | 0.977 ± 0.005 | 4.4 ± 0.4 | 2.8 ± 0.3 |
| NDANEV Vehicle 1—Mid (50%) | 5.6 ± 0.6 | 3.8 ± 0.5 | 3.1 ± 0.4 | 0.984 ± 0.004 | 3.5 ± 0.4 | 2.3 ± 0.3 |
| NDANEV Vehicle 1—Late (70%) | 4.0 ± 0.4 | 2.7 ± 0.3 | 2.2 ± 0.2 | 0.991 ± 0.003 | 2.4 ± 0.3 | 1.6 ± 0.2 |
| NDANEV (Overall) | 5.8 ± 0.6 | 3.8 ± 0.5 | 3.2 ± 0.4 | 0.983 ± 0.004 | 3.5 ± 0.4 | 2.3 ± 0.3 |
| BatteryML Fleet Avg. (Overall) | 5.1 ± 0.5 | 3.4 ± 0.4 | 2.8 ± 0.3 | 0.987 ± 0.003 | 3.1 ± 0.3 | 2.0 ± 0.2 |
| Overall Average (All Datasets) | 4.5 ± 0.5 | 3.0 ± 0.4 | 2.5 ± 0.3 | 0.990 ± 0.003 | 2.7 ± 0.3 | 1.8 ± 0.2 |
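The point-prediction metrics in Table 2 (RMSE, MAE, MAPE, R²) follow their standard definitions; the following is a minimal NumPy sketch. The helper name `rul_metrics` and the toy arrays are illustrative, and the paper-specific AE_EOL and MPE_knee metrics are omitted here.

```python
import numpy as np

def rul_metrics(y_true, y_pred):
    """Standard point-prediction metrics: RMSE, MAE, MAPE (%), and R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # Guard against division by zero near end of life (RUL -> 0).
    mape = 100.0 * np.mean(np.abs(err) / np.maximum(np.abs(y_true), 1e-8))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Toy example: true vs. predicted RUL (in cycles) at six checkpoints.
true_rul = np.array([120.0, 100.0, 80.0, 60.0, 40.0, 20.0])
pred_rul = np.array([118.0, 103.0, 79.0, 62.0, 38.0, 21.0])
m = rul_metrics(true_rul, pred_rul)
```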
Table 3. Uncertainty quantification (UQ) performance of U-H-Mamba across datasets.

| Dataset | CP (95%) | MPIW (Cycles) | NLL | CE | Sharp |
| --- | --- | --- | --- | --- | --- |
| NASA (Overall) | 98.4 ± 0.8 | 9.8 ± 1.1 | 0.46 ± 0.05 | 0.04 ± 0.01 | 0.11 ± 0.01 |
| CALCE CS2-33 (Overall) | 98.3 ± 0.9 | 9.6 ± 1.1 | 0.46 ± 0.05 | 0.04 ± 0.01 | 0.11 ± 0.01 |
| Oxford Cell 1 (Overall) | 98.4 ± 0.8 | 9.0 ± 1.0 | 0.44 ± 0.04 | 0.04 ± 0.01 | 0.10 ± 0.01 |
| NDANEV Vehicle 1 (Overall) | 98.3 ± 0.9 | 11.6 ± 1.3 | 0.49 ± 0.05 | 0.05 ± 0.01 | 0.13 ± 0.01 |
| BatteryML Fleet Avg. (Overall) | 98.6 ± 0.7 | 10.4 ± 1.2 | 0.47 ± 0.05 | 0.04 ± 0.01 | 0.12 ± 0.01 |
| Overall Mean | 98.4 ± 0.8 | 10.1 ± 1.1 | 0.46 ± 0.05 | 0.04 ± 0.01 | 0.11 ± 0.01 |
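The CP and MPIW columns above derive from the conformal step of the hybrid UQ scheme. Below is a minimal split-conformal (Inductive Conformal Prediction) sketch for regression; the function name `icp_interval`, the simulated calibration data, and the omission of the MC-Dropout variance term are all assumptions of this illustration, not the paper's implementation.

```python
import numpy as np

def icp_interval(cal_true, cal_pred, test_pred, alpha=0.05):
    """Split-conformal regression: calibrate an absolute-residual quantile
    on a held-out calibration set, then pad new point predictions
    symmetrically to target (1 - alpha) coverage."""
    residuals = np.abs(np.asarray(cal_true, float) - np.asarray(cal_pred, float))
    n = len(residuals)
    # Finite-sample-corrected quantile rank for (1 - alpha) coverage.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(residuals)[min(k, n) - 1]
    test_pred = np.asarray(test_pred, float)
    return test_pred - q, test_pred + q

# Simulated calibration set: true RUL plus Gaussian model error (sigma = 3 cycles).
rng = np.random.default_rng(0)
cal_true = rng.uniform(0, 150, 200)
cal_pred = cal_true + rng.normal(0, 3.0, 200)
lo, hi = icp_interval(cal_true, cal_pred, test_pred=np.array([60.0, 30.0]))
```

Because the padding is a single residual quantile, the interval width is constant across test points; combining it with MC-Dropout variance, as the framework does, is one way to make the widths input-dependent.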
Table 4. Ablation study on NASA dataset (70/15/15 split).

| Variant | Phase | RMSE | MAE | R² | NLL | Rob_Noise (%) | ΔRMSE vs. Full |
| --- | --- | --- | --- | --- | --- | --- | --- |
| U-H-Mamba (Full Hierarchical) | Early | 5.2 ± 0.6 | 3.4 ± 0.4 | 0.985 ± 0.004 | 0.56 ± 0.06 | 9 ± 1 | — |
| – w/o TCN Encoder (Linear Emb.) | Early | 8.2 ± 0.8 | 5.5 ± 0.6 | 0.958 ± 0.008 | 0.76 ± 0.08 | 26 ± 3 | +58% |
| – w/o Enhanced Mamba | Early | 7.4 ± 0.7 | 5.0 ± 0.6 | 0.963 ± 0.007 | 0.71 ± 0.07 | 21 ± 2 | +42% |
| – w/o Pressure-Aware Gating | Early | 6.7 ± 0.7 | 4.5 ± 0.5 | 0.968 ± 0.006 | 0.66 ± 0.07 | 19 ± 2 | +29% |
| – w/o Hybrid UQ | Early | 5.7 ± 0.6 | 3.8 ± 0.4 | 0.983 ± 0.003 | 0.86 ± 0.09 | 13 ± 1 | +10% |
| – w/o Augmentation | Early | 7.0 ± 0.7 | 4.7 ± 0.5 | 0.966 ± 0.006 | 0.69 ± 0.07 | 23 ± 3 | +35% |
| U-H-Mamba (Full Hierarchical) | Mid | 3.6 ± 0.4 | 2.4 ± 0.3 | 0.993 ± 0.002 | 0.43 ± 0.04 | 6 ± 1 | — |
| – w/o TCN Encoder (Linear Emb.) | Mid | 6.7 ± 0.7 | 4.5 ± 0.5 | 0.963 ± 0.007 | 0.66 ± 0.07 | 16 ± 2 | +86% |
| – w/o Pressure-Aware Gating | Mid | 4.9 ± 0.5 | 3.3 ± 0.4 | 0.978 ± 0.004 | 0.53 ± 0.05 | 11 ± 1 | +36% |
| – w/o Hybrid UQ | Mid | 3.9 ± 0.4 | 2.6 ± 0.3 | 0.991 ± 0.002 | 0.73 ± 0.08 | 8 ± 1 | +8% |
| U-H-Mamba (Full Hierarchical) | Late | 2.2 ± 0.3 | 1.5 ± 0.2 | 0.997 ± 0.001 | 0.36 ± 0.03 | 4 ± 1 | — |
| – w/o TCN Encoder (Linear Emb.) | Late | 4.2 ± 0.5 | 2.8 ± 0.4 | 0.984 ± 0.003 | 0.51 ± 0.05 | 9 ± 1 | +91% |
| – w/o Pressure-Aware Gating | Late | 3.2 ± 0.4 | 2.1 ± 0.3 | 0.991 ± 0.002 | 0.43 ± 0.04 | 6 ± 1 | +45% |
| – w/o Hybrid UQ | Late | 2.4 ± 0.3 | 1.6 ± 0.2 | 0.996 ± 0.001 | 0.56 ± 0.06 | 5 ± 1 | +9% |
Table 5. Cross-dataset generalization performance of U-H-Mamba.

| Training → Test Dataset | Transfer Type | RMSE (Cycles) | MAE (Cycles) | MAPE (%) | R² | CP (95%) |
| --- | --- | --- | --- | --- | --- | --- |
| NASA → CALCE | Zero-Shot | 5.0 ± 0.5 | 3.4 ± 0.4 | 2.8 ± 0.3 | 0.987 ± 0.003 | 97.6 ± 1.0 |
| NASA → CALCE | Fine-Tune (10%) | 4.2 ± 0.5 | 2.8 ± 0.4 | 2.3 ± 0.3 | 0.992 ± 0.003 | 98.3 ± 0.9 |
| CALCE → Oxford | Zero-Shot | 4.7 ± 0.5 | 3.2 ± 0.4 | 2.6 ± 0.3 | 0.989 ± 0.003 | 97.8 ± 0.9 |
| NASA → NDANEV | Zero-Shot | 6.4 ± 0.7 | 4.3 ± 0.5 | 3.5 ± 0.4 | 0.979 ± 0.004 | 97.0 ± 1.1 |
| NASA → NDANEV | Fine-Tune (10%) | 5.2 ± 0.5 | 3.5 ± 0.4 | 2.9 ± 0.3 | 0.986 ± 0.003 | 98.3 ± 0.9 |
| Oxford → BatteryML | Zero-Shot | 5.7 ± 0.6 | 3.8 ± 0.5 | 3.2 ± 0.4 | 0.984 ± 0.004 | 98.0 ± 0.9 |
| Overall Average | — | 5.4 ± 0.6 | 3.6 ± 0.4 | 3.0 ± 0.3 | 0.985 ± 0.004 | 97.8 ± 1.0 |
Table 6. Sensitivity analysis of model performance and uncertainty quantification.

| Training Data Ratio | RMSE (Cycles) | MAE (Cycles) | MPIW (Width) | Coverage Prob. (CP) |
| --- | --- | --- | --- | --- |
| 100% (Baseline) | 1.35 ± 0.05 | 1.08 ± 0.04 | 4.10 | 95.1% |
| 80% | 1.42 ± 0.06 | 1.15 ± 0.05 | 5.40 | 95.4% |
| 60% | 1.65 ± 0.08 | 1.30 ± 0.06 | 7.20 | 95.8% |
| 40% | 1.98 ± 0.10 | 1.52 ± 0.08 | 9.80 | 96.2% |
| 20% | 2.45 ± 0.12 | 1.88 ± 0.10 | 12.50 | 96.5% |
Table 7. Computational efficiency and scalability on NASA (1000-cycle sequences).

| Model | Train Time (s/Epoch) | Inference Time (s/Sample) | Params (M) | GPU Mem (GB) | Throughput (Samples/s) |
| --- | --- | --- | --- | --- | --- |
| U-H-Mamba | 47 ± 5 | 0.09 ± 0.01 | 1.3 | 6.2 | 11 ± 1 |
| Vanilla Mamba | 40 ± 4 | 0.07 ± 0.01 | 0.9 | 5.2 | 15 ± 2 |
| PatchTST | 125 ± 10 | 0.27 ± 0.03 | 2.6 | 10.2 | 4 ± 0.5 |
| TCN-LSTM | 68 ± 7 | 0.13 ± 0.02 | 1.6 | 7.2 | 8 ± 1 |
| GRU-MC Dropout | 58 ± 6 | 0.11 ± 0.02 | 1.1 | 5.7 | 9 ± 1 |
| XGBoost | 32 ± 3 | 0.06 ± 0.01 | 0.6 | 3.2 | 18 ± 2 |
Share and Cite

MDPI and ACS Style

Wen, Z.; Liu, X.; Niu, W.; Zhang, H.; Cheng, Y. U-H-Mamba: An Uncertainty-Aware Hierarchical State-Space Model for Lithium-Ion Battery Remaining Useful Life Prediction Using Hybrid Laboratory and Real-World Datasets. Energies 2026, 19, 414. https://doi.org/10.3390/en19020414
