5.1. Interpretability and Comparison with Related Work
Interpretability is a central requirement for data-driven battery prognostic models, particularly when such models are expected to support operational decisions in safety-critical or resource-constrained applications. Previous studies have primarily addressed interpretability through attention visualization techniques, where attention weights are inspected to identify influential cycles or temporal segments. For instance, attention heatmaps have been employed to illustrate cycle-level relevance in RUL estimation tasks [11].
In contrast, the present study introduces the M-score trend as an additional, complementary interpretability mechanism. Rather than replacing attention-based explanations, the M-score provides a degradation-consistency perspective that enables cross-validation of prediction reliability. By jointly examining the temporal evolution of M-scores and prediction errors, it is observed that smooth and gradually evolving M-score trends are consistently associated with stable and accurate RUL predictions. Conversely, abrupt fluctuations in the M-score often coincide with increased prediction uncertainty. This behavior positions the M-score as a robustness-oriented diagnostic indicator that augments attention-based explanations with an interpretable signal reflecting degradation regularity.
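The degradation-consistency reading of the M-score can be illustrated with a simple trend diagnostic. The sketch below is an illustration only, not the implementation used in this study: `rolling_instability`, the window size, and the flagging threshold are all hypothetical, and the M-score series itself is taken as given.

```python
# Hypothetical sketch: flag abrupt M-score fluctuations as low-confidence regions.
# The M-score values are assumed to be computed upstream; only the trend check is shown.

def rolling_instability(m_scores, window=5):
    """Rolling variance of first differences; large values mark abrupt M-score changes."""
    diffs = [b - a for a, b in zip(m_scores, m_scores[1:])]
    out = []
    for i in range(len(diffs) - window + 1):
        w = diffs[i:i + window]
        mu = sum(w) / window
        out.append(sum((x - mu) ** 2 for x in w) / window)
    return out

def flag_unstable_cycles(m_scores, window=5, threshold=0.01):
    """Return window indices where the M-score trend is abrupt enough to warrant
    treating the corresponding RUL predictions with extra caution."""
    inst = rolling_instability(m_scores, window)
    return [i for i, v in enumerate(inst) if v > threshold]
```

A smooth, gradually evolving M-score series produces no flags, while a series with a sudden jump is flagged, mirroring the qualitative association between M-score regularity and prediction stability described above.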
Table 7 summarizes representative studies that have employed different feature extraction and selection strategies on TRI-based datasets.
Reference [46] introduces an MI-based Bi-LSTM framework that initially extracts 61 handcrafted health indicators from voltage, current, temperature, incremental capacity (IC) curves, and energy-related metrics. These features are then ranked by mutual information, retaining the top seven. While the Bi-LSTM model shows stable training behavior, the experimental evaluation is limited to only four battery cells from the TRI/MIT dataset, which restricts the statistical robustness and generalizability of the results to broader degradation patterns. The model’s performance is also highly sensitive to preprocessing and feature construction choices, such as the selection of voltage windows and IC curve denoising methods. Although inference latency is low, the offline feature extraction and ranking steps increase data preparation overhead and reduce deployment flexibility. Consequently, the approach favors detailed feature engineering over data efficiency and scalability, limiting its practical utility despite promising results on a small, controlled dataset.
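The mutual-information ranking step described for Reference [46] can be sketched as follows. The histogram-based MI estimator, the bin count, and the function names are illustrative assumptions; the cited work's estimator and feature set may differ.

```python
import math
from collections import Counter

def mutual_information(x, y, bins=5):
    """Histogram-based MI estimate between two equal-length numeric sequences.
    A simple stand-in for whatever estimator the cited work actually uses."""
    def digitize(v):
        lo, hi = min(v), max(v)
        span = (hi - lo) or 1.0
        return [min(int((s - lo) / span * bins), bins - 1) for s in v]
    xb, yb = digitize(x), digitize(y)
    n = len(x)
    pxy, px, py = Counter(zip(xb, yb)), Counter(xb), Counter(yb)
    mi = 0.0
    for (i, j), c in pxy.items():
        p = c / n
        mi += p * math.log(p * n * n / (px[i] * py[j]))
    return mi

def top_k_features(features, target, k=7):
    """Rank feature columns {name: values} by MI with the target; keep the top k names."""
    ranked = sorted(features.items(),
                    key=lambda kv: mutual_information(kv[1], target),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```

A feature that tracks the target closely scores high MI, while an uninformative constant feature scores zero, which is the property the ranking step relies on.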
Reference [47] presents the JFO-CFNN framework, which reduces 46 handcrafted health indicators to fifteen features using systematic sampling combined with Jellyfish Optimization. The model is evaluated on a limited set of battery cells (c33–c36), which constrains statistical reliability and limits understanding of how the model performs under varied operating conditions. From an engineering standpoint, the close integration of feature selection with a metaheuristic optimization algorithm increases sensitivity to hyperparameter settings, as feature relevance depends heavily on the optimizer’s control parameters. In addition, the iterative structure of Jellyfish Optimization raises computational demands during training due to repeated fitness evaluations. While the model shows strong accuracy on a small, curated dataset, it emphasizes optimization-driven performance at the expense of transparency, scalability, and deployment efficiency.
Reference [48] proposes a Parallel Feature Fusion Network (PFFN) that combines statistical cycle-level features with domain-informed indicators using parallel transformer encoders and a feature fusion mechanism. As shown in Table 7, the model is trained on 41 batteries, significantly more than studies relying on just a few selected cells, enabling better generalization across varied degradation behaviors. This advantage, however, comes with added architectural and optimization complexity. The framework incorporates multi-head attention, parallel transformer blocks, and Bayesian optimization for hyperparameter tuning, which increases sensitivity to tuning choices and extends the training time. While inference is efficient once the model is trained, the feature selection process is largely implicit, limiting interpretability and making it difficult to trace underlying degradation mechanisms. As a result, although PFFN performs well in data-rich environments, its deployment may be less practical in scenarios where fast adaptation, minimal tuning, or model transparency is essential.
Reference [49] introduces the Positive and Negative Convolution Cross-Connect Neural Network (PNCCN), designed to model battery degradation dynamics directly from voltage, current, and temperature time-series data using specialized PNC and NCC layers. The model is trained and evaluated on 118 lithium-ion cells from the TRI/MIT dataset, with a 60/20/20 split (approximately 71 training, 24 validation, and 23 test cells), offering far broader data coverage than studies limited to a few selected cells. This larger dataset enables exposure to more diverse degradation behaviors but also introduces considerable architectural and training complexity. The framework requires quadratic interpolation to align long time-series data (around 35,000 s per cycle) and training schedules extending up to 2000 epochs, contributing to significant computational cost. Model performance is also highly sensitive to hyperparameters, including convolutional filter configurations, nonlinear interaction settings, and training dynamics. Importantly, the substantial gap between training/validation and test RMSE (9.47 vs. 93.58 cycles) suggests a risk of overfitting, even with the large dataset. As a result, while PNCCN offers strong representational power without relying on internal resistance measurements, its deployment entails a trade-off between exploiting rich data, managing computational demands, and ensuring model robustness.
Reference [50] presents MuRAIN, a multi-time-resolution attention-based interaction network developed for joint estimation of RUL directly from raw cycling data. The model is evaluated on 124 cells from TRI-1 and 45 cells from TRI-2, using a one-third split at the cell level for training, validation, and testing. This yields approximately 41 training cells for TRI-1 and 15 for TRI-2, offering a large and diverse dataset that supports learning across varied degradation patterns. However, this advantage comes with significant architectural and computational complexity. MuRAIN integrates multi-resolution signal patching, stacked multi-head self-attention, and interactive learning modules, components that increase hyperparameter sensitivity, particularly in attention configuration and patch structuring. Additionally, processing raw, high-resolution cycling data through multiple attention layers results in notable training-time delays and memory demands, even though inference is efficient post-training. As a result, while MuRAIN demonstrates robust performance in data-rich settings, its practical use is better suited to high-capacity computing environments than to lightweight or rapidly deployable battery health-monitoring systems.
Transformer-based models have also shown promising performance on large-scale battery datasets; however, their effective deployment typically requires extensive hyperparameter optimization and dataset-specific input representations. In this study, the transformer was intentionally implemented as a parameter-matched but non-optimized baseline, and its performance should therefore be interpreted as indicative rather than exhaustive. By contrast, the proposed Bi-LSTM architecture with dual attention explicitly integrates domain-informed feature selection and attention-based temporal modeling within a unified and interpretable framework. Using 36 MCAS-selected features from an initial pool of 161 indicators, the proposed approach achieves a test RMSE of 43.85 cycles on 40 TRI batteries. These results show that combining adaptive multi-criteria feature selection with interpretable attention mechanisms improves both predictive accuracy and transparency by emphasizing degradation-relevant temporal segments and feature interactions.
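The multi-criteria selection idea behind MCAS can be sketched as a weighted scoring rule over per-feature criteria. The criteria shown (relevance, redundancy, stability) and the weights are illustrative assumptions, not the exact MCAS formulation or the tuned coefficients used in this study.

```python
def mcas_score(relevance, redundancy, stability, w=(0.5, 0.3, 0.2)):
    """Weighted multi-criteria score: reward target relevance and temporal stability,
    penalize redundancy with already-selected features. Weights are hypothetical."""
    w_rel, w_red, w_sta = w
    return w_rel * relevance - w_red * redundancy + w_sta * stability

def select_features(candidates, k):
    """candidates: {name: (relevance, redundancy, stability)} -> top-k names by score."""
    ranked = sorted(candidates, key=lambda n: mcas_score(*candidates[n]), reverse=True)
    return ranked[:k]
```

Under this sketch, a relevant, stable, non-redundant indicator outranks a redundant, noisy one, which is the qualitative behavior the 161-to-36 reduction described above relies on.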
Table 8 presents a comparative evaluation of the proposed MCAS-guided Bi-LSTM framework with dual attention against several recent state-of-the-art methods evaluated on the SNL multi-chemistry dataset.
Reference [51] investigates battery health and lifetime estimation using the SNL dataset, applying a cell-level training and testing strategy that captures degradation variability across a range of operating conditions. This broader data configuration enhances model robustness but also introduces sensitivity to dataset-specific factors, such as protocol diversity and measurement resolution, which affect hyperparameter tuning. The method depends on high-resolution cycling data, which increases computational demands during training, though inference remains efficient once the model is trained. The DegradAI Mixture-of-Experts model shows strong chemistry-adaptive performance, reporting a low mean absolute error (≈2.5 × 10⁻² Ah) and R² values near 0.99 across LFP, NMC, and NCA chemistries. However, since the specific training and testing cell groups and fixed evaluation splits are not clearly defined, RMSE comparisons should be interpreted cautiously. The reported metrics are best understood as dataset-specific indicators rather than directly comparable performance benchmarks.
Reference [52] explores early-cycle lifetime prediction on the SNL dataset, using temperature-dependent health indicators extracted from only the first 10 charge–discharge cycles and a lightweight ElasticNet regression model. This approach significantly reduces training data requirements and supports fast deployment with minimal computational overhead. From an engineering standpoint, the model offers high interpretability and negligible inference latency due to its linear structure. However, reported performance varies widely, with RMSE ranging from 37 to 329 cycles and MAPE between 6% and 17%, reflecting sensitivity to data partitioning and variability in operating conditions. These results highlight a trade-off between model simplicity and predictive accuracy. Given the lack of fixed training groups and consistent evaluation splits, the reported RMSE values should be viewed as context-specific rather than as directly comparable performance benchmarks.
Reference [53] reports strong capacity estimation performance on NCA and NCM cells across varying operating conditions, with RMSE values ranging from approximately 0.009 to 0.028 Ah and R² consistently between 0.98 and 0.99. The feature importance analysis indicates that predictions are primarily influenced by charge/discharge energy metrics and temperature statistics, highlighting the model’s sensitivity to thermal and energetic degradation patterns. From an engineering standpoint, the results reflect effective modeling of chemistry-specific behavior. However, the evaluation is limited to a predefined set of cells and focuses solely on capacity estimation (Ah), rather than on cycle-based lifetime or RUL prediction. As such, Reference [53] is excluded from Table 8, which includes only methods assessed under comparable RUL-focused targets and clearly defined training/testing protocols. While the reported accuracy is high within the study’s specific scope, these results should be interpreted within the context of the SNL operating conditions and target formulation used, and are not directly comparable to RUL-oriented studies.
Despite their respective merits, these approaches primarily rely on static regression formulations or chemistry-specific feature mappings and are therefore limited in their capacity to capture long-range temporal dependencies and complex multi-domain interactions inherent in battery degradation sequences. In contrast, the proposed framework explicitly integrates MCAS-based feature refinement with a dual-attention mechanism, enabling the simultaneous modeling of temporal dynamics and feature-level relevance within a unified recurrent architecture.
Across all SNL chemistries, the proposed model achieves RMSE = 280.5 cycles, MAE = 198.7 cycles, and R² = 0.9623, indicating strong generalization and robustness, particularly under the highly nonlinear degradation behavior observed in NCA cells. While absolute error magnitudes may vary across studies due to differences in target definitions, evaluation protocols, and units of measurement, the proposed framework consistently demonstrates a favorable balance between predictive accuracy and interpretability. Collectively, the results indicate that combining MCAS-guided feature selection with dual-attention-based temporal modeling supports robust and scalable RUL estimation in heterogeneous battery systems.
5.3. Limitations of the M-Score and MCAS Design Choices
Despite its advantages, several limitations of the M-score should be explicitly acknowledged. First, the indicator does not capture mechanistic distinctions among degradation processes such as lithium plating, SEI evolution, or electrode structural failure. Distinct failure processes may produce similar macroscopic signatures in voltage, temperature, or capacity measurements, resulting in comparable M-score responses despite fundamentally different degradation origins.
Second, certain operating conditions may lead to elevated M-score values without corresponding irreversible degradation. Highly dynamic load profiles, abrupt ambient temperature changes, or transient measurement noise can introduce short-term irregularities that increase entropy-related components of the M-score. When considered in isolation, such effects may produce false-positive indications of degradation inconsistency. This limitation motivates the integration of the M-score within the broader MCAS-guided feature selection and attention-based modeling framework, where noisy, redundant, or temporally unstable features are explicitly penalized.
Conversely, under slow and highly uniform degradation regimes, the M-score may exhibit reduced sensitivity, as conventional health indicators based on capacity fade or resistance growth already provide sufficient prognostic information. In such cases, the incremental benefit of degradation-consistency metrics becomes less pronounced.
Regarding the MCAS mechanism, dataset-specific re-optimization of weighting coefficients could potentially improve numerical performance. However, this strategy was deliberately avoided to prevent overfitting by design and to preserve fair cross-dataset comparison. Instead, MCAS weights were fixed following preliminary tuning and consistently applied across all datasets and chemistries, prioritizing robustness and generalizability over dataset-specific optimality.
5.4. Accuracy–Complexity Trade-Off Considerations
While the proposed MCAS-guided Bi-LSTM framework with dual attention delivers strong predictive performance, its practical use must be considered alongside model complexity. The final architecture includes approximately 1.9 million trainable parameters—striking a deliberate balance between expressive capacity and generalization, particularly for modeling the nonlinear dynamics of battery degradation. To assess this design trade-off, lighter variants with reduced Bi-LSTM depth and smaller hidden dimensions were also evaluated. These models significantly lower the parameter count and computational cost but consistently show reduced accuracy and greater sensitivity to temporal noise and feature redundancy, especially in cases involving delayed or nonlinear aging patterns.
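The stated parameter budget can be sanity-checked with a standard Bi-LSTM parameter count. The layer widths below are hypothetical (the text reports only the 36-feature input and the roughly 1.9 million total); under these assumed sizes the recurrent layers alone account for about 1.4 million parameters, with the attention and output heads presumably contributing the remainder.

```python
def bilstm_params(input_dim, hidden):
    """Parameters of one bidirectional LSTM layer:
    4 gates x (input weights + recurrent weights + bias), times 2 directions."""
    per_direction = 4 * (hidden * input_dim + hidden * hidden + hidden)
    return 2 * per_direction

# Hypothetical stack: 36 MCAS-selected features in; each layer's input is the
# previous layer's bidirectional output (2 x hidden), with widths decreasing.
layers = [(36, 256), (512, 128), (256, 64)]  # (input_dim, hidden) per Bi-LSTM layer
total = sum(bilstm_params(i, h) for i, h in layers)
```

This kind of count makes the accuracy–complexity comparison with lighter variants concrete: halving the hidden sizes roughly quarters the recurrent parameter count.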
Crucially, the RMSE improvements observed in the proposed model are not simply a result of increased size. Instead, they stem from the effective interaction between MCAS-based feature refinement and the dual attention mechanism, which jointly enhances temporal and feature-level representation. This combination allows the model’s added capacity to be used meaningfully rather than redundantly.
From a deployment perspective, the proposed architecture is suitable for offline training and cloud-based prognostic applications, in which accuracy and interpretability take precedence over strict computational constraints. Meanwhile, the lighter variants offer practical alternatives for resource-limited or real-time BMS applications. As shown in Figure 13, performance gains tend to plateau beyond a certain model complexity unless guided by informed feature selection, highlighting the key role of MCAS in optimizing the trade-off between accuracy and complexity.
From a theoretical perspective, the computational complexity of the proposed architecture is primarily driven by the stacked Bi-LSTM layers and the dual attention mechanism. For a sequence of length T, hidden dimension H, and attention key dimension d_k, each Bi-LSTM layer has a time complexity of O(T·H²), due to the recurrent operations in both forward and backward directions. The dual multi-head attention module adds further complexity of O(T²·d_k). In this design, these costs are carefully managed through several measures: the sequence length is fixed at a moderate T = 100 cycles, the number of hidden units is progressively reduced across Bi-LSTM layers, and the attention modules are configured with smaller key dimensions to avoid excessive overhead. As a result, the quadratic cost of attention does not dominate the overall computational load.
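A rough operation-count comparison illustrates why attention does not dominate at this sequence length. Only the 100-cycle sequence length comes from the text; the hidden size and key dimension below are assumed values, and the constant factors are order-of-magnitude approximations rather than exact FLOP counts.

```python
def bilstm_ops(T, H):
    """Approximate per-layer cost of a Bi-LSTM: O(T * H^2), with a factor of 8
    for the four gates' input and recurrent matmuls, doubled for both directions."""
    return 2 * 8 * T * H * H

def attention_ops(T, d_k):
    """Approximate cost of self-attention scoring: O(T^2 * d_k),
    counted twice for the QK^T and attention-times-V products."""
    return 2 * T * T * d_k

# Illustrative sizes: only T = 100 is from the text; H and d_k are assumptions.
T, H, d_k = 100, 128, 32
assert attention_ops(T, d_k) < bilstm_ops(T, H)
```

With these assumed sizes the recurrent term is tens of millions of operations per layer while the quadratic attention term is under a million, consistent with the claim that the attention cost remains secondary at T = 100.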
The MCAS feature selection stage is conducted offline before training and does not impact inference-time complexity. In practice, training is performed offline, while inference requires only a single forward pass, with sub-second latency on CPUs and millisecond-level latency on GPUs. Compared to transformer-only models, whose complexity scales quadratically with sequence length, the proposed hybrid recurrent–attention architecture offers a more balanced trade-off between model expressiveness and computational efficiency. This makes it well-suited for practical battery prognostics applications, where any accuracy improvement must be weighed against computational cost.
Table 9 summarizes the computational performance of the proposed model on representative CPU and GPU platforms. While CPU-based execution enables feasible training and inference, GPU acceleration provides a substantial reduction in training time, achieving approximately an 8–10× speedup depending on dataset size. Inference latency is still significantly lower than training cost in both configurations, with sub-second execution on CPU and millisecond-level latency on GPU. These results show that the proposed model can be efficiently deployed in practical scenarios, particularly in cloud-assisted or edge-computing-based BMSs.