Hybrid Neural Networks for Enhanced Predictions of Remaining Useful Life in Lithium-Ion Batteries

Abstract: With the proliferation of electric vehicles (EVs) and the consequent increase in EV battery circulation, accurate assessment of battery health and remaining useful life (RUL) is paramount, driven by environmental and sustainability goals. This study addresses this concern by employing data-driven methods, specifically deep learning techniques, to enhance RUL estimation for lithium-ion batteries (LIB). Leveraging the Toyota Research Institute dataset, consisting of 124 lithium-ion batteries cycled to failure and encompassing key metrics such as capacity, temperature, resistance, and discharge time, our analysis substantially improves RUL prediction accuracy. Notably, the convolutional long short-term memory deep neural network (CLDNN) model and the transformer-LSTM (temporal transformer) model emerged as the standout RUL predictors. The CLDNN model achieved a mean absolute error (MAE) of 84.012 and a mean absolute percentage error (MAPE) of 25.676. Similarly, the temporal transformer model achieved an MAE of 85.134 and a MAPE of 28.7932. These results were obtained by applying Bayesian hyperparameter optimization, which further enhanced the accuracy of the predictive methods. The models were benchmarked against existing approaches, demonstrating superior results with an improvement in MAPE ranging from 4.01% to 7.12%.


Introduction
The prediction of remaining useful life (RUL) for lithium-ion batteries is a critical task for ensuring the safe and optimal operation of battery packs, especially in applications like electric vehicles (EVs) [1]. The literature describes two primary approaches for RUL prediction: physics-based models (PBMs) and data-driven models (DDMs). PBMs leverage the fundamental principles of electrochemistry and battery physics to simulate battery behaviour over time [2]. These models consider factors such as ion diffusion, electrode reactions [3], discharge capacity [4,5], cycles, capacity fade [6], and thermal effects [7]. While PBMs provide valuable insights into degradation mechanisms, they are computationally intensive, require detailed knowledge of electrochemical processes [2,8], and may struggle to capture real-world complexities. DDMs, on the other hand, use machine learning algorithms to learn patterns and relationships directly from available data [9,10]. They have gained prominence due to their ability to capture complex and nonlinear relationships in the data, making them more adaptable and flexible than PBMs.
DDMs in machine learning are broadly categorized into two distinct groups: statistical machine learning and deep learning, each offering unique strengths and applications in various domains. Statistical machine learning, often referred to as shallow learning, typically encompasses models that are less complex and computationally intensive. This category includes a range of methods such as support vector machines (SVMs) [10], Gaussian process regression (GPR) [11][12][13], random forest [14,15], and Bayesian approaches [16][17][18]. These techniques are particularly effective for smaller datasets where prior knowledge about the generative process of the data is available. However, they often encounter challenges when addressing complex scenarios, such as accurately capturing intricate battery characteristics and long-term dependencies within datasets [2,19].
In contrast, deep learning methods, which utilize neural networks with multiple hidden layers, are adept at handling large, complex datasets, often where there is limited a priori knowledge about the underlying processes or the most suitable features. This category includes advanced techniques like recurrent neural networks (RNNs) [20,21], convolutional neural networks (CNNs) [22,23], and various hybrid models [24,25]. Deep learning approaches are renowned for their ability to discern and learn from intricate patterns present in raw data, making them particularly effective in the analysis of multivariate time-series information. This capability makes them especially valuable in fields where understanding and predicting complex behaviours over time is crucial.
Recent advancements in sequence-to-sequence deep learning in the domain of natural language processing further contribute to the discourse on multi-horizon time-series forecasting (MTSF). Sutskever et al. introduced a powerful end-to-end approach employing multilayered long short-term memory (LSTM) networks for sequence learning, showcasing impressive results in translation tasks [26]. Similarly, Yang et al. explored incorporating a cross-entity attention mechanism in MTSF [27]. This method minimizes assumptions on sequence structures, particularly in tasks where large labelled training sets are available. Time-series forecasting has also been adopted in RUL prediction, as shown by [28], which introduces a Capsule Neural Network-LSTM, a hybrid model for precise RUL prediction in mechanical systems that enhances sensitivity to spatial features in time-series sensor data.
However, in the context of RUL prediction, improvements in robustness, generalizability, and addressing challenges like variable sampling rates and incomplete data remain areas of focus [29]. Hence, despite the progress in RUL prediction, current research has several knowledge gaps and limitations. These include (1) the need for more robust and accurate models with enhanced generalizability, (2) the comprehensive exploration of various deep learning architectures, (3) the utilisation of complete datasets with varying sampling rates, and (4) the consideration of factors like battery ageing and non-stationary signals.

Rationale for Advanced Model Architectures over Regression
The progression in methodologies for predicting the remaining useful life (RUL) of lithium-ion batteries underscores a pivotal shift toward enhancing the operational efficiency and reliability of battery-powered systems. Building upon the groundwork laid by Severson et al. (2019) [30], who utilized regression-based methods for early-cycle data prediction, our study uses the same dataset for advanced machine learning exploration. Diverging significantly, we employ sophisticated hybrid deep learning architectures, such as convolutional long short-term memory deep neural networks (CLDNN) and temporal transformers, which are adept at intricately modelling the spatial and temporal dynamics of battery degradation. This strategic departure from linear regression to deep learning is instrumental in capturing the multifaceted and nonlinear aspects of battery ageing, thereby elevating the precision and dependability of our predictions within a dynamic 100-cycle window. The rationale behind exploring temporal deep learning approaches based on neural networks over vanilla regression lies in their superior ability to process and learn from sequential data, providing a nuanced understanding of degradation patterns that enhance predictive accuracy and reliability, and broadening the scope for practical applications in battery life-cycle management.
Further, we refine the dataset through a rigorous feature engineering and selection process, including techniques such as linear interpolation and principal component analysis (PCA), which enhance model training and provide a broader application range. Coupled with precise data preprocessing and sampling techniques, such as stratified random sampling and outlier removal, our methodology supports robust and dependable predictions across varied battery use cases. Bayesian optimization is incorporated for hyperparameter tuning to achieve optimal model performance, tailored specifically to the dataset's nuances (an overview can be seen in Figure 1). The dataset encompasses crucial parameters such as discharge capacity, temperature, internal resistance, and discharge time.
To effectively address the heterogeneity within our dataset, we adopted a hybrid approach known as CLDNN proposed by Sainath et al. [31], which stands for convolutional, long short-term memory, and dense neural networks. CLDNN harnesses the collective power of these neural network architectures, providing a solution to the multifaceted nature of LIB RUL prediction. In addition, we repurposed the hybrid model called the temporal transformer (TT) proposed by Chadha et al. [32] to enhance prediction accuracy and robustness in LIB RUL. The TT model combines the strengths of transformer self-attention layers and LSTM architectures, presenting a unique approach to sequential modelling that effectively addresses the challenge of capturing long-term dependencies [33]. While the temporal transformer shares a name with the temporal fusion transformer (TFT) introduced by Lim et al. [34], it is important to note their architectural differences. The TFT is designed for handling multi-modal data, allowing it to incorporate various relevant features for forecasting tasks. In contrast, our dataset did not require such multi-modal capabilities, leading to divergent architectural choices in our models.
The remainder of this article is structured as follows. In Section 2, we delve into the related works within the field of lithium-ion battery RUL prediction, focusing on deep-learning approaches. Section 3 provides a comprehensive overview of our proposed methodology, followed by a detailed account of the experimental procedure. Section 4 is dedicated to discussions regarding the outcomes of our experiments. Lastly, Section 5 closes this article with our conclusion and future work.

Related Works
Physics-based models (PBMs) provide valuable insights into the underlying mechanisms of lithium-ion batteries but often face challenges when predicting their remaining useful life (RUL) [2]. These models perform optimally within narrow domains, requiring a detailed understanding of battery electrochemical processes, which can be time-consuming and computationally intensive [3,8,35]. The simplified assumptions often made in these models limit their ability to generalize, affecting RUL prediction accuracy. Furthermore, parametrisation in PBMs is challenging, due to manufacturing variations and the difficulty of obtaining precise model parameters [36,37]. These models also exhibit limited adaptability to dynamic environments and high computational complexity, which hampers their real-time applicability. Calibration and validation procedures for these models demand extensive data and resources [38]. Consequently, data-driven methods like machine learning have gained prominence for their ability to learn complex nonlinear relationships from data, overcoming uncertainties, and providing a promising alternative for RUL prediction in LIB [39][40][41][42].

Temporal Models
Temporal models, like recurrent neural networks (RNNs) and more advanced variants such as LSTM cells and gated recurrent units (GRUs), are employed to capture the sequential nature of battery data. Lipu et al. [20] introduced a nonlinear auto-regressive with exogenous input (NARX)-based neural network (NARXNN) for state of charge (SOC) estimation, utilizing a lightning search algorithm (LSA) to optimise hyperparameters. Signal decomposition techniques like discrete wavelet transform (DWT), empirical mode decomposition (EMD), and variational mode decomposition (VMD) are combined with the NARX model in [46] to predict capacity degradation trajectories. In [47], Zheng et al. employed a deep LSTM network followed by multiple fully connected (FC) layers, whereas Wang et al. [48] introduced a bidirectional LSTM architecture with additional FC layers. Chemali et al. [21] proposed an RNN architecture with LSTM for RUL estimation, achieving low mean absolute error (MAE) values and demonstrating generalization across different conditions.

Convolutional Models
Shen et al. [23] presented a deep convolutional neural network (DCNN) that achieved higher accuracy than traditional methods, such as filter-based models [49][50][51][52]. However, limitations include the fixed-size input matrix and the limited interpretability of the method. Following a similar approach, Li et al. [53] proposed another DCNN, employing a time window approach for sample preparation and feature extraction. In another work [54], Babu et al. proposed a novel deep CNN-based regression method for RUL estimation. Similarly, Shen et al. [14] proposed a DCNN with transfer learning (DCNN-TL) model, which was then integrated with ensemble learning to form DCNN-ETL. The effectiveness of the DCNN-ETL model was then benchmarked against five other data-driven methods, including random forest regression, Gaussian process regression, and DCNN. In another study, a temporal convolutional network (TCN) model was used along with causal and dilated convolutions to capture local capacity degradation, but its limitations include uncertain applicability to other battery types and real-time implementation challenges [24].

Hybrid Models
Ren et al. proposed an auto-CNN-LSTM model for RUL estimation, incorporating an auto-encoder to enhance feature vectors through dimensionality reduction and feature learning, achieving accurate predictions [25]. Identified knowledge gaps include the need for enhanced generalizability, further exploration of various temporal networks, and improved utilization of datasets with varying feature sampling rates. Additionally, it is suggested that future research should address the ageing effects, the impact of non-stationary signals, the incorporation of relevant features, and the capture of spatial information. Li et al. introduced a directed acyclic graph network that combined the LSTM with the CNN, offering a novel approach to RUL prediction [55]. Tan et al. focused on a multi-variate time-series approach using a lightweight CNN with attention mechanisms to enhance prediction accuracy [56]. Hybrid models, such as the one combining a GRU and a CNN for state of health (SOH) estimations, presented by Fan et al., and the attention-assisted temporal convolutional memory-augmented network (ATCMN) for RUL prediction from limited data, proposed by Fei et al., demonstrate promise but require further exploration and rigorous testing on more diverse datasets [57,58]. In previous work, the effectiveness of CNN-LSTM and CNN-LSTM-Differential-Neural-Computer models was demonstrated, showcasing not only superior predictive accuracy but also efficiency in learning temporal dependencies with fewer epochs [59,60].
In the following section, we aim to address critical knowledge gaps identified in the literature regarding the prediction of the RUL of LIBs. These gaps encompass issues of robustness, accuracy, and generalizability in DDMs for battery management systems (BMSs). To overcome these limitations, we developed end-to-end hybrid models capable of utilizing complete datasets [61], including features with varying sampling rates. By doing so, we intend to enhance the performance metrics for RUL estimation in LIBs and move closer to enabling real-time implementation of these models within BMSs for practical applications. Additionally, we explored various deep learning approaches, such as convolutional neural networks (CNNs) [62], long short-term memory cells (LSTMs) [33], the Transformer network [63], autoencoders [64], neural Turing machines [65], differentiable neural computers [66], and hybrid models, while considering the impact of ageing, non-stationary signals, relevant feature incorporation, spatial information capture, and variable ambient temperature conditions.

Methodology
This section presents the framework (Figure 2) of the proposed LIB RUL prediction method. An important part of this research was the optimization of model hyperparameters using Bayesian optimization techniques aimed at maximizing the models' predictive accuracy. The primary objective was to develop a robust predictive model for estimating the RULs of LIBs, accounting for intricate temporal battery degradation dynamics and feature variations in sensor measurements across different battery batches. The study pipeline begins with a dataset comprising 124 commercial LIBs that have undergone extensive cycling until failure. Next, we implement feature selection to extract crucial features that enhance the accuracy of RUL predictions. Following this, a series of data preprocessing techniques are employed to clean and compress the dataset. Subsequently, stratified random sampling is utilized to create a representative sample. Finally, we develop a variety of hybrid deep network architectures, apply hyperparameter optimization, and evaluate these models using a testing set to determine the best-performing model.

Dataset Description
The dataset employed in this study consists of 124 commercial lithium iron phosphate (LFP)/graphite lithium-ion batteries, each with a nominal capacity of 1.1 Ah and a nominal voltage of 3.3 V. These batteries were subjected to cycling tests until failure under conditions aimed at fast charging within a temperature-controlled environment set at 30 °C. Our dataset is derived from extensive data collection efforts, capturing critical parameters such as voltage, current, temperature, and internal resistance, which were continuously measured throughout the cycling process. The term 'continuously measured' refers to the collection of data at frequent, regular intervals throughout each battery's charging and discharging cycles. This high-resolution data capture allows for an in-depth analysis of battery performance over time, enabling the precise modelling of degradation patterns. The charging protocol involved an initial phase utilizing a current of C1 until reaching a state-of-charge (SOC) of S1, followed by a transition to a current of C2 until achieving a SOC of S2, consistently maintained at 80% for all cells. Subsequently, a constant current-constant voltage (CC-CV) charging technique was applied for the transition from 80% to 100% SOC at a rate of 1 C, culminating at a cut-off voltage of 3.6 V [67]. The batteries' lifetimes, defined by the cycle count at which the capacity declined to 80% of its initial value, varied from 150 to 2300 cycles, showcasing a diverse range of degradation behaviors.

Stratified Random Sampling
Stratified random sampling was employed to create a representative dataset for the model's training and testing. This approach ensured that cells from different battery batches, characterised by varying quality control protocols, were proportionally included in the dataset. By preventing models from overfitting to specific batch attributes, this method improved model robustness and generalisation across different LIB batches.
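As a rough illustration of the idea, the sketch below splits cells so that every batch contributes the same share to the test set. The batch sizes, the cell identifiers, the 20% test fraction, and the helper name `stratified_split` are all assumptions made for this example; the article does not report them.

```python
import random
from collections import defaultdict

def stratified_split(cell_ids, batch_of, test_fraction=0.2, seed=0):
    """Split cells into train/test so each batch keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_batch = defaultdict(list)
    for cell in cell_ids:
        by_batch[batch_of[cell]].append(cell)

    train, test = [], []
    for batch in sorted(by_batch):
        cells = sorted(by_batch[batch])
        rng.shuffle(cells)
        # At least one cell per batch goes to the test set.
        n_test = max(1, round(len(cells) * test_fraction))
        test.extend(cells[:n_test])
        train.extend(cells[n_test:])
    return train, test

# Hypothetical layout: 124 cells over three batches (sizes are illustrative only).
batch_sizes = {"batch1": 41, "batch2": 43, "batch3": 40}
cells, batch_of = [], {}
for batch, size in batch_sizes.items():
    for i in range(size):
        cell = f"{batch}_c{i}"
        cells.append(cell)
        batch_of[cell] = batch

train_cells, test_cells = stratified_split(cells, batch_of, test_fraction=0.2, seed=42)
```

Splitting at the level of whole cells, rather than individual cycles, keeps all cycles of one battery on the same side of the split, which avoids leakage between training and testing.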

Feature Selection
The dataset consists of three batches of approximately 48 lithium-ion battery cells each, which we analysed to identify features critical for predicting the remaining useful life (RUL) of the batteries. Our analysis focuses on key parameters, which include linearly interpolated discharge capacity (Qdlin) and linearly interpolated temperature (Tdlin). Linear interpolation gives these features a consistent sampling rate across all cells and aligns the data on a consistent timeline, which is critical for accurate RUL prediction and for the reliability of time-series forecasting models. The selection also encompasses internal resistance (IR) and discharge characteristics, which are vital for evaluating battery health and degradation. Together, these features provide descriptive, summary, and cycle data, offering a holistic view of the batteries' operational conditions, performance metrics, and cycle-by-cycle behaviour. This supports a nuanced analysis, leveraging Qdlin and Tdlin alongside the other selected parameters to assess battery health and predict lifespan. The output metric, remaining cycles, is calculated as the number of cycles left until each battery's capacity falls to 80% of its original value, serving as a key indicator of its useful life.
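The two operations described here, resampling a per-cycle curve onto a fixed grid and deriving the remaining-cycles label from the 80% capacity threshold, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the choice of the x-axis for interpolation is left to the caller.

```python
import numpy as np

def resample_curve(x, y, n_points=1000):
    """Linearly interpolate y(x) onto n_points evenly spaced values of x."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    order = np.argsort(x)                     # np.interp needs increasing x
    x_sorted, y_sorted = x[order], y[order]
    grid = np.linspace(x_sorted[0], x_sorted[-1], n_points)
    return grid, np.interp(grid, x_sorted, y_sorted)

def remaining_cycles(capacity_per_cycle, threshold=0.8):
    """Cycles left until capacity first drops to threshold * initial capacity."""
    cap = np.asarray(capacity_per_cycle, float)
    below = np.nonzero(cap <= threshold * cap[0])[0]
    # If the threshold is never reached, treat the last recorded cycle as end of life.
    end_of_life = int(below[0]) if below.size else len(cap) - 1
    return np.maximum(end_of_life - np.arange(len(cap)), 0)

# Toy example: irregularly sampled curve resampled onto five points.
grid, values = resample_curve([0, 2, 1], [0, 20, 10], n_points=5)

# Toy capacity fade (Ah per cycle); end of life is the first cycle at or below 0.88 Ah.
labels = remaining_cycles([1.10, 1.05, 1.00, 0.95, 0.90, 0.85, 0.80])
```

Using the same `n_points` for every cycle and every cell is what gives all samples an identical length, regardless of how many raw measurements each cycle contained.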

Data Preprocessing
To prepare the data for deep neural networks, we implemented a comprehensive preprocessing pipeline. Outliers in internal resistance, discharge time, and discharge quantity were removed using the fill outliers function with the cubic spline method and a moving average window of 100 (Figure 3a, Figure 3b, and Figure 3c, respectively). These figures show the features before and after preprocessing (note: the plots depict the first cell in the first batch). Smoothing techniques, such as moving average filters with a window size of 15, were applied to the discharge data. To capture temporal dynamics in Qdlin and Tdlin, the sampling rate was standardized to 1000 entries per cycle. Finally, principal component analysis (PCA) was employed to reduce the dimensionality of Qdlin and Tdlin while preserving their salient features.
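A simplified version of these three steps is sketched below using NumPy only. It is an approximation under stated assumptions: outliers are flagged with a three-sigma rule around a moving average (the detection threshold is not given in the text), linear interpolation stands in for the cubic-spline fill, and PCA is computed through an SVD. All function names are hypothetical.

```python
import numpy as np

def moving_average(x, window):
    """Approximately centred moving average; edges are padded with the end values."""
    x = np.asarray(x, float)
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(x, (pad_left, pad_right), mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def fill_outliers(x, window=100, n_sigmas=3.0):
    """Replace points far from the local moving average using their neighbours."""
    x = np.asarray(x, float)
    residual = x - moving_average(x, window)
    local_std = np.sqrt(moving_average(residual ** 2, window))
    is_outlier = np.abs(residual) > n_sigmas * local_std
    idx = np.arange(len(x))
    cleaned = x.copy()
    # Assumption: linear interpolation replaces the cubic-spline fill named in the text.
    cleaned[is_outlier] = np.interp(idx[is_outlier], idx[~is_outlier], x[~is_outlier])
    return cleaned, is_outlier

def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components (via SVD)."""
    X = np.asarray(X, float)
    centred = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return centred @ vt[:n_components].T, explained[:n_components]

# Synthetic internal-resistance trace with one injected spike.
ramp = np.linspace(0.017, 0.018, 500)
ir = ramp.copy()
ir[250] += 0.01
cleaned, mask = fill_outliers(ir, window=100)
smooth = moving_average(cleaned, 15)

# Synthetic stand-in for Qdlin: 20 cycles x 1000 points with two underlying modes.
rng = np.random.default_rng(0)
base = np.linspace(0.0, 1.0, 1000)
qdlin = (np.outer(rng.normal(size=20), base)
         + np.outer(rng.normal(size=20), base ** 2)
         + 1e-4 * rng.normal(size=(20, 1000)))
scores, explained = pca_reduce(qdlin, n_components=2)
```

The number of retained components is a free choice in this sketch; in practice it would be set from the explained-variance ratio returned alongside the scores.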

RUL Estimation
Traditional supervised learning approaches rely on labelled training data, where each data point is associated with a known target value. However, in the context of prognostics, this assumption often does not hold. The remaining useful life (RUL) of individual components is typically not known a priori, making it challenging to train predictive models using standard regression techniques. To overcome this hurdle, researchers have explored alternative approaches, such as employing physics-based models or utilizing machine learning algorithms to estimate the RUL of training data. While assigning a constant RUL value to all training points may seem like a straightforward solution, this can lead to inaccurate representations of the actual degradation patterns and hinder the model's ability to generalize effectively. A more sophisticated approach involves estimating the RUL based on a suitable model, as demonstrated by the use of a deep convolution neural network [23]. This approach offers a more realistic representation of the degradation process and can potentially enhance the model's predictive performance.

Metrics
A variety of metrics were used in this study. The success of the model was calculated based on the deviation (e_i) of the predicted number of cycles (model prediction ŷ_i: RUL_predicted) from the actual number of cycles (ground truth y_i: RUL_truth) remaining after every 100 cycles, as shown in Equation (1). The choice of loss function significantly impacts the outcome of RUL prediction. Along with the mean absolute error (MAE), shown in Equation (3), and the mean absolute percentage error (MAPE), shown in Equation (4), which average the absolute errors and the absolute percentage errors, respectively, we also tracked the root mean squared error (RMSE), shown in Equation (2), which squares the errors before averaging, placing greater emphasis on larger deviations. This sensitivity to larger errors makes RMSE a particularly suitable metric for prognostics, where accurately predicting RUL is crucial and substantial errors can lead to poor performance. Additionally, tracking these specific metrics allows for meaningful comparisons against state-of-the-art approaches. Success in our study is quantitatively measured by the accuracy of our RUL prediction models. The MAE provides a direct measure of prediction accuracy in the same units as the dataset (number of cycles), MAPE offers a percentage-based error that is independent of the scale, and RMSE gives a sense of the error distribution and penalizes larger errors more severely. Our success criterion was to minimize these error metrics, thereby improving the predictive accuracy of our models.
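Under the standard definitions, and assuming N denotes the number of evaluation points (a symbol not defined in the text), the four quantities referenced as Equations (1) to (4) can be written as follows. The sign convention for the deviation follows the description of predicted minus actual cycles.

```latex
\begin{align}
e_i &= \hat{y}_i - y_i \tag{1}\\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^{2}} \tag{2}\\
\mathrm{MAE} &= \frac{1}{N}\sum_{i=1}^{N} \lvert e_i \rvert \tag{3}\\
\mathrm{MAPE} &= \frac{100}{N}\sum_{i=1}^{N} \left\lvert \frac{e_i}{y_i} \right\rvert \tag{4}
\end{align}
```

Note that MAPE as written is undefined when y_i = 0, that is, at the end-of-life point itself; how such points were handled is not stated in the text.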

Proposed Architectures
It is worth noting that our experimentation phase encompassed the exploration of multiple temporal hybrid models; however, we only detail the most effective ones. The first of the most successful architectures in our study was the convolutional long short-term memory deep neural network (CLDNN), originally proposed by Sainath et al. [31] for natural language processing and specifically used for large vocabulary continuous speech recognition (LVCSR) tasks. In their work, CLDNN outperformed the Gaussian mixture model (GMM) and hidden Markov model (HMM) systems [68]. We repurposed the CLDNN architecture for the specific task of RUL estimation.
Another noteworthy model in our investigation was the temporal transformer (TT), initially introduced by Chadha et al. [32] for RUL estimation using the commercial modular aero-propulsion system simulation (C-MAPSS) dataset. The TT model demonstrated effectiveness in predicting the remaining useful life of aircraft engines. Ma et al. [69] presented a similar use case where their model utilized multi-head attention to capture global features from various representation sub-spaces. Although these models were originally designed for predicting the remaining useful life of aerospace engines, we adapted them for estimating the remaining useful life of lithium-ion batteries (LIB). This adaptation involved specific architectural deviations and adjustments to hyperparameters.
Our approach follows Occam's razor principle, which emphasizes simplicity and efficiency in model selection without sacrificing the predictive accuracy needed for effective RUL estimation [70]. This method allows for scalable solutions in battery degradation prediction by using the advantages of neural network architectures while avoiding unnecessary complexity. Regarding dataset size concerns, our hybrid deep learning models aim to balance complexity with dataset limitations. We enhanced data utilization through preprocessing strategies and incorporated dropout, regularization, and early stopping in our training processes to prevent overfitting, ensuring consistent performance (please refer to Figure 4a,b).

The Convolutional Long Short-term Memory Deep Neural Network (CLDNN)
Adapting a convolutional long short-term memory deep neural network (CLDNN), initially developed for natural language processing (NLP) tasks with tokenized sentences, to predict remaining useful life (RUL) in lithium-ion batteries (LIB) necessitates several architectural and algorithmic modifications, as follows:
• Input Representation: In NLP tasks, CLDNN takes tokenized sentences as input. For RUL prediction, the input representation needs to be tailored to the characteristics of battery data. Time-series data from sensors measuring various parameters (voltage, current, temperature, etc.) were used as input. The input data were reshaped into a format suitable for time-series analysis.
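One common way to reshape per-cycle features for a convolutional-recurrent model is to cut them into fixed-length windows. The sketch below assumes a 100-cycle window, matching the window length mentioned earlier in the article, and labels each window with the target at its last cycle; the labelling rule, the stride, and the function name are assumptions rather than details taken from the study.

```python
import numpy as np

def make_windows(features, targets, window=100, stride=1):
    """Slice a (cycles, D) feature table into (B, window, D) samples.

    Each sample is labelled with the target at the last cycle of its window.
    Requires at least `window` cycles of data.
    """
    features = np.asarray(features, float)
    targets = np.asarray(targets, float)
    starts = range(0, len(features) - window + 1, stride)
    X = np.stack([features[s:s + window] for s in starts])
    y = np.array([targets[s + window - 1] for s in starts])
    return X, y

# Toy cell: 500 cycles, 5 features per cycle, remaining cycles counting down to 0.
rng = np.random.default_rng(0)
cell_features = rng.normal(size=(500, 5))
cell_targets = 499 - np.arange(500)
X, y = make_windows(cell_features, cell_targets, window=100, stride=100)
```

The resulting array has the (batch, time, feature) layout that one-dimensional convolution and LSTM layers typically expect.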

Temporal Transformer
Adapting the temporal transformer (TT) model, which was originally designed for the estimation of the remaining useful life (RUL) of aircraft engines, for the estimation of RUL in lithium-ion batteries (LIB) involves several architectural differences and adjustments. The main modifications were as follows.

• Input Representation: Adjustments to the input representation to accommodate the characteristics of LIB data were required. The original model took input sequences related to engine parameters. LIB data consists of time-series measurements of capacity, temperature, resistance, and discharge time.
• Attention Mechanisms: Multi-head attention mechanisms, which were used in the original model by Ma et al. [69], needed adjustments for LIB data. Attention mechanisms were tailored to focus on features relevant to battery degradation patterns in the linearly interpolated features.
• Model Size and Complexity: The overall size and complexity of the LIB dataset required an increase in the size and complexity of the TT model. This involved adding layers, adjusting attention mechanisms, and increasing the model depth, which led to 3,936,281 trainable parameters.
• Hyperparameter Tuning: Fine-tuning hyperparameters using Bayesian optimization specific to LIB data was required. This included learning rates, the number of attention heads, embedding dimensions, layer sizes, and dropout rates.
The temporal transformer (TT) model we used is a hybrid network that combines the transformer and long short-term memory (LSTM). Each component of the network serves a specific function, as follows:
1. Transformer: The transformer's multi-head self-attention mechanism allows the model to decipher complex temporal dependencies within the dataset. By processing different parts of the input sequence in parallel, it extracts a rich contextual understanding of each data point.
2. LSTM: The LSTM units capture long-term relationships between features, enhancing the model's predictive capabilities.
3. Feed-forward neural networks (FFNs): The architecture employs FFNs for further refinement, facilitating the modelling of non-linear data relationships.
The TT algorithm is outlined in Algorithm 2 and Figure 4b (note: x ∈ R^(B×T×D) denotes input data with dimensions B × T × D, where B is the batch size, T is the time dimension, and D is the feature dimension).
The SelfAttention function in Algorithm 2 computes weighted representations of input data by considering inter-dependencies across multiple dimensions, employing a scaled dot-product attention mechanism. The TransformerBlock further refines these representations through layer normalization and feed-forward networks, enhancing their expressiveness while retaining sequential relationships. This modular and hierarchical structure allows the LSTM-transformer to capture patterns in sequential data, making it a versatile and robust solution for accurate RUL prediction in lithium-ion batteries.
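For readers unfamiliar with scaled dot-product attention, a single-head version can be sketched in a few lines of NumPy. This is a generic illustration of the mechanism, not the authors' implementation: the weight matrices are random, the dimensions are arbitrary, and multi-head splitting, layer normalization, and the output projection are omitted.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x of shape (B, T, D)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                          # each (B, T, D_h)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])   # (B, T, T)
    weights = softmax(scores, axis=-1)                           # rows sum to 1
    return weights @ v, weights

# Illustrative sizes only: batch of 2, 10 time steps, 5 features, head width 8.
rng = np.random.default_rng(0)
B, T, D, D_h = 2, 10, 5, 8
x = rng.normal(size=(B, T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D_h)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` describes how strongly one time step attends to every other time step, which is what lets the block relate distant points in a degradation trajectory directly.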

Hyperparameter Optimization
The hyperparameter tuning for the proposed LIB RUL prediction models was achieved using Bayesian optimization. Hyperparameter tuning is a critical step in the development of machine learning models, involving the search for optimal configurations to enhance predictive accuracy [71]. In this study, Bayesian optimization was chosen due to its effectiveness in handling non-linear and complex search spaces. Unlike traditional grid search or cross-validation methods [72], Bayesian optimization uses probabilistic models to predict the performance of different hyperparameter configurations, guiding the search toward promising regions [73]. This is particularly beneficial in high-dimensional spaces, where an exhaustive search becomes computationally expensive.
The specific implementation of this tuning process utilized the Keras Tuner library [74], a choice motivated by its compatibility and ease of integration with deep learning models (which can be seen in Algorithm 3). The process began with the definition of a hypermodel class, referred to as "MyHyperModel", derived from Keras Tuner's "HyperModel" class. This class encapsulated not only the architecture of the LIB RUL prediction models (CLDNN or TT) but also the hyperparameter search space. Furthermore, a specialized Bayesian optimization tuner class, named "MyBayesianOptimizationTuner", was created by extending the "BayesianOptimization" class from Keras Tuner. This custom tuner was configured to minimize the validation mean absolute error (MAE), thereby optimizing the models' performance.

Algorithm 2 Temporal Transformer
1-4: procedure DEFINESTRUCTURING [...]
      Prepare for Transformer block
      Return [Qdlin, Tdlin, IR, DT, QD]
      (We now proceed to apply the attention mechanism across each data point)
5: end procedure
6: procedure DEFINEMULTIHEADSELFATTENTION(x, D_h)
      Uses three sets of weight matrices W_q, W_k, W_v to transform the input data (obtained from DefineStructuring()) into query (Q), key (K), and value (V)
      Compute attention scores from the Q and K representations, obtained by linear transformations using W_q and W_k
      h ← H W_o
      Return h
7: end procedure
8: procedure DEFINETRANSFORMERENCODERBLOCK(x, D_h, F)
      Perform a linear transformation on h and add bias b_1
      u ← ReLU(u) (apply the ReLU activation function for non-linearity)
      Perform another linear transformation
9:    Return z ∈ R^(B×T×D)
10: end procedure
11: Dropout Layer
12: LSTM Layers (LSTM for sequential data processing): h_1 ← LSTM(z, lstm_units)

The hyperparameter search was conducted over 100 trials, with 10 epochs per trial on the training dataset. This comprehensive approach ensured a thorough exploration of the hyperparameter space, thereby identifying configurations that yield the most accurate predictions. Finally, the performance of the optimized models was rigorously evaluated on the validation dataset, ensuring the reliability and effectiveness of the hyperparameter tuning process in enhancing the predictive capabilities of the LIB RUL prediction models.
After completing the search, the optimal hyperparameter set was retrieved, and a new model was built using these parameters. This process ensured that the final CLDNN and TT models were fine-tuned for optimal performance, leveraging Bayesian optimization to navigate the hyperparameter space efficiently.
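The trial loop described above (sample a configuration, score it by validation MAE, keep the best, then rebuild) can be illustrated with a minimal standard-library sketch. It is not the Keras Tuner implementation used in the study: plain random search stands in for Bayesian optimization, and the search space, the `toy_validation_mae` objective, and all function names are hypothetical placeholders.

```python
import math
import random

# Hypothetical search space; the ranges used in the study are not reproduced here.
SEARCH_SPACE = {
    "dense_units": [32, 40, 64, 128],
    "dropout": [0.1, 0.2, 0.3, 0.5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def sample_config(rng):
    """Draw one hyperparameter configuration uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def toy_validation_mae(config):
    """Synthetic stand-in for 'train for 10 epochs, return validation MAE'.

    The shape of this function is arbitrary and carries no empirical meaning.
    """
    return (
        abs(config["dense_units"] - 64) / 64
        + abs(config["dropout"] - 0.2)
        + abs(math.log10(config["learning_rate"]) + 4)
    )


def search(max_trials=100, seed=0):
    """Random search (a simple substitute for Bayesian optimization)."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(max_trials):
        config = sample_config(rng)
        score = toy_validation_mae(config)
        if score < best_score:  # objective: minimize validation MAE
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = search()
print("best configuration:", best_config)
```

In a real run, the objective would train the model on the training set and return its validation MAE, and the best configuration would then be used to build the final model.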

Results and Discussion
This section presents our results and compares them with other algorithms, using the validation data. During model development, it became evident that the most effective models required two key attributes. Firstly, they had to be capable of managing sparse data by proficiently extracting significant features. Secondly, they had to be able to learn both temporal and spatial relationships between the features and the remaining cycles of each lithium-ion battery.
Following these findings, a comparative analysis was undertaken among an assortment of hybrid models that embraced these two characteristics. The performance of these models is presented in Figure 5a-c, which depicts key performance indicators such as loss, mean absolute error (MAE), mean absolute percentage error (MAPE), and mean squared error (MSE). Each figure includes a comparison of all the hybrid neural network architectures developed: the baseline CLDNN, the optimized CLDNN, the transformer-LSTM, and its optimized counterpart.
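For reference, the three error metrics named above can be computed as in the following sketch. The function names and the example cycle counts are illustrative only and are not taken from the study's data; MAPE is expressed in percent and assumes no true value is zero.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative remaining-cycle counts (made up for this example).
true_cycles = [100.0, 200.0, 400.0]
predicted_cycles = [110.0, 180.0, 400.0]

print("MAE :", mae(true_cycles, predicted_cycles))
print("MAPE:", mape(true_cycles, predicted_cycles))
print("MSE :", mse(true_cycles, predicted_cycles))
```

Because MAE and MSE depend on the scale of the target while MAPE does not, the three metrics are best read together rather than compared with one another directly.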

Comparing All the Tested Temporal Models
Other models that embody the desired characteristics, such as the convolutional neural network-differentiable neural computer (CNN-DCN), CNN-LSTM-neural Turing machine (CNN-LSTM-NTM), CNN-transformers, and transformer-autoencoder, exhibit higher error metrics, which indicates that they are less effective at predicting the RUL of lithium-ion batteries, as can be seen in Table 1.

Best Performing Models
In assessing these results, two models, namely the convolutional LSTM deep neural network (CLDNN) and the transformer-LSTM (temporal transformer), emerged as the most proficient in predicting the RUL, as outlined in Table 1 and Figure

Observations
Several key observations were drawn from the hyperparameter setups (Tables 2 and 3) of the best-performing hybrid models.

• The optimized temporal transformer had fewer embedding dimensions and fewer attention heads than the original model, which suggests that the original model was over-parametrised.
• Both the transformer-LSTM and CLDNN models have 'optimized' versions with distinct hyperparameter configurations. For instance, the dense layer in the optimized models contains more units: the optimized transformer-LSTM model has 64 units, compared to 40 in the original. Additionally, the optimized configurations have a reduced dropout rate and learning rate.
• With a learning rate of 0.001, the original transformer-LSTM model takes update steps ten times larger than its optimized counterpart, which has a learning rate of 0.0001. The smaller learning rate helps the optimized model avoid gradient explosion and overshooting the minimum.

In comparing our results to other approaches in the literature, several standout models, namely autoencoder-DNN, auto-CNN-LSTM, and ATCMN, are particularly relevant to our research. These models, as presented in Table 4, share a hybrid deep learning architecture similar to ours, combining different network architectures to enhance prediction. Notably, they predict the remaining useful life (RUL) of batteries in terms of remaining cycle counts, which aligns with our research objectives and provides a more actionable metric for battery health management than models that focus on binary classification (predicting whether the battery has remaining cycles or has surpassed the end-of-life threshold). While these models are significant for their hybrid architecture and cycle count prediction approach, our CLDNN and transformer-LSTM models demonstrate superior performance in RUL prediction for lithium-ion batteries (LIBs), especially when compared to other temporal models such as LSTM-RNN, Deep-CNN, and Temporal-CNN [75]. Table 4 also reports the average inference time of each approach during validation. The average inference time of the CLDNN and TT is also reasonable, making them suitable for real-time applications.
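An average inference time of the kind reported in Table 4 can be measured as in the sketch below. This is an assumed measurement procedure, not the one documented in the study: `dummy_predict` is a hypothetical stand-in for a trained model's prediction call, and the warm-up count and synthetic input windows are arbitrary.

```python
import time
from statistics import mean


def average_inference_time(predict, samples, warmup=3):
    """Return the mean wall-clock time (in seconds) of predict() per sample."""
    # Warm-up calls are excluded so one-off setup costs do not skew the average.
    for sample in samples[:warmup]:
        predict(sample)
    durations = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def dummy_predict(window):
    """Hypothetical stand-in for a model's predict call on one input window."""
    return sum(window) / len(window)


# Synthetic validation inputs: 50 windows of 100 values each.
validation_windows = [[float(i + j) for j in range(100)] for i in range(50)]
avg_seconds = average_inference_time(dummy_predict, validation_windows)
print(f"average inference time: {avg_seconds * 1e3:.4f} ms per sample")
```

Timings obtained this way depend on hardware and batch size, so comparisons between models are only meaningful when measured under the same conditions.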
The ATCMN model, as described by Fei et al. [58], utilizes discharging time, voltage, and capacity for RUL estimation. In contrast, our models demonstrate superior performance in predicting remaining cycles for LIBs within a 100-cycle moving window, employing a CC-CV (constant current-constant voltage) charge policy. The CC-CV charging approach involves initially charging the battery at a constant current until a set voltage threshold is reached, followed by maintaining this voltage while the current gradually decreases as the battery becomes fully charged. This method is widely used due to its effectiveness in extending the battery's lifespan and optimizing its performance. Typical parameters of the CC-CV charge policy include the constant charge current, the voltage threshold at which the transition to constant voltage occurs, and the termination current at which the charge cycle is considered complete. These parameters are not dynamic but are predetermined based on the battery's characteristics and the desired charging performance. The Auto-CNN-LSTM model by Ren et al. [25] operates on a distinct dataset with less variation and limited diversity in input parameters, which the authors acknowledge. This disparity in datasets, compared to our more extensive and diverse dataset, may lead to variations in outcomes between the two models. Our comprehensive dataset provides a robust foundation for modelling and prediction, ultimately enhancing the reliability and applicability of our findings, especially when considering real-time variables such as voltage, capacity, battery body temperature, and internal resistance within the 100-cycle moving window.
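The CC-CV charge policy described above can be illustrated with a toy simulation. The cell model is a deliberate simplification (an open-circuit voltage that rises linearly with state of charge, plus a fixed internal resistance), and every numeric value, including the resistance, voltage threshold, and termination current, is a hypothetical choice rather than a parameter from the dataset.

```python
from dataclasses import dataclass


@dataclass
class ToyCell:
    """Simplified cell: linear open-circuit voltage plus a series resistance."""
    capacity_ah: float = 1.1       # nominal capacity (illustrative)
    resistance_ohm: float = 0.05   # hypothetical internal resistance
    soc: float = 0.0               # state of charge in [0, 1]

    def ocv(self) -> float:
        # Hypothetical linear open-circuit voltage: 3.0 V empty, 3.6 V full.
        return 3.0 + 0.6 * self.soc


def cc_cv_charge(cell, cc_current_a=1.1, v_limit=3.6,
                 cutoff_current_a=0.055, dt_s=1.0):
    """Simulate one CC-CV charge; returns (time, phase, current, voltage) rows."""
    log = []
    t = 0.0
    phase = "CC"
    while True:
        if phase == "CC":
            current = cc_current_a
            # Switch to CV once the terminal voltage would reach the threshold.
            if cell.ocv() + current * cell.resistance_ohm >= v_limit:
                phase = "CV"
        if phase == "CV":
            # Hold the terminal voltage at v_limit; the current tapers off.
            current = (v_limit - cell.ocv()) / cell.resistance_ohm
            if current <= cutoff_current_a:
                break  # termination current reached: charge complete
        voltage = cell.ocv() + current * cell.resistance_ohm
        log.append((t, phase, current, voltage))
        cell.soc += current * dt_s / (3600.0 * cell.capacity_ah)
        t += dt_s
    return log


cell = ToyCell()
charge_log = cc_cv_charge(cell)
cv_start = next(row[0] for row in charge_log if row[1] == "CV")
print(f"CC phase ends after {cv_start:.0f} s; "
      f"charge ends after {charge_log[-1][0]:.0f} s at SOC {cell.soc:.3f}")
```

The three policy parameters named in the text map directly onto `cc_current_a`, `v_limit`, and `cutoff_current_a`; all three are fixed before charging starts, matching the statement that they are predetermined rather than dynamic.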

Conclusions and Future Work
This study addresses the critical challenge of accurately estimating the RUL of LIBs within the context of electric vehicles. By leveraging deep learning techniques and utilizing a rich dataset from the Toyota Research Institute, we have developed and evaluated two hybrid models: CLDNN and TT. Our contributions encompass the creation of a preprocessed, high-quality dataset through stratified random sampling, the implementation of a comprehensive data preprocessing pipeline, and the development of two hybrid models. This pipeline ensures feature consistency and captures temporal dynamics, thereby laying the foundation for precise RUL predictions.
Both the CLDNN and TT models surpassed existing approaches, with mean absolute errors (MAEs) of 84.012 and 85.134, respectively. Furthermore, they demonstrated improvements in mean absolute percentage error (MAPE) ranging from 4.01% to 7.12%. These models prove to be well-suited for LIB RUL prediction, contributing to battery recycling and sustainability within the electric vehicle industry. Additionally, these models demonstrated superior inference time, highlighting their efficiency and applicability in real-world scenarios where rapid decision-making is crucial. This inference speed, combined with their predictive accuracy, positions the CLDNN and TT models as effective tools for advancing the reliability and efficiency of battery systems in the electric vehicle sector.
While these results are encouraging, several areas for improvement remain. Real-world implementation and validation on a wider array of datasets are essential to strengthening confidence in the models' practical applicability. Exploring complex data augmentation methods, alternative ensemble methods, and liquid neural networks (LNNs) [76][77][78] could further refine model performance and yield more efficient, adaptable, and robust approaches to battery health estimation. Graph neural networks (GNNs) could also be used to explain the RUL of LIBs; they have been shown to significantly reduce parameter counts and to perform better than their traditional physics-based model counterparts [79,80]. Future research may leverage LNNs or GNNs, known for their dynamic adaptability, to potentially enhance RUL prediction for LIBs. With reduced computational intensity, these networks may offer superior generalization and efficiency for large-scale applications such as electric vehicle battery management systems.
Incorporating diverse battery chemistries into future research is also critical. The current dataset is limited to a specific type of battery (lithium iron phosphate (LFP)/graphite, 1.1 Ah, 3.3 V), which may not fully represent the performance across different battery types. By including a variety of battery chemistries, such as LFP batteries with a capacity of 3 Ah or nickel cobalt manganese (NCM) batteries with a capacity of 3 Ah and a nominal voltage of 3.8 V, models can be trained to generalize better across different battery systems. Diverse chemistries make it possible to build more universal and robust predictive models that adapt to various battery behaviours and degradation patterns, ultimately leading to improved safety, reliability, and efficiency in battery usage. Additionally, a more exhaustive comparison with existing RUL prediction methods, encompassing both traditional statistical models and contemporary machine learning approaches, could be conducted.
Reducing parameter counts to enhance model efficiency, measuring processing times in online scenarios, and examining hyperparameter optimization alongside a comparison with physics-based models are promising avenues for future research. In conclusion, while this study represents a significant step forward in battery health estimation, ongoing research should focus on diversifying data sources, simplifying model complexities, and exploring emerging technologies, such as LNNs, to further advance this field.

Figure 1. An overview of the proposed advanced data refinement and model optimization.

Figure 2. The study pipeline begins with a dataset comprising 124 commercial LIBs that have undergone extensive cycling until failure. Next, we implement feature selection to extract crucial features that enhance the accuracy of RUL predictions. Following this, a series of data preprocessing techniques are employed to clean and compress the dataset. Subsequently, stratified random sampling is utilized to create a representative sample. Finally, we develop a variety of hybrid deep network architectures, apply hyperparameter optimization, and evaluate these models using a testing set to determine the best-performing model.

Figure 3. Comparison of data features before and after preprocessing. (a) Internal resistance, (b) discharge time, (c) discharge quantity.

Figure 4. Comparison of the CLDNN and temporal transformer architecture. (a) The CLDNN model. (b) The temporal transformer model.

Figure 5. Comparison of error metrics between the top-performing models and other tested models, as detailed in Table 1. (a) MAE: mean absolute error for the best performing models. (b) MAPE: mean absolute percentage error for the best performing models. (c) MSE: mean squared error for the best performing models.
5. The CLDNN model achieved an MAE of 84.012, a MAPE of 25.676, and an MSE of 0.6754. The temporal transformer model recorded an MAE of 85.134, a MAPE of 28.7932, and an MSE of 0.7136. For the models' training durations, see Figure 6.

Figure 6. Training times of the baseline and optimized models in seconds.
The final optimized CLDNN architecture performs well at predicting the remaining useful life (RUL) of lithium-ion batteries. It comprises a total of 1,518,665 trainable parameters and combines convolutional neural networks (CNN), long short-term memory (LSTM), and dense neural networks (DNN). Each component of the network serves a specific function. The dense neural networks are used for prediction; they contribute to the model's regularization and refined prediction capabilities. The CLDNN architecture for our task is shown in Algorithm 1 and Figure 4a. This model handles the heterogeneity of the dataset efficiently.
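To show how a trainable-parameter total for a CNN-LSTM-dense stack arises, the sketch below applies the standard per-layer counting formulas (with one bias vector per LSTM gate, as in Keras's default LSTM layer). The layer sizes are hypothetical and are not the optimized configuration, so the resulting total is not expected to match the 1,518,665 parameters reported above.

```python
def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    """Each filter has kernel_size * in_channels weights plus one bias."""
    return (kernel_size * in_channels + 1) * filters


def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """One weight per input per unit, plus one bias per unit."""
    return (input_dim + 1) * units


# Hypothetical layer sizes, chosen only to illustrate the arithmetic.
layers = [
    ("conv1d", conv1d_params(in_channels=5, filters=64, kernel_size=3)),
    ("lstm", lstm_params(input_dim=64, units=128)),
    ("dense", dense_params(input_dim=128, units=64)),
    ("output", dense_params(input_dim=64, units=1)),
]

total_params = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:>7}: {count:,}")
print(f"  total: {total_params:,}")
```

In this toy stack the LSTM layer dominates the count, because its parameter total grows with the square of the number of units.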

TimeDistributed Dense Layer and Dropout: Further feature transformation
4:

Table 1. Results of six different ensemble NN models.

Table 4. Comparison of CLDNN and TT against other deep learning algorithms.