Battery Health State Prediction Based on Singular Spectrum Analysis and Transformer Network

Abstract: The failure of a battery may lead to a decline in the performance of electrical equipment and thus increase the cost of use, so it is important to accurately evaluate the state of health (SOH) of the battery. Capacity degradation data for batteries are usually characterized by non-stationarity and non-linearity, which brings challenges for accurate prediction of battery health status. To tackle this problem, this paper proposes a battery prediction model based on singular spectrum analysis (SSA) and a transformer network. The model uses SSA to eliminate the effect of capacity regeneration, and a transformer network to automatically extract features from historical degradation data for the prediction. Specifically, the battery capacity sequence is used as the key index of performance degradation and is decomposed by SSA into trend components and periodic components. Then, the long-term dependence of capacity degradation is captured by the transformer network with a multi-head attention mechanism. Finally, two public lithium battery datasets were used to verify the validity of the proposed model, which was compared with mainstream models such as long short-term memory (LSTM) networks and convolutional neural networks (CNNs). The experimental results show that the proposed model has better prediction performance and broad generalizability.


Introduction
At present, battery management system (BMS) technology is still not perfect.
A key factor restricting the large-scale, full-field application of BMS is the difficulty of accurately estimating the state of health of the battery, which directly affects the effective use of battery capacity. Failure to accurately determine the performance status of the battery reduces the safety and reliability of the battery system, leaves battery charge and discharge control without sufficient reference information, and ultimately affects battery performance and service life [1]. In order to ensure the reliable operation of the battery management system, a method is needed to determine the SOH of the battery system, so as to provide decision makers with reference information about when to remove or replace the battery [2]. SOH is usually used to characterize the aging degree of the battery. It represents the ability of the battery to store electric energy and is related to the initial capacity of the battery: it is defined as the percentage of the current capacity relative to the initial capacity when discharging to the discharge cut-off voltage under a given working condition. The calculation formula is as follows:

$$SOH = \frac{C_t}{C_0} \times 100\% \quad (1)$$

where C_t is the capacity of the lithium battery in the t-th charge-discharge cycle, and C_0 is the initial capacity of the battery. Generally, when the estimated SOH of a battery falls below 80%, the battery is considered to be out of service, so effective estimation of the usable capacity of the battery is equivalent to indirect estimation of its SOH.
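As a quick numerical illustration of Equation (1) and the 80% retirement threshold, the SOH check can be written in a few lines (a sketch; the function and variable names are ours, not from the paper):

```python
def soh(capacity_t, initial_capacity):
    """State of health as a percentage of initial capacity, per Equation (1)."""
    return 100.0 * capacity_t / initial_capacity

# A cell with 2.0 Ah initial capacity measured at 1.7 Ah in cycle t:
value = soh(1.7, 2.0)            # 85.0 (%)
end_of_life = value < 80.0       # the paper's retirement threshold
```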
The complex chemical structure inside the battery and the influence of working conditions and environment make the degradation of lithium-ion batteries nonlinear. The capacity regeneration phenomenon of lithium batteries poses challenges to the accurate estimation of SOH. Capacity regeneration means that the capacity decline of a lithium battery is not a monotonically decreasing process; sudden, temporary capacity recovery occurs, causing capacity fluctuations and changing the trend of the degradation curve.
In order to further improve the performance of health state prediction for battery systems, this paper addresses both the prediction model and the data preprocessing method. At present, life prediction methods for lithium-ion batteries can be roughly divided into model-based methods and data-driven methods [3-5]. Model-based methods reflect the electrochemical and physical characteristics of lithium-ion batteries by establishing an empirical model that describes the degradation behavior of batteries. Zhang et al. [6] constructed a physical model of battery aging attenuation and identified the parameters of the RUL model using the least squares method; the prediction results show that the method can keep the relative training error within 1%. Lyu et al. [7] proposed an electrochemical model that simulates battery charge and discharge, treated the model parameters reflecting battery degradation as state variables, and estimated the service life of the battery using a particle filter. Despite the strong real-time performance of model-based approaches, the modeling and calculation are complex, and a fixed-parameter model has difficulty accurately tracking the state of the internal structure of the battery. The data-driven method is currently the mainstream direction for available capacity estimation. It does not need to consider the complex internal structure, and only needs to model the relationship between input and output based on the historical characteristics of the battery. Data-driven methods usually use machine learning algorithms for model training and prediction, including support vector machines (SVM), neural networks (NN), random forest regression (RFR), Gaussian process regression (GPR), etc. [8-11]. Due to the local regeneration of capacity in the process of battery degradation, traditional machine learning models have limited feature-extraction ability, which significantly reduces their prediction accuracy.
Compared with traditional machine learning algorithms, deep learning algorithms have deeper hidden layers and can effectively mine hidden features among the input parameters, so researchers have also begun to improve models using deep learning networks [12]. Liu et al. [13] used long short-term memory recurrent neural networks to achieve online battery health assessment. Yang et al. [14] proposed a battery SOH estimation method based on a bidirectional long short-term memory neural network and verified its superiority over back-propagation neural networks. However, deep neural networks based on the recurrent neural network (RNN) structure inevitably face the problem of long-term dependence, and degradation information over a long period of time affects their prediction performance.
The transformer model is a state-of-the-art deep learning model. The self-attention mechanism is a key feature of this model, which allows the model to capture the dependence between any two positions in the sequence, not just between adjacent elements. The self-attention mechanism completely discards the horizontal, step-by-step propagation of traditional RNNs and propagates only in the vertical direction through the continuous stacking of self-attention layers [15]. The attention mechanism can dynamically adjust the weights of input features, highlight the impact of important features, and enhance the accuracy of model prediction with little additional computation and storage cost. Therefore, the transformer model not only has good feature expression ability, but also avoids the long-term dependence problem that RNN-based neural networks inevitably face. Lin et al. [16] proposed an LSTM network model combined with an attention mechanism and verified the superiority of the algorithm after adding the attention mechanism on the public Oxford and Massachusetts Institute of Technology (MIT) datasets.
Researchers have used denoising algorithms to preprocess data to reduce the effect of data acquisition error on model prediction performance. Battery degradation data can be regarded as nonlinear, non-stationary time series data, which can be decomposed by a decomposition algorithm into feature components and noise at different time scales, so as to strip out the noise [17]. Singular spectrum analysis is a data processing algorithm that does not require complex prior information and is well suited to the analysis and optimization of nonlinear time series data [18].
In summary, a battery health status prediction model based on singular spectrum analysis and a transformer is proposed in this paper. On the one hand, the singular spectrum analysis algorithm is used to decompose and reorganize the features input to the prediction model, so as to remove noise; on the other hand, the transformer network captures key information in the hidden correlations of the degraded features, so as to improve prediction accuracy.
The contributions of this paper can be summarized as follows: (i) Singular spectrum analysis is used to decompose and reconstruct the original capacity sequence to obtain its long-term trend components, filtering out the noise components in the series and improving the accuracy of model prediction. (ii) The multi-head attention mechanism of the transformer model is used to capture the hidden correlations in the capacity sequence data, achieving more accurate SOH prediction. (iii) The versatility of the SSA Transformer method is verified by the effective prediction of various types of batteries.
The rest of this paper is organized as follows: Section 2 introduces the relevant methods and principles. Section 3 describes the structure and implementation steps of the proposed method in detail. Section 4 compares and analyzes the specific experimental results. Section 5 summarizes the paper and discusses future work directions.

Denoising Method Based on Singular Spectrum Analysis
Singular spectrum analysis is an efficient method for dealing with non-stationary time series. By constructing the trajectory matrix of the time series, then decomposing and reconstructing it, the characteristic components and the noise are separated according to their contributions, so as to denoise the data. The steps of singular spectrum analysis are as follows:

Step 1: Construct the trajectory matrix. The original capacity sequence x_1, x_2, ..., x_N is mapped with a window of length L into the L × K trajectory matrix

$$X = \begin{pmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_N \end{pmatrix} \quad (2)$$

where K = N − L + 1. The element of the trajectory matrix X at (i, j) is x_{i+j−1} in the original data sequence, that is, all elements on any anti-diagonal of the trajectory matrix are equal. We set L = 10 in this paper.

Step 2: Singular value decomposition. Singular value decomposition (SVD) is performed on X, and the singular values are arranged in descending order. X can then be decomposed into d elementary matrices:

$$X = \sum_{i=1}^{d} X_i = \sum_{i=1}^{d} \sqrt{\lambda_i}\, U_i V_i^{T} \quad (3)$$

where d is the number of non-zero singular values, X_i = √λ_i U_i V_i^T are the elementary matrices of X, λ_i are the eigenvalues of the covariance matrix XX^T (so √λ_i are the singular values of X), U_i are the left singular vectors of X, and V_i = X^T U_i / √λ_i are the right singular vectors.

Step 3: Grouping. The index set {1, ..., d} is partitioned into r disjoint subsets I_1, ..., I_r, and the elementary matrices are summed within each subset to obtain r linearly independent grouped matrices:

$$X = X_{I_1} + X_{I_2} + \cdots + X_{I_r}, \qquad X_{I} = \sum_{i \in I} X_i \quad (4)$$

Step 4: Reconstruction. Each grouped matrix X_{I_i} is transformed into a time series of length N by diagonal averaging, and each series represents a different characteristic of the original sequence. Let Y = X_{I_i}, L* = min(L, K), K* = max(L, K), and N = L + K − 1, with y*_{ij} = y_{ij} if L < K and y*_{ij} = y_{ji} otherwise. Diagonal averaging converts Y into the new time series y^{rc}_1, y^{rc}_2, ..., y^{rc}_N via

$$y^{rc}_k = \begin{cases} \dfrac{1}{k}\sum\limits_{m=1}^{k} y^{*}_{m,\,k-m+1}, & 1 \le k < L^{*} \\[2ex] \dfrac{1}{L^{*}}\sum\limits_{m=1}^{L^{*}} y^{*}_{m,\,k-m+1}, & L^{*} \le k \le K^{*} \\[2ex] \dfrac{1}{N-k+1}\sum\limits_{m=k-K^{*}+1}^{N-K^{*}+1} y^{*}_{m,\,k-m+1}, & K^{*} < k \le N \end{cases} \quad (5)$$
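The four SSA steps above can be sketched compactly with numpy (an illustrative sketch, not the paper's MATLAB implementation; the number of retained leading components `r` is our own parameter name):

```python
import numpy as np

def ssa_trend(series, L=10, r=2):
    """SSA sketch: embed -> SVD -> keep r leading components -> diagonal averaging."""
    x = np.asarray(series, dtype=float)
    N = len(x)
    K = N - L + 1
    # Step 1: trajectory (Hankel) matrix, X[i, j] = x[i + j]
    X = np.column_stack([x[j:j + L] for j in range(K)])
    # Step 2: singular value decomposition
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Step 3: group the r leading elementary matrices (the trend components)
    Y = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # Step 4: diagonal averaging -- average Y over each anti-diagonal
    rec = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        for j in range(K):
            rec[i + j] += Y[i, j]
            counts[i + j] += 1
    return rec / counts

# Noisy capacity curve: linear fade plus fluctuation
t = np.arange(200)
capacity = 1.1 - 0.001 * t + 0.01 * np.sin(0.5 * t)
trend = ssa_trend(capacity, L=10, r=2)
```

Keeping all L components reproduces the original series exactly, which is a convenient sanity check on the diagonal averaging.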

Transformer Network Model
The transformer network model differs from the traditional RNN model. The former extracts internal features based entirely on the self-attention mechanism, without the constraint of having to compute step by step from front to back, and it can avoid the vanishing and exploding gradient problems that RNN and LSTM models cannot completely eliminate when facing long sequences [19]. The transformer network is a sequence-to-sequence structure composed of an encoder and a decoder.
The structure of the transformer is shown in Figure 1. A single sub-encoder is composed of a positional encoding layer, a multi-head attention layer, and a feed-forward fully connected layer. A normalization layer is introduced after each sublayer to normalize the values so that the eigenvalues remain within a reasonable range, which effectively prevents vanishing gradients and accelerates the convergence of the model. The decoder is also composed of multiple identical layers, with residual connections and layer normalization used within each layer. In addition to the same multi-head attention layer and feed-forward fully connected layer as the encoder, the decoder uses masked self-attention to prevent each position from attending to "future" positions. This ensures that each position can depend only on the positions preceding it, fitting the autoregressive nature of sequence generation.
The encoder gradually encodes the input sequence through its multilayer structure. Each layer contains a self-attention layer and a feed-forward neural network, which allows each layer to update the representation of each element in the input sequence while keeping the sequence length constant. The decoder receives the output of the encoder as well as its own previously generated output, generating the target sequence layer by layer.

Positional Encoding Layer
The positional encoding layer provides information about the position of data in the sequence. The transformer model cannot obtain the order of the sequence directly, so positional encoding is used to retain the position information of the original sequence. Sine and cosine functions with different frequencies are used for position annotation:

$$P_{j,\,2i} = \sin\!\left(\frac{j}{10000^{2i/d_{model}}}\right), \qquad P_{j,\,2i+1} = \cos\!\left(\frac{j}{10000^{2i/d_{model}}}\right), \qquad X_t = x'_t + P_j$$

where P_j is the position code with respect to time step j, d_model is the dimension of the input sequence x'_t, and X_t is the output sequence after positional encoding.
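The sinusoidal encoding above can be sketched in a few lines of numpy (an illustrative sketch assuming an even d_model; the function name is ours):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position codes: sin on even dimensions, cos on odd dimensions."""
    pos = np.arange(seq_len)[:, None]        # time step j
    i = np.arange(d_model // 2)[None, :]     # dimension-pair index
    angle = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

# One code vector per position of a 30-step capacity window:
pe = positional_encoding(30, 8)
```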

Multi-Head Attention Mechanism
The purpose of the multi-head attention layer is to capture the dependencies between features. The self-attention layer maps the original features into Q, K, and V matrices, uses Q and K to calculate the correlation between the features at different times and the features at the current time, and obtains the features at the next time by a weighted summation of V:

$$Attention(Q, K, V) = softmax\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

The expression of the (L − 1)-layer multi-head attention mechanism is:

$$H^{L-1} = MultiHead(Q, K, V) = Concat(head_1, \ldots, head_h)\,W^{O}, \qquad head_i = Attention(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$

where Q is the query matrix, K is the key matrix, V is the value matrix, h is the number of attention heads, d_k is the dimension of each head, W^Q, W^K, and W^V are the projection weight matrices, W^O is the output weight matrix, Concat is the vector splicing operation, and H^{L−1} represents the attention output of layer L − 1.
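The multi-head attention computation can be sketched with plain numpy (a single-layer illustration under our own naming; for brevity the per-head projections are folded into one weight matrix per Q/K/V and the heads are taken as slices of it):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, stabilized by subtracting the row maximum."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """Scaled dot-product attention with h heads; X is (seq_len, d_model)."""
    T, d_model = X.shape
    d_k = d_model // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):
        s = slice(i * d_k, (i + 1) * d_k)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)  # similarity of every pair of time steps
        heads.append(softmax(scores) @ V[:, s])      # weighted sum of the values
    return np.concatenate(heads, axis=1) @ Wo        # Concat(head_1..head_h) W^O
```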

Feed-Forward Full Connection Layer
The role of the feed-forward fully connected layer is to prevent degradation of the model output; it consists mainly of two linear transformations with a ReLU activation function between them [20]. Given the attention output H^{L−1} from the preceding multi-head attention layer, the layer output is

$$H^{L} = ReLU\!\left(H^{L-1}w_1 + b_1\right)w_2 + b_2$$

where w_1 and w_2 are weight matrices, and b_1 and b_2 are bias terms.
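The position-wise feed-forward layer above is a one-liner in numpy (a sketch; parameter names follow the equation):

```python
import numpy as np

def feed_forward(H, w1, b1, w2, b2):
    """Position-wise FFN: two linear maps with a ReLU in between."""
    return np.maximum(0.0, H @ w1 + b1) @ w2 + b2
```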

Definition of Loss Function
The loss function is used to calculate the loss between the predicted value and the real value. In this paper, the mean absolute error is used as the loss function, as shown in Formula (14):

$$Loss = \frac{1}{n}\sum_{t=1}^{n}\left|\hat{y}_t - y_t\right| \quad (14)$$

where y_t is the real capacity, ŷ_t is the predicted capacity, and n is the number of samples.
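Formula (14) translates directly to code (a sketch; the function name is ours):

```python
import numpy as np

def mae_loss(y_pred, y_true):
    """Mean absolute error between predicted and measured capacities, per Formula (14)."""
    return np.mean(np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float)))
```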

Prediction Model of Battery Health State Based on SSA Transformer
The battery health state prediction process based on SSA Transformer is shown in Figure 2, with steps detailed below: Step 1. Data preprocessing: the capacity sequence is decomposed and reconstructed based on singular spectrum analysis to reduce noise.
Step 2. Model training: the grey relational analysis (GRA) method is used to determine the appropriate input factors for the model, and then the training set is used for model training and optimization.
Step 3. Model effect evaluation: the tuned model is used to predict the capacity sequence, and the error between the predicted value and the real value is calculated to evaluate the effect.


Model Construction
In this paper, the deep learning toolbox of MATLAB R2023b was used to build the transformer model. The overall structure of the model is shown in Figure 3. The trend sequence of the capacity series was first extracted by SSA, then positional encoding was added and the result was input into the self-attention layer of the transformer; a 1-D indexing layer then extracts the data at the specified index of the time or spatial dimension of the input. Finally, the capacity at the next time step is output through the fully connected layer.
Since capacity prediction is essentially regression rather than classification, the softmaxLayer of the traditional transformer structure is replaced by a regressionLayer. The general prediction procedure can be divided into four steps: Step 1. Take the battery capacity degradation data C(t) as the battery prediction parameter.
Step 2. The C(t) sequence is decomposed and reconstructed by the SSA algorithm, and the trend component of C(t) is extracted according to Equations (2)-(4).
Step 3. The trend component is divided into a training set and a test set in a fixed proportion. This paper uses the first 40% of the dataset as the training set and the last 60% as the test set.
Step 4. The test set is fed into the trained transformer model to obtain the final battery health status prediction results.
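The windowing and 40/60 split used in Steps 3 and 4 can be sketched as follows (the window length of 30 and step size of 1 follow the experimental settings reported later; function names are ours):

```python
import numpy as np

def make_windows(series, window=30, step=1):
    """Slide a fixed-length window over the trend sequence; each window predicts the next value."""
    X, y = [], []
    for start in range(0, len(series) - window, step):
        X.append(series[start:start + window])
        y.append(series[start + window])
    return np.array(X), np.array(y)

def split_train_test(X, y, train_frac=0.4):
    """First 40% of samples for training, last 60% for testing, as in the paper."""
    n = int(len(X) * train_frac)
    return (X[:n], y[:n]), (X[n:], y[n:])
```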


Dataset Selection
In this paper, two publicly available lithium battery datasets are selected to train and validate the proposed SSA Transformer health state prediction model. One is the public lithium battery dataset published by the Center for Advanced Life Cycle Engineering (CALCE) of the University of Maryland, which contains six groups of CS2 batteries, each group containing between one and four single battery samples. We choose the second group of CS2 battery cells, CS2-35, CS2-36, and CS2-37 [21]. The other is the public lithium battery dataset from the National Aeronautics and Space Administration (NASA) Prognostics Center of Excellence (PCoE) [22]. We selected the cells B0005, B0006, and B0007 of group 1.
The nominal capacity of the CS2-35, CS2-36, and CS2-37 cells in the CALCE dataset is 1.1 Ah. The charging and discharging process is as follows: during charging, the cell is charged at a constant current rate of 1 C, switched to constant voltage charging when the battery voltage reaches 4.2 V, and charging stops when the cut-off current (50 mA) is reached; during discharging, the cell is discharged at constant current to a cut-off voltage of 2.7 V. The above charge-discharge experiments were repeated until the lithium-ion battery reached its life threshold.
The nominal capacity of batteries B0005, B0006, and B0007 in the NASA dataset is 2 Ah, and the charge and discharge process is as follows: during charging, the cells are charged in constant current mode at 1.5 A until the battery voltage reaches 4.2 V, and charging continues in constant voltage mode until the charge current drops to 20 mA. During discharging, the cells are discharged at a constant current of 2 A until the voltages of cells B0005, B0006, and B0007 drop to 2.7 V, 2.5 V, and 2.2 V, respectively.

Data Preprocessing and Data Correlation Analysis
Over the whole process of battery capacity degradation, local capacity regeneration occurs, so the degradation curve does not decrease monotonically and fluctuates considerably. Singular spectrum analysis is used to decompose and reconstruct the capacity data, so as to remove noise and reduce the volatility of the degradation curve. Figure 4 compares the degradation curve of each battery before and after processing; the fluctuation of each degradation curve is significantly smaller after processing.
In order to evaluate the correlation between the sequence reconstructed by SSA and the original data, the Pearson correlation coefficient and grey relational analysis (GRA) were used. Grey relational analysis is a method that quantitatively describes the degree of correlation between factors and reflects the consistency of their change trends. The greater the absolute value of the Pearson correlation coefficient and the grey relational grade, the stronger the correlation between the two sequences [23]. The correlation coefficients of the different batteries are shown in Table 1. The Pearson correlation coefficients and grey relational grades between the original data and the SSA-processed data are all greater than 0.8, indicating a strong correlation. The calculation formulas are given in Equations (15) and (16):

$$\rho_{xy} = \frac{\sum_{t=1}^{N}(x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{N}(x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{N}(y_t - \bar{y})^2}} \quad (15)$$

$$\gamma(x, y) = \frac{1}{N}\sum_{t=1}^{N}\frac{\Delta_{min} + \rho\,\Delta_{max}}{|x_t - y_t| + \rho\,\Delta_{max}} \quad (16)$$

where x̄ and ȳ are the means of the two sequences, Δ_min and Δ_max are the minimum and maximum of |x_t − y_t|, and ρ is the distinguishing coefficient, usually taken as 0.5.
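Both correlation measures can be computed in a few lines (a sketch; `rho` is the standard GRA distinguishing coefficient, assumed to be 0.5, and the function names are ours):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two sequences."""
    return np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1]

def grey_relational_grade(ref, cmp_, rho=0.5):
    """Mean grey relational coefficient between a reference and a comparison sequence."""
    ref, cmp_ = np.asarray(ref, float), np.asarray(cmp_, float)
    diff = np.abs(ref - cmp_)
    dmin, dmax = diff.min(), diff.max()
    coeff = (dmin + rho * dmax) / (diff + rho * dmax)
    return coeff.mean()
```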

Model Evaluation Index
In order to verify the effectiveness of the SSA Transformer capacity estimation model developed in this paper, the experimental capacity observations were compared with the predicted results. The root mean square error (RMSE) and mean absolute percentage error (MAPE), shown in Equations (17) and (18), are used to evaluate the estimation results of the constructed model; the smaller the value, the higher the estimation accuracy.

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2} \quad (17)$$

$$MAPE = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \quad (18)$$

where y_t is the measured capacity and ŷ_t is the predicted capacity.
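Equations (17) and (18) translate directly to code (a sketch; MAPE is returned here as a fraction rather than a percentage):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, per Equation (17)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction, per Equation (18)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true))
```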

Analysis of Experimental Results
In the experiment, the first 40% of the dataset was used as the training set and the last 60% as the test set, and the proposed model was compared with the LSTM network and CNN methods. For the LSTM and CNN model parameters: the number of iterations was 500, the number of hidden layers was 1, the number of neurons was 200, and the initial learning rate was 0.001, using the Adam optimizer. For the transformer network model parameters: the learning rate was set to 0.001, the number of epochs to 500, the number of heads of the self-attention mechanism to 8, the time window length to 30, and the step size to 1, with the mean absolute error used as the loss function.
In order to reflect the performance of the proposed model more intuitively, comparative experiments were carried out on multiple groups of batteries from the NASA and CALCE datasets. The comparison results are shown in Table 2. Figure 5a-c show the prediction results of the four prediction models on batteries B0005, B0006, and B0007 in the NASA dataset, respectively, while Figure 5d-f show the prediction results of the four prediction models on batteries CS2-35, CS2-36, and CS2-37 in the CALCE dataset.

By comparing the prediction results in Figure 5 with the error values in Table 2, it can be seen that the RMSE and MAPE of the proposed SSA Transformer network model are low for SOH prediction across different types of batteries. With the addition of the SSA method, on the NASA dataset, the SSA Transformer network model achieved its best prediction accuracy on cell B0007, with an RMSE of 0.0106 Ah and a MAPE of 0.0072. Compared with the original transformer model, the RMSE and MAPE were reduced by 0.0047 Ah and 0.0025, respectively; compared with the LSTM model, by 0.0101 Ah and 0.0064; and compared with the CNN model, by 0.0061 Ah and 0.0014. On the CALCE dataset, the SSA Transformer network model achieved the best prediction accuracy on all of the CS2-35, CS2-36, and CS2-37 cells. For the CS2-36 cell, compared with the original transformer model, the RMSE and MAPE were improved by 0.0028 Ah and 0.0037, respectively; compared with the LSTM model, by 0.0238 Ah and 0.0307; and compared with the CNN model, by 0.0226 Ah and 0.0277.

Conclusions
In this paper, a prediction model based on singular spectrum analysis and a transformer is proposed to predict the health status of lithium batteries. The long-term trend subsequence is obtained by singular spectrum decomposition and reconstruction of the battery's historical capacity series, which is then used to train and predict with the transformer model. The following conclusions are drawn.
Compared with other deep learning algorithms, the transformer model based on the self-attention mechanism has more efficient feature extraction ability. The proposed method can reduce the impact of the battery capacity regeneration phenomenon on the prediction model to a certain extent and further improve the accuracy of battery health state prediction. The SSA Transformer model proposed in this paper has strong versatility and can effectively predict a variety of different types of lithium batteries.
In future work, it is necessary to study the voltage, current, temperature, and other related characteristics of battery charging and discharging, and to evaluate the health state of the battery from these multiple characteristics.

Figure 1. The structure of the transformer.


Figure 2. SSA Transformer health status prediction flow chart.


Figure 3. Schematic overview of the SSA Transformer prediction model.


Figure 4. Comparison of battery degradation curves before and after processing.


Figure 5. Effect comparison of the four prediction models.


Table 1. Correlation analysis of capacity data after SSA processing.

Table 2. Estimation errors of the four models.
