Deep Learning with Spatial Attention-Based CONV-LSTM for SOC Estimation of Lithium-Ion Batteries

Abstract: Accurate estimation of the state of charge (SOC) is an indispensable part of a vehicle management system, as it ensures the system's safe and reliable operation. With the development of intelligent transportation systems (ITS), vehicles can not only obtain the dynamic changes inside the battery through sensors, but also obtain the traffic information around the vehicle through vehicle–road collaboration. In addition, the development of onboard graphics processing units (GPUs) and Internet of Vehicles (IoV) technology means that the computing power of vehicles is no longer limited by hardware, which allows neural networks to be applied to the intelligent control of vehicles. Aiming at the problem that traditional networks cannot effectively capture the complex spatial information of sample attributes, we developed an attention-based CONV-LSTM module for SOC prediction built on a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Different from the traditional LSTM network, the algorithm not only considers the temporal correlation of the data stream, but also captures the spatial correlation of the input data by convolution. It then uses the different weights, automatically assigned by the attention mechanism, to correctly distinguish the importance of different input data streams. To verify the validity of the model, this paper selects an aeroengine degradation data set as the verification data set; experiments show that the proposed model achieves good results. Finally, the proposed model is applied to actual vehicle running data, and its effectiveness is verified by comparison with the Multi-Layer Perceptron (MLP), LSTM, and CNN-LSTM models.


Introduction
Due to ongoing issues with oil supply and increasing environmental pollution, electric vehicles (EVs) have gradually supplanted conventional cars as the primary mode of transportation [1]. Because of their high energy density, low self-discharge rate, and long cycle life, lithium-ion batteries are frequently used in electric cars [2]. However, when lithium-ion batteries are used incorrectly or are not replaced in time, driving comfort and even passenger safety suffer. As a result, an efficient and safe battery management system (BMS) is required to monitor the battery's status and characteristics. SOC, one of the essential BMS assessment indexes [3], may be used to monitor the remaining capacity of the battery and to help assure the vehicle's steady operation. SOC is commonly defined as the ratio of the battery's current remaining capacity to its maximum capacity [4]. SOC cannot be measured directly; it can only be calculated by algorithms from observable variables (such as current, voltage, and temperature). Accurate prediction of SOC remains difficult due to the complex dynamic changes inside the battery and the vehicle's increasingly complicated external environment, which manifest in factors such as the battery self-discharge rate, power regeneration, and driving conditions [5]. In the literature, SOC estimation methods can be divided into three main categories [6]: traditional methods, model-based methods, and data-driven methods.
The open-circuit voltage (OCV) method [7] and the ampere-hour integral method [8] are two traditional methods. The ampere-hour integral method, also known as the Coulomb counting method, estimates the battery's state of charge (SOC) by integrating the charging/discharging current at the terminals of the battery during operation, and does not require a battery model. Kong Soon Ng [9] proposed an intelligent estimation method for lithium-ion battery SOC and SOH based on Coulomb counting. Although the ampere-hour integral method is simple to implement and produces direct results, it requires high sensor accuracy and working conditions that do not change significantly. However, in actual operation, the working conditions of an electric vehicle are complicated, so the current cannot reach a continuous and stable state. In addition, the result depends directly on the initial SOC of the battery, so obtaining the initial SOC value is a long-standing problem of this method. The open-circuit voltage (OCV) method [7] is a direct calculation based on the relationship between the battery's SOC and its OCV. Although this process is very simple, correctly measuring the OCV requires resting the battery for some time so that the internal electrolyte is evenly distributed, which makes the open-circuit voltage method unsuitable for online applications.
Model-based estimation methods are also often used in battery SOC prediction. Common approaches include the electrochemical model (EM), the electrochemical impedance model (EIM), and the equivalent circuit model (ECM). These methods are based on the reaction principle inside the battery and use mathematical formulas and electrical components to simulate it. They require the modeler to have strong battery knowledge, and as accuracy requirements rise, the number of battery model parameters also increases. In addition, to improve the nonlinear and dynamic performance of the model, researchers often combine a nonlinear observer with the battery model. The commonly used methods are Kalman filtering and its improved algorithms [10]. In this approach, the researchers regard the battery as a whole, without considering its internal reaction principle. According to the error between the terminal voltage of the model and the measured voltage, the difference is fed back to the predicted value of the SOC through a gain matrix. After multiple recursions, the voltage output of the model is adjusted to minimize the voltage error. However, this technique depends heavily on the accuracy of the circuit model, which directly increases the complexity of the algorithm. In [11], a battery SOC prediction method based on the autocovariance least-squares technique and an unscented Kalman filter is proposed: the model is constructed using the state space model of an RC equivalent circuit, and the SOC prediction model is then modified using an unscented Kalman filter improved with autocovariance least-squares. Zou et al. [12] used partial differential equations to represent the dynamics of batteries under various operating conditions. The Kalman filter is an iterative algorithm, and its final accuracy depends on the accuracy of the battery model.
At the same time, the equivalent circuit and observer need to be re-established for different batteries. These problems are the shortcomings of the model-based method.
In recent years, with the development of machine learning, data-driven SOC prediction methods have gradually attracted the attention of researchers. Data-based estimation refers to automatically learning the relationship between the battery and the SOC from measured battery characteristics related to SOC prediction, such as current, voltage, temperature, and internal resistance. The data-driven method has excellent nonlinear modeling capability, and data-based modeling does not require the establishment of complex circuit models, which greatly reduces model complexity. Commonly used data-driven methods for SOC estimation are based on the support vector machine (SVM), the extreme learning machine (ELM), the random forest (RF), and artificial neural networks (ANNs). In [13], the least-squares method is used to reduce the dimension of high-dimensional input data, and the parameters of the SVM are then optimized by differential evolution and weighted regularization. Finally, the performance of the prediction model is evaluated by cross-validation, and the SOC prediction problem is transformed into a nonlinear regression problem. Li et al. [14] created a SOC prediction approach based on a random forest (RF) algorithm and conducted experiments with varying discharge currents. Liu [15] used principal component analysis (PCA) and particle swarm optimization (PSO) to improve the accuracy and robustness of a back propagation neural network (BPNN) and then applied the proposed algorithm to battery SOC prediction; the final model achieved good results under different working conditions. Although these machine learning methods are widely used, they also have certain limitations. For example, when the training data contain many variables and the relationships between them are complex, such models cannot achieve the desired effect.
With the continuous development of cloud computing and computer technology, more and more methods based on artificial neural networks have attracted attention. Among them, the recurrent neural network (RNN) and its variants are the most commonly used prediction methods. Compared with traditional neural networks, the RNN is more suitable for battery SOC prediction: because the current SOC of a battery is strongly correlated with previous SOC values, the recurrent structure can retain historical battery data for the next SOC prediction. Therefore, RNNs can perform well in battery SOC prediction. Chao et al. [16] applied an RNN to the SOC and SOH prediction of lithium-ion batteries and validated the comparison between the RNN and other methods across several data sets. The article [17] used a GRU (gated recurrent unit)-RNN for battery SOC prediction, obtaining good results using only the measured current, voltage, and temperature. Yang et al. [18] successfully employed LSTM for SOC prediction to better capture the time series characteristics of the features.
Although the traditional RNN and its improved networks can make use of the temporal connections between input data, they cannot capture the correlations between features well when facing high-dimensional input data. At the same time, the standard model does not distinguish between input features and assumes that all inputs contribute equally to the prediction results. To address these two issues, this study develops a CONV-LSTM time series prediction approach with an attention mechanism. The proposed model can not only obtain the correlations between features automatically, but can also automatically assign different attention coefficients based on the relevance of the input features to the output. This paper's innovations are as follows: (1) A new CONV-LSTM deep prediction network structure, based on convolution and LSTM networks, is proposed and applied to multivariate time series prediction tasks. (2) For more accurate prediction, a spatial attention mechanism is introduced into the model, which can select the relevant input features at each time step. (3) The proposed model is evaluated on two real multivariate time series data sets to demonstrate its usefulness.
The remainder of the article is organized as follows. The relevant theoretical information is introduced in Section 2. Section 3 goes over the proposed method in depth. Section 4 is dedicated to experimental validation and analysis. Finally, Section 5 summarizes the article's conclusion.

LSTM Neural Network
Long short-term memory (LSTM) is a popular recurrent neural network for time series modeling. Hochreiter and Schmidhuber [19] proposed it in 1997 to address the gradient vanishing and explosion problems of the recurrent neural network (RNN). The LSTM network adds a 'gating mechanism' that filters the present input and the memory of historical data, as shown in Figure 1. The operation of a single LSTM unit can be described as follows [20]:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t                    (1)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
h_t = o_t ⊙ tanh(c_t)

where x_t is the input at time t, h_t and c_t are the hidden and cell states, f_t, i_t, and o_t are the forget, input, and output gates, σ is the sigmoid function, ⊙ denotes elementwise multiplication, and the W, U, and b terms are learnable weights and biases.
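As a minimal illustration of Formula (1), the following NumPy sketch computes one step of a single LSTM unit. This is not the paper's PyTorch implementation; the stacked weight layout and dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a single LSTM unit (cf. Formula (1)). W: (4m, d) input
    weights, U: (4m, m) recurrent weights, b: (4m,) biases, stacked in
    forget/input/candidate/output order; m is the hidden size (assumed layout)."""
    m = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:m])            # forget gate f_t
    i = sigmoid(z[m:2 * m])        # input gate i_t
    g = np.tanh(z[2 * m:3 * m])    # candidate cell state
    o = sigmoid(z[3 * m:4 * m])    # output gate o_t
    c_t = f * c_prev + i * g       # new cell state
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t

# usage: one step on random data with input size 3 and hidden size 4
rng = np.random.default_rng(0)
d, m = 3, 4
h, c = lstm_step(rng.standard_normal(d), np.zeros(m), np.zeros(m),
                 rng.standard_normal((4 * m, d)),
                 rng.standard_normal((4 * m, m)), np.zeros(4 * m))
```

In a full network this step is applied recurrently over the time window, with each (h, c) pair fed into the next step.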

Attention Mechanism
The attention mechanism has been widely discussed and studied in recent years. It works similarly to the human visual system: when the network is trained, it can focus more on the input that is most related to the prediction target and suppress the unrelated content. At present, the attention mechanism is widely used in image classification [21], speech recognition [22], and machine vision [23], and it has also been extended to general data processing. DARNN [24] integrates the attention mechanism with the LSTM network and assigns different attention coefficients to the inputs based on their contribution, boosting the LSTM's accuracy and robustness.
The working principle of the attention mechanism is shown in Figure 2. Suppose N input vectors X = (x_1, x_2, …, x_N) are given, where each x_i ∈ R^k, k is the dimension of the input sequence, and N is the length of the data. To select the inputs x_i, i ∈ [1, N], most favorable for the target, the attention distribution coefficient is calculated using the input data X and a task-related query vector q:

e_i = v^T tanh(W_e x_i + U_e q + b_e)                        (2)
α_i = softmax(e_i) = exp(e_i) / Σ_{j=1}^{N} exp(e_j)         (3)

Among them, v, W_e, U_e, and b_e are parameters that can be learned together with the overall network.
Their detailed meanings are described in Section 3.
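The additive attention scoring described above can be sketched in NumPy as follows. The function name and dimensions are illustrative assumptions; the score/softmax structure mirrors Equations (2)-(3).

```python
import numpy as np

def softmax(e):
    p = np.exp(e - e.max())        # shift for numerical stability
    return p / p.sum()

def additive_attention(X, q, W_e, U_e, v, b_e):
    """Score each input vector x_i (a row of X, dimension k) against the
    query q with an additive score, then normalize with softmax so the
    coefficients sum to 1."""
    scores = np.array([v @ np.tanh(W_e @ x + U_e @ q + b_e) for x in X])
    alpha = softmax(scores)        # attention distribution over the N inputs
    context = alpha @ X            # attention-weighted summary of the inputs
    return alpha, context

# usage: five length-3 inputs scored against one query
rng = np.random.default_rng(1)
N, k, p = 5, 3, 8
X, q = rng.standard_normal((N, k)), rng.standard_normal(k)
alpha, context = additive_attention(X, q, rng.standard_normal((p, k)),
                                    rng.standard_normal((p, k)),
                                    rng.standard_normal(p),
                                    rng.standard_normal(p))
```

The coefficients alpha are non-negative and sum to 1, so inputs with higher scores dominate the weighted summary.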

AT-CONVLSTM
CNN-LSTM has been used as a hybrid model in traffic flow prediction [25], weather prediction [26], and other applications. The model combines CNN's local feature extraction ability with LSTM's temporal feature extraction ability, which improves the model's spatial-temporal extraction capacity to some extent. However, it only analyzes the spatial information between the input features within each time interval and overlooks the transfer of temporal-spatial information. As a result, this research introduces AT-CONVLSTM, a new attention-based spatial-temporal information extraction network. Figure 3 depicts the AT-CONVLSTM schematic.
The AT-CONVLSTM network consists of a spatial attention module and a Conv-LSTM module. At each time step, the spatial attention module can assign different attention coefficients to the input data features and alter the weights of distinct input features adaptively. The Conv-LSTM module improves the LSTM network's spatio-temporal feature extraction ability for input data in each time step by utilizing the powerful local feature extraction ability of the convolutional network. The following sections provide an overview of the model's many modules.

Spatial Attention Module
The purpose of this module is to enable the network to adaptively assign different attention coefficients to the input data features of each time step, so that the weights of different input features can be adjusted. Figure 4 is a schematic of the spatial attention module. It is assumed that the input data stream of the model at each time step is x = (x_1, x_2, …, x_T). The attention score and coefficient are computed as

s_t = v_e^T tanh(W_e [h_{t-1}; c_{t-1}] + U_e x_t + b_e)     (4)
α_t = softmax(s_t) = exp(s_t) / Σ_j exp(s_j)                 (5)
x̃_t = α_t ⊙ x_t                                             (6)

In Equation (4), x_t is the input data stream at time t. In Equation (5), α_t is obtained by applying the Softmax function to the score s_t, which makes the coefficients sum to 1. The parameters in the above formulas are updated through the backpropagation of the model. The resulting coefficient α_t is the spatial attention coefficient, which is multiplied by the input variable x_t to obtain x̃_t, as shown in Equation (6). The attention module is a feedforward neural network that can complete parameter optimization adaptively along with the training of the other modules in the model, applying varying degrees of attention to the input data.
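A minimal NumPy sketch of this per-feature weighting follows. It assumes the hidden and cell states can be treated as flat vectors and that each feature receives its own additive score; the exact parameter shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(e):
    p = np.exp(e - e.max())
    return p / p.sum()

def spatial_attention(x_t, h_prev, c_prev, W_e, u_e, v_e, b_e):
    """Assign one attention coefficient to each of the k input features of a
    single time step. x_t: (k,), h_prev/c_prev: (m,), W_e: (p, 2m),
    u_e: (p,), v_e: (p,), b_e: (p,) (assumed shapes)."""
    state = np.concatenate([h_prev, c_prev])      # [h_{t-1}; c_{t-1}]
    base = W_e @ state + b_e
    # additive score for each input feature j
    s = np.array([v_e @ np.tanh(base + u_e * x_j) for x_j in x_t])
    alpha = softmax(s)                            # coefficients sum to 1
    return alpha, alpha * x_t                     # attention-weighted input

# usage: six input features, hidden size four
rng = np.random.default_rng(2)
k, m, p = 6, 4, 8
alpha, x_w = spatial_attention(rng.standard_normal(k), np.zeros(m),
                               np.zeros(m),
                               rng.standard_normal((p, 2 * m)),
                               rng.standard_normal(p), rng.standard_normal(p),
                               rng.standard_normal(p))
```

Because the weighting is a smooth function of learnable parameters, it can be trained end to end with the rest of the network by backpropagation, as the text describes.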

ConvLSTM Modules
The principle of ConvLSTM is shown in Figure 5. The ConvLSTM network replaces the matrix operations in traditional LSTM with convolution operations and uses the feature extraction ability of the convolutional network to extract the coupling information between the input and the hidden state, thereby improving the spatial feature extraction ability of LSTM.
The specific formulas of the module are shown in (7) to (11):

f_t = σ(W_xf * x̃_t + W_hf * h_{t-1} + b_f)                               (7)
i_t = σ(W_xi * x̃_t + W_hi * h_{t-1} + b_i)                               (8)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_xc * x̃_t + W_hc * h_{t-1} + b_c)      (9)
o_t = σ(W_xo * x̃_t + W_ho * h_{t-1} + b_o)                               (10)
h_t = o_t ⊙ tanh(c_t)                                                    (11)

In the above formulas, x̃_t represents the input data weighted by the spatial attention coefficient, * represents the convolution operation, and the remaining parameters have the same meaning as in Formula (1). Formula (7) is the calculation of the 'forget gate' in the proposed model: it determines how much of the previous cell state c_{t-1} is retained.
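The replacement of matrix products by convolutions can be sketched in NumPy for the 1-D case. This is a simplified stand-in for the module (single channel, 'same'-padded 1-D convolution); kernel sizes and the dict-based parameter layout are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_same(x, k):
    # 1-D 'same' convolution standing in for the * operation
    return np.convolve(x, k, mode="same")

def convlstm_step(x_t, h_prev, c_prev, K, b):
    """One 1-D ConvLSTM step: the matrix products of a plain LSTM are
    replaced by convolutions. x_t, h_prev, c_prev: length-n signals;
    K: dict of small 1-D kernels; b: dict of scalar biases (assumed layout)."""
    f = sigmoid(conv_same(x_t, K["xf"]) + conv_same(h_prev, K["hf"]) + b["f"])
    i = sigmoid(conv_same(x_t, K["xi"]) + conv_same(h_prev, K["hi"]) + b["i"])
    g = np.tanh(conv_same(x_t, K["xc"]) + conv_same(h_prev, K["hc"]) + b["c"])
    o = sigmoid(conv_same(x_t, K["xo"]) + conv_same(h_prev, K["ho"]) + b["o"])
    c_t = f * c_prev + i * g          # forget old state, write new candidate
    h_t = o * np.tanh(c_t)            # gated hidden state
    return h_t, c_t

# usage: one step on a length-10 signal with width-3 kernels
rng = np.random.default_rng(3)
n = 10
K = {name: rng.standard_normal(3) for name in
     ("xf", "hf", "xi", "hi", "xc", "hc", "xo", "ho")}
b = {name: 0.0 for name in ("f", "i", "c", "o")}
h, c = convlstm_step(rng.standard_normal(n), np.zeros(n), np.zeros(n), K, b)
```

Because each gate is produced by a convolution, every hidden-state element is computed from a local neighborhood of the input, which is what gives the module its spatial feature extraction ability.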

Experimental Results and Analysis
In this part, we performed two experiments. The RUL prediction experiment on the turbofan engine illustrates the effectiveness of the proposed AT-CONVLSTM model, which is then applied to the SOC prediction of vehicle battery data [27]. The RUL prediction of the turbofan engine uses the C-MAPSS data set published by NASA [28]. The data set is divided into four sub-data sets according to the engine failure mechanism and operating conditions. Each training and test trajectory runs from some point in the life cycle to the end of life. Under the initial conditions of the engine, the degree of mechanical wear is low, which is defined as a healthy state. When a fault occurs at a certain point, the performance of the engine gradually decreases with time, and the corresponding RUL value is marked at each moment in the data set. The specific information of the data set is shown in Table 1; each data set records the changes of 24 aeroengine variables in each flight cycle, of which 21 are engine sensor measurements and 3 are working condition variables. The driving data are derived from the actual driving data of an electric vehicle. The data all follow the NEDC (New European Driving Cycle), with a maximum speed of 120 km/h and an average speed of 36.1 km/h. The acquisition interval is 0.3 s, and the initial SOC of the vehicle is 70%. The data are composed of 13 variables, such as voltage, current, temperature, and speed, and record the running state of the vehicle. All experiments were completed on an Intel Core i7-6700 processor with 32 GB RAM under the Windows 10 operating system. In addition, the tool used was Python 3.6.13, with PyTorch 1.9.1 as the backend. To better obtain the optimal solution during training, this paper adopts an 'early termination' mechanism.
Specifically, 20% of the training data set is randomly selected as the verification set, and the best model parameters are retained according to the error on the verification set during training. In addition, extensive experiments were conducted to obtain appropriate parameters, and a grid search strategy was used to determine the hyperparameters.
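The 'early termination' mechanism described above can be sketched as follows. The patience value is an assumed illustration (the paper does not state one), and the function name is hypothetical.

```python
def early_stopping_best(val_errors, patience=5):
    """Scan per-epoch validation errors, stop once the error has not improved
    for `patience` consecutive epochs, and return (best_epoch, best_error)."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, wait = err, epoch, 0   # new best checkpoint
        else:
            wait += 1
            if wait >= patience:                     # no improvement: stop
                break
    return best_epoch, best

# usage: training stops once three epochs in a row fail to improve,
# keeping the epoch-3 checkpoint
errors = [1.0, 0.8, 0.9, 0.7, 0.75, 0.76, 0.77, 0.8]
best_epoch, best_err = early_stopping_best(errors, patience=3)
```

In practice the retained checkpoint (here, epoch 3) is the model evaluated on the test set, which avoids overfitting to the training data.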

Evaluation Metrics
In order to quantitatively analyze the superiority of the proposed algorithm compared with other algorithms, the article uses root-mean-square error (RMSE) and a score function for the performance evaluation of RUL prediction, where the score function was proposed by the 2008 Prognostics and Health Management (PHM) data challenge [27] and is a widely accepted predictor. Similarly, RMSE and mean absolute error (MAE) are utilized as evaluation indicators on the vehicle driving data to quantitatively compare the results of the AT-CONVLSTM network and the other algorithms. These two error metrics are commonly used in regression task evaluation. The RMSE, MAE, and score are computed as follows:

RMSE = sqrt( (1/N) Σ_{n=1}^{N} (m̂_n − m_n)^2 )
MAE = (1/N) Σ_{n=1}^{N} |m̂_n − m_n|
Score = Σ_{n=1}^{N} (exp(−d_n/13) − 1) for d_n < 0, and Σ_{n=1}^{N} (exp(d_n/10) − 1) for d_n ≥ 0, where d_n = m̂_n − m_n

In the above formulas, N is the number of samples, and m̂_n and m_n are the predicted value and the actual value of sample n, respectively. Smaller RMSE, MAE, and score values indicate better results.
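The three metrics can be sketched directly in NumPy. The asymmetric PHM 2008 penalty (harsher for late predictions) follows the standard challenge definition; the function names are illustrative.

```python
import numpy as np

def rmse(pred, true):
    d = np.asarray(pred, float) - np.asarray(true, float)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(pred, true):
    d = np.asarray(pred, float) - np.asarray(true, float)
    return float(np.mean(np.abs(d)))

def phm08_score(pred, true):
    """PHM 2008 score: late predictions (d >= 0) are penalized more heavily
    than early ones (d < 0), where d = predicted RUL - actual RUL."""
    d = np.asarray(pred, float) - np.asarray(true, float)
    per_sample = np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0
    return float(np.sum(per_sample))
```

For example, a prediction that is 13 cycles early contributes e − 1 ≈ 1.72 to the score, while one 13 cycles late contributes e^1.3 − 1 ≈ 2.67, reflecting the asymmetric penalty.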

Experiment 1: C-MAPSS
Since some sensor data in the C-MAPSS data set do not change with time, they are considered to contain no useful information for RUL prediction. Therefore, we selected and used 14 sensor measurements in the subsequent experiments. The serial numbers of these 14 sensors are 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21 [29]. Figure 6 shows one testing sample in FD001 with data of the 14 sensors within a time window of length 60. In order to obtain the best experimental parameters for the comparison tests and the proposed method, all experiments were carried out many times. The final hyperparameters of the related methods are shown in Table 2. At the same time, considering the impact of randomness on performance, all experiments in this paper were run 10 times independently, and the average was taken as the final result. The final prediction results of each method are shown in Table 3, which gives the average error results for the four engine data sets FD001 to FD004 using five different methods. Table 3 shows that MLP has the worst prediction performance across all data sets: because MLP has a simple structure and cannot consider the temporal relationships in the data, it cannot produce accurate predictions for complex time series data sets. LSTM outperforms MLP because it can capture the dynamic character of the process data; it stores the nonlinear features extracted by the nonlinear activation function in its loop units, and the data are then filtered by the 'gate' structure. When the data have multiple characteristics, however, LSTM cannot determine the coupling relationships between them. CNN-LSTM can mine the associations between features more effectively than LSTM.
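The input preparation used at the start of this experiment, keeping the 14 informative sensors and cutting length-60 windows, can be sketched as follows. The function name is illustrative; only the sensor list and window length come from the text.

```python
import numpy as np

# sensor indices retained for C-MAPSS, as listed in the text
KEPT_SENSORS = [2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, 21]

def make_windows(series, window=60):
    """Cut a (T, k) multivariate series into all overlapping (window, k)
    samples, matching the length-60 windows described above."""
    T = series.shape[0]
    return np.stack([series[t:t + window] for t in range(T - window + 1)])

# usage: one engine's record over 100 cycles with the 14 retained sensors
record = np.random.default_rng(4).standard_normal((100, len(KEPT_SENSORS)))
samples = make_windows(record, window=60)
```

Each resulting sample is one network input; the label is the RUL at the window's final cycle.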
Compared with CNN-LSTM, which only addresses the extraction of single-step spatial coupling information and neglects the fact that data coupling information may change over time, Conv-LSTM performs better. The Conv-LSTM module in the proposed AT-CONVLSTM model can not only extract the spatial characteristics of the features within a single time step, but can also adaptively mine the spatial features that change with time. In addition, the spatial attention module can calculate the attention weights of the input variables from the hidden state variable and the current input, which better identifies the input variables related to the RUL prediction, so the method has higher prediction accuracy. Figure 7 shows the prediction results of MLP, LSTM, CNN-LSTM, and the proposed method on the FD003 data set. It can be seen from Figure 7 that the prediction performance of MLP, LSTM, and CNN-LSTM is poor: although their prediction results can follow the actual change trend, they fluctuate greatly. For predictions based on Conv-LSTM, the prediction curve can better track changes in the measurements, but there is still a large deviation between the predicted output and the measured output. For the AT-CONVLSTM model, the predicted results track the actual curve well.

Experiment 2: Battery SOC Prediction
We use the proposed AT-CONVLSTM model for SOC prediction of lithium-ion batteries. In the experiment, by analyzing the correlation coefficients between the sensors, we found that the total voltage and total current of the battery pack are highly correlated with the voltage and current of the single battery. Therefore, we screened out the highly correlated variables. The final input variables are the current, voltage, temperature, average current, average voltage, and vehicle speed over a specified time, and the output variable is the SOC of the battery. After many experiments, the parameters of the MLP, LSTM, CNN-LSTM, Conv-LSTM, and AT-CONVLSTM models used in the article are shown in Table 4. Considering the influence of randomness on performance, each experiment was also run independently 10 times, and the final result was obtained by averaging. The final prediction results of each method are shown in Table 5. Similar to the RUL prediction results of the engine, the MLP, LSTM, and CNN-LSTM networks have poor accuracy in SOC prediction. Conversely, the Conv-LSTM network can adaptively obtain the temporal and spatial coupling information in the input data and hidden state by using the convolutional network, and its effect is better. The proposed AT-CONVLSTM network applies the attention mechanism to the convolutional long short-term memory network, using it to integrate the most relevant information into the prediction according to the contribution of the input features to the SOC, so the prediction results are in good agreement with the actual SOC curve of the lithium-ion battery. The detailed results of predicting the lithium-ion battery test data set using MLP, LSTM, CNN-LSTM, Conv-LSTM, and the proposed AT-CONVLSTM network are shown in Figure 8.
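The correlation-based variable screening described above can be sketched as follows. The 0.95 threshold, the variable names, and the synthetic data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Greedily keep a column only if its absolute Pearson correlation with
    every already-kept column is below `threshold` (assumed value)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] < threshold for i in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

# usage: cell voltage is (by construction) proportional to pack voltage,
# so one of the pair is dropped while the uncorrelated speed column survives
pack_v = np.arange(100.0)
speed = np.tile([1.0, -1.0], 50)
cell_v = 2.0 * pack_v + 5.0
X = np.column_stack([pack_v, speed, cell_v])
Xk, kept = drop_correlated(X, ["pack_voltage", "speed", "cell_voltage"])
```

Removing near-duplicate inputs like this keeps the model smaller without discarding information, since the dropped variable is almost fully predictable from a kept one.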
To show the comparison of the prediction results of different algorithms more intuitively, some of the predicted results are shown as follows. The yellow part is the prediction result, and the blue part is the actual result of the proposed method.

Conclusions
Aiming at the shortcoming of the traditional LSTM recurrent neural network, which does not consider the relationships between sequences when predicting time series, this paper improves LSTM by integrating a CNN and the attention mechanism into the LSTM network to improve the prediction accuracy of the model. The spatial attention module is used first to weight distinct inputs according to their contribution to the output, and the processed data are then sent to the upgraded Conv-LSTM network. In contrast to standard CNN-LSTM networks, the Conv-LSTM network here takes into account not just the correlation between the input features of each time step, but also the connection between hidden states. Finally, the proposed AT-CONVLSTM is used for the RUL prediction of an aeroengine and the SOC prediction of an electric vehicle's lithium-ion battery, with the final experimental results obtained from multiple independent runs. Specifically, on the RUL verification data set of aeroengine FD001, the RMSE of the proposed algorithm is 76.4%, 33.2%, 14.3%, and 4% lower than that of MLP, LSTM, CNN-LSTM, and Conv-LSTM, respectively. In the SOC prediction experiment on the electric vehicle lithium-ion battery, the average RMSE values of MLP, LSTM, CNN-LSTM, and Conv-LSTM are 178%, 35%, 67%, and 32% higher, respectively, than that of the proposed algorithm. The results on the different data sets show that the proposed model is superior to MLP, LSTM, CNN-LSTM, and Conv-LSTM.