An Improved Gated Recurrent Unit Network Model for State-of-Charge Estimation of Lithium-Ion Battery

Abstract: An accurate state-of-charge (SOC) estimate not only provides a safe and reliable guarantee for the equipment as a whole but also extends the service life of the battery pack. Because the chemical reaction inside a lithium-ion battery is a highly nonlinear dynamic system, obtaining an accurate SOC for the battery management system is very challenging. This paper proposes a gated recurrent unit recurrent neural network model with activation function layers (GRU-ATL) to estimate battery SOC. The model uses deep learning to establish the nonlinear relationship between the current, voltage, and temperature measurement signals and the battery SOC. Online SOC estimation was then carried out on different testing sets using the trained model. The experiments showed that the GRU-ATL network model could realize online SOC estimation under different working conditions without relying on an accurate battery model. Compared with the gated recurrent unit (GRU) network model and the long short-term memory (LSTM) network model, the GRU-ATL network model had more stable and accurate SOC prediction performance. When the measurement data contained noise, the SOC prediction accuracy of the GRU-ATL model was 0.1–0.4% higher than that of the GRU model and 0.3–0.7% higher than that of the LSTM model. The mean absolute error (MAE) of the SOC predicted by the GRU-ATL model was stable in the range of 0.7–1.4%, and the root mean square error (RMSE) was stable in the range of 1.2–1.9%. The model therefore retained high prediction accuracy and robustness, and could meet the requirements of SOC estimation in complex vehicle working conditions.


Introduction
With the development of science and technology, the world's dependence on energy is increasing year by year. At present, oil and coal are the main non-renewable energy sources, and their waste gas emissions inevitably cause serious environmental consequences [1]. To address these problems, governments strongly advocate replacing fuel vehicles with electric vehicles, which helps to reduce the emission of harmful gases. Lithium-ion batteries have made considerable progress in the past decade [2]. Compared with other battery chemistries, lithium-ion batteries have the advantages of high energy density, long life, and high power. They have been widely used in mobile phones, computers, electric vehicles, and satellites [3]. Because the chemical reaction inside the battery is [...] a long short-term memory (BILSTM) neural network to predict battery SOC. However, the disadvantage of the BILSTM network is that the decoding accuracy is seriously affected when decoding begins without enough input sequence information. Vidal et al. [25] proposed a deep feed-forward neural network (FNN) approach for battery SOC estimation. The method still achieved good SOC estimation performance when errors were added to the measurement signal. However, this method needs to calculate the average value of the measured signal over a certain number of time steps, and the choice of averaging window has a certain influence on the prediction results. Yang et al. [26] used the gated recurrent unit recurrent neural network (GRU-RNN) to predict the battery SOC from the measured current, voltage, and temperature signals. Jiao et al. [27] used the momentum gradient method to optimize the network weights and thereby improve the prediction accuracy of the GRU network model. In addition, some methods that combine neural networks with filter algorithms have also obtained accurate SOC estimates. Dong et al. [28] proposed a hybrid model of a wavelet neural network and a particle filter algorithm and estimated the energy state of a battery under different working conditions. Tian et al. [29] proposed a method combining LSTM with an adaptive cubature Kalman filter for battery SOC estimation, and realized accurate estimation of battery SOC with less data. Although this type of method can reduce the SOC estimation error to some extent, the final SOC estimate depends heavily on the SOC value estimated by the deep learning or neural network stage. To achieve good SOC estimation results, it also requires many experiments to tune the process noise covariance and the measurement noise covariance.

The Contributions
Inspired by reference [25], this paper proposes a GRU-RNN network model with activation function layers (GRU-ATL) for battery SOC estimation. The model adds a Tanh layer, a leaky rectified linear unit (leaky ReLU) layer, and a clipped rectified linear unit (clipped ReLU) layer to the GRU-RNN network model. The relationship between current, voltage, temperature, and SOC was established on the training set. Online SOC estimation was then carried out on different testing sets using the trained model. The main contributions of this paper are as follows:
(1) Deep learning technology can solve the problem of the low battery SOC prediction accuracy of a traditional neural network. This technology does not need an accurate battery equivalent circuit model. It simplifies the tedious parameter adjustment process of model-based methods and greatly reduces the time needed for the whole SOC estimation process.
(2) Compared with an LSTM network, a GRU network has fewer parameters and a simpler structure. It can save considerable training and prediction time while maintaining the prediction accuracy of the model. Compared with the FNN, LSTM, and GRU network models, the proposed model was found to obtain more accurate and stable SOC estimation results in different operating conditions. The Tanh activation function is a saturating activation function, which can enhance the nonlinear learning ability of the neural network. The leaky ReLU and the clipped ReLU are non-saturating activation functions, which help to mitigate the vanishing-gradient problem encountered in neural networks. Limiting the output of the neural network to a bounded range also improved the prediction performance of the model. Adding these three activation function layers to the GRU-RNN network can improve the prediction accuracy and robustness of the model.
(3) The SOC prediction performance of the LSTM, GRU, and GRU-ATL network models was compared when the measurement signals contained Gaussian noise and non-Gaussian noise. The experimental results showed that the SOC prediction accuracy of the GRU-ATL model was 0.1–0.4% higher than that of the GRU model and 0.3–0.7% higher than that of the LSTM model.
(4) The battery model obtained by the deep learning network can be used for online SOC prediction in working conditions at different temperatures. The GRU-ATL model still had high prediction [...]
[...] the experiment and the evaluation index of the model. The fourth part discusses and analyzes in detail the SOC prediction performance of the GRU-ATL network model under various operating conditions. The fifth part is the conclusion of this paper.

GRU Structural Unit
A GRU network is formed by adding a gating mechanism to a simple RNN network; the gates control the transmission of information in the neural network. A GRU network can effectively capture dependencies that span many time steps in a time series, and it alleviates the gradient attenuation or explosion that arises in long-term memory and back-propagation [30]. Compared with an LSTM network, the GRU network has fewer parameters, a simpler structure, and higher computational efficiency, which makes it suitable for building larger networks [31,32]. GRU networks have been proven effective in some application scenarios. A GRU network is mainly composed of reset gate and update gate units. Its structural unit is shown in Figure 1.

Figure 1 shows that the inputs of the reset and update gates of the gating unit are the hidden state $H_{t-1}$ and the input $X_t$, and that the outputs are calculated by the Sigmoid activation function. Suppose that the number of hidden units is $h$, the mini-batch input at time step $t$ is $X_t \in \mathbb{R}^{n \times d}$ (the length of sample time is $n$, the number of features is $d$), and the hidden state of the previous time step is $H_{t-1} \in \mathbb{R}^{n \times h}$. The reset gate $R_t \in \mathbb{R}^{n \times h}$ and the update gate $Z_t \in \mathbb{R}^{n \times h}$ are calculated as follows:

$$R_t = \sigma_{gru}(X_t W_{xr} + H_{t-1} W_{hr} + b_r), \qquad Z_t = \sigma_{gru}(X_t W_{xz} + H_{t-1} W_{hz} + b_z) \tag{1}$$

where $W_{xr}, W_{xz} \in \mathbb{R}^{d \times h}$ and $W_{hr}, W_{hz} \in \mathbb{R}^{h \times h}$ are network weight parameters, and $b_r$ and $b_z$ are network bias parameters. $\sigma_{gru}$ is the Sigmoid activation function, whose main function is to map the values in both gates to the range 0-1. The candidate hidden state $H_t^* \in \mathbb{R}^{n \times h}$ is obtained by combining the output of the reset gate with the hidden state $H_{t-1}$ of the previous time step. When an element of the reset gate is close to 0, the corresponding element of the hidden state $H_{t-1}$ is discarded. When the element is close to 1, the hidden state $H_{t-1}$ is retained. The calculation is as follows:

$$H_t^* = \tanh\left(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h\right) \tag{2}$$

where $W_{xh} \in \mathbb{R}^{d \times h}$ and $W_{hh} \in \mathbb{R}^{h \times h}$ are the network weight parameters, $b_h$ is the network bias parameter, and $\odot$ denotes element-wise multiplication. The Tanh activation function maps all elements into the range [−1, 1].
In an LSTM network, the input and forget gates are complementary and have a certain redundancy. By contrast, a GRU network directly uses a single update gate to control the balance between input and forgetting. The hidden state $H_t \in \mathbb{R}^{n \times h}$ of the current time step is obtained by combining the hidden state $H_{t-1}$ of the previous time step and the candidate hidden state $H_t^*$ of the current time step through the update gate $Z_t$ of the current time step:

$$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot H_t^* \tag{3}$$
Equations (1)-(3) show that when $Z_t = 0$ and $R_t = 1$, a GRU network degenerates into a simple RNN network. When $Z_t = 0$ and $R_t = 0$, the current state $H_t$ depends only on the current input $X_t$ and not on the historical state $H_{t-1}$. When $Z_t = 1$, the current state $H_t$ is equal to the previous hidden state $H_{t-1}$.
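The gate computations and the limiting cases described above can be illustrated with a short NumPy sketch. It assumes the standard GRU formulation that matches the symbol definitions in this section (Sigmoid gates, a Tanh candidate state, and an update gate that interpolates between $H_{t-1}$ and $H_t^*$). The function names, the random initialisation, and the example input values are illustrative assumptions, not part of the original work.

```python
import numpy as np


def sigmoid(x):
    """Logistic function; maps each element into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def init_gru_params(d, h, seed=0):
    """Small random weights and zero biases (the initialisation is an assumption)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("r", "z", "h"):
        p["W_x" + gate] = 0.1 * rng.standard_normal((d, h))
        p["W_h" + gate] = 0.1 * rng.standard_normal((h, h))
        p["b_" + gate] = np.zeros(h)
    return p


def gru_step(X_t, H_prev, p):
    """One GRU time step for a batch X_t (n x d) and previous state H_prev (n x h)."""
    # Reset and update gates, Equation (1).
    R_t = sigmoid(X_t @ p["W_xr"] + H_prev @ p["W_hr"] + p["b_r"])
    Z_t = sigmoid(X_t @ p["W_xz"] + H_prev @ p["W_hz"] + p["b_z"])
    # Candidate hidden state, Equation (2): the reset gate scales H_prev element-wise.
    H_cand = np.tanh(X_t @ p["W_xh"] + (R_t * H_prev) @ p["W_hh"] + p["b_h"])
    # New hidden state, Equation (3): Z_t = 1 keeps H_prev, Z_t = 0 takes the candidate.
    H_t = Z_t * H_prev + (1.0 - Z_t) * H_cand
    return H_t, R_t, Z_t


# Example: n = 2 samples, d = 3 features (current, voltage, temperature), h = 4 hidden units.
params = init_gru_params(d=3, h=4)
X_t = np.array([[0.2, 0.8, 0.5], [0.1, 0.7, 0.5]])  # made-up normalized inputs
H_prev = np.zeros((2, 4))
H_t, R_t, Z_t = gru_step(X_t, H_prev, params)
print(H_t.shape)  # (2, 4)
```

Forcing the update-gate bias to a large positive value drives $Z_t$ towards 1, in which case the step returns $H_{t-1}$ essentially unchanged, which is the third limiting case described in the text.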

SOC Estimation Based on GRU-ATL Network
As shown in Figure 2, a GRU-RNN [8,26] network model consists of a sequence input layer, a GRU network layer, a fully connected layer with neurons, a fully connected layer without neurons, and a regression output layer. The input of the GRU network model is composed of the current, voltage, and temperature measurement signals, and the output is the battery SOC at the current time, namely $x_t = [I_t, V_t, T_t]$ and $y_t = SOC_t$. The nonlinear relationship between current, voltage, temperature, and SOC was established on the training set. The prediction accuracy of the model was verified on the test set. Inspired by reference [25], this paper proposes a GRU-RNN network model with activation function layers (GRU-ATL) for battery SOC estimation. As shown in Figure 2, this model adds a Tanh layer, a leaky ReLU layer, and a clipped ReLU layer to the GRU-RNN network model. The leaky ReLU layer performs a threshold operation: it multiplies any input value less than zero by a fixed scale factor [33]. Its expression is as follows:

$$f(x) = \begin{cases} x, & x \ge 0 \\ scale \cdot x, & x < 0 \end{cases}$$

where $scale$ is the factor applied when the input $x$ is negative. The clipped ReLU layer also performs a threshold operation: it sets any input value less than zero to zero, and any value above the clipping upper limit to the clipping upper limit [34]. The expression is as follows:

$$f(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < T_C \\ T_C, & x \ge T_C \end{cases}$$

where $T_C$ is the threshold of the clipped ReLU layer.
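The two threshold operations described above can be sketched in a few lines of NumPy. The default values `scale=0.01` and `ceiling=1.0` are illustrative assumptions; the text does not state the values used in the GRU-ATL model.

```python
import numpy as np


def leaky_relu(x, scale=0.01):
    """Pass non-negative inputs through; multiply negative inputs by `scale`."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, scale * x)


def clipped_relu(x, ceiling=1.0):
    """Set negative inputs to zero and cap inputs above `ceiling` (T_C) at `ceiling`."""
    x = np.asarray(x, dtype=float)
    return np.clip(x, 0.0, ceiling)


x = np.array([-2.0, 0.0, 0.5, 3.0])
print(leaky_relu(x, scale=0.1))      # negative entry is scaled, others unchanged
print(clipped_relu(x, ceiling=1.0))  # output stays within [0, 1]
```

Because SOC is a fraction between 0 and 1, a clipped ReLU with a ceiling of 1 would keep the network output inside the physically meaningful range; whether the original model uses exactly this ceiling is not stated here.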

Selection of Other Parameters in Network
In deep learning networks, the common optimizers include the adaptive moment estimation (Adam) optimizer, the stochastic gradient descent (SGD) optimizer, the adaptive gradient (Adagrad) optimizer, the root mean square prop (RMSProp) optimizer, and the batch gradient descent (BGD) optimizer. The Adam optimizer was proposed by Kingma and Ba, and it combines the advantages of momentum and RMSProp [35]. The principle and implementation of the algorithm are simple, and the memory requirement is low. The parameter updates are not affected by rescaling of the gradient. The hyperparameters are easy to interpret and usually need little or no tuning, and the step size is adjusted automatically. The Adam optimizer has achieved good results in practical applications, and it is suitable for problems with sparse or noisy gradients.
The update equation of the network parameter $\theta_{adam}$ is as follows:

$$\theta_{adam,t} = \theta_{adam,t-1} - \alpha_{adam} \frac{m_t^*}{\sqrt{v_t^*} + \varepsilon}$$

where $m_t^*$ is the bias-corrected first raw moment estimate, $v_t^*$ is the bias-corrected second raw moment estimate, $\alpha_{adam}$ is the learning rate, and $\varepsilon$ is a small constant.
The above formula shows that the Adam algorithm adapts the update from two aspects: the mean of the gradient and the mean of the squared gradient. To prevent division by zero, the constant $\varepsilon$ is set to $1 \times 10^{-8}$. When $\alpha_{adam}$ is too small, the network converges very slowly. When $\alpha_{adam}$ is too large, the loss function of the network oscillates and may even move away from the minimum. Therefore, the initial learning rate $\alpha_{adam}^{0}$ is set to 0.01, and $\alpha_{adam}$ is multiplied by a drop factor of 0.1 once every 250 iterations.
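As a rough illustration of the update rule and the learning-rate schedule described above, the following sketch implements one Adam step and a piecewise-constant schedule. The initial rate 0.01, the drop factor 0.1, the drop period of 250 iterations, and ε = 1e-8 come from the text; the decay rates `beta1=0.9` and `beta2=0.999` are the defaults suggested by Kingma and Ba and are assumed here, since the text does not state them. Counting iterations from zero is also an assumption.

```python
import numpy as np


def learning_rate(iteration, alpha0=0.01, drop_every=250, drop_factor=0.1):
    """Piecewise-constant schedule: multiply by `drop_factor` every `drop_every` iterations.

    `iteration` is counted from 0 (an assumption of this sketch).
    """
    return alpha0 * drop_factor ** (iteration // drop_every)


def adam_step(theta, grad, m, v, t, alpha, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `t` is the 1-based step count used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad           # first raw moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # second raw moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                 # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Toy usage: minimize (w - 1)^2 with the schedule above.
w = np.array([0.0])
m_w = np.zeros(1)
v_w = np.zeros(1)
for k in range(500):
    g = 2.0 * (w - 1.0)
    w, m_w, v_w = adam_step(w, g, m_w, v_w, t=k + 1, alpha=learning_rate(k))
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude. This is the scale invariance mentioned in the text.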
In a deep learning network, overfitting tends to occur when there are many parameters to set and the network model is too complex. In other words, the model performs well on the training set but poorly on the actual test set, so it does not generalize well. Therefore, an L2 regularization algorithm was selected to avoid overfitting. L2 regularization adds the sum of squares of the weight parameters to the original loss function [36]. The expression is as follows:

$$E_R(\theta_{adam}) = E_{L2}(\theta_{adam}) + C_{L2} W^{\top} W$$

where $E_R(\theta_{adam})$ denotes the regularized loss function, $E_{L2}(\theta_{adam})$ is the original loss function, $C_{L2}$ is the regularization coefficient ($C_{L2} = 0.001$), and $W$ is the weight vector.
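A minimal sketch of the regularized loss described above, assuming the penalty is the plain sum of squared weights as the text states (some frameworks include an extra factor of 1/2). The function name and the example numbers are illustrative.

```python
import numpy as np


def l2_regularized_loss(data_loss, weights, c_l2=0.001):
    """Add c_l2 times the sum of squared weights to the original loss.

    `weights` is a list of weight arrays; bias terms are left out of the penalty,
    which is a common convention and an assumption of this sketch.
    """
    penalty = sum(float(np.sum(np.square(w))) for w in weights)
    return data_loss + c_l2 * penalty


# Example with made-up numbers: sum of squares = 1 + 4 + 9 = 14.
loss = l2_regularized_loss(0.5, [np.array([1.0, 2.0]), np.array([[3.0]])])
print(loss)
```

With $C_{L2} = 0.001$ the penalty is small relative to the data loss unless the weights grow large, so it mainly discourages extreme weight values.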

Battery Data Description
To verify the feasibility of the deep learning network model proposed in this paper, we chose the open-source battery test data set of the McMaster Institute for Automotive Research and Technology at McMaster University as the research target [25]. This data set contains charging and discharging tests on LGHG2 batteries under multiple different working conditions, which effectively simulate the real driving environment of electric vehicles. Each battery test record contains measurement signals such as time, current, voltage, temperature, and capacity. The parameters of the LGHG2 battery are shown in Table 1. The four standard discharge test conditions are the Urban Dynamometer Driving Schedule (UDDS), the Highway Fuel Economy Driving Schedule (HWFET), the LA92 Dynamometer Driving Schedule (LA92), and the Supplemental Federal Test Procedure Driving Schedule (US06). The charging test condition is a fast constant current and constant voltage (CC-CV) charging mode. In other words, a constant current of 3 A is used for charging. When the voltage reaches the cut-off voltage of 4.2 V, the battery is charged in constant voltage mode until the current falls to the cut-off charging current of 0.05 A. In addition, the data set includes eight mixed dynamic test conditions, each of which is a random combination of the above four standard dynamic conditions. The test data at −10, 0, 10, 25, and 40 °C were used to verify the feasibility of the model. Figure 3 shows the test data of the LA92 dynamic test conditions. The figure shows a highly nonlinear relationship between the battery capacity and the three measurement signals of current, voltage, and temperature during the discharge process, and the discharge capacity at 25 °C is significantly greater than that at −10 and 0 °C. This finding shows that as the temperature decreases, the chemical reaction inside the battery slows down and the actual capacity gradually decreases. It is worth noting that some aging phenomena appear in the battery after several charge-discharge cycle tests. In the original data set, the dynamic working conditions were recorded every 0.1 s and the charging working condition was recorded every 60 s. To reduce calculation costs, the data recorded every 1 s in the dynamic test conditions were selected as the research object of this article.

Data Processing Process and Evaluation Indicators
As shown in Figure 3, the measured signals (current, voltage, and temperature) in the data set fluctuated greatly, which affected the results of data analysis. Normalizing the data not only speeds up the learning of the network but also eliminates the adverse effects caused by singular sample data. In this paper, the mapminmax function was used to normalize the three measurement signals to [0, 1]. The expression is as follows:

$$x_{normal} = \frac{x_{signal} - x_{min}}{x_{max} - x_{min}}$$

where $x_{min}$ and $x_{max}$ are the minimum and maximum values in the real measurement data, $x_{signal}$ is the real measurement data, and $x_{normal}$ is the normalized data.
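The min-max scaling to [0, 1] described above can be sketched as follows. The paper uses MATLAB's mapminmax; this NumPy function is only an equivalent illustration, and the optional `x_min` / `x_max` arguments are an assumption added so that statistics from the training set can be reused on a test set.

```python
import numpy as np


def minmax_normalize(x, x_min=None, x_max=None):
    """Scale a signal to [0, 1] using (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    if x_max == x_min:
        raise ValueError("x_max and x_min must differ to avoid division by zero")
    return (x - x_min) / (x_max - x_min)


voltage = np.array([2.8, 3.5, 4.2])  # made-up voltage samples in volts
print(minmax_normalize(voltage))
```

Each of the three measurement signals would be normalized separately, since current, voltage, and temperature have very different ranges and units.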
Energies 2020, 13, 6366
The output of the GRU-ATL network model was the battery SOC. The SOC values were mainly obtained through the time integration of the current. The calculation is as follows:

$$SOC(t) = SOC(t_0) + \frac{\eta}{C_N} \int_{t_0}^{t} I_\tau \, d\tau$$

where $SOC(t_0)$ is the initial SOC value; $SOC(t)$ is the SOC value at the current time $t$; $\eta$ is the coulombic efficiency, $\eta = 1$; $I_t$ is the current flowing through the battery at time $t$ (the sign in the expression assumes that the current is positive during charging and negative during discharging); and $C_N$ is the actual capacity value of each working condition. The data set was divided into a training set and five test sets. The GRU-ATL network model was trained on the training set, and the trained model was then verified by simulation on the test sets. It is worth noting that some conditions did not complete the charge-discharge experiment. For example, at 25 °C, the discharge time of HWFET was only 600 s. Therefore, this part of the data was not added to the training set or the test sets. All the simulation experiments were carried out in the CPU simulation environment of MATLAB, and the computer processor was an Intel Core i5-7400 CPU@3.00 GHz with 8 GB memory. In each case, the experiment was carried out three times, and the average of the three runs was taken as the final result. Finally, the mean absolute error (MAE) and root mean square error (RMSE) were introduced to evaluate the prediction performance of the GRU-ATL network model:

$$MAE_{soc} = \frac{1}{n} \sum_{i=1}^{n} \left| SOC_{Actual,i} - SOC_{Estimated,i} \right|$$

$$RMSE_{soc} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( SOC_{Actual,i} - SOC_{Estimated,i} \right)^2}$$

where $MAE_{soc}$ is the mean absolute error of SOC; $RMSE_{soc}$ is the root mean square error of SOC; $SOC_{Actual}$ is the actual SOC value; $SOC_{Estimated}$ is the predicted SOC value; and $n$ is the sample size.
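The current-integration target and the two error metrics described above can be sketched in NumPy as follows. The sketch assumes a fixed sampling interval, a rectangle-rule approximation of the integral, and a sign convention in which discharge current is negative. The function names and the example values are illustrative and are not taken from the data set.

```python
import numpy as np


def coulomb_counting_soc(current_a, dt_s, capacity_ah, soc0=1.0, eta=1.0):
    """SOC trajectory from sampled current (A), sample interval (s), and capacity (Ah).

    Assumes charge current is positive and discharge current is negative.
    """
    current_a = np.asarray(current_a, dtype=float)
    charge_ah = np.cumsum(current_a) * dt_s / 3600.0  # rectangle-rule integral in Ah
    return soc0 + eta * charge_ah / capacity_ah


def mae(actual, estimated):
    """Mean absolute error between actual and estimated SOC."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.mean(np.abs(actual - estimated)))


def rmse(actual, estimated):
    """Root mean square error between actual and estimated SOC."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((actual - estimated) ** 2)))


# Example: a constant 3 A discharge for one hour fully drains a 3 Ah cell.
soc = coulomb_counting_soc(np.full(3600, -3.0), dt_s=1.0, capacity_ah=3.0)
print(soc[-1])  # approximately 0
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two metrics are reported together.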

SOC Estimation Results of Four Network Models
In this experiment, three network models (FNN [25], LSTM, and GRU) were compared with the GRU-ATL network model. The training set was composed of eight mixed dynamic test conditions, CC-CV charging conditions, and four standard dynamic test conditions. The test sets were composed of CC-CV charging conditions and four standard dynamic test conditions. There was one training set and there were five test sets. The detailed division of the training set and test sets is shown in Table 2. A large number of simulation experiments showed that the number of iterations had a certain impact on the SOC prediction accuracy of the four models, but this is beyond the scope of this article. According to reference [25], the maximum number of iterations for the FNN network model was 5500. To save computation while maintaining accuracy, the maximum number of iterations for the other three network models was set to 2000. The other parameters in the four network models were the same, and the number of neurons in each layer was 55.
The results show that when the current, voltage, and temperature at time t were used to predict the SOC value at time t, the FNN network model had the worst prediction result, and the other three network models could effectively track the real SOC curve. Although the FNN network model needed the shortest time in the whole prediction process, the information in its network structure propagated in one direction only, with no feedback of information. As a result, the learned battery model could not reflect the relationship between the measurement signals and the SOC very well, and the SOC error of the FNN network model was very large. From the SOC error curve of each working condition, we could see that the SOC error of the LSTM network model was larger than those of the GRU and GRU-ATL network models. The reason for this finding is that, by the structure of the network equations, the number of parameters of an LSTM is four times that of a simple RNN [37]. With too many parameters, overfitting can occur. By contrast, a GRU has only two gates, and its number of parameters is three times that of a simple RNN [38]. It could therefore reduce overfitting, improve the prediction accuracy of the model, and save considerable training time. The results in Table 3 show that the estimated SOC value of the GRU-ATL network model was more accurate than that of the GRU network model under most dynamic conditions. Under the UDDS condition at 0 °C, the MAE and RMSE of the GRU network model were 1.2% and 2.3%, respectively. In the UDDS and LA92 dynamic conditions, the GRU-ATL network model achieved good estimation results at different temperatures. Its MAE was less than 0.9%, and its RMSE was less than 1.2%. Owing to the large discharge current in US06 conditions, the chemical reaction inside the battery was severe, resulting in a larger SOC error than in the previous two conditions. However, its MAE was still within 1.5%, and its RMSE was within 1.9%. The SOC prediction results of the GRU model were more accurate than those of the GRU-ATL network model in the LA92 condition at 10 °C and the US06 condition at 0 °C. This was mainly because the difference in SOC prediction accuracy between the GRU and GRU-ATL network models was small, so the GRU-ATL predictions were inevitably less accurate than those of the GRU model in some time periods. However, the SOC prediction results of GRU-ATL were more accurate than those of GRU over the whole testing set at each temperature. In CC-CV conditions, the SOC prediction results of the GRU-ATL model were more accurate than those of the other three models, but the SOC errors of all four network models were relatively large.
The main reason is that the data acquisition unit was 60 s in CC-CV conditions, which did not provide enough data for network training. Through the analysis of the above results, the GRU-ATL network model had the highest accuracy and most stable SOC estimation among the four network models. This shows that the Tanh activation function, leaky ReLU activation function, and clipped ReLU activation function are helpful to improve the prediction accuracy and stability of GRU network model.
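The parameter-count argument made in this section (LSTM has four times, and GRU three times, the parameters of a simple RNN) can be checked with a quick count. The formulas below follow the textbook formulation with one input weight matrix, one recurrent weight matrix, and one bias vector per gate or candidate; some libraries use two bias vectors per gate, which changes the totals slightly. The function names are illustrative.

```python
def rnn_params(d, h):
    """Simple RNN layer: W_x (d x h), W_h (h x h), and one bias vector (h)."""
    return h * (d + h + 1)


def gru_params(d, h):
    """GRU layer: reset gate, update gate, and candidate state."""
    return 3 * rnn_params(d, h)


def lstm_params(d, h):
    """LSTM layer: input, forget, and output gates plus the candidate cell state."""
    return 4 * rnn_params(d, h)


# d = 3 inputs (current, voltage, temperature) and h = 55 hidden units, as in this experiment.
print(rnn_params(3, 55), gru_params(3, 55), lstm_params(3, 55))  # 3245 9735 12980
```

For the same hidden size, the recurrent layer of the GRU therefore has 25% fewer parameters than that of the LSTM under this counting convention.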

SOC Estimation Results under Unknown Conditions
In the process of driving, vehicles encounter a variety of complex working conditions. Therefore, to verify the prediction performance of the GRU-ATL network under unknown conditions, the training set used in this part was composed of eight mixed dynamic test conditions and CC-CV charging conditions. The testing sets consisted of CC-CV charging conditions and four standard dynamic test conditions. In other words, the testing sets were not included in the training set. In addition, this part mainly discusses the influence of the number of neurons in the two network layers on the prediction results. The number of neurons in the two network layers had a certain influence on the final prediction results. However, no systematic method for selecting the number of neurons has been developed. Swarm intelligence optimization algorithms or evolutionary algorithms can find the best number of neurons when the amount of data is small, and thereby achieve the best prediction performance of the model. However, these algorithms require large amounts of computation when the amount of data is large; thus, they are not suitable for this paper. We therefore used three cases as examples to analyze the influence of the number of neurons on the accuracy of the model. In addition to the two evaluation indexes introduced in Section 3.2, the training time was also used as an evaluation index of the model. In Case 1, the number of neurons in the GRU layer ($N_{GRU}$) was 55, and the number of neurons in the FC layer ($N_{FC}$) increased according to the values in Table 4. In Case 2, $N_{FC}$ in the FC layer was 55, and $N_{GRU}$ in the GRU layer increased according to the values in Table 4. In Case 3, both $N_{GRU}$ and $N_{FC}$ increased according to the values in Table 4.
At 25 °C, the indexes for the three cases are shown in Figure 6, and the statistical results of each index are listed in Table 4. Figure 6a,b shows that the SOC estimation accuracy improved to a certain degree as N_GRU and N_FC increased. When the number of neurons increased beyond a certain value, the estimation accuracy tended to decline. However, no mathematical relationship was found between the number of neurons and the accuracy of the model. Table 4 shows that in cases 2 and 3, as N_GRU increased, SOC estimation in some cases did not achieve ideal results. The reason was that too many neurons caused overfitting, eventually leading to the failure of the GRU-ATL network model in SOC prediction. As can be seen from Figure 6c, in cases 2 and 3, the training time doubled as N_GRU increased. By contrast, the training time did not increase significantly as N_FC increased.
A large number of simulation experiments showed that when N_GRU was 55 and N_FC was 160, the proposed network model achieved higher SOC estimation accuracy and saved considerable training time. Figure 7 shows the SOC estimation results at 25 °C. According to the enlarged views of the three dynamic conditions in Figure 7c-e, the predicted SOC curve could effectively track the real SOC curve. Table 5 shows the statistical results of SOC error at five temperatures. At 0, 10, 25, and 40 °C, the MAE of SOC error was within 1.1%, and the RMSE was within 1.5%. At −10 °C, the SOC error was relatively high: the MAE was 1.271%, and the RMSE was 2.005%. The main reason is that the amount of test data at −10 °C was small, and the chemical reaction of the battery at low temperature also affected the quality of the data set. From the analysis of the above experimental results, the GRU-ATL network model could obtain accurate SOC estimation results under unknown conditions.
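The MAE and RMSE values quoted above can be computed as in the following minimal sketch. It assumes the standard definitions of the two indexes (the paper's own definitions are in Section 3.2) and an SOC expressed as a fraction between 0 and 1. The sample values are made up for illustration and are not taken from the experiments.

```python
import math
from typing import Sequence


def mae(true_soc: Sequence[float], pred_soc: Sequence[float]) -> float:
    """Mean absolute error between reference and predicted SOC."""
    return sum(abs(t - p) for t, p in zip(true_soc, pred_soc)) / len(true_soc)


def rmse(true_soc: Sequence[float], pred_soc: Sequence[float]) -> float:
    """Root mean square error between reference and predicted SOC."""
    squared = sum((t - p) ** 2 for t, p in zip(true_soc, pred_soc))
    return math.sqrt(squared / len(true_soc))


# Illustrative values only (SOC as a fraction of full charge).
true_soc = [1.00, 0.90, 0.80, 0.70]
pred_soc = [0.99, 0.92, 0.80, 0.68]

print(f"MAE  = {100 * mae(true_soc, pred_soc):.3f}%")
print(f"RMSE = {100 * rmse(true_soc, pred_soc):.3f}%")
```

Because RMSE squares each error before averaging, it weights large deviations more heavily than MAE, which is why the RMSE values in Table 5 are consistently the larger of the two.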

SOC Estimation Results with Noise
In the working process of a battery management system, owing to the complex and changeable external environment and the poor accuracy of the signal acquisition sensor, the collected current, voltage, and temperature signals usually contain certain measurement errors. Gaussian and non-Gaussian noises were added to the measured signals to test the prediction performance of the GRU-ATL network model.
As shown in Figure 8a, Gaussian distribution noise (Noise 1) is white Gaussian noise with a mean value of 0 and a standard deviation of 0.02. Its calculation formula is as follows:

$$\mathrm{Noise}_1 = \alpha_{\mathrm{Noise1}} \cdot \mathrm{randn}(1, n) \qquad (12)$$

where α_Noise1 is the coefficient of standard deviation, α_Noise1 = 0.02, randn(1, n) is a 1 × n vector of standard normal samples, and n is the length of prediction data.

As shown in Figure 8b, Noise 2 is uniformly distributed random noise in the range [0, 0.01]. Noise 3 is non-Gaussian noise, which is the sum of Noise 1 and Noise 2. The expression is as follows:

$$\mathrm{Noise}_3 = \alpha_{\mathrm{Noise1}} \cdot \mathrm{randn}(1, n) + \alpha_{\mathrm{Noise2}} \cdot \mathrm{rand}(1, n) \qquad (13)$$

where α_Noise2 is the coefficient of uniformly distributed noise, α_Noise2 = 0.01, and rand(1, n) is a 1 × n vector of samples uniformly distributed on [0, 1].
To further study the influence of measurement signal noise on the prediction accuracy of the model, the current with Noise 1 and Noise 3 was labeled as case 1, the voltage with Noise 1 and Noise 3 was labeled as case 2, the temperature with Noise 1 and Noise 3 was labeled as case 3, and all three measurement signals with Noise 1 and Noise 3 were labeled as case 4. The Gaussian and non-Gaussian noise added to the temperature signal was 25 times Noise 1 and Noise 3, respectively. For example, when Noise 1 was 0.05, the Gaussian noise of current was 0.05 A, the Gaussian noise of voltage was 0.05 V, and the Gaussian noise of temperature was 1.25 °C. The training set used in the experiment was the same as that in Section 4.2. The testing sets were the data sets with different noise added to the measured signals at each temperature. That is, there was one training set and 40 testing sets (five temperatures, four cases, and two noise types). In the GRU-ATL network model, N_GRU was set to 55, N_FC was set to 160, and the number of iterations was set to 2000. Other parameters were consistent with those in Section 2.3. The LSTM network model and the GRU model were also introduced for comparison.
Figure 9 shows the SOC prediction results of the LSTM model with Gaussian noise at 10 °C. Figure 10 shows the SOC prediction results of the GRU-ATL model with Gaussian noise at 10 °C. Figure 11 shows the SOC prediction results of a simple GRU model with non-Gaussian noise at 40 °C. Figure 12 shows the SOC prediction results of the GRU-ATL model with non-Gaussian noise at 40 °C. As can be seen from Figures 9-12, the SOC prediction curves of the three network models tracked the actual SOC curve well in all four cases. Table 6 shows the statistical results of SOC error under each condition for the three models. According to the statistical results, the SOC prediction accuracy of the GRU-ATL model was 0.1–0.4% higher than that of the GRU model and 0.3–0.7% higher than that of the LSTM model. The MAE of the SOC predicted by the GRU-ATL model was stable in the range of 0.7–1.4%, and the RMSE was stable between 1.2% and 1.9%. In some cases at −10 °C, the MAE and RMSE of SOC predicted by the GRU model were 1.7% and 2.5%, respectively, and the MAE and RMSE of SOC predicted by the LSTM model were 1.8% and 2.7%, respectively. This shows that the SOC prediction results of the GRU-ATL model were more stable and accurate at low temperature. Compared with the LSTM and GRU network models, the GRU-ATL network model had better prediction accuracy and stronger robustness in unknown conditions with noise.
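The noise construction and the four test cases described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses Python's standard `random` module in place of the `randn`/`rand` notation, the helper names (`make_noise`, `add_noise`, `apply_case`) are hypothetical, and in case 4 the same noise vector is applied to all three signals, as the 0.05 A / 0.05 V / 1.25 °C example suggests.

```python
import random
from typing import List, Tuple

ALPHA_NOISE1 = 0.02  # standard deviation of the Gaussian noise (from the text)
ALPHA_NOISE2 = 0.01  # upper bound of the uniform noise (from the text)
TEMP_SCALE = 25.0    # temperature noise is 25 times larger (from the text)


def make_noise(n: int, rng: random.Random) -> Tuple[List[float], List[float], List[float]]:
    """Generate Noise 1 (Gaussian), Noise 2 (uniform), and Noise 3 (their sum)."""
    noise1 = [ALPHA_NOISE1 * rng.gauss(0.0, 1.0) for _ in range(n)]
    noise2 = [ALPHA_NOISE2 * rng.random() for _ in range(n)]  # values in [0, 0.01)
    noise3 = [a + b for a, b in zip(noise1, noise2)]
    return noise1, noise2, noise3


def add_noise(signal: List[float], noise: List[float], scale: float = 1.0) -> List[float]:
    """Add scaled noise to a signal, sample by sample."""
    return [s + scale * e for s, e in zip(signal, noise)]


def apply_case(case: int, current: List[float], voltage: List[float],
               temperature: List[float], noise: List[float]):
    """Case 1: current only; 2: voltage only; 3: temperature only; 4: all three."""
    if case in (1, 4):
        current = add_noise(current, noise)
    if case in (2, 4):
        voltage = add_noise(voltage, noise)
    if case in (3, 4):
        temperature = add_noise(temperature, noise, TEMP_SCALE)
    return current, voltage, temperature


# Usage with made-up constant signals (illustrative values only).
rng = random.Random(0)
n = 5
noise1, noise2, noise3 = make_noise(n, rng)
cur, vol, tmp = apply_case(4, [1.0] * n, [3.7] * n, [25.0] * n, noise3)
print(cur, vol, tmp, sep="\n")
```

Running `apply_case` for cases 1 to 4 with both Noise 1 and Noise 3 at each of the five temperatures gives the 40 testing sets mentioned above.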

Conclusions
This paper proposed a gated recurrent unit recurrent neural network (GRU-RNN) model with activation function layers (GRU-ATL) for battery SOC estimation. The model adds a Tanh layer, a leaky rectified linear unit (leaky ReLU) layer, and a clipped rectified linear unit (clipped ReLU) layer to the GRU-RNN network model. The relationship between current, voltage, temperature, and SOC was established from the training set. Then online SOC estimation was carried out on different testing sets using the trained model. The main contributions of this paper are as follows:
(1) Compared with an LSTM network, a GRU network has fewer parameters and a simpler structure. It can save considerable training and prediction time while preserving the prediction accuracy of the model. Compared with the FNN, LSTM, and GRU network models, the proposed model obtained more accurate and stable SOC estimation results under different operating conditions. Adding the above three activation function layers to the GRU-RNN network improved the prediction accuracy and robustness of the model;
(2) The prediction accuracy of the model could be improved by appropriately increasing the number of neurons in the GRU layer and FC layer. However, an excessive number of neurons in the two layers caused overfitting, which reduced the SOC estimation accuracy. To save computation, the number of neurons in the GRU layer was set to roughly one-third to one-half of that in the FC layer. A large number of experiments showed that the SOC estimate was more accurate when the number of neurons in the GRU layer was 55 and that of the FC layer was 160. The MAE was less than 1.3% and the RMSE was about 2% or less;
(3) The SOC prediction performance of the LSTM, GRU, and GRU-ATL network models was compared when the measurement signals contained Gaussian noise and non-Gaussian noise. The experimental results showed that the SOC prediction accuracy of the GRU-ATL model was 0.1–0.4% higher than that of the GRU model and 0.3–0.7% higher than that of the LSTM model. The MAE of SOC predicted by the GRU-ATL model was stable in the range of 0.7–1.4%, and the RMSE was stable between 1.2% and 1.9%. The SOC prediction results of the GRU-ATL network model remained more accurate and stable at low temperature;
(4) The experiments in this paper showed that the GRU-ATL network model could realize online SOC estimation under different working conditions without relying on an accurate battery model. When the measurement data contained noise, the model still had high prediction accuracy and robustness, which could meet the requirements of SOC estimation in complex vehicle working conditions.
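Point (1) notes that a GRU has fewer parameters than an LSTM. As a rough illustration, the sketch below counts the trainable parameters of a single recurrent layer under the textbook formulation, where each gate-like block has an input weight matrix, a recurrent weight matrix, and one bias vector. The input size of 3 (current, voltage, temperature) and the hidden size of 55 come from the text; using the same hidden size for the LSTM is an assumption made only for comparison, and some frameworks add a second bias vector per block, which changes the totals slightly.

```python
def recurrent_params(input_size: int, hidden_size: int, n_blocks: int) -> int:
    """Parameter count of one recurrent layer with n_blocks gate-like blocks.

    Each block holds an input matrix (hidden x input), a recurrent matrix
    (hidden x hidden), and a single bias vector (hidden).
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block


def gru_params(input_size: int, hidden_size: int) -> int:
    # GRU: update gate, reset gate, candidate state -> 3 blocks.
    return recurrent_params(input_size, hidden_size, 3)


def lstm_params(input_size: int, hidden_size: int) -> int:
    # LSTM: input, forget, and output gates plus cell candidate -> 4 blocks.
    return recurrent_params(input_size, hidden_size, 4)


gru = gru_params(3, 55)
lstm = lstm_params(3, 55)
print(f"GRU: {gru}, LSTM: {lstm}, ratio: {gru / lstm:.2f}")
```

Under these assumptions the GRU layer has 9735 parameters against 12980 for the LSTM layer, a ratio of 0.75, which is consistent with the shorter training and prediction time claimed for the GRU structure.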
Future work will mainly use the model proposed in this paper to verify SOC estimation on other battery data sets. The SOC value used in the experiments was calculated according to the actual capacity. However, the actual capacity is difficult to obtain because of battery aging. Therefore, how to use deep learning networks to predict the actual capacity is also an important task for the future.
Author Contributions: Conceptualization, W.D. and F.X.; methodology, W.D. and C.S.; software, S.P. and S.S.; validation, S.P., S.S. and Y.S.; formal analysis, S.P. and Y.S.; writing-original draft preparation, W.D. and F.X.; writing-review and editing, W.D. and C.S.; project administration, C.S. All authors have read and agreed to the published version of the manuscript.