SOH Estimation of Lithium-Ion Battery Pack Based on Integrated State Information from Cells

With the development of new energy resources, lithium-ion batteries have come into wide use in various fields. Because battery health strongly influences system performance, much attention has been paid to the accurate estimation and prediction of the health status of lithium-ion batteries. In a battery pack, the connection structure causes complex interactions between cells, and between the cells and the pack. The degradation of any cell is therefore partly the result of the deterioration of the cells connected to it, and rapid degradation of an individual cell can accelerate the degradation of others beyond expectation, which is one of the primary reasons why the State of Health (SOH) and life cannot be calculated precisely. To solve this problem, a novel method based on integrated state information from cells is proposed to estimate the status of packs, taking into account the degradation effect that cells contribute to the corresponding pack. In this method, the interactive relationship is described in the form of a neural network in order to mine the effect of the inter-degradation between cells. The novel method was shown to perform better than a method based only on the degradation indicators from battery packs.


Introduction
As an eco-friendly energy source, lithium-ion batteries have been widely used in electronic products, electric vehicles, and various types of aircraft, with advantages such as stability, high energy density, long lifetime, and environmental friendliness. Because battery packs are part of the power supply, an unmonitored decline in their performance can cause accidents in practical use. Therefore, the accurate assessment of battery health has become a pressing problem.
The voltage of a single lithium-ion cell is relatively low, so cells must be used in combination. However, most studies on state of health (SOH) assessment and prediction of lithium-ion batteries focus on modeling and methods for objects that can be monitored directly. That is, assessment and prediction rely on direct indicators and ignore indirect factors, as in status estimation of cells [1,2], development of fault diagnosis tools [3], and remaining useful life (RUL) prediction [4]. For example, recent analyses of packs consider only the indicators captured from the packs, such as total current and total voltage. The problem is that, owing to the electrical connections, interaction between cells is widespread [5], which means that the irregular degradation of one cell will cause other cells to deteriorate rapidly [6]. Therefore, the overall performance of a battery pack depends on the comprehensive state of all cells in the structure [7,8]. On this premise, the degradation of a battery pack is essentially the result of the interaction of a large number of cells. Each characteristic of the cells or other items in the pack is meaningful for the assessment and prediction of the pack, and recent studies treat this as the dependency between cells and packs.
The typical dependency analysis methods in wide use at present fall into two classes: the statistical approach represented by the copula [9] and the probability graph model (PGM) [10] represented by the Bayesian network. However, both classes have evident disadvantages:
• The algorithmic complexity of a copula is relatively high [11], especially for high-dimensional problems such as the analysis of battery packs with a large number of cells [12]. Similarly, the linear or nonlinear correlation functions used to quantify the interaction cannot be extended easily either [13].
• For PGMs, the Bayesian network and the Markov process take advantage of graph structures [14]. In addition, owing to the complexity of the electrochemical properties, there are many obstacles to establishing chemical reaction models for battery packs.
Hence, a method that can effectively integrate the state information of cells for pack evaluation is needed. We therefore aim to express the health state of the whole pack using dependency information from the cells, for more practical application. Compared with the traditional dependency analysis methods mentioned above, data-driven methods have shown clear merits in engineering applications in recent years [15]. Modified algorithms based on Kalman filters [16], particle filters, support vector regression [17], neural networks [18], and fuzzy logic have also been applied to the SOH analysis of battery cells. Among these methods, neural networks have developed the most rapidly in recent years. For example, Mohsen et al. established a novel prediction model of lithium-ion battery cycle life using a feed-forward neural network [19]. Farzaneh et al. used a neural network to achieve short-term power prediction of a battery [20].
The most critical step in expressing the interactive degradation within battery packs is making use of the multi-degradation information for the system description, so high-dimensional data processing is the most significant problem. In this area, deep learning shows strong advantages [21]. Time series analysis models, such as the recurrent neural network (RNN), are well suited to modeling time-dependent relations [22]. The historical data memory and relationship analysis ability of an RNN can be utilized to screen and retain informative features of the degradation process from long-term data. With the help of deep learning, this type of network is expected to achieve a model description of the multi-degradation process and an accurate SOH prediction [23]. At present, neural networks are increasingly used in the SOH prediction and estimation of batteries.
Taking advantage of deep learning, in this research we adopted a data-driven approach and propose a multi-cell degradation information fusion method for battery pack SOH prediction, making full use of the inter-degradation information of the cells. The proposed method includes three critical improvements:

1. Unlike existing studies, the focus is the status of the battery pack rather than the SOH of individual cells, so as to obtain a global assessment of the power system.

2. Multi-dimensional SOH data of cells from different levels of the structure are used as input to assess battery packs, instead of pack indicators only, which provides information on the internal dependency.

3. To improve prediction performance, not only SOH information but also detailed temperature data of the battery cells are captured to describe the status of the cells.
To verify the performance and feasibility of the novel method, experiments were carried out on battery packs. In the experiments, the electrical indicators of cells and packs were acquired as input for a model built on the long short-term memory (LSTM) structure, and they were used to determine the interaction between the cells for SOH prediction.
The main content of this paper is arranged as follows: Section 2 briefly introduces the basic theories involved in this paper and, building on them, explains the main ideas of the proposed method. As described in Section 3, cycle experiments on battery packs were designed to acquire the SOH data, and the data for typical battery packs were preliminarily analyzed for subsequent research and validation of the method. In Section 4, the dataset collected in the experiments described in Section 3 is used to verify the feasibility and advantages of the novel method. Section 5 summarizes the entire paper.

Indicators of Battery
To describe the deterioration process, the electrical parameters measured during charge and discharge are generally used. For cells, numerous indicators are used to monitor battery status, such as the discharge current, cut-off voltage, and open-circuit voltage. The most common assessment indicator is capacity, which is measured by calculating the electric quantity, usually estimated with the ampere-hour method. According to the ampere-hour method, the capacity is calculated as $Q = \int \eta I \, \mathrm{d}t$, where $Q$ represents the current capacity, $I$ is the discharge current, and $\eta$ is the discharge coefficient.
Three indicators are commonly used: SOH, state of charge (SOC), and depth of discharge (DOD). SOC and DOD are used to assess the ability of the battery at time t within one discharge cycle. DOD is the electric quantity released in a given cycle, expressed as a percentage of the total (rated) capacity $Q_{rated}$ of the battery. SOC is the complement of DOD [24]. When the deterioration of the battery is not considered, $Q_{rated}$ is constant. However, in most cases, the rated capacity of a battery decreases over charge and discharge cycles. SOH is used to assess the extent of this reduction of rated capacity. The unit of SOH is percent, and 100% means a fresh battery [25]. Here, the failure threshold of a battery is defined as 80% of the initial capacity. If $Q_0$ is the initial capacity, the SOH of a battery is defined as the ratio of the present rated capacity to $Q_0$, expressed as a percentage. In our research, SOH is used to present the evolution over different cycles in order to assess the health status of cells and battery packs.
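The indicator definitions above can be illustrated with a minimal sketch. It assumes a uniformly sampled discharge current, a discrete sum in place of the integral, and SOH taken as present rated capacity over initial capacity; the function names and the sample values are hypothetical and are not taken from the experiments.

```python
def ampere_hour_capacity(currents_a, dt_s, eta=1.0):
    """Discrete ampere-hour method: Q ~ sum(eta * I * dt), returned in Ah."""
    return sum(eta * i * dt_s for i in currents_a) / 3600.0


def depth_of_discharge(q_released_ah, q_rated_ah):
    """Released charge as a percentage of the rated capacity."""
    return 100.0 * q_released_ah / q_rated_ah


def state_of_charge(q_released_ah, q_rated_ah):
    """SOC taken as the complement of DOD, in percent."""
    return 100.0 - depth_of_discharge(q_released_ah, q_rated_ah)


def state_of_health(q_rated_ah, q_initial_ah):
    """Present rated capacity relative to the initial capacity, in percent."""
    return 100.0 * q_rated_ah / q_initial_ah


FAILURE_THRESHOLD = 80.0  # percent of initial capacity, as stated in the text


def is_failed(soh_percent):
    # Assumption: a battery strictly below the threshold counts as failed.
    return soh_percent < FAILURE_THRESHOLD


# Hypothetical example: constant 2.2 A discharge sampled every second for 30 min.
released = ampere_hour_capacity([2.2] * 1800, dt_s=1.0)
dod = depth_of_discharge(released, q_rated_ah=2.2)
soh = state_of_health(q_rated_ah=1.98, q_initial_ah=2.2)
```

In practice the sampling interval and the discharge coefficient come from the measurement setup, so both are left as parameters here.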

Neural Network Structure
Deep learning is an important branch of machine learning that is built on neural network models. As discussed in the previous part of this paper, compared with statistical theory, probability graph models, and other means, the advantage of machine learning is the ability to analyze large amounts of original data and to extract characteristics [26]. The implementation process of deep learning is illustrated in Figure 1.
Time series analysis is a typical task for deep learning, and many researchers are currently studying algorithms to analyze the characteristics of a time series. LSTM is a model structure with many applications. It was first proposed by Sepp Hochreiter and Jürgen Schmidhuber, and its basic framework was developed from the RNN model for time series processing. A time series usually contains some long-term memory information. However, in practical applications of the RNN model, it is difficult to handle long-term dependence because of vanishing and exploding gradients in the algorithm. Owing to the loss of long-term information, the analysis performance on a time series is very limited. LSTM is an extended RNN model proposed in this context to solve problems involving long calculation times and the forgetting of long-term information [27]. LSTM is a gated RNN sequence model, and its principle is shown in Figure 2. With this structure, a neural network can remove or add information in memory by means of gates in order to learn dependency information better. In the method proposed below, LSTM is used as the main structure of the integration model.
In the training of neural networks, insufficient data leads to overfitting, in which a model memorizes the training data. The performance on the training dataset is then excellent, while the performance on the validation dataset is extremely poor. To solve overfitting problems, the dropout method was proposed [28]. The core idea of dropout is to train an ensemble of thinner subnetworks, each formed by removing non-output units from the primary network (see Figure 3), which share the same training parameters and thus reduce the computational burden. In this research, dropout was adopted to avoid rote memorization of the training dataset.
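The gate mechanism of LSTM mentioned above can be sketched for a single scalar unit as follows. This is the standard textbook form of the LSTM cell, given only for illustration; the parameter layout, gate names, and numerical values are assumptions and do not describe the network trained in this paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps a gate name to (w_x, w_h, b), the input weight,
    recurrent weight, and bias of that gate (hypothetical layout).
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much old memory to keep
    i = sigmoid(pre("input"))        # how much new information to write
    o = sigmoid(pre("output"))       # how much memory to expose
    g = math.tanh(pre("candidate"))  # candidate memory content
    c = f * c_prev + i * g           # updated cell (long-term) state
    h = o * math.tanh(c)             # updated hidden (short-term) state
    return h, c


# Hypothetical parameters and a short, made-up normalized SOH sequence.
demo_params = {name: (0.5, 0.5, 0.0)
               for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for soh_value in [1.00, 0.98, 0.95]:
    h, c = lstm_step(soh_value, h, c, demo_params)
```

The forget gate is what allows the cell state to carry information across many steps: when it stays close to one and the input gate close to zero, the memory is passed on almost unchanged.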

Deterioration Information Integration for SOH Prediction
For deterioration information integration, data collection, processing, prediction, and validation are all significant steps. Therefore, before the specific prediction method is given, the whole technical process of deterioration information integration for SOH prediction is introduced, as shown in Figure 4. First, a reasonable acquisition method is designed to collect running data. The monitored objects depend on the state data used for prediction and assessment, and include cells and packs. In general, voltage, current, running time, sampling frequency, and temperature are all necessary parameters for SOH analysis. After data collection, the original data need to be processed for better performance in the subsequent analysis. Second, the proposed method is used to predict the SOH of packs based on the integrated states and ambient indicators of cells, which is realized by a deep learning model (LSTM) and corresponding algorithms, such as regularization and optimization algorithms. At the same time, the deep learning model is also used to predict based only on the states and ambient indicators of the packs, for comparison. Finally, the predictions based on state indicators from cells and from packs are compared to validate the effectiveness of information integration. The theoretical method is discussed first in order to demonstrate the core ideas; the experiment and validation in later sections follow this process.
In this research, the basic framework of LSTM was adopted to build the SOH prediction model, so that the multi-cell degradation information and its interaction can be used to predict the overall SOH of the battery packs. The flow chart of the proposed method, which follows the basic structure of the deep learning method, is shown in Figure 5. Referring to the framework in Figure 5, the specific process can be explained as follows:

Appl. Sci. 2020, 10, x 6 of 19
Figure 5. Flow chart of the proposed method.

Input Data Preparation
The critical point of overall SOH prediction is to obtain the degradation trend of cells under interaction. This input information can be learned from the earlier degradation process of the battery cells. If $S_{ori}$ represents the training dataset collected in the experiment, the normalized dataset can be expressed as $S$. For clarity, the mathematical expressions in this paper are given for the case in which the health status indicators include only the SOH. The health status data of the cells is expressed as $S_c = (X_1, X_2, \cdots, X_n, \cdots, X_N)^T$, $n \in \{1, 2, \ldots, N\}$, where $N$ is the number of cells in the battery pack. The health status data of the battery pack is denoted $S_P$, where $M$ represents the indicator types of the battery pack; in simplified form, $S_P = [Y]$. The health status data $X_n$ of each battery cell is a high-dimensional time series composed of the series $X_{p,n}$, where $p \in P$ represents the types (dimensions) of the health status indicators and $X_{p,n} = (x^n_{p,1}, x^n_{p,2}, \ldots, x^n_{p,t}, \ldots, x^n_{p,T})$ is the captured data for the $p$th indicator. Here $t \in \{1, 2, \ldots, T\}$ denotes the time moments and $T$ is the collection time. This defines the structure of the input and output data of the model. However, because time series prediction generally performs poorly, a large error may be introduced at the input stage. The time label and the ambient temperature were therefore used as auxiliary input data to help estimate the SOH.

• Time label: In the absence of external impact, the degradation process of battery cells is consistent. The main deterioration of battery cells can be determined by taking the number of cycles as an estimation benchmark. Here, the time label input is expressed as $L = (l_1, l_2, \ldots, l_T)$, where $T$ is again the collection time of the time series.
• Ambient temperature: Even a slight fluctuation of the ambient temperature influences the charge and discharge performance of battery cells, so taking the ambient temperature into account improves the prediction accuracy to a certain extent. The environmental temperature input is likewise a time series over the same $T$ time moments.
Based on the above description, the input of the cell-based prediction model combines the cell health status series with the time label and the ambient temperature, while the pack-based prediction model takes the pack indicators with the same auxiliary inputs. The input mode of the training data for the pack model is shown in Figure 6.
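The assembly of the cell-based input described above can be sketched as follows. The min-max scaling and the helper names `min_max` and `build_input` are assumptions made for illustration; the text states only that the data are normalized, not which scaling is used.

```python
def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def build_input(cell_soh, temperature):
    """Stack N cell SOH series, a time label, and a temperature series.

    cell_soh: list of N series, each of length T (one per cell)
    temperature: ambient temperature series of length T
    Returns T rows with N + 2 normalized features each.
    """
    T = len(temperature)
    if any(len(series) != T for series in cell_soh):
        raise ValueError("all series must share the same length T")
    time_label = min_max(list(range(1, T + 1)))  # cycle index l_1..l_T
    columns = [min_max(s) for s in cell_soh]
    columns += [time_label, min_max(temperature)]
    return [list(row) for row in zip(*columns)]


# Hypothetical two-cell pack over three cycles: four features per time step.
rows = build_input(
    cell_soh=[[100.0, 95.0, 90.0], [100.0, 96.0, 88.0]],
    temperature=[25.0, 26.0, 24.0],
)
```

With two cells this yields four features per time step, and with four cells it yields six.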

Network Model Building
The determination of a network structure depends on the form and scale of the dataset and on the optimization result. The network structure proposed in this paper is based on LSTM and the dropout method. This structure is shown in Figure 7.

• Input layer: The degradation time series of the battery cells is taken as the model input, whose dimension depends on the number of cells in the battery pack and on the auxiliary indicator parameters (time label, ambient temperature, etc.). In this research, there is only one health status type, so p = 1. Given the battery pack structures in the experiment, N = 2 for the series and parallel packs, and the input data were set as a matrix of four vectors (the two cell series, the time label, and the ambient temperature). For the series-parallel pack, with N = 4, the input is a matrix of six vectors. Considering the difficulty and accuracy of measuring auxiliary indicators, the time and ambient temperature, which are easy to monitor, were selected to reduce the accumulated errors caused by degradation fluctuation.

• Hidden layer: The internal structure of the model is composed of several LSTM layers, and the number of layers and neurons depends on the data dimension and scale. As the quantity and scale of data increase, a neural network composed only of LSTM layers is prone to overfitting, which makes the generalization ability of the model extremely poor. To reduce overfitting, dropout is placed between every two LSTM layers, so that neurons are randomly discarded and the network is reconstructed.

• Output layer: The model output is the degradation indicator time series of the battery pack, whose dimension is consistent with the input sequence. The output values are the predicted SOH of the battery pack.
Accordingly, the model for deterioration information integration, which is used to predict the status of packs, has been built. The model extracts numerical characteristics of the cells related to pack degradation and expresses the functional relations between them in order to integrate all of the deterioration information in the cells.
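The random discarding of neurons between recurrent layers can be sketched as below. This uses the common "inverted dropout" variant, which rescales the kept activations during training; that choice, the function name, and the rate of 0.5 are assumptions, since the dropout rate and variant used in the model are not stated here.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout applied to the outputs of one layer.

    Each unit is zeroed with probability `rate`; kept units are divided
    by (1 - rate) so the expected activation is unchanged. At inference
    time (training=False) the activations pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical hidden-state vector passed from one LSTM layer to the next.
rng = random.Random(0)
hidden = [0.2, -0.4, 0.7, 0.1]
to_next_layer = dropout(hidden, rate=0.5, rng=rng)
```

Because a different random subset of units is removed at every training step, each step effectively trains a different thinned subnetwork with shared weights.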

Hyper-Parameter Solution
In the process of model training, the setting of the Loss function was a significant link for the model solution. In this research, the minimum MSE (mean-square error) was adopted as the optimization objective. The specific calculation formula of the MSE is: where X i is the observed real sample sequence andX i is the prediction sample sequence of X i . The MSE between them can be obtained as γ according to the above question. The determination of the model focused on accurately finding the optimal value of the hyper-parameter, which had a direct effect on whether the appropriate model can perform well enough for the dataset. When model training is carried out for the case of a non-convex function, the learning trajectory needs to experience different types of structures and to finally find a region whose local part is a convex bowl. In the past, for such problems, an adaptive gradient algorithm (AdaGrad) algorithm was used to contract the learning rate according to the whole history of the square gradient, but it could make the learning rate reach too small before the convex structure was found. Therefore, the root mean square prop optimizer (RMSprop) algorithm was utilized in this research to optimize the parameters.
The RMSprop algorithm uses an exponentially decaying average to discard the distant gradient history, which enables it to converge rapidly once the target convex structure has been found. The current gradient is normalized by dividing it by the root of a moving average of the squared gradients [29]. First, the gradient at the current time is obtained by Equation (8):
$$f(\theta_t) = \nabla_{\theta} L(\theta_t) \tag{8}$$
where $L(\theta)$ is the loss function of the optimization problem, which quantifies the consistency between the model output and the correct results; $\theta$ denotes the parameters of the loss function, which influence the variation of $L(\theta)$; and $f(\theta_t)$ is the derivative of the loss function $L(\theta)$ with respect to the parameter $\theta$ at moment $t$. Second, the variation $r_t$ at moment $t$ is calculated by:
$$r_t = \beta r_{t-1} + (1 - \beta) f(\theta_t)^2 \tag{9}$$
where $\beta$ is the forgetting factor, typically 0.9 [30], a decay term that controls the influence of the history information. Through Equation (9), a dynamic variable is introduced into the learning rate of RMSprop. Finally, the parameter is updated by the following formulas:
$$v_{t+1} = \frac{\alpha}{\sqrt{r_t}} f(\theta_t) \tag{10}$$
$$\theta_{t+1} = \theta_t - v_{t+1} \tag{11}$$
where $v_{t+1}$ is the moving step for updating the parameter $\theta$, which is determined by the step size $\alpha$ and the decay term $\beta$. The step size $\alpha$ sets the search speed of the parameter update. Through Equation (11), the parameter $\theta$ is updated at each learning step.
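As an illustration, the RMSprop update described above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the training code used in this study: the function names `mse` and `rmsprop_minimize`, the toy linear model, the synthetic data, the step size of 0.01, and the small constant `eps` (added only to avoid division by zero) are hypothetical choices made for the example.

```python
import numpy as np


def mse(x_true, x_pred):
    """Mean-square error between observed and predicted sequences."""
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    return float(np.mean((x_true - x_pred) ** 2))


def rmsprop_minimize(grad_fn, theta0, alpha=0.01, beta=0.9, eps=1e-8, steps=2000):
    """Minimize a loss with RMSprop, given a function that returns its gradient.

    alpha is the step size and beta the forgetting factor (0.9 as in the text);
    eps is a stability constant assumed for this sketch.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    r = np.zeros_like(theta)  # moving average of the squared gradients
    for _ in range(steps):
        g = grad_fn(theta)                    # gradient at the current step
        r = beta * r + (1.0 - beta) * g ** 2  # exponentially decayed average
        v = alpha * g / (np.sqrt(r) + eps)    # normalized moving step
        theta = theta - v                     # parameter update
    return theta


# Toy problem (hypothetical data): fit y = w * x + b by minimizing the MSE.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.5


def grad_mse(theta):
    """Gradient of mean((w * x + b - y)^2) with respect to (w, b)."""
    w, b = theta
    residual = (w * x + b) - y
    return np.array([2.0 * np.mean(residual * x), 2.0 * np.mean(residual)])


theta_hat = rmsprop_minimize(grad_mse, theta0=[0.0, 0.0])
final_loss = mse(y, theta_hat[0] * x + theta_hat[1])
```

Because the step is normalized by the recent gradient magnitude, the parameters keep moving by roughly the step size even close to the optimum, so a fixed step size leaves a small residual oscillation around the minimum rather than exact convergence.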

Model Training and Output
Model training is the most critical step in determining the validity and accuracy of a neural network model. To obtain satisfactory training results in the application scenario described in this paper, the following steps needed to be implemented well:
• Normalization: Good normalization was necessary for model training. To solve for the parameters efficiently, the time label and the ambient temperature were transformed to the same numerical magnitude and uniformly normalized.
• Dataset division: According to the scale of the data obtained from the experiment, the data were divided into training, validation, and test datasets with ratios of 8:1:1 or 7:2:1. These ratios provide sufficient training data for optimizing the model parameters, and a validation dataset no smaller than the test dataset was used to ensure the training performance. This follows the general rules of dataset division in deep learning.
• Data ordering: Depending on how consistent the data characteristics were from earlier to later cycles, reordering the data helped to include the various characteristics in the training data.
After the input data were preprocessed, the model was trained. During training, the specific model parameters were determined according to the performance on the training data. The output of the model was set as the SOH of the battery pack.
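The normalization and dataset division steps listed above can be sketched as follows. This is only an illustrative sketch: min-max scaling is an assumed choice (the normalization formula is not specified here), the helper names `min_max_normalize` and `split_dataset` are hypothetical, the input values are synthetic, and no reordering of the data is performed.

```python
import numpy as np


def min_max_normalize(column):
    """Scale a 1-D array to [0, 1]; a constant column maps to zeros."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0.0:
        return np.zeros_like(column)
    return (column - column.min()) / span


def split_dataset(samples, ratios=(8, 1, 1)):
    """Split samples into training, validation, and test parts by ratio."""
    n = len(samples)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical inputs: cycle index (time label) and ambient temperature in degrees Celsius.
cycles = np.arange(1, 101)
temperature = 25.0 + 3.0 * np.sin(cycles / 10.0)

# Bring both inputs to the same numerical magnitude before training.
features = np.column_stack([
    min_max_normalize(cycles),
    min_max_normalize(temperature),
])
train, val, test = split_dataset(features, ratios=(8, 1, 1))
```

Passing `ratios=(7, 2, 1)` gives the alternative division mentioned above; in both cases the validation part is no smaller than the test part.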

Experiment Design
Because industrial battery packs are complex and difficult to monitor, an experiment for battery SOH prediction was designed and conducted in our study to validate the method proposed in Section 2. Typical battery pack structures were set up to obtain diverse data, and matching equipment was designed to collect the running data. The experiment used 18650 lithium-ion batteries (Type: ICR18650-22PM) produced by Samsung (Seoul, South Korea). The 18650 lithium-ion battery is one of the most popular battery types and is widely used in rechargeable electrical products, such as vehicles. The indicators of an ICR18650-22PM lithium-ion battery are listed in Table 1. The consistency of all samples used in the experiment needed to be ensured. The samples used in our experiment are referred to as S01-S08 in this paper.
In this experiment, measurement equipment was used to capture and record the charge and discharge states of the battery packs. Because of the limited capacity of the measurement equipment, the current and voltage of the cells inside the packs could not be measured directly. To test the cells at the same time, we designed a peripheral circuit to monitor the current and voltage of the cells. The schematic diagram of the test platform is shown in Figure 8. As Figure 8 shows, the test platform is divided into two parts. Computer A was responsible for data collection and for the control of cell screening and pack testing; this part was carried out with a battery testing system from Neware Company (Shenzhen, China). Computer B was used for data acquisition in cell testing. Cell testing relied on four similar acquisition modules, each able to measure and record current and voltage data. Photographs of the platform are given by Zhuo Wang [9] and are omitted here to avoid redundancy.

Data Acquisition
The eight cell samples selected by screening were divided into four groups, from which different battery packs were built. The four structures of the tested battery packs are detailed in Table 2. To stay consistent with a real working environment, the experiments were carried out at room temperature; that is, the ambient temperature was not controlled during the experiments. However, the room temperature was monitored continuously to protect the batteries from thermal shock that might cause internal damage to the cells. At the same time, the surface temperatures of the cells were monitored in real time by temperature sensors to support a precise assessment of their health status. The constant current-constant voltage (CC-CV) method was employed to test all of the battery packs.

Primary Analysis
Taking a series-parallel pack as an example, the capacity time series of the pack and its cells (S05-S08) are shown in Figure 9. On the one hand, Figure 9 shows that the deterioration of the batteries is related to the charge and discharge cycles. On the other hand, a comparison with Figure 10 leads to the preliminary, qualitative conclusion that the slight fluctuations in degradation are closely related to the ambient temperature. We further verified the relationship between the degradation rate and temperature using a linear correlation analysis, again taking the series-parallel pack as an example. Figure 11 presents the correlation between the deterioration rates and the temperature fluctuation at different times. Taking both the prediction performance and the availability of input data into account, the time label (cycles) and the temperature were selected as the input indexes for prediction.
Appl. Sci. 2020, 10, x 12 of 19
Figure 11. Correlation analysis between the deterioration rate and the temperature fluctuation.
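A linear correlation analysis of this kind can be sketched with the Pearson coefficient. The sketch below rests on several assumptions that are not taken from the experiment: the data are synthetic, the degradation rate is taken as the cycle-to-cycle change in capacity, the temperature fluctuation is taken as the cycle-to-cycle change in temperature, and the helper name `pearson_r` is hypothetical.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson linear correlation coefficient between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic per-cycle measurements (not the experimental data).
rng = np.random.default_rng(0)
n_cycles = 200
temperature = 25.0 + rng.normal(0.0, 1.0, size=n_cycles)  # degrees Celsius

# Synthetic capacity in Ah: a slow linear fade plus a temperature-dependent term.
capacity = 2.0 - 0.001 * np.arange(n_cycles) + 0.01 * (temperature - 25.0)

degradation_rate = np.diff(capacity)     # capacity change per cycle
temp_fluctuation = np.diff(temperature)  # temperature change per cycle

r = pearson_r(degradation_rate, temp_fluctuation)
```

In this synthetic case the capacity depends linearly on temperature by construction, so the coefficient is close to 1; with measured data the coefficient only indicates how strongly the two series vary together, not a causal relationship.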

Two-Cell Pack Prediction
For a better analysis of the performance of the proposed method, the validation started from the most basic battery structures: a series pack and a parallel pack. Figure 12 displays the samples and the measurement circuit for a two-cell pack. To demonstrate the advantage of the proposed method, we used a prediction based only on pack indicators for comparison. The abbreviations of the two methods used for comparison are listed in Table 3.
For the simpler structures of the series and parallel battery packs, the data fluctuations and patterns are relatively stable, so the prediction performance is excellent over the entire life. The comparison results for the series and parallel packs are shown in Figure 13. For clarity, the figure shows only the SOH predictions of the two methods at cycles 341-371, leaving out the training and test data, and emphasizes the error in the prediction phase. In addition, because the experiment times differed, the numbers of cycles in the series and parallel pack tests were slightly different.
Figure 13a,b compare the SOH of the series and parallel packs obtained with the different methods; lines of the same color denote the same method. As seen in Figure 13, the SOH curves of both the pack-prediction and cells-prediction methods are generally close to the real data, which means that both prediction methods perform well. Furthermore, the cells-prediction method always performs better than the pack-prediction method. At the same time, both curves lie above the real experimental data, which means that the prediction methods consistently give an optimistic life expectation. This is caused by the persistent negative interaction between cells, which worsens during operation. For the series pack in Figure 13a, the real data fluctuate noticeably, and the cells-prediction method captures this better than the pack-prediction method, whose curve is too flat to describe the dynamic degradation process. In the parallel pack, by contrast, the pack-prediction shows many inaccurate fluctuations, for example at cycles 345, 351, 359, and 361, which makes its result unreliable.
For a better comparison of the SOH prediction results, the relative error was adopted as the performance indicator. The relative error was calculated as follows:
$$RE = \frac{\left|\hat{y} - y\right|}{y}$$
where RE represents the relative error, $\hat{y}$ is the predicted value of the SOH, and $y$ is the real value. The comparison of the relative errors is shown in Figure 14.
The comparison results in Figures 13 and 14 show that both prediction methods achieved high accuracy for the series and parallel structures. It can be seen visually that the cells-prediction method was more precise and stable. However, because of the simple structure, the interaction between cells was limited, so the advantage of the cells-prediction method was not pronounced.
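The relative error described above can be computed per cycle as in the following sketch. It assumes the absolute-value form of the relative error, and the SOH values are hypothetical numbers chosen for illustration, not measured data; the helper name `relative_error` is likewise an assumption.

```python
import numpy as np


def relative_error(y_true, y_pred):
    """Element-wise relative error |y_pred - y_true| / |y_true|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_pred - y_true) / np.abs(y_true)


# Hypothetical SOH values expressed as fractions of the nominal capacity.
soh_real = np.array([0.90, 0.89, 0.88, 0.87])
soh_pred = np.array([0.91, 0.89, 0.89, 0.86])

rel_err = relative_error(soh_real, soh_pred)  # one value per cycle
mean_rel_err = float(rel_err.mean())          # summary over the prediction phase
```

Dividing by the real value makes errors comparable across cycles with different SOH levels, which is useful when two methods are compared over a degrading pack.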

Series-Parallel Pack Analysis
As discussed above, the cells-prediction method provides better SOH prediction results when sufficient information from the cells is available. To verify its generalization ability, this section provides an application example of a four-cell pack. The structure of the tested pack is shown in Figure 15, and Figure 16 compares the prediction results.

Among the 420 cycles of data acquired in the experiment, the first 378 cycles (90% of the data) were used to train and construct the model, and the final 42 cycles (10% of the data) were used to compare the prediction performance. As shown in Figure 16, the cells-prediction gave the result closest to the experimental data in the prediction phase, whereas the pack-prediction showed obvious fluctuations. Data analysis indicated that the reason was a strong dependence on the input data, caused by the input information being too limited. In other words, when the temperature data fluctuated, the pack-prediction result swung in an exaggerated fashion. To confirm this explanation, the correlation analysis between the degradation rate and the temperature fluctuation is shown in Figure 17.
As shown in Figure 17, there was a strong correlation between the degradation rate and the temperature fluctuation. However, as seen in Figure 5, the ambient temperature was not the only parameter that affected the degradation process of the battery. Without information from the cells, the other influences, whether certain or uncertain, remained unknown. In Figure 17, compared with the pack-prediction, the cells-prediction provided effective fluctuation information and avoided excessive dependence on the limited input. The pack SOH prediction with adequate information from the cells performed better, while a lack of information exaggerated the effect of input variations.

Accuracy Analysis
Based on the previous analysis results in Section 2.3, the accuracies of multiple predictions are calculated in this section to evaluate the effectiveness of the proposed method more clearly, and the differences from the pack-prediction method are again summarized. To assess the stability of the methods, box plots were adopted to show the dispersion of the accuracy over many predictions. The box plots focus on the highest and lowest data points, the median, and the first and third quartiles. These indicators are listed in Table 4.

Table 4. Indicators of the box plots.

| Index | Definition |
| --- | --- |
| The highest datum | The highest datum still within 1.5 IQR (interquartile range) of the upper quartile |
| The lowest datum | The lowest datum still within 1.5 IQR of the lower quartile |
| Median | The value separating the higher half from the lower half of a data sample |
| The first quartile | The middle value between the smallest value and the median of the data set |
| The third quartile | The middle value between the median and the highest value of the data set |

In Table 4, IQR denotes the interquartile range, which is the difference between the third quartile and the first quartile. The errors of predictions at different scales are compared in Figure 18, and the data are explained in detail in Table 5. The compared scales were 20, 50, 100, and 400 prediction cycles for a fixed time series. It can be concluded from Figure 18 that there was an obvious disparity between the two prediction methods:

• The accuracy of the SOH prediction with fused information from the cells remained more stable as the number of prediction cycles increased. In the pack-prediction method, the accuracy fluctuated too severely for the best prediction scale to be determined.
• When the cells provided sufficient deterioration information, the prediction results contained fewer abnormal data. With the limited input variables of the pack-prediction method, however, a large number of outliers occurred in the results (as seen at 400 cycles).
• As the prediction scale increased, the mean and minimum of the relative errors in the pack-prediction method steadily improved. However, in the repeated prediction process, the error fluctuation and abnormal data made it harder to optimize the parameters of the model.
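For reference, the box-plot indicators of Table 4 can be computed from a set of repeated-prediction errors as sketched below. The error values are hypothetical, the function name `box_plot_indicators` is an assumption, and the quartiles use NumPy's default linear interpolation, which may differ slightly from the quartile convention behind Figure 18.

```python
import numpy as np


def box_plot_indicators(errors):
    """Return the box-plot indicators of Table 4 for a 1-D sample of errors."""
    errors = np.asarray(errors, dtype=float)
    q1, median, q3 = np.percentile(errors, [25, 50, 75])
    iqr = q3 - q1                    # interquartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = errors[(errors >= lower_fence) & (errors <= upper_fence)]
    outliers = errors[(errors < lower_fence) | (errors > upper_fence)]
    return {
        "lowest": float(inside.min()),    # lowest datum within 1.5 IQR of Q1
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "highest": float(inside.max()),   # highest datum within 1.5 IQR of Q3
        "outliers": outliers,             # points drawn individually in a box plot
    }


# Hypothetical relative errors from repeated predictions (not measured data).
errors = [0.010, 0.012, 0.011, 0.013, 0.009, 0.014, 0.050]
stats = box_plot_indicators(errors)
```

Counting the entries in `stats["outliers"]` for each prediction scale gives a simple numerical counterpart to the abnormal data discussed in the second point above.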
Based on the above analysis, the SOH prediction performance for the battery packs was significantly improved by the effective fusion of multi-cell degradation information through the LSTM. Two main factors contributed to this:

• The degradation data of the cells provided more trend and fluctuation information about the evolution of the battery packs and could therefore describe the evolution characteristics more accurately. Conversely, without such information the actual SOH of the cells was not considered, which made prediction and estimation more difficult.
• When less information was used (for example, only temperature and time labels as input), the limited information available (e.g., temperature) had a strong effect during model training. The SOH prediction of the battery pack then depended only on the change in temperature, and this one-sidedness was markedly amplified. As the number of cells increased, this defect became more severe.

Conclusions
To address the lack of background information about the interacting degradation of cells in SOH prediction, this paper proposed a modified data-driven method and procedure for the SOH prediction of a lithium-ion battery pack based on deep learning, fusing the deterioration information of multiple cells. Experimental data analysis and validation demonstrated that, compared with SOH prediction using only the overall indicators of the packs, the proposed method performed better in forecasting future trends. The proposed method also describes the relationship between the degradation processes of cells and packs with neural networks. It mines the deeper influence of the cells on a battery pack and enriches the input information. As a result, the method improves prediction accuracy and stability while providing internal information on battery pack structures, and it avoids a strong dependence on limited external factors.
Taking advantage of the development of machine learning, data-driven methods bring a qualitative improvement to SOH management and prediction and help improve the application quality of power lithium-ion batteries. However, some problems remain to be solved in applications. On the one hand, even with an ideal model for predicting the SOH of battery packs, the performance of the model still depends on the parameter choices made for different training data, which is a common problem in machine learning methods. On the other hand, the adopted model should be improved and optimized as deep learning technology develops. More advantageous data-driven analysis methods will be considered in our future research. What is certain is that, following the ideas and method proposed in this paper, the performance can be gradually optimized through further adaptive research.

Conflicts of Interest:
The authors declare no conflict of interest.