Online Capacity Estimation for Lithium-Ion Batteries Based on Semi-Supervised Convolutional Neural Network

Abstract: Accurate capacity estimation can ensure the safe and reliable operation of lithium-ion batteries in practical applications. Recently, deep learning-based capacity estimation methods have demonstrated impressive advances. However, such methods suffer from limited labeled data for training, i.e., the capacity ground-truth of lithium-ion batteries. To address this, a capacity estimation method is proposed based on a semi-supervised convolutional neural network (SS-CNN). This method can automatically extract features from battery partial-charge information for capacity estimation. Furthermore, a semi-supervised training strategy is developed to take advantage of extra unlabeled samples, which can improve the generalization of the model and the accuracy of capacity estimation even in the presence of limited labeled data. Compared with artificial neural networks and convolutional neural networks, the proposed method is demonstrated to improve capacity estimation accuracy.


Introduction
Due to their high power density, low self-discharge rate, and long service life, lithium-ion batteries are widely used as energy storage devices for various applications such as smart grids, electric vehicles, etc. The stability of lithium-ion batteries is the cornerstone of the safety and reliability of the entire system. To improve stability and prevent severe accidents in the use of lithium-ion batteries, good battery management systems (BMSs) for safety monitoring and timely maintenance are in great demand [1,2]. In BMSs, various sensors and algorithms are adopted to improve performance. In particular, battery capacity estimation, which provides rich information on batteries [3], is an essential element of BMSs.
Capacity estimation methods for lithium-ion batteries can in general be divided into model-based and data-driven methods [4]. Model-based methods yield estimates by identifying the model parameters of the battery (e.g., equivalent circuit models, electrochemical models, etc.) [5][6][7]. However, these methods require precise models that are not trivial to obtain in practice. Data-driven methods attempt to estimate the capacity of batteries in a two-step fashion: feature extraction and machine-learning-based regression [8,9]. In the first step, features that can indicate battery degradation are extracted from the operation data, such as the slope of the charging curve [10], the time interval of an equal voltage difference [11], the incremental capacity [12], or the differential temperature [13]. Then, machine learning methods, such as the support vector machine (SVM) [8], Gaussian process regression (GPR) [14], or random forest (RF) [15], are used to model the relationship between the capacity and the features. Data-driven methods reduce the dependence on precise battery models, but they still leave room for improvement in generalization ability and capacity estimation accuracy.
Recently, deep learning methods have attracted significant attention due to their capability of automatic feature extraction and good generalization performance [16]. The seminal works on deep learning methods for battery capacity estimation, including the long short-term memory (LSTM) network [17] and the convolutional neural network (CNN) [18], provide significant advantages. Even though deep learning methods achieve compelling battery capacity estimation results, they suffer from limited training data, as a large amount of labeled data is needed for training. However, data annotation is costly, time-consuming, and even impractical under specific working conditions for batteries.
In this paper, we propose a battery capacity estimation approach based on a semi-supervised convolutional neural network (SS-CNN). The key contribution is introducing unsupervised learning into the CNN-based method to reduce the dependency on labeled data. The basis of our method is a CNN with battery local charging information as the input. Firstly, unsupervised pre-training of the model is performed based on unlabeled battery samples. Secondly, the model is trained under supervision based on a small number of labeled samples. The experiments show that our SS-CNN method not only reduces the need for annotated data, but also improves the CNN model's generalization ability and the accuracy of capacity estimation.

Input and Output Structures
In order to establish the lithium-ion battery capacity estimation model, it is necessary to construct the input and output vectors {x i , y i } of the SS-CNN. The output vector is the discharge capacity (i.e., y i = Q i ). The input vector is obtained from the battery monitoring signal.
The battery usually works under dynamic discharging conditions, while the charging method follows a standard procedure. Therefore, we use partial charge information to construct the input vector. The structure of the input vector is shown in Figure 1. To make full use of the charging information, the voltage (V), current (I), and charging capacity (C) are used to construct the input vector. The initial voltage (V_1 = V_initial) is selected according to the depth of discharge. Then, the charging data from t_1 (the time corresponding to V_1) over a fixed-length time interval (up to t_L) are used to build the input vector, which is defined as: x_i = [V_1, ..., V_L; I_1, ..., I_L; C_1, ..., C_L], where V_l and I_l are the charging voltage and current at the lth time step, respectively, and C_l denotes the charging capacity from t_1 to t_l, which is calculated using the coulomb counting method C_l = ∫_{t_1}^{t_l} I dt.
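This construction can be sketched as follows. The trapezoidal integration for coulomb counting, the array layout, and all names are our illustration rather than the paper's code:

```python
import numpy as np

def build_input_vector(t, V, I, V_initial=3.8, L=3000):
    """Build a partial-charge input vector [V_l, I_l, C_l] of shape (L, 3).

    t, V, I: 1-D arrays of charge time (s), voltage (V), and current (A).
    V_initial: starting voltage that defines t_1.
    L: number of samples in the fixed time window.
    """
    # t_1 is the first time the charging voltage reaches V_initial
    start = int(np.argmax(V >= V_initial))
    t, V, I = t[start:start + L], V[start:start + L], I[start:start + L]
    # Coulomb counting: C_l = integral of I from t_1 to t_l (A*s -> Ah)
    C = np.concatenate(([0.0], np.cumsum(0.5 * (I[1:] + I[:-1]) * np.diff(t)))) / 3600.0
    return np.stack([V, I, C], axis=1)

# Synthetic example: 2 h of constant-current charging at 2 A, sampled at 1 Hz
t = np.arange(0, 7200.0)
V = np.linspace(3.6, 4.2, t.size)
I = np.full(t.size, 2.0)
x = build_input_vector(t, V, I, V_initial=3.8, L=3000)
print(x.shape)  # (3000, 3)
```

In practice the window would be taken from measured charging curves; the synthetic ramp here only illustrates the slicing and the cumulative-capacity channel.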

Design of the SS-CNN
The basis of the SS-CNN is a CNN. By introducing the convolution operation, which naturally supports processing on multiple input signals, CNN has better performance, especially for multi-channel input based battery capacity estimation systems.
However, the hyperparameters of the CNN are randomly initialized before training, which may lead to local optimization. Hinton et al. proposed a method to initialize the network by unsupervised pre-training, called the autoencoder model [19]. An autoencoder directly learns features by encoding and decoding the input vector, and then minimizes the error between the reconstructed and the original signal. Inspired by the unsupervised training from the autoencoder, this paper introduces the unsupervised mechanism to pre-train the CNN using a large amount of unlabeled data. The pipeline of the proposed SS-CNN is shown in Figure 2.

SS-CNN consists of three sub-networks, shown in Figure 2a: a convolutional encoder, a deconvolutional decoder, and a regression branch. As shown in Figure 2b, the training scheme includes three steps. First, unsupervised training is performed using the convolutional encoder and deconvolutional decoder by minimizing the reconstruction error. Thus, the convolutional encoder branch can be pre-trained without using any labeled data. Then, the convolutional encoder is frozen to train the regression branch using labeled data. Finally, the weights of the convolutional encoder and regression branch are fine-tuned under supervised training.

Specifically, in the SS-CNN model, suppose the input vector is x_i, the output vector is ŷ_i, and y_i represents the ground-truth of the output vector. The labeled dataset consists of N samples, i.e., X = {x_1, ..., x_i, ..., x_N} and Y = {y_1, ..., y_i, ..., y_N}. The unlabeled dataset contains M samples, with input vectors X′ = {x_{N+1}, x_{N+2}, ..., x_{N+M}}. SS-CNN contains a set of convolutional layers, pooling layers, and fully connected layers, which can be described by the hypothesis function h(x_i), where a^(l) denotes the outputs of the hidden layers, l indexes the layers of the network, and the parameters of h(x_i), composed of weights and biases, need to be identified during the training process. For a single unit, the hypothesis function can be expressed as: h(x) = Σ_n ω_n x^(n) + b_0 x^(0), where x^(n) and ω_n represent the nth input and the corresponding unknown weight, respectively, b_0 is the bias, and x^(0) = 1.

The training process of SS-CNN identifies the weights and biases of h(x_i). To identify these parameters, we define loss functions L to measure the differences between the model predictions and the ground-truth. Stochastic gradient descent (SGD) with momentum is used to iteratively optimize the loss functions. During each iteration, the parameter θ is updated as: θ_{j+1} = θ_j + γ(θ_j − θ_{j−1}) − α(ĝ + λθ_j), where ĝ is the gradient of L, θ_j is the parameter at the jth iteration, and γ denotes the momentum, which accelerates the change of the gradient vector in the relevant direction. λ is the weight coefficient, and α is the learning rate, which determines the step size at each iteration while moving toward the minimum of the loss function.
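The momentum update can be written as a short routine. This is a generic sketch of SGD with momentum and an L2 weight coefficient; the variable names follow the text's symbols, but the exact form used by the authors is an assumption on our part:

```python
import numpy as np

def sgd_momentum_step(theta, velocity, grad, alpha=0.01, gamma=0.9, lam=1e-4):
    """One SGD-with-momentum update.

    theta: current parameters, velocity: running momentum buffer,
    grad: gradient of the loss L w.r.t. theta,
    alpha: learning rate, gamma: momentum, lam: weight coefficient (L2 decay).
    """
    # gamma carries the previous update direction forward, so gradients that
    # persistently point the same way compound; lam * theta is the decay term
    velocity = gamma * velocity - alpha * (grad + lam * theta)
    return theta + velocity, velocity

# Minimize L(theta) = 0.5 * ||theta||^2, whose gradient is theta itself
theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)
for _ in range(200):
    theta, v = sgd_momentum_step(theta, v, grad=theta, alpha=0.1, gamma=0.9)
print(np.linalg.norm(theta) < 1e-3)  # True: converges toward the minimum at 0
```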

Design of the Training Strategy
The training process of the SS-CNN model consists of three steps: (1) unsupervised reconstruction (named as SS-CNN-S1), (2) supervised regression (named as SS-CNN-S2), and (3) supervised fine-tuning (named as SS-CNN-S3), as shown in Figure 2b. In this section, we will introduce each step and the corresponding loss functions we designed in detail.
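The schedule above can be summarized schematically, indicating which parameter groups are updated in each step (the group names and the dictionary layout are our illustration, not the paper's code):

```python
# Which parameter groups are trainable in each SS-CNN training step
# (a schematic of the schedule in Figure 2b).
TRAINING_STEPS = {
    "SS-CNN-S1": {"encoder": True,  "decoder": True,  "regressor": False},  # unsupervised reconstruction
    "SS-CNN-S2": {"encoder": False, "decoder": False, "regressor": True},   # supervised regression
    "SS-CNN-S3": {"encoder": True,  "decoder": False, "regressor": True},   # supervised fine-tuning
}

def trainable_groups(step):
    """Return the names of the parameter groups updated in a given step."""
    return [g for g, on in TRAINING_STEPS[step].items() if on]

print(trainable_groups("SS-CNN-S3"))  # ['encoder', 'regressor']
```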

Unsupervised Reconstruction
The purpose of this step is to use a large amount of unlabeled data to obtain the parameters of the convolutional encoding branch, which are used to extract effective features. In the reconstruction training step, the regression estimation branch is frozen. The input variable x_i is convolutionally encoded to obtain the hidden variable z_i, and then a deconvolutional decoder is used to reconstruct the input variable, denoted as x̂_i. The loss function L_s1 is used to minimize the error between the input x_i and the reconstructed input x̂_i. Then, the parameters for convolutional encoding and deconvolutional decoding can be obtained. The loss function is defined as: L_s1 = L_x + λ_KL L_KL + λ_R L_R, where L_x is the reconstruction constraint term, which constrains the reconstructed input x̂_i to be as similar as possible to the input data x_i.

L_KL is the Kullback-Leibler (KL) divergence constraint term, where µ and σ are the mean and variance of the distribution to which the hidden variable z_i is subjected. The KL divergence encourages the diversity of features to improve the generalization of the network, and λ_KL is the weight of the KL divergence constraint term. L_R is the regularization term, which is used to prevent the network from overfitting, and λ_R is the weight of the regularization term.
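The three terms of L_s1 can be sketched as follows. The mean-squared reconstruction term, the Gaussian-versus-standard-normal KL form, and the squared-L2 regularizer are standard choices consistent with the text's descriptions; the exact expressions and weight values are our assumptions:

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    # L_x: mean squared error between input and reconstruction
    return np.mean((x - x_hat) ** 2)

def kl_loss(mu, log_var):
    # L_KL: KL divergence between N(mu, sigma^2) and the standard normal N(0, 1)
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

def l2_loss(weights):
    # L_R: squared L2 norm of all weights, discourages overfitting
    return sum(np.sum(w ** 2) for w in weights)

def loss_s1(x, x_hat, mu, log_var, weights, lam_kl=1e-3, lam_r=1e-4):
    # L_s1 = L_x + lam_kl * L_KL + lam_r * L_R
    return reconstruction_loss(x, x_hat) + lam_kl * kl_loss(mu, log_var) + lam_r * l2_loss(weights)

x = np.ones((4, 3))
# A perfect reconstruction with a standard-normal code and zero weights costs 0
print(loss_s1(x, x_hat=x, mu=np.zeros(2), log_var=np.zeros(2), weights=[np.zeros((2, 2))]))  # 0.0
```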

Supervised Regression
In the supervised regression stage, we freeze the convolutional encoder branch and the deconvolutional decoder branch to train only the regression branch. Using the weights of the convolutional encoder branch trained in the previous stage, the input x_i is encoded to yield the hidden variable z_i. As z_i yields ŷ_i through the regression branch, we use ŷ_i and its corresponding ground-truth y_i to formulate a supervised loss L_s2, with which the parameters of the regression branch are trained. Note that, as the convolutional encoder branch is frozen, its parameters are not updated during training in this stage. The loss function L_s2 is defined as: L_s2 = L_r + λ_R L_R, where L_r is the regression term, used to constrain the network output to be close to the ground-truth, defined as: L_r = (1/N) Σ_{i=1}^{N} (ŷ_i − y_i)².
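With the encoder frozen, the supervised objective reduces to a mean-squared regression term plus, we assume, the same weight regularizer as before; a minimal sketch:

```python
import numpy as np

def regression_loss(y_true, y_pred):
    # L_r: mean squared error between predicted and ground-truth capacity
    return np.mean((y_true - y_pred) ** 2)

def loss_s2(y_true, y_pred, weights, lam_r=1e-4):
    # L_s2 = L_r + lam_r * L_R, with only the regression branch trainable
    return regression_loss(y_true, y_pred) + lam_r * sum(np.sum(w ** 2) for w in weights)

y = np.array([1.85, 1.80, 1.76])      # ground-truth capacities (Ah), illustrative values
y_hat = np.array([1.84, 1.82, 1.75])  # regression-branch outputs
print(round(float(regression_loss(y, y_hat)), 6))  # 0.0002
```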

Supervised Fine-Tuning
Using the previous two training stages, the parameters of the convolutional encoder branch and regression branch are updated, respectively. In this last supervised fine-tuning stage, we use L s2 to jointly train the two branches to obtain a better regression model. Note that the deconvolutional decoder branch is not used in this stage.

Battery Dataset
The dataset from NASA [20] is employed to investigate the performance of the SS-CNN capacity estimation model. Four sets of batteries with 2 Ah nominal capacity were cycled under different operating conditions. In our experiment, the test data from the first set with four batteries (batteries #5, #6, #7, and #18) are set as the labeled data (627 samples). These batteries were fully charged with the standard charging method, and then discharged under a 1C rate (2 A) current. The discharge cut-off voltages are 2.7 V, 2.5 V, 2.2 V, and 2.5 V, respectively. The discharge capacity is calculated to 2.7 V as the ground-truth. The change of the battery capacity with the cycle is shown in Figure 3. The remaining three sets of battery data are utilized as the unlabeled samples (747 samples), which means only the battery charging current and voltage are known, while the discharge capacity under each cycle is unknown.


Capacity Estimation Results
In order to evaluate the performance of the SS-CNN capacity estimation method, an artificial neural network (NN) and a conventional CNN are also implemented for comparison with SS-CNN. Considering the operating range and overall sampling data, the starting charging voltage is selected as 3.8 V. The subsequent 3000 s of voltage, current, and capacity data are used as the model input. Thus, the size of the input for both CNN and SS-CNN is 3000 × 3. The structure of the CNN is consistent with the SS-CNN, but without the semi-supervised branches. For the NN model, two hidden layers are used, and the network structure is 9000-(256-128)-1. The learning rate for NN and CNN is 0.01, the number of iterations is 35, and the batch size is 64. The cross-validation method is used for the performance analysis; that is, one battery among the four batteries (batteries #5, #6, #7, and #18) is selected as the test sample, and the remaining three batteries are used as the labeled training samples. For the SS-CNN model, additional unlabeled training samples are used to pre-train the network. The average value of 10 repetitions is calculated as the final result for all methods.

The root mean square error (RMSE), mean absolute error (MAE), and maximum relative error (MaxRE) are utilized to evaluate the capacity estimation performance. These metrics are defined as: RMSE = sqrt((1/n) Σ_{i=1}^{n} (Q_{i,est} − Q_{i,true})²), MAE = (1/n) Σ_{i=1}^{n} |Q_{i,est} − Q_{i,true}|, and MaxRE = max_i |Q_{i,est} − Q_{i,true}|/Q_{i,true} × 100%, where Q_{i,est} is the average estimated value over 10 repetitions, Q_{i,true} is the true capacity value, and n is the number of samples.

The capacity estimation results for the three methods are listed in Table 1, and the estimation results for battery #7 are shown in Figure 4. According to Table 1 and Figure 4, all methods can accurately estimate the capacity of the different batteries, which verifies the effectiveness of capacity estimation based on partial charge information. The capacity estimation results based on CNN and SS-CNN outperform the NN model for the same battery.
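The three metrics can be computed as follows (a sketch; we compute RMSE and MAE in the capacity's units and MaxRE in percent, which is one plausible reading of the paper's percentage figures):

```python
import numpy as np

def capacity_metrics(q_true, q_est):
    """RMSE, MAE, and maximum relative error for capacity estimates."""
    err = q_est - q_true
    rmse = np.sqrt(np.mean(err ** 2))          # root mean square error
    mae = np.mean(np.abs(err))                 # mean absolute error
    max_re = np.max(np.abs(err) / q_true) * 100.0  # maximum relative error, percent
    return rmse, mae, max_re

# Illustrative values, not the paper's measurements
q_true = np.array([2.00, 1.90, 1.80, 1.70])
q_est  = np.array([1.98, 1.92, 1.79, 1.73])
rmse, mae, max_re = capacity_metrics(q_true, q_est)
print(f"RMSE={rmse:.4f} Ah, MAE={mae:.4f} Ah, MaxRE={max_re:.2f}%")
```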
For example, for battery #5, the RMSE of the NN model is 1.2983%, while the RMSEs for CNN and SS-CNN are 1.1349% and 0.7382%, respectively. These results indicate that, compared to the artificial neural network, deep networks can better extract the hidden features, thereby improving the accuracy of capacity estimation. Furthermore, the SS-CNN-S2 model is trained through both unsupervised and supervised learning, so its estimation error is smaller than that of the CNN model, which only uses labeled training data. In addition, compared with the SS-CNN-S2 stage, the SS-CNN-S3 stage adds a fine-tuning training step, which further improves the estimation performance.

Effect of the Starting Charge Voltage
Lithium-ion batteries usually work under partial discharge conditions, which means the starting charge voltage used to construct the input vector varies in practical applications. Therefore, the effect of the starting charging voltage on the estimation results is investigated. Different starting voltages (3.7 V, 3.75 V, 3.8 V, 3.85 V, and 3.9 V) are selected to represent different depths of discharge before charging. Then, the corresponding input vector of the model is constructed to estimate the capacity. Table 2 shows the capacity estimation results for battery #7 based on the different starting charging voltages. It can be seen from Table 2 that the MaxRE among all the starting charging voltages is 3.5384%, which is less than 5%. This means the proposed method can accurately estimate the battery capacity regardless of whether the battery is under deep or shallow discharge conditions. However, the overall performance with 3.7-3.8 V is better than that with 3.85-3.9 V.

To further explain this, Figure 5 shows the evolution of the voltage curves over the life of the battery with 3.7 V, 3.8 V, and 3.9 V as the starting voltages, respectively. The colors range from light to dark as the capacity decreases. When the starting voltages are 3.7 V and 3.8 V, most of the voltage information used to construct the input vector originates from the constant-current charging step, and the voltage curve changes significantly as the capacity decreases. However, the voltage curves change relatively little with the starting voltage of 3.9 V. This indicates that phase transitions of the battery electrodes may occur during the constant-current charging step, which is closely related to battery degradation [21]. Hence, the constant-current charging voltage is regarded as an important region for identifying battery degradation, which explains the better performance when the starting voltage is 3.8 V or lower.

Effect of the Training Sample Size
The key objective of our proposed SS-CNN is to pre-train the model with a large amount of unlabeled data, and then to train the model using a relatively small amount of labeled data. Thus, to thoroughly investigate the performance of the proposed SS-CNN model, we trained the model with different sizes of labeled and unlabeled samples.

Different Sizes of Unlabeled Samples
We randomly selected 10%, 20%, …, 100% of the total unlabeled samples to pre-train the model, respectively. Then, for each model, supervised training was performed based on all the labeled samples. The effect of the unlabeled sample size on the overall test performance is shown in Figure 6. It can be seen that, as the unlabeled sample size increases, the capacity estimation performance clearly improves. This suggests that better latent features can be extracted by the convolutional encoder-decoder with more unlabeled samples, which ultimately yields higher capacity estimation accuracy. However, the benefit of additional unlabeled data has an upper limit. In our experiments, as the percentage of unlabeled samples increased, the capacity estimation error reached a lower bound, with small fluctuations, once the unlabeled samples exceeded 60%. Since the unlabeled samples mainly contribute to latent feature extraction, additional samples become redundant and mainly introduce noise once the latent features are well extracted.


Different Sizes of Labeled Samples
To study the effect of the labeled sample size, we randomly selected 10%, 20%, …, 100% of the total labeled samples for supervised training. Figure 7 shows the capacity estimation results based on different labeled sample sizes. It can be seen from Figure 7 that a satisfactory accuracy (MaxRE less than 5%) can be obtained even with 10% of the labeled samples (63 sets of samples). These results further demonstrate that unsupervised training based on unlabeled samples can effectively train the convolutional network to extract battery degradation features. Therefore, only a small number of labeled samples are required for subsequent training.

When the labeled sample size increases from 10% to 70%, the capacity estimation error correspondingly decreases, indicating that more labeled samples can effectively improve the generalization of the model. When the labeled samples exceed 70%, the capacity estimation performance fluctuates with increasing samples. This may be caused by the inherent differences between the training batteries and the test batteries. In addition, the cut-off voltages of the four batteries during the cycling test are also different. Thus, employing too many labeled samples may also introduce random errors into model training and may prevent further improvement of the model.

Conclusions
This paper proposed a capacity estimation method for lithium-ion batteries based on partial charge information and an SS-CNN. The main contributions of this method are as follows: (1) By taking the incomplete charging and discharging states of lithium-ion batteries into account, a partial charge information selection method is proposed to construct the capacity estimation model input, which not only considers actual working conditions, but also avoids inconsistency of the input data shape under different states; (2) In view of the uncertainty and complexity of the traditional feature extraction process, the advantages of a CNN in automatic feature extraction are fully utilized. The feature information related to battery degradation is directly mined from the original charging data, and the relationship between the original charging information and the capacity is constructed automatically; (3) Considering the small number of annotated battery capacity samples available in practical applications, the concept of unsupervised learning is integrated into the CNN. Combined with the unsupervised pre-training of the autoencoder and the supervised regression branch, our method outperforms a typical CNN with regard to generalization ability and the accuracy of capacity estimation.